How to Tame a Very Large ER Diagram Drawing Algorithms) Yannis Tzitzikas

Proceedings of 24th Int. Conf. on Conceptual Modeling, ER’2005
Klagenfurt, Austria, Oct. 2005
How to Tame a Very Large ER Diagram
(using Link Analysis and Force-Directed
Drawing Algorithms)
Yannis Tzitzikas1 and Jean-Luc Hainaut2
2
1
University of Crete and FORTH-ICS, Heraklion, Greece
Institut d’Informatique, University of Namur (F.U.N.D.P.), Belgium
email: [email protected], [email protected]
Abstract. Understanding a large schema without the assistance of persons already familiar with it (and its associated applications), is a hard
and very time consuming task that occurs very frequently in reverse engineering and in information integration. In this paper we describe a novel
method that can aid the understanding and the visualization of very
large ER diagrams that is inspired by the link analysis techniques that
are used in Web Searching. Specifically, this method takes as input an
ER diagram and returns a smaller (top-k) diagram that consists of the
major entity and relationship types of the initial diagram. Concerning
the drawing of the resulting top-k graphs in the 2D space, we propose a
force-directed placement algorithm especially adapted for ER diagrams.
Specifically, we describe and analyze experimentally two different force
models and various configurations. The experimental evaluation on large
diagrams of real world applications proved the effectiveness of this technique.
1
Prologue
It has been recognized long ago that the usefulness of conceptual diagrams (e.g.
ER/UML diagrams) degrades rapidly as they grow in size. Understanding a
large schema without the assistance of persons already familiar with it, is often a nightmare. Unfortunately, large conceptual schemas are becoming more
and more frequent. The integration of information systems, the development or
reverse engineering of large systems, the usage of ERP (the SAP database includes 30.000 tables) and the development of the Semantic Web (structured into
ontologies potentially including dozens of thousands of classes) naturally lead
to the building of very large schemas. Although a good drawing of a conceptual schema could aid its understanding, and several approaches for automatic
placement have been already proposed (e.g. see [33, 23, 8, 28, 10]), it is a widely
accepted opinion that the automatic layout facilities offered by current UMLbased CASE tools are not satisfactory even for very small diagrams (for more
1
see [14])1 . Consequently, the vast majority of layouts created today are done
”by hand”; a human designer makes most, if not all, of the decisions about the
position of the objects to be presented [26]. The visualization and drawing of
large conceptual graphs is even less explored. The classical hierarchical decomposition techniques that are used for visualizing large plain graphs (for a survey
see Chapter 3 of [29]), have not been applied or tested on conceptual graphs.
Consequently, only manual collapsing mechanisms (like those described in [22])
are currently available for decreasing the visual clutter and for aiding the understanding of big conceptual graphs. In addition, the techniques that have have
been proposed for reducing the size of a conceptual graph (in order to aid its
comprehension), specifically ER clustering, either require human input [15, 34,
18, 7], or they are automated but not tested on large conceptual schemas [30, 1].
We decided to devise an automatic technique for identifying the major entity
and relationship types of a very large conceptual graph as a means for facilitating its understanding. As Link Analysis has been proved very successful in Web
Searching [6, 25] and recently in several other application domains [19, 20], we
decided to design a similar in spirit technique for one very common and important kind of conceptual schemas, namely Entity-Relationship (ER) diagrams [9].
Concerning the drawing in the 2D space of the resulting top-k graphs, we describe a force-directed placement algorithm especially adapted for ER diagrams.
Specifically, we describe and analyze experimentally two different force models
and various configurations.
Both of the techniques that are presented in this paper can be applied not
only to ER diagrams, but also on other kinds of conceptual graphs. The Semantic Web is one interesting application area because it is founded on ontologies
(potentially including dozens of thousands of classes) that are exchanged in a
layout-missing format. In this context, the provision of top-k diagrams and automatic layout services is very important as they can aid understanding that
is very important for accomplishing tasks like semantic annotation, creation of
ontology mappings, ontology specialization, etc.
This paper is structured as follows: Section 2 describes Link Analysis for
ER diagrams, Section 3 introduces force-directed placement algorithms for ER
diagrams and finally, Section 4 concludes this paper.
2
Link Analysis for ER Diagrams
Our objective here is to identify the major entity and relationship types of a
very large ER diagram in order to facilitate its understanding. We designed
a PageRank [6] style scoring method because PageRank is described in terms
on the entire Web, while HITS [25] is mainly applied on small collections of
pages (say those retrieved in response to a query). We view an ER diagram as a
triple (E, R, I) where E = {e1 , ..., eN } denotes the entity types, R = {r1 , ..., rm }
denotes the relationship types, and I the isA relationships over E (i.e. I ⊆
1
General graph drawings algorithms (e.g. see [2]) usually make some assumptions
that are not always valid in conceptual graphs.
2
E × E). For any given e ∈ E, we shall use connR (e) to denote those entity
types that are connected with e through relationship types, connsb (e) denote
the direct subtypes of e, and connsp (e) the direct supertypes of e. We shall also
use the following shorthands: connI (e) = connsb (e) ∪ connsp (e) and conn(e) =
connR (e) ∪ connI (e). Since two entity types may be connected with more than
one relationship types we consider connR (e) as a bag for being able to record
duplicates. In addition, we shall use attrs(e) to denote the attributes of an entity
type e.
Now the score (or EntityRank) of an entity type e in E, denoted Sc(e), can
be defined as follows:
X
Sc(e) = q/N + (1 − q) ∗
e0 ∈connR (e)
Sc(e0 )
|connR (e0 )|
(1)
where q stands for a constant less than 1 (e.g. 0.15 as in the case of Google)2 . One
can easily see that the above formula simulates a random walk in the schema.
Under this view each relationship type is viewed as a bidirectional transition
and the probability of randomly jumping to an entity type is the same for all
entity types (i.e. q/N ). The resulting scores of the entity types correspond to
the stationary probabilities of the Markov chain.
A rising question here is how we can incorporate n-ary (n > 2) relationship
types into the aforementioned model. This can be achieved by replacing each nary relationship type (n > 2) over n entity types e1 , ..., en by n(n − 1)/2 binary
relationship types that form a complete graph3 over e1 , ..., en . Consequently, an
n-ary relationship type is viewed as n(n − 1)/2 binary relationships.
An alternative approach is to assume that the probability of jumping to a
random entity is not the same for all entities, but it depends on the number of
its attributes. In this case we define the score (or BEntityRank) of an entity type
e in E as follows:
Sc(e) = q
|attrs(e)|
+ (1 − q) ·
|Attr|
X
e0 ∈connR (e)
Sc(e0 )
|connR (e0 )|
(2)
where Attr denotes the set of all attributes of all entity types (i.e. Attr =
∪{ attrs(e) | e ∈ E}). This particular formula simulates a user navigating randomly in the schema who jumps to a random entity e with probability q|attrs(e)|
|Attr|
or follows a random relationship type (on the current entity). The probability
|attrs(e)|
corresponds to the probability of selecting e by clicking randomly on a
|Attr|
list that enumerates the attributes of all entity types of the schema.
The linear algebra version of EntityRank and BEntityRank is given in Appendix A.
2
3
If we set q = 0.15 or below then an iterative method for computing the scores (e.g.
the Jacobi method) requires at most 100 iterations to convergence.
A complete graph is a graph in which each pair of graph vertices is connected by an
edge.
3
Let’s now discuss the differences between link analysis for ER diagrams and
link analysis for the Web. Firstly, Web links are directed, while relationship
types are not directed thus the latter are considered as bidirectional transitions.
Secondly, we do not collapse all relationships types between two entity types
into one (as it is done with Web links), and this is the reason why we consider
connR as a bag. Thirdly, in ER diagrams we should count ”self hyperlinks”
(i.e. cyclic relationship types), although the Web techniques ignore them. For
example, consider a schema consisting of two entity types {Person, City} and
two relationship types {Person lives City, Person f atherOf Person}. If we
ignore the relationship type f atherOf , then both Person and City would be
equally scored, a not so good choice. At last, although in the Web link analysis
is exploited mainly for ranking the results of retrieval queries, in our case we
don’t need just a ranked list of entity types, but rather another diagram that
consists of the major entity and relationship types.
We have implemented and evaluated the above scoring schemes into the
DB − MAIN CASE tool (for more see [17, 21]). The designer provides a threshold
per between 0 and 100. Subsequently, all entity types with score lower than
per% ∗ ScM ax, where ScM ax denotes the highest score, disappear. Controlling
the visibility of entity types according to their score, and not according to their
rank, is preferred as it better handles ties. Concerning relationship types, only
those that connect the visible entity types are displayed. The computation of the
scores takes only some seconds on a conventional PC. Specifically, to compute the
scores we use the Jacobi iterative algorithm. We have noticed that 50 iterations
give quite stable orderings and their application on schemas with 1000 entity
and relationship types takes less than 2 seconds in a conventional PC.
colvatmsl/c olva tmsldet
fia /ori
1-1
0-N
RETOUR FOURNISSEUR
Numéro de réparation
Quantité re tournée c hez le fou
Date de cré ation ér pa ration
Date d'envoi aufournisse ur
PRIX:REFERENCE/CATALOGUE
page -seconda rie 1
page -seconda rie 2
page -seconda rie 3
c ode tva
tauxde tva francais
sur face
type d'offre
type de prix
PRIX:POSITION
PRIX: REFERENCE
tauxde pondera tion
c oloris
page livre bla nc
prxre f/1-1
prxre fcat ireference
ta lile
lbe0-N
le refe renc e
c ommenta rie
ref mere
libelle coloris
1-N
top1-1
a
libellé taille
topb prxre f/prxpos
libellé position
topc
Code positionmere
topd DUCATALOGUE
0-NatPRIX:CATALOGUE-RAYON
PRIX:P
OSITIONS
refc
0-N
1-1 prxraycat/prx
libelle rayo
n
1-1oscat
scann
t/perxrefcat
1-1
prxp
prixos/p
derxp
revi
etop
ntprx
preepo
visio
lf
se cteur
EXCEPTION( multi)p
1-1rése ntation
prix de ve ntetop
frafnca si
groupe
chiffre ca talogue
antiteqreuealise 1e/p
Nays ve nte
prix de ve ntequ
theori
CA TTCré alisé
antictheolreoqaliiqsueefra nce
topmultiple
prix de ve ntequ
psy
CA hors taxe ér alisé
hiffre cata ol gue
prixrevient previsionnelfra pri
n x de ve ntec ret
oscat/prxp
0-N
1-1
osexeenu
marge brute hors taxe ré alisé e
prixde vente francais prxp
prxpgec at/prxrefcat
quantité réa ilsée France
bf-prévisionnel calculé
quantité prévue fra nce quantité réa ilsée /pays vente
bf-réalisé
codifica tion
quantité prévue Fra nce
c a horsta xe prévisionnel
flagmodif position
0-N 1-1
marge brute hors taxe prévisio
flagblocage des prix
multiplic ateur ff prévisionne l
type de prix
codification
PRIX:PAGESDUCATALOGUE
page
FAMILLE PILE PILE/MARQUE 1-1
libe le de la page
famille Be bat Code marque
CA TTC réalisé
libellé fa mille BebCod
a t e pile dans la marque
CA hors taxe réalisé
Marge brute hors taxe réalisée
1-N
1-1
CA hors prx
taxe préc at/p
visionnel t
Marge brute pay
hors taxerxrefca
pré visio
pile, famille pile1-1 Correspo
PILE0-Nnd
prxpaycat/prxra yca t
PRIX:CATALOGUE
N¦ Be bat
Code catalogue
libellé Bebat
libelle catalogue
Code format pile
ac tionpays d'origine
Code I.E.C.pile
saisonpa ys d'origine
Poidsde la pile
Zone action
1-N
Ha uteur pile
Zone de prix
Longueur pile
durée cata ol gue e nmois
PRODUIT FOURNISSEUR
0-N Largeur pile
da te da' oj ut
OFFRE PRODUIT
référence fournisse ur
numé rooffre
libe lé produit fournisse ur
prxcat /prxpaycat%fraisde transpor t
date début validité
0-N
date finva lidité
%droits de douane
TYPE D'EMBALLAGE
montant minim um à comma nde r
%commissiond'agent
code type e mballage
quantité minimum à c ommander
%fraisdivers achat produit
libellé
date de transmission par fourn
unité de mesure
libellé de l'offre
MODE DECREDIT
nombre article par boite
type d'offre
c ode c rédit
quantité minimum à commander
désigna tionc rédit
montantminimum à c ommander
délais de livraison
texte bonde commande
nombre moisgarantie sur pièce
nombre moisde gara ntie sur pr
tauxd'importation
type de fournisseur
quantité /boite fournisseur
0-N
tauxinspec tion
TYPEADRESSE
c ode a dresse
descriptionadre sse
LIGNE PHOTO
nume rode ligne
Nombre d'article ssurla photo
c ode montage studio
topmonta ge fait
1-1
0-N
PRIX:CATALOGUE PARPAYS
taux de change
STATISTIQUESDE CONSOMMATION
taux de cession c ode sta tis tiques de consommat
TYPEARTICLESCORING pa rtie fixe
désigna tionsta tistiques de co
taux de cessionldf
Code type art/scoring
pa rtie fixe LDF
Libellé
diffusion
actionpa ys ve nte
saison pays vente
Code devise
1-1
prxray/prxref
TOPCLIENT
Code Top Client
Libellé Top clie nt
Top
1-1e r/détoper
prxray/prxrayca t
tra ductioncodes/type emballag
OFFREPOSITION
ligne offre
prix de base
prix de l'offre
prix da' cha t
unité de commande
Prixde revient prévisionnel
0-1
type d'offre
tauxristourne 1
tauxristourne 2 0-1
tauxristourne 3
taux0-N
ristourne 4
tauxristourne 5
frais d'emba lage
PRIX:RAYON
numéro offre
rayon
0-N
ilbe le rayon 0-N
secteur Ve nte à cr édit
group
e té e nde vise in
valeurmen
suali
nombre me nsua ilté s
valeura compte e ndevise ni ter
taegma ximum
taegrée l
nombre ma xterme s autorisé
e ch/photoa prvealeur
ndremensualité e nde vise ex
valeura compte e ndevise exte r
type de top 1-1
1-1
SUIVI DESTOPSSANSCOMMANDE
Nombre de Clients
1-1
1-1
suivi bnb
1-1
pays
1-1
REFERENCECONTIENT
nombre de piles
tva/ec h
SERVICE
c ode se rvice
désigna tionse vr ice
Topservice dans prixa rtic el
0-N
code fournisseur
Nom du of urnisseur
Numéro de TVA
numérode registre de comme rce
Code jour de livraison 1
Topte xte libre fournisseur
Délaide ilvraison
Va el ur minimum à commander
Date de dé but de vaca nce s
Date de finde vac anc es
type de fournisseur
Code jour de livraison 2
Code conditiond'envoi
Poids minimum à commander
Volume minimum à comma nder
Numero de l'a che teur
Delaide confirmationbonde c
nom fournisseur
code oj ur de livra sion
délais de livraisonmaximum
Code mode de paiement
Numéro compta ble dufournisseu
Code de vise
Toplivr aisonsé paré e
Topdema nde sé paré e
Topfournisseur pe rso
Code mode de dema nde
1-1
e stlivre
0-N
0-N
0-N
0-N
1-1 1-1
0-N
mode livraison/pa ys ech
pa rdéfaut
code tva
pa y/tpbsta
REFERENCE ARTICLE
Numérode ré férence article
Libellé franc aisde l'article
Code marque article
Code article enc ombrant
Code article LDF
Code tissu
Topsanspromotion
Code plage d'attr ibutionnumé r
Topproduit
Type article pourpromotion
Fournisseur pe rsonna lisation
Code statistique de c onsommati
Libellé née rla nda is de l'artic
Code de famille
Code taille groupée
Taille minimum
Code mode d'e mploi
Numérode l'acheteur
Code type d'e mballage
Topdesc ription te chnique
topfragile
Nombre de piece sdans le lot
topemballa ge
Toptexte pour conce ption
Toptexte condition SAAV
toptexte de contrôle
numéro d'a vis
topinstruc tionsaisie cde
topdesc riptionducolis
topinstruc tionpourle pic kin
toptexte
unitaire
catpopsrix
pay/p
ay
toptexte pour transporteur
toptexte pour showroom
Toppré montage
Refere nce produit decompose
toptyp/tpbcli
SUIVI DESTOPSCLIENT
Nombre de Clients
services proposés
quantité se rvice
Topmodification logistique
1-N
1-1
0-N
0-N
date dé but de validité
0-N
0-N
da te de finde validité
0-Nfrn/rc nlgn
est géré
0-N
1-1
0-N
0-N
ej t/tpbc il
date de réce ption
0-N
DATE
date
Numéro de sema ni e
1-1
0-N
0-N
e ststocké
0-N
SAISON
Code saison
Libe lé de la saison
Toptra ite mentpar fac turation
0-N
1-1
CONDITIONNEMENT
Code conditionneme nt
Libe lé c onditionneme nt
Cout unita ire
SUIVI COMMANDES
Numérode sé que nce suivic omma
Etat pour syste me de mesure
1-1
0-N
1-1
0-N
0-N
0-N
1-N
0-N
0-N
0-N
habitation/pa ys
1-1
Rés ervé e à cette ac tivité
frn/pay
0-N
0-N 0-N
0-N
0-N
catpospay/dev
0-N
pay/soprue
pay/tpbcli
0-N
pays de la devise
pays/quartier
0-N
fournisseur/pays
description
0-1
1-1
décrire
0-N
0-N
0-N
0-N
c alcul canal
Borne supérieur e code postal
Borne infé rie ure code postal
Notice
0-N
SECTEUR ACTIVITE
Code secteura ctivité
libellé secteurd'a ctivité
code enseigne 0-N
Unité sbudgé taire
Prix de ve nte/pay
de texte cible
Valeur ca talogue de l'aCo
rtide
Co
dec el(texte stimula tion
Valeur hors tva de l'article
de texte special
Valeur tva de l'a rtic elCo
(inter
Valeur tva de l'a rtic elCo
(edextertexte c alcul
Tauxde TVA
TopTVAdansprixcata ol gue
0-N
0-N
0-N
FRAISPAR DEFAUT
Code frais
Montantdes fraisde port ni te
Montantdes fraisde port exte
Ta uxde c hange
cn/pay
1-N
SUIVI ACTIVITE
Code application
Code mouve ment
Code type mouve ments
Cumulc ommande
Cumulc olis
Cumula rtic el
Cumulvaleur ca talogue
0-N
0-N
0-N
Ca deaua ssocié
0-N
0-N
0-N
LANGUE
code langue
libe lé c od0e-Nlang
ue
PAYSENS
libe EIGNE
le langueLANGUE
0-N Top la ngue pour le c lient
0-N
Poids max/c ana l
Poids maximum ducolis
Poids minim um ducolis
0-N
0-N
1-1
0-1
1-1
cn-cnd
0-N
c n/circuit
0-N
0-N
cn/c ntyp
TYPE PAIEMENT RECEPTION
c ode type paieme nt réc eption
libellécn/rla
type pa ie ment réceptio
0-1
0-1
LOGEMENT
c ode logeme nt
0-NType de logement
Code bloc age
Code état dulogement
Code pse udoniveau
Code état d'inventaire
Da te dernie rinve ntaire
Da te inve ntaire - 1
Da te inve ntaire - 2
Da te inve ntaire - 3
Da te inve ntaire - 4
1-1
0-N
Toppro
tec tionincendie
Code type de prélèvement
Code groupe de loge ment
0-N
e nvexd/lgt
1-N
0-1 1-N
0-1
ENVOI EXCEDENCE
Nombre d'articles
UNITEBUDGETAIRE
Nombre de préleve ments
Numérod'unité budgétaire
Type d'e nvoi excéde nce
Libellé 1-1
code état
CONTROLEnMOUVEM
ENTSnPAIEMENTS
umérode sélectio
Numéro contrôle mouveme nt
Da te de réc eption1b-1
ande
Nombre
de record
e
enve xd/envex
dpras sur band
Valeur
totale paiement
interne
0-N t
Code traitemen
DaATI
te deON
paiement
ENVOISEXEDENCE PREPAR
Ident ifica tionba nde paiement
Nombres d'articles
Nombre de prélèveme nts Use r colis
Prixde vente a exc edence Valeur totale paiement externe
Type d'envoise xce dence Topa rchivage
Numérode sé lection exc édence Valeur paiement anomalie inter
Valeur paiement anomalie exte r
Code etat
TYPEDE CANAL
code type de ca nal
libe lé dutype de canal[0-1]
topc an1-1
a lde c ommande(o/n)[0-1]
topc ana lde livraison(o/n)[0-1]
topsa isie commandes[0-1]
COMMENTAIRE
Numé rocommentaire
Libellé fra nca is commenta ire
Libellé néerlandais comme ntair
1-1
support duloge ment
0-N
0-N
TYPECIRCULAIRE
Code type circula ire
Libellé type circula ire
Code ca talogue/cade au
Topbond'ac hat
Topristourne
Topà payer
PARTICULARITE MOUVEMENT
Type de mouvement
Code pa rtic ularité mouvement
Date débutvalidité
Date f in validité
Libellé pa rtic ularité
cn/controle goulotte
Stat par canal
cnNu
/csamé rode niveaude statistiqu
PROMOTION D'UNE COMMANDE
Réponse a uconcours
CANAL/FERM
NombETURE
re de promotions
Code é tat
Valeur promotioninterne
Valeur promotione xterne
0-N
0-N
1-N
0-N
0-N
FRAISD'EXPEDITION
Poids maximum frais d'expéditi
Poids minimum fraise xpédition
Va el ur frais de' xpédition
0-1
0-N
1-1
0-N
1-N
1-N
1-1
rs
0-1 concou
0-1
1-1
1-N
0-N
c n/cnla n
tpd/tpdliv
c n/colsta
SUPPORT DESTOCKAGE COLISSANSADRESSES CONTROLE CALCUL GOULOTTE
GRP
-PRM#C PROMOTION
c ode c anal
Code support de stocka ge Composteur
RES
ERVATION
Libe lé fra nca is dusupport Date de retour
topsec teur ac tivité goREGR
ulotteOUPEMENT STATISTIQUE
code promotio0-N
n
TYPE CENTREDISPATCHING
Longueurdusupport
topmode de réceptionCod
gouelotte
regroupement
Texte
de RM#
la Dé
promoti
DET-P
Ctail onde reser
Code type ce ntr e dis patching
opéra tric e d'ouverture
toppays pourgoulotteLibellé regroupeme nt
La rgeur du support
Début d0e-N
va lidité
Pays
Hauteur dusupport
c ode c analdéviationgoulotte
Finde va lidité
Code postal
CONTROLECIRCUITPoids à vide
c ode c analdéviationtourné e
TopSans commande pr évu
1-N
REVALORISATION
Numérode localité
Code circuit
Actionpré vue
Nombre de logements contigCo
us mmenta ir e oucolis retrouvé
FERMETURE RELAIS
Numérode reva lorisation
TYPE FRAISD'EXPEDITION
Nombre de logements opposés
code canal
Da te reva lorisation
Créate ur commentaire c sa/csade t
c nd-cndtyp Libellé circuit
Code type fraisd'e xpédition
Stock ava nt
1-N
Date début ferme ture
Indice supplémenta ire
0-N
Libellé de frais d'expédition
1-1
Date finfermeture
Prixavant
LIGNE DECOLISSANSADRESSE
Quantité livrée / fournisseur
CIRCUIT
1-1
Indic e de la ligne
Prixfournisseur
Nouveauprix moyen pondé ré
code origine
0-N
Référence article
TYPESTOCK
CENTRE DISPATCHING
Code type de stock
JOUR
Code ce ntr e dis patching
Libellé type de stock
Libelle c entre dispatc hing Libellé dujour
0-1
Donné e e ntr eprise centre dispa
TOMBOLACHANCEARTICLE
numé rodubillet
0-N
0-N
1-N
DISPONIBILITE
Code disponibilité
Libellé disponibilité
cn/pnttyp
1-1
0-1
0-N
0-N
revalorise r
QUANTITEEN STOCK
Stoc kthé orique
0-N
calculgoulotte
Borne supé rie ure code postal
Borne ni férieure code postal
code tournée
TRANSPORTEURPAR CANAL
c crtyp/envcop
0-N
1-1
0-N
op/sai
0-N
Document par langue
1-1
0-N
0-N
0-N
0-N
0-N
1-1
EARLY BIRD
Nombre de cadeaux attribué sto
Nombre de cadeaux maxim um tota
Nombre de cadeaux maxim um/jour
Nombre de jours
Numéro dujour en cours
Nombre de cadeaux dujour en c
0-N
lan/qrt
1-1
0-N
0-N
SERVICE PAYS
Cout duservic e interne
Cout duservic e ext erne
Ta uxde c hange
Largeur emballé e de l'art ci el
0-N
Ha uteu
r emballée de l'a rtic le
Longueur montée de 'larticle
Largeur montée de 'larticle
Ha uteur monté e de l'article
Code fournisseur LDF
Nombre minimum article stock
Code frais
Code type de remplace ment
Topabandonde 'larticle
Type de message sur bor derea u
Volume article pour transport
Code loge ment
Prixvente solde com/lan
Cod
agne suivant
ilbe lrepo
é durtc ocamp
mmentaire
Toparticle sur-stock
Topmodif prixsolde
toparticle composé[0-1]
Da te ni itia ilsationstatistiqu
0-N
0-N
0-N
0-N
1-N
0-N
0-N
0-1
FACTURE
Numéro de facture
Valeurfa cture interne
c ode c anal
Valeurfa cture e xterne
0-N
0-N
CANAL
c ode c ana l
libellé duca nal[0-1]
délaide livraison[0-1]
topca nal de commande(o/n)[0-1]
topca nal de livraison(o/n)[0-1]
Mouve mentde stock
topca nal de retour[0-1]
Numéro de mouve ment
type de mouvementde paiement[0-1] Nombre de pièc es
c ode blocage[0-1]
Da te dumouvement
c ode c ana lde déroutage [0-1]
Prixac hat sta nda rd
topsaisie des comma ndes[0-1]
Prixde vente de ba se
nom duresponsa ble duca nal[0-1]
Toptexte associé
numé rode téléphone ducanal[0-1]
c ode c irc uit[0-1]
c ode langue [0-1]
c ode posta l[0-1]
c ode dutype de ma gasin[0-1]
Nombre maximum de colis
Topscission ducolis
Topattributionnolivraison
Type de bordereau de livraison
Origine dumouve ment PAR
Code type centre dispatching
0-N
Numérode code postal/localité
Code depot
Nombre de poste de travail/heu
0-N
0-N
1-1
1-1
prmrsv/sai
Cadeau
ARTICLE VPC
Code article VPC
libellé francais article VPC
Libellé né erlandaisa rtic le VP
Code colorisde 'larticle
Dé al ide livra sion
0-N
Code édition
Code exception
Longueur emballé e de l'a tr ci el
article vpc pièc e
Poidsde 'larticle
a rticle vpc ense mble
0-N
Code pli article 0-N
Prixac hat standard
SUIVIOPERATRICE
Volume de l'a rtic el
Code oper atrice
Prixde vente de base
Code type de délai de livraiso
Unité de ve nte
code type de relais
Nombre de colis pour 'larticle
Code type de suivi opératrice
Toptec hnique 1
Cumul commande
Toptec hniquositi
e 2 ond'un ense mble
Cumul ilgne
Prixacdécomp
hanti
at ho
rsdan
cession le
téVP
Cumul temps de' ncodage
Code qu
article
C sinensemb
terne
Cumul vale ur catalogue
Code article VPC externe
0-N e stre mplac e
Cumul vale ur nette
Quantité minim um article piRemp
c ki lac ement0-1
quantité ré servé e 24h
re mplac e
1-1
Code taille de 'larticle
0-1
1-1
1-1
COMPOSITION D'ENVOI
Code compositiond'envoi
Libellé compositiond'envoi
Commentaire
Dernier je t
Date dernier jet
Topenvoireta rdé
Date d'envoiprobable
Topcircuit
Actionà éditer
Valeur bond'achat externe
Valeur minim ale commande exter
Canal commande
Taux de ristourne à édit er
1-1
Canal livraison
0-N
0-N
0-N
0-N
0-N
0-1
Sec teur ac tivite
0-N
Mise enlogeme nt
Quantité entrée en stock
Code état
Code article vpc 0-N
0-N
cadeaua preleve r
Libe lé a tr ci el à c ommuniquer
POS-DTE
nbre art. disponibles
nbre art. suspens
nbre art. autre stock
encours cde frn
lan/tpbc il
fournisseur/langue
0-N
individu/pays
0-N
envcop/jet
0-N
0-N
1-N
0-N
0-N
1-1
1-N
STATFOURNISSEUR MENSUELLES
Code mois anne e
Nombre d'articles réc eptionnés
Chiffre d'affa rie e ndevise
Chiffre d'affa rie
Chiffre d'affa rie prixde revi
c d/eta
0-N
E
0-N
CREDITPAR MONTANT
Prixde vente compt ant externe
Va le ur de l'ac ompte externe
Va le ur de mensua lité externe
Nombre de mensua lités
Tauxa nnuel effect if global ma
Tauxa nnuel globalre elutilis
PARAMETRER SIMULATION/SECTEUR
Déc alage envoi/r éception
0-N
date de demande
0-N
dema nde clie nt
date de saisie
Code é tat
top circuit
Numé rode c ommande[0-1]
Motif
réponse subsidiaire
Code t ype de saisie
1-1 Co
de user de saisie
0-N
1-1
POSITIONS/ACTIVITE/MOIS
quantité vendue par mois
LOT
Affra nchissement unita rie
0-N
0-N
c gp/tpbcli
0-N
0-N
al n/soprue
dte/pay
tauxde c hange
0-N
0-N
0-N
POS-SEM
nombre d'ar itcle par sema ni e
liene ch/pos
0-N
1-N
pays dufournisseur
Suivi des jets
pondération
tauxre partition tai/clr
tauxre partition clr/ref
tauxpondéra tionlongterme ex
0-N
0-N
ETAT D'AVANCEMENT (LIGNE OU CO
type d'éta t[0-1]
c ode é tat
libellé état[0-1]
Libellé état en 3cara ctère s
Libellé état en 10c aract ères
Nombre de jours de réte ntion
Code refus 1-1
0-N
0-N
Da te de de pot
1-1
servic1-N
es repris
Cout service à la clie ntè el in
Cout service à la clie ntè el
0-1
1-N
1-N
1-N
1-1
1-1
0-1
1-1
pays de provenanc e
0-N
0-N
0-N
0-N
Carte
0-1
sa che t
1-1
lan/tpbsta
1-1
1-1
1-1
pondération
taux repa rtitiontai/clr
taux repa rtitionclr/ref
taux pondé rationlongte rme ex re pre sente
0-N
0-N
e ta/etatyp
JET
Numérode ej t
Date de dema nde
1-1
dte/je t
0-N
POSITION photoa pre ndre
0-N
0-N
0-N
CAT-PRD- SEM
nombr e de référe nce c umulée
Longterme a ctualisé
longterme e xtrapole
1-N
ACHETEUR
c ode a che teur
Nom acheteur
groupe achat
c ode paysde provenance par de
c ode paysde fabricationpar d
10période sde cade nce ment
nombre de jourde sécurité
numé rode personne
tauxde c ouverture
Période d'échelonnement[ 10-10]
1-1
PAR défaut
Code TVA
0-N
0-N
1-1
0-N
1-N
tpbmtf/tpbsta
RECEPTION
numé roréce ption
date réce ption réelle 0-N 0-N
0-N
Da te mise enstockde ré ceptio
nom dufirme transporte ur 0-N
numé roplaque camion
heure a rrivée transport
heure dépa rttra nsport
nombre colis planifié
nombre pa el ttes planifié
nombre colis réel
nombre pa el ttes réel
texte
réc eption
TYPEDOSSIER
SAAV
Code type dossie rSAAVnumé roemplacement stockte mpo
nombre de ligne sde réce ption
Libellé dutype dossier0-N
SAAV
Code état
SUIVRE LE DOSSIER
Code gestionachat
Heure
Poidsta xable
Texte suivi dos sie rSAVCu
bage (en m3)
Code type de réc eption
rcn/rcnlg
c ode marque
1-n1
c ode type a rtic el ré ception LIGNE RECEPTION
uméro
réception1-N
nombre pe rsonnn
e ma lig
nutneention
1-1 c ode deballe nombre a rtic el sre ceptionnés c
nomb
re asprtic el sré ceptionnés n
Da te arrivée qu
ai tran
type article livré
code état avanceme nt ligne ré c
topc ompte renduréc eption
date finré ception
nom responsable contrôle quali
nombre a rtic el sré ceptionthéo
semaine texte ilbre de réception
année topmarquage à la réce ption
pr econditionne ment à al ré c
semaine
1-N to
code secte ur da' ctivité
0-N
FOURNISSEUR
1-1
tpb/tpbsta
frnpubptc/mrq
c atref/svc
Nombre de se rvice
Cout duse rvice à la clientèle
CAT-PRD-DTE
Longte rme actualisé
nombre de ré férence cumulée
longte rme pré vu
ta uxa vancement produit/c atalo
ta uxc ouverture
ta uxmajoration
ta uxre tours produit
0-N
0-N
0-N
0-N
dte 2/tpbstadte
0-N
0-N
1-N
remise finanné e fournisseur
tauxde ris tourne
type de ristourne
Libe lé ristourne fournisse ur
1-1
1-N
pa ei mentsdes mensualités
Noséquentiel
Type de record
Montant à payer (inte rne)
TVA
Date dupaiement
c ode TVA
Montant payé (inter ne)
répartition/tLib
va ellé TVA0-N
Libe lé paiement
0-N
Taux de répartitio
nde laTatvauxde tva
Quia liquidé
indice comptable de la tva
Pa rtdes fraisfinanciers
Toptraite mentmise à jour
Pa rtde la tva dans la mensual
Pa rtà re mbourser (inte rne)
Montant à payer (externe)
Montant payé (externe)
pays de vente
Pa rtà re mbourser (externe)
1-1
MARQUE
code marque
0-N
crédits possibles
0-N
0-N
0-N
0-N
est commandé
gere
0-N
1-Ndte/tpbstadte
dte3/tpbsta dte
est commandé1-1
par
1-1
TRANCHECA 1-N
CA fournisseur de base
FILIALE
c ode filiale
Libellé filia le
TYPEETAT
Code type éta t
Libellé type éta t
top/a ction COMMANDE
numerode1-1
commande
date livraisonproblable
date rende z-vous
prixac hat e nFB
date de mande
quantité demandée
prixac hat e ndevise
est prese nt
date butoir pour lay-out
topfiche tec hnique
toptissu
topbois
tauxde ris tourne
etat cde
numérode el ttre
Date initiale de la livraison
fac ture / commande
1-N
est emballé
1-1
0-N
0-N
1-N
0-N
implanté
1-1
dte /tpbsta
MODE DELIVRAISON
c ode mode de ilvraison
libelle mode de livraison
0-N
0-N
TYPEDE TOPCLIENT
0-N
conc erne
date de réception
1-1
0-1
0-N
ECHANTILLON CONTIENT
nombre de piles
0-N
0-N
im planter le sta illes
Taille impla ntée
1-1
PRODUIT/CATEGORIE
Nombre d'articles
Topinformation
0-N
ECHANTILLON
numéro échantillon
libellé née rla nda si
libellé franc ais
libellé fournisseur
Volume da' rtic el pour rt anspor
poids
nombre de colis
hauteurmonté
large ur monté
longue ur monté
hauteure mballé
large ur emba lé
longue ur emba lé
volume e mballé
code mode d'emploi
top fragile
taille
top suivi
prix achat article
1-1
top conforme
top reponse courrie r
top ve nda ble
top mode emploi conforme
top emballa ge conforme
unité compléme ntaire
prix vente estimé
Topatt
Code coloride l'article
Toptest externe
Refere nce c hez le fournisse ur
Code fournisseur de l'a rticle
al ngue
1-1 0-N
0-N
tva par dé faut
1-N
0-N
REFERENCEDOUANE
Code réfé renc e douane
Comme ntaire de la ré férence do
Code unité complémenta ire doua
Libe lé réfé renc e douaniè re
Libe lé franca is réfé renc e dou
code c oloris
libellé colorise nfrancais
libellé colorise nné erlandais
TRADUCTION DES CODES
code type de texte
Valeur de laclé
Traduction
JOURNALDESVENTES
1-1
0-N
BIC
Numé rode réfé rence de l'a rtic
Code filiale
Numé roBIC
Libellé duBIC
COLORIS
0-N
1-1
prxpa yca t/prxpgec at 0-N
1-N
0-N
1-1
RECEPTIONECHANTILLON
numéror éce ption
0-N
date r éc eption rée lle
nomdu firme trans porteur
COMPLEMENT DOSSI ER SAAV
numéroplaque camion
Code pa ys
heure ar rivé e trans port
Nom de l'a dresse
heure dé part trans port
1-1
Numérode téléphone
nombre colis planifié
0-1 1-1
Localité
nombre pale ttes planifié
CORRESPONDANT 3SUISS
CodES
e postal
nombre colis ré el
cn#c
Rue de 'ladre sse
0-1
nombre pale ttes ré el
Numérode 'ladre sse
stock???
Numéroducolis
te xte réc eption
POSITION FOURNISSEUR
FACTURATION
Da te facture
LIEU DELIVRAISON
code fo
urniss
DETAIL DOSSIE
R SAAV
me nrofacturation
Code fournisseurde 'larticle
Numéro
de répanu
ratio
e stpe
01-1
-Nliv
ré eur
TYPEINTERVENTION Nu
code lie u cod
e ty
article
méro de dossier
libellé position/fournisse ur
ligne de facturation
libe llé
Code type di'nterve ntion
Numérode dossier
Numéro de ligne
nombre de ligne s de r éce pti on ligne
type fournisse ur
nbre article fa cturé
Libellé type di'nterve ntionSA
LIGNERE
1-1
CE
0-NPTION ECHANTILLON
Texte de1al-1ligne
Code état
prixunita rie
facture /rec ept oi n
numero de l gi ne
0-N
Code ge stionachat
c ommenta rie
quantité the or ique
Poids taxable
e tat factura tion
ECHANTILLON/CATEGORIE
MODE
DEgeDEM
1-1
0-N
Cuba
(e nANDE
m3
) DE PERSO date finlivrai son
LIGNE PROPOSITIOND'ACHAT
Nombre d'articlesTYPE DETOP PUBLICITAIRE
DOSSIERSAAV
Code mod
eedetydemand
code e tat
Cod
rée ceptio
crédit repris/c atalogue
0-N
Numéroligne propositiond'ach
LibelléDate
de mo
depedeedednemastock
nde denr éce ptio
Numérode1-N
dossier SAV
mise
est réc eptionné
Date initia el de ilvraison
Date dudos ise rSAV
TYPED'INTERV
PARrDOS
SIE quantité livré e réelle Qua ntitée proposé e
code marquage
RéféreENTION
nce f ournisseu
article
mois
toptyp/tpc
quantité livré e nonconforme
code type article r éce ption
Nombre minimum article à comma
Libellé francais article
anné e
nombre pers onne manute ntion
Numérode 'loffre
0-N
Numérode page dudocume nt eta tlv
code de balle
mois
Qua ntité rée le proposée
Code saison
RESERVATION
PRODUIT COLORIS
Date
ar rivIéTIONS
e quaiD'tra
PROPOS
Ansp
CHAT
PRODUIT TAILLEnumérobonrés erva tion
Code action
libellé francais coloris dupr
Todeprpendez
usac hat
Numéro
ropositvo
ion
taille
Code éta t
Article cata ol gue
Type de propositionac hat
VENTE AUCOMPTANT
libellé né erlandaisducoloris
ilbe lé taille francais
Numérode clie nt (interne)
nombre d'articles rése rvés
tpbsta/toptyp
nombre me nsua iltés ma ximum pou
Da te ni itia el de ilvraison
ilbe lé taille ennéerlandais
Code canal
nbarticles commandés sur rése
Da te propositiond'achat
prixve nte c omptant
ilbe lé taille née rlandais
Numérodu colis
période de rés erva tion: début
Horizon ennombre de jour
descriptionte chnique
période de rés erva tion: fin
Date de facturation
Code état
Numérode ér pa ration
1-1
1-1
Code type da' dresse
Code état
1-1
Prixde vente de base
1-1
0-N
Se cteur d'ac itvité
Numéroliena dresse courrier
Numéroliena dresse livraison
Numéroadresse si absent
0-N
1-1
prmrsv/prmtyp
dev/pntctl
1-1
0-N
0-N
0-N
prm/prmrsv
Fig. 1. Excerpt (< 10%) of a large ER diagram drawn using a force-directed placement
algorithm
Figure 1 shows a very small part of the ER diagram of a Belgian distribution
company. Though the schema comprises about 450 nodes and 800 edges only, the
4
layout is definitely useless for understanding the schema and the corresponding
application.
COLIS
0-N
0-N
1-1
CANAL
0-N
0-N
LIVRAISON COLIS
1-1
1-1
0-N
FOURNISSEUR
cn/suicol
réexpédition colis
0-N
colcom/suicol
SUIVI LIVRAISON COLIS
0-N
0-N
0-N
col/eta
1-N
0-N
remise fin année fournisseur
0-N
1-1
ETAT D'AVANCEMENT (LIGNE OU CO
0-N
SUIVRE LE DOSSIER
0-N
0-N
1-1
1-1
article vpc pièce
0-N
1-N
1-1
0-N
date début de validité
0-N
0-N
1-1
1-1
1-1
0-1
DATE
article vpc ensemble
0-N
1-1
date de fin de validité
0-N
STAT FOURNISSEUR
artvpc/frnMENSUELLES
0-N
décomposition d'un ensemble
fournisseur/langue
0-N
achfrn/lan
POS-DTE
0-N
description
décrire
0-N
1-1
0-N
0-1
ARTICLE VPC
1-1
est remplace
0-1
remplace
0-N
Notice
1-1
0-N
artref/artvpc
Canal
Canalcommande
livraison
0-1
HISTORIQUE
cd/suicolCOLIS
Poids max/canal
Remplacement
pays de la devise
EARLY BIRD
pays du fournisseur
fournisseur/pays
0-N
0-N
0-N
0-N
cd/eta
0-N
dte/pay
0-N
0-N
0-N
0-N
0-N
1-N
Mouvement de stock
1-1
0-N
0-N
0-N
0-N
0-N
FORMULE
0-N PAR PAYS/LANGUE
0-N
artref/tva
simulation prixartref/douref
de vente
Pays d'origine
artref/atvsec
SUIVI DES TOPS CLIENT
0-N 0-N
0-N
0-N
0-N
PAYS ENSEIGNE LANGUE
0-N
0-N 0-N
0-N
0-N
atvsec/cd
0-N
SECTEUR ACTIVITE
0-N
FRAIS PAR DEFAUT
0-N
0-N
0-N
1-N
0-N
0-N
0-N
1-1
PAYS LANGUE
Composition
0-N
COMMANDE
LANGUE
Date de depot
1-1
LOT
0-N
ALIMENTER
0-N
0-N 0-N 0-N
pondération
calculSUIVI
canalACTIVITE
calcul goulotte
1-N 0-N 1-N
0-1 0-N
1-N
1-1
0-1
REFERENCE ARTICLE
cumul mouvement systeme mesure
POSITIONS/ACTIVITE/MOIS
0-N
1-N
0-N
0-N
0-N 0-N
0-N
0-N
0-N
0-N
1-NCadeau associé
0-N
cn/pay
Document par langue
SUIVI OPERATRICE
0-N
0-N COMMANDES
SUIVI
0-N
0-N
PAYS
pays d'origine
0-N
pays destination
0-N
FRAIS DE DOUANE
Fig. 2. The diagram of the top-11 entity types of the schema of Figure 1
Now Figure 2 shows in micrography the diagram of the top-11 entity types
of the schema of Figure 1 according to EntityRank (for reasons of space, the
attributes are not displayed in this figure). Although this schema has 54 relationship types it is extremely more easy to visualize, and thus to understand,
than the original schema. Of course, one user could start from even smaller diagrams. For instance, Figure 3 shows the graph of the top-5 entity types of the
same schema. It indeed contains the major entity types of this application and
a user can immediately understand the application domain of this schema.
In case of diagrams with big isA hierarchies, some entity types, although major, may not receive high scores because their relationship types are scattered
in several subentity types. To handle this case, we introduced a (optional) preprocessing step in which each isA hierarchy of the schema is collapsed into one
entity type that collects all the attributes and relationship types of its subentity
types.
Concerning evaluation, at first we have to note that the evaluation of the
effectiveness of link analysis techniques for ER diagrams (for conceptual graphs
in general), is more difficult than in the case of Web. In the latter case, it is
not so hard to judge whether the top ranked pages are indeed relevant to the
submitted query. However, in the case of conceptual graphs, one has to know well
in advance the conceptual graph in order to judge whether the resulting small
graph indeed contains the major concepts of the conceptual graph and of its
underlying domain. Inevitably, the most reliable evaluation of such techniques
can be done only in already known conceptual graphs. For this reason we applied
this method to almost every conceptual schema that the DBMAIN group has
produced the last 3 years. This was a quite representative test bed as it includes
5
Channel
decomposition
0-N
0-N
0-N
partOf
0-N
Bonus
0-N
Max Weight/Channel
0-N
EARLY BIRD
State of Progress
0-N
0-N
1-1
Operator
0-N
path Calculation
0-N
cn/pay
channelCalculation
follow-up activity
0-N
Item
0-N
0-N
0-N
replacedBy
replaces
0-1
0-1
description 1-1
describes
0-1
replacement
0-N
0-N
0-N
supplies
0-N
POSITIONS/ACTIVITIES/MONTHS
Country
0-N
0-N
0-N
Notice
0-N
origin
0-N
destination
0-N
0-N
0-N
0-N
1-N
0-N
accumulated movements
0-N
0-N
0-N
0-N
Stock Movement
0-N
0-N
country language
0-N
SectorOfActivity
DefaultCost
Customs costs
Fig. 3. The diagram of the top-5 entity types of the schema of Figure 1
big schemas of existing (and non artificial) real world applications. We always
obtained surprisingly good results. For reasons of space we cannot report here the
exact results of the evaluation of EntityRank (and BEntityRank) using metrics
coming from the area of IR (for more [35]). In addition, it is an advantage that
the proposed formulas for link analysis are mathematically founded and that
the underlying model (the random walk model) is quite relevant to browsing,
i.e. to the most widely used method for understanding a conceptual graph. At
last, another evidence that link analysis is indeed appropriate for ER diagrams is
that all large schemas that we have tested have a small set of elements, usually
less than 5% of the total ones, whose scores are significantly higher than the
rest. This at least indicates that big ER diagrams tend to have a well connected
kernel which, at least in our experiments, always comprised the more important
concepts of the application domain.
3
Automatic ER Drawing
For drawing automatically the top-k ER diagrams that are derived by the previous technique, we shall view them as mechanical systems. Below we present
two force models that combine the spring-model (proposed and developed in [13,
24, 16]) with the magnetic-spring model (proposed in [32, 31]) in a way that is
appropriate for ER diagrams.
3.1
Force Model A
Here entity types are viewed as equally charged particles which repel each other.
Relationship types and isA relationships are viewed as springs that pull their
adjacent entity types. Moreover, we assume that the springs that correspond to
isA links are all magnetized and that there is a global magnetic field that acts on
these springs. Specifically, this magnetic field is parallel (i.e. all magnetic forces
operate in the same direction) and the isA springs are magnetized unidirectionally, so they tend to align with the direction of the magnetic field, here upwards.
Figure 4 illustrates this metaphor.
6
worksAt
Person
Manager
11
00
00
11
Secretary
11
00
00
11
11
00
00
11
Company
11
00
00
11
magnet
magnetic field
electrical repulsion
Fig. 4. Viewing an ER diagram as a mechanical system
Under the above force model, the force on a entity type ei is given by:
X
X
X
F (ei ) =
f (ej , ei ) +
g(ej , ei ) +
h(ej , ei )
ej ∈conn(ei )
ej ∈E,ej 6=ei
(3)
ej ∈connI (ei )
where: f (ej , ei ) is the force exerted on ei by the spring between ej and ei (note
that ei and ej are connected by a relationship type or an isA link), g(ej , ei ) is
the electrical repulsion exerted on ei by the entity type ej , and h(ej , ei ) is the
rotational force exerted on ei by the entity type ej (here ei and ej are connected
by an isA link).
Figure 5 gives some indicative examples that explain the role of the forces f ,
g and h. Specifically, figure (a) justifies the spring force, figure (b) justifies the
electrical repulsion and shows that high electrical repulsion (high K e ) results in
symmetrical drawings, and figure (c) illustrates how the magnetic field can be
used in order to obtain the classical top-down drawings for isA hierarchies.
no force
with f
(a)
with f
with f and g (low Ke)
no force
with f and g (high Ke)
with f and g (high Ke)
(b)
with f and g (high Ke) and h
(c)
Fig. 5. Forces and ER Drawings
The spring force f (ej , ei ) follows Hooke’s law, i.e. it is proportional to the
difference between the distance between ej and ei and the zero-energy length of
7
the spring. Let d(p, p0 ) denote the Euclidean distance between two points p and
p0 and let pi = (xi , yi ) denote the position of an entity type ei . The x component
of the force f (ei ) is given by:
X
xj − xi
s
fx (ei ) =
Ki,j
(d(pi , pj ) − Li,j )
d(pi , pj )
ej ∈conn(ei )
where Li,j denotes the natural (zero energy) length of the spring between ei and
ej . This means that if d(pi , pj ) = Li,j then no force is exerted by the spring
s
denotes the stiffness of the spring between ei and
between ei and ej . Now Ki,j
s
ej . The larger the value of Ki,j
, the more tendency for the distance d(pi , pj ) to
be close to Li,j . The y component of the force f (ei ) is defined analogously.
The electrical force g(ej , ei ) follows an inverse square law. The x component
of the force g(ei ) is given by:
gx (ei ) =
X
ej ∈E,ej 6=ei
e
Ki,j
xi − xj
d(pi , pj )2 d(pi , pj )
e
Ki,j
where
is used to control the repulsion strength between ei and ej . The y
component of the force g(ei ) is defined analogously.
The magnetic force h(ej , ei ) depends on the angle between the isA spring
(that connects ej and ei ) and the direction of the magnetic field and it induces a
rotational force on that spring. For example, Figure 6 shows an isA link between
ei and ej and the exerted forces on ei and ej due to the magnetic field. The x
and y components of the magnetic force h(ei ) are given by:
X
X
xj − xi
xj − xi
hx (ei ) =
Km
+
Km
Li,j
Li,j
ej ∈connsp (ei )
X
hy (ei ) =
ej ∈connsp (ei )
ej ∈connsb (ei )
Li,j + yj − yi
Km
−
Li,j
X
ej ∈connsb (ei )
Km
Li,j + yi − yj
Li,j
where K m is used to control the strength of the magnetic field.
hy (ej)
hx (ej)
ej
l
l
yj − yi
ei
hy (ei)
hx (ei)
xj− xi
Magnetic Field
Fig. 6. Magnetic forces and isA links
The x and y components of the composed force F (ei ) on an entity type
ei are obtained by summing up, i.e.: Fx (ei ) = fx (ei ) + gx (ei ) + hx (ei ) and
Fy (ei ) = fy (ei ) + gy (ei ) + hy (ei ).
8
As in the link analysis technique, we view an n-ary relationship type as
n(n − 1)/2 springs.
3.2
Force Model B
One weakness of the above model is that the resulting drawings can have several
overlaps. The reason is that: (a) there is no repulsion among relationship types,
and (b) there is no repulsion between entity and relationship types. Figure 7
illustrates this problem. This drove us to introduce a different force model where
each relationship type is viewed as a particle too. Clearly, the resulting electrical
repulsion discourages the creation of overlaps (between entity and relationship
types, or between relationship types themselves). Notice that according to this
view, a relationship type does no longer correspond to one spring. Specifically,
the particle of a relationship type over k entity types, is connected with one
spring with each one of them. The forces on entity types and relationship types
are computed analogously to the force model A.
Without rel−rel repulsion
With rel−rel repulsion
(a)
Without ent−rel. repulsion
With ent−rel repulsion
(b)
Fig. 7. Forces and ER Drawings
3.3
The Drawing Algorithm
We can reach a drawing by an algorithm that simulates the mechanical system.
Such a algorithm would seek for a configuration with locally minimal energy,
i.e. a drawing in which the forces on each node is zero. A variety of numerical
techniques can be used to find an equilibrium configuration, and thus the final
drawing. We have adopted the iterative method based on the method proposed
in [13]. At first the nodes are placed at random positions. At each iteration,
the force on each node is computed and then the node is moved towards the
corresponding direction by a small amount proportional to the magnitude of the
force. This can be continued until convergence, but we can also limit the number
of iterations.
Note that if we would like to find a drawing that corresponds to a state
with globally minimal energy, then we would have to resort to very general optimization methods. For instance, a method based on simulated annealing is
proposed in [11], while an approach based on genetic algorithms is described
in [4]. However the computational complexity of these techniques turns them
9
not very appropriate for interactive design systems. In addition, and according
to the results of the extensive empirical analysis of several force-directed algorithms (including globally minimal energy algorithms) upon plain graphs that
are reported in [3], there is no universal winner and the general approach is to
try several methods and choose the best.
3.4
Experimental Evaluation
We have investigated and evaluated all these issues in the context of the CASE
tool DB-MAIN. The specification of the parameters L, K s , K e and K m is not a
trivial task as these parameters determine in a high degree how the final drawing
will look like. One flexibility of the proposed approach is that we can adjust the
s
e
spring length (Li,j ), spring stiffness(Ki,j
) and electrical repulsion(Ki,j
), in order
to customize the appearance of the drawing according to the semantics of the
ER diagram constructs. For instance, as it is desirable to keep the nodes of an
isA hierarchy close enough and since between any two isA-related entity types
we only have to draw a line (and not any hexagon-enclosed string), we can use
a smaller length for isA-springs than that of relationship-springs. In any case,
the user can change their value at run-time.
Figure 8 shows one drawing obtained by the algorithm using low electrical
repulsion. Although the isA hierarchy is drawn as a top-down drawing and we
have no overlaps, this drawing is not satisfying because a designer would hardly
manually place into the space occupied by an isA hierarchy an entity type that
does not belong to that hierarchy. After we increased the repulsion and the
magnetic field we never faced again such a drawing. The lesson learned is that
high repulsion not only results in symmetrical drawings but its combination
with a strong magnetic field results in clear isA drawings. Another drawing of
a diagram with 4 isA hierarchies that is derived by the algorithm according to
force model A, is shown in Figure 9.
low Ke
high Ke, high Km
Fig. 8. How to obtain clean isA drawings
A more complex case is shown in Figure 10. Figure 10.(a) shows a manually
placed diagram where all subentity types have been placed at the outer part of
the drawing. Figure (b) shows the drawing obtained according to force model
B. Notice that every isA hierarchy now corresponds to a top-down drawing and
that the entire drawing is symmetrical and satisfying.
10
ENTITY_1
R
1-1
0-N
ENTITY
R_1
0-N
ENTITY_3
1-1
ENTITY_4
ENTITY_11
ENTITY_2
0-N
R_2
1-1
ENTITY_5
ENTITY_10
ENTITY_8
ENTITY_6
0-N
R_3
1-1
ENTITY_9
ENTITY_7
Fig. 9. A drawing of a diagram with 4 isA hierarchies according to force model A
ENTITY_5
ENTITY
R_2
1-1
0-N
1-1
R_6
0-N
0-N
1-1
R_10
R
ENTITY_5
ENTITY
1-1
0-N
0-N
1-1
0-N
R_10
R
1-1
R_2
R_6
1-1
1-11-1
0-N0-N
0-N
ENTITY_1
ENTITY_9
ENTITY_4
ENTITY_1
ENTITY_6
ENTITY_4
ENTITY_6
ENTITY_9
R_5
R_3
0-N
1-1
1-1
R_3
0-N
0-N
1-1
1-1
R_5
0-N
1-1
R_9
0-N
ENTITY_2
0-N
R_4
ENTITY_2
R_7
1-1
0-N
R_4
1-1
ENTITY_3
R_7
R_9
ENTITY_3
0-N
1-1
0-N
1-1
ENTITY_8
ENTITY_7
ENTITY_8
1-1
R_8
0-N
1-1
0-N
R_8
ENTITY_7
(a)
(b)
Fig. 10. Drawing of a diagram with several IsA hierarchies
(a): manual drawing where isA links are not vertical. (b): drawing obtained according to force model B
The experimental evaluation showed that the drawings according to force
model A suffer from overlaps, while those according to force model B have a few
(or none) overlaps. The difference between force model A and force model B is
even more evident in dense diagrams. Figure 11 shows the drawings obtained
by these two models when applied on the top-5 (|E| = 5, |R| = 17) diagram of
Figure 3. Again, the second drawing is evidently better. A noteworthy remark
here is that the second diagram is more clear and intuitive than the manually
specified layout that is shown in Figure 3. This indicates that in certain cases
(at least when the diagram is very dense) the automatically-derived drawings
can be better than the manually drawn.
However, we have to note that force model B has two weaknesses comparing
to force model A: (i) it is computational more expensive, and (ii) in the resulting
drawings the tentacles of binary relationship types are in many cases unnecessarily not aligned. This is evident in Figure 12. Although this is not a major
problem it is an issue for further research.
Figure 13 shows the automatic layout obtained for the top-11 diagram (that
was presented in Figure 2). The high relative number of relationships makes
the drawing almost unreadable. This example suggests that we should take into
account the density of a diagram, in order to reach readable and clear drawings.
Roughly, we could handle dense diagrams by considering: (i) larger springs,
(ii) higher electrical repulsion, (iii) less stiff springs. For example, and assuming
force model A, Figure 14.(a) shows the drawing obtained with spring length
11
replacement
Channel
decomposition
Channel
replacedBy
replaces
0-1
0-1
1-1
0-N
1-1
0-N
partOf
0-N
0-N
0-N
cn/pay
0-N
0-N
0-N
0-N
State of Progress
cn/pay
EARLY BIRD
channelCalculation
Notice
supplies
0-N
Operator
0-N
0-N
0-N
0-N
0-N
Max Weight/Channel
Operator
0-N
0-N
0-N
0-N
0-N
Max Weight/Channel
0-N
0-N
0-N
0-N
EARLY BIRD
description
1-1
0-1
0-N
follow-up activity
0-N
POSITIONS/ACTIVITIES/MONTHS0-N
0-N
0-N
0-N
channelCalculation
follow-up
path Calculation
activity
Stock Movement
0-N
destination
0-N
0-N
Country
Customs costs
POSITIONS/ACTIVITIES/MONTHS
Stock Movement
origin
0-N
0-N
Country
accumulated movements
0-N
SectorOfActivity
destination
0-N
1-N
0-N
0-N
1-N
0-N
0-N
accumulated movements
supplies
State of Progress
0-N
path Calculation
Bonus
0-N
describes
0-N
Item
replacedBy
describes
0-1
0-N Bonus
0-N
Item
decomposition
replacement
Notice
0-N
1-1
description
replaces
partOf
0-N
0-1
0-1
origin
0-N
0-N
0-N
0-N
0-N
0-N
country
DefaultCost
0-Nlanguage
0-N
0-N
Customs costs
country language
0-N
SectorOfActivity
DefaultCost
(a)
(b)
Fig. 11. Force model A vs force model B on a diagram with |E| = 5 and |R| = 17
ENTITY_9
R
ENTITY_3
1-1
1-1
0-N
1-1
R_11
R_10
R_15
1-1
R_6
1-1
ENTITY_7
1-1
R_3
0-N
R_6
ENTITY_10
1-1
R_10
R_6
0-N
R_3
0-N
R_19
0-N
R_3
1-1
R_20
R_7
R_14
1-1
0-N
0-N
R_27
0-N
R_20
R_13
0-N
1-1
R_26
0-N
0-N
R_14
R_9
1-1
1-1
1-1
R_4
ENTITY_8
ENTITY_2
1-1
1-1
R_21
0-N
R_7
ENTITY_11
1-1
0-N
0-N
ENTITY_14
1-1
1-1
ENTITY_8
1-1
1-1
R_5
1-1
0-N
R_8
0-N
1-1
ENTITY_2
1-1
1-1
1-1
1-1
1-1
0-N
R_24
0-N
ENTITY_5
0-N
0-N
R_21
1-1
ENTITY_5
0-N
0-N
ENTITY_5
R_25
1-1
0-N
R_8
R_22
1-1
1-1
ENTITY_11
0-N
R_26
0-N
R_5
R_5
ENTITY_1
0-N
ENTITY_14
1-1
ENTITY_2
R_25
R_8
(a)
ENTITY_4
R_4
R_4
R_5
0-N
0-N
0-N
0-N
R_7
1-1
ENTITY_5
R_6
1-1
0-N
R_13
0-N
ENTITY_8
0-N
1-1
1-1
0-N
0-N
0-N
1-1
0-N
1-1
ENTITY_7
ENTITY_1
1-1
R_14
R_13
0-N
R_9
1-1
R_15
1-1
0-N
1-1
1-1
0-N
ENTITY_4
1-1
R_9
1-1
1-1
R_11
R_12
R_28
R_19
R_11
R_15
1-1
1-1
1-1
1-1
R_27
0-N
1-1
ENTITY_2
1-1
0-N
1-1
0-N
1-1
R_7
1-1
R_4
0-N
1-1
0-N
0-N
1-1
R_12
0-N
R_14
0-N
R_18
0-N
R_2
1-1
ENTITY_7
0-N
R_13
1-1
1-1
ENTITY
ENTITY_10
ENTITY
R_2
0-N
0-N
1-1
0-N
ENTITY_4
0-N
ENTITY_8
R_18
0-N
ENTITY_6
1-1
0-N
R_3
1-1
ENTITY_6
1-1
R_10
0-N
ENTITY_1
0-N
1-1
0-N
0-N
R_28
1-1
0-N
ENTITY_7
R_9
0-N
1-1
0-N
0-N
1-1
R_29
R
0-N
1-1
1-1
0-N
1-1
1-1
ENTITY_1
0-N
1-1
0-N
R_12
1-1
0-N
ENTITY_4
0-N
1-1
ENTITY_15
1-1
ENTITY_15
R_15
1-1
0-N
R
0-N
R_2
1-1
1-1
0-N
1-1
R_29
R_1
0-N
R_10
1-1
R_17
0-N
R_1
0-N
R_11
0-N
R_12
0-N
ENTITY_3
R_30
R_17
ENTITY_3
R_30
0-N
1-1
0-N
1-1
0-N
1-1
0-N
0-N
1-1
ENTITY
ENTITY_6
0-N
R_16
R_31
1-1
R_16
R_31
1-1
0-N
ENTITY
ENTITY_6
ENTITY_9
1-1
0-N
1-1
0-N
0-N
1-1
0-N
R_2
ENTITY_16
ENTITY_16
ENTITY_3
0-N
R_1
R
0-N
R_1
1-1
R_22
R_23
1-1
0-N
1-1
0-N
R_8
1-1
1-1
0-N
R_24
R_23
1-1
0-N
ENTITY_13
0-N
ENTITY_13
ENTITY_12
(b)
(c)
Fig. 12. Force model A vs force model B
ENTITY_12
(d)
(a): force model A. (b): force model B. (c): force model A. (d): force
model B.
FOURNISSEUR
0-N
remise fin année fournisseur
1-1
date de fin de validité
fournisseur/langue
achfrn/lan
1-N
0-N
0-N
1-1
date début de validité
REFERENCE ARTICLE 0-N
LANGUE
0-N
STAT FOURNISSEUR
artvpc/frn
MENSUELLES0-N
COLIS
0-N
1-1
DATE
0-N
1-N
0-1du
pays
fournisseur/pays
defournisseur
la 0-N
devise 0-N
0-N
1-1
Date de pays
depot
1-1
LIVRAISON
cn/suicol
COLIS
0-N0-N
0-N1-N artref/artvpc
0-N
1-1
Document
0-N1-1
0-N 1-N
0-1par langue
SUIVI
réexpédition
colcom/suicol
LIVRAISON
col/eta colis
COLIS0-N 0-N
LOT
POS-DTE
description
décrire
simulation
Pays
artref/douref
artref/tva
d'origine
prix de vente
1-N
SUIVRE LE
DOSSIER
CANAL
0-N
article
ensemble
SUIVI
PAYS
DES
Composition
PAR
TOPS
LANGUE
PAYS/LANGUE
1-1CLIENT
0-N FORMULE
0-Nvpcremplace
EARLY
BIRD
1-1
Cadeau
0-N
0-1
0-Nassocié
dte/pay
0-N
0-N
0-1
HISTORIQUE
cd/suicolCOLIS0-N
0-N
artref/atvsec
1-1
SUIVI COMMANDES
0-N 1-N0-N
0-N
ARTICLE VPC
0-N cn/pay
0-N 0-N(LIGNE
0-N
pondération
LANGUE
ETAT D'AVANCEMENT
OU
COPAYS ENSEIGNE
pays d'origine
0-N 0-N0-N
0-N
article est
vpcremplace
pièce
ALIMENTER
Canal
Canal
commande
livraison
0-N 0-N
0-N
0-N
0-N
1-N
0-N
0-N
SUIVI
Poids
OPERATRICE
max/canal
0-N
0-N
0-1
1-1
SUIVI
calcul
calcul
ACTIVITE
goulotte
canal
0-N
cd/eta
0-N
0-N
0-N
PAYSde stock
1-1
POSITIONS/ACTIVITE/MOIS
Mouvement
0-N
0-N0-N
mesure
0-N
0-1 cumul mouvement systeme
pays destination
1-N0-N
0-N
0-NDEFAUT
0-NFRAIS PAR
COMMANDE1-1
0-N
0-N
atvsec/cd 0-N
SECTEUR ACTIVITE
Fig. 13. Dense diagram drawing
12
Notice
décomposition
Remplacement
d'un ensemble
FRAIS DE DOUANE
0
L0 = 5L, Figure 14.(b) shows the drawing obtained with K e = 100K e , and
0
Figure 14.(c) shows the drawing obtained with K s = K s /10000. Indeed, all are
better than the original drawing shown in Figure 13. Another simple method that
is both effective and efficient is to scale up the entire drawing (i.e. multiply each
coordinate by a constant c > 1). Nevertheless, an issue that is worth further
research is to investigate the effectiveness of local-density adaptations, e.g. to
adapt the spring lengths according to the local density of the graph. EntityRank
and BEntityRank scores could be exploited for this purpose.
FOURNISSEUR
0-N
remise fin année fournisseur
REFERENCE ARTICLE
1-N
0-N
FOURNISSEUR
1-N
1-1artvpc/frnMENSUELLES
STAT FOURNISSEUR
date de fin de validité
0-N
1-1
date début de validité
0-N
COLIS
article vpcremplace
ensemble
0-N 0-1
fournisseur/langue
achfrn/lan
ARTICLE VPC
1-1
1-1
POS-DTE
0-N
0-N
1-1
LIVRAISON
cn/suicol
COLIS
1-1
artref/artvpc
1-1
Cadeau associé
0-N
0-N
0-N
SUIVI
réexpédition
colcom/suicol
LIVRAISON
col/eta colis
COLIS
CANAL
REFERENCE ARTICLE
0-N
Date de depot
0-N1-1
SUIVRE LE DOSSIER
0-N
0-N
0-N
1-N
0-1
0-N
EARLY BIRD
0-N
LANGUE
POSITIONS/ACTIVITE/MOIS
Mouvement
0-N de stock
0-N
0-N
0-N0-N
pondération
ALIMENTER
0-N
FORMULE
SUIVIPAYS
DES
Composition
PAR
TOPS
LANGUE
PAYS/LANGUE
CLIENT
0-N
0-N
SUIVI
Poids
OPERATRICE
max/canal
artref/atvsec
0-N
SUIVI
calcul
calcul
ACTIVITE
goulotte
canal
cd/eta
1-1
0-N
LOT
0-N
cn/pay
simulation
Pays
artref/douref
artref/tva
d'origine
prix de
vente
0-N
Canal
Canalcommande
livraison
0-N
1-N
0-N
1-1
description
décrire
0-1
0-N
0-N
0-N
dte/pay
0-N
1-N
ETAT D'AVANCEMENT (LIGNE OU CO
0-N
SUIVI COMMANDES
0-N
Document
par langue
0-N
1-N
0-N
1-1
0-N
HISTORIQUE
cd/suicolCOLIS
1-N
0-N
pays d'origine
0-N
PAYS
pays destination
0-N
0-N
PAYS ENSEIGNE LANGUE
0-N
0-N
0-N
1-1
cumul mouvement systeme mesure
0-1
0-N
0-N
0-N
1-N
0-N0-N
COMMANDE
0-N
0-N
FRAIS DE DOUANE
0-N
0-N
FRAIS PAR DEFAUT
0-N
1-1
atvsec/cd
0-N
pays
pays
fournisseur/pays
dudefournisseur
la devise
DATE
0-N
0-N
1-1
0-N
décomposition
Remplacement
d'un ensemble
articleest
vpcremplace
pièce
0-N 0-1
0-N
artref/atvsec
ETAT D'AVANCEMENT (LIGNE OU CO
0-N
0-N
Notice
simulation
Pays
artref/douref
artref/tva
d'origine
prix de vente
0-N
0-Ncumul mouvement systeme mesure
DATE
0-N
0-N
1-N
0-N
ALIMENTER
pondération
0-N
0-N
0-N
date de fin de validité
artref/artvpc 0-N
0-N
0-N
dte/pay
SUIVI COMMANDES
Date
de depot
SUIVI
réexpédition
colcom/suicol
LIVRAISON
col/eta colis
COLIS
date début de validité
0-N
1-1
0-N
0-N
cd/eta
0-N pays d'origine
SECTEUR ACTIVITE
0-N
0-N
FRAIS PAR DEFAUT
0-N PAYS
DE
DOUANE
0-N
0-N FRAIS
pays
paysdu
fournisseur/pays
defournisseur
la devise
1-1
1-1
1-1
FOURNISSEUR0-Nremise fin année fournisseur
0-N
0-N destination
0-N
1-1
0-N
pays
0-N LOT 0-N
0-N
0-N
1-1
POS-DTE
1-N
0-N
0-N
0-1
1-1
atvsec/cd
1-N
0-N
PAYS ENSEIGNE
LANGUE
1-N
0-N
1-1
0-N
COLIS 1-1
0-N FORMULE
POSITIONS/ACTIVITE/MOIS
Mouvement de stock
SUIVIPAYS
DES
Composition
PAR
TOPS
LANGUE
PAYS/LANGUE
CLIENT
0-N
1-1
1-1
fournisseur/langue
achfrn/lan
SUIVI
calcul
calcul
ACTIVITE
goulotte
canal
0-N artvpc/frnMENSUELLES
STAT FOURNISSEUR
HISTORIQUE
cd/suicolCOLIS
0-N
0-N1-N
0-N
COMMANDE
0-N
SUIVI
Poids
OPERATRICE
max/canal
0-N
1-1
0-N
cn/pay
1-N
0-1
EARLY
BIRD
article
vpcremplace
ensemble
0-N
1-1
Cadeau
associé
1-1 LANGUE
0-N
Notice
Document par langue 0-N
0-N 0-N 0-1
LIVRAISON
cn/suicol
COLIS
0-N
ARTICLE VPC1-1décomposition
Remplacement
d'undescription
ensemble
décrire
articleest
vpcremplace
pièce
Canal
Canalcommande
livraison
0-N
0-1
0-N
1-1
0-N 0-1
0-N
0-N
0-N
0-N
remise fin année fournisseur
0-N
LANGUE
1-N
0-N
0-N
Notice
DATE
1-1
0-N
1-1
0-N
1-1
SUIVI
réexpédition
colcom/suicol
LIVRAISON
col/eta colis
COLIS
SUIVRE LE DOSSIER
0-N
cd/eta
1-1
0-1
artref/artvpc
LOT
POS-DTE
pays
paysdudefournisseur
fournisseur/pays
la devise
1-N
dte/pay
CANAL
Document par langue
1-1
description
décrire
0-N
0-1
0-N
1-N
0-1
FORMULE
SUIVIPAYS
DES
Composition
PAR
TOPS
LANGUE
PAYS/LANGUE
CLIENT
0-N
1-1
1-1
0-N
EARLY BIRD
simulation
Pays
artref/douref
artref/tva
d'origine
prix de vente
SUIVI COMMANDES
HISTORIQUE
cd/suicolCOLIS 0-N
0-N
1-N
0-N
STAT FOURNISSEUR
artvpc/frnMENSUELLES
0-N
0-N
0-N
ETAT D'AVANCEMENT (LIGNE OU CO
0-N
0-N
0-N
1-N
0-N
0-N
Date de depot
0-N1-1
0-N
1-N
0-1
0-N
LIVRAISON
cn/suicol
COLIS
0-N 0-N
1-1
0-N
fournisseur/langue
achfrn/lan
date début de validité
REFERENCE ARTICLE
0-N
COLIS
SUIVRE LE DOSSIER
0-N
1-1
date de fin de validité
0-N
0-N
1-N
0-1
0-N
1-N
0-1
0-N
Cadeau associé
article vpcremplace
ensemble
0-N
1-1
0-N
0-N0-1
0-N
ARTICLE VPC
décomposition
Remplacement
d'un ensemble
0-N
0-N
articleest
vpc
remplace
pièce
0-N
1-N
0-N
0-N
0-N0-1
0-N
cn/pay
ALIMENTER
0-N
Canal
Canalcommande
livraison
0-Nartref/atvsec
0-N
0-N
pondération
PAYS ENSEIGNE LANGUE
pays d'origine
0-N
0-N
0-N
PAYS
FRAIS DE DOUANE
pays destination
1-1
0-N
cumul mouvement systeme mesure
SUIVI
calcul
calcul
ACTIVITE
goulotte
canal
COMMANDE
SUIVI
Poids
OPERATRICE
max/canal
0-N
0-N
POSITIONS/ACTIVITE/MOIS
Mouvement de stock
0-N
1-1
0-N
0-N
0-N
FRAIS PAR DEFAUT
0-N
atvsec/cd
1-N
0-N
0-N
0-N
0-N
0-N
0-N
0-N
SECTEUR ACTIVITE
(a)
CANAL
SECTEUR ACTIVITE
(b)
(c)
Fig. 14. (a): larger springs; (b): higher repulsion; (c) less stiff springs
As a final remark note that the above drawing techniques can be applied
for drawing the structural part of ontologies expressed in RDFS [5] and OWL
[12]. The only difference is that RDFS supports property specialization which
however will be handled correctly due to the magnetic field that is applied on
specialization/generalization links (also indicated by Figure 9).
4
Conclusion
We described a novel method for identifying the major elements of an ER diagram that is based on link analysis. This method can significantly aid (a) the
understanding, (b) the visualization, and (c) the drawing of very large schemas.
The proposed technique can elevate automatically the major elements and allows
exploring the schema gradually: from the more important elements to the less.
Consequently, it can be very useful in reverse engineering and in information integration. Moreover, the scores can be exploited for ordering the schema elements
that match a keyword query of the user. In addition, and given the inability
to produce automatically aesthetically satisfying layouts for large schemas, the
small (top-k graphs) that can be derived by this technique can be visualized
effectively and this is very useful during communication (e.g. between designers
and application programmers or in requirements engineering and training). For
this purpose we investigated a force-directed drawing algorithm and evaluated
two different force models upon several conceptual schemas of real applications.
For small and medium sized diagrams the results were satisfying in most of the
cases. In the rest cases, human intervention (moving, nailing) and rerun of the
drawing algorithm could rectify the problems.
13
Acknowledgements
The first author wants to thank Tonia Dellaporta for the several fruitful and really
enjoyable discussions on this issue. Also many thanks to Jean-Rock Maurisse, AnneFrance Brogneaux and Jean Herald for their help on using the DB-MAIN toolkit.
References
1. Jacky Akoka and Isabelle Comyn-Wattiau. “Entity-Relationship and ObjectOriented Model Automatic Clustering”. Data and Knowledge Engineering,
20(2):87–117, 1996.
2. Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis Tollis. “Graph
drawing: algorithms for the visualization of graphs”. Prentice Hall Englewood Cliffs
(N.J.), 1999. ISBN/ISSN : 0-13-301615-3.
3. F. J. Braedenburg, M. Himsolt, and C. Rohrer. “An Experimental Comparison of
Force-Direceted and Randomized Graph Drawing Algorithms”. In Procs of Graph
Drawing, GD’95, pages 76–87, 1996.
4. J. Branke, F. Bucher, and H. Schmeck. “Using Genetic Algorithms for Drawing
Undirected Graphs”. In Procs of the 3rd Nordic Workshop on Genetic Algorithms
and Their Applications, 3NWGA, pages 193–2005, 1997.
5. Dan Brickley and R. V. Guha.
“Resource Description Framework
(RDF) Schema specification: Proposed Recommendation, W3C”, March 1999.
http://www.w3.org/TR/1999/PR-rdf-schema-19990303.
6. Sergey Brin and Lawrence Page. “The Anatomy of a Large-scale Hypertextual
Web Search Engine”. In Proceedings of the 7th International WWW Conference,
Brisbane, Australia, April 1998.
7. L. J. Campbell, Terry A. Halpin, and Henderik Alex Proper. “Conceptual Schemas
with Abstractions: Making Flat Conceptual Schemas More Comprehensible”. Data
and Knowledge Engineering, 20(1):39–85, 1996.
8. Rodolfo Castello, Rym Mili, and I. Tollis. “A Framework for the Static and Interactive Visualization of Statecharts”. Journal of Graph Algorithms and Applications,
6(3):313–351, 2002.
9. P. Chen. “The Entity-Relationship Model - Toward a Unified View of Data”. ACM
Transactions on Database Systems, 1(1):9–36, March 1976.
10. Richard Cole. “Automatic Layout of Concept Lattices using Force Directed Placement and Genetic Algorithms”. In Proc. of the 23th Australiasian Computer Science Conference, pages 47–53. Australian Computer Science Communications 1,
IEEE Computer Society, 2000.
11. R. Davidson and D. Harel. “Drawing Graphics Nicely Using Simulated Annealing”.
ACM Trans. Graph., 15, 1996.
12. M. Dean, D. Connolly, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, and L.A. Stein. “OWL Web Ontology Language 1.0
Reference”, 2002. (http://www.w3c.org/TR/owl-ref).
13. P. Eades. “A Heuristic for Graph Drawing”. Congressus Numerantium, 42, 1984.
14. Holger Eichelberger and Jurgen Wolff von Gudenberg. “UML Class Diagrams State of the Art in Layout Techniques”. In Proceeding of Vissoft 2003, International
Workshop on Visualizing Software for Understanding and Analysis, pages 30–34,
2003.
15. P. Feldman and D. Miller. “Entity Model Clustering: Structuring a Data Model
by Abstraction”. The Computer Journal, 29(4):348–360, 1986.
14
16. T. Fruchterman and E. Reingold. “Graph Drawing by Force-directed Placement”.
Software - Practice and Experience, 21(11):1129–1164, 1991.
17. F.U.N.D.P. “DB-MAIN”. (http://www.info.fundp.ac.be/∼dbm/).
18. Munish Gandhi, EdwardL Robertson, and Dirk Van Gucht. “Levelled Entity Relationship Model”. In Procs of the 13rd Intern. Conf. on the Entity Relationship
Approach, ER’94, pages 420–436, Manchester, U.K., December 1994.
19. Floris Geerts, Heikki Mannila, and Evimaria Terzi. “Relational Link-based ranking”. In Procs of the 30th Intern. Conference on Verly Large Data Bases,
VLDB’2004, Toronto, Canada, August 2004.
20. Zoltan Gyongyi, Hector Garcia-Molina, and Jan Pedersen. “Combating Web Spam
with TrustRank”. In Procs of the 30th Intern. Conference on Verly Large Data
Bases, VLDB’2004, Toronto, Canada, August 2004.
21. Jean-Luc Hainaut. “Transformation-based Database Engineering”. In Transformation of Knowledge, Information and Data: Theory and Applications. IDEA Group
Pub., 2004.
22. Jouni Huotari, Kalle Lyytinen, and Marketta Niemela. “Improving Graphical Information System Model Use with Elision and Connecting Lines”. ACM Transactions
on Computer-Human Interaction, 10(4), 2003.
23. Yannis E. Ioannidis, Miron Livny, Jian Bao, and Eben M. Haber. “User-Oriented
Visual Layout at Multiple Granularities”. In Proc. of the 3rd International Workshop on Advanced Visual Interfaces, pages 184–193, Gubbio, Italy, May 1996.
24. T. Kamada. ”On Visualization of Abstract Objects and Relations”. PhD thesis,
Dept. of Information Science, Univ. of Tokyo, Dec 1988.
25. Jon Kleinberg. “Authoritative Sources in a Hyperlinked Environment”. In Proceedings of 9th ACM-SIAM Symposium on Discrete Algorithms, San Francisco, USA,
1998.
26. Simon Lok and Steven Feiner. “A Survey of Automated Layout Techniques for
Information Presentations”. In Procs of the 1st. Int. Symp. on Smart Graphics,
Hawthorne, NY, 2001.
27. Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. Cambridge
University Press, 1995.
28. Arthur Ouwerkerk and Heiner Stuckenschmidt. “Visualizing RDF Data for P2P
Information Sharing”. In Procs of the workshop on Visualizing Information in
Knowledge Engineering, VIKE’03, Sanibel Island, FL, 2003.
29. Aaron J. Quigley. “Large Scale Relational Information Visualization, Clustering,
and Abstraction”. PhD thesis, University of Newcastle, Australia, August 2001.
(http://www.it.usyd.edu.au/∼aquigley/thesis/aquigley-thesis-mar-02.pdf).
30. O. Rauh and E. Stickel. ”Entity Tree Clustering: A Method for Simplifying ER
Design”. In Procs of the 11th Int. Conf. on Entity-Relationship Approach, ER’92.
31. K. Sugiyama and K. Misue. “A Simple and Unified Method for Drawing Graphs:
Magnetic-Spring Algorithm”. In Procs of Graph Drawing Conference, GD’94, pages
364–375, 1994.
32. K. Sugiyama and K. Misue. “Graph Drawing by Magnetic-Spring Model”. Journal
on Visual Lang. Comput., 6(3), 1995.
33. R. Tamassia, C. Batini, and M. Talamo. “An algorithm for automatic layout
of entity-relationship diagrams”. In Procs of the 3rd International Conference
on Entity-relationship approach to software engineering, pages 421–439. Elsevier
North-Holland, Inc., 1983.
34. T. J. Teory, W. Guangping, D. L. Bolton, and J. A. Koenig. “ER Model Clustering as an Aid for User Communication and Documentation in Database Design”.
Communications of the ACM, 32(8):975–987, 1989.
15
35. Yannis Tzitzikas and Jean-Luc Hainaut. ”Ranking the Elements of Conceptual
Diagrams”, 2005. (submitted for publication).
A
Linear Algebra Version of (B)EntityRank
Let A be the generalized adjacency matrix of an ER diagram where A[ei , ej ] equals
the number of transitions from ei to ej . Now the probability transition matrix M is
obtained by normalizing each row of A to sum to 1. EntityRank is based on a Markov
chain on the entity types with transition matrix
q · U + (1 − q) · M
where U is the transition matrix of uniform transition probabilities i.e. U[ei , ej ] = 1/N
for all i, j. The vector of the EntityRank scores, denoted by Sc, is then defined to
be the stationary distribution of this Markov chain. Equivalently, Sc is the principal
right eigenvector of the transition matrix (q · U + (1 − q) · M)T , since by definition
the stationary distribution satisfies (q · U + (1 − q) · M)T Sc = Sc. On the other hand,
BEntityRank (the biased version of EntityRank) is based on the transition matrix:
q · B + (1 − q) · M
i )|
where Attr denotes the set of all attributes of all entity
where B[ej , ei ] = |attrs(e
|Attr|
types (i.e. Attr = ∪{ attrs(e) | e ∈ E}).
As another remark note that in an undirected (strongly connected and non-bipartite)
graph G = (V, R), the stationary probability of a node u is given by P (u) = deg(u)
2|R|
where deg(u) is the degree of u [27]. This means that in an undirected graph (or multigraph) the stationary probabilities can be computed very efficiently and without the
need of an iterative algorithm. In our case we cannot employ the above method due to
the ”teleporting” transitions which are indispensable in our case for ensuring that the
transition graph is strongly connected (note that large ER diagrams are not always
connected). Specifically, the ”teleporting” transitions of BEntityRank are not symmetric and this cannot be captured by an undirected graph. For instance, consider the case
of an ER diagram consisting of two entity types e1 and e2 and one relationship type
between them, where e1 has one attribute and e2 has two attributes. According to a
random walk on the undirected graph both entity types have probability 1/2. According to BEntityRank if q = 0 then P (e1 ) = P (e2 ) = 1/2, if q = 1 then P (e1 ) = 1/3 and
P (e2 ) = 2/3, and if q = 0.5 then P (e1 ) = 0.44 and P (e2 ) = 0.55.
16