Proceedings of 24th Int. Conf. on Conceptual Modeling, ER’2005 Klagenfurt, Austria, Oct. 2005 How to Tame a Very Large ER Diagram (using Link Analysis and Force-Directed Drawing Algorithms) Yannis Tzitzikas1 and Jean-Luc Hainaut2 2 1 University of Crete and FORTH-ICS, Heraklion, Greece Institut d’Informatique, University of Namur (F.U.N.D.P.), Belgium email: [email protected], [email protected] Abstract. Understanding a large schema without the assistance of persons already familiar with it (and its associated applications), is a hard and very time consuming task that occurs very frequently in reverse engineering and in information integration. In this paper we describe a novel method that can aid the understanding and the visualization of very large ER diagrams that is inspired by the link analysis techniques that are used in Web Searching. Specifically, this method takes as input an ER diagram and returns a smaller (top-k) diagram that consists of the major entity and relationship types of the initial diagram. Concerning the drawing of the resulting top-k graphs in the 2D space, we propose a force-directed placement algorithm especially adapted for ER diagrams. Specifically, we describe and analyze experimentally two different force models and various configurations. The experimental evaluation on large diagrams of real world applications proved the effectiveness of this technique. 1 Prologue It has been recognized long ago that the usefulness of conceptual diagrams (e.g. ER/UML diagrams) degrades rapidly as they grow in size. Understanding a large schema without the assistance of persons already familiar with it, is often a nightmare. Unfortunately, large conceptual schemas are becoming more and more frequent. The integration of information systems, the development or reverse engineering of large systems, the usage of ERP (the SAP database includes 30.000 tables) and the development of the Semantic Web (structured into ontologies potentially including dozens of thousands of classes) naturally lead to the building of very large schemas. Although a good drawing of a conceptual schema could aid its understanding, and several approaches for automatic placement have been already proposed (e.g. see [33, 23, 8, 28, 10]), it is a widely accepted opinion that the automatic layout facilities offered by current UMLbased CASE tools are not satisfactory even for very small diagrams (for more 1 see [14])1 . Consequently, the vast majority of layouts created today are done ”by hand”; a human designer makes most, if not all, of the decisions about the position of the objects to be presented [26]. The visualization and drawing of large conceptual graphs is even less explored. The classical hierarchical decomposition techniques that are used for visualizing large plain graphs (for a survey see Chapter 3 of [29]), have not been applied or tested on conceptual graphs. Consequently, only manual collapsing mechanisms (like those described in [22]) are currently available for decreasing the visual clutter and for aiding the understanding of big conceptual graphs. In addition, the techniques that have have been proposed for reducing the size of a conceptual graph (in order to aid its comprehension), specifically ER clustering, either require human input [15, 34, 18, 7], or they are automated but not tested on large conceptual schemas [30, 1]. We decided to devise an automatic technique for identifying the major entity and relationship types of a very large conceptual graph as a means for facilitating its understanding. As Link Analysis has been proved very successful in Web Searching [6, 25] and recently in several other application domains [19, 20], we decided to design a similar in spirit technique for one very common and important kind of conceptual schemas, namely Entity-Relationship (ER) diagrams [9]. Concerning the drawing in the 2D space of the resulting top-k graphs, we describe a force-directed placement algorithm especially adapted for ER diagrams. Specifically, we describe and analyze experimentally two different force models and various configurations. Both of the techniques that are presented in this paper can be applied not only to ER diagrams, but also on other kinds of conceptual graphs. The Semantic Web is one interesting application area because it is founded on ontologies (potentially including dozens of thousands of classes) that are exchanged in a layout-missing format. In this context, the provision of top-k diagrams and automatic layout services is very important as they can aid understanding that is very important for accomplishing tasks like semantic annotation, creation of ontology mappings, ontology specialization, etc. This paper is structured as follows: Section 2 describes Link Analysis for ER diagrams, Section 3 introduces force-directed placement algorithms for ER diagrams and finally, Section 4 concludes this paper. 2 Link Analysis for ER Diagrams Our objective here is to identify the major entity and relationship types of a very large ER diagram in order to facilitate its understanding. We designed a PageRank [6] style scoring method because PageRank is described in terms on the entire Web, while HITS [25] is mainly applied on small collections of pages (say those retrieved in response to a query). We view an ER diagram as a triple (E, R, I) where E = {e1 , ..., eN } denotes the entity types, R = {r1 , ..., rm } denotes the relationship types, and I the isA relationships over E (i.e. I ⊆ 1 General graph drawings algorithms (e.g. see [2]) usually make some assumptions that are not always valid in conceptual graphs. 2 E × E). For any given e ∈ E, we shall use connR (e) to denote those entity types that are connected with e through relationship types, connsb (e) denote the direct subtypes of e, and connsp (e) the direct supertypes of e. We shall also use the following shorthands: connI (e) = connsb (e) ∪ connsp (e) and conn(e) = connR (e) ∪ connI (e). Since two entity types may be connected with more than one relationship types we consider connR (e) as a bag for being able to record duplicates. In addition, we shall use attrs(e) to denote the attributes of an entity type e. Now the score (or EntityRank) of an entity type e in E, denoted Sc(e), can be defined as follows: X Sc(e) = q/N + (1 − q) ∗ e0 ∈connR (e) Sc(e0 ) |connR (e0 )| (1) where q stands for a constant less than 1 (e.g. 0.15 as in the case of Google)2 . One can easily see that the above formula simulates a random walk in the schema. Under this view each relationship type is viewed as a bidirectional transition and the probability of randomly jumping to an entity type is the same for all entity types (i.e. q/N ). The resulting scores of the entity types correspond to the stationary probabilities of the Markov chain. A rising question here is how we can incorporate n-ary (n > 2) relationship types into the aforementioned model. This can be achieved by replacing each nary relationship type (n > 2) over n entity types e1 , ..., en by n(n − 1)/2 binary relationship types that form a complete graph3 over e1 , ..., en . Consequently, an n-ary relationship type is viewed as n(n − 1)/2 binary relationships. An alternative approach is to assume that the probability of jumping to a random entity is not the same for all entities, but it depends on the number of its attributes. In this case we define the score (or BEntityRank) of an entity type e in E as follows: Sc(e) = q |attrs(e)| + (1 − q) · |Attr| X e0 ∈connR (e) Sc(e0 ) |connR (e0 )| (2) where Attr denotes the set of all attributes of all entity types (i.e. Attr = ∪{ attrs(e) | e ∈ E}). This particular formula simulates a user navigating randomly in the schema who jumps to a random entity e with probability q|attrs(e)| |Attr| or follows a random relationship type (on the current entity). The probability |attrs(e)| corresponds to the probability of selecting e by clicking randomly on a |Attr| list that enumerates the attributes of all entity types of the schema. The linear algebra version of EntityRank and BEntityRank is given in Appendix A. 2 3 If we set q = 0.15 or below then an iterative method for computing the scores (e.g. the Jacobi method) requires at most 100 iterations to convergence. A complete graph is a graph in which each pair of graph vertices is connected by an edge. 3 Let’s now discuss the differences between link analysis for ER diagrams and link analysis for the Web. Firstly, Web links are directed, while relationship types are not directed thus the latter are considered as bidirectional transitions. Secondly, we do not collapse all relationships types between two entity types into one (as it is done with Web links), and this is the reason why we consider connR as a bag. Thirdly, in ER diagrams we should count ”self hyperlinks” (i.e. cyclic relationship types), although the Web techniques ignore them. For example, consider a schema consisting of two entity types {Person, City} and two relationship types {Person lives City, Person f atherOf Person}. If we ignore the relationship type f atherOf , then both Person and City would be equally scored, a not so good choice. At last, although in the Web link analysis is exploited mainly for ranking the results of retrieval queries, in our case we don’t need just a ranked list of entity types, but rather another diagram that consists of the major entity and relationship types. We have implemented and evaluated the above scoring schemes into the DB − MAIN CASE tool (for more see [17, 21]). The designer provides a threshold per between 0 and 100. Subsequently, all entity types with score lower than per% ∗ ScM ax, where ScM ax denotes the highest score, disappear. Controlling the visibility of entity types according to their score, and not according to their rank, is preferred as it better handles ties. Concerning relationship types, only those that connect the visible entity types are displayed. The computation of the scores takes only some seconds on a conventional PC. Specifically, to compute the scores we use the Jacobi iterative algorithm. We have noticed that 50 iterations give quite stable orderings and their application on schemas with 1000 entity and relationship types takes less than 2 seconds in a conventional PC. colvatmsl/c olva tmsldet fia /ori 1-1 0-N RETOUR FOURNISSEUR Numéro de réparation Quantité re tournée c hez le fou Date de cré ation ér pa ration Date d'envoi aufournisse ur PRIX:REFERENCE/CATALOGUE page -seconda rie 1 page -seconda rie 2 page -seconda rie 3 c ode tva tauxde tva francais sur face type d'offre type de prix PRIX:POSITION PRIX: REFERENCE tauxde pondera tion c oloris page livre bla nc prxre f/1-1 prxre fcat ireference ta lile lbe0-N le refe renc e c ommenta rie ref mere libelle coloris 1-N top1-1 a libellé taille topb prxre f/prxpos libellé position topc Code positionmere topd DUCATALOGUE 0-NatPRIX:CATALOGUE-RAYON PRIX:P OSITIONS refc 0-N 1-1 prxraycat/prx libelle rayo n 1-1oscat scann t/perxrefcat 1-1 prxp prixos/p derxp revi etop ntprx preepo visio lf se cteur EXCEPTION( multi)p 1-1rése ntation prix de ve ntetop frafnca si groupe chiffre ca talogue antiteqreuealise 1e/p Nays ve nte prix de ve ntequ theori CA TTCré alisé antictheolreoqaliiqsueefra nce topmultiple prix de ve ntequ psy CA hors taxe ér alisé hiffre cata ol gue prixrevient previsionnelfra pri n x de ve ntec ret oscat/prxp 0-N 1-1 osexeenu marge brute hors taxe ré alisé e prixde vente francais prxp prxpgec at/prxrefcat quantité réa ilsée France bf-prévisionnel calculé quantité prévue fra nce quantité réa ilsée /pays vente bf-réalisé codifica tion quantité prévue Fra nce c a horsta xe prévisionnel flagmodif position 0-N 1-1 marge brute hors taxe prévisio flagblocage des prix multiplic ateur ff prévisionne l type de prix codification PRIX:PAGESDUCATALOGUE page FAMILLE PILE PILE/MARQUE 1-1 libe le de la page famille Be bat Code marque CA TTC réalisé libellé fa mille BebCod a t e pile dans la marque CA hors taxe réalisé Marge brute hors taxe réalisée 1-N 1-1 CA hors prx taxe préc at/p visionnel t Marge brute pay hors taxerxrefca pré visio pile, famille pile1-1 Correspo PILE0-Nnd prxpaycat/prxra yca t PRIX:CATALOGUE N¦ Be bat Code catalogue libellé Bebat libelle catalogue Code format pile ac tionpays d'origine Code I.E.C.pile saisonpa ys d'origine Poidsde la pile Zone action 1-N Ha uteur pile Zone de prix Longueur pile durée cata ol gue e nmois PRODUIT FOURNISSEUR 0-N Largeur pile da te da' oj ut OFFRE PRODUIT référence fournisse ur numé rooffre libe lé produit fournisse ur prxcat /prxpaycat%fraisde transpor t date début validité 0-N date finva lidité %droits de douane TYPE D'EMBALLAGE montant minim um à comma nde r %commissiond'agent code type e mballage quantité minimum à c ommander %fraisdivers achat produit libellé date de transmission par fourn unité de mesure libellé de l'offre MODE DECREDIT nombre article par boite type d'offre c ode c rédit quantité minimum à commander désigna tionc rédit montantminimum à c ommander délais de livraison texte bonde commande nombre moisgarantie sur pièce nombre moisde gara ntie sur pr tauxd'importation type de fournisseur quantité /boite fournisseur 0-N tauxinspec tion TYPEADRESSE c ode a dresse descriptionadre sse LIGNE PHOTO nume rode ligne Nombre d'article ssurla photo c ode montage studio topmonta ge fait 1-1 0-N PRIX:CATALOGUE PARPAYS taux de change STATISTIQUESDE CONSOMMATION taux de cession c ode sta tis tiques de consommat TYPEARTICLESCORING pa rtie fixe désigna tionsta tistiques de co taux de cessionldf Code type art/scoring pa rtie fixe LDF Libellé diffusion actionpa ys ve nte saison pays vente Code devise 1-1 prxray/prxref TOPCLIENT Code Top Client Libellé Top clie nt Top 1-1e r/détoper prxray/prxrayca t tra ductioncodes/type emballag OFFREPOSITION ligne offre prix de base prix de l'offre prix da' cha t unité de commande Prixde revient prévisionnel 0-1 type d'offre tauxristourne 1 tauxristourne 2 0-1 tauxristourne 3 taux0-N ristourne 4 tauxristourne 5 frais d'emba lage PRIX:RAYON numéro offre rayon 0-N ilbe le rayon 0-N secteur Ve nte à cr édit group e té e nde vise in valeurmen suali nombre me nsua ilté s valeura compte e ndevise ni ter taegma ximum taegrée l nombre ma xterme s autorisé e ch/photoa prvealeur ndremensualité e nde vise ex valeura compte e ndevise exte r type de top 1-1 1-1 SUIVI DESTOPSSANSCOMMANDE Nombre de Clients 1-1 1-1 suivi bnb 1-1 pays 1-1 REFERENCECONTIENT nombre de piles tva/ec h SERVICE c ode se rvice désigna tionse vr ice Topservice dans prixa rtic el 0-N code fournisseur Nom du of urnisseur Numéro de TVA numérode registre de comme rce Code jour de livraison 1 Topte xte libre fournisseur Délaide ilvraison Va el ur minimum à commander Date de dé but de vaca nce s Date de finde vac anc es type de fournisseur Code jour de livraison 2 Code conditiond'envoi Poids minimum à commander Volume minimum à comma nder Numero de l'a che teur Delaide confirmationbonde c nom fournisseur code oj ur de livra sion délais de livraisonmaximum Code mode de paiement Numéro compta ble dufournisseu Code de vise Toplivr aisonsé paré e Topdema nde sé paré e Topfournisseur pe rso Code mode de dema nde 1-1 e stlivre 0-N 0-N 0-N 0-N 1-1 1-1 0-N mode livraison/pa ys ech pa rdéfaut code tva pa y/tpbsta REFERENCE ARTICLE Numérode ré férence article Libellé franc aisde l'article Code marque article Code article enc ombrant Code article LDF Code tissu Topsanspromotion Code plage d'attr ibutionnumé r Topproduit Type article pourpromotion Fournisseur pe rsonna lisation Code statistique de c onsommati Libellé née rla nda is de l'artic Code de famille Code taille groupée Taille minimum Code mode d'e mploi Numérode l'acheteur Code type d'e mballage Topdesc ription te chnique topfragile Nombre de piece sdans le lot topemballa ge Toptexte pour conce ption Toptexte condition SAAV toptexte de contrôle numéro d'a vis topinstruc tionsaisie cde topdesc riptionducolis topinstruc tionpourle pic kin toptexte unitaire catpopsrix pay/p ay toptexte pour transporteur toptexte pour showroom Toppré montage Refere nce produit decompose toptyp/tpbcli SUIVI DESTOPSCLIENT Nombre de Clients services proposés quantité se rvice Topmodification logistique 1-N 1-1 0-N 0-N date dé but de validité 0-N 0-N da te de finde validité 0-Nfrn/rc nlgn est géré 0-N 1-1 0-N 0-N ej t/tpbc il date de réce ption 0-N DATE date Numéro de sema ni e 1-1 0-N 0-N e ststocké 0-N SAISON Code saison Libe lé de la saison Toptra ite mentpar fac turation 0-N 1-1 CONDITIONNEMENT Code conditionneme nt Libe lé c onditionneme nt Cout unita ire SUIVI COMMANDES Numérode sé que nce suivic omma Etat pour syste me de mesure 1-1 0-N 1-1 0-N 0-N 0-N 1-N 0-N 0-N 0-N habitation/pa ys 1-1 Rés ervé e à cette ac tivité frn/pay 0-N 0-N 0-N 0-N 0-N catpospay/dev 0-N pay/soprue pay/tpbcli 0-N pays de la devise pays/quartier 0-N fournisseur/pays description 0-1 1-1 décrire 0-N 0-N 0-N 0-N c alcul canal Borne supérieur e code postal Borne infé rie ure code postal Notice 0-N SECTEUR ACTIVITE Code secteura ctivité libellé secteurd'a ctivité code enseigne 0-N Unité sbudgé taire Prix de ve nte/pay de texte cible Valeur ca talogue de l'aCo rtide Co dec el(texte stimula tion Valeur hors tva de l'article de texte special Valeur tva de l'a rtic elCo (inter Valeur tva de l'a rtic elCo (edextertexte c alcul Tauxde TVA TopTVAdansprixcata ol gue 0-N 0-N 0-N FRAISPAR DEFAUT Code frais Montantdes fraisde port ni te Montantdes fraisde port exte Ta uxde c hange cn/pay 1-N SUIVI ACTIVITE Code application Code mouve ment Code type mouve ments Cumulc ommande Cumulc olis Cumula rtic el Cumulvaleur ca talogue 0-N 0-N 0-N Ca deaua ssocié 0-N 0-N 0-N LANGUE code langue libe lé c od0e-Nlang ue PAYSENS libe EIGNE le langueLANGUE 0-N Top la ngue pour le c lient 0-N Poids max/c ana l Poids maximum ducolis Poids minim um ducolis 0-N 0-N 1-1 0-1 1-1 cn-cnd 0-N c n/circuit 0-N 0-N cn/c ntyp TYPE PAIEMENT RECEPTION c ode type paieme nt réc eption libellécn/rla type pa ie ment réceptio 0-1 0-1 LOGEMENT c ode logeme nt 0-NType de logement Code bloc age Code état dulogement Code pse udoniveau Code état d'inventaire Da te dernie rinve ntaire Da te inve ntaire - 1 Da te inve ntaire - 2 Da te inve ntaire - 3 Da te inve ntaire - 4 1-1 0-N Toppro tec tionincendie Code type de prélèvement Code groupe de loge ment 0-N e nvexd/lgt 1-N 0-1 1-N 0-1 ENVOI EXCEDENCE Nombre d'articles UNITEBUDGETAIRE Nombre de préleve ments Numérod'unité budgétaire Type d'e nvoi excéde nce Libellé 1-1 code état CONTROLEnMOUVEM ENTSnPAIEMENTS umérode sélectio Numéro contrôle mouveme nt Da te de réc eption1b-1 ande Nombre de record e enve xd/envex dpras sur band Valeur totale paiement interne 0-N t Code traitemen DaATI te deON paiement ENVOISEXEDENCE PREPAR Ident ifica tionba nde paiement Nombres d'articles Nombre de prélèveme nts Use r colis Prixde vente a exc edence Valeur totale paiement externe Type d'envoise xce dence Topa rchivage Numérode sé lection exc édence Valeur paiement anomalie inter Valeur paiement anomalie exte r Code etat TYPEDE CANAL code type de ca nal libe lé dutype de canal[0-1] topc an1-1 a lde c ommande(o/n)[0-1] topc ana lde livraison(o/n)[0-1] topsa isie commandes[0-1] COMMENTAIRE Numé rocommentaire Libellé fra nca is commenta ire Libellé néerlandais comme ntair 1-1 support duloge ment 0-N 0-N TYPECIRCULAIRE Code type circula ire Libellé type circula ire Code ca talogue/cade au Topbond'ac hat Topristourne Topà payer PARTICULARITE MOUVEMENT Type de mouvement Code pa rtic ularité mouvement Date débutvalidité Date f in validité Libellé pa rtic ularité cn/controle goulotte Stat par canal cnNu /csamé rode niveaude statistiqu PROMOTION D'UNE COMMANDE Réponse a uconcours CANAL/FERM NombETURE re de promotions Code é tat Valeur promotioninterne Valeur promotione xterne 0-N 0-N 1-N 0-N 0-N FRAISD'EXPEDITION Poids maximum frais d'expéditi Poids minimum fraise xpédition Va el ur frais de' xpédition 0-1 0-N 1-1 0-N 1-N 1-N 1-1 rs 0-1 concou 0-1 1-1 1-N 0-N c n/cnla n tpd/tpdliv c n/colsta SUPPORT DESTOCKAGE COLISSANSADRESSES CONTROLE CALCUL GOULOTTE GRP -PRM#C PROMOTION c ode c anal Code support de stocka ge Composteur RES ERVATION Libe lé fra nca is dusupport Date de retour topsec teur ac tivité goREGR ulotteOUPEMENT STATISTIQUE code promotio0-N n TYPE CENTREDISPATCHING Longueurdusupport topmode de réceptionCod gouelotte regroupement Texte de RM# la Dé promoti DET-P Ctail onde reser Code type ce ntr e dis patching opéra tric e d'ouverture toppays pourgoulotteLibellé regroupeme nt La rgeur du support Début d0e-N va lidité Pays Hauteur dusupport c ode c analdéviationgoulotte Finde va lidité Code postal CONTROLECIRCUITPoids à vide c ode c analdéviationtourné e TopSans commande pr évu 1-N REVALORISATION Numérode localité Code circuit Actionpré vue Nombre de logements contigCo us mmenta ir e oucolis retrouvé FERMETURE RELAIS Numérode reva lorisation TYPE FRAISD'EXPEDITION Nombre de logements opposés code canal Da te reva lorisation Créate ur commentaire c sa/csade t c nd-cndtyp Libellé circuit Code type fraisd'e xpédition Stock ava nt 1-N Date début ferme ture Indice supplémenta ire 0-N Libellé de frais d'expédition 1-1 Date finfermeture Prixavant LIGNE DECOLISSANSADRESSE Quantité livrée / fournisseur CIRCUIT 1-1 Indic e de la ligne Prixfournisseur Nouveauprix moyen pondé ré code origine 0-N Référence article TYPESTOCK CENTRE DISPATCHING Code type de stock JOUR Code ce ntr e dis patching Libellé type de stock Libelle c entre dispatc hing Libellé dujour 0-1 Donné e e ntr eprise centre dispa TOMBOLACHANCEARTICLE numé rodubillet 0-N 0-N 1-N DISPONIBILITE Code disponibilité Libellé disponibilité cn/pnttyp 1-1 0-1 0-N 0-N revalorise r QUANTITEEN STOCK Stoc kthé orique 0-N calculgoulotte Borne supé rie ure code postal Borne ni férieure code postal code tournée TRANSPORTEURPAR CANAL c crtyp/envcop 0-N 1-1 0-N op/sai 0-N Document par langue 1-1 0-N 0-N 0-N 0-N 0-N 1-1 EARLY BIRD Nombre de cadeaux attribué sto Nombre de cadeaux maxim um tota Nombre de cadeaux maxim um/jour Nombre de jours Numéro dujour en cours Nombre de cadeaux dujour en c 0-N lan/qrt 1-1 0-N 0-N SERVICE PAYS Cout duservic e interne Cout duservic e ext erne Ta uxde c hange Largeur emballé e de l'art ci el 0-N Ha uteu r emballée de l'a rtic le Longueur montée de 'larticle Largeur montée de 'larticle Ha uteur monté e de l'article Code fournisseur LDF Nombre minimum article stock Code frais Code type de remplace ment Topabandonde 'larticle Type de message sur bor derea u Volume article pour transport Code loge ment Prixvente solde com/lan Cod agne suivant ilbe lrepo é durtc ocamp mmentaire Toparticle sur-stock Topmodif prixsolde toparticle composé[0-1] Da te ni itia ilsationstatistiqu 0-N 0-N 0-N 0-N 1-N 0-N 0-N 0-1 FACTURE Numéro de facture Valeurfa cture interne c ode c anal Valeurfa cture e xterne 0-N 0-N CANAL c ode c ana l libellé duca nal[0-1] délaide livraison[0-1] topca nal de commande(o/n)[0-1] topca nal de livraison(o/n)[0-1] Mouve mentde stock topca nal de retour[0-1] Numéro de mouve ment type de mouvementde paiement[0-1] Nombre de pièc es c ode blocage[0-1] Da te dumouvement c ode c ana lde déroutage [0-1] Prixac hat sta nda rd topsaisie des comma ndes[0-1] Prixde vente de ba se nom duresponsa ble duca nal[0-1] Toptexte associé numé rode téléphone ducanal[0-1] c ode c irc uit[0-1] c ode langue [0-1] c ode posta l[0-1] c ode dutype de ma gasin[0-1] Nombre maximum de colis Topscission ducolis Topattributionnolivraison Type de bordereau de livraison Origine dumouve ment PAR Code type centre dispatching 0-N Numérode code postal/localité Code depot Nombre de poste de travail/heu 0-N 0-N 1-1 1-1 prmrsv/sai Cadeau ARTICLE VPC Code article VPC libellé francais article VPC Libellé né erlandaisa rtic le VP Code colorisde 'larticle Dé al ide livra sion 0-N Code édition Code exception Longueur emballé e de l'a tr ci el article vpc pièc e Poidsde 'larticle a rticle vpc ense mble 0-N Code pli article 0-N Prixac hat standard SUIVIOPERATRICE Volume de l'a rtic el Code oper atrice Prixde vente de base Code type de délai de livraiso Unité de ve nte code type de relais Nombre de colis pour 'larticle Code type de suivi opératrice Toptec hnique 1 Cumul commande Toptec hniquositi e 2 ond'un ense mble Cumul ilgne Prixacdécomp hanti at ho rsdan cession le téVP Cumul temps de' ncodage Code qu article C sinensemb terne Cumul vale ur catalogue Code article VPC externe 0-N e stre mplac e Cumul vale ur nette Quantité minim um article piRemp c ki lac ement0-1 quantité ré servé e 24h re mplac e 1-1 Code taille de 'larticle 0-1 1-1 1-1 COMPOSITION D'ENVOI Code compositiond'envoi Libellé compositiond'envoi Commentaire Dernier je t Date dernier jet Topenvoireta rdé Date d'envoiprobable Topcircuit Actionà éditer Valeur bond'achat externe Valeur minim ale commande exter Canal commande Taux de ristourne à édit er 1-1 Canal livraison 0-N 0-N 0-N 0-N 0-N 0-1 Sec teur ac tivite 0-N Mise enlogeme nt Quantité entrée en stock Code état Code article vpc 0-N 0-N cadeaua preleve r Libe lé a tr ci el à c ommuniquer POS-DTE nbre art. disponibles nbre art. suspens nbre art. autre stock encours cde frn lan/tpbc il fournisseur/langue 0-N individu/pays 0-N envcop/jet 0-N 0-N 1-N 0-N 0-N 1-1 1-N STATFOURNISSEUR MENSUELLES Code mois anne e Nombre d'articles réc eptionnés Chiffre d'affa rie e ndevise Chiffre d'affa rie Chiffre d'affa rie prixde revi c d/eta 0-N E 0-N CREDITPAR MONTANT Prixde vente compt ant externe Va le ur de l'ac ompte externe Va le ur de mensua lité externe Nombre de mensua lités Tauxa nnuel effect if global ma Tauxa nnuel globalre elutilis PARAMETRER SIMULATION/SECTEUR Déc alage envoi/r éception 0-N date de demande 0-N dema nde clie nt date de saisie Code é tat top circuit Numé rode c ommande[0-1] Motif réponse subsidiaire Code t ype de saisie 1-1 Co de user de saisie 0-N 1-1 POSITIONS/ACTIVITE/MOIS quantité vendue par mois LOT Affra nchissement unita rie 0-N 0-N c gp/tpbcli 0-N 0-N al n/soprue dte/pay tauxde c hange 0-N 0-N 0-N POS-SEM nombre d'ar itcle par sema ni e liene ch/pos 0-N 1-N pays dufournisseur Suivi des jets pondération tauxre partition tai/clr tauxre partition clr/ref tauxpondéra tionlongterme ex 0-N 0-N ETAT D'AVANCEMENT (LIGNE OU CO type d'éta t[0-1] c ode é tat libellé état[0-1] Libellé état en 3cara ctère s Libellé état en 10c aract ères Nombre de jours de réte ntion Code refus 1-1 0-N 0-N Da te de de pot 1-1 servic1-N es repris Cout service à la clie ntè el in Cout service à la clie ntè el 0-1 1-N 1-N 1-N 1-1 1-1 0-1 1-1 pays de provenanc e 0-N 0-N 0-N 0-N Carte 0-1 sa che t 1-1 lan/tpbsta 1-1 1-1 1-1 pondération taux repa rtitiontai/clr taux repa rtitionclr/ref taux pondé rationlongte rme ex re pre sente 0-N 0-N e ta/etatyp JET Numérode ej t Date de dema nde 1-1 dte/je t 0-N POSITION photoa pre ndre 0-N 0-N 0-N CAT-PRD- SEM nombr e de référe nce c umulée Longterme a ctualisé longterme e xtrapole 1-N ACHETEUR c ode a che teur Nom acheteur groupe achat c ode paysde provenance par de c ode paysde fabricationpar d 10période sde cade nce ment nombre de jourde sécurité numé rode personne tauxde c ouverture Période d'échelonnement[ 10-10] 1-1 PAR défaut Code TVA 0-N 0-N 1-1 0-N 1-N tpbmtf/tpbsta RECEPTION numé roréce ption date réce ption réelle 0-N 0-N 0-N Da te mise enstockde ré ceptio nom dufirme transporte ur 0-N numé roplaque camion heure a rrivée transport heure dépa rttra nsport nombre colis planifié nombre pa el ttes planifié nombre colis réel nombre pa el ttes réel texte réc eption TYPEDOSSIER SAAV Code type dossie rSAAVnumé roemplacement stockte mpo nombre de ligne sde réce ption Libellé dutype dossier0-N SAAV Code état SUIVRE LE DOSSIER Code gestionachat Heure Poidsta xable Texte suivi dos sie rSAVCu bage (en m3) Code type de réc eption rcn/rcnlg c ode marque 1-n1 c ode type a rtic el ré ception LIGNE RECEPTION uméro réception1-N nombre pe rsonnn e ma lig nutneention 1-1 c ode deballe nombre a rtic el sre ceptionnés c nomb re asprtic el sré ceptionnés n Da te arrivée qu ai tran type article livré code état avanceme nt ligne ré c topc ompte renduréc eption date finré ception nom responsable contrôle quali nombre a rtic el sré ceptionthéo semaine texte ilbre de réception année topmarquage à la réce ption pr econditionne ment à al ré c semaine 1-N to code secte ur da' ctivité 0-N FOURNISSEUR 1-1 tpb/tpbsta frnpubptc/mrq c atref/svc Nombre de se rvice Cout duse rvice à la clientèle CAT-PRD-DTE Longte rme actualisé nombre de ré férence cumulée longte rme pré vu ta uxa vancement produit/c atalo ta uxc ouverture ta uxmajoration ta uxre tours produit 0-N 0-N 0-N 0-N dte 2/tpbstadte 0-N 0-N 1-N remise finanné e fournisseur tauxde ris tourne type de ristourne Libe lé ristourne fournisse ur 1-1 1-N pa ei mentsdes mensualités Noséquentiel Type de record Montant à payer (inte rne) TVA Date dupaiement c ode TVA Montant payé (inter ne) répartition/tLib va ellé TVA0-N Libe lé paiement 0-N Taux de répartitio nde laTatvauxde tva Quia liquidé indice comptable de la tva Pa rtdes fraisfinanciers Toptraite mentmise à jour Pa rtde la tva dans la mensual Pa rtà re mbourser (inte rne) Montant à payer (externe) Montant payé (externe) pays de vente Pa rtà re mbourser (externe) 1-1 MARQUE code marque 0-N crédits possibles 0-N 0-N 0-N 0-N est commandé gere 0-N 1-Ndte/tpbstadte dte3/tpbsta dte est commandé1-1 par 1-1 TRANCHECA 1-N CA fournisseur de base FILIALE c ode filiale Libellé filia le TYPEETAT Code type éta t Libellé type éta t top/a ction COMMANDE numerode1-1 commande date livraisonproblable date rende z-vous prixac hat e nFB date de mande quantité demandée prixac hat e ndevise est prese nt date butoir pour lay-out topfiche tec hnique toptissu topbois tauxde ris tourne etat cde numérode el ttre Date initiale de la livraison fac ture / commande 1-N est emballé 1-1 0-N 0-N 1-N 0-N implanté 1-1 dte /tpbsta MODE DELIVRAISON c ode mode de ilvraison libelle mode de livraison 0-N 0-N TYPEDE TOPCLIENT 0-N conc erne date de réception 1-1 0-1 0-N ECHANTILLON CONTIENT nombre de piles 0-N 0-N im planter le sta illes Taille impla ntée 1-1 PRODUIT/CATEGORIE Nombre d'articles Topinformation 0-N ECHANTILLON numéro échantillon libellé née rla nda si libellé franc ais libellé fournisseur Volume da' rtic el pour rt anspor poids nombre de colis hauteurmonté large ur monté longue ur monté hauteure mballé large ur emba lé longue ur emba lé volume e mballé code mode d'emploi top fragile taille top suivi prix achat article 1-1 top conforme top reponse courrie r top ve nda ble top mode emploi conforme top emballa ge conforme unité compléme ntaire prix vente estimé Topatt Code coloride l'article Toptest externe Refere nce c hez le fournisse ur Code fournisseur de l'a rticle al ngue 1-1 0-N 0-N tva par dé faut 1-N 0-N REFERENCEDOUANE Code réfé renc e douane Comme ntaire de la ré férence do Code unité complémenta ire doua Libe lé réfé renc e douaniè re Libe lé franca is réfé renc e dou code c oloris libellé colorise nfrancais libellé colorise nné erlandais TRADUCTION DES CODES code type de texte Valeur de laclé Traduction JOURNALDESVENTES 1-1 0-N BIC Numé rode réfé rence de l'a rtic Code filiale Numé roBIC Libellé duBIC COLORIS 0-N 1-1 prxpa yca t/prxpgec at 0-N 1-N 0-N 1-1 RECEPTIONECHANTILLON numéror éce ption 0-N date r éc eption rée lle nomdu firme trans porteur COMPLEMENT DOSSI ER SAAV numéroplaque camion Code pa ys heure ar rivé e trans port Nom de l'a dresse heure dé part trans port 1-1 Numérode téléphone nombre colis planifié 0-1 1-1 Localité nombre pale ttes planifié CORRESPONDANT 3SUISS CodES e postal nombre colis ré el cn#c Rue de 'ladre sse 0-1 nombre pale ttes ré el Numérode 'ladre sse stock??? Numéroducolis te xte réc eption POSITION FOURNISSEUR FACTURATION Da te facture LIEU DELIVRAISON code fo urniss DETAIL DOSSIE R SAAV me nrofacturation Code fournisseurde 'larticle Numéro de répanu ratio e stpe 01-1 -Nliv ré eur TYPEINTERVENTION Nu code lie u cod e ty article méro de dossier libellé position/fournisse ur ligne de facturation libe llé Code type di'nterve ntion Numérode dossier Numéro de ligne nombre de ligne s de r éce pti on ligne type fournisse ur nbre article fa cturé Libellé type di'nterve ntionSA LIGNERE 1-1 CE 0-NPTION ECHANTILLON Texte de1al-1ligne Code état prixunita rie facture /rec ept oi n numero de l gi ne 0-N Code ge stionachat c ommenta rie quantité the or ique Poids taxable e tat factura tion ECHANTILLON/CATEGORIE MODE DEgeDEM 1-1 0-N Cuba (e nANDE m3 ) DE PERSO date finlivrai son LIGNE PROPOSITIOND'ACHAT Nombre d'articlesTYPE DETOP PUBLICITAIRE DOSSIERSAAV Code mod eedetydemand code e tat Cod rée ceptio crédit repris/c atalogue 0-N Numéroligne propositiond'ach LibelléDate de mo depedeedednemastock nde denr éce ptio Numérode1-N dossier SAV mise est réc eptionné Date initia el de ilvraison Date dudos ise rSAV TYPED'INTERV PARrDOS SIE quantité livré e réelle Qua ntitée proposé e code marquage RéféreENTION nce f ournisseu article mois toptyp/tpc quantité livré e nonconforme code type article r éce ption Nombre minimum article à comma Libellé francais article anné e nombre pers onne manute ntion Numérode 'loffre 0-N Numérode page dudocume nt eta tlv code de balle mois Qua ntité rée le proposée Code saison RESERVATION PRODUIT COLORIS Date ar rivIéTIONS e quaiD'tra PROPOS Ansp CHAT PRODUIT TAILLEnumérobonrés erva tion Code action libellé francais coloris dupr Todeprpendez usac hat Numéro ropositvo ion taille Code éta t Article cata ol gue Type de propositionac hat VENTE AUCOMPTANT libellé né erlandaisducoloris ilbe lé taille francais Numérode clie nt (interne) nombre d'articles rése rvés tpbsta/toptyp nombre me nsua iltés ma ximum pou Da te ni itia el de ilvraison ilbe lé taille ennéerlandais Code canal nbarticles commandés sur rése Da te propositiond'achat prixve nte c omptant ilbe lé taille née rlandais Numérodu colis période de rés erva tion: début Horizon ennombre de jour descriptionte chnique période de rés erva tion: fin Date de facturation Code état Numérode ér pa ration 1-1 1-1 Code type da' dresse Code état 1-1 Prixde vente de base 1-1 0-N Se cteur d'ac itvité Numéroliena dresse courrier Numéroliena dresse livraison Numéroadresse si absent 0-N 1-1 prmrsv/prmtyp dev/pntctl 1-1 0-N 0-N 0-N prm/prmrsv Fig. 1. Excerpt (< 10%) of a large ER diagram drawn using a force-directed placement algorithm Figure 1 shows a very small part of the ER diagram of a Belgian distribution company. Though the schema comprises about 450 nodes and 800 edges only, the 4 layout is definitely useless for understanding the schema and the corresponding application. COLIS 0-N 0-N 1-1 CANAL 0-N 0-N LIVRAISON COLIS 1-1 1-1 0-N FOURNISSEUR cn/suicol réexpédition colis 0-N colcom/suicol SUIVI LIVRAISON COLIS 0-N 0-N 0-N col/eta 1-N 0-N remise fin année fournisseur 0-N 1-1 ETAT D'AVANCEMENT (LIGNE OU CO 0-N SUIVRE LE DOSSIER 0-N 0-N 1-1 1-1 article vpc pièce 0-N 1-N 1-1 0-N date début de validité 0-N 0-N 1-1 1-1 1-1 0-1 DATE article vpc ensemble 0-N 1-1 date de fin de validité 0-N STAT FOURNISSEUR artvpc/frnMENSUELLES 0-N décomposition d'un ensemble fournisseur/langue 0-N achfrn/lan POS-DTE 0-N description décrire 0-N 1-1 0-N 0-1 ARTICLE VPC 1-1 est remplace 0-1 remplace 0-N Notice 1-1 0-N artref/artvpc Canal Canalcommande livraison 0-1 HISTORIQUE cd/suicolCOLIS Poids max/canal Remplacement pays de la devise EARLY BIRD pays du fournisseur fournisseur/pays 0-N 0-N 0-N 0-N cd/eta 0-N dte/pay 0-N 0-N 0-N 0-N 0-N 1-N Mouvement de stock 1-1 0-N 0-N 0-N 0-N 0-N FORMULE 0-N PAR PAYS/LANGUE 0-N artref/tva simulation prixartref/douref de vente Pays d'origine artref/atvsec SUIVI DES TOPS CLIENT 0-N 0-N 0-N 0-N 0-N PAYS ENSEIGNE LANGUE 0-N 0-N 0-N 0-N 0-N atvsec/cd 0-N SECTEUR ACTIVITE 0-N FRAIS PAR DEFAUT 0-N 0-N 0-N 1-N 0-N 0-N 0-N 1-1 PAYS LANGUE Composition 0-N COMMANDE LANGUE Date de depot 1-1 LOT 0-N ALIMENTER 0-N 0-N 0-N 0-N pondération calculSUIVI canalACTIVITE calcul goulotte 1-N 0-N 1-N 0-1 0-N 1-N 1-1 0-1 REFERENCE ARTICLE cumul mouvement systeme mesure POSITIONS/ACTIVITE/MOIS 0-N 1-N 0-N 0-N 0-N 0-N 0-N 0-N 0-N 0-N 1-NCadeau associé 0-N cn/pay Document par langue SUIVI OPERATRICE 0-N 0-N COMMANDES SUIVI 0-N 0-N PAYS pays d'origine 0-N pays destination 0-N FRAIS DE DOUANE Fig. 2. The diagram of the top-11 entity types of the schema of Figure 1 Now Figure 2 shows in micrography the diagram of the top-11 entity types of the schema of Figure 1 according to EntityRank (for reasons of space, the attributes are not displayed in this figure). Although this schema has 54 relationship types it is extremely more easy to visualize, and thus to understand, than the original schema. Of course, one user could start from even smaller diagrams. For instance, Figure 3 shows the graph of the top-5 entity types of the same schema. It indeed contains the major entity types of this application and a user can immediately understand the application domain of this schema. In case of diagrams with big isA hierarchies, some entity types, although major, may not receive high scores because their relationship types are scattered in several subentity types. To handle this case, we introduced a (optional) preprocessing step in which each isA hierarchy of the schema is collapsed into one entity type that collects all the attributes and relationship types of its subentity types. Concerning evaluation, at first we have to note that the evaluation of the effectiveness of link analysis techniques for ER diagrams (for conceptual graphs in general), is more difficult than in the case of Web. In the latter case, it is not so hard to judge whether the top ranked pages are indeed relevant to the submitted query. However, in the case of conceptual graphs, one has to know well in advance the conceptual graph in order to judge whether the resulting small graph indeed contains the major concepts of the conceptual graph and of its underlying domain. Inevitably, the most reliable evaluation of such techniques can be done only in already known conceptual graphs. For this reason we applied this method to almost every conceptual schema that the DBMAIN group has produced the last 3 years. This was a quite representative test bed as it includes 5 Channel decomposition 0-N 0-N 0-N partOf 0-N Bonus 0-N Max Weight/Channel 0-N EARLY BIRD State of Progress 0-N 0-N 1-1 Operator 0-N path Calculation 0-N cn/pay channelCalculation follow-up activity 0-N Item 0-N 0-N 0-N replacedBy replaces 0-1 0-1 description 1-1 describes 0-1 replacement 0-N 0-N 0-N supplies 0-N POSITIONS/ACTIVITIES/MONTHS Country 0-N 0-N 0-N Notice 0-N origin 0-N destination 0-N 0-N 0-N 0-N 1-N 0-N accumulated movements 0-N 0-N 0-N 0-N Stock Movement 0-N 0-N country language 0-N SectorOfActivity DefaultCost Customs costs Fig. 3. The diagram of the top-5 entity types of the schema of Figure 1 big schemas of existing (and non artificial) real world applications. We always obtained surprisingly good results. For reasons of space we cannot report here the exact results of the evaluation of EntityRank (and BEntityRank) using metrics coming from the area of IR (for more [35]). In addition, it is an advantage that the proposed formulas for link analysis are mathematically founded and that the underlying model (the random walk model) is quite relevant to browsing, i.e. to the most widely used method for understanding a conceptual graph. At last, another evidence that link analysis is indeed appropriate for ER diagrams is that all large schemas that we have tested have a small set of elements, usually less than 5% of the total ones, whose scores are significantly higher than the rest. This at least indicates that big ER diagrams tend to have a well connected kernel which, at least in our experiments, always comprised the more important concepts of the application domain. 3 Automatic ER Drawing For drawing automatically the top-k ER diagrams that are derived by the previous technique, we shall view them as mechanical systems. Below we present two force models that combine the spring-model (proposed and developed in [13, 24, 16]) with the magnetic-spring model (proposed in [32, 31]) in a way that is appropriate for ER diagrams. 3.1 Force Model A Here entity types are viewed as equally charged particles which repel each other. Relationship types and isA relationships are viewed as springs that pull their adjacent entity types. Moreover, we assume that the springs that correspond to isA links are all magnetized and that there is a global magnetic field that acts on these springs. Specifically, this magnetic field is parallel (i.e. all magnetic forces operate in the same direction) and the isA springs are magnetized unidirectionally, so they tend to align with the direction of the magnetic field, here upwards. Figure 4 illustrates this metaphor. 6 worksAt Person Manager 11 00 00 11 Secretary 11 00 00 11 11 00 00 11 Company 11 00 00 11 magnet magnetic field electrical repulsion Fig. 4. Viewing an ER diagram as a mechanical system Under the above force model, the force on a entity type ei is given by: X X X F (ei ) = f (ej , ei ) + g(ej , ei ) + h(ej , ei ) ej ∈conn(ei ) ej ∈E,ej 6=ei (3) ej ∈connI (ei ) where: f (ej , ei ) is the force exerted on ei by the spring between ej and ei (note that ei and ej are connected by a relationship type or an isA link), g(ej , ei ) is the electrical repulsion exerted on ei by the entity type ej , and h(ej , ei ) is the rotational force exerted on ei by the entity type ej (here ei and ej are connected by an isA link). Figure 5 gives some indicative examples that explain the role of the forces f , g and h. Specifically, figure (a) justifies the spring force, figure (b) justifies the electrical repulsion and shows that high electrical repulsion (high K e ) results in symmetrical drawings, and figure (c) illustrates how the magnetic field can be used in order to obtain the classical top-down drawings for isA hierarchies. no force with f (a) with f with f and g (low Ke) no force with f and g (high Ke) with f and g (high Ke) (b) with f and g (high Ke) and h (c) Fig. 5. Forces and ER Drawings The spring force f (ej , ei ) follows Hooke’s law, i.e. it is proportional to the difference between the distance between ej and ei and the zero-energy length of 7 the spring. Let d(p, p0 ) denote the Euclidean distance between two points p and p0 and let pi = (xi , yi ) denote the position of an entity type ei . The x component of the force f (ei ) is given by: X xj − xi s fx (ei ) = Ki,j (d(pi , pj ) − Li,j ) d(pi , pj ) ej ∈conn(ei ) where Li,j denotes the natural (zero energy) length of the spring between ei and ej . This means that if d(pi , pj ) = Li,j then no force is exerted by the spring s denotes the stiffness of the spring between ei and between ei and ej . Now Ki,j s ej . The larger the value of Ki,j , the more tendency for the distance d(pi , pj ) to be close to Li,j . The y component of the force f (ei ) is defined analogously. The electrical force g(ej , ei ) follows an inverse square law. The x component of the force g(ei ) is given by: gx (ei ) = X ej ∈E,ej 6=ei e Ki,j xi − xj d(pi , pj )2 d(pi , pj ) e Ki,j where is used to control the repulsion strength between ei and ej . The y component of the force g(ei ) is defined analogously. The magnetic force h(ej , ei ) depends on the angle between the isA spring (that connects ej and ei ) and the direction of the magnetic field and it induces a rotational force on that spring. For example, Figure 6 shows an isA link between ei and ej and the exerted forces on ei and ej due to the magnetic field. The x and y components of the magnetic force h(ei ) are given by: X X xj − xi xj − xi hx (ei ) = Km + Km Li,j Li,j ej ∈connsp (ei ) X hy (ei ) = ej ∈connsp (ei ) ej ∈connsb (ei ) Li,j + yj − yi Km − Li,j X ej ∈connsb (ei ) Km Li,j + yi − yj Li,j where K m is used to control the strength of the magnetic field. hy (ej) hx (ej) ej l l yj − yi ei hy (ei) hx (ei) xj− xi Magnetic Field Fig. 6. Magnetic forces and isA links The x and y components of the composed force F (ei ) on an entity type ei are obtained by summing up, i.e.: Fx (ei ) = fx (ei ) + gx (ei ) + hx (ei ) and Fy (ei ) = fy (ei ) + gy (ei ) + hy (ei ). 8 As in the link analysis technique, we view an n-ary relationship type as n(n − 1)/2 springs. 3.2 Force Model B One weakness of the above model is that the resulting drawings can have several overlaps. The reason is that: (a) there is no repulsion among relationship types, and (b) there is no repulsion between entity and relationship types. Figure 7 illustrates this problem. This drove us to introduce a different force model where each relationship type is viewed as a particle too. Clearly, the resulting electrical repulsion discourages the creation of overlaps (between entity and relationship types, or between relationship types themselves). Notice that according to this view, a relationship type does no longer correspond to one spring. Specifically, the particle of a relationship type over k entity types, is connected with one spring with each one of them. The forces on entity types and relationship types are computed analogously to the force model A. Without rel−rel repulsion With rel−rel repulsion (a) Without ent−rel. repulsion With ent−rel repulsion (b) Fig. 7. Forces and ER Drawings 3.3 The Drawing Algorithm We can reach a drawing by an algorithm that simulates the mechanical system. Such a algorithm would seek for a configuration with locally minimal energy, i.e. a drawing in which the forces on each node is zero. A variety of numerical techniques can be used to find an equilibrium configuration, and thus the final drawing. We have adopted the iterative method based on the method proposed in [13]. At first the nodes are placed at random positions. At each iteration, the force on each node is computed and then the node is moved towards the corresponding direction by a small amount proportional to the magnitude of the force. This can be continued until convergence, but we can also limit the number of iterations. Note that if we would like to find a drawing that corresponds to a state with globally minimal energy, then we would have to resort to very general optimization methods. For instance, a method based on simulated annealing is proposed in [11], while an approach based on genetic algorithms is described in [4]. However the computational complexity of these techniques turns them 9 not very appropriate for interactive design systems. In addition, and according to the results of the extensive empirical analysis of several force-directed algorithms (including globally minimal energy algorithms) upon plain graphs that are reported in [3], there is no universal winner and the general approach is to try several methods and choose the best. 3.4 Experimental Evaluation We have investigated and evaluated all these issues in the context of the CASE tool DB-MAIN. The specification of the parameters L, K s , K e and K m is not a trivial task as these parameters determine in a high degree how the final drawing will look like. One flexibility of the proposed approach is that we can adjust the s e spring length (Li,j ), spring stiffness(Ki,j ) and electrical repulsion(Ki,j ), in order to customize the appearance of the drawing according to the semantics of the ER diagram constructs. For instance, as it is desirable to keep the nodes of an isA hierarchy close enough and since between any two isA-related entity types we only have to draw a line (and not any hexagon-enclosed string), we can use a smaller length for isA-springs than that of relationship-springs. In any case, the user can change their value at run-time. Figure 8 shows one drawing obtained by the algorithm using low electrical repulsion. Although the isA hierarchy is drawn as a top-down drawing and we have no overlaps, this drawing is not satisfying because a designer would hardly manually place into the space occupied by an isA hierarchy an entity type that does not belong to that hierarchy. After we increased the repulsion and the magnetic field we never faced again such a drawing. The lesson learned is that high repulsion not only results in symmetrical drawings but its combination with a strong magnetic field results in clear isA drawings. Another drawing of a diagram with 4 isA hierarchies that is derived by the algorithm according to force model A, is shown in Figure 9. low Ke high Ke, high Km Fig. 8. How to obtain clean isA drawings A more complex case is shown in Figure 10. Figure 10.(a) shows a manually placed diagram where all subentity types have been placed at the outer part of the drawing. Figure (b) shows the drawing obtained according to force model B. Notice that every isA hierarchy now corresponds to a top-down drawing and that the entire drawing is symmetrical and satisfying. 10 ENTITY_1 R 1-1 0-N ENTITY R_1 0-N ENTITY_3 1-1 ENTITY_4 ENTITY_11 ENTITY_2 0-N R_2 1-1 ENTITY_5 ENTITY_10 ENTITY_8 ENTITY_6 0-N R_3 1-1 ENTITY_9 ENTITY_7 Fig. 9. A drawing of a diagram with 4 isA hierarchies according to force model A ENTITY_5 ENTITY R_2 1-1 0-N 1-1 R_6 0-N 0-N 1-1 R_10 R ENTITY_5 ENTITY 1-1 0-N 0-N 1-1 0-N R_10 R 1-1 R_2 R_6 1-1 1-11-1 0-N0-N 0-N ENTITY_1 ENTITY_9 ENTITY_4 ENTITY_1 ENTITY_6 ENTITY_4 ENTITY_6 ENTITY_9 R_5 R_3 0-N 1-1 1-1 R_3 0-N 0-N 1-1 1-1 R_5 0-N 1-1 R_9 0-N ENTITY_2 0-N R_4 ENTITY_2 R_7 1-1 0-N R_4 1-1 ENTITY_3 R_7 R_9 ENTITY_3 0-N 1-1 0-N 1-1 ENTITY_8 ENTITY_7 ENTITY_8 1-1 R_8 0-N 1-1 0-N R_8 ENTITY_7 (a) (b) Fig. 10. Drawing of a diagram with several IsA hierarchies (a): manual drawing where isA links are not vertical. (b): drawing obtained according to force model B The experimental evaluation showed that the drawings according to force model A suffer from overlaps, while those according to force model B have a few (or none) overlaps. The difference between force model A and force model B is even more evident in dense diagrams. Figure 11 shows the drawings obtained by these two models when applied on the top-5 (|E| = 5, |R| = 17) diagram of Figure 3. Again, the second drawing is evidently better. A noteworthy remark here is that the second diagram is more clear and intuitive than the manually specified layout that is shown in Figure 3. This indicates that in certain cases (at least when the diagram is very dense) the automatically-derived drawings can be better than the manually drawn. However, we have to note that force model B has two weaknesses comparing to force model A: (i) it is computational more expensive, and (ii) in the resulting drawings the tentacles of binary relationship types are in many cases unnecessarily not aligned. This is evident in Figure 12. Although this is not a major problem it is an issue for further research. Figure 13 shows the automatic layout obtained for the top-11 diagram (that was presented in Figure 2). The high relative number of relationships makes the drawing almost unreadable. This example suggests that we should take into account the density of a diagram, in order to reach readable and clear drawings. Roughly, we could handle dense diagrams by considering: (i) larger springs, (ii) higher electrical repulsion, (iii) less stiff springs. For example, and assuming force model A, Figure 14.(a) shows the drawing obtained with spring length 11 replacement Channel decomposition Channel replacedBy replaces 0-1 0-1 1-1 0-N 1-1 0-N partOf 0-N 0-N 0-N cn/pay 0-N 0-N 0-N 0-N State of Progress cn/pay EARLY BIRD channelCalculation Notice supplies 0-N Operator 0-N 0-N 0-N 0-N 0-N Max Weight/Channel Operator 0-N 0-N 0-N 0-N 0-N Max Weight/Channel 0-N 0-N 0-N 0-N EARLY BIRD description 1-1 0-1 0-N follow-up activity 0-N POSITIONS/ACTIVITIES/MONTHS0-N 0-N 0-N 0-N channelCalculation follow-up path Calculation activity Stock Movement 0-N destination 0-N 0-N Country Customs costs POSITIONS/ACTIVITIES/MONTHS Stock Movement origin 0-N 0-N Country accumulated movements 0-N SectorOfActivity destination 0-N 1-N 0-N 0-N 1-N 0-N 0-N accumulated movements supplies State of Progress 0-N path Calculation Bonus 0-N describes 0-N Item replacedBy describes 0-1 0-N Bonus 0-N Item decomposition replacement Notice 0-N 1-1 description replaces partOf 0-N 0-1 0-1 origin 0-N 0-N 0-N 0-N 0-N 0-N country DefaultCost 0-Nlanguage 0-N 0-N Customs costs country language 0-N SectorOfActivity DefaultCost (a) (b) Fig. 11. Force model A vs force model B on a diagram with |E| = 5 and |R| = 17 ENTITY_9 R ENTITY_3 1-1 1-1 0-N 1-1 R_11 R_10 R_15 1-1 R_6 1-1 ENTITY_7 1-1 R_3 0-N R_6 ENTITY_10 1-1 R_10 R_6 0-N R_3 0-N R_19 0-N R_3 1-1 R_20 R_7 R_14 1-1 0-N 0-N R_27 0-N R_20 R_13 0-N 1-1 R_26 0-N 0-N R_14 R_9 1-1 1-1 1-1 R_4 ENTITY_8 ENTITY_2 1-1 1-1 R_21 0-N R_7 ENTITY_11 1-1 0-N 0-N ENTITY_14 1-1 1-1 ENTITY_8 1-1 1-1 R_5 1-1 0-N R_8 0-N 1-1 ENTITY_2 1-1 1-1 1-1 1-1 1-1 0-N R_24 0-N ENTITY_5 0-N 0-N R_21 1-1 ENTITY_5 0-N 0-N ENTITY_5 R_25 1-1 0-N R_8 R_22 1-1 1-1 ENTITY_11 0-N R_26 0-N R_5 R_5 ENTITY_1 0-N ENTITY_14 1-1 ENTITY_2 R_25 R_8 (a) ENTITY_4 R_4 R_4 R_5 0-N 0-N 0-N 0-N R_7 1-1 ENTITY_5 R_6 1-1 0-N R_13 0-N ENTITY_8 0-N 1-1 1-1 0-N 0-N 0-N 1-1 0-N 1-1 ENTITY_7 ENTITY_1 1-1 R_14 R_13 0-N R_9 1-1 R_15 1-1 0-N 1-1 1-1 0-N ENTITY_4 1-1 R_9 1-1 1-1 R_11 R_12 R_28 R_19 R_11 R_15 1-1 1-1 1-1 1-1 R_27 0-N 1-1 ENTITY_2 1-1 0-N 1-1 0-N 1-1 R_7 1-1 R_4 0-N 1-1 0-N 0-N 1-1 R_12 0-N R_14 0-N R_18 0-N R_2 1-1 ENTITY_7 0-N R_13 1-1 1-1 ENTITY ENTITY_10 ENTITY R_2 0-N 0-N 1-1 0-N ENTITY_4 0-N ENTITY_8 R_18 0-N ENTITY_6 1-1 0-N R_3 1-1 ENTITY_6 1-1 R_10 0-N ENTITY_1 0-N 1-1 0-N 0-N R_28 1-1 0-N ENTITY_7 R_9 0-N 1-1 0-N 0-N 1-1 R_29 R 0-N 1-1 1-1 0-N 1-1 1-1 ENTITY_1 0-N 1-1 0-N R_12 1-1 0-N ENTITY_4 0-N 1-1 ENTITY_15 1-1 ENTITY_15 R_15 1-1 0-N R 0-N R_2 1-1 1-1 0-N 1-1 R_29 R_1 0-N R_10 1-1 R_17 0-N R_1 0-N R_11 0-N R_12 0-N ENTITY_3 R_30 R_17 ENTITY_3 R_30 0-N 1-1 0-N 1-1 0-N 1-1 0-N 0-N 1-1 ENTITY ENTITY_6 0-N R_16 R_31 1-1 R_16 R_31 1-1 0-N ENTITY ENTITY_6 ENTITY_9 1-1 0-N 1-1 0-N 0-N 1-1 0-N R_2 ENTITY_16 ENTITY_16 ENTITY_3 0-N R_1 R 0-N R_1 1-1 R_22 R_23 1-1 0-N 1-1 0-N R_8 1-1 1-1 0-N R_24 R_23 1-1 0-N ENTITY_13 0-N ENTITY_13 ENTITY_12 (b) (c) Fig. 12. Force model A vs force model B ENTITY_12 (d) (a): force model A. (b): force model B. (c): force model A. (d): force model B. FOURNISSEUR 0-N remise fin année fournisseur 1-1 date de fin de validité fournisseur/langue achfrn/lan 1-N 0-N 0-N 1-1 date début de validité REFERENCE ARTICLE 0-N LANGUE 0-N STAT FOURNISSEUR artvpc/frn MENSUELLES0-N COLIS 0-N 1-1 DATE 0-N 1-N 0-1du pays fournisseur/pays defournisseur la 0-N devise 0-N 0-N 1-1 Date de pays depot 1-1 LIVRAISON cn/suicol COLIS 0-N0-N 0-N1-N artref/artvpc 0-N 1-1 Document 0-N1-1 0-N 1-N 0-1par langue SUIVI réexpédition colcom/suicol LIVRAISON col/eta colis COLIS0-N 0-N LOT POS-DTE description décrire simulation Pays artref/douref artref/tva d'origine prix de vente 1-N SUIVRE LE DOSSIER CANAL 0-N article ensemble SUIVI PAYS DES Composition PAR TOPS LANGUE PAYS/LANGUE 1-1CLIENT 0-N FORMULE 0-Nvpcremplace EARLY BIRD 1-1 Cadeau 0-N 0-1 0-Nassocié dte/pay 0-N 0-N 0-1 HISTORIQUE cd/suicolCOLIS0-N 0-N artref/atvsec 1-1 SUIVI COMMANDES 0-N 1-N0-N 0-N ARTICLE VPC 0-N cn/pay 0-N 0-N(LIGNE 0-N pondération LANGUE ETAT D'AVANCEMENT OU COPAYS ENSEIGNE pays d'origine 0-N 0-N0-N 0-N article est vpcremplace pièce ALIMENTER Canal Canal commande livraison 0-N 0-N 0-N 0-N 0-N 1-N 0-N 0-N SUIVI Poids OPERATRICE max/canal 0-N 0-N 0-1 1-1 SUIVI calcul calcul ACTIVITE goulotte canal 0-N cd/eta 0-N 0-N 0-N PAYSde stock 1-1 POSITIONS/ACTIVITE/MOIS Mouvement 0-N 0-N0-N mesure 0-N 0-1 cumul mouvement systeme pays destination 1-N0-N 0-N 0-NDEFAUT 0-NFRAIS PAR COMMANDE1-1 0-N 0-N atvsec/cd 0-N SECTEUR ACTIVITE Fig. 13. Dense diagram drawing 12 Notice décomposition Remplacement d'un ensemble FRAIS DE DOUANE 0 L0 = 5L, Figure 14.(b) shows the drawing obtained with K e = 100K e , and 0 Figure 14.(c) shows the drawing obtained with K s = K s /10000. Indeed, all are better than the original drawing shown in Figure 13. Another simple method that is both effective and efficient is to scale up the entire drawing (i.e. multiply each coordinate by a constant c > 1). Nevertheless, an issue that is worth further research is to investigate the effectiveness of local-density adaptations, e.g. to adapt the spring lengths according to the local density of the graph. EntityRank and BEntityRank scores could be exploited for this purpose. FOURNISSEUR 0-N remise fin année fournisseur REFERENCE ARTICLE 1-N 0-N FOURNISSEUR 1-N 1-1artvpc/frnMENSUELLES STAT FOURNISSEUR date de fin de validité 0-N 1-1 date début de validité 0-N COLIS article vpcremplace ensemble 0-N 0-1 fournisseur/langue achfrn/lan ARTICLE VPC 1-1 1-1 POS-DTE 0-N 0-N 1-1 LIVRAISON cn/suicol COLIS 1-1 artref/artvpc 1-1 Cadeau associé 0-N 0-N 0-N SUIVI réexpédition colcom/suicol LIVRAISON col/eta colis COLIS CANAL REFERENCE ARTICLE 0-N Date de depot 0-N1-1 SUIVRE LE DOSSIER 0-N 0-N 0-N 1-N 0-1 0-N EARLY BIRD 0-N LANGUE POSITIONS/ACTIVITE/MOIS Mouvement 0-N de stock 0-N 0-N 0-N0-N pondération ALIMENTER 0-N FORMULE SUIVIPAYS DES Composition PAR TOPS LANGUE PAYS/LANGUE CLIENT 0-N 0-N SUIVI Poids OPERATRICE max/canal artref/atvsec 0-N SUIVI calcul calcul ACTIVITE goulotte canal cd/eta 1-1 0-N LOT 0-N cn/pay simulation Pays artref/douref artref/tva d'origine prix de vente 0-N Canal Canalcommande livraison 0-N 1-N 0-N 1-1 description décrire 0-1 0-N 0-N 0-N dte/pay 0-N 1-N ETAT D'AVANCEMENT (LIGNE OU CO 0-N SUIVI COMMANDES 0-N Document par langue 0-N 1-N 0-N 1-1 0-N HISTORIQUE cd/suicolCOLIS 1-N 0-N pays d'origine 0-N PAYS pays destination 0-N 0-N PAYS ENSEIGNE LANGUE 0-N 0-N 0-N 1-1 cumul mouvement systeme mesure 0-1 0-N 0-N 0-N 1-N 0-N0-N COMMANDE 0-N 0-N FRAIS DE DOUANE 0-N 0-N FRAIS PAR DEFAUT 0-N 1-1 atvsec/cd 0-N pays pays fournisseur/pays dudefournisseur la devise DATE 0-N 0-N 1-1 0-N décomposition Remplacement d'un ensemble articleest vpcremplace pièce 0-N 0-1 0-N artref/atvsec ETAT D'AVANCEMENT (LIGNE OU CO 0-N 0-N Notice simulation Pays artref/douref artref/tva d'origine prix de vente 0-N 0-Ncumul mouvement systeme mesure DATE 0-N 0-N 1-N 0-N ALIMENTER pondération 0-N 0-N 0-N date de fin de validité artref/artvpc 0-N 0-N 0-N dte/pay SUIVI COMMANDES Date de depot SUIVI réexpédition colcom/suicol LIVRAISON col/eta colis COLIS date début de validité 0-N 1-1 0-N 0-N cd/eta 0-N pays d'origine SECTEUR ACTIVITE 0-N 0-N FRAIS PAR DEFAUT 0-N PAYS DE DOUANE 0-N 0-N FRAIS pays paysdu fournisseur/pays defournisseur la devise 1-1 1-1 1-1 FOURNISSEUR0-Nremise fin année fournisseur 0-N 0-N destination 0-N 1-1 0-N pays 0-N LOT 0-N 0-N 0-N 1-1 POS-DTE 1-N 0-N 0-N 0-1 1-1 atvsec/cd 1-N 0-N PAYS ENSEIGNE LANGUE 1-N 0-N 1-1 0-N COLIS 1-1 0-N FORMULE POSITIONS/ACTIVITE/MOIS Mouvement de stock SUIVIPAYS DES Composition PAR TOPS LANGUE PAYS/LANGUE CLIENT 0-N 1-1 1-1 fournisseur/langue achfrn/lan SUIVI calcul calcul ACTIVITE goulotte canal 0-N artvpc/frnMENSUELLES STAT FOURNISSEUR HISTORIQUE cd/suicolCOLIS 0-N 0-N1-N 0-N COMMANDE 0-N SUIVI Poids OPERATRICE max/canal 0-N 1-1 0-N cn/pay 1-N 0-1 EARLY BIRD article vpcremplace ensemble 0-N 1-1 Cadeau associé 1-1 LANGUE 0-N Notice Document par langue 0-N 0-N 0-N 0-1 LIVRAISON cn/suicol COLIS 0-N ARTICLE VPC1-1décomposition Remplacement d'undescription ensemble décrire articleest vpcremplace pièce Canal Canalcommande livraison 0-N 0-1 0-N 1-1 0-N 0-1 0-N 0-N 0-N 0-N remise fin année fournisseur 0-N LANGUE 1-N 0-N 0-N Notice DATE 1-1 0-N 1-1 0-N 1-1 SUIVI réexpédition colcom/suicol LIVRAISON col/eta colis COLIS SUIVRE LE DOSSIER 0-N cd/eta 1-1 0-1 artref/artvpc LOT POS-DTE pays paysdudefournisseur fournisseur/pays la devise 1-N dte/pay CANAL Document par langue 1-1 description décrire 0-N 0-1 0-N 1-N 0-1 FORMULE SUIVIPAYS DES Composition PAR TOPS LANGUE PAYS/LANGUE CLIENT 0-N 1-1 1-1 0-N EARLY BIRD simulation Pays artref/douref artref/tva d'origine prix de vente SUIVI COMMANDES HISTORIQUE cd/suicolCOLIS 0-N 0-N 1-N 0-N STAT FOURNISSEUR artvpc/frnMENSUELLES 0-N 0-N 0-N ETAT D'AVANCEMENT (LIGNE OU CO 0-N 0-N 0-N 1-N 0-N 0-N Date de depot 0-N1-1 0-N 1-N 0-1 0-N LIVRAISON cn/suicol COLIS 0-N 0-N 1-1 0-N fournisseur/langue achfrn/lan date début de validité REFERENCE ARTICLE 0-N COLIS SUIVRE LE DOSSIER 0-N 1-1 date de fin de validité 0-N 0-N 1-N 0-1 0-N 1-N 0-1 0-N Cadeau associé article vpcremplace ensemble 0-N 1-1 0-N 0-N0-1 0-N ARTICLE VPC décomposition Remplacement d'un ensemble 0-N 0-N articleest vpc remplace pièce 0-N 1-N 0-N 0-N 0-N0-1 0-N cn/pay ALIMENTER 0-N Canal Canalcommande livraison 0-Nartref/atvsec 0-N 0-N pondération PAYS ENSEIGNE LANGUE pays d'origine 0-N 0-N 0-N PAYS FRAIS DE DOUANE pays destination 1-1 0-N cumul mouvement systeme mesure SUIVI calcul calcul ACTIVITE goulotte canal COMMANDE SUIVI Poids OPERATRICE max/canal 0-N 0-N POSITIONS/ACTIVITE/MOIS Mouvement de stock 0-N 1-1 0-N 0-N 0-N FRAIS PAR DEFAUT 0-N atvsec/cd 1-N 0-N 0-N 0-N 0-N 0-N 0-N 0-N SECTEUR ACTIVITE (a) CANAL SECTEUR ACTIVITE (b) (c) Fig. 14. (a): larger springs; (b): higher repulsion; (c) less stiff springs As a final remark note that the above drawing techniques can be applied for drawing the structural part of ontologies expressed in RDFS [5] and OWL [12]. The only difference is that RDFS supports property specialization which however will be handled correctly due to the magnetic field that is applied on specialization/generalization links (also indicated by Figure 9). 4 Conclusion We described a novel method for identifying the major elements of an ER diagram that is based on link analysis. This method can significantly aid (a) the understanding, (b) the visualization, and (c) the drawing of very large schemas. The proposed technique can elevate automatically the major elements and allows exploring the schema gradually: from the more important elements to the less. Consequently, it can be very useful in reverse engineering and in information integration. Moreover, the scores can be exploited for ordering the schema elements that match a keyword query of the user. In addition, and given the inability to produce automatically aesthetically satisfying layouts for large schemas, the small (top-k graphs) that can be derived by this technique can be visualized effectively and this is very useful during communication (e.g. between designers and application programmers or in requirements engineering and training). For this purpose we investigated a force-directed drawing algorithm and evaluated two different force models upon several conceptual schemas of real applications. For small and medium sized diagrams the results were satisfying in most of the cases. In the rest cases, human intervention (moving, nailing) and rerun of the drawing algorithm could rectify the problems. 13 Acknowledgements The first author wants to thank Tonia Dellaporta for the several fruitful and really enjoyable discussions on this issue. Also many thanks to Jean-Rock Maurisse, AnneFrance Brogneaux and Jean Herald for their help on using the DB-MAIN toolkit. References 1. Jacky Akoka and Isabelle Comyn-Wattiau. “Entity-Relationship and ObjectOriented Model Automatic Clustering”. Data and Knowledge Engineering, 20(2):87–117, 1996. 2. Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis Tollis. “Graph drawing: algorithms for the visualization of graphs”. Prentice Hall Englewood Cliffs (N.J.), 1999. ISBN/ISSN : 0-13-301615-3. 3. F. J. Braedenburg, M. Himsolt, and C. Rohrer. “An Experimental Comparison of Force-Direceted and Randomized Graph Drawing Algorithms”. In Procs of Graph Drawing, GD’95, pages 76–87, 1996. 4. J. Branke, F. Bucher, and H. Schmeck. “Using Genetic Algorithms for Drawing Undirected Graphs”. In Procs of the 3rd Nordic Workshop on Genetic Algorithms and Their Applications, 3NWGA, pages 193–2005, 1997. 5. Dan Brickley and R. V. Guha. “Resource Description Framework (RDF) Schema specification: Proposed Recommendation, W3C”, March 1999. http://www.w3.org/TR/1999/PR-rdf-schema-19990303. 6. Sergey Brin and Lawrence Page. “The Anatomy of a Large-scale Hypertextual Web Search Engine”. In Proceedings of the 7th International WWW Conference, Brisbane, Australia, April 1998. 7. L. J. Campbell, Terry A. Halpin, and Henderik Alex Proper. “Conceptual Schemas with Abstractions: Making Flat Conceptual Schemas More Comprehensible”. Data and Knowledge Engineering, 20(1):39–85, 1996. 8. Rodolfo Castello, Rym Mili, and I. Tollis. “A Framework for the Static and Interactive Visualization of Statecharts”. Journal of Graph Algorithms and Applications, 6(3):313–351, 2002. 9. P. Chen. “The Entity-Relationship Model - Toward a Unified View of Data”. ACM Transactions on Database Systems, 1(1):9–36, March 1976. 10. Richard Cole. “Automatic Layout of Concept Lattices using Force Directed Placement and Genetic Algorithms”. In Proc. of the 23th Australiasian Computer Science Conference, pages 47–53. Australian Computer Science Communications 1, IEEE Computer Society, 2000. 11. R. Davidson and D. Harel. “Drawing Graphics Nicely Using Simulated Annealing”. ACM Trans. Graph., 15, 1996. 12. M. Dean, D. Connolly, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, and L.A. Stein. “OWL Web Ontology Language 1.0 Reference”, 2002. (http://www.w3c.org/TR/owl-ref). 13. P. Eades. “A Heuristic for Graph Drawing”. Congressus Numerantium, 42, 1984. 14. Holger Eichelberger and Jurgen Wolff von Gudenberg. “UML Class Diagrams State of the Art in Layout Techniques”. In Proceeding of Vissoft 2003, International Workshop on Visualizing Software for Understanding and Analysis, pages 30–34, 2003. 15. P. Feldman and D. Miller. “Entity Model Clustering: Structuring a Data Model by Abstraction”. The Computer Journal, 29(4):348–360, 1986. 14 16. T. Fruchterman and E. Reingold. “Graph Drawing by Force-directed Placement”. Software - Practice and Experience, 21(11):1129–1164, 1991. 17. F.U.N.D.P. “DB-MAIN”. (http://www.info.fundp.ac.be/∼dbm/). 18. Munish Gandhi, EdwardL Robertson, and Dirk Van Gucht. “Levelled Entity Relationship Model”. In Procs of the 13rd Intern. Conf. on the Entity Relationship Approach, ER’94, pages 420–436, Manchester, U.K., December 1994. 19. Floris Geerts, Heikki Mannila, and Evimaria Terzi. “Relational Link-based ranking”. In Procs of the 30th Intern. Conference on Verly Large Data Bases, VLDB’2004, Toronto, Canada, August 2004. 20. Zoltan Gyongyi, Hector Garcia-Molina, and Jan Pedersen. “Combating Web Spam with TrustRank”. In Procs of the 30th Intern. Conference on Verly Large Data Bases, VLDB’2004, Toronto, Canada, August 2004. 21. Jean-Luc Hainaut. “Transformation-based Database Engineering”. In Transformation of Knowledge, Information and Data: Theory and Applications. IDEA Group Pub., 2004. 22. Jouni Huotari, Kalle Lyytinen, and Marketta Niemela. “Improving Graphical Information System Model Use with Elision and Connecting Lines”. ACM Transactions on Computer-Human Interaction, 10(4), 2003. 23. Yannis E. Ioannidis, Miron Livny, Jian Bao, and Eben M. Haber. “User-Oriented Visual Layout at Multiple Granularities”. In Proc. of the 3rd International Workshop on Advanced Visual Interfaces, pages 184–193, Gubbio, Italy, May 1996. 24. T. Kamada. ”On Visualization of Abstract Objects and Relations”. PhD thesis, Dept. of Information Science, Univ. of Tokyo, Dec 1988. 25. Jon Kleinberg. “Authoritative Sources in a Hyperlinked Environment”. In Proceedings of 9th ACM-SIAM Symposium on Discrete Algorithms, San Francisco, USA, 1998. 26. Simon Lok and Steven Feiner. “A Survey of Automated Layout Techniques for Information Presentations”. In Procs of the 1st. Int. Symp. on Smart Graphics, Hawthorne, NY, 2001. 27. Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. Cambridge University Press, 1995. 28. Arthur Ouwerkerk and Heiner Stuckenschmidt. “Visualizing RDF Data for P2P Information Sharing”. In Procs of the workshop on Visualizing Information in Knowledge Engineering, VIKE’03, Sanibel Island, FL, 2003. 29. Aaron J. Quigley. “Large Scale Relational Information Visualization, Clustering, and Abstraction”. PhD thesis, University of Newcastle, Australia, August 2001. (http://www.it.usyd.edu.au/∼aquigley/thesis/aquigley-thesis-mar-02.pdf). 30. O. Rauh and E. Stickel. ”Entity Tree Clustering: A Method for Simplifying ER Design”. In Procs of the 11th Int. Conf. on Entity-Relationship Approach, ER’92. 31. K. Sugiyama and K. Misue. “A Simple and Unified Method for Drawing Graphs: Magnetic-Spring Algorithm”. In Procs of Graph Drawing Conference, GD’94, pages 364–375, 1994. 32. K. Sugiyama and K. Misue. “Graph Drawing by Magnetic-Spring Model”. Journal on Visual Lang. Comput., 6(3), 1995. 33. R. Tamassia, C. Batini, and M. Talamo. “An algorithm for automatic layout of entity-relationship diagrams”. In Procs of the 3rd International Conference on Entity-relationship approach to software engineering, pages 421–439. Elsevier North-Holland, Inc., 1983. 34. T. J. Teory, W. Guangping, D. L. Bolton, and J. A. Koenig. “ER Model Clustering as an Aid for User Communication and Documentation in Database Design”. Communications of the ACM, 32(8):975–987, 1989. 15 35. Yannis Tzitzikas and Jean-Luc Hainaut. ”Ranking the Elements of Conceptual Diagrams”, 2005. (submitted for publication). A Linear Algebra Version of (B)EntityRank Let A be the generalized adjacency matrix of an ER diagram where A[ei , ej ] equals the number of transitions from ei to ej . Now the probability transition matrix M is obtained by normalizing each row of A to sum to 1. EntityRank is based on a Markov chain on the entity types with transition matrix q · U + (1 − q) · M where U is the transition matrix of uniform transition probabilities i.e. U[ei , ej ] = 1/N for all i, j. The vector of the EntityRank scores, denoted by Sc, is then defined to be the stationary distribution of this Markov chain. Equivalently, Sc is the principal right eigenvector of the transition matrix (q · U + (1 − q) · M)T , since by definition the stationary distribution satisfies (q · U + (1 − q) · M)T Sc = Sc. On the other hand, BEntityRank (the biased version of EntityRank) is based on the transition matrix: q · B + (1 − q) · M i )| where Attr denotes the set of all attributes of all entity where B[ej , ei ] = |attrs(e |Attr| types (i.e. Attr = ∪{ attrs(e) | e ∈ E}). As another remark note that in an undirected (strongly connected and non-bipartite) graph G = (V, R), the stationary probability of a node u is given by P (u) = deg(u) 2|R| where deg(u) is the degree of u [27]. This means that in an undirected graph (or multigraph) the stationary probabilities can be computed very efficiently and without the need of an iterative algorithm. In our case we cannot employ the above method due to the ”teleporting” transitions which are indispensable in our case for ensuring that the transition graph is strongly connected (note that large ER diagrams are not always connected). Specifically, the ”teleporting” transitions of BEntityRank are not symmetric and this cannot be captured by an undirected graph. For instance, consider the case of an ER diagram consisting of two entity types e1 and e2 and one relationship type between them, where e1 has one attribute and e2 has two attributes. According to a random walk on the undirected graph both entity types have probability 1/2. According to BEntityRank if q = 0 then P (e1 ) = P (e2 ) = 1/2, if q = 1 then P (e1 ) = 1/3 and P (e2 ) = 2/3, and if q = 0.5 then P (e1 ) = 0.44 and P (e2 ) = 0.55. 16
© Copyright 2024