Visual Analytics
Multimedia Information Systems 2 VU (SS 2014, 707.025)
Vedran Sabol
KTI, TU Graz
March 26th, 2015

Vedran Sabol (KTI, TU Graz) Visual Analytics March 26th, 2015

Structure of the Lecture
• Visual Analytics (recapitulation)
• Scalable Layout Algorithms
• Clutter Reduction
• Aggregation-based Methods
• Hierarchical Layout
• Level of Detail Rendering

Visual Analytics (recapitulation)

Visualization
• Definition
  – Graphical representation of data, information and knowledge
  – Use of the human visual system, supported by computer graphics, to analyze and interpret large amounts of data
• Approach
  – Machines transform the data into a suitable graphical representation
  – Employ the human visual system for pattern recognition
• Challenges
  – What should the graphical representations look like (design)?
  – How to compute the graphical representation (algorithms)?
  – Which operations shall be supported on the graphical representation (interactivity)?

Visualisation - Motivation
• The human visual apparatus is a highly efficient "processing machine"
• Enormous amounts of information are transferred by the optic nerve into the brain cortex - extremely high bandwidth
• The visual cortex remains unbeatable in the recognition of objects and complex patterns - extreme parallel processing
• Pre-attentive processing: capability to process certain visual information without focusing our attention
  – Criterion 1: processing time < 200-250 ms (single glimpse)
  – Criterion 2: processing time does not correlate with the amount of noise (distractors) in the data
  – Limited number of pre-attentive features

Pre-attentive processing
[Figure: example panels labelled Yes, Borderline, No]

Visualization
• Fundamental categories of visual representation
  – Formalisms: abstract schematic representations
    » Must be learnt
  – Metaphors: representations based on a real-world equivalent
    » Intuitive: the user can understand the meaning by building analogies
  – Models: based on mental representations of the physical world
    » Data has a natural representation in the real world
• Visualisation subdivision
  – Data/Scientific Visualization
  – Information Visualization
  – Knowledge Visualization

Visualization Examples
• Pressure coefficients [NASA]
  – Data visualisation
  – Uses a model
• Cultural Heritage - Roman Theatres [Blaise]
  – Knowledge visualisation
  – Uses formalisms

Visualisation Examples: Information Visualisation
• Themeriver [Havre]: trends in document clusters
  – Uses a metaphor: river flow
• Information Landscape [Sabol]: topical similarity in document repositories
  – Uses a metaphor: landscape

Visual Analytics
New Insights New Knowledge Repository
Algorithms Visualization
• A new interdisciplinary research area at the crossroads of
  – Data mining and knowledge discovery
  – Data, information and knowledge visualisation
  – Perceptual and cognitive sciences
• Human in the loop

Visual Analytics
• Combines automatic methods with interactive visualisation to get the best of both [Keim 2008]
• Interaction between humans and machines through visual interfaces to derive new knowledge

Visual Analytics
1. Machines perform the initial analysis
2. Visualization presents the data and analysis results
3. Humans are integrated in the analytical process through means for explorative analysis
  • User spots patterns and makes a hypothesis about the data
  • Further analysis steps - visual and/or automatic - to verify the hypothesis
  • Confirmed or rejected hypothesis: new knowledge!

Today's Focus

Data
• Visual analysis of
  – Weakly structured data: text repositories
    » Most commonly used data type
    » Text is highly unstructured
    » Accompanied by structured metadata
  – Structured data: network data/graphs
    » Rapidly gaining importance
    » Social networks
    » Web graph
    » Semantic knowledge bases (ontologies, Linked Data)
• Approach
  – Develop methods for unstructured data
  – Extend them to structured data
    » More complexity: consider relationships

Methods
• Geometry computation
  – Projection and layout algorithms: scalable, visually compelling 2D layout of the data set
  – Clarity improvement methods for graphs: edge bundling, edge routing
  – Label overlap minimization
• Aggregation-based methods
  – Provide a coarse overview of the whole data set
    » To avoid overloading the user (and the Web client)
  – Introduce more details when zooming
• Level
of Detail based rendering

Projection and Layout

Layout
How to visualize high-dimensional vector spaces?
• Project them into a "smaller" (i.e. 2D or 3D) visualization space
  – Relationships can be viewed, understood and explored by users
• Preserve original distances/similarities as far as possible
  – Related data elements are spatially close: structures arise
• Dimensionality reduction techniques
How to lay out complex (semantic) graphs?
• Connect nodes with edges
  – Understand structures, explore relationships by following edges
• Maximize clarity of the potentially very complex representation
  – Place strongly interconnected node groups (i.e. those sharing a similar neighborhood) spatially close together
  – Minimize overlap and edge crossings

Ordination Methods
Distance-/similarity-preserving methods
• Multidimensional scaling
• Input is a distance/similarity matrix
  – Computed using similarity coefficients (e.g. cosine coefficient)
• Dimensions of the low-dimensional space have no meaning
  – and no relation to the original dimensions
Transformations of the feature space
• Principal Component Analysis, Self-Organizing Maps
• Input is a set of high-dimensional feature vectors
• Dimensions of the low-dimensional space may be related to the high-dimensional space

Multidimensional Scaling: Motivation
Example: distance (i.e. dissimilarity) matrix of car makers
• Which car makers are similar?
• Need visualisation: impossible to read from large matrices
See http://www.wiwi.uni-wuppertal.de/kappelhoff/papers/mds.pdf

Multidimensional Scaling
See: http://www.wiwi.uni-wuppertal.de/kappelhoff/papers/mds.pdf

Multidimensional Scaling
Example: 2D to 1D space
Information loss is inherent!
Minimise differences between HD and LD distances

Points in the high-dimensional (2D) space:
      x      y
1     0.2    1
2     0.5    1
3     1.5    0.2

Distance computation
a = dist(1,2) = 0.3
b = dist(2,3) = 0.6
c = dist(1,3) = 1.55

Distance matrix:
      1      2      3
1     0      0.3    1.55
2     0.3    0      0.6
3     1.55   0.6    0

[Figure: dimensionality reduction from High-Dim (2D), with distances a, b, c between points 1, 2, 3, to Low-Dim (1D), with approximated distances ~a, ~b, ~c]

Multidimensional Scaling: Force-Directed Placement
Heuristic multidimensional scaling method
Spring model simulates a physical system
• Compute forces between objects
  – Force between an object pair depends on similarity/distance in the high-dimensional vector space
    » Topical similarity for text documents
    » Connectivity with the local neighborhood for graphs
  – Similar/connected objects attract, dissimilar/disconnected objects repel
• Element positions are computed iteratively depending on the forces
Physical system converges towards a local minimum
Stop condition
• Object movements have subsided (velocity)
• Difference between high- and low-dimensional distances has stopped decreasing (stress)

Force-Directed Placement: Basic Force Model
$$\mathrm{force}(d_i, d_j) = \mathrm{dist}_{low}(d_i, d_j) - \mathrm{dist}_{high}(d_i, d_j)$$
• $\mathrm{dist}_{high} < \mathrm{dist}_{low}$: $\mathrm{force}(d_i, d_j) > 0$, attractive force
• $\mathrm{dist}_{high} > \mathrm{dist}_{low}$: $\mathrm{force}(d_i, d_j) < 0$, repulsive force
Attempts to reconstruct the original distances in the visualisation space
Disadvantages
• Scaling of high-dimensional distances often unsuitable for visualization
• Need to convey proportions, but not the exact distances
• No parameterization/tuning
possibilities

Force-Directed Placement: Item Position Computation
$$d_i.x \leftarrow d_i.x + \frac{1}{N-1} \sum_{j=1,\, j \neq i}^{N} \mathrm{force}(d_i, d_j)\,(d_j.x - d_i.x)$$
$$d_i.y \leftarrow d_i.y + \frac{1}{N-1} \sum_{j=1,\, j \neq i}^{N} \mathrm{force}(d_i, d_j)\,(d_j.y - d_i.y)$$
• force = 1: $d_i.x \leftarrow d_i.x + 1 \cdot (d_j.x - d_i.x) = d_j.x$
• force = 0: $d_i.x \leftarrow d_i.x + 0 \cdot (d_j.x - d_i.x) = d_i.x$
[Figure: forces from d1, d2, d3 acting on di, and the resulting force]

Force-Directed Placement: Improved Force Model
force(d_i, d_j) is computed from sim(d_i, d_j), the similarity in the original space, from dist(d_i, d_j), the distance in the projection space, and from the constants d, p, grav and r. It has three terms:
• First term: attractive force proportional to similarity
  – Attracts similar/connected objects
• Second term: rapidly rising, short-range repulsive force
  – Prevents "gravitational collapse" of very similar elements
• Third term: weak cohesive force
  – Prevents endless expansion of non-similar elements
Vast parameterization and tuning possibilities

Force-Directed Placement: Graph Layout Specifics
• Adjacency matrix used as input
  – Specifies which nodes are directly connected by an edge
• Not all force interactions between non-connected nodes must be computed
  – It suffices to consider the neighbourhood in the 2D layout: speeds up computation enormously for sparse graphs
  – Close non-connected nodes should repel each other
Example graph with nodes A, B, C, D and its adjacency matrix:
0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0

Force-Directed Placement: Advantages and Disadvantages
Advantages:
• Good layout quality
• Visually pleasing results (with a little tuning)
• Incremental: changes in the data are integrated smoothly into the layout
Disadvantages:
• Tends to get stuck in local minima
  – Especially for larger data sets
  – Possible remedy: "shake out" techniques
• Scalability of the brute force algorithm:
  – Text: O(n^3) time complexity due to full distance/similarity matrices
    » May be improved by matrix
pruning (not considering low similarity values)
  – Graphs: much better for sparse graphs as the adjacency matrix is sparser
    » But still at least O(n^2)

Force-Directed Placement: Addressing Scalability Issues
More efficient approaches based on FDP
• Stochastic sampling (neighbor + random sets) [Chalmers 1996]: O(n^2)
• Apply sampling and interpolation recursively [Jourdan & Melançon 04]: O(n*log(n))
• Aggregate data into a hierarchy (clustering), apply FDP recursively [Muhr, Sabol, Granitzer 2010]: O(n*log(n))
  – Will be discussed in detail later
Alternative techniques:
• Least Square Projection, Random Projections, FASTMAP, IDMAP…
• Fast, but mostly inferior layout quality

Force-Directed Placement: Layout Quality
Stress measure
• Difference between pairwise distances in high- and low-dimensional spaces
$$\mathrm{stress}(d_i, d_j) = \frac{1}{N-1} \sum_{i \neq j} \left(\mathrm{dist}_{high}(d_i, d_j) - \mathrm{dist}_{low}(d_i, d_j)\right)^2$$
• Heat maps to visualise stress per object: find problem areas [Seifert, Sabol, Kienreich 2010]

Clutter Reduction

Graph Visualization: Large Graphs
Cytoscape (http://cytoscape.org/)
• Left: 50000 nodes, 250000 edges (http://cytoscape.org/manual/Cytoscape2_6Manual.html)
• Right: 30000 nodes (http://www.mkbergman.com/419/so-what-might-the-webs-subject-backbone-look-like/)

Graph Visualization: Force-Directed Edge Bundling
Idea: bundle edges which are
• parallel rather than perpendicular
• of similar lengths
• proximate in space
Reduces edge crossings significantly
Example: traffic between geographic locations [Holten & van Wijk 2009]

Force-Directed Edge Bundling
Input: straight-line node-link diagram
• For example generated with FDP
layout
Straight edges are subdivided into segments
• The shape changes along subdivision points
Forces applied on the subdivision points
• Spring forces between consecutive points of an edge: $F_s = K_p \cdot \lVert p_i - p_{i+1} \rVert$
  – Tend to straighten the edge, $K_p$ controls the amount of bundling
• Electrostatic forces between corresponding point pairs from different edges: $F_e = 1 / \lVert p_i - q_i \rVert$
  – Attractive force bundling the edges together
[Figure: edges P and Q with subdivision points P_i, P_{i+1}, Q_i, Q_{i+1} and the forces F_s and F_e]

Force-Directed Edge Bundling
Need a finer control of $F_e$
• For some point pairs $F_e$ is too strong, for some too weak
• Introduce edge compatibility measure $C_e(P, Q)$
Iteratively apply FDP
• Compute the resulting force $F_{p_i}$ for each point
• Move the point in the resulting direction
$$F_{p_i} = K_p \cdot \left(\lVert p_{i-1} - p_i \rVert + \lVert p_i - p_{i+1} \rVert\right) + \sum_{Q \in E} \frac{C_e(P, Q)}{\lVert p_i - q_i \rVert}$$

Force-Directed Edge Bundling: Edge Compatibility Measure
Edge compatibility measure $C_e$ (within the range [0,1]) composed of
• $C_a(P, Q)$ - angle compatibility: bundle almost parallel edges
• $C_s(P, Q)$ - scale compatibility: bundle edges of comparable length
• $C_p(P, Q)$ - position compatibility: bundle edges which are close
• $C_v(P, Q)$ - visibility compatibility: bundle edges which overlap
$$C_e(P, Q) = C_a(P, Q) \cdot C_s(P, Q) \cdot C_p(P, Q) \cdot C_v(P, Q)$$
Taken from [Holten & van Wijk 2009]

Force-Directed Edge Bundling: Example – Concept Co-occurrence over Time
Concepts and links extracted from 10 years of i-Know conference papers
Time encoding:
• Node importance for the time interval represented by the inner (blue) circle
• Timeline for choosing the time interval

Edge Routing
Idea: construct a grid and route the edges along the grid [Lambert et al.
2010]
Goals:
• Eliminate node-edge overlap completely!

Edge Routing: Grid Generation
Discretise the 2D plane into regions (area subdivision)
Use region boundaries as "roads" for routing edges
Grid generation
• Quad trees
  – Each area has exactly four children
  – Subdivide area until each data element (graph node) is assigned one tree node
  – "Multi-resolution" grid: grid density higher where node density is high
  – Disadvantage: rectangular edges
• Voronoi diagrams
  – Subdivide areas so that each data element (graph node) is assigned a convex polygon

Edge Routing: Voronoi Subdivision
Brute force approach
• Compute Delaunay triangulation for the data points (black)
• Compute perpendicular bisectors to triangulation edges (red)
• Compute intersections of adjacent edges
  – These points define the Voronoi polygons
More efficient algorithm for the 2D plane: Fortune's algorithm
• O(|V|*log(|V|)) time complexity
(image: Wikipedia)
Weighted Voronoi diagrams
• Consider node weight
• Slide bisectors towards lower-weight nodes
• Assign more area to nodes with higher weight
  – Useful: reserve area for icons of different sizes [Andrews et al.
2003]

Edge Routing: Dijkstra's Shortest Path Algorithm
For a given node, finds the lowest-weight path to any other node
• For the current node: check unvisited neighbours and update distances (if smaller)
• Mark the current node as visited, make the unvisited node with the smallest distance the new current node
• Finished when the target node is marked visited
Efficient implementation complexity: O(|E|+|V|*log|V|)

Edge Routing: Road Metaphor
Route original graph edges along the grid using Dijkstra
Problem: edges follow many different paths - low bundling of edges
Road metaphor: group edges along highly used paths
• Reduce the weights of frequently used paths
• Recompute the shortest path for each original edge
• Frequently used paths attract more edges, becoming "highways"
• Iterate a fixed number of times
Disadvantages of the edge routing method
• Very high edge overlap
• Edges cannot be distinguished and followed easily

Label Overlap Reduction
Overlapping labels can massively impair readability
Possible solutions:
• Size-sensitive layout algorithms
  – Assign enough space for each label (e.g. using FDP)
  – Works only for a limited number of labels
  – Attention when zooming: dynamic label layout is annoying for users [Granitzer et al. 2004]
• Prioritisation
• Aggregation
Example: Abraham Lincoln's family tree [http://jay.askren.net/Genealogy/Projects/Tree/AbrahamLincolnRadialGraph.JPG]

Label Overlap Reduction: Label Prioritisation
• Assign a priority to each label
• Begin label rendering with the highest-priority labels
• If there is overlap (or too high a label density), do not render the label
• Re-evaluate when the zoom factor changes
[Kienreich et
al. 2007]

Aggregation and Hierarchical Layout

Clustering Application
Browsing data collections
• Apply clustering recursively to compute a hierarchy
• Labeled hierarchy as "virtual table of contents"
[Figure: Feature Vectors, Similarities, Cluster hierarchy]

Scalable Layout Algorithm
Idea:
• Aggregate a large data set to a small number of clusters (hierarchically)
• Apply FDP separately on a small number of items (makes it fast)
  – Clusters
  – Clusters' children
  – …down to data element level
• Important: number of children is strictly limited (e.g. 20-50)

Scalable Layout Algorithm
Input:
• Base area (rectangle, circle…)
• Data elements (nodes, documents)
  – Incl. possibility to compute similarity between them
• Edges (in case of graphs)
Output:
• Hierarchy of nested areas
• 2D data element positions
• Edge geometry (for graphs)

Scalable Layout Algorithm
Recursive algorithm:
• Aggregation:
  – Clustering (e.g. k-means) and cluster labeling
  – Alternative: use an existing hierarchy
    » E.g. class hierarchy of an ontology, file system hierarchy…
• Layout:
  – Force-directed placement
  – Inscribing into parent area
• Area subdivision: Voronoi diagrams
• For each cluster: cluster size > threshold?
  – Yes: apply algorithm recursively on the cluster
  – No: lay out data elements (bottom level) and terminate

Inscribing Into Voronoi Areas
Given:
• Set of 2D points in the plane {p1, … pn}, normalised into [-1,1] space
• A Voronoi area A with m bounding edges aj
Procedure
• Compute centre of gravity c for A, align it with plane origin (0,0)
• For each pi
  – Cast a ray ri from c to pi
  – Find intersected aj and intersection point qi
  – Calculate pi': translate pi along ri so that d(c, pi')/d(c, pi) = d(c, qi)
[Figure: area A with centre c, ray ri, point pi, intersection point qi and translated point pi'; both axes range from -1 to 1]

Scalable Layout Algorithm: Advantages
• Hierarchical labeled geometry
  – Navigation and exploration along the hierarchy
  – Labels and geometry adapted to the level of detail
• Scalable
  – Time and space complexity: O(n*log(n))
  – Parallelization fairly straightforward
  – Tunable (FDP-based)

Text - Information Landscape Visualisation

Scalable Graph Layout: Edge Aggregation
• Input: edges, nodes aggregated to meta-nodes
• Aggregate inter-cluster edges to meta-edges connecting meta-nodes
  – Bottom-up propagation of inter-cluster edges until they connect siblings
  – Meta-edge weight: aggregated weight of inner-cluster edges
• Inner-cluster edges remain unaffected

Graph Layout: Hierarchical Edge Bundling
• Edge bundling applied locally
  – Within a meta-node's area
  – On relations between the meta-node's children
  – For all levels of hierarchy

Graph Visualization: Edge Bundling – Flat vs.
Hierarchical Aggregation
• Reduces edge clutter massively
• Automatic meta-node/edge expansion on zoom in
• Disadvantage: some node-edge overlap remains

Graph Layout: Hierarchical Edge Routing
• Route edges along the hierarchical (Voronoi) grid
• Inter-cluster edges are routed over several hierarchy levels
  – Apply Dijkstra's shortest path algorithm in a top-down manner
  – Locally within an area's Voronoi polygon

Graph Visualization: Edge Routing – Flat vs. Hierarchical Aggregation
• Reduces edge clutter and eliminates edge-node overlap
• Disadvantage: massive edge overlap on Voronoi boundaries
  – Edge stroke indicator for number of overlapping edges

Scalable Graph Visualization: Summary
[Pipeline diagram: Node Aggregation, Edge Aggregation, Hierarchical Node Layout, Hierarchical Edge Bundling, Grid Generation, Hierarchical Edge Routing, Graph Visualization]

Summary: Scalable Layout Pipeline
Mathematical model: Vector Space Model
• Data (text, graphs)
• Feature engineering: NLP, feature frequencies + TF/IDF, node connectivity…
• Similarity/connectivity: Euclidean distance, cosine coefficient…
  – Result: similarities
• Aggregation: recursive k-means, hierarchical agglomerative clustering…
  – Result: aggregation (hierarchy), "virtual table of contents"
• Layout: FDP, Voronoi subdivision, edge bundling/routing…
  – Result: hierarchical geometry, spatial proximity conveys relatedness
• Rendering: HTML5 canvas, SVG…
  – Result: visualizations

Level of Detail Rendering

Variable Level of Detail (LOD)
• Coarse-grained overview
  – Decreased complexity of representation for far-away objects
• Provide more details when zooming in
  – Technique well-known from 3D environments and from Google Maps
•
Make use of the hierarchical geometry
• Only a limited amount of geometrical detail shown at each moment
  – Reduces clutter
  – Reduces cognitive load on the user
  – Useful for low computing power devices (Web, mobile)
    » Client loads more detailed geometry on demand
    » And discards it when not needed

Variable Level of Detail (LOD): Mouse Anatomy Ontology
[Figures: zoomed out, zoom level 1, zoom level 2, zoom level 3]

Possible Applications for Graph Visualisation
• Analysis of Twitter data
  – Nodes: users
  – Edges: communication (weighted) between users
  – Graph visualisation: show groups of communicating users, communication between groups
  – Label groups with conversation topics: frequent terms from the tweets
• Visualisation of aligned ontologies
  – Alignment: connect nodes with equal or similar names (high weight)
  – Graph visualisation: explore which ontology parts could be aligned and which not
  – Let users interactively connect and disconnect nodes

Thank you
Next lecture on 23.04.2015: "Visualisation of Semantic Data"
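
Supplementary sketch for the Force-Directed Placement slides. The Python code below is a minimal illustration under stated assumptions, not the lecture's implementation. It assumes that the basic force is the difference between the low-dimensional and the high-dimensional distance, and it applies the averaged position update from the item position computation slide. The function names, the `step` damping factor, the `init` argument and the fixed iteration count are hypothetical choices made here for illustration; the slides instead stop once object movement or stress has subsided.

```python
import math
import random


def euclidean(p, q):
    """Distance between two 2D points in the layout (low-dimensional) space."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def stress(high, pos):
    """Squared differences between target and layout distances over all
    ordered pairs i != j, scaled by 1/(N-1)."""
    n = len(pos)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += (high[i][j] - euclidean(pos[i], pos[j])) ** 2
    return total / (n - 1)


def fdp_layout(high, iterations=500, step=0.2, init=None, seed=0):
    """Brute-force force-directed placement for a full distance matrix.

    high: N x N matrix of target (high-dimensional) distances.
    step: damping factor (an assumption; not part of the slide formula).
    init: optional starting positions; random positions are used otherwise.
    """
    n = len(high)
    rng = random.Random(seed)
    if init is None:
        pos = [[rng.random(), rng.random()] for _ in range(n)]
    else:
        pos = [list(p) for p in init]
    for _ in range(iterations):
        new_pos = []
        for i in range(n):
            dx = dy = 0.0
            for j in range(n):
                if i == j:
                    continue
                low = euclidean(pos[i], pos[j])
                # Assumed basic force: positive (attractive) when the pair is
                # too far apart in the layout, negative (repulsive) otherwise.
                force = low - high[i][j]
                dx += force * (pos[j][0] - pos[i][0])
                dy += force * (pos[j][1] - pos[i][1])
            # Average over the N-1 other items, then apply the damped move.
            new_pos.append([pos[i][0] + step * dx / (n - 1),
                            pos[i][1] + step * dy / (n - 1)])
        pos = new_pos
    return pos


if __name__ == "__main__":
    # Three items whose target distances form a right triangle.
    target = [[0.0, 0.6, 0.8],
              [0.6, 0.0, 1.0],
              [0.8, 1.0, 0.0]]
    layout = fdp_layout(target, init=[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print("final stress:", stress(target, layout))
```

Every item interacts with every other item in each iteration, so this sketch has the brute-force cost that the scalability slides set out to reduce. The `stress` helper can serve both as a stop condition and as a layout quality measure.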