
Information Visualization

Lecture Outline
- Overview of information visualization
- The role of visualization in the process of data mining
- The patterns being sought: clusters and outliers
- Issues when visualizing higher-dimensional relationships
- Criteria for comparison
- A range of visualization techniques for exploratory data analysis

Information Visualization
- A conjunction of a number of fields:
  - Data Mining
  - Cognitive Science
  - Graphic Design
  - Interactive Computer Graphics

Information Visualization
- Information visualization uses visual approaches and dynamic controls to support understanding and analysis of multidimensional data.
- The data may have no inherent 2D or 3D semantics and may be abstract in nature: there is no underlying physical model. Much of the data in databases is of this type.

Role of Information Visualization
- Acts as an exploratory tool
- Useful for identifying subsets of the data
- Structures, trends and outliers may be identified
- By contrast, statistical tests tend to incorporate isolated instances into a broader model as they attempt to formulate global features
- There is no requirement for a hypothesis, but the techniques can also support the formulation of hypotheses if wanted

Integrating Visualization With Data Mining
- There are four possible approaches:
  1. Use the visualization technique to present the results of the data mining process.
  2. Use visualization techniques as complements to the data mining process.
     - They complement and increase understanding in a passive way.

Integrating Visualization With Data Mining
  3. Use visualization techniques to steer the data mining process.
     - The visualization aids in deciding which data mining technique to use and which subsets of the data to consider.
  4. Apply data mining techniques to the visualization rather than directly to the data.
     - The idea is to capture the essential semantics visually, then apply the data mining tools.

Knowledge Discovery in Databases (a.k.a. Data Mining)
[Figure: The Knowledge Discovery in Databases (KDD) process [AdZ 1996]. Stages shown: data selection and cleaning (domain consistency, de-duplication, disambiguation), enrichment, coding, data mining (clustering, segmentation, prediction) and reporting, driven by an information requirement and leading to action, with a feedback loop; inputs are operational data and external data.]

Visualization in the Context of the Data Mining Process
- Visualization tools can potentially be used at a number of steps in the DM process. But:
  - the same tools may not be appropriate at each step
  - how they are used may differ from step to step

Visualization in the Context of the Data Mining Process
- In general, it does not matter whether data visualization is the first step in the process or not:
  - the feedback loop that moves the process forward may be started by either a visualization or a query

Visualization in the Context of the Data Mining Process
- Some visualizations (e.g. the Spotfire example on slide 25) require an initial query to generate a visualization:
  - this is an example of a complementary approach
  - questions generate visualizations, which may prompt further questions or generate hypotheses

Motivations for Visualization
- The human visual system is extremely good at recognizing patterns:
  - it is quicker and easier to understand visual representations than to absorb information from language or formal notations.
- Exploratory visualization assists in:
  - identifying areas of interest
  - identifying questions that might usefully be asked

Motivations for Visualization
- That is, a relevant or revealing visualization of part or all of a data set may suggest useful questions and/or hypotheses to the analyst. These can then be confirmed by more rigorous approaches.
- For example, some clustering techniques require an initial estimate of the number of clusters present in the data:
  - visualization techniques can assist in this estimation

Criteria for Comparison of Visualization Tools
- Number of dimensions that can be represented
- Number of data items that can be handled
- Ability to handle categorical and other non-numeric data types
- Ability to reveal patterns
- Ease of use
- Learning curve (how intuitive the technique is)

Examples - Scatterplot
- For each pair of features (i.e. fields of records) in a multidimensional database, every record is graphed as a point in two dimensions (2D).
- This straightforward graphing procedure produces a simple scatterplot: a projection of the multidimensional data into 2D.

Examples - Scatterplot
- The scatterplots of all pairwise combinations of features are arranged in a matrix:
  - The figure on the following slide shows a scatterplot matrix of 3D data from a study of abrasion loss in tyres. The features are hardness, tensile strength and abrasion loss [Tie 1989].
  - Each "sub-graph" gives insight into the relationship between a pair of features.
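The two scatterplot slides describe a simple procedure: project every record onto a pair of features, then repeat for every pair to fill the matrix. A minimal sketch, assuming records are plain tuples of numbers; the function names are hypothetical and the toy values are invented, not the [Tie 1989] measurements.

```python
from itertools import combinations

def project(records, i, j):
    """Project each record onto features i and j: one 2D point per record."""
    return [(r[i], r[j]) for r in records]

def scatterplot_matrix(records, feature_names):
    """Map each pair of feature names to the 2D points of its panel.

    Only the panels above the diagonal are built; each panel below the
    diagonal shows the same points with the axes swapped.
    """
    panels = {}
    for i, j in combinations(range(len(feature_names)), 2):
        panels[(feature_names[i], feature_names[j])] = project(records, i, j)
    return panels

# Toy records, invented for illustration (not the [Tie 1989] measurements).
records = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
names = ["hardness", "tensile_strength", "abrasion_loss"]
for (fx, fy), points in scatterplot_matrix(records, names).items():
    print(f"{fx} vs {fy}: {points}")
```

A plotting library would draw each entry of the returned dictionary as one panel; the dictionary form keeps the sketch dependency-free.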
![Scatterplot matrix of abrasion loss data, Tie 1989](https://slidetodoc.com/presentation_image_h2/7f8e0fb19b89fe4b7ae84dfeb87bb031/image-17.jpg)
Scatterplot Matrix
- Scatterplot matrix of abrasion loss data [Tie 1989]
![Possible problems with scatterplots, Everitt 1978](https://slidetodoc.com/presentation_image_h2/7f8e0fb19b89fe4b7ae84dfeb87bb031/image-18.jpg)
Possible Problems With Scatterplots
- Everitt [Eve 78, p. 5] gives two reasons why scatterplots can prove unsatisfactory:
  - if the number of features is greater than about 10, the number of plots to be examined is very large
    - this is just as likely to lead to confusion as to knowledge of the structures in the data.
  - structures existing in a multidimensional data set do not necessarily appear in the 2D projections of the features represented in scatterplots (see the following slides)

Possible Problems With Scatterplots
- Despite these potential problems, variations on the scatterplot approach are the most commonly used of all the visualization techniques.

Scatterplots: Recognizing High-dimensional Structures - 1
- A structure that appears as a cluster in a 2D projection may in fact be a "pipe" in 3D:
  - a pipe is a structure in 3D that looks like a rod or pipe when viewed in a 3D representation

Scatterplots: Recognizing High-dimensional Structures - 1
- While the pipe is easily identifiable in a 3D display, only projections of it appear in the 2D components of the scatterplot matrix:
  - depending on the orientation of the pipe in 3D, it may not appear as an obvious cluster, if it appears at all
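The pipe example can be reproduced with synthetic data. A minimal sketch, assuming a pipe is modelled as points spread along a 3D axis with small uniform jitter in each coordinate; the helper names, sizes and jitter are hypothetical choices. A pipe aligned with the third feature collapses to a tight cluster in the x-y panel but shows up as a long stripe in the x-z panel.

```python
import random

def make_pipe(direction, n=200, length=10.0, jitter=0.2, seed=0):
    """Sample n points along a thin 'pipe' through the origin in 3D.

    direction: the pipe's axis as a unit 3-vector. Points are spread uniformly
    along the axis, then every coordinate gets a small uniform jitter.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        t = rng.uniform(-length / 2, length / 2)
        points.append(tuple(t * d + rng.uniform(-jitter, jitter) for d in direction))
    return points

def extent(points, i, j):
    """Range of the 2D projection onto features i and j, along each of its two axes."""
    xs = [p[i] for p in points]
    ys = [p[j] for p in points]
    return (max(xs) - min(xs), max(ys) - min(ys))

pipe = make_pipe((0.0, 0.0, 1.0))        # pipe aligned with the third feature
print("x-y panel:", extent(pipe, 0, 1))  # both ranges tiny: looks like one tight cluster
print("x-z panel:", extent(pipe, 0, 2))  # long along z: looks like a stripe
```

Changing the direction to a diagonal such as (1, 1, 1)/√3 turns every pairwise panel into an elongated diagonal stripe instead, which illustrates how the appearance depends on the pipe's orientation relative to the axes.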

Scatterplots: Recognizing High-dimensional Structures - 1
- Equivalent structures can exist in higher dimensions, e.g. a cluster in 5D might be a "pipe" in 6D:
  - whether high-D structures appear in lower-D projections depends on the luck and skill of the analyst in choosing the projections, and on the alignment of the structures with the axes

Scatterplots: Recognizing High-dimensional Structures - 2
[Figure panels: Random (uniform); A cluster in 2D; May be a plane in 3D; May be a pipe in 3D (or a cluster in 3D)]

Example Tool: Spotfire (http://www.spotfire.com/)

Example Tool: Spotfire (http://www.spotfire.com/)
- The user interacts with the data by choosing which features will form the horizontal and vertical axes
- Other features can be represented by color:
  - this is an example of using the richness of visual representations to give the user more information. As well as 2D spatial position, other modes such as colour, size, shape and even sound can be used to convey information about high-dimensional data
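Encoding a third feature as colour can be sketched as follows: two chosen features give the position and the third is binned into a few colours. This is a minimal illustration, assuming three equal-width colour bands; the palette, function names and toy records are hypothetical and do not describe Spotfire's actual colour scheme.

```python
def colour_for(value, low, high):
    """Map a value in [low, high] to one of three colours by equal-width bands.

    The blue/green/red palette is an assumption of this sketch, not Spotfire's scheme.
    """
    if high == low:
        return "blue"
    frac = (value - low) / (high - low)
    if frac < 1 / 3:
        return "blue"
    if frac < 2 / 3:
        return "green"
    return "red"

def encode(records, x, y, c):
    """Turn each record into (x, y, colour): features x and y give position, feature c gives colour."""
    values = [r[c] for r in records]
    low, high = min(values), max(values)
    return [(r[x], r[y], colour_for(r[c], low, high)) for r in records]

# Toy records (invented): the first two are close on all three features.
records = [(2, 7, 9.0), (2, 8, 9.5), (5, 1, 0.0), (9, 3, 4.0)]
for point in encode(records, 0, 1, 2):
    print(point)
```

If every point in a tight group on the x-y plot gets the same colour, the points also agree on the colour-coded feature. This is the reasoning behind reading a uniformly coloured group as a cluster in all three dimensions.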

Example Tool: Spotfire (http://www.spotfire.com/)
- On the previous slide, the data set contains a 3D cluster
- The cluster can be seen, with its centre at around (20, 74):
  - all the points in the cluster are red, i.e. they also share similar values on the colour-coded third feature, showing that it is a 3D cluster

Example Tool: DBMiner (http://www.dbminer.com/)

Example Tool: DBMiner (http://www.dbminer.com/)
- DBMiner is an integrated data mining tool
- It employs a data visualization known as a "data cube" (see On-Line Analytical Processing, OLAP)
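The slides do not describe DBMiner's internals, so the following is only a generic OLAP-style sketch of what a data cube holds. Each cell is indexed by one value per dimension and stores an aggregate (here a record count), and a roll-up sums out the dimensions that are not kept. The dimension names and records are hypothetical.

```python
from collections import Counter

def build_cube(records, dims):
    """Count records in each cell of a cube whose axes are the given dimensions.

    records: list of dicts; dims: names of categorical (or pre-binned) dimensions.
    """
    return Counter(tuple(r[d] for d in dims) for r in records)

def roll_up(cube, keep):
    """Aggregate onto a subset of the cube's axes (given by index) by summing the others out."""
    out = Counter()
    for cell, count in cube.items():
        out[tuple(cell[i] for i in keep)] += count
    return out

# Hypothetical records with three categorical dimensions.
rows = [
    {"region": "north", "product": "A", "year": 2000},
    {"region": "north", "product": "A", "year": 2000},
    {"region": "north", "product": "B", "year": 2001},
    {"region": "south", "product": "A", "year": 2001},
]
cube = build_cube(rows, ["region", "product", "year"])
print(cube[("north", "A", 2000)])  # records in one cell of the 3D cube
print(roll_up(cube, (0,)))         # totals per region
```

A cube visualization draws these cells in 3D, so a dense block of high-count cells is how a cluster of data instances shows up.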

Example Tool: DBMiner (http://www.dbminer.com/)
- After creating a data cube, the user can apply a variety of data mining techniques to analyze the data further, including:
  - association, classification, prediction, clustering, etc.
- The figure on the preceding slide shows a data cube for a data set that has a 3D cluster of data instances in a 3D space

Examples: Parallel Coordinates -1
- Uses the idea of mapping a point in a multidimensional feature space onto a number of parallel axes
- Each feature is mapped to one axis:
  - as many axes as needed can be lined up side by side
  - in principle, there is no limit to the number of dimensions that can be represented

Examples: Parallel Coordinates -1
- A single polygonal line connects the individual coordinate mappings for each point
- The technique has been applied in air traffic control, robotics, computer vision and computational geometry

Examples: Parallel Coordinates -2
[Figure: parallel axes $X_1, X_2, X_3, \ldots, X_{i-1}, X_i, \ldots, X_n$ for $\mathbb{R}^n$. The polygonal line shown represents the point $C = (C_1, \ldots, C_{i-1}, C_i, C_{i+1}, \ldots, C_n)$.]
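The mapping can be written out directly: axis i sits at horizontal position i, and the point's i-th coordinate $C_i$ gives the height of the polyline's vertex on that axis. A minimal sketch, assuming each axis is rescaled to [0, 1] by that feature's minimum and maximum, which is a common but not the only normalisation. The function names are hypothetical.

```python
def to_polyline(point, mins, maxs):
    """Map an n-dimensional point to the vertices of its parallel-coordinates polyline.

    Axis i is a vertical line at horizontal position i; coordinate i is rescaled
    to [0, 1] with that feature's min and max (assumed normalisation) and becomes
    the vertex height on axis i. A constant feature is drawn at height 0.
    """
    vertices = []
    for i, (c, lo, hi) in enumerate(zip(point, mins, maxs)):
        height = 0.0 if hi == lo else (c - lo) / (hi - lo)
        vertices.append((i, height))
    return vertices

def polylines(points):
    """Polylines for a whole data set, normalising each axis by its own range."""
    mins = [min(col) for col in zip(*points)]
    maxs = [max(col) for col in zip(*points)]
    return [to_polyline(p, mins, maxs) for p in points]

# Two toy 3D points; each polyline has one vertex per axis.
for line in polylines([(0, 10, 5), (4, 20, 5)]):
    print(line)
```

Adding a dimension only adds one more axis and one more vertex per polyline, which is why the number of representable dimensions is not limited by the drawing itself.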

Examples: Parallel Coordinates -3
- The parallel coordinates visualization technique is employed in the software WinViz (http://www.computer.org/intelligent/ex1996/x5069abs.htm)
- The main advantage of the technique is that it can represent an unlimited number of dimensions

Examples: Parallel Coordinates -3
- When many points are represented using parallel coordinates, the overlap of the polygonal lines can make it difficult to identify structures in the data.
- Certain structures, such as clusters, can often be identified, but others are hidden by the overlap.

Two Clusters in WinViz

Examples: Stick Figures
- The stick figure technique is intended to make use of the user's low-level perceptual processes [PGL 1995], such as perception of:
  - texture, color, motion and depth
- The hope is that the user will "automatically" try to make physical sense of the pictures of the data created

Examples: Stick Figures
- Visualizations that represent multidimensional feature spaces using a number of subspaces of 3D or less (e.g. scatterplots) rely more on our cognitive abilities than on our perceptual abilities
- Stick figures avoid this and present all variables and data points in a single representation.
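The slides do not spell out how a record becomes a stick figure. A common arrangement, assumed here, is that two attributes position the icon on the screen and the remaining attributes set the angles of its limbs, so that many densely packed icons form a texture. The sketch simplifies further by attaching all limbs to a single joint; the function names, attribute names and 90-degree range are hypothetical.

```python
import math

def stick_figure(values, base_angles, limb_length=1.0):
    """Line segments for one stick-figure icon centred at the origin.

    values: attribute values already scaled to [0, 1] (assumed).
    base_angles: resting angle, in radians, of each limb. Each value rotates
    its limb by up to 90 degrees. For brevity all limbs share one joint.
    """
    segments = []
    for v, base in zip(values, base_angles):
        angle = base + v * math.pi / 2
        end = (limb_length * math.cos(angle), limb_length * math.sin(angle))
        segments.append(((0.0, 0.0), end))
    return segments

def place_icons(records, x, y, limb_attrs, base_angles):
    """Position each icon by two attributes and shape its limbs with the others."""
    icons = []
    for r in records:
        shape = stick_figure([r[a] for a in limb_attrs], base_angles)
        icons.append(((r[x], r[y]), shape))
    return icons

# Toy records (invented); limb attributes are already scaled to [0, 1].
records = [{"lon": 3, "lat": 4, "a": 0.0, "b": 1.0},
           {"lon": 5, "lat": 1, "a": 0.5, "b": 0.5}]
icons = place_icons(records, "lon", "lat", ["a", "b"], [0.0, math.pi])
```

Because neighbouring records with similar values produce similarly shaped icons, regions of the display take on a uniform texture, which is the low-level perceptual cue the technique relies on.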

Iconographic display using stick figures: US Census data (http://ivpr.cs.uml.edu/gallery/)


Examples: Pixel-based techniques (http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/visdb.html)
- Query-dependent pixel-based techniques:
  - based on a query, a "semantic distance" is calculated between each of the query's feature values and the corresponding features of each instance in the database
  - the distance is mapped to colour for each attribute
  - the overall distance between the data values of a specific instance and the attribute values used in the query predicate is also calculated
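The distance steps can be sketched as follows. The slides do not give the actual distance functions used in VisDB, so the choices below are assumptions: absolute difference for numeric attributes, a 0/1 mismatch for everything else, a weighted sum for the overall distance (without the per-attribute normalisation a real system would apply), and grey levels instead of a colour scale. All names are hypothetical.

```python
def attribute_distances(record, query):
    """Per-attribute distance between a record and the query's target values.

    Absolute difference for numbers, 0/1 mismatch otherwise (assumed metrics).
    """
    dists = {}
    for attr, target in query.items():
        value = record[attr]
        if isinstance(value, (int, float)) and isinstance(target, (int, float)):
            dists[attr] = abs(value - target)
        else:
            dists[attr] = 0.0 if value == target else 1.0
    return dists

def overall_distance(dists, weights=None):
    """Combine per-attribute distances into one relevance score (weighted sum, assumed)."""
    weights = weights or {}
    return sum(weights.get(a, 1.0) * d for a, d in dists.items())

def to_colour(distance, max_distance):
    """Grey level 0..255: 255 (bright) for an exact match, 0 at or beyond max_distance."""
    if max_distance == 0:
        return 255
    return round(255 * (1 - min(distance, max_distance) / max_distance))

record = {"age": 40, "city": "Paris"}
query = {"age": 35, "city": "Paris"}
d = attribute_distances(record, query)
print(d, overall_distance(d), to_colour(d["age"], 10))
```

Each attribute's distance colours that attribute's pixel, while the overall distance decides where the record's pixels are placed.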

Examples: Pixel-based techniques (http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/visdb.html)
- Instances are arranged on the screen with the most relevant data items in the centre of the display, proceeding outwards in a spiral
- The values for each of the attributes are presented in separate subwindows
- The arrangement inside the subwindows is according to the overall distance
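The spiral arrangement can be sketched by generating grid cells in a square spiral outwards from the centre and assigning records to them in order of increasing overall distance. This is a minimal illustration, assuming a square spiral on an integer grid; the exact spiral shape used by VisDB may differ, and the function names are hypothetical.

```python
def spiral_positions(n):
    """First n grid cells of a square spiral starting at (0, 0) and winding outwards."""
    x = y = 0
    positions = [(0, 0)]
    dx, dy = 1, 0                     # start by moving right
    step = 1
    while len(positions) < n:
        for _ in range(2):            # each step length is used for two legs
            for _ in range(step):
                x, y = x + dx, y + dy
                positions.append((x, y))
                if len(positions) == n:
                    return positions
            dx, dy = -dy, dx          # turn 90 degrees counter-clockwise
        step += 1
    return positions[:n]

def arrange(records, distance_of):
    """Map spiral cells to records: smallest overall distance at the centre, the rest spiralling out."""
    ordered = sorted(records, key=distance_of)
    return dict(zip(spiral_positions(len(ordered)), ordered))

layout = arrange(["c", "a", "b"], {"a": 0.0, "b": 1.0, "c": 2.0}.get)
print(layout)
```

The same position mapping would be reused in every attribute's subwindow, so a record occupies the same cell everywhere and only its colour differs from one subwindow to the next.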
![Query-dependent pixel-based techniques: overall distance, result of a complex query, Keim and Kriegel 1994](https://slidetodoc.com/presentation_image_h2/7f8e0fb19b89fe4b7ae84dfeb87bb031/image-44.jpg)
Query-Dependent Pixel-based Techniques: Overall Distance
- Result of a complex query [KeK 1994]

Examples: Worlds within Worlds (http://www.cs.columbia.edu/graphics/projects/AutoVisual.html)
- Employs virtual reality devices to represent an n-D virtual world in 3D or 4D ("hyperworlds")
- The basic approach to reducing the complexity of a multidimensional function is to hold one or more of its independent variables constant:
  - this is equivalent to taking an infinitely thin slice of the world perpendicular to the constant variable's axis
  - it can be repeated until 3 dimensions remain, and the resulting slice can be manipulated and displayed with conventional 3D graphics hardware

Examples: Worlds within Worlds (http://www.cs.columbia.edu/graphics/projects/AutoVisual.html)
- After reducing the higher-dimensional space to 3 dimensions, the additional dimensions can be added back by embedding additional 3D worlds within the first 3D world
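Holding variables constant and nesting worlds can be sketched together. A point chosen in an outer 3D world fixes three of a function's variables, and the inner world displays the resulting slice as a function of the remaining three. This is a minimal illustration, not AutoVisual's implementation; the six-variable function, the variable names and `slice_function` are hypothetical.

```python
def slice_function(f, fixed):
    """Hold some independent variables constant, returning a function of the rest.

    f takes a dict of variable name -> value; fixed gives the values held
    constant. The result accepts only the remaining (free) variables.
    """
    def sliced(**free):
        return f({**fixed, **free})
    return sliced

def f(v):
    # Hypothetical six-variable function used only for illustration.
    return v["x1"] + 2 * v["x2"] + 3 * v["x3"] + v["x4"] * v["x5"] - v["x6"]

# Outer world: the point chosen in (x4, x5, x6) space fixes those variables.
outer_choice = {"x4": 1.0, "x5": 2.0, "x6": 0.5}
inner = slice_function(f, outer_choice)    # inner world: a function of x1, x2, x3 only
print(inner(x1=0.0, x2=1.0, x3=1.0))
```

Moving the chosen point in the outer world corresponds to calling `slice_function` again with new fixed values, which redraws the inner world.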

Worlds within Worlds

Dynamic Techniques
- Allow interaction with the visualization to explore the data more effectively. Can potentially be applied to all visualization techniques:
  - dynamic linking of the data attributes to the parameters of the visualization
  - filtering
  - linking and "brushing" between multiple visualizations
  - zooming
  - details on demand
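Two of these techniques reduce to simple bookkeeping. Filtering keeps only the records that pass the current settings. Linking and brushing keeps one shared selection, so records brushed in one view are highlighted in every other view. A minimal sketch with hypothetical class and function names; rendering is left out entirely.

```python
class LinkedViews:
    """Several views over the same records sharing one selection (linking and brushing).

    Views are identified only by name here; a real tool would redraw them.
    """

    def __init__(self, records, view_names):
        self.records = records
        self.views = list(view_names)
        self.selected = set()      # indices of brushed records, shared by all views

    def brush(self, predicate):
        """Select the records satisfying predicate (e.g. a rectangle drawn in one view)."""
        self.selected = {i for i, r in enumerate(self.records) if predicate(r)}

    def highlighted(self, view):
        """Records to highlight in the given view: the same shared selection for every view."""
        if view not in self.views:
            raise KeyError(view)
        return [self.records[i] for i in sorted(self.selected)]

def filter_records(records, predicate):
    """Dynamic filtering: keep only records passing the current slider or query settings."""
    return [r for r in records if predicate(r)]

recs = [{"x": 1, "y": 5}, {"x": 3, "y": 2}, {"x": 4, "y": 9}]
views = LinkedViews(recs, ["scatterplot", "parallel_coordinates"])
views.brush(lambda r: r["x"] > 2)                 # brushed in the scatterplot...
print(views.highlighted("parallel_coordinates"))  # ...and highlighted in the other view
```

Keeping the selection as record indices, rather than per-view state, is what makes every view agree on what is brushed.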

Other Techniques
- Keim and Kriegel's query-independent approach
- Chernoff faces (http://www.fas.harvard.edu/~stats/Chernoff/Hcindex.htm)
- Cone trees
- Perspective walls
- Visualization spreadsheet
- A number of techniques developed especially for web pages and their links

Web References
- More lectures and demo software are available at: http://www.cs.auc.dk/·DVDM/courses.html