Multivariate Data Visualization
Adapted from slides by Matthew O. Ward, Computer Science Department, Worcester Polytechnic Institute.
This work was supported under NSF Grant IIS-9732897.

What is Multivariate Data?
• Each data point has N variables or observations
• Each observation can be:
  ◦ nominal or ordinal
  ◦ discrete or continuous
  ◦ scalar, vector, or tensor
• May or may not have spatial, temporal, or other connectivity attribute

Characteristics of a Variable
• Order: grades have an order, brand names do not.
• Distance metric: for income, distance equals difference. For rankings, difference is not a distance metric.
• Absolute zero: weight has a fixed lowest value of zero, a bank balance does not.
• A variable can be classified by these three attributes, called Scale.
• Effective visualizations attempt to match the scale of the data dimension with the graphical attribute conveying it.
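The classification by scale can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function name `classify_scale` is hypothetical, the scale names (nominal, ordinal, interval, ratio) are the conventional ones rather than terms used on the slide, and the third attribute is assumed to be the existence of an absolute zero.

```python
def classify_scale(has_order: bool, has_distance: bool, has_absolute_zero: bool) -> str:
    """Name the scale of a variable from three yes/no attributes.

    Assumption: the attributes nest, so a distance metric implies an order
    and an absolute zero is only meaningful once a distance metric exists.
    """
    if not has_order:
        return "nominal"    # e.g. brand names
    if not has_distance:
        return "ordinal"    # e.g. grades, rankings
    if not has_absolute_zero:
        return "interval"   # e.g. temperature in degrees Celsius
    return "ratio"          # e.g. income


for name, attrs in {
    "brand name": (False, False, False),
    "ranking": (True, False, False),
    "income": (True, True, True),
}.items():
    print(name, "->", classify_scale(*attrs))
```

A visualization tool could use such a label to pick a graphical attribute, for example hue for nominal data and position or length for ratio data.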

Sources of Multivariate Data
• Sensors (e.g., images, gauges)
• Simulations
• Census or other surveys
• Commerce (e.g., stock market)
• Communication systems
• Spreadsheets and databases

Issues in Visualizing Multivariate Data
• How many variables?
• How many records?
• Types of variables?
• User task (exploration, confirmation, presentation)
• Data feature of interest (clusters, anomalies, trends, patterns, …)
• Background of user (domain expert, visualization specialist, decision-maker, …)

Methods for Visualizing Multivariate Data
• Dimensional Subsetting
• Dimensional Reorganization
• Dimensional Reduction

Dimensional Subsetting
• Scatterplot matrix displays all pairwise plots
• Selection allows linkage between views
• Clusters, trends, and correlations readily discerned between pairs of dimensions
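As a rough sketch of how a scatterplot matrix is laid out, the code below enumerates one panel per ordered pair of distinct dimensions and computes a Pearson correlation for a pair. The function names and the toy values are hypothetical and chosen only for illustration. Actual drawing is left to whatever plotting library is in use.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def scatterplot_matrix_panels(columns):
    """Return one panel description per off-diagonal cell of the N x N grid."""
    names = list(columns)
    panels = []
    for row, y_name in enumerate(names):
        for col, x_name in enumerate(names):
            if row == col:
                continue  # the diagonal usually holds a label or histogram
            panels.append({
                "row": row, "col": col, "x": x_name, "y": y_name,
                "points": list(zip(columns[x_name], columns[y_name])),
            })
    return panels


# Toy data, invented for this example.
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.1, 5.9, 8.0],
    "c": [4.0, 1.0, 3.0, 2.0],
}
panels = scatterplot_matrix_panels(data)
print(len(panels), "panels")
print("corr(a, b) =", round(pearson(data["a"], data["b"]), 3))
```

The number of panels grows as N(N-1), which is why this approach shows every pair but becomes crowded as the number of dimensions rises.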

Dimensional Reorganization
• Parallel Coordinates creates parallel, rather than orthogonal, dimensions.
• Data point corresponds to polyline across axes
• Clusters, trends, and anomalies discernible as groupings or outliers, based on intercepts and slopes
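The mapping from a data point to a polyline can be sketched as follows. This is a minimal illustration under stated assumptions: axes are evenly spaced vertical lines, each dimension is min-max normalized to [0, 1], and the function name `to_polylines` and the sample records are hypothetical.

```python
def to_polylines(records, axis_spacing=1.0):
    """Turn each N-D record into a list of (x, y) vertices, one per axis."""
    n_dims = len(records[0])
    lows = [min(r[k] for r in records) for k in range(n_dims)]
    highs = [max(r[k] for r in records) for k in range(n_dims)]
    polylines = []
    for rec in records:
        line = []
        for k, value in enumerate(rec):
            span = highs[k] - lows[k]
            # A constant dimension has no spread, so place it mid-axis.
            y = (value - lows[k]) / span if span else 0.5
            line.append((k * axis_spacing, y))
        polylines.append(line)
    return polylines


# Toy records, invented for this example.
records = [(1.0, 10.0, 100.0), (2.0, 30.0, 50.0), (3.0, 20.0, 0.0)]
polylines = to_polylines(records)
for line in polylines:
    print(line)
```

The slope of each segment between adjacent axes reflects how the two dimensions relate for that record, which is what makes groupings and outliers visible.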

Dimensional Reorganization
• Glyphs map data dimensions to graphical attributes
• Size, color, shape, and orientation are commonly used
• Similarities/differences in features give insights into relations
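A glyph mapping can be sketched as a function from normalized data values to drawing parameters. The particular assignment below (dimension 0 to size, dimension 1 to gray level, dimension 2 to orientation) is an arbitrary choice made for illustration, as are the function names and sample records. Glyph placement on the display is left out.

```python
def normalize_columns(records):
    """Min-max normalize every dimension to [0, 1]."""
    n_dims = len(records[0])
    lows = [min(r[k] for r in records) for k in range(n_dims)]
    highs = [max(r[k] for r in records) for k in range(n_dims)]
    return [
        [(r[k] - lows[k]) / (highs[k] - lows[k]) if highs[k] > lows[k] else 0.5
         for k in range(n_dims)]
        for r in records
    ]


def make_glyphs(records, min_size=4.0, max_size=20.0):
    """Map three data dimensions to size, gray level, and orientation."""
    glyphs = []
    for t in normalize_columns(records):
        glyphs.append({
            "size": min_size + t[0] * (max_size - min_size),
            "gray": round(255 * t[1]),   # 0 = black, 255 = white
            "angle_deg": 180.0 * t[2],   # orientation of the glyph
        })
    return glyphs


# Toy records, invented for this example.
records = [(0.0, 0.0, 0.0), (10.0, 5.0, 2.0), (5.0, 10.0, 1.0)]
glyphs = make_glyphs(records)
for g in glyphs:
    print(g)
```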

Dimensional Reduction
• Map N-D locations to M-D display space while best preserving N-D relations
• Approaches include multidimensional scaling (MDS), principal component analysis (PCA), and Kohonen Self-Organizing Maps
• Relationships conveyed by position, links, color, shape, size, etc.
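To make the N-D to M-D mapping concrete, here is a small PCA sketch in plain Python. It is one possible way to compute the projection, not the method prescribed by the slides: the leading eigenvectors of the covariance matrix are found by power iteration with deflation, a choice made here only to avoid external libraries. All function names and the sample rows are hypothetical. Power iteration can fail if the start vector happens to be orthogonal to the leading eigenvector, so treat this as an illustration rather than production code.

```python
from math import sqrt


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=200):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_project(rows, m=2):
    """Project N-D rows onto their first m principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(m):
        lam, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Toy 3-D rows lying close to a line, invented for this example.
rows = [[1.0, 2.0, 0.1], [2.0, 4.1, -0.1], [3.0, 5.9, 0.0],
        [4.0, 8.0, 0.1], [5.0, 10.1, -0.1]]
projected = pca_project(rows, m=2)
for p in projected:
    print([round(c, 3) for c in p])
```

The resulting 2-D coordinates can be used directly as display positions, with color, shape, or size conveying the remaining attributes.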

The Role of Selection
• User needs to interact with display, examine interesting patterns or anomalies, validate hypotheses
• Selection allows isolation of subset of data for highlighting, deleting, focused analysis
• Direct (clicking on displayed items) vs. indirect (range sliders)
• Screen space (2-D) vs. data space (N-D)
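The difference between data-space and screen-space selection can be sketched with two small filters. The function names, the slider ranges, and the sample records are hypothetical. The data-space version models indirect selection with one range per constrained dimension, while the screen-space version tests 2-D display positions against a rectangle.

```python
def select_in_data_space(records, ranges):
    """Return indices of records inside an N-D box.

    `ranges` maps a dimension index to (low, high), as a set of range
    sliders would. Dimensions not listed are left unconstrained.
    """
    return [
        i for i, rec in enumerate(records)
        if all(low <= rec[k] <= high for k, (low, high) in ranges.items())
    ]


def select_in_screen_space(screen_points, x_range, y_range):
    """Return indices of 2-D display positions inside a rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    return [
        i for i, (x, y) in enumerate(screen_points)
        if x0 <= x <= x1 and y0 <= y <= y1
    ]


# Toy records, invented for this example.
records = [(1.0, 10.0, 100.0), (2.0, 30.0, 50.0),
           (3.0, 20.0, 0.0), (2.5, 25.0, 75.0)]

# Sliders on dimensions 0 and 2 only.
picked_nd = select_in_data_space(records, {0: (1.5, 3.0), 2: (40.0, 100.0)})

# Assume the display plots dimension 0 against dimension 1.
screen = [(r[0], r[1]) for r in records]
picked_2d = select_in_screen_space(screen, (1.5, 3.0), (15.0, 27.0))

print(picked_nd, picked_2d)
```

Because both functions return record indices, the same selection can be highlighted in every linked view.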

Auxiliary Tools
• Extent scaling to reduce occlusion of bands
• Dimensional zooming - fill display with selected subspace (N-D distortion)
• Dynamic masking to fade out selected or unselected data
• Saving selected subsets
• Enabling/disabling dimensions
• Univariate displays (Tukey box plots, tree maps)
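As an example of a univariate display, the numbers behind a Tukey box plot can be computed as below. This is a sketch under assumptions the slides do not specify: quartiles use the "inclusive" method of Python's `statistics.quantiles`, and whiskers extend to the most extreme values within 1.5 times the interquartile range of the box. The function name and the sample values are hypothetical.

```python
from statistics import quantiles


def tukey_box_summary(values):
    """Return quartiles, whisker ends, and outliers for one variable."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Toy values, invented for this example.
summary = tukey_box_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Drawing one such box per dimension gives a quick overview of each variable's spread before moving to the multivariate views.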