Quick & Simple Introduction to Multidimensional Scaling
Professor Tony Coxon
Hon. Professorial Research Fellow, University of Edinburgh (apm.coxon@ed.ac.uk)
- See www.tonycoxon.com for information on me
- See www.newmdsx.com for an information resource on MDS and NewMDSX programs/doc.
- See: “The User’s Guide to MDS” and “Key Texts in MDS” (readings), Heinemann 1982. Available as pdf at £15 from newmdsx

What is Multidimensional Scaling?
A student’s definition:
- “If you are interested in how certain objects relate to each other … and if you would like to present these relationships in the form of a map, then MDS is the technique you need” (Mr Gawels, KUB)
A good start! MDS is a family of models structured by D-T-M:
- (DATA) the empirical information on inter-relationships between a set of “objects”/variables is given in a set of dis/similarity data
- (TRANSFORMATION) which are then re-scaled (according to the permissible transformations for the data / level of measurement), in terms of
- (MODEL) the assumptions of the model chosen to represent the data

MDS Solution
… to produce a SOLUTION, consisting of:
1. a CONFIGURATION, which is a
   i. pattern of points representing the “objects”
   ii. located in a space of a small number of dimensions (hence SSA: “Smallest-Space Analysis”)
   iii. where the distances between the points represent the dis/similarities between the data-points
   iv. as perfectly as possible (the imperfection/badness of fit is measured by Stress)
- “Low stress is desirable; No stress is perfection”

Distances & Maps
- Given a map, it’s easy to calculate the (Euclidean) distances between the points.
- MDS operates the other way round: given the “distances” [data], find the map [configuration] which generated them
  - … and MDS can do so when all but ordinal information has been jettisoned (fruit of the “non-metric revolution”),
  - even when there are missing data and in the presence of considerable “noise”/error (MDS is robust).
- MDS thus provides at least
  - [exploratory] a useful and easily assimilable graphic visualization of a complex data set (Tukey: “A picture is worth a thousand words”)
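
The “other way round” direction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any particular MDS program: it treats the data as exact (metric) distances and uses the Guttman-transform (SMACOF) update, a technique swapped in here for brevity. The function names, the rectangle example and the random seed are all hypothetical.

```python
import math
import random

def pairwise_distances(points):
    """The easy direction: Euclidean distances between all pairs of map points."""
    return [[math.dist(p, q) for q in points] for p in points]

def raw_stress(target, points):
    """Sum of squared differences between data and configuration distances."""
    dist = pairwise_distances(points)
    n = len(points)
    return sum((target[i][j] - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def guttman_step(target, points):
    """One Guttman-transform update (unit weights); it never increases raw stress."""
    n, dims = len(points), len(points[0])
    dist = pairwise_distances(points)
    new_points = []
    for i in range(n):
        coords = [0.0] * dims
        for j in range(n):
            if i == j or dist[i][j] == 0.0:
                continue  # coincident points contribute nothing
            ratio = target[i][j] / dist[i][j]
            for a in range(dims):
                coords[a] += ratio * (points[i][a] - points[j][a])
        new_points.append([c / n for c in coords])
    return new_points

def fit_map(target, dims=2, iterations=200, seed=0):
    """The hard direction: start from a random map and improve it step by step."""
    rng = random.Random(seed)
    points = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in target]
    history = [raw_stress(target, points)]
    for _ in range(iterations):
        points = guttman_step(target, points)
        history.append(raw_stress(target, points))
    return points, history

# Assumed example: the corners of a 3-by-4 rectangle play the role of the "true" map.
true_map = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
data = pairwise_distances(true_map)
recovered, history = fit_map(data)
```

Any recovered map can match the original only up to rotation, reflection and translation, and a different starting configuration may end in a local minimum.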

What is like MDS?
Related and Special-case Models:
- Metric Scalar Products Models:
  - *PRINCIPAL COMPONENTS ANALYSIS
  - FACTOR ANALYSIS (+ communalities)
- Metric and Non-Metric Ultrametric Distance, Discrete models
  - *Hierarchical Clustering
  - *Partition Clustering (CONPAR)
  - Additive Clustering (2- and 3-way)
- Metric Chi-squared Distance Model for 2W2M and 3W data / Tables
  - *Simple (2W2M) and Multiple (3W) Correspondence Analysis
- BECAUSE OF NON-METRIC (MONOTONE) REGRESSION, MDS ALSO OFFERS ORDINAL EQUIVALENTS OF:
  - *ANOVA
  - other simple composition models … *UNICON
(All models with asterisk * exist as programs within NewMDSX)

How does MDS differ from other Multivariate Methods?
Compared to other multivariate methods, MDS models usually:
- are distribution-free
  - (though MLE models do exist, e.g. Ramsay’s MULTISCALE)
- make conservative (non-metric) demands on the structure of the data,
- are relatively unaffected by non-systematic missing data,
- can be used with a very wide variety of types of data:
  - direct data (pair comparisons, ratings, rankings, triads, sortings)
  - derived data (profiles, co-occurrence matrices, textual data, aggregated data)
  - measures of association/correlation etc. derived from simpler data, and
  - tables of data.
- offer a range of transformations
  - monotonic (ordinal), linear/metric (interval), but also log-interval, power, “smoothness”, and even “maximum variance non-dimensional scaling” (Shepard)

How does MDS differ from other Multivariate Methods (2)?
Compared to other multivariate methods, MDS models also offer:
- a range of models: chiefly distance (Euclidean, but also City-block), factor/vector (scalar-products) and simple composition (additive). There are also hierarchies of models:
  - Similarity models: 2W1M METRIC – 3W2M INDSCAL – IDIOSCAL (honest!)
  - Preference models: vector – distance – weighted distance – rotated, weighted distance (PREFMAP)
  - Procrustes rotation for putting configurations into maximum conformity, and then increasingly complex transformations: PINDIS
- solutions that are visually assimilable & readily interpretable
- structure that is not limited to dimensional information: there are also other simple structures (“horseshoes”, radex/circumplex, clusters, directions).
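
As a rough illustration of “putting configurations into maximum conformity”, the sketch below rotates one 2-D configuration onto another in the least-squares sense. It is a deliberately reduced case and not the PINDIS procedure: it handles two dimensions only, centres both configurations, and allows rotation but neither reflection nor rescaling. The function names and the example data are assumptions.

```python
import math

def center(points):
    """Translate a 2-D configuration so that its centroid is at the origin."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [(p[0] - mean_x, p[1] - mean_y) for p in points]

def procrustes_rotate(source, target):
    """Rotate the centred source to best match the centred target (2-D only)."""
    src, tgt = center(source), center(target)
    dot = sum(s[0] * t[0] + s[1] * t[1] for s, t in zip(src, tgt))
    cross = sum(s[0] * t[1] - s[1] * t[0] for s, t in zip(src, tgt))
    theta = math.atan2(cross, dot)  # angle that maximises the summed scalar products
    c, s_ = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s_ * y, s_ * x + c * y) for x, y in src]
    return rotated, tgt, theta

# Assumed example: the source is the target rotated by 30 degrees and shifted.
target = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
angle = math.radians(30)
source = [(math.cos(angle) * x - math.sin(angle) * y + 10.0,
           math.sin(angle) * x + math.cos(angle) * y - 2.0) for x, y in target]

rotated, centred_target, theta = procrustes_rotate(source, target)
worst_gap = max(math.dist(r, t) for r, t in zip(rotated, centred_target))
```

Here the fitted angle undoes the 30-degree rotation, so `worst_gap` is zero up to rounding; with real configurations the residual gap is what measures their remaining disagreement.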

Weaknesses in MDS
(There ARE any??!)
- Relative ignorance of the sampling properties of stress
- Proneness to local-minima solutions
  - (but less so, and interactive programs like PERMAP allow thousands of runs to check)
- A few forms of data/models are prone to degeneracies (especially MD Unfolding, but see the new PREFSCAL in SPSS)
- Difficulty in representing the asymmetry of causal models
  - though external analysis is very akin to dependent-independent modelling,
  - there are convergences with GLM in hybrid models such as CLASCAL (INDSCAL with parameterization of latent classes)

CHARACTERIZATION OF BASIC MDS & TERMINOLOGY
Structure of MDS specifiable in terms of D-T-M
DATA (specifies input data shape and content)
DATA MATRIX INPUT:
- WAY: ‘dimensionality’ of array (2, 3, 4 ...)
- MODALITY: number of distinct sets (to be represented) (1, 2, 3 …)
  - NB: Modality ≤ Way
Common examples:
- 2W1M: basic models (LTM, UTM, FSM)
- 2W2M: rectangular, joint (conditional) mapping
- 3W2M: (“stack” of 2W1M) Individual Differences Scaling
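
To make the 2W1M input shapes concrete, here is a small sketch that expands a lower-triangular matrix into a full symmetric one. It assumes that LTM and FSM abbreviate “lower triangular matrix” and “full symmetric matrix”, and that the triangle is entered without its diagonal; both are assumptions rather than something stated on the slide, and the function name is hypothetical.

```python
def lower_triangle_to_full(rows):
    """Expand a lower triangle (no diagonal) into a full symmetric matrix.

    rows[0] holds 1 value, rows[1] holds 2 values, and so on, so that
    k rows describe k + 1 objects.
    """
    n = len(rows) + 1
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows, start=1):
        if len(row) != i:
            raise ValueError(f"row {i} should hold {i} values, got {len(row)}")
        for j, value in enumerate(row):
            # 2-way, 1-mode data: one set of objects indexes both rows and columns
            full[i][j] = full[j][i] = float(value)
    return full

# Assumed example: dissimilarities among four objects (the corners of a rectangle).
ltm = [[3],
       [5, 4],
       [4, 5, 3]]
fsm = lower_triangle_to_full(ltm)
```

A 3W2M data set in this sketch would simply be a list of such matrices, one per individual.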

CHARACTERIZATION OF BASIC MDS (2)
TRANSFORMATION (form or type of rescaling performed on data)
- Non-Metric / Ordinal: data = M(d)
  - Monotonic increasing (dissims) or decreasing (sims)
  - Order / inequality
    - Strong / Guttman: (j, k) > (l, m) -> d(j, k) > d(l, m)
    - Weak / Kruskal: (j, k) > (l, m) -> d(j, k) ≥ d(l, m)
  - Equality / ties
    - Primary: (j, k) = (l, m) -> d(j, k) = OR ≠ d(l, m)
    - Secondary: (j, k) = (l, m) -> d(j, k) = d(l, m)
- Metric / Linear
  - Linear: data = L(d)
  - data = a + b(d)
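
The weak (Kruskal) monotonic transformation is commonly fitted by monotone regression. The sketch below uses the pool-adjacent-violators approach as one way to do this; it assumes dissimilarity data, assumes the distances have already been listed in the rank order of the data, and ignores ties in the data. The function name and the numbers are illustrative only.

```python
def monotone_regression(distances):
    """Least-squares non-decreasing fit to distances listed in data rank order.

    Adjacent values that violate the order are pooled and replaced by their
    mean, so fitted values may be equal (weak monotonicity) but never decrease.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in distances:
        blocks.append([value, 1])
        # Pool backwards while the previous block mean exceeds the latest one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Assumed example: distances for six pairs, ordered from the smallest
# dissimilarity in the data to the largest.
distances_in_data_order = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]
fitted_values = monotone_regression(distances_in_data_order)
# fitted_values == [1.0, 2.5, 2.5, 4.0, 5.5, 5.5]
```

The two out-of-order pairs (3, 2) and (6, 5) are each replaced by their mean, which is the smallest change that restores a non-decreasing sequence.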

CHARACTERIZATION OF BASIC MDS (3)
- MODEL: Euclidean Distance
  d(j, k) = sqrt( Σ_a [x(j, a) − x(k, a)]² )
  where x(i, a) is the co-ordinate of point i on dimension a in the solution configuration X of low dimension
- The basic model is Euclidean distance, but other Minkowski metrics are available, including:
  - City Block Model: d(j, k) = Σ_a |x(j, a) − x(k, a)|
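
The Euclidean and City Block models are both members of the Minkowski family, which a single function can cover. This is a minimal sketch; the function name and the parameter name `r` for the Minkowski exponent are assumptions.

```python
def minkowski_distance(x_j, x_k, r=2.0):
    """Minkowski distance between two points given as coordinate sequences.

    r = 2 gives the Euclidean model, r = 1 the City Block model.
    """
    if r < 1:
        raise ValueError("the Minkowski exponent r must be at least 1")
    return sum(abs(a - b) ** r for a, b in zip(x_j, x_k)) ** (1.0 / r)

# Assumed example: two points in a two-dimensional configuration.
point_j = (0.0, 0.0)
point_k = (3.0, 4.0)

euclidean = minkowski_distance(point_j, point_k, r=2)   # 5.0
city_block = minkowski_distance(point_j, point_k, r=1)  # 7.0
```

The same pair of points is further apart under the City Block model because movement is counted along each dimension separately.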
(Badness of) FIT: Stress
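
The slide's own formula is not reproduced here, so the following sketch assumes one common definition, Kruskal's Stress-1: the square root of the summed squared differences between distances and fitted (monotonically transformed) values, divided by the summed squared distances. The function name and the numbers are illustrative.

```python
import math

def kruskal_stress1(distances, fitted_values):
    """Stress-1 under the assumed definition: sqrt(sum((d - f)^2) / sum(d^2))."""
    numerator = sum((d - f) ** 2 for d, f in zip(distances, fitted_values))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)

# Assumed example: six configuration distances and the values obtained by
# fitting a monotone transformation to them.
distances = [1.0, 3.0, 2.0, 4.0, 6.0, 5.0]
fitted_values = [1.0, 2.5, 2.5, 4.0, 5.5, 5.5]

stress = kruskal_stress1(distances, fitted_values)   # sqrt(1 / 91), about 0.105
perfect = kruskal_stress1(distances, distances)      # 0.0: "no stress is perfection"
```

Dividing by the summed squared distances makes the value independent of the overall size of the configuration.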

Types of Analysis
- INTERNAL: if the analysis depends solely on the input data, it is termed “internal”, but
- EXTERNAL: if, in addition to the input data / solution, the analysis uses information relating to the same points (but from another source), it is termed “external”.