Object Oriented Data Analysis, Last Time • Primal vs. Dual PCA • PCA vs. SVD (cols vs. rows) (Centering) • Surface & Matrix (color) views (All were useful) • Network Data • Generalized SCREE plot
Vectors vs. Functions For “Functional Data Analysis” A philosophical issue Some: • (Useful?) ways to think about this • Personal opinions
Vectors vs. Functions Recall overall structure: Object Space Feature Space Curves (functions) Vectors Connection 1: Digitization Parallel Coordinates Connection 2: Basis Representation
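Vectors vs. Functions Connection 1: Digitization: Given a function $f(x)$, define vector $v = (f(x_1), \dots, f(x_d))^T$, where $x_1, \dots, x_d$ is a suitable grid, e.g. equally spaced: $x_{j+1} - x_j = \Delta$ for $j = 1, \dots, d-1$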
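A minimal sketch of digitization, assuming NumPy and a grid that includes both endpoints of an interval [a, b]. The helper name `digitize` and the choice of interval are illustrative, not taken from the slides.

```python
import numpy as np

def digitize(f, a=0.0, b=1.0, d=5):
    """Evaluate f on an equally spaced grid of d points in [a, b].

    Including both endpoints is an assumption made for this sketch.
    """
    grid = np.linspace(a, b, d)   # x_1, ..., x_d
    return grid, f(grid)          # v = (f(x_1), ..., f(x_d))

# Example: digitize sin on [0, pi] with d = 5 grid points.
grid, v = digitize(np.sin, 0.0, np.pi, d=5)
```

The vector `v` is the feature-space version of the curve; a finer grid (larger `d`) gives a closer stand-in for the function.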
Vectors vs. Functions Connection 1: Parallel Coordinates: Given a vector $v = (v_1, \dots, v_d)^T$, define a function $f$ where $f(j) = v_j$ for $j = 1, \dots, d$, and linearly interpolate to “connect the dots”. Proposed as High Dimensional Visualization Method by Inselberg (1985)
Vectors vs. Functions Parallel Coordinates: Given $v = (v_1, \dots, v_d)^T$, define $f(j) = v_j$. Now can “rescale argument” to get a function on [0, 1], evaluated at an equally spaced grid
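The parallel coordinates construction can be sketched with linear interpolation. The function names are hypothetical, and the rescaled grid (j - 1)/(d - 1) is one possible choice of equally spaced grid on [0, 1], assumed here for illustration.

```python
import numpy as np

def parallel_coordinate_curve(v):
    """Return f with f(j) = v_j at j = 1, ..., d, linear in between."""
    v = np.asarray(v, dtype=float)
    knots = np.arange(1, v.size + 1)
    return lambda t: np.interp(t, knots, v)

def rescaled_curve(v):
    """Same curve with the argument rescaled to [0, 1].

    The grid (j - 1) / (d - 1) is an assumption of this sketch.
    """
    v = np.asarray(v, dtype=float)
    knots = np.linspace(0.0, 1.0, v.size)
    return lambda t: np.interp(t, knots, v)

v = [2.0, 0.0, 4.0]
f = parallel_coordinate_curve(v)   # defined on [1, 3]
g = rescaled_curve(v)              # defined on [0, 1]
```

Here `f(1.5)` lies halfway between `v[0]` and `v[1]`, which is the "connect the dots" step.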
Vectors vs. Functions Bridge between vectors & functions: Vectors $\leftrightarrow$ Functions. Isometry follows from convergence of inner products, by Riemann summation: $\frac{1}{d} \sum_{j=1}^{d} f(x_j) g(x_j) \to \int_0^1 f(x) g(x) \, dx$
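A small numerical check of the Riemann summation idea, assuming functions on [0, 1] and a midpoint grid (the grid choice and the example functions are assumptions of this sketch). The vector inner product, scaled by 1/d, approaches the integral inner product as d grows.

```python
import numpy as np

def scaled_inner_product(f, g, d):
    """(1/d) * <v_f, v_g> for digitized versions of f and g on [0, 1]."""
    x = (np.arange(d) + 0.5) / d      # midpoint grid, an assumption
    return np.dot(f(x), g(x)) / d

f = lambda x: x
g = lambda x: x ** 2
exact = 0.25                          # integral of x^3 over [0, 1]

# Absolute error of the vector inner product for increasing d.
errors = [abs(scaled_inner_product(f, g, d) - exact) for d in (10, 100, 1000)]
```

The errors shrink as `d` increases, which is the sense in which working with vectors approximates working with functions.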
Vectors vs. Functions Main lesson: - OK to think about functions - But actually work with vectors For me, there is little difference But there is a statistical theory, and mathematical statistical literature on this Start with Ramsay & Silverman (2005)
Vectors vs. Functions Recall overall structure: Object Space Feature Space Curves (functions) Vectors Connection 1: Digitization Parallel Coordinates Connection 2: Basis Representation
Vectors vs. Functions Connection 2: Basis Representations: Given an orthonormal basis $\{\psi_j\}_{j=1}^{\infty}$ (in function space) E.g. – Fourier – B-spline – Wavelet Represent functions as: $f(x) = \sum_{j=1}^{\infty} a_j \psi_j(x)$
Vectors vs. Functions Connection 2: Basis Representations: Represent functions as: $f(x) = \sum_{j=1}^{\infty} a_j \psi_j(x)$ Bridge between discrete and continuous: $(a_1, a_2, \dots) \leftrightarrow f(x)$
Vectors vs. Functions Connection 2: Basis Representations: Represent functions as: $f(x) = \sum_{j=1}^{\infty} a_j \psi_j(x)$ Finite dimensional approximation: $f(x) \approx \sum_{j=1}^{d} a_j \psi_j(x)$ Again there is mathematical statistical theory, based on (same ref.)
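A sketch of a finite dimensional basis representation, assuming an orthonormal cosine basis on [0, 1] as a simple stand-in for the Fourier case. The basis choice, the function names, and the example function are all assumptions for illustration. Coefficients are approximated by Riemann sums on a midpoint grid.

```python
import numpy as np

def psi(j, x):
    """j-th orthonormal cosine basis function on [0, 1], j = 1, 2, ..."""
    x = np.asarray(x, dtype=float)
    if j == 1:
        return np.ones_like(x)
    return np.sqrt(2.0) * np.cos((j - 1) * np.pi * x)

def coefficients(f_vals, x, d):
    """a_j = <f, psi_j>, approximated by a Riemann sum on the grid x."""
    return np.array([np.mean(f_vals * psi(j, x)) for j in range(1, d + 1)])

def reconstruct(a, x):
    """Finite dimensional approximation: sum of a_j * psi_j(x)."""
    return sum(a_j * psi(j, x) for j, a_j in enumerate(a, start=1))

x = (np.arange(2000) + 0.5) / 2000    # midpoint grid on [0, 1]
f_vals = x * (1.0 - x)                # example smooth function
a = coefficients(f_vals, x, d=9)      # feature vector of 9 coefficients
max_err = np.max(np.abs(reconstruct(a, x) - f_vals))
```

The coefficient vector `a` plays the role of the feature vector; increasing `d` trades a longer vector for a smaller approximation error.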
Vectors vs. Functions Repeat Main lesson: - OK to think about functions - But actually work with vectors For me, there is little difference (but only personal taste)
PCA for shapes New Data Set: Corpus Callosum Data • “Window” between right and left halves of the brain • From a vertical slice MR image of head • “Segmented” (i.e. found boundary) • Shape is resulting closed curve • Have sample from n = 71 people • Feature vector of d = 80 coefficients from Fourier boundary representation (closed curve)
PCA for shapes Raw Data: Special thanks to Sean Ho View curves as movie Modes of variation?
PCA for shapes PC 1: Movie shows evolution along eigenvector Projections in bottom plot 2 Data Subclasses • Schizophrenics • Controls
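The "movie along an eigenvector" idea can be sketched as follows, assuming a data matrix with one row per data object and using synthetic data in place of the real shape coefficients. The function names, the number of frames, and the range of plus or minus 2 standard deviations are assumptions of this sketch.

```python
import numpy as np

def pca(X):
    """PCA via SVD of the mean-centered n x d matrix X (rows = objects)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (X.shape[0] - 1)   # variance along each direction
    scores = Xc @ Vt.T                       # projections onto directions
    return mean, Vt, variances, scores

def movie_frames(mean, direction, sd, n_frames=9, extent=2.0):
    """Feature vectors mean + c * sd * direction, c from -extent to extent."""
    steps = np.linspace(-extent, extent, n_frames)
    return mean + np.outer(steps * sd, direction)

# Synthetic stand-in with the same dimensions as the shape data (n=71, d=80).
rng = np.random.default_rng(0)
X = rng.normal(size=(71, 80)) * np.linspace(3.0, 0.1, 80)

mean, Vt, variances, scores = pca(X)
frames = movie_frames(mean, Vt[0], np.sqrt(variances[0]))
```

Each row of `frames` is a feature vector that would be mapped back to a curve and drawn as one movie frame; `scores[:, 0]` holds the projections of the kind shown in the bottom plot. Scores on different PC directions are uncorrelated by construction.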
PCA for shapes PC 1 Summary (Corpus Callosum Data) • Direction is “overall bending” • Colors studied later (sub populations) • An outlier??? • Find it in the data? • Case 2: could delete & repeat (will study outliers in more detail)
PCA for shapes Raw Data: This time with numbers So can identify outlier
PCA for shapes PC 2: Movie shows evolution along eigenvector Projections in bottom plot
PCA for shapes PC 2 Summary (Corpus Callosum Data) • Rotation of right end • “Sharpening” of left end • “Location” of left end • These are correlated with each other • But uncorrelated with PC 1
PCA for shapes PC 3: Thin vs. fat Important mode of variation?
PCA for shapes Raw Data: Revisit to look for 3 modes • Bending • Endpts • Thinning
PCA for shapes Raw Data: Medial Representation • Heart is Medial Atoms • Spokes imply boundary • Modes of Variation?
PCA for shapes PC 1 Summary (medial representation) • From same data as above Fourier boundary rep’n • But they look different • Since different type of fitting was done • Also, worst outlier was deleted • Modes of variation?
PCA for shapes PC 1: Overall Bending Same as for Fourier above Corr’d with right end fattening
PCA for shapes PC 2: Rotation of ends Similar to PC 2 of Fourier rep’n above
PCA for shapes PC 3: Distortion of Curvature Different from PC 3 of Fourier rep’n above
PCA for shapes PC 3 Summary (medial representation) • Systematic “distortion of curvature” • This time different from above Fourier boundary PC 3 • Lesson: different rep’ns focus on different aspects of data • I.e. not just differences in fitting • But instead on features that are emphasized • Thus choice of “features” is very important
PCA for shapes PC 4: Fattening and Thinning? Relate to Fourier rep’n???
PCA for shapes PC 4 Summary (medial representation) • more like fattening and thinning • i.e. similar to Fourier boundary PC 3 (view again below) • but “more local” in nature • an important property of M-reps
PCA for shapes PC 3: Review this For Comparison with PC 4 from M-reps
Cornea Data Cornea: Outer surface of the eye Driver of Vision: Curvature of Cornea Sequence of Images Objects: Images on the unit disk Curvature as “Heat Map” Special Thanks to K. L. Cohen, N. Tripoli, UNC Ophthalmology
Cornea Data: Raw Data Modes of Variation?
Cornea Data • Reference: Locantore et al. (1999) Visualization (generally true for images): • More challenging than for curves (since can’t overlay) • Instead view sequence of images • Harder to see “population structure” (than for curves) • So PCA type decomposition of variation is more important
Cornea Data Nature of images (on the unit disk, not usual rectangle) • Color is “curvature” • Along radii of circle (direction with most effect on vision) • Hotter (red, yellow) for “more curvature” • Cooler (blue, green) for “less curvature” • Feature vec. is coeff’s of Zernike expansion • Zernike basis: ~ Fourier basis, on disk • Conveniently represented in polar coord’s
Cornea Data Representation - Zernike Basis • Pixels as features is large and wasteful • Natural to find more efficient represent’n • Polar Coordinate Tensor Product of: – Fourier basis (angular) – Special Jacobi (radial, to avoid singularities) • See: – Schwiegerling, Greivenkamp & Miller (1995) – Born & Wolf (1980)
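A sketch of Zernike basis functions on the unit disk, using the standard closed-form radial polynomial. The unnormalized convention (radial part equal to 1 at the edge of the disk) and the sign convention for the angular part are assumptions; the cited references may scale or index the basis differently.

```python
import numpy as np
from math import factorial

def zernike_radial(n, m, rho):
    """Radial polynomial R_n^m(rho), for n >= |m| and n - |m| even."""
    m = abs(m)
    if m > n or (n - m) % 2 != 0:
        raise ValueError("need n >= |m| and n - |m| even")
    rho = np.asarray(rho, dtype=float)
    total = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m) // 2 - k)
                    * factorial((n - m) // 2 - k)))
        total = total + coeff * rho ** (n - 2 * k)
    return total

def zernike(n, m, rho, theta):
    """Polar tensor product: radial polynomial times angular Fourier term."""
    radial = zernike_radial(n, m, rho)
    if m >= 0:
        return radial * np.cos(m * theta)
    return radial * np.sin(-m * theta)
```

For example, `zernike_radial(2, 0, rho)` gives `2 * rho**2 - 1`. A feature vector for an image would then be its coefficients against a finite set of such functions.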
PCA of Cornea Data Recall: PCA can find (often insightful) direction of greatest variability • Main problem: display of result (no overlays for images) • Solution: show movie of “marching along the direction vector” Cornea Data PC 1 movie
PCA of Cornea Data PC 1 Movie:
PCA of Cornea Data PC 1 Summary: • Mean (1st image): mild vert’l astigmatism known pop’n structure called “with the rule” • Main dir’n: “more curved” & “less curved” • Corresponds to first optometric measure (89% of variat’n, in Mean Resid. SS sense) • Also: “stronger astig’m” & “no astig’m” • Found corr’n between astig’m and curv’re • Scores (blue): Apparent Gaussian dist’n
PCA of Cornea Data PC 2 Movie: Mean: same as above • Common centerpoint of point cloud • Are studying “directions from mean” Images along direction vector: • Looks terrible??? • Why?
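One way to compute a "percent of variation" figure of this kind is sketched below, assuming it means each PC's share of the sum of squares about the mean. That reading of "Mean Resid. SS", the function name, and the tiny example are assumptions of this sketch.

```python
import numpy as np

def percent_mean_residual_ss(X):
    """Percent of the sum of squares about the mean captured by each PC.

    X is n x d with rows as data objects.
    """
    Xc = X - X.mean(axis=0)                    # mean residuals
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    return 100.0 * s ** 2 / np.sum(Xc ** 2)

# Tiny example: points lying exactly on a line, so PC 1 captures everything.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
pct = percent_mean_residual_ss(X)
```

The percentages sum to 100 across all PCs, so a value such as 89% for PC 1 leaves 11% to be shared by the remaining directions.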
PCA of Cornea Data PC 2 Movie:
PCA of Cornea Data PC 2 Movie: • Reason made clear in Scores Plot (blue): • Single outlying data object drives PC dir’n • A known problem with PCA • Recall finds direction with “max variation” • In sense of variance • Easily dominated by single large observat’n
PCA of Cornea Data Toy Example: Single Outlier Driving PCA
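A toy version of this effect in 2-d, using synthetic data: the sample sizes, spreads, and the position of the outlier are arbitrary choices made for illustration.

```python
import numpy as np

def first_pc(X):
    """First PC direction of the n x 2 data matrix X (rows = objects)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(1)

# Point cloud with most of its variation along the first coordinate.
clean = rng.normal(size=(50, 2)) * np.array([3.0, 0.5])

# Same cloud plus one wild observation far out along the second coordinate.
with_outlier = np.vstack([clean, [0.0, 40.0]])

pc_clean = first_pc(clean)            # close to +/- (1, 0)
pc_outlier = first_pc(with_outlier)   # pulled toward +/- (0, 1)
```

Because PCA maximizes variance, the single large observation contributes enough sum of squares to swing the first direction toward itself.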
PCA of Cornea Data PC 2 Affected by Outlier: How bad is this problem? View 1: Statistician: Arrggghh!!!! • Outliers are very dangerous • Can give arbitrary and meaningless dir’ns • What does 4% of MR SS mean???
PCA of Cornea Data PC 2 Affected by Outlier: How bad is this problem? View 2: Ophthalmologist: No Problem • Driven by “edge effects” (see raw data) • Artifact of “light reflection” data gathering (“eyelid blocking”, and drying effects) • Routinely “visually ignore” those anyway • Found interesting (& well known) dir’n: steeper superior vs steeper inferior
Cornea Data: Raw Data Which one is the outlier?
PCA of Cornea Data PC 3 Movie
PCA of Cornea Data PC 3 Movie (ophthalmologist’s view): • Edge Effect Outlier is present • But focusing on “central region” shows changing dir’n of astig’m (3% of MR SS) • “with the rule” (vertical) vs. “against the rule” (horizontal) • most astigmatism is “with the rule” • most of rest is “against the rule” (known folklore)
PCA of Cornea Data PC 4 movie
PCA of Cornea Data Continue with ophthalmologist’s view… PC 4 movie version: • Other direction of astigmatism??? • Location (i.e. “registration”) effect??? • Harder to interpret … • OK, since only 1.7% of MR SS • Substantially less than for PC 2 & PC 3
PCA of Cornea Data Ophthalmologist’s View (cont.) Overall Impressions / Conclusions: • Useful decomposition of population variation • Useful insight into population structure
PCA of Cornea Data Now return to Statistician’s View: • How can we handle these outliers? • Even though not fatal here, can be for other examples… • Recall Simple Toy Example (in 2-d):