Hailuoto Workshop (UNC, Stat & OR)
Object Oriented Data Analysis, I
J. S. Marron
Dept. of Statistics and Operations Research, University of North Carolina
17 December 2021
Object Oriented Data Analysis, I
What is the “atom” of a statistical analysis?
• 1st Course: Numbers
• Multivariate Analysis Course: Vectors
• Functional Data Analysis: Curves
• More generally: Data Objects
Functional Data Analysis, I
Curves as Data Objects
Important Duality: Curve Space ↔ Point Cloud Space
Illustrate with Travis Gaydos Graphics
• 2-dim’al curves (easy to visualize)
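The duality can be made concrete: a curve recorded at d grid points is a vector in ℝ^d, so a sample of curves is a cloud of points, and any point can be drawn back as a curve. A minimal sketch, assuming all curves share one evaluation grid; the function names `sample_curve` and `point_to_curve` and the toy curve family are hypothetical, not from the slides.

```python
def sample_curve(f, grid):
    """Curve space -> point cloud space: evaluate f on the grid, giving one point in R^d."""
    return [f(t) for t in grid]


def point_to_curve(point, grid):
    """Point cloud space -> curve space: piecewise-linear curve through the coordinates."""
    def curve(t):
        for t0, t1, y0, y1 in zip(grid, grid[1:], point, point[1:]):
            if t0 <= t <= t1:
                return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
        raise ValueError("t lies outside the grid")
    return curve


# Hypothetical toy family of 2-dim'al curves: each is recorded at d = 2 grid points
grid = [0.0, 1.0]
curves = [lambda t, a=a: a * (1.0 + 0.5 * t) for a in (1.0, 2.0, 3.0)]
cloud = [sample_curve(f, grid) for f in curves]
print(cloud)  # [[1.0, 1.5], [2.0, 3.0], [3.0, 4.5]]
```

Because every curve in this toy family is a multiple of one shape, the three points lie on a single line through the origin: one mode of variation among the curves shows up as one direction in the point cloud.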
[Figure slides: Functional Data Analysis, Toy EG I to X; Functional Data Analysis, 10-d Toy EG 1 and 2]
Object Oriented Data Analysis, II
Examples:
• Medical Image Analysis
  • Images as Data Objects?
  • Shape Representations as Objects
• Micro-arrays
  • Just multivariate analysis?
Object Oriented Data Analysis, III
Typical Goals:
• Understanding population variation
  • Principal Component Analysis +
• Discrimination (a.k.a. Classification)
• Time Series of Data Objects
Object Oriented Data Analysis, IV
Major Statistical Challenge, I:
• High Dimension Low Sample Size (HDLSS)
  • Dimension d >> sample size n
• “Multivariate Analysis” nearly useless
  • Can’t “normalize the data”
• Land of Opportunity for Statisticians
  • Need for “creative statisticians”
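One concrete way to see why classical multivariate tools break down when d >> n: the n centered cases span at most n − 1 directions, so the d × d sample covariance matrix is singular and cannot be inverted (as whitening or Mahalanobis distances would require). A minimal pure-Python sketch with simulated Gaussian data; the sizes d = 20, n = 5 and the seed are arbitrary choices, not from the slides.

```python
import random


def sample_covariance(X):
    """d x d sample covariance of n cases (rows of X), with divisor n - 1."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[row[j] - means[j] for j in range(d)] for row in X]  # centered data
    return [[sum(C[i][a] * C[i][b] for i in range(n)) / (n - 1) for b in range(d)]
            for a in range(d)]


def matrix_rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for col in range(cols):
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][col]), default=None)
        if pivot is None or abs(A[pivot][col]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, rows):
            factor = A[r][col] / A[rank][col]
            for c in range(col, cols):
                A[r][c] -= factor * A[rank][c]
        rank += 1
        if rank == rows:
            break
    return rank


random.seed(0)
n, d = 5, 20  # hypothetical HDLSS setting: d >> n
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
S = sample_covariance(X)
# The rank is at most n - 1, far below d, so S has no inverse
print(f"d = {d}, n = {n}, rank of sample covariance = {matrix_rank(S)}")
```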
Object Oriented Data Analysis, V
Major Statistical Challenge, II:
• Data may live in non-Euclidean space
  • Lie Group / Symmetric Spaces
  • Trees/Graphs as data objects
• Interesting Issues:
  • What is “the mean” (pop’n center)?
  • How do we quantify “pop’n variation”?
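The circle of angles is the simplest non-Euclidean example of both questions. One standard answer (assumed here, not spelled out on the slide) is the Fréchet mean, the point minimizing the mean squared geodesic distance to the data, with the minimum value serving as a Fréchet variance. A minimal sketch; the function names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def geodesic_dist(a, b):
    """Arc-length (geodesic) distance between two angles on the unit circle."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)


def frechet_function(m, angles):
    """Mean squared geodesic distance from a candidate center m to the data."""
    return sum(geodesic_dist(m, a) ** 2 for a in angles) / len(angles)


def circle_frechet_mean(angles):
    """Return (Fréchet mean, Fréchet variance) of a sample of angles.

    The minimizer is the ordinary average of the data once it is "unwrapped"
    around that minimizer; the only possible unwrappings add 2*pi to the k
    smallest sorted angles (k = 0..n-1), so trying those n candidates suffices.
    """
    th = sorted(a % TWO_PI for a in angles)
    n = len(th)
    candidates = []
    for k in range(n):
        unwrapped = [t + TWO_PI for t in th[:k]] + th[k:]
        candidates.append((sum(unwrapped) / n) % TWO_PI)
    best = min(candidates, key=lambda m: frechet_function(m, th))
    return best, frechet_function(best, th)


data = [math.radians(350), math.radians(10)]
naive = sum(data) / len(data)
m, var = circle_frechet_mean(data)
print(f"naive average: {math.degrees(naive):.1f} deg")
print(f"Fréchet mean distance from 0 deg: {math.degrees(geodesic_dist(m, 0.0)):.6f} deg")
print(f"Fréchet variance: {var:.6f} rad^2")
```

For angles of 350° and 10°, the naive average is 180°, on the opposite side of the circle from both observations, while the Fréchet mean is 0°.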
Statistics in Image Analysis, I
First Generation Problems:
• Denoising
• Segmentation
• Registration
(all about single images)
Statistics in Image Analysis, II
Second Generation Problems:
• Populations of Images
  • Understanding Population Variation
  • Discrimination (a.k.a. Classification)
• Complex Data Structures (& Spaces)
• HDLSS Statistics
HDLSS Statistics in Imaging
Why HDLSS (High Dim, Low Sample Size)?
• Complex 3-d Objects Hard to Represent
  • Often need d = 100’s of parameters
• Complex 3-d Objects Costly to Segment
  • Often have n = 10’s of cases
Object Representation
• Landmarks (hard to find)
• Boundary Rep’ns (no correspondence)
• Medial representations
  • Find “skeleton”
  • Discretize as “atoms” called M-reps
3-d m-reps
Bladder – Prostate – Rectum (multiple objects, J. Y. Jeong)
• Medial Atoms provide “skeleton”
• Implied Boundary from “spokes” → “surface”
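The atom-and-spoke idea can be sketched as a small data structure: a skeleton point x, a spoke length r and two unit spoke directions, whose tips x + r·u give the implied boundary. This is a simplified, hypothetical representation for illustration (actual m-rep atoms may carry further parameters, e.g. at object ends); the class name and fields are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class MedialAtom:
    """Simplified, hypothetical medial atom: a skeleton point with two equal-length spokes."""
    position: Vec3  # point x on the medial "skeleton"
    radius: float   # spoke length r > 0
    spoke0: Vec3    # unit direction u0 of the first spoke
    spoke1: Vec3    # unit direction u1 of the second spoke

    def implied_boundary(self) -> Tuple[Vec3, Vec3]:
        """Spoke tips x + r * u: the two boundary ("surface") points implied by this atom."""
        tip0 = tuple(p + self.radius * u for p, u in zip(self.position, self.spoke0))
        tip1 = tuple(p + self.radius * u for p, u in zip(self.position, self.spoke1))
        return tip0, tip1


# A toy chain of atoms along the x-axis; the spoke tips trace out the implied boundary
atoms = [MedialAtom((float(i), 0.0, 0.0), 1.0 + 0.5 * i, (0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
         for i in range(3)]
for atom in atoms:
    print(atom.implied_boundary())
```

In this simplified form each atom carries 8 numbers (3 for position, 1 for radius, 2 for each unit spoke direction), so an object built from a few dozen atoms already needs hundreds of parameters.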
Personal HDLSS Viewpoint: Data
• Images (cases) are “Points”
  • In Feature Space
  • Features are Axes
• Data set is a “Point Cloud”
• Use Proj’ns to visualize
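A projection reduces each point of a high-dimensional cloud to one score, its inner product with a unit direction; scores on one direction give a 1-d view (e.g. a dot plot), and scores on two directions give a scatterplot. A minimal sketch; the 10-d toy cloud and the two directions are hypothetical choices.

```python
import math


def unit(v):
    """Rescale v to length 1."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project(points, direction):
    """Scores <x_i, u> of each point on the unit vector u: a 1-d view of the cloud."""
    u = unit(direction)
    return [sum(x * c for x, c in zip(p, u)) for p in points]


# Hypothetical 10-d point cloud (e.g. toy curves sampled at 10 grid points)
cloud = [[float(i + j) for j in range(10)] for i in range(4)]
axis0 = [1.0] + [0.0] * 9  # one feature axis
diag = [1.0] * 10          # the all-ones direction (overall level of a curve)
print(project(cloud, axis0))
print(project(cloud, diag))
```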
Personal HDLSS Viewpoint: PCA
• Rotated Axes
• Often Insightful
• One set of Dir’ns
  • Others Useful, too
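PCA directions are the rotated axes: orthogonal unit vectors ordered by the variance of the projected data. A minimal pure-Python sketch, assuming data stored as a list of equal-length vectors; it uses power iteration with deflation on the sample covariance, one of several ways to compute PCA (a library SVD is the usual choice in practice). The function name `pca` and the toy data are hypothetical.

```python
import math
import random


def pca(X, n_components=2, iters=500, seed=0):
    """PCA by power iteration with deflation on the sample covariance.

    Returns (mean, directions, variances): each direction is a unit vector
    (a rotated axis) and variances are the matching eigenvalues.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[row[j] - mean[j] for j in range(d)] for row in X]
    S = [[sum(C[i][a] * C[i][b] for i in range(n)) / (n - 1) for b in range(d)]
         for a in range(d)]
    rng = random.Random(seed)
    directions, variances = [], []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(S[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:  # no variation left in the remaining directions
                break
            v = [c / norm for c in w]
        lam = sum(v[a] * sum(S[a][b] * v[b] for b in range(d)) for a in range(d))
        directions.append(v)
        variances.append(lam)
        # Deflate: remove the found component so the next pass finds the next one
        S = [[S[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, directions, variances


X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # toy point cloud
mean, dirs, variances = pca(X)
print("PC directions:", dirs)
print("variances:", variances)
```

For HDLSS data one would usually avoid forming the d × d covariance and work with the n × n Gram matrix or an SVD of the centered data; the explicit covariance is used here only for readability.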
Cornea Data, I
Images as data
• ~42 Cornea Images
• Outer surface of eye
• Heat map of curvature (in radial direction)
• Hard to understand “population structure”
Cornea Data, II
PC 1
• Starts at Pop’n Mean
• Overall Curvature
• Vertical Astigmatism
• Correlated!
• Gaussian Projections
• Visualization: Can’t Overlay (so use movie)
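Since the images along PC 1 cannot be overlaid, the mode of variation is shown as a movie that passes through the population mean. Each frame is the mean plus a multiple of the PC direction, scaled by the PC standard deviation. A sketch, assuming a mean vector, a unit PC direction and its variance are already available (e.g. from a PCA); the ±2 standard deviation range, the frame count and the function name are arbitrary choices.

```python
import math


def pc_movie_frames(mean, direction, variance, n_frames=5, max_sd=2.0):
    """Frames mean + c * sqrt(variance) * direction, for c running from -max_sd to +max_sd."""
    sd = math.sqrt(variance)
    cs = [-max_sd + 2.0 * max_sd * k / (n_frames - 1) for k in range(n_frames)]
    return [[m + c * sd * u for m, u in zip(mean, direction)] for c in cs]


# Hypothetical 3-pixel "image" mean, unit PC direction and PC variance
frames = pc_movie_frames([1.0, 1.0, 1.0], [0.0, 0.6, 0.8], 4.0)
for frame in frames:
    print(frame)  # in practice each frame would be rendered, e.g. as a heat map
```

With an odd number of frames the middle one is exactly the mean, so the movie starts (or pauses) at the population center.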
Cornea Data, III
PC 2
• Horrible Outlier! (present in data)
• But look only in center: Steep at top vs. bottom
• Want Robust PCA
  • For HDLSS data???
Cornea Data, IV
Robust PC 2
• No outlier impact
• See top – bottom variation
• Projections now Gaussian
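The slides do not say how Robust PC 2 was computed. One robust option, swapped in here purely as an illustration (an assumption, not necessarily the method behind these slides), is spherical PCA: center at the spatial median, rescale each centered case to unit length so no single case has more than unit influence, then run ordinary PCA. A minimal sketch with hypothetical toy data containing one huge outlier.

```python
import math
import random


def spatial_median(X, iters=200, eps=1e-12):
    """L1 (spatial) median by Weiszfeld's algorithm: a robust center."""
    d = len(X[0])
    m = [sum(row[j] for row in X) / len(X) for j in range(d)]  # start at the mean
    for _ in range(iters):
        num, den = [0.0] * d, 0.0
        for row in X:
            dist = math.sqrt(sum((row[j] - m[j]) ** 2 for j in range(d)))
            w = 1.0 / max(dist, eps)  # guard against a case sitting on the median
            den += w
            for j in range(d):
                num[j] += w * row[j]
        m = [c / den for c in num]
    return m


def top_direction(Y, iters=500, seed=0):
    """First eigenvector of sum_i y_i y_i^T by power iteration (Y already centered)."""
    d = len(Y[0])
    v = [random.Random(seed).gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        scores = [sum(y[j] * v[j] for j in range(d)) for y in Y]
        w = [sum(s * y[j] for s, y in zip(scores, Y)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def ordinary_pc1(X):
    """Ordinary PC 1: center at the mean, then top eigenvector."""
    d = len(X[0])
    mean = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return top_direction([[x - mj for x, mj in zip(row, mean)] for row in X])


def spherical_pc1(X):
    """Spherical PCA sketch: center at the spatial median, scale cases to unit length, then PCA."""
    m = spatial_median(X)
    Y = []
    for row in X:
        c = [x - mj for x, mj in zip(row, m)]
        r = math.sqrt(sum(t * t for t in c))
        if r > 0:
            Y.append([t / r for t in c])
    return top_direction(Y)


# Hypothetical data: variation along the x-axis plus one huge outlier along y
X = [[t, 0.0] for t in (-3.0, -2.0, -1.0, 1.0, 2.0, 3.0)] + [[0.0, 100.0]]
print("ordinary PC 1: ", ordinary_pc1(X))   # dominated by the outlier
print("spherical PC 1:", spherical_pc1(X))  # follows the bulk of the data
```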
PCA for m-reps, I
Major issue: m-reps live in a space of (locations, radius and angles)
E.g. “average” of: [figure] = ???
Natural Data Structure is: Lie Groups ~ Symmetric spaces (smooth, curved manifolds)
PCA for m-reps, II
PCA on non-Euclidean spaces? (i.e. on Lie Groups / Symmetric Spaces)
T. Fletcher: Principal Geodesic Analysis
Idea: replace “linear summary of data” with “geodesic summary of data”…
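The idea can be sketched on the unit sphere S², the curved space in which a unit spoke direction lives. A commonly used linearized approximation (an assumption here; the slides give no algorithmic details) first finds the intrinsic (Fréchet) mean m by iterating log and exp maps, then runs ordinary PCA on the data mapped into the tangent plane at m; each principal direction v defines a principal geodesic t ↦ Exp_m(t·v). Function names and the toy data are hypothetical, and the log map is undefined for points antipodal to m.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(c, a):
    return [c * x for x in a]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def unit(a):
    return scale(1.0 / math.sqrt(dot(a, a)), a)


def exp_map(p, v):
    """Exp_p(v): walk along the great circle from p in direction v for arc length |v|."""
    t = math.sqrt(dot(v, v))
    if t < 1e-15:
        return list(p)
    return add(scale(math.cos(t), p), scale(math.sin(t) / t, v))


def log_map(p, q):
    """Log_p(q): tangent vector at p toward q, with length equal to the geodesic distance."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    if theta < 1e-15:
        return [0.0] * len(p)
    return scale(theta / math.sin(theta), add(q, scale(-c, p)))


def intrinsic_mean(points, iters=100):
    """Fréchet mean on the sphere: average the log-mapped data, map back, repeat."""
    m = unit(points[0])
    for _ in range(iters):
        tangents = [log_map(m, x) for x in points]
        step = scale(1.0 / len(points), [sum(col) for col in zip(*tangents)])
        m = unit(exp_map(m, step))
    return m


def principal_geodesic(points, iters=500):
    """Linearized PGA: top PCA direction of the tangent vectors Log_m(x_i) at the mean m."""
    m = intrinsic_mean(points)
    tangents = [log_map(m, x) for x in points]
    # At the Fréchet mean the tangent vectors average to ~0, so no further centering
    S = [[sum(t[a] * t[b] for t in tangents) for b in range(3)] for a in range(3)]
    v = unit([1.0, 0.5, 0.25])  # arbitrary start, assumed not orthogonal to the answer
    for _ in range(iters):
        v = unit([dot(row, v) for row in S])
    return m, v


# Hypothetical data: (longitude, latitude) in radians, spread mostly in longitude
coords = [(-0.6, 0.0), (-0.3, 0.05), (-0.3, -0.05), (0.0, 0.0),
          (0.3, 0.05), (0.3, -0.05), (0.6, 0.0)]
pts = [[math.cos(lon) * math.cos(lat), math.sin(lon) * math.cos(lat), math.sin(lat)]
       for lon, lat in coords]
m, v = principal_geodesic(pts)
print("intrinsic mean:", m)
print("PG 1 direction:", v)
print("points along PG 1:", [exp_map(m, scale(t, v)) for t in (-0.5, 0.0, 0.5)])
```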
PGA for m-reps, Bladder-Prostate-Rectum
Bladder – Prostate – Rectum, 1 person, 17 days
[Figure: PG 1, PG 2, PG 3]
(analysis by Ja Yeon Jeong)