STOR 881 Object Oriented Data Analysis, UNC, Stat & OR
Angle-Based Joint and Individual Variation Explained
Meilei Jiang
Dept. of Statistics and Operations Research, University of North Carolina
May 18, 2021
Acronym: AJIVE = Angle-Based Joint and Individual Variation Explained
History
• Generation 1, JIVE: Eric Lock, Katherine Hoadley, J. S. Marron, Andrew Nobel. Joint and Individual Variation Explained (JIVE) for integrated analysis of multiple data types. Annals of Applied Statistics, 2013.
• Generation 2, AJIVE: Qing Feng, Jan Hannig, Meilei Jiang, J. S. Marron. Angle-Based Joint and Individual Variation Explained. arXiv:1704.02060, 2017.
AJIVE Overview
AJIVE Data Structure: The organizational model for a single data set is a matrix. Convention in this talk: "columns as data objects" (the atoms of the statistical analysis, i.e. experimental units or data vectors). Data are "mean centered", i.e. the mean vector is 0, i.e. the mean of the entries in each row is 0.
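A minimal sketch of the "columns as data objects" convention and mean centering. NumPy and the helper name `center_rows` are our own choices for illustration, not something used in the talk. For a d × n matrix the mean vector is the average of the n columns, so centering subtracts each row's mean from that row.

```python
import numpy as np

def center_rows(X):
    """Mean-center a d x n data matrix whose columns are data objects.

    The mean vector is the average of the n columns (one entry per row),
    so centering subtracts each row's mean from that row.
    """
    X = np.asarray(X, dtype=float)
    mean_vector = X.mean(axis=1, keepdims=True)  # shape (d, 1)
    return X - mean_vector, mean_vector

# Hypothetical data: 3 features (rows) x 4 data objects (columns)
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 0.0, 1.0, 1.0],
              [5.0, 5.0, 5.0, 5.0]])
Xc, mu = center_rows(X)
print(mu.ravel())         # [2.5 0.5 5. ]
print(Xc.mean(axis=1))    # mean of the entries in each row is now 0
```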
AJIVE Data Structure: The AJIVE organizational model uses multiple matrices (data types, i.e. "blocks") with common columns as data objects. First consider only 2 blocks.
AJIVE Analytic Goals: Explore and quantify variation, in the spirit of PCA (Principal Component Analysis).
Example of PCA Data Exploration: Context: a single data set. Fun example: Spanish male mortality, ages 5-95 (thanks to Andrés M. Alonso, U. Carlos III, Madrid). Data objects: Mortality(age) = log(number who died at that age / number of that age), over the years 1908-2002.
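As a hedged illustration of this definition, the sketch below computes log mortality rates from death and population counts and arranges them with ages as rows and years as columns, so each column (one year's mortality curve) is a data object. The counts and the helper name `log_mortality` are made up; the talk does not describe how the real data were preprocessed. A zero death count would give `-inf`, which real data would need to handle.

```python
import numpy as np

def log_mortality(deaths, population):
    """Log mortality rate per age: log(# died at that age / # of that age).

    deaths, population: arrays of shape (n_ages, n_years); each column is
    one year's mortality curve, i.e. one data object.
    """
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    return np.log(deaths / population)

# Hypothetical counts for 3 ages (rows) over 2 years (columns)
deaths = [[50, 40], [10, 8], [200, 150]]
population = [[1000, 1000], [1000, 1000], [1000, 1000]]
M = log_mortality(deaths, population)
print(M.shape)  # (3, 2): ages x years, columns as data objects
```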
PCA Exploration of Spanish Mortality: Rainbow colors represent year: magenta = 1908, red = 2002.
PCA Exploration of Spanish Mortality: Color code (years).
PCA Exploration of Spanish Mortality: Find the population center (mean vector).
PCA Exploration of Spanish Mortality: Blips appear at decades, since ages are not recorded precisely (in Spain), e.g. reported as "about 50".
PCA Exploration of Spanish Mortality: Mean residual view, i.e. the data shifted so that the mean is at the origin.
PCA Exploration of Spanish Mortality: Shows that the main age effects are in the mean, not in the variation about the mean.
PCA Exploration of Spanish Mortality: Projections onto the PC1 direction and loadings plot. Main mode of variation: constant across ages. Shows major improvement over time (medical technology, etc.) and a change in the age-rounding blips.
PCA Exploration of Spanish Mortality: Projections onto the PC2 direction and loadings plot. Second mode of variation: difference between ages 20-45 and the rest.
PCA Exploration of Spanish Mortality: Scatterplot matrix view of the scores plot; connecting lines highlight time order. The scores plot is the common first graphic in exploratory analysis: it reveals relationships between data objects.
PCA Exploration of Spanish Mortality: The loadings plot is the common second graphic in exploratory analysis: it shows the drivers of each PC direction via the vector entries.
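Both the scores plot and the loadings plot can be computed from a single SVD of the mean residuals. The sketch below assumes NumPy and uses a hypothetical helper `pca_columns`. Loadings are the left singular vectors (PC directions in feature space), and scores are the projections of each centered data object onto them. The synthetic data only loosely mimic a "constant across ages" first mode.

```python
import numpy as np

def pca_columns(X, k):
    """PCA of a d x n matrix whose columns are data objects.

    Returns the mean vector (d x 1), the first k loading vectors
    (d x k, unit-length PC directions) and the scores (k x n, the
    projection of each mean-centered data object onto each direction).
    """
    X = np.asarray(X, dtype=float)
    mean_vector = X.mean(axis=1, keepdims=True)
    residuals = X - mean_vector                    # "mean residual view"
    U, s, Vt = np.linalg.svd(residuals, full_matrices=False)
    loadings = U[:, :k]                            # entries drive each PC direction
    scores = loadings.T @ residuals                # equals s[:k, None] * Vt[:k]
    return mean_vector, loadings, scores

# Hypothetical data: 10 features, 30 data objects, with one dominant mode
# that moves all features up and down together (constant loadings).
rng = np.random.default_rng(0)
level = rng.normal(size=30)
X = np.outer(np.ones(10), level) + 0.1 * rng.normal(size=(10, 30))
mu, L, S = pca_columns(X, k=2)
print(np.round(np.abs(L[:, 0]), 2))  # roughly equal entries, near 1/sqrt(10)
```

Plotting the rows of `S` against each other gives a scores scatterplot matrix. Plotting each column of `L` against the feature index (age, for the mortality data) gives a loadings plot.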
AJIVE Analytic Goals: Explore and quantify variation, in the spirit of PCA (Principal Component Analysis), for 2+ data blocks, through joint variation (focus on interactions) and individual variation (highlight unique aspects).
AJIVE Toy Example: "Heat map" view based on a color bar: red for > 0, white for = 0, blue for < 0; intensity shows magnitude.
AJIVE Toy Example: Underlying components are additive: signals + noise.
AJIVE Toy Example: Underlying components: the joint component has rank 1, with the same "subject signal" (linear combination) in each block.
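A hedged sketch of how data with this additive structure could be simulated. There are two blocks, X and Y, over the same n data objects. The rank-1 joint component is built from one shared score vector, the individual components have scores orthogonal to it, and noise is added on top. The sizes, magnitudes, and individual ranks are assumptions, not the talk's actual toy data. The individual ranks (1 for X, 2 for Y) were chosen so the total signal ranks match the scree-plot ranks 2 and 3 reported for the toy example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dx, dy = 100, 20, 30   # hypothetical: n shared data objects, dx / dy features

def unit_mean_zero(v):
    """Center a score vector and scale it to unit length."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

# Joint: one common score vector ("subject signal") used by both blocks
joint_score = unit_mean_zero(rng.normal(size=n))

def individual_scores(m):
    """m orthonormal, mean-zero score vectors orthogonal to the joint score."""
    Z = rng.normal(size=(n, m))
    Z -= Z.mean(axis=0)                          # keep scores mean-centered
    Z -= np.outer(joint_score, joint_score @ Z)  # remove the joint direction
    Q, _ = np.linalg.qr(Z)
    return Q.T                                   # shape (m, n)

J_X = 10 * np.outer(rng.normal(size=dx), joint_score)       # joint, rank 1
J_Y = 10 * np.outer(rng.normal(size=dy), joint_score)       # joint, rank 1
I_X = 8 * rng.normal(size=(dx, 1)) @ individual_scores(1)   # individual, rank 1
I_Y = 8 * rng.normal(size=(dy, 2)) @ individual_scores(2)   # individual, rank 2
X = J_X + I_X + 0.1 * rng.normal(size=(dx, n))              # signals + noise
Y = J_Y + I_Y + 0.1 * rng.normal(size=(dy, n))
```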
AJIVE Toy Example: Partial Least Squares gives a decent rank-2 approximation, but no individual component.
AJIVE Toy Example: Early JIVE (Lock et al., 2013) gives a good 2-d approximation, but misses the joint-individual decomposition.
AJIVE Toy Example: AJIVE (Feng et al., 2017) gives an excellent decomposition.
AJIVE Algorithm: Overview
• Step 1: Signal space initial extraction
• Step 2: Score space segmentation
• Step 3: Final decomposition
AJIVE Algorithm: Step 1: Signal space initial extraction, separating each block into signal and residuals. Toy example: scree-plot rank selection gives SVD rank 2 for X and SVD rank 3 for Y.
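A hedged sketch of Step 1 for a single block: keep a rank-r SVD approximation as the signal and treat the rest as residuals. Here the rank is supplied by hand, as if read off a scree plot. The helper name `signal_extraction` and the synthetic rank-2 data are assumptions for illustration.

```python
import numpy as np

def signal_extraction(X, rank):
    """AJIVE Step 1 sketch for one mean-centered block X (d x n).

    Keeps a rank-`rank` SVD approximation as the signal estimate. Returns
    the signal matrix, an orthonormal basis of its score space (rows of a
    rank x n array), and all singular values for a scree plot.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    signal = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return signal, Vt[:rank], s

# Hypothetical block: a rank-2 signal plus small noise, 20 features x 50 objects
rng = np.random.default_rng(2)
A = 5 * rng.normal(size=(20, 2)) @ rng.normal(size=(2, 50))
A -= A.mean(axis=1, keepdims=True)            # mean-center the signal rows
X = A + 0.1 * rng.normal(size=(20, 50))
X -= X.mean(axis=1, keepdims=True)            # mean-center the observed block
signal, V, s = signal_extraction(X, rank=2)
print(np.round(s[:4], 1))  # scree: two large values, then a sharp drop
```

The rows of `V` span the block's estimated signal score space and are the inputs to Step 2.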
AJIVE Algorithm: Step 2: Overview.
AJIVE Algorithm: Step 2: Score space segmentation (further decomposition).
AJIVE Algorithm: Step 2: Score space segmentation, notion of joint.
AJIVE Algorithm: Step 2: Score space segmentation, notion of individual.
AJIVE Algorithm: Step 2a: Principal angle analysis.
AJIVE Algorithm: Step 2: Score space segmentation. Principal angle analysis: the SVD gives the principal vectors. Joint space basis vectors: common normalized scores.
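For two blocks, principal angle analysis can be sketched with one small SVD. If the rows of `VX` and `VY` are orthonormal bases of the two estimated score spaces, the singular values of `VX @ VY.T` are the cosines of the principal angles. The helper name and the synthetic score spaces below are assumptions for illustration, not the AJIVE reference code.

```python
import numpy as np

def principal_angles(VX, VY):
    """Principal angles between two score spaces.

    VX (rX x n) and VY (rY x n) have orthonormal rows spanning the signal
    score spaces of blocks X and Y. The singular values of VX @ VY.T are the
    cosines of the principal angles; rotating each basis by the matching
    singular vectors gives the paired principal vectors (rows, in R^n).
    """
    P, cosines, Qt = np.linalg.svd(VX @ VY.T, full_matrices=False)
    angles = np.degrees(np.arccos(np.clip(cosines, 0.0, 1.0)))
    px = P.T @ VX      # principal vectors in X's score space
    py = Qt @ VY       # paired principal vectors in Y's score space
    return angles, px, py

# Hypothetical score spaces in R^50: X is 2-d, Y is 3-d, sharing one direction
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(50, 4)))     # 4 orthonormal columns
shared = Q[:, 0]
VX = np.vstack([shared, Q[:, 1]])
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random rotation of Y's basis
VY = R @ np.vstack([shared, Q[:, 2], Q[:, 3]])
angles, px, py = principal_angles(VX, VY)
print(np.round(angles, 1))  # first angle ~0 (shared direction), second ~90
```

A small angle flags a candidate joint direction. The paired principal vectors `px[0]` and `py[0]` then nearly coincide, and either one (or their normalized average) can serve as a common normalized score.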
AJIVE Algorithm: Step 2: Score space segmentation. Multi-block case: a generalization of principal angle analysis. Joint space basis vectors: common normalized scores.
AJIVE Algorithm: Step 2b: Segmentation of the joint space basis.
AJIVE Algorithm: Step 2: Score space segmentation. Thresholding of the SVD in the toy example.
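One hedged way to write the multi-block generalization is to stack the K score bases into one matrix M and take its SVD. A direction lying in all K score spaces gives a squared singular value of K. For two blocks the squared singular values include 1 ± cos θ_i, so thresholding them is equivalent to thresholding the principal angles. In AJIVE the threshold is derived from the data. Here `sv2_threshold` is a hypothetical user-supplied value, and `joint_basis` is an illustrative name.

```python
import numpy as np

def joint_basis(score_bases, sv2_threshold):
    """Step 2 sketch for K blocks.

    score_bases: list of arrays V_k (r_k x n) with orthonormal rows spanning
    each block's estimated signal score space. Stacks them into M
    (sum r_k x n) and takes its SVD. Right singular vectors whose squared
    singular value exceeds `sv2_threshold` are kept as the joint basis,
    returned as rows (r_J x n), together with all squared singular values.
    """
    M = np.vstack(score_bases)
    _, s, Wt = np.linalg.svd(M, full_matrices=False)
    keep = s**2 > sv2_threshold
    return Wt[keep], s**2

# Hypothetical score spaces in R^60 for 3 blocks sharing one direction
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(60, 7)))
shared = Q[:, 0]
V1 = np.vstack([shared, Q[:, 1]])
V2 = np.vstack([shared, Q[:, 2], Q[:, 3]])
V3 = np.vstack([shared, Q[:, 4], Q[:, 5], Q[:, 6]])
VJ, sv2 = joint_basis([V1, V2, V3], sv2_threshold=2.5)   # hypothetical cutoff
print(VJ.shape)          # (1, 60): one joint direction
print(np.round(sv2[:7], 2))  # 3 (= K) for the shared direction, then six 1s
```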
AJIVE Algorithm: Step 3.
AJIVE Algorithm: Step 3: Final decomposition. Challenge: after finding the joint space, some blocks might have component variation below the threshold (e.g. "semi-joint" structure in the multi-block case). Such a component is then moved to the individual or error part.
AJIVE Algorithm: CNS (common normalized scores).
AJIVE Algorithm: BSS (block specific scores).
AJIVE Algorithm: Full matrix.
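A hedged sketch of Step 3 for one block. It assumes the joint part is the projection of the block onto the joint score space (the span of the common normalized scores). The individual part is taken as a low-rank SVD approximation of what remains. Two simplifications apply: the individual rank is passed in by hand, and the block-wise check for weak joint components (moving them to individual or error) is not implemented. The helper name `final_decomposition` is illustrative.

```python
import numpy as np

def final_decomposition(X, VJ, indiv_rank):
    """Step 3 sketch for one block X (d x n, columns as data objects).

    VJ: joint basis (r_J x n, orthonormal rows) shared by all blocks.
    Joint part: project the rows of X onto the joint score space.
    Individual part: rank-`indiv_rank` SVD approximation of the remainder,
    whose row space is orthogonal to the joint scores.
    Returns joint, individual, and the leftover (treated as noise).
    """
    P_J = VJ.T @ VJ                          # n x n projection onto joint scores
    joint = X @ P_J
    resid = X - joint                        # = X @ (I - P_J)
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    individual = (U[:, :indiv_rank] * s[:indiv_rank]) @ Vt[:indiv_rank]
    noise = X - joint - individual
    return joint, individual, noise

# Hypothetical block: joint signal along Q[:, 0], individual along Q[:, 1]
rng = np.random.default_rng(5)
n = 80
Q, _ = np.linalg.qr(rng.normal(size=(n, 3)))
VJ = Q[:, [0]].T                                       # hypothetical joint basis
X = (10 * np.outer(rng.normal(size=15), Q[:, 0])       # joint signal
     + 6 * np.outer(rng.normal(size=15), Q[:, 1])      # individual signal
     + 0.05 * rng.normal(size=(15, n)))                # noise
joint, individual, noise = final_decomposition(X, VJ, indiv_rank=1)
```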
AJIVE Applications
• Application 1: Spanish mortality data, males vs. females
• Application 2: Cancer genetics, gene expression vs. copy number vs. protein expression vs. mutation
• Application 3: Functional magnetic resonance imaging, behavior vs. brain connectivity
Revisit Spanish Mortality Data: Recall the PCA example and analysis above.
Revisit Spanish Mortality Data: Data blocks: Spanish males (ages as features, 0-95; years as subjects, 1908-2002) and Spanish females (ages as features).
AJIVE Analysis – Spanish Mortality Data: First joint component, male and female.
AJIVE Analysis – Spanish Mortality Data: Second joint component, male and female.
AJIVE Analysis – Spanish Mortality Data: Individual component, male.
AJIVE Analysis – Spanish Mortality Data: Individual component, female.
AJIVE Research in Progress
• Improved 1st thresholds (design a test?)
• Sub-joint blocks (e.g. pairs)
• Statistical inference
• Joint block as nuisance batch effects
• Supervised JIVE (vertical and/or horizontal)
Major Application: Cancer Genetics: Data source: The Cancer Genome Atlas; subset here: breast cancer. Figure: The Cancer Genome Atlas Research Network, Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. M., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J. M. (2013). The Cancer Genome Atlas Pan-Cancer analysis project.
TCGA Breast Cancer Data: Separate PCA on gene expression (GE): clear subtype effects (not surprising).
TCGA Breast Cancer Data: Separate PCA on gene expression (GE): PC1 flags Basals, PC2 flags LumAs.
TCGA Breast Cancer Data: Separate PCA on copy number (CN): other effects plus subtypes.
AJIVE Analysis – TCGA Breast Cancer Data: Gene expression (GE), 1st SVD threshold; suggested ranks: 11 / 16.
AJIVE Analysis – TCGA Breast Cancer Data: Copy number (CN), 1st SVD threshold; suggested ranks: 6 / 12.
AJIVE Analysis – TCGA Breast Cancer Data: Proteins (RPPA), 1st SVD threshold; suggested ranks: 8 / 15.
AJIVE Analysis – TCGA Breast Cancer Data: Mutations (Mut), 1st SVD threshold; suggested ranks: 12 / 16.
AJIVE Analysis – TCGA Breast Cancer Data: 4-way joint threshold; the 2nd component is close to the threshold.
AJIVE Analysis – TCGA Breast Cancer Data: 4-way joint threshold. After Step 3, only 1 joint component remains (the 2nd is strong in all blocks but Mut.), which fits with the biologist's impression of the mutation data.
AJIVE Analysis – TCGA Breast Cancer Data: The 4-way joint scores (CNS) flag LumAs; the effect is driven by all 4 factors together. Interesting point: mutation plays a role here.
AJIVE Analysis – TCGA Breast Cancer Data: Next issue: 3-way JIVE for GE-CN-RPPA. Approach: use the 3 individual blocks from above (GE, CN, RPPA) as inputs, perform JIVE, and study joint PC1.
AJIVE Analysis – TCGA Breast Cancer Data: Joint JIVE PC1 for GE-CN-RPPA (CNS): great separation of Basals, some separation of Her2s, with LumAs and LumBs combined. All are more distinct than in the raw data, and these differences are separate from mutation.
AJIVE Analysis – TCGA Breast Cancer Data: Now focus on pair-wise comparisons:
• AJIVE on GE-CN only
• Hypothesis test of Basals vs. others (DiProPerm Z-scores allow quantitation)
AJIVE Analysis – TCGA Breast Cancer Data: AJIVE on GE-CN, test of Basals vs. others (SEPARATE!): CN PCA view with group coloring.
AJIVE Analysis – TCGA Breast Cancer Data: AJIVE on GE-CN, test of Basals vs. others: better separation from DWD and orthogonal PC directions. How separated?
AJIVE Analysis – TCGA Breast Cancer Data: AJIVE on GE-CN, test of Basals vs. others: DiProPerm hypothesis test; statistical significance is summarized by a Z-score.
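DiProPerm (Direction-Projection-Permutation) runs in three stages: find a direction separating the two groups, project the data onto it, and compare a two-sample statistic against its permutation distribution. The sketch below uses the mean difference (MD) direction, matching the "(MD)" in the Z-score summary. It assumes the Z-score is (observed statistic - permutation mean) / permutation standard deviation. The function name, data, and number of permutations are hypothetical, and this is not the implementation behind the TCGA results.

```python
import numpy as np

def diproperm_md(X, labels, n_perm=200, seed=0):
    """DiProPerm sketch with the mean difference (MD) direction.

    X: d x n data (columns as data objects); labels: boolean array of length n.
    1. Direction: normalized difference of the two class mean vectors.
    2. Projection: project every data object onto that direction.
    3. Permutation: repeat 1-2 with shuffled labels.
    Statistic: difference of the projected class means. Returns the observed
    statistic and its Z-score against the permutation distribution.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    def stat(lab):
        w = X[:, lab].mean(axis=1) - X[:, ~lab].mean(axis=1)
        w /= np.linalg.norm(w)            # unit MD direction
        proj = w @ X                      # 1-d projections of all objects
        return proj[lab].mean() - proj[~lab].mean()

    observed = stat(labels)
    perm = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    z = (observed - perm.mean()) / perm.std(ddof=1)
    return observed, z

# Hypothetical data: 50 features, 80 objects, groups of 20 vs. 60
rng = np.random.default_rng(6)
labels = np.arange(80) < 20
X = rng.normal(size=(50, 80))
X[:5, labels] += 1.5                      # group difference in 5 features
obs, z = diproperm_md(X, labels)
print(round(z, 1))                        # large Z-score: strong separation
```

Note that with the MD direction the observed statistic equals the length of the mean difference vector, so the permutation step is what calibrates it. Recomputing the direction for each permutation is essential; reusing the observed direction would overstate significance in high dimensions.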
AJIVE Analysis – TCGA Breast Cancer Data: AJIVE on GE-CN, test of Basals vs. others. Summary of DiProPerm Z-scores (MD direction), Basal vs. other tumors; GE (16) - Copy Number (6), n = 790:
  GE (separate PCA): 32.44
  Copy Number (separate PCA): 9.62
  Joint GE: 78.30
  Joint Copy Number: 69.75
  Individual GE: 27.07
  Individual Copy Number: -0.30
• Separate PCA: GE >> CN (not surprising).
• Joint AJIVE >> separate: the AJIVE data combination helps!
• GE has information not in CN, but not vice versa.
AJIVE Analysis – TCGA Breast Cancer Data: To be continued … (Both Inter of Above, & Other Case & Scores Plots)
Functional Magnetic Resonance Imaging: 2015 SAMSI Program on Neuroscience; Qunqun Yu, Benjamin Risk.
Functional Magnetic Resonance Imaging: Approach: measure "brain activity" over both location and time, via blood flow. Data objects???
Functional Magnetic Resonance Imaging: Data objects? A 3-d movie? Time series at voxels? Thanks to fmri.ucsd.edu.
Functional Magnetic Resonance Imaging: Study data from:
Functional Magnetic Resonance Imaging: Separate PC1 of behavior features: view entries of the eigenvectors; important groups.
Functional Magnetic Resonance Imaging: Separate PC1 of image loadings: view entries of the eigenvectors; all up and down together. Also some hot spots?
Functional Magnetic Resonance Imaging: Separate PC2 of image loadings: interesting structure??? And the same hot spots?
Functional Magnetic Resonance Imaging: Previous mapping of regions of interest: the hot spots are working-memory related.
Functional Magnetic Resonance Imaging: Joint PC1 of image loadings (using behavior) combines the hot spots from separate PC1 and PC2.
Functional Magnetic Resonance Imaging: Overall comparison: separate behavior PC1 is associated with the image; joint image PC1 concentrates the relevant parts of separate PC1 and PC2.
Functional Magnetic Resonance Imaging: Overall comparison: the overall heat variation is not associated with behavior; a new 2nd mode of joint variation; behavior ~ separate PC2.