PCA to find clusters Return to PCA of
- Slides: 141
PCA to find clusters Return to PCA of Mass Flux Data: Big Question: Are The 3 Clusters Really There?
PCA to find clusters SiZer analysis of Mass Flux, PC 1: All 3 Significant
Statistical Smoothing • Usefulness of SiZer: Detailed & Insightful Analysis of 1 Dataset • To Analyze Many Data Sets, Need Automatic Choice • Reference: Jones, et al. (1996) • Also Recall Goldilocks Visual Approach: Too Small, Too Big, Just Right
Q-Q plots Simple Toy Example, non-Gaussian!
Q-Q plots Simple Toy Example, non-Gaussian(?)
Q-Q plots Simple Toy Example, Gaussian
Q-Q plots Simple Toy Example, Gaussian?
ROC Curve Slide Cutoff To Trace Out Curve
Q-Q plots Slide Cutoff To Trace Out Curve
Q-Q plots Illustrative graphic (toy data set): Empirical Qs near Theoretical Qs when Q-Q curve is near 45° line (general use of Q-Q plots)
Alternate Viewpoints • P-P Plot = ROC Curve: Study Differences Between Data Sets, Focus on Main Body of Distributions • Q-Q Plot: For Checking Empirical Distribution vs. Theoretical Distribution, Focus on Tails
Q-Q plots Gaussian? Departures from line?
Q-Q plots non-Gaussian! departures from line?
Q-Q plots non-Gaussian (?) departures from line?
Q-Q plots Gaussian: Stays Within Envelope As Expected, Since This Is Null Hypothesis
Q-Q plots Gaussian? departures from line?
Q-Q plots What were these distributions? • Non-Gaussian!: 0.5 N(-1.5, 0.75²) + 0.5 N(1.5, 0.75²) • Non-Gaussian (?): 0.4 N(0, 1) + 0.3 N(0, 0.5²) + 0.3 N(0, 0.25²) • Gaussian?: 0.7 N(0, 1) + 0.3 N(0, 0.5²)
Q-Q plots Non-Gaussian! 0.5 N(-1.5, 0.75²) + 0.5 N(1.5, 0.75²), True Density
Q-Q plots Non-Gaussian (?) 0.4 N(0, 1) + 0.3 N(0, 0.5²) + 0.3 N(0, 0.25²), Strong Kurtosis Now Visible
Q-Q plots Gaussian
Q-Q plots Gaussian? 0.7 N(0, 1) + 0.3 N(0, 0.5²), Less Kurtosis But Present
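The toy mixtures above can be reproduced with a short simulation. The following is a minimal sketch, not the lecture's code: the helper names `sample_mixture` and `qq_points`, the sample size, and the plotting positions (i + 0.5) / n are assumptions made for illustration. It pairs sorted data with quantiles of a Gaussian fitted by mean and standard deviation, so that points near the 45° line indicate agreement.

```python
import random
from statistics import NormalDist, mean, stdev

def sample_mixture(n, components, rng):
    """Draw n values from a Gaussian mixture given as (weight, mean, sd) triples."""
    weights = [w for w, _, _ in components]
    draws = []
    for _ in range(n):
        _, mu, sd = rng.choices(components, weights=weights)[0]
        draws.append(rng.gauss(mu, sd))
    return draws

def qq_points(data):
    """Pair Gaussian quantiles (fitted mean and sd) with the sorted data."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Plotting positions (i + 0.5) / n are one common convention (an assumption here).
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(data)))

rng = random.Random(0)
bimodal = [(0.5, -1.5, 0.75), (0.5, 1.5, 0.75)]   # the "Non-Gaussian!" mixture
mild = [(0.7, 0.0, 1.0), (0.3, 0.0, 0.5)]         # the "Gaussian?" mixture

for name, comps in [("bimodal", bimodal), ("mild", mild)]:
    pts = qq_points(sample_mixture(2000, comps, rng))
    # Largest vertical distance from the 45-degree line.
    gap = max(abs(emp - theo) for theo, emp in pts)
    print(name, round(gap, 3))
```

The (theoretical, empirical) pairs can be passed to any plotting tool; the printed gap is only a crude numerical summary of how far the curve strays from the line.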
Q-Q Envelope Plots Marron’s Matlab Software: qqLM.m, in General Directory
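One way to build such an envelope is by simulation under the Gaussian null hypothesis. The sketch below is an assumption about the general idea, not a port of qqLM.m: it standardizes each sample, simulates 100 Gaussian samples of the same size, and uses the pointwise minimum and maximum of the order statistics as the envelope. The names `gaussian_envelope` and `fraction_outside` are hypothetical.

```python
import random
from statistics import mean, stdev

def standardized_sorted(data):
    """Sort the data after centering and scaling by sample mean and sd."""
    m, s = mean(data), stdev(data)
    return sorted((x - m) / s for x in data)

def gaussian_envelope(n, n_sim, rng):
    """Pointwise min and max of order statistics over simulated Gaussian samples."""
    sims = [standardized_sorted([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_sim)]
    lower = [min(s[i] for s in sims) for i in range(n)]
    upper = [max(s[i] for s in sims) for i in range(n)]
    return lower, upper

def fraction_outside(data, lower, upper):
    """Share of order statistics that leave the simulated envelope."""
    z = standardized_sorted(data)
    outside = sum(1 for v, lo, hi in zip(z, lower, upper) if v < lo or v > hi)
    return outside / len(z)

rng = random.Random(1)
n = 500
lower, upper = gaussian_envelope(n, 100, rng)
gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
bimodal_sample = [rng.gauss(rng.choice([-1.5, 1.5]), 0.75) for _ in range(n)]
frac_gauss = fraction_outside(gaussian_sample, lower, upper)
frac_bimodal = fraction_outside(bimodal_sample, lower, upper)
print(frac_gauss, frac_bimodal)
```

A Gaussian sample should mostly stay inside the envelope, since that is the null hypothesis being simulated, while the bimodal mixture should leave it over a sizeable range of quantiles.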
Q-Q plots Variations on Q-Q Plots: • Can replace Gaussian with other distributions • Can compare 2 theoretical distributions • Can compare 2 empirical distributions • Could also vary P-P plots = ROC curves
Clustering Important References: • MacQueen (1967) • Hartigan (1975) • Gersho and Gray (1992) • Kaufman and Rousseeuw (2005) See Also: Wikipedia
K-means Clustering • Each goes into exactly 1 class
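The formulas on these slides did not survive in this transcript, so the following sketch rests on two stated assumptions: that the Cluster Index (CI) is the within-class sum of squares divided by the total sum of squares about the overall mean, and that K-means is fitted with the standard Lloyd iteration (assign each point to its nearest center, then recompute centers). The function names are hypothetical.

```python
import random

def centroid(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_index(points, labels):
    """Assumed CI: within-class sum of squares over total sum of squares."""
    overall = centroid(points)
    total = sum(sq_dist(p, overall) for p in points)
    within = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)
    return within / total

def k_means(points, k, rng, n_iter=100):
    """Lloyd iteration from k randomly chosen data points; each point gets exactly 1 class."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a class empties out
                centers[j] = centroid(members)
    return labels

# Two tight 3 x 3 grids of points, far apart.
blob_a = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
blob_b = [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(3) for j in range(3)]
data = blob_a + blob_b
labels = k_means(data, 2, random.Random(0))
print(labels, cluster_index(data, labels))
```

Under this definition CI lies in [0, 1], and values near 0 mean that almost all variation is between classes rather than within them.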
2-means Clustering Study CI, using simple 1-d examples • Varying Standard Deviation
2-means Clustering Study CI, using simple 1-d examples • Varying Standard Deviation • Varying Mean
2-means Clustering Study CI, using simple 1-d examples • Varying Standard Deviation • Varying Mean • Varying Proportion
2-means Clustering Study CI, using simple 1-d examples • Over changing Classes (moving boundary)
2-means Clustering Cluster Index for Clustering Greens & Blues
2-means Clustering Curve Shows CI for Many Reasonable Clusterings
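In one dimension, every reasonable 2-class clustering is a split of the sorted data at some boundary, so the CI curve can be computed by brute force. The sketch below assumes CI is the within-class sum of squares divided by the total sum of squares; `ci_curve` and `local_minimum_cuts` are hypothetical names, and the three-clump data set is made up for illustration.

```python
def ci_curve(values):
    """CI for every split of the sorted 1-d data into a left and a right class."""
    xs = sorted(values)
    n = len(xs)
    grand = sum(xs) / n
    total = sum((x - grand) ** 2 for x in xs)
    curve = []
    for cut in range(1, n):
        left, right = xs[:cut], xs[cut:]
        ml, mr = sum(left) / cut, sum(right) / (n - cut)
        within = (sum((x - ml) ** 2 for x in left)
                  + sum((x - mr) ** 2 for x in right))
        curve.append(within / total)
    return curve

def local_minimum_cuts(curve):
    """Left-class sizes at which the CI curve is strictly below its neighbours."""
    cuts = []
    for i, value in enumerate(curve):
        left = curve[i - 1] if i > 0 else float("inf")
        right = curve[i + 1] if i + 1 < len(curve) else float("inf")
        if value < left and value < right:
            cuts.append(i + 1)  # curve[i] corresponds to a left class of size i + 1
    return cuts

# Three tight clumps: a 2-class split has to merge two of them.
clumps = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
curve = ci_curve(clumps)
print(local_minimum_cuts(curve))
```

With three clumps and K = 2 the curve has more than one local minimum, which is the situation where an iterative fit can stop at a boundary that is not the global best.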
2-means Clustering Study CI, using simple 1-d examples • Over changing Classes (moving boundary) • Multi-modal data interesting effects: Can have 4 (or more) local mins (even in 1 dimension, with K = 2)
2-means Clustering Study CI, using simple 1-d examples • Over changing Classes (moving boundary) • Multi-modal data interesting effects: Local mins can be hard to find, i.e. iterative procedures can “get stuck” (even in 1 dimension, with K = 2) • Common, But Slippery, Approach: Many Random Restarts
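The random-restart idea can be sketched as follows, under stated assumptions: CI is the within-class sum of squares over the total sum of squares, each restart runs a standard Lloyd iteration from two randomly chosen data points, and the data set (two large clumps plus a small distant one) is invented to produce two stable solutions. All function names are hypothetical.

```python
import random

def ci(xs, labels):
    """Assumed CI: within-class sum of squares over total sum of squares."""
    grand = sum(xs) / len(xs)
    total = sum((x - grand) ** 2 for x in xs)
    within = 0.0
    for k in (0, 1):
        members = [x for x, lab in zip(xs, labels) if lab == k]
        if members:
            m = sum(members) / len(members)
            within += sum((x - m) ** 2 for x in members)
    return within / total

def two_means_1d(xs, rng, n_iter=100):
    """One Lloyd run for K = 2, started from two random data points."""
    centers = rng.sample(xs, 2)
    labels = None
    for _ in range(n_iter):
        new = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

def restart_search(xs, n_restarts, rng):
    """Best CI over all restarts, plus the distinct CI values that were reached."""
    results = [ci(xs, two_means_1d(xs, rng)) for _ in range(n_restarts)]
    return min(results), sorted(set(round(r, 6) for r in results))

data = ([0.002 * i for i in range(30)]          # clump near 0
        + [1 + 0.002 * i for i in range(30)]    # clump near 1
        + [4.0, 4.01, 4.02])                    # small distant clump
best, distinct = restart_search(data, 100, random.Random(0))
print(best, distinct)
```

In this example most starting points lead to the worse of the two stable solutions, which is why the approach is slippery: a small number of restarts can easily miss the better minimum, and the restart count needed is not known in advance.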
2-means Clustering Study CI, using simple 1-d examples • Effect of a single outlier?
2-means Clustering Minimum CI Splits in Half
2-means Clustering Already Have Local Minima
2-means Clustering Global CI Minimum Now Here
2-means Clustering Single Outlier Can Make CI Arbitrarily Small
2-means Clustering Study CI, using simple 1-d examples • Effect of a single outlier? Can create local minimum; Can also yield a global minimum; This gives a one point class; Can make CI arbitrarily small (really a “good clustering”???)
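The outlier effect is easy to check numerically. This is a minimal sketch, assuming CI is the within-class sum of squares over the total sum of squares; the evenly spaced bulk of 20 points and the outlier positions are made-up values, and `ci_two_class` is a hypothetical helper.

```python
def ci_two_class(left, right):
    """Assumed CI for a given 2-class split of 1-d data."""
    values = left + right
    grand = sum(values) / len(values)
    total = sum((x - grand) ** 2 for x in values)
    within = 0.0
    for group in (left, right):
        m = sum(group) / len(group)
        within += sum((x - m) ** 2 for x in group)
    return within / total

bulk = [-1.0 + 2.0 * i / 19 for i in range(20)]   # 20 evenly spaced points in [-1, 1]
rows = []
for outlier in (1.5, 5.0, 50.0, 500.0):
    one_point = ci_two_class(bulk, [outlier])               # outlier as its own class
    half = ci_two_class(bulk[:10], bulk[10:] + [outlier])   # split the bulk in half
    rows.append((outlier, one_point, half))
    print(outlier, round(one_point, 5), round(half, 5))
```

With the outlier close to the bulk, splitting in half gives the smaller CI; as the outlier moves away, the one-point class takes over and its CI shrinks toward 0, even though nothing about the bulk has changed.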
K-Means Clustering 2-d Toy Example, Recall From Before (When Studying Kernel PCA): • Long Thin Cluster • Close Round Clusters • Outliers or Clusters???
K-Means Clustering 2-d Toy Example: K-Means Can Be Slippery, Local Minimum?
K-Means Clustering 2-d Toy Example, No Outliers: K-Means Can Be Slippery, Careful About Local Minima
SWISS Score Another Application of CI (Cluster Index), Cabanski et al (2010) • Idea: Use CI in bioinformatics to “measure quality of data preprocessing” • Philosophy: Clusters Are Scientific Goal, So Want to Accentuate Them
SWISS Score Toy Examples (2-d): Which are “More Clustered?”
SWISS Score Revisit Toy Examples (2-d): Which are “More Clustered?”
SWISS Score K-Class SWISS: Instead of using K-Class CI, Use Average of Pairwise SWISS Scores (Preserves [0, 1] Range)
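The averaging step can be sketched as below. Two assumptions are made, since the defining slides are missing from this transcript: the 2-class SWISS score is taken to be CI (within-class over total sum of squares) evaluated with the known class labels, and each pairwise score uses only the data from the two classes being compared. Function names and the 2-d toy points are hypothetical.

```python
from itertools import combinations

def centroid(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def swiss(points, labels):
    """Assumed SWISS: within-class sum of squares over total sum of squares."""
    overall = centroid(points)
    total = sum(sq_dist(p, overall) for p in points)
    within = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)
    return within / total

def average_pairwise_swiss(points, labels):
    """Mean of the 2-class scores over all pairs of classes."""
    scores = []
    for a, b in combinations(sorted(set(labels)), 2):
        keep = [(p, lab) for p, lab in zip(points, labels) if lab in (a, b)]
        scores.append(swiss([p for p, _ in keep], [lab for _, lab in keep]))
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (4.0, 0.0), (4.2, 0.2), (3.9, 0.1),
          (2.0, 3.0), (2.1, 3.3), (1.8, 3.1)]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
score = average_pairwise_swiss(points, labels)
scrambled = average_pairwise_swiss(points, ["a", "b", "c"] * 3)
print(round(score, 4), round(scrambled, 4))
```

Each pairwise score is a ratio in [0, 1], so their average stays in [0, 1] as well; labels that match the visible groups give a score near 0, and scrambled labels give a score near 1.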
SWISS Score Avg. Pairwise SWISS – Toy Examples
SWISS Score Additional Feature: ∃ Hypothesis Tests: • H1: SWISS_1 < 1 • H1: SWISS_1 < SWISS_2 • Permutation Based, See Cabanski et al (2010)
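A generic permutation test for the first alternative might look like the sketch below. This is an illustration of the permutation idea only and is not claimed to be the procedure in Cabanski et al (2010): it assumes SWISS is the within-class over total sum of squares for the given labels, shuffles the labels to mimic "no real class structure", and reports the share of shuffles that score at least as low as the observed labels. The 1-d data and the names `swiss_1d` and `permutation_p_value` are hypothetical.

```python
import random

def swiss_1d(values, labels):
    """Assumed SWISS for 1-d data: within-class over total sum of squares."""
    grand = sum(values) / len(values)
    total = sum((x - grand) ** 2 for x in values)
    within = 0.0
    for k in set(labels):
        members = [x for x, lab in zip(values, labels) if lab == k]
        m = sum(members) / len(members)
        within += sum((x - m) ** 2 for x in members)
    return within / total

def permutation_p_value(values, labels, n_perm, rng):
    """Share of label shuffles whose score is at least as small as the observed one."""
    observed = swiss_1d(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if swiss_1d(values, shuffled) <= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate away from an exact zero.
    return (hits + 1) / (n_perm + 1)

values = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
labels = [0] * 10 + [1] * 10
p_value = permutation_p_value(values, labels, 199, random.Random(0))
print(p_value)
```

Comparing two preprocessing methods (the second alternative) would need a different permutation scheme, which this sketch does not attempt.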
Clustering • A Very Large Area • K-Means is Only One Approach • Has its Drawbacks (Many Toy Examples of This) • ∃ Many Other Approaches • Important (And Broad) Class: Hierarchical Clustering
Hierarchical Clustering Idea: Consider Either: • Bottom Up Aggregation: One by One Combine Data • Top Down Splitting: All Data in One Cluster & Split • Through Entire Data Set, to get Dendrogram
Hierarchical Clustering Aggregate or Split, to get Dendrogram. Thanks to US EPA: water.epa.gov
Hierarchical Clustering Aggregate or Split, to get Dendrogram. Aggregate: Start With Individuals
Hierarchical Clustering Aggregate or Split, to get Dendrogram. Aggregate: Start With Individuals, Move Up
Hierarchical Clustering Aggregate or Split, to get Dendrogram. Aggregate: Start With Individuals, Move Up, End Up With All in 1 Cluster
Hierarchical Clustering Aggregate or Split, to get Dendrogram. Split: Start With All in 1 Cluster
Hierarchical Clustering Aggregate or Split, to get Dendrogram. Split: Start With All in 1 Cluster, Move Down
Hierarchical Clustering Aggregate or Split, to get Dendrogram. Split: Start With All in 1 Cluster, Move Down, End With Individuals
Hierarchical Clustering Aggregate or Split, to get Dendrogram. While Result Is Same, There Are Computational Considerations
Hierarchical Clustering • A Lot of “Art” Involved
Hierarchical Clustering Dendrogram Interpretation: Branch Length Reflects Cluster Strength
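Bottom-up aggregation can be sketched in a few lines. The slides do not name a linkage rule, so single linkage (distance between the closest members of two clusters) is an assumption here, chosen because it is the simplest to write down; `single_linkage` and the 1-d data are hypothetical. The recorded merge heights are what a dendrogram draws on its vertical axis.

```python
def single_linkage(values):
    """Agglomerate 1-d points one merge at a time, recording (height, merged cluster)."""
    clusters = [[v] for v in values]   # start with individuals
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((d, sorted(merged)))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges                      # the last entry holds all data in 1 cluster

merges = single_linkage([0.0, 0.1, 0.4, 5.0, 5.2])
for height, members in merges:
    print(round(height, 3), members)
```

A cluster that forms at a low height and is not absorbed until a much greater height has a long branch in the dendrogram, which is the sense in which branch length reflects cluster strength. Other linkage rules (complete, average, Ward) change the heights and can change the tree.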
Hierarchical Clustering 2-d Toy Example, Recall From Before (When Studying Kernel PCA): • Long Thin Cluster • Close Round Clusters • Outliers or Clusters???
Participant Presentation • Ram Basak: FDA on Health Outcomes • Siqi Xiang: Analysis of Knee Osteoarthritis Data: Auto transformation and BET • Nicolas Wolczynski: Urban Sound Classification • Mingyi Wang: Symbolic Data principal component analysis