Simplicial Depth An Improved Definition Analysis and Efficiency
- Slides: 22
Simplicial Depth: An Improved Definition, Analysis, and Efficiency in the Finite Sample Case Michael A. Burr, Eynat Rafalin, and Diane L. Souvaine Tufts University www. cs. tufts. edu/research/geometry CCCG 2004 NSF grant #EIA-99 -96237
Introduction • Introduction to Data Depth – Why? – Examples – Desirable Properties • Simplicial Depth – Definition – Properties – Problems • Revised Definition – Properties • Ongoing work
What is Data Depth and Why? • Measures how deep (central) a given point is relative to a distribution or a data cloud. – Deals with the shape of the data. – Can be thought of as a measure of how well a point characterizes a data set • Provides an alternative to classical statistical analysis. – No assumption about the underlying distribution of the data. • Deals with outliers. • Why study? – Many measures are geometric in nature. – Can be computationally expensive to compute depth.
Examples • • Half-Space (Tukey, Location) (Tukey 75) Regression Depth (Rousseeuw and Hubert 94) Simplicial Depth (Liu 90) … and many more. 2 Data Points in this Half-plane 3 Data Points in this Half-plane
Desirable Properties of Data Depth • Liu (90) / Serfling and Zuo (00) – P 1 – Affine Invariance – P 2 – Maximality at Center – P 3 – Monotonicity Relative to Deepest Point – P 4 – Vanishing at Infinity • We propose (BRS 04) – P 5 – Invariance Under Dimensions Change
Affine Invariance (P 1) A – affine transformation
Maximality at Center (P 2) p is the center q is any point
Monotonicity Relative to Deepest Point (P 3) point between p and q p is the deepest point q is any point
Vanishing at Infinity (P 4) q is far from the data cloud
Invariance Under Dimensions Change (P 5) Is this an data set?
Simplicial Depth (Liu 90) • The simplicial depth of a point p with respect to a probability distribution F in is the probability that a random closed simplex in contains p. where is a closed simplex formed by d+1 random observations from F. • The simplicial depth of a point p with respect to a data set in is the fraction of closed simplicies formed by d+1 points of S containing p. where I is the indicator function.
Sample Version of Simplicial Depth • The simplicial depth of a point p with respect to a data set in is the fraction of closed simplicies formed by d+1 points of S containing p. 6 (3) =20 Total number of simplicies= . 2 . 3. 3 . 2 p is contained in 6 simplicies 6 The depth of p= __ =. 3 20 . 3. 3. 2 . 3 . 4 . 4. 4. 3. 2 . 4. 3. 3 . 3. 2
Properties • Is a statistical depth function in the continuous case. (Liu 90) • Is affine invariant (P 1) and vanishes at infinity (P 4) in the sample case. (Serfling and Zuo 00)
Problems in the Sample Case • Does not always attain maximality at the center (P 2) and does not always have monotonicity relative to the deepest point (P 3). (Serfling and Zuo 00) • The depth on the boundary of cells is at least the depth in each of the adjacent cells – causes discontinuities. • Does not have invariance under dimensions change (P 5).
Simplicial Depth (Liu 90) (BRS 04). 6. 3 B . 4. 3 E . 8. 5. 4 X. 5. 35 . 3. 6 C . 3. 5 . 4 Y . 7. 4 . 6. 3 . 4 . 3 . 6. 3 D Averaging number of closed and open simplicies containing a point A Total number of simplicies = (53 ) = 10
Revised Definition (BRS 04) • The simplicial depth of a point p with respect to a data set in is the average of the fraction of closed simplicies containing p and the fraction of open simplicies containing p, formed by d+1 points of S. • Equivalently • - the fraction of simplicies with data points as vertices which contain p in their open interior. • - the fraction of simplicies with data points as vertices which contain p in their boundary.
Properties of the Revised Definition • Reduces to the original definition, for continuous distributions and for points lying in the interior of cells. • Keeps ranking order of data points • Can be calculated using the existing algorithms, with slight modifications. • Fixes Zuo and Serfling’s counterexamples. • The depth on the boundary of two cells is the average of the two adjacent cells. • Invariant under dimensions change (P 5) for the change from to.
Invariance Under Dimension Change (P 5) • Degenerate simplicies – Both points C and A (a point between B and C) lie within the open (degenerate) simplex BCD – think of it as a very thin triangle. – Both points B and D are vertices of the (degenerate) simplex BCD. • For a point, p, consider the ratio: – For both definitions, the ratio for a position (non-data point) is 2/3. – For Liu’s definition, the ratio for a data point is not 2/3. – For the BRS definition, the ratio for a data point is 2/3.
Remaining Problems (P 2 and P 3)
Remaining Problems (Data Points) • Data points are still over counted – there can still be discontinuities at data points. However, to fix the depth at data points, more features need to be considered. – Data points are inherently part of simplicies (a point makes a triangle with every other pair of points) and edges are inherently part of simplicies (the two endpoints of an edge make a triangle with every other vertex). – To retain invariance under dimensions change (P 5), given a data set in , which lies on a d-flat, then the depth of a point when the data set is evaluated as a d-dimensional data set should be a multiple of the depth when the data set is evaluated as a b-dimensional data set. • Neither of the above ideas completely solve the problem and it appears that the best solutions take into account the geometry of the entire data set.
Ongoing Work • The current algorithm for finding the median (the deepest point) is O(n 4) to walk an arrangement of O(n 2) segments. – We can improve this algorithm by comparing simplicial depth and half-space depth. – We are further improving this by considering simplicial depth in the dual. • The problems with data points are improved by generalizing this work to higher dimensions. • To find the depth at all points, we are using local information to form an approximation for the depth measure.
References • • G. Aloupis, C. Cortes, F. Gomez, M. Soss, and G. Toussaint. Lower bounds for computing statistical depth. Computational Statistics & Data Analysis, 40(2): 223229, 2002. G. Aloupis, S. Langerman, M. Soss, and G. Toussaint. Algorithms for bivariate medians and a Fermat-Torricelli problem for lines. In Proc. 13 th CCCG, pages 2124, 2001. M. Burr, E. Rafalin, and D. L. Souvaine. Simplicial depth: An improved definition, analysis, and efficiency for the sample case. Technical report 2003 -28, DIMACS, 2003. A. Y. Cheng and M. Ouyang. On algorithms for simplicial depth. In Proc. 13 th CCCG, pages 53 -56, 2001. J. Gil, W. Steiger, and A. Wigderson. Geometric medians. Discrete Math. , 108(13): 37 -51, 1992. Topological, algebraical and combinatorial structures. Frolík's memorial volume. S. Khuller and J. S. B. Mitchell. On a triangle counting problem. Inform. Process. Lett. , 33(6): 319 -321, 1990. R. Liu. On a notion of data depth based on random simplices. Ann. of Statist. , 18: 405 -414, 1990. Y. Zuo and R. Serfling. General notions of statistical depth function. Ann. Statist. , 28(2): 461 -482, 2000.
- Refraction through a semi-circular glass block
- Productive inefficiency and allocative inefficiency
- Productively efficient vs allocatively efficient
- Allocative efficiency vs productive efficiency
- Unrivalled meaning in urdu
- Theoretical analysis of time efficiency
- Zero based channel
- Fundamentals of the analysis of algorithm efficiency
- Basic efficiency classes
- Variation definition in floral design
- Beda stjernschantz pastoral
- Improved sej drawing 6914
- Howland current source
- An improved illumination model for shaded display
- Plant breeding for improved food quality
- 52 kg sej drawing
- If we can't measure it we can't manage it
- Coaching for improved performance
- Improved euler formula
- Conventional sej
- Rt 6902
- Improved sej drawing 6902
- Improved power grid