Distance Metric
Measures the dissimilarity between two data points. A metric is a function d of two points X and Y such that:
- $d(X, Y)$ is positive definite: if $X \ne Y$, $d(X, Y) > 0$; if $X = Y$, $d(X, Y) = 0$
- $d(X, Y)$ is symmetric: $d(X, Y) = d(Y, X)$
- $d(X, Y)$ satisfies the triangle inequality: $d(X, Y) + d(Y, Z) \ge d(X, Z)$

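Standard Distance Metrics
- Minkowski distance or $L_p$ distance: $d_p(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
- Manhattan distance ($p = 1$): $d_1(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$
- Euclidean distance ($p = 2$): $d_2(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
- Max distance ($p = \infty$): $d_\infty(X, Y) = \max_{i=1}^{n} |x_i - y_i|$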

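An Example
A two-dimensional space with X = (2, 1) and Y = (6, 4); Z is the right-angle corner of the triangle XZY, so that XZ = 4 and ZY = 3.
- Manhattan: $d_1(X, Y) = XZ + ZY = 4 + 3 = 7$
- Euclidean: $d_2(X, Y) = XY = 5$
- Max: $d_\infty(X, Y) = \max(XZ, ZY) = XZ = 4$
Here $d_1 \ge d_2 \ge d_\infty$. For any positive integer p, …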
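The three values in this example can be checked with a short sketch. The function name `minkowski` and its signature are illustrative choices rather than anything defined in these notes, and `math.inf` is assumed to stand for the max distance.

```python
import math

def minkowski(x, y, p):
    """Lp distance between two equal-length points; p = math.inf gives the max distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

X, Y = (2, 1), (6, 4)
print("Manhattan:", minkowski(X, Y, 1))
print("Euclidean:", minkowski(X, Y, 2))
print("Max:      ", minkowski(X, Y, math.inf))
```

For these two points the sketch should reproduce the slide's 7, 5, and 4.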

HOBbit Similarity
These notes contain NDSU confidential & proprietary material. Patents pending on bSQ, Ptree technology.
Higher Order Bit (HOBbit) similarity: $\mathrm{HOBbitS}(A, B) = \max \{ s : 0 \le s \le m,\ a_i = b_i \text{ for all } 1 \le i \le s \}$
- A, B: two scalars (integers)
- $a_i$, $b_i$: ith bit of A and B (left to right)
- m: number of bits
Example (bit positions 1 2 3 4 5 6 7 8):
x1: 0 1 1 0 0 1
y1: 0 1 1 1 0 1
x2: 0 1 1 1 0 1
y2: 0 1 0 0
HOBbitS(x1, y1) = 3
HOBbitS(x2, y2) = 4

HOBbit Distance (High Order Bifurcation bit)
HOBbit distance between two scalar values A and B: $d_v(A, B) = m - \mathrm{HOBbitS}(A, B)$
Example (bit positions 1 2 3 4 5 6 7 8):
x1: 0 1 1 0 0 1
y1: 0 1 1 1 0 1
x2: 0 1 1 1 0 1
y2: 0 1 0 0
HOBbitS(x1, y1) = 3, so $d_v(x_1, y_1) = 8 - 3 = 5$
HOBbitS(x2, y2) = 4, so $d_v(x_2, y_2) = 8 - 4 = 4$
HOBbit distance for X and Y: $d_h(X, Y) = \max_{i=1}^{n} d_v(x_i, y_i)$
In our example (considering 2-dimensional data): $d_h(X, Y) = \max(5, 4) = 5$
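A minimal sketch of HOBbit similarity and distance for m-bit non-negative integers follows. The function names are illustrative, and the 8-bit patterns are hypothetical values chosen so that the similarities come out as 3 and 4, matching the results quoted in the example; they are not claimed to be the slide's original bit strings. The sketch relies on the observation that the highest set bit of `a XOR b` marks the first position (from the left) where the two values differ.

```python
def hobbit_similarity(a, b, m=8):
    """Number of leading (most significant) bits on which two m-bit integers agree."""
    if not (0 <= a < 2 ** m and 0 <= b < 2 ** m):
        raise ValueError("values must fit in m bits")
    # bit_length of the XOR counts the bits from the first mismatch to the end.
    return m - (a ^ b).bit_length()

def hobbit_distance_scalar(a, b, m=8):
    """d_v(A, B) = m - HOBbitS(A, B)."""
    return m - hobbit_similarity(a, b, m)

def hobbit_distance(x, y, m=8):
    """d_h(X, Y): the largest per-coordinate scalar HOBbit distance."""
    return max(hobbit_distance_scalar(a, b, m) for a, b in zip(x, y))

# Hypothetical 8-bit values (not taken from the slide).
x = (0b01101001, 0b01011101)
y = (0b01111101, 0b01010000)

print([hobbit_similarity(a, b) for a, b in zip(x, y)])
print([hobbit_distance_scalar(a, b) for a, b in zip(x, y)])
print(hobbit_distance(x, y))
```

Using XOR and `bit_length` avoids an explicit bit-by-bit loop; a loop comparing bits from the left would give the same result.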

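HOBbit Distance Is a Metric
- HOBbit distance is positive definite: if $X = Y$, $d_h(X, Y) = 0$; if $X \ne Y$, $d_h(X, Y) > 0$
- HOBbit distance is symmetric: $d_h(X, Y) = d_h(Y, X)$
- HOBbit distance satisfies the triangle inequality: $d_h(X, Y) + d_h(Y, Z) \ge d_h(X, Z)$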
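The three properties can be checked exhaustively on a small domain. The sketch below is a brute-force sanity check, not a proof: it assumes 2-dimensional points with 3-bit coordinates (a size picked only to keep the triple loop cheap), and the helper names `dv`, `dh`, and `is_metric` are illustrative. It uses the fact that for m-bit values, m minus the number of matching leading bits equals the bit length of the XOR.

```python
from itertools import product

M = 3  # bits per coordinate; small so that all triples can be enumerated

def dv(a, b):
    """Scalar HOBbit distance: equals m - HOBbitS(a, b) for m-bit values."""
    return (a ^ b).bit_length()

def dh(x, y):
    """HOBbit distance between points: max over coordinates."""
    return max(dv(a, b) for a, b in zip(x, y))

def is_metric(d, pts):
    """Check positive definiteness, symmetry, and the triangle inequality on pts."""
    table = {(p, q): d(p, q) for p in pts for q in pts}
    for p in pts:
        for q in pts:
            if (table[p, q] == 0) != (p == q):
                return False
            if table[p, q] != table[q, p]:
                return False
    return all(table[p, r] <= table[p, q] + table[q, r]
               for p in pts for q in pts for r in pts)

points = list(product(range(2 ** M), repeat=2))
print(is_metric(dh, points))
```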

Neighborhood of a Point
The neighborhood of a target point T is a set of points S such that $X \in S$ if and only if $d(T, X) \le r$. If X is a point on the boundary, $d(T, X) = r$.
[Figure: neighborhoods of T, each of width 2r, under Manhattan, Euclidean, Max, and HOBbit distance]
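As a rough illustration of how the neighborhood depends on the metric, the sketch below collects the integer grid points within radius r of a target under three of the distances. The grid, the radius, and all function names are assumptions made for the example.

```python
import math

def manhattan(t, x):
    return sum(abs(a - b) for a, b in zip(t, x))

def euclidean(t, x):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, x)))

def max_distance(t, x):
    return max(abs(a - b) for a, b in zip(t, x))

def neighborhood(d, target, r, candidates):
    """All candidate points X with d(target, X) <= r."""
    return {x for x in candidates if d(target, x) <= r}

T, r = (0, 0), 3
grid = [(i, j) for i in range(-4, 5) for j in range(-4, 5)]
hoods = {d.__name__: neighborhood(d, T, r, grid)
         for d in (manhattan, euclidean, max_distance)}
for name, hood in hoods.items():
    print(name, len(hood))
```

On this grid the Manhattan neighborhood is contained in the Euclidean one, which in turn is contained in the Max neighborhood (the diamond, disk, and square of the figure).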

Decision Boundary
The decision boundary between points A and B is the locus of the points X satisfying $d(A, X) = d(B, X)$.
The decision boundary for HOBbit distance is perpendicular to the axis that makes the max distance.
[Figure: decision boundaries between A and B, separating regions R1 and R2, for Manhattan, Euclidean, and Max distance, shown for the cases > 45 and < 45]
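The boundary depends on the metric: a point that is equidistant from A and B under one distance need not be under another. The sketch below uses hypothetical points A, B, and X (not taken from the figure) and an illustrative helper `side`; exact equality on floats is acceptable here only because the chosen coordinates give exact values.

```python
import math

def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def max_distance(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

def side(d, a, b, x):
    """Return 'A' or 'B' for the nearer point, or 'boundary' if x is equidistant."""
    da, db = d(a, x), d(b, x)
    if da == db:
        return "boundary"
    return "A" if da < db else "B"

# Hypothetical points chosen so the three metrics disagree.
A, B, X = (0, 0), (4, 2), (0, 5)
for d in (manhattan, euclidean, max_distance):
    print(d.__name__, side(d, A, B, X))
```

Here X sits on the Euclidean boundary, falls on A's side under Manhattan distance, and on B's side under Max distance.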

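Minkowski Metrics
$L_p$-metrics (aka Minkowski metrics): $d_p(X, Y) = \left( \sum_{i=1}^{n} w_i |x_i - y_i|^p \right)^{1/p}$ (weights $w_i$ assumed = 1)
Unit disk boundaries:
- p = 1 (Manhattan)
- p = 2 (Euclidean)
- p = 3, 4, ...
- p = ∞ (chessboard)
- p = 1/2, 1/3, 1/4, ... ?
$d_{\max} \equiv \max_i |x_i - y_i| = d_\infty \equiv \lim_{p \to \infty} d_p(X, Y)$.
Proof (sort of): we want $\lim_{p \to \infty} \left( \sum_{i=1}^{n} a_i^p \right)^{1/p} = \max(a_i) \equiv b$. For p large enough, the other $a_i^p \ll b^p$, since $y = x^p$ is increasingly convex, so $\sum_{i=1}^{n} a_i^p \approx k\, b^p$ (k = multiplicity of b in the sum), so $\left( \sum_{i=1}^{n} a_i^p \right)^{1/p} \approx k^{1/p}\, b$ and $k^{1/p} \to 1$.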
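The informal argument can be tightened with a two-sided bound. This is the standard squeeze argument rather than something stated in these notes; it assumes nonnegative $a_i$ with $b = \max_i a_i$, and uses only that each term satisfies $a_i^p \le b^p$ while at least one term equals $b^p$.

```latex
\[
b \;=\; \left(b^{p}\right)^{1/p}
\;\le\; \left(\sum_{i=1}^{n} a_i^{p}\right)^{1/p}
\;\le\; \left(n\, b^{p}\right)^{1/p}
\;=\; n^{1/p}\, b
\;\xrightarrow[\;p \to \infty\;]{}\; b .
\]
```

Since both ends converge to b, the middle term does as well.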

Lp metrics, p > 1
Lq distance from x = (x1, x2) to y = (0, 0) for increasing q (MAX is the max distance):
x = (.5, .5): q = 2: .7071067812; q = 4: .5946035575; q = 9: .5400298694; q = 100: .503477775; MAX: .5
x = (.71, .71): q = 2: 1.0; q = 3: .8908987181; q = 7: .7807091822; q = 100: .7120250978; MAX: .7071067812
x = (.99, .99): q = 2: 1.4000714267; q = 8: 1.0796026553; q = 100: .9968859946; q = 1000: .9906864536; MAX: .99
x = (1, 1): q = 2: 1.4142135624; q = 9: 1.0800597389; q = 100: 1.0069555501; q = 1000: 1.0006933875; MAX: 1
x = (.9, .1): q = 2: .9055385138; larger q: .900003, .9, .9; MAX: .9
x = (3, 3): q = 2: 4.2426406871; q = 3: 3.7797631497; q = 8: 3.271523198; q = 100: 3.0208666502; MAX: 3
x = (90, 45): q = 6: 90.232863532; q = 9: 90.019514317; q = 100: 90; MAX: 90
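The convergence toward the max distance can be reproduced numerically. The sketch below recomputes the case x = (.5, .5), y = (0, 0) for q = 2, 4, 9, 100; the name `lq_distance` is illustrative, and very large q can underflow in floating point for small coordinates, which this sketch does not guard against.

```python
def lq_distance(x, y, q):
    """Unweighted Lq distance: (sum |x_i - y_i|^q)^(1/q)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

x, y = (0.5, 0.5), (0.0, 0.0)
results = {q: lq_distance(x, y, q) for q in (2, 4, 9, 100)}
for q, dist in results.items():
    print(f"q = {q:>3}: {dist:.10f}")
print(f"MAX    : {max(abs(a - b) for a, b in zip(x, y))}")
```

The distances decrease as q grows and approach the max distance .5 from above.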

Lp metrics, p < 1
Lq distance from x = (x1, x2) to y = (0, 0) for decreasing q, with q = 2 listed last for comparison:
x = (.1, .1): q = .8: .238; q = .4: .566; q = .2: 3.2; q = .1: 102; q = .04: 3355443; q = .02: 112589990684263; q = .01: 1.2676E+29; q = 2: .141421356
x = (.5, .5): q = .8: 1.19; q = .4: 2.83; q = .2: 16; q = .1: 512; q = .04: 16777216; q = .02: 5.63E+14; q = .01: 6.34E+29; q = 2: .7071
x = (.9, .1): q = .8: 1.098; q = .4: 2.1445; q = .2: 10.82; q = .1: 326.27; q = .04: 10312196.962; q = .02: 341871052443154; q = .01: 3.8E+29; q = 2: .906
$d_{1/p}(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{1/p} \right)^{p}$
For p = 0 (the limit as p → 0), the Lp distance does not exist (it does not converge).
[Figure: unit disks for Lp with p < 1]
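The growth for small exponents is easy to see with the $d_{1/p}$ form directly. The sketch below evaluates it for x = (.5, .5), y = (0, 0) with integer p, which corresponds to q = 1/p in the listing; the function name `d_inv_p` is an illustrative choice. For two equal coordinates c the value is simply $2^p c$, so it doubles with every unit increase in p.

```python
def d_inv_p(x, y, p):
    """d_{1/p}(X, Y) = (sum |x_i - y_i|^(1/p))^p, i.e. the Lq form with q = 1/p."""
    return sum(abs(a - b) ** (1.0 / p) for a, b in zip(x, y)) ** p

x, y = (0.5, 0.5), (0.0, 0.0)
values = {p: d_inv_p(x, y, p) for p in (1, 5, 10, 25)}
for p, v in values.items():
    print(f"p = {p:>2} (q = {1 / p:g}): {v:g}")
```

The values for p = 5, 10, and 25 correspond to the q = .2, .1, and .04 entries for x = (.5, .5), up to floating-point rounding.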

Min dissimilarity function
The $d_{\min}$ function, $d_{\min}(X, Y) = \min_{i=1}^{n} |x_i - y_i|$, is strange. It is not even a pseudo-metric.
[Figure: the unit disk of $d_{\min}$]
The neighborhood of the blue point relative to the red point (the set of points closer to the blue point than to the red point) is strangely shaped!
[Figure: neighborhood of the blue point relative to the red point under $d_{\min}$]
http://www.cs.ndsu.nodak.edu/~serazi/research/Distance.html
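One way to see that the min function is not a pseudo-metric is a concrete triangle-inequality violation. The three points below are hypothetical, picked so that each consecutive pair shares a coordinate; the function name `d_min` is illustrative.

```python
def d_min(x, y):
    """Smallest coordinate-wise absolute difference."""
    return min(abs(a - b) for a, b in zip(x, y))

X, Y, Z = (0, 0), (0, 1), (1, 1)
print("d(X, Y) =", d_min(X, Y))  # X and Y share the first coordinate
print("d(Y, Z) =", d_min(Y, Z))  # Y and Z share the second coordinate
print("d(X, Z) =", d_min(X, Z))  # X and Z share no coordinate
print("triangle inequality holds:", d_min(X, Z) <= d_min(X, Y) + d_min(Y, Z))
```

A pseudo-metric may assign distance 0 to distinct points, but it must still satisfy the triangle inequality, which fails here.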

Other Interesting Metrics
- Canberra metric: $d_c(X, Y) = \sum_{i=1}^{n} |x_i - y_i| / (x_i + y_i)$ (normalized Manhattan distance)
- Squared Chord metric: $d_{sc}(X, Y) = \sum_{i=1}^{n} \left( \sqrt{x_i} - \sqrt{y_i} \right)^2$ (already discussed as $L_p$ with p = 1/2)
- Squared Chi-squared metric: $d_{chi}(X, Y) = \sum_{i=1}^{n} (x_i - y_i)^2 / (x_i + y_i)$
- Scalar Product metric: $X \bullet Y = \sum_{i=1}^{n} x_i y_i$
- Hyperbolic metrics (which map infinite space 1-1 onto a sphere)
Which are rotationally invariant? Translation invariant? Other?
Some notes on distance functions can be found at http://www.cs.ndsu.NoDak.edu/~datasurg/distance_similarity.pdf
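The first four functions can be sketched in a few lines. The implementations below assume strictly positive coordinates (so that the divisions and square roots are defined) and use the usual textbook forms of the Canberra, squared chord, and squared chi-squared distances; the function names and the sample points are illustrative.

```python
import math

def canberra(x, y):
    # Assumes x_i + y_i > 0 for every coordinate.
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))

def squared_chord(x, y):
    # Assumes nonnegative coordinates so the square roots are real.
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))

def squared_chi_squared(x, y):
    # Assumes x_i + y_i > 0 for every coordinate.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y))

def scalar_product(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hypothetical sample points.
X, Y = (1.0, 4.0), (4.0, 1.0)
for f in (canberra, squared_chord, squared_chi_squared, scalar_product):
    print(f.__name__, f(X, Y))
```

Note that the scalar product grows with similarity rather than shrinking, and X • X is generally nonzero, so it behaves as a similarity score rather than a distance in the sense of the axioms at the start of these notes.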