Distance & Similarity Measures
- Slides: 14
![Distance Similarity Measures Distance & Similarity Measures](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-1.jpg)
Distance & Similarity Measures
![Similarity and Dissimilarity Similarity Numerical measure of how alike two data objects Similarity and Dissimilarity • Similarity – Numerical measure of how alike two data objects](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-2.jpg)
Similarity and Dissimilarity

- Similarity
  - Numerical measure of how alike two data objects are.
  - Higher when objects are more alike.
  - Often falls in the range [0, 1].
- Dissimilarity
  - Numerical measure of how different two data objects are.
  - Lower when objects are more alike.
![Data structures Data structures](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-3.jpg)
Data structures
![Euclidean Distance Euclidean Distance Where n is the number of dimensions attributes and Euclidean Distance • Euclidean Distance Where n is the number of dimensions (attributes) and](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-4.jpg)
Euclidean Distance

$$d(p, q) = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$$

where $n$ is the number of dimensions (attributes) and $p_k$ and $q_k$ are, respectively, the $k$th attributes (components) of data objects $p$ and $q$.

- Standardization is necessary if the attribute scales differ.
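The formula can be sketched in a few lines of Python. The function name `euclidean_distance` and the sample points are illustrative choices, not part of the slides.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of attributes")
    # Sum the squared per-attribute differences, then take the square root.
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))

# Illustrative points (not from the slides): differences are 3 and 4.
print(euclidean_distance([0, 2], [3, 6]))  # 5.0
```

If the attributes are on different scales, they would normally be standardized before calling such a function.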
![Euclidean Distance Matrix Euclidean Distance Matrix](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-5.jpg)
Euclidean Distance Matrix
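The matrix on this slide is only available as an image. As a rough sketch of the idea, a distance matrix stores the distance between every pair of objects, so it is symmetric with zeros on the diagonal. The helper name `distance_matrix` and the three points below are made up for illustration and are not the slide's data.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances as a symmetric list-of-lists matrix."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist (Python 3.8+) computes the Euclidean distance.
            d = math.dist(points[i], points[j])
            matrix[i][j] = d
            matrix[j][i] = d  # symmetry: d(p, q) = d(q, p)
    return matrix

# Hypothetical points chosen so the distances are whole numbers.
points = [(0, 0), (3, 4), (6, 8)]
for row in distance_matrix(points):
    print(row)
```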
![Minkowski Distance Minkowski Distance is a generalization of Euclidean Distance Where r is Minkowski Distance • Minkowski Distance is a generalization of Euclidean Distance Where r is](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-6.jpg)
Minkowski Distance

Minkowski distance is a generalization of Euclidean distance:

$$d(p, q) = \left( \sum_{k=1}^{n} |p_k - q_k|^{r} \right)^{1/r}$$

where $r$ is a parameter, $n$ is the number of dimensions (attributes), and $p_k$ and $q_k$ are, respectively, the $k$th attributes (components) of data objects $p$ and $q$.
![Minkowski Distance Examples r 1 City block Manhattan taxicab L 1 norm Minkowski Distance: Examples • r = 1. City block (Manhattan, taxicab, L 1 norm)](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-7.jpg)
Minkowski Distance: Examples

- r = 1. City block (Manhattan, taxicab, L1 norm) distance.
  - A common example of this is the Hamming distance, which is just the number of bits that differ between two binary vectors.
- r = 2. Euclidean distance.
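A minimal sketch of the two cases above, assuming the usual Minkowski form (sum of absolute differences raised to r, then the 1/r root). The function name and the sample vectors are illustrative.

```python
def minkowski_distance(p, q, r):
    """Minkowski distance of order r between equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of attributes")
    if r < 1:
        # For r < 1 the triangle inequality fails, so it is rejected here.
        raise ValueError("r must be at least 1")
    return sum(abs(pk - qk) ** r for pk, qk in zip(p, q)) ** (1.0 / r)

# r = 1 on binary vectors counts differing bits (Hamming distance).
print(minkowski_distance([1, 0, 1, 1], [0, 0, 1, 0], 1))  # 2.0

# r = 1 (city block) and r = 2 (Euclidean) on the same pair of points.
print(minkowski_distance([0, 2], [3, 6], 1))  # 7.0
print(minkowski_distance([0, 2], [3, 6], 2))  # 5.0
```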
![Minkowski Distance Matrix Minkowski Distance Matrix](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-8.jpg)
Minkowski Distance Matrix
![Common Properties of a Distance Distances such as the Euclidean distance have some Common Properties of a Distance • Distances, such as the Euclidean distance, have some](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-9.jpg)
Common Properties of a Distance

Distances, such as the Euclidean distance, have some well-known properties:

1. $d(p, q) \ge 0$ for all $p$ and $q$, and $d(p, q) = 0$ if and only if $p = q$. (Positive definiteness)
2. $d(p, q) = d(q, p)$ for all $p$ and $q$. (Symmetry)
3. $d(p, r) \le d(p, q) + d(q, r)$ for all points $p$, $q$, and $r$. (Triangle inequality)

where $d(p, q)$ is the distance (dissimilarity) between points (data objects) $p$ and $q$.

A distance that satisfies these properties is a metric, and a space equipped with such a distance is called a metric space.
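The three properties can be spot-checked numerically on a finite set of points. The sketch below is only a sanity check on the points supplied, not a proof that a function is a metric. The name `is_metric_on` and the tolerance `tol` are assumptions made for this example.

```python
import itertools
import math

def is_metric_on(points, d, tol=1e-12):
    """Check the three metric properties of d on a finite list of points."""
    for p, q in itertools.product(points, repeat=2):
        if d(p, q) < 0:
            return False  # distances must be non-negative
        if (d(p, q) == 0) != (p == q):
            return False  # zero distance exactly when the points coincide
        if abs(d(p, q) - d(q, p)) > tol:
            return False  # symmetry
    for p, q, r in itertools.product(points, repeat=3):
        if d(p, r) > d(p, q) + d(q, r) + tol:
            return False  # triangle inequality
    return True

def squared_euclidean(p, q):
    """Squared Euclidean distance, which is not a metric."""
    return sum((pk - qk) ** 2 for pk, qk in zip(p, q))

plane_points = [(0, 0), (3, 4), (6, 8)]
line_points = [(0,), (1,), (2,)]
print(is_metric_on(plane_points, math.dist))        # True
print(is_metric_on(line_points, squared_euclidean))  # False
```

The second call fails on the triangle inequality: the squared distance from 0 to 2 is 4, which exceeds 1 + 1.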
![Common Properties of a Similarity Similarities also have some well known properties 1 Common Properties of a Similarity • Similarities, also have some well known properties. 1.](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-10.jpg)
Common Properties of a Similarity

Similarities also have some well-known properties:

1. $s(p, q) = 1$ (or maximum similarity) only if $p = q$.
2. $s(p, q) = s(q, p)$ for all $p$ and $q$. (Symmetry)

where $s(p, q)$ is the similarity between points (data objects) $p$ and $q$.
![Similarity Between Binary Vectors Similarity Between Binary Vectors](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-11.jpg)
Similarity Between Binary Vectors
![Example Example](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-12.jpg)
Example
![SMC versus Jaccard Example p 100000 q 0000001001 M 01 2 M 10 SMC versus Jaccard: Example p= 100000 q= 0000001001 M 01 = 2 M 10](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-13.jpg)
SMC versus Jaccard: Example

- p = 1 0 0 0 0 0 0 0 0 0
- q = 0 0 0 0 0 0 1 0 0 1

Match counts:

- M01 = 2 (the number of attributes where p was 0 and q was 1)
- M10 = 1 (the number of attributes where p was 1 and q was 0)
- M00 = 7 (the number of attributes where p was 0 and q was 0)
- M11 = 0 (the number of attributes where p was 1 and q was 1)

SMC = (M11 + M00) / (M01 + M10 + M11 + M00) = (0 + 7) / (2 + 1 + 0 + 7) = 0.7

Jaccard distance = (M01 + M10) / (M01 + M10 + M11) = (2 + 1) / (2 + 1 + 0) = 1

Equivalently, the Jaccard similarity is M11 / (M01 + M10 + M11) = 0 / 3 = 0. SMC counts the seven shared zeros as agreement, while Jaccard ignores them.
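The same counts can be reproduced in code. This sketch assumes the ten-attribute vectors p = 1000000000 and q = 0000001001, which are consistent with the counts in the example. The function names are illustrative, and returning 0.0 when both vectors are all zeros is a convention chosen here, not something the slides specify.

```python
def match_counts(p, q):
    """Return (m01, m10, m00, m11) for two equal-length binary vectors."""
    m01 = sum(1 for pk, qk in zip(p, q) if pk == 0 and qk == 1)
    m10 = sum(1 for pk, qk in zip(p, q) if pk == 1 and qk == 0)
    m00 = sum(1 for pk, qk in zip(p, q) if pk == 0 and qk == 0)
    m11 = sum(1 for pk, qk in zip(p, q) if pk == 1 and qk == 1)
    return m01, m10, m00, m11

def smc(p, q):
    """Simple matching coefficient: matches over all attributes."""
    m01, m10, m00, m11 = match_counts(p, q)
    return (m11 + m00) / (m01 + m10 + m11 + m00)

def jaccard_similarity(p, q):
    """Jaccard coefficient: 1-1 matches over attributes that are not 0-0."""
    m01, m10, _, m11 = match_counts(p, q)
    denominator = m01 + m10 + m11
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return m11 / denominator if denominator else 0.0

p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(match_counts(p, q))            # (2, 1, 7, 0)
print(smc(p, q))                     # 0.7
print(jaccard_similarity(p, q))      # 0.0
print(1 - jaccard_similarity(p, q))  # 1.0 (Jaccard distance)
```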
![Cosine Similarity If d 1 and d 2 are two document vectors then Cosine Similarity • If d 1 and d 2 are two document vectors, then](https://slidetodoc.com/presentation_image_h/a8464c2892e033e62ce432e17124f3aa/image-14.jpg)
Cosine Similarity

If $d_1$ and $d_2$ are two document vectors, then

$$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$$

where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector $d$.

Example:

- d1 = 3 2 0 5 0 0 0 2 0 0
- d2 = 1 0 0 0 0 0 0 1 0 2

d1 · d2 = 3\*1 + 2\*0 + 0\*0 + 5\*0 + 0\*0 + 0\*0 + 0\*0 + 2\*1 + 0\*0 + 0\*2 = 5

||d1|| = (3\*3 + 2\*2 + 0\*0 + 5\*5 + 0\*0 + 0\*0 + 0\*0 + 2\*2 + 0\*0 + 0\*0)^0.5 = (42)^0.5 = 6.481

||d2|| = (1\*1 + 0\*0 + 0\*0 + 0\*0 + 0\*0 + 0\*0 + 0\*0 + 1\*1 + 0\*0 + 2\*2)^0.5 = (6)^0.5 = 2.449

cos(d1, d2) = 5 / (6.481 × 2.449) = 0.315, so distance = 1 - cos(d1, d2) = 0.685
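The worked example can be checked with a short script. It assumes the ten-component vectors d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0) and d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2), which match the dot product of 5 and the squared lengths 42 and 6 in the example. The function name is an illustrative choice.

```python
import math

def cosine_similarity(d1, d2):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
similarity = cosine_similarity(d1, d2)
print(round(similarity, 3))      # 0.315
print(round(1 - similarity, 3))  # 0.685 (cosine distance)
```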