E 1 Gene 2 Exp 1 Exp 2

הגדרות מרחק 1. Euclidean distance: D(X, Y)=sqrt[(x 1 -y 1)2+(x 2 -y 2)2+…(xn-yn)2]

מרחק בין וקטורים - דמיון בין פרטים מגדירים וקטור המקבל פרמטרים על סמך

? איך מפרידים לקבוצות Simpson's Family School Employees Females Males

Partitional Clustering • Nonhierarchical, each instance is placed in exactly one of K nonoverlapping

K-means Clustering: Step 1 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1

K-means Clustering: Step 2 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1

K-means Clustering: Step 3 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1

K-means Clustering: Step 4 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1

K-means Clustering: Step 5 Algorithm: k-means, Distance Metric: Euclidean Distance k 1 k 2

How similar are the names “Peter” and “Piotr”? Edit Distance Assume the following cost

Pedro (Portuguese/Spanish) Pio tr Pyo tr Pet ros Pie tro Ped ro Pie rre

DENDOGRAM בניית 0 D( , ) = 6 D( , ) = 1 6

Slides: 35

Download presentation

כמה גן מבוטא בכל ניסוי E 1 Gene 2 Exp 1 Exp 2 Exp 3 Gene N E 2 E 3

הגדרות מרחק 1. Euclidean distance: D(X, Y)=sqrt[(x 1 -y 1)2+(x 2 -y 2)2+…(xn-yn)2] 2. (Pearson) Correlation coefficient R(X, Y)=1/n*∑[(xi-E(x))/ x *(yi-E(y))/ y] x= sqrt(E(x 2)-E(x)2); E(x)=expected value of x R=1 if x=y 0 if E(xy)=E(x)E(y) 3. Norm 1 D(X, Y)=|x 1 -y 1|+|x 2 -y 2|+…|(xn-yn)| 4. Norm inf D(X, Y)=maxi(|xn-yn|)

מרחק בין וקטורים - דמיון בין פרטים מגדירים וקטור המקבל פרמטרים על סמך מאפיינים קבועים מראש v=[dress color, earings, height, hair, weight] Patty =[ 3, 2, 1. 7, 4, 65 ] Salma= [4 , 1, 1. 7, 3 , 65 ] Marge=[5, 0, 1. 6, 6, 60] || Patty-Salma||1 = 1+1+0 = 3 || Patty-Marge||1 = 2+2+0. 1+2+5 = 11. 1 || Salma-Marge||1 = 1+1+0. 1+3+5 = 10. 1 || Patty-Salma|| ∞= 1 || Patty-Marge|| ∞ = 5 || Salma-Marge|| ∞ = 5 מרחק זה נקרא מרחק עריכה edit distance

Data Clustering

? איך מפרידים לקבוצות Simpson's Family School Employees Females Males

Partitional Clustering • Nonhierarchical, each instance is placed in exactly one of K nonoverlapping clusters. • Since only one set of clusters is output, the user normally has to input the desired number of clusters K.

K-means Clustering: Step 1 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 k 2 2 1 k 3 0 0 1 2 3 4 5

K-means Clustering: Step 2 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 k 2 2 1 k 3 0 0 1 2 3 4 5

K-means Clustering: Step 3 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 2 k 3 k 2 1 0 0 1 2 3 4 5

K-means Clustering: Step 4 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 2 k 3 k 2 1 0 0 1 2 3 4 5

K-means Clustering: Step 5 Algorithm: k-means, Distance Metric: Euclidean Distance k 1 k 2 k 3

Hierarchical clustering E 1 E 2 E 3

אשכול היררכי Hierarchical Partitional

How similar are the names “Peter” and “Piotr”? Edit Distance Assume the following cost function Substitution Insertion Deletion 1 Unit D(Peter, Piotr) is 3 Peter Substitution (i for e) Piter Insertion (o) er Pet Ped ro Pie rre Pie ro tro Pie ros Pet tr Pyo Pio tr Pioter Deletion (e) Piotr

Pedro (Portuguese/Spanish) Pio tr Pyo tr Pet ros Pie tro Ped ro Pie rre Pie ro Pet er Ped er Pek a Pea dar Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian Alternative), Petr (Czech), Pyotr (Russian)

DENDOGRAM בניית 0 D( , ) = 6 D( , ) = 1 6 8 5 7 0 2 4 4 0 3 3 0 1 0

0 6 8 5 0 2 4 0 3 0 D( , )=2

0 6 5 0 3 0 D( , )=3

Matlab….

דוגמא 4 8 4 4 7 5 3 4 7 10 4 7 7 3 3 6 (a) (b) (c) (d) (e) 5