E 1 Gene 2 Exp 1 Exp 2





![הגדרות מרחק 1. Euclidean distance: D(X, Y)=sqrt[(x 1 -y 1)2+(x 2 -y 2)2+…(xn-yn)2] הגדרות מרחק 1. Euclidean distance: D(X, Y)=sqrt[(x 1 -y 1)2+(x 2 -y 2)2+…(xn-yn)2]](https://slidetodoc.com/presentation_image_h/9ec52f1e8dc05d5f215b31d73b370092/image-6.jpg)





























- Slides: 35
כמה גן מבוטא בכל ניסוי E 1 Gene 2 Exp 1 Exp 2 Exp 3 Gene N E 2 E 3
הגדרות מרחק 1. Euclidean distance: D(X, Y)=sqrt[(x 1 -y 1)2+(x 2 -y 2)2+…(xn-yn)2] 2. (Pearson) Correlation coefficient R(X, Y)=1/n*∑[(xi-E(x))/ x *(yi-E(y))/ y] x= sqrt(E(x 2)-E(x)2); E(x)=expected value of x R=1 if x=y 0 if E(xy)=E(x)E(y) 3. Norm 1 D(X, Y)=|x 1 -y 1|+|x 2 -y 2|+…|(xn-yn)| 4. Norm inf D(X, Y)=maxi(|xn-yn|)
מרחק בין וקטורים - דמיון בין פרטים מגדירים וקטור המקבל פרמטרים על סמך מאפיינים קבועים מראש v=[dress color, earings, height, hair, weight] Patty =[ 3, 2, 1. 7, 4, 65 ] Salma= [4 , 1, 1. 7, 3 , 65 ] Marge=[5, 0, 1. 6, 6, 60] || Patty-Salma||1 = 1+1+0 = 3 || Patty-Marge||1 = 2+2+0. 1+2+5 = 11. 1 || Salma-Marge||1 = 1+1+0. 1+3+5 = 10. 1 || Patty-Salma|| ∞= 1 || Patty-Marge|| ∞ = 5 || Salma-Marge|| ∞ = 5 מרחק זה נקרא מרחק עריכה edit distance
Data Clustering
? איך מפרידים לקבוצות Simpson's Family School Employees Females Males
Partitional Clustering • Nonhierarchical, each instance is placed in exactly one of K nonoverlapping clusters. • Since only one set of clusters is output, the user normally has to input the desired number of clusters K.
K-means Clustering: Step 1 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 k 2 2 1 k 3 0 0 1 2 3 4 5
K-means Clustering: Step 2 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 k 2 2 1 k 3 0 0 1 2 3 4 5
K-means Clustering: Step 3 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 2 k 3 k 2 1 0 0 1 2 3 4 5
K-means Clustering: Step 4 Algorithm: k-means, Distance Metric: Euclidean Distance 5 4 k 1 3 2 k 3 k 2 1 0 0 1 2 3 4 5
K-means Clustering: Step 5 Algorithm: k-means, Distance Metric: Euclidean Distance k 1 k 2 k 3
Hierarchical clustering E 1 E 2 E 3
אשכול היררכי Hierarchical Partitional
How similar are the names “Peter” and “Piotr”? Edit Distance Assume the following cost function Substitution Insertion Deletion 1 Unit D(Peter, Piotr) is 3 Peter Substitution (i for e) Piter Insertion (o) er Pet Ped ro Pie rre Pie ro tro Pie ros Pet tr Pyo Pio tr Pioter Deletion (e) Piotr
Pedro (Portuguese/Spanish) Pio tr Pyo tr Pet ros Pie tro Ped ro Pie rre Pie ro Pet er Ped er Pek a Pea dar Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian Alternative), Petr (Czech), Pyotr (Russian)
Pedro (Portuguese/Spanish) Pio tr Pyo tr Pet ros Pie tro Ped ro Pie rre Pie ro Pet er Ped er Pek a Pea dar Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian Alternative), Petr (Czech), Pyotr (Russian)
DENDOGRAM בניית 0 D( , ) = 6 D( , ) = 1 6 8 5 7 0 2 4 4 0 3 3 0 1 0
0 6 8 5 0 2 4 0 3 0 D( , )=2
0 6 5 0 3 0 D( , )=3
Matlab….
דוגמא 4 8 4 4 7 5 3 4 7 10 4 7 7 3 3 6 (a) (b) (c) (d) (e) 5