Parallel C3M
Aylin Tokuç, Erkan Okuyan, Özlem Gür
Outline
• Basics of parallel computing
• Sequential C3M
• Parallel C3M
Parallel Computation
• Decomposition: the process of dividing a computation into smaller parts.
• Task: a programmer-defined unit of computation into which the main computation is subdivided by means of decomposition.
Parallel Computation: Primary Considerations
• Load balancing
• Minimizing communication
• Task dependency optimization
Parallel Computation: Load Balancing
Parallel Computation: Minimizing Communication
Parallel Computation: Task Dependency Optimization
C3M Algorithm
1. Determine the cluster seeds of the database.
2. If di is not a cluster seed, find the cluster seed (if any) that maximally covers di.
3. If there remain unclustered documents, group them into a ragbag cluster.
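The three steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a binary document-term matrix with no empty rows or columns, uses the standard cover-coefficient definitions, rounds the number of clusters to the nearest integer (an assumption), and omits details such as the handling of identical seed documents. The function name `c3m` is hypothetical.

```python
def c3m(D):
    """Cluster the rows (documents) of a document-term matrix D."""
    m, n = len(D), len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(m)) for k in range(n)]
    alpha = [1.0 / s for s in row_sums]  # assumes no empty document
    beta = [1.0 / s for s in col_sums]   # assumes every term is used

    def cover(i, j):
        # c_ij: extent to which document i is covered by document j
        return alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))

    delta = [cover(i, i) for i in range(m)]  # decoupling coefficients
    nc = max(1, round(sum(delta)))           # number of clusters (rounded)

    # Step 1: the nc documents with the largest seed power become seeds.
    power = [delta[i] * (1.0 - delta[i]) * row_sums[i] for i in range(m)]
    seeds = sorted(range(m), key=lambda i: power[i], reverse=True)[:nc]

    # Steps 2 and 3: attach each non-seed to its best-covering seed,
    # or to the ragbag cluster if no seed covers it at all.
    clusters = {s: [s] for s in seeds}
    ragbag = []
    for i in range(m):
        if i in clusters:
            continue
        best = max(seeds, key=lambda s: cover(i, s))
        if cover(i, best) > 0:
            clusters[best].append(i)
        else:
            ragbag.append(i)
    return clusters, ragbag


D = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
clusters, ragbag = c3m(D)
```

For this toy matrix the decoupling coefficients sum to about 2.33, so two seeds are chosen (documents 0 and 2) and each remaining document joins the seed that shares its terms.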
C3M Formulas
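For reference, the symbols used on the later slides (α, β, cii = δi, nc, Pi) match the standard cover-coefficient definitions from Can and Ozkarahan's paper listed in the references. The block below restates those definitions for an m × n document-term matrix D. It is a reconstruction from that paper rather than a copy of this slide, and ψ, δ' and ψ' are symbols that the slides themselves do not show.

```latex
\begin{align*}
\alpha_i &= \Big(\sum_{k=1}^{n} d_{ik}\Big)^{-1} \\
\beta_k  &= \Big(\sum_{i=1}^{m} d_{ik}\Big)^{-1} \\
c_{ij}   &= \alpha_i \sum_{k=1}^{n} d_{ik}\,\beta_k\,d_{jk} \\
\delta_i &= c_{ii}, \qquad \psi_i = 1 - \delta_i \\
n_c      &= \sum_{i=1}^{m} \delta_i \\
P_i      &= \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}
            \quad \text{(binary } D\text{)} \\
P_i      &= \delta_i\,\psi_i \sum_{j=1}^{n} d_{ij}\,\delta'_j\,\psi'_j
            \quad \text{(weighted } D\text{)} \\
\delta'_j &= \beta_j \sum_{i=1}^{m} d_{ij}^{2}\,\alpha_i, \qquad \psi'_j = 1 - \delta'_j
\end{align*}
```

The weighted form of Pi needs the term-side coefficient δ'j, which sums over a whole column of D. That is the quantity a row-wise partition cannot compute locally.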
C3M: Sample Matrices
Parallel C3M: Distribution
• Distribute rows among processors
• Load balancing by cyclic block distribution
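A cyclic block (block-cyclic) row distribution can be sketched as follows. The function name and the `block_size` parameter are hypothetical; the slides do not state which block size was used.

```python
def block_cyclic_rows(num_rows, num_procs, block_size):
    """Return, for each processor, the list of row indices it owns.

    Rows are cut into consecutive blocks of `block_size` rows, and the
    blocks are dealt to processors in round-robin order.
    """
    owners = [[] for _ in range(num_procs)]
    for row in range(num_rows):
        owners[(row // block_size) % num_procs].append(row)
    return owners


# 10 rows, 2 processors, blocks of 2 rows
layout = block_cyclic_rows(10, 2, 2)
# layout == [[0, 1, 4, 5, 8, 9], [2, 3, 6, 7]]
```

Dealing small blocks in turn tends to even out the work when long and short documents are clustered together in the input order, which a single contiguous split per processor would not do.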
Local Calculations
• All processors calculate α, partial β and Pi
• Current method for weighted matrix: too costly
• Needs column vectors (but the matrix is partitioned row-wise)
Seed Powers Pi
• Seed power Pi should be small for a document whose terms appear in too many documents or too few documents.
• Seed power Pi should be bigger for a document whose terms appear in a moderate number of documents.
Minimize Communication: Proposed Heuristic
• All processors calculate α, partial β and β’ (# of non-zeros)
Effectiveness of Heuristic
• A MATLAB script was written to evaluate the effectiveness of the proposed heuristic.
• Correlation coefficient = 0.95
Communication Between Processors
• Partial β and β’ vectors are exchanged between processors to calculate the final β and β’ vectors.
• Then all processors calculate cii = δi
# of Clusters
• Processors exchange local δ
• All processors calculate nc
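The exchange steps on the two preceding slides can be simulated in a single process. This is a sketch under stated assumptions: each "processor" is just a list of locally owned rows, `all_reduce_sum` stands in for whatever collective operation the real implementation used (the slides do not name a communication library), and β’ is read as the per-column count of non-zeros, which is an interpretation of the heuristic slide. All function names are hypothetical.

```python
def partial_column_stats(local_rows, num_terms):
    """Per-processor partial column sums and partial non-zero counts."""
    sums = [0.0] * num_terms
    nonzeros = [0] * num_terms
    for row in local_rows:
        for k, value in enumerate(row):
            if value:
                sums[k] += value
                nonzeros[k] += 1
    return sums, nonzeros


def all_reduce_sum(vectors):
    """Element-wise sum of one vector per processor (stand-in for a collective)."""
    return [sum(column) for column in zip(*vectors)]


def local_delta(local_rows, col_sums):
    """delta_i = c_ii for locally owned rows, given the final column sums."""
    deltas = []
    for row in local_rows:
        alpha = 1.0 / sum(row)  # assumes no empty document
        deltas.append(
            alpha * sum(v * v / col_sums[k] for k, v in enumerate(row) if v)
        )
    return deltas


# Two simulated processors, each holding two rows of a 4 x 6 matrix.
procs = [
    [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]],
    [[1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]],
]
partials = [partial_column_stats(rows, 6) for rows in procs]
col_sums = all_reduce_sum([sums for sums, _ in partials])    # 1 / beta
doc_freq = all_reduce_sum([counts for _, counts in partials])  # beta'
deltas = [local_delta(rows, col_sums) for rows in procs]
nc = sum(sum(d) for d in deltas)  # sum of all local delta values
```

Only two vectors of length n and one scalar per processor cross processor boundaries in this scheme; the document rows themselves stay where they are.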
Cluster-head Selection
• Calculate seed power of local documents.
• Exchange largest nc seed powers.
• Calculate largest nc seed powers among all Pi and find cluster heads.
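The selection steps above amount to a distributed top-nc merge, which might look like the following sketch. The function names are hypothetical, and ties in seed power are broken by document id here, a choice the slides do not specify.

```python
import heapq


def local_candidates(doc_ids, powers, nc):
    """The (power, doc_id) pairs of the nc strongest local documents."""
    return heapq.nlargest(nc, zip(powers, doc_ids))


def select_cluster_heads(candidates_per_proc, nc):
    """Merge every processor's candidates and keep the global top nc."""
    merged = [pair for cands in candidates_per_proc for pair in cands]
    return sorted(doc for _, doc in heapq.nlargest(nc, merged))


# Two simulated processors with block-cyclic ownership of four documents.
cands = [
    local_candidates([0, 2], [0.6, 0.1], nc=2),
    local_candidates([1, 3], [0.5, 0.7], nc=2),
]
heads = select_cluster_heads(cands, nc=2)
# heads == [0, 3]
```

Sending only the nc largest local values is sufficient because any document in the global top nc must also be in the top nc of the processor that owns it.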
Clustering Non-seed Docs
• Exchange seed documents
• Cluster non-seed documents (as in sequential C3M) in each processor.
Future Work
• Term-based clustering
• Overlapping clusters
C3M Summary
• Load balancing with cyclic block distribution
• Communication minimization by a new heuristic
• Task dependency minimized with block distribution and the heuristic
References
• Concepts and the effectiveness of the cover coefficient-based clustering methodology, F. Can, E. A. Ozkarahan
• Parallelizing the Buckshot Algorithm for Efficient Document Clustering, Eric C. Jensen, Steven M. Beitzel, Angelo J. Pilotto, Nazli Goharian, Ophir Frieder
• Clustering and Classification of Large Document Bases in a Parallel Environment, Anthony S. Ruocco, Ophir Frieder
• Efficient Clustering of Very Large Document Collections, I. S. Dhillon, J. Fan, Y. Guan
Questions?
The End
Thank you for your patience