Graph Processing COS 518: Advanced Computer Systems Lecture 12 Mike Freedman [Content adapted from K. Jamieson and J. Gonzalez]
Graphs are Everywhere • Social Network: Users • Collaborative Filtering: Netflix (Users, Movies) • Text Analysis: Wiki (Docs, Words) • Probabilistic Analysis
Concrete Examples • Label Propagation • PageRank
Label Propagation Algorithm • Social Arithmetic: 50% What I list on my profile (50% Cameras, 50% Biking) + 40% Sue Ann Likes (80% Cameras, 20% Biking) + 10% Carlos Likes (30% Cameras, 70% Biking) = I Like: 60% Cameras, 40% Biking • Recurrence Algorithm: recompute each Likes[i] as the weighted sum of its neighbors' likes – iterate until convergence • Parallelism: – Compute all Likes[i] in parallel
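The social arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function names `combine` and `propagate`, the dictionary layout, and the choice to give Sue Ann and Carlos a self-weight of 1.0 (so their likes stay fixed) are all assumptions made for the example. Every user is updated from the previous iteration's values, which is what lets all Likes[i] be computed in parallel.

```python
def combine(sources):
    """Weighted sum of interest distributions.

    sources: list of (weight, {interest: fraction}) pairs.
    """
    result = {}
    for weight, likes in sources:
        for interest, fraction in likes.items():
            result[interest] = result.get(interest, 0.0) + weight * fraction
    return result


def propagate(profiles, self_weight, edges, tol=1e-9, max_iters=1000):
    """Iterate the label-propagation recurrence until the likes stop changing.

    profiles:    {user: {interest: fraction}}  what each user lists on a profile
    self_weight: {user: weight placed on the user's own profile}
    edges:       {user: {neighbor: weight}}    hypothetical layout for this sketch
    """
    likes = {user: dict(profile) for user, profile in profiles.items()}
    for _ in range(max_iters):
        new_likes = {}
        for user in profiles:
            # Own profile plus each neighbor's likes from the previous iteration.
            sources = [(self_weight[user], profiles[user])]
            sources += [(w, likes[v]) for v, w in edges.get(user, {}).items()]
            new_likes[user] = combine(sources)
        delta = max(
            abs(new_likes[u].get(k, 0.0) - likes[u].get(k, 0.0))
            for u in profiles
            for k in new_likes[u]
        )
        likes = new_likes
        if delta < tol:
            break
    return likes


# The numbers from the slide; the self-weights of 1.0 are an assumption.
profiles = {
    "me": {"cameras": 0.5, "biking": 0.5},
    "sue_ann": {"cameras": 0.8, "biking": 0.2},
    "carlos": {"cameras": 0.3, "biking": 0.7},
}
self_weight = {"me": 0.5, "sue_ann": 1.0, "carlos": 1.0}
edges = {"me": {"sue_ann": 0.4, "carlos": 0.1}}

likes = propagate(profiles, self_weight, edges)
```

With these inputs, `likes["me"]` works out to roughly 60% cameras and 40% biking, matching the slide: 0.5 × 0.5 + 0.4 × 0.8 + 0.1 × 0.3 = 0.60.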
PageRank Algorithm • The PageRank of u depends on the PR of every page linking to u, each divided by the number of links out of that page • Recurrence Algorithm: $PR[u] = \sum_{v \in B_u} PR[v] / L[v]$, where $B_u$ is the set of pages linking to u and $L[v]$ is the number of links from v – iterate until convergence • Parallelism: – Compute all PR[u] in parallel
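The recurrence above can be sketched as a short Python loop. This is an illustration under stated assumptions, not a production PageRank: the function name `pagerank` and the adjacency-dictionary input are hypothetical, there is no damping factor (the slide's recurrence has none), every page is assumed to have at least one outgoing link, and every link target is assumed to appear as a key. Without damping, convergence is not guaranteed on every graph; the small example graph here was picked because it does converge.

```python
def pagerank(out_links, tol=1e-10, max_iters=1000):
    """Iterate PR[u] = sum over v in B_u of PR[v] / L[v] until convergence.

    out_links: {page: [pages it links to]}; every page needs at least one
    out-link and every target must also be a key (assumptions of this sketch).
    """
    pages = list(out_links)

    # B_u: the set of pages that link to u.
    back_links = {u: [] for u in pages}
    for v, targets in out_links.items():
        for u in targets:
            back_links[u].append(v)

    # Start from a uniform distribution.
    pr = {u: 1.0 / len(pages) for u in pages}
    for _ in range(max_iters):
        # Every PR[u] reads only the previous iteration's values,
        # so all of them could be computed in parallel.
        new_pr = {
            u: sum(pr[v] / len(out_links[v]) for v in back_links[u])
            for u in pages
        }
        delta = max(abs(new_pr[u] - pr[u]) for u in pages)
        pr = new_pr
        if delta < tol:
            break
    return pr


# Hypothetical three-page graph: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

For this graph the ranks settle near A = 0.4, B = 0.2, C = 0.4. This can be checked by hand against the recurrence: PR[A] = PR[C], PR[B] = PR[A] / 2, and PR[C] = PR[A] / 2 + PR[B].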
Properties of Graph Parallel Algorithms • Dependency Graph • Factored Computation • Iterative Computation: What I Like depends on What My Friends Like
Map-Reduce for Data-Parallel ML • Excellent for large data-parallel tasks! • Data-Parallel (MapReduce): Feature Extraction, Algorithm Tuning, Basic Data Processing • Graph-Parallel (MapReduce?): Label Propagation, Lasso, Kernel Methods, Tensor Factorization, Deep Belief Networks, Belief Propagation, PageRank, Neural Networks
Problem: Data Dependencies • MapReduce doesn't efficiently express data dependencies – User must code substantial data transformations – Costly data replication [Figure: Independent Data Rows]
Iterative Algorithms • MR doesn't efficiently express iterative algorithms [Figure: data processed by CPU 1, CPU 2, and CPU 3 across iterations, with a barrier between iterations and a slow processor]
MapAbuse: Iterative MapReduce • Only a subset of data needs computation [Figure: data processed by CPU 1, CPU 2, and CPU 3 across iterations, with a barrier between iterations]
MapAbuse: Iterative MapReduce • System is not optimized for iteration [Figure: data processed by CPU 1, CPU 2, and CPU 3 across iterations, with repeated startup penalties and disk penalties]
ML Tasks Beyond Data-Parallelism • Data-Parallel (MapReduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics • Graph-Parallel: – Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt. – Collaborative Filtering: Tensor Factorization – Semi-Supervised Learning: Label Propagation, CoEM – Graph Analysis: PageRank, Triangle Counting
This week’s lectures • Graph processing – Why the relationships, sampling, and iterations often used in graph processing are not a good fit for MapReduce – How to take a graph-centric processing perspective • Machine learning – These are solving one type of ML algorithm – What other systems are needed, particularly given the heavy focus on iterative algorithms
Today’s readings • Streaming is about unbounded data sets, not particular execution engines • Streaming is in fact a strict superset of batch; the Lambda Architecture is destined for retirement • Needs of good streaming systems: correctness and tools for reasoning about time • Differences between event time and processing time, and the challenges they impose
Today’s readings • What are the major data processing approaches for bounded & unbounded data? • Challenges/needs for unbounded data include: – time-agnostic processing, approximation, windowing by processing time, windowing by event time • Key mechanisms (in Cloud Dataflow) – Watermarks: ideal vs. heuristic – Triggers – Discarding, accumulating, accumulating + retracting