Mass Data Processing Technology on Large Scale Clusters









Summer 2007, Tsinghua University
All course material (slides, labs, etc.) is licensed under the Creative Commons Attribution 2.5 License.
Many thanks to Aaron Kimball & Sierra Michels-Slettvet for their original version.

Staff
Kang Chen – Instructor
  • ck99@mails.tsinghua.edu.cn
Dahai Li – Project Lead
  • dahaili@google.com
Kai Wang – Project Lead
  • kaiwang@google.com
Yubing Yin – Teaching Assistant
  • burningice9@gmail.com

Thanks to
Kuang Chen
  • Without his help, this course could not have happened.
Christophe Bisciglia
  • He is the man who brought the course here.
Albert Wong
  • Helped us set up the cluster and prepare the lab material.
Hannah Tang
  • Helped us prepare various materials, including the lectures, homework, and discussions.

Goals and Expectations
• 5 Lectures
• Readings
• Homework + discussion
• 4 Labs
• Final project proposal
• Project reports
• Project review @ Google

Deliverables
• Homework
• 3 Lab reports
• Final project proposal
• Final project report
• Final project presentation

Key Information
Key: Lecture 8-203, Lab 9-225, Homework
Hours: 8:50 – 12:00 AM, 2:00 – 5:00 PM
Grading: Lab 30%, HW 20%, Final Project 50%

Timetable
Columns: week of, Location, M, Tu, W, Th, F
Week of 13-Aug
  Lecture: lecture 1, lecture 2, lecture 3, lecture 4, discussion
  Lab: lab 0, lab 1, lab 2, lab 0/1 due
  HW: readings, hw due
Week of 20-Aug
  Lecture: lecture 5, FP discussion
  Lab: lab 3, FP proposal, lab 2/3 due
  HW: readings
Week of 27-Aug
  lecture/lab: FP begin
Week of 3-Sep
  lecture/lab: guest lecture
Week of 10-Sep
  lecture/lab: guest lecture
Google: FP review
Also listed: readings, hw due, guest lecture
* HW: homework, FP: final project

Lectures and Labs
Lecture 1: Introduction to Networking and Distributed Systems
Lecture 2: Map/Reduce Theory and Implementation
Lecture 3: Distributed File System and the Google File System
Lecture 4: Distributed Graph Algorithms and PageRank
Lecture 5: Clustering – an Overview and Sample MapReduce Implementation
Lab 0: Setup
Lab 1: Simple Inverted Index
Lab 2: PageRank
Lab 3: Clustering

And… Let us start.