Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System
Huan Liu, Dan Orban, Accenture Technology Labs
2011, 11th IEEE/ACM International Symposium on
9962161 江嘉福, 100062228 徐光成, 100062229 章博遠

OUTLINE
I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Work
VI. References

INTRODUCTION
1. What is a cloud OS?
2. Challenges posed by a cloud OS
3. Cloud MapReduce?
4. Advantages of Cloud MapReduce

What is a cloud OS?
1. Manages the low-level cloud resources
2. Presents a high-level interface to application programmers
3. Key difference: scalability
[Figure 1]

Challenges posed by a cloud OS
1. Scalability comes at a price.
2. Data consistency, system availability, and tolerance to network partition.
[Figure 2]

Cloud MapReduce?
1. MapReduce programming model
2. Horizontal scaling
3. Eventual consistency
4. Overcomes limitations
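
As an illustration of the MapReduce programming model named above, here is a minimal single-process sketch of word count. It is not taken from Cloud MapReduce itself; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory dictionary stands in for whatever shuffle mechanism a real system uses.

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all counts emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, group by key (the shuffle), then reduce, all in memory."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


docs = [("d1", "the cloud scales"), ("d2", "the cloud is eventual")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```

The user supplies only `map_fn` and `reduce_fn`; grouping by key is the framework's job, which is the part a distributed implementation has to scale out.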

Advantages of Cloud MapReduce
1. Incremental scalability: can scale incrementally in the number of computing nodes.
2. Symmetry and decentralization: every node has the same set of responsibilities.
3. Heterogeneity: nodes can have varying computation capacity.
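
One way to see why symmetric, pull-based workers cope with heterogeneous nodes is a small simulation. The sketch below is an assumption-laden model, not the Cloud MapReduce code: every worker runs the same loop, pulling the next task from one shared queue as soon as it is idle, and `simulate_pull` and its parameters are hypothetical names.

```python
import heapq


def simulate_pull(task_costs, worker_speeds):
    """Simulate identical workers pulling tasks from one shared queue.

    task_costs: work units per task, in queue order.
    worker_speeds: work units each worker completes per second.
    Returns (tasks completed per worker, total elapsed seconds).
    """
    # Heap of (time at which the worker becomes idle, worker index).
    idle_at = [(0.0, w) for w in range(len(worker_speeds))]
    heapq.heapify(idle_at)
    done = [0] * len(worker_speeds)
    makespan = 0.0
    for cost in task_costs:
        now, w = heapq.heappop(idle_at)  # the next idle worker pulls a task
        finish = now + cost / worker_speeds[w]
        done[w] += 1
        makespan = max(makespan, finish)
        heapq.heappush(idle_at, (finish, w))
    return done, makespan


# Twelve equal tasks, three workers with speeds in the ratio 1 : 2 : 3.
done, makespan = simulate_pull([6.0] * 12, [1.0, 2.0, 3.0])
print(done, makespan)
```

In this run the workers complete 2, 4, and 6 tasks respectively, so the load follows capacity without any central scheduler assigning work. Adding a worker only means adding another entry to `worker_speeds`.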

Cloud MapReduce Architecture and Implementation
1. The architecture
2. Cloud challenges
3. General solution approaches

The Architecture

Cloud challenges & General solution approaches
1. Long latency
2. Horizontal scaling
3. Don't know when a queue is created for the first time
4. Duplicate messages
5. Potential node failure
6. Nondeterministic eventual consistency windows
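
Duplicate messages and node failures can both be handled on the consuming side if every message can be identified. The sketch below shows one generic approach under stated assumptions, not the slides' own code: each message carries a hypothetical `(task_id, seq)` tag, and `committed_tasks` is a hypothetical set of map tasks known to have finished successfully.

```python
def reduce_with_dedup(messages, committed_tasks):
    """Sum values per key, skipping duplicates and uncommitted output.

    messages: iterable of (task_id, seq, key, value) tuples, possibly
        with repeats because the queue delivers at least once.
    committed_tasks: ids of map tasks that finished successfully.
    """
    seen = set()
    totals = {}
    for task_id, seq, key, value in messages:
        if task_id not in committed_tasks:
            continue  # output of a failed or abandoned task attempt
        if (task_id, seq) in seen:
            continue  # the same message delivered a second time
        seen.add((task_id, seq))
        totals[key] = totals.get(key, 0) + value
    return totals


messages = [
    ("m1", 0, "a", 1),
    ("m1", 1, "b", 1),
    ("m1", 0, "a", 1),  # duplicate delivery of the first message
    ("m2", 0, "a", 1),
    ("m3", 0, "a", 5),  # m3 never committed, for example its node failed
]
totals = reduce_with_dedup(messages, committed_tasks={"m1", "m2"})
print(totals)
```

The filter makes the reduce step idempotent: re-running a failed map task or receiving a message twice does not change the totals.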

Pros
● 3,000 lines of Java code (LOC) vs. 285,375 LOC for Hadoop
● Large & reliable FS
● High bandwidth (fast read/write)
● Single point of contact (high throughput)

Cons
● Uses only the network (no local storage)
● Leads to a bottleneck

Evaluation
Almost twice as fast!

Evaluation
● Hadoop: 385 s total, network/CPU underutilized
● CMR: 210 s, more efficient network/CPU usage

Evaluation: Wiki Word Count
● Combiner: Hadoop 747 s, CMR 436 s
● No combiner: Hadoop 1733 s, CMR 1247 s
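
To show what the combiner changes in a word count job, the following sketch compares the number of intermediate records a single map task emits with and without local pre-aggregation. The function names are hypothetical and the example is generic MapReduce, not the benchmark code behind the timings above.

```python
from collections import Counter


def map_without_combiner(text):
    """Emit one (word, 1) record per word occurrence."""
    return [(word, 1) for word in text.split()]


def map_with_combiner(text):
    """Pre-aggregate inside the map task: one record per distinct word."""
    return list(Counter(text.split()).items())


def reduce_counts(records):
    """Sum the partial counts for each word."""
    totals = Counter()
    for word, n in records:
        totals[word] += n
    return dict(totals)


split = "to be or not to be"
plain = map_without_combiner(split)
combined = map_with_combiner(split)
print(len(plain), len(combined))
print(reduce_counts(plain) == reduce_counts(combined))
```

Both variants reduce to the same counts, but the combined version sends fewer records to the reducers. When all intermediate data travels over the network, that reduction is consistent with the lower run times in the combiner row.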

Evaluation: Amazon
● Word Count -> 400 GB using 100 nodes
● Approx. 1 hr
● 983,152 requests -> $0.98
● Using SimpleDB?
● 3.7 hrs -> $0.52
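
The figures on this slide imply a per-request price and a per-node throughput, which the short calculation below derives. Only the four slide figures are taken from the source; treating "approx. 1 hr" as exactly 3600 s and 1 GB as 1000 MB are assumptions made for the arithmetic.

```python
# Figures quoted on the slide.
requests = 983_152
request_cost_usd = 0.98
data_gb = 400
nodes = 100

# Assumptions (not from the slide): "approx. 1 hr" is exactly 3600 s,
# and 1 GB = 1000 MB.
elapsed_s = 3600
mb_per_gb = 1000

usd_per_million_requests = request_cost_usd / requests * 1_000_000
mb_per_s_per_node = data_gb * mb_per_gb / nodes / elapsed_s

print(f"{usd_per_million_requests:.2f} USD per million requests")
print(f"{mb_per_s_per_node:.2f} MB/s per node")
```

Under these assumptions the request charge works out to roughly one dollar per million requests, and each node processes a little over 1 MB of input per second.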

Evaluation: Comparison
● Distributed Grep Word Count -> 13 GB of data
● CMR = 962 seconds
● Hadoop = 1047 seconds
● Results are almost the same, why?
● More CPU-intensive tasks

Evaluation: 12 GB, 923,670 HTML files
● Hadoop -> 6 hrs+
● CMR -> 297 seconds
● Hadoop: high overhead from task creation
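
The timings quoted on the evaluation slides can be put on a common scale by computing Hadoop time divided by CMR time. The sketch below uses only the numbers from the slides; the labels are descriptive names chosen here, and "6 hrs+" is treated as a lower bound of 6 hours.

```python
# (Hadoop seconds, CMR seconds) as quoted on the evaluation slides.
timings_s = {
    "385 s vs 210 s run": (385, 210),
    "wiki word count, combiner": (747, 436),
    "wiki word count, no combiner": (1733, 1247),
    "distributed grep, 13 GB": (1047, 962),
}

speedups = {name: hadoop / cmr for name, (hadoop, cmr) in timings_s.items()}

# Many small HTML files: Hadoop took more than 6 hours, so this is a
# lower bound on the speedup, not an exact figure.
small_files_min_speedup = 6 * 3600 / 297

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
print(f"small HTML files: at least {small_files_min_speedup:.0f}x")
```

The ratios range from about 1.1x on the CPU-bound grep job to about 1.8x on the first run, which matches the "almost twice as fast" headline. The small-files case is an outlier of more than 70x, attributed on the slide to Hadoop's task creation overhead.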

Conclusion
● Cloud cannot be implemented on any system
● Poor performance
● CMR techniques overcome cloud limitations
● No performance degradation
● Good to use for other systems

REFERENCES
Figure 1: http://techcrunch.com/
Figure 2: http://blog.csdn.net/zouqingfang/article/details/7269920
http://zh.wikipedia.org/
https://code.google.com/p/cloudmapreduce/
http://searchcloudcomputing.techtarget.com/definition/MapReduce
http://myblog-maurice.blogspot.tw/2012/08/nosqlcap.html