Cloud Computing Lecture #1: What is Cloud Computing?
(and an intro to parallel/distributed processing)

Jimmy Lin
The iSchool, University of Maryland
Wednesday, September 3, 2008

Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License).

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

Source: http://www.free-pictures-photos.com/

What is Cloud Computing?
1. Web-scale problems
2. Large data centers
3. Different models of computing
4. Highly-interactive Web applications

1. Web-Scale Problems
• Characteristics:
    ◦ Definitely data-intensive
    ◦ May also be processing intensive
• Examples:
    ◦ Crawling, indexing, searching, mining the Web
    ◦ “Post-genomics” life sciences research
    ◦ Other scientific data (physics, astronomy, etc.)
    ◦ Sensor networks
    ◦ Web 2.0 applications
    ◦ …

How much data?
• Wayback Machine has 2 PB + 20 TB/month (2006)
• Google processes 20 PB a day (2008)
• “all words ever spoken by human beings” ~ 5 EB
• NOAA has ~1 PB climate data (2007)
• CERN’s LHC will generate 15 PB a year (2008)
640K ought to be enough for anybody.
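To get a feel for these numbers, here is a rough back-of-the-envelope calculation. It is only a sketch: the decimal units (1 PB = 10^15 bytes) and the 100 MB/s disk read rate are assumptions for illustration, not figures from the slides.

```python
# Back-of-the-envelope scale check for two figures quoted on the slide.
# Assumptions (not from the slides): decimal units, and a hypothetical
# single disk that sustains 100 MB/s of sequential reads.

PB = 10**15
TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60

google_bytes_per_day = 20 * PB      # "Google processes 20 PB a day (2008)"
assumed_disk_rate = 100 * MB        # hypothetical bytes per second

# Average rate needed to get through 20 PB within one day.
required_rate = google_bytes_per_day / SECONDS_PER_DAY

# How long a single such disk would need for the same volume.
single_disk_days = google_bytes_per_day / assumed_disk_rate / SECONDS_PER_DAY

# Number of such disks reading in parallel to finish within the day
# (ceiling division on integers).
per_disk_per_day = assumed_disk_rate * SECONDS_PER_DAY
disks_needed = -(-google_bytes_per_day // per_disk_per_day)

# Wayback Machine: months for 20 TB/month of growth to add another 2 PB.
wayback_months_to_double = (2 * PB) // (20 * TB)

print(f"required rate: {required_rate / 10**9:.1f} GB/s")
print(f"one disk alone: {single_disk_days:.0f} days")
print(f"disks needed in parallel: {disks_needed}")
print(f"Wayback Machine doubling time: {wayback_months_to_double} months")
```

Under these assumptions a single disk would need several years for one day of data, which is why the rest of the lecture turns to many machines working in parallel.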

Maximilien Brice, © CERN

There’s nothing like more data!
s/inspiration/data/g;
(Banko and Brill, ACL 2001)
(Brants et al., EMNLP 2007)

What to do with more data?
• Answering factoid questions
    ◦ Pattern matching on the Web
    ◦ Works amazingly well
    Who shot Abraham Lincoln? → X shot Abraham Lincoln
• Learning relations
    ◦ Start with seed instances
    ◦ Search for patterns on the Web
    ◦ Using patterns to find more instances
    Wolfgang Amadeus Mozart (1756 - 1791)
    Einstein was born in 1879
    Birthday-of(Mozart, 1756)
    Birthday-of(Einstein, 1879)
    PERSON (DATE –
    PERSON was born in DATE
(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; …)
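The two surface patterns on this slide can be sketched as regular expressions. This is a minimal illustration, not the method of the cited papers: the regexes, the `extract_birthdays` helper, and the crude capitalized-words notion of a PERSON are all assumptions, and the sketch keeps the full matched name rather than just the surname shown on the slide.

```python
import re

# A PERSON is approximated as a run of capitalized words (an assumption).
NAME = r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"

PATTERNS = [
    # "PERSON (DATE –": a name, an opening parenthesis, then a year
    re.compile(NAME + r" \((?P<date>\d{4})\b"),
    # "PERSON was born in DATE"
    re.compile(NAME + r" was born in (?P<date>\d{4})\b"),
]

def extract_birthdays(text):
    """Return the set of (person, year) pairs matched by any pattern."""
    found = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.add((match.group("person"), int(match.group("date"))))
    return found

# The two example snippets from the slide.
snippets = [
    "Wolfgang Amadeus Mozart (1756 - 1791)",
    "Einstein was born in 1879",
]

relations = set()
for snippet in snippets:
    relations |= extract_birthdays(snippet)

for person, year in sorted(relations):
    print(f"Birthday-of({person}, {year})")
```

In the bootstrapping loop described on the slide, the newly found instances would be used to search for further patterns; that step is left out here.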

2. Large Data Centers
• Web-scale problems? Throw more machines at it!
• Clear trend: centralization of computing resources in large data centers
    ◦ Necessary ingredients: fiber, juice, and space
    ◦ What do Oregon, Iceland, and abandoned mines have in common?
• Important Issues:
    ◦ Redundancy
    ◦ Efficiency
    ◦ Utilization
    ◦ Management

Source: Harper’s (Feb, 2008)

Maximilien Brice, © CERN

Key Technology: Virtualization
[Figure: a Traditional Stack (Apps on an Operating System on Hardware) next to a Virtualized Stack (Apps on guest OSes on a Hypervisor on Hardware)]

3. Different Computing Models
“Why do it yourself if you can pay someone to do it for you?”
• Utility computing
    ◦ Why buy machines when you can rent cycles?
    ◦ Examples: Amazon’s EC2, GoGrid, AppNexus
• Platform as a Service (PaaS)
    ◦ Give me a nice API and take care of the implementation
    ◦ Example: Google App Engine
• Software as a Service (SaaS)
    ◦ Just run it for me!
    ◦ Example: Gmail

4. Web Applications
• A mistake on top of a hack built on sand held together by duct tape?
• What is the nature of software applications?
    ◦ From the desktop to the browser
    ◦ SaaS == Web-based applications
    ◦ Examples: Google Maps, Facebook
• How do we deliver highly-interactive Web-based applications?
    ◦ AJAX (asynchronous JavaScript and XML)
    ◦ For better, or for worse…

What is the course about?
• MapReduce: the “back-end” of cloud computing
    ◦ Batch-oriented processing of large datasets
• Ajax: the “front-end” of cloud computing
    ◦ Highly-interactive Web-based applications
• Computing “in the clouds”
    ◦ Amazon’s EC2/S3 as an example of utility computing

Amazon Web Services
• Elastic Compute Cloud (EC2)
    ◦ Rent computing resources by the hour
    ◦ Basic unit of accounting = instance-hour
    ◦ Additional costs for bandwidth
• Simple Storage Service (S3)
    ◦ Persistent storage
    ◦ Charge by the GB/month
    ◦ Additional costs for bandwidth
• You’ll be using EC2/S3 for course assignments!
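The three charges named above (instance-hours, GB/month of storage, bandwidth) combine into a simple sum. The sketch below only illustrates that arithmetic: every rate in it is a made-up placeholder rather than an actual AWS price, the `estimate_cost` helper is hypothetical, and rounding partial hours up to whole instance-hours is an assumption.

```python
import math

# All rates are made-up placeholders, NOT actual AWS prices.
HYPOTHETICAL_RATES = {
    "instance_hour": 0.10,     # dollars per instance-hour (EC2)
    "storage_gb_month": 0.15,  # dollars per GB stored per month (S3)
    "bandwidth_gb": 0.10,      # dollars per GB transferred
}

def estimate_cost(instances, hours_each, stored_gb, months, transferred_gb,
                  rates=HYPOTHETICAL_RATES):
    """Estimate a bill from the three kinds of charges named on the slide."""
    # Assumption: a partial hour is billed as a full instance-hour.
    instance_hours = instances * math.ceil(hours_each)
    compute = instance_hours * rates["instance_hour"]
    storage = stored_gb * months * rates["storage_gb_month"]
    bandwidth = transferred_gb * rates["bandwidth_gb"]
    return {
        "instance_hours": instance_hours,
        "compute": compute,
        "storage": storage,
        "bandwidth": bandwidth,
        "total": compute + storage + bandwidth,
    }

# Example: 20 instances for 2.5 hours each, 50 GB stored for one month,
# 10 GB transferred.
bill = estimate_cost(instances=20, hours_each=2.5, stored_gb=50,
                     months=1, transferred_gb=10)
for item, amount in bill.items():
    print(f"{item}: {amount}")
```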

This course is not for you…
• If you’re not genuinely interested in the topic
• If you’re not ready to do a lot of programming
• If you’re not open to thinking about computing in new ways
• If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software
• If you can’t put in the time
Otherwise, this will be a richly rewarding course!

Source: http://davidzinger.wordpress.com/2007/05/page/2/

Cloud Computing Zen
• Don’t get frustrated (take a deep breath)…
    ◦ This is bleeding edge technology
    ◦ Those W$*#T@F! moments
• Be patient…
    ◦ This is the second first time I’ve taught this course
• Be flexible…
    ◦ There will be unanticipated issues along the way
• Be constructive…
    ◦ Tell me how I can make everyone’s experience better

Source: Wikipedia

Things to go over…
• Course schedule
• Assignments and deliverables
• Amazon EC2/S3

Web-Scale Problems?
• Don’t hold your breath:
    ◦ Biocomputing
    ◦ Nanocomputing
    ◦ Quantum computing
    ◦ …
• It all boils down to…
    ◦ Divide-and-conquer
    ◦ Throwing more hardware at the problem
Simple to understand… a lifetime to master…

Divide and Conquer
[Figure: “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”; the workers produce r1, r2, r3, which are combined into the “Result”]
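The partition / worker / combine flow in this figure can be sketched in a few lines. This is a minimal illustration under assumptions: the helper names `partition`, `worker`, and `combine` are hypothetical, the sum-of-squares task is an arbitrary stand-in, and a thread pool stands in for the workers.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(work, n_parts):
    """Split a list into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(work), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks

def worker(chunk):
    """Process one work unit; here, a sum of squares as a stand-in task."""
    return sum(x * x for x in chunk)

def combine(partial_results):
    """Merge the partial results into the final result."""
    return sum(partial_results)

work = list(range(1, 11))                      # "Work"
units = partition(work, 3)                     # w1, w2, w3
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(worker, units))   # r1, r2, r3
result = combine(partials)                     # "Result"

print(units, partials, result)
```

In CPython, threads do not run CPU-bound Python code in parallel; swapping in `ProcessPoolExecutor` from the same module keeps the same structure while using separate processes.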

Different Workers
• Different threads in the same core
• Different cores in the same CPU
• Different CPUs in a multi-processor system
• Different machines in a distributed system

Choices, Choices
• Commodity vs. “exotic” hardware
• Number of machines vs. processors vs. cores
• Bandwidth of memory vs. disk vs. network
• Different programming models

Flynn’s Taxonomy

                        Instructions
                        Single (SI)                     Multiple (MI)
Data   Single (SD)      SISD: Single-threaded process   MISD: Pipeline architecture
       Multiple (MD)    SIMD: Vector Processing         MIMD: Multi-threaded Programming

SISD
[Figure: a single Processor fed by one stream of Instructions and one stream of data items (D D D D)]

SIMD
[Figure: a single Processor fed by one stream of Instructions and multiple data streams in parallel (D0, D1, D2, D3, D4, …, Dn)]

MIMD
[Figure: multiple Processors, each fed by its own stream of Instructions and its own stream of data items]

Memory Typology: Shared
[Figure: multiple Processors attached to one shared Memory]

Memory Typology: Distributed
[Figure: each Processor has its own Memory; the Processors are connected by a Network]

Memory Typology: Hybrid
[Figure: groups of Processors each share a Memory; the groups are connected by a Network]

Parallelization Problems
• How do we assign work units to workers?
• What if we have more work units than workers?
• What if workers need to share partial results?
• How do we aggregate partial results?
• How do we know all the workers have finished?
• What if workers die?
What is the common theme of all of these problems?
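Several of these questions show up even in a tiny example. The sketch below is one possible way to handle them on a single machine, not a general answer: `flaky_square` and `run_all` are hypothetical names, the "worker death" is simulated with an exception, and units are assumed to be unique and hashable.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

failed_once = set()

def flaky_square(unit):
    """Hypothetical work function: unit 3 fails the first time it is tried."""
    if unit == 3 and unit not in failed_once:
        failed_once.add(unit)
        raise RuntimeError("simulated worker death")
    return unit * unit

def run_all(units, n_workers=2, max_attempts=2):
    """Run every unit, retrying failed ones, and return {unit: result}."""
    results = {}
    attempts = {u: 0 for u in units}
    pending = list(units)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while pending:
            # More units than workers: the pool queues the surplus and
            # hands each unit to whichever worker becomes free.
            futures = {pool.submit(flaky_square, u): u for u in pending}
            for u in pending:
                attempts[u] += 1
            pending = []
            # as_completed yields each future once it is done, so this
            # loop ends only after every submitted unit has reported.
            for fut in as_completed(futures):
                unit = futures[fut]
                try:
                    results[unit] = fut.result()   # aggregate partial results
                except RuntimeError:
                    if attempts[unit] >= max_attempts:
                        raise
                    pending.append(unit)           # reassign the failed unit
    return results

results = run_all(range(1, 7))
total = sum(results.values())
print(results, total)
```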

General Theme?
• Parallelization problems arise from:
    ◦ Communication between workers
    ◦ Access to shared resources (e.g., data)
• Thus, we need a synchronization system!
• This is tricky:
    ◦ Finding bugs is hard
    ◦ Solving bugs is even harder
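A shared counter is a small instance of the "access to shared resources" problem. The sketch below protects the counter with a lock from Python's standard `threading` module; the `Counter` class and `hammer` function are hypothetical names for illustration.

```python
import threading

class Counter:
    """A shared counter whose updates are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "self.value += 1" is a read-modify-write sequence. Without the
        # lock, two threads could read the same old value and one of the
        # updates would be lost (a race condition).
        with self._lock:
            self.value += 1

def hammer(counter, times):
    """Increment the shared counter a fixed number of times."""
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

With the lock in place the final value is always 4 × 10,000; removing it may still give the right answer on many runs, which is part of why such bugs are hard to find.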

Managing Multiple Workers
• Difficult because
    ◦ (Often) don’t know the order in which workers run
    ◦ (Often) don’t know where the workers are running
    ◦ (Often) don’t know when workers interrupt each other
• Thus, we need:
    ◦ Semaphores (lock, unlock)
    ◦ Condition variables (wait, notify, broadcast)
    ◦ Barriers
• Still, lots of problems:
    ◦ Deadlock, livelock, race conditions, ...
• Moral of the story: be careful!
    ◦ Even trickier if the workers are on different machines
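Of the three primitives listed, the barrier is perhaps the easiest to show in isolation. The sketch below uses `threading.Barrier` from Python's standard library for a two-phase computation; the task itself and the names `phase1`, `phase2`, and `worker` are assumptions made up for the example.

```python
import threading

N_WORKERS = 4
barrier = threading.Barrier(N_WORKERS)
phase1 = [None] * N_WORKERS   # each worker writes only its own slot
phase2 = [None] * N_WORKERS

def worker(i):
    # Phase 1: produce a partial result.
    phase1[i] = (i + 1) * 10
    # Block here until all N_WORKERS threads have finished phase 1.
    barrier.wait()
    # Phase 2: now it is safe to read every worker's phase-1 output.
    phase2[i] = phase1[i] + sum(phase1)

threads = [threading.Thread(target=worker, args=(i,))
           for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(phase1, phase2)
```

Without the barrier, a fast worker could reach phase 2 while another slot in `phase1` is still empty, which is exactly the "don’t know the order in which workers run" problem.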

Patterns for Parallelism
• Parallel computing has been around for decades
• Here are some “design patterns” …

Master/Slaves
[Figure: one master connected to several slaves]

Producer/Consumer Flow
[Figure: producers (P) passing items to consumers (C)]
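A single producer/consumer pair can be sketched with a bounded queue from Python's standard library. This is only an illustration: the upper-casing task, the `SENTINEL` end marker, and the function names are assumptions, not part of the lecture.

```python
import queue
import threading

SENTINEL = object()   # marks the end of the stream

def producer(out_q, items):
    """Put every item on the queue, then signal that no more will come."""
    for item in items:
        out_q.put(item)
    out_q.put(SENTINEL)

def consumer(in_q, results):
    """Take items off the queue and process them until the end marker."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        results.append(item.upper())

# Bounded queue: a fast producer blocks on put() instead of running ahead.
channel = queue.Queue(maxsize=2)
results = []

p = threading.Thread(target=producer, args=(channel, ["a", "b", "c", "d"]))
c = threading.Thread(target=consumer, args=(channel, results))
p.start()
c.start()
p.join()
c.join()

print(results)
```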

Work Queues
[Figure: producers (P P P) feeding a shared queue (W W W) that is drained by consumers (C C C)]
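With several producers and several consumers, the shared queue is what decouples the two sides. The sketch below is one way to write this with Python's `queue.Queue`; the doubling task, the `STOP` marker (which assumes `None` is never a real work item), and the shutdown scheme are all illustrative choices.

```python
import queue
import threading

work_q = queue.Queue()            # the shared queue
results = []
results_lock = threading.Lock()
STOP = None                       # end marker; assumes None is never real work

def producer(start, count):
    """Put `count` consecutive integers on the shared queue."""
    for n in range(start, start + count):
        work_q.put(n)

def consumer():
    """Take work items until a STOP marker arrives."""
    while True:
        n = work_q.get()
        if n is STOP:
            break
        with results_lock:
            results.append(n * 2)

producers = [threading.Thread(target=producer, args=(s, 5))
             for s in (0, 5, 10)]
consumers = [threading.Thread(target=consumer) for _ in range(3)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
# All work is queued by now; send one STOP marker per consumer.
for _ in consumers:
    work_q.put(STOP)
for t in consumers:
    t.join()

print(sorted(results))
```

Which consumer handles which item is not determined, so only the sorted output is stable from run to run.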

Rubber Meets Road
• From patterns to implementation:
    ◦ pthreads, OpenMP for multi-threaded programming
    ◦ MPI for cluster computing
    ◦ …
• The reality:
    ◦ Lots of one-off solutions, custom code
    ◦ Write your own dedicated library, then program with it
    ◦ Burden on the programmer to explicitly manage everything
• MapReduce to the rescue!
    ◦ (for next time)