EECE 571R: Data-intensive computing systems

Matei Ripeanu, UBC
matei at ece.ubc.ca

Contact Info
Email: matei@ece.ubc.ca
Office: KAIS 4033
Office hours: by appointment (email me)
Course page: http://www.ece.ubc.ca/~matei/EECE571/
Matei Ripeanu, UBC. EECE 571R Data-intensive computing (Spring ’07)

EECE 571R: Course Goals
• Primary
  – Gain deep understanding of fundamental issues that affect the design of:
    > Data-intensive systems
    > (more generally) Large-scale distributed systems
  – Survey main current research themes
  – Gain experience with distributed systems research
    > Research on: federated systems, networks
• Secondary
  – By studying a set of outstanding papers, build knowledge of how to do & present research
  – Learn how to read papers & evaluate ideas

What I’ll Assume You Know
• Basic Internet architecture
  – IP, TCP, DNS, HTTP
• Basic principles of distributed computing
  – Asynchrony (cannot distinguish between communication failures and latency)
  – Incomplete & inconsistent global state knowledge (cannot know everything correctly)
  – Failures happen (in large systems, even rare failures of individual components aggregate to high failure rates)
• If there are things that don’t make sense, ask!
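
The "failures happen" principle can be made concrete with a short calculation. The sketch below assumes that component failures are independent and equally likely, which is a simplification chosen for illustration and not something the slide states; the function name and the sample numbers are hypothetical.

```python
def probability_any_failure(per_component_failure: float, components: int) -> float:
    """Chance that at least one of `components` independent parts fails.

    Assumes independent, identically likely failures (illustrative only).
    """
    if not 0.0 <= per_component_failure <= 1.0:
        raise ValueError("per_component_failure must be a probability")
    if components < 0:
        raise ValueError("components must be non-negative")
    # P(no failure at all) = (1 - p) ** n, so P(at least one) is the complement.
    return 1.0 - (1.0 - per_component_failure) ** components


if __name__ == "__main__":
    for n in (1, 100, 1_000, 10_000):
        print(n, probability_any_failure(0.001, n))
```

Under these assumptions, a 0.1% failure chance per component means a 1,000-component system sees at least one failure roughly 63% of the time.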

Outline
• Case studies (and project ideas):
  – Volunteer computing: SETI@home/BOINC
  – Virtual Data System
  – Batch Aware Distributed File System
• Administrative

How does it work?
SETI@home: Master-worker architecture
Characteristics:
• Fixed-rate data processing task
• Low bandwidth/computation ratio
• Independent parallelism
• Error tolerance
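
A master-worker design with independent work units can be sketched in a few lines. This is a toy model, not SETI@home or BOINC code: the worker functions, the round-robin assignment, the replication factor of 3, and the majority-vote acceptance rule are all assumptions made for illustration of how a master might tolerate erroneous or untrusted workers.

```python
from collections import Counter


def split_into_work_units(samples, unit_size):
    """Master side: cut the input stream into independent work units."""
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]


def honest_worker(unit):
    """Stand-in for the real signal analysis: here, just a sum."""
    return sum(unit)


def faulty_worker(unit):
    """A broken or cheating worker that returns a wrong answer."""
    return sum(unit) + 1


def run_master(samples, unit_size, workers, replication=3):
    """Send every unit to `replication` workers and keep the majority answer."""
    accepted = []
    for index, unit in enumerate(split_into_work_units(samples, unit_size)):
        # Hypothetical round-robin assignment of workers to this unit.
        assigned = [workers[(index + k) % len(workers)] for k in range(replication)]
        answers = Counter(worker(unit) for worker in assigned)
        value, votes = answers.most_common(1)[0]
        if votes * 2 <= replication:
            raise RuntimeError(f"no majority for work unit {index}")
        accepted.append(value)
    return accepted


if __name__ == "__main__":
    pool = [honest_worker, honest_worker, faulty_worker, honest_worker]
    print(run_master(list(range(8)), 2, pool))
```

Because units are independent, the master never needs workers to talk to each other; the cost of error tolerance is that each unit is computed several times.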

SETI@home Operations
[Diagram. Components shown: data recorder, DLT tapes, splitters, WU storage, data server, screensavers, result queue, acct. queue, CGI program, garbage collector, user DB, master DB, science DB, redundancy checking, RFI elimination, repeat detection, tape archive/delete, tape backup, web page generator, web site.]

History and Statistics
• Conceived 1995, launched April 1999
• Millions of users, hosts…
• No ET signals yet, but other results
Statistics as of Wed Feb 23 07:04:51 (total / last 24 hours):
  – Users: 5,361,313 / 4,391
  – Results received: 1,779 million / 5 million
  – Total CPU time: 2.2 million years / 3610.717 years
  – Average CPU time per work unit: 10 hr 58 min 14.0 sec / 6 hr 19 min 30.1 sec

Millions of individual contributors! (Problems)
• Server scalability
• Dealing with excess CPU time
• Untrusted environment: bad user behavior
  – Cheating
  – Team recruitment by spam
  – Sale of accounts on eBay
• Malfunctions of individual components

SETI@home: Summary
• The characteristics of the problem …
  – Massive (“embarrassing”) parallelism
  – Low bandwidth/computation ratio
  – Fixed-rate data processing task
• … make possible a solution that operates in an unfriendly environment
  – Wide area distribution; huge scale
  – High failure rates
  – Untrusted/malicious components
• Solution: Master-worker design
  > Master = central point of control
  > Single point of failure
  > Performance bottleneck

Outline
• Case studies (and project ideas):
  – Volunteer computing: SETI@home/BOINC
  – Virtual Data System
  – Batch Aware Distributed File System
• Administrative

Virtual Data System
• Context: ‘big science’
• Motivation/goals: support the science process,
  – i.e., track all aspects of data capture, production, transformation, and analysis
• Requirements: ability to define complex workflows, and to reliably & efficiently execute workflows in heterogeneous, multi-domain environments.
• Derived benefits: helps to audit, validate, reproduce, and/or rerun with corrections various data transformations.

BIG Science!
The European Organisation for Nuclear Research (CERN) builds particle accelerators for particle physics research.

Data Handling and Computation for Physics Analysis
[Diagram (credit: les.robertson@cern.ch, CERN). Labels shown: detector; event filter (selection & reconstruction); raw data; reconstruction; event summary data; event reprocessing; event simulation; processed data; batch physics analysis; analysis objects (extracted by physics topic); interactive physics analysis.]

CMS Grid Hierarchy
• 2500 physicists, 40 countries
• 10s of Petabytes/yr by 2008
[Diagram of the tier hierarchy:]
• Experiment / Online System: bunch crossing per 25 ns; 100 triggers per second; ~1 MByte per event
• Tier 0: CERN Computer Center, > 20 TIPS
• Tier 1: France Center, Italy Center, UK Center, USA Center
• Tier 2: Tier 2 Centers
• Tier 3: Institutes, physics data cache
• Tier 4: Workstations, other portals
• Link speeds, from the top of the hierarchy to the bottom: 100 MB ~ 1.5 GB/sec; 10 ~ 40 Gbits/sec; 2.5-10 Gbits/sec; 0.6-2.5 Gbits/sec; 0.1-1 Gbits/sec
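
The trigger figures on this slide can be turned into a data rate with simple arithmetic: 100 triggers per second at about 1 MByte per event gives about 100 MB/sec, which matches the low end of the quoted 100 MB ~ 1.5 GB/sec range. The sketch below also extrapolates to a yearly volume; continuous running all year and the decimal unit conversion (1 PB = 10^9 MB) are assumptions for illustration, and the slide does not say how its "10s of Petabytes/yr" figure is composed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def raw_rate_mb_per_s(triggers_per_second: float, mbytes_per_event: float) -> float:
    """Raw event data rate leaving the online system, in MB/sec."""
    return triggers_per_second * mbytes_per_event


def yearly_volume_pb(rate_mb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Petabytes per year at a given rate (assumes 1 PB = 10**9 MB).

    duty_cycle is the assumed fraction of the year spent taking data.
    """
    return rate_mb_per_s * SECONDS_PER_YEAR * duty_cycle / 1e9


if __name__ == "__main__":
    rate = raw_rate_mb_per_s(100, 1.0)
    print(rate, "MB/sec;", yearly_volume_pb(rate), "PB/yr if running continuously")
```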

Motivations (1)
[Diagram. Entities: Data, Transformation, Derivation. Relations: product-of, execution-of, consumed-by/generated-by.]
• “I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
• “I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
• “I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
• “I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”

Motivations (2)
• Data track-ability and result audit-ability
• Repair and correction of data
  – Rebuild data products (cf. “make”)
• Workflow management
  – A new, structured paradigm for organizing, locating, specifying, and requesting data products
• Performance optimizations
  – Ability to re-create data rather than move it
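
The "make"-style repair idea can be illustrated with a small provenance lookup: if each data product records what it was derived from, then a faulty input (say, a bad calibration) identifies exactly which products need rebuilding. This is a minimal sketch and not the Virtual Data System's catalog or API; the dictionary layout and the product names are hypothetical.

```python
from collections import deque


def stale_products(derived_from, changed):
    """Return every product that depends, directly or not, on a changed input.

    derived_from maps each derived product to the list of inputs it was
    computed from (a toy stand-in for recorded provenance).
    """
    # Invert the provenance records: input -> products computed from it.
    consumers = {}
    for product, inputs in derived_from.items():
        for source in inputs:
            consumers.setdefault(source, []).append(product)

    stale = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for product in consumers.get(current, []):
            if product not in stale:
                stale.add(product)
                queue.append(product)  # its own consumers are stale too
    return stale


if __name__ == "__main__":
    provenance = {
        "calibrated": ["raw", "calibration"],
        "catalog": ["calibrated"],
        "sky_map": ["raw"],
        "report": ["catalog", "sky_map"],
    }
    print(sorted(stale_products(provenance, ["calibration"])))
```

Products that do not depend on the changed input (here `sky_map`) are left alone, which is the same saving "make" offers for source files.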

Requirements
• Express complex multi-step “workflows”
  – Perhaps 100,000s of individual tasks
• Operate on heterogeneous distributed data
  – Different formats & access protocols
• Harness many computing resources
  – Parallel computers &/or distributed Grids
• Execute workflows reliably
  – Despite diverse failure conditions
• Enable reuse of data & workflows
  – Discovery & composition
• Support many users, workflows, resources
  – Policy specification & enforcement

Virtual Data System
[Diagram. Stages: Workflow spec; Create Execution Plan; Grid Workflow Execution. Components shown: VDL program, Virtual Data Catalog, Virtual Data Workflow Generator, abstract workflow, local planner, statically partitioned DAG, dynamically planned DAG, DAGMan & Condor-G, job planner, job cleanup.]
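
Turning an abstract workflow (a DAG of tasks) into an execution plan requires, at minimum, an order in which tasks can run without violating dependencies. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into waves that could run in parallel. It only illustrates the DAG idea; it is not how the planners named on the slide work, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group workflow tasks into waves; tasks in one wave can run in parallel.

    dependencies maps each task to the tasks that must finish before it.
    Raises graphlib.CycleError if the workflow is not a DAG.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave completed successfully
    return waves


if __name__ == "__main__":
    workflow = {
        "extract": [],
        "simulate": [],
        "reconstruct": ["extract"],
        "analyze": ["reconstruct", "simulate"],
    }
    for number, wave in enumerate(execution_waves(workflow), start=1):
        print(number, wave)
```

A real planner would also have to pick sites, stage data, and retry failed tasks; the wave structure only captures the ordering constraint.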

VDS Software Stack
• Express complex multi-step “workflows”
  – Perhaps 100,000s of individual tasks
• Operate on heterogeneous distributed data
  – Different formats & access protocols
• Harness many computing resources
  – Parallel computers &/or distributed resources
• Execute workflows reliably & efficiently
  – Despite diverse failure conditions
• Enable reuse of data & workflows
  – Discovery & composition
• Support many users, workflows, resources
  – Policy specification & enforcement
Software components shown alongside these requirements: VDL, XDTM; Pegasus, DAGMan, Globus; VDC; TBD

Outline
• Case studies (and project ideas):
  – Volunteer computing: SETI@home/BOINC
  – Virtual Data System
  – Batch Aware Distributed File System
• Administrative

Batch-aware Distributed File System

Motivating question: Are existing distributed file systems adequate for batch computing workloads?
• NO. Internal decisions are inappropriate
  – Caching, consistency, replication
• A solution: combine scheduling knowledge with external storage control
  – Detailed information about the workload is known
  – Storage layer allows external control
  – External scheduler makes informed storage decisions
• Combining information and control results in
  – Improved performance
  – More robust failure handling
  – Simplified implementation
Explicit Control in a Batch-Aware Distributed File System. John Bent, Douglas Thain, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Miron Livny. NSDI ’04.

Outline
• Batch computing
  – Systems
  – Workloads
  – Environment
  – Why not DFS?
• Solution: BAD-FS
  – Design
  – Experimental evaluation

Batch computing
[Diagram. Labels shown: Internet, home storage.]

Batch computing
• Not interactive
• Compute loop
  – Users submit jobs
    > Job description languages
  – System itself executes
  – Results are copied back to user system
• Many existing batch systems
  – Condor, LSF, PBS, Sun Grid Engine

Batch computing
[Diagram. Labels shown: compute node (CPU, manager), Internet, home storage, scheduler, job queue holding jobs 1-4.]

Batch workloads
• General properties
  – Large number of processes
  – Process and data dependencies
  – I/O intensive
• Different types of I/O
  – Endpoint
  – Batch
  – Pipeline
• Usage: mainly scientific workloads, but also video production, data mining, electronic design, financial services, graphic rendering
Pipeline and Batch Sharing in Grid Workloads. Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. HPDC 12, 2003.
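
The three I/O types can be told apart by who reads and writes each file. The sketch below uses one plausible reading of the terms: batch data is read-only input shared by several pipelines, pipeline data is written by one stage and read by a later one, and endpoint data is everything else (unique initial inputs and final outputs). These classification rules, the job-record layout, and the file names are assumptions for illustration; the slide itself only names the three types.

```python
def classify_io(jobs):
    """Label each file as 'batch', 'pipeline', or 'endpoint'.

    jobs is a list of dicts with keys 'pipeline', 'reads', and 'writes'
    (a hypothetical job-description format).
    """
    readers, writers = {}, {}
    for job in jobs:
        for name in job["reads"]:
            readers.setdefault(name, set()).add(job["pipeline"])
        for name in job["writes"]:
            writers.setdefault(name, set()).add(job["pipeline"])

    kinds = {}
    for name in sorted(set(readers) | set(writers)):
        read_by = readers.get(name, set())
        written_by = writers.get(name, set())
        if not written_by and len(read_by) > 1:
            kinds[name] = "batch"      # shared read-only input
        elif written_by and read_by:
            kinds[name] = "pipeline"   # intermediate data between stages
        else:
            kinds[name] = "endpoint"   # unique input or final output
    return kinds


if __name__ == "__main__":
    workload = [
        {"pipeline": 1, "reads": ["calib.db", "in1"], "writes": ["tmp1"]},
        {"pipeline": 1, "reads": ["calib.db", "tmp1"], "writes": ["out1"]},
        {"pipeline": 2, "reads": ["calib.db", "in2"], "writes": ["tmp2"]},
        {"pipeline": 2, "reads": ["calib.db", "tmp2"], "writes": ["out2"]},
    ]
    print(classify_io(workload))
```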

Batch workloads
[Diagram. Labels shown: endpoint, pipeline, batch dataset.]

Cluster-to-cluster (c2c)
• Not quite p2p
  – More organized
  – Less hostile
  – More homogeneity
• Each cluster is autonomous
  – Run and managed by different entities
• An obvious bottleneck is the wide-area network
Q: How to manage the flow of data into, within, and out of these clusters?
[Diagram. Labels shown: home store, Internet.]

Why not a traditional Distributed File System?
• A distributed file system (DFS) would be ideal
  – Easy to use
  – Uniform name space
• But...
  – Designed for wide-area networks
  – Not practical
  – Embedded decisions are wrong
[Diagram. Labels shown: Internet, home store.]

Distributed file systems make ‘bad’ decisions
• Caching
  – Must guess what and how to cache
• Consistency
  – Output: must guess when to commit
  – Input: needs mechanism to invalidate cache
• Replication
  – Must guess what to replicate

BAD-FS makes ‘good’ (i.e., informed) decisions
• Removes the guesswork
  – Scheduler has detailed workload knowledge
  – Storage layer designed to allow external control
  – Scheduler makes informed storage decisions
    > Manages data as well as computations
• Retains simplicity of distributed file systems
• Practical and deployable

Outline
• Introduction
• Batch computing
  – Systems
  – Workloads
  – Environment
  – Why not DFS?
• One solution: BAD-FS
  – Design
  – Experimental evaluation

Solution BAD-FS: Practical and deployable
• User-level; requires no privilege
• Packaged as a modified batch system
• A new batch system which includes BAD-FS
• General: will work on all batch systems
[Diagram. Labels shown: SGE clusters, BAD-FS, Internet, home store.]

Solution BAD-FS: Components
1) Storage managers
2) Batch-Aware Distributed File System
3) Expanded job description language
4) BAD-FS scheduler
[Diagram. Labels shown: compute node (CPU manager, storage manager), BAD-FS, home storage, BAD-FS scheduler, job queue holding jobs 1-4.]

Information used
• Remote cluster knowledge
  – Storage availability
  – Failure rates
• Workload knowledge
  – Data type (batch, pipeline, or endpoint)
  – Data quantity
  – Job dependencies

Control through volumes
• Guaranteed storage allocations
  – Containers for job I/O
• Scheduler
  – Creates volumes to cache input data
    > Subsequent jobs can reuse this data
  – Creates volumes to buffer output data
    > Destroys pipeline, copies endpoint
  – Configures workload to access containers
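
The volume life cycle described above can be sketched as follows: the scheduler creates one cache volume for shared batch input, gives each job a scratch volume for its pipeline data, copies only the endpoint output back home, and then destroys the scratch volume. This is a toy in-memory model, not the BAD-FS interface; `StorageManager`, `run_workload`, `make_job`, and the size accounting are all hypothetical.

```python
class StorageManager:
    """In-memory stand-in for a cluster storage server (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.volumes = {}  # volume name -> {"size": int, "files": dict}

    def free(self):
        return self.capacity - sum(v["size"] for v in self.volumes.values())

    def create_volume(self, name, size):
        # A volume is a guaranteed allocation: refuse if space is lacking.
        if size > self.free():
            raise RuntimeError(f"not enough space for volume {name!r}")
        self.volumes[name] = {"size": size, "files": {}}

    def destroy_volume(self, name):
        del self.volumes[name]


def run_workload(storage, home, batch_data, jobs):
    """Scheduler side: manage volumes around each job's execution."""
    # One cache volume holds batch input so that later jobs can reuse it.
    storage.create_volume("batch-cache", size=sum(len(v) for v in batch_data.values()))
    storage.volumes["batch-cache"]["files"].update(batch_data)
    for job_id, job in enumerate(jobs):
        scratch = f"scratch-{job_id}"
        storage.create_volume(scratch, size=job["scratch_size"])
        try:
            outputs = job["run"](storage.volumes["batch-cache"]["files"],
                                 storage.volumes[scratch]["files"])
            home.update(outputs)             # copy endpoint output back home
        finally:
            storage.destroy_volume(scratch)  # pipeline data is discarded


def make_job(tag):
    """Example job: writes intermediate data to scratch, returns final output."""
    def run(batch_files, scratch_files):
        scratch_files["intermediate"] = batch_files["calibration"] + tag
        return {f"result-{tag}": scratch_files["intermediate"].upper()}
    return {"scratch_size": 4, "run": run}
```

Only endpoint data ever crosses back to home storage in this model, which is the point of scoping I/O by type.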

Knowledge plus control
• Enhanced performance
  – I/O scoping
  – Capacity-aware scheduling
• Improved failure handling
  – Cost-benefit replication
• Simplified implementation
  – No cache consistency protocol
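
One way to read "cost-benefit replication" is as a comparison between the expected cost of losing data and the cost of copying it. The rule below (replicate when failure probability times regeneration time exceeds replication time) is an assumed decision rule chosen for illustration; the slide does not give a formula, and the function and parameter names are hypothetical.

```python
def should_replicate(failure_probability: float,
                     regeneration_seconds: float,
                     replication_seconds: float) -> bool:
    """Replicate only if the expected regeneration cost exceeds the copy cost."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    expected_loss = failure_probability * regeneration_seconds
    return expected_loss > replication_seconds


if __name__ == "__main__":
    # Expensive-to-regenerate data on a flaky node: worth a copy.
    print(should_replicate(0.10, 3600, 60))
    # Cheap-to-regenerate data on a reliable node: not worth it.
    print(should_replicate(0.01, 600, 60))
```

Such a rule needs exactly the inputs listed under "Information used": observed failure rates and workload knowledge about how long data takes to produce.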

Real workload experience
• Setup
  – 16 jobs
  – 16 compute nodes
  – Emulated wide-area
• Configuration
  – Remote I/O
  – AFS-like with /tmp
  – BAD-FS
• Result: an order-of-magnitude improvement

BAD-FS Lessons
• Generic solutions may be inefficient
  – Often designed with specific tradeoffs in mind (e.g., most common workloads)
• Fix:
  – Redesign for new workload
  – Use explicit information available at runtime to optimize the execution of lower layers

Course Organization/Syllabus/etc.

Administrivia: Course structure
• Lectures
  – About 1/3 of all classes
• Student projects
  – Aim high! Have fun! It’s a class project, not your PhD!
  – Teams of up to 3 students
  – Project presentations at the end of the term
• Paper discussion
  – The other classes

Administrivia: Weekly schedule (tentative)
1. Introduction. Overview of current research problems, technologies, and applications.
2. File system semantics, data durability and availability, replication and consistency, fault-tolerance.
3. Data storage technologies. Storage hierarchies. Capacity management.
4. Scientific applications: data access patterns, workload characterization.
5. Integration with compute systems. Grids and Virtual Data.
6. Performance focus: caching, parallel access, striping.
7. Structured overlays. Distributed hash tables. Data systems harnessing structured overlays.
8. Security.
9. Applications I: Experience with deployed systems (NFS, AFS, Google File System).
10. Applications II: Data archival. Cooperative internet proxy caches. Content distribution networks.
11. Applications III: Peer-to-peer file-sharing (BitTorrent, FreeLoader).
12. Project presentations.

Administrivia: Grading
• Paper reviewing: 35%
• Discussion leading: 15%
• Project: 50%

Administrivia: Paper Reviewing (1)
• Goals:
  – Think about what you read
  – Expand your knowledge beyond the papers that are assigned
  – Get used to writing paper reviews
• Reviews due by midnight the day before the class
• Be professional in your writing
• Have an eye on the writing style:
  – Clarity
  – Beware of traps: learn to use them in writing and detect them in reading
  – Detect (and stay away from) trivial claims. E.g., 1st sentence in the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”

Administrivia: Paper Reviewing (2)
Follow the form provided when relevant.
• State the main contribution of the paper
• Critique the main contribution:
  – Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution).
  – Explain your rating in a sentence or two.
• Rate how convincing the methodology is.
  – Do the claims and conclusions follow from the experiments?
  – Are the assumptions realistic?
  – Are the experiments well designed?
  – Are there different experiments that would be more convincing?
  – Are there other alternatives the authors should have considered?
  – (And, of course, is the paper free of methodological errors?)

Administrivia: Paper Reviewing (3)
• What is the most important limitation of the approach?
• What are the three strongest and/or most interesting ideas in the paper?
• What are the three most striking weaknesses in the paper?
• Name three questions that you would like to ask the authors.
• Detail an interesting extension to the work not mentioned in the future work section.
• Optional comments on the paper that you’d like to see discussed in class.

Administrivia: Discussion leading
• Come prepared!
  – Prepare discussion outline
  – Prepare questions:
    > “What if”s
    > Unclear aspects of the solution proposed
    > …
  – Similar ideas in different contexts
  – Initiate short brainstorming sessions
• Leaders do NOT need to submit paper reviews
• Main goals:
  – Keep discussion flowing
  – Keep discussion relevant
  – Engage everybody (I’ll have an eye on this, too)

Administrivia: Projects
• Combine with your research if relevant to the class
• Get approval from all instructors if you overlap final projects:
  – Don’t sell the same piece of work twice
  – You can get more than twice as many results with less than twice as much work
• Aim high!
  – Put in one extra month and get a publication out of it
  – It is doable!
• Try ideas that you postponed out of fear: it’s just a class, not your PhD.

Administrivia: Project deadlines (tentative)
• 3rd week (Tue): 1-page project proposal
• 5th week (Tue): 3-page literature survey
  – Know relevant work in your problem area
  – If implementation project, list tools, similar projects
  – Expand proposal
• 7th week (Tue): 5-page midterm project report due
  – Have a clear image of what’s possible/doable
  – Report preliminary results
• First week of exam session: in-class project presentation
  – Demo, if appropriate
• Last week of exam session:
  – 10-page write-up

Next Class (Thu, 11/01)
• Note room change: KAIS
• Discussion of some project ideas
• Presentation by Matei
To do:
• Subscribe to mailing list
• Volunteers for discussion leaders for class next week

Questions?