What is new in the cloud Donald Kossmann
- Slides: 40
What is new in the cloud? Donald Kossmann, ETH Zurich, http://systems.ethz.ch
Acknowledgments
Questions?
Agenda • Why? • How? • What?
Simple Truths
• "Power of data"
  – the more data the merrier (GB -> TB -> PB)
  – data comes from everywhere in all shapes
  – value of data often discovered later
  – data has no owner within an organization (no silos!)
• Services turn data into $
  – the more services the merrier (10s -> 1000s -> millions)
  – need to adapt quickly
• Examples: Google, FB, Amadeus, Walmart, BMW, ...
• Platforms: Oracle, MS, SAP, Google, ..., 28msec
Promises of cloud computing?
• Cost
  – "pay as you go" for HW and SW
    • no upfront cost / investment: CapEx vs. OpEx
    • scale down if service becomes less popular
  – utilization: statistical allocation of resources
  – out-source and commoditize computing
    • HW automatically gets cheaper and faster
    • economy of scale for admin: patches, backups, etc.
  – failures: cost of preventing and having failures
• Time to market
  – avoid unnecessary steps
    • HW provisioning, purchasing, test
What to optimize?
Feature                   Traditional   Cloud
Cost [$]                  fixed         optimize
Performance [tps, secs]   optimize      fixed
Scale-out [#cores]        optimize      fixed
Predictability [s($)], Consistency [%], Flexibility [#variants] (remaining cells, in slide order): -, fixed, ???, -, optimize
Put $ on the y-axis of your graphs!!! [Florescu & Kossmann, SIGMOD Record 2009]
Misconceptions
• Variable Cost -> Unpredictable Cost
  – pay-as-you-go and predictability can be combined
  – IT department needs to rethink "budget models"
• Performance is more fundamental than $
  – at that scale, prices must be honest
  – how relevant are your perf. numbers of 1992 today?
  – technology follows business; business follows technology
• Time is money ("secs" ~ "$" in my graphs)
  – often true; often enough not true:
    • Put computing where the energy is (ocean, desert, ...)
    • Writing inner track of disk consumes 2x energy
[Source: SIGMOD, VLDB, ICDE Reviews]
Problem: Vendor Lock-In
• Hardware
  – no standard APIs for IaaS
  – expensive to move TBs of data between clouds
  – this was actually a solved problem before the cloud
• Platform
  – PaaS makes it neither better nor worse
  – (situation is very bad as is)
• Apps and Devices
  – iTunes, Google Docs, Amazon Kindle, iPhone Apps, ...
  – they own your data; you don't own their (paid for) data
Agenda • Why? • How? • What?
Teach your DBMS to swim + Industry: Add a layer to your favorite DBMS
Research Perspective... It is time to start from scratch!
Scope of this talk
• Workloads: Focus on OLTP
  – OLAP under heavy debate by others
  – streaming not addressed yet (~ OLTP)
  – testing, archiving, etc. is boring
• Types of clouds: Any type
  – private, public, hybrid
    • only difference: private clouds have planned downtime
  – cloud on the chip
  – swarms: ad-hoc private clouds
• IaaS vs. PaaS vs. SaaS: Focus on PaaS
Game Changers
• OLTP: "Key-value Store" vs. "DBMS" [No-SQL]
  – virtually infinite scale-out
  – fault-tolerance
• Virtualization
  – transparent use of resources (computers + humans)
    • hide heterogeneity of resources
• 100Ks of machines are a reality
  – problems that need 100Ks of machines are a reality
Reference Architecture
Client
  | HTTP; XML, JSON, HTML
Web Server
  | FCGI, ...; XML, JSON, HTML
App Server
  | SQL; records
DB Server
  | get/put; block
Store
Open Questions
• How to map stack to IaaS?
• How to implement store layer?
• What consistency model?
• What programming model?
• Whether and how to cache?
[Figure: the reference stack: Client, Web Server, App Server, DB Server, Store]
Variant I: Partition Workload by "Request"
[Figure: clients send HTTP requests to a Workload Splitter, which routes them to Server-A or Server-B; each server runs the full Web Server / App Server / DB Server stack on its own store (Store-A, Store-B).]
Partition Workload by "Request"
• Principle
  – partition data by "tenant"
  – route request to DB of that tenant
• Advantages
  – reuse existing database stack (RDBMS)
• Disadvantages
  – multi-tenant problem [Salesforce], [Jacobs]
    • optimization, migration, load balancing, fixed cost
  – need DB federator for inter-tenant requests
  – expensive HW and SW for high availability
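The "route request to DB of that tenant" principle can be pictured with a small sketch. It is only an illustration: `TenantDatabase`, `WorkloadSplitter`, the tenant names, and the static placement table are all hypothetical, and a plain dict stands in for a full RDBMS stack.

```python
from dataclasses import dataclass, field


@dataclass
class TenantDatabase:
    """Stand-in for one complete RDBMS stack (hypothetical, dict-backed)."""
    name: str
    rows: dict = field(default_factory=dict)  # (tenant, key) -> value


class WorkloadSplitter:
    """Routes each request to the database that owns the request's tenant."""

    def __init__(self, placement):
        # placement: tenant id -> TenantDatabase (assumed static mapping)
        self.placement = placement

    def route(self, tenant):
        try:
            return self.placement[tenant]
        except KeyError:
            raise LookupError(f"no database assigned to tenant {tenant!r}") from None

    def put(self, tenant, key, value):
        self.route(tenant).rows[(tenant, key)] = value

    def get(self, tenant, key):
        return self.route(tenant).rows.get((tenant, key))


db_a = TenantDatabase("Server-A")
db_b = TenantDatabase("Server-B")
splitter = WorkloadSplitter({"acme": db_a, "globex": db_b})

# Same key, different tenants: the data never meets in one database.
splitter.put("acme", "order:1", {"total": 40})
splitter.put("globex", "order:1", {"total": 75})
```

In this sketch a request that spans "acme" and "globex" has no single database to run on, which is where the DB federator mentioned on the slide would have to come in.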
Variant II: Partition Workload by "Load"
[Figure: clients send HTTP requests to a Workload Splitter, which spreads them over Server-A and Server-B; the servers run the Web Server / App Server / DB Server layers (one interface in the stack is marked "???") on top of one shared Store (e.g., S3) accessed via get/put.]
Partition Workload by "Load"
• Principle
  – fine-grained data partitioning by page or object
  – any server can handle any request
  – implement DBMS as a library (not server)
• Advantages
  – avoids disadvantages of Variant I
• Disadvantages
  – new synchronization problem (CAP theorem)
  – whole new breed of systems
  – caching not effective (see later)
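A minimal sketch of "any server can handle any request", assuming a dict-backed stand-in for the shared store. `SharedStore`, `LibraryServer`, and the request tuples are hypothetical names chosen for illustration; no real S3 API is used.

```python
import itertools


class SharedStore:
    """Stand-in for a shared key-value store such as S3 (get/put only)."""

    def __init__(self):
        self._objects = {}

    def get(self, key):
        return self._objects.get(key)

    def put(self, key, value):
        self._objects[key] = value


class LibraryServer:
    """Server with the DB logic linked in as a library; it holds no data itself."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, request):
        op, key, *rest = request
        if op == "write":
            self.store.put(key, rest[0])
            return (self.name, "ok")
        return (self.name, self.store.get(key))


store = SharedStore()
servers = [LibraryServer("Server-A", store), LibraryServer("Server-B", store)]
round_robin = itertools.cycle(servers)  # split by load, not by data ownership

requests = [("write", "item:7", "book"), ("read", "item:7"), ("read", "item:7")]
responses = [next(round_robin).handle(r) for r in requests]
```

Because all state lives in the store, the write handled by Server-A is visible to the read handled by Server-B. The sketch is single-threaded, so it sidesteps the synchronization problem the slide lists as a disadvantage.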
Experiments [Loesing et al. 2010]
• TPC-W Benchmark
  – throughput: WIPS
  – latency: fixed depending on request type
  – cost: cost / WIPS, total cost, predictability
• Players
  – Amazon RDS, SimpleDB
  – 28msec [Brantner et al. 2008]
  – Google AppEngine
  – Microsoft Azure
Scale-up Experiments
Cost / WIPS (m$)
                    Low Load   Peak Load
Amazon RDS (V1)     1.212      0.005
Amazon S3 (V2)      -          0.007
Google AE/C (V2)    0.002      0.028
MS Azure (V1)       0.775      0.005
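One way to read these figures is as the ratio of low-load to peak-load cost per WIPS. The sketch below only redoes arithmetic on the numbers listed on the slide; it assumes that "-" means no value was reported, and the names `cost_per_wips` and `low_to_peak_ratio` are made up for the example.

```python
# Cost per WIPS in m$, copied from the slide: (low load, peak load).
# None marks the "-" entry, assumed here to mean "not reported".
cost_per_wips = {
    "Amazon RDS (V1)": (1.212, 0.005),
    "Amazon S3 (V2)": (None, 0.007),
    "Google AE/C (V2)": (0.002, 0.028),
    "MS Azure (V1)": (0.775, 0.005),
}


def low_to_peak_ratio(low, peak):
    """How many times more one WIPS costs at low load than at peak load."""
    if low is None or peak is None:
        return None
    return low / peak


ratios = {name: low_to_peak_ratio(*costs) for name, costs in cost_per_wips.items()}

for name, ratio in ratios.items():
    print(name, "n/a" if ratio is None else f"{ratio:.1f}x")
```

On these numbers the two V1 offerings cost over a hundred times more per WIPS at low load than at peak, while the Google AE/C figure is lower at low load than at peak. That pattern is at least consistent with the fixed-cost disadvantage listed for Variant I, though the table alone does not establish the cause.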
Open Questions
• How to map traditional DB stack to IaaS?
• How to implement the storage layer?
• What is the right consistency model?
• What is the right programming model?
• Whether and how to make use of caching?
Store Variants
• Traditional (e.g., Amazon EBS)
  – local disks with physically exclusive access
  – put/get interface; no synchronization
  – only works for V1
• Key-value stores (e.g., Amazon S3)
  – DHTs with concurrent access
  – put/get interface; no synchronization
  – works for V1 and V2; makes more sense for V2
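What "put/get interface; no synchronization" implies under concurrent access can be shown with a hand-interleaved lost update. This is a sketch with a hypothetical dict-backed `KeyValueStore`, not the interface of any particular product; the two "servers" are simulated by ordering the calls explicitly so the outcome is deterministic.

```python
class KeyValueStore:
    """Minimal put/get store with no locking or versioning (illustrative only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


store = KeyValueStore()
store.put("stock:book", 10)

# Two servers each sell one copy. Both read before either writes.
seen_by_a = store.get("stock:book")
seen_by_b = store.get("stock:book")
store.put("stock:book", seen_by_a - 1)  # server A writes 9
store.put("stock:book", seen_by_b - 1)  # server B also writes 9, overwriting A

final_stock = store.get("stock:book")
print(final_stock)  # two copies sold, but the stock dropped by only one
```

With exclusive access to the store, as in the traditional variant, this interleaving cannot occur; with concurrent access, some layer above put/get has to deal with it.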
Open Questions
• How to map traditional DB stack to IaaS?
• How to implement the storage layer?
• What is the right consistency model?
• What is the right programming model?
• Whether and how to make use of caching?
CAP Theorem
• Three properties of distributed systems
  – Consistency (ACID transactions with serializability)
  – Availability (nobody is ever blocked)
  – resilience to network Partitioning
• Result
  – it is trivial to achieve 2 out of 3
  – it is impossible to have all three
• Two schools
  – Databases: sacrifice availability
  – Distributed systems: sacrifice consistency
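The "two schools" can be illustrated with a toy register that has two replicas and a switch for the network partition. This is a deliberately simplified sketch, not a proof or a real replication protocol; `ReplicatedRegister`, `Unavailable`, and the `sacrifice` parameter are hypothetical.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to preserve consistency."""


class ReplicatedRegister:
    """One value kept on two replicas; `partitioned` models a network split."""

    def __init__(self, sacrifice):
        if sacrifice not in ("availability", "consistency"):
            raise ValueError("sacrifice must be 'availability' or 'consistency'")
        self.sacrifice = sacrifice
        self.replicas = [0, 0]
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            self.replicas = [value, value]  # both replicas updated together
        elif self.sacrifice == "availability":
            raise Unavailable("cannot reach the other replica")
        else:
            self.replicas[replica] = value  # accept locally; replicas diverge

    def read(self, replica):
        return self.replicas[replica]


# "Databases" school: block the writer during the partition.
db_style = ReplicatedRegister("availability")
db_style.partitioned = True
try:
    db_style.write(0, 42)
    db_blocked = False
except Unavailable:
    db_blocked = True

# "Distributed systems" school: stay available, tolerate divergence.
ds_style = ReplicatedRegister("consistency")
ds_style.partitioned = True
ds_style.write(0, 42)
diverged = ds_style.read(0) != ds_style.read(1)
```

During the partition the first register stays consistent but rejects the write, while the second accepts it and returns different values depending on which replica is asked.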
Why sacrifice Consistency?
• It is a simple solution
  – nobody understands what sacrificing "P" means
  – sacrificing "A" is unacceptable in the Web
  – possible to push the problem to app developer
• "C" not needed in many applications
  – Banks do not implement ACID (classic example wrong)
  – Airline reservation only transacts reads (Huh?)
  – MySQL et al. ship by default in lower isolation level
• Data is noisy and inconsistent anyway
  – making it, say, 1% worse does not matter
[Vogels, VLDB 2007]
What have people done?
• Client-side Consistency Models [Tanenbaum], [PNUTS 08]
• New DB transaction models
  – Escrow, Reservation Pattern [O'Neil 86], [Gawlick 09]
  – SAGAs and compensation; e.g., in BPEL [G.-Molina, Salem]
  – SAP, Amadeus et al. [Buck-Emden], [Kemper et al. 98]
• Limit the size of transacted data
  – E.g., Microsoft Azure
• Levels of Consistency, Consistency-Cost Tradeoffs
  – read/write monotonicity + "A" + "P" [Brantner 08]
  – economic models for consistency [Amadeus], [Kraska 09]
• Educate Application Developers [Helland 2009]
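The idea behind escrow-style reservations can be sketched as a counter that sets quantity aside at reservation time, so that a later commit is guaranteed to succeed. This is a simplified, single-threaded illustration under that assumption, not the full method from the cited papers; `EscrowCounter` and its method names are hypothetical.

```python
class EscrowCounter:
    """Tracks a quantity split into 'still available' and 'held in escrow'."""

    def __init__(self, available):
        self.available = available  # not yet promised to anyone
        self.in_escrow = 0          # promised to in-flight transactions

    def reserve(self, amount):
        """Set `amount` aside; returns False if it cannot be guaranteed."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.available:
            return False
        self.available -= amount
        self.in_escrow += amount
        return True

    def _release(self, amount):
        if amount > self.in_escrow:
            raise ValueError("more than was reserved")
        self.in_escrow -= amount

    def commit(self, amount):
        """The reserved quantity is consumed for good."""
        self._release(amount)

    def abort(self, amount):
        """The reserved quantity goes back to the available pool."""
        self._release(amount)
        self.available += amount


seats = EscrowCounter(available=2)
first = seats.reserve(1)
second = seats.reserve(1)
third = seats.reserve(1)   # refused: both seats are already promised
seats.abort(1)             # one booking rolls back
fourth = seats.reserve(1)  # the freed seat can be promised again
seats.commit(1)
seats.commit(1)
```

Each transaction only touches the amount it reserved, so in-flight bookings do not need to lock the whole seat count for their full duration.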
Open Questions
• How to map traditional DB stack to IaaS?
• How to implement the storage layer?
• What is the right consistency model?
• What is the right programming model?
• Whether and how to make use of caching?
Programming Model
• Properties of a programming language for the cloud
  – support DB-style + OO-style
  – avoid keeping state at servers for V2 architecture
• Many languages will work in the cloud
  – SQL, XQuery, Ruby, ...; we have shown it for XQuery
  – J2EE will not work
• Open (research) questions
  – do OLAP on the OLTP data: My guess is yes!
  – rewrite your apps: My guess is yes!
Caching
• Many Variants Possible
  – this is just one
  – V1 caching mandatory
  – V2 caching prohibitive
• TPC-W Experiments
  – marginal improvements for Google AppEngine
• No low hanging fruit
Agenda • Why? • How? • What?
What is Sausalito?
• Application Server + Web Server + Database
  – keeps any kind of data
  – runs services
• Fully cloud-enabled
  – full elasticity (cost and throughput)
  – full fault-tolerance
  – runs on cheap hardware (private and public clouds)
• Fully Web Standard compliant
  – Web Services, REST
  – XML, JSON, CSV, ...
  – XML Schema, XQuery, XPath
Sausalito in the Cloud (V2)
Sausalito in the Cloud (offline)
Bets Made
• How to map traditional DB stack to IaaS?
  – implemented both architectures (V1 + V2)
  – V1 only in a single server variant for low end
• How to implement the storage layer?
  – EBS for V1; KVS for V2
• What is the right consistency model?
  – ACID for V1; configurable for V2
• What is the right data + programming model?
  – XML & XQuery
• Whether and how to make use of caching?
  – No! (Only for code / precompiled query plans)
Cloud: Fans and Skeptics
• Fans
  – VCs: low CapEx, Gartner hype
  – USA Government: lack of alternative
  – Departments: time-to-market, by-pass IT dept.
  – USA Researchers: next big thing
  – IT start-ups: levels the field
• Skeptics
  – EU Government: next big USA thing
  – EU Researchers: burnt by Grid Computing
  – IT department: lock-in, become irrelevant
  – Big enterprise IT vendors: low margins, forced to adapt
Conclusion
• Researchers study tradeoffs
  – Key-value stores are game changers
  – Measuring $ is a game changer
  – MMDBs (ClockScan) could be a game changer
• Entrepreneurs make bets
  – Pay per use is a game changer
  – XML & XQuery could be game changers
• Personal experience: You cannot do both!
  – You cannot play and observe at the same time [Heisenberg]