Data Base as a Service Seminar ICDE 2010
Data. Base as a Service Seminar, ICDE 2010 桑成良 2010 -10 -18
主要内容 Multi. Tenanc y Virtualization Logical Data Model Storage Model Security Distributed Storage Replication Service Level Agreements Query & Programming Model
虚拟化 APP 1 APP 2 APP 3 APP 4 OPERATING SYSTEM VIRTUAL MACHINE 1 CPU APP 5 OPERATING SYSTEM VIRTUAL MACHINE 2 MEM CPU MEM NET VIRTUAL MACHINE MONITOR (VMM) PHYSICAL STORAGE PHYSICAL MACHINE CPU CPU MEM NET
¡ 配置VM参数(节点资源与数据库) ¡ 配置DSMS参数(给定资源的最优化) a configuration enumerator a cost estimator [sigmod 08]Automatic Virtual Machine Configuration for Database. Workloads
Virtualization in DBaa. S environments DB Layer Instance Layer DB Server Layer VM Layer HW Layer DB DB Instance DB Server VM Instance DB Server VM VM VM
Existing Tools for Node Virtualization DB Layer Instance Layer DB Server Layer VM Layer HW Layer DB DB Instance DB DB DB Advisor • Indexes • MQTs • MDC • Redistribution of Tables DB Workload Manager DB Instance DB Server VM Static Environment Assumptions VM VM • Advisor expects. Node static hardware environment • VM expects static (peak) resource Ressource requirements Model • Interactions between layers can improve performance/utilization VM Configuration • Monitoring • Resources Configuration • (manual) Migration
Virtualization in DBaa. S Environments (2) DB Layer Instance Layer DB Server Layer VM Layer DB DB Instance DB DB DB Instance DB Server VM VM HW Layer Storage Layer Shared Disk Local Disk Instance VM
Virtualization in DBaa. S Environments (2) DB Layer Instance Layer DB Server Layer VM Layer DB DB Instance DB DB Advisor • Indexes • MQTs • MDC • Redistribution of Tables DB Workload Manager Instance DB Server VM VM VM Storage Ressource Model HW Layer Storage Layer Shared Disk Local Disk Storage Configuration • Device Bundling • Replication • Archiving
Multi. Tenanc y Virtualization Logical Data Model Storage Model Security Distributed Storage Replication Service Level Agreements Query & Programming Model
多租赁 ¡ 目标:整合多个客户到同一个可用的系统中 flexible, but limited scalability best resource utilization separate DB per tenant shared DB separate schema shared DB shared schema ¡ 需求: ¡ ☆可扩展(extensibility): 具体客户模式的改变 ¡ ☆安全性(security): 阻止非法的其他租户的数据访问 ¡ ☆性能/伸缩(performance/scalability): scale-up , scale-out ¡ ☆维护(maintenance): 租户级别非数据库级别
模式 Extension Table Universal Table Pivot. Table
Application owns the schema XML columns ¡ Private tables ¡ Universal tables ¡ Extension tables ¡ Chunk folding ¡ XML Columns ¡ Pivot table Chunk folding Extension table Database owns the schema Pivot table Private tables Universal table
Multi. Tenanc y Virtualization Logical Data Model Storage Model Security Distributed Storage Replication Service Level Agreements Query & Programming Model
Map. Reduce ¡ Map. Reduce 是一种 编程模型,适合用來处理大量的数据。它有fault- tolerance, data distribution, I/O scheduling, load balancing等良好特性。 Map. Reduce的整个架构是由Map 以及 Reduce 两个函数 所组成,使用 者输入一组key/value,Map function 产生许多组 intermediate key/value, 然后 reduce function 合并具有相同 key 的intermediate pair,产生最后 的结果。 map (in_key, in_value) -> (out_key, intermediate_value) list { (key, value) } M reduce (out_key, intermediate_value list) -> out_value list R M
example Mapper 1 Hadoop Map/Reduce is a Hadoop 1 Map 1 17 software framework for Reduce 1 is 1 a 1 … 45 easily writing applications … Hadoop Map/Reduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) inparallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner… Sort/Shuffle Reducer Hadoop [1, 1, 1, …, 1] Hadoop 5 Map [1, 1, 1, …, 1] Map 12 Reduce [1, 1, 1, …, 1] Reduce 12 is [1, 1, 1, …, 1] is 42 a [1, 1, 1, …, 1] a 23
Pig Latin ¡ 基于Map. Reduce/Hadoop ¡ 是声明式SQL和过程式Map. Reduce的混合 ¡ 由两部分组成 ¡ Pig. Latin: A Data Processing Language ¡ Pig Infrastructure: An Evaluator for Pig. Latin programs ¡ 例子:网站评估 Visits User URL Time Amy cnn. com 8: 00 10: Amy bbc. com 00 URL Info URL Categ Page. Ra ory nk cnn. co News m bbc. co News m 0. 9 0. 8 Pig Latin: A Not-So-Foreign Language for Data Processing sigmod 08 yahoo
Map 1 过程 load Visits Reduce 1 group by url load URL Info Map 2 foreach url generate count join on url Reduce 2 Map 3 group by category Reduce 3 foreach category generate top 10 URLs
Multi. Tenanc y Virtualization Logical Data Model Storage Model Security Distributed Storage Replication Service Level Agreements Query & Programming Model
乐观复制:元素 1 12 2 12 1. operation submission 12 2. propagation 1+2 12 3. scheduling 1+2 4. conflict resolution 5. commitment Y. Saito, M. Shapiro: Optimistic Replication, ACM Computing Surveys, 5(3): 1 -44, 2005
解决冲突 Single master Thomas write rule Prohibit Ignore Dividing objects, . . . Reduce App-specific ordering or precondition s Vector clocks Syntactic Semantic Detect & repair
不一致性的原则 ¡ (1)Reality is real ¡ ☆业务数据可能并不一定反应现实业务的真实状况。 ¡ (2)Out-of-order works ¡ ☆事务有时可能不以预期的顺序发生,暂时违反一致性约束。 ¡ (3)I will do it eventually ¡ ☆次要数据有时没必要跟重要数据一起更新。 ¡ (4) Process steps should focus: At most one transaction per process step ¡ ☆ Process应该由事件连接的process step组成,process step最多由一个事务组成,这个事务最晚在这个process step 的最后提交。 Principles for inconsistency, SAP research , CIDR 2009
¡ (9)I think I can ¡ ☆流程步骤和用户体验应该设计成支持试探性的操 作和面向道歉的计算 ¡ (10) Solipsists get things done quickly ¡ ☆每个事务基于数据的本地视图,不管其他的本地 事务。(Each transaction acts based on its local view of the data, without considering other local transactions ) ¡ (11)The show must go ¡ ☆无论如何,服务总是可用的。
¡ Transaction component: TC Transactional locking(isolation, cc) Transactional atomicity Transactional logging Log forcing at appropriate times for transaction durability ¡ Data component: DC Provide atomic operations on its data Maintain indexes and storage structures Provide cache management, staging the data pages VLDB 2010 Tutorial Query Processing Concurrency Control Recovery TC DC Access Methods Cache Manager
Application 1 calls Application 2 calls deploys Cloud Services TC 1: transactional recovery&CC DC 1: tables&indexes storage&cache TC 3: transactional recovery&CC DC 4: tables&indexes storage&cache VLDB 2010 Tutorial DC 5: RDF & text DC 6: 3 D-shape index
¡ Separate System and Application State System metadata is critical but small Application data has varying needs Separation allows use of different class of protocols ¡ Partition a database at the scheme level data fragments from multiple tables to be collo- cated in a single database partition support for a rich class of transactions [TR 2010] Elas. Tra. S: An Elastic, Scalable, and Self Managing Transactional Database for the Cloud [Hot. Cloud 2009] Elas. Tra. S: An Elastic Transactional Data Store in. VLDB the 2010 Cloud Tutorial
¡ Limit interactions to a single node Allows systems to scale horizontally Graceful degradation during failures ¡ Decouple Ownership from Data Storage Ownership refers to exclusive read/write access to data Decoupling allows light weight ownership transfer ¡ Limited distributed synchronization is practical Provide strong guarantees only for data that needs it VLDB 2010 Tutorial
Application Clients DB Read/Write Workload TM Master Health and Load Management OTM Lease Management Application Logic Elas. Tra. S Client Metadata Manager Master Proxy MM Proxy OTM Durable Writes Txn Manager Log Manager P 1 P 2 Pn DB Partitions Distributed Fault-tolerant Storage VLDB 2010 Tutorial
¡ Distributed Fault-tolerant Storage (DFS): A consistent, append-only, replicated storage manager Tolerate node failures Ensuring durability of writes ¡ Owning Transaction. Managers (OTM): Executes trans-actions Guarantees ACID properties, Performs concurrency control Recovery on partitions Caches the contents of the partitions it owns ¡ TM Master(TM): Assigningpartitions to OTMs Partition reassignment for load balancing and elasticity Detecting and recovering from OTM failures ¡ Metadata. Manager (MM) The mapping of partitions to OTMs Leasing information for the OTMs and the TM master
Keys located on different nodes Horizontal Partitions of the Keys [ACM SOCC], 2010 G-Store: A Scalable Data Store for Transactional Multi key Access in the Cloud Key Group A single node gains ownership of all keys in a Key. Group Formation Phase. VLDB 2010 Tutorial
[ACM SOCC], 2010 G-Store: A Scalable Data Store for. Transactional Multi key Access in the Cloud Application Clients Transactional Multi-Key Access Grouping Middleware Layer resident on top of a Key-Value Store Grouping Transaction Layer Manager Key-Value Store Logic Distributed Storage G-Store VLDB 2010 Tutorial
Ec. Store Optimistic concurrency contr Muti version Recovery control Load balancing Availability Replication [VLDB 2010] Towards Elastic Transactional Cloud Storage with Range Query Support, Hoang Tam Vo #1, Chun Chen x 2, Beng Chin Ooi #3 VLDB 2010 Tutorial
Multi. Tenanc y Virtualization Logical Data Model Storage Model Security Distributed Storage Replication Service Level Agreements Query & Programming Model
WLM: Model classes workload classification MPL admission control & scheduling transaction response time ¡Admission control: limit the number of simultanously executing requests (multiprogramming level = MPL) ¡ 52 Scheduling: ordering requests by priority result
Workload Modeling & Prediction ¡ Goal: predict resource requirements for a given workload, i. e. , find correlation between query features and performance features ¡ Approaches: regression, correlation analysis, Kernel Canonical CA query plans/ job descr. job feature matrix performanc e statistics performance feature matrix KCCA Ganapathi et al. : Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. ICDE 2009 53 query plan projection performance projection Prediction: Calculate job coordinates in query plan projection based on job feature vector Infer job‘s coordinates on the performance projection
Multi. Tenanc y Virtualization Logical Data Model Storage Model Security Distributed Storage Replication Service Level Agreements Query & Programming Model
outsourcing Private information retrieval / Access privacy User Query Pre/Postprocessor queries Query Engine Data Owner Data Preprocessor Overview and Challenges Data confidentiality/ privacy query results Completeness and correctness Service Provider (un-trusted)
Executing SQL over Encrypted Data Hacigumus et al. , (SIGMOD 2002) ¡ 主要步骤: Partition sensitive domains ¡ Order preserving: supports comparison ¡ Random: query rewriting becomes hard Rewrite queries to target partitions Execute queries and return results Prune/post-process results on client ¡ Privacy-Precision Trade-off Larger segments/partitions increased privacy decreased precision increased overheads in query processing
加密数据查询过程 Metadata Query Translator client-side query Temporary Result result Query Engine User original query server-side query encrypted results Query Executor Client Site Service Provider (un-trusted)
(2)私有信息获取(PIR) ¡ 用户的查询对服务商是不可见的 Xi xi x 1, x 2, …, xn User a 1 = + x l l ϵ Q 1 Service Provider 1 Q 1∈{1, …, n} ¡ 2—server PIR User Q 2=Q 1 + i a 2 = + x l l ϵ Q 2 xi = a 1 + a 2 Service Provider 2
MS Azure Amazon RDS + Simple. DB Amazon Dynamo Google Big. Table Yahoo! PNUTS 60
one DB for all clients one DB per client Amazon RDS Microsoft SQL Azure Amazon Yahoo! PNUTS Simple. DB / Dynamo Google Bigtable Amazon S 3 Virtualization Distributed Storage Replication 61
Dynam Bigtable o PNUTS Query Model get+keybased range scans single table SQL selection+p rojection SQL Logical Data Model keyvalue flexible tables relational Consisten cy Model eventu al relaxed per-record timeline consistenc y strict Transactio ? n Guarantee s row-level ACID Replicatio n GFS recordlevel DB-level datalevel Amazon RDS SQL Azure
Challenges & Trends Resource provisioning: Query & Programming Model • Virtualization on system and database level Logical Data Model Virtualization Service-level agreements: • Shielding: one (virtual) box per client • Limiting functionality: SQL vs. put/get operations • Workload management Storage Model Distributed Storage Confidentiality and trust • Data encryption • Information distribution Scalability and availability • Through redundancy and partitioning • But may affect consistency model 63 Service Level Agreements Expressiveness: • Limiting functionality: SQL vs. put/get vs. MR
- Slides: 63