Query Optimization Over Web Services. Utkarsh Srivastava, Kamesh Munagala, Jennifer Widom, Rajeev Motwani. Presented by Ajay Kumar Sarda. CS 632
Motivation: Web services are emerging as a popular standard for sharing data and functionality. Databases often sit behind web services. DBMS-like capabilities are needed when the data sources are web services. Queries spanning multiple web services need query optimization.
Motivating Example: A credit card company wants to send out mailings for its new credit card offer. I: potential recipient names. WS1: name (n) → credit rating (cr). WS2: name (n) → credit card number (ccn). WS3: card number (ccn) → payment history (ph). One possible execution is WS1, WS2, WS3. Is it optimal?
Challenges: Web services have different response times. Precedence constraints between web services. Tradeoff between a linear pipeline and parallelism. Overhead of parsing SOAP/XML headers.
Related Work: Query optimization in the presence of limited access patterns. Relations carry binding patterns, e.g. R(A^b, B^f), where b marks a bound attribute and f a free attribute. Query plans in the search space are annotated, and invalid and non-viable plans are pruned. The search starts with an initial set S of plans containing only atomic plans. S is iteratively updated by adding new plans obtained by combining plans from S using selection and join operations.
Outline of the Talk: WSMS Preliminaries; Query Optimization with and without precedence constraints; Data Chunking; Experimental Evaluation; Conclusion; Future Work
WSMS Architecture
Query Model: A web service is denoted WSi(Xi^b, Yi^f), where Xi is the set of bound attributes and Yi is the set of free attributes.
Query Model (Contd.)
Query Plans
Execution Model: A thread Ti is created for each web service WSi. Ti takes its input from a join thread Ji. Ji joins the outputs of the parents of WSi. Jout joins the outputs of all leaf web services.
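The thread-per-service model can be sketched as follows. This is a minimal Python illustration, not the WSMS implementation: it assumes a purely linear plan, so the join threads Ji are omitted and each Ti reads directly from its single parent. The names `service_thread`, `run_pipeline`, `lookup_length`, and `keep_long_names` are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the tuple stream


def service_thread(call, inbox, outbox):
    """T_i: invoke one (hypothetical) web service per input tuple, forward results."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate end-of-stream to the next stage
            return
        for result in call(item):  # a call may return zero or more tuples
            outbox.put(result)


def run_pipeline(inputs, services):
    """Run the services as a linear pipeline, one thread per service."""
    queues = [queue.Queue() for _ in range(len(services) + 1)]
    threads = [
        threading.Thread(target=service_thread, args=(call, queues[i], queues[i + 1]))
        for i, call in enumerate(services)
    ]
    for t in threads:
        t.start()
    for item in inputs:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Toy stand-ins for web services (assumptions, not real services).
def lookup_length(name):
    return [(name, len(name))]


def keep_long_names(pair):
    return [pair] if pair[1] > 3 else []
```

Calling `run_pipeline(["ann", "robert"], [lookup_length, keep_long_names])` pushes both names through the two stages concurrently; a plan DAG with several parents per service would additionally need the join step the slide describes.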
Execution Model (Contd.)
Statistics: Per-tuple response time (ci): ci = 1/ri, where ri is the maximum rate at which results of invocations can be obtained from WSi. It depends on web service provisioning, network conditions, and the load on the web service. Selectivity (si): the average number of returned tuples, per input tuple, that remain unfiltered after applying predicates. si <= 1 (selective) or si > 1 (proliferative).
Bottleneck Cost Metric: For a query plan H, Pi(H) is the set of predecessors of WSi in H, and R[S] is the combined selectivity (the product of the selectivities) of all the web services in a set S. For every tuple in the input I to plan H, the average number of tuples that WSi needs to process is R[Pi(H)]. The average processing time required by WSi per original input tuple in I is therefore R[Pi(H)] * ci. The cost of the query plan H is the largest of these terms over all web services: cost(H) = max_i (R[Pi(H)] * ci).
Bottleneck Cost Metric (Contd.): Plan 1: max(2*I, 10*0.1*I, 5*0.5*I) = 2.5*I. Plan 2: max(2*I, 10*I, 5*5*I) = 25*I. Plan 2 is 10 times slower than Plan 1.
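The bottleneck cost can be computed directly from a plan DAG. The sketch below is illustrative: the service names A, B, C, their selectivities (0.1, 5, 1), and the two plan shapes are assumptions chosen so that the per-service terms match the products on the slide; the slide's figure itself is not available. Costs are per input tuple (I = 1).

```python
from math import prod


def predecessors(plan, ws):
    """P_i(H): every service upstream of ws; plan maps service -> direct parents."""
    seen = set()
    stack = list(plan[ws])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(plan[p])
    return seen


def bottleneck_cost(plan, cost, sel):
    """max over services of R[P_i(H)] * c_i, with R[S] the product of selectivities."""
    return max(
        prod(sel[p] for p in predecessors(plan, ws)) * cost[ws]
        for ws in plan
    )


# Assumed statistics: per-tuple response times and selectivities.
cost = {"A": 2, "B": 10, "C": 5}
sel = {"A": 0.1, "B": 5, "C": 1}

plan1 = {"A": [], "B": ["A"], "C": ["B"]}  # linear chain A -> B -> C
plan2 = {"A": [], "B": [], "C": ["B"]}     # A in parallel with B -> C

print(bottleneck_cost(plan1, cost, sel))
print(bottleneck_cost(plan2, cost, sel))
```

Under these assumptions the first plan's terms are 2, 10*0.1, and 5*0.5, and the second plan's are 2, 10, and 5*5, so the proliferative service B is the reason the second plan is so much slower.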
Q.O. without Precedence Constraints: Lemma: “There exists an optimal plan that is a linear ordering of the selective web services, i.e., has no parallel dispatch of data.”
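Q.O. without Precedence Constraints: Lemma: “Let WS1, ..., WSn be a plan with a linear ordering of the selective web services. If ci > ci+1, then WSi and WSi+1 can be swapped without increasing the cost of the plan.” Proof sketch (from the figure): let Fi be the fraction of the input that reaches WSi, with WSi = (si, ci) and WSi+1 = (si+1, ci+1). Before the swap the two services contribute Fi*ci and Fi*si*ci+1. After the swap they contribute Fi*ci+1 and Fi*si+1*ci. Since ci > ci+1 and si+1 <= 1, both terms after the swap are at most Fi*ci.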
Q.O. without Precedence Constraints (Contd.): Theorem: “For selective web services with no precedence constraints, the optimal plan is a linear ordering of the web services by increasing response time, ignoring selectivities.”
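The theorem can be checked on a small instance. The sketch below sorts selective services by response time and compares the resulting bottleneck cost against a brute-force search over all linear orderings. The function names are hypothetical, and the response times are made-up values; only the selectivities 0.4, 0.3, 0.2, 0.1 echo the experiment slides.

```python
from itertools import permutations


def linear_plan_cost(order, cost, sel):
    """Bottleneck cost of a linear pipeline that visits services in `order`."""
    rate = 1.0   # fraction of the original input reaching the current service
    worst = 0.0
    for ws in order:
        worst = max(worst, rate * cost[ws])
        rate *= sel[ws]
    return worst


def order_by_response_time(cost):
    """Ordering from the theorem: increasing response time, selectivity ignored."""
    return sorted(cost, key=cost.get)


def brute_force_best(cost, sel):
    """Smallest bottleneck cost over every linear ordering (exponential; tiny inputs only)."""
    return min(linear_plan_cost(p, cost, sel) for p in permutations(cost))


# Assumed statistics for four selective services (all selectivities <= 1).
cost = {"WS1": 2.0, "WS2": 0.5, "WS3": 1.0, "WS4": 1.5}
sel = {"WS1": 0.4, "WS2": 0.3, "WS3": 0.2, "WS4": 0.1}

order = order_by_response_time(cost)
print(order, linear_plan_cost(order, cost, sel), brute_force_best(cost, sel))
```

The first service in any ordering always sees the full input, so its response time is a lower bound on the plan cost; putting the fastest service first makes that bound as small as possible.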
Q.O. with Precedence Constraints: The algorithm constructs the plan DAG H incrementally, greedily adding one web service at a time. The web service chosen at each step is the one that can be added to H at minimum cost, among those whose prerequisite web services have all already been added to H. Mi is the set of all web services that are prerequisites for WSi.
Adding a Web Service to the Plan: Given a partial plan H̄, add WSx. Compute the best cut Cx such that placing edges from the web services in Cx to WSx minimizes the cost. PCx is the set of all the web services in Cx together with all their predecessors in H̄. The cost incurred by adding WSx is Cost(WSx) = R[PCx] * cx, where cx is the response time of WSx.
Adding a Web Service (Contd.): A variable Zi is associated with every WSi in the partial plan, set to 1 if WSi belongs to PCx and 0 otherwise. The optimal set PCx is obtained by solving a linear program (LP).
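The LP itself did not survive on the slide. One formulation consistent with the definitions above is given here as an assumption, not a transcription: minimizing R[PCx] is equivalent to minimizing the sum of log si over the chosen services, PCx must contain the prerequisites Mx, and PCx must be closed under predecessors in the partial plan.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{WS_i \in \bar{H}} Z_i \log s_i \\
\text{subject to}\quad & Z_i = 1 && \forall\, WS_i \in M_x, \\
& Z_i \le Z_j && \text{whenever } WS_j \text{ is a parent of } WS_i \text{ in } \bar{H}, \\
& 0 \le Z_i \le 1 && \forall\, WS_i \in \bar{H}.
\end{aligned}
```

Here log si is negative for selective services, so the objective rewards including them, while the constraint Zi <= Zj forces every included service to bring its parents along.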
Greedy Algorithm
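The greedy construction can be sketched as below. This is an illustration under stated assumptions, not the algorithm as given on the slide: the LP step is replaced by a brute-force search over predecessor-closed subsets of the partial plan (exponential, so suitable only for tiny examples), the plan is stored as each service's full predecessor set rather than as explicit edges, and all names and statistics are hypothetical.

```python
from itertools import combinations
from math import prod


def cheapest_predecessor_set(x, preds, sel, prereq):
    """Brute-force stand-in for the LP: the predecessor-closed set PC_x
    containing M_x (prereq[x]) that minimizes R[PC_x]."""
    placed = sorted(preds)
    best_set, best_rate = None, None
    for k in range(len(placed) + 1):
        for chosen in combinations(placed, k):
            closed = set(chosen) | set(prereq[x])
            for p in list(closed):
                closed |= preds[p]  # preds[p] is already predecessor-closed
            rate = prod(sel[p] for p in closed)
            if best_rate is None or rate < best_rate:
                best_set, best_rate = frozenset(closed), rate
    return best_set, best_rate


def greedy_plan(cost, sel, prereq):
    """Add one service at a time, always the one that is cheapest to add
    among those whose prerequisites are already in the plan."""
    preds = {}  # service -> frozenset of all its predecessors in the plan
    remaining = set(cost)
    while remaining:
        best = None
        for x in sorted(remaining):
            if not set(prereq[x]) <= set(preds):
                continue  # some prerequisite has not been placed yet
            pcx, rate = cheapest_predecessor_set(x, preds, sel, prereq)
            added = rate * cost[x]  # Cost(WS_x) = R[PC_x] * c_x
            if best is None or added < best[0]:
                best = (added, x, pcx)
        if best is None:
            raise ValueError("precedence constraints cannot be satisfied")
        _, x, pcx = best
        preds[x] = pcx
        remaining.remove(x)
    return preds


def plan_cost(preds, cost, sel):
    """Bottleneck cost of the finished plan."""
    return max(prod(sel[p] for p in preds[ws]) * cost[ws] for ws in preds)


# Assumed example: W2 must precede W3; W2 is proliferative.
cost = {"W1": 2, "W2": 10, "W3": 5}
sel = {"W1": 0.1, "W2": 5, "W3": 1}
prereq = {"W1": set(), "W2": set(), "W3": {"W2"}}
plan = greedy_plan(cost, sel, prereq)
```

In this example the selective service W1 is placed first, W2 is then attached below it so that it sees only a tenth of the input, and W3 follows its prerequisite W2.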
Data Chunking: Each web service call incurs overhead from parsing SOAP/XML headers and from the network. Tuples are therefore passed to a web service in chunks. The response time of WSi depends on the input chunk size: ci(k) is the response time of WSi on a chunk of size k. A limit ki^max exists on the maximum chunk size.
Data Chunking (Contd.): The query optimizer must decide on the optimal chunk size for each web service. “The optimal chunk size to be used by WSi is ki* such that ci(ki*)/ki* is minimized.” Profiling is combined with query processing to try out various chunk sizes. Intermediate tuples between any two web services in the pipelined plan are buffered.
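Choosing the chunk size amounts to minimizing the per-tuple response time ci(k)/k over the allowed sizes. A minimal sketch, assuming ci(k) is available as a callable (in practice it would come from profiling); the two response-time curves are invented for illustration.

```python
def optimal_chunk_size(response_time, k_max):
    """Return the chunk size k in 1..k_max that minimizes response_time(k) / k.
    Ties are resolved in favor of the smaller chunk."""
    return min(range(1, k_max + 1), key=lambda k: response_time(k) / k)


# Assumed curve 1: fixed per-call overhead plus linear per-tuple work.
# Per-tuple time keeps falling, so the largest allowed chunk wins.
def affine(k):
    return 1.0 + 0.1 * k


# Assumed curve 2: an extra quadratic term (e.g. memory pressure on large chunks).
# Per-tuple time is 1/k + 0.1 + 0.01*k, which is smallest at k = 10.
def with_penalty(k):
    return 1.0 + 0.1 * k + 0.01 * k * k


print(optimal_chunk_size(affine, 20))
print(optimal_chunk_size(with_penalty, 50))
```

When the overhead is purely a fixed per-call cost, the limit ki^max is what bounds the chunk size; an interior optimum appears only if large chunks carry some additional penalty.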
Experimental Evaluation: Total running time is the metric. The plans produced by the optimizer are compared against two baselines: Parallel, which dispatches data to the web services in parallel, and SelOrder, which places web services with lower selectivity first. Running times are compared with and without chunking. The WSMS cost is compared against the cost of the slowest web service.
Experimental Setup: The WSMS prototype is a multithreaded system written in Java. Apache Axis tools are used for communicating with web services. Java Reflection is also used. Different costs are obtained by varying delays. Different selectivities are obtained by rejecting each tuple with probability 1 - si.
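The way costs and selectivities are simulated can be sketched as follows. This is a Python stand-in for illustration only (the prototype described above is in Java); `make_simulated_service` and its parameters are hypothetical names, and only the selective case (si <= 1) is covered.

```python
import random
import time


def make_simulated_service(delay_seconds, selectivity, rng):
    """Build a fake web service: each call sleeps for delay_seconds (the cost)
    and rejects its input tuple with probability 1 - selectivity."""
    def call(item):
        time.sleep(delay_seconds)
        if rng.random() < 1 - selectivity:
            return []      # tuple filtered out
        return [item]      # tuple passes through
    return call


# Zero delay keeps the demonstration fast; a seeded generator makes it repeatable.
service = make_simulated_service(0.0, 0.3, random.Random(42))
passed = sum(len(service(i)) for i in range(10000))
print(passed / 10000)  # close to the configured selectivity
```

Over many calls the fraction of tuples that pass approaches the configured selectivity, which is what lets the experiments dial in a target si for each simulated service.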
No Precedence Constraints: Web services WS1, WS2, WS3, WS4. Selectivities set to 0.4, 0.3, 0.2, 0.1. Range of costs c varied from [0.2, 2] to [2, 2]. Parallel: WS4. SelOrder: WS4.
Precedence Constraints: Web services WS1, WS2, WS3, WS4. Constraints: WS1 < WS3, WS2 < WS4. Selectivities: 2, 1, 0.1. Uniform cost for WS1, WS2, WS3, with the cost of WS4 varied from 0.4 to 2.
Data Chunking: Web services WS1, WS2, WS3, WS4. No precedence constraints. Uniform cost. Selectivity set to 0.5. Web services are arranged in a linear pipeline (by the optimizer). Equal chunk size.
WSMS Cost vs. Bottleneck Cost: No precedence constraints. Uniform web service costs. Selectivity set to 0.5. Web services arranged in a linear pipeline.
Future Work: Allowing different input tuples to follow different plans. Adaptive plans that change with response times. Web services with monetary costs. Multiple web services for the same data. Profiling techniques that track response times and selectivities. Caching techniques at the WSMS.
Conclusion: Web Service Management System (WSMS). Bottleneck cost as the cost of a pipelined plan. Optimal pipelined plan respecting precedence constraints. Optimal chunk size.
References: (1) U. Srivastava, J. Widom, K. Munagala, and R. Motwani. Query Optimization over Web Services. (2) Query optimization in the presence of limited access patterns. In Proc. of ACM SIGMOD Conf. on Management of Data.
Thank You!