CSD 305 Advanced Databases Query Processing 1 Query
CSD 305 Advanced Databases Query Processing 1
• Query Processing Fundamentals • QP vs. RA • QP in MS SQL Server • Rule-based optimization • Cost-based optimization • • • Statistical query optimization Query Plans Search Conditions Access Methods Join Methods Query tuning CSD 305 Advanced Databases Agenda 2
Fundamentals of Query Processing SELECT title FROM Loan JOIN Book ON book. No JOIN Member ON member. No WHERE member. No = 4128 can be expressed as p title((rmember#=4128 Loan) Book) CSD 305 Advanced Databases A SQL statement is semantically rather similar to a specification in Relational Calculus and can be translated into an expression in Relational Algebra Member ) 3
Relational algebra reminder Book) Member ) CSD 305 Advanced Databases p title((rmember#=4128 Loan) 4
Notes on the algebra • Each RA operation can be binary or unary • And the result being a single relation CSD 305 Advanced Databases • Note that the expression can be represented as a tree • With each leaf representing a base table • And each node representing an RA operation 5
CSD 305 Advanced Databases Does this look at all familiar? This picture shows a Query Plan from SQL Server 6
• As we have seen Query Plans are rather similar to Relational Algebra • They take the form of binary trees • They produce intermediate results (relations) on the way to producing the answer • But • They are concerned with physical structures as well as logical structures • The operations are physical implementations of RA expressions • e. g. there are several ways to do a JOIN CSD 305 Advanced Databases QP vs. RA 7
• SQL requests are sent to the DBMS • The DBMS • parses the SQL • accesses metadata • prepares query plan(s) • chooses the cheapest plan • executes the chosen plan • Results in the form of a table are returned to the client CSD 305 Advanced Databases SQL Server QP Basics 8
CSD 305 Advanced Databases QP Schematic 9
CSD 305 Advanced Databases Inside the DBMS 10
What do the QP components do? • Within the SQL Processor • The Query Parser • The Query Optimizer • Generates plans for carrying out the query and chooses the best • Saves plans in the cache • The Query Executor • Picks plans from the cache • Carries out the chosen plan • Returns results to the client CSD 305 Advanced Databases • Ensures that the SQL is well-formed and is consistent with the schema 11
What do the QP components do? • Within the Storage Manager • The Page Manager • Processes the request for data at a logical level (the database page) • Processes the request at a physical level (disk input/output) CSD 305 Advanced Databases • The I/O Manager 12
Query Optimization • There are three basic techniques of query optimization • Rule-based • Cost-based • In which the costs of various Query Plans are estimated and compared • Statistical • In which the optimizer makes use of statistics about data distribution to improve Query Plans • Microsoft SQL Server uses a combination of all three to select a QP CSD 305 Advanced Databases • In which rules are applied to generate a Query Plan 13
Rule-based Query Optimization • Apply WHERE clause conditions first if it is possible • If you have an index and you know the key value, use the key for direct access • If you have an index and you know the leftmost part of the key value, use the key for direct access CSD 305 Advanced Databases • A sequence of heuristics (rules of thumb) is applied to construct a QP • Examples of rules 14
• Each QP is evaluated with respect to its estimated costs, including • Disk I/O • Number and size of rows processed • Use of processing cycles • The accuracy of cost estimation can vary greatly • e. g. what percentage of rows will be returned when the condition (WHERE sex = ‘F’) is applied? • 1%, 10%, 50%? • The Query Optimizer needs statistics to make a good estimate CSD 305 Advanced Databases Cost-based optimization 15
• The DBMS maintains statistics about the distribution of data values in tables and indexes • The statistics are stored as part of the metadata and used by the Query Optimizer • The statistics include • Densities of key values (calculated as 1 divided by the number of different values) • Sample values or range endpoints (up to 200 of them) CSD 305 Advanced Databases Statistical optimization 16
CSD 305 Advanced Databases Statistics Example 17
Analyzing Query Plans • Use SQL Management Studio • Choose “New Query” • On the Query toolbar • Run your SQL query • A graphical representation of the plan will be displayed • You can then click on any node to "drill down" to find more about what is going on • Example on next few slides CSD 305 Advanced Databases • Turn on “Display Estimated Execution Plan" option 18
CSD 305 Advanced Databases The SQL query and results 19
CSD 305 Advanced Databases The QP overview 20
CSD 305 Advanced Databases Drilling down on the first node 21
CSD 305 Advanced Databases …and the next 22
CSD 305 Advanced Databases …and the last 23
Some notes on the query plan • The query begins by seeking the class. Code in a nonclustered index • The bookmarks are used in the next step to look up the corresponding row in the clustered index • The rows are then returned to the client CSD 305 Advanced Databases • From this it will collect bookmarks 24
Summary • SQL queries are first parsed to establish syntactic correctness • The Query Optimizer generates query plan(s) • A query plan is somewhat similar to the RA • but incorporates physical considerations CSD 305 Advanced Databases • using rule-based, cost-based and statistical techniques 25
- Slides: 25