Oracle Optimizer Combining Output From Multiple Index Scans
Oracle Optimizer
Combining Output From Multiple Index Scans • AND-EQUAL: – select * from sailors where sname = 'Jim' and rating = 10 • Suppose we have 2 indexes: sname, rating TABLE ACCESS BY ROWID AND-EQUAL INDEX RANGE SCAN Sailors(sname) INDEX RANGE SCAN Sailors(rating) • Suppose we also have an index on (sname, rating) – How should the query be performed?
Operations that Manipulate Data Sets • Up until now, all operations returned the rows as they were found • There are operations that must find all rows before returning a single row • Try to avoid these operations for online users! – SORT ORDER BY: query with order by select sname, age from Sailors order by age;
Operations that Manipulate Data Sets – SORT UNIQUE: sorting records while eliminating duplicates e. g. , query with distinct; query with minus, intersect or union select DISTINCT age from Sailors; – SORT AGGREGATE, SORT GROUP BY: queries with aggregate or grouping functions (like MIN, MAX)
Is the table always accessed? What if there is no index?
Operations that Manipulate Data Sets • Consider the query: – select sname from sailors union select bname from boats;
Operations that Manipulate Data Sets • Consider the query: – select sname from sailors minus select bname from boats; How do you think that Oracle implements intersect? union all?
Operations that Manipulate Data Sets • Select age, COUNT(*) from Sailors GROUP BY age SORT GROUP BY TABLE ACCESS FULL
Distinct • What should Oracle do when processing the query (assuming that sid is the primary key): – select distinct sid from Sailors
Join Methods • Select * from Sailors, Reserves where Sailors. sid = Reserves. sid • Oracle can use an index on Sailors. sid or on Reserves. sid (note that both will not be used) • Join Methods: MERGE JOIN, NESTED LOOPS, HASH JOIN
Nested Loops Joins • Block nested loop join NESTED LOOPS TABLE ACCESS FULL OF our_outer_table TABLE ACCESS FULL OF our_inner_table • Index nested loop join NESTED LOOPS TABLE ACCESS FULL OF our_outer_table TABLE ACCESS BY ROWID OF our_inner_table INDEX RANGE SCAN OF inner_table_index
When Are Nested Loops Joins Used? • If tables are of unequal size • If results should be returned online
Hash Join //Partition R into k partitions foreach tuple r in R do //flush when fills read r and add it to buffer page h(ri) foreach tuple s in S do //flush when fills read s and add it to buffer page h(sj) for l = 1. . k //Build in-memory hash table for Rl using h 2 foreach tuple r in Rl do read r and insert into hash table with h 2 foreach tuple s in Sl do read s and probe table using h 2 output matching pairs <r, s>
Hash Join Plan HASH JOIN TABLE ACCESS FULL OF table_A TABLE ACCESS FULL OF table_B
When Are Hash Joins Used? • If tables are small • If results should be returned online
Sort-Merge Join Plan MERGE JOIN SORT JOIN TABLE ACCESS FULL OF table_A SORT JOIN TABLE ACCESS FULL OF table_B
When Are Sort/Merge Joins Used? • Performs badly when tables are of unequal size. Why?
Hints • You can give the optimizer hints about how to perform query evaluation • Hints are written in /*+ */ right after the select • Note: These are only hints. The oracle optimizer can choose to ignore your hints
Examples Select /*+ FULL (sailors) */ sid From sailors Where sname=‘Joe’; Select /*+ INDEX (sailors s_ind) */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’;
More Examples Select /*+ USE_NL (sailors) */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’; inner table Select /*+ USE_MERGE (sailors, reserves) */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’; Select /*+ USE_HASH */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’;
Information Retrieval and DB
CONTAINS • Introduce text search in SQL • CONTAINS operator select Name from article where CONTAINS(abstract, ‘play’) > 0; • Can combine OR, AND
Stemming • Given the “stem” of a word, Oracle will expand the list of words to search for to include all words having the same stem – Stem of plays, played, playing, playful: play – where CONTAINS(abstract, ‘$play’) > 0;
Ranking • We need to rank between the retrieved tuples according to their relevance – Open challenge – Several implementations for oracle The following slides are based on those of Dr. Sara Cohen
The Vector Space Model • The Vector Space Model (VSM) is a way of representing text data through the words that they contain • It is a standard technique in Information Retrieval • In the following, we call this text data, document (classical IR) • The VSM allows decisions to be made about which documents are similar to each other and to keyword queries
How Does it Work? • Each document is represented as a vector which contains a value for each word in the vocabulary – this value is 0, if the word does not appear in the document • Similarly, a query is represented as a vector • The rank of the document with respect the query is the distance between their vectors
Example: Boolean Value • P 1 = “I live in a green house with a green roof” • P 2 = “There is no life form on Mars” • P 3 = “Men love green cars” • P 4 = “I saw some little green men yesterday” 1 if the word appears, 0 otherwise
Example: Boolean Value • P 1 = “I live in a green house with a green roof” • P 2 = “There is no life form on Mars” • P 3 = “Men love green cars” • P 4 = “I saw some little green men yesterday” Vector for P 1
Example: Boolean Value • Q = green OR mars
Distance Between Vectors • For two vectors d and d’ the cosine distance between d and d’ is given by: • d d’ is the scalar product of d and d’, calculated by multiplying corresponding values together • |d| is the norm of d • The “cosine measure” calculates the cosine between the vectors in a high-dimensional virtual space
Distance Between Documents t 3 d 2 d 3 d 1 θ φ t 1 d 5 t 2 d 4
Example • Consider the query Q="green men" and the document P 3 = "Men love green cars" • The cosine distance: – scalar product: 1*0 + 1*1+ 1*0 + 1*1 = 2 – norms: (12 + 12 ) = 2 (02 + 12 + 02 + 12 ) = 2 – Similarity: 2/(2 2) = 1/ 2 Only dimensions that are nonzero in one of the vectors are shown
Defining Vector Values: TF • Instead of boolean value, put word frequency (called tf, for "term frequency") • What affect does this give? • Sometimes a normalized version is used: – term frequency/number of words in the document
Normalized TF Always: Sum = 1
Another Option: Defining Vector Values as IDF • We can combine TF with IDF, inverse document frequency – 1/(number of documents containing the word) • What is the affect?
Normalized IDF • Sometimes a normalized version is used: • The logarithm gives less influence to IDF when TF and IDF are combined • What is the value for a word that appears in all documents? Why? Number of documents in which w appears
Standard Measure is TF-IDF • Use normalized TF times normalized IDF • Note: Once the values are chosen (using any of the schemes considered), we use cosine distance to compare the document and query
XML (Extensible Markup Language) and the Semi-Structured Data Model
Motivation • We have seen that relational databases are very convenient to query. However: – There is a LOT of data not in relational databases!! • Perhaps the most widely accessed database is the web, and it certainly isn’t a relational database.
Querying the Web • The web can be queried using a search engine, however, we can’t ask questions like: – What is the lowest price for which a Jaguar is sold on the web? • Problems: – There are no facilities for asking complex questions, such as aggregation of data
Understanding the Web • In order to query the web, we must be able to understand it. • 2 Computer Science Approaches: – Artificial Intelligence Approach – Database Approach
Database Approach “The web is unstructured and we will structure it” • Sometimes problems that are very difficult can be solved easily by enforcing a standard • Encourage the use of XML as a standard for data exchange on the web
Example XML Document <addresses > <person friend="yes"> Opening Tag <name> Jeff Cohen</name> <tel> 04 -828 -1345 </tel> <tel> 054 -470 -778 </tel> <email> jeffc@cs. technion. ac. il </email> </person> Attribute <person friend="no"> <name> Irma Levy</name> <tel> 03 -426 -1142 </tel> <email>irmal@yourmail. com</email> </person> </addresses> Closing Tag Element
Very Unstructured XML <? xml version=“ 1. 0”? > <Damage. Report> The insured’s <Vehicle Make = “Toyota”> Corolla </Vehicle> broke through the guard rail and plummeted into the ravine. The cause was determined to be <Cause>faulty brakes </Cause>. Amazingly there were no casualties. </Damage. Report>
XML Vs. HTML • XML and HTML are brothers. They are both special cases of SGML. • HTML has specific tag and attribute names. These are associated with a specific meaning • XML can have any tag and attribute name. These are not associated with any meaning • HTML is used to specify visual style • XML is used to specify meaning
A Different Data Model Relational Semi-Structured Abstract Sets of Labeled Directed Model tuples Graph Concrete Tables XML Documents Standard Storing Data Exchange for Data Separating Content Model from Style
Data Exchange • Problem: Many data sources, each of a different type (different vendor), with a different schema. – How can the data be combined and used together? – How can different companies collaborate on their data? – What format should be used to exchange the data?
Separating Content from Style • Web sites develop over time • Important to separate style from data in order to allow changes to the site structure and appearance • Using XML, we can store data alone • CSS separates style from data only in a limited way • Using XSL, this data can be translated into HTML • The data can be translated differently as the site develops
Write Once Use Everywhere XML Data XS L WML (hand-held devices) XSL HTML (web browser L S X TEXT (Excel)
Using XML • Quering and Searching XML: There are query languages and search engines that query XML and return XML. Examples: Xpath, Xquery /SQL 4 X, Equix, XSEarch • Displaying XML: An XML document can have an associated style-sheet which specifies how the document should be translated to HTML. Examples: CSS, XSL
DTD: Document Type Descriptors • Document Type Descriptors (DTDs) impose structure on an XML document • There is some relationship between a DTD and a schema
- Slides: 51