Oracle Optimizer Combining Output From Multiple Index Scans

Oracle Optimizer

Combining Output From Multiple Index Scans • AND-EQUAL: – select * from sailors where sname = 'Jim' and rating = 10 • Suppose we have 2 indexes: sname, rating TABLE ACCESS BY ROWID AND-EQUAL INDEX RANGE SCAN Sailors(sname) INDEX RANGE SCAN Sailors(rating) • Suppose we also have an index on (sname, rating) – How should the query be performed?

Operations that Manipulate Data Sets • Up until now, all operations returned the rows as they were found • There are operations that must find all rows before returning a single row • Try to avoid these operations for online users! – SORT ORDER BY: query with order by select sname, age from Sailors order by age;

Operations that Manipulate Data Sets – SORT UNIQUE: sorting records while eliminating duplicates e. g. , query with distinct; query with minus, intersect or union select DISTINCT age from Sailors; – SORT AGGREGATE, SORT GROUP BY: queries with aggregate or grouping functions (like MIN, MAX)

Is the table always accessed? What if there is no index?

Operations that Manipulate Data Sets • Consider the query: – select sname from sailors union select bname from boats;

Operations that Manipulate Data Sets • Consider the query: – select sname from sailors minus select bname from boats; How do you think that Oracle implements intersect? union all?

Operations that Manipulate Data Sets • Select age, COUNT(*) from Sailors GROUP BY age SORT GROUP BY TABLE ACCESS FULL

Distinct • What should Oracle do when processing the query (assuming that sid is the primary key): – select distinct sid from Sailors

Join Methods • Select * from Sailors, Reserves where Sailors. sid = Reserves. sid • Oracle can use an index on Sailors. sid or on Reserves. sid (note that both will not be used) • Join Methods: MERGE JOIN, NESTED LOOPS, HASH JOIN

Nested Loops Joins • Block nested loop join NESTED LOOPS TABLE ACCESS FULL OF our_outer_table TABLE ACCESS FULL OF our_inner_table • Index nested loop join NESTED LOOPS TABLE ACCESS FULL OF our_outer_table TABLE ACCESS BY ROWID OF our_inner_table INDEX RANGE SCAN OF inner_table_index

When Are Nested Loops Joins Used? • If tables are of unequal size • If results should be returned online

Hash Join //Partition R into k partitions foreach tuple r in R do //flush when fills read r and add it to buffer page h(ri) foreach tuple s in S do //flush when fills read s and add it to buffer page h(sj) for l = 1. . k //Build in-memory hash table for Rl using h 2 foreach tuple r in Rl do read r and insert into hash table with h 2 foreach tuple s in Sl do read s and probe table using h 2 output matching pairs <r, s>

Hash Join Plan HASH JOIN TABLE ACCESS FULL OF table_A TABLE ACCESS FULL OF table_B

When Are Hash Joins Used? • If tables are small • If results should be returned online

Sort-Merge Join Plan MERGE JOIN SORT JOIN TABLE ACCESS FULL OF table_A SORT JOIN TABLE ACCESS FULL OF table_B

When Are Sort/Merge Joins Used? • Performs badly when tables are of unequal size. Why?

Hints • You can give the optimizer hints about how to perform query evaluation • Hints are written in /*+ */ right after the select • Note: These are only hints. The oracle optimizer can choose to ignore your hints

Examples Select /*+ FULL (sailors) */ sid From sailors Where sname=‘Joe’; Select /*+ INDEX (sailors s_ind) */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’;

More Examples Select /*+ USE_NL (sailors) */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’; inner table Select /*+ USE_MERGE (sailors, reserves) */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’; Select /*+ USE_HASH */ sid From sailors S, reserves R Where S. sid=R. sid AND sname=‘Joe’;

Information Retrieval and DB

CONTAINS • Introduce text search in SQL • CONTAINS operator select Name from article where CONTAINS(abstract, ‘play’) > 0; • Can combine OR, AND

Stemming • Given the “stem” of a word, Oracle will expand the list of words to search for to include all words having the same stem – Stem of plays, played, playing, playful: play – where CONTAINS(abstract, ‘$play’) > 0;

Ranking • We need to rank between the retrieved tuples according to their relevance – Open challenge – Several implementations for oracle The following slides are based on those of Dr. Sara Cohen

The Vector Space Model • The Vector Space Model (VSM) is a way of representing text data through the words that they contain • It is a standard technique in Information Retrieval • In the following, we call this text data, document (classical IR) • The VSM allows decisions to be made about which documents are similar to each other and to keyword queries

How Does it Work? • Each document is represented as a vector which contains a value for each word in the vocabulary – this value is 0, if the word does not appear in the document • Similarly, a query is represented as a vector • The rank of the document with respect the query is the distance between their vectors

Example: Boolean Value • P 1 = “I live in a green house with a green roof” • P 2 = “There is no life form on Mars” • P 3 = “Men love green cars” • P 4 = “I saw some little green men yesterday” 1 if the word appears, 0 otherwise

Example: Boolean Value • P 1 = “I live in a green house with a green roof” • P 2 = “There is no life form on Mars” • P 3 = “Men love green cars” • P 4 = “I saw some little green men yesterday” Vector for P 1

Example: Boolean Value • Q = green OR mars

Distance Between Vectors • For two vectors d and d’ the cosine distance between d and d’ is given by: • d d’ is the scalar product of d and d’, calculated by multiplying corresponding values together • |d| is the norm of d • The “cosine measure” calculates the cosine between the vectors in a high-dimensional virtual space

Distance Between Documents t 3 d 2 d 3 d 1 θ φ t 1 d 5 t 2 d 4

Example • Consider the query Q="green men" and the document P 3 = "Men love green cars" • The cosine distance: – scalar product: 1*0 + 1*1+ 1*0 + 1*1 = 2 – norms: (12 + 12 ) = 2 (02 + 12 + 02 + 12 ) = 2 – Similarity: 2/(2 2) = 1/ 2 Only dimensions that are nonzero in one of the vectors are shown

Defining Vector Values: TF • Instead of boolean value, put word frequency (called tf, for "term frequency") • What affect does this give? • Sometimes a normalized version is used: – term frequency/number of words in the document

Normalized TF Always: Sum = 1

Another Option: Defining Vector Values as IDF • We can combine TF with IDF, inverse document frequency – 1/(number of documents containing the word) • What is the affect?

Normalized IDF • Sometimes a normalized version is used: • The logarithm gives less influence to IDF when TF and IDF are combined • What is the value for a word that appears in all documents? Why? Number of documents in which w appears

Standard Measure is TF-IDF • Use normalized TF times normalized IDF • Note: Once the values are chosen (using any of the schemes considered), we use cosine distance to compare the document and query

XML (Extensible Markup Language) and the Semi-Structured Data Model

Motivation • We have seen that relational databases are very convenient to query. However: – There is a LOT of data not in relational databases!! • Perhaps the most widely accessed database is the web, and it certainly isn’t a relational database.

Querying the Web • The web can be queried using a search engine, however, we can’t ask questions like: – What is the lowest price for which a Jaguar is sold on the web? • Problems: – There are no facilities for asking complex questions, such as aggregation of data

Understanding the Web • In order to query the web, we must be able to understand it. • 2 Computer Science Approaches: – Artificial Intelligence Approach – Database Approach

Database Approach “The web is unstructured and we will structure it” • Sometimes problems that are very difficult can be solved easily by enforcing a standard • Encourage the use of XML as a standard for data exchange on the web

Example XML Document <addresses > <person friend="yes"> Opening Tag <name> Jeff Cohen</name> <tel> 04 -828 -1345 </tel> <tel> 054 -470 -778 </tel> <email> jeffc@cs. technion. ac. il </email> </person> Attribute <person friend="no"> <name> Irma Levy</name> <tel> 03 -426 -1142 </tel> <email>irmal@yourmail. com</email> </person> </addresses> Closing Tag Element

Very Unstructured XML <? xml version=“ 1. 0”? > <Damage. Report> The insured’s <Vehicle Make = “Toyota”> Corolla </Vehicle> broke through the guard rail and plummeted into the ravine. The cause was determined to be <Cause>faulty brakes </Cause>. Amazingly there were no casualties. </Damage. Report>

XML Vs. HTML • XML and HTML are brothers. They are both special cases of SGML. • HTML has specific tag and attribute names. These are associated with a specific meaning • XML can have any tag and attribute name. These are not associated with any meaning • HTML is used to specify visual style • XML is used to specify meaning

A Different Data Model Relational Semi-Structured Abstract Sets of Labeled Directed Model tuples Graph Concrete Tables XML Documents Standard Storing Data Exchange for Data Separating Content Model from Style

Data Exchange • Problem: Many data sources, each of a different type (different vendor), with a different schema. – How can the data be combined and used together? – How can different companies collaborate on their data? – What format should be used to exchange the data?

Separating Content from Style • Web sites develop over time • Important to separate style from data in order to allow changes to the site structure and appearance • Using XML, we can store data alone • CSS separates style from data only in a limited way • Using XSL, this data can be translated into HTML • The data can be translated differently as the site develops

Write Once Use Everywhere XML Data XS L WML (hand-held devices) XSL HTML (web browser L S X TEXT (Excel)

Using XML • Quering and Searching XML: There are query languages and search engines that query XML and return XML. Examples: Xpath, Xquery /SQL 4 X, Equix, XSEarch • Displaying XML: An XML document can have an associated style-sheet which specifies how the document should be translated to HTML. Examples: CSS, XSL

DTD: Document Type Descriptors • Document Type Descriptors (DTDs) impose structure on an XML document • There is some relationship between a DTD and a schema