Managing XML and Semistructured Data
Lecture 5: Query Languages - Lorel and UnQL
Prof. Dan Suciu, Spring 2001

In this lecture
• A core query language
• Lorel
• UnQL

Resources:
• UnQL: A Query Language and Algebra for Semistructured Data Based on Structural Recursion. Buneman, Fernandez, Suciu. VLDBJ 2000.
• The Lorel Query Language for Semistructured Data. Abiteboul, Quass, McHugh, Widom, Wiener. International Journal on Digital Libraries, 1997.

A Core Query Language
Will illustrate with the biblio database:

[Figure: DB drawn as an edge-labeled graph. A biblio edge leads to node &o1, whose paper and book edges lead to &o12, &o24, &o29, ... These nodes have title, author and date edges to leaf nodes holding “Smith”, 1999, “Roux”, “Combalusier”, 1976 and “Database Systems”.]

Query 1:
    SELECT author: X
    FROM biblio.book.author X

[Figure: the biblio graph, with a new answer node whose author edges point to the matched author leaves.]

Answer = {author: “Smith”, author: “Roux”, author: “Combalusier”}

Query 2:
    SELECT row: X
    FROM biblio._ X
    WHERE “Smith” in X.author

[Figure: the biblio graph, with a new answer node whose row edges point to the matching publication nodes.]

Answer = {row: {author: “Smith”, date: 1999, title: “Database…”}, row: … }

Query 3:
    SELECT row: (SELECT author: Y
                 FROM X.author Y)
    FROM biblio.book X

[Figure: the biblio graph, with a new answer node whose row edges lead to new nodes &a1 and &a2; their author edges point to the author leaves of each book.]

Answer = {row: {author: “Smith”}, row: {author: “Roux”, author: “Combalusier”}}

Query 4:
    SELECT (SELECT row: {author: Y, title: T}
            FROM X.author Y, X.title T)
    FROM biblio.book X
    WHERE “Roux” in X.author

[Figure: the biblio graph, with a new answer node whose row edges lead to new nodes &a1 and &a2; each has an author edge and a title edge into the matching book.]

Answer = {row: {author: “Roux”, title: “Database…”}, row: {author: “Combalusier”, title: “Database…”}}

(Query has a typo in the book.)

Formal Semantics
• Given a query Q = SELECT E[X1, …, Xn] FROM F WHERE C and a database DB, Answer(Q, DB) is defined in two steps:
  – Step 1: compute all bindings of the variables, one row per binding:

        X1    X2    …    Xn
        …
        Ci1   Ci2   …    Cin
        …

    • Cij are node oids or atomic values
    • Must satisfy paths in F
    • Must satisfy conditions in C
  – Step 2: the answer is the union of E[C11, …, C1n], …, E[Cm1, …, Cmn]

Formal Semantics
• When E has nested subqueries, apply the semantics recursively
• Note: so far we have dealt with an unordered model
  – What do we need to do for order?
• Complexity: PTIME in |DB| (not in |Q|).

Lorel
• Minor syntactic differences in regular path expressions (% instead of _, # instead of _*)
• Common path convention:
      SELECT biblio.book.author
      FROM biblio.book
      WHERE biblio.book.year = 1999
  becomes:
      SELECT X.author
      FROM biblio.book X
      WHERE X.year = 1999

Lorel
• Existential variables:
      SELECT biblio.book.year
      FROM biblio.book
      WHERE biblio.book.author = “Roux”
  – What happens with books having multiple authors?
  – Author is existentially quantified:
      SELECT X.year
      FROM biblio.book X, X.author Y
      WHERE Y = “Roux”
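A short Python sketch of what existential quantification means for a book with several authors. The dictionary encoding and the sample books are hypothetical (the year label follows the Lorel queries above); the universal variant is shown only for contrast and is not Lorel's behavior.

```python
# Hypothetical sample data: each book maps a label to its list of values.
books = [
    {"author": ["Roux", "Combalusier"], "year": [1976]},
    {"author": ["Smith"], "year": [1999]},
]

# SELECT X.year FROM biblio.book X, X.author Y WHERE Y = "Roux"
# Y ranges over the authors of X, so one matching author is enough.
existential = []
for x in books:
    for y in x["author"]:
        if y == "Roux":
            existential.extend(x["year"])

# For contrast only: requiring every author to be "Roux".
universal = []
for x in books:
    if all(y == "Roux" for y in x["author"]):
        universal.extend(x["year"])

print(existential, universal)
```

The book co-authored by Roux and Combalusier is selected under the existential reading and rejected under the universal one.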

Lorel
• Path variables: @P in
      SELECT P
      FROM biblio.# @P X
  – What happens on graphs with cycles?
• Constructing new results
  – Several default rules
• Casting between datatypes
  – Very useful in practice
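To see why cycles are a problem for path variables, consider the following Python sketch. The graph, its oids, the cites label, and the function paths are all invented for illustration; the sketch only shows that the set of paths is unbounded on a cyclic graph and does not model how Lorel itself resolves the issue.

```python
from itertools import islice

# Hypothetical graph: oid -> list of (label, target oid). The two books
# cite each other, which creates a cycle.
GRAPH = {
    "biblio": [("book", "k1"), ("book", "k2")],
    "k1": [("cites", "k2")],
    "k2": [("cites", "k1")],
}

def paths(graph, start):
    """Yield (label_path, node) pairs breadth-first, shortest paths first.

    On a cyclic graph this generator never terminates on its own.
    """
    frontier = [((), start)]
    while frontier:
        next_frontier = []
        for path, node in frontier:
            yield path, node
            for label, target in graph.get(node, []):
                next_frontier.append((path + (label,), target))
        frontier = next_frontier

# Every prefix of the enumeration is finite, but the paths keep growing.
first = list(islice(paths(GRAPH, "biblio"), 7))
for path, node in first:
    print(".".join(path), "->", node)
```

Binding @P to every path matched by # would therefore produce infinitely many bindings here, so any implementation has to restrict which paths count.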

UnQL
Patterns:
      SELECT row: X
      WHERE {biblio.book: {author “Roux”, title X}} in DB
Equivalent to:
      SELECT row: X
      FROM biblio.book Y, Y.author Z, Y.title X
      WHERE Z = “Roux”

UnQL
Label variables:
  – “find all publication types and their titles where Roux is an author”
      SELECT row: {type: L, title: X}
      WHERE {biblio.L: {author “Roux”, title X}} in DB

UnQL
Unrestricted use of label variables creates problems:
      SELECT row: {type: L, title: Y}
      WHERE {biblio.(book|L).title X} in DB

      SELECT row: {type: L, title: Y}
      WHERE {biblio.(L)*.title X} in DB

UnQL
In UnQL, regular path expressions cannot contain label variables:
      Pat ::= Var | Const | {L1: Pat1, …, Ln: Patn}
      L   ::= RegularPathExpression | LabelVariable
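One way to read the grammar is as a set of datatypes in which a label position holds either a regular path expression or a single label variable, never a mix. The Python sketch below is an illustration only: the class names and the helper label_vars are hypothetical, the regular path expression is kept as an opaque string, and biblio.L is written as nested patterns, which is an assumed reading of the dotted shorthand.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Var:            # Pat ::= Var
    name: str

@dataclass(frozen=True)
class Const:          # Pat ::= Const
    value: object

@dataclass(frozen=True)
class RegPath:        # L ::= RegularPathExpression (constant labels only)
    expr: str

@dataclass(frozen=True)
class LabelVar:       # L ::= LabelVariable
    name: str

@dataclass(frozen=True)
class Struct:         # Pat ::= {L1: Pat1, ..., Ln: Patn}
    edges: Tuple[Tuple[Union[RegPath, LabelVar], object], ...]

def label_vars(pat):
    """Collect the names of all label variables used in a pattern."""
    found = set()
    if isinstance(pat, Struct):
        for label, sub in pat.edges:
            if isinstance(label, LabelVar):
                found.add(label.name)
            found |= label_vars(sub)
    return found

# {biblio: {L: {author "Roux", title X}}}
inner = Struct(((RegPath("author"), Const("Roux")),
                (RegPath("title"), Var("X"))))
middle = Struct(((LabelVar("L"), inner),))
pattern = Struct(((RegPath("biblio"), middle),))
print(label_vars(pattern))
```

Because a label is either a RegPath or a LabelVar, expressions such as (book|L) or (L)* have no representation in these types, which mirrors the restriction stated by the grammar.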