Completeness of Queries over SQL Databases Werner Nutt
Completeness of Queries over SQL Databases Werner Nutt and Simon Razniewski
Introduction � Data Quality research investigates how good data is � Dimensions of Data Quality are: � Correctness � Timeliness � Completeness 2 Completeness of Queries over SQL Databases 31. 10. 2012
Completeness � Query answering over incomplete data: extensively studied � Codd: NULL values [1975] � Imielinski/Lipski: Representation systems [1984] � Query completeness: Little attention � Razniewski/Nutt: 3 Only on missing records [VLDB 2011] Completeness of Queries over SQL Databases 31. 10. 2012
Bolzano is in the Province of South Tyrol Bolzano � Trilingual province in the north of Italy � Has its own school administration 4 Completeness of Queries over SQL Databases 31. 10. 2012
Incompleteness in the school data Available database Ideal database Facts in real world result(Paul, Music, A) result(Giulia, Music, A) Facts in school database result(Paul, Music, NULL) Missing information in the school database: - no grade for Paul (missing value) - no entry for Giulia (missing record) 5 Completeness of Queries over SQL Databases 31. 10. 2012
Consequence: Query answers are incorrect Query Q: ”How many pupils have grade A in Music? ” According to ideal database: According to available database: Q( result(Paul, Music, A) result(Giulia, Music, A) )=2 result(Paul, Q( Music, NULL) )=0 If data is incomplete, query answers become incorrect. 6 Completeness of Queries over SQL Databases 31. 10. 2012
Use Metadata to guarantee completeness!. . . vocational schools use Completeness cannot be checked by inspecting a the information system of the. . . primary schools database: province � One cannot see what is missing! took part in a survey to manage grades of music education However, we may know whether parts of a db are complete, e. g. , � “The � grades from vocational schools are complete” “The Music grades from primary schools are complete” Idea: Assess completeness of a query 7 Completeness of assertions Queries over SQL for Databases 31. 10. 2012 using completeness (parts of) tables
Reasoning about query completeness Grades from vocational schools are complete I want to know “How many pupils have grade A in Music? “ Can I trust the query answer? All Music grades from primary schools are complete Biology grades from high schools are complete Space of possible information Assertions about partial completeness 8 Completeness of Queries over SQL Databases You cannot, because information about pupils from high schools could be missing 31. 10. 2012
Reasoning about query completeness (2) Grades from vocational schools are complete I want to know “How many pupils at vocational schools have grade A in Music? “ Can I trust the query answer? All Music grades from primary schools are complete 1. Biology grades from high schools are Formalization: complete 2. Reasoning methods Space of possible information Assertions about partial • incomplete dbs completeness • assertions about db completeness 9 Completeness of Queries over SQL Databases You can, because all needed information is complete in the database 3. Implementation techniques [Demo today] 31. 10. 2012
Running example: Schema result(name, subject, grade) pupil(name, school. Name, school. Type) 10 Completeness of Queries over SQL Databases 31. 10. 2012
Formalization: Incomplete database When talking about incompleteness, we need a complete reference An incomplete database D is a pair of an ideal database Di and an available database Da D = (Di, Da) such that Di each record in Da is less informative than some record in 11 Completeness of Queries over SQL Databases 31. 10. 2012
Example: Incomplete database Di Da less informative than result(Paul, Music, A) result(Giulia, Music, A) pupil(Paul, Verdi, Voc) 12 result(Paul, Music, NULL) Completeness of Queries over SQL Databases 31. 10. 2012
Formalization: Query completeness [Motro 1989] 13 Completeness of Queries over SQL Databases 31. 10. 2012
Formalization: DB completeness � This is a tuplegenerating dependency (TGD) 14 Completeness of Queries over SQL Databases 31. 10. 2012
Example: DB completeness � 15 Di Da result(Paul, Music, A) result(Giulia, Music, A) pupil(Paul, Verdi, Voc) result(Paul, Music, NULL) Completeness of Queries over SQL Databases 31. 10. 2012
The reasoning problem I want to know “How many pupils at vocational schools have taken Music? “ Can I trust the query answer? Grades from vocational schools are complete All Music grades Table from primary schools are completeness complete statements C Biology grades from high schools are complete Query Q Space of possible information Assertions about partial completeness Does C imply Compl(Q)? 16 Completeness of Queries over SQL Databases 31. 10. 2012
Reasoning: The principle Query: “Pupils at vocational schools that took Music“ Qpupils(x): -result(x, Music, g), pupil(x, sn, Voc) 1. Assume Qpupils returns x‘ over Di 2. See which facts must be in Di … result(x‘, Music, g‘) pupil(x‘, sn‘, Voc) … 17 Completeness of Queries over SQL Databases 31. 10. 2012
Reasoning: The principle (2) � Di … result(x‘, Music, g‘) pupil(x‘, sn‘, Voc) … 18 Da … result(x‘, Music, NULL) pupil (x‘, sn‘, Voc) … Completeness of Queries over SQL Databases 31. 10. 2012
Reasoning: The principle (3) Da … result(x‘, Music, g‘) pupil (x‘, sn‘, Voc) … 4. Query the available database “Pupils at vocational schools that took Music“ Q(Da) = {x'} x' is also in Q(Da) Conclusion: Query Q is complete given the table completeness statements 19 Completeness of Queries over SQL Databases 31. 10. 2012
Reasoning summary Is that 1. Assume, Q returns a generic answer x‘ over Di unique? 2. 3. 4. 5. See which facts must be in Di Use table completeness to derive facts in Da Evaluate Q(Da) How to If x‘ is returned, the query is complete evaluate over databases with Reasoning is NP-complete for DBs without NULLs? [Razniewski/Nutt VLDB 2011] 20 Completeness of Queries over SQL Databases 31. 10. 2012
What is the Meaning of NULL? result(Paul, Pottery, NULL) � No grades were given in the Pottery course? Non-existing value � Paul � It received a grade, but the grade was not recorded? Unknown value is unknown, which of the two is the case? Ambiguous NULLs may indicate incomplete information, but need not Usage of NULLs is ambiguous 21 Completeness of Queries over SQL Databases 31. 10. 2012
Reasoning over databases w/ NULLs Qpupils(x): -result(x, Music, A), pupil(x, sn, Voc) “Pupils at vocational schools with A in Mu 1. Assume Qpupils returns x‘ over Di 2. See which facts must be in Di result(x‘, Music, A) pupil(x‘, sn‘ , Voc) result(x‘, Music, A) pupil(x‘, ? ? , Voc) 22 result(x‘, Music, A) pupil(x‘, NULL, Voc) Completeness of Queries over SQL Databases 31. 10. 2012
Challenge 1: How can we adapt the reasoning to NULLs? � 23 Completeness of Queries over SQL Databases 31. 10. 2012
Reasoning with NULLs: Complexity NULLs mean unknown inapplicable values 24 Queries w/o selfjoins PTIME Queries w/ selfjoins NP-complete PTIME Completeness of Queries over SQL Databases 31. 10. 2012
Challenge 2: How to compute answers of complete queries? Q(g) : - result(Paul, s, g) Da result(Paul, Pottery, NULL) result(Paul, Pottery, A) result(Paul, Music, B) Q(Da) = {NULL, A, B} ”All grades of Paul” Q is complete over (Di, Da) if NULL stands for a non-existing value {A, B} if NULL stands for an unknown value 25 Completeness of Queries over SQL Databases 31. 10. 2012
Reasoning with both kinds of NULLs Result: If a query is complete, tuples that contain unknown NULLs can be forgotten in the query answer 26 Completeness of Queries over SQL Databases 31. 10. 2012
Make different NULLs explicit Ambiguity can be resolved by boolean guards result name subject was. Graded grade Paul … yes B B Giulia … yes NULL Maria … no NULL Andre a … NULL Unknown Not applicable Unknown whether applicabl e Allows to count how many pupils received a grade (2 -3) In practice, boolean guards possibly already used where needed 27 Completeness of Queries over SQL Databases 31. 10. 2012
Outcome � 28 Completeness of Queries over SQL Databases 31. 10. 2012
Conclusion � Query completeness assessment is practically relevant � Reasoning � Demo 29 over SQL databases is possible at http: //magik-demo. inf. unibz. it/ Completeness of Queries over SQL Databases 31. 10. 2012
Questions? 30 Completeness of Queries over SQL Databases 31. 10. 2012
- Slides: 30