CS 422 Principles of Database Systems Introduction to

CS 422 Principles of Database Systems Introduction to Query Processing Chengyu Sun California State University, Los Angeles

DBMS Client • • • psql pg. Admin php. Pg. Admin user applications … Server DB SQL DB Results DB

Simple. DB Developed by Edward Sciore A simplified DBMS for educational purpose Source code http: //csns. calstatela. edu/wiki/content/c ysun/course_materials/cs 422/simpledb

Simple. DB Basics Server n n Server class: simpledb. server. Simple. DB Startup program: simpledb. server. Startup Clients n simpledb. client. SQLInterpreter n Some other programs

Query Processing Departments( d. Id, d. Name ) Students( s. Id, s. Name, major. Id ) select s. Name, d. Name from Students, Departments where major. Id = d. Id and s. Id = 1; What happens in a DBMS server when we run a query?

Query Processing in Simple. DB SQL Lexer Tokens Parser Result. Set Execution Query Planner

Query Parsing Analyze the query string and convert it into some data structure that can be used for query execution

Lexical Analysis Split the input string into a series of tokens select sname from students where sid = 1 keyword identifier delimiter constant

Token <type, value> Type Value keyword select identifier sname keyword from identifier students keyword where identifier id delimiter = intconstant 1

Simple. DB Token Types Single-character delimiter Integer constants String constants Keywords Identifiers

Simple. DB Lexer Implementation Java Stream. Tokenizer Number Word Quoted String Single-character Token Simple. DB Lexer Integer Keyword Identifier String Single-character Delimiter Example: Stream. Tokenizer. Test

Lexer API … The API used by the parser Iterate through the tokens n Check the current token – “Match” w match. Keyword(), match. Int. Constant() … n Consume the current token – “Eat” w eat. Keyword(), eat. Int. Constant() …

… Lexer API select sname from students where sid = 1 lexer. match. Keyword(“select”); lexer. eat. Keyword(“select”); current token select sname from students where sid = 1 current token

Syntax A set of rules that describes the strings that could possibly be meaningful statements Example: a syntactically wrong statement select from a and b where c – 3;

Part of Simple. DB Grammar … <Field> <Constant> <Expression> <Term> <Predicate> : = Id. Tok : = Str. Tok | Int. Tok : = <Field> | <Constant> : = <Expression> : = <Term> [ AND <Predicate> ] Backus-Naur Form (BNF)

… Part of Simple. DB Grammar <Query> : = SELECT <Select. List> FROM <Table. List> [ WHERE <Predicate> ] <Select. List> : = <Field> [ , <Select. List>] <Table. List> : = Id. Tok [ , <Table. List> ] <Create. Table> : = CREATE TABLE Id. Tok ( <Field. Defs> ) <Field. Defs> : = <Field. Def> [ , <Field. Defs> ] <Field. Def> : = Id. Tok <Type. Def> : = INT | VARCHAR ( Int. Tok ) Full Simple. DB Grammar in Textbook Figure 18 -4
![Recursive Definition in Grammar <Select. List> : = <Field> [ , <Select. List>] select Recursive Definition in Grammar <Select. List> : = <Field> [ , <Select. List>] select](http://slidetodoc.com/presentation_image_h2/a892eabe06e58d31a9b8b15286a2c142/image-17.jpg)
Recursive Definition in Grammar <Select. List> : = <Field> [ , <Select. List>] select a, b, c from t where x = 10; <Select. List> ? ?

Using Grammar Which of the following are valid Simple. DB SQL statements? ? create table students (id integer, name varchar(10)) select * from students;

From Grammar to Code … public Query. Data query() { lex. eat. Keyword( “select” ); Collection<String> fields = select. List(); lex. eat. Keyword( “from” ); Collection<String> tables = table. List(); Predicate pred = new Predicate(); if( lex. match. Keyword(“where”) ) { lex. eat. Keyword(“where”); pred = predicate(); } return new Query. Data( fields, tables, pred ); }

… From Grammar to Code public Collection<String> select. List() { Collection<String> L = new Array. List<String>(); L. add( field() ); if( lex. match. Delim(‘, ’) ) { lex. eat. Delim(‘, ’); L. add. All( select. List() ); } return L; } public String field() { return lex. eat. Id(); }

After Parsing select s. Name, d. Name from Students, Departments where major. Id = d. Id and s. Id = 1; Query. Data in Simple. DB: fields { “s. Name”, “d. Name” } tables { “Students”, “Departments” } pred { {“major. Id”, “d. Id”}, {“s. Id”, 1} More in the simpledb. parse package.

Query Planning Break a query into individual operations, and organize them into certain order, i. e. a query plan.

Relational Algebra Operations Selection, projection, product Join Rename Set operations: union, intersection, difference Extended Relation Algebra operations n n Duplicate elimination Sorting Extended projection, outer join Aggregation and grouping

Selection Input sid sname 1 Joe 2 Amy Output sid=1 sid 1 sname Joe

Projection Input sid sname Output sname 1 Joe 2 Amy

Product sid Input sname dept did dname 1 Joe 10 10 CS 2 Amy 20 20 Math sid sname Output dept did dname 1 Joe 10 10 CS 1 Joe 10 20 Math 2 Amy 20 10 CS 2 Amy 20 20 Math

Scan A scan is an interface to a RA operation implementation public interface Scan { public boolean next(); // move to the next result public int get. Int( String field. Name ); public String get. String( String field. Name ); }

Scan Example: Table. Scan public Table. Scan( Table. Info ti, Transaction tx ) { record. File = new Record. File( ti, tx ); } public boolean next() { return record. File. next(); } public int get. Int( String field. Name ) { return record. File. get. Int( field. Name ); } public int get. String( String field. Name ) { return record. File. get. String( field. Name ); }

Scan Example: Select. Scan public Select. Scan( Scan s, Predicate pred ) { this. s = s; this. pred = pred; } public boolean next() { while( s. next() ) if( pred. is. Satisfied(s) ) return true; return false; }

Query Execution select name from students where id = 1; Result next() Project. Scan (name) Select. Scan(sid=1) Table. Scan(students)

About Implementations of RA Operations Each RA operation can be implemented and optimized independently from others A RA operation may have multiple implementations n E. g. table scan vs. index scan for selection The efficiency of an implementation depends on the characteristics of the data

A Query Plan select s. Name, d. Name from Students, Departments where major. Id = d. Id and s. Id = 1; Projection: {s. Name, d. Name} Selection: major. Id=d. Id and s. Id=1 Product Students Departments

A Better Query Plan – Query Optimization Projection: {s. Name, d. Name} Selection: major. Id=d. Id Product Selection: s. Id=1 Students Departments

Readings Textbook Chapter 17, 18, 19
- Slides: 34