CQL – a Common Query Language
Mike Taylor <mike@indexdata.com>

1. What CQL is
2. Motivation
3. Examples and explanation
4. Applications
5. Implementation

Chapter 1: What CQL is
● CQL is a query language:
  – For humans to type
  – For query forms to generate
  – For translating other languages into
● The only query language of SRW/SRU
● Also applicable in other contexts:
  – Z39.50 (instead of the Type-1 Query)
  – Vendor-neutral format for Metasearch

Specifications and implementations
● CQL is a specification for expressing queries abstractly.
  – you don't need to know the database schema.
● It has to be parsed by a CQL parser.
  – the parser produces a form that is easy to program with.
● It has to be executed by some specific database engine.
  – implementations will vary in what they support.

Chapter 2: Motivation
Most query languages fall into one of two camps:
● Complex and powerful, but cryptic and hard to learn
  – SQL, Prefix Query Format (PQF), XML Query
● Easy to learn and use, but lacking in power
  – Google, AltaVista, CCL
CQL aims to “make simple queries easy, and complex queries possible” (to paraphrase Larry Wall, of Perl)

Learning curves for query languages
[Figure: effort in learning the query language plotted against the power of query that can be expressed, with curves for SQL, Google and CQL.]

Chapter 3: Examples and explanation

CQL features: simple terms
Here are some perfectly good CQL queries:
● fish
● Churchill
● dinosaur
● comp.sources.misc

CQL features: quoting
Double-quote marks remove the special meanings of special characters like space (which otherwise separates tokens) and of keywords such as “and” and “or”.
● "dinosaur"
● "the complete dinosaur"
● "ext->u.generic"
● "and"
(Backslash removes the special meaning of following double-quote characters.)
● "the \"nuxi\" problem"
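
The quoting rules above can be sketched as a small tokenizer. This is a minimal illustration in Python, not how any particular CQL parser works; the function name tokenize and the ("word"/"string", text) token shape are assumptions made for this example.

```python
def tokenize(query):
    """Split a query into ("word", text) and ("string", text) tokens.

    Hypothetical helper: quoted terms keep their spaces and are marked
    "string", so a quoted "and" is never mistaken for the keyword.
    """
    tokens = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens
        elif ch == '"':
            i += 1  # skip the opening quote
            buf = []
            while i < n and query[i] != '"':
                # A backslash before a double quote makes the quote literal.
                if query[i] == "\\" and i + 1 < n and query[i + 1] == '"':
                    i += 1
                buf.append(query[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted term")
            i += 1  # skip the closing quote
            tokens.append(("string", "".join(buf)))
        else:
            start = i
            while i < n and not query[i].isspace() and query[i] != '"':
                i += 1
            tokens.append(("word", query[start:i]))
    return tokens
```

A later parsing stage would treat only "word" tokens as candidate keywords, which is what lets "and" be searched for as an ordinary term.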

CQL features: booleans
The keywords “and” and “or” are boolean operators. The keyword “not” is an and-not binary operator. There is no unary negation operator. Case is not significant, so “AND” and “aNd” also work.
● dinosaur or bird
● dinosaur not reptile
● dinosaur and bird and reptile
● dinosaur and bird or dinobird
● dinosaur not theropod not ornithischian

CQL features: boolean precedence
The “and”, “or” and “not” booleans all have equal precedence and are evaluated left-to-right.
● dinosaur and bird or dinobird
  MEANS (dinosaur and bird) or dinobird
● dinosaur or bird and dinobird
  MEANS (dinosaur or bird) and dinobird
  NOT dinosaur or (bird and dinobird)

CQL features: parentheses
Parentheses may be used to override the default left-to-right parsing of boolean operators.
● dinosaur and (bird or dinobird)
● dinosaur or (bird and dinobird)
● (bird or dinosaur) and (feathers or scales)
● "feathered dinosaur" and (yixian or jehol)
● (((a and b) or (c not d) not (e or f and g)) and h not i) or j
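
Equal precedence, left-to-right evaluation and parenthesised grouping can be illustrated with a short recursive-descent parser. This is a sketch under simplifying assumptions, not one of the implementations listed later: it handles only bare or quoted terms, the three booleans and parentheses, and the function name parse and the nested-tuple tree shape are invented for the example.

```python
import re


def parse(query):
    """Parse a boolean query into nested (operator, left, right) tuples."""
    # Tokens: quoted strings (kept with their quotes), parentheses, bare words.
    tokens = re.findall(r'"[^"]*"|[()]|[^\s()"]+', query)
    pos = 0

    def parse_query():
        nonlocal pos
        tree = parse_operand()
        # All three booleans share one precedence level, so each new
        # operator simply wraps everything parsed so far (left-to-right).
        while pos < len(tokens) and tokens[pos].lower() in ("and", "or", "not"):
            op = tokens[pos].lower()  # case is not significant
            pos += 1
            tree = (op, tree, parse_operand())
        return tree

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            tree = parse_query()  # a parenthesised sub-query is one operand
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return tree
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    tree = parse_query()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With this sketch, parse("dinosaur or bird and dinobird") groups the first two terms together, while parse("dinosaur or (bird and dinobird)") keeps the parenthesised pair as a single right-hand operand.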

CQL features: pattern matching
There are two pattern-matching characters:
  * matches any number of characters
  ? matches any single character
● dinosaur* – matches “dinosaurs”, “dinosauria”
● *sauria – matches “dinosauria”, “carnosauria”
● man?raptor – matches “maniraptor”, “manuraptor”
● man?raptor* – matches the plurals of these
● "comp* *saur" – matches “complete dinosaur”
A preceding backslash removes their special meaning.
● char\* – matches literal “char*”
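
One way to picture the two pattern characters is as a translation into a regular expression. The sketch below uses Python's standard re module; the function name cql_pattern_to_regex is an assumption for this example, and a real database engine would more likely use its own index structures than regular expressions.

```python
import re


def cql_pattern_to_regex(pattern):
    """Translate * and ? into a compiled regex; backslash escapes them."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # The character after a backslash is taken literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")  # any number of characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))
```

Using fullmatch on the compiled pattern checks a whole candidate term, for example cql_pattern_to_regex("man?raptor").fullmatch("maniraptor").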

CQL features: indexes
A term of the form name=value is a query for the specified value occurring within the named index.
● title=Churchill – finds biographies of Churchill
● author=Churchill – finds books written by him
● title=dinosaur and author=farlow
● title=(dinosaur and bird)
● subject=(dinosaur* or pterosaur*)
Index names are case-insensitive, so “title” is the same index as “TITLE”, “Title” or “tiTLe”.
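
As a rough illustration of name=value terms, the sketch below splits a single clause and tests it against a record held as a Python dict. Everything here is an assumption made for the example: the helper names, the record layout with lower-case keys, the case-folded word matching, and the choice to search every field when no index is given. Parenthesised values such as title=(dinosaur and bird) are not handled.

```python
def split_index_term(clause):
    """Split 'name=value' into (index, value); a bare term has index None."""
    name, sep, value = clause.partition("=")
    if not sep:
        return (None, clause.strip())
    # Index names are case-insensitive, so fold them to lower case.
    return (name.strip().lower(), value.strip())


def matches(record, clause):
    """Return True if the value occurs as a word in the named field."""
    index, value = split_index_term(clause)
    # Assumption: a term with no index is tried against every field.
    fields = record.values() if index is None else [record.get(index, "")]
    return any(value.lower() in field.lower().split() for field in fields)
```

For a record such as {"title": "Churchill a life", "author": "Gilbert"}, the clause TITLE=Churchill matches and author=Churchill does not.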

CQL features: prefixes
The meaning of an index can be specified more fully by a prefix indicating what context set it is from. The meaning of “title” is different in cross-domain searching (Dublin Core), bibliographic searching (Bath Profile) and heraldry.
● dc.title="the complete dinosaur"
● property.title=freehold
● heraldry.title=(viscount or duke)
● cql.serverChoice=fruit
● cql.resultSet=YXJjaGJpc2hvcAp
Prefixes are case-insensitive.

CQL features: context sets
A context set is a set of indexes that are related to a particular area (plus some other more esoteric stuff that you can ignore).
For example, the Dublin Core context set contains indexes for searching against the fifteen DC elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.
The prose that defines a context set must define the semantics of its indexes.
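
A context set can be pictured as a named collection of index names, with the prefix selecting which collection to look in. The sketch below is illustrative only: the CONTEXT_SETS table and the resolve_index function are hypothetical names, and only the Dublin Core set is filled in, using the fifteen elements listed above.

```python
# The fifteen Dublin Core elements, as listed on the slide.
DUBLIN_CORE = frozenset(
    "title creator subject description publisher contributor date type "
    "format identifier source language relation coverage rights".split()
)

# Hypothetical registry mapping a prefix to the indexes of its context set.
CONTEXT_SETS = {"dc": DUBLIN_CORE}


def resolve_index(qualified_name):
    """Split 'prefix.index' and check the index exists in that context set."""
    # Prefixes and index names are both case-insensitive.
    prefix, dot, name = qualified_name.lower().partition(".")
    if not dot:
        raise ValueError(f"no prefix in {qualified_name!r}")
    if prefix not in CONTEXT_SETS:
        raise KeyError(f"unknown context set {prefix!r}")
    if name not in CONTEXT_SETS[prefix]:
        raise KeyError(f"context set {prefix!r} has no index {name!r}")
    return (prefix, name)
```

Here resolve_index("DC.Title") gives ("dc", "title"), while an index that the set does not define raises KeyError.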

CQL features: some context sets
A few core sets created by the SRW editorial board:
● CQL – for core indexes such as resultSetId
● DC – for metadata searching with Dublin Core
● Rec – metadata about the record, not the resource
● Net – network concepts such as host-name and port
Also, many application-specific sets:
● Bath, Zthes, CCG, Music
● Rel – deep voodoo for relevance matching
● GILS and GEO are in development

A digression on the CQL context set
The CQL context set is special. It contains some “magic” indexes:
● cql.anywhere – searches in all the indexes available
● cql.serverChoice – allows the server to choose whatever index or indexes are suitable
● cql.resultSetId – finds the records obtained in a previous search, e.g. for refinement by combining with other query terms.

CQL features: relations
Usually “=” is the relation that connects an index with its term, but all the other obvious numeric relations are supported:
● Height = 13
● numberOfWheels <= 3
● numberOfPlates = 18
● lengthOfFemur > 2.4
● BioMass >= 100
● NumberOfToes <> 3 (inequality)
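
The numeric relations map naturally onto ordinary comparison functions. The sketch below uses Python's standard operator module; the RELATIONS table, the evaluate function and the dict-based record are assumptions for illustration, and index-name case folding is left out for brevity.

```python
import operator

# Relation symbol -> comparison function. "<>" is inequality.
RELATIONS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def evaluate(record, index, relation, value):
    """Apply one 'index relation value' clause to a record (a plain dict)."""
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation {relation!r}")
    return RELATIONS[relation](record[index], value)
```

For example, with record = {"numberOfWheels": 3}, the call evaluate(record, "numberOfWheels", "<=", 3) is true and evaluate(record, "numberOfWheels", "<>", 3) is false.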

CQL features: special relations
The keywords “any” and “all” can be used as relations, indicating that any one of, or all of, the words specified in the term must be found in the index:
● author all "kernighan ritchie"
  – shorthand for author=kernighan and author=ritchie
● author any "kernighan ritchie thompson"
  – shorthand for author=kernighan or author=ritchie or author=thompson
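
The shorthand can be made concrete by expanding an “any” or “all” clause into the equivalent boolean query. This is a minimal sketch; the function name expand is an assumption, and it splits the term on whitespace only.

```python
def expand(index, relation, term):
    """Rewrite an 'any'/'all' clause as a chain of index=word clauses."""
    joiners = {"all": " and ", "any": " or "}
    key = relation.lower()  # keywords are not case-sensitive
    if key not in joiners:
        raise ValueError(f"expected 'any' or 'all', got {relation!r}")
    return joiners[key].join(f"{index}={word}" for word in term.split())
```

Calling expand("author", "all", "kernighan ritchie") reproduces the first expansion shown above.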

CQL features: whole-field searching
The keyword “exact” can be used as a relation, indicating a search for the value of a whole field rather than words within it:
● title=jaws – finds Jaws and The Jaws of Fate.
● title exact jaws – finds Jaws but NOT The Jaws of Fate.
● title exact "The Jaws of Fate" – finds The Jaws of Fate but NOT Jaws.
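
The difference between “=” and “exact” can be shown with two tiny matching functions. This is a sketch, assuming case-folded comparison (as the jaws/Jaws examples suggest) and a single-word term for the “=” case; the function names are invented for the example.

```python
def matches_word(field, term):
    """'=' style: the term occurs as one of the words in the field."""
    return term.lower() in field.lower().split()


def matches_exact(field, term):
    """'exact' style: the term is the value of the whole field."""
    return field.lower() == term.lower()
```

Applied to the titles above, matches_word accepts both "Jaws" and "The Jaws of Fate" for the term jaws, while matches_exact accepts only "Jaws".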

Chapter 4: Applications
CQL has been deployed in many kinds of application:
● Google-like structureless searching
● Simple metadata searching with the Dublin Core
● Bath Profile for bibliographic data
● Zthes profile for hierarchical thesaurus navigation
● CCG for collectable card games
● Music – musicalKey, arranger, duration, etc.
● GILS (Global Information Locator Service)
● ... your application goes here!

Chapter 5: Implementations
There are good-quality free CQL implementations in several important languages:
● Java (Mike Taylor's CQL-Java package)
● C/C++ (Adam Dickmeiss in Index Data's YAZ)
● Python (Rob Sanderson in Cheshire)
● Perl (Ed Summers' CQL::Parser module)
● Visual Basic is in development (Thomas Habing)
● ... your language goes here!

Conclusion: What to take home
● CQL makes easy queries easy and hard ones possible
● You can use it well without learning the hard bits
● It is used in SRW/SRU but also applicable elsewhere
● It is extensible through context sets
● Existing context sets support lots of applications
● There are free implementations in several languages
● Tutorial on-line at: http://zing.z3950.org/cql/intro.html