CQL Common Query Language Ray Denenberg March 2005
CQL’s Goals
• Combine the simplicity and intuitiveness of Google searching with the expressive power of XQuery.
• Support very simple queries, and arbitrarily complex expressions as necessary.
• Example: search on “cat”
• cat (That’s it. The whole query.)
Simple CQL Queries
• cat (simplest)
• cat and dog (simple boolean)
• title = cat (index)
• dc.title = cat (index qualified)
Boolean
• cat and dog
• cat or dog
• cat not dog
• cat not dog and fish or frog
Boolean
• cat not dog and fish or frog
• evaluates to: (((cat not dog) and fish) or frog)
• not: (cat not dog) and (fish or frog)
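The left-to-right evaluation shown above can be sketched in a few lines of Python. This is only an illustration, not a CQL parser: the function name `group_left_to_right` is hypothetical, and it assumes single-word terms with no parentheses or quoting in the input.

```python
BOOLEANS = {"and", "or", "not", "prox"}

def group_left_to_right(query: str) -> str:
    """Parenthesize a flat boolean query, folding operators left to right."""
    tokens = query.split()
    if len(tokens) % 2 == 0:
        raise ValueError("expected: term (operator term)*")
    grouped = tokens[0]
    # Tokens alternate: term, operator, term, operator, term, ...
    for i in range(1, len(tokens), 2):
        operator, term = tokens[i], tokens[i + 1]
        if operator.lower() not in BOOLEANS:
            raise ValueError(f"not a boolean operator: {operator}")
        grouped = f"({grouped} {operator} {term})"
    return grouped

print(group_left_to_right("cat not dog and fish or frog"))
# (((cat not dog) and fish) or frog)
```

No operator binds tighter than another here, which is why the result differs from (cat not dog) and (fish or frog).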
Index Search
• title = cat
Qualified Index
• title = cat
• dc.title = cat
• bib.title = cat
• bath.keyTitle
Fielded/Index Search
• dc.title = cat
• bib.title = cat
• dc.title: A name given to the resource.
• bib.title (fictitious): A word, phrase, character, or group of characters, normally appearing in an item, that names the item or the work contained in it.
Zthes Indexes
• zthes.nt=sauropod and zthes.bt=macronaria
• (narrower than sauropod but broader than macronaria)
Relations
• The triple <index> <relation> <search term> (e.g. title = cat) is called a search clause.
Simple Relations
• title = "the complete dinosaur"
• title all "complete dinosaur"
• title any "dinosaur bird reptile"
• title exact "the complete dinosaur"
The = relation
• title = "the complete dinosaur" (find these three words, adjacent and in this order)
• matches “a day in the life of the complete dinosaur” and “the complete dinosaur goes to Paris”
• but not “the complete and unabridged dinosaur”
All
• title all "complete dinosaur" matches “the complete and unabridged dinosaur”
• does not match “the unabridged dinosaur”
• title all "dinosaur bird reptile" does not match “the complete dinosaur”
Any
• title any "dinosaur bird reptile" does match “the complete dinosaur” and “the unabridged dinosaur”
Exact
• title exact "the complete dinosaur" matches “the complete dinosaur”
• does not match “a day in the life of the complete dinosaur”, “the complete dinosaur goes to Paris”, or “the complete and unabridged dinosaur”
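The four relations can be sketched as small Python predicates over word lists. This is a rough model under stated assumptions: words are split on whitespace and compared case-insensitively, which the slides imply but do not specify, and all function names are hypothetical.

```python
def words(text: str) -> list[str]:
    # Assumption: a "word" is a whitespace-separated, lowercased token.
    return text.lower().split()

def matches_adjacent(term: str, field: str) -> bool:
    """The = relation: all term words, adjacent and in order."""
    t, f = words(term), words(field)
    return any(f[i:i + len(t)] == t for i in range(len(f) - len(t) + 1))

def matches_all(term: str, field: str) -> bool:
    """The all relation: every term word appears somewhere."""
    return set(words(term)) <= set(words(field))

def matches_any(term: str, field: str) -> bool:
    """The any relation: at least one term word appears."""
    return bool(set(words(term)) & set(words(field)))

def matches_exact(term: str, field: str) -> bool:
    """The exact relation: the field is the term, nothing more."""
    return words(term) == words(field)

title = "the complete and unabridged dinosaur"
print(matches_adjacent("the complete dinosaur", title))  # False
print(matches_all("complete dinosaur", title))           # True
```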
Relations: observations
• Observation 1: Shorthand
• title all "old man sea" is the same as title="old" and title="man" and title="sea"
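The shorthand can be spelled out mechanically. The helper below is a hypothetical illustration that rewrites an all clause into the equivalent chain of = clauses; it assumes the term is a plain list of whitespace-separated words.

```python
def expand_all(index: str, term: str) -> str:
    """Rewrite `index all "w1 w2 ..."` as `index="w1" and index="w2" ...`."""
    return " and ".join(f'{index}="{word}"' for word in term.split())

print(expand_all("title", "old man sea"))
# title="old" and title="man" and title="sea"
```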
Relations: observations
• Observation 2: Anchoring
• ^ is the anchor character
Recall
• title = "the complete dinosaur" matches “a day in the life of the complete dinosaur”
Anchoring
• title="^the complete dinosaur" would not match “a day in the life of the complete dinosaur”
• title="the complete dinosaur^" would not match “the complete dinosaur goes to Paris”
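A minimal sketch of anchored phrase matching for the = relation, assuming whitespace-separated, case-insensitive words. The function name `anchored_phrase_matches` is hypothetical; a leading ^ pins the phrase to the start of the field and a trailing ^ pins it to the end.

```python
def anchored_phrase_matches(term: str, field: str) -> bool:
    """Adjacent-phrase match with optional ^ anchors at either end of the term."""
    at_start = term.startswith("^")
    at_end = term.endswith("^")
    phrase = term.strip("^").lower().split()
    words = field.lower().split()
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] != phrase:
            continue
        if at_start and i != 0:
            continue  # phrase found, but not at the beginning
        if at_end and i + n != len(words):
            continue  # phrase found, but not at the end
        return True
    return False

print(anchored_phrase_matches("^the complete dinosaur",
                              "a day in the life of the complete dinosaur"))
# False
```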
Relations: observations
• Observation 3: Index and relation go together
Index and Relation go together
• Valid: cat
• Valid: title = cat
• Not valid: title cat
• Not valid: = cat
BNF
searchClause ::= '(' cqlQuery ')' | index relation searchTerm | searchTerm
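The last two alternatives of this production can be checked with a short Python sketch. It assumes the clause has already been split into tokens, omits the parenthesized cqlQuery branch, and uses hypothetical names (`parse_search_clause`, `is_valid_clause`) and a relation list taken from the relations named in these slides.

```python
RELATIONS = {"=", "<", ">", "<=", ">=", "<>", "all", "any", "exact"}

def parse_search_clause(tokens: list[str]) -> tuple:
    """Return (index, relation, term); index and relation are None for a bare term."""
    if len(tokens) == 1:
        return (None, None, tokens[0])
    # A relation may carry modifiers, e.g. "any/relevant"; check the base name only.
    if len(tokens) == 3 and tokens[1].split("/")[0].lower() in RELATIONS:
        return (tokens[0], tokens[1], tokens[2])
    raise ValueError("expected: index relation searchTerm | searchTerm")

def is_valid_clause(tokens: list[str]) -> bool:
    try:
        parse_search_clause(tokens)
        return True
    except ValueError:
        return False

print(is_valid_clause(["title", "=", "cat"]))  # True
print(is_valid_clause(["title", "cat"]))       # False
print(is_valid_clause(["=", "cat"]))           # False
```

The two-token forms fail because the grammar offers no alternative with an index but no relation, or a relation but no index.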
Basic Relations: summary
• title = "the complete dinosaur"
• title all "complete dinosaur"
• title any "dinosaur bird reptile"
• title exact "the complete dinosaur"
A few more relations …
• <   less
• >   greater
• <=  less or equal
• >=  greater or equal
• =   (see next)
• <>  not equal
= relation
• = means:
  • word adjacency, when the term is a list of words.
  • equality, otherwise.
Relation Modifiers
• stem
• relevant
• fuzzy
• phonetic
Stemming
• title =/stem "these completed dinosaurs" matches “The Complete Dinosaur”.
Relevance
• subject any/relevant "fish frog" would find records whose subject field included words like shark, tuna, coelacanth, toad, amphibian, etc.
Fuzzy
• Fuzzy means: “be liberal in what you count as a match … details left to the server. Might include permutations of character order, off-by-one for numerical terms.”
• title =/fuzzy "sharlot simmins" might match “I am Charlotte Simmons”
• telephoneNumber exact/fuzzy "303 441 1319"
Phonetic
• Match words that sound the same, e.g. “hostel” might match “hostile”
Booleans
• And: cat and dog
• Or: cat or dog
• Not: cat not dog
• Proximity: cat prox dog (roughly: “find cat near dog”)
Proximity
(chestnut prox "Cryphonectaria parasitica") prox ("dutch elm" prox Ceratocystisulmi)
Proximity parameters
• relation ("<", ">", "<=", ">=", "<>"; default "<=")
• distance (integer; default: 1 for word, zero otherwise)
• unit ("word", "sentence", "paragraph", or "element"; default "word")
• ordering ("ordered" or "unordered"; default "unordered")
e.g. “Find cat in the same sentence as dog”
• Relation: less or equal
• Distance: 0
• Unit: sentence
• Ordering: unordered
“Find cat in the same sentence as dog”
cat prox//sentence dog
same as: cat prox/<=/0/sentence/unordered dog
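The defaulting rules for the four proximity parameters can be sketched as a function that builds the fully spelled-out operator. The name `full_prox` and its keyword arguments are hypothetical; the defaults are the ones listed in the parameter summary above. The sketch deliberately takes named arguments rather than parsing the slash-separated shorthand.

```python
def full_prox(relation=None, distance=None, unit=None, ordering=None) -> str:
    """Fill in the documented defaults and return the explicit prox operator."""
    unit = unit or "word"
    relation = relation or "<="
    if distance is None:
        # Default distance depends on the unit: 1 for word, zero otherwise.
        distance = 1 if unit == "word" else 0
    ordering = ordering or "unordered"
    return f"prox/{relation}/{distance}/{unit}/{ordering}"

print(full_prox(unit="sentence"))
# prox/<=/0/sentence/unordered
print(full_prox())
# prox/<=/1/word/unordered
```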
(chestnut prox//sentence "Cryphonectaria parasitica") prox//paragraph ("dutch elm" prox//sentence Ceratocystisulmi)
(Find chestnut in the same sentence as “Cryphonectaria parasitica”, and “dutch elm” in the same sentence as Ceratocystisulmi, and both sentences in the same paragraph.)
(chestnut prox//paragraph "Cryphonectaria parasitica") and ("dutch elm" prox//paragraph Ceratocystisulmi)
(Find chestnut in the same paragraph as “Cryphonectaria parasitica”, and “dutch elm” in the same paragraph as Ceratocystisulmi.)
cat prox/>/2//ordered hat
• retrieves “cat in the hat”
• but not “cat in hat” nor “hat on the cat”
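A word-level proximity check can be sketched as follows. It assumes whitespace-separated, case-insensitive words and measures distance as the difference in word positions, so adjacent words are at distance 1 (consistent with the default distance of 1 for the word unit). The function name `word_prox` is hypothetical, and other units (sentence, paragraph, element) are not modeled.

```python
import operator

COMPARE = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
           ">=": operator.ge, "<>": operator.ne}

def word_prox(text, left, right, relation="<=", distance=1, ordered=False):
    """True if some occurrence of `left` and `right` satisfies the distance test."""
    tokens = text.lower().split()
    compare = COMPARE[relation]
    for i, a in enumerate(tokens):
        if a != left:
            continue
        for j, b in enumerate(tokens):
            if b != right or i == j:
                continue
            if ordered and j < i:
                continue  # ordered: `left` must come before `right`
            if compare(abs(j - i), distance):
                return True
    return False

print(word_prox("cat in the hat", "cat", "hat", ">", 2, ordered=True))  # True
print(word_prox("cat in hat", "cat", "hat", ">", 2, ordered=True))      # False
print(word_prox("hat on the cat", "cat", "hat", ">", 2, ordered=True))  # False
```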
Pattern Matching
• ? matches any single character
  • c?t matches cat, cot, cut, but not coat or ct.
  • c??t matches cart, but not cat or crypt.
• * matches any sequence of zero or more characters
  • c*t matches cat, coat, crypt and counterargument.
• ^ word-anchoring
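One way to implement the ? and * masks is to translate them into a regular expression. The sketch below uses Python's standard `re` module; the helper names are hypothetical, and backslash escapes and the ^ anchor are not handled here.

```python
import re

def mask_to_regex(mask: str) -> re.Pattern:
    """Translate a CQL-style mask into a compiled regular expression."""
    pieces = []
    for ch in mask:
        if ch == "?":
            pieces.append(".")      # exactly one character
        elif ch == "*":
            pieces.append(".*")     # zero or more characters
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))

def mask_matches(mask: str, word: str) -> bool:
    # fullmatch: the mask must cover the whole word, not a substring.
    return mask_to_regex(mask).fullmatch(word) is not None

print(mask_matches("c?t", "cot"))    # True
print(mask_matches("c?t", "coat"))   # False
print(mask_matches("c*t", "crypt"))  # True
```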
Word Anchoring
• title="^the complete dinosaur"
  • matches “the complete dinosaur meets godzilla”
  • but not “a day in the life of the complete dinosaur”
• title="the complete dinosaur^"
  • matches “a day in the life of the complete dinosaur”
  • but not “the complete dinosaur meets godzilla”
Word Anchoring - any
• title any "^cat ^dog rat"
  • means title with cat at the beginning, or with dog at the beginning, or with rat anywhere.
• matches
  • 'cat eats dog'
  • 'dog eats hat'
  • 'hat eats rat'
• but not
  • 'hat eats dog'
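Anchors combine with the any relation word by word. The following sketch assumes whitespace-separated, case-insensitive words; the function name `any_with_anchors` is hypothetical.

```python
def any_with_anchors(term: str, field: str) -> bool:
    """The any relation, where each term word may carry its own ^ anchor."""
    field_words = field.lower().split()
    if not field_words:
        return False
    for item in term.lower().split():
        if item.startswith("^"):
            if field_words[0] == item[1:]:    # must be the first word
                return True
        elif item.endswith("^"):
            if field_words[-1] == item[:-1]:  # must be the last word
                return True
        elif item in field_words:             # unanchored: anywhere
            return True
    return False

for title in ["cat eats dog", "dog eats hat", "hat eats rat", "hat eats dog"]:
    print(title, any_with_anchors("^cat ^dog rat", title))
# cat eats dog True
# dog eats hat True
# hat eats rat True
# hat eats dog False
```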
CQL Syntax
• Reserved words: and, or, not, prox
• Special characters: space ( ) = < > " /
Tokens
• A string that has no special characters; or
• Any string at all enclosed by double quotes (except that the string cannot include a double quote, unless escaped).
Escape Character
• Backslash (\) escapes '*', '?', '"' and '^', as well as itself.
• "\"why not?\" she said" results in the following token: "why not?" she said
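The token and escape rules can be sketched as a small hand-written tokenizer. This is a simplified illustration with a hypothetical function name: it drops the backslash from an escaped character, so it no longer distinguishes an escaped \* from a wildcard *, and it does not record whether a token was quoted.

```python
SPECIAL = set(' ()=<>"/')

def tokenize(query: str) -> list[str]:
    """Split a query into tokens using the special characters and quoting rules."""
    tokens, i = [], 0
    while i < len(query):
        ch = query[i]
        if ch == " ":
            i += 1
        elif ch == '"':
            i += 1
            chars = []
            while i < len(query) and query[i] != '"':
                if query[i] == "\\" and i + 1 < len(query):
                    i += 1  # skip the backslash, keep the escaped character
                chars.append(query[i])
                i += 1
            if i >= len(query):
                raise ValueError("unterminated quoted string")
            i += 1  # closing quote
            tokens.append("".join(chars))
        elif ch in "<>=":
            width = 2 if query[i:i + 2] in ("<=", ">=", "<>") else 1
            tokens.append(query[i:i + width])
            i += width
        elif ch in "()/":
            tokens.append(ch)
            i += 1
        else:
            start = i
            while i < len(query) and query[i] not in SPECIAL:
                i += 1
            tokens.append(query[start:i])
    return tokens

print(tokenize(r'"\"why not?\" she said"'))
# ['"why not?" she said']
print(tokenize('dc.title = cat'))
# ['dc.title', '=', 'cat']
```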
Context sets
• Indexes
• Relation modifiers
• Boolean modifiers
subject any/relevant "fish frog"
• index: subject
• relation: any
• relation modifier: relevant
• search term: "fish frog"
• The index, relation, and relation modifier are subject to context qualification.
dc.subject any/relevant "fish frog"
• context set: dc
dc.subject any/rel.lr "fish frog"
• context set: rel
• lr: a specific relevance algorithm
dc.subject cql.any/rel.lr "fish frog"
• context set: cql
Example: fictitious relation “only”
• depicts only "cat" (index: depicts; relation: only)
• Matching images would depict only a cat and nothing else. The same cat with a person would not match.
• image.depicts image.only "cat" (image is the context set for the index and for the relation)
Go back to: subject any/relevant "fish frog"
• subject any/relevant "fish frog" or title any/relevant "cat dog"
• subject any/relevant "fish frog" or/rel.mean title any/relevant "cat dog"
• In or/rel.mean, rel is the context set and mean is the boolean modifier.
Defaults
• Consider the query: cat
• The server needs to turn that into a search clause, i.e. an index, relation, and search term.
• As it is, there’s only a search term.
<index> <relation> cat
• <index>: cql.serverChoice (default index)
• <relation>: cql.scr (default context set and relation)
• scr: “server choice relation”
• Next, consider the query: title = cat
• The server needs to assign a context set to the index (title) and a context set to the relation (=).
• Or, to make it even more complicated ...
• Add a relation modifier: title =/relevant cat
• The server needs to assign a context set to the index (title), a context set to the relation (=), and a context set to the relation modifier.
Default Context Sets
<>.title cql.=/cql.relevant cat
• Default context set for the index is selected by the server
• Default context set for relation is ‘cql’
• Default context set for relation modifier is ‘cql’
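The defaulting steps can be sketched as a function that fully qualifies a search clause. All names are hypothetical, and the server-chosen context set for unqualified indexes is passed in as `index_set`, with "dc" used purely as an example value. The cql.serverChoice and cql.scr defaults and the 'cql' default for relations and relation modifiers come from the slides.

```python
def qualify(name: str, default_set: str) -> str:
    """Prefix a name with a context set unless it already has one."""
    return name if "." in name else f"{default_set}.{name}"

def normalize_clause(term, index=None, relation=None, modifiers=(), index_set="dc"):
    """Return the clause with every context set made explicit."""
    if (index is None) != (relation is None):
        raise ValueError("index and relation go together")
    if index is None:
        # Bare term: the server chooses both index and relation.
        index, relation = "cql.serverChoice", "cql.scr"
    else:
        index = qualify(index, index_set)  # assumption: server picked index_set
        relation = qualify(relation, "cql")
    relation = "/".join([relation] + [qualify(m, "cql") for m in modifiers])
    return f"{index} {relation} {term}"

print(normalize_clause("cat"))
# cql.serverChoice cql.scr cat
print(normalize_clause("cat", "title", "=", ["relevant"]))
# dc.title cql.=/cql.relevant cat
```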
Additional relation modifiers
• word: The term should be broken into words (according to the server's definition of a 'word').
• string: The term is a single item, and should not be broken up.
• isoDate: Each item within the term conforms to ISO 8601.
• number: Each item within the term is a number.
• uri: Each item within the term is a URI.
• masked: (default modifier)
• title any "cat dog" same as title any/word "cat dog"
• title exact "cat in the hat" same as title exact/string "cat in the hat"
• title = "cat * hat" same as title =/masked "cat * hat"