Advanced CQL and Profiling

Mike Taylor <mike@indexdata.com>

1. Esoteric CQL features:
   – Word anchoring
   – Proximity
   – Relation modifiers
   – Boolean modifiers
2. Profiling
3. Prefix mapping
4. Defining relations

CQL features: esoterica

“You are not expected to understand this.”
– comment in the Unix Sixth Edition source code.

The point is that new users are not required to understand this, and may happily use CQL for many years, perhaps forever, without needing to.

CQL esoterica: word anchoring

A word beginning with “^” must occur at the start of its field.
A word ending with “^” must occur at the end of its field.

● dinosaur – matches “the complete dinosaur”
● dinosaur^ – also matches
● ^dinosaur – does not match

● the – matches “the complete dinosaur”
● ^the – also matches
● the^ – does not match
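The anchoring rules can be sketched in a few lines of Python. This is only an illustration: the function name `matches_anchored` and the whitespace-based, case-insensitive word splitting are assumptions of this example, and a real server would apply its own tokenisation.

```python
def matches_anchored(term, field):
    """Return True if `term` (optionally anchored with "^") matches `field`.

    Assumption of this sketch: a field is a string of whitespace-separated
    words, compared case-insensitively.
    """
    words = field.lower().split()
    start_anchored = term.startswith("^")
    end_anchored = term.endswith("^")
    word = term.strip("^").lower()
    if not words or not word:
        return False
    if start_anchored and words[0] != word:
        return False  # "^word" must be the first word of the field
    if end_anchored and words[-1] != word:
        return False  # "word^" must be the last word of the field
    return word in words


title = "the complete dinosaur"
for term in ["dinosaur", "dinosaur^", "^dinosaur", "the", "^the", "the^"]:
    print(term, matches_anchored(term, title))
```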

CQL esoterica: proximity

The “prox” boolean, by default, requires its operands to be next to each other, in either order:

● cervical prox vertebra – equivalent to "cervical vertebra" or "vertebra cervical"
● (cervical or dorsal) prox vertebra – equivalent to "cervical vertebra" or "dorsal vertebra" or "vertebra cervical" or "vertebra dorsal"

CQL esoterica: proximity II

Modifiers can generalise the semantics of proximity:

● cervical prox/distance<=5/ vertebrae – within five words of each other
● cervical prox/distance=0/unit=sentence vertebrae – within the same sentence
● cervical prox/distance>0/unit=paragraph vertebrae – in different paragraphs
● cervical prox/ordered vertebrae – in the specified order: exactly equivalent to "cervical vertebrae"
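To make the word-unit cases concrete, here is a minimal Python sketch of proximity matching. The function `prox` and its parameters `max_distance` and `ordered` are names invented for this example; it models only distances counted in words, not the sentence or paragraph units, and it assumes whitespace-separated, case-insensitive words.

```python
def prox(field, left, right, max_distance=1, ordered=False):
    """Word-unit proximity check.

    True if `left` and `right` occur within `max_distance` words of each
    other in `field`. With ordered=True, `left` must come before `right`.
    The default max_distance=1 means "next to each other".
    """
    words = field.lower().split()
    left_positions = [i for i, w in enumerate(words) if w == left.lower()]
    right_positions = [i for i, w in enumerate(words) if w == right.lower()]
    for i in left_positions:
        for j in right_positions:
            gap = j - i  # positive when `left` comes first
            if ordered and gap < 0:
                continue
            if 0 < abs(gap) <= max_distance:
                return True
    return False


print(prox("vertebra cervical", "cervical", "vertebra"))                # either order
print(prox("vertebra cervical", "cervical", "vertebra", ordered=True))  # order enforced
```

The default arguments correspond to plain `prox`; `max_distance=5` plays the role of `distance<=5`, and `ordered=True` the role of the `ordered` modifier.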

CQL esoterica: relation modifiers

Modifiers can refine the semantics of relations:

● title =/stem dig – finds “dig”, “digging”, “dug”, etc.
● title any/relevant "dinosaur bird reptile" – finds “sauropods”, “avian”, “crocodile”, “snake”, etc.
● author =/fuzzy tailor – finds “Mike Taylor”
● phoneNumber exact/fuzzy "020 8348 6768" – finds “020 8348 6769”

CQL esoterica: relation modifiers II

Relation modifiers can be overloaded to specify extra information about the term that the relation joins to the index:

● createdDate >/isoDate "2004-03-12 09:45:00" – the term is in ISO 8601 format.
● location within/geom.polygon "(12, 46) (15, 52)" – the term indicates a polygon of two points (i.e. a straight line) rather than the corners of a rectangle.
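All of these examples share one shape: a base relation (or boolean) followed by slash-separated modifiers, each of which may carry a comparison and a value. The Python sketch below pulls such a token apart. The function name `split_modifiers` and the regular expression are assumptions of this example and are not taken from any CQL implementation; empty segments are skipped, which is a leniency of the sketch rather than a rule of the language.

```python
import re

# One modifier: a name, optionally followed by a comparison and a value.
_MODIFIER = re.compile(r"([A-Za-z][\w.]*)\s*(?:(<=|>=|<>|=|<|>)\s*(.+))?")


def split_modifiers(token):
    """Split e.g. "prox/distance<=5/unit=sentence" into its base and modifiers.

    Returns (base, [(name, comparison, value), ...]); comparison and value
    are None for bare modifiers such as "stem".
    """
    base, *raw_modifiers = token.split("/")
    modifiers = []
    for raw in raw_modifiers:
        if not raw:
            continue  # tolerate empty segments (sketch-only leniency)
        match = _MODIFIER.fullmatch(raw)
        if match is None:
            raise ValueError(f"cannot parse modifier {raw!r}")
        modifiers.append(match.groups())
    return base, modifiers


print(split_modifiers("=/stem"))
print(split_modifiers("within/geom.polygon"))
print(split_modifiers("prox/distance<=5/unit=sentence"))
```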

CQL esoterica: boolean modifiers

Modifiers can refine the semantics of boolean operators. We've already seen some examples of this in proximity.

● cervical prox/distance<=5/ vertebrae – within five words of each other
● cervical or/exclusive vertebrae – one or the other, but not both.
● denenberg or/rel.mean "information retrieval"
  denenberg or/rel.sum "information retrieval"
  denenberg or/rel.max "information retrieval" – average, total or maximum relevance of operands
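A small Python sketch may help show what these boolean modifiers ask of a server. It assumes, purely for illustration, that each operand yields a set of record identifiers and a numeric relevance score per record; the names `combine_relevance` and `or_exclusive` are invented here.

```python
from statistics import mean

# How each rel.* modifier might combine the two operands' relevance scores.
RELEVANCE_COMBINERS = {
    "rel.mean": mean,
    "rel.sum": sum,
    "rel.max": max,
}


def combine_relevance(modifier, left_score, right_score):
    """Combine two operand scores as the given rel.* modifier requests."""
    return RELEVANCE_COMBINERS[modifier]([left_score, right_score])


def or_exclusive(left_ids, right_ids):
    """or/exclusive: records matched by one operand, but not both."""
    return set(left_ids) ^ set(right_ids)


print(combine_relevance("rel.mean", 0.5, 0.25))
print(combine_relevance("rel.max", 0.5, 0.25))
print(or_exclusive({1, 2}, {2, 3}))
```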

Profiling CQL

For simple searching, it suffices to use common indexes.
Semantic interoperability requires more precise behaviour.

This lesson was learned in the Z39.50 world and resulted in the invention of “profiles”: specifications of the subset of the full standard that is needed to support an application.

The classic example in Z39.50 is the Bath Profile for bibliographic searching.
Similarly, we define a Bath Profile for CQL searching.

Profiles and context sets

A profile is not the same thing as a context set!

● A context set is merely a bag of indexes (and relation modifiers and boolean modifiers) that may be used in any application.
● A profile provides a palette of indexes drawn from several context sets.

The distinction is similar to that between XML namespaces and XML Schemas.

● Schemas depend on namespaces, and may use several.
● CQL profiles depend on context sets, and may use several.

Example: the Bath Profile

See http://zing.z3950.org/srw/bath/2.0/

Bath searches may use any of the following indexes:

dc.creator
dc.title
dc.subject
cql.anywhere
dc.identifier
dc.date
bath.keyTitle
dc.format
dc.language
bath.possessingInstitution
bath.name
bath.personalNam
bath.corporateNa
bath.conferenceN
bath.uniformTitle
bath.issn
rec.id
bath.geographicName
bath.notes
bath.topicalSubject
bath.genreForm
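One practical use of a profile is checking that a query stays within its palette. The Python sketch below does this for a handful of the indexes listed above. The constant `BATH_INDEXES` deliberately holds only part of the list, and the helper names `unsupported_indexes` and `context_sets` are invented for this example.

```python
# A partial copy of the Bath palette, for illustration only.
BATH_INDEXES = frozenset({
    "dc.creator", "dc.title", "dc.subject", "dc.identifier", "dc.date",
    "dc.format", "dc.language", "cql.anywhere", "rec.id",
    "bath.keyTitle", "bath.name", "bath.issn", "bath.notes",
})


def unsupported_indexes(used, palette=BATH_INDEXES):
    """Return the indexes in `used` that the profile's palette does not offer."""
    return sorted(set(used) - palette)


def context_sets(palette=BATH_INDEXES):
    """Return the context-set prefixes that the palette draws on."""
    return sorted({index.split(".", 1)[0] for index in palette})


print(unsupported_indexes(["dc.title", "bath.capacityInGallons"]))
print(context_sets())
```

The second helper shows the point made earlier: a single profile draws its indexes from several context sets.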

Existing and possible profiles

Explicit CQL profiles have been created for some applications:
● Bath Profile for bibliographic data
● Zthes profile for hierarchical thesaurus navigation

Profiles are in development (or “unwritten”) for others:
● Google-like structureless searching
● Simple metadata searching with the Dublin Core
● CCG for collectable card games
● Music – musicalKey, arranger, duration, etc.
● GILS (Global Information Locator Service)
● ... your application goes here!

CQL esoterica: prefix mapping

So far, we have been free and easy with index prefixes such as “dc”. But how do we know what they mean?

Why should “dc” mean Dublin Core rather than Deep Custard?
● dc.custardDepth <= 20

Why should “bath” mean the Bath Profile for bibliographic searching instead of plumbing supplies?
● bath.capacityInGallons > 45

CQL esoterica: prefix mapping II

Prefixes are just convenient, easy-to-type abbreviations. The real identifier of a context set is its URI. For example, the Dublin Core context set is
    info:srw/cql-context-set/1/dc-v1.1
but we map that URI to a prefix for convenience.

This is exactly like XML namespaces: they are identified by URIs, but the URIs do not appear in the names of elements or attributes; short prefixes are used instead.

CQL esoterica: prefix mapping III

In XML, a prefix is associated with a namespace using:
● <element xmlns:prefix="http://example.org/xyz/">

In CQL, a prefix is associated with a context set using:
● >prefix=http://example.org/xyz/
and the rest of the query follows.

The following queries are exactly equivalent:
● >dc=info:srw/cql-context-set/1/dc-v1.1 dc.title=fish
● >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish

Most applications will have established default mappings.
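The equivalence of the two queries becomes obvious once each index is resolved to a (context-set URI, index name) pair. The Python sketch below does that. The function `resolve_index` and its `default_uri` parameter are assumptions of this example, not part of any CQL library.

```python
DC_URI = "info:srw/cql-context-set/1/dc-v1.1"


def resolve_index(index, prefix_map, default_uri=None):
    """Resolve "prefix.name" to (context-set URI, name).

    An index with no prefix is looked up in `default_uri`, if one is given.
    """
    prefix, dot, name = index.partition(".")
    if not dot:
        if default_uri is None:
            raise KeyError(f"no default context set for index {index!r}")
        return default_uri, index
    return prefix_map[prefix], name


# Different prefixes, same context set: the resolved indexes are identical.
print(resolve_index("dc.title", {"dc": DC_URI}))
print(resolve_index("yx.title", {"yx": DC_URI}))
```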

CQL esoterica: prefix mapping IV

It is possible to establish the context set from which indexes with no explicit prefix are taken by omitting the “prefix=” part from the mapping:
● >http://example.org/heraldry/ title=baron and side=sinister

So the following queries are exactly equivalent:
● >info:srw/cql-context-set/1/dc-v1.1 title=fish
● >yx=info:srw/cql-context-set/1/dc-v1.1 yx.title=fish

CQL esoterica: prefix mapping V

Finally... Finally! :-)

Prefix mappings can be stacked up:

● >dc = info:srw/cql-context-set/1/dc-v1.1
  >bath=http://zing.z3950.org/cql/bath/2.0/
  >rec=info:srw/cql-context-set/2/rec-1.0
  rec.created < 2004-10-09 and
  dc.title=ecology and
  bath.conferenceName=dinosaur

(Yes, this is all one query.)
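Stacked mappings can be peeled off the front of a query one at a time. The Python sketch below shows one way to do this, and it also handles the prefix-less form that sets the default context set. The function `parse_prefix_maps` and its regular expression are assumptions of this example; it assumes URIs contain no whitespace and makes no attempt to parse the rest of the query.

```python
import re

# ">" then an optional "name =" and then a URI running up to the next space.
_PREFIX_MAP = re.compile(r"\s*>\s*(?:([A-Za-z_][\w-]*)\s*=\s*)?(\S+)")


def parse_prefix_maps(query):
    """Strip leading prefix mappings from `query`.

    Returns (prefixes, default_uri, rest): a dict of prefix -> URI, the URI
    given without a prefix (or None), and the remainder of the query.
    """
    prefixes = {}
    default_uri = None
    pos = 0
    while True:
        match = _PREFIX_MAP.match(query, pos)
        if match is None:
            break
        name, uri = match.groups()
        if name is None:
            default_uri = uri
        else:
            prefixes[name] = uri
        pos = match.end()
    return prefixes, default_uri, query[pos:].strip()


query = (
    ">dc = info:srw/cql-context-set/1/dc-v1.1 "
    ">bath=http://zing.z3950.org/cql/bath/2.0/ "
    ">rec=info:srw/cql-context-set/2/rec-1.0 "
    "rec.created < 2004-10-09 and dc.title=ecology "
    "and bath.conferenceName=dinosaur"
)
prefixes, default_uri, rest = parse_prefix_maps(query)
print(prefixes)
print(rest)
```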

CQL esoterica: prefix mapping VI

Don't try this at home.

Defining relations

CQL has a “feature” where any word can act as a relation. For example, the query:
    foo bar baz
is interpreted as index-name “foo”, relation “bar”, term “baz”, even though there is no relation “bar”.

This is a misfeature: it prevents the obvious interpretation of this query as a phrase search or AND search.

If your profile needs a new relation, consider defining it as a relation modifier on one of the existing relations instead.
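The effect is easy to reproduce with a toy reader for search clauses. The Python sketch below is a deliberate oversimplification of the CQL grammar: it handles only unquoted one-word and three-word clauses, and the function name `read_search_clause` is invented for this example.

```python
def read_search_clause(clause):
    """Read a clause as (index, relation, term).

    Three bare words are always read as index, relation and term, whether
    or not the middle word is a relation anybody has defined.
    """
    tokens = clause.split()
    if len(tokens) == 1:
        return (None, None, tokens[0])
    if len(tokens) == 3:
        return tuple(tokens)
    raise ValueError("this sketch handles only one-word and three-word clauses")


print(read_search_clause("foo bar baz"))          # "bar" is taken as the relation
print(read_search_clause("dc.title =/stem dig"))  # a modifier on an existing relation
```

The second call shows the recommended alternative: the relation stays a familiar one, and the new behaviour travels as a modifier.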

Thanks for listening!