Schemas or Vocabularies April 26 2005 OASIS Symposium
Schemas or Vocabularies?
April 26, 2005, OASIS Symposium on the Future of XML Vocabularies
Bob DuCharme, LexisNexis
Outline
• review Dublin Core
• “vocabularies”
• creating vocabularies (and maybe schemas): required and optional steps
• case study: PRISM
Dublin Core
• Dublin Core Metadata Initiative
• dublincore.org
• DCMI Metadata Terms: elements, element refinements, encoding schemes, and vocabulary terms
• element: “A discrete unit of data or metadata. An element may contain subelements that are called qualifiers in Dublin Core.”
• creator, date, description, format, identifier…
“vocabulary”?
• list of words?
• DTD?
• schema?
  – W3C Schema?
  – RELAX NG schema?
  – RDF Schema?
Mandatory step 1
Define your standard list of words:
• The actual words to use (PublishDate? publish-date? PubDate?)
• Their meanings.
• (optional) Value restrictions, e.g.
  – formatting, such as ISO 8601 for dates (“2005-04-26T09:20”)
  – list of values to choose from (Y/N, True/False, ISO 3166 country codes)
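The step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term names (`publish-date`, `country`, `reviewed`, `byline`), their meanings, and the four-code country subset are hypothetical, and `datetime.fromisoformat` accepts only a subset of ISO 8601, so it stands in for a real date-format check.

```python
from datetime import datetime

# Hypothetical subset of ISO 3166 two-letter country codes (illustration only).
COUNTRY_CODES = {"US", "GB", "DE", "JP"}


def is_iso8601(value: str) -> bool:
    """Approximate ISO 8601 check; fromisoformat covers only part of the standard."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# The vocabulary itself: the word, its meaning, and an optional value restriction.
# All four terms are invented for this sketch.
VOCABULARY = {
    "publish-date": {"meaning": "When the item was published", "check": is_iso8601},
    "country": {"meaning": "Country of publication", "check": lambda v: v in COUNTRY_CODES},
    "reviewed": {"meaning": "Whether an editor reviewed the item", "check": lambda v: v in {"Y", "N"}},
    "byline": {"meaning": "Author credit as printed", "check": None},  # no restriction
}


def validate(record: dict) -> list:
    """Return a list of problems: unknown terms and values that break a restriction."""
    problems = []
    for term, value in record.items():
        entry = VOCABULARY.get(term)
        if entry is None:
            problems.append(f"unknown term: {term}")
        elif entry["check"] is not None and not entry["check"](value):
            problems.append(f"bad value for {term}: {value!r}")
    return problems
```

Calling `validate({"PubDate": "2005-04-26"})` reports an unknown term, which is the point of agreeing on the actual words first: the spelling of a term matters as much as its value.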
Example Dublin Core definition
Term Name: format
URI: http://purl.org/dc/elements/1.1/format
Label: Format
Definition: The physical or digital manifestation of the resource.
Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).
Reference: [MIME] http://www.iana.org/assignments/media-types/
Type of Term: element
Status: recommended
Date Issued: 1999-07-02
Optional Steps 2 and 3
• Figure out the relationships of your labeled pieces of information
• Write it down in a machine-readable form
RDF Schemas
“RDF user communities also need the ability to define the vocabularies (terms) they intend to use in those statements, specifically, to indicate that they are describing specific kinds or classes of resources, and will use specific properties in describing those resources…”
- W3C RDF Tutorial
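As a rough illustration of what such a vocabulary definition can look like when written down, the Python sketch below prints a tiny RDF Schema in Turtle syntax: one class and one property whose domain is that class. The `ex:` namespace (`http://example.org/vocab#`), the `Article` class, and the `publicationDate` property are invented for this example, and the literal escaping handles only backslashes and double quotes.

```python
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/vocab#> .\n"  # hypothetical namespace
)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a short Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def describe(name, rdf_type, label, comment, domain=None):
    """Return the Turtle statements describing one class or property."""
    lines = [
        f"ex:{name} a {rdf_type} ;",
        f'    rdfs:label "{escape_literal(label)}" ;',
        f'    rdfs:comment "{escape_literal(comment)}"',
    ]
    if domain is not None:
        # A property may state which class of resource it describes.
        lines[-1] += " ;"
        lines.append(f"    rdfs:domain ex:{domain}")
    lines[-1] += " ."
    return "\n".join(lines) + "\n"


def build_schema() -> str:
    """Assemble a two-term schema: a kind of resource and a property for it."""
    return "\n".join([
        PREFIXES,
        describe("Article", "rdfs:Class", "Article", "A published article."),
        describe("publicationDate", "rdf:Property", "Publication date",
                 "When the article was published.", domain="Article"),
    ])


if __name__ == "__main__":
    print(build_schema())
```

The output names the kinds of resources and the properties used to describe them, which is what the quoted passage asks of a vocabulary definition.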
Validation?
“RDF classes and properties are in some respects very different from programming language types. RDF class and property descriptions do not create a straightjacket into which information must be forced, but instead provide additional information about the RDF resources they describe.”
Flexibility
• advantage: more systems can adapt, politically easier to sell
• disadvantage: fuzziness, more work to adopt a standard
PRISM
• Publishing Requirements for Industry Standard Metadata
• “Developing a standard XML metadata vocabulary for the publishing industry”
• http://www.prismstandard.org
• v1.0: 2001; current version: 1.2
PRISM 1.2 “elements”
• General Purpose: dc: identifier, title, creator, contributor, description, language, format, type; prism: category
• Provenance: dc: publisher, source; prism: distributor, edition, issueName, number, startingPage, Volume
• Dates and Time: prism: creationDate, expirationDate, modificationDate, publicationDate, releaseDate, receptionDate
• Subject Description: dc: coverage, subject; prism: event, industry, location, person, organization, section
• Relations: prism: isCorrectionOf, hasCorrection, isPartOf, hasPart, isVersionOf, hasVersion, isFormatOf, hasFormat, References, isReferencedBy, isBasedOn, isBasisFor, isTranslationOf, hasTranslation, requires, isRequiredBy, isAlternativeFor, hasAlternative
• Rights: dc: rights; prism: copyright, expirationTime, releaseTime, rightsAgent; prl: geography, industry, usage
• Controlled Vocabs: pcv: broaderTerm, code, definition, Descriptor, label, narrowerTerm, relatedTerm, synonym, Vocabulary
• Inline Markup: pim: event, industry, location, objectTitle, organization, person, quote
PRISM DTDs
• metadata vs. data:
  – article titles, bylines
  – identifying inline entities
• PRISM Aggregator DTD (PAM)
• Two levels of compliance
  – level one: well-formed XML, dc:identifier
  – level two: RDF profile, rdf:about
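The two compliance levels above can be read as a layered check. The Python sketch below is one possible interpretation, not the PRISM specification's own conformance test: it treats level one as "parses as XML and contains a dc:identifier element" and level two as "additionally has an rdf:RDF root and at least one element carrying rdf:about". The function name and these exact rules are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def compliance_level(xml_text: str) -> int:
    """Return 0, 1, or 2 under the simplified reading described above."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return 0  # not well-formed XML

    # Level one (assumed rule): some dc:identifier element is present.
    if next(root.iter(f"{{{DC}}}identifier"), None) is None:
        return 0

    # Level two (assumed rule): RDF root plus a resource named with rdf:about.
    has_about = any(f"{{{RDF}}}about" in el.attrib for el in root.iter())
    if root.tag == f"{{{RDF}}}RDF" and has_about:
        return 2
    return 1


LEVEL_ONE = (
    '<article xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:identifier>100340926</dc:identifier></article>"
)

LEVEL_TWO = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<rdf:Description rdf:about="story.xml">'
    "<dc:identifier>100340926</dc:identifier>"
    "</rdf:Description></rdf:RDF>"
)

if __name__ == "__main__":
    print(compliance_level(LEVEL_ONE), compliance_level(LEVEL_TWO))
```

The sample documents and the identifier value are made up. The layering is what the sketch is meant to show: a document that satisfies the stricter RDF profile also satisfies the looser level.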
PRISM RDF Schema
• Tony Hammond, Nature Publishing Group
• under “contributed resources” on PRISM website
Lessons Learned
• Which works best for your industry: vocabulary, DTD, XSD, RNG…?
• Layered approach a good option
• Say what you mean