OASIS Electronic Trial Master File Standard Technical Committee

OASIS Electronic Trial Master File Standard Technical Committee
Content Classification Layer
January 20, 2014, 9:00 – 10:00 AM PST

Agenda
Time         Topic                                              Presenter
9:00-9:05    Call to Order & Roll Call                          Zack Schmidt
9:05-9:10    Approval of Minutes                                All
             https://www.oasis-open.org/committees/documents.php?wg_abbrev=etmf
             TC Process and Administration (deferred)           Chet Ensign
9:10-9:20    Outreach Subcommittee - All                        Jennifer Alpert
9:20-9:50    Tech presentation – Content Classification Layer   Z. Schmidt/Aliaa
9:50-9:55    New Business                                       All
9:55-10:00   Next meeting agenda / Date                         Z. Schmidt

Roll Call
Name                      Company            Voting Status  Present?
Jennifer Alpert Palchak   CareLex            Voter          y
Aliaa Badr                CareLex            Voter          y
Oleksiy (Alex) Palinkash  CareLex            Voter          y
Troy Jacobson             Forte Research     Voter          y
Lou Chappuie              Individual         Voter          y
Lisa Mulcahy              Individual         Non-Voter      y
Robert Gehrke             Mayo Clinic        Voter          n
Rich Lustig               Oracle             Non-Voter      y
Michael Agard             Paragon Solutions  Non-Voter      y
Christopher McSpiritt     Paragon Solutions  Non-Voter      y
Jamie O’Keefe             Paragon Solutions  Non-Voter      n
Fran Ross                 Paragon Solutions  Non-Voter      y
Peter Alterman            SAFE-BioPharma     Voter          y
Catherine Schmidt         SterlingBio        Voter          y
Zack Schmidt              SureClinical       Voter          y
Trish Whetzel, Ph.D       SureClinical       Non-Voter      y
Peter Junge               Beijing Sursen     Observer       n
Laura Hilty               Forte Research     Observer       n
Tony O’Hare               Forte Research     Observer       n
Eldin Rammell             Consulting         Observer       n
Robin Cover               OASIS staff        Non-Voter      n
Chet Ensign               OASIS staff        Non-Voter      n

Meeting Etiquette
• Announce your name prior to making comments or suggestions
• Keep your phone on mute when not speaking (#6)
• Do not put your phone on hold
  – Hang up and dial in again when finished with your other call
  – Hold = Elevator Music = very frustrated speakers and participants
• Meetings will be recorded and posted
  – Another reason to keep your phone on mute when not speaking!
• Use the join.me “Chat” feature for questions / comments / votes
• NOTE: We will follow Robert’s Rules of Order

This meeting is being recorded and minutes will be posted on the TC page after the meeting.

From eTMF Std TC to Participants: Hi everyone: remember to keep your phone on mute

Outreach Subcommittee
• Status
  – New Members:
    – Oracle – Joined
  – In Progress: EMC, Kaiser Permanente, Shire, Medtronic
• Activities / Milestones

Tech Discussion
• Status
• Timeline
• In parallel with other Tech work from charter

Content Classification System Discussion
– Classification System Components:
• Classification Categories
  – Taxonomy, hierarchy
• Metadata (‘Tags’)
  – Characterizes content
• Content Model
  – Published set of classifications, metadata for a domain (e.g., eTMF)

Classification Categories Component
– Hierarchy of categories
• Classification Categories Hierarchy
  – Categories, subcategories, content types
  – Defined relationships with rules: Parent-Child
  – All categories, content types required to have unique names and machine codes
  – Each content type is associated with Metadata Properties (includes core and domain-specific)
  – Content items are linked to content types
  – Unique classification and term codes based on Universal Decimal Classification System (UDC) numbering, widely used in libraries worldwide. Human and machine readable; infinitely expandable
  – Can be described, edited and validated using an OWL editor (like the open source editor Protégé)
  – Supports any simple text vocabulary, including TMF Ref Model and other vocabularies
  – W3C OWL 2 and RDF/XML supported

[Figure: Study Digital Content]
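The parent-child and uniqueness rules on this slide can be sketched as a small in-memory tree. This is only an illustration, not the structure the spec defines: the class names `Node` and `Taxonomy`, the `kind` labels, and the example names and codes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """One entry in the hierarchy: a category, subcategory, or content type."""
    code: str                                  # machine code, e.g. "100.10"
    name: str                                  # unique human-readable name
    kind: str                                  # "category", "subcategory", "content_type"
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)


class Taxonomy:
    """Keeps the parent-child tree and enforces unique names and codes."""

    def __init__(self) -> None:
        self._by_code: Dict[str, Node] = {}
        self._by_name: Dict[str, Node] = {}

    def add(self, code: str, name: str, kind: str,
            parent_code: Optional[str] = None) -> Node:
        if code in self._by_code:
            raise ValueError(f"duplicate code: {code}")
        if name in self._by_name:
            raise ValueError(f"duplicate name: {name}")
        parent = None
        if parent_code is not None:
            parent = self._by_code[parent_code]   # KeyError if the parent is unknown
            if parent.kind == "content_type":
                raise ValueError("a content type cannot have children")
        node = Node(code, name, kind, parent)
        if parent is not None:
            parent.children.append(node)
        self._by_code[code] = node
        self._by_name[name] = node
        return node

    def path(self, code: str) -> List[str]:
        """Names from the primary category down to the given node."""
        node = self._by_code[code]
        names = []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


tax = Taxonomy()
tax.add("100", "Example Category", "category")
tax.add("100.10", "Example Subcategory", "subcategory", parent_code="100")
tax.add("100.10.10", "Example Content Type", "content_type", parent_code="100.10")
print(tax.path("100.10.10"))
# ['Example Category', 'Example Subcategory', 'Example Content Type']
```

Keeping two indexes (by code and by name) makes both uniqueness checks a single dictionary lookup.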

Metadata Component
– Used to tag or index digital content items

Metadata Classes:
• Core – Composed of four areas: File Properties, Classification, Audit Trail, Business Process
• Domain-specific – Metadata for a domain in life sciences such as eTMF, finance, legal administration, or others. Uses standards-based terms from groups like NCI
• Org-Specific – Metadata that meets an organization's needs; not standards based
• General – Obtained from public standards-based vocabulary terminology resources like Dublin Core

Annotation Properties
– Metadata about classification categories and metadata:
  § Core, Org-Specific metadata

[Figure: Core Metadata Example – File Properties]

Content Model Component
– Contains the classification hierarchy and metadata in machine-readable format
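As a rough illustration of what a machine-readable content model could look like, the sketch below writes a three-level hierarchy as OWL classes in RDF/XML using only Python's standard library. The namespace `http://example.org/etmf#`, the `C`-prefixed identifiers, and the labels are hypothetical; the actual content model published by the TC may be organized differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/etmf#"   # hypothetical namespace, not from the spec

# Use the conventional prefixes instead of ns0, ns1, ... in the output.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def add_class(root, local_name, label, parent=None):
    """Append an owl:Class with a label and an optional rdfs:subClassOf link."""
    cls = ET.SubElement(root, f"{{{OWL}}}Class",
                        {f"{{{RDF}}}about": BASE + local_name})
    ET.SubElement(cls, f"{{{RDFS}}}label").text = label
    if parent is not None:
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                      {f"{{{RDF}}}resource": BASE + parent})
    return cls


def build_content_model() -> str:
    """Serialize a small category / subcategory / content type chain."""
    root = ET.Element(f"{{{RDF}}}RDF")
    add_class(root, "C100", "Example Category")
    add_class(root, "C100.10", "Example Subcategory", parent="C100")
    add_class(root, "C100.10.10", "Example Content Type", parent="C100.10")
    return ET.tostring(root, encoding="unicode")


print(build_content_model())
```

A file in this form can be opened in an OWL editor such as Protégé for inspection; a real model would also carry the metadata properties attached to each content type.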

Classification System – Term Sources
Term Sourcing Concepts:
• Terms adopted by standards bodies should be used first in eTMF model

Primary Term Sources for eTMF Classification System:
– Internet Standards Dev Orgs: W3C, IETF, ISO, etc.
  » Required for interoperability of machine code
– NIH NCI Thesaurus: Term database for FDA, CDISC, HL7, other orgs
  » Required for interoperability of clinical / health sciences data

Secondary Term Sources for eTMF Classification System:
• Industry sources – widely used terms in enterprise content mgmt software, TMF RM

*Spec, Table 6, p 21

Classification Categories Component
– Classification hierarchy and numbering is based on UDC library numbering standard and XML naming
– Digital dot notation
– Designed for human and machine readability
– Each number is also a unique code for naming and ordering in the hierarchy

Hierarchy Numbering/Naming Considerations:
– Primary Categories (PC): Three digit. eTMF: 100-200
– Subcategories (SC): Two digit: 10-99
– Content Types (CT): Two digit: 10-99
– Maximum number of SubCategory divisions is 5, excluding the 3 digits for the Primary Category [1]

Classification Categories Hierarchy and Numbering [1]:
• Flexible, standards-based approach (W3C XML compliant naming*)
• Ability to add multiple hierarchy divisions / levels
• Proposed: 5 divisions = (100 × 90^5) = 5.9 × 10^11 Content Types
• Uniqueness of numbers – usable as machine code identifiers
• Machine readable, human readable
• No sorting issues, no need for leading zeros*, no special chars

[1] Per spec section 2.1.1; 6.0
*Leading zeros in XML syntax are ignored: http://www.w3.org/TR/REC-xml/
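The numbering rules and the capacity figure on this slide can be checked with a short sketch. It assumes that a code is written as a three-digit primary category followed by dot-separated two-digit divisions, and that the content type ID counts as one of those divisions; the slide does not spell out either detail, and the function names are illustrative.

```python
import re

MAX_DIVISIONS = 5                        # slide: at most 5 divisions after the primary
_PRIMARY = re.compile(r"[1-9][0-9]{2}")  # three digits, no leading zero
_DIVISION = re.compile(r"[1-9][0-9]")    # two digits, 10-99


def parse_code(code: str) -> tuple:
    """Split a dotted code into (primary, [divisions]) or raise ValueError."""
    primary, *divisions = code.split(".")
    if not _PRIMARY.fullmatch(primary):
        raise ValueError(f"primary category must be three digits: {code!r}")
    if len(divisions) > MAX_DIVISIONS:
        raise ValueError(f"more than {MAX_DIVISIONS} divisions: {code!r}")
    for part in divisions:
        if not _DIVISION.fullmatch(part):
            raise ValueError(f"each division must be 10-99: {code!r}")
    return int(primary), [int(p) for p in divisions]


def capacity(primaries: int = 100, per_division: int = 90,
             divisions: int = MAX_DIVISIONS) -> int:
    """Distinct codes at the deepest level: primaries * per_division ** divisions."""
    return primaries * per_division ** divisions


print(parse_code("100.10.25"))   # (100, [10, 25])
print(capacity())                # 590490000000, about 5.9 x 10^11
```

Because every segment has a fixed width and never starts with zero, sorting the codes as plain strings gives the same order as sorting the parsed numbers, which is the "no sorting issues" property the slide mentions.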

Classification Categories Component
Numbering and Naming Scheme

Numbering
• Primary Categories and Sub-Categories:
  – Category Code number
• Content Type:
  – Content Type ID

Naming
• Primary Categories and Sub-Categories
  – Simple text-based names
  – Unique name, 64 char limit
  – Abbreviation – 16 char limit suggested
  – Compatible with W3C XML naming standards: no special characters: ( ) < > ? / % # @ !

[Figure: Example of Classification Categories Hierarchy, Naming, Numbering]
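A minimal sketch of the naming checks listed on this slide follows. The forbidden set combines the characters named here with the `&` listed on the summary slide; the function name, the message wording, and the example names are hypothetical, and the full W3C XML naming rules are stricter than this character check.

```python
FORBIDDEN_CHARS = set("()<>?/%#@!&")   # characters named on the slides
NAME_LIMIT = 64                        # hard limit for category names
ABBREVIATION_LIMIT = 16                # suggested limit for abbreviations


def check_name(name, abbreviation, existing_names=()):
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if len(name) > NAME_LIMIT:
        problems.append(f"name exceeds {NAME_LIMIT} characters")
    bad = sorted(FORBIDDEN_CHARS.intersection(name + abbreviation))
    if bad:
        problems.append("special characters not allowed: " + " ".join(bad))
    if name in existing_names:
        problems.append("name is not unique")
    if len(abbreviation) > ABBREVIATION_LIMIT:
        problems.append(f"abbreviation exceeds suggested {ABBREVIATION_LIMIT} characters")
    return problems


print(check_name("Site Management", "SiteMgmt"))   # []
print(check_name("Site/Management", "SM"))         # ['special characters not allowed: /']
```

Returning every problem at once, rather than raising on the first one, suits an editing tool that wants to show all violations for a term in one pass.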

Classification Categories Component
Modifying Classification Category Entities – General Editing Rules

Classification Category, Content Type Editing Rules*
Type                   Import Terms  Generate Code  Add/Modify  Delete/Reserve
Domain Specific        Yes           No             No/Yes**    Reserve/Unreserve
Organization Specific  Yes           Yes            Yes/Yes     Delete

Domain Specific
– Classifications cannot be deleted –> Reserve/Unreserve
– Modifications allowed to some annotation properties (see spec)
– Codes (Category Codes, CT Type ID) cannot be generated

Organization Specific
– Classifications can be deleted
– Modifications allowed for classification metadata, annotations
– Codes (Category Codes, CT Type ID) can be generated

*Spec, Table 6, p 21
**Annotation metadata
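One way an editing tool might encode these rules is a small lookup table, sketched here. The split of Add/Modify into a classification part and an annotation part is a reading of the "No/Yes**" cell and its footnote, and the Organization Specific import permission is inferred from the flattened table; the names `Scope`, `PERMISSIONS`, and `REMOVAL` are hypothetical.

```python
from enum import Enum


class Scope(Enum):
    DOMAIN = "domain-specific"
    ORGANIZATION = "organization-specific"


# Yes/No permissions per classification type, following the slide's table.
PERMISSIONS = {
    Scope.DOMAIN: {
        "import_terms": True,
        "generate_code": False,
        "modify_classification": False,
        "modify_annotation": True,   # only some annotation properties, per the spec
    },
    Scope.ORGANIZATION: {
        "import_terms": True,        # assumed from the flattened table row
        "generate_code": True,
        "modify_classification": True,
        "modify_annotation": True,
    },
}

# What "removing" an entry means for each type.
REMOVAL = {
    Scope.DOMAIN: "reserve",         # cannot be deleted, only reserved/unreserved
    Scope.ORGANIZATION: "delete",
}


def is_allowed(scope: Scope, action: str) -> bool:
    """Look up a Yes/No permission; unknown actions raise KeyError."""
    return PERMISSIONS[scope][action]


def removal_action(scope: Scope) -> str:
    """Return 'reserve' or 'delete' depending on the classification type."""
    return REMOVAL[scope]


print(is_allowed(Scope.DOMAIN, "generate_code"))   # False
print(removal_action(Scope.ORGANIZATION))          # delete
```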

Classification Editing Tool – Free, Open Source Protégé
(From Stanford University: http://protege.stanford.edu/)

Protégé Editor:
- Edit Classification Taxonomy and Metadata Terms
- Validate Taxonomy and Term name compliance
- Create valid RDF/XML Ontology

Classification Categories - Summary
Proposed Classification System has the following Properties:
• Based on Naming and Numbering that is W3C XML compliant
  – No special characters: ( ) & # @ / … etc.
  – No leading zeros in classification numbers
• Based on Universal Decimal Classification (UDC) system for content classification:
  – 100-199: eTMF Domain
  – UDC system used in 170+ countries worldwide; expandable, human and machine readable, sortable
    http://en.wikipedia.org/wiki/Universal_Decimal_Classification
• Flexible and customizable for organizations, yet interoperable
  – Domain classifications – Standardized; Organization-specific classifications – Editable
• Defined set of rules for Editing, modifying Taxonomy
• Any Organization can Modify/Edit taxonomy using open source editors like Protégé

Appendix

Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
– Objectives:
• Classification, Subclassification concept
  – Supports RDF/XML, OWL languages
  – Non-domain specific, generic terms
  – Easily understandable by anyone; conveys concept
  – Conveys hierarchy
  – No conflicts: not a reserved term in RDF/XML, OWL or other compilers / IDEs
  – First priority: Source terms from standards bodies

Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:

Term Options: Category, SubCategory (Proposed Term)
  Source: NIH NCI Thesaurus
  Definition: Category: ‘This term is used informally to mean a class of things’ (NCI code: C25372); Subcategory: ‘A subdivision that has common differentiating characteristics within a larger category.’ (NCI code: C25692)

Term Options: Class, SubClass
  Source: W3C OWL
  Definition: Class: ‘Resources may be divided into groups called classes’; SubClass: ‘Subclasses are classes; If a class C is a subclass of a class C', then all instances of C will also be instances of C'.’ (W3C RDF Class def)

Term Options: TMF Zone, Section
  Source: TMF Ref Model
  Definition: TMF Zone = Primary Classification (no published def found online); Section = SubClassification (no published def found online)

Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:

Term Options: Category, SubCategory (Proposed Term)
  Source: NIH NCI Thesaurus
  + Everyone knows it
  + Describes hierarchy
  + In use by standards body (NIH NCI Thesaurus)
  + Generic

Term Options: Class, SubClass
  Source: W3C OWL
  + Describes hierarchy
  + In use by standards body
  + Generic
  - Could be a reserved word for some development tools

Term Options: TMF Zone, Section
  Source: TMF Ref Model
  + In use by TMF RM users
  - Doesn’t convey hierarchy
  - Not in use by standards body
  - Not Generic

Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
– Objectives:
• Content Type concept
  – Supports RDF/XML, OWL languages
  – Non-domain specific, generic terms
  – Easily understandable by anyone; conveys concept
  – No conflicts: not a reserved term in RDF/XML, OWL or other compilers / IDEs
  – First priority: Source terms from standards bodies

Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
• Content Type term concept:

Term: Content Type (Proposed Term)
  Source: W3C & CareLex, Oracle
  Definition:
    W3C: ‘Specifies the nature of a linked resource’ (W3C and [RFC 2045] and [RFC 2046])
    CareLex: A content type is a reusable collection of metadata, business processes, behavior, and other settings for a category of items or documents in electronic content material.
    Oracle: Content types are used to define the metadata that you can associate with content.

Term: Artifact
  Source: TMF Ref Model
  Definition: ‘A collection of documents’, Wikipedia (Not published)

Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
• Content Type term concept:

Term: Content Type (Proposed Term)
  Source: W3C
  + Widely used in internet SW
  + ECM SW use: Microsoft, Oracle, Alfresco, etc.
  + In use by standards body (W3C)
  + Generic

Term: Artifact
  Source: TMF Ref Model
  + In use by TMF RM users
  - Not in use by standards body
  - Not Generic
  - Doesn’t convey concept of metadata

Draft Agenda: Next Meeting
• Roll call
• Reports
  – Outreach
  – Tech Discussion: Classification Layer: Core Metadata (Charter item 2, p. 2)
• New business