
Draft Ideas on a Process to Design and Build the DFT Vocabulary
Terms, Concepts, Synonyms, Conceptual Model
Gary Berg-Cross
Developed for the DFT WG Session at the 2nd RDA Plenary, Sept. 2013, Washington DC

Topical Outline
• Overview of Phased Plan
• Basic Ideas
  – Vocabularies and Concepts
• Start up, Requirements analysis and development of candidate list
• Vocabulary Analysis Process
• Vocabulary Design Process
• Refinement
• Draft Vocabulary Publication and Review

Basic Ideas
• Term definition: verbal designation (3.4.1) of a general concept in a specific subject field (ISO 1087-1:2000).
  • NOTE: A term may contain symbols and can have variants, e.g. different forms of spelling.
• A Controlled Vocabulary (CV) is a consensus, standardized set of terms, including new ones, used to refer to concepts.
• Terms are proxies for concepts within a conceptual system.
• Standards (like ISO) emphasize principles of vocabulary control that guide their design and development:
  1. Eliminate (conceptual) ambiguity
     1. use principles for defining/describing concepts to which terms are assigned
     2. show relations and structure to help understanding
  2. Control synonyms (term equivalence), in simple formulation as a synonym ring
     • “mg/l” has the synonym “milligrams per liter”; both refer to a concept
     • A “databank” is an obsolete synonym for “database” (preferred term)
  3. Establish relations among terms where appropriate
  4. Consider and systematize the role of lexical modifiers (e.g. digital or data object)
  5. Employ Guiding Principles…
• Synonym ring (figure labels): Data Object, Digital Object, digital record, Data Set, Data Element
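A synonym ring can be sketched in a few lines of Python. This is only an illustration, not part of the DFT work: the concept identifiers (`mg-per-liter`, `database`), the `RINGS` layout and the function names are assumptions; the terms themselves come from the slide.

```python
# Hypothetical synonym rings: one ring per concept, with a single preferred term.
# Concept ids and the dictionary layout are assumptions made for illustration.
RINGS = {
    "mg-per-liter": {"preferred": "mg/l", "others": ["milligrams per liter"]},
    "database": {"preferred": "database", "others": ["databank"]},
}


def normalize(term: str) -> str:
    """Lower-case a term and collapse whitespace so spelling variants match."""
    return " ".join(term.lower().split())


def build_index(rings):
    """Map every normalized term to its concept id; reject ambiguous terms."""
    index = {}
    for concept_id, ring in rings.items():
        for term in [ring["preferred"], *ring["others"]]:
            key = normalize(term)
            if key in index and index[key] != concept_id:
                raise ValueError(f"term {term!r} is assigned to two concepts")
            index[key] = concept_id
    return index


INDEX = build_index(RINGS)


def preferred_term(term: str):
    """Return the preferred term for any ring member, or None if unknown."""
    concept_id = INDEX.get(normalize(term))
    return None if concept_id is None else RINGS[concept_id]["preferred"]


print(preferred_term("Databank"))              # database
print(preferred_term("milligrams per liter"))  # mg/l
```

Rejecting a term that lands in two rings is one simple way to enforce the "eliminate ambiguity" principle at load time.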

Vocabulary, Concepts and Reality (after Ogden and Richards)
• Triangle figure labels: Data Concept (part of cognitive organization); Name (term/vocabulary), e.g. “Data Object”; Reality (objects > data objects); Correspondence
• Understand reality: semantics starts with evidence.
• What is the concept? Something we understand.
• Vocabulary: sets of terms used by groups of cognitive agents to represent and communicate about concepts.
• In a language, syntax can relate symbolic designations/terms:
  • This data collection is different from that data set.
  • A dataset series (according to ISO 19115, 19113 & 19114) is a collection of datasets sharing the same product specification…
• Extant work: philosophy, psychology, data science perspectives…
• Maybe there is more than one type of aggregation?
• Task: regiment language (assertions using controlled terms?)
• Dataset: a dataset is a logically meaningful grouping of similar or related data.
  • Dataset isA grouping
  • Data in a dataset has relation
  • Relation types: source or class of source, processing level, algorithms, topic, time period…
  • A dataset isPartOf a dataset series…
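One way to picture "regimented" language is as assertions built only from controlled terms and relations. The sketch below assumes a tiny term list and two relation names (`isA`, `isPartOf`) taken from the dataset example; the function and variable names are hypothetical.

```python
# Assumed controlled vocabulary and relation set, for illustration only.
CONTROLLED_TERMS = {"dataset", "grouping", "dataset series", "data"}
CONTROLLED_RELATIONS = {"isA", "isPartOf"}


def make_assertion(subject: str, relation: str, obj: str):
    """Build a (subject, relation, object) triple, rejecting uncontrolled words."""
    for term in (subject, obj):
        if term not in CONTROLLED_TERMS:
            raise ValueError(f"{term!r} is not a controlled term")
    if relation not in CONTROLLED_RELATIONS:
        raise ValueError(f"{relation!r} is not a controlled relation")
    return (subject, relation, obj)


# The two dataset statements from the slide, as regimented assertions.
ASSERTIONS = [
    make_assertion("dataset", "isA", "grouping"),
    make_assertion("dataset", "isPartOf", "dataset series"),
]

for subject, relation, obj in ASSERTIONS:
    print(subject, relation, obj)
```

A statement such as "dataset contains data" would be rejected here until "contains" is agreed on as a controlled relation.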

Start up Activities
• Scope
  • Needs to be practical and focused, but useful to the overall RDA effort.
  • Implied by scope of current documents and cited work
  • Input from RDA WGs
• Identify core concepts and terms
  • Growing list in current documents
  • Gather definitions from sources and interested parties
    • 3-4 examples discussed later at this session
  • Implied concepts that may be needed to bridge differences
• Gather and discuss ideas/needs/interests from RDA WGs
• Understand a concept-vocabulary development process
• Employ Guiding Principles
  1. Reuse; don’t re-invent
     • Adapt existing standards, methods and vocabulary that are fit for purpose.
     • Engage, test and validate terms in the community
     • Relevant WGs and Communities of Practice (CoI/CoP) for analysis, design, production

Many international standards and other specifications
• ISO 704:2000 – Terminology – Principles and methods
• ISO 1087-1:2000 – Terminology – Part 1: General vocabulary
• ISO/IEC 11179-3:2003 – Metadata registries – Part 3: Metamodel & basic attributes
• ISO/IEC 11179-6:2005 – Metadata registries – Part 6: Registration
• ISO/IEC 11404:2007 – General purpose datatypes
• ISO/IEC 19773:2011 – Metadata Modules
• Open Government Vocabularies – Content Model
  • Open Government Vocabularies Working Group
Data Element: A logical, identifiable unit of data that forms the basic organizational component in a database. Usually a combination of characters or bytes referring to one separate piece of information. A data element may combine with one or more other data elements or digital objects to form a digital record.
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fwww.w3.org%2F2011%2Fgld%2Fwiki%2Fimages%2F96%2FOpen_Government_Vocabualries_-_Content_Model_v02.doc

A Start on Vocabulary Analysis Process
• Identify concepts and concept relations implied by collected terms;
• Analyze and model concept systems on the basis of identified concepts and concept relations that are used to understand a term and its referent;
• Establish representations of concept systems through concept diagrams;
• Craft concept-oriented definitions as a concept base;
  • Test arrangement in taxonomical class hierarchy(s)
  • Add essential properties/attributes/slots to distinguish related concepts
  • Link concepts via relations… etc.
• Associate a designated vocabulary term to each concept (in one or more languages); and,
• Document the vocabulary in an agreed upon form,
  • perhaps starting as a structured glossary and supporting concept models
Adapted liberally from ISO TC 37 Standards Basic Principles of Terminology
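The step "test arrangement in taxonomical class hierarchy(s)" can be partly mechanized. The sketch below walks the broader-concept links of a small, assumed hierarchy (abstract object > data object > digital object, which is one possible reading of the deck's examples) and flags cycles. The `PARENTS` table and the `ancestors` function are hypothetical names.

```python
# Assumed "is a kind of" links: concept -> list of broader concepts.
PARENTS = {
    "digital object": ["data object"],
    "data object": ["abstract object"],
    "abstract object": [],
}


def ancestors(concept, parents):
    """Return all transitively broader concepts; raise ValueError on a cycle."""
    found = set()
    stack = [(concept, (concept,))]  # (node, path from the start concept)
    while stack:
        node, path = stack.pop()
        for parent in parents.get(node, ()):
            if parent in path:
                raise ValueError(f"cycle in hierarchy via {parent!r}")
            found.add(parent)
            stack.append((parent, path + (parent,)))
    return found


print(sorted(ancestors("digital object", PARENTS)))
# ['abstract object', 'data object']
```

A cycle (A is a kind of B, B is a kind of A) usually signals that two terms are really synonyms or that a definition needs sharpening.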

Vocabulary Design Process & Vocabulary Qualities
• Both analysis and design may employ conceptual modeling to capture the essential meaning and structure of the descriptions of the vocabulary.
• The product of this is some form of conceptual model.
• Desired Qualities
  1. Adequate capture of content intuitions, expressed by domain experts,
     1. in an understandable form
     2. includes details on constraining descriptions
     3. uses well-defined relations, taxonomic and others
     4. illustrated with examples
  2. Rigorous: stands up to rational analysis
  3. Minimally redundant: no unintended synonyms

Vocabulary & Model Artifact Designed by Rigorous Method
(Figure adapted liberally from Guarino’s 1998 Formal Ontology in Information Systems)
• World Situations: data on a wetlands
• Observation Interaction: act of observing a phenomenon, with the goal of producing an estimated property value (OGC O&M model)
• Data World Situations
• Conceptualization: starts to model (part of) the data world; expressed in a communicative form
• Our Vocabulary & Model Product: a specific artifact designed to express the intended meaning of a (shared) vocabulary
  • commits to certain relations, say subtypes: <RDA-DFT: data object> Subclass <RDA-DFT: digital object> …
• Intended Model: fitting the conceptualization

Simple Vocab Entry Example
• Data Object
  • Type of: Abstract Object
  • Sub-types: Data object, digital object, …
  • Definition: In computer science, an object is any entity that can be manipulated by the commands of a programming language, such as a value, variable, function, or data structure. (With the later introduction of object-oriented programming the same word, "object", refers to a particular instance of a class.)
    • http://en.wikipedia.org/wiki/Data_object
  • Definition 2: a Data Object is a dataset
  • Equivalent terms (other languages)…
  • Attributes…
  • Relations: a data element isPartOf Data Object…
  • Examples/Instances include: repository metadata, data models, databases, tables, views, files, entities, columns, data elements, and attributes. (Source: http://www.indiana.edu/~dss/Services/Naming/nvgglossary.html)
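An entry like this maps naturally onto a small record type. The following is a minimal sketch, assuming field names that mirror the slide's headings; `VocabularyEntry` and `render` are hypothetical and not an agreed DFT format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VocabularyEntry:
    """One structured glossary entry; field names follow the slide's headings."""
    term: str
    type_of: str
    sub_types: List[str] = field(default_factory=list)
    definitions: List[str] = field(default_factory=list)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)


def render(entry: VocabularyEntry) -> str:
    """Format an entry as plain-text glossary lines."""
    lines = [f"Term: {entry.term}", f"Type of: {entry.type_of}"]
    if entry.sub_types:
        lines.append("Sub-types: " + ", ".join(entry.sub_types))
    for number, definition in enumerate(entry.definitions, start=1):
        lines.append(f"Definition {number}: {definition}")
    for subject, relation, obj in entry.relations:
        lines.append(f"Relation: {subject} {relation} {obj}")
    if entry.examples:
        lines.append("Examples: " + ", ".join(entry.examples))
    return "\n".join(lines)


data_object = VocabularyEntry(
    term="Data Object",
    type_of="Abstract Object",
    sub_types=["digital object"],
    definitions=["a Data Object is a dataset"],
    relations=[("data element", "isPartOf", "Data Object")],
    examples=["databases", "tables", "files"],
)

print(render(data_object))
```

Keeping definitions as a list makes competing definitions visible side by side, which is useful while the community is still validating them.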

Refinement
• Community validation is necessary for standardization, but so is gradual formalization.
• Simplify analysis to understand data concepts
  • Figure labels: Data Reality; simplify; Informal Models; simplify more; Formal Models
  • From "Long-term Preservation for Spatial Data Infrastructures: a Metadata Framework and Geo-portal Implementation", http://www.dlib.org/dlib/september11/shaon/09shaon.html
• Range of formality: Natural Language… Controlled Vocabulary; Scientific “Models” (see above); Semantic Web formalisms: taxonomies, RDF(S), OWL…

Vocabulary can be Built out in Stages, for example:
• RDA Scope…: 12 months
• WG Scope: 9 months
• Core: 6 months
• Starter Set: 3 months

Draft Vocabulary Publication and Review
• Products
  • A reference document about DFT,
    • including a structured and well-documented vocabulary that distinguishes
      • Preferred, Admitted, Deprecated, Obsolete terms
  • Register the defined terms in an ISO-like concept registry so that everyone can easily refer to them
  • Create an accompanying abstract data organization/conceptual model that may also be expressed graphically.
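The four term statuses and the idea of a concept registry could be prototyped as below. This is a sketch under assumptions: `TermStatus`, `ConceptRegistry` and the rule "at most one preferred term per concept" are illustrative choices, not the API of any ISO registry.

```python
from enum import Enum


class TermStatus(Enum):
    PREFERRED = "preferred"
    ADMITTED = "admitted"
    DEPRECATED = "deprecated"
    OBSOLETE = "obsolete"


class ConceptRegistry:
    """Toy registry: each term names one concept and carries a status."""

    def __init__(self):
        self._terms = {}      # term -> (concept_id, status)
        self._preferred = {}  # concept_id -> preferred term

    def register(self, concept_id, term, status):
        if term in self._terms:
            raise ValueError(f"{term!r} is already registered")
        if status is TermStatus.PREFERRED:
            if concept_id in self._preferred:
                raise ValueError(f"{concept_id!r} already has a preferred term")
            self._preferred[concept_id] = term
        self._terms[term] = (concept_id, status)

    def lookup(self, term):
        """Return (concept_id, status, preferred term or None) for a term."""
        concept_id, status = self._terms[term]
        return concept_id, status, self._preferred.get(concept_id)


registry = ConceptRegistry()
registry.register("database", "database", TermStatus.PREFERRED)
registry.register("database", "databank", TermStatus.OBSOLETE)

concept_id, status, preferred = registry.lookup("databank")
print(concept_id, status.value, preferred)  # database obsolete database
```

Looking up an obsolete term returns the preferred one alongside it, so readers of older documents can still find the current concept.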

Thank You
Questions?

Backup Slides

Organizing Relations – For Example Kinds of “Structure”

Add Relations Incrementally: Richer Schemata & Reusable Patterns
• Simple Feature-State Model (from GRAIL) becomes a richer schema
• DO, … Every DO is a type of data element…
• Figure labels: Abstract Object; described by metadata; has associated symbolic content; binary form; may have parts; data element

The semantic problem: "What is a data aggregation?"
• In statistics, aggregate data describes data combined from several measurements. When data are aggregated, groups of observations are replaced with summary statistics based on those observations, like an average.
• In economics, aggregate data or data aggregates describes high-level data that is composed from a multitude or combination of other more individual data.
• But it could just be some type of merging of data and not integrated.
One term, three concepts.
What does “understanding” such things involve?
  – Terms & concepts (recognition, disambiguation via MEANING)
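The "one term, three concepts" situation can be made explicit by keying each sense on a subject field, in line with the ISO notion of a term as a designation within a specific subject field. The sketch below is illustrative only; the `SENSES` table, the field label "general" and the function names are assumptions, and the definitions are shortened from the slide.

```python
# One term, several senses, each tied to a subject field (labels are assumed).
SENSES = {
    "data aggregation": {
        "statistics": "data combined from several measurements, "
                      "replaced by summary statistics such as an average",
        "economics": "high-level data composed from a combination "
                     "of more individual data",
        "general": "some type of merging of data, not necessarily integrated",
    },
}


def is_ambiguous(term: str) -> bool:
    """A term is ambiguous if it designates more than one concept."""
    return len(SENSES.get(term.lower(), {})) > 1


def define(term: str, subject_field: str) -> str:
    """Pick the sense of a term for a given subject field."""
    senses = SENSES.get(term.lower(), {})
    if subject_field not in senses:
        raise KeyError(f"no sense of {term!r} in field {subject_field!r}")
    return senses[subject_field]


print(is_ambiguous("Data Aggregation"))  # True
print(define("data aggregation", "economics"))
```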

Terminological Services
• Common in medical realm

Controlled English tools

Understanding Why Data Is Structured as It Is
• Figure labels (data view): Place; Location code…; Place Has Location tuples; Repository event; admin event; Intersection is an artifact to link tables
• Figure labels (conceptual view): Physical Object Participates in Event; Place Identified by Type; Processes; "Have this static part"; "Need this Type"
• Entities are related meaningfully, and “attributes” are related entities.
• A deeper understanding of why the data values are what they are…