Data Frames Version 3 Proposal

  • Slides: 9

Data Frames Version 2

  Year matches [2]
    constant
      { extract "\d{2}"; context "([^$\d]|^)\d{2}[^,\dkK]"; } 0.5,
      { extract "\d{2}"; context "([^$\d]|^)\d{2},[^\d]"; } 0.6,
      { extract "\d{2}"; context "\b'\d{2}\b"; } 0.8;
  end;

  Mileage matches [8]
    constant
      { extract "\b[1-9]\d{1,2}k"; } 0.6,
      { extract "[1-9]\d?,\d{3}"; } 0.3;
    keyword "\bmiles\b", "\bmi\b";
  end;

• Also: except, substitute, filter phrases; lexicons
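As a rough illustration of how the Year rules might be applied, the sketch below runs each context pattern over the text, applies the extract pattern inside the context match, and keeps the highest confidence per match position. The patterns are the slide's expressions with the backslashes restored (an assumption, since the transcript dropped them), and `YEAR_RULES` and `extract_candidates` are hypothetical names, not part of Ontos.

```python
import re

# (extract, context, confidence) triples reconstructed from the Year data frame.
# Assumption: backslashes lost in the transcript are restored here.
YEAR_RULES = [
    (r"\d{2}", r"([^$\d]|^)\d{2}[^,\dkK]", 0.5),
    (r"\d{2}", r"([^$\d]|^)\d{2},[^\d]", 0.6),
    (r"\d{2}", r"\b'\d{2}\b", 0.8),
]


def extract_candidates(text, rules):
    """Return (value, confidence) pairs, best confidence per start position."""
    best = {}
    for extract, context, confidence in rules:
        for ctx in re.finditer(context, text):
            # The extract pattern is searched only inside the context match.
            inner = re.search(extract, ctx.group(0))
            if inner is None:
                continue
            start = ctx.start() + inner.start()
            if confidence > best.get(start, (None, 0.0))[1]:
                best[start] = (inner.group(0), confidence)
    return [best[pos] for pos in sorted(best)]
```

Under these assumptions, "Honda 97, red" yields a Year candidate through the second rule, while "Honda 85k miles" yields none because the first context excludes a trailing "k".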

Kimball’s Ontology Editor
• Still allow negation
• Introduce idea of “required context”
• Each phrase may be labeled
• Strong separation of value and keyword phrases
• Expressions are richer than regular expressions. Supports Boolean and proximity operators; also lexicons and macros.
• Allow keyword to be specific to a subset of the value phrases for this data frame

Internal Representation
• Replace SQL field length with arbitrary type field
  ◦ This is the “internal representation”
  ◦ Type is either lexical or nonlexical
  ◦ Type could be the name of an object set in the ontology
  ◦ Or it could be the name of a type in whatever language will be used to implement methods (more on this later), together with a units name (e.g. “miles”, “meters”, “grams”, “pounds”)
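One way to picture the proposed type field is as a small record carrying a type name, a lexical/nonlexical flag, and an optional units name. This is only a sketch; `InternalRepresentation` and its field names are hypothetical, and Python is used here as a stand-in for whatever method language is eventually chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InternalRepresentation:
    type_name: str               # object set name, or a type in the method language
    lexical: bool                # lexical vs. nonlexical
    units: Optional[str] = None  # e.g. "miles", "meters", "grams", "pounds"

    def describe(self):
        kind = "lexical" if self.lexical else "nonlexical"
        suffix = f" in {self.units}" if self.units else ""
        return f"{kind} {self.type_name}{suffix}"
```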

Methods
• Add a method phrase to data frames
  ◦ Conceptually they are restricted derived object sets and relationship sets
  ◦ We only declare method signatures in data frames
    ▪ Another language (e.g. Java) is used to define the method body
    ▪ Our tool will generate a template in which the programmer can write method bodies
    ▪ The template will have OO structures that allow read-only access to the seamless model/data instance
  ◦ Keyword phrases may also apply to methods
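A generated template of the kind described might look roughly like the following: signatures are fixed by the data frame, bodies are left to the programmer, and the model/data instance is exposed read-only. The sketch uses Python in place of Java for brevity, and `MileageMethods`, `to_kilometers`, and the dictionary-shaped instance are all hypothetical.

```python
from abc import ABC, abstractmethod
from types import MappingProxyType


class MileageMethods(ABC):
    """Hypothetical generated template: signatures only, no bodies."""

    def __init__(self, instance):
        # Wrap a copy of the instance so method bodies cannot modify it.
        self._instance = MappingProxyType(dict(instance))

    @property
    def instance(self):
        return self._instance

    @abstractmethod
    def to_kilometers(self, miles):
        """Signature declared in the data frame; body written by the programmer."""


class MyMileageMethods(MileageMethods):
    def to_kilometers(self, miles):
        return miles * 1.609344  # one international mile in kilometers
```

Attempting to assign into `instance` raises a `TypeError`, which is how this sketch approximates the read-only access requirement.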

Canonicalization Methods
• Each value phrase may have an associated canonicalization method
  ◦ The purpose is to convert the extracted value string into a common form
• The data frame may have a default canonicalization method that applies if there is no individual method for a value phrase
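The fallback rule can be sketched as a lookup that prefers a value phrase's own canonicalization method and otherwise uses the data frame default. `DataFrameSpec`, the phrase labels, and the two canonicalizers below are hypothetical names chosen for the Mileage example.

```python
def strip_commas(raw):
    """Canonicalize strings such as "85,000" to an integer mile count."""
    return int(raw.replace(",", ""))


def thousands_suffix(raw):
    """Canonicalize strings such as "85k" to an integer mile count."""
    return int(raw.rstrip("kK")) * 1000


class DataFrameSpec:
    def __init__(self, default_canonicalizer=None):
        self.default_canonicalizer = default_canonicalizer
        self.value_phrases = {}  # label -> canonicalizer, or None for "use default"

    def add_value_phrase(self, label, canonicalizer=None):
        self.value_phrases[label] = canonicalizer

    def canonicalize(self, label, raw):
        # The per-phrase method wins; otherwise fall back to the default.
        method = self.value_phrases[label] or self.default_canonicalizer
        return raw if method is None else method(raw)
```

With `strip_commas` as the default and `thousands_suffix` attached to one phrase, both "85k" and "85,000" would canonicalize to the same integer.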

Inheritance
• Inheritance is defined more cleanly
  ◦ Generalization/specialization will indicate inheritance hierarchy
  ◦ The internal representation cannot be overridden in specializations
  ◦ Multiple parents must have the same internal representation
  ◦ Individual inherited phrases can be deleted or overridden
  ◦ New phrases can be added
  ◦ In the case of name conflict, we require fully qualified names to be used (no automatic disambiguation)
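The inheritance rules can be sketched as a phrase lookup over a specialization and its parents. This is a minimal model under stated assumptions: `InheritedDataFrame`, `AmbiguousPhrase`, and the `Parent.label` qualified-name syntax are hypothetical, and phrases are reduced to plain strings.

```python
class AmbiguousPhrase(Exception):
    """Raised when an unqualified label is inherited from more than one parent."""


class InheritedDataFrame:
    def __init__(self, name, representation, parents=(), phrases=None, deleted=()):
        # Specializations may not change the representation, which also
        # forces multiple parents to agree with each other.
        for parent in parents:
            if parent.representation != representation:
                raise ValueError(
                    f"{name} must keep the internal representation of {parent.name}"
                )
        self.name = name
        self.representation = representation
        self.parents = tuple(parents)
        self.own = dict(phrases or {})  # added or overriding phrases
        self.deleted = set(deleted)     # inherited labels dropped here

    def lookup(self, label):
        if "." in label:  # fully qualified name: "Parent.label"
            owner, short = label.split(".", 1)
            for parent in self.parents:
                if parent.name == owner:
                    return parent.lookup(short)
            raise KeyError(label)
        if label in self.own:
            return self.own[label]
        if label in self.deleted:
            raise KeyError(label)
        found = []
        for parent in self.parents:
            try:
                found.append(parent.lookup(label))
            except KeyError:
                pass
        if not found:
            raise KeyError(label)
        if len(found) > 1:  # no automatic disambiguation
            raise AmbiguousPhrase(f"{label!r} needs a fully qualified name")
        return found[0]
```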

General Constraints
• We may decide to implement a limited form of general constraint in the ontology
  ◦ E.g. “Birth Date <= Death Date”
  ◦ Or “Event Distance.toMiles() <= 26”
• If so, we may want to implement operator overloading (something like C++)
• The general constraint issue is not core to the current data frame discussion, but it has interesting ramifications
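To illustrate what operator overloading would buy for such constraints, the sketch below defines a unit-aware distance type whose `<=` compares values after conversion to miles. It is written in Python rather than C++; the `Distance` class, the `to_miles` method (a stand-in for the slide's method call), and the conversion table are assumptions for illustration only.

```python
from dataclasses import dataclass

# Conversion factors to miles (international mile = 1609.344 meters).
_TO_MILES = {
    "miles": 1.0,
    "kilometers": 1 / 1.609344,
    "meters": 1 / 1609.344,
}


@dataclass(frozen=True)
class Distance:
    amount: float
    units: str

    def to_miles(self):
        return self.amount * _TO_MILES[self.units]

    def __le__(self, other):
        # Overloaded <= so a constraint can compare distances in mixed units.
        if not isinstance(other, Distance):
            return NotImplemented
        return self.to_miles() <= other.to_miles()


def distance_constraint_holds(event_distance, limit_miles=26):
    """Check a constraint of the form 'distance in miles <= limit'."""
    return event_distance.to_miles() <= limit_miles
```

A date constraint such as the Birth Date example needs no extra work in Python, since `datetime.date` already supports `<=`; in a language without that support, the same overloading approach would apply.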

Other Issues
• How to integrate methods and confidence values into record-assembly heuristics
• Ontos system will have to be rewritten
• Extract into model instance, not SQL tables
  ◦ We can always generate database tables later if we’d like
• Ontologies as XML created graphically and stored