Some LIRICS topics
Peter Wittenburg, Marc Kemps-Snijders
MPI for Psycholinguistics
1 hour ?
LMF Topics
• is this LIRICS?
• what is LMF compliance? Some ideas about header and related standards (UNICODE)
• docking mechanism of components
• relation mechanism
• operation mechanism
• this is LIRICS for MPI and Sheffield primarily
• DCR Syntax API
• LMF API
• lexicon and component registries incl. metadata
DCR API 1
• tools such as GATE, LEXUS, ANNEX, … have to make use of the ISO DCR (and probably other ontologies/concept registries)
• so we need an API (that can also be re-used)
• the API finally has to be delivered as a web service including all aspects
  • UDDI layer to search/browse for services
  • WSDL layer to describe the interface (methods, …)
  • SOAP layer for message exchange
• SYNTAX ISO DCR was not set up as a service, but as a management tool for editing boards
• therefore a split into the final API (the ideal) and a first-phase API
• in the first phase: no web service; URL and details of services are known
• everything that follows is the result of smooth interaction with the LORIA folks
DCR API 2
• function List loadProfiles()
  • give me all profiles in DCR
• function List loadDataCategories(aProfile, aWorkingLanguage, aObjectLanguage)
  • give me all datcats for a certain profile; result is perhaps a structure
• function List searchDataCategories(aQueryString, aProfile, aWorkingLanguage, aObjectLanguage)
  • search for a datcat by specifying some pattern, mostly a name
• function DataCategory loadDataCategoryReduced(URID, aWorkingLanguage, aObjectLanguage)
  • give me some info for a certain datcat (ID, definition, conceptual domain)
• function DataCategory loadDataCategory(URID, aWorkingLanguage, aObjectLanguage)
  • give me all info for a certain datcat for specified languages
• function List loadAllTopBroaderGenericConcepts()
  • give me all top conceptual domains
• function List giveLinks(aDataCategory)
  • give me additional information such as constraints (to be worked out!)
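To make the read-only calls above concrete, here is a minimal sketch of how they could behave against an in-memory stand-in for the registry. It is only an illustration: the class `InMemoryDCR`, the snake_case method names, and the two sample data categories are invented for this example, and the working-language and object-language parameters are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    urid: str
    name: str
    definition: str
    conceptual_domain: str
    profiles: list = field(default_factory=list)


class InMemoryDCR:
    """Toy stand-in for the ISO DCR; not the real SYNTAX backend."""

    def __init__(self, categories):
        self._by_urid = {c.urid: c for c in categories}

    def load_profiles(self):
        # "give me all profiles in DCR"
        return sorted({p for c in self._by_urid.values() for p in c.profiles})

    def load_data_categories(self, profile):
        # "give me all datcats for a certain profile"
        return [c for c in self._by_urid.values() if profile in c.profiles]

    def search_data_categories(self, query, profile):
        # Pattern search on the name only; case-insensitive substring match.
        q = query.lower()
        return [c for c in self.load_data_categories(profile) if q in c.name.lower()]

    def load_data_category_reduced(self, urid):
        # Reduced view: ID, definition, conceptual domain.
        c = self._by_urid[urid]
        return {
            "id": c.urid,
            "definition": c.definition,
            "conceptual_domain": c.conceptual_domain,
        }


# Invented sample content, for illustration only.
dcr = InMemoryDCR([
    DataCategory("dc-1", "partOfSpeech", "category of a word",
                 "morphosyntax", ["lexicon", "morphosyntax"]),
    DataCategory("dc-2", "grammaticalGender", "gender class",
                 "morphosyntax", ["morphosyntax"]),
])
```

A tool such as LEXUS would first call `load_profiles()`, then narrow down with `load_data_categories()` or `search_data_categories()`, and fetch the reduced record only for the datcat the user selects.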
DCR API 3
• function DataCategory loadBroaderGenericConceptDataCategory(aDataCategory/URID)
  • give me a broader concept for a name or URID
• function List loadDataCategoriesUsingBroaderGenericConcept(aDataCategory)
  • give me all datcats for a broader concept
• function Workspace openWorkspace(aUserName, aUserLogin)
  • open a private workspace for a user
• function DataCategory AddDataCategoryToWorkspace(aWorkspace, aDataCategory, aStatus)
  • add a datcat to a workspace
• log in to the system
• synchronize with a given cache
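The broader-concept and workspace calls can be sketched in the same spirit. Everything below is an assumption made for illustration: the `ToyDCR` class, the `broader` link stored on each data category, the plain-text login check, and the sample data are not taken from the DCR itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataCategory:
    urid: str
    name: str
    broader: Optional[str] = None  # URID of the broader generic concept, if any


@dataclass
class Workspace:
    user_name: str
    items: dict = field(default_factory=dict)  # URID -> status


class ToyDCR:
    def __init__(self, categories, logins):
        self._cats = {c.urid: c for c in categories}
        self._logins = logins          # user name -> login (hypothetical check)
        self._workspaces = {}

    def load_broader_generic_concept(self, urid):
        broader = self._cats[urid].broader
        return self._cats[broader] if broader else None

    def load_data_categories_using_broader(self, urid):
        return [c for c in self._cats.values() if c.broader == urid]

    def open_workspace(self, user_name, user_login):
        if self._logins.get(user_name) != user_login:
            raise PermissionError("unknown user or wrong login")
        # One private workspace per user; reopened on later calls.
        return self._workspaces.setdefault(user_name, Workspace(user_name))

    def add_data_category_to_workspace(self, workspace, urid, status):
        workspace.items[urid] = status
        return self._cats[urid]


# Invented sample content, for illustration only.
toy = ToyDCR(
    [
        DataCategory("dc-0", "morphosyntax"),
        DataCategory("dc-1", "partOfSpeech", "dc-0"),
        DataCategory("dc-2", "grammaticalGender", "dc-0"),
    ],
    {"alice": "secret"},
)
```

Synchronizing with a cache is not covered here; it would need a change marker per datcat, which the slides leave open.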
LMF Registries General
• an LMF API only makes sense if you have a service
• a service only makes sense if you can serve something
• what can LMF services serve:
  • lexical schemas
  • extensions, i.e. ready-made components created by someone
  • LMF-compliant lexica created by someone
  • other lexica-related information
• so we need registration services
• MPI will start doing so since we need it now
• will set things up similar to IMDI
• all is open (registries and portal code)
• everyone can easily set up his/her own portal
• we will and have to synchronize about various things
• perhaps people will like it
LMF Registries
• registries must offer the following services
  • register and store a lexical schema
  • register and store a lexical component schema
  • register (and store) a lexicon (storage can be anywhere)
  • delete an entry
  • modify an entry
  • let the user browse or search for lexica
  • let the user browse or search for schemas, metadata-based
• as a start for the metadata we suggest the stuff that came out of the discussions in ISLE/MILE (see IMDI)
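The list of registry services maps onto a small register/delete/modify/search interface. The sketch below assumes a single in-memory registry with free-form metadata dictionaries; the `Registry` class, the three `kind` values, and the ID format are invented for this example and say nothing about how the MPI registries are actually built.

```python
import itertools


class Registry:
    """Toy registry for schemas, component schemas and lexica."""

    KINDS = {"schema", "component_schema", "lexicon"}

    def __init__(self):
        self._entries = {}
        self._ids = itertools.count(1)

    def register(self, kind, metadata, location=None):
        # A lexicon may live anywhere, so only its location is recorded.
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind}")
        entry_id = f"{kind}-{next(self._ids)}"
        self._entries[entry_id] = {
            "kind": kind,
            "metadata": dict(metadata),
            "location": location,
        }
        return entry_id

    def delete(self, entry_id):
        del self._entries[entry_id]

    def modify(self, entry_id, **metadata):
        self._entries[entry_id]["metadata"].update(metadata)

    def search(self, kind, **criteria):
        # Metadata-based search: every given key must match exactly.
        return sorted(
            entry_id
            for entry_id, entry in self._entries.items()
            if entry["kind"] == kind
            and all(entry["metadata"].get(k) == v for k, v in criteria.items())
        )


registry = Registry()
schema_id = registry.register("schema", {"name": "core"})
# The location below is a placeholder, not a real resource.
lexicon_id = registry.register(
    "lexicon", {"language": "nl"}, location="https://example.org/demo-lexicon"
)
registry.modify(lexicon_id, language="de")
```

Calling `registry.search("lexicon", language="de")` then returns the ID of the modified lexicon; browsing is the same call without criteria.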
LMF API 1
• if we have found a lexicon, what then …
• services based on web services
• UDDI level: why not the ISLE/MILE stuff
• function LexicalDatabase createLexicalDatabase(Name)
  • create a lexicon in a workspace
• function LexicalDatabase loadLexicalDatabase(URID)
  • give me a certain lexicon (into workspace or local?)
• function LexicalDatabase loadLexicalDatabaseDetail(URID, aStructure)
  • give me a certain lexicon part (filtering; into workspace or local?)
• function void storeLexicalDatabase(LexicalDatabase)
  • store/upload a lexical database
• function LexicalEntry createLexicalEntry(URID, LexicalDatabase)
  • create/add a lexical entry
• function LexicalEntry loadLexicalEntry(URID)
  • load a lexical entry
• function void storeLexicalEntry(URID, LexicalEntry)
  • update a lexical entry
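One way to read the create/load/store calls is as a split between a stored copy on the server and a working copy in the caller's hands. The sketch below assumes exactly that; `LexiconServer`, the dictionary layout, and the extra database URID on the entry calls are choices made for this illustration (the slide's `loadLexicalEntry` takes only one URID).

```python
import copy


class LexiconServer:
    """Toy server: stored copies are isolated from working copies."""

    def __init__(self):
        self._stored = {}

    def create_lexical_database(self, name):
        # Creates a working copy only; nothing is stored yet.
        return {"urid": f"lexicon:{name}", "name": name, "entries": {}}

    def store_lexical_database(self, database):
        self._stored[database["urid"]] = copy.deepcopy(database)

    def load_lexical_database(self, urid):
        return copy.deepcopy(self._stored[urid])

    def create_lexical_entry(self, urid, database):
        entry = {"urid": urid, "units": {}}
        database["entries"][urid] = entry
        return entry

    def load_lexical_entry(self, database_urid, entry_urid):
        return copy.deepcopy(self._stored[database_urid]["entries"][entry_urid])

    def store_lexical_entry(self, database_urid, entry_urid, entry):
        self._stored[database_urid]["entries"][entry_urid] = copy.deepcopy(entry)


server = LexiconServer()
db = server.create_lexical_database("demo")
entry = server.create_lexical_entry("e1", db)
entry["units"]["lemma"] = "bank"
server.store_lexical_database(db)

# Editing a loaded copy does not touch the stored lexicon until it is stored again.
local = server.load_lexical_database("lexicon:demo")
local["entries"]["e1"]["units"]["lemma"] = "changed"
```

Whether a load lands in a server-side workspace or on the local machine is the open question on the slide; the deep copy here only stands in for either choice.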
LMF API 2
• function List searchLexicalEntries(aQuery)
  • search for lexical entries matching the string, unstructured
• function List searchLinguisticInformationUnits(aQuery, aStructure)
  • search for lexical entries matching the string on specific attributes, returning substructures (filtering)
In addition
• function LexicalDatabaseSchema loadSchema(URID)
  • give me a schema for a specific lexicon
• function void store(LexicalDatabaseSchema)
  • store and register a schema
• function GlobalInfo loadGlobalInfo(URID)
  • give me the metadata/globalInfo for a lexicon
• function void storeGlobalInfo(GlobalInfo)
  • store and register a lexicon with metadata
• function List searchLexicalDatabase(aQuery)
  • search for lexica based on metadata
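The difference between the two search calls is easiest to see on flat toy entries. In the sketch below, entries are plain dictionaries and `structure` is just a list of attribute names; both are simplifying assumptions, as are the snake_case function names and the `pos` values in the sample data.

```python
def search_lexical_entries(entries, query):
    """Unstructured search: match the string in any attribute value."""
    q = query.lower()
    return [e for e in entries if any(q in str(v).lower() for v in e.values())]


def search_linguistic_information_units(entries, query, structure):
    """Structured search: match only on the named attributes and
    return just those attributes (filtering)."""
    q = query.lower()
    hits = []
    for e in entries:
        if any(q in str(e.get(attr, "")).lower() for attr in structure):
            hits.append({attr: e[attr] for attr in structure if attr in e})
    return hits


# Illustrative entries; the "pos" values are invented.
entries = [
    {"lemma": "bank", "gloss": "something broad to sit on", "pos": "noun"},
    {"lemma": "schmal", "gloss": "contrary to broad", "pos": "adjective"},
]
```

Here `search_lexical_entries(entries, "broad")` finds both entries through their glosses, while `search_linguistic_information_units(entries, "broad", ["lemma"])` finds nothing, because the match is restricted to the lemma.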
LMF API 3
• what about
  • relations
  • where to store relations: need a registry mechanism
  • if there is one integrated domain of lexica, relations can be registered under this common root
  • Gil: took UML; UML has everything in it, so relations are in UML as well, so why bother
  • Peter: where to register is the question
• these were just first ideas!!
• Monica/Thierry have built a tool for the constraints
What else: Relations
• actually component association is a relation of a special type
• need various types of relations between attributes and units in value strings
• each relation can be associated with features, i.e. relations can be seen as components in their own right
[Figure: example entries linked by relations. bank: “breite Sitzgelegenheit” (something broad to sit on); Sitzgelegenheit: “etwas um zu sitzen” (something to sit on); schmal: “Gegenteil zu breit” (contrary to broad)]
Relation Mechanism
• need a generalized relation mechanism (look in Parole lexica etc.)
• prefer very simple graphics instead of UML hiding the essentials
• relation components are almost normal components, i.e. they can have components and datcats
• however, they don’t have a parent
• do we need the distinction between “to” and “from” in general relations??
• component reference is a special type of relation; here we need to distinguish “to” and “from”
• added a few additional things in the paper (direction)
[Figure: relation V (type = any) between component K and component L, with a “from” cardinality and a “to” cardinality (1..N)]
[Figure: relation U (type = refine) from component X to component Y, with cardinalities 1..1 and 1..N]
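As a data structure, the relation mechanism described above could look roughly like this. The sketch assumes that a relation is a component-like record (its own datcats and sub-components, no parent) with a type, two ends, and a cardinality per end; the class and field names, and the assignment of 1..1 and 1..N to the two ends, are this example's reading of the diagrams rather than anything fixed by LMF.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    datcats: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


@dataclass
class Relation:
    """Almost a normal component: has datcats and children, but no parent."""
    name: str
    type: str
    source: Component                   # the "from" end
    target: Component                   # the "to" end
    from_cardinality: tuple = (1, 1)    # (min, max); max None means unbounded
    to_cardinality: tuple = (1, None)
    datcats: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def check_cardinality(count, cardinality):
    """True if `count` linked items satisfy a (min, max) cardinality."""
    low, high = cardinality
    return count >= low and (high is None or count <= high)


# Relation U of type "refine" from component X to component Y.
x, y = Component("X"), Component("Y")
u = Relation("U", "refine", source=x, target=y)
```

For an undirected relation such as V (type = any), `source` and `target` would simply be treated as interchangeable, which is one way to answer the open “to”/“from” question.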
What else: conditions (operations)
• just one example from DOBES
• probably better examples around
• general pattern: if value(X) then modify constraints(Y) etc.
[Figure: head with lexemtype; if lexemtype = “stem | idiom | lexical word”: sense with nr, outer-body-L, meaning; if lexemtype = “auxil | inflect affix” etc.: sense with nr, meaning, effect, categorial effect, etc.]
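The pattern "if value(X) then modify constraints(Y)" can be sketched as a small rule table. The grouping of sense fields per lexemtype below follows one reading of the flattened DOBES diagram and may not match the original exactly; the names `RULES` and `allowed_sense_fields` are invented for this example.

```python
# Each rule: (set of lexemtype values, sense fields allowed for them).
# The field grouping is an assumption based on the slide's diagram.
RULES = [
    ({"stem", "idiom", "lexical word"}, ["nr", "outer-body-L", "meaning"]),
    ({"auxil", "inflect affix"}, ["nr", "meaning", "effect", "categorial effect"]),
]


def allowed_sense_fields(lexemtype):
    """Return the sense fields permitted for a given lexemtype value."""
    for lexemtypes, fields in RULES:
        if lexemtype in lexemtypes:
            return fields
    return []


def violations(entry):
    """List sense fields that the entry uses but its lexemtype does not allow."""
    allowed = set(allowed_sense_fields(entry["lexemtype"]))
    return sorted(set(entry["sense"]) - allowed)
```

A schema editor could run `violations()` after every change of the `lexemtype` value to tell the user which fields have become invalid.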
Operation Mechanism
• well, nothing special perhaps (operators as datcats, see Gil)
• but need a sequence of operations
• but we need to be able to add complex operations (code); then we need an invocation and interfacing standard
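A very small version of such an invocation standard: every operation is registered under a name (which could be a datcat) and takes an entry plus keyword parameters, returning a new entry. This is only a sketch; the registry `OPERATIONS`, the decorator, and the two sample operations are invented for illustration.

```python
OPERATIONS = {}


def operation(name):
    """Register a function under an operation name (e.g. a datcat name)."""
    def register(func):
        OPERATIONS[name] = func
        return func
    return register


@operation("lowercase")
def lowercase(entry, attribute):
    # Returns a new entry; the input is left unchanged.
    return {**entry, attribute: entry[attribute].lower()}


@operation("set_default")
def set_default(entry, attribute, value):
    return entry if attribute in entry else {**entry, attribute: value}


def run_sequence(entry, steps):
    """Apply operations in order; each step is (operation name, parameters)."""
    for name, params in steps:
        entry = OPERATIONS[name](entry, **params)
    return entry


result = run_sequence(
    {"lemma": "Bank"},
    [
        ("lowercase", {"attribute": "lemma"}),
        ("set_default", {"attribute": "pos", "value": "unknown"}),
    ],
)
```

Because every operation shares the same calling convention, a complex operation written as code plugs into the sequence exactly like a simple one.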
LEXUS etc
• we need to go ahead since we have to deliver usable infrastructures
• so we hope for critical comments and fast convergence
• relation mechanism is next on our action list
• LMF API is relevant for us since we have to combine LEXUS and LAMUS
• LAMUS = Language Archive Management and Upload System
  • is ready and working for simple objects (annotated media, …)
  • but needs to handle complex objects such as lexica
  • metadata is done: people can integrate and search for lexica
• registries for schemas and sub-schemas come next as well
LEXUS state
• ISO DCR integrated, Shoebox MDF as well, GOLD to come
• private and protected workspace is ready
• Shoebox/CHAT filters ready, XML grabbing to come
• first cross-lexicon search is ready
• working on private DCR (stripped but compliant Syntax)
• working on Concept Profiles (bottom-up generated concept lists)
• working on tools to link bottom-up stuff with ISO, …
• working on easy mapping framework
• first interaction with corpora is ready
• first merging is integrated
• what else??
Logging onto the application: Users must authenticate before logging onto the application. (Workshop ‘Lexical Databases and digital tools’, Nijmegen, April 2004)
User workspace: Each user has his/her own personal workspace where private lexica are stored.
Lexicon creation: New lexica may be created…
Lexicon import: New lexica may be imported from a lexical resource…
Lexicon structure: The LMF core model can be identified in this simple structure. Components and data categories can be identified using different icons. All may be dynamically created or modified.
Lexicon structure: Representation of a more complex structure. By selecting a node in the tree, the content of a component or data category is shown and may be modified.
Data category selection: Data categories can easily be selected from data category registries.
Lexical entry overview: Overview of lexical entries. By selecting a lexical entry, the details will be revealed.
Lexical entry details: Details of a lexical entry. Entry structure modifications are bound to the schema definition, e.g. cardinality.
Lexical entry details: Attribute values can be easily modified. Various value types are supported (text, video, audio, image or file).
Lexical entry details: Example of uploading a video file.
Lexical entry details: Viewing multimedia content.
Alternative entry view: Alternative views are provided which may be customized in look and feel.
Synchronization of lexica: Lexica may be copied to and modified in the personal workspace. [Diagram: Personal Workspace, Main Lexicon]
Synchronization of lexica: Lexica may be synchronized with the main lexicon. [Diagram: Personal Workspace, Main Lexicon]
Synchronization of lexica: When synchronizing lexica, the user is notified of structural changes and is in total control of the synchronization process.
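The synchronization step, where the user is told about structural changes and decides what happens, could be sketched as a diff plus a per-change confirmation callback. This is not how LEXUS implements it; the schema representation (path mapped to cardinality), the function names, and the `accept` callback are assumptions made for this example.

```python
def structural_changes(main_schema, workspace_schema):
    """Compare two schemas given as {path: cardinality} dictionaries."""
    main, work = set(main_schema), set(workspace_schema)
    return {
        "added": sorted(work - main),
        "removed": sorted(main - work),
        "changed": sorted(
            path for path in main & work
            if main_schema[path] != workspace_schema[path]
        ),
    }


def synchronize(main_schema, workspace_schema, accept):
    """Merge workspace changes into the main schema.

    `accept(action, path)` is asked once per structural change, so the
    user stays in control of every step.
    """
    changes = structural_changes(main_schema, workspace_schema)
    merged = dict(main_schema)
    for path in changes["added"] + changes["changed"]:
        if accept("update", path):
            merged[path] = workspace_schema[path]
    for path in changes["removed"]:
        if accept("remove", path):
            del merged[path]
    return merged


main = {"entry/lemma": "1..1", "entry/sense": "1..N", "entry/note": "0..1"}
work = {"entry/lemma": "1..1", "entry/sense": "0..N", "entry/example": "0..N"}

# This user accepts everything except the removal of "entry/note".
merged = synchronize(main, work, lambda action, path: path != "entry/note")
```

In a real interface the callback would be a dialog; here a lambda plays the user.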
Future directions
• Support for various types of relations
• Import of data from other sources
• Support for other Data Category Registries, e.g. GOLD
• Integration with MPI archive
• Integration with exploitation tools (ELAN, ANNEX)
• Miscellaneous user requests