RECENT TRENDS IN METADATA GENERATION Milena Dobreva, Nikola Ikonomov IMI-BAS SEEDI Conference, Cetinje, September 2007

Warm-up question:
• Do we need this talk? We are all metadata experts!

A Metadata Metaphor?
• We know there are
  − Many standards
  − Too many ad-hoc solutions
• We even know how to use some of them (or at least which is the right one for our project)
• BUT we typically do not know
  − How to save time and human effort in creating and editing metadata

The current picture
• We cannot avoid looking for ways to save time and effort, because
  − We live in a time of data deluge
  − The number of born-digital objects grows rapidly, i.e. the demand for metadata, and for metadata quality, grows with it

Metadata in the Digital Library Context: the DELOS project reference model

Metadata seems to belong only to CONTENT, but it influences all core concepts
• Content is the entry point for all concepts related to the content that the DL manages and disseminates, e.g. collections, information space model, metadata, ontologies;
• User is the root for concepts such as roles, communities and profiles that represent aspects of the DL users;
• Functionality is the entry point to the part of the model that concerns DL functions;
• Architecture covers software components, hosting nodes, and how these are linked and constrained;
• Quality groups the qualitative parameters that characterize the behavior of the digital library within a given operational domain;
• Policy covers all concepts related to established procedures or plans of action governing the DL, such as collection management, preservation and access rights.

Definitions
• Recall: the proportion of all relevant documents that are retrieved;
• Precision: the proportion of retrieved documents that are relevant;
• Accuracy: the number of retrieved documents that match the topic exactly.
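The recall and precision definitions above can be illustrated with a small sketch. The function name `precision_recall` and the document identifiers are hypothetical and serve only as an example; the code assumes that the retrieved and relevant documents are available as collections of IDs.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs.

    Hypothetical helper for illustration; not taken from any tool
    mentioned in the talk.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    # Precision: share of the retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of all relevant documents that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: 4 documents retrieved, 5 relevant, 2 in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.50 recall=0.40"
```

In this example a system that returns fewer documents could raise precision while lowering recall, which is why the two measures are usually reported together.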

Automatic extraction of metadata
• A group of NLP methods: text analysis aimed at extracting specific metadata elements
• Various elements
• Measurement: through information retrieval measures (accuracy, recall, precision)
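As a rough illustration of extracting specific metadata elements from text and measuring the result, the sketch below uses two deliberately naive rules: the first non-empty line is taken as the title, and the first four-digit year is taken as the date. The rules, the function names `extract_metadata` and `element_accuracy`, and the sample text are all assumptions made for this example; real extraction tools rely on far richer NLP methods.

```python
import re

# Assumption: a plausible publication year is a 4-digit number from 1500 to 2099.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def extract_metadata(text):
    """Toy rule-based extractor returning a dict with 'title' and 'year'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if lines else None  # rule 1: first non-empty line
    match = YEAR_RE.search(text)         # rule 2: first year-like number
    year = int(match.group(1)) if match else None
    return {"title": title, "year": year}


def element_accuracy(predicted, gold):
    """Fraction of gold elements whose extracted value matches exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return correct / len(gold)


# Made-up sample document and hand-made reference metadata.
sample = """Recent Trends in Metadata Generation
Presented at the SEEDI Conference, Cetinje, September 2007
"""
predicted = extract_metadata(sample)
gold = {"title": "Recent Trends in Metadata Generation", "year": 2007}
score = element_accuracy(predicted, gold)
print(predicted, score)
```

Comparing extracted values against hand-made reference metadata, element by element, is one simple way to apply the measures listed above to an extraction tool.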

Current research

Current research (cont’d)

Current research (cont’d)

Conclusions
• These tools are all used for processing English texts; the Balkan languages pose additional challenges
• The quality of the results achieved is not yet high enough, but this is a field of active work
• Integration of image and text processing is another direction for future work.

Thank you for your attention!