CLaDA-BG Overview, Kiril Simov and Petya Osenova
CLaDA-BG Overview
Kiril Simov and Petya Osenova
Advisory Board Meeting, 4-5 July 2019, Sofia, Bulgaria
Outline
• CLARIN ERIC and DARIAH ERIC
• CLaDA-BG: Bulgarian CLARIN and DARIAH research infrastructure for resources and technologies for linguistic, cultural and historical heritage
• Centers
• Applications
• Conclusions
CLARIN Mission
Common Language Resources and Technology Infrastructure
  ◦ to create a European federation of existing digital repositories that include language-based data
  ◦ with uniform access to the data wherever it is
  ◦ with easy access to existing language and speech technology tools as web services, to retrieve, manipulate, enhance, explore and exploit the data
  ◦ with researchers in the social sciences and humanities as its primary target audience (i.e., generally people without a technological background)
DARIAH Mission
Construction of a virtual network of web-based data and technological services for Cultural and Historical Heritage Data Collections and Technologies
  ◦ Computer network of repositories and technologies with an internet interface
  ◦ Electronic content
  ◦ Software technologies
CLaDA-BG: Bulgarian CLARIN and DARIAH
• Construction of a Virtual Research Infrastructure for Language Technology, Art and Historical Heritage
• Language Technology for End Users (researchers, teachers, etc.)
• Semantic Technology for Artifact Modeling
• Linked Open Data for Language Technology and Resources Description
• 3D Technologies for digitization, post-processing, and collection organization and visualization
CLaDA-BG Consortium: Technological Partners
• Institute of Information and Communication Technologies at BAS (IICT-BAS)
• Institute of Mathematics and Informatics at BAS (IMI-BAS)
• Sofia University "St. Kliment Ohridski" (SU)
• Ontotext AD (Onto) – Semantic Web Technology Company
• New Bulgarian University (NBU)
• "Konstantin Preslavski" University of Shumen (ShU)
• Bulgariana - a non-profit association for preservation of cultural heritage (Bulgariana)
CLaDA-BG Consortium: Content and Expertise Providing Partners
• Cyril and Methodius Research Centre at BAS (CMRC-BAS)
• Institute of Balkan Studies with Thracology Center at BAS (IBSTC-BAS)
• Institute of Ethnology and Folklore Studies with Ethnographic Museum at BAS (IEFSEM-BAS)
• Regional Library "Ivan Vazov" - Plovdiv (RLIV-Plovdiv)
• The South-West University "Neofit Rilski" (SWU)
• Burgas Free University (BFU)
• Sirma Media (SM)
• Sofia Museum of History (SMH)
CLaDA-BG-A
Infrastructure Center CLaDA-BG-A, coordinated by IICT-BAS
• Its mission will be to offer services that are relevant for the infrastructure as a whole and that need to be offered at a high level of commitment (availability, persistence)
• Services:
  ◦ Federated search service
  ◦ Joint metadata portal
  ◦ Data category registration service
  ◦ Schema registration services, connection to other EU CLARIN and DARIAH centers
CLaDA-BG-B
Language Data and Technology Center CLaDA-BG-B01, coordinated by IICT-BAS
• Its mission will be to offer stable and persistent access to the resources it stores and the tools deployed at the centre, via specified, CLARIN ERIC compliant interfaces
• Services:
  ◦ Corpora management
  ◦ Remote access to language tools
CLaDA-BG-C
Language Data and Technology Center CLaDA-BG-C01, coordinated by IMI-BAS
• Its mission will be to offer machine-readable metadata in a stable and persistent way, allowing service providers to harvest the metadata and make it browsable, searchable and combinable
• Services:
  ◦ Metadata standard localization
  ◦ Metadata editors for object description records
CLaDA-BG-D
• National Centers of type D: CLaDA-BG-D01, CLaDA-BG-D02, CLaDA-BG-D03
• The centers will be specific to the requirements defined by DARIAH-ERIC
• The centers of this type will provide technological services related to:
  ◦ Digitization of arts and humanities data, and
  ◦ Data services for management of and access to actual data collections
CLaDA-BG-D01
3D Technologies Center CLaDA-BG-D01, coordinated by IICT-BAS
• It will provide a comprehensive set of technologies for 3D processing of cultural and historical heritage (CHH) artifacts. This will include:
  ◦ 3D digitization
  ◦ Post-processing of 3D models
  ◦ 3D collection organization and visualization
CLaDA-BG-D02
Semantic Technology Center CLaDA-BG-D02, coordinated by Ontotext/IICT-BAS
• It will provide services for conceptual modeling of CHH data. This will include:
  ◦ Ontology and Linked Open Data for CHH
  ◦ Search and Visualization
  ◦ Support of Research Tasks
CLaDA-BG-D03
CHH Data Collections Management Center CLaDA-BG-D03, coordinated by Sofia University and IMI-BAS
• It will provide services for long-term preservation of and access to CHH data collections
• Existing data collections will be made available and aggregated in Europeana via the center
• Services:
  ◦ Editors for collection management
  ◦ Visualization of data collections
CLaDA-BG-K01
Knowledge Center CLaDA-BG-K01, coordinated by IICT-BAS
• Its mission will be to offer expertise and advice on matters that help researchers make easy use of the CLaDA-BG services
• Services:
  ◦ Service documentation and courses
  ◦ Studying the user needs
  ◦ Joint Master Programme in Language, Semantics and Data Technologies
CLaDA-BG Use Cases
• Use cases will provide testbeds for the technologies, resources and tools
• Use cases are useful applications in themselves
• They could be a basis for industrial applications
• For each of the infrastructures there are three use cases
• Active participation of Content and Expertise Providing Partners
CLaDA-BG Use Cases
• Language Technologies for Disabled People
• Language Technologies for eLearning
• Language Technologies for Political Studies
• Re-use of Digitized Cultural Content
• Enrichment of digital cultural content for creative industries
• Cultural and Historical Heritage for Education
Research in SS&H
• A huge variety of research objects
• A wide range of specific technologies for creation, generalization, storage, search, etc.
• CLaDA-BG will not be able to construct services for each of them
• Research in SS&H requires us to put the various types of data in the context of each other
• We call this process of data integration contextualization
CLaDA-BG Contextualization
We consider contextualization as a network of interlinked descriptions of people, events, geographical entities, objects, documents, authors, opinions, etc.
  ◦ People – biographical data – events in their lives, their roles
  ◦ Geographical entities – history of cities, etc.
  ◦ Objects – creation, discovery
  ◦ Events – place, time, participants, connection to other events
  ◦ Documents – authors, contents, opinions about people, events, …
  ◦ Text as the main source of information
  ◦ Technology – Linked Open Data and Knowledge Graph
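As an illustration only, the idea of a network of interlinked descriptions can be sketched as a small set of subject-predicate-object triples. Everything in this sketch is an assumption made for the example: the entity identifiers (`person:A`, `event:E1`, ...), the predicate names, and the helper functions `build_index` and `context` are hypothetical and are not taken from the CLaDA-BG data or software. An actual Linked Open Data setup would more likely use RDF and a triple store.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); placeholders, not real data.
TRIPLES = [
    ("person:A", "participated_in", "event:E1"),
    ("event:E1", "took_place_in", "place:C1"),
    ("document:D1", "authored_by", "person:A"),
    ("document:D1", "mentions", "event:E1"),
    ("object:O1", "discovered_in", "place:C1"),
]


def build_index(triples):
    """Index every entity by its outgoing and incoming links."""
    index = defaultdict(set)
    for subj, pred, obj in triples:
        index[subj].add((pred, obj))
        # Store the reverse direction so links can be followed both ways.
        index[obj].add(("inverse:" + pred, subj))
    return index


def context(index, entity, depth=1):
    """Return the entities reachable from `entity` within `depth` links."""
    seen = {entity}
    frontier = {entity}
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for _pred, other in index.get(node, ()):
                if other not in seen:
                    next_frontier.add(other)
        seen |= next_frontier
        frontier = next_frontier
    return seen - {entity}


INDEX = build_index(TRIPLES)

if __name__ == "__main__":
    # Direct context of the person: the event and the document linked to them.
    print(sorted(context(INDEX, "person:A", depth=1)))
    # A wider context also reaches the place, and the object found there.
    print(sorted(context(INDEX, "person:A", depth=3)))
```

Widening `depth` is one simple way to show how a person, an event, a place, a document and an object end up in each other's context through shared links.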
Knowledge Graph
LOD Cloud
Architecture for Building a Knowledge Graph
Data Hierarchy in CLaDA-BG
• Metadata
• Knowledge Graph
• Domain Specific Data Set 01
• Domain Specific Data Set 02
• Domain Specific Data Set 0K
Conclusion
• CLaDA-BG is in its construction phase
• Many resources and services that constitute the content of the infrastructure already exist, since they were built within different European and national projects
• Our first focus is on the integration of the existing resources and technologies
• Our aim is to contextualize resources in order to support research in Social Sciences and Humanities