TOWARDS LINGUISTICS ANALYSIS OF THE BULGARIAN FOLKLORE DOMAIN

Galina Bogdanova, Konstantin Rangochev, Desislava Paneva-Marinova, Nikolay Noev
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
galina@math.bas.bg, krangochev@yahoo.com, dessi@cc.bas.bg, nickey.noev@gmail.com
International Conference on Information Research and Applications – i.Tech 2011, Varna, Bulgaria, June 2011

Linguistics research and analysis of the Bulgarian folklore (1)
Dictionary classifications:
• By format: traditional dictionaries, digital dictionaries (online (web-based) or local (desktop))
• By purpose: descriptive dictionaries, grammar dictionaries, dictionaries of synonyms, valence dictionaries, etymological dictionaries, phraseological dictionaries, frequency dictionaries, concordance dictionaries, specialized dictionaries, terminological dictionaries, etc.
• By the number and type of languages: monolingual, bilingual, multilingual dictionaries

Linguistics research and analysis of the Bulgarian folklore (2)
The main component of the linguistic research of Bulgarian folklore is the analysis of its lexical structure:
• How many tokens does it contain, and of what kinds?
• Do some groups of tokens dominate, and are others missing?
• Paradigmatic relations among the folklore lexemes
• Lexeme contexts / folklore language formulas
• Frequency of the lexemes, the verses/sentences in which they occur, the numbers of those verses/sentences within the song, etc.
• Word forms
• Regional characteristics of the folklore lexical structure, etc.

Linguistics research and analysis of the Bulgarian folklore (3)
Tools formalizing the folklore analysis:
• Frequency dictionary
  • A general frequency dictionary contains all the lexical units in a folklore object repository;
  • A regional frequency dictionary contains all the text units that come from a particular folklore region or a specific settlement;
  • A functional frequency dictionary contains all the text units that have identical functions: descriptions of rites, various types of songs, narratives, etc.
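
The three kinds of frequency dictionary differ only in which folklore objects are counted. A minimal sketch of that idea follows. It assumes each object carries a region label and a function label; the `FolkloreObject` record, its field names and the sample texts are hypothetical and not taken from the project. It also counts word forms rather than lexemes, since no lemmatization is attempted.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class FolkloreObject:
    """Hypothetical record for one text object in the repository."""
    text: str
    region: str    # folklore region or settlement (assumed field)
    function: str  # e.g. "song", "rite description" (assumed field)


def tokenize(text):
    # Naive tokenizer: lowercase runs of word characters, no lemmatization.
    return re.findall(r"\w+", text.lower())


def frequency_dictionary(objects, region=None, function=None):
    """Count word forms over the objects that match the optional filters."""
    counts = Counter()
    for obj in objects:
        if region is not None and obj.region != region:
            continue
        if function is not None and obj.function != function:
            continue
        counts.update(tokenize(obj.text))
    return counts


# Invented sample data, for illustration only.
repository = [
    FolkloreObject("Fifty heroes are drinking wine", "Region A", "song"),
    FolkloreObject("The heroes are drinking red wine", "Region B", "song"),
    FolkloreObject("The bread is baked before the feast", "Region A",
                   "rite description"),
]

general = frequency_dictionary(repository)                      # whole repository
regional = frequency_dictionary(repository, region="Region A")  # one region
functional = frequency_dictionary(repository, function="song")  # one function
```

With this sample data, `general["wine"]` is 2 while `regional["wine"]` is 1, because only one of the two songs comes from the selected region.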

Linguistics research and analysis of the Bulgarian folklore (4) Table: Comparison of the Bulgarian folklore and spoken languages

Linguistics research and analysis of the Bulgarian folklore (5)
• Concordance dictionaries show the lexeme in its context.
• Example for songs: “Fifty heroes are drinking wine”. The underlined lexeme is the one being examined, and the lexemes in italics are its context.
• Example for narrative text: in the description of rituals, one complete sentence (from full stop to full stop) is the context of the observed lexeme.
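
The two context rules (one verse for songs, one sentence for narrative text) can be sketched as follows. The function name `concordance`, its `unit` parameter and every sample line except the quoted verse are assumptions made for illustration. Sentences are split naively on full stops, and matching is on exact word forms.

```python
import re


def concordance(text, lexeme, unit="verse"):
    """Return every context unit that contains the examined word form.

    unit="verse":    each line of a song is one context.
    unit="sentence": each stretch between full stops is one context.
    """
    if unit == "verse":
        units = text.splitlines()
    else:
        units = text.split(".")
    target = lexeme.lower()
    hits = []
    for raw in units:
        context = " ".join(raw.split())  # normalize whitespace
        if target in re.findall(r"\w+", context.lower()):
            hits.append(context)
    return hits


# First verse is the slide's example; the rest is invented sample text.
song = "Fifty heroes are drinking wine\nMarko is drinking too"
ritual = ("The bread is baked at dawn. "
          "The wine is poured after the bread is blessed.")

verse_hits = concordance(song, "wine")
sentence_hits = concordance(ritual, "wine", unit="sentence")
```

Here `verse_hits` holds the single verse containing "wine", and `sentence_hits` holds the one full sentence of the ritual description in which the word occurs.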

FolkKnow project
• FolkKnow project: “Knowledge Technologies for Creation of Digital Presentation and Significant Repositories of Folklore Heritage” (contract number: IO-0303/2006)
• Supported by the National Science Fund of the Bulgarian Ministry of Education and Science
• Partners: Institute of Mathematics and Informatics - BAS, Institute for Folklore - BAS, Veliko Tarnovo University
• Module 2: “Development, Annotation and Protection of a Digital Archive ‘Bulgarian Folklore Heritage’”
• Module 3: “Development of Digital Libraries and Information Portal with Virtual Exposition - Bulgarian Folklore Heritage”

FolkKnow project
Main aim: research and development of a complete web-based environment for registering, documenting, and accessing a wide range of Bulgarian folklore objects
Target domain: Bulgarian folklore
Target group of users: professionals and scientists, non-professionals, connoisseurs and viewers
Used technologies: digital libraries and Semantic Web technologies

Bulgarian folklore digital library
Web address: http://folknow.cc.bas.bg/

Folklore object preview
Description of a folklore object

Services - example
Extended search through all the object’s characteristics

Linguistic search in text folklore objects
• Search for a word in the different types of dictionaries;
• Search for two or more words, i.e. for verbal formulas in the folklore lexis: “Drinking wine”, “Marko seated”;
• Search for a group of words, to investigate the paradigmatic relations in the folklore lexis (river - stream - brook - rill…);
• Search for the root of a word, to study folklore word formation: “drink” (I am drinking, I have drunk, they have drunk…).
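
The four search modes listed above can be sketched over a list of verses. All function names and the sample verses are hypothetical. The root search is a crude prefix match, so it finds "drinking" but would miss irregular forms such as "drunk"; a real component would need morphological analysis of Bulgarian.

```python
import re


def tokens(verse):
    return re.findall(r"\w+", verse.lower())


def find_word(verses, word):
    """Verses containing the exact word form."""
    return [v for v in verses if word.lower() in tokens(v)]


def find_formula(verses, formula):
    """Verses in which the words of the formula occur consecutively."""
    wanted = tokens(formula)
    if not wanted:
        return []
    n = len(wanted)
    hits = []
    for v in verses:
        toks = tokens(v)
        if any(toks[i:i + n] == wanted for i in range(len(toks) - n + 1)):
            hits.append(v)
    return hits


def find_group(verses, words):
    """Verses containing any member of a paradigmatic group."""
    group = {w.lower() for w in words}
    return [v for v in verses if group & set(tokens(v))]


def find_root(verses, root):
    """Verses with a word form that starts with the root (prefix match)."""
    root = root.lower()
    return [v for v in verses if any(t.startswith(root) for t in tokens(v))]


# Invented sample verses built around the slide's examples.
verses = [
    "Fifty heroes are drinking wine",
    "Marko seated by the river",
    "A brook runs past the village",
]

formula_hits = find_formula(verses, "drinking wine")
group_hits = find_group(verses, ["river", "stream", "brook", "rill"])
root_hits = find_root(verses, "drink")
```

The group search returns the second and third verses, since each contains one member of the river/stream/brook/rill paradigm.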

Experimental linguistic component in BFDL
Frequency dictionary functional requirements:
• Linguistic analysis of the available set of test folklore objects;
• Determining how frequently the lexemes occur in text folklore objects;
• Creating lists of the lexemes:
  • in frequency order
  • in alphabetical order
• Counting the lexical units;
• Counting the repetitions of the lexical units.
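
These requirements map onto a few lines of code. The sketch below is one possible reading, not the BFDL implementation: the function name `build_lists` and the sample texts are invented, "number of lexical units" is taken to mean distinct word forms, and "number of repeats" is taken to mean the occurrence counts (per unit and in total).

```python
import re
from collections import Counter


def build_lists(texts):
    """Return frequency-ordered and alphabetical lists plus two totals."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    # Frequency order: most frequent first, ties broken alphabetically.
    by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Alphabetical order of the word forms.
    alphabetical = sorted(counts.items())
    distinct_units = len(counts)              # different lexical units
    total_occurrences = sum(counts.values())  # all occurrences, repeats included
    return by_frequency, alphabetical, distinct_units, total_occurrences


# Invented sample texts, for illustration only.
texts = [
    "Fifty heroes are drinking wine",
    "The heroes are drinking red wine",
]

by_frequency, alphabetical, distinct_units, total_occurrences = build_lists(texts)
```

For the two sample verses this yields 7 distinct word forms and 11 occurrences in total.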

Experimental linguistic component in BFDL
Sequence diagram

Experimental linguistic component in BFDL
Analysis class diagram for the BFDL linguistic component

The Frequency Dictionary Project
• Frequency dictionary for texts with folklore themes
• Web interface
• Full-text search
• Rules and concepts from the field of Bulgarian folklore that filter the words/phrases
• Words/phrases are representatives of 20 different folklore rubrics (thematic headings)

Folklore rubrics
1. Village information
2. Rituals and feasts
3. Songs
4. Instrumental music (descriptions)
5. Dance folklore (descriptions)
6. Children folklore
7. Prose
8. Proverb, saying
9. National beliefs and knowledge
10. National medicine
11. Magic
12. Fortune-telling
13. Dreams
14. Clothing and adornment
15. Belongings
16. National art
17. Architecture, monuments
18. Food and feeding
19. Festivals, gatherings and reviews
20. Others

Frequency Dictionary Specification (1)
• Administrative area:
  • Adding a text: the application provides a text field for entering the text and a field for uploading the source file.
  • Adding a rubric: the form is kept as simple as possible; the administrator chooses the level at which to add the rubric and gives only its name.
  • Changing a rubric
  • Deletion: once an object is selected for deletion at the chosen level, the system cascades the deletion to all lower levels.
• User part:
  • Consists of a search form for selecting the desired item, the level and the corresponding text. The results shown on screen give the rubric, the number of files and how the words are distributed.
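
The administrative operations on rubrics (add at a chosen level, change, cascading deletion) suggest a tree of rubrics. The sketch below is an assumption about how such a hierarchy could be modelled in memory; the `Rubric` class, its methods and the sub-rubric names are hypothetical, and the actual system presumably stores the hierarchy in a database.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    """Hypothetical node in the rubric hierarchy."""
    name: str
    children: dict = field(default_factory=dict)  # name -> Rubric
    texts: list = field(default_factory=list)     # texts filed at this level

    def add_rubric(self, name):
        """Add a sub-rubric at this level; only its name is required."""
        if name in self.children:
            raise ValueError(f"rubric already exists: {name}")
        child = Rubric(name)
        self.children[name] = child
        return child

    def rename_rubric(self, old, new):
        """Change the name of a sub-rubric, keeping everything below it."""
        if new in self.children:
            raise ValueError(f"rubric already exists: {new}")
        child = self.children.pop(old)
        child.name = new
        self.children[new] = child

    def delete_rubric(self, name):
        """Remove a sub-rubric; all lower levels go with it (cascade).

        Returns the number of rubrics removed.
        """
        removed = self.children.pop(name)
        return removed.count_nodes()

    def count_nodes(self):
        return 1 + sum(c.count_nodes() for c in self.children.values())


root = Rubric("Folklore")
songs = root.add_rubric("Songs")
songs.add_rubric("Heroic songs")   # invented sub-rubric
songs.add_rubric("Ritual songs")   # invented sub-rubric
root.add_rubric("Prose")
root.rename_rubric("Prose", "Prose narratives")

removed = root.delete_rubric("Songs")  # removes "Songs" and both sub-rubrics
```

Because the sub-rubrics live inside their parent node, dropping the parent is enough to remove every lower level, which is the cascading behaviour the specification describes.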

Frequency Dictionary Specification (2)
• Full-text search: the system performs a full-text search over the text corpora
• Text filtering by rubrics, indexes or metadata
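
Full-text search combined with filtering can be illustrated with a small inverted index. This is a sketch under stated assumptions, not the system's actual search engine: the `TextIndex` class, the metadata keys `rubric` and `region`, and the sample texts are all invented for the example.

```python
import re
from collections import defaultdict


class TextIndex:
    """Tiny inverted index with metadata filtering (illustrative only)."""

    def __init__(self):
        self.docs = {}                    # doc_id -> (text, metadata dict)
        self.postings = defaultdict(set)  # word form -> set of doc_ids

    def add(self, doc_id, text, **metadata):
        self.docs[doc_id] = (text, metadata)
        for token in re.findall(r"\w+", text.lower()):
            self.postings[token].add(doc_id)

    def search(self, query, **filters):
        """Ids of texts containing every query word and matching all filters."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(
            doc_id for doc_id in ids
            if all(self.docs[doc_id][1].get(k) == v for k, v in filters.items())
        )


# Invented sample corpus; rubric names follow the rubric list of the slides.
index = TextIndex()
index.add(1, "Fifty heroes are drinking wine", rubric="Songs", region="Region A")
index.add(2, "Wine is poured for the guests", rubric="Rituals and feasts",
          region="Region A")
index.add(3, "The heroes ride at dawn", rubric="Songs", region="Region B")

all_wine = index.search("wine")
wine_in_songs = index.search("wine", rubric="Songs")
```

The unfiltered query matches texts 1 and 2, while restricting the rubric to "Songs" leaves only text 1.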

Thank you for your attention!
Bulgarian Folklore Digital Library: http://folknow.cc.bas.bg
For contacts: dessi@cc.bas.bg, galina@math.bas.bg, krangochev@yahoo.com, nickey.noev@gmail.com