Information Query Formulation in a Slavonic Language and its Automatic Processing
Experience from Polish and Czech in comparison to Western European Languages
Petr Strossa
University of Economics, Prague
Department of Information & Knowledge Engineering
TEL-ME-MOR/M-CAST Seminar, 2006

General Issue
86 Question/Answer Types and the basic idea of their recognition in texts
[D. Laurent et al., SYNAPSE, Toulouse]

Technology
Priberam’s lexicon data structure
SintaGest software tool
[Priberam Informática, Lisbon]

Question-Answer Pattern (Example)
Question(WEIGHT)
: Root("jaký")? Dist(0, 5) WeightNoun = 20 // Jaká je hmotnost Země? (What is the mass of the Earth?)
: Wrd(jak) WeightAdj = 20 // Jak těžký může být slon? (How heavy can an elephant be?)
: Wrd(kolik) WeightUnit = 20 // Kolik kg má dospělý kapr? (How many kg does an adult carp weigh?)
: Wrd(kolik) Root("vážit") = 20 // Kolik váží kapr? (How much does a carp weigh?)
Answer
: WeightNoun Definition With Pivot Dist(0, 5) {Number 6 WeightUnit} = 20 // Váha kapra může dosáhnout až 5 kg. (The weight of a carp can reach up to 5 kg.)
: Pivot Dist(0, 5) Cat(V) Dist(0, 5) {Number 6 WeightUnit} = 20 // Roční kapr může dosáhnout 5 kg tělesné váhy. (A one-year-old carp can reach 5 kg of body weight.)
;
Answer(WEIGHT)
: Number 6 WeightUnit = 20
;

Definitions of Constants Used in the Previous Example
Const WeightNoun = AnyRoot(hmotnost, hmota, "tíha", "váha", "zatížení");
Const WeightAdj = AnyRoot("těžký", "lehký");
Const WeightUnit1 = AnyRoot(mikrogram, miligram, centigram, decigram, dekagram, hektogram, kilo, cent, megagram, miligram, tuna, "karát", pond, kilopond, megapond, libra);
Const WeightUnit2 = AnyWrd(mg, cg, dg, g, dag, deka, Dg, dkg, hg, kg, q, Mg, t, p, kp, Mp, lb, "lb.", lbs, "lbs.", cwt, "cwt.");
Const WeightUnit = AnyConst(WeightUnit1, WeightUnit2);

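To make the role of the constants concrete, here is a minimal Python sketch of the single rule `Wrd(kolik) WeightUnit = 20`. It is not the SintaGest implementation: the function name `score_kolik_unit`, the whitespace tokenizer, and the reading of the rule as "kolik immediately followed by a unit" are all assumptions made for illustration. Only the word-form constant WeightUnit2 is modelled, because the `Root(...)` tests would need a Czech lemmatizer.

```python
# Illustrative sketch only; not the SintaGest engine.
# WEIGHT_UNIT_2 copies the abbreviations listed in the WeightUnit2 constant.
WEIGHT_UNIT_2 = {
    "mg", "cg", "dg", "g", "dag", "deka", "Dg", "dkg", "hg", "kg", "q",
    "Mg", "t", "p", "kp", "Mp", "lb", "lb.", "lbs", "lbs.", "cwt", "cwt.",
}


def score_kolik_unit(question: str) -> int:
    """Return 20 if 'kolik' is directly followed by a weight-unit abbreviation.

    Adjacency is an assumed reading of the rule. Units are compared
    case-sensitively because the list distinguishes e.g. 'mg' from 'Mg'.
    """
    tokens = [t.strip("?!,;:") for t in question.split()]
    for first, second in zip(tokens, tokens[1:]):
        if first.lower() == "kolik" and second in WEIGHT_UNIT_2:
            return 20
    return 0


print(score_kolik_unit("Kolik kg má dospělý kapr?"))  # 20
print(score_kolik_unit("Kolik váží kapr?"))           # 0 (needs the Root("vážit") rule)
```
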
General Observation
• The conception and the tools designed to process Western European languages can be adapted to process Slavonic languages such as Polish and Czech.
• Some basic differences between the language families must be kept in mind during such an adaptation!

The Abundance of Morphology
• Nouns: 4 (!) genders, 2 numbers, 7 cases
• Adjectives: e.g. světlý (bright)
– 3 degrees: světlý ↔ světlejší, nejsvětlejší
– 4 genders: světlý ↔ světlá, světlé
– 2 numbers: světlý ↔ světlí
– 7 cases: světlý ↔ světlého, světlému, ...

The Abundance of Morphology (2)
• Adjectives Continued:
– Theoretically, every adjective may have 3*4*2*7 = 168 forms altogether!
– In practice, some of them are regularly (without exceptions) identical...
– A general scheme for describing a morphological pattern cannot work with fewer than 57 forms (= 3 degrees * 19 possibly differing gender/number/case endings).

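The two counts above can be checked with a few lines of Python. This is only a worked version of the arithmetic on the slide; the category labels are illustrative, and the figure of 19 distinct endings is taken from the slide rather than derived here.

```python
from itertools import product

# Category labels are illustrative; only the sizes matter for the count.
DEGREES = ["positive", "comparative", "superlative"]
GENDERS = ["masculine animate", "masculine inanimate", "feminine", "neuter"]
NUMBERS = ["singular", "plural"]
CASES = ["nom", "gen", "dat", "acc", "voc", "loc", "instr"]

# Every theoretically possible degree/gender/number/case slot.
slots = list(product(DEGREES, GENDERS, NUMBERS, CASES))
theoretical_forms = len(slots)

# 19 possibly differing gender/number/case endings per degree (from the slide).
DISTINCT_ENDINGS = 19
minimal_forms = len(DEGREES) * DISTINCT_ENDINGS

print(theoretical_forms, minimal_forms)  # 168 57
```
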
The Abundance of Morphology (3): Illustration – the 19-Ending System

The Abundance of Morphology (4)
• Adjectives Continued:
– In fact, not all of them may have all the forms.
– Some adjectives cannot undergo gradation for purely morphological reasons: domácí (home, home-made)
– Other adjectives usually do not undergo gradation for semantic reasons: jednofázový (one-phase)

Morphological Pattern (Ex. 1)
Case     Sg.        Pl.
Nom.     babk-a     babk-y
Gen.     babk-y     babek
Dat.     babc-e     babk-ám
Acc.     babk-u     babk-y
Voc.     babk-o     babk-y
Loc.     babc-e     babk-ách
Instr.   babk-ou    babk-ami

Morphological Pattern (Ex. 2)
Case     Sg.           Pl.
Nom.     <S>-a         <S>-y
Gen.     <S>-y         <S+E>
Dat.     <S+S0>-e      <S>-ám
Acc.     <S>-u         <S>-y
Voc.     <S>-o         <S>-y
Loc.     <S+S0>-e      <S>-ách
Instr.   <S>-ou        <S>-ami

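The abstract pattern can be read as a small program that turns a stem into all 14 forms. The sketch below is one possible encoding, not the lexicon format used in the project. It assumes that `<S+S0>` denotes the stem with a softened final consonant (k → c, as in babk → babc) and that `<S+E>` denotes the stem with an inserted e (babk → babek); the names `stem_variant`, `PATTERN` and `decline` are hypothetical.

```python
# Hypothetical encoding of the pattern from Ex. 2; names are illustrative.
CASES = ["nom", "gen", "dat", "acc", "voc", "loc", "instr"]

# Only the consonant alternation needed for this example (assumption).
SOFTEN = {"k": "c"}


def stem_variant(stem: str, op: str) -> str:
    """Apply the stem operation named in the pattern."""
    if op == "S":        # plain stem
        return stem
    if op == "S+S0":     # assumed: soften the final consonant
        return stem[:-1] + SOFTEN.get(stem[-1], stem[-1])
    if op == "S+E":      # assumed: insert 'e' before the final consonant
        return stem[:-1] + "e" + stem[-1]
    raise ValueError(f"unknown stem operation: {op}")


# (stem operation, ending) for each case, in the order of CASES.
PATTERN = {
    "sg": [("S", "a"), ("S", "y"), ("S+S0", "e"), ("S", "u"),
           ("S", "o"), ("S+S0", "e"), ("S", "ou")],
    "pl": [("S", "y"), ("S+E", ""), ("S", "ám"), ("S", "y"),
           ("S", "y"), ("S", "ách"), ("S", "ami")],
}


def decline(stem: str) -> dict:
    """Generate all forms of a noun that follows PATTERN."""
    return {
        (number, case): stem_variant(stem, op) + ending
        for number, rules in PATTERN.items()
        for case, (op, ending) in zip(CASES, rules)
    }


forms = decline("babk")
print(forms[("sg", "dat")], forms[("pl", "gen")])  # babce babek
```

Under the same assumptions, `decline("matk")` yields the forms of matka, for example matce and matek.
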
Morphology of Nouns: Some Statistics
NUMBER OF PATTERNS    NUMBER OF NOUNS FOLLOWING THEM    PERCENTAGE
2                     85 000                            70
11                    110 000                           90
19                    114 000                           95
27                    116 000                           97
56                    118 000                           99
292                   120 000                           100

Morphology of Nouns: Some Statistics (2)
• We need about 300 noun patterns altogether.
• We have about 90 noun patterns that describe the declension of at least 10 different nouns.
• We have about 80 noun patterns that describe only 1 noun each.
• About one half of the noun patterns describe the declension of 1–3 nouns each.

Inherent Homonymy of Forms
• A typical situation for our type of morphology: světlé (bright)
– nominative/accusative/vocative singular neuter
– genitive/dative/locative singular feminine
– nom./acc./voc. plural fem.
– acc. pl. masculine animate
– nom./acc./voc. pl. masculine inanimate
i.e. 13 possible grammatical interpretations altogether!

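The total of 13 follows directly from the list above, since each line with three cases contributes three readings. A short Python sketch makes the enumeration explicit; the variable names and the tuple layout are illustrative, and the data are copied from the slide.

```python
# Readings of the single word form "světlé", as listed on the slide.
# Each entry: (cases, number, gender). Layout and names are illustrative.
SVETLE_READINGS = [
    (("nom", "acc", "voc"), "sg", "neuter"),
    (("gen", "dat", "loc"), "sg", "feminine"),
    (("nom", "acc", "voc"), "pl", "feminine"),
    (("acc",), "pl", "masculine animate"),
    (("nom", "acc", "voc"), "pl", "masculine inanimate"),
]

# Expand every case into its own (case, number, gender) analysis.
analyses = [
    (case, number, gender)
    for cases, number, gender in SVETLE_READINGS
    for case in cases
]

print(len(analyses))  # 13
```
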
Inherent Homonymy of Forms (2)
• Only a little bit less typical situation: Ženu holí stroj.
– I am setting a machine in motion with a stick.
  • OR: I am setting a machine of sticks in motion. (*)
– The woman is shaved by a machine.
– Dress the woman with a stick.
  • OR: Dress the woman of sticks. (*)

Inherent Homonymy of Forms (3)
• All the previous once again, in a question: Jaký je plat Petra Hanka? – What is the salary of XY?
  • X ∈ {Petr, Peter, Petar}
  • Y ∈ {Hank, Hanek, Hanke, Hanko}
• The only thing we know for sure: X ≠ Petra (though such a name exists); Y ≠ Hanka (though such a name exists)!

Inherent Homonymy of Forms (4)
Jaký je plat Petra Hanka? – What is the salary of XY?
• The only thing we know for sure: X ≠ Petra (though such a name exists); Y ≠ Hanka (though such a name exists)!
Compare: Jaký plat Hanka dává svým zaměstnancům? – What salary does Hanka give to her/his employees?

Inherent Homonymy of Forms (Conclusion)
• Due to our free word order, attempting disambiguation from a limited context is generally quite problematic.
• A really safe disambiguation is possible only after a complete syntactic analysis of a sentence (which should keep all the possible meanings of all the words up to the end).
– (But we do not perform a complete syntactic analysis of sentences in M-CAST.)

Free Word Order Again
• How far is it to Brno?
– Jak daleko je do Brna?
– Jak je daleko do Brna?
– Jak je do Brna daleko?
– Do Brna je jak daleko?
– Do Brna jak je daleko?
– Do Brna je daleko jak?
– Daleko je do Brna jak?
(+++) (++) (+) (+)

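All seven variants consist of exactly the same word forms, which is why a pattern that relies on a fixed word order can miss most of them. The Python sketch below only demonstrates that point by comparing the variants as unordered bags of words. It is not the approach used in M-CAST, and the helper name `bag_of_words` is hypothetical.

```python
from collections import Counter

# The seven word-order variants from the slide.
VARIANTS = [
    "Jak daleko je do Brna?",
    "Jak je daleko do Brna?",
    "Jak je do Brna daleko?",
    "Do Brna je jak daleko?",
    "Do Brna jak je daleko?",
    "Do Brna je daleko jak?",
    "Daleko je do Brna jak?",
]


def bag_of_words(question: str) -> Counter:
    """Lower-cased word forms with their counts, ignoring order."""
    return Counter(token.strip("?").lower() for token in question.split())


reference = bag_of_words(VARIANTS[0])
all_same = all(bag_of_words(v) == reference for v in VARIANTS)
print(all_same)  # True
```

Order-insensitive comparison is a blunt instrument: it would also equate sentences whose meaning does depend on word order, so it illustrates the problem rather than solving it.
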