FF & FER
Comparative Analysis of Automatic Term and Collocation Extraction
Sanja Seljan, Bojana Dalbelo Bašić, Jan Šnajder, Davor Delač, Matija Šamec-Gjurin, Dina Crnec
Faculty of Humanities and Social Sciences, Department of Information Sciences
Faculty of Electrical Engineering and Computing
INFuture 2009: Digital Resources and Knowledge Sharing, 4-7 November 2009

Overview
I. Introduction
  – Reasons for extraction
II. Research
  – Resources & tools
  – Extracted lists
III. Evaluation
  – Precision, recall, F-measure
IV. Conclusion

I. Introduction
• Monolingual and multilingual resources
  – Helpful
  – Integrated
  – Require human intervention
• EU pre-accession activities
  – Speed up + consistency
• Used in further research and practice

• List:
  – Terms (Member State, European Union)
  – Collocations (adopt a/the resolution, decided as follows)
  – Multi-word units (depend on, well-being)
• Term extraction process:
  – Term extraction (term acquisition): identification
  – Term recognition: verification

II. Research
• Resources
  – 10 documents: legislation, Cro-Eng
• Tools
  – TermeX tool (FER): list A
  – SDL MultiTerm Extract + NooJ (FF): list B
• Reference list
  – Evaluation against the reference list

Reference list
• 470 terms and collocations
• Exclude unigrams
• Balance between lexical coverage, adequacy, practicality
  – Terms (NPs: 346/470)
  – Collocations (VPs)

Reference list
• Contains:
  – Terms (acquiring company, applicant country)
  – Collocations (adopt a/the resolution, decided as follows, entry into force, having regard to)
  – Names and abbreviations (Economic and Monetary Union EMU, European Union EU)
  – Relevant embedded terms (crime prevention, crime prevention bodies, national crime prevention measures)

List B
• Language-independent, statistically based SDL MultiTerm Extract tool
  – Frequency threshold set to 4
  – Filtered by the list of stop-words -> 369 candidates
• Language-dependent NooJ tool
  – 36 local grammars -> 512 candidates

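The statistical step above (frequency threshold plus stop-word filtering) can be sketched roughly as follows. This is a minimal illustration and not SDL MultiTerm Extract's actual algorithm, which the slides do not describe. The function name `extract_candidates`, the 2- to 4-gram range, and the rule of rejecting n-grams that start or end with a stop word are all assumptions.

```python
from collections import Counter


def extract_candidates(tokens, n_max=4, min_freq=4, stop_words=frozenset()):
    """Return {n-gram: frequency} for 2..n_max-grams that pass both filters.

    Assumed filters (not taken from the tool itself):
      * frequency must be at least min_freq
      * the n-gram may not start or end with a stop word
    """
    counts = Counter()
    for n in range(2, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {
        gram: freq
        for gram, freq in counts.items()
        if freq >= min_freq
        and gram[0] not in stop_words
        and gram[-1] not in stop_words
    }


# Toy input, invented for illustration only.
tokens = (
    "the member state shall inform the member state of the member state "
    "and the member state"
).split()
stop_words = frozenset({"the", "of", "and", "shall"})
candidates = extract_candidates(tokens, min_freq=4, stop_words=stop_words)
```

On this toy input only the bigram `("member", "state")` survives: "the member state" also occurs four times but is rejected because it starts with a stop word.
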
List A
• TermeX
  – Lexical association measures (AMs)
  – 14 AMs (PMI, Dice, Chi-square, …)
  – Lemmatization
  – POS filtering
  – Frequency threshold set to ?

List A
• Extracted terms ranked by AM value
  – 1816 candidates
• AMs used:
  – 2-grams: PMI
  – 3-grams, 4-grams: heuristic extensions
• Noun phrases only

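Ranking 2-grams by PMI can be sketched as below, using the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))). This is a minimal sketch that assumes maximum-likelihood estimates from raw token counts. It is not TermeX's implementation: lemmatization, POS filtering and the heuristic extensions for 3- and 4-grams are left out, and the name `pmi_ranking` is hypothetical.

```python
import math
from collections import Counter


def pmi_ranking(tokens, min_freq=1):
    """Rank adjacent word pairs by pointwise mutual information (base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), freq in bigrams.items():
        if freq < min_freq:
            continue  # frequency threshold applied before scoring
        p_xy = freq / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest PMI first, mirroring a list ranked by AM value.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy input, invented for illustration only.
tokens = "european union law and european union policy and national law".split()
ranking = pmi_ranking(tokens, min_freq=2)
```

PMI tends to reward rare pairs, which is one reason a frequency threshold is usually applied alongside it. With `min_freq=2` only "european union" remains in this toy ranking.
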
Results
• Evaluation
  – F1-measure (precision, recall)
  – True positives calculated by taking into account inflection (suffix stripping)

                   List A   List B
No. of terms         1816      508
Valid terms           202      234
Precision (%)       11.56    47.37
Recall (%)          42.98    49.79
F1 (%)              18.22    48.55

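The recall and F1 rows can be checked with a short calculation. Recall is the number of valid terms divided by the 470 entries of the reference list, and F1 is the harmonic mean of precision and recall. In this sketch the precision values are taken as reported on the slide rather than recomputed from the counts. The names `f1_score` and `REFERENCE_SIZE` are illustrative.

```python
REFERENCE_SIZE = 470  # terms and collocations in the reference list


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Valid-term counts and precision (%) as reported in the results table.
reported = {
    "List A": {"valid": 202, "precision": 11.56},
    "List B": {"valid": 234, "precision": 47.37},
}

summary = {}
for name, row in reported.items():
    recall = 100 * row["valid"] / REFERENCE_SIZE
    summary[name] = (round(recall, 2), round(f1_score(row["precision"], recall), 2))
    print(name, "recall:", summary[name][0], "F1:", summary[name][1])
```

Run as written, this reproduces the table's recall and F1 values for both lists.
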
Results
• List A unsatisfactory
  – Low recall: verb phrases and terms of more than 4 words are not extracted
  – Low precision: the list is ranked, so it can be improved with a cut-off (true positives are ranked higher)
• List B modest
  – Can be improved with lemmatization, handling of upper/lower case, more detailed local grammars

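The effect of a cut-off on a ranked candidate list can be illustrated as follows. This is a minimal sketch with invented data, not the evaluation procedure used in the study. It uses exact string matching, whereas the study also accounted for inflection through suffix stripping. The function name `precision_recall_at_cutoffs` is hypothetical.

```python
def precision_recall_at_cutoffs(ranked, reference, cutoffs):
    """Return (k, precision, recall) for the top-k candidates at each cut-off."""
    reference = set(reference)
    results = []
    for k in cutoffs:
        top = ranked[:k]
        true_positives = sum(1 for cand in top if cand in reference)
        precision = true_positives / len(top) if top else 0.0
        recall = true_positives / len(reference) if reference else 0.0
        results.append((k, precision, recall))
    return results


# Invented ranked list (best first) and reference list, for illustration only.
ranked = ["member state", "european union", "of the",
          "entry into force", "in the", "the said"]
reference = {"member state", "european union",
             "entry into force", "applicant country"}
curve = precision_recall_at_cutoffs(ranked, reference, cutoffs=[2, 4, 6])
```

When true positives cluster near the top of the ranking, a lower cut-off raises precision at the cost of recall, which is the trade-off the slide points to.
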
Conclusion
• Comparison of two hybrid approaches to term extraction
• Human-created lists differ from extracted lists
  – Human knowledge, experience and intuition
• Room for improvement
  – Automatic extraction combined with human intervention

Thank you!