GrETEL 4
Jan Odijk
LREC, Miyazaki, 2018-05-10

Overview
• GrETEL 1, 2, 3
• GrETEL 4
  – Developers: Martijn van der Klis, Sheean Spoel, Gerson Foks (DH Lab)
• Illustration

GrETEL 1, 2, 3
• GrETEL: KU Leuven
  – Cooperation CLARIN-NL and CLARIN Flanders
• GrETEL 2, 3: extensions, improvements in other Flemish projects
• Application for searching in a treebank
  – Treebank = text corpus in which each sentence has been assigned a syntactic structure
  – Syntactic structure is usually a tree
• Core feature: example-based querying
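Example-based querying means that the user never writes a query by hand: the relevant part of an example sentence's parse tree is turned into a query. The sketch below illustrates that idea only. It is not GrETEL's code; the function name `to_xpath`, the choice of attributes to keep, and the tiny hand-written subtree are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def to_xpath(node, keep=("cat", "pt", "rel"), top=True):
    """Turn a (pruned) tree of <node> elements into an XPath expression.

    Hypothetical helper: every kept attribute becomes an attribute test and
    every child <node> becomes a nested predicate.
    """
    tests = [f'@{a}="{node.get(a)}"' for a in keep if node.get(a) is not None]
    tests += [to_xpath(child, keep, top=False) for child in node.findall("node")]
    prefix = "//node" if top else "node"
    predicate = " and ".join(tests)
    return f"{prefix}[{predicate}]" if predicate else prefix

# Hand-written subtree standing in for the parts a user selected (an assumption).
subtree = ET.fromstring(
    '<node cat="smain">'
    '<node pt="ww" rel="hd"/>'
    '<node cat="inf" rel="vc"><node pt="ww" rel="hd"/></node>'
    '</node>'
)
xpath = to_xpath(subtree)
print(xpath)
```

Dropping an attribute from `keep` makes the generated query more general, which mirrors the choice a user makes when deciding which properties of the example matter.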

GrETEL 1, 2, 3
• Treebanks:
  – LASSY-Small (1M tokens, written language)
  – CGN (1M tokens, spoken language)
  – (V3) SoNaR Treebank (>500M tokens)
• V1: http://nederbooms.ccl.kuleuven.be/eng/gretel/
• V2: http://gretel.ccl.kuleuven.be/gretel-2.0/
• V3: http://gretel.ccl.kuleuven.be/gretel3/index.php

GrETEL 4
• GrETEL 4: UU Utrecht
  – In CLARIAH and UU-internal AnnCor project
• New functionality that KU Leuven could not add:
  – Upload a user’s own corpus incl. metadata
  – Search in the user’s own automatically parsed corpus
  – Analysis of search results combined with metadata
  – Better support for XPath queries
  – Improved interface functionality
• V4 (alpha!): http://gretel.hum.uu.nl/gretel4/

Illustration
• Upload Corpus
  – Plain text or CHILDES CHAT
  – TEI and FoLiA to follow
• CHAT utterances are cleaned and metadata uploaded:
  – knor [!= pigsound], ik heb honger
  – knor, ik heb honger
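The cleaning step shown in the two example lines can be sketched as follows. This is a minimal illustration, assuming that cleaning amounts to removing square-bracketed CHAT annotations such as `[!= pigsound]`. The function name `clean_chat_utterance` and the regular expression are hypothetical and not taken from GrETEL, whose actual cleaning presumably covers more CHAT codes.

```python
import re

# Assumption: an annotation is any [...] group, removed together with
# the whitespace that precedes it.
_ANNOTATION = re.compile(r"\s*\[[^\]]*\]")

def clean_chat_utterance(utterance: str) -> str:
    """Strip bracketed annotations and collapse any leftover whitespace."""
    without_annotations = _ANNOTATION.sub("", utterance)
    return re.sub(r"\s+", " ", without_annotations).strip()

cleaned = clean_chat_utterance("knor [!= pigsound], ik heb honger")
print(cleaned)  # knor, ik heb honger
```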

Corpus Upload

Corpus Overview

Corpus Details

Query Example
• Constructions with 3 bare verbs in the Dutch CHILDES Van Kampen Laura Corpus
• Example sentence:
  – Hij zal dat willen doen ("He will want to do that")

Example Sentence

Parse Tree

Select Parts

Query Tree

Select Treebank

Query
//node[@cat
  and node[@pt="ww" and @rel="hd"]
  and node[@cat="inf" and @rel="vc"
    and node[@rel="hd" and @pt="ww"]
    and node[@rel="vc" and @cat="inf"
      and node[@pt="ww" and @rel="hd"]]]]
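To make the logic of this query concrete, the sketch below spells out the same conditions in plain Python over a small XML tree. The tree is a simplified, hand-written approximation of a parse of "Hij zal dat willen doen" and is an assumption, not real parser output. The helper names `children` and `matches` are hypothetical. The conditions are checked by hand because the standard library's ElementTree supports only a limited XPath subset; a full XPath 1.0 engine could run the query directly.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written tree (an assumption, not actual parser output).
TREE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" word="Hij"/>
      <node rel="hd" pt="ww" word="zal"/>
      <node rel="vc" cat="inf">
        <node rel="hd" pt="ww" word="willen"/>
        <node rel="vc" cat="inf">
          <node rel="obj1" pt="vnw" word="dat"/>
          <node rel="hd" pt="ww" word="doen"/>
        </node>
      </node>
    </node>
  </node>
</alpino_ds>
"""

def children(node, **attrs):
    """Direct <node> children whose attributes equal all given values."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]

def matches(node):
    """Same conditions as the query: a phrase with a verbal head whose
    infinitival complement contains a second infinitival complement,
    each with its own verbal head."""
    if node.get("cat") is None or not children(node, pt="ww", rel="hd"):
        return False
    for vc1 in children(node, cat="inf", rel="vc"):
        if not children(vc1, rel="hd", pt="ww"):
            continue
        for vc2 in children(vc1, rel="vc", cat="inf"):
            if children(vc2, pt="ww", rel="hd"):
                return True
    return False

root = ET.fromstring(TREE)
hits = [n for n in root.iter("node") if matches(n)]
print([n.get("cat") for n in hits])
```

Only the clause node dominating all three verbs satisfies every condition; the inner infinitival phrases fail because they lack a second embedded complement.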

Example: Query Output

Utterance Details

Result Statistics

Analysis

Some Results
• 3 verbs:
  – 335 hits found
  – 313 by adults, 12 by child
  – 4 by child do not occur among adults
  – 8 others are not among the most frequent of the adults
  – Child examples as of month 43 (3;7)
• 2 verbs:
  – 6,645 in total, 1,363 uttered by child
  – as of month 23 (1;11)
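The ages in parentheses use the years;months notation, so 3;7 corresponds to month 43 and 1;11 to month 23. A minimal sketch of that conversion follows; the function name `age_to_months` is hypothetical, and the handling of an optional days part after a dot is an assumption that goes beyond what the slide shows.

```python
def age_to_months(age: str) -> int:
    """Convert an age written as years;months (e.g. "3;7") to whole months."""
    years, months = age.split(";")
    months = months.split(".")[0]  # assumption: ignore an optional ".days" part
    return int(years) * 12 + int(months or 0)

print(age_to_months("3;7"))   # 43
print(age_to_months("1;11"))  # 23
```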

Concluding remarks
• GrETEL is a very user-friendly search engine
  – Enables searching for constructions
  – Enables searching for disambiguated words
• Utrecht extensions
  – Enable searching in your own research corpus
  – Enable detailed analysis of search results

Concluding remarks
• User-friendliness
  – Also implies limitations!
• Automatic parsing
  – Is not flawless
  – Requires additional checks before conclusions can be reliably drawn
• Try it out! http://gretel.hum.uu.nl/gretel4/index.php
  – Even if it is still under development

Thanks for your attention

More information
• http://portal.clarin.nl, http://www.clariah.nl
• Recorded lecture on GrETEL: http://lecturenet.uu.nl/Site1/Catalog/Full/c9f887bc45154af5bd7cdb218216816621
• Educational Package: http://dev.clarin.nl/sites/default/files/EducationalModule-v4b.pdf
• Augustinus, L., Vandeghinste, V., Schuurman, I. and Van Eynde, F. 2017. GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J. and van Hessen, A. (eds.) 2017. CLARIN in the Low Countries, pp. 269–280. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.22. License: CC-BY 4.0
• Odijk, J., van der Klis, M. and Spoel, S. (2018). Extensions to the GrETEL treebank query application. Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16), pp. 46-55, Prague. http://aclweb.org/anthology/W/W17-7608.pdf
• Odijk & Van Hessen (eds.) 2017. CLARIN in the Low Countries. London: Ubiquity Press. (Open Access). DOI: http://dx.doi.org/10.5334/bbi