METU Turkish Discourse Bank Browser

Utku Şirin¹, Ruket Çakıcı¹, Deniz Zeyrek²
¹Department of Computer Engineering, ²Informatics Institute
Middle East Technical University, Ankara, Turkey
utkusirin@gmail.com, ruken@ceng.metu.edu.tr, dezeyrek@metu.edu.tr

Introduction

The Middle East Technical University (METU) Turkish Discourse Bank (TDB) project extends the METU Turkish Corpus (MTC) from a sentence-level resource to a discourse-level language resource (Zeyrek and Webber, 2008). The TDB aims to capture discourse relations to the extent that they are instantiated by explicit discourse connectives. The annotations are created with the DATT (Aktaş et al., 2010). The DATT generates a separate layer of annotation data in XML format, which points into the text by character indexes. The METU TDB browser uses these annotation files and the character indexes created by the DATT to provide a clear interface to the TDB annotations. This lets users effectively identify and exploit various aspects of Turkish discourse.

Turkish Discourse Structure

Connectives establish discourse relations mainly by taking two arguments, i.e., the text spans that the connective relates. This argument-connective-argument structure is the basis of the TDB formalism, which is enhanced with some other elements of Turkish discourse. In the TDB browser, the first argument of a discourse connective is abbreviated as ARG1, the second one as ARG2, and discourse connectives are shown as CONN. In all the examples in the paper, ARG1 is rendered in italics, ARG2 in bold, and the connective is underlined. In addition to these basic categories, text spans that supplement ARG1 and ARG2 are also annotated; these are called SUPP1 and SUPP2 respectively. Finally, modifiers of the connectives, abbreviated as MOD, and grammatical elements shared by the two arguments, abbreviated as SHARED, are annotated as well.

ARG1 is defined as adjacent to ARG2 when the two spans are consecutive, with only the following material between them: punctuation marks, the corresponding connective, its modifier, or shared elements. In (1), ARG2 is a discontinuous argument, and ARG1 is nonadjacent to ARG2 because of the intervening phrase ‘as we explained earlier’.

(1) Bakan açıklama yapmadı. Daha önce dediğimiz gibi, bu durum aslında beklenmedik değil.
‘The minister has not made an explanation. Indeed, as we explained earlier, this is not something we do not expect.’

Technical Characteristics

The METU TDB browser is written in Java SE 6 with NetBeans 6.9. There are three versions of the browser for three platforms: Mac, Ubuntu, and Windows. The browser is licensed under the LGPL, and its source code is publicly available at https://sourceforge.net/projects/tdbbrowser. Figure 7 shows the software architecture of the METU TDB browser.

Search

Quick search filters the text file list by connective and genre only. The list keeps just the text files that belong to the specified genre and include at least one relation whose connective is the specified connective.

Figure 3: Quick Search

General search is performed within a selected text file. After specifying a string, the user can see all of the matching (sub)strings in the text file.

The relation filter narrows down the relations shown in the relation list. After the user specifies a prefix string, the list shows only the relations whose connective begins with exactly that prefix.

Figure 4: Relation Filter

The advanced search facility provides a wider range of search options. A string search can be performed in any element of a relation, either with a regular expression or as a basic text search. The discontinuity of any element of a relation and the adjacency of the arguments can also be queried through advanced search. In addition, the genre, author, publisher and publication date of the texts can be specified. Long advanced search queries can be saved, either permanently or temporarily.
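The search options above come down to a few checks over character-offset annotations. The following minimal Python sketch shows those checks under stated assumptions; it is not the browser's Java code.

* **Hypothetical names.** The names Relation, TextFile, span_of, quick_search, relation_filter, advanced_search and is_adjacent are hypothetical.
* **Stand-in for DATT.** An in-memory list of (start, end) offsets stands in for the DATT XML files.
* **Adjacency check.** Whitespace between the arguments is assumed to be allowed, and the arguments are assumed not to overlap.
* **Example (1).** The spans given for example (1) are an illustrative reading, not the TDB's actual annotation: aslında is taken as the connective, with ARG2 split around it.
* **Toy data.** The second sentence and the genre labels are made up.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model, not the browser's actual (Java) classes.
# A span is a half-open (start, end) pair of character offsets into the raw text,
# in the spirit of the character indexes stored in the DATT XML files.
Span = Tuple[int, int]

@dataclass
class Relation:
    conn: List[Span]
    arg1: List[Span]
    arg2: List[Span]
    extra: Dict[str, List[Span]] = field(default_factory=dict)  # SUPP1, SUPP2, MOD, SHARED

    def element(self, name: str) -> List[Span]:
        """Spans of CONN, ARG1, ARG2 or an optional element ([] if absent)."""
        core = {"CONN": self.conn, "ARG1": self.arg1, "ARG2": self.arg2}
        return core.get(name, self.extra.get(name, []))

@dataclass
class TextFile:
    name: str
    genre: str
    text: str
    relations: List[Relation]

def span_text(text: str, spans: List[Span]) -> str:
    # A discontinuous element is rendered by joining its pieces with spaces.
    return " ".join(text[s:e] for s, e in spans)

def is_discontinuous(spans: List[Span]) -> bool:
    return len(spans) > 1

def is_adjacent(f: TextFile, r: Relation) -> bool:
    """True if only whitespace, punctuation, CONN, MOD or SHARED lies between ARG1 and ARG2."""
    a1_lo, a1_hi = min(s for s, _ in r.arg1), max(e for _, e in r.arg1)
    a2_lo, a2_hi = min(s for s, _ in r.arg2), max(e for _, e in r.arg2)
    # The gap runs from the end of the earlier argument to the start of the later one.
    gap_lo, gap_hi = (a1_hi, a2_lo) if a1_lo <= a2_lo else (a2_hi, a1_lo)
    allowed = r.conn + r.element("MOD") + r.element("SHARED")
    for i in range(gap_lo, gap_hi):
        if any(s <= i < e for s, e in allowed):
            continue
        ch = f.text[i]
        if not (ch.isspace() or unicodedata.category(ch).startswith("P")):
            return False
    return True

def quick_search(files: List[TextFile], connective: str, genre: str) -> List[TextFile]:
    # Files of the given genre with at least one relation whose CONN is `connective`.
    return [f for f in files if f.genre == genre
            and any(span_text(f.text, r.conn) == connective for r in f.relations)]

def relation_filter(f: TextFile, prefix: str) -> List[Relation]:
    # Relations of one file whose connective begins with `prefix`.
    return [r for r in f.relations if span_text(f.text, r.conn).startswith(prefix)]

def advanced_search(files: List[TextFile], element: str, pattern: str, use_regex: bool = False,
                    discontinuous: Optional[bool] = None, adjacent: Optional[bool] = None,
                    genre: Optional[str] = None) -> List[Tuple[str, Relation]]:
    # Each optional criterion left as None is not constrained.
    hits = []
    for f in files:
        if genre is not None and f.genre != genre:
            continue
        for r in f.relations:
            spans = r.element(element)
            if not spans:
                continue
            txt = span_text(f.text, spans)
            if not (re.search(pattern, txt) if use_regex else pattern in txt):
                continue
            if discontinuous is not None and is_discontinuous(spans) != discontinuous:
                continue
            if adjacent is not None and is_adjacent(f, r) != adjacent:
                continue
            hits.append((f.name, r))
    return hits

def span_of(text: str, phrase: str) -> Span:
    i = text.index(phrase)
    return (i, i + len(phrase))

# Toy data: example (1) under an illustrative reading, plus a made-up sentence.
T1 = "Bakan açıklama yapmadı. Daha önce dediğimiz gibi, bu durum aslında beklenmedik değil."
rel1 = Relation(conn=[span_of(T1, "aslında")], arg1=[span_of(T1, "Bakan açıklama yapmadı")],
                arg2=[span_of(T1, "bu durum"), span_of(T1, "beklenmedik değil")])
T2 = "Ali geldi, ama Ayşe gelmedi."
rel2 = Relation(conn=[span_of(T2, "ama")], arg1=[span_of(T2, "Ali geldi")],
                arg2=[span_of(T2, "Ayşe gelmedi")])
files = [TextFile("ex1.txt", "news", T1, [rel1]), TextFile("ex2.txt", "novel", T2, [rel2])]

print(is_adjacent(files[0], rel1), is_adjacent(files[1], rel2))  # False True
print([n for n, _ in advanced_search(files, "ARG2", r"beklen\w+", use_regex=True)])  # ['ex1.txt']
```

Storing every element as a list of spans has two benefits. Discontinuity becomes a simple length check. Adjacency can be tested directly on the characters between the two arguments, following the definition given under Turkish Discourse Structure.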
METU TDB Browser

The browser is initialized with three file paths: the annotation directory, the text files directory, and the tag file. After initialization, the browser opens the Main Window component, through which the user can browse, search, and so on.

Figure 7: Software Architecture of the Browser

The METU TDB browser has two basic features: browsing and, more importantly, searching. The browser can also explore structural aspects of the arguments and connectives, such as discontinuity and adjacency. Discontinuity concerns either a connective or an argument. An example of a discontinuous connective is either … or, together with its Turkish equivalent ya … ya da. Discontinuity of an argument means that there is intervening material inside the argument. Adjacency, on the other hand, is a relationship between the ARG1 of a connective and its ARG2.

Browsing

The browsing window has three main parts: the text file list on the left, the selected text file in the middle, and the list of relations in the selected text file on the right. Selected annotations are highlighted in the middle window using the colors in Figure 2.

Figure 1: Browsing
Figure 2: Colors

Advanced search query results are shown in a separate window. This makes it possible to run multiple advanced search queries and use their results concurrently.

Figure 5: Advanced Search
Figure 6: Advanced Search Results

User Evaluations

First, the connective için ‘because/to’ has different senses depending on the suffix its ARG2 takes, and the browser makes it possible to identify such differences. Second, discontinuous and nonadjacent arguments of a specific connective can be identified, which is particularly important for understanding what connectives do. Zeyrek et al. (in print) find that ayrıca ‘besides’ has more nonadjacent ARG1s than oysa ‘whereas’ and fakat ‘but’. This may be taken as evidence for the hypothesis that ayrıca is a discourse adverbial (i.e., a connective whose meaning is not necessarily derived from the adjacency of its arguments).

Conclusion

In this paper, we have introduced the METU TDB browser. As future work, we will add more parameters to the quick search option and use the METU TDB browser for statistical analyses over the annotated MTC.

REFERENCES

Berfin Aktaş, Cem Bozşahin, and Deniz Zeyrek. 2010. Discourse Relation Configurations in Turkish and an Annotation Environment. In Proc. of the 4th Linguistic Annotation Workshop, ACL 2010, pages 202–206.

Deniz Zeyrek and Bonnie Webber. 2008. A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus. In Proc. of the 6th Workshop on Asian Language Resources, The 3rd IJCNLP.

Deniz Zeyrek, Ümit Deniz Turan, Işın Demirşahin, and Ruket Çakıcı. (in print). Differential Properties of Three Discourse Connectives in Turkish: A Corpus-based Analysis of Fakat, Yoksa, Ayrıca. In Anton Benz, Peter Kühnlein, and Manfred Stede (Eds.), Constraints in Discourse III.