Using XSL and XQL For Efficient Customised Access

  • Slides: 27

Using XSL and XQL For Efficient, Customised Access To Dictionary Information

Kevin Jansz (kjansz@sultry.arts.usyd.edu.au), Department of Linguistics, University of Sydney, Australia
Jim Sng Wee, School of Applied Science, Nanyang Technological University, Singapore
Christopher Manning, Departments of Computer Science and Linguistics, Stanford University, USA
Nitin Indurkhya, School of Applied Science, Nanyang Technological University, Singapore

Objectives
• Provide innovative ways for representing a dictionary, through creative use of web technology
• Provide practical, educationally useful access to information that can be customised to suit the needs of many users (at low labour cost)
• Examine the richness of lexical structure
• Initial target: the Warlpiri dictionary.

Research Program: Lexicon
• A language is more than individual words with a definition – it is a vast network of associations between words and within and across the concepts represented by words
• Aim to provide people with a better understanding of this conceptual map.
• Traditional paper dictionaries offer very limited ways for making such networks visible
• There are no such limitations on a computer

Research: Computational Lexicography
• Dictionaries on computers are now commonplace
  – Few utilise the potential of the new medium
  – Many present a plain, search-oriented representation of the paper version
• Goal: fun dictionary tools that are effective for language learning, browsing
  – Like flicking through pages of a paper dictionary
  – Words are grouped by their meaning and their association with each other
  – Key to the effectiveness of this browsing is that the user has control over the way this is presented.

Initial focus: Warlpiri
• Warlpiri is an Australian Aboriginal language spoken in the Tanami desert (NW of Alice Springs)
• There are a number of factors influencing this choice:
  – One of the most comprehensive lexical databases for any Australian language (Laughren & Nash 1983)
  – Relatively large community of people interested in learning their traditional language
  – Until now, results haven’t been produced in a format usable by the community (only raw printouts)

Target user community

Kirrkirr: A Warlpiri dictionary browser (Jansz 1998; Jansz, Manning and Indurkhya 1999)
• An environment for the interactive exploration of dictionaries. Current work has been only with Warlpiri, but the design is general (Arrernte coming soon!)
• Attempts to more fully utilise graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information
• It can either be run over the web [high bandwidth] or run locally (here Java’s main advantage is cross-platform support).

Overview
• Animated graph layout of word relationships
• Formatted entries
• A Notes facility for ‘jotting in the margin’
• Multimedia: audio, pictures
• Advanced searching interfaces
• Semantic Domain Browsing
• Others in planning: formatting (XSL) editing, figuration patterns.
• These attempt to cater to users with different interests and competence levels

The lexical database
• Original materials stored in an ad hoc format of markup using backslash codes with some (rather odd) nesting of structural tags
• These were converted to XML using an error-correcting stack-based parser (written in Perl).
  – The inconsistency and flexibility of dictionary entries actually made this a surprisingly difficult task.
  – But the parser tries to impose data integrity
• Use of XML gives a clear structure to the lexical data, and makes available many (free) tools
• Result remains a portable, tangible text file
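To make the conversion step concrete, here is a minimal Python sketch of a stack-based converter that closes any groups left open. The slides do not show the original backslash format or the Perl parser, so the markup below is purely hypothetical: `\tag text` is assumed to be a one-line field, `\tag` alone is assumed to open a nested group, and `\etag` is assumed to close it (recognised only while a matching group is open).

```python
import re
from xml.sax.saxutils import escape

# Hypothetical markup (not the real Warlpiri source format):
#   "\tag text" -> a simple field, "\tag" -> opens a group, "\etag" -> closes it.
TOKEN = re.compile(r"\\(\w+)\s*(.*)")

def backslash_to_xml(lines):
    out, stack = [], []
    for line in lines:
        m = TOKEN.match(line.strip())
        if not m:
            continue  # skip lines that carry no backslash code
        tag, text = m.group(1).upper(), m.group(2).strip()
        if tag.startswith("E") and tag[1:] in stack:
            # Closing code: pop until the matching opener is closed,
            # auto-closing any inner groups left open (error correction).
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == tag[1:]:
                    break
        elif text:
            out.append(f"<{tag}>{escape(text)}</{tag}>")
        else:
            stack.append(tag)
            out.append(f"<{tag}>")
    # Close anything still open at the end of the input.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

The stack is what lets the converter emit well-formed XML even when the input forgets a closing code: every opener is remembered, so it can always be closed later.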

XML indexing - challenges
• Few XML parsers make single entries retrievable from the file
• Typically, the entire XML document is put in memory
• This is not practical when parsing significant XML databases (e.g., the Warlpiri dictionary is approx. 10 MB).

XML Dictionary Indexing (XDI)
• Hierarchical structure of XML lends itself to indexing
  – Each entry in the XML file can be considered as a separate entity
• To make the Warlpiri dictionary usable for Kirrkirr an ad hoc indexing system was developed
  – Uses a slightly modified Ælfred XML parser
  – Entries indexed by headword in a separate index file
• The system returns an XML document object containing the single dictionary entry, facilitating:
  – processing for related words (Graph layout)
  – XSL processing to HTML
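The idea of indexing entries by headword and file position can be sketched in a few lines of Python. This is not Kirrkirr's implementation (which is in Java and uses a modified Ælfred parser); the function names are hypothetical, the element names `ENTRY` and `HW` are taken from the XQL examples in these slides, and the index is kept in a dictionary rather than written to a separate index file.

```python
import re
import xml.etree.ElementTree as ET

ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HW_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)

def build_index(path):
    """Map each headword to the (offset, length) of its <ENTRY> in the file."""
    with open(path, "rb") as f:
        data = f.read()  # one-off scan; a real indexer could stream instead
    index = {}
    for m in ENTRY_RE.finditer(data):
        hw = HW_RE.search(m.group(0))
        if hw:
            headword = hw.group(1).decode("utf-8").strip()
            index[headword] = (m.start(), m.end() - m.start())
    return index

def load_entry(path, index, headword):
    """Seek to one entry and parse only that fragment of the file."""
    offset, length = index[headword]
    with open(path, "rb") as f:
        f.seek(offset)
        return ET.fromstring(f.read(length))
```

After the one-off scan, each lookup reads and parses only the bytes of the requested entry, which is the property that avoids holding the whole document in memory.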

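Kirrkirr’s XML Index Process
[Diagram: an index held in memory maps each headword to a file position in the XML-formatted Warlpiri dictionary file (<DICTIONARY> <ENTRY>...</ENTRY> </DICTIONARY>), which is read across the file system or the web. The Kirrkirr XML parser produces an XML document object for the requested entry; the XSL processor combines this with an XSL file to produce an HTML document for the dictionary browser.]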

XDI in Kirrkirr
• The XML indexing process considerably improves efficiency as only requested entries are parsed
• Parsed entries are kept temporarily in a cache
• Thus Kirrkirr uses XML as a middle ground between the structure and indexing of a relational database and the freedom and functionality of text.

XQL - Potential
• An alternative to investigate for the future is using a standard query language – such as XQL – to get material out of the XML dictionary, rather than using our ad hoc index.
• At the moment this is not a huge issue, since most retrieval is focussed on components of a particular word

XQL - Optimizations
• Revamp data structure – reduce redundancy, amount to load at start-up
• PDOM (Persistent Document Object Model) – represents XML document as a collection of objects in a tree-like model
• XQL (XML Query Language) – query language for XML
  – e.g. /DICTIONARY/ENTRY[9]
  – DICTIONARY/ENTRY[HW='jaja']
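The two query shapes above (select by position, select by child value) can be tried with the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is a stand-in rather than an XQL engine, the sample document is invented for illustration, and index bases may differ: ElementTree positions count from 1, which may not match the XQL implementation the slides have in mind.

```python
import xml.etree.ElementTree as ET

# Tiny invented document using the element names from the slide's queries.
doc = ET.fromstring(
    "<DICTIONARY>"
    "<ENTRY><HW>jaja</HW></ENTRY>"
    "<ENTRY><HW>kirrkirr</HW></ENTRY>"
    "</DICTIONARY>"
)

# Positional query, analogous in shape to /DICTIONARY/ENTRY[9];
# paths are relative to the root element, and positions are 1-based here.
second = doc.find("ENTRY[2]")

# Value query, analogous in shape to DICTIONARY/ENTRY[HW='jaja'].
matches = doc.findall("ENTRY[HW='jaja']")
```

A query of the second kind does the same job as the headword index, but has to be evaluated against a parsed (or persistently stored) tree, which is where a PDOM-style store would come in.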

Performance - Startup time
• Impact on startup time.

Customised Presentation of Dictionary Content
• Produced dynamically from the XML by using XSL (via James Clark’s XT)
• XSL allows easy modelling of some user preferences. This is useful as many users find information overload quite confusing and demotivating
• Can produce bilingual or monolingual dictionary
• Opportunities for various output styles, and formats such as RTF or TeX for printing.
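The effect of preference-driven formatting can be approximated in plain Python. This is explicitly not XSL and not how XT works; it only illustrates the idea of rendering the same XML entry with different fields shown. The `render_entry` function and the `GL` and `EG` element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

def render_entry(entry, show=("HW", "GL")):
    """Render only the child elements the user asked for, as simple HTML."""
    parts = []
    for child in entry:
        if child.tag in show:
            css = child.tag.lower()
            parts.append(f'<p class="{css}">{escape(child.text or "")}</p>')
    return "<div>" + "".join(parts) + "</div>"
```

Passing a different `show` tuple plays the role that selecting a different stylesheet would play in the XSL approach, for example hiding glosses for a monolingual view.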

Performance - XSL Presentation
• Creates minimal load on the application
• Requires file creation permission for the applet
• Takes load off file system (no need for 9000+ pregenerated files)
• Gives the user the opportunity to customise the formatting.

Conclusions
• While we have focused our research on Warlpiri, the system can be easily applied to other languages
• The key to the effectiveness of the browsing interfaces is that the user has the ability to customise their functionality due to the flexibility of the XML & Kirrkirr technology
• Throughout this research, the educational interests of the user have been the highest priority.
• We hope to better understand the usefulness & practicality of innovative dictionary browsing environments.

Links
• Kirrkirr homepage: http://www.sultry.arts.usyd.edu.au/kirr
