Some Thoughts on HPC in Natural Language Engineering

- Slides: 18
Some Thoughts on HPC in Natural Language Engineering
Steven Bird
University of Melbourne & University of Pennsylvania

Sponsorship
- Natural Language Engineering: Integrating Parallel and Parametric Processing
- Victorian Partnership for Advanced Computing, Expertise Grant EPPNME 092.2003

NLE Application Areas
- Spoken dialogue systems
- Cross-language information retrieval
- Word-sense disambiguation
- Multi-document summarisation
- Natural language database interfaces
- Information Extraction
- Information Retrieval
- Authoring Tools
- Language Analysis
- Language Understanding
- Knowledge Representation
- Knowledge Discovery
- Spoken Language Input
- Written Language Input
- Natural Language Generation
- Spoken Output
- Multilinguality
- Multimodality
- Discourse and Dialogue

Some NLE Applications in detail
- Information extraction from broadcast news
  - Tokenization, alignment, entity detection, coreference resolution, semantic mapping
- Spoken language dialogue systems (SLDS)
  - Speech recognition, parsing, user modelling, discourse management, generation, synthesis
- Language analysis
  - Interlinear text annotation, lexicon development, morphosyntactic grammar development

Meta Activities
- Discovery
  - What tools work with data in format X?
  - What lexical resources exist for language Y?
- Reuse
  - Diverse implementation frameworks
  - Component integration, wrapping, etc.
- Training and evaluation
  - Parametric and parallel processing
  - Comparing systems running on the same data
  - Gold standard vs theory comparison
  - Analyzing interaction logs

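
The "parametric and parallel processing" item can be illustrated with a minimal sketch: one toy system is run over a grid of parameter settings, each run is scored against the same gold standard, and the runs are dispatched concurrently. Everything here is an assumption made for illustration: the `GOLD` data, the `run_system` detector and its two parameters are invented, and the thread pool merely stands in for whatever scheduler a cluster or grid would provide.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Toy gold standard of (token, is_entity) pairs, invented for illustration.
GOLD = [("Melbourne", True), ("hosts", False), ("ACL", True),
        ("papers", False), ("Today", False), ("it", False)]


def run_system(params):
    """Run a toy entity detector with one parameter setting; return accuracy."""
    min_len, all_caps_only = params
    correct = 0
    for token, is_entity in GOLD:
        guess = (token[0].isupper()
                 and len(token) >= min_len
                 and (token.isupper() or not all_caps_only))
        correct += guess == is_entity
    return correct / len(GOLD)


def sweep(grid, max_workers=4):
    """Score every parameter setting concurrently against the same data."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(run_system, grid))
    return dict(zip(grid, scores))


# Parameter grid: minimum token length x "all capitals only" switch.
grid = list(product((2, 3, 6), (False, True)))
results = sweep(grid)
for params, score in sorted(results.items()):
    print(params, round(score, 2))
```

Because each run depends only on its own parameter tuple, the runs are independent and could equally be farmed out to separate processes or machines.
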
Learn about NLE
- This department hosts a mirror of the ACL digital anthology
- 50k pages, 40 years
- http://www.cs.mu.oz.au/acl/

SLDS Architecture
SLDS Components
Another SLDS Architecture
Observations
- Common components, different arrangements
  - Multiple components for doing the same task
- Most NLE components convert between information types
  - Parser: from strings to trees
  - ASR: from speech to text
  - Summariser: from text to selected text
- But:
  - Many processes benefit from other information sources (e.g. exploiting intonation in input)
  - Input and output can be aligned
  - Solution: multilayer annotations

Multilayer annotations
Multilayer Annotations
Annotation Graphs
- Labelled digraphs with timestamped nodes

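
To make "labelled digraphs with timestamped nodes" concrete, here is a minimal sketch of such a structure holding two annotation layers over the same stretch of signal. It is not the AGTK API: the class names, the `layer` field and the `overlapping` query are assumptions chosen for illustration, and every node is given a timestamp to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A graph node anchored to a time offset (seconds) in the signal."""
    ident: str
    time: float


@dataclass(frozen=True)
class Arc:
    """A directed, labelled arc between two nodes: one annotation."""
    src: Node
    dst: Node
    layer: str   # annotation layer, e.g. "word" or "phrase"
    label: str


@dataclass
class AnnotationGraph:
    arcs: list = field(default_factory=list)

    def annotate(self, src, dst, layer, label):
        """Add one labelled arc and return it."""
        arc = Arc(src, dst, layer, label)
        self.arcs.append(arc)
        return arc

    def layer(self, name):
        """All arcs on one layer, ordered by start time."""
        found = [a for a in self.arcs if a.layer == name]
        return sorted(found, key=lambda a: a.src.time)

    def overlapping(self, start, end, layer):
        """Arcs on `layer` whose time span intersects [start, end)."""
        return [a for a in self.layer(layer)
                if a.src.time < end and a.dst.time > start]


# Two layers share the same nodes, so they stay aligned through timestamps.
n0, n1, n2 = Node("n0", 0.00), Node("n1", 0.32), Node("n2", 0.81)
graph = AnnotationGraph()
graph.annotate(n0, n1, "word", "the")
graph.annotate(n1, n2, "word", "cat")
graph.annotate(n0, n2, "phrase", "NP")
print([a.label for a in graph.overlapping(0.0, 0.32, "word")])
```

Sharing nodes between layers is what lets a phrase-level arc be related to the word-level arcs it spans without either layer knowing about the other.
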
Annotation Graphs: complex example
- AGTK: Annotation Graph Toolkit
  - library, applications
  - agtk.sourceforge.net

NLE and Grids
- NLE applications
  - typically constructed out of numerous components
  - each component responsible for a specialised task
  - executed against large data sets
- To use grids in NLE:
  - subscribe to a model which allows automated discovery of data and components
  - flexible design of applications, coordination of execution, storage of results
- Ideally:
  - view grid as a commodity, hidden from application developers

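
Automated discovery of data and components can be sketched as a query over metadata records. The field names below (`title`, `type`, `format`, `language`) are borrowed from Dublin Core element names, but the catalogue entries, the `Record` class and the `discover` function are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One metadata record describing a data set or a software component."""
    title: str
    type: str       # "Dataset" or "Software"
    format: str     # format of the data held or accepted
    language: str   # language of the material


# Hypothetical catalogue entries, invented for this example.
CATALOGUE = [
    Record("Telephone speech sample", "Dataset", "audio/wav", "en"),
    Record("Toy aligner", "Software", "audio/wav", "en"),
    Record("Toy tagger", "Software", "text/plain", "en"),
]


def discover(catalogue, **criteria):
    """Return the records whose fields match every given criterion."""
    return [r for r in catalogue
            if all(getattr(r, key) == value for key, value in criteria.items())]


# "What tools work with data in format X?"
tools = discover(CATALOGUE, type="Software", format="audio/wav")
print([r.title for r in tools])
```
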
Architectural Components
- Data
  - Language resources for analysis
  - E.g. Switchboard, 2400 annotated telephone conversations (26 CDs)
- Software Components
  - minimal individual functional units
    - e.g. Annotation Server, Alignment, ASR, Data Source Packaging, Format Conversion, Text Annotation, Lexicon Server, Semantic Mapping
  - common interface specification
- Metadata Repositories
  - Dublin Core Application Profile for NLE resources
- Application
  - data + components + processing instructions
  - declarative specification in XML
- Grid Service
  - computational and storage resources for application execution

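
The idea of an application as "data + components + processing instructions" with a declarative XML specification can be sketched as follows. The XML vocabulary (`application`, `data`, `component`), the three toy components and the `REGISTRY` lookup are all assumptions made for this example; the slides do not define a schema. The only point illustrated is that components sharing one interface can be chained purely from a declarative description.

```python
import xml.etree.ElementTree as ET

# Hypothetical application spec: data, then components in execution order.
SPEC = """
<application name="demo">
  <data>The ACL Anthology is mirrored in Melbourne</data>
  <component name="tokenize"/>
  <component name="lowercase"/>
  <component name="filter" min_length="4"/>
</application>
"""


# Common interface: every component maps a list of strings to a list of
# strings and accepts its options as keyword arguments.
def tokenize(items, **_):
    return [token for item in items for token in item.split()]


def lowercase(items, **_):
    return [item.lower() for item in items]


def filter_short(items, min_length="1", **_):
    return [item for item in items if len(item) >= int(min_length)]


REGISTRY = {"tokenize": tokenize, "lowercase": lowercase, "filter": filter_short}


def run_application(spec_xml):
    """Execute the components named in the spec, in document order."""
    root = ET.fromstring(spec_xml.strip())
    items = [root.findtext("data")]
    for component in root.findall("component"):
        options = dict(component.attrib)
        name = options.pop("name")
        items = REGISTRY[name](items, **options)
    return items


print(run_application(SPEC))
```

Because the specification names components rather than calling them, the same description could in principle be handed to a scheduler that decides where each step runs.
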
Architecture
Conclusion
- Natural Language Engineering
  - interesting test case for grid services
  - many mature component technologies
  - applications that are both data and processor intensive
  - applications for building the multilingual information society of the future...