Some Thoughts on HPC in Natural Language Engineering

Steven Bird
University of Melbourne & University of Pennsylvania

Sponsorship
Natural Language Engineering: Integrating Parallel and Parametric Processing
Victorian Partnership for Advanced Computing Expertise Grant EPPNME 092.2003

NLE Application Areas
  • Spoken dialogue systems
  • Cross-language information retrieval
  • Word-sense disambiguation
  • Multi-document summarisation
  • Natural language database interfaces
  • Information Extraction
  • Information Retrieval
  • Authoring Tools
  • Language Analysis
  • Language Understanding
  • Knowledge Representation
  • Knowledge Discovery
  • Spoken Language Input
  • Written Language Input
  • Natural Language Generation
  • Spoken Output
  • Multilinguality
  • Multimodality
  • Discourse and Dialogue

Some NLE Applications in detail
  • Information extraction from broadcast news
      - Tokenization, alignment, entity detection, coreference resolution, semantic mapping
  • Spoken language dialogue systems (SLDS)
      - Speech recognition, parsing, user modelling, discourse management, generation, synthesis
  • Language analysis
      - Interlinear text annotation, lexicon development, morphosyntactic grammar development

Meta Activities
  • Discovery
      - What tools work with data in format X?
      - What lexical resources exist for language Y?
  • Reuse
      - Diverse implementation frameworks
      - Component integration, wrapping, etc.
  • Training and evaluation
      - Parametric and parallel processing
      - Comparing systems running on the same data
      - Gold standard vs theory comparison
      - Analyzing interaction logs
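
The "parametric and parallel processing" and "gold standard" items can be illustrated with a small parameter sweep. This is only a sketch under stated assumptions: the tagger, its two parameters, and the five-token gold standard are invented for illustration, and a local thread pool stands in for whatever grid scheduler would actually distribute the runs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical gold standard: (token, is_entity) pairs, invented for this sketch.
GOLD = [("Melbourne", True), ("hosts", False), ("ACL", True),
        ("a", False), ("Mirror", False)]

def toy_tagger(token, min_len, require_capital):
    """Toy entity detector: flags a token by length and capitalisation."""
    if len(token) < min_len:
        return False
    return token[0].isupper() if require_capital else True

def accuracy(params):
    """Score one parameter setting against the gold standard."""
    min_len, require_capital = params
    correct = sum(
        toy_tagger(token, min_len, require_capital) == label
        for token, label in GOLD
    )
    return correct / len(GOLD)

def sweep(min_lens, capital_options, workers=4):
    """Evaluate every parameter combination.

    Each run is independent of the others, which is what makes the sweep
    easy to spread over many processors.
    """
    grid = list(product(min_lens, capital_options))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(accuracy, grid))
    return dict(zip(grid, scores))

results = sweep(min_lens=[1, 3, 7], capital_options=[True, False])
for setting, score in sorted(results.items()):
    print(setting, score)
```

Because every setting is scored on the same data, the resulting table also serves the "comparing systems running on the same data" item.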

Learn about NLE
  • This department hosts a mirror of the ACL digital anthology
  • 50k pages, 40 years
  • http://www.cs.mu.oz.au/acl/

SLDS Architecture

SLDS Components

Another SLDS Architecture

Observations
  • Common components, different arrangements
      - Multiple components for doing the same task
  • Most NLE components convert between information types
      - Parser: from strings to trees
      - ASR: from speech to text
      - Summariser: from text to selected text
  • But:
      - Many processes benefit from other information sources (e.g. exploiting intonation in input)
      - Input and output can be aligned
      - Solution: multilayer annotations
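
The idea that components are converters between information types, and that the same components can be arranged differently, can be sketched as plain function composition. The toy parser and summariser below are hypothetical stand-ins, not real NLE components; only their input and output types follow the slide.

```python
from typing import Callable, List, Tuple

# A tree is a (label, children) pair; leaves carry a one-element list of text.
Tree = Tuple[str, list]

def toy_parser(sentence: str) -> Tree:
    """Strings to trees: a flat tree with one leaf per whitespace token."""
    return ("S", [("TOK", [word]) for word in sentence.split()])

def toy_summariser(text: str, max_sentences: int = 1) -> str:
    """Text to selected text: keep only the first few sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def compose(*stages: Callable) -> Callable:
    """Chain components so each one consumes the previous one's output."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run

# One possible arrangement: summarise first, then parse what was selected.
summarise_then_parse = compose(toy_summariser, toy_parser)
tree = summarise_then_parse("Grids are useful. They hide resources.")
print(tree)
```

A chain like this passes along only each stage's output, which is the limitation the "But:" items point at: information such as intonation is lost unless every layer is kept and aligned.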

Multilayer annotations

Multilayer Annotations

Annotation Graphs
  • Labelled digraphs with timestamped nodes
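
A labelled digraph with timestamped nodes can be sketched in a few lines. This is a minimal illustration and not the AGTK API: the class names, the `layer` field used to separate annotation layers, and the sample times are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Arc:
    src: str     # id of the start node
    dst: str     # id of the end node
    layer: str   # annotation layer, e.g. "word" or "phrase" (assumed field)
    label: str   # the annotation itself

@dataclass
class AnnotationGraph:
    nodes: Dict[str, float] = field(default_factory=dict)  # node id -> time in seconds
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, time: float) -> None:
        self.nodes[node_id] = time

    def add_arc(self, src: str, dst: str, layer: str, label: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.arcs.append(Arc(src, dst, layer, label))

    def layer(self, name: str) -> List[Arc]:
        """All arcs belonging to one annotation layer."""
        return [arc for arc in self.arcs if arc.layer == name]

    def span(self, arc: Arc) -> Tuple[float, float]:
        """Start and end time of an arc, read off its two nodes."""
        return (self.nodes[arc.src], self.nodes[arc.dst])

g = AnnotationGraph()
for node_id, time in [("n0", 0.0), ("n1", 0.4), ("n2", 0.9)]:
    g.add_node(node_id, time)
g.add_arc("n0", "n1", "word", "hello")
g.add_arc("n1", "n2", "word", "world")
g.add_arc("n0", "n2", "phrase", "greeting")  # second layer over the same nodes

print([arc.label for arc in g.layer("word")])
print(g.span(g.layer("phrase")[0]))
```

Because both layers share the same nodes, the word and phrase annotations are aligned in time without either layer referring to the other.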

Annotation Graphs: complex example
  • AGTK: Annotation Graph Toolkit
      - library, applications
      - agtk.sourceforge.net

NLE and Grids
  • NLE Applications
      - typically constructed out of numerous components
      - each component responsible for a specialised task
      - executed against large data sets
  • To use grids in NLE:
      - subscribe to a model which allows automated discovery of data and components
      - flexible design of applications, coordination of execution, storage of results
  • Ideally:
      - view grid as a commodity, hidden from application developers
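
Automated discovery of components might look like the following sketch: components advertise what they consume and produce, and an application is assembled by searching for a chain between two formats. The registry class, the format names, and the component names are hypothetical; the slides do not specify a discovery mechanism.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class ComponentRecord:
    name: str
    consumes: str   # input format (hypothetical label)
    produces: str   # output format (hypothetical label)

class Registry:
    def __init__(self) -> None:
        self._records: List[ComponentRecord] = []

    def register(self, record: ComponentRecord) -> None:
        self._records.append(record)

    def find_consumers(self, fmt: str) -> List[str]:
        """Answer 'what tools work with data in format X?'."""
        return [r.name for r in self._records if r.consumes == fmt]

    def plan(self, source: str, target: str) -> Optional[List[str]]:
        """Breadth-first search for the shortest component chain from source to target."""
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            fmt, path = queue.popleft()
            if fmt == target:
                return path
            for r in self._records:
                if r.consumes == fmt and r.produces not in seen:
                    seen.add(r.produces)
                    queue.append((r.produces, path + [r.name]))
        return None  # no chain of registered components connects the formats

registry = Registry()
registry.register(ComponentRecord("asr", "audio", "text"))
registry.register(ComponentRecord("tokenizer", "text", "tokens"))
registry.register(ComponentRecord("summariser", "text", "text"))
registry.register(ComponentRecord("parser", "tokens", "tree"))

print(registry.find_consumers("text"))
print(registry.plan("audio", "tree"))
```

Where each planned step actually runs is left out on purpose, in keeping with the aim of hiding the grid from application developers.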

Architectural Components
  • Data
      - Language resources for analysis
      - E.g. Switchboard, 2400 annotated telephone conversations (26 CDs)
  • Software Components
      - minimal individual functional units (e.g. Annotation Server, Alignment, ASR, Data Source Packaging, Format Conversion, Text Annotation, Lexicon Server, Semantic Mapping)
      - common interface specification
  • Metadata Repositories
      - Dublin Core Application Profile for NLE resources
  • Application
      - data + components + processing instructions
      - declarative specification in XML
  • Grid Service
      - computational and storage resources for application execution
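
A declarative application specification (data + components + processing instructions) could be read as in the sketch below. The slides do not give the XML schema, so every element and attribute name here (`application`, `data`, `component`, `param`, `store`) and every value is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification; the schema is invented for this sketch.
SPEC = """
<application name="entity-extraction-demo">
  <data source="switchboard-sample"/>
  <component name="tokenizer"/>
  <component name="entity-detector">
    <param name="threshold" value="0.5"/>
  </component>
  <store target="results/entities"/>
</application>
"""

def load_spec(xml_text):
    """Turn the XML specification into a plain dictionary an executor could walk."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    for component in root.findall("component"):
        params = {p.get("name"): p.get("value") for p in component.findall("param")}
        steps.append((component.get("name"), params))
    return {
        "name": root.get("name"),
        "data": root.find("data").get("source"),
        "steps": steps,  # components in document order
        "store": root.find("store").get("target"),
    }

app = load_spec(SPEC)
print(app["name"], app["data"])
for name, params in app["steps"]:
    print(name, params)
```

Keeping the specification declarative means it says what to run on which data, while the grid service remains free to decide where and when each step executes.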

Architecture

Conclusion
  • Natural Language Engineering
      - interesting test case for grid services
      - many mature component technologies
      - applications that are both data and processor intensive
      - applications for building the multilingual information society of the future...