Mandarin-English Information (MEI) Johns Hopkins University Summer Workshop

Mandarin-English Information (MEI)
Johns Hopkins University Summer Workshop 2000
Presented at the TDT-3 Workshop, February 28, 2000
Helen Meng (The Chinese University of Hong Kong)
Sanjeev Khudanpur (Johns Hopkins University)
Douglas W. Oard (University of Maryland)
Hsin-Min Wang (Academia Sinica, Taiwan)

Outline
• Background
• The MEI Project
  – Multiscale Retrieval
  – Multiscale Translation
• Using the TDT-3 collection
• Schedule

Motivation
• Emerging speech retrieval applications
  – E.g., http://speechbot.research.compaq.com
• Increasing need for translingual audio search
  – 1896 Internet-accessible radio & TV stations
  – 529 of these (28%) are not in English
  Source: www.real.com

The Big Picture
[Diagram; labels: MEI, English Query, Translingual Audio Search, Select, Translingual Audio Browsing, Speech-to-Speech Translation, Examine, English Audio]

Related Work
• TREC Spoken Document Retrieval
  – Close coupling of recognition and retrieval
• TREC Cross-Language Retrieval
  – Close coupling of translation and retrieval
• TDT-3
  – Coupling recognition, translation and retrieval
  – Using baseline recognizer transcripts

The MEI Project
• Closely coupling recognition and translation
  – For the purpose of retrieval
• English text queries, Mandarin news audio
• Specific research issues:
  – Multi-scale retrieval
  – Multi-scale translation

Multi-scale Analysis of Mandarin
[Figure: a Mandarin syllable decomposed at three scales (Initial/Final, Preme/Core Final, Preme/Toneme); unit labels shown: /j/, /i/, /j/, /ng/, /ang/, /iang/]

Multi-scale Retrieval
• Subword-scale
  – Syllable lattice matching [Chen, Wang & Lee, 2000]
  – Overlapping syllable n-grams [Meng et al., 1999]
  – Skipped syllable pairs [Chen, Wang & Lee, 2000]
  – Syllable confusion matrix [Meng et al., 1999]
• Word-scale
  – Structured queries [Pirkola, 1998]
• Multi-scale
  – Unified retrieval using a merged feature set
  – Scale-optimized retrieval with result-set merging
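Two of the subword-scale features named above, overlapping syllable n-grams and skipped syllable pairs, can be illustrated with a minimal Python sketch. The function names, the feature prefixes ("w:", "s:", "k:") and the default parameters are hypothetical choices made for illustration; the slides do not specify how the cited systems build or index these features.

```python
from typing import List, Tuple


def overlapping_ngrams(syllables: List[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive syllables (overlapping windows)."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def skipped_pairs(syllables: List[str], skip: int = 1) -> List[Tuple[str, str]]:
    """Return pairs of syllables separated by exactly `skip` intervening syllables."""
    step = skip + 1
    return [(syllables[i], syllables[i + step]) for i in range(len(syllables) - step)]


def merged_features(words: List[str], syllables: List[str]) -> List[str]:
    """Hypothetical merged feature set: word terms plus subword terms in one list.

    The prefixes keep the scales apart so a single index could hold all of them.
    """
    features = ["w:" + w for w in words]
    features += ["s:" + "_".join(g) for g in overlapping_ngrams(syllables, 2)]
    features += ["k:" + "_".join(p) for p in skipped_pairs(syllables, 1)]
    return features


# Tone-stripped syllables of the example string /bei2 ai4 er3 lan2/.
example = ["bei", "ai", "er", "lan"]
bigrams = overlapping_ngrams(example, 2)
pairs = skipped_pairs(example, 1)
```

Skipped pairs still match when a single syllable in the middle of a name is misrecognized, which is one plausible reason to index them alongside contiguous n-grams.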

Why Multi-scale Retrieval?
• Word-based retrieval exploits lexical knowledge
  – Enhances precision
• Subword units achieve complete phonological coverage
  – Enhances recall
• Combination of evidence may beat either alone
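One simple way to combine evidence from a word-scale run and a subword-scale run is to normalize each run's scores and interpolate them. The sketch below assumes min-max normalization and a linear weight `alpha`; both are illustrative assumptions, since the slides do not say how result sets would be merged.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so that runs are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents tie; treat them as equally strong evidence.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_result_sets(
    word_scores: Dict[str, float],
    subword_scores: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight on the word-scale run
) -> List[Tuple[str, float]]:
    """Linearly interpolate two normalized runs; missing documents score 0."""
    w, s = min_max(word_scores), min_max(subword_scores)
    merged = {
        doc: alpha * w.get(doc, 0.0) + (1.0 - alpha) * s.get(doc, 0.0)
        for doc in set(w) | set(s)
    }
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy scores for three hypothetical story ids.
ranking = merge_result_sets({"a": 2.0, "b": 1.0}, {"b": 4.0, "c": 2.0}, alpha=0.25)
```

A story retrieved only by the subword run still enters the merged list, which is where a recall gain would come from; `alpha` would need to be tuned on held-out topics.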

Multi-scale Translation
• Word-scale
  – Dictionary-based [Levow & Oard, 2000]
  – Parallel corpora [Nie, 1999]
  – Comparable corpora [Fung, 1998]
• Subword-scale
  – Cross-language phonetic map [Knight & Graehl, 1997]
    • /bei2 ai4 er3 lan2/
    • Kosovo (/ke1-sou3-wo4/, /ke1-sou3-fo2/, /ke1-sou3-fu1/, /ke1-sou3-fu2/)
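The Kosovo example shows why subword-scale matching helps with names: the listed renderings differ in only one syllable or one tone. The sketch below measures this with a tone-stripped, syllable-level edit distance. It is a simplified stand-in chosen for illustration, not the cross-language phonetic mapping of Knight & Graehl, and the function names are hypothetical.

```python
import re
from typing import List


def parse_syllables(pinyin: str) -> List[str]:
    """Split a string such as '/ke1-sou3-wo4/' into tone-stripped syllables."""
    body = re.sub(r"\d", "", pinyin.strip("/"))  # drop tone digits
    return [s for s in re.split(r"[-\s]+", body) if s]


def syllable_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance where the edit unit is a whole syllable."""
    prev = list(range(len(b) + 1))
    for i, syl_a in enumerate(a, start=1):
        curr = [i]
        for j, syl_b in enumerate(b, start=1):
            cost = 0 if syl_a == syl_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# The four renderings of "Kosovo" listed on the slide.
variants = ["/ke1-sou3-wo4/", "/ke1-sou3-fo2/", "/ke1-sou3-fu1/", "/ke1-sou3-fu2/"]
reference = parse_syllables(variants[0])
distances = [syllable_edit_distance(reference, parse_syllables(v)) for v in variants]
```

A retrieval system could accept any rendering within a small distance of the expected one; a syllable confusion matrix, as cited under subword-scale retrieval, would allow graded rather than 0/1 substitution costs.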

Using the TDT-3 Collection
• English queries formed from topic descriptions
  – 2-4 words (simulated Web search)
  – Full topic description (simulated routing profile)
• Mandarin broadcast news audio (121 hours)
  – Story-boundary-known condition (4624 stories)
  – Baseline recognizer transcripts provide words

Schedule
[Timeline figure; axis ticks: Dec, Feb, Apr, Jun, Aug. Labeled events: First MEI Team Planning Meeting; Second MEI Team Planning Meeting; Summer Workshop Planning Meeting; "Six Weeks:" (label incomplete in the source)]

Things We Need
• Ideas
  – To sharpen our focus
• Connections
  – To build a community of interest
• Resources
  – To build on what others have done