Mandarin-English Information (MEI): Johns Hopkins University Summer Workshop












Mandarin-English Information (MEI)
Johns Hopkins University Summer Workshop 2000
Presented at the TDT-3 Workshop, February 28, 2000
• Helen Meng, The Chinese University of Hong Kong
• Sanjeev Khudanpur, Johns Hopkins University
• Douglas W. Oard, University of Maryland
• Hsin-Min Wang, Academia Sinica, Taiwan

Outline
• Background
• The MEI Project
  – Multi-scale Retrieval
  – Multi-scale Translation
• Using the TDT-3 collection
• Schedule

Motivation
• Emerging speech retrieval applications
  – E.g., http://speechbot.research.compaq.com
• Increasing need for translingual audio search
  – 1896 Internet-accessible radio & TV stations
  – 529 of these (28%) are not in English
  Source: www.real.com

The Big Picture
[Diagram; labels: MEI, Translingual Audio Search, Translingual Audio Browsing, Select, Examine, English Query, Speech-to-Speech Translation, English Audio]

Related Work
• TREC Spoken Document Retrieval
  – Close coupling of recognition and retrieval
• TREC Cross-Language Retrieval
  – Close coupling of translation and retrieval
• TDT-3
  – Coupling recognition, translation and retrieval
  – Using baseline recognizer transcripts

The MEI Project
• Closely coupling recognition and translation
  – For the purpose of retrieval
• English text queries, Mandarin news audio
• Specific research issues:
  – Multi-scale retrieval
  – Multi-scale translation

Multi-scale Analysis of Mandarin
[Diagram: a Mandarin syllable decomposed at several scales; scale labels: Preme/Toneme, Preme/Core Final, Initial/Final; segments shown: /j/, /i/, /j/, /ng/, /ang/, /iang/]

Multi-scale Retrieval
• Subword-scale
  – Syllable lattice matching [Chen, Wang & Lee, 2000]
  – Overlapping syllable n-grams [Meng et al., 1999]
  – Skipped syllable pairs [Chen, Wang & Lee, 2000]
  – Syllable confusion matrix [Meng et al., 1999]
• Word-scale
  – Structured queries [Pirkola, 1998]
• Multi-scale
  – Unified retrieval using a merged feature set
  – Scale-optimized retrieval with result-set merging
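Two of the subword-scale features named above, overlapping syllable n-grams and skipped syllable pairs, can be illustrated with a minimal Python sketch. It assumes each story or query is already available as a list of toneless syllables; the function names, the `max_skip` parameter, and the exact definitions are illustrative assumptions and may differ from those in the cited papers.

```python
def overlapping_ngrams(syllables, n=2):
    """Return every run of n consecutive syllables as a tuple."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def skipped_pairs(syllables, max_skip=1):
    """Return pairs of syllables separated by 1..max_skip intervening syllables."""
    pairs = []
    for skip in range(1, max_skip + 1):
        for i in range(len(syllables) - skip - 1):
            pairs.append((syllables[i], syllables[i + skip + 1]))
    return pairs


def subword_features(syllables, n=2, max_skip=1):
    """Combine both feature types into one list of indexing terms."""
    return overlapping_ngrams(syllables, n) + skipped_pairs(syllables, max_skip)


# Hypothetical input: a four-syllable sequence with tones removed.
syllables = ["bei", "ai", "er", "lan"]
bigrams = overlapping_ngrams(syllables)
skips = skipped_pairs(syllables)
```

Skipped pairs still match when one syllable in the middle of a name is misrecognized or transliterated differently, which is one plausible reason to index them alongside plain n-grams.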

Why Multi-scale Retrieval?
• Word-based retrieval exploits lexical knowledge
  – Enhances precision
• Subword units achieve complete phonological coverage
  – Enhances recall
• Combination of evidence may beat either alone
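One simple way to combine evidence from the two scales is to merge the result sets of a word-scale run and a subword-scale run. The sketch below is an assumption for illustration, not the project's method: it min-max normalizes each run's scores and interpolates them linearly. The story IDs, scores, and the `word_weight` parameter are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(word_scores, subword_scores, word_weight=0.5):
    """Linearly interpolate normalized scores; a story missing from a run scores 0 there."""
    w = min_max_normalize(word_scores)
    s = min_max_normalize(subword_scores)
    merged = {
        doc: word_weight * w.get(doc, 0.0) + (1.0 - word_weight) * s.get(doc, 0.0)
        for doc in set(w) | set(s)
    }
    # Highest merged score first; ties broken by story ID for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from two independent retrieval runs.
word_run = {"s1": 3.0, "s2": 2.0, "s3": 1.0}
subword_run = {"s2": 0.8, "s3": 0.6, "s4": 0.4}
ranking = [doc for doc, _ in merge_results(word_run, subword_run)]
```

In this toy example the story retrieved by both runs (`s2`) ends up ranked above the story that only the word-scale run scored highest (`s1`), which is the kind of effect the last bullet anticipates.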

Multi-scale Translation
• Word-scale
  – Dictionary-based [Levow & Oard, 2000]
  – Parallel corpora [Nie, 1999]
  – Comparable corpora [Fung, 1998]
• Subword-scale
  – Cross-language phonetic map [Knight & Graehl, 1997]
    • /bei2 ai4 er3 lan2/
    • Kosovo (/ke1-sou3-wo4/, /ke1-sou3-fo2/, /ke1-sou3-fu1/, /ke1-sou3-fu2/)
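The Kosovo example lists several Mandarin renderings of one English name. The sketch below shows why comparing at the syllable scale tolerates such variation: it strips tone digits and measures a syllable-level edit distance between variants. This is only an illustration using plain edit distance as a stand-in, not the phonetic mapping model of Knight & Graehl; the function names are hypothetical, and the transcriptions are written with tone digits attached to their syllables.

```python
import re


def toneless(transcription):
    """Split a transcription such as '/ke1-sou3-wo4/' into syllables without tone digits."""
    parts = re.split(r"[-\s]+", transcription.strip("/ "))
    stripped = (re.sub(r"\d", "", part) for part in parts)
    return [syllable for syllable in stripped if syllable]


def syllable_edit_distance(a, b):
    """Levenshtein distance where the unit of comparison is a whole syllable."""
    prev = list(range(len(b) + 1))
    for i, syl_a in enumerate(a, start=1):
        curr = [i]
        for j, syl_b in enumerate(b, start=1):
            cost = 0 if syl_a == syl_b else 1
            curr.append(min(prev[j] + 1,          # delete syl_a
                            curr[j - 1] + 1,      # insert syl_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# The four variants listed on the slide.
variants = ["/ke1-sou3-wo4/", "/ke1-sou3-fo2/", "/ke1-sou3-fu1/", "/ke1-sou3-fu2/"]
reference = toneless(variants[0])
distances = [syllable_edit_distance(reference, toneless(v)) for v in variants]
```

Every variant is at most one syllable substitution away from the first, and the last two become identical once tones are dropped.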

Using the TDT-3 Collection
• English queries formed from topic descriptions
  – 2-4 words (simulated Web search)
  – Full topic description (simulated routing profile)
• Mandarin broadcast news audio (121 hours)
  – Story-boundary-known condition (4624 stories)
  – Baseline recognizer transcripts provide words

Schedule
[Timeline; axis: Dec, Feb, Apr, Jun, Aug; milestones: First MEI Team Planning Meeting, Second MEI Team Planning Meeting, Summer Workshop Planning Meeting; a span labeled "Six Weeks"]

Things We Need
• Ideas
  – To sharpen our focus
• Connections
  – To build a community of interest
• Resources
  – To build on what others have done