Mandarin-English Information (MEI): Johns Hopkins University Summer Workshop
- Slides: 13
Mandarin-English Information (MEI)
Johns Hopkins University Summer Workshop 2000
Presented at the TDT-3 Workshop, February 28, 2000
Helen Meng, The Chinese University of Hong Kong
Sanjeev Khudanpur, Johns Hopkins University
Douglas W. Oard, University of Maryland
Hsin-Min Wang, Academia Sinica, Taiwan
Outline
• Background
• The MEI Project
– Multiscale Retrieval
– Multiscale Translation
• Using the TDT-3 collection
• Schedule
Motivation
• Emerging speech retrieval applications
– E.g., http://speechbot.research.compaq.com
• Increasing need for translingual audio search
– 1896 Internet-accessible radio & TV stations
– 529 of these (28%) are not in English
Source: www.real.com
The Big Picture
[Diagram. Labels: MEI; Translingual Audio Search; Translingual Audio Browsing; Select; English Query; Speech-to-Speech Translation; Examine; English Audio]
Related Work
• TREC Spoken Document Retrieval
– Close coupling of recognition and retrieval
• TREC Cross-Language Retrieval
– Close coupling of translation and retrieval
• TDT-3
– Coupling recognition, translation and retrieval
– Using baseline recognizer transcripts
The MEI Project
• Closely coupling recognition and translation
– For the purpose of retrieval
• English text queries, Mandarin news audio
• Specific research issues:
– Multi-scale retrieval
– Multi-scale translation
Multi-scale Analysis of Mandarin
[Figure: a Mandarin syllable decomposed at several scales. Scale labels: Preme/Toneme; Preme/Core Final; Initial/Final. Unit labels: /j/, /i/, /j/, /ng/, /ang/, /iang/]
Multi-scale Retrieval
• Subword-scale
– Syllable lattice matching [Chen, Wang & Lee, 2000]
– Overlapping syllable n-grams [Meng et al., 1999]
– Skipped syllable pairs [Chen, Wang & Lee, 2000]
– Syllable confusion matrix [Meng et al., 1999]
• Word-scale
– Structured queries [Pirkola, 1998]
• Multi-scale
– Unified retrieval using a merged feature set
– Scale-optimized retrieval with result-set merging
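Two of the subword-scale features listed on this slide, overlapping syllable n-grams and skipped syllable pairs, can be illustrated with a minimal sketch. The function names and the `skip` parameter are hypothetical, and the definition of a skipped pair used here (two syllables separated by a fixed number of intervening syllables) is an assumption; the cited papers may define the features differently.

```python
from typing import List, Tuple


def syllable_ngrams(syllables: List[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Return all overlapping n-grams over a syllable sequence."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def skipped_pairs(syllables: List[str], skip: int = 1) -> List[Tuple[str, str]]:
    """Return pairs of syllables with exactly `skip` syllables between them.

    Assumption: this is one plausible reading of "skipped syllable pairs".
    """
    gap = skip + 1
    return [(syllables[i], syllables[i + gap]) for i in range(len(syllables) - gap)]


# Toneless syllables taken from the pinyin example in this deck.
syllables = ["bei", "ai", "er", "lan"]
print(syllable_ngrams(syllables, 2))
# [('bei', 'ai'), ('ai', 'er'), ('er', 'lan')]
print(skipped_pairs(syllables, 1))
# [('bei', 'er'), ('ai', 'lan')]
```

Features like these could be indexed in place of, or alongside, recognized words, so that a recognition error in one syllable removes only some of the matching features rather than the whole term.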
Why Multi-scale Retrieval?
• Word-based retrieval exploits lexical knowledge
– Enhances precision
• Subword units achieve complete phonological coverage
– Enhances recall
• Combination of evidence may beat either alone
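One simple way to combine evidence from a word-scale run and a subword-scale run is to normalize each run's scores and take a weighted sum. The sketch below assumes min-max normalization and linear interpolation; the slides do not say which merging method the project uses, and `merge_runs`, `word_weight`, and the story IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a run with identical scores maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_runs(
    word_run: Dict[str, float],
    subword_run: Dict[str, float],
    word_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Interpolate normalized scores; a story missing from a run scores 0 there."""
    w, s = min_max(word_run), min_max(subword_run)
    merged = {
        doc: word_weight * w.get(doc, 0.0) + (1.0 - word_weight) * s.get(doc, 0.0)
        for doc in set(w) | set(s)
    }
    # Highest score first; ties broken by story ID for a stable order.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative scores only, not results from the project.
word_run = {"story1": 2.0, "story2": 1.0}
subword_run = {"story2": 0.9, "story3": 0.3}
for story, score in merge_runs(word_run, subword_run, word_weight=0.4):
    print(story, round(score, 2))
```

A story retrieved only by the subword run (here `story3`) still appears in the merged list, which is how the subword scale can add recall to a word-based ranking.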
Multi-scale Translation
• Word-scale
– Dictionary-based [Levow & Oard, 2000]
– Parallel corpora [Nie, 1999]
– Comparable corpora [Fung, 1998]
• Subword-scale
– Cross-language phonetic map [Knight & Graehl, 1997]
• /bei2 ai4 er3 lan2/
• Kosovo (/ke1-sou3-wo4/, /ke1-sou3-fo2/, /ke1-sou3-fu1/, /ke1-sou3-fu2/)
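The Kosovo example shows one name with several Mandarin renderings that differ only in the final syllable or its tone. A minimal sketch of why syllable-level matching helps: compute an edit distance over toneless syllables, so that variants land close to each other. This is not the cross-language phonetic map of Knight & Graehl; the helper names are hypothetical, and dropping tones is an assumption made only for illustration.

```python
import re
from typing import List


def split_syllables(pinyin: str) -> List[str]:
    """Split '/ke1-sou3-wo4/' into syllables with tone digits removed."""
    parts = re.split(r"[-\s]+", pinyin.strip().strip("/"))
    return [re.sub(r"\d", "", p) for p in parts if p]


def syllable_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance where each syllable is one symbol."""
    prev = list(range(len(b) + 1))
    for i, syl_a in enumerate(a, start=1):
        curr = [i]
        for j, syl_b in enumerate(b, start=1):
            cost = 0 if syl_a == syl_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# The four renderings of "Kosovo" listed on the slide.
variants = ["/ke1-sou3-wo4/", "/ke1-sou3-fo2/", "/ke1-sou3-fu1/", "/ke1-sou3-fu2/"]
reference = split_syllables(variants[0])
for v in variants:
    print(v, syllable_edit_distance(reference, split_syllables(v)))
```

Under these assumptions every variant is at most one syllable substitution away from the others, whereas a word-level dictionary lookup would treat them as four unrelated entries.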
Using the TDT-3 Collection
• English queries formed from topic descriptions
– 2-4 words (simulated Web search)
– Full topic description (simulated routing profile)
• Mandarin broadcast news audio (121 hours)
– Story-boundary-known condition (4624 stories)
– Baseline recognizer transcripts provide words
Schedule
[Timeline figure. Axis: Dec, Feb, Apr, Jun, Aug. Milestone labels: First MEI Team Planning Meeting; Second MEI Team Planning Meeting; Summer Workshop Planning Meeting. Additional label: "Six Weeks"]
Things We Need
• Ideas
– To sharpen our focus
• Connections
– To build a community of interest
• Resources
– To build on what others have done