
CLEF-2007 Cross-Language Speech Retrieval Track Overview
CU: Pavel Pecina, Jan Hajic, Petra Hoffmannova
DCU: Gareth Jones, Ying Zhang
UMD: Doug Oard, Dagobert Soergel, Scott Olsson
IBM: Bhuvana Ramabhadran
JHU: Bill Byrne (Cambridge), Zak Shafran (OHSU)
USC: Sam Gustman
UWB: Pavel Ircing

Speech “Retrieval” Evaluations
1996-1998: TREC SDR
  English broadcast news; English queries
1997-2004: TDT
  multilingual news; query by example
2003-2004: CLEF CL-SDR
  English broadcast news; multilingual queries
2005-2007: CLEF CL-SR
  English/Czech interviews; multilingual queries
2007: CLEF QAST
  English lectures/meetings; English questions

What’s New in 2007?
Czech
  Fixed “quickstart” time alignment problem!
  29 training topics, 42 new evaluation topics
  3 new teams (Brown, Chicago, Charles U)
English
  17% relative improvement over 2006 (TD, ASR)
  4 new teams (Brown, Chicago, Jaen, Amsterdam)
  Same topics and ASR as 2006
  63 training topics, 33 evaluation topics


English ASR
Systems shown (figure): ASR2003A, ASR2004A, ASR2006A
Training: 200 hours from 800 speakers

<DOC>
<DOCNO>VHF00009-056154.003</DOCNO>
<INTERVIEWDATA> Sidonia L... | 1927 | Shaindl | L... | Sydzia </INTERVIEWDATA>
<NAME>Issac L..., Cyla L... </NAME>
<MANUALKEYWORD> Shabbat | Jewish identity | customs and observances, Jewish | Przemysl (Poland) | food | Poland 1918 (November 11) - 1939 (August 31) </MANUALKEYWORD>
<SUMMARY>SL recounts her daily activities. She notes her family's Jewish identity and she talks about a typical Shabbat. SL describes cholent. </SUMMARY>
<ASRTEXT2003A>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of = em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home any …
<ASRTEXT2004A>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of = em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home …
<ASRTEXT2006A>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of = em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home any …
<ASRTEXT2006B>december when you talk about what your daily life was like well after realized i was a little girl going to public school during the day after works there were all the lessons and activities in the winter either an awful lot of ice skating and i was a very good ice skater figure i scared that then i would play with friends on the weekends there are things some of = em now are were my parents were religious that meant that on saturday i couldn't do very much i they went to the services i was hanging on outside was uh the kids making all the noise but afterwards we had to go home any …
<AUTOKEYWORD2004A1> cultural and social activities | customs and observances, Jewish | family life | food | Shabbat | sports and games | education | family homes | grandmothers | education, Jewish | Jewish-gentile relations | schools | synagogues | Polish (language) | working life | photographs (stills) 1930s | Poland 1918 (November 11) - 1939 (August 31) | Poland 1935 (May 13) - 1939 (August 31) | Cracow (Poland) | Germany 1918 (November 11) - 1939 (August 31) </AUTOKEYWORD2004A1>
<AUTOKEYWORD2004A2> Poland 1918 (November 11) - 1939 (August 31) | customs and observances, Jewish | education | cultural and social activities | extended family members | education, Jewish | family life | Jewish-gentile relations | Jewish identity | Hungary 1918 (November 11) - 1939 (August 31) | Shabbat | sports and games | Budapest (Hungary) | Poland 1941 (June 21) - 1944 (July 21) | synagogue attendance | Hungary 1939 (September 1) - 1944 (March 18) | food in the ghettos | forced labor in the ghettos | fate of loved ones | food </AUTOKEYWORD2004A2>
</DOC>
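A record like the one above can be split into named fields before indexing. The following is a minimal sketch, assuming each field of interest has a matching close tag and that keyword fields are pipe-separated as shown; the function names `parse_doc` and `split_keywords` are hypothetical and not part of any track tooling.

```python
import re

# Matches <FIELD>body</FIELD> pairs; the backreference requires the same tag name to close.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def parse_doc(record):
    """Return a dict of field name -> whitespace-normalized text for one <DOC> record."""
    wrapper = re.search(r"<DOC>(.*?)</DOC>", record, re.DOTALL)
    inner = wrapper.group(1) if wrapper else record
    return {name: " ".join(body.split()) for name, body in FIELD_RE.findall(inner)}

def split_keywords(value):
    """Split a pipe-separated keyword field into a list of thesaurus terms."""
    return [term.strip() for term in value.split("|") if term.strip()]

# Shortened sample built from the record shown above.
SAMPLE = """<DOC>
<DOCNO>VHF00009-056154.003</DOCNO>
<MANUALKEYWORD> Shabbat | Jewish identity | food </MANUALKEYWORD>
</DOC>"""

fields = parse_doc(SAMPLE)
keywords = split_keywords(fields["MANUALKEYWORD"])
```

Fields whose close tag is elided, such as the ASR text in the slide, are simply skipped by this sketch.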

An English Topic
Number: 1148
Title: Jewish resistance in Europe
Description: Provide testimonies or describe actions of Jewish resistance in Europe before and during the war.
Narrative: The relevant material should describe actions of only- or mostly Jewish resistance in Europe. Both individual and group-based actions are relevant. Type of actions may include survival (fleeing, hiding, saving children), testifying (alerting the outside world, writing, hiding testimonies), fighting (partisans, uprising, political security). Information about undifferentiated resistance groups is not relevant.
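TD runs form their queries automatically from the Title and Description fields of a topic. A minimal sketch of that step for the topic above, assuming simple lowercasing, a small illustrative stopword list, and de-duplication; `build_query` and `STOPWORDS` are hypothetical names, and participating systems used their own query processing.

```python
import re

# Illustrative stopword list (an assumption, not taken from any participant's system).
STOPWORDS = {"a", "an", "and", "of", "or", "in", "the",
             "before", "during", "provide", "describe"}

TOPIC = {
    "number": 1148,
    "title": "Jewish resistance in Europe",
    "description": ("Provide testimonies or describe actions of Jewish "
                    "resistance in Europe before and during the war."),
}

def build_query(topic, fields=("title", "description")):
    """Concatenate the chosen topic fields and return unique content terms in order."""
    text = " ".join(topic[name] for name in fields)
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token not in STOPWORDS and token not in terms:
            terms.append(token)
    return terms

td_query = build_query(TOPIC)
```

Passing `fields=("title",)` would give the corresponding title-only query.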

Automatic English TD Runs (figure; sites shown: Ottawa, DCU, Brown, Chicago, Amsterdam)

Automatic English TD Runs
Run ID          MAP     Lang  Query  Document Fields     Site
uoEnTDtQExF1    0.0855  EN    TD     AK1, AK2, ASR04     UO
uoEnTDtQExF2    0.0841  EN    TD     AK1, AK2, ASR04     UO
dcuEnTDauto     0.0787  EN    TD     AK1, AK2, ASR06B    DCU
brown.TD.auto   0.0785  EN    TD     AK1, AK2, ASR06B    BLLIP
UCkwENTD        0.0571  EN    TD     AK1, AK2, ASR06B    UC
UCbaseENTD1     0.0512  EN    TD     ASR06B              UC
UvA_2_en4g      0.0444  EN    TD     AK2, ASR06B         UVA
UvA_1_base      0.0430  EN    TD     ASR06B              UVA
AK1 = AUTOKEYWORD2004A1, AK2 = AUTOKEYWORD2004A2, ASR03 = ASRTEXT2003A, ASR04 = ASRTEXT2004A, ASR06A = ASRTEXT2006A, and ASR06B = ASRTEXT2006B.

Wilcoxon Signed-Rank Test
UO     0.0698 0.0015 0.0246 0.1437 0.3930 0.0045 0.0473 0.0099 0.1601 0.0644 0.4063 0.0701 0.1268 0.2947 0.0053 0.0155 0.0025 0.1049 0.0830 0.0858 0.1500 0.0144 0.0003 0.0280 0.2196 0.0116 0.0080 0.0013 0.1339 0.0043 0.0134 0.0382 0.0843 0.0855
DCU    0.2018 0.0019 0.0316 0.1212 0.3219 0.0001 0.0797 0.0088 0.1571 0.0244 0.2077 0.0709 0.1211 0.1225 0.0080 0.0052 0.0008 0.0813 0.1076 0.0795 0.1614 0.0048 0.0004 0.0110 0.1799 0.0444 0.0087 0.0012 0.1990 0.0022 0.0884 0.0657 0.0766 0.0787
BLLIP  0.0003 0.0002 0.0187 0.1191 0.3863 0.0001 0.0207 0.1706 0.0392 0.3546 0.0790 0.2185 0.2529 0.0064 0.0023 0.0006 0.0675 0.0941 0.0658 0.1351 0.0177 0.0014 0.0204 0.0782 0.0504 0.0120 0.0007 0.2290 0.0025 0.0324 0.0611 0.0539 0.0785
UC     0.0433 0.0022 0.0211 0.1722 0.2484 0.0004 0.0529 0.0050 0.1148 0.0186 0.1189 0.0375 0.1021 0.0714 0.0038 0.0152 0.0007 0.0664 0.0587 0.0574 0.1487 0.0052 0.0002 0.0113 0.1195 0.0489 0.0059 0.0008 0.1773 0.0018 0.0091 0.0591 0.0854 0.0571
UVA    0.2049 0.0014 0.0294 0.0219 0.0313 0.0002 0.0493 0.0043 0.1255 0.0205 0.0780 0.0187 0.1146 0.1771 0.0046 0.0114 0.0053 0.0838 0.0976 0.0386 0.0566 0.0043 0.0006 0.0078 0.0545 0.0102 0.0252 0.0003 0.1069 0.0045 0.0033 0.0234 0.0508 0.0444
UO ‒ DCU
  The number of nonzero tests is --> 33
  The sum of the signed rank is --> 86.000000
  The 95.0% level of confidence is --> 184.129807
  Methods cannot be separated.
UO ‒ BLLIP
  The number of nonzero tests is --> 33
  The sum of the signed rank is --> 147.000000
  The 95.0% level of confidence is --> 184.129807
  Methods cannot be separated.
DCU ‒ BLLIP
  The number of nonzero tests is --> 32
  The sum of the signed rank is --> 12.000000
  The 95.0% level of confidence is --> 175.945801
  Methods cannot be separated.
BLLIP ‒ UC
  The number of nonzero tests is --> 33
  The sum of the signed rank is --> 198.000000
  The 95.0% level of confidence is --> 184.129807
  Method 1 is better than method 2.
UC ‒ UVA
  The number of nonzero tests is --> 33
  The sum of the signed rank is --> 138.000000
  The 95.0% level of confidence is --> 184.129807
  Methods cannot be separated.
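The reported thresholds (184.129807 for 33 nonzero differences, 175.945801 for 32) are consistent with a normal approximation of the form z * sqrt(n(n+1)(2n+1)/6) with z = 1.645. The sketch below computes a signed-rank sum and that threshold under this assumption; the function names are hypothetical, average ranks for ties are an assumption, and this is not the script that produced the output above.

```python
import math

def signed_rank_sum(scores_a, scores_b):
    """Return (number of nonzero differences, sum of signed ranks) for paired scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Group tied absolute differences and give each the average rank (assumed tie rule).
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(r if d > 0 else -r for r, d in zip(ranks, diffs))
    return len(diffs), total

def critical_value(n, z=1.645):
    """Normal-approximation threshold for the signed-rank sum with n nonzero differences."""
    return z * math.sqrt(n * (n + 1) * (2 * n + 1) / 6)

def compare(scores_a, scores_b):
    """Report whether the two methods can be separated at the assumed threshold."""
    n, total = signed_rank_sum(scores_a, scores_b)
    limit = critical_value(n)
    if total > limit:
        return "Method 1 is better than method 2."
    if total < -limit:
        return "Method 2 is better than method 1."  # wording assumed by symmetry
    return "Methods cannot be separated."
```

With 33 nonzero differences the threshold evaluates to about 184.13, so a signed-rank sum of 198 (BLLIP vs. UC) exceeds it while 86, 147, and 138 do not.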

Monolingual vs. Cross-lingual (Automatic TD Runs)
Site  Document Fields    English  French  Spanish  Dutch
UO    AK1, AK2, ASR04    0.0855   71%     72%      ‒
DCU   AK1, AK2, ASR06B   0.0787   81%     ‒        ‒
UC    AK1, AK2, ASR06B   0.0571   71%     ‒        ‒
UVA   AK2, ASR06B        0.0444   ‒       ‒        90%

CLEF 2007: The CL-SR Czech Track
Pavel Pecina
pecina@ufal.mff.cuni.cz

What’s Different in Czech?
Lack of manual interview annotation
  no topic boundaries (start and stop times)
  no description (summary, assessors' scratchpad)
  English labels (assigned thesaurus terms) not used
Unknown-boundary relevance assessment
  Task: to identify appropriate replay points
  manual labeling of start and stop times
  focus on start time, stop times ignored
  modified mGAP used as the evaluation measure
  penalization for inexact matches

Interviews
Czech Holocaust survivors testimonies
  357 mostly seen speakers, ~565 hours
  35% ASR mean Word Error Rate
2007 Quickstart collection
  11377 automatically generated overlapping passages
  average passage duration 3.75 min, 33% overlap
  fields: DOCNO, INTERVIEWDATA, ASRSYSTEM, CHANNEL, ASRTEXT
  no thesaurus terms used in 2007
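A passage length of 3.75 min (225 sec) with one-third overlap implies a new passage every 150 sec. A minimal sketch of such windowing, assuming fixed-length time windows clipped at the end of the interview; `passage_windows` is a hypothetical name and the actual collection builder may have segmented differently (for example on word or pause boundaries).

```python
def passage_windows(duration_s, length_s=225.0, step_s=150.0):
    """Return (start, end) times in seconds of overlapping passages over one interview."""
    windows = []
    start = 0.0
    while start < duration_s:
        # Clip the last window so it does not run past the end of the recording.
        windows.append((start, min(start + length_s, duration_s)))
        start += step_s
    return windows

# Fraction of each full-length passage shared with the next one: (225 - 150) / 225.
overlap_fraction = (225.0 - 150.0) / 225.0

ten_minute_example = passage_windows(600.0)
```

For a ten-minute recording this yields four passages starting at 0, 150, 300, and 450 sec.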

Interview Usage (figure: portions of English and Czech interviews used for ASR training and IR evaluation; x-axis: minutes from interview start, 0 to 180; speaker labels: 335 seen, 22 unseen)

<DOCNO>VHF04106-7401.30</DOCNO>
<INTERVIEWDATA>Tommy K...-K... </INTERVIEWDATA>
<ASRSYSTEM>2006</ASRSYSTEM>
<CHANNEL>right</CHANNEL>
<ASRTEXT>PŘIVEZLI VĚZNĚ NA NOSÍTKÁCH A PROSTĚ MUSEL TADY BYL SKUTEČNĚ KAŽDÝ KDO TO V TEREZÍNĚ V TÉ DOBĚ BYL ALE STEJNĚ VÝSLEDEK BYL MYSLÍM ŽE MLUVÍ </s> <s> STOJÍM PROSTORU KDE </s> <s> AŽ DO KONCE KVĚTNA ČTYŘICET TŘI PŘICHÁZELY TRANSPORTY Z BOHUŠOVIC TEĎ JEŠTĚ CHODILI PĚŠKY ALE POTOM UŽ JE TO DEVĚT ČASTO MOJE TO ZACHOVALA ČÁST KOLEJÍ KTERÉ VEDLY AŽ TAK ZA HAMBURSKÝ KASÁRNA TAM POTOM UŽ OD KVĚTNA ČTYŘICET TŘI DOCHÁZELY VŠECHNY TRANSPORTY JAK Z VENKU TAKÉ OPAČNĚ KDYŽ TEREZÍNA ŠLY TRANSPORTY DO TAKOVÉHO </s> <s> ZA NÁMI TO BYLA JSOU HAMBURSKÝ KASÁRNA TAM JSOU HANNOVERSKÝCH KASÁRNÁCH TADY NA NA TOM O TOM PROSTORU PŘED TOU HROMADOU VĚCÍ MIMOŘÁDNĚ ŠPATNÉ I TADY TŘEBA MÍSTO KDE PŘICHÁZELY TRANSPORTY KDYŽ ČILI DIALOSTECKÝ DĚTI DVANÁCTSET NEBO KDYŽ POTOM CHODILY TRANSPORTY Z NĚMECKA TAK PROSTĚ NA TEN NA TĚCH KOLEJÍCH SE DĚLALO JAK TRANSPORTOVÁNI LIDI CO PŘICHÁZELI SEM TAK TRANSPORTOVÁNI TY VĚCI DÁL DO OSVĚTIMI NEJVĚTŠÍ TRANSPORTY SE TO ZÁŘÍ ČTYŘICET ČTYŘI KDY TAKÉ ODEŠEL MŮJ OTEC A BRATR TO UŽ BYLO ASI DESET TRANSPORTU PO TISÍCI MUŽÍCH TI ŽIDOVŠTÍ ČINNOST KTERÝ SE VELKÁ VĚTŠINA NEVRÁTIL </s> <s> TAK SEM PODÍVALA DĚLÁME NA BAŠTU TO JE ZAJÍMALI PROSTOR TÍM NĚJAKÝM OSOBNÍ VZPOMÍNKU </s> <s> NA TOM NAHOŘE TO BYLO ZAHRADNICTVÍ VLÁDNOUT ALE NĚJAKÝ DO DOBY NEŽ U TEN TAM BYL POSTAVENÝ DOMEČEK ALE BYLO TAM TAKÉ MALÝ V ODBOJI HŘIŠTĚ TAM JSME NIKDY NE- NĚKDY PŘESNĚ TO VÍM SEDMADVACÁTÉHO KVĚTNA ČTYŘICET ČTYŘI JSME TAM HRÁLI FOTBAL UTKÁNÍ SPARTA NEŽ JÁ VÍM SPARTA TO BYL KLUK Z KLUKŮ KTERÝ JSME BYDLELI TADY V HAMBURSKÝCH KASÁRNÁCH TAKŽE TEN PODVOZEK ALE NAPŘED BYLI MY JSME MĚLI SVÍČKU NEBO SE SPARTA A SESTRY STA SEDMNÁCT TY TO BYL DĚTSKÝ DOMOV KLUKŮ TAM MĚLI SE STALO PŘED VNUČKU NEŽ ALE MY JSME TEHDY DO UTKÁNÍ </s> <s> PROHRÁLI TŘI JEDNA ALE TO NENÍ PODSTATNÝ PODSTATNĚ TO ŽE TEN ZÁPAS A JÁ TADY DVACÁTÉHO SEDMÉHO KVĚTNA ČTYŘICET ČTYŘI A ŽE SE NA NĚ BYL JEŠTĚ PŮJDE OTEC PAK UŽ ODJEL DÁL A UŽ SEM U NÍ NEVĚDĚL </s> <s> KOUKÁME SE NÁM ÚZKÁ KASÁRNA TO JSOU KASÁRNA KTERÁ BYLA DŘEVĚNÁ PRO ŽENY TAK SEM TADY S MAMINKOU HNED V TOM ČTYŘICÁTÉM DRUHÉM JSME SEM PŘIŠLI TAK JSME TADY BYLI UBYTOVANÝ NA POKOJ MĚSTĚ ŠEST ŽE TO TYPICKY KASÁRENSKÝCH DOBU KASÁRENSKÝCH BUDOVAL S DVĚMA DVORY VELIKÝM </s> <s> A VPRAVO BYLO JSOU HANNOVERSKÝCH KASÁRNÁCH TA MOJE ČERNÁ PEKÁRNA TAM SE TAK CHLEBA I PRO TEREZÍN TO PRO NÁS A BYL TAM TAKÉ TAKOVÝ DVŮR ODJÍŽDĚLI ŽENY CO VOZILI NA TU DOBRANSKÝCH VOZECH VŠECHNO CO BYLO POTŘEBA TEREZÍNĚ DĚLAL VŠE SE VOZILO S- VOZY KTERÝ BYLI V TOM DVOŘE TAMHLE VZADU MAMINKA TAM BYL ZAMĚSTNÁN OSUDU HUNDERTSCHAFT </s> <s> JSME UMĚLY HAMBURSKÝCH KASÁREN PŘED TÍM JSME SE DÍVALI OTEC PO TOM PRVNÍM DVOŘE HAMBURSKÝ KASÁRNA KDE BYLI </s> <s> ÚPLNĚ TEDA VĚTŠINOU

Topics selection (figure)
Topic pool: 118 topics
  38  2005 English Training
  25  2005 English Evaluation
  33  2006 English Evaluation
  10  2006 Czech
   3  2007 Czech
   9  Unused Safety
2006 Czech Evaluation: 28 + 1 = 29
2007 Czech Training: 29
Possible 2007 Czech Evaluation (6 or more relevant passages identified during search-guided assessment): 40 + 10 = 50
2007 Czech Evaluation (highly-ranked assessment completed): 34 + 8 = 42

Evaluation Measure
based on the mean Generalized Average Precision
  human assessments are binary
  degree of match to the assessments can be partial
  penalization for non-100% match, up to 150 sec
  penalty function (figure): 1.0 at an exact match, 0.5 at 75 sec offset, 0.0 at 150 sec offset
quantization noise (scores lower than for English)
  15 sec assessment granularity
  quickstart documents begin every 150 sec
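A simplified sketch of the measure described above, assuming a symmetric linear penalty that falls from 1.0 at an exact match to 0.0 at 150 sec, that each reference start time can be rewarded at most once, and that generalized average precision sums the partial-credit precision at each rewarded rank. The function names are hypothetical and this is not the track's official scoring script.

```python
def reward(offset_s, width_s=150.0):
    """Partial credit for a start time that misses the reference by offset_s seconds."""
    return max(0.0, 1.0 - abs(offset_s) / width_s)

def generalized_ap(ranked_starts, reference_starts, width_s=150.0):
    """Generalized average precision for one topic (simplified, greedy matching)."""
    if not reference_starts:
        return 0.0
    unmatched = set(reference_starts)
    cumulative = 0.0
    total = 0.0
    for rank, start in enumerate(ranked_starts, start=1):
        # Pick the still-unmatched reference start time that gives the highest credit.
        best = max(unmatched, key=lambda ref: reward(start - ref, width_s), default=None)
        credit = reward(start - best, width_s) if best is not None else 0.0
        if credit > 0.0:
            unmatched.discard(best)
            cumulative += credit
            total += cumulative / rank
    return total / len(reference_starts)

def mean_gap(per_topic_runs):
    """Mean of generalized_ap over (ranked_starts, reference_starts) pairs."""
    scores = [generalized_ap(ranked, refs) for ranked, refs in per_topic_runs]
    return sum(scores) / len(scores) if scores else 0.0
```

A start time returned 75 sec away from the only reference point at rank 1 scores 0.5 under these assumptions, which illustrates why passages that begin only every 150 sec cap the attainable score.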

Relevance Judgements
performed by 6 relevance assessors in Prague
Search-guided assessment completed for 87 topics
  2156 rel. passages identified in the evaluation topics
Highly-ranked assessment completed for 42 topics
  pool depth set to 50 start times
  11896 highly-ranked start times checked (284/topic)
  233 rel. passages identified

Relevance Assessment Interface


Relevance Judgement Results


Relevance Judgement Statistics


Participation
Brown University (BLLIP): Matthew Lease, Eugene Charniak
Charles University (CUNI): Pavel Češka, Pavel Pecina
University of Chicago (UC): Gina-Anne Levow
University of West Bohemia (UWB): Pavel Ircing, Luděk Müller
total of 15 runs submitted
required condition: automatic queries from Title and Description

Results


Results: Term normalization
the effect of term normalization for handling Czech morphology is quite significant: 60-120% relative improvement

Alignment Issues in the Quickstart Collection
2006 data release (affected 2006 working notes)
  Time mismatch made mGAP uninformative (pauses ignored)
Post-CLEF 2006 evaluation (“corrected” in 2006 proceedings)
  Post-hoc start time correction (but missing tapes counted as 30 min)
  AUTO and MANUAL KEYWORDS still misaligned
2007 data release
  Some additional corrections for ASR timing
  AUTO and MANUAL KEYWORDS removed (too hard to fix)
2007 evaluation (reported in 2007 working notes)
  Missing-tape timing corrected post-hoc

Test Collection Release
CLEF CL-SR track test collections:
  Package for release
  Independent (cross-site) validation
  Deposit at ELDA
MALACH ASR training data:
  Package English and Czech for release
  With Polish, Russian, Slovak (+ maybe Hungarian)
  Deposit at LDC

What Did We Learn?
Searching conversational speech works
  Real user needs, two languages
Improving ASR helps less than expected
  Error rates vary by speaker
  Ranked retrieval prefers lower error rates
Automatic classification can help ASR
  At least if error rates are high
Unsegmented sources bring new challenges
  Cross-source alignment
  Evaluation measure design

Critiquing the Collection
Large for ASR is small for IR
  ~1,000 hours of speech = ~20,000 “documents”
No manual reference transcription
  Would cost ~$100,000
Interviews are just one type of conversation
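The approximate figures on this slide can be turned into per-unit numbers. A small worked calculation, assuming the ~$100,000 estimate refers to manually transcribing the full ~1,000 hours (the slide does not state this explicitly); the variable names are illustrative.

```python
# Approximate figures quoted on the slide.
hours_of_speech = 1000
documents = 20000
transcription_cost_usd = 100000  # assumed to cover all ~1,000 hours

# Average length of one "document" in minutes.
minutes_per_document = hours_of_speech * 60 / documents

# Implied transcription cost per hour of speech and per document.
cost_per_hour_usd = transcription_cost_usd / hours_of_speech
cost_per_document_usd = transcription_cost_usd / documents
```

Under these assumptions a document averages about three minutes of speech, and a reference transcript would cost on the order of $100 per hour or $5 per document.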