Novelty Detection in Repeated MEAD Summarization
Richard Murphy

EECS 597, 06 December 2002

The Problem with MEAD
• Works well for one-time summaries
  - Summaries produced are readable and fairly informative
• News stories are ongoing, not one-time
  - New, relevant articles may appear after a cluster has been summarized
  - The expanded cluster will include new information
  - A second summary of the cluster will include a lot of already-known information
  - New information is often demoted, since it lies further from the cluster centroid
• Repeated summaries lose value
  - The reader can be assumed to remember past summaries
  - The most informative summary will focus on new information, with only brief repetition of key points
  - More repetition = less new information = less useful summary
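Why new information gets demoted can be seen with a simplified centroid score. MEAD's centroid feature rewards sentences whose words carry high weight in the cluster centroid; the sketch below is only a rough stand-in, assuming a centroid of raw word counts averaged over sentences (MEAD uses TF*IDF weights, which are omitted here). The function names and the four-sentence cluster are hypothetical.

```python
import re
from collections import Counter


def words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def centroid(sentences):
    """Average word-count vector over every sentence in the cluster."""
    total = Counter()
    for sentence in sentences:
        total.update(words(sentence))
    return {word: count / len(sentences) for word, count in total.items()}


def centroid_score(sentence, cent):
    """Sum of centroid weights of the sentence's distinct words (simplified)."""
    return sum(cent.get(word, 0.0) for word in set(words(sentence)))


# Hypothetical cluster: three early sentences share vocabulary,
# one later sentence carries new information.
cluster = [
    "A small plane hit a skyscraper in Milan.",
    "The plane hit the Pirelli skyscraper in Milan.",
    "A plane hit a Milan skyscraper on Thursday.",
    "Investigators later ruled out terrorism.",
]
cent = centroid(cluster)
for sentence in cluster:
    print(f"{centroid_score(sentence, cent):.2f}  {sentence}")
```

Because the early sentences dominate the centroid, the sentence with new information scores 1.25, against 4.25 to 4.75 for the others.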

Example: two MEAD summaries of a news cluster on the April 18, 2002 plane crash in Milan

First summary:
[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.
[3] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[4] Several storeys of the building were engulfed in fire, she said.
[5] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[6] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[7] The building houses government offices and is next to the city's central train station.

Second summary, after new articles were added to the cluster:
[1] CNN.com - Plane hits skyscraper in Milan - April 18, 2002
[2] The crash by the Piper tourist plane into the 26th floor occurred at 5:50 p.m. (1450 GMT) on Thursday, said journalist Desideria Cavina.
[3] The building houses government offices and is next to the city's central train station.
[4] Italian TV says the crash put a hole in the 25th floor of the Pirelli building, and that smoke is pouring from the opening.
[5] U.N. envoy horror at Jenin camp U.S. bombing kills Canadians Chinese missiles concern U.S. 2002 Cable News Network LP, LLLP.
[6] The Pirelli Building in Milan, Italy, was hit by a small plane.
[7] (ABCNEWS.com) - A small plane crashed into a skyscraper in downtown Milan today, setting several floors of the 30-story building on fire.
[8] The plane crashed into the 25th floor of the Pirelli building in downtown Milan.
[9] A small airplane crashed into a government building in heart of Milan, setting the top floors on fire, Italian police reported.
[10] WITNESSES REPORTED hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station.
[11] Italian state television said the crash put a hole in the 25th floor of the Pirelli building.
[12] CNNenEspanol.com A small plane has hit a skyscraper in central Milan, setting the top floors of the 30-story building on fire, an Italian journalist told CNN.

Six of the twelve sentences in the second summary ([1] to [5] and [12]) repeat sentences from the first summary verbatim.

Solution: MEAD with a memory
• Save summaries along with cluster information
• When summarizing a cluster in the future, check for archived summaries
• During reranking, compare candidate sentences to sentences in old summaries
  - The existing default-reranker.pl module compares sentences in the summary to each other using a cosine similarity metric and eliminates those that are too similar to other sentences in the summary
  - After this process, use cosine similarity to demote sentences in the new summary that are too similar to sentences in an old summary
  - Don't completely eliminate sentences similar to known information: if the user requests a large enough summary, "background" (already seen) information should still appear, lower in the new summary
• User-specific
  - In a MEAD-based system like NewsInEssence, users could log in to get updated summaries of ongoing stories
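The two-stage reranking described on this slide can be sketched as follows. This is a minimal Python illustration, not MEAD's Perl default-reranker.pl: sentence vectors are assumed to be plain lowercased word counts, the function names are hypothetical, and the within-summary threshold `redundancy_cutoff` is an assumed value. The demotion cutoff (0.7) and demotion amount (0.1) follow the settings reported with the results.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; a simplification of MEAD's sentence vectors."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_with_memory(scored, old_summary, redundancy_cutoff=0.7,
                       demote_cutoff=0.7, demotion=0.1):
    """Order candidate sentences for a repeated summary, best first.

    scored: list of (sentence, score) pairs from the scoring stage.
    old_summary: sentences from summaries the user has already seen.
    redundancy_cutoff is hypothetical; demote_cutoff and demotion match
    the reported settings.
    """
    # Stage 1 (like the existing reranker): drop a sentence that is too
    # similar to a higher-scoring sentence already kept.
    kept = []
    for sentence, score in sorted(scored, key=lambda p: p[1], reverse=True):
        vec = bag_of_words(sentence)
        if all(cosine(vec, kv) < redundancy_cutoff for _, _, kv in kept):
            kept.append((sentence, score, vec))
    # Stage 2 (memory): demote, never drop, sentences close to an old
    # summary, so already-seen background can still appear lower down.
    old_vecs = [bag_of_words(s) for s in old_summary]
    adjusted = []
    for sentence, score, vec in kept:
        if any(cosine(vec, old) >= demote_cutoff for old in old_vecs):
            score -= demotion
        adjusted.append((sentence, score))
    adjusted.sort(key=lambda p: p[1], reverse=True)
    return [sentence for sentence, _ in adjusted]


old = ["The building houses government offices."]
candidates = [
    ("The building houses government offices.", 0.90),
    ("Witnesses heard a loud explosion.", 0.85),
    ("Witnesses heard a loud explosion!", 0.50),  # near-duplicate, dropped
]
print(rerank_with_memory(candidates, old))
```

Without an old summary the 0.90 sentence would lead; with one, it drops to 0.80 and falls behind the new sentence, but it is kept, so a long enough summary still includes it.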

Evaluating Multiple Summaries
• Evaluation of a single (first) summary
  - Create a manual extract from the current cluster
  - Run meadeval.pl to calculate the precision, recall, and kappa of the automated summary
• Evaluation of subsequent summaries
  - Create a manual extract from the current cluster and the past automated summaries (not past manual summaries, since the reader will have seen the automated output)
  - Run meadeval.pl
• Always use the cluster that was available to MEAD at the time of automated summarization
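The slides name meadeval.pl but not its formulas. As a hedged sketch, the hypothetical function below scores an automated extract against a manual one with the standard definitions: precision and recall over selected sentence IDs, and Cohen's kappa over the include/exclude decision for every sentence in the cluster. meadeval.pl's own computation may differ in detail. The same function applies to first and later summaries; only the way the manual extract is built changes.

```python
def evaluate_extract(auto, manual, cluster_size):
    """Compare an automated extract to a manual one.

    auto, manual: sets of sentence IDs chosen from the cluster.
    cluster_size: total number of sentences in the cluster MEAD summarized.
    Returns (precision, recall, kappa).
    """
    overlap = len(auto & manual)
    precision = overlap / len(auto)
    recall = overlap / len(manual)
    # Cohen's kappa: each cluster sentence is labelled "in" or "out"
    # by two judges (the system and the human).
    both_out = cluster_size - len(auto | manual)
    p_observed = (overlap + both_out) / cluster_size
    p_auto = len(auto) / cluster_size
    p_manual = len(manual) / cluster_size
    p_expected = p_auto * p_manual + (1 - p_auto) * (1 - p_manual)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return precision, recall, kappa


# Hypothetical example: 100-sentence cluster, 7-sentence extracts sharing 4.
auto = {0, 1, 2, 3, 4, 5, 6}
manual = {0, 1, 2, 3, 40, 41, 42}
print(evaluate_extract(auto, manual, cluster_size=100))
```

With these hypothetical sizes, precision and recall are both 4/7 and kappa is about 0.539. In this formulation kappa depends on `cluster_size`, so scoring against a different cluster than the one MEAD actually summarized would change the result.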

Comparing MEAD to MEAD with memory

System             Summary   Precision     Recall        Kappa
Default MEAD       Initial   0.571428571   0.571428571   0.539170506912442
Default MEAD       Second    0.25          0.25          0.14772727273
Default MEAD       Third     0.083333333   0.083333333   -0.0416666663
MEAD with memory   Initial   0.571428571   0.571428571   0.539170506912442
MEAD with memory   Second    0.33333333    0.33333333    0.242424242
MEAD with memory   Third     0.83333333    0.83333333    0.81060606

Settings for MEAD with memory: demote when cosine similarity >= 0.7; demote by 0.1 points.
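Precision and recall are identical in every row. The slides do not state extract lengths, but this is what the definitions predict whenever the automated extract A and the manual extract M contain the same number of sentences (an assumption here):

```latex
P = \frac{|A \cap M|}{|A|}, \qquad
R = \frac{|A \cap M|}{|M|}, \qquad
|A| = |M| \;\Longrightarrow\; P = R
```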

Remaining / Future Work
• More testing
  - More test clusters
  - Different values of the demotion increment and the demotion similarity cutoff
• Command-line options for demotion settings
• Varying levels of demotion based on the position of the matching sentence in the old summary
• Multiple users
  - Currently assumes a cluster belongs to an individual user
  - Add command-line identification of the user so that multiple users can summarize a cluster without being affected by each other's archives
• NewsInEssence interface
  - Remember website visitors and keep a unique archive for each
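The multi-user items amount to keying the summary archive on both user and cluster. The sketch below is one hypothetical way to do that and is not the author's design: the class name, the `<root>/<user>/<cluster>.json` layout, and the JSON format are all assumptions, and user and cluster IDs are assumed to be safe file names.

```python
import json
import tempfile
from pathlib import Path


class SummaryArchive:
    """Past summaries stored per (user, cluster), one JSON file each.

    Hypothetical layout: <root>/<user>/<cluster>.json holding a list of
    summaries, each a list of sentences. IDs are assumed to be safe file
    names; real code would need to validate them.
    """

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, user, cluster):
        return self.root / user / f"{cluster}.json"

    def load(self, user, cluster):
        """All summaries this user has seen for this cluster, oldest first."""
        path = self._path(user, cluster)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, user, cluster, summary):
        """Append a newly produced summary to this user's history."""
        history = self.load(user, cluster)
        history.append(list(summary))
        path = self._path(user, cluster)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(history), encoding="utf-8")

    def seen_sentences(self, user, cluster):
        """Flatten all archived summaries into one list for demotion."""
        return [s for summary in self.load(user, cluster) for s in summary]


with tempfile.TemporaryDirectory() as root:
    archive = SummaryArchive(root)
    archive.save("alice", "milan-crash", ["A small plane hit a skyscraper."])
    print(archive.seen_sentences("alice", "milan-crash"))
    print(archive.seen_sentences("bob", "milan-crash"))  # prints []
```

Because each user's history lives in a separate file, one reader's past summaries never cause demotions in another reader's update.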