Some interesting directions in Automatic Summarization
Annie Louis
CIS 430, 12/02/08


Today's lecture
Multi-strategy summarization: is one method enough?
Performance confidence estimation: it would be nice to have an indication of expected system performance on an input.
Evaluation without human models: can we come up with cheap and fast evaluation measures?
Beyond generic summarization: query-focused, update, blog, meeting and speech summarization.

Relevant papers:
Lacatusu et al. LCC's GISTexter at DUC 2006: Multi-Strategy Multi-Document Summarization. In Proceedings of the Document Understanding Workshop (DUC 2006).
McKeown et al. Columbia multi-document summarization: Approach and evaluation. In Proceedings of the Document Understanding Conference (DUC 2001), 2001.
Nenkova et al. Can You Summarize This? Identifying Correlates of Input Difficulty for Multi-Document Summarization. In Proceedings of ACL-08: HLT.

More about DUC 2002 data…
/project/cis/nlp/tools/Summarization_Data/Inputs2002
Newswire texts; the data has 3 categories of inputs.

DUC 2002 input categories
Single event - 30 inputs. E.g. d061, Hurricane Gilbert: same place, roughly the same time, same actions.
Multiple distinct events - 15 inputs. E.g. d064, openings of McDonald's in Russia, Canada, South Korea, etc.: different places, different times, different agents.
Biographies - 15 inputs. E.g. d065, Dan Quayle, Bush's nominee for vice president: one person, one event, background information, events from the past.
Do you think a single method will do well for all of these?

Tf-idf summary - d061
Hurricane Gilbert Heads Toward Dominican Coast. Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a hurricane Saturday night. Gilbert Reaches Jamaican Capital With 110 Mph Winds. Hurricane warnings were posted for the Cayman Islands, Cuba and Haiti. Hurricane Hits Jamaica With 115 mph Winds; Communications. Gilbert reached Jamaica after skirting southern Puerto Rico, Haiti and the Dominican Republic. Gilbert was moving west-northwest at 15 mph and winds had decreased to 125 mph. What Makes Gilbert So Strong? With PM-Hurricane Gilbert, Bjt. Hurricane Gilbert Heading for Jamaica With 100 MPH Winds. Tropical Storm Gilbert

Tf-idf summary - d064
First McDonald's to Open in Communist Country. Police Keep Crowds From Crashing First McDonald's and Genex contribute $1 million each for the flagship restaurant. A Bolshoi Mac Attack in Moscow as First McDonald's Opens First Restaurant in China. McDonald's hopes to open a restaurant in Beijing later. The 500-seat McDonald's restaurant in a three-story building is operated by McDonald's Restaurant Shenzhen Ltd., a wholly owned subsidiary of McDonald's Hong Kong is a 50-50 joint venture with McDonald's in the United States. McDonald's officials say it is not a question that

Tf-idf summary - d065
Tucker was fascinated by the idea, Quayle said. But Dan Quayle's got experience, too. Quayle's Triumph Quickly Tarnished. Quayle's Biography Inflates State Job; Quayle Concedes Error. Her statement was released by the Quayle campaign. But he would go no further in describing what assignments he would give Quayle. "I will be a very close adviser to the president," Quayle said. "You're never going to see Dan Quayle telling tales out of It was everything Quayle had hoped for. Quayle had said very little and he had said it very well. There are windows into the workings of the

Multi-strategy summarization
Multiple summarization modules within a single system: better than a single method.
How to employ a multi-strategy system?
Use all methods, produce multiple summaries, and choose the best.
Use a router and summarize with only one specific method.

Produce multiple summaries and choose - LCC GISTexter
Task: query-focused summarization.
The query is decomposed by 3 methods and sent to a QA system and a multi-document summarizer, producing 6 different summaries.
The best summary is selected using textual entailment and pyramid scoring.

Route to a specific module - Columbia's multi-document summarizer
Features classify an input as a single event, a biography, or loosely connected documents.
The classification result is used to route the input to one of 3 different summarizers.

Features - Single event
To identify: the time span between publication dates is under 80 days, and more than 50% of the documents were published on the same day.
To summarize: exploit redundancy by clustering similar sentences into themes; rank themes based on size, similarity, and the lexical-chain ranking of the contained sentences; select phrases from each theme; generate sentences.
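A minimal sketch of the theme-clustering step: sentences are vectorized with TF-IDF and greedily grouped by cosine similarity. This is my own illustration, not the Columbia system's actual similarity module; the 0.3 threshold and the TF-IDF representation are assumptions.

```python
# Sketch only: greedy theme clustering over TF-IDF sentence vectors (assumed setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_into_themes(sentences, threshold=0.3):
    """Group sentences whose cosine similarity exceeds the threshold into themes."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sims = cosine_similarity(vectors)
    themes, assigned = [], set()
    for i in range(len(sentences)):
        if i in assigned:
            continue
        theme = [i]
        assigned.add(i)
        for j in range(i + 1, len(sentences)):
            if j not in assigned and sims[i, j] >= threshold:
                theme.append(j)
                assigned.add(j)
        themes.append([sentences[k] for k in theme])
    # Larger themes mean content repeated across documents, hence more important.
    return sorted(themes, key=len, reverse=True)
```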

Features - Biographies
To identify: the frequency of the most frequent capitalized word is above a threshold X (compensating for other named entities), and the frequency of personal pronouns is above a threshold Y.
To summarize: Is the target individual mentioned in the sentence? Is another individual found in the sentence? What is the position of the most prominent capitalized word in the sentence?
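A rough sketch of the two identification features, with my own tokenization and a simplification around sentence-initial capitalization; X and Y would be tuned thresholds on the returned frequencies.

```python
# Sketch only: biography-indicator features over the concatenated input documents.
import re
from collections import Counter

PRONOUNS = {"he", "she", "him", "her", "his", "hers", "himself", "herself"}

def biography_features(documents):
    tokens = re.findall(r"[A-Za-z']+", " ".join(documents))
    total = max(len(tokens), 1)
    # Ignoring sentence-initial capitalization is a simplification here.
    cap_counts = Counter(t for t in tokens if t[0].isupper())
    top_cap_freq = cap_counts.most_common(1)[0][1] / total if cap_counts else 0.0
    pronoun_freq = sum(1 for t in tokens if t.lower() in PRONOUNS) / total
    return {"top_capitalized_freq": top_cap_freq, "pronoun_freq": pronoun_freq}
```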

Features - Weakly related documents
To identify: neither single event nor biographical.
To summarize: words likely to be used in first paragraphs, i.e. important words learned from corpus analysis; verb specificity; semantic themes (WordNet concepts); positional and length features; more weight to recent articles; downweighting of sentences with pronouns.

Characterizing / classifying inputs
Important if you want to route to a specialized summarizer.
Classification can be made along several lines: theme of the input (Columbia's summarizer), scientific vs. news articles, long vs. short documents, news articles about events vs. editorials, difficult vs. easy?

Input difficulty and performance confidence estimation
Some inputs are more difficult than others: most summarizers produce poor summaries for these inputs.

Input to summarizer
Some inputs are easier than others!
Average system scores obtained on different inputs for 100-word summaries (DUC 2001 data, score range 0-4): mean 0.55, min 0.07, max 1.65.

Input difficulty and content coverage scores
Content coverage score: the extent of coverage of important content. Poor content selection leads to a low score.
If most summaries for an input get a low score, most systems could not identify important content: a "difficult input".

Did system performance vary with DUC 2001 input categories?
Multi-document inputs came from 5 categories, each a set of documents describing:
Cohesive / "on topic" inputs:
  Single event - the Exxon Valdez oil spill
  Subject - mad cow disease
  Biographical - Elizabeth Taylor
Non-cohesive / "multiple facets" inputs:
  Multiple distinct events - different occasions of police misconduct
  Opinion - views of the senate, public, congress, lawyers etc. on the decision by the senate to count illegal aliens in the 1990 census
A single task: generic summarization.

Input type influenced the scores obtained
Biographical, single event, and subject inputs are easier to summarize than multiple distinct events and opinions.

Cohesive inputs are easier to summarize
Scores for cohesive inputs (biographical, single event, subject) are significantly* higher than those for non-cohesive inputs (multiple distinct events, opinions) at 100, 200 and 400 words.
*One-sided t-tests, 95% significance level

Inputs can be easy or difficult, therefore:
Better summarizers ~ different methods to summarize different inputs: multi-strategy summarization.
Enhancing user experience ~ a system can flag summaries that are likely to be poor in content: low system confidence on difficult inputs.

First step: what characterizes difficult inputs?
Find useful features. Can we identify difficult inputs with high accuracy?
This is a classification task: difficult vs. easy.

Features - Simple length-based
Smaller inputs ~ less loss of information ~ better summaries
Number of sentences ~ amount of information to be captured in the summary
Vocabulary size ~ number of unique words

Features - Word distributions in the input
% of words used only once ~ lexical repetition: less repetition of content ~ difficult input
Type-token ratio ~ lexical variation in the input: fewer types ~ easy input
Entropy of the input ~ descriptive words with high probabilities ~ lower entropy ~ easy input
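A small sketch of the length-based and word-distribution features from this slide and the previous one, assuming simple regex tokenization of the concatenated input.

```python
# Sketch only: vocabulary size, % of words used once, type-token ratio, entropy.
import math
import re
from collections import Counter

def distribution_features(input_text):
    tokens = re.findall(r"[a-z']+", input_text.lower())
    counts = Counter(tokens)
    n_tokens, n_types = len(tokens), len(counts)
    probs = [c / n_tokens for c in counts.values()]
    return {
        "vocabulary_size": n_types,
        "pct_words_used_once": sum(1 for c in counts.values() if c == 1) / n_types,
        "type_token_ratio": n_types / n_tokens,
        # A few dominant descriptive words -> low entropy -> easier input.
        "entropy": -sum(p * math.log2(p) for p in probs),
    }
```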

Features - Document similarity and relatedness
Documents with overlapping content ~ easy input
Pair-wise cosine overlap (average, min, max) ~ similarity of the documents
High cosine overlaps indicate overlapping content, which is easy to summarize.
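A sketch of the pairwise cosine-overlap features, assuming a TF-IDF document representation (the exact term weighting used in the paper may differ).

```python
# Sketch only: average, min and max cosine similarity over all document pairs.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def pairwise_cosine_features(documents):
    vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
    sims = cosine_similarity(vectors)
    pair_sims = [sims[i, j] for i, j in combinations(range(len(documents)), 2)]
    return {
        "avg_cosine": sum(pair_sims) / len(pair_sims),
        "min_cosine": min(pair_sims),
        "max_cosine": max(pair_sims),
    }
```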

Features - Document similarity and relatedness
An input tightly bound by topic ~ easy input
KL divergence ~ distance from a large collection of random documents: the difference between two language models, the input's and the random collection's.
Greater divergence means the input is unlike random documents, i.e. a tightly bound input.
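A sketch of the KL-divergence feature: a smoothed unigram model of the input compared against a model of a large random background collection. Add-one smoothing and the tokenization are my assumptions.

```python
# Sketch only: KL divergence between the input model and a background model.
import math
import re
from collections import Counter

def _unigram_model(text, vocab):
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values()) + len(vocab)          # add-one smoothing
    return {w: (counts[w] + 1) / total for w in vocab}

def kl_divergence(input_text, background_text):
    vocab = set(re.findall(r"[a-z']+", (input_text + " " + background_text).lower()))
    p = _unigram_model(input_text, vocab)              # input language model
    q = _unigram_model(background_text, vocab)         # background language model
    # Greater divergence -> the input is unlike random documents -> tightly bound.
    return sum(p[w] * math.log2(p[w] / q[w]) for w in vocab)
```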

Features - Log-likelihood ratio based
More topic terms and similar topic terms ~ topic-oriented, easy input
Number of topic signature terms
Percentage of topic signatures in the vocabulary ~ controls for the length of the input
Pair-wise topic signature overlap (average, min, max) ~ similarity between the topic vectors of different documents, i.e. cosine overlap with reduced, topic-specific vectors
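A sketch of topic-signature extraction with the log-likelihood ratio test (in the spirit of Lin and Hovy's topic signatures); the chi-square cutoff, tokenization and handling of the background corpus are my assumptions.

```python
# Sketch only: topic signature terms via the log-likelihood ratio test.
import math
import re
from collections import Counter

def _binomial_loglike(p, k, n):
    """log-likelihood of k successes in n trials under probability p (0*log(0) -> 0)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1 - p)
    return ll

def topic_signatures(input_text, background_text, threshold=10.83):
    """Return words whose -2 log(lambda) exceeds the chi-square cutoff (p < 0.001)."""
    fg = Counter(re.findall(r"[a-z']+", input_text.lower()))
    bg = Counter(re.findall(r"[a-z']+", background_text.lower()))
    n1, n2 = sum(fg.values()), sum(bg.values())
    signatures = []
    for w, k1 in fg.items():
        k2 = bg.get(w, 0)
        p1, p2, p = k1 / n1, k2 / n2, (k1 + k2) / (n1 + n2)
        llr = 2 * (_binomial_loglike(p1, k1, n1) + _binomial_loglike(p2, k2, n2)
                   - _binomial_loglike(p, k1, n1) - _binomial_loglike(p, k2, n2))
        if llr > threshold and p1 > p2:    # keep words over-represented in the input
            signatures.append(w)
    return signatures
```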

What makes some inputs easy?
Easy inputs have a smaller vocabulary, smaller entropy, greater divergence from a random collection, a higher % of topic signatures in the vocabulary, and higher average cosine and topic signature overlap.

Input difficulty hypothesis for systems
Indicator of an input's difficulty: the average system coverage score. An input is difficult if most systems select poor content.
Defining difficulty of inputs: 2 classes, above/below the mean average system score (> mean score: easy; < mean score: difficult), giving roughly equal classes.

Classification results
Baseline performance: 50%
Test set: DUC 2002-2004; 10-fold cross validation on 192 observations.
Precision and recall are reported for the difficult class:
Accuracy 69.27%, Precision 0.696, Recall 0.674
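A sketch of the evaluation setup: 10-fold cross-validation of an easy-vs-difficult classifier over per-input feature vectors. Logistic regression is an illustrative choice here, not necessarily the learner used in the paper.

```python
# Sketch only: 10-fold cross-validated accuracy for the difficulty classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evaluate_difficulty_classifier(feature_matrix, labels):
    """feature_matrix: (n_inputs, n_features); labels: 1 = difficult, 0 = easy."""
    clf = LogisticRegression(max_iter=1000)
    scores = cross_val_score(clf, np.asarray(feature_matrix), np.asarray(labels),
                             cv=10, scoring="accuracy")
    return scores.mean()
```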

Summary evaluation without human models*
Recap of current evaluation measures: content coverage, Pyramid, responsiveness, ROUGE.
*My work with Ani

Need for cheap, fast measures
All current evaluations require human effort: human summaries (content overlap, Pyramid, ROUGE) or manual marking of summaries (responsiveness).
Human summaries are biased: several summaries for the same input are needed to remove bias (Pyramid, ROUGE).
Can we come up with cheaper evaluation techniques that produce the same rankings for systems as human evaluations?

Compare with the input - no human models
Estimate the closeness of the summary to the input: the closer a summary is to the input, the better its content should be.
How do we verify this?
Design features that reflect how close a summary is to the input.
Rank summaries based on the value of each feature.
Compare the obtained rankings to the rankings given by humans.
Similar rankings (high correlation) mean you have succeeded.

What features should we use?
We want to know how well a summary reflects the input's content. Guesses?

Features - Divergence between input and summary
Smaller divergence ~ better summary
KL divergence, input to summary
KL divergence, summary to input
Jensen-Shannon divergence
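A sketch of the Jensen-Shannon divergence between the input and summary unigram distributions, with assumed add-one smoothing: JSD(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M), where M is the average of P and Q.

```python
# Sketch only: Jensen-Shannon divergence between input and summary word distributions.
import math
import re
from collections import Counter

def _dist(text, vocab):
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values()) + len(vocab)          # add-one smoothing
    return {w: (counts[w] + 1) / total for w in vocab}

def jensen_shannon_divergence(input_text, summary_text):
    vocab = set(re.findall(r"[a-z']+", (input_text + " " + summary_text).lower()))
    p, q = _dist(input_text, vocab), _dist(summary_text, vocab)
    m = {w: 0.5 * (p[w] + q[w]) for w in vocab}
    kl = lambda a, b: sum(a[w] * math.log2(a[w] / b[w]) for w in vocab)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```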

Features - Use of topic words from the input
More topic words ~ better summary
% of the summary composed of topic words
% of the input's topic words carried over to the summary
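A short sketch of these two features; topic_signatures() refers to the helper sketched earlier, which is my own illustration rather than the original code.

```python
# Sketch only: how much of the summary is topic words, and how many of the
# input's topic words make it into the summary.
import re

def topic_word_features(summary_text, signatures):
    summary_tokens = re.findall(r"[a-z']+", summary_text.lower())
    sig_set = set(signatures)
    in_summary = [t for t in summary_tokens if t in sig_set]
    return {
        "pct_summary_is_topic_words": len(in_summary) / len(summary_tokens),
        "pct_input_topics_in_summary": len(set(in_summary)) / len(sig_set),
    }
```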

Features - Similarity between input and summary
More similar to the input ~ better summary
Cosine similarity between input and summary words
Cosine similarity between the input's topic signatures and summary words

Features - Summary probability
Higher likelihood of the summary given the input ~ better summary
Unigram summary probability
Multinomial summary probability
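A sketch of the two probability features in log space; the smoothing of the input model and the tokenization are my assumptions. The multinomial version adds the coefficient that counts the orderings of the summary's bag of words.

```python
# Sketch only: unigram and multinomial log-probability of the summary given the input.
import math
import re
from collections import Counter

def summary_log_probabilities(input_text, summary_text):
    inp = Counter(re.findall(r"[a-z']+", input_text.lower()))
    summ = Counter(re.findall(r"[a-z']+", summary_text.lower()))
    vocab = set(inp) | set(summ)
    total = sum(inp.values()) + len(vocab)              # add-one smoothed input model
    logp = {w: math.log((inp[w] + 1) / total) for w in vocab}
    n = sum(summ.values())
    unigram = sum(c * logp[w] for w, c in summ.items())
    # Multinomial coefficient: log( n! / prod(c_w!) )
    coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in summ.values())
    return {"unigram_log_prob": unigram, "multinomial_log_prob": coeff + unigram}
```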

Analysis of features
The value of the feature is the score for the summary.
Average the feature values for a particular system over all inputs.
Compare to the average human score using Spearman (rank) correlation.
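A sketch of this analysis step: average each feature per system, average the human scores per system, and compute Spearman's rank correlation (via scipy; the data structures here are illustrative).

```python
# Sketch only: system-level Spearman correlation between a feature and human scores.
from scipy.stats import spearmanr

def system_level_correlation(feature_scores, human_scores):
    """Both arguments map system id -> list of per-input scores."""
    systems = sorted(feature_scores)
    feature_avgs = [sum(feature_scores[s]) / len(feature_scores[s]) for s in systems]
    human_avgs = [sum(human_scores[s]) / len(human_scores[s]) for s in systems]
    rho, p_value = spearmanr(feature_avgs, human_avgs)
    return rho, p_value
```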

Results
TAC 2008 query-focused summarization: 48 inputs, 57 systems.
Correlation of each feature with the manual scores:

Feature                                   Pyramid    Responsiveness
JSD                                       -0.8803    -0.7364
% of input's topic words in summary       -0.8741    -0.8249
KL divergence, summary to input           -0.7629    -0.6941
Cosine overlap                             0.7117     0.6469
% of summary composed of topic words       0.7115     0.6015
KL divergence, input to summary           -0.6875    -0.5850
Unigram summary probability               -0.1879    -0.1006
Multinomial summary probability            0.2224     0.2353

Evaluation without human models
Comparison with the input correlates well with human judgements.
Cheap, fast, unbiased: no human effort needed.

Other summarization tasks of interest
Update summaries: the user has read a set of documents A; produce a summary of the updates from a set B of documents published later in time.
Query-focused summarization: a topic statement is given to focus content selection.

Other summarization tasks of interest
Blog/opinion summarization: mine opinions, good/bad product reviews, etc.
Meeting/speech summarization: how would you summarize a brainstorming session?

What you have learnt today
How simple features you already know can be put to use for interesting applications.
Beyond a simple sentence-extraction engine: customizing for the input, user and task setting is important.
There are a lot of interesting tasks in summarization and language processing.