Summarization and Personal Information Management
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute

Announcements
  • Questions?
  • Homework 3 will be assigned Thur or next Tue
  • Plan for Today
      ◦ Two papers about early summarization work

Getting into Technology
[Diagram labels: Problem; Human Behavior; Solution Design; Technology; Problem?; marker: "Today's focus"]
* Early work in summarization

Historical View: The World According to Carolyn
  • 1950s: The early days of AI
      ◦ [both symbolic and connectionist!]
  • 1958: First paper on summarization (H. P. Luhn)
  • 1965: Chomsky's Aspects of the Theory of Syntax
  • 1970: I was born
  • 1988: I started college
  • 1990: Paice paper
  • 1991: PARSEC: the first connectionist parser
  • 1992: I started grad school
  • 1995: McKeown paper
  • 1998: I got my Ph.D.!
  • [Most of the work we'll read about]
  • 2008: Now!
Callouts on the timeline: "In the 80s, Language technologies was part of linguistics"; "From the field of Information Science!"

Two Approaches
[Diagram labels: Full text; Other Data; Subset of Text; Structured Representation; Subset with Glue phrases; Generated Text; steps: Interpretation, Generation]
  • Paice paper: Major question: how do you select the right subset?
  • McKeown paper: Major question: how do you generate compact text?

Paice Paper

Constructing Literature Abstracts by Computer
  • Problems with making abstracts sound coherent
      ◦ Highly disjointed
      ◦ Dangling anaphoric expressions
  • Approach
      ◦ Pick sections that are relatively self-contained (tidy)
  • Need: theory of text structure

Finding Key Passages
  • Location method
      ◦ First sentence of a paragraph often states the main point
  • Precursors to TF-IDF-like metrics
      ◦ Frequency keyword method: relies on an existing set of index terms per document
      ◦ Title keyword method: similar, but uses words from the title
      ◦ Bonus word method: superlatives and value words
  • Using discourse structure
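
The location, frequency keyword, title keyword, and bonus word methods above can be combined into a single sentence score. The following is a minimal sketch under stated assumptions, not the procedure from the Paice paper: the word lists, the unweighted sum, and the function names (`score_sentence`, `select_key_sentences`) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical word lists: the slide names the methods but not the vocabularies.
BONUS_WORDS = {"best", "greatest", "significant", "important", "novel"}
STOP_WORDS = {"a", "an", "the", "of", "by", "and", "in", "for", "to"}


def tokenize(text):
    """Lowercased alphabetic tokens; a simple stand-in for real tokenization."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentence(sentence, position, title_words, index_terms):
    """Sum four unweighted cues for one sentence (assumed scoring scheme)."""
    tokens = tokenize(sentence)
    location = 1 if position == 0 else 0  # first sentence of its paragraph
    frequency = sum(t in index_terms for t in tokens)  # frequency keyword method
    title = sum(t in title_words for t in tokens)  # title keyword method
    bonus = sum(t in BONUS_WORDS for t in tokens)  # bonus word method
    return location + frequency + title + bonus


def select_key_sentences(paragraphs, title, index_terms, k=2):
    """Return the k best-scoring sentences, restored to document order.

    `paragraphs` is a list of paragraphs, each a list of sentence strings.
    """
    title_words = set(tokenize(title)) - STOP_WORDS
    index_terms = {term.lower() for term in index_terms}
    scored = []
    for paragraph in paragraphs:
        for position, sentence in enumerate(paragraph):
            score = score_sentence(sentence, position, title_words, index_terms)
            scored.append((score, len(scored), sentence))
    # Highest score first; ties broken by earlier position in the document.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:k]
    return [sentence for _, _, sentence in sorted(best, key=lambda item: item[1])]
```

Calling `select_key_sentences(paragraphs, title, index_terms, k=2)` returns the selected sentences in their original order, which matters because an extract read out of order is even more disjointed.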

Important linguistic constructs
  • Indicator phrases / Logical and rhetorical connectives
      ◦ “The main aim of the paper is…”
      ◦ “In this report, we…”
  • Reference
      ◦ Discourse focus / anaphora resolution
      ◦ Indefinite/Definite reference
      ◦ Referential versus non-referential pronouns
Callout: Remember Grosz & Sidner: Intentional Structure and Attentional State
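
Indicator phrases and risky pronoun openers can both be spotted with simple pattern matching. The sketch below is only an illustration: the two regular expressions generalize the two example phrases on the slide, and the list of anaphoric openers is an assumption. The opener check is deliberately crude and will also flag non-referential uses such as "It is clear that", which is exactly the referential versus non-referential distinction the slide raises.

```python
import re

# Hypothetical patterns generalizing the two example phrases on the slide.
INDICATOR_PATTERNS = [
    re.compile(
        r"\bthe main (?:aim|purpose|goal) of (?:the|this) (?:paper|report|article) is\b",
        re.IGNORECASE,
    ),
    re.compile(r"\bin this (?:paper|report|article), we\b", re.IGNORECASE),
]

# Assumed list of words that often open a sentence with an anaphoric reference.
ANAPHORIC_OPENERS = {"it", "this", "these", "they", "he", "she", "such", "that", "those"}


def has_indicator_phrase(sentence):
    """True if the sentence contains one of the indicator-phrase patterns."""
    return any(pattern.search(sentence) for pattern in INDICATOR_PATTERNS)


def may_have_dangling_anaphor(sentence):
    """True if the first word suggests a reference to an earlier sentence.

    This is a surface check only; it does not resolve the reference.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    return bool(words) and words[0].lower() in ANAPHORIC_OPENERS
```

A selector could prefer sentences for which `has_indicator_phrase` is true and treat `may_have_dangling_anaphor` as a signal to pull in the preceding sentence or skip the candidate.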

Rules for Anaphora Resolution
* Pros and cons? Alternative approaches?

Candidate passages from a single paper

Future Enhancements
  • Selection
      ◦ Integration of multiple selection criteria
      ◦ Improved linguistic procedures (better than the rules used here)
      ◦ Strategic addition of neighboring sentences
  • Compression
      ◦ Elimination of parenthetical material
      ◦ Elimination of repetition
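
The two compression ideas above can be approximated with string operations. This is a rough sketch under simplifying assumptions: parenthetical material is taken to mean non-nested text in round brackets, and repetition is taken to mean sentences that are identical after case and punctuation are ignored. Both function names are hypothetical.

```python
import re


def strip_parentheticals(sentence):
    """Remove non-nested (...) spans and tidy the remaining whitespace."""
    without = re.sub(r"\s*\([^()]*\)", "", sentence)
    return re.sub(r"\s+", " ", without).strip()


def drop_repeated_sentences(sentences):
    """Keep the first copy of each sentence, comparing on normalized text."""
    seen = set()
    kept = []
    for sentence in sentences:
        # Normalize: lowercase, keep only letters and digits, single spaces.
        key = " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept
```

Detecting repeated content that is worded differently would need more than this normalization; the sketch only catches near-verbatim repeats.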

Possible New Direction
  • Abstract frames
      ◦ Fill slots and then generate the summary from the structured representation
      ◦ Look for sentences that play particular roles in the text in a more general way
      ◦ Teufel and Moens paper for next lecture is this type of approach (in 2002, 12 years later!)
  • More general purpose than semantic schemas (the “AI approach”, could draw from much recent work on information extraction)
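
The "fill slots, then generate" idea can be illustrated with a toy frame. Everything specific here is an assumption made for the example: the slot names (`aim`, `method`, `result`), the trigger patterns, and the output templates are hypothetical and are not taken from the Paice or Teufel and Moens papers.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AbstractFrame:
    """A hypothetical abstract frame with three rhetorical-role slots."""
    aim: Optional[str] = None
    method: Optional[str] = None
    result: Optional[str] = None


# Assumed trigger patterns; group 1 captures the slot filler.
SLOT_PATTERNS = {
    "aim": re.compile(r"the main aim of (?:the|this) paper is to (.+?)\.?$", re.IGNORECASE),
    "method": re.compile(r"we use (.+?)\.?$", re.IGNORECASE),
    "result": re.compile(r"we (?:find|found|show) that (.+?)\.?$", re.IGNORECASE),
}


def fill_frame(sentences: List[str]) -> AbstractFrame:
    """Fill each slot from the first sentence that matches its pattern."""
    frame = AbstractFrame()
    for sentence in sentences:
        for slot, pattern in SLOT_PATTERNS.items():
            match = pattern.search(sentence.strip())
            if match and getattr(frame, slot) is None:
                setattr(frame, slot, match.group(1))
    return frame


def generate_summary(frame: AbstractFrame) -> str:
    """Generate text from the structured representation, skipping empty slots."""
    parts = []
    if frame.aim:
        parts.append(f"The paper aims to {frame.aim}.")
    if frame.method:
        parts.append(f"It uses {frame.method}.")
    if frame.result:
        parts.append(f"It finds that {frame.result}.")
    return " ".join(parts)
```

Because the summary is generated from the frame rather than copied from the text, the output sentences do not inherit dangling references from the source, at the cost of needing patterns that recognize each role.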

McKeown Paper

Two Approaches
  • Focus on the generation problem
      ◦ Again we see a need for linguistic insight
  • STREAK: Local approach
      ◦ Generate a draft
      ◦ Apply revision rules to aggregate sentences
  • PLANDOC: Global approach
      ◦ Use a plan of overall text structure to aggregate things that logically group together
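
To make the global approach concrete, here is a toy sketch of aggregation by grouping: facts that share a subject and verb are collected first, and each group is then realized as one sentence with a conjoined object. This is not PLANDOC's planner; the triple representation, the function names, and the example facts used to exercise it are invented for illustration.

```python
def join_with_and(items):
    """Render a list of phrases as 'a', 'a and b', or 'a, b, and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def aggregate(facts):
    """Group (subject, verb, object) triples and emit one sentence per group.

    Grouping happens over the whole input before any sentence is produced,
    which is what makes this a global rather than a local strategy.
    """
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for subject, verb, obj in facts:
        groups.setdefault((subject, verb), []).append(obj)
    return [
        f"{subject} {verb} {join_with_and(objects)}."
        for (subject, verb), objects in groups.items()
    ]
```

A local, revision-based strategy would instead start from one drafted sentence and fold further facts into it one at a time.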

Linguistic Considerations
  • How to use syntactic and lexical devices to convey information concisely
  • Given the choice of a particular word or syntactic structure, how does it constrain or allow the attachment of additional information?
  • How to fold multiple pieces of information into a single linguistic construction

Architecture

Top-Down Architecture

Conjunction and Ellipsis in PLANDOC

Of the applications we have discussed for your project and assignments, which of these two approaches would be most appropriate?

Revision Operations
  • From last time: Sentence reduction, Sentence combination, Syntactic transformation, Lexical paraphrasing, Generalization or specification

Single word multiple facts
  • Karl Malone scored 39 points.
  • Karl Malone tied his season high with 39 points.
  • Karl Malone tied his season high.

Modification
  • Jay Humphries scored 24 points. He came in as a reserve.
  • Reserve Jay Humphries scored 24 points.
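
The modification example can be read as a revision rule applied to a draft: the second fact is folded into the subject noun phrase instead of being realized as its own sentence. The sketch below assumes a very small draft representation; the `Draft` class and the `add_modifier` and `realize` functions are hypothetical and do not reproduce STREAK's actual data structures or rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Draft:
    """A hypothetical one-clause draft: subject, verb, object, plus modifiers."""
    subject: str
    verb: str
    obj: str
    modifiers: List[str] = field(default_factory=list)  # placed before the subject


def add_modifier(draft: Draft, role: str) -> Draft:
    """Revision rule: attach a role fact as a pre-nominal modifier of the subject."""
    return Draft(draft.subject, draft.verb, draft.obj, draft.modifiers + [role])


def realize(draft: Draft) -> str:
    """Turn the draft into a sentence, capitalizing its first character."""
    noun_phrase = " ".join(draft.modifiers + [draft.subject])
    noun_phrase = noun_phrase[0].upper() + noun_phrase[1:]
    return f"{noun_phrase} {draft.verb} {draft.obj}."
```

Starting from `Draft("Jay Humphries", "scored", "24 points")`, applying `add_modifier(draft, "reserve")` and then `realize` yields the single-sentence version shown on the slide.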

Questions?
