Summarization and Personal Information Management Carolyn Penstein Rosé
- Slides: 24
Summarization and Personal Information Management
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Announcements
- Questions?
- Homework 3 will be assigned Thur or next Tue
- Plan for Today
  - Two papers about early summarization work
Getting into Technology
[Diagram labels: Problem, Human Behavior, Solution, Design, Technology, Problem?; callout: Today's focus]
* Early work in summarization
Historical View: The World According to Carolyn
- 1950s: The early days of AI [both symbolic and connectionist!]
- 1958: First paper on summarization (H. P. Luhn)
- 1965: Chomsky's Aspects of the Theory of Syntax
- 1970: I was born
- 1988: I started college
- 1990: Paice paper
- 1991: PARSEC: the first connectionist parser
- 1992: I started grad school
- 1995: McKeown paper
- 1998: I got my Ph.D!
- 2008: Now!
[Callout: In the 80s, language technologies was part of linguistics]
[Callout: From the field of Information Science!]
[Callout: Most of the work we'll read about]
Two Approaches
[Diagram labels: Full text, Other Data, Subset of Text, Subset with Glue phrases, Interpretation, Structured Representation, Generation, Generated Text]
- Paice paper: Major question: how do you select the right subset?
- McKeown paper: Major question: how do you generate compact text?
Paice Paper
Constructing Literature Abstracts by Computer
- Problems with making abstracts sound coherent
  - Highly disjointed
  - Dangling anaphoric expressions
- Approach
  - Pick sections that are relatively self-contained (tidy)
- Need: theory of text structure
Finding Key Passages
- Location method
  - First sentence of a paragraph often states the main point
- Precursors to TF-IDF-like metrics
  - Frequency keyword method: relies on an existing set of index terms per document
  - Title keyword method: similar, but uses words from the title
  - Bonus word method: superlatives and value words
- Using discourse structure
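The selection heuristics above can be combined into a single sentence score. The following is only a minimal sketch, not Paice's actual procedure: the equal weights, the `BONUS_WORDS` list, the `STOPWORDS` list, and all function names are assumptions made for illustration.

```python
import re

# Hypothetical bonus-word list (superlatives and value words); a real system
# would use a curated lexicon.
BONUS_WORDS = {"significant", "important", "best", "greatest", "novel"}
# Hypothetical stopword list used only to clean up the title words.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "by", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentence(sentence, is_paragraph_initial, title_words):
    """Add one point per heuristic hit (equal weights are an assumption)."""
    tokens = tokenize(sentence)
    score = 0.0
    if is_paragraph_initial:  # location method
        score += 1.0
    score += sum(1 for t in tokens if t in title_words)  # title keyword method
    score += sum(1 for t in tokens if t in BONUS_WORDS)  # bonus word method
    return score


def rank_sentences(title, paragraphs):
    """paragraphs is a list of lists of sentences; returns (score, sentence) pairs."""
    title_words = set(tokenize(title)) - STOPWORDS
    scored = []
    for paragraph in paragraphs:
        for index, sentence in enumerate(paragraph):
            scored.append((score_sentence(sentence, index == 0, title_words), sentence))
    # Highest score first; ties keep document order because sorted() is stable.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The top-ranked sentences would then be candidate key passages. Note that this sketch scores sentences in isolation, so it does nothing about the coherence problems (disjointedness, dangling anaphora) raised earlier.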
Important linguistic constructs
- Remember Grosz & Sidner: Intentional Structure and Attentional State
- Indicator phrases / Logical and rhetorical connectives
  - “The main aim of the paper is…”
  - “In this report, we…”
- Reference
  - Discourse focus / anaphora resolution
  - Indefinite/Definite reference
  - Referential versus non-referential pronouns
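Indicator phrases like the two quoted above lend themselves to simple pattern matching. The sketch below is an illustration only: the two regular expressions are assumptions generalized from the slide's examples, not the templates used in the Paice paper.

```python
import re

# Hypothetical patterns generalized from the two example phrases above.
INDICATOR_PATTERNS = [
    re.compile(
        r"\bthe main (aim|purpose|goal) of (the|this) (paper|report|article) is\b",
        re.IGNORECASE,
    ),
    re.compile(r"\bin this (paper|report|article),? we\b", re.IGNORECASE),
]


def find_indicator_sentences(sentences):
    """Return the sentences that contain at least one indicator phrase."""
    return [
        sentence
        for sentence in sentences
        if any(pattern.search(sentence) for pattern in INDICATOR_PATTERNS)
    ]
```

A hand-written pattern list of this kind is brittle: any phrasing the patterns do not anticipate is silently missed.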
Rules for Anaphora Resolution * Pros and cons? Alternative approaches?
Candidate passages from a single paper
Future Enhancements
- Selection
  - Integration of multiple selection criteria
  - Improved linguistic procedures (better than the rules used here)
  - Strategic addition of neighboring sentences
- Compression
  - Elimination of parenthetical material
  - Elimination of repetition
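The two compression ideas can be approximated at the string level. This is a rough sketch under strong assumptions: parenthetical material is taken to mean only text inside non-nested round brackets, and repetition is taken to mean sentences that are identical after normalizing case and punctuation. The function names are hypothetical.

```python
import re


def strip_parentheticals(sentence):
    """Remove non-nested (...) spans and tidy the leftover whitespace."""
    without = re.sub(r"\s*\([^()]*\)", "", sentence)
    return re.sub(r"\s{2,}", " ", without).strip()


def drop_repeats(sentences):
    """Keep the first occurrence of each sentence, ignoring case and punctuation."""
    seen = set()
    kept = []
    for sentence in sentences:
        key = " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept
```

Parenthetical material set off by commas or dashes, and repetition that is paraphrased rather than verbatim, would need the kind of linguistic analysis the slide calls for.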
Possible New Direction
- Abstract frames
  - Fill slots and then generate the summary from the structured representation
  - Look for sentences that play particular roles in the text in a more general way
  - Teufel and Moens paper for next lecture is this type of approach (in 2002, 12 years later!)
- More general purpose than semantic schemas (the “AI approach”, could draw from much recent work on information extraction)
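The abstract-frame idea (fill slots, then generate from the structured representation) can be sketched as follows. Everything here is an assumption for illustration: the slot names (`aim`, `method`, `result`), the cue patterns, and the output templates are invented, and they are not taken from the Paice or the Teufel and Moens papers.

```python
import re

# Hypothetical slots, each with a hypothetical cue pattern that captures a filler.
SLOT_CUES = {
    "aim": re.compile(
        r"\b(?:aim|purpose|goal) of (?:the|this) \w+ is to (?P<filler>[^.]+)", re.I
    ),
    "method": re.compile(r"\bwe (?:use|used|apply|applied) (?P<filler>[^.]+)", re.I),
    "result": re.compile(r"\bresults? (?:show|shows|showed) that (?P<filler>[^.]+)", re.I),
}


def fill_frame(sentences):
    """Fill each slot from the first sentence whose cue pattern matches."""
    frame = {}
    for sentence in sentences:
        for slot, pattern in SLOT_CUES.items():
            match = pattern.search(sentence)
            if match and slot not in frame:
                frame[slot] = match.group("filler").strip()
    return frame


def generate_abstract(frame):
    """Generate text from the filled slots using fixed templates."""
    parts = []
    if "aim" in frame:
        parts.append(f"The paper aims to {frame['aim']}.")
    if "method" in frame:
        parts.append(f"The authors use {frame['method']}.")
    if "result" in frame:
        parts.append(f"The results show that {frame['result']}.")
    return " ".join(parts)
```

Because the output is generated from slots rather than copied sentences, the disjointedness and dangling anaphora of extracted passages do not arise, at the cost of relying on the cue patterns to find fillers.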
McKeown Paper
Two Approaches
- Focus on the generation problem
  - Again we see a need for linguistic insight
- STREAK: Local approach
  - Generate a draft
  - Apply revision rules to aggregate sentences
- PLANDOC: Global approach
  - Use a plan of overall text structure to aggregate things that logically group together
Linguistic Considerations
- How to use syntactic and lexical devices to convey information concisely
- Given the choice of a particular word or syntactic structure, how does it constrain or allow the attachment of additional information?
- How to fold multiple pieces of information into a single linguistic construction
Architecture
Top-Down Architecture
Conjunction and Ellipsis in PLANDOC
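As a toy illustration of aggregation by conjunction with ellipsis of the shared material, the sketch below groups facts that share a subject and verb and conjoins their objects. It is not PLANDOC's algorithm: the (subject, verb, object) fact format and the function names are assumptions.

```python
def join_with_and(items):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def aggregate(facts):
    """facts is a list of (subject, verb, object) tuples (a hypothetical format).

    Facts with the same subject and verb become one sentence: the shared
    subject and verb are stated once (ellipsis) and the objects are conjoined.
    """
    groups = {}
    for subject, verb, obj in facts:
        groups.setdefault((subject, verb), []).append(obj)
    return [
        f"{subject} {verb} {join_with_and(objects)}."
        for (subject, verb), objects in groups.items()
    ]
```

For example, two invented facts with the same subject and verb, `("The plan", "adds", "fiber")` and `("The plan", "adds", "a multiplexer")`, come out as the single sentence "The plan adds fiber and a multiplexer."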
Of the applications we have discussed for your project and assignments, which of these two approaches would be most appropriate?
Revision Operations
- From last time: Sentence reduction, Sentence combination, Syntactic transformation, Lexical paraphrasing, Generalization or specification
Single word multiple facts
- Karl Malone scored 39 points.
- Karl Malone tied his season high with 39 points.
- Karl Malone tied his season high.
Modification
- Jay Humphries scored 24 points. He came in as a reserve.
- Reserve Jay Humphries scored 24 points.
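The modification example can be written as a very small revision rule that folds a second fact into the draft sentence as a premodifier. This is only a string-level sketch with a hypothetical function name; STREAK's revision rules operate on linguistic structures, not raw strings.

```python
def add_premodifier(draft, name, modifier):
    """Attach `modifier` in front of `name` when the draft starts with that name.

    Returns the draft unchanged if the rule does not apply.
    """
    if not draft.startswith(name):
        return draft
    return f"{modifier.capitalize()} {draft}"


# Fold "He came in as a reserve." into the draft sentence.
revised = add_premodifier("Jay Humphries scored 24 points.", "Jay Humphries", "reserve")
print(revised)
```

The rule only fires when the name opens the sentence, which hints at the constraint question raised under Linguistic Considerations: the syntactic choices in the draft determine where additional information can attach.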
Questions?