Knowledge-rich approaches for text summarization
Minna Vasankari
27.11.2001

Structure
1. The idea
2. Conceptual summarization
3. Linguistic summarization
4. Example system: Plandoc
5. Summary

The idea
• Full text is not the only possible source material for summarization
• Other sources:
  – databases
  – simulation data
  – user interaction sequences
  – etc.

The idea
• Data with structure
  – easier to interpret than full text
  – no source text => no shortcuts (nothing can be copied from a source text; every sentence must be generated)
  – text generation phase is hard
  – domain-dependency

Conceptual summarization
• Sorting the source material (facts, events)
• Choosing what is important
  – must be included in the summary
• and what is potentially important
  – can be left out or included

Conceptual summarization
• What is important?
  – depends on the domain
  – depends on the input material
  – depends on the user

Conceptual summarization
• Importance of a fact
  – manual decision
• Importance of an event
  – manual decision
  – frequency analysis

Conceptual summarization
• Potentially important facts/events are included only if they fit in, as determined by
  – the space limit
  – linguistic constraints
  – the possible ordering of facts
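The inclusion rule above can be sketched as a greedy pass over the facts. This is a minimal sketch, assuming the space limit is a word budget and that facts are considered in source order. The `Fact` class, `select_facts` and the budgets are hypothetical, and the linguistic constraints and fact-ordering considerations are not modelled. The example facts are borrowed from the basketball examples used for linguistic summarization.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    important: bool  # True: must be included; False: potentially important


def word_count(fact: Fact) -> int:
    return len(fact.text.split())


def select_facts(facts: list[Fact], max_words: int) -> list[Fact]:
    """Keep every important fact; add a potentially important fact only if it
    still fits in the remaining word budget. Source order is preserved.
    Hypothetical: the space limit is measured in words only."""
    budget = max_words - sum(word_count(f) for f in facts if f.important)
    selected = []
    for fact in facts:
        if fact.important:
            selected.append(fact)
        elif word_count(fact) <= budget:
            selected.append(fact)
            budget -= word_count(fact)
    return selected


facts = [
    Fact("Karl Malone scored 39 points.", True),
    Fact("Karl Malone's 39-point performance is equal to his season high.", False),
    Fact("Jay Humphries scored 24 points.", True),
    Fact("Jay Humphries came in as a reserve.", False),
]
for fact in select_facts(facts, max_words=20):
    print(fact.text)
```

With a 20-word budget the two important facts use 10 words, so the 10-word season-high fact still fits and the reserve fact does not. A smaller budget would change which potentially important fact survives.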

Linguistic summarization
• Expressing the same information in fewer sentences
• Method: linguistic constructs & revision
• Danger: overly aggressive compression leads to unreadable sentences

Linguistic summarization
• Linguistic constructs:
  – semantically rich words
  – modifiers of nouns or verbs
  – conjunction and ellipsis
  – abridged references
  – abstraction
  – aggregation
  – presentational techniques

Linguistic summarization
• Semantically rich words
  – killing two birds with one stone: one word conveys two facts
  Karl Malone scored 39 points.
  + Karl Malone's 39-point performance is equal to his season high.
  becomes
  Karl Malone tied his season high with 39 points.
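A single realization rule can capture this kind of merge. The sketch below is hypothetical: the function name and the equality test are assumptions, not taken from any particular system. When the score equals the season high, the verb "tied" expresses both facts in one clause.

```python
def report_score(player: str, points: int, season_high: int) -> str:
    """Realize a scoring fact, folding in the season-high fact when it applies
    (hypothetical rule for illustration)."""
    if points == season_high:
        # "tied" carries both facts: the score and its equality to the season high.
        return f"{player} tied his season high with {points} points."
    return f"{player} scored {points} points."


print(report_score("Karl Malone", 39, season_high=39))
```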

Linguistic summarization
• Modifiers of nouns or verbs
  – one fact specifies a verb or a noun in another fact
  Jay Humphries scored 24 points. He came in as a reserve.
  becomes
  Reserve Jay Humphries scored 24 points.
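In the same spirit, a hypothetical rule can fold a role fact into the subject noun phrase as a premodifier instead of producing a second sentence. This sketch assumes the role fact has already been matched to the same player; `report_with_role` is an illustrative name.

```python
from typing import Optional


def report_with_role(player: str, points: int, role: Optional[str] = None) -> str:
    """Realize a scoring fact; a separate role fact about the same player becomes
    a premodifier of the player's name (hypothetical rule for illustration)."""
    subject = f"{role.capitalize()} {player}" if role else player
    return f"{subject} scored {points} points."


print(report_with_role("Jay Humphries", 24, role="reserve"))
```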

Linguistic summarization
• Conjunction
  – joining facts with "and" or "or"
  Mick Reynes scored 265 points last season and Jack Jones scored 265 points last season.
• Ellipsis
  – removing repetition
  Mick Reynes and Jack Jones scored 265 points last season.
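Conjunction followed by ellipsis can be sketched by grouping facts that share a predicate, so that the predicate is realized only once. This minimal sketch assumes facts arrive as hypothetical (subject, predicate) string pairs and that two predicates count as shared only when the strings are identical.

```python
def conjoin(items: list[str]) -> str:
    """Join items as 'A', 'A and B', or 'A, B and C'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def aggregate(facts: list[tuple[str, str]]) -> list[str]:
    """Facts sharing a predicate are conjoined (conjunction), and the repeated
    predicate is stated only once (ellipsis)."""
    by_predicate: dict[str, list[str]] = {}  # dicts keep insertion order
    for subject, predicate in facts:
        by_predicate.setdefault(predicate, []).append(subject)
    return [
        f"{conjoin(subjects)} {predicate}."
        for predicate, subjects in by_predicate.items()
    ]


facts = [
    ("Mick Reynes", "scored 265 points last season"),
    ("Jack Jones", "scored 265 points last season"),
]
print(aggregate(facts))
```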

Linguistic summarization
• Abridged references
  – using shorter names for things already introduced
  San Antonio Spurs took a 127-111 victory over Denver Nuggets and handed Denver their seventh straight loss.
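Abridged references require some memory of what has already been mentioned. In this sketch the `ReferenceTracker` class and its table of short names are hypothetical. A real generator would presumably also need to check that each short form stays unambiguous.

```python
class ReferenceTracker:
    """Use the full name on first mention and a shorter name afterwards
    (hypothetical helper for illustration)."""

    def __init__(self, short_names: dict[str, str]):
        self.short_names = short_names
        self.mentioned: set[str] = set()

    def ref(self, entity: str) -> str:
        if entity in self.mentioned:
            return self.short_names.get(entity, entity)
        self.mentioned.add(entity)
        return entity


r = ReferenceTracker({"Denver Nuggets": "Denver", "San Antonio Spurs": "San Antonio"})
sentence = (
    f"{r.ref('San Antonio Spurs')} took a 127-111 victory over "
    f"{r.ref('Denver Nuggets')} and handed {r.ref('Denver Nuggets')} "
    "their seventh straight loss."
)
print(sentence)
```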

Linguistic summarization
• Abstraction
  – replacing a series of events with a single event
  mission start, movements, firing, damages, mission abort => failed mission
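Abstraction can be sketched as pattern replacement over an event sequence. This sketch assumes, hypothetically, that a pattern matches only an exact contiguous run of event names. The `abstract` function and the pattern table are illustrations, not a method described on the slides.

```python
Pattern = tuple[str, ...]


def abstract(events: list[str], patterns: list[tuple[Pattern, str]]) -> list[str]:
    """Replace each contiguous run of events matching a (non-empty) pattern with
    the pattern's abstract label; other events are kept unchanged."""
    result, i = [], 0
    while i < len(events):
        for pattern, label in patterns:
            if tuple(events[i:i + len(pattern)]) == pattern:
                result.append(label)
                i += len(pattern)
                break
        else:  # no pattern starts at position i
            result.append(events[i])
            i += 1
    return result


FAILED_MISSION = (
    ("mission start", "movements", "firing", "damages", "mission abort"),
    "failed mission",
)
events = ["mission start", "movements", "firing", "damages", "mission abort"]
print(abstract(events, [FAILED_MISSION]))
```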

Linguistic summarization
• Aggregation
  – connecting events with spatial or temporal adverbials
  Site-A and Site-B simultaneously fired a missile.
• Presentational techniques
  – using spatial or temporal adverbs
  Site-A fired a missile at 1302. Three minutes later Site-B fired a missile.
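The two techniques can be contrasted on a pair of timed events. In this hypothetical sketch an event is a (site, action, hhmm) tuple, the second event is assumed not to precede the first, and only gaps of up to five minutes get number words.

```python
def to_minutes(hhmm: int) -> int:
    """Convert a 24-hour clock time such as 1302 into minutes after midnight."""
    return (hhmm // 100) * 60 + hhmm % 100


def later_phrase(minutes: int) -> str:
    words = {1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five"}
    unit = "minute" if minutes == 1 else "minutes"
    return f"{words.get(minutes, str(minutes))} {unit} later"


def describe(first: tuple[str, str, int], second: tuple[str, str, int]) -> str:
    """Each event is (site, action, hhmm); `second` must not precede `first`."""
    site_a, action_a, time_a = first
    site_b, action_b, time_b = second
    if action_a == action_b and time_a == time_b:
        # Aggregation: one sentence, linked by a temporal adverbial.
        return f"{site_a} and {site_b} simultaneously {action_a}."
    # Presentational technique: state the first time, then a relative one.
    gap = to_minutes(time_b) - to_minutes(time_a)
    return f"{site_a} {action_a} at {time_a}. {later_phrase(gap)} {site_b} {action_b}."


print(describe(("Site-A", "fired a missile", 1302), ("Site-B", "fired a missile", 1302)))
print(describe(("Site-A", "fired a missile", 1302), ("Site-B", "fired a missile", 1305)))
```

Converting to minutes after midnight, rather than subtracting the hhmm numbers directly, keeps gaps correct across an hour boundary, e.g. 1259 to 1301 is two minutes.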

Linguistic summarization
• Revision: approach 1
  – First create a draft summary from the important facts
  – Then enrich the draft with potentially important facts
• Revision: approach 2
  – Generate the draft by collecting similar facts into each sentence
  – Compress the sentences with ellipsis etc.

Example system: Plandoc
• Application developed by K. McKeown, J. Robin and K. Kukich at Columbia University, New York, and Bell Communications Research (1995)
• Problem
  – a telephone company engineer plans how a telephone route should be developed over the next 20 years
  – the engineer uses the PLAN planning system software
  – Goal: documentation of the planning process

Plandoc: input and output
• Input: a trace of the user's actions with the PLAN system
  1. RUNID fiberall FIBER 6/19/93 act yes
  2. FA 1301 2 1995
  3. FA 1201 2 1995
  4. FA 1501 3 1995
  5. ANF 1201 1301 2 1995 24
  END. 856.0 670.2

Plandoc: input and output
• Output: a 1-2 page report covering
  – the initial plan PLAN proposed
  – refinements the engineer made
  – alternative refinements the engineer tried but rejected
  – the final plan
• Purpose: documentation

Plandoc: conceptual summarization
• Important facts
  – accepted parts of the initial plan + accepted refinements to it = the final plan
• Rejected refinements?
  – the engineer decides whether they are included
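Plandoc's content selection rule can be sketched as a filter. The `kind` and `status` fields, the `"replaced"` status and the `include_rejected_refinements` flag are hypothetical stand-ins for however the engineer's decision is actually recorded.

```python
def select_plandoc_facts(facts: list[dict], include_rejected_refinements: bool) -> list[dict]:
    """Hypothetical fields: 'kind' is 'initial-plan' or 'refinement',
    'status' is e.g. 'accepted', 'rejected' or 'replaced'."""
    selected = []
    for fact in facts:
        if fact["status"] == "accepted":
            # Accepted initial-plan parts + accepted refinements = the final plan.
            selected.append(fact)
        elif (fact["kind"] == "refinement" and fact["status"] == "rejected"
              and include_rejected_refinements):
            # Rejected refinements appear only if the engineer asks for them.
            selected.append(fact)
    return selected


facts = [
    {"id": 1, "kind": "initial-plan", "status": "accepted"},
    {"id": 2, "kind": "initial-plan", "status": "replaced"},
    {"id": 3, "kind": "refinement", "status": "accepted"},
    {"id": 4, "kind": "refinement", "status": "rejected"},
]
print([f["id"] for f in select_plandoc_facts(facts, include_rejected_refinements=False)])
print([f["id"] for f in select_plandoc_facts(facts, include_rejected_refinements=True)])
```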

Plandoc: overview of the method
• A fact generator converts the input into an internal representation
  – facts presented as feature structures (attribute/value pairs)
• An ontologizer enriches the facts with, e.g., price information
• A discourse planner groups the facts
• A lexicalizer/sentence generator converts the groups into English

Plandoc: processing the input
Example: FA 1301 2 1995
Enriched feature structure:
  class: refinement
  ref-type: fiber
  action: activation
  csa-site: 1301
  date: year: 1995, quarter: 2
  price: $56.00K
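The fact generator and ontologizer steps for this record can be sketched as follows. Only the FA record layout (site, quarter, year) appears on the slides, so the hypothetical code table handles FA alone. The price source is a hypothetical lookup seeded with the $56.00K value shown above, since how Plandoc actually derives prices is not described here.

```python
from typing import Callable

# Hypothetical code table: only the FA (fiber activation) record appears on the slides.
RECORD_CODES = {"FA": ("fiber", "activation")}


def fact_from_trace(line: str) -> dict:
    """Fact generator: parse a trace record 'FA <csa-site> <quarter> <year>'."""
    code, site, quarter, year = line.split()
    ref_type, action = RECORD_CODES[code]
    return {
        "class": "refinement",
        "ref-type": ref_type,
        "action": action,
        "csa-site": site,
        "date": {"year": int(year), "quarter": int(quarter)},
    }


def enrich(fact: dict, price_in_k: Callable[[dict], float]) -> dict:
    """Ontologizer: return a copy of the fact with a price attribute added."""
    return {**fact, "price": price_in_k(fact)}


# Hypothetical price source (thousands of dollars); 56.00 is the value on the slide.
prices = {"1301": 56.00}
fact = enrich(fact_from_trace("FA 1301 2 1995"), lambda f: prices[f["csa-site"]])
for attribute, value in fact.items():
    if attribute == "price":
        value = f"${value:.2f}K"
    print(f"{attribute}: {value}")
```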

Plandoc: grouping facts into sentences
• Let's construct a sentence from the FA facts:
  FA 1301 2 1995
  FA 1201 2 1995
  FA 1501 3 1995
1. Group facts by common action
  – action = activation for all three facts
  – so one sentence is needed

Plandoc: grouping facts into sentences
2. For each common-action group (sentence):
  (a) Collapse groups which differ in only one feature into a single group
    – result, two groups:
      FA 1301, 1201 2 1995
      FA 1501 3 1995
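Step (a) can be sketched as repeated pairwise merging. In this minimal sketch each group is a dict whose feature values are stored as tuples, all groups are assumed to have the same features, and the function names are hypothetical. Applied to the three FA facts, it yields the two groups listed above.

```python
def differing_features(g1: dict, g2: dict) -> list[str]:
    return [feature for feature in g1 if g1[feature] != g2[feature]]


def collapse(groups: list[dict]) -> list[dict]:
    """Repeatedly merge two groups that differ in exactly one feature; the merged
    group concatenates the two tuples of values for that feature."""
    groups = [dict(g) for g in groups]  # copy so the caller's dicts are untouched
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                diff = differing_features(groups[i], groups[j])
                if len(diff) == 1:
                    feature = diff[0]
                    groups[i][feature] += groups[j][feature]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the list has changed
    return groups


facts = [
    {"action": ("activation",), "csa-site": ("1301",), "date": ((1995, 2),)},
    {"action": ("activation",), "csa-site": ("1201",), "date": ((1995, 2),)},
    {"action": ("activation",), "csa-site": ("1501",), "date": ((1995, 3),)},
]
for group in collapse(facts):
    print(group["csa-site"], group["date"])
```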

Plandoc: grouping facts into sentences
  (b) If more than one group remains, the sentence is broken into clauses joined by conjunction:
    i. Find the feature that is shared across most groups (but does not have the same value in all of them)
      FA 1301, 1201 2 1995
      FA 1501 3 1995
      – only the date feature is left, and it has two values => two clauses are needed

Plandoc: grouping facts into sentences
    ii. Sort the groups into subgroups by the most common shared feature (nested conjunction inside the clause)
      – here each group has only one member
      FA 1301, 1201 2 1995
      FA 1501 3 1995

Plandoc: grouping facts into sentences
    iii. Repeat the selection of the most common shared feature and the sorting into subgroups until all groups have been sorted
      – here no subgroups are left
    iv. Sort the clauses by date
      FA 1301, 1201 2 1995
      FA 1501 3 1995

Plandoc: grouping facts into sentences
  FA 1301, 1201 2 1995
  FA 1501 3 1995
• The produced sentence:
  This refinement activated fiber for CSAs 1301 and 1201 in 1995 Q2 and this refinement activated fiber for CSA 1501 in 1995 Q3.
• The final sentence after ellipsis:
  This refinement activated fiber for CSAs 1301 and 1201 in 1995 Q2 and for CSA 1501 in 1995 Q3.
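Step iv and the ellipsis can be sketched as a small realizer over the two groups. The data layout, the fixed "This refinement activated fiber" prefix and the `realize` function are assumptions for illustration. With `ellipsis=False` it produces the first sentence above, and with the default it produces the final one.

```python
def conjoin(items: list[str]) -> str:
    """Join items as 'A', 'A and B', or 'A, B and C'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def realize(groups: list[dict], ellipsis: bool = True,
            subject_verb: str = "This refinement activated fiber") -> str:
    """groups: dicts with 'sites' (list of CSA ids) and 'date' (year, quarter)."""
    clauses = []
    # Step iv: order the clauses by date.
    for i, group in enumerate(sorted(groups, key=lambda g: g["date"])):
        sites = group["sites"]
        noun = "CSA" if len(sites) == 1 else "CSAs"
        year, quarter = group["date"]
        clause = f"for {noun} {conjoin(sites)} in {year} Q{quarter}"
        if i == 0:
            clause = f"{subject_verb} {clause}"
        elif not ellipsis:
            # Without ellipsis, every clause repeats the subject and verb.
            clause = f"{subject_verb[0].lower()}{subject_verb[1:]} {clause}"
        clauses.append(clause)
    return " and ".join(clauses) + "."


groups = [
    {"sites": ["1501"], "date": (1995, 3)},
    {"sites": ["1301", "1201"], "date": (1995, 2)},
]
print(realize(groups, ellipsis=False))
print(realize(groups))
```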

Plandoc: grouping facts into sentences
• Readability: heavy conjunction makes sentences hard to read, e.g.
  This refinement extended fiber from fiber hub 8107 to CSAs 8128, 8126, 8121 and 8113 and from fiber hub 8120 to the CO in 1994 Q1 and from the CO to CSA 8120 in 1994 Q3, with the active fibers placed on the primary path.
  – limit the number of facts conjoined
  – limit the number of embedded conjunctions inside a clause
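The first remedy, limiting how many facts are conjoined, can be sketched by splitting a long conjunction into several sentences. The function and the limit of two are hypothetical. The conjuncts below are a simplified, undated version of the example, since dropping the dates avoids guessing which clause each date belongs to. The second remedy (limiting embedded conjunctions) is not modelled.

```python
def limit_conjunction(subject_verb: str, conjuncts: list[str],
                      max_conjoined: int = 2) -> list[str]:
    """Split a long conjunction into sentences with at most `max_conjoined`
    conjuncts each, repeating the subject and verb in every sentence."""
    sentences = []
    for start in range(0, len(conjuncts), max_conjoined):
        chunk = conjuncts[start:start + max_conjoined]
        sentences.append(f"{subject_verb} {' and '.join(chunk)}.")
    return sentences


conjuncts = [
    "from fiber hub 8107 to CSAs 8128, 8126, 8121 and 8113",
    "from fiber hub 8120 to the CO",
    "from the CO to CSA 8120",
]
for sentence in limit_conjunction("This refinement extended fiber", conjuncts):
    print(sentence)
```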

Summary
• Sources other than text can also be summarized
• Problems:
  – choosing the important elements
  – generating a compact and readable summary text
  – domain-dependency

Summary
• Applications:
  – automatic weather reports (not predictions!)
  – simulation reports
  – patient monitoring system summaries
  – etc.