
Natural Language Generation
Saurabh Chanderiya (07005004), Abhimanyu Dhamija (07005024), E K Venkatesh (07005031), G Hrudil (07005032), B Vinod Kumar (07d05018)
Guide: Prof. Pushpak Bhattacharyya

Outline
• What is Natural Language Generation?
• Motivation
• Stages in NLG
• Applications of NLG
• Evaluation Techniques
• Conclusion

What is Natural Language Generation?
• Natural Language Generation (NLG) is the subfield of artificial intelligence and computational linguistics that focuses on computer systems that can produce understandable texts in English or other human languages [Reiter and Dale, 2000]

What is Natural Language Generation? (contd.)
• Convert a computer based representation into a natural language representation (the opposite of NLU)
    ◦ Data / machine representation → NLG → natural language text
    ◦ Natural language text → NLU → data / machine representation
• The machine representation comprises some form of computerized data
• Examples:
    ◦ A database of daily temperature values in a city
    ◦ An ontology
    ◦ A collection of fairy tales

Key Elements in NL Generation
• Many choices are available – an NLG system needs to choose the most appropriate one
• Example: denoting a change in value [Wikipedia]
    ◦ “the temperature rose” – increase in value
    ◦ “the temperature plummeted” – drastic decrease in value
    ◦ “the rain got heavier” – again an increase in value, but in a different context
• Meeting the communication goals – so that the generated text is understandable to the target reader

Motivating Example
• Suppose you are asked to write an article on IIT Bombay.
• How do you proceed?
• Step 1: What all should I write about? How should I organize it?
    ◦ History, Students, Professors, Gymkhana, Mood Indigo …
    ◦ Start with description of gymkhana or history …
• Step 2: What should my style be?
    ◦ Editorial, Prose, Poetry …
    ◦ Simple words …
• Step 3: Pen it down

Motivating Example (contd.)
• We have just identified the key stages in Natural Language Generation

Motivating Example (contd.)
• Step 1: What all should I write about? How should I organize it?
    ◦ History, Students, Professors, Gymkhana, Mood Indigo …
    ◦ Start with description of gymkhana or history …
• This stage is TEXT PLANNING (Content Determination and Document Structuring)

Motivating Example (contd.)
• Step 2: What should my style be?
    ◦ Editorial, Prose, Poetry …
    ◦ Simple words …
• This stage is MICROPLANNING (Lexical Choice, Referring Expression Generation, Aggregation)

Motivating Example (contd.)
• Step 3: Pen it down
• This stage is REALIZATION

NLG Systems Architecture
• Data flow: Input Data → Document Planning → Microplanning → Realization → Output Data
    ◦ Document Planning: Content Determination, Document Structuring
    ◦ Microplanning: Lexical Choice, Referring Expressions, Aggregation
• Control Data (label in the original diagram)

Stages in NLG
• The following stages of Natural Language Generation can be identified:
    ◦ Content Determination
    ◦ Document Structuring
    ◦ Lexical Choice
    ◦ Referring Expression Generation
    ◦ Aggregation
    ◦ Realization
• Each of these is considered in detail in the next few slides

Content Determination
• Deciding what information to mention in the text
• Example [Wikipedia]: an NLG system that summarizes information about sick babies has the following information:
    ◦ The baby is being given morphine via an IV drop
    ◦ The baby's heart rate shows bradycardias (temporary drops)
    ◦ The baby's temperature is normal
    ◦ The baby is crying

Content Determination (contd.)
• Factors affecting the decision could be:
    ◦ Communicative goal – the purpose of the text and the reader. A diagnosing doctor would be interested in the heart rate, while a parent would want to know whether the baby is crying
    ◦ Size and level of detail – a formal report about the patient vs. an SMS to the doctor
    ◦ How unusual the information is – is it important to mention that the baby’s temperature is normal?

Content Determination: Techniques employed
• Schemas – predefined templates which explicitly specify what information is to be included
• Based on Rhetorical Predicates [McKeown, 1985]
    ◦ Rhetorical predicates specify the “role” that is played by each utterance in the text
    ◦ Example: “Mary has a pink coat” → Attributive
    ◦ Other rhetorical predicates: particular illustration, evidence, inference etc.

Content Determination (contd.)
• Example schema using Rhetorical Predicates: the Identification Schema (for providing definitions) [McKeown, 1985]
    ◦ Identification (class & attribute)
    ◦ Attributive
    ◦ Particular Illustration
• Sample text generated from this schema could be:
    ◦ Mumbai is an important economic region in Maharashtra. There are many textile mills in Mumbai. Bombay Dyeing is among the noteworthy textile mills.

Content Determination: Explicit Reasoning Approaches
• Example: plot generation using case based reasoning [B. Díaz-Agudo et al., 2004]
    ◦ Case based reasoning is characterized by: retrieve, reuse, revise, retain
• Build cases from a set of stories – similar to identifying the features that constitute the story
• Ontology for the fairy tale world
• Accept a query from the user regarding features of the new plot to be generated

Content Determination (contd.)
• Example: plot generation using case based reasoning (contd.)
    ◦ Retrieve a similar case – similarity is calculated on the basis of distance in the ontology
    ◦ Resolve dependencies – ask the user for further input if needed
    ◦ Generate the plot

Content Determination (contd.)
• Sample run:
    ◦ Query: “princess, murder, interdiction violated, competition, test of hero”
    ◦ Story number 113 (Swan Geese) returned based on similarity
    ◦ Perform substitutions
    ◦ Generate plot

Document Structuring
• Decide the order and grouping of sentences in a generated text
• Example:
    1. John went to the shop.
    2. John bought an apple.
• Now consider:
    1. John bought an apple.
    2. John went to the shop.
• The first ordering is more coherent than the second, so the order in which sentences are presented matters.

Document Structuring: Algorithms
• Schema based approach
• Corpus based approach [M Lapata, 2003]
    ◦ Chain rule over the sentence sequence:
      \[ P(S_1 \ldots S_n) = P(S_1)\, P(S_2 \mid S_1)\, P(S_3 \mid S_1, S_2) \cdots P(S_n \mid S_1 \ldots S_{n-1}) \]
    ◦ Assuming dependence only on the previous sentence:
      \[ P(S_1 \ldots S_n) = P(S_1)\, P(S_2 \mid S_1)\, P(S_3 \mid S_2) \cdots P(S_n \mid S_{n-1}) \]
    ◦ Using features to represent sentences:
      \[ P(S_i \mid S_{i-1}) = P(\langle a_{i,1}, a_{i,2}, \ldots, a_{i,n} \rangle \mid \langle a_{i-1,1}, \ldots, a_{i-1,m} \rangle) \]
    ◦ Assuming independence of features and approximating \( P(S_i \mid S_{i-1}) \) from the Cartesian product \( S_i \times S_{i-1} \):
      \[ P(S_i \mid S_{i-1}) = \prod_{(a_{i,j},\, a_{i-1,k}) \in S_i \times S_{i-1}} P(a_{i,j} \mid a_{i-1,k}) \]
    ◦ Estimate the probabilities using counts, construct a directed weighted graph (sentences as nodes and probabilities as edge weights) and obtain an approximate solution
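The search over the weighted sentence graph can be sketched as a greedy walk: pick the most probable first sentence, then repeatedly pick the most probable unused successor. This is only a minimal illustration under stated assumptions: the class name `SentenceOrderingSketch`, the method `greedyOrder`, and all probability values are hypothetical, and a real system would estimate the probabilities from corpus counts over sentence features.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of greedy sentence ordering over a weighted sentence graph.
// All names and numbers here are illustrative assumptions.
final class SentenceOrderingSketch {

    // start[j]    : assumed probability that sentence j opens the text
    // trans[i][j] : assumed probability P(S_j | S_i) that j directly follows i
    static List<Integer> greedyOrder(double[] start, double[][] trans) {
        int n = start.length;
        boolean[] used = new boolean[n];
        List<Integer> order = new ArrayList<>();
        int current = -1; // -1 means no sentence has been placed yet
        for (int step = 0; step < n; step++) {
            int best = -1;
            double bestP = -1.0;
            for (int j = 0; j < n; j++) {
                if (used[j]) {
                    continue;
                }
                double p = (current < 0) ? start[j] : trans[current][j];
                if (p > bestP) {
                    bestP = p;
                    best = j;
                }
            }
            used[best] = true;
            order.add(best);
            current = best;
        }
        return order;
    }

    public static void main(String[] args) {
        String[] sentences = {"John bought an apple.", "John went to the shop."};
        double[] start = {0.2, 0.8};                  // made-up values
        double[][] trans = {{0.0, 0.3}, {0.7, 0.0}};  // made-up values
        for (int index : greedyOrder(start, trans)) {
            System.out.println(sentences[index]);
        }
    }
}
```

With these made-up numbers the shop sentence is placed before the apple sentence. A greedy walk is not guaranteed to find the globally most probable ordering; it is one simple way to obtain an approximate solution.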

Aggregation
• Aggregation is a subtask of natural language generation that involves merging syntactic constituents (such as sentences and phrases) together
• Example:
    ◦ “John went to the shop. John bought an apple.”
    ◦ “John went to the shop and bought an apple.”
• Could be syntactic or conceptual
    ◦ Example of conceptual aggregation: replacing “Saturday and Sunday” by “weekend”
• Aggregation algorithms must do two things:
    ◦ Decide when two constituents should be aggregated
    ◦ Decide how two constituents should be aggregated, and create the aggregated structure
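The two decisions (when and how to aggregate) can be illustrated with a toy syntactic aggregator for the John example. This is a sketch, not an algorithm from the cited literature: the `Clause` representation, the same-subject rule in `canAggregate`, and the use of “and” in `aggregate` are all simplifying assumptions.

```java
// Minimal sketch of syntactic aggregation: merge two clauses that share a
// subject into one sentence with a coordinated verb phrase.
final class AggregationSketch {

    // Hypothetical clause representation: a subject plus a verb phrase.
    static final class Clause {
        final String subject;
        final String verbPhrase;

        Clause(String subject, String verbPhrase) {
            this.subject = subject;
            this.verbPhrase = verbPhrase;
        }

        String realise() {
            return subject + " " + verbPhrase + ".";
        }
    }

    // WHEN to aggregate: here, only if both clauses have the same subject.
    static boolean canAggregate(Clause a, Clause b) {
        return a.subject.equals(b.subject);
    }

    // HOW to aggregate: keep one subject and coordinate the verb phrases.
    // If the clauses cannot be aggregated, they are realised separately.
    static String aggregate(Clause a, Clause b) {
        if (!canAggregate(a, b)) {
            return a.realise() + " " + b.realise();
        }
        return a.subject + " " + a.verbPhrase + " and " + b.verbPhrase + ".";
    }

    public static void main(String[] args) {
        Clause first = new Clause("John", "went to the shop");
        Clause second = new Clause("John", "bought an apple");
        System.out.println(aggregate(first, second));
        // prints: John went to the shop and bought an apple.
    }
}
```

Conceptual aggregation (“Saturday and Sunday” to “weekend”) would need domain knowledge and is not covered by this sketch.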

Post-editing
• Identity between different word-groups [Karin Harbusch et al., 2009]
    ◦ Lemma identity: two different words belong to the same inflectional paradigm
    ◦ Form identity: two words have the same spelling/sound and are lemma-identical
    ◦ Co-referentiality: two words/constituents denote the same entity or entities in the external context, i.e. have the same reference

[Karin Harbusch et al., 2009]

Lexical choice
• Lexical choice involves choosing the content words (nouns, verbs, adjectives, adverbs) in a generated text.
• The simplest type of lexical choice involves mapping a domain concept to a word.
• Lexical choice modules must be informed by linguistic knowledge of how the system's input data maps onto words. This is a question of semantics, but it is also influenced by syntactic and pragmatic factors.
• 3 factors to look for:
    ◦ Genre
    ◦ People perceive different words differently
    ◦ How language relates to the non-linguistic world

Humans’ perception of words
• [Rohit Parikh, 1994]
• “By evening” has different meanings for different people
• Different dialects
• Choosing between near-synonymous words
• It has been suggested that utility theory be applied to word choice. In other words, if we know (1) the probability of a word being correctly interpreted or misinterpreted and (2) the benefit to the user of a correct interpretation and the cost of a misinterpretation, then we can compute an overall utility to the user of using the word.
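One way to make the utility idea concrete is an expected-utility formula. The formula and the numbers below are an illustrative assumption, not taken from the cited work: p(w) is the probability that word w is interpreted correctly, B(w) the benefit of a correct interpretation, and C(w) the cost of a misinterpretation.

```latex
% Expected utility of choosing word w (one possible reading, stated as an assumption)
\[ U(w) = p(w)\,B(w) - \bigl(1 - p(w)\bigr)\,C(w) \]

% Hypothetical comparison of two near-synonyms with B = 10 and C = 20
\[ U(w_1) = 0.9 \cdot 10 - 0.1 \cdot 20 = 7, \qquad
   U(w_2) = 0.6 \cdot 10 - 0.4 \cdot 20 = -2 \]
```

Under these made-up values the system would prefer the first word, since it has the higher expected utility.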

Referring expression generation
• This is the second-to-last stage in natural language generation
• It involves creating referring expressions (noun phrases) that identify specific entities to the reader
• Example:
    ◦ He told the tourist that rain was expected tonight in Southern Scotland.
    ◦ “He”, “the tourist”, “tonight” and “Southern Scotland” are referring expressions

Criteria for good referents
• Ideally, a good referring expression should satisfy a number of criteria [Wikipedia]:
    ◦ Referential success: it should unambiguously identify the referent to the reader.
    ◦ Ease of comprehension: the reader should be able to quickly read and understand it.
    ◦ Computational complexity: the generation algorithm should be fast.
    ◦ No false inferences: the expression should not confuse or mislead the reader by suggesting false implications or other pragmatic inferences.

Kinds of Referring Expressions
• Proper noun-noun
• Definite Noun Phrases
• Spatial and Temporal Reference
• Different algorithmic models:
    ◦ Graph-Based Generation of Referring Expressions [Krahmer et al., 2003]
    ◦ Centering theory uses ranking [Poesio et al., 2004]
    ◦ Generating Approximate Geographic Descriptions [Turner et al., 2009]

Realization
• Realization deals with creating the actual text from the abstract representation
• Realization involves three kinds of processing:
    ◦ Syntactic realization – decide the order of components, add function words etc. Example: in English, the subject usually precedes the verb
    ◦ Morphological realization – compute inflected forms. Example: plural(woman) == women
    ◦ Orthographic realization – capitalization of the first letter, punctuation etc.
• Realization systems: SimpleNLG, KPML etc.
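Morphological realization can be illustrated with a toy pluralizer: an exception table handles irregular nouns such as woman/women, and a few spelling rules cover the regular cases. This is a deliberately incomplete sketch; the class name `PluralSketch`, the tiny exception table, and the rule set are assumptions and do not reflect how SimpleNLG or KPML implement morphology.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of morphological realization for English noun plurals.
final class PluralSketch {

    // Tiny, hand-picked table of irregular plurals (illustrative only).
    private static final Map<String, String> IRREGULAR = new HashMap<>();

    static {
        IRREGULAR.put("woman", "women");
        IRREGULAR.put("child", "children");
        IRREGULAR.put("mouse", "mice");
    }

    static String plural(String noun) {
        // 1. Irregular forms take precedence over every spelling rule.
        String irregular = IRREGULAR.get(noun);
        if (irregular != null) {
            return irregular;
        }
        // 2. Sibilant endings take "-es": box -> boxes, church -> churches.
        if (noun.endsWith("s") || noun.endsWith("x")
                || noun.endsWith("ch") || noun.endsWith("sh")) {
            return noun + "es";
        }
        // 3. Consonant + "y" becomes "-ies": city -> cities (but monkey -> monkeys).
        if (noun.length() > 1 && noun.endsWith("y")
                && "aeiou".indexOf(noun.charAt(noun.length() - 2)) < 0) {
            return noun.substring(0, noun.length() - 1) + "ies";
        }
        // 4. Default rule: append "s".
        return noun + "s";
    }

    public static void main(String[] args) {
        System.out.println(plural("woman"));  // prints: women
        System.out.println(plural("monkey")); // prints: monkeys
    }
}
```

The ordering of the checks matters: the exception table is consulted first so that the regular rules never produce forms like “womans”.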

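Realization: SimpleNLG
• SimpleNLG is a simple NLG library for Java for generating grammatically correct English sentences
• Sample code [http://code.google.com/p/simplenlg/wiki/Section1]:
    // Needs simplenlg.framework.*, simplenlg.lexicon.*, simplenlg.phrasespec.*, simplenlg.realiser.english.*
    Lexicon lexicon = Lexicon.getDefaultLexicon();
    NLGFactory nlgFactory = new NLGFactory(lexicon);
    Realiser realiser = new Realiser(lexicon);

    // Build the clause and let the realiser inflect the verb
    SPhraseSpec p = nlgFactory.createClause();
    p.setSubject("Mary");
    p.setVerb("chase");
    p.setObject("the monkey");
    String output2 = realiser.realiseSentence(p);
    System.out.println(output2);
• Output: “Mary chases the monkey.”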

Applications of NLG
• Present information in a more convenient way
    ◦ Airline schedule database
    ◦ Accounting spreadsheet
• Automating document production
    ◦ Doctor writing discharge summaries
    ◦ Programmer writing code documentation, logic description etc.
• In many contexts, human intervention is required to create texts

Application of NLG with human intervention
• The NLG system is used to produce an initial draft of a document, which can be further edited by a human author
• E.g.
    ◦ Weather Reporter, which helps meteorologists compose weather forecasts
    ◦ DRAFTER, which helps technical authors write software manuals
    ◦ AlethGen, which helps customer-service representatives write response letters to customers

Application of NLG without human intervention
• Some NLG systems have been developed with the aim of operating as standalone systems.
• E.g.
    ◦ Model Explainer, which generates textual descriptions of classes in an object-oriented software system
    ◦ LFS, which summarizes statistical data for the general public
    ◦ PIGLET, which gives hospital patients explanations of information in the patient records

Weather Reporter
• Provides retrospective reports of the weather over periods of one month
• Takes a large set of numerical data
• Produces short texts
• E.g. text produced by Weather Reporter:
    ◦ The month was cooler and drier than average, with the average number of rain days. The total rain for the year so far is well below average. There was rain on every day for eight days from the 11th to the 18th.


Weather Reporter (contd.)
• The data shown is real data collected automatically by meteorological data gathering equipment
• The Weather Reporter design is based on real input data and a real corpus of human-written texts

Weather Reporter (contd.)
• Example: using the historical data for 1-July-2005, the software produces:
    ◦ Grass pollen levels for Friday have increased from the moderate to high levels of yesterday with values of around 6 to 7 across most parts of the country. However, in Northern areas, pollen levels will be moderate with values of 4.
• In contrast, the actual forecast (written by a human meteorologist) from this data was:
    ◦ Pollen counts are expected to remain high at level 6 over most of Scotland, and even level 7 in the south east. The only relief is in the Northern Isles and far northeast of mainland Scotland with medium levels of pollen count.

Model Explainer
• Generates textual descriptions of information in models of object-oriented software.

Model Explainer (contd.)
• O-O models are usually depicted graphically
• Model Explainer is useful because certain kinds of information are better communicated textually
• E.g.
    ◦ Via Model Explainer it is clear that a section must be taught by exactly one professor
    ◦ The text is clear especially for people who are not familiar with the notation used in the graphical depiction


Model Explainer (contd.)
• It also expresses relations from the object model in a variety of linguistic contexts
• E.g. “teaches”
    ◦ A professor teaches a course
    ◦ A section must be taught by a professor
    ◦ Professor Smith does not teach any sections

Task-Based Evaluation
• Task-based evaluations measure the impact of generated texts on end users and typically involve techniques from an application domain such as medicine.
• For example, a system which generates summaries of medical data can be evaluated by giving these summaries to doctors and assessing whether the summaries help doctors make better decisions.

Evaluations Based on Human Ratings and Judgments
• Another way of evaluating an NLG system is to ask human subjects to rate generated texts on an n-point rating scale

Unigram Precision
• Candidate: the the the the the the the.
• Reference 1: The cat is on the mat.
• Reference 2: There is a cat on the mat.
• The unigram precision of the above candidate is 7/7, as all 7 candidate words (“the”) occur in Reference 1.
• But the candidate is clearly not an appropriate translation.

Modified Unigram Precision
• Count the maximum number of times a word occurs in any single reference translation.
• Clip the total count of each candidate word by its maximum reference count.
• Add these clipped counts.
• Divide by the total number of candidate words.

Modified Unigram Precision (contd.)
• Candidate: the the the the the the the.
• Reference 1: The cat is on the mat.
• Reference 2: There is a cat on the mat.
• The maximum count of “the” in any single reference is 2 (in Reference 1).
• The total number of candidate words is 7.
• So, Modified Unigram Precision = 2/7

Modified n-gram precision
• All candidate n-gram counts and their corresponding maximum reference counts are collected.
• The candidate counts are clipped by their corresponding reference maximum value.
• Add the clipped counts.
• Divide by the total number of candidate n-grams.
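The four steps above can be sketched in code as follows. This is a minimal illustration rather than a reference BLEU implementation: the class and method names are hypothetical, and the tokenizer (lowercasing and splitting on non-alphanumeric characters) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of modified (clipped) n-gram precision.
final class ModifiedPrecisionSketch {

    // Assumed tokenizer: lowercase, split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Count how often each n-gram occurs in a token sequence.
    static Map<String, Integer> ngramCounts(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String gram = String.join(" ", tokens.subList(i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // Returns {sum of clipped counts, total number of candidate n-grams}.
    static int[] modifiedPrecision(String candidate, List<String> references, int n) {
        Map<String, Integer> candidateCounts = ngramCounts(tokenize(candidate), n);

        // Maximum count of each n-gram in any single reference.
        Map<String, Integer> maxReferenceCounts = new HashMap<>();
        for (String reference : references) {
            Map<String, Integer> counts = ngramCounts(tokenize(reference), n);
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                maxReferenceCounts.merge(entry.getKey(), entry.getValue(), Integer::max);
            }
        }

        int clipped = 0;
        int total = 0;
        for (Map.Entry<String, Integer> entry : candidateCounts.entrySet()) {
            total += entry.getValue();
            int limit = maxReferenceCounts.getOrDefault(entry.getKey(), 0);
            clipped += Math.min(entry.getValue(), limit);
        }
        return new int[] {clipped, total};
    }

    public static void main(String[] args) {
        List<String> references = Arrays.asList(
                "The cat is on the mat.",
                "There is a cat on the mat.");
        int[] result = modifiedPrecision("the the the the the the the.", references, 1);
        System.out.println(result[0] + "/" + result[1]); // prints: 2/7
    }
}
```

Returning the numerator and denominator separately, rather than a single ratio, makes it straightforward to pool counts over several sentences later.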

Modified n-gram precision (contd.)
• A translation using the same words (1-grams) as in the references tends to satisfy adequacy.
• The longer n-gram matches account for fluency.

Modified n-gram precision (contd.)
• Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
• Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
• Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
• Reference 3: It is the practical guide for the army always to heed directions of the party.
• Modified Bigram Precision of Candidate 1 = 10/17
• Modified Bigram Precision of Candidate 2 = 1/13

Modified n-gram precision on a multi-sentence
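For a text with several sentences, the BLEU paper listed in the references pools the clipped counts over all candidate sentences before dividing. In that paper's notation (C ranges over the candidate sentences and Count_clip is the clipped count described on the earlier slides):

```latex
\[
p_n =
\frac{\displaystyle\sum_{C \in \{\mathit{Candidates}\}} \; \sum_{n\text{-gram} \in C}
      \mathrm{Count}_{\mathrm{clip}}(n\text{-gram})}
     {\displaystyle\sum_{C' \in \{\mathit{Candidates}\}} \; \sum_{n\text{-gram}' \in C'}
      \mathrm{Count}(n\text{-gram}')}
\]
```

Because the sums run over the whole test set, a single sentence with few n-grams cannot dominate the score.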

Modified N-gram Precision: Sentence Length
• Candidate: of the
• Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
• Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
• Reference 3: It is the practical guide for the army always to heed directions of the party.
• Modified Unigram Precision = 2/2
• Modified Bigram Precision = 1/1
• So modified n-gram precision alone does not penalize a candidate that is far too short.

Brevity Penalty
• Candidate translations longer than their references are already penalized by the modified n-gram precision measure: there is no need to penalize them again.
• Brevity Penalty = 1 if the candidate's length matches a reference length. If the candidate is shorter, it is < 1.

Effective Reference Length
• Best match lengths
    ◦ We call the closest reference sentence length to the candidate length the “best match length.”
• Effective reference length
    ◦ Sum of all the best match lengths
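The BLEU paper listed in the references combines these pieces as follows, where c is the total length of the candidate text, r is the effective reference length, p_n are the modified n-gram precisions, and w_n are positive weights that sum to one (the paper's baseline uses N = 4 and uniform weights w_n = 1/N):

```latex
\[
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
\qquad\qquad
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right)
\]
```

When c equals r the exponent is zero, so the penalty is 1; it drops below 1 only when the candidate is shorter than the effective reference length.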


Conclusion
• Although Natural Language Generation techniques do generate text from the underlying computer based representation, the output text of existing NLG systems is not of very high quality. This necessitates human intervention when high quality text is desired [Sripada et al., 2003]
• In NLG, as opposed to Machine Translation, it is better to use automatic evaluation metrics only as a supplement to human evaluations and not as a replacement [Reiter et al., 2009]

References
• Dale, Robert; Reiter, Ehud (2000). Building natural language generation systems. Cambridge, UK: Cambridge University Press
• Reiter E, Sripada S, Hunter J, Yu J, Davy I (2005). "Choosing Words in Computer-Generated Weather Forecasts"
• B. Díaz-Agudo, P. Gervás, and F. Peinado. A case based reasoning approach to story plot generation. In ECCBR’04, Springer-Verlag LNCS/LNAI, Madrid, Spain, 2004
• Reiter E, Anja Belz (2009). An Investigation into the Validity of Some Metrics for Automatically Evaluating Natural Language Generation Systems, Association for Computational Linguistics
• http://code.google.com/p/simplenlg/wiki/Section1
• M Lapata (2003). Probabilistic Text Structuring: Experiments with Sentence Ordering. Proceedings of ACL-2003
• http://web.science.mq.edu.au/~rdale/teaching/esslli/index.html
• http://www.wikipedia.org

References (contd.)
• E Krahmer, S van Erk, A Verleg (2003). Graph-Based Generation of Referring Expressions. Computational Linguistics
• M Poesio, R Stevenson, B di Eugenio, J Hitzeman (2004). Centering: A Parametric Theory and Its Instantiations. Computational Linguistics
• R Turner, Y Sripada, E Reiter (2009). Generating Approximate Geographic Descriptions. Proceedings of ENLG-2009
• Kathleen R McKeown (1985). Discourse Strategies for Generating Natural Language Text, Elsevier Science Publishers B.V. (North-Holland)
• S. Sripada, E. Reiter, I. Davy (2003). SumTime-Mousam: Configurable marine weather forecast generator, Expert Update 6 (3)
• Karin Harbusch, Gerard Kempen (2009). Generating clausal coordinate ellipsis multilingually: A uniform approach based on postediting. Proceedings of the 12th European Workshop on Natural Language Generation
• Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu (2003). Bleu: a Method for Automatic Evaluation of Machine Translation, IBM Research Division

Thank You!