NLG: Natural Language Generation and Generation Steps

Agenda
• Natural Language Generation (NLG)
• Generation Steps

Natural Language Processing (NLP) by Rahman Ali, Lect: QACC, UOP, 19 June 2021

Definition 1: NLG
• Natural Language Generation (NLG) is the process of constructing natural language outputs from non-linguistic inputs.
• Goal: the goal of this process can be viewed as the inverse of that of natural language understanding (NLU).
• NLU vs. NLG: NLG maps from meaning to text, while NLU maps from text to meaning.
(Jurafsky, Chapter 20)

Definition 2: NLG
• Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form.
(http://en.wikipedia.org/wiki/Natural_language_generation, Retrieved: 31 Oct, 2010)

Definition 3: NLG
• Natural language generation is the process of deliberately constructing a natural language text in order to meet specified communicative goals.
[McDonald 1992]

What is NLG? Or Ingredients of NLG
• Goal: computer software which produces understandable and appropriate texts in English or other human languages
• Input: some underlying non-linguistic representation of information
• Output: documents, reports, explanations, help messages, and other kinds of texts
• Knowledge sources required: knowledge of the target language and of the domain

Example System #1: FoG
• Function: produces textual weather reports in English and French
• Input: graphical/numerical weather depiction
• User: Environment Canada (Canadian Weather Service)
• Developer: CoGenTex
• Status: fielded, in operational use since 1992

FoG: Input [figure]

FoG: Output [figure]

Example System #2: PlanDoc
• Function: produces a report describing the simulation options that an engineer has explored
• Input: a simulation log file
• User: Southwestern Bell Telephone Company (Texas)
• Developer: Bellcore and Columbia University
• Status: fielded, in operational use since 1996

PlanDoc: Input
    RUNID fiberall FIBER 6/19/93 act yes
    FA 1301 2 1995
    FA 1201 2 1995
    FA 1401 2 1995
    FA 1501 2 1995
    ANF co 1103 2 1995 48
    ANF 1201 1301 2 1995 24
    ANF 1401 1501 2 1995 24
    END. 856.0 670.2

PlanDoc: Output
This saved fiber refinement includes all DLC changes in Run-ID ALLDLC. RUN-ID FIBERALL demanded that PLAN activate fiber for CSAs 1201, 1301, 1401 and 1501 in 1995 Q2. It requested the placement of a 48-fiber cable from the CO to section 1103 and the placement of 24-fiber cables from section 1201 to section 1301 and from section 1401 to section 1501 in the second quarter of 1995. For this refinement, the resulting 20 year route PWE was $856.00K, a $64.11K savings over the BASE plan and the resulting 5 year IFC was $670.20K, a $60.55K savings over the BASE plan.

Example System #3: STOP
• Function: produces a personalized smoking-cessation leaflet
• Input: questionnaire about smoking attitudes, beliefs, history
• User: NHS (British Health Service)
• Developer: University of Aberdeen
• Status: undergoing clinical evaluation to determine its effectiveness

STOP: Input [figure]

STOP: Output
Dear Ms Cameron
Thank you for taking the trouble to return the smoking questionnaire that we sent you. It appears from your answers that although you're not planning to stop smoking in the near future, you would like to stop if it was easy. You think it would be difficult to stop because smoking helps you cope with stress, it is something to do when you are bored, and smoking stops you putting on weight. However, you have reasons to be confident of success if you did try to stop, and there are ways of coping with the difficulties.

Example System #4: TEMSIS
• Function: summarizes pollutant information for environmental officials
• Input: environmental data + a specific query
• User: regional environmental agencies in France and Germany
• Developer: DFKI GmbH
• Status: prototype developed; requirements for fielded system being analyzed

TEMSIS: Input Query
    ((LANGUAGE FRENCH)
     (GRENZWERTLAND GERMANY)
     (BESTAETIGE-MS T)
     (BESTAETIGE-SS T)
     (MESSSTATION "Voelklingen City")
     (DB-ID "#2083")
     (SCHADSTOFF "#19")
     (ART MAXIMUM)
     (ZEIT ((JAHR 1998) (MONAT 7) (TAG 21))))

TEMSIS: Output Summary
• French: Le 21/7/1998 à la station de mesure de Völklingen-City, la valeur moyenne maximale d'une demi-heure (Halbstundenmittelwert) pour l'ozone atteignait 104.0 µg/m³. Par conséquent, selon le decret MIK (MIK-Verordnung), la valeur limite autorisée de 120 µg/m³ n'a pas été dépassée.
• German: Der höchste Halbstundenmittelwert für Ozon an der Meßstation Völklingen-City erreichte am 21.7.1998 104.0 µg/m³, womit der gesetzlich zulässige Grenzwert nach MIK-Verordnung von 120 µg/m³ nicht überschritten wurde.

Types of NLG Applications
• Automated document production
  • weather forecasts, simulation reports, letters, ...
• Presentation of information to people in an understandable fashion
  • medical records, expert system reasoning, ...
• Teaching
  • information for students in CAL systems
• Entertainment
  • jokes (?), stories (??), poetry (???)

An Architecture for Generation [figure]
(Jurafsky, Chapter 20)

An Architecture for Generation (Cont.)
• Discourse Planner
  • This component starts with a communicative goal and makes all the choices. It selects the content from the knowledge base and then structures that content appropriately.
  • The resulting discourse plan will specify all the choices made for the entire communication, potentially spanning multiple sentences and including other annotations (including hypertext, figures, etc.).

An Architecture for Generation (Cont.)
• Surface Realizer
  • This component receives the fully specified discourse plan and generates individual sentences as constrained by its lexical and grammatical resources.
  • These resources define the realizer's potential range of output.
  • If the plan specifies multiple-sentence output, the surface realizer is called multiple times.

Component Tasks in NLG
1. Content Determination: what information should be conveyed?
2. Discourse Planning: order and structure of the message set
3. Sentence Aggregation: grouping messages into sentences
4. Lexicalization: words and phrases for concepts and relations
5. Referring Expression Generation: words and phrases for entities
6. Linguistic Realization: syntax, morphology, orthography

NLG: Overview

Typical 3-Module/Pipelined Architecture
• Input: goal
• Text Planner
  • 1. Content Determination
  • 2. Discourse Planning
  • Output: text plan
• Sentence Planner
  • 3. Sentence Aggregation
  • 4. Lexicalization
  • 5. Referring Expressions
  • Output: sentence plans
• Linguistic Realizer
  • 6. Syntax, Morphology, Orthography
  • Output: surface text
• Q: How should these (the text plan and the sentence plans) be represented?
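
The three-module pipeline can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the knowledge-base format, the `Message` class, and all function names are hypothetical and do not come from any of the systems described here. It shows how a goal passes through a text planner, a sentence planner, and a linguistic realizer, with each stage consuming the previous stage's output.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """A unit of content chosen by the text planner (hypothetical format)."""
    train: str
    departs: str


def text_planner(goal, knowledge_base):
    """Tasks 1-2: select the relevant facts, then order them."""
    selected = [f for f in knowledge_base if f["destination"] == goal]
    selected.sort(key=lambda f: f["departs"])  # earliest departure first
    return [Message(f["train"], f["departs"]) for f in selected]


def sentence_planner(text_plan):
    """Tasks 3-5: build one sentence plan per message (no aggregation here)."""
    return [{"subject": f"the {m.train}", "verb": "leave", "time": m.departs}
            for m in text_plan]


def linguistic_realizer(sentence_plans):
    """Task 6: add verb agreement, capitalisation, and punctuation."""
    sentences = []
    for plan in sentence_plans:
        text = f"{plan['subject']} {plan['verb']}s at {plan['time']}."
        sentences.append(text[0].upper() + text[1:])
    return " ".join(sentences)


def generate(goal, knowledge_base):
    """Run the whole pipeline: goal -> text plan -> sentence plans -> text."""
    return linguistic_realizer(sentence_planner(text_planner(goal, knowledge_base)))


# Hypothetical knowledge base, invented for this example.
KB = [
    {"train": "express", "destination": "Glasgow", "departs": "10:00"},
    {"train": "local", "destination": "Glasgow", "departs": "09:15"},
    {"train": "sleeper", "destination": "Leeds", "departs": "22:30"},
]

print(generate("Glasgow", KB))
# The local leaves at 09:15. The express leaves at 10:00.
```

Keeping the stages as separate functions mirrors the slide: each intermediate result (text plan, sentence plans) is an explicit data structure that could be inspected or replaced independently.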

Text Plans
• Common representation: tree
  • Leaf nodes = messages
  • Internal nodes = message groupings
• Simple text plans: templates OK
• Complex text plans: require full representation language (e.g., TAMERLAN, DIOGENES)
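
A tree-shaped text plan can be sketched as follows. The class names `Leaf` and `Group` and the message labels are assumptions made for illustration; real text-plan languages carry much richer annotations. The sketch shows leaf nodes holding messages, internal nodes grouping them, and a left-to-right traversal recovering the presentation order.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Leaf node: a single message."""
    message: str


@dataclass
class Group:
    """Internal node: a labelled grouping of messages or sub-groups."""
    label: str
    children: list = field(default_factory=list)


def leaves(node):
    """Return the messages in left-to-right (presentation) order."""
    if isinstance(node, Leaf):
        return [node.message]
    ordered = []
    for child in node.children:
        ordered.extend(leaves(child))
    return ordered


# Hypothetical plan: a summary first, then a group of detail messages.
plan = Group("report", [
    Leaf("summary"),
    Group("details", [Leaf("first-train"), Leaf("next-train")]),
])

print(leaves(plan))  # ['summary', 'first-train', 'next-train']
```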

Sentence Plans
• Simple: templates (select & fill)
• Complex: abstract representation (SPL: Sentence Planning Language)
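
The "select & fill" approach to simple sentence plans might look like the sketch below. The template strings, message types, and slot names are hypothetical; the only library feature used is Python's built-in `str.format`.

```python
# Hypothetical template inventory, keyed by message type.
TEMPLATES = {
    "departure": "The {train} leaves at {time}.",
    "count": "There are {n} trains a day from {source} to {destination}.",
}


def realize_from_template(message):
    """Select the template for the message type, then fill its slots."""
    template = TEMPLATES[message["type"]]        # select
    return template.format(**message["slots"])   # fill


message = {
    "type": "count",
    "slots": {"n": 20, "source": "Aberdeen", "destination": "Glasgow"},
}
print(realize_from_template(message))
# There are 20 trains a day from Aberdeen to Glasgow.
```

Templates are easy to write and debug, but every new sentence shape needs a new template, which is why more complex systems move to an abstract representation.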

Example SPL Expression
    (S1/exist
      :object (O1/train
        :cardinality 20
        :relations ((R1/period :value daily)
                    (R2/source :value Aberdeen)
                    (R3/destination :value Glasgow))))
There are 20 trains a day from Aberdeen to Glasgow
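
To make the mapping from the abstract expression to the sentence concrete, here is a minimal sketch that encodes the expression as a nested Python dictionary and realizes it. This is not an SPL interpreter: the dictionary encoding, the `PERIOD_PHRASES` table, and `realize_exist` are assumptions invented for this example, and they cover only existential statements of this one shape.

```python
# The SPL expression above, re-encoded as nested dicts (an assumed encoding).
spl = {
    "id": "S1", "type": "exist",
    "object": {
        "id": "O1", "type": "train", "cardinality": 20,
        "relations": [
            {"id": "R1", "type": "period", "value": "daily"},
            {"id": "R2", "type": "source", "value": "Aberdeen"},
            {"id": "R3", "type": "destination", "value": "Glasgow"},
        ],
    },
}

# Hypothetical mapping from period values to surface phrases.
PERIOD_PHRASES = {"daily": "a day", "weekly": "a week"}


def realize_exist(expr):
    """Realize an 'exist' expression as a 'There is/are ...' sentence."""
    obj = expr["object"]
    relations = {r["type"]: r["value"] for r in obj["relations"]}
    n = obj["cardinality"]
    noun = obj["type"] if n == 1 else obj["type"] + "s"  # naive pluralisation
    verb = "is" if n == 1 else "are"                      # number agreement
    parts = [f"There {verb} {n} {noun}"]
    if "period" in relations:
        parts.append(PERIOD_PHRASES[relations["period"]])
    if "source" in relations:
        parts.append(f"from {relations['source']}")
    if "destination" in relations:
        parts.append(f"to {relations['destination']}")
    return " ".join(parts)


print(realize_exist(spl))
# There are 20 trains a day from Aberdeen to Glasgow
```

Unlike a template, the abstract form leaves agreement and word order to the realizer, so changing the cardinality to 1 yields "There is 1 train ..." without a second template.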

Content Determination
• Messages (raw content)
• User Model (influences content)
• Is reasoning required?
  • Example: "Find a train from Aberdeen to Leeds" (it requires two trains to get there)
• Deep reasoning systems
  • represent the user's goals as well as any immediate query
  • utilize plan recognition and reasoning

Discourse Planning
• Structure messages into a coherent text
• Example: start with a summary, then give details
• Discourse relations, e.g.:
  • elaboration: More specifically, X
  • exemplification: For example, X
  • contrast / exception: However, X
• Rhetorical Structure Theory (RST)
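
One simple way to surface a discourse relation is to attach its cue phrase to the second message. The sketch below assumes a fixed table of cue phrases taken from the list above; the function name `link` and the example sentences are invented for illustration, and a real discourse planner (for instance one based on RST) would do far more than prefix a phrase.

```python
# Cue phrases for the discourse relations listed on the slide.
CUE_PHRASES = {
    "elaboration": "More specifically,",
    "exemplification": "For example,",
    "contrast": "However,",
}


def link(first, relation, second):
    """Join two sentences, marking the relation with its cue phrase.

    `second` is expected to start in lower case, since it follows the cue.
    """
    return f"{first} {CUE_PHRASES[relation]} {second}"


# Invented example sentences.
print(link("Several trains run to Glasgow.",
           "exemplification",
           "the express leaves at 10 am."))
# Several trains run to Glasgow. For example, the express leaves at 10 am.
```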

Sentence Aggregation
• No aggregation (1 sentence / message)
• Relative clause: ... which leaves at 10 am
• Conjunction: ... and the next train is the express
• Combinations: ... and the next train is the express which leaves at 10 am
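
The aggregation options above can be sketched as small functions over messages. The message format (a dict with `subject` and `rest`) and the function names are assumptions for this example; the relative-clause rule here is deliberately naive and only fires when the first clause ends with the second clause's subject.

```python
def clause(m):
    """Render one message as a bare clause."""
    return f"{m['subject']} {m['rest']}"


def finish(text):
    """Capitalise the first letter and add a full stop."""
    return text[0].upper() + text[1:] + "."


def no_aggregation(messages):
    """One sentence per message."""
    return " ".join(finish(clause(m)) for m in messages)


def conjoin(a, b):
    """Conjunction: join two clauses with 'and'."""
    return finish(f"{clause(a)} and {clause(b)}")


def relativize(a, b):
    """Relative clause: attach b to a when a ends with b's subject."""
    if not a["rest"].endswith(b["subject"]):
        raise ValueError("second message must be about the end of the first")
    return finish(f"{clause(a)} which {b['rest']}")


m1 = {"subject": "the next train", "rest": "is the express"}
m2 = {"subject": "the express", "rest": "leaves at 10 am"}

print(no_aggregation([m1, m2]))
# The next train is the express. The express leaves at 10 am.
print(relativize(m1, m2))
# The next train is the express which leaves at 10 am.
```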

Lexicalization
• Choosing words to realize concepts or relations
• Example:
    (action/change (measure outside_temperature) (delta (quantity/deg_F -10)))
  The temperature dropped 10 degrees

Lexical Selection Rules
    (*A-INGEST (AGENT *O-BOB) (PATIENT *O-MILK)) => "drink"
    (*A-INGEST (AGENT *O-BOB) (PATIENT *O-CHOCOLATE)) => "eat"
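
The two rules above can be read as a lookup keyed on the action concept and constrained by a role filler. A minimal sketch, assuming a rule is a (concept, role constraints, verb) triple; the rule table format and `select_verb` are hypothetical, while the concept and role symbols are those on the slide.

```python
# Each rule: (action concept, constraints on role fillers, chosen verb).
LEXICAL_RULES = [
    ("*A-INGEST", {"PATIENT": "*O-MILK"}, "drink"),
    ("*A-INGEST", {"PATIENT": "*O-CHOCOLATE"}, "eat"),
]


def select_verb(concept, roles):
    """Return the verb of the first rule whose constraints all match."""
    for rule_concept, constraints, verb in LEXICAL_RULES:
        if rule_concept == concept and all(
            roles.get(role) == filler for role, filler in constraints.items()
        ):
            return verb
    raise LookupError(f"no lexical rule for {concept} with {roles}")


print(select_verb("*A-INGEST", {"AGENT": "*O-BOB", "PATIENT": "*O-MILK"}))
# drink
print(select_verb("*A-INGEST", {"AGENT": "*O-BOB", "PATIENT": "*O-CHOCOLATE"}))
# eat
```

The agent does not affect the choice in either rule, so the sketch only constrains the patient; a fuller rule set would likely use a type hierarchy (liquids, solids) rather than listing individual objects.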

Case Creation
• Additional structure is required to realize the meaning of the semantic representation
    (*A-KICK (AGENT *O-JOHN) (PATIENT *O-BALL))
    => "John propelled the ball with his foot"

Case Absorption
• Word chosen to realize a semantic head also implies the meaning conveyed by a semantic role
    (*A-FILE-LEGAL-ACTION (AGENT *O-BOB) (PATIENT *O-SUIT) (RECIPIENT *O-ACME))
    => "Bob sued Acme"

Referring Expression Generation
• Initial introduction: A man in the park looked up
• Pronouns: He saw a bird fly over
• Definite descriptions: The man covered his head with a newspaper
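
The three cases above can be sketched as a choice driven by discourse history. The rule used here is an assumption chosen to reproduce the slide's example, not an established algorithm: use an indefinite description on first mention, a pronoun if the entity was the most recently mentioned one, and a short definite description otherwise. The entity format and class name are hypothetical, and the article is always "a" for simplicity.

```python
class ReferringExpressionGenerator:
    """Choose a referring expression from a simple discourse history."""

    def __init__(self):
        self.mentioned = set()       # ids of entities already introduced
        self.last_mentioned = None   # id of the most recent referent

    def refer(self, entity):
        if entity["id"] not in self.mentioned:
            expression = f"a {entity['description']}"   # initial introduction
        elif self.last_mentioned == entity["id"]:
            expression = entity["pronoun"]              # still in focus
        else:
            expression = f"the {entity['head']}"        # definite description
        self.mentioned.add(entity["id"])
        self.last_mentioned = entity["id"]
        return expression


man = {"id": "m1", "head": "man", "description": "man in the park", "pronoun": "he"}
bird = {"id": "b1", "head": "bird", "description": "bird", "pronoun": "it"}

reg = ReferringExpressionGenerator()
print([reg.refer(man), reg.refer(man), reg.refer(bird), reg.refer(man)])
# ['a man in the park', 'he', 'a bird', 'the man']
```

After the bird is mentioned, the man is no longer the most recent referent, so the generator falls back to "the man", matching the third sentence of the example.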

Fixing Robot Text
• Start [the engine]i and run [the engine]i until [the engine]i reaches normal operating temperature
• Start []i and run [the engine]i until [it]i reaches normal operating temperature
• Second example introduces ellipsis and anaphora
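
The difference between the two versions can be reproduced by filling the same three reference slots with different forms. In this sketch the `{ref}` slot marker and the function `realize_references` are assumptions for illustration; deciding which form each slot should take is the hard part and is simply supplied by hand here.

```python
import re


def realize_references(template, forms):
    """Fill each {ref} slot with the next form, then tidy the whitespace."""
    if template.count("{ref}") != len(forms):
        raise ValueError("need exactly one form per {ref} slot")
    text = template
    for form in forms:
        text = text.replace("{ref}", form, 1)   # fill slots left to right
    return re.sub(r"\s+", " ", text).strip()    # collapse gaps left by ellipsis


TEMPLATE = "Start {ref} and run {ref} until {ref} reaches normal operating temperature"

robot = realize_references(TEMPLATE, ["the engine", "the engine", "the engine"])
fixed = realize_references(TEMPLATE, ["", "the engine", "it"])  # ellipsis, full NP, pronoun

print(robot)
# Start the engine and run the engine until the engine reaches normal operating temperature
print(fixed)
# Start and run the engine until it reaches normal operating temperature
```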

Journalistic Style
“A dissident Spanish priest was charged here today with attempting to murder the Pope. Juan Fernandez Krohn, aged 32, was arrested after a man armed with a bayonet approached the Pope while he was saying prayers at Fatima on Wednesday night. According to the police, Fernandez told the investigating magistrates today, he trained for the past six months for the assault. If found guilty, the Spaniard faces a prison sentence of 15-20 years.”
(Brown and Yule, 1983)

Reading/References
• Daniel Jurafsky, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Pearson Education, Inc, 2000.
• Kamil Wiśniewski, July 12th, 2007, Discourse Analysis. Retrieved from: http://www.tlumaczenia-angielski.info/linguistics/discourse.htm, Retrieved date: Oct 16, 2010.
• M. A. Khan, “Text Based Machine Translation System”, Ph.D Thesis, 1995.
• The Daily News, “Jolie was high on cocaine during TV interview: Former drug dealer”, dated: 22 Oct, 2010. http://dailymailnews.com/1010/22/ShowBiz/index.php?id=3
• M. A. Khan, “Machine Translation Beyond Sentence Boundaries”, In Proceedings of Workshop on Proofing Tools and Language Technologies, July 1-2, Patras University, Greece. www.mabidkhan.com/.../Scientific%20Khyber,%20Vol%201,%202004.pdf