Learning to Generate Utterances for Spoken Dialogue Systems

Learning to Generate Utterances for Spoken Dialogue Systems by Mining User Reviews
Prof. Marilyn Walker, Cognitive Systems Group, University of Sheffield
(with Ryu Higashinaka, Rashmi Prasad, Giuseppe Di Fabbrizio)

Background
• Statistical methods are predominant in Spoken Dialogue Systems (SDS) and are quite mature for speech recognition, language understanding, and dialogue act detection (Young 2004)
• Statistical methods are also commonly used in NLP tasks such as information retrieval, automatic summarization, information extraction, and machine translation
• These methods provide scalability and portability across domains

Spoken Dialogue System Components

Spoken Language Generation
• DM and SLG are typically handcrafted
• Template-based generation (the usual approach): used in most dialogue systems; responses are generated by simply matching a template to the current dialogue context
  o Pros: efficient, highly customized
  o Cons: not portable across domains, hard to encode linguistic constraints (e.g., subject/verb agreement), does not scale beyond a few hundred templates
• Natural language generation (more recently): clear separation between 1) text (or content) planning, 2) sentence planning, and 3) surface realization; uses general rules for each generation module
  o Pros: some aspects are portable across domains and dialogue contexts
  o Cons: specific rules are usually needed to tune the quality of the general rules; can be too slow for real-time systems

NLG System Components
• The DM passes communicative goals to the NLG module, whose output goes to TTS
• What to say: Text Planner
• How to say it: Sentence Planner, Surface Realizer, Prosody Assigner

Statistical Methods in SLG/NLG
• Statistical surface realizers: overgenerate and rank using parameters trained from corpora (HALogen, Langkilde 2002; FERGUS, Bangalore and Rambow 2004; Chen et al. 2004)
• Trainable sentence planning: learn which combination operations for aggregation and content ordering produce the highest-quality output (SPoT, Rambow et al. 2002, Walker et al. 2004; SPaRKy, Stent et al. 2004)
• Prosody assignment: learn from labelled data to assign appropriate prosody (Hirschberg 1990, Pan et al. 2003)

Problem
• Even the 'trainable' sentence planner requires a domain-specific handcrafted generation dictionary that specifies the mapping between text-plan propositions and their syntactic realizations (e.g., X has good food, X has good service)
• Mappings are created by hand: costly, and needed anew for each domain
• Variation is limited by the original mappings and combination operations: utterances can be unnatural

Semantic Representations Example
• Text plan: Assert-food_quality(Babbo, superb) and Assert-décor(Babbo, superb), combined by a Joint relation (RST)
• SLG dictionary entries:
  o Assert-food_quality(X, superb): [have [I proper_noun X] [II common_noun food [ATTR adjective superb]]]
  o Assert-décor(X, superb): [have [I proper_noun X] [II common_noun decor [ATTR adjective superb]]]
• Combined structure: [have [I proper_noun X] [II common_noun food [COORD and [II common_noun décor]] [ATTR adjective superb]]]
• Utterance: Babbo has superb food and decor.
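
To make the role of the generation dictionary concrete, here is a minimal illustrative sketch in Python of a handcrafted lookup from text-plan propositions to syntactic frames. The proposition names and frame strings follow the example above; the dictionary variable and lookup function are hypothetical, not the SPoT/SPaRKy implementation.

    # Illustrative sketch only: a handcrafted generation dictionary maps
    # each text-plan proposition to a syntactic frame for the realizer.
    GENERATION_DICTIONARY = {
        "Assert-food_quality(X, superb)":
            "[have [I proper_noun X] [II common_noun food [ATTR adjective superb]]]",
        "Assert-decor(X, superb)":
            "[have [I proper_noun X] [II common_noun decor [ATTR adjective superb]]]",
    }

    def lookup_frame(proposition: str) -> str:
        """Return the handcrafted syntactic frame for one proposition."""
        return GENERATION_DICTIONARY[proposition]

    print(lookup_frame("Assert-food_quality(X, superb)"))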

Solution?
• In many domains, there are web pages that describe and evaluate domain entities
• These web pages may include:
  o Textual reviews of domain entities
  o Scalar ratings of specific attributes of domain entities, on a per-review or per-entity basis
  o Tabular data with values for particular attributes
  o Domain- or product-specific ontologies
• Is it possible to mine these corpora to bootstrap a spoken language generator?

Restaurant Domain

Sample Restaurant Review

Hotel Domain
• Example review: "a little gem!!"
  o Submitted by: kathy b. of cincinnati, oh usa; May 03, 2006. Date of visit: 12/05. Traveler's Rating: 5
  o "the history, ambience and old world charm of the algonquin are a unique combination that appeals very much. staff is very friendly and helpful; rooms small but restored to period charm. great lobby. say hello to matilda, the resident cat, another great algonquin tradition."
  o Best Feature: staff & ambience. Needs Improvement: not a thing
• Amenities rated on a 1-5 scale (1=Lowest, 5=Highest, N/A=Not Rated):
  o Rooms = 4, Dining = 5, Public Facilities = 5, Sports/Activities = N/A, Entertainment = 4, Service = 5

What can be used
• Textual reviews of domain entities
• Scalar ratings of specific attributes of domain entities, on a per-review or per-entity basis
• Tabular data with values for categorial attributes
• Specified attributes => partial ontology

Bootstrapping SLG
• Automatically acquire dictionary entries from user reviews on the web
• A dictionary entry is a triple (U, R, S): U (utterance), R (semantic representation), and S (syntactic structure)
• Use user ratings and categorial attribute values to pinpoint the semantic representation
• Use the Minipar parser and a DSyntS converter to produce the dependency syntactic representation (Lavoie and Rambow 1998; Mel'čuk 1988)
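
As a rough sketch of the data involved, a learned dictionary entry could be represented as follows; the class and field names are illustrative assumptions, not the authors' implementation.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DictionaryEntry:
        """One learned mapping: U (utterance), R (semantic representation),
        S (syntactic structure)."""
        utterance: str        # U: review sentence, with named entities slotted
        relations: frozenset  # R: relations from the domain ontology (with ratings)
        dsynts: str           # S: DSyntS produced from the Minipar parse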

Related Work
• Create a dictionary from a parallel corpus (Barzilay et al., 2002): requires a corpus of parallel semantic representations and syntactic realizations
• Find opinion expressions in reviews:
  o Adjectives for products (Hu and Liu, 2005)
  o Product features and adjectives with polarity (Popescu and Etzioni, 2005)
  o These do not focus on creating a generation dictionary

Method
• Create a population of utterances U from user reviews
• For each U:
  o Derive a semantic representation R
  o Derive a syntactic structure S
• Filter inappropriate mappings
• Add the remaining mappings to the dictionary
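
A minimal sketch of this pipeline, assuming the relation derivation, DSyntS derivation, and filters are supplied as functions (they are sketched on later slides); the function and parameter names are not from the published system.

    from typing import Callable, Iterable, List, Tuple

    def build_dictionary(
        review_sentences: Iterable[str],
        derive_relations: Callable[[str], frozenset],   # U -> R
        derive_dsynts: Callable[[str], str],            # U -> S
        filters: List[Callable[[str, frozenset, str], bool]],
    ) -> List[Tuple[str, frozenset, str]]:
        """Collect (U, R, S) mappings, keeping only those every filter accepts."""
        dictionary = []
        for u in review_sentences:
            r = derive_relations(u)
            s = derive_dsynts(u)
            if all(keep(u, r, s) for keep in filters):
                dictionary.append((u, r, s))
        return dictionary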

Experiment
• Obtaining a dictionary for the restaurant domain
• Data collected from we8there.com:
  o 3,004 user reviews of 1,810 restaurants
  o 18,466 review sentences
  o 451 mappings after filtering
• Objective evaluation
• Subjective evaluation

Collect user reviews
• Select review websites with individual ratings for reviewed entities
• Collect review comments and ratings
• Collect tabular data
• Ratings: Food, Service, Value, Atmosphere, Overall
• Tabular data: Name, Food Type, Location

Derive Domain Ontology
• Assume a meronymy relation between the domain entity and:
  o Any attribute that the user rates
  o Any attribute for which categorical values are specified on the web page
• Relations:
  o RESTAURANT has foodquality
  o RESTAURANT has servicequality
  o RESTAURANT has valuequality
  o RESTAURANT has atmospherequality
  o RESTAURANT has overallquality
  o RESTAURANT has FOODTYPE
  o RESTAURANT has LOCATION
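
Since the derived ontology is just a flat set of meronymy relations, it can be sketched as plain data; the variable names below are illustrative only.

    # Attributes the user rates on a 1-5 scale, plus attributes with
    # categorial values taken from the tabular data.
    RATED_ATTRIBUTES = ["foodquality", "servicequality", "valuequality",
                        "atmospherequality", "overallquality"]
    CATEGORIAL_ATTRIBUTES = ["FOODTYPE", "LOCATION"]

    # One "RESTAURANT has <attribute>" relation per attribute.
    ONTOLOGY = {f"RESTAURANT has {a}"
                for a in RATED_ATTRIBUTES + CATEGORIAL_ATTRIBUTES}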

Hypothesis/Assumptions
• Closed domain: at least some of the utterances in reviews realize the relations in the domain ontology (the task is to identify these utterances)
• Hypothesis: if an utterance U contains named entities corresponding to the domain entity and the distinguished attributes, then R for that utterance includes the relation concerning that attribute in the domain ontology
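
A sketch of how this hypothesis turns attribute mentions plus the review's own ratings into a semantic representation R, for the rated attributes; categorial attributes such as FOODTYPE are handled analogously via named entities. The function and argument names are assumptions for illustration.

    def derive_relations(mentioned_attributes, review_ratings):
        """Map attributes mentioned in U to ontology relations, using the
        scalar rating the reviewer gave for each attribute."""
        relations = set()
        for attribute in mentioned_attributes:       # e.g. {"food", "service"}
            rating = review_ratings.get(attribute)   # e.g. {"food": 5, ...}
            if rating is not None:
                relations.add(f"RESTAURANT has {attribute}quality={rating}")
        return relations

    # Example: a sentence mentioning food, in a review rated Food=5
    print(derive_relations({"food"}, {"food": 5, "service": 5}))
    # -> {'RESTAURANT has foodquality=5'}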

Specify Lexicalizations of Attributes
• food: food, meal
• service: service, staff, wait staff, server, waitress
• atmosphere: atmosphere, décor, ambience, decoration
• value: value, price, overprice, pricey, expensive, inexpensive, cheap, affordable, afford
• overall: recommend, place, experience, establishment
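
The lexicalization table can be read as a simple lookup for spotting attribute mentions. The sketch below assumes the attribute names and lexicalizations exactly as listed above; the keyword matching is an illustration only and stands in for the GATE-based labeling described on the next slide.

    ATTRIBUTE_LEXICALIZATIONS = {
        "food": ["food", "meal"],
        "service": ["service", "staff", "wait staff", "server", "waitress"],
        "atmosphere": ["atmosphere", "décor", "ambience", "decoration"],
        "value": ["value", "price", "overprice", "pricey", "expensive",
                  "inexpensive", "cheap", "affordable", "afford"],
        "overall": ["recommend", "place", "experience", "establishment"],
    }

    def find_attribute_mentions(sentence: str) -> set:
        """Return the attributes whose lexicalizations occur in the sentence."""
        text = sentence.lower()
        return {attr for attr, lexes in ATTRIBUTE_LEXICALIZATIONS.items()
                if any(lex in text for lex in lexes)}

    print(find_attribute_mentions("The best Spanish food in New York."))  # {'food'}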

Create and Label Named Entities
• Scrape structured data from the web pages to obtain named entities for categorial attributes:
  o Foodtype => Spanish, Italian, French, …
  o Location => New York, San Francisco, London
• Run GATE on U to label:
  o Named entities
  o Lexicalizations of attributes
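
The system itself runs GATE with gazetteers built from the site's structured data; the snippet below is only a toy gazetteer matcher to illustrate the idea, with made-up gazetteer contents, producing tags in the spirit of the NE-tagged sentence shown on the next slide.

    import re

    # Toy gazetteers; the real ones come from the review site's tabular data.
    GAZETTEERS = {
        "foodtype": ["Spanish", "Italian", "French"],
        "location": ["New York", "San Francisco", "London"],
    }

    def label_named_entities(sentence: str) -> str:
        """Replace gazetteer matches with {NE=...} tags."""
        for ne_type, names in GAZETTEERS.items():
            for name in names:
                sentence = re.sub(re.escape(name),
                                  f"{{NE={ne_type}, string={name}}}",
                                  sentence)
        return sentence

    print(label_named_entities("The best Spanish food in New York."))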

Derive Syntactic Representation
• Run the Minipar parser
• Convert the Minipar output to DSyntS for RealPro (Lavoie and Rambow 1998)

• Ratings: Food=5, Service=5, Value=5, Atmosphere=5, Overall=5
• Review comment: "The best Spanish food in New York. I am from Spain and I had my 28th birthday…"
• Review sentence (U): "The best Spanish food in New York."
• NE-tagged review sentence: "The best {NE=foodtype, string=Spanish} {NE=food, string=food, rating=5} in {NE=location, string=New York}."
• Semantic representation (R): RESTAURANT has FOODTYPE; RESTAURANT has foodquality=5; RESTAURANT has LOCATION
• Syntactic structure (S): DSyntS produced by the DSyntS converter
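
Written out as one (U, R, S) mapping, the example on this slide looks roughly like this; the DSyntS is abbreviated since only its presence matters here, and the dictionary keys are illustrative.

    example_mapping = {
        "U": "The best Spanish food in New York.",
        "U_tagged": ("The best {NE=foodtype, string=Spanish} "
                     "{NE=food, string=food, rating=5} in "
                     "{NE=location, string=New York}."),
        "R": {"RESTAURANT has FOODTYPE",
              "RESTAURANT has foodquality=5",
              "RESTAURANT has LOCATION"},
        "S": "<dsynts> ... </dsynts>",  # Minipar parse converted by the DSyntS converter
    }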

Filter dictionary entries
• No Relations Filter, Other Relations Filter: check whether a mapping has exactly the relations expressed in the ontology
• Contextual Filter: check whether U can be uttered independently of context (looks for context words)
• Parsing Filter: check whether realizing S regenerates U
• Unknown Words Filter: check for typos, unknown common nouns, etc.
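
Two of the filters are easy to sketch. The context-word list below is made up for illustration, and realize() is a stand-in argument for a surface-realizer call (such as a RealPro wrapper), not a real API.

    # Illustrative context-word list only.
    CONTEXT_WORDS = {"this", "that", "these", "those", "here", "there"}

    def contextual_filter(utterance: str) -> bool:
        """Keep U only if it contains no context-dependent words."""
        words = set(utterance.lower().replace(".", "").split())
        return not (words & CONTEXT_WORDS)

    def parsing_filter(utterance: str, dsynts, realize) -> bool:
        """Keep the mapping only if realizing S reproduces U."""
        return realize(dsynts).strip().lower() == utterance.strip().lower()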

Filtering Statistics (filtered / retained)
• No Relations Filter: 7,947 / 10,519
• Other Relations Filter: 5,351 / 5,168
• Contextual Filter: 2,973 / 2,195
• Unknown Words Filter: 1,467 / 728
• Parsing Filter: 216 / 512
• Duplicates Filter: 61 / 451

Filtering Examples
• U: "We had a wonderful time." (no relations identified) — filtered by the No Relations Filter
• U: "The river was beautiful and the food okay." R: RESTAURANT has foodquality=3 — filtered by the Unknown Words Filter ("river" is a common noun)
• U: "What an awful place." R: RESTAURANT has overallquality=1 — filtered by the Parsing Filter (its DSyntS generates "What awful place.")

Objective Evaluation
• Domain coverage: how many relations are covered by the dictionary?
• Linguistic variation: what do we gain over the handcrafted dictionary?
• Generativity: can the dictionary entries be used in a conventional sentence planner?

Domain Coverage
Distribution of single scalar-valued relation mappings, by rating (1-5):

                 1    2    3    4    5   Total
    food         5    8    6   18   57    94
    service     15    3    6   17   56    97
    atmosphere   0    3    3    8   31    45
    value        0    0    1    8   12    21
    overall      3    2    5   15   45    70
    Total       23   15   21   64  201   327

Multi-Relation Entries (122 in all)
• Food-service: 39
• Food-value: 21
• Atmosphere-food: 14
• Atmosphere-service: 10
• Atmosphere-food-service: 7
• Food-foodtype: 4
• Atmosphere-food-value: 4
• etc.

Linguistic Variation
• 137 syntactic patterns
• 275 distinct lexemes; 2-15 lexemes per DSyntS (mean 4.63)
• 55% of syntactic patterns are of the form "ATTR is AP":
  o The atmosphere is wonderful.
  o The food is excellent and the atmosphere is great.
• 45% are not:
  o An absolutely outstanding value with fantastic FOODTYPE food.

Examples: Food Adjectival Phrases (attribute specificity)
RATING: ADJECTIVAL PHRASES
• 1: awful, bad, cold, burnt, very ordinary
• 2: acceptable, bad flavored, not enough, very bland, very good
• 3: adequate, bland, mediocre, flavorful but cold, pretty good, rather bland, very good
• 4: absolutely wonderful, awesome, decent, excellent, good and generous, very fresh and tasty
• 5: absolutely delicious, ample, well seasoned and hot, delicious but simple, delectable and plentiful, fancy but tasty, so very tasty

Example: Service Adjectival Phrases
RATING: ADJECTIVAL PHRASES
• 1: awful, bad, forgetful and slow, marginal, young, silly and inattentive
• 2: overly slow, very slow and inattentive
• 3: bland mediocre, friendly and knowledgeable, pleasant, prompt
• 4: all very warm and welcoming, attentive, extremely friendly and good, great and courteous, swift and friendly, very friendly and accommodating
• 5: polite, great, all courteous, excellent and friendly, fabulous, impeccable, intrusive, legendary, very friendly and totally personal, very helpful, very timely

Example: Atmosphere Adjectival Phrases
RATING: ADJECTIVAL PHRASES
• 2: eclectic, unique and pleasant
• 3: busy, pleasant but extremely hot
• 4: fantastic, great, quite nice and simple, typical, very casual, very trendy
• 5: beautiful, comfortable, lovely, mellow, nice and comfortable, very cozy, very intimate, very relaxing, warm and contemporary

Generativity
• Incorporate the learned mappings into the SPaRKy generator (Stent et al., 2004)
• Some combination operations fail to apply because of the assumption that the restaurant name is the subject of the utterance, but interior nodes covering multiple propositions can be substituted:
  o "Because the food is excellent, the wait staff is professional and the decor is beautiful and very comfortable, Babbo has the best overall quality among the selected restaurants."
  o "Babbo has the best overall quality among the selected restaurants because the atmosphere is exceptionally nice, food is excellent and the service is superb."

Subjective Evaluation
• 10 native English speakers
• Compare baseline and learned mappings:
  o 27 hand-crafted mappings from SPaRKy
  o 451 learned mappings
• Evaluation criteria:
  o Consistency between semantic representations and realizations
  o Naturalness/colloquialness of realizations
  o 1-5 Likert scale

Results
• Consistency: baseline 4.71, learned 4.46
• Naturalness: baseline 4.23, learned 4.61
• Consistency of the learned mappings is significantly lower than the baseline, but still high
• Naturalness of the learned mappings is significantly higher

Conclusion
• A new method for automatically acquiring a generation dictionary for spoken dialogue systems
• Reduces the cost of hand-crafting a spoken language generation module
• Achieves more natural system utterances by using attested language examples
• Experimental results suggest that this approach is promising

Future Work
• Issues of 'meaning' vs. polarity: phrases with the same rating are not always substitutable
• Handcrafted lexicalizations: need to automatically generate lexicalizations for domain concepts (to increase recall?)
• A method for extending the domain ontology (food is plentiful, delicious, beautifully prepared)
• More complex domains