conceptual coherence in the generation of referring expressions

Albert Gatt & Kees van Deemter
University of Aberdeen
{agatt, kvdeemte}@csd.abdn.ac.uk

• Gatt and Van Deemter 2007: “Lexical choice and conceptual perspective in the generation of plural referring expressions”. Journal of Logic, Language and Information (JoLLI) 16(4), pp. 423-444.

some received wisdom…
“Choice is ultimately dependent on the perspective you decide to take on the referent (...). Will it be more effective for me to refer to my sister as my sister or as that lady or as the physicist?” (Levelt ’99, p. 226)

the rest of this talk…
1. Generation of Referring Expressions
2. Perspective and Conceptual Coherence
   • reference to sets
   • experimental work
3. An algorithm
   • evaluation
4. Extensions:
   • local (Conceptual) Coherence in discourse

Generation of Referring Expressions (GRE)
• Part of micro-planning (Reiter/Dale ’00)
• At this stage, the content of a message is being determined, including descriptions of domain objects (Noun Phrases)
• The task of GRE:
  – given a set of intended referents, look up properties of these referents that will distinguish them from their distractors in a Knowledge Base

Content determination strategies

entity  base type  occupation    specialisation  girth
e1      woman      professor     physicist       plump
e2      woman      lecturer      geologist       thin
e3      man        lecturer      biologist       thin
e4      man        postgraduate                  thin

• Most algorithms inspired by the Gricean maxims (Grice ’75)
  – especially Brevity (Dale ’89, Gardent ’02)
• But compare:
  ?? λx: professor(x) ∨ plump(x)
  ?? λx: professor(x) ∨ [plump(x) & man(x)]
  λx: biologist(x) ∨ physicist(x)
• Not all of these have an equally good ring to them.
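
A minimal Python sketch of the table above as a knowledge base, showing what it means for a disjunctive description to pick out a target set. The encoding (a dict of property sets, tuples read as conjunctions) and all function names are illustrative assumptions, not part of the talk.

```python
# Toy knowledge base transcribed from the slide's table.
# e4 has no specialisation listed, so none is recorded here.
KB = {
    "e1": {"woman", "professor", "physicist", "plump"},
    "e2": {"woman", "lecturer", "geologist", "thin"},
    "e3": {"man", "lecturer", "biologist", "thin"},
    "e4": {"man", "postgraduate", "thin"},
}


def extension(conjunction):
    """Entities that have every property in the conjunction."""
    return {e for e, props in KB.items() if set(conjunction) <= props}


def denotation(description):
    """A description is a list of disjuncts; each disjunct is a tuple of
    properties read as a conjunction. The denotation is their union."""
    result = set()
    for disjunct in description:
        result |= extension(disjunct)
    return result


def distinguishes(description, referents):
    """True if the description denotes exactly the intended referents."""
    return denotation(description) == set(referents)


# Two hypothetical descriptions of the same target set {e1, e3}.
same_kind = [("biologist",), ("physicist",)]
mixed_kind = [("professor",), ("lecturer", "man")]
print(distinguishes(same_kind, {"e1", "e3"}))   # True
print(distinguishes(mixed_kind, {"e1", "e3"}))  # True
```

Both descriptions identify {e1, e3}, so distinguishing power alone does not separate a description that sounds natural from one that does not.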

the Conceptual Coherence constraint
• Sets (and disjunction): λx: p(x) ∨ q(x), ‘the p and the q’
  – reference to a plurality suggests to the listener that there is a relationship holding between elements of the plurality
  – p and q should be related or “similar”
  – semantic relatedness allows the listener to conceptualise the plurality more easily (Sanford and Moxey, ’95)
• Gatt and van Deemter (’02):
  – People’s preference for descriptions of this form was highly correlated with the semantic similarity of the disjuncts
  – Best results achieved with a distributional definition of similarity (Lin ’98)
  – sim(w, w’) is a function of how often w and w’ occur in the same grammatical relations in a corpus

Lin’s definition of distributional similarity
• Let w1, w2 be two words of the same grammatical category, e.g. dog, cat
• GR contains information about a syntactic relation w occurs in:
  – GR = <w, R, x, p>
  – w the target word, R the relation, x the co-argument of w
  – p is the probability of w and x occurring in this construction (as mutual information).
  – Example: <dog, modified-by, stray, 0.002>
• sim(w1, w2) is calculated using the GR triples that w1 and w2 share.
• We use SketchEngine, a large-scale implementation of this theory, based on the BNC (Kilgarriff, ’03)
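
A small Python sketch of the similarity measure as it is usually stated for Lin (1998): the information carried by the (relation, co-argument) pairs two words share, divided by the total information of all pairs of both words. The feature dictionaries and their values below are made-up toy numbers, not corpus data, and the function name is an assumption.

```python
def lin_similarity(features_1, features_2):
    """features_* map (relation, co_argument) -> information value for a word.

    Shared features contribute the information from both words to the
    numerator; the denominator sums the information of all features.
    """
    shared = features_1.keys() & features_2.keys()
    numerator = sum(features_1[f] + features_2[f] for f in shared)
    denominator = sum(features_1.values()) + sum(features_2.values())
    return numerator / denominator if denominator else 0.0


# Hypothetical information values (illustration only).
dog = {("modified-by", "stray"): 2.0, ("subject-of", "bark"): 3.0}
cat = {("modified-by", "stray"): 1.0, ("subject-of", "purr"): 3.0}

print(lin_similarity(dog, cat))  # (2.0 + 1.0) / (5.0 + 4.0) = 0.333...
print(lin_similarity(dog, dog))  # 1.0
```

Under this formulation a word is maximally similar to itself, and two words with no shared grammatical relations score 0.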

experiment 1: multimodal sentence completion
• General idea:
  – To refer to a set, people will prefer to use a plural that respects the conceptual coherence constraint
  – If this is impossible, then they will break down the set into manageable parts.
• Experimental domains:
  – 3 targets (a, b, c) + 1 distractor (d)
  – sim(a, b) could be high or low
  – sim(a, c) ≈ sim(b, c) = low
• Expectation:
  – if 2 of the targets have semantically high-sim types, they will be referred to in a plural description

experiment 1: example domain
Experimental domain (four pictures): a £5, b £5, c £5, d £20

Complete the following by clicking on the pictures:
The _______ and the _______ cost £5.
The _______ also costs £5.

1. Participants completed the sentences by clicking on the pictures.
2. Manipulation of similarity of two of the objects (a, b).
3. Hypothesis: If {a, b} are similar, they are more likely to be referred to in the plural.

experiment 1: results
[Chart: proportion of plural references to designated targets {a, b} when {a, b} are semantically similar vs. semantically dissimilar]

experiment 2: sentence continuation
• Does similarity play a role in content determination?

A university building was robbed last night. The police have detained three suspects for questioning, all of whom work or study at the university.
1. One of them is a postgraduate. He is a physicist.
2. Another is a Greek, an undergraduate.
3. Also among the suspects is a cleaner. He is an Italian.
Both ___________ were held in custody, but the physicist was released last night.

• Distinguishing properties: nouns (12) or adjectives (12).
• Expectation:
  – Participants will select similar properties in the plural description

experiment 2: results
Proportion of references using pairwise similar properties:
• Nouns: Friedman 45.89, p < .001; trend as expected
• Adjectives: Friedman 36.3, p < .001; trend in the opposite direction

summary of findings so far
• In referential situations, people prefer to produce plural descriptions if the entities can be conceptualised under the same perspective.
• This holds for types, but not modifiers
  – Types correspond to “concepts”, and are the way we carve up the world and categorise objects
  – Modifiers correspond to properties of those objects.
• Results have been corroborated in other experiments

• Aloni (2002): answers to questions “wh x?” must conceptualise the different x using one and the same perspective (relevant given hearer’s information state and the context)
• Our experiments confirm that this idea is on the right track …

The challenge for an algorithm:
• Complete coherence is often not possible
• “the Italian, the Greek and the Spaniard”
  – But what if there are 5 Spaniards?
• “the Italian, the Greek and ?”
  – What if you don’t know the person’s nationality?
• “the table, the chair and the plant”
  – What if you need to refer to an object that’s of a different kind from the other two?

a GRE algorithm
• The algorithm should try to find the most coherent description possible. Assumption: this should be done even at the cost of brevity!
• Main knowledge source:
  – The relation sim (Kilgarriff ’03)
• Input:
  – Knowledge Base
  – Target referents (R)

step 1
1. Lexicalise properties in the KB
2. Identify types (nominal properties) and modifiers
• The set of types and the similarity relation define a semantic space S = <T, sim>

Definition 1: Perspective
A perspective P is a convex subset of S, i.e.:
∀ t, t’, t’’ ∈ T: [t, t’ ∈ P & sim(t, t’’) ≥ sim(t, t’)] → t’’ ∈ P
• Computed using a clustering algorithm (Gatt ’06), which recursively groups together semantic nearest neighbours.
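
The convexity condition in Definition 1 can be checked directly. The sketch below is one possible reading, assuming t, t’ and t’’ are pairwise comparisons between distinct types; the similarity values are invented for illustration and do not come from SketchEngine.

```python
from itertools import combinations

# Hypothetical similarity scores between type nouns (illustration only).
_SCORES = {
    ("physicist", "biologist"): 0.40,
    ("physicist", "geologist"): 0.35,
    ("biologist", "geologist"): 0.45,
    ("professor", "lecturer"): 0.50,
}
TYPES = ["physicist", "biologist", "geologist", "professor", "lecturer"]
SIM = {frozenset(pair): score for pair, score in _SCORES.items()}
# Every pair not listed above gets a low default similarity.
for pair in combinations(TYPES, 2):
    SIM.setdefault(frozenset(pair), 0.10)


def sim(t1, t2):
    return SIM[frozenset((t1, t2))]


def is_convex(perspective, types):
    """If t and t1 are in the perspective, every other type t2 that is at
    least as similar to t as t1 is must also be in the perspective."""
    for t in perspective:
        for t1 in perspective:
            if t1 == t:
                continue
            for t2 in types:
                if t2 != t and sim(t, t2) >= sim(t, t1) and t2 not in perspective:
                    return False
    return True


print(is_convex({"physicist", "biologist", "geologist"}, TYPES))  # True
print(is_convex({"physicist", "geologist"}, TYPES))               # False
```

The second set fails because, with these toy scores, "biologist" is closer to "physicist" than "geologist" is, yet it is left out of the set.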

perspective graph
• Aim: find a description for R that minimises the distance between perspectives from which properties are selected.
• Weight of a description, w(D): the sum of distances between perspectives represented in D.
  – w(‘the professor and the plump man’) = 1
  – w(‘the biologist and the physicist’) = 0
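
A rough Python sketch of the weight w(D), assuming it is the sum of pairwise distances between the distinct perspectives a description draws on. The perspective names, the assignment of nouns to perspectives and the distances are all hypothetical; they were chosen only so that the toy example mirrors the two values on the slide, and the modifier "plump" is left out for simplicity.

```python
from itertools import combinations

# Hypothetical mapping from type nouns to perspectives (assumption).
PERSPECTIVE_OF = {
    "professor": "rank",
    "man": "gender",
    "biologist": "discipline",
    "physicist": "discipline",
}

# Hypothetical distances between distinct perspectives (assumption).
DISTANCE = {
    frozenset(("rank", "gender")): 1,
    frozenset(("rank", "discipline")): 1,
    frozenset(("gender", "discipline")): 2,
}


def weight(types_in_description):
    """Sum of distances over all pairs of distinct perspectives used."""
    used = sorted({PERSPECTIVE_OF[t] for t in types_in_description})
    return sum(DISTANCE[frozenset(pair)] for pair in combinations(used, 2))


print(weight(["professor", "man"]))        # 1
print(weight(["biologist", "physicist"]))  # 0
```

A description that stays within a single perspective has no pairs to sum over, so its weight is 0.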

descriptive coherence
Definition 2: Maximal coherence
D is maximally coherent if there is no D’ coextensive with D such that w(D’) < w(D)
  – implies finding a shortest connection network in the perspective graph (intractable!)
Definition 3: Local coherence
D is locally coherent if there is no D’ coextensive with D s.t.:
1. D’ is obtained by replacing a perspective in D
2. w(D’) < w(D)

search procedure
• N ← ∅  //the perspectives represented in D
• root ← perspective with most referents in its extension
• starting from root do:
  – Check types and modifiers.
  – If a property excludes distractors:
    • add it to D
    • add the perspective to N
  – If R is not distinguished, go to the next perspective, which is [formula not recoverable from the source] (V is the set of perspectives).
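
A much-simplified Python sketch of a search in the spirit of the procedure above. It is not the authors' algorithm: it only considers single type properties (no type-plus-modifier conjunctions), and because the slide's rule for choosing the next perspective is not recoverable, it assumes the next perspective is the unvisited one closest to those already used. All names, perspective groupings and distances are hypothetical.

```python
EXTENSIONS = {
    "physicist": {"e1"}, "geologist": {"e2"}, "biologist": {"e3"},
    "woman": {"e1", "e2"}, "man": {"e3", "e4"},
    "professor": {"e1"}, "lecturer": {"e2", "e3"}, "postgraduate": {"e4"},
}
PERSPECTIVES = {  # hypothetical grouping of types into perspectives
    "discipline": ["physicist", "geologist", "biologist"],
    "gender": ["woman", "man"],
    "rank": ["professor", "lecturer", "postgraduate"],
}
DIST = {  # hypothetical distances between perspectives
    frozenset(("discipline", "rank")): 1,
    frozenset(("gender", "rank")): 1,
    frozenset(("discipline", "gender")): 2,
}


def distance(p, q):
    return 0 if p == q else DIST[frozenset((p, q))]


def coherent_description(referents, extensions, perspectives, distance):
    referents = set(referents)
    description, used, covered = [], [], set()  # D, N, referents picked out
    unvisited = set(perspectives)

    def referents_reached(p):
        reached = set()
        for prop in perspectives[p]:
            reached |= extensions[prop]
        return len(reached & referents)

    # Root: the perspective with most referents in its extension.
    current = max(sorted(unvisited), key=referents_reached)
    while True:
        unvisited.discard(current)
        for prop in perspectives[current]:
            ext = extensions[prop]
            # Keep a property if it excludes all distractors and adds coverage.
            if ext <= referents and not ext <= covered:
                description.append(prop)
                covered |= ext
                if current not in used:
                    used.append(current)
        if covered == referents or not unvisited:
            break
        # Assumed rule: move to the perspective nearest to those used so far.
        anchors = used or [current]
        current = min(sorted(unvisited),
                      key=lambda p: sum(distance(p, q) for q in anchors))
    return description if covered == referents else None


print(coherent_description({"e1", "e3"}, EXTENSIONS, PERSPECTIVES, distance))
# ['physicist', 'biologist']
print(coherent_description({"e1", "e4"}, EXTENSIONS, PERSPECTIVES, distance))
# ['professor', 'postgraduate']
```

In both toy runs the search stays within one perspective, which is the behaviour the coherence constraint is meant to favour.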

evaluation
• Do people prefer coherence over brevity?
  – (Two Gricean maxims: “Be brief” vs. “Be orderly”)
• Method: subjects (N = 39) shown 6 discourses.
  – Each discourse introduces 3 entities
  – Followed by 2 possible continuations
  – Subjects had to indicate their preferred continuation
• Each of the 6 discourses represented a condition:
  – Brevity: descriptions equally (in-)coherent, but one is brief
  – Coherence: descriptions equally (non-)brief; only one is coherent
  – Trade-off: coherent description is non-brief

Example: the domain
Three old manuscripts were auctioned at Sotheby’s:
e1: One of them is a book, a biography of a composer
e2: The second, a sailor’s journal, was published in the form of a pamphlet. It is a record of a voyage.
e3: The third, another pamphlet, is an essay by Hume

• Intuitively, this is about texts
  – of different genres (e.g., essay)
  – published in different forms (e.g., pamphlet)
• Of course our corpus-based model doesn’t use these concepts …

Example: continuations:
(+c, -b) The biography, the journal and the essay were sold to a collector
(+c, +b) The book and the pamphlets were sold to a collector
(-c, +b) The biography and the pamphlets were sold to a collector
(-c, -b) The book, the record and the essay were sold to a collector

results: no preference for brevity
both descriptions coherent: χ² = .023, p = .8
both descriptions non-coherent: χ² = .64, p = .4

results: preference for coherence
both descriptions minimal: χ² = 16.03, p < .001
both descriptions non-minimal: χ² = 13.56, p < .001

results: trade-off
• Finally, (+c, -b) preferred over (-c, +b): χ² = 39.0, p < .001
In other words
• Coherence was more important than brevity
• In fact, brevity made no difference at all!
  – we did not confirm that +b is preferred over -b
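
The slides do not say how the χ² values were computed. As an illustration only, the sketch below assumes a one-degree-of-freedom goodness-of-fit test of the two continuation choices against an even split; the counts in the example are hypothetical and are not the experiment's data.

```python
def chi_square_even_split(count_a, count_b):
    """Pearson chi-square statistic for two observed counts, tested
    against the expectation that both options are chosen equally often."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one observation is required")
    expected = total / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


# Hypothetical split of 39 subjects between two continuations.
print(round(chi_square_even_split(30, 9), 2))  # 11.31
print(chi_square_even_split(20, 20))           # 0.0
```

Under this assumed design, a statistic near 0 means the two continuations were chosen about equally often, and larger values indicate a stronger preference for one of them.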

Conclusion
• When it’s impossible to use the same perspective, use perspectives that are similar
• A version of Grice’s maxim “be orderly”?

Methodology
• Many experiments were done
  – to find a suitable notion of similarity/coherence
  – to discover how coherence and brevity relate
• Different algorithmic interpretations would be possible
• Algorithms are almost always under-determined by the empirical evidence

A limitation
• Ambiguity/polysemy is not taken into account
• For example, we might generate
  – “the river and the/its bank”
• These issues investigated in Imtiaz Khan’s PhD project
• One remark: “river” might disambiguate “bank”

An open question
• Why doesn’t coherence play the same role for modifiers as for types?