STS Temporal and Spatial Constraints on Text Similarity

STS: Temporal and Spatial Constraints on Text Similarity March 13, 2012 James Pustejovsky Brandeis University

Measuring Similarity • Objects • Events

Object similarity is a function of: 1. Sortal correlation 2. Temporal proximity 3. Spatial proximity • the Latin Quarter of the 1920s – the 5th Arrondissement in 1929 – Paris in 1925 – the Left Bank in the early 20th century

Event similarity is a function of: 1. Predicative similarity 2. Participant correlation 3. Temporal proximity 4. Spatial proximity cf. Kim (1993), Davidson (1980), Lewis (1986)

Event Similarity a. Mary visited John in Boston on Tuesday. b. The woman/she saw her husband in Copley Square yesterday. • Sim(P1, P2): visit vs. see • Sim(Subj1, Subj2): Mary vs. the woman • Sim(Obj1, Obj2): John vs. her husband • Sim(Loc1, Loc2): Boston vs. Copley Square • Sim(Time1, Time2): Tuesday vs. yesterday Brandeis CS 114 - 2012 Pustejovsky
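One way to read the component comparisons for sentences (a) and (b) is as inputs to a single score. The sketch below is only an illustration: the numeric scores, the equal weights, and the function name `event_similarity` are assumptions, not values or a method given in the slides.

```python
from typing import Dict, Optional

# Hypothetical component names, mirroring Sim(P1, P2), Sim(Subj1, Subj2), etc.
COMPONENTS = ("predicate", "subject", "object", "location", "time")


def event_similarity(scores: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-component similarities in [0, 1] into a weighted mean."""
    if weights is None:
        # Assumption: every component counts equally.
        weights = {name: 1.0 for name in COMPONENTS}
    total = sum(weights[name] for name in COMPONENTS)
    return sum(weights[name] * scores[name] for name in COMPONENTS) / total


# Made-up scores for sentences (a) and (b); a real system would compute them.
example = {
    "predicate": 0.7,  # visit vs. see
    "subject": 0.8,    # Mary vs. the woman
    "object": 0.8,     # John vs. her husband
    "location": 0.9,   # Boston vs. Copley Square
    "time": 1.0,       # Tuesday vs. yesterday, if both resolve to the same day
}
print(round(event_similarity(example), 2))
```

A weighted mean is just one possible combination; a learned model over the same five features would be an equally reasonable choice.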

Predicative Similarity • Lexical resources • LSA • Vector-based models

Argument Alignment • Semantic Role Labeling + • Sortal Similarity

Temporal Similarity • Normalization – Map to standardized ISO-TimeML format • Referencing – Reference relative to local temporal values: Val(Tuesday) = Val(yesterday)
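The equality Val(Tuesday) = Val(yesterday) only holds relative to a reference time. The sketch below resolves both expressions against a hypothetical document creation time; the function name `normalize`, the tiny vocabulary it handles, and the rule for bare weekdays are all assumptions made for illustration.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def normalize(expression: str, dct: date) -> str:
    """Resolve a simple temporal expression to an ISO 8601 date string."""
    word = expression.lower()
    if word == "today":
        return dct.isoformat()
    if word == "yesterday":
        return (dct - timedelta(days=1)).isoformat()
    if word == "tomorrow":
        return (dct + timedelta(days=1)).isoformat()
    if word in WEEKDAYS:
        # Assumption: a bare weekday in a past-tense sentence means the most
        # recent such day strictly before the document creation time.
        days_back = (dct.weekday() - WEEKDAYS.index(word)) % 7 or 7
        return (dct - timedelta(days=days_back)).isoformat()
    raise ValueError(f"unsupported expression: {expression!r}")


dct = date(2012, 3, 14)  # hypothetical document creation time (a Wednesday)
print(normalize("Tuesday", dct) == normalize("yesterday", dct))
```

With a different creation time the two expressions would resolve to different dates, which is why referencing has to follow normalization.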

Spatial Similarity • Normalization – Map to standardized ISO-Space format • Referencing – Reference relative to accessible spatial values: Val(Copley_Sq) Spatial-IN Val(Boston)
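A containment relation such as Copley Square being inside Boston can be checked by walking a part-of hierarchy. The sketch below uses a toy, hand-written table as a stand-in for a gazetteer; the table, the identifiers, and the function name `spatially_in` are assumptions for illustration only.

```python
from typing import Dict, Optional

# Toy containment table (child -> parent); a hypothetical stand-in for a gazetteer.
PART_OF: Dict[str, Optional[str]] = {
    "Copley_Sq": "Back_Bay",
    "Back_Bay": "Boston",
    "Boston": "Massachusetts",
    "Massachusetts": None,
}


def spatially_in(inner: str, outer: str) -> bool:
    """Return True if `inner` lies inside `outer` by following parent links."""
    current = PART_OF.get(inner)
    while current is not None:
        if current == outer:
            return True
        current = PART_OF.get(current)
    return False


print(spatially_in("Copley_Sq", "Boston"))
```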

Temporal Issues • Subsumption in anchoring – The bombing occurred Monday morning. – The bombing occurred Monday. – The bombing occurred last week.
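Subsumption between anchors can be pictured as interval containment: "Monday morning" falls inside "Monday", which falls inside "last week". The sketch below assumes a hypothetical document creation time of 2012-03-13, a Monday-to-Sunday week, and a 06:00 to 12:00 morning; none of these values come from the slides.

```python
from datetime import datetime
from typing import NamedTuple


class Interval(NamedTuple):
    start: datetime  # inclusive
    end: datetime    # exclusive


def subsumes(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


# Hypothetical resolved values, relative to a creation time of 2012-03-13.
monday_morning = Interval(datetime(2012, 3, 5, 6), datetime(2012, 3, 5, 12))
monday = Interval(datetime(2012, 3, 5), datetime(2012, 3, 6))
last_week = Interval(datetime(2012, 3, 5), datetime(2012, 3, 12))

print(subsumes(monday, monday_morning), subsumes(last_week, monday))
```

Under this reading the three sentences are compatible descriptions of one event at different granularities rather than conflicting anchors.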

Motivation for time and event markup • Natural language is filled with references to past and future events, as well as planned activities and goals. • Without a robust ability to identify and temporally situate events of interest from language, the real importance of the information can be missed. • A robust annotation standard can help leverage this information from natural language text.

Temporal Awareness in Real Text • The bridge collapsed during the storm but after traffic was rerouted to the Bay Bridge. • President Roosevelt died in April 1945 before – the war ended. (event happened) – he dropped the bomb. (event didn’t happen) • The CEO plans to retire next month. • Last week Bill was running the marathon when he twisted his ankle. Someone had tripped him. He fell and didn't finish the race.

Current Time Analysis Technology • Document Time Linking – Find the document creation time and link it to all events in the text. • Local Time Stamping – Find an event and a “local temporal expression”, and link the event to that time.
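The two strategies can be contrasted in a few lines. This is a minimal sketch under strong assumptions: mentions are already identified by hand, "local" is taken to mean "nearest time expression in the same sentence", and the names `Mention`, `document_time_links`, and `local_time_stamps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int  # index of the sentence containing the mention
    token: int     # token offset within that sentence


def document_time_links(events: List[Mention], dct: str) -> List[Tuple[str, str]]:
    """Document time linking: relate every event to the creation time."""
    return [(event.text, dct) for event in events]


def local_time_stamps(events: List[Mention],
                      timexes: List[Mention]) -> List[Tuple[str, str]]:
    """Local time stamping: link an event to the nearest time expression
    in the same sentence (a simplifying assumption), if there is one."""
    links = []
    for event in events:
        candidates = [t for t in timexes if t.sentence == event.sentence]
        if candidates:
            nearest = min(candidates, key=lambda t: abs(t.token - event.token))
            links.append((event.text, nearest.text))
    return links


# Hypothetical hand-annotated input for three short sentences.
events = [Mention("meeting", 0, 3), Mention("arrives", 1, 1), Mention("fell", 2, 1)]
timexes = [Mention("Tuesday", 0, 5), Mention("tomorrow", 1, 2)]

print(document_time_links(events, "2012-03-13"))
print(local_time_stamps(events, timexes))
```

The event with no time expression in its sentence receives only a document-time link, which hints at why neither strategy is sufficient on its own.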

Document Time Stamping April 25, 2010 • President Obama paid tribute Sunday to 29 workers killed in an explosion at a West Virginia coal mine earlier this month, saying they died "in pursuit of the American dream." The blast at the Upper Big Branch Mine was the worst U.S. mine disaster in nearly 40 years. Obama ordered a review earlier this month and blamed mine officials for lax regulation.

Identify which Events Should be Ordered • The annotation specification should specify a kernel of events and time expressions to be annotated. • Anchoring relations between events and times depend on genre, style, and register. • Ordering relations between events depend largely on discourse relations in the text.

Creation vs. Narrative Time • Document Creation Time – when the utterance is made (speech time) • Narrative Time – when the event occurs

Genre, Style, and Register • Participants • Relations among participants • Channel • Production circumstances • Setting • Communicative purpose • Topic

Genre, Register, and Style • Help distinguish text types in order to better characterize the information structure of the text • Example: newswire vs. news article – narrative time (NT) is a function of publication/creation frequency.

Narrative Time • Identifies the temporal interval of the events being described in the text. – Document Narrative Time: set by text-genre – Current Narrative Time: shifts through the text

Document Time Stamping: for real April 25, 2010 • President Obama paid tribute Sunday to 29 workers killed in an explosion at a West Virginia coal mine earlier this month, saying they died "in pursuit of the American dream." The blast at the Upper Big Branch Mine was the worst U.S. mine disaster in nearly 40 years. Obama ordered a review earlier this month and blamed mine officials for lax regulation.

Narrative Container April 25, 2010 • President Obama paid tribute Sunday to 29 workers killed in an explosion at a West Virginia coal mine earlier this month, saying they died "in pursuit of the American dream." The blast at the Upper Big Branch Mine was the worst U.S. mine disaster in nearly 40 years. Obama ordered a review earlier this month and blamed mine officials for lax regulation.

Time Stamping: the good, bad, … • ✓ ☺ Set up a meeting on Tuesday with EMC. • ✓ ☺ Franklin arrives tomorrow from London. • ✗ ☹ Franklin arrives on the afternoon flight from London tomorrow. • ✗ ☹☹ Most people drive today while talking on the phone.

ISO-TimeML Enables Temporal Parsing • A new generation of language analysis tools that are able to temporally organize events in terms of their ordering and time of occurrence • These tools can be integrated with visualization, summarization, question answering, and link analysis systems to help analyze large event-rich information spaces.

ISO-TimeML Provides elements to: • Find all events and times in newswire text • Link events to the document time and to local times • Order events relative to other events • Ensure consistency of the temporal relations
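Ensuring consistency can be illustrated with the simplest case: a set of BEFORE relations is inconsistent if it contains a cycle. The sketch below checks only that one relation type with a depth-first search; the function name `is_consistent` and the event labels are assumptions, and a full temporal reasoner would handle many more relation types.

```python
from typing import Dict, Iterable, List, Set, Tuple


def is_consistent(before: Iterable[Tuple[str, str]]) -> bool:
    """True if the (earlier, later) pairs contain no ordering cycle."""
    graph: Dict[str, List[str]] = {}
    for earlier, later in before:
        graph.setdefault(earlier, []).append(later)
        graph.setdefault(later, [])

    visiting: Set[str] = set()  # nodes on the current search path
    done: Set[str] = set()      # nodes already fully explored

    def has_cycle(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the path: a cycle
        visiting.add(node)
        found = any(has_cycle(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return not any(has_cycle(node) for node in graph)


# Hypothetical event labels for a short narrative.
good = [("tripped", "fell"), ("fell", "did_not_finish")]
bad = good + [("did_not_finish", "tripped")]
print(is_consistent(good), is_consistent(bad))
```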

ISO-Space • Capture the complex constructions of spatial language in text • Provide an inventory of how spatial information is presented in natural language • ISO-Space is not designed to provide a formalism that fully represents the complexity of spatial language

Applications of ISO-Space • Building a spatial map of objects relative to one another. • Reconstructing spatial information associated with a sequence of events. • Determining object location given a verbal description. • Translating viewer-centric verbal descriptions into other relative descriptions or absolute coordinate descriptions. • Constructing a route given a route description. • Constructing a spatial model of an interior or exterior space given a verbal description. • Integrating spatial descriptions with information from other media.

Semantic Requirements for Annotation • Fundamental distinction between the concepts of annotation and representation – Based on ISO CD 24612, Language resource management: Linguistic Annotation Framework (Ide and Romary, 2004) • Distinguish between abstract syntax and concrete syntax – Concrete syntax: XML encoding – Abstract syntax: conceptual inventory and a set of syntactic rules defining the combination of these elements
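The abstract/concrete split can be made tangible with one annotation represented two ways. In the sketch below, a small dataclass plays the role of the abstract structure and a TimeML-style `TIMEX3` element is one possible XML encoding of it; the class name `TimeExpression`, its field names, and the helper `to_xml` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeExpression:
    """Abstract-syntax view: what is annotated, independent of any encoding."""
    identifier: str
    kind: str   # e.g. "DATE"
    value: str  # normalized value, e.g. an ISO 8601 date
    text: str   # the annotated span of text


def to_xml(expr: TimeExpression) -> str:
    """One possible concrete syntax: a TimeML-style TIMEX3 element."""
    element = ET.Element(
        "TIMEX3",
        {"tid": expr.identifier, "type": expr.kind, "value": expr.value},
    )
    element.text = expr.text
    return ET.tostring(element, encoding="unicode")


dct = TimeExpression("t0", "DATE", "2010-04-25", "April 25, 2010")
print(to_xml(dct))
```

The same `TimeExpression` could just as well be serialized to another format, which is the practical point of keeping the conceptual inventory separate from its encoding.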

Spatial Expressions • Constructions that make explicit reference to the spatial attributes of an object or spatial relations between objects • Four grammatically defined classes: – Spatial Prepositions and Particles: on, in, under, over, up, down, left of – Verbs of Position and Movement: lean over, sit, run, swim, arrive – Spatial Attributes: tall, long, wide, deep – Spatial Nominals: area, room, center, corner, front, hallway

Spatial Relations • Topological: – In, inside, touching, outside • Orientational (with frame of reference): – Behind, left of, in front of • Topo-metric: – Near, close by • Topological-orientational: – On, over, below • Metric: – 20 miles away

Frames of Reference (Levinson, 2003) • Absolute – The lake is north of the city. • Relative – The book is to your left. – The tree is between the Pru and the Monitor. • Intrinsic – There’s a ball in front of the car. – The tree is behind the bench.

Frames of reference • The tree to the left of the entrance • The steps in front of me/the entrance

ISO-Space 1.4 • Spatial relations are split into 4 types: – Topological (QSLink) – Relational (OrientLink) – Movement (MoveLink) – Measurement (MLINK, from TimeML) • Spatial relations are identified with role labels, including Figure and Ground • SPATIAL_NAMED-ENTITY
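A link with Figure and Ground roles can be modeled as a small record. The sketch below is not the ISO-Space schema: the class names, the field names, and the `relation` label are assumptions, and only the four link-type names are taken from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    TOPOLOGICAL = "QSLink"
    RELATIONAL = "OrientLink"
    MOVEMENT = "MoveLink"
    MEASUREMENT = "MLINK"


@dataclass(frozen=True)
class SpatialLink:
    link_type: LinkType
    figure: str    # the entity being located
    ground: str    # the entity it is located relative to
    relation: str  # hypothetical relation label, e.g. "IN"


# "Copley Square is in Boston", expressed as a topological link.
link = SpatialLink(LinkType.TOPOLOGICAL, figure="Copley_Sq",
                   ground="Boston", relation="IN")
print(link.link_type.value, link.figure, link.relation, link.ground)
```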

Conclusion: Measuring Semantic Similarity • Normalizing temporal and spatial expressions • Developing standardized specifications contributes toward corpora for training and evaluating such normalization • Cases in point: – ISO-TimeML (ISO adopted) – ISO-Space (in development)