Shallow Language Generation
TG/2 and TEMSIS-Gen
Stephan Busemann
DFKI GmbH, Stuhlsatzenhausweg 3, D-66123 Saarbrücken
busemann@dfki.de
http://www.dfki.de/~busemann

DFKI's TG/2 System Offers a Flexible Framework for NLG
• TG/2 is a transparent production system
• TG/2 interprets a separately defined set of condition-action rules
• TG/2 maps pieces of input onto surface strings
TG/2 keeps grammars largely independent from input representations
[Figure: an input such as (COOP threshold-passing) is connected to grammar rules such as DECL -> PPTIME THTYPE EXCEEDS by test predicates on properties of the input and by access paths, each yielding a part of the input]
Source: Stephan Busemann, Language Technology I, WS 2005/2006

The TGL Syntax Uses Test and Access Path Descriptions
• Category: Context-free skeleton of derivations
• Test: Conjunction of Boolean predicates on the input structure
• Template: Sequence of actions, each operating on a substructure of the current input (accessor functions)
  – Rule (:RULE), failure causes the rule to fail as well
  – Optional rule (:OPTRULE), ignored unless input available
  – Access path descriptions, determining input substructures
  – Functions (:FUN) into strings, failure causes rule to fail as well
  – ASCII strings, directly to output
• Constraints: feature equations expressing agreement relations
• Side effects, e.g. updating a discourse memory

TG/2 Grammars Integrate Canned Texts, Templates and Context-free Rules [Busemann 1996]

My category is DECL.
IF the slot COOP is 'threshold-passing
AND the slot LAW-NAME is specified
THEN apply PPtime from slot TIME
     apply THTYPE from CURRENT-INPUT
     utter "("
     apply LAW from slot LAW-NAME
     utter ") "
     apply EXCEEDS from slot EXCEEDS
     utter ". "
WHERE THTYPE AND EXCEEDS agree in GENDER

My category is THTYPE.
IF there is no slot THRESHOLD-TYPE specified
THEN utter "la valeur limite autoris&e2e "
WHERE THTYPE has value 'fem for GENDER

Example output: En été 1999 la valeur limite autorisée (selon le decret...) a été dépassée une fois.

The TGL Rules are Written as Condition-Action Pairs

(defproduction wertueberschreitung "WU01"
  (:PRECOND (:CAT DECL
             :TEST ((pred-eq 'wertueberschreitung)
                    (not (threshold-value-p))))
   :ACTIONS (:TEMPLATE (:OPTRULE PPtime (get-param 'time))
                       (:RULE THTYPE (self))
                       (:OPTRULE POLL (get-param 'pollutant))
                       "("
                       (:RULE LAW (get-param 'law-name))
                       ") "
                       (:RULE EXCEEDS (get-param 'exceeds))
                       ". "
             :CONSTRAINTS (:GENDER (THTYPE EXCEEDS) :EQ))))

(defproduction thtype "THT3"
  (:PRECOND (:CAT THTYPE
             :TEST ((not (threshold-type-p))))
   :ACTIONS (:TEMPLATE "la valeur limite autoris&e2e "
             :CONSTRAINTS (:GENDER THTYPE :VAL 'fem))))

TGL Constraints are Percolated Across the Derivation Tree
• Constraints are PATR-II style, percolation through unification
• Every local tree is licensed by a grammar rule
• A feature can be assigned a value (:=)
• Two features can be constrained to have identical values (=)
[Figure: derivation tree over the nodes X0, X1 and X2 with the leaves "la valeur limite " and inflect(dépassé), annotated with the constraints (X1.GENDER = X2.GENDER), (X0.GENDER := fem) and (X0.GENDER = X2.GENDER)]

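The two constraint forms on this slide, assignment (:=) and equality (=), can be illustrated with a small constraint store. The following Python sketch is not TG/2's implementation: it only handles atomic values (PATR-II unification works on full feature structures), and the class name FeatureStore, its methods and the dotted feature names are assumptions made for illustration.

```python
class FeatureStore:
    """Toy store for equality constraints over atomic feature values."""

    def __init__(self):
        self.parent = {}  # union-find forest over feature names
        self.value = {}   # value attached to the representative of a class

    def find(self, feat):
        """Return the representative of the class that `feat` belongs to."""
        self.parent.setdefault(feat, feat)
        while self.parent[feat] != feat:
            self.parent[feat] = self.parent[self.parent[feat]]  # path halving
            feat = self.parent[feat]
        return feat

    def assign(self, feat, val):
        """feat := val. Returns False if a different value is already set."""
        root = self.find(feat)
        if self.value.get(root, val) != val:
            return False
        self.value[root] = val
        return True

    def equate(self, f1, f2):
        """f1 = f2. Returns False if the two features carry clashing values."""
        r1, r2 = self.find(f1), self.find(f2)
        if r1 == r2:
            return True
        v1, v2 = self.value.get(r1), self.value.get(r2)
        if v1 is not None and v2 is not None and v1 != v2:
            return False
        self.parent[r1] = r2          # merge the two classes
        if v1 is not None:
            self.value[r2] = v1       # carry the known value over
        self.value.pop(r1, None)
        return True

    def lookup(self, feat):
        """Return the value of `feat`, or None if it is still unconstrained."""
        return self.value.get(self.find(feat))


store = FeatureStore()
store.equate("THTYPE.GENDER", "EXCEEDS.GENDER")  # agreement from the DECL rule
store.assign("THTYPE.GENDER", "fem")             # value set by the THTYPE rule
print(store.lookup("EXCEEDS.GENDER"))            # fem
```

Because both features end up in one class, the value assigned in the THTYPE subtree becomes visible at EXCEEDS, which is what lets the participle be inflected as "dépassée".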
The Interpreter is Based on the Context-Free Backbone of TGL Grammars
THREE-STEP EVALUATION CYCLE
• Matching
  – Identify all rules with the current category
  – For each of them perform its tests on the input structure ("IF" part)
  – Add those passing the tests to the conflict set
• Conflict resolution
  – Select an element of the conflict set (possibly by some preference mechanism)
• Firing
  – Evaluate the rule's constraints (if available, "WHERE" part)
  – For each element of the "THEN" part, read the new category and determine the new input structure by evaluating the associated access path descriptions

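The three-step cycle can be sketched as a small recursive interpreter. This is a minimal Python illustration under several assumptions, not the TG/2 code: input structures are nested dicts, grammar order stands in for the preference mechanism, falling back to the next candidate when a rule fails is an assumption of the sketch, and constraint evaluation is left out. The names Call, Rule, generate and fire, the rule names PPT1 and EXC1, and the simplified rule bodies are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Call:
    """Template action: expand `category` on the substructure found by `path`."""
    category: str
    path: Callable[[dict], Optional[dict]]  # stands in for an access path description
    optional: bool = False                  # True mimics :OPTRULE, False mimics :RULE


@dataclass
class Rule:
    name: str
    category: str
    tests: list = field(default_factory=list)     # predicates on the input ("IF" part)
    template: list = field(default_factory=list)  # strings, Calls, functions ("THEN" part)


class GenerationFailure(Exception):
    """Raised when no rule for a category can be completed."""


def generate(category, inp, grammar):
    # Matching: rules of the current category whose tests all pass.
    conflict_set = [r for r in grammar
                    if r.category == category and all(t(inp) for t in r.tests)]
    # Conflict resolution: grammar order stands in for a preference mechanism.
    for rule in conflict_set:
        try:
            return fire(rule, inp, grammar)
        except GenerationFailure:
            continue  # assumption of this sketch: try the next candidate
    raise GenerationFailure(f"no applicable rule for {category}")


def fire(rule, inp, grammar):
    # Firing: walk the template left to right (constraints are omitted here).
    parts = []
    for action in rule.template:
        if isinstance(action, str):      # canned string, straight to output
            parts.append(action)
        elif isinstance(action, Call):
            sub = action.path(inp)
            if sub is None:
                if action.optional:      # optional rule without input: skip it
                    continue
                raise GenerationFailure(f"{rule.name}: no input for {action.category}")
            parts.append(generate(action.category, sub, grammar))
        else:                            # plain function from input to string
            parts.append(action(inp))
    return "".join(parts)


grammar = [
    Rule("WU01", "DECL",
         tests=[lambda i: i.get("COOP") == "threshold-passing"],
         template=[Call("PPTIME", lambda i: i.get("TIME"), optional=True),
                   Call("THTYPE", lambda i: i),
                   Call("EXCEEDS", lambda i: i.get("EXCEEDS")),
                   "."]),
    Rule("PPT1", "PPTIME", template=["En été ", lambda i: str(i["YEAR"]), " "]),
    Rule("THT3", "THTYPE",
         tests=[lambda i: "THRESHOLD-TYPE" not in i],
         template=["la valeur limite autorisée "]),
    Rule("EXC1", "EXCEEDS",
         tests=[lambda i: i.get("STATUS") == "yes"],
         template=["a été dépassée une fois"]),
]

inp = {"COOP": "threshold-passing",
       "TIME": {"YEAR": 1999},
       "EXCEEDS": {"STATUS": "yes", "TIMES": 1}}
print(generate("DECL", inp, grammar))
# En été 1999 la valeur limite autorisée a été dépassée une fois.
```

If the TIME slot is dropped from the input, the optional PPTIME call is skipped and the rest of the sentence is still produced; if EXCEEDS is dropped, the obligatory call fails and so does the whole DECL rule.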
eGram Supports Grammar Development

TEMSIS is Developing a Transnational Environmental Management Support and Information System
Project facts: TAP Sector C9 (Environment), No. 2945; Duration 01/96-06/98; Effort 172.5 PM
• Project goals
  – distributed information and communication platform for transnational cooperation
  – comfortable bilingual access to regional environmental data
  – information kiosks accessible via WWW
• Cooperation between administrations
  – French / German urban agglomeration Moselle-Est / Stadtverband Saarbrücken
  – Environmental information standardised and shared

Air Quality Reports Are Generated From Environmental Data at the TEMSIS Server [Busemann/Horacek 1998, Horacek/Busemann 1998]
• Parameter selection by the user
  – Language (German, French)
  – Pollutant and measurement station
  – Relevant period of time
• Stage 1: text schema construction
  – Querying the database
  – Composition of report structure
  – Elision of contextual redundancies
• Stage 2: verbalization by TG/2
  – Selection of sentence patterns
  – Wording, phrasing, grammar
• HTML postprocessing
[Figure: data flow from Parameter Specs through the Text Generation Server (Stage 1, connected to the TEMSIS Database, then Stage 2) to HTML/Java Code]

The System Covers 384 Report Structures
EXAMPLE
Parameters selected from the TEMSIS navigator menus:
  – French language text about a German situation
  – Ozone data, exceeding thresholds according to decree
  – Measurements at Völklingen-City in summer 1997
Generated report (the first sentence is the confirmation):
Vous avez choisi la station de mesure de Völklingen-City afin de consulter la pollution atmosphérique relevée en été 1997. A la station de mesure de Völklingen-City, la valeur d'information pour l'ozone pour une exposition de 60 minutes (180 µg/m³ selon le decret allemand (Bundesimmissionsschutzverordnung)) a été dépassée une fois. La valeur d'interdiction du trafic (240 µg/m³) a aussi été dépassée une fois. En été 1996 la valeur d'information (180 µg/m³) n'a pas été dépassée.

Text Organization Is Schema-based
SAMPLE SCHEMA FOR SUMMER OBSERVATION, THRESHOLD PASSING
• Confirm important selected parameters
• Number the values exceeding the lowest threshold
• Number the values exceeding the next threshold
• Compare with values of preceding year
• Repeat the core statement ("summary")
A schema is instantiated on the basis of the input parameters and the retrieved data

Instantiating a Schema Leads to a Report Structure
TEXT ORGANISATION
• Achieves text coherence by
  – removing redundant information
  – inserting particles ("also")
  – simple techniques of aggregating information
• Yields canned texts or intermediate content representations
• Intermediate representations are independent of particular languages
Shallow generation can do without explicit knowledge representation and text planning

Text Coherence is Achieved by Simple Means
• Removing redundancy
  – If two subsequent slots in a schema contain identical pieces of information, which are not focussed, then delete the second instance
• Particle insertion
  – If two subsequent slots in a schema contain identical pieces of information, which are focussed, then mark the second instance as "repeating"
• Aggregation
  – If a time series contains a sequence of identical values, describe a time interval (from ... to ...) and mention the value only once
The means depend on the phenomena encountered in the corpus

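The three coherence rules above are simple enough to write down directly. The Python sketch below is one possible reading, not the TEMSIS implementation: the slot layout (dicts with the keys "info" and "focussed"), the "repeating" flag and the function names apply_coherence and aggregate are assumptions made for illustration.

```python
from itertools import groupby

_NOTHING = object()  # sentinel that compares unequal to every piece of information


def apply_coherence(slots):
    """Apply redundancy removal and particle marking to a list of schema slots."""
    result = []
    previous_info = _NOTHING
    for slot in slots:
        same_as_previous = slot["info"] == previous_info
        previous_info = slot["info"]
        if same_as_previous and not slot["focussed"]:
            continue                                   # redundancy: drop the second instance
        if same_as_previous:
            result.append({**slot, "repeating": True})  # later verbalized with a particle
        else:
            result.append(dict(slot))
    return result


def aggregate(series):
    """Collapse runs of identical values in a (time, value) series into intervals."""
    runs = []
    for value, group in groupby(series, key=lambda pair: pair[1]):
        times = [time for time, _ in group]
        runs.append((times[0], times[-1], value))      # (from, to, value)
    return runs


slots = [
    {"info": "site=Völklingen-City", "focussed": False},
    {"info": "site=Völklingen-City", "focussed": False},
    {"info": "exceeded once", "focussed": True},
    {"info": "exceeded once", "focussed": True},
]
print(apply_coherence(slots))
print(aggregate([(1995, 0), (1996, 0), (1997, 1)]))
# [(1995, 1996, 0), (1997, 1997, 1)]
```

In the sample report, the second "a été dépassée une fois" is a focussed repetition, which is where a marker like "repeating" would surface as the particle "aussi".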
Intermediate Representations Specify the Contents Only Partly
Stage 1: Text Organization

[(COOP threshold-passing)
 (TIME [(PRED season)
        (NAME [(SEASON summer) (YEAR 1997)])])
 (POLLUTANT o3)
 (SITE "Völklingen-City")
 (DURATION [(MINUTE 60)])
 (SOURCE [(LAW-NAME bimsch) (THRESHOLD-TYPE info-value)])
 (EXCEEDS [(STATUS yes) (TIMES 1)])]

Stage 2: Verbalization
En été 1997, à la station de mesure de Völklingen-City, la valeur d'information pour l'ozone pour une exposition de 60 minutes (180 µg/m³ selon le decret allemand (Bundesimmissionsschutzverordnung)) a été dépassée une fois.

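To make the role of access paths concrete, the structure above can be mirrored as a nested Python dict and queried slot by slot. This is only an illustrative sketch: the dict encoding and the helper get_path are assumptions, not part of TG/2 or TGL.

```python
# The intermediate representation from the slide, encoded as nested dicts.
report_item = {
    "COOP": "threshold-passing",
    "TIME": {"PRED": "season", "NAME": {"SEASON": "summer", "YEAR": 1997}},
    "POLLUTANT": "o3",
    "SITE": "Völklingen-City",
    "DURATION": {"MINUTE": 60},
    "SOURCE": {"LAW-NAME": "bimsch", "THRESHOLD-TYPE": "info-value"},
    "EXCEEDS": {"STATUS": "yes", "TIMES": 1},
}


def get_path(structure, *slots):
    """Follow a sequence of slot names; return None as soon as one is missing."""
    current = structure
    for slot in slots:
        if not isinstance(current, dict) or slot not in current:
            return None
        current = current[slot]
    return current


print(get_path(report_item, "TIME", "NAME", "YEAR"))     # 1997
print(get_path(report_item, "SOURCE", "THRESHOLD-TYPE"))  # info-value
print(get_path(report_item, "THRESHOLD-VALUE"))           # None
```

Note that the figure 180 µg/m³ and the full name of the decree appear in the verbalized sentence but not in the structure, which fits the slide's point that the representation specifies the contents only partly.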
Shallow Generation is Well Suited for Small Applications
• What are "small" applications?
  – Limited linguistic requirements
  – Simple text types, e.g. technical documents
  – Useful in early stages
• The TEMSIS generation task represents a class of applications
  – Text organization by few variable schemata
  – Sufficient quality of text by simple techniques to establish coherence
  – Realization of intermediate representation without accessing the context
  – Ca. 100/120 TGL rules, ca. 400 linguistically different reports
With larger applications, the drawbacks of shallow approaches become obvious

Shallow Generation Has Pros and Cons
ASSESSMENT
Possible advantages
• Low development effort
• Reusable interpreter and subgrammars
• Very fast processing
• Easy introduction of additional languages
• Easy extension with alternative formulations (through a preference mechanism in TG/2)
Possible drawbacks
• Knowledge representation depends on application
• Implicit dependencies
• Scalability is inherently lower than with in-depth generators
• Maintaining transparency of grammars can become a cost factor

Literature
All papers at: http://www.dfki.de/lt/publications.php
• Stephan Busemann, "Ten Years After - An Update on TG/2 (and friends)", in: Proc. 10th European Workshop on Natural Language Generation, Aberdeen, UK, 2005.
• Stephan Busemann, "eGram - a Grammar Development Environment and Its Usage for Language Generation", in: Proc. Fourth International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, 2004.
• Stephan Busemann, "Language Generation for Cross-Lingual Document Summarisation", in: Sheng, Huanye (ed.), International Workshop on Innovative Language Technology and Chinese Information Processing (ILT&CIP-2001), Science Press, Chinese Academy of Sciences, Beijing, China, May 2002.
• Stephan Busemann and Helmut Horacek, "A Flexible Shallow Approach to Text Generation", in: Eduard Hovy (ed.): Proceedings of the Ninth International Natural Language Generation Workshop (INLG '98), Niagara-on-the-Lake, Canada, August 1998, 238-247. Also at the Computation and Language Archive.
• Stephan Busemann, "Best-First Surface Realization", in: Donia Scott (ed.): Proceedings of the Eighth International Natural Language Generation Workshop (INLG '96), Herstmonceux, Sussex, 1996, 101-110. Also at the Computation and Language Archive.