A natural-language-based approach to modeling structured documents
Yves MARCOUX
GRDS – EBSI, Université de Montréal
Yves Marcoux - OLST-RALI - 21 March 2007

A natural-language approach to modeling
Why is some XML so difficult to write?
<http://www.idealliance.org/papers/extreme/proceedings/html/2006/Marcoux01/EML2006Marcoux01.html>

Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period

Writing well-formed XML: author’s choices
• <sex><male /></sex>
• <is-female>FALSE</is-female>
• <gender="&#x2642;" />
• <note>It's a boy!</note>
&#x2642; = ♂

Writing valid XML is collaborative work
• Modeler has chosen the markup (container)
• Author supplies the contents
• Much like a form
• Collaborative work requires communication between parties: modeler and author
• But the modeler is gone…

Problem
• Authoring environments are:
  – good at conveying the syntactic intentions (or decisions) of the modeler
  – not as good at conveying the semantic intentions of the modeler
• Often, all the author gets is a generic ID or some slightly more developed form
  – E.g., “date” in a memo

What is available?
• More or less developed forms of generic IDs (and attribute names)
• General documentation of the model
• Per-element (or per-attribute) documentation
• OK for tooltips or popups
• Could we do better?
• (Applications / stylesheets are not appropriate)

Could we aim at…
• Having a semantic conversation right in the editing window?
• In the same way that there is currently a syntactic conversation?
• Yes…

Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period

Key idea
• Have the modeler prepare bits of NL (prose)
• That can be intertwined with author-supplied contents to give them meaning
• Allows “fill-in”-like sentences
• And thus, a semantic conversation in the editing window
• NB: modeler segments can contain hyperlinks

Example
Facts about some US cities

City           Population    Annual snowfall (inches)
Denver         850,000       23
Rochester      240,000       88
Palm Spring    48,000        0

Raw XML
<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
  <city>
    <name>Rochester</name>
    <population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
  </city>
  ...
</facts-about-US-cities>

Prose equivalent
Here are facts about some US cities. The city of Denver has a population of 850,000 and an annual snowfall of 23 inches. The city of Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city of Palm Spring has a population of 48,000 and an annual snowfall of 0 inches.

Modeler prepares “peritext” segments

Element                      text-before                               text-after
facts-about-US-cities        "Here are facts about some US cities."    empty
city                         " The city "                              "."
name                         "named "                                  empty
population                   " has a population of "                   empty
annual-snowfall-in-inches    " and an annual snowfall of "             " inches"

Possible “semantic” view
Here are facts about some US cities. The city named Denver has a population of 850,000 and an annual snowfall of 23 inches. The city named Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city named Palm Spring has a population of 48,000 and an annual snowfall of 0 inches.

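The semantic view can be computed mechanically from the XML and the peritext table. The following Python sketch is one possible way to do it, not the implementation used in the talk. The names PERITEXTS and semantic_view are hypothetical, and whitespace-only indentation is assumed to be insignificant, which would not hold for mixed content.

```python
import xml.etree.ElementTree as ET

# Peritext table from the slides: element name -> (text-before, text-after).
# "empty" in the table is represented by the empty string.
PERITEXTS = {
    "facts-about-US-cities": ("Here are facts about some US cities.", ""),
    "city": (" The city ", "."),
    "name": ("named ", ""),
    "population": (" has a population of ", ""),
    "annual-snowfall-in-inches": (" and an annual snowfall of ", " inches"),
}


def semantic_view(element):
    """Return text-before + contents + text-after, applied recursively.

    Assumption: leading/trailing whitespace in character data is only
    indentation, so it is stripped.
    """
    before, after = PERITEXTS.get(element.tag, ("", ""))
    parts = [before, (element.text or "").strip()]
    for child in element:
        parts.append(semantic_view(child))
        parts.append((child.tail or "").strip())
    parts.append(after)
    return "".join(parts)


DOC = """<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
</facts-about-US-cities>"""

print(semantic_view(ET.fromstring(DOC)))
```

Keeping the peritexts in a plain table, separate from the traversal, mirrors the division of labor in the slides: the modeler supplies the table, the author supplies the contents.
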
What it allows during editing (in semantic view)
• Peritexts convey the semantic intentions of the modeler
• A semantic conversation takes place in the editing window (instead of a syntactic one)
• Fill-in sentences:
  – Make “tag abuse” embarrassing…
  – Likely to reduce some kinds of errors
• Other views / fragment viewing / hyperlinks

Discussion
• This is not like defining an application
  – Not a stylesheet mechanism
• Peritexts (fixed here) could be allowed to vary with some parameters:
  – position among siblings
  – attribute value
  – etc.
• (Attributes should also be handled)

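As an illustration of peritexts that vary with position among siblings, here is a small Python sketch. Everything in it is an assumption made for the example: the <cities> document, the function names name_peritext and render_cities, and the rule itself (nothing before the first name, ", " before middle names, " and " before the last) are hypothetical and do not come from the talk.

```python
import xml.etree.ElementTree as ET


def name_peritext(position, count):
    """Hypothetical rule: text-before of <name> depends on sibling position."""
    if position == 0:
        before = ""          # first sibling: nothing before it
    elif position == count - 1:
        before = " and "     # last sibling
    else:
        before = ", "        # any sibling in between
    return before, ""        # text-after is always empty in this sketch


def render_cities(root):
    """Build a sentence from the <name> children of a hypothetical <cities>."""
    names = root.findall("name")
    parts = ["The cities are "]
    for position, element in enumerate(names):
        before, after = name_peritext(position, len(names))
        parts += [before, element.text or "", after]
    parts.append(".")
    return "".join(parts)


doc = ET.fromstring(
    "<cities><name>Denver</name><name>Rochester</name>"
    "<name>Palm Spring</name></cities>"
)
print(render_cities(doc))
```

A rule keyed on an attribute value could be written the same way, by passing the element's attributes to the peritext function instead of its position.
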
Why does it work?
• Sometimes tricky (see paper), but…
• NL has very high affordance
• NL can act as its own metalanguage
• XML contents + NL usually mix pretty well

Intertextual semantics
• Meaning of a text fragment is given by placing it in a network of other texts
• That network can simply consist of a sentence (or “quasi-sentence”)
• Or a more elaborate topology: peritexts can contain hyperlinks, determining sense-making / learning paths
  – Too much hyperlinking can spoil the idea!

Interpretation workflow
[Diagram: S maps d to S(d); H reads S(d) and arrives at the actual “meaning” of d for H]
• d is a document or fragment, H is a human
• S(d) is the intertextual semantics of d
• S(d) is in NL
• S is machine computable
• Actual meaning of d for H may vary:
  – with H
  – for the same H, from one “reading” of S(d) to another

Interpretation workflow
[Diagram: d, S(d), and several human readers H1, H2, H3]

Suggests a modeling process
• Modeler starts with the prose
• Identify peritexts
• Work out more and more abbreviated forms
  – Will correspond to different “views” in the editor
• Tersest level gives markup
• Increase model usability?

Mixed content question revisited
• Known: can get rid of mixed content with <!ELEMENT text (#PCDATA)>
  Example: <!ELEMENT (e1 | e2 | … | #PCDATA)*>
  becomes: <!ELEMENT (e1 | e2 | … | text)*>
• Why does it feel bad?
  – “text” tags are not abbreviations of any reasonable peritexts!

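To make the transformation concrete at the instance level, the following Python sketch wraps each run of character data found directly inside an element in a <text> child. It is only an illustration under stated assumptions: the function names wrap_text and _text_node and the sample <p> fragment are hypothetical, whitespace-only runs are dropped, and only the direct contents of one element are processed.

```python
import xml.etree.ElementTree as ET


def _text_node(content):
    """Create a <text> element holding the given character data."""
    node = ET.Element("text")
    node.text = content
    return node


def wrap_text(element):
    """Wrap character data directly inside `element` in <text> children.

    Non-recursive; whitespace-only runs are dropped (an assumption).
    """
    children = list(element)
    rebuilt = []
    if element.text and element.text.strip():
        rebuilt.append(_text_node(element.text))
    element.text = None
    for child in children:
        element.remove(child)
        tail, child.tail = child.tail, None
        rebuilt.append(child)
        if tail and tail.strip():
            rebuilt.append(_text_node(tail))
    element.extend(rebuilt)
    return element


mixed = ET.fromstring("<p>See <e1>this</e1> and <e2>that</e2>.</p>")
print(ET.tostring(wrap_text(mixed), encoding="unicode"))
```

The result is element-only content, but the added <text> tags carry no meaning of their own, which is the point the slide makes.
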
Is NL too much to ask for?
• Relative to some “target” community
• Can go a long way (previous slide)
• Hyperlinks are allowed in peritexts
  – Allows defining “sense-making” or learning paths
• (Almost) anything formal can be turned into NL…

NL as formalism common denominator
[Diagram: expression in artificial formalism + textbook explaining formalism → STAPLER → equivalent expression in NL]

Editing setup without intertextual semantics
[Diagram; labels: World, Modeler, Author, NL and presupposed knowledge of target community, Doc. / tr. material, XML DTD, XML EDITOR, Valid XML instance or fragment]

Editing setup with intertextual semantics
[Diagram; labels: World, Modeler, Author, NL and presupposed knowledge of target community, NL equivalent, XML DTD, text-before and text-after segments, XML EDITOR, Valid XML instance or fragment]

Structure of the talk
1. The problem
2. Proposed direction for solution
3. Conclusion
4. Question period

What it suggests
• Bring some of the discipline of producing “good documents” (manuals of style) into model & interface design
  – E.g., don’t abuse hyperlinking
• Literate modeling, literate interfaces
  – Literate interface / interaction design
• Benefit: make explicit prerequisite knowledge & sense-making / learning paths

Other possible uses of intertextual semantics
• Legal documents with multiple renditions
• NLP systems that cannot handle markup
  – Including full-text indexing
    • <ex>Hamlet</ex>
    • “Exit Hamlet”
• Other data models
  – E.g., relational
• Normal forms
  – A new look at expressivity

Future work
• Editing:
  – Work out a few existing / new models
  – Properly integrate attributes
  – More powerful peritext computation
  – Implement ideas in a real editor
    • Display peritexts when choosing insertion
    • Hyperlinks in displayed peritexts
  – Experiment with real authors

Future work
• More than peritexts?
• More than NL (icons, sound, …)?
• Compare with other semantic frameworks
  – Downstream semantics: Wrightson, Renear et al.
• Other models
• Tackle literate modeling / interface design

Thank you! Questions?