xml:tm
XML Text Memory: using XML technology to reduce the cost of translating XML documents

Computational Linguistic Methodologies
• Machine Translation
• Translation Memory
• Hybrid
• Linguistic Inferencing Engines
• Terminology

Automating Translation
• Machine translation
• 40-year history
• Rigorous control of grammar and terminology can produce very good results
• An enormous amount of work is left to achieve free-format translation

Translation Memory
• Align source and target text
• Look up new text against memory
• Relatively primitive technology
• No advance over the past 30 years
• Need for proofing
• Proprietary translation memory formats
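The lookup step listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the memory is a plain dictionary of aligned source and target segments, `difflib.SequenceMatcher` stands in for a real fuzzy-matching engine, and the name `lookup`, the sample Polish targets, and the 0.75 threshold are all invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical memory of aligned segments (source -> target); sample data only.
MEMORY = {
    "Namespace is very simple.": "Przestrzeń nazw jest bardzo prosta.",
    "It is easy to use.": "Jest łatwa w użyciu.",
}

def lookup(segment, memory=MEMORY, threshold=0.75):
    """Return (kind, target, score) for the best match found in memory."""
    # An identical segment is an exact match and needs no scoring.
    if segment in memory:
        return ("exact", memory[segment], 1.0)
    # Otherwise score every stored source segment and keep the best one.
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    # The threshold is an assumption; a fuzzy hit still needs proofing.
    if best_source is not None and best_score >= threshold:
        return ("fuzzy", memory[best_source], best_score)
    return ("none", None, best_score)

print(lookup("It is very easy to use."))
```

A fuzzy hit returns the stored translation of a similar sentence, which is why the slide lists proofing as a cost of this approach.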

Translating XML Documents
• XML is inherently easier to translate
• Separation of form and content
• Support for Unicode and other international encoding formats
• Allows multiple output formats: PDF, XHTML, WAP

XML Translation Standards
• LISA - Localization Industry Standards Association: http://www.lisa.org
• OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.org
• W3C - World Wide Web Consortium: http://www.w3c.org
• OLIF Consortium: http://www.olif.net

LISA Standards
• TMX - Translation Memory Exchange format: http://www.lisa.org/tmx
• TBX - Termbase Exchange format: http://www.lisa.org/tbx
• SRX - Segmentation Rules Exchange format: http://www.lisa.org/srx
• GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx

OASIS L10N Standards
• XLIFF - XML Localization Interchange File Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
• Trans-WS - Translation Web Services: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws
• DITA - Darwin Information Typing Architecture: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita

W3C and OLIF
• W3C ITS - Internationalization Tag Set: http://www.w3.org/International/its
• OLIF - Open Lexicon Interchange Format: http://www.olif.net

XML namespace
• Major feature of XML
• Allows the mapping of different ontological entities onto the same representation
• Allows different ways to look at the same data
• Namespaces can be made transparent

xml:tm
• XML-based text memory
• Revolutionary approach to translating XML documents
• First significant advance in translation memory technology
• Uses XML namespace to transparently embed contextual information

xml:tm namespace
• Text Memory namespace
• Can be mapped onto any XML document
• Vertical view of the document in terms of 'text segments'
• Can be totally transparent

xml:tm namespace
Example of the use of the tm namespace in an XML document:

  <document xmlns:tm="urn:xml-Intl-tm">
    <tm:tm>
      <section>
        <para>
          <tm:te>
            <tm:tu>Namespace is very flexible.</tm:tu>
            <tm:tu>It is very easy to use.</tm:tu>
          </tm:te>
        </para>
      </section>
    </tm:tm>
  </document>

xml:tm namespace
[Diagram: the original document view (section, title, para, text) shown alongside the tm namespace view, in which the same text is divided into te elements that contain tu elements, one per sentence.]

xml:tm namespace
Original document view (text):

  <para>Namespace is very simple. It is easy to use.</para>

tm namespace view (te, tu, sentence):

  <para>
    <tm:te id="e1">
      <tm:tu id="u1.1">Namespace is very simple.</tm:tu>
      <tm:tu id="u1.2">It is easy to use.</tm:tu>
    </tm:te>
  </para>
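As a rough sketch of how the tm namespace view above could be produced from the original view, the following Python uses the standard `xml.etree.ElementTree` module. Several things here are assumptions rather than part of xml:tm: the function name `add_text_memory`, the naive regular-expression sentence splitter (a real tool would more likely follow SRX segmentation rules), and the restriction to elements that contain plain text only. The namespace URI and the id pattern are copied from the slides.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as written on the slide
ET.register_namespace("tm", TM_NS)

def add_text_memory(root, text_tags=("para",)):
    """Wrap the text of each text-bearing element in tm:te / tm:tu elements.

    Simplification: only elements holding plain text (no inline children)
    are handled correctly.
    """
    te_count = 0
    # Take a snapshot of the elements first, because the tree is modified below.
    for elem in list(root.iter()):
        if elem.tag not in text_tags or not (elem.text and elem.text.strip()):
            continue
        te_count += 1
        # Naive split after '.', '!' or '?' followed by whitespace (assumption).
        sentences = re.split(r"(?<=[.!?])\s+", elem.text.strip())
        elem.text = None
        te = ET.SubElement(elem, f"{{{TM_NS}}}te", {"id": f"e{te_count}"})
        for n, sentence in enumerate(sentences, start=1):
            tu = ET.SubElement(te, f"{{{TM_NS}}}tu", {"id": f"u{te_count}.{n}"})
            tu.text = sentence
    return root

doc = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
print(ET.tostring(add_text_memory(doc), encoding="unicode"))
```

Because the added elements live in their own namespace, a consumer that ignores the tm namespace still sees the original para content, which is the sense in which the slides call the namespace transparent.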

xml:tm Text Memory
• Author memory
  – Maintain memory of source text
  – Authoring statistics
  – Authoring tool input
• Translation memory
  – Automatic alignment
  – Maintain perfect link of source and target text
  – Reduce translation costs

xml:tm DOM differencing
[Diagram: the text units of the source document (tu id="1" to id="6") compared with those of the updated source document. Units are marked as deleted, modified, or new; one unit in the updated document carries id="8" together with origid="5".]

xml:tm Author Memory
• Namespace-aware DOM differencing
• Identify changes from the previous version
• Unique text unit identifiers are maintained
• Modification history
• Text units can be loaded into a database
• Authoring environment integration
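The idea of identifying changes while keeping text unit identifiers stable can be illustrated with a much simplified comparison. The sketch below is not the namespace-aware DOM differencing named on the slide: it compares two flat lists of sentences with `difflib.SequenceMatcher`, and the function name `diff_text_units`, the integer ids, and the rule that replaced units are paired position by position are assumptions made for the example.

```python
from difflib import SequenceMatcher

def diff_text_units(old, new_texts):
    """Compare text units of a previous version with updated sentences.

    old: list of (id, text) pairs from the previous version.
    new_texts: list of sentence strings from the updated document.
    Returns (updated, deleted): 'updated' lists dicts with id, text, status
    and, for modified units, origid; 'deleted' lists ids that disappeared.
    """
    next_id = max((uid for uid, _ in old), default=0) + 1
    old_texts = [text for _, text in old]
    updated, deleted = [], []
    matcher = SequenceMatcher(None, old_texts, new_texts, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged units keep their existing identifiers.
            for k in range(i2 - i1):
                updated.append({"id": old[i1 + k][0],
                                "text": new_texts[j1 + k],
                                "status": "unchanged"})
            continue
        # Pair replaced units position by position (assumption); anything
        # left over is a pure insertion or a pure deletion.
        paired = min(i2 - i1, j2 - j1)
        for k in range(j2 - j1):
            unit = {"id": next_id, "text": new_texts[j1 + k], "status": "new"}
            if k < paired:
                unit["status"] = "modified"
                unit["origid"] = old[i1 + k][0]
            updated.append(unit)
            next_id += 1
        deleted.extend(uid for uid, _ in old[i1 + paired:i2])
    return updated, deleted

old = [(1, "A."), (2, "B."), (3, "C."), (4, "D."), (5, "E old."), (6, "F.")]
updated, deleted = diff_text_units(old, ["A.", "B.", "D.", "E new.", "F.", "G."])
```

A modified unit gets a fresh id and records its predecessor in `origid`, so the earlier translation can still be found and offered as a fuzzy match.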

xml:tm Translation Memory
• The tm namespace can be used to create XLIFF files
• Automatic alignment of source and target languages
• Allows for more focused translation matching
  – Perfect matching
  – Leveraged matching from document - identical text
  – Leveraged matching from database
  – Modified text unit matching
  – Linguistically enhanced fuzzy matching
  – Non-translatable text unit identification

xml:tm translation
[Diagram: three columns labelled Source Document, XLIFF Document, and Translated Document. Each tu (id="1" to id="6") in the source document corresponds to the trans-unit with the same id in the XLIFF document and to a tu in the translated document.]
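The mapping from tu elements to XLIFF trans-unit elements might look like the sketch below. It assumes the XLIFF 1.2 namespace and a minimal file skeleton (the deck does not say which XLIFF version was used), and the function name `to_xliff`, the default file name, and the language codes are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"                           # as written on the slides
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # assumed XLIFF version
ET.register_namespace("", XLIFF_NS)

def to_xliff(root, original="source.xml", source_lang="en", target_lang="pl"):
    """Build a minimal XLIFF tree with one trans-unit per tm:tu, reusing its id."""
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for tu in root.iter(f"{{{TM_NS}}}tu"):
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": tu.get("id")})
        source = ET.SubElement(unit, f"{{{XLIFF_NS}}}source")
        # Inline markup is flattened to plain text in this sketch.
        source.text = "".join(tu.itertext()).strip()
    return xliff

doc = ET.fromstring(
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="u1.1">Namespace is very simple.</tm:tu>'
    '<tm:tu id="u1.2">It is easy to use.</tm:tu></tm:te></para>')
xliff = to_xliff(doc)
print(ET.tostring(xliff, encoding="unicode"))
```

Reusing the tu id as the trans-unit id is what lets the translated text be merged back into the right place without a separate alignment step.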

xml:tm translated document view
[Diagram: the translated document view (section, title, para, tekst) shown alongside the translated tm namespace view (te, tu, zdanie). The Polish labels tekst and zdanie mean "text" and "sentence".]

xml:tm perfect alignment
[Diagram: tu id="1" to id="6" in the source document are linked one-to-one ("perfect alignment") with the text units of the translated document.]
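Because source and translated documents share text unit identifiers, alignment reduces to a join on id. A minimal sketch, assuming both documents carry tm:tu elements with matching id attributes; the function name `align_by_id` and the sample Polish text are illustrative only.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # as written on the slides

def unit_texts(root):
    """Map each tm:tu id to its flattened text."""
    return {tu.get("id"): "".join(tu.itertext()).strip()
            for tu in root.iter(f"{{{TM_NS}}}tu")}

def align_by_id(source_root, target_root):
    """Return (id, source_text, target_text) for every id found in both documents."""
    target = unit_texts(target_root)
    return [(uid, text, target[uid])
            for uid, text in unit_texts(source_root).items()
            if uid in target]

source = ET.fromstring(
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="u1.1">Namespace is very simple.</tm:tu>'
    '<tm:tu id="u1.2">It is easy to use.</tm:tu></tm:te></para>')
target = ET.fromstring(
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="u1.1">Przestrzeń nazw jest bardzo prosta.</tm:tu>'
    '<tm:tu id="u1.2">Jest łatwa w użyciu.</tm:tu></tm:te></para>')
pairs = align_by_id(source, target)
```

No statistical sentence alignment is involved, which is the contrast with the traditional translation memory approach described earlier in the deck.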

xml:tm matching
[Diagram: the text units of the updated source document are matched to produce a matched target document. Labels shown in the diagram: perfect matching, non translatable, new, fuzzy match, doc leveraged match, DB leveraged match (from the DB), with the outcomes "requires no translation", "requires translation", and "requires proofing".]
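One way to read the matching categories is as a cascade applied to each text unit of the updated document. The sketch below is an interpretation, not the xml:tm algorithm: the order of the checks, the pairing of each match type with an outcome, the letter-based test for non-translatable text, and the function name `classify` are all assumptions.

```python
import re

def classify(unit, doc_memory, db_memory):
    """Return (match_type, action) for one text unit.

    unit: dict with 'text' and 'status' ('unchanged', 'modified' or 'new').
    doc_memory: source text -> translation from the previous document version.
    db_memory: source text -> translation from a wider database.
    """
    text = unit["text"]
    # Text with no letters at all (numbers, punctuation) is treated as
    # non-translatable; this test is an assumption.
    if not re.search(r"[^\W\d_]", text):
        return ("non-translatable", "requires no translation")
    # An unchanged unit keeps its id, so its existing translation still applies.
    if unit["status"] == "unchanged":
        return ("perfect match", "requires no translation")
    # Identical text found elsewhere in the same document.
    if text in doc_memory:
        return ("document leveraged match", "requires proofing")
    # Identical text found in the database.
    if text in db_memory:
        return ("database leveraged match", "requires proofing")
    # A modified unit can reuse the translation of its predecessor as a start.
    if unit["status"] == "modified":
        return ("fuzzy match", "requires proofing")
    return ("new", "requires translation")

doc_memory = {"It is easy to use.": "Jest łatwa w użyciu."}
db_memory = {"Click OK.": "Kliknij OK."}
print(classify({"text": "Click OK.", "status": "new"}, doc_memory, db_memory))
```

Ordering the checks from cheapest outcome to most expensive means a unit is only sent for full translation when no reuse is possible.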

xml:tm Traditional Translation Scenario
[Diagram: a workflow split between Publishing and Translation. Labels shown: source text, extract, extracted text, tm process, prepared text, translate, translated text, QA, merge, target text.]

xml:tm Translation Scenario
[Diagram: the xml:tm workflow on the Publishing side. Labels shown: xml source text, extract (automatic process), extracted text, tm process with perfect matching and leveraged matching, prepared text, Web service/interface, translate (translator), QA, merge (automatic process), xml target text.]

xml:tm benefits
• Enterprise-level scalability
• Totally integrated within the XML framework
• Source text is automatically extracted and matched
• Word counts are controlled by the customer
• Text can be presented for translation via the web
• Online composition
• The most up-to-date translation is held by the customer
• Data is merged automatically at the end of the translation cycle
• All memory operations are totally automated
• Can be used transparently for relay translations
• Much cheaper to run
• More accurate: better matching

xml:tm
• Fully specified XML-based standard:
  – http://www.xml-intl.com/docs/specification/xml-tm.html
• Maintained by xml-intl.com
  – http://www.xml-intl.com/dtd/tm.dtd
  – http://www.xml-intl.com/dtd/tm.xsd
• Detailed article on www.xml.com
• Offered for consideration as a LISA standard

xml:tm
Any questions?