An Open Localisation Interface to CMS using OASIS

An Open Localisation Interface to CMS using OASIS Content Management Interoperability Services
Aonghus Ó hAirt, Dominic Jones, Leroy Finn and David Lewis
Centre for Next Generation Localisation, Trinity College Dublin

Challenges for Interoperability
• More iterative workflows
  • from push-based hand-offs to change notification and fine-grained retrievals
• Extending data management to support innovation
  • Statistical MT, named entity recognition, text analytics for QA and terminology management
  • All require up-to-date, relevant training corpora
• Solutions must sit comfortably with the technology of content management, web publishing and localisation
• No one standard can address it all
• Integrating ITS and CMIS for L10n, and a little on XLIFF, RDF and Open Provenance

CMS-TMS Interoperability Roadblock
• High variety:
  • Content Management Systems
  • Content formats
• Increasingly dynamic
• Add language resource curation
  • Data-driven MT, text analytics
[Figure: content workflow between client and LSP (create, prepare, translate, QA and publish content), showing the roles involved (design, editors, translators) and the tools used (CMS, CMS/CVS, terminology tools, TMS, CAT, QA tools, web CMS)]

Internationalisation Tag Set (ITS)
• Allows i18n and L10n tools to be instructed to treat specific text in specific ways
• Principles:
  • Minimise disturbance of original content
  • Don't reinvent the wheel
  • Link to existing metadata before adding new
  • Define distinct, independent data categories
• Identify relevant text using:
  • Attributes on existing elements: LOCAL selection
  • XPath selectors in a special element: GLOBAL selection
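To illustrate LOCAL and GLOBAL selection for the Translate data category, here is a minimal Python sketch using the standard library's ElementTree. The sample document and the `translate_flags` helper are hypothetical; the sketch supports only `//name` selectors, because ElementTree implements a subset of XPath, and it ignores the inheritance of Translate values by child elements.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE_ATTR = f"{{{ITS_NS}}}translate"      # its:translate in Clark notation
TRANSLATE_RULE = f"{{{ITS_NS}}}translateRule"  # its:translateRule

# Hypothetical sample document: one GLOBAL rule and one LOCAL override.
DOC = f"""<doc xmlns:its="{ITS_NS}">
  <its:rules version="1.0">
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p>Run <code>make</code> first.</p>
  <p>Click the <code its:translate="yes">Save</code> button.</p>
</doc>"""


def translate_flags(root):
    """Map each element to True (translatable) or False (do not translate).

    Simplifications: values are not inherited by descendants, and only
    selectors of the form "//name" are supported.
    """
    # ITS default for the Translate data category: element content is translatable.
    flags = {el: True for el in root.iter()}
    # GLOBAL selection: apply translateRule elements in document order,
    # so a later rule overrides an earlier one.
    for rule in root.iter(TRANSLATE_RULE):
        selector = rule.get("selector")
        if not selector.startswith("//"):
            raise ValueError(f"unsupported selector: {selector}")
        for el in root.iterfind("." + selector):
            flags[el] = rule.get("translate") == "yes"
    # LOCAL selection: an its:translate attribute overrides any global rule.
    for el in root.iter():
        local = el.get(TRANSLATE_ATTR)
        if local is not None:
            flags[el] = local == "yes"
    return flags


root = ET.fromstring(DOC)
flags = translate_flags(root)
print([(code.text, flags[code]) for code in root.iter("code")])
# [('make', False), ('Save', True)]
```

A full ITS processor would need XPath 1.0 evaluation (for example via lxml) and inheritance; the point here is only that the local attribute on the second `code` element overrides the global rule.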

C. Lieske, F. Sasaki 2010

ITS 1.0 Data Categories
• Translate: mark whether the content of an element or attribute should be translated or not
• Localization Note: communicate notes to localizers about a particular item of content
• Terminology: mark terms and optionally associate them with information, such as definitions
• Directionality: specify the base writing direction of blocks, embeddings and overrides for the Unicode Bidirectional Algorithm
• Ruby: provide a short annotation of an associated base text, particularly useful for East Asian languages
• Language Information: express the language of a given piece of content
• Elements Within Text: identify how an element behaves relative to its surrounding text, e.g. for text segmentation purposes

ITS 2.0 Draft Data Categories
• I18n:
  • Locale Filter
  • External Resource
  • Preserve Space
  • Allowed Characters
  • Storage Size
  • ID Value
• Language Technology:
  • Domain
  • MT Confidence
  • Disambiguation
  • Text Analysis Annotation
• Provenance & QA:
  • Quality Issue
  • Quality Précis
  • Translation Provenance Agent
  • Translation Revision Provenance Agent
  • Standoff Provenance

ITS and Content Management
• Global ITS rules can be defined in an external file
• Data category values are applied to a node with the following precedence (highest first):
  • LOCAL attributes
  • Embedded GLOBAL rules, in reverse order (the last matching rule wins)
  • External GLOBAL rules, in reverse order
• ITS allows tool-specific mechanisms for associating global rules with content; their precedence is not specified
• Common practice is to apply a given set of rules to all documents in a project that share the same schema
• Can this scale to multiple, overlapping schemas?
• Can we use a CMS-level metadata interoperability solution?
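The precedence order for the Translate data category can be expressed as a small resolution function. This is a minimal Python sketch: `TranslateRule` and `resolve_translate` are hypothetical names, XPath matching is delegated to a caller-supplied predicate, and because ITS leaves the precedence of tool-specific external rules unspecified, the sketch simply assumes external rules rank below embedded ones.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslateRule:
    selector: str    # XPath selector, e.g. "//code"
    translate: bool  # True for translate="yes", False for translate="no"


def resolve_translate(matches: Callable[[str], bool],
                      local: Optional[bool],
                      embedded: list[TranslateRule],
                      external: list[TranslateRule],
                      default: bool = True) -> bool:
    """Resolve the Translate value for one node.

    `matches(selector)` says whether the node is selected by a selector;
    evaluating XPath is left to the caller.
    """
    # 1. A LOCAL its:translate attribute always wins.
    if local is not None:
        return local
    # 2. Embedded GLOBAL rules, then 3. external GLOBAL rules, each scanned
    #    in reverse order so that the last matching rule in a list wins.
    for rules in (embedded, external):
        for rule in reversed(rules):
            if matches(rule.selector):
                return rule.translate
    # 4. No rule applies: fall back to the data category default.
    return default


def is_code(selector: str) -> bool:
    # Pretend the node being resolved is a <code> element.
    return selector == "//code"


external = [TranslateRule("//code", False)]
embedded = [TranslateRule("//code", True), TranslateRule("//pre", False)]
print(resolve_translate(is_code, None, embedded, external))   # True
print(resolve_translate(is_code, False, embedded, external))  # False
print(resolve_translate(is_code, None, [], external))         # False
```

Keeping precedence in one function makes the open question on this slide concrete: supporting multiple, overlapping rule sets means deciding where each set sits in the lists passed to it.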

CMS Interoperability
• Integrating with a CMS requires the use of an API; until now, most CMSs have used proprietary APIs
• Proprietary interfaces to CMSs lead to limited support, vendor lock-in and poor interoperability, both between CMSs and with localisation tools
• Content Management Interoperability Services (CMIS) from OASIS offers a standardised API for interacting with CMSs
• Localisation is out of scope for CMIS
• How can CMIS facilitate the localisation of content across multiple CMSs?

OASIS Content Management Interoperability Services (CMIS)
• “defines a domain model and Web Services and Restful AtomPub bindings that can be used by applications to work with one or more Content Management repositories/systems.” (CMIS standard)
• Published in 2010
• Participation from Adobe, Alfresco, EMC, IBM, Microsoft, Oracle, SAP and others

CMIS Implementations
Alfresco 3.3+, Apache Chemistry InMemory Server, Athento, COI, Day Software CRX, EMC Documentum, eXo Platform with xCMIS, Fabasoft, HP Autonomy Interwoven Worksite, IBM Content Manager, IBM FileNet Content Manager, IBM Content Manager On Demand, IBM Connections Files, IBM LotusLive Files, IBM Lotus Quickr Lists, ISIS Papyrus Objects, KnowledgeTree 3.7+, Maarch 1.3, Magnolia (CMS) 4.5, Microsoft SharePoint Server 2010, NCMIS, NemakiWare, Nuxeo Platform 5.5, O3spaces 3.2+, OpenIMS, OpenWGA 5.2+, PTC Windchill, SAP NetWeaver Cloud Document, Seapine Surround SCM 2011.1, Sense/Net 6.0+, TYPO3, VB.CMIS

CMIS Objects
• A repository is a container of objects
• Objects have four base types:
  • Document object: “elementary information entities managed by the repository”
  • Folder object: “serves as the anchor for a collection of fileable objects”
  • Relationship object: “instantiates an explicit, binary, directional, non-invasive, and typed relationship between a Source Object and a Target Object”
  • Policy object: “represents an administrative policy that can be enforced by a repository, such as a retention management policy.”
(CMIS Specification)
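The four base types can be pictured with a small in-memory model. This Python sketch is illustrative only: the `cmis:*` base type IDs and property IDs (`cmis:objectTypeId`, `cmis:sourceId`, `cmis:targetId`, `cmis:name`) are those defined by CMIS, but `CmisObject`, `relate` and the custom relationship type `l10n:translationOf` are hypothetical and not part of any CMIS client library.

```python
from dataclasses import dataclass, field
from enum import Enum


class BaseType(Enum):
    """The four CMIS base object types, keyed by their CMIS type IDs."""
    DOCUMENT = "cmis:document"
    FOLDER = "cmis:folder"
    RELATIONSHIP = "cmis:relationship"
    POLICY = "cmis:policy"


@dataclass
class CmisObject:
    object_id: str
    base_type: BaseType
    properties: dict = field(default_factory=dict)


def relate(rel_id: str, source: CmisObject, target: CmisObject,
           type_id: str) -> CmisObject:
    """Create a typed, directional relationship from source to target.

    The relationship is a separate object, so neither endpoint's
    properties are modified.
    """
    return CmisObject(rel_id, BaseType.RELATIONSHIP, {
        "cmis:objectTypeId": type_id,
        "cmis:sourceId": source.object_id,
        "cmis:targetId": target.object_id,
    })


en = CmisObject("doc-en", BaseType.DOCUMENT, {"cmis:name": "guide.xml"})
fr = CmisObject("doc-fr", BaseType.DOCUMENT, {"cmis:name": "guide-fr.xml"})
# "l10n:translationOf" is a hypothetical custom relationship type.
link = relate("rel-1", fr, en, "l10n:translationOf")
print(link.base_type.value, link.properties["cmis:sourceId"], "->",
      link.properties["cmis:targetId"])
# cmis:relationship doc-fr -> doc-en
```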

CMS-L10n Interoperability: Two Requirements
• Flexible bindings between ITS rules and documents
  • The same rule applied to multiple documents
  • Multiple rules applied to individual documents
  • Specify the precedence order in which rules are processed for a document
  • Aim: support external ITS rules via CMIS
• Signalling L10n-relevant updates to documents
  • The MLW-LT (ITS 2.0) Working Group identified a requirement for such ‘readiness’ signalling
  • Aim: support open, asynchronous change notification for CMIS
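The first requirement amounts to a many-to-many mapping between rule files and documents, with an explicit processing order per document. The Python sketch below models it in memory; `RuleBindings`, the file paths and the integer `order` field are assumptions for illustration, not part of CMIS or ITS. In the design presented here, such bindings would be held in the repository itself rather than in a client-side dictionary.

```python
class RuleBindings:
    """Hypothetical many-to-many binding of external ITS rule files to documents.

    Each binding carries an explicit precedence order; rule files are returned
    lowest order first, so that (as with ITS global rules) a file processed
    later can override an earlier one.
    """

    def __init__(self):
        self._by_doc: dict[str, list[tuple[int, str]]] = {}

    def bind(self, rules_uri: str, doc_ids: list[str], order: int) -> None:
        # The same rule file may be bound to many documents...
        for doc_id in doc_ids:
            self._by_doc.setdefault(doc_id, []).append((order, rules_uri))

    def rules_for(self, doc_id: str) -> list[str]:
        # ...and one document may have many rule files, in a defined order.
        return [uri for _, uri in sorted(self._by_doc.get(doc_id, []))]


bindings = RuleBindings()
bindings.bind("rules/project.xml", ["doc-a", "doc-b"], order=1)
bindings.bind("rules/legal-terms.xml", ["doc-a"], order=2)
print(bindings.rules_for("doc-a"))  # ['rules/project.xml', 'rules/legal-terms.xml']
print(bindings.rules_for("doc-b"))  # ['rules/project.xml']
```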

Design: Extending CMIS Implementations
• Two approaches to modelling the localisation information:
  • Custom content modelling
  • Alfresco aspects
• Implementation in repository:
  • Alfresco (primary)
  • Nuxeo (basic testing)

ITS Rules Using Policy Objects
Translate rules as policy objects

ITS Rules as Folders
Translate rules as folder objects

Signalling Readiness from CMS
• Readiness metadata indicates the readiness of a document for submission to L10n processes, or provides an estimate of when it will be ready for a particular process
• Data model:
  • ready-to-process: type of process to be performed next
  • process-ref: a pointer to an external set of process type definitions used for ready-to-process
  • ready-at: the time at which the content is ready for the process; this may be in the past or in the future
  • revised: indicates whether this is a different version of content that was previously marked as ready for the declared process
  • priority: high or low
  • complete-by: target date-time for completing the process
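The readiness data model maps naturally onto a typed record. The Python sketch below assumes snake_case field names for the hyphenated properties, a hypothetical process type `translate` and a placeholder `process_ref` URI; the `is_ready_for` check is an illustrative helper, not part of the model as presented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Readiness:
    ready_to_process: str      # type of process to be performed next
    process_ref: str           # pointer to external process type definitions
    ready_at: datetime         # may be in the past or in the future
    revised: bool = False      # new version of content previously marked ready?
    priority: str = "low"      # "high" or "low"
    complete_by: Optional[datetime] = None  # target completion date-time

    def __post_init__(self):
        if self.priority not in ("high", "low"):
            raise ValueError(f"priority must be 'high' or 'low': {self.priority!r}")

    def is_ready_for(self, process: str, now: datetime) -> bool:
        """True if the content is ready for `process` at time `now`."""
        return self.ready_to_process == process and self.ready_at <= now


utc = timezone.utc
r = Readiness(
    ready_to_process="translate",                           # illustrative type
    process_ref="http://example.org/l10n-process-types",    # hypothetical URI
    ready_at=datetime(2012, 6, 1, 9, 0, tzinfo=utc),
    priority="high",
    complete_by=datetime(2012, 6, 8, 17, 0, tzinfo=utc),
)
print(r.is_ready_for("translate", datetime(2012, 5, 31, tzinfo=utc)))  # False
print(r.is_ready_for("translate", datetime(2012, 6, 2, tzinfo=utc)))   # True
```

Because `ready_at` may lie in the future, a consumer can plan work before the content is actually ready, which is what distinguishes readiness from a simple "changed" flag.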

Polling Extension to CMIS
• Polling schemes describe the way in which documents are polled for updated readiness properties:
  • scheme name / ID
  • polling interval
  • notification method
  • notification target / host port (for network connection)
  • readiness property
  • readiness value
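A single pass of such a poller can be sketched as follows. `PollingScheme` mirrors the scheme fields listed above; the property name `l10n:readyToProcess`, the `socket` notification method and the rule of notifying each document only once are assumptions made for illustration, and the sketch reads an in-memory dictionary instead of querying a CMIS repository.

```python
from dataclasses import dataclass


@dataclass
class PollingScheme:
    scheme_id: str
    interval_s: float          # polling interval in seconds
    notification_method: str   # e.g. "socket" (illustrative)
    target_host: str           # notification target
    target_port: int
    readiness_property: str    # property to watch
    readiness_value: str       # value that triggers a notification


def poll_once(scheme: PollingScheme, documents: dict[str, dict],
              notified: set[str]) -> list[str]:
    """One polling pass over {doc_id: properties}.

    Returns the IDs of documents whose readiness property now equals the
    scheme's readiness value and that have not been notified yet.
    """
    hits = []
    for doc_id, props in documents.items():
        ready = props.get(scheme.readiness_property) == scheme.readiness_value
        if ready and doc_id not in notified:
            hits.append(doc_id)
            notified.add(doc_id)
    return hits


scheme = PollingScheme("translate-ready", 30, "socket", "localhost", 9000,
                       "l10n:readyToProcess", "translate")
repo = {"doc-1": {"l10n:readyToProcess": "translate"},
        "doc-2": {"l10n:readyToProcess": "review"}}
notified: set[str] = set()
print(poll_once(scheme, repo, notified))  # ['doc-1']
repo["doc-2"]["l10n:readyToProcess"] = "translate"
print(poll_once(scheme, repo, notified))  # ['doc-2']
```

A running poller would call `poll_once` every `interval_s` seconds and send each hit to `target_host:target_port`; re-notifying documents whose `revised` flag is set is left out of this sketch.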

Polling sequence

Readiness modelled as a custom object
Readiness modelled with an aspect

Polling Schemes

Document model with localisation

Technical Setup
• Repository browser tool
• Polling system
• Notification system
• Test tools

Evaluation: Notification Response Time
[Figure: two charts of notification time (seconds) against polling scheme (interval in seconds), each comparing the polling interval time with the mean notification time]

Evaluation: Performance
[Figure: memory usage (%) over time (seconds) for alfresco, mysql, the poller and the simulator]

Content Management - L10n Workflow Integration
[Figure: workflow linking the source CMS and target CMS through localisation preparation (parse, filter, segment; ITS), workflow management and translation management (MT, TM, CAT, web-based post-editing), followed by reassembly and a QA viewer; content is exchanged as XLIFF+ITS, with an XLIFF store and an RDF provenance store (XLIFF/PROV)]

XLIFF and Open Provenance
• Capture XLIFF transformations that operate on content and its metadata as the result of content processing by different localisation workflow services
• A provenance model is used to capture process operations
  • agents and properties of those processes
• Support managing and auditing the quality of processes
  • correlating the output of individual steps with professional, crowd and consumer judgement
  • supporting end-to-end process management
  • terminology management
• On-demand language resource assembly
  • e.g. parallel text for MT training
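Capturing process operations as provenance can be sketched as a log of subject-predicate-object triples, one group per workflow step. In this minimal Python sketch, `ProvenanceLog`, the generated `itemN`/`processN` identifiers, the process type names and the agent names are hypothetical; predicates such as `wasGeneratedBy`, `wasControlledBy` and `wasGeneratedAt` follow Open Provenance usage, whereas a deployment would store proper RDF resources and vocabulary URIs rather than plain strings.

```python
from datetime import datetime, timezone
from itertools import count
from typing import Optional


class ProvenanceLog:
    """In-memory (subject, predicate, object) log of localisation steps."""

    def __init__(self):
        self.triples: list[tuple[str, str, str]] = []
        self._ids = count(1)

    def record_step(self, process_type: str, agent: str, value: str,
                    at: datetime, derived_from: Optional[str] = None,
                    relation: str = "wasDerivedFrom") -> str:
        """Log one step that produced `value`; return the new item's ID."""
        n = next(self._ids)
        item, process = f"item{n}", f"process{n}"
        self.triples += [
            (process, "type", process_type),
            (process, "wasControlledBy", agent),
            (item, "value", value),
            (item, "wasGeneratedBy", process),
            (item, "wasGeneratedAt", at.isoformat()),
        ]
        if derived_from is not None:
            self.triples.append((item, relation, derived_from))
        return item


utc = timezone.utc
log = ProvenanceLog()
src = log.record_step("Authoring", "author-1", "I am a string",
                      datetime(2012, 2, 9, 12, 0, tzinfo=utc))
mt = log.record_step("MachineTranslation", "mt-engine", "Je suis une string",
                     datetime(2012, 2, 9, 12, 30, tzinfo=utc),
                     derived_from=src, relation="wasTranslatedFrom")
pe = log.record_step("PostEditing", "editor-1", "Je suis une chaîne",
                     datetime(2012, 2, 12, 13, 17, tzinfo=utc),
                     derived_from=mt, relation="wasRevisedFrom")
print(len(log.triples))  # 17
```

Recording the controlling agent of every step is what later allows QA judgements to be correlated with individual professional, crowd or machine contributions.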

Linked Localisation Data: RDF-based Logging
• Open Provenance Vocabulary
  • http://openprovenance.org/
  • Active W3C Provenance Working Group

LT Assisted Localisation Process Provenance
[Figure: example provenance graph for an LT-assisted localisation process. Processes (machine translation, professional translation, text classification, translation QA, crowd rating and crowd post-editing) are linked to their controlling agents by wasControlledBy; content items such as “I am a string” and its French translations are linked to those processes and to each other by wasGeneratedBy, wasGeneratedAt, wasTranslatedFrom, wasRevisedFrom and wasAnnotatedWith]
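A provenance graph like this one can be queried to support process auditing, for example to list who controlled each step that led to a given target string. The Python sketch below walks `wasTranslatedFrom` and `wasRevisedFrom` edges back towards the source; the triples are hypothetical sample data, not the data shown in the figure, and a real system would run an equivalent query over an RDF store.

```python
from typing import Optional


def lineage(triples, item: str,
            relations=("wasTranslatedFrom", "wasRevisedFrom")):
    """Walk derivation edges back from `item` towards its original source.

    Returns [(item, controlling agent or None), ...], newest item first.
    """
    # Index (subject, predicate) -> [objects] for constant-time lookups.
    index: dict[tuple[str, str], list[str]] = {}
    for s, p, o in triples:
        index.setdefault((s, p), []).append(o)

    def first(subject: Optional[str], predicate: str) -> Optional[str]:
        values = index.get((subject, predicate), [])
        return values[0] if values else None

    chain, seen = [], set()
    current: Optional[str] = item
    while current is not None and current not in seen:  # stop on cycles
        seen.add(current)
        process = first(current, "wasGeneratedBy")
        chain.append((current, first(process, "wasControlledBy")))
        # Follow the first derivation relation present on this item.
        parent = None
        for relation in relations:
            parent = first(current, relation)
            if parent is not None:
                break
        current = parent
    return chain


# Hypothetical sample graph: source -> machine translation -> post-edit.
TRIPLES = [
    ("s1", "wasGeneratedBy", "p1"), ("p1", "wasControlledBy", "author-1"),
    ("s2", "wasGeneratedBy", "p2"), ("p2", "wasControlledBy", "mt-engine"),
    ("s2", "wasTranslatedFrom", "s1"),
    ("s3", "wasGeneratedBy", "p3"), ("p3", "wasControlledBy", "editor-1"),
    ("s3", "wasRevisedFrom", "s2"),
    ("s3", "wasAnnotatedWith", "qa-1"),
]
for step in lineage(TRIPLES, "s3"):
    print(step)
# ('s3', 'editor-1')
# ('s2', 'mt-engine')
# ('s1', 'author-1')
```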

Future LSP-Neutral Open Service: CMIS+ITS+PROV
[Figure: the client's source CMS, target CMS and QA viewer connect through common services (content status/update, provenance query, resource curation/sharing) using CMIS+ITS+XLIFF+PROV; LSPs' TMS and L10n tools exchange XLIFF+ITS with these services]

Conclusion
• Have extended CMIS to support:
  • document-level ITS rules
  • an open document change notification mechanism
• Strong potential to streamline CMS-L10n integration in combination with XLIFF and PROV
• Achieved with the current CMIS specification:
  • custom extension to the folder object
  • a custom extension to the policy object may be better
Next Steps
• Combining standards for vendor-neutral CMS integration
• Align with ITS 2.0 and XLIFF 2.0
• Discuss extensions with CMIS-compliant vendors

Questions? Thank you.
Follow the ITS use case at: http://www.w3.org/International/multilingualweb/lt/wiki/CMS_Neutral_External_ITS_Rules_and_Readiness
Follow the XLIFF+ITS mapping at: http://www.w3.org/International/multilingualweb/lt/wiki/XLIFF_Mapping