Common Locale Data: Process, Issues, and Challenges
George Rhoten, grhoten@us.ibm.com
Steven R. Loomis, srloomis@us.ibm.com
IBM San José Globalization Center of Competency
Copyright © 2004 IBM Corporation
26th Internationalization and Unicode Conference, San José, CA, September 2004

Locale Data Confusion
Variations in localized data can irritate or confuse users…
• OS #1: 2003-02-17 (févr.)
• OS #2: 03-02-17 (fév)

Locale Data Problems
Mismatched data can be catastrophic…
• OS #1: Searching… 24 records in {Z..Aa}
• OS #2: Searching… 0 records in {Z..Aa}. NOT FOUND.
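One way the same range query can return records on one system and nothing on another is a difference in sort order. The Python sketch below is only an illustration under that assumption: it compares plain code point order with a toy Danish-style order in which "aa" sorts after "z". The record list and the helper names (`binary_key`, `danish_like_key`, `records_in_range`) are hypothetical and are not taken from either operating system.

```python
# Hypothetical records; only the ordering rule differs between the two "systems".
RECORDS = ["Aabenraa", "Anders", "Zoo", "Zebra", "Øst", "Mads"]


def binary_key(s: str) -> str:
    """Code point order: 'Aa' sorts before 'Z'."""
    return s


def danish_like_key(s: str) -> str:
    """Toy Danish-style order: case-insensitive, 'aa' treated as 'å', and
    æ, ø, å placed after 'z'.

    Assumption: a real collator applies many more rules; this models only the
    ones needed to show the effect.
    """
    folded = s.casefold().replace("aa", "å")
    # '{', '|', '}' are the three code points directly after 'z'.
    return folded.translate(str.maketrans({"æ": "{", "ø": "|", "å": "}"}))


def records_in_range(records, low, high, key):
    """Return the records r with key(low) <= key(r) <= key(high), sorted by key."""
    lo, hi = key(low), key(high)
    return sorted((r for r in records if lo <= key(r) <= hi), key=key)


print(records_in_range(RECORDS, "Z", "Aa", binary_key))
print(records_in_range(RECORDS, "Z", "Aa", danish_like_key))
```

Under code point order the range {Z..Aa} is empty because "Z" sorts after "Aa"; under the toy Danish-style order the same bounds enclose every record from "Z" onward.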

What is Locale Data?
• Locale = identifier string referring to linguistic and cultural preferences
• Typical data
  – Dates/times
  – Numbers
  – Measurement
  – Currency
  – Sorting (Collation)
  – Translated country and language names

Where is locale data found?
• International Components for Unicode (ICU)
• OpenOffice.org
• Operating Systems
  – Linux, Solaris, AIX, Windows, …
• Java
• Other vendors: PeopleSoft, Oracle, …

Current Status
• CLDR 1.1.1: June 8, 2004
• CLDR 1.2: In progress now (Sep 2004)
  – Common locale data repository
• Sponsored by the Unicode Consortium
  http://unicode.org/cldr

Repository Objectives
• Common XML format for locale data
• Collect data from platforms
• Make repository available to the public
• Validate and release corrected data
• Enable W3C Web Services
  – Exchange and display of data in localized form
  – Consistent behavior between services

Vetting Process
1. Data collected from different platforms, experts
2. Data is in the repository marked “Draft”
3. Verify data with experts. Mark as compliant to national standards.
4. Vote to accept data; may have to choose between multiple “Acceptable” forms
5. Data becomes “Final”.

Locale Data Markup Language
• XML "vocabulary" for locale data interchange
• Data stored in separate files (fr.xml or cs_CZ.xml)
• Inheritance used: ‘root.xml’ for the root locale, ‘fr.xml’ for French, ‘fr_CA.xml’ for French, Canada
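To make the "one XML file per locale" idea concrete, here is a small Python sketch that reads a locale file with the standard library. The embedded fragment is a simplified, hand-written illustration in the spirit of LDML, not a verbatim CLDR file; the element names used here and the helper `locale_id` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the contents of 'fr_CA.xml'.
FR_CA_XML = """<ldml>
  <identity>
    <language type="fr"/>
    <territory type="CA"/>
  </identity>
  <numbers>
    <symbols>
      <decimal>,</decimal>
    </symbols>
  </numbers>
</ldml>"""


def locale_id(xml_text: str) -> str:
    """Build an identifier such as 'fr_CA' from the <identity> element."""
    identity = ET.fromstring(xml_text).find("identity")
    parts = [identity.find(tag) for tag in ("language", "script", "territory", "variant")]
    # Elements that are absent from the file are simply skipped.
    return "_".join(p.get("type") for p in parts if p is not None)


def decimal_symbol(xml_text: str):
    """Return the decimal separator, or None if this file does not define one."""
    return ET.fromstring(xml_text).findtext("numbers/symbols/decimal")


print(locale_id(FR_CA_XML), repr(decimal_symbol(FR_CA_XML)))
```

A value missing from one file (here, `None` from `decimal_symbol`) is the cue to look in the parent locale's file.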

Locale Naming
• ISO-639 + ISO-15924 + ISO-3166 + Variant:
  – en: English
  – fr_BE: French as in Belgium
  – zh_Hant: Traditional Chinese
  – hy_AM_REVISED: Armenian, Armenia (Revised Orthography)
• or RFC-3066
• with Keywords:
  – de_DE@collation=phonebook: German as in Germany, Phonebook collation.
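The naming scheme above can be split mechanically into its fields. The following Python sketch is a minimal parser written for this illustration only: the `LocaleId` type and `parse_locale` function are hypothetical names, the field tests (4 letters for a script, 2 letters or 3 digits for a region) are simplifying assumptions, and the ";" separator between multiple keywords is assumed rather than taken from the slide.

```python
from typing import NamedTuple


class LocaleId(NamedTuple):
    language: str
    script: str
    region: str
    variant: str
    keywords: dict


def parse_locale(identifier: str) -> LocaleId:
    """Split e.g. 'de_DE@collation=phonebook' into its fields."""
    base, _, keyword_part = identifier.partition("@")

    # Keywords: 'key=value' pairs, assumed to be separated by ';'.
    keywords = {}
    for pair in filter(None, keyword_part.split(";")):
        key, _, value = pair.partition("=")
        keywords[key] = value

    language, *rest = base.split("_")
    script = region = ""
    # Assumption: a 4-letter field right after the language is a script code.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        script = rest.pop(0)
    # Assumption: a 2-letter or 3-digit field is a region code.
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        region = rest.pop(0)
    # Whatever remains is treated as the variant.
    variant = "_".join(rest)
    return LocaleId(language, script, region, variant, keywords)


for name in ("en", "fr_BE", "zh_Hant", "hy_AM_REVISED", "de_DE@collation=phonebook"):
    print(name, "->", parse_locale(name))
```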

Scope of CLDR Data
• Naming of Locales, other entities
• Formatting
  – Numbers
  – Currency
  – Times
  – Time Zones
  – Dates / Calendars
• Collation
• Exemplar Characters
• Preferred Legacy Encoding
• Yes/No Responses
• Paper Size, Layout, and Orientation
• Measurement Units

Inheritance
• fr
  – Janvier, Février…
  – 1,234.56 ¤
  – …
• fr_CA
  – 1 234,57 $
  – …
• fr_LX
  – 1.234,57 €
  – …
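Inheritance means a child locale stores only what differs from its parent and everything else is looked up along a fallback chain. The Python sketch below shows one plausible way to implement that lookup; the `DATA` table, its keys, and its values are hypothetical placeholders, and truncating the identifier at the last underscore is an assumed fallback rule.

```python
# Hypothetical, heavily reduced locale data: each locale stores only its overrides.
DATA = {
    "root": {"decimal": ".", "group": ",", "currency_symbol": "¤"},
    "fr": {"months": ["janvier", "février"]},
    "fr_CA": {"currency_symbol": "$"},
}


def fallback_chain(locale: str) -> list:
    """'fr_CA' -> ['fr_CA', 'fr', 'root'] by dropping the last field each step."""
    chain = []
    while locale:
        chain.append(locale)
        locale = locale.rpartition("_")[0]
    chain.append("root")
    return chain


def lookup(locale: str, key: str):
    """Return the first value found while walking from the locale up to root."""
    for name in fallback_chain(locale):
        if key in DATA.get(name, {}):
            return DATA[name][key]
    raise KeyError(key)


print(fallback_chain("fr_CA"))
print(lookup("fr_CA", "currency_symbol"), lookup("fr_CA", "months"), lookup("fr_CA", "decimal"))
```

In this sketch `fr_CA` supplies its own currency symbol, takes month names from `fr`, and falls back to `root` for the decimal separator.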

Sources of CLDR Data
• IBM Cultural Information Repository
• Java
• International Components for Unicode (ICU)
• Apple
• Comparisons with Platform data
• Other contributions and bug reports
• 288 Locales, 94 Languages.

CLDR Data Process
• Data is available for evaluation
  – XML files
  – Locale Explorer & other interactive demos
  – Comparison charts
• Comments and Corrections via web site
• Changed data marked “Draft”
• Language Experts vet Draft data & conflicts
• New CLDR version released

Language & Country Contacts
• Changes are discussed on mailing lists by engineers.
• Remaining conflicts are resolved by vote of the CLDR Technical Committee (TC)
• TC members are encouraged to have their own experts available to advise them.
• All data must be vetted by TC (with advice from Contacts) before each release.

Display / Feedback
• (Prototype)

Comparison Charts
• Generated to compare CLDR data with other platforms, or other versions of CLDR
• Show at a glance the salient differences and similarities between platforms
• “Majority” data can help in detecting errors
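The "majority" idea can be sketched in a few lines of Python: if most platforms agree on a value, the ones that disagree are candidates for review. This is an illustrative assumption about how such a check might work, not a description of the actual chart generator; `majority_and_outliers` and the platform labels are hypothetical.

```python
from collections import Counter


def majority_and_outliers(values_by_platform: dict):
    """Return (majority_value, outliers) for a non-empty platform -> value mapping.

    If no value is used by more than half of the platforms, the majority is
    None and every platform is returned for manual review.
    """
    counts = Counter(values_by_platform.values())
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(values_by_platform):
        return None, dict(values_by_platform)
    return value, {p: v for p, v in values_by_platform.items() if v != value}


# Hypothetical platform labels and values.
print(majority_and_outliers(
    {"platform_a": "Suid-Afrika", "platform_b": "Suid-Afrika", "platform_c": "Suid Afrika"}
))
```

A flagged value is not necessarily wrong; regional or contextual differences can be legitimate, so the check only points a reviewer at the disagreement.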

Comparison Chart Example
nl_NL
        CLDR                          Win32                   Java
Date    yyyy 'm.' MMMM d 'd.', EEEE   yyyy 'm.' MMMM d 'd.'   EEEE, yyyy, MMMM d
        Suid-Afrika                   Suid Afrika             ZA

Causes of Conflicting Data
• Typographical errors
• Regional differences
• Context
  – Parts of Speech
• Standards vs. Common Use
• Misunderstanding
• Uncommon cases
• Individual Preferences

Typographical Errors
• Data input errors or mistakes
• If not obvious, may require Language Expert advice to correct.

Regional differences
German:
Germany, Belgium, …   Switzerland
Bangladesch           Bangladesh
Ruanda                Rwanda
Kap Verde             Kapverden

Contexts of use
• German Collation
  – Dictionary: of, öf
  – Phonebook: öf, of
  – de@collation=phonebook
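The two German orders can be approximated with simple sort keys. The Python sketch below assumes the usual conventions: dictionary order compares an umlaut like its base vowel (unaccented form first on ties), while phonebook order expands ä, ö, ü to ae, oe, ue. The function names are hypothetical, and a real collator handles many more cases than this.

```python
import unicodedata

# Phonebook convention (assumed): umlauts expand to vowel + 'e'.
PHONEBOOK_EXPANSIONS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def dictionary_key(word: str):
    """Primary key ignores diacritics; the original word breaks ties."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, word)


def phonebook_key(word: str):
    """Primary key expands umlauts; the original word breaks ties."""
    expanded = unicodedata.normalize("NFC", word).casefold().translate(PHONEBOOK_EXPANSIONS)
    return (expanded, word)


words = ["öf", "of"]
print("dictionary:", sorted(words, key=dictionary_key))
print("phonebook: ", sorted(words, key=phonebook_key))
```

With these keys, "öf" compares as "of" in dictionary order and as "oef" in phonebook order, which is why the two contexts need separate collation data selected by a keyword such as `collation=phonebook`.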

Part of Speech
• Calendar title:
  – март 2004 (March 2004)
• Date:
  – 3 марта (3rd of March)
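A locale data store can handle this by keeping more than one form of each month name and letting the caller pick by context. The Python sketch below is a minimal illustration; the context labels "stand-alone" and "format" and the table layout are assumptions made for the example, and only March is filled in.

```python
# Hypothetical table: month number -> name, per grammatical context.
MONTHS_RU = {
    "stand-alone": {3: "март"},   # nominative, used when the month stands by itself
    "format": {3: "марта"},       # genitive, used inside a full date
}


def calendar_title(year: int, month: int) -> str:
    """Title of a calendar page, e.g. 'март 2004'."""
    return f"{MONTHS_RU['stand-alone'][month]} {year}"


def day_and_month(day: int, month: int) -> str:
    """Day within a date, e.g. '3 марта'."""
    return f"{day} {MONTHS_RU['format'][month]}"


print(calendar_title(2004, 3))
print(day_and_month(3, 3))
```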

Standards vs. Common Use
• Follow national standards where possible
• Allow “Common Practice” to coexist

Misunderstandings
• Translating internal structure, programmatic keywords
• Wrong:
  – Pattern “#.###,00”
• Right:
  – “#,###.00”
  – Group: “.” Decimal: “,”
  – 1.234,56
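The point is that "," and "." inside a pattern are programmatic placeholders, and the locale's own symbols are substituted only when a number is formatted. The Python sketch below is a deliberately tiny formatter written to show this; `format_with_pattern` is a hypothetical helper that understands only "#", "0", one "," and one "." and is not the pattern syntax of any particular library.

```python
def format_with_pattern(value: float, pattern: str, group: str, decimal: str) -> str:
    """Format value using a pattern such as '#,###.00' plus locale symbols.

    In the pattern, '.' always marks the decimal point and ',' always marks
    grouping; `group` and `decimal` are the characters actually emitted.
    """
    integer_part, _, fraction_part = pattern.partition(".")
    fraction_digits = len(fraction_part)
    # Group size = number of placeholders after the last ',' in the integer part.
    group_size = len(integer_part) - integer_part.rfind(",") - 1 if "," in integer_part else 0

    digits, _, fraction = f"{abs(value):.{fraction_digits}f}".partition(".")
    if group_size:
        chunks = []
        while len(digits) > group_size:
            chunks.insert(0, digits[-group_size:])
            digits = digits[:-group_size]
        chunks.insert(0, digits)
        digits = group.join(chunks)

    sign = "-" if value < 0 else ""
    return sign + digits + (decimal + fraction if fraction_digits else "")


# Pattern left untranslated; locale symbols supplied separately.
print(format_with_pattern(1234.56, "#,###.00", group=".", decimal=","))
# A "translated" pattern is misread: '.' is taken as the decimal point.
print(format_with_pattern(1234.56, "#.###,00", group=".", decimal=","))
```

The first call produces the intended "1.234,56"; the second shows how a pattern with the symbols swapped is parsed as having no grouping and a long fraction.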

Uncommon Cases and Individual Preferences
• Obscure, Archaic, Underspecified
• Language ‘peo’
  – English: “Old Persian”
  – Maltese: “Persjan Antik”
• Currency code ‘TTO’
  – Danish: Gammel dollar fra Trinidad og Tobago
  – English: Trinidad and Tobago Old Dollar

Panel Discussion
• Common Locale Data:
  – Process, Issues and Challenges
• Panelists
  – Tex Texin, XenCraft
  – Deborah Goldsmith, Apple
  – George Rhoten, IBM
  – Raghuram Viswanadha, IBM
  – Steven R. Loomis, IBM
• http://unicode.org/cldr