Unicode
Mark Davis
President, Unicode Consortium
Chief SW Globalization Architect, IBM
2003-09-24

Universal Character Encoding
- Unique number for every character
- …

Unifies all Languages
- 96 thousand characters, so far
- All characters accessible at the same time, in the same document: A, Ž, Ш, Δ, ﺵ, …, か, 上, 각, …
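
As a small illustration of "a unique number for every character", the sketch below prints the code point of a few of the sample characters from this slide. It relies only on Python's built-in `ord` and the standard `unicodedata` module; the helper name `code_point_label` is made up for this example.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


# A few of the sample characters listed on the slide.
samples = ["A", "\u0428", "\u0394", "\u304B", "\u4E0A", "\uAC01"]

for ch in samples:
    # unicodedata.name returns the character's formal Unicode name.
    print(ch, code_point_label(ch), unicodedata.name(ch))
```

Latin, Cyrillic, Greek, Japanese, Chinese and Korean characters all come out of the same numbering scheme, which is what lets them sit side by side in one document.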

Lingua Franca for Computers
- Developed & supported by industry leaders:
  - Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, …
- Required by modern standards:
  - XML, HTML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, Perl, etc.
- Implemented in:
  - All modern operating systems, browsers, and other products

International Domain Names
- Approved - Unicode-Based
- Examples:
  - http://Юникод.com
  - http://Βαλκανίων.com
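
To make the idea of a Unicode-based domain name concrete, here is a minimal sketch that converts one of the example host names to an ASCII-compatible form and back. It uses Python's built-in `idna` codec, which follows the older IDNA 2003 rules and is chosen here only because it ships with the standard library; the variable names are illustrative.

```python
host = "\u042E\u043D\u0438\u043A\u043E\u0434.com"  # "Юникод.com"

# Encode each label to its ASCII-compatible ("xn--...") form.
ascii_host = host.encode("idna")

# Decoding returns the Unicode form; this codec case-folds the name,
# so the result is the lowercase version of the original.
round_trip = ascii_host.decode("idna")

print(ascii_host)
print(round_trip)
```

The ASCII form is what travels through existing DNS infrastructure, while users see and type the Unicode form.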

Standard Resources
- www.unicode.org
  - Online Standard
  - Technical Reports
  - FAQs
  - General Information
  - Discussion Forums, Conferences

Programming Resources
- System APIs: Windows, Java, Unix, Oracle, DB2, Sybase, Mac, Linux, …
- Languages: Java, JavaScript, C#, Perl 5.6.0, C, C++, SQL, …
- Cross-platform libraries: ICU, Rosette, …

Stability
Developers / other standards need absolute stability
- Characters are never moved or deleted
  - Ordering of characters is by collation, not binary order. See UTS #10: Unicode Collation Algorithm
  - Characters may be deprecated (discouraged)
- Characters never change names
  - Annotations are used to clarify usage
- See Unicode Policies
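
The point that ordering comes from collation rather than binary order can be seen in a few lines. The sketch below compares plain code point sorting with a deliberately crude key that drops accents and ignores case. The key function `crude_sort_key` is a made-up stand-in for illustration and is not an implementation of UTS #10; a real application would use a collation library such as ICU.

```python
import unicodedata


def crude_sort_key(word: str) -> str:
    """Rough stand-in for a collation key: drop accents, ignore case.

    This is NOT the Unicode Collation Algorithm (UTS #10); it only shows
    why code point order differs from the order readers expect.
    """
    decomposed = unicodedata.normalize("NFD", word)
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return without_marks.casefold()


words = ["zebra", "Apple", "\u00E9clair", "banana"]  # "éclair"

binary_order = sorted(words)                       # compares code points
collated_order = sorted(words, key=crude_sort_key)

print(binary_order)    # "éclair" lands after "zebra"
print(collated_order)  # "éclair" lands between "banana" and "zebra"
```

Because ordering is defined separately from the code point values, those values never need to move to fix a sort order.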

Indic Support in Unicode
- ISCII the basis for characters and allocation
- Consortium actively engaged with Indian Government, which is a member
- Welcomes addition of missing characters (e.g. Vedic), clarifications or corrections of usage

Structural Similarities with ISCII
- Within script, layout and contents nearly identical
  - Independent + dependent vowels
- Halant model for representing conjuncts / half-forms
  - not directly encoded
  - represented by sequences instead
- Phonetic sequence – order in syllables
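
The halant model can be shown with a single Devanagari conjunct. In the sketch below, the conjunct is stored as three code points (consonant, halant, consonant) and it is left to the font and layout engine to draw the joined shape. The choice of KA + SSA as the sample is ours, not the slide's.

```python
import unicodedata

# KA + VIRAMA (halant) + SSA: three code points that a suitable font
# renders as one conjunct shape.
conjunct = "\u0915\u094D\u0937"

for ch in conjunct:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

names = [unicodedata.name(ch) for ch in conjunct]
```

No separate code point exists for the conjunct itself; the sequence is the encoding, stored in the order the sounds are spoken.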

Structural Differences with ISCII
- Unicode is stateless: No shifting to get different scripts
  - Each character has a unique number
- Unicode is uniform: No extension bytes necessary
  - All characters coded in the same space
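
A short sketch of what "stateless" and "uniform" mean in practice: the letter KA of several Indic scripts each has its own code point, so a mixed-script string needs no shift state to be interpreted. The block start values below are the Unicode block origins for these scripts; the dictionary and variable names are illustrative.

```python
import unicodedata

# Start of each script's Unicode block.
block_start = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "TAMIL": 0x0B80,
    "TELUGU": 0x0C00,
}

KA_OFFSET = 0x15  # KA sits at the same offset within each block

ka_letters = {}
for script, start in block_start.items():
    ch = chr(start + KA_OFFSET)
    ka_letters[script] = ch
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# One string, four scripts, no shift codes: each character identifies
# its own script through its code point alone.
mixed = "".join(ka_letters.values())
scripts_seen = [unicodedata.name(ch).split()[0] for ch in mixed]
print(scripts_seen)
```

The shared offset reflects the parallel, ISCII-derived layout of the script blocks, while the distinct block origins give every character its own number.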

Additional Characters
- Indian Government is developing proposals for:
  - Additions of missing characters: Vedic
  - Individual characters for certain scripts
  - Annotations and Descriptions

Global Applications now support languages of India
- Companies supporting Indic with Unicode
- OpenType fonts
  - Font support for Indic
- Microsoft Windows
- Java (IBM contributed ICU Indic Layout)
- Linux
- …

Benefits for India
- All documents, anywhere in the world, can have Indic text
- Allows seamless multilingual documents in India
  - including scriptures and minority languages
- Opens up software export market, beyond English
- Connects India to the world

How India Can Contribute
- Effective Communication with the Unicode Consortium
- Provide Resources for Development
  - Descriptions of Usage
  - Descriptions of Character Shaping
  - Transliteration Tables from Script to Script
  - Collation Information
  - OpenType fonts
  - …

What Developers Can Do
- Interwork with existing ISCII systems
- Move to Unicode for future developments
  - Java, Windows, Linux, …

The Future
- The world is moving rapidly to Unicode
- Unicode makes India open to the world
  - The world comes to you, and
  - You go to the world
- You can help

Q&A
Backup Slides

Multiple Forms
- UTF-8: maximal compatibility with 8-bit systems
- UTF-16: good storage, interoperability with Windows/Java
- UTF-32: simplest processing
- Fast, lossless conversion
- See Forms of Unicode
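
The trade-offs between the three forms show up in the byte counts. The sketch below encodes three sample characters in each form and checks that every conversion round-trips without loss. The helper name `encoded_sizes` and the choice of samples are ours; the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in each encoding form."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


samples = [
    "A",           # ASCII letter
    "\u0915",      # DEVANAGARI LETTER KA
    "\U00010400",  # a character outside the Basic Multilingual Plane
]

for text in samples:
    print(f"U+{ord(text):04X}", encoded_sizes(text))
    for enc in ENCODINGS:
        # Conversion between forms is lossless: decoding restores the text.
        assert text.encode(enc).decode(enc) == text
```

UTF-8 keeps ASCII at one byte, UTF-16 stores most characters in two bytes and uses four for supplementary ones, and UTF-32 uses four bytes for every character.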