Review
• What is multilingual computing?
• Bilingual and trilingual vs. multilingual
• What are the fundamental issues in multilingual computing?
  – Representation of each language in a computer
  – Ways to distinguish different scripts
  – How can a system be designed so that it can be adapted to different languages with minimal changes?
  – How can a system be designed so that it can be used for multiple languages?
Characteristics of different scripts
• What is a script?
• What are the different types of scripts, and what are examples of each?
  – Token-based/alphabet-based scripts
  – Phonetic-based scripts
  – Ideographs
• What is a phonetic transcription system, and what are examples of them?
• What is Romanization?
Characteristics of Chinese
• Graphemics
• Variant writing (e.g. 教 都)
• Phonetics (the sound, 音)
• Types of phonemes
• Semantics (the meaning, 義)
• Independence of meaning
Computer representation of characters
• Selection of a finite set of characters → character set
  – Uniqueness → each character/symbol
• Design of a coded character set → codeset
  – Uniqueness → each codepoint assignment
  – Different coding length → different codesets
• What do the following terms mean?
  – Codepoint
    • Length of a codepoint
  – Code space
    • Size of a code space
  – Code range
  – Order of characters (in a character set vs. a codeset)
• What are the different numerical notations?
  – Decimal notation
  – Binary notation
  – Hexadecimal notation
  – Scalar value
• Characteristics of the ASCII codeset
• What is the row-cell notation?
• What are character subsets, and why are they used?
• Character set comparison operations
• Codeset comparison operations
  – Character set
  – Codepoint assignment
• Compatibility
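As a small illustration of the numerical notations listed above, the sketch below writes one codepoint in decimal, binary, and hexadecimal form. The helper name `notations` is hypothetical and chosen only for this example.

```python
def notations(ch):
    """Return the codepoint of ch in decimal, binary, and hexadecimal notation."""
    value = ord(ch)  # scalar value of the character
    return {
        "decimal": str(value),
        "binary": format(value, "08b"),  # zero-padded to 8 bits
        "hex": format(value, "02X"),     # two uppercase hex digits
    }

print(notations("A"))
```

For the ASCII letter "A" all three notations describe the same codepoint, 65.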
• What is an encoding method, and why do we need it?
• What is the so-called high-bit-on scheme?
• What are the characteristics of GB 2312?
  – No. of rows, no. of columns → code space
  – Code range?
  – Major subsets?
  – Full-width characters vs. half-width characters
• What are the characteristics of Big5 and ETen Big5?
  – Rows, columns → code space
  – Major subsets?
  – What are UDAs and VDAs for?
• HKSCS
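The relationship between row-cell notation and the high-bit-on scheme can be sketched as follows for GB 2312, assuming the usual 94 × 94 layout. The function name `rowcell_to_bytes` is hypothetical; Python's built-in `gb2312` codec is used only to check the result.

```python
ROWS = 94
CELLS = 94

def rowcell_to_bytes(row, cell):
    """Map a GB 2312 row-cell position (1-based) to its two-byte high-bit-on form."""
    if not (1 <= row <= ROWS and 1 <= cell <= CELLS):
        raise ValueError("row and cell must both be in 1..94")
    # Adding 0x20 moves the position into the 0x21..0x7E graphic range;
    # adding 0x80 then sets the high bit of each byte.
    return bytes([0x20 + row + 0x80, 0x20 + cell + 0x80])

encoded = rowcell_to_bytes(16, 1)      # row 16, cell 1
print(encoded.hex(), encoded.decode("gb2312"))
print("code space size:", ROWS * CELLS)
```

Because both bytes have the high bit set, they cannot be confused with 7-bit ASCII bytes in the same data stream.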
• Other codesets using high-bit-on schemes?
• Encodings using designation (指定)?
  – ISO 2022
  – Extended Unix Code (EUC)
• What is the charset registry, and why is it needed?
• Problems with different codesets?
  – Compatibility → wrong interpretation of data
  – Solutions: codeset announcement (using designation) and conversion → conversion problems
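Designation can be observed with Python's built-in `iso2022_jp` codec, taken here as one concrete ISO 2022 based encoding (the slides do not name a specific one). The constant names below are hypothetical; the escape sequences are the ones this codec emits.

```python
DESIGNATE_JIS_X_0208 = b"\x1b$B"  # ESC $ B: switch to the two-byte Japanese set
DESIGNATE_ASCII = b"\x1b(B"       # ESC ( B: switch back to ASCII

data = "あ".encode("iso2022_jp")

# The payload bytes are 7-bit; only the escape sequences announce
# which codeset the following bytes belong to.
print(data)
print(data.startswith(DESIGNATE_JIS_X_0208), data.endswith(DESIGNATE_ASCII))

# Pure ASCII text needs no designation at all.
print("abc".encode("iso2022_jp"))
```

Unlike a high-bit-on scheme, every byte here stays below 0x80, so a reader that misses the escape sequence will misinterpret the payload as ASCII.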
ISO 10646 and Unicode
• What are the design principles of ISO 10646?
• What are the different coding structures in ISO 10646?
• What is the structure of UCS-4?
• What are the characteristics of the BMP?
• What is the structure of the BMP?
• What is UCS-2?
• What is the compatibility zone for?
• What is the difference between ISO 10646 and Unicode?
• Big-endian vs. little-endian notation: FEFF vs. FFFE
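The FEFF vs. FFFE distinction can be sketched with a small byte-order check. The function name `detect_utf16_byte_order` is hypothetical, and the sketch only looks at the two-byte signature of 16-bit data; it does not try to recognize 32-bit forms.

```python
import codecs

def detect_utf16_byte_order(data):
    """Guess the byte order of 16-bit encoded data from its leading signature."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return "unknown"

text = "\ufeff中"  # U+FEFF followed by one BMP character
print(detect_utf16_byte_order(text.encode("utf-16-be")))
print(detect_utf16_byte_order(text.encode("utf-16-le")))
```

The same character U+FEFF is serialized as FE FF or FF FE depending on byte order, which is what lets a reader tell the two apart.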
• What are Extension A and Extension B?
  – Where were they coded?
• What are surrogate pairs, why are they needed, and how do they work?
• What is UTF, what is its purpose, and how does UTF-8 work?
• What is the difference between a character and a glyph?
• What is the difference between a multi-byte character and a wide character?
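The surrogate-pair and UTF-8 questions above can be worked through with a minimal sketch. Both function names are hypothetical, and the UTF-8 routine is a simplified illustration: it does not reject surrogate code points, which a conforming encoder must do.

```python
def to_surrogate_pair(cp):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

def utf8_encode(cp):
    """Encode one code point as UTF-8 bytes (1 to 4 bytes)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:                     # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                    # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),  # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

high, low = to_surrogate_pair(0x20000)  # first code point of Extension B
print(hex(high), hex(low))
print(utf8_encode(0x4E2D).hex())        # the BMP character 中
```

A BMP character such as 中 needs three UTF-8 bytes and no surrogates, while an Extension B character needs four UTF-8 bytes or one surrogate pair in a 16-bit form.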
Input Methods
• What is an input method, and why do we need it?
• What are the different types of input methods?
• What is a keyboard-based input method?
• How to design an IM?
  – What is the basic requirement?
  – What are the limitations?
  – What information can be used in IM design?
• Who are the main users?
• Efficiency considerations?
• What are the two types of IM?
  – Applicability and limitations
• What is keyboard arrangement, and why do we need it?
Software L10N and I18N
• What is L10N, and why do we need it?
• What is I18N, and why do we need it?
• What are the principles in I18N?
• How to design I18N programs?
• What is POSIX, and what is its purpose?
• What is the name of the POSIX facility for a specific region?
• What are the components in a POSIX NLS package?
• What is a locale, and what are the classes in each locale?
• POSIX provides a set of interface functions; how and where are their behaviors defined?
• What are the major files in each locale?
• If POSIX were never developed, could you still develop an I18N program on top of an operating system?
• What is a symbolic name, and where are symbolic names used?
• How do we know the binary code of a symbolic name?
• Programming using the wide character data type vs. multi-byte characters
• What is collation, and how does it work?
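Locale-dependent collation can be sketched through Python's `locale` module, which wraps the POSIX `setlocale` and `strxfrm` interfaces. The helper name `sort_with_locale` is hypothetical. Only the "C" locale is guaranteed to exist; any other locale name passed in is an assumption about what is installed on the system.

```python
import locale

def sort_with_locale(words, locale_name="C"):
    """Sort words using the collation rules of the named locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        # Raises locale.Error if the named locale is not installed.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key whose ordinary comparison
        # matches the locale's collation order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting

print(sort_with_locale(["b", "A", "a"]))
```

The program itself contains no ordering rules; they come from the LC_COLLATE category of whichever locale is selected at run time.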
Open systems
• What is an open system?
• Why do we want open systems?
• What are the measurements of an open system?
• What is an open specification?
• What are the two types of portability issues?
• What mechanisms can be used to improve portability, or how can we write portable programs?
Output
• What are characters, glyphs, and fonts?
• What are their relationships and/or differences?
  – Internal representation vs. external representation
• What is the difference between the character box and the bounding box?
• Why should there be space between the character box and the bounding box?
• What does rendering mean?
• What are the two different glyph/font representations?
• What are the characteristics of bitmap fonts and outline fonts?
  – Representations, scaling (distortion), space requirement, compression
• How to deal with distortion in the scaling of bitmap fonts?
  – Ad hoc smoothing algorithms
  – Smoothing spline and interpolation
• Understanding of Bézier's cubic curves
  – Control points and the equations
• Why is bitmap-to-outline conversion needed?
• How does erosion work?
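The equation of a cubic Bézier curve with control points P0, P1, P2, P3 is B(t) = (1−t)³P0 + 3(1−t)²t·P1 + 3(1−t)t²·P2 + t³P3 for t in [0, 1]. A minimal sketch of evaluating it, with the hypothetical function name `cubic_bezier` and points given as (x, y) tuples:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0 <= t <= 1)."""
    u = 1.0 - t
    # Bernstein weights of the four control points; they always sum to 1.
    w0, w1, w2, w3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
    y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
    return (x, y)

# Sample an arch-shaped segment at a few parameter values.
control = ((0, 0), (0, 1), (1, 1), (1, 0))
for t in (0.0, 0.5, 1.0):
    print(t, cubic_bezier(*control, t))
```

The curve passes through P0 and P3 but is only pulled toward P1 and P2, which is why an outline font can be scaled by scaling its control points.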
Unicode on different platforms
• Unicode is supported on what platforms and in what forms?
  – Unix, Windows, Mac, Linux
• What is a code page?
• Can Unicode be used if the operating system is not coded using Unicode?
• Why would the encoding need to be specified when compiling a Java program?
• What are the data structures supporting multi-byte characters and Unicode in Java?
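The effect of a code page can be sketched by decoding the same byte under two of them. The two code pages below (`cp1252` and `cp437`) are chosen only as examples and are not named in the slides; both are available as built-in Python codecs.

```python
raw = b"\xe9"  # one byte with the high bit set

# The byte is identical; only the code page used to interpret it changes.
as_cp1252 = raw.decode("cp1252")
as_cp437 = raw.decode("cp437")
print(as_cp1252, as_cp437)

# Converting through Unicode keeps the character and changes the byte instead.
converted = as_cp1252.encode("cp437")
print(converted.hex())
```

This is the same wrong-interpretation problem that arises between any two codesets, and going through Unicode is one way to convert between them.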
I18N vs. multilingual applications
• What is the difference between an I18N program and a multilingual application?
• Can a multilingual application be designed/implemented using I18N?
• What needs to be separately considered in the design of multilingual applications?
• What is the relationship between multilingual applications and Unicode?
IDCs and the IDS
• What are ideographic description characters (IDCs)?
  – Different types of IDCs
• Why were IDCs introduced?
• What is an ideographic description sequence (IDS)?
• How is an IDS expressed?
• For a given character, is its IDS unique?
• For a given IDS, does it uniquely define a character?
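The prefix structure of an IDS can be sketched with a small recursive checker. It assumes only the twelve IDCs in the range U+2FF0 to U+2FFB, of which U+2FF2 and U+2FF3 take three operands and the rest take two; the names `IDC_ARITY` and `is_valid_ids` are hypothetical, and the sketch does not check that the operands are actually ideographs or components.

```python
# Number of operands each IDC takes (U+2FF0..U+2FFB only).
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # left, middle, right
IDC_ARITY["\u2ff3"] = 3  # top, middle, bottom

def _parse(ids, pos):
    """Consume one IDS component starting at pos and return the next position."""
    if pos >= len(ids):
        raise ValueError("sequence ended while an operand was expected")
    ch = ids[pos]
    pos += 1
    # An IDC is followed by its operands, each of which may itself be an IDS.
    for _ in range(IDC_ARITY.get(ch, 0)):
        pos = _parse(ids, pos)
    return pos

def is_valid_ids(ids):
    """Return True if ids is exactly one well-formed description sequence."""
    try:
        return _parse(ids, 0) == len(ids)
    except ValueError:
        return False

print(is_valid_ids("⿰女子"))      # left-right: a description of 好
print(is_valid_ids("⿱艹⿰日月"))  # nested: a description of 萌
print(is_valid_ids("⿰女"))        # second operand missing
```

Because operands can nest, the same character can often be described at different levels of decomposition, which bears on the uniqueness questions above.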
Information retrieval
• Differences between an IRS and a database system
• Basic components of an IRS
• What is the purpose of the VSM? What data are associated with a VSM?
• What are the similarity functions for?
• What is term selection for, and what methods can be used to do term selection?
• What kinds of information can be used as weights for the VSM?
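One common similarity function for the VSM is the cosine of the angle between two term vectors. The sketch below uses raw term frequency as the weight and whitespace tokenization, both of which are simplifying assumptions; the function names are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Build a term-frequency vector from whitespace-separated terms."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = term_vector("unicode encoding")
docs = ["unicode encoding forms", "bitmap and outline fonts"]
# Rank documents by their similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, term_vector(d)), reverse=True)
print(ranked)
```

Other weights, such as inverse document frequency, can replace raw term frequency without changing the similarity function itself.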