Exchange 2002 PROGRESS WORLDWIDE Text goes in mxyzptlk

“Text” goes in, “mxyzptlk” comes out!
Debugging code page problems
Texin, Director, International Business, the Progress Company

“Text” goes in, “Mxyzptlk” comes out!
[Diagram: text goes in from files to the database server; mxyzptlk comes out.]
Exchange 2002, Chicago, IL, USA © 2002, Progress Software Corporation

Agenda
• What is mxyzptlk?
• What is text to a computer?
• Code pages (single-byte)
• The Progress architecture for text
• Code pages (multi-byte)
• Problem solving

What is “mxyzptlk”?
a) Garbled text (aka “garbage”)
b) “Mojibake” in Japan
c) A magical imp that plays tricks on Superman
d) M6K
e) All of the above
Cartoon courtesy Warner Bros, Inc. Mr. Mxyzptlk is a trademark of DC Comics

In Japan, it’s called Mojibake

Agenda: What is text to a computer?

What is text (to a computer)?
A rose by any other name?
41 20 72 6F 73 65 20 62 79 20 61 6E 79 20 6F 74 68 65 72 20 6E 61 6D 65 3F
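The byte values on this slide can be reproduced with a few lines of code. The sketch below uses Python rather than 4GL, purely for illustration, and assumes the sample text is stored as ASCII.

```python
# Show the numeric (hexadecimal) byte values behind a piece of text.
# Assumption: the slide's sample sentence is encoded as plain ASCII.
text = "A rose by any other name?"
hex_bytes = " ".join(f"{b:02X}" for b in text.encode("ascii"))
print(hex_bytes)
```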

“Text? We don’t need no stinkin’ text”
• Computers (software) do not know about text. They only know binary values.
• By assigning numbers to characters, and associating behaviors and values to those numbers, programs give the illusion of characters, text and their semantics.
  – Behaviors: word breaking, hyphenation
  – Values: glyph image, case, alphabetic, numeric, sort

What makes text, text?
• Typing, input method
• Display
• Print
• Digits
• Operators
• “Words”
• Collation
• Word wrapping
• Justification
• Hyphenation
  – Next, Previous
• Punctuation
• Upper, lower case

Agenda: Code pages (single-byte)

Code pages: a collection of ordered symbols
Take a collection of symbols:
• Letters, digits: a, å, 1, 2, ...
• Punctuation, arithmetic operators
• Special symbols, e.g. ©, ¶, ¥, £, §
• Line drawing
• Control codes
Assign each a unique number (code point):
• å = 229 in ISO 8859-1
• å = 134 in IBM 850
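The two code points quoted for å can be checked directly. This is a minimal sketch in Python, not 4GL; the codec names `iso8859_1` and `cp850` are Python's names for ISO 8859-1 and IBM 850.

```python
# Look up the code point assigned to the same character in two code pages.
code_points = {
    name: "å".encode(name)[0]   # single-byte code pages: one byte per character
    for name in ("iso8859_1", "cp850")
}
print(code_points)
```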

Western Europe (CP 1252) code page
[Code chart: CP 1252, rows starting at code points 32, 64, 96, 128, 160, 192, 224]
Characters in the range 128-159 in 1252 are not in ISO 8859-1. Don’t label text as ISO-8859-1 if it is Windows-1252.
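A small sketch of why the label matters, written in Python for illustration. The sample bytes are an assumption: curly quotes stored as Windows-1252 values 147 and 148, which fall in the 128-159 range.

```python
# The same bytes read under the correct label and under the wrong one.
raw = b"\x93Text\x94 goes in"        # assumed sample: 147 and 148 are curly quotes in 1252

as_1252 = raw.decode("cp1252")       # correct label: quotes survive
as_latin1 = raw.decode("iso8859_1")  # wrong label: 147/148 become C1 control codes

print(as_1252)
print(repr(as_latin1))
```

Under the ISO 8859-1 label the quote bytes decode to non-printing control characters, which is one common route to garbage on screen or paper.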

Russian Windows code page CP 1251
[Code chart: CP 1251, rows starting at code points 32, 64, 96, 128, 160, 192, 224]

If a character is not in the current code page, how can it be expressed?

If a file contains a 255, what character does it represent?
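The answer depends entirely on the code page assumed by the reader. A minimal Python sketch, with an arbitrarily chosen list of candidate code pages:

```python
# Decode the single byte 255 under several candidate code pages.
candidates = ["iso8859_1", "cp1251", "cp850", "koi8_r"]   # illustrative selection
readings = {name: bytes([255]).decode(name) for name in candidates}

for name, char in readings.items():
    print(name, repr(char))
```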

Text: numeric values with associated attributes and behaviors
• Aa 1+¿ Zzß ® 9&: alphabet recognition, case
• “XYZ” > “ABC” < “ 5”: collation
• I paid $123.45 for a Z-3: word break
• He paid €1k for a 1998: number recognition
• 4+5: arithmetic
• Windows to Unix: conversion

Rules for “text” vary with language
Sorting, for example:
• English: ABC...RSTUVWXYZ
• German: AÄB...NOÖ...SßTUÜV…YZ
• Swedish/Finnish: ABC...RSTUVWXYZÅÄÖ
• Norwegian: ABC...VWXÜZÆØÅ (note: Y = Ü)
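To see how the same words sort differently under two of these alphabets, here is a rough Python sketch. The alphabet strings follow the slide's simplified orderings, the word list is made up, and real collation tables are more detailed than a plain letter ranking.

```python
# Sort the same words with two language-specific alphabet orders.
def sort_key(alphabet):
    rank = {ch: i for i, ch in enumerate(alphabet)}
    return lambda word: [rank[ch] for ch in word]

ENGLISH = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SWEDISH = ENGLISH + "ÅÄÖ"                      # extra letters sort after Z
GERMAN = "AÄBCDEFGHIJKLMNOÖPQRSßTUÜVWXYZ"      # umlauts sort next to base letters

words = ["ZON", "ÄR", "ARM", "ÖL", "OST"]      # hypothetical sample data
swedish_order = sorted(words, key=sort_key(SWEDISH))
german_order = sorted(words, key=sort_key(GERMAN))
print(swedish_order)
print(german_order)
```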

What happens in the operating system when you press a key?
• Keyboard generates scan codes: Alt + Row C key 2
• Keyboard driver maps to character values + controls: Alt + 115
• O/S (code page) tables define character behavior: 115 => letter, lowercase, sort rank = a + 18
• Font tables map code point to glyph (image): 115 => s

Agenda: The Progress architecture for text

Text rules that Progress “knows”
• FORMAT “A(9)”: alphabet recognition
• CAPS, LC: case
• “XYZ” > “ABC”: collation
• Where x contains y: word break
• INTEGER(mystring): number recognition
• 4+5: arithmetic
• INPUT FROM, OUTPUT TO: conversion

Defining text in Progress
• Tables in DLC/prolang/CONVMAP/*.DAT
• Compiled into CONVMAP.CP
  – ISALPHA
    “Type 1”: define alphabetic characters
    “Type 2”: define multibyte lead/tail bytes
  – UPPERCASE-MAP
  – LOWERCASE-MAP
  – CONVERT
  – COLLATION
• Word break tables *.WBT in …/CONVMAP/

From DLC/PROLANG/CONVMAP/*.DAT
    CONVERT
    SOURCE-NAME "ISO8859-1"
    TARGET-NAME "IBM850"
    TYPE "1"
    /*000-015*/ 000 001 002 003 004 005 ... 015
    …
    /*208-223*/ 209 165 227 224 226 229 ...
    /*224-239*/ 133 160 131 198 132 134 ...
    /*240-255*/ 208 164 149 162 147 228 ...
    ENDTABLE
    ENDCONVERT
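Each row of a CONVERT table lists, for consecutive source code points, the matching target code point. The Python sketch below rebuilds the start of a row from Python's own `iso8859_1` and `cp850` codecs. It is an illustration of the table layout, not the Progress conversion engine, and `conversion_row` is a hypothetical helper.

```python
# Rebuild part of an ISO 8859-1 -> IBM 850 conversion row.
def conversion_row(start, end, source="iso8859_1", target="cp850"):
    row = []
    for code_point in range(start, end + 1):
        char = bytes([code_point]).decode(source)   # source code point -> character
        row.append(char.encode(target)[0])          # character -> target code point
    return row

row_224 = conversion_row(224, 229)
print(row_224)
```

The values for 224-229 match the first six entries of the `/*224-239*/` row shown on the slide.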

CONVMAP.CP
In Progress 9.1C:
• Over 50 character sets
• 25 collations, including:
  – Arabic 9, Basic 9, Croatian, Czech, Danish, Finnish, German Library, German 9, Greek, Hebrew, Hungarian, Icelandic, Latvian, Lithuanian, Norwegian, Polish, Romanian, Russian, Spanish 9, Swedish, Thai, Turkish
• 2 case rules (Basic, French)
• 300+ code page conversions

ISALPHA tables map characters by code point to a true/false value
Code page = ISO 8859-1
Character     a    b    ’    ç
Code point    97   98   146  231
Alpha value   1    1    0    1

Case tables map characters by code point to upper and lower code points
Code page = ISO 8859-1
Character     a    A    c    é
Code point    97   65   99   233
Upper case    65   65   67   69 (E)? 201 (É)?
Lower case    97   97   99   233
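The question of whether é uppercases to E (69) or É (201) comes down to which case table is loaded. The Python sketch below builds two hypothetical 256-entry upper-case tables for ISO 8859-1; the names `keep_accent` and `drop_accent` are illustrative and are not mapped to any specific Progress case rule here.

```python
# Two alternative upper-case tables over the same code page.
def ascii_upper_table():
    table = bytearray(range(256))                # start with identity mapping
    for lower in range(ord("a"), ord("z") + 1):
        table[lower] = lower - 32                # a-z -> A-Z
    return table

keep_accent = ascii_upper_table()
keep_accent[233] = 201                           # é -> É

drop_accent = ascii_upper_table()
drop_accent[233] = ord("E")                      # é -> E (69)

data = "café".encode("iso8859_1")
upper_keep = data.translate(keep_accent).decode("iso8859_1")
upper_drop = data.translate(drop_accent).decode("iso8859_1")
print(upper_keep, upper_drop)
```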

2 collation tables map characters by code point to sort weight
Code page = ISO 8859-1
Character       a    A    c    ç
Code point      97   65   99   231
Case sens.      1    2    5    5
Case insens.    1    1    5    5
ç sorts like c, having identical sort weights

Conversion tables map characters by code point to other code points
Code page = ISO 8859-1
Character                a    b    ’    ç
ISO 8859-1 code point    97   98   146  231
IBM 850 code point       97   98   202  135

Code page startup parameters
• -cpinternal: code page in memory and GUI
• -cpstream: code page for in/out data
• -cpprint: printer code page
• -cpterm: terminal’s code page
• -cpcase: upper/lower case rules
• -cpcoll: collation table for 4GL, not DB
• -convmap: convmap.cp directory
• -cprcodein: override R-code page
• -cprcodeout: R-code page

Items labeled with code page
• R-code files
• Progress and other databases
• Dump files (.df, .d)
• Promsgs files
• Progress database also “labels” collation and word break
• HTML, XML files can be labeled

Russian Progress configuration (every blue line is conversion)
[Diagram: client, database server, DOS files, R-code and database, connected by conversion lines. Labels shown: -CPINTERNAL 1251, -CPCASE BASIC, -CPCOLL RUSSIAN, -CPINTERNAL ISO 8859-5, -CPPRINT KOI8-R, -CPSTREAM, 1251, RUSSIAN, ISO 8859-5, IBM 866, “R-code labeled”, “Database labeled”.]

Startup parameters and labels determine table choices
• Collation for 4GL: cpinternal+cpcoll or cpinternal+db collation
• Collation for indexes: db collation
• Case: cpinternal+cpcase
• Conversion: cpinternal+cpstream, cpinternal+cpprint, cpinternal+cpterm, cpinternal+item cp
• Conversion: client+server cp

Agenda: Code pages (multi-byte)

Traditional Chinese (CP Big-5)
[Code chart: Big-5]

Double-byte programming
• Japanese, Chinese, Korean only
  – Thousands of characters require > 8 bit CP
• Some characters are 1 byte, some 2 bytes
• Problems are caused by:
  – Assuming 1 character is 1 byte
  – Assuming 1 character is 1 column
  – Manipulating bytes instead of characters

Japanese, Chinese, Korean
Languages with >255 characters
Mixed size characters: 1 or 2 bytes
How long is a DBCS string?
Character:   A   b   c   日      本      語      d    e
Byte type:   S   S   S   L   T   L   T   L   T   S    S
# Chars:     1   2   3   4   4   5   5   6   6   7    8
# Bytes:     1   2   3   4   5   6   7   8   9   10   11
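The character and byte counts in this table can be reproduced as follows. This Python sketch assumes the string is stored in Shift-JIS (the slide does not name the code page), and `byte_types` is a hypothetical helper using the Shift-JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC.

```python
# Count characters vs. bytes and classify each byte as Single, Lead or Tail.
def byte_types(data):
    types, expect_tail = [], False
    for b in data:
        if expect_tail:
            types.append("T")                    # second byte of a 2-byte character
            expect_tail = False
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            types.append("L")                    # lead byte: a tail byte must follow
            expect_tail = True
        else:
            types.append("S")                    # single-byte character
    return "".join(types)

text = "Abc日本語de"
encoded = text.encode("shift_jis")               # assumption: Shift-JIS storage
print(len(text), len(encoded), byte_types(encoded))
```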

Unicode character set: a worldwide, multilingual code page
UTF-8 is multibyte!
[Chart: example Unicode characters]
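A quick Python check of what “multibyte” means for UTF-8; the sample characters are chosen only for illustration.

```python
# Number of UTF-8 bytes needed for a few sample characters.
lengths = {ch: len(ch.encode("utf-8")) for ch in "aç€日"}
for ch, n in lengths.items():
    print(ch, n)
```

The same one-character-is-not-one-byte cautions that apply to double-byte code pages therefore apply to UTF-8 data as well.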

Agenda: Problem solving

“When I print a report, I get garbage characters in my report. Can you help me?”
John or Jill Q. Public

“Text” in, “Mxyzptlk” out!
Start debugging. What is the 1st step?
[Diagram: text goes in from files to the database server; mxyzptlk comes out.]

Techniques at your disposal: identify all the players
Keyboard->client->server->database->server->client->printer
A B C B A D+font
“ç”> client> ABconvert> BCconvert> CBconvert> BAconvert> ADconvert> Font >“ç”
• Which components are involved?
  – Devices (terminal, printer)
  – Operating system(s)
  – Drivers, 3rd party S/W (Java, ODBC, OCX, ...) and are they internationalized...
  – Progress components (client, appserver, ...)
• Which code pages do they use?
• Which conversions are performed & when?

Determine expected and actual results
Identify, as much as possible:
• Characters that went in and resulting characters
• Determine their code points if possible.
• Evaluate inputting other characters and results
• How they were input (keyboard, paste, file, net, …)

Determine expected and actual results
Identify, as much as possible:
• How they were output (terminal, printer, file, …)
• All steps (processing) in between
• Related information (e.g. OS, fonts, regional settings, …)
Caution: 3rd party S/W may require regional settings in unique ways or with different values

Confirm the Progress environment
Use 4GL statements for verification
  – DBCODEPAGE (db_id)
  – DBCOLLATION (db_id)
  – GET-CODEPAGES
  – GET-COLLATIONS (codepage)
  – SESSION:CPINTERNAL, CPSTREAM, etc.
  – RCODE-INFO
  – ASC
  – CHR

Techniques at your disposal: compare with other code pages
• “ç” went in. Which value came out?
• Does the pattern match a conversion table?
• The letter “ç” in different code pages
  – 1252: 231
  – ISO 8859-1: 231
  – ROMAN-8: 181
  – IBM 850: 135
  – IBM 273: 072
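A lookup like this is easy to script. The Python sketch below uses Python's codec names as stand-ins for the Progress code page names (`hp_roman8` for ROMAN-8, `cp273` for IBM 273); that mapping is an assumption of this example.

```python
# Code point of the same character in several code pages.
CODE_PAGES = {                 # slide label -> Python codec name (assumed equivalents)
    "1252": "cp1252",
    "ISO 8859-1": "iso8859_1",
    "ROMAN-8": "hp_roman8",
    "IBM 850": "cp850",
    "IBM 273": "cp273",
}
values = {label: "ç".encode(codec)[0] for label, codec in CODE_PAGES.items()}
for label, value in values.items():
    print(f"{label}: {value:03d}")
```

If the value that “came out” matches one of these, the matching code page is a strong hint about which conversion was or was not applied.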

How do I find out about code pages?
• Windows Accessory utility: Charmap
• DLC/prolang/convmap/*.dat
• 4GL functions:
  – GET-CODEPAGES, CODEPAGE-CONVERT
• Popular code pages are on the web

Printing IBM 850 vs. ISO 8859-1
Print 135, when 231 is needed...

Compare with other code pages: conversion scenarios to consider
Keyboard->client->server->database->server->client->printer
A B C B A D+font
“ç”> client> ABconvert> BCconvert> CBconvert> BAconvert> ADconvert> Font >“ç”
• Correct source to wrong target: A-X
• Correct target from wrong source: X-B
• Inverse conversion: B-A
• Extra conversion: A-B-X
• Missing conversion: A-A
Caution: multiple wrongs can seem right
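A few of these scenarios can be simulated to see what each one produces. This is a Python sketch with A = ISO 8859-1 and B = IBM 850 chosen as example code pages; `convert` is a hypothetical helper, not a Progress function.

```python
# Simulate correct, missing and inverse conversions for one character.
def convert(data, source, target):
    return data.decode(source).encode(target)

original = "ç".encode("iso8859_1")                   # data in code page A (value 231)

correct = convert(original, "iso8859_1", "cp850")    # A-B: the intended conversion
missing = original                                   # A-A: conversion skipped
inverse = convert(original, "cp850", "iso8859_1")    # B-A applied to data that is in A

print("correct:", correct[0])
print("missing, shown by a device expecting B:", missing.decode("cp850"))
print("inverse:", inverse[0])
```

Comparing the observed value against each simulated outcome helps narrow down which step in the chain went wrong.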

Compare with other code pages: consider mislabeling
• Mislabeling of the data’s code page
  – E.g. new Euro code pages (ISO 8859-15)
  – Asian and other code pages have vendor variations (e.g. “\” as currency symbol)
• Especially data labeled “ISO 8859-1”
  – Windows code pages are commonly misrepresented
  – Unlabeled Web pages presumed as ISO 8859-1
• Ask an I18n expert for other similar pages, and problematic code points.
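The Euro sign is a handy test character for mislabeling, because it sits at different code points in ISO 8859-15 and Windows-1252 and does not exist in ISO 8859-1. A minimal Python sketch:

```python
# Where the Euro sign lives, and what a wrong label turns it into.
euro_15 = "€".encode("iso8859_15")        # Euro in ISO 8859-15
euro_1252 = "€".encode("cp1252")          # Euro in Windows-1252

mislabeled = euro_15.decode("iso8859_1")  # ISO 8859-15 data read as ISO 8859-1

print(euro_15[0], euro_1252[0], mislabeled)
```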

Consider fonts
• Fonts are code page based
• Code point may be right, and the character image wrong.
  [Font sample: abcdefghij]
• Verify the font (check device, configuration, cartridges, etc.)
• On Windows, check script setting

4GL can generate possibilities or provide verification
• Statements that specify conversions
  – ASC, CHR
  – CODEPAGE-CONVERT
  – INPUT, OUTPUT, INPUT-OUTPUT
• Use these during analysis to:
  – Insert additional conversions
  – Replace existing conversions
  – Undo or invert extra conversions
  – Evaluate alternatives

Use 4GL to generate possibilities
    /* sval, source, schar, target, tval and i are assumed to be defined earlier */
    /* take code point integer, make it a character */
    schar = chr(sval).
    do i = 1 to num-entries(GET-CODEPAGES):
      target = entry(i, GET-CODEPAGES).  /* code page */
      if source = target then next.
      if target = "undefined" then next.
      /* Convert it to the current target */
      tval = asc(schar, target, source) no-error.
      /* ignore illegal conversions */
      if tval = -1 then next.
      display sval target tval.  /* List results */
    end.

Use 4GL to generate possibilities
Conversions for ISO 8859-1 code point 231:
Source   Target code page   Target
231      IBM850             135
231      ISO8859-15         231
231      1252               231
231      ROMAN-8            181
231      IBM861             135
231      IBM437             135
231      IBM037             072
231      IBM500             072
231      IBM297             224
231      IBM284             072
231      IBM280             224
231      UTF-8              50,087
231      UCS2               59,136

Techniques at your disposal: reduce variables
Keyboard->client->server->database->server->client->printer
A B C B A D+font
“ç”> client> ABconvert> BCconvert> CBconvert> BAconvert> ADconvert> Font >“ç”
• Print character values after each step
• Replace each step with known or hardcoded input and repeat comparisons
  – E.g. replace UPDATE with CHR(231)
  – Replace returned values with CHR(231)
  – Use pattern: CHR(128)+CHR(129)...CHR(255)

Techniques at your disposal: pattern analysis
• Identify “bad” records using 4GL
• Identify records with incorrect, infrequent characters using CHARSCAN
    proutil <db> -C convchar charscan 1252 "188,189,190"
    Charscan searching for iso8859-1 character: 188 0xbc. (6570)
    Charscan found a character match in Customer.Comments, recid 103. (6569)
    Charscan match count: 1 (6568)

Techniques at your disposal: additional checks
• Differentiate conversion problems from case or collation problems
  – CAPS(é) = “E” vs. CAPS(é) = “É”
  – “Å” < “B” or “Å” > “Z”
• Review DLC/prolang/convmap/*.dat
• Try a different widget (occasional bug)
• Watch for third party software/hardware
    PUT CONTROL "~033E~033(10U~033&l2A~033&l1O~033".

“My customer is screaming ‘Mojibake!’ Can you help me?”
John or Jill Q. Public

What causes Mojibake?
Not treating all bytes as one character
• Lead-byte and tail-byte tables define valid bytes and column widths
• Keep all bytes of a character together.
• Don’t insert in the middle, don’t delete one byte without the other, and take care at block boundaries
• Tail bytes can be syntax-significant, e.g. “\” in pathnames, “~” in 4GL

Making Mojibake: splitting multibyte characters
Original:
Character:   日       本       語
Byte type:   L   T    L   T    L   T
Bytes:       93  FA   96  7B   8C  EA
Inserting “a” (61) in second byte:
Character:   殿       坙       {    語
Byte type:   L   T    L   T    S    L   T
Bytes:       93  61   FA  96   7B   8C  EA

Making Mojibake: splitting multibyte characters
Original:
Character:   日       本       語
Byte type:   L   T    L   T    L   T
Bytes:       93  FA   96  7B   8C  EA
Deleting second byte:
Character:   当       {    語
Byte type:   L   T    S    L   T
Bytes:       93  96   7B   8C  EA
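Both kinds of damage can be reproduced by editing bytes instead of characters. The Python sketch below assumes the Shift-JIS byte values shown on the slides and decodes the damaged data with Python's `cp932` codec (the Windows variant of Shift-JIS), replacing anything undecodable.

```python
# Damage a double-byte string by deleting or inserting a single byte.
original = "日本語".encode("shift_jis")
print(original.hex(" "))                          # bytes of the intact string

deleted = original[:1] + original[2:]             # drop the first tail byte
inserted = original[:1] + b"a" + original[1:]     # put "a" (0x61) between lead and tail

deleted_text = deleted.decode("cp932", errors="replace")
inserted_text = inserted.decode("cp932", errors="replace")
print(deleted_text)
print(inserted_text)
```

After either edit, the lead and tail bytes pair up differently, so every character up to the next resynchronization point changes, not just the one that was touched.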

Considerations for Mojibake
• At the 4GL level:
  – “RAW” vs. “CHARACTER”, “COLUMN”?
  – Is -cpinternal a multi-byte code page?
  – Is the software DBE (double-byte enabled)?
  – Are bytes in lead-byte, tail-byte tables?
  – Test with IS-LEAD-BYTE function.
  – Windows requires Default Language Setting (actually a conversion issue.)

“I want to sell my application in Poland. Which code page supports Polish? Also, I have an opportunity in Viet Nam. Which code page should I use there?”
John or Jill Q. Public

Choosing a code page
• For most languages, there is a clear choice based on the platform.
• Consider the source of the data (i.e. is there any legacy data in a known code page?)
• There is no conversion to GUI, so you must use a Windows code page on Windows.
• Progress will add code pages for customers if there is a business need.
• Use Unicode wherever possible.

“I disagree with the conversion table. Should I create my own conversion table? I also disagree with other tables. Should I make my own?”
John or Jill Q. Public

Should customers define code page tables?
In general, ask Support first.
• OK for 1-way CPPRINT, CPTERM code pages
• For data storage code pages, collations, conversions, refer to Support first.
  – Conversions must be 1-1
  – Round trip compatibility is required among families of code pages
  – There can be subtle dependencies for ISALPHA, indexes, word break tables
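The 1-1 and round-trip requirements can be tested mechanically before a table is trusted for stored data. In this Python sketch, `round_trips` is a hypothetical helper, and Python's built-in codecs stand in for Progress conversion tables.

```python
# Check that every code point survives source -> target -> source unchanged.
def round_trips(source, target, code_points):
    for cp in code_points:
        original = bytes([cp])
        try:
            converted = original.decode(source).encode(target)
            back = converted.decode(target).encode(source)
        except UnicodeError:
            return False                  # character missing from one code page
        if back != original or len(converted) != 1:
            return False                  # lossy or not a 1-1 mapping
    return True

upper_half = range(160, 256)              # accented letters and symbols
ok_850 = round_trips("iso8859_1", "cp850", upper_half)
ok_437 = round_trips("iso8859_1", "cp437", upper_half)
print(ok_850, ok_437)
```

Here IBM 850 round-trips with ISO 8859-1 for this range, while IBM 437 does not because it lacks some of the characters.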

“Hey thanks. I found the problem. Some of my data is now in the wrong code page in my database. How can I correct these records?”
John or Jill Q. Public

Replacing “mxyzptlk” with “text”
• Sometimes it is not possible.
• It may be possible to look for illegal values, or identify misconverted records.
• Heuristics exist for identifying code page, if the text is large enough.
• If the misconversion is well-identified, it may be possible to find records and correct the conversion.
• Experience/expertise is helpful here.
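One simple “illegal value” check is to flag records labeled ISO 8859-1 that contain bytes in the 128-159 range, since that range holds no printable characters in ISO 8859-1. The sketch below is in Python; the record ids and contents are made up for illustration.

```python
# Flag records whose bytes suggest they are not really ISO 8859-1.
def suspicious_positions(data):
    return [i for i, b in enumerate(data) if 128 <= b <= 159]

records = {                                   # hypothetical recid -> stored bytes
    101: "naïve".encode("iso8859_1"),
    103: "“quoted”".encode("cp1252"),         # Windows-1252 data stored under the wrong label
}
flagged = [recid for recid, data in records.items() if suspicious_positions(data)]
print(flagged)
```

A flagged record is only a candidate; deciding the true source code page and the right correction still takes the kind of analysis described above.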

Want to know more?
• Globalization Empowerment: http://www.progress.com/consulting/globalization_empowerment_solutions.htm
• Progress Internationalization Guide
• OS vendors have international web pages
  – http://www.microsoft.com/globaldev/dis_v1/disv1.asp
  – http://www.sun.com/globalization

Text goes in, text comes out!
• Progress code page architecture is straightforward, yet powerful and flexible
• Debugging code page problems is easy with knowledge of the architecture
• The 4GL has diagnostic functions that can help
• A little knowledge goes a long way; sign up for Globalization Empowerment

Questions