Siebel CRM Unicode Conversion The DBA Perspective Brian

Siebel CRM Unicode Conversion – The DBA Perspective Brian Hitchcock OCP 8, 8 i, 9 i DBA Sun Microsystems brian. hitchcock@sun. com brhora@aol. com www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 1

CRM Unicode Conversion Ÿ Three separate presentations – – – 1) The overall conversion process Ÿ What we had, what we wanted, how to get there Ÿ Issues that come up during conversion 2) Multi byte data in the existing CRM db Ÿ What’s the issue, how did it happen Ÿ A general method to find and fix this problem 3) The actual conversion Ÿ What really happened Ÿ Issues that came up and how they were resolved Ÿ Focus on DBA issues, not Siebel application www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 2

How Did I Get Involved? Ÿ Sleeping in a meeting… Ÿ Heard someone say – “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped” Ÿ Woke up, said – – “I’ve done that before…” See “Case of the Missing Kanji” Ÿ Don’t wake up in meetings… www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 3

What’s The Issue? Ÿ Existing Siebel CRM system – – Oracle 8. 1. 7. 4 Single byte character set (WE 8 ISO 8859 P 1) Ÿ Interface systems – – Multi byte character set(s) (UTF 8) Handle data between single, multi byte apps Ÿ Want to convert to Unicode – – Siebel, database, interfaces all should be UTF 8 Eliminate interface systems www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 4

What we had Amer 8859 P 1 Users Emea Apac UTF 8 Tcustdb Apac UTF 8 Custdb Apac UTF 8 Tcustdb Emea Siebel CRM 8859 P 1 Oracle Db UTF 8 Custdb Emea Custdb Amer WE 8 ISO 8859 P 1 Ordering System WE 8 ISO 8859 P 1 www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 5

What we wanted UTF 8 Amer Users Emea Apac UTF 8 Custdb Emea Siebel CRM UTF 8 Oracle Db Custdb Amer WE 8 ISO 8859 P 1 Ordering System AL 32 UTF 8 www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 6

What We Wanted Ÿ All data in one database – – All languages Unicode Ÿ Eliminate interface systems – Reduce support costs Ÿ Support increased CRM functionality – – All data in one place Supports new business functionality www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 7

Would you like fries with that? Ÿ Unicode conversion includes – Oracle db Ÿ Convert to AL 32 UTF 8 character set Required by Siebel for Unicode Ÿ Upgrade to 9. 2. 0. 4 Required to get AL 32 UTF 8 character set – Remove Tcustdb databases Ÿ Modify triggers that link source db to Tcustdb www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 8

And A Shake? Ÿ And, while you’re at it… – Application GUI Ÿ Retrieve different data, multi byte, local language – Clients Ÿ Upgrade to Oracle 9. 2. 0. 4 (SQL*Plus) Ÿ Lots of changes all at once – – Testing How to know impact of each change? www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 9

Converting to Unicode Ÿ It’s easy – right? – Siebel CRM Ÿ make some configuration changes – Oracle database Ÿ Export from single byte database Ÿ Import into new db created with UTF 8 char set – – Testing Done Ÿ This is the ‘management’ view www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 10

What Is Unicode? Ÿ International standard Ÿ Collection of characters – Covers most of the world’s languages Ÿ Chinese poetry? – All characters have unique byte code Ÿ Application developers – – Support Unicode No need to worry about specific languages www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 11

You Make This Stuff Up! Ÿ What follows can be found in – – – Oracle 9 i Database Globalization Support Guide Release 2 (9. 2) Part Number A 96529 01 Ÿ Or, you can trust me… Ÿ Character sets, Unicode – – Consist of set of characters Encoding of the characters to byte codes www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 12

Single Byte Encoding Schemes Ÿ 7 bit encoding schemes – – – Single byte 7 bit up to 128 characters normally support just one language US 7 ASCII Ÿ 8 bit encoding schemes – – – Single byte 8 bit up to 256 characters often support a group of related languages WE 8 ISO 8859 P 1 www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 13

8859 P 1 Character set Oracle Character Set WE 8 ISO 8859 P 1 www. brianhitchcock. net Brian Hitchcock September 15, 2004 Hex 0 x 41 is A DCSIT Technical Services DBA Page 14

Multi-byte Encoding Schemes Ÿ Fixed width – – – each character occupies a fixed number of bytes Faster text processing AL 16 UTF 8 Ÿ Variable width – – – one or more bytes to represent a single character Saves disk space (typically lots of disk space) UTF 8, AL 32 UTF 8 Ÿ Shift sensitive variable width – use control codes to differentiate single byte multi byte characters with the same code values www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 15

UTF 8 Byte Storage Different characters occupy 1, 2, 3 or 4 bytes www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 16

AL 32 UTF 8 Ÿ UTF 8 – – – Supports Unicode 3. 0 since 8. 1. 7. 4 Up to 3 bytes per character Supplemental characters Ÿ Pairs of 3 byte character codes Ÿ AL 32 UTF 8 – – Supports Unicode 3. 1 (latest version? ), since 9 i Up to 4 bytes per character Ÿ Supplemental characters www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 17

Confused? Ÿ Unicode, a set of characters Ÿ Character set, encoded set of characters Ÿ Encoding scheme, UTF 8, ISO standard for variable width encoding of Unicode character set Ÿ UTF 8, Oracle implementation of UTF 8 Ÿ If you’re not confused, you aren’t paying attention! www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 18

Changing Character Set Ÿ You can simply alter the database (right? ) Ÿ Only works if – – new character set is strict superset of existing character set For all characters in existing character set Ÿ All exist in new character set Ÿ All have exact same code in new character set Ÿ Example – – WE 8 MSWIN 1252 (superset, includes euro) WE 8 ISO 8859 P (subset) www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 19

Complexities Ÿ Even for the same character – Different encoding in different character set Ÿ Example – – – Latin (Western European) character á E 1 in WE 8 ISO 8859 P 1 C 391 in UTF 8 Ÿ If existing character not in new char set – ? (replacement character) displayed www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 20

Cure Ÿ Create new database – Using new character set Ÿ Extract data from old database Ÿ Insert data into new database Ÿ Export/import is most often used – Could use other methods Ÿ Extract data to flat files Ÿ SQL*Loader www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 21

Database Conversion Ÿ Serial – Upgrade source, export, drop schemas, import Ÿ Parallel – – – Create target Export source Import to target Ÿ Chose Parallel – Source still available after target in use Ÿ User tablespace issue for example www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 22

Impact of Unicode Ÿ Table columns must be widened Ÿ Existing column – – – Holds up to 20 Latin characters WE 8 ISO 8859 P 1, each Latin character 1 byte VARCHAR 2(20) Ÿ New column – – – UTF 8 Each Latin character occupies 2 bytes Need VARCHAR 2(40) www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 23

Impact of Unicode Ÿ Worst case – – – UTF 8 can have up to 4 bytes per character For all existing character columns Need to expand by 4 x Ÿ Disk space – – CHAR – 4 x disk space VARCHAR 2 – 1 x to 4 x Ÿ Depends on specific characters inserted www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 24

Impact of Unicode Ÿ Tables – – Columns must be wider Each character can be up to 4 bytes Ÿ Triggers, PL/SQL code – Modify to handle multi byte data Ÿ End user front end (browser) – Reconfigure to Ÿ Display multi byte data, accept multi byte data Ÿ All app components must handle Unicode www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 25

User Impact Ÿ VARCHAR 2, AL 32 UTF 8 – 4000 byte limit Ÿ How many characters can I enter? – – Latin, 2000 Japanese, 4000/3 Ÿ If moving from Japanese character set Ÿ 2 bytes per character Ÿ Max characters reduced by 1/3 – Supplemental characters, 1000 Ÿ Characters like ‘treble clef’ www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 26

Disk Space Ÿ How much multi byte data do you have? – – – We found all of ours Typically, 5 10% See 2) Multi byte data in the existing CRM db Ÿ Compute disk space requirement – – If you have 5% multi byte character data Need maximum of 20% more disk space Ÿ Will you add more multi byte data? – Once you have converted to Unicode… www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 27

Expanding Columns Ÿ Need to expand lots of columns – – Individual SQL statements Lots of SQL to generate Ÿ How to make Oracle do this for us? – – Export existing database New database has init. ora parameter Ÿ NLS_LENGTH_SEMANTICS = CHAR – Import into new database Ÿ All character columns widened as tables created VARCHAR(10) becomes VARCHAR(40) www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 28

Character Semantics – 9 i Ÿ Change column data types – – – VARCHAR 2(10 byte) VARCAHR 2(10 char) Requires SQL statement for each column Ÿ NLS_LENGTH_SEMANTICS – – – Init. ora parameter What happens if init. ora changed? BYTE or CHAR All character columns created with byte or char Handles PL/SQL code as well www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 29

The Siebel Process Ÿ Create target database Ÿ Shutdown app Ÿ Upgrade Oracle client Ÿ Source db character set Ÿ Run migrate. sh script Ÿ Full export source Ÿ Import to target db Ÿ Modify target db www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 30

Create target database Ÿ Oracle 9. 2. 0. 4 Ÿ Character set AL 32 UTF 8 Ÿ Character semantics CHAR Ÿ Tablespace names same as source db – 15% more space than source db Ÿ Locally managed, uniform 130 k Ÿ Auto UNDO, tablespace www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 31

Shutdown app Ÿ Shutdown various app servers Ÿ Shutdown source db Ÿ Cold backup Ÿ Upgrade source db to 9. 2. 0. 4 – Migrate 8. 1. 7. 4 to 9. 2. 0. 4 www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 32

Upgrade Oracle client Ÿ Upgrade Oracle client software to 9. 2. 0. 4 – – – For all machines that have SQL*Plus Upgrade to 9. 2. 0. 4 Install 9. 2. 0. 4 Ÿ Client install only – – Tar up 9. 2. 0. 4 client ORACLE_HOME ftp, untar on machines that need SQL*Plus www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 33

Source db character set Ÿ Fix any user tablespace issues – Import won’t fix them for you Ÿ Change source db character set – WE 8 MSWIN 1252 Ÿ Siebel requirement Ÿ Contains euro symbol Ÿ Is a strict superset of WE 8 ISO 8859 P 1 www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 34

Run migrate. sh script Ÿ Siebel supplied script – Generates various scripts Ÿ Expand. ksh Widen columns for Unicode Ÿ Impexp 06. ksh Import individual tables for large dbs We use full export/import instead Ÿ Run sun_expand. sql – Widen columns in tables outside Siebel schemas www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 35

Export Source, Import Target Ÿ Full export of source db – Source db is now 9. 2. 0. 4 Ÿ NLS_LANG AMERICAN_AMERICA. AL 32 UTF 8 Ÿ Import into target db – Target db created as 9. 2. 0. 4 Ÿ NLS_LANG AMERICAN_AMERICA. AL 32 UTF 8 www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 36

The conversion setup WE 8 ISO 8859 P 1 Source Db AL 32 UTF 8 Target Db WE 8 MSWIN 1252 Source Db WE 8 MSWIN 1252 import WE 8 MSWIN 1252 export www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 37

Modify target db Ÿ Run impexp 06. ksh – Handles sequences etc. Ÿ Run check_schema. sql – Find columns that didn’t get widened Ÿ Various changes on Siebel App side Ÿ Verify db links to Custdb databases www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 38

Conversion Complete? Ÿ Siebel process is done Ÿ Fix any data issues – – Multi byte character data in source db Convert properly to AL 32 UTF 8 Ÿ Testing Unicode changes – – GUI changes Performance Ÿ Unicode processing Ÿ Users accessing different data www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 39

Multi-byte Data In Source Db? Ÿ Source db is WE 8 ISO 8859 P 1 – – Single byte character set Doesn’t support multi byte characters Ÿ That’s the official story Ÿ The reality is somewhat different Ÿ What, if any multi byte data is in source db? – – – How to determine correct character set? How to find, how to fix? Japanese, Russian, others? www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 40

CRM Unicode Conversion Ÿ Three separate presentations – – – 1) The overall conversion process Ÿ What we had, what we wanted, how to get there Ÿ Issues that come up during conversion 2) Multi byte data in the existing CRM db Ÿ What’s the issue, how did it happen Ÿ A general method to find and fix this problem 3) The actual conversion Ÿ What really happened Ÿ Issues that came up and how they were resolved Ÿ Focus on DBA issues, not Siebel application www. brianhitchcock. net Brian Hitchcock September 15, 2004 DCSIT Technical Services DBA Page 41
- Slides: 41