Siebel CRM Unicode Conversion 2 The DBA Perspective
Siebel CRM Unicode Conversion 2 – The DBA Perspective Brian Hitchcock OCP 8, 8 i, 9 i DBA Sun Microsystems brian. hitchcock@sun. com brhora@aol. com www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 1
CRM Unicode Conversion Ÿ Three separate presentations – – – 1) The overall conversion process Ÿ What we had, what we wanted, how to get there Ÿ Issues that come up during conversion 2) Multi byte data in the existing CRM db Ÿ What’s the issue, how did it happen Ÿ A general method to find and fix this problem 3) The actual conversion Ÿ What really happened Ÿ Issues that came up and how they were resolved Ÿ Focus on DBA issues, not Siebel application www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 2
How Did I Get Involved? Ÿ Sleeping in a meeting… Ÿ Heard someone say – “We told the users to stop entering Japanese into the CRM system but we aren’t sure they stopped” Ÿ Woke up, said – – “I’ve done that before…” See “Case of the Missing Kanji” Ÿ Don’t wake up in meetings… www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 3
What’s The Issue? Ÿ Existing Siebel CRM system – – Oracle 8. 1. 7. 4 Single byte character set (WE 8 ISO 8859 P 1) Ÿ Interface systems – – Multi byte character set(s) (UTF 8) Handle data between single, multi byte apps Ÿ Want to convert to Unicode – – Siebel, database, interfaces all should be UTF 8 Eliminate interface systems www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 4
What We Had Amer 8859 P 1 Users Emea Apac UTF 8 Tcustdb Apac UTF 8 Custdb Apac UTF 8 Tcustdb Emea Siebel CRM 8859 P 1 Oracle Db UTF 8 Custdb Emea Custdb Amer WE 8 ISO 8859 P 1 Ordering System WE 8 ISO 8859 P 1 www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 5
What We Wanted UTF 8 Amer Users Emea Apac UTF 8 Custdb Emea Siebel CRM UTF 8 Oracle Db Custdb Amer WE 8 ISO 8859 P 1 Ordering System AL 32 UTF 8 www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 6
What We Wanted Ÿ All data in one database – – All languages Unicode Ÿ Eliminate interface systems – Reduce support costs Ÿ Support increased CRM functionality – – All data in one place Supports new business functionality www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 7
Multi-byte Data In Source Db? Ÿ Source db is WE 8 ISO 8859 P 1 – – Single byte character set Doesn’t support multi byte characters Ÿ That’s the official story Ÿ The reality is somewhat different Ÿ What, if any multi byte data is in source db? – – – How to determine correct character set? How to find, how to fix? Japanese, Chinese, others? www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 8
But Wait, There’s More… Ÿ Not just multi byte data to look for Ÿ Non p 1 character data also – – Non multi byte character data Could be WE P 1 (western European) Ÿ German, Italian, French etc. – Could be WE Pn Ÿ Polish, Greek, Russian etc. Ÿ How to find? www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 9
How Polish Was Handled Ÿ Use separate app that sends polish (P 2) to CRM database Ÿ Stored in P 1 db Ÿ Triggers move this polish data to TWCD Ÿ Triggers in TWCD – – Know that it’s polish (P 2) Convert to UTF 8 and send to WCD db Ÿ Therefore, multiple languages in Siebel P 1 db www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 10
What’s the Problem? Ÿ Character data from multiple languages – – Stored in oracle db Db configured for P 1 Ÿ P 1 supports multiple WE languages Ÿ Does not support polish, Russian, etc. Ÿ Need to find all such character data Ÿ Non p 1 can be – – Single byte (polish, Russian, etc. ) Multi byte (Japanese, Chinese, etc. ) www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 11
Single-byte Character Sets Ÿ All Pn (8859 1, 8859 2, etc. ) character sets – – Share same range of byte codes, 0 to 255 Above 0 x. A 1 (decimal 161) Ÿ Same byte codes represent different characters Ÿ Example – WE 8 ISO 8859 P 1 (8859 1) Ÿ Byte code 0 x. A 3 (decimal 163) is character £ – EE 8 ISO 8859 P 2 (8859 2) Ÿ Same byte code, 0 x. A 3 is character Ł www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 12
Finding Non-p 1 Char Data? Ÿ Logically – – Examine db design, Siebel docs, figure out which tables designed to store language specific (local language) data Some column (country code) in these tables to tell you which country data is from Determine correct character set for data from each country Convert these tables manually to AL 32 UTF 8 as part of overall Unicode conversion process www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 13
Not Good Ÿ Want general method – – No need to analyze the meaning of existing data Need automated way to find all non P 1 char data Ÿ Can’t do it – No general way to determine if char data is P 1 or P 2 or Pn Ÿ As shown before, byte code 0 xa 3 (decimal 163) Character £ in P 1 Character Ł in P 2 www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 14
Good Ÿ But, can find non ASCII data in general – And then find multi byte character data Ÿ Use separate approach to find non P 1 Ÿ Use PL/SQL code – – – Examine every table Examine every column that holds character data Determine which rows if any are ASCII Rows that aren’t ASCII are ‘suspect’ Identify tables that have any non ASCII character data www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 15
Why Look For ASCII? Ÿ Character data that is ASCII – – – Only 7 bits used to encode character 8 th bit of every byte is 0 For non ASCII, 8 th byte is set Ÿ WE 8 ISO 8859 Pn Ÿ Multi byte, Japanese, Chinese, etc. Ÿ By eliminating all tables that are ASCII – – No need to ask are they P 1, P 2, Pn or multi byte Greatly reduces the task www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 16
How To Find Non-ASCII? Ÿ Use SQL function convert – – – Convert a given column to ASCII character set Compare resulting string with original If original string is all ASCII Ÿ Will match converted string – If not a match Ÿ Column value is non ASCII Could be WE 8 ISO 8859 Pn Could be multi byte www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 17
Example Finding Non-ASCII àin WE 8 ISO 8859 P 1 database create table Psycho_Acircle (text VARCHAR 2(100)); insert into Psycho_Acircle values (chr(197)||'BCDE'); insert into Psycho_Acircle values ('ABCDE'); select * from Psycho_Acircle; TEXT ÅBCDE ABCDE select convert(text, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') from Psycho_Acircle; CONVERT(TEXT, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') ? BCDE ABCDE ÅBCDE is not the same as ? BCDE www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 18
Not Included Ÿ Did not scan – – LONG datatype columns CLOB datatype columns Ÿ Didn’t have any in schema – PL/SQL code in database Ÿ Dev team determined this wasn’t needed www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 19
Scripts Strategy Ÿ Eliminate as much as possible – – Identify all ASCII only tables Left with set of non ASCII tables Ÿ For remaining tables – – Find likely Japanese character data Verify it is Japanese Copy to separate table Remove from non ASCII tables Ÿ Repeat for other languages – How to identify byte patterns for each language? www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 20
PL/SQL scripts Ÿ Scripts used – – – – Scan_Table_1_Gen_Column_Info. sql Scan_Table_2_Gen_Nonascii_rows_Info. sql Scan_Table_3_Gen_Nonascii. Tables_No. Long. sql Scan_Table_4_Gen_Nonascii. Tables_Nonascii. Cols_Only. sql Scan_Table_5_Gen_Nonascii. Tables_Yes. Long. sql Scan_Table_6_Gen_NA_EUCJP_info_sql_col_info. sql Scan_Table_7_Gen_NA_EUCJP_Tables. sql Scan_Table_8_Gen_NA_EUCJP_2_rows_info. sql www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 21
Scripts Ÿ Each script generates table(s) – Output of each script stored in table(s) Ÿ Next script uses tables Ÿ Lots of intermediate data stored – – – Helped develop scripts Each script simpler Provided extra output for developers, analysts to help them verify results Ÿ Is this data really Japanese? www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 22
What Does Each Script Do? Ÿ Scan_Table_1_Gen_Column_Info. sql – – Scans all tables in a schema Creates two tables Ÿ Table_Gen_Info on all tables Ÿ Table_Column_Info on character columns Which contain any non ASCII strings Doesn’t include LONG columns Can’t use SQL functions on LONG datatype www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 23
What Does Each Script Do? Ÿ Scan_Table_2_Gen_Nonascii_rows_Info. sql – – – Use table Table_Column_Info Examine tables with non ASCII character data Creates two tables Ÿ Table_Non. Ascii_info Number of rows, columns with non ASCII data Ÿ Table_Non. Ascii_SQL to extract non ASCII data from each table Useful for developers, analysts to extract data from other environments www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 24
What Does Each Script Do? Ÿ Scan_Table_3_Gen_Nonascii. Tables_No. Long. sql – – – Use tables table_gen_info, table_nonascii_sql Create copies of tables that have non ASCII data Copies contain only the non ASCII rows Ÿ Have all character columns of original table Ÿ Helps identify which country data is from – Creates tables as select * from <tablename> Ÿ Doesn’t work on tables with LONG column Ÿ Tables named NONASCII_<tablename> www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 25
What Does Each Script Do? Ÿ Scan_Table_4_Gen_Nonascii. Tables_Nonasci i. Cols_Only. sql – – Similar to third (previous) script Table copies only contain columns that have non ASCII data Does handle tables with LONG column Creates tables of form NA_CO_<tablename> Ÿ Set of tables containing all non ASCII data in the schema www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 26
What Does Each Script Do? Ÿ Scan_Table_5_Gen_Nonascii. Tables_Yes. Lon g. sql – – Creates copies of tables having non ASCII data Copy tables have all char columns of base table Only copies tables that have LONG column Companion to third script Ÿ Deals with tables that have LONG column Ÿ Tables named NONASCII_<tablename> – Now have complete set of tables Ÿ Have all non ASCII char columns of base tables www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 27
Katakana, Hiragana? Ÿ How to find Japanese character data? – – – – Look at hex dump of character data and see lots of ¥_¥ and ¤_¤ The byte code of ¥ is A 4, ¤ is A 5 Many Japanese transliterated terms (company names) start with these bytes Typical of EUCJP character set Find rows that contain '%¥_¥%' or '%¤_¤%‘ repeated ¥ or ¤ means EUCJP more likely Verify that these rows are indeed Japanese www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 28
What Does Each Script Do? Ÿ Scan_Table_6_Gen_NA_EUCJP_info_sql_col _info. sql – – For table copies with non ASCII columns only Look for specific pattern of '%¥_¥%' Or '%¤_¤%‘ Creates tables Ÿ Table_NA_EUCJP_Info Ÿ Table_NA_EUCJP_SQL Ÿ Table_NA_EUCJP_COL_INFO www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 29
6 th Script Ÿ What does each table contain? – Table_NA_EUCJP_Info Ÿ Number of EUCJP rows in each non ASCII table – Table_NA_EUCJP_SQL Ÿ SQL to extract EUCJP rows – Table_NA_EUCJP_COL_INFO Ÿ Number of EUCJP rows in each column www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 30
What Does Each Script Do? Ÿ Scan_Table_7_Gen_NA_EUCJP_Tables. sql – Create two copies of each table that has EUCJP Ÿ Contain rows that have EUCJP Ÿ First table, all char columns Ÿ Second, only EUCJP columns – Tables created have names Ÿ EUCJP_<tablename> Ÿ ECUJP_CO_<tablename> www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 31
After 7 th Script Ÿ We have identified EUCJP rows – – In non ASCII tables Copied these rows to separate tables Ÿ Delete these rows from the non ASCII tables Ÿ As we identify rows from a specific char set – – Remove them from the non ASCII tables Smaller and smaller set of unknown rows www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 32
What Does Each Script Do? Ÿ Scan_Table_8_Gen_NA_EUCJP_2_rows_inf o. sql – – – Find rows containing ¥ or ¤ Could be Japanese Could be WE www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 33
Results Ÿ For each script – – – – Time to run Output %of total db that is non ASCII Demonstrates power of this approach No attempt to speed up Ÿ Only need to scan once, no need for speed Copy prod data to separate environment Run scripts there, develop the SQL to correctly convert the non ASCII data as needed Ÿ Apply to prod as part of Unicode conversion www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 34
Results Ÿ Scripts run against copy of production db Ÿ Database – – – 25 Gb total, but 13 Gb free space 12 Gb of actual data to scan (be skeptical when people tell you they support multi terabyte dbs, size of actual data counts) Ÿ Scripts create tables in the same schema they run in www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 35
Results Ÿ Script 1 – 2 hours – – – Scanned 12 Gb of data 2483 tables, 63138 columns Created two tables Ÿ Table_gen_info Ÿ Table_column_info www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 36
1 st Script Results SQL> select * from Table_Gen_Info where rownum <=10; TABLENAME NUMROWS NUMCOLS NUMCHARCOLS NUMCLOBCOLS NUMLONGCOLS --------------- -----------ACCNT_STAT 15775 5 3 0 0 1085497 7 6 0 0 AMER_AR_T 1060 3 2 0 0 APAC_AR_OWNER 2770 6 6 0 0 AR_ADMIN 5578 35 31 0 0 AR_CON 3573 22 17 0 0 AR_STAT 88652 7 5 0 0 AUDIT_TABLE 53301 29 26 0 0 CONT_CREATED 515126 2 2 0 0 CON_CREATED 184744 2 2 0 0 AMER_AR_OWNER www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 37
1 st Script Results SQL> select * from Table_Column_Info where rownum <=20; TABLENAME NUMROWS NUMCHARCOLS CHARCOLNUM CHARCOLNAME NUMNONASCIIROWS --------------- ------------ACCNT_STAT 15775 3 1 WCD 0 ACCNT_STAT 15775 3 2 STATUS 0 ACCNT_STAT 15775 3 3 R 4_STATUS 0 AR_ADMIN 5578 31 1 R 4_ID 0 AR_ADMIN 5578 31 2 R 4_SR_NUM 0 AR_ADMIN 5578 31 3 X_DESC . . 72 20 rows selected. SQL> www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 38
2 nd Script Results Ÿ 12 minutes – – 68 tables that have non ASCII char data 68 SQL statements Ÿ Overall – – – We have 12 Gb of data 68/2483 tables have any non ASCII char data Only 3% of the tables Ÿ But they’re some of the biggest tables Ÿ Schema analysis much easier on 68 tables www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 39
2 nd Script results SQL> select * from Table_Non. Ascii_Info where rownum <= 10; TABLENAME NUMROWS NUMNONASCIIROWS NUMCOLS NUMNONASCIICOLS --------------- ----------AR_ADMIN 5578 692 35 6 AR_CON 3573 107 22 3 AUDIT_TABLE 53301 17 29 1 CX_S_ADDR_ORG_XM 69470 275 19 5 C_ACCOUNT 17897 1114 20 1 6562 933 21 6 C_ADDRESS 25590 5490 28 6 C_AR 88638 3760 26 6 C_CONTACT 52574 10401 20 3 2139 119 25 4 C_ACT C_OPTY www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 40
2 nd Script Results SQL> select * from Table_Non. Ascii_SQL where rownum <= 10; TABLENAME LENGTHNONASCIISQL ---------------NONASCIISQL ---------------------------------------------------------------------------------------------------AR_ADMIN 445 select count(*) from AR_ADMIN where 1=0 or X_DESC != CONVERT (X_DESC, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') or LAST_NAME != CONVERT (LAST_NAME, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') or FST_NAME != CONVERT (FST_NAME, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') or ACCOUNT != CONVERT (ACCOUNT, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') or OWNER_LAST_NAME != CONVERT (OWNER_LAST_NAME, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') or R 3_CREATED_LAST_NAME != CONVERT (R 3_C REATED_LAST_NAME, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') AR_CON 233 select count(*) from AR_CON where 1=0 or OWNER_LAST != CONVERT (OWNER_LAST, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') or OWNER_FST != CONVERT (OWNER_FST, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') or R 3_X_NOTES != CONVERT (R 3_X_N OTES, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') AUDIT_TABLE 100 select count(*) from AUDIT_TABLE where 1=0 or FIELD 2 != CONVERT (FIELD 2, 'US 7 ASCII', 'WE 8 ISO 8859 P 1') www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 41
3 rd Script Results Ÿ 10 minutes – – Create copies of non ASCII tables Copies contain all character columns Ÿ LONG columns not included – Creates 65 tables SQL> select table_name from user_tables where table_name like 'NONASCII%' and table_name not like '%_ORIG ‘ and rownum <= 5; TABLE_NAME ---------------NONASCII_AR_ADMIN NONASCII_AR_CON NONASCII_AUDIT_TABLE NONASCII_CX_S_ADDR_ORG_XM NONASCII_C_ACCOUNT www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 42
4 th Script Results Ÿ 7 minutes – – – Create copies of non ASCII tables Copies contain only non ASCII columns Creates 68 tables SQL> select table_name from user_tables where table_name like 'NA_CO_% ‘ and rownum <= 5; TABLE_NAME ---------------NA_CO_AR_ADMIN NA_CO_AR_CON NA_CO_AUDIT_TABLE NA_CO_CX_S_ADDR_ORG_XM NA_CO_C_ACCOUNT www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 43
5 th Script Results Ÿ 1 minute – – Create copies of non ASCII tables Copies contain all character columns Ÿ LONG column included – Creates 3 tables Ÿ only 3 non ASCII tables have LONG column TABLE_NAME NONASCII_EIM_ACCNT_DTL NONASCII_EIM_OPTY_DTL NONASCII_S_CS_QUEST_LANG www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 44
6 th Script Results Ÿ 27 minutes – – Scan non ASCII tables Find '%¥_¥%' or '%¤_¤%‘ Very likely EUCJP character set Create three tables Ÿ Table_NA_EUCJP_Info (68 tables) Ÿ Table_NA_EUCJP_SQL (5 tables) Ÿ TABLE_NA_EUCJP_COL_INFO (213 columns) – 5 tables have EUCJP character data www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 45
6 th Script Results SQL> select * from Table_NA_EUCJP_Info where rownum <= 10; TABLENAME NUM_NONASCII_ROWS NUM_NA_EUCJP_ROWS NUM_NONASCII_COLS NUM_NA_EUCJP_COLS ----------------- ---------NA_CO_AR_ADMIN 5578 9 6 1 NA_CO_AR_CON 3573 4 3 1 NA_CO_AUDIT_TABLE 53301 0 NA_CO_CX_S_ADDR_ORG_XM 69470 0 5 0 NA_CO_C_ACCOUNT 17897 0 1 0 6562 0 6 0 NA_CO_C_ADDRESS 25590 0 6 0 NA_CO_C_AR 88638 0 6 0 NA_CO_C_CONTACT 52574 0 3 0 2139 0 4 0 NA_CO_C_ACT NA_CO_C_OPTY www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 46
6 th Script Results SQL> select * from Table_NA_EUCJP_SQL; TABLENAME LEN_NA_EUCJP_SQL --------NA_EUCJP_SQL ----------------------------------------------------NA_CO_AR_ADMIN 91 select count(*) from NA_CO_AR_ADMIN where 1=0 or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%' NA_CO_AR_CON 97 select count(*) from NA_CO_AR_CON where 1=0 or R 3_X_NOTES like '%¥_¥%' or R 3_X_NOTES like '%¤_¤%' NA_CO_S_ADDR_ORG 97 select count(*) from NA_CO_S_ADDR_ORG where 1=0 or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%' NA_CO_S_CONTACT 142 select count(*) from NA_CO_S_CONTACT where 1=0 or COMMENTS like '%¥_¥%' or COMMENTS like '%¤_¤%' or X_DEPT like '%¥_¥%' or X_DEPT like '%¤_¤%' NA_CO_S_SRV_REQ 200 select count(*) from NA_CO_S_SRV_REQ where 1=0 or X_NOTES like '%¥_¥%' or X_NOTES like '%¤_¤%' or X_DESC like '%¥_¥%' or X_DESC like '%¤_¤%' or X_EMAIL_NOTES like '%¥_¥%' or X_EMAIL_NOTES like '%¤_¤%' www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 47
6 th Script Results SQL> select * from TABLE_NA_EUCJP_COL_INFO where rownum <=10; TABLENAME NUMNONASCIIROWS NUMNACOLS NACOLNUM NAEUCJPCOLNAME NUMNAEUCJPROWS ---------- ---------------NA_CO_AR_ADMIN 5578 6 1 X_DESC 9 NA_CO_AR_ADMIN 5578 6 2 LAST_NAME 0 NA_CO_AR_ADMIN 5578 6 3 FST_NAME 0 NA_CO_AR_ADMIN 5578 6 4 ACCOUNT 0 NA_CO_AR_ADMIN 5578 6 5 OWNER_LAST_NAME 0 NA_CO_AR_ADMIN 5578 6 6 R 3_CREATED_LAST_NAME 0 NA_CO_AR_CON 3573 3 1 OWNER_LAST 0 NA_CO_AR_CON 3573 3 2 OWNER_FST 0 NA_CO_AR_CON 3573 3 3 R 3_X_NOTES 4 53301 1 1 FIELD 2 0 NA_CO_AUDIT_TABLE www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 48
7 th Script Results Ÿ 6 minutes – – Create two copies of each EUCJP tables First copy has all character columns of table Second copy has only the EUCJP columns Tables named Ÿ EUCJP_<tablename> Ÿ EUCJP_CO_<tablename> www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 49
7 th Script Results SQL> select table_name from user_tables where table_name like 'EUCJP_%' minus select 2 table_name from user_tables where table_name like 'EUCJP_CO_%'; TABLE_NAME ---------------EUCJP_AR_ADMIN EUCJP_AR_CON EUCJP_S_ADDR_ORG EUCJP_S_CONTACT EUCJP_S_SRV_REQ SQL> select table_name from user_tables where table_name like 'EUCJP_CO_%'; TABLE_NAME ---------------EUCJP_CO_AR_ADMIN EUCJP_CO_AR_CON EUCJP_CO_S_ADDR_ORG EUCJP_CO_S_CONTACT EUCJP_CO_S_SRV_REQ www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 50
7 th Script Results Ÿ EUCJP rows selected Ÿ Reviewed by dev team – EUCJP of all rows verified Ÿ Make copies of these tables for reference Ÿ Delete the EUCJP rows from the non ASCII tables Ÿ Further scanning of the non ASCII tables won’t consider the EUCJP rows www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 51
8 th Script Results Ÿ 47 minutes – – – Scan non ASCII tables (again) Find '%¥%' or '%¤%‘ Could be EUCJP character set Ÿ Could also be WE character data – Create three tables Ÿ Table_NA_EUCJP_2_Info Ÿ Table_NA_EUCJP_2_SQL Ÿ TABLE_NA_EUCJP_2_COL_INFO – 3 tables have EUCJP character data www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 52
8 th Script Results Ÿ Possible EUCJP rows selected Ÿ Reviewed by dev team – EUCJP of all rows verified Ÿ Make copies of these tables for reference Ÿ Delete these EUCJP rows from the non ASCII tables Ÿ Further scanning of the non ASCII tables won’t consider these EUCJP rows www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 53
Next Steps Ÿ What I had planned Ÿ With the EUCJP rows verified and removed Ÿ Scan non ASCII tables (yet again) Ÿ Look for 8859 Pn character data – – – How? WE languages, single isolated 8 bit byte code with ASCII (7 bit) byte codes on either side Example: Bücher www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 54
Next Steps Ÿ Select likely WE rows from non ASCII tables – – Review with dev team Determine source country for each row Ÿ Schema has ‘country code’ Ÿ Select each row using character set of country – – Verify rows with fluent speaker for each country Remove rows from non ASCII tables as verified Ÿ What to do with remaining rows – Not sure… www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 55
What Really Happened? Ÿ After 8 scripts Ÿ Dev team was able to – – – Identify likely country for each non ASCII row I identified likely character set for each country I selected rows for each country Ÿ Using identified character set – Fluent speaker from each country verified Ÿ Rows as selected were correct – Wrote SQL to correctly convert rows to Unicode www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 56
Conversion Ÿ How to convert non ASCII rows to Unicode? – New db uses AL 32 UTF 8 character set Ÿ With correct character set identified Ÿ After importing into new 9 i database – – – Convert back to WE 8 MSWIN 1252 Convert to AL 32 UTF 8 Example: Ÿ UPDATE <tablename> SET <column> = CONVERT (<column>, WE 8 MSWIN 1252, AL 32 UTF 8); Ÿ UPDATE <tablename> SET <column> = CONVERT (<column>, AL 32 UTF 8, <charset>); www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 57
Script Summary Ÿ 8 scripts, scanning 12 Gb of data – Run times Ÿ 2 hours Ÿ 12 minutes Ÿ 10 minutes Ÿ 7 minutes Ÿ 1 minute Ÿ 27 minutes Ÿ 6 minutes Ÿ 47 minutes Ÿ Total run time – 230 minutes, about 4 hours – Very slow development machine www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 58
Conclusions Ÿ For character set conversion – – From any 8 bit character set (WE 8 ISO 8859 Pn) To Unicode Accept that some of the existing data may not be in the database character set Don’t assume, verify Ÿ Use PL/SQL scripts, identify non ASCII character data Ÿ Decide how to evaluate the non ASCII data Ÿ Document, test, communicate – Make sure everyone knows how data from each character set is identified www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 59
Books Used Ÿ Oracle PL/SQL By Example – – Rozenzweig, Silvestrova Prentice Hall 2004 I needed lots of examples Ÿ multiple nested cursors – Needed to get going fast Ÿ Got help from experienced PL/SQL developer – – Quotes issue Even they couldn’t explain why the specific number of quotes works…but it did www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 60
CRM Unicode Conversion Ÿ Three separate presentations – – – 1) The overall conversion process Ÿ What we had, what we wanted, how to get there Ÿ Issues that come up during conversion 2) Multi byte data in the existing CRM db Ÿ What’s the issue, how did it happen Ÿ A general method to find and fix this problem 3) The actual conversion Ÿ What really happened Ÿ Issues that came up and how they were resolved Ÿ Focus on DBA issues, not Siebel application www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 61
PL/SQL Notes Ÿ Quotes of quotes – – – Hard to know how many you need Experiment Test Ÿ PL/SQL that generates SQL that contains quoted strings Ÿ Keep it simple Ÿ Break up the task into multiple scripts Ÿ Generate tables of results, next script uses table(s) as input – Tables provide documentation of intermediate results www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 62
PL/SQL Notes Ÿ Second script – – Looping to build up select SQL Selects data from all non ASCII columns Ÿ Initial select SQL has to be – – – Non. Ascii. SQL_stmt : = 'select count(*) from '||Table. Name||' where 1=0 Subsequent SQL of form Non. Ascii. SQL_stmt : = Non. Ascii. SQL_stmt||' or '||Table. Char. Col. Name|| Needed ‘where 1=0 so we could append further OR clauses www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 63
PL/SQL Notes Ÿ LONG datatype – Third script created tables as select * from Ÿ Can’t do this when table has LONG column – Fourth script create tables by building up the create table SQL one column at a time Ÿ Skip the LONG column, if present in base table www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 64
PL/SQL Notes Ÿ DBMS_OUTPUT limitations – – Only works for so long Has limit of 1 M characters Ÿ Scripts are not commercial grade – Testing statements are left in Ÿ Commented out – – – No error trapping Still development scripts They work, but they aren’t pretty www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 65
PL/SQL Notes Ÿ Scripts setup to – – Run in SQL*Plus user’s schema Output tables created in user’s schema Ÿ Could easily change scripts – – Store output tables in separate schema Take a schema as input Ÿ Scan tables in specified schema www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 66
PL/SQL Script Example Ÿ Show PL/SQL of first script – – – Cursors with definitions that depend on loop variable of outer loop Quotes and more quotes Generating insert statements that are inserting strings of SQL www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 67
6 th Script Text serveroutput on size 1000000; Sql_stmt VARCHAR 2(4000); declare Sql_stmt 2 VARCHAR 2(4000) : = 'COMMIT'; cursor C_Euc. Jp. Tab. Names is NA_EUCJP_SQL_stmt VARCHAR 2(4000); select table_name from user_tables NA_EUCJP_SQL_stmt_insert VARCHAR 2(4000); where table_name like 'NA_CO_%'; NACol. Count NUMBER; cursor C_Euc. Jp. Tab. Cols (i_table_name varchar 2) is BEGIN select column_name from user_tab_columns where table_name = i_table_name dbms_output. disable; order by column_id; Sql_stmt : = 'create table Table_NA_EUCJP_Info Table. Name VARCHAR 2(100); (Table. Name VARCHAR 2(30), Table. Row. Count NUMBER; NUM_NONASCII_ROWS NUMBER, Col. Count NUMBER; NUM_NA_EUCJP_ROWS NUMBER, Table. Char. Col. Name VARCHAR(100); NUM_NONASCII_COLS NUMBER, Num. Ascii. Plus. Non NUMBER; NUM_NA_EUCJP_COLS NUMBER)'; execute immediate Sql_stmt; Table. Char. Col. Num NUMBER; Num_NA_EUCJP_Rows NUMBER; Sql_stmt : = 'create table Table_NA_EUCJP_SQL Tab. Num_NA_EUCJP_Rows NUMBER; (Table. Name VARCHAR 2(30), Len_NA_EUCJP_SQL_stmt NUMBER; Len_NA_EUCJP_SQL NUMBER, Tab. Num_NA_EUCJP_Cols NUMBER; NA_EUCJP_SQL VARCHAR 2(4000))'; Cur. Num_NA_EUCJP_Cols NUMBER; execute immediate Sql_stmt; www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 68
6 th Script Text Sql_stmt : = 'create table Table_NA_EUCJP_Col_Info (Table. Name VARCHAR 2(30), Table. Char. Col. Num : = 0; Cur. Num_NA_EUCJP_Cols : = 0; NUMNONASCIIROWS NUMBER, NUMNACOLS NUMBER, NACOLNUM NUMBER, NAEUCJPCOLNAME VARCHAR 2(30), NUMNAEUCJPROWS NUMBER)'; execute immediate Sql_stmt; open C_Euc. Jp. Tab. Cols (Table. Name); LOOP FETCH C_Euc. Jp. Tab. Cols into Table. Char. Col. Name; Exit when C_Euc. Jp. Tab. Cols%NOTFOUND; dbms_output. put_line('This is column '||Table. Char. Col. Name); Table. Char. Col. Num : = Table. Char. Col. Num + 1; open C_Euc. Jp. Tab. Names; compute the number of EUCJP rows for this column. . . LOOP FETCH C_Euc. Jp. Tab. Names into Table. Name; Exit when C_Euc. Jp. Tab. Names%NOTFOUND; execute immediate 'select count(*) from '||Table. Name|| ' where '||Table. Char. Col. Name||' like ''%¥_¥%'' or ' ||Table. Char. Col. Name||' like ''%¤_¤%''' into Num_NA_EUCJP_Rows; NA_EUCJP_SQL_stmt : = 'select count(*) from '||Table. Name||' where 1=0'; NA_EUCJP_SQL_stmt_insert : = '''select count(*) from '||Table. Name||' where 1=0'; execute immediate 'select count(*) from user_tab_columns where table_name = ''' || Table. Name || '''' into NACol. Count; dbms_output. put_line('here is the NA_EUCJP_SQL_stmt_insert '); dbms_output. put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'', 1, 255)); dbms_output. put_line('This column has '||Num_NA_EUCJP_Rows||' NA_EUCJP_ rows'); IF Num_NA_EUCJP_Rows != 0 THEN NA_EUCJP_SQL_stmt : = NA_EUCJP_SQL_stmt||' or '||Table. Char. Col. Name|| ' like ''%¥_¥%'' or '||Table. Char. Col. Name||' like ''%¤_¤%'''; dbms_output. put_line('table name is '||Table. Name); NA_EUCJP_SQL_stmt_insert : = NA_EUCJP_SQL_stmt_insert||' or '||Table. Char. Col. Name|| execute immediate 'select count(*) from '||Table. Name into Table. Row. Count; ' like ''''%¥_¥%'''' or '||Table. Char. Col. Name||' like ''''%¤_¤%'''''; www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 69
6 th Script Text Cur. Num_NA_EUCJP_Cols : = Cur. Num_NA_EUCJP_Cols + 1; dbms_output. put_line('This is NA_EUCJP_Column number '||Cur. Num_NA_EUCJP_Cols); dbms_output. put_line('here is Cur. Num_NA_EUCJP_Cols'); dbms_output. put_line(Cur. Num_NA_EUCJP_Cols); dbms_output. put_line('SQL statement appended. . . '); END IF; insert column info. . . Dummy_col_count : = 999; Sql_stmt : = 'insert into Table_NA_EUCJP_Col_Info values ('''||Table. Name||''', '||Table. Row. Count|| ', '||NACol. Count||', '||Table. Char. Col. Num||', '''||Table. Char. Col. Name||''', '||Num_NA_EUCJP_Rows||')'; execute immediate Sql_stmt; dbms_output. put_line('Column info insert completed. . . '); End Loop; NA_EUCJP_SQL_stmt_insert : = NA_EUCJP_SQL_stmt_insert||''''; dbms_output. put_line('here is the NA_EUCJP_SQL_stmt_insert '); dbms_output. put_line(SUBSTR(''||NA_EUCJP_SQL_stmt_insert||'', 1, 255)); www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 70
6 th Script Text Tab. Num_NA_EUCJP_Cols: = Cur. Num_NA_EUCJP_Cols; dbms_output. put_line('here is Tab. Num_NA_EUCJP_Cols'); dbms_output. put_line(Tab. Num_NA_EUCJP_Cols); update number of NAEUCJP columns. . . Sql_stmt : = 'update Table_NA_EUCJP_Col_Info set NUMNAEUCJPCOLS = Tab. Num_NA_EUCJP_Cols where Table. Name = '''||Table. Name||''; execute immediate Sql_stmt; dbms_output. put_line('Number of NAEUCJP columns updated. . . '); Close C_Euc. Jp. Tab. Cols; Len_NA_EUCJP_SQL_stmt : = LENGTH (NA_EUCJP_SQL_stmt); dbms_output. put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt); dbms_output. put_line('here is the NA_EUCJP_SQL_stmt'); dbms_output. put_line(SUBSTR(''||NA_EUCJP_SQL_stmt||'', 1, 255)); this has already been done above. . . execute immediate 'select count(*) from '||Table. Name into Table. Row. Count; execute immediate 'select count(*) from user_tab_columns where table_name = ''' || Table. Name || '''' into Col. Count; www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 71
6 th Script Text NA_EUCJP_SQL_stmt : = 'testing'; Tab. Num_NA_EUCJP_Rows : = 0; execute immediate NA_EUCJP_SQL_stmt into Tab. Num_NA_EUCJP_Rows; dbms_output. put_line('Number of NA_EUCJP_ rows. . . '||Tab. Num_NA_EUCJP_Rows); Len_NA_EUCJP_SQL_stmt : = 0; dbms_output. put_line('Num rows in the table '||Table. Row. Count); dbms_output. put_line('Num columns in the table '||Col. Count); dbms_output. put_line('Length of NA_EUCJP_SQL stmt '||Len_NA_EUCJP_SQL_stmt); dbms_output. put_line('Num NAEUCJP_ Rows '||Tab. Num_NA_EUCJP_Rows); dbms_output. put_line('Num NAEUCJP_ Columns '||Tab. Num_NA_EUCJP_Cols); Sql_stmt : = 'insert into Table_NA_EUCJP_Info values ('''||Table. Name||''', '||Table. Row. Count|| ', '||Tab. Num_NA_EUCJP_Rows||', '||Col. Count||', '||Tab. Num_NA_EUCJP_Cols||')'; execute immediate Sql_stmt; dbms_output. put_line('First insert completed. . . '); If number of EUCJP rows is non zero, insert select SQL into SQL table IF Tab. Num_NA_EUCJP_Rows != 0 THEN Sql_stmt : = 'insert into Table_NA_EUCJP_SQL values ('''||Table. Name||''', '||Len_NA_EUCJP_SQL_stmt|| ', '||NA_EUCJP_SQL_stmt_insert||')'; execute immediate Sql_stmt; dbms_output. put_line('Second insert completed. . . '); www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 72
6 th Script Text End If; execute immediate Sql_stmt 2; End Loop; Close C_Euc. Jp. Tab. Names; End; / www. brianhitchcock. net Brian Hitchcock November 11, 2004 DCSIT Technical Services DBA Page 73
- Slides: 73