Oracle Character Sets
Aino Andriessen
- Slides: 23
Demo 1
nls_length_semantics
Initialization parameter: CHAR or BYTE (default)
Relevant for multibyte character sets
Defines the unit in which the length of character columns and variables is expressed
alter session set nls_length_semantics=CHAR;
§ not retroactive
§ PL/SQL may need to be recompiled
§ alter system

nls_length_semantics (2)
Specify the length unit of character columns and variables explicitly
§ create table demo (naam varchar2(4 char))
§ create table demo (naam varchar2(4 byte))
§ t_naam varchar2(4 char);
§ t_naam demo2.naam%TYPE

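The difference between `4 char` and `4 byte` can be illustrated outside the database. The following is a minimal Python sketch, assuming a UTF-8 database character set; `fits_char` and `fits_byte` are hypothetical helper names, not Oracle functions.

```python
# Sketch: CHAR semantics count characters, BYTE semantics count encoded bytes.
# fits_char / fits_byte are illustrative helpers, not part of any Oracle API.

def fits_char(value: str, limit: int) -> bool:
    """CHAR semantics: the limit is a number of characters."""
    return len(value) <= limit


def fits_byte(value: str, limit: int, encoding: str = "utf-8") -> bool:
    """BYTE semantics: the limit is a number of encoded bytes."""
    return len(value.encode(encoding)) <= limit


name = "Ren\u00e9"  # "René": 4 characters, but 5 bytes in UTF-8

print(fits_char(name, 4))            # True
print(fits_byte(name, 4))            # False: é needs 2 bytes in UTF-8
print(fits_byte(name, 4, "cp1252"))  # True: single-byte character set
```

In a single-byte character set both definitions give the same result, which is why the parameter only matters for multibyte character sets.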
Demo 2
Character encoding
Character set
A character set defines the mapping between a binary/hexadecimal code and the character
§ UTF8
§ WE8MSWIN1252
§ WE8ISO8859P1
§ JA16EUC
§ US7ASCII
§ WE8DEC
§ ...
Code pages
§ IBM / Windows terminology
§ roughly analogous to a character set
§ one code page per language

Character sets (2)
ASCII
§ 1 byte (7 bits used)
§ 128 characters
§ standard English letters, without accents
ISO 8859 and Latin-1
§ 1 byte (8 bits)
§ 256 characters
CP-1252
§ Windows variant of Latin-1
UTF-8
§ variable width, multibyte
§ at most 4 bytes
§ ~100,000 characters
  • ~1 million available
§ multilingual
§ ASCII codes are identical

Examples

Character set     Hexadecimal code: Euro (€)
AL32UTF8          E282AC
WE8MSWIN1252      80
ASCII             -
WE8ISO8859P15     A4 (164)

Character set     Hexadecimal code: é
AL32UTF8          C3A9 (50089)
WE8MSWIN1252      E9 (233)
ASCII             -
WE8ISO8859P1      E9
WE8ISO8859P15     E9

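The byte values in these examples can be checked with any tool that knows the encodings. Below is a small Python sketch; the mapping from Oracle character set names to Python codec names is an assumption made for illustration, and `hex_code` is a hypothetical helper.

```python
# Sketch: print the hexadecimal code of a character in several character sets.
# The Oracle-name -> Python-codec mapping is an assumption for this example.
ORACLE_TO_CODEC = {
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "US7ASCII": "ascii",
    "WE8ISO8859P1": "latin_1",
    "WE8ISO8859P15": "iso8859_15",
}


def hex_code(char: str, codec: str) -> str:
    """Return the encoded bytes as uppercase hex, or '-' if not representable."""
    try:
        return char.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return "-"


for char in ("\u20ac", "\u00e9"):  # euro sign, e with acute accent
    print(f"U+{ord(char):04X}")
    for oracle_name, codec in ORACLE_TO_CODEC.items():
        print(f"  {oracle_name:<14} {hex_code(char, codec)}")
```

The euro sign is not representable in Latin-1 (ISO 8859-1), which is one reason ISO 8859-15 and CP-1252 exist as variants.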
Unicode / UTF-8 example
The image shows the number of bytes needed to store different kinds of characters in the UTF-8 character set. The ASCII characters (C, t, and d) require one byte. The Latin and Greek characters (á, ö, and Ø) require 2 bytes. The Asian character requires 3 bytes. The supplementary character (treble clef sign) requires 4 bytes of storage.

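Since the image itself is not reproduced here, the byte counts can be recomputed. A minimal Python sketch follows; the sample characters are chosen for illustration (the Asian character on the slide is unknown, so U+65E5 is an assumed stand-in).

```python
# Sketch: number of bytes UTF-8 needs for different kinds of characters.
# The sample characters are illustrative; U+65E5 stands in for the
# unnamed Asian character on the slide.

def utf8_len(char: str) -> int:
    """Number of bytes used to encode one character in UTF-8."""
    return len(char.encode("utf-8"))


samples = {
    "ASCII letter C": "C",
    "Latin a with acute": "\u00e1",
    "Latin o with diaeresis": "\u00f6",
    "CJK ideograph U+65E5": "\u65e5",
    "Treble clef U+1D11E": "\U0001D11E",
}

for label, char in samples.items():
    print(f"{label:<24} U+{ord(char):04X}  {utf8_len(char)} byte(s)")
```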
Diacritics and special characters
Diacritics are marks placed on (above, below or even through) a letter to change its pronunciation, giving the modified letter a sound that is specific to the language.
§ àÿęňĜş etc.
Special characters
§ ßæ¿

Diacritics and special characters (2)
Single-byte character sets
§ 1 byte for a composed character
§ not all combinations are possible
§ code pages
UTF-8
§ a diacritic has its own encoding
§ a composed character has its own encoding
  • usually (always) equivalent to the original character + diacritic

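The two Unicode representations of an accented letter (one precomposed code point, or base letter plus combining diacritic) can be shown with Python's standard `unicodedata` module. This is a sketch for illustration; it says nothing about how a particular Oracle version normalizes data.

```python
# Sketch: a composed character versus base letter + combining diacritic.
import unicodedata

composed = "\u00e9"                                  # é as one code point
decomposed = unicodedata.normalize("NFD", composed)  # "e" + U+0301

print([f"U+{ord(c):04X}" for c in composed])    # ['U+00E9']
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0065', 'U+0301']

# Both look the same on screen but are stored as different bytes.
print(composed.encode("utf-8").hex())    # c3a9
print(decomposed.encode("utf-8").hex())  # 65cc81

# A plain comparison sees them as different; normalizing first fixes that.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A single-byte character set such as CP-1252 has no combining acute accent, so only the precomposed form can be stored there.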
Database functions
Character functions
§ substr - substrb - substrc - substr2
§ instr - ...
§ length - lengthb
chr(n)
§ Returns the character corresponding to the number passed in as the argument, in the database character set
§ select chr(50089) from dual;
dump
§ Returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr. The returned result is always in the database character set.
§ select dump(naam, 1017) from demo2;
convert
§ Converts a character string from one character set to another
utl_raw
§ select utl_raw.cast_to_raw(naam) from demo2;
unistr()
§ Converts the characters in x to the national language character set
§ select unistr('Ren\00e9') from dual;

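The number 50089 passed to `chr` is simply the UTF-8 byte sequence C3 A9 read as one integer. The Python sketch below mimics that idea, assuming a UTF-8 database character set; `chr_utf8` and `dump_hex` are hypothetical helpers that only loosely resemble `chr` and `dump`, and do not reproduce Oracle's exact output format.

```python
# Sketch: what chr(50089) and dump(...) convey, assuming a UTF-8 database.
# chr_utf8 and dump_hex are illustrative helpers, not Oracle functions.

def chr_utf8(n: int) -> str:
    """Interpret n as a big-endian UTF-8 byte sequence and decode it."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big").decode("utf-8")


def dump_hex(value: str, encoding: str = "utf-8") -> str:
    """Comma-separated hex bytes of the stored value."""
    return ",".join(f"{b:x}" for b in value.encode(encoding))


print(chr_utf8(50089))     # é  (bytes C3 A9)
print(chr_utf8(0xE282AC))  # euro sign (bytes E2 82 AC)
print(chr_utf8(65))        # A

name = "Ren\u00e9"         # the same string as unistr('Ren\00e9')
print(dump_hex(name))            # 52,65,6e,c3,a9
print(dump_hex(name, "cp1252"))  # 52,65,6e,e9
```

Note that 233 (E9), the code for é in CP-1252, is not a valid UTF-8 sequence on its own, so the same number does not produce the same character in every character set.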
Demo 3
nls_lang
Client character set
When the client NLS_LANG character set is set to the same value as the database character set, Oracle assumes that the data being sent or received already has the same (correct) encoding, so for performance reasons no conversion or validation is performed. The data is simply stored as delivered by the client, bit for bit.

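The consequence of this pass-through can be simulated without a database. The Python sketch below assumes a UTF-8 database and a client that really produces CP-1252 bytes while its NLS_LANG claims UTF-8; the variable names are illustrative only.

```python
# Sketch: a client sends CP-1252 bytes but declares the database character
# set (UTF-8), so the bytes are stored unconverted and unvalidated.

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "Ren\u00e9"                       # what the user typed: René
sent_by_client = text.encode("cp1252")   # b'Ren\xe9'
stored_in_db = sent_by_client            # no conversion: stored bit for bit

# The same misconfigured client reads its own bytes back and sees no problem.
print(stored_in_db.decode("cp1252"))     # René

# A correctly configured UTF-8 client gets invalid data.
print(is_valid_utf8(stored_in_db))                     # False
print(stored_in_db.decode("utf-8", errors="replace"))  # Ren + replacement char
```

This is why such data can look correct for a long time and only turn out to be corrupt once a different client, or a character set migration, touches it.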
nls_lang (2)
language_country.character set
§ american_america.UTF8
§ dutch_the netherlands.WE8MSWIN1252
§ american_THE NETHERLANDS.WE8MSWIN1252
Environment variable: nls_lang
Differs between the Windows GUI (WE8MSWIN1252) and the command line (WE8PC850)
Not used by Java clients

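The GUI versus command-line difference matters because the same character has a different byte value in the two code pages. A minimal Python sketch, assuming Python's `cp1252` and `cp850` codecs correspond to WE8MSWIN1252 and WE8PC850:

```python
# Sketch: é in the Windows GUI code page versus the command-line code page.
# Assumes Python's cp1252 / cp850 codecs match WE8MSWIN1252 / WE8PC850.

e_acute = "\u00e9"

gui_bytes = e_acute.encode("cp1252")  # Windows GUI
cmd_bytes = e_acute.encode("cp850")   # Windows command line

print(gui_bytes.hex())  # e9
print(cmd_bytes.hex())  # 82

# Bytes typed on the command line but declared as WE8MSWIN1252
# are interpreted as a different character.
misread = cmd_bytes.decode("cp1252")
print(f"U+{ord(misread):04X}")  # U+201A, a low quotation mark instead of é
```

Setting nls_lang to match the code page the client program actually uses avoids this kind of misinterpretation.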
Demo 4
National character set
Support for a second character set next to the database character set
§ e.g. to allow Japanese in a MSWIN1252 or ISO 8859 database
Less necessary in a UTF8 database
Multibyte
nvarchar, nclob etc.

Case
TELETEX character set
§ no longer exists in Oracle
select convert(naam, 'TELETEX', 'UTF8') from tabel;
Locale builder

sql> select name from emp@db
sql> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw(name)) from emp@db
sql> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw@db(name)) from emp@db

Question
Diacritic-insensitive search
Case-insensitive search
Oracle interMedia

Summary
nls_length_semantics
Always explicitly define a character column with its type (CHAR or BYTE)
Oracle performs automatic character set conversion
§ wysinawyg
Use a Java client
Working with character sets can be confusing
UTF8 is often the preferred character set

References
Unicode and UltraEdit
§ http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html
nls_lang
§ http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm
Oracle globalization support
§ http://download.oracle.com/docs/cd/B28359_01/server.111/b28298/toc.htm
Wikipedia