Open World 2003 Advanced PLSQL and Oracle ETL

  • Slides: 41
Download presentation
Open World 2003 Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA Sage.

Open World 2003 Advanced PL/SQL and Oracle ETL Doug Cosman Senior Oracle DBA Sage. Logix, Inc. doug_cosman@yahoo. com www. Sage. Logix. com

Agenda Overview of Oracle 9 i ETL Provides Fast Transformations Using Only the 9

Agenda Overview of Oracle 9 i ETL Provides Fast Transformations Using Only the 9 i DB Advanced PL/SQL Features Necessary for Understanding Oracle 9 i ETL PL/SQL Performance Techniques for Data Warehouse Environments www. Sage. Logix. com

What Is ETL? Extract Pull the Data From the Source Transform Convert the Input

What Is ETL? Extract Pull the Data From the Source Transform Convert the Input Format to the Target Format Encode any Values Load Insert the Transformed Data to the Target Tables www. Sage. Logix. com

Oracle 9 i ETL Extract Oracle 9 i External Tables Transform PL/SQL Pipelined Table

Oracle 9 i ETL Extract Oracle 9 i External Tables Transform PL/SQL Pipelined Table Functions Oracle Warehouse Builder Can also be Used to Build Pipelined Table Functions Maps Source Data Layout and Target Schema and Builds PL/SQL and SQL Code Load Direct Path Inserts www. Sage. Logix. com

Performance Factors SQL Execution Time Efficiency of Execution Plan Hardware Resource Waits Code Logic

Performance Factors SQL Execution Time Efficiency of Execution Plan Hardware Resource Waits Code Logic Execution Time Speed of Host Language Variable Binding Time to Bind Values Back to Host Language www. Sage. Logix. com

PL/SQL Binding Types of Binds IN-Binds Bind Values From Host Language to SQL Engine

PL/SQL Binding Types of Binds IN-Binds Bind Values From Host Language to SQL Engine OUT-Binds Values are Returned from SQL Objects to Host Variables Bind Options Single Row Binds Bulk Binds www. Sage. Logix. com

Single Row Binds Cursor FOR-LOOP DECLARE CURSOR cust_cur (p_customer_id NUMBER) IS SELECT * FROM

Single Row Binds Cursor FOR-LOOP DECLARE CURSOR cust_cur (p_customer_id NUMBER) IS SELECT * FROM f_sales_detail WHERE customer_id = p_customer_id; v_customer_id NUMBER : = 1234; BEGIN FOR rec IN cust_cur (v_customer_id) LOOP INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, rec. sales_id, sysdate); END LOOP; END; www. Sage. Logix. com

Context Switching OUT-BIND DB SQL Engine PL/SQL Engine IN-BIND www. Sage. Logix. com

Context Switching OUT-BIND DB SQL Engine PL/SQL Engine IN-BIND www. Sage. Logix. com

Single Row Binds The Most Expensive Operation by Far is the Binding Single Row

Single Row Binds The Most Expensive Operation by Far is the Binding Single Row Binding is SLOW for Large Result Sets www. Sage. Logix. com

Bulk Binding PL/SQL Bulk Bind Support added in 8 i IN-Binds An Array of

Bulk Binding PL/SQL Bulk Bind Support added in 8 i IN-Binds An Array of Values is Passed to the SQL Engine OUT-Binds SQL Engine Populates a PL/SQL Bind-Array Context Switch Once per Batch Instead of Once per Row Performance Increase of Up to 15 Times www. Sage. Logix. com

Bulk Operators BULK COLLECT Specifies that Bulk Fetches Should be Used Be Careful to

Bulk Operators BULK COLLECT Specifies that Bulk Fetches Should be Used Be Careful to Handle Last Batch LIMIT Defines the Batch Size for Bulk Collections FORALL Bulk DML Operator Not a Looping Construct like a Cursor-For-Loop PL/SQL Table is Referenced in the Statement www. Sage. Logix. com

DECLARE TYPE sales_t IS TABLE OF f_sales_detail. sales_id%TYPE INDEX BY BINARY_INTEGER; sales_ids sales_t; v_customer_id

DECLARE TYPE sales_t IS TABLE OF f_sales_detail. sales_id%TYPE INDEX BY BINARY_INTEGER; sales_ids sales_t; v_customer_id NUMBER : = 1234; max_rows CONSTANT NUMBER : = 10000; CURSOR sales(p_customer_id NUMBER) IS SELECT sales_id FROM f_sales_detail WHERE customer_id = p_customer_id; BEGIN OPEN sales(v_customer_id); LOOP EXIT WHEN sales%NOTFOUND; FETCH sales BULK COLLECT INTO sales_ids LIMIT max_rows; FORALL i IN 1. . sales_ids. COUNT INSERT INTO sales_hist (customer_id, detail_id, process_date) VALUES (v_customer_id, sales_ids(i), sysdate); END LOOP; CLOSE sales; END; www. Sage. Logix. com

Native Compilation Allows PL/SQL to be Executed as a Compiled C Program Requires Native

Native Compilation Allows PL/SQL to be Executed as a Compiled C Program Requires Native C Compiler on Host Enabling Set init. ora PLSQL_* Parameters Compile as Native Code PL/SQL is First Compiled Down to P-Code C Source Code is Generated from the P-Code Native Compiler is Invoked Creating a ‘C’ Shared Object Library Subsequent Calls to PL/SQL Object are Run by the ‘C’ Library www. Sage. Logix. com

Native Compilation Performance Language Execution Speed is About Five Times Faster when not Interacting

Native Compilation Performance Language Execution Speed is About Five Times Faster when not Interacting with the Database In Typical Code Interacting with Larger Data Volumes Execution Speed is Very Similar to Interpreted Code Remember that Variable Binding can be a Bigger Factor than Code Execution Speed Mixing Native and Interpreted PL/SQL Oracle Recommends an All or None Approach for Production Including User-Defined and Supplied Packages www. Sage. Logix. com

Collection Types Associative Arrays (PL/SQL Tables) PL/SQL Type Only Nested Tables Shared Type Varrays

Collection Types Associative Arrays (PL/SQL Tables) PL/SQL Type Only Nested Tables Shared Type Varrays Shared Type www. Sage. Logix. com

Associative Arrays PL/SQL Type Only Not a SQL Type Easy to Use Automatic Element

Associative Arrays PL/SQL Type Only Not a SQL Type Easy to Use Automatic Element Allocation No Need to Initialize Two Kinds in 9 i Release 2 INDEX BY BINARY_INTEGER INDEX BY VARCHAR 2 Similar to: Java Hashtables Perl and Awk Associative Arrays www. Sage. Logix. com

Associative Arrays DECLARE TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR 2(30); email_map

Associative Arrays DECLARE TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR 2(30); email_map hash_table_t; CURSOR users IS SELECT username, user_id FROM dba_users; BEGIN FOR user IN users LOOP email_map(username) : = user_id; END LOOP; END; www. Sage. Logix. com

Multi-Dimensional Arrays New in 9 i Release 1 Implemented as Collection of Collections DECLARE

Multi-Dimensional Arrays New in 9 i Release 1 Implemented as Collection of Collections DECLARE TYPE element IS TABLE OF NUMBER INDEX BY BINARY_INTEGER; TYPE two. Dimensional IS TABLE OF element INDEX BY BINARY_INTEGER; two. Dimensional; BEGIN two. D(1)(1) : = 123; two. D(1)(2) : = 456; END; www. Sage. Logix. com

Nested Tables No Maximum Size Harder to Use than Associative Arrays Need to be

Nested Tables No Maximum Size Harder to Use than Associative Arrays Need to be Initialized Code Must Explicitly Allocate New Elements Shared Type with SQL Two Options for Type Definition Local PL/SQL Definition Global SQL Type Declared in the Database Allows Variables to be Shared Between Both Environments www. Sage. Logix. com

Nested Tables PL/SQL Scoped Type DECLARE TYPE nest_tab_t IS TABLE OF NUMBER; nt nest_tab_t

Nested Tables PL/SQL Scoped Type DECLARE TYPE nest_tab_t IS TABLE OF NUMBER; nt nest_tab_t : = nest_tab_t(); BEGIN FOR i IN 1. . 100 LOOP nt. EXTEND; nt(i) : = i; END LOOP; END; www. Sage. Logix. com

Nested Tables Globally Defined in SQL CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT (

Nested Tables Globally Defined in SQL CREATE OR REPLACE TYPE email_demo_obj_t AS OBJECT ( email_id NUMBER, demo_code NUMBER, value VARCHAR 2(30) ); / CREATE OR REPLACE TYPE email_demo_nt_t AS TABLE OF email_demo_obj_t; / www. Sage. Logix. com

Nested Tables SQL-Defined Nested Tables PL/SQL Variables can be Manipulated by the SQL Engine

Nested Tables SQL-Defined Nested Tables PL/SQL Variables can be Manipulated by the SQL Engine Local PL/SQL Variables Can Be: Sorted Aggregated Used for Dynamic In-Lists Joined With SQL Tables Joined with Other PL/SQL Nested Tables www. Sage. Logix. com

Table Functions Nested Tables Enable Table Functions SELECT * FROM TABLE( CAST(eml_dmo_nt AS email_demo_nt_t)

Table Functions Nested Tables Enable Table Functions SELECT * FROM TABLE( CAST(eml_dmo_nt AS email_demo_nt_t) ) TABLE Operator Tells Oracle to Treat the Variable as a SQL Table CAST Operator Explicitly Tells Oracle the Data Type to be Used to Handle the Operation www. Sage. Logix. com

Table Function Example DECLARE eml_dmo_nt email_demo_nt_t : = email_demo_nt_t(); BEGIN -- Some logic that

Table Function Example DECLARE eml_dmo_nt email_demo_nt_t : = email_demo_nt_t(); BEGIN -- Some logic that populates the nested table … eml_dmo_nt. EXTEND(3); eml_dmo_nt(1) : = email_demo_obj_t(45, 3, '23'); eml_dmo_nt(2) : = email_demo_obj_t(22, 3, '41'); eml_dmo_nt(3) : = email_demo_obj_t(18, 7, 'over_100 k'); -- Process the data in assending order of email id. FOR r IN (SELECT * FROM TABLE(CAST(eml_dmo_nt AS email_demo_nt_t)) ORDER BY 1) LOOP dbms_output. put_line(r. email_id || ' ' || r. demo_id); END LOOP; END; www. Sage. Logix. com

Returning Result Sets Returning Collections Directly Return the Data Structure Itself Returning Reference Cursors

Returning Result Sets Returning Collections Directly Return the Data Structure Itself Returning Reference Cursors Returns and Open Cursor to an Application Doesn’t Return Data from PL/SQL Directly Calling a Table Function from the SQL Context Convert Function Return Value into a Cursor www. Sage. Logix. com

Returning Collections Return a Collection Type Explicitly Best Suited for PL/SQL Calling Programs FUNCTION

Returning Collections Return a Collection Type Explicitly Best Suited for PL/SQL Calling Programs FUNCTION get_email_demo(p_email_id NUMBER) RETURN email_demo_nt_t IS eml_dmo email_demo_nt_t; BEGIN SELECT email_demo_obj_t(email_id, demo_id, value) BULK COLLECT INTO eml_dmo FROM email_demographic WHERE email_id = p_email_id; -- Apply some business logic on the nested table here. RETURN eml_dmo; END; www. Sage. Logix. com

Table Functions Can be Used in a SQL Context Too A Table Function Takes

Table Functions Can be Used in a SQL Context Too A Table Function Takes a Collection Type as an Argument A Function that Returns a Collection Works Too Allows us to Pass Out PL/SQL Collections as a Cursor to any Host Language SELECT * FROM TABLE( CAST( get_email_demo(45) AS email_demo_nt_t)); www. Sage. Logix. com

Table Functions Data is Buffered in the Local Variable During Function Execution Cursor Returns

Table Functions Data is Buffered in the Local Variable During Function Execution Cursor Returns Rows after Function Completes Private Memory Issues if the Result Set is Large Need a Way to Stream Results 9 i Pipelined Table Functions Provides a Streaming Interface Rows are Returned as they are Produced Rows are Actually Buffered in Small Batches Remember Bulk Binding Issue? Can be Run in Parallel PIPELINED Keyword PIPE ROW Operator www. Sage. Logix. com

Pipelined Table Function FUNCTION get_email_demo RETURN email_demo_nt_t PIPELINED IS CURSOR email_demo_cur IS SELECT email_demo_obj_t(email_id,

Pipelined Table Function FUNCTION get_email_demo RETURN email_demo_nt_t PIPELINED IS CURSOR email_demo_cur IS SELECT email_demo_obj_t(email_id, demo_id, value) FROM email_demographic; eml_dmo_nt email_demo_nt_t; BEGIN OPEN email_demo_cur; LOOP EXIT WHEN email_demo_cur%NOTFOUND; FETCH email_demo_cur BULK COLLECT INTO eml_dmo_nt LIMIT 1000; FOR i IN 1. . eml_dmo_nt. COUNT LOOP /* Apply some business logic on the object here, and return a row. */ PIPE ROW (eml_dmo_nt(i)); END LOOP; RETURN; END; www. Sage. Logix. com

External Tables One Last Piece of Background Information Oracle 9 i External Tables Provides

External Tables One Last Piece of Background Information Oracle 9 i External Tables Provides a Way for Oracle to Read Directly from Flat Files on the Database Server File can be Queried as if it is a Real Database Table Can Sort, Aggregate, Filter Rows, etc. External File Can be Queried in Parallel Only Table Definition is Stored in the Database Data is ‘External’ Table Definition is Similar to SQL*Loader Control File www. Sage. Logix. com

External Tables CREATE TABLE ext_tab (email VARCHAR 2(50), age NUMBER, income VARCHAR 2(20)) ORGANIZATION

External Tables CREATE TABLE ext_tab (email VARCHAR 2(50), age NUMBER, income VARCHAR 2(20)) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY data_dir ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE LOGFILE data_dir: 'ext_tab. log' BADFILE data_dir: 'ext_tab. bad' FIELDS TERMINATED BY ', ' MISSING FIELD VALUES ARE NULL (email CHAR(50), age INTEGER EXTERNAL(2), income CHAR(20) ) ) LOCATION ('ext_tab. dat') ) REJECT LIMIT UNLIMITED; www. Sage. Logix. com

ETL Example • Normalize, Encode and Pivot Input Record EMAIL AGE INCOME John. Doe@excite.

ETL Example • Normalize, Encode and Pivot Input Record EMAIL AGE INCOME John. Doe@excite. com 58 over_100 k EMAIL_ID DEMO_CODE VALUE 2345 3 58 2345 7 over_100 k www. Sage. Logix. com

PACKAGE BODY etl IS TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR 2(30);

PACKAGE BODY etl IS TYPE hash_table_t IS TABLE OF NUMBER INDEX BY VARCHAR 2(30); email_map hash_table_t; FUNCTION transform (new_data SYS_REFCURSOR) RETURN email_demo_nt_t PIPELINED PARALLEL_ENABLE ( PARTITION new_data BY ANY ) IS TYPE ext_tab_array IS TABLE OF ext_tab%ROWTYPE INDEX BY BINARY_INTEGER; indata ext_tab_array; email_demo_obj_t : = email_demo_obj_t(null, null); demo_map hash_table_t; BEGIN LOOP EXIT WHEN new_data%NOTFOUND; FETCH new_data BULK COLLECT INTO indata LIMIT 1000; FOR i IN 1. . indata. COUNT LOOP email_demo_obj. email_id : = email_map(indata(i). email); email_demo_obj. demo_code : = 3; email_demo_obj. value : = indata(i). age; PIPE ROW (email_demo_obj); email_demo_obj. demo_code : = 7; email_demo_obj. value : = indata(i). income; PIPE ROW (email_demo_obj); END LOOP; RETURN; END; BEGIN FOR email IN (SELECT email_id, email FROM email) LOOP email_map(email) : = email_id; END LOOP; END; www. Sage. Logix. com

Oracle 9 i ETL Transformation is Just a Simple INSERT as SELECT Elegant Solution

Oracle 9 i ETL Transformation is Just a Simple INSERT as SELECT Elegant Solution to Parallel, Transactional Co-processing INSERT /*+ append nologging */ INTO email_demographic (SELECT /*+ parallel( a, 4 ) */ * FROM TABLE( CAST( etl. transform( CURSOR(SELECT * FROM ext_tab )) AS email_demo_nt_t)) a); www. Sage. Logix. com

Parallel Co-processing Extract Input File Transform PQ Slave PL/SQL Load INSERT PQ Slave PL/SQL

Parallel Co-processing Extract Input File Transform PQ Slave PL/SQL Load INSERT PQ Slave PL/SQL www. Sage. Logix. com DB

Performance Issues Speed is Respectable but There is a Performance Bottleneck with the Table

Performance Issues Speed is Respectable but There is a Performance Bottleneck with the Table Function Mechanism Possibly an Issue Binding Data Back from the SQL Engine Throughput is about Three Times Slower than Coding with BULK COLLECT and FORALL Operators However These Don’t Support Parallel Operations Oracle Expects to have it Fixed in Next Release www. Sage. Logix. com

ETL Alternatives The Multi-Table INSERT Statement New in 9 i Each Sub-Query Input Row

ETL Alternatives The Multi-Table INSERT Statement New in 9 i Each Sub-Query Input Row Can be INSERT’ed to a Different Table … or the Same Table Multiple Times Faster than Using PL/SQL It’s Always Faster to do Something in Pure SQL than Using Any Host Language Binding is Avoided www. Sage. Logix. com

Multi-Table Insert INSERT /*+ append nologging */ ALL INTO email_demographic (email_id, demo_id, value) VALUES

Multi-Table Insert INSERT /*+ append nologging */ ALL INTO email_demographic (email_id, demo_id, value) VALUES (email_id, 3, age) INTO email_demographic (email_id, demo_id, value) VALUES (email_id, 7, income) (SELECT /*+ ordered index( b ) */ b. email_id, a. income, a. age FROM ext_tab a, email b WHERE a. email = b. email); www. Sage. Logix. com

SQL-Only Processing Extract Transform Load PQ Slave Input File PQ Slave INSERT PQ Slave

SQL-Only Processing Extract Transform Load PQ Slave Input File PQ Slave INSERT PQ Slave www. Sage. Logix. com DB

Performance Solutions Minimize SQL Execution Time Exploiting Caching to Eliminate Some SQL Look-ups and

Performance Solutions Minimize SQL Execution Time Exploiting Caching to Eliminate Some SQL Look-ups and Joins Direct Path Inserts Code Logic Execution Time Replacing Interpreted PL/SQL with Native Compilation Eliminating Host Language Using Multi-Table INSERTS Variable Binding Replace Single Row Binds with Bulk Binds www. Sage. Logix. com

Conclusion Oracle 9 i ETL is a High Performance ETL Solution Especially Once the

Conclusion Oracle 9 i ETL is a High Performance ETL Solution Especially Once the Table Function Issue is Resolved Already Included in the Cost of the RDBMS www. Sage. Logix. com