Oracle Data Archiving Taming the Beast Dave Moore

Oracle Data Archiving Taming the Beast Dave Moore Neon Enterprise Software INTELLIGENCE. INNOVATION. INTEGRITY

Agenda Archiving Defined Requirements and Solutions Oracle Archiving Strategies Oracle Row Removal Options Oracle Post Archive Operations INTELLIGENCE. INNOVATION. INTEGRITY

Dave n Oracle ACE n Using Oracle since 1991 n Product Author at Neon Enterprise Software n Creator of Oracle. Utilities. com n Author of “Oracle Utilities” from Rampant Tech Press n Core competencies include performance, utilities and data management INTELLIGENCE. INNOVATION. INTEGRITY

Database Archiving: The process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed. Purge INTELLIGENCE. INNOVATION. INTEGRITY

Trends Impacting Archive Needs Data Retention Issues: Amount of Data Volume of data 0 e c n ec t ro Length of retention requirement n o i t Varied types of data P Security issues ia l p m o C Time Required 30+ Yrs INTELLIGENCE. INNOVATION. INTEGRITY

Archiving All Types of Data Paper Blueprints Forms Claims Word Excel PDF XML IMS DB 2 ORACLE SYBASE SQL Server IDMS VSAM Programs UNIX Files Outlook Lotus Notes Attachments Sound Pictures Video INTELLIGENCE. INNOVATION. INTEGRITY

Data Archiving and ILM Create Operational Reference Archive Needed for completing business transactions Needed for reporting or expected queries Needed for compliance and business protection Discard Mandatory Retention Period INTELLIGENCE. INNOVATION. INTEGRITY

Some Sample Regulations Impacting Data Retention INTELLIGENCE. INNOVATION. INTEGRITY

What Does It All Mean? Enterprises must recognize that there is a business value in organizing their information and data. Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit. (Gartner, 20 -April-2007, Research Note G 00148170: Cost of E-Discovery Threatens to Skew Justice System) INTELLIGENCE. INNOVATION. INTEGRITY

Operational Efficiency Database Archiving can be undertaken to improve operational efficiency n Large volumes of data can interfere with production operations – efficiency of transactions – efficiency of utilities: BACKUP/RESTORE, REORG, etc. – Storage » Gartner: databases copied an average of 6 times! INTELLIGENCE. INNOVATION. INTEGRITY

What Solutions Are Out There? n Keep Data in Operational Database — Problems with authenticity of large amounts of data over long retention times n Store Data in UNLOAD files (or backups) — Problems with schema change and reading archived data; using backups poses even more serious problems n Move Data to a Parallel Reference Database — Combines problems of the previous two n Move Data to a Database Archive INTELLIGENCE. INNOVATION. INTEGRITY

Components of a Database Archiving Solution Production Database Data Extract Captured Structure Archive Policies Data Retention Archive Data Store and Retrieve Recall Database Data Recall Archive Store Archive Data Query Access Metadata Policies History Data & Metadata Archive Administration INTELLIGENCE. INNOVATION. INTEGRITY

Archiving Requirements n Policy based archiving: logical selection n Keep data for very long periods of time n Store very large amounts of data in archive n Maintain Archives for ever changing operational systems n Become independent from Applications/DBMS/Systems n Protect authenticity of data n Access data when needed; as needed n Discard data after retention period automatically INTELLIGENCE. INNOVATION. INTEGRITY

Policy based archiving n Why : — Business objects are archived, not files — Rules for when something is ready can be complex — Data ready to be archived is distributed over database n Implications: — User must provide policies for when something is to be archived n How: — Full metadata description of data — Flexible specification of policy : “WHERE clause” INTELLIGENCE. INNOVATION. INTEGRITY

For Example… Parts Master is the parent table to all other tables STORAGE INFO Part Number Bin Number Qty on Hand Qty on Order Qty Backorder PARTS MASTER Part Number Type Description Unit Type Cost Price Substitute Parts ORDER INFO Part Number PO Number Vendor ID Quantity Ordered Unit Cost Date Ordered Date Received SUMMARY BY QUARTER Part Number Year Q 1 Disbursed Q 2 Disbursed Q 3 Disbursed Q 4 Disbursed DISBURSEME NT Part Number Dept. ID CHIT ID Qty Disbursed Date Disbursed INTELLIGENCE. INNOVATION. INTEGRITY

Keep Data for a Long Time n Why: retention requirements in decades n Implications: — — — Archive will outlive applications/DBMS/systems that generated them Archive will outlive people who designed and managed operational systems Archive will outlive media we store it on n How: — — — Unique data store Application/DBMS/system independence Metadata independence Continuous management of storage Continuous management of archive content INTELLIGENCE. INNOVATION. INTEGRITY

Maintain Archive for Changing Operational Systems n Why : — — Metadata changes frequently Applications are re-engineered periodically – – Change DBMS platform Change System platform Replace with new application Consolidate after merger or acquisition n Implications: — — Archive must support multiple variations of an application Archive must deal with metadata changes n How: — — Manage applications as major archive streams having multiple minor streams with metadata differences Achieve independence from operating environment INTELLIGENCE. INNOVATION. INTEGRITY

Achieve Metadata Independence n Why : — — Operational metadata is inadequate Operational metadata changes Operational systems keep only the “current” metadata Data in archive often does not mirror data in operational structures n Implications: — — Archive must encapsulate metadata Metadata must be improved n How: — — — Metadata Capture, Validate, Enhance capabilities Store structure that encapsulates with data Keeps multiple versions of metadata INTELLIGENCE. INNOVATION. INTEGRITY

Protect Authenticity of Data n Why : — — Potential use in lawsuits/ investigations Potential use in business analysis n Implications: — — — Protect from unwanted changes Show original input Cannot be managed in operational environment n How: — — — — SQL Access that does not support I/U/D Do not modify archive data on metadata changes Encryption as stored Checksum for detection of sabotage Limit access to functions Audit use of functions Maintain offsite backup copies for restore if sabotaged INTELLIGENCE. INNOVATION. INTEGRITY

Access Data Directly From Archive n Why : — Cannot depend on application environment n Implications: — Full access capability within archive system n How: — — Industry standard interface (e. g. JDBC) LOAD format output for – For load into a database – May be different from source database — — — Requires full and accurate metadata Ability to review metadata Ability to function across metadata changes INTELLIGENCE. INNOVATION. INTEGRITY

Discard Function n Why : — Legal exposure for data kept too long n Implications: — Data cannot be kept in archive beyond retention period — Must be removed with no exposure to forensic software n How: — Policy based discard — System level function — Tightly controlled and audited — True “zero out” capability — Discard from backups as well INTELLIGENCE. INNOVATION. INTEGRITY

Database or Archive? Keep in DB Keep in Archive Performance Space Compliance INTELLIGENCE. INNOVATION. INTEGRITY

Based on Data Availability Keep in DB Keep in Archive Purge Must be Available to App Must be Available Must Be Secure Not Needed INTELLIGENCE. INNOVATION. INTEGRITY

Oracle Archiving Strategies n Designed Up Front (Yeah, right) n Determined by Application Owner n Implemented by ______ n Utilize Oracle Features INTELLIGENCE. INNOVATION. INTEGRITY

Finding Large Tables n DBA_SEGMENTS (bytes) n DBA_TABLES (num_rows) n or based on I/O INTELLIGENCE. INNOVATION. INTEGRITY

Rolling Windows n Self Managing n Mostly based on DATE n Utilize DBMS Features — Partitioning — Transportable Tablespaces – – Exchange Partition Set tablespace read only Expdp Copy export file and data file INTELLIGENCE. INNOVATION. INTEGRITY

Rolling Windows via Partitioning P 1 … Data Profile Storage Profile Probably Never Accessed Cheap as you can get Read Only / Compressed … P 47 Rarely Accessed Heavily Accessed Not so fast or expensive Fast, expensive Read Only Read / Write INTELLIGENCE. INNOVATION. INTEGRITY

Why not use transportablespaces or Oracle exports for data retention? INTELLIGENCE. INNOVATION. INTEGRITY

The Problem with Oracle Files n Transportable Tablespaces n Exports n Backups Oracle Export Files & Datafiles Import Trans Tsp Version 16 Z Year 2007 Year 2030 Not a good method for LT Data Retention INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old ways) n Range Partitioning Data is distributed based on partition key range of values – usually a date. Good When: Data is date-based. INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old Ways) n Hash Partitioning Uses hash algorithm to create equally sized buckets of data. Good When: No natural partition key and desire I/O balancing (hot spots). INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (Old Ways) n List Partitioning Data is distributed based on LIST of values in partition key. Good When: Have short list of values (States, Regions, Account Types) INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways – 11 G) n Interval Partitioning Initial Partition is created manually, the rest are automatically created as new data arrives. Good When: Need a rolling window! INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways – 11 G) n REF partitioning Related Tables benefit from same partitioning strategy, whether column exists in children or not! Good When: Desire related data to be partitioned in the same manner. INTELLIGENCE. INNOVATION. INTEGRITY

Partitioning (New Ways – 11 G) n Virtual Column Partitioning Partition key may be based on virtual column Good When: Virtual column is required for partition key. INTELLIGENCE. INNOVATION. INTEGRITY

Rows Gotta Go INTELLIGENCE. INNOVATION. INTEGRITY

Row Removal Options n SQL DELETE n CTAS / DROP / RENAME n TRUNCATE n Row Marking INTELLIGENCE. INNOVATION. INTEGRITY

SQL DELETE n Good for small number of rows n RI handled automatically n Oracle was born to DELETE, better than any PL/SQL that you write. n Issue with Un-indexed Foreign Keys INTELLIGENCE. INNOVATION. INTEGRITY

DELETE Optimization n Work in batches, committing (only when programmatically DELETING) n Use parallel DML (Partitioned tables only) n Drop Indexes before (if possible) n Index FK columns INTELLIGENCE. INNOVATION. INTEGRITY

CTAS n Works well for PURGE, not archive n Perfect when you want to keep low percentage of rows in the table n Doesn’t handle RI – no DELETE was issued. n Process 1. Create table with rows you want to keep 2. Drop old table 3. Rename table 4. Recreate indexes create table new_table unrecoverable as select * from old_table where. . . INTELLIGENCE. INNOVATION. INTEGRITY

TRUNCATE n Congratulations if your application lends itself to TRUNCATE without losing new data n What about RI? n May truncate or drop individual partitions INTELLIGENCE. INNOVATION. INTEGRITY

DROP n DROP PARTITION n What would you do before you drop it? n Exchange partition with table n Transportablespace. INTELLIGENCE. INNOVATION. INTEGRITY

Things to Remember • Benchmark the best way for you • Benchmark against real data if possible • Use parallel DML INTELLIGENCE. INNOVATION. INTEGRITY

Design Summary n Create an architecture that lends itself to aging, archiving, deleting n This architecture should compensate for business requirements — For instance, customer orders not accessible after 6 months … or — top query performance needed for all ‘ACTIVE’ accounts … etc n Implement it – THE EASY PART INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges “I have successfully deleted 10 billion rows from the table. Hooo. Ahhhh! Performance will be great, space will be available, and I will get credit for optimizing our data warehouse application, saving the company billions of dollars” INTELLIGENCE. INNOVATION. INTEGRITY

… 2 Days Later … INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges Hmmmmm. It looks like … - Queries are not any faster. . . The Select count(*) took the same amount of time. . . Space was not freed in Oracle (DBA_FREE_SPACE). . . Space was not freed in the operating system. . . WHY NOT ? ? ? Where are the benefits ? ? ? INTELLIGENCE. INNOVATION. INTEGRITY

From Swiss to Provolone After DELETE After Maintenance INTELLIGENCE. INNOVATION. INTEGRITY

Post Archive Challenges n Statistics are not fresh n High Water Marks are very high n Space has not been freed within Oracle (if that’s what you want) n Space has not been freed to the OS INTELLIGENCE. INNOVATION. INTEGRITY

Refresh Statistics n Help the optimizer, easy enough n dbms_stats provides many options INTELLIGENCE. INNOVATION. INTEGRITY

Automatic Stats n Recommended by Oracle n Calls DBMS_STATS_JOB_PROC n Enabled via: Begin dbms_auto_task_admin. enable ( client_name => ‘auto optimizer stats collection’, operation => NULL, window_name => NULL); END; / INTELLIGENCE. INNOVATION. INTEGRITY

When do you go manual ? n High transaction DELETEs or TRUNCATEs n Bulk loads which add more than 10% of table size So there’s our answer – go manual. INTELLIGENCE. INNOVATION. INTEGRITY

How do we Gather Them? n NOT the Analyze Command n Instead DBMS_STATS package exec dbms_stats. gather_table_stats(ownname => 'BDB', tabname => 'MASTER', estimate_percent => dbms_stats. auto_sample_size); INTELLIGENCE. INNOVATION. INTEGRITY

High Water Mark INTELLIGENCE. INNOVATION. INTEGRITY

High Water Mark INTELLIGENCE. INNOVATION. INTEGRITY

Reset High Water Mark (HWM) n DROP or TRUNCATE n Multiple OTHER ways to do this depending on version n In v 9 … alter table move tablespace [tsp name]; — Row movement must be enabled — Tablespace must be a LMT — Can move into same tablespace — Will occupy 2 X space temporarily — Must then rebuild indexes n In v 10 … alter table <table_name> shrink space; INTELLIGENCE. INNOVATION. INTEGRITY

Freeing Allocated Space INTELLIGENCE. INNOVATION. INTEGRITY

Create table, check space SQL> create table space_example as select * from dba_source; Table created. SQL> select count(*) from space_example; COUNT(*) -----296463 SQL> exec dbms_space. unused_space(‘DAVE', 'SPACE_EXAMPLE'); Total blocks: 6328 Unused blocks: 1 Unused bytes: 8192 Last Used Block: 55 Last Used Block ID: 10377 Last Used Ext File ID: 4 INTELLIGENCE. INNOVATION. INTEGRITY

Check datafile space Size Current Poss. FILE_NAME Poss. Size Savings ------------------------- -------/export/home/ora 102/oradata/ora 102/qasb 001. dbf 29 46 17 /export/home/ora 102/oradata/ora 102/example 01. dbf 69 100 31 /export/home/ora 102/oradata/ora 102/qasb 002. dbf 41 41 0 /export/home/ora 102/oradata/ora 102/system 01. dbf 493 500 7 /export/home/ora 102/oradata/ora 102/sysaux 01. dbf 430 0 /export/home/ora 102/oradata/ora 102/undotbs 01. dbf 91 175 84 /export/home/ora 102/oradata/ora 102/users 01. dbf 44 83 39 /export/home/ora 102/oradata/ora 102/test. dbf 51 70 19 INTELLIGENCE. INNOVATION. INTEGRITY

Delete rows, check space SQL> delete from space_example; 296463 rows deleted. SQL> commit; SQL> exec dbms_space. unused_space(‘DAVE', 'SPACE_EXAMPLE'); Total blocks: 6328 Unused blocks: 1 Unused bytes: 8192 Last Used Block: 55 Last Used Block ID: 10377 Last Used Ext File ID: 4 Nothing Changed ! INTELLIGENCE. INNOVATION. INTEGRITY

Shrink it, check space SQL> alter table space_example enable row movement; SQL> alter table space_example shrink space; SQL> exec dbms_space. unused_space('BDB', 'SPACE_EXAMPLE'); Total blocks: 8 Unused blocks: 4 Unused bytes: 32768 Last Used Block: 4 Last Used Block ID: 5129 Last Used Ext File ID: 4 Space Freed From Table, but still in Oracle INTELLIGENCE. INNOVATION. INTEGRITY

Check space again Size Current Poss. FILE_NAME Poss. Size Savings ------------------------- -------/export/home/ora 102/oradata/ora 102/qasb 001. dbf 29 46 17 /export/home/ora 102/oradata/ora 102/example 01. dbf 69 100 31 /export/home/ora 102/oradata/ora 102/qasb 002. dbf 41 41 0 /export/home/ora 102/oradata/ora 102/system 01. dbf 493 500 7 /export/home/ora 102/oradata/ora 102/sysaux 01. dbf 430 0 /export/home/ora 102/oradata/ora 102/undotbs 01. dbf 171 175 4 /export/home/ora 102/oradata/ora 102/users 01. dbf 44 83 39 /export/home/ora 102/oradata/ora 102/test. dbf 1 70 69 This datafile should be resized to save 69 MB SQL> alter database datafile '/export/home/ora 102/oradata/ora 102/test. dbf' resize 1 m; INTELLIGENCE. INNOVATION. INTEGRITY

Free the Space n Space is still reserved for future inserts and updates, just not freed back to the OS n Space will not be automatically freed – confirm by checking DBA_FREE_SPACE n Ways to set it free — drop — truncate — alter table move … — alter table shrink space … INTELLIGENCE. INNOVATION. INTEGRITY

Unindexed Foreign Keys Example 1 Million Rows PARENT COL 1 ON DELETE CASCADE CHILD COL 1_PARENT 1 Million Rows SQL> DELETE FROM PARENT WHERE COL 1 < 1000; Fky. sql INTELLIGENCE. INNOVATION. INTEGRITY

Before Index delete from parent where col 1 < 1000 call count -------Parse 1 Execute 1 Fetch 0 -------total 2 cpu elapsed disk query current ---------- -----0. 01 0. 08 2 27 0 0. 90 0. 80 4 2208799 6062 0. 00 0 0 0 ---------- -----0. 91 0. 88 6 2208826 6062 rows -----0 999 0 -----999 delete from "DAVE". "CHILD" where "COL 1_PARENT" = : 1 call count -------Parse 1 Execute 999 Fetch 0 -------total 1000 cpu elapsed disk query current ---------- -----0. 00 0 0 0 285. 94 293. 11 1543900 2208789 1029 0. 00 0 0 0 ---------- -----285. 94 293. 11 1543900 2208789 1029 rows -----0 999 0 -----999 INTELLIGENCE. INNOVATION. INTEGRITY

SQL> create index prnt_ndx on child(col 1_parent); delete from parent where col 1 < 1000 call count -------Parse 1 Execute 1 Fetch 0 -------total 2 cpu elapsed disk query current ---------- -----0. 00 0 0 0 0. 53 0. 47 7 13 7053 0. 00 0 0 0 ---------- -----0. 53 0. 47 7 13 7053 rows -----0 999 0 -----999 delete from "DAVE". "CHILD" where "COL 1_PARENT" = : 1 call count -------Parse 1 Execute 999 Fetch 0 -------total 1000 cpu elapsed disk query current ---------- -----0. 00 0 0 0 0. 42 0. 46 2 3002 4058 0. 00 0 0 0 ---------- -----0. 42 0. 46 2 3002 4058 rows -----0 999 0 -----999 INTELLIGENCE. INNOVATION. INTEGRITY

Unindexed Foreign Keys n Problem is not limited to DELETE statements n Search database for unindexed FK columns n Script is on asktom — Search for unindex. sql INTELLIGENCE. INNOVATION. INTEGRITY

Summary Points n Create sound Archiving strategy based on Oracle technical features as well as business and/or legal requirements n Leverage partitioning n Move partitions to cheap disk when appropriate n Make partitions ‘read only’ and compressed n Remove data via DROP or TRUNCATE if possible n If SQL DELETE, make sure to perform maintenance operations n Consider 3 rd party. INTELLIGENCE. solutions INNOVATION. INTEGRITY

“Well done is better than well said” Ben Franklin Questions? INTELLIGENCE. INNOVATION. INTEGRITY
- Slides: 70