All Powder Board and Ski Oracle 9 i
All Powder Board and Ski Oracle 9 i Workbook Chapter 8: Data Warehouses and Data Mining Jerry Post Copyright © 2003 1
Oracle Relational Approach Relational Tables Customer Sale. Item Meta-Data Materialized Views Star Design Dimension Sale + Fact Measure Customer Dimension 2
Desired Sales Cube Dimensions Sales Dimensions State (ship) Month Category Style Skill. Level Size Color Manufacturer Binding. Style Weight. Max? Item. Material? Waist. Width? 3
Early Data: Spreadsheets 4
External Tables: Attach to CSV create or replace directory csv_dir as ‘D: studentsBuild. All. PowderAll. Powder. Sample. Data. CSV'; create table Old. Sale_Ext ( Sale. ID INTEGER, Sale. Date DATE, Ship. State VARCHAR 2(50), Ship. ZIP VARCHAR 2(50), Payment. Method VARCHAR 2(50), SKU VARCHAR 2(50), Quantity. Sold INTEGER, Sale. Price NUMBER(10, 2) Model. ID VARCHAR 2(250), Item. Size NUMBER, Manufacturer. ID INTEGER, Category Color Model. Year Graphics Item. Material List. Price Style Skill. Level Weight. Max Waist. Width Binding. Style ) VARCHAR 2(50), INTEGER, VARCHAR 2(50), NUMBER(10, 2), VARCHAR 2(50), INTEGER, NUMBER, VARCHAR 2(50) Continued on next slide Warning: currency columns cannot have $ symbols or commas 5
External File Definition organization external ( type oracle_loader default directory csv_dir access parameters ( records delimited by newline fields terminated by ', ' optionally enclosed by '"' lrtrim missing field values are null ( Sale. ID, Sale. Date char date_format date mask "mm/dd/yyyy", Ship. State, Ship. ZIP, Payment. Method, SKU, Quantity. Sold, Sale. Price, odel. ID, Item. Size, Manufacturer. ID, Category, Color, Model. Year, Graphics, Item. Material, List. Price, Style, Skill. Level, Weight. Max, Waist. Width, Binding. Style ) ) location ('Lab 08 -01 Early Sales. csv') ) reject limit unlimited; 6
Create Customer and Employee Customer. ID and Employee. ID are missing from the old data. Instead of relying on blank cell values, create a new customer called “Walk-in” and a new employee called “Employee” Write down the ID numbers generated for these anonymous entries. If you use SQL, you can assign a value of zero to these entries. INSERT INTO Customer (Customer. ID, Last. Name) Values (0, 'Walk-in') INSERT INTO Employee (Employee. ID, Last. Name) Values (0, 'Staff') 7
Extract Model Data SELECT DISTINCT Old. Sale_ext. Model. ID, Old. Sale_ext. Manufacturer. ID, Old. Sale_ext. Category, Old. Sale_ext. Color, Old. Sale_ext. Model. Year, Old. Sale_ext. Graphics, Old. Sale_ext. Item. Material, Old. Sale_ext. List. Price, Old. Sale_ext. Style, Old. Sale_ext. Skill. Level, Old. Sale_ext. Weight. Max, Old. Sale_ext. Waist. Width, Old. Sale_ext. Binding. Style FROM Old. Sale_ext; 8
UNION Query for Models SELECT DISTINCT Model. ID, Manufacturer. ID, Category, … FROM Old. Sales_ext UNION SELECT DISTINCT Model. ID, Manufacturer. ID, Category, … FROM Old. Rentals_ext 9
Insert Model Data into Item. Model INSERT INTO Item. Model (Model. ID, Manufacturer. ID, Category, Color, Model. Year, Graphics, Item. Material, List. Price, Style, Skill. Level, Weight. Max, Waist. Width, Binding. Style) SELECT DISTINCT qry. Old. Models. Model. ID, qry. Old. Models. Manufacturer. ID, qry. Old. Models. Category, qry. Old. Models. Color, qry. Old. Models. Model. Year, qry. Old. Models. Graphics, qry. Old. Models. Item. Material, qry. Old. Models. List. Price, qry. Old. Models. Style, qry. Old. Models. Skill. Level, qry. Old. Models. Weight. Max, qry. Old. Models. Waist. Width, qry. Old. Models. Binding. Style FROM qry. Old. Models; 10
Insert SKU Data into Inventory CREATE VIEW qry. Old. Inventory AS SELECT DISTINCT Model. ID, SKU, Item. Size FROM Old. Sale_ext UNION SELECT DISTINCT Model. ID, SKU, Item. Size FROM Old. Rental_ext; INSERT INTO Inventory (Model. ID, SKU, Item. Size, Quantity. On. Hand) SELECT DISTINCT qry. Old. Inventory. Model. ID, qry. Old. Inventory. SKU, qry. Old. Inventory. Item. Size, 0 As Quantity. On. Hand FROM qry. Old. Inventory; Note the use of the column alias to force a zero value for Quantity. On. Hand for each row 11
Copy Sales Data INSERT INTO Sale (Sale. ID, Sale. Date, Ship. State, Ship. ZIP, Payment. Method) SELECT DISTINCT Old. Sales_ext. Sale. ID, Old. Sales_ext. Sale. Date, Old. Sales_ext. Ship. State, Old. Sales_ext. Ship. ZIP, Old. Sales_ext. Payment. Method FROM Old. Sales_ext; Note that if you have added data to your Sales table, your existing Sale. ID values might conflict with these You can solve the problem by adding a number to these values so they are all larger than your highest ID INSERT INTO Sale (Sale. ID, Sale. Date, Ship. State, Ship. ZIP, Payment. Method) SELECT DISTINCT Old. Sales_ext. Sale. ID+5000, Old. Sales_ext. Sale. Date, Old. Sales_ext. Ship. State, Old. Sales_ext. Ship. ZIP, Old. Sales_ext. Payment. Method FROM Old. Sales_ext; 12
Copy Sale. Item Rows INSERT INTO Sale. Item (Sale. ID, SKU, Quantity. Sold, Sale. Price) SELECT DISTINCT Old. Sale_ext. Sale. ID+5000, Old. Sale_ext. SKU, Old. Sale_ext. Quantity. Sold, Old. Sale_ext. Sale. Price FROM Old. Sale_ext; If you transformed the Sale. ID in the prior step for the Sale data, you must do the exact same calculation for Sale. ID in the Sale. Item table 13
Copy Rental Data INSERT INTO Rental (Rent. ID, Rent. Date, Expected. Return, Payment. Method) SELECT DISTINCT Old. Rental_ext. Rent. ID+5000, Old. Rental_ext. Rent. Date, Old. Rental_ext. Expected. Return, Old. Rental_ext. Payment. Method FROM Old. Rental_ext; INSERT INTO Rent. Item (Rent. ID, SKU, Rent. Fee, Return. Date) SELECT DISTINCT Old. Rental_ext. Rent. ID+5000, Old. Rental_ext. SKU, Old. Rental_ext. Rent. Fee, Old. Rental_ext. Return. Date FROM Old. Rental_ext; 14
Discoverer Administrator: Load Business Area Schema Tables and views Select tables 15
Load Wizard Options: LOV Most options are selected by default Select the LOV option to have Discoverer build lookup lists 16
Discoverer: Business Area Tables shown as folders and named so managers understand them Columns shown as items Add a calculated item 17
Create a Data Hierarchy Select Category and Style from the Ski. Board. Style lookup table 18
Discoverer Desktop: New Workbook Select the dimensions and the fact item 19
Initial Crosstab Layout Page area Column area Row area 20
Discoverer Crosstab Browser Totals Select all items Format options 21
Time Series Analysis: Moving Average 22
Time Series Analysis: Discoverer 23
Sales by State for Regression Note that some states are missing from the list. 24
Regression Data Query CREATE VIEW State. Sales 2004 AS SELECT State. Name, Income 2001, Pop 2002, Sum(Sale. Price*Quantity. Sold) AS Sales 2004 FROM Sale INNER JOIN State. Demographics ON Sale. Ship. State = State. Demographics. State. Code INNER JOIN Sale. Item ON Sale. ID = Sale. Item. Sale. ID WHERE Ship. State IS NOT NULL AND Sale. Date Between '01 -Jan-2004' And '31 -Dec-2004' GROUP BY State. Name, Income 2001, Pop 2002 ORDER BY State. Name; 25
Regression Setup You should include the label row but be sure to check the box to show you included it 26
Regression Results Relatively high R-square Population is a significant predictor, Income is not 27
Association Rules/Market Basket Locate folders Item to find Possible location Data mining samples D: Oracleora 92dmdemosample ORACLE_HOME D: Oracleora 92 JAVA_HOME C: Oracle. DataOra 92 DSjdk 28
Copy Files to Protect Original compile. Sample. Code. bat execute. Sample. Code. bat Sample_Association. Rules. java Sample_Association. Rules_Transactional. property Sample_Global. property 29
Edit Sample_Global. property File mining. Server. url=jdbc: oracle: thin: @Your. Server. Name: 1521: DBName mining. Server. user. Name=odm mining. Server. password=password input. Data. Schema. Name=powder output. Schema. Name=powder timeout=120 If necessary, use enterprise manager to unlock and assign new passwords to accounts: odm and odm_mtr 30
Create New Table To Hold Transaction Basket Data CREATE TABLE MARKET_BASKET_TX_BINNED ( SEQUENCE_ID INTEGER, ATTRIBUTE_NAME VARCHAR 2(35), VALUE NUMBER ); GRANT SELECT ON MARKET_BASKET_TX_BINNED TO odm; commit; If you use these names, you do not have to edit the Transactional. property file 31
Copy Sale. Item Data INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE) SELECT Sale. ID, Item. Model. Category || '_' || Item. Model. Style AS AName, 1 As Value FROM Sale. Item Inner Join Inventory ON Sale. Item. SKU = Inventory. SKU Inner Join Item. Model ON Inventory. Model. ID = Item. Model. ID GROUP BY Sale. ID, Item. Model. Category || '_' || Item. Model. Style; 32
Copy Sale Data INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE) SELECT Sale. ID, 'ID', Sale. ID FROM Sale; commit; 33
Remove Dashes from Attribute UPDATE MARKET_BASKET_TX_BINNED SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME, 1, instr(ATTRIBUTE_NAME, '-')-1) || '_' || substr(ATTRIBUTE_NAME, instr(ATTRIBUTE_NAME, '-')+1) WHERE instr(ATTRIBUTE_NAME, '-') > 0; commit; Run at least twice—until you get zero changes. Because a row might have more than one dash. 34
Limit Size of Attribute_Name UPDATE MARKET_BASKET_TX_BINNED SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME, 1, 20); commit; This is critical—but is probably due to a bug in Oracle’s code. There is a slight chance it arises because of the 30 character name limitation in Oracle. 35
Compile and Run the Code SET ORACLE_HOME = D: Oracleora 92 SET JAVA_HOME = C: Oracle. Dataora 92 DSjdk compile. Sample. Code. bat Sample_Association. Rules. java execute. Sample. Code. bat Sample_Association. Rules_Transactional. property Type as all one line—do not hit <Enter> until the end To redirect the output to a file, at the end, add: >myfile. txt 36
Sample Results Getting top 5 rules for model: Sample_AR_Model_tx sorted by support. Rule 124: If Boots_=1 then Clothes_=1 [support: 0. 17285714, confidence: 0. 44814816] Rule 38: If Clothes_=1 then Boots_=1 [support: 0. 17285714, confidence: 0. 35276967] Rule 101: If Board_Half_Pipe=1 then Clothes_=1 [support: 0. 11357143, confidence: 0. 4622093] Rule 9: If Clothes_=1 then Board_Half_Pipe=1 [support: 0. 11357143, confidence: 0. 23177843] Rule 100: If Ski_Freestyle=1 then Clothes_=1 [support: 0. 09785714, confidence: 0. 48070174] Get rules by support: Sample_AR_Model_tx, with minimum support of 0. 16. Rule 124: If Boots_=1 then Clothes_=1 [support: 0. 17285714, confidence: 0. 44814816] Rule 38: If Clothes_=1 then Boots_=1 [support: 0. 17285714, confidence: 0. 35276967] Get rules by confidence: Sample_AR_Model_tx, with confidence of 0. 56 or more. Investigate and think about the results. Do you have too many clothes targeted to half-pipe boards and freestyle skiers, or not enough? 37
GIS: Microsoft Map. Point The Discoverer worksheet places the data into rows and columns A dynamic copy of this sheet is used to remove the top rows 38
Map. Point Data Wizard 39
GIS Analysis of Sales 40
- Slides: 40