Unit 4 Designing and building an ETL mapping

Unit - 4 Designing and building an ETL mapping

Designing our staging area • The staging area is the interim location for the data between the source system and the target database structure. • The staging area will hold the data extracted directly from the ACME_POS source database, which will determine how we structure our staging table.

Designing the staging area contents q Sales: The data elements in the Sales dimensional object are: • Quantity • Sales amount q Date: The data element in the Date dimensional object is: • Date of sale. q Product. : The data elements in the Product dimensional object are: • SKU, Name, List price, Department, Category, and Brand

q Store: The data elements in the Store dimensional object are: • Name • Number • Address 1 • Address 2 • City • State • Zip postal code • Country • Region

Building the staging area table with the Data Object Editor • To get started with building our staging area table, let's launch the OWB Design Center if it's not already running. • Expand the ACME_DW_PROJECT node.

The steps to create the staging area table in our target database are: • 1. Navigate to the Databases | Oracle | ACME_DW_LOCATION module. We will create our staging table under the Tables node, so let's right-click on that node and select New. . . from the pop-up menu. Notice that there is no wizard available here for creating a table and so we are using the Data Object Editor to do it. • 2. Upon selecting New.

• 3. The first tab is the Name tab where we'll give our new table a name. Let's call it POS_TRANS_STAGE for Point-of-Sale transaction staging table. We'll just enter the name into the Name field, replacing the default TABLE_1 that it suggested for us. • 4. Let's click on the Columns tab next and enter the information that describes the columns of our new table.

• We'll save our work using the Ctrl+S keys, or from the Diagram | Save All main menu entry in the Data Object Editor

• The other tabs in Data Object Editor for a table are: q. Constraints • The next tab after Columns is Constraints where we can enter any one of the four different types of constraints on our new table. A constraint is a property that we can set to tell the database to enforce some kind of rule on the table that limits (or constraints) the values that can be stored in it.

Constraints • Check constraint—a constraint on a particular column that indicates the acceptable values that can be stored in the column.

• Foreign key—a constraint on a column that indicates a record must exist in the referenced table for the value stored in this column. • A foreign key is also considered a constraint because it limits the values that can be stored in the column that is designated as a foreign key column.

• Primary key—a constraint that indicates the column(s) that make up the unique information that identifies one and only one record in the table. • It is similar to a unique key constraint in which values must be unique. The primary key differs from the unique key as other table's foreign key columns use the primary key value (or values) to reference this table. • value stored in the foreign key of a table is the value of the primary key of the referenced table for the record being referenced.

Another Tabs in Data Object Editor • 1. Indexes : used to create index for faster search • 2. Partitions : A partition is a way of breaking down the data stored in a table into subsets that are stored separately.

• 3. Attribute Sets : An Attribute Set is a way to group attributes of an object in an order that we can specify when we create an attribute set. • 4. Data Rules : A data rule can be specified in the Warehouse Builder to enforce rules for data values or relationships between tables.

• 5. Data Viewer : It is used to view Cubes and Dimensions.

Creating Mapping • Adding source tables There a couple of ways we can add a table to our mapping. • One way is to use the Explorer window and the other way is to use the Palette window. • In the Explorer window, we will use the Available Objects tab to find the table that we want to include in our mapping. • To find an object in the Explorer, we have to know what module it is located under. • In our case, we know the POS_TRANSACTIONS table is defined under the ACME_POS module. • So let's navigate to the Databases | Non-Oracle | ODBC | ACME_POS node in the Available Objects tab to find the POS_TRANSACTIONS table entry. • Click and hold the left mouse button on POS_TRANSACTIONS, drag it over to the Mapping window, and release the left mouse button to drop the table into our mapping.

• There a couple of items to note about how the Mapping Editor window looks. The Properties window no longer shows the mapping information. • It has changed to show the properties of the POS_TRANSACTION table as it is now highlighted in the Mapping canvas window. • Click and drag the Table Operator from the Palette window onto the Mapping window.

Adding a target table • We'll navigate to Databases | Oracle | ACME_DW_LOCATION | Tables | POS_TRANS_STAGE in the Explorer, and click and drag the POS_TRANS_STAGE table to the righthand side of our source tables in the Mapping window.

Connecting source to target • The process of connecting the source to the target means data fields from the source go(maps) to the data fields in the target. • The Data flow operators will help to connect the source and target. • An Aggregator operator that can be used to aggregate data. The Joiner operator is visible, drag this operator into the Mapping window, and drop it between the sources and target.

Joiner operator attribute groups • Joiner operator has three groups defined, but the attributes in our table operators are all in one group. • The groups in operators we saw are generally input groups, output groups, or both. • To edit it, right-click on the header of the box and select Open Details. . . to open the Joiner Editor. • This dialog box will allow us to edit the number of groups as well as change the group names if we want something different from INGRP 1 and INGRP 2. • The Joiner Editor can be used to edit not only the groups, but also the attributes that compose each group. • So if we right-click inside the Joiner box on a group and select Open Details. . . , we will get the same dialog box with just the individual tab selected that corresponds to the group we clicked on.

• With the Joiner Editor open, let's click on the Groups tab. We'll click three times on the Add button in the lower-right corner to add three more groups. • Notice that the default names it assigns are INGRP 3, INGRP 4, and INGRP 5. • Add when we click on the Add button is an input group. Now we'll click on the OK button to close out the Joiner Editor dialog box.

Connecting operators to the Joiner • The Mapping Editor to go ahead and connect every attribute in the group. • If we have several attributes, this is a convenient way to connect them. • So, click and drag INOUTGRP 1 of the ITEMS table operator onto the ITEMS group of the JOINER. Immediately, it will add all the attributes from the ITEMS table to the ITEMS group in the JOINER and connect each one with a line.

Defining operator properties for the Joiner • The Properties window in the Mapping Editor. If the JOINER operator is not already selected, click once on the header of the box to select it and the Properties window will immediately change to display the properties of the selected object. • We can see one property mentioned which is Join Condition. Click on the blank box to the right of the Join Condition label. Expand the ITEMS group on the left and double-click the ITEMS_KEY attribute to add it to the expression. • As we want every record included where the ITEM_SOLD equals the ITEMS_KEY, we will include an equal sign next by clicking on the button with the = sign on it. • We'll finish this first relation by expanding the POS_TRANSACTIONS group and double-clicking on the ITEM_SOLD attribute to include it.

• 1. The REGISTER attribute by double-clicking on it in the POS_TRANSACTIONS group. • 2. The REGISTERS_KEY attribute by doubleclicking on it under the REGISTERS group. • 3. This expression is followed by another AND by clicking on the And button. • 4. Press the Enter key.

Adding an Aggregator operator • The Aggregator operator requires that we specify a few things for it to function correctly. We have to specify a group by clause that the Aggregator operator is used to group the data, and it will create an output attribute for every attribute we use in the group by clause. • We have to manually add output attributes for any of the values that are going to be summed up, and then specify the SUM() function to use for them.

• Drag an AGGREGATOR operator from the Palette window to the canvas and drop it to the right of the JOINER operator between that operator and the POS_TRANS_STAGE target operator. • Connect the output attributes from the JOINER operator as input to the AGGREGATOR operator by dragging the OUTGRP 1 output group and dropping it on the INGRP 1 input group of the AGGREGATOR operator.

• Drag a Transformation Operator from the Palette window and drop it on the canvas between the Joiner operator and the Aggregator operator near the DATE_SOLD attribute. • In the resulting pop up that appears, we'll scroll down the window until the Date() functions appear and then select the TRUNC() function. • Then connect the VALUE attribute of the RETURN output group of the TRUNC operator to the DATE_SOLD attribute of the INGRP 1 group of the Aggregator operator.

• 1. We have our input set for the Aggregator operator and now we need to address the output. Let's select the Aggregator operator by clicking on the title bar of the window where it says AGGREGATOR. The Properties window of the Mapping Editor will display the properties for the aggregator. • 2. The very first attribute listed is Group By Clause. open the Expression Builder for the Group By Clause.

• • • • • Enter the following attributes separated by commas by double-clicking each in the INGRP 1 entry in the left window: INGRP 1. ITEM_NAME , INGRP 1. ITEM_CATEGORY , INGRP 1. ITEM_SKU , INGRP 1. ITEM_BRAND , INGRP 1. ITEM_LIST_PRICE , INGRP 1. ITEM_DEPT , INGRP 1. STORE_NAME , INGRP 1. STORE_NUMBER , INGRP 1. STORE_ADDRESS 1 , INGRP 1. STORE_ADDRESS 2 , INGRP 1. STORE_CITY , INGRP 1. STORE_STATE , INGRP 1. STORE_ZIP, INGRP 1. REGION_NAME , INGRP 1. COUNTRY , INGRP 1. DATE_SOLD.

• 3. We'll click on the OK button to close the Expression Builder dialog box and looking at the AGGREGATOR now, we can see that it added an output attribute for each of these attributes in our group by clause. • This list of attributes has every attribute needed for the POS_TRANS_STAGE operator except for the two number measures, SALE_QUANTITY and SALE_DOLLAR_AMOUNT. • 4. We'll right-click on the OUTGRP 1 attribute group of the AGGREGATOR operator and select Open Details. . . from the pop up.

• 5. Now we need to apply the SUM() function to these two new attributes, so we'll click on SALES_QUANTITY in the OUTGRP 1 group of the Aggregator. • 6. The Expression editor for output attributes of an Aggregator is custom built to apply aggregation functions. • We'll select SUM from the Function drop-down menu, ALL from the ALL/DISTINCT drop-down menu, and SALES_QUANTITY from the Attribute drop-down menu. • We'll then click on the Use Above Values button and the expression will fill in showing the SUM function applied to the SALES_QUANTITY attribute.

• We'll click on the OK button to save the expression and close the dialog box. Then we'll do the same thing for the AMOUNT output attribute of the Aggregator, but will select AMOUNT for the Attribute drop-down menu.

• Make the following attribute connections between the Aggregator and the POS_TRANS_STAGE table by clicking and dragging a line between attributes. We'll do individual attributes this time, not the whole group.

ü SALES_QUANTITY to SALE_QUANTITY ü AMOUNT to SALE_DOLLAR_AMOUNT ü DATE_SOLD to SALE_DATE ü ITEM_NAME to PRODUCT_NAME ü ITEM_SKU to PRODUCT_SKU ü ITEM_CATEGORY to PRODUCT_CATEGORY ü ITEM_BRAND to PRODUCT_BRAND ü ITEM_LIST_PRICE to PRODUCT_PRICE ü ITEM_DEPT to PRODUCT_DEPARTMENT

ü STORE_NAME to STORE_NAME ü STORE_NUMBER to STORE_NUMBER ü STORE_ADDRESS 1 to STORE_ADDRESS 1 ü STORE_ADDRESS 2 to STORE_ADDRESS 2 ü STORE_CITY to STORE_CITY ü STORE_STATE to STORE_STATE ü STORE_ZIP to STORE_ZIPPOSTALCODE ü REGION_NAME to STORE_REGION ü COUNTRY to STORE_COUNTRY
- Slides: 35