Welcome to the Baton Rouge Area SQL Server
Welcome to the Baton Rouge Area SQL Server User Group SQL Saturday 2009! SSIS Overview A Member of the largest user-run community for SQL Server & Business Intelligence in the world
SQL Server Integrations Services (SSIS) Overview Best practices and guidelines, migration from DTS to SSIS, differences between 2 k, 2 k 5 and 2 k 8, performance considerations, examples. See a sample of what SQL Server Integration Services has to offer with some real-life examples and usage demos that will inspire you to take advantage of the powerful engine of SSIS. William D Assaf Development Consultant Sparkhound, LLC
What is SSIS? • Company Line: “Microsoft SQL Server 2008 Integration Services (SSIS) makes it possible to build high performance data integration solutions, including the extraction, transformation, and loading (ETL) of data for data warehousing. Integration Services replaces Data Transformation Services (DTS). ”
SSIS • How many have used SSIS before? • What kind of projects?
SSIS Basics - DTS • SSIS concepts of “Control Flow” and “Data Flow” are not as distinct in DTS. • SSIS has one control flow per package with multiple data flows that can have precedence hierarchies.
SSIS Basics - SSIS • Control flow: every package has one control flow • Data flow: every control flow can invoke one or many data flow tasks. • Event Handlers, similar to. net, can execute control flow packages on object events, such as On. Error, On. Pre. Validate, On. Post. Execute, etc. – Similarly, breakpoints are great for stepping through and troubleshooting or understanding an existing package. • Package Explorer tab shows a summary of events (record counts, error results, and history) after a package runs.
DTS vs SSIS 2005/2008 • There is a plugin for SQL 2005 Management Studio called “SQL Server 2000 DTS Designer Components” that lets you edit SQL 2000 -era DTS packages in their native environment, inside SSMS. – This has mixed results. For me, it never worked and corrupted my 2000 Enterprise manager install.
DTS vs SSIS 2005/2008 • The SQL Server Upgrade Advisor (SSUA) can automate the upgrade process from DTS to 2000 or 2005, but its results should be carefully tested. – Some development inside SSIS will likely still be necessary on the product that the SSUA generates. • How SSUA is used to upgrade DTS to SSIS: – Open the DTS in SQL 2000 Enterprise Manager. – Save the DTS as a “Structured Storage File Format. ” – Open the DTS Package w/SSUA 2005/2008
But really, how many times has Microsoft provided an in-place upgrade that didn’t work well?
DTS vs SSIS 2005/2008 • Since SQL 2005 can run SQL 2000 DTS packages natively, the recommendation is to leave DTS 2000 packages running as is • Rewrite processes in SSIS 2005 or 2008 when time is available to follow best procedures and take advantage of new technologies in 2005. • (Don’t bother with the SSUA)
DTS vs SSIS 2005/2008 • Here are some of the features of SSIS that DTS did not have (or at least to the degree that SSIS contains): Separation of control flow and data flow Some connections have been optimized Several additional various transformations Wider range of data sources Security enhancements Toolset unification via BIDS with the other BI projects, source control integration – Checkpoints & in place restarts – Dynamic runtime configurations – Superior logging capabilities – – –
DTS vs SSIS 2005/2008 “on average SSIS is believed to be roughly 7 times as fast as DTS. ” Most importantly: • DTS is deprecated and will not be supported past SQL 2008.
Questions? Feeling Overwhelmed? Wish you had gone to that session on building your resume? From here on, only SSIS. No more mention of DTS.
Upgrading from SSIS 2005 to SSIS 2008 • Should be as easy as opening each package up and re-saving it in 2008. – Scripting engine has changed. (You can now write SSIS scripts in C#. ) – SSIS 2008 uses a newer build of the SQL Server Native Client • SSIS 2005 custom developed components will need to be altered to work in SSIS 2008
SSIS 2005 vs SSIS 2008 • The script environment differences are the biggest change between the two builds of SSIS. (and its not really that big a deal for many SSIS packages) – When you open/add a SSIS 2005 package which contains a Script Task in SQL Server 2008 BIDS it will automatically attempt to convert the script for you. (Do this. ) – You can also attempt to execute the older SSIS 2005 package with the 2008 dtexec utility, this operation will also convert the script in process; however the script conversion is only temporary.
SSIS 2005 and 2008 • No performance limitations in SSIS Standard edition vs. Enterprise edition – Enterprise does have some advances transformations, like Data Mining and Fuzzy Lookups, that Standard does not have. – These advanced operations are likely to be used in OLAP environments.
Enterprise Edition • Both 2005 and 2008 Enterprise Edition’s killer feature (or one of them anyway) is horizontal partitioning. – Strongly recommended for large datasets in OLAP cubes. Can provide huge performance gains. – Can be used on any large SQL database for performance • Guideline is approx more than 20 million rows in a single partition max.
TSQL vs SSIS • TSQL vs SSIS – Examples: • Replace CAST or CONVERT with SSIS Data Conversion. • Replace CASE with derived column expression logic. • Replace GROUP BY with SSIS aggregations • Of course, TSQL can still be written and called by an SSIS package using an Execute SQL task.
Best Practices Good Ideas
Good Ideas– Server • SSIS has adjustable maximum concurrency that should be set to the number of logical CPU’s available, and not higher. – Only impacts multi-step, non-consecutive operations • SSIS loves RAM, so 64 bit is a big advantage on max RAM availability and usage. Consider it for new projects. – Can make significant performance improvement and decrease physical drive I/O. • Use horizontal partitioning
Good Ideas – Server • For. ETL’s you will need Integration Services installed on the server. – You can actually create an SSIS package without SSIS installed, but many features (like built-in email) won’t work. – No reason not to install SSIS that I know of.
Good Ideas– SSIS • In destination tables, either make sure they are empty, or remove all indexes if they are not empty. – Bulk inserts are not as advantageous if the system has to update indexing. This can be a major performance impact.
Good Ideas – SSIS • SSIS OLE DB Destination adapter is usually the fastest. It allows checkpointing in batches of rows. – Set the “Rows per batch” and “Maximum insert commit size” in the OLE DB destination. (Default is 0, change to 1 k, 100 k, whatever seems to work best with your system’s memory and tempdb performance. )
Good Ideas – SSIS • The SQL Server Destination object doesn’t do checkpoints within a transaction, though it is recommended by books online. – In SQL 2005, it is not recommend by the Kimball Group. – Performance is improved in 2008.
Good Ideas – SSIS • Modularization of complex processes using SSIS, with business logic, is key. – For complex SSIS projects, have a master SSIS project that will call child packages. – Modularize your packages along business rules, for example, one for each table or cluster of related tables. This makes maintenance down the line far easier. – Don’t put everything in one control package.
Good Ideas – SSIS • Auditing is important. – Use the error flows to report on row-level problems. Observe error data sets. – Use SSIS’s robust logging features. • Can now log to text file, SQL Table, Windows Error Log, SQL Trace File, XML. • Send emails with error summaries and error row counts.
Error Schema • Consider a parallel schema for any tables you load for the purposes of dumping bad data. – The second schema should be exactly the same, but with much wider data types and no keys or constraints. – Example: If you are populating a table dbo. Accounts, have an error. Accounts table to store data validation errors.
Batch Tracking • If possible, track the batch that data was inserted in. – Example: An audit table has a row inserted every time the ETL runs. – The audit table’s IDENTITY column, Master. Key, value is inserted along with data into every destination table in a column called Master. Key. Continued…
Batch Tracking Continued… • If an ETL is interrupted midway through a run, removing records is much easier if you know what run inserted them. • The audit table can also track row count, error row count, start and end time, who started, results/notes on the run, etc. • The Master. Key would be passed to child packages and insert statements as SSIS variables.
Examples • WH Load ETL • Automated Restore for UAT • Rebuild Partitioned Tables
Source Reference Links These links were quoted or referenced in preparation of this presentation. • http: //msdn. microsoft. com/en-us/sqlserver/cc 511477. aspx • http: //blog. scalabilityexperts. com/2008/02/14/the-key-sql-server-2008 -bi -migration-dts-to-ssis-2008 -part-1 -of-2/ • http: //blog. scalabilityexperts. com/2008/02/19/the-key-sql-server-2008 -bi -migration-dts-to-ssis-2008 -part-2 -of-2/ • http: //www. vsteamsystemcentral. com/cs/blogs/applied_team_system/ar chive/2007/06/08/400. aspx • https: //www 106. livemeeting. com/cc/mseventsbmo/view? cn=w&id=1032 297072&pw=E 8 E 6 D 589&fmt=wmm - Excellent Kimball Institute webcast! • https: //msevents. microsoft. com/cui/Register. aspx? culture=en. US&Event. ID=1032276873&Country. Code=US
- Slides: 31