A thorough Introduction to Integration Services March 3, 2009Posted by fofo in Sql Server, Sql Server 2005, SQL Server 2008.
Tags: Integration Services, Sql Server 2005, SQL Server 2008
In this rather huge post I will try to explain and give valuable references when appropriate, regarding the use and functionality of Integration Services or SSIS for short. There will be other posts following regarding SSIS.
I will be using SQL Server 2008 for the various examples but you can use SQL Server 2005. It will not be a problem.
I will try to explain the main concepts of SSIS. More specifically I will try to talk about
- Tasks integrated Services are best suited for
- What the major components of SSIS are
- The techniques that one can use to build,debug and deploy transformations with Business Intelligence Development Studio
SSIS is an automation tool-component that is part of SQL Server 2005 and SQL Server 2008. It replaces DTS which has been a set of tools that were useful to DBAs to import, export, and transform data. DTS was included in all previous versions of SQL Server up to 2000.SSIS is only available in the “Standard” and “Enterprise” editions.
It is used for
- Moving data from one database to another database
- Copy database to an excel file or a text file
- Extract data from a data source, manipulate it,transform it and import it in a database
- Converting between various data types
- Merging columns into new or existing columns
The type of operation that we perform with SSIS is called ETL.
To access information related to SSIS from the msdn site click here .
For a more complete and sound definition you can click here and see what Wikipedia says about SSIS.
So to recap these are the main tasks that SSIS are used for:
- Merging data from multiple sources. I know many people out there, will claim that they can do that with writing code in a .net language. Yes of course one can do that. But it takes more time and one will need lots of time for debugging and testing purposes. SSIS provide wizards and components which are configured to do exactly that.
- Populate data warehouses and data stores. This means that SSIS is a great tool for getting data from a OLTP database and insert it into OLAP data store
- Cleaning and validating the data in the database
- Build business intelligence logic into data transformation
- Automate administrative tasks like database back up, delete,copy database objects
Now that all the declarations are out of the way, let’s review the necessary tools for creating,debugging,deploying and managing SSIS packages. The two main tools are
- Business Intelligence Development Studio
- SQL Server Management Studio
In order to create a SSIS package you must create a new SSIS integration project.
In order to create the project you need to fire up BIDS(Business Intelligence Development Studio). Then select New Project and from the Project type window select “Business Intelligence Projects”. From the templates window select the Integration Services Project. Have a look at the picture below.
When you click OK in the New Project window, you will have in your screen something similar like the picture below.
In this simple project we have just created, there are 4 folders in it anda a sample Package called Package.dtsx.
A package is the main container for tasks that are part of a complete unit of work.
A Package contains objects that you can simply drag and drop from the Toolbox window and processes that control the sequence of those objects. The objects can be
- Control flow objects which are commonly referred as tasks, like copying files and sending emails
- Data flow objects are called transformations. You extract data using data sources, you transform tha data (in memory) using data transformations and load data using data destinations.
There are 3 main processes. These Processes are represented by the 3 different tabs in the Package.dtsx[Design].
- Control Flow. The Control Flow tab allow us to drag and drop control flow items from the Toolbox to the designer surface and then connect them together.
- Data Flow. Their mission is to retrieve data from a data source and save it to a target destination
- Event Handlers. This allow us to perform various tasks at specific points in time during the lifetime of a package.
I will start a new practical example to demonstrate the above topics. I need a database to do that. I will use the AdventureWorksLT database.SQL Server 2005-2008 does not install sample databases by default due to security reasons. We must download them and install them separately. You can download this free database from the codeplex website. Codeplex is the project hosting site for Microsoft SQL Server Samples and Community Projects.The down-loadable file, will be a .msi file that you need to run and after successfully doing that, the .mdf and .ldf files will be placed somewhere on your hard disk. Then you can attach them to your local instance of your SQL Server from SQL Server Management Studio.
- For those that have Sql Server 2005 installed they need to go and download the file ,”AdventureWorksLT.msi”, from here
- For those that have Sql Server 2008 installed they need to go and download the file,”SQL2008.AdventureWorks_All_Databases.x86.msi”, from here
So I assume that you have downloaded the correct version of the AdventureWorksLT database and attached it to the local instance of your SQL Server. In this example I want to
- Get data from Customer table in my AdventureWorksLT database
- Make some transformations on the retrieved rows of the Customer table. e.g combine two columns of the table into a new column. Take FirstName and LastName columns and combine them into a new one.
- Save the data to a different format in a .xls file.
1) Start BIDS (Business Intelligence Development Studio)
2) Create a new business intelligence project (Integration Services Project) and call it “TranformCustomers”
3) Drag and Drop a Data Flow Task onto the designer surface.
4) Double click on it and you will immediately be transferred from the Control Flow tab to the Data Flow tab.
5) Go to your toolbox and drag and drop on the design surface(Data Flow tab) a OLE DB Source component.
6) Double click the component and in the window that appears (OLE DB Source Editor) . Click the button New
7) Click New again to new window that appears titled Configure OLE DB Connection Manager
8) In the next window that will appear titled Connection Manager find the name of your server
9) Then select the AdventureWorksLT database, Hit the “Test the Connection” Button and click “OK” and then click “OK” one more time.
10) In the OLE DB Source Editor window make sure that the option Table or View is selected.
11) Then select the SalesLT.Customer table from the dropdown.
12) Hit the Preview button if you want and click “OK”
13) Drag and drop a Derived Column component onto the Designer Surface.
14 ) Drag the green arrow from the OLE DB Source component to the Derived Column component until a connection is established.
15) Double click on the Derived Column component. A new window, Derived Column Transformation Editor, appears.
16) In this new window under the Derived Column Name column, type a name for the new derived column.
17) In the Expression type FirstName + ” ” + LastName and then click OK.
If you have done everything correctly so far you should have something that looks similar with the picture below
18) The next step is to define where our transformed data will be located. Drag and drop an Excel Destination Component onto the designer surface.
19) Drag the green line-arrow from the Derived Column component to the Excel Destination component until a connection is established.
20) Minimise all the windows. Start Microsoft Excel 2003 or 2007 and create an empty workbook. Name it Customer.xls and save it on your desktop. Open the workbook and name the first worksheet Customer Data.
21) Double click the Excel Destination component. The Excel Destination Editor window appears.
22) Click on the New button and from the Excel Connection Manager window click Browse and then navigate to the Customer.xls and click Open and then OK.
23) Hit the New Button(Excel Destination editor window), next to the Name of the excel sheet:
24) In the Create Table window change the CREATE TABLE `Excel Destination` to CREATE TABLE `Customer Data` and click OK.
25 ) Hit the Preview button to have a look at the columns.
26) Click OK to exit the Excel Destination editor window and save your changes to the project
27) Run your project by hitting F5
28) Your package should execute fine and you should have a picture similar like the one below
29) Open the Customer.xls file from your desktop and see the transformed data in it.
Now let’s see another way of creating packages using the Import / Export Wizard.
We will demonstrate how to create a package with a Import / Export Wizard with an example. In this example I want to copy certain tables from one database that resides in my local instance of SQL Server to a new database that I will create and also resides in my local instance of SQL Server. The database will be AdventureWorksLT.
1) Start BIDS (Business Intelligence Development Studio)
2) Create a new business intelligence project (Integration Services Project) and call it “Import/Export”
3) On the SSIS Packages folder right-click and select SSIS Import and Export wizard.
4) In the wizard that pops up click Next.In this window select the data source. In this example you should use the local intance of your SQL Server and then select the AdventureWorksLT database. Then click Next.
5) In the next step of the wizard, select the destination. In this example it will be a new database in the same server.
6) Hit the New button and in the new window that appears just type “Mynewdb” and hit the OK button and then click the Next button.
7) Select (should be selected by default) the first option in the next step of the wizard.Click Next.
8) In the next step of the wizard select as many tables of the database that you want copied to the new database.
9) Click the Next button and then click the Finish button.
10) if you go back to your project in the BIDS, you will see a new package that was created automatically for us and called Package1.dtsx. We did not have to do anything manually only follow the simple steps of the wizard.
11) Run the package by hitting F5.
12) Check that the destination database has all the data you specified in the steps of the wizard.
That is all for now. It is not difficult, is it?