By : Kasim Wirama, MCDBA, MVP SQL Server
What is WMI? WMI stands for Windows Management Instrumentation. WMI is in query scripting language that is capable of managing your windows server and workstation. SSIS task for reading information in your workstation or server is provided by WMI data reader. For example to look information of your application log.
With WMI data reader, you can :
1. Read event log
2. Query list of running application
3. Query memory available for executing your SSIS package.
4. Get information of hard disk free space.
For example to get information of free disk space on c drive, you issue : SELECT FreeSpace, DeviceId, Size, SystemName, Description FROM Win32_LogicalDisk WHERE DeviceID = 'C:' or you want to get information about SSIS service and package running you issue : SELECT * FROM Win32_NTLogEvent WHERE LogFile = 'Application' AND (SourceName='SQLISService' OR SourceName='SQLISPackage').
WMI data reader task need WMI type connection manager. On task editor of the task, you specify whether you specify your query directly or through file/variable in WsqlQuerySourceType and specify the query itself in WsqlQueryType, and what kind of output format you would like to get, tabular format or property name – value format or just value format at Output type property. Destination type property gives you options whether you would like to save the output into file or variable, if you want to see the result directly without further processing, you can select the output to file.
Run your package, WMI reader task will read information through WMI query you specify.
Sunday, November 18, 2007
Interacting SSIS with Web Service
By: Kasim Wirama, MCDBA, MVP SQL Server
Other interesting component of SSIS is web service component as interface to interact with other third party through web service.
You can build your own web service with Visual Studio, then you can create your SSIS project by first creating HTTP connection manager. Specify server url in HTTP connection manager editor window. Then you drag web service task into control flow designer, on general page, select http connection manager on HttpConnection property, type wsdl file for saving into your local SSIS, and specify whether you would like to override the wsdl file.
On input page, you can select service name and web service method. And on output page you can receive web service output to any destination such as to variable or to file.
You can test your SSIS package, when check that the output will be saved into your specified destination.
Other interesting component of SSIS is web service component as interface to interact with other third party through web service.
You can build your own web service with Visual Studio, then you can create your SSIS project by first creating HTTP connection manager. Specify server url in HTTP connection manager editor window. Then you drag web service task into control flow designer, on general page, select http connection manager on HttpConnection property, type wsdl file for saving into your local SSIS, and specify whether you would like to override the wsdl file.
On input page, you can select service name and web service method. And on output page you can receive web service output to any destination such as to variable or to file.
You can test your SSIS package, when check that the output will be saved into your specified destination.
Saturday, November 10, 2007
What are new features offered in SQL Server 2008 ?
By : Kasim Wirama, MCDBA, MVP SQL Server
SQL Server 2008, code name Katmai, will be officially released on 27 February 2008. There are a number of changes that will impact on database design and some performance gained with new feature. Some aspects will need you consideration below :
1. Consider to use latest hardware, such as dual core or quad core processor, and 64 bit machine that has maximum memory 1 TB instead of 32 bit that has maximum 4 GB memory. And use 64 bit SQL Server 2008
2. Consider to use 64 bit latest server operating system (windows 2003 server SP 2 or windows 2003 R2). Those OSes will support SQL Server in high load utilization. If you install Scalable Network Pack for TCP Offload engine, it will free up CPU cycle. If you use Windows Server 2008, it will have capability to send dynamic frame sizing on TCP/IP low level.
3. Database mirroring on SQL Server 2008 will be much better compared to SQL Server 2005. It will be easy to implement and will compress log stream over network, resulting efficient bandwidth utilization, that will impact on system performance overall.
4. SQL Server 2008 will use data compression on LOB data type such as IMAGE and TEXT, less physical I/O has to be done, improving performance.
5. Data compression will also be applied to backup so it will reduce storage requirement.
6. Regard to store LOB data into database, SQL Server 2008 will introduce new data type FILESTREAM to address performance overhead imposed on store LOB into database. LOB will be keep externally and FILESTREAM will give high performance accessing the LOB outside database because it integrates will Windows file system whilst still maintaining database integrity.
7. SQL Server 2008 introduce some datetime datatype, namely DATE, TIME, DATETIMEOFFSET (datetime datatype with timezone) and DATETIME2 (larger year and fractional second range). It will make datetime related data migration easier from other database platform such as Oracle and DB2.
8. If you write some queries that involve some grouping with refers to same table, you can just write only one query with GROUPING SETS new TSQL feature of SQL Server 2008. It is simpler and fewer scan on base table, resulting better performance.
SQL Server 2008, code name Katmai, will be officially released on 27 February 2008. There are a number of changes that will impact on database design and some performance gained with new feature. Some aspects will need you consideration below :
1. Consider to use latest hardware, such as dual core or quad core processor, and 64 bit machine that has maximum memory 1 TB instead of 32 bit that has maximum 4 GB memory. And use 64 bit SQL Server 2008
2. Consider to use 64 bit latest server operating system (windows 2003 server SP 2 or windows 2003 R2). Those OSes will support SQL Server in high load utilization. If you install Scalable Network Pack for TCP Offload engine, it will free up CPU cycle. If you use Windows Server 2008, it will have capability to send dynamic frame sizing on TCP/IP low level.
3. Database mirroring on SQL Server 2008 will be much better compared to SQL Server 2005. It will be easy to implement and will compress log stream over network, resulting efficient bandwidth utilization, that will impact on system performance overall.
4. SQL Server 2008 will use data compression on LOB data type such as IMAGE and TEXT, less physical I/O has to be done, improving performance.
5. Data compression will also be applied to backup so it will reduce storage requirement.
6. Regard to store LOB data into database, SQL Server 2008 will introduce new data type FILESTREAM to address performance overhead imposed on store LOB into database. LOB will be keep externally and FILESTREAM will give high performance accessing the LOB outside database because it integrates will Windows file system whilst still maintaining database integrity.
7. SQL Server 2008 introduce some datetime datatype, namely DATE, TIME, DATETIMEOFFSET (datetime datatype with timezone) and DATETIME2 (larger year and fractional second range). It will make datetime related data migration easier from other database platform such as Oracle and DB2.
8. If you write some queries that involve some grouping with refers to same table, you can just write only one query with GROUPING SETS new TSQL feature of SQL Server 2008. It is simpler and fewer scan on base table, resulting better performance.
Friday, November 9, 2007
Building First Windows Communication Foundation
By : Kasim Wirama, MCDBA, MVP SQL Server
Hey, .NET 3.5 will come soon, and I haven’t got know anything about WCF. So I learn how to build service in WCF so it could be consume by clients. So this application contains 2 parts, service as server side and client that consumes the service. I will host the service into IIS.
At glance, WCF is a framework that provides communication mechanisms that involve client from windows and non windows world. Of course there are some aspects that should be taken into account for this service run reliable, scalable and secure to be accessed. I leave additional theory behind this WCF to leading leading MVP C# to explain more about this kind of “animal”.
So you install first your WCF, and Visual Studio extension for WCF and WPF, there is database sample used, AdventureWorks. I use enterprise library May 2007 build.
Make new project in VS2005, there is option NET Framework 3.0 in Visual C#’s or Visual Basic tree node of Project types list box, choose WCF Service Library. Name it ProductServiceProject. Configure project property Output Path from bin\debug to bin\ in Build section, make note to your assembly name and default namespace.
I also add reference Microsoft enterprise library for data.dll, common.dll, and object builder.dll
There is one class created by default, you just add corresponding Microsoft enterprise library namespace in “using (for C#)/Imports (for VB.NET)” section.
There must be 2 namespace that included in every class in WCF projects, “using System.ServiceModel” and “using System.Runtime.Serialization”. “using System.ServiceModel” contains all WCF related classes and “using System.Runtime.Serialization” contains all classes required to serialize object over wire and deserialize into object again.
Now at sample section you will see 2 attributes, ServiceContract and DataContract. DataContract identifies a class/enumeration/structure that can be serialized/deserialized as XML stream by WCF. ServiceContract identifies interface as a contract, so clients application would be able to generate metadata to consume the WCF service.
A class contains members, you can mark each members that you would like to serialize/deserialize with DataMember attribute. And for each member of interface that is marked as OperationContract, could be exposed with OperationContract interface. And you create a class to implement the interface. Let’s see below sample script to explain about this:
[DataContract]
public class Product
{
[DataMember]
public string ProductNumber;
}
[ServiceContract]
public interface IProductService
{
[OperationContract]
List ListProducts();
}
public class ProductServiceImpl : IProductService
{
public List ListProducts()
{
Database dbAdventureWorks = DatabaseFactory.CreateDatabase("AdventureWorksConnection");
string queryString = @"SELECT ProductNumber
FROM Production.Product";
IDataReader productReader = dbAdventureWorks.ExecuteReader(CommandType.Text, queryString);
List productsList = new List();
while (productReader.Read())
{
string productNumber = productReader.GetString(0);
productsList.Add(productNumber);
}
return productsList;
}
}
If you want to deploy it to IIS, you need to specify the class that implement the interface and assembly that contains the class in separate service file (SVC file) as below script.
<%@ServiceHost Service="Products.ProductServiceImpl" %>
<%@Assembly Name="ProductsService" %>
In web config add below scripts in section
binding="basicHttpBinding"
contract="Products.IProductsService" />
Service Model element contains configuration information about WCF web service.
Services element contains detail implementations for each service.
Endpoint element contains detail of each services that client application will require in order to communicate with the corresponding service.
Address attribute is left blank because IIS will use SVC file.
Binding element provides transport mechanism to access the WCF web service.
Next step is to make virtual directory point to your WCF project. After deployment you can see your service by typing http://localhost//. Notice that there is sentence “Metadata publishing for this service is currently disabled”. To make it enable to add this script in webconfig inside
And add new attribute inside service element with behaviorConfiguration="ProductsBehavior".
I have done for WCF service part. I try for WCF client part that consume the service.
Hey, .NET 3.5 will come soon, and I haven’t got know anything about WCF. So I learn how to build service in WCF so it could be consume by clients. So this application contains 2 parts, service as server side and client that consumes the service. I will host the service into IIS.
At glance, WCF is a framework that provides communication mechanisms that involve client from windows and non windows world. Of course there are some aspects that should be taken into account for this service run reliable, scalable and secure to be accessed. I leave additional theory behind this WCF to leading leading MVP C# to explain more about this kind of “animal”.
So you install first your WCF, and Visual Studio extension for WCF and WPF, there is database sample used, AdventureWorks. I use enterprise library May 2007 build.
Make new project in VS2005, there is option NET Framework 3.0 in Visual C#’s or Visual Basic tree node of Project types list box, choose WCF Service Library. Name it ProductServiceProject. Configure project property Output Path from bin\debug to bin\ in Build section, make note to your assembly name and default namespace.
I also add reference Microsoft enterprise library for data.dll, common.dll, and object builder.dll
There is one class created by default, you just add corresponding Microsoft enterprise library namespace in “using (for C#)/Imports (for VB.NET)” section.
There must be 2 namespace that included in every class in WCF projects, “using System.ServiceModel” and “using System.Runtime.Serialization”. “using System.ServiceModel” contains all WCF related classes and “using System.Runtime.Serialization” contains all classes required to serialize object over wire and deserialize into object again.
Now at sample section you will see 2 attributes, ServiceContract and DataContract. DataContract identifies a class/enumeration/structure that can be serialized/deserialized as XML stream by WCF. ServiceContract identifies interface as a contract, so clients application would be able to generate metadata to consume the WCF service.
A class contains members, you can mark each members that you would like to serialize/deserialize with DataMember attribute. And for each member of interface that is marked as OperationContract, could be exposed with OperationContract interface. And you create a class to implement the interface. Let’s see below sample script to explain about this:
[DataContract]
public class Product
{
[DataMember]
public string ProductNumber;
}
[ServiceContract]
public interface IProductService
{
[OperationContract]
List
}
public class ProductServiceImpl : IProductService
{
public List
{
Database dbAdventureWorks = DatabaseFactory.CreateDatabase("AdventureWorksConnection");
string queryString = @"SELECT ProductNumber
FROM Production.Product";
IDataReader productReader = dbAdventureWorks.ExecuteReader(CommandType.Text, queryString);
List
while (productReader.Read())
{
string productNumber = productReader.GetString(0);
productsList.Add(productNumber);
}
return productsList;
}
}
If you want to deploy it to IIS, you need to specify the class that implement the interface and assembly that contains the class in separate service file (SVC file) as below script.
<%@ServiceHost Service="Products.ProductServiceImpl" %>
<%@Assembly Name="ProductsService" %>
In web config add below scripts in
contract="Products.IProductsService" />
Service Model element contains configuration information about WCF web service.
Services element contains detail implementations for each service.
Endpoint element contains detail of each services that client application will require in order to communicate with the corresponding service.
Address attribute is left blank because IIS will use SVC file.
Binding element provides transport mechanism to access the WCF web service.
Next step is to make virtual directory point to your WCF project. After deployment you can see your service by typing http://localhost/
And add new attribute inside service element with behaviorConfiguration="ProductsBehavior".
I have done for WCF service part. I try for WCF client part that consume the service.
Wednesday, November 7, 2007
Scripting in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
For most business requirement scenarios, use of click, drop and select property of some tasks is sufficient. But the other scenarios, you need to provide custom solution that is not provided by the tasks (such as File System task or FTP task).
In DTS, you have only one scripting option, using ActiveX Script task, but in SSIS, you have other options, using .NET scripting provided by Script Task, besides ActiveX Script task. Let’s see what’s inside ActiveX Script task.
When you open Script section at ActiveX Script Task Editor window, there are some properties, such as Language, Script and EntryMethod. Still same with SSIS predecessor, there are VBScript and Jscript language. At Script property, you can type your script here, or alternatively, you load your activeX script file from Browse button, remember that if you load from file, all script that you type in Script property will be overridden. And you specify EntryMethod that will be first executed in script. Nothing new in this task, I recommend you move to Script task rather than ActiveX script task, for several advantages:
1. Intellisense with Visual Studio environment
2. can pass parameter into script
3. can add breakpoint to your script
4. Managed in .NET environment
5. You can precompile your script for faster execution performance.
Now, explore what Script Task component look like. In Script property, there are some properties.
ScriptLanguage property currently contains only Visual Basic .NET, but in next version of SQL Server 2008, it will have options Visual C# also.
PrecompileScriptIntoBinaryCode property is set to true, will make execution faster but file size will be bigger.
EntryPoint specify entry point method, just leave it as default.
ReadOnlyVariables contains variables that are read only purpose, and WriteOnlyVariables contains variables that can be read-write purpose.
You can try it out, first make 2 variables, named it readVar and writeVar with string type, you expected when readVar is assigned to “Hello” will output to writeVar as “Hello world”. Set value “Hello” to readVar, then define in Script Task Editor, ReadOnlyVariables to readVar and ReadWriteVariables to writeVar. Click Script button and it will open another visual studio window called VSA (Visual Studio for Application). In VSA, type dts.Variables(“writeVar”) = dts.Variables(“readVar”) & “ world” and Msgbox (dts.Variables(“writeVar”).values.ToString()) before last statement (TaskResult) and procedure Main.
Now, you run the package, it will display pop up message with text “Hello World”
Script task is very useful and offer flexibility and familiar to use when you are already used to working with Visual Studio.
For most business requirement scenarios, use of click, drop and select property of some tasks is sufficient. But the other scenarios, you need to provide custom solution that is not provided by the tasks (such as File System task or FTP task).
In DTS, you have only one scripting option, using ActiveX Script task, but in SSIS, you have other options, using .NET scripting provided by Script Task, besides ActiveX Script task. Let’s see what’s inside ActiveX Script task.
When you open Script section at ActiveX Script Task Editor window, there are some properties, such as Language, Script and EntryMethod. Still same with SSIS predecessor, there are VBScript and Jscript language. At Script property, you can type your script here, or alternatively, you load your activeX script file from Browse button, remember that if you load from file, all script that you type in Script property will be overridden. And you specify EntryMethod that will be first executed in script. Nothing new in this task, I recommend you move to Script task rather than ActiveX script task, for several advantages:
1. Intellisense with Visual Studio environment
2. can pass parameter into script
3. can add breakpoint to your script
4. Managed in .NET environment
5. You can precompile your script for faster execution performance.
Now, explore what Script Task component look like. In Script property, there are some properties.
ScriptLanguage property currently contains only Visual Basic .NET, but in next version of SQL Server 2008, it will have options Visual C# also.
PrecompileScriptIntoBinaryCode property is set to true, will make execution faster but file size will be bigger.
EntryPoint specify entry point method, just leave it as default.
ReadOnlyVariables contains variables that are read only purpose, and WriteOnlyVariables contains variables that can be read-write purpose.
You can try it out, first make 2 variables, named it readVar and writeVar with string type, you expected when readVar is assigned to “Hello” will output to writeVar as “Hello world”. Set value “Hello” to readVar, then define in Script Task Editor, ReadOnlyVariables to readVar and ReadWriteVariables to writeVar. Click Script button and it will open another visual studio window called VSA (Visual Studio for Application). In VSA, type dts.Variables(“writeVar”) = dts.Variables(“readVar”) & “ world” and Msgbox (dts.Variables(“writeVar”).values.ToString()) before last statement (TaskResult) and procedure Main.
Now, you run the package, it will display pop up message with text “Hello World”
Script task is very useful and offer flexibility and familiar to use when you are already used to working with Visual Studio.
Sending Mail in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
In most business requirement, business user would like to be notified about an execution status of SSIS package, through number of ways, such as logging. Another preferred way is by sending email to related user. SSIS provides this through Send Mail task. Scenario example for using this task when you would like to be notified by email when backup database task is failed or sales target has not reached minimum sales amount.
This task requires SMTP task connection manager. So you need to specify SMTP based connection manager before using the task. On the connection manager, you specify smtp server name.
On this SSIS version task, it has no longer dependency with MAPI. On DTS, it has dependency with MAPI, means that you should install Outlook on machine where the package will run.
Let see some properties for the task here:
SmtpConnection property contains name of connection manager you specified earlier. Another interesting property is MessageSourceType, there are 3 options, Direct Input, File Connection, and Variable, so it gives you more flexibility depending on business requirement. MessageSource will depend selection you make in MessageSourceType. As email priority, this task provide email priority through Priority property. And you can attach file through Attachments property. You can specify multiple attachments separated with semicolons. Unfortunately here, there is no way to define attachment property through variables.
In most business requirement, business user would like to be notified about an execution status of SSIS package, through number of ways, such as logging. Another preferred way is by sending email to related user. SSIS provides this through Send Mail task. Scenario example for using this task when you would like to be notified by email when backup database task is failed or sales target has not reached minimum sales amount.
This task requires SMTP task connection manager. So you need to specify SMTP based connection manager before using the task. On the connection manager, you specify smtp server name.
On this SSIS version task, it has no longer dependency with MAPI. On DTS, it has dependency with MAPI, means that you should install Outlook on machine where the package will run.
Let see some properties for the task here:
SmtpConnection property contains name of connection manager you specified earlier. Another interesting property is MessageSourceType, there are 3 options, Direct Input, File Connection, and Variable, so it gives you more flexibility depending on business requirement. MessageSource will depend selection you make in MessageSourceType. As email priority, this task provide email priority through Priority property. And you can attach file through Attachments property. You can specify multiple attachments separated with semicolons. Unfortunately here, there is no way to define attachment property through variables.
Message Queuing Implementation in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
SSIS provides tasks that can communicate with other application through windows built in messaging, called Message Queue (MSMQ). The task is Message Queue task. It is used to interact with message queue service. A message can contain file, string or variable. With this task, you can scale up your package, enabling multiple packages sending asynchronous messages to other application. Let’s see what configuration inside the task.
In General page, you specify message queue connection manager. On Message Option property, you specify whether the task is configured to send or receive message. Other property is whether to use windows 2000 MSMQ.
If you choose send message, the Send option will appear on left list box. On Message Type property, you can set what kind of content will be in message, the content could be variable, file or string type.
If you choose receive message, a bit more property will be there, such as time-out property and remove message from queue after read it. For time out property, if it is specified, it will produce error if it experiences a time out.
It is interesting task, I think. If you would like your application to interact one another through loosely coupled, tool or component interact with MSMQ is worth to be considered for use.
SSIS provides tasks that can communicate with other application through windows built in messaging, called Message Queue (MSMQ). The task is Message Queue task. It is used to interact with message queue service. A message can contain file, string or variable. With this task, you can scale up your package, enabling multiple packages sending asynchronous messages to other application. Let’s see what configuration inside the task.
In General page, you specify message queue connection manager. On Message Option property, you specify whether the task is configured to send or receive message. Other property is whether to use windows 2000 MSMQ.
If you choose send message, the Send option will appear on left list box. On Message Type property, you can set what kind of content will be in message, the content could be variable, file or string type.
If you choose receive message, a bit more property will be there, such as time-out property and remove message from queue after read it. For time out property, if it is specified, it will produce error if it experiences a time out.
It is interesting task, I think. If you would like your application to interact one another through loosely coupled, tool or component interact with MSMQ is worth to be considered for use.
Tuesday, November 6, 2007
Simplifying File Operation in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
In previous version of SSIS (i.e.: DTS), if you want to do file or directory related operation, you have to write this in ActiveX script task. For one or two operation, it would be no problem to maintain, but imagine when you ETL process involves many file or directory operation, you will need some efforts to understand and modify the ActiveX scripts. So it is not practical anymore in this situation.
SSIS provide File System Task component to simplify these processes by just selecting appropriate already built-in functions inside the task, such as Copy Directory, Copy File, Create Directory, Delete Directory, Delete Directory Content, Delete File, Move Directory, Rename File, and Set Attributes.
Let’s say if you want to Move Directory along with their files inside, some properties that should be specified below, based on category : Source Connection and Destination Connection.
For Source Connection property group, there are 2 properties, if you would like to assign value from variable to SourceConnection property, set it true on IsSourcePathVariable, then you can select drop down values on SourceVariable property.
For Destination Connection property group, there are 3 properties, similar to Source Connection property group, there are properties IsDestinationPathVariable and DestinationConnection. If you want to override existing file or directory and destination when file/directory name similar with source, you can set the value to true for Overwrite Destination property.
With complete function of the task, you will remove necessary to write complex ActiveX script and the package becomes easy to maintain.
In previous version of SSIS (i.e.: DTS), if you want to do file or directory related operation, you have to write this in ActiveX script task. For one or two operation, it would be no problem to maintain, but imagine when you ETL process involves many file or directory operation, you will need some efforts to understand and modify the ActiveX scripts. So it is not practical anymore in this situation.
SSIS provide File System Task component to simplify these processes by just selecting appropriate already built-in functions inside the task, such as Copy Directory, Copy File, Create Directory, Delete Directory, Delete Directory Content, Delete File, Move Directory, Rename File, and Set Attributes.
Let’s say if you want to Move Directory along with their files inside, some properties that should be specified below, based on category : Source Connection and Destination Connection.
For Source Connection property group, there are 2 properties, if you would like to assign value from variable to SourceConnection property, set it true on IsSourcePathVariable, then you can select drop down values on SourceVariable property.
For Destination Connection property group, there are 3 properties, similar to Source Connection property group, there are properties IsDestinationPathVariable and DestinationConnection. If you want to override existing file or directory and destination when file/directory name similar with source, you can set the value to true for Overwrite Destination property.
With complete function of the task, you will remove necessary to write complex ActiveX script and the package becomes easy to maintain.
Modularize SSIS Package
By : Kasim Wirama, MCDBA, MVP SQL Server
Sometimes, your package is too big and some of the tasks are reusable. You can break your big package down into smaller packages so that you can reuse your smaller package in other SSIS projects. SSIS supports this requirement by provide Execute Package task. Let’s examine more detail what configuration that the task offers.
The important configuration is located on Package section on list box of Execute Package Task Editor window, some properties are important to get familiarize with.
First is Location properties, you need to know where your package is save to, in SQL Server or in File System. When it is in SQL Server, the package is stored into msdb database. If it is in File System, you need to specify where it is located in file system.
Specify password property, when the called package is in encrypted form for security purpose.
Other property is ExecuteOutOfProcess. By default the value is false. When it is true, the referenced package will be executed in separate process and memory space.
How about setting up input parameters on child package. In DTS, you need the re-supply them to the child packages, but in SSIS, it is not necessary, because the child package will read input parameters through configuration of main package.
It is good practice if you break your large package into smaller ones, there is issue of memory (message : OutOfMemory exception) when you develop your SSIS package too big (let’s more than 10 MB). So to address this issue, the workaround is break the big SSIS package into smaller ones, and get arranged by main SSIS package, the Execute Package Task comes into play for this scenario.
Sometimes, your package is too big and some of the tasks are reusable. You can break your big package down into smaller packages so that you can reuse your smaller package in other SSIS projects. SSIS supports this requirement by provide Execute Package task. Let’s examine more detail what configuration that the task offers.
The important configuration is located on Package section on list box of Execute Package Task Editor window, some properties are important to get familiarize with.
First is Location properties, you need to know where your package is save to, in SQL Server or in File System. When it is in SQL Server, the package is stored into msdb database. If it is in File System, you need to specify where it is located in file system.
Specify password property, when the called package is in encrypted form for security purpose.
Other property is ExecuteOutOfProcess. By default the value is false. When it is true, the referenced package will be executed in separate process and memory space.
How about setting up input parameters on child package. In DTS, you need the re-supply them to the child packages, but in SSIS, it is not necessary, because the child package will read input parameters through configuration of main package.
It is good practice if you break your large package into smaller ones, there is issue of memory (message : OutOfMemory exception) when you develop your SSIS package too big (let’s more than 10 MB). So to address this issue, the workaround is break the big SSIS package into smaller ones, and get arranged by main SSIS package, the Execute Package Task comes into play for this scenario.
File Transfer Activity in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
Task for file activities through FTP (File Transfer Protocol) has been introduced in DTS. With major improvement in SSIS, the task has even more functionality than ever before.
What is difference functionality between the task in DTS and SSIS? In DTS, you can only receive file through FTP, but in SSIS, not only you can receive, but also you can send, delete local or remote file. Here is complete function available in FTP task on SSIS : send/receive file, create/remove local/remote directory, delete local/remote files.
The task need connection to FTP type, so you need to create FTP type connection manager. In FTP Connection Manager Editor window, you specify FTP server name, by default FTP port is 21, but if your FTP is secure FTP, the port is usually 22. If your FTP requires any connection get authenticated before doing any file operation, specify it in UserName and Password in Credentials section. If you FTP uses passive mode, check it on Options section.
After you define FTP connection manager, you can create FTP task in Control Flow Designer. In FTP Task Editor window, select the FTP-based connection manager on FTPConnection property.
On File Transfer section, you will see similar properties with those of File System Tasks. For Local Parameters property group in FTP is similar to Source property group in File System task. And For Remote Parameters property group in FTP is similar to Destination property group in File System task. One unique property that attracts my attention is IsTransferAscii property, by default the value is false. Most of FTP clients the default is true, if your FTP file operates in Ascii transfer mode, adjust it to avoid content inconsistency when your files get transferred between local and remote connection.
With SSIS, Your package has much of FTP functionalities in file/directory operations ever than before.
Task for file activities through FTP (File Transfer Protocol) has been introduced in DTS. With major improvement in SSIS, the task has even more functionality than ever before.
What is difference functionality between the task in DTS and SSIS? In DTS, you can only receive file through FTP, but in SSIS, not only you can receive, but also you can send, delete local or remote file. Here is complete function available in FTP task on SSIS : send/receive file, create/remove local/remote directory, delete local/remote files.
The task need connection to FTP type, so you need to create FTP type connection manager. In FTP Connection Manager Editor window, you specify FTP server name, by default FTP port is 21, but if your FTP is secure FTP, the port is usually 22. If your FTP requires any connection get authenticated before doing any file operation, specify it in UserName and Password in Credentials section. If you FTP uses passive mode, check it on Options section.
After you define FTP connection manager, you can create FTP task in Control Flow Designer. In FTP Task Editor window, select the FTP-based connection manager on FTPConnection property.
On File Transfer section, you will see similar properties with those of File System Tasks. For Local Parameters property group in FTP is similar to Source property group in File System task. And For Remote Parameters property group in FTP is similar to Destination property group in File System task. One unique property that attracts my attention is IsTransferAscii property, by default the value is false. Most of FTP clients the default is true, if your FTP file operates in Ascii transfer mode, adjust it to avoid content inconsistency when your files get transferred between local and remote connection.
With SSIS, Your package has much of FTP functionalities in file/directory operations ever than before.
Sunday, November 4, 2007
Precedence Constraint in SSIS
Precedence Constraint in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
If you ever used earlier version of SSIS, DTS, you usually use precedence constraint to connect from one task to another task, there are 3 kinds of precedence constraints, success, failure and completion to enable next task get executed. An improvement made by SSIS for this constraint.
There is addition besides constraint value (success, failure and completion), also another functionality that can evaluate some value dynamically before allowing next task get executed. The functionality is possible through conditional expression on precedence constraint. And you can combine between precedence constraint value and precedence constraint expression with OR/AND operator.
For example, if you want ExecuteSQLTask for executing send mail task if ExecuteSQL task has failed executed or no records returned from ExecuteSQL Task, you can configure precedence constraint to precedence constraint value OR precedence constraint expression. So there are 4 options for you based on business requirement :
1. Constraint
It is default setting, the value is success, failure, or completion
2. Expression
This is where you specify condition that should be satisfied before move to next task
3. ExpressionAndConstraint
The combination that must fulfil constraint made by both Constraint and Expression
4. ExpressionOrConstraint
The combination that should fulfil constraint made either by Contraint or by Expression.
I think this functionality makes package flow behaviour get executed more dynamically.
By : Kasim Wirama, MCDBA, MVP SQL Server
If you ever used earlier version of SSIS, DTS, you usually use precedence constraint to connect from one task to another task, there are 3 kinds of precedence constraints, success, failure and completion to enable next task get executed. An improvement made by SSIS for this constraint.
There is addition besides constraint value (success, failure and completion), also another functionality that can evaluate some value dynamically before allowing next task get executed. The functionality is possible through conditional expression on precedence constraint. And you can combine between precedence constraint value and precedence constraint expression with OR/AND operator.
For example, if you want ExecuteSQLTask for executing send mail task if ExecuteSQL task has failed executed or no records returned from ExecuteSQL Task, you can configure precedence constraint to precedence constraint value OR precedence constraint expression. So there are 4 options for you based on business requirement :
1. Constraint
It is default setting, the value is success, failure, or completion
2. Expression
This is where you specify condition that should be satisfied before move to next task
3. ExpressionAndConstraint
The combination that must fulfil constraint made by both Constraint and Expression
4. ExpressionOrConstraint
The combination that should fulfil constraint made either by Contraint or by Expression.
I think this functionality makes package flow behaviour get executed more dynamically.
Calling Other Executable in SSIS
Calling Other Executable in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
Sometimes, you need to call other executable through SSIS, let say, you write custom business logic in executable, and you need SSIS to call the executable (out of process), bottom line here that you want the module is reusable without rewrite same business logic in SSIS. With the component that wraps the executable inside SSIS, you just focus on the executable, so easy and efficient results you can get with the component.
The component is called ExecuteProcessTask. The component in SSIS is enhanced compared to previous version of this in DTS. You can write error information to a variable and you can redirect output from the command file to a variable.
Some variable here you should to configure here.
RequireFullFileName property should be set to true if you require your package to know exactly the location of the executable.
Executable property should specify executable name, for example notepad.exe, it is better if you specify fullfilename.
Arguments property contains list of arguments that you want to pass into Executable property.
Working Directory property contains input for location from which you executable is executed from.
StandardInputVariable property contains values as input arguments for the executable.
StandardOutputVariable property will keep execution output into variable.
StandardErrorVariable property will keep execution error information into variable.
If you set true on FailTaskIfReturnCodeIsNotSuccessValue property, it will see value on SuccessValue property, if return value is same with value of SuccessValue property defined, so it is considered success, otherwise failed.
Timeout property defines in seconds how long the executable is expected to finish execution. Value 0 means wait indefinitely. This property works with TerminateProcessAfterTimeOut property.
WindowStyle property contains value Normal, Maximized, Minimized and Hidden. If you would like to avoid interaction the executable to end user, set it Hidden.
By : Kasim Wirama, MCDBA, MVP SQL Server
Sometimes, you need to call other executable through SSIS, let say, you write custom business logic in executable, and you need SSIS to call the executable (out of process), bottom line here that you want the module is reusable without rewrite same business logic in SSIS. With the component that wraps the executable inside SSIS, you just focus on the executable, so easy and efficient results you can get with the component.
The component is called ExecuteProcessTask. The component in SSIS is enhanced compared to previous version of this in DTS. You can write error information to a variable and you can redirect output from the command file to a variable.
Some variable here you should to configure here.
RequireFullFileName property should be set to true if you require your package to know exactly the location of the executable.
Executable property should specify executable name, for example notepad.exe, it is better if you specify fullfilename.
Arguments property contains list of arguments that you want to pass into Executable property.
Working Directory property contains input for location from which you executable is executed from.
StandardInputVariable property contains values as input arguments for the executable.
StandardOutputVariable property will keep execution output into variable.
StandardErrorVariable property will keep execution error information into variable.
If you set true on FailTaskIfReturnCodeIsNotSuccessValue property, it will see value on SuccessValue property, if return value is same with value of SuccessValue property defined, so it is considered success, otherwise failed.
Timeout property defines in seconds how long the executable is expected to finish execution. Value 0 means wait indefinitely. This property works with TerminateProcessAfterTimeOut property.
WindowStyle property contains value Normal, Maximized, Minimized and Hidden. If you would like to avoid interaction the executable to end user, set it Hidden.
Tuesday, October 30, 2007
Checkpointing in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
If you ever developed sophisticated DTS package, and there is another error in one of the task, usually you have to reexecute the package from beginning. For complex package and big data that get involved in ETL process, it will consume considerably amount of time wasted. In SSIS, you have option not to start package from beginning in case the package gets error. The feature in SSIS to address this issue is checkpoint.
Checkpoint will note execution information to a file, when the task in a package gets failure to execute. When you correct the task, and re-execute the package, the beginning of execution will start at the failed task not the first task of a package. The package will read execution in the file, and resume executing the task and the rest. Once it is successfully executed, the package will remove the file. Nice feature that help you a lot when you develop a sophisticated package.
To enable checkpointing of a package, you define package’s properties in Checkpoints section, you specify Checkpointfilename, set CheckPointUsage to IfExists or always, and set to true for SaveCheckpoint. Then you set FailPackageOnFailure property of a task to true, otherwise the package will not create checkpoint file if it fails.
To simulate the problem, create 3 execute SQL Task package, set second package property name ForceExecutionResult to failure, and run the package. If you notice the output window, package status will have failure status, then you will find checkpoint file that you specify on Checkpointfilename property. Let’s say you have corrected the task, for demonstration purpose, set ForceExecutionResult property of second task from Failure to success. Run the package again, this time you notice that package will be executed from the second task (failed task) rather than first task. Once it is successfully executed, the checkpoint file will no longer exist, because the package deletes the checkpoint file that no longer useful.
Checkpoint feature in SSIS is brand new feature that worth to be implemented, to make your SSIS package is more reliable and manageable.
If you ever developed sophisticated DTS package, and there is another error in one of the task, usually you have to reexecute the package from beginning. For complex package and big data that get involved in ETL process, it will consume considerably amount of time wasted. In SSIS, you have option not to start package from beginning in case the package gets error. The feature in SSIS to address this issue is checkpoint.
Checkpoint will note execution information to a file, when the task in a package gets failure to execute. When you correct the task, and re-execute the package, the beginning of execution will start at the failed task not the first task of a package. The package will read execution in the file, and resume executing the task and the rest. Once it is successfully executed, the package will remove the file. Nice feature that help you a lot when you develop a sophisticated package.
To enable checkpointing of a package, you define package’s properties in Checkpoints section, you specify Checkpointfilename, set CheckPointUsage to IfExists or always, and set to true for SaveCheckpoint. Then you set FailPackageOnFailure property of a task to true, otherwise the package will not create checkpoint file if it fails.
To simulate the problem, create 3 execute SQL Task package, set second package property name ForceExecutionResult to failure, and run the package. If you notice the output window, package status will have failure status, then you will find checkpoint file that you specify on Checkpointfilename property. Let’s say you have corrected the task, for demonstration purpose, set ForceExecutionResult property of second task from Failure to success. Run the package again, this time you notice that package will be executed from the second task (failed task) rather than first task. Once it is successfully executed, the checkpoint file will no longer exist, because the package deletes the checkpoint file that no longer useful.
Checkpoint feature in SSIS is brand new feature that worth to be implemented, to make your SSIS package is more reliable and manageable.
Sunday, October 28, 2007
Managing Transaction in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
SSIS supports transactions as you usually use in TSQL, there are 2 kinds of transactions in SSIS :
1. Distributed Transaction
This transaction uses 2 phase commit transaction with Distributed Transaction Coordinator (DTC) service. Make sure this service runs before you run your package that uses the service. You don’t have to explicitly state begin tran…commit/rollback tran, because they are handled automatically by DTC service
2. Native SQL Server engine transaction
This transaction uses explicit BEGIN TRAN/COMMIT/ROLLBACK TRAN, explicitly in Execute SQL Task component. You must setting up the involved connection manager’s RetainSameConnection property from false to true.
For first alternative, there are 3 options for TransactionOption properties :
1. NotSupported
The component will not join to transaction scope.
2. Supported
The component will join parent’s transaction scope if exists
3. Required
The component will join parent’s transaction scope if exists, otherwise it will start new component.
Let’s start by create 3 Execute SQL Tasks that refers to same connection manager. Point to AdventureWorks, and you can create dummy table called test with 1 character column. Put there 3 each of these 3 statements into each of Execute SQL Tasks.
Create table dbo.dummy (cola varchar(10) not null);
Insert into dbo.dummy values (‘a’);
Drop table dbo.dummy;
Set up property TransactionOption from Supported (default) to Required for package level. Make sure DTC service already runs, otherwise, the package will fail.
Run your package, and there will be a table created and populated in AdventureWorks database.
For second alternative, you don’t have dependency to DTC service. DTC service is used when you want span transaction over more than 1 connection, tasks even more than 1 package, otherwise, it is simple to use native transaction TSQL statement.
SSIS supports transactions as you usually use in TSQL, there are 2 kinds of transactions in SSIS :
1. Distributed Transaction
This transaction uses 2 phase commit transaction with Distributed Transaction Coordinator (DTC) service. Make sure this service runs before you run your package that uses the service. You don’t have to explicitly state begin tran…commit/rollback tran, because they are handled automatically by DTC service
2. Native SQL Server engine transaction
This transaction uses explicit BEGIN TRAN/COMMIT/ROLLBACK TRAN, explicitly in Execute SQL Task component. You must setting up the involved connection manager’s RetainSameConnection property from false to true.
For first alternative, there are 3 options for TransactionOption properties :
1. NotSupported
The component will not join to transaction scope.
2. Supported
The component will join parent’s transaction scope if exists
3. Required
The component will join parent’s transaction scope if exists, otherwise it will start new component.
Let’s start by create 3 Execute SQL Tasks that refers to same connection manager. Point to AdventureWorks, and you can create dummy table called test with 1 character column. Put there 3 each of these 3 statements into each of Execute SQL Tasks.
Create table dbo.dummy (cola varchar(10) not null);
Insert into dbo.dummy values (‘a’);
Drop table dbo.dummy;
Set up property TransactionOption from Supported (default) to Required for package level. Make sure DTC service already runs, otherwise, the package will fail.
Run your package, and there will be a table created and populated in AdventureWorks database.
For second alternative, you don’t have dependency to DTC service. DTC service is used when you want span transaction over more than 1 connection, tasks even more than 1 package, otherwise, it is simple to use native transaction TSQL statement.
Managing One Temporary Table in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
In T-SQL, you must have used temporary table to keep temporary resultset. Temporary table is unique per connection, so if there is 2 users creating connection to create temporary table, even though the name is same, but two of them doesn’t interact each other. How it becomes possible? In engine level, SQL Server create temporary table with suffix with connection id.
If you create temporary table in SSIS, you must refers it to the same connection manager, otherwise SSIS will consider the temporary table is created and manipulated under different connection id.
You can create connection manager, pointing to tempdb database. Now I would like to do temporary table creation , insert new record, and drop the table, each of them is put on separate Execute SQL Task.
If you run the package, it will display red background in second task, the error message is : "Invalid object name '’.". It seems that SSIS consider the connection of the second task is different from the connection of the first task, even though both tasks refers to the same connection manager.
You should see property of the connection manager called RetainSameConnection. The default value is false, meaning that SSIS will create another connection session. To just make current connection of one task is shared to other connection session of previous task, you can change the default value from false to true.
Run you package again. This time, SSIS recognises connection session among three packages refers to one connection session.
In T-SQL, you must have used temporary table to keep temporary resultset. Temporary table is unique per connection, so if there is 2 users creating connection to create temporary table, even though the name is same, but two of them doesn’t interact each other. How it becomes possible? In engine level, SQL Server create temporary table with suffix with connection id.
If you create temporary table in SSIS, you must refers it to the same connection manager, otherwise SSIS will consider the temporary table is created and manipulated under different connection id.
You can create connection manager, pointing to tempdb database. Now I would like to do temporary table creation , insert new record, and drop the table, each of them is put on separate Execute SQL Task.
If you run the package, it will display red background in second task, the error message is : "Invalid object name '
You should see property of the connection manager called RetainSameConnection. The default value is false, meaning that SSIS will create another connection session. To just make current connection of one task is shared to other connection session of previous task, you can change the default value from false to true.
Run you package again. This time, SSIS recognises connection session among three packages refers to one connection session.
Setting Up dynamic property value at SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
SSIS offers more flexibility and debugging feature that doesn’t exists in previous ETL version (DTS). This article I will let you know one of property of each component or task that makes your SSIS package more flexible through Expression Editor and more configurable ever than before. Even you can change your parameter value from XML file.
For example, I would like to iterate each files in particular folder, and I make this particular folder is configurable, so I have change input parameter from outside SSIS package, input parameter will be got from outside SSIS environment for example, from XML file or through command line.
Let’s jump into new SSIS package for making it clear for you. First, you define package level type SSIS variable. Name it DirectoryName and FullFileName, and their types are string.
Then you drag For Loop Each container component, and configure its property through Foreach Loop Editor. At Collection section, you define whatever string inside Folder textbox, and make sure Fully Qualified radio button at Retrieve File Name section is selected. Now the dynamic part is located in Expressions of the component. Click on it to display Property Expression Editor. On Property column select Directory and on Expression click ellipsis button to display Expression Builder, and drag variable DirectoryName from Variables list box to Expression textbox, you can click Evaluate Expression button to make sure before you go back to Expression Editor window by clicking OK. Now you can change DirectoryName not directly into Foreach Loop component, but from variable.
The task has output of full file name (with folder path), and you need to provide variable that will receive return string value from the task. Let’s go to Variable Mappings, and set Variable column to variable FileName, and click OK to go back to Control Flow designer. Drag Script Task into Foreach Loop Container task, and open its editor, on Script section specify FileName in ReadOnlyVariables because I just want to pop up its value through message box. Click on design script button to open another new Visual Studio IDE, and type MsgBox(Dts.Variables(“User::FileName”). Value.ToString()), close it.
Now you fill in DirectoryName variable, that you want to pop up all files inside the directory. Run the package. And it will pop up file names whose files reside in the directory.
Now you need to expand your DirectoryName variable to outside SSIS package, so each time you don’t have to edit your SSIS when directory name changes. Choose SSIS menu and select Package Configurations. Click Add button. Choose XML configuration file type and specify its name, on next screen you choose Value property of DirectoryName.
Now you can specify command line that calls the package as below:
DTEXEC /File “your dts file path” /CONFIGFILE “your xml config file path” /MAXCONCURRENT “-1” /CHECKPOINTING OFF /REPORTING N
With command line specification above, you can achieve same behaviour as it is in Visual Studio IDE. Your package parameter doesn’t tightly coupled with its package by extracting out to XML file, so you can change the value in XML file without having to change to package.
SSIS offers more flexibility and debugging feature that doesn’t exists in previous ETL version (DTS). This article I will let you know one of property of each component or task that makes your SSIS package more flexible through Expression Editor and more configurable ever than before. Even you can change your parameter value from XML file.
For example, I would like to iterate each files in particular folder, and I make this particular folder is configurable, so I have change input parameter from outside SSIS package, input parameter will be got from outside SSIS environment for example, from XML file or through command line.
Let’s jump into new SSIS package for making it clear for you. First, you define package level type SSIS variable. Name it DirectoryName and FullFileName, and their types are string.
Then you drag For Loop Each container component, and configure its property through Foreach Loop Editor. At Collection section, you define whatever string inside Folder textbox, and make sure Fully Qualified radio button at Retrieve File Name section is selected. Now the dynamic part is located in Expressions of the component. Click on it to display Property Expression Editor. On Property column select Directory and on Expression click ellipsis button to display Expression Builder, and drag variable DirectoryName from Variables list box to Expression textbox, you can click Evaluate Expression button to make sure before you go back to Expression Editor window by clicking OK. Now you can change DirectoryName not directly into Foreach Loop component, but from variable.
The task has output of full file name (with folder path), and you need to provide variable that will receive return string value from the task. Let’s go to Variable Mappings, and set Variable column to variable FileName, and click OK to go back to Control Flow designer. Drag Script Task into Foreach Loop Container task, and open its editor, on Script section specify FileName in ReadOnlyVariables because I just want to pop up its value through message box. Click on design script button to open another new Visual Studio IDE, and type MsgBox(Dts.Variables(“User::FileName”). Value.ToString()), close it.
Now you fill in DirectoryName variable, that you want to pop up all files inside the directory. Run the package. And it will pop up file names whose files reside in the directory.
Now you need to expand your DirectoryName variable to outside SSIS package, so each time you don’t have to edit your SSIS when directory name changes. Choose SSIS menu and select Package Configurations. Click Add button. Choose XML configuration file type and specify its name, on next screen you choose Value property of DirectoryName.
Now you can specify command line that calls the package as below:
DTEXEC /File “your dts file path” /CONFIGFILE “your xml config file path” /MAXCONCURRENT “-1” /CHECKPOINTING OFF /REPORTING N
With command line specification above, you can achieve same behaviour as it is in Visual Studio IDE. Your package parameter doesn’t tightly coupled with its package by extracting out to XML file, so you can change the value in XML file without having to change to package.
Thursday, October 25, 2007
Slowly Changing Dimension in SSIS
Slowly Changing Dimension in SSIS
By: Kasim Wirama, MCDBA, MVP SQL Server
Another SSIS component that is interesting to examine is Slowly Changing Dimension. Term of SCD (slowly changing dimension) is used to describe changing row attribute in dimension or lookup table, such as change of product price. Changing of these row attributes has some implications of table design, depending whether business user want to keep its history or just override the destination the cell value.
In this case, there are 3 solutions depending on how scenario business users would like to choose:
1. do not keep historical data
it’s simple to be implemented, but users couldn’t see historical value, because old value is overridden by new value
2. keep historical data
users can see its historical value, but table will grow bigger, and transaction table that refers to lookup table should be updated its foreign key reference to lookup table (SCD table)
3. keep historical data in one column
users can see last history value, but you will implement by your own code because the component do not provide for this kind of solution.
Let’s take a look below sample. I create simple table called dbo.EmployeeSalary that keeps salaries of each employee. You can take scripts below :
create table dbo.EmployeeSalary
(
EmployeeID int not null identity(1,1) primary key,
EmployeeNumber char(3) not null ,
EmployeeName varchar(10) not null,
SalaryPerMonth int not null,
[Address] varchar(15) not null,
SalaryStatus varchar(10) not null default ('current')
);
Now, there are some changes, first change is to raise salary of John from 3500 to 4000, second one is to change smiley address from garden road to Sudirman road. Users want to keep history of salary changes but override address changes. So for this case, you will examine how to use this component. But here is the text file that contains data changes, named it list.txt
empnumber empname salary permonth address
001 john 4000 orchad road
002 smiley 3000 sudirman road
Create SSIS package, add Flat file source and SCD component to Data Flow designer, connect two of them, point the Flat File source component to the text file (list.txt), change column type of salary permonth from string to integer. On SCD wizard, you create destination OLEDB point to table dbo.EmployeeSalary, and setting up address (Input column) to Address (Dimension column) as Not a key column, similar to empname, salary per month, and business key for empnumber because it uniquely identifies each instance of employee from business user perspective.
Next, you specify SalaryPerMonth as HistoricalAttribute and Address as ChangingAttribute.
On HistoricalAttributeOptions, select SalaryStatus on Columns to indicate current record, Current at Value When Current, and Expired at Expiration Value.
When you complete the wizard, run the package. If the package runs successfully, you will get 3 rows at dbo.EmployeeSalary. With expired row on salary changes belonged to John and Smiley’s updated address.
If you see final result of the package, SCD component will expand to several component, so it will make implementation of SCD much more easier in SSIS world.
By: Kasim Wirama, MCDBA, MVP SQL Server
Another SSIS component that is interesting to examine is Slowly Changing Dimension. Term of SCD (slowly changing dimension) is used to describe changing row attribute in dimension or lookup table, such as change of product price. Changing of these row attributes has some implications of table design, depending whether business user want to keep its history or just override the destination the cell value.
In this case, there are 3 solutions depending on how scenario business users would like to choose:
1. do not keep historical data
it’s simple to be implemented, but users couldn’t see historical value, because old value is overridden by new value
2. keep historical data
users can see its historical value, but table will grow bigger, and transaction table that refers to lookup table should be updated its foreign key reference to lookup table (SCD table)
3. keep historical data in one column
users can see last history value, but you will implement by your own code because the component do not provide for this kind of solution.
Let’s take a look below sample. I create simple table called dbo.EmployeeSalary that keeps salaries of each employee. You can take scripts below :
create table dbo.EmployeeSalary
(
EmployeeID int not null identity(1,1) primary key,
EmployeeNumber char(3) not null ,
EmployeeName varchar(10) not null,
SalaryPerMonth int not null,
[Address] varchar(15) not null,
SalaryStatus varchar(10) not null default ('current')
);
Now, there are some changes, first change is to raise salary of John from 3500 to 4000, second one is to change smiley address from garden road to Sudirman road. Users want to keep history of salary changes but override address changes. So for this case, you will examine how to use this component. But here is the text file that contains data changes, named it list.txt
empnumber empname salary permonth address
001 john 4000 orchad road
002 smiley 3000 sudirman road
Create SSIS package, add Flat file source and SCD component to Data Flow designer, connect two of them, point the Flat File source component to the text file (list.txt), change column type of salary permonth from string to integer. On SCD wizard, you create destination OLEDB point to table dbo.EmployeeSalary, and setting up address (Input column) to Address (Dimension column) as Not a key column, similar to empname, salary per month, and business key for empnumber because it uniquely identifies each instance of employee from business user perspective.
Next, you specify SalaryPerMonth as HistoricalAttribute and Address as ChangingAttribute.
On HistoricalAttributeOptions, select SalaryStatus on Columns to indicate current record, Current at Value When Current, and Expired at Expiration Value.
When you complete the wizard, run the package. If the package runs successfully, you will get 3 rows at dbo.EmployeeSalary. With expired row on salary changes belonged to John and Smiley’s updated address.
If you see final result of the package, SCD component will expand to several component, so it will make implementation of SCD much more easier in SSIS world.
Wednesday, October 24, 2007
Unpivot columns in SSIS
Unpivot columns in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
If you see my previous article about pivoting data using SSIS component called Pivot, you will agree with me that it is easy to use, all you need to do is to define relational query that in turns will be converted to the pivot format. Now here is the vice versa condition, you are given table in pivot format, and there is a requirement that need to convert from pivot format to unpivot format. SSIS provides the component to make it possible and easy to use, the component name is Unpivot component.
Some properties of the component that you must configure are :
1. input columns
columns that are needed as input column stream
2. destination column
a column name that you will unpivot from pivoted columns to unpivoted rows
3. pivot key value
Unique name for each pivot columns, usually same with input columns.
4. pivot key value column name
New column name in unpivoted form that contains caption name of pivoted columns.
Let’s look pivoted form at AdventureWorks database, and you query select statement from view called [Sales].[vSalespersonsalesbyfiscalyears]. You notice that it displays sales amount for each sales person from 2002 until 2004.
And final results of unpivoted as follows :
SalesYear AmountSales SalesPersonID FullName Title SalesTerritory
2002 1951086.8256 275 Michael G Blythe Sales Rep Northeast
2003 4743906.8935 275 Michael G Blythe Sales Rep Northeast
2004 4557045.0459 275 Michael G Blythe Sales Rep Northeast
2002 2800029.1538 276 Linda C Mitchell Sales Rep Southwest
2003 4647225.4431 276 Linda C Mitchell Sales Rep Southwest
2004 5200475.2311 276 Linda C Mitchell Sales Rep Southwest
…..so forth
SalesYear and AmountSales are new columns in unpivoted format. Pivot key values are pivoted column (2002-2004). Pivot key value column name is caption name of pivot column name that is mapped to SalesYear unpivot column name. Destination columns are AmountSales. Input columns are all columns including pivoted columns.
You can see how it works by starting new packages, and dragging OLEDB source component, OLEDB destination component, and Unpivot component. Connect each of them from Source – Unpivot – Destination, and specify query for the view at source component. If you open transformation editor of unpivot component, you will see 4 properties as I mention above, configure it. Configure columns of salespersonid, fullname, title, salesterritory as pass through column, the rest (pivoted columns) as input columns.
Run your package, and the final results similar to what I display above.
With fewer steps of configurations, you will do unpivot data extremely easily.
By : Kasim Wirama, MCDBA, MVP SQL Server
If you see my previous article about pivoting data using SSIS component called Pivot, you will agree with me that it is easy to use, all you need to do is to define relational query that in turns will be converted to the pivot format. Now here is the vice versa condition, you are given table in pivot format, and there is a requirement that need to convert from pivot format to unpivot format. SSIS provides the component to make it possible and easy to use, the component name is Unpivot component.
Some properties of the component that you must configure are :
1. input columns
columns that are needed as input column stream
2. destination column
a column name that you will unpivot from pivoted columns to unpivoted rows
3. pivot key value
Unique name for each pivot columns, usually same with input columns.
4. pivot key value column name
New column name in unpivoted form that contains caption name of pivoted columns.
Let’s look pivoted form at AdventureWorks database, and you query select statement from view called [Sales].[vSalespersonsalesbyfiscalyears]. You notice that it displays sales amount for each sales person from 2002 until 2004.
And final results of unpivoted as follows :
SalesYear AmountSales SalesPersonID FullName Title SalesTerritory
2002 1951086.8256 275 Michael G Blythe Sales Rep Northeast
2003 4743906.8935 275 Michael G Blythe Sales Rep Northeast
2004 4557045.0459 275 Michael G Blythe Sales Rep Northeast
2002 2800029.1538 276 Linda C Mitchell Sales Rep Southwest
2003 4647225.4431 276 Linda C Mitchell Sales Rep Southwest
2004 5200475.2311 276 Linda C Mitchell Sales Rep Southwest
…..so forth
SalesYear and AmountSales are new columns in unpivoted format. Pivot key values are pivoted column (2002-2004). Pivot key value column name is caption name of pivot column name that is mapped to SalesYear unpivot column name. Destination columns are AmountSales. Input columns are all columns including pivoted columns.
You can see how it works by starting new packages, and dragging OLEDB source component, OLEDB destination component, and Unpivot component. Connect each of them from Source – Unpivot – Destination, and specify query for the view at source component. If you open transformation editor of unpivot component, you will see 4 properties as I mention above, configure it. Configure columns of salespersonid, fullname, title, salesterritory as pass through column, the rest (pivoted columns) as input columns.
Run your package, and the final results similar to what I display above.
With fewer steps of configurations, you will do unpivot data extremely easily.
Thursday, October 18, 2007
Implementing Conditional Task Execution with SSIS
By: Kasim Wirama, MCDBA, MVP SQL Server
If you have ever developed DTS package, you must be familiar with Precedence Constraint (Success, Failure, Completion) move from one task to another task. Some enhancements for these Precedence Constraint is added with conditional that you can apply with expression such as @var1 > 5, that should be evaluated before move execution from one task to another task.
In SSIS, you can define execution condition not only by constraint (success or failure) but also by conditional expression (<, >, <= , >= , <>) and mixed (constraint and expression) with AND and OR operator.
Let say, you have 2 tasks at Control Flow Designer, first task is Data Flow Tasks, second task is Execute SQL Tasks, you want second task is executed when first tasks contains records and it is executed without error.
You can achieve this scenario by using RowCount Component that counts number of rows returned from previous tasks. This component requires a variable that saves value of number of rows.
You can try it out, by make new package. Make OLEDB connection at connection manager area. Specify OLEDB source as SQL Server , database AdventureWorks.
Next you create SSIS variables, name it Var1 with Int32 type. And drag Data Flow Tasks into Control Flow designer area, then double click to switch to Data Flow designer. Drag OLE DB source, specify connection manager from OLEDB connection that you have just created, and point to table AWBuildVersion.
Drag RowCount component, connect OLEDB Source’s green arrow to it, and then double click RowCount. At Component Properties tab, assign Var1 variable to VariableName property, close the window by clicking OK button.
Switch back to Control Flow designer, drag Execute SQL task from toolbox to below Data Flow tasks, and connect Data Flow tasks to Execute SQL task. Edit Execute SQL task, and assign ConnectionType property as OLEDB, Connection property as OLEDB connection, and SQLStatement as select ‘dummy’. Edit precedence constraint (green arrow between 2 tasks), and in Precedence Constraint Editor choose Expression and Constraint, fill out Expression entry with @Var1 > 1 (assume number of rows in AWBuildVersion table is 1, so the expected behaviour is Execute SQL Task has been never got executed because it doesn’t pass from conditional expression @Var1 > 1).
Close Precedence Constraint Editor by click OK button. And you execute package now, for first time second task doesn’t get executed because number of rows is 1
End package execution, edit the precedence constraint, this time you change Expression to @Var1 <=1, execute the package again. This time you see that second task gets executed. The usage of RowCount component comes into play for this scenario.
SSIS gives you more choice by providing Expression criteria, besides constraint (success, failure, completion) that must be fulfilled by next task before allowing the task gets executed and combination between constraint and expression.
If you have ever developed DTS package, you must be familiar with Precedence Constraint (Success, Failure, Completion) move from one task to another task. Some enhancements for these Precedence Constraint is added with conditional that you can apply with expression such as @var1 > 5, that should be evaluated before move execution from one task to another task.
In SSIS, you can define execution condition not only by constraint (success or failure) but also by conditional expression (<, >, <= , >= , <>) and mixed (constraint and expression) with AND and OR operator.
Let say, you have 2 tasks at Control Flow Designer, first task is Data Flow Tasks, second task is Execute SQL Tasks, you want second task is executed when first tasks contains records and it is executed without error.
You can achieve this scenario by using RowCount Component that counts number of rows returned from previous tasks. This component requires a variable that saves value of number of rows.
You can try it out, by make new package. Make OLEDB connection at connection manager area. Specify OLEDB source as SQL Server , database AdventureWorks.
Next you create SSIS variables, name it Var1 with Int32 type. And drag Data Flow Tasks into Control Flow designer area, then double click to switch to Data Flow designer. Drag OLE DB source, specify connection manager from OLEDB connection that you have just created, and point to table AWBuildVersion.
Drag RowCount component, connect OLEDB Source’s green arrow to it, and then double click RowCount. At Component Properties tab, assign Var1 variable to VariableName property, close the window by clicking OK button.
Switch back to Control Flow designer, drag Execute SQL task from toolbox to below Data Flow tasks, and connect Data Flow tasks to Execute SQL task. Edit Execute SQL task, and assign ConnectionType property as OLEDB, Connection property as OLEDB connection, and SQLStatement as select ‘dummy’. Edit precedence constraint (green arrow between 2 tasks), and in Precedence Constraint Editor choose Expression and Constraint, fill out Expression entry with @Var1 > 1 (assume number of rows in AWBuildVersion table is 1, so the expected behaviour is Execute SQL Task has been never got executed because it doesn’t pass from conditional expression @Var1 > 1).
Close Precedence Constraint Editor by click OK button. And you execute package now, for first time second task doesn’t get executed because number of rows is 1
End package execution, edit the precedence constraint, this time you change Expression to @Var1 <=1, execute the package again. This time you see that second task gets executed. The usage of RowCount component comes into play for this scenario.
SSIS gives you more choice by providing Expression criteria, besides constraint (success, failure, completion) that must be fulfilled by next task before allowing the task gets executed and combination between constraint and expression.
Tuesday, October 16, 2007
Pivoting data with SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
SSIS provides other interesting component for data pivoting. It is called Pivot component. It will automate pivot requirement. There are some mandatory properties that you should set up to make it display your data correctly.
1. Pivot Column
Is a column that you would like to change it from rows to columns.
2. Row Column
Is a column that comes from rows in a column.
3. Value column
Is an aggregated value based on row column.
For example, you will display total quantity for first 3 months of one year sales quantity for each product. In this case you will specify month as pivot column, content of month as Row column, and total quantity as Value column, as query below :
Select
p.[name],
p.productnumber,
datepart(mm,th.transactiondate) as month,
sum(th.quantity) as total
from production.product as p
inner join production.transactionhistory as th
on p.productid = th.productid
where th.transactiondate between '20040101' and '20040301'
group by p.[name],p.productnumber,datepart(mm,th.transactiondate)
order by p.[name],p.productnumber,datepart(mm,th.transactiondate)
You can try to use it by dragging OLEDB destination and Pivot component to Data Flow designer and connect OLEDB to Pivot component.
In Input tab of Pivot component, you check all columns, and in Input and Output Properties tab, you can see all input columns that you specify in second tab. Now you specify PivotUsage property for Name column of input columns to 0 (row), productNumber column to 1 (key row), month column to 2 (pivot column), total column to 3 (pivot value).
Expand Pivot Default tab, and add output column by clicking Add Column button. Name it ProductName, set Source Column property to LineageID of Name column (input column). Add new column, name it ProductNumber, set its Source Column property to LineageID of ProductNumber column (input column).
Add new output column, name it January, set Pivot Key Value to 1, and Source Column to LineageID of total column (input column), similar to new columns for February and March, the only difference is setting up their PivotKeyValue properties.
Last step is add RecordSet destination, and connect Pivot component to it, add variable with object type and assign It to VariableName property of first tab, check all columns at the second tab.
You run the package. You will see month columns at row level becomes column names.
SSIS provides other interesting component for data pivoting. It is called Pivot component. It will automate pivot requirement. There are some mandatory properties that you should set up to make it display your data correctly.
1. Pivot Column
Is a column that you would like to change it from rows to columns.
2. Row Column
Is a column that comes from rows in a column.
3. Value column
Is an aggregated value based on row column.
For example, you will display total quantity for first 3 months of one year sales quantity for each product. In this case you will specify month as pivot column, content of month as Row column, and total quantity as Value column, as query below :
Select
p.[name],
p.productnumber,
datepart(mm,th.transactiondate) as month,
sum(th.quantity) as total
from production.product as p
inner join production.transactionhistory as th
on p.productid = th.productid
where th.transactiondate between '20040101' and '20040301'
group by p.[name],p.productnumber,datepart(mm,th.transactiondate)
order by p.[name],p.productnumber,datepart(mm,th.transactiondate)
You can try to use it by dragging OLEDB destination and Pivot component to Data Flow designer and connect OLEDB to Pivot component.
In Input tab of Pivot component, you check all columns, and in Input and Output Properties tab, you can see all input columns that you specify in second tab. Now you specify PivotUsage property for Name column of input columns to 0 (row), productNumber column to 1 (key row), month column to 2 (pivot column), total column to 3 (pivot value).
Expand Pivot Default tab, and add output column by clicking Add Column button. Name it ProductName, set Source Column property to LineageID of Name column (input column). Add new column, name it ProductNumber, set its Source Column property to LineageID of ProductNumber column (input column).
Add new output column, name it January, set Pivot Key Value to 1, and Source Column to LineageID of total column (input column), similar to new columns for February and March, the only difference is setting up their PivotKeyValue properties.
Last step is add RecordSet destination, and connect Pivot component to it, add variable with object type and assign It to VariableName property of first tab, check all columns at the second tab.
You run the package. You will see month columns at row level becomes column names.
Referring To Your Reference Table with Fuzzy Lookup
Referring To Your Reference Table with Fuzzy Lookup
By: Kasim Wirama, MCDBA, MVP SQL Server
One of problem of ETL world is referencing non existent data. For example when one column of fact table that is foreign key to a dimension. It is usually caused by bad data in the fact table. So how to handle it more appropriately in cleansing data with less effort. The answer is that using Fuzzy Lookup. This component will help most of painful process in dealing with data cleansing.
This component provides 2 important informations besides cleansed data:
1. Similarity
Is a number between 0 and 1 representing similarity. You can define how close the similarity is. The close the number to 1, the closer similarity should be met.
2. Confidence
Is a number between 0 and 1 representing confidence of match relative to set of matched results.
I will show you how to use this component.
First you make a flat file named employeelist.txt, its content is as below, separated with tab among columns.
EMPNAME EMPTITLE
HOWARD GEN MANAGER
MATTHEW GENERAL MANAGER
LINDA GENERAL MGR
SUSAN JUNIOR MANAGER
ERIC JUNIOR MGR
JOHN JUN MANAGER
Next you created below script in tempdb database, a lookup table :
create table dbo.OccupationTitle
(
OccupationTitleID int not null,
OccupationTitleDesc varchar(30) not null
);
go
insert into dbo.OccupationTitle values (1,'GENERAL MANAGER');
insert into dbo.OccupationTitle values (2,'JUNIOR MANAGER');
Now, you create SSIS package with Visual Studio 2005, more details how to create this, just refer to early links about SSIS.
Drag Data Flow task into Control Flow designer, double click on the task to switch to Data Flow designer. Drag Flat File Source, double click on it, create new Flat File Connection Manager by clicking New button. And create your flat file connection manager as previous article I ever write before, with Column Names in the first data row checked.
Drag Fuzzy Lookup and connect to it. Double click on it, specify OLEDB connection manager and select reference table that you have just created on previous step
On columns tab connect arrow between EMPTITLE and Occupation TitleDesc, specify input column to EMPTITLE, Lookup column to OccupationTitleDesc, Mapping Type to Fuzzy, and check all comparison Flag check box. And check all column on Available Lookup Column and Available Input Column.
On Advanced tab, you can define your own Similarity threshold, if you have no idea, specify it to 0.5.
Drag Recordset Destination, connect Fuzzy Lookup component to it. Create variable with type Object. Double click on Recordset Destination, specify VariableName property to the variable. On Available Input Columns, check all columns. And click OK.
If you want to see contenct of Recordset Destination, you can add Data Viewer component by right clicking arrow between Recordset Destination and Fuzzy Lookup. Now you can run the package.
You can compare original data and cleansed data between EMPTITLE column and OccupationTitleDesc column, and Fuzzy Lookup will find its OccupationTitleID.
What a useful component for data cleansing.
By: Kasim Wirama, MCDBA, MVP SQL Server
One of problem of ETL world is referencing non existent data. For example when one column of fact table that is foreign key to a dimension. It is usually caused by bad data in the fact table. So how to handle it more appropriately in cleansing data with less effort. The answer is that using Fuzzy Lookup. This component will help most of painful process in dealing with data cleansing.
This component provides 2 important informations besides cleansed data:
1. Similarity
Is a number between 0 and 1 representing similarity. You can define how close the similarity is. The close the number to 1, the closer similarity should be met.
2. Confidence
Is a number between 0 and 1 representing confidence of match relative to set of matched results.
I will show you how to use this component.
First you make a flat file named employeelist.txt, its content is as below, separated with tab among columns.
EMPNAME EMPTITLE
HOWARD GEN MANAGER
MATTHEW GENERAL MANAGER
LINDA GENERAL MGR
SUSAN JUNIOR MANAGER
ERIC JUNIOR MGR
JOHN JUN MANAGER
Next you created below script in tempdb database, a lookup table :
create table dbo.OccupationTitle
(
OccupationTitleID int not null,
OccupationTitleDesc varchar(30) not null
);
go
insert into dbo.OccupationTitle values (1,'GENERAL MANAGER');
insert into dbo.OccupationTitle values (2,'JUNIOR MANAGER');
Now, you create SSIS package with Visual Studio 2005, more details how to create this, just refer to early links about SSIS.
Drag Data Flow task into Control Flow designer, double click on the task to switch to Data Flow designer. Drag Flat File Source, double click on it, create new Flat File Connection Manager by clicking New button. And create your flat file connection manager as previous article I ever write before, with Column Names in the first data row checked.
Drag Fuzzy Lookup and connect to it. Double click on it, specify OLEDB connection manager and select reference table that you have just created on previous step
On columns tab connect arrow between EMPTITLE and Occupation TitleDesc, specify input column to EMPTITLE, Lookup column to OccupationTitleDesc, Mapping Type to Fuzzy, and check all comparison Flag check box. And check all column on Available Lookup Column and Available Input Column.
On Advanced tab, you can define your own Similarity threshold, if you have no idea, specify it to 0.5.
Drag Recordset Destination, connect Fuzzy Lookup component to it. Create variable with type Object. Double click on Recordset Destination, specify VariableName property to the variable. On Available Input Columns, check all columns. And click OK.
If you want to see contenct of Recordset Destination, you can add Data Viewer component by right clicking arrow between Recordset Destination and Fuzzy Lookup. Now you can run the package.
You can compare original data and cleansed data between EMPTITLE column and OccupationTitleDesc column, and Fuzzy Lookup will find its OccupationTitleID.
What a useful component for data cleansing.
Subscribe to:
Posts (Atom)