
Archive for October, 2013

I had a package called Load Dimensions.dtsx that loads dimension data incrementally. The package contains 26 data flow tasks: 14 of them use the SSIS built-in Slowly Changing Dimension (SCD) component and the rest use Kimball's Dimension Merge Slowly Changing Dimension 2008. As you may know, the built-in SCD is not recommended when a dimension has more than about 9,000 records, hence the idea of using Kimball's SCD instead. I noticed that if the data flow tasks using the Kimball component are excluded from the package, the package runs with no problems, so I concluded that the problem lies with the Kimball component itself. To replace it, I used a hash function to build a hash key in my data source view and then compared the hash key between source and destination. If the hash key in the destination is NULL, it is a new record; if the hash keys in the source and destination differ, it is an updated record.
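The hash-key comparison can be sketched in T-SQL along the following lines. This is only a sketch: the view, table and column names are hypothetical, and HASHBYTES with MD5 is just one common way to build the hash key.

```sql
-- Build a hash key over the tracked attributes in the source view
-- (hypothetical names; adapt to your own dimension).
SELECT
    src.CustomerCode,
    HASHBYTES('MD5',
        CONCAT(src.CustomerName, '|', src.City, '|', src.Segment)) AS HashKey
FROM dbo.vw_dim_Customer_Source AS src;

-- New records: no matching destination row, so the destination hash key is NULL.
SELECT src.CustomerCode
FROM dbo.vw_dim_Customer_Source AS src
LEFT JOIN dbo.dim_Customer AS dst
    ON dst.CustomerCode = src.CustomerCode
WHERE dst.HashKey IS NULL;

-- Updated records: matching destination row but a different hash key.
SELECT src.CustomerCode
FROM dbo.vw_dim_Customer_Source AS src
INNER JOIN dbo.dim_Customer AS dst
    ON dst.CustomerCode = src.CustomerCode
WHERE dst.HashKey <> HASHBYTES('MD5',
        CONCAT(src.CustomerName, '|', src.City, '|', src.Segment));
```

Whatever hash function you use, be sure to concatenate the columns with a delimiter as above; otherwise distinct value combinations (e.g. 'ab' + 'c' vs 'a' + 'bc') can produce the same hash input.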

BEFORE:

Kimball

AFTER:

SCD using Hashkey



I am using Kimball’s Dimension Merge Slowly Changing Dimension component to populate dimensions incrementally. It was working fine, but all of a sudden it threw an internal error complaining that some columns had NULL values, even though there are no NULL values in the source data. Google turned up a blog post with a solution to this problem: delete the Dimension Merge Slowly Changing Dimension component and create it again. This did in fact resolve my problem. 🙂 I have no idea why it happened, though; as far as I can recall, all I had done was add a few columns to the source.


I have billing data that comes from three different billing item sources: Referral, Bed Cards and Miscellaneous. Each source has its own last modified date, so instead of breaking the SSIS package into three parts, it is more efficient to use a Foreach Loop Container.

First of all, create a variable called BillingItemSourceRecords and set its data type to Object. Then create a data flow task that gets the list of billing item sources and writes its output to a Recordset Destination. Configure the Recordset Destination so that the output is stored in the BillingItemSourceRecords variable.
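The source query behind that data flow task can be as simple as the following. This is a sketch: the view and column names are taken from the fact query later in this post, but your actual source may differ.

```sql
-- Return one row per billing item source for the Foreach ADO Enumerator,
-- e.g. Referral, Bed Cards and Miscellaneous.
SELECT DISTINCT ItemSource
FROM [DataMart].[vw_fact_Billing]
ORDER BY ItemSource;
```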

Get Billing Item Source Records

RecordSet Destination Configuration

Secondly, create another variable called BillingItemSource and set its data type to String. Then drag and drop a Foreach Loop Container into the Control Flow. Configure it so that its Enumerator is set to Foreach ADO Enumerator, its ADO object source variable is set to User::BillingItemSourceRecords and its Enumeration mode is set to “Rows in the first table”. Set the variable mappings to use User::BillingItemSource with an Index of 0.

Foreach Loop Container Collection Configuration

Foreach Loop Container Variable Mappings Configuration

Thirdly, move all the tasks in the Control Flow into the Foreach Loop Container. Then modify the “Get Last Modified Date” Execute SQL Task to add a parameter mapping for User::BillingItemSource, with “Input” as the direction, “VARCHAR” as the data type, 0 as the parameter name and 50 as the parameter size. Store its result set in the User::LastModifiedDate variable, with “LastModifiedDate” as the result name.

Get Last Modified Date Parameter Mapping configuration

Get Last Modified Date ResultSet configuration
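The statement inside that Execute SQL Task might look something like the following sketch. The watermark table name is an assumption (it is not shown in the original package); parameter 0 maps to User::BillingItemSource, and the single-row result is stored in User::LastModifiedDate.

```sql
-- One watermark row per billing item source (hypothetical table name).
SELECT MAX(LastModifiedDate) AS LastModifiedDate
FROM [DataMart].[etl_LastModifiedDate]
WHERE ItemSource = ?;
```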

Fourthly, modify the OLE DB Source in the “Load data into fact_Billing” data flow task so that it retrieves the billing data based on the item source and the last modified date. Map the first parameter to the User::LastModifiedDate variable and the second to the User::BillingItemSource variable. See the example below:

SELECT *
FROM [DataMart].[vw_fact_Billing]
WHERE COALESCE(ModifiedDate, ActivityDate) > ?
AND ItemSource = ?

OLEDB Source Set Query Parameters

Screenshots before and after:

BEFORE:

Load Fact_Billing BEFORE

AFTER:

Load Fact_Billing AFTER


I have a fact table containing measures with decimal data types such as decimal(5,2). When these measures are brought into the OLAP cube, they get rounded. The data types of these measures in the cube are “Inherited”. Changing the data type from “Inherited” to “Double” did not work: cube processing failed, saying that the data type must match the data type in the source. To resolve this, the measure data type in the fact table of the data warehouse needs to be changed to float.
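Assuming the measure column is called Amount (a hypothetical name for illustration), the change in the data warehouse is a one-line ALTER, after which the cube measure's data type can be set to Double:

```sql
-- decimal(5,2) gets rounded when inherited by the cube;
-- float on the relational side matches the cube's Double type.
ALTER TABLE [DataMart].[fact_Billing]
ALTER COLUMN Amount float;
```

Note that float is an approximate numeric type, so this trades exact decimal storage for compatibility with the cube's Double data type.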

 

