Feeds:
Posts
Comments

Archive for March, 2021

I have a data mart that is processed incrementally everyday. The number of records that take part in the incremental load of the data mart are stored into a table (say the table name is payroll_incremental) within the SQL Server. A tabular model has also been built for this data mart, however, it takes from 45 mins to an hour to process the whole tabular model each day (i.e. process mode set to full). This is not good so my goal is to also incrementally process the tabular model.

First of all, you would need to partition the tabular model. You can do it in SSMS or SSDT. Select the table within the tabular model and then right click to bring up a pop up menu. From the menu, select “Partitions”. Within the “Partitions” menu, select the “New” icon to create a new partition.

In my example, I have created 2 partitions i.e. the “new updated or deleted” and “unchanged” partitions.

Query for the “new updated or deleted” partition.

I have reused the payroll_incremental table and join it (INNER JOIN) to the query that populates the tabular model. This will give me the number of records that are new, have changed or been deleted).

Query for the “unchanged” partition:

I have also reused the payroll_incremental table but this time LEFT JOIN it to the query that populates the tabular model and make sure that the records returned do not exist in the payroll_incremental table. This will give me the number of records that have not changed. This partition only needs to be processed ONCE or NEVER if the model has been processed previously.

Next, we need to look at how the partitions can be processed and which mode(s) to use.

There are 5 modes available to process the tabular model and/or table partitions. They are defined below according to Microsoft.

1. Process Default: Loads data into unprocessed partitions or tables. Hierarchies, relationships, and calculated columns for affected tables are updated.

2. Process Full: Loads data into all partitions or tables. Hierarchies, relationships, and calculated columns for all tables are updated.

3. Process Data: Loads only the data into a partition or table.

4. Process Clear: Removes all data from a partition, table or database.

5. Process Add: Incrementally loads data into the partition or table.

In order to fully understand what each process exactly does, I tried each one of them on my tabular model.

Process Add by definition says that it incrementally loads data into the partition or table. In fact, it only adds data into the partition. This may cause data duplication.

Process Clear by definition removes all data from a partition, table or database. In fact, what it does is it not only clears all data from the partition but also delete the hierarchies, relationships from the tables.

Process Data by definition loads only the data into a partition or table. Indeed, it does that. However, if you do a Process Clear followed by Process Data, you won’t be able to browse the tabular model afterwards because the hierarchies and relationships have been removed in the Process Clear.

Process Full by definition loads data into all partitions or tables. Hierarchies, relationships, and calculated columns for all tables are updated. Indeed, it does only that and it works.

Process Default by definition loads data into unprocessed partitions or tables. Hierarchies, relationships, and calculated columns for affected tables are updated. I am careful with the “unprocessed” word and would not use this mode as I cannot think of any case where the partition stays unprocessed.

After trying all the 5 modes above, to incrementally process the tabular model successfully, you would need to clear the data in the partition first (Process Clear) and then load the data into the partition afterwards (Process Full). It is because you cannot modify records in the model once they are loaded.

You can script out each process and then use the scripts in setting up the SQL job. You would need to create 2 job steps (with the type of SQL Server Analysis Services Command) i.e. Process Clear and followed by Process Full. Test the model by comparing the row count or sum of the measure with the source. Hope this blog helps.

{
“refresh”: {
“type”: “clearValues”,
“objects”: [
{
“database”: “Payroll”,
“table”: “Payroll”,
“partition”: “New Updated or Deleted”
}
]
}
}

{
“refresh”: {
“type”: “full”,
“objects”: [
{
“database”: “Payroll”,
“table”: “Payroll”,
“partition”: “New Updated or Deleted”
}
]
}
}

UPDATE (17Aug2021)

The above solution is based on processing the delta partition every day and the unchanged partition every weekend. This solution has not worked because the delta may exist in the unchanged partition and as the result, the delta may not be updated everyday.

The solution that has worked is to partition the table in the tabular model based on whether the data is in the current financial year or not i.e. “Current Financial Year” partition and “Not in Current Financial Year” partition. There is no need to hard code the year; which is better.

So you link your fact table with the Date dimension and set the filter to the current financial year. This partition will be processed daily. The “Not in Current Financial Year” partition will be processed every Sunday.

UPDATE (06Sep2021)

All of the other tables in the tabular model would also need to be reprocessed each day.

Read Full Post »