Slow Changing Dimension Type 2 and Type 4 Concept and Implementation

Amitava Nandi
4 min readOct 9, 2020

--

This article helps you to understand the concept of Slow Changing Dimension Type 2 and Type 4. Here, you can also get idea about the implementation of SCD Type 2 & Type 4 using process diagram. The implementation for both the processes using Azure Data Factory are also shared at the end of this article. Please, go through the Slowly Changing Dimensions Concept first before reading this article.

While copying data from a source system to a target system, we can load all the date from source to target table for the first time. After that, we have to follow some process to maintain the new data insertion and update existing data in the target system.
Slowly Changing Dimensions Type 2: This type of SCD will insert all the records into the target system and mark the old records as inactive and all the records(new or old) should be present in the target table. We should add few new columns into our target table. Those columns will help us to maintain the historical change management and ensure optimal use of the active record.
· Surrogate Key: The original ID will no longer be sufficient to identify the specific record. So, it is required to create a new ID/Key so that we can join specific records with another table.
· Active Flag: To identify current state of each record, active[0] or inactive[1].
· Start Date: The date from which the specific historical version is active.
· End Date: The date from which the specific historical version is inactive.
SCD Type2 can easily maintain the history for the entire record and can perform change-over-time analysis. However, it also comes with more maintenance overhead, increased storage requirement and potential performance impacts if used on very large table.

fig: Data Loading 1st & 2nd Run Using SCD Type 2

Slowly Changing Dimensions Type 4: The idea behind the SCD Type 4 is almost same as SCD Type 2. Here, we will also maintain the historical data but not in the same table. We will create a new history table to keep the old records over there. Separating the historical data makes our dimensions smaller and therefore reduces complexity and improves performance if the majority of uses only need the current value. SCD type 4 provides a solution to handle the rapid changes in the dimension tables.

fig: Data Loading 1st & 2nd Run Using SCD Type 4

When we have dimension attributes which changes very frequently, the dimension grow very rapidly causing considerable performance and maintenance issues if we are keeping historical data into the same table using SCD Type 2. In such cases, SCD Type 4 gives us the advantage by storing the historical data into an another table.

Process Design: Please go through the following Process Diagrams to understand the steps and stages to implement SCD Type2 and SCD Type4.

fig: Flow diagram of SCD Type 2
fig: Flow diagram of SCD Type 4

SCD Type 2 Pipeline and Data Flow Activity using Azure Data Factory:
1. Select and Copy the data from Source table to Staging table where sync flag=0.
2. Load the new records into the Target table from the Staging table after adding or deriving the surrogate key column as (MAX value from Target table + 1) to (MAX value from Target table + Count of New records) and start date as current timestamp.
3. Left semi join between Target table (where active flag=0)and Staging table using primary key/business key to select the common records where surrogate key column values are less than (MAX value from Target table + 1).
4. Update those common records in Target table — set end date as current timestamp and active flag=1 (inactive).

fig: Data flow activity transformations for SCD Type 2

SCD Type 4 Pipeline and Data Flow Activity using Azure Data Factory:
First 3 steps of SCD Type 4 are same as SCD Type 2. The only change is no need to add/derive active flag in step 2. After those above 3 steps, the 4th step is following.
4. Move those common records(selected after doing left semi join) from Target table to History table after adding/deriving the end date as current timestamp.

fig: Data flow activity transformations for SCD Type 4

Congrats! You have just learned the concepts of slowly changing dimensions Type 2 and the slowly changing dimensions Type 4. Hope, the process/flow diagrams will also help you to develop the SCD Type 2 and SCD Type 4.
You can follow the above steps to build the SCD Type 2 and SCD Type 4 Data Flow Activity using Azure Data Factory — for more reference please visit to my git repository of SCD Type 2 and SCD Type 4. You can also ask your doubt in the conversation section of this article.

--

--

Amitava Nandi

Experienced IT Professional, Expertise in Azure Data Engineering, Big Data Solution, Data Analysis, Python, Spark, SQL