Slowly Changing Dimensions Concept and SCD Type 1 Implementation in Azure Cloud Data Warehouse

Amitava Nandi
6 min readSep 1, 2020

--

Objective: First, understand the concept of Slowly Changing Dimensions in detail. Then, we will move into the SCD type 1 design and implementation in Azure Cloud using Azure Data Factory.

Prerequisites: Before moving into the Slowly Changing Dimensions, we need to understand the following few things.
1. Dimensional model in data warehouse: Basically, the dimensional model is used by many OLAP systems. It helps to represent the data more logically using facts and dimensions which will improve the data retrial process.
2. Facts & Dimensions: Facts are associated with business measurement, amount, or event which are typically numeric, continuously valued. Dimensions are context surrounding facts- who, when, where, and under what conditions the fact should be measured which may be discrete or numeric valued.
3. Fact & Dimension Tables: Fact tables store a set of business measured values that are used for calculations. Dimension tables store all the analyzed attribute values. The fact table’s values get displayed in the business reports and those reports become meaningful using Dimension table’s value. In dimensional modeling the central fact tables are surrounded by dimension tables.

Slowly changing dimension is a process to handle the dimensional data where data can change slowly over time rather than regular bases or fixed time interval. e.g. Suppose, the price of a product will change after sometime or a customer will buy a new house and changed his address etc.
In data warehouse environment, there may be a requirement to keep track of the change in dimension values which will be used to create a report at any given point of time using the historical data. Based on the business requirements, we will decide whether we will store the historical data of the dimension table or not. If business asked to keep the historical data, then how we will store them. These are manged by implementing SCD using
various approaches.
Type 0: Always constant.
Type 1 : No need to store historical data, overwrite using new data.
Type 2 : Creating a new additional record with some active flag column.
Type 3 : New values are updated but the old values still remain valid as the second option into an another column.
Type 4 : Using a separate history table to maintain the historical data.
Type 6 : Combination of type 1, 2 and 3.
Type 1, Type 2 and Type 4 are most popular . So, this article will help you to understand the SCD Type 1 in detail with Azure Data Factory implementation.

Slowly Changing Dimensions Type 1 : If there is a change in existing value of the dimensional attributes, then the existing value will be overwritten by the new value which is basically a update kind of thing. SCD Type 1 is not keep the historical data, so it is easy to maintain.

Scenario: In a ETL or Data Loading process, we will load the data from source to the destination. So, for this purpose considering a Product Dimension table(product_dim). For this process, we should use a staging table to copy the same data from source, then we will load them into the target table after deriving three new columns, which helps us to identify whether a record has added or updated and when it it added or updated in target system:
1. Surrogate Key: Each row in a dimension table should be identified by an unique surrogate key value.
2. Added Date: When a particular record has been added.
3. Updated Date: When a particular record has been updated by new value.
The Process will be look like following example:

fig: Data Loading 1st & 2nd Run Using SCD Type 1

Process Design: Please go through the following Diagram.

fig: Flow diagram of SCD Type 1

SCD Type 1 Process Development Process using Azure Data Factory:
1. First, create two Azure SQL DB resources: a. Source DB and b: Target DB
2. Create the Azure Data Factory resource.
3. Create one table in the source DB: Product table.
4. Create two tables in target DB: Product Dim Staging and Product Dim table.
5. Insert few data into the source DB product table.
6. Go to the Azure Data Factory > Manage > Linked services > New and create two new linked services one for source and another for target.

fig: Linked Services

7. Then, create three data set (Azure Data Factory > Author > Datasets > New dataset), one for source table, one for staging table and another for target table.

fig: Data Sets

8. Go to the Azure Data Factory > Author > Pipeline > New Pipeline
9. First, create a copy activity to copy the data from source product table to staging product dim table. Here, we use the below query to select the data from source product table:
select * from [schema].[configuration table name] where Sync_flag=0;
10. Then, create a data flow activity to implement the whole logic of SCD type 1 to copy data from staging to target product dim table.
11. After loading the data into product dim table, we will update the Sync_flag as 1 in the source product table.

fig: Pipeline

12. Within the data flow activity, we should implement the actual SCD type 1 logic.

fig: Data flow activity transformations

13. First, we take 3 source within the data flow activity:
a. Staging source: for the staging product dim table.
b. Target source: for the target product dim table.
c. Surrogate Key source: to get the max value of the surrogate key column from the target product dim table.
14. Then, create two parallel path from Staging source using Exists transformation:
a. Update record path: Join the stage and target table using primary key column and select only matching records from there.
b. Insert record path: Join the stage and target table using primary key column and select only non matching records from the staging table.
15. For the insert record path, we use the lookup transformation on Surrogate Key source and create a max_surrogate_key column, then using the surrogate key transformation we will populate value for each row starting from 1 in s_key column, then using derived column transformation create the surrogate_key column using the following logic:
surrogate_key = max_surrogate_key + s_key
16. Then, add the current timestamp into added_date column and insert the data into the target product_dim table using sink transformation allow insert method.
17. For the update record path, we will use lookup transformation on Target source to get the existing value of surrogate_key and the added_date column value. Then, add the current timestamp into updated_date column using derived column transformation and use the alter transformation to update the record into the target product_dim table via sink transformation allow update method.

Congrats! You just learned how to design and develop the slowly changing dimensions type 1 in Azure cloud platform using Azure Data Factory. Hope, this article also helps you to understand the Slowly Changing Dimensions concept.
You can follow the above steps to build the SCD type 1 process — for more reference please visit to my git repository of SCD_Type1. You can also ask your doubt in the conversation section of this article.

Next article — Slow Changing Dimension Type 2 and Type 4 Concept and Implementation.

Thank you for reading this article.
visit to the
https://github.com/CoolexxAmitava/Full_Or_Incremental_Load
**********git repository: Azure Data Factory implementation************
Visit next article about SCD Type 2 & SCD Type 4 :
https://medium.com/@apu.nandi88/slow-changing-dimension-type-2-and-type-4-concept-and-implementation-398c8dec7030

--

--

Amitava Nandi
Amitava Nandi

Written by Amitava Nandi

Experienced IT Professional, Expertise in Azure Data Engineering, Big Data Solution, Data Analysis, Python, Spark, SQL