When building business intelligence applications using the enterprise data warehouse or production databases as the source, attribute changes need to be handled with considerable care. The way you model your dimensions and facts depends on how you want changes to be reflected in your applications, or whether your business requires change history for reporting at all.
The handling of dimensions that change slowly and irregularly over time, known as Slowly Changing Dimensions (SCD), is especially challenging. Maintaining history on Slowly Changing Dimensions requires that you model your dimensions to capture every state of data and associate it with timestamps, version, or current status for correct identification of the most recent, down to the original attribute copy. Preserving this information allows a business to create reports, for example, the impact of sales generated by a salesperson when he was transferred to 3 different locations across a period of two years. Since a sales reps’ location is an attribute that does not change frequently, the associated dimension must be modeled to maintain history for specified attributes.
In this blog, we will take a closer look at the possible implementations of Slowly Changing Dimensions in data warehouses, the different approaches taken to maintain SCD history, the effort involved, and how businesses can automatically set up SCDs.
What are Slowly Changing Dimensions?
As the name says, Slowly Changing Dimensions refer to dimensions that change slowly over time and lack a fixed schedule for changes. Territory or Product Name are examples of dimension attributes that are unlikely to change frequently and thus may be specified as an SCD if the business wants to record the history of changes for these attributes.
Types of Slowly Changing Dimensions in Data Warehousing
To illustrate the concept better, we will be using a hypothetical data warehouse scenario throughout this blog, where the business must track historical changes in the Product dimension.
One of the most important visionaries in the data warehousing space, Ralph Kimball, places SCDs into 3 primary types, claiming that every instance of time variation in data warehouses can be handled using the aforementioned 3 types, or a combination of the initial 3 types. If you’re using or planning to use any modern data warehouse tool, like Microsoft SSIS or Astera’s data warehouse automation tool, you will find an SCD component that allows you to specify which type of SCD you want to use on the fields that need to be tracked for changes over time.
Let’s take a look:
Overwriting Dimension Fields: SCD Type 1
Rather than tracking history, this type of SCD simply replaces the old value with the new one, effectively overwriting the field. While this is the simplest change in your dimensions, record of the previous value is lost with SCD Type 1.
Using an SCD Type 1 makes sense when:
- The change is a correction rather than an actual change. For example, Product Category was erroneous and had to be updated before the product went to market.
- A conscious choice is made not to track history when the business does not see any strategic advantage of maintaining history for a particular field.
Add New Row: SCD Type 2
In SCD Type 2, whenever a change is made in tracked fields, a new row (or record) is created. This record contains the same elements as the original record, but the field in which the change is made is updated while the other fields remain the same.
If you’re using SCD Type 2, your database table will need to be dropped and recreated to add a few necessary columns, containing information that allows the business to track which record is the most recent one. Columns that are usually used for this purpose are:
- Begin Date
- End Date
- Current Status
For example, when Product Price is changed in the Product dimension table, a new record is created with the new Price. The ‘End Date’ column is updated in the original record, while the same date is updated in the ‘Begin Date’ column in the new record. ‘Current Status’ and ‘Version’ is updated in the same manner. Any of these parameters can then be used to identify the latest record, depending on how historical data is being queried.
SCD Type 2 is the most commonly used method to track historical changes in data warehousing.
Add New Column: SCD Type 3
Rather than adding a row, in SCD Type 3, a new column is added to the table that contains the previous value. By default, the Previous Value column will contain only the most recent historical value. If a business requires further historical data for past changes to be maintained, a new column must be created for every past value. The amount of database manipulation this requires is generally not feasible for a business, and as a result, the application of SCD Type 3 is limited.
Implementing SCDs in a Data Warehouse
The idea behind building a data warehouse is to make reporting and analysis easier and faster across your business. Implementing SCDs, while important, may require significant involvement of your development and business analysis team. In certain situations, your business may be faced with the requirement of building a hybrid SCD, like the SCD Type 6, which is a combination of SCDs Type 1, 2, and 3 (1+2+3 = Type 6). This would require additional development related work and rethinking of ETL processes.
An emerging method to deal with this challenge is to use data warehouse automation solutions that automatically build SCD configurations on the backend for the business. Users simply need to choose the options and SCD type they want to implement on a particular field, and they’re ready to go.
For more information on how you can take this route, please contact us for a free consultation.