Dimensional modeling is one of the key concepts in data warehouse design. In its essence, it is a collection of techniques used to structure database tables. The primary benefit of using dimensional modeling is simplicity, optimized query performance, and faster data retrieval. It is mostly used for business reporting, owing to its simple structure comprising of denormalized data. Moreover, dimensional modeling is adaptable and scalable as per the changes in businesses; it is often as simple as adding a new column or creating a new table.
Dimensional modeling techniques are broadly classified into:
- Basic fact table techniques
- Basic dimension table techniques
- Integration vs. conformed dimensions
- Slowly changing dimension techniques
- Dimension hierarchy techniques
- Advanced fact table techniques
- Advanced dimensional table techniques
- Special purpose schemas
In this article, we are going to take a closer look at advanced dimensional table techniques.
Advanced Dimensional Table Techniques
A dimensional table is an integral part of the star schema of the data warehouse. It can be termed as a reference point that stores attributes, dimensions which are used to define elements of a fact table. Advanced dimensional table techniques are used to model, retrieve, and modify data inside a dimensional table. Some of the salient techniques under this category are as follows:
The intrinsic nature of text comments renders them ambiguous and casts doubt about their modeling. The debate is whether text comments should be modeled in a text fact or a separate dimension? However, the most agreed upon and efficient method is storing text comments outside the fact table in a separate dimension. There should be a foreign key in the fact table corresponding to it.
Multivalued Dimensions and Bridge Tables
In an ideal dimensional schema, a fact table only supports a single dimension with a single value relative to the fact table’s grain. But there are occasions when a fact table has to acquiesce to dimension that has multiple values. This throws the entire fact and dimension relationship out of balance as well as the established primary key – foreign key relationship between the dimension table and fact table. A good example of this is a joint bank account which has multiple account holders. The balance for such an account can’t be stored in the fact table with a customer ID as dimension as there are multiple customers (account holders). This is where a bridge table comes in. As evident from the name, a bridge table is nestled between a fact table and a dimension table and resolves many-to-many relationships (multiple account holders as stated in the above example) between fact and dimension.
Late Arriving Dimensions
Fact tables should have foreign keys pointing back to related dimensions, this is a fact but there are scenarios where fact table is populated or known before the associated dimension record. The scenario is termed as late arriving dimension or early arriving fact. An example of this would be a new employee getting enrolled in an organization’s medical insurance program. The employee is entitled to claim insurance from day one but all the requisite paperwork might take weeks to complete and hence hand over to the insurance company will be delayed. In the meantime, if the employee avails the insurance, the records will be classified as fact records but will not have the corresponding patient dimension details as the paperwork hasn’t arrived yet.
Looking for a rich information repository about data warehouse architecture updated regularly and maintained by experts in their field? Click here
To tackle this situation “inferred member” is used. This entails a multipronged approach where a new record and a new column is added to the dimension table. If we consider the above-cited example and incorporate it into an illustration from the insurance company’s point of view, it would look something like this:
Customer ID 521 allocated to the employee whose medical expenses have been received.
|Customer ID||Medical Expenses|
Since paperwork for the employee has not been received yet, the customer ID will not exist in the customer dimension table and hence its foreign key will also not be available in the fact table. To handle this, a new record has been added to the customer dimension table where the customer ID 521 is placed and a surrogate ID of 17 is allotted to it. This surrogate value of 17 is going to be placed in the fact table as a foreign key. Another column titled “inferred” is added to the customer dimension with a bit value of “1” which essentially means that this is an “inferred field”.
|FK_ Customer||Medical Expenses|
This approach satisfies the fact and dimension table relationship requirement, and allows the accommodation of the late arriving dimension. Later on when the record gets updated and the source system provides the value of the attribute such as name, it is updated and the inferred field bit is changed to 0 as shown below:
Multiple Time Zones
Databases have to be accommodating of universal standard time as well as local time zones. Trouble arises when there are multiple local time zones. Modeling such data requires a simple technique which is the placement of dual foreign keys in affected fact tables that join two relevant dates dimensional table.
Advanced dimensional modeling has other facets which make life easier, and data more comprehensive and readable. If you want to build an effective BI strategy for your business, get in touch with our data architects to ensure that you get the best value out of your investment.