Business intelligence and database teams have a range of options when conceptualizing their data warehouse design. You could go with either the Kimball or Inmon route and decide whether to build your data marts first and then consolidate them to establish the data warehouse, or the other way around. Logical data warehouses are also a popular option, where data lies in separate repositories, referenced and queried from a single logical layer that makes it look like you’re accessing a central repository. Then there’s the Enterprise Data Warehouse (EDW), where you bring operational databases together while referencing some external sources as well to create a single source of truth.
Regardless of which option you choose, the architectural layers remain the same:
- Source Layer
- Staging Layer
- Storage Layer
- Presentation Layer
Source Layer
This is where the source data sits, within internal and external enterprise applications and systems. Key data sources for your data warehouse are the relational databases that form the storage backbone of your enterprise systems.
For instance, if you are using a Customer Relationship Management (CRM) software, whether it’s cloud or on-premise, a backend database will contain application data in a tabular format. This database could be one among the many data sources for your data warehouse. It’s the same for your inventory software, cloud applications, point-of-sale systems, or even social media networks – the methods of access differ though. Data contained in these relational databases is called structured data.
With data being generated in overwhelming amounts today, unstructured data too must be considered as a source for analysis in your data warehouse. You could either use data extraction software to convert unstructured documents into structured and then copy this data to your repository or use a logical layer to include unstructured data as an external source which would be queried from within the enterprise data warehouse. Unstructured data is typically not analyzed due to difficulties in loading it, but are valuable silos of information.
Staging Layer
We have talked about data sources. The next step is to copy source data to your data warehouse. But first, you must make all of this data consistent, converting it into a format that is ideal for reporting from a single repository. That’s where the staging layer comes in, which includes everything between the source systems and the data warehouse.
Each of your enterprise systems will store data in a specific format. There’s even more variance when you’re adding external data sources, where you have no control over how data is stored. The staging layer allows you to extract data from all the different source systems, so you can then clean, validate, apply business rules, perform mathematical operations, and otherwise transform your data to organize it for loading into your data warehouse. To do so, you will require two components in your staging layer: a staging database to act as intermediary storage where you integrate and transform source data and an Extract-Transform-Load tool to allow data manipulation. In some approaches, data is also normalized and modeled at this stage. If you’re building a logical data warehouse, staging will be done on run-time, when required data is loaded into a cache.
The staging layer is critical to ensure the success of your organization’s business intelligence initiatives. If you load poor, inconsistent data directly into your data warehouse, the resulting reports will make less sense, impacting the quality of decision-making at a strategic level.
Storage Layer
Once you have integrated your source data and cleaned it in the staging database, it’s time to move this data to your data warehouse, which will become its final destination. Depending on how you design your data warehouse, this destination could be a single, consolidated database, individual data marts, or operational data stores. When queries are run across your data warehouse, required data will be accessed from the storage layer.
Presentation Layer
For all practical purposes, the presentation layer can also be called the data warehouse. After all, this is the layer with which users interact to fulfill their business intelligence requirements. Queries are run at this layer to perform analysis, and to facilitate that, you could use a graphical user interface to visualize query results.
A popular alternative today is to integrate specialized business intelligence tools, like Tableau, with your presentation layer. Users could create dashboards within Tableau, integrate their data warehouse with the platform, and customize the workflow to automatically populate Tableau dashboards with values in predefined records or tables as per business requirements.
Deciding on Your Data Warehouse Architecture
Now that you know about the four basic architectural layers of a data warehouse, you need to think about how you’ll employ these layers. In addition, you should determine what tools you’ll need to complete your data warehouse design in a way that best meets your organization’s reporting requirements. Ensure that your planned data warehouse architecture encompasses both your internal and external sources, while ideally including both structured and unstructured data to provide all-inclusive decision support.
Each layer will require you to work with a number of different tools. You may need a data integration tool, a separate tool for specialized data transformations, a data modeling tool, business intelligence tools for visual representation, etc. List down the tools you’ll need and develop a process around their collaborative usage to meet your data warehousing requirements. Alternatively, consider a data warehouse automation platform that houses every specialized tool and functionality, allowing you to build your repository faster.