Data warehouse architecture has been changing for some time now. The traditional on-premise deployment model is giving way to cloud deployment, and cloud-based data warehouses differ from traditional on-premise ones in a variety of ways. In this article, we take a detailed look at the two architectures, compare and contrast them, and at the end consider which one is the better fit for a given set of requirements.
Many organizations are transitioning to cloud-based data warehouses because of the following major advantages they offer:
- No need to buy expensive, hard-to-maintain physical hardware.
- Cloud-based data warehouses are quicker to set up and scale easily with the growing needs of an organization.
- The use of massively parallel processing (MPP) helps cloud-based data warehouse architectures run complex analytical queries much faster.
The emergence of cloud computing over the past few years has dramatically impacted data warehouse architecture, leading to the popularity of Data Warehouse-as-a-Service (DWaaS). Let us first have a brief look at how the traditional architecture is laid out.
Traditional Data Warehouse
The traditional data warehouse architecture is implemented as an on-premise solution. Organizations running their own on-site data warehouse must manage the underlying infrastructure themselves.
The traditional data warehouse architecture consists of a three-tier structure, listed as follows:
- Bottom tier: The bottom tier contains the data warehouse server, which is used to extract data from different sources, such as transactional databases used for front-end applications.
- Middle tier: The middle tier contains an OLAP (Online Analytical Processing) server. The purpose of the server is to transform the data structurally so it is better suited for analysis and complex querying.
- Top tier: The top tier has the front-end Business Intelligence tools used for querying, reporting, and analytics.
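The flow through the three tiers can be sketched in a few lines of Python. This is only a toy illustration, not how any particular product is built: the `ORDERS` rows, the function names, and the in-memory "cube" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows from a transactional source system.
ORDERS = [
    {"region": "EU", "product": "laptop", "amount": 1200.0},
    {"region": "EU", "product": "phone", "amount": 600.0},
    {"region": "US", "product": "laptop", "amount": 1100.0},
    {"region": "US", "product": "laptop", "amount": 1300.0},
]


def extract(source_rows):
    """Bottom tier: copy rows from a source system into the warehouse store."""
    return [dict(row) for row in source_rows]


def build_cube(warehouse_rows, dimensions, measure):
    """Middle tier: pre-aggregate a measure by the chosen dimensions."""
    cube = defaultdict(float)
    for row in warehouse_rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


def sales_for_region(cube, region):
    """Top tier: a BI-style question answered from the aggregated data."""
    return sum(total for (r, _product), total in cube.items() if r == region)


warehouse = extract(ORDERS)
cube = build_cube(warehouse, dimensions=("region", "product"), measure="amount")
print(sales_for_region(cube, "US"))  # 2400.0
```

The point of the sketch is the separation of concerns: the reporting function never touches the source rows, only the structure the middle tier prepared for it.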
There are two main approaches to data warehouse design, proposed by two pioneers of data warehousing, Bill Inmon and Ralph Kimball.
Ralph Kimball advocated creating data marts, which are data repositories belonging to particular business lines (e.g. finance), as the first step of the design process. The data warehouse is then simply the combination of the different data marts, which together facilitate reporting and analysis. This is known as a “bottom-up” approach.
Bill Inmon, on the other hand, suggested a “top-down” approach. In this approach, the data warehouse is a centralized repository for all enterprise data. Dimensional data marts serving particular business lines are then created from the data warehouse.
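The difference in build order between the two approaches can be illustrated with a small Python sketch. It deliberately ignores the modeling differences between the two schools and shows only which structure is built first; the sample rows and function names are hypothetical.

```python
# Hypothetical source rows, each tagged with the business line it belongs to.
SOURCE_ROWS = [
    {"line": "finance", "metric": "revenue", "value": 500},
    {"line": "finance", "metric": "costs", "value": 300},
    {"line": "sales", "metric": "orders", "value": 42},
]


def build_mart(rows, business_line):
    """Select the rows that belong to one business line."""
    return [row for row in rows if row["line"] == business_line]


def bottom_up(rows, business_lines):
    """Kimball-style order: build the marts first, then combine them."""
    marts = {line: build_mart(rows, line) for line in business_lines}
    warehouse = [row for mart in marts.values() for row in mart]
    return warehouse, marts


def top_down(rows, business_lines):
    """Inmon-style order: load the central warehouse, then derive the marts."""
    warehouse = list(rows)
    marts = {line: build_mart(warehouse, line) for line in business_lines}
    return warehouse, marts


lines = ["finance", "sales"]
bu_warehouse, bu_marts = bottom_up(SOURCE_ROWS, lines)
td_warehouse, td_marts = top_down(SOURCE_ROWS, lines)
print(bu_marts == td_marts)  # True
```

On this tiny data set both routes end with the same marts. In practice the choice affects which structure is treated as the source of truth and how much up-front design is needed before the first report can be produced.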
Cloud Data Warehouse
Cloud-based data warehouses are still relatively new. By offering data warehouse functionality that is accessible over the Internet, cloud providers enable organizations to avoid the hefty setup costs needed to build a traditional on-premise data warehouse.
Cloud architectures are considerably different from traditional data warehouse architectures, and they vary depending on the provider of the cloud solution. A fairly general cloud data warehouse architecture is made up of the following components:
- Clusters: A cluster is a group of shared computing resources, called nodes.
- Nodes: Nodes are computational resources that each have their own CPU, RAM, and disk storage. A cluster that consists of two or more nodes is composed of a leader node and compute nodes. The leader node communicates with client programs and compiles the code needed to execute queries, assigning it to the compute nodes. Compute nodes execute the queries and return the results to the leader node. A compute node only executes queries that reference tables stored on that node.
- Partitions: Each compute node is partitioned into slices. A slice receives an allocation of memory and disk space on the node. Slices operate in parallel to speed up the query execution time.
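The cluster layout described above can be modeled with a short Python sketch. It is a simplified illustration under several assumptions: the class names are hypothetical, rows are spread across slices with a hash of a chosen key (a distribution rule the text does not specify), and threads merely stand in for slices working in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class ComputeNode:
    """A compute node whose storage is split into slices (hypothetical model)."""

    def __init__(self, name, num_slices):
        self.name = name
        self.slices = [[] for _ in range(num_slices)]

    def partial_sum(self, column, predicate):
        """Scan all slices concurrently and return this node's partial sum."""
        def scan(slice_rows):
            return sum(row[column] for row in slice_rows if predicate(row))

        # Threads stand in for slices that operate in parallel.
        with ThreadPoolExecutor(max_workers=len(self.slices)) as pool:
            return sum(pool.map(scan, self.slices))


class LeaderNode:
    """Talks to the client, hands work to compute nodes, merges their results."""

    def __init__(self, compute_nodes):
        self.compute_nodes = compute_nodes

    def load(self, rows, dist_key):
        """Place each row on one slice using a stable hash (an assumption)."""
        for row in rows:
            h = zlib.crc32(str(row[dist_key]).encode("utf-8"))
            node = self.compute_nodes[h % len(self.compute_nodes)]
            slice_index = (h // len(self.compute_nodes)) % len(node.slices)
            node.slices[slice_index].append(row)

    def sum_where(self, column, predicate):
        """Ask every compute node for a partial sum and combine them."""
        return sum(
            node.partial_sum(column, predicate) for node in self.compute_nodes
        )


nodes = [ComputeNode("compute-1", num_slices=2), ComputeNode("compute-2", num_slices=2)]
leader = LeaderNode(nodes)
leader.load([{"id": i, "amount": i * 10} for i in range(1, 9)], dist_key="id")

total = leader.sum_where("amount", lambda row: row["amount"] > 30)
print(total)  # 300
```

The client only ever talks to the leader. Each compute node works on the rows it stores, and the final answer is the same no matter how the rows happen to be distributed.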
Which Deployment Model Is Better?
Throughout this article we have looked at the two approaches to data warehousing: traditional and cloud-based. We know you’re interested in finding out which one is objectively better, but it’s not that simple. The choice mostly depends on the needs of the organization, its resource and budget restrictions, the sensitivity of its data, and so on.
Given these factors, there is no objective winner. Both solutions have their own advantages and disadvantages. The ideal solution for you is the one that fits your organization’s requirements.
Imagine you’re an entrepreneur with a great idea that is going to be the next big thing in IT, but you don’t have the resources to set up an on-site data warehouse. In that case, a cloud-based solution would suit your needs. On the other hand, if you’re a well-established organization dealing with sensitive information, such as medical records, that you cannot risk transferring to the cloud, then you can benefit more from an on-site data warehousing solution, as it keeps the data on infrastructure you control.
The data warehousing solution an organization decides to deploy will significantly affect its experience. Cost, performance, scalability, and security are the main factors that will help you come to a decision. Consider these factors in light of your organization’s requirements, and they will help you decide which deployment model is better for you. Either way you decide to go, we have got you covered.