A data warehouse creates a centralized source of data that facilitates business intelligence, strategy, and decision-making. An effective and efficient data warehousing solution is therefore important for any organization. Examples of the information we might extract from a data warehouse include:
- Reporting on financial performance of different departments
- Predicting trends within the marketplace
Traditionally, a data warehouse is implemented on-site, where it has to be configured and managed by an experienced in-house IT team. Building data warehouses this way can be expensive, owing to the accompanying hardware and software costs, and scaling the system adds further cost. The cost-effective alternative to this traditional implementation is a cloud-based architecture. But before we discuss the cloud-based solution, let us first discuss cloud computing and how it works.
Cloud computing is a method of providing a set of shared computing resources that includes applications, computing, storage, networking, development, and deployment platforms as well as business processes. Cloud computing turns traditional siloed computing assets into shared pools of resources that are based on an underlying Internet foundation. Clouds come in different versions, depending on your needs. There are two primary cloud deployment models: public and private. Most organizations use a combination of private computing resources (data centers and private clouds) and public services as a hybrid environment.
Cloud computing has evolved from a risky and confusing concept to a strategy that large and small organizations are beginning to adopt as part of their overall computing strategy. Organizations are not only using the cloud for services such as e-mail or customer relationship management; many are also using a set of foundational cloud services, Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), to develop and deploy applications that support the business and open up new opportunities and revenue streams.
Types of Cloud Environments
Open Community Clouds
The most open type of cloud environment is an open community cloud: a cloud environment with no criteria for joining other than signing up and creating a password. These environments may be privately or publicly owned and include social networking environments, such as Facebook, LinkedIn, and Twitter. There are also open community sites that enable individuals with a common interest to participate in online discussions. For example, there may be a community of professionals in a certain industry who want to share ideas.
Controlled Open Mode
Some public clouds offer a higher level of service because they are true commercial environments. Commercial public clouds are open for use by anyone at any time, but they are based on a pay-per-use model. A SaaS vendor that charges per user, per month or per year, is one example of this kind of environment. In addition, vendors can offer analytics as a service to customers on a per-use or per-task basis.
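To make the two billing styles mentioned above concrete, here is a minimal sketch in Python. The function names and all rates are hypothetical and chosen only for illustration; they do not reflect any vendor's actual pricing. Amounts are kept in whole cents to avoid rounding surprises.

```python
def subscription_cost_cents(users: int, months: int, rate_per_user_month_cents: int) -> int:
    """Per-user, per-month billing: cost grows with seats and time."""
    return users * months * rate_per_user_month_cents


def usage_cost_cents(tasks: int, rate_per_task_cents: int) -> int:
    """Per-task billing: cost grows only with the work actually submitted."""
    return tasks * rate_per_task_cents


if __name__ == "__main__":
    # Hypothetical figures: 10 users for 12 months at 2500 cents per user per month,
    # versus 400 analytics tasks at 50 cents per task.
    print(subscription_cost_cents(10, 12, 2500))
    print(usage_cost_cents(400, 50))
```

Which model is cheaper depends entirely on how many users and how much work an organization actually has, so the comparison has to be rerun with real figures.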
Public/Private Hybrid Clouds
Companies often want the flexibility of the cloud but with the security and predictability of the data center. In these cases, a private cloud provides an environment that sits behind a firewall. Unlike a data center, a private cloud is a pool of common resources optimized for the use of the IT organization. Unlike a public cloud, a private cloud adheres to the company’s security, governance and compliance requirements. Whatever service level is required for the company applies to the private cloud.
Cloud-based data warehouses offer some major advantages over traditional on-premises solutions, with internet accessibility being the major one.
Cloud data warehouse architecture differs from the conventional architecture, and the details vary by service provider. However, the basic building blocks stay the same and are listed as follows:
- Clusters: A cluster is a group of shared computing resources, called nodes.
- Nodes: Nodes are computational resources that have their own CPU, memory, and disk storage. A cluster that consists of two or more nodes is composed of a leader node and compute nodes. The leader node communicates with client programs and compiles code to execute queries, assigning it to the compute nodes. Compute nodes execute the queries and return the results to the leader node. A compute node only executes queries that reference tables stored on that node.
- Slices: Each compute node is partitioned into slices. A slice receives an allocation of memory and disk space on the node. Slices operate in parallel to speed up query execution time.
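The relationship between leader node, compute nodes, and slices can be sketched as a toy simulation in Python. This is only an illustration under stated assumptions, not any provider's implementation: the class names (`Slice`, `ComputeNode`, `LeaderNode`), the `build_cluster` helper, the round-robin data distribution, and the use of threads to stand in for parallel slices are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Slice:
    """A partition of a compute node that holds part of a table's rows."""
    rows: list[int] = field(default_factory=list)

    def partial_sum(self) -> int:
        # Each slice scans only the rows it stores.
        return sum(self.rows)


@dataclass
class ComputeNode:
    slices: list[Slice]

    def execute(self) -> int:
        # Slices on the same node work in parallel (threads used for illustration).
        with ThreadPoolExecutor(max_workers=len(self.slices)) as pool:
            return sum(pool.map(Slice.partial_sum, self.slices))


class LeaderNode:
    """Receives the client request, fans it out, and combines the results."""

    def __init__(self, nodes: list[ComputeNode]):
        self.nodes = nodes

    def load(self, values: list[int]) -> None:
        # Assumed distribution strategy: round-robin across every slice.
        all_slices = [s for node in self.nodes for s in node.slices]
        for i, value in enumerate(values):
            all_slices[i % len(all_slices)].rows.append(value)

    def query_sum(self) -> int:
        # Assign the work to compute nodes, then aggregate their partial results.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            return sum(pool.map(ComputeNode.execute, self.nodes))


def build_cluster(num_nodes: int, slices_per_node: int) -> LeaderNode:
    nodes = [
        ComputeNode([Slice() for _ in range(slices_per_node)])
        for _ in range(num_nodes)
    ]
    return LeaderNode(nodes)


if __name__ == "__main__":
    leader = build_cluster(num_nodes=2, slices_per_node=2)
    leader.load(list(range(1, 101)))
    print(leader.query_sum())  # prints 5050
```

The client only ever talks to the leader. Each slice touches just its own share of the data, which is why adding nodes or slices can shorten query time.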
Due to their architecture, cloud-based data warehouses offer some major advantages over traditional systems.
Many organizations cite a lack of resources and expertise as barriers to implementing an on-site data warehouse solution. This is where cloud data warehouses become the preferred option.
The bottom line is that the right type of deployment model depends on your organization's requirements.