Popularized by Gartner IT analyst Mark Beyer in 2011, the term “logical data warehousing” refers to an architectural layer that combines the strengths of a physical data warehouse with alternative data management techniques and sources to speed up time-to-analytics. While the term was used by Bill Inmon in 2004, it was in a context entirely different from how the world knows it today, and the concept is now considered a data warehousing best practice.
So why “logical” data warehouses? Wasn’t the traditional repository serving BI needs well?
The answer lies in first understanding that logical data warehousing is meant to complement the functionality of a physical data warehouse rather than replace it, as mentioned in our opening definition.
This logical architectural layer not only sits atop an organization’s monolithic data repository, but also spans disparate data sources that are not, or cannot be, included in the main BI repository where data is persisted for decision-making. The reasons range from a data source being created well after the data warehouse was built, making it costly to add to the repository later, to the fact that unstructured, non-tabular data is an increasingly large part of business intelligence and cannot be ingested by the data warehouse without significant pre-processing. These sources are presented as “views”.
What are Views in a Logical Data Warehouse?
A logical data warehouse provides virtual “views” of data stored anywhere across the enterprise, whether the data is structured and resides in a SQL data warehouse, or is unstructured streaming data living in an Azure or Amazon S3 data lake. The user doesn’t need to think about where the data is stored, what structures define the source data, or where joins are performed between data sets to answer their query. They simply submit an information request, and the logical data warehouse manages the request and everything it entails on the back end, ultimately presenting the requested information to the user as a “view”.
The core technology used here is data virtualization. It insulates the user from the technical details of accessing their data, so they see only what they need to see, when they need to see it. For all intents and purposes, they’re requesting data from a single, logical data warehouse, even when the actual data sources are a NoSQL or Hadoop database, a SaaS enterprise application, or a social media clickstream.
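To make the idea concrete, here is a minimal Python sketch of the pattern data virtualization follows: a single “view” function reaches out to several sources at query time and joins the results, so the consumer never sees where the data came from. The database file, table, and API endpoint are hypothetical placeholders, not any specific vendor’s interface.

```python
# Minimal sketch of data virtualization: a single "view" that federates
# two sources at query time. The database file, table, and API endpoint
# are hypothetical placeholders, not a specific vendor's interface.
import json
import sqlite3            # stand-in for the physical SQL warehouse
import urllib.request     # stand-in for a SaaS / REST data source


def query_warehouse(sql: str) -> list[dict]:
    """Pull structured rows from the relational warehouse."""
    with sqlite3.connect("warehouse.db") as conn:
        conn.row_factory = sqlite3.Row
        return [dict(row) for row in conn.execute(sql)]


def query_saas_api(url: str) -> list[dict]:
    """Pull semi-structured records from a SaaS application."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def customer_360_view() -> list[dict]:
    """The virtual 'view': sources are joined at run time; nothing is copied."""
    orders = query_warehouse("SELECT customer_id, total FROM orders")
    tickets = query_saas_api("https://support.example.com/api/tickets")

    open_tickets: dict = {}
    for t in tickets:
        open_tickets[t["customer_id"]] = open_tickets.get(t["customer_id"], 0) + 1

    # The consumer sees one result set, not two systems.
    return [{**o, "open_tickets": open_tickets.get(o["customer_id"], 0)} for o in orders]
```

A production logical data warehouse would additionally push filters and joins down to each source where possible instead of pulling full result sets, but the principle is the same.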
Advantages of a Logical Data Warehouse
As outlined above, one of the biggest benefits of a logical layer is that it federates queries across the multitude of data sources an enterprise is using, be it cloud, big data, an operational data store in an RDBMS, or even unstructured reports. This allows the logical layer to augment the capabilities of the Enterprise Data Warehouse (EDW) by giving BI analysts a more complete view of data, so that the physical and virtual layers together serve as a true “single source of truth”. The augmented information the logical layer accesses can be pushed to consuming analytics applications live, which brings us to the second major benefit of the logical data warehouse.
Data in the logical layer is fresher, more current than in the batch-oriented physical repository that relies exclusively on ETL processes to Extract, Transform, and Load data. This is possible because the logical layer is not bound to the pre-built data structures and models of the data warehouse and instead builds the structure of the required data at run time. This “no-latency” analytics capability is invaluable when data is needed for time-sensitive business processes and reports must be generated “now” from live data, rather than waiting for ETL jobs to run and update the warehouse before reports can be produced.
The benefit of accessing data as it is being generated is part of the key attraction of logical data warehousing: reducing dependency on recurring batch-oriented ETL processes, which:
- Cause decisions to be made on data that is not real-time
- Require data from a new data source to be extracted, transformed, and loaded into a physical repository before it can be used for decision-making.
Integrating these diverse data assets virtually into a single, logical data warehouse saves a business significant time as well as storage and processing costs.
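As a rough illustration of “structure at run time”, the sketch below shapes raw, semi-structured events into rows only when a query asks for them, with no scheduled load step. The file name and field names are made up for the example.

```python
# Rough illustration of "structure at run time" (schema-on-read): raw,
# semi-structured events are shaped into rows only when a query asks for
# them, with no scheduled load step. File and field names are made up.
import json


def live_sales_view(path: str, min_amount: float = 0.0) -> list[dict]:
    """Apply structure to a raw event log at query time."""
    rows = []
    with open(path) as f:
        for line in f:                      # one JSON event per line
            event = json.loads(line)
            amount = float(event.get("amount", 0))
            if amount >= min_amount:        # shape and filter only what the query needs
                rows.append({"user": event.get("user_id"), "amount": amount})
    return rows


# A report can run against events written moments ago; there is no waiting
# for the next ETL window to load them into the warehouse.
large_sales = live_sales_view("events.jsonl", min_amount=100.0)
```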
Role of Metadata in Presenting an Integrated View
So far in this article, we’ve established that the logical data warehouse does not “house” data itself – so what does it hold? Metadata for accessing all the data sources. Metadata is the information that sets the context of data sets, and it is what enables our logical architectural layer to find and access disparate data sources. This is how data replication is avoided and reliance on recurring ETL is reduced.
Note that, while metadata holds the location of data sets, it also contains information that lets users understand the data in a business context without having to know where it lives. This is how a single, integrated view is presented to users, helping them pull the right data and interpret it in the right context to draw the right conclusions.
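A toy catalog like the one below shows the idea: the logical layer records where each data set physically lives alongside a business-friendly name and description, so a user can ask for data by name and never deal with locations. The source names, locations, and descriptions here are illustrative only.

```python
# Toy metadata catalog: the logical layer stores where each data set lives
# and how to describe it in business terms, not the data itself. The names,
# locations, and descriptions below are illustrative only.
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    business_name: str     # how analysts refer to the data set
    description: str       # business context for the data
    location: str          # physical source the logical layer will query
    source_type: str       # e.g. "warehouse", "data_lake", "saas_api"


CATALOG = {
    "customer_orders": DatasetMetadata(
        business_name="Customer Orders",
        description="All orders placed through the web store",
        location="warehouse.sales.orders",
        source_type="warehouse",
    ),
    "support_tickets": DatasetMetadata(
        business_name="Support Tickets",
        description="Open and closed tickets from the helpdesk application",
        location="https://support.example.com/api/tickets",
        source_type="saas_api",
    ),
}


def resolve(business_term: str) -> DatasetMetadata:
    """Users ask for data by business name; the catalog knows where it lives."""
    return CATALOG[business_term]
```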
Logical Data Warehouse or Enterprise Data Warehouse?
While the benefits of a logical data warehouse cannot be denied, a major misconception today is that it can replace an enterprise data warehouse. The point of a logical data warehouse is to expand the functionality of the EDW, building on top of it to address business needs where up-to-the-minute data is required, or where big data and other unstructured data stores need to be bridged with the EDW. Data integration and ETL will continue to play an important role in populating the physical data warehouse and keeping historical and current data consistent.
The logical layer acts as an extension that allows the business to reduce the size and scope of the EDW, optimizing performance and lowering the cost of implementation and maintenance.