Before the Xbox and iPhone, prior to the first Facebook Like or Tweet, and well before the cloud and tablets, there was the data warehouse. For over 30 years, businesses have been centrally storing data for analytics and data-driven decision making.
The duo of IBM researchers Paul Murphy and Barry Devlin introduced the data warehousing concept in 1988. The call for warehousing data matured as databases became more advanced, and data overflow became common.
Today, leading businesses manage their enterprise data warehouses (EDWs) through data warehouse automation platforms. They also use technologies such as data virtualization, which creates an abstraction layer that simplifies complicated datasets.
Understanding the Abstraction Layer
Abstraction means trimming something down to its generic or essential features, that is, removing characteristics in order to highlight the attributes that matter. In the context of data warehouses, an abstraction layer streamlines the design of adaptable databases.
It has three levels, namely:
- View
- Logical
- Physical
View Level
In simple terms, the view level is the data as end users see it, which typically reflects the requirements of their operations and organizational units. It exists because not all stored data is meant for every user. For example:
- The students shouldn’t see the salaries of the faculty members
- The faculty members shouldn’t see payment or billing data
The benefits of representing the design via the view level are:
- It is simpler to distinguish the data required by the end users
- It can be quickly checked to ensure its adequacy to facilitate the defined requirements, processes, and constraints
- With a well-thought-out design, security can be enhanced by granting users access only to the subset of data they require
It's worth mentioning that applications are coded against an external schema. The view level is never stored and is only computed when accessed. Different views can be offered to different tiers of users. Moreover, the data warehouse handles the mapping from the view level to the logical level automatically at runtime.
Lastly, this first layer of security can be enforced for many users of the system at once.
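As a minimal sketch of the view level, assuming Python's built-in sqlite3 module stands in for the warehouse, the example below defines a view that exposes faculty names and departments but hides salaries. The table, view, and function names are hypothetical. SQLite has no per-user grants, so a real warehouse would pair such a view with its own access controls.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one base table and one view (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE faculty (
            id INTEGER PRIMARY KEY,
            name TEXT,
            department TEXT,
            salary REAL
        );
        INSERT INTO faculty VALUES
            (1, 'A. Rivera', 'Physics', 91000),
            (2, 'B. Chen', 'History', 84000);

        -- The view is not stored as data; it is computed when queried.
        CREATE VIEW student_faculty_view AS
            SELECT name, department FROM faculty;
        """
    )
    return conn


def columns_of(conn, relation):
    """Return the column names a given table or view exposes.

    The relation name is interpolated directly, which is acceptable only
    because this demo passes fixed, trusted names.
    """
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]


if __name__ == "__main__":
    conn = build_demo_db()
    print(columns_of(conn, "faculty"))
    print(columns_of(conn, "student_faculty_view"))
```

Students querying the view never see the salary column, while the base table remains unchanged for users who are entitled to it.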
Logical Level
The logical level, as the name implies, is the logical design of the data warehouse, and is typically illustrated through E/R diagrams. It conceals the storage details of the physical level and is created by integrating all user views into a single global view of the whole database. The advantages of the logical level are:
- It offers the macro level view of data
- It functions independently of both software and hardware
Remember, the data warehouse maps data access between the logical and physical schemas automatically. Furthermore, the physical schema can be modified without changing the application. For instance, you can add or remove an index. The logical level is expressed as a conceptual view through Entity-Relationship modeling, which is independent of the data warehouse architecture.
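The index example can be sketched as follows, again assuming Python's built-in sqlite3 module as a stand-in for the warehouse. The table, index, and function names are hypothetical. The application query is written once and returns the same rows before and after the physical schema gains an index; only the query plan chosen by the engine may differ.

```python
import sqlite3

# The application-level query never mentions how the data is stored.
QUERY = (
    "SELECT order_id, amount FROM sales "
    "WHERE customer_id = ? ORDER BY order_id"
)


def make_sales_db():
    """Build a small in-memory sales table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(i, i % 10, i * 1.5) for i in range(1, 101)],
    )
    return conn


def run_query(conn, customer_id):
    """Run the unchanged application query."""
    return conn.execute(QUERY, (customer_id,)).fetchall()


def query_plan(conn, customer_id):
    """Return the engine's plan description; the last column holds the detail text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (customer_id,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def add_index(conn):
    """Change the physical schema only: add an index on customer_id."""
    conn.execute("CREATE INDEX idx_sales_customer ON sales(customer_id)")


if __name__ == "__main__":
    conn = make_sales_db()
    before = run_query(conn, 3)
    print("plan before:", query_plan(conn, 3))
    add_index(conn)
    after = run_query(conn, 3)
    print("plan after: ", query_plan(conn, 3))
    print("same rows:", before == after)
```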
Physical Level
The physical level is the lowest level of abstraction in data warehousing. The physical schema outlines how data is stored in the data warehouse. It commonly identifies the record layout of files and their types, e.g., B-tree, hash, and flat.
The physical level describes how data is stored on a medium and which type of medium is required. It depends on the underlying software and hardware. It is also designed last, so that a data warehouse professional knows the hardware specifications of the database.
Remember, the objective of separating out the physical level is to arrive at a design where the physical model can be altered without impacting the logical one. Early applications in the 1960s functioned only at this level. They explicitly handled internal details, such as minimizing the physical distance between related data and arranging the data structures inside the file, e.g., linked lists of blocks and blocked records.
Challenges
- The routines are hard-coded to cope with the physical representation
- Making changes to the data structures requires expertise
- Writing application code that deals with these details is time-consuming
- Implementing new features also requires expert skills
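The first challenge can be illustrated with a small sketch using Python's standard struct module. The fixed-width record layouts and all names below are hypothetical and are not taken from any particular system. A reader routine that hard-codes byte offsets works for the original layout, then silently returns a wrong value once a field is added to the record.

```python
import struct

# Hypothetical layout v1: 4-byte id, 20-byte name, 8-byte balance (32 bytes, no padding).
RECORD_V1 = struct.Struct("<i20sd")
# Hypothetical layout v2: a 2-byte region code is inserted after the id (34 bytes).
RECORD_V2 = struct.Struct("<ih20sd")


def pack_v1(rec_id, name, balance):
    """Serialize one record in the v1 layout."""
    return RECORD_V1.pack(rec_id, name.encode("ascii"), balance)


def pack_v2(rec_id, region, name, balance):
    """Serialize one record in the v2 layout."""
    return RECORD_V2.pack(rec_id, region, name.encode("ascii"), balance)


def read_balance_hardcoded(block, index):
    """Application routine tied to the v1 layout: record size 32, balance at offset 24."""
    offset = index * 32 + 24
    return struct.unpack_from("<d", block, offset)[0]


if __name__ == "__main__":
    block_v1 = pack_v1(1, "Ada", 100.0) + pack_v1(2, "Grace", 250.0)
    print(read_balance_hardcoded(block_v1, 1))  # reads the second record's balance

    # After the physical layout changes, the same routine reads misaligned bytes.
    block_v2 = pack_v2(1, 7, "Ada", 100.0) + pack_v2(2, 7, "Grace", 250.0)
    print(read_balance_hardcoded(block_v2, 0))  # no longer the stored balance
```

With a logical level in between, the application would ask for a balance by name and leave offsets to the storage engine.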
Conclusion
If the three-level architecture defined by ANSI/SPARC is followed, a data warehouse can be smoothly scaled and upgraded. Any professional would tell you that the most common need is ease of upgrading at the physical level. That is because expanding a database usually calls for hardware upgrades, and these should not require remodeling the database.
Therefore, when upgrading the physical level, ideally choose a solution that offers a unified platform to design, deploy, and sustain a data warehouse.