Data Warehouse Info

A practitioner's reference for analytical data warehousing.


Glossary

24 entries


Definitions of the working vocabulary used across the publication. Each entry links to the topic where the term is developed in depth.


A

1

Abstraction layer
In warehouse architecture, a layer that hides physical or implementation detail so the layer above can address data in business terms. Most often refers to the semantic layer between warehouse tables and BI tools. See Warehouse Fundamentals.

B

1

Bridge table
A dimensional-modeling structure that resolves many-to-many relationships between a fact table and a dimension by carrying one row per (group, member) pair. See Dimensional Modeling.
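
As a rough illustration of the (group, member) shape, the sketch below rolls a fact measure up to individual members through a bridge. The table names, the sample rows, and the `weight` column are all hypothetical; the weighting factor is an assumed convention for keeping allocated totals equal to the fact total, not something this entry prescribes.

```python
from collections import defaultdict

# Hypothetical fact rows: each references a diagnosis *group*, not one diagnosis.
facts = [
    {"visit_id": 1, "diagnosis_group_key": 10, "billed": 300.0},
    {"visit_id": 2, "diagnosis_group_key": 11, "billed": 100.0},
]

# Bridge table: one row per (group, member) pair.
# The weight column is an assumption made for this sketch.
bridge = [
    {"diagnosis_group_key": 10, "diagnosis_key": "A", "weight": 0.5},
    {"diagnosis_group_key": 10, "diagnosis_key": "B", "weight": 0.5},
    {"diagnosis_group_key": 11, "diagnosis_key": "A", "weight": 1.0},
]


def billed_by_diagnosis(facts, bridge):
    """Allocate each fact's measure to the members of its group."""
    members = defaultdict(list)
    for row in bridge:
        members[row["diagnosis_group_key"]].append(row)

    totals = defaultdict(float)
    for fact in facts:
        for member in members[fact["diagnosis_group_key"]]:
            totals[member["diagnosis_key"]] += fact["billed"] * member["weight"]
    return dict(totals)
```

Without the weights, joining through the bridge would count the 300.0 visit once per member, which is the usual double-counting risk of a many-to-many join.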

C

2

Change data capture
A category of techniques for identifying and propagating changes from a source data system to a downstream consumer without copying the full source state on each load cycle. See Loading and Operations.
Conformed dimension
A dimension used identically across multiple fact tables or data marts. The mechanism that lets independent marts roll up into an integrated enterprise warehouse without metric definitions drifting apart. See Dimensional Modeling.

D

9

Data catalog
A searchable index over the metadata of the data assets in an analytics platform: tables, columns, dashboards, models, owners, descriptions, and lineage, federated from the upstream tools that produce each piece. See Warehouse Fundamentals, Loading and Operations, and Warehouse Automation.
Data contract
A declarative specification of the schema, semantics, and operational guarantees a data producer promises to consumers, used to enforce expectations at the source boundary rather than discover violations downstream. See Loading and Operations.
Data fabric
A metadata-driven architecture that unifies heterogeneous data sources through an active catalog, automated governance, and a federated query layer. See Warehouse Fundamentals.
Data lake
Object storage of raw, varied, schema-on-read data. The storage layer for analytical workloads that don't fit the warehouse's structured model. See Warehouse Fundamentals.
Data lakehouse
Object storage plus an open table format (Iceberg, Delta, or Hudi), exposing lake-style data through a warehouse-style table abstraction with ACID transactions, schema enforcement, and time travel. See Warehouse Fundamentals and Modern Warehouse Platforms.
Data lineage
The recorded graph of how a data value flows from source to destination across the pipeline: which sources fed which models fed which dashboards, at table or column granularity, derived from build artifacts and runtime events rather than maintained by hand. See Warehouse Fundamentals, Loading and Operations, and Warehouse Automation.
Data mart
A curated subset of a data warehouse, organized around a single department or subject area. Not a different architectural pattern; a deployment style of warehouse content. See Warehouse Fundamentals.
Data quality
The degree to which data is fit for its intended use, commonly framed across six dimensions in the practitioner literature: accuracy, completeness, consistency, timeliness, validity, and uniqueness. See Loading and Operations.
Degenerate dimension
A dimensional attribute that lives on the fact table directly, without a separate dimension table. Used for high-cardinality transactional identifiers that have no descriptive attributes worth grouping by. See Dimensional Modeling.

E

1

Enterprise data warehouse (EDW)
An integrated, organization-wide data warehouse that consolidates analytical data across business units, in contrast to a single departmental mart. In modern cloud deployments the qualifier is mostly redundant. See Warehouse Fundamentals.

F

2

Factless fact table
A fact table that records the occurrence of an event without any additive numeric measures. The right shape for events whose analytical value is the event itself, not a quantity attached to it. See Dimensional Modeling.
Federated query
A query that executes across multiple underlying data stores through a single engine, with the engine pushing predicates down to each source and combining results. See Warehouse Fundamentals.

G

1

Grain (dimensional modeling)
The precise definition of what one row in a fact table represents. The most consequential design decision in a dimensional model, and the one most frequently skipped. See Dimensional Modeling.

I

2

Idempotency
The property that an operation produces the same result whether it runs once or many times against the same input. Central to safe recovery in warehouse load pipelines, REST APIs, and distributed stream processing. See Loading and Operations.
Inferred member
A placeholder dimension row inserted at fact-load time when the fact references a business key that does not yet exist in the dimension, with a flag that flips when the real attributes arrive. See Dimensional Modeling and Loading and Operations.
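
The inferred-member flow can be sketched in a few lines. This is a minimal in-memory illustration, assuming a dimension keyed by business key; the function names, the `is_inferred` flag name, and the surrogate key assignment are hypothetical choices for the example.

```python
def resolve_surrogate_key(dim, business_key):
    """Return the surrogate key for a business key at fact-load time.

    If the dimension has no row yet, insert a placeholder flagged as inferred.
    """
    row = dim.get(business_key)
    if row is None:
        row = {"surrogate_key": len(dim) + 1, "name": None, "is_inferred": True}
        dim[business_key] = row
    return row["surrogate_key"]


def apply_dimension_update(dim, business_key, name):
    """Load real attributes; an inferred row is filled in place and its flag flips."""
    row = dim.get(business_key)
    if row is None:
        dim[business_key] = {
            "surrogate_key": len(dim) + 1,
            "name": name,
            "is_inferred": False,
        }
    else:
        row["name"] = name
        row["is_inferred"] = False
```

Because the placeholder keeps its surrogate key when the real attributes arrive, facts loaded early do not need to be re-keyed.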

M

1

Multivalued dimension
A dimension that has multiple values per fact row, breaking the one-foreign-key-per-dimension assumption of the canonical star schema and requiring a bridge table to resolve. See Dimensional Modeling.

R

1

Referential integrity
The property that every foreign key value in a child table actually exists in the parent table it references. In a data warehouse, where to enforce this property (in the database engine, in the transformation layer, or not at all) is a long-running design debate that the move to cloud platforms has decisively reshaped. See Loading and Operations, Dimensional Modeling, and Warehouse Automation.
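
For the transformation-layer option, the check usually amounts to an anti-join that looks for orphaned keys. A minimal sketch, assuming rows are plain dictionaries; the function name is hypothetical, and skipping null keys is an assumption that mirrors how SQL foreign keys typically treat NULL.

```python
def orphaned_keys(child_rows, fk_column, parent_keys):
    """Return the distinct foreign key values that have no matching parent key."""
    parents = set(parent_keys)
    return sorted(
        {
            row[fk_column]
            for row in child_rows
            # Assumption: a null foreign key is not treated as a violation.
            if row[fk_column] is not None and row[fk_column] not in parents
        }
    )
```

A pipeline might fail the load when the returned list is non-empty, or route the offending rows to a quarantine table; which response is appropriate is part of the design debate the entry describes.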

S

2

Semantic layer
A modeling abstraction between physical warehouse tables and BI tools that defines business entities, metrics, and dimensions once, so downstream consumers query consistent definitions rather than rebuilding them per report. See Warehouse Fundamentals and Dimensional Modeling.
Slowly changing dimension
A dimension whose attribute values change over time at a rate slower than fact table growth, requiring explicit strategies to preserve or overwrite history. See Dimensional Modeling and Loading and Operations.
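
The two basic strategies, overwriting and preserving history, are conventionally called Type 1 and Type 2. The sketch below contrasts them on an in-memory list of rows; the column names `valid_from` and `valid_to` are hypothetical, and surrogate keys are omitted to keep the example short.

```python
from datetime import date


def apply_type1(rows, business_key, attrs):
    """Type 1: overwrite attributes in place; prior values are lost."""
    for row in rows:
        if row["business_key"] == business_key:
            row.update(attrs)


def apply_type2(rows, business_key, attrs, effective):
    """Type 2: close the current row and append a new version."""
    for row in rows:
        if row["business_key"] == business_key and row["valid_to"] is None:
            if all(row.get(k) == v for k, v in attrs.items()):
                return  # nothing changed, so no new version is needed
            row["valid_to"] = effective
            rows.append({**row, **attrs, "valid_from": effective, "valid_to": None})
            return
    # No current row exists: insert the first version.
    rows.append(
        {"business_key": business_key, **attrs, "valid_from": effective, "valid_to": None}
    )
```

Returning early when the attributes are unchanged keeps a Type 2 load from creating duplicate versions if the same source record is processed twice.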

W

1

Watermark
A stored value marking progress through a stream of data. In warehouse loading, the boundary between data already ingested and data still pending; in stream processing systems, the event-time marker indicating which records are complete enough to act on. See Loading and Operations.
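
For the warehouse-loading sense of the term, a high-watermark incremental load might look like the sketch below. It assumes each source row carries an `updated_at` timestamp; that column name and both function names are hypothetical.

```python
from datetime import datetime


def pending_rows(source_rows, watermark):
    """Select rows changed after the stored watermark."""
    return [row for row in source_rows if row["updated_at"] > watermark]


def advance_watermark(loaded_rows, watermark):
    """Return the new watermark; it stays put when nothing was loaded."""
    return max((row["updated_at"] for row in loaded_rows), default=watermark)
```

The strict `>` comparison is a simplification: a row that commits late with a timestamp equal to the stored watermark would be skipped, so real pipelines often reread a small overlap window and deduplicate.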