Data Warehouse Info

A practitioner's reference for analytical data warehousing.


Practice


Techniques and patterns for warehouse delivery.

Focused how-to articles on the specific moves that determine whether a warehouse holds up in production. The Foundations pillars cover the model; Practice covers the work.


Technique

Advanced dimensional modeling: bridge tables, inferred members, multi-timezone, and the awkward cases

How to model the dimensional cases the textbook example never quite covers: multivalued dimensions and bridge tables, inferred members for late-arriving dimensions, free-text comments, and facts that span multiple time zones.


Technique

Building a data warehouse: a four-phase practitioner's playbook

How warehouse projects actually get built, organized as discovery, design, development, and deployment, with the Kimball-versus-Inmon design choice treated as a concrete decision rather than an academic debate.


Technique

Change data capture: implementation strategies

How log-based, timestamp-based, and trigger-based change data capture actually work in production, including the initial snapshot handoff, schema evolution failure modes, and the operational disciplines that keep CDC pipelines correct.


Technique

Data cleansing in the warehouse: where it belongs and what it does

Where data cleansing sits in a modern warehouse load: the staging-to-curated boundary, the rule categories that catch real defects, the test-at-the-transform-layer pattern, and the observability that catches the drift the rules miss.


Technique

Data extraction models: full, incremental, log-based, query-based, file-based, API, and streaming

The seven data extraction patterns a warehouse encounters in practice, what each one assumes about the source, where each one fails, and how the modern connector stack (Fivetran, Airbyte, Estuary, Debezium, Kafka) decides between them.


Technique

Data masking in the data warehouse

How static, dynamic, and on-the-fly data masking actually work in a cloud warehouse, including the mask-before-load versus mask-in-warehouse axis, column-level masking policies on Snowflake, BigQuery, and Databricks, and the trade-offs between tokenization, encryption, and hashing under GDPR, CCPA, and HIPAA.


Technique

Data modeling phases: conceptual, logical, and physical

How conceptual, logical, and physical data models actually divide warehouse design work in 2026, including where data contracts and dbt fit, and the handoffs that determine whether the model survives production.


Technique

Data virtualization: federated query in modern warehouse stacks

How data virtualization works as a technique, what it shares with federated query and the logical data warehouse and where it differs from them, where it fits in cloud warehouse stacks, and the failure modes that determine when virtualization holds up in production.


Technique

Data warehouse metadata: catalogs, lineage, and the metadata repository in 2026

How technical, business, and operational metadata get organized in a modern warehouse stack, including the shift from monolithic metadata repositories to federated data catalogs, dbt-driven lineage, and OpenLineage as the cross-tool standard.


Technique

Data warehouse testing: validation, regression, and performance

What to test in a production warehouse pipeline, where each kind of test lives, and how dbt tests, Great Expectations, and contract patterns fit together without producing a green dashboard over wrong data.


Technique

Logical data warehouse: the architectural pattern

The logical data warehouse unifies a physical warehouse with lakehouses, operational stores, and SaaS sources behind a single query layer. How the pattern actually works in 2026, where it fits, and where it quietly breaks.


Technique

Normalization and denormalization in data warehousing

Normalization vs denormalization for analytical workloads: where 3NF still belongs in a 2026 warehouse, why columnar engines have made denormalization the default for query layers, and how to think about the trade-off layer by layer.


Technique

Slowly changing dimensions: implementation strategies

How SCD Type 1, 2, 3, and the hybrid types actually work in a production warehouse, including active row identification, fact loading under Type 2, and the edge cases that bite teams in practice.


Technique

Surrogate key management: generation, lookup, and the cases that bite

How to generate and manage surrogate keys in a 2026 cloud warehouse: integer sequences, hash-based deterministic keys, UUID v7, the fact-loading lookup under Type 2 SCD, and the edge cases that produce silent errors.
