Practice

Techniques and patterns for warehouse delivery.

Focused how-to articles on the specific moves that determine whether a warehouse holds up in production. The Foundations pillars cover the model; Practice covers the work.

16 entries

Technique

Advanced dimensional modeling: bridge tables and the hard cases

Dimensional modeling beyond the basics: bridge tables, multivalued and inferred-member dimensions, comment dimensions, and multi-timezone facts.

Technique

Building a data warehouse: a four-phase playbook

How a data warehouse project actually gets built, across discovery, design, development, and deployment, with the Kimball vs Inmon choice treated as a concrete decision, not a debate.

Technique

Change data capture: implementation strategies

How log-based, timestamp-based, and trigger-based CDC work in production: the snapshot-to-streaming handoff, schema-evolution failure modes, and the disciplines that keep pipelines correct.

Technique

Data cleansing in the warehouse: where it belongs

Where data cleansing sits in a modern warehouse load: the staging-to-curated boundary, the rule categories that catch real defects, the test-at-the-transform-layer pattern, and the observability that catches the drift the rules miss.

Technique

Data extraction models: full, incremental, and log-based CDC

The seven data extraction patterns a warehouse encounters in practice, what each one assumes about the source, where each one fails, and how the modern connector stack (Fivetran, Airbyte, Estuary, Debezium, Kafka) decides between them.

Technique

Data integration: approaches and when to use each

The main data integration approaches compared: ETL, ELT, change data capture, replication, data virtualization, and streaming, and the trade-offs in latency, source type, and governance that pick the winner.

Technique

Data masking in the data warehouse

How static, dynamic, and on-the-fly data masking actually work in a cloud warehouse, including the mask-before-load versus mask-in-warehouse axis, column-level masking policies on Snowflake, BigQuery, and Databricks, and the trade-offs between tokenization, encryption, and hashing under GDPR, CCPA, and HIPAA.

Technique

Data modeling phases: conceptual, logical, and physical

Data modeling phases explained: what the conceptual, logical, and physical models each deliver, where dbt and data contracts fit, and the handoffs that decide whether the model holds.

Technique

Data virtualization: federated query in modern stacks

How data virtualization works as a technique, how it overlaps with and differs from federated query and the logical data warehouse, where it fits in cloud warehouse stacks, and the failure modes that determine whether it holds up in production.

Technique

Data warehouse metadata: catalogs, lineage, and repositories

How technical, business, and operational metadata get organized in a modern warehouse stack, including the shift from monolithic metadata repositories to federated data catalogs, dbt-driven lineage, and OpenLineage as the cross-tool standard.

Technique

Data warehouse testing: validation, regression, and performance

What to test in a production warehouse pipeline, where each kind of test lives, and how dbt tests, Great Expectations, and contract patterns fit together without producing a green dashboard over wrong data.

Technique

Logical data warehouse: the architectural pattern

The logical data warehouse unifies a physical warehouse with lakehouses, operational stores, and SaaS sources behind a single query layer. How the pattern actually works in 2026, where it fits, and where it quietly breaks.

Technique

Normalization and denormalization in data warehousing

Normalization vs denormalization for analytical workloads: where 3NF still belongs in a 2026 warehouse, why columnar engines have made denormalization the default for query layers, and how to think about the trade-off layer by layer.

Technique

Slowly changing dimensions: implementation strategies

How SCD Type 1, 2, 3, and the hybrid types actually work in a production warehouse, including active row identification, fact loading under Type 2, and the edge cases that bite teams in practice.

Technique

Surrogate key management: generation, lookup, and pitfalls

How to generate and manage surrogate keys in a 2026 cloud warehouse: integer sequences, hash-based deterministic keys, UUID v7, the fact-loading lookup under Type 2 SCD, and the edge cases that produce silent errors.

Pattern

Where coding agents quietly get the warehouse wrong

AI-generated SQL that parses, runs, and passes your tests but answers the wrong question: the silent failure modes of coding agents in the warehouse, and what actually catches them.