Loading and Operations topic | Data Warehouse Info

The pillar

1

Pillar
Data warehouse loading and operations
How a data warehouse stays current: full vs incremental loading, change data capture, watermarks, load ordering, idempotency and recovery, late-arriving data, and the monitoring that keeps it reliable.

Techniques

12

Technique
Advanced dimensional modeling: bridge tables and the hard cases
Advanced dimensional modeling beyond the basics: bridge tables, multivalued and inferred-member dimensions, comment dimensions, and multi-timezone facts.
Technique
Building a data warehouse: a four-phase playbook
How a data warehouse project actually gets built, across discovery, design, development, and deployment, with the Kimball vs Inmon choice treated as a concrete decision, not a debate.
Technique
Change data capture: implementation strategies
How log-based, timestamp-based, and trigger-based CDC work in production: the snapshot-to-streaming handoff, schema-evolution failure modes, and the disciplines that keep pipelines correct.
Technique
Data cleansing in the warehouse: where it belongs
Where data cleansing sits in a modern warehouse load: the staging-to-curated boundary, the rule categories that catch real defects, the test-at-the-transform-layer pattern, and the observability that catches the drift the rules miss.
Technique
Data extraction models: full, incremental, and log-based CDC
The seven data extraction patterns a warehouse encounters in practice, what each one assumes about the source, where each one fails, and how the modern connector stack (Fivetran, Airbyte, Estuary, Debezium, Kafka) decides between them.
Technique
Data integration: approaches and when to use each
The main data integration approaches compared: ETL, ELT, change data capture, replication, data virtualization, and streaming, and the trade-offs in latency, source type, and governance that pick the winner.
Technique
Data masking in the data warehouse
How static, dynamic, and on-the-fly data masking actually work in a cloud warehouse, including the mask-before-load versus mask-in-warehouse axis, column-level masking policies on Snowflake, BigQuery, and Databricks, and the trade-offs between tokenization, encryption, and hashing under GDPR, CCPA, and HIPAA.
Technique
Data virtualization: federated query in modern stacks
How data virtualization works as a technique, what it shares with federated query and the logical data warehouse and where it differs from them, where it fits in cloud warehouse stacks, and the failure modes that determine whether virtualization holds up in production.
Technique
Data warehouse metadata: catalogs, lineage, and repositories
How technical, business, and operational metadata get organized in a modern warehouse stack, including the shift from monolithic metadata repositories to federated data catalogs, dbt-driven lineage, and OpenLineage as the cross-tool standard.
Technique
Data warehouse testing: validation, regression, and performance
What to test in a production warehouse pipeline, where each kind of test lives, and how dbt tests, Great Expectations, and contract patterns fit together without producing a green dashboard over wrong data.
Technique
Slowly changing dimensions: implementation strategies
How SCD Type 1, 2, 3, and the hybrid types actually work in a production warehouse, including active row identification, fact loading under Type 2, and the edge cases that bite teams in practice.
Technique
Surrogate key management: generation, lookup, and pitfalls
How to generate and manage surrogate keys in a 2026 cloud warehouse: integer sequences, hash-based deterministic keys, UUID v7, the fact-loading lookup under Type 2 SCD, and the edge cases that produce silent errors.

Pattern

1

Pattern
Where coding agents quietly get the warehouse wrong
AI-generated SQL that parses, runs, and passes your tests but answers the wrong question: the silent failure modes of coding agents in the warehouse, and what actually catches them.

Comparison

1

Comparison
ETL vs ELT
ETL vs ELT: what the order of operations actually changes, why cloud columnar warehouses shifted the default from ETL to ELT, the trade-offs that determine which pattern fits a given workload, and a note on where reverse ETL fits.

Decision

1

Decision
Referential integrity in a data warehouse
Referential integrity in a data warehouse is a decision, not a default. A framework for choosing between database-enforced foreign keys, informational constraints, ELT-layer assertions, and unenforced declarations on Snowflake, BigQuery, Redshift, Databricks, and lakehouse table formats.

Glossary

10

Change data capture

Data catalog

Data contract

Data lineage

Data quality

Idempotency

Inferred member

Referential integrity

Slowly changing dimension

Watermark