ETL vs ELT

ETL and ELT name the same three operations (extract, transform, load) executed in different orders against different infrastructure. The difference is small enough that the comparison reads as pedantic on first encounter and large enough that it determines pipeline cost, governance posture, and the shape of the team that operates the warehouse. The warehouse loading and operations pillar covers the ETL process broadly; this page is the focused comparison between the two patterns and the conditions under which each is the right choice. Both are approaches to data integration, the broader discipline of bringing source data into the warehouse; the data integration page surveys them alongside change data capture, replication, virtualization, and streaming.

TL;DR. ELT is the production default on modern cloud warehouses. Three things together favor ELT for most new builds: the economics of separated storage and compute, the maturity of in-warehouse transformation tooling (dbt being the dominant case), and the operational simplicity of running one system instead of two. ETL still wins when transformation must run before the data is allowed to land in governed storage, when source data must be filtered or masked before reaching the warehouse for compliance reasons, or when downstream destinations are not warehouses but operational systems that expect already-transformed records. The honest framing is that the cloud era has made ELT the better default for the majority case, not the right answer for every case.

What each one is

ETL

ETL stands for extract, transform, load, in that order. Data is extracted from source systems into a separate transformation environment (historically a dedicated ETL server, more recently a dedicated processing layer like Spark or a streaming engine), transformed there to fit the warehouse model, and only then loaded into the warehouse. The warehouse never sees the data in its raw, untransformed state.

The defining architectural commitment is that transformation runs outside the warehouse. Two consequences follow. The pipeline owns more infrastructure (the transformation environment is a separate system that needs to be sized, monitored, and maintained), and governance sits naturally at the boundary between extraction and load (data that is unsafe to land in the warehouse never gets there because transformation, masking, and quality checks happen first).
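The ordering can be made concrete with a small sketch. It assumes an in-memory SQLite database standing in for the warehouse and a hard-coded list standing in for the source system; the table, column, and function names are hypothetical. The point to notice is that filtering and masking finish before anything is written, so the warehouse table has no raw email column at all.

```python
import sqlite3

# Hypothetical source rows; in practice these come from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "email": "Ana@Example.com", "amount_cents": 1250, "status": "paid"},
    {"order_id": 2, "email": "bo@example.com", "amount_cents": 400, "status": "test"},
    {"order_id": 3, "email": "cy@example.com", "amount_cents": 9900, "status": "paid"},
]


def extract():
    """Extract: read rows from the source system (stubbed with a list here)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform outside the warehouse: filter, mask, and reshape."""
    shaped = []
    for row in rows:
        if row["status"] != "paid":
            continue  # quality filter: non-production orders never reach the warehouse
        # Masking: keep only the email domain, drop the address itself.
        domain = row["email"].split("@", 1)[1].lower()
        shaped.append((row["order_id"], domain, row["amount_cents"] / 100))
    return shaped


def load(conn, rows):
    """Load: only transformed rows are written to the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, email_domain TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite stands in for the warehouse in this sketch.
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
```

The transform step here is plain Python; in a real ETL deployment it would run on whatever the dedicated processing layer provides.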

ETL is the older pattern. It dominated through the 1990s and 2000s for legitimate reasons: warehouse storage was expensive, warehouse compute was even more so, and pushing transformation onto a separate, cheaper processing layer was good economics. Many production warehouses still run ETL because it has been working for a decade or longer and the migration cost outweighs the savings.

ELT

ELT stands for extract, load, transform. Data lands in the warehouse first, untransformed, and the transformation step runs inside the warehouse against data already in place. There is no separate transformation environment; the warehouse's own query engine performs the work the dedicated ETL server used to do.

The defining architectural commitment is that transformation runs inside the warehouse. The pipeline owns less infrastructure (the warehouse is doing double duty as both query engine and transformation engine), but partially governed data exists inside the warehouse before transformation has completed, which raises access control and governance questions that ETL environments handle before data ever arrives.
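A minimal sketch of the same idea, again assuming SQLite as a stand-in warehouse and hypothetical table names (a `raw_` prefix substitutes for a separate raw schema, which SQLite does not have in the same form). The raw rows land first, and the transformation is a SQL statement executed by the warehouse's own engine.

```python
import sqlite3

SOURCE_ROWS = [
    (1, "Ana@Example.com", 1250, "paid"),
    (2, "bo@example.com", 400, "test"),
    (3, "cy@example.com", 9900, "paid"),
]

# SQLite stands in for the warehouse in this sketch.
warehouse = sqlite3.connect(":memory:")

# Load: source rows land untransformed in a raw table.
warehouse.execute(
    "CREATE TABLE raw_orders "
    "(order_id INTEGER, email TEXT, amount_cents INTEGER, status TEXT)"
)
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", SOURCE_ROWS)

# Transform: the warehouse's query engine does the filtering and shaping.
warehouse.execute(
    """
    CREATE TABLE fact_orders AS
    SELECT order_id,
           lower(substr(email, instr(email, '@') + 1)) AS email_domain,
           amount_cents / 100.0 AS amount
    FROM raw_orders
    WHERE status = 'paid'
    """
)
warehouse.commit()
```

After this runs, `raw_orders` still holds every source row, unmasked emails included. That is the governance question the paragraph above raises: access to the raw table has to be restricted explicitly.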

ELT became practical at scale once cloud warehouses (Snowflake, BigQuery, Redshift, Databricks SQL) separated storage from compute and added elastic compute. Before that separation, in-warehouse transformation competed for the same scarce CPU and disk that queries needed. With separated compute, a transformation workload can run on dedicated compute that scales independently of the query workload, and the cost economics flip in favor of pushing the transformation step into the warehouse.

Why the comparison shifted in the cloud era

The pre-cloud version of this comparison was straightforward and stable for two decades. ETL was correct for governance, ELT was acceptable when source data was trivial, and most teams used ETL. The cloud columnar warehouse era changed the math on parts of the comparison but not on others.

The economics flipped. On separated-storage-and-compute warehouses, in-warehouse transformation is no longer competing with queries for the same compute. Transformation can run on dedicated warehouse compute that scales elastically; storage is cheap object-storage-backed; compute is metered separately. The historical "ETL keeps the warehouse free for analytics" argument is much weaker than it was when storage and compute were a single physical resource.

The tooling matured. dbt is the de-facto standard for in-warehouse SQL transformation: version-controlled models, dependency-resolved DAGs, generic and bespoke tests, documentation generated from the lineage graph. Coalesce, SQLMesh, and a handful of others occupy the same shape with different bets on what the developer experience should look like. The ELT-side tooling now has the operational rigor (testing, lineage, observability) that ETL tooling spent two decades developing.
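Dependency resolution is the piece that most distinguishes these frameworks from a folder of SQL scripts. The sketch below illustrates the idea with Python's standard-library `graphlib`; the model names are hypothetical, and this is not a description of how dbt or any other tool is implemented internally.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each name maps to the set of upstream models it selects from.
MODEL_DEPENDENCIES = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "customer_ltv": {"fct_orders"},
}


def build_order(dependencies):
    """Return model names in an order where every model follows its upstreams."""
    # TopologicalSorter raises graphlib.CycleError if the models depend on each other in a loop.
    return list(TopologicalSorter(dependencies).static_order())


run_order = build_order(MODEL_DEPENDENCIES)
```

A scheduler built on this ordering can run each model only after everything it reads from has been rebuilt, which is what makes a partial or incremental rebuild safe.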

The governance question is more subtle than it used to be. ETL had a clean answer to "where does sensitive data live before it's safe": outside the warehouse, in transit through the transformation environment, never landing in governed storage in raw form. ELT lands raw data in the warehouse first, which means access control on that raw layer has to be at least as strict as it would have been in the ETL transformation environment. Warehouses support this (row-level security, column masking, separate raw and staged schemas with restricted access) but it requires explicit configuration. Teams that assume "the data is in the warehouse, so it must be governed" without doing the schema-level work routinely discover that their raw layer is reachable from analyst queries that were supposed to be reading the curated layer.

What did not change. The order of operations still affects the same things it always did. ELT means raw source data exists inside the warehouse for some window; ETL means it does not. ELT means transformation has to be expressible against the warehouse's query engine; ETL allows arbitrary transformation logic. ELT requires the warehouse to be sized for transformation workload; ETL allows the transformation environment to be sized independently. These trade-offs are not artifacts of older infrastructure. They are properties of the order itself.

Comparison along key axes

Axis	ETL	ELT
Order of operations	Extract, transform, then load	Extract, load, then transform
Transformation runs	Outside the warehouse, in a dedicated processing layer	Inside the warehouse, against data already loaded
Raw data in warehouse	No; only transformed data lands	Yes; raw data lands first, gets transformed in place
Governance posture	Strong by default; raw data never reaches the warehouse	Requires explicit configuration; raw and curated layers need separate access control
Compute scaling	Transformation environment sized independently of query workload	Transformation and queries share warehouse compute (decoupled at the cluster / virtual warehouse level on cloud platforms)
Transformation language	Whatever the transformation environment supports (Spark, Python, Scala, custom code)	Limited to what the warehouse's query engine handles (SQL, plus UDFs in supported languages)
Tooling pattern	Dedicated ETL platforms or custom pipelines orchestrated externally	SQL transformation frameworks (dbt is the dominant case); the warehouse's own scheduling for orchestration
Latency profile	Transformation step adds to total pipeline time	Load completes first; transformation runs as a separate downstream step
Cost model	Dedicated transformation infrastructure plus warehouse compute and storage	Warehouse compute (typically separately metered for transformation vs. query workloads) plus storage
Fit for unstructured data	Better; transformation environment can parse and structure unfamiliar formats	Limited; warehouses parse some semi-structured formats well, others awkwardly
Operational complexity	Two systems to maintain (transformation environment plus warehouse)	One system (the warehouse) does both jobs

Two axes are worth specific elaboration because they are the ones most often misunderstood.

Governance posture is configuration, not architecture. ETL's "transformation before load" gives stronger governance for free: data that hasn't been cleaned and masked never reaches the warehouse, so it can't accidentally be queried by an analyst whose access was loosely scoped. ELT requires the same effect to be achieved through explicit configuration: separate schemas for raw and curated data, role-based access policies that restrict the raw layer, automated tests that catch when a curated query accidentally pulls from the raw layer. The configuration is real work and is the most common place ELT adoption produces governance regressions that didn't exist under ETL.
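One of the automated tests mentioned above can be sketched as a check that scans curated model SQL for direct references to the raw schema. The schema name `raw` and the model names are assumptions, and the regular expression is deliberately naive: it does not handle quoted identifiers, comments, or comma-style joins, so a production check would use a real SQL parser or the lineage graph of the transformation tool.

```python
import re

RAW_SCHEMA = "raw"  # hypothetical name of the restricted raw schema

# Matches "FROM raw.<table>" or "JOIN raw.<table>", case-insensitively.
_RAW_REF = re.compile(
    r"\b(?:from|join)\s+" + re.escape(RAW_SCHEMA) + r"\.(\w+)",
    re.IGNORECASE,
)


def raw_references(sql):
    """Return the raw-schema tables a SQL statement reads from directly."""
    return _RAW_REF.findall(sql)


def find_violations(curated_models):
    """Map each curated model name to the raw tables it touches, if any."""
    violations = {}
    for name, sql in curated_models.items():
        refs = raw_references(sql)
        if refs:
            violations[name] = refs
    return violations
```

Run in CI against the curated layer's SQL, a non-empty result would fail the build before the leak reaches production. It complements access policies on the raw schema and does not replace them.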

The transformation-language axis is more practical than philosophical. ETL allows arbitrary transformation in arbitrary languages; ELT constrains transformation to what the warehouse supports. For most analytical workloads this is a constraint without consequence because SQL is genuinely sufficient. For machine learning feature engineering, complex enrichment from external APIs, or transformation logic that needs to call out to specialized libraries, SQL alone is not enough and the warehouse's UDF support varies sharply by platform. Workloads where SQL is genuinely the wrong tool are still better served by ETL or by a hybrid pattern where the heavy lifting happens upstream and the warehouse handles only the final shaping.

[Figure: the architectural difference between the two patterns]

When ETL still wins

ELT being the default for new builds does not mean ETL is obsolete. The cases where ETL is the better choice are specific and worth recognizing in advance.

Compliance requirements that prohibit raw data in the warehouse. Some regulatory environments require that personally identifiable information, payment data, or healthcare records be masked, tokenized, or filtered before reaching the analytical layer. The cleanest way to enforce this is to run the masking and filtering as part of the extraction pipeline, outside the warehouse, so the warehouse never holds the unsafe data even briefly. ETL is the natural fit; configuring an ELT pipeline to never let raw sensitive data become queryable is possible but mechanically harder and easier to get wrong.
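As an illustration of masking in the extraction pipeline, the sketch below tokenizes sensitive fields with a keyed hash (HMAC-SHA256) before any record is handed to the load step. The field list and function names are hypothetical, keyed hashing is only one of several tokenization approaches, and whether it satisfies a given regulation is a question for the compliance owner, not for this example. Key storage and rotation are out of scope here.

```python
import hashlib
import hmac

# Hypothetical list of fields that must never reach the warehouse in clear form.
SENSITIVE_FIELDS = ("email", "card_number")


def tokenize(value, key):
    """Return a deterministic, keyed token for a sensitive string value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def mask_record(record, key):
    """Return a copy of the record with sensitive fields replaced by tokens."""
    masked = dict(record)  # copy, so the caller's record is left untouched
    for field in SENSITIVE_FIELDS:
        if masked.get(field) is not None:
            masked[field] = tokenize(str(masked[field]), key)
    return masked
```

Because the token is deterministic for a given key, the same email produces the same token across loads, so joins and distinct counts on the tokenized column still work in the warehouse.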

Transformations that are not expressible in SQL. When transformation requires calling external APIs for enrichment, running ML model inference, parsing unusual formats (industry-specific binary protocols, scientific instrument output), or applying logic that genuinely needs a procedural language, the work belongs upstream of the warehouse. Most teams in this position run a hybrid: heavy transformation runs in Spark or Python as part of the extract step, and the warehouse only handles the final dimensional shaping inside dbt or its equivalent.
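The hybrid split can be sketched as follows. The binary record layout (two unsigned 32-bit integers and a 32-bit float, little-endian) is invented for illustration, SQLite stands in for the warehouse, and the table name is hypothetical. Parsing happens in Python, upstream of the load; the final aggregation is left to SQL.

```python
import sqlite3
import struct

# Hypothetical instrument record: sensor id, epoch seconds, reading.
RECORD = struct.Struct("<IIf")


def parse_readings(blob):
    """Upstream step: decode fixed-width binary records into tuples."""
    if len(blob) % RECORD.size != 0:
        raise ValueError("blob length is not a whole number of records")
    return [
        RECORD.unpack_from(blob, offset)
        for offset in range(0, len(blob), RECORD.size)
    ]


def load_and_shape(conn, readings):
    """Warehouse step: land parsed rows, then do the final shaping in SQL."""
    conn.execute(
        "CREATE TABLE stg_readings (sensor_id INTEGER, ts INTEGER, value REAL)"
    )
    conn.executemany("INSERT INTO stg_readings VALUES (?, ?, ?)", readings)
    return conn.execute(
        "SELECT sensor_id, COUNT(*), AVG(value) "
        "FROM stg_readings GROUP BY sensor_id ORDER BY sensor_id"
    ).fetchall()
```

The boundary is the design choice: everything SQL handles poorly stays upstream, and everything declarative stays in the warehouse where the testing and lineage tooling can see it.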

Destinations that are not warehouses. When analytical pipelines also produce data for operational systems (a customer success platform that needs a refreshed list of accounts at risk, a marketing automation tool that needs segmented customer lists), the data is being loaded into systems that expect already-transformed records. That is ETL by definition; the transformation has to happen before the load. The reverse ETL section below covers this case in more detail.

Workloads where the warehouse is itself the cost constraint. ELT moves transformation cost into warehouse compute. For most cloud warehouses this is fine because compute is elastic and separately metered. For warehouses on consumption-priced platforms where transformation workload would meaningfully drive up the bill, pushing transformation back outside the warehouse can be the right cost decision. The math depends on the specific platform's pricing and the actual transformation volume; the decision is empirical, not categorical.
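The comparison is simple arithmetic once the inputs are measured. The sketch below shows its shape; every rate and hour count used with it is a placeholder, and a real comparison would also include engineering time, data egress, and any minimum billing increments the platform applies.

```python
def monthly_warehouse_transform_cost(hours_per_day, rate_per_hour, days=30):
    """Cost of running transformation on metered warehouse compute."""
    return hours_per_day * rate_per_hour * days


def monthly_external_transform_cost(
    hours_per_day, rate_per_hour, fixed_monthly_ops, days=30
):
    """Cost of running transformation on a separate processing layer."""
    return hours_per_day * rate_per_hour * days + fixed_monthly_ops


def cheaper_option(warehouse_cost, external_cost):
    """Return which side is cheaper, and by how much per month."""
    if warehouse_cost <= external_cost:
        return "warehouse", external_cost - warehouse_cost
    return "external", warehouse_cost - external_cost
```

With placeholder inputs of 2 warehouse hours per day at 16 per hour, against 4 external cluster hours per day at 3 per hour plus 500 in fixed monthly operations cost, the warehouse side comes to 960 and the external side to 860, so the external layer is cheaper by 100 per month. Different measured inputs can reverse the result, which is why the decision has to be made per workload.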

When ELT wins

For most analytical workloads on modern cloud warehouses, ELT is the default for the reasons sketched above. A few cases make the choice especially obvious.

Transformation is mostly SQL. Dimensional modeling, metric calculations, joining and aggregating across source tables, applying business logic that can be expressed declaratively all favor ELT. SQL is genuinely sufficient and the in-warehouse transformation tooling (dbt and its peers) handles testing, lineage, and documentation at a level of rigor that custom ETL pipelines typically do not match.

The team is SQL-first. Teams that are strongest in SQL and analytics rather than software engineering tend to find ELT more accessible. The barrier to adding a new transformation is writing a SQL query against existing tables, not learning a new processing framework. This shifts work from a specialized data engineering team to a wider population of analytics engineers and analysts who can contribute models against the warehouse.

Source data needs to be retained in raw form anyway. If the warehouse already holds raw source data for audit, replay, or future analysis, the marginal cost of running transformation against it is small. The retention argument and the ELT argument compose: hold the raw, transform downstream when needed.

Iteration speed matters more than infrastructure simplicity at the start. Spinning up a new dbt model against tables that already exist in the warehouse takes minutes. Building the equivalent ETL pipeline takes longer because the transformation environment has to be configured for the new logic. For teams that are still figuring out what the analytical questions are, ELT removes a layer of pre-commitment.

A note on push-down

Push-down optimization is the term the older comparison used to bridge ETL and ELT. It describes the technique where a tool that nominally runs ETL pushes specific transformation steps down to the source or target database engine instead of running them in its own processing layer. The original benefit was practical: avoid moving data out of the database, through the ETL server's transformation engine, and back again when the source or target database could perform the transformation in place.

Push-down was the bridge technique that let ETL platforms behave more like ELT for the cases where it made sense. In the cloud era the distinction has largely collapsed: warehouse-centric transformation tools run their work inside the warehouse by default, so push-down is no longer a distinct mode to switch on. Where the older framing made push-down sound like a sophisticated optimization, the modern framing is that push-down is what ELT does at the architectural level. The term still appears in older documentation; readers encountering it should know it is essentially synonymous with the transformation-in-warehouse argument the rest of this article makes.
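The difference between the two execution styles is easy to see in miniature. In this sketch SQLite stands in for the database engine and the `sales` table is hypothetical. The first function pulls every row into the client and aggregates there; the second pushes the aggregation down so that only the result crosses the boundary.

```python
import sqlite3
from collections import defaultdict


def make_sample_db():
    """Build a small in-memory database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("south", 5), ("north", 7)],
    )
    return conn


def totals_without_pushdown(conn):
    """Move every row out of the database, then aggregate in the client."""
    totals = defaultdict(int)
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)


def totals_with_pushdown(conn):
    """Let the database aggregate; only one row per region leaves it."""
    return dict(
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    )
```

Both functions return the same totals. What changes is how much data moves and which system spends the compute.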

A note on reverse ETL

Reverse ETL is a related but architecturally distinct pattern. Where ETL and ELT both move data into the warehouse for analytical use, reverse ETL moves data out of the warehouse and back into operational systems: customer data platforms, marketing automation, customer success tools, ad platforms, payment systems. The warehouse becomes a system of record for derived data (segments, scores, lifecycle stages) that operational systems then act on.

The "ETL" in reverse ETL is structural: the warehouse is now the source, the operational system is the target, and the data has to be transformed to fit the target's API shape before being loaded. The transformation step happens outside the operational system, before the load. By the literal definition that makes reverse ETL a form of ETL, just running in the opposite direction relative to the analytical layer.
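A minimal sketch of that shape, under stated assumptions: SQLite stands in for the warehouse, the `account_scores` table and the 0.7 threshold are hypothetical, and the payload layout (`externalId`, `properties`, batched under `records`) is invented, not any vendor's actual API. The `send` callable is injected so the sketch makes no network calls.

```python
import json
import sqlite3


def build_payloads(conn, batch_size=2):
    """Extract from the warehouse and transform rows into target-shaped batches."""
    rows = conn.execute(
        "SELECT account_id, churn_score FROM account_scores ORDER BY account_id"
    ).fetchall()
    records = [
        {
            "externalId": str(account_id),
            "properties": {"churn_risk": "high" if score >= 0.7 else "low"},
        }
        for account_id, score in rows
    ]
    # The hypothetical target accepts a limited number of records per request.
    return [
        json.dumps({"records": records[i : i + batch_size]})
        for i in range(0, len(records), batch_size)
    ]


def sync(conn, send):
    """Load: hand each already-transformed batch to the operational system."""
    sent = 0
    for payload in build_payloads(conn):
        send(payload)
        sent += 1
    return sent
```

In use, `send` would wrap the operational system's client library; passing `list.append` on an empty list is enough to inspect the payloads during development.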

The category exists because the warehouse-as-system-of-record-for-analytics model that ELT enabled produced data (computed segments, machine-learned scores, predicted churn, lifecycle status) that operational teams genuinely needed back in their tools. Hightouch, Census, and a small number of other vendors built the category around making this load reliable, schema-aware, and incrementally efficient. The commercial intensity of the space shows in its advertising prices (a $27.60 CPC).

Reverse ETL is a complement to ELT, not a replacement for it or an alternative to it. A typical 2026 stack runs ELT to land and transform analytical data in the warehouse, then runs reverse ETL to sync derived data back to the operational systems that need it.

Closing

The honest decision rule for a new build in 2026 is: default to ELT on a cloud warehouse with dbt or its equivalent handling the transformation layer; reach for ETL when compliance requires it, when SQL is not the right language for the work, or when destinations are operational systems rather than analytical ones. The choice is no longer ideological; it is workload-driven, and the workload usually favors ELT.

Teams running ETL at production scale on cloud warehouses should evaluate the migration economically rather than aesthetically. Migration is real work and the existing pipeline is probably reliable; the question is whether the operational complexity savings, the wider contributor pool that SQL transformation enables, and the cost savings of one system instead of two add up to enough to justify the project. They often do at organizations with active analytics engineering teams, and they often do not at organizations where ETL was set up once a decade ago and has been steadily working since.

The broader operational context for both patterns is in the warehouse loading and operations pillar, which covers full vs. incremental loading, change data capture, idempotency, and the monitoring practices that distinguish reliable pipelines from mostly-reliable ones. The data-architecture context is in the data warehouse pillar and in the data warehouse vs data lake vs data mart vs lakehouse comparison, which covers where each architecture sits relative to ELT and ETL workloads. The OLTP vs OLAP comparison covers the transactional-versus-analytical workload distinction beneath the ETL/ELT choice. Change data capture covers the upstream half of incremental loading and applies to both ETL and ELT pipelines. For cloud platform decisions, see the modern warehouse platforms pillar.

Reference

ETL and ELT predate any single canonical source; the patterns are codified across decades of practitioner work.

Ralph Kimball and Joe Caserta, The Data Warehouse ETL Toolkit, Wiley, 2004. The foundational treatment of ETL discipline for dimensional warehouses, including the architectural arguments that informed ETL's dominance through the 2000s.
dbt documentation. The reference for the modern ELT pattern's tooling layer; covers the in-warehouse transformation model that the comparison's "ELT wins" cases assume.
Martin Kleppmann, Designing Data-Intensive Applications, O'Reilly, 2017. Chapter 10 covers batch processing and the trade-offs between processing-in-place and processing-in-transit, which is the systems-level framing underneath the ETL vs ELT distinction.