Friday, July 13, 2012

ETL, Not Always In That Order

Almost everyone in the Data Warehouse business has extracted data from a source system, then applied processes to cleanse it, de-duplicate it, aggregrate it, or many other operations with it, then loaded into the final resting place of a respective Operational Data Store (ODS) or Data Warehouse.  We as data professionals call that process ETL which stands for Extraction, Transformation, and Loading.  Nothing earth shattering about this and used commonly.

At a Data Warehouse client of ours I have heard their employees use the terms ETL and ELT interchangeably. Every time I heard them say this I thought maybe they just got the acronym mixed up; which is understandable.  I don't how many times I confused CTE (common table expression) and called it CET, but know it was more than a handful.

So I asked an employee at the client why he used the acronym in that order and to my surprise he said Extract, Load, then Transform and explained it.  I had that look of a cartoon character when their mouth drops, hits the floor and their tongue rolls out.  Like this...although my overalls and bow tie are much cooler.

I decided to research this. To my surprise both terms are correct and used in the industry as well as with a third term; ETLT.  Now, I'd be remiss if I didn't state that I always speak about ETL as a layer of a BI solution or in general the process of transporting data.  I usually don't speak about the exact procedure of ETL, ELT, ETLT, although I have done it many times.  I've just never spoke about it in this way.

Let's define each procedure and the differences:

ETL: Classically the way Data Warehouse / Business Intelligence professionals categorize the process of extracting data from the source, combining, de-duping, cleansing (transforming) it, and loading it into the target Data Warehouse or ODS. ETL can also be the exact order of the procedure used.

ELT: An alternative procedure to perform the ETL process on the target server. i.e. Extract the data from the source, load it to the ODS or Data Warehouse, then transform it to its final form.  This tends to be heavy on the destination relational database management system where the Data Warehouse or ODS lives. There is a need to have enough horsepower to handle it in the Data Warehouse or the ODS.

ETLT: The same description as ETL above, then one last transformation on the target Data Warehouse or ODS. ETLT is slightly heavier on the destination RDBMS system than ETL is, but not as heavy as ELT due to the fact that in ETLT it is usually a few final transformations and not the bulk of them.

No comments:

Post a Comment