
How to Refactor SQL into dbt Cloud Models

This post is all about taking existing SQL scripts and migrating them into dbt:

  1. Migrating into dbt
  2. Implementing sources
  3. Refactoring strategies
  4. Cosmetic cleanups, CTE groupings
  5. Centralizing logic, dividing into models (stg, int, final)
  6. Auditing dbt vs existing code

1. Migration process

Starting from a SQL script we want to migrate into a dbt project, a good first step is to create a folder within your models directory called ‘legacy’ and copy the script there as a model. We can then run that model to check it all works in this environment.
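For example, with the script saved under the legacy folder (the file name here is a made-up example), you can build just that model from the dbt Cloud IDE or the command line to confirm it compiles and runs:

    -- models/legacy/customer_orders.sql  (the existing script, pasted as-is)

    dbt run --select customer_orders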

If you’re migrating between source systems (e.g. SQL Server to Snowflake), this is where work will be needed to translate between the flavours of SQL.
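For instance, a couple of typical SQL Server to Snowflake translations (the payments table is just an illustration):

-- SQL Server
select top 10
    isnull(amount, 0) as amount,
    getdate() as loaded_at
from payments

-- Snowflake equivalent
select
    coalesce(amount, 0) as amount,
    current_timestamp() as loaded_at
from payments
limit 10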

2. Sources

Look through the SQL for hardcoded table references. Create a folder hierarchy for the staging models following the convention of models > staging > source system. We can then create source configuration files within those system folders, calling them something like _sources.yml. It’s fine for these files to have the same name in different folders, as the folder path keeps them unique.
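As a minimal sketch, a _sources.yml for the jaffle_shop source used later in this post might look like this (the database and schema names are assumptions about where your raw data lands):

version: 2

sources:
  - name: jaffle_shop
    database: raw        # assumption: raw landing database
    schema: jaffle_shop
    tables:
      - name: customers
      - name: orders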

Now you can go through and replace all hardcoded table references with source references.
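The swap looks something like this (the hardcoded three-part path is illustrative):

-- before: hardcoded table reference
select * from raw.jaffle_shop.customers

-- after: source reference
select * from {{ source('jaffle_shop', 'customers') }}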

3. Choosing a refactoring strategy

This is about choosing the steps you’ll take to organise the models you create as part of the refactor.

One option is to refactor in place: create a branch and start working on the legacy model directly, changing sources, modifying the script structure and so on. The drawback is that you won’t have anything to audit against within this project/environment.

Another way to refactor is to work alongside the legacy model. You’d start by leaving the legacy model relatively untouched in the legacy folder, but also copying it to the folder that is your end goal, e.g. a marts folder. You’d then work on this copy in its final destination. It might feel like you have duplicate models being run, but that’s the point: something to audit against without relying on another environment.

4. CTE Groupings and cosmetic cleanups

Cosmetics

The goal is to make it easier to read.

If the script mixes upper and lower case, consider converting everything to lowercase.

This is also a good time to break long lines into indented, multi-line structures for readability.

CTE Grouping

At this point it will be useful to order our CTEs to make the script readable from top to bottom.

  1. Import CTEs
    Anything selecting directly from a source reference, and only that, e.g.:
    payments as (
        select * from {{ source('stripe', 'payment') }}
    )
    
  2. Logical CTEs
    Working with the import CTEs
  3. Final CTEs
    The final product of the script
  4. Simple Select Statement
    I was unconvinced by this part, but it’s a simple select * from <final CTE>, and apparently this makes life easier down the line; for example, while debugging you can repoint this final select at any earlier CTE to inspect intermediate results. (See the full sketch after this list.)
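Putting the four groupings together, a sketch of a reordered script might look like this (the stripe and jaffle_shop sources follow the examples above; the column names and join logic are invented for illustration):

with

-- 1. import CTEs
payments as (
    select * from {{ source('stripe', 'payment') }}
),

orders as (
    select * from {{ source('jaffle_shop', 'orders') }}
),

-- 2. logical CTEs
order_totals as (
    select
        order_id,
        sum(amount) as total_amount
    from payments
    group by order_id
),

-- 3. final CTE
final as (
    select
        orders.order_id,
        orders.customer_id,
        order_totals.total_amount
    from orders
    left join order_totals
        on orders.order_id = order_totals.order_id
)

-- 4. simple select statement
select * from final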

5. Centralizing logic, dividing into models

Identify which CTEs could become staging models and which belong in marts. Confused about which is which? Staging models do light, source-specific cleanup (renaming, casting), while mart models carry the business logic.

Aim to have staging models with structures like:

with

source as (
    select * from {{ source('jaffle_shop', 'customers') }}
),

transformed as (
    select
        id as customer_id,
        last_name as surname,
        first_name as givenname,
        first_name || ' ' || last_name as full_name
    from source
)

select * from transformed

These staging models are the only models to reference sources directly.

Then we can build intermediate models and our final mart models in a top-down fashion, building on the staging models with further CTEs to simplify the logic.
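As a sketch, a mart built this way might look like the following. The staging model names (stg_jaffle_shop__customers, stg_jaffle_shop__orders) and the count logic are assumptions following common dbt naming:

with

customers as (
    select * from {{ ref('stg_jaffle_shop__customers') }}
),

orders as (
    select * from {{ ref('stg_jaffle_shop__orders') }}
),

customer_orders as (
    select
        customers.customer_id,
        customers.full_name,
        count(orders.order_id) as order_count
    from customers
    left join orders
        on customers.customer_id = orders.customer_id
    group by 1, 2
)

select * from customer_orders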

6. Auditing

We could now select from our legacy model and from our new mart model to compare them. However, there are also packages that help do this; one of them is audit_helper. The dbt documentation covers packages in more detail.

Using audit_helper, you can quickly compare your new model against the legacy one, surface any discrepancies, and assert that the new model behaves as expected.
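As a sketch, you’d first add the package to packages.yml and run dbt deps (the version here is illustrative; check the dbt package hub for the current release):

packages:
  - package: dbt-labs/audit_helper
    version: 0.9.0

Then a scratch query like this uses audit_helper’s compare_relations macro to compare the two models row by row (the model names and primary key are assumptions matching the earlier examples):

{% set legacy_relation = ref('legacy_customer_orders') %}
{% set new_relation = ref('customer_orders') %}

{{ audit_helper.compare_relations(
    a_relation = legacy_relation,
    b_relation = new_relation,
    primary_key = "order_id"
) }}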