So far we’ve been using references to database objects within our models using <database>.<schema>.<table>
etc, which is a pain when that data moves or you want to point at a different version of it as you have to update lots of script.
Sources are for making this all easier.
Sources have their own yml, letting you define where it is etc and in your models you can use the {{ source('stripe','payments') }}
, like ref
previously.
Now any changes to the location of that table, or you want to use the a table in another location, you can just edit the source yml rather than ever reference to it.
But you’ll also now get sources appearing in your lineage!
Source files look something like:
version: 2
sources:
- name: jaffle_shop
database: raw
schema: jaffle_shop
tables:
- name: orders
- name: customers
- name: stripe
tables:
- name: payments
We can use the sources configuration to indicate the freshness of sources and create warnings or errors if they become stale.
Requires: a field in the source indicating the freshness of the data
You specify a number (or count) of minutes, hours or days (use singular syntax) to create a warning or error. These warnings or error will be shown when you build your models using this source or when you test the freshness with dbt source freshness
.
Freshness tests can be applied at table or schema level if you have the same field in each table of that scheme to test. If you want to exclude a table from these schema level configuration you can specify the freshness test as null under that table.
When creating a sources config from blank, I assume like most configs in dbt cloud, you can use __sources
shortcut to create a basic config:
sources:
- name: source_name
tables:
- name: table_name
When you use a shortcut, you’ll often see greyed out parameters that you have to fill out. You can start typing right away and it will change the first parameter. To get to the next one, don’t you dare touch your mouse, press tab
and you can straight away start typing on the next.