Okay, so in this next section the objectives are…
So, traditional data teams have engineers and analysts (I’m dropping ‘data’ from their job titles). Engineers are the backend people, getting data into databases in tidy structures on a regular cadence. Analysts sit downstream, exploring the data and building useful things with it. For analysts to get the most out of the data, engineers have to change how they do their jobs, and this requirement has led to a more modern structure with an analytics engineer: a bit of both.
The idea is that this leads to greater efficiency: if both roles sit with one person, there is no need to communicate between them.
Data warehouses have changed the game, and ETL processes are being replaced with ELT. Left behind are the self-managed database systems, where you add data, pull it back out, change it and add it back to the database. Data warehouses let you do all of that in the cloud, treating the tech as a service and handing the compute over to powerful remote machines. Storage is fairly cheap, which lets you load the raw data first and then process it within the warehouse. Run out of space or want more speed? Scaling is easy (if you have the cash).
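To make the ELT idea concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a cloud warehouse (a real warehouse would be a hosted service), and the table and column names are hypothetical. The raw rows are loaded untouched, then transformed with SQL inside the database, so the raw table is still there if the transformation ever needs to change.

```python
import sqlite3

# In-memory SQLite stands in for a cloud data warehouse (assumption for illustration only).
conn = sqlite3.connect(":memory:")

# E + L: load the raw data as-is, with no cleaning on the way in.
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_pence INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, " alice ", 1250, "complete"),
        (2, "BOB", 800, "returned"),
        (3, "alice", 400, "complete"),
    ],
)

# T: transform inside the "warehouse" with SQL, producing a tidy table for analysts.
# The raw_orders table is left intact alongside the new one.
conn.execute(
    """
    CREATE TABLE customer_revenue AS
    SELECT lower(trim(customer)) AS customer_name,
           SUM(amount_pence) / 100.0 AS revenue_gbp
    FROM raw_orders
    WHERE status = 'complete'
    GROUP BY lower(trim(customer))
    """
)

rows = conn.execute(
    "SELECT customer_name, revenue_gbp FROM customer_revenue ORDER BY customer_name"
).fetchall()
print(rows)  # [('alice', 16.5)]
```

The point of the ordering is that cleaning rules (trimming names, excluding returns) live in SQL that can be rerun against the untouched raw table, rather than being baked in before the data ever lands.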
The analytics engineer is the T in ELT, transforming data in the warehouse into the BI layer, ready for analysts to use. This frees the data engineers, who handle the EL, to work without fielding requests from analysts. So analytics engineers don’t replace data engineers or data analysts; they sit in the team alongside them.
That leaves a rough work split: data engineers own the EL, analytics engineers own the T, and analysts build on the BI layer.
But the lines are blurry and one person may wear many hats.
TWO chilli seedlings are withering, battle stations 🚨
(10 minutes later)
Okay, the gang have been watered and some pots have been prepped to pot them up later on (have to warm up the soil first because these tropical plants don’t like the northern UK climate).
It’s not a data loader and it’s not a BI tool. It works with data platforms to transform data after it has been loaded and before BI tools use it. It’s particularly well suited to the T in ELT, with its development interface, testing, documentation and deployment.
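The section doesn’t name the tool, but assuming it is dbt (the description of post-load transformation, testing, documentation and a DAG matches), its data tests work roughly like this: a test is a query that selects the rows that *fail*, and the test passes when that query returns nothing. The sketch below imitates that idea in plain Python with SQLite. It is not dbt’s implementation, and the helper names (`not_null_test`, `unique_test`, `run_test`) and tables are hypothetical.

```python
import sqlite3

# Hypothetical helpers: each builds a query that returns the FAILING rows.
# Table/column names are trusted constants here, so f-strings are acceptable for a sketch.
def not_null_test(table: str, column: str) -> str:
    return f"SELECT * FROM {table} WHERE {column} IS NULL"

def unique_test(table: str, column: str) -> str:
    return (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1"
    )

def run_test(conn: sqlite3.Connection, sql: str) -> tuple[bool, list]:
    failures = conn.execute(sql).fetchall()
    return len(failures) == 0, failures  # pass only if no failing rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (2, "Bob"), (None, "Ghost")],
)

# The transformation ("model"): one row per non-null id.
conn.execute(
    """
    CREATE TABLE customers AS
    SELECT id, MIN(name) AS name
    FROM raw_customers
    WHERE id IS NOT NULL
    GROUP BY id
    """
)

# Run both tests against the raw table and the transformed model.
results = {}
for table in ("raw_customers", "customers"):
    for test_name, build_sql in (("not_null", not_null_test), ("unique", unique_test)):
        passed, _ = run_test(conn, build_sql(table, "id"))
        results[(table, test_name)] = passed

print(results)
```

The raw table fails both tests (a null id and a duplicated id), while the transformed `customers` table passes, which is exactly the kind of check you want sitting on the T step before analysts touch the data.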
I’m really a data-visualisation person, so I enjoy the nodes and how they connect in the DAG: Directed Acyclic Graph (spelled out because I’ll forget).
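The “acyclic” part is what makes the graph runnable: because no model can end up depending on itself, there is always an order in which every model’s upstream models are built first. Here is a minimal sketch using Python’s standard-library `graphlib` (Python 3.9+). The model names are hypothetical, and in dbt the edges would come from `ref()` calls in each model rather than a hand-written dict.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical models, each mapped to the set of upstream nodes it depends on.
models = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "customer_revenue": {"stg_orders", "stg_customers"},
    "revenue_dashboard": {"customer_revenue"},
}

def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order where every node comes after all of its dependencies."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # graphlib puts the nodes forming the cycle in err.args[1].
        raise ValueError(f"Not acyclic, cycle found: {err.args[1]}") from err

order = build_order(models)
print(order)

# A cycle (a depends on b, b depends on a) has no valid build order.
try:
    build_order({"model_a": {"model_b"}, "model_b": {"model_a"}})
except ValueError as err:
    print(err)
```

Raw sources come out first and `revenue_dashboard` last, while the two staging models could run in either order (or in parallel), since neither depends on the other.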