Learn data engineering, all the references (credits)

This is a special edition of the Data News. Right now I'm on holiday, finishing a hiking week in Corsica 🥾, so I wrote this special edition about how to learn data engineering in 2024.

The aim of this post is to create a repository of important links and concepts we should care about when we do data engineering. Obviously I'm biased, so if you feel I missed something, do not hesitate to ping me with stuff to add. The idea is to create a living reference about data engineering.


A bit of context

It's important to take a step back and understand where data engineering comes from. Data engineering inherits from years of data practices in big US companies. Hadoop initially led the way with Big Data and on-premise distributed computing, before the field finally landed on the Modern Data Stack (in the cloud) with a data warehouse at the center.

To understand today's data engineering, I think it is important to know at least the Hadoop concepts and context, along with computer science basics.

Who are the data engineers?

Every company out there has its own definition of the data engineer role. In my opinion we can easily say a data engineer is a software engineer working with data. The idea is to solve data problems by building software. Obviously, as data is different from a "traditional product" (in terms of users, for instance), a data engineer uses other tools.

To define the data engineer profile, here are some resources that define data roles and their boundaries.

What is data engineering

As I said before, data engineering is still a young discipline with many different definitions. Still, we can find common ground by mixing software engineering, DevOps principles, an understanding of cloud (or on-prem) systems and data literacy.

If you are new to data engineering you should start by reading the holy trinity from Maxime Beauchemin. Some years ago he wrote 3 articles defining the data engineering field.

There is a broad consensus that you need to master a programming language (Python or Java based) and SQL in order to be self-sufficient.
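To make the "programming language plus SQL" point concrete, here is a minimal sketch in Python using the standard library `sqlite3` module. The table, the sample rows and the `total_per_user` helper are all made up for illustration; a real pipeline would talk to a warehouse instead of an in-memory database.

```python
import sqlite3

# Hypothetical sample data: (user_id, amount) order rows.
orders = [(1, 20.0), (1, 5.5), (2, 12.0)]


def total_per_user(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Python handles the plumbing: connection, schema, loading.
        conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # SQL expresses the transformation itself.
        cursor = conn.execute(
            "SELECT user_id, SUM(amount) FROM orders "
            "GROUP BY user_id ORDER BY user_id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


totals = total_per_user(orders)
```

The split is the interesting part: the general-purpose language moves and orchestrates the data, while SQL describes what to compute.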

Some concepts

When doing data engineering you can touch a lot of different concepts. First, read the Data Engineering Manifesto. It is not official in any way, but it nicely depicts the concepts data engineers face daily.

Then here is a list of general resources that can help you navigate the field:

If we go a bit deeper, I think that every data engineer should have the basics in:

Is it really modern? (credits)

The modern (and the future) data stack

Coming from Hadoop (also called the old data stack), people are now building modern data stacks. This is a new way to describe data platforms with a warehouse at the core, where all the company data and KPIs sit. Below are some key articles defining this new paradigm.

And now some articles I like that will help you get inspiration.


Once again, if you feel I forgot something important, do not hesitate to tell me. I'll keep adding to this article in the future.

If you enjoyed this article please consider subscribing to my weekly newsletter about data where I demystify all these concepts. I help you save 5 hours of curation per week.