The World is Mine

Bonjour! Here is your fresh Data News edition. It's the first day of July and summer break is about to start for many of you. Sadly, data never waits, so I hope you have enough team redundancy to be able to disconnect.

This week I published a new animated YouTube video: data explained to kids, in French but with English subtitles. BTW, if you want to help me make the English version, ping me 👋

Data Fundraising 💰

Raising money

Things you should know about databases

I often share content about database internals. This is a kind of content I really enjoy. It's time for you to learn new things you should know about databases. Mahdi wrote a post with great illustrations. He explains very well how indexes and transactions work. What really happens between BEGIN and COMMIT in your SQL query?
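To make the BEGIN and COMMIT question concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not taken from Mahdi's post: the `accounts` table, the `transfer` helper and the `balances` helper are hypothetical names invented for this illustration. The idea is simply that everything between BEGIN and COMMIT either happens entirely or not at all.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the BEGIN, COMMIT
# and ROLLBACK statements below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` atomically."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute("COMMIT")  # both updates become visible together
    except Exception:
        conn.execute("ROLLBACK")  # the first update is undone as well
        raise


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 30)` moves the money. A transfer larger than Alice's balance raises an error and, thanks to the ROLLBACK, leaves both rows exactly as they were.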

A rant against dbt ref

Complaining about dbt has become a trend. Given its adoption and how happy people are with it, it's normal to see dissenting voices at some point. It's Max's turn to rant against dbt ref.

I do agree with Max. ref manipulation is a pain point in dbt that breaks the magic. Especially when your workflow as an analyst is:

And you do this every day, for every model you touch. In the end you spend more time playing Where's Wally? with table names than writing SQL (ok, I exaggerate a bit, but you get it).

On this specific point, I think it's possible to develop a browser extension that replaces table names with the right dbt references on the fly, while waiting for some changes from the inside. Ping me if you want to build it with me.
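The core of that table-name replacement could look like the rough sketch below. It is written in Python rather than as an actual browser extension, and everything in it is an assumption: the `TABLE_TO_MODEL` mapping and the `to_dbt_refs` function are hypothetical names, and a regex is a naive stand-in for a real SQL parser (it only looks at names that directly follow FROM or JOIN).

```python
import re

# Hypothetical mapping from warehouse table names to dbt model names.
TABLE_TO_MODEL = {
    "analytics.orders": "orders",
    "analytics.customers": "customers",
}


def to_dbt_refs(sql, table_to_model):
    """Replace known table names after FROM/JOIN with dbt ref() calls."""
    if not table_to_model:
        return sql
    lookup = {name.lower(): model for name, model in table_to_model.items()}
    # Longest names first so a shorter name never wins over a longer one.
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(from|join)(\s+)("
        + "|".join(re.escape(name) for name in names)
        + r")(?![\w.])",  # do not match a prefix of a longer identifier
        re.IGNORECASE,
    )

    def replace(match):
        keyword, space, table = match.groups()
        model = lookup[table.lower()]
        return f"{keyword}{space}{{{{ ref('{model}') }}}}"

    return pattern.sub(replace, sql)
```

For example, `to_dbt_refs("select * from analytics.orders", TABLE_TO_MODEL)` gives back `select * from {{ ref('orders') }}`, while tables missing from the mapping are left untouched.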

The data quality no-one is speaking of

This is an intervention.

Thanks to the Modern Data Stack and dbt, we created SQL-driven platforms, and analysts are becoming SQL monkeys. This is not good. Pissing SQL all day long creates monstrosities. Rather than adding extra layers to achieve data quality, build quality from the inside out.

In this post, which I highly recommend, Petr speaks the truth. Everyone should go back to the root cause of data quality issues: your code complexity. It's time to "tame the complexity".

Finding the real data quality

Databricks Summit

Databricks Summit (called Data + AI Summit) is taking place. I don't have time to follow it, so here is Simon's feedback on Day 1 and Day 2. In a nutshell, they announced

ML Friday 🤖

Fast News ⚡️


PS: for the first time in a long time I'm not late. See you next week ❤️.