Mastodon and Hadoop are on a boat... (credits)

Hey you, the 11th of November used to be a day off for me. Since I started freelancing I don't really follow the usual calendar, and I work whenever I need or want to. I mainly work 3 to 4 days a week, which is awesome, but it has a major drawback: I have never taken a break longer than 1 week. Which, yeah, kinda sucks. Let's change this next year.

On a social note, today I joined the data-folks Mastodon server, and you can follow me there. I'll add this new community as a source for my curation and I'm gonna try to be active there.

Also, on the 21st of November I'm gonna speak at a meetup in Berlin, for the first time in English. So if you wanna listen to my terrible French accent, join us. I'll speak about "How to build the data dream team".

Let's jump into the news.

Ingredients of a Data Warehouse

Going back to basics. Kovid wrote an article that tries to explain what the ingredients of a data warehouse are. And he does it well. A data warehouse is a piece of technology that rests on 3 ideas: data modeling, data storage and the processing engine.

In the post Kovid details each idea. In this cloud world where everything is serverless, good data modeling is still a key factor in the performance (which often means cost) of a data platform. Modeling is often led by dimensional modeling, but you can also do 3NF or data vault. When it comes to storage, it's mainly a row-based vs. column-based discussion, which in the end impacts how the engine processes data.
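To make the row-based vs. column-based point a bit more concrete, here is a minimal Python sketch. It is not taken from Kovid's article: the toy `rows` table and all function names are made up for illustration, and real engines add compression, pages and vectorized execution on top of this idea.

```python
# Hypothetical toy table, only for illustration.
rows = [
    {"order_id": 1, "country": "FR", "amount": 10.0},
    {"order_id": 2, "country": "DE", "amount": 25.5},
    {"order_id": 3, "country": "FR", "amount": 4.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def total_amount_row_layout(rows):
    # A row layout has to visit every record to pull out one field.
    return sum(row["amount"] for row in rows)


def total_amount_column_layout(columns):
    # A column layout reads only the single column the query needs.
    return sum(columns["amount"])


def get_order_row_layout(rows, index):
    # Fetching a full record is one lookup in a row layout.
    return rows[index]


def get_order_column_layout(columns, index):
    # The same record has to be reassembled from every column.
    return {name: values[index] for name, values in columns.items()}
```

Both layouts return the same answers. What changes is how much data has to be touched: aggregations over one field favor the column layout, while fetching whole records favors the row layout.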

Schema changes management

A story of an int becoming a str (credits)

I bet that the most common data horror stories are about schema changes. It could be because the product team changed an integer to a varchar in a source Postgres table, or because an analyst removed the tax field in the income table. Every time, it means morning headaches with Slack messages, Airflow screaming at you with red circles and downstream pipelines to re-run.
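As a rough illustration of those two horror stories, here is a minimal Python sketch of a schema diff you could run before a pipeline starts. Everything in it is an assumption for the example: the `diff_schemas` helper, the column names and the type labels are hypothetical, and in practice the observed schema would come from the source database's catalog rather than a hard-coded dict.

```python
def diff_schemas(expected, observed):
    """Compare two {column: type} mappings and list what changed."""
    changes = []
    for column, expected_type in expected.items():
        if column not in observed:
            changes.append(f"removed: {column}")
        elif observed[column] != expected_type:
            changes.append(
                f"type changed: {column} {expected_type} -> {observed[column]}"
            )
    for column in observed:
        if column not in expected:
            changes.append(f"added: {column}")
    return changes


# Hypothetical schemas: what the pipeline was built against...
expected = {"id": "integer", "amount": "integer", "tax": "numeric"}
# ...and what the source table looks like this morning.
observed = {"id": "integer", "amount": "varchar"}

changes = diff_schemas(expected, observed)
for change in changes:
    print(change)
```

Failing fast on a non-empty `changes` list will not fix the upstream change, but it turns a broken downstream model into an explicit alert at the start of the run.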

Fast forward to today, more and more teams are trying to fix this. Here are a few articles that will give you some ideas about what to do (tbh, there isn't a one-stop solution to fix it):

Machine learning at Riot Games

If you play video games like me, you'll like this video. If not, I think you'll still like it. This is a morning coffee from the MLOps Community with Ian Schweer, who works at Riot Games. Ian describes how Riot Games uses data and what machine learning means for them.

Even though I recommend you watch the video, here are a few points I noted that were interesting to me:

Fast News ⚡️

Delivering the fast news (credits)

Data Fundraising 💰

PS: Regarding database trends, Cloud Database Report wrote a great article about 7 current database market trends: more serverless, graph, vector, Postgres used everywhere, etc.


See you next week.