Data News — Week 24.02
Data News #24.02 (late): First DN edition of the year, let's catch up on the awesome content written over the last few weeks.
Hello you. Back to the usual Data News—with a little delay, I'm sorry.
First of all, I'd like to thank you for your positive comments on last week's article. It's a subject close to my heart and I was very happy to share it with you, because I never thought that Data News would become such a big part of my life.
I'm starting my annual university lectures today. It's always very exciting to go back and teach students, to help them discover the world of data from another perspective. The details: it's a 27-hour course called DataOps. It's quite a broad subject. I actually cover data engineering and how to put data stuff into production.
For years I gave a 30-hour lecture called Python for Data Science in which I covered the basics of Python, pandas and scikit-learn. But I stopped 2 years ago because it had become too much and too repetitive for me. I'm very happy with this new DataOps lecture because it's much closer to what's really going on in the data world.
Over the years I've accumulated exercises and one day, hopefully this year, I'll share them with everyone in a nice way.
It's funny because in the days leading up to the lecture, I'm always stressed about something: I'm always afraid I'm going to run out of content. The last thing I want to do is give a boring class.
Wish me luck and have fun reading the news.
AI News 🤖
- Bill Gates talks with Sam Altman: A 30-minute episode of Bill Gates' podcast where he chats with Sam Altman.
- 14 predictions about AI: In a long-form article, Vincent shares his predictions about AI and the trends we might see in 2024. Garbage in, garbage out is still one of the most important issues. Personally, I have a question for authors in 2024: when are you going to stop generating images to illustrate articles? They're horrible and destroy the content. If I had to predict something, it would be that this trend stops.
- Meta, from audio to synthesize human in conversation: Are we finally seeing an outcome of the billions Meta invested in the Metaverse? 🙃 To be honest, this is impressive: from an audio recording, Meta is able to generate a photorealistic avatar that behaves as if you were the one speaking.
- How Meta is advancing Gen AI: A podcast about Meta's GenAI breakthroughs.
- Microsoft will replace the Windows keycap with a Copilot key: This might be a major change to Windows computers and keyboards, as Microsoft wants to add a physical AI trigger on every keyboard. It might be the best adoption trigger we have ever seen.
- I coded exclusively with ChatGPT for 30 Days — Good takeaways about a nice experiment.
- IBM explaining why they invested in HuggingFace: During a gold rush, sell shovels. It explains NVIDIA's 2023 success, and HuggingFace is legendary for the same reason. HF became the de facto platform for sharing and showcasing AI models.
- Sentence embeddings — After reading this article you will be able to do a PhD in embeddings. Personally I did not read it but if you want to understand embeddings you should.
Fast News ⚡️
- dbt related stuff
- Download artifacts from your dbt Cloud job runs: A tutorial from a CLI tool that generates ERD diagrams for dbt Cloud projects.
- Testing dbt macros — A clever pattern to write unit tests on dbt macros with a model computing all the possible macro values and a dbt test checking all the possible cases.
- Unit testing dbt models: Using the dbt-unit-testing package, Matthieu showcases how you can easily test your models.
- dbt meta tag: A list of the companies having product features that depend on the meta tag. It shows how deeply dbt has changed the data world.
- What I would do differently getting into Data Engineering: Data engineering has changed a lot in recent years and Daniel gives 3 pieces of advice that you should consider to get into data engineering. Learn SQL, be social and learn to say no.
- Lead Data Engineer career guide — Detailed skillsets needed to be a lead data engineer.
- Effectively managing junior developers on remote teams: In the current state of the ecosystem, it is super important to give juniors a perfect introduction to the data world.
- Every data transform is technical debt.
- How BigQuery stores semi-structured data? — It relates to Dremel and parquet structures.
- Mixpanel modern data stack fast lane.
- Netflix video processing rebuilt with microservices.
- How Monzo built Year in Monzo.
- A/B Testing at HomeToGo.
- Datadog, scaling self-serve analytics, serving 5000 employees — 🤯.
- 2024: the year of the value-driven data person.
- Transfer data from BigQuery to Fabric with Arrow and Rust.
- Removing egress fees when moving off Google Cloud.
- The evolution of a data platform.
- Fixit, MotherDuck SQL AI error fixer.
- What's next for Malloy in 2024.
Data Economy 💰
- Talend will shut down Talend Open Studio, their open-source version, on January 31. As a reminder, Talend was acquired by Qlik 9 months ago. This is probably a strategy to keep money flowing. See you Talend 👋.
- Alteryx to be acquired by private equity firms in $4.4B deal. OK.
See you soon ❤️.