Data News — Week 24.40
Data News #24.40 — Back in Paris, Forward Data Conference program is out, OpenAI and Meta new stuff, DuckCon and a lot of things.
Hey, hey, hey. I'm so sorry for this small break in the news. I was in the middle of starting my new company, nao, and moving back from Berlin to Paris. Still, I hope this edition finds you well. It will be a mix of personal news, the OpenAI saga and the usual data engineering stuff that I enjoy reading.
First things first, yes, I'm co-founding a company. We called the company nao and you can see it as a no-code semantic layer. I'm keeping a post about it for later, but if you're interested, hmu.
Then, my girlfriend and I decided to move back from Berlin to Paris after 2 years there. It's a professional move for both of us. We will miss Berlin, to be honest, but a big part of our social life is in Paris. Being in Paris will also make all the events and IRL stuff I attend or organise easier.
Forward Data Conference ✨
As a reminder, on November 25th I'm organising the Forward Data Conference. It will be a day to shape the future of the data community, where teams can come to learn and grow together. There are still tickets left: we have sold around 80% of them.
This week we announced the program, you can find it on our website. I really like the program we put together, it's a mix of Engineering and Strategic / Vision talks.
The conference will be held in French and English. A few talks will be given in French, but we will subtitle them live and we will also find a way to always have something in English in parallel for all English speakers.
You can use the BLEF_FWD24 promo code to get a 15% reduction on your ticket.
PS: dear readers, if you proposed a talk to the FDC that was rejected, I'm so sorry you did not get a detailed explanation. We received a lot of proposals and I wasn't able to write a personal message for every rejected talk. Tho, if you're wondering why, reach out to me and I will explain.
AI News 🤖
- OpenAI is our best saga about drama and tech, when is the Netflix show coming out?
- DevDay recap — OpenAI DevDay was the developer conference to announce features, models and stuff about their product. The "biggest" announcement was the Realtime API, targeting speech-to-speech applications.
In addition they introduced prompt caching to save on token costs and the possibility to fine-tune GPT-4o on vision. The last thing is Canvas, which is a new way to interact with the models, I'd say it's a mix of Notion and Anthropic's better UI. It's mandatory for OpenAI to improve and diversify their public UI/UX in order to compete with large app ecosystems.
- Whisper large v3 turbo — A new turbo version of Whisper has been released on Hugging Face (announcement). Following the Realtime voice API, it's great to see improvements in Whisper, the voice model.
- OpenAI to remove non-profit control and give Sam Altman equity — After a magic trick, Sam could receive equity in a company worth around $150b. The important note is also that OpenAI is moving its core business to a for-profit which will no longer be controlled by the non-profit board.
- Advanced Voice not available in EU — Advanced Voice is a Siri-like interface on top of ChatGPT capabilities. The unavailability in the EU is lobbying at its finest, fearing the AI Act or GDPR could harm innovation. Explain to me why companies with the best engineers in the world can't find a way to make things legal.
- They raised $6.6b at a $157b valuation (and $4b in debt). That's another ~$10b in total, after the first $10b in Jan 2023.
- Meta: if there were a race, Meta would be well positioned. Who would have thought, after the Metaverse choices?
- Meta Movie Gen — Meta announced new research on movie generation models. Let's be honest, for the moment it just feels unreal, like a video game or something in virtual reality. But in the end, this is maybe what we need?
- New hardware (powered with AI) — Two promising products have been demonstrated: a pair of glasses and a wristband that allows you to interact with virtual interfaces through finger movements.
- SAM 2, Segment Anything Model 2 can run on-device on Apple CoreML — A demo of image segmentation that runs 100% offline and on-device. Industrial applications might easily follow from this.
- Mark Zuckerberg says leaders should have technical skills if they want to call themselves a tech company. Yes, but technical leaders are also sometimes not the best ones, maybe the crazy ones, so other skills are required.
- Introducing contextual retrieval — Anthropic introduced a new way to do RAG with more context, which performs better than standard RAG.
- Meta and Google announced automatic dubbing for, respectively, Reels and YouTube videos, this is something. Translation looks like a use-case that is almost solved with LLMs. It unlocks a world where languages are no longer barriers, giving us instant access to content and discussions all around the world, especially if it can run on-device, cheaply.
- Web browser automation through agentic workflows — A GitHub repo with a demo using Gemini and Selenium to automate browser actions.
- New AutoGen architecture — AutoGen is an open-source programming framework for agentic workflows, they designed a new architecture (to be honest I don't know what it means).
- Klarna drama — Klarna's CEO announced he will shut down Salesforce and Workday to replace them with internal initiatives + AI. Let's see where it goes.
- Paris police wants to keep AI surveillance in place post-Olympics — Who could have predicted?
- Malt AI report — Malt is a French / European freelance marketplace and they dropped their new AI report. Below are a few things I noted while going through the report.
- Demand for Snowflake has increased a lot and it's close to Databricks in volume, tho Hadoop demand is still larger 🙃
- The biggest demand concerns AI stuff like LLM, Deep Learning, Machine Learning, scikit-learn, etc. — in 2024 there are 16k AI freelancer profiles
- dbt pops out as a specific skill on freelancer profiles
- AI engineers and scientists have an average daily rate of around 500€, which is 100€ more than the general tech and data category.
- AI supply is half data scientists, half all other tech positions (DA, DE, Back-end, SE, DevOps).
Fast News ⚡️
- CfP for DuckCon in Amsterdam on January 31, 2025 — In January next year, DuckCon will take place, and the call for papers is still open until Oct 18th. I might propose something about yato (?).
- dlt goes 1.0.0 — dlt announced their 1.0.0 version, as well as 1000 open-source customers in production. This version brings stability and marks a new milestone for the library.
Side note, I'm a dltHub investor.
- Airbyte is also going 1.0 — Following dlt (?), Airbyte is also going 1.0 with 3 objectives: more use-cases, reliability and better throughput performance.
- ❤️ NO SLIDES conference — Be careful before clicking on this link, you might lose yourself in a rabbit hole. Recently Timo organised a NO SLIDES conference, a conference where people would only share their screen and no slides. I participated to demo nao, but the demo failed, so the recording does not exist anymore (oops). Still, I've watched a few other talks and really enjoyed them.
- ELT with Kestra, DuckDB, dbt, Neon and Resend — How you can create declarative data pipelines with Kestra to move data using the trendy libraries.
- DuckDB is the foundation.
- Fast feedback when SQL writing — A nice experiment showcasing what writing SQL could look like tomorrow. Imagine getting results directly while typing to have a faster iteration loop.
- BigQuery jobs explorer refreshed — The Google team released a fresh new explorer for BigQuery jobs.
- Coursera and Joe Reis launched a Data Engineering Professional Certificate — I can't recommend Joe enough, he's one of the best when it comes to capturing the data engineering job, and the syllabus is great.
- Current state of Databricks SQL — "The best data warehouse is a lakehouse", lmao. Episode 21425325 in the competition between Snowflake and Databricks.
- The data death cycle — 5 traps you wanna avoid to deliver value with Data & AI products: the tech trap, the doing trap, the project trap, the silo trap and the performance-first trap. There is also a follow-up about silos by Hugo.
No comments
Mainly because of time and length of this issue.
- Is Excel immortal? — Benn
- Evolution of catwalk: model serving platform at Grab.
- How HomeToGo improved our Superset monitoring framework.
- The importance of clear software requirements.
Data Economy 💰
- OpenAI raises $6.6b at a $157b valuation. SoftBank goes in with half a billion.
- Supabase raises $80m Series C. It's an open-source Firebase alternative built on top of Postgres.
- Kestra raises $8m Series A. Kestra is an open-source orchestration engine, written in Java, and you create workflows using a declarative model. Ludovic the CTO wrote about turning an open-source project into a viable business.
- fal.ai raises $14m Series A. Readers who have been here for a long time might remember fal.ai: they were the first to propose a way to mix Python and dbt models with specific tooling, and they pivoted into a super-fast GenAI inference platform.
- Nvidia acquires OctoAI.
- BlackRock and Microsoft plan $30bn fund to invest in AI infrastructure.
- Voltron Data laid off 50+ employees recently. Voltron engineers are among the best when it comes to the under-the-hood engines powering our modern data platforms.
- :probabl. raised €5.5m Seed round. probabl is the official operator of the scikit-learn brand and will develop products and services around the library. Because we need the data science tooling to be and stay open-source.
Side note, I'm a :probabl investor.
See you soon ❤️