Skip to content

Data News — Week 24.45

Data News #24.45 — dlt Paris meetup and Forward Data Conference approaching soon, SearchGPT, new Mistral API, dbt Coalesce and announcements and more.

Christophe Blefari
Christophe Blefari
6 min read
Métro-boulot-dodo (credits)

It's Data News time. Time really flies on my side, and apart from the bad news from across the Atlantic, all is well on my side. To be honest, I miss you folks. Writing here has been my little thing for the last 3 years and because I haven't been able to get back to my previous frequency since July, I feel empty every Friday.

I'm back in Paris and, wow, the way I live my life in Paris is so different from Berlin, Paris demands speed at every level. I've only been back 6 weeks and I feel like I haven't even left the last 2 years. I haven't yet settled back into my routines by jumping between all the hats I've decided to accumulate over the last few years: content, freelancing, founder and conference organiser.

I don't know when I'll be able to start writing here once a week again, but I'm doing my best to do it as soon as possible.

Enough.


On November 19th, I'm organising with dltHub folks the first dlt Paris community meetup, the event will take place at 42 and will start a 16h. It will feature:

  • Navigating the complexities of enterprise ELT, towards data democracy and cost efficiency
  • Me — Towards a simple future (dlt, DuckDB, yato and more)
  • dltHub CTO, Marcin Rudolf — A teaser of the upcoming dltHub "Portable Data Lake"
  • Lightning community talks (reach out if you want to present something)

It will be only a few days before the Forward Data Conference that we sold out a few weeks ago, the program is out—the official schedule is coming soon. We are very proud and honoured, along with the organising committee, that around 300 people took a paid ticket to the event. I'm sorry for those who weren't able to get a ticket, we've set up a waiting list and we're doing our best to find a way to push the walls.

I would also like to thank the sponsors who are accompanying us on this exciting adventure: Castordoc, Omni and Corail Analytics, Mirakl, SYNQ, nibble, Sparkline, Monte Carlo.

Whether at the dlt community meetup or at Forward, I look forward (no pun intended) to meeting you all.

Conferences coming—I'm hyped (credits)

AI News 🤖

I'm a bit offended, AI news is not what it used to be 🙃, we were used to more exciting news, competition and drama by the space.

  • ChatGPT Search — OpenAI finally plugged ChatGPT to internet and live data. You can now switch on the web logo and ask for model to search on the web alongside his training knowledge. When you mix it to the new Canvas UI ChatGPT look more and more like Google Search results.
  • HubSpot co-founder bought chat.com earlier this year and sold it to OpenAI for shares [via Oliver].
  • Claude Computer Use — Like a soufflé, everyone was hyped when Claude released Computer Use, a chat interacting with an operating system, but a few days after it looks like almost everyone forgot it, like the 01 interpreter. If you wanna try Computer Use (at your own risk), there is a repo with a Docker image launching a VNC and a Streamlit app—it works fine.
  • New Mistral APIs — a batch API, for batching calls rather than doing it synchronously lowering by 50% the costs and a moderation API which is a 0-categories classifier scoring text into intent.
  • IBM released lightweight open-foundation models — 2 sets of "small" models: 2B / 8B dense models and mixture-of-experts 1B / 3B models. IBM has proudly shared the datasets they used to train their model.
  • Skrub: Less data wrangling, more machine learning — skrub is a preprocessing / feature engineering library for tabular machine learning. The video emphasis something critical, even if we often talk about training impact—time, carbon footprint—we tend to forget that inference is also a critical part and more importantly because of the preprocessing. So your preprocessing matters.
  • Standford, "Building Large Language Models (LLMs)" — 1h44 of a Stanford class about building LLMs. I did not watch it, but I bet you gonna learn at least a thing watching it.
  • Generative AI’s Act o1 — Sequoia essay deep dives into the current state of LLM apps and infra which has been stabilised around Microsoft/OpenAI, AWS/Anthropic, Meta and Google/DeepMind. The text touches fast vs. slow reasoning, System 1 and System 2 thinking, while in the awaiting the all-mighty AGI to come plunging us in a new technology era.
  • Apple proved LLMs do not reason (at least Mathematically) — Did we actually needed Apple for this? Because it's actually a design nor a bug or a feature. Apple researchers published a paper saying: "our work underscores significant limitations in the ability of LLMs to perform genuine mathematical reasoning". No way.
  • The future is robot — In the recent weeks a lot of new robots were (re)-announced with eventually new features like Humanoid robots, Tesla Optimus, Boston dynamics. Happy to learn that we really need robots to fix the jobs market.

Fast News ⚡️

Google lacks money (credits)

Food for thoughts


See you soon ❤️

PS: for our product research with nao we are looking for analytics engineers working on modeling daily. If you fall into this bucket, answer by saying "hi i'm an analytics engineer" and we will follow-up on this 🤗.

Data News

Data Explorer

The hub to explore Data News links

Search and bookmark more than 2500 links

Explore

Christophe Blefari

Staff Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.

Comments


Related Posts

Members Public

Data News — Week 24.40

Data News #24.40 — Back in Paris, Forward Data Conference program is out, OpenAI and Meta new stuff, DuckCon and a lot of things.

Members Public

Data News — Week 24.37

Data News #24.37 — OpenAI o1 new series, building low cost platform with Model dlt and dbt, Data teams survey, feature store, Ibis without pandas.