Data News — Week 22.36

Data News #22.36 — Arize and Hebbia 💰, Firebolt lay-offs?, data mesh/contracts, dashboard explosion, a big ML Friday and news.

Christophe Blefari

Hey, weeks are passing so fast. Every week I tell myself I have time until Friday, and then it's already Friday.

On the 21st I'll give a talk in French at a meetup: dbt and the modern data stack. I'll talk about dbt artifacts and my extension, dbt-helper. I'd love to see you there 🤗.

Enjoy this week's edition.

Data fundraising 💰

  • Arize, a machine learning observability platform, raised $38m Series B. It is used by big names. They integrate with the standard Python machine learning stack and offer a free tier. If you need drift detection, model monitoring or explainability, it's worth a look.
  • Hebbia, a document search engine, raised $30m Series A. Their website does not give much detail about what they do or how they do it. You can ingest PDFs, Office docs, etc. and then ask natural language questions to get answers.
  • 😥 Firebolt is apparently laying off dozens of employees. We don't have more information, but if it turns out to be true it'll be sad. It also shows that the data warehouse competition is harder than ever, and that their high valuation ($1.4b in January) was a tough one to live up to.
  • On the same sad note, Snap will shut down the Zenly app, laying off the whole Paris team. Almost 3 years ago I was in the same situation with my former employer, so I wish all the best to the Zenly team. As everyone is saying, Zenly was one of the best French tech teams, so if you are looking for talented people, try reaching out to them.

Do's and don'ts of data mesh

BlaBlaCar is one of the most advanced French companies when it comes to data. The travel company decided to implement a mesh organisation at the beginning of the year, rearranging 5 teams into 5 domains. Teams are cross-functional, like feature teams, and the domains are: demand, supply (x2), marketing and infrastructure.

In the post, Kineret details a few do's and don'ts for deciding to move to a mesh structure. As always with a migration, communication is one of the most important topics. With big changes, transparency should come first.

Continuing on the organisational aspect of a mesh: if you want your domain-oriented teams to succeed, you'll need to create a way for teams to communicate with each other. Data contracts are one piece of the puzzle. As data contracts have picked up again recently, mehdio explained how you can implement them and why they are important.

Small heads-up here: you can implement data contracts without an event bus, and even with an event bus you might still need to implement "contracts" that go deeper than just the messaging system, because you'll still have exceptions and a lot of stuff will happen outside of the bus.
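To make the "no event bus needed" point concrete, here is a minimal sketch of a contract expressed as plain code that a producer or a consumer could run on any record, wherever it comes from. Everything in it is an assumption for illustration: the `ContractField` class, the `validate` helper and the `ride_booked` fields are hypothetical and are not taken from the linked posts.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ContractField:
    """One field of a (hypothetical) data contract."""
    name: str
    dtype: type
    required: bool = True


# Hypothetical contract for a "ride_booked" record; field names are illustrative.
RIDE_BOOKED_CONTRACT = [
    ContractField("booking_id", str),
    ContractField("passenger_count", int),
    ContractField("promo_code", str, required=False),
]


def validate(record: dict[str, Any], contract: list[ContractField]) -> list[str]:
    """Return the list of contract violations; an empty list means the record complies."""
    errors = []
    for field in contract:
        value = record.get(field.name)
        if value is None:
            # Missing (or null) values only matter for required fields.
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        if not isinstance(value, field.dtype):
            errors.append(
                f"{field.name}: expected {field.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


if __name__ == "__main__":
    print(validate({"booking_id": "b-1", "passenger_count": 2}, RIDE_BOOKED_CONTRACT))
```

The same check could run in a batch job, in a test suite or in front of a message producer, which is the sense in which the contract lives outside of the bus.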

What if every dashboard self destructed

The title says it all. It's a fun title, but it means a lot. In data we have too many things. Many dashboards. Many tables. Many KPIs. What if we automatically destroyed dashboards? What if we did it based on view counts? We could also remove and clean up the whole data chain behind a dashboard. In real life I'm not a tidy person, but when it comes to data warehouses or BI tools I feel this matters way more than my bedroom.
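As a rough sketch of what "destroy based on view counts" could look like, the snippet below flags dashboards that nobody has opened recently or that barely get any views. The `Dashboard` fields, the thresholds and the function name are all hypothetical; a real BI tool would expose usage data through its own API, which is not modelled here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Dashboard:
    """Usage summary for one dashboard (hypothetical shape)."""
    name: str
    last_viewed: date
    views_last_90_days: int


def dashboards_to_expire(dashboards, today, max_idle_days=90, min_views=5):
    """Return names of dashboards that are idle for too long or almost never viewed."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        d.name
        for d in dashboards
        if d.last_viewed < cutoff or d.views_last_90_days < min_views
    ]


if __name__ == "__main__":
    sample = [
        Dashboard("weekly_revenue", date(2022, 9, 8), 120),
        Dashboard("old_campaign", date(2022, 3, 1), 0),
    ]
    print(dashboards_to_expire(sample, today=date(2022, 9, 9)))
```

In practice you would probably archive rather than delete, and notify the owner first, but the selection logic stays about this small.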

When people try to predict the future of BI, they often say that notebooks will replace dashboards. I don't think that will be the case, but it's a step forward. Looking even further ahead, people say that canvases will replace notebooks. I feel this is a good idea: to me it combines the creativity of a dashboard with the linear execution of a notebook to tell a good story.

A small piece of advice I heard this week in the excellent DataGen podcast (the Deezer episode, in French): whether you use a dashboard, a notebook, a canvas or whatever, when you release an analysis, record an additional video to add a voice-over to it. It will for sure help people get up to speed on your work faster.

ML Friday

  • The journey to real-time machine learning at Instacart: Whatever we say, today's data stacks are still mainly batch. The main reason is often that data is used for analytics, where batch is enough. That's also why machine learning often starts in batch. But if you want to go to production you'll need to be more reactive. Instacart details their journey from batch to real-time with a feature store at the center.
  • Unsung saga of MLOps: Jaya from Walmart writes about the operational concepts around machine learning in production. The post covers training, modeling and canary deployment.
  • Evolving DoorDash’s substitution recommendations algorithm: How can a retailer recommend products when some are not available? This is a great machine learning exercise for aspiring data scientists.
  • Recommendations APIs at Slack: This is a bit of an insider post that shows where Slack uses ML and what API infrastructure sits behind it. Mainly batch, orchestrated by Airflow. Next time Slackbot suggests you leave a channel, you'll know what's behind it.
  • Recommender System Optimization: Music Tomorrow is a platform that provides insights to music professionals. They reverse-engineered the Spotify recommendation engine to help the music industry create more recommended content ➰.
  • Acing the data science interview: 8 practical tips with examples

See you next week.