
Data News — Week 17

Data News #17 — Mozart Data and Deepset fundraising, conferences time, Netflix fact store, data compression in Python, Zach Wilson on Reddit, etc.

Christophe Blefari
4 min read
Some people are saying I'm the Data News Mozart (credits)

Hello friends, you'll find this week's newsletter below. It's heavy on bulleted lists because I had less time than usual to write it. Enjoy, and see you next week!

Data fundraising 💰

  • Mozart Data raised $15m in Series A. Mozart Data provides an all-in-one data platform for modern needs. Snowflake was picked behind the scenes as the primary warehouse, and customers get access to a lot of connectors, SQL-based transformations and exports. This kind of cloud data platform has flourished over the last two years to give companies a faster cold start.
  • Deepset announced a $14m Series A and a cloud version of their end-to-end NLP search platform. Deepset has open-sourced haystack, an NLP framework for neural search, question answering and semantic document search. I got an idea involving haystack, stay tuned...

Conferences time

Recently a lot of conferences took place. Here are some recordings and topics I liked.

Netflix ML fact store

This is maybe the first time you hear of this concept. Conceptually, we commonly call it a feature store: the idea is to store, at any point in time, the value of facts (or machine learning features) about your users. Netflix decided to call it a fact store. This is a fact. Still, their architecture is interesting.

Everything relies on a data store called Axion, which is a mix of Spark, a cache and Iceberg. Obviously they developed their own in-memory key-value cache, called EVCache. Every feature store contains a key-value storage layer because you need data per ML "entity".
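To make the key-value-per-entity idea concrete, here is a minimal Python sketch of a fact store. The TinyFactStore class, its entity ids and fact names are hypothetical and only illustrate the concept; this is not how Axion or EVCache actually work.

```python
from datetime import datetime, timezone

class TinyFactStore:
    """Toy key-value fact store: facts (features) are stored per ML entity.

    Purely illustrative, not Netflix's Axion or EVCache.
    """

    def __init__(self):
        # (entity_id, fact_name) -> list of (timestamp, value), newest last
        self._facts = {}

    def put(self, entity_id: str, fact_name: str, value) -> None:
        key = (entity_id, fact_name)
        self._facts.setdefault(key, []).append((datetime.now(timezone.utc), value))

    def get(self, entity_id: str, fact_name: str):
        # Return the latest recorded value of this fact for this entity, if any
        history = self._facts.get((entity_id, fact_name), [])
        return history[-1][1] if history else None

store = TinyFactStore()
store.put("user_42", "watch_time_minutes", 37)
print(store.get("user_42", "watch_time_minutes"))  # 37
```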

If you still struggle to understand what feature engineering is, Swapnil tries to unravel the feature engineering mystery.

As a follow-up, a small ML Friday 🤖

This is blue. This is a fact. (credits)

How to build a lossless data compression and data decompression pipeline

Navigating compression engines can be hard. To help you understand what it means, Ramses explains the building blocks of a data compression pipeline and gives an example of the bzip2 algorithm written in Python. After reading the post you'll feel like Richard Hendricks.
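If you want a quick taste before diving into the post, here is a minimal lossless round trip using Python's built-in bz2 module; this relies on the standard-library bindings rather than Ramses' step-by-step implementation.

```python
import bz2

def compress(data: bytes) -> bytes:
    # bzip2 applies the Burrows-Wheeler transform plus Huffman coding under the hood
    return bz2.compress(data, compresslevel=9)

def decompress(blob: bytes) -> bytes:
    # Lossless: decompression gives back the exact original bytes
    return bz2.decompress(blob)

original = b"banana banana banana " * 100
blob = compress(original)
assert decompress(blob) == original  # the round trip is exact
print(f"{len(original)} bytes compressed down to {len(blob)} bytes")
```

The highly repetitive input compresses very well here; real-world data with less redundancy will shrink less.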

Zach Wilson — Ask me anything on Reddit

Zach Wilson, the LinkedIn influencer, ran an AMA on Reddit about his experience as a data engineer at FAANG, with a great career path from L3 to L6. From the answers we can learn a lot, from what the interview process at Netflix looks like to advice for entry-level people.

Fast News ⚡️

Delete Facebook — and all social networks? (credits)

PS: I hope you still have fun with the picture captions.
