Data News — Week 22.28
Data News #22.28 — SingleStore and Deci fundraising, open-source dbt repos, datavis in the 80s and fast news.
Hey, happy Bastille Day 🇫🇷 (one day late). This edition of the newsletter was written quickly because I'm so late and friends are waiting for me to finish so we can go to the bar 🍻. I hope you're doing well and enjoying the summer.
This week's edition is less technical, as will probably be the case for the whole summer.
Data Fundraising 💰
- SingleStore raised a $116m Series F extension. SingleStore was previously called MemSQL and is used, for instance, at Uber. It is an all-in-one database that covers use-cases from relational to real-time analytics. They are heading toward the same goal as Snowflake but coming from the opposite direction: creating a super database to rule them all.
- Deci raised a $25m Series B. Deci provides a deep learning development platform that helps you pick the best hardware/architecture to train your models. They developed a tool called the "Model Zoo" that gives you metrics on hardware performance per model.
Why we should open-source our dbt repos
Philosophically this is a good question. Should we open-source our dbt repos? If we remove everything that is not shareable from the repos, like PII or sensitive SQL, it makes sense.
For instance, I see a lot of companies facing the same issues in their marketing attribution, and everyone rewrites the same SQL again and again (cf. everything is a funnel but SQL doesn't get it). In the end it would also be a huge step toward data transparency. It reminds me of when the French tax administration open-sourced the code that computes income tax. But in the M language.
Data visualisation from the 80s
Awwww ❤️. This is everything I like. Great visualisations printed in a great book. Besides the Tufte classic, The Visual Display of Quantitative Information, published in 1983, Tom found Learn to Draw Charts And Diagrams Step by Step, published in 1988. The article gives 6 lessons we can learn from the 80s.
Deconstructing community building — dbt, Airbyte and Levels
Sven did an awesome job at deconstructing how dbt and Airbyte became well-known through big efforts around community building. Respectively with 8k and 32k members. Obviously their success is driven by how the community adopted the tools.
Looking back at dbt in my local French market, it just went viral 2 years ago. Everyone in startups was talking about it, and while the tool and its promise are blazingly simple, everyone wants it.
How we automated FAQ responses at Grab
Grab is an Asian Super-App. Super-App means you can do a lot of different things within the app, like ordering a ride or a meal, or making payments. In order to speed up internal knowledge sharing they decided to automate FAQ responses.
I really like this article because it shows a problem that can be solved with AI, in a company that has the people to build it, yet they still chose an external tool. I also think that the method they used to pick the tool is clever.
Fast News ⚡️
- Apache Superset 2.0 released — This is a major release that deprecates a lot of old stuff.
- How to use Airflow with Trino — I recently started to see Trino get more and more traction in the data ecosystem as a standalone tool, not only as a way to escape Hadoop's limitations. This is a small tutorial on how to call Trino from Airflow.
- Netflix picked Microsoft to run ads — In order to grow revenue, Netflix decided to go for an ad-supported subscription in the future. They named Microsoft as the technology partner to run this ad system. Data will, for sure, flow.
- Run Snowflake workloads on your own on-premise data — To me this is the biggest news of the week. After being able to query some expensive on-premise technologies, Snowflake will be able to query your on-premise object storage, like MinIO.
- Kafka team released the first ARM Docker images — Finally for M1 users.
- Differences between Spark, Flink, and ksqlDB for data stream processing — Redpanda wrote an article comparing 3 major streaming frameworks. If you want to better understand what the Redpanda team is doing, there is this excellent post from the newsletter Interesting Data Gigs about their Senior Staff Engineer offer.
- Data Lake vs Data Pond — 🙄
- 📚 Data Pipelines with Apache Airflow: book review — Everyone here knows that I'm a huge Airflow fan. As Gabriel states, this book seems to be a great introduction to Airflow.
See you next week ❤️ (probably for a special edition).