Data News — Week 23.06
Data News #23.06 — Understand the metrics store, Bard, migrate from Airflow to Dagster, lower Snowflake costs and data economy news.
Dear Data News friend, every week there is a bit of randomness in when this email actually lands in your mailbox, which, btw, breaks all the rules of newsletter writing. Yeah, you know, you have to get your readers used to a fixed schedule they can trust and bla, bla, bla. The good news is that at least with me you can trust that I have no schedule, except that you should get the newsletter on Friday or Saturday.
While I feel privileged to be able to send my thoughts to so many people every week, it takes me a significant amount of time to craft and write the newsletter. Please consider supporting me by becoming a paying subscriber, especially if you think, like me, that the newsletter is great.
Fast News ⚡️
- News from the generative AI universe — Google announced Bard, a competitor to ChatGPT, but with better ethics, etc. At the same time, Microsoft opened the ChatGPT integration with Bing in beta. Closer to us, in the data space, Hex proposed a prompt that can do magic for you.
- Big Data is Dead — A retrospective on why we no longer need as much computing power as before. Obviously the article is biased because it comes from MotherDuck, a company built around DuckDB. As a reminder, DuckDB runs on a single node, fitting all compute in memory. But the article is relevant nonetheless.
- Migrating from Airflow to Dagster is now a breeze — In the orchestration competition Dagster took a step forward: they developed tooling to ease migration from Airflow to Dagster, and one side effect is that you can orchestrate Dagster DAGs from Airflow. To understand the Dagster philosophy, you now have to think in assets.
- Data Analytics framework in Python: from scientific approach to actionable implementation — A framework to conduct data analysis in Python.
- Should you measure the value of a data team? — Considerations about measuring the job a data team is doing and which metrics you should go for.
- Analytics is not about data. It's about truth. — This one is a hot take, because what is the truth?
- Rebuilding a Cassandra cluster using Yelp’s Data Pipeline — It is awesome when we can use our data engineering skills not only for analytics but also to help fellow tech teams with tasks that are hard to do.
- How to fix your ETL to lower Snowflake Costs — Mark shares 3 Snowflake queries that you can run to get table usage and identify what costs a lot.
- Reflecting on the past 6 years of data engineering — This is a podcast episode (which I did not listen to for lack of time).
- The complete guide to building reliable data with dbt tests — 10 practical points to improve your dbt tests.
Data Economy 💰
- Acceldata raises $50m in Series C. Acceldata looks like an enterprise data observability tool that does everything other data observability tools are doing. Like drawing charts that show that you probably have issues 🫠.
- Recently the Kafka company (Confluent) acquired the Flink company (Immerok). Economically it means a lot and it reshuffles company strategies. In addition, RisingWave shared views on why you probably need a stream processing system.
- Why big tech companies need so many people — This is a good economic question. For instance, Twitter should be easy to copy. Why do they need thousands of engineers to develop a website that I can re-develop over a weekend?
- dbt Labs intends to acquire Transform. I just put this here for people who do not read the first part of the newsletter 🫠.
See you next week ❤️
blef.fr Newsletter