Data News — Week 39
Data News #39 — Speedata, Amplitude and Anaconda money rounds, MAD landscape, Airbyte worth the hype, data people skill set and usual fast news.
Hi there. Big news this week. The post will be a bit longer than usual as I added some more of my views on the topics I care about the most. If you find these views interesting, do not hesitate to reach out on LinkedIn to give me your feedback. I'm looking forward to hearing from you.
Have fun with the news 👇
Data fundraising 💰
- Speedata, an Israeli startup founded in 2019 with 45+ employees, is developing chips (processors) to accelerate big data analytics processing. They announced a $15m seed round plus a $55m Series A. This is a market trend we should follow because it will impact what cloud providers use, as well as the "AI" dedicated chip market, which is dominated by Nvidia.
- As announced in Week 28, Amplitude went public this week via a direct listing (and not via an IPO, see diffs). It seems that the valuation settled at $5b after a jump from $35 to $50 per share. As I'm far from being an expert, I'll leave you with this Reuters article detailing the operation.
- Snowflake Ventures has invested in Anaconda, the data science platform previously known as the package manager that makes data engineers unhappy. This partnership will help Snowflake bring machine learning and Python pipelines into the warehouse experience (probably through Snowpark).
Data landscape has gone MAD 😅
With all the VC and market money, the data ecosystem has probably gone mad: startups and tools are coming out of the blue every week, salaries are increasing, data is flying and privacy concerns are... what is privacy? Is it a bubble? I don't know.
If you are lost in that lake, or if you want to understand a bit of what is happening, the huge 2021 machine learning, AI and data (MAD) landscape map is out. Shout-out to Matt Turck for this quality work. The write-up is long and decodes all the concepts for you.
On the other hand, the Thoughtworks team released Tech Radar 24. When it comes to the data ecosystem, they "trialed" Snowflake, dbt, Great Expectations, Delta Lake, Materialize, MLflow and Streamlit, and are starting to consider DataHub, Dagster and the Feature Store concept.
Data Visualization Society survey
The Data Visualization Society is running a State of the Industry survey that closes today. Go fill out the survey there; you can also find the results from 2020.
Airbyte — Worth the hype?
I live in a bubble where I see Airbyte a lot, I mean a lot (on LinkedIn and through relations working there). I haven't had the time to test it out yet. This post tests the tool with a lot of screen captures and thinks through use-cases.
gRPC for Data Engineers ⚙️
I really like articles that say directly in the title who they are meant to be read by 🙂. This one explains simple concepts about gRPC and how to use it when you have already mastered Python.
For those who did not know, gRPC is an RPC framework released by Google 6 years ago that aims to be used in API communication. By default gRPC uses protobuf messages (Google again) over HTTP/2.
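To make "protobuf messages over HTTP" a bit more concrete, here is a minimal sketch of what the bytes of one gRPC message look like, using only the Python standard library. It assumes a hypothetical schema with a single string field numbered 1 (something like `message Hello { string name = 1; }`), and the function names are mine, not part of any library. In a real project you would not write this by hand: the generated stubs and the gRPC runtime do it for you.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation bit.
            out.append(low_bits | 0x80)
        else:
            out.append(low_bits)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one protobuf string field: tag, length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 means length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def grpc_frame(message: bytes) -> bytes:
    """Prefix a message the way gRPC does inside an HTTP/2 DATA frame:
    1 byte compression flag (0 = uncompressed) + 4 bytes big-endian length."""
    return struct.pack(">BI", 0, len(message)) + message


def grpc_unframe(frame: bytes) -> bytes:
    """Reverse of grpc_frame for a single uncompressed message."""
    compressed_flag, length = struct.unpack(">BI", frame[:5])
    if compressed_flag != 0:
        raise ValueError("compressed messages are not handled in this sketch")
    return frame[5:5 + length]


# Hypothetical message: field 1 (a string) set to "blef".
message = encode_string_field(1, "blef")
frame = grpc_frame(message)
print(message)  # the protobuf payload
print(frame)    # the same payload with the 5-byte gRPC prefix
```

The takeaway is that a gRPC call is mostly a compact binary payload with a small length prefix, which is a big part of why it is lighter than JSON over REST.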
Data quality metrics for your data warehouse
The Metaplane team came up with a write-up about the data quality metrics you should look into when you want to build a working warehouse. Or, as they say in the title, KPIs for KPIs. This is a must-read if you are still struggling to define data quality in your data team.
Data people skillset — Analytics Engineer and others
In the last few years, 2 new positions came out to fill the void in data teams. The Analytics Engineer seems to be a rebrand of the SQL developer or the BI Engineer, with new skills, tools and profiles tbh. The ML Engineer is here to help DS avoid becoming unicorns.
A Reddit user analyzed 44k unique job posts and tried to define which technologies are used per position. If you are trying to hire data people, this raw post can help you find the right words. As the author said, this is a US-centric view.
My takeaways from this are:
- Python stronger than ever across the universe — Java sometimes here, Scala, wait Scala?
- Old tools like Hadoop, SSIS are still here
- dbt, which democratized the Analytics Engineer position, makes a small 20% appearance in AE job postings. What does that mean for the other AE positions?
- Tableau is still the reference. That makes me ask a question I have had for a long time: why is Looker the visualization layer in the Modern Data Stack?
Why data scientists shouldn’t need to know Kubernetes
Following up on the previous post about technology skills for data scientists, we can notice that Kubernetes isn't mentioned (and this is a good point IMHO). Still, if you did not read this great post by Chip Huyen, it's a good reminder and worth checking out because the industry is shifting away from the Unicorn data scientist.
s/Kafka/Pulsar/g 🪛
Geeky title for all the sed users out there. If you have the motivation to move away from Kafka and use Apache Pulsar instead, Jesse Anderson wrote about how Pulsar could have helped companies like Slack and Uber, which struggled with some Kafka internals.
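If the title does not ring a bell: in sed, `s/Kafka/Pulsar/g` replaces every "Kafka" with "Pulsar", and without the trailing `g` only the first match on each line is replaced. Here is a rough Python equivalent for a single line of text, as a small sketch (the helper name `sed_substitute` is made up for the example):

```python
import re


def sed_substitute(text: str, pattern: str, replacement: str, global_flag: bool = True) -> str:
    """Mimic sed's s/pattern/replacement/ on one line of text.

    With global_flag=True (the trailing "g"), every match is replaced;
    otherwise only the first one is.
    """
    count = 0 if global_flag else 1  # count=0 means "replace all" in re.sub
    return re.sub(pattern, replacement, text, count=count)


line = "Kafka topics, Kafka brokers"
print(sed_substitute(line, "Kafka", "Pulsar"))                     # like s/Kafka/Pulsar/g
print(sed_substitute(line, "Kafka", "Pulsar", global_flag=False))  # like s/Kafka/Pulsar/
```

Sadly, a real migration takes a bit more than one substitution.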
Fast News ⚡️
I've already reached the usual 800 words for the newsletter so I'm keeping it short for the last articles.
- Data Content Creator database — Jeremy, founder of naas.ai, created a database of data content creators. If you are looking for other people to follow, go check it out.
- Open Source Data Stack Conference replays — Here you are: Day 1, Day 2, Day 3.
- Cloudflare announced R2 Storage — A way cheaper S3: around $0.08 saved per GB (looking at general pricing), so $80 saved if your lake is 1 TB, and so on. R2 means Really Requestable.
- Enlarge your data — sometimes you need a larger dataset. This Medium post explains what data augmentation is and how it can be useful.
- 192 Snowflake nodes for $10 — if the title catches you, that's normal. Christian tried to launch a big Snowflake cluster for less than $10 on the trial account.
- [in French] French administration data guidelines — this week the French administration released a big roadmap for public data policy: 15 ministry roadmaps listing 500 actions to be done.
- Tech burnout — Two major articles about burnout in tech teams and companies: I Just Don’t Want to Be Busy Anymore and Reflections on Burnout.
Be safe and read the last two articles to take care of your mental health.