Data News — Week 51
Data News #51 — Happy Holidays, Airbyte raised, ROI of data work, structure of data, data versioning.
What a year! It's almost the last week of the year, and I imagine a lot of people are on holiday. So today it'll be a short Data News, and next week will be a retrospective post about what we achieved in 2021.
Data fundraising 💰
- Airbyte raised $150m in Series B. It's an open-source extract-load platform with a Cloud version, aiming to compete with leaders like Fivetran or Stitch. The money will mainly be used to increase hiring, but also to launch other products like real-time data ingestion or reverse-ETL.
- Brytlyt raised another $5m in an extension of its Series A to launch its data analytics and visualisation platform. They leverage PostgreSQL with GPUs in order to create analytics platforms that "scale".
How to think about the ROI of data work
Once again the Monzo data team offers us an awesome data article. This time it's about measuring the ROI of data work. This is probably a question every data team has: how can we prove to the C-level that data investments are profitable?
In the article Mikkel shows a new way to talk about ROI, and he also brings nice visuals to explain all the concepts. To be honest, this is a must-read.
How should organizations structure their data
Every once in a while we get data modeling articles, and Kimball concepts come back to the denormalised world that Hive, BigQuery and Snowflake brought us years ago. Michael compares Kimball, Inmon and Data Vault structures to help you get started.
Personally I'm more of a pragmatic person, so to me the simpler structure is often the better one.
Improving Data Quality with Data Contracts
Sometimes we expect (or wait for) a magic product to solve all our Data Quality issues. But, spoiler, it may not solve everything. You will probably need to define schemas (Data Contracts) on your data and enforce them. The team at GoCardless added a schema validation layer to their CDC architecture to improve data quality. If you are into this, go check it out.
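To make the idea of enforcing a contract concrete, here is a minimal sketch in Python. It is not the GoCardless implementation: the contract, the field names and the `validate` helper are all hypothetical, and a real setup would more likely rely on a schema format and registry than on a hand-written dict.

```python
from typing import Any, Dict, List

# Hypothetical contract for illustration: field name -> expected Python type.
PAYMENT_CONTRACT: Dict[str, type] = {
    "payment_id": str,
    "amount_cents": int,
    "currency": str,
}


def validate(record: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return the list of contract violations for one record (empty if valid)."""
    errors: List[str] = []
    # Every field declared in the contract must be present with the exact type.
    for field, expected in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif type(record[field]) is not expected:
            actual = type(record[field]).__name__
            errors.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    # Fields the contract does not know about are reported as well.
    for field in sorted(record.keys() - contract.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


good = {"payment_id": "PM123", "amount_cents": 1050, "currency": "GBP"}
bad = {"payment_id": "PM124", "amount_cents": "10.50"}

print(validate(good, PAYMENT_CONTRACT))
print(validate(bad, PAYMENT_CONTRACT))
```

In a pipeline, a record with a non-empty error list would typically be rejected or routed to a quarantine table instead of being loaded, which is where the "enforce" part of the contract comes from.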
Deploying Airflow 2 on EKS using Terraform, Helm and ArgoCD
This is a huge two-part tutorial. Vitor explains how you can deploy Airflow 2 on AWS using ArgoCD, Helm and Terraform (part 1 & part 2). Obviously this is one way to deploy Airflow, but not the only one. When we look at the numbers, more and more companies are now deploying Airflow on top of Kubernetes.
In the tutorial you will find Terraform files and also how to configure Argo to make Airflow work. If you are new to these technologies, it'll give you an overview.
The guide to data versioning
If you want to understand how data versioning works, the LakeFS team wrote an article detailing the 3 most common versioning practices.
Thank you all for the support over this year. This week I have been hit by Covid, so this edition is shorter than usual, but I still wish you Happy Holidays. See you next week for the last edition of the year.
Stay safe.