Data News — Week 41
Data News #41 — Google Cloud partners with Tableau, event tracking system at Udemy, NPS for data teams, BigQuery innovations.
Hello Data News readers, I hope this edition finds you well. Last week's edition was fun to write and longer than usual. This week's post is shorter but still full of awesome articles, like every week. No crunchy fundraising to eat today, though.
Google Cloud partners with Tableau
Still, I found some interesting news: the Google Cloud partnership with Tableau, unveiled at Cloud Next '21, the annual Google Cloud conference. With this move, Google Cloud customers can visualize data with Data Studio, Looker and Tableau.
Reading between the lines, to me it's a growth technique for Google Cloud: becoming Tableau's preferred partner when it comes to cloud migration. I'd bet that a lot of users are still on-premises with Tableau Server.
On the other hand, Tableau will be able to access Looker's semantic model. Ironically, I bet no one cares about this one. If I am wrong, please reach out to me.
Designing the new event tracking system at Udemy
❤️ My favorite article of the week. The Udemy team wrote a long article about their journey migrating from a legacy event tracking system to a brand new one. What amazes me is how meticulous they were in the selection phase of the project. The process goes: requirements, buy vs. build, serialization.
If you are planning to go event-driven in the coming months, this article is a good starting point for the technical design. Huge shout-out to the Avro customization they made.
Snowflake Streams applied to IoT data
Following the previous article, I suggest reading this demonstration of Snowflake Streams applied to IoT data. The idea is to do real-time processing on top of a table that will eventually get big (31 billion rows/year). Thanks to Streams you only get the new rows, so you compute faster than with a full recomputation.
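To make the "only new rows" idea concrete, here is a minimal pure-Python sketch. It does not use Snowflake at all: AppendOnlyTable and ToyStream are hypothetical names invented for the illustration, and the sketch only mimics the general principle of remembering an offset and processing what arrived after it.

```python
class AppendOnlyTable:
    """Toy stand-in for a large, append-only table (hypothetical, not Snowflake)."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


class ToyStream:
    """Hypothetical stream: remembers how far the table has been consumed."""

    def __init__(self, table):
        self.table = table
        self.offset = 0  # index of the first row not consumed yet

    def consume(self):
        """Return only the rows added since the last call, then advance the offset."""
        new_rows = self.table.rows[self.offset:]
        self.offset = len(self.table.rows)
        return new_rows


table = AppendOnlyTable()
stream = ToyStream(table)
running_total = 0.0  # incrementally maintained aggregate

# First batch of IoT readings arrives.
for reading in [21.5, 22.0, 21.0]:
    table.insert({"sensor": "s1", "temperature": reading})
first_batch = stream.consume()
running_total += sum(row["temperature"] for row in first_batch)

# One more reading arrives: only that row is processed, not the whole table.
table.insert({"sensor": "s1", "temperature": 23.5})
second_batch = stream.consume()
running_total += sum(row["temperature"] for row in second_batch)

print(len(first_batch), len(second_batch), running_total)
```

The aggregate is updated from the new rows only, so the cost of each run depends on how much data arrived, not on the total size of the table.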
NPS for data teams
When you work in a data team you obviously like numbers, but you also (obviously) struggle to know if you truly have an impact. Shifting from a support team to a product team may be necessary. To go further, we can apply an NPS survey to the data team to get a team KPI to follow.
This is something I've already done in the past, but you need a certain scale to be sure the results are relevant, and also a certain maturity. And don't take good results for granted, because a data team that the business would recommend is not necessarily the data team you want to be in.
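As a reminder of how the score itself is computed, here is a small sketch using the standard NPS definition: answers range from 0 to 10, promoters answer 9 or 10, detractors answer 0 to 6, and the score is the percentage of promoters minus the percentage of detractors. The function name and the sample answers are made up for the example.

```python
def net_promoter_score(scores):
    """Compute NPS from 0-10 survey answers: % promoters minus % detractors."""
    if not scores:
        raise ValueError("at least one survey answer is required")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("answers must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)   # answers 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # answers 0 to 6
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers from ten stakeholders of the data team.
answers = [10, 9, 9, 8, 7, 7, 6, 4, 10, 3]
nps = net_promoter_score(answers)
print(nps)  # 4 promoters, 3 detractors out of 10 answers -> 10.0
```

With only ten answers, a single person changing their mind moves the score by 10 points, which is why scale matters before trusting the number.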
To boost my own KPIs, I recommend you subscribe to get the news by email each Friday. Obviously no spam, and forever free.
Using Singer to ingest data at Glassdoor
Singer is an open-source standard for composable extract-load. It moves data between a tap (a source) and a target. Because Singer is composable, it is theoretically possible to use any tap with any target, which allows a lot of combinations.
That being said, Glassdoor explained how they used Singer to ingest data from APIs. I think this is a good introductory post with good ideas. I really like the idea of using Singer's schema discovery feature to check whether a tap's schema has been altered.
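The schema check could look something like the sketch below. This is not Glassdoor's implementation: it simply assumes you have saved two catalogs produced by a tap's discovery mode (a "streams" list whose entries carry a "tap_stream_id" and a JSON Schema under "schema") and compares their properties. The function name and both sample catalogs are hypothetical.

```python
def schema_changes(old_catalog, new_catalog):
    """Compare two discovery catalogs and report altered fields per stream."""

    def properties_by_stream(catalog):
        # Map each stream id to the "properties" of its JSON Schema.
        return {
            stream["tap_stream_id"]: stream["schema"].get("properties", {})
            for stream in catalog["streams"]
        }

    old = properties_by_stream(old_catalog)
    new = properties_by_stream(new_catalog)
    changes = {}
    for stream_id in sorted(old.keys() | new.keys()):
        before = old.get(stream_id, {})
        after = new.get(stream_id, {})
        diff = {
            "added": sorted(after.keys() - before.keys()),
            "removed": sorted(before.keys() - after.keys()),
            "changed": sorted(
                name for name in before.keys() & after.keys()
                if before[name] != after[name]
            ),
        }
        if any(diff.values()):  # keep only streams that actually changed
            changes[stream_id] = diff
    return changes


# Hypothetical catalogs: yesterday's discovery output vs. today's.
previous = {"streams": [{"tap_stream_id": "users", "schema": {"properties": {
    "id": {"type": "integer"}, "email": {"type": "string"}}}}]}
latest = {"streams": [{"tap_stream_id": "users", "schema": {"properties": {
    "id": {"type": "string"}, "email": {"type": "string"},
    "country": {"type": "string"}}}}]}

changes = schema_changes(previous, latest)
print(changes)
```

An empty result means nothing moved; anything else can be turned into an alert before the load runs.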
Thoughts about Hex and dbt
Claire Carroll, previously at dbt Labs, wrote up her thoughts about using Hex and dbt together. She says that a notebook-based query tool is better than the Snowflake UI, mainly because you can juggle between query results. Finally, on the wish list, something everyone is probably waiting for: can we have query editors that support the dbt ref macro?
As cool as the new table formats
Recently Hudi and Iceberg became topics I write about in the newsletter because they could become the next big improvement in our data stacks. This week we have a series of 3 articles about what Hudi is and how it can be used. On the other side, we have a demo article about Iceberg.
What's new with BigQuery
Following Cloud Next '21, the BigQuery team announced what is coming next to BigQuery. At a glance, they keep closing the database-feature gap with Snowflake while continuing to add machine learning capabilities. Here is a quick overview:
- BigQuery becomes heavily interoperable → storage accessible from various places and even more query federation
- BigQuery Omni is becoming generally available (GA) to support multi-cloud workflows, but no idea about the price (Google, if you read this, contact me)
- They preview GRANT / REVOKE commands to support data authorization (row- and column-level security)
- Run Python external functions (and 6 other languages) from your SQL
- New monitoring UI to understand how BigQuery is used
- And more: table snapshots and clones (hey Snowflake), cheaper write API for streaming, search indexes for text fields, better ML explainability and Vertex AI integration
Fast News
It's time now for the fast news. Cool news, but faster than before.
- What's new in Apache Airflow 2.2.0: have a look at the @task.docker decorator. Neat!
- Monte Carlo released Data Quality Fundamentals with O'Reilly: if they could also get the term observability into the dictionary, that would be awesome, because the linter goes crazy (just kidding).
- Curated list of dbt resources: Hiflylabs created a GitHub repository with awesome articles and tutorials about dbt.
- The only Data Mesh video you should watch: if you have 30 minutes (+30' of questions) to spare, I recommend you have a look at this video. They explain how they applied domain design to their data warehouse.
- Databases explained with manga: if you want to have fun learning database concepts, read this manga.
- Bitwise operations for data engineers: Daniel wrote about something fun, bitwise operations in Python. He does not explain well why (2 & 3) << 1 gives 4, but the post is cool.
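For the curious, the (2 & 3) << 1 result from the last item can be checked step by step. This is my own small walkthrough, not code from Daniel's post: AND keeps the bits set in both numbers, then the left shift moves them one position, which doubles the value.

```python
a, b = 2, 3

# Bitwise AND keeps only the bits set in both operands:
#   0b10 (2)
# & 0b11 (3)
# = 0b10 (2)
both = a & b

# Shifting left by one position appends a zero bit, i.e. multiplies by 2:
#   0b10 (2) << 1 = 0b100 (4)
result = both << 1

print(bin(a), bin(b), bin(both), bin(result))  # 0b10 0b11 0b10 0b100
print(result)  # 4
```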
Have a great weekend and see you next week!
PS: if you have read the newsletter this far, thank you. What do you think of an audio format 🎙️ (podcast) of the newsletter? Would you listen to it? Drop me an email to tell me, I'm curious about it.
blef.fr Newsletter