Data News — Week 37
Data News #37: upcoming online conferences, OLAP cube explained, salary survey, dbt seeds, a huge AI Friday and more.
Hi folks, I hope you all had a great week. A new professional year has started (yes, to me the new year starts in September, like I'm still going to school) and I wish you the best for yours. What do you plan to learn this year? On my side I'd like to get better at pottery 🫖.
With this new year, new community events have been announced, and because I don't have any awesome fundraising to share this week, I'll share upcoming events instead.
Upcoming events 📺
- Sept. 28 → 30: The first Open Source Data Stack Conference is coming, featuring people from the technologies powering the modern data stack. It'll be online. This is a great initiative, but without a lot of diversity: only one tech per usage in the MDS.
- Nov. 3: Monte Carlo is announcing IMPACT 2021, the Data Observability Summit. Great speakers from the data world will take the stage online.
- Dec. 6 → 10: dbt Labs announced its yearly Coalesce conference. Sadly, this year it'll still be online. It seems 33 sessions have already been announced.
Also note that the Kafka Summit took place this week. If you want to get an idea of it, Robin Moffatt, Dev Advocate at Confluent, wrote some Twitter threads about his attendance.
What's an OLAP cube? 🧊
From interview questions to onboarding chats, data people often mention OLAP or OLTP without really knowing what it means. Claire Carroll wrote an awesome post explaining what an OLAP cube is. It was written a month ago, but it's a personal favorite.
As a side note, I really like the conclusion about "Jargon as a gatekeeper", which says that we, the data community collectively, keep using complicated terms that create a barrier excluding new people.
O'Reilly 2021 Data/AI Salary Survey
O'Reilly published this week the results of their salary survey (mainly based on US-based respondents). The charts are interesting: they were able to split salaries by gender (a gap that still needs to be closed), by programming language and also by tools and platforms (this last split is not that relevant, the tools are too heterogeneous).
GDPR compliance in a nutshell
This is a first for the Data News: we are talking about GDPR. People from Sifflet wrote a FAQ / glossary post covering what you need to know about the European regulation to be compliant.
🌱 Use dbt seeds for your Lookup tables
Daniel Mateus Pires explained how his team uses dbt seeds to better manage lookup tables (or reference tables). This post is a super good introduction to the dbt seeds feature.
Airflow hidden features
Did you know the Airflow CLI contains a command to generate an image of your Airflow DAG? I didn't before reading this cheat sheet about the Airflow CLI.
Folks at Databand.ai also explained how to use Airflow cluster policies and task callbacks to add observability to your tasks without too much overhead.
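To give an idea of what a cluster policy combined with a task callback can look like, here is a minimal sketch. It assumes Airflow 2, where a `task_policy` function defined in `airflow_local_settings.py` is called for each task when DAGs are loaded. The `log_failure` callback is a hypothetical name of mine, and a `SimpleNamespace` stands in for a real operator so the snippet runs without Airflow installed. It is not the code from the Databand.ai post.

```python
from types import SimpleNamespace


def log_failure(context):
    """Hypothetical failure callback; Airflow passes a context dict to callbacks."""
    task_instance = context.get("task_instance")
    print(f"Task failed: {task_instance}")


def task_policy(task):
    """Cluster policy hook: attach a default failure callback to every task.

    In a real deployment this function would live in airflow_local_settings.py
    and `task` would be an Airflow operator (assumption: Airflow 2 naming).
    """
    if getattr(task, "on_failure_callback", None) is None:
        task.on_failure_callback = log_failure


# Stand-in for an operator, only to show the effect of the policy.
fake_task = SimpleNamespace(task_id="extract", on_failure_callback=None)
task_policy(fake_task)
print(fake_task.on_failure_callback.__name__)
```

The nice part of this design is that DAG authors don't have to change anything: the callback is injected in one place for the whole cluster, and tasks that already define their own callback are left alone.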
Understand Materialized Views — Part 1 & 2
Dunith Dhanushka wrote two articles on Medium to help you understand how materialized views can be useful and how they can speed up queries.
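If the concept is new to you, the sketch below shows the idea: the result of a query is stored once and read many times, at the cost of a refresh when the source data changes. SQLite has no native materialized views, so this simulates one with a plain table; the `orders` and `customer_totals` tables and the `refresh_totals` helper are made up for the example and do not come from Dunith's articles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('alice', 10), ('alice', 5), ('bob', 7);
""")


def refresh_totals(conn):
    """Recompute and store the aggregate (stand-in for a materialized view refresh)."""
    conn.executescript("""
        DROP TABLE IF EXISTS customer_totals;
        CREATE TABLE customer_totals AS
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer;
    """)


def read_totals(conn):
    """Cheap read: no aggregation happens here, only a lookup in the stored result."""
    return dict(conn.execute("SELECT customer, total FROM customer_totals"))


refresh_totals(conn)
totals = read_totals(conn)

# New data does not show up until the next refresh: this is the trade-off.
conn.execute("INSERT INTO orders VALUES ('bob', 3)")
stale_totals = read_totals(conn)
refresh_totals(conn)
fresh_totals = read_totals(conn)
```

Databases with real materialized views handle the storage for you, and some of them also handle the refresh, but the trade-off between read speed and freshness stays the same.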
AI Friday
This week I want to share some AI articles from the last few weeks that I found really well written and inspiring! It sometimes makes me want to do AI 🙃.
The Deezer team explained what they use to recommend music to new users. It is nice to see people writing about the cold start problem.
Marie-Fleur Sacreste from Preligens, a French defense AI company, described in a very detailed post how they created a unique agile framework to deploy deep learning algorithms in the blink of an eye.
Finally, if these two articles motivated you to work with AI, here are some lessons learned from 2 years as a data scientist.
Fast News ⚡
- Atlan by the Postman team: The data team at Postman shared their journey through data discovery and how they went from Confluence and Google Sheets to Atlan.
- Snowflake unstructured data support: It was announced at the Summit and is now in preview. Snowflake also released a serverless way to manage warehouses for Tasks. PS: unstructured means text, image, video and audio.
- DataHub's UI is getting a makeover: The metadata platform enhanced the UI in its latest version, 0.8.12, and it also seems that a dark mode is coming soon. Go check their demo.
- Fivetran SDK: Fivetran released their SDK... in Go. The SDK uses the Fivetran API to let you manage your account programmatically. But, in Go.
- New Tableau integrations in Slack: Following Slack's acquisition by Salesforce, we can expect more crossovers like this. This post details what teams at Slack and Tableau have been doing to better communicate with data (you also get a sneak peek at the Tableau Slack App).
Thank you and see you next week.