Data News — Week 38
Data News #38 — ClickHouse and Bigeye fundraising, Snowflake and Databricks competition, dbt v1, Infra as SQL and new trendy OS technologies.
Hello, a fresh edition of the Data News, delivered to you on time. This week we may have fewer articles than in previous weeks, but they still raise some interesting trends and questions.
Data fundraising 💰
- Yesterday Bigeye announced their $45m Series B, following a Series A 6 months earlier. Bigeye (formerly called Toro Data) is a data product focused on data monitoring and alerting. You connect your sources, set up your metrics (auto or not), set up thresholds (auto or not) and you're done.
- ClickHouse, Inc announced $50m in Series A funding. The high-performance columnar database will be incorporated in this new company and will spin out from Yandex (its original home company). I imagine they will now try to compete with the others in the cloud database segment.
Snowbricks & Dataflake
This is a topic I've already mentioned in the newsletter. Snowflake and Databricks are converging: they both compete in the "Cloud Data Platform" segment (and also on Data Cloud 🤷). Annika Lewis wrote a two-part analysis of where it's going and of how this competition resembles the earlier SAP & Oracle competition.
Is BI dead?
Benn asked this question this week about the future of BI. Is BI dead? In a sense, yes: the terms BI and BI tools have become untrendy because we now talk about the Modern Data Stack, so we don't want old stuff, we want modern tools. But what does that even mean? Is Looker really modern? Why is Tableau considered too old to be at the table? I'd say it's really only a discussion about vocabulary. BI is no longer what it originally was; we now have visualization tools inside a whole data ecosystem.
dbt v1.0 — get ready
dbt Core v1.0.0 will be released in December 2021. I'm not sure it will change a lot in the product, although some great improvements are coming, but the perception will at least be different. dbt is now used by more than 6000 teams and is here to stay. v1.0.0 means you can start building on a stable version for the future. So prepare to upgrade.
Infrastructure as SQL
This week, thanks to Octavian Zarzu, I discovered a new range of Data x DevOps tools: Infrastructure as SQL. Imagine being able to run SQL queries to find out how many EC2 instances you run and how much memory they represent. It's amazing. 3 tools came out recently aiming to do that:
- CloudQuery — open-source and working with AWS, GCP, Azure, Yandex and DigitalOcean (+ Slack and Kubernetes)
- Infrastructure as SQL (iasql) — early access only, being built by Alan Technologies, a company based in SF
- Steampipe — open-source and working with AWS, GCP, Azure (+ Slack, GitHub, Zendesk)
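To make the idea concrete, here is a minimal sketch in Python using an in-memory SQLite database. The `ec2_instances` table, its columns and its rows are hypothetical stand-ins invented for illustration; they are not the schema of any of the tools above, which each expose their own tables.

```python
import sqlite3

# Hypothetical inventory table: real tools expose their own schemas,
# this one only illustrates the "query your infra with SQL" idea.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ec2_instances ("
    "instance_id TEXT, instance_type TEXT, state TEXT, memory_gib REAL)"
)
conn.executemany(
    "INSERT INTO ec2_instances VALUES (?, ?, ?, ?)",
    [
        # Made-up sample rows
        ("i-001", "t3.micro", "running", 1.0),
        ("i-002", "m5.large", "running", 8.0),
        ("i-003", "m5.large", "stopped", 8.0),
    ],
)


def running_summary(connection):
    """Return (number of running instances, their total memory in GiB)."""
    row = connection.execute(
        "SELECT COUNT(*), COALESCE(SUM(memory_gib), 0) "
        "FROM ec2_instances WHERE state = 'running'"
    ).fetchone()
    return row[0], row[1]


count, memory = running_summary(conn)
print(f"{count} running instances, {memory} GiB of memory")
```

With a real tool the table would be filled from the cloud provider's API instead of hand-written rows, but the question you ask stays a plain `SELECT`.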
Verticalized product analytics suite: PostHog
The product analytics space is one of the first in the modern data stack to get specialized solutions. This week I want to share PostHog, which provides a self-hosted solution that lets companies own the whole product analytics workflow.
They want to replace these complex product SQL queries with a platform that already includes funnel analysis and product usage trends, to name a few.
Under the hood, if you plan to self-host the platform, it will launch a ClickHouse instance to keep the UI interactive. If you want to go deeper into ClickHouse, I recommend this Reddit thread comparing it with Apache Pinot (another low-latency columnar database, developed at LinkedIn).
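For readers who have never written one, a funnel analysis of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (events already sorted by time, steps that must happen in order); the `funnel_counts` helper and the sample events are hypothetical and say nothing about how PostHog implements it.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count how many users reached each funnel step, in order.

    events: iterable of (user_id, event_name), assumed sorted by time.
    steps: ordered list of event names defining the funnel.
    """
    progress = defaultdict(int)  # user_id -> number of steps completed so far
    for user_id, name in events:
        done = progress[user_id]
        # A user only advances when the event matches their next expected step
        if done < len(steps) and name == steps[done]:
            progress[user_id] = done + 1
    return [
        sum(1 for done in progress.values() if done > i)
        for i in range(len(steps))
    ]


# Made-up sample events
events = [
    ("u1", "visit"),
    ("u2", "visit"),
    ("u1", "signup"),
    ("u3", "signup"),  # u3 never visited first, so this does not count
    ("u1", "purchase"),
    ("u2", "signup"),
]
steps = ["visit", "signup", "purchase"]
print(funnel_counts(events, steps))
```

Writing the same logic in SQL over a raw events table usually takes several self-joins or window functions, which is the kind of query a dedicated product analytics tool spares you.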
Ad event processing at Uber
The Uber team wrote a post detailing the technologies they use for real-time, exactly-once event processing. What I found fun here is that Uber has become so huge that they say they use Pinot in that post, yet they also appear in the portfolio on the ClickHouse website.
A/B test explained by Netflix
If you are still new to A/B testing, or if you need a post to explain it to your colleagues, Netflix wrote it for you: What is an A/B Test? This is the second post in their dedicated series. It talks about the product challenges and how everything starts with an idea.
Does empathy play a role in being data-driven?
Adam Votava writes in TDS about empathy. Do we need empathy to create a data-driven company? As he says, "data doesn't lie!", so why bother being empathetic? Go there to find out.
PS: this summary was deliberately written without empathy 🙃.
Fast News
- re_data — An OS data quality tool worth checking out: built on top of dbt, re_data helps you find, debug and resolve problems in your data. They got me at "Discover data issues before your users & CEO".
- BigQuery Cloud SQL federation — if you need to query Cloud SQL instances from BigQuery, you can try federated queries; this post guides you through it.
- How to run containers on AWS — if you are asking yourself how to run containers on AWS, Last Week in AWS gathered all the ways in one post for you. Spoiler: there are more than 17 ways 😂.
- Tabular, Iceberg and new datalake generation — 2 weeks ago I wrote about Tabular's fundraising; this week the founders wrote a note about the Iceberg community, and Paul detailed on TDS what Iceberg, Hudi and Delta Lake bring to the table.