What is the metrics store
What is the metrics store? What are the key differences from the metrics layer or the semantic layer?
This week dbt Labs announced its intention to acquire Transform. While you should already be aware of what dbt is, there are still unknowns about what Transform is. Transform is a company founded by ex-Airbnb employees (which is important here) that offers an open-source metrics framework and a SaaS metrics store.
At the moment Transform is a small company compared to dbt Labs: only 40 employees according to LinkedIn, which is only 10% of dbt Labs' current workforce, and around $25m raised. But I think this acquisition matters and will shape our data stacks.
In the past I've made jokes about the naming confusion the data field is in, especially with the following terms: semantic layer, metrics layer, metrics store, headless BI, feature store. This is what I want to demystify today. I've spent the whole day reading and watching content in this category and I want to help you understand what it means for us. As a side note, it's fair to say that I wasn't a believer in the actual necessity of this infrastructure piece. After a full day of research I'm more into it, but we have to be careful.
First, definitions
Before going further I have to write down some definitions. These definitions are mine, and if you think I'm wrong I'd be more than happy to get your feedback. It is also super hard to find a universal definition across all vendors, as can be seen in this discussion.
- Measure — a measure is a value on which we can do all sorts of computations (addition, multiplication, etc.). In a warehouse context we do aggregations on measures (sum, count, avg). A measure is often numerical, but not necessarily. As an example, the order price is a measure.
- Dimension — a dimension is something that categorises a measure: it adds context to a measure. You can use a dimension to filter or group the data. For instance, the order date is a dimension.
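To make the two definitions above concrete, here is a minimal sketch in plain Python. The order rows, the column names and the helper `sum_measure_by_dimension` are all made up for illustration; they don't come from dbt, Transform or any other tool. The sketch sums the order price (the measure) grouped by the order date (the dimension).

```python
from collections import defaultdict

# Hypothetical order rows: "order_date" plays the dimension, "price" the measure.
orders = [
    {"order_id": 1, "order_date": "2023-02-06", "price": 20.0},
    {"order_id": 2, "order_date": "2023-02-06", "price": 35.0},
    {"order_id": 3, "order_date": "2023-02-07", "price": 15.0},
]


def sum_measure_by_dimension(rows, measure, dimension):
    """Group rows by the dimension and sum the measure within each group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


# Aggregate the measure (price) per value of the dimension (order_date).
revenue_by_date = sum_measure_by_dimension(orders, "price", "order_date")
print(revenue_by_date)
```

In a warehouse the same idea would be a `SUM(price) ... GROUP BY order_date` query. The point is only that the measure is what gets aggregated and the dimension is what you group or filter by.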