Data News — Week 23.22
Data News #23.22 — Japan's views on copyright for AI, a new AI camera, what's the hype behind DuckDB?
Hey, I've been sick longer than I expected, but I'm finally well. I hope this email finds you all well, as well. I've had to catch up on almost 3 weeks of content. When I step back, the number of articles shared each week is insane, and countless articles cover things that have already been written. Sometimes I feel like I'm trying to find a needle in a haystack. Or several needles.
I wanted to write more about Microsoft Fabric and the states of data that were published last week, but I'll do it another time.
Gen AI 🤖
As always, the pace of innovation in this field is incredibly fast, so here are a few news items I found worth sharing:
- Japan goes all in: copyright doesn’t apply to AI training — I'm far from being a law expert, but it looks like something that will set a precedent. The article says that it aligns with Japan's new strategy to become a leader in AI technologies: by removing barriers on training data, they hope to open doors. Obviously artists (especially mangakas) were not happy about it.
- Sam Altman, OpenAI CEO, did a Europe tour — Sam went to Europe recently (Spain, France, Poland, Germany and the UK) to meet country representatives. I guess he was lobbying around the AI Act, but he was also here to do real estate because OpenAI wants a European office.
- New Nvidia 144TB GPU — Nvidia is the clear winner of the AI race. They announced an insanely crazy new GPU, and Google, Meta and Microsoft are already customers. Surprising.
- How DoorDash uses XcodeGen to eliminate project merge conflicts — Ok, now I don't want to resolve a Git conflict anymore 😅.
- US researchers developed an LLM-powered Minecraft agent: Voyager. Minecraft is a survival game and the agent has been designed to learn Minecraft life skills incrementally. In the end it generates code that is used to drive the agent in the cubic world.
- A new kind of camera — An artist developed an AI camera, the Paragraphica, which is a context-to-image camera. The camera uses location data to feed context to a generative algorithm.
Fast News ⚡️
- Meltano announced their Cloud — Meltano is an open-source data integration project that was started at GitLab. With a bit of configuration and a CLI you can write data pipelines using hundreds of connectors (using the Singer spec). The pricing is based on the number of runs and not the volume of data. This is a major difference from the competition (Airbyte, Fivetran, Stitch).
- A ridesharing app simulation — Over the last few months Juraj developed a complete simulation of a ridesharing app (like Uber). He shared everything he did in blog posts and the result is kinda amazing. I recently spent hours on Mini Motorways, so this is the kind of side project I like.
- Breaking into data engineering as a self-taught developer — Some advice from a fellow data engineer who was previously a data analyst.
- What's the hype behind DuckDB? — This is a great post from Matt Palmer about DuckDB. If you want a quick intro to the tool, this is the place to start. In the article Matt also showcases how you could use DuckDB to write a transfer pipeline, like moving a Parquet file from disk to S3.
- How Instacart Ads modularized data pipelines with Spark — A great deep dive on a Lakehouse architecture for streaming. The article describes a migration from "thousands of complex SQL lines" to composable Spark SQL.
- dbt at Zendesk: setting foundations for scalability.
Data Economy 💰
- Databricks acquires bit.io — bit.io was "the fastest way to get a Postgres database". To get started you just had to send data and your database was already set up. Looking at the press release, the Databricks acquisition is a team acquisition to improve their own developer experience.
Now I'm going back to Diablo — See you next week ❤️.