Andrea Bozzo | Blog

👋 Welcome to my technical blog!

I write about Data Engineering, Rust, Go, Python and open source technologies.

Exploring lakehouse architectures, real-time streaming and the modern data world.

1 Year of Claude Code: An Interview

Claude interviews Andrea Bozzo about a full year of using Claude Code in the terminal — the workflow, the custom skills, the rough edges, and the nuked database.

March 5, 2026 · 8 min · 1502 words · Andrea Bozzo

Harvesting vs Scraping: Building Both Sides in Rust with Ares and Ceres

Two Rust projects, one conceptual divide. Ares fetches arbitrary web pages and uses LLMs to extract structured data; Ceres harvests metadata from CKAN portals and indexes it semantically. Together they show what it looks like to move from scraping scripts to production data pipelines.

February 20, 2026 · 14 min · 2907 words · Andrea Bozzo

Designing a Data Profiler Around Apache Arrow: Lessons from dataprof

A design story of dataprof: why I built a profiler around Apache Arrow, how it changed the architecture, and how this journey led to contributions to arrow-rs’ Parquet reader.

February 5, 2026 · 11 min · 2316 words · Andrea Bozzo

Async in Python and Rust: Two Worlds, One Keyword

A technical exploration of async/await in Python and Rust: how the same syntax hides completely different execution models, with practical examples from contributions to Tokio and Python projects.

January 22, 2026 · 13 min · 2661 words · Andrea Bozzo

Mosaico: The Data Platform for Robotics and Physical AI Written in Rust

A deep dive into Mosaico, the robotics data platform written in Rust: client-server architecture, semantic ontologies, data-oriented debugging, my journey within it, and the integration with Data Contract Engine.

January 6, 2026 · 18 min · 3726 words · Andrea Bozzo

Ceres: Semantic Search for Open Data

Ceres is a semantic search engine for CKAN portals. Built in Rust with Tokio and PostgreSQL+pgvector, it bridges the gap between how people search and how public administrations name their datasets.

December 20, 2025 · 7 min · 1454 words · Andrea Bozzo

Closing the Rust Circle: High-Performance Data Analysis with Polars

Polars completes the Rust data engineering ecosystem: lazy evaluation, Apache Arrow, and native Iceberg V3 integration for performant analytics that compete with distributed clusters. The third pillar of the RisingWave + Lakekeeper + Polars stack.

December 3, 2025 · 24 min · 4993 words · Andrea Bozzo

Lakekeeper: The Apache Iceberg REST Catalog Written in Rust

An exploration of Lakekeeper, the Iceberg REST catalog in Rust that completes the data engineering ecosystem: enterprise security with vended credentials, multi-tenancy, and RisingWave integration to build streaming lakehouses without the JVM.

November 22, 2025 · 10 min · 2019 words · Andrea Bozzo

RisingWave and Iceberg-Rust: When Real-Time Streaming Meets Modern Data Lake

The partnership between RisingWave and Iceberg-Rust offers a window into where modern data engineering is heading: real-time CDC streaming, an intelligent hybrid delete strategy, and a performant Rust ecosystem challenging JVM dominance.

November 10, 2025 · 7 min · 1408 words · Andrea Bozzo