Andrea Bozzo | Blog

👋 Welcome to my technical blog!

I write about Data Engineering, Rust, Go, Python and open source technologies.

Exploring lakehouse architectures, real-time streaming and the modern data world.

Zero Grappler: Data-Pipeline Thinking on a Microcontroller (Draft Notes from Before the Hardware Arrives)

Zero Grappler is a small no_std crate that applies a data-pipeline mindset to embedded ML: three traits, two async tasks, compile-time buffer sizing, zero allocations. This post is about the design choices — not yet a hardware report. The Pico 2 W smoke test on real silicon is still ahead of me.

April 21, 2026 · 12 min · 2370 words · Andrea Bozzo

Lance Format and LanceDB: Columnar Storage for the Embedding Age

Lance is a columnar storage format built for machine learning workloads — fast random access, native vector indexing, and zero-copy Arrow integration. This article walks through the format itself, how LanceDB builds on top of it, and how I wired it into a live NATS stream to build a simple semantic search layer over real-time events.

April 7, 2026 · 8 min · 1525 words · Andrea Bozzo

Guardrails for Tabular ML: A Data Engineer's Take on Data Leakage, Poisoning, and Brittle Pipelines

Most ML pipeline failures are not exotic model bugs — they are data issues that nobody encoded as checks. This article walks through building guardrails using pandas, Apache DataFusion, data contracts, and the Arrow C Data Interface.

March 23, 2026 · 13 min · 2649 words · Andrea Bozzo

1 Year of Claude Code: An Interview

Claude interviews Andrea Bozzo about a full year of using Claude Code in the terminal — the workflow, the custom skills, the rough edges, and the nuked database.

March 5, 2026 · 8 min · 1502 words · Andrea Bozzo

Harvesting vs Scraping: Building Both Sides in Rust with Ares and Ceres

Two Rust projects, one conceptual divide. Ares fetches arbitrary web pages and uses LLMs to extract structured data; Ceres harvests metadata from CKAN portals and indexes it semantically. Together they show what it looks like to move from scraping scripts to production data pipelines.

February 20, 2026 · 14 min · 2907 words · Andrea Bozzo

Designing a Data Profiler Around Apache Arrow: Lessons from dataprof

A design story of dataprof: why I built a profiler around Apache Arrow, how it changed the architecture, and how this journey led to contributions to arrow-rs’ Parquet reader.

February 5, 2026 · 11 min · 2316 words · Andrea Bozzo

Async in Python and Rust: Two Worlds, One Keyword

A technical exploration of async/await in Python and Rust: how the same syntax hides completely different execution models, with practical examples from contributions to Tokio and Python projects.

January 22, 2026 · 13 min · 2661 words · Andrea Bozzo

Mosaico: The Data Platform for Robotics and Physical AI Written in Rust

A deep dive into Mosaico, the robotics data platform written in Rust: client-server architecture, semantic ontologies, data-oriented debugging, my journey within it, and the integration with Data Contract Engine.

January 6, 2026 · 18 min · 3726 words · Andrea Bozzo

Ceres: Semantic Search for Open Data

Ceres is a semantic search engine for CKAN portals. Built in Rust with Tokio and PostgreSQL+pgvector, it bridges the gap between how people search and how public administrations name their datasets.

December 20, 2025 · 7 min · 1454 words · Andrea Bozzo

Closing the Rust Circle: High-Performance Data Analysis with Polars

Polars completes the Rust data engineering ecosystem: lazy evaluation, Apache Arrow, and native Iceberg V3 integration for performant analytics that compete with distributed clusters. The third pillar of the RisingWave + Lakekeeper + Polars stack.

December 3, 2025 · 24 min · 4993 words · Andrea Bozzo