About
What I work on
Most of my work has been about getting ML out of notebooks and into production — the unglamorous half where models meet real users, real latency, and real consequences. LLM agents make that work harder in ways traditional ML never did. That’s what I’m researching now.
On evaluation
“Most of what an LLM agent does in production has never appeared in any evaluation set.”

Identity
Who I am
Stable identifiers — for citations, search engines, and other Carloses.
- Full name
- Carlos Chinchilla Corbacho (cite as “Chinchilla Corbacho, C.”)
- ORCID
- 0009-0001-4495-8179 — canonical identifier
- GitHub
- @cchinchilla-dev — note the -dev suffix; other GitHub users named “Carlos Chinchilla” are different people.
- Affiliations
- QUANT AI Lab · Inditex (Senior ML & AI Engineer); Universidad de Salamanca (PhD candidate, expected 2027).
- Open source
- Maintainer of agentloom and agentanvil; merged contributor to a2a-python and a2a-go (Linux Foundation A2A Protocol, 1.0 stable).
Get in touch
Reach me
Best for ML/agent work, talks, collabs.
The work
Career
Where I’ve been building.
- 2025 — Senior ML Engineer · QUANT AI Lab. Same team, new challenge: agentic AI. Architecting the multi-agent (A2A) systems layer at Inditex — automating decisions across functional areas at global e-commerce scale.
- 2024–25 — ML Engineer · QUANT AI Lab. Embedded in Inditex’s Experience AI team. Real-time personalisation across all global e-commerce markets — inference under traffic spikes, experimentation that holds in production.
- 2023–24 — ML Engineer · Pontifical University of Salamanca. DL for lithium battery second-life on Edge/IoT — SoH prediction contributing to a 35% reuse rate in energy storage.
- 2022–23 — Data Scientist · Telefónica Foundation. ProFuturo programme — predictive models reducing student dropout in large-scale education.
Press
Selected publications
For academic and industry venues.
- 2025 · Paper · Advanced Machine Learning and Deep Learning Approaches for Estimating the Remaining Life of EV Batteries — A Review · Batteries
- 2024 · Paper · Application of Machine Learning Techniques for the Characterization and Early Diagnosis of Respiratory Diseases such as COVID-19 · IEEE Access
- 2024 · Paper · Automated Identification of Cylindrical Cells for Enhanced State of Health Assessment in Lithium-Ion Battery Reuse · Batteries
In other words
Why I write here
The longer answer.
- 01 · Evaluation harnesses, in detail
- 02 · Failure modes from production
- 03 · Notes on debugging multi-agent runs
- 04 · Occasional opinionated takes
I work on the systems around language models — evaluation harnesses, trace and replay infrastructure, the things that let an agent be debugged, audited, and trusted in production. Most of it is engineering, not modelling.
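The trace-and-replay idea can be sketched in a few lines. This is a minimal illustration, not any real harness: the names (`Trace`, `replay`, `toy_agent`) are hypothetical, and a production version would compare semantically, not by string equality.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trace:
    """A recorded interaction: the input and the output observed in production."""
    prompt: str
    expected: str

def replay(agent: Callable[[str], str], traces: list[Trace]) -> dict:
    """Re-run recorded traces through the agent and report any mismatches."""
    failures = [t.prompt for t in traces if agent(t.prompt) != t.expected]
    return {"total": len(traces), "failed": len(failures), "failures": failures}

# Toy deterministic agent standing in for an LLM call.
def toy_agent(prompt: str) -> str:
    return prompt.upper()

traces = [Trace("hello", "HELLO"), Trace("ship it", "SHIP IT")]
report = replay(toy_agent, traces)
```

Run on every deploy, even this trivial shape catches silent regressions: if the agent's behaviour drifts on previously recorded inputs, `failed` is nonzero before users see it.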
That distinction matters. ML research and ML production look superficially similar — they share vocabulary, papers, even people — but they have different failure modes, different value functions, different ideas of what “done” means. A model that scores well on a benchmark can still leak data, hallucinate confidently, or regress silently after a deploy.
This site is the long form of that work. Short pieces, real numbers, code where useful. If you ship ML systems — LLM agents or otherwise — and care about reliability, you’ll find familiar problems here.