Writing

Notes on retrieval, evals, agent infrastructure, and what it takes to ship systems that keep working.

A refusal-eval rubric for grounded document QA

April 5, 2026

Four buckets, one table, and why "correct" is the least interesting score in the set

System-design notes from shipping hosted LLM workloads

March 22, 2026

Prompt caching, tool-use loops, structured output, and the failure modes that show up in regulated deployments

Notes on designing a protocol-native agent eval substrate

February 28, 2026

Why the wire protocol is the right place to draw the eval boundary

What it actually takes to run unattended for a year

February 20, 2026

The single design constraint that reshapes every other decision

Rust, Go, and the distribution-vs-guarantees matrix

January 18, 2026

When Rust is the right choice, when Go is, and why the honest answer is "it depends on which problem you're solving"