Writing
Notes on retrieval, evals, agent infrastructure, and what it takes to ship systems that keep working.
-
A refusal-eval rubric for grounded document QA
Four buckets, one table, and why "correct" is the least interesting score in the set
-
System-design notes from shipping hosted LLM workloads
Prompt caching, tool-use loops, structured output, and the failure modes that show up in regulated deployments
-
Notes on designing a protocol-native agent eval substrate
Why the wire protocol is the right place to draw the eval boundary
-
What it actually takes to run unattended for a year
The single design constraint that reshapes every other decision
-
Rust, Go, and the distribution-vs-guarantees matrix
When Rust is the right choice, when Go is, and why the honest answer is "it depends on which problem you're solving"