What it actually takes to run unattended for a year

The single design constraint that reshapes every other decision

Across fifteen years of different jobs and a dozen side projects, there is one requirement I keep coming back to: “runs unattended for a year in a controlled environment.” Not “works on my laptop.” Not “passes the demo.” Not “fails over gracefully when someone notices.” It runs, for a year, without intervention.

It looks like a reliability requirement, and it is. More importantly, it is a design constraint, and it reshapes every decision upstream of it. If you take the requirement seriously from day one, you will make different language choices, different packaging choices, different observability choices, and different dependency choices than if you retrofit it later. In my experience, you cannot retrofit it at all. You can only rebuild with it in mind.

Here is what the requirement actually demands.

No runtime the operator has to maintain

Every runtime you ship to the device is a runtime that will need updating at some point in the next twelve months. Python’s virtualenv, Node’s node_modules, Java’s JRE, Ruby’s gemset: every one of these is something somebody has to tend. On a single device operated by a motivated hobbyist, that’s fine. On a hundred devices in the field, operated by people who have other jobs, it isn’t.

The language choice that falls out of this is “compiles to a single static binary.” Go, Rust, and Zig all qualify. C and C++ can, with effort. Python, Node, Java, and Ruby do not, without packaging heroics that create their own long-term maintenance burden.

No hidden network dependencies

Every call out is a call that will fail at some point in the next twelve months. Every DNS lookup, every NTP sync, every package index hit, every “phone home for configuration,” every telemetry submission. Any of them can break silently and leave the device in a partially working state. The ones that are easy to miss are the ones baked into libraries: “is my clock right?” checks, default logging destinations, automatic update pings.

The design that falls out of this is “works fully offline, and is boringly explicit about every outbound call it makes.” If you’ve already reasoned about every outbound call at design time, turning off the internet is a non-event.
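One way to make "boringly explicit" concrete is to route HTTP traffic through a client that refuses any host not declared up front. The following is a minimal sketch in Go, not a complete egress policy. The names allowlistTransport, NewClient, and IsBlocked, and the host updates.example.internal, are hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrNotAllowed marks a request to a host that was never declared.
var ErrNotAllowed = errors.New("outbound host not on allowlist")

// allowlistTransport (hypothetical name) wraps another RoundTripper and
// rejects undeclared hosts before any DNS lookup or connection happens.
type allowlistTransport struct {
	allowed map[string]bool
	next    http.RoundTripper
}

func (t *allowlistTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	host := req.URL.Hostname()
	if !t.allowed[host] {
		return nil, fmt.Errorf("%w: %s", ErrNotAllowed, host)
	}
	return t.next.RoundTrip(req)
}

// NewClient returns an HTTP client that can only reach the listed hosts.
func NewClient(hosts ...string) *http.Client {
	allowed := make(map[string]bool, len(hosts))
	for _, h := range hosts {
		allowed[h] = true
	}
	return &http.Client{
		Transport: &allowlistTransport{allowed: allowed, next: http.DefaultTransport},
	}
}

// IsBlocked reports whether err came from the allowlist check.
func IsBlocked(err error) bool {
	return errors.Is(err, ErrNotAllowed)
}

func main() {
	// The single declared destination; the host name is an assumption.
	client := NewClient("updates.example.internal")

	// Anything else fails immediately, without touching the network.
	_, err := client.Get("http://telemetry.example.com/ping")
	fmt.Println("blocked:", IsBlocked(err))
}
```

This only covers code that uses the client you hand it. A library that builds its own http.Client, or opens sockets directly, bypasses the check, which is why the audit of dependencies still has to happen at design time.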

A single state store with a clear backup story

Applications that sprinkle state across four places (a SQLite file, a config directory, a small JSON cache, a logrotate-managed log file) have an incomplete backup story by default. When an operator restores from backup a year into the deployment, they will lose something small and important. The right answer is one place for state, one command to back it up, one command to restore it, and all three tested against the actual build. Anything else is a promise that nobody in the field will keep.
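As a rough illustration of "one place, one backup command, one restore command, all tested," here is a minimal Go sketch. It assumes the state is a single file that is not being written to while the copy runs. A live SQLite database would need SQLite's own backup mechanism instead of a plain file copy. The function names Backup, Restore, and RoundTripCheck are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copyFile copies src to dst through a temp file plus rename, so a crash
// mid-copy never leaves a half-written dst behind.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	tmp, err := os.CreateTemp(filepath.Dir(dst), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename has succeeded

	if _, err := io.Copy(tmp, in); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), dst)
}

// Backup and Restore are the two "one command" operations.
func Backup(statePath, backupPath string) error  { return copyFile(statePath, backupPath) }
func Restore(backupPath, statePath string) error { return copyFile(backupPath, statePath) }

// RoundTripCheck writes sample state, backs it up, damages the original,
// restores it, and reports whether the restored bytes match.
func RoundTripCheck() (bool, error) {
	dir, err := os.MkdirTemp("", "state-check")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	state := filepath.Join(dir, "state.db")
	backup := filepath.Join(dir, "state.db.bak")
	want := []byte("counter=42\n")

	if err := os.WriteFile(state, want, 0o600); err != nil {
		return false, err
	}
	if err := Backup(state, backup); err != nil {
		return false, err
	}
	if err := os.WriteFile(state, []byte("damaged"), 0o600); err != nil {
		return false, err
	}
	if err := Restore(backup, state); err != nil {
		return false, err
	}
	got, err := os.ReadFile(state)
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, want), nil
}

func main() {
	ok, err := RoundTripCheck()
	fmt.Println("restore verified:", ok, "error:", err)
}
```

Shipping the round-trip check inside the binary means the backup story is exercised by the same build that runs in the field, rather than by a script that drifts away from it.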

Observability designed for the absent operator

The default observability stack is designed for someone staring at a dashboard. A year-unattended device has nobody watching. The observability requirement is different. The device has to complain loudly when something is wrong, over a channel the operator actually reads (email, phone notification, a physical LED), and stay silent otherwise. Anything that drifts into “noisy normal” is operator fatigue waiting to happen.

Concretely, a year-unattended device should produce zero alerts under normal operation and exactly one alert when something non-trivial happens, and it should refuse to clear that alert on its own until an operator acknowledges it.
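That behavior amounts to a small latch: silent while healthy, one notification on the first failure, and no self-clearing. A minimal Go sketch follows. The Latch type and its notify hook are hypothetical, and the hook stands in for whatever channel the operator actually reads.

```go
package main

import "fmt"

// Latch raises at most one alert per incident and stays raised until an
// operator acknowledges it.
type Latch struct {
	raised bool
	reason string
	notify func(reason string) // hypothetical delivery hook: email, push, LED
}

func NewLatch(notify func(string)) *Latch {
	return &Latch{notify: notify}
}

// Report records one health-check result. Only the transition from quiet to
// raised sends a notification. A later healthy result does not clear it.
func (l *Latch) Report(healthy bool, reason string) {
	if healthy || l.raised {
		return
	}
	l.raised = true
	l.reason = reason
	l.notify(reason)
}

// Acknowledge is the only way to clear the alert.
func (l *Latch) Acknowledge() {
	l.raised = false
	l.reason = ""
}

func (l *Latch) Raised() bool   { return l.raised }
func (l *Latch) Reason() string { return l.reason }

func main() {
	sent := 0
	latch := NewLatch(func(reason string) {
		sent++
		fmt.Println("ALERT:", reason)
	})

	latch.Report(true, "")               // normal operation: silent
	latch.Report(false, "disk 95% full") // first failure: one alert
	latch.Report(false, "disk 96% full") // same incident: no repeat
	latch.Report(true, "")               // recovery does not clear the latch

	fmt.Println("alerts sent:", sent, "still raised:", latch.Raised())
	latch.Acknowledge()
	fmt.Println("raised after acknowledge:", latch.Raised())
}
```

Keeping the first reason rather than the latest is a deliberate choice in this sketch: the operator sees what started the incident, not the most recent symptom.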

Every dependency is a long-term bet

Every third-party crate, library, or service you take a dependency on is a bet that someone else will maintain it for twelve months. Some bets are safer than others. The deep dependency tree of a modern web framework is an enormous number of small bets, any one of which can go stale. The single-binary Go or Rust tool with three or four dependencies is a much smaller bet.

This is the instinct that makes me prefer DuckDB plus a custom extension over a “modern data stack” of six services glued together, and Rust crates from known maintainers over the flavor of the month. The goal isn’t to reinvent everything. The goal is to minimize the surface area of things that can break without your knowledge.

Why I keep coming back to this

Every one of these decisions feels pessimistic at design time and vindicated at month nine. Every one of them costs something on day one and saves more on day 300. And every one of them pushes you toward the same design aesthetic: small binaries, narrow APIs, explicit dependencies, boring state management, loud failures, quiet success.

That aesthetic is not a personal preference. It is what “runs unattended for a year” decomposes into once you think about it carefully. Once you internalize the decomposition, you can’t unsee it in other people’s designs, and you start designing for it by default.