One command should set up an AI assistant for everyone, or you have not really built a tool.

The brief was easy to say and harder to honour. Every person on a growing team should get their own AI assistant. Each one should run on its own server, with its own login, its own private keys, and its own spending limit. And nobody should have to write fresh code every time a new person joins. One person arrives, one command runs, and their assistant exists. That is the whole product.

So we built a setup tool. You hand it a short description of the assistant you want (which settings, which model, which budget, which permissions) and it does everything else. It walks through the steps a person would otherwise do by hand, and it does them the same way every time. One command in, a working assistant out.

The first run for a real person is the only honest test

Everything worked on our own machines. We had run it, checked it, and rehearsed it. Green across the board. Then we pointed it at a live server to set up the first assistant for an actual team member.

That run found six problems our own testing never could. Not because our testing was lazy. It is built into the nature of the thing. A tool whose entire job is to set something up on a new server cannot prove it works until it meets a server it has never seen before. The rehearsal happens on familiar ground. The real test happens on strange ground, and only the strange ground tells the truth.

A few of those problems, to be specific rather than impressive: a part of the assistant that was supposed to be switched on had never actually been switched on. A connection to our private key store had quietly pointed at the wrong place, so it looked like it was working while reaching the wrong shelf entirely. Running the tool a second time, to change a setting, silently did nothing at all. It reported success and changed nothing. And in one configuration the assistant could not prove who it was.

None of these are exotic. Every one is the kind of thing you write down as "obvious" the moment you have seen it once, and would never have guessed before. That is exactly the point.

A setup tool is a pile of assumptions about a machine it has never met. Those assumptions stay untested until they finally meet one.

Three rules, or it is just a script

The discipline that came out of this is short.

Running the tool twice must be safe, and the second run must actually apply your changes rather than shrug and do nothing. The difference between one person's assistant and the next must live in the short description you hand it, never in a separate copy of the code. And when an assumption breaks, the tool must say so loudly. A failure that happens quietly is a server that quietly comes back wrong after a restart and tells no one.

We fixed all six problems on that one live run and shipped the fixes the same day. That is a trade I would take every time. Six real problems found on one honest run beats a month of testing on home ground that only confirms what you already believed.

The reason this matters beyond our own setup is simple. "Done for you" is a promise about the second time, not the first. Anyone can stand up one assistant by hand and call it a system. The real system is the thing that stands up the tenth one, for a person you have never met, without you in the room. If your one command cannot survive its first meeting with a real, unfamiliar server, you have written a very good script. You have not built a tool.

Under the hood

The "setup tool" is a manifest-driven bash engine. You hand it a small YAML file describing the agent (profile, model, budget, permissions) and it does the rest: provisions a dedicated operating-system user so each agent is isolated from the others, clones an open-source agent harness pinned to a known version, wires the runtime secrets through the secrets vault, then installs two hardened systemd units and starts them.

The first live deploy surfaced six engine gaps. The four worth naming:

The agent's API server was never enabled. The harness's main command bound no port; the health-check responder was a separate switch we had not flipped. Locally it went unnoticed because we knew where to poke.
The secrets-vault endpoint was never wired through, so the client quietly defaulted to the public hosted service instead of our self-hosted one. It "worked" — against the wrong place.
The re-deploy path was a no-op. Enabling an already-enabled systemd unit changes nothing, so config edits on a second run never took effect. The engine reported success and did nothing.
The identity-token path was wrong for one of the two unit types, so the agent could not prove its identity under that configuration.

The codified discipline: the engine must be idempotent (a second run is safe and re-applies changed config), manifest-driven (per-agent difference is data, not a code fork), and loud on a broken assumption. A silenced failure on the autostart path is a server that comes back up wrong after a reboot and tells nobody. All six were fixed and shipped in the same session.

One command should set up an AI assistant for everyone, or you have not really built a tool.

The first run for a real person is the only honest test

Three rules, or it is just a script

Want one of these in your inbox once a month?

We automated the intake and kept the care human.

The service moved an hour earlier, and the page stayed dark.

A book on the shelf has never improved a codebase.