How to give a beginner a capable AI agent without it being reckless.

We were about to hand real, capable AI agents to people who had never written a line of code. They had a mentor sitting beside them, someone who knew the system, but they were new and trusting, and quite capable of pasting something dangerous into the agent without knowing it. The instinct in that moment is to lock everything down. Strip the agent back until it can barely do anything, and call it safe.

We didn't do that, and the reasoning is the only part of this worth writing down.

A safe agent that does nothing is not safe, it is useless

Think of giving someone a car with the engine bolted so it can only idle. It will never crash, but it will never take them anywhere either, so they stop getting in. An over-restricted agent fails its owner the same quiet way. The beginner asks it to do real work (research something, draft a document, send an internal message) and it can't, because in the name of safety you took away the hands it needs. So they stop using it, learn nothing, and you have protected them from a tool they no longer touch. That is not a win. It is the whole project failing with a tidy record to show for it.

The entire point of giving someone an agent is that they learn what one feels like by doing genuine work with it. A restriction that defeats that purpose has confused the goal with the fence around it.

The one question that sorted every decision

We needed a test a non-specialist could understand and that we could apply the same way to every single thing the agent might do. We landed on one idea: can the damage be undone?

If the agent can only break itself, that's recoverable, and it's fine. If it can leak a secret, act on the outside world as if it were a real person, or wreck something a whole group depends on, that's hard to take back, and it gets handled.

That single question did more work than any checklist. An agent that trashes its own setup costs a quick rebuild and teaches a lesson: annoying, not serious. An agent that emails the outside world wearing someone's real name, or quietly corrupts something a whole team relies on, is a completely different animal, because you cannot reach back and unsend it. The first kind we left alone. The second kind we treated as the real problem.

This is deliberately not airtight, and admitting that is the honest part. It is the right setup for a trusted, mentored beginner, not the setup you'd use for an anonymous stranger on the open internet. Matching how tightly you lock things to the actual person in the chair is the discipline. Most teams reach for the strictest setting they can find and feel responsible for having done so, without ever asking whether it fits who they are protecting.

Make the prize worthless instead of building a taller wall

The more useful instinct turned out to be the opposite of locking down. Instead of asking "how do we stop someone reaching the keys," we asked "what is within this agent's reach that is even worth stealing?" If the keys it carries cost nothing, run out fast, and are worth nothing to whoever grabbed them, then the worst case of someone getting hold of them is a shrug. You've moved the security off the wall, which can be climbed, and onto the contents, which simply aren't worth the climb.

That shift is quieter than a long list of blocks and far more durable. A wall invites someone to go looking for the gap. An empty safe doesn't reward the effort. We spent more of our attention making sure nothing valuable lived anywhere a beginner's agent could touch it than on guarding every path to it.

What this is really about

There's a reflex in security work to treat restriction as care: the more you take away, the more responsible you feel. With people you are mentoring, that reflex is wrong. Care looks like a capable agent they will actually use, with the genuinely unrecoverable paths closed and everything else left open enough to learn on. You ask whether the damage can be undone, and whether the prize is worth taking. Where the damage can't be undone, or the prize is worth the taking, you step in. Everywhere else, you let them work.

Containment, not maximum restriction. The difference is the whole job.

For the technical reader

The decision rule was a recoverability test applied per capability: scope the blast radius of each power and sort it into recoverable or not. Anything whose worst case is confined to the agent's own state (its config, its workspace, its own session) is left ungated, because the remedy is a re-provision. Anything that can exfiltrate a secret, take an action against an external system under a credentialed real-world identity, or mutate shared state that other people depend on is treated as the hard case and mitigated. That maps cleanly to a worst-case tiering: self-only damage at the bottom, externally-visible or shared-state damage at the top.
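As a rough illustration of that rule, here is a minimal Python sketch. Every name in it (`Capability`, `Tier`, `tier_for`, and the example capabilities) is hypothetical and invented for this example; it is one way the sorting could be written down, not the setup we actually ran.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    SELF_ONLY = "self_only"  # worst case stays inside the agent's own state
    HARD_CASE = "hard_case"  # worst case is hard to take back


@dataclass(frozen=True)
class Capability:
    """Hypothetical description of one power the agent has."""
    name: str
    can_exfiltrate_secret: bool = False
    acts_externally_as_real_identity: bool = False
    mutates_shared_state: bool = False


def tier_for(cap: Capability) -> Tier:
    """Apply the recoverability test to a single capability."""
    unrecoverable = (
        cap.can_exfiltrate_secret
        or cap.acts_externally_as_real_identity
        or cap.mutates_shared_state
    )
    return Tier.HARD_CASE if unrecoverable else Tier.SELF_ONLY


def needs_mitigation(cap: Capability) -> bool:
    """Only the hard case is gated; self-only damage is left alone."""
    return tier_for(cap) is Tier.HARD_CASE


# Illustrative capabilities, not an inventory of a real agent.
edit_own_config = Capability("edit_own_config")
send_external_email = Capability(
    "send_external_email", acts_externally_as_real_identity=True
)
write_team_wiki = Capability("write_team_wiki", mutates_shared_state=True)

for cap in (edit_own_config, send_external_email, write_team_wiki):
    print(cap.name, tier_for(cap).value, needs_mitigation(cap))
```

The point of writing it as three booleans rather than a long allow-list is that the same question gets asked of every capability, and anything that answers no to all three is ungated by default.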

The gaps are a deliberate threat-model choice, not an oversight. The protected party is a trusted, mentored, non-technical beginner, so the frame is "contain a well-meaning person who can't yet read the danger," not "withstand an adversary." The same agent exposed to anonymous public users would need a stricter posture.

The "devalue the loot" move is credential design rather than wall-building: where an agent must hold keys, prefer keys that are capped, cheap or free to mint, and worthless to a third party who exfiltrates them. The aim is to push the reward for reaching a credential toward zero, so the perimeter around it stops being the only thing standing between a beginner's mistake and a real loss.
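One way to make that checkable is to describe each key the agent holds and list the reasons it would be worth stealing. The sketch below is an assumption-laden illustration: the `AgentKey` fields, the thresholds, and the example keys are all hypothetical, chosen only to show the shape of the check.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class AgentKey:
    """Hypothetical description of a credential a beginner's agent holds."""
    name: str
    spend_cap_usd: Optional[float]  # None means uncapped
    lifetime: Optional[timedelta]   # None means it never expires
    mint_cost_usd: float            # cost to issue a replacement
    useful_to_third_party: bool     # would a thief gain anything from it?


# Illustrative thresholds, not values from the original setup.
MAX_SPEND_CAP_USD = 5.0
MAX_LIFETIME = timedelta(days=1)


def reasons_worth_stealing(key: AgentKey) -> List[str]:
    """Return why a key is valuable loot; an empty list means a shrug."""
    reasons = []
    if key.spend_cap_usd is None or key.spend_cap_usd > MAX_SPEND_CAP_USD:
        reasons.append("not tightly capped")
    if key.lifetime is None or key.lifetime > MAX_LIFETIME:
        reasons.append("lives too long")
    if key.mint_cost_usd > 0:
        reasons.append("costly to replace")
    if key.useful_to_third_party:
        reasons.append("valuable to a third party")
    return reasons


# Two made-up keys: one designed to be worthless, one that is real loot.
trial_key = AgentKey("trial_search_key", 1.0, timedelta(hours=8), 0.0, False)
team_key = AgentKey("team_billing_key", None, None, 0.0, True)

for key in (trial_key, team_key):
    print(key.name, reasons_worth_stealing(key))
```

A key that comes back with an empty list is the empty safe: if it leaks, the remedy is to mint another. A key that comes back with reasons is one to keep out of the agent's reach or replace with a cheaper one.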
