An agent can do the work and still lose your trust.

The complaint was simple, and it kept coming back. "I can't see the agent doing anything."

The agents were helping HOFMI's ministry team with real tasks: looking things up, working through files, coming back with the right answer. From our side, everything looked healthy. From the user's side, it was a chat box that went quiet for a few seconds and then produced a reply out of nowhere.

That quiet is the whole problem. Imagine asking someone a hard question and they stare at the wall, say nothing, and then hand you an answer. Even if the answer is right, you don't quite trust it. You didn't see them do the thinking. An agent that works in silence looks exactly the same as an agent that is guessing. Trust does not last long in that state.

The work was happening. The proof wasn't getting through.

So we set out to give the user a small, honest sign of life: a little label, each time the agent uses one of its tools, that says what it did.

The careful part was deciding how much that label should reveal. The tempting move is to show everything (the full command, the full result, the lot) and then add a separate step that scrubs out anything private before it reaches the screen. We didn't do that.

The label carries only the name of the tool and whether it worked. Nothing private is in it, because nothing private was ever put in it.

Each label says three things and only three: which tool ran, whether it succeeded, and how long it took. Not what it looked at. Not what it found. That choice does more than tidy the screen. A scrubbing step is a permanent liability: every new tool is one more thing it has to remember to hide, and the day it forgets is the day something leaks. If the private detail never goes into the label, there is nothing to scrub and nothing to leak. The user gets enough to trust the work. The screen carries nothing it shouldn't.

The bug that hid the fix

Here is the part that cost us more than the rest.

We built the label. We checked it. By every measure we could take, it was working, except the one that mattered: whether it actually appeared on the user's screen. It didn't. The work was done, shipped, and live, and the user still saw nothing.

The cause was an old copy. The app had quietly kept serving an outdated version of itself to anyone whose browser tab was already open, and it kept doing that no matter how many times we shipped the new version. Our fix was live; the browser was politely ignoring it and showing the stale copy instead.

Once we forced the app to let go of the old copy, the label appeared at last: a tool, run, succeeded, timed to the millisecond. The silence was gone.

The lesson is narrow and worth saying plainly. If a change is provably finished and yet provably invisible, suspect that an old copy is being served before you suspect your own work is broken. We now treat clearing that old copy as part of every release that touches what the user sees. Not an afterthought. It would have saved us a long, pointless hunt for a fix that was never broken.

Under the hood

Two agent shapes, one chat surface. We run two kinds of agent against the same chat. One streams its turn live and already had a path to surface tool activity. The other polls for work and posts a reply when it's done. That kind had no write path for tool traces, because it never holds an open stream. The most active agents on the platform were therefore the most opaque.

The trace path. The fix had two halves. First, an endpoint on our agent platform that accepts and persists a small trace array attached to each agent message. Second, a shim inside the runtime that asks the agent to run in streaming mode purely so it can read the tool-progress events as they fire, then attaches the resulting trace to the reply it posts.

Privacy by omission, not redaction. Each trace entry carries exactly three fields: tool name, success/failure status, and duration in milliseconds. No arguments, no results, no touched content ever enters the array. So there is no redaction pass to maintain and no per-tool leak surface: the privacy requirement is met because the sensitive material is never written, not because it's stripped later.

Verified at the DB before the UI. A live probe showed a trainee agent's reply persisting four tool entries: the same terminal tool called four times, each with its own measured duration. The trace was well-formed at the database layer well before it ever rendered in the browser; those were two separate problems.

The stale cache. The render failure was the service worker. Its CACHE_VERSION hadn't been bumped since April, so the progressive web app kept serving the old bundle to every already-open tab indefinitely, regardless of how many deploys landed. The new code was live; the browser refused to fetch it. Bumping the version and clearing the old worker let the status chip finally render (a single terminal call, SUCCESS, ~151 ms). Bump the cache version on every deploy that touches the client. Treat it as part of the deploy, not an afterthought.

An agent can do the work and still lose your trust.

The work was happening. The proof wasn't getting through.

The bug that hid the fix

Want one of these in your inbox once a month?

We automated the intake and kept the care human.

The service moved an hour earlier, and the page stayed dark.

A book on the shelf has never improved a codebase.