The Trust-But-Verify Approach to AI Code
March 17, 2026
There are two ways to fail with AI coding tools. The first is trusting too much — accepting output without reading it, shipping code you don’t understand. The second is trusting too little — hovering over the AI, re-reading every line as it types, essentially pair programming with a tool that doesn’t need a pair.
The second failure mode is more common. And more expensive.
What trust means
Trust doesn’t mean “I believe the AI wrote perfect code.” It means “I believe the AI can execute a well-scoped task without me watching.”
The distinction matters. When I scope a task for Claude Code and hit enter, I switch to another tab. I don’t watch it work. I don’t read the streaming output. I go review another project’s diff, or write another scope, or make coffee.
This is the foundation of running 8 sessions at once. If I watched each session work, I’d run one session. Maybe two. The parallelism comes from trusting the process enough to walk away.
Trust is a time decision. Every minute you spend watching the AI work is a minute you’re not spending on something only you can do — scoping, reviewing, shipping.
What verify means
Verify means reading every diff. Not skimming. Reading.
```shell
git diff HEAD~1
```
Every file that changed. Every line that was added or removed. Every test that was written. I covered the full process in my review workflow post, but the short version is three passes: structure (right files?), logic (right code?), style (right patterns?).
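In git terms, the three passes can map onto commands like these (a sketch; it assumes a repo with at least one prior commit, and the `*.test.*` pathspec is an assumption about your naming convention):

```shell
# Pass 1, structure: which files changed, and by how much?
git diff --stat HEAD~1

# Pass 2, logic: read every added and removed line.
git diff HEAD~1

# Pass 3, style: narrow to one area at a time, e.g. only the tests
# (adjust the pathspec to how your repo names test files).
git diff HEAD~1 -- '*.test.*'
```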
Verify also means running the code. Not just reading the diff — actually testing the feature. Click through it. Hit the edge cases. Try the thing that users will try that you didn’t spec.
On Lucid, I reviewed a diff for the journal search feature. Code looked right. Tests passed. But when I actually typed a search query with special characters, it crashed. The regex wasn’t escaped. The diff review missed it because the code was correct for normal input. The manual test caught it in 10 seconds.
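The fix for that class of bug is to escape user input before compiling it into a pattern. The post doesn't show the Lucid code, so here's a minimal Python illustration of the same failure and its fix:

```python
import re

def search_entries(entries, query):
    # The bug: compiling the raw query -- re.compile(query) -- raises
    # re.error on input like "(" or "[". Escaping makes the query literal.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [entry for entry in entries if pattern.search(entry)]

entries = ["Shipped v2 (finally)", "Fixed the login bug"]
print(search_entries(entries, "(finally)"))  # ['Shipped v2 (finally)']
```

A diff review reads `re.compile(query)` as perfectly reasonable code; only typing `(` into the search box exposes it.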
The trust calibration
Not all tasks deserve the same level of trust.
High trust tasks — boilerplate, scaffolding, styling, copy changes, adding a new page that follows an existing pattern. These have low risk and the AI handles them reliably. I review the diff but I’m not looking hard. A quick structure pass and I’m done.
When I had Claude Code update the onboarding copy on Triumfit, the diff was three lines of text changes. Thirty-second review. Accept.
Medium trust tasks — new features that follow established patterns, test writing, API endpoints with straightforward CRUD logic. I do the full three-pass review. I test the feature manually once.
Low trust tasks — business logic with edge cases, data migrations, authentication/authorization changes, anything involving money or user data. Full review. Manual testing. Sometimes I’ll read the code twice. On Scouter, any change that touches billing gets the full treatment regardless of how clean the diff looks.
The calibration also changes based on how good your spec was. A tight scope with file paths, function references, and acceptance criteria produces more trustworthy output than a loose description. This is why writing good specs is the highest-leverage thing you can do.
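For a concrete sense of "tight scope," it might look something like this (the feature, file paths, and function names are all invented for illustration):

```markdown
## Task: Add date-range filter to journal search

Files: src/search/query.ts, src/search/SearchBar.tsx
Follow the pattern in src/search/tagFilter.ts (buildTagClause).

- Add buildDateClause(from, to) next to buildTagClause
- Wire a date picker into SearchBar, matching the tag filter UI
- Do not touch the indexing code in src/search/index.ts

Acceptance criteria:
- Searching with no dates behaves exactly as before
- from > to returns an empty result set, not an error
- Tests added alongside the existing tagFilter tests
```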
Building trust over time
When I started using Claude Code, I watched it work. I’d read the streaming output, catch myself reaching for the keyboard, hold back, then read it all again after it finished. I was reviewing the code twice — once as it wrote it, once after.
That’s expensive. It’s also how most people use AI coding tools right now.
Trust builds as your specs improve. Here’s the progression I went through:
Week 1: Vague prompts. Watch the AI. Review nervously. Accept about 60% of output without changes.
Month 1: Better prompts with file paths. Stop watching some tasks. Accept about 75% of output.
Month 3: Full specs with architecture references. Rarely watch. Accept about 85% of output on first review.
Now: Tight scopes, every time. Never watch. Accept about 90% of output. The 10% I reject is almost always a scoping problem, not an AI problem.
The trust didn’t increase because the AI got better. It increased because my specs got better. Better input, better output, more trust in the output.
The cost of not trusting
Watching the AI work is the most expensive thing you can do with these tools.
Let’s do the math. If a task takes Claude Code 8 minutes to complete, and I have 8 tasks:
- Watching each one: 64 minutes of me sitting there reading streaming output. Plus review time after. Call it 90 minutes total.
- Trusting and reviewing after: 8 minutes of scoping, then 40 minutes of review as the sessions finish. About 48 minutes of my time, and while the agents ran I had roughly 25 free minutes for other work.
Same output. Half the time. And during those 25 free minutes, I was scoping more tasks or reviewing other projects.
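The arithmetic above can be sketched as a quick calculation (the per-task scoping and review minutes are assumptions chosen to roughly match the post's totals):

```python
tasks = 8
agent_minutes = 8  # how long each Claude Code run takes

# Watching: present for every run, plus a faster post-read afterward
# (you already saw the code stream by). ~3 min/task is an assumption.
watching = tasks * agent_minutes + tasks * 3   # 64 + 24 = 88, "call it 90"

# Trusting: 1 min scoping and 5 min review per task are assumptions.
trusting = tasks * 1 + tasks * 5               # 8 + 40 = 48

print(watching, trusting)  # 88 48
```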
Distrust is a scaling bottleneck. You can’t run parallel development if you need to be present for each session. The whole model falls apart.
The micromanagement trap
Some developers treat AI like a junior engineer who needs constant supervision. They read every line as it writes, interrupt to correct course, re-prompt after every few lines.
This is pair programming. Pair programming is fine. But it defeats the purpose of having an agent that can work independently.
The shift is: you’re not a supervisor. You’re a reviewer. The difference is timing. A supervisor is present during the work. A reviewer shows up after. A supervisor’s bottleneck is the work itself. A reviewer’s bottleneck is the review queue.
If you write good enough specs — clear scope, file paths, architectural guidance, acceptance criteria — you don’t need to supervise. You need to scope well and review thoroughly.
What to do when trust breaks
Sometimes the AI produces something wrong. Not slightly off — fundamentally wrong approach, bad architecture, missed the point entirely.
When that happens, the fix is almost never “watch it more carefully next time.” The fix is:
- Check the spec. Was the task clear? Did I specify the approach or leave it open? If I left it open, that’s my fault.
- Check the scope size. Was the task too big? Large tasks give the AI room to go in wrong directions. Smaller tasks constrain the solution space.
- Add context. Does the CLAUDE.md file explain the relevant patterns? Does the spec reference the existing code it should follow?
After a trust failure on Logline — Claude Code rebuilt a search index using a completely different data structure than the rest of the app — I added a section to the project’s CLAUDE.md explaining the data layer patterns. Haven’t had that problem since.
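The added section might look something like this (the contents are invented; the post doesn't quote the actual file):

```markdown
## Data layer patterns

- All persisted collections live in src/data/ and go through Store<T>
  in src/data/store.ts. Do not introduce new storage primitives.
- Search indexes are plain sorted arrays rebuilt on write (see
  src/data/searchIndex.ts). Follow that structure for any new index.
- Migrations belong in src/data/migrations/ with a numeric prefix.
```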
Trust breaks are feedback about your process, not about the AI’s capability.
The mental model
Think of it like a contractor building a house.
You don’t stand on the job site watching them hammer nails. You give them blueprints (specs). You show up periodically to inspect the work (review). You trust that a professional can follow blueprints without supervision. If the work doesn’t match the blueprints, you give better blueprints next time.
The blueprints are your job. The hammering is theirs. The inspection is yours.
Trust the hammering. Verify the result. That’s the whole model.