The Context Window Problem

March 3, 2026

Here’s what happens during a long vibe coding session. You start strong. The AI remembers everything. Your instructions, your preferences, the decisions you made three messages ago. Output quality is high.

Forty minutes in, things shift. The AI suggests an approach you already rejected. It forgets a constraint you stated early in the conversation. It generates code that contradicts a decision from twenty messages back. You correct it. It apologizes and fixes the immediate issue but introduces a new inconsistency because the correction didn’t propagate to everything else it’s holding in memory.

This isn’t a bug. It’s a fundamental constraint of how context windows work. And it changes how you should use AI coding tools.

What’s actually happening

Language models process text in a context window — a fixed-size buffer that holds the conversation history, the system prompt, any files the model has read, and its own previous outputs. For Claude, this is large. But it’s not unlimited. And size isn’t the real issue.

The real issue is attention.

As the context fills, the model has to distribute its attention across more information. Earlier instructions get less weight. Not because they’re deleted — they’re still in the window. But because newer information is attended to more strongly. The model’s effective recall of something you said 30 messages ago is lower than its recall of something you said 3 messages ago.

This produces a specific failure mode: recency bias. The model’s behavior drifts toward the most recent instructions and away from the foundational ones. Your architecture decision from message 5 gets overridden by an offhand comment in message 35.
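To make the drift concrete, here is a toy model — not how any real transformer computes attention, just an illustration of recall decaying with distance. The exponential decay and the half-life value are assumptions chosen purely for the example:

```python
# Toy illustration of recency bias (NOT a real attention mechanism):
# model "effective recall" as decaying exponentially with how many
# messages ago something was said. The half-life of 10 messages is
# an arbitrary assumption for illustration.
def recall_weight(message_index: int, current_index: int, half_life: float = 10.0) -> float:
    """Weight in (0, 1]: halves every `half_life` messages of distance."""
    distance = current_index - message_index
    return 0.5 ** (distance / half_life)

current = 36
early = recall_weight(5, current)    # architecture decision, message 5
recent = recall_weight(35, current)  # offhand comment, message 35
print(f"message 5: {early:.2f}, message 35: {recent:.2f}")
# → message 5: 0.12, message 35: 0.93
```

Under this toy model, the offhand comment carries roughly eight times the weight of the foundational decision — which is the shape of the failure, even if the real numbers differ.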

There’s a second issue: context pollution. Every correction you make, every “no, not like that” response, every abandoned approach that you discussed and rejected — it’s all still in the window. The model has to distinguish between “things we decided to do” and “things we discussed but rejected.” It’s reasonably good at this. But not perfect. And the longer the conversation, the more noise accumulates.

Why this matters for coding

Coding conversations are particularly vulnerable because they accumulate state quickly. You discuss architecture, make decisions, reference files, generate code, review it, request changes, generate more code. Every round-trip adds substantial text to the context.

A typical 45-minute session with Claude Code might consume 50,000+ tokens of context. That’s the conversation itself, the code generated, the files read, and your corrections. The model is juggling all of it, and the quality of its juggling decreases monotonically with volume.
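The 50,000-token figure is easy to sanity-check with the common ~4 characters-per-token heuristic for English text. The per-category sizes below are assumed, illustrative numbers, not measurements:

```python
# Back-of-the-envelope context accounting for one session, using the
# rough ~4 chars-per-token heuristic (an approximation, not a tokenizer).
def estimate_tokens(text: str) -> int:
    return len(text) // 4

# Illustrative character counts for a 45-minute session (assumed numbers):
session_parts = {
    "conversation": 40_000,    # back-and-forth prose
    "generated_code": 90_000,  # code the model produced
    "files_read": 70_000,      # source files pulled into context
    "corrections": 20_000,     # "no, not like that" and re-dos
}
total_chars = sum(session_parts.values())
print(estimate_tokens("x" * total_chars))  # → 55000 tokens, all competing for attention
```

Note that every category except `files_read` exists only because the session is conversational — a spec-driven single pass carries none of that history.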

I noticed this first on Scouter. I’d start a session working on the API layer. Everything was clean — correct types, consistent error handling, proper use of existing patterns. An hour later, same session, the generated code was using a different error handling approach. Not wrong, just different from what we’d established. The model had drifted from its own earlier output.

The conversational coding trap

Vibe coding is inherently conversational. The conversation is the specification — it builds up over time through back-and-forth. This means the specification is distributed across the entire context window. Decision A was in message 4. Constraint B was in message 12. The file path was in message 18. The interface was clarified in message 23.

The model has to reconstruct the full picture from scattered pieces. Every time it generates code, it’s assembling the spec from fragments spread across thousands of tokens. Some fragments are instructions. Some are corrections. Some are abandoned ideas. The model has to figure out which is which.

This is why vibe coding breaks at scale. It’s not just a workflow problem. It’s a context window problem. The methodology is fighting the medium.

How specs solve this

A spec is self-contained. It doesn’t depend on conversation history. It doesn’t accumulate over 40 messages. It exists as a single, complete document.

When I hand Claude Code a spec, the context window contains: the system prompt, the spec, and the files the agent reads. That’s it. No conversation history. No previous corrections. No abandoned approaches. No noise.

Task: Add daily active users chart to the Scouter dashboard

File: src/components/Dashboard/DauChart.tsx
Use ChartWrapper from src/components/Charts/ChartWrapper.tsx.
Data from analyticsApi.getDailyActiveUsers().
30-day line chart. Hover tooltips only. Match existing chart styles.
Mount in src/pages/Dashboard.tsx below MetricsGrid.

Constraint: Don't modify ChartWrapper or existing components.
Test: Chart renders with 30 days of DAU data.

Every decision is in one place. No assembly required. The model doesn’t have to reconstruct intent from fragments. The intent is right there, fully formed, using minimal context.
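Because the spec is a single document, you can even lint it mechanically before handing it off. A minimal sketch — the field names mirror the example above and are a convention I'm assuming, not a formal standard:

```python
# Minimal spec check (a sketch): verify a spec is self-contained
# before handing it to a fresh session. Field names follow the
# example spec's convention, not any formal standard.
REQUIRED_FIELDS = ("Task:", "File:", "Constraint:", "Test:")

def missing_fields(spec: str) -> list[str]:
    """Return the required fields the spec is missing."""
    return [field for field in REQUIRED_FIELDS if field not in spec]

spec = """Task: Add daily active users chart to the Scouter dashboard
File: src/components/Dashboard/DauChart.tsx
Constraint: Don't modify ChartWrapper or existing components.
Test: Chart renders with 30 days of DAU data.
"""
print(missing_fields(spec))  # → [] — every decision is present in one place
```

An empty result doesn't prove the spec is good, only that it isn't obviously incomplete — but that's exactly the class of gap that otherwise surfaces as a mid-session correction.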

This is why spec-driven tasks produce more consistent output than long conversational sessions — even when the end goal is the same. The model has less noise to filter and more attention to spend on the actual code generation.

The fresh session approach

This understanding drives a specific workflow decision: start a new Claude Code session for every task.

Not every project. Every task.

When I run 8 Claude Code sessions at once, each one starts fresh. Clean context. No history. Just a spec and the codebase. The model’s full attention is on the current task.
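The fan-out can be sketched as one command per spec file. I'm assuming a non-interactive `claude -p <prompt>` invocation here — check your CLI's flags before relying on it:

```python
# Sketch: build one fresh-session command per spec file, so each task
# starts with clean context and no shared conversation history.
# "claude -p <prompt>" is assumed to be the CLI's non-interactive
# mode; adjust to your tooling.
from pathlib import Path

def session_commands(spec_dir: str) -> list[list[str]]:
    """One command per spec file in the directory, in sorted order."""
    return [
        ["claude", "-p", spec.read_text()]
        for spec in sorted(Path(spec_dir).glob("*.md"))
    ]
```

Each command is independent: no session sees another's corrections, and none starts with residue from a previous task.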

If the first attempt needs changes, I don’t iterate in the same session. I start a new session with a revised spec. “In src/components/DauChart.tsx, change the default view from 30 days to 14 days.” Fresh context. Clean attention.

Is this wasteful? Somewhat. The model has to re-read files it already read in the previous session. But that re-reading costs less than the accumulated noise of a correction-heavy conversation. The tradeoff is worth it.

Why this matters more for parallel work

When you’re running one session, context degradation is manageable. You’re there. You notice the drift. You correct it. The correction adds noise, but you can tolerate it for a single session.

When you’re running 8 sessions, you can’t monitor context quality in real time. You’re not there. You scope the tasks in the morning, check the outputs at review time, and everything in between is unsupervised.

If those 8 sessions are conversational, you have no idea whether message 30 in each session is still aligned with message 1. You can’t see the drift until you review the output, and by then the agent has built on top of the drift for 20 more messages.

If those 8 sessions each start with a spec and run as a single execution, the context never degrades. There’s no conversation to drift. The spec is the instruction set, the codebase is the reference, and the agent builds. One pass.

This is the hidden reason specs enable parallelism. The obvious reason is that specs make you optional during execution. The less obvious reason is that specs keep the context window clean. Fresh session, focused context, consistent output. Every time.

Practical implications

Keep sessions short. If you’re in a conversational session and it’s been more than 20–30 minutes, the context is getting noisy. Either finish the task or start fresh.

Don’t reuse sessions across tasks. When you finish one task, don’t say “okay, now let’s work on this other thing.” Start a new session. The residue of the first task pollutes the context for the second.

Put decisions in the spec, not the conversation. If you find yourself making a decision mid-conversation — “actually, let’s use Zod for this” — stop. Add it to the spec. Start fresh. The spec should be the single source of truth, not message 14 of a 30-message thread.

Review early output as a baseline. The first code the model generates in a session is its best. If that’s wrong, the spec is wrong. If the output quality degrades over the session, the context is polluted.

The bottom line

AI coding assistants have a shelf life within any single conversation. The longer you talk, the less reliable the output. Not dramatically less — you won’t get garbage at message 50. But the subtle drift is real, and it compounds.

Specs aren’t just a workflow improvement. They’re an architectural response to a technical constraint. The context window has a finite capacity for coherent work. Specs keep the work within that capacity.

Write the spec. Start a fresh session. Let the model give its full attention to your task. That’s how you get consistent output — not by hoping the model remembers what you said twenty minutes ago, but by never requiring it to.

For the practical side of writing these specs, see how to write specs that AI can execute. For the methodology that ties all of this together, start with the spec coding complete guide. And for why this problem is the fundamental limit of vibe coding at scale, that’s where it all connects.