Context Windows Explained: Why Your AI Coding Agent Forgets What You Said

AI · Software Engineering · Developer Tools

You start a session with your AI coding agent. It's sharp. It knows your codebase, follows your conventions, and gives you exactly what you asked for. An hour later, it's contradicting decisions it made thirty minutes ago, forgetting you already refactored that module, and producing code that doesn't match the style it was following perfectly at the start.

You didn't do anything different. The model didn't change. So what happened?

The answer is the AI coding agent context window, and once you understand it, a lot of "weird AI behavior" stops being mysterious.

The Whiteboard Has a Fixed Size

Think of the context window as a whiteboard. Everything the model knows about your current session lives on that whiteboard: your instructions, the conversation history, files it's read, tool definitions, error logs it fetched. The whiteboard has a hard limit on space. When you run out of room, you start erasing the oldest stuff in the corner to make room for new things.

At some point, you've erased the original plan. Now the model is working from whatever is still visible. And what's visible is mostly the recent back-and-forth, not the architectural decisions you made at the start.

That's context rot. As the token count in a session grows, recall accuracy degrades. Transformer attention computes on the order of n-squared pairwise relationships across n tokens, so attention gets stretched thinner as the window fills. The model doesn't fail suddenly. It drifts. And the drift is easy to miss until something goes obviously wrong.
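A back-of-the-envelope sketch of that n-squared growth (plain Python, purely illustrative of the scaling, not of any real model's internals):

```python
# Each of the n tokens attends to every other token, so the model
# computes on the order of n^2 pairwise attention scores.

def attention_pairs(n_tokens: int) -> int:
    """Number of pairwise attention scores for a context of n tokens."""
    return n_tokens * n_tokens

for n in (1_000, 10_000, 100_000, 200_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):>18,} pairwise scores")
```

Going from 1,000 tokens to 200,000 tokens is a 200x increase in context, but a 40,000x increase in pairwise relationships. That's the intuition behind attention getting "stretched thin."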

This is the single most important concept for understanding why AI sessions fall apart. The model didn't get dumber. It ran out of room to think.

What's Actually in Your Context Window

The context window holds everything the model can "see":

  • The system prompt (instructions for how the agent should behave)
  • Full conversation history: every message you sent, every response it gave
  • Files it loaded or read during the session
  • Tool schemas from any MCPs you have connected
  • Results returned from tool calls: search results, error logs, file contents

Every one of these competes for the same finite space. A lot of engineers assume the context window is just the chat. It's not. Before you type a single character, you may have already spent thousands of tokens on things you didn't consciously think about.
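To make the competition for space concrete, here's a hypothetical budget sketch. Every token count below is an illustrative assumption, not a measurement from any real session:

```python
# Hypothetical breakdown of a session's context window.
# All figures are illustrative assumptions.

CONTEXT_WINDOW = 200_000  # e.g. Claude Sonnet's window, per the article

budget = {
    "system prompt": 3_000,
    "conversation history": 40_000,
    "files read": 60_000,
    "MCP tool schemas": 50_000,
    "tool call results": 25_000,
}

used = sum(budget.values())
for item, tokens in budget.items():
    print(f"{item:<22} {tokens:>7,}  ({tokens / CONTEXT_WINDOW:.0%})")
print(f"{'remaining':<22} {CONTEXT_WINDOW - used:>7,}")
```

Under these assumptions, nearly 90% of the window is gone before the next message, and the chat itself is only one slice of it.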

How MCPs Eat Your Token Budget

MCP servers (servers that speak the Model Context Protocol) let your agent connect to external services: Slack, GitHub, Sentry, your analytics platform. Each connected server adds its tool schemas to the context automatically. A single MCP server can consume 10,000 tokens just to initialize. Connect five servers and you might burn 50,000 tokens before any actual work begins.

That's not a small number. Claude Sonnet's context window is 200,000 tokens. You could blow through 25% of it on MCP definitions before the agent has read a single line of your code.
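The same arithmetic, made explicit. The per-server figure is the article's rough estimate, not a measured value:

```python
# The article's MCP overhead arithmetic, spelled out.
tokens_per_mcp_schema = 10_000   # rough cost of one server's tool schemas
connected_servers = 5
context_window = 200_000         # e.g. Claude Sonnet

mcp_overhead = tokens_per_mcp_schema * connected_servers
share = mcp_overhead / context_window
print(f"{mcp_overhead:,} tokens ({share:.0%} of the window) "
      "spent before any work begins")
```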

The right way to think about MCPs: treat them like browser tabs. Open the ones you need for this task. Close everything else.

Practical Tips for Keeping Sessions Clean

A few habits that actually help:

  • Connect only the MCPs you need for the current task. If you're writing frontend components, you probably don't need the Sentry MCP running.
  • Keep files small and scoped. When you ask the agent to read files, it loads them into context. Loading a 3,000-line file to check a 20-line function burns context you didn't need to burn.
  • Break long tasks into focused sessions. Plan in one session, implement in another, review in a third. Each session starts with a full, clean window.
  • Write tight instructions. Every line in your CLAUDE.md or AGENTS.md loads every session. Cut anything that isn't earning its space.
  • Start fresh after major decisions. Once you've planned something big and committed it, the conversation that led there isn't useful anymore. Clear it.
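For the instructions-file tip, a quick heuristic is enough to see whether a CLAUDE.md is earning its space. This sketch uses the common rule of thumb of roughly four characters per English token; it's an approximation, not a real tokenizer:

```python
# Rough token estimate for an instructions file.
# Rule of thumb: ~4 characters per English token (approximate).

def estimate_tokens(text: str) -> int:
    """Very rough token estimate for English prose."""
    return len(text) // 4

# Example instructions; in practice, read your own CLAUDE.md or AGENTS.md.
sample = (
    "Always use TypeScript strict mode.\n"
    "Prefer small, focused modules.\n"
)
print(estimate_tokens(sample), "tokens (rough estimate)")
```

Remember the multiplier: whatever this prints is paid again at the start of every single session, so trimming even a few hundred tokens compounds.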

Context Engineering: What Comes After Prompt Engineering

Most engineers learned about prompt engineering first: how to phrase requests, how to give the model the right framing. That's still useful. But I think the more important skill now is context engineering: managing what's in the window, not just what you say in it.

Prompt engineering is about what you say. Context engineering is about what the model can see. Both matter, but I think context engineering pays off more in 2026 and beyond, because sessions are getting longer, tasks are getting more complex, and the cost of a polluted window is growing.

Your AI coding agent doesn't have unlimited memory. It has a whiteboard. Your job is to keep that whiteboard usable.