Context Engineering Part 1: Why Your AI Chatbot Forgets Everything
I'm a Software Engineer with 7+ years of experience designing, implementing, and debugging software, including backend services, automation tools, and mobile SDKs.
I love building things, and I'm currently building lessentext.com.
Every Large Language Model has amnesia. And it's not a bug—it's a fundamental design constraint that costs companies millions in lost productivity and wrong code decisions.
In this first part of our Context Engineering series, we'll explore the root cause of AI memory loss and why understanding the context window is critical for building production AI systems.
The Context Window: AI's Working Memory
Think of a context window as a whiteboard in a meeting room. You can only write so much on it before you run out of space. When you do, you must erase something old to write something new—and whatever you erase is gone.
What is a Context Window?
Every LLM can only "see" a finite amount of text at any given time. This finite text space is called the context window. It's measured in tokens (roughly 0.75 words per token).
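The 0.75-words-per-token ratio gives you a quick back-of-the-envelope budget check. Here's a minimal sketch using that heuristic; real counts vary by tokenizer (OpenAI's `tiktoken` library gives exact numbers for their models), so treat this as an estimate only:

```python
WORDS_PER_TOKEN = 0.75  # rough English average; real tokenizers vary

def estimate_tokens(text: str) -> int:
    """Estimate token count from word count using the ~0.75 words/token rule."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)

def fits_in_window(text: str, window_tokens: int = 4096) -> bool:
    """Check whether text likely fits within a model's context window."""
    return estimate_tokens(text) <= window_tokens
```

This is useful as a pre-flight check before sending a prompt, though production systems should use the model's actual tokenizer.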
Here's how context windows have evolved over time:
| Model | Year | Context Window | Approx. Pages |
|---|---|---|---|
| GPT-3.5 | 2022 | 4K tokens | ~3 pages |
| GPT-4 | 2023 | 32K tokens | ~25 pages |
| Claude 3 | 2024 | 200K tokens | ~150 pages |
| Gemini 1.5 Pro | 2024 | 1M tokens | ~750 pages |
| GPT-4.1 | 2025 | 1M tokens | ~750 pages |
| Llama 4 Scout | 2025 | 10M tokens | ~7,500 pages |
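The page estimates in the table follow directly from the token-to-word ratio. A small sketch of the conversion, assuming ~0.75 words per token and the ~1,000 words per page implied by the table's figures:

```python
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 1000  # dense page; the figure implied by the table above

def tokens_to_pages(tokens: int) -> float:
    """Convert a token budget to an approximate page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE
```

For example, Claude 3's 200K-token window works out to `tokens_to_pages(200_000)` = 150 pages, matching the table.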
The Real-World Impact
The context window directly constrains what your AI application can remember, reason over, and respond to. The clearest way to see this is a conversation that quietly breaks.
The Amnesia Problem: A Concrete Example
Here's a perfectly normal conversation that breaks without proper context management:
# Turn 1
user: "My name is Akash, I'm building a React app with TypeScript."
assistant: "Nice to meet you, Akash! What features are you implementing?"
# Turn 2
user: "User authentication and a real-time dashboard."
assistant: "Great choices. What's your backend stack?"
# Turn 3
user: "Node.js with PostgreSQL and Redis for caching."
assistant: "Solid stack! What specific issue are you facing?"
# Turn 4
user: "The WebSocket connections keep dropping."
assistant: "Let me help debug that. Can you share your config?"
# ...20 turns later...
# Turn 24
user: "What tech stack am I using again?"
assistant: "I'm not sure — could you remind me?" # ← CONTEXT LOST
The model didn't forget because it's bad. It forgot because those early messages were pushed out of the context window.
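You can reproduce this eviction mechanically with a toy fixed-capacity history. This is a simplified sketch (real systems evict by token count, not message count, and the class name is invented for illustration):

```python
from collections import deque

class NaiveChatHistory:
    """Keeps only the last `max_messages` turns; older turns fall off silently."""

    def __init__(self, max_messages: int = 6):
        # deque with maxlen evicts the oldest entry automatically on append
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def visible_context(self) -> list:
        return list(self.messages)

history = NaiveChatHistory(max_messages=6)
history.add("user", "My name is Akash, I'm building a React app with TypeScript.")
for i in range(10):  # ten more turns push the opening message out
    history.add("user", f"follow-up question {i}")

# The opening message (the user's name and stack) is no longer visible:
visible = [m["content"] for m in history.visible_context()]
```

No error is raised and nothing is logged: the model simply never sees the evicted turns, which is exactly why the forgetting feels so abrupt to users.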
What Happens When the Window Overflows?
When new messages arrive but the window is full, one of four things happens:
The Four Failure Modes
| Failure Mode | What You See | Example |
|---|---|---|
| Context Drift | Model loses the original topic | Started discussing React, now answering about Python |
| Repetition | Model re-asks for information already provided | "What framework are you using?" (you said React 5 turns ago) |
| Information Loss | Important details silently dropped | User's constraints, preferences, prior decisions — gone |
| Context Overflow | Hard crash, no response | Error: This model's maximum context length is 4097 tokens |
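Of the four modes, only the hard crash is easy to catch, and you can catch it proactively by estimating conversation size before calling the model. A hedged sketch (the limit value and the word-count heuristic are illustrative assumptions, not a specific provider's API):

```python
class ContextOverflowError(Exception):
    """Raised when a conversation would exceed the model's context window."""

def estimate_tokens(text: str) -> int:
    return round(len(text.split()) / 0.75)  # rough 0.75 words/token heuristic

def check_budget(messages: list, max_tokens: int = 4097) -> int:
    """Raise before the API does if the conversation exceeds the window."""
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total > max_tokens:
        raise ContextOverflowError(
            f"Conversation is ~{total} tokens; model limit is {max_tokens}."
        )
    return total
```

The other three failure modes are harder: drift, repetition, and silent information loss produce no error at all, which is why they dominate in production.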
Context Window vs Human Memory
Humans don't have this problem (at least not this badly). Human memory is selective (we keep what matters), prioritized (important details stick while small talk fades), and graceful (we forget gradually rather than truncating abruptly). An LLM, by contrast, drops whatever falls outside the window, regardless of how important it was.
The Fundamental Challenge
The fundamental challenge of context engineering is: How do we give LLMs something resembling human memory management—selective, prioritized, and graceful—within a rigid token budget?
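To make "selective, prioritized" concrete, here is a toy sketch of priority-based retention within a token budget. The `priority` field, the budget, and the word-count cost function are all invented for illustration; later parts of this series cover real strategies:

```python
def select_context(messages, budget, estimate=lambda m: len(m["content"].split())):
    """Greedy selective retention: keep the highest-priority messages that fit
    the budget, then restore chronological order."""
    chosen, used = [], 0
    # Highest priority first; ties broken by recency (later index wins)
    for idx, msg in sorted(enumerate(messages),
                           key=lambda p: (-p[1]["priority"], -p[0])):
        cost = estimate(msg)
        if used + cost <= budget:
            chosen.append((idx, msg))
            used += cost
    return [m for _, m in sorted(chosen)]  # back to chronological order

messages = [
    {"priority": 3, "content": "System: you are a coding assistant"},
    {"priority": 2, "content": "User stack: Node.js PostgreSQL Redis"},
    {"priority": 1, "content": "small talk about the weather today"},
    {"priority": 2, "content": "Current bug: WebSocket connections dropping"},
]
kept = select_context(messages, budget=16)
```

With this budget, the small talk is evicted while the system message, the user's stack, and the current bug survive, which is roughly the behavior a human would exhibit.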
This is the question the rest of this series answers. We'll explore:
- Naive solutions (sliding windows) and why they fail
- Smarter strategies (token-based management)
- Compression techniques (summarization)
- Modern approaches (RAG, tool use, memory systems)
- The current frontier (long-context models and their limitations)
Why This Matters for Your Business
Poor context management isn't just an annoyance—it has real business impact:
- **Lost Productivity:** Teams spend time re-explaining context
- **Wrong Decisions:** AI makes contradictory recommendations
- **Poor User Experience:** Chatbots feel forgetful and unintelligent
- **Increased Costs:** Inefficient token usage leads to higher API bills
- **Security Risks:** Important constraints and requirements get silently dropped
What's Next?
In Part 2, we'll dive into the most common solution—the sliding window—and explore why this seemingly reasonable approach is actually a trap that causes more problems than it solves.
Key Takeaways
- Context windows are the fundamental constraint of working with LLMs
- Every other challenge flows from this limitation
- Simple solutions fail in production systems
- Context engineering is a critical production discipline
This is Part 1 of a 6-part series on Context Engineering. Read Part 2: The Sliding Window Trap to learn about common pitfalls and their solutions.
References:
Karpathy, A. (2025). "Context Engineering" — X/Twitter post
Liu, N. F., et al. (2024). "Lost in the Middle: How Language Models Use Long Contexts" — Transactions of the ACL
Found this helpful? Follow me on LinkedIn for more insights on AI engineering and subscribe to get notified about the next parts in this series.
