
Context Engineering Part 1: Why Your AI Chatbot Forgets Everything

Published
5 min read

I'm a software engineer with 7+ years of experience designing, implementing, and debugging software, including backend services, automation tools, and mobile SDKs.

I love building things and currently building lessentext.com

Every Large Language Model has amnesia. And it's not a bug—it's a fundamental design constraint that costs companies millions in lost productivity and wrong code decisions.

In this first part of our Context Engineering series, we'll explore the root cause of AI memory loss and why understanding the context window is critical for building production AI systems.

The Context Window: AI's Working Memory

Think of a context window as a whiteboard in a meeting room. You can only write so much on it before you run out of space. When you do, you must erase something old to write something new—and whatever you erase is gone.

What is a Context Window?

Every LLM can only "see" a finite amount of text at any given time. This finite text space is called the context window. It's measured in tokens (roughly 0.75 words per token).
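Since providers both bill and truncate by tokens, it helps to estimate token counts before sending a prompt. Here is a minimal sketch using the rough 0.75-words-per-token ratio above; real tokenizers (such as OpenAI's tiktoken library) give exact counts, and the ratio varies by language and content:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token heuristic."""
    words = len(text.split())
    # words / 0.75 == tokens, since each token covers ~0.75 words on average
    return round(words / 0.75)

# A 750-word blog post is roughly 1,000 tokens.
```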

Here's how context windows have evolved over time:

| Model | Year | Context Window | Approx. Pages |
| --- | --- | --- | --- |
| GPT-3.5 | 2022 | 4K tokens | ~3 pages |
| GPT-4 | 2023 | 32K tokens | ~25 pages |
| Claude 3 | 2024 | 200K tokens | ~150 pages |
| Gemini 1.5 Pro | 2024 | 1M tokens | ~750 pages |
| GPT-4.1 | 2025 | 1M tokens | ~750 pages |
| Llama 4 Scout | 2025 | 10M tokens | ~7,500 pages |

The Real-World Impact

The context window directly constrains every AI application you build: how much conversation history, retrieved context, and instruction text the model can see at once, and how gracefully it degrades when that space runs out.

The Amnesia Problem: A Concrete Example

Here's a perfectly normal conversation that breaks without proper context management:

```
# Turn 1
user: "My name is Akash, I'm building a React app with TypeScript."
assistant: "Nice to meet you, Akash! What features are you implementing?"

# Turn 2
user: "User authentication and a real-time dashboard."
assistant: "Great choices. What's your backend stack?"

# Turn 3
user: "Node.js with PostgreSQL and Redis for caching."
assistant: "Solid stack! What specific issue are you facing?"

# Turn 4
user: "The WebSocket connections keep dropping."
assistant: "Let me help debug that. Can you share your config?"

# ...20 turns later...

# Turn 24
user: "What tech stack am I using again?"
assistant: "I'm not sure — could you remind me?"  # ← CONTEXT LOST
```

The model didn't forget because it's bad. It forgot because those early messages were pushed out of the context window.
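What "pushed out" means mechanically can be sketched in a few lines: a naive chat loop keeps dropping the oldest messages until the remainder fits the window. The function names here are illustrative, not any particular SDK:

```python
def fit_to_window(messages: list[str], max_tokens: int, count_tokens) -> list[str]:
    """Drop the oldest messages until the conversation fits the budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # Turn 1 ("My name is Akash...") is the first to go
    return kept

history = ["My name is Akash, React+TS", "Auth + dashboard", "Node/Postgres/Redis"]
# Toy budget and tokenizer: count whitespace-separated words as tokens.
kept = fit_to_window(history, max_tokens=8, count_tokens=lambda m: len(m.split()))
```

With this budget, the user's name and frontend stack are the first casualties, which is exactly why Turn 24 fails.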

What Happens When the Window Overflows?

When new messages arrive but the window is full, one of four things happens:

The Four Failure Modes

| Failure Mode | What You See | Example |
| --- | --- | --- |
| Context Drift | Model loses the original topic | Started discussing React, now answering about Python |
| Repetition | Model re-asks for information already provided | "What framework are you using?" (you said React 5 turns ago) |
| Information Loss | Important details silently dropped | User's constraints, preferences, and prior decisions are gone |
| Context Overflow | Hard crash, no response | `Error: This model's maximum context length is 4097 tokens` |
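The fourth failure mode, the hard crash, is the easiest to prevent: check the budget client-side before calling the model. A sketch under assumed names (the error class and the 512-token reserve are illustrative choices, not a provider API):

```python
class ContextOverflowError(Exception):
    """Raised client-side before the provider rejects the request."""

def check_budget(messages, max_tokens, count_tokens, reserve=512):
    """Fail fast if the prompt won't fit; 'reserve' leaves room for the reply."""
    used = sum(count_tokens(m) for m in messages)
    if used > max_tokens - reserve:
        raise ContextOverflowError(
            f"{used} prompt tokens exceed budget of {max_tokens - reserve}"
        )
    return used
```

Catching this locally lets you trigger truncation or summarization instead of surfacing a raw API error to the user.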

Context Window vs Human Memory

Humans don't have this problem (at least not this badly). We compress experiences into long-term memory, recall details selectively by relevance rather than recency, and forget gracefully instead of truncating at a hard boundary. An LLM's context window offers none of this: it is a fixed-size buffer, and whatever falls outside it simply does not exist for the model.

The Fundamental Challenge

The fundamental challenge of context engineering is: How do we give LLMs something resembling human memory management—selective, prioritized, and graceful—within a rigid token budget?
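A first step toward "selective and prioritized" is to rank messages by importance instead of recency, and greedily keep the most important ones that fit. This toy sketch assumes a hand-assigned priority field (lower = more important); real systems would score relevance automatically:

```python
def select_context(messages, max_tokens, count_tokens):
    """Keep the highest-priority messages that fit the token budget."""
    order = sorted(range(len(messages)), key=lambda i: messages[i]["priority"])
    kept, used = set(), 0
    for i in order:
        cost = count_tokens(messages[i]["text"])
        if used + cost <= max_tokens:
            kept.add(i)
            used += cost
    # Restore chronological order so the transcript still reads correctly
    return [m for i, m in enumerate(messages) if i in kept]

msgs = [
    {"text": "You are a helpful coding assistant.", "priority": 0},   # system
    {"text": "Earlier small talk we can afford to lose.", "priority": 2},
    {"text": "Stack: Node.js, PostgreSQL, Redis.", "priority": 1},    # pinned fact
]
selected = select_context(msgs, max_tokens=10, count_tokens=lambda t: len(t.split()))
```

Note the design trade-off: unlike recency-based truncation, this keeps the system prompt and pinned facts while sacrificing low-value chatter.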

This is the question the rest of this series answers. We'll explore:

  1. Naive solutions (sliding windows) and why they fail

  2. Smarter strategies (token-based management)

  3. Compression techniques (summarization)

  4. Modern approaches (RAG, tool use, memory systems)

  5. The current frontier (long context models and their limitations)

Why This Matters for Your Business

Poor context management isn't just an annoyance—it has real business impact:

  • Lost Productivity: Teams spend time re-explaining context

  • Wrong Decisions: AI makes contradictory recommendations

  • Poor User Experience: Chatbots feel forgetful and unintelligent

  • Increased Costs: Inefficient token usage leads to higher API bills

  • Security Risks: Safety rules and compliance constraints can be silently pushed out of context
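The cost point is easy to quantify: a chat resends the entire growing transcript on every turn, so you pay for the same history again and again. A back-of-envelope sketch (the prices are hypothetical USD per million tokens, not any provider's actual rates):

```python
def turn_cost(prompt_tokens, completion_tokens, in_price=3.0, out_price=15.0):
    """Cost of one API call; prices are hypothetical $/1M tokens."""
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000

# A chat that resends a 50K-token history for 20 turns bills ~1M prompt tokens.
total = sum(turn_cost(50_000, 500) for _ in range(20))
```

Trimming or summarizing history attacks this bill directly, which is why context engineering is as much a cost discipline as a quality one.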

What's Next?

In Part 2, we'll dive into the most common solution—the sliding window—and explore why this seemingly reasonable approach is actually a trap that causes more problems than it solves.


Key Takeaways

  1. Context windows are the fundamental constraint of working with LLMs

  2. Every other challenge flows from this limitation

  3. Simple solutions fail in production systems

  4. Context engineering is a critical production discipline


This is Part 1 of a 6-part series on Context Engineering. Read Part 2: The Sliding Window Trap to learn about common pitfalls and their solutions.

References:

  • Karpathy, A. (2025). "Context Engineering" — X/Twitter post

  • Liu, N., et al. (2024). "Lost in the Middle: How Language Models Use Long Contexts" — Transactions of the ACL


Found this helpful? Follow me on LinkedIn for more insights on AI engineering and subscribe to get notified about the next parts in this series.
