
Context Engineering Part 4: Teaching AI to Take Smart Notes

Instead of dropping old messages entirely, what if we could compress them into a summary that preserves essential information while using far fewer tokens?


This is the promise of summarization-based context management.

The Key Idea: Compress Without Losing Meaning

  • Token savings: roughly 90% reduction

  • Context preserved: key facts retained

The Summarization Flow

When a new message pushes the context past a capacity threshold, the older messages are compressed into a summary, the summary is prepended as a system message, and only the most recent messages are kept verbatim.

Implementation

class SummarizingContextManager:
    def __init__(self, max_tokens=4000, threshold=0.8):
        self.max_tokens = max_tokens
        self.threshold = threshold  # Summarize at 80% capacity
        self.messages = []
        self.summaries = []

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})

        if self._usage_ratio() > self.threshold:
            self._summarize_older_messages()

    def _usage_ratio(self):
        # Rough heuristic: ~4 characters per token for English text.
        # Swap in a real tokenizer for production use.
        used = sum(len(m["content"]) // 4 for m in self.get_full_context())
        return used / self.max_tokens

    def _summarize_older_messages(self):
        # Keep the 3 most recent messages verbatim
        recent = self.messages[-3:]
        older = self.messages[:-3]

        if not older:
            return

        # Compress everything older into a summary using the LLM
        summary = self._call_llm_for_summary(older)
        self.summaries.append({
            "role": "system",
            "content": f"Previous conversation summary: {summary}"
        })

        # Replace old messages with the summary
        self.messages = recent

    def get_full_context(self):
        return self.summaries + self.messages

    def _call_llm_for_summary(self, messages):
        conversation = "\n".join(
            f"{m['role']}: {m['content']}" for m in messages
        )
        prompt = f"""Summarize this conversation concisely. Preserve:
- User's goals and requirements
- Technical decisions made
- Problems and solutions discussed
- Constraints and preferences

Conversation:
{conversation}"""

        # llm_api_call is a stand-in for your model provider's API
        return llm_api_call(prompt)
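
A quick usage sketch (the message contents here are invented for illustration):

mgr = SummarizingContextManager(max_tokens=4000)
mgr.add_message("user", "We need real-time inventory tracking for 10K users.")
mgr.add_message("assistant", "WebSockets fit that; Django Channels could handle it.")
# ...after enough turns, older messages are folded into a summary...
context = mgr.get_full_context()  # summaries first, then recent messages verbatim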

Summarization Strategies

1. Incremental Summarization

Summarize in small chunks rather than all at once:
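One way this could look, as a sketch: fold each small batch of older messages into a running summary, rather than summarizing the whole backlog in one shot. The batch size and prompt wording are illustrative, and `llm_api_call` is the same placeholder used above.

def chunks_of(items, size):
    # Yield consecutive batches of `size` messages
    for i in range(0, len(items), size):
        yield items[i:i + size]

def update_running_summary(running_summary, new_messages):
    # Fold one small batch into the existing summary
    batch = "\n".join(f"{m['role']}: {m['content']}" for m in new_messages)
    prompt = f"""Update the running summary with the new messages.
Merge overlapping points rather than appending; stay concise.

Current summary:
{running_summary or "(none yet)"}

New messages:
{batch}"""
    return llm_api_call(prompt)

# Usage: start with summary = "", then for each chunk of older messages:
#     summary = update_running_summary(summary, chunk)

The advantage over one-shot summarization is that each call only has to compress a small, fresh batch, which tends to lose less detail than compressing a long backlog at once.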

2. Structured Summarization

Use a structured template instead of free-form summaries:

def create_structured_summary(messages):
    prompt = """Create a structured summary with these sections:

GOALS:       What the user wants to accomplish
TECH_STACK:  Technologies and tools mentioned
DECISIONS:   Important decisions made
PROBLEMS:    Issues discussed and their status
CONSTRAINTS: Limitations or requirements

Conversation:
{conversation}"""

    conversation = "\n".join(
        f"{m['role']}: {m['content']}" for m in messages
    )
    return llm_api_call(prompt.format(conversation=conversation))

Output:

SUMMARY:
  GOALS:       Build inventory management system with real-time tracking
  TECH_STACK:  React, TypeScript, Django, PostgreSQL, Redis, AWS ECS
  DECISIONS:   Using WebSocket for real-time, Redis for caching
  PROBLEMS:    WebSocket connections dropping — investigating nginx config
  CONSTRAINTS: Must handle 10K concurrent users, deploy by Q3

3. Hierarchical Summarization

Multiple levels of detail for different needs:
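A sketch of what this might look like in code; the segment size and prompt wording are assumptions, and `llm_api_call` is the placeholder from above. Level 1 keeps a detailed summary per segment of the conversation; level 2 condenses those into a single overview, so later steps can pick whichever level of detail they need.

def summarize_hierarchically(messages, segment_size=10):
    # Level 1: a detailed summary for each segment of the conversation
    detailed = []
    for i in range(0, len(messages), segment_size):
        segment = messages[i:i + segment_size]
        text = "\n".join(f"{m['role']}: {m['content']}" for m in segment)
        detailed.append(llm_api_call(f"Summarize in detail:\n{text}"))

    # Level 2: one short overview built from the detailed summaries
    overview = llm_api_call(
        "Condense these summaries into one short overview:\n" + "\n".join(detailed)
    )
    return {"overview": overview, "details": detailed}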

The Trade-offs of Summarization

Summarization buys back room in the context window, but it is not free.

What Gets Lost in Summarization?

Summaries are lossy by design: exact wording and code snippets disappear, low-salience details that later turn out to matter get dropped, and nuance collapses into generalities. One kind of loss, though, is worse than all the others.

The Deepest Problem: Decision-Rationale Separation

There's a specific failure mode of summarization that's worse than general information loss. Suppose the conversation contained "Use PostgreSQL, because the client's compliance policy requires self-hosted audit logging." The summary keeps "Decision: use PostgreSQL" but drops the reason, and in a later session the model cheerfully proposes migrating to a managed NoSQL service.

This is decision-rationale separation—when compression preserves what was decided but loses why it was decided. The AI then feels free to override the decision because it doesn't see the constraint behind it.

This is the most dangerous failure in AI-assisted software development because:

  • The user doesn't know the rationale was lost (it's invisible)

  • The AI sounds confident (it doesn't know it's contradicting anything)

  • The result is wrong code, architectural inconsistencies, or compliance violations
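
One partial guard, sketched here under the same `llm_api_call` placeholder: force the summarizer to record every decision and its rationale as an inseparable pair, so compression can't keep one without the other.

def summarize_with_rationale(messages):
    conversation = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    prompt = f"""Summarize this conversation. Record every decision as
'DECISION: <what> BECAUSE: <why>'. Never record a decision without its
stated reason; if no reason was given, write 'BECAUSE: unstated'.

Conversation:
{conversation}"""
    return llm_api_call(prompt)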

Beyond Summarization

Summarization is a massive improvement over simply dropping messages, but it has an inherent ceiling: the compression is lossy. Some information will always be lost.

What if, instead of trying to fit everything into the window, we could reach outside it?

This brings us to modern context engineering: RAG, tool use, and memory systems.


Read Part 5: RAG, Tools, and the Context Engineering Stack to learn how modern approaches extend beyond the context window.

#AI #Summarization #ContextEngineering #LLM
