Context Engineering Part 4: Teaching AI to Take Smart Notes
I'm a Software Engineer with 7+ years of experience designing, implementing, and debugging software, including backend services, automation tools, and mobile SDKs. I love building things, and I'm currently building lessentext.com.
Instead of dropping old messages entirely, what if we could compress them into a summary that preserves the essential information while using far fewer tokens?
This is the promise of summarization-based context management.
The Key Idea: Compress Without Losing Meaning
Token savings: ~90% reduction
Context preserved: key facts retained
The Summarization Flow
Implementation
class SummarizingContextManager:
    def __init__(self, max_tokens=4000, threshold=0.8):
        self.max_tokens = max_tokens
        self.threshold = threshold  # Summarize at 80% capacity
        self.messages = []
        self.summaries = []

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if self._usage_ratio() > self.threshold:
            self._summarize_older_messages()

    def _usage_ratio(self):
        # Rough estimate: ~4 characters per token
        used = sum(len(m["content"]) // 4 for m in self.summaries + self.messages)
        return used / self.max_tokens

    def _summarize_older_messages(self):
        # Keep the 3 most recent messages verbatim
        recent = self.messages[-3:]
        older = self.messages[:-3]
        if not older:
            return
        # Compress everything older into a summary using the LLM
        summary = self._call_llm_for_summary(older)
        self.summaries.append({
            "role": "system",
            "content": f"Previous conversation summary: {summary}"
        })
        # Replace the old messages with the summary
        self.messages = recent

    def get_full_context(self):
        return self.summaries + self.messages

    def _call_llm_for_summary(self, messages):
        conversation = "\n".join(
            f"{m['role']}: {m['content']}" for m in messages
        )
        prompt = f"""Summarize this conversation concisely. Preserve:
- User's goals and requirements
- Technical decisions made
- Problems and solutions discussed
- Constraints and preferences

Conversation:
{conversation}"""
        return llm_api_call(prompt)  # llm_api_call: your LLM client wrapper
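To see the manager in action, here is a minimal, self-contained sketch of the same class with `llm_api_call` stubbed out so it runs without an API key. The stub, the small `max_tokens` value, and the characters-per-token estimate are assumptions for illustration only.

```python
def llm_api_call(prompt):
    # Stub: a real implementation would call your LLM provider here
    return "user is building an inventory app; chose PostgreSQL"

class SummarizingContextManager:
    def __init__(self, max_tokens=50, threshold=0.8):
        self.max_tokens = max_tokens
        self.threshold = threshold
        self.messages = []
        self.summaries = []

    def _usage_ratio(self):
        # Rough estimate: ~4 characters per token
        used = sum(len(m["content"]) // 4 for m in self.summaries + self.messages)
        return used / self.max_tokens

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if self._usage_ratio() > self.threshold:
            self._summarize_older_messages()

    def _summarize_older_messages(self):
        recent, older = self.messages[-3:], self.messages[:-3]
        if not older:
            return
        summary = llm_api_call("Summarize these messages ...")
        self.summaries.append({"role": "system",
                               "content": f"Previous conversation summary: {summary}"})
        self.messages = recent

    def get_full_context(self):
        return self.summaries + self.messages

mgr = SummarizingContextManager(max_tokens=50)
for i in range(6):
    mgr.add_message("user", f"Message {i}: some fairly long content about the project")
context = mgr.get_full_context()
# context now holds system-role summaries first, then the 3 most recent messages
```

With a tiny 50-token budget, summarization triggers repeatedly: each pass folds the oldest message into a summary while the three newest stay verbatim.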
Summarization Strategies
1. Incremental Summarization
Summarize in small chunks rather than all at once:
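The article leaves this step as prose, so here is a hedged sketch of one way to do it: a rolling summary, where each new chunk of old messages is folded into the existing summary instead of re-summarizing the whole history. The `summarize` stub stands in for a real LLM call; all names here are illustrative.

```python
def summarize(text):
    # Stub: a real implementation would send `text` to an LLM
    return f"[summary of {len(text)} chars]"

def incremental_summarize(prev_summary, old_chunk):
    """Fold a small chunk of old messages into the running summary."""
    chunk_text = "\n".join(f"{m['role']}: {m['content']}" for m in old_chunk)
    return summarize(
        f"Existing summary:\n{prev_summary}\n\nNew messages:\n{chunk_text}"
    )

# Each call compresses only the newest chunk, never the whole history:
summary = ""
history = [{"role": "user", "content": f"msg {i}"} for i in range(9)]
for start in range(0, len(history), 3):  # chunks of 3 messages
    summary = incremental_summarize(summary, history[start:start + 3])
```

Because each LLM call sees only one chunk plus the previous summary, the cost per summarization stays roughly constant no matter how long the conversation gets.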
2. Structured Summarization
Use a structured template instead of free-form summaries:
def create_structured_summary(messages):
    conversation = "\n".join(
        f"{m['role']}: {m['content']}" for m in messages
    )
    prompt = """Create a structured summary with these sections:

GOALS: What the user wants to accomplish
TECH_STACK: Technologies and tools mentioned
DECISIONS: Important decisions made
PROBLEMS: Issues discussed and their status
CONSTRAINTS: Limitations or requirements

Conversation:
{conversation}"""
    return llm_api_call(prompt.format(conversation=conversation))
Output:
SUMMARY:
GOALS: Build inventory management system with real-time tracking
TECH_STACK: React, TypeScript, Django, PostgreSQL, Redis, AWS ECS
DECISIONS: Using WebSocket for real-time, Redis for caching
PROBLEMS: WebSocket connections dropping — investigating nginx config
CONSTRAINTS: Must handle 10K concurrent users, deploy by Q3
3. Hierarchical Summarization
Multiple levels of detail for different needs:
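A hedged sketch of the idea, with illustrative names: keep summaries at several granularities, and when a finer level fills up, roll it into the level above. The `summarize` stub stands in for an LLM call.

```python
def summarize(texts):
    # Stub: a real implementation would ask an LLM to compress these texts
    return " | ".join(t[:20] for t in texts)

class HierarchicalSummary:
    """Level 0: raw messages; level 1: chunk summaries; level 2: summary of summaries."""

    def __init__(self, chunk_size=3):
        self.chunk_size = chunk_size
        self.levels = [[], [], []]

    def add(self, message):
        self.levels[0].append(message)
        # When a level fills up, compress it into the level above
        for lvl in (0, 1):
            if len(self.levels[lvl]) >= self.chunk_size:
                self.levels[lvl + 1].append(summarize(self.levels[lvl]))
                self.levels[lvl] = []

    def context(self, detail="fine"):
        # "coarse": only top-level summaries; "fine": every level, oldest first
        if detail == "coarse":
            return self.levels[2]
        return self.levels[2] + self.levels[1] + self.levels[0]

h = HierarchicalSummary()
for i in range(9):
    h.add(f"message {i}")
coarse = h.context("coarse")
```

A caller that needs a quick orientation can request the coarse view, while one that needs recent detail can take the fine view; both come from the same structure.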
The Trade-offs: What Gets Lost in Summarization?
The Deepest Problem: Decision-Rationale Separation
There's a specific failure mode of summarization that's worse than general information loss:
This is decision-rationale separation—when compression preserves what was decided but loses why it was decided. The AI then feels free to override the decision because it doesn't see the constraint behind it.
This is the most dangerous failure in AI-assisted software development because:
The user doesn't know the rationale was lost (it's invisible)
The AI sounds confident (it doesn't know it's contradicting anything)
The result is wrong code, architectural inconsistencies, or compliance violations
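One mitigation, offered here as a suggestion rather than an established technique: make the summary prompt record every decision as a (decision, rationale) pair, so the constraint travels with the choice. The function and prompt below are illustrative, with the LLM stubbed out.

```python
SUMMARY_PROMPT = """Summarize this conversation. Record EVERY decision as a
(decision, rationale) pair; never a decision alone. Use this format:

DECISION: <what was chosen>
BECAUSE: <the constraint or reason behind it>

Conversation:
{conversation}"""

def summarize_with_rationale(messages, llm):
    conversation = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return llm(SUMMARY_PROMPT.format(conversation=conversation))

# With a stubbed LLM, the plumbing can be exercised end to end:
result = summarize_with_rationale(
    [{"role": "user", "content": "Use Postgres, we need transactions"}],
    llm=lambda prompt: "DECISION: Postgres\nBECAUSE: transactional guarantees required",
)
```

A summary shaped this way cannot carry a decision forward without its rationale, which directly targets the separation failure described above.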
Beyond Summarization
Summarization is a massive improvement over simply dropping messages, but it has an inherent ceiling: the compression is lossy. Some information will always be lost.
What if, instead of trying to fit everything into the window, we could reach outside it?
This brings us to modern context engineering: RAG, tool use, and memory systems.
Read Part 5: RAG, Tools, and the Context Engineering Stack to learn how modern approaches extend beyond the context window.
#AI #Summarization #ContextEngineering #LLM
