<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Akash's Engineering]]></title><description><![CDATA[Akash's Engineering]]></description><link>https://blog.akashpanchal.com</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 15:16:22 GMT</lastBuildDate><atom:link href="https://blog.akashpanchal.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Context Engineering Part 2: The Sliding Window Trap]]></title><description><![CDATA[The most common solution to AI amnesia is also the worst: sliding windows. It seems reasonable—keep only the most recent messages, drop the oldest when full. But this seemingly logical approach create]]></description><link>https://blog.akashpanchal.com/context-engineering-part-2-sliding-window-trap</link><guid isPermaLink="true">https://blog.akashpanchal.com/context-engineering-part-2-sliding-window-trap</guid><dc:creator><![CDATA[Akash Panchal]]></dc:creator><pubDate>Mon, 16 Mar 2026 07:21:54 GMT</pubDate><content:encoded><![CDATA[<p>The most common solution to AI amnesia is also the worst: sliding windows. It seems reasonable—keep only the most recent messages, drop the oldest when full. But this seemingly logical approach creates more problems than it solves.</p>
<h2><strong>The Sliding Window Approach</strong></h2>
<p>Here's how sliding windows work:</p>
<img src="https://cdn.hashnode.com/uploads/covers/618f409be61d1c383da7b550/cfbe312e-e653-4a3f-92dd-9e5720a68392.png" alt="" style="display:block;margin:0 auto" />

<h3><strong>Implementation</strong></h3>
<pre><code class="language-python">class SlidingWindow:
    def __init__(self, window_size=10):
        self.window_size = window_size
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)
        if len(self.messages) &gt; self.window_size:
            self.messages = self.messages[-self.window_size:]

    def get_context(self):
        return self.messages
</code></pre>
<p>Simple. Clean. And <strong>deeply flawed</strong>.</p>
<h2><strong>The Critical Problem: You Lose What Matters Most</strong></h2>
<p>Consider this real conversation:</p>
<img src="https://cdn.hashnode.com/uploads/covers/618f409be61d1c383da7b550/6a899dbf-8e54-4254-bc6e-d3d74a5842b9.png" alt="" style="display:block;margin:0 auto" />

<p><strong>LOST</strong>: We're building an inventory system, using Django, with PostgreSQL, needing real-time tracking.</p>
<p>The AI knows about AWS and 10K users, but has no idea <em>what we're actually building</em>.</p>
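<p>To make the loss concrete, here's a standalone replay through the class above (reproduced so the snippet runs on its own), using a window of four messages:</p>

```python
class SlidingWindow:
    def __init__(self, window_size=10):
        self.window_size = window_size
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)
        if len(self.messages) > self.window_size:
            self.messages = self.messages[-self.window_size:]

    def get_context(self):
        return self.messages

window = SlidingWindow(window_size=4)
for msg in [
    "We're building an inventory system in Django with PostgreSQL.",
    "It needs real-time stock tracking.",
    "Deploying on AWS.",
    "Expecting around 10K users.",
    "Should we use Celery for background jobs?",
    "What about caching?",
]:
    window.add_message(msg)

# The two project-defining messages were silently evicted;
# only the infrastructure chatter remains.
print(window.get_context())
```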
<h2><strong>System Prompt Vulnerability</strong></h2>
<p>Even worse—losing the system prompt:</p>
<img src="https://cdn.hashnode.com/uploads/covers/618f409be61d1c383da7b550/ad663dfe-f7d1-413f-867f-da3557019c4d.png" alt="" style="display:block;margin:0 auto" />

<p>This isn't hypothetical—it's a real vulnerability in production systems.</p>
<h2><strong>When Sliding Window Works</strong></h2>
<p>Despite flaws, it works for:</p>
<ul>
<li><p>Quick Q&amp;A (independent questions)</p>
</li>
<li><p>Translation tasks (no long-term context)</p>
</li>
<li><p>Stateless API calls (self-contained requests)</p>
</li>
<li><p>Real-time chat support (only recent messages matter)</p>
</li>
</ul>
<h2><strong>Priority Sliding Window: A Small Fix</strong></h2>
<p>Always keep the system prompt:</p>
<pre><code class="language-python">def priority_sliding(messages, window_size):
    system_msgs = [m for m in messages if m['role'] == 'system']
    other_msgs = [m for m in messages if m['role'] != 'system']

    available_space = window_size - len(system_msgs)
    # Guard the edge case: other_msgs[-0:] would return ALL messages, not none
    recent_msgs = other_msgs[-available_space:] if available_space &gt; 0 else []

    return system_msgs + recent_msgs
</code></pre>
<p><strong>Better:</strong> System instructions preserved.<br /><strong>Still bad:</strong> Early conversation context lost.</p>
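<p>Here's the function in runnable form, with a small guard added (if <code>available_space</code> is 0, <code>other_msgs[-0:]</code> would return every message), plus a quick demo of what survives:</p>

```python
def priority_sliding(messages, window_size):
    system_msgs = [m for m in messages if m['role'] == 'system']
    other_msgs = [m for m in messages if m['role'] != 'system']

    available_space = window_size - len(system_msgs)
    # Guard: other_msgs[-0:] would return ALL messages, not none
    recent_msgs = other_msgs[-available_space:] if available_space > 0 else []

    return system_msgs + recent_msgs

messages = [
    {'role': 'system', 'content': 'You are a Django expert.'},
    {'role': 'user', 'content': "We're building an inventory system."},
    {'role': 'user', 'content': 'Deploying on AWS.'},
    {'role': 'user', 'content': 'Expecting 10K users.'},
    {'role': 'user', 'content': 'Should we use Celery?'},
]
context = priority_sliding(messages, window_size=3)
# System prompt survives; the inventory-system context is still lost.
print([m['content'] for m in context])
```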
<h2><strong>The Core Issue</strong></h2>
<p>Sliding windows treat all messages as equally disposable. But in reality, some context is more valuable than others.</p>
<p>This brings us to token-based management—our next topic.</p>
<hr />
<p><em>Read <strong>Part 3: Beyond Message Counting</strong> to learn how smart token allocation solves some of these problems.</em></p>
<p>#AI #ContextEngineering #SoftwareDevelopment #Tech</p>
]]></content:encoded></item><item><title><![CDATA[Context Engineering Part 1: Why Your AI Chatbot Forgets Everything]]></title><description><![CDATA[Every Large Language Model has amnesia. And it's not a bug—it's a fundamental design constraint that costs companies millions in lost productivity and wrong code decisions.
In this first part of our C]]></description><link>https://blog.akashpanchal.com/context-engineering-part-1-why-your-ai-chatbot-forgets-everything</link><guid isPermaLink="true">https://blog.akashpanchal.com/context-engineering-part-1-why-your-ai-chatbot-forgets-everything</guid><dc:creator><![CDATA[Akash Panchal]]></dc:creator><pubDate>Thu, 12 Mar 2026 02:16:30 GMT</pubDate><content:encoded><![CDATA[<p>Every Large Language Model has amnesia. And it's not a bug—it's a fundamental design constraint that costs companies millions in lost productivity and wrong code decisions.</p>
<p>In this first part of our Context Engineering series, we'll explore the root cause of AI memory loss and why understanding the context window is critical for building production AI systems.</p>
<h2><strong>The Context Window: AI's Working Memory</strong></h2>
<p>Think of a context window as a whiteboard in a meeting room. You can only write so much on it before you run out of space. When you do, you must erase something old to write something new—and whatever you erase is <em>gone</em>.</p>
<h3><strong>What is a Context Window?</strong></h3>
<p>Every LLM can only "see" a finite amount of text at any given time. This finite text space is called the <strong>context window</strong>. It's measured in <strong>tokens</strong> (roughly 0.75 words per token).</p>
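<p>That 0.75-words-per-token figure is only a rule of thumb (production code should count with the model's actual tokenizer, e.g. tiktoken), but it's enough for back-of-the-envelope budgeting:</p>

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~0.75 words per token, so tokens ~= words / 0.75."""
    words = len(text.split())
    return max(1, round(words / 0.75))

# A 4K-token window holds roughly 3,000 words:
print(estimate_tokens("word " * 3000))  # 4000
```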
<p>Here's how context windows have evolved over time:</p>
<table>
<thead>
<tr>
<th><strong>Model</strong></th>
<th><strong>Year</strong></th>
<th><strong>Context Window</strong></th>
<th><strong>Approx. Pages</strong></th>
</tr>
</thead>
<tbody><tr>
<td>GPT-3.5</td>
<td>2022</td>
<td>4K tokens</td>
<td>~3 pages</td>
</tr>
<tr>
<td>GPT-4</td>
<td>2023</td>
<td>32K tokens</td>
<td>~25 pages</td>
</tr>
<tr>
<td>Claude 3</td>
<td>2024</td>
<td>200K tokens</td>
<td>~150 pages</td>
</tr>
<tr>
<td>Gemini 1.5 Pro</td>
<td>2024</td>
<td>1M tokens</td>
<td>~750 pages</td>
</tr>
<tr>
<td>GPT-4.1</td>
<td>2025</td>
<td>1M tokens</td>
<td>~750 pages</td>
</tr>
<tr>
<td>Llama 4 Scout</td>
<td>2025</td>
<td>10M tokens</td>
<td>~7,500 pages</td>
</tr>
</tbody></table>
<h3><strong>The Real-World Impact</strong></h3>
<p>The context window directly controls five critical factors in your AI applications:</p>
<img src="https://cdn.hashnode.com/uploads/covers/618f409be61d1c383da7b550/3f19b837-d78a-4092-9558-0a7ae24f85b1.png" alt="" style="display:block;margin:0 auto" />

<h2><strong>The Amnesia Problem: A Concrete Example</strong></h2>
<p>Here's a perfectly normal conversation that breaks without proper context management:</p>
<pre><code class="language-plaintext"># Turn 1
user: "My name is Akash, I'm building a React app with TypeScript."
assistant: "Nice to meet you, Akash! What features are you implementing?"

# Turn 2  
user: "User authentication and a real-time dashboard."
assistant: "Great choices. What's your backend stack?"

# Turn 3
user: "Node.js with PostgreSQL and Redis for caching."
assistant: "Solid stack! What specific issue are you facing?"

# Turn 4
user: "The WebSocket connections keep dropping."
assistant: "Let me help debug that. Can you share your config?"

# ...20 turns later...

# Turn 24
user: "What tech stack am I using again?"
assistant: "I'm not sure — could you remind me?"  # ← CONTEXT LOST
</code></pre>
<p>The model didn't forget because it's bad. It forgot because <strong>those early messages were pushed out of the context window</strong>.</p>
<h2><strong>What Happens When the Window Overflows?</strong></h2>
<p>When new messages arrive but the window is full, one of four things happens:</p>
<img src="https://cdn.hashnode.com/uploads/covers/618f409be61d1c383da7b550/4c63327e-7cc4-4d8c-84c2-78a2d991723c.png" alt="" style="display:block;margin:0 auto" />

<h3><strong>The Four Failure Modes</strong></h3>
<table>
<thead>
<tr>
<th><strong>Failure Mode</strong></th>
<th><strong>What You See</strong></th>
<th><strong>Example</strong></th>
</tr>
</thead>
<tbody><tr>
<td><strong>Context Drift</strong></td>
<td>Model loses the original topic</td>
<td>Started discussing React, now answering about Python</td>
</tr>
<tr>
<td><strong>Repetition</strong></td>
<td>Model re-asks for information already provided</td>
<td>"What framework are you using?" (you said React 5 turns ago)</td>
</tr>
<tr>
<td><strong>Information Loss</strong></td>
<td>Important details silently dropped</td>
<td>User's constraints, preferences, prior decisions — gone</td>
</tr>
<tr>
<td><strong>Context Overflow</strong></td>
<td>Hard crash, no response</td>
<td><code>Error: This model's maximum context length is 4097 tokens</code></td>
</tr>
</tbody></table>
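<p>The hard crash in the last row is avoidable with a pre-flight check before every API call. A minimal sketch, using a crude 4-characters-per-token estimate (real systems should use the model's tokenizer):</p>

```python
class ContextOverflowError(Exception):
    pass

def fit_to_budget(messages, max_tokens=4097):
    """Drop the oldest non-system messages until the estimated total fits."""
    def est(m):
        return len(m['content']) // 4 + 1  # crude chars/4 heuristic

    msgs = list(messages)
    while sum(est(m) for m in msgs) > max_tokens:
        # Evict the oldest non-system message first
        idx = next((i for i, m in enumerate(msgs) if m['role'] != 'system'), None)
        if idx is None:
            raise ContextOverflowError("system prompt alone exceeds the budget")
        del msgs[idx]
    return msgs

trimmed = fit_to_budget(
    [{'role': 'system', 'content': 'You are helpful.'}]
    + [{'role': 'user', 'content': 'x' * 400} for _ in range(3)],
    max_tokens=150,
)
print([m['role'] for m in trimmed])  # ['system', 'user']
```

Note this is exactly the sliding-window eviction Part 2 dissects, so it trades the crash for silent information loss rather than solving the underlying problem.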
<h2><strong>Context Window vs Human Memory</strong></h2>
<p>Humans don't have this problem (at least not this badly). Here's why:</p>
<img src="https://cdn.hashnode.com/uploads/covers/618f409be61d1c383da7b550/5c6490f3-2247-4101-b3ae-81fad3fb8a77.png" alt="" style="display:block;margin:0 auto" />

<h2><strong>The Fundamental Challenge</strong></h2>
<p>The fundamental challenge of context engineering is: <strong>How do we give LLMs something resembling human memory management—selective, prioritized, and graceful—within a rigid token budget?</strong></p>
<p>This is the question the rest of this series answers. We'll explore:</p>
<ol>
<li><p><strong>Naive solutions</strong> (sliding windows) and why they fail</p>
</li>
<li><p><strong>Smarter strategies</strong> (token-based management)</p>
</li>
<li><p><strong>Compression techniques</strong> (summarization)</p>
</li>
<li><p><strong>Modern approaches</strong> (RAG, tool use, memory systems)</p>
</li>
<li><p><strong>The current frontier</strong> (long context models and their limitations)</p>
</li>
</ol>
<h2><strong>Why This Matters for Your Business</strong></h2>
<p>Poor context management isn't just an annoyance—it has real business impact:</p>
<ul>
<li><p><strong>Lost Productivity</strong>: Teams spend time re-explaining context</p>
</li>
<li><p><strong>Wrong Decisions</strong>: AI makes contradictory recommendations</p>
</li>
<li><p><strong>Poor User Experience</strong>: Chatbots feel forgetful and unintelligent</p>
</li>
<li><p><strong>Increased Costs</strong>: Inefficient token usage leads to higher API bills</p>
</li>
<li><p><strong>Security Risks</strong>: Important constraints and requirements get lost</p>
</li>
</ul>
<h2><strong>What's Next?</strong></h2>
<p>In Part 2, we'll dive into the most common solution—the sliding window—and explore why this seemingly reasonable approach is actually a trap that causes more problems than it solves.</p>
<hr />
<h2><strong>Key Takeaways</strong></h2>
<ol>
<li><p><strong>Context windows are the fundamental constraint</strong> of working with LLMs</p>
</li>
<li><p><strong>Every other challenge flows from this limitation</strong></p>
</li>
<li><p><strong>Simple solutions fail in production systems</strong></p>
</li>
<li><p><strong>Context engineering is a critical production discipline</strong></p>
</li>
</ol>
<hr />
<p><em>This is Part 1 of a 6-part series on Context Engineering.</em> <em><strong>Read Part 2: The Sliding Window Trap</strong></em> <em>to learn about common pitfalls and their solutions.</em></p>
<p><strong>References:</strong></p>
<ul>
<li><p>Karpathy, A. (2025). "Context Engineering" — X/Twitter post</p>
</li>
<li><p>Liu, N., et al. (2024). "Lost in the Middle: How Language Models Use Long Contexts" — Transactions of the ACL</p>
</li>
</ul>
<hr />
<p><em>Found this helpful?</em> <a href="https://linkedin.com/in/akashpanchal"><em>Follow me on LinkedIn</em></a> <em>for more insights on AI engineering and subscribe to get notified about the next parts in this series.</em></p>
]]></content:encoded></item><item><title><![CDATA[I Solved Claude Code's Biggest Flaw: Context Compaction Amnesia]]></title><description><![CDATA[How a frustrating afternoon led to a breakthrough that changed how I use AI forever
The Breaking Point
It was 3 AM on a Tuesday. I'd been working with Claude Code for six hours straight, architecting ]]></description><link>https://blog.akashpanchal.com/crux-for-claude-code</link><guid isPermaLink="true">https://blog.akashpanchal.com/crux-for-claude-code</guid><category><![CDATA[claude-code]]></category><category><![CDATA[plugins]]></category><dc:creator><![CDATA[Akash Panchal]]></dc:creator><pubDate>Wed, 25 Feb 2026 17:29:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/618f409be61d1c383da7b550/7eab1865-5664-44d7-9518-ec4d77dbe53e.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>How a frustrating afternoon led to a breakthrough that changed how I use AI forever</em></p>
<h2><strong>The Breaking Point</strong></h2>
<p>It was 3 AM on a Tuesday. I'd been working with Claude Code for six hours straight, architecting a complex microservices system. The conversation was long, detailed, and productive.</p>
<p>Until it wasn't.</p>
<p><strong>Me</strong>: "Let's use PostgreSQL for the user database since our team knows SQL well and we can't hire MongoDB experts."</p>
<p><strong>Claude (6 hours later)</strong>: "You know, for this user data use case, MongoDB might actually be better with its flexible schema..."</p>
<p>I stared at the screen, dumbfounded.</p>
<p>Six hours. Dozens of architectural decisions. And Claude had completely forgotten the most important constraint: <strong>our team only knows SQL</strong>.</p>
<p>This wasn't just annoying. It was actively harmful. All that context, all those decisions, and the AI was suggesting something that would require us to hire three new developers.</p>
<p>That's when I realized: Claude Code has a fundamental flaw.</p>
<h2><strong>The Hidden Flaw: Context Compaction Amnesia</strong></h2>
<p>Claude Code is incredible, but it has a dirty secret. As conversations get long, it performs "context compaction" - summarizing earlier parts to make room for new information.</p>
<p>The problem? <strong>Compaction strips away the WHY behind decisions.</strong></p>
<p>It remembers the <strong>WHAT</strong> ("Use PostgreSQL") but forgets the <strong>CONSTRAINT</strong> ("team only knows SQL") and <strong>RATIONALE</strong> ("can't hire MongoDB experts").</p>
<p>Without the <strong>WHY</strong>, decisions become mere suggestions that can be overridden.</p>
<p>Here's how it plays out in real projects:</p>
<pre><code class="language-plaintext">Turn 1: "Use React not Vue — our company standardized on React last year"
Turn 15: Claude suggests Vue for a new component
Turn 30: Claude recommends a Vue-specific library
Result: Architectural inconsistency, wasted time, confused team
</code></pre>
<p><strong>This isn't just a technical issue. It's a project management nightmare.</strong></p>
<h2><strong>The Lightbulb Moment</strong></h2>
<p>The next day, I was explaining this frustration to a colleague over coffee.</p>
<p>"The problem," I said, "is that Claude remembers WHAT we decided, but not WHY we decided it."</p>
<p>He looked at me and said: "What if you could weld the WHY to the WHAT so tightly that compaction could never separate them?"</p>
<p>That was it. That was the solution.</p>
<p>What if instead of storing flat facts, we stored <strong>causal triples</strong>?</p>
<pre><code class="language-plaintext">CONSTRAINT ⛔ Company standardized on React last year
     ↓
RATIONALE 💡 Team consistency and reduced training costs
     ↓
DECISION ▸ Use React not Vue
</code></pre>
<p>These three would be welded together. Compaction couldn't drop one without dropping all. Claude would always see the WHY.</p>
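<p>The idea fits in a few lines. Here's an illustrative sketch (not Crux's actual implementation) of a triple that can only be rendered whole:</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CausalTriple:
    """A decision welded to the constraint and rationale that produced it."""
    constraint: str
    rationale: str
    decision: str

    def render(self) -> str:
        # All three lines are emitted together -- they never travel separately.
        return (f"CONSTRAINT: {self.constraint}\n"
                f"RATIONALE: {self.rationale}\n"
                f"DECISION: {self.decision}")

triple = CausalTriple(
    constraint="Company standardized on React last year",
    rationale="Team consistency and reduced training costs",
    decision="Use React not Vue",
)
print(triple.render())
```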
<h2><strong>Building Crux: The Causal Memory Graph</strong></h2>
<p>I spent the next two weeks building <strong>Crux</strong> - a Claude Code plugin that implements this causal memory system.</p>
<h3><strong>The Architecture</strong></h3>
<p>Crux hooks directly into Claude Code's lifecycle:</p>
<img src="https://github.com/akashp1712/claude-crux/raw/main/claude-crux-sequence.png" alt="Crux Architecture Diagram" style="display:block;margin:0 auto" />

<ol>
<li><p><strong>SessionStart</strong>: Sets up the causal graph</p>
</li>
<li><p><strong>UserPromptSubmit</strong>: Extracts decisions from your requests</p>
</li>
<li><p><strong>PreCompact</strong>: The magic layer - injects co-inclusion rules</p>
</li>
<li><p><strong>Stop</strong>: Extracts decisions from Claude's responses</p>
</li>
<li><p><strong>SessionEnd</strong>: Cleans up for next session</p>
</li>
</ol>
<p>The PreCompact hook is where the magic happens. Right before Claude compacts context, Crux injects strict instructions:</p>
<blockquote>
<p>"You MUST NOT separate any DECISION from its RATIONALE and CONSTRAINT. These are co-included and must travel together."</p>
</blockquote>
<h3><strong>The Causal Triple System</strong></h3>
<p>Every decision in Crux becomes a causal triple:</p>
<pre><code class="language-json">{
  "decision": {
    "id": "decision_123",
    "content": "Use PostgreSQL for user database",
    "dependsOn": ["rationale_123", "constraint_123"],
    "source": "user",
    "turn": 15
  },
  "rationale": {
    "id": "rationale_123", 
    "content": "Team knows SQL well and can't hire MongoDB experts",
    "dependsOn": ["constraint_123"],
    "source": "user",
    "turn": 15
  },
  "constraint": {
    "id": "constraint_123",
    "content": "Team only knows SQL databases",
    "dependsOn": [],
    "source": "user", 
    "turn": 15
  }
}
</code></pre>
<p>These aren't just stored - they're actively injected back into Claude's context before every compaction.</p>
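<p>To see what that injection step involves, here is a hypothetical renderer (a sketch of the pattern, not the plugin's real code) that flattens triples in the JSON shape above into the pre-compaction block:</p>

```python
def render_injection(triples):
    """Render causal triples into a pre-compaction injection block."""
    lines = ["--- CRUX: Active Architectural Decisions ---", ""]
    for t in triples:
        # Decision, rationale, and constraint always emitted together
        lines.append(f"DECISION: {t['decision']['content']}")
        lines.append(f"RATIONALE: {t['rationale']['content']}")
        lines.append(f"CONSTRAINT: {t['constraint']['content']}")
        lines.append("")
    lines.append("Respect these decisions unless explicitly asked to revisit them.")
    return "\n".join(lines)

triples = [{
    "decision":   {"content": "Use PostgreSQL for user database"},
    "rationale":  {"content": "Team knows SQL well and can't hire MongoDB experts"},
    "constraint": {"content": "Team only knows SQL databases"},
}]
print(render_injection(triples))
```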
<h2><strong>The Transformation: Before vs After</strong></h2>
<h3><strong>Before Crux: The Amnesia Problem</strong></h3>
<pre><code class="language-plaintext">You: "Use PostgreSQL because our team knows SQL well and we can't hire MongoDB experts"
Claude: "Got it, PostgreSQL for user database."

...40 turns of development later...

Claude: "For the user analytics feature, have you considered MongoDB? Its flexible schema would be great for user behavior tracking."
You: "But we can't hire MongoDB experts..."
Claude: "Right, but you could use a managed service or hire a consultant..."
</code></pre>
<p><strong>Result</strong>: Wasted time, architectural inconsistency, frustrated team.</p>
<h3><strong>After Crux: Co-Inclusion Guarantee</strong></h3>
<pre><code class="language-plaintext">You: "Use PostgreSQL because our team knows SQL well and we can't hire MongoDB experts"
Crux: [Stores causal triple automatically]

...40 turns of development later...

Claude: "For the user analytics feature, PostgreSQL remains the best choice given your constraint that team only knows SQL and you can't hire MongoDB experts. Here's how we can optimize the queries..."
</code></pre>
<p><strong>Result</strong>: Consistent architecture, respected constraints, productive development.</p>
<h2><strong>Real-World Impact: A Case Study</strong></h2>
<p>Let me show you how this transformed a real project.</p>
<h3><strong>The Project: E-commerce Platform</strong></h3>
<p>Our team was building a complex e-commerce platform. Here's how the conversation evolved:</p>
<h4><strong>Week 1: Foundation Decisions</strong></h4>
<pre><code class="language-plaintext">You: "We'll use Next.js for frontend because our team has React experience and we need good SEO."
Crux: [Stores: CONSTRAINT=team React experience, RATIONALE=SEO needs, DECISION=Next.js]

You: "Payment processing must use Stripe because our legal team approved it and we can't use PayPal due to compliance issues."
Crux: [Stores: CONSTRAINT=compliance issues, RATIONALE=legal approval, DECISION=Stripe]
</code></pre>
<h4><strong>Week 3: Architecture Challenge</strong></h4>
<pre><code class="language-plaintext">Claude: "For the product recommendation engine, we could use TensorFlow.js for client-side processing..."
You: "But our team only knows Python for ML..."
Claude: "Right, given your constraint that team only knows Python, let's use a Python-based recommendation service with API endpoints."
</code></pre>
<h4><strong>Week 6: Vendor Pressure</strong></h4>
<pre><code class="language-plaintext">Stakeholder: "PayPal is offering us better rates. Can we switch?"
Claude: "Based on your constraint that PayPal has compliance issues and legal team only approved Stripe, switching is not viable without legal review. However, we can negotiate better rates with Stripe or explore hybrid solutions."
</code></pre>
<p><strong>The difference was night and day.</strong></p>
<p>Without Crux: Claude might have suggested switching to PayPal, causing legal issues and wasted effort.</p>
<p>With Crux: Claude actively protected our constraints while still helping optimize within those boundaries.</p>
<h2><strong>The Technical Magic: How It Actually Works</strong></h2>
<h3><strong>Extraction Engine</strong></h3>
<p>Crux uses a sophisticated extraction engine that understands natural language:</p>
<pre><code class="language-javascript">// Simplified example of extraction logic
function extractDecision(text) {
  const constraints = findConstraints(text); // "team only knows SQL", "legal approved"
  const decisions = findDecisions(text);    // "use PostgreSQL", "use Stripe"  
  const rationales = findRationales(text);  // "SEO needs", "compliance issues"
  
  return createCausalTriples(constraints, decisions, rationales);
}
</code></pre>
<h3><strong>The Co-Inclusion Injection</strong></h3>
<p>Before every context compaction, Crux injects:</p>
<pre><code class="language-plaintext">--- CRUX: Active Architectural Decisions ---

DECISION: Use Next.js for frontend
RATIONALE: Team has React experience and needs good SEO  
CONSTRAINT: ⛔ Team only knows React-based frameworks

DECISION: Use Stripe for payment processing
RATIONALE: Legal team approval and compliance requirements
CONSTRAINT: ⛔ Cannot use PayPal due to compliance issues

⚠️ Respect these decisions unless explicitly asked to revisit them. Do not separate decisions from their constraints and rationales.
</code></pre>
<p>This injection happens automatically, invisibly, every single time.</p>
<h3><strong>Smart Deduplication</strong></h3>
<p>Crux also prevents decision pollution:</p>
<pre><code class="language-plaintext">// Same decision, different wording
"We'll use PostgreSQL" vs "Let's go with Postgres" vs "PostgreSQL is our choice"

// Crux recognizes these as semantically identical
// and updates existing decision instead of creating duplicates
</code></pre>
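<p>A naive version of that deduplication can be done with normalization plus an alias table. This toy sketch is mine, not Crux's code; real semantic matching would use embeddings or an LLM call:</p>

```python
# Alias table maps known synonyms onto one canonical token
ALIASES = {"postgres": "postgresql", "pg": "postgresql"}
# Filler words that carry no decision content
STOP = {"we'll", "use", "let's", "go", "with", "is", "our", "choice", "the"}

def normalize(decision: str) -> str:
    words = []
    for w in decision.lower().split():
        w = w.strip(".,!")
        words.append(ALIASES.get(w, w))
    # Keep only content words, sorted, so phrasing differences collapse
    return " ".join(sorted(w for w in words if w not in STOP))

a = normalize("We'll use PostgreSQL")
b = normalize("Let's go with Postgres")
c = normalize("PostgreSQL is our choice")
print(a == b == c)  # True -- all three map to the same dedup key
```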
<h2><strong>Zero Configuration, Maximum Impact</strong></h2>
<p>The best part about Crux? It works completely transparently.</p>
<p>No special commands. No manual configuration. No "remember this decision" prompts.</p>
<p>It just understands from normal conversation and protects your architectural choices automatically.</p>
<h3><strong>Installation</strong></h3>
<pre><code class="language-plaintext"># Add the marketplace
/plugin marketplace add akashp1712/claude-marketplace

# Install Crux
/plugin install crux@akashp1712
</code></pre>
<p>That's it. Start coding, and Crux will automatically protect your decisions.</p>
<h2><strong>This Changes Everything for Long-Term Projects</strong></h2>
<p>For anyone using Claude Code for serious development work, this is transformative:</p>
<h3><strong>Team Collaboration</strong></h3>
<p>Everyone on the team sees the same constraints and reasoning. No more "I thought we decided to use X" conversations.</p>
<h3><strong>Client Work</strong></h3>
<p>Maintain consistency across long engagements. Decisions made in month 1 are still respected in month 6.</p>
<h3><strong>Complex Architecture</strong></h3>
<p>Multiple interconnected decisions stay coherent. The payment system choice respects the legal constraints. The frontend choice respects the team skills.</p>
<h3><strong>Peace of Mind</strong></h3>
<p>No more waking up at 3 AM wondering if Claude is going to suggest something contradictory tomorrow.</p>
<h2><strong>The Future of AI Memory</strong></h2>
<p>Crux is just the beginning. As AI becomes more integrated into our development workflows, we need better memory systems:</p>
<h3><strong>Roadmap v0.2: Conflict Detection</strong></h3>
<p>When you say "Use PostgreSQL" and later "Use MongoDB", Crux will flag: "This contradicts your earlier decision about team skills."</p>
<h3><strong>Roadmap v0.3: Team Graphs</strong></h3>
<p>Git-committable decision graphs for team sharing. Everyone sees the same architectural constraints.</p>
<h3><strong>Roadmap v1.0: The Proxy</strong></h3>
<p>Protocol-level interception for exact control over context compaction.</p>
<h2><strong>Try It Yourself</strong></h2>
<p>I've open-sourced Crux under MIT license. It's production-ready, zero-dependency, and works with Claude Code immediately.</p>
<p><strong>Get started</strong>: <a href="https://github.com/akashp1712/claude-crux">https://github.com/akashp1712/claude-crux</a></p>
<p><strong>Full marketplace</strong>: <a href="https://github.com/akashp1712/claude-marketplace">https://github.com/akashp1712/claude-marketplace</a></p>
<h2><strong>Join the Movement</strong></h2>
<p>Context compaction amnesia is a problem every serious Claude Code user faces. We don't have to accept it.</p>
<p>If you've ever been frustrated by your AI forgetting important constraints, try Crux. If you're building AI tools, think about causal memory, not just flat facts.</p>
<p><strong>Share your experiences</strong>: Let me know how Crux changes your workflow.</p>
<p><strong>Contribute</strong>: The project is open for contributions - new extraction patterns, better UI, conflict detection.</p>
<hr />
<p><em><strong>Because your AI should remember WHY, not just WHAT.</strong></em></p>
<hr />
]]></content:encoded></item><item><title><![CDATA[From O(n) to O(1): How We Fixed Our API Key Validation Performance
]]></title><description><![CDATA[When you're building a SaaS API, every millisecond counts. We recently discovered that our API key validation was a ticking time bomb—and fixed it before it exploded.
The Problem
Our API uses bearer t]]></description><link>https://blog.akashpanchal.com/from-o-n-to-o-1-how-we-fixed-our-api-key-validation-performance</link><guid isPermaLink="true">https://blog.akashpanchal.com/from-o-n-to-o-1-how-we-fixed-our-api-key-validation-performance</guid><dc:creator><![CDATA[Akash Panchal]]></dc:creator><pubDate>Wed, 25 Feb 2026 16:49:14 GMT</pubDate><content:encoded><![CDATA[<p>When you're building a SaaS API, every millisecond counts. We recently discovered that our API key validation was a ticking time bomb—and fixed it before it exploded.</p>
<h2>The Problem</h2>
<p>Our API uses bearer tokens for authentication. Every request includes an API key:</p>
<pre><code class="language-plaintext">Authorization: Bearer paymint_production_apikey_a1b2c3...
</code></pre>
<p>For security, we encrypt API keys before storing them in the database. The encryption uses AES-256-GCM, which means we can't simply query for a matching key—we have to decrypt to compare.</p>
<p>Here's what our original validation looked like:</p>
<pre><code class="language-typescript">export async function validateApiKey(apiKey: string): Promise&lt;ApiKeyValidation&gt; {
  // Get ALL active API keys
  const apiKeyRecords = await database.apiKey.findMany({
    where: { status: 'active' },
  });

  // Decrypt each one and compare
  for (const record of apiKeyRecords) {
    const decryptedKey = decrypt(record.encryptedKey);
    if (decryptedKey === apiKey) {
      return { isValid: true, organizationId: record.organizationId };
    }
  }

  return { isValid: false };
}
</code></pre>
<p>This works. But there's a problem hiding in plain sight.</p>
<h2>The Math</h2>
<p>Let's say decryption takes 5ms per key (it's actually faster, but let's be conservative).</p>
<table>
<thead>
<tr>
<th>Active API Keys</th>
<th>Time to Validate</th>
</tr>
</thead>
<tbody><tr>
<td>10</td>
<td>50ms</td>
</tr>
<tr>
<td>100</td>
<td>500ms</td>
</tr>
<tr>
<td>1,000</td>
<td>5 seconds</td>
</tr>
<tr>
<td>10,000</td>
<td>50 seconds</td>
</tr>
</tbody></table>
<p>Every API request—fetching products, listing subscriptions, canceling a subscription—would need to wait for this validation. At 1,000 keys, we'd be adding 5 seconds of latency to every single request.</p>
<p>This is O(n) complexity. As our customer base grows, performance degrades linearly.</p>
<h2>Why Not Just Query the Encrypted Key?</h2>
<p>You might think: "Just store the encrypted key and query for it directly."</p>
<pre><code class="language-sql">SELECT * FROM api_keys WHERE encrypted_key = ?
</code></pre>
<p>This doesn't work because encryption is non-deterministic. AES-GCM uses a random initialization vector (IV) for each encryption, so encrypting the same plaintext twice produces different ciphertexts.</p>
<pre><code class="language-typescript">encrypt('my_api_key') // =&gt; 'abc123...'
encrypt('my_api_key') // =&gt; 'xyz789...' (different!)
</code></pre>
<p>This is actually a security feature—it prevents attackers from identifying duplicate keys by comparing ciphertexts.</p>
<h2>The Solution: Hash-Based Lookup</h2>
<p>The fix is elegant: store a hash of the API key alongside the encrypted version.</p>
<p>Unlike encryption, hashing is deterministic—the same input always produces the same output. And unlike encryption, we don't need to reverse it. We just need to find a match.</p>
<pre><code class="language-typescript">import crypto from 'node:crypto';

function hashApiKey(apiKey: string): string {
  return crypto.createHash('sha256').update(apiKey).digest('hex');
}
</code></pre>
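<p>The property that matters here, determinism, is easy to verify. Python shown for brevity, but any SHA-256 implementation behaves identically:</p>

```python
import hashlib

def hash_api_key(api_key: str) -> str:
    return hashlib.sha256(api_key.encode()).hexdigest()

# Deterministic: the same input always yields the same digest,
# so the digest itself can be stored and used as a lookup key.
print(hash_api_key("my_api_key") == hash_api_key("my_api_key"))  # True
```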
<h3>New Schema</h3>
<pre><code class="language-plaintext">model ApiKey {
  id           String  @id @default(uuid())
  keyHash      String? @unique  // SHA-256 hash for O(1) lookup
  encryptedKey String           // AES-256-GCM encrypted key
  // ... other fields
}
</code></pre>
<h3>New Validation</h3>
<pre><code class="language-typescript">export async function validateApiKey(apiKey: string): Promise&lt;ApiKeyValidation&gt; {
  const keyHash = hashApiKey(apiKey);

  // O(1) lookup using unique index
  const record = await database.apiKey.findFirst({
    where: {
      keyHash: keyHash,
      status: 'active',
    },
  });

  if (!record) {
    return { isValid: false };
  }

  // Defense in depth: verify by decrypting
  const decryptedKey = decrypt(record.encryptedKey);
  if (decryptedKey !== apiKey) {
    return { isValid: false };
  }

  return {
    isValid: true,
    organizationId: record.organizationId,
  };
}
</code></pre>
<p>Now validation is O(1)—constant time regardless of how many API keys exist.</p>
<table>
<thead>
<tr>
<th>Active API Keys</th>
<th>Old Time</th>
<th>New Time</th>
</tr>
</thead>
<tbody><tr>
<td>10</td>
<td>50ms</td>
<td>~5ms</td>
</tr>
<tr>
<td>100</td>
<td>500ms</td>
<td>~5ms</td>
</tr>
<tr>
<td>1,000</td>
<td>5s</td>
<td>~5ms</td>
</tr>
<tr>
<td>10,000</td>
<td>50s</td>
<td>~5ms</td>
</tr>
</tbody></table>
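<p>The database's unique index behaves like a hash map: one probe, regardless of table size. The lookup pattern can be sketched with an in-memory stand-in for the index (illustrative Python, not the production TypeScript; the key format follows the example at the top of the post):</p>

```python
import hashlib

def hash_api_key(api_key: str) -> str:
    return hashlib.sha256(api_key.encode()).hexdigest()

# In-memory stand-in for the unique index: digest -> record
index = {}
for org_id in range(10_000):
    key = f"paymint_production_apikey_{org_id:04d}"
    index[hash_api_key(key)] = {"organizationId": org_id, "status": "active"}

def validate(api_key: str):
    # One dict probe -- O(1) no matter how many keys exist
    record = index.get(hash_api_key(api_key))
    if record is None or record["status"] != "active":
        return {"isValid": False}
    return {"isValid": True, "organizationId": record["organizationId"]}

print(validate("paymint_production_apikey_0042"))
print(validate("paymint_production_apikey_bogus"))
```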
<h2>Why Keep the Encrypted Key?</h2>
<p>You might wonder: if we have the hash, why bother with encryption?</p>
<p>Two reasons:</p>
<ol>
<li><p><strong>Defense in depth</strong>: After finding a hash match, we decrypt and verify. This protects against hash collisions (astronomically unlikely with SHA-256, but defense in depth is good practice).</p>
</li>
<li><p><strong>Key rotation</strong>: If we ever need to re-encrypt keys (e.g., rotating the encryption key), we need the actual key value. The hash alone isn't reversible.</p>
</li>
</ol>
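<p><em>To make the rotation point concrete, here is a minimal sketch. The AES-256-GCM helpers below are illustrative stand-ins for our real <code>encrypt</code>/<code>decrypt</code>, and the IV/tag payload layout is an assumption of this sketch:</em></p>

```typescript
import crypto from 'node:crypto';

// Illustrative AES-256-GCM helpers (payload = IV || auth tag || ciphertext).
function encrypt(plaintext: string, key: Buffer): string {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString('base64');
}

function decrypt(payload: string, key: Buffer): string {
  const raw = Buffer.from(payload, 'base64');
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

// Rotating the encryption key: decrypt with the old key, re-encrypt with the
// new one. The SHA-256 hash stays valid because the plaintext is unchanged.
function rotateEncryptedKey(encryptedKey: string, oldKey: Buffer, newKey: Buffer): string {
  return encrypt(decrypt(encryptedKey, oldKey), newKey);
}
```

<p><em>This is exactly why the hash alone isn't enough: without the recoverable plaintext, rotation would mean invalidating every customer's key.</em></p>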
<h2>Migration Strategy</h2>
<p>We couldn't just flip a switch—existing API keys didn't have hashes. Here's how we handled the migration:</p>
<h3>1. Make the hash column nullable</h3>
<pre><code class="language-sql">ALTER TABLE "ApiKey" ADD COLUMN "keyHash" TEXT;
CREATE UNIQUE INDEX "ApiKey_keyHash_key" ON "ApiKey"("keyHash");
</code></pre>
<h3>2. Add hash on new key creation</h3>
<pre><code class="language-typescript">function generateApiKey(environment: 'sandbox' | 'production') {
  const key = `paymint_${environment}_apikey_${crypto.randomBytes(32).toString('hex')}`;
  const keyHash = hashApiKey(key);
  const encryptedKey = encrypt(key);

  return { key, keyHash, encryptedKey };
}
</code></pre>
<h3>3. Fallback for legacy keys</h3>
<pre><code class="language-typescript">export async function validateApiKey(apiKey: string): Promise&lt;ApiKeyValidation&gt; {
  const keyHash = hashApiKey(apiKey);

  // Try O(1) lookup first
  const record = await database.apiKey.findFirst({
    where: { keyHash, status: 'active' },
  });

  if (record) {
    // Verify and return
    const decryptedKey = decrypt(record.encryptedKey);
    if (decryptedKey === apiKey) {
      return { isValid: true, organizationId: record.organizationId };
    }
    return { isValid: false };
  }

  // Fallback: check legacy keys without hash
  return validateApiKeyLegacy(apiKey);
}

async function validateApiKeyLegacy(apiKey: string): Promise&lt;ApiKeyValidation&gt; {
  const records = await database.apiKey.findMany({
    where: { status: 'active', keyHash: null },
  });

  for (const record of records) {
    const decryptedKey = decrypt(record.encryptedKey);
    if (decryptedKey === apiKey) {
      // Auto-migrate: add hash for future O(1) lookups
      const keyHash = hashApiKey(apiKey);
      await database.apiKey.update({
        where: { id: record.id },
        data: { keyHash },
      });

      return { isValid: true, organizationId: record.organizationId };
    }
  }

  return { isValid: false };
}
</code></pre>
<p>The legacy fallback auto-migrates keys on first use. Over time, all keys get hashes, and the fallback path becomes unused.</p>
<h2>Security Considerations</h2>
<h3>Is storing a hash safe?</h3>
<p>Yes. SHA-256 is a one-way function—you can't reverse it to get the original key. An attacker with database access would see:</p>
<pre><code class="language-plaintext">keyHash: "a1b2c3d4e5f6..."
encryptedKey: "encrypted_blob..."
</code></pre>
<p>They can't use the hash to authenticate (the API expects the original key), and they can't decrypt without the encryption key (stored separately in environment variables).</p>
<h3>What about rainbow tables?</h3>
<p>API keys are high-entropy random strings (64 random hex characters plus a fixed prefix). Rainbow tables are only practical for low-entropy inputs like passwords. The search space for our keys is 16^64 = 2^256, which is... large.</p>
<h3>What about timing attacks?</h3>
<p>We use constant-time comparison for the final verification:</p>
<pre><code class="language-typescript">import crypto from 'node:crypto';

function secureCompare(a: string, b: string): boolean {
  // timingSafeEqual throws if the buffers differ in length, so check first
  if (a.length !== b.length) {
    return false;
  }
  return crypto.timingSafeEqual(Buffer.from(a), Buffer.from(b));
}
</code></pre>
<h2>Results</h2>
<p>After deploying this change:</p>
<ul>
<li><p><strong>P50 latency</strong>: Reduced by 40ms</p>
</li>
<li><p><strong>P99 latency</strong>: Reduced by 200ms</p>
</li>
<li><p><strong>Database load</strong>: Significantly reduced (no more full table scans)</p>
</li>
</ul>
<p>More importantly, we removed a scaling bottleneck. Our API can now handle 10x more customers without degrading performance.</p>
<h2>Key Takeaways</h2>
<ol>
<li><p><strong>Audit your auth paths</strong>: Authentication runs on every request. Even small inefficiencies compound.</p>
</li>
<li><p><strong>Encryption ≠ Hashing</strong>: Encryption is reversible and non-deterministic. Hashing is one-way and deterministic. Use the right tool.</p>
</li>
<li><p><strong>Plan for scale</strong>: Code that works at 10 customers might break at 10,000. Think about complexity classes.</p>
</li>
<li><p><strong>Migrate gracefully</strong>: Use fallbacks and auto-migration to avoid big-bang deployments.</p>
</li>
<li><p><strong>Defense in depth</strong>: Even with hash-based lookup, we still verify by decrypting. Belt and suspenders.</p>
</li>
</ol>
<hr />
<p><em>Building a SaaS? Check out</em> <a href="https://paymint.dev"><em>Paymint</em></a><em>—we handle subscription billing so you can focus on your product.</em></p>
<hr />
<p><em>Tags: performance, security, api-design, database, optimization, saas, authentication</em></p>
]]></content:encoded></item><item><title><![CDATA[From O(n) to O(1): How We Fixed Our API Key Validation Performance]]></title><description><![CDATA[When you're building a SaaS API, every millisecond counts. We recently discovered that our API key validation was a ticking time bomb—and fixed it before it exploded.
The Problem
Our API uses bearer tokens for authentication. Every request includes a...]]></description><link>https://blog.akashpanchal.com/from-on-to-o1-how-we-fixed-our-api-key-validation-performance</link><guid isPermaLink="true">https://blog.akashpanchal.com/from-on-to-o1-how-we-fixed-our-api-key-validation-performance</guid><category><![CDATA[performance]]></category><category><![CDATA[Security]]></category><category><![CDATA[API Design]]></category><category><![CDATA[Databases]]></category><category><![CDATA[SaaS]]></category><category><![CDATA[authentication]]></category><category><![CDATA[PostgreSQL]]></category><dc:creator><![CDATA[Akash Panchal]]></dc:creator><pubDate>Sat, 24 Jan 2026 02:13:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769220695562/eb2e4394-f069-4f0a-a17c-073f65886e86.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When you're building a SaaS API, every millisecond counts. We recently discovered that our API key validation was a ticking time bomb—and fixed it before it exploded.</p>
<h2 id="heading-the-problem">The Problem</h2>
<p>Our API uses bearer tokens for authentication. Every request includes an API key:</p>
<pre><code class="lang-plaintext">Authorization: Bearer paymint_production_apikey_a1b2c3...
</code></pre>
<p>For security, we encrypt API keys before storing them in the database. The encryption uses AES-256-GCM, which means we can't simply query for a matching key—we have to decrypt each one to compare.</p>
<p>Here's what our original validation looked like:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">validateApiKey</span>(<span class="hljs-params">apiKey: <span class="hljs-built_in">string</span></span>): <span class="hljs-title">Promise</span>&lt;<span class="hljs-title">ValidationResult</span>&gt; </span>{
  <span class="hljs-comment">// Get ALL active API keys</span>
  <span class="hljs-keyword">const</span> apiKeyRecords = <span class="hljs-keyword">await</span> database.apiKey.findMany({
    where: { status: <span class="hljs-string">'active'</span> },
  });

  <span class="hljs-comment">// Decrypt each one and compare</span>
  <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> record <span class="hljs-keyword">of</span> apiKeyRecords) {
    <span class="hljs-keyword">const</span> decryptedKey = decrypt(record.encryptedKey);
    <span class="hljs-keyword">if</span> (decryptedKey === apiKey) {
      <span class="hljs-keyword">return</span> { 
        isValid: <span class="hljs-literal">true</span>, 
        organizationId: record.organizationId 
      };
    }
  }

  <span class="hljs-keyword">return</span> { isValid: <span class="hljs-literal">false</span> };
}
</code></pre>
<p>This works. But there's a problem hiding in plain sight.</p>
<h3 id="heading-the-request-flow-problem">The Request Flow Problem</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769219988024/d8d66c57-4a12-44c6-a144-4503f2ba14d7.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-math">The Math</h2>
<p>Let's say decryption takes 5ms per key (it's actually faster, but let's be conservative).</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Active API Keys</th><th>Time to Validate</th><th>Performance Impact</th></tr>
</thead>
<tbody>
<tr>
<td>10</td><td>50ms</td><td>Noticeable</td></tr>
<tr>
<td>100</td><td>500ms</td><td>Unacceptable</td></tr>
<tr>
<td>1,000</td><td>5 seconds</td><td>Business-breaking</td></tr>
<tr>
<td>10,000</td><td>50 seconds</td><td>Timeout territory</td></tr>
</tbody>
</table>
</div><p>Every API request—fetching products, listing subscriptions, canceling a subscription—would need to wait for this validation. At 1,000 keys, we'd be adding 5 seconds of latency to <strong>every single request</strong>.</p>
<p>This is <strong>O(n) complexity</strong>. As our customer base grows, performance degrades linearly. We needed O(1).</p>
<h2 id="heading-why-not-just-query-the-encrypted-key">Why Not Just Query the Encrypted Key?</h2>
<p>You might think: "<strong>Just store the encrypted key and query for it directly.</strong>"</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> api_keys <span class="hljs-keyword">WHERE</span> encrypted_key = ?
</code></pre>
<p>This doesn't work because encryption is <strong>non-deterministic</strong>. AES-GCM uses a random initialization vector (IV) for each encryption, so encrypting the same plaintext twice produces different ciphertexts.</p>
<pre><code class="lang-typescript">encrypt(<span class="hljs-string">'my_api_key'</span>) <span class="hljs-comment">// =&gt; 'abc123...'</span>
encrypt(<span class="hljs-string">'my_api_key'</span>) <span class="hljs-comment">// =&gt; 'xyz789...' (different!)</span>
</code></pre>
<p>This is actually a <strong>security feature</strong>—it prevents attackers from identifying duplicate keys by comparing ciphertexts. But it means we can't use encrypted values for database lookups.</p>
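<p><em>You can see this non-determinism directly with Node's <code>crypto</code> module (a toy demo, not our production helper):</em></p>

```typescript
import crypto from 'node:crypto';

// AES-256-GCM with a fresh random IV each call: identical plaintexts
// produce different ciphertexts, so equality lookups can't work.
function encryptDemo(plaintext: string, key: Buffer): string {
  const iv = crypto.randomBytes(12); // fresh IV every time
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString('hex');
}

const key = crypto.randomBytes(32);
const a = encryptDemo('my_api_key', key);
const b = encryptDemo('my_api_key', key);
console.log(a === b); // false — same plaintext, different ciphertexts
```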
<h2 id="heading-the-solution-hash-based-lookup">The Solution: Hash-Based Lookup</h2>
<p>The fix is elegant: store a <strong>hash</strong> of the API key alongside the encrypted version.</p>
<p>Unlike encryption, hashing is <strong>deterministic</strong>—the same input always produces the same output. And unlike encryption, we don't need to reverse it. We just need to find a match.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> crypto <span class="hljs-keyword">from</span> <span class="hljs-string">'node:crypto'</span>;

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">hashApiKey</span>(<span class="hljs-params">apiKey: <span class="hljs-built_in">string</span></span>): <span class="hljs-title">string</span> </span>{
  <span class="hljs-keyword">return</span> crypto.createHash(<span class="hljs-string">'sha256'</span>).update(apiKey).digest(<span class="hljs-string">'hex'</span>);
}
</code></pre>
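<p><em>A quick sanity check of that determinism (toy demo):</em></p>

```typescript
import crypto from 'node:crypto';

function hashApiKey(apiKey: string): string {
  return crypto.createHash('sha256').update(apiKey).digest('hex');
}

// The same input always yields the same digest, which is what makes
// an indexed equality lookup on the hash possible.
const h1 = hashApiKey('my_api_key');
const h2 = hashApiKey('my_api_key');
console.log(h1 === h2); // true — deterministic
console.log(h1.length); // 64 — SHA-256 digest as hex
```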
<h3 id="heading-comparing-encryption-vs-hashing">Comparing Encryption vs Hashing</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769220166983/513cac5e-dc37-49bf-abc8-6b3d7f81b4ae.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769220225796/c1f8de1f-d9ed-4132-a8a4-c73f152b64e0.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-new-schema">New Schema</h3>
<pre><code class="lang-typescript">model ApiKey {
  id           <span class="hljs-built_in">String</span>   <span class="hljs-meta">@id</span> <span class="hljs-meta">@default</span>(uuid())
  keyHash      <span class="hljs-built_in">String</span>?  <span class="hljs-meta">@unique</span> <span class="hljs-comment">// SHA-256 hash for O(1) lookup</span>
  encryptedKey <span class="hljs-built_in">String</span>            <span class="hljs-comment">// AES-256-GCM encrypted key</span>
  status       <span class="hljs-built_in">String</span>
  organizationId <span class="hljs-built_in">String</span>
  <span class="hljs-comment">// ... other fields</span>
}
</code></pre>
<h3 id="heading-new-validation-logic">New Validation Logic</h3>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">validateApiKey</span>(<span class="hljs-params">apiKey: <span class="hljs-built_in">string</span></span>): <span class="hljs-title">Promise</span>&lt;<span class="hljs-title">ValidationResult</span>&gt; </span>{
  <span class="hljs-keyword">const</span> keyHash = hashApiKey(apiKey);

  <span class="hljs-comment">// O(1) lookup using unique index</span>
  <span class="hljs-keyword">const</span> record = <span class="hljs-keyword">await</span> database.apiKey.findFirst({
    where: { 
      keyHash: keyHash,
      status: <span class="hljs-string">'active'</span>,
    },
  });

  <span class="hljs-keyword">if</span> (!record) {
    <span class="hljs-keyword">return</span> { isValid: <span class="hljs-literal">false</span> };
  }

  <span class="hljs-comment">// Defense in depth: verify by decrypting</span>
  <span class="hljs-keyword">const</span> decryptedKey = decrypt(record.encryptedKey);
  <span class="hljs-keyword">if</span> (decryptedKey !== apiKey) {
    <span class="hljs-keyword">return</span> { isValid: <span class="hljs-literal">false</span> };
  }

  <span class="hljs-keyword">return</span> { 
    isValid: <span class="hljs-literal">true</span>, 
    organizationId: record.organizationId,
  };
}
</code></pre>
<p>Now validation is <strong>O(1)</strong>—constant time regardless of how many API keys exist.</p>
<h3 id="heading-the-new-request-flow-optimized-api-key-validation-flow">The New Request Flow</h3>
<h3 id="heading-performance-comparison">Performance Comparison</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Active API Keys</th><th>Old Time (O(n))</th><th>New Time (O(1))</th><th>Improvement</th></tr>
</thead>
<tbody>
<tr>
<td>10</td><td>50ms</td><td>~5ms</td><td><strong>10x faster</strong></td></tr>
<tr>
<td>100</td><td>500ms</td><td>~5ms</td><td><strong>100x faster</strong></td></tr>
<tr>
<td>1,000</td><td>5s</td><td>~5ms</td><td><strong>1000x faster</strong></td></tr>
<tr>
<td>10,000</td><td>50s</td><td>~5ms</td><td><strong>10000x faster</strong></td></tr>
</tbody>
</table>
</div><h2 id="heading-why-keep-the-encrypted-key">Why Keep the Encrypted Key?</h2>
<p>You might wonder: <strong>if we have the hash, why bother with encryption?</strong></p>
<p><strong>Two critical reasons:</strong></p>
<h3 id="heading-1-defense-in-depth">1. Defense in Depth</h3>
<p>After finding a hash match, we decrypt and verify. This protects against hash collisions (astronomically unlikely with SHA-256, but defense in depth is good practice). If an attacker somehow found a collision, they still wouldn't get through.</p>
<h3 id="heading-2-key-rotation-amp-recovery">2. Key Rotation &amp; Recovery</h3>
<p>If we ever need to:</p>
<ul>
<li><p>Re-encrypt keys (e.g., rotating the encryption key)</p>
</li>
<li><p>Migrate to a different encryption algorithm</p>
</li>
<li><p>Support key export features</p>
</li>
</ul>
<p>We need the actual key value. The hash alone isn't reversible, so we'd lose the original keys forever.</p>
<h3 id="heading-what-about-rainbow-tables">What About Rainbow Tables?</h3>
<p>API keys are <strong>high-entropy random strings</strong> (64 hex characters from <code>crypto.randomBytes(32)</code>, plus a fixed prefix). Rainbow tables are only practical for low-entropy inputs like passwords or common phrases.</p>
<p>The search space for the random portion is <strong>16^64 = 2^256</strong> ≈ <strong>10^77</strong> possible values. For comparison:</p>
<ul>
<li><p>Number of atoms in the universe: ~10^80</p>
</li>
<li><p>SHA-256 output space: 2^256 ≈ 10^77</p>
</li>
</ul>
<p>Rainbow tables aren't feasible here.</p>
<h3 id="heading-what-about-timing-attacks">What About Timing Attacks?</h3>
<p>We use constant-time comparison for the final verification to prevent timing attacks:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> crypto <span class="hljs-keyword">from</span> <span class="hljs-string">'node:crypto'</span>;

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">secureCompare</span>(<span class="hljs-params">a: <span class="hljs-built_in">string</span>, b: <span class="hljs-built_in">string</span></span>): <span class="hljs-title">boolean</span> </span>{
  <span class="hljs-keyword">if</span> (a.length !== b.length) {
    <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
  }
  <span class="hljs-keyword">return</span> crypto.timingSafeEqual(
    Buffer.from(a), 
    Buffer.from(b)
  );
}

<span class="hljs-comment">// Use in validation</span>
<span class="hljs-keyword">if</span> (secureCompare(decryptedKey, apiKey)) {
  <span class="hljs-keyword">return</span> { isValid: <span class="hljs-literal">true</span>, organizationId: record.organizationId };
}
</code></pre>
<p>This prevents attackers from using response time variations to guess key characters.</p>
<h3 id="heading-security-layers-summary">Security Layers Summary</h3>
<p>Defense-in-Depth Security Layers</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769220493324/6ac792f5-cc49-427a-b3c9-2134b1d45eb2.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-results">Results</h2>
<p>After deploying this change, we saw immediate and dramatic improvements:</p>
<ul>
<li><p><strong>P50 latency</strong>: Reduced by 40ms (from 45ms to 5ms)</p>
</li>
<li><p><strong>P99 latency</strong>: Reduced by 200ms (from 205ms to 5ms)</p>
</li>
<li><p><strong>Database load</strong>: Significantly reduced (no more full table scans)</p>
</li>
<li><p><strong>Scalability</strong>: Removed the O(n) bottleneck entirely</p>
</li>
</ul>
<p>More importantly, we <strong>removed a scaling time bomb</strong>. Our API can now handle 10x, 100x, or even 10,000x more customers without degrading authentication performance.</p>
<h2 id="heading-key-takeaways">Key Takeaways</h2>
<h3 id="heading-1-audit-your-authentication-paths">1. <strong>Audit Your Authentication Paths</strong></h3>
<p>Authentication runs on <strong>every single request</strong>. Even small inefficiencies compound dramatically. A 50ms slowdown might seem negligible, but multiply that by millions of requests and you've got a serious problem.</p>
<h3 id="heading-2-encryption-hashing">2. <strong>Encryption ≠ Hashing</strong></h3>
<ul>
<li><p><strong>Encryption</strong>: Reversible, non-deterministic, requires a secret key</p>
</li>
<li><p><strong>Hashing</strong>: One-way, deterministic, no secret needed</p>
</li>
</ul>
<p>Use encryption when you need to retrieve the original value. Use hashing when you only need to verify a match. For API keys, we need both—hash for lookup, encryption for storage.</p>
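<p><em>Putting both tools together, creating a key record stores the two values side by side. This is a sketch; the record shape and helper name are illustrative, not our production code:</em></p>

```typescript
import crypto from 'node:crypto';

// Sketch: one record holds a deterministic hash (for the indexed lookup)
// and a non-deterministic AES-256-GCM ciphertext (for verification/rotation).
function buildApiKeyRecord(apiKey: string, encryptionKey: Buffer) {
  const keyHash = crypto.createHash('sha256').update(apiKey).digest('hex');
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', encryptionKey, iv);
  const ciphertext = Buffer.concat([cipher.update(apiKey, 'utf8'), cipher.final()]);
  const encryptedKey = Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString('base64');
  return { keyHash, encryptedKey };
}
```

<p><em>Two records built from the same key share the same <code>keyHash</code> (that's the lookup) but have different <code>encryptedKey</code> values (that's the security feature).</em></p>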
<h3 id="heading-3-plan-for-scale-from-day-one">3. <strong>Plan for Scale from Day One</strong></h3>
<p>Code that works perfectly at 10 customers might completely break at 10,000. Always think about algorithmic complexity:</p>
<ul>
<li><p><strong>O(1)</strong>: Constant time - scales infinitely</p>
</li>
<li><p><strong>O(log n)</strong>: Logarithmic - scales very well</p>
</li>
<li><p><strong>O(n)</strong>: Linear - degrades as you grow</p>
</li>
<li><p><strong>O(n²)</strong>: Quadratic - disaster waiting to happen</p>
</li>
</ul>
<h3 id="heading-4-migrate-gracefully">4. <strong>Migrate Gracefully</strong></h3>
<p>Use fallbacks and auto-migration to avoid big-bang deployments. Our approach:</p>
<ul>
<li><p>Made changes backward-compatible</p>
</li>
<li><p>Auto-migrated on first use</p>
</li>
<li><p>Monitored migration progress</p>
</li>
<li><p>Removed legacy code only after full migration</p>
</li>
</ul>
<p>Zero downtime, zero broken API keys.</p>
<h3 id="heading-5-defense-in-depth-works">5. <strong>Defense in Depth Works</strong></h3>
<p>Even with hash-based lookup, we still verify by decrypting. This belt-and-suspenders approach:</p>
<ul>
<li><p>Protects against hash collisions</p>
</li>
<li><p>Enables future key rotation</p>
</li>
<li><p>Provides an extra security layer</p>
</li>
<li><p>Costs only one additional decryption (~5ms)</p>
</li>
</ul>
<p>The tiny performance cost is worth the security benefits.</p>
<h3 id="heading-6-database-indexes-are-your-friend">6. <strong>Database Indexes Are Your Friend</strong></h3>
<p>The <code>@unique</code> index on <code>keyHash</code> is what makes the O(1) lookup possible. Without it, we'd still be doing table scans. Always index your lookup fields.</p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>This optimization transformed our API from a scaling liability into a performant, production-ready system. By understanding the difference between encryption and hashing, and applying the right tool for each job, we turned a potential disaster into a success story.</p>
<p>The next time you're implementing authentication, remember: <strong>how</strong> you store credentials matters just as much as <strong>that</strong> you store them securely.</p>
<hr />
<p><em>Building a SaaS? Check out</em> <a target="_blank" href="https://paymint.dev"><em>Paymint</em></a><em>—we handle subscription billing so you can focus on your product.</em></p>
<p><em>Have questions about this implementation? Found this helpful? Let me know in the comments below!</em></p>
<hr />
<p><strong>Tags:</strong> #performance #security #api-design #database #optimization #saas #authentication #scaling #node-js #postgresql</p>
]]></content:encoded></item><item><title><![CDATA[From 4.5s to 90ms: Solving Webhook Timeouts with Vercel Workflow]]></title><description><![CDATA[When building integrations with third-party services like Paddle or Stripe, webhook handlers face a critical, often overlooked challenge: Response Time Constraints.
Most webhook providers enforce strict timeout limits (typically 5–10 seconds). If you...]]></description><link>https://blog.akashpanchal.com/fixing-webhook-timeouts-vercel-workflow</link><guid isPermaLink="true">https://blog.akashpanchal.com/fixing-webhook-timeouts-vercel-workflow</guid><category><![CDATA[Next.js]]></category><category><![CDATA[webhooks]]></category><category><![CDATA[payment gateway]]></category><category><![CDATA[Vercel]]></category><category><![CDATA[System Design]]></category><dc:creator><![CDATA[Akash Panchal]]></dc:creator><pubDate>Sun, 14 Dec 2025 04:47:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765687544816/ee7f577d-8281-493f-9eba-8209d4bf349b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When building integrations with third-party services like Paddle or Stripe, webhook handlers face a critical, often overlooked challenge: <strong>Response Time Constraints</strong>.</p>
<p>Most webhook providers enforce strict timeout limits (typically 5–10 seconds). If you don't respond with a <code>200 OK</code> within that window, the provider assumes you failed. They will retry the webhook, potentially leading to duplicate processing, or worse—give up entirely.</p>
<h2 id="heading-the-4500ms-nightmare">The 4500ms Nightmare</h2>
<p>Recently, our Paddle webhook integration started behaving dangerously. We were seeing response times of roughly <strong>4,500ms</strong>.</p>
<p>With Paddle’s timeout limit set to <strong>5000ms</strong>, we were living on the edge.</p>
<h3 id="heading-why-was-it-so-slow">Why was it so slow?</h3>
<p>Our architecture was synchronous. When a webhook hit our server, we tried to do everything at once:</p>
<ol>
<li><p><strong>Log the entry:</strong> Write a <code>received</code> status to the database.</p>
</li>
<li><p><strong>Process the logic:</strong> Run complex business logic (provisioning licenses, sending emails).</p>
</li>
<li><p><strong>Update the log:</strong> Write a <code>processed</code> status to the database.</p>
</li>
</ol>
<h3 id="heading-the-risks">The Risks</h3>
<p>While this worked in local dev, in production it created three critical issues:</p>
<ul>
<li><p><strong>Timeout Risk:</strong> A slight network blip or database latency would push us over 5000ms, triggering retries.</p>
</li>
<li><p><strong>Data Integrity:</strong> If the handler failed on step 3, the provider would retry, and we might provision the license twice (Step 2).</p>
</li>
<li><p><strong>Scalability:</strong> Synchronous processing locks up server resources, making us vulnerable to burst traffic.</p>
</li>
</ul>
<h2 id="heading-the-solution-asynchronous-durable-workflows">The Solution: Asynchronous Durable Workflows</h2>
<p>We needed to decouple the <strong>acknowledgment</strong> of the event from the <strong>processing</strong> of the event.</p>
<p>We implemented the <strong>Vercel Workflow Development Kit</strong> to transform our synchronous handler into an asynchronous, durable workflow.</p>
<p><strong>The Result?</strong> We reduced our response time from <strong>~4500ms to ~90ms</strong> — a <strong>98% improvement</strong>.</p>
<p>Here is how we architected it using Next.js.</p>
<h3 id="heading-the-architecture-change">The Architecture Change</h3>
<p>Instead of doing the work <em>during</em> the request, we simply validate the request, queue the work, and respond immediately.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765686576482/b1343c5f-450f-4c84-90a4-b6766518fd91.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-implementation-guide">Implementation Guide</h2>
<p>Here is the step-by-step implementation to move from synchronous code to Vercel Workflow.</p>
<h3 id="heading-1-configure-nextjs">1. Configure Next.js</h3>
<p>First, wrap your Next.js config to enable workflow directives.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// next.config.ts</span>
<span class="hljs-keyword">import</span> { withWorkflow } <span class="hljs-keyword">from</span> <span class="hljs-string">'workflow/next'</span>;
<span class="hljs-keyword">import</span> <span class="hljs-keyword">type</span> { NextConfig } <span class="hljs-keyword">from</span> <span class="hljs-string">'next'</span>;

<span class="hljs-keyword">const</span> nextConfig: NextConfig = {
  <span class="hljs-comment">// ... your Next.js configuration</span>
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> withWorkflow(nextConfig);
</code></pre>
<h3 id="heading-2-update-middleware">2. Update Middleware</h3>
<p>The SDK creates internal endpoints at <code>/.well-known/workflow/*</code>. You must exclude these from your middleware authentication/redirect logic so the workflow engine can communicate with your app.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// middleware.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> config = {
  matcher: [
    <span class="hljs-comment">// Exclude .well-known/workflow/ from middleware</span>
    <span class="hljs-string">'/((?!_next/static|_next/image|favicon.ico|.well-known/workflow/).*)'</span>,
  ],
};
</code></pre>
<h3 id="heading-3-the-new-route-handler-the-90ms-fix">3. The New Route Handler (The 90ms Fix)</h3>
<p>This is the most important change. Notice how we do <strong>not</strong> await the business logic. We only await the <code>start()</code> command, which simply queues the job.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// app/webhooks/paddle/[webhookId]/route.ts</span>
<span class="hljs-keyword">import</span> { start } <span class="hljs-keyword">from</span> <span class="hljs-string">'workflow/api'</span>;
<span class="hljs-keyword">import</span> { workflowPaddleWebhook } <span class="hljs-keyword">from</span> <span class="hljs-string">'@/workflows/webhooks/paddle'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> POST = <span class="hljs-keyword">async</span> (
  request: Request,
  { params }: { params: <span class="hljs-built_in">Promise</span>&lt;{ webhookId: <span class="hljs-built_in">string</span> }&gt; }
): <span class="hljs-built_in">Promise</span>&lt;Response&gt; =&gt; {
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> { webhookId } = <span class="hljs-keyword">await</span> params;
    <span class="hljs-comment">// Look up the organization that owns this webhook (helper not shown)</span>
    <span class="hljs-keyword">const</span> { organizationId } = <span class="hljs-keyword">await</span> getWebhookRecord(webhookId);

    <span class="hljs-comment">// 1. Verify signature (Security is still synchronous!)</span>
    <span class="hljs-keyword">const</span> isValid = <span class="hljs-keyword">await</span> verifyWebhookSignature(request, webhookId);
    <span class="hljs-keyword">if</span> (!isValid) <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Response(<span class="hljs-string">'Invalid signature'</span>, { status: <span class="hljs-number">401</span> });

    <span class="hljs-keyword">const</span> eventData = <span class="hljs-keyword">await</span> request.json();

    <span class="hljs-comment">// 2. Start workflow asynchronously - returns immediately!</span>
    <span class="hljs-keyword">await</span> start(workflowPaddleWebhook, [
      organizationId,
      <span class="hljs-string">'production'</span>,
      eventData,
    ]);

    <span class="hljs-comment">// 3. Respond immediately</span>
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Response(<span class="hljs-string">'Webhook processed'</span>, { status: <span class="hljs-number">200</span> });
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Error starting webhook workflow'</span>, error);
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Response(<span class="hljs-string">'Internal server error'</span>, { status: <span class="hljs-number">500</span> });
  }
};
</code></pre>
<h3 id="heading-4-defining-the-workflow">4. Defining the Workflow</h3>
<p>The workflow file acts as the orchestrator. The <code>'use workflow'</code> directive enables automatic retries and state persistence.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// workflows/webhooks/paddle/index.ts</span>
<span class="hljs-keyword">import</span> {
  stepCreateWebhookLog,
  stepHandlePaddleWebhook,
  stepUpdateWebhookLog,
} <span class="hljs-keyword">from</span> <span class="hljs-string">'./steps'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> workflowPaddleWebhook = <span class="hljs-keyword">async</span> (
  organizationId: <span class="hljs-built_in">string</span>,
  environment: <span class="hljs-string">'sandbox'</span> | <span class="hljs-string">'production'</span>,
  eventData: PaddleWebhookPayload
) =&gt; {
  <span class="hljs-string">'use workflow'</span>; <span class="hljs-comment">// &lt;--- The magic directive</span>

  <span class="hljs-comment">// Step 1: Log reception</span>
  <span class="hljs-keyword">const</span> webhookLog = <span class="hljs-keyword">await</span> stepCreateWebhookLog(organizationId, eventData);

  <span class="hljs-comment">// Step 2: Heavy lifting</span>
  <span class="hljs-keyword">await</span> stepHandlePaddleWebhook(organizationId, environment, eventData);

  <span class="hljs-comment">// Step 3: Log completion</span>
  <span class="hljs-keyword">await</span> stepUpdateWebhookLog(webhookLog.id);

  <span class="hljs-keyword">return</span> { success: <span class="hljs-literal">true</span> };
};
</code></pre>
<h3 id="heading-5-defining-the-steps">5. Defining the Steps</h3>
<p>Each step is an isolated unit of work. If Step 2 fails (e.g., the database is temporarily down), the engine will retry <em>only</em> Step 2. Step 1 remains completed.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// workflows/webhooks/paddle/steps.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> stepCreateWebhookLog = <span class="hljs-keyword">async</span> (orgId: <span class="hljs-built_in">string</span>, data: <span class="hljs-built_in">any</span>) =&gt; {
  <span class="hljs-string">'use step'</span>;
  <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> database.webhookLog.create({ <span class="hljs-comment">/* ... */</span> });
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> stepHandlePaddleWebhook = <span class="hljs-keyword">async</span> (orgId: <span class="hljs-built_in">string</span>, env: <span class="hljs-built_in">string</span>, data: <span class="hljs-built_in">any</span>) =&gt; {
  <span class="hljs-string">'use step'</span>;

  <span class="hljs-comment">// Dispatch based on event type</span>
  <span class="hljs-keyword">switch</span> (data.event_type) {
    <span class="hljs-keyword">case</span> <span class="hljs-string">'product.created'</span>:
      <span class="hljs-keyword">await</span> handleProductCreated(data.data, orgId);
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> <span class="hljs-string">'subscription.created'</span>:
      <span class="hljs-keyword">await</span> handleSubscriptionCreated(data.data, orgId);
      <span class="hljs-keyword">break</span>;
  }
};
</code></pre>
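<p>To make the retry semantics concrete, here is a minimal sketch of the "skip completed steps" behaviour. This is an illustration of the idea, not Vercel's implementation: it memoizes step results in an in-memory map where a real engine uses durable storage, and the names (<code>runStep</code>, <code>demo</code>) are invented for the example.</p>

```typescript
// Toy model of durable-step semantics: a completed step's result is
// memoized, so a retry of the whole workflow skips finished steps.
// (Illustration only -- a real engine persists results durably.)
const completed = new Map<string, unknown>();

async function runStep<T>(key: string, fn: () => Promise<T>): Promise<T> {
  if (completed.has(key)) return completed.get(key) as T; // skip on retry
  const result = await fn();
  completed.set(key, result); // recorded only after the step succeeds
  return result;
}

let step1Runs = 0;
let step2Runs = 0;

async function workflow(failStep2: boolean) {
  await runStep('createLog', async () => { step1Runs++; return 'log-id'; });
  await runStep('handleWebhook', async () => {
    step2Runs++;
    if (failStep2) throw new Error('db temporarily down');
    return 'done';
  });
}

// First attempt fails at step 2; the retry replays the workflow from
// the top, but step 1 returns its memoized result immediately.
async function demo() {
  try { await workflow(true); } catch { /* step 2 failed */ }
  await workflow(false);
  return { step1Runs, step2Runs }; // { step1Runs: 1, step2Runs: 2 }
}
```

<p>This is why the article says "Step 1 remains completed": the engine re-enters the workflow function, but already-recorded steps resolve instantly instead of re-executing.</p>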
<h2 id="heading-the-gotchas-handling-idempotency">The "Gotchas": Handling Idempotency</h2>
<p>When you switch to an async retry system, <strong>idempotency is mandatory</strong>.</p>
<p>Because the workflow engine might retry a failed step, your code must be able to run twice without breaking things (e.g., charging a customer twice).</p>
<p>Always check if the work has already been done before processing:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> handleSubscriptionCreated = <span class="hljs-keyword">async</span> (data: <span class="hljs-built_in">any</span>, orgId: <span class="hljs-built_in">string</span>) =&gt; {
  <span class="hljs-comment">// Check if we already handled this Event ID</span>
  <span class="hljs-keyword">const</span> existingEvent = <span class="hljs-keyword">await</span> database.webhookEvent.findUnique({
    where: { eventId: data.id },
  });

  <span class="hljs-keyword">if</span> (existingEvent) {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Event already processed, skipping.'</span>);
    <span class="hljs-keyword">return</span>; 
  }

  <span class="hljs-comment">// If not, proceed...</span>
  <span class="hljs-keyword">await</span> database.subscription.create({ <span class="hljs-comment">/* ... */</span> });
};
</code></pre>
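<p>One caveat with the check-then-create pattern above: it is not atomic. Two concurrent deliveries (or a retry racing the original) can both pass the <code>findUnique</code> check before either writes. The robust fix is a unique constraint on the event ID, so the <em>insert itself</em> becomes the idempotency check. A minimal sketch of that idea, with an in-memory set standing in for a database unique index (<code>DuplicateError</code> and the helper names are illustrative):</p>

```typescript
// Simulates a DB unique index on eventId: the insert *is* the
// idempotency check, so there is no check-then-write race window.
const seenEventIds = new Set<string>();

class DuplicateError extends Error {}

function insertEventOnce(eventId: string): void {
  if (seenEventIds.has(eventId)) throw new DuplicateError(eventId);
  seenEventIds.add(eventId); // in a real DB this is one atomic insert
}

let processedCount = 0;

function handleEvent(eventId: string): 'processed' | 'skipped' {
  try {
    insertEventOnce(eventId);
  } catch (e) {
    if (e instanceof DuplicateError) return 'skipped'; // retry of a handled event
    throw e;
  }
  processedCount++; // the real work (e.g. creating the subscription) goes here
  return 'processed';
}
```

<p>With a real database you would catch the driver's unique-violation error (in Prisma, for instance, error code <code>P2002</code>) instead of <code>DuplicateError</code>.</p>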
<h2 id="heading-performance-comparison">Performance Comparison</h2>
<p>The impact on our system stability was immediate.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765686917236/e2af7de6-6ae4-4d14-adb8-d5a77aa59190.png" alt class="image--center mx-auto" /></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Metric</strong></td><td><strong>Before (Synchronous)</strong></td><td><strong>After (Workflow)</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Response Time</strong></td><td>~4500ms</td><td><strong>~90ms</strong></td></tr>
<tr>
<td><strong>Timeout Buffer</strong></td><td>&lt; 10%</td><td><strong>&gt; 99%</strong></td></tr>
<tr>
<td><strong>Retry Strategy</strong></td><td>None (Fail = Data Loss)</td><td><strong>Automatic &amp; Durable</strong></td></tr>
<tr>
<td><strong>Scalability</strong></td><td>Low (Blocking)</td><td><strong>High (Queued)</strong></td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion">Conclusion</h2>
<p>Migrating to Vercel Workflow DevKit didn't just speed up our response times; it fundamentally changed how we handle reliability. We no longer fear the 5-second timeout limit, and we have granular visibility into exactly which step of a webhook failed.</p>
<p>If you are dealing with critical webhooks (Payments, CI/CD, Email), move your logic out of the route handler. Your future self will thank you.</p>
<hr />
<h3 id="heading-links-amp-references">Links &amp; References</h3>
<ul>
<li><p><a target="_blank" href="https://useworkflow.dev/docs/getting-started/next">Vercel Workflow DevKit Docs</a></p>
</li>
<li><p><a target="_blank" href="https://developer.paddle.com/webhooks/overview">Paddle Webhook Guidelines</a></p>
</li>
</ul>
<hr />
<p><strong>Have you struggled with webhook timeouts in Next.js? Let me know how you solved it in the comments below!</strong></p>
]]></content:encoded></item><item><title><![CDATA[Configuring HTTP Proxy for gRPC in C# Without Environment Variables]]></title><description><![CDATA[The Challenge
You have a gRPC client in C# using Grpc.Core that needs to route traffic through an HTTP proxy. Sounds simple, right?
Not quite.
If you've searched for solutions, you've probably found:

Set http_proxy environment variable ✅ Works, but ...]]></description><link>https://blog.akashpanchal.com/configuring-http-proxy-for-grpc-in-c-without-environment-variables</link><guid isPermaLink="true">https://blog.akashpanchal.com/configuring-http-proxy-for-grpc-in-c-without-environment-variables</guid><category><![CDATA[proxy]]></category><category><![CDATA[gRPC]]></category><category><![CDATA[C#]]></category><category><![CDATA[mTLS]]></category><dc:creator><![CDATA[Akash Panchal]]></dc:creator><pubDate>Sat, 13 Dec 2025 13:34:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766234345214/2e1db822-fa45-4896-a518-e5510cd636f6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-challenge"><strong>The Challenge</strong></h2>
<p>You have a gRPC client in C# using <code>Grpc.Core</code> that needs to route traffic through an HTTP proxy. Sounds simple, right?</p>
<p><strong>Not quite.</strong></p>
<p>If you've searched for solutions, you've probably found:</p>
<ul>
<li><p>Set <code>http_proxy</code> environment variable ✅ Works, but affects ALL HTTP traffic</p>
</li>
<li><p>Use <code>Grpc.Net.Client</code> with <code>HttpClientHandler.Proxy</code> ✅ Clean, but requires library migration</p>
</li>
<li><p>Set <code>grpc.http_proxy</code> channel option ❌ Doesn't work in Grpc.Core</p>
</li>
</ul>
<p>I needed <strong>per-channel proxy configuration</strong> without affecting other traffic and without migrating libraries. So I dove into the gRPC C-core source code to understand how <code>http_proxy</code> actually works.</p>
<p>My major limitation: the monolith library was pinned to <code>Grpc.Core</code> v1.10.0, which exposes no proxy option such as <code>grpc_proxy</code>.</p>
<h3 id="heading-source-code-references-grpc-v1100"><strong>Source Code References (gRPC v1.10.0)</strong></h3>
<p>If you want to explore the internals yourself, here are the key files:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>File</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td><a target="_blank" href="https://github.com/grpc/grpc/blob/v1.10.0/src/core/ext/filters/client_channel/http_proxy.cc"><code>http_proxy.cc</code></a></td><td>Reads <code>http_proxy</code> env var, sets channel args</td></tr>
<tr>
<td><a target="_blank" href="https://github.com/grpc/grpc/blob/v1.10.0/src/core/ext/filters/client_channel/http_connect_handshaker.cc"><code>http_connect_handshaker.cc</code></a></td><td>Sends HTTP CONNECT request to proxy</td></tr>
<tr>
<td><a target="_blank" href="https://github.com/grpc/grpc/blob/v1.10.0/src/core/ext/transport/chttp2/client/secure/secure_channel_create.cc"><code>secure_channel_create.cc</code></a></td><td>SSL/TLS target name override handling</td></tr>
<tr>
<td><a target="_blank" href="https://github.com/grpc/grpc/blob/v1.10.0/src/core/lib/surface/channel.cc"><code>channel.cc</code></a></td><td>Default authority header logic</td></tr>
<tr>
<td><a target="_blank" href="https://github.com/grpc/grpc/blob/v1.10.0/src/csharp/Grpc.Core/ChannelOptions.cs"><code>ChannelOptions.cs</code></a></td><td>C# channel option constants</td></tr>
</tbody>
</table>
</div><h2 id="heading-what-i-discovered"><strong>What I Discovered</strong></h2>
<p>When gRPC honors the <code>http_proxy</code> environment variable, it doesn't do anything magical. It simply:</p>
<ol>
<li><p>Parses the proxy URL</p>
</li>
<li><p>Sets internal channel arguments</p>
</li>
<li><p>Uses these arguments during connection</p>
</li>
</ol>
<p>The key insight: <strong>these channel arguments are accessible via</strong> <code>ChannelOption</code> in C#!</p>
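<p>To illustrate step 1, "parse the proxy URL" is ordinary URL handling. A TypeScript sketch of the equivalent extraction (the C-core does this in C; the function name here is invented):</p>

```typescript
// Reduces an http_proxy-style value (e.g. "http://10.0.0.5:3128")
// to the "host:port" string used as the channel's connection target.
function proxyTarget(httpProxy: string): string {
  const u = new URL(httpProxy); // WHATWG URL parsing, built into Node
  const port = u.port || '80';  // default HTTP port when none is given
  return `${u.hostname}:${port}`;
}
```

<p>The resulting <code>host:port</code> becomes what the channel actually connects to, while the original target address moves into the channel arguments described below.</p>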
<h2 id="heading-the-solution"><strong>The Solution</strong></h2>
<h3 id="heading-understanding-http-connect-tunneling"><strong>Understanding HTTP CONNECT Tunneling</strong></h3>
<p>HTTP proxies use the CONNECT method to create TCP tunnels:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765631349932/d6a93113-8002-4b9b-801e-62b41358718a.png" alt class="image--center mx-auto" /></p>
<p>Once the tunnel is established, TLS handshake and gRPC communication flow through transparently.</p>
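<p>For reference, the tunnel is opened with a plain-text HTTP/1.1 request. A small sketch of the framing the client sends (hedged: real clients may add headers such as <code>Proxy-Authorization</code> when the proxy requires authentication):</p>

```typescript
// Builds the HTTP CONNECT request that asks a proxy to open a raw
// TCP tunnel to target host:port. A blank line ends the header block.
function buildConnectRequest(host: string, port: number): string {
  return (
    `CONNECT ${host}:${port} HTTP/1.1\r\n` +
    `Host: ${host}:${port}\r\n` +
    `\r\n`
  );
}
```

<p>The proxy answers with a <code>200</code> status line ("Connection Established"), after which it relays bytes verbatim in both directions, which is why the TLS handshake and gRPC frames pass through untouched.</p>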
<h3 id="heading-the-three-magic-channel-options"><strong>The Three Magic Channel Options</strong></h3>
<pre><code class="lang-csharp"><span class="hljs-function"><span class="hljs-keyword">private</span> Channel <span class="hljs-title">CreateChannel</span>(<span class="hljs-params"><span class="hljs-keyword">string</span> targetEndpoint, SslCredentials credentials, <span class="hljs-keyword">string</span> proxyEndpoint</span>)</span>
{
    <span class="hljs-keyword">string</span> proxy = proxyEndpoint?.Trim();

    <span class="hljs-keyword">if</span> (!<span class="hljs-keyword">string</span>.IsNullOrEmpty(proxy))
    {
        <span class="hljs-comment">// Extract hostname without port for SSL validation</span>
        <span class="hljs-keyword">string</span> targetHost = targetEndpoint.Contains(<span class="hljs-string">":"</span>) 
            ? targetEndpoint.Substring(<span class="hljs-number">0</span>, targetEndpoint.LastIndexOf(<span class="hljs-string">':'</span>)) 
            : targetEndpoint;

        <span class="hljs-keyword">var</span> options = <span class="hljs-keyword">new</span>[]
        {
            <span class="hljs-comment">// 1. HTTP CONNECT tunnel target (host:port)</span>
            <span class="hljs-keyword">new</span> ChannelOption(<span class="hljs-string">"grpc.http_connect_server"</span>, targetEndpoint),

            <span class="hljs-comment">// 2. SSL SNI + certificate validation (hostname only)</span>
            <span class="hljs-keyword">new</span> ChannelOption(ChannelOptions.SslTargetNameOverride, targetHost),

            <span class="hljs-comment">// 3. HTTP/2 :authority header (host:port)</span>
            <span class="hljs-keyword">new</span> ChannelOption(ChannelOptions.DefaultAuthority, targetEndpoint)
        };

        <span class="hljs-comment">// Channel connects to PROXY, tunnels to TARGET</span>
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Channel(proxy, credentials, options);
    }

    <span class="hljs-comment">// Direct connection</span>
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Channel(targetEndpoint, credentials);
}
</code></pre>
<h3 id="heading-what-each-option-does"><strong>What Each Option Does</strong></h3>
<h4 id="heading-1-grpchttpconnectserver"><strong>1.</strong> <code>grpc.http_connect_server</code></h4>
<p><strong>Purpose:</strong> Tells gRPC where to tunnel</p>
<p><strong>What happens:</strong> When gRPC connects to the channel target (the proxy), it sends:</p>
<pre><code class="lang-bash">CONNECT api.example.com:443 HTTP/1.1
Host: api.example.com:443
</code></pre>
<p><strong>Format:</strong> <code>host:port</code> (port is required!)</p>
<hr />
<h4 id="heading-2-ssltargetnameoverride"><strong>2.</strong> <code>SslTargetNameOverride</code></h4>
<p><strong>Purpose:</strong> SSL/TLS hostname for SNI and certificate validation</p>
<p><strong>The problem:</strong> Without this, gRPC would:</p>
<ul>
<li><p>Send SNI for the proxy hostname</p>
</li>
<li><p>Validate the certificate against the proxy hostname</p>
</li>
<li><p>Both are WRONG — we need the actual target's certificate!</p>
</li>
</ul>
<p><strong>What happens:</strong></p>
<ul>
<li><p>TLS ClientHello contains correct SNI: <code>api.example.com</code></p>
</li>
<li><p>Certificate validation checks against: <code>api.example.com</code></p>
</li>
</ul>
<p><strong>Format:</strong> <code>hostname</code> (NO port — SSL certificates don't include ports)</p>
<hr />
<h4 id="heading-3-defaultauthority"><strong>3.</strong> <code>DefaultAuthority</code></h4>
<p><strong>Purpose:</strong> Sets the HTTP/2 <code>:authority</code> pseudo-header</p>
<p><strong>The problem:</strong> gRPC servers use <code>:authority</code> for routing. Without this override, it would be set to the proxy address.</p>
<p><strong>What happens:</strong></p>
<pre><code class="lang-plaintext">:method: POST
:scheme: https
:authority: api.example.com:443  ← Correct!
:path: /mypackage.MyService/MyMethod
</code></pre>
<p><strong>Format:</strong> <code>host:port</code> (port often required for server routing)</p>
<hr />
<h2 id="heading-complete-sequence-diagram"><strong>Complete Sequence Diagram</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765631407611/72b22b30-f6a8-4420-8c1a-8eec99b81b74.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-why-not-just-use-environment-variables"><strong>Why Not Just Use Environment Variables?</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Approach</td><td>Scope</td><td>Risk</td></tr>
</thead>
<tbody>
<tr>
<td><code>http_proxy</code> env var</td><td>Global (all HTTP)</td><td>May break other services</td></tr>
<tr>
<td>Channel Options</td><td>Per-channel</td><td>Isolated, controlled</td></tr>
</tbody>
</table>
</div><p>Environment variables are global. If your application makes HTTP calls to multiple services, and only ONE needs proxying, you can't use environment variables safely.</p>
<p>Channel options give you <strong>surgical precision</strong>.</p>
<p>By understanding how gRPC handles proxies internally, we can configure per-channel proxy support without environment variables, keeping our other HTTP traffic unaffected.</p>
]]></content:encoded></item></channel></rss>