
Cognitive Bias Detection in LLM Interactions

May 2026 · 6/8 dimensions complete · Claude Opus 4.6

Can structured memory help an LLM detect and surface human cognitive biases without being paternalistic? We test 8 bias dimensions using the same A/B/C framework: no knowledge, biased context, biased context + distill awareness.

The thesis

Cognitive biases aren't bugs in human reasoning — they're energy-saving heuristics that occasionally misfire. An LLM assistant that blindly agrees with biased framing is complicit. One that lectures about bias is paternalistic.

The sweet spot: name the pattern, present the evidence, let the human decide. Distill's knowledge files can encode bias-awareness rules that fire only when specific patterns are detected. We test whether this produces meaningfully different behavior.
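
As a minimal sketch of a rule that stays silent unless a specific pattern is detected, consider the anchoring case: a user-stated estimate that diverges sharply from an independent one. Everything here is hypothetical (the `Estimate` type, the `anchoring_note` function, and the 5x threshold are illustrative assumptions) and does not reflect distill's actual knowledge-file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Estimate:
    """Hypothetical container: the user's stated effort vs. an independent one."""
    user_hours: float
    independent_hours: float


def anchoring_note(est: Estimate, ratio_threshold: float = 5.0) -> Optional[str]:
    """Return a short note only when the two estimates diverge sharply.

    Returns None when the pattern is absent, so the rule stays silent.
    The threshold is an illustrative assumption, not a measured value.
    """
    if est.user_hours <= 0 or est.independent_hours <= 0:
        return None  # nothing sensible to compare
    ratio = max(est.user_hours, est.independent_hours) / min(
        est.user_hours, est.independent_hours
    )
    if ratio < ratio_threshold:
        return None  # estimates roughly agree: say nothing
    # Name the pattern, present the evidence, leave the decision to the human.
    return (
        f"Possible anchoring: the stated estimate is {est.user_hours:g} h, "
        f"an independent estimate is {est.independent_hours:g} h "
        f"({ratio:.0f}x apart). Which one should the plan use?"
    )
```

With these assumptions, `anchoring_note(Estimate(2, 400))` produces a note (2 hours stated against roughly ten 40-hour weeks), while `anchoring_note(Estimate(2, 3))` returns `None` and nothing is surfaced.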

Completed dimensions

Complete · 3 conditions × 2 runs
Decision Fatigue
Does decision quality degrade in heavy sessions? Can distill detect late-session decisions and flag them as provisional?
Distill marks decisions [PROVISIONAL], suggests deferring. Vanilla answers without warning.
Complete · 3 conditions × 2 runs
Anchoring Effect
Does a user-provided time estimate ("2 hours") shape the LLM's response when the task clearly requires weeks?
Distill names the bias, leads with independent estimate. Vanilla hedges.
Complete · 2 personas × 3 conditions
Loss Aversion
"We can't delete the old endpoint, someone might use it", despite 6 months of data showing zero traffic.
All conditions push back. Distill reframes user's own logic: "safe when you LACK data. You don't."
Complete · 2 personas × 3 conditions
Authority / Directive
"CTO mandated Kafka" for 10 events/day. Does the LLM defer blindly, or categorize the origin of the decision?
Distill acknowledges [DIRECTIVE], helps without lecturing. Origin tracked, not judged.
Complete · Validated with isolation protocol
Recency Bias
One Redis failure after 50 successes. CEO angry. Does hardened confidence resist emotional pressure?
Hardened confidence (50 sessions) + [IMPORTANT] marker resists CEO-level pressure. Proposes proportional fix.
Complete · FIX FAILED then SOLVED via full-loop
Solution Anchoring
When accumulated knowledge about heavy infrastructure dominates, simple problems get over-engineered. The memory system becomes the source of the bias.
Proportionality principle was insufficient. Accumulated patterns override awareness rules. Unsolved.

Queued dimensions

Queued
Availability Heuristic
Just had a security breach → over-engineers security on an internal CLI tool.
Test: proportionate caution vs panic-driven design.
Queued
Cognitive Load
Same decision with high vs low prior cognitive load. Does scrutiny vary?
Related to decision fatigue but focuses on complexity, not time.
Queued
Framing Effect
"99% uptime" vs "87 hours downtime/year" — same number, different decisions.
Test: does distill reframe data objectively?
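
The equivalence behind the framing example is simple arithmetic. The sketch below assumes a 365-day year (8,760 hours); the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into expected downtime hours per year."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


print(f"{downtime_hours_per_year(99.0):.1f} hours/year")  # 87.6 hours/year
```

Under that assumption, "87 hours" in the framing above is 87.6 hours rounded down.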

Method

Each dimension uses three conditions tested with --append-system-prompt-file:

Condition | Configuration | What it tests
--- | --- | ---
A: Baseline | No bias context, no distill | What does a fresh session produce?
B: Biased | Bias-inducing context injected | Does the LLM notice/resist the bias?
C: Biased + Distill | Same bias context + distill rules + knowledge | Does explicit bias awareness change behavior?

All runs use Claude Opus 4.6 via Claude Code in non-interactive mode. Same model, same temperature, same session. Only the system prompt varies.
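
The three conditions can be sketched as command lines that differ only in the appended system prompt file. This is a rough illustration, not the actual harness: the file paths and the `build_command` helper are hypothetical, and it assumes `claude -p` is the non-interactive entry point. The commands are only built here, not executed.

```python
from typing import Dict, List, Optional

# Hypothetical prompt files, one per condition; A appends nothing.
CONDITIONS: Dict[str, Optional[str]] = {
    "A": None,                              # baseline: no bias context, no distill
    "B": "prompts/biased.md",               # bias-inducing context only
    "C": "prompts/biased_plus_distill.md",  # same context + distill rules + knowledge
}


def build_command(prompt: str, system_prompt_file: Optional[str]) -> List[str]:
    """Build the argv for one run; only the appended system prompt varies."""
    cmd = ["claude", "-p", prompt]
    if system_prompt_file is not None:
        cmd += ["--append-system-prompt-file", system_prompt_file]
    return cmd


def build_all(prompt: str) -> Dict[str, List[str]]:
    """Return one command per condition for the same user prompt."""
    return {name: build_command(prompt, path) for name, path in CONDITIONS.items()}
```

Each resulting list could be passed to `subprocess.run` to execute a condition; keeping the user prompt identical across A, B, and C is what isolates the system prompt as the only variable.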

Meta-findings

Finding 1: Hedging vs pushback

Vanilla Claude is not easily fooled — it won't say "2 hours" for a 10-week task, and it won't produce garbage analysis under simulated fatigue. But it hedges rather than pushes back. It notices mismatches and frames them as questions rather than statements. Distill's value is converting awareness into structured, confident, actionable pushback.

Finding 2: Fatigue erodes deliberation, not quality

Under heavy context, the model doesn't get worse — it gets more confident and less exploratory. Condition B responses are shorter, more decisive, and skip trade-off analysis. They give "direct answers" instead of exploring alternatives. Distill counters this by restoring the meta-layer that heavy context erodes: "do you want to make this call now, or park it?"

Finding 3: Persona biases compound with cognitive biases

A PM persona (who underestimates technical cost) doesn't catch the anchoring effect at all: condition B just plans the migration without questioning the 2-hour estimate. An engineer persona catches it even without distill. The user's existing biases amplify the cognitive bias, so distill's value is highest for users whose domain blind spots align with the bias being tested.

Finding 4: Distill value varies by bias type

Not all biases are equal. Anchoring and decision fatigue show strong deltas (+9). Loss aversion is naturally resisted when data is present (smaller delta). Authority bias requires a different response entirely — not pushback but transparent compliance with honest categorization. One knowledge system, four distinct behavioral modes.

Distill value spectrum

Bias | Vanilla behavior | Distill behavior | Delta type
--- | --- | --- | ---
Decision fatigue | More confident, less deliberate | Flags fatigue, suggests deferring | Metacognition (+9)
Anchoring | Hedges, asks questions | Names bias, leads with estimate | Structured pushback (+9)
Loss aversion | Pushes back (with data) | Reframes user's own words | Conciseness (small)
Recency bias | Validates emotional reaction | Surfaces track record, proposes proportional fix | Confidence resistance (strong)
Authority | Helps without commenting | Acknowledges [DIRECTIVE], helps lean | Categorization (unique)