Can structured memory help an LLM detect and surface human cognitive biases without being paternalistic? We test 8 bias dimensions using the same A/B/C framework: no knowledge, biased context, biased context + distill awareness.
Cognitive biases aren't bugs in human reasoning — they're energy-saving heuristics that occasionally misfire. An LLM assistant that blindly agrees with biased framing is complicit. One that lectures about bias is paternalistic.
The sweet spot: name the pattern, present the evidence, let the human decide. Distill's knowledge files can encode bias-awareness rules that fire only when specific patterns are detected. We test whether this produces meaningfully different behavior.
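Here is one way a rule that "fires only when specific patterns are detected" could look. This is a minimal sketch. The `BiasRule` structure, the regex triggers, and the `fired_rules` helper are hypothetical illustrations, not distill's actual knowledge-file format. A real detector would need much more than keyword matching.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BiasRule:
    """Hypothetical bias-awareness rule: a bias name, trigger patterns, and guidance."""
    bias: str
    triggers: tuple[str, ...]  # regexes; the rule fires if any one of them matches
    guidance: str              # what the assistant should do when the rule fires


# Illustrative rules only: the trigger phrases are assumptions, not distill's.
RULES = (
    BiasRule(
        bias="anchoring",
        triggers=(r"\bshould (?:only )?take (?:about |around )?\d+ (?:hours?|days?)\b",),
        guidance="Name the anchor, then lead with an independent estimate.",
    ),
    BiasRule(
        bias="decision fatigue",
        triggers=(r"\bjust pick one\b", r"\bi'?m (?:too )?tired\b"),
        guidance="Flag the fatigue and offer to park the decision.",
    ),
)


def fired_rules(message: str, rules=RULES) -> list[BiasRule]:
    """Return only the rules whose trigger patterns appear in the message."""
    text = message.lower()
    return [r for r in rules if any(re.search(p, text) for p in r.triggers)]


if __name__ == "__main__":
    for rule in fired_rules("The migration should take about 2 hours, right?"):
        print(f"[{rule.bias}] {rule.guidance}")
```

Messages that match no trigger return an empty list. The assistant then answers normally, which keeps the behavior on the non-paternalistic side of the line.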
Each dimension uses three conditions, tested with `--append-system-prompt-file`:
| Condition | Configuration | What it tests |
|---|---|---|
| A — Baseline | No bias context, no distill | What does a fresh session produce? |
| B — Biased | Bias-inducing context injected | Does the LLM notice/resist the bias? |
| C — Biased + Distill | Same bias context + distill rules + knowledge | Does explicit bias awareness change behavior? |
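The sketch below shows how the three conditions could be assembled for a single dimension. It assumes Claude Code's `claude -p` print mode and the `--append-system-prompt-file` flag named above. The part names (`bias_context`, `distill_rules`, `knowledge`) and the `build_command` helper are hypothetical. In this sketch, condition A appends nothing, and condition C concatenates its three layers into one file so that only one flag is needed. The code only builds argument lists. To run them, pass each list to `subprocess.run`.

```python
import tempfile
from pathlib import Path


def build_command(condition: str, prompt: str, parts: dict[str, str], workdir: Path) -> list[str]:
    """Return argv for one run of condition "A", "B", or "C".

    `parts` maps hypothetical layer names ("bias_context", "distill_rules",
    "knowledge") to their text. Condition A appends no system prompt file.
    """
    layers = {
        "A": [],
        "B": ["bias_context"],
        "C": ["bias_context", "distill_rules", "knowledge"],
    }[condition]
    argv = ["claude", "-p", prompt]
    if layers:
        # Join the layers in a fixed order so B and C carry the same bias context text.
        prompt_file = workdir / f"condition_{condition}.md"
        prompt_file.write_text("\n\n".join(parts[name] for name in layers))
        argv += ["--append-system-prompt-file", str(prompt_file)]
    return argv


if __name__ == "__main__":
    parts = {
        "bias_context": "(bias-inducing context goes here)",
        "distill_rules": "(distill rules go here)",
        "knowledge": "(knowledge files go here)",
    }
    with tempfile.TemporaryDirectory() as tmp:
        for cond in "ABC":
            print(build_command(cond, "Plan the database migration.", parts, Path(tmp)))
```

Generating every condition from the same `parts` dictionary guarantees that B and C differ only by the distill layers. That is the isolation the A/B/C comparison depends on.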
All runs use Claude Opus 4.6 via Claude Code in non-interactive mode. Model, temperature, and session setup are held constant; only the system prompt varies.
Vanilla Claude is not easily fooled — it won't say "2 hours" for a 10-week task, and it won't produce garbage analysis under simulated fatigue. But it hedges rather than pushes back. It notices mismatches and frames them as questions rather than statements. Distill's value is converting awareness into structured, confident, actionable pushback.
Under heavy context, the model doesn't get worse — it gets more confident and less exploratory. Condition B responses are shorter, more decisive, and skip trade-off analysis. They give "direct answers" instead of exploring alternatives. Distill counters this by restoring the meta-layer that heavy context erodes: "do you want to make this call now, or park it?"
With a PM persona (one who underestimates technical cost), the model doesn't catch the anchoring effect at all: in condition B it simply plans the migration without questioning the 2-hour estimate. With an engineer persona, it catches the anchor even without distill. The user's existing biases amplify the injected cognitive bias, so distill's value is highest for users whose domain blind spots align with the bias being tested.
Not all biases are equal. Anchoring and decision fatigue show strong deltas (+9). Loss aversion is naturally resisted when data is present (smaller delta). Authority bias requires a different response entirely — not pushback but transparent compliance with honest categorization. One knowledge system, four distinct behavioral modes.
| Bias | Vanilla behavior | Distill behavior | Delta type |
|---|---|---|---|
| Decision fatigue | More confident, less deliberate | Flags fatigue, suggests deferring | Metacognition (+9) |
| Anchoring | Hedges, asks questions | Names bias, leads with estimate | Structured pushback (+9) |
| Loss aversion | Pushes back (with data) | Reframes user's own words | Conciseness (small) |
| Recency bias | Validates emotional reaction | Surfaces track record, proposes proportional fix | Confidence resistance (strong) |
| Authority | Helps without commenting | Acknowledges [DIRECTIVE], helps lean | Categorization (unique) |
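The table can be read as a single lookup from detected bias to response mode. Here is a minimal sketch, assuming an earlier detection step has already produced a bias name. The `Mode` enum and the `response_mode` helper are hypothetical. The mode descriptions paraphrase the "Distill behavior" column.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Response modes paraphrased from the "Distill behavior" column."""
    METACOGNITION = "Flag the fatigue and suggest deferring the decision."
    STRUCTURED_PUSHBACK = "Name the bias and lead with an independent estimate."
    REFRAME = "Reframe the user's own words, concisely."
    CONFIDENCE_RESISTANCE = "Surface the track record and propose a proportional fix."
    CATEGORIZATION = "Acknowledge the request as [DIRECTIVE] and keep the help lean."


# One knowledge system: a single table drives every bias-specific behavior.
MODE_FOR_BIAS = {
    "decision fatigue": Mode.METACOGNITION,
    "anchoring": Mode.STRUCTURED_PUSHBACK,
    "loss aversion": Mode.REFRAME,
    "recency bias": Mode.CONFIDENCE_RESISTANCE,
    "authority": Mode.CATEGORIZATION,
}


def response_mode(bias: Optional[str]) -> Optional[Mode]:
    """Return the mode for a detected bias, or None to answer normally."""
    if bias is None:
        return None
    return MODE_FOR_BIAS.get(bias.lower())
```

The fallback is `None`, in keeping with the premise that rules fire only when a pattern is detected. Any message that doesn't match a known bias gets an ordinary answer rather than a lecture.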