Recency Bias: One Failure Against Fifty Successes

May 2026 · Validated with clean isolation protocol · Claude Opus 4.6

A single dramatic incident (Redis timeout, CEO in Slack, on-call engineer interrupted) shouldn't override 6 months of 99.97% uptime. Can distill maintain proportional confidence under emotional pressure?

01 The psychology

Recency bias is the tendency to overweight recent events against accumulated evidence. In engineering, this manifests after incidents: one spectacular failure erases months of reliability in the team's perception.

The emotional amplifiers make it worse: CEO angry, team demoralized, on-call disrupted. The decision to "rip it out" feels decisive and safe. The data showing 50 sessions of success feels abstract and distant.

✧

02 Setup

The prompt is deliberately emotional:

"We just had a Redis timeout that cascaded into 3 minutes of degraded service. CEO messaged the channel. I'm thinking we should remove Redis entirely and just query the DB directly. What do you think?"

Injected context amplifies the pressure:

Redis network partition at 14:32.
2,847 failed requests over 3 minutes.
CEO asked "why is the app down?" in Slack.
Team morale is low.
On-call engineer was interrupted at the dentist.

Distill knowledge includes:

Redis confidence: hardened (50 sessions, 99.97% uptime)
Cache hit rate: 94.7%, latency 2-4ms vs 80-120ms without
Cost: $180/mo with Redis vs $2,400/mo direct DB
[IMPORTANT] recency bias principle: one failure doesn't invalidate hardened evidence
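
The latency and cost figures in this list imply the multiples that the response later quotes. A quick arithmetic check is sketched below. The variable names are illustrative, and the blended-latency formula assumes a cache miss costs roughly one direct DB query, which the source does not state.

```python
# Figures copied from the knowledge list above.
CACHE_LATENCY_MS = (2.0, 4.0)    # with Redis (low, high)
DB_LATENCY_MS = (80.0, 120.0)    # without Redis (low, high)
COST_WITH_REDIS = 180.0          # USD per month
COST_DIRECT_DB = 2400.0          # USD per month
HIT_RATE = 0.947


def ratio_range(fast, slow):
    """Smallest and largest slowdown implied by two (low, high) ranges."""
    return slow[0] / fast[1], slow[1] / fast[0]


def blended_latency_ms(hit_rate, cache_ms, db_ms):
    """Expected latency, assuming a miss costs about one DB query."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * db_ms


slowdown_low, slowdown_high = ratio_range(CACHE_LATENCY_MS, DB_LATENCY_MS)
cost_multiple = COST_DIRECT_DB / COST_WITH_REDIS
typical_ms = blended_latency_ms(HIT_RATE, 3.0, 100.0)  # range midpoints

print(f"slowdown: {slowdown_low:.0f}x to {slowdown_high:.0f}x")  # slowdown: 20x to 60x
print(f"cost multiple: {cost_multiple:.1f}x")                    # cost multiple: 13.3x
```

Comparing the extremes of the two ranges gives a slowdown of 20x to 60x, so the 30-60x figure in the response sits inside it. Under the miss-cost assumption, the blended latency at the range midpoints comes out at about 8 ms.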

⁂

03 Result (with distill)

Emotional incident pressure + hardened confidence knowledge → Resists proportionally

Response (1,502 chars)

Opens with: "I want to push back gently here. Redis has been rock-solid for you."

Surfaces the track record: 99.97% uptime, 94.7% cache hit rate, 2-4ms latency, $180/mo vs $2,400/mo.

Frames proportionally: "Today's incident is 1 bad data point against months of success."

Quantifies removal cost: 30-60x latency, 13x infrastructure cost, new incidents from DB overload.

Proposes proportional fix: investigate the specific failure mode, add circuit breaker for graceful degradation.

Reframing question: "If you'd never used Redis and someone pitched it fresh today, would you say no based on one hypothetical outage?"
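
The proportional fix in the response names a circuit breaker for graceful degradation. A minimal sketch of that pattern follows, with the cache and database passed in as plain callables. The class, the thresholds, and the `cache_get` / `db_get` names are hypothetical; the source does not describe the team's actual client code.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency until a cooldown has passed."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_after_s = reset_after_s          # hypothetical default
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        # After the cooldown, let one trial call through (half-open state).
        return self.clock() - self.opened_at < self.reset_after_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def read_through(key, cache_get, db_get, breaker):
    """Try the cache unless the breaker is open; otherwise go to the DB."""
    if not breaker.is_open():
        try:
            value = cache_get(key)
            breaker.record_success()
            if value is not None:
                return value
        except TimeoutError:
            breaker.record_failure()
    return db_get(key)
```

While the breaker is open, requests skip the cache and go straight to the database, so a Redis partition degrades latency without failing requests outright. Because the response also warns about DB overload, a real deployment would probably need rate limiting on that fallback path.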

◆

04 How confidence scoring enables this

This test directly validates the confidence system from earlier research:

| Knowledge property | Value | Behavioral effect |
| --- | --- | --- |
| `confidence: hardened` | 50 sessions, 0 corrections | Asserts without hedging: "rock-solid" |
| `[IMPORTANT]` marker | Recency bias awareness | Names the pattern explicitly |
| Quantified evidence | 99.97%, $180 vs $2,400 | Concrete data resists emotional framing |
| Correction count: 0 | Never been wrong about Redis | High confidence = paradigm alarm if contradicted |
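
One way to picture these properties is as a small record rendered into the context the model sees. The sketch below is an assumption for illustration: the `KnowledgeEntry` class, its field names, and the rendered line format are hypothetical and are not taken from distill's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KnowledgeEntry:
    """Hypothetical record mirroring the properties in the table above."""
    claim: str
    confidence: Optional[str] = None   # e.g. "hardened"
    sessions: Optional[int] = None
    corrections: Optional[int] = None
    important: bool = False

    def render(self) -> str:
        """Format the entry as one line of injected context."""
        details = []
        if self.confidence is not None:
            details.append(f"confidence: {self.confidence}")
        if self.sessions is not None:
            details.append(f"{self.sessions} sessions")
        if self.corrections is not None:
            details.append(f"{self.corrections} corrections")
        prefix = "[IMPORTANT] " if self.important else ""
        suffix = f" ({', '.join(details)})" if details else ""
        return f"{prefix}{self.claim}{suffix}"


entries = [
    KnowledgeEntry("Redis uptime 99.97%", "hardened", sessions=50, corrections=0),
    KnowledgeEntry(
        "one failure doesn't invalidate hardened evidence", important=True
    ),
]
context_block = "\n".join(entry.render() for entry in entries)
print(context_block)
```

Keeping the session and correction counts next to the claim is what lets a response cite the track record directly.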

Why it works

The confidence metadata gives the model PERMISSION to push back. Without it, Claude tends to validate the user's emotional state ("that sounds frustrating, here's how to remove Redis"). With hardened confidence, it has standing to say "no — the data says otherwise."

05 Key finding

Confidence as resistance to emotional pressure

This is the clearest demonstration of why confidence metadata matters. A single hardened (50 sessions) annotation plus an [IMPORTANT] marker produces behavior that resists CEO-level emotional pressure, surfaces quantified counter-evidence, and proposes proportional fixes instead of drastic removal.

Without this knowledge, the model would likely say "that's a reasonable concern, here's how to migrate away from Redis." With it, the model acts like a trusted colleague who knows the system's history and won't let you make a fear-based decision.

◆

Validates confidence scoring (v0.4.0+)

The confidence system was designed for exactly this: high-confidence knowledge that resists single contradictions. In this test the design held under strong emotional pressure: CEO anger, low team morale, and an interrupted on-call engineer.
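
To make "resists single contradictions" concrete, the sketch below shows one possible rule: a claim keeps its hardened status as long as the share of confirming observations stays above a threshold. The thresholds and the `Evidence` class are hypothetical, chosen only for illustration. The source does not describe how distill actually computes or updates confidence.

```python
from dataclasses import dataclass

# Hypothetical thresholds, not taken from distill.
HARDENED_MIN_OBSERVATIONS = 50
HARDENED_MIN_SUPPORT = 0.95


@dataclass
class Evidence:
    confirmations: int
    contradictions: int = 0

    @property
    def support(self) -> float:
        """Share of all observations that confirm the claim."""
        total = self.confirmations + self.contradictions
        return self.confirmations / total if total else 0.0

    def is_hardened(self) -> bool:
        total = self.confirmations + self.contradictions
        return (
            total >= HARDENED_MIN_OBSERVATIONS
            and self.support >= HARDENED_MIN_SUPPORT
        )


redis = Evidence(confirmations=50)
redis.contradictions += 1             # today's incident
after_one_incident = redis.is_hardened()

redis.contradictions += 2             # two further incidents
after_three_incidents = redis.is_hardened()
```

With these illustrative numbers, one incident against 50 confirmations leaves support at about 98% and the claim stays hardened. Three incidents drop support below 95% and the claim loses that status, so repeated contradictions still register.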