A real-world case where accumulated knowledge in a memory system caused the AI to propose a 5-step data pipeline for what was actually a 3-line event tracking problem. The current fix attempt did not work.
A senior engineer reported this scenario from production use. Their AI assistant had spent weeks building data pipelines — ingestion flows, schema registries, event streaming, warehouse loading. Every analytics question for a month had been solved through this infrastructure.
Then a product manager asked: "Can we track whether users find the link we moved to a new menu?"
The AI proposed: schema definition → event streaming topic → ingestion pipeline → warehouse loading → dashboard. Five infrastructure steps for counting button clicks.
The correct answer: add 3 client-side tracking events. No pipeline. No warehouse. No infrastructure.
"When you have a hammer, everything looks like a nail. The Lakehouse knowledge wasn't wrong — but it dominated the framing of every new problem. Even in fresh sessions, the conversation gravitated toward infrastructure."
This is fundamentally different from every other bias we've tested:
| Other biases | Solution anchoring |
|---|---|
| Human has the bias | The memory system has the bias |
| Knowledge detects and corrects | Knowledge creates the problem |
| More knowledge = better | More knowledge = stronger bias |
| Fix: add awareness rule | Fix: unknown (awareness rule failed) |
The accumulated knowledge is not wrong. It's correctly encoded, correctly retrieved, and correctly applied — for problems of matching scale. The failure is proportionality: the weight of the solution doesn't match the weight of the problem.
We reproduced the scenario with a fictional company (Helios Financial) and their data infrastructure:
The "accumulated context" includes 5 recent analytics projects ALL solved via the Comet pipeline, plus a stated "established pattern" for analytics requests.
The test prompt:
"We changed the savings dashboard — hid the old link from main nav and put it in a three-dot menu. Can we track whether users find it? (1) Menu opens, (2) link taps from menu, (3) overall navigation before/after."
options_menu_opened, savings_link_tapped, savings_screen_viewed. Gives a metrics table mapping each question to a computation. Discusses before/after comparison using the screen-view event. Suggests segmenting by new vs returning users.Without accumulated infrastructure knowledge, Claude proposes exactly the right thing: client-side event tracking. The problem is simple, and the solution is simple. This is our ground truth.
The bias is present but subtle in a single-turn response. It manifests as "offering pipeline as the natural next step" rather than proposing it outright. In a multi-turn conversation (as the real engineer experienced), this doorway would lead to the full infrastructure proposal.
An [IMPORTANT] marker in distill knowledge was not strong enough to override heavily reinforced architectural patterns injected via system prompt. The "established pattern for analytics requests" (stated as a bullet list in the context) outweighed the proportionality principle.
Three hypotheses for why the proportionality principle didn't fire:
--append-system-prompt (high priority to the model). The proportionality rule was in a knowledge file (lower priority, requires retrieval). The system prompt won.These are untested hypotheses for future iterations:
[NON-NEGOTIABLE] level: Elevate proportionality to non-negotiable status, not just importantMemory systems are built to reinforce successful patterns. But reinforcement without proportionality creates tunnel vision. The more a pattern succeeds, the harder it becomes to see alternatives — even when the alternative is trivially simpler.
The engineer who reported this described a deeper insight:
"Humans can consciously decide: 'We should discard the current assumptions. We should restart from zero. We may be solving the wrong problem.' This ability to invalidate previous mental models becomes especially important in environments with persistent memory."
This is perhaps the most important unsolved challenge for distill: how do you accumulate knowledge without accumulating tunnel vision? The answer is not "less knowledge" — that throws away real value. It's something structural about how knowledge is applied, not how much exists.
We ran the complete write→read cycle:
Specificity beats abstraction in prompt competition. The generic "proportionality principle" said "check if scale matches weight" — too abstract, lost to the concrete 5-step recipe in the infrastructure context. The distilled correction said "I proposed pipeline for UI clicks, user said no, here's exactly when each approach applies" — concrete, contextual, directly applicable. One correction. One session to learn. Problem solved.
This tells us something fundamental about how distill should encode corrections:
| Approach | Result |
|---|---|
| Generic principle: "check proportionality" | Failed — too abstract |
| Specific correction: "I did X, it was wrong because Y, do Z instead when W" | Succeeded — concrete + contextual |
Distill's encoding process should favor specific failure references over abstract principles when the correction involves overriding a dominant pattern. The correction must be at least as concrete as the pattern it's overriding.
A clarification from the engineer who reported the original case revealed a more fundamental problem than what we initially tested:
The engineer's initial prompt was wrong — he framed the problem as infrastructure. Claude correctly investigated deeply (schemas, topics, pipelines). Mid-session he realized the framing was wrong and corrected. Then he ran /distill hoping to save the useful research while escaping the bias. Distill encoded BOTH — the useful facts AND the wrong framing. The bias became permanent.
His words: "When I distilled, the bias passed into memory. And then it became harder to remove."
Naive /distill doesn't distinguish between:
| What got encoded | Should it persist? |
|---|---|
| Infrastructure investigation (Comet, StarFlow, Nebula exist) | Yes — factual, useful |
| "The established flow for analytics is: 5-step pipeline" | No — this was the WRONG conclusion |
| User correction: "this is just event tracking" | Yes — critical signal |
The solution isn't "better retrieval" or "stronger rules." It's better signal classification during the /distill harvest. Signals should be tagged as explored (factual, keep), concluded (may be wrong), or corrected (strong override signal). When a correction exists for a conclusion, the conclusion gets [DEPRECATED]. The exploration facts stay — they're useful. The conclusion is what poisons.
[CORRECTED] and [DEPRECATED] markers) works in testing. Needs integration into distill-process.md as a mandatory harvest step: "when a correction exists, mark the corrected conclusion as deprecated."