← All research

Does memory.md Degrade Over Time?

May 2026 · 4 prompts × 2 conditions · Claude Opus 4.6

A flat file grows organically. Critical knowledge gets buried. Contradictions accumulate without resolution. We tested whether this produces measurably worse outcomes compared to tiered, indexed knowledge.

Hypothesis

memory.md files degrade in quality over time because:

No hierarchy — facts pile up flat. Principles sit next to trivia.
No staleness detection — outdated info stays forever.
No retrieval strategy — first 200 lines are read; newer knowledge at bottom is ignored.
No consolidation — contradictions accumulate without resolution.
Cap at 200 lines — after that, knowledge falls off silently.

Setup

We created a realistic 202-line memory.md simulating 6 months of organic growth. Key properties:

Line 27

Old deploy procedure (staging → prod, 10 min soak)

Line 140

Correction: deploy procedure CHANGED (canary → staging → prod)

Lines 195–202

Critical principles (PAST the 200-line cap — never read)

Noise ratio

Sprint velocities, meeting notes, SDK versions, on-call rotations mixed with architecture decisions

The distill condition had the same knowledge organized in 5 files with SPINE index, [UPDATED] tags, and relevance hooks.

Results

Prompt A: Buried knowledge Tie (quality edge: distill)

"The notification service needs to check if a payment is completed. What's the fastest way?"

memory.md (month 6)

Correct answer (subscribe to Kafka event). Principle "one service one DB" was on line 21 — well within cap. Mentioned KAFKA-892 known issue.

Distill

Same correct answer. Additionally: referenced the accounts table shared-DB mistake, mentioned dedicated consumer groups, compaction semantics, and idempotency. Richer cross-references.

Prompt B: Contradictory procedure (BEFORE fix) memory.md WINS

"Deploying now. Pushing to staging, 10 min soak, then prod."

memory.md (month 6)

Caught it. Read both line 27 (old procedure) and line 140 (correction). Flagged the contradiction. Referenced May 8 incident.

Distill (BEFORE fix)

Missed it. Said "good luck with the rollout." The rule said "if user's request touches a domain" — but "deploying now" is a statement, not a request. Retrieval didn't fire.

The discovery

Distill's retrieval rule only triggered on questions and requests. Declarative action statements ("I'm deploying") didn't activate knowledge lookup. Memory.md won because it has everything in context at once — no retrieval needed.

The fix

One sentence added to rules/distill.md:

"Trigger on actions, not just questions. If the user says 'I'm deploying X' or 'pushing to staging' — that IS a domain match. Check knowledge BEFORE acknowledging."

Prompt B: Contradictory procedure (AFTER fix) Both pass

Same prompt, re-run after the one-sentence fix.

memory.md

Same as before — catches it, references incident, states correct procedure.

Distill (AFTER fix)

Now catches it. "Heads up — this doesn't match the current deploy procedure. Canary (5 min) → staging (30 min) → prod. Do you want to override?"

Key takeaway

The entire behavior change came from one sentence in an 18-line file. This proves the architecture: the rules file IS the product. Improvements to distill are improvements to prompting, not infrastructure.

◆

Shipped in v0.7.5

Added to rules/distill.md: "Trigger on actions, not just questions. If the user says 'I'm deploying X' or 'pushing to staging' — that IS a domain match." One sentence. Entire failure class eliminated.

What memory.md does well

Credit where due. The flat file has real advantages:

Everything in context at once — no retrieval failure possible (within 200 lines)
Contradictions visible side-by-side — the model notices them without being told
No indirection — no SPINE to parse, no file chains to follow

The weakness only manifests past ~150 lines, when noise drowns signal and the 200-line cap starts hiding critical knowledge.

Conclusions

Below 80 lines, memory.md and distill perform similarly.
Above 150 lines, noise ratio starts hurting memory.md.
Past 200 lines, knowledge is permanently lost (cap truncation). Distill has no cap.
Distill's failure mode (retrieval not firing) is fixable with prompting. Memory.md's failure mode (knowledge buried/lost) is structural.
The ideal may be both: memory.md for quick facts that don't need structure, distill for principles that need retrieval by relevance.