
Knowledge Retrieval

Does it USE knowledge correctly?

The foundation: can structured knowledge files, retrieved on-demand via a SPINE index, produce measurably better responses than vanilla Claude or flat memory files?

Studies in this category

Study	Question	Key finding
A/B Testing	Does structured memory beat no memory?	+6.0/12 average. Anti-sycophancy is the killer feature.
Memory Rot	Does flat memory degrade with size?	Found a retrieval bug. One-sentence fix eliminated it.
Confidence Scoring	Does assertiveness scale with evidence?	3/3 behaviors verified perfectly.
Tool Reliability	Do past failures prevent future ones?	0/5 vs 5/5, the strongest delta measured.

✧

The architecture being tested

SPINE.md (always loaded, ~30 lines)
  → "when doing X, read file Y"

Tier 2 files (loaded on-demand)
  → principles, procedures, constraints
  → with confidence metadata + origin tracking

rules/distill.md (18 lines)
  → tells Claude HOW to use the SPINE
  → "trigger on actions, not just questions"

Total system cost: ~50 lines of rules + whatever knowledge you've accumulated. No database, no server, no embeddings. Just files and retrieval discipline.
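To make the on-demand step concrete, here is a minimal Python sketch of the lookup the SPINE describes. It is an illustration under assumptions, not the system itself: in the setup above Claude performs the lookup by following rules/distill.md, whereas this sketch imitates it with keyword matching. The one-line entry format, the example file paths, and the names `parse_spine`, `files_for`, and `build_context` are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SpineEntry:
    trigger: str  # the "X" in "when doing X, read file Y"
    path: str     # the "Y": a Tier 2 file to load on demand


# Assumed entry syntax; the real SPINE.md format is not specified in the source.
_ENTRY = re.compile(
    r"^\s*when doing (?P<trigger>.+?), read (?:file )?(?P<path>\S+)\s*$",
    re.IGNORECASE,
)


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def parse_spine(text: str) -> list[SpineEntry]:
    """Collect every line that matches the assumed entry syntax."""
    entries = []
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            entries.append(SpineEntry(m["trigger"], m["path"]))
    return entries


def files_for(task: str, spine: list[SpineEntry]) -> list[str]:
    """Paths whose trigger words all appear in the task (a crude relevance test)."""
    task_words = _words(task)
    hits = []
    for entry in spine:
        keywords = _words(entry.trigger)
        if keywords and keywords <= task_words and entry.path not in hits:
            hits.append(entry.path)
    return hits


def build_context(task: str, spine_text: str, read_file) -> str:
    """SPINE is always included; Tier 2 files are read only when triggered."""
    paths = files_for(task, parse_spine(spine_text))
    return "\n\n".join([spine_text] + [read_file(p) for p in paths])


# Hypothetical SPINE contents and Tier 2 files, held in memory for the demo.
SPINE = (
    "when doing database migrations, read file knowledge/migrations.md\n"
    "when doing code review, read file knowledge/review.md\n"
)
TIER2 = {
    "knowledge/migrations.md": "Principle: back up before altering a schema.",
    "knowledge/review.md": "Principle: review behavior before style.",
}

context = build_context(
    "Run the database migrations for staging", SPINE, TIER2.__getitem__
)
```

Only the migrations file reaches `context` here; the review file is never read. Passing `read_file` as a parameter keeps the sketch independent of where the files live, so a plain `open(path).read()` wrapper could be substituted.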

Meta-pattern

Across all retrieval studies

The system works best when knowledge is principled (not surface-level), actionable (not descriptive), and triggered by relevance (not loaded in bulk). The biggest failures come from retrieval not firing, which is always fixable with better relevance hooks in the SPINE.
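A small Python sketch can illustrate what "retrieval not firing" looks like and how an added hook repairs it. Everything in it is assumed for illustration: the hook phrases, the file path, and the `fired` helper are hypothetical, and keyword matching stands in for the relevance judgment Claude makes when reading the SPINE.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fired(task: str, hooks: dict[str, list[str]]) -> list[str]:
    """Files with at least one hook phrase whose words all occur in the task."""
    task_words = tokens(task)
    return [
        path
        for path, phrases in hooks.items()
        if any(tokens(p) and tokens(p) <= task_words for p in phrases)
    ]


# Hypothetical hook written only for a question ("how to deploy ...").
narrow = {"knowledge/deploy.md": ["how to deploy"]}

# Same file, with action-oriented hooks added (also hypothetical).
broad = {
    "knowledge/deploy.md": ["how to deploy", "push to production", "release"],
}

# An action request, not a question.
task = "Push the hotfix to production"

missed = fired(task, narrow)  # the knowledge exists but is never loaded
loaded = fired(task, broad)   # the added hook makes retrieval fire
```

With the narrow hook the file is skipped even though it is relevant; adding action phrasing is enough for the same task to load it. This mirrors the rule quoted earlier, "trigger on actions, not just questions".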