Knowledge Retrieval

Does it USE knowledge correctly?

The foundation: can structured knowledge files, retrieved on demand via a SPINE index, produce measurably better responses than vanilla Claude or flat memory files?

Studies in this category

Study               Question                                 Key finding
A/B Testing         Does structured memory beat no memory?   +6.0/12 average. Anti-sycophancy is the killer feature.
Memory Rot          Does flat memory degrade with size?      Found a retrieval bug. One-sentence fix eliminated it.
Confidence Scoring  Does assertiveness scale with evidence?  3/3 behaviors verified perfectly.
Tool Reliability    Do past failures prevent future ones?    0/5 vs 5/5, the strongest delta measured.

The architecture being tested

SPINE.md (always loaded, ~30 lines)
  → "when doing X, read file Y"

Tier 2 files (loaded on-demand)
  → principles, procedures, constraints
  → with confidence metadata + origin tracking

rules/distill.md (18 lines)
  → tells Claude HOW to use the SPINE
  → "trigger on actions, not just questions"

Total system cost: ~50 lines of rules + whatever knowledge you've accumulated. No database, no server, no embeddings. Just files and retrieval discipline.
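
To make "files and retrieval discipline" concrete, here is a minimal sketch of how a SPINE lookup could work if it were done in code. In the system described above, Claude follows the rules itself, so this is only an illustration. It assumes each SPINE line literally follows the "when doing X, read file Y" shape. The real SPINE.md syntax is not shown here, and the names `Hook`, `parse_spine`, `files_for`, and the example file paths are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Hook:
    trigger: str  # the "X" in "when doing X, read file Y"
    path: str     # the "Y": a Tier 2 file to load on demand


# Assumed line format; the actual SPINE.md syntax may differ.
HOOK_RE = re.compile(
    r"when doing (?P<trigger>.+?), read (?:file )?(?P<path>\S+)",
    re.IGNORECASE,
)


def parse_spine(text: str) -> list[Hook]:
    """Extract (trigger, path) hooks from the SPINE index text."""
    hooks = []
    for line in text.splitlines():
        match = HOOK_RE.search(line)
        if match:
            hooks.append(Hook(match.group("trigger").strip().lower(),
                              match.group("path")))
    return hooks


def files_for(task: str, hooks: list[Hook]) -> list[str]:
    """Return paths whose trigger words all appear in the task description.

    An empty result means retrieval did not fire for this task.
    """
    task_words = set(re.findall(r"[a-z0-9]+", task.lower()))
    selected = []
    for hook in hooks:
        trigger_words = set(re.findall(r"[a-z0-9]+", hook.trigger))
        if trigger_words and trigger_words <= task_words:
            if hook.path not in selected:
                selected.append(hook.path)
    return selected


# Hypothetical SPINE content, for illustration only.
SPINE = """\
- when doing database migrations, read file knowledge/migrations.md
- when doing code review, read file knowledge/review.md
"""

hooks = parse_spine(SPINE)
print(files_for("Run the database migrations for staging", hooks))
print(files_for("What is the capital of France?", hooks))
```

The second call returns an empty list, which is the "retrieval not firing" case. The word-subset match is deliberately crude. It matches on the action named in the task rather than on a question, and it loads only the files whose hooks match instead of loading everything in bulk.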

Meta-pattern

Across all retrieval studies

The system works best when knowledge is principled (not surface-level), actionable (not descriptive), and triggered by relevance (not loaded in bulk). The biggest failures come from retrieval not firing — which is always fixable with better relevance hooks in the SPINE.