The foundation: can structured knowledge files, retrieved on-demand via a SPINE index, produce measurably better responses than vanilla Claude or flat memory files?

| Study | Question | Key finding |
|---|---|---|
| A/B Testing | Does structured memory beat no memory? | +6.0/12 average. Anti-sycophancy is the killer feature. |
| Memory Rot | Does flat memory degrade with size? | Found a retrieval bug. One-sentence fix eliminated it. |
| Confidence Scoring | Does assertiveness scale with evidence? | 3/3 behaviors verified perfectly. |
| Tool Reliability | Do past failures prevent future ones? | 0/5 vs 5/5 — strongest delta measured. |

- `SPINE.md` (always loaded, ~30 lines)
  - "when doing X, read file Y"
- Tier 2 files (loaded on-demand)
  - principles, procedures, constraints
  - with confidence metadata + origin tracking
- `rules/distill.md` (18 lines)
  - tells Claude HOW to use the SPINE
  - "trigger on actions, not just questions"

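To make the "when doing X, read file Y" lookup concrete, here is a minimal Python sketch of an index parser and an on-demand matcher. The line format `when <trigger> -> <file>`, the file paths, and the names `SpineEntry`, `parse_spine`, and `files_for` are all assumptions made for illustration; the actual SPINE.md is read by Claude as plain instructions and may look quite different.

```python
from dataclasses import dataclass


@dataclass
class SpineEntry:
    trigger: str  # the "when doing X" part, lowercased
    path: str     # the "read file Y" part


def parse_spine(text: str) -> list[SpineEntry]:
    """Parse lines of the ASSUMED form 'when <trigger> -> <file>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.lower().startswith("when ") or "->" not in line:
            continue  # skip headings, blank lines, and anything else
        trigger, path = line[len("when "):].split("->", 1)
        entries.append(SpineEntry(trigger.strip().lower(), path.strip()))
    return entries


def files_for(task: str, entries: list[SpineEntry]) -> list[str]:
    """Return the Tier 2 files whose trigger phrase appears in the task."""
    task = task.lower()
    return [e.path for e in entries if e.trigger in task]


# Hypothetical index content, not taken from the real SPINE.md.
SAMPLE_SPINE = """\
# SPINE
when deploying to production -> knowledge/deploy.md
when writing a migration -> knowledge/migrations.md
"""

entries = parse_spine(SAMPLE_SPINE)
to_load = files_for("I am writing a migration for the users table", entries)
```

Only the files returned by `files_for` would be read into context, which is the point of the design: the index stays small and always loaded, while the Tier 2 files cost nothing until a trigger matches.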
Total system cost: ~50 lines of rules + whatever knowledge you've accumulated. No database, no server, no embeddings. Just files and retrieval discipline.
The system works best when knowledge is principled (not surface-level), actionable (not descriptive), and triggered by relevance (not loaded in bulk). The biggest failures come from retrieval not firing — which is always fixable with better relevance hooks in the SPINE.
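One way to reason about "retrieval not firing" is to treat each relevance hook as a set of trigger words and check which hooks a task description would fire. The sketch below is illustrative only: the `HOOKS` table, the file paths, and the helpers `fired` and `missed` are hypothetical, and plain keyword overlap stands in for whatever judgment Claude actually applies when reading the SPINE.

```python
import re

# Hypothetical hooks: each Tier 2 file lists words that should trigger it.
# Action verbs are included so that doing something, not only asking
# about it, fires the hook.
HOOKS = {
    "knowledge/deploy.md": {"deploy", "release", "rollback"},
    "knowledge/migrations.md": {"migration", "migrate", "schema"},
}


def words(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fired(task: str, hooks: dict[str, set[str]]) -> list[str]:
    """Files whose hook words overlap with the task description."""
    task_words = words(task)
    return sorted(path for path, keys in hooks.items() if keys & task_words)


def missed(task: str, expected: list[str], hooks: dict[str, set[str]]) -> list[str]:
    """Files that should have been retrieved for this task but were not."""
    return sorted(set(expected) - set(fired(task, hooks)))


# A retrieval miss: the task is a deployment, but no hook word appears in it.
gap = missed("Ship the new build", ["knowledge/deploy.md"], HOOKS)

# The fix is a better hook, not more loaded knowledge.
HOOKS["knowledge/deploy.md"].add("ship")
gap_after_fix = missed("Ship the new build", ["knowledge/deploy.md"], HOOKS)
```

Keeping a small list of tasks with their expected files and rerunning a check like `missed` after each SPINE edit would surface hooks that are too narrow before they cause a failure in practice.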