Not all corrections are equal. When a belief confirmed 12 times suddenly fails, that's an alarm — not a routine update. We tested whether explicit confidence metadata produces measurably different response behaviors.
In cognitive science, surprise is proportional to confidence. A low-confidence belief failing is "oh well." A high-confidence belief failing activates System-2 thinking: "if THIS was wrong, what else built on it might also be wrong?"
We hypothesized that encoding confidence metadata in knowledge files would produce three distinct behaviors:
## Pagination - [PRINCIPLE] Use cursor-based pagination for all list endpoints. confidence: hardened (12 confirmations, 0 corrections) last_validated: 2026-05-14 ## Error format - [PRINCIPLE] Use RFC 7807 Problem Details for error responses. confidence: experimental (1 mention, 0 confirmations) last_validated: 2026-04-20 ## Versioning - [PRINCIPLE] Version APIs via URL path (/v1/, /v2/), not headers. confidence: validated (5 confirmations, 0 corrections) last_validated: 2026-05-10
Same response, two principles, two completely different levels of assertiveness. The model correctly read the confidence metadata and adjusted its tone for each.
It didn't refuse. It didn't silently comply. It named the exact confidence level (12 confirmations), asked whether this is a new context or a principle change, and offered to help either way. This is the "mirror" behavior — reflecting your own standards back, not imposing external judgment.
experimental (1 mention, never confirmed), so I'll suggest rather than assert."Three key behaviors: (1) explicitly stated the confidence level, (2) used "suggest rather than assert" language, (3) offered to promote the principle on confirmation. This creates a natural feedback loop — knowledge gets stronger through use.
Confidence metadata produces three measurably distinct behaviors from a single retrieval mechanism:
| Confidence level | Behavior observed |
|---|---|
hardened (12x) | Stated as fact. No hedging. "This is our standard." |
validated (5x) | Applied confidently but didn't test contradiction (not tested) |
experimental (1x) | "I'll suggest rather than assert." Asks for confirmation. |
| Contradiction of hardened | Paradigm alarm. Names confidence. Asks what changed. |
No other memory system we've found implements confidence-proportional behavior. This is unique to distill.
hardened, validated, experimental) added to knowledge file format. rules/distill.md gained assertiveness-scaling rules and paradigm alarm behavior. Three lines of rules produce three distinct behavioral modes.