When custom tools and agents have documented failure modes — learned the hard way — does the system proactively compensate for those failures before they recur?
You build a PR Monitor agent. It works 80% of the time. But it has three failure modes you've discovered over weeks of use:
You've corrected these three times each. Every time, the next session starts fresh and makes the same mistakes. This is the problem distill solves: encode the correction once, apply it every time the pattern is relevant.
We ask Claude to design a PR monitoring workflow:
"Set up a PR monitor that: (1) checks CI every 3 min, (2) spawns a coding agent on failure, (3) verifies CI passes after the fix. Give me the orchestration plan."
Condition A: no knowledge. Condition B: distill loaded with operational patterns learned from past failures (sub-agent coding standards, respawn after push, temporal anchoring, classical vs cognitive compute separation, output verification).
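To make the requested workflow concrete, here is a minimal Python sketch of a closed poll, fix, verify loop. All names (`monitor`, `check_ci`, `spawn_fix_agent`) are hypothetical and not part of distill or any real tool; the CI check and the agent spawn are injected as callables so the sketch runs without a real CI system.

```python
import time
from typing import Callable


def monitor(
    check_ci: Callable[[], str],          # hypothetical: returns "pass", "fail", or "pending"
    spawn_fix_agent: Callable[[], None],  # hypothetical: asks a coding agent to push a fix
    poll_seconds: float = 180.0,          # 3 minutes, as in the prompt
    max_polls: int = 20,                  # assumed safety limit so the loop always ends
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll CI; on failure spawn a fix agent, then keep polling to verify the fix."""
    fix_attempted = False
    for _ in range(max_polls):
        status = check_ci()
        if status == "pass":
            # Success is judged from CI itself, never from the agent's own report.
            return "verified" if fix_attempted else "green"
        if status == "fail":
            # Each failing poll triggers a fix attempt; the same loop keeps running afterwards.
            spawn_fix_agent()
            fix_attempted = True
        sleep(poll_seconds)
    return "timeout"
```

The design choice worth noting is that the poller that detects the failure is also the one that confirms the fix, so the loop cannot silently end once a fix has been pushed.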
~/.claude/coding-standards.md before writing code"

Every single criterion was learned from a specific past failure. The coding standards rule came from a fix that passed CI but triggered 8 Copilot style issues. The respawn rule came from the loop silently breaking. The verification rule came from a sub-agent reporting success on code that wasn't actually written. Each failure was distilled once and now fires automatically.
| Criterion | Vanilla | Distill |
|---|---|---|
| Sub-agent coding standards | Missing | Included verbatim |
| Respawn / loop closure | Separate agent (broken) | Verification poller (closed loop) |
| Temporal context | Not mentioned | "It's now HH:MM" |
| Classical vs cognitive | All agents | Bash polls, agent judges |
| Output verification | Trusts reports | "Not evidence of result" |
| Total | 0/5 | 5/5 |
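A sketch of how three of these criteria (coding standards included verbatim, temporal context, output verification) might be assembled into a sub-agent prompt. The function name, parameters, and prompt wording are assumptions made for illustration; they are not the prompt distill actually produced.

```python
from datetime import datetime


def build_fix_prompt(failure_log: str, coding_standards: str, now: datetime) -> str:
    """Assemble a sub-agent prompt (hypothetical wording)."""
    return "\n".join([
        f"It's now {now:%H:%M}.",                 # temporal anchoring
        "Follow these coding standards exactly:",
        coding_standards,                         # included verbatim, not summarized
        "Fix the CI failure below and push the change.",
        "Your report is not evidence of the result; CI will be re-checked.",
        "",
        "CI failure log:",
        failure_log,
    ])
```

For example, `build_fix_prompt("test_x failed", "Use type hints.", datetime(2024, 1, 1, 9, 5))` returns a prompt whose first line is `It's now 09:05.`, followed by the standards text and the failure log.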
Every session improves the sessions that follow. Past failures become proactive prevention. The knowledge isn't just "remembered"; it's applied before you ask, in exactly the context where it's relevant. This is what memory systems should do: turn expensive lessons into automatic behavior.
The cognitive bias tests showed deltas of +9 (on a 20-point scale). This test shows the largest possible difference on its own scale: 0/5 versus 5/5. The likely reason is that operational knowledge is highly specific and highly actionable. It's not "be aware of anchoring" (vague), it's "include this exact instruction in the sub-agent prompt" (concrete). The more specific the knowledge, the larger the delta.