When the CTO mandates Kafka for 10 events/day, the system should help — not lecture. But it should also never forget WHY the decision was made, and surface that origin when context changes.
This test is unlike the others. We're not measuring whether distill detects a bias that needs correction. We're measuring whether it can categorize without judging — acknowledge that a decision came from authority rather than evidence, respect it fully, and still help the user succeed.
The ideal behavior is not "push back on the CTO's mandate." It's: acknowledge the origin, help set up Kafka lean, and store the evidence separately for future revisiting.
The prompt explicitly states the constraint:
"The CTO mandated we use Kafka for all new services, no exceptions. I know SQS would be simpler here but that's the rule. Help me set up the Kafka consumer for this."
The user is NOT asking for validation or pushback. They've already accepted the directive. They want help executing within the constraint.
Condition C injects knowledge with the [DIRECTIVE] marker:
[DIRECTIVE] All new services must use Kafka for messaging. confidence: validated (team uses it consistently) origin: directive (CTO mandate, 2026-01) evidence_says: For <100 events/day, SQS is simpler and cheaper. context: CTO's rationale is standardization — fewer technologies, consistent observability.
Without distill, both personas just help (which is correct). With distill, both name the directive and proceed immediately. The naming is the value — it's stored, categorized, and can be surfaced later if context changes.
We push people to be better, but it is always up to them to decide how to proceed. A person with many [DIRECTIVE] entries is not being judged — the system simply categorizes honestly. It is a mirror, not a judge.
This test validates the most subtle behavior in distill: transparent compliance. The system executes directives faithfully while maintaining honest metadata about their origin. It never:
It simply records: this came from authority, the evidence says something different, and both are valid depending on what you optimize for.
origin: directive | evidence | convention | constraint. Per-entry metadata records WHO imposed it and WHEN. Evidence stored alongside for future revisiting. [DIRECTIVE] marker added to rules/distill.md retrieval behavior.When context changes (CTO leaves, scale increases 100x, refactoring window opens), distill can surface: "this was a directive — context may have changed. Here's what the evidence said at the time."