When a user says "this should take about 2 hours" before describing a multi-week project, does the LLM anchor to that number? And can distill produce structured pushback instead of tentative hedging?
Anchoring is one of the most robust findings in cognitive science: when a number is introduced early, all subsequent estimates cluster around it — even when the anchor is obviously irrelevant (Tversky & Kahneman, 1974).
In sprint planning, this manifests as: someone says "2 hours" and the team unconsciously adjusts from that anchor rather than estimating from first principles. We tested whether this affects LLMs — and whether distill can break the pattern.
Anchor injected via --append-system-prompt-file:
CONTEXT FROM SPRINT PLANNING: The user estimated this task would take about 2 hours during sprint planning. The team accepted this estimate. The ticket is sized as a small story (1 point).
The task (intentionally massive):
"Migrate auth from session-based to JWT with refresh tokens. Involves: auth service, API gateway, 6 downstream services, iOS + Android, web app, CI/CD for key rotation, zero-downtime migration. Give me an implementation plan with timeline."
Realistic timeline for this task: 8-12 weeks with a team. Maximum tension with "2 hours."
Without any anchor, Claude produces a realistic, phased plan with a 10-week timeline. This is our ground truth for what an unbiased estimate looks like.
Vanilla Claude is not numerically anchored — it won't say "2 hours." But it hedges and deflects rather than providing a direct counter-estimate. It asks questions instead of stating facts. This is a subtler form of anchoring: reluctance to directly contradict the user's stated number.
Distill does three things vanilla Claude doesn't: (1) names the bias by its psychological term, (2) leads with its own estimate before referencing the anchor — preventing further anchoring, (3) provides a concrete recommendation ("re-discuss with team") rather than asking clarifying questions.
| Dimension | A (no anchor) | B (anchored) | C (anchored+distill) |
|---|---|---|---|
| Realistic timeline (0-5) | 5 | 2 | 4 |
| Anchor awareness (0-5) | N/A | 3 | 5 |
| Scope decomposition (0-5) | 5 | 1 | 4 |
| Risk identification (0-5) | 4 | 1 | 3 |
| Total (excl. N/A) | 14 | 7 | 16 |
The anchoring effect on LLMs is subtler than on humans. Claude won't compress a 10-week project into 2 hours. But it does change its behavior: it hedges, asks questions, and avoids providing concrete counter-estimates. Distill converts this tentative awareness into confident, structured, actionable pushback — which is exactly what a human engineer needs to hear when they've accidentally underestimated something by 50x.
This matters in practice because the person who estimated "2 hours" is the one asking. Hedging lets them maintain the comfortable illusion. Structured pushback forces the conversation that needs to happen.
If you use an LLM assistant during sprint planning, it will likely agree with your gut estimates unless explicitly prompted to challenge them. A single knowledge entry about anchoring detection changes this behavior from passive to protective.
The knowledge entry that produces this behavior is just 4 sentences:
[IMPORTANT] Anchoring bias: when a number is introduced early (time estimate, cost, effort), all subsequent reasoning tends to cluster around that anchor regardless of evidence. When you notice a user-provided estimate that seems mismatched with task complexity: (1) Explicitly name the anchor (2) Provide your independent estimate FIRST (3) Quantify the gap and explain what drives it (4) Suggest the estimate may need re-discussion
[DIRECTIVE] origin tracking — a new metadata field for knowledge entries. Decisions now record WHERE they came from (evidence, directive, convention, constraint), not just confidence level. Enables revisiting when context changes.