Why does it look like LLMs consistently overestimate implementation time?

I have a suspicion: they estimate how long people would have taken to implement a feature, because that is the data they were trained on. I consistently see estimates of 2 weeks, 3 weeks, 5 days, etc. But then the implementation takes a day or 2 max using agents within Claude/GPT. Unless I am missing something? Anybody else notice this?

As a workaround, I have noticed that a better unit for estimates is the number of sessions required to fully implement, review, and test a given feature. You can ask the LLM to help you estimate how complex a feature will be during the planning phase. The number of tokens can easily be translated into sessions, assuming that you normally aim to keep a session below 60% of the context limit. As a rule of thumb, a simple feature can be fully implemented, tested, and reviewed in a single session, whereas a complex feature might take multiple sessions. Then, depending on your workload and attention span, you can get a decent idea of how many full-context sessions you can actually plan and review in a single day. Of course, some features can happen in parallel, but in general the bottleneck is your ability to understand the context that goes into each session.
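A rough sketch of that arithmetic in Python, in case it helps. Everything here is an assumption for illustration: the function names, the 200,000-token context limit, and the token counts in the example are made up, and the 60% figure is just the rule of thumb above, not something any tool enforces.

```python
def sessions_needed(estimated_tokens: int, context_limit: int,
                    max_fill_percent: int = 60) -> int:
    """Sessions needed if each session stays below max_fill_percent of the context.

    All parameters are hypothetical inputs you would supply yourself, e.g. a
    token estimate from the planning phase.
    """
    if estimated_tokens <= 0:
        return 0
    # Usable tokens per session (integer math avoids float rounding surprises).
    budget = context_limit * max_fill_percent // 100
    if budget <= 0:
        raise ValueError("context_limit and max_fill_percent must be positive")
    # Ceiling division: a partly filled session still counts as a session.
    return -(-estimated_tokens // budget)


def days_needed(sessions: int, sessions_per_day: int) -> int:
    """Days needed given how many sessions you can realistically plan and review per day."""
    if sessions_per_day <= 0:
        raise ValueError("sessions_per_day must be positive")
    return -(-sessions // sessions_per_day)


if __name__ == "__main__":
    # Assumed numbers: a 200k-token context and a feature estimated at 300k tokens.
    sessions = sessions_needed(300_000, 200_000)
    print(sessions, "sessions,", days_needed(sessions, sessions_per_day=2), "days")
```

The sessions_per_day value is the part only you can fill in, since it depends on how much context you can actually review in a day.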

If what you say is reality, I would keep that.

We have a tendency to give overly optimistic estimates, best case scenario, no other tasks, no roadblocks...

Whenever asked for an estimate, think how long it would take you to make it and multiply by 5.

>day or 2 max

I've frequently seen tasks that it thinks will take weeks being done in under an hour. And it will often recommend doing X instead of Y because Y requires so much extra work. Basically I just remind it that it is an LLM.

If it worries something is error prone, I ask it to write tools to verify it.

The problem is that LLMs do not have a conceptual grounding in actual time. They estimate based on statistical correlations found in their training data, which is filled with standard corporate project management timelines, legacy codebases, and waterfall estimates.

This is very common as they have no conception of time. They are just using ballpark figures based on what they see in the wild. I see estimates for modules in weeks/months when it can produce it in a single afternoon of prompting.

maybe they're rewarded for being under their estimated implementation time in their training. they could learn a similar behavior (to safely overestimate) in other contexts and that could've spilled over.

you can blame everything on weird quirks in the training (claude overusing the words "true" and "genuine" when they're not needed, AIs using em-dashes because the pretrain has a ton of them)

They tend to turn a small change into the whole cleanup plan. Sometimes that is useful, but it makes estimates too large.

You ask LLMs for estimates? Interesting.

Probably the big model providers should do calibrations for that and add an estimation skill.

I've found that they declare estimates unprompted.

yes exactly - I have never asked them for an estimate

they estimate based on how software is usually built in organizations, not how fast it can be built with modern AI tools and agents.

they estimate as if a human will build it

yeah this is my suspicion too
