Quantifying Uncertainty in Estimates

Forecasting when a software project is going to be done is difficult. Nobody disputes this. Software is complex. It's path dependent. Even the best software has components coupled in ways that are not easy to anticipate. And while we're often asked to complete tasks that are very similar to things we've done in the past, we are also frequently confronted with having to do things that are entirely new. In short, there is tremendous uncertainty when trying to estimate software completion dates.

Many in our industry shy away from making estimates for this reason. They know from experience that precision is impossible, but are asked to predict the future nonetheless. The most common estimation practices ignore this essential fact. In planning poker, the team is expected to produce a single point/time estimate for each task. A person wanting to know when a given user story will be addressed adds all the individual estimates ahead of it and divides by the team's velocity. What could be simpler?
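As a minimal sketch of that arithmetic, assuming a hypothetical backlog and a hypothetical velocity of 10 points per sprint (the function name and all numbers are made up for illustration):

```python
def sprints_until_reached(points_ahead, velocity):
    """Single-point forecast: points queued ahead of a story, divided by velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return sum(points_ahead) / velocity


# Hypothetical point estimates for the stories ahead of the one we care about.
points_ahead = [3, 5, 8, 2, 5]
velocity = 10  # hypothetical points completed per sprint

sprints = sprints_until_reached(points_ahead, velocity)
print(f"{sprints:.1f} sprints")  # prints "2.3 sprints"
```

The answer comes back as one tidy number, with nothing in it to indicate how shaky any of the five inputs were.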

That simple calculation obscures the uncertainty behind the individual estimates. It produces a result that suggests more precision than is warranted. This is exacerbated when estimates are treated as commitments, with punishment meted out to those who fail to deliver accordingly. The end result is a whole industry hesitant to make estimates at all. When forced to do it anyway, those estimates frequently get padded to the point where they're no longer useful.

Even though we know we should do estimates, when it's time to estimate a whole project, we often struggle with our tools. There's a better way.

Range Estimates

A Range Estimate captures and quantifies uncertainty in a fairly rigorous way. The most common type of range estimate is called a 50/90 estimate. In this case, each task in the project is given two estimates - an Aggressive But Possible (ABP) one and a Highly Probable (HP) one. Let's examine how it works by way of example.

Suppose you were trying to estimate how long it was going to take to fill your grocery cart with all of the ingredients for a new recipe you want to try. To make it interesting, we'll assume that the recipe involves you buying two ingredients you get frequently - vegetable oil and sliced mushrooms - and one ingredient you have never purchased - tamarind soup base.

To finish as quickly as possible, you make a plan that has you starting at the front of the store by the cash registers, only going down the aisles containing the ingredients on the shopping list, and returning to the cash registers. You'll get done fastest if you don't have to retrace your steps, so you order your shopping list accordingly. First up are the mushrooms, which are in the produce section right in front. Next is the tamarind soup base. Then comes the vegetable oil.

Now it's time to estimate the total shopping time by predicting how long it will take to get each ingredient. As mentioned previously, each one gets two estimates of varying degrees of confidence.

The first is the Aggressive But Possible (ABP) estimate. It is one that is as likely to be wrong on the low side as it is on the high side. That is, you are 50% confident that you'll complete the task within the estimated time. The second estimate, called the Highly Probable (HP) estimate, is one with much more confidence. If you're used to padding your estimates to boost confidence, you're used to making HP estimates. You want to be 90% confident that you'll complete a given task within the estimated time when making the HP estimate.

The Range Estimation technique understands that the truth frequently lies in the middle. If you were to add all of the ABP task estimates, you'd arrive at a number that is almost certainly too low for the collection. But if you sum all the HP estimates and use that, you are going to be too high.

The spread between ABP and HP is the measure of uncertainty. For something you've done before, that spread would probably be pretty small. For a task whose requirements are still unclear or which requires a brand new technology or component, the degree of uncertainty, the spread between ABP and HP, will be much larger.

Returning to our example, the first ingredient to buy is sliced mushrooms. You buy them all the time and know exactly where they are in the store. Your Aggressive But Possible estimate is 2 minutes and your Highly Probable estimate is 3 minutes. There's not much uncertainty here in part because you know right where they are. But you're also starting from a known point.

Next up is the tamarind soup base. You've never purchased that before, but you're guessing that it's with the other soups, which are near the front. If you're right, it will take you 3 minutes after getting the mushrooms to find and get the soup. But if you're wrong, you have to go searching or asking for help. Your path going from mushrooms to soup might look like the diagram below. To account for this uncertainty, you set your Highly Probable estimate to 10 minutes.

The final ingredient is the vegetable oil. In this case your ABP estimate is 3 minutes and your HP estimate is 6 minutes. Even though you know where it is in the store, you're a bit unsure where you'll be starting from. Time spent shopping, like software development, is path dependent.
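The three pairs of estimates can be written down as data, with the spread derived from each pair. This is only a sketch: the RangeEstimate class and its field names are hypothetical, chosen for this example, while the numbers are the ones given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeEstimate:
    """One task with its two estimates, in minutes (hypothetical helper type)."""
    name: str
    abp: float  # Aggressive But Possible: about 50% confidence
    hp: float   # Highly Probable: about 90% confidence

    def __post_init__(self):
        if self.hp < self.abp:
            raise ValueError("HP estimate should not be below the ABP estimate")

    @property
    def spread(self) -> float:
        """Uncertainty interval: the distance from ABP to HP."""
        return self.hp - self.abp


shopping_list = [
    RangeEstimate("sliced mushrooms", abp=2, hp=3),
    RangeEstimate("tamarind soup base", abp=3, hp=10),
    RangeEstimate("vegetable oil", abp=3, hp=6),
]

for item in shopping_list:
    print(f"{item.name}: ABP {item.abp} min, HP {item.hp} min, spread {item.spread} min")
```

The spreads come out to 1, 7, and 3 minutes, which matches the intuition that the unfamiliar ingredient carries most of the uncertainty.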

If you assume that completion times for a given task follow a distribution that is approximately normal, then the ABP estimate represents the mean of the distribution. The HP estimate falls approximately two standard deviations above that.

Calculating the Estimate

Estimating the time for an entire project turns out to be a relatively straightforward, solved statistical problem. We add a buffer term to the sum of all the ABPs. The buffer is calculated as the square root of the sum of the squared uncertainty intervals, where each task's uncertainty interval is its HP estimate minus its ABP estimate. The result is an overall estimate that we can be roughly 90% confident the project will finish within.
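One way to see where the root-sum-square buffer could come from is sketched below. It rests on assumptions that go a little beyond the text: tasks are treated as independent, each task's completion time t_i is normal with mean ABP_i, and HP_i sits two standard deviations above that mean. The symbols sigma_i, t_i, and T are introduced here only for illustration.

```latex
\sigma_i = \frac{\mathrm{HP}_i - \mathrm{ABP}_i}{2},
\qquad
T = \sum_i t_i \sim \mathcal{N}\!\left(\sum_i \mathrm{ABP}_i,\; \sum_i \sigma_i^2\right)

\sum_i \mathrm{ABP}_i + 2\,\sigma_T
  = \sum_i \mathrm{ABP}_i + 2\sqrt{\sum_i \left(\frac{\mathrm{HP}_i - \mathrm{ABP}_i}{2}\right)^{2}}
  = \sum_i \mathrm{ABP}_i + \sqrt{\sum_i \left(\mathrm{HP}_i - \mathrm{ABP}_i\right)^{2}}
```

Under those assumptions the variances of the tasks add, so the project-level estimate sits two combined standard deviations above the sum of the ABPs, just as each HP sits two standard deviations above its own ABP.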

For our example, the sum of the ABPs is:

2 + 3 + 3 = 8 minutes

The buffer term is:

sqrt( (3-2)^2 + (10-3)^2 + (6-3)^2 ) = sqrt(1 + 49 + 9) = sqrt(59) = 7.7 minutes (rounded)

Adding them together yields an estimate of 15.7 minutes of shopping time.
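The same calculation can be sketched in a few lines of Python. The function name range_estimate and the (ABP, HP) tuple layout are choices made for this example, not part of any particular tool.

```python
import math


def range_estimate(tasks):
    """Combine (abp, hp) pairs into (sum of ABPs, buffer, overall estimate)."""
    abp_total = sum(abp for abp, _ in tasks)
    # Buffer: square root of the sum of squared (HP - ABP) intervals.
    buffer = math.sqrt(sum((hp - abp) ** 2 for abp, hp in tasks))
    return abp_total, buffer, abp_total + buffer


# (ABP, HP) in minutes: mushrooms, tamarind soup base, vegetable oil.
tasks = [(2, 3), (3, 10), (3, 6)]

abp_total, buffer, estimate = range_estimate(tasks)
print(f"Sum of ABPs: {abp_total} min")   # 8 min
print(f"Buffer: {buffer:.1f} min")       # 7.7 min
print(f"Estimate: {estimate:.1f} min")   # 15.7 min
```

For comparison, simply summing the three HP estimates would give 3 + 10 + 6 = 19 minutes, so the combined estimate lands between the sum of the ABPs and the sum of the HPs.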

Extending the Technique

The 50/90 estimate as normally practiced produces a single estimate - 15.7 minutes in our example. Even though the value is calculated in a way to quantify uncertainty, communicating the result as a single number cuts against the idea that our estimates still carry uncertainty.

For this reason, we've extended the technique by calculating optimistic and pessimistic values that reflect the uncertainty in the underlying estimates. In our toy example, the spread from optimistic to pessimistic time spent shopping would be between 11.9 and 19.5 minutes.
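How the optimistic and pessimistic values are derived isn't spelled out here. One reading that lands within rounding of the quoted figures is to take the estimate plus or minus half the buffer, which corresponds to one combined standard deviation if HP is taken to sit two standard deviations above ABP. The sketch below is built on that assumption, and the name estimate_band is hypothetical.

```python
import math


def estimate_band(tasks):
    """Return (optimistic, estimate, pessimistic) for a list of (abp, hp) pairs."""
    abp_total = sum(abp for abp, _ in tasks)
    buffer = math.sqrt(sum((hp - abp) ** 2 for abp, hp in tasks))
    estimate = abp_total + buffer
    # ASSUMPTION: the band's half-width is half the buffer. This is a guess
    # that roughly reproduces the quoted range, not a documented rule.
    half_width = buffer / 2
    return estimate - half_width, estimate, estimate + half_width


low, mid, high = estimate_band([(2, 3), (3, 10), (3, 6)])
print(f"{low:.1f} / {mid:.1f} / {high:.1f} minutes")  # 11.8 / 15.7 / 19.5 minutes
```

Carried at full precision this gives about 11.8 to 19.5 minutes. Rounding the intermediate values first (15.7 plus or minus 3.8) gives 11.9 to 19.5.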

Tooling

We do range estimates frequently. When we're making a proposal to a client, we give them an estimate which allows them to decide if a project is worth doing and if we're the appropriate partner. Once we're in the middle of a project, we are often asked for an updated forecast of when it will be done. In these cases, we use range estimates to arrive at the estimate.

It's unfortunate that more agile software estimation tools don't provide support for range estimates. There would be a lot less anxiety over having to make forecasts if the process were understood to account for the uncertainty we all feel. Because of the lack of tooling, we've built our own range estimation tool. It takes the form of a spreadsheet that allows a team to work collaboratively or in a distributed fashion to make estimates on all the tasks and stories associated with a project. You are able to plug in additional degrees of uncertainty such as the percentage of the project not yet specified - the unknown unknowns of requirements.

We're sharing that tool today in Google Sheets, Microsoft Excel, and Numbers formats. We hope you like it and would welcome your feedback and stories on how it's helped you to make more realistic project estimates.

Conclusion

There's no avoiding the fact that software estimation is difficult. The Range Estimation technique is no panacea and it's not appropriate in all circumstances. But when you're looking to make whole-project estimates that both capture and communicate the amount of uncertainty in the project, having this tool in your toolbox can improve both the transparency and accuracy of the result.