Story Point Estimation in ML/AI Development?

Natalie See
4 min read · Nov 18, 2023

Can we use story points in ML/AI development?

Using story points as an estimation method is a controversial subject, one that has been debated to death by engineers, scrum masters, and product managers. There are many different perspectives on using story points to estimate in Scrum software development practices.

I think more often than not, software engineers don’t like using story points to estimate (purely my opinion, based on some observations). Here are some common horror stories I’ve heard:

  • Planning poker sessions are too time-consuming, especially when everyone on the scrum team points every user story and there are tons of user stories to go through
  • Management uses story points to measure the performance of teams or individuals
  • Engineers don’t know how to story-point a user story even though they know exactly what they need to accomplish (so technically it’s a task disguised as a user story)

I know some tech companies have moved away from story points as an estimation method entirely. That would be a different conversation/debate, since engineering KPIs could be measured differently (e.g. deployment frequency, number of pull requests, etc.), and product management would need other ways to measure productivity against the features being built. Again, some tech companies have moved away from that way of thinking and adopted different KPIs.

If agile practitioners and engineers are already divided on using story points to measure productivity and feature delivery, what happens with the boom of ML/AI offerings and the growing number of ML/AI engineers? We all recognize that ML/AI engineers operate slightly differently from typical software engineers, so does story point estimation still work in this case?

In some initial conversations with a very small sample of AI engineers, the opinion was that story point estimation doesn’t work for ML scrum teams (for the rest of the article, I will shorthand these unicorns as ML instead of ML/AI). The argument was that ML work is heavily research- and experiment-based, so it’s very hard to capture in a user story format (I think it could be a task instead of a user story, but for simplicity’s sake I’ll call it a ticket for now), let alone estimate the ticket.

Developing an ML model is a little different from software engineering. ML model development relies heavily on the availability of data, which is not the case in typical software engineering. Data acquisition is a huge aspect of ML model development; a model is only as good as its data (hello — remember ChatGPT scraping the internet? Or the story of Microsoft’s Twitter bot that became toxic the minute it went online? Though of course, good modeling techniques help too). And data acquisition is not trivial: what if the data is not readily available, and you have to somehow create or acquire it? Or if the data is impossible to obtain, and the ticket you originally set out to do is no longer achievable?

Hence ML development has quite a lot of unknowns, and one huge aspect of it is operationalizing the ML model. Developing a model and operationalizing it are quite different problems. For example: how do we deploy the model — can we serve it as an API? If it requires a huge amount of computing power, can we still operationalize it? How do we update the model, and how (if at all) do we get updated data to retrain it?
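To make those questions concrete, here is a minimal, purely illustrative sketch in Python. Everything in it is a hypothetical stand-in: `MeanModel` pretends to be a trained model (it just predicts the mean of its training data), `retrain` shows why version numbers matter when serving, and `pickle` stands in for a real model registry or artifact store.

```python
import pickle
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MeanModel:
    """Hypothetical stand-in for a real ML model: predicts the mean
    of whatever data it was trained on."""
    version: int = 1
    training_data: list = field(default_factory=list)
    prediction: float = 0.0

    def fit(self, data):
        self.training_data = list(data)
        self.prediction = mean(self.training_data)
        return self

    def predict(self):
        return self.prediction


def retrain(model, new_data):
    """Retrain on old + new data and bump the version, so the serving
    layer can tell which model answered which request."""
    updated = MeanModel(version=model.version + 1)
    updated.fit(model.training_data + list(new_data))
    return updated


# "Deploying as an API" often reduces to: serialize the model, load it
# inside a request handler, and call predict(). pickle plays the role
# of the model registry here.
model = MeanModel().fit([1.0, 2.0, 3.0])
served = pickle.loads(pickle.dumps(model))   # what the API process loads
served = retrain(served, [4.0, 5.0])         # new data arrived; version bumps
```

Each question in the paragraph above maps to a line of this toy: serving is the serialize/deserialize step, retraining is `retrain`, and the version field is what lets you answer "which model is live right now?"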

So what does an ML development workflow look like? I think there are two proposals:

  • Using scientific research methods or
  • Tweak the Scrum process and use Spike tickets to capture/account for the unknowns and experiments (but do this at your peril)

In conclusion, I think it’s doable… but we have to rethink using agile methodology in ML development. One of the Agile Manifesto principles is to satisfy the customer through early and continuous delivery of valuable software (i.e. faster speed to market, deploy fast, break things fast) — do we really want to operationalize an ML model at breakneck speed and worry about the consequences afterward?

There are still so many unknowns about using ML models, especially without much governance, robust testing, circuit breakers, etc. Some might worry more about generative models (like the ones behind ChatGPT), but as Professor Scott Galloway pointed out, the consequences could be a lot more insidious than the Terminator eliminating the human race. Either way, AI/ML technology is here to stay, and it’s a game changer.

Of course, there is only one way to answer my original question:

ChatGPT Response for my Question

I did ask if OpenAI uses story points:

Another ChatGPT response to my question

Typical ChatGPT response.


Natalie See

A 👩‍💻 who loves ☕; trying to understand the world of technologies through her own quirky brain.