Bring ML from Research to Production: the 5 Stages
The five stages are: Explore, Struggle, Despair, Parity, and Change.
It helps to know which stage you are in. If you don’t, it doesn’t matter :) In retrospect, not thinking about any of this helped me enjoy the research-to-product journey without getting ahead of myself; sometimes, ignorance is bliss.
As a product manager working in both product and applied research, I am often asked “how long does it take to bring ML from research into production?” Everyone is tired of the “It depends” answer, so I want to give a direct answer for the most challenging kind of transition: ML research that could change your tech stack and product approach for good.
At top tech companies, such a transition takes about 1.5 to two years.
Note that we are not talking about any of the following:
- a small consultation project might take a few meetings, a few weeks, or two to three months. Such projects are usually initiated from Product to Research.
- a joint research conference paper might take a quarter or two. Such projects are usually initiated from Research with some participation from Product to test the idea with real data.
- an investigation for proof-of-concept or demo purposes might take a hackathon, a sprint, or a few months to a quarter.
So, how about a true transition of ML research into production done internally, meaning that the Research and Product teams belong to the same company and no external acquisition or legal matter is involved?
It takes 1.5 to two years.
Stage 1 — Explore: Research comes up with an idea that shows some promising initial results. It doesn’t need to be a new idea; it can be something from the industry or the literature. The in-house research team vets the idea against the company’s use cases and validates its merit and compatibility. They present their findings to the Product team or infrastructure team for buy-in and support.
- You will often hear “It worked! We should get it in our product immediately.”
- As a PM, you can help the team in the following ways: prepare the PRD, find supporters, identify infrastructure dependencies or data blockers up front, and negotiate for resources.
Stage 2 — Struggle: The next few months are spent negotiating resources and support between Product and Research. Who should build the data pipeline and set up the A/B test? Who gets to do the ‘interesting’ part of the job? Sometimes, the project can’t seem to take off because of data and infrastructure blockers. As a PM, you wonder why your team keeps waiting for something or someone, and why nothing seems to happen. Unless you have strong leadership buy-in, lots of projects die at this stage for lack of funding.
- You will often hear “We are under-resourced and another project is taking higher priority. We will see what we can do.”
- As a PM, you can help the team in the following ways: review the product and modeling hypotheses; review learnings and pivot or refine your hypotheses; chat with TPMs or Eng to find potential resources; find evidence to continue or stop.
Stage 3 — Despair: The results finally arrive, but surprisingly (not to me anymore), we cannot even reach parity with the current production setup. Sometimes the new technique shows a slight gain in your A/B test, but not enough to justify the investment in new infrastructure and talent. Without a research or engineering breakthrough, or a product re-direction, projects will die at this stage. Most of the time, you need to recruit additional research help or secure new infrastructure investment.
- I heard “What??!! Logistic Regression is better than our deep nets. Wait, our net can’t beat GBDT?”
- As a PM, you can help the team in the following ways: educate your Product counterpart about expectations, understand the modeling challenges and data assumptions, dissect the data and results to validate or invalidate your hypotheses, shield the team from pressure, and give your research team as much space and time as possible to iterate quickly (data volume, architecture, hyperparameters). Leave no stone unturned.
Stage 4 — Parity: We finally reach parity, and now we need to argue for the ship decision. It is critical to find supporters on the Product team who believe in the potential to iterate from this baseline. Usually, parity is a success milestone, and unless there is an unfavorable situation (a new capability roll-out or a change of business direction), you should be proud to get to this stage.
- It makes your day when your researcher or engineer pings you via chat with ‘look at this <A/B URL or screenshot>’ and you don’t see fifteen shades of red across critical metrics.
- As a PM, your role is almost done. Talk to your Product counterpart to understand their go/no-go decisions, negotiate productionalization resources and metrics, walk through launch readiness, and prepare a joint launch announcement.
Stage 5 — Change: This is not really a final stage; it should happen alongside Stages 1 to 4 as your Product team gets educated about the new technology. The company realizes it has a talent gap, a process gap, and an overall organization gap. No one knows how to maintain this new neural net. The Product team is not structured to support the technical strategy (they may be organized by business use cases or rules, while your deep net solves all the use cases in one model). Hiring needs to happen, but is on hold because “we are waiting to hire a director of ML.” Research needs to stay connected to and supportive of the new org. It might take a while to hire the right talent, restructure the organization around the new technical strategy, and get the Product team to the point where it can confidently maintain, retrain, and improve the baseline.
- I heard ‘It took us 1.5 years, close to 2 years. Can you believe it?’ We had forgotten when we started until we put the presentation together. It felt like yesterday.
What can we do to shorten the journey? Leadership and funding help. Stable business direction helps. More preparation helps. Better infrastructure and tooling help. Better data helps. More collaboration instead of contract drafting sure helps. Faster hiring of talent for the organizational transition helps.
I can’t pinpoint a single thing that would have helped us get there faster. I can only pinpoint a few things that kept us going until the project was complete: the resilience of both the researchers and the Product team, and the unyielding support from leadership.