The competition for AI talent is fierce: job postings for machine learning and AI engineers rose 70 to 80 percent in early 2024 compared with 2023. Companies are offering new recruits substantial compensation and big budgets to poke around their internal operations, interact with employees across the business, find problems or inefficiencies, and then build AI-based solutions to address them.
“But in-house developments don’t seem to be working well: Even as companies invest a lot of money, a lot of projects are failing or not delivering their promised value,” says Arvind Karunakaran, an assistant professor of engineering at Stanford and a faculty affiliate at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). “Something is going on in these very early stages of interaction between developers and other employees across the business that’s leading to these shortcomings.”
Between October 2019 and December 2023, Karunakaran and two colleagues, HAI postdoctoral fellow Luca Vendraminelli and Stanford doctoral student Devesh Narayanan, embedded themselves with software developers at a multinational fashion company and followed the ups and downs of various AI-based projects. From those four years of observation, they distilled three key variables, detailed in this working paper, that shape how developers interact with other employees across the business and, in turn, whether they craft successful products.
Determining clarity, centrality, and homogeneity
Importantly, AI developers don’t have formal authority over employees in other divisions of a company. If they want to build an AI tool related to procurement, for instance, they need the cooperation of people who work there in order to learn about the job and what challenges AI could solve.
The researchers watched this process of expertise elicitation unfold many times and ultimately focused on two projects that epitomized how and why projects succeed or fail. Project 1, which succeeded, was an AI tool to improve how products are distributed across the supply chain. Project 2, which failed, was an AI tool to optimize “retail productivity,” defined as the ratio of sales to payroll costs at retail stores.
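To make the second project’s target concrete, here is a minimal sketch of the retail productivity metric as the article defines it; the function name and dollar figures are hypothetical, not from the study.

```python
# Minimal sketch of the "retail productivity" metric from project 2:
# the ratio of a store's sales to its payroll costs. The example
# figures are made up for illustration.

def retail_productivity(sales: float, payroll_costs: float) -> float:
    """Return dollars of sales generated per dollar of payroll."""
    return sales / payroll_costs

# A store with $120,000 in monthly sales and $30,000 in payroll costs
# has a productivity of 4.0.
print(retail_productivity(120_000, 30_000))  # 4.0
```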
Notably, the team members were the same in both cases, and the successful project 1 unfolded before project 2, so the contrast cannot be explained by differences between teams or by a team simply improving with experience.
These were the three defining features differentiating project 1 from project 2:
- Jurisdictional clarity: In project 1, a well-defined group of roughly ten allocation specialists, all reporting to the same boss, had clear jurisdiction over the relevant decisions. The software developers knew whom to speak with and could readily reach them. In contrast, project 2 involved nearly 200 store managers who reported to an assortment of finance and district managers, leaving no obvious group from which to elicit expertise.
- Task centrality: In project 1, the efficient allocation of products to retail stores was viewed as a central responsibility by the allocation specialists, so they were willing to invest time in the development of an AI tool; it would clearly help their day-to-day work. For the store managers in project 2, the developers’ focus on retail productivity was often considered peripheral to the effective management of a store, so they had little stake in the tool’s development.
- Homogeneity of task enactment: In project 1, the task of allocation was essentially the same for everyone. In project 2, however, each manager was in charge of a unique property with distinct consumer and employee demands, which made developing a one-size-fits-all AI tool nearly impossible.
The interaction of these three variables determined how well developers were able to gather information and, ultimately, design a successful tool.
Increasing the chance of success
The implications for managers are threefold, Karunakaran says. “First, they need to mandate that AI is important. If they hire a bunch of developers and task them with talking to people but don’t empower them with anything other than money, then they’re setting the developers up for failure,” he says. Managers need to set clear guidelines, even on minor concerns: Domain experts must respond to emails from AI developers within a given window, for instance. “Developers should go into the field with the right sponsorship and ensured access to experts.”
Second, and relatedly, companies may want to create a new go-between position: a person who facilitates the movement of information between domain experts throughout the company and the developers working on new AI tools.
Finally, if a project is spinning its wheels, managers should step in to help refocus it or scale down its scope. Project 2, for example, encompassed a complex web of individuals and stores whose needs no single AI tool could meet. A more successful approach might have been to step back and, for instance, target the 10 or 20 worst-performing stores.
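To picture that narrower scope, one could rank stores by the same sales-to-payroll ratio and pilot the tool on the bottom few. The sketch below assumes made-up store records; the data shape and numbers are illustrative only.

```python
# Sketch of the scaled-down approach suggested above: instead of one
# tool spanning ~200 heterogeneous stores, pilot on the n worst
# performers by the sales-to-payroll ratio. All data here is invented.

def worst_performing_stores(stores: list[dict], n: int = 20) -> list[dict]:
    """Rank stores by retail productivity (ascending) and keep the bottom n."""
    return sorted(stores, key=lambda s: s["sales"] / s["payroll"])[:n]

stores = [
    {"id": "store_001", "sales": 120_000, "payroll": 30_000},  # ratio 4.0
    {"id": "store_002", "sales": 90_000, "payroll": 45_000},   # ratio 2.0
    {"id": "store_003", "sales": 60_000, "payroll": 40_000},   # ratio 1.5
]

for store in worst_performing_stores(stores, n=2):
    print(store["id"])  # store_003, then store_002
```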
The researchers note that developers, too, can be proactive in how they approach new projects. As they begin to meet with domain experts and gather information, they ought to suss out the nature of the project at hand: Is the jurisdiction clear, and is the task central to the people they’re talking with? Is the work performed similarly each time and by everyone involved? If so, the information they need should be relatively easy to acquire; if not, a recalibration may be in order (a toy version of this check appears at the end of this piece).
“The second instance would suggest you’re in more of a project-2 landscape and things might derail quite quickly,” Vendraminelli says. “Beyond coding, sometimes developers need to work a bit like sociologists, to get an understanding of what they might need to build effective AI tools.”
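As a closing illustration, the three questions above can be collapsed into a rough go/no-go check. The function below is a toy paraphrase of the paper’s three variables, not an instrument from the study.

```python
# Toy heuristic built from the three variables described above. A project
# that clears all three checks looks like project 1; missing any of them
# risks the "project-2 landscape." Illustrative only, not a validated test.

def project_landscape(jurisdiction_clear: bool,
                      task_central: bool,
                      task_homogeneous: bool) -> str:
    if jurisdiction_clear and task_central and task_homogeneous:
        return "project-1 landscape: expertise should be easy to elicit"
    return "project-2 landscape: consider refocusing or scaling down"

# Project 1: ten specialists, one boss, one shared task done the same way.
print(project_landscape(True, True, True))
# Project 2: ~200 managers, a peripheral metric, store-specific work.
print(project_landscape(False, False, False))
```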