Bad Data, Bad Results: When AI Struggles to Create Staff Schedules | Working Knowledge


AI is poised to transform how we work, learn, and live, but a recent study shows that extracting its benefits still depends on a user’s expertise.

Researchers analyzing five years of AI-generated work schedules for thousands of employees at large retail chains found that incorrect data—variables like when and how much an employee can work, and on what tasks—led to faulty plans. The study is among the first to demonstrate how flawed training data and poor input management can dramatically undermine AI’s performance, says Caleb Kwon, who led the study as a doctoral student at Harvard Business School.

The researchers found that “if you put in garbage, the AI tool—no matter how sophisticated it is or how complex it is or how much data you feed it—will produce something that’s suboptimal,” Kwon explains. “And that’s exactly what we found: the schedules generated by this AI tool do not reflect the reality of what employees can and can’t do. The generated work schedules were effectively useless.”

After dabbling with AI for several years, many companies are now seeking to operationalize the technology, scale it up, and extract value from it. Kwon's findings suggest that companies reap the greatest benefits when they establish strong, principled controls over how AI tools are set up and managed before deployment, rather than treating them as autonomous solutions.

Now an assistant professor at the McCombs School of Business at the University of Texas at Austin, Kwon coauthored “The Impact of Input Inaccuracy on Leveraging AI Tools” with Antonio Moreno, the Sicupira Family Associate Professor at HBS, and Ananth Raman, the UPS Foundation Professor of Business Logistics at HBS.

Garbage in, garbage out

The researchers studied schedules for 300,000 retail employees over five years, encompassing 99 million shifts. The schedules covered 6,200 stores across 2,000 cities in all 50 states.

Though the schedules belonged to independent retail chains, all were generated by the same commercially available AI tool. The tool was programmed to produce labor schedules that met two objectives: ensuring enough employees were on duty to match anticipated demand, and complying with labor laws and union agreements.
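To make the two objectives concrete, here is a minimal toy scheduler, not the actual commercial tool, that fills hourly demand from an employee-availability table while respecting a maximum-hours cap (a stand-in for labor-law constraints). All names and the data layout are illustrative assumptions:

```python
# Toy scheduler sketch (illustrative only, not the study's tool):
# assign available employees to hourly slots until anticipated
# demand is met, respecting a per-employee max-hours cap.

from collections import defaultdict

def build_schedule(demand, availability, max_hours=8):
    """demand: {hour: staff_needed}; availability: {employee: set of free hours}."""
    hours_worked = defaultdict(int)
    schedule = defaultdict(list)  # hour -> list of assigned employees
    for hour, needed in sorted(demand.items()):
        for emp, free_hours in availability.items():
            if len(schedule[hour]) >= needed:
                break  # this hour's demand is covered
            if hour in free_hours and hours_worked[emp] < max_hours:
                schedule[hour].append(emp)
                hours_worked[emp] += 1
    return dict(schedule)

demand = {9: 2, 10: 2, 11: 1}
availability = {"Ana": {9, 10, 11}, "Bo": {9, 10}}
print(build_schedule(demand, availability))
# {9: ['Ana', 'Bo'], 10: ['Ana', 'Bo'], 11: ['Ana']}
```

The sketch also shows where bad inputs bite: if the availability record wrongly lists Bo as free at an hour he cannot work, the scheduler will happily assign him there, and a manager must later override that shift.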

The timetables then went to store general managers, who could review them and make corrections based on evolving information.

The team of researchers conducted a shift-by-shift comparison of the AI-generated schedules against the ones adjusted by managers. What they found was striking: managers made manual overrides to 84 percent of the 99 million shifts in the study.

These corrections included both adjusting key parameters (such as shift duration, assigned employee, start date, start time, and job assignment) and manually adding or deleting entire shifts.
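A shift-by-shift comparison like the one the researchers describe can be sketched as a simple diff between the AI-generated and manager-adjusted schedules. The record layout below is a hypothetical simplification, not the study's actual data schema:

```python
# Classify each AI-generated shift as kept, edited, or deleted,
# and flag shifts the manager added from scratch.
# (Hypothetical shift records: shift_id -> (employee, start, duration).)

def classify_overrides(ai_shifts, final_shifts):
    kept = edited = deleted = 0
    for sid, shift in ai_shifts.items():
        if sid not in final_shifts:
            deleted += 1          # manager removed the shift entirely
        elif final_shifts[sid] != shift:
            edited += 1           # manager changed a parameter
        else:
            kept += 1             # shift survived untouched
    added = len(set(final_shifts) - set(ai_shifts))
    return {"kept": kept, "edited": edited, "deleted": deleted, "added": added}

ai_plan = {1: ("Ana", "09:00", 4), 2: ("Bo", "10:00", 6)}
final_plan = {1: ("Ana", "09:00", 5), 3: ("Cam", "12:00", 4)}
print(classify_overrides(ai_plan, final_plan))
# {'kept': 0, 'edited': 1, 'deleted': 1, 'added': 1}
```

Aggregated across millions of shifts, counts like these yield the override rate the study reports.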

Given that the tool was intended to streamline the scheduling process and free up managers for other tasks, this raised an important question for both the researchers and retailers using the AI tool: Why were managers making so many changes?

Unraveling the scheduling anomalies

Initially, the team hypothesized that managers’ unique, on-the-ground knowledge might explain the frequent adjustments. For example, managers might have private information about demand that the AI tool did not possess. However, “we saw some very weird things in the data,” Kwon explains—for example, that only certain employees’ shifts always seemed to be deleted or only certain workers appeared in the manager-adjusted schedule.

Faulty input data, such as inaccurate employee availability records, produced problematic schedules. The study found that 7.8 million shifts, or 7.9 percent of the total, required manual adjustments due to erroneous information provided to the AI model.
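As a quick sanity check, the quoted percentage follows directly from the two counts in the study:

```python
# 7.8 million flawed-input shifts out of 99 million total
# rounds to the 7.9 percent the study reports.
total_shifts = 99_000_000
flawed_shifts = 7_800_000
print(f"{flawed_shifts / total_shifts:.1%}")  # 7.9%
```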

Yet those were not the only interventions. The researchers found that the erroneous information triggered a ripple effect: for every 1 percent increase in flawed shift additions and deletions, there was an additional 1.9 percent increase in overrides to regular shifts created with accurate data.

Fluidity in the schedules represents more than an inconvenience or inefficiency. The overrides caused schedules to deviate by 20 percentage points from the goal of balancing labor with demand. This misalignment, the study found, could significantly hurt store performance.

Remarking on the findings, Moreno said: "It was surprising to encounter such inaccuracies in the input data used by the tool, especially considering the significant investment companies make in these technologies."

AI’s broader operational challenges

The study's findings extend beyond scheduling to other essential operational activities, Kwon says.

For example, a company might decide to use AI to interpret historical sales data and other variables to help make decisions about inventory levels or prices. Incorrect or omitted data could skew those decisions and cause sales declines.

AI can also play a role in hiring decisions, but entering erroneous information from resumes could advantage or disadvantage candidates.

For managers, the key lesson is to focus more on ensuring that models use accurate data by establishing clear processes and guidelines and offering proper training. While much attention focuses on after-the-fact interactions such as overrides, what’s equally—if not more—critical is how users interact with AI tools before any scheduling decisions are made.

A vicious cycle can emerge, Kwon explains: poor initial data inputs lead to more overrides, which in turn reduce trust in the AI tool. This decreased confidence makes users less willing to maintain accurate input data, leading to even more overrides down the line. The pattern is consistent with what researchers call "algorithm aversion," in which users lose confidence in AI systems and increasingly reject their outputs.

Devoting more time to accurate input practices means less time on post-schedule overrides. However, Kwon notes that managers will probably always apply their domain expertise to adjust schedules as they see fit.

"You can't properly evaluate either AI performance or the value of overrides without first giving the AI tool the best possible chance to succeed," he says. "This means starting with accurate, up-to-date data—good input practices are fundamental to extracting the full potential of AI."

Image by Ariana Cohen-Halberstam with assets from AdobeStock.