AI could learn when to defer to human pathologists to make cancer diagnosis safer and fairer
Artificial intelligence could make cancer diagnosis safer and fairer by learning when to defer to human pathologists without overloading them, according to researchers from the University of Surrey and Monash University. The approach tackles two critical problems that have limited the use of AI-assisted decision-making in cancer pathology, radiology and other fields where human expertise remains essential.
Current collaborative human-AI systems require every expert to review each case during training, an expensive and time-consuming process. Once deployed, they also tend to funnel cases to the most accurate experts, risking burnout and errors.
The research introduces a probabilistic method that allows AI systems to learn from incomplete expert input while distributing workload evenly across teams.
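In spirit, a learning-to-defer system adds a routing step on top of the classifier: for each case, it scores the option of letting the AI decide against the option of deferring to each available expert. The sketch below is a hypothetical illustration of that routing step (invented names and numbers, not the authors' implementation):

```python
import numpy as np

# Toy learning-to-defer routing (illustrative only, not the paper's model).
# A gating head scores K+1 options per case: option 0 is the AI classifier,
# options 1..K are human experts. Each case goes to the highest-scoring option.

def route_cases(gate_logits):
    """gate_logits: (n_cases, K+1) array of routing scores.
    Returns the chosen option index for each case."""
    return np.argmax(gate_logits, axis=1)

# Three cases, two experts (columns: [AI, expert 1, expert 2]).
gate_logits = np.array([
    [2.0, 0.5, 0.1],   # routine case: the AI handles it
    [0.2, 1.5, 0.3],   # ambiguous case: defer to expert 1
    [0.1, 0.4, 1.8],   # defer to expert 2
])
print(route_cases(gate_logits))  # [0 1 2]
```

During training, the scores themselves are learned, so that routine cases end up with high AI scores and difficult ones route to whichever expert is reliable for that kind of case.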
The research team tested their approach on colon cancer pathology images, where three professional pathologists classified tissue samples into normal, precancerous and cancerous categories. Even when 70% of expert annotations were missing during training, the system maintained high accuracy while ensuring no single pathologist was overwhelmed with cases.
"In cancer pathology and radiology, we know that overloading experts leads to mistakes. There is a documented case in which a radiologist made a misdiagnosis after interpreting 162 cases in one day, when the average is only 50. Our system prevents this by ensuring work is distributed fairly while maintaining high accuracy. The AI learns to handle routine cases independently and defer complex ones to humans, but crucially, it doesn't always defer to the same person," says Professor Gustavo Carneiro.
The challenge is particularly acute in cancer diagnosis, where distinguishing between benign, precancerous and malignant tissue requires expert judgment, but pathologists face growing caseloads. An AI system that can confidently handle straightforward cases while flagging complex ones for human review could reduce pressure on specialists without compromising diagnostic accuracy.
"Previous systems assumed you could get every expert to review every training sample, which simply is not realistic for large datasets or busy clinical teams. We have shown you can train effective Human-AI systems even when experts only review portions of the data. This makes the technology far more practical for real-world deployment in cancer pathology and other high-stakes medical fields," says Dr. Cuong Nguyen.
The system uses an algorithm that treats both the choice of which expert to consult and any missing expert opinions as variables that can be inferred during training. It also includes a mechanism to control how much work is assigned to each expert and to the AI classifier itself, allowing organizations to set workload limits during training rather than adjusting them afterward.
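One simple way to encode such a workload limit during training, sketched here with hypothetical code rather than the paper's exact algorithm, is to penalize routing decisions whose average distribution drifts from a target budget (say, "the AI handles 60% of cases, each of two experts 20%"):

```python
import numpy as np

# Illustrative workload-control penalty (not the paper's algorithm):
# KL divergence between a target workload budget and the batch-average
# routing distribution over [AI, expert 1, expert 2, ...].

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def workload_penalty(gate_logits, target):
    """Penalty is zero when the average routing matches the budget,
    and grows as the workload concentrates away from it."""
    avg = softmax(gate_logits).mean(axis=0)  # expected share of cases per option
    return float(np.sum(target * np.log(target / avg)))

target = np.array([0.6, 0.2, 0.2])  # desired workload split

# Nearly every case routed to the AI -> large penalty.
biased = workload_penalty(np.array([[3.0, 0.0, 0.0]] * 3), target)

# Routing that exactly matches the budget -> zero penalty.
matched = workload_penalty(np.log(np.tile(target, (3, 1))), target)

assert matched < 1e-9 < biased
```

Adding a term like this to the training loss lets an organization pick the budget up front, instead of rebalancing caseloads after the system is deployed.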
The research addresses growing concerns about AI deployment in health care, where purely automated systems may miss important details, but consulting humans for every decision is impractical and costly. The team also tested the approach on chest X-ray interpretation and bone disease imaging, demonstrating its versatility across different medical imaging tasks.
The research was presented at the International Conference on Learning Representations (ICLR 2025).
More information:
Probabilistic Learning to Defer: Handling Missing Expert's Annotations and Controlling Workload Distribution. openreview.net/pdf?id=zl0HLZOJC9
Provided by University of Surrey