BY KIM BELLARD
For better and for worse, our healthcare system is built around physicians. For the most part, they’re the ones we rely on for diagnoses, for prescribing medications, and for delivering care. And, often, simply for being a comfort.
Unfortunately, in 2023, they’re still “only” human, and they’re not perfect. Despite best intentions, they sometimes miss things, make mistakes, or order ineffective or outdated care. The scale of these mistakes is not clear; one recent study estimated that 800,000 Americans suffer permanent disability or death annually. Whatever the real number, we’d all agree it is too high.
Many, myself included, have high hopes that appropriate use of artificial intelligence (AI) might be able to help with this problem. Two new studies offer some considerations for what it might take.
The first study, from a team of researchers led by Damon Centola, a professor at the Annenberg School for Communication at the University of Pennsylvania, looked at the impact of “structured information–sharing networks among clinicians.” In other words, getting feedback from colleagues (which, of course, was once the premise behind group practices).
Long story short, they work, reducing diagnostic errors and improving treatment recommendations.
Clinicians were given a case study and asked for their diagnosis and treatment recommendations. Those not in the control group got to see the diagnostic decisions of their peers (on an anonymous basis). They were, on average, twice as accurate as those making the decisions on their own.
Study co-author Elaine Khoong of UCSF says, “We are increasingly recognizing that clinical decision-making should be viewed as a team effort that includes multiple clinicians and the patient as well.” The researchers made sure that the structured network included clinicians of various ages, specialties, expertise, and geographical locations, trying to ensure that it was not simply a top-down, hierarchical network.
Professor Centola believes: “egalitarian online networks increase the diversity of voices influencing clinical decisions. As a result, we found that decision-making improves across the board for a wide variety of specialties.” Best of all, he notes:
“The big risk with these information-sharing networks is that while some doctors may improve, there could be an averaging effect that would lead better doctors to make worse decisions. But, that’s not what happens. Instead of regressing to the mean, there is consistent improvement: The worst clinicians get better, while the best do not get worse.”
The researchers think this approach could be easily adopted, building on existing e-consult technologies: “We anticipate, for instance, that instead of sending clinical cases to a single specialist, clinicians may instead submit cases to a network of specialists who participate in a structured information exchange process before providing a recommendation to the referring clinician.” Professor Centola points out that, while the networks need to be structured thoughtfully, they don’t have to be huge; in fact, a network of 40 is ideal. “The increasing returns above that – going, say, from 40 to 4,000 – are minimal,” he says.
It’s worth pointing out that the anonymous clinicians in the structured networks were, in this case, human; an interesting follow-up would be to see what happens when some or even all of the recommendations come from AI.
Which leads to the second study, from a team of researchers from MIT and Harvard, which looked at what happens when radiologists get assistance from AI. Long story short: not much.
As study co-author Professor Pranav Rajpurkar said in a lengthy Twitter thread: “Why? Radiologists implicitly discount AI predictions, favoring their own judgment – a bias we call ‘automation neglect.’”
The “automation neglect” comes from radiologists discounting the AI probabilities by around 30% relative to their own assessments. The radiologists also tended to view their recommendations and the AI predictions as independent, when, in fact, they are based on the same data.
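To make that discount concrete, here is a minimal sketch in Python. It is not the paper's model: I am assuming a simple log-odds pooling rule, reading "discounting by around 30%" as a weight of 0.7 on the AI's evidence, and using made-up probabilities. Note that the pooling rule itself treats the two assessments as independent, which is exactly the simplification the researchers say radiologists wrongly make, so this only illustrates the direction of the effect.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))


def combine(own_prob: float, ai_prob: float, ai_weight: float = 1.0) -> float:
    """Pool two assessments in log-odds space (hypothetical helper).

    Assumes a 50/50 prior and independent signals; ai_weight < 1
    down-weights the AI's evidence, mimicking "automation neglect".
    """
    return sigmoid(logit(own_prob) + ai_weight * logit(ai_prob))


# Illustrative numbers only: the radiologist leans "no finding" (40%),
# while the AI is fairly confident there is one (90%).
full_weight = combine(0.40, 0.90, ai_weight=1.0)
discounted = combine(0.40, 0.90, ai_weight=0.7)  # assumed 30% discount

print(f"AI taken at face value: {full_weight:.2f}")
print(f"AI discounted by 30%:   {discounted:.2f}")
```

Under these assumptions, the discounted estimate lands closer to the radiologist's own starting view than the full-weight one does, which is the pattern the term "automation neglect" describes.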
The paper found: “We find that AI assistance does not improve human’s diagnostic quality even though the AI predictions are more accurate than almost two-thirds of the participants in our experiment.” To make things worse, “radiologists are slower when provided with AI assistance.”
Slower but not more accurate is not a winning combination, and definitely not what we might have expected.
Complicating things, the results were heterogeneous: “AI assistance improves performance for a large set of cases, but also decreases performances in many instances.” The more “confident” the AI prediction was, the more it helped improve quality. But when the AI was less confident, radiologists’ performance suffered.
The researchers are forced to conclude: “Our results demonstrate that, unless the documented mistakes can be corrected, the optimal solution involves assigning cases either to humans or to AI, but rarely to a human assisted by AI.” Professor Rajpurkar notes: “While AI holds promise, thoughtfully accounting for how humans actually use AI is critical. Our work provides concrete evidence on biases and costs that should inform system design.”
An open question the researchers posit is “whether the benefits from AI-specific training for radiologists and/or experience with AI are large.” I.e., can humans learn to work better with AI?
Given the results of the first study, I’d have been interested to see what would have happened if the second study had also tested getting recommendations not from AI but from a structured network of human physicians. Do radiologists discount only AI recommendations, or do they distrust external recommendations generally?
At the risk of giving it short shrift, a third study, from Fabrizio Dell’Acqua at the Harvard Business School, suggests that when AI is too good, humans tend to “fall asleep at the wheel,” leading him to conclude: “maximizing human/AI performance may require lower quality AI, depending on the effort, learning, and skillset of the humans involved.” There is a lot about human/AI interaction we do not yet understand.
We’ve long looked at medicine as an “art,” allowing and even encouraging individual physicians to use their best judgement. That has led to well-documented variability of care and outcomes, much of which is not in patients’ best interests. There’s too much for physicians to know, there are too many extraneous factors influencing their decisions, and, at this point, there’s way too much money at stake. They need help.
In 2023, clinical decision-making should be, as Professor Khoong noted, a team effort. We have the ability now for that team to be human “egalitarian online networks,” as Professor Centola and his colleagues urge, and we increasingly will have the ability for such networks to include, or to be replaced by, AI. One way or the other, we need to “thoughtfully account” for how and when physicians use them.
Kim is a former emarketing exec at a major Blues plan, editor of the late & lamented Tincture.io, and now regular THCB contributor.