A review of studies published in JAMA Network Open found few randomized clinical trials (RCTs) of medical machine learning algorithms, and researchers noted quality issues in many of the published trials they analyzed.
The review included 41 RCTs of machine learning interventions. It found that 39% were published in the past year and that more than half were conducted at a single site. Fifteen trials took place in the U.S. and 13 in China, while six were conducted in multiple countries.
Only 11 trials collected race and ethnicity data. Of those, a median of 21% of participants belonged to underrepresented minority groups.
None of the trials fully adhered to the Consolidated Standards of Reporting Trials – Artificial Intelligence (CONSORT-AI), a set of reporting guidelines developed for clinical trials of medical interventions that include an AI component. Thirteen trials met at least eight of the 11 CONSORT-AI criteria.
Researchers noted several common reasons trials fell short of these standards, including not assessing how poor-quality or unavailable input data were handled, not analyzing performance errors, and not stating whether the code or algorithm was available.
Using the Cochrane Risk of Bias tool, which assesses potential bias in RCTs, the study also found that overall risk of bias was high in seven of the clinical trials.
“This systematic review found that despite the large number of medical machine learning-based algorithms in development, few RCTs for these technologies have been conducted. Among published RCTs, there was high variability in adherence to reporting standards and risk of bias and a lack of participants from underrepresented minority groups. These findings merit attention and should be considered in future RCT design and reporting,” the study’s authors wrote.
WHY IT MATTERS
The researchers acknowledged some limitations to their review. They looked only at studies evaluating machine learning tools that directly affected clinical decision-making, so future research could examine a broader range of interventions, such as those aimed at workflow efficiency or patient stratification. The review also covered only studies published through October 2021, and further reviews will be needed as new machine learning interventions are developed and studied.
Still, the study’s authors said their review showed that more high-quality RCTs of healthcare machine learning algorithms need to be conducted. Although the FDA has approved hundreds of machine learning-enabled devices, the review suggests the vast majority were not evaluated in an RCT.
“It is not practical to formally assess every potential iteration of a new technology through an RCT (eg, a machine learning algorithm used in a hospital system and then used for the same clinical scenario in another geographic location),” the researchers wrote.
“A baseline RCT of an intervention’s efficacy would help to establish whether a new tool provides clinical utility and value. This baseline assessment could be followed by retrospective or prospective external validation studies to demonstrate how an intervention’s efficacy generalizes over time and across clinical settings.”