Can AI Eliminate Errors in Radiological Reports?
The interpretation of certain radiological exams is often difficult and time-consuming. This is particularly true for MRIs and CT scans, which produce a wealth of cross-sectional images and anatomical detail and are sometimes marred by artifacts. Imaging departments with heavy workloads are especially prone to errors in their reports, and the risk increases further when voice dictation, which can introduce errors of its own, is used.
In a hectic environment, careful rereading of radiology reports does not protect against errors. Under these conditions, artificial intelligence (AI), such as ChatGPT, could provide a safeguard and help avoid potentially serious repercussions, including medicolegal problems. A retrospective study illustrates this potential application, which comes in addition to the already established or validated diagnostic assistance that AI can provide in certain indications.
A total of 200 reports from various examinations (ie, x-rays, CT scans, and MRIs) were collected between June and December 2023 within a single American institution. One hundred fifty of the most common errors (including omissions, insertions, syntax errors, and confusion between right and left) were artificially introduced into 100 of these reports.
The rereading was entrusted to six radiologists with varying levels of experience (two senior radiologists, two assistants, and two residents) and to ChatGPT-4, with the aim of detecting the errors. The two modes of rereading were compared using the χ² test and Student's t test, and the time spent on the task was also recorded.
At the end of this comparison, ChatGPT-4's performance approached that of the radiologists. The error detection rate with generative AI was estimated at 82.7% (124/150; 95% CI, 75.0-87.9), compared with 89.3% (134/150; 95% CI, 83.4-93.3) for senior radiologists, 80.0% (120/150; 95% CI, 72.9-85.6) for assistants, and 80.0% (120/150; 95% CI, 72.9-85.6) for residents. The intergroup difference was not statistically significant. One senior radiologist stood out from both ChatGPT-4 and his colleagues, with an error detection rate of 94.7% (142/150; 95% CI, 89.8-97.3; P = .006).
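For readers who want to check the arithmetic behind these figures, the short sketch below is an illustration only: the study's statistical software and exact confidence interval method are not specified here, so a Wilson interval is used as one common choice, and a χ² test is run on the reported counts for ChatGPT-4 and the senior radiologists.

```python
# Illustrative only: the study's exact CI method and software are not stated,
# so results will be close to, but may not exactly match, the quoted figures.
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportion_confint

found_gpt, total = 124, 150   # ChatGPT-4: errors found / errors seeded (82.7%)
found_senior = 134            # senior radiologists: errors found (89.3%)

# Wilson 95% CI for the ChatGPT-4 detection rate (one common interval choice)
low, high = proportion_confint(found_gpt, total, alpha=0.05, method="wilson")
print(f"ChatGPT-4: {found_gpt / total:.1%} (95% CI {low:.1%}-{high:.1%})")

# 2x2 chi-square test: detected vs missed, ChatGPT-4 vs senior radiologists
table = [[found_gpt, total - found_gpt],
         [found_senior, total - found_senior]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```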
In terms of speed, there was no contest: ChatGPT-4 took an average of 3.5 ± 0.5 seconds to detect the errors in a report, compared with 25.1 ± 20.1 seconds for the radiologists (P < .001). Cost-effectiveness also favored ChatGPT-4, with an estimated cost of $0.03 ± $0.01 to correct a report, compared with $0.42 ± $0.41 for the radiologists (P < .001).
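As a rough illustration of how such a timing comparison can be run from summary statistics alone, the sketch below uses the reported means and standard deviations; the per-arm sample sizes are assumptions (the study's pooling of readers and reports is not detailed here), and Welch's variant of the t test is chosen because the variances are very unequal.

```python
# Rough sketch: two-sample t test from reported summary statistics.
# The sample sizes (nobs) are assumptions, not figures from the study.
from scipy.stats import ttest_ind_from_stats

result = ttest_ind_from_stats(
    mean1=3.5, std1=0.5, nobs1=100,    # ChatGPT-4 reading time, seconds (assumed n)
    mean2=25.1, std2=20.1, nobs2=100,  # radiologist reading time, seconds (assumed n)
    equal_var=False,                   # Welch's test, given very unequal variances
)
print(f"t = {result.statistic:.2f}, P = {result.pvalue:.2e}")
```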
This retrospective study suggests that ChatGPT-4 can help radiologists reread reports by identifying the most common errors, with performance approaching that of the most experienced professionals (with one exception).
These results should be confirmed prospectively before this particularly rapid rereading mode is adopted. Training the conversational agent to hunt for errors is also a prerequisite, and one that not all imaging services or departments can currently meet.
This story was translated from JIM, which is part of the Medscape Professional Network, using several editorial tools, including AI, as part of the process. Human editors reviewed this content before publication.