TOPLINE:
The combination of a large language model-based natural language processing (LLM-NLP) approach with standard diagnostic codes identified more cases of eosinophilic esophagitis (EoE) than diagnostic codes alone.
METHODOLOGY:
- Researchers compared three methods for identifying cases of EoE: diagnostic codes alone, a rule-based NLP approach, and an LLM-NLP approach. Each method was evaluated against a manual reference standard, and the NLP approaches were also tested in combination with diagnostic codes.
- They analyzed de-identified Penn Medicine records from 2008 through April 2024, flagged pathology reports mentioning “eosinophil,” and paired them with gastroenterology clinic notes.
- Patients were classified as having EoE if a prior diagnosis was documented in clinic notes or if pathology showed at least 15 eosinophils per high-power field along with symptoms of esophageal dysfunction. A random sample of 300 patients was split into a development set of 200 patients (including 56 cases of EoE) and a validation set of 100 patients (including 36 cases of EoE).
- The LLM-NLP approach used prompts to extract key symptoms and biopsy findings, such as difficulty swallowing, food getting stuck, reflux, and eosinophil counts. The presence of EoE was then determined either from a prior documented diagnosis or from matching symptoms and biopsy results; the model was also allowed to assign a diagnosis directly from the notes.
- The validated LLM-NLP methods were applied to an additional cohort (n = 580) to simulate real-world data.
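The case definition described above can be written as a simple rule. The following is a minimal sketch for illustration only: the record type and its field names (`prior_eoe_diagnosis`, `peak_eos_per_hpf`, `has_dysfunction_symptoms`) are hypothetical and do not come from the study, which used LLM prompts to extract these variables from notes and reports.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the case definition: at least 15 eosinophils per high-power field.
EOS_PER_HPF_THRESHOLD = 15


@dataclass
class PatientRecord:
    # All field names are hypothetical, chosen for illustration.
    prior_eoe_diagnosis: bool          # prior EoE diagnosis documented in clinic notes
    peak_eos_per_hpf: Optional[int]    # eosinophil count from pathology, if reported
    has_dysfunction_symptoms: bool     # e.g., difficulty swallowing, food getting stuck


def meets_eoe_definition(rec: PatientRecord) -> bool:
    """Return True if the record satisfies either arm of the case definition."""
    if rec.prior_eoe_diagnosis:
        return True
    if rec.peak_eos_per_hpf is None:
        # No count available, so the pathology arm cannot be satisfied.
        return False
    return (
        rec.peak_eos_per_hpf >= EOS_PER_HPF_THRESHOLD
        and rec.has_dysfunction_symptoms
    )
```

Under this sketch, a record with a count of 20 and documented symptoms qualifies, whereas a count of 20 without symptoms or a prior diagnosis does not.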
TAKEAWAY:
- In the validation cohort, the approach based solely on standard diagnostic codes achieved a precision of 0.97 (95% CI, 0.91-1.00), an accuracy of 0.94 (95% CI, 0.89-1.00), and a Cohen kappa of 0.87 (95% CI, 0.76-0.97), yielding an F1 score (ie, the harmonic mean of recall and precision) of 0.91 (95% CI, 0.84-1.00).
- The LLM-derived features and the LLM-assigned diagnosis each had an overall performance score 0.05 lower than that of diagnostic codes alone (95% CI for the difference, -0.13 to 0.04 and -0.16 to 0.05, respectively), whereas combining the LLM with diagnostic codes produced a small improvement of 0.03 (95% CI, -0.01 to 0.07).
- The combination approach of LLM plus standard diagnostic codes scored 0.11 points higher (an absolute, not relative, difference) than the rule-based NLP approach.
- In the larger cohort of 580 individuals, the combination approach of LLM plus diagnostic codes identified 203 cases of EoE, more than both the code-only (n = 173; P < .001) and LLM-NLP-only (n = 158; P < .001) approaches; among these 203 patients, 85% already had a diagnostic code, but 15% did not and were missed by codes alone.
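The F1 score and the combined approach reported above can be illustrated with a short sketch. It assumes that the combination flags a patient when either the diagnostic code or the LLM does; that OR rule is consistent with the reported counts but is an assumption here, and the function names are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def combine_with_codes(code_flags, llm_flags):
    """Assumed OR rule: a patient is a case if either source flags them."""
    return [bool(c) or bool(l) for c, l in zip(code_flags, llm_flags)]


# Counts reported for the 580-patient cohort.
combined_cases = 203
code_only_cases = 173

# Share of combined cases that already carried a diagnostic code,
# and the share that codes alone would have missed.
share_with_code = code_only_cases / combined_cases
share_missed_by_codes = 1 - share_with_code
```

With the reported counts, the two shares round to 85% and 15%, matching the figures given for the larger cohort.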
IN PRACTICE:
“Combining LLM-NLP with a single diagnostic code offers a scalable and accurate alternative to traditional code-based algorithms while enabling simultaneous extraction of individual diagnostic variables,” the authors of the study concluded.
SOURCE:
The study was led by Corey J. Ketchem, MD, Hospital of the University of Pennsylvania, Philadelphia. It was published online in Gastro Hep Advances.
LIMITATIONS:
The analysis was limited to pre-biopsy clinic notes. The pipeline used an off-the-shelf OpenAI model; alternative models may differ in performance, cost, or explainability. Additionally, the manual reference standard was annotated by a single reviewer, which may have introduced misclassification bias.
DISCLOSURES:
The study was supported by the Consortium of Eosinophilic Gastrointestinal Disease Researchers and other sources; the work was conducted independently of the funders. One author reported serving as a consultant to Sanofi/Regeneron, Takeda, and Lucid.
This article was created using several editorial tools, including AI, as part of the process. Human editors reviewed this content before publication.