Summaries drafted by an artificial intelligence workflow were incorporated into more than half of discharge notes in a prospective pilot study, with most reviewed summaries rated as having no potential for harm.
In the single-arm quality improvement study, published in JAMA Network Open, investigators evaluated MedAgentBrief, a large language model workflow designed to summarize hospital courses for discharge documentation. The artificial intelligence (AI) system was deployed on one academic inpatient medicine unit at Stanford Health Care from August to October 2025.
The system generated 1,274 daily hospital course summaries during the 10-week pilot. Among 384 hospital discharges involving 331 patients, physicians incorporated AI-generated text into final discharge documentation in 219 cases, for a 57% use rate.
Physicians provided safety feedback on 100 summaries, including 88 that were ultimately used and 12 that were not. Among reviewed summaries, 88% were rated as having no harm potential. Investigators noted that feedback was voluntary.
One summary was rated as likely to cause moderate harm after suggesting a transition from intravenous vancomycin to oral antibiotics without clarifying that the patient had already completed treatment and that oral antibiotics were being used for prophylaxis. Independent adjudication later concluded that the recommendation posed no clinical risk.
No summaries were rated as having potential for severe harm or death. Omissions were reported in 25% of reviewed summaries, inaccuracies in 20%, and hallucinations in 2%. Incorrect citations were not reported.
The workflow used a three-stage process involving draft generation, iterative refinement, and hallucination reduction rather than single-pass prompting. The authors noted that the 2% hallucination rate compared favorably with rates exceeding 40% reported in prior studies of single-pass clinical text generation.
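The multi-stage design described above can be sketched in pseudocode-style Python. This is a hedged illustration only: the function names (`draft`, `refine`, `verify`), the placeholder logic, and the grounding check are hypothetical stand-ins, not the authors' actual prompts or model calls, which are not detailed in this summary. The sketch shows why a verification stage can suppress hallucinations, since any sentence not supported by the source notes is dropped before output.

```python
# Hypothetical sketch of a three-stage summarization pipeline
# (draft -> iterative refinement -> hallucination check), in the
# spirit of the MedAgentBrief architecture. All names and logic
# are illustrative placeholders, not the study's implementation.

def draft(notes):
    """Stage 1: produce an initial summary from free-text clinical notes."""
    return ". ".join(n.strip() for n in notes)

def refine(summary, passes=2):
    """Stage 2: iteratively revise the draft (placeholder: remove
    duplicated sentences on each pass)."""
    for _ in range(passes):
        seen, kept = set(), []
        for sentence in summary.split(". "):
            if sentence and sentence not in seen:
                seen.add(sentence)
                kept.append(sentence)
        summary = ". ".join(kept)
    return summary

def verify(summary, notes):
    """Stage 3: crude grounding check -- keep only sentences whose
    every word appears somewhere in the source notes."""
    source = " ".join(notes)
    kept = [s for s in summary.split(". ")
            if s and all(word in source for word in s.split())]
    return ". ".join(kept)

def summarize(notes):
    """Run all three stages rather than a single generation pass."""
    return verify(refine(draft(notes)), notes)
```

For example, `summarize(["Admitted with cellulitis", "IV vancomycin started", "Admitted with cellulitis"])` collapses the repeated note and returns only text traceable to the input. A real system would replace each placeholder with an LLM call, but the staged structure, generate, then revise, then check against the record, is the design choice the 2% hallucination figure is attributed to.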
The system processed only free-text clinical notes, including histories, physical examinations, and progress notes. It did not incorporate structured electronic health record (EHR) data such as laboratory results, medication lists, or flowsheet data.
The authors emphasized that omissions may be harder to address than hallucinations because the information may exist in the record but not be recognized by the model as clinically important.
Physician burnout scores decreased after implementation. Among 10 physicians, mean Stanford Professional Fulfillment Index Work Exhaustion scores fell from 1.75 to 1.20, crossing below the established burnout threshold of 1.33. Two physicians had increased burnout scores, and cognitive burden scores did not significantly change.
Objective time savings were modest and heterogeneous. Among seven physicians with matched baseline data, five had reductions in median documentation time of up to 2.9 minutes, while two had increases of up to 1.5 minutes.
Subjective efficiency ratings were more favorable. Physicians reported perceived time savings in 67% of feedback surveys, with 32% estimating savings of more than 15 minutes per summary. The authors described the benefit as “cognitive offloading rather than clock-time efficiency.”
Limitations included the single-unit academic setting, attending-only physician workflow, small numbers of participating physicians, voluntary feedback, lack of subgroup analyses by case complexity or length of stay, and absence of a contemporaneous control group. The study also lacked a systematic assessment of error rates in physician-authored discharge summaries, making it difficult to quantify the AI system’s incremental risk or benefit.
“While the workflow architecture effectively mitigated hallucinations, omissions remained the predominant error type,” the authors concluded. “Addressing these will require incorporating structured EHR data, adjusting generation timing, and developing scalable methods to align model outputs with physician judgment.”
Disclosures: One author reported receiving personal fees from OpenAI related to health data curation and ChatGPT Health initiatives. Other authors reported relationships with Fourier Health, Prealize Health, Atropos Health, Opala, Curai Health, Johnson & Johnson Innovative Medicine, AbbVie, Insitro, and other commercial entities.
Source: JAMA Network Open