Radiologists assigned to receive step-by-step explanations from a large language model achieved higher diagnostic accuracy in a randomized vignette study, while differential-diagnosis outputs may have increased inappropriate reliance on incorrect model suggestions.
A VHA study across 11 vendors finds that AI-generated primary care notes score lower than clinician-written notes, with the largest deficits in thoroughness, organization, and usefulness.