A review outlined best practices for integrating large language models into radiology workflows, noting that AI-generated reports can match radiologist accuracy and improve diagnostic reasoning. The researchers highlighted four key risks (performance drift, hallucinations, data privacy breaches, and demographic bias) and recommended expanding evaluation metrics beyond accuracy, with emphasis on consistency, transparency, and bias mitigation, to ensure safe clinical use.
Source: Radiology