Radiologists correctly distinguished artificial intelligence (AI)–generated radiographs from real images about 75% of the time in a new study, a finding that underscores how difficult increasingly realistic synthetic medical images have become to detect.
The findings come from a retrospective diagnostic accuracy study published in Radiology that included 17 radiologists from six countries with varying levels of experience. Researchers evaluated the physicians’ ability to identify synthetic radiographs generated by large language models (LLMs) and compared their performance with that of several multimodal LLMs.
The study was led by Mickael Tordjman, MD, and Murat Yuce, MD, of the Icahn School of Medicine at Mount Sinai, New York, and colleagues.
Researchers assessed performance across three phases using two balanced data sets totaling 264 radiographs, evenly split between synthetic and authentic images. They also evaluated four multimodal LLMs: GPT-4o and GPT-5 (OpenAI), Gemini 2.5 Pro (Google), and Llama 4 Maverick (Meta).
In phase 1, radiologists evaluated image quality and made diagnoses without being told that AI-generated images were included. In phase 2, they classified images as real or synthetic. In phase 3, they evaluated a separate data set of chest radiographs generated by RoentGen, a domain-specific diffusion model.
In the primary analysis, radiologists achieved 75% accuracy in distinguishing GPT-4o–generated synthetic radiographs from real images. Performance was similar on the second data set of chest radiographs generated by RoentGen, at 70% accuracy.
Notably, while still unaware of the study’s purpose, 41% of radiologists recognized that AI-generated images were present.
Diagnostic performance similar for synthetic and real images
Radiologists’ diagnostic accuracy for identifying abnormalities was high and similar for both image types, reaching 92% for synthetic radiographs and 91% for real images. Image quality ratings were also comparable, suggesting that synthetic images were both visually convincing and clinically plausible.
Once informed that some images were artificial, radiologists’ sensitivity for detecting synthetic images was 69%, while specificity for identifying real images was 80%. Agreement among readers was modest, indicating variability in detection ability.
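For readers less familiar with these metrics, the following minimal sketch shows how sensitivity, specificity, and overall accuracy relate. The counts are hypothetical, chosen only so the rates match those reported above; they are not the study’s data.

```python
# Hypothetical counts chosen to match the reported rates (69% sensitivity,
# 80% specificity, ~75% accuracy); not the study's actual data.
# "Synthetic" is treated as the positive class.
true_positives = 69   # synthetic images correctly called synthetic
false_negatives = 31  # synthetic images mistaken for real
true_negatives = 80   # real images correctly called real
false_positives = 20  # real images mistaken for synthetic

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
accuracy = (true_positives + true_negatives) / (
    true_positives + false_negatives + true_negatives + false_positives
)

print(f"sensitivity={sensitivity:.0%}, specificity={specificity:.0%}, "
      f"accuracy={accuracy:.1%}")
# sensitivity=69%, specificity=80%, accuracy=74.5%
```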
Experience offered limited advantage
Years of experience did not significantly affect performance, and prior familiarity with AI-generated images was not associated with improved accuracy. However, musculoskeletal radiologists performed better than other subspecialists, with 83% accuracy vs 70%.
Detection accuracy was consistent across anatomic regions and was not improved in clearly abnormal cases such as fractures.
AI systems also showed imperfect detection
None of the tested LLMs identified all synthetic radiographs. GPT-4o achieved 85% accuracy and GPT-5 reached 83%, outperforming Gemini 2.5 Pro (56%) and Llama 4 Maverick (59%).
Common features of synthetic images included excessive symmetry, uniform noise patterns, overly smooth bone contours, and subtle abnormalities in soft-tissue texture.
Limitations
The researchers noted several limitations, including a relatively small data set and the exclusion of images with obvious AI errors, which may have made detection more difficult. The equal split of synthetic and real images also does not reflect real-world prevalence; in practice, where synthetic images would be rarer and less expected, detection accuracy could be even lower.
Additionally, GPT-4o was used both to generate and detect images, introducing potential bias.
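The prevalence limitation can be made concrete with a short, hedged sketch. Using the reported 69% sensitivity and 80% specificity, it computes the positive predictive value (PPV) of a “synthetic” call at several assumed prevalences; the prevalence values and the PPV framing are illustrative assumptions, not figures from the study.

```python
# Illustrative only: applies the reported sensitivity (0.69) and
# specificity (0.80) at assumed prevalences of synthetic images to show
# how the positive predictive value (PPV) of a "synthetic" call falls
# as synthetic images become rarer. Prevalence values are hypothetical.
sensitivity, specificity = 0.69, 0.80

for prevalence in (0.50, 0.10, 0.01):
    true_pos_rate = sensitivity * prevalence               # correct flags
    false_pos_rate = (1 - specificity) * (1 - prevalence)  # false alarms
    ppv = true_pos_rate / (true_pos_rate + false_pos_rate)
    print(f"prevalence={prevalence:.0%} -> PPV={ppv:.0%}")
# prevalence=50% -> PPV=78%
# prevalence=10% -> PPV=28%
# prevalence=1% -> PPV=3%
```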
Implications
The findings highlight potential risks associated with the misuse of synthetic medical images in clinical care, research, and legal settings.
“None of the tested LLMs detected all synthetic radiographs,” the researchers wrote, emphasizing the need for physician training and technical safeguards.
They suggested that strategies such as watermarking, provenance tracking, and automated detection tools may help mitigate these risks.
The researchers reported no funding. Disclosures included editorial roles for several contributors, including positions with Radiology.
Source: Radiology