ChatGPT-4 Vision, an artificial intelligence model designed for image interpretation, demonstrated diagnostic accuracy similar to that of a resident physician in evaluating thyroid nodules on ultrasound, according to a prospective clinical study in China. The researchers suggest the model may support diagnostic workflows, particularly in settings with limited radiology expertise.
The study involved 124 patients with pathologically confirmed thyroid nodules who underwent ultrasound at the Second Affiliated Hospital of Fujian Medical University. Each nodule was imaged in three planes: transverse, longitudinal, and one showing the nodule's key features. ChatGPT-4 Vision, a resident physician, and an attending physician evaluated the images independently. Their diagnoses were compared with final pathology or fine-needle aspiration (FNA) biopsy results as the gold standard.
ChatGPT-4 Vision was primed using 100 ultrasound cases before testing. It analyzed each case using the 2020 Chinese Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules (C-TIRADS).
The AI model achieved a sensitivity of 86.2%, specificity of 60.0%, and an AUC of 0.731. These values were comparable to those of the resident physician (sensitivity: 85.1%, specificity: 66.7%, AUC: 0.759; p > .05). The attending physician showed significantly better diagnostic performance (sensitivity: 97.9%, specificity: 80.0%, AUC: 0.889; p < .05).
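One way to read these figures: when a reader gives a single benign-or-malignant call per nodule, the ROC curve has just one operating point, and the area under it equals the mean of sensitivity and specificity. The sketch below checks the reported values against that relationship. It is an illustration only; the article does not say how the study computed its AUCs, and the function name is hypothetical.

```python
# Reported figures from the article, expressed as proportions.
readers = {
    "ChatGPT-4 Vision": {"sensitivity": 0.862, "specificity": 0.600, "auc": 0.731},
    "Resident": {"sensitivity": 0.851, "specificity": 0.667, "auc": 0.759},
    "Attending": {"sensitivity": 0.979, "specificity": 0.800, "auc": 0.889},
}


def single_point_auc(sensitivity, specificity):
    """Area under the ROC curve through (0, 0), (1 - spec, sens), (1, 1).

    Assumption: one binary call per nodule, so the ROC curve has a
    single operating point and the area reduces to this average.
    """
    return (sensitivity + specificity) / 2


for name, r in readers.items():
    estimate = single_point_auc(r["sensitivity"], r["specificity"])
    print(f"{name}: computed {estimate:.4f}, reported {r['auc']:.3f}")
```

For all three readers the computed value falls within 0.001 of the reported AUC, which is consistent with a single-threshold analysis.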
In terms of agreement with pathology results, ChatGPT-4 Vision had a kappa coefficient of 0.457, compared with 0.495 for the resident and 0.816 for the attending physician, indicating that the senior clinician's diagnoses agreed most closely with pathology.
Although the AI model performed well in detecting malignancy, its specificity was the lowest of the three readers, which translated into more false positives. Among 45 benign nodules, ChatGPT-4 Vision's assessments would have led to 18 unnecessary biopsies, compared with 15 for the resident and 9 for the attending physician.
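The biopsy counts and the specificity figures describe the same thing from two directions. A minimal sketch of the arithmetic follows, assuming each unnecessary biopsy corresponds to one benign nodule flagged as suspicious (a false positive); the variable and function names are illustrative.

```python
BENIGN_NODULES = 45  # benign nodules in the study, per the article

# Unnecessary biopsies per reader, treated here as false positives (assumption).
unnecessary_biopsies = {
    "ChatGPT-4 Vision": 18,
    "Resident": 15,
    "Attending": 9,
}


def specificity(false_positives, total_negatives):
    """Fraction of benign nodules correctly called benign."""
    return (total_negatives - false_positives) / total_negatives


for reader, fp in unnecessary_biopsies.items():
    spec = specificity(fp, BENIGN_NODULES)
    print(f"{reader}: false-positive rate {1 - spec:.1%}, specificity {spec:.1%}")
```

Under that assumption, the counts reproduce the reported specificities of 60.0%, 66.7%, and 80.0%.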
Dao-Rong Hong, MD, of the Department of Ultrasonography at the Second Affiliated Hospital of Fujian Medical University, stated, “The primed ChatGPT-4 Vision demonstrates promising diagnostic potential in thyroid nodule ultrasound imaging, performing comparably to resident physicians.”
The study had several limitations. It used only static ultrasound images, so the AI could not assess real-time features such as vascular flow. The single-center design and the restriction to nodules 1–3 cm in size, which excluded microcarcinomas and larger nodules, limit generalizability. In addition, the AI was not evaluated for its ability to assist physicians or improve diagnostic outcomes when used in clinical practice.
Despite these limitations, the authors concluded that ChatGPT-4 Vision offers a promising tool for image interpretation. Its performance was comparable to that of a resident and in line with currently approved computer-aided diagnostic software.
The researchers recommended further studies in multi-center settings using dynamic imaging. They also noted that AI tools like ChatGPT-4 Vision could help standardize diagnoses and reduce inter-reader variability, especially in resource-constrained or high-volume clinical environments.
The authors reported no conflicts of interest.
Source: Frontiers