A systematic review and meta-analysis by Zhan et al, published in Frontiers in Oncology, found that artificial intelligence–assisted diagnostic systems based on ultrasound characteristics “demonstrated high clinical potential” in distinguishing benign from malignant thyroid nodules. Among the models evaluated, an ensemble deep learning classification model showed the highest diagnostic accuracy.
In addition, the researchers wrote, “For thyroid nodules in female patients with an average diameter of [less than] 20 mm and an age of [at least] 50 years, artificial intelligence [AI]–assisted diagnostic models are more effective, especially deep learning models.”
Inside the Analysis
After a comprehensive literature search of PubMed, Web of Science, and the Cochrane Library, the researchers analyzed 28 studies that met the inclusion criteria. Together, these studies of AI-assisted thyroid nodule diagnosis included 134,028 patients, 158,161 nodules, and 529,479 ultrasound images.
Study quality was evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. The researchers pooled sensitivity, specificity, positive and negative likelihood ratios, the diagnostic odds ratio, and the summary receiver operating characteristic (SROC) area under the curve. They also performed subgroup analyses and assessed clinical applicability.
Diagnostic Performance
According to the researchers, AI-assisted diagnostic systems showed high accuracy. They reported a pooled sensitivity of 0.89, specificity of 0.84, positive likelihood ratio of 5.60, negative likelihood ratio of 0.13, diagnostic odds ratio of 43.94, and SROC area under the curve of 0.93. Threshold effect analysis suggested that the heterogeneity across studies was not attributable to a statistically significant threshold effect.
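The pooled measures above are linked by standard definitions:

- LR+ = sensitivity / (1 − specificity)
- LR− = (1 − sensitivity) / specificity
- DOR = LR+ / LR−

The short Python sketch below applies these definitions to the pooled point estimates. It then uses Bayes' rule in odds form to show how each likelihood ratio would shift a pretest probability of malignancy. The function names and the 10% pretest probability are illustrative assumptions, not taken from the study. The sketch does not reproduce the authors' statistical analysis.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) implied by one sensitivity/specificity pair."""
    lr_pos = sensitivity / (1 - specificity)  # true-positive rate / false-positive rate
    lr_neg = (1 - sensitivity) / specificity  # false-negative rate / true-negative rate
    dor = lr_pos / lr_neg                     # diagnostic odds ratio
    return lr_pos, lr_neg, dor


def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: post-test odds = pretest odds * likelihood ratio."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled point estimates reported in the meta-analysis.
lr_pos, lr_neg, dor = likelihood_ratios(0.89, 0.84)
print(f"LR+ ~ {lr_pos:.2f}, LR- ~ {lr_neg:.2f}, DOR ~ {dor:.1f}")

# Hypothetical 10% pretest probability of malignancy (illustration only).
print(f"after a positive AI result: {post_test_probability(0.10, 5.60):.1%}")
print(f"after a negative AI result: {post_test_probability(0.10, 0.13):.1%}")
```

Plugging in 0.89 and 0.84 gives roughly LR+ ≈ 5.56, LR− ≈ 0.13, and DOR ≈ 42.5. These are close to, but not identical with, the reported 5.60, 0.13, and 43.94. That gap is expected if each summary measure was pooled separately across studies rather than derived from the pooled sensitivity and specificity. With the hypothetical 10% pretest probability, a positive result would raise the probability of malignancy to about 38%, and a negative result would lower it to about 1.4%.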
Diagnostic accuracy was higher in a number of study subgroups:

- studies conducted in Asian countries
- prospective and multicenter designs
- studies with external validation cohorts
- studies without cross-validation
- deep learning–based models
- postoperative patient subgroups

Performance also improved in cohorts with smaller nodule diameters (less than 20 mm), higher malignancy prevalence, older patient age (at least 50 years), and a greater proportion of female patients, although heterogeneity remained significant. Univariate and multivariate meta-regression analyses identified AI type and nodule malignancy rate as significant sources of heterogeneity.
Of note, EDLC-TN, an ensemble deep learning classification model for thyroid nodules, demonstrated the highest diagnostic accuracy.
The Future of AI in Thyroid Diagnostics
Noting that most of the included studies (89%) were conducted in Asian regions, the researchers wrote that “international multicenter data sets should be established in the future to ensure that the samples of each subgroup are balanced, include data from different ultrasound devices and different acquisition protocols, and adopt unified image annotation standards and pathologic confirmation procedures to construct diagnostic models for different subtypes of thyroid nodules.” They added that “future AI developments should prioritize adaptability, algorithmic transparency, and interpretability.”
The researchers reported no conflicts of interest.