Six artificial intelligence (AI) platforms for ultrasound evaluation of thyroid nodules have received clearance from the US Food and Drug Administration (FDA), with published studies demonstrating similar or improved diagnostic performance compared with less-experienced physicians, according to an article published in Thyroid. Additional systems under investigation, such as those assessing lymph nodes and slide specimens, suggest an expanding role for AI in diagnostic assessment.
In an updated review, endocrinologist Johnson Thomas, MD, of Mercy Hospital, Springfield, Missouri, and radiologist Franklin N. Tessler, MD, of the University of Alabama at Birmingham, built on their 2022 primer on the role of AI in the workup of thyroid nodules, summarizing current evidence across ultrasound, cervical lymph node assessment, cytology and histology specimen evaluation, and molecular testing. They noted that since the primer's publication, the pace of research and commercialization has continued to accelerate, while emphasizing that in their current state, these AI systems are designed to augment, not replace, clinical judgment.
FDA-Cleared Systems
All commercially available platforms analyze sonograms and generate malignancy risk estimates using established risk stratification systems such as the American College of Radiology Thyroid Imaging, Reporting and Data System (ACR TI-RADS), Korean TI-RADS, European TI-RADS, and the guidelines of the American Thyroid Association.
Across studies, several tools were found to improve performance for physicians at different levels of expertise, including those earlier in their careers. In one study of 130 pathologically confirmed nodules, the AmCAD-UT system was associated with a statistically significant improvement in accuracy among junior readers. Koios DS™ Thyroid increased overall area under the curve (AUC) from 0.776 to 0.817 in a retrospective study of 172 nodules. The tool improved both sensitivity (82% to 86%) and specificity (38% to 45%) when added to physician interpretation. In a cohort of 28 indeterminate nodules, the system reduced the number of nodules meeting biopsy criteria from 24 to 10 while capturing 6 of 7 malignancies.
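As a rough illustration of the arithmetic behind the 28-nodule cohort figures, the short Python sketch below works out the relative reduction in biopsy recommendations and the share of malignancies still captured. The function name and its parameters are hypothetical and are not drawn from the review or from any vendor's software; the sketch also assumes that the 6 captured malignancies were among the 10 nodules that still met biopsy criteria.

```python
def biopsy_summary(flagged_before, flagged_after, malignant_total, malignant_flagged):
    """Summarize how a decision aid changes biopsy recommendations (hypothetical helper)."""
    avoided = flagged_before - flagged_after
    return {
        "biopsies_avoided": avoided,
        # Fraction of the original biopsy recommendations that were dropped
        "relative_reduction": avoided / flagged_before,
        # Fraction of known malignancies still recommended for biopsy
        "malignancies_captured": malignant_flagged / malignant_total,
        "malignancies_missed": malignant_total - malignant_flagged,
    }


# Figures reported for the cohort of 28 indeterminate nodules
summary = biopsy_summary(
    flagged_before=24, flagged_after=10, malignant_total=7, malignant_flagged=6
)

print(f"Biopsies avoided: {summary['biopsies_avoided']}")
print(f"Relative reduction: {summary['relative_reduction']:.1%}")
print(f"Malignancies captured: {summary['malignancies_captured']:.1%}")
print(f"Malignancies missed: {summary['malignancies_missed']}")
```

Under these assumptions, 14 fewer nodules met biopsy criteria, a reduction of roughly 58%, at the cost of 1 missed malignancy out of 7. With so small a cohort, these percentages are best read as descriptive rather than as stable performance estimates.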
S-Detect, the first AI product to receive FDA clearance, was evaluated prospectively in 312 nodules from 236 patients and demonstrated 95% sensitivity and 56% specificity, with performance comparable to experienced radiologists and superior to residents. The system reduced unnecessary biopsy rates by up to 28% compared with resident interpretations.
Large Language Models
By contrast, multimodal large language models demonstrated variable performance. In a prospective study of 106 nodules, ChatGPT-4o misclassified 26 benign and 11 malignant nodules based on ultrasound images, and performance decreased when shear wave elastography data were added. In a separate evaluation of 202 nodules, these systems demonstrated high specificity but low sensitivity across most ACR TI-RADS categories.
The researchers described the poor performance of these models as “not surprising” and advised against their use for clinical decision-making at this time.
Lymph Nodes and Pathology
Currently, no commercially available systems provide lymph node assessment; however, ongoing research is exploring this capability. A meta-analysis of 27 studies in patients with papillary thyroid carcinoma reported 80% sensitivity and 83% specificity for AI in detecting cervical lymph node metastases on ultrasound, compared with 51% sensitivity and 84% specificity for physicians.
In cytology, a multicenter retrospective and prospective trial of 537 thyroid nodules found that an AI model achieved an AUC of 0.977 in distinguishing benign from malignant nodules on whole-slide images and improved junior cytopathologists' specificity from 89% to 99% and accuracy from 88% to 95%. A proteomics-based machine learning model evaluated prospectively in 294 nodules achieved 85% accuracy and 92% sensitivity. No AI tools are currently cleared or approved for thyroid cytopathology or histology images.
"This is just a guess, but I think cytology [AI applications] will [become routine] first, spurred by the trend toward digital pathology. Nodal assessment, whether direct or indirect, will probably take longer," said Dr. Tessler in an interview with the American Association of Clinical Endocrinology (AACE) in partnership with Conexiant.
However, Dr. Thomas told AACE and Conexiant that reimbursement may play a significant role. While “many pathology labs are already using AI for slide assessment,” he noted, “the cost of whole-slide scanners and expenses associated with software is making it harder for increased adoption.” He added that large language models may find broader utility in clinical operations such as medical note creation and order entry.
Implementation Considerations
The researchers observed that the FDA-cleared AI systems for thyroid nodule assessment have not achieved widespread dissemination. Dr. Thomas attributed this to “last-mile” barriers, including “workflow friction, uncertain ROI [return on investment], and lack of prospective, independent validation in the settings where most thyroid US [ultrasound] is performed.” He added that “clear reimbursement and/or value-based justification could help with adoption,” noting that practices are more likely to move forward when there is “a predictable pathway to cover software costs (or a strong operational case: fewer unnecessary biopsies, less report variability, shorter report times).”
Dr. Tessler emphasized that broader uptake will require “compelling evidence that these systems can reduce time and effort while at least maintaining, if not improving, diagnostic accuracy, making the cost of implementation and continued usage worthwhile.” He added that such data “will have to come from independent, not company-sponsored, trials,” and that head-to-head comparisons of AI software would help practices determine which system to adopt.
Beyond published study results, both underscored the importance of thoughtful workflow integration. Dr. Thomas said, “I think integrating AI into thyroid risk stratification can reduce subjectivity. But we need multicenter prospective trials to make sure that the risk stratification is accurate and that we are reducing unnecessary biopsies.”
Dr. Tessler noted that implementation strategies will vary across specialties “because there are many differences in how people work.” He added that all practice types should map out the process “from the time the patient arrives to when the ultrasound findings are incorporated into the medical record” and evaluate how an AI system fits within that pathway. “It's helpful to discuss this with prospective vendors during the selection and trial process,” he said.
The researchers concluded: "AI tools hold promise for the evaluation of thyroid nodules, and potentially for lymph nodes, and biopsy and surgical specimens as well. However, clinicians and health care leaders must be aware of the limitations of AI and know how to select and implement technology that meshes with existing infrastructure, is applicable to the patient population on which it will be used, and provides a user experience that is easy to learn and enhances existing workflows."
Disclosure: Dr. Thomas holds intellectual property rights/patents related to the application of AI for thyroid nodule risk stratification (AIBx, not included in the review). Dr. Tessler reported no conflicts of interest.
Source: Thyroid