An artificial intelligence–based system designed to screen for diabetic retinopathy and diabetic macular edema demonstrated consistent real-world performance in a large-scale deployment across 45 sites in India, according to a recent study and an accompanying interview with the researchers. The system, known as the Automated Retinal Disease Assessment, is the first ophthalmology-focused artificial intelligence model to undergo postmarketing performance evaluation at this scale.
Diabetic retinopathy remains a leading cause of preventable blindness worldwide. Although clinical guidelines recommend annual eye exams for patients with diabetes, screening rates remain low: only about 50% of eligible patients are screened annually in the United States. Screening rates are even lower in low-resource settings such as India, where an estimated 100 million people live with diabetes. “Around 50% of these patients [may not] even know that they have diabetes,” said co–study author Sunny Virmani, MS, a project manager at Google, in an interview with JAMA.
To address this gap, Google deployed its Automated Retinal Disease Assessment (ARDA) artificial intelligence (AI) model across 45 clinics affiliated with Aravind Eye Hospital in Tamil Nadu. These clinics included tertiary eye care centers and more than 100 satellite vision centers staffed primarily by optometrists and technicians.
Between 2019 and 2023, over 600,000 patients were screened using the ARDA system. Researchers sampled approximately 1% of these screenings to compare AI-generated grades with those of expert human graders to assess whether the AI’s performance remained consistent in diverse, real-world conditions.
Key variables included:
- Camera hardware variation
- Technician skill levels
- Changing patient demographics
- Presence of comorbid ocular conditions (eg, cataracts and glaucoma)
- Image quality degradation over time caused by equipment wear
The researchers found that ARDA’s sensitivity and specificity for severe plus diabetic retinopathy (DR) were 97% and 96.4%, respectively. Sensitivity and specificity for sight-threatening DR (STDR) were 95.9% and 94.9%, respectively. “Performance for STDR and referable DR was similar to that in prior retrospective and prospective studies using the same deep learning system,” noted lead study author Arthur Brant, MD, of Stanford University, with colleagues.
Four patients with severe nonproliferative DR (NPDR) or proliferative DR (PDR) were understaged but still referred to the clinic because the model referred all patients with moderate or worse DR or diabetic macular edema (DME), in line with clinical safety benchmarks. The model’s positive predictive value was approximately 50%: for every two patients referred, one had a clinically significant condition. Although this resulted in some overreferral, the researchers noted in the JAMA interview that the approach was still more efficient than universal ophthalmologist screening.
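To make these figures concrete, the sketch below applies the referral rule described above (refer any patient graded moderate or worse DR, or any patient with DME) to hypothetical counts, showing how a roughly 50% positive predictive value corresponds to one confirmed case per two referrals. The grade labels and counts are illustrative assumptions, not data from the study.

```python
# Illustrative sketch only: the referral rule described in the article
# (refer any patient graded moderate or worse DR, or any patient with DME),
# applied to hypothetical counts. Grade labels and numbers are assumptions,
# not data from the study.

DR_GRADES = ["none", "mild", "moderate", "severe", "proliferative"]

def should_refer(dr_grade: str, has_dme: bool) -> bool:
    """Refer if DR is moderate or worse, or if DME is present."""
    return DR_GRADES.index(dr_grade) >= DR_GRADES.index("moderate") or has_dme

# Hypothetical referred cohort: 200 referrals, 100 confirmed on expert review.
referred = 200
confirmed = 100
ppv = confirmed / referred  # 0.5 -> about 1 clinically significant case per 2 referrals

print(should_refer("moderate", False))          # True: moderate DR triggers referral
print(f"Positive predictive value: {ppv:.0%}")  # 50%
```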
Nontechnical factors that proved important to successful deployment, identified through implementation, included:
- Staff training to consistently produce high-quality images
- Patient education on how to sit for imaging
- Room lighting and camera placement, which influenced image quality
Other limitations included uncertain generalizability outside Tamil Nadu, the definition of DME based on fundus photography rather than optical coherence tomography, and the use of 45° field cameras, which may have understaged peripheral pathology.
Despite these challenges, the cloud-based infrastructure of the ARDA system allowed rapid feedback and monitoring. Sample images were regularly retrieved and regraded to confirm that the AI’s performance had not degraded over time.
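A minimal sketch of this kind of monitoring loop is shown below, assuming a random audit sample of roughly 1% of screenings that is regraded by experts and rescored for sensitivity and specificity; the function names and the (AI grade, expert grade) pairing are illustrative assumptions, not the study’s actual pipeline.

```python
# Minimal sketch of postmarketing performance monitoring: sample ~1% of
# screenings, have experts regrade them, and recompute sensitivity and
# specificity to check for drift. Names and data are illustrative only.
import random

def sample_for_audit(screenings, fraction=0.01, seed=0):
    """Randomly select a small audit sample (about 1%) for expert regrading."""
    rng = random.Random(seed)
    k = max(1, int(len(screenings) * fraction))
    return rng.sample(screenings, k)

def sensitivity_specificity(pairs):
    """pairs: list of (ai_referable, expert_referable) booleans."""
    tp = sum(1 for ai, truth in pairs if ai and truth)
    fn = sum(1 for ai, truth in pairs if not ai and truth)
    tn = sum(1 for ai, truth in pairs if not ai and not truth)
    fp = sum(1 for ai, truth in pairs if ai and not truth)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Example with made-up AI and expert grades for 10 audited screenings.
audited = [(True, True), (True, False), (False, False), (True, True),
           (False, False), (True, True), (False, True), (False, False),
           (True, True), (False, False)]
print(sensitivity_specificity(audited))
```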
“What we realized is our model is just one piece of it, and it is the core piece,” Mr. Virmani explained further in the interview. “However, the input and the output and what is done with the output—all of these things matter a lot. It’s really important to actually test these systems, not just in isolation, but overall from a health care perspective and from a workflow perspective, to figure out where the gaps and the bottlenecks are and how we can make sure that this is success for everyone, not just for the model,” he added.
The study authors concluded: “To ensure continued patient safety, we recommend that all AI algorithms monitor and publish their clinical performance.”
All disclosures can be found in the published interview and study.
Sources:
JAMA
JAMA Network Open