A research team led by Parmanand Sharma, of the Department of Ophthalmology at Tohoku University Graduate School of Medicine in Sendai, Japan, has developed an artificial intelligence–based glaucoma screening system that detects early structural signs of disease using macula-centered fundus images. The system can flag likely glaucoma from these images, but a definitive diagnosis still requires visual field testing.
Glaucoma is often difficult to detect in its early, preperimetric (PPG) stage, the researchers wrote in their recent npj Digital Medicine article. The artificial intelligence–based glaucoma screening (AI-GS) network addressed this challenge by combining six deep learning models—each optimized for specific tasks such as segmentation, classification, and feature analysis—to detect optic disc cupping, disc hemorrhages (DH), retinal nerve fiber layer defects (RNFLD), and other key disease indicators. The total model size was approximately 110 MB, making it suitable for deployment on mobile platforms.
The AI-GS system incorporated MTL_LWBNA-unet, a multitask learning model for simultaneous segmentation of the optic disc, cup, and fovea, and glaucoma classification; binary classifiers for DH and RNFLD detection; and a feedforward neural network (FFCN) for classification based on numerical disc assessment parameters.
All models used macula-centered fundus images that were processed with contrast-limited adaptive histogram equalization. Outputs included segmentation masks and numerical parameters such as vertical cup-to-disc ratio (VCDR), area cup-to-disc ratio (ACDR), disc size index, and neuroretinal rim area (NRRA).
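To illustrate how parameters such as VCDR and ACDR can be derived from segmentation masks, here is a minimal Python sketch. It is not the authors' implementation: the function name `cup_to_disc_ratios`, the list-of-rows mask format, and the simple bounding-height definition of VCDR are assumptions made for illustration, and the study may define these measurements differently.

```python
def cup_to_disc_ratios(disc_mask, cup_mask):
    """Return (VCDR, ACDR) from two binary masks.

    Each mask is a list of rows holding 0/1 values (hypothetical format).
    VCDR is approximated here as cup height / disc height in pixels,
    ACDR as cup area / disc area in pixels.
    """
    def vertical_extent(mask):
        # Indices of rows that contain at least one foreground pixel
        rows = [i for i, row in enumerate(mask) if any(row)]
        return (rows[-1] - rows[0] + 1) if rows else 0

    def area(mask):
        # Count of foreground pixels
        return sum(sum(1 for v in row if v) for row in mask)

    disc_height, disc_area = vertical_extent(disc_mask), area(disc_mask)
    if disc_height == 0 or disc_area == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height, area(cup_mask) / disc_area


# Toy 8 x 8 example: a 6 x 6 disc containing a 3-row by 2-column cup
disc = [[1 if 1 <= r <= 6 and 1 <= c <= 6 else 0 for c in range(8)] for r in range(8)]
cup = [[1 if 2 <= r <= 4 and 3 <= c <= 4 else 0 for c in range(8)] for r in range(8)]

vcdr, acdr = cup_to_disc_ratios(disc, cup)
print(f"VCDR = {vcdr:.2f}, ACDR = {acdr:.3f}")  # VCDR = 0.50, ACDR = 0.167
```

In the toy example, the cup spans 3 of the disc's 6 rows and covers 6 of its 36 pixels, which gives the two ratios shown.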
On a balanced testing data set of 8,370 images, the MTL_LWBNA-unet model achieved:
- Accuracy: 96.28% (± 0.45)
- Area Under the Curve (AUC): 0.9904 (± 0.0014)
- Sensitivity: 95.16% (± 1.51)
- Specificity: 96.92% (± 0.74).
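For readers less familiar with these measures, the following Python sketch shows how accuracy, sensitivity, and specificity relate to the counts of correct and incorrect predictions. The function name `screening_metrics` and the counts are invented for illustration and are not taken from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute basic screening metrics from confusion-matrix counts.

    tp/fn: glaucoma images classified correctly/incorrectly
    tn/fp: normal images classified correctly/incorrectly
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one glaucoma and one normal image")
    return {
        "sensitivity": tp / (tp + fn),  # share of glaucoma images detected
        "specificity": tn / (tn + fp),  # share of normal images cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up balanced test set: 100 glaucoma and 100 normal images
metrics = screening_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

With these hypothetical counts, sensitivity is 90%, specificity is 95%, and accuracy is 92.5%.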
For glaucoma prediction using cupping-based FFCN parameters, sensitivity was lower (85.99%) but specificity remained high (93.39%).
On the real-world Miyagi Screening data set, sensitivity of the standalone binary classifier dropped to 56.52% (95% confidence interval [CI] = 52.18–60.58) at approximately 93.76% specificity. In contrast, the full AI-GS network achieved 80.53% sensitivity (95% CI = 77.04–83.82) and 91.12% specificity (95% CI = 88.87–93.56).
In detecting PPG, sensitivity at 95% specificity was 83% for the standalone deep learning classifier. With the AI-GS network, sensitivity increased to 87.6% with a GS-adjusted threshold strategy. Sensitivity further improved to 93% with adjusted AI-GS on the Miyagi Screening data set, which surpassed ophthalmologist screening.
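"Sensitivity at 95% specificity" means the decision threshold is chosen so that at least 95% of normal eyes are classified as normal, and sensitivity is then read off at that threshold. The Python sketch below shows one simple way to do this. The function name `sensitivity_at_specificity`, the rule "positive when score exceeds the threshold," and the scores are illustrative assumptions; this is not the study's GS-adjusted threshold strategy.

```python
def sensitivity_at_specificity(scores, labels, target_specificity=0.95):
    """Find the lowest threshold that reaches the target specificity.

    scores: model outputs, higher meaning more likely glaucoma
    labels: 1 for glaucoma, 0 for normal
    An image is called positive when its score is above the threshold.
    Returns (threshold, specificity, sensitivity).
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need both glaucoma and normal examples")

    # Specificity can only rise as the threshold rises, so the first
    # threshold that meets the target also gives the best sensitivity.
    for threshold in sorted(set(scores)):
        specificity = sum(s <= threshold for s in negatives) / len(negatives)
        if specificity >= target_specificity:
            sensitivity = sum(s > threshold for s in positives) / len(positives)
            return threshold, specificity, sensitivity


# Made-up scores: 20 normal eyes and 4 glaucomatous eyes
normal_scores = [0.1] * 18 + [0.5, 0.8]
glaucoma_scores = [0.3, 0.6, 0.9, 0.95]
scores = normal_scores + glaucoma_scores
labels = [0] * len(normal_scores) + [1] * len(glaucoma_scores)

threshold, spec, sens = sensitivity_at_specificity(scores, labels)
print(threshold, spec, sens)
```

In this toy data, a threshold of 0.5 clears 19 of 20 normal eyes (95% specificity) and still detects 3 of 4 glaucomatous eyes (75% sensitivity).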
AI-GS’s performance for detecting structural features was as follows:
DH
- Binary classifier AUC: 0.8723
- Accuracy: 75.49%
- Improved with segmentation input: AUC increased to 0.9678, sensitivity 87%, and specificity 94%.
RNFLD
- Best model AUC: 0.9933
- Accuracy: 95.97%
- Youden Index: 0.9182.
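The Youden Index summarizes a test's performance at a single operating point as sensitivity plus specificity minus 1, so that 0 indicates no discriminative value and 1 indicates perfect separation. The short Python sketch below applies that standard formula to the DH sensitivity and specificity listed above (87% and 94%). The resulting value is simple arithmetic for illustration and is not a figure reported in the study; the helper name `youden_index` is hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1 (inputs as fractions)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must be between 0 and 1")
    return sensitivity + specificity - 1.0


# DH detection with segmentation input: sensitivity 87%, specificity 94%
dh_j = youden_index(0.87, 0.94)
print(f"{dh_j:.2f}")  # 0.81
```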
Detection of structural changes led to more accurate classification of cases that the standalone model had missed: 83.9% (n = 193/230) of missed images were reclassified correctly by the adjusted AI-GS model, in alignment with expert review.
VCDR, ACDR, and NRRA normalized to disc area had the highest AUCs for glaucoma diagnosis (0.82–0.98), which was consistent with clinical findings.
The AI-GS network also revealed structural correlations and risk patterns:
- Myopic factor (disc ovality index) was higher in the Japanese data set, especially in normal-tension glaucoma cases, which suggested myopia as a potential risk factor.
- Disc size index differed by population: glaucoma cases in Rotterdam had larger discs, while Japanese cases had smaller discs.
- Inferior-temporal NRRA showed faster changes, particularly in primary open-angle glaucoma.
Regarding limitations, the researchers noted that some data sets included images labeled only by screening ophthalmologists, without visual field or optical coherence tomography confirmation, and that the normal category may have included PPG or other retinal diseases. Nonetheless, model performance was consistent across multiple data sets and device types.
The AI-GS system met the Prevent Blindness America recommendation of at least 85% sensitivity at 95% specificity for glaucoma screening. Its multimodel approach outperformed single binary classification, particularly in detecting PPG, DH, and RNFLD. Integration with telehealth platforms and portable devices is feasible because of the system’s low memory footprint, and “is particularly transformative for remote or underserved communities, where access to health care resources and specialized ophthalmologic services is limited,” the study authors wrote.
They concluded: “Unlike traditional methods, the AI-GS network performs both segmentation and classification tasks, utilizing a hybrid approach that combines black-box predictions with conventional glaucomatous optic disc feature analysis. This dual methodology allows for the early detection of glaucoma indicators such as DH and RNFLD, crucial for preventing irreversible vision loss … By making accurate glaucoma screening accessible to individuals at their convenience, the AI-GS network stands to make a profound impact on global health, offering a scalable solution to prevent the progression of glaucoma and the consequent loss of vision.”
The researchers suggested that future research explore additional models trained on other eye diseases to broaden screening capabilities.
No competing interests were declared.