Artificial intelligence models used to detect periodontal disease from intraoral photographs showed wide variability in accuracy across studies, according to a systematic review of 26 publications published between 2019 and 2025.
Seventeen studies used classification models, four used detection models, and five performed segmentation. Reported accuracy for classification ranged from 48% to 100%, detection accuracy ranged from 56% to 78%, and segmentation achieved Intersection over Union scores between 43% and 70%. Nearly half of the classification studies reported accuracy above 70%. Convolutional neural networks were the most commonly used models, and three studies tested the commercial software DentalMonitoring. Results from external validation were inconsistent; one study showed high agreement with a periodontist, while others reported low accuracy when applied to new populations.
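For readers unfamiliar with the segmentation metric cited above, Intersection over Union (IoU) measures the overlap between a model's predicted region and the reference region. A minimal sketch, assuming binary NumPy masks (the function name and example grids are illustrative, not drawn from the review):

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over Union for two binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, truth).sum()) / float(union)

# Illustrative example: two 2x2 regions overlapping in one cell of a 4x4 grid
a = np.zeros((4, 4)); a[0:2, 0:2] = 1
b = np.zeros((4, 4)); b[1:3, 1:3] = 1
print(round(iou(a, b), 3))  # 1 overlapping cell / 7 cells in the union -> 0.143
```

On this scale, the 43% to 70% range reported for segmentation studies means the predicted and reference regions shared roughly half to two-thirds of their combined area.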
Performance generally declined when models were tested on external data compared with internal training and validation sets. The review noted that this decline was consistent with data heterogeneity across populations and differences in image acquisition devices.
The studies were conducted in multiple countries, with the largest number originating in China, followed by India, the United States, Japan, and others. Sample sizes varied widely, ranging from 20 to 12,600. Imaging devices included professional single-lens reflex cameras, smartphones, intraoral cameras, and home-use devices. Ten studies used clinical examinations as reference standards, while 16 relied on visual assessments. Only eight studies involved multiple experts in image annotation. Most analyses used frontal intraoral views, with relatively few incorporating multiple perspectives.
Most studies focused on gingivitis, and only two assessed periodontitis. Those studies used probing depth values that did not fully align with current diagnostic standards. None of the included studies applied the 2018 Periodontal Classification comprehensively.
The review identified several limitations. Reporting quality was inconsistent, and reference standards varied widely, limiting comparability of findings. Demographic details, including ethnicity, were often missing, making it difficult to evaluate bias across populations. Data quality was another concern, with publicly available datasets showing low resolution, inadequate lighting, or poor focus. Only three studies provided open datasets, restricting reproducibility and independent validation.
The researchers concluded that while AI models demonstrated potential for detecting periodontal disease from intraoral photographs, their clinical applicability remained limited. Variability in accuracy, inconsistent use of reference standards, and weak reporting practices underscored the need for stronger methodology. They recommended that future research focus on improving study design, adopting gold-standard diagnostic criteria, ensuring higher-quality imaging, and conducting external validation to evaluate performance across diverse populations.
The authors reported no conflicts of interest.
Source: International Dental Journal