An interpretable artificial intelligence–based model may match the diagnostic accuracy of standard deep-learning approaches for distinguishing early-stage breast cancer from benign lesions on magnetic resonance imaging and could improve physician performance, according to a recent study.
In a retrospective, multicenter study, researchers evaluated a clinically interpretable concept bottleneck model (CBM) using data from five hospitals collected between January 2016 and July 2025. The analysis included 1,695 lesions from 1,634 patients with pathology-confirmed diagnoses of malignant or benign breast lesions. The model incorporated preoperative multiparametric magnetic resonance imaging (MRI) and radiologist-defined imaging features and required board-certified radiologists to identify lesion regions of interest. Performance was assessed in internal and external cohorts, and clinical utility was evaluated in a multireader study with eight radiologists.
The CBM achieved an area under the receiver operating characteristic curve (AUC) of 0.92 ± 0.01 in the test set, similar to a black-box model, which achieved an AUC of 0.93 ± 0.01. In external validation, the CBM maintained its high performance with an AUC of 0.93, overall accuracy of 86%, and precision of 89%. In the multireader study, the CBM demonstrated diagnostic accuracy of 89% in distinguishing benign lesions from malignancies and outperformed seven of the eight radiologists.
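For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy and precision (also called positive predictive value) are computed from classification counts. The counts are hypothetical placeholders chosen for illustration and are not taken from the study.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all lesions that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Share of lesions called malignant that were truly malignant (PPV)."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only, NOT data from the study:
# tp = malignant called malignant, fp = benign called malignant,
# tn = benign called benign,       fn = malignant called benign.
tp, fp, tn, fn = 40, 5, 45, 10

acc = accuracy(tp, fp, tn, fn)
ppv = precision(tp, fp)

print(f"accuracy: {acc:.1%}")
print(f"precision (PPV): {ppv:.1%}")
```

Accuracy counts errors in both directions, whereas precision reflects only false alarms among lesions called malignant, so the two can differ noticeably when benign and malignant lesions are unevenly represented.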
Radiologist accuracy ranged from 71% to 79% without assistance and from 77% to 91% with CBM assistance, and interpretation time decreased from 143 seconds to 104 seconds. Inter-reader agreement also improved, indicating more consistent classification and reporting.
The model also affected clinical decision-making. Among benign lesions initially classified as suspicious, radiologists correctly downgraded about 22% to benign when using the CBM. Improvements were greater among less experienced radiologists, whose accuracy increased from 74% to 88% and whose positive predictive value increased from 65% to 81%, compared with increases from 77% to 82% and from 69% to 74%, respectively, among more experienced radiologists.
The CBM generated intermediate predictions based on radiologist-style concepts, such as lesion margins and enhancement patterns, before estimating malignancy. Concept accuracy ranged from 64% to 100%. Multiparametric MRI improved performance compared with single-sequence approaches, which showed lower diagnostic accuracy.
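The two-stage idea behind a concept bottleneck model can be sketched as follows. This is an illustrative toy, not the study's implementation: the concept names, feature values, weights, and the use of simple logistic layers are all assumptions made for the example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical concept names, loosely modeled on radiologist-style descriptors.
CONCEPTS = ["irregular_margin", "heterogeneous_enhancement"]


def predict_concepts(features, concept_weights, concept_biases):
    """Stage 1: map image-derived features to one probability per concept."""
    return [
        sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
        for weights, bias in zip(concept_weights, concept_biases)
    ]


def predict_malignancy(concept_probs, label_weights, label_bias):
    """Stage 2: estimate malignancy from the concept probabilities only."""
    score = sum(w * c for w, c in zip(label_weights, concept_probs))
    return sigmoid(score + label_bias)


# All numbers below are made up for illustration (assumed, not from the study).
features = [0.8, -0.3, 1.2]
concept_weights = [[1.5, 0.0, 0.5], [0.2, -1.0, 1.0]]
concept_biases = [-0.5, 0.0]
label_weights = [2.0, 1.5]
label_bias = -1.5

concepts = predict_concepts(features, concept_weights, concept_biases)
risk = predict_malignancy(concepts, label_weights, label_bias)

for name, p in zip(CONCEPTS, concepts):
    print(f"{name}: {p:.2f}")
print(f"malignancy probability: {risk:.2f}")
```

Because the final estimate depends only on the concept probabilities, a reader can inspect which concepts drove a prediction, which is what distinguishes this design from a black-box model that maps images directly to a label.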
The study has several limitations. It was retrospective and used a simulated reading setting, which may not reflect real-world workflows. The model depends on physician-provided lesion localization and was not evaluated for screening use. External validation was limited, and performance in broader clinical populations remains uncertain. The model may also misclassify small or atypical lesions that lack characteristic imaging features.
The findings suggest that interpretable artificial intelligence–based models can achieve high diagnostic performance while aligning with clinical reasoning and workflow.
“The CBM provides a versatile framework for classifying early breast cancer and benign lesions,” wrote Jiao Qu, of the Department of Radiology at West China Hospital at Sichuan University, and colleagues.
The study authors reported no competing interests.
Source: BMC Medicine