On an internal test set, a weakly supervised deep learning model achieved perfect classification of infiltrative basal cell carcinoma and poorly differentiated cutaneous squamous cell carcinoma on whole-slide histopathology images, with discrimination remaining high in external cohorts, according to a study published in The Journal of Pathology: Clinical Research.
Researchers developed a multiple-instance learning model using clustering-constrained attention with feature extraction from a vision transformer (Phikon) to classify whole-slide images without region-level annotations. The data set included 335 hematoxylin and eosin–stained slides collected between 2021 and 2023, with diagnoses confirmed by two board-certified dermatopathologists. Of these, 25% (n = 84) were reserved as an internal test set, and two external cohorts were used for validation.
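The study's exact architecture (clustering-constrained attention over Phikon vision-transformer features) is not reproduced here, but the core idea of weakly supervised multiple-instance learning can be sketched briefly: each slide is a bag of tile features, an attention module weights the tiles, and only a single slide-level label supervises training. The weights, feature dimensions, and tile counts below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def mil_attention_pool(tile_features, w_attn, w_clf):
    """Attention-based MIL pooling: score each tile, normalize the
    scores to attention weights, pool tiles into one slide embedding,
    then classify the slide (e.g., BCC vs. cSCC) with no tile labels."""
    scores = tile_features @ w_attn        # one raw score per tile
    attn = softmax(scores)                 # attention weights sum to 1
    slide_embedding = attn @ tile_features # attention-weighted average
    logit = slide_embedding @ w_clf
    prob = 1.0 / (1.0 + np.exp(-logit))    # slide-level class probability
    return prob, attn

# toy slide: 50 tiles with 768-dim features (a common ViT embedding size;
# the actual Phikon feature dimension is an assumption here)
tiles = rng.normal(size=(50, 768))
w_attn = rng.normal(size=768) * 0.01       # made-up "trained" weights
w_clf = rng.normal(size=768) * 0.01
prob, attn = mil_attention_pool(tiles, w_attn, w_clf)
print(attn.shape, round(float(attn.sum()), 6))
```

Because supervision is slide-level only, the attention weights are what later make heatmap visualization possible: high-weight tiles are the ones the model relied on for its prediction.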
On the internal test set, the model achieved an area under the curve (AUC) of 1.0, with 100% accuracy, sensitivity, and specificity. Prediction scores exceeded 0.95 for all but one whole-slide image. This cutaneous squamous cell carcinoma (cSCC) case, previously identified as an outlier, had a prediction score of 0.85 but was correctly classified.
In the Queensland cohort (n = 10), the model achieved an AUC of 1.0. At a fixed threshold of 0.5, accuracy was 90%, and at the Youden threshold all cases were correctly classified. In the COBRA cohort (n = 200), the model achieved an AUC of 0.92. At the same fixed threshold of 0.5, accuracy was 52%, with sensitivity of 100% and specificity of 4%. After threshold adjustment using Youden’s J statistic, accuracy improved to 87%, with sensitivity of 79% and specificity of 94%.
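Youden's J statistic picks the cutoff that maximizes sensitivity + specificity − 1 on the ROC curve, which is why it can rescue a model whose scores are shifted relative to a fixed 0.5 threshold. A minimal sketch, with invented scores that mimic such a calibration shift (all scores sit above 0.5, so the fixed cutoff calls everything positive):

```python
import numpy as np

def youden_threshold(y_true, y_score):
    """Return the threshold maximizing Youden's J
    (sensitivity + specificity - 1) over all observed score cutoffs."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    best_t, best_j = 0.5, -1.0
    for t in np.unique(y_score):
        pred = y_score >= t
        tp = np.sum(pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return float(best_t), float(best_j)

# invented scores: positives and negatives are separable, but every
# score exceeds 0.5, so the fixed threshold yields 100% sensitivity
# and 0% specificity until the cutoff is moved
y_true  = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.97, 0.95, 0.92, 0.88, 0.74, 0.70, 0.66, 0.60]
t, j = youden_threshold(y_true, y_score)
print(t, j)  # best cutoff 0.88 separates the two classes perfectly
```

This is the same mechanism the authors describe for the COBRA cohort: the ranking (AUC 0.92) was good, but the probability scale was shifted, so re-choosing the cutoff restored specificity.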
Attention heatmaps showed that tiles with the highest attention scores were localized within tumor regions. In cSCC, high-attention tiles included infiltrative atypical squamoid cells with desmoplastic stroma, while in basal cell carcinoma (BCC), attention concentrated within infiltrative tumor strands containing basaloid cells with mitotic figures.
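Heatmaps of this kind are typically built by mapping each tile's attention weight back to its position on the slide grid. The grid size, tile indices, and attention values below are hypothetical, purely to show the mechanics:

```python
import numpy as np

# hypothetical slide tiled into a 4 x 5 grid; per-tile attention
# weights come from a MIL model (values invented for illustration)
grid_h, grid_w = 4, 5
attn = np.zeros(grid_h * grid_w)
tumor_tiles = [6, 7, 11, 12]                # indices of tumor-region tiles
attn[tumor_tiles] = [0.3, 0.25, 0.25, 0.2]  # attention mass on tumor

heatmap = attn.reshape(grid_h, grid_w)      # overlay this on the slide image
top4 = sorted(np.argsort(attn)[::-1][:4].tolist())
print(top4)  # highest-attention tiles coincide with the tumor region
```

In the study, this localization was the qualitative check: high-attention tiles fell on infiltrative atypical squamoid cells in cSCC and on infiltrative basaloid tumor strands in BCC.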
Researchers also evaluated a dermatopathology foundation model (HistoGPT). In zero-shot testing, overall accuracy was 77%, with sensitivity of 98% for BCC and 33% for cSCC. After fine-tuning on the in-house data set, the model achieved an AUC of 1.0. At the Youden threshold, accuracy was 98%, with sensitivity of 96% and specificity of 98%.
The researchers reported a calibration shift in the external COBRA cohort, where performance at a fixed threshold showed high sensitivity and low specificity and improved after threshold adjustment. These differences likely stemmed from variation in diagnostic subtypes, image resolution, and file formats across data sets. The in-house data set was monocentric, and the external cohorts were limited in size. Detailed subtype annotations were not available for the COBRA cohort.
“These findings demonstrate that weakly supervised deep learning enables highly accurate classification of diagnostically challenging basal cell carcinoma and [cutaneous squamous cell carcinoma] subtypes,” wrote lead study author Anne Petzold of Friedrich-Alexander-Universität Erlangen-Nürnberg in Germany and colleagues, noting that “reliable deployment across institutions necessitates careful calibration and domain adaptation.”
Full disclosures are available in the original study.