A deep learning model called SuRImage was evaluated in a prospective multicenter diagnostic study using smartphone photographs of surgical lung specimens captured under natural lighting conditions in operating rooms. Compared with final pathology, SuRImage demonstrated moderate-to-good discrimination across diagnostic and grading tasks and generally outperformed frozen section analysis in the study’s predefined comparisons. However, the study did not test whether model use changed surgical decisions or improved patient outcomes.
Intraoperative assessment can help guide the extent of resection and lymph node evaluation in patients with clinical stage IA lung adenocarcinoma. Frozen section analysis is commonly used for this purpose but can take 30 minutes or longer and can be limited by tissue sampling, freezing artifacts, and incomplete assessment of the full tumor section.
The study enrolled 1,727 patients with 1,854 nodules and included 2,910 surgical resection images from 3 hospitals in China between June 2020 and September 2023. Patients had indeterminate lung nodules classified as clinical stage IA lung adenocarcinoma on thin-slice computed tomography within 30 days before surgery.
The Guangdong Provincial People’s Hospital cohort included 1,529 patients, 1,638 nodules, and 2,344 images and was used for model development, validation, and internal testing. External test cohorts included 116 patients with 127 nodules and 307 images at the Affiliated Hospital of Guangdong Medical University and 82 patients with 89 nodules and 259 images at Meizhou People’s Hospital.
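The cohort figures can be cross-checked with simple arithmetic. The sketch below sums the three per-hospital counts reported above and compares them with the study totals; the variable names are illustrative, and the external-cohort share is derived here rather than reported in the source.

```python
# Per-hospital counts as reported in the article: (patients, nodules, images).
cohorts = {
    "Guangdong Provincial People's Hospital": (1529, 1638, 2344),
    "Affiliated Hospital of Guangdong Medical University": (116, 127, 307),
    "Meizhou People's Hospital": (82, 89, 259),
}

# Column-wise sums give the study-wide totals.
totals = tuple(sum(column) for column in zip(*cohorts.values()))
print(totals)  # (1727, 1854, 2910)

# Share of patients who came from the two external test cohorts
# (a derived figure, not one stated in the article).
external_patients = 116 + 82
external_share = external_patients / totals[0]
print(f"External cohorts: {external_share:.1%} of patients")
```

The totals match the 1,727 patients, 1,854 nodules, and 2,910 images cited for the study as a whole.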
SuRImage was built around smartphone photographs of resected specimens and incorporated clinical information into the final prediction. The model was developed for 3 intraoperative tasks: binary identification of invasive lung adenocarcinoma vs preinvasive lesions; diagnosis of adenocarcinoma in situ, minimally invasive adenocarcinoma, and invasive lung adenocarcinoma; and multiclass grading of adenocarcinoma in situ, minimally invasive adenocarcinoma, and International Association for the Study of Lung Cancer (IASLC) grade 1, 2, and 3 invasive lung adenocarcinoma.
In the Guangdong internal test cohort, SuRImage achieved areas under the curve of 0.84 for invasive lung adenocarcinoma identification, 0.87 for diagnosis, and 0.85 for grading. In the 2 external test cohorts, the model achieved areas under the curve of 0.90 and 0.83 for identification, 0.88 and 0.84 for diagnosis, and 0.78 and 0.82 for grading.
Performance was not uniform across subgroups. Grade 1 invasive adenocarcinoma, a clinically important category because it may support less extensive surgery in selected patients, had lower discrimination than the overall grading task. SuRImage’s grade 1 areas under the curve were 0.66 in the Guangdong internal test cohort, 0.68 in one external test cohort, and 0.52 in the other.
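For readers less familiar with the metric, an area under the curve can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, so 0.5 corresponds to chance and 1.0 to perfect ranking. The following is a minimal sketch of that pairwise definition using made-up scores; it is not the study's evaluation code, and the article does not state how the multiclass areas under the curve were computed.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores, invented purely for illustration.
toy_positive = [0.9, 0.8, 0.4]
toy_negative = [0.7, 0.3, 0.2]

toy_auc = pairwise_auc(toy_positive, toy_negative)
print(round(toy_auc, 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

On this reading, a grade 1 value of 0.52 means the model ranked grade 1 cases above other cases only slightly more often than a coin flip would.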
For the binary identification task, SuRImage achieved 97% sensitivity, 92% specificity, 95% precision, 95% accuracy, and an F1 score of 0.96. Frozen section analysis achieved 93% sensitivity, 92% specificity, 85% precision, 93% accuracy, and an F1 score of 0.89.
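The F1 scores above follow from the reported precision and sensitivity, since F1 is the harmonic mean of the two. The sketch below recomputes them from the rounded percentages in the text; the study presumably worked from unrounded counts, so agreement to two decimals is expected but not guaranteed in general.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Rounded values as reported for the binary identification task.
surimage_f1 = f1_score(precision=0.95, sensitivity=0.97)
frozen_f1 = f1_score(precision=0.85, sensitivity=0.93)

print(round(surimage_f1, 2))  # 0.96
print(round(frozen_f1, 2))    # 0.89
```

With specificity equal at 92%, the gap in F1 between the two methods is driven mainly by the 10-point difference in precision.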
For the 3-category diagnostic task, SuRImage achieved an average accuracy of 92% and an F1 score of 0.88, compared with 85% and 0.75, respectively, for frozen section analysis. Frozen section analysis performed similarly to the model for adenocarcinoma in situ and invasive lung adenocarcinoma but was less accurate for minimally invasive adenocarcinoma, with 80% accuracy compared with 93% for SuRImage.
The largest apparent difference was seen in the multiclass grading task. SuRImage achieved an average accuracy of 89% and an F1 score of 0.84, compared with 65% and 0.47 for frozen section analysis. That comparison should be interpreted cautiously because IASLC grading by frozen section was feasible for only 55 of 1,638 nodules in the Guangdong cohort, and detailed grading is not commonly performed with frozen section analysis in routine practice. The grading result therefore reflects the small subset of cases in which detailed frozen section grading was attempted rather than a conventional head-to-head comparison with routine frozen section practice.
In the Guangdong cohort, frozen section analysis had an overall concordance rate of 87% compared with final pathology. Among the 55 gradable nodules, frozen section grading accuracy was 62% overall and 14% for grade 1 nodules, underscoring the difficulty of intraoperative grading from limited frozen section samples.
Frozen section reports were ambiguous for 320 of 1,638 nodules in the Guangdong cohort. Among the 1,318 nodules with clear frozen section reports, 1,123 were correctly diagnosed, 157 were underestimated, and 38 were overestimated compared with final pathology. The researchers reported that SuRImage maintained robust diagnostic performance among patients with ambiguous or incorrect frozen section findings.
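The frozen section counts can be turned into proportions to make the breakdown easier to compare. The sketch below uses only the counts given above; the percentages are derived here rather than reported in the source, and they use different denominators from other figures in the article, so they need not match them.

```python
# Counts as reported for the Guangdong cohort.
total_nodules = 1638
ambiguous = 320
correct, underestimated, overestimated = 1123, 157, 38

# Nodules with a clear frozen section report.
clear = total_nodules - ambiguous
assert correct + underestimated + overestimated == clear  # 1318

# Shares among clear reports (derived, not reported in the article).
shares = {
    "correct": correct / clear,
    "underestimated": underestimated / clear,
    "overestimated": overestimated / clear,
}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
# correct: 85.2%
# underestimated: 11.9%
# overestimated: 2.9%

print(f"ambiguous: {ambiguous / total_nodules:.1%} of all nodules")  # 19.5%
```

Among the clear reports that disagreed with final pathology, underestimation was roughly four times as common as overestimation (157 vs 38 nodules).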
The researchers also evaluated SuRImage assistance among 8 thoracic surgeons, including 4 junior and 4 senior surgeons. With model assistance, junior surgeons achieved higher diagnostic accuracy than unassisted senior surgeons across all 3 diagnostic tasks. Average grading accuracy among the surgeons improved from 64% without assistance to 73% with assistance. However, the reader-assistance results should not be read as evidence that SuRImage improves intraoperative management or patient outcomes.
The study had several limitations. SuRImage was developed specifically for clinical stage IA lung adenocarcinoma and was not designed to classify all lung nodules or other lung cancer histologies. Some tumor grades, particularly adenocarcinoma in situ and grade 1 invasive lung adenocarcinoma, were underrepresented, and the external validation cohorts were relatively small. Real-world feasibility remains uncertain because patients whose tumor section images could not be acquired or were of poor quality were excluded. In addition, the model did not comprehensively incorporate the full range of variables used in surgical decision-making, including patient health status, age, medical history, patient preferences, preoperative radiomic features, and intraoperative clinical judgment.
The researchers framed SuRImage as the first deep learning model based on macroscopic surgical resection images for intraoperative assessment of clinical stage IA lung adenocarcinoma. That claim does not extend to all forms of artificial intelligence–assisted intraoperative diagnosis.
The findings are promising, but SuRImage remains investigational pending broader validation and studies assessing whether model-assisted interpretation improves intraoperative decision-making or patient outcomes.
Disclosures: The researchers declared no competing interests.
Source: The Lancet Digital Health