Japanese investigators developed and validated a new scoring system to differentiate complicated from uncomplicated appendicitis, according to a recent study.
In the study, published in Scientific Reports, the investigators developed a scoring system that incorporated clinical and radiological factors using data from 199 patients with acute appendicitis who underwent immediate surgical treatment. They then validated the system externally on a separate cohort of 100 patients.
The investigators defined complicated appendicitis as gangrenous appendicitis, perforated appendicitis, or appendicitis with abscess. Uncomplicated cases included catarrhal and phlegmonous appendicitis.
In the development cohort of 199 patients, 105 (52.8%) had complicated appendicitis. The median age was 37 years for uncomplicated cases and 53 years for complicated cases, and the median C-reactive protein levels were 9.4 mg/L and 73.9 mg/L, respectively.
In the external validation group of 100 patients, 59 had complicated appendicitis. The median age was 34 years for uncomplicated cases and 47 years for complicated cases, and the median C-reactive protein levels were 14.0 mg/L and 87.0 mg/L, respectively.
The final scoring system comprised six independent predictors:
- Age > 47 years (3 points)
- Body temperature > 37.2°C (2 points)
- C-reactive protein level (0 to 7 points based on ranges)
- Maximum diameter of appendix or abscess (0 to 3 points based on ranges)
- Presence of appendicolith (3 points)
- Periappendiceal fat stranding grade (0 to 7 points based on severity).
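As a rough illustration of how the six components listed above could be tallied, the sketch below sums them into a total. The function names and parameters are hypothetical, not the authors' code. Because the exact ranges for C-reactive protein, diameter, and fat stranding are not given here, those three components are passed in as points already assigned, and treating a total of 12 or more as suggestive of complicated appendicitis is an assumption about how the reported cutoff is applied.

```python
CUTOFF = 12  # cutoff reported in the study; the ">=" direction is an assumption


def appendicitis_score(age_years, temp_c, crp_points, diameter_points,
                       has_appendicolith, fat_stranding_points):
    """Sum the six components (hypothetical helper for illustration only)."""
    # Range-based components: the caller supplies points already assigned,
    # because the underlying ranges are not listed in this summary.
    for name, value, upper in (
        ("crp_points", crp_points, 7),
        ("diameter_points", diameter_points, 3),
        ("fat_stranding_points", fat_stranding_points, 7),
    ):
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}")

    total = crp_points + diameter_points + fat_stranding_points
    total += 3 if age_years > 47 else 0       # age > 47 years: 3 points
    total += 2 if temp_c > 37.2 else 0        # body temperature > 37.2 C: 2 points
    total += 3 if has_appendicolith else 0    # appendicolith present: 3 points
    return total


def suggests_complicated(total):
    """Apply the cutoff (assumed to mean total >= 12)."""
    return total >= CUTOFF
```

Under these assumptions, the maximum possible total is 25 points (3 + 2 + 7 + 3 + 3 + 7).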
The investigators found that the scoring system demonstrated good diagnostic accuracy in both development and external validation cohorts.
In the development cohort, the area under the receiver operating characteristic curve (AUC) was 0.882 (95% confidence interval [CI] = 0.835–0.929). Using a cutoff of 12 points, the system achieved a sensitivity of 82.9% and specificity of 86.2%. The positive likelihood ratio was 6.01, negative likelihood ratio 0.20, and diagnostic odds ratio 30.3.
External validation in 100 patients yielded an AUC of 0.868 (95% CI = 0.794–0.942), which did not differ significantly from the AUC in the development cohort (P = .750). At the 12-point cutoff, sensitivity and specificity in the validation group were 86.4% and 78.0%, respectively. The positive likelihood ratio was 3.93, negative likelihood ratio 0.17, and diagnostic odds ratio 23.1.
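The likelihood ratios and diagnostic odds ratios reported for both cohorts follow from sensitivity and specificity by standard definitions. The short sketch below recomputes them from the rounded percentages quoted above; the function name is illustrative, and small differences from the published figures can arise from rounding.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, diagnostic odds ratio) from proportions in [0, 1]."""
    lr_pos = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    lr_neg = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return lr_pos, lr_neg, lr_pos / lr_neg       # DOR = LR+ / LR-


dev = likelihood_ratios(0.829, 0.862)  # development cohort, 12-point cutoff
val = likelihood_ratios(0.864, 0.780)  # external validation cohort

print([round(x, 2) for x in dev])
print([round(x, 2) for x in val])
```

Recomputed this way, the validation-cohort diagnostic odds ratio comes out near 22.5; the reported 23.1 is consistent with dividing the rounded likelihood ratios (3.93 / 0.17).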
Computed tomography (CT) findings in the model development group showed:
- Appendicolith present in 40/94 of uncomplicated and 69/105 of complicated cases
- Abscess present in 0/94 of uncomplicated and 18/105 of complicated cases
- Extraluminal air present in 0/94 of uncomplicated and 17/105 of complicated cases
- Ileus present in 6/94 of uncomplicated and 18/105 of complicated cases
- Ascites present in 7/94 of uncomplicated and 20/105 of complicated cases.
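To make the counts above easier to compare across groups of different size, the following sketch converts them to percentages. The dictionary layout and function name are illustrative; the counts and denominators (94 uncomplicated, 105 complicated) are those listed above.

```python
N_UNCOMPLICATED, N_COMPLICATED = 94, 105

# finding -> (count in uncomplicated cases, count in complicated cases)
ct_findings = {
    "appendicolith": (40, 69),
    "abscess": (0, 18),
    "extraluminal air": (0, 17),
    "ileus": (6, 18),
    "ascites": (7, 20),
}


def as_percentages(findings):
    """Return finding -> (% of uncomplicated, % of complicated), one decimal."""
    return {
        name: (round(100.0 * unc / N_UNCOMPLICATED, 1),
               round(100.0 * comp / N_COMPLICATED, 1))
        for name, (unc, comp) in findings.items()
    }


for name, (pct_unc, pct_comp) in as_percentages(ct_findings).items():
    print(f"{name}: {pct_unc}% uncomplicated vs {pct_comp}% complicated")
```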
Periappendiceal fat stranding grades for uncomplicated cases were: Grade 0 (53), Grade 1 (22), Grade 2 (15), and Grade 3 (4). Periappendiceal fat stranding grades for complicated cases were: Grade 0 (10), Grade 1 (31), Grade 2 (38), and Grade 3 (26).
The logistic regression model yielded the following odds ratios (OR):
- Age > 47 years: OR = 2.825 (95% CI = 1.278–6.242, P = .010)
- Body temperature > 37.2°C: OR = 2.230 (95% CI = 1.037–4.792, P = .040)
- C-reactive protein > 70.0 mg/L: OR = 7.182 (95% CI = 2.587–19.937, P < .001)
- Maximum diameter > 11.0 mm: OR = 3.273 (95% CI = 1.253–8.546, P = .015)
- Presence of appendicolith: OR = 3.064 (95% CI = 1.387–6.770, P = .006)
- Periappendiceal fat stranding grade 3: OR = 6.521 (95% CI = 1.465–29.020, P = .014).
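An odds ratio and its 95% CI from a logistic regression can be translated back into a coefficient, standard error, and approximate P value. The sketch below does this for the age predictor, assuming the reported intervals are Wald intervals (symmetric on the log-odds scale); that assumption, and the function name, are not taken from the study.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back out (beta, SE, z, two-sided P) from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta +/- z_crit * SE).
    """
    beta = math.log(odds_ratio)                                # log-odds coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se                                              # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))                     # two-sided normal tail
    return beta, se, z, p


beta, se, z, p = wald_from_or(2.825, 1.278, 6.242)  # age > 47 years
print(f"beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, P={p:.3f}")
```

For the age predictor this gives a P value of roughly .010, in line with the reported figure, which suggests the Wald assumption is reasonable here.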
Internal validation results showed a model accuracy of 0.799 and Cohen's kappa of 0.595 for the training data set. The test data set yielded a model accuracy of 0.763 and Cohen's kappa of 0.521.
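Cohen's kappa measures agreement between predicted and actual classes after discounting the agreement expected by chance, which is why it is lower than raw accuracy. A minimal sketch for a 2 × 2 confusion matrix follows; the function name is illustrative, and the inputs shown are sanity checks rather than study data.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # raw agreement, i.e. accuracy
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    # Undefined (division by zero) if every case falls in one class.
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


print(accuracy_and_kappa(50, 0, 0, 50))     # perfect agreement
print(accuracy_and_kappa(25, 25, 25, 25))   # agreement no better than chance
```

Perfect agreement yields a kappa of 1, and chance-level agreement yields 0, so the reported values of 0.595 and 0.521 sit between the two.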
The investigators compared their scoring system to a radiological assessment using six CT findings (appendicolith, ileus, abscess, extraluminal air, ascites, and fat stranding ≥ grade 2). The radiological approach showed high sensitivity (91.4%) but low specificity (40.4%). There was a statistically significant difference in diagnostic performance between the radiological assessment and the new scoring system (P < .001). The two approaches reached the same diagnosis in 139 patients (69.8%).
The Hosmer-Lemeshow test showed no statistically significant lack of fit for the scoring system in either the development (P = .478) or the external validation (P = .125) cohort.
The investigators retrospectively reviewed records of 299 patients aged ≥ 15 years who underwent emergency surgery for histologically confirmed acute appendicitis between January 2009 and September 2023. Pregnant patients and those who had interval appendectomy after antibiotic therapy were excluded.
CT imaging was used to assess the presence of appendicolith, ileus, abscess, extraluminal air, and ascites, and to measure the maximum appendix diameter. Periappendiceal fat stranding was graded on a 0 to 3 scale.
Statistical analysis involved restricted cubic spline analysis to investigate relationships between continuous variables and complicated appendicitis. Multivariate logistic regression with stepwise backward elimination identified independent predictors. The model underwent internal validation using 10-fold cross-validation.
Limitations of the study included its retrospective design, small sample size, and potential bias caused by differences in patient characteristics between the development and validation cohorts.
The authors declared no competing interests.