Researchers developed an artificial intelligence system that predicts the occurrence of more than 1,000 diseases by analyzing patients' medical records over time. In internal validation, the model achieved an average area under the receiver operating characteristic curve (AUC) of about 0.76 across diagnoses and 0.97 for predicting death. For 97% of diseases, the AUC exceeded 0.5, the value expected from random guessing.
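For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen patient who develops a disease receives a higher risk score than a randomly chosen patient who does not. The sketch below illustrates that definition on made-up toy data; the function name `roc_auc` and all labels and scores are illustrative assumptions, not the study's code or results.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: the fraction of (case, non-case) pairs in which the
    case has the higher score, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed the disease, 0 = did not.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]

if __name__ == "__main__":
    print(f"toy AUC = {roc_auc(toy_labels, toy_scores):.2f}")
```

A model that assigns every patient the same score gets an AUC of 0.5 under this definition, which is why 0.5 serves as the chance-level reference.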
The system, called Delphi-2M, was trained on UK Biobank (UKB) records from about 403,000 participants and externally evaluated, without changing parameters, on 1.93 million Danish patients. Predictive accuracy largely held up in external validation (average AUC ~0.67 vs. ~0.69 in UKB longitudinal testing), and disease-wise performance was highly correlated across data sets, indicating that the model generalizes well across health-care systems. The model drew on patients' diagnosis histories along with basic demographic and lifestyle factors such as sex, body mass index, smoking, and alcohol use.
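One way to check whether disease-wise performance carries over between cohorts is to correlate the per-disease AUCs obtained in each. The sketch below computes a Pearson correlation for that purpose; the AUC values are invented placeholders, the names are hypothetical, and the article does not state which correlation measure the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-disease AUCs (NOT values from the study), one entry per disease.
auc_training_cohort = [0.60, 0.70, 0.80, 0.90]
auc_external_cohort = [0.58, 0.66, 0.79, 0.85]

if __name__ == "__main__":
    r = pearson(auc_training_cohort, auc_external_cohort)
    print(f"correlation of per-disease AUCs: r = {r:.2f}")
```

A high correlation here would mean that diseases that are easy to predict in one cohort also tend to be easy to predict in the other, even if absolute AUCs drop slightly.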
Performance varied across disease categories. Mental-health conditions formed clusters that persisted over time, whereas pregnancy-related clusters faded by about 10 years, as expected for time-limited conditions.
The system performed comparably to established clinical risk scores for cardiovascular disease and dementia, but it underperformed single biomarkers such as HbA1c for diabetes prediction, underscoring the continued importance of specific laboratory measures in some settings.
Beyond prediction, Delphi-2M can generate synthetic health trajectories extending up to 20 years into the future. Starting from age 60, the model recapitulated population-level disease incidences at ages 70–75, with ~17% of diagnoses correctly predicted in the first simulation year compared with ~12–13% using age and sex alone.
Using SHAP (SHapley Additive exPlanations) analyses, the authors reported patterns in disease progression and mortality risk: cancers had sustained effects on mortality, whereas the effects of septicemia and myocardial infarction regressed toward baseline within several years. The model can also learn systematic biases from its data sources, for example predicting higher rates for conditions primarily diagnosed in hospital settings.
Training exclusively on synthetic data produced similar results, with average AUC reduced by only ~3 percentage points, suggesting potential for privacy-preserving development.
Limitations include potential immortality bias from UKB recruitment of living participants aged 40–70, limited coverage beyond age 80, and learned patterns of record missingness that may not generalize across health systems.
Disclosures: The researchers report a patent filing related to generative transformer architectures and additional interests; see the article for the full list.
Source: Nature