Disease-associated genetic variants may alter tissue-specific protein isoforms that are absent from reference transcript annotations. Researchers found that analyses incorporating alternative transcript isoforms identified protein-altering variants overlooked by conventional reference transcript analyses, and they experimentally demonstrated functional effects of one such variant, associated with severe COVID-19 and pulmonary fibrosis, in a lung-specific DPP9 isoform.
To determine whether disease-associated variants exert coding effects through alternative transcript isoforms, the researchers integrated long-read RNA sequencing data from multiple human tissue atlases with population variation from the Genome Aggregation Database, disease-associated variants from the Genome-Wide Association Studies Catalog and ClinVar, protein structure prediction, and tissue-specific expression analyses across 22 human tissues. They validated selected findings using targeted long-read RNA sequencing, proteomics, enzymatic assays, and cell-based experiments. The primary objective was to identify protein-altering variants that are missed when genetic variants are interpreted using only reference transcripts.
The researchers found that alternative isoform-specific exons harbored a greater burden of genetic variation than reference exons and were more likely to contain missense mutations. They identified approximately 40,000 Genome-Wide Association Studies Catalog variants mapping to annotated alternative exons compared with approximately 24,000 mapping to reference exons. Additionally, approximately 80% of alternative transcripts containing disease-associated variants were not represented in current reference genome annotations, and the variant-containing alternative isoforms generally demonstrated greater tissue specificity than their matched reference isoforms.
The researchers also reported that computational analyses prioritized multiple missense variants predicted to alter protein structure or stability in alternative isoforms. Predicted structural effects were greatest among known pathogenic ClinVar variants, whereas variants of uncertain significance generally showed intermediate predicted effects and common variants from genome-wide association studies showed smaller predicted effects. The investigators emphasized that these computational approaches were used to prioritize candidate variants for further study rather than to establish pathogenicity.
To evaluate whether their computational framework identified biologically relevant variants, the researchers examined a common DPP9 variant previously associated with severe COVID-19 and pulmonary fibrosis. Although the variant has generally been annotated as intronic, they found that it resides within an unannotated alternative first exon of a lung epithelial-specific coding transcript identified through long-read sequencing. Targeted long-read capture sequencing confirmed expression of the full-length transcript in multiple lung epithelial cell lines.
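The reannotation idea behind this finding can be illustrated with a minimal sketch: the same genomic position is classified against two exon sets, one for a reference transcript and one for an alternative isoform with a different first exon. This is not the researchers' pipeline. The `Exon` class, the `classify` function, and all coordinates are hypothetical and chosen only for illustration; they are not the real DPP9 coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int    # 1-based, inclusive (hypothetical coordinate)
    end: int      # 1-based, inclusive (hypothetical coordinate)
    coding: bool  # True if the exon contributes to the protein sequence


def classify(pos: int, exons: list[Exon]) -> str:
    """Classify a position relative to one transcript's exons."""
    for exon in exons:
        if exon.start <= pos <= exon.end:
            return "coding exon" if exon.coding else "non-coding exon"
    # Not in any exon: intronic if it lies within the transcript span.
    if exons and min(e.start for e in exons) <= pos <= max(e.end for e in exons):
        return "intronic"
    return "outside transcript"


# Hypothetical transcripts sharing two downstream exons but differing
# in their first exon, as in an alternative-first-exon isoform.
reference = [Exon(100, 200, True), Exon(500, 600, True), Exon(900, 1000, True)]
alternative = [Exon(300, 380, True), Exon(500, 600, True), Exon(900, 1000, True)]

variant_pos = 350  # hypothetical variant position

ref_call = classify(variant_pos, reference)
alt_call = classify(variant_pos, alternative)
print(f"reference transcript:   {ref_call}")
print(f"alternative transcript: {alt_call}")
```

Under these assumed coordinates, the position falls between exons of the reference transcript but inside the first exon of the alternative one, which is the kind of discrepancy the study describes for the DPP9 variant. A real analysis would also need strand, reading frame, and codon-level translation to call a missense change.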
The researchers found that the variant produced a leucine-to-proline substitution within an alternative amino-terminal region unique to the lung-specific isoform. Structural modeling predicted disruption of an alpha helix, and biochemical experiments showed altered enzymatic properties, including increased affinity for the dipeptidyl peptidase inhibitor Val-boroPro and modestly increased affinity for one peptide substrate. Co-immunoprecipitation experiments demonstrated that both the wild-type and variant alternative isoforms retained interaction with NLRP1, supporting the predicted structural model while indicating that the amino acid substitution altered protein properties without abolishing binding.
The researchers noted several limitations. Most variant effects were based on computational predictions rather than experimental validation, with DPP9 serving as the principal functional example. The transcriptomic data were derived primarily from healthy tissues rather than disease-specific samples, the analysis focused largely on missense variants, and many alternative transcripts are predicted to be nonfunctional or undergo nonsense-mediated decay. They also noted that disease-specific long-read transcriptomic data and single-cell full-length transcriptomes may further refine variant interpretation in future studies.
Overall, the findings suggest that evaluating genetic variants within tissue-specific alternative transcript isoforms may identify protein-coding consequences that are overlooked when analyses rely exclusively on reference transcripts.
"Assessment of both common and rare disease-associated variants in the context of isoform-specific effects, often through tissue-specific alternative isoforms, will help explain genetic contributions to human disease," wrote lead study author Giovanna Weykopf, of the MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, and colleagues.
Disclosures: The study received funding from public and institutional research organizations, including the UK Medical Research Council and the European Research Council. The researchers reported no competing interests.
Source: Nature Communications