Proteome-Wide AI Model Supports Rare Disease Diagnosis Using Evolution
DNA double helix
Credit: TanyaJoy / iStock / Getty Images Plus

Missense variants remain a challenge in genetic interpretation because their effects are subtle and context-dependent. While current prediction models perform well in known disease genes, they generalize poorly to less-studied regions of the proteome.

In a new study published in Nature Genetics titled “Proteome-wide model for human disease genetics,” researchers from Harvard Medical School and the Center for Genomic Regulation (CRG) in Barcelona have developed popEVE, a deep generative model that combines evolutionary and human population data to estimate variant deleteriousness on a proteome-wide scale.

popEVE can assist rare disease diagnosis by letting doctors focus on the most damaging variants first. The model can also work with a patient’s genetic information alone, which could make diagnosis faster, simpler, and cheaper. That is a valuable feature in rare disease medicine, where healthcare systems often have limited resources.

“Clinics don’t always have access to parental DNA and many patients come alone. popEVE can help these doctors identify disease-causing mutations, and we’re already seeing this from collaborations with clinics,” says Mafalda Dias, PhD, co-corresponding author of the study and researcher at the Center for Genomic Regulation. 

The space of disease-causing genetic variation is too large to be studied by population variation or disease-relevant experimental assays alone. The biodiversity of life on Earth provides a deeper view of genetic variation across billions of years of evolution, presenting a unique opportunity to uncover complex genetic patterns preserved to maintain fitness. Computational models can learn which amino acid positions are critical for life by comparing protein sequences across many different species. 
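The idea of reading constraint from cross-species comparison can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification, far simpler than the deep generative models described in this article and not the authors' method: it scores each column of a small, made-up alignment by Shannon entropy, so positions that never vary across species score 0 and variable positions score higher. The sequences and function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the amino acids seen at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy for every column of an alignment of equal-length sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment: one row per species, one column per position.
toy_alignment = [
    "MKVLA",
    "MKILA",
    "MRVLS",
    "MKLLA",
]

profile = conservation_profile(toy_alignment)
for position, entropy in enumerate(profile, start=1):
    label = "invariant" if entropy == 0 else "variable"
    print(f"position {position}: entropy = {entropy:.3f} bits ({label})")
```

Low-entropy positions are the ones where substitutions are rarely tolerated across species, which is the kind of signal evolutionary models exploit. Real models also capture dependencies between positions, which a per-column statistic like this cannot.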

In 2021, the team published EVE (Evolutionary model of Variant Effect), which used evolutionary patterns to classify mutations in human disease genes as benign or harmful. While EVE could judge the impact of mutations within a gene, its scores were not directly comparable between genes.  

popEVE addresses this gap by combining evolutionary data with information from the UK Biobank and gnomAD. These databases catalog variants present in healthy individuals, and the researchers use that information to calibrate the model so that scores can be compared across genes.

To validate popEVE, the researchers analyzed genetic data from more than 31,000 families with children affected by severe developmental disorders. In 98% of cases where a causal mutation had already been identified, popEVE correctly ranked that variant as the most damaging in the child’s genome. The model outperformed state-of-the-art competitors, such as DeepMind’s AlphaMissense.
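To make the ranking step concrete, here is a minimal sketch of how a clinician-facing tool might order one patient's missense variants once every variant carries a score on a single proteome-wide scale. Everything in it is hypothetical: the gene labels, the variant names, the score values, and the convention that a higher score means more damaging are assumptions for illustration, not popEVE's actual output format or API.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str      # hypothetical gene label
    change: str    # amino acid change, e.g. "R123W"
    score: float   # hypothetical score; assumed here: higher = more damaging


def rank_variants(variants):
    """Sort variants from most to least damaging under the assumed convention."""
    return sorted(variants, key=lambda v: v.score, reverse=True)


# Made-up variants from a single patient, with no parental data required.
patient_variants = [
    Variant("GENE_A", "A45T", 0.12),
    Variant("GENE_B", "R123W", 0.97),
    Variant("GENE_C", "L78P", 0.55),
    Variant("GENE_A", "G210D", 0.31),
]

ranked = rank_variants(patient_variants)
for rank, variant in enumerate(ranked, start=1):
    print(f"{rank}. {variant.gene} {variant.change} (score {variant.score:.2f})")
```

A single sort across genes is only meaningful because the scores are assumed to share one scale. With gene-specific scores, as described for EVE above, the variants in GENE_A and GENE_B could not be compared this way.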

popEVE uncovered 123 new candidate disease genes that had never before been linked to developmental disorders. Many of these genes are active in the developing brain and interact physically with known disease proteins. Of the 123, 104 were observed in just one or two patients.

The model also addresses the issue of underrepresentation in genetic databases to support all patients.

“No one should get a scary result just because their community isn’t well represented in global databases. popEVE helps fix that imbalance, something the field has been missing for a long time,” says Jonathan Frazer, PhD, co-corresponding author of the study and researcher at the Center for Genomic Regulation.