Google DeepMind has released AlphaGenome, an AI system that shows how even tiny changes in noncoding sections of the human genome affect gene expression.

AlphaGenome can analyse up to one million DNA letters at once and predict thousands of molecular properties that characterise the sequence's regulatory activity. It also scores the effect of a genetic variant or mutation by comparing its predictions for the mutated sequence with those for the unmutated one.
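As a rough illustration of that comparison, the sketch below scores a single-letter variant as the difference between a model's predictions for the mutated and unmutated sequences. It does not use AlphaGenome or its API: `toy_predict`, `apply_variant`, `score_variant` and the two track names are hypothetical stand-ins invented here, and the "model" is just a pair of simple sequence statistics.

```python
from typing import Callable, Dict

Prediction = Dict[str, float]


def toy_predict(sequence: str) -> Prediction:
    """Hypothetical stand-in for a trained model: maps DNA to named track values."""
    seq = sequence.upper()
    return {
        # Fraction of G/C letters, used here as a crude made-up track.
        "gc_fraction": (seq.count("G") + seq.count("C")) / len(seq),
        # Number of TATA motifs, used here as a second made-up track.
        "tata_count": float(seq.count("TATA")),
    }


def apply_variant(reference: str, position: int, alt_base: str) -> str:
    """Return the reference with one base substituted (0-based position)."""
    if not 0 <= position < len(reference):
        raise ValueError("position lies outside the sequence")
    if len(alt_base) != 1 or alt_base.upper() not in "ACGT":
        raise ValueError("alt_base must be one of A, C, G, T")
    return reference[:position] + alt_base.upper() + reference[position + 1:]


def score_variant(
    reference: str,
    position: int,
    alt_base: str,
    predict: Callable[[str], Prediction],
) -> Prediction:
    """Score a variant as (prediction for mutated) minus (prediction for unmutated)."""
    ref_pred = predict(reference)
    alt_pred = predict(apply_variant(reference, position, alt_base))
    return {track: alt_pred[track] - ref_pred[track] for track in ref_pred}


# A T-to-G change at position 5 breaks the only TATA motif in this toy sequence.
reference = "GGCTATAAGC"
scores = score_variant(reference, 5, "G", toy_predict)
print(scores)
```

In this toy case the variant removes the one TATA motif and slightly raises the G/C fraction, so the motif track's score is negative and the G/C track's score is positive. A real model would replace `toy_predict` with learned predictions across many tracks and cell types.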

The tool predicts how single variants or mutations in human DNA sequences influence a range of biological processes. Uncovering and predicting the effects of minuscule changes in noncoding regions could help scientists better understand cancers and rare diseases, and could in turn inform personalised medicine.

The human genome stores most of its instructions in genetic “dark matter,” i.e., the 98% of DNA that does not code for proteins. Scientists long dismissed much of this noncoding DNA as junk, but more recent research shows that it controls when and how genes turn on and off. Finding these control switches could allow researchers to better design therapies that target genetic conditions.

The training data was sourced from large public consortia such as ENCODE, GTEx, 4D Nucleome, and FANTOM5, which experimentally measured properties covering key modalities of gene regulation across hundreds of human and mouse cell types and tissues. The model was therefore trained on a dataset that spans two species and many tissues and types of measurement.

The predicted properties include where genes start and end in different cell and tissue types, where they are spliced, and the amount of RNA produced.

AlphaGenome builds on DeepMind's earlier genomics model, Enformer, and complements AlphaMissense, which specialises in categorising the effects of variants within protein-coding regions.

Dr. Caleb Lareau, Assistant Member in Computational and Systems Biology at the Memorial Sloan Kettering Cancer Center, said: “It’s a milestone for the field. For the first time, we have a single model that unifies long-range context, base-level precision, and state-of-the-art performance across a whole spectrum of genomic tasks.”

Analysis showed that AlphaGenome achieves state-of-the-art performance across a wide range of genomic prediction benchmarks. Its generality will also allow researchers to adapt and fine-tune it on their own datasets to better address their individual research questions.