Arc Institute, an American nonprofit research organization, has developed Evo 2, the largest AI model for biology, in collaboration with NVIDIA and researchers from Stanford University, UC Berkeley, and UC San Francisco. The model can identify mutations in human genes that cause various diseases.

The model was trained on the DNA of over 100,000 species across the entire tree of life. Having learned from such a broad range of organisms, it can recognize patterns in genetic sequences that researchers would take years to uncover. It accurately identifies disease-causing mutations in human genes and can also design new genomes similar to those of simple bacteria.

Additionally, the AI can process genetic sequences of up to one million nucleotides simultaneously, allowing it to understand relationships between distant parts of the same genome.
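To make the one-million-nucleotide figure concrete, here is a minimal Python sketch of what such a limit means for longer genomes. It does not use the Evo 2 code or any real API. The helper names `split_into_windows` and `share_window` are hypothetical, and the simple non-overlapping split is an assumption made purely for illustration.

```python
# Assumption: the limit is the "up to one million nucleotides" figure from the text.
MAX_CONTEXT = 1_000_000


def split_into_windows(sequence: str, window: int = MAX_CONTEXT) -> list[str]:
    """Cut a nucleotide string into consecutive windows of at most `window` bases."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sequence[i:i + window] for i in range(0, len(sequence), window)]


def share_window(pos_a: int, pos_b: int, window: int = MAX_CONTEXT) -> bool:
    """Return True if two 0-based positions fall in the same non-overlapping window."""
    return pos_a // window == pos_b // window
```

Under this simplified split, two sites almost a million bases apart can still land in the same window and so be read together, which is the kind of long-range relationship the article describes.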

Evo 2 builds on its predecessor, Evo 1, which was trained entirely on unicellular genomes. The new model was trained on over 9.3 trillion nucleotides, spanning 128,000 complete genomes as well as metagenomic data.

Evo 2 was trained for several months on the NVIDIA DGX Cloud AI platform via Amazon Web Services, using more than 2,000 NVIDIA H100 GPUs. The Evo 2 code has been published on the institute’s GitHub account in the hope that it will be used in other fields to accelerate scientific research.