
Generative AI tool marks a milestone in biology
How is Evo 2 different from Evo 1 – which came out just last year – and how did you advance the technology so quickly?
Honestly, Evo 1 was more effective than we thought it would be. Evo 1 was trained on only 113,000 or so genomes of simpler life forms like bacteria and archaea, known as the prokaryotes.
Evo 2, on the other hand, also includes the known genomes of 15,000 or so plants and animals – the eukaryotes – which include humans. With Evo 2, our dataset has expanded from about 300 billion nucleotides to almost 9 trillion. It’s like a representative snapshot of all species on Earth. For safety, we left out the genomes of viruses to prevent Evo 2 from being used to create new or more dangerous diseases. Because it has the potential to improve tasks related to human disease, we felt like we needed to share Evo 2 quickly.

Can you give us the lay version of how Evo 2 works?
All life is encoded in DNA using just four chemicals, known as nucleotides. These complex molecules are abbreviated using the letters A, C, G, and T. The human genome, at 3 billion nucleotides long, is just a string of these four letters. Now, if you imagine DNA as the characters in a book that is 3 billion letters long, the individual genes are the words. They are spelled differently. Some have more letters than others. And they have different purposes and meanings – that is, they have different functions.
With AI, we can search for patterns in all that code and use them to predict what the next nucleotide in the sequence is likely to be. In this way, Evo 2 is able to generate – to write – new genetic code that has never existed before. With Evo 2, you can enter a sequence of up to 1 million nucleotides. The million-nucleotide window is important in biology, as it allows us to explore long-distance interactions between two or more genes that may not be physically close to one another on the DNA molecule. The longer context window could allow us to spot connections between these long-distance collaborators that we wouldn’t even know about with a shorter window.
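To make the idea of next-nucleotide prediction concrete, here is a minimal Python sketch. It is not how Evo 2 works internally: it assumes a toy model that simply counts which letter follows each short run of letters, and every name in it (`train_counts`, `predict_next`, `generate`, the window size `k`) is hypothetical and chosen for illustration. The window `k` plays the role of a very small context window; Evo 2's window of up to 1 million nucleotides is what lets it pick up relationships this toy model cannot see.

```python
from collections import Counter, defaultdict
import random

NUCLEOTIDES = "ACGT"  # the four letters of the DNA alphabet


def train_counts(sequences, k=3):
    """Count which nucleotide follows each k-letter context (toy model, not Evo 2)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            context = seq[i:i + k]
            counts[context][seq[i + k]] += 1
    return counts


def predict_next(counts, context, k=3):
    """Return the most frequently seen next nucleotide, or None for an unseen context."""
    followers = counts.get(context[-k:])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, seed, length, k=3, rng=None):
    """Extend `seed` by `length` letters, sampling each one from the observed counts."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    out = seed
    for _ in range(length):
        followers = counts.get(out[-k:])
        if followers:
            letters, weights = zip(*followers.items())
            out += rng.choices(letters, weights=weights)[0]
        else:
            # Unseen context: fall back to a uniformly random letter.
            out += rng.choice(NUCLEOTIDES)
    return out


# Hypothetical training data: a single short repeating sequence.
toy_counts = train_counts(["ACGTACGTACGT"], k=3)
print(predict_next(toy_counts, "ACG"))  # prints "T"
print(generate(toy_counts, "ACG", 5))   # prints "ACGTACGT"
```

With only three letters of context, this sketch can never relate two genes that sit far apart on the DNA molecule, which is the limitation the longer window is meant to address.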

Trained on a dataset that includes all known living species – and a few extinct ones – Evo 2 can predict the form and function of proteins in the DNA of all domains of life and run experiments in a fraction of the time it would take a traditional lab.
Imagine being able to speed up evolution – hypothetically – to learn which genes might have a harmful or beneficial effect on human health. Imagine, further, being able to rapidly generate new genetic sequences that could help cure disease or solve environmental challenges. Now, scientists have developed a generative AI tool that can predict the form and function of proteins coded in the DNA of all domains of life, identify molecules that could be useful for bioengineering and medicine, and allow labs to run dozens of other standard experiments with a virtual query – in minutes or hours instead of years (or millennia).
The open-source, all-access tool, known as Evo 2, was developed by a multi-institutional team co-led by Stanford’s Brian Hie, an assistant professor of chemical engineering and a faculty fellow in Stanford Data Science. Evo 2 was trained on a dataset that includes all known living species, including humans, plants, bacteria, amoebas, and even a few extinct species. Stanford Report talked to Hie about Evo 2’s advanced capabilities, why the scientific world is so eager to get its hands on this new tool, and how Evo 2 could reshape the biological sciences.