EvolutionaryScale, an A.I. biotech startup founded by a group of former Meta researchers, today (June 25) came out of stealth mode and released a large language model (LLM) that the company claims can design proteins in a process that mimics up to 500 million years of natural evolution. The startup is co-founded by the computer scientist Alexander Rives, who serves as the company’s chief scientist.
EvolutionaryScale emerged from the Meta’s Fundamental AI Research (FAIR) unit. The company also said today it had raised $142 million in seed funding—an enormous amount for a seed-stage round. The investment was led by former GitHub CEO Nat Friedman and the entrepreneur and investor Daniel Gross (who is also a co-founder of OpenAI’s ousted chief scientist Ilya Sutskever’s new startup, Superintelligence.) The venture capital firm Lux Capital also led the round, with participation from Amazon (AMZN) and Nvidia (NVDA)’s venture capital arm. Nvidia and Amazon’s AWS cloud unit are also partnering with the startup, aiming to make its A.I. models available to select customers soon.
Biotech is currently a hot field of A.I. application. EvolutionaryScale said its protein-engineering tool could be used to target cancer cells, discover new drugs, and engineer microbes that can break down harmful plastic in the environment, among other use cases. Lux Capital co-founder and managing partner Josh Wolfe called the startup a “ChatGPT moment for biology.”
How does it all work?
Similar to A.I. chatbots, EvolutionaryScale’s language model can be prompted to generate proteins “with a chain of thought,” the company said in a preprint paper titled “Simulating 500 million years of evolution with a language model” posted on its site today.
The model released today, named ESM3, was trained on data from 2.78 billion natural proteins, the company said. It “reasons over the sequence, structure and function of proteins” and can “follow complex prompts combining its modalities and is highly responsive to biological alignment,” according to the paper.
When prompting ESM3 to generate fluorescent proteins, among the generations the company discovered a bright fluorescent protein at a far distance from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over half a billion years of evolution. “More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins that are far away from known proteins,” the paper said.
EvolutionaryScale is releasing ESM3 models in different sizes. The smaller model is being open sourced for non-commercial research, while AWS and Nvidia will make the larger models available commercially.
Correction: This article has been updated to reflect that Amazon, not Amazon Web Services, participated in EvolutionaryScale’s seed funding round.