Proteins are the molecules that get work done in nature, and an entire industry is emerging around modifying and manufacturing them for various uses. But doing so is time-consuming and haphazard; Cradle aims to change that with an AI-powered tool that tells scientists what new structures and sequences will make a protein do what they want. The company emerged from stealth today with a substantial seed round.
AI and proteins have been in the news lately, largely because of the work of research teams such as DeepMind and Baker Laboratory. Their machine-learning models take easily collected RNA sequence data and predict the structure a protein will adopt, a step that used to take weeks and expensive specialized equipment.
But as incredible as that capability is in some domains, it is only a starting point in others. Modifying a protein to be more stable or to bind to a specific other molecule involves much more than simply understanding its general shape and size.
“If you’re a protein engineer and you want to engineer a certain property or function into a protein, just knowing what it looks like doesn’t help. It’s like if you have a picture of a bridge, it doesn’t tell you if it’s going to fall or not,” explained Cradle CEO and co-founder Stef van Grieken.
“AlphaFold takes a sequence and predicts what the protein will look like,” he continued. “We’re the generative brother of that: You choose the properties you want to design, and the model will generate sequences that you can test in your lab.”
Predicting in advance what proteins will do, especially ones that are new to science, is a difficult task for many reasons, but in the context of machine learning the biggest problem is that there is not enough data available. So Cradle created much of its own data set in a wet lab, testing protein after protein and seeing which changes in their sequences seemed to produce which effects.
Interestingly, the model itself is not exactly biotech-specific, but a derivative of the same “large language models” that have produced text-generation engines such as GPT-3. Van Grieken noted that these models are not strictly limited to language in the way they understand and predict data, an interesting “generalization” feature that researchers are still exploring.
The protein sequences that Cradle ingests and predicts are, of course, not in a language we know, but they are relatively straightforward linear sequences of text that have associated meanings. “It’s like an alien programming language,” van Grieken said.
Protein engineers are not helpless, of course, but their work necessarily involves a lot of guesswork. One can be fairly certain that among the hundreds of sequences being modified is the combination that will produce the desired effect, but beyond that it all comes down to exhaustive testing. A little hint here can speed things up considerably and avoid a huge amount of fruitless work.
The model works in three main layers, he explained. First, it assesses whether a given sequence is “natural,” i.e. whether it is a meaningful sequence of amino acids or just random ones. This is similar to a language model being able to say with 99% confidence that a sentence is in English (or Swedish, in van Grieken’s example) and that the words are in the correct order. The model knows this from “reading” millions of such sequences determined by laboratory analysis.
Next, it looks at the actual or potential meaning of the protein’s alien language. “Imagine that we give you a sequence and this is the temperature at which that sequence will break down,” he said. “If you do this for many sequences, you can say not just ‘this looks natural,’ but ‘this looks like 26 degrees Celsius.’ This helps the model know which regions of the protein to focus on.”
The model can then suggest sequences to slot in. These are essentially educated guesses, but a stronger starting point than starting from zero. The engineer or lab can then test them and feed the resulting data back into the Cradle platform, where it can be re-ingested and used to fine-tune the model for the situation.
Modifying proteins for a variety of purposes is useful across biotechnology, from drug design to biomanufacturing, and the path from a vanilla molecule to a customized, effective, and efficient one can be long and expensive. Any way to shorten it is likely to be welcomed, at the very least by the lab technicians who have to run hundreds of experiments just to get one good result.
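The three layers can be sketched in a few lines of Python. This is a toy illustration only, not Cradle's model: a simple residue-pair (bigram) count stands in for the large language model, a similarity-weighted average stands in for the property predictor, and the sequences and temperatures are invented. Every function and variable name here is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Invented toy data: (sequence, measured breakdown temperature in deg C).
LABELLED = [("MKTAYIAKQR", 52.0), ("MKTAYIAKQL", 58.0), ("MKSAYIAKQR", 49.0)]


def fit_bigram_model(sequences, pseudocount=1.0):
    """Layer 1: learn which neighbouring residue pairs look 'natural'."""
    pairs, firsts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            firsts[a] += 1

    def log_prob(a, b):
        # Add-one smoothing so unseen pairs get a small, non-zero probability.
        num = pairs[(a, b)] + pseudocount
        den = firsts[a] + pseudocount * len(AMINO_ACIDS)
        return math.log(num / den)

    return log_prob


def naturalness(seq, log_prob):
    """Mean log-probability per transition; higher means more natural."""
    return sum(log_prob(a, b) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def predict_property(seq, labelled):
    """Layer 2: similarity-weighted average of measured values."""
    def similarity(s, t):
        return sum(x == y for x, y in zip(s, t)) / max(len(s), len(t))

    weights = [(similarity(seq, s) ** 4 + 1e-9, value) for s, value in labelled]
    return sum(w * v for w, v in weights) / sum(w for w, _ in weights)


def suggest(start, log_prob, labelled, top_k=3, max_drop=0.25):
    """Layer 3: propose untested single-point mutants, best prediction first."""
    known = {s for s, _ in labelled}
    floor = naturalness(start, log_prob) - max_drop
    candidates = []
    for i in range(len(start)):
        for aa in AMINO_ACIDS:
            mutant = start[:i] + aa + start[i + 1:]
            if mutant == start or mutant in known:
                continue
            if naturalness(mutant, log_prob) < floor:
                continue  # looks too unnatural, skip it
            candidates.append((predict_property(mutant, labelled), mutant))
    candidates.sort(reverse=True)
    return candidates[:top_k]


log_prob = fit_bigram_model([s for s, _ in LABELLED])
for predicted, mutant in suggest("MKTAYIAKQR", log_prob, LABELLED):
    print(f"{mutant}  predicted {predicted:.1f} C")
```

In this sketch, the feedback step would amount to appending each lab measurement to `LABELLED` and calling `fit_bigram_model` and `suggest` again, so later suggestions reflect the new data.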
Cradle has been operating in stealth and is now emerging, having raised $5.5 million in a seed round co-led by Index Ventures and Kindred Capital, with participation from angels John Zimmer, Feike Siebesma and Emily Leproust.
Van Grieken said the funding will allow the team to expand data collection (the more the merrier when it comes to machine learning) and work on the product to make it “more self-service.”
“Our goal is to reduce the cost and time-to-market of a bio-based product by orders of magnitude,” van Grieken said in the press release, “so that anyone, even ‘two kids in the garage,’ can bring a bio-based product to market.”