Parallel Domain says autonomous driving won't scale without synthetic data

Achieving safe autonomous driving requires nearly endless hours of training software on every situation that could possibly arise before a vehicle goes on the road. Historically, autonomous vehicle companies have collected reams of real-world data to train their algorithms, but it’s impossible to train a system to handle edge cases based on real-world data alone. On top of that, collecting, sorting and labeling all that data takes a long time in the first place.

Most self-driving car companies, such as Cruise, Waymo and Waabi, use synthetic data to train and test perception models with a speed and level of control that is impossible with data collected from the real world. Parallel Domain, a startup that has built a data generation platform for autonomous vehicle companies, says synthetic data is a critical component for scaling the AI that powers vision and perception systems and for preparing those systems for the unpredictability of the physical world.

The startup just closed a $30 million Series B round led by March Capital, with participation from returning investors Costanoa Ventures, Foundry Group, Calibrate Ventures and Ubiquity Ventures. Parallel Domain has so far focused on the automotive market, supplying synthetic data to some of the major OEMs building advanced driver assistance systems and to autonomous driving companies building much more advanced self-driving systems. Now, Parallel Domain is poised to expand into drones and mobile computer vision, according to co-founder and CEO Kevin McNamara.

“We’re also really doubling down on generative AI approaches to generating content,” McNamara told TechCrunch. “How can we use some of the advances in generative AI to bring a much greater variety of things, people and behaviors into our worlds? Because again, the hard part here is really, once you have a physically accurate render, how do you actually build the million different scenarios that a car will have to deal with?”

The startup is also looking to hire a team to support its growing customer base in North America, Europe and Asia, according to McNamara.

Building a virtual world

An example from Parallel Domain’s synthetic data. Image credit: Parallel Domain

When Parallel Domain was founded in 2017, the startup was hyper-focused on creating virtual worlds based on real-world mapping data. Over the past five years, Parallel Domain has added to its world generation, filling it with cars, people, different times of day, weather, and the full range of behaviors that make these worlds interesting. This allows its customers, among which Parallel Domain counts Google, Continental, Woven Planet and Toyota Research Institute, to generate the dynamic camera, radar and lidar data they need to train and test their vision and perception systems, McNamara said.

Parallel Domain’s synthetic data platform consists of two modes: training and testing. When training, customers will describe high-level parameters—for example, highway driving with 50% rain, 20% nighttime, and an ambulance in each sequence—that they want to train their model on, and the system will generate hundreds of thousands of examples that fit those parameters.
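To make the training-mode idea concrete, here is a minimal sketch of how a high-level request like the one above could be expanded into individual sequence descriptions. It is not Parallel Domain's API: `ScenarioSpec`, `Sequence`, `sample_sequences` and every field name are hypothetical, and the sketch only produces scenario metadata, not rendered sensor data.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioSpec:
    """Hypothetical high-level request; field names are illustrative only."""
    road_type: str
    rain_fraction: float      # share of sequences with rain
    night_fraction: float     # share of sequences at night
    required_agents: tuple    # agents that must appear in every sequence


@dataclass(frozen=True)
class Sequence:
    """One generated scenario description (metadata only, no rendering)."""
    road_type: str
    weather: str
    time_of_day: str
    agents: tuple


def exact_mask(n, fraction, rng):
    """Return n booleans with round(n * fraction) True values, shuffled."""
    k = round(n * fraction)
    mask = [True] * k + [False] * (n - k)
    rng.shuffle(mask)
    return mask


def sample_sequences(spec, n, seed=0):
    """Expand a spec into n sequence descriptions that match its proportions."""
    rng = random.Random(seed)
    rainy = exact_mask(n, spec.rain_fraction, rng)
    night = exact_mask(n, spec.night_fraction, rng)
    return [
        Sequence(
            road_type=spec.road_type,
            weather="rain" if rainy[i] else "clear",
            time_of_day="night" if night[i] else "day",
            agents=spec.required_agents,
        )
        for i in range(n)
    ]


# Highway driving, 50% rain, 20% nighttime, an ambulance in each sequence.
spec = ScenarioSpec("highway", 0.5, 0.2, ("ambulance",))
sequences = sample_sequences(spec, n=1000)
```

The sketch fixes the proportions exactly rather than flipping a coin per sequence, so a small batch still matches the requested mix. A real generator would presumably vary many more attributes within each sequence.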

On the testing side, Parallel Domain offers an API that allows the client to control the placement of dynamic things in the world, which can then be connected to their simulator to test specific scenarios.

Waymo, for example, is particularly keen on using synthetic data to test for different weather conditions, the company told TechCrunch. (Disclaimer: Waymo is not a confirmed Parallel Domain customer.) Waymo views weather as a new lens to apply to all the miles it has driven in the real world and in simulation, since it would be impossible to re-collect all of those experiences under arbitrary weather conditions.

Whether for testing or training, when Parallel Domain’s software creates a simulation, it is able to automatically generate labels to match each simulated agent. This helps machine learning teams perform supervised training and testing without having to go through the arduous data labeling process themselves.
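The reason labels can be generated automatically is that a simulator already knows the class and position of every agent it places. The following is a minimal sketch of that idea, not Parallel Domain's implementation: the `Agent` schema, the camera parameters and the `label_for` helper are all assumptions, and the sketch ignores occlusion, image clipping and agents behind the camera.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Agent:
    """Ground truth a simulator holds for one agent (hypothetical schema)."""
    category: str
    center: tuple   # (x, y, z) in the camera frame, metres; z points forward
    size: tuple     # (width, height, depth) in metres


def project(point, focal_px, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def label_for(agent, focal_px=1000.0, cx=960.0, cy=540.0):
    """Derive a 2D box label by projecting the agent's eight 3D box corners.

    Assumes the whole agent is in front of the camera (z > 0).
    """
    x, y, z = agent.center
    w, h, d = agent.size
    corners = [
        (x + sx * w / 2, y + sy * h / 2, z + sz * d / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]
    pixels = [project(c, focal_px, cx, cy) for c in corners]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return {
        "category": agent.category,
        "bbox": (min(us), min(vs), max(us), max(vs)),  # (left, top, right, bottom)
    }


# Two simulated agents; their labels need no human annotation.
agents = [
    Agent("ambulance", center=(0.0, 0.0, 20.0), size=(2.0, 2.0, 6.0)),
    Agent("pedestrian", center=(3.0, 0.0, 10.0), size=(0.5, 1.8, 0.5)),
]
labels = [label_for(a) for a in agents]
```

With real-world footage, the same box would have to be drawn by a person or estimated by another model. Here it falls out of the scene description.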

Parallel Domain envisions a world where autonomous vehicle companies use synthetic data for most, if not all, of their training and testing needs. Today, the ratio of synthetic to real data varies from company to company. More established companies that have historically had the resources to collect a lot of data use synthetic data for about 20% to 40% of their needs, while companies that are earlier in the product development process rely on about 80% synthetic data versus 20% real-world data, according to McNamara.

Julia Klein, a partner at March Capital and now one of Parallel Domain’s board members, said she believes synthetic data will play a critical role in the future of machine learning.

“Getting the real-world data that you need to train computer vision models is often a hurdle, and there’s a difficulty in terms of being able to get that data, label it, get it ready to a position where it can actually be used,” Klein told TechCrunch. “What we’ve seen with Parallel Domain is that they greatly speed up that process and also address things that you might not even get in real-world datasets.”
