Developers of self-driving cars initially had a similar philosophy of data maximization. They generate video from arrays of cameras inside and outside the vehicles, audio recordings from microphones, point clouds mapping objects in space from lidar and radar, diagnostic measurements of vehicle components, GPS measurements and much more.
Some assumed that the more data collected, the smarter the self-driving system could become, says Brady Wang, who studies automotive technologies at market researcher Counterpoint. But the approach didn’t always work because the volume and complexity of the data made it difficult to organize and understand, Wang says.
In more recent years, companies have begun retaining only data thought to be specifically useful, and have also focused on organizing it properly. In practice, data from an hour’s drive on a sunny day in the desert can look repetitive, so the usefulness of keeping it all has been questioned.
The limits are not entirely new. Chatham, the leading software engineer at Waymo, says getting access to more digital storage wasn’t easy when the company was a small project within Google more than a decade ago and he was a one-man team. Data that had no apparent use, such as recordings of failed driverless maneuvers, was deleted. “If we thought of storage as infinite, the cost would be astronomical,” says Chatham.
After Waymo became an independent company with significant outside investment, the project used data storage more freely. For example, when Waymo began testing the Jaguar I-Pace in late 2019, the crossover SUV came with more powerful sensors that generated a greater flow of information, to the point where full logs for an hour’s drive came to more than 1,100 gigabytes, enough to fill 240 DVDs. At the time, Waymo significantly increased its storage capacity and teams became less picky about what they kept, Chatham says.
More recently, Chatham’s team began setting strict quotas and asking people across the company to be more judicious. Waymo now retains only some of its newly generated data and has recently started deleting stored data as it becomes obsolete relative to current technology, conditions and priorities. Chatham says the strategy is working well. “We need to start throwing data away quickly as our service grows,” he says.
Waymo carried fare-paying passengers more than 23,000 miles in California between September and November last year, up from about 13,000 miles in a similar time frame just six months earlier, according to disclosures to state regulators.
In some cases, the data caps reflect the autonomous vehicle company’s priorities. With some negotiation allowed, Chatham’s team allocates quarterly storage space to groups of engineers working on various tasks, such as developing AI to identify what’s around a vehicle (perception) or testing planned software updates against previous trips (evaluation). Those teams decide what’s worth keeping, say, data on emergency vehicle actions, and an automated system filters out everything else. “That’s going to be a business decision,” Chatham says. “Is snow or rain data more important to the business?”
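To make the mechanics concrete, here is a minimal sketch of how a quota-based filter of this kind might work. It is not Waymo's system: the `Team` and `Segment` types, the tag names, the sizes and the first-match rule are all hypothetical, chosen only to illustrate teams declaring what to keep and an automated pass discarding the rest.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    """Hypothetical engineering group with a quarterly storage quota."""
    name: str
    quota_gb: float
    keep_tags: set[str]      # what the team has decided is worth keeping
    used_gb: float = 0.0


@dataclass
class Segment:
    """Hypothetical slice of a drive log, labeled with descriptive tags."""
    segment_id: str
    size_gb: float
    tags: set[str] = field(default_factory=set)


def filter_segments(segments, teams):
    """Keep a segment only if some team wants it and still has quota left."""
    kept, dropped = [], []
    for seg in segments:
        # First team (in list order) that wants this data and can afford it.
        owner = next(
            (t for t in teams
             if seg.tags & t.keep_tags
             and t.used_gb + seg.size_gb <= t.quota_gb),
            None,
        )
        if owner is None:
            dropped.append(seg.segment_id)
        else:
            owner.used_gb += seg.size_gb
            kept.append((seg.segment_id, owner.name))
    return kept, dropped


# Illustrative data only; none of these numbers come from the article.
teams = [
    Team("perception", quota_gb=100, keep_tags={"emergency_vehicle", "snow"}),
    Team("evaluation", quota_gb=50, keep_tags={"snow"}),
]
segments = [
    Segment("a", 60, {"snow"}),
    Segment("b", 60, {"snow"}),
    Segment("c", 30, {"sunny", "desert"}),
    Segment("d", 40, {"emergency_vehicle"}),
]
kept, dropped = filter_segments(segments, teams)
```

In this toy run, segment "b" is discarded even though it contains snow, because neither team has enough quota left for it. That is the kind of trade-off Chatham describes as a business decision.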
Snow has won for now, as Waymo has only limited data from driving in it so far. “We love every piece,” Chatham says. Rain has become less interesting. “We’ve gotten better at rain, so we don’t have to go to infinity.” Being frugal with data can sometimes lead to creativity or valuable discoveries, he says. Waymo discovered at one point that its rain data needlessly included all of the sensor readings its cars collected while parked.