The census has been broken. Can AI solve it?

    Greg Yetman is co-director of the Center for International Earth Science Information Network (CIESIN), part of Columbia University’s Climate School. Under a NASA contract, CIESIN has been exploring ways to derive socioeconomic data from observations of the Earth since the early 1990s. Yetman says details such as how common it is for people to live in basement apartments in the Queens borough of New York City are “always hard to capture and very hard to measure from space.” Apartment renovations, subletting by an owner or occupier, and unregistered settlements, all of which are likely to increase as the cost of living rises, are also rarely recorded by the census or by satellites. And if an individual is homeless or has little financial data, they may not show up in location-sharing data collected by private real estate agents.

    There is room for improvement in the US census, but the Constitution requires one to be taken every decade, and Yetman says the country is “data rich.” By comparison, some countries have not conducted detailed household surveys for decades. Obstacles such as cost, conflict, or difficulty reaching remote locations can make some communities harder to count.

    In 2017, the Nigerian government, CIESIN and others working with funds from the Bill & Melinda Gates Foundation used satellite imagery and machine learning to map the country’s population in order to deliver measles vaccinations. Since then, says Gates Foundation senior program officer Vince Seaman, the effort has expanded to five other African countries in a project known as Grid3. That work, he adds, shows that the technology is only part of the solution. After machine learning was applied to satellite photos, community surveys were conducted to reach thousands of people in person and verify the results.

    Research published last month used satellite imagery and machine learning to automatically identify residential lots and predict population, age and gender in five provinces in the western half of the Democratic Republic of the Congo (DRC). The project brought together Grid3 participants such as the University of Southampton in the UK with groups such as the DRC’s National Bureau of Statistics. The Kinshasa School of Public Health and the University of California, Los Angeles School of Public Health conducted anonymous surveys of nearly 80,000 people to validate the deep learning model, which performed with approximately 80 percent accuracy. The co-authors say their method isn’t a substitute for a real effort to count the entire population, but it can provide a predictive snapshot of society in places with little or poor data. There has been no national census in the DRC since 1984.

    Yetman has been working with satellite images for over 20 years. He works with Pop Grid, a data collaboration that brings together a diverse group of population-counting organizations, including the European Commission, Facebook, the German Aerospace Center and NASA. He says deep learning models for identifying buildings can’t always tell where one roof ends and another begins, and he cautions that there is no such thing as a model that works everywhere in the world.

    In the US, he explains, it’s problematic to apply an AI model trained on images of roofs in the western US to houses on the East Coast, because the country’s westward expansion followed a grid-based system, while cities like Boston developed with less uniformity. Likewise, a roof in South Africa looks different from one in Zambia. AI can easily mistake the roof of a stall in a commercial market in Accra, Ghana, for the roof of an unregistered house, or struggle to accurately predict the number of people living in urban settlements or rural villages. “Without the ground survey showing that there is a slum or informal settlement here, it’s very difficult to determine just from the structure of the roof patterns,” says Yetman. He adds that getting high-quality data for training models to detect buildings or residential lots under local conditions is the most difficult part of the job.