One of the unfortunate realities of science is that small data sets often produce unreliable results, because small, random fluctuations can have a big impact. One solution to this problem is to build ever larger data sets, where these fluctuations are usually small compared to any actual effects. One of the notable sources of big data is the UK Biobank; brain scans of people in the Biobank were recently used to identify changes in the brain caused by SARS-CoV-2 infection.
Now a large team of researchers has turned this idea on its head in a new paper. They took some of the largest data sets available and broke them up into smaller chunks to find out how small a data set could get before the results became unreliable. And for at least one type of experiment, the answer is that brain studies need thousands of participants before they are likely to be reliable. Even then, we shouldn't expect many dramatic effects.
Link all things
The research team behind the study called the type of work they were interested in "brain-wide association studies," or BWAS. It's a pretty simple approach. Take some people and score them on a behavioral trait. Then give them brain scans and see whether any brain structures show differences that consistently correlate with the behavioral trait.
The advantage of analyzing the whole brain at once is that it avoids biases based on our preconceptions about what individual brain regions do. The disadvantage is that we have defined a lot of brain structures, so checking all of them increases the chance that at least one will show a false association. And people have published BWAS studies with just a few dozen participants, meaning chance could play a big role in their results.
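The multiple-comparisons problem is easy to demonstrate with a toy simulation (hypothetical numbers, not the study's data): correlate one behavioral score against many purely random "brain measures," and the strongest correlation will look impressive even though nothing real is there.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 100 participants scored on one behavior, plus 1,000
# brain measures that are pure noise (no real association exists at all).
n_people, n_measures = 100, 1000
behavior = rng.normal(size=n_people)
brain = rng.normal(size=(n_measures, n_people))

# Correlate the behavior with every brain measure and keep the strongest.
rs = np.array([np.corrcoef(behavior, m)[0, 1] for m in brain])
strongest = np.abs(rs).max()
print(f"strongest 'association' from pure noise: r = {strongest:.2f}")
```

Even though every measure here is random, the best of 1,000 tries will typically come out around r = 0.3, which is why a strong-looking correlation from a many-comparisons search proves very little on its own.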
For the current study, the research team combined three large data sets to create a total population of more than 50,000 participants. They then tested all possible associations, given the behavioral characteristics scored in the participants.
The simplest thing they did was look for the strongest correlation they could find. There is a standard measure of the strength of a correlation called r, where a value of 1 represents perfect correlation, zero represents no correlation, and -1 represents perfect anticorrelation. In terms of r, the largest association the researchers found among billions of tests was 0.16, which is not particularly strong. In fact, a correlation as weak as r = 0.06 was enough to put an association in the top 1 percent of all correlations. (The same was true for anticorrelations.)
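To make r concrete, here is a small Python sketch (all numbers hypothetical, not the study's data) that generates two variables sharing a weak relationship of roughly the strength the researchers found, then computes r with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a behavioral score and a brain measurement for 1,000
# people, constructed so the true correlation is about 0.16.
n = 1000
behavior = rng.normal(size=n)
brain = 0.16 * behavior + np.sqrt(1 - 0.16**2) * rng.normal(size=n)

# Pearson's r: off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(behavior, brain)[0, 1]
print(f"r = {r:.2f}")
```

At r = 0.16, the brain measure explains only about r² ≈ 2.5 percent of the variation in behavior, which is why even the single strongest association in the data set doesn't predict much for any individual.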
Unsurprisingly, many earlier studies have reported correlations stronger than this. The new results suggest we should treat those findings with considerable skepticism.
Where it goes wrong
To further explore the potential problems with association studies, the researchers divided the study population into much smaller groups, ranging from just 25 participants to as many as 32,000, and then reran the BWAS on these smaller populations. In the smallest studies, associations could be as high as r = 0.52. That's much stronger than we'd expect based on the full data set, and it suggests some pretty serious issues with small studies.
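The same effect can be sketched in a few lines of Python (a toy simulation with assumed numbers, not the study's actual analysis): plant a weak true correlation in a large population, then watch how much stronger the correlation can appear in small subsamples.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "full data set": 50,000 people with a weak true correlation
# (r = 0.06) between a behavioral score and a brain measurement.
N = 50_000
true_r = 0.06
behavior = rng.normal(size=N)
brain = true_r * behavior + np.sqrt(1 - true_r**2) * rng.normal(size=N)

full_r = np.corrcoef(behavior, brain)[0, 1]

def max_subsample_r(n, trials=200):
    """Strongest correlation seen across many random subsamples of size n."""
    best = 0.0
    for _ in range(trials):
        idx = rng.choice(N, size=n, replace=False)
        best = max(best, abs(np.corrcoef(behavior[idx], brain[idx])[0, 1]))
    return best

max_r_25 = max_subsample_r(25)
print(f"full data set: r = {full_r:.2f}")
print(f"strongest r in subsamples of 25: r = {max_r_25:.2f}")
```

With only 25 participants, random sampling noise can dwarf the true effect, so some small "studies" will report correlations several times stronger than anything that actually exists in the population.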
But the researchers had to go much bigger to make these problems go away. "Statistical errors were ubiquitous in all BWAS samples," the researchers write. Even with populations close to 1,000, false-negative rates were very high, meaning associations found across the entire data set frequently went undetected. And when real associations were detected, their strength was often inflated to roughly twice what it was in the full population.
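The false-negative side can also be simulated (again a toy sketch with assumed numbers): with a true correlation of r = 0.06, studies of a few hundred people usually miss it, while studies of ten thousand almost always find it.

```python
import numpy as np

rng = np.random.default_rng(7)

true_r = 0.06  # roughly the strength of a top-1% BWAS correlation

def detection_rate(n, trials=500):
    """Fraction of simulated studies of size n that reach p < 0.05 (two-sided)."""
    r_crit = 1.96 / np.sqrt(n)  # approximate significance threshold for r
    hits = 0
    for _ in range(trials):
        x = rng.normal(size=n)
        y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)
        if abs(np.corrcoef(x, y)[0, 1]) > r_crit:
            hits += 1
    return hits / trials

power_small = detection_rate(500)
power_large = detection_rate(10_000)
print(f"n=500:    detected {power_small:.0%} of the time")
print(f"n=10,000: detected {power_large:.0%} of the time")
```

In a run like this, studies of 500 people miss the true effect most of the time, and the small studies that do clear the significance bar are disproportionately the ones where noise happened to exaggerate the effect, which is exactly the inflation the researchers describe.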
Overall, it seems that we need several thousand participants before BWAS-like studies are likely to yield reliable, reproducible results.
The researchers caution that this work applies to a specific type of brain research. It doesn't mean that all low-population brain studies are unreliable — in fact, the paper notes that we've learned a lot about brain function from many small studies. I would note that much of what we understand about the function of different parts of the brain comes from studying injuries affecting a single individual. The authors also note that some related analyses — using functional MRI or performing multivariate analysis — tended to produce more robust results with their data set.
Still, the article offers a clear and important warning to those doing research in the field. The question is how this caution will be handled. For this idea to change the standards by which articles are published, journal editors must pay attention, as do other researchers in the field who act as peer reviewers. Fortunately, the growth of large, public datasets like the Biobank will make it easier for everyone to demand larger, more rigorous studies.
Nature, 2022. DOI: 10.1038/s41586-022-04492-9 (About DOIs).