For two decades, researchers have used brain imaging technology to try to identify how the structure and function of a person’s brain relate to a range of mental illnesses, from anxiety and depression to suicidal tendencies.
But a new paper, published Wednesday in Nature, raises the question of whether much of this research actually produces valid findings. Such studies, the paper’s authors found, typically include fewer than two dozen participants, far fewer than the number needed to generate reliable results.
“You need thousands of individuals,” said Scott Marek, a psychiatric researcher at Washington University School of Medicine in St. Louis and an author of the paper. He described the finding as a “gut punch” to the typical studies that use imaging to better understand mental health.
Studies using magnetic resonance imaging technology usually temper their conclusions with a cautionary note about the small sample size. But recruiting participants can be time-consuming, and imaging them is expensive, at $600 to $2,000 per hour, said Dr. Nico Dosenbach, a neurologist at Washington University School of Medicine and another author on the paper. The median number of subjects in mental health studies that use brain imaging is about 23, he added.
But the Nature paper shows that data from just two dozen subjects is generally insufficient to be reliable and could, in fact, yield “hugely inflated” findings, Dr. Dosenbach said.
For their analysis, the researchers examined three of the largest studies using brain imaging technology to draw conclusions about brain structure and mental health. All three studies are underway: the Human Connectome Project, which has 1,200 participants; the Adolescent Brain Cognitive Development, or ABCD, study, with 12,000 participants; and the UK Biobank study, with 35,700 participants.
The authors of the Nature paper looked at subsets of data within those three studies to determine whether smaller slices were misleading or “reproducible,” meaning the findings can be considered scientifically valid.
For example, the ABCD study looks at whether the thickness of the brain’s gray matter can be correlated with mental health and problem-solving ability. The authors of the Nature paper looked at small subsets within the large study and found that the subsets produced results that were unreliable compared to the results of the full data set.
On the other hand, the authors found that when results were generated from samples of several thousand subjects, the findings closely matched those of the full data set.
The authors performed millions of calculations using different sample sizes and the hundreds of brain regions examined in the three large studies. Time and again, the researchers found that subsets of data from fewer than several thousand people did not produce results consistent with those of the full data set.
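The general shape of that subsampling analysis can be illustrated with a short simulation. The sketch below is not the authors’ code; it assumes a single brain measure, a single behavioral measure and a small true correlation (all made up for illustration), then draws repeated subsamples of different sizes and compares each estimate with the full-sample value.

```python
# Illustrative sketch only: simulate a weak brain-behavior association,
# then repeatedly draw subsamples of various sizes and compare each
# subsample's correlation with the full-sample correlation.
import numpy as np

rng = np.random.default_rng(0)

n_full = 35_000   # roughly the scale of the largest study cited
true_r = 0.1      # small assumed effect size, for illustration only

# Simulate one brain measure (e.g., thickness in one region) and one
# behavioral measure with the chosen correlation.
cov = [[1.0, true_r], [true_r, 1.0]]
brain, behavior = rng.multivariate_normal([0, 0], cov, size=n_full).T

full_r = np.corrcoef(brain, behavior)[0, 1]

for n in (25, 200, 1_000, 4_000):
    estimates = []
    for _ in range(1_000):  # many resamples at each sample size
        idx = rng.choice(n_full, size=n, replace=False)
        estimates.append(np.corrcoef(brain[idx], behavior[idx])[0, 1])
    estimates = np.array(estimates)
    print(f"n={n:>5}: full-sample r={full_r:+.3f}, "
          f"subsample r ranges from {estimates.min():+.3f} "
          f"to {estimates.max():+.3f}")
```

In a simulation like this, estimates from a couple of dozen subjects swing wildly and can appear far larger than the true effect, while samples of several thousand settle close to the full-sample value, consistent with the pattern the paper describes.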
Dr. Marek said the article’s findings were “definitely” applicable beyond mental health. Other areas, such as genomics and cancer research, have confronted the limits of small sample sizes and have tried to correct course, he noted.
“My hunch is that this is a lot more about population science than any of those areas,” he said.