3 Reasons To Be Wary Of Meta-Analyses

Credit: Shutterstock

In the September 1906 issue of the North American Review, Samuel Clemens (Mark Twain) popularized the phrase, "There are three kinds of lies: lies, damned lies, and statistics."

More recently, and relevant to the increasing use of statistics in science and health issues, statistician Stephen John Senn noted on Twitter that data are often tortured until they confess to exactly what a scholar wants the numbers to say, and then sometimes they are killed by accident. One of the key ways to intentionally put data on the rack is the meta-analysis.

In a meta-analysis, the scientific literature is searched, a subset of papers is selected, and then a combined estimate is made. It is essentially "conducting research about previous research." As an example, I am going to use a recent e-cigarette paper published in The Lancet, because it provocatively claims that e-cigarettes make it less likely, not more likely, that smokers will quit, and because e-cigarettes are a popular and controversial topic.

Controversial subjects lend themselves to strange methodologies and uses of statistics, so this paper is a good vehicle for discussing the problems in meta-analyses. The paper, authored by Sara Kalkhoran, MD, and Stanton A. Glantz, PhD, of the Center for Tobacco Control Research and Education, University of California, San Francisco, CA, is not about the health issues (e-cigarettes are obviously better for health than smoking, so the harm reduction case is clear); instead it tackles their efficacy in "smoking cessation," the role in which nicotine gums and patches are used.

The statistical issues can be confusing, and journalists fall prey to misunderstanding meta-analyses as readily as anyone. The worst outcome for the public is that an unweighted, random-effects meta-analysis gets mainstream attention and becomes the basis for policy decisions without anyone asking how valid its results are. Hopefully, detailing the methodological issues here will help prevent policy decisions from being made on a flawed foundation.

When tackling a controversial issue, there is always the chance that the data selected could be massaged into a desired result. That leads to Problem #1 in a meta-analysis, and it occurs in this paper: Selection Bias

Using keywords to build their selection pool, the authors say their initial search yielded 577 papers, from which they then chose (selected) about 20. I did a simple Google Scholar search ("e-cigarette cessation") and got over 300,000 results, so anyone who uncritically accepts their results has to implicitly trust their winnowing process: namely, that 20 out of 577, much less out of 300,000, is a truly representative sample.

As you can imagine, changing the selection rules even slightly can produce a different result, because the basic premise of a meta-analysis is to average out errors. Averaging can work well, even with only 20 studies, if the studies are consistent in quality; but if selection bias instead causes papers matching the authors' confirmation bias to be chosen, this "average" can end up far from reality once outliers are included.
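To make the point concrete, here is a minimal sketch with invented effect sizes (not real study data, and a deliberately naive unweighted average) showing how one outlier admitted by the selection rules can flip the sign of the pooled result:

```python
# Hypothetical illustration: how selection changes a naive pooled estimate.
# Effect sizes below are invented numbers, not data from any real study.

studies = {
    "A": 0.10, "B": 0.05, "C": -0.02, "D": 0.08,
    "E": 0.12, "F": -0.90,  # "F" is an outlier
}

def naive_pool(selected):
    """Unweighted average of the selected studies' effect sizes."""
    values = [studies[name] for name in selected]
    return sum(values) / len(values)

with_outlier = naive_pool(["A", "B", "C", "D", "E", "F"])
without_outlier = naive_pool(["A", "B", "C", "D", "E"])
print(round(with_outlier, 3))     # -0.095: the pooled effect looks negative
print(round(without_outlier, 3))  #  0.066: the pooled effect looks positive
```

The underlying five studies agree with each other; admitting or excluding a single dissimilar paper decides which way the "average" points.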

So we also have to consider Problem #2 in a meta-analysis: Statistical Power

To have a legitimate meta-analysis, researchers must identify a common statistical measure the studies share -- the effect size -- and a standard error that allow computing a weighted average of that common measure. For that, researchers consider issues like the sample sizes of the individual studies and study quality.

Given that consideration, it is obvious that any shift in the selection rules, as mentioned under selection bias (#1 above), can produce a much different result. Results can have a tight confidence interval and still be completely wrong. (A famous recent example in physics came in 2011, when a group reported faster-than-light neutrinos with 6-sigma confidence. They had supreme confidence in flawed data, and time travel is still not possible.)

The authors could readily see, using simple statistical analysis, that the 20 papers were not uniform enough to be combined: they do not vary slightly (as chance would produce); they vary a lot. For statistical experts, including (one hopes) the peer reviewers, that should have put a halt to the paper, because combining such dissimilar results is like combining apples and steak. No answer can be legitimate.
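The "simple statistical analysis" in question is a heterogeneity check. One common version, sketched below with invented numbers, is Cochran's Q, which compares each study's deviation from the pooled effect against its own sampling variance; the derived I² statistic then expresses what fraction of the observed variation exceeds chance:

```python
# Heterogeneity check (Cochran's Q and I^2), with invented effect sizes.
# Studies pointing in opposite directions, far beyond their standard
# errors, produce a large Q and an I^2 near 100%.
effects = [0.4, -0.3, 0.9, -0.6]   # hypothetical per-study effects
std_errors = [0.1, 0.1, 0.1, 0.1]  # hypothetical standard errors

weights = [1.0 / se**2 for se in std_errors]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))
df = len(effects) - 1
i_squared = max(0.0, (q - df) / q) * 100  # % of variation beyond chance

print(round(q, 1), round(i_squared, 1))
```

When Q is far above its degrees of freedom and I² approaches 100%, the studies are measuring genuinely different things, and a single combined number misleads rather than summarizes.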

Why they proceeded knowing the results could not be combined is a mystery. How the submitted paper passed peer review at the journal is an even bigger mystery, but outside the scope of this article.

The answer may lie in Problem #3 with a meta-analysis: using too narrow and directed a criterion in crafting the question. In this paper, the question was crafted so narrowly that it is a logical fallacy: Is there an association between e-cigarette use and cigarette smoking cessation, irrespective of smokers' motivation for using e-cigarettes?

That crosses into Circulus in Probando (a circular argument), Cum Hoc Ergo Propter Hoc (correlation implies causation), and Petitio Principii ("begging the question," where the conclusion of an argument is assumed in the phrasing of the question). Looked at by an objective observer, there are obviously a number of reasons a person might start using e-cigarettes, only one of which is smoking reduction. It could be a fad. It could be harm reduction. It could be that they like a flavor.

By using such a narrow question they are, intentionally or not, making the perfect result (an end to smoking) the enemy of the good, such as smoking harm reduction or a transition to quitting via a particular cessation tool.

Conclusion

Their own analysis says that the studies they chose are too diverse to be combined. One size does not fit all when it comes to changing behavior, and a paper designed to eliminate one tool, one shown to be as effective (and as ineffective, hence "one size does not fit all") as any other, does not inform the public.

Obviously, smoking exclusively is worse than smoking a little while also using e-cigarettes, which in turn is worse than using e-cigarettes alone. The perfect result would be no one inhaling anything at all, but that approach does not work for most people, which is why so many techniques are used in the broader effort to stop smoking.

That is the broader problem with the article. It seems agenda-driven against one product type, rather than an objective analysis of what works for smoking cessation. From a statistical-validity point of view, this Lancet paper has to be set aside, and the editorial staff should be asking why no reviewer caught these obvious problems.

But journals are no more immune to personality issues and agendas than scholars are, so stay tuned: research is active, and better, more nuanced reviews of e-cigarettes should appear soon. In the meantime, this paper should not be used to make policy.