Within a few hours, several groups of scientists raised concerns about the integrity of the work. Those worries have spread across Twitter and are now entering the mainstream media, including the NY Times and The Scientist Magazine.
“Experts demanded verification of data and methods used in a study of drugs to treat Covid-19. The study suggested the drugs might have increased deaths.” - NY Times
Besides the usual quibbling about some statistical valuations and assessments, concern has focused on the dataset itself, provided by a company called Surgisphere. The company claims to be a cloud-based aggregator of healthcare data gleaned from individual hospital electronic medical records. The general term for this type of organization is a collaborative; in this case, a Surgical Outcomes Collaborative. The work seems straightforward: just pull the data and store it. But the process is far more complex.
Leaving aside the regulatory burdens, which are significant, especially when applied across multiple global jurisdictions, the data must be standardized, verified, and then stripped of any patient identifiers. Because of the increasing ability of machine learning to draw linkages, stripping patient identifiers involves more than removing names. In one study of de-identified credit card purchases, researchers needed only four easily obtained “outside” facts to identify more than 90% of the cardholders.
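To make the re-identification point concrete, here is a minimal Python sketch of the idea. It is not the method those researchers used. The toy purchase records, the pseudonyms, and the `unicity` function are all hypothetical and chosen only for illustration. The sketch asks: if an outsider knows k of a person's purchases (shop and day), how often do those k facts match exactly one "anonymous" record?

```python
from itertools import combinations

# Hypothetical de-identified purchase records: pseudonym -> set of (shop, day).
# No names appear anywhere, yet the pattern of purchases can still be unique.
traces = {
    "u1": {("bakery", "Mon"), ("cafe", "Tue"), ("gym", "Wed")},
    "u2": {("bakery", "Mon"), ("cafe", "Tue"), ("shop", "Thu")},
    "u3": {("bakery", "Mon"), ("gym", "Wed"), ("shop", "Thu")},
}


def unicity(traces: dict[str, set[tuple[str, str]]], k: int) -> float:
    """Share of k-fact samples, each taken from one person's own record,
    that match exactly one person in the whole dataset."""
    unique = 0
    total = 0
    for points in traces.values():
        # Enumerate every way an outsider could know k facts about this person.
        for known in combinations(sorted(points), k):
            known_set = set(known)
            # Count how many records contain all k known facts.
            matches = sum(1 for other in traces.values() if known_set <= other)
            total += 1
            if matches == 1:
                unique += 1
    return unique / total if total else 0.0


for k in (1, 2, 3):
    print(f"k={k}: {unicity(traces, k):.2f} of samples identify one person")
```

Even in this tiny example, the share of samples that single out one record rises as k grows, which is the pattern behind the credit card finding: removing names does little when the remaining details are distinctive.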
I place some credence in the skeptical concerns about this database for several reasons. I’ve had personal experience as a member of a surgical outcomes collaborative, the National Surgical Quality Improvement Program (NSQIP) of the American College of Surgeons. Over a roughly ten-year period, it has recruited almost 600 hospitals to participate. Each pays a small fee to the College but spends a far more substantial amount on the staff required to abstract, standardize, and verify the data; there is no real automation. Contrast that with Surgisphere’s program, which seems to have existed for only a year, involves 559 US hospitals, and claims that “Real-world data are collected through automated data transfers that capture 100% of the data from each healthcare entity at regular, predetermined intervals.” You would think that level of automation would have garnered more interest.
The NSQIP program has over 1500 peer-reviewed publications using its dataset; Surgisphere has only two, the Lancet article and a second in a recent issue of the New England Journal of Medicine. A smoking gun? Hardly. But does it make you wonder about the data? Given this information, would you blindly accept that it is correct, especially after peer review, or would you like outside confirmation? Do you believe the words of Dr. Desai, one of the paper’s authors and the owner of Surgisphere?
“What the world has to understand is that this is registry-based data,” Dr. Desai said. “We have no control over the source of the information. All we can do is report the data that is given to us.”
I bring this issue to your attention because it demonstrates the ongoing politicization of scientific thought, which is potentially far more harmful. Contrast the following quotes, the first concerning this current study:
“Ideally, the database should be made public, but if that isn’t possible, it should at least be independently reviewed and an audit performed,” he said.
The second is in opposition to the EPA’s proposed transparency rule, which would require just that kind of independent review of the science used in regulatory decisions:
“Now is not the time to play games with critical medical research that underpins every rule designed to protect us from harmful pollution in our air and in our water,” she [2] said. “The American people have the right to know the truth about threats to our health, and the truth about our future in the face of the climate crisis.”
One of the significant differences between science and religion is that science stakes its claims on verifiable facts, not merely on belief and faith, which are important in their own right. There is increasing evidence that peer review is not infallible; this “study” on hydroxychloroquine may well be another example. The EPA transparency proposal might well require us to rethink some of our environmental policy. Still, it would just as surely protect us from these types of health studies, where the data is simply analyzed and there is no way to establish the underlying truth of the dataset itself.
Addendum - As of June 4th, both the Lancet paper and an additional paper using the same database, published in the New England Journal of Medicine, have been retracted because the co-authors were not allowed to see the data for review.
Notes
[2] Gina McCarthy, former EPA head