Response to Meta-Analysis Published by Assendelft, et al., in the Annals of Internal Medicine
By Anthony Rosner, PhD, LLD [Hon.], LLC
The recent meta-analysis by Assendelft, et al., published in the May 13, 2003 Annals of Internal Medicine,1 is a troubling example of how clinical data are interpreted and presented for public consumption and policy development. In addition to raising numerous issues as to the clinical applicability of meta-analyses, it may even belie its basic premises, as I will illustrate. In reviewing this document, one must remain vigilant as to how the rowdy qualities of human bias, subjectivity and disagreement extend well into the rarefied atmospheres of randomized clinical trials, meta-analyses and actual clinical guidelines.
The overall conclusion of the Assendelft, et al., report - that there is no evidence that spinal manipulation therapy is superior to standard treatments for patients with either acute or chronic low back pain - can be interpreted in the same breath to indicate that, in terms of the pain or disability outcomes scales evaluated, it is not inferior. Before analyzing the methodological issues of the report itself, it is entirely justified to ask whether the treatments are truly equivalent.
1. Comparative Side-Effects and Relative Safety
For spinal manipulation, the occurrence of major complications, regardless of the region of the spine manipulated, has generally been shown to be less than one per million.2-5 Even transient, minor side-effects have been estimated to occur at one per 120,000 cervical manipulations.6 These figures pale when compared to an extensive body of literature describing as many as 220,000 deaths and other complications in the United States attributable each year to medications, in general,7-14 or the 10,000-20,000 fatalities and multiple-organ systems adversely affected by NSAIDs.15-23 Even what have been regarded as the relatively more benign COX-2 inhibitors24-27 and acetaminophen medications28 have been reported to generate serious GI, cardiovascular and hepatic problems at rates that are orders of magnitude greater than side-effects attributed to spinal manipulation. The overall picture comparing spinal manipulation to the commonly used treatment alternatives of either direct analgesic ingestion, or visits to the general practitioner (80 percent resulting in analgesic use, by the authors' own citation 1,29), should be one of relative clarity to the patient: In one instance, there is an option with a low rate of lasting side-effects; in the other, there is a treatment regimen with severe and sometimes fatal complications inexplicably deemed "acceptable."30
2. Mix of Clinical Judgement With Data From the Literature
The authors strongly imply that this study is intended to be more rigorous than the systematic reviews and meta-analyses that preceded it. However, their admission to the effect that the comparison of spinal manipulative therapy with each different treatment alternative for each outcome for each back pain stratum "was not possible, because the data were sparse," raises one's suspicion that this particular review may not have been as "systematic" as first presumed. These fears are confirmed in the very next sentence, which informs the reader that the clinical judgment of effectiveness benchmarks from members of the Cochrane editorial board was used to fill in the gaps of experimental data, undermining the very process championed in this study.
Indeed, the noted epidemiologist David Sackett applauds the use of clinical expertise and experimental outcomes data to build a truly effective evidence base for optimum patient care,31 a sentiment echoed elsewhere.32 However, this undercuts the very process the authors suggest they are undertaking in pursuit of the most definitive experimental data available. In other words, how much adulteration of this "systematic review" has taken place?
3. Inadmissible Criterion of Quality
One of the criteria for methodologic quality of randomized clinical trials (RCTs) by the Cochrane Group - the blinding of the care provider (V3) - is impossible in the administration of manual therapy, particularly high-velocity spinal manipulation. Accordingly, its inclusion by the authors as a determinant of inclusion or rejection of RCTs is without justification. At least 11 studies have reported (erroneously) double-blinding in the chiropractic experimental literature; the nonfeasibility of blinding the practitioner in numerous modalities of alternative medicine has been extensively discussed elsewhere, and needs to be duly noted.33
4. Guideline Rationale
As part of their rationale for embarking on this investigation, Assendelft, et al., bemoan the disparity of recommendations for spinal manipulative therapy from different countries, citing, in particular, the dissensions expressed in the guidelines from Australia, Israel and The Netherlands. What the authors do not disclose is the preponderance of support for spinal manipulation expressed in eight out of a total of 11 such guidelines, with perhaps an additional half guideline thrown in for The Netherlands (which found sufficient justification for treating acute, but not chronic back pain by spinal manipulation,34 an oddity, since one of the authors of that study, van Tulder, who is Dutch, decisively supported the chronic over the acute evidence in a recent systematic review.35)
Furthermore, in the comparison of guidelines cited by the authors, there was concordance among all 11 nations (United States; United Kingdom; The Netherlands; Israel; New Zealand; Finland; Australia; Switzerland; Germany; Denmark; and Sweden) in six aspects of health care:
Other areas besides spinal manipulation in which differences arose were:
The reason this census of nations regarding guideline and medical practices has been taken is to point out that the reported concordances and discordances do not appear to correlate with the amount, design and quality of randomized clinical trials or systematic literature reviews that have been published. Rather, there appear to be human and cultural values at work here that I maintain have not necessarily been eliminated in the study currently under discussion. This leads directly to our next point of critique.
5. Meta-Analyses Themselves Are Subject to Bias and Omissions
Regarding their clinical relevance, the very basis of meta-analyses, including the report of Assendelft, et al., has to be scrutinized closely. One report has gone so far as to compare meta-analyses to statistical alchemy, due to their intrinsic nature:
"... the removal and destruction of the scientific requirements that have been so carefully developed and established during the 19th and 20th centuries. In the mixtures formed for most statistical meta-analyses, we lose or eliminate the elemental scientific requirements for reproducibility and precision, for suitable extrapolation, and even sometimes for fair comparison."36
Specifically, Feinstein raises the following deficiencies of meta-analyses, most having to do with the sloughing of important clinical information:
In any event, the number of patients needed to treat in order to observe a true difference between treatment groups must be reported, a practice often overlooked in meta-analyses.36 To make matters worse, a recent report involving four medical areas (cardiovascular disease, infectious disease, pediatrics, and surgery) indicates that individual quality measures were not reliably associated with the strength of the treatment effect in 276 RCTs analyzed in 26 meta-analyses.37
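For readers less familiar with the measure, the number needed to treat is conventionally taken as the reciprocal of the absolute risk reduction between two groups, rounded up to a whole patient. The following is a minimal sketch only: the function name and the event counts are invented for illustration and are not drawn from Assendelft, et al., or from any trial cited here.

```python
import math
from fractions import Fraction


def number_needed_to_treat(control_events, control_total,
                           treated_events, treated_total):
    """Return the number needed to treat, rounded up to a whole patient.

    Hypothetical helper for illustration. An "event" is an unfavorable
    outcome (for example, pain persisting at follow-up).
    """
    # Exact fractions avoid floating-point rounding artifacts in the ratio.
    control_rate = Fraction(control_events, control_total)
    treated_rate = Fraction(treated_events, treated_total)

    # Absolute risk reduction: how much lower the event rate is with treatment.
    absolute_risk_reduction = control_rate - treated_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("Treatment shows no reduction in event rate.")

    return math.ceil(1 / absolute_risk_reduction)


# Invented counts: 30 of 100 control patients and 20 of 100 treated
# patients have the unfavorable outcome.
example_nnt = number_needed_to_treat(30, 100, 20, 100)
print(example_nnt)
```

With these invented counts the absolute risk reduction is one in ten, so roughly ten patients would need to be treated for one additional patient to benefit. A pooled effect size reported without such a figure gives the clinician little sense of practical yield.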
The fact that arbitrariness and bias can not only creep into, but actually dominate meta-analyses, is demonstrated convincingly and dramatically in a recent study published in the Journal of the American Medical Association. In their efforts to compare two different preparations of heparin for their respective abilities to prevent postoperative thrombosis, Juni and his colleagues revealed that diametrically opposing results can be obtained in different meta-analyses, depending on which of 25 scales is used to distinguish between high- and low-quality RCTs. The root of the problem is evident from the variability of weights given to three prominent features of RCTs (randomization, blinding, and withdrawals) by the 25 studies that have compared the two therapeutic agents.
In one investigation, for example, a third of the total weighting of the quality of the trial is afforded to both randomization and blinding, whereas in another, none of the quality scoring is derived from these features. Widely skewed intermediate values for the three aspects of RCTs under discussion are apparent from the 23 other scales presented. The astute reader will suspect immediately that sharply conflicting conclusions might be drawn from these different studies, and these fears are amply borne out by the forest plot presented in the study.
Here, each of the meta-analyses listed resolves the studies it has reviewed into high- and low-quality strata, based on its own scoring system. It can be seen that 10 of the studies selected show a statistically superior effect of one heparin preparation over the other, but only for the low-quality studies. Seven other studies reveal precisely the opposite effect, in which the high- but not the low-quality studies display a statistically significant superiority of low-molecular-weight heparin. Therefore, depending on which scale one uses, one can either demonstrate or refute the clinical superiority of one treatment over the other. In this manner, all the rigor and labor-intensive elements of the RCT and its interpretation by the meta-analysis are simply reduced to the subjective and undoubtedly capricious human element of value judgment through the arbitrary assignment of numbers in the weighting of experimental quality.38 Reduced to lay terms often used to describe the limits of computer capabilities, one might summarize this undertaking as an apt demonstration of the principle, "Garbage in, garbage out."
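The mechanism can be made concrete with a small sketch. Everything in it is hypothetical: the three trials, their attribute ratings, the two scales, their weights and the cutoff are invented for illustration and do not reproduce any of the 25 scales examined by Juni and his colleagues. It shows only that the same trials can land in different quality strata once the weights given to randomization, blinding and withdrawals change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """A hypothetical trial rated from 0.0 (poor) to 1.0 (good) per feature."""
    name: str
    randomization: float
    blinding: float
    withdrawals: float


def quality_score(trial, weights):
    """Weighted sum of the three feature ratings under a given scale."""
    return (weights["randomization"] * trial.randomization
            + weights["blinding"] * trial.blinding
            + weights["withdrawals"] * trial.withdrawals)


def high_quality(trials, weights, cutoff=0.5):
    """Names of the trials a scale would place in the high-quality stratum."""
    return {t.name for t in trials if quality_score(t, weights) >= cutoff}


# Invented ratings, for illustration only.
TRIALS = [
    Trial("A", randomization=1.0, blinding=1.0, withdrawals=0.0),
    Trial("B", randomization=0.0, blinding=0.0, withdrawals=1.0),
    Trial("C", randomization=1.0, blinding=0.0, withdrawals=1.0),
]

# Two invented scales: one rewards randomization and blinding only,
# the other rewards the handling of withdrawals only.
SCALE_X = {"randomization": 0.5, "blinding": 0.5, "withdrawals": 0.0}
SCALE_Y = {"randomization": 0.0, "blinding": 0.0, "withdrawals": 1.0}

print(sorted(high_quality(TRIALS, SCALE_X)))  # ['A', 'C']
print(sorted(high_quality(TRIALS, SCALE_Y)))  # ['B', 'C']
```

Trials A and B change strata purely because of the choice of scale. Any pooled estimate computed separately for the high- and low-quality strata would therefore be built from different trials under each scale, which is how opposing conclusions can emerge from the same body of evidence.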
6. Contradictions in Design
There appear to be contradictions in the design of the authors' comparison of spinal manipulative therapy to seven other treatment therapies (sham; conventional general practitioner; analgesics; physical therapy; exercises; back school; or a collection of therapies judged to be ineffective or even harmful, such as traction, corset, bedrest, home care, topical gel, diathermy, minimal massage or no treatment). Specifically:
7. Contradictions in Evaluating Statistical and Clinical Significance
One especially troubling situation arises with the authors' interpretation of the forest plots comparing spinal manipulative and sham therapies. In one instance (Figure 3 in the Annals article), spinal manipulative therapy is shown to have "clinically important" short-term improvements in pain and disability; however, these differences are deemed to have "failed to reach a conventional level of statistical significance." In comparing spinal manipulative therapy to the group of treatments deemed ineffective, however, we now find a statistically significant advantage for the former intervention. It is perplexing indeed to then find the authors stating, "The clinical significance of this finding is questionable" (emphasis mine).1 In the simplest of terms, one cannot have it both ways. It would almost seem as if there were a deliberate effort to minimize a treatment effect of potential interest pertaining to spinal manipulation.
8. Data Are Not Shown in Critical Areas of Interest
Given the aforementioned arbitrary characteristics of meta-analyses, and perhaps of the authors' presentation, one has every reason to wish for the opportunity to examine the data that support the authors' contention that "our sensitivity analyses supported the robustness of our results with respect to the type of manipulative therapy, profession of the manipulator, and the quality of the studies included."1 However, no data pertaining to these critical areas were presented in the body of the paper. The issue is particularly important with regard to the skill and training of the manipulator, who at times has been misrepresented in the scientific literature.39,40 It is questionable how effectively the authors were able to draw comparisons of different chiropractic techniques, as they overlooked the most recent and arguably most comprehensive attempts to do so from the points of view of both clinical effectiveness41 and a literature review.42
9. Clinical vs. Fastidious Treatments
Some treatments (traction, diathermy, minimal massage) have been deemed by the authors to lack sufficient evidence for their effectiveness as stand-alone applications, and as such, have been rejected from consideration in this investigation. What is not clear, however, is whether they are effective in a synergistic manner as ancillary treatments, and whether they have been excluded as potentially helpful adjuncts to manual therapy. This was alluded to in Feinstein's discussion of meta-analyses presented above (critique #5).36
10. Lack of Long-Term Follow-Up
In this study, follow-ups for back pain outcome assessments are limited to six months. However, numerous studies cite recurrences of low back pain for up to one year.43-45 This not only makes the definition of an episode problematical,46 but demands that follow-up times of at least a year be observed to assess a more durable and perhaps economical treatment effect. Indeed, the longevity of treatment effects of spinal manipulation in managing back pain for 12 months47-49 to three years50 has been amply demonstrated. In comparison to medications for the treatment of headaches, spinal manipulation has been shown to be markedly superior.51,52 As in several aforementioned areas of this study, this particular aspect of comparing treatments might be expected to diminish the actual capacity of spinal manipulation to display its full benefits.
From a variety of perspectives, these meta-analyses appear flawed, having either obscured or overlooked the maximal clinical benefits that might be expected to have been conferred upon patients by spinal manipulation, particularly as performed by a chiropractor. The patient response to intervention is far more complex than the dimensions offered by the authors in their discussion. Tonelli points out, for example, that there will always be a region called an epistemological zone53 in which discrete differences between individuals cannot be made explicit and quantified. This degree of sophistication is best summarized by Horwitz, who points out that to assume that the entire range of clinical treatment in any modality has been successfully captured by the precision of existing analytical methods in the scientific literature, "would be like saying that a medical librarian who has access to systematic reviews, meta-analyses, Medline, and practice guidelines provides the same quality of healthcare as an experienced physician."54 Hopefully, these shortcomings in the current meta-analyses can be appreciated by the public and addressed more meaningfully in future research.