ASA NEWSLETTER
 
 
December 2007
Volume 71
Number 12

Outcomes Research in Pain Medicine: What Is the 'Gold Standard'?

Craig T. Hartrick, M.D.


Randomized controlled trials (RCTs) are widely recognized as the “gold standard” in clinical research. They enjoy the highest levels of respect from editors and reviewers, and they carry the greatest weight when cumulative evidence is judged. The ability of RCTs to reduce selection bias is remarkable, and their reputation is well deserved, especially for noninvasive and drug therapies. Blinding of subjects and investigators and the use of placebo controls are additional measures aimed at reducing bias. Invasive therapies such as surgery and interventional pain procedures, however, pose special problems for reducing bias through RCTs. This discussion briefly highlights some of these difficulties and recommends an alternative approach to outcomes study design in pain medicine.

Randomized trials using placebo or sham intervention controls raise both scientific and ethical problems when invasive treatments (interventional pain procedures and surgery) are studied.1 Although a comprehensive discussion is beyond the scope of this article, suffice it to say that the more dramatic the intervention, the more pronounced and prolonged the resulting “meaning response” will be.2 With interventional pain therapies, this effect can last long enough to be confounded with the natural history of the disease. RCTs that use the best available alternative therapy as the control may be preferable3 but cannot account for the added “meaning” and expectations associated with the intervention. Double-blind, double-dummy designs address the scientific concerns but not the ethical concern of exposing subjects to a sham intervention. More importantly, in an effort to minimize bias, RCTs further attempt to eliminate confounding factors through rigid inclusion and exclusion criteria. The resulting uniformity of patients often severely limits how well the study results apply to “real world” practice.

Observational studies, while capturing “real world” scenarios, have been much maligned for overestimating treatment effects. Although examples are too numerous to list, the failings lie in the design of these studies, not in the act of observation. Poorly designed observational studies suffer from selection bias, often relying on historical controls, controls from unblinded studies, non-randomly assigned controls or controls drawn from previous work. Collecting control data over a different period from the one in which the intervention groups were studied further degrades quality.

In contrast, high-quality observational studies incorporate many of the design features commonly seen in RCTs. Prospective data collection and an intent-to-treat (ITT) approach permit high-quality cohort and matched case-control analyses. Such a restricted cohort design includes a “zero time” baseline, inclusion and exclusion criteria, and adjustment for baseline susceptibility factors identified by prior research. When observational studies are appropriately designed, the associations they report do not systematically overestimate the treatment effect.4 In fact, when contemporaneous controls are used, large, well-designed observational studies show less variability, less heterogeneity and a much lower likelihood that repeated studies will produce results in opposite directions than RCTs do.4 This may reflect the realistic, practical nature of the design as well as its broad representation of the population at risk.
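As an illustration of the matching idea only, the sketch below pairs each treated patient with the untreated, contemporaneous patient whose baseline susceptibility is closest (within a fixed caliper) and then averages the within-pair outcome differences. It assumes a single hypothetical numeric `baseline_score` recorded at zero time and a hypothetical `outcome` measure; greedy 1:1 nearest-neighbor matching is a swapped-in technique, not a method stated in this article, and real studies typically match or adjust on several factors at once.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Patient:
    pid: str
    baseline_score: float  # hypothetical baseline susceptibility score at "zero time"
    outcome: float         # hypothetical follow-up outcome, e.g. change in pain score


def greedy_match(treated, controls, caliper):
    """Greedy 1:1 nearest-neighbor matching on baseline_score, without replacement.

    Treated patients with no control within `caliper` are left unmatched.
    """
    available = list(controls)
    pairs = []
    # Process treated patients in a fixed order so the result is reproducible.
    for t in sorted(treated, key=lambda p: p.baseline_score):
        if not available:
            break
        best = min(available, key=lambda c: abs(c.baseline_score - t.baseline_score))
        if abs(best.baseline_score - t.baseline_score) <= caliper:
            pairs.append((t, best))
            available.remove(best)  # each control is used at most once
    return pairs


def matched_difference(pairs):
    """Mean treated-minus-control outcome across matched pairs (needs >= 1 pair)."""
    return mean(t.outcome - c.outcome for t, c in pairs)


treated = [Patient("t1", 5.0, 4.0), Patient("t2", 8.0, 3.0), Patient("t3", 12.0, 5.0)]
controls = [Patient("c1", 5.2, 2.0), Patient("c2", 7.9, 2.5), Patient("c3", 20.0, 0.0)]
pairs = greedy_match(treated, controls, caliper=0.5)
print([(t.pid, c.pid) for t, c in pairs], matched_difference(pairs))
```

Here `t3` has no control within the caliper and is dropped, which narrows the population the estimate describes; that trade-off is one reason the caliper is best fixed before outcomes are examined.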

Still, identifying outcome predictors through subgroup analysis is not without pitfalls. The approach has been derided as “data mining” or “data dredging,” or likened to a “fishing expedition.” As a rule of thumb, even when the subgroup effect is as large as the overall treatment effect, roughly four times as many subjects are needed to detect it with the same power. A stronger relationship, in which the subgroup effect is twice the overall effect, eliminates the power problem completely, but the number of subjects required rises steeply (with the inverse square of the effect size) as the relationship weakens.5 Further, testing many potential predictors through multiple comparisons inflates the risk of false-positive findings, and correcting for those comparisons erodes the study’s power. These defects are effectively overcome when: 1) the study is powered to handle the subgroup analysis; 2) subgroups are restricted to those suggested by biological factors or experience (not derived from the data); and 3) subgroups are defined before data collection.5
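The arithmetic behind the factor of four can be sketched with a normal approximation. Assume two equal-sized subgroups in a trial powered for the overall effect: each subgroup estimate then has twice the variance of the overall estimate, so the difference between them (the interaction) has four times the variance, that is, twice the standard error. The function names and the 80 percent default are illustrative assumptions, and the two-sided z-test ignores the negligible opposite tail.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sided(effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test (ignores the negligible far tail)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(effect) / se - z_crit)


def interaction_power(overall_power=0.80, ratio=1.0, alpha=0.05):
    """Power to detect an interaction `ratio` times the overall effect.

    Assumes two equal-sized subgroups in a trial powered for the overall effect.
    Effects are expressed in units of the overall effect (overall effect = 1).
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se_overall = 1.0 / (z_crit + Z.inv_cdf(overall_power))
    # Half-sample estimates have 2x the variance; their difference has 4x,
    # so the interaction standard error is twice the overall standard error.
    se_interaction = 2.0 * se_overall
    return power_two_sided(ratio, se_interaction, alpha)


def sample_size_inflation(ratio):
    """Factor by which n must grow for the interaction test to match overall power."""
    return 4.0 / ratio ** 2


def familywise_error(m, alpha=0.05):
    """Chance of at least one false positive among m independent null tests."""
    return 1 - (1 - alpha) ** m


for k in (0.5, 1.0, 2.0):
    print(k, round(interaction_power(0.80, k), 3), sample_size_inflation(k))
print(round(familywise_error(5), 3))
```

Under these assumptions `interaction_power(0.80, 1.0)` comes out near 0.29, `interaction_power(0.80, 2.0)` returns to 0.80, and the required sample grows as 4/k² (16-fold when the interaction is half the overall effect). `familywise_error(5)` is about 0.23: the chance that five independent tests of ineffective predictors yield at least one false positive at p = 0.05.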

Subgroup results should then be interpreted cautiously, and only after the relationship has been confirmed by regression analysis or another formal test of interaction (a single additional test). Computer modeling has been used to demonstrate what happens when subgroup analyses are performed without a specific assessment of interaction.5 When the overall treatment effect was not significant (at p = 0.05), formal interaction tests yielded significant findings in about 5 percent of cases, as expected, whereas separate subgroup analyses found at least one subgroup significant in 7 to 26 percent of cases. When the overall treatment effect is positive, the distortion from separate subgroup analyses is even more extreme; again, however, regression analysis produced the expected results. To avoid over-interpretation, then, subgroup analyses should be viewed as hypothesis generators unless there is strong a priori evidence. Confidence intervals and group-specific effect estimates should be used only after a direct interaction has been established.
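The contrast between separate subgroup tests and a single interaction test can be reproduced in miniature. The sketch below is not the modeling from reference 5; it is a simplified Monte Carlo simulation assuming only two subgroups, known and equal standard errors, and effect estimates drawn directly from normal distributions. It counts how often at least one subgroup-specific z-test is significant, how often exactly one is (the pattern that invites a spurious claim that the subgroups differ), and how often the formal interaction test is significant.

```python
import random
from statistics import NormalDist


def simulate_subgroup_tests(n_reps=20000, subgroup_effects=(0.0, 0.0), se=1.0,
                            alpha=0.05, seed=1):
    """Monte Carlo comparison of separate subgroup tests vs. one interaction test.

    Assumes two subgroups whose effect estimates are independent draws from
    Normal(true subgroup effect, se), with se known and equal in both.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    any_sig = exactly_one = interaction_sig = 0
    for _ in range(n_reps):
        estimates = [rng.gauss(mu, se) for mu in subgroup_effects]
        # Separate z-test within each subgroup.
        n_sig = sum(abs(e) / se > z_crit for e in estimates)
        any_sig += int(n_sig >= 1)
        exactly_one += int(n_sig == 1)
        # Formal interaction test on the difference between subgroups;
        # the difference of two independent estimates has SE se * sqrt(2).
        if abs(estimates[0] - estimates[1]) / (se * 2 ** 0.5) > z_crit:
            interaction_sig += 1
    return {
        "any_subgroup": any_sig / n_reps,
        "exactly_one": exactly_one / n_reps,
        "interaction": interaction_sig / n_reps,
    }


print(simulate_subgroup_tests(subgroup_effects=(0.0, 0.0), seed=1))
print(simulate_subgroup_tests(subgroup_effects=(2.0, 2.0), seed=2))
```

With no true effect in either subgroup, roughly 10 percent of replicates show at least one significant subgroup (1 − 0.95²), against about 5 percent for the interaction test. With an identical true effect of two standard errors in both subgroups, about half of replicates show exactly one significant subgroup even though the subgroups do not differ, while the interaction test stays near 5 percent.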

Concerns about study power, especially when examining predictors (or subgroups) with relatively small effects, are justified. After all, power calculations are performed to ensure that a large enough, representative sample of the population is studied. The more closely the sample resembles the actual population, the more reliable the findings. Large nationwide (and even international) outcomes studies are now possible using uniform data collection tools and the Internet. Benchmarking within practices, between practices and around the world can be used to monitor individual patient progress and treatment-site practices and to establish best practices.
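One common way to benchmark sites against a pooled rate is a funnel plot, in which each site's rate is compared with control limits that narrow as the site's sample size grows. The sketch below assumes a hypothetical binary outcome (for example, whether a patient met a predefined responder threshold), hypothetical site labels, and normal-approximation limits around the pooled proportion; it is one possible approach, not a method described in this article.

```python
from math import sqrt
from statistics import NormalDist


def funnel_limits(p_benchmark, n, level=0.95):
    """Normal-approximation control limits for a proportion observed in n patients."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_benchmark * (1 - p_benchmark) / n)
    return max(0.0, p_benchmark - half_width), min(1.0, p_benchmark + half_width)


def flag_sites(sites, level=0.95):
    """Compare each site's responder rate with the pooled rate across all sites.

    `sites` maps a hypothetical site label to (responders, patients).
    For simplicity the pooled rate includes the site being assessed.
    """
    total_resp = sum(r for r, _ in sites.values())
    total_n = sum(n for _, n in sites.values())
    p0 = total_resp / total_n
    flags = {}
    for name, (responders, n) in sites.items():
        lo, hi = funnel_limits(p0, n, level)
        rate = responders / n
        flags[name] = "below" if rate < lo else "above" if rate > hi else "within"
    return p0, flags


p0, flags = flag_sites({"A": (50, 100), "B": (30, 100), "C": (70, 100), "D": (250, 500)})
print(p0, flags)
```

Because the limits depend on each site's own sample size, a small practice is not flagged for a deviation that chance alone could explain, while a large practice with the same observed rate may be.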

Observational outcomes studies, when done thoughtfully, have the potential to be the new standard in interventional pain medicine. While “dredging” and “mining” still seem a bit negative, I personally love to “fish.” As most avid anglers will agree, there are techniques and strategies to improve success.

References:
1. Cahana A. Ethical and epistemological problems when applying evidence-based medicine to pain management. Pain Pract. 2005; 5:298-302.
2. Moerman D. The meaning response: Thinking about placebos. Pain Pract. 2006; 6:233-236.
3. Van Zundert J. Clinical research in interventional pain management techniques: The clinician’s point of view. Pain Pract. 2007; 7:221-229.
4. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000; 342:1887-1892.
5. Brookes ST, Whitley E, Peters TJ, et al. Subgroup analyses in randomised controlled trials: Quantifying the risks of false-positives and false-negatives. Health Technol Assess. 2001; 5(33):1-56.



    Craig T. Hartrick, M.D., is Director, Anesthesiology Research, William Beaumont Hospitals, Royal Oak and Troy, Michigan.




The views expressed herein are those of the authors and do not necessarily represent or reflect the views, policies or actions of the American Society of Anesthesiologists.

 
