Randomized
controlled trials (RCTs) are widely recognized as
the “gold standard” in clinical research.
They enjoy the highest levels of respect from editors
and reviewers, and they garner the highest weighting
when cumulative evidence is judged. The ability
of RCTs to reduce selection bias is remarkable,
and the reputation is well deserved, especially
with respect to noninvasive and drug therapies.
The blinding of subjects and investigators and the
use of placebo control are additional measures aimed
at reducing bias. Invasive therapies such as surgery
and interventional pain procedures, however, present
special problems in effectively reducing bias through
the use of RCTs. This discussion briefly highlights
some of these difficulties and recommends an alternative
approach to outcomes study design in pain medicine.
Randomized trials using placebo or sham intervention
controls have both scientific and ethical problems
when invasive treatment approaches (interventional
pain procedures and surgery) are studied.1
While a more comprehensive discussion is beyond the
scope of this presentation, suffice it to say that
the more dramatic the intervention, the more pronounced
and prolonged the “meaning response” will be.2
In the case of interventional pain therapies, this
effect can be long enough to be confounded with
the natural history of the disease process. RCTs
using the best available alternative therapy as a
control may be preferable,3
but they cannot account for the added “meaning”
and expectations associated with the intervention.
Double-blind, double-dummy designs address the scientific
but not the ethical concerns of exposing subjects
to a sham intervention. More importantly, in an effort
to minimize bias, RCTs further attempt to eliminate
confounding factors by using rigid inclusion and
exclusion criteria. The resulting uniformity of
patients often severely limits the applicability
of the study results to “real world”
situations.
Observational studies, while encompassing “real
world” scenarios, have been much maligned
for overestimating treatment effects. While examples
are too numerous to mention, the failings have to
do with the design of these studies, not the act
of observation. Poorly designed observational studies
suffer from selection bias, often using historical
controls, controls from unblinded studies, non-randomly
assigned controls or controls from previous work.
Control data collected during a different period from
the one in which the intervention group was studied
further degrade quality.
In contrast, high quality observational studies
incorporate many of the same design features commonly
seen in RCTs. Prospective data collection and an
intent-to-treat (ITT) approach permit high-quality
cohort and matched case control analyses to be performed.
Such a restricted cohort design includes a “zero
time” for baseline, inclusion/exclusion criteria
and adjustments for baseline susceptibility factors
(based upon prior research). When observational
studies are appropriately designed, the associations
noted do not systematically overestimate the treatment
effect.4
In fact, when contemporaneous controls are used,
large-scale, well-designed observational studies
demonstrate less variability, less heterogeneity
and a greatly reduced likelihood for repeated studies
to produce results in opposite directions when compared
to RCTs.4
This may be due to the realistic and practical nature
of the design as well as the broad representation
of the population at risk.
Still, defining outcomes predictors using subgroup
analysis is not without pitfalls. The approach has
been derogatorily described as “data mining,”
“data dredging” or likened to a “fishing
expedition.” As a rule of thumb, even when
the subgroup (interaction) effect is as large as
the overall treatment effect, roughly four times
as many subjects are needed to detect it with the
same power. A stronger relationship, in which
the subgroup effect is twice the overall effect,
eliminates the power problem completely, but
the number of subjects needed rises steeply
as the relationship weakens.5
Further, one must consider the detrimental effect
on the study’s power when multiple comparisons
are used to test many potential predictors. These
defects are effectively overcome when: 1) the study
is powered to handle the subgroup analysis; 2) subgroups
are restricted to those that are suggested by biological
factors or experience (not derived data); and 3)
subgroups are defined prior to data collection.5
The cautious interpretation of the subgroup analysis
then follows only after confirmation of the relationship
is made using regression analysis or another formal
test of interaction (a single additional test).
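The rule of thumb on subgroup power can be checked with a short calculation. The sketch below is an illustration only, not the method used in reference 5. It assumes a continuous outcome with known standard deviation, 1:1 randomization, two equal-sized subgroups and a two-sided z-test. The trial size and all function names are hypothetical. Under these assumptions the standard error of the interaction (the difference between the two subgroup effects) is twice that of the overall effect, so a trial with 80 percent power for the overall effect has only about 29 percent power for an interaction of the same size.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(effect, se, alpha=0.05):
    """Power of a two-sided z-test for a true effect with standard error se."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def se_overall(n_total, sigma=1.0):
    """SE of the overall treatment effect with n_total/2 subjects per arm."""
    return (4 * sigma ** 2 / n_total) ** 0.5


def se_interaction(n_total, sigma=1.0):
    """SE of the difference between two subgroup effects (n_total/4 per cell)."""
    return (16 * sigma ** 2 / n_total) ** 0.5


n = 400  # hypothetical trial size (assumption for illustration)

# Pick the overall effect that this trial detects with 80 percent power.
z_sum = STD_NORMAL.inv_cdf(0.975) + STD_NORMAL.inv_cdf(0.80)
effect = z_sum * se_overall(n)

power_overall = power_two_sided(effect, se_overall(n))
# Interaction equal in size to the overall effect, same trial.
power_same_size = power_two_sided(effect, se_interaction(n))
# Same interaction, but with four times as many subjects.
power_four_times_n = power_two_sided(effect, se_interaction(4 * n))
# Interaction twice the overall effect, original trial size.
power_double_effect = power_two_sided(2 * effect, se_interaction(n))

print(f"overall effect, n subjects:        {power_overall:.2f}")
print(f"equal-size interaction, n:         {power_same_size:.2f}")
print(f"equal-size interaction, 4n:        {power_four_times_n:.2f}")
print(f"interaction twice as large, n:     {power_double_effect:.2f}")
```

In this simplified setting, quadrupling the sample or doubling the interaction restores the original power, which is the pattern the rule of thumb describes.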
Computer modeling has been used to demonstrate the
impact of subgroup analysis without specific interaction
assessment.5
When the overall treatment effect was not significant
(at the 0.05 level), formal interaction tests returned
the expected 5 percent of significant findings, whereas
subgroup analysis found at least one group to be
significant in 7 to 26 percent of cases. When the
overall treatment effect is positive, the subgroup
overestimation is even more extreme. However, as
in the previous case, regression analysis correctly
predicted the expected results. It follows that,
to avoid over-interpretation and absent
strong a priori evidence, subgroup analyses
should be viewed as hypothesis generators.
Confidence intervals and group-specific
effects should then be reported only after a direct
interaction is established.
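The contrast between separate subgroup tests and a single interaction test can be reproduced with a small simulation. The sketch below is a simplified illustration and not the model from reference 5. It assumes no true treatment effect, two equal-sized subgroups, a normally distributed outcome with known unit variance and two-sided z-tests. The cell size, number of simulated trials, seed and function names are all hypothetical. With two independent subgroup tests at the 0.05 level, the chance that at least one is significant is 1 - 0.95^2, or about 10 percent, while the single interaction test stays near 5 percent.

```python
import random
from statistics import NormalDist, fmean

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided critical value at alpha = 0.05


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def simulate_null_trial(rng, n_per_cell=20):
    """Simulate one trial with no true effect in either of two subgroups.

    Returns z statistics for subgroup A, subgroup B and their interaction.
    """
    effects = []
    for _ in range(2):  # two subgroups
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_cell)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_cell)]
        effects.append(fmean(treated) - fmean(control))
    se_subgroup = (2.0 / n_per_cell) ** 0.5     # known sigma = 1
    se_interaction = (4.0 / n_per_cell) ** 0.5  # difference of two effects
    z_a = effects[0] / se_subgroup
    z_b = effects[1] / se_subgroup
    z_int = (effects[0] - effects[1]) / se_interaction
    return z_a, z_b, z_int


def false_positive_rates(n_trials=10000, seed=12345):
    """Return (rate of >=1 significant subgroup, rate of significant interaction)."""
    rng = random.Random(seed)
    any_subgroup = 0
    interaction = 0
    for _ in range(n_trials):
        z_a, z_b, z_int = simulate_null_trial(rng)
        if abs(z_a) > Z_CRIT or abs(z_b) > Z_CRIT:
            any_subgroup += 1
        if abs(z_int) > Z_CRIT:
            interaction += 1
    return any_subgroup / n_trials, interaction / n_trials


if __name__ == "__main__":
    subgroup_rate, interaction_rate = false_positive_rates()
    print(f"analytic rate, two subgroup tests: {familywise_error(2):.4f}")
    print(f"simulated, any subgroup significant: {subgroup_rate:.4f}")
    print(f"simulated, interaction significant:  {interaction_rate:.4f}")
```

Adding more subgroups to the loop pushes the first rate higher still, which is one reason to limit subgroups to those defined before data collection.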
Concerns over study power, especially when examining
predictors (or subgroups) with relatively small
effects, are justified. After all, these calculations
are performed to ensure that a large enough representative
sample of the population is exposed. The more the
sample begins to resemble the actual population,
the more reliable the findings. Large-scale nationwide
(and even international) outcomes studies are now
possible using uniform data collection tools and
the Internet. Benchmarking within practices, between
practices and around the world can be used to monitor
specific patient progress, to compare treatment site
practices and to establish best practices.
Observational outcomes studies, when done thoughtfully,
have the potential to be the new standard in interventional
pain medicine. While “dredging” and
“mining” still seem a bit negative,
I personally love to “fish.” As most
avid anglers will agree, there are techniques and
strategies to improve success.
References:
1. Cahana A. Ethical and epistemological problems
when applying evidence-based medicine to pain management.
Pain Pract. 2005; 5:298-302.
2. Moerman D. The meaning response: Thinking about
placebos. Pain Pract. 2006; 6:233-236.
3. Van Zundert J. Clinical research in interventional
pain management techniques: The clinician’s
point of view. Pain Pract. 2007; 7:221-229.
4. Concato J, Shah N, Horwitz RI. Randomized, controlled
trials, observational studies, and the hierarchy
of research designs. N Engl J Med. 2000;
342:1887-1892.
5. Brookes ST, Whitley E, Peters TJ, et al. Subgroup
analyses in randomised controlled trials: Quantifying
the risks of false-positives and false-negatives.
Health Technol Assess. 2001; 5(33):1-56.
Craig T. Hartrick, M.D., is Director, Anesthesiology
Research, William Beaumont Hospitals, Royal
Oak and Troy, Michigan.