DOI: https://doi.org/10.20529/IJME.2012.007
Much of the evidence-base from research is biased. Systematically assembled, quality-appraised, and appropriately summarised reviews of the effects of interventions from all relevant intervention studies are needed, in order to use research evidence to reliably inform health decisions. The Cochrane Library is an online collection of six searchable, up-to-date, evidence-based databases that is available free to access by anyone in India, thanks to a national subscription purchased by the Indian Council of Medical Research. This valuable resource contains the world’s single largest collection of systematic reviews and controlled clinical trials, as well as bibliographic details and records of methodological research, health technology assessments and economic analyses. The robust and transparent methods pioneered and used in Cochrane systematic reviews, and independence from industry funding facilitate the detection of biased, deceptive and fraudulent research, and have earned these reviews the reputation of being trusted sources of evidence to inform health decisions. Cochrane reviews have had considerable impact on academic medicine; have informed health practices, policies and guidelines; improved health outcomes; and saved numerous lives.
An editorial in the previous issue of this journal summarised the results of empirical research revealing that much of the evidence from research that is integral to the practice of evidence-based medicine cannot be trusted (1). This does not mean that none of the evidence can be trusted. However, it does require a special effort to identify sources of reliable evidence, to understand how this should be assessed, and the amount of confidence one can place in this evidence.
The first step in evidence-informed healthcare is to find relevant evidence that is free of the risk of bias. Randomised controlled trials (RCTs) are considered the type of study design that is least likely to provide biased estimates when assessing the effects of interventions. However, the language of research facilitates deception, through the use of descriptive terms that are widely employed to describe studies that have not necessarily used, or used adequately, the methods required to provide “Gold Standard” evidence that RCTs are assumed to provide (2, 4). Moreover, the results of a single RCT are unlikely to be generalisable to all situations where the intervention may be used. The results of different RCTs of the same intervention and control comparison may also differ substantially.
The least biased evidence that addresses these issues regarding the effects of interventions comes from well-conducted systematic reviews and meta-analyses of all RCTs conducted that compare an intervention to no intervention (or placebo), and to other interventions commonly used for that health condition. If RCTs are not ethical, practical, or feasible, then systematic reviews of particular types of well-conducted observational studies could provide alternative sources of evidence.
Systematic reviews use explicit and systematic methods to search for, locate, and retrieve; critically appraise for the risk of bias; reliably extract and analyse data from all relevant research studies addressing a focused clinical question, and summarise the overall results. They, therefore, provide information that individual trials cannot. Many systematic reviews, though not all, synthesise their results using meta-analyses. Meta-analysis is the statistical technique that aggregates the numerical data for each relevant outcome from the primary studies that are sufficiently similar in their participants, interventions, methods, and outcomes to combine in a clinically meaningful manner.
Systematic reviews in the Cochrane Database of Systematic Reviews (CDSR), one of six evidence-based databases that form part of The Cochrane Library (www.thecochranelibrary.com), are particularly reliable sources of evidence, as are systematic reviews that use the methods pioneered by the Cochrane Collaboration (www.cochrane.org). The resources in The Cochrane Library are free to access by anyone in India with a computer and an internet connection, thanks to a national subscription purchased by the Indian Council of Medical Research (ICMR) since 2007, and renewed for a further three years in 2010. More than half the world’s population also has free access to this valuable resource due to various sponsored initiatives or licensing agreements (5).
Only about 20% of reviews published each year are Cochrane systematic reviews. However, empirical research reveals that Cochrane systematic reviews are scientifically more rigorous, more likely to be up to date, and less biased in their methods and interpretation than non-Cochrane systematic reviews (6, 7). The 2010 impact factor for the CDSR was 6.186. The CDSR is now ranked in the top 10 of the 151 journals in the Medicine, General and Internal category, and receives the seventh highest number of citations in its category.
Other, more important, examples of the impact of Cochrane reviews include informing the guidelines of many agencies, including those of the World Health Organisation, and influencing global, national and regional health policies. Cochrane reviews in many topic areas have given clinicians, patients and their care-givers access to reliable evidence that has improved health outcomes and saved numerous lives.
The major reasons that contribute to the reliability of Cochrane systematic reviews stem from the rigorous methods used in their preparation. These methods are described in the Cochrane Handbook for Systematic Reviews of Interventions (www.cochrane-handbook.org/). They include:
Hence, apart from searching different online databases, Cochrane reviews routinely search the Cochrane Central Register of Controlled Clinical Trials (CENTRAL), the world’s largest repository of information regarding clinical trials that forms part of The Cochrane Library. It includes details of published articles taken from multiple bibliographic databases, other published resources, and from unpublished sources. Cochrane reviews also routinely search the specialised registers of the respective collaborative review groups supporting the review. Experts in the field and drug manufacturers are contacted for further, often unpublished, information, as well as the authors of identified studies; and the cross references of these studies are searched for further references. Clinical trials registries are also searched for on-going trials. No language restrictions are applied in the search strategy in order to avoid language bias; regional databases are also searched, and retrieved reports are translated, if needed.
An example is provided in Figure 1 that displays a (fictitious) meta-analysis (or forest plot) comparing drug A with drug B for the treatment of obesity. The outcome assessed in the figure is the risk of death.
In this hypothetical example, the five trials (identified in the rows in column 1 by the last name of the first author and year of publication) included in the meta-analysis randomised 930 adults to anti-obesity drug A, of whom 51 died (columns two and three), and 928 adults to anti-obesity drug B, of whom 72 died (columns three and four). The variance in the trial by Pai 2010 was the least, since it provided the most information (largest sample size and most deaths) and had the most precise results (narrowest confidence interval), and hence it is assigned the most weight (66.6%) (column four) in the meta-analysis. Jessani 2005 had the next largest sample size and next highest number of deaths, but gets the least weight (6.9%) since it had the fewest deaths in the control group (drug B), and the least precise results. The rows in the sixth column display the numerical values of the relative risk (RR) and 95% confidence interval (CI) for the comparisons from each trial (without differential weighting). This is also graphically displayed in the last column as a forest of horizontal lines (hence the name “forest plot”; if there were many more trials, the resemblance to a forest of lines would be even more apparent) scattered around the vertical line that touches the base of the plot at the RR of 1 (no significant difference). The rectangular blob in the middle of each horizontal line represents the RR estimate for that trial. The size of the blob is proportionate to the weight assigned to each trial. The width of each horizontal line depicts the upper and lower limits of the 95% confidence interval.
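The RR and 95% CI shown for each trial in a forest plot can be sketched as follows. This is a minimal illustration that assumes the usual log-scale (normal approximation) confidence interval; the function name `relative_risk` and the counts in the example are hypothetical and are not taken from Figure 1.

```python
import math


def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Return (RR, lower, upper) for drug A versus drug B in a single trial.

    Assumes the standard large-sample confidence interval computed on the
    log scale; z=1.96 gives an approximate 95% interval.
    """
    risk_a = events_a / total_a  # risk of death with drug A
    risk_b = events_b / total_b  # risk of death with drug B
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial samples
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical trial: 10 deaths among 100 on drug A, 20 among 100 on drug B
rr, lower, upper = relative_risk(10, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

In this hypothetical trial the interval crosses 1, so the horizontal line for the trial would touch the vertical line of no difference even though the point estimate favours drug A.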
The pooled results of the five trials (proportionately weighted in the formula used for the meta-analysis to yield the weighted average) are provided in the last row. The pooled RR is 0.70 [95% CI 0.50 – 0.99], and represents the average risk of death with drug A compared to drug B. The diamond at the bottom of the graph in the last column includes the pooled RR and confidence limits of the five trials. The upper limit of the pooled 95% CI in the diamond [RR = 0.99] stops short of touching the vertical line [RR = 1].
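The weighted average in the last row can be illustrated with a short sketch. The text says only that a fixed effect model was used; the Mantel-Haenszel method shown here is an assumption (it is one common fixed-effect method for dichotomous outcomes), the confidence interval uses the Greenland-Robins variance estimate, and the three trials are hypothetical rather than the five trials of Figure 1.

```python
import math

# Hypothetical trials: (name, deaths on A, total on A, deaths on B, total on B)
TRIALS = [
    ("Trial X", 10, 100, 20, 100),
    ("Trial Y", 5, 50, 8, 50),
    ("Trial Z", 30, 300, 45, 300),
]


def mantel_haenszel_rr(trials, z=1.96):
    """Pooled fixed-effect RR (Mantel-Haenszel), its CI, and % weights."""
    r_sum = 0.0  # sum of a * n_b / N
    s_sum = 0.0  # sum of c * n_a / N (these terms are the trial weights)
    p_sum = 0.0  # numerator of the Greenland-Robins variance
    raw_weights = {}
    for name, a, n_a, c, n_b in trials:
        n = n_a + n_b
        r_sum += a * n_b / n
        weight = c * n_a / n
        s_sum += weight
        p_sum += (n_a * n_b * (a + c) - a * c * n) / n**2
        raw_weights[name] = weight
    pooled = r_sum / s_sum
    se_log = math.sqrt(p_sum / (r_sum * s_sum))
    lower = math.exp(math.log(pooled) - z * se_log)
    upper = math.exp(math.log(pooled) + z * se_log)
    percent = {k: 100 * w / s_sum for k, w in raw_weights.items()}
    return pooled, lower, upper, percent


pooled, lower, upper, percent = mantel_haenszel_rr(TRIALS)
print(f"Pooled RR = {pooled:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
for name, w in percent.items():
    print(f"{name}: weight {w:.1f}%")
```

With this method the weight of a trial depends on the number of events in its control group, so a trial with few control-group deaths receives little weight even if its sample size is large.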
This example highlights the importance of evaluating effect sizes such as the RR and 95% CI, rather than relying only on p values < 0.05 to denote that the differences in the results are significant. The p value would continue to be < 0.05 even if the RR were 0.30, 95% CI 0.20 – 0.40; a result that is both statistically significant (both limits of the CI < 1), and clinically important (we estimate that drug A would reduce the risk of death by 70%, though the reduction could be as low as 60% or as much as 80%). This example also emphasises the need to examine the absolute effects to understand the true benefits and harms of interventions, in addition to the more impressive relative estimates of effects.
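The difference between relative and absolute effects can be made concrete with a small calculation. The sketch below converts an RR into a relative risk reduction, and then uses the crude totals quoted for Figure 1 (51 deaths among 930 adults on drug A, 72 among 928 on drug B) to compute an absolute risk difference. Treating the five trials as one large trial is a simplifying assumption made only for illustration, and the function name is hypothetical.

```python
def relative_risk_reduction(rr):
    """Percentage reduction in risk implied by a relative risk below 1."""
    return (1 - rr) * 100


# RR 0.30 with 95% CI 0.20 to 0.40, as in the text
for label, rr in [("estimate", 0.30), ("lower limit", 0.20), ("upper limit", 0.40)]:
    print(f"RR {rr:.2f} ({label}): {relative_risk_reduction(rr):.0f}% reduction")

# Crude risks from the totals quoted for Figure 1 (ignores trial weighting)
risk_a = 51 / 930  # drug A
risk_b = 72 / 928  # drug B
risk_difference = risk_b - risk_a
number_needed_to_treat = 1 / risk_difference

print(f"Risk with drug A: {risk_a:.1%}, with drug B: {risk_b:.1%}")
print(f"Absolute risk difference: {risk_difference * 100:.1f} percentage points")
print(f"Approximate number needed to treat: {number_needed_to_treat:.0f}")
```

On these crude figures the absolute difference is a little over two percentage points, which is a far more modest-looking result than the relative estimate.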
Clinical heterogeneity arises from differences in the clinical aspects of trials. Trials carried out in different countries; in different years or even decades; on different populations; with different definitions and thresholds for diagnosis; and with varying grades of severity of the health condition; are likely to yield results that differ considerably. Similarly, trials using interventions that differ in doses, formulations, combinations, routes, regimens, and durations of treatment; and comparing them with placebo or no treatment, and a myriad of alternative treatments with the same dizzying array of variations, will also yield differing results. Trials that use outcomes that are defined and ascertained in different ways, and at different time-points, will add to the possibility of yielding results that are inconsistent in a meta-analysis.
Trials in meta-analyses whose methods increase the risk of bias often differ in their results from those at low risk of bias, resulting in methodological heterogeneity. Finally, the results may be inconsistent purely by chance. Clinical and methodological reasons for heterogeneity can result in statistical heterogeneity, which is not uncommon. What is important is to identify if observed inconsistency in the results is due to chance (random error), and to what extent important differences in the trials contribute to the inconsistency. This will help determine if the results of the individual trials can still be pooled and presented as an average, or fixed effect, of the intervention across all the trials (hence the use of the term fixed effect meta-analysis in the figure legend and at the top of the last column).
However, if one inspects the graphical display of results in Figure 1, it is easily apparent that in Jessani 2005, the RR of 5.60; 95% CI 2.27 – 13.81 indicates that drug B was far more effective than drug A; a result that is in the opposite direction to the RR estimates of the other four trials. In the graphical display and the numerical description, the confidence limits in Jessani 2005 also clearly do not overlap with those of the other trials. Non-overlapping confidence intervals, especially if accompanied by effect estimates that differ in the direction of effects, are clear indications that the results of the trials included in the meta-analysis are not all consistent with the pooled result, raising the possibility of statistical heterogeneity. It is possible (though unlikely given the clear difference in the direction of effects) that this inconsistency in results is due to chance. The chi-square (χ²) test for homogeneity shown in the second last row reveals a very small p value, indicating that one can be 99.99999% sure that this inconsistency is not due to chance but due to differences in the trials (clinical or methodological heterogeneity).
Just as with the previous example, the p value from the chi-square test only provides us the certainty of excluding chance as an explanation of a result, and does not reveal how much of this inconsistency is actually important. The final notation in the second last row of the figure reveals an I² value of 87%. The I² statistic is derived from the chi-square test and reinterprets this to indicate the proportion of inconsistency that is due to true heterogeneity in the trials. The I² value of 87% indicates that only 13% of the inconsistency observed is due to chance and 87% is due to differences in the way the drug works in the trials. This degree of inconsistency is too large to ignore; and it would be unreasonable to assume that the pooled effect estimates provide a realistic average effect of drug A. Had this value been less than 25%, one might be less worried about heterogeneity in the meta-analysis since more than 75% of the differences in results of the five trials occurred by chance.
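The relationship between the chi-square statistic and I² can be sketched in a few lines. The sketch assumes the usual definition, I² = 100% × (Q − df) / Q with negative values set to zero, where Q is the chi-square statistic for heterogeneity and df is the number of trials minus one. Q is computed here with inverse-variance weights on the log RR scale, which may differ from the method used for the figure; the function names and the value of Q in the example are illustrative and not taken from Figure 1.

```python
def cochran_q(log_effects, variances):
    """Chi-square statistic for heterogeneity (Cochran's Q).

    log_effects: log RR of each trial; variances: variance of each log RR.
    Uses inverse-variance weights (an assumption of this sketch).
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))


def i_squared(q, df):
    """Percentage of variability in effect estimates beyond that expected by chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Illustrative only: with five trials (df = 4), a Q of about 30.8
# corresponds to an I-squared of about 87%
print(f"I-squared = {i_squared(30.8, 4):.0f}%")

# When Q is no larger than its degrees of freedom, I-squared is 0%
print(f"I-squared = {i_squared(3.0, 4):.0f}%")
```

Because Q is expected to be close to its degrees of freedom when the trials differ only by chance, I² expresses how far Q exceeds that expectation as a proportion of Q itself.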
For example, let us assume that the review authors had pre-specified that if substantial heterogeneity was detected, the trials would be sub-grouped by the presence of pre-existing risk factors for cardiac disease in participants. If Jessani 2005 had included many participants with previous episodes of angina or cardiac disease who were on medications, while the other trials had excluded such participants, the sub-group analysis of the forest plot would look different (Figure 2).
The meta-analysis now shows that, in the subgroup of trials where participants had no cardiac risk factors, drug A was far more effective than drug B in reducing the risk of death. There is no inconsistency in the results within this subgroup of trials [I² = 0%]. Death was more likely with drug A than drug B for those with previous angina on medications, and this could be due to the heart condition and / or to medication interactions. It would be meaningless to pool the results of the five trials now in the face of such substantial heterogeneity, and significant differences in effect estimates in the sub-groups. However, the inconsistency in the pooled results of the five trials helped us in understanding the differential effects of drugs A and B in those with cardiac risk factors that would not have been so apparent from the results of a single trial.
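A pre-specified subgroup analysis of this kind amounts to pooling the trials separately within each subgroup and checking the inconsistency within each. The sketch below assumes inverse-variance fixed-effect pooling on the log RR scale; the trial names, RR values, variances, and the helper names `pool_fixed` and `pool_by_subgroup` are all hypothetical and do not reproduce Figure 2.

```python
import math
from collections import defaultdict

# Hypothetical trials: (name, subgroup, RR, variance of log RR)
TRIALS = [
    ("Trial P", "no cardiac risk factors", 0.50, 0.04),
    ("Trial Q", "no cardiac risk factors", 0.60, 0.09),
    ("Trial R", "cardiac risk factors", 5.00, 0.20),
]


def pool_fixed(trials, z=1.96):
    """Inverse-variance fixed-effect pooled RR, 95% CI, and I-squared."""
    weights = [1 / var for _, _, _, var in trials]
    logs = [math.log(rr) for _, _, rr, _ in trials]
    total = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, logs)) / total
    se = math.sqrt(1 / total)
    # Heterogeneity within this set of trials
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, logs))
    df = len(trials) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "rr": math.exp(pooled_log),
        "lower": math.exp(pooled_log - z * se),
        "upper": math.exp(pooled_log + z * se),
        "i2": i2,
    }


def pool_by_subgroup(trials):
    """Group trials by subgroup label and pool each subgroup separately."""
    groups = defaultdict(list)
    for trial in trials:
        groups[trial[1]].append(trial)
    return {label: pool_fixed(members) for label, members in groups.items()}


results = pool_by_subgroup(TRIALS)
for label, r in results.items():
    print(
        f"{label}: RR {r['rr']:.2f} "
        f"(95% CI {r['lower']:.2f} to {r['upper']:.2f}), I-squared {r['i2']:.0f}%"
    )
```

Pooling within subgroups keeps effects that point in opposite directions from cancelling each other out in a single average.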
These and other methods, described in detail in the Cochrane Handbook, ensure that the results of Cochrane systematic reviews are robust and reliable. However, the numerical results alone may be insufficient to inspire confidence in the effects of the intervention, or to require a change in practice. Systematic reviews differ in the numbers of included trials that met inclusion criteria, or that provided data for each outcome in the reviews. They also differ in the risk of bias in the included studies; and even in those that contributed data for different outcomes within a review. Two systematic reviews addressing the same question may yield different conclusions and differ in the way they selected outcomes and defined outcome thresholds. A systematic review may conclude that drug A is recommended; and another systematic review may subsequently conclude that the drug is harmful, due to hitherto undisclosed adverse effects. A review may find that drug A causes fewer deaths than drug B but is less effective in treating obesity. Future comments in this journal will describe methods of integrating the numerical results with other important information when summarising the results of meta-analyses in systematic reviews, so that one can understand how much confidence to place in the overall evidence provided to reliably inform health decisions.
Competing interests: The author is a contributor to the Cochrane Collaboration (www.cochrane.org) and director of one of the 14 independent Cochrane Centres (www.cochrane-sacn.org) worldwide. He has received research funding, travel support, and hospitality from organisations that support evidence-based healthcare.
Funding support: The author is a salaried employee of the Christian Medical College, Vellore.
Declaration: This article has not been previously published or submitted for publication elsewhere.