Scientific Principles for Integrating and Evaluating the Available Data
Chapters 4 through 7 describe principles for considering the different categories of information likely to be available: in vitro data, human data, animal data, and data about related substances. This chapter describes how to weave different pieces of information together when complete safety data are not available and individual pieces of data may be inconclusive or inconsistent. The described approach, with its emphasis on biological plausibility and consistency across different types of data, agrees with scientifically accepted approaches for making judgments, as well as with the safety standard outlined by the Dietary Supplement Health and Education Act (DSHEA), which authorizes the Food and Drug Administration to act when there is unreasonable or significant risk that overturns the assumption of safety.
The first section of this chapter explains why it is not possible to use a formulaic or algorithmic approach to integrate and evaluate the data. The next section describes how the systematic integration of knowledge and judgment can be used to assess dietary supplements for risk in a practical, informative, and transparent manner using causal models. The second half of this chapter describes concepts for weighing evidence that may initially appear inconsistent, focusing especially on interpretation of negative data. Finally, a short but important discussion about proof of harm describes underlying principles that are important when determining if an unreasonable or significant risk exists.
A FORMULAIC APPROACH FOR INTEGRATING DATA IS NOT PRACTICAL
Under the current legal and regulatory framework, the safety of many dietary supplement ingredients is more difficult to evaluate than that of other substances because of a general lack of quality data in the public domain, as well as the lack of a requirement for premarket safety evaluation to drive future safety studies. In the absence of scientific studies specifically designed to assess the safety of dietary supplement ingredients, it is not possible to apply a specific algorithmic or formulaic approach to determining safety, and expert judgment in the interpretation of data is likely to be important, as it is for other substances. The Framework outlined in this report is different from frameworks that have formulaic components or that rely largely on assumptions that apply to the particular type of product being evaluated (see Appendix A). For example, the Flavor and Extract Manufacturers Association process relies largely on the fact that flavors will not be ingested in large amounts, and the Cosmetic Ingredient Review is unique in that many of the ingredients it evaluates are not bioavailable. The framework for Dietary Reference Intakes of the Institute of Medicine’s (IOM’s) Food and Nutrition Board includes a model for determining tolerable upper intake levels for ingestion of vitamins and minerals (IOM, 1998). The model is based on applying an uncertainty factor to the level at which no adverse events are observed from consumption of a nutrient, or if no data are available, to the lowest level of chronic intake at which adverse events are observed.
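As an illustration of what a formulaic component looks like, the tolerable upper intake level model described above can be sketched in a few lines. This is a minimal sketch, not the IOM procedure itself: the function name, parameter names, and numerical values are hypothetical, and the choice of uncertainty factor in practice is a matter of expert judgment that the code does not capture.

```python
def tolerable_upper_intake_level(noael=None, loael=None, uncertainty_factor=1.0):
    """Divide a no-observed-adverse-effect level (NOAEL) by an uncertainty
    factor; fall back to the lowest-observed-adverse-effect level (LOAEL)
    when no NOAEL is available. All intakes share one unit (e.g., mg/day)."""
    if uncertainty_factor < 1.0:
        raise ValueError("uncertainty factor is expected to be at least 1")
    starting_point = noael if noael is not None else loael
    if starting_point is None:
        raise ValueError("either a NOAEL or a LOAEL is required")
    return starting_point / uncertainty_factor


# Hypothetical values chosen only to show the arithmetic.
ul_from_noael = tolerable_upper_intake_level(noael=100.0, uncertainty_factor=2.0)
ul_from_loael = tolerable_upper_intake_level(loael=300.0, uncertainty_factor=10.0)
print(ul_from_noael, ul_from_loael)
```

A calculation of this kind presupposes a defensible NOAEL or LOAEL, which is precisely the kind of data that is often missing for dietary supplement ingredients.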
It is also not appropriate to develop a hierarchical approach to considering the different types of data—human data, animal data, in vitro data, or information about related substances—for various reasons. In part, such an approach is not feasible because of limitations in the quality of the data and what different types of studies can reveal, but these limitations can be overcome with other types of data. Although a hierarchical approach is not practical, it is possible to weigh the various types of data available to make conclusions regarding risk to human health. The second part of this chapter provides guidance on comparing animal and human studies with seemingly inconsistent results, but each situation will need to be evaluated to weigh the data appropriately.
AMOUNT OF INFORMATION NEEDED TO DRAW A CONCLUSION
GUIDING PRINCIPLE: In the absence of scientific studies designed specifically to test the safety of a dietary supplement, concern for public safety may be raised by the presence of even a few reports of possible safety concerns when viewed together and constituting the weight of available evidence.
Even if there are only one or two convincing reports of safety concerns about a dietary supplement, whether from in vitro, animal, or human data, it may not be necessary to gather much additional information to raise concern about the implications for public health. However, in other cases, it may be necessary to assemble several data reports and reach a conclusion about risk based on the totality of available evidence, overall consistency, and biological plausibility of the evidence (a “weight of evidence” approach). In the absence of data on the safety of a specific ingredient, convincing information about the safety of chemically or functionally related substances may be used to judge concern.
GUIDING PRINCIPLE: Integration of data across different categories of information and types of study design can enhance biological plausibility and identify consistencies, leading to conclusions regarding levels of concern for an adverse event that may be associated with use of a dietary supplement ingredient.
Individual pieces of information from any one of the categories of information (human, in vitro, animal, or related substances data) may sometimes be sufficiently compelling both to exceed a threshold level of concern and to justify focused evaluation or action. In many circumstances, however, data will need to be collated within the same category or across several categories to determine the appropriate level of concern. That is, even if concern raised by one category of data (for example, human data) does not meet a threshold for action, the body of evidence available across several categories may raise the level of concern. In integrating observations across categories of data, consistency and evidence of biological plausibility should raise the level of concern. In other words, available evidence from each category of data, by itself, may be insufficient to indicate concern, but when a pattern of mechanistically related adverse effects is observed across two or more categories in a consistent manner, this can establish biological plausibility and warrant heightened concern for potential harmful effects in humans.
Causal Models for Considering Consistency and Biological Plausibility
Synthesis is the concept or process of integrating safety data from different types of study designs and across different categories of data. Data synthesis can be facilitated, and conceptually illustrated, by the use of causal evidence models. A causal evidence model (see Figure 10-1) provides a structure to help interpret available data from a number of sources that address a specific safety question (Harris et al., 2001). The model can describe the relationship among a dietary supplement, potential adverse health outcomes (e.g., liver failure, death), and biological effects by depicting the relationship as linkages that are illustrated with arrows. The type of arrow illustrates the type of evidence: convincing data are depicted by solid arrows and weaker or less conclusive data are depicted by dashed arrows. A “path” between the dietary supplement ingredient and an adverse health effect illustrates a relationship. When the available information is integrated, multiple links between the dietary supplement ingredient and a given health outcome are illustrated by multiple arrows, as discussed below.
A solid arrow (Arrow A, Figure 10-1) linking the ingredient to the adverse health effect illustrates that a clear association between the ingredient and the effect has been demonstrated. An arrow (Arrow B, Figure 10-1) linking the ingredient to the biological effect illustrates a situation where the ingredient is known to cause the biological effect, whether or not the biological effect has been linked directly to the adverse health effect (Arrow C, Figure 10-1). Note that Arrow B could be present without Arrow C for many situations, but that a conclusive situation occurs when the ingredient is linked to the biological effect and the biological effect is linked to the adverse health effect, illustrated by Arrows B and C together.
Figures 10-2 through 10-4 illustrate other possible scenarios where conclusive data (human, animal, or in vitro) exist. Figure 10-2 illustrates two possible scenarios of conclusive animal data. The first diagram illustrates a situation where the dietary supplement ingredient is known to cause the adverse health effect in animals. The second diagram illustrates a situation where the dietary supplement ingredient is known to cause a biological effect in the animal that is related to the possible adverse health effect. Figure 10-3 illustrates conclusive in vitro data. The dietary supplement ingredient in Figure 10-3 is known to cause the in vitro biological effect. Validation of the in vitro assay for the biological effect provides a link between it and the adverse health outcomes. Figure 10-4 illustrates how information about functionally related substances is used to make a link between the biological effect and the adverse health outcome. The fact that a related substance causes the adverse health effect through the biological effect provides a path of arrows between the dietary supplement ingredient and the adverse health effect.
Evidence from all types of study designs may form linkages to aid in determining the extent of association between dietary supplement exposure and adverse health outcomes. Each causal model illustrates a specific ingredient’s relationship to a particular adverse health outcome, thus separate causal models should be constructed for adverse events associated with different mechanisms (e.g., cardiotoxicity, hepatotoxicity, neurotoxicity). The same model structure should be used for different categories of data (e.g., human, animal, or in vitro data).
Figure 10-5 illustrates how one model integrates different types of data, demonstrating the power of the model in drawing conclusions. In Figure 10-5A, dashed lines are used to illustrate weak or incomplete data and each category of data is illustrated in a separate diagram. None of the models in Figure 10-5A are conclusive. That is, none include a path between the dietary supplement ingredient and the adverse health effect. Figure 10-5B is an integrated illustration of all that is known about a particular dietary supplement ingredient’s relationship to the particular adverse effect. The weaker links are strengthened by consistent data of several types and a relationship path between the dietary supplement ingredient and the adverse health effect is apparent.
The following specific example illustrates use of the causal model diagrammed in Figure 10-6. In this case, a biological effect caused by a dietary supplement ingredient may be known to occur following exposure to a different chemical that has known adverse health effects. For example, saw palmetto inhibits 5-α-reductase in vitro and in animal studies. The drug finasteride is also known to have this biological effect, which has been linked to finasteride-induced developmental defects in male genitalia in utero. When the models are integrated, the relationship between saw palmetto and defects in male genitalia is illustrated. Because the inhibitory effect of finasteride on 5-α-reductase is considered causative in its teratogenic effect, the biological activity that saw palmetto shares with this known teratogen is sufficient to raise concern about the safety of saw palmetto use in women who could become pregnant.
Individual studies from a single category of data also form links in a causal evidence model, with multiple and consistent evidence for the same link strengthening the linkage (illustrated with multiple overlapping arrows following integration). Thus the multiple links illustrate the consistency concept: consistency increases the linkage and thus increases the concern warranted. Figures 10-5 and 10-6 show that summing, or synthesizing, data addressing different linkages forms a more complete causal evidence model and can provide the biological plausibility needed to establish the association between a dietary supplement and an adverse event.
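The integration step can be pictured as building a small directed graph and asking whether a path exists from the ingredient to the adverse health effect. The sketch below is only an illustration of that idea using the saw palmetto example from the text; the node labels, the arrow styles assigned to each link, and all function names are assumptions made for the example and are not taken from the figures.

```python
from collections import defaultdict, deque

INGREDIENT = "saw palmetto"
EFFECT = "5-alpha-reductase inhibition"
OUTCOME = "male genital defects in utero"

# Each link: (from_node, to_node, data_category, arrow_style).
# Arrow styles are illustrative assumptions, not read from Figure 10-6.
LINKS = [
    (INGREDIENT, EFFECT, "in vitro", "dashed"),
    (INGREDIENT, EFFECT, "animal", "dashed"),
    (EFFECT, OUTCOME, "related substance (finasteride)", "solid"),
]


def has_path(links, start, goal):
    """Breadth-first search: is there a chain of arrows from start to goal?"""
    graph = defaultdict(list)
    for src, dst, _category, _style in links:
        graph[src].append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def supporting_categories(links):
    """List, for every link, the data categories that support it."""
    support = defaultdict(set)
    for src, dst, category, _style in links:
        support[(src, dst)].add(category)
    return {link: sorted(cats) for link, cats in support.items()}


# Viewed one category at a time, no model connects ingredient to outcome.
for category in sorted({link[2] for link in LINKS}):
    subset = [link for link in LINKS if link[2] == category]
    print(category, "->", has_path(subset, INGREDIENT, OUTCOME))

# Integrated across categories, a path appears.
print("integrated ->", has_path(LINKS, INGREDIENT, OUTCOME))
print(supporting_categories(LINKS))
```

The graph only records which links exist and what supports them. Judging whether a link is convincing, and how much concern a completed path warrants, remains a matter of expert judgment.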
In summary, causal models are useful when individual pieces of evidence are weak and are of different types or when they do not clearly illustrate a relationship, when viewed individually, as may often be the case with dietary supplement ingredients. Frequently, in studies of dietary supplement activity, a single category of data supporting a causal evidence model is incomplete or weak, precluding firm conclusions. By linking data from more than one category, such as human and animal data, causal models create a more complete picture of the data and provide a more complete understanding of the relationship between biological effects and potential adverse health outcomes. Similarly, different types of study design (e.g., experimental and observational studies) within a category of data may also be assessed together to provide more robust conclusions.
Cross-design synthesis, a quantitative method that combines studies of different designs with different endpoints and different categories of data, has been proposed in the past (NRC, 1992). However, there is little current experience with this approach. As a result, the qualitative synthesis described here, and the weight of evidence as judged by experts, are appropriate approaches for evaluating the body of assembled evidence for safety of dietary supplements.
WEIGHING EVIDENCE THAT MAY APPEAR INCONSISTENT
Differences in Exposure or Product Formulation May Explain Inconsistencies
GUIDING PRINCIPLES: Risk is a function of exposure. Analysis therefore needs to link risk of harm to relevant dietary supplement ingredient exposure. One formulation of an ingredient may or may not be relevant to other formulations: relevance depends on bioequivalence of the active ingredients.
Some apparent inconsistencies in data may be explained by differences in the level of ingredient exposure, which in some cases may be related to differences in formulation. If different studies produce different conclusions about potential adverse effects, it is important to consider whether the exposures were comparable. Exposure at the site of action depends upon the amount ingested (amount of a constituent in the product), the route of exposure, and the bioavailability of the formulation. These concepts are reflected in the manner in which data were analyzed in the prototype monographs released with this report and summarized in Appendixes D through K. The amount of product ingested, the route of exposure, and the processing, composition, and formulation of the ingredient used in each study were all taken into consideration in determining the relevance of datasets for use in evaluating the level of concern for safety of the dietary supplement ingredients reviewed. As these prototype analyses were prepared, two underlying concepts were considered that are generally relevant and important to bear in mind when considering inconsistencies. These are (1) that the amount of active chemical constituents in products can vary, and (2) that, in the absence of evidence to the contrary, adverse effects observed at higher than ingested levels are assumed to have some relevance to the safety of ingested levels of the ingredient.
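The dependence of exposure on both the amount ingested and bioavailability can be made concrete with a deliberately simplified calculation. The sketch below assumes a single first-order relationship (absorbed amount equals ingested amount times a bioavailable fraction) and ignores route of exposure, metabolism, and distribution; the function name and all numbers are hypothetical.

```python
def absorbed_amount_mg(ingested_mg, bioavailable_fraction):
    """Simplified estimate of the amount reaching systemic circulation."""
    if not 0.0 <= bioavailable_fraction <= 1.0:
        raise ValueError("bioavailable fraction must be between 0 and 1")
    return ingested_mg * bioavailable_fraction


# Two hypothetical formulations with the same labeled amount of a constituent
# but different assumed bioavailability.
formulation_a = absorbed_amount_mg(ingested_mg=100.0, bioavailable_fraction=0.10)
formulation_b = absorbed_amount_mg(ingested_mg=100.0, bioavailable_fraction=0.40)
print(formulation_a, formulation_b)
```

Under these assumed values, the two formulations deliver a fourfold difference in absorbed amount despite identical labels, which is one way two studies of the "same" ingredient could reach different conclusions.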
In studies that use controlled amounts of purified or well-characterized ingredients, it is relatively straightforward to relate a certain level of ingredient to an adverse effect. However, in many instances, the amount of a specific biologically active constituent in a dietary supplement is unknown and can be expected to vary from preparation to preparation (Feifer et al., 2002; Fong, 2002). For botanicals in particular, variation in final product can stem from inconsistent harvesting, storage, and processing, or differences in the plant genotype or growing environment (Fong, 2002). An example of preparation difference is that alcoholic extracts and dried botanicals for hot water extraction may be sold under the same name, although alcohol and water will extract a different array as well as different quantities of chemical constituents. In another example, shark cartilage is sold both as washed, ground material and as a water extract, each having different compositions. For these reasons, unless composition is confirmed by analysis, the amount of a particular chemical constituent in an ingredient cannot be determined even when a statement on a supplement label suggests a certain volume of material contains a particular amount of active ingredient. That even different formulations containing the same amount of an active ingredient cannot be assumed to be bioequivalent is well understood; this is one reason premarketing bioequivalence studies are required for new drug formulations to be sold under generic labels. Such testing is not conducted with dietary supplements even though formulation, processing, and preparation technologies can significantly alter composition and bioavailability.
Because of these and other possible inconsistencies, direct extrapolation of evidence of safety from one product formulation to another is ill advised without clear evidence of bioequivalence between the preparations. Evidence of risk should be treated differently, however; a general guideline to follow is that if adverse effects are observed with one product formulation, they should be assumed to occur following intake of other formulations as well, unless enough is known about the other preparations to discount the possibility that they have the potential for the same concern. For example, seemingly inconsistent results from different formulations of the same original substance might be explained by convincing evidence from animal data or chemical analysis comparing the two formulations.
In general, studies conducted using amounts of the ingredient greater than those likely to be consumed when using a product as a dietary supplement can be used in evaluating the appropriate level of concern about risk. This is because the amount of constituent ingested by humans may vary significantly, as discussed above, and because there may be consistency between the types of adverse effects observed following ingestion of these elevated amounts and lesser but similar effects observed following ingestion of lower amounts typically consumed as a dietary supplement. Alternatively, the effects seen at these higher levels may provide biological plausibility for less serious effects observed following consumption of lower amounts. For example, dose-response studies of nordihydroguaiaretic acid (NDGA, a constituent of chaparral) showing hepatotoxicity at high doses were useful in considering possible adverse effects on liver function related to reports of jaundice following ingestion of chaparral containing lower amounts of NDGA (see chaparral-focused prototype monograph, Appendix J). However, even without these elements of consistency or biological plausibility, the results from studies using amounts not possible to obtain through ingestion of the dietary supplement may still be useful, as explained in Chapters 4 and 5 for human data and animal data, respectively, and may indicate what types of studies should be pursued to determine the potential for harm.
Poor Data Quality May Explain Inconsistencies
GUIDING PRINCIPLE: To evaluate the safety of an ingredient, it is best to consider all relevant data, but each study should be evaluated individually for quality.
Another possible explanation for apparent inconsistencies in the observation of adverse effects may be differences in the quality of the reports. Important aspects of studies from each category of data, including human, animal, and in vitro studies, have been discussed in detail in Chapters 4 through 7. However, there are a number of overarching considerations that bear emphasizing. Decisions to consider unpublished data in addition to published data should depend on the quality and completeness of the data set. Unfortunately, publication in the scientific literature does not in itself qualify data as acceptable for evaluation, and many published articles contain insufficient detail to allow the data described to be of much use in risk evaluations. Consideration of statistical power, validation of analytical methods, analytical approaches, consistency with the published literature, peer review of the data, and bias due to conflict of interest by the authors are all important in evaluating usefulness of a dataset.
Lack of Evidence as an Apparent Inconsistency
GUIDING PRINCIPLE: Absence of evidence of risk does not indicate that there is no risk.
In some cases, some data will indicate a risk, while other data will not suggest the risk exists, producing what could be interpreted as an inconsistency. Inconsistencies may be explained by the inability of some systems to detect adverse effects or differences in formulation, differences in frequency or length of exposure, differences in pre-existing human physiology (e.g., sex differences or chronic illness), or many other causes. Even if a study showing lack of adverse effects is reported, if the study is not adequately designed to identify risk (e.g., not sufficiently powered, incompletely reported, does not include positive controls, or otherwise has inadequate mechanisms for detecting adverse events), it is not scientifically valid to use such information to mitigate suggested risk from other sources. Only negative data originating from well-designed studies or other credible sources may mitigate, if not fully eliminate, concerns raised by other sources of information, and even well-designed, credible data are often not appropriate to use this way, as discussed below. The basic principle that “absence of evidence of risk does not indicate there is no risk” leads to the question of how to weigh seemingly inconsistent data where some information suggests a risk and other information does not. How to compare this type of information is discussed here, with particular emphasis on inconsistencies between animal and human data.
Because of the relevance of human data, serious adverse events arising from randomized clinical trials, spontaneous reports with strong attribution, or case series are generally more compelling than other categories of data when they raise the level of concern. In general, if there is scientifically based evidence from human studies indicating that a concern for safety exists, then the lack of adverse events in animal studies, in vitro studies, or even other human studies cannot be used to overrule or disregard the evidence of harm. The absence of adverse findings in animal studies, no matter how well designed, does not prove that pathological effects will not occur in humans; thus, the absence of an effect or observation in animals cannot mitigate concern raised by human data.
Whereas animal studies cannot be used to mitigate findings of toxicity in humans, animal testing can and should be used to further investigate adverse events that have been reported in humans but for which sufficient attribution cannot be reached. For example, animal data may be used to identify problems specific to particular formulations and sources or products (e.g., in their content, contamination, bioavailability) by comparing groups of animals given the different formulations. This approach might be used, for example, to identify the presence of a contaminant due to a novel processing technology by comparing the effects of feeding the two formulations to the appropriate animal species. Appropriate positive controls would, of course, be necessary to conclude that one formulation has a different effect than another formulation. Also, animal models of human conditions and physiological states can be used to uncover particular vulnerabilities in humans in order to determine specific circumstances under which the dietary supplement ingredient may cause safety concerns in humans by modeling particular conditions (e.g., an animal model of diabetes).
Similarly, it is rarely appropriate to discard observations of adverse effects in animals simply because similar effects were not observed in humans. Evidence of risk from well-designed animal studies using appropriate models cannot be overruled by lesser-quality human data, such as a lack of spontaneously reported adverse events in current or historical use, less than adequate clinical or in vitro studies, or lack of structural similarities to any known poison. To justify disregarding animal observations, the event occurring in animals would need to be specifically monitored for and detectable in humans under the conditions reported. For example, a lack of cancer in humans exposed to an animal carcinogen assumes greater importance when there are data of sufficient quality and power to detect the cancer if it were to occur. Thus, even though an event such as breast cancer that occurs with some frequency in the human population would be difficult, statistically, to attribute to an ingredient used in a human study, a lack of resolving power in epidemiologic studies does not rule out a relationship between the ingredient and human cancers.
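The point about power can be illustrated with elementary probability. The sketch below assumes independent subjects, a fixed per-subject risk, and no background incidence of the event, all simplifications that make detection look easier than it is for an outcome such as breast cancer; the function names and the numbers are hypothetical.

```python
def prob_at_least_one_event(true_risk, n_subjects):
    """Chance that a study of n subjects observes the event at least once."""
    return 1.0 - (1.0 - true_risk) ** n_subjects


def upper_bound_when_no_events(n_subjects, confidence=0.95):
    """Largest per-subject risk still compatible, at the given confidence,
    with observing zero events among n subjects (exact binomial bound)."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_subjects)


n = 100                # hypothetical study size
risk = 1.0 / 1000.0    # hypothetical true risk of the adverse event

print(round(prob_at_least_one_event(risk, n), 3))
print(round(upper_bound_when_no_events(n), 3))
```

With these assumed values, a 100-subject study has less than a 10 percent chance of observing even one event, and observing none is still compatible with a risk of roughly 3 percent (close to the familiar "rule of three" approximation, 3/n). A report of no adverse events from such a study therefore says little about whether the risk exists.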
Human exposure may need to be prolonged to detect latent or chronic toxicities. Therefore, regardless of the presence of even high-quality human data suggesting no toxicity following short-term exposure, certain chronic animal toxicity or biological activity data could warrant elevated concern. In particular, animal studies that warrant special concern are those that indicate the following potential effects in humans: evidence of cancer, reproductive system effects, developmental toxicity effects including teratology, or other delayed serious toxicities (see Table 5-1).
There are only a few scientifically appropriate reasons to discount animal observations because human effects are not observed, such as the well-understood differences in pharmacokinetics and/or pharmacodynamics between humans and the experimental animal described in Chapter 5. One example is manifestation of toxic effects that depends on metabolic pathways present in animals but not in humans, thus producing animal study results that are not very applicable to human health. (A hypothetical example is an adverse effect that results because the dietary supplement ingredient blocks a pathway that humans are not dependent on.) While it is unusual in the dietary supplement field at present to have such detailed knowledge, occasionally an understanding of the pathway responsible for the toxicity can mitigate the extrapolation to humans of concerns raised in animal studies.
PROOF OF HARM
For a number of dietary supplement ingredients, decisions regarding safety must be made despite sparse data and shortcomings in the studies that are available. In making such decisions, interpretation of available data must be made by weighing it against the assumption that all dietary supplement ingredients are safe (see Chapter 1, description of DSHEA). The assumption that dietary supplement ingredients are safe is not equivalent to a scientific determination of safety, and thus the information required to overcome this assumption is not required to be absolute proof or evidence that a harm or adverse effect has occurred or will inevitably occur. Instead, the information required is something less. Rather than a quantitative, probabilistic assessment of risk, which is preferable and often possible when data about a chemical are substantial or at least include standard toxicology tests, it may be prudent or necessary to make a qualitative determination by using judgment and scientific inference to consider the limited data. In summary, to evaluate dietary supplements under DSHEA, it is necessary to determine only if an “unreasonable or significant risk” exists, not to have complete evidence that a dietary supplement causes a serious adverse event. That is, the standard of “unreasonable or significant risk” put forward by DSHEA is a lower standard than conclusive scientific proof, a fact that is likely to facilitate the ability to take action.
Feifer AH, Fleshner NE, Klotz L. 2002. Analytical accuracy and reliability of commonly used nutritional supplements in prostate disease. J Urol 168:150–154.
Fong HHS. 2002. Integration of herbal medicine into modern medical practices: Issues and prospects. Integr Cancer Ther 1:287–293.
Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, Atkins D. 2001. Current methods of the U.S. Preventive Services Task Force: A review of the process. Am J Prev Med 20:21S–35S.
IOM (Institute of Medicine). 1998. Dietary Reference Intakes: A Risk Assessment Model for Establishing Upper Intake Levels for Nutrients. Washington, DC: National Academy Press.
NRC (National Research Council). 1992. Combining Information: Statistical Issues and Opportunities for Research. Washington, DC: National Academy Press.