Using in Vitro High Throughput Screening Assays to Identify Potential Endocrine-Disrupting Chemicals

Background: Over the past 20 years, an increased focus on detecting environmental chemicals that pose a risk of adverse effects due to endocrine disruption has driven the creation of the U.S. Environmental Protection Agency (EPA) Endocrine Disruptor Screening Program (EDSP). Thousands of chemicals are subject to the EDSP; thus, processing these chemicals using current test batteries could require millions of dollars and decades. A need for increased throughput and efficiency motivated the development of methods using in vitro high throughput screening (HTS) assays to prioritize chemicals for EDSP Tier 1 screening (T1S). Objective: In this study we used U.S. EPA ToxCast HTS assays for estrogen, androgen, steroidogenic, and thyroid-disrupting mechanisms to classify compounds and compare ToxCast results to in vitro and in vivo data from EDSP T1S assays. Method: We implemented an iterative model that optimized the ability of endocrine-related HTS assays to predict components of EDSP T1S and related results. Balanced accuracy was used as a measure of model performance. Results: ToxCast estrogen receptor and androgen receptor assays predicted the results of relevant EDSP T1S assays with balanced accuracies of 0.91 (p < 0.001) and 0.92 (p < 0.001), respectively. Uterotrophic and Hershberger assay results were predicted with balanced accuracies of 0.89 (p < 0.001) and 1 (p < 0.001), respectively. Models for steroidogenic and thyroid-related effects could not be developed with the currently published ToxCast data. Conclusions: Overall, results suggest that current ToxCast assays can accurately identify chemicals with potential to interact with the estrogenic and androgenic pathways, and could help prioritize chemicals for EDSP T1S assays.

Endocrine hormones regulate a diverse set of physiological responses, some of which include sexual dimorphism, reproductive capacity, glucose metabolism, and blood pressure (Cooper and Kavlock 1997; de Mello et al. 2011; Dupont et al. 2000; Lodish et al. 2009; Ng et al. 2001). The many types of responses regulated by hormones make them of particular concern for disruption by xenobiotics (Ankley and Giesy 1998; Colborn and Clement 1992; Soto and Sonnenschein 2010; Tilghman et al. 2010). Endocrine disruption can lead to many adverse consequences, some of which include altered reproductive performance and hormonally mediated cancers (Birnbaum and Fenton 2003; Kavlock et al. 1996; Soto and Sonnenschein 2010; Spencer et al. 2011). Endocrine disruption can also have adverse effects on the fetus or newborn because of the delicate balance of hormones required during critical developmental windows (Bigsby et al. 1999; Chandrasekar et al. 2011; Cooper and Kavlock 1997; Mahoney and Padmanabhan 2010).
For example, studies have demonstrated that thyroid hormone insufficiency during pregnancy may lead to adverse neurological outcomes in children (Haddow et al. 1999).
The Federal Food, Drug, and Cosmetic Act (FFDCA 1996), as amended by the Food Quality Protection Act (FQPA 1996), and the Safe Drinking Water Act Amendments (SDWA 1996), requires the U.S. Environmental Protection Agency (EPA) to determine whether certain substances may have an effect in humans similar to that produced by a naturally occurring estrogen, or other such endocrine effects (FFDCA 1996). In response, the U.S. EPA formed the Endocrine Disruptor Screening Program (EDSP) (U.S. EPA 2012b). The EDSP is a two-tiered program that requires chemical manufacturers to submit or generate data on a suite of both in vivo and in vitro assays. The first phase of EDSP assays is designated as the Tier 1 screening (T1S) battery (U.S. EPA 2012c). These tests identify chemicals with the potential to interact with endocrine pathways or mechanisms, and focus on disruption of estrogen, androgen, and thyroid hormone pathways. Based on a weight-of-evidence approach, chemicals showing positive activity in T1S assays could then be subject to more complex Tier 2 tests (U.S. EPA 2011b). The European Commission is continuing the implementation of the European Union's Community Strategy for Endocrine Disrupters, which includes the establishment of a priority list of substances for further evaluation and assay development and validation (European Commission 2012). In addition, the European Commission is working toward defining specific criteria to identify endocrine disruptors within a legislative framework, drawing on current scientific opinion (Kortenkamp et al. 2011).
The U.S. EPA estimates that the statutory requirements and discretionary authorities through passage of the FQPA and its amendments and the SDWA will require the EDSP to screen as many as 9,700 environmental chemicals. Generating the data required under the current testing guidelines will be expensive and time-consuming, and it will require significant animal resources (U.S. EPA 2011a). To date, chemicals have been nominated by the U.S. EPA for EDSP T1S on the basis of exposure potential or registration status. Because of fiscal and time constraints, the U.S. EPA is considering using endocrine-related in vitro high throughput screening (HTS) assays and in silico models to prioritize chemicals for testing in T1S (U.S. EPA 2011a). There has been a significant improvement in HTS technologies since the U.S. EPA began work on developing and implementing the EDSP. In 2007, the National Research Council Report Toxicity Testing in the 21st Century: A Vision and a Strategy (National Research Council 2007) acknowledged these advances and recommended that the agency develop a strategy to use modern molecular-based screening methods to reduce, and ultimately replace, the reliance on whole-animal toxicity testing. The U.S. EPA's ToxCast program (U.S. EPA 2012e) and the U.S. government's cross-agency Tox21 program (U.S. EPA 2012d) are using HTS assays and developing computational tools to predict chemical hazard, to characterize a diverse set of toxicity pathways, and to prioritize the toxicity testing of environmental chemicals (Huang et al. 2011; U.S. EPA 2012d). Included in these programs are assays that cover toxicity pathways involving estrogen, androgen, and thyroid hormone receptors, as well as targets within the steroidogenesis pathway. The current ToxCast chemical library covers approximately 17% of the chemicals subject to the EDSP, and the larger Tox21 chemical library covers approximately 53% of the chemicals subject to the EDSP. Assay technologies include competitive binding, reporter gene, and enzyme inhibition assays.
The comparison of HTS assays, endocrine-related modes of action (MOA), and EDSP T1S is shown in Figure 1. An endocrine MOA consists of a series of molecular initiating events relevant for estrogen, androgen, thyroid, or steroidogenic pathways. These assays do not represent their respective MOA in its entirety, but are used to detect chemicals capable of perturbing a particular MOA. In the present study, we investigated the predictive ability of ToxCast HTS assays for end points tested in EDSP T1S, and we tested the hypothesis that if a chemical activates the estrogen or androgen receptor in vitro, estrogen- and androgen-related effects will occur in in vivo bioassays. Ideally, HTS tests should be highly reproducible and yield a minimal number of false-positive chemicals (high specificity) and false-negative chemicals (high sensitivity).
Previous studies have suggested the use of HTS assays for identifying endocrine-disrupting potential. For example, the ReProTect project developed within the 6th European Framework Program tested 14 in vitro assays using 10 prototype compounds to determine feasibility for a reproductive screening program (Schenk et al. 2010). Those in vitro assays were grouped into three segments of the reproductive cycle: endocrine disruption, fertility, and embryonic development. The results of ReProTect showed, at least for the 10 prototype chemicals, that appropriate in vitro assay selection can effectively group compounds based on known reproductive toxicity (Schenk et al. 2010).
HTS assays are useful for identifying chemical impacts on molecular initiating events in biological or toxicological pathways. Combinations of HTS assays measuring competitive ligand binding, reporter gene activation, and enzyme inhibition can be used to characterize chemical potential for endocrine disruption. These chemical characterizations can then be quantitatively evaluated by investigating associations with guideline EDSP T1S assay results. The aim of the present study was to use this data-driven approach to identify candidate MOAs for predictive modeling efforts, which subsequently will be used to prioritize chemicals for further endocrine-related testing.

Methods
Chemical selection. In this study we used data from the ToxCast Phase I chemical library, containing data for 309 unique chemical structures (U.S. EPA 2012f). Most of these chemicals are either current or former active ingredients in food-use pesticides that were designed to be bioactive, or they are industrial chemicals that are environmentally relevant. Details of the chemical library were reported by Judson et al. (2009). Data on an additional 23 reference chemicals were included that were tested in a separate study, 17 of which were not in the ToxCast Phase I library. CAS registry numbers (CASRN) for the ToxCast Phase I chemicals and the additional 17 chemicals are available online in Supplemental_File_1.csv (Rotroff et al. 2012).
Guideline and non-guideline endocrine assays. Data from guideline endocrine-related in vitro and in vivo studies were extracted from EDSP Tier 1 validation reports from the U.S. EPA EDSP web site (U.S. EPA 2012a). Non-guideline studies were obtained from open literature by querying PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and Google Scholar (http://scholar.google.com/) using the following terms: (any chemical name or CASRN in the 309) AND ("in vitro" OR "in vivo") AND ("estrogen" OR "androgen" OR "uterotrophic" OR "Hershberger" OR "steroidogenesis" OR "thyroid hormone"). The automated search returned a wide variety of results representing 2,113 individual studies. The list was manually curated to remove studies that did not contain data usable for the current analysis (e.g., studies of mixtures without testing compounds individually, studies that mentioned the chemical but did not test it in a bioassay, studies measuring bioaccumulation), leaving 248 unique studies. Studies that identified their methods as following the Organisation for Economic Co-operation and Development (OECD) guidelines (Kanno et al. 2001, 2003; OECD 1999, 2001, 2003, 2007) or EDSP protocols were grouped together with EDSP T1S data for the guideline analysis. When available, PubMed identifiers (PMID) were used as unique annotations for each report. For the few instances when no PMID was available, and for each EDSP T1S validation report, a unique identifying number was generated. The citation information for all documents used in the analysis is available online in Supplemental_File_2.txt (Rotroff et al. 2012).
Guideline endocrine-related assays gathered from EDSP validation reports and OECD guideline studies were categorized according to whether they tested estrogen, androgen, steroidogenesis, or thyroid-related MOAs (guidelineE, guidelineA, guidelineS, guidelineT, respectively). Additional information captured included study type (e.g., amphibian metamorphosis, reporter gene), assay type (e.g., serum levels, organ weight), species, strain, cell type, target, and whether or not it was an EDSP/OECD guideline study. Chemical potency [e.g., concentration at half-maximum activity (AC50), lowest effective concentration] for a given end point was captured as it was represented in the study report along with the maximum concentration/dose tested. In addition, agonist or antagonist responses were noted when applicable. Data from guideline and non-guideline studies were dichotomized as either active if a response was observed, or inactive if no response was observed. If a study investigated multiple end points for a given endocrine MOA and produced at least one statistically significant end point, then that study-chemical-MOA combination was considered active. Activity/inactivity was determined based on the presence of a statistically significant response or was based on the study author's conclusion. Data were further annotated as having a hit value of either 1 or 0 for active and inactive, respectively. We combined all guideline and non-guideline literature studies to have a single hit value for each study-chemical-MOA combination. Data that were conflicting or otherwise unclear were included in the data table but annotated as such, and removed from analyses. The data obtained from guideline endocrine-related studies and other non-guideline literature reports are available online in Supplemental_File_3.csv (Rotroff et al. 2012).
For chemicals that produced a statistically significant and concentration-dependent response in a given assay, the AC50 was recorded. The criteria for determining the activity of a compound are assay platform dependent [see Supplemental Material, Appendix A, for further details (http://dx.doi.org/10.1289/ehp.1205065)]. The data were then dichotomized so that if an AC50 was present for a given chemical end point concentration, a 1 was reported; if no response was observed, a 0 was reported. Chemicals tested in triplicate for quality control purposes were designated 1 or 0 on a majority basis. Chemicals that were run in duplicate with at least one sample producing an AC50 were designated as a 1. Experimental methods for each assay used are provided in Supplemental Material, Appendix A (http://dx.doi.org/10.1289/ehp.1205065).
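The replicate rules above can be written as a small function. This is a minimal sketch, not the authors' code: the function name and the use of `None` to mean "no AC50 reported" are assumptions made for illustration, and the majority rule is applied to any replicate count other than two, which the text does not specify.

```python
from typing import Optional, Sequence


def dichotomize_replicates(ac50s: Sequence[Optional[float]]) -> int:
    """Return 1 (active) or 0 (inactive) for one chemical in one assay.

    Each entry is the AC50 from one replicate, or None when no
    concentration-dependent response was observed (assumed encoding).
    """
    n = len(ac50s)
    if n == 0:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for value in ac50s if value is not None)
    if n == 2:
        # Duplicates: one replicate with an AC50 is enough for a 1.
        return 1 if hits >= 1 else 0
    # Single runs and triplicates: strict majority of replicates.
    # Applying the same rule to n > 3 is an assumption.
    return 1 if hits * 2 > n else 0
```

For example, `dichotomize_replicates([4.2, None, None])` gives 0 under the majority rule, whereas `dichotomize_replicates([4.2, None])` gives 1 under the duplicate rule.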
Model development. We performed an iterative, balanced optimization analysis to determine the ability of ToxCast HTS assays to correctly classify the results of guideline endocrine-related assays while maintaining balance between sensitivity and specificity. The process for this analysis is illustrated in Figure 2. Because each HTS endocrine MOA may have multiple ToxCast HTS assays, we used disjunctive logic employing varied weight-of-evidence thresholds to determine optimal predictive performance. This model tested variable thresholds for the HTS ToxCast assay results represented as unweighted binary data, while the guideline or non-guideline endocrine-related assay results remained static. Initially, the model began with a threshold criterion of one positive ToxCast HTS assay out of the total number of ToxCast HTS assays for a chemical to be considered to perturb a given MOA. Once calculated, the model was then rerun with increasing increments of one assay until all ToxCast HTS assays for a given endocrine MOA were required to be positive for a chemical to be considered to perturb the given MOA. As the threshold for a positive call was increased, a larger weight of evidence was required for a chemical to be considered a "hit" for perturbing the given endocrine MOA. An exception was made for guideline pubertal studies and the ToxCast NVS_NR_hAR assay. Guideline pubertal studies test for effects that can arise through multiple different endocrine-related pathways. For this reason, if a chemical was considered positive in the pubertal assay and the result conflicted with other guideline studies (e.g., receptor binding, reporter gene), the pubertal assay was not included in the weight of evidence. The ToxCast NVS_NR_hAR assay is a human androgen receptor binding assay in the LNCaP prostatic cell line. The androgen receptor in this cell line is known to bind to steroid hormones other than androgens (Veldscholte et al. 1992). For this reason, if a compound was negative in all other HTSA assays, the result for the NVS_NR_hAR assay was not included in the weight of evidence.
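The threshold sweep and the NVS_NR_hAR exception described above might be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the function names and the dictionary layout (assay name mapped to a 0/1 hit) are hypothetical, and only the assay name NVS_NR_hAR comes from the text. The pubertal-study exception applies to the guideline side and is not modeled here.

```python
from typing import Dict


def hts_call(assay_hits: Dict[str, int], threshold: int,
             excluded_if_alone: str = "NVS_NR_hAR") -> int:
    """Return 1 if at least `threshold` assays are positive, else 0.

    assay_hits maps an assay name to its binary hit (assumed layout).
    The named assay is dropped from the count when every other assay
    for the chemical is negative, mirroring the exception in the text.
    """
    hits = dict(assay_hits)
    others_positive = sum(v for k, v in hits.items() if k != excluded_if_alone)
    if excluded_if_alone in hits and others_positive == 0:
        hits.pop(excluded_if_alone)
    return 1 if sum(hits.values()) >= threshold else 0


def sweep_thresholds(assay_hits: Dict[str, int]) -> Dict[int, int]:
    """Evaluate the call at every threshold from 1 to the number of assays."""
    return {t: hts_call(assay_hits, t) for t in range(1, len(assay_hits) + 1)}
```

With three hypothetical assays of which two are positive, `sweep_thresholds` returns a positive call at thresholds 1 and 2 and a negative call at threshold 3, which is the increasing weight-of-evidence requirement described in the paragraph.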
For a specific set of criteria across all overlapping chemicals, we calculated sensitivity, specificity, and balanced accuracy (BA) as measures of model performance (Figure 2B). The guideline analysis was performed comparing ToxCast HTS assays and guideline endocrine assays gathered from EDSP validation reports and OECD guideline studies. We also conducted a separate non-guideline analysis comparing ToxCast HTS assays with assays from non-guideline studies. Many of the EDSP/OECD guideline studies and those reported in non-guideline literature used multiple studies/assays for each chemical-MOA combination. Because separate studies are not always in agreement relative to a chemical-MOA perturbation, the model was run using two scenarios: a) any positive report for a chemical resulted in a positive call for the chemical-MOA combination; or b) > 50% (threshold > 0.50) of guideline or non-guideline endocrine-related studies or assays must report the chemical to be active for a given endocrine MOA. For each threshold criterion the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) was calculated. A TP was any chemical determined to be positive in the ToxCast HTS assays and was also positive in guideline endocrine reports. An FP was positive in ToxCast but reported as negative in the guideline endocrine reports. If a chemical was determined to be negative in the ToxCast HTS assays and positive in the guideline endocrine reports, it was recorded as an FN. Last, a TN was any chemical negative in the ToxCast HTS assays and negative in the guideline endocrine reports. At each threshold combination, all of the available chemicals were classified as TP, FP, TN, or FN and were used to calculate sensitivity, specificity, and BA as a measure of model performance.
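The tabulation just described can be illustrated with a short sketch. It assumes the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), BA = their mean); all function names and the rule labels "any" and "majority" are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple


def reference_call(reports: Sequence[int], rule: str) -> int:
    """Collapse several 0/1 study reports into one call for a chemical-MOA pair.

    rule="any": scenario a), any positive report gives a positive call.
    rule="majority": scenario b), more than 50% of reports must be positive.
    """
    if not reports:
        raise ValueError("at least one report is required")
    if rule == "any":
        return 1 if any(reports) else 0
    if rule == "majority":
        return 1 if sum(reports) / len(reports) > 0.5 else 0
    raise ValueError(f"unknown rule: {rule}")


def tabulate(pairs: Iterable[Tuple[int, int]]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) from (HTS call, reference call) pairs."""
    tp = fp = tn = fn = 0
    for hts, ref in pairs:
        if hts and ref:
            tp += 1
        elif hts and not ref:
            fp += 1
        elif not hts and ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, balanced accuracy)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and one negative")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

In the Figure 2 example, chemical X is positive in two of three guideline reports, so `reference_call([1, 1, 0], "majority")` returns 1. Repeating `tabulate` and `performance` for each pair of HTS and reference thresholds and keeping the highest BA corresponds to the optimization loop described above.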

Statistical analysis. To identify statistically significant BA values, we performed a permutation test. The test randomized which ToxCast assays were associated with guideline endocrine studies or biomedical literature for each endocrine MOA in order to determine whether or not a randomly chosen set of assays from the > 500 ToxCast end points would likely produce a similar association. The BA calculation based on random assay associations was performed using the same number of ToxCast assays as the model and with the same threshold criteria. Assays were permuted 10,000 times to build the random BA population distribution, and the percentile where the model BA fell among this distribution was calculated to provide a p-value. A p-value of < 0.01 was considered statistically significant. The distributions developed from the permutation tests were used to define the confidence intervals in Figures 3 and 4.
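A permutation test of this kind could be sketched as below. Several details are assumptions rather than statements from the paper: the p-value is taken as the fraction of random assay sets whose BA is at least the model BA, `score_fn` is a hypothetical callable that computes BA for a given set of assay names at the model's threshold criteria, and the fixed seed is only there for reproducibility.

```python
import random
from typing import Callable, List, Sequence, Tuple


def permutation_p_value(model_ba: float,
                        assay_pool: Sequence[str],
                        n_model_assays: int,
                        score_fn: Callable[[List[str]], float],
                        n_perm: int = 10_000,
                        seed: int = 0) -> Tuple[float, List[float]]:
    """Estimate how often a random assay set matches or beats the model BA.

    assay_pool: names of all candidate end points (hypothetical input).
    n_model_assays: number of assays in the real model, so that each
        random set has the same size.
    score_fn: computes BA for one set of assays (assumed to apply the
        same thresholds as the model).
    Returns the p-value and the null BA distribution.
    """
    rng = random.Random(seed)
    pool = list(assay_pool)
    null_bas = [score_fn(rng.sample(pool, n_model_assays)) for _ in range(n_perm)]
    at_least_as_good = sum(1 for ba in null_bas if ba >= model_ba)
    return at_least_as_good / n_perm, null_bas
```

The returned null distribution can also be used to derive percentile bounds such as the 2.5th and 97.5th percentiles, which is one plausible reading of how the intervals in Figures 3 and 4 were defined.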

Results
Data collection. Data covering guideline endocrine-related in vitro and in vivo assays were extracted from documents used in EDSP Tier 1 validation or conducted according to OECD guidelines. We found a total of 40 studies covering 154 unique chemicals, resulting in a total of 1,246 captured end points. Table 2 shows the chemical overlap between the ToxCast chemical library and the chemicals captured from guideline and non-guideline studies. Twenty-one chemicals available from EDSP validation documents and other OECD guideline studies covering [...] (Rotroff et al. 2012)].

Figure 2. Illustration of the balanced optimization model used to analyze predictive capacity of endocrine-related ToxCast assays. Multiple assays and study reports were available for each chemical-MOA combination. (A) Snapshot of a step in this modeling/optimization process, in which chemical X is positive in three of five HTS assays and two of three guideline reports. In this example, the dynamic HTS threshold is at least two positive assays and the guideline threshold is at least 50% positive reports, so chemical X is considered a true positive (TP). With fewer than two positive assays, chemical X would be a false negative (FN); < 50% positive reports would produce a false positive (FP); and if both were negative according to these criteria, chemical X would be a true negative (TN). (B) Method for tabulating results for all chemicals (e.g., chemical X would be counted in the TP portion of the contingency table) to arrive at an estimate of balanced accuracy for each set of threshold parameters.
Model results. The results presented in Figure 3 demonstrate the predictive ability of ToxCast HTSE and HTSA assays relative to the corresponding endocrine MOA in the guideline endocrine-related studies. Detailed results from the univariate model with guideline studies are available online in Supplemental_File_4.csv (Rotroff et al. 2012).
Comparison of HTS and guideline endocrine assays. For HTSE end points, we obtained an optimal BA of 0.91 (p < 0.001) with a sensitivity of 0.89 and a specificity of 0.92, a threshold of two positives for ToxCast HTSE assays, and > 50% for guidelineE studies (Figure 3). This means a minimum of two ToxCast HTSE assays must report an AC50 value for a chemical to be considered positive, and > 50% of guidelineE assays must be reported as positive in the EDSP validation reports or OECD guideline studies. Overlapping HTSE and HTSA chemicals and corresponding performance in the HTS and guideline studies are provided in Supplemental Material, Appendix C and Tables S2 and S3 (http://dx.doi.org/10.1289/ehp.1205065). Twenty-one guidelineE-related chemicals overlapped with ToxCast Phase I chemicals. One chemical, chlorpyrifos-methyl (CASRN 5598-13-0), was misclassified as a positive (FP) and one chemical, prochloraz (CASRN 67747-09-5), was misclassified as a negative (FN) by this set of ToxCast assays. If the goal was to optimize sensitivity, threshold criteria of one ToxCast HTSE assay and > 50% of guidelineE would produce a perfect sensitivity of 1, but specificity drops to 0.5 across this set of ToxCast HTSE assays [see Supplemental_File_4.csv (Rotroff et al. 2012)]. An additional analysis was conducted in which the threshold criterion for the guidelineE assays was lowered from > 50% to any single positive report resulting in a positive call. This lowers the sensitivity from 0.89 to 0.5, and the overall BA drops to 0.75 (Figure 3). Figure 3 also demonstrates the predictive ability of the ToxCast HTSA assays for the guidelineA results. The optimal predictive ability of the ToxCast HTSA assays was reached with a threshold of one HTSA assay and a threshold > 50% for the guidelineA assays. This set of criteria produced a BA of 0.92 (p < 0.001), with a sensitivity of 0.83 and specificity of 1 [see Supplemental Material, Appendix C, Table S3 (http://dx.doi.org/10.1289/ehp.1205065)].
The results for HTSS and HTST were not statistically significant among any of the analyses, with BAs of 0.56 (p > 0.01) and 0.50 (p > 0.01), [...] (Rotroff et al. 2012)].

Figure 3. Forest plot illustrating the performance, as measured by sensitivity, specificity, and BA, of ToxCast endocrine-related assays for predicting outcomes captured in EDSP/OECD guideline studies. Symbols represent the optimal BA obtained across all threshold combinations and the corresponding sensitivity and specificity at the same threshold. Gray boxes indicate 95% confidence intervals around permuted BA distributions. Analyses designated "All" include all available assays for the stated endocrine MOA. A value of > 50% "required guideline positives" indicates that > 50% of the studies had to report a positive result for a chemical to be considered a positive in the analysis. If the "required guideline positives" value is 1, any study reporting a positive resulted in the chemical being considered positive in the analysis. A separate analysis compared only uterotrophic and Hershberger analyses (right). The tests listed on the left represent replicate MOA with test conditions annotated under "Required HTS Positives" and "Required guideline positives."

Figure 4. Forest plot illustrating the performance, as measured by sensitivity, specificity, and BA, of ToxCast endocrine-related assays for predicting outcomes captured in non-guideline endocrine studies. Symbols represent the optimal BA obtained across all threshold combinations and the corresponding sensitivity and specificity at the same threshold. Gray boxes indicate 95% confidence intervals around permuted BA distributions. A value of > 50% "required non-guideline positives" indicates that > 50% of the studies had to report a positive result for a chemical to be considered a positive in the analysis. If the "required non-guideline positives" value is 1, any study reporting a positive resulted in the chemical being considered positive in the analysis. The tests listed on the left represent replicate MOA with test conditions annotated under "Required HTS Positives" and "Required non-guideline positives."

Comparison of HTS and uterotrophic and Hershberger assays. A separate analysis was conducted to determine the predictive capability of the ToxCast HTSE assays to detect positive and negative chemicals reported in EDSP/OECD guideline uterotrophic assays (Figure 3). Eighteen chemicals were available for comparison, and the optimal thresholds for HTSE produced a BA of 0.9 (p < 0.001), with a sensitivity and a specificity of 0.88 and 0.9, respectively.
In addition, we determined the predictive ability of ToxCast HTSA assays for EDSP/OECD guideline Hershberger results. Although only six chemicals were available for comparison, the analysis resulted in a BA of 1 (p < 0.001), with a perfect measure of sensitivity and specificity with thresholds of one positive assay required for both HTSA and EDSP/OECD guideline Hershberger reports (Figure 3).
Comparison of HTS and non-guideline assays. Predictive modeling results for non-guideline studies in the biomedical literature are presented in Figure 4. All results from the analysis with non-guideline studies are available online in Supplemental_File_5.csv (Rotroff et al. 2012). The HTSE MOA produced a maximum BA of 0.74 (p < 0.01), with at least one ToxCast assay being positive (ToxCast HTSE threshold of 1) and a literature threshold of > 50%. These criteria produced a sensitivity of 0.75 and a specificity of 0.72. Because of the wide range of test conditions, assay technologies, and species present in the open literature, sensitivity was lower than in the guideline studies. This is apparent because of the model optimization that occurred with only one HTSE assay required for a positive classification, compared with optimizing at two assays in the guideline analysis. We observed an overall concordance of 0.7 between the guidelineE assay results and the estrogen-related literature results given the stated thresholds (data not shown).
The optimal BA reached 0.65 (p > 0.01) with a ToxCast HTSA assay threshold of 1 and an androgen-related literature threshold > 50%. At these thresholds, sensitivity was low (0.3) but specificity was 1 (Figure 4). There was a concordance between chemical classifications for guidelineA reports and non-guideline reports of 0.77 at the reported thresholds of > 50% (data not shown).

Discussion
The results of this study demonstrate that ToxCast in vitro assays perform adequately to prioritize chemicals for further EDSP T1S for estrogen and androgen activity, and these HTS assays are predictive of the likelihood of a positive or negative finding in more resource-intensive assays. Additional HTS assays will be needed to predict steroidogenic and thyroid activity of chemicals. Methods for prioritizing chemicals based on a broad range of ToxCast HTS assays, in combination with physical-chemical properties, have been previously developed. Other efforts are also under way to develop more sophisticated, pathway-based predictive models that would be more suitable for supporting regulatory decision making. The present study demonstrates the MOAs for which these models would be expected to succeed, and the areas that need additional technologies before a sufficient screening tool would be expected to be successful. This information can now be used for more focused follow-up efforts to identify endocrine-related MOAs for prioritization.
The HTSE and HTSA assays demonstrate a high degree of association with the guidelineE and guidelineA assays. The two types of misclassifications, FP and FN, are important because they highlight shortcomings in the model or further specify the domain of applicability. FPs are compounds predicted to be active but that were not active in this analysis based on the threshold of EDSP/OECD reports or literature data. These are significant because an FP could lead to unnecessary testing in more resource-intensive assays, and FNs are of concern because they represent potentially active chemicals that would have gone undetected.
The HTSE model correctly classified 90% of chemicals, and only 2 of 21 chemicals were misclassified as FP or FN. Chlorpyrifos-methyl was an FP, meaning that it was predicted to be estrogenic by ToxCast HTSE assays but was not positive in the only guidelineE report, which was a uterotrophic study by Kang et al. (2004) [see Supplemental Material, Appendix C, Table S2 (http://dx.doi.org/10.1289]. This same chemical was reported to be inactive in all of the extracted non-guidelineE literature data (active in 0 of 4 available assays). Chlorpyrifos-methyl was inactive in all ToxCast HTSE assays except for the Attagene ERα TRANS and CIS reporter gene assays, which resulted in the subsequent positive call.
Non-guideline estrogen-related literature for prochloraz reported observations of ERα antagonism in some reporter gene and proliferation assays (Bonefeld-Jorgensen et al. 2005; Kjaerstad et al. 2010), but other studies did not observe activity in reporter gene assays (Andersen et al. 2002; Kojima et al. 2004; Lemaire et al. 2006; Petit et al. 1997) or proliferation assays (Andersen et al. 2002; Vinggaard et al. 1999) [see Supplemental_File_3.csv (Rotroff et al. 2012)]. Prochloraz was an FN in this analysis because it was active in the NCGC_ERalpha_Antagonist assay but negative in all other ToxCast HTSE binding and reporter gene assays [see Supplemental_File_1.csv (Rotroff et al. 2012)]. Prochloraz tested positive in the only guidelineE assay available [see Supplemental Material, Appendix C, Table S2 (http://dx.doi.org/10.1289/ehp.1205065)]. This EDSP/OECD fathead minnow assay showed altered fecundity, vitellogenin, and oocyte atresia after prochloraz treatment (U.S. EPA 2007). Prochloraz is known to disrupt steroidogenesis through inhibition of CYP (cytochrome P450) 17 hydroxylase and aromatase, preventing the critical conversion of progesterone to 17α-hydroxyprogesterone and testosterone to 17β-estradiol, respectively (Blystone et al. 2007; Sanderson et al. 2002). The fathead minnow assay likely detected this non-receptor-mediated mechanism of estrogen disruption, and this mechanism of action would not have been expected to be detected in the current set of ToxCast HTSE assays. Prochloraz was the only compound misclassified in the HTSA analysis, and the effects observed in the reproductive study in male fish are likely a result of the same steroidogenic perturbations. Prochloraz was correctly identified by the ToxCast aromatase enzyme inhibition assay, which was grouped with the HTSS-related MOA.
Although a limited number of chemicals was available for comparison, we found a strong association between the ToxCast HTSE and HTSA assays and the EDSP/OECD guideline uterotrophic and Hershberger studies. Eighteen chemicals were available for comparison between ToxCast HTSE and guideline uterotrophic assays, and only two were misclassified (see Supplemental Material, Appendix C, Table S2). In the comparison between ToxCast HTSA and guideline Hershberger assays, no chemicals were misclassified, for a perfect BA of 1 (see Supplemental Material, Appendix C, Table S3). There are several explanations for why a chemical may be misclassified by the ToxCast HTS models. In some scenarios a chemical may not have been tested at concentrations high enough to exhibit a response in ToxCast assays. Inconsistencies could also result from species, tissue, or cell-type differences between the ToxCast and guideline studies. Most of the ToxCast assays use human cell lines or reporter constructs, and some misclassifications may result from species differences between these assays and the rodent bioassays. Comparisons of available species between guideline and nonguideline studies are available in Supplemental Material, Appendix B, Table S1 (http://dx.doi.org/10.1289/ehp.1205065). Interspecies differences should be taken into consideration because they may be quite substantial. For example, studies have highlighted not only the importance of tissue and cell distribution and context within an organism for both ER and AR (Kolasa et al. 2003; Zhou et al. 2002) but also the presence of ERα and ERβ splice variants. Most in vitro assays are limited in their metabolic capabilities, so chemicals that require metabolic activation in order to be active may not be detected. However, methoxychlor and vinclozolin, which become more active with metabolism, were both detected in the HTSE (see Supplemental Material, Appendix C, Table S2) and HTSA (see Supplemental Material, Appendix C, Table S3) assays.
Furthermore, in vivo assays may detect chemicals that perturb endocrine-related end points elicited via toxicity in other organs, such as the liver (Leffert and Alexander 1976; Masuyama et al. 2000; Xie et al. 2003). The assays selected for the present study comprise only a small portion of the overall endocrine pathway domain. Alterations in neuroendocrine or other pathways, as well as some feedback mechanisms, could be affected by a compound and would not be detected by these assays. The methods we used to classify compounds may result in different conclusions than those obtained by the EDSP (U.S. EPA 2011b). Despite these limitations, evidence from the present study indicates that very few chemicals that are active in EDSP T1S go undetected by ToxCast HTSE and HTSA assays. Most of the misclassifications appear to be from downstream estrogenic and androgenic effects caused by alterations of upstream steroidogenic enzymes. Most of the active guidelineE and guidelineA chemicals in this data set appear to operate through receptor-mediated pathways and are detectable in vitro.
The nonguideline literature analysis demonstrated that ToxCast HTS assays are also predictive of a broader range of endocrine-related assays. As expected, we observed a loss of accuracy in predicting the nonguideline literature results compared with the EDSP/OECD guideline studies because the nonguideline literature studies used a wide variety of species, assay protocols, and technologies. An additional factor that led to the loss of sensitivity in the HTSA nonguideline analysis was the imbalance of positive to negative reports. The guideline studies had 6 positives of 13 total chemicals (46%) at the > 50% threshold, and the nonguideline reports had 47 positives of 59 total chemicals (80%) at the same threshold. The sensitivity would be expected to improve with a more balanced data set.
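The class imbalance described above can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of balanced accuracy as the mean of sensitivity and specificity; the function name `balanced_accuracy` is illustrative, and the confusion counts passed to it are a toy example, not results from this study. Only the positive and total counts (6 of 13 and 47 of 59) come from the text.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity (standard definition, assumed here)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Positive and total chemical counts reported in the text for the HTSA comparison.
data_sets = {
    "guideline": {"positives": 6, "total": 13},
    "nonguideline": {"positives": 47, "total": 59},
}

summary = {}
for name, counts in data_sets.items():
    positives = counts["positives"]
    negatives = counts["total"] - positives
    summary[name] = {
        "prevalence": positives / counts["total"],
        # Change in each rate caused by a single misclassified chemical.
        "sensitivity_step": 1 / positives,
        "specificity_step": 1 / negatives,
    }

for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})

# Toy confusion counts (hypothetical, for illustration only).
toy_ba = balanced_accuracy(tp=8, fn=2, tn=9, fp=1)
print(f"toy balanced accuracy: {toy_ba:.2f}")
```

With only 12 negatives in the nonguideline set, each false positive shifts specificity by about 0.08, whereas each false negative shifts sensitivity by about 0.02, so the two halves of the balanced accuracy are estimated with very different precision.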
This analysis shows a clear need to develop HTS assays capable of detecting steroidogenesis- and thyroid-disrupting compounds. The current HTSS-related assay within ToxCast is limited to a single cell-free aromatase enzyme activity assay. Aromatase is a key enzyme in the biosynthesis of estrogens from androgens (Schuurmans et al. 1991; Stoker et al. 2000a). However, in addition to aromatase inhibition, environmental chemicals may affect other mechanisms of steroidogenesis that are not tested in our current HTS battery (Stoker et al. 2000a, 2000b). Additional assay technologies that may provide a more comprehensive set of steroidogenesis end points are currently being assessed. The ToxCast HTST assays used in our analysis are composed of thyroid hormone receptor binding and reporter gene assays. A limited number of chemicals was available for comparison between the HTST assays and the guideline studies. The lack of association between ToxCast HTST assay results and compounds thought to disrupt thyroid homeostasis in EDSP/OECD guideline studies suggests that most of these compounds are not acting through thyroid hormone receptor-mediated mechanisms (Paul et al. 2010; Zorrilla et al. 2009). Thyroid hormone homeostasis has been shown to be altered through enhanced or suppressed clearance of thyroid hormone by metabolic enzymes (Saghir et al. 2008; Zorrilla et al. 2009). ToxCast contains HTS assays that measure nuclear receptor activation and metabolic enzyme activity, which could be relevant for thyroid hormone metabolism. However, many chemicals that were active in these in vitro ToxCast assays were not associated with adverse outcomes in the in vivo literature we reviewed, and the resulting lack of specificity for thyroid-active chemicals led to the exclusion of these assays from this analysis (data not shown).
From these findings, we conclude that most chemicals chosen to validate EDSP T1S assays alter estrogen- and/or androgen-related end points through nuclear receptor-mediated mechanisms and are capable of being efficiently detected by the ToxCast HTS assays. For the purpose of prioritization, it is important to establish sufficient confidence that the assays being utilized are specific and sensitive so that chemicals prioritized for EDSP T1S include those most likely to be active. Although further efforts are needed to improve detection of steroidogenic and thyroid-disrupting chemicals with in vitro test systems, our results indicate that ToxCast endocrine assays are highly predictive of chemicals with estrogenic and androgenic receptor-based endocrine MOAs, and that their use in predictive models for endocrine testing would allow efficient prioritizing of chemicals for further testing.