An article published on July 11, 2018 in the peer-reviewed journal Toxicological Sciences reports on a new approach to computational prediction of toxicity whose accuracy is similar or superior to the reproducibility of animal testing data.

Thomas Luechtefeld and colleagues from the Center for Alternatives to Animal Testing (CAAT) at the Johns Hopkins University Bloomberg School of Public Health, Baltimore, U.S., created a database containing chemical hazard information for approximately 10 000 chemicals, collected from the dossiers submitted to the European Chemicals Agency (ECHA) by 2013. For several guideline tests standardized by the Organisation for Economic Co-operation and Development (OECD), including “acute oral and dermal toxicity, eye and skin irritation, mutagenicity and skin sensitization,” multiple repetitions of the same test were found for several hundred chemicals. For example, in the Draize rabbit eye test, “two chemicals were tested more than 90 times, 69 chemicals were tested more than 45 times.” Based on such data, the authors calculated that “the probability that an OECD guideline animal test would output the same results in a repeat test was 78% to 96% (sensitivity 50-87%).” This range was used as a benchmark for later evaluating the performance of computational predictions of toxicity.
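To make the idea of test reproducibility concrete, the sketch below estimates how often two repeats of the same test on the same chemical agree. It is only an illustration: the pairwise-agreement measure, the function name `repeat_agreement`, and the toy outcomes are assumptions made here, not the authors' actual calculation or data.

```python
from itertools import combinations

def repeat_agreement(results_by_chemical):
    """Fraction of repeat-test pairs (same chemical) with the same outcome.

    Assumption: pairwise agreement is used here as a simple stand-in for
    reproducibility; the paper's own statistics may differ.
    """
    agree = 0
    total = 0
    for outcomes in results_by_chemical.values():
        for first, second in combinations(outcomes, 2):
            total += 1
            agree += (first == second)
    return agree / total

# Hypothetical repeated binary outcomes (True = toxic) for three chemicals.
toy_results = {
    "chem_a": [True, True, True, False],
    "chem_b": [False, False],
    "chem_c": [True, False, True],
}
agreement = repeat_agreement(toy_results)
print(f"pairwise agreement: {agreement:.2f}")
```

With these toy values, 5 of the 10 repeat pairs agree.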

The authors then constructed an expanded database compiling data from PubChem, ECHA, and an acute oral toxicity dataset curated by the National Toxicology Program (NTP). Currently, this database contains “833 844 chemical property values used [as training data] for modeling across 80 908 chemicals for an average of about 10 [health hazard] properties per chemical.” This “automated read-across” resulted in “novel models called RASARs (read-across structure activity relationships),” where “conventional chemical similarity [is combined] with supervised learning.” The chemical similarity calculation “is done by generating a binary fingerprint for each chemical and using Jaccard distance (similarity=1-distance) on fingerprints.” Supervised learning methods “provide a statistical model of the insights deliverable from chemical similarity” and make it possible to assign a confidence to individual predictions.
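The fingerprint comparison described above can be sketched in a few lines. In this minimal example each binary fingerprint is represented as the set of its "on" bit positions; the fingerprints themselves and the function name `jaccard_similarity` are hypothetical, and a real workflow would derive fingerprints from chemical structures with a cheminformatics toolkit.

```python
def jaccard_similarity(fp_a: set, fp_b: set) -> float:
    """Similarity = 1 - Jaccard distance = |A & B| / |A | B|."""
    union = fp_a | fp_b
    if not union:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(fp_a & fp_b) / len(union)

# Hypothetical fingerprints, given as positions of bits set to 1.
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8, 9, 12}

similarity = jaccard_similarity(fp_x, fp_y)
distance = 1.0 - similarity
print(f"similarity={similarity:.2f}, distance={distance:.2f}")
```

Here the two fingerprints share 3 of the 6 distinct "on" bits, so both the similarity and the distance are 0.5.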

The RASARs can be used in two different ways: ‘Simple’ and ‘Data fusion.’ The ‘Simple’ RASAR approach is essentially a traditional read-across method, where “hazard [of an unknown chemical is predicted] from chemical analogues with known hazard data.” Tested with nine binary (i.e., toxic vs. non-toxic) health hazards, these models “achieve 70-80% balanced accuracies with constraints on tested compounds.” This is “on par with the reproducibility of the respective animal tests,” the authors emphasize.
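As a rough illustration of read-across and of the balanced accuracy metric, the sketch below assigns each query chemical the hazard label of its most similar analogue and then scores the predictions. It is a nearest-neighbor simplification under stated assumptions: the fingerprints, labels, and function names are hypothetical, and the supervised learning step of the actual RASAR models is not reproduced.

```python
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def read_across(query_fp, known):
    """Return the hazard label of the most similar chemical with known data."""
    _, label = max(known, key=lambda item: jaccard(query_fp, item[0]))
    return label

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return (tp / positives + tn / negatives) / 2

# Hypothetical analogues: (fingerprint bits, toxic?).
known = [({1, 2, 3}, True), ({1, 2, 4}, True),
         ({7, 8, 9}, False), ({7, 8, 10}, False)]
# Hypothetical query chemicals with their true labels, held out for scoring.
queries = [({1, 2, 5}, True), ({7, 8, 11}, False),
           ({8, 9, 10}, True), ({2, 3, 4}, True)]

y_true = [label for _, label in queries]
y_pred = [read_across(fp, known) for fp, _ in queries]
score = balanced_accuracy(y_true, y_pred)
print(y_pred, round(score, 3))
```

In this toy set the third query is toxic but resembles only non-toxic analogues, so it is misclassified: sensitivity is 2/3, specificity is 1, and the balanced accuracy is 5/6.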

The ‘Data fusion’ RASAR approach extends the traditional read-across “by creating large feature vectors from all available property data rather than only the modeled hazard.” Each endpoint is therefore informed by “a broad variety of 19 categories of GHS [(Globally Harmonized System)] classifications (74 in total) of similar chemicals.” These models “show balanced accuracies in the 80-95% range across 9 health hazards with no constraints on tested compounds.” Their accuracy thus exceeds the reproducibility of the corresponding animal tests. This improvement is explained by the fact that the integration of multiple data sources allows “achiev[ing] more consistent, accurate, and useful information than the individual datasets.” One “side effect of integrating more data is the loss of a clear explanation for predictions,” the authors observe. However, they claim that this “can be ameliorated to some degree via analysis of feature importance.”
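One way to picture such a feature vector is sketched below: for every property with available data, the query chemical gets two features, its similarity to the closest known positive and to the closest known negative. This particular encoding, the property names, and the records are assumptions made for illustration; the resulting vector would then be passed to a supervised learner, which is not shown.

```python
def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def fusion_features(query_fp, records, properties):
    """Build [best_pos, best_neg] similarity features for each property."""
    features = []
    for prop in properties:
        best_pos = 0.0
        best_neg = 0.0
        for fp, labels in records:
            if prop not in labels:
                continue  # this chemical has no data for this property
            sim = jaccard(query_fp, fp)
            if labels[prop]:
                best_pos = max(best_pos, sim)
            else:
                best_neg = max(best_neg, sim)
        features.extend([best_pos, best_neg])
    return features

# Hypothetical property names and records (fingerprint bits, known labels).
properties = ["skin_irritation", "eye_irritation", "mutagenicity"]
records = [
    ({1, 2, 3}, {"skin_irritation": True, "eye_irritation": True}),
    ({1, 2, 4}, {"skin_irritation": False, "mutagenicity": False}),
    ({7, 8, 9}, {"eye_irritation": False, "mutagenicity": True}),
]

vector = fusion_features({1, 2, 3, 4}, records, properties)
print(vector)
```

Because every property contributes features, data on eye irritation or mutagenicity of similar chemicals can inform a skin irritation prediction, which is the point of fusing the datasets.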

The authors conclude that the ‘Data fusion’ RASAR models successfully extend the traditional read-across method, making it possible to address “the backlog of . . . untested substances,” including “several thousand food additives and contact materials,” and to support the selection of safer alternatives. They further point out that “any sufficiently large data-set of organic chemicals with a given property could be subjected to the RASAR, opening up for further hazards such as endocrine disruption.”

In a press release published on July 11, 2018 by the Johns Hopkins Bloomberg School of Public Health, the study’s principal investigator, Thomas Hartung, said that their results “suggest that we can replace many animal tests with computer-based prediction and get more reliable results.” He further emphasized that “our automated approach clearly outperformed the animal test, in a very solid assessment using data on thousands of different chemicals and tests.” Since computer-based predictions are faster and cheaper than animal testing, application of RASARs would enable “wider safety assessments” of more chemicals at lower cost. The RASAR software is currently being developed by Underwriters Laboratories, a U.S.-based safety science firm for which Hartung and co-workers consult. However, the RASAR database and models can be shared with “collaborators” who ask for them.

A news article published on July 11, 2018 by Nature quotes Mike Rasenberg, head of computational assessment at ECHA, who said that the new paper “is a good initiative,” but “scientifically there is a lot of work to be done,” and “we can’t yet do all toxicology with a computer.”

Read more

Johns Hopkins Bloomberg School of Public Health (July 11, 2018). “Database analysis more reliable than animal testing for toxic chemicals.”

Richard Van Noorden (July 11, 2018). “Software beats animal tests at predicting toxicity of chemicals.” Nature

Luechtefeld, T., et al. (2018). “Machine learning of toxicological big data enables read-across structure activity relationships (RASAR) outperforming animal test reproducibility.” Toxicological Sciences (published July 11, 2018).