ChemSpider: An Online Chemical Information Resource

Journal of Chemical Education (Impact Factor: 1.11). 08/2010; 87(11). DOI: 10.1021/ed100697w

ABSTRACT

ChemSpider is a free, online chemical database offering access to physical and chemical properties, molecular structure, spectral data, synthetic methods, safety information, and nomenclature for almost 25 million unique chemical compounds sourced from, and linked to, almost 400 separate data sources on the Web. ChemSpider is quickly becoming the primary chemistry Internet portal, and it can be very useful for both chemical teaching and research.

Available from: Antony John Williams
    • "Incorporating a much larger list of candidate formulas could also provide greater coverage of observed formula peaks in dust samples. For example, EPA's Aggregated Computational Toxicology Resource (ACToR) database (Judson et al., 2012), which also aggregates inventories relevant to environmental toxicity, currently contains over 500 K CASRN, whereas PubChem (NCBI, 2015) and ChemSpider (Pence and Williams, 2010) each contain millions of structures (i.e., formulas) mapped to an even larger number of substances. Increasing the number of possible substance–formula matches through use of these less highly curated public resources will increase the computational complexity of the data analysis, and may also introduce greater uncertainty when applying prioritization schemes to determine likely matches."
    ABSTRACT: There is a growing need in the field of exposure science for monitoring methods that rapidly screen environmental media for suspect contaminants. Measurement and analysis platforms, based on high resolution mass spectrometry (HRMS), now exist to meet this need. Here we describe results of a study that links HRMS data with exposure predictions from the U.S. EPA's ExpoCast™ program and in vitro bioassay data from the U.S. interagency Tox21 consortium. Vacuum dust samples were collected from 56 households across the U.S. as part of the American Healthy Homes Survey (AHHS). Sample extracts were analyzed using liquid chromatography time-of-flight mass spectrometry (LC–TOF/MS) with electrospray ionization. On average, approximately 2000 molecular features were identified per sample (based on accurate mass) in negative ion mode, and 3000 in positive ion mode. Exact mass, isotope distribution, and isotope spacing were used to match molecular features with a unique listing of chemical formulas extracted from EPA's Distributed Structure-Searchable Toxicity (DSSTox) database. A total of 978 DSSTox formulas were consistent with the dust LC–TOF/molecular feature data (match score ≥ 90); these formulas mapped to 3228 possible chemicals in the database. Correct assignment of a unique chemical to a given formula required additional validation steps. Each suspect chemical was prioritized for follow-up confirmation using abundance and detection frequency results, along with exposure and bioactivity estimates from ExpoCast and Tox21, respectively. Chemicals with elevated exposure and/or toxicity potential were further examined using a mixture of 100 chemical standards. A total of 33 chemicals were confirmed present in the dust samples by formula and retention time match; nearly half of these do not appear to have been associated with house dust in the published literature. 
    Chemical matches found in at least 10 of the 56 dust samples include Piperine, N,N-Diethyl-m-toluamide (DEET), Triclocarban, Diethyl phthalate (DEP), Propylparaben, Methylparaben, Tris(1,3-dichloro-2-propyl)phosphate (TDCPP), and Nicotine. This study demonstrates a novel suspect screening methodology to prioritize chemicals of interest for subsequent targeted analysis. The methods described here rely on strategic integration of available public resources and should be considered in future non-targeted and suspect screening assessments of environmental and biological media.
    Article · Mar 2016 · Environment International
    • "A comprehensive catalog of fungal NPs with experimentally characterized biosynthetic genes would be of great value to the community. While many catalogs related to fungal genetics and metabolism exist, they either focus on primary metabolism (Cerqueira et al., 2014; Stajich et al., 2012), or do not directly link secondary metabolite structures to the encoding BGCs (Degtyarenko et al., 2008; Gaulton et al., 2012; Pence and Williams, 2010; Wang et al., 2009). DoBISCUIT (Ichikawa et al., 2013) and ClusterMine360 (Conway and Boddy, 2013). "
    ABSTRACT: Microorganisms produce a wide range of natural products (NPs) with clinically and agriculturally relevant biological activities. In bacteria and fungi, genes encoding successive steps in a biosynthetic pathway tend to be clustered on the chromosome as biosynthetic gene clusters (BGCs). Historically, “activity-guided” approaches to NP discovery have focused on bioactivity screening of NPs produced by culturable microbes. In contrast, recent “genome mining” approaches first identify candidate BGCs, express these biosynthetic genes using synthetic biology methods, and finally test for the production of NPs. Fungal genome mining efforts and the exploration of novel sequence and NP space are limited, however, by the lack of a comprehensive catalog of BGCs encoding experimentally-validated products. In this study, we generated a comprehensive reference set of fungal NPs whose biosynthetic gene clusters are described in the published literature. To generate this dataset, we first identified NCBI records that included both a peer-reviewed article and an associated nucleotide record. We filtered these records by text and homology criteria to identify putative NP-related articles and BGCs. Next, we manually curated the resulting articles, chemical structures, and protein sequences. The resulting catalog contains 197 unique NP compounds covering several major classes of fungal NPs, including polyketides, non-ribosomal peptides, terpenoids, and alkaloids. The distribution of articles published per compound shows a bias towards the study of certain popular compounds, such as the aflatoxins. Phylogenetic analysis of biosynthetic genes suggests that much chemical and enzymatic diversity remains to be discovered in fungi. 
    Our catalog was incorporated into the recently launched Minimum Information about Biosynthetic Gene cluster (MIBiG) repository to create the largest known set of fungal BGCs and associated NPs, a resource that we anticipate will guide future genome mining and synthetic biology efforts toward discovering novel fungal enzymes and metabolites.
    Article · Jan 2016 · Fungal Genetics and Biology
    • "Such data have been captured in computerized databases since the 1970s [24]. In addition to compilations of experimental data [30] [27], there are extensive efforts to create repositories of computed properties, such as crystal structure parameters and formation enthalpies for binary alloys [14]; the many data collected in the Computational Materials Repository [20] [23], Materials Project [16] [18], Aflowlib.org [10]; and the NIST repositories [1]."
    ABSTRACT: Advances in both sensor and computing technologies promise new approaches to discovery in materials science and engineering. For example, it appears possible to integrate theoretical modeling and experiment in new ways, test existing models with unprecedented rigor, and infer entirely new models from first principles. But, before these new approaches can become useful in practice, practitioners must be able to work with petabytes and petaflops as intuitively and interactively as they do with gigabytes and gigaflops today. The Discovery Engines for Big Data project at Argonne National Laboratory is tackling key bottlenecks along the end-to-end discovery path, focusing in particular on opportunities at Argonne's Advanced Photon Source. Here, we describe results relating to data acquisition, management, and analysis. For acquisition, we describe automated pipelines based on Globus services that link instruments, computations, and people for rapid and reliable data exchange. For management, we describe digital asset management solutions that enable the capture, management, sharing, publication, and discovery of large quantities of complex and diverse data, along with associated metadata and programs. For analysis, we describe the use of 100K+ supercomputer cores to enable new research modalities based on near-real-time processing and feedback, and the use of Swift parallel scripting to facilitate authoring, understanding, and reuse of data generation, transformation, and analysis software.
    Article · Jan 2015 · Advances in Parallel Computing