Article

Exploration of the chemical space and its three historical regimes

Authors:
  • Corporación SCIO, Colombia

Abstract

Chemical research unveils the structure of chemical space, spanned by all chemical species, as documented in more than 200 y of scientific literature, now available in electronic databases. Very little is known, however, about the large-scale patterns of this exploration. Here we show, by analyzing millions of reactions stored in the Reaxys database, that chemists have reported new compounds in an exponential fashion from 1800 to 2015 with a stable 4.4% annual growth rate, in the long run neither affected by World Wars nor affected by the introduction of new theories. Contrary to general belief, synthesis has been the means to provide new compounds since the early 19th century, well before Wöhler's synthesis of urea. The exploration of chemical space has followed three statistically distinguishable regimes. The first one included uncertain year-to-year output of organic and inorganic compounds and ended about 1860, when structural theory gave way to a century of more regular and guided production, the organic regime. The current organometallic regime is the most regular one. Analyzing the details of the synthesis process, we found that chemists have had preferences in the selection of substrates and we identified the workings of such a selection. Regarding reaction products, the discovery of new compounds has been dominated by very few elemental compositions. We anticipate that the present work serves as a starting point for more sophisticated and detailed studies of the history of chemistry.
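The 4.4% annual growth figure in the abstract implies a simple exponential model for the yearly output of new compounds. A minimal sketch of that model in Python, using an illustrative 1800 baseline rather than the authors' Reaxys counts:

```python
import numpy as np

# Exponential growth of newly reported compounds, as described in the abstract:
# N(t) = N0 * (1 + r)**(t - 1800), with r = 0.044 per year.
r = 0.044            # reported annual growth rate
t0, t1 = 1800, 2015  # period covered by the Reaxys analysis
N0 = 100.0           # illustrative baseline for 1800; NOT taken from the paper

years = np.arange(t0, t1 + 1)
N = N0 * (1.0 + r) ** (years - t0)

# At 4.4% per year the output multiplies by roughly four orders of magnitude
# over the 215-year window, regardless of the chosen baseline.
print(f"growth factor 1800 -> 2015: {N[-1] / N[0]:,.0f}x")
```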


... Every discovered substance enlarges the set of known chemicals, which we call the chemical space (5). Given the central role of this space for the formulation of the SCE, every discovery of new elements and compounds may affect the SCE by introducing or perturbing similarities among chemical elements or by affecting the ordering of their atomic weights. ...
... Gmelin's and Beilstein's handbooks, initiated in the nineteenth century, gather records of extractions, synthesis, and properties of substances (5,8). Nowadays, Reaxys, a large electronic database of chemical information that merges these two handbooks plus several other sources of chemical information, constitutes a suitable corpus for studies on the evolution of chemistry (5,8). ...
... The methods presented here become instrumental to study the further evolution of the periodic system and to ponder its current shape. ...
Article
Full-text available
The periodic system, which intertwines order and similarity among chemical elements, arose from knowledge about substances constituting the chemical space. Little is known, however, about how the expansion of the space contributed to the emergence of the system, which was formulated in the 1860s. Here, we show by analyzing the space between 1800 and 1869 that after an unstable period culminating around 1826, chemical space led the system to converge to a backbone structure clearly recognizable in the 1840s. Hence, the system was already encoded in the space for about two and a half decades before its formulation. Chemical events in 1826 and in the 1840s were driven by the discovery of new forms of combination standing the test of time. Emphasis of the space upon organic chemicals after 1830 prompted the recognition of relationships among elements participating in the organic turn and obscured some of the relationships among transition metals. To account for the role of nineteenth-century atomic weights upon the system, we introduced an algorithm to adjust the space according to different sets of weights, which allowed for estimating the resulting periodic systems of chemists using one or the other weights. By analyzing these systems, from Dalton up to Mendeleev, we found that Gmelin's atomic weights of 1843 produce systems remarkably similar to that of 1869, a similarity that was reinforced by the atomic weights of the years to come. Although our approach is computational rather than historical, we hope it can complement other tools of the history of chemistry.
... Chemical knowledge and its evolution not only matters to chemists interested in their disciplinary past and future; it is of central scientific and societal importance, as chemistry shapes and creates the disposition of the world's resources [3], and lies at the border of science, industry, welfare and hazard [3]. Chemistry doubles its material output roughly every 16 years through the publication of new substances [4] and it is the most productive science in terms of number of publications (Figure S1). Moreover, it has been instrumental in the rise of biochemistry, molecular biology, material science and nanotechnology, to name but a few allied disciplines with their social and environmental impacts. ...
... Not surprisingly, Kant regarded chemistry as a paradigm for the method of critical philosophy [7], as he was astonished by the methods and logic of this science. ...
... As discovering new substances is at the core of chemistry [12], analysing the historical driving forces of this process may lead to detecting the suitable conditions for speeding up the exploration of the chemical space, which spans all chemical species [4]. Although chemistry has had a stable 4.4% annual growth rate of new chemicals since 1800 up to date, there have been periods of rapid exploration as that between 1870 and 1910, driven by the rapid growth of organic chemistry [4], a period of "genesis of land and sea from chaos" [13]. ...
Preprint
Full-text available
Chemistry shapes and creates the disposition of the world's resources and exponentially provides new substances for the welfare and hazard of our civilisation. Over its history, chemists, driven by social, semiotic and material forces, have shaped the discipline while creating a colossal corpus of information and knowledge. Historians and sociologists, in turn, have devised causal narratives and hypotheses to explain major events in chemistry as well as its current status. In this Perspective we discuss the approaches to the evolution of the social, semiotic and material systems of chemistry. We critically analyse their reach and challenge them by putting forward the need for a more holistic and formal setting for modelling the evolution of chemical knowledge. We indicate the advantages for chemistry of considering chemical knowledge as a complex dynamical system, which, besides casting light on the past and present of chemistry, allows for estimating its future, as well as the effects of hypothetical past events. We describe how this approach becomes instrumental for forecasting the effects of material, semiotic and social perturbations upon chemical knowledge. Available data and the most relevant formalisms to analyse the different facets of chemical knowledge are discussed.
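One excerpt above states that chemistry doubles its material output about every 16 years, while another cites the stable 4.4% annual growth rate; the two figures are mutually consistent, as the standard doubling-time relation shows:

```latex
t_{\text{double}} \;=\; \frac{\ln 2}{\ln(1+r)} \;=\; \frac{\ln 2}{\ln(1.044)} \;\approx\; 16.1\ \text{years}
```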
... Some of the most popular molecules occur in Wikipedia [7]; in October 2020, data on 17885 compounds were observed. Frequently occurring compounds, which are substrates and products of organic reactions, have been identified; their distributions by molecular weight [8] and abundance [8,9] were given. The distribution of ChemSpider database entries over molecular weight was also derived [2]. ...
... The power functions in logarithmic coordinates are transformed into linear graphs (the example in Figure 1b). These graphs are common for compounds distribution over the frequency of their participation in organic synthesis [8,9] and for the types of molecular fragments distribution over the frequency of their presence in known compounds [10,11]. There are general regularities in such "chemical" frequency distributions, conditionally dividing chemical space into subspaces of more and less popular compounds. ...
... There are general regularities in such "chemical" frequency distributions, conditionally dividing chemical space into subspaces of more and less popular compounds. As for organic synthesis, the "addiction" of chemists to the same/widespread compounds and their corresponding conservatism were noted in the literature [9]. This is another manifestation of the Matthew effect [23]. ...
Article
Full-text available
The idea of popularity/abundance of chemical compounds is widely used in non-target chemical analysis involving environmental studies. To have a clear quantitative basis for this idea, frequency distributions of chemical compounds over indicators of their popularity/abundance are obtained and discussed. Popularity indicators are the number of information sources, the number of chemical vendors, counts of data records, and other variables assessed from two large databases, namely ChemSpider and PubChem. Distributions are approximated by power functions, special cases of Zipf distributions, which are characteristic of the results of human/social activity. A relatively small group of the most popular compounds has been identified, conventionally accounting for a few percent (several million) of compounds. These compounds are most often explored in scientific research and are practically used. Accordingly, popular compounds have been taken into account as first analyte candidates for identification in non-target analysis.
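The excerpts above note that such popularity distributions follow power functions that become straight lines in log-log coordinates. A minimal sketch of that kind of fit on synthetic Zipf-like counts (illustrative data, not the ChemSpider/PubChem indicators):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "popularity" counts (e.g. vendor or data-record counts per compound),
# drawn from a Zipf-like distribution purely for illustration.
counts = rng.zipf(a=2.0, size=50_000)

# Empirical frequency of each popularity value.
values, freq = np.unique(counts, return_counts=True)

# A power law f(x) ~ C * x**(-alpha) is linear in log-log coordinates,
# so an ordinary least-squares fit on the logs estimates the exponent alpha.
slope, intercept = np.polyfit(np.log(values), np.log(freq), deg=1)
print(f"estimated exponent alpha ≈ {-slope:.2f}")
```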
... Synthesized or isolated chemical compounds have been documented since the beginning of the 19th century, and up to the year 2015 an exponential increase in the number of new chemical compounds can be observed [6]. In this connection, the total chemical space, consisting of all thermodynamically stable structures, is nowadays indescribably large. ...
... The latter consists of charged, interacting electrons and nuclei. As in the case of the HF approach, the ground state of the model system can be described with a single Slater determinant (see equation 6.5). Here, the determinant includes the one-electron wavefunctions ψ_k^KS(r). ...
... 6 shows the average coordination number (ACN, upper panels), the minimum coordination number (MCN, middle panels) and the average bond distance (ABD). In ...
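For reference, the generic single-determinant ansatz that the excerpt alludes to (its equation 6.5 is not reproduced here, and spin labels are omitted) has the standard Kohn-Sham form:

```latex
\Psi_{\mathrm{KS}}(\mathbf{r}_1,\dots,\mathbf{r}_N)
  \;=\; \frac{1}{\sqrt{N!}}
  \det\!\begin{pmatrix}
    \psi^{\mathrm{KS}}_1(\mathbf{r}_1) & \cdots & \psi^{\mathrm{KS}}_N(\mathbf{r}_1)\\
    \vdots & \ddots & \vdots\\
    \psi^{\mathrm{KS}}_1(\mathbf{r}_N) & \cdots & \psi^{\mathrm{KS}}_N(\mathbf{r}_N)
  \end{pmatrix}
```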
... Over the past century a significant increase in the development and production of chemicals took place [53]. Many of these chemicals are beneficial for our life and well-being and their application reaches a variety of fields. ...
... Moreover, we recently reported that undifferentiated mesenchymal-like BRAFi-resistant cells exhibit myofibroblast/cancer associated fibroblast (CAF)-like features leading to pro-fibrotic ECM reprogramming in vitro and in vivo (22,23). Cell autonomous ECM deposition and remodeling abilities adopted by melanoma cells after MAPKi treatment results in cross-linked collagen matrix and tumor stiffening fostering a feedforward loop dependent on the mechanotransducers YAP and MRTFA and leading to therapy resistance (22). Thus, this pro-fibrotic-like response, typical of the early adaptation and acquired resistance to MAPK inhibition, provides a therapeutic escape route through the activation of alternative survival pathways mediated by cell-matrix communications. ...
... Rapamycin, a pharmacological inhibitor of the mTOR signaling pathway, has also been reported to reduce virus yield upon VV infection [52]. A possible mechanism may be that mTOR activation results in the phosphorylation of 4E-BP, which in turn releases the translation factor eIF4E, the component of eIF4F that binds to the 5'-cap structure of mRNA and promotes translation [52,53]. Upon VV infection, the factor eIF4E has been reported to be redistributed in cavities present within viral factories [27,54] where viral translation can proceed. ...
Thesis
Toxicological tests for cosmetic products were classically performed on animal models. Prohibitive costs and the evolution of the general public's perception of animal experimentation have encouraged the development of in vitro tests capable of predicting the toxicity of compounds potentially classifiable as “CMR” (Carcinogenic, Mutagenic and/or Reprotoxic). More recently, tests based on animal experimentation have been banned in the European Union. In this work, we have compared transcriptomic signatures of non-genotoxic carcinogenic compounds on lung epithelia using bi-dimensional (such as the normal human bronchial epithelium cell line BEAS-2B) and tri-dimensional (3D) cultures. In 3D cultures, cells grown at an “air-liquid interface” (ALI) reconstitute a differentiated epithelium composed of different cell types (ciliated cells, goblet cells, etc.). Three known non-genotoxic carcinogens (cadmium chloride (CdCl2), hydroquinone (HQ) and phorbol myristate acetate (PMA)) were selected for this pilot study. ALI and BEAS-2B cultures were first analysed by microarray gene expression profiling upon incubation with the toxicants. This transcriptomic analysis, performed on bulk cells, revealed a comparable response based on a 200-gene signature between the two culture systems and better reproducibility when using the BEAS-2B model. Next, we performed single-cell transcriptomic analysis to identify potential biases linked to the ALI culture system and differences in the biological response to the treatment at the level of cell subpopulations. We identified cell-type-specific responses that allowed us to establish a transcriptomic signature for each cell type composing the ALI system in response to the toxicants, together with a hierarchy of “responding cell types”. Individual, toxicant-specific signatures were also established. A comparison between the single-cell datasets from the bi-dimensional and ALI systems was also performed, highlighting a lack of correlation between BEAS-2B cells and the cell populations previously described in the tri-dimensional cultures. Overall, our results show that the ALI system associated with single-cell transcriptomic analysis can provide additional information compared to bulk transcriptomic analysis. Subsequent experiments on a larger number of non-genotoxic carcinogens will have to be performed to determine whether this methodology can provide specific signatures predicting the non-genotoxic carcinogenic nature of a toxicant.
... Every discovered substance enlarges the set of known chemicals, which we call the chemical space [6]. Given the central role of this space for the formulation of the SCE, every discovery of new elements and compounds, may affect the SCE by introducing or perturbing similarities among chemical elements or by affecting the ordering of their atomic weights. ...
... Taken together, these approaches help us to determine whether specific changes in the 1860s chemical space actually led to the SCE; or whether the patterns of the SCE were already present earlier in history, which leads to ponder whether the SCE could have been formulated earlier. Gmelin and Beilstein's Handbooks, initiated in the nineteenth century, gather records of extractions, synthesis and properties of substances [6,8]. Nowadays Reaxys, a large electronic database of chemical information, which merges these two handbooks plus several other sources of chemical information, constitutes a suitable corpus for historical studies of chemistry [6,8]. ...
Preprint
Full-text available
The periodic system arose from knowledge about substances, which constitute the chemical space. Despite the importance of this interplay, little is known about how the expanding space affected the system. Here we show, by analysing the space between 1800 and 1869, how the periodic system evolved until its formulation. We found that after an unstable period culminating around 1826, the system began to converge to a backbone structure, unveiled in the 1860s, which was already clearly evident in the 1840s. Hence, contrary to the belief that the "ripe moment" to formulate the system was in the 1860s, it was in the 1840s. The evolution of the system is marked by the rise of organic chemistry in the first quarter of the nineteenth century, which prompted the recognition of relationships among main group elements and obscured some of those among transition metals, which explains why the formulators of the periodic system struggled to accommodate them. We also introduced an algorithm to adjust the chemical space according to different sets of atomic weights, which allowed for estimating the resulting periodic systems of chemists using one or another set of nineteenth-century atomic weights. These weights produce orderings of the elements very similar to that of 1869, while providing different similarity relationships among the elements, therefore producing different periodic systems. By analysing these systems, from Dalton up to Mendeleev, we found that Gmelin's atomic weights of 1843 produce systems remarkably similar to that of 1869, a similarity that was reinforced by the atomic weights of the years to come.
... M ost of the chemical space remains uncharted and identifying its regions of biological relevance is key to medicinal chemistry and chemical biology 1,2 . To explore and catalog this vast space, scientists have invented a variety of chemical descriptors, which encode physicochemical and structural properties of small molecules. ...
... The current version of the CC is organized into 5 levels of complexity (A: Chemistry, B: Targets, C: Networks, D: Cells, and E: Clinics), each of which is divided into 5 sublevels (1-5). In total, the CC is composed of 25 spaces capturing the 2D/3D structures of the molecules, targets and metabolic genes, network properties of the targets, cell response profiles, drug indications, and side effects, among others (Fig. 1a). ...
Article
Full-text available
Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks. Small molecules bioactivity descriptors are enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Here the authors present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them.
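The abstract's signature-activity relationship (SigAR) models are, in essence, standard classifiers whose inputs are inferred bioactivity signatures instead of structural fingerprints. A minimal sketch of that setup, assuming the 128-dimensional signatures have already been computed (placeholder data; the signaturizer tooling itself is not invoked):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Placeholder dataset: 500 compounds, each with an assumed 128-dimensional
# inferred bioactivity signature and a binary activity label.
signatures = rng.normal(size=(500, 128))
active = rng.integers(0, 2, size=500)

# A SigAR-style model: a conventional classifier trained on bioactivity
# signatures rather than on chemistry-based descriptors.
model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, signatures, active, cv=5, scoring="roc_auc")
print(f"mean cross-validated ROC-AUC: {scores.mean():.2f}")
```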
... In 2016 the World Health Organization reported 1.6 million deaths and 45 million disability-adjusted life-years lost due to known chemical exposures [1], and that number is increasing. A large number of new chemicals are introduced to the market annually (more than 10⁵ per year since the late 20th century), which represents a drastic increase in the chemical space (Fig. 1), i.e. the totality of all chemical species (in a sample) [2]. Relatively few of these so-called 'chemicals of emerging concern' (CECs) are adequately characterized with respect to their toxicity and environmental fate, preventing accurate risk assessment [3]. ...
... One approach could be to internationally define and harmonize groups of ISs by matrix type, general goal of the research etc. These groups could then be used and reported for every analysis, even (or especially) if some of them are not detected (see (2) for further discussion). 2. A clear understanding of the fraction of the chemical space explored, every time an NTA workflow is applied, and transparent reporting of the limitations therein. ...
Article
Full-text available
The application of non-target analysis (NTA), a comprehensive approach to characterize unknown chemicals, including chemicals of emerging concern has seen a steady increase recently. Given the relative novelty of this type of analysis, robust quality assurance and quality control (QA/QC) measures are imperative to ensure quality and consistency of results obtained using different workflows. Due to fundamental differences to established targeted workflows, new or expanded approaches are necessary; for example to minimize the risk of losing potential substances of interest (i.e. false negatives, Type II error). We present an overview of QA/QC techniques for NTA workflows published to date, specifically focusing on the analysis of environmental samples using liquid chromatography coupled to HRMS. From a QA/QC perspective, we discuss methods used for each step of analysis: sample preparation, chromatography , mass spectrometry, and data processing. We then finish with a series of recommendations to improve the quality assurance of NTA workflows.
... The group of Grzybowski introduced the concept of the Network of Organic Chemistry (NOC), with molecules represented as nodes and chemical reactions as edges, thereby pioneering a systematic view of connected chemical knowledge.57,58,79 The systematised chemical space was further studied with regard to its statistics and historical evolution.59,80 Most notably, the NOC has been used for applications such as retrosynthesis,81 to identify strategic molecules for bio-feed integration,83 and in reaction route planning. ...
... Fialkowski et al. first introduced the study of organic synthesis reactions with a network representation based on the Beilstein Database.79 Then, studies on the topology and growth of the network,57,59,80 synthesis planning through the network, 58,81 and applications to One-Pot-Reactions have followed. 82 In our previous works, we have highlighted the potential of the NOC for process route selection and for the identification of strategic molecules for sustainable supply chains. ...
Thesis
One of the largest challenges in the 21st century is the transition towards sustainable practices. In chemical engineering, the choice of feedstock, i.e. fossil or renewable, greatly influences the sustainability of chemical processing routes. At present, 90% of feedstocks in the chemical industry are non-renewable, thus, large-scale supply chain changes are urgently required. To enable this transition, it is of utmost importance that novel, yet competitive, processes based on renewable feedstocks are identified. Systematic early-stage sustainability assessment can cover large regions of chemical space and provide well-reasoned rankings of most promising reaction pathways. In this thesis, the hypothesis that networks are essential to support sustainability assessment of reaction pathways from big data is posed and answered. This thesis identifies three main areas for development: data, metrics, and decision-making, and investigates the use of networks within the areas. Networks provide an interlinked framework for reaction information (data), are key to assess flows of mass and energy (metrics), and form the basis of optimisation algorithms (decision-making). This work represents the chemical space by reaction networks assembled on large-scale data from Reaxys database. A methodology to identify the key molecules within the chemical supply chain, e.g. strategic molecules, is presented. Molecules are described by features based on their position within the network and an isolation forest outlier detection algorithm is employed to identify the key molecules. To assess pathways within network structures, chemical heuristics with following network optimisation are presented. This work introduces Petri net optimisation for reaction networks and compares the event-discrete modelling approach with the steady-state formalism used in reaction network flux analysis. This work explores a case study of reaction pathway identification from β-pinene to citral within chemical big data. Pathways are modelled in circular interaction with the supply network based on material availabilities and demands and an exergetic description of each reaction pathway is presented. The methodological pipeline automates early-stage sustainability assessment for large data sets. Last but not least, this thesis introduces a teaching approach to familiarize non-experts with network science and the complexity of sustainability problems.
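The excerpts and the thesis abstract describe the same pipeline at a high level: represent reactions as a network with molecules as nodes, describe each molecule by features derived from its position in the network, and flag "strategic" molecules with an isolation forest. A minimal sketch of that pipeline on a toy reaction list (illustrative data, not Reaxys, and not the thesis' feature set):

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy reaction list: (substrates, products). Illustrative only.
reactions = [
    (["A", "B"], ["C"]),
    (["C"], ["D", "E"]),
    (["C", "F"], ["G"]),
    (["E"], ["H"]),
]

# NOC-style graph: molecules as nodes, substrate -> product arcs as edges.
noc = nx.DiGraph()
for substrates, products in reactions:
    for s in substrates:
        for p in products:
            noc.add_edge(s, p)

# Positional features per molecule: in-degree, out-degree, betweenness.
bc = nx.betweenness_centrality(noc)
nodes = list(noc.nodes)
features = np.array([[noc.in_degree(n), noc.out_degree(n), bc[n]] for n in nodes])

# Molecules that are outliers in this feature space are candidate
# "strategic" molecules of the supply network.
flags = IsolationForest(random_state=0).fit_predict(features)
print("strategic candidates:", [n for n, f in zip(nodes, flags) if f == -1])
```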
... Most of the chemical space remains uncharted and identifying its regions of biological relevance is key to medicinal chemistry and chemical biology 1,2 . To explore and catalogue this vast space, scientists have invented a variety of chemical descriptors, which encode physicochemical and structural properties of small molecules. ...
... The current version of the CC is organized in 5 levels of complexity (A: Chemistry, B: Targets, C: Networks, D: Cells and E: Clinics), each of which is divided into 5 sublevels (1-5). In total, the CC is composed of 25 spaces capturing the 2D/3D structures of the molecules, targets and metabolic genes, network properties of the targets, cell response profiles, drug indications and side effects, among others (Figure 1a). ...
Preprint
Full-text available
Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, and unveil higher-order organization in drugs and natural product collections. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.
... Background Chemical space available for the generation of new molecules is huge [1][2][3][4], making the synthesis and testing of all possible compounds impractical. Therefore chemists, both experimental and computational, developed tools and approaches for the exploration of chemical space with the aim to identify new compounds with desirable physico-chemical, biological and pharmacological properties [5][6][7][8][9][10][11][12]. A major in silico method for chemical space exploration is de novo molecular design in which new virtual molecules are assembled from scratch [13][14][15][16][17][18]. ...
Preprint
Full-text available
SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that is used to assign SYBA score contributions to individual fragments based on their frequencies in the database of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, that was utilized as a baseline method, as well as with other two methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, upon the optimization of SAScore threshold (that changes from 6.0 to ~4.5), SAScore yields similar results as SYBA. Because SYBA is based merely on fragment contributions, it can be used for the analysis of the contribution of individual molecular parts to compound synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
... Chemical space available for the generation of new molecules is huge [1][2][3][4], making the synthesis and testing of all possible compounds impractical. Therefore chemists, both experimental and computational, developed tools and approaches for the exploration of chemical space with the aim to identify new compounds with desirable physico-chemical, biological and pharmacological properties [5][6][7][8][9][10][11][12]. A major in silico method for chemical space exploration is de novo molecular design in which new virtual molecules are assembled from scratch [13][14][15][16][17][18]. ...
Article
Full-text available
SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that is used to assign SYBA score contributions to individual fragments based on their frequencies in the database of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, that was utilized as a baseline method, as well as with other two methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, upon the optimization of SAScore threshold (that changes from 6.0 to ~4.5), SAScore yields similar results as SYBA. Because SYBA is based merely on fragment contributions, it can be used for the analysis of the contribution of individual molecular parts to compound synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
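The abstract describes SYBA as a Bernoulli naive Bayes classifier over molecular fragments. A minimal sketch of the same idea with RDKit Morgan fragment bits and scikit-learn; the toy molecules and labels are illustrative assumptions, and this is not the SYBA package or its ZINC15/Nonpher training data:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.naive_bayes import BernoulliNB

def fragment_bits(smiles, n_bits=1024):
    """Binary Morgan-fragment fingerprint for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array([int(fp[i]) for i in range(n_bits)], dtype=np.int8)

# Tiny illustrative training set: 1 = easy-to-synthesize (ES), 0 = hard (HS).
train_smiles = ["CCO", "c1ccccc1", "CC(=O)OC1=CC=CC=C1C(=O)O", "C1CCC2(CC1)CCCCC2"]
labels = [1, 1, 1, 0]

X = np.array([fragment_bits(s) for s in train_smiles])
clf = BernoulliNB().fit(X, labels)

# The per-class log-probabilities of the fragment bits play the role of
# SYBA's per-fragment score contributions.
query = fragment_bits("CCN(CC)CC")
print(clf.predict_log_proba([query]))
```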
... Our results show that the E. coli metabolic network makes use of a wealth of the products of its reactions to start other reactions. This contrasts with the historical trend in wet-lab chemistry reactions, where most of the products are seldom used in further reactions [27]. As the historical study was conducted over single substances, rather than over educts and products, further work on the curvature of wet-lab chemical reactions needs to be done to determine whether the behaviour found for E. coli is also a trend of chemical reactions in general. ...
Article
Hypergraphs serve as models of complex networks that capture more general structures than binary relations. For graphs, a wide array of statistics has been devised to gauge different aspects of their structures. Hypergraphs lack behind in this respect. The Forman–Ricci curvature is a statistics for graphs based on Riemannian geometry, which stresses the relational character of vertices in a network by focusing on the edges rather than on the vertices. Despite many successful applications of this measure to graphs, Forman–Ricci curvature has not been introduced for hypergraphs. Here, we define the Forman–Ricci curvature for directed and undirected hypergraphs such that the curvature for graphs is recovered as a special case. It quantifies the trade-off between hyperedge (arc) size and the degree of participation of hyperedge (arc) vertices in other hyperedges (arcs). Here, we determine upper and lower bounds for Forman–Ricci curvature both for hypergraphs in general and for graphs in particular. The measure is then applied to two large networks: the Wikipedia vote network and the metabolic network of the bacterium Escherichia coli. In the first case, the curvature is governed by the size of the hyperedges, while in the second example, it is dominated by the hyperedge degree. We found that the number of users involved in Wikipedia elections goes hand-in-hand with the participation of experienced users. The curvature values of the metabolic network allowed detecting redundant and bottle neck reactions. It is found that ADP phosphorylation is the metabolic bottle neck reaction but that the reverse reaction is not similarly central for the metabolism. Furthermore, we show the utility of the Forman–Ricci curvature for quantification of assortativity in hypergraphs and illustrate the idea by investigating three metabolic networks.
... Figure 7a shows the number of metabolic reactions with |e_i| reactants and |e_j| products. 90% of chemical reactions have at most three reactants and three products (also observed for the whole Chemical Space (Llanos et al. 2019)), which, according to Eq. 5, indicates that frequent curvature values in Fig. 12 (left) are ruled by the accumulated in- and out-degree. In particular, frequent values of curvature were found to distinguish bottleneck and redundant reactions in the metabolic network (Leal et al. 2018). ...
Article
Full-text available
Relationships in real systems are often not binary, but of a higher order, and therefore cannot be faithfully modelled by graphs, but rather need hypergraphs. In this work, we systematically develop formal tools for analyzing the geometry and the dynamics of hypergraphs. In particular, we show that Ricci curvature concepts, inspired by the corresponding notions of Forman and Ollivier for graphs, are powerful tools for probing the local geometry of hypergraphs. In fact, these two curvature concepts complement each other in the identification of specific connectivity motifs. In order to have a baseline model with which we can compare empirical data, we introduce a random model to generate directed hypergraphs and study properties such as degree of nodes and edge curvature, using numerical simulations. We can then see how our notions of curvature can be used to identify connectivity patterns in the metabolic network of E. coli that clearly deviate from those of our random model. Specifically, by applying hypergraph shuffling to this metabolic network we show that the changes in the wiring of a hypergraph can be detected by Forman Ricci and Ollivier Ricci curvatures.
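The abstract describes the hypergraph Forman-Ricci curvature as a trade-off between arc size and the participation of the arc's vertices in other arcs, and the excerpts note that frequent curvature values are governed by accumulated in- and out-degrees. A toy sketch of that trade-off for a directed hypergraph; the scoring below is a simplified stand-in, not the exact curvature definition (the cited Eq. 5) used in these works:

```python
from collections import defaultdict

# Toy directed hypergraph: each arc is (tail set, head set), e.g. a reaction
# with its substrates and products. Illustrative only.
arcs = [
    ({"A", "B"}, {"C"}),
    ({"C"}, {"D", "E"}),
    ({"C", "F"}, {"G"}),
    ({"E", "G"}, {"H"}),
]

# For every vertex, count the arcs in which it appears as tail or head.
tail_count, head_count = defaultdict(int), defaultdict(int)
for tail, head in arcs:
    for v in tail:
        tail_count[v] += 1
    for w in head:
        head_count[w] += 1

def toy_curvature(arc):
    """Arc size minus the participation of its vertices in *other* arcs:
    a simplified proxy for the Forman-Ricci trade-off described above."""
    tail, head = arc
    size = len(tail) + len(head)
    participation = (sum(tail_count[v] - 1 for v in tail)
                     + sum(head_count[w] - 1 for w in head))
    return size - participation

for arc in arcs:
    print(arc, toy_curvature(arc))
```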
... Background Chemical space available for the generation of new molecules is huge [1][2][3][4], making the synthesis and testing of all possible compounds impractical. Therefore chemists, both experimental and computational, developed tools and approaches for the exploration of chemical space with the aim to identify new compounds with desirable physico-chemical, biological and pharmacological properties [5][6][7][8][9][10][11][12]. A major in silico method for chemical space exploration is de novo molecular design in which new virtual molecules are assembled from scratch [13][14][15][16][17][18]. ...
Preprint
Full-text available
SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that is used to assign SYBA score contributions to individual fragments based on their frequencies in the database of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, that was utilized as a baseline method, as well as with other two methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, upon the optimization of SAScore threshold (that changes from 6.0 to ~4.5), SAScore yields similar results as SYBA. Because SYBA is based merely on fragment contributions, it can be used for the analysis of the contribution of individual molecular parts to compound synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
... If each is mined, how many can genuinely be subjected to the prerequisites mentioned in the previous text to further drug discovery? Drug discovery is already replete with ideas that expand chemical space simply based on the limits of chemistry per se [71,72]. Hence, an outstanding question is whether this could help the concept develop further, or if the concept of directed metabolite mimicry enlarges the chemical space. ...
Article
Significant attrition limits drug discovery. The chemical entities available, which present drug-like features, contribute to this limitation. Using specific examples of promiscuous receptor-ligand interactions, a case is made for expanding the chemical space for drug-like molecules. These ligand-receptor interactions are poor candidates for the drug discovery process. However, provided herein are specific examples of ligand-receptor or transcription-factor interactions, namely the pregnane X receptor (PXR) and the aryl hydrocarbon receptor (AhR), and their interactions with microbial metabolites. Discrete examples of microbial metabolite mimicry are shown to yield more potent and non-toxic therapeutic leads for pathophysiological conditions regulated by PXR and AhR. These examples underscore the opinion that microbial metabolite mimicry of promiscuous ligand-receptor interactions is warranted, and will likely expand the existing chemical space of drugs.
... Keywords synthetic accessibility -Bayesian analysis Background Chemical space available for the generation of new molecules is huge [1][2][3][4], making the synthesis and testing of all possible compounds impractical. Therefore chemists, both experimental and computational, developed tools and approaches for the exploration of chemical space with the aim to identify new compounds with desirable physico-chemical, biological and pharmacological properties [5][6][7][8][9][10][11][12]. A major in silico method for chemical space exploration is de novo molecular design in which new virtual molecules are assembled from scratch [13][14][15][16][17][18]. ...
Preprint
Full-text available
SYBA (SYnthetic Bayesian Accessibility) is a fragment based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). SYBA is based on the Bayesian analysis of the frequency of molecular fragments in the database of ES and HS molecules. It was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, that was utilized as a baseline method, as well as with other two methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, with thresholds optimized by the analysis of ROC curves, SAScore improves considerably and yields similar results as SYBA. Because SYBA is based merely on fragment contributions, it can be used for the analysis of the contribution of individual molecular parts to compound synthetic accessibility. Though SYBA was developed to quickly assess compound synthetic accessibility, its underlying Bayesian framework is a general approach that can be applied to any binary classification problem. Therefore, SYBA can be easily re-trained to classify compounds by other physico-chemical or biological properties. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
... Then, during the past hundred years, students learned general and inorganic chemistry, and later practiced these through Periodic-Table colored glasses, rationalized by atomic structure theory. Thus, modern chemistry developed not only along the lines of easily available and practically useful chemicals, but also with effectively blinkered expectations according to the Periodic Table (Keserü et al., 2014;Pye et al., 2017;Llanos et al., 2019;Restrepo, 2019a,b). Under such circumstances, misunderstandings of the Periodic Table happen easily, and unexpected chemistry is overlooked. ...
Article
Full-text available
The chemical elements are the “conserved principles” or “kernels” of chemistry that are retained when substances are altered. Comprehensive overviews of the chemistry of the elements and their compounds are needed in chemical science. To this end, a graphical display of the chemical properties of the elements, in the form of a Periodic Table, is the helpful tool. Such tables have been designed with the aim of either classifying real chemical substances or emphasizing formal and aesthetic concepts. Simplified, artistic, or economic tables are relevant to educational and cultural fields, while practicing chemists profit more from “chemical tables of chemical elements.” Such tables should incorporate four aspects: (i) typical valence electron configurations of bonded atoms in chemical compounds (instead of the common but chemically atypical ground states of free atoms in physical vacuum); (ii) at least three basic chemical properties (valence number, size, and energy of the valence shells), their joint variation across the elements showing principal and secondary periodicity; (iii) elements in which the (sp)8, (d)10, and (f)14 valence shells become closed and inert under ambient chemical conditions, thereby determining the “fix-points” of chemical periodicity; (iv) peculiar elements at the top and at the bottom of the Periodic Table. While it is essential that Periodic Tables display important trends in element chemistry we need to keep our eyes open for unexpected chemical behavior in ambient, near ambient, or unusual conditions. The combination of experimental data and theoretical insight supports a more nuanced understanding of complex periodic trends and non-periodic phenomena.
... The basic idea of modern drug design is to search for chemical compounds with the desired affinity, potency, and efficacy against the biological target that is relevant to the disease of interest. However, not only do tens of thousands of known chemical compounds exist in nature, but many more artificial chemical compounds are being produced each year [9]. Thus, the modern drug discovery pipeline is focused on narrowing down the scope of the chemical space where good drug candidates are [7,11]. ...
Chapter
Compound toxicity prediction is a very challenging and critical task in the drug discovery and design field. Traditionally, cell- or animal-based experiments are required to confirm the acute oral toxicity of chemical compounds. However, these methods are often restricted by the availability of experimental facilities, long experimentation time, and high cost. In this paper, we propose a novel convolutional neural network regression model, named BESTox, to predict the acute oral toxicity (LD50) of chemical compounds. This model learns the compositional and chemical properties of compounds from their two-dimensional binary matrices. Each matrix encodes the occurrences of certain atom types, number of bonded hydrogens, atom charge, valence, ring, degree, aromaticity, chirality, and hybridization along the SMILES string of a given compound. In a benchmark experiment using a dataset of 7413 observations (train/test 5931/1482), BESTox achieved a squared correlation coefficient (R²) of 0.619, a root-mean-squared error (RMSE) of 0.603, and a mean absolute error (MAE) of 0.433. Despite the use of a shallow model architecture and simple molecular descriptors, our method performs comparably against two recently published models.
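The abstract describes encoding each compound as a two-dimensional binary matrix of per-position atomic properties along its SMILES string. A minimal sketch of a related encoding with RDKit, simplified to per-atom (rather than per-SMILES-character) rows and to a reduced property list; this is an assumed approximation, not the authors' BESTox implementation:

```python
import numpy as np
from rdkit import Chem

SYMBOLS = ["C", "N", "O", "S", "F", "Cl", "Br", "I", "P"]  # assumed atom-type vocabulary

def encode(smiles, max_atoms=60):
    """Binary matrix (max_atoms x n_features) of simple per-atom properties,
    loosely following the property list given in the abstract."""
    mol = Chem.MolFromSmiles(smiles)
    rows = []
    for atom in mol.GetAtoms():
        row = [int(atom.GetSymbol() == s) for s in SYMBOLS]                # atom type
        row += [int(atom.GetTotalNumHs() == h) for h in range(5)]          # bonded hydrogens
        row += [int(atom.GetFormalCharge() == q) for q in (-1, 0, 1)]      # charge
        row += [int(atom.GetTotalValence() == v) for v in range(1, 7)]     # valence
        row += [int(atom.IsInRing()), int(atom.GetIsAromatic())]           # ring / aromaticity
        row += [int(atom.GetDegree() == d) for d in range(1, 5)]           # degree
        rows.append(row)
    mat = np.zeros((max_atoms, len(rows[0])), dtype=np.int8)
    mat[: len(rows)] = rows
    return mat

print(encode("CC(=O)OC1=CC=CC=C1C(=O)O").shape)  # aspirin -> (60, 29)
```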
... Fig. 2a shows the number of metabolic reactions with |e_i| reactants and |e_j| products. 90% of chemical reactions have at most three reactants and three products (also observed for the whole Chemical Space [3]), which, according to equation 1, indicates that frequent curvature values in Fig. 2b are ruled by the accumulated in- and out-degree. In particular, frequent values of curvature were found to distinguish bottleneck and redundant reactions in the metabolic network [1]. ...
Conference Paper
Full-text available
Networks encoding symmetric binary relations between pairs of elements are mathematically represented by (undirected) graphs. Graph theory is a well-developed mathematical subject, but empirical networks are typically less regular and also often much larger than the graphs that are mathematically best understood. Several quantities have therefore been introduced to characterize the large-scale behavior or to identify the most important vertices in empirical networks. As the crucial structure of a graph is, however, given by the set of its edges rather than by its vertices, we should systematically define and evaluate quantities assigned to the edges rather than to the vertices. Curvature is a notion originally introduced in the context of smooth Riemannian manifolds to measure the local or global deviation of a manifold from being Euclidean. Ricci curvature specifically, as a local measure, provides relatively broad information about the structure of positively curved manifolds. Therefore, there have been several attempts to discretize curvature notions to other settings, such as cell complexes, graphs and undirected hypergraphs, in order to obtain similar results. Through these discretizations it has been possible to transfer some of the analytical or topological properties of the original smooth curvatures to these discrete spaces. For the directed hypergraph case, these curvatures were introduced recently and very little is known about their descriptive power. In this paper, we first present the results of our discretizations of the Forman-Ricci and Ollivier-Ricci curvature notions; then we show that they are powerful tools for exploring local properties of directed hypergraph motifs. To conclude, we carry out a curvature-based analysis of the metabolic network of E. coli.
... Reconsider Example 1.1. Chemical compounds are discovered at an exponential rate [28]. Suppose that the patterns are regenerated (Fig. 2(b)) after PubChem added a new group of 6375 compounds called boronic esters which is characterized by the functional group outlined in Fig. 1. ...
... The CoRE MOF database and related collections of MOF materials (14,15) clearly demonstrate that tens of thousands of distinct MOFs have been made, but not even the most optimistic proponents of the versatility of these materials would claim that millions or billions of different materials have been made. For comparison, it has been estimated that worldwide, around 6 × 10⁵ new chemical species per year are reported (26). ...
Article
Full-text available
Finding examples where experimental measurements have been repeated is a powerful strategy for assessing reproducibility of scientific data. Here, we collect quantitative data to assess how often synthesis of a newly reported material is repeated in the scientific literature. We present a simple power-law model for the frequency of repeat syntheses and assess the validity of this model using a specific class of materials, metal-organic frameworks (MOFs). Our data suggest that a power law describes the frequency of repeat synthesis of many MOFs but that a small number of “supermaterials” exist that have been replicated many times more than a power law would predict. Our results also hint that there are many repeat syntheses that have been performed but not reported in the literature, which suggests simple steps that could be taken to greatly increase the number of reports of replicate experiments in materials chemistry.
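The abstract proposes a simple power-law model for how often the synthesis of a reported material is repeated. A minimal sketch of fitting the exponent of such a model to repeat counts, using the standard discrete maximum-likelihood approximation and synthetic, illustrative numbers rather than the MOF dataset:

```python
import numpy as np

# Illustrative repeat-synthesis counts: how many times each material's synthesis
# has been reported (synthetic numbers, not the MOF data from the article).
repeats = np.array([1] * 700 + [2] * 180 + [3] * 60 + [4] * 25 + [5] * 12 + [8] * 4 + [20, 35, 60])

# Maximum-likelihood exponent for a discrete power law p(k) ~ k**(-alpha),
# using the Clauset-Shalizi-Newman approximation with k_min = 1.
k_min = 1
alpha = 1.0 + repeats.size / np.sum(np.log(repeats / (k_min - 0.5)))
print(f"estimated power-law exponent alpha ≈ {alpha:.2f}")

# A handful of "supermaterials" (here the 20-, 35- and 60-repeat entries)
# would sit far above the frequencies such a power law predicts.
```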
... The fundamental strategy in modern drug discovery and development is to identify chemical compounds that potently and selectively modulate the functions of the target molecules to elicit a desired biological response. How to quickly locate these compounds from the vast chemical space and then determine their drug-like properties remains a major challenge [1,2,3]. Traditionally, chemists and biologists perform in vitro and in vivo experiments to test the pharmacodynamics and pharmacokinetic (PD/PK) properties of selected candidates obtained from initial screening results [4,5]. ...
Preprint
Full-text available
As safety is one of the most important properties of drugs, chemical toxicology prediction has received increasing attention in drug discovery research. Traditionally, researchers rely on in vitro and in vivo experiments to test the toxicity of chemical compounds. However, not only are these experiments time consuming and costly, but experiments that involve animal testing are increasingly subject to ethical concerns. While traditional machine learning (ML) methods have been used in the field with some success, the limited availability of annotated toxicity data is the major hurdle for further improving model performance. Inspired by the success of semi-supervised learning (SSL) algorithms, we propose a Graph Convolution Neural Network (GCN) to predict chemical toxicity and trained the network with the Mean Teacher (MT) SSL algorithm. Using the Tox21 data, our optimal SSL-GCN models for predicting the twelve toxicological endpoints achieve an average ROC-AUC score of 0.757 on the test set, which is a 6% improvement over GCN models trained by supervised learning and conventional ML methods. Our SSL-GCN models also exhibit superior performance when compared to models constructed using the built-in DeepChem ML methods. This study demonstrates that SSL can increase the predictive power of models by learning from unannotated data. The optimal unannotated-to-annotated data ratio ranges between 1:1 and 4:1. This study demonstrates the success of SSL in chemical toxicity prediction; the same technique is expected to be beneficial to other chemical property prediction tasks by utilizing existing large chemical databases.
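The Mean Teacher algorithm mentioned in the abstract rests on two ingredients: a teacher model whose weights are an exponential moving average (EMA) of the student's, and a consistency loss between student and teacher predictions on unlabelled inputs. A minimal, framework-free sketch of those two steps (a toy linear "model" stands in for the GCN; this is not the authors' SSL-GCN code):

```python
import numpy as np

rng = np.random.default_rng(0)

def ema_update(teacher, student, decay=0.99):
    """Teacher weights track an exponential moving average of student weights."""
    for name in teacher:
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student[name]

def consistency_loss(student_pred, teacher_pred):
    """Mean squared difference between predictions on the same (unlabelled) inputs."""
    return float(np.mean((student_pred - teacher_pred) ** 2))

# Toy "model": a single weight matrix standing in for the GCN parameters.
student = {"W": rng.normal(size=(8, 2))}
teacher = {name: w.copy() for name, w in student.items()}

x_unlabelled = rng.normal(size=(16, 8))
for step in range(100):
    # A real implementation would take a gradient step on the labelled loss
    # plus the consistency loss here; we only simulate a parameter change.
    student["W"] += 0.01 * rng.normal(size=student["W"].shape)
    ema_update(teacher, student)

loss = consistency_loss(x_unlabelled @ student["W"], x_unlabelled @ teacher["W"])
print(f"consistency loss after 100 steps: {loss:.4f}")
```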
... The exposome is a growing entity. Since 1800, new chemical compounds have been synthesized at a stable 4.4% annual growth rate (Llanos et al. 2019). Still, it is possible to gain additional insights on the unknown exposures with an open-access and crowdsourced catalog. ...
Article
Full-text available
Background: Recent developments in technologies have offered opportunities to measure the exposome with unprecedented accuracy and scale. However, because most investigations have targeted only a few exposures at a time, it is hypothesized that the majority of the environmental determinants of chronic diseases remain unknown. Objectives: We describe a functional exposome concept and explain how it can leverage existing bioassays and high-resolution mass spectrometry for exploratory study. We discuss how such an approach can address well-known barriers to interpret exposures and present a vision of next-generation exposomics. Discussion: The exposome is vast. Instead of trying to capture all exposures, we can reduce the complexity by measuring the functional exposome (the totality of the biologically active exposures relevant to disease development) through coupling biochemical receptor-binding assays with affinity purification-mass spectrometry. We claim the idea of capturing exposures with functional biomolecules opens new opportunities to solve critical problems in exposomics, including low-dose detection, unknown annotations, and complex mixtures of exposures. Although novel, biology-based measurement can make use of the existing data processing and bioinformatics pipelines. The functional exposome concept also complements conventional targeted and untargeted approaches for understanding exposure-disease relationships. Conclusions: Although measurement technology has advanced, critical technological, analytical, and inferential barriers impede the detection of many environmental exposures relevant to chronic-disease etiology. Through biology-driven exposomics, it is possible to simultaneously scale up discovery of these causal environmental factors. https://doi.org/10.1289/EHP8327.
... • Only a small fraction of new chemical compounds are tested for toxicity. More than 20 million substances had been reported as of January 2017, with about a million compounds added annually at an exponentially growing rate [992]; the number of tested chemicals is measured in the thousands. ...
Preprint
Full-text available
Recent decades have seen a rise in the use of physics-inspired or physics-like methods in attempts to resolve diverse societal problems. Such a rise is driven both by physicists venturing outside of their traditional domain of interest and by scientists from other domains who wish to mimic the enormous success of physics throughout the 19th and 20th century. Here, we dub the physics-inspired and physics-like work on societal problems "social physics", and pay our respect to the intellectual mavericks who nurtured the field to its maturity. We do so by comprehensively (but not exhaustively) reviewing the current state of the art. Starting with a set of topics that pertain to the modern way of living and factors that enable humankind's prosperous existence, we discuss urban development and traffic, the functioning of financial markets, cooperation as a basis for civilised life, the structure of (social) networks, and the integration of intelligent machines in such networks. We then shift focus to a set of topics that explore potential threats to humanity. These include criminal behaviour, massive migrations, contagions, environmental problems, and finally climate change. The coverage of each topic is ended with ideas for future progress. Based on the number of ideas laid out, but also on the fact that the field is already too big for an exhaustive review despite our best efforts, we are forced to conclude that the future for social physics is bright. Physicists tackling societal problems are no longer a curiosity, but rather a force to be reckoned with, yet for reckoning to be truly productive, it is necessary to build dialog and mutual understanding with social scientists, environmental scientists, philosophers, and more.
... The fact that the drug-like chemical space [1][2][3] is around 10 60 -10 100 makes the process of finding a compound that simultaneously satisfies the plethora of criteria, such as bioactivity, drug metabolism and pharmacokinetic (DMPK) profile, as well as synthetic accessibility, as difficult as finding a needle in a haystack [4][5][6]. Hence, both medicinal and computational chemists attempted to develop approaches to efficiently explore chemical space to identify the compounds with desirable pharmacological activities as well as AD-MET properties [7][8][9][10][11]. Among these efforts, the virtual-library-based de novo molecule design method represents an important computational paradigm [12][13][14][15]. ...
Article
Full-text available
With the increasing application of deep-learning-based generative models for de novo molecule design, the quantitative estimation of molecular synthetic accessibility (SA) has become a crucial factor for prioritizing the structures generated from generative models. It is also useful for helping in the prioritization of hit/lead compounds and guiding retrosynthesis analysis. In this study, based on the USPTO and Pistachio reaction datasets, a chemical reaction network was constructed for the identification of the shortest reaction paths (SRP) needed to synthesize compounds, and different SRP cut-offs were then used as the threshold to classify an organic compound as either easy-to-synthesize (ES) or hard-to-synthesize (HS). Two synthesis accessibility models (a DNN-ECFP model and a graph-based CMPNN model) were built using deep learning/machine learning algorithms. Compared to other existing synthesis accessibility scoring schemes, such as SYBA, SCScore, and SAScore, our results show that CMPNN (ROC AUC: 0.791) performs better than SYBA (ROC AUC: 0.76), albeit marginally, and outperforms SAScore and SCScore. Our prediction models based on historical reaction knowledge could be a potential tool for estimating molecule SA.
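The abstract's labelling scheme amounts to computing, in a reaction network, the shortest reaction path (SRP) from purchasable starting materials to each compound and thresholding it. A minimal sketch with networkx on a toy network; the molecules, building blocks and cut-off are illustrative assumptions, not the USPTO/Pistachio data:

```python
import networkx as nx

# Toy reaction network: substrate -> product edges. Illustrative only.
edges = [("ethanol", "acetaldehyde"), ("acetaldehyde", "acetic acid"),
         ("benzene", "nitrobenzene"), ("nitrobenzene", "aniline"),
         ("aniline", "acetanilide"), ("acetic acid", "acetanilide")]
net = nx.DiGraph(edges)

building_blocks = {"ethanol", "benzene"}   # assumed purchasable starting materials
srp_cutoff = 2                             # assumed ES/HS threshold, in reaction steps

for compound in net.nodes:
    # Shortest reaction path: fewest steps from any building block to the compound.
    lengths = [nx.shortest_path_length(net, source=b, target=compound)
               for b in building_blocks if nx.has_path(net, b, compound)]
    srp = min(lengths) if lengths else float("inf")
    label = "ES" if srp <= srp_cutoff else "HS"
    print(f"{compound}: SRP = {srp}, {label}")
```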
... • Only a small fraction of new chemical compounds are tested for toxicity. More than 20 million substances had been reported as of January 2017, with about a million compounds added annually at an exponentially growing rate [1004]; the number of tested chemicals is measured in the thousands. ...
Article
Recent decades have seen a rise in the use of physics methods to study different societal phenomena. This development has been due to physicists venturing outside of their traditional domains of interest, but also due to scientists from other disciplines taking from physics the methods that have proven so successful throughout the 19th and the 20th century. Here we characterise the field with the term ‘social physics’ and pay our respect to intellectual mavericks who nurtured it to maturity. We do so by reviewing the current state of the art. Starting with a set of topics that are at the heart of modern human societies, we review research dedicated to urban development and traffic, the functioning of financial markets, cooperation as the basis for our evolutionary success, the structure of social networks, and the integration of intelligent machines into these networks. We then shift our attention to a set of topics that explore potential threats to society. These include criminal behaviour, large-scale migration, epidemics, environmental challenges, and climate change. We end the coverage of each topic with promising directions for future research. Based on this, we conclude that the future for social physics is bright. Physicists studying societal phenomena are no longer a curiosity, but rather a force to be reckoned with. Notwithstanding, it remains of the utmost importance that we continue to foster constructive dialogue and mutual respect at the interfaces of different scientific disciplines.
... The level of complexity and the vast amounts of data within chemistry provide a prime opportunity to achieve significant breakthroughs with the application of AI. First, the types of molecules that can be constructed from atoms are almost unlimited, which leads to an unlimited chemical space 166; the interconnections of these molecules with all possible combinations of factors, such as temperature, substrates, and solvents, are overwhelmingly large, giving rise to an unlimited reaction space. 167 Exploration of the unlimited chemical space and reaction space, and navigation to the optimal ones with the desired properties, is thus practically impossible through human effort alone. ...
Article
Full-text available
Artificial Intelligence (AI) coupled with promising machine learning (ML) techniques well known from computer science is broadly affecting many aspects of various fields, including science and technology, industry, and even our day-to-day life. The ML techniques have been developed to analyze high-throughput data with a view to obtaining useful insights, categorizing, predicting and making evidence-based decisions in novel ways, which will promote the growth of novel applications and fuel the sustainable booming of AI. This paper performs a comprehensive survey of the development and application of AI in different aspects of fundamental sciences, including information science, mathematics, medical science, materials science, geoscience, life science, physics and chemistry. The challenges that each discipline of science meets, and the potentials of AI techniques to handle these challenges, are discussed in detail. Moreover, we shed light on new research trends entailing the integration of AI into each scientific discipline. The goal of this paper is to provide a broad research guideline on fundamental sciences with potential infusion of AI, to help motivate researchers to deeply understand the state-of-the-art applications of AI-based fundamental sciences, and thereby to help promote the continuous development of these fundamental sciences.
... It should be emphasized that of the 18,928 chemicals included in the study, only 3682 (19%) were mentioned in any paper from the 15 included major ecotoxicological journals, and as few as 1118 (5.9%) were mentioned ten or more times (Supplementary Table 2), suggesting that we are still far from covering the chemosphere. However, the number of chemicals has been estimated to have an approximate annual growth rate of ~4% (Llanos et al., 2019), which, if sustained over a 20 year period, becomes an accumulated increase of approximately 200%. This suggests that the ecotoxicological research community may actually be reducing the gap, or at least expanding the knowledge at a rate comparable to the number of new chemicals introduced on the market. ...
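As a rough back-of-the-envelope check (ours, not taken from the cited study): compounding an annual growth rate r over t years multiplies the stock of chemicals by (1 + r)^t, so (1.04)^20 ≈ 2.2 and (1.044)^20 ≈ 2.4, i.e. the number of known chemicals grows to roughly twice to two-and-a-half times its initial size over two decades.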
Article
Full-text available
Environmental policymaking relies heavily on the knowledge of the toxicological properties of chemical pollutants. The ecotoxicological research community is an important contributor to this knowledge, which together with data from standardized tests supports policy-makers in taking the decisions required to reach an appropriate level of protection of the environment. The chemosphere is, however, massive and contains thousands of chemicals that can constitute a risk if present in the environment at sufficiently high concentrations. The scientific ecotoxicological knowledge is growing but it is not clear to what extent the research community manages to cover the large chemical diversity of environmental pollution. In this study, we aimed to provide an overview of the scientific knowledge generated within the field of ecotoxicology during the last twenty years. By using text mining of over 130,000 scientific papers we established time-trends describing the yearly publication frequency of over 3500 chemicals. Our results show that ecotoxicological research is highly focused and that as few as 65 chemicals corresponded to half of all occurrences in the scientific literature. We, furthermore, demonstrate that the last decades have seen substantial changes in research direction, where the interest in pharmaceuticals has grown while the interest in biocides has declined. Several individual chemicals showed an especially rapid increase (e.g. ciprofloxacin, diclofenac) or decrease (e.g. lindane and atrazine) in occurrence in the literature. We also show that university- and corporate-based research exhibit distinct publication patterns and that for some chemicals the scientific knowledge is dominated by publications associated with the industry. This study paints a unique picture and provides quantitative estimates of the scientific knowledge of environmental chemical pollution generated during the last two decades. We conclude that there is a large number of chemicals with little, or no, scientific knowledge and that a continued expansion of the field of ecotoxicology will be necessary to catch up with the constantly increasing diversity of chemicals used within the society.
... The collection of every species reported to date constitutes the so-called Chemical Space (CS). This space currently comprises well over 30 million substances and is growing exponentially [2]. In order to characterize this ever-growing space, chemists seek similarities among substances in the CS based on the way they combine [3]. ...
Conference Paper
Full-text available
The collection of every species reported to date constitutes the so-called Chemical Space (CS). This space currently comprises well over 30 million substances and is growing exponentially [2]. In order to characterize this ever-growing space, chemists seek similarities among substances in the CS based on the way they combine [3]. Mendeleev's work on chemical elements, which was based upon his knowledge of the CS by 1869, is perhaps the most famous example of how the CS determines similarity relations [4]. From a contemporary point of view, Network Theory serves as a natural framework to identify these kinds of relational patterns in the CS [5]. Nowadays, databases such as Reaxys [6] have grown to a point where they can be taken as proxies for the whole CS, opening the possibility of analyzing it from a data-driven perspective. In this work we propose to study the similarity of chemical elements according to the compounds they form. From each compound, we deleted each element to obtain a formula that is connected to the deleted element, e.g. S1/2 O4/2, Na2/1 O4/1 and Na2/4 S1/4 are formulae coming from Na2SO4 (sodium sulfate) where Na, S and O have been deleted, respectively. This forms a bipartite graph of elements and the formulae from which they have been deleted. We built our network using the 26,206,663 compounds recorded in Reaxys up to 2015. Similarity among chemical elements is constructed analogously to Social Network Analysis, where actors are declared similar whenever they are connected to the same set of other actors: the more formulae elements share, the more similar they are. We introduce a new notion of in-betweenness of elements acting as mediators in the similarity relations of others. We analyze the structural features of this network and how they are affected by node removal. We show that the network is both highly dense and redundant. Even though it is heavily centralized, similarity relations are widely spread across a wide range of formulae, which grants the network extraordinary structural resiliency, even against directed attack. We discuss some implications of these results for chemistry.
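The element-deletion construction described in this abstract can be sketched in a few lines of Python; the compound dictionaries below are toy stand-ins for Reaxys records, and the similarity measure is simply the count of shared deleted-element formulae.

from collections import defaultdict

# Toy compounds as {element: count} dictionaries (illustrative stand-ins for Reaxys records).
compounds = {
    "Na2SO4": {"Na": 2, "S": 1, "O": 4},
    "K2SO4":  {"K": 2, "S": 1, "O": 4},
    "Na2CO3": {"Na": 2, "C": 1, "O": 3},
}

def deleted_element_formula(formula, element):
    """Drop `element` and keep the remaining counts relative to its count,
    e.g. deleting Na from Na2SO4 gives the formula S1/2 O4/2."""
    n = formula[element]
    return tuple(sorted((e, c, n) for e, c in formula.items() if e != element))

# Bipartite relation: element -> set of deleted-element formulae it connects to.
element_to_formulae = defaultdict(set)
for formula in compounds.values():
    for element in formula:
        element_to_formulae[element].add(deleted_element_formula(formula, element))

def shared_formulae(e1, e2):
    """Similarity: number of deleted-element formulae shared by two elements."""
    return len(element_to_formulae[e1] & element_to_formulae[e2])

print(shared_formulae("Na", "K"))   # Na and K both connect to S1/2 O4/2 -> prints 1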
Article
The dearomatizing photocycloaddition reaction is a powerful and effective strategy for synthesizing complex, three-dimensional, polycyclic scaffolds from simple aromatic precursors. Generally, the dearomatizing photocycloaddition reaction is promoted by visible light and occurs via an energy transfer (EnT) process. This mini-review provides an overview of recent advances in this area (2018-2020), encompassing both intramolecular and intermolecular transformations. While the majority of the studies are centered on intramolecular processes due to their predictable regio- and stereo-selectivity, intermolecular transformations that show an exceptionally broad substrate scope are beginning to emerge.
Article
Legislative design impedes study of chemicals in the environment
Article
Full-text available
The Periodic Law, one of the great discoveries in human history, is magnificent in the art of chemistry. Different arrangements of chemical elements in differently shaped Periodic Tables serve different purposes. "Can this Periodic Table be derived from quantum chemistry or physics?" can only be answered positively if the internal structure of the Periodic Table is explicitly connected to facts and data from chemistry. Quantum chemical rationalization of such a Periodic Table is achieved by explaining the details of energies and radii of atomic core and valence orbitals in the leading electron configurations of chemically bonded atoms. The coarse horizontal pseudo-periodicity in seven rows of 2, 8, 8, 18, 18, 32, 32 members is triggered by the low energy of and large gap above the 1s and n sp valence shells (2 ≤ n ≤ 6!). The pseudo-periodicity, in particular the wavy variation of the elemental properties in the four longer rows, is due to the different behaviors of the s and p vs. d and f pairs of atomic valence shells along the ordered array of elements. The so-called secondary or vertical periodicity is related to pseudo-periodic changes of the atomic core shells. The Periodic Law of the naturally given System of Elements describes the trends of the many chemical properties displayed inside the Chemical Periodic Tables. While the general physical laws of quantum mechanics form a simple network, their application to the unlimited field of chemical materials under ambient 'human' conditions results in a complex and somewhat accidental structure inside the Table that fits some more or less symmetric outer shape. Periodic Tables designed after some creative concept for the overall appearance are of interest in non-chemical fields of wisdom and art.
Chapter
In this chapter we discuss the spectral theory of discrete structures such as graphs, simplicial complexes and hypergraphs. We focus, in particular, on the corresponding Laplace operators. We present the theoretical foundations, but we also discuss the motivation to model and study real data with these tools.
Article
Full-text available
Every day we are exposed to a cocktail of anthropogenic compounds, many of which are biologically active and capable of inducing negative effects. The simplest way to monitor contaminants in a population is via human biomonitoring (HBM); however, conventional targeted approaches require foreknowledge of chemicals of concern, often have compound-specific extractions and provide information only for those compounds. This study developed an extraction process for human biomarkers of exposure (BoE) in urine that is less compound specific. Combining this with an ultra-high resolution mass spectrometer capable of operating in full scan, and a suspect and non-targeted analysis (SS/NTA) approach, this method provides a more holistic characterization of human exposure. Sample preparation development was based on enzymatically hydrolysed urine spiked with 34 native standards and extracted by solid-phase extraction (SPE). HRMS data were processed with MzMine2, and 80% of the standards were identified in the final data matrix using typical NTA data processing procedures.
Article
In terms of molecules and specific reaction examples reported in the literature, organic chemistry features an impressive, exponential growth. However, new reaction classes/types that fuel this growth are being discovered at a much slower and only linear (or even sublinear) rate. In effect, the proportion of newly discovered reaction types to all reactions being performed keeps decreasing, suggesting that synthetic chemistry is becoming more reliant on reusing well-known methods. On the brighter side, the newly discovered chemistries are more complex than decades ago, and allow for the rapid construction of complex scaffolds in fewer steps. In this paper, we study these and other trends as a function of time, reaction-type "popularity" and complexity, based on an algorithm that extracts generalized reaction class templates. These analyses are useful in the context of computer-assisted synthesis, machine learning (to estimate the numbers of models with sufficient reaction statistics), and also for identifying erroneous entries in reaction databases.
Chapter
Existing and emerging pollutants pose a significant threat to the health and viability of freshwater systems. Reliable monitoring is critical. This chapter compares passive and conventional grab sampling techniques for pollutants such as pharmaceuticals, pesticides, metals, nutrients, and emerging pollutants such as glyphosate and PFASs, highlighting the techniques’ relative advantages, disadvantages, and challenges. For example, spot grab sampling is not suitable for situations where pollutant concentrations are subject to fluctuation whereas passive sampling can provide a time-weighted pollutant concentration. The application of these sampling techniques to current water quality guidelines from different countries is described. Approaches such as non-target analysis and improved techniques such as the use of novel passive sampling sorbents are suggested to address the growing list of emerging freshwater pollutants. To enable wider use of passive samplers there needs to be a better theoretical understanding of how they operate and more calibration data, especially for emerging pollutants.
Chapter
This research is focused on finding the simplest possible agent-based model, called SPECscape (Social Primitives Experimental Cohort), that can demonstrate the emergence of wealth inequality. Agents use a simple north-south-east-west search for the best sugar patch within a 2D grid environment that allows the formation of a proto-institution (a common-pool-resource capability) under certain conditions. A Nearly Orthogonal Latin Hypercube (NOLH) is used to explore the behavior space of the model's dynamics with four distinct sugarscape arrangements and the introduction of exogenous shocks at specified stages of the model's evolution. Our results suggest that proto-institutions and moderate shocks are beneficial for agent members and play an important role in lowering wealth inequality when many institutions are present and in increasing wealth inequality when only a few are allowed to form, indicating that the presence of such institutions has a significant effect on wealth inequality in a society of agents.
Chapter
Full-text available
Chemistry is a human activity that may seem to serve war, but it does far more in the service of peace and the well-being of humanity; everything, of course, depends on people. This chapter gives an interdisciplinary overview of chemistry in war and in peace, while seeking to show how this science develops in the service of humanity.
Article
In combinatorial chemical approaches, optimizing the composition and arrangement of building blocks towards a particular function has been done using a number of methods, including high-throughput molecular screening, molecular evolution and computational prescreening. Here, a different approach is considered that uses sparse measurements of library molecules as the input to a machine learning algorithm which generates a comprehensive, quantitative relationship between covalent molecular structure and function that can then be used to predict the function of any molecule in the possible combinatorial space. To test the feasibility of the approach, a defined combinatorial chemical space consisting of ~10^12 possible linear combinations of 16 different amino acids was used. The binding of a very sparse, but nearly random, sampling of this amino acid sequence space to 9 different protein targets is measured and used to generate a general relationship between peptide sequence and binding for each target. Surprisingly, measuring as little as a few hundred to a few thousand of the ~10^12 possible molecules provides sufficient training to be highly predictive of the binding of the remaining molecules in the combinatorial space. Further, measuring only amino acid sequences that bind weakly to a target allows the accurate prediction of which sequences will bind 10-100 times more strongly. Thus, the molecular recognition information contained in a tiny fraction of molecules in this combinatorial space is sufficient to characterize any set of molecules randomly selected from the entire space, a fact that potentially has significant implications for the design of new chemical function using combinatorial chemical libraries.
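A minimal Python sketch of the sparse-sampling idea (our illustration, not the authors' code): binding values here are generated from a synthetic additive model, and ridge regression stands in for whatever learning algorithm was actually used; only the 16-letter amino-acid alphabet and the "train on a few hundred random sequences, then predict unseen ones" workflow are taken from the abstract.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
alphabet = list("ACDEFGHIKLMNPQRS")   # 16 amino acids, as in the study
L = 10                                # peptide length (illustrative choice)

def one_hot(seq):
    """Flatten a position-by-residue one-hot encoding of a peptide sequence."""
    x = np.zeros((L, len(alphabet)))
    for i, aa in enumerate(seq):
        x[i, alphabet.index(aa)] = 1.0
    return x.ravel()

# Synthetic "ground truth" binding: an additive per-position model (assumption for illustration).
true_w = rng.normal(size=L * len(alphabet))
binding = lambda seq: float(one_hot(seq) @ true_w)
random_seq = lambda: "".join(rng.choice(alphabet, size=L))

# Train on a sparse random sample of the 16**L combinatorial space ...
train_seqs = [random_seq() for _ in range(500)]
model = Ridge(alpha=1.0).fit(np.array([one_hot(s) for s in train_seqs]),
                             np.array([binding(s) for s in train_seqs]))

# ... then predict binding for unseen sequences drawn from the same space.
for s in [random_seq() for _ in range(5)]:
    print(s, round(binding(s), 2), round(float(model.predict([one_hot(s)])[0]), 2))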
Article
Full-text available
As safety is one of the most important properties of drugs, chemical toxicology prediction has received increasing attention in drug discovery research. Traditionally, researchers rely on in vitro and in vivo experiments to test the toxicity of chemical compounds. However, not only are these experiments time-consuming and costly, but experiments that involve animal testing are increasingly subject to ethical concerns. While traditional machine learning (ML) methods have been used in the field with some success, the limited availability of annotated toxicity data is the major hurdle for further improving model performance. Inspired by the success of semi-supervised learning (SSL) algorithms, we propose a Graph Convolution Neural Network (GCN) to predict chemical toxicity and train the network with the Mean Teacher (MT) SSL algorithm. Using the Tox21 data, our optimal SSL-GCN models for predicting the twelve toxicological endpoints achieve an average ROC-AUC score of 0.757 in the test set, which is a 6% improvement over GCN models trained by supervised learning and conventional ML methods. Our SSL-GCN models also exhibit superior performance when compared to models constructed using the built-in DeepChem ML methods. This study demonstrates that SSL can increase the prediction power of models by learning from unannotated data. The optimal unannotated to annotated data ratio ranges between 1:1 and 4:1. This study demonstrates the success of SSL in chemical toxicity prediction; the same technique is expected to be beneficial to other chemical property prediction tasks by utilizing existing large chemical databases. Our optimal model SSL-GCN is hosted on an online server accessible through: https://app.cbbio.online/ssl-gcn/home .
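For readers unfamiliar with the Mean Teacher (MT) algorithm mentioned above, a minimal PyTorch sketch of its update rule follows. The plain feed-forward network, random tensors and hyper-parameters are placeholders of ours (the study itself uses a graph convolution network on Tox21 molecular graphs), and the input noise or augmentation normally applied to student and teacher inputs is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model():
    return nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

student, teacher = make_model(), make_model()
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
ema_decay, consistency_weight = 0.99, 1.0

# Placeholder batches: labelled (x_l, y_l) and unlabelled (x_u) feature vectors.
x_l, y_l = torch.randn(16, 64), torch.randint(0, 2, (16,))
x_u = torch.randn(64, 64)

for step in range(100):
    supervised = F.cross_entropy(student(x_l), y_l)            # loss on annotated data
    with torch.no_grad():
        teacher_prob = F.softmax(teacher(x_u), dim=1)          # teacher predictions on unannotated data
    consistency = F.mse_loss(F.softmax(student(x_u), dim=1), teacher_prob)
    loss = supervised + consistency_weight * consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                                      # teacher = EMA of student weights
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema_decay).add_(sp, alpha=1 - ema_decay)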
Article
This study highlights new opportunities for optimal reaction route selection from large chemical databases brought about by the rapid digitalisation of chemical data. The chemical industry requires a transformation towards more sustainable practices, eliminating its dependencies on fossil fuels and limiting its impact on the environment. However, identifying more sustainable process alternatives is, at present, a cumbersome, manual, iterative process, based on chemical intuition and modelling. We give a perspective on methods for automated discovery and assessment of competitive sustainable reaction routes based on renewable or waste feedstocks. Three key areas of transition are outlined and reviewed based on their state-of-the-art as well as bottlenecks: (i) data, (ii) evaluation metrics, and (iii) decision-making. We elucidate their synergies and interfaces since only together these areas can bring about the most benefit. The field of chemical data intelligence offers the opportunity to identify the inherently more sustainable reaction pathways and to identify opportunities for a circular chemical economy. Our review shows that at present the field of data brings about most bottlenecks, such as data completion and data linkage, but also offers the principal opportunity for advancement.
Article
In the nineteenth century, Meyer and Mendeleev ordered and classified the elements according to their chemical compounds. Now that chemists are synthesizing and characterizing ever more compounds, should a periodic system not eventually look completely different?
Article
Identifying synthetic routes to molecules of interest has been one of the most challenging tasks for synthetic chemists. We have witnessed the gradual adoption of computational tools in solving retrosynthetic design problems for the past 50 years. Especially in the past five years, computer-aided retrosynthesis publications have become more common due to advancements in computing power, data availability, and data-driven algorithms. This paper provides a review of contemporary retrosynthesis methodologies. We define the retrosynthesis framework and describe how machine learning techniques contribute to reaction template extraction and synthetic complexity ranking. We explore template-based and template-free synthetic search strategies and discuss how learning algorithms can prioritize the most applicable transformation rules. We conclude by addressing potential challenges and opportunities facing automated synthetic planning.
Article
Machine learning and artificial intelligence are increasingly being applied to the drug-design process as a result of the development of novel algorithms, growing access, the falling cost of computation and the development of novel technologies for generating chemically and biologically relevant data. There has been recent progress in fields such as molecular de novo generation, synthetic route prediction and, to some extent, property predictions. Despite this, most research in these fields has focused on improving the accuracy of the technologies, rather than on quantifying the uncertainty in the predictions. Uncertainty quantification will become a key component in autonomous decision making and will be crucial for integrating machine learning and chemistry automation to create an autonomous design–make–test–analyse cycle. This review covers the empirical, frequentist and Bayesian approaches to uncertainty quantification, and outlines how they can be used for drug design. We also outline the impact of uncertainty quantification on decision making.
Article
Full-text available
The design of some novel disubstituted 7,8-dihydro-6H-5,8-ethanopyrido[3,2-d]pyrimidine derivatives is reported. The series was developed from quinuclidinone, which afforded versatile platforms bearing one lactam function in position C-2 that were then used to create C-N or C-C bonds via SNAr or palladium-catalyzed cross-coupling reactions by in situ C-O activation. The reaction conditions were optimized under microwave irradiation, and a wide range of amines or boronic acids were used to determine the scope and limitations of each method. To complete this study, the X-ray crystallographic data of 7,8-dihydro-6H-5,8-ethanopyrido[3,2-d]pyrimidine derivative 49 were used to formally establish the structures of the products.
Article
Thousands of organic substances that are used in industrial applications ultimately enter the soil and may negatively affect human health. Limited numbers of target pollutants are usually monitored in environmental media because of analytical limitations. In this study, a non-target screening method for quickly analyzing multiple soil samples from a contaminated area (a chemical industry park) by two-dimensional gas chromatography high-resolution time-of-flight mass spectrometry was developed. The types of compounds present in the soil samples were preliminarily analyzed through data simplification and visual assessment. A total of 81 organic compounds with detection frequencies ≥40% in the samples from the chemical industry park were selected for identification, including 38 PAHs, 26 oxygenated organic compounds, eight N-containing compounds, and nine other compounds. Potential sources of the organic compounds in the industrial park were investigated. Some pharmaceutical and organic synthetic intermediates in the soil were affected by nearby chemical plants. After assessing the relative abundances and detection frequencies, 36 pollutants that may pose potential risks to the environment were preliminarily identified. The results of the study were helpful for assessing environmental risks around Yangkou industrial park and will be helpful when assessing risks in other contaminated areas.
Article
Full-text available
Since the beginning of time, civilizations have looked for more creative ways to dominate and defeat their enemies. The rapid development of the chemical industry just before the Second World War started the era of modern chemical weapon production based on poisons, including toxic arsenic compounds. This paper provides a detailed overview of the production, usage and destruction of this dangerous chemical weapon. Milestones include: (i) the development of knowledge concerning the synthesis and decomposition of toxic warfare agents containing arsenic compounds, (ii) increased awareness of the influence of this poison on human life and the environment, (iii) the development of modern technology for the destruction of chemical weapons, (iv) implementation of legislation which prohibits the use of chemical weapons in combat, and (v) the development of analytical methods to detect arsenic compounds used in warfare in the environment. The article covers events before World War I and then focuses on World War II, the Vietnam War and the two Gulf Wars. It further details the development of specific arsenical chemical weapons (e.g. Lewisite, Clark I, Clark II, Adamsite), as well as some agents used as herbicides, like Agent Blue. Special attention is paid to the disarmament period and the challenges of implementing a world-wide plan to destroy chemical weapon stockpiles.
Article
Full-text available
Computational methods and perspectives can transform the history of science by enabling the pursuit of novel types of questions, dramatically expanding the scale of analysis (geographically and temporally), and offering novel forms of publication that greatly enhance access and transparency. This essay presents a brief summary of a computational research system for the history of science, discussing its implications for research, education, and publication practices and its connections to the open-access movement and similar transformations in the natural and social sciences that emphasize big data. It also argues that computational approaches help to reconnect the history of science to individual scientific disciplines.
Article
Organic synthesis has been continuously evolving to higher levels of sophistication and into new domains ever since its emergence in the early part of the nineteenth century. Its impressive growth over the last two centuries parallels its enormous impact on science and society. Modern medicine, the dye industry, aromas and cosmetics, vitamins and nutritional goods, polymers and plastics, energy fuels and high-tech materials are some of its direct benefits that shaped the world as we know it today. As an enabling science and technology organic synthesis has also impacted society indirectly by facilitating the birth and sustainability of other disciplines such as medicinal and process chemistry, chemical biology and biotechnology, physics and biology, and materials science and nanotechnology. It is, therefore, of paramount importance to champion and support the continuous advancement of the art and science of organic synthesis – both method development and total synthesis – for the magnitude of its impact on science and society is measured by its condition at any given time. As sharp as it is currently, its present state begs for more to be achieved in the future, for a comparison of its capabilities with those of Nature leaves us with awe for the latter and much to be desired for the former.
Chapter
The article contains sections titled: 1. Introduction; 2. Physical Properties; 3. Chemical Properties; 4. Occurrence; 5. Production; 6. Quality Specifications, Transportation, and Storage; 7. Uses; 8. Economic Aspects; 9. Inorganic Iodine Compounds; 10. Organic Iodine Compounds; 11. Toxicology and Occupational Health.
Chapter
The article contains sections titled: 1. Introduction; 2. The Copper Ions; 3. Copper Oxides and Basic Copper Compounds (3.1 Copper(I) Oxide; 3.2 Copper(II) Oxide; 3.3 Copper(II) Hydroxide; 3.4 Copper(II) Carbonate Hydroxide); 4. Selected Copper Salts and Basic Copper Salts (4.1 Copper(II) Acetate; 4.2 Copper(I) Chloride; 4.3 Copper(II) Chloride; 4.4 Copper(II) Oxychloride; 4.5 Copper(II) Sulfates: 4.5.1 Copper(II) Sulfate Pentahydrate; 4.5.2 Anhydrous Copper Sulfate; 4.5.3 Copper(II) Sulfate Monohydrate; 4.5.4 Basic Copper(II) Sulfates); 5. Other Copper Compounds and Complexes (5.1 Other Copper Compounds; 5.2 Copper Complexes); 6. Copper Reclamation; 7. Copper and the Environment; 8. Economic Aspects; 9. Toxicology and Occupational Health.
Chapter
An account is given of the history leading to the launch of the chemical information system Reaxys in 2009, its subsequent development until 2014, and outlook for the future. The path leading from the print form of the two major chemical Handbooks of the 19th century through the building of online databases of the late 1980s and the client/server system of CrossFire (1993-2009) is discussed with particular emphasis placed on the importance of technological development in creating user needs that in turn require an ongoing overhaul of the same technology to better serve the market. The evolution of the Gmelin and Beilstein Handbooks from property-centered collections of chemical structures into the premium data sources of chemical information in CrossFire in the early years of the 21st century is one excellent example of this phenomenon, and the subsequent development of CrossFire and its databases to Reaxys is shown as a second inevitable consequence of the same driver. The account closes with a description of some currently evolving trends and the first steps taken in Reaxys to continue this tradition of innovation.
Article
How do skilled synthetic chemists develop good intuitive expertise? Why can we only access such a small amount of the available chemical space, both in terms of the reactions used and the chemical scaffolds we make? We argue here that these seemingly unrelated questions have a common root and are strongly interdependent. We performed a comprehensive analysis of organic reaction parameters dating back to 1771 and discovered that there are several anthropogenic factors that limit reaction parameters and thus the scope of synthetic chemistry. Nevertheless, many of the anthropogenic limitations such as narrow parameter space and the opportunity for rapid and clear feedback on the progress of reactions appear to be crucial for the acquisition of valid and reliable chemical intuition. In parallel, however, all of these same factors represent limitations for the exploration of available chemistry space and we argue that these are thus at least partly responsible for limited access to new chemistries. We advocate, therefore, that the present anthropogenic boundaries can be expanded by a more conscious exploration of "off-road" chemistry that would also extend the intuitive knowledge of trained chemists.
Article
The method of minimum norm quadratic unbiased estimation (MINQUE) is applied to the estimation of the variances σ²_i in the following problems: (1) combining k independent estimators ȳ_i (i = 1, …, k) of a parameter μ, where ȳ_i is the arithmetic mean of n_i (≥ 1) observations y_ij (j = 1, …, n_i) normally and independently distributed with mean μ and variance σ²_i; (2) estimating the parameters α and β in a linear regression model with replicates: y_ij = α + βx_i + e_ij, where the e_ij are normally and independently distributed with mean 0 and variance σ²_i. The variance-covariance matrix of the MINQU estimators σ̃²_i is derived and, using the trace and determinant criteria, the gains in efficiency over the customary estimators s²_i = (n_i − 1)⁻¹ Σ_j (y_ij − ȳ_i)² are tabulated. A simple modification of the σ̃²_i, which is always positive and analytically tractable, performed better than the σ̃²_i, especially when k is small. The MINQU estimator of the variance of the ordinary least squares (OLS) estimator of μ (or α, β) is obtained, and the gains in efficiency over those estimators employing the s²_i are evaluated. The MINQU estimators lead to substantial gains in efficiency when m is small and k is relatively large. Explicit expressions for the σ̃²_i, which can be computed for any value of k, are given.
Article
Some recent philosophers of science have argued that chemistry in the nineteenth century “largely lacked theoretical foundations, and showed little progress in supplying such foundations” until around 1900, or even later. In particular, nineteenth-century atomic theory, it is said, “played no useful part” in the crowning achievement of nineteenth-century chemistry, the powerful subdiscipline of organic chemistry. This paper offers a contrary view. The idea that chemistry only gained useful theoretical foundations when it began to merge with physics, it will be argued, is based on an implicit conception of scientific theory that is too narrow, and too exclusively oriented to the science of physics. A broader understanding of scientific theory, and one that is more appropriate to the science of chemistry, reveals the essential part that theory played in the development of chemistry in the nineteenth century. It also offers implications for our understanding of the nature of chemical theory today.
Article
At the core of chemistry lie the structure of the molecule, the art of its synthesis, and the design of function within it. This Essay traces the understanding of the structure of the molecule, the emergence of organic synthesis, and the art of total synthesis from the 19th century to the present day.
Article
Reactions have been studied between uranium hexafluoride and a series of lower fluorides of other elements. The study has also included reaction with a wide range of covalent chlorides. The reactivity of uranium hexafluoride is compared with that of the higher fluorides of d-transition elements, chromium, molybdenum, and tungsten, and considered in the light of uranium as an f-transition element.
Article
The number of chemical substances is considered a cumulative measure of the cognitive growth of preparative chemistry. During the past 200 years there has been approximately exponential growth without saturation. Separate analysis of organic and inorganic chemistry suggests at least a two-phase model for each. Detailed discussion of the results (considering also the growth in the numbers of chemists, chemical papers, patents, and chemical elements) reveals that an external (socio-economic) explanation is insufficient. Instead, an internal (methodological) approach is suggested to explain the exponential growth as well as the balancing phenomena in war and post-war times.
Article
Organic syntheses reported between 1850 and 2004 are analyzed at the simplified level of a connected network (see the image of the network for 1850). Fundamental statistical laws governing organic syntheses are established. These laws make it possible to estimate the preparative and industrial usefulness of organic molecules.
Article
When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them.
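For concreteness (standard textbook form, not quoted from this abstract): a quantity x is said to follow a power law when its probability density satisfies p(x) ∝ x^(−α) for some exponent α > 1, so that doubling x makes an observation 2^α times less likely.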