A Network model of the Chemical Space provides similarity structure to the system of chemical elements

Eugenio Llanos1,2,3, Wilmer Leal1,2, Andrés Bernal2,4, Guillermo Restrepo2, Jürgen Jost2, and Peter F. Stadler1,2,5
1Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
ellanos@sciocorp.org
2Max Planck Institute for Mathematics in the Sciences, Inselstraße 22,
04103 Leipzig, Germany
3Corporación SCIO, Calle 57b 50-50 bloque d22 of. 412, 111321 Bogota, Colombia
4Department of Basic Sciences, Universidad Jorge Tadeo Lozano, 110311 Bogota, Colombia
5The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, New Mexico 87501
1 Introduction
The collection of every chemical species reported to date constitutes the so-called Chemical Space (CS). This space currently comprises well over 30 million substances and is growing exponentially [2]. To characterize this ever-growing space, chemists look for similarities among substances in the CS based on the way they combine [3]. Mendeleev's work on the chemical elements, which rested on his knowledge of the CS as of 1869, is perhaps the most famous example of how the CS determines similarity relations [4]. From a contemporary point of view, network theory is a natural framework for identifying these kinds of relational patterns in the CS [5]. Nowadays, databases such as Reaxys©⁶ have grown to a point where they can be taken as proxies for the whole CS, opening the possibility of analyzing it from a data-driven perspective.
In this work we study the similarity of chemical elements according to the compounds they form. From each compound, we delete each of its elements in turn, obtaining a formula that is linked to the deleted element; the remaining subscripts are divided by the subscript of the deleted element. For example, $\mathrm{S}_{1/2}\mathrm{O}_{4/2}$, $\mathrm{Na}_{2/1}\mathrm{O}_{4/1}$ and $\mathrm{Na}_{2/4}\mathrm{S}_{1/4}$ are the formulae obtained from $\mathrm{Na_2SO_4}$ (sodium sulfate) when Na, S and O are deleted, respectively. This yields a bipartite graph whose two node sets are the elements and the formulae, with each element linked to every formula obtained by deleting it. We built our network from 26,206,663 compounds recorded in Reaxys up to 2015. Similarity among chemical elements is defined by analogy with social network analysis, where actors are deemed similar whenever they are connected to the same set of other actors: the more formulae two elements share, the more similar they are. We also introduce a notion of in-betweenness for elements that act as mediators of the similarity relations among other elements. We analyze the structural features of this network and how they are affected by node removal. We show that the network is both highly dense and redundant. Even though it is heavily centralized, similarity relations are spread across a wide range of formulae, which makes the network's structure remarkably resilient, even against directed attacks. We discuss some implications of these results for chemistry.

⁶ Copyright 2019 Elsevier Limited except certain content provided by third parties. Reaxys is a trademark of Elsevier Limited. Reaxys data were made accessible to our research project via the Elsevier R&D Collaboration Network.
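The formula-deletion step and the shared-formula similarity can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' pipeline: compounds are given as dictionaries of element counts, the helper names `deleted_formula`, `build_bipartite` and `shared_formulae` are hypothetical, normalized coefficients are compared as exact fractions (so the O coefficient written 4/2 in the sodium sulfate example is stored as 2), and the empty remainder of a pure element is discarded.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def deleted_formula(compound, element):
    """Delete `element` from `compound` and divide the remaining
    coefficients by the deleted element's coefficient.

    `compound` maps element symbols to coefficients, e.g. Na2SO4 is
    {"Na": 2, "S": 1, "O": 4}. The result is a sorted tuple of
    (element, Fraction) pairs, so it can serve as a hashable graph node.
    """
    n = compound[element]
    return tuple(sorted(
        (e, Fraction(c, n)) for e, c in compound.items() if e != element
    ))


def build_bipartite(compounds):
    """Map each element to the set of formulae obtained by deleting it."""
    element_to_formulae = defaultdict(set)
    for compound in compounds:
        for element in compound:
            formula = deleted_formula(compound, element)
            if formula:  # assumption: ignore the empty remainder of pure elements
                element_to_formulae[element].add(formula)
    return dict(element_to_formulae)


def shared_formulae(element_to_formulae):
    """Similarity of each element pair = number of formulae both are linked to."""
    sim = {}
    for a, b in combinations(sorted(element_to_formulae), 2):
        shared = len(element_to_formulae[a] & element_to_formulae[b])
        if shared:
            sim[(a, b)] = shared
    return sim


compounds = [
    {"Na": 2, "S": 1, "O": 4},  # Na2SO4
    {"K": 2, "S": 1, "O": 4},   # K2SO4
    {"Na": 1, "Cl": 1},         # NaCl
    {"K": 1, "Cl": 1},          # KCl
]
print(deleted_formula(compounds[0], "Na"))  # (('O', Fraction(2, 1)), ('S', Fraction(1, 2)))
print(shared_formulae(build_bipartite(compounds)))  # {('K', 'Na'): 2}
```

Here Na and K share two formulae (the remainders of their sulfates and of their chlorides), so their similarity is 2; this count is exactly the number of length-2 paths between the two element nodes in the bipartite graph.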
2 Results
The network is heavily centralized: the chemical reactivity of the elements is far from uniform, as the degree distribution of the elements exhibits three distinct regions (Figure 1(b)). The first consists of a few elements that concentrate the vast majority of relations (H comes first, accounting for 95.9% of formulae, followed by C with 95.0%). The second comprises the bulk of the elements, each connected to 10,000 to 100,000 formulae. The third corresponds to elements with very few molecular formulae.
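One way to read the degree figures above is sketched below, assuming that an element's share of formulae is its degree divided by the total number of formula nodes; the paper does not spell out this normalization, and `degree_profile` and the toy data are hypothetical.

```python
def degree_profile(element_to_formulae):
    """Return (element, degree, share of formula nodes) rows, highest degree first.

    `element_to_formulae` maps each element to the set of formulae obtained by
    deleting it; the degree of an element is the size of that set.
    """
    all_formulae = set().union(*element_to_formulae.values())
    total = len(all_formulae)
    rows = [(e, len(fs), len(fs) / total) for e, fs in element_to_formulae.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy network with opaque formula labels (hypothetical data).
toy = {
    "H": {"f1", "f2", "f3", "f4"},
    "C": {"f1", "f2", "f3"},
    "Na": {"f2"},
}
for element, degree, share in degree_profile(toy):
    print(f"{element}: degree {degree}, {share:.1%} of formulae")
```

Sorting by degree makes the three regions visible as a steep head, a broad middle and a thin tail when the rows are plotted on a logarithmic degree axis, as in Figure 1(b).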
Formulae with degree one are mostly connected to central elements: a formula of degree one is linked to a single element, i.e. no other element yields it upon deletion, so it reflects compounds unique to that element (singularities). Eight elements concentrate 90% of the singularities, which evokes both the singularity principle of the periodic system and the distinction between organic and inorganic chemistry. In general, the number of singularities scales super-linearly with element degree (power law with exponent 1.3, see Figure 1(c)). This result shows that elements become more singular as more of their compounds are obtained, independently of their identity: the more compounds an element has, the less similar to others it becomes.
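The singularity count and the power-law fit can be sketched as follows. We assume, for illustration only, that the exponent is estimated by ordinary least squares on log-transformed data; the paper does not state its fitting procedure, and the function names `singularities` and `fit_power_law` are hypothetical.

```python
import math
from collections import Counter


def singularities(element_to_formulae):
    """Per element, count the formulae of degree one (linked to that element only)."""
    # degree of a formula = number of elements whose deletion produced it
    formula_degree = Counter(f for fs in element_to_formulae.values() for f in fs)
    return {
        e: sum(1 for f in fs if formula_degree[f] == 1)
        for e, fs in element_to_formulae.items()
    }


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log-log axes and return (a, b).

    Points with non-positive x or y are skipped, since their logarithm is undefined.
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points")
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


# Synthetic check: singularity counts generated from an exact power law.
degrees = [10, 100, 1000, 10000]
counts = [0.02 * d ** 1.3 for d in degrees]
a, b = fit_power_law(degrees, counts)
print(f"a = {a:.4f}, b = {b:.2f}")  # a = 0.0200, b = 1.30
```

An exponent b above 1, as reported here, means that singularities grow faster than degree, which is what "super-linear" expresses.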
Similarity does not partition the space into clear-cut classes of elements: since formulae generate similarity relations among the elements connected to them, the degree of a formula equals the number of elements it makes similar. The smoothness of this degree distribution (Figure 1(a)) shows that the elements cannot be divided into clear-cut classes; otherwise, such classes would produce local maxima at the degrees matching their sizes. This result has an interesting chemical implication, as it challenges the usual view of the elements as separate families.
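The argument about local maxima can be made concrete with a small check on the formula degree histogram. This is a hypothetical sketch: the names `formula_degree_histogram` and `interior_local_maxima` are ours, the strict comparison with both neighboring degrees is our own simplification, and real counts would likely need smoothing before such bumps could be trusted.

```python
from collections import Counter


def formula_degree_histogram(element_to_formulae):
    """Map each formula degree (number of linked elements) to its frequency."""
    degree = Counter(f for fs in element_to_formulae.values() for f in fs)
    return Counter(degree.values())


def interior_local_maxima(hist):
    """Degrees whose frequency strictly exceeds that of both neighboring degrees.

    The smallest degree is skipped, since a decreasing distribution always
    peaks there; missing degrees count as frequency zero.
    """
    if not hist:
        return []
    lowest = min(hist)
    return [
        k for k in sorted(hist)
        if k > lowest and hist[k] > hist.get(k - 1, 0) and hist[k] > hist.get(k + 1, 0)
    ]


smooth = Counter({1: 1000, 2: 400, 3: 150, 4: 60, 5: 20})
bumpy = Counter({1: 1000, 2: 400, 3: 150, 4: 300, 5: 20})
print(interior_local_maxima(smooth))  # []
print(interior_local_maxima(bumpy))   # [4]
```

In the bumpy histogram, a family of four elements sharing many formulae would show up as the excess at degree 4; the smooth curve of Figure 1(a) shows no such bump.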
Element in-betweenness depends on element degree: elements act as mediators of similarity relations through the formulae they constitute. Such mediation scales almost linearly with the degree of the element (exponent 1.03, see Figure 1(c)). This is a notable feature, since it shows that similarity relations are neither concentrated in certain kinds of compounds nor carried by specific mediating elements, but are evident across the entire CS.
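The paper does not give a formula for in-betweenness, so the sketch below encodes one possible reading and should be taken as an assumption: a formula linked to k elements creates C(k, 2) similarity pairs, and those pairs are credited to every element occurring inside the formula (for example, S and O mediate the Na-K similarity through the sulfate remainder). The function name `in_betweenness` is hypothetical.

```python
from collections import defaultdict
from math import comb


def in_betweenness(element_to_formulae):
    """Hypothetical in-betweenness: similarity pairs mediated by each element.

    Formulae are tuples of (element, coefficient) pairs. A formula linked to k
    deleted elements creates comb(k, 2) similarity pairs; we credit those pairs
    to every element that occurs inside the formula.
    """
    linked = defaultdict(set)  # formula -> elements whose deletion produced it
    for element, formulae in element_to_formulae.items():
        for f in formulae:
            linked[f].add(element)
    score = defaultdict(int)
    for f, elements in linked.items():
        pairs = comb(len(elements), 2)
        for mediator, _coef in f:
            score[mediator] += pairs
    return dict(score)


so4 = (("O", 2), ("S", 0.5))  # remainder of Na2SO4 and of K2SO4
cl = (("Cl", 1),)             # remainder of NaCl and of KCl
toy = {"Na": {so4, cl}, "K": {so4, cl}}
print(in_betweenness(toy))    # O, S and Cl each mediate the single Na-K pair
```

Under this reading, an element's in-betweenness grows with the number of formulae it appears in, which is consistent in spirit with the near-linear scaling reported above, though the sketch does not reproduce that result.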
Similarity relations are highly resilient to directed attack: since the network is highly centralized, deleting random elements should not have a major effect on its topology. We instead removed the 12 highest-degree elements one at a time, in decreasing order of degree, together with the formulae in which they take part. Deleting central elements lowers the degrees of the remaining elements, shifting the distribution down in absolute frequency. Nevertheless, almost all elements are affected in the same way and the shape of the curve is preserved (see the different data series in Figure 1(b)). The same holds for the formula degree distribution, which is shifted to the left while keeping its shape (Figure 1(a)).
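The directed-attack procedure can be sketched as below. The removal rule (drop the current highest-degree element and every formula containing it, then recompute degrees) follows the description above; breaking ties by dictionary order and tracking only element degrees are our own simplifications, and `directed_attack` is a hypothetical name.

```python
def directed_attack(element_to_formulae, k):
    """Remove the k highest-degree elements one at a time, recording degrees.

    Formulae are tuples of (element, coefficient) pairs. Each removal deletes
    the element node and every formula in which that element takes part.
    Returns one {element: degree} dict per stage; stage 0 is the intact network.
    """
    current = {e: set(fs) for e, fs in element_to_formulae.items()}
    stages = [{e: len(fs) for e, fs in current.items()}]
    for _ in range(k):
        if not current:
            break
        # ties go to the element that comes first in dictionary order
        target = max(current, key=lambda e: len(current[e]))
        del current[target]
        for e in current:
            current[e] = {f for f in current[e] if target not in dict(f)}
        stages.append({e: len(fs) for e, fs in current.items()})
    return stages


toy = {
    "Na": {(("O", 2), ("S", 0.5)), (("Cl", 1),), (("H", 1),)},  # Na2SO4, NaCl, NaH
    "K": {(("O", 2), ("S", 0.5)), (("Cl", 1),)},                # K2SO4, KCl
    "H": {(("Na", 1),), (("Cl", 1),)},                          # NaH, HCl
}
for stage, degrees in enumerate(directed_attack(toy, 2)):
    print(stage, degrees)
```

Comparing the sorted degree lists across stages is then a direct way to check whether the curve keeps its shape while shifting down.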
Strong and weak similarity relations are the least variable: since our network is epistemic in nature, its vulnerability can be related to the feasibility of extracting knowledge from limited information. To test how much the similarity relations vary under removal of molecular formulae, we calculated the variance of the rank of pairwise element similarity (the number of length-2 paths between the corresponding nodes) when keeping only the similarities mediated by each element. Surprisingly, strong and weak similarities have the lowest variance (Figure 1(d)), showing that similarities are by no means random but form a robust structure that holds across the entire CS, which points to the fundamental character of these similarity patterns.
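A sketch of the rank-variance computation follows, under several assumptions of ours that the paper does not confirm: the similarity mediated by element m counts the shared formulae that contain m, each element ranks all others by decreasing similarity (rank 1 is most similar), and ties are broken alphabetically. The function name `rank_variance` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def rank_variance(element_to_formulae):
    """Return {(a, b): (mean rank, variance of rank)} of b in a's similarity
    ranking, where a separate ranking is built for every mediator element m.

    Formulae are tuples of (element, coefficient) pairs.
    """
    elements = sorted(element_to_formulae)
    mediators = sorted(
        {e for fs in element_to_formulae.values() for f in fs for e, _ in f}
    )
    ranks = defaultdict(list)  # (a, b) -> one rank per mediator
    for m in mediators:
        # keep only the formulae in which the mediator m takes part
        kept = {e: {f for f in fs if m in dict(f)} for e, fs in element_to_formulae.items()}
        for a in elements:
            # rank 1 = most formulae shared with a; ties broken alphabetically
            others = sorted(
                (b for b in elements if b != a),
                key=lambda b: (-len(kept[a] & kept[b]), b),
            )
            for position, b in enumerate(others, start=1):
                ranks[(a, b)].append(position)
    return {pair: (mean(r), pvariance(r)) for pair, r in ranks.items()}


so4 = (("O", 2), ("S", 0.5))  # remainder of Na2SO4 and of K2SO4
cl = (("Cl", 1),)             # remainder of NaCl and of KCl
toy = {"Na": {so4, cl}, "K": {so4, cl}, "Ca": {(("O", 1),)}}  # Ca: from CaO
result = rank_variance(toy)
print(result[("Na", "K")])  # K is Na's closest element under every mediator
```

Pairs whose rank stays fixed across mediators get zero variance; plotting variance against mean rank for all pairs gives a chart of the kind shown in Figure 1(d).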
[Figure 1: (a) Frequency vs. formula degree, logarithmic frequency axis. (b) Degree vs. element, logarithmic degree axis. (c) In-betweenness and singularities vs. element degree, log-log axes; fitted power laws: in-betweenness ≈ 0.107x^1.03 (R² = 0.99), singularity ≈ 0.0172x^1.3 (R² = 0.98). (d) Variance vs. average rank.]
Fig. 1. (a) Distribution of formula degree. Different colors of points correspond to series in which different central elements have been removed. (b) Degrees of elements. (c) Singularities and in-betweenness vs. element degree. (d) Variance of pairwise rank position vs. average rank position. Low variance is found at low average rank positions (similar elements) and at high average rank positions (dissimilar elements).
References
1. Schummer, J.: Scientometric studies on chemistry II: Aims and methods of producing new chemical substances. Scientometrics 39 (1), 125–140 (1997)
2. Llanos, E., Leal, W., Luu, D., Jost, J., Stadler, P.F., Restrepo, G.: Exploration of the chemical space and its three historical regimes. Proceedings of the National Academy of Sciences of the United States of America 116 (26), 12660–12665 (2019). https://doi.org/10.1073/pnas.1816039116
3. Schummer, J.: The chemical core of chemistry I: a conceptual approach. HYLE–International Journal for Philosophy of Chemistry 4 (2), 129–162 (1998)
4. Leal, W., Llanos, E., Stadler, P.F., Jost, J., Restrepo, G.: The Chemical Space from Which the Periodic System Arose. ChemRxiv, doi:10.26434/chemrxiv.9698888.v1 (2019)
5. Leal, W., Restrepo, G., Bernal, A.: A network study of chemical elements: from binary compounds to chemical trends. MATCH Communications in Mathematical and in Computer Chemistry 68, 417–442 (2012)