Accelerated Discovery of Sustainable Building Materials

Preprints and early-stage research may not have been peer reviewed yet.
To read the file of this research, you can request a copy directly from the authors.


Concrete is the most widely used engineered material in the world with more than 10 billion tons produced annually. Unfortunately, with that scale comes a significant burden in terms of energy, water, and release of greenhouse gases and other pollutants. As such, there is interest in creating concrete formulas that minimize this environmental burden, while satisfying engineering performance requirements. Recent advances in artificial intelligence have enabled machines to generate highly plausible artifacts, such as images of realistic looking faces. Semi-supervised generative models allow generation of artifacts with specific, desired characteristics. In this work, we use Conditional Variational Autoencoders (CVAE), a type of semi-supervised generative model, to discover concrete formulas with desired properties. Our model is trained using open data from the UCI Machine Learning Repository joined with environmental impact data computed using a web-based tool. We demonstrate CVAEs can design concrete formulas with lower emissions and natural resource usage while meeting design requirements. To ensure fair comparison between extant and generated formulas, we also train regression models to predict the environmental impacts and strength of discovered formulas. With these results, a construction engineer may create a formula that meets structural needs and best addresses local environmental concerns.

No file available

Request Full-text Paper PDF

To read the file of this research,
you can request a copy directly from the authors.

ResearchGate has not been able to resolve any citations for this publication.
Full-text available
We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This generative model allows efficient search and optimization through open-ended spaces of chemical compounds. We train deep neural networks on hundreds of thousands of existing chemical structures to construct two coupled functions: an encoder and a decoder. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to the discrete representation from this latent space. Continuous representations allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the design of drug-like molecules as well as organic light-emitting diodes.
Full-text available
We compare several adaptive design strategies using a data set of 223 M2AX family of compounds for which the elastic properties [bulk (B), shear (G), and Young’s (E) modulus] have been computed using density functional theory. The design strategies are decomposed into an iterative loop with two main steps: machine learning is used to train a regressor that predicts elastic properties in terms of elementary orbital radii of the individual components of the materials; and a selector uses these predictions and their uncertainties to choose the next material to investigate. The ultimate goal is to obtain a material with desired elastic properties in as few iterations as possible. We examine how the choice of data set size, regressor and selector impact the design. We find that selectors that use information about the prediction uncertainty outperform those that don’t. Our work is a step in illustrating how adaptive design tools can guide the search for new materials with desired properties.
Full-text available
This paper investigates a problem of generating images from visual attributes. Given the prevalent research for image recognition, the conditional image generation problem is relatively under-explored due to the challenges of learning a good generative model and handling rendering uncertainties in images. To address this, we propose a variety of attribute-conditioned deep variational auto-encoders that enjoy both effective representation learning and Bayesian modeling, from which images can be generated from specified attributes and sampled latent factors. We experiment with natural face images and demonstrate that the proposed models are capable of generating realistic faces with diverse appearance. We further evaluate the proposed models by performing attribute-conditioned image progression, transfer and retrieval. In particular, our generation method achieves superior performance in the retrieval experiment against traditional nearest-neighbor-based methods both qualitatively and quantitatively.
Full-text available
Several studies independently have shown that concrete strength development is determined not only by the water-to-cement ratio, but that it also is influenced by the content of other concrete ingredients. High-performance concrete is a highly complex material, which makes modeling its behavior a very difficult task. This paper is aimed at demonstrating the possibilities of adapting artificial neural networks (ANN) to predict the compressive strength of high-performance concrete. A set of trial batches of HPC was produced in the laboratory and demonstrated satisfactory experimental results. This study led to the following conclusions: 1) A strength model based on ANN is more accurate than a model based on regression analysis; and 2) It is convenient and easy to use ANN models for numerical experiments to review the effects of the proportions of each variable on the concrete mix.
Conference Paper
This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. Therefore, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.
In the era of large scientific data sets, there is an urgent need for methods to automatically prioritize data for review. At the same time, for any automated method to be adopted by scientists, it must make decisions that they can understand and trust. In this paper, we propose Discovery through Eigenbasis Modeling of Uninteresting Data (DEMUD), which uses principal components modeling and reconstruction error to prioritize data. DEMUD's major advance is to offer domain-specific explanations for its prioritizations. We evaluated DEMUD's ability to quickly identify diverse items of interest and the value of the explanations it provides. We found that DEMUD performs as well or better than existing class discovery methods and provides, uniquely, the first explanations for why those items are of interest. Further, in collaborations with planetary scientists, we found that DEMUD (1) quickly identifies very rare items of scientific value, (2) maintains high diversity in its selections, and (3) provides explanations that greatly improve human classification accuracy. Copyright © 2013, Association for the Advancement of Artificial Intelligence ( All rights reserved.
The standard unsupervised recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global distributed sentence representation. In this work, we present an RNN-based variational autoencoder language model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate strong performance in the imputation of missing tokens, and explore many interesting properties of the latent sentence space.
Can we efficiently learn the parameters of directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and in case of large datasets? We introduce a novel learning and approximate inference method that works efficiently, under some mild conditions, even in the on-line and intractable case. The method involves optimization of a stochastic objective function that can be straightforwardly optimized w.r.t. all parameters, using standard gradient-based optimization methods. The method does not require the typically expensive sampling loops per datapoint required for Monte Carlo EM, and all parameter updates correspond to optimization of the variational lower bound of the marginal likelihood, unlike the wake-sleep algorithm. These theoretical advantages are reflected in experimental results.
We propose a formal Bayesian definition of surprise to capture subjective aspects of sensory information. Surprise measures how data affects an observer, in terms of differences between posterior and prior beliefs about the world. Only data observations which substantially affect the observer's beliefs yield surprise, irrespectively of how rare or informative in Shannon's sense these observations are. We test the framework by quantifying the extent to which humans may orient attention and gaze towards surprising events or items while watching television. To this end, we implement a simple computational model where a low-level, sensory form of surprise is computed by simple simulated early visual neurons. Bayesian surprise is a strong attractor of human attention, with 72% of all gaze shifts directed towards locations more surprising than the average, a figure rising to 84% when focusing the analysis onto regions simultaneously selected by all observers. The proposed theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction.
The Materials Project: A materials genome approach to accelerating materials innovation
  • A Jain
  • S P Ong
  • G Hautier
  • W Chen
  • W D Richards
  • S Dacek
  • S Cholia
  • D Gunter
  • D Skinner
  • G Ceder
  • K Persson
Jain, A.; Ong, S. P.; Hautier, G.; Chen, W.; Richards, W. D.; Dacek, S.; Cholia, S.; Gunter, D.; Skinner, D.; Ceder, G.; and Persson, K. a. 2013. The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1(1):011002.
Learning structured output representation using deep conditional generative models
  • K Sohn
  • H Lee
Sohn, K.; Lee, H.; and Yan, X. 2015. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems 28. 3483-3491.
  • C Doersch
Doersch, C. 2016. Tutorial on Variational Autoencoders. arXiv:1606.05908 [stat.ML].
  • L Rampasek
  • D Hidru
  • P Smirnov
  • B Haibe-Kains
  • A Goldenberg
Rampasek, L.; Hidru, D.; Smirnov, P.; Haibe-Kains, B.; and Goldenberg, A. 2017. Dr.VAE: Drug Response Variational Autoencoder. arXiv:1706.08203 [stat.ML].
Gaussian Copula Variational Autoencoders for Mixed Data
  • S Suh
  • S Choi
Suh, S., and Choi, S. 2016. Gaussian Copula Variational Autoencoders for Mixed Data. arXiv:1604.04960 [stat.ML].