Fig. 9
Common modeling steps using an industrial data set with hundreds of tags and a well-defined target (e.g. process yield). First, variables are screened and tags (sensors) are selected using a random forest (a). Many tags will turn out to be only weakly correlated with the target, perhaps fitting its noise; adding known noise as one or more additional tags makes it easier to select the tags with a genuine contribution. Next, a decision tree provides a robust, non-linear, yet interpretable model (b). Finally, once the data is cleaned and better understood, neural networks (c) capture all the non-linearities present in the data.

Source publication
Article
In the literature, machine learning (ML) and artificial intelligence (AI) applications tend to start with examples that are irrelevant to process engineers (e.g. classifying images as cats or dogs, house pricing, types of flowers, etc.). However, process engineering principles are also based on pseudo-empirical correlations and heuristi...

Contexts in source publication

Context 1
... the problem and potential solution are better understood. Diagnostics correspond to the beginning of any industrial application (see Fig. 7). Industrial data science can accelerate the process of identifying which tags (sensors) help explain the problem, while capturing non-linearities via data-driven modeling techniques (see Fig. 9). The general idea is always to start with simpler, more interpretable, tree-based models for screening, followed by more complex modeling techniques such as neural networks. Partition models (also known as decision trees) are common for screening, as they can handle tags with different units, the presence of missing values, and outliers ...
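As a rough illustration of this screening step, the sketch below fits a depth-limited decision tree to a small synthetic tag table and prints its splits as text; all tag names and data are hypothetical stand-ins for real historian data.

```python
# Minimal sketch: an interpretable, depth-limited decision tree for
# screening tags. The synthetic table and tag names are hypothetical
# stand-ins for real process data.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 3)),
                  columns=["T_reactor", "P_feed", "F_steam"])
df["yield"] = (2.0 * df["T_reactor"] - df["P_feed"]
               + rng.normal(scale=0.2, size=500))

X, y = df.drop(columns=["yield"]), df["yield"]
# A shallow tree stays readable and copes with tags in different units.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Print the learned splits so process engineers can inspect them.
print(export_text(tree, feature_names=list(X.columns)))
```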
Context 2
... or depth in the tree). A bootstrap forest (also known as random forest 34 ) consists of several such trees, generated by sampling the dataset (a subset of tags and timestamps). By averaging the models, a more exhaustive list of potential tags (features) is obtained, ranked according to their feature importance (see Fig. 9a). However, noise within the data can also be captured. Random numbers with several types of distributions (e.g. normal, uniform...) or the time-shuffled target can be intentionally added as model inputs. This technique 35,36 is used as a cut-off and allows better separation between signal and noise, as well as the creation of simple ...
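A minimal sketch of this noise cut-off, again on hypothetical synthetic data: decoy columns (pure noise plus the time-shuffled target) are appended, a random forest is fit, and only tags whose importance exceeds the strongest decoy are kept.

```python
# Minimal sketch: injected noise columns as a feature-importance
# cut-off. The synthetic data and tag names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 5)),
                 columns=[f"tag_{i}" for i in range(5)])
y = 1.5 * X["tag_0"] - 0.5 * X["tag_1"] + rng.normal(scale=0.3, size=n)

# Decoy features: pure noise and the time-shuffled target.
X = X.assign(noise_normal=rng.normal(size=n),
             noise_uniform=rng.uniform(size=n),
             target_shuffled=rng.permutation(y.to_numpy()))

forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
imp = pd.Series(forest.feature_importances_, index=X.columns)

# Keep only tags whose importance beats the strongest decoy.
cutoff = imp[["noise_normal", "noise_uniform", "target_shuffled"]].max()
print(imp[imp > cutoff].sort_values(ascending=False))
```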
Context 3
... However, noise within the data can also be captured. Random numbers with several types of distributions (e.g. normal, uniform...) or the time-shuffled target can be intentionally added as model inputs. This technique 35,36 is used as a cut-off and allows better separation between signal and noise, as well as the creation of simple tree models (Fig. 9b). Once the data set is better understood and prepared, neural networks ( ...
Context 4
... common approach is to summarize each batch using statistics and process knowledge (e.g. the peak temperature or its average rate of change during the reaction phase). In the literature, these are known as landmark points or fingerprints (see Fig. 19), but this usually assumes we know which features are important to generate. Generalizing this approach, one can compute common statistics (average, max, min, range, std, first, last, or their robust equivalents) for every sensor, during every phase, for every batch and grade. In auto-machine learning, this is known as feature ...
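A sketch of this generalized fingerprinting with pandas, on hypothetical long-format batch records (`batch`, `phase`, `sensor`, `value`): one aggregation call produces every statistic for every sensor in every phase of every batch.

```python
# Minimal sketch: batch "fingerprints" from per-phase summary
# statistics. The records, phase and sensor names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
records = pd.DataFrame({
    "batch":  np.repeat([f"B{i}" for i in range(3)], 40),
    "phase":  np.tile(np.repeat(["heat-up", "reaction"], 20), 3),
    "sensor": "temperature",
    "value":  rng.normal(80, 5, size=120),
})

stats = ["mean", "max", "min", "std", "first", "last"]
fingerprints = (
    records.groupby(["batch", "phase", "sensor"])["value"]
           .agg(stats)
           .assign(range=lambda s: s["max"] - s["min"])  # add the range
           .unstack(["phase", "sensor"])                 # one row per batch
)
print(fingerprints)
```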
Context 5
... as discrepancy models, where the first layers (a and b) capture the major variability within the data. Weaker but perhaps more interesting predictors can be identified by examining deeper layers (c). Following the example used earlier, the first two layers are able to identify the major drivers separately: a) flow and temperature; b) pressure stability. Fig. 19 Model inputs for batch processes can be generated by summarizing the information, known as landmark points in the literature. Here, the maximum temperature reached during fermentation can be found to be correlated with the quality of the ...
Context 6
... y(t) is the measured variable, x(t) is the real system state, w(t) is an additive system disturbance, and e(t) is typically zero-mean Gaussian noise. An example of such a system is shown in Fig. 29, which shows a second-order system. The measured output y(t + 1) is therefore a function of u(t) but also of the inertia of the system. This is implicit and observed through the evolution of the state variable, x(t), which in this example corresponds to the measured y(t). There are two primary approaches to the identification of such a ...
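A minimal sketch of such a system, simulated with NumPy under assumed numbers: a discrete second-order state-space model x(t+1) = A x(t) + B u(t) + w(t), y(t) = C x(t) + e(t). The matrices and noise levels are illustrative, not taken from the paper.

```python
# Minimal sketch: simulate a discrete second-order stochastic system
#   x(t+1) = A x(t) + B u(t) + w(t),   y(t) = C x(t) + e(t)
# All matrices and noise levels are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.5, -0.7], [1.0, 0.0]])   # stable second-order dynamics
B = np.array([[1.0], [0.0]])
C = np.array([[0.5, 0.5]])

T, x = 200, np.zeros((2, 1))
u = rng.uniform(-1, 1, size=T)            # random excitation input
y = np.zeros(T)
for t in range(T):
    y[t] = (C @ x).item() + rng.normal(scale=0.05)  # e(t): sensor noise
    w = rng.normal(scale=0.01, size=(2, 1))         # w(t): disturbance
    x = A @ x + B * u[t] + w                        # state update
```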
Context 7
... g), as a linear combination of the current state and control input. The field of SI pioneered the efficient identification of the associated model parameters, θ_LTI, through the development of subspace identification methods. 129 One of the foundational methods, provided independently by Ho and Kalman (and others), leverages the concepts of system Fig. 29 A second-order linear dynamical system with (a) one observed state, y(t), and (b) a control input, u(t). The discrete evolution of y(t + 1) can be approximated as a function of the cumulative sum (cusum) of the state (over a past horizon) and the most recent control input, instead of simply using the previous measurement. A comparison is ...
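A sketch of the approximation described in the Fig. 29 caption: y(t+1) is regressed by ordinary least squares on the cumulative sum of y over a short past horizon plus the latest input u(t). The toy system and the horizon length are assumptions.

```python
# Minimal sketch: approximate y(t+1) from the cusum of y over a past
# horizon plus the latest input u(t), via ordinary least squares.
# The toy second-order system and the horizon length are assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, H = 300, 5                              # episode length, past horizon
u = rng.uniform(-1, 1, size=T)
y = np.zeros(T)
for t in range(1, T - 1):                  # toy second-order response
    y[t + 1] = (1.5 * y[t] - 0.7 * y[t - 1] + 0.5 * u[t]
                + rng.normal(scale=0.02))

rows = range(H - 1, T - 1)
phi = np.array([[y[t - H + 1 : t + 1].sum(), u[t]] for t in rows])
theta, *_ = np.linalg.lstsq(phi, y[H:], rcond=None)

rmse = np.sqrt(np.mean((phi @ theta - y[H:]) ** 2))
print("theta:", theta, " one-step RMSE:", rmse)
```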
Context 8
... provide obstacles to existing methods. The application of RL to supply-chain optimization is similarly in its infancy; however, efforts such as OR-gym 230 provide a means for researchers to develop suitable algorithms on standard benchmark problems. Again, this area would benefit greatly from closer collaboration between academia and industry. Fig. 39 shows some training results for the inventory management problem described in ref. 230, generated by different evolutionary RL approaches including particle swarm optimization (PSO), 231 evolutionary strategies (ES), 232 artificial bee colony (ABC) 233 and a hybrid algorithm with a space-reduction approach. 234 ...
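As a rough illustration of the evolutionary RL idea (not the paper's or OR-gym's implementation), the sketch below runs a simple (1, λ) evolutionary strategy over the parameters of a linear ordering policy on a toy single-product inventory episode; the environment, policy form, and all constants are invented for illustration.

```python
# Minimal sketch: a basic evolutionary strategy searching the
# parameters of a linear inventory policy. The toy environment and
# all constants are illustrative assumptions, not OR-gym's benchmark.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta, T=52):
    """Toy single-product inventory episode; higher return is better."""
    inventory, total = 20.0, 0.0
    for _ in range(T):
        demand = rng.poisson(10)
        order = max(0.0, theta[0] + theta[1] * inventory)  # linear policy
        inventory += order
        sales = min(inventory, demand)
        inventory -= sales
        total += 5.0 * sales - 1.0 * order - 0.1 * inventory
    return total

theta, sigma = np.zeros(2), 1.0
for generation in range(50):               # simple (1, 20)-ES
    population = theta + sigma * rng.normal(size=(20, 2))
    scores = np.array([episode_return(p) for p in population])
    theta = population[scores.argmax()]    # keep the best offspring
print("best policy parameters:", theta)
```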

Citations

... AI involves several methodological domains, such as reasoning, knowledge representation, solution search, and, among them, the basic paradigm of machine learning (ML). In the last few years, especially since the introduction of AlphaGo, ML has developed greatly in industrial chemistry and chemical engineering, aiding the development of pharmaceuticals and fine chemicals and thereby reducing time and cost [3][4][5]. So far, much of the literature has summarized the application of machine learning algorithms in the chemical industry (Figure 2) [6]. ...
Article
With the development of Industry 4.0, artificial intelligence (AI) is gaining increasing attention for its performance in solving particularly complex problems in industrial chemistry and chemical engineering. Therefore, this review provides an overview of the application of AI techniques, in particular machine learning, in chemical design, synthesis, and process optimization in recent years. In this review, the focus is on the application of AI for structure-function relationship analysis, synthetic route planning, and automated synthesis. Finally, we discuss the challenges and future of AI in making chemical products.
... The data is used for illustration after cleaning errors and repetitions. Data representation is essential for good algorithm performance; it converts the raw data into a form suited to the algorithm's requirements [4]. ML is classified into two main types: supervised and unsupervised learning approaches. ...
... To address these issues, advanced analytical tools are available for data automation, such as screening models, i.e., AutoML. Advanced monitoring systems have become a new standard in manufacturing environments, capable of flagging abnormal behavior, listing correlated factors, and allowing engineers to visualize process data [4]. Another challenge is associated with the collection and analytics of data generated by complex processes. ...
Article
The field of machine learning has proven to be a powerful approach to smart manufacturing and processing in the chemical and process industries. This review provides a systematic overview of the current state of artificial intelligence and machine learning and their applications in the textile, nuclear power plant, fertilizer, water treatment, and oil and gas industries. Moreover, this study identifies the currently dominant machine learning methods, pre- and post-processing of models, and the increased use of machine learning for fault detection, prediction, optimization, quality control, and maintenance in these sectors. In addition, this review gives insight into the actual benefits and impact of each method, and the complications in their extensive deployment. Finally, the current state, challenges, and future developments in terms of algorithms and infrastructure are highlighted.
... Beyond this application, ML models can also be used as surrogates for complex scale-up models (e.g. by replacing costly simulations in computational fluid dynamics) [102]. Though the literature on ML for bioprocess scale-up is still scarce, we anticipate that methods will evolve quickly, potentially using the field of chemical engineering as a blueprint (e.g., [103]). ...
Article
Fostered by novel analytical techniques, digitalization, and automation, modern bioprocess development provides large amounts of heterogeneous experimental data containing valuable process information. In this context, data-driven methods like machine learning (ML) approaches have great potential to rationally explore large design spaces while exploiting experimental facilities most efficiently. Herein we demonstrate how ML methods have been applied so far in bioprocess development, especially in strain engineering and selection, bioprocess optimization, scale-up, monitoring, and control of bioprocesses. For each topic, we highlight successful application cases and current challenges, and point out domains that can potentially benefit from technology transfer and further progress in the field of ML.
... Gekko uses numeric solvers such as the Interior Point Optimizer (IPOPT) [2] and the Advanced Process Optimizer (APOPT) [3], among others, to solve these complex problems. Using first- and second-derivative information from the algebraic equations provided in the problem statement, Gekko solves a range of different optimization problems, and has been used in various applications such as nuclear waste glass formulation [4], mosquito population control strategies [5], small modular nuclear reactor design [6], ammonia production from wind power [7], smart transportation systems [8], chemical and process industries [9], smart-grid electric vehicle charging [10], optimization of high-altitude solar aircraft [11], model predictive control of sucker-rod pumping [12], and LNG-fueled ship design optimization [13]. Although Gekko solves differential and algebraic equations, it is unable to solve problems with functions that do not have derivative information available. ...
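For orientation, a minimal Gekko example of the kind of problem described above: a small constrained nonlinear program. The objective and constraint are invented for illustration; `remote=False` assumes the local solvers shipped with the package.

```python
# Minimal sketch: a small constrained nonlinear program in Gekko.
# The objective and constraint are illustrative, not from the paper.
from gekko import GEKKO

m = GEKKO(remote=False)          # solve locally
x = m.Var(value=1, lb=0, ub=5)
y = m.Var(value=1, lb=0, ub=5)
m.Equation(x * y >= 2)           # nonlinear constraint
m.Obj(x**2 + y**2)               # minimize the sum of squares
m.solve(disp=False)
print(x.value[0], y.value[0])    # expect x = y = sqrt(2)
```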
Article
Gekko is an optimization suite in Python that solves optimization problems involving mixed-integer, nonlinear, and differential equations. The purpose of this study is to integrate common machine learning (ML) algorithms such as Gaussian process regression (GPR), support vector regression (SVR), and artificial neural network (ANN) models into Gekko to solve data-based optimization problems. Uncertainty quantification (UQ) is used alongside ML for better decision making. These methods include ensemble methods, model-specific methods, conformal predictions, and the delta method. An optimization problem involving nuclear waste vitrification is presented to demonstrate the benefit of ML in this field. ML models are compared against the current partial quadratic mixture (PQM) model in an optimization problem in Gekko. GPR with conformal uncertainty was chosen as the best substitute model as it had a lower mean squared error of 0.0025 compared to 0.018 and more confidently predicted a higher waste loading of 37.5 wt% compared to 34 wt%. The example problem shows that these tools can be used in similar industry settings where easier use and better performance are needed over classical approaches. Future work with these tools includes expanding them with other regression models and UQ methods, and exploring other optimization problems and dynamic control.
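As a sketch of one of the UQ methods named above, split conformal prediction wraps any regressor in a calibrated interval; the data, model choice, and 90% coverage target are illustrative assumptions.

```python
# Minimal sketch: split conformal prediction around any regressor.
# Data, model, and the 90% coverage target are illustrative choices.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Calibrate: the 90th percentile of absolute residuals on held-out
# data gives an interval half-width with ~90% marginal coverage.
q = np.quantile(np.abs(y_cal - model.predict(X_cal)), 0.9)

x_new = np.array([[0.5, -1.0]])
pred = model.predict(x_new)[0]
print(f"prediction {pred:.3f} +/- {q:.3f}")
```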
... Process industries nowadays collect large amounts of process data from sensors and machines, which can be exploited for various software-based innovations, particularly when combined with additional data sources such as the enterprise resource planning (ERP) system. Applications include, among others, advanced analytics such as root-cause analysis for defects and anomaly detection, process control, planning optimizations such as predictive maintenance, and in general improved observability of the processes based on visualizations and real-time simulations (Mowbray et al., 2022). ...
... Digitalization and data science, including AI/ML technology, are sweeping various fields of science and engineering, and data-based methods are being implemented on unprecedented scales; see Khalil et al. (2021), Mowbray et al. (2022) and Sircar et al. (2021) for reviews of data science in various fields of engineering. Data science is a vast and dynamic field with great versatility. ...
Article
Geothermal heat pump (GHP) systems have been established as a proven technology for cooling and heating residential, public and commercial buildings. Geothermal energy thus offers a solution to the ambitious goal of decarbonizing space heating and cooling, contingent on the successful deployment of GHP technology. This in turn requires accurate site characterization, sound design methodologies, effective control logic, and short- and long-term (life-cycle) performance analysis and optimization. In this article, we review the aforementioned aspects of vertical closed-loop GHPs, focusing specifically on the important role of the subsurface. The basics of GHP technology are introduced along with relevant trends and statistics. GHPs are compared with similar technologies such as air source heat pumps (ASHP), along with the effects of deployment on the grid peak load. We then review the common system architectures and the growing trend toward deeper boreholes and the drivers behind it. Various methods for the design, sizing, and simulation of GHPs are introduced, along with software tools common in research and industry. We then move to subsurface characterization, drilling and well construction of vertical boreholes. Long-term performance monitoring of GHP systems is an important source of information for model validation and engineering design and has been garnering increasing attention recently. Data science is another rapidly growing field whose methods are increasingly utilized in GHP applications. The environmental aspects of GHPs are briefly reviewed. Finally, concluding remarks summarize the review and highlight the potential of petroleum engineering expertise and methods in GHP applications.
... Advanced techniques for model interpretation can be applied to the summary data (e.g. SHAP 77,84,85 ). However, for batch process data a natural next step is to analyze the subset of tags using FPCA. ...
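As an illustration of applying SHAP to such summary data, a minimal sketch with the `shap` package and a tree model; the fingerprint table and quality target below are random, hypothetical stand-ins.

```python
# Minimal sketch: SHAP values for a tree model fit on per-batch
# summary features. The stand-in data and tag names are hypothetical.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)),
                 columns=["T_max", "T_mean", "P_std", "F_range"])
quality = 0.8 * X["T_max"] + rng.normal(scale=0.1, size=100)  # toy target

model = RandomForestRegressor(random_state=0).fit(X, quality)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Rank features by mean absolute SHAP contribution.
shap.summary_plot(shap_values, X, plot_type="bar")
```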
Preprint
Batch processes show several sources of variability, from raw materials' properties to initial and evolving conditions that change during the different events in the manufacturing process. In this chapter, we will illustrate with an industrial example how to use machine learning to reduce this apparent excess of data while maintaining the relevant information for process engineers. Two common use cases will be presented: 1) AutoML analysis to quickly find correlations in batch process data, and 2) trajectory analysis to monitor and identify anomalous batches leading to process control improvements.
... Nowadays, ML algorithms are well known and have been applied in many fields, including chemical engineering. For instance, ML has been used in predictive analysis for modeling process operations (i.e., crystallization, absorption, distillation, gasification, dry reforming, etc.) (Damour et al., 2010;Velásco-Mejía et al., 2016;Kharitonova et al., 2019;Singh et al., 2007;Pandey et al., 2016;Azzam et al., 2018;Bagheri et al., 2019), for predicting thermodynamic properties of different fluids (Liu et al., 2019), and for hybrid modeling of chemical reactors (Ammar et al., 2021); a wide range of industrial applications of such models can be found in Mowbray et al. (2022), Lee et al. (2018) and Trinh et al. (2021). ML has had an important impact on chemical engineering practice, because its models are powerful and flexible tools that can describe chemical systems in real time and are relatively easy to implement into existing systems for monitoring, controlling and predicting the outputs of unit operations (Kakkar et al., 2021). ...
Article
To boost process operation, modern chemical technology can require detailed mathematical descriptions of complex, interacting, nonlinear systems, especially when experiments or pilot-plant data are lacking. In some cases, these process models are formulated in terms of partial differential equations, which in turn are hard to solve due to the high demand on computational resources. However, recent access to large data sets has made it possible to address the simulation of these complex chemical systems with machine learning schemes. This new approach has notable strengths over traditional methods, such as flexibility, relatively easy implementation and faster performance. The proposal is to build surrogate models that approximate system behavior by making use of massive data. Nonetheless, one of the principal drawbacks of these methods is the lack of understanding and the inherent uncertainty related to them. This paper explores the capability of different machine learning techniques for modeling chemical processes with different non-linear behaviors. Furthermore, to handle the uncertainty in the models and interpret the confidence of the results, a probabilistic Gaussian machine learning framework was leveraged.
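A minimal sketch of such a probabilistic Gaussian surrogate with scikit-learn's Gaussian process regression; the toy function stands in for an expensive process simulation, and the kernel settings are assumptions.

```python
# Minimal sketch: a Gaussian process surrogate with predictive
# uncertainty. The toy "simulator" and kernel are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def expensive_simulation(x):        # stand-in for a costly process model
    return np.sin(3 * x) + 0.5 * x

X_train = np.linspace(0, 2, 15).reshape(-1, 1)
y_train = expensive_simulation(X_train).ravel()

kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X_train, y_train)

X_new = np.linspace(0, 2, 5).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)   # mean and uncertainty
for xi, mi, si in zip(X_new.ravel(), mean, std):
    print(f"x={xi:.2f}  prediction={mi:.3f} +/- {2 * si:.3f}")
```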
... AI can also be used as a tool to support decision making in fundamental research and the practical production of chemicals. In addition, at the process level, Mowbray et al. (2022) explain the fundamentals of machine learning (ML) and data science, and how they can be linked to process and industrial engineering. ...
Technical Report
Within the European Green Deal, the Chemicals Strategy for Sustainability (CSS) (EC, 2020a) identified a number of actions to reduce the negative impacts on human health and the environment associated with chemicals, materials, products and services commercialised or introduced onto the EU market. In particular, the ambition of the CSS is to phase out the most harmful substances and substitute, as far as possible, all other substances of concern, and otherwise minimise their use and track them. This objective requires novel approaches to analysing and comparing, across all life cycle stages, effects, releases and emissions for specific chemicals, materials, products and services, and moving towards zero pollution for air, water, soil and biota.