Preprint

Predictive Modelling of CO2 Emission Reduction: Evaluating the Impact of Replacing Non-Renewable Energy Sources with Renewable Energy in the UK


Abstract

This paper uses a supervised multivariable linear regression model to assess the impact of replacing non-renewable energy sources with renewable energy on CO2 emissions in the UK. The model was developed and tested with Scikit-learn in Jupyter Notebook, together with NumPy, SciPy, seaborn, and matplotlib. Data preprocessing and analysis were conducted using a range of methods. The analysis found that non-renewable energy sources increase CO2 emissions, whereas substituting them with sustainable, green energy reduces emissions. The model developed using selected features outperformed the model without feature selection, suggesting that the feature-selection process improves performance and efficiency. The paper emphasises the need for further research and collaborative efforts to enhance the accuracy and reliability of AI predictive models for effective business policy formulation and decision-making.
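For orientation, the sketch below shows the kind of scikit-learn workflow the abstract describes: a multivariable linear regression with a simple feature-selection step. The file name, column names, and the choice of SelectKBest are illustrative assumptions, not the authors' actual dataset or pipeline.

```python
# Illustrative sketch only: the CSV file and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

df = pd.read_csv("uk_energy_co2.csv")            # hypothetical dataset
X = df[["coal_twh", "gas_twh", "oil_twh",
        "wind_twh", "solar_twh", "hydro_twh"]]   # hypothetical energy-mix features
y = df["co2_mt"]                                 # hypothetical CO2 target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Simple univariate feature selection, echoing the feature-selection step
selector = SelectKBest(score_func=f_regression, k=4).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

model = LinearRegression().fit(X_train_sel, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test_sel)))
```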


References
Article
Full-text available
A growing number of countries worldwide have committed to achieving net zero emissions targets by around mid-century since the Paris Agreement. As the world’s greatest carbon emitter and the largest developing economy, China has also set clear targets for carbon peaking by 2030 and carbon neutrality by 2060. Carbon-reduction AI applications promote the green economy. However, there is no comprehensive explanation of how AI affects carbon emissions. Based on panel data for 270 Chinese cities from 2011 to 2017, this study uses the Bartik method to quantify data on manufacturing firms and robots in China and demonstrates the effect of AI on carbon emissions. The results of the study indicate that (1) artificial intelligence has a significant inhibitory effect on carbon emission intensity; (2) the carbon emission reduction effect of AI is more significant in super- and megacities, large cities, and cities with better infrastructure and advanced technology, whereas it is not significant in small and medium cities, and cities with poor infrastructure and low technology level; (3) artificial intelligence reduces carbon emissions through optimizing industrial structure, enhancing information infrastructure, and improving green technology innovation. In order to achieve carbon peaking and carbon neutrality as quickly as possible during economic development, China should make greater efforts to apply AI in production and life, infrastructure construction, energy conservation, and emission reduction, particularly in developed cities.
Article
Full-text available
Regression analysis makes up a large part of supervised machine learning, and consists of the prediction of a continuous independent target from a set of other predictor variables. The difference between binary classification and regression is in the target range: in binary classification, the target can have only two values (usually encoded as 0 and 1), while in regression the target can have multiple values. Even if regression analysis has been employed in a huge number of machine learning studies, no consensus has been reached on a single, unified, standard metric to assess the results of the regression itself. Many studies employ the mean square error (MSE) and its rooted variant (RMSE), or the mean absolute error (MAE) and its percentage variant (MAPE). Although useful, these rates share a common drawback: since their values can range between zero and +infinity, a single value of them does not say much about the performance of the regression with respect to the distribution of the ground truth elements. In this study, we focus on two rates that actually generate a high score only if the majority of the elements of a ground truth group has been correctly predicted: the coefficient of determination (also known as R-squared or R²) and the symmetric mean absolute percentage error (SMAPE). After showing their mathematical properties, we report a comparison between R² and SMAPE in several use cases and in two real medical scenarios. Our results demonstrate that the coefficient of determination (R-squared) is more informative and truthful than SMAPE, and does not have the interpretability limitations of MSE, RMSE, MAE and MAPE. We therefore suggest the usage of R-squared as standard metric to evaluate regression analyses in any scientific domain.
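As a quick illustration of the two rates discussed above, the snippet below computes R² with scikit-learn and SMAPE with a small helper; the SMAPE definition used here (one common variant ranging from 0 to 200%) and the toy values are assumptions for demonstration only.

```python
import numpy as np
from sklearn.metrics import r2_score

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, one common definition (0-200%)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs(y_pred - y_true) /
                           ((np.abs(y_true) + np.abs(y_pred)) / 2.0))

y_true = np.array([2.0, 3.5, 4.0, 5.5])   # toy ground truth
y_pred = np.array([2.1, 3.0, 4.4, 5.0])   # toy predictions
print("R^2  :", r2_score(y_true, y_pred))
print("SMAPE:", smape(y_true, y_pred))
```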
Article
Full-text available
The generation volatility of photovoltaics (PVs) has created several control and operation challenges for grid operators. For a secure and reliable day- or hour-ahead electricity dispatch, the grid operators need visibility of their synchronous and asynchronous generators' capacity. It helps them to manage the spinning reserve, inertia and frequency response during any contingency events. This study attempts to provide machine learning-based PV power generation forecasting for both the short and long term. The study has chosen Alice Springs, one of the geographically solar energy-rich areas in Australia, and considered various environmental parameters. Different machine learning algorithms, including Linear Regression, Polynomial Regression, Decision Tree Regression, Support Vector Regression, Random Forest Regression, Long Short-Term Memory, and Multilayer Perceptron Regression, are considered in the study. Various comparative performance analyses are conducted for both normal and uncertain cases, and Random Forest Regression is found to perform best for the dataset. The impact of data normalization on forecasting performance is also analyzed using multiple performance metrics. The study may help grid operators to choose an appropriate PV power forecasting algorithm and plan for time-ahead generation volatility.
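The sketch below shows, on synthetic data, the style of comparison the study describes: several scikit-learn regressors evaluated with cross-validated R², with scaling applied where it matters. The synthetic features and model settings are placeholders, not the study's environmental data or configuration.

```python
# Synthetic stand-in for environmental/PV measurements; purely illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))     # e.g. irradiance, temperature, humidity, wind
y = 3 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

models = {
    "linear":        LinearRegression(),
    "svr (scaled)":  make_pipeline(StandardScaler(), SVR()),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:15s} mean R^2 = {scores.mean():.3f}")
```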
Article
Full-text available
The energy industry is at a crossroads. Digital technological developments have the potential to change our energy supply, trade, and consumption dramatically. The new digitalization model is powered by artificial intelligence (AI) technology. The integration of energy supply, demand, and renewable sources into the power grid will be controlled autonomously by smart software that optimizes decision-making and operations. AI will play an integral role in achieving this goal. This study focuses on the use of AI techniques in the energy sector. This study aims to present a realistic baseline that allows researchers and readers to compare their AI efforts, ambitions, new state-of-the-art applications, challenges, and global roles in policymaking. We covered three major aspects, including: i) the use of AI in solar and hydrogen power generation; (ii) the use of AI in supply and demand management control; and (iii) recent advances in AI technology. This study explored how AI techniques outperform traditional models in controllability, big data handling, cyberattack prevention, smart grid, IoT, robotics, energy efficiency optimization, predictive maintenance control, and computational efficiency. Big data, the development of a machine learning model, and AI will play an important role in the future energy market. Our study's findings show that AI is becoming a key enabler of a complex, new, and data-related energy industry, providing a key magic tool to increase operational performance and efficiency in an increasingly cut-throat environment. As a result, the energy industry, utilities, power system operators, and independent power producers may need to focus more on AI technologies if they want meaningful results to remain competitive. New competitors, new business strategies, and a more active approach to customers would require informed and flexible regulatory engagement with the associated complexities of customer safety, privacy, and information security. Given the pace of development in information technology, AI, and data analysis, regulatory approvals for new services and products in the new era of digital energy markets can be enforced as quickly and efficiently as possible.
Article
Full-text available
High-quality data are the precondition for analyzing and using big data and for guaranteeing the value of the data. Currently, comprehensive analysis and research of quality standards and quality assessment methods for big data are lacking. First, this paper summarizes reviews of data quality research. Second, this paper analyzes the data characteristics of the big data environment, presents quality challenges faced by big data, and formulates a hierarchical data quality framework from the perspective of data users. This framework consists of big data quality dimensions, quality characteristics, and quality indexes. Finally, on the basis of this framework, this paper constructs a dynamic assessment process for data quality. This process has good expansibility and adaptability and can meet the needs of big data quality assessment. The research results enrich the theoretical scope of big data and lay a solid foundation for the future by establishing an assessment model and studying evaluation algorithms.
Article
Full-text available
Final report prepared by the Policy Studies Institute for the Department of Trade and Industry (DTI) and the Department for Environment, Food and Rural Affairs (DEFRA). This is the final report for the DTI and DEFRA on the development of a new UK MARKAL & MARKAL-Macro (M-M) energy systems model. The focus of this final report is on the extensive range of UK 60% CO2 abatement scenarios and sensitivity analysis run for analytical insights to underpin the 2007 Energy White Paper. This analysis was commissioned by the DTI to underpin the development of the 2007 UK Energy White Paper, and this technical report is a companion publication to the policy-focused discussion of the modelling work (DTI, 2007). Model development (enabled through the energy systems modelling theme of the UK Energy Research Centre (UKERC)) is summarised, notably the range of enhancements to improve UK MARKAL's functionality and analytical sophistication. These include resource supply curves, explicit depiction of energy supply chains, remote and micro electricity grids, substantial technological detail in the major end-use sectors (residential, services, industry, transport and agriculture), and a full data update including substantial stakeholder interaction. A major component of the development work was the integration of MARKAL with a neoclassical growth model (MARKAL-Macro), to facilitate direct calculation of macro-economic impacts from changes in the energy sector as well as endogenous behavioural change in energy service demands. However, it is still important to acknowledge the limitations of these partial and general equilibrium dynamic optimisation energy system models. Cost optimisation assumes a perfectly competitive market and neglects barriers and other non-economic criteria that affect energy decisions. Hence, without additional constraints, it may over-estimate the deployment of nominally cost-effective energy efficiency technologies. The model has an incomplete ability to model firm and consumer behaviour. Additionally, the spatial (as a UK aggregated model) and temporal approximations (seasonal and diurnal) provide less insight into the siting of infrastructures and the supply-demand balancing of the electricity network. Further disadvantages from incorporating Macro include the omission of trade impacts and transitional costs; the model is therefore likely to represent a lower bound on GDP impacts.
Article
Full-text available
Heteroscedasticity refers to a phenomenon where data violate a statistical assumption. This assumption is known as homoscedasticity. When the homoscedasticity assumption is violated, this can lead to increased Type I error rates or decreased statistical power. Because this can adversely affect substantive conclusions, the failure to detect and manage heteroscedasticity could have serious implications for theory, research, and practice. In addition, heteroscedasticity is not uncommon in the behavioral and social sciences. Thus, in the current article, we synthesize extant literature in applied psychology, econometrics, quantitative psychology, and statistics, and we offer recommendations for researchers and practitioners regarding available procedures for detecting heteroscedasticity and mitigating its effects. In addition to discussing the strengths and weaknesses of various procedures and comparing them in terms of existing simulation results, we describe a 3-step data-analytic process for detecting and managing heteroscedasticity: (a) fitting a model based on theory and saving residuals, (b) the analysis of residuals, and (c) statistical inferences (e.g., hypothesis tests and confidence intervals) involving parameter estimates. We also demonstrate this data-analytic process using an illustrative example. Overall, detecting violations of the homoscedasticity assumption and mitigating its biasing effects can strengthen the validity of inferences from behavioral and social science data. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
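A minimal sketch of the three-step process on synthetic heteroscedastic data, using statsmodels: (a) fit a model and save residuals, (b) test the residuals with the Breusch-Pagan test, and (c) base inference on heteroscedasticity-robust standard errors. The simulated data and the choice of HC3 covariance are illustrative assumptions, not the article's worked example.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = 2.0 + 0.5 * x + rng.normal(scale=0.2 + 0.3 * x)   # error variance grows with x
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                                 # (a) fit and save residuals
lm_stat, lm_pval, _, _ = het_breuschpagan(ols.resid, X)  # (b) analyse the residuals
print("Breusch-Pagan p-value:", lm_pval)

robust = ols.get_robustcov_results(cov_type="HC3")       # (c) robust inference
print(robust.summary())
```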
Article
Full-text available
In this paper we have compared the abilities of two types of artificial neural network (ANN), the multilayer perceptron (MLP) and the wavelet neural network (WNN), for the prediction of three gasoline properties (density, benzene content and ethanol content). Three sets of near infrared (NIR) spectra (285, 285 and 375 gasoline spectra) were used for building calibration models. Cross-validation errors and structures of optimized MLP and WNN were compared for each sample set. Four different transfer functions (Morlet wavelet and Gaussian derivative for WNN; logistic and hyperbolic tangent for MLP) were also compared. The wavelet neural network was found to be more effective and robust than the multilayer perceptron.
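Scikit-learn offers no off-the-shelf wavelet neural network, so the sketch below only illustrates the MLP half of such a comparison, cross-validating the two MLP transfer functions mentioned above on synthetic stand-ins for NIR spectra.

```python
# Synthetic "spectra"; purely illustrative, not the paper's gasoline data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(285, 50))          # 285 samples, 50 synthetic wavelengths
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=285)

for activation in ("logistic", "tanh"):  # the two MLP transfer functions compared
    mlp = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(20,), activation=activation,
                     max_iter=2000, random_state=0))
    scores = cross_val_score(mlp, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    print(f"{activation:8s} CV RMSE = {-scores.mean():.3f}")
```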
Article
Full-text available
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
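The snippet below is a minimal illustration of the API consistency the abstract emphasises: every estimator is driven through the same fit/predict calls. The particular estimators and dataset are arbitrary choices for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
for est in (Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4, random_state=0)):
    est.fit(X, y)                                   # identical call pattern
    print(type(est).__name__, "->", est.predict(X[:3]).round(2))
```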
Article
Full-text available
Purpose – Renewable energy (RE) is an important component to the complex portfolio of technologies that have the potential to reduce CO2 emissions and to enhance the security of energy supplies. Despite RE's potential to reduce CO2 emissions, the expenditure on renewable energy research, development, and demonstration (RERD&D) as a percentage of total government energy research, development, and demonstration (ERD&D) investment remains low in developed countries. The declining ERD&D expenditure prompted this research to explore the relationship between CO2 emissions per capita and RERD&D as opposed to ERD&D. Design/methodology/approach – An econometric analysis of annual CO2 emissions per capita during the period 1990‐2004 for the 15 pre‐2004 European Union (EU15) countries was carried out. It was hypothesized that the impact of RERD&D expenditure on the reduction of CO2 emissions would be higher than that of ERD&D expenditure, primarily due to several RE technologies being close to carbon neutral. Country‐level gross domestic product per capita and an index of the ratio between industry consumption and industrial production were introduced in the analysis as proxies to control for activities that generate CO2 emissions. A number of panel data econometric models that are able to take into account both country‐ and time‐specific unobserved effects were explored. Findings – It was found that random effect models were more appropriate to examine the study hypothesis. The results suggest that expenditure on RERD&D is statistically significant and negatively associated with CO2 emissions per capita in all models, whereas expenditure on ERD&D is statistically insignificant (ceteris paribus). Originality/value – The findings of this paper provide useful insight into the effectiveness of RERD&D investment in reducing CO2 emissions and are of value in the development of policies for targeted research, development, and demonstration investment to mitigate the impacts of climate change.
Article
Full-text available
The severity of damaging human-induced climate change depends not only on the magnitude of the change but also on the potential for irreversibility. This paper shows that the climate change that takes place due to increases in carbon dioxide concentration is largely irreversible for 1,000 years after emissions stop. Following cessation of emissions, removal of atmospheric carbon dioxide decreases radiative forcing, but is largely compensated by slower loss of heat to the ocean, so that atmospheric temperatures do not drop significantly for at least 1,000 years. Among illustrative irreversible impacts that should be expected if atmospheric carbon dioxide concentrations increase from current levels near 385 parts per million by volume (ppmv) to a peak of 450-600 ppmv over the coming century are irreversible dry-season rainfall reductions in several regions comparable to those of the "dust bowl" era and inexorable sea level rise. Thermal expansion of the warming ocean provides a conservative lower limit to irreversible global average sea level rise of at least 0.4-1.0 m if 21st century CO2 concentrations exceed 600 ppmv and 0.6-1.9 m for peak CO2 concentrations exceeding approximately 1,000 ppmv. Additional contributions from glaciers and ice sheets to future sea level rise are uncertain but may equal or exceed several meters over the next millennium or longer.
Article
Full-text available
Following the UN Framework Convention on Climate Change [1], countries will negotiate in Kyoto this December an agreement to mitigate greenhouse gas emissions. Here we examine optimal CO2 policies, given long-term constraints on atmospheric concentrations. Our analysis highlights the interplay of uncertainty and socioeconomic inertia. We find that the 'integrated assessment' models so far applied under-represent inertia, and we show that higher adjustment costs make it optimal to spread the effort across generations and increase the costs of deferring abatement. Balancing the costs of early action against the potentially higher costs of a more rapid forced subsequent transition, we show that early attention to the carbon content of new and replacement investments reduces the exposure of both the environmental and the economic systems to the risks of costly and unpleasant surprises. If there is a significant probability of having to stay below a doubling of atmospheric CO2-equivalent, deferring abatement may prove costly.
Book
Exploratory Data Analysis (EDA) is an approach to data analysis that involves the application of diverse techniques to gain maximum insights into a dataset. This book is designed to help you gain practical knowledge of the main pillars of EDA, including data cleaning, data preparation, data exploration, and data visualization. You’ll start by performing EDA using open source datasets and perform simple-to-advanced analyses to turn data into meaningful insights. You’ll then learn various descriptive statistical techniques to describe the basic characteristics of data. The book will also help you get to grips with performing EDA on time-series data. As you advance, you’ll learn how to implement EDA techniques for model development and evaluation and build predictive models to visualize results. By using Python for data analysis, you’ll work with real-world datasets, understand the data, summarize its characteristics, and visualize it for business intelligence. By the end of this book, you’ll have developed the skills to carry out a preliminary investigation on any dataset, yield insights into the data, present your results with visual aids, and build a model that correctly predicts future outcomes.
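In the spirit of the workflow described above, the sketch below runs a few typical first EDA steps with pandas, seaborn and matplotlib; the CSV file and its columns are hypothetical placeholders, not the book's datasets.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("energy_dataset.csv")   # hypothetical dataset
df.info()                                 # column types and missing-value counts
print(df.describe())                      # basic descriptive statistics

df = df.dropna()                          # simple cleaning step
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation matrix of numeric features")
plt.tight_layout()
plt.show()
```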
Conference Paper
It is increasingly necessary for researchers in all fields to write computer code, and in order to reproduce research results, it is important that this code is published. We present Jupyter notebooks, a document format for publishing code, results and explanations in a form that is both readable and executable. We discuss various tools and use cases for notebook documents.
Article
Machine learning addresses the question of how to build computers that improve automatically through experience. It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. Recent progress in machine learning has been driven both by the development of new learning algorithms and theory and by the ongoing explosion in the availability of online data and low-cost computation. The adoption of data-intensive machine-learning methods can be found throughout science, technology and commerce, leading to more evidence-based decision-making across many walks of life, including health care, manufacturing, education, financial modeling, policing, and marketing.
Article
The distributional assumption for a generalized linear model is often checked by plotting the ordered deviance residuals against the quantiles of a standard normal distribution. Such plots can be difficult to interpret, because even when the model is correct, the plot often deviates substantially from a straight line. To rectify this problem Ben and Yohai (2004) proposed plotting the deviance residuals against their theoretical quantiles, under the assumption that the model is correct. Such plots are closer to a straight line, when the model is correct, making them much more useful for model checking. However the quantile computation proposed in Ben and Yohai is, in general, relatively complicated to implement and computationally expensive, so that general purpose software for these plots is only available for the Poisson and binary cases in the R package robust. As an alternative the theoretical quantiles can efficiently and simply be estimated by repeatedly simulating new response data from the fitted model and computing the corresponding residuals. This method also provides reference bands for judging the significance of departures of QQ-plots from ideal straight line form. A second alternative is to estimate the quantiles using quantiles of the response variable distribution according to the estimated model. This latter alternative generally has lower computational cost than the first, but does not yield QQ-plot reference bands. In simulations the quantiles produced by the new methods give results indistinguishable from the original Ben and Yohai quantile computations, but the scaling of computational cost with sample size is much improved so that a 500 fold reduction in computation time was observed at sample size 50,000. Application of the methods to generalized linear models fitted to prostate cancer incidence data suggest that they are particularly useful in large dataset cases that might otherwise be incorrectly viewed as zero-inflated. The new approaches are simple enough to implement for any exponential family distribution and for several alternative types of residual, and this has been done for all the families available for use with generalized linear models in the basic distribution of R.
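The sketch below gives a simplified Python reading of the simulation idea: reference quantiles for the deviance residuals of a Poisson GLM are estimated by repeatedly simulating responses from the fitted model. This is an illustration under simplifying assumptions (no refitting, Poisson family only, simulated data), not the paper's exact algorithm, which targets R's GLM families.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 400)
y = rng.poisson(np.exp(0.3 + 0.8 * x))
X = sm.add_constant(x)

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = fit.fittedvalues

def poisson_deviance_resid(y, mu):
    """Signed square-root Poisson deviance residuals."""
    term = np.where(y > 0, y * np.log(y / mu), 0.0) - (y - mu)
    return np.sign(y - mu) * np.sqrt(2.0 * term)

# Estimate reference quantiles by simulating new responses from the fitted model
sims = np.sort(
    [poisson_deviance_resid(rng.poisson(mu), mu) for _ in range(200)], axis=1)
ref_quantiles = sims.mean(axis=0)

plt.scatter(ref_quantiles, np.sort(fit.resid_deviance), s=8)
plt.plot(ref_quantiles, ref_quantiles, color="red")
plt.xlabel("Simulated reference quantiles")
plt.ylabel("Observed deviance residuals")
plt.show()
```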
Article
Coefficients of determination for continuous predicted values (R² analogs) in logistic regression are examined for their conceptual and mathematical similarity to the familiar R² statistic from ordinary least squares regression, and compared to coefficients of determination for discrete predicted values (indexes of predictive efficiency). An example motivated by substantive concerns and using empirical data from a national household probability sample is presented to illustrate the behavior of the different coefficients of determination in the evaluation of models including dependent variables with different base rates—that is, different proportions of cases or observations with "positive" outcomes. One R² analog appears to be preferable to the others both in terms of conceptual similarity to the ordinary least squares coefficient of determination, and in terms of its relative independence from the base rate. In addition, base rate should also be considered when selecting an index of predictive efficiency. As expected, the conclusions based on R² analogs are not necessarily consistent with conclusions based on predictive efficiency, with respect to which of several outcomes is better predicted by a given model.
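As a small illustration of the distinction drawn above, the snippet below fits a logistic model with statsmodels and reports McFadden's pseudo-R² alongside one simple index of predictive efficiency (classification accuracy); the simulated data and the 0.5 cut-off are assumptions for demonstration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, p)
X = sm.add_constant(x)

fit = sm.Logit(y, X).fit(disp=0)
print("McFadden pseudo-R^2:", fit.prsquared)            # 1 - llf / llnull
print("Accuracy:", np.mean((fit.predict(X) > 0.5) == y))  # predictive efficiency
```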
Article
The terms multivariate and multivariable are often used interchangeably in the public health literature. However, these terms actually represent 2 very distinct types of analyses. We define the 2 types of analysis and assess the prevalence of use of the statistical term multivariate in a 1-year span of articles published in the American Journal of Public Health. Our goal is to make a clear distinction and to identify the nuances that make these types of analyses so distinct from one another. (Am J Public Health. Published online ahead of print November 15, 2012: e1. doi:10.2105/AJPH.2012.300897).
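A tiny example makes the distinction concrete: a multivariable model regresses one outcome on several predictors, whereas a multivariate model has several outcomes. The simulated data below are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                            # three predictor variables

y_single = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)  # one outcome
y_multi = np.column_stack([y_single, X[:, 0] + X[:, 2]])          # two outcomes

multivariable = LinearRegression().fit(X, y_single)  # multiple predictors, one target
multivariate = LinearRegression().fit(X, y_multi)    # multiple targets at once
print(multivariable.coef_.shape)  # (3,)
print(multivariate.coef_.shape)   # (2, 3)
```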
Article
This paper summarizes the development of a new hybrid MARKAL-Macro (M-M) energy system model for the UK. This hybrid model maintains the technological and sectoral detail of a bottom-up optimisation approach with aggregated energy demand endogeneity and GDP impacts from a single-sector neoclassical growth model. The UK M-M model was developed to underpin analysis of the UK's groundbreaking mandatory long-term 60% carbon dioxide (CO2) emissions reduction target. Hybrid modelling illustrates that long-term UK CO2 emission reductions are feasible. However, there are endemic uncertainties, notably a trade-off between behavioural and technological decarbonisation options with resultant energy system impacts in the requirements for zero-carbon electricity. UK M-M model sensitivity runs further illustrate the range of energy system interactions, including the deployment of the UK's limited CO2 storage capacity, alternate timing of power vs. transport sectoral reductions, the relative ease of switching between electricity generation portfolios, and substitution opportunities between natural gas and coal. The macro-economic cost impacts range from 0.3% to 1.5% reduction in UK GDP by 2050, with higher cost estimates strongly influenced by pessimistic assessments of future low-carbon technologies. However, cost impacts from the UK M-M model are likely to be in the lower range for stringent CO2 reduction pathways, as the simplicity of the reduced-form macro-linkage omits competitiveness and transitional impacts on the UK economy.
CO2 behavior amidst the COVID-19 pandemic in the United Kingdom: The role of renewable and non-renewable energy development
  • T S Adebayo
  • H K AbdulKareem
  • Bilal
  • D Kirikkaleli
  • M I Shah
  • S Abbas
T. S. Adebayo, H. K. AbdulKareem, Bilal, D. Kirikkaleli, M. I. Shah and S. Abbas, "CO2 behavior amidst the COVID-19 pandemic in the United Kingdom: The role of renewable and non-renewable energy development," Renewable Energy, vol. 189, pp. 492-501, Apr. 2022, issn: 0960-1481. doi: 10.1016/j.renene.2022.02.111. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S0960148122002609 (visited on 16/06/2023).
Python 3 Reference Manual
"Python 3 Reference Manual." [Online]. Available: https://dl.acm.org/doi/book/10.5555/1593511 (visited on 25/06/2023).
Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists
  • A Zheng
  • A Casari
A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. "O'Reilly Media, Inc.", 23rd Mar. 2018, 245 pp., isbn: 978-1-4919-5319-8. Google Books: sthSDwAAQBAJ.
A review of supervised machine learning algorithms
  • A Singh
  • N Thakur
  • A Sharma
A. Singh, N. Thakur and A. Sharma, "A review of supervised machine learning algorithms," in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), Mar. 2016, pp. 1310-1315.
Regression analysis of spatial data
  • C M Beale
  • J J Lennon
  • J M Yearsley
  • M J Brewer
  • D A Elston
C. M. Beale, J. J. Lennon, J. M. Yearsley, M. J. Brewer and D. A. Elston, "Regression analysis of spatial data," Ecology Letters, vol. 13, no. 2, pp. 246-264, 2010, issn: 1461-0248. doi: 10.1111/j.1461-0248.2009.01422.x. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1461-0248.2009.01422.x (visited on 25/06/2023).