Analysis of the Temperature of a Specific Location
using Advanced Data Analytics
J. Rajanikanth
Department of Computer Science
Engineering,
SRKR Engineering College,
Bhimavaram, Andhra Pradesh, INDIA
rajanikanth.1984@gmail.com
T.V. Rajinikanth
Department of Computer Science
Engineering,
Sreenidhi Institute of Science and
Technology, Hyderabad, Telangana,
INDIA
rajinitv@gmail.com
R. Shiva Shankar
Department of Computer Science
Engineering,
Sagi Rama Krishnam Raju Engineering
College, Bhimavaram, Andhra Pradesh,
INDIA
shiva.shankar591@gmail.com
Abstract: Farmers rely heavily on weather forecasts to make
informed decisions about what crops to plant, as do solar
industry workers, agro-product retailers, scientists working
in weather forecasting departments, and academics at
agricultural universities. It is difficult to make
accurate predictions or draw meaningful conclusions about
temperatures over a wide range of places to assist farmers and
others in protecting their homes and lives from natural disasters
such as floods, rainstorms, and droughts. With this aim, an effort
was undertaken to analyze the temperature data for a
particular spatial location, namely Hyderabad, using
data science methods, data mining techniques, statistical
techniques, and machine learning algorithms. The goal of this work
is to provide findings that are helpful in this area.
Keywords: Data Science; Weather Forecast; Temperature;
Data Mining (DM); Statistical Techniques (ST); Machine Learning
(ML)
I. INTRODUCTION
The city of Hyderabad has an area of 650 square kilometers
(250 sq mi). It is situated along the banks of the Musi River on
the Deccan Plateau in the northern region of South India, as
shown in Figure 1. Much of Hyderabad is located
on hilly terrain surrounding artificial lakes, the largest of which
is Hussain Sagar Lake, as shown in Figure 2. The city has an
average elevation of 542 meters (1,778 ft). The climate of
Hyderabad is classified as Köppen Aw, a tropical wet
and dry climate, and Köppen BSh, a hot semi-arid
climate [1]-[3]. The annual mean temperature
is 26.6 °C (79.9 °F), while monthly mean temperatures
range from 21 °C to 33 °C (70 °F to 91 °F).
Fig.1. Hyderabad Location in Telangana State map
During the summer (March to June), temperatures often reach
the upper 30s Celsius and conditions are quite humid; between April and
June, the highest temperatures often exceed 40 °C (104 °F).
December and January are the coldest months,
with the lowest temperature sometimes falling
below 10 °C (50 °F). The warmest month is May, with
daily temperatures of 26 to 39 °C (79 to 102 °F); the coldest month
is December, with daily temperatures of 14.5 to 28 °C
(57 to 82 °F) [4].
Fig.2. Radius of Hyderabad City Map
The southwest summer monsoon brings significant
precipitation to Hyderabad between June and October. This rain
accounts for most of the city's average annual rainfall [5].
Records have been kept since November 1891. The
heaviest rainfall recorded in 24 hours was 241.5 mm (9.5
in) on August 24, 2000. The lowest temperature ever recorded was 6.1
°C (43 °F) on January 8, 1946, while the highest temperature
ever recorded was 45.5 °C (114 °F) on June 2, 1966 [6]. The
city has an annual average of 2,731 hours of sunshine,
with February having the most average
daily solar exposure.
II. RELATED WORK
Vijay Kumar Didal et al. [7] stated that soft computing and ML
techniques such as fuzzy logic, genetic algorithms, and neural
networks are useful in weather forecasting. G. Dayanandam et al.
(2022) [8] state that daily rainfall data from June 1989 to May 2019
(30 years) are considered for calculations at the State, District (33),
and Mandal (592) levels. District-wise temperature and humidity
profiles are calculated from AWS (Automatic Weather Station) data
from 2013 to 2020, and the report includes chronological rainfall
data from 1951 to 2020.
Proceedings of the Sixth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC-2022).
DVD Part Number: CFP22OSV-DVD; ISBN: 978-1-6654-6940-1
978-1-6654-6941-8/22/$31.00 ©2022 IEEE
Masato Yamanouchi et al. [9] reported that a high-density real-time
weather monitoring system reduces disaster damage and
may be used for agriculture, education, and more. Bogdan
Bochenek and colleagues [10] analyzed
the 500 most relevant scientific articles on ML
techniques in climate and numerical weather prediction
published since 2018. They concluded that ML
would be a key component of weather
forecasting in the future.
Siddharth Singh et al. [11] compared SVM, ANN, and a
time-series-based RNN as ML models for weather
forecasting. Root Mean Squared Error (RMSE), the square root of
the mean squared difference between the predicted and
observed values, is the metric used to evaluate
and compare the results produced by these models. They found
that the time-series-based RNN model was the
most accurate in forecasting the weather.
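As a minimal sketch of the RMSE metric mentioned above, the following function computes it for two equal-length sequences. The function name and the sample temperatures are illustrative assumptions, not values from the cited study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical temperatures in degrees Celsius, for illustration only.
observed = [22.0, 23.0, 24.0]
predicted = [21.0, 23.0, 26.0]
error = rmse(observed, predicted)
```

A lower RMSE indicates predictions that are closer to the observations; the value is expressed in the same unit as the data.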
J. Rajanikanth et al. [12] evaluated
Bangalore's temperature trends over 102 years using
exploratory data analysis, ML algorithms, and DM (Data
Mining) approaches in addition to statistical
techniques. According to J. Rajanikanth et al. [13],
meteorological data is the most important information
for farmers, agricultural product sellers, and
related users, and it is difficult to accurately
predict or analyze temperatures across several locations.
To analyze the temperature data for Chennai, they applied a
variety of data science approaches, DM techniques, and ML
algorithms.
Singh et al. [14] reviewed supervised ML classification
methods, which categorize data based on prior knowledge, and examined
their usefulness. According to M. G. Schultz et al. [15], there is
some evidence that introducing big data mining and neural networks
(NN) into the weather prediction workflow can produce better
weather forecasts. Extreme temperature swings, as
noted by T. V. Rajinikanth et al. [16], make it hard
to establish reliable weather inferences and forecasts. Along
with linear regression analysis, useful analytical tools include
the K-means clustering method and the J48 classification
approach.
Data analytics and ML techniques, such as random forest (RF)
classification, are used to forecast the weather, as stated by
Nitin Singh et al. [17], who also describe a portable,
low-cost weather prediction system.
Scher et al. [18] propose a method based on deep learning (DL)
that uses a convolutional neural network (CNN) trained on past weather
forecasts. Bindhu V [19] used
a Raspberry Pi to process sensor data and keep track of
weather variations; the system was tested by deploying it
in the Indian delta districts.
K. Karthiban [20] surveyed the frameworks
currently in use for building a secure Internet of Everything
with big data analytics and provides a comprehensive
critique of them.
III. METHODOLOGY
The data set was obtained from the IMD / India
Water Portal and converted into the required format by applying
preprocessing techniques, such as replacing missing values with mean
values and resolving inconsistencies. Preprocessing is
required to avoid drawing wrong conclusions from anomalies
in the data set. The data set consists of 13 attributes (the year and
the 12 months) and 102 years of records covering
Temperature, Cloud Cover, Vapor Pressure, and Precipitation. It
was subjected to Exploratory Data Analysis (EDA) methods,
namely scatter plots, histograms, box plots, and
correlation graphs, to find hidden patterns and relations
among the attributes of the data.
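The mean-value replacement step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (one dictionary per year with month columns), the function name, and the sample values are hypothetical and are not taken from the IMD data set.

```python
from statistics import mean


def impute_with_column_mean(rows, columns):
    """Return copies of rows with None entries replaced by the column mean."""
    filled = [dict(row) for row in rows]
    for col in columns:
        known = [row[col] for row in rows if row[col] is not None]
        col_mean = mean(known)
        for row in filled:
            if row[col] is None:
                row[col] = col_mean
    return filled


# Hypothetical records; the temperatures are invented for illustration.
records = [
    {"Year": 1901, "Jan": 22.0, "Feb": 24.5},
    {"Year": 1902, "Jan": None, "Feb": 25.5},
    {"Year": 1903, "Jan": 23.0, "Feb": None},
]
clean = impute_with_column_mean(records, ["Jan", "Feb"])
```

The original records are left unchanged, so the imputed and raw versions can be compared during exploratory analysis.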
The Rolling Mean (RM) and Rolling Standard Deviation
(RSD) graphs were drawn to assess the stability of the series over
time. Results of the Augmented Dickey-Fuller (ADF) test
were analyzed based on the p-value to check the stationarity of the data.
The moving average graph was plotted for further analysis.
A chart of dataset_logscale minus the moving
average (that is, with the moving-average trend removed) vs year
was drawn along with its rolling mean and rolling standard
deviation, and the Dickey-Fuller test was repeated
for further analysis. An exponential decay weighted average was used
to model the time series and was plotted over the indexed
dataset_logscale vs year graph.
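The rolling statistics and the "log scale minus moving average" series described above can be sketched in plain Python as follows. The window length of 3, the helper name `rolling`, and the sample temperatures are assumptions made for illustration; the paper does not state the window it used.

```python
import math
from statistics import mean, stdev


def rolling(values, window, func):
    """Apply func to each trailing window; None where the window is incomplete."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(func(values[i + 1 - window : i + 1]))
    return out


# Hypothetical yearly temperatures in degrees Celsius.
temps = [22.1, 22.4, 21.9, 22.8, 23.0, 22.6, 23.1, 23.3]
log_temps = [math.log(t) for t in temps]

window = 3  # assumed window length
moving_avg = rolling(log_temps, window, mean)
rolling_sd = rolling(log_temps, window, stdev)

# Log-scaled series with the moving-average trend subtracted.
log_minus_ma = [
    None if m is None else x - m for x, m in zip(log_temps, moving_avg)
]
```

If the rolling mean and rolling standard deviation of the detrended series stay roughly constant over time, that is informal visual evidence of stationarity, which the ADF test then checks formally.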
Autocorrelation Function (ACF) and Partial
Autocorrelation Function (PACF) plots were drawn to obtain the
values of p and q to feed into the ARIMA
model. The Auto Regression (AR) model predicts future behavior
based on past behavior, and its graph was drawn. After that,
the Moving Average (MA) model was used to capture the
overall trend in the data set, which was removed for further
analysis. The entire proposed methodology is shown in Fig.3.
Fig.3. Proposed Model
The ARIMA model uses time series data either to improve
understanding of the data set or to forecast future
trends, and a graph representing this model was drawn.
ARIMA model predictions for 162 years, 18 years, and 38
years were plotted. The Seasonal
Autoregressive Integrated Moving Average (SARIMA) model
uses past values and also accounts for seasonality patterns;
graphs were drawn to predict the average temperatures of December,
February, and January for 2003-2010. The next step is
to use Ordinary Least Squares (OLS) to estimate the unknown
parameters of a linear regression model.
The objective is to find the linear approximation that
minimizes the discrepancy between the observed data and the
predicted responses. For 2003-2010, January temperatures were
determined using OLS from the SARIMA forecasted
values for December and February. The findings were
compared to those generated by SARIMAX and with the actual
observed values taken from the meteorological records for
Secunderabad, India. Because the data sets are time series,
time series approaches such as ARIMA were tried; in addition,
a multiple linear regression approach was
applied, because predicting temperature also requires other influential
parameters (Cloud Cover, Vapor Pressure, and Precipitation).
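For reference, the OLS objective described above can be written in standard textbook notation. The symbols are assumptions introduced here and do not appear in the paper: $y_i$ is the observed response, $x_i$ the vector of predictors (including a constant), $X$ the matrix whose rows are the $x_i^{\top}$, and $\beta$ the coefficient vector. The closed form holds when $X$ has full column rank.

```latex
\[
\hat{\beta}
  = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - x_i^{\top}\beta \bigr)^{2}
  = \bigl( X^{\top} X \bigr)^{-1} X^{\top} y .
\]
```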
IV. RESULTS
The Hyderabad average temperature data set of 102
years was first converted into the required
format by applying preprocessing techniques, such as replacing
missing values with mean values and resolving inconsistencies. It
consists of 13 attributes, namely Year, Jan, Feb, Mar, ..., Dec. It
was subjected to Exploratory Data Analysis (EDA) techniques,
namely scatter plots, histograms, box plots, and
correlation graphs, to find hidden patterns and relations
among the attributes of the data.
The box plot in Fig.4 was drawn to identify the central values, the
dispersion of the data set, outliers, and signs of skewness;
the 12 months are on the x-axis and temperature
in °C on the y-axis. This graph shows that May has the
highest temperature, followed by April and June. The
histogram plot is shown in Fig.5. In a scatter plot, the
independent variable is displayed along the
horizontal axis and the dependent variable along the
vertical axis, with a dot or other symbol at the point where
the x-axis value meets the y-axis value for each observation.
Fig.6 shows the scatter plot for January, with years along the
x-axis and temperature in °C along the y-axis; it indicates that the
temperature is increasing with the years.
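The quantities a box plot displays can be sketched numerically as follows. This is an illustration only: the 1.5 × IQR fence is the common plotting convention and is assumed here rather than stated in the paper, and the sample temperatures are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (q1, median, q3, outliers) using the 1.5 * IQR rule."""
    q1, median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return q1, median, q3, outliers


# Hypothetical temperatures for a single month, in degrees Celsius.
temps = [31.0, 31.5, 32.0, 32.5, 33.0, 33.5, 34.0, 39.5]
q1, median, q3, outliers = iqr_outliers(temps)
```

The box spans `q1` to `q3`, the line inside it marks the median, and values outside the fences are drawn as individual outlier points.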
Fig.7 shows the distribution plot and Fig.8 the
density plot for January, with temperature in °C along the
x-axis and distribution and density values along the y-axis,
respectively. A count plot shows the number of
records in each category, whereas a bar plot shows a weight or
metric for each category. The count plot in Fig.9 has temperature
along the x-axis and counts along the y-axis;
23.085 °C has the highest count. Fig.10 shows time series
plots for all 12 months in different colors; the
lowest temperatures are for December at the
bottom and the highest for May at the top.
Fig.4: Boxplot drawn for temperature across years
Fig.5: Histogram plot for all the attributes
Fig.11 shows a correlation plot, in which the diagonal elements have
a correlation value of 1. Fig.12 shows year vs January temperature,
with years along the x-axis and temperature in
°C along the y-axis.
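The values behind a correlation plot can be sketched with a plain Pearson correlation, which also shows why the diagonal equals 1 (each month is perfectly correlated with itself). The monthly values below are hypothetical and the helper name is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical monthly temperature columns (degrees Celsius).
monthly = {
    "Jan": [22.0, 22.5, 21.8, 23.1],
    "Feb": [24.9, 25.3, 24.6, 25.8],
    "Dec": [21.0, 21.2, 21.5, 20.9],
}
names = list(monthly)
corr = {a: {b: pearson(monthly[a], monthly[b]) for b in names} for a in names}
```

The resulting nested dictionary is symmetric, so only one triangle of the matrix carries independent information.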
Fig.6: Scatter plot for Jan Month
Fig.7: Distribution Plot for Jan month
The graph of years vs temperature in °C was drawn along
with the RM and RSD, which are used to assess the stability of the
series over time, as shown in Fig.13.
Fig.8: Density Plot for Jan month
Fig.9: Count plot of Jan Month
The Auto-Regressive (AR) model represents a particular type of
random process. The AR model's output depends linearly on its
own previous values, i.e., on a number of lags of the data
(previous observations). The lag order is chosen by
optimizing an information criterion over a range of
candidate lags.
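To make the idea of an output that depends linearly on previous values concrete, the sketch below fits a first-order AR model, x[t] = c + phi * x[t-1], by least squares. It is a simplified illustration, not the paper's fitting procedure: the lag order is fixed at 1 rather than selected by an information criterion, and the series is synthetic and noise-free so that the recovered coefficients can be checked.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_curr = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(x_prev, x_curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


# Synthetic series generated exactly by x[t] = 2 + 0.5 * x[t-1].
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
next_value = c + phi * series[-1]  # one-step-ahead forecast
```

With real temperature data the fit would not be exact, and the residuals would play the role of the random error term.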
Fig.10: Line graph for all the 12 months
Fig.11: Correlation plot for 12 months
The Augmented Dickey-Fuller test is a type of unit root test; it is
often used to determine how strongly a trend characterizes a given
time series.
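For reference, the regression underlying the ADF test can be written in its standard textbook form. The notation is an assumption introduced here: $y_t$ is the series, $\Delta$ the first-difference operator, $k$ the number of lagged differences, and $\varepsilon_t$ the error term. The test examines whether the coefficient $\gamma$ on the lagged level is zero.

```latex
\[
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
  + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t ,
\qquad
H_0 : \gamma = 0 \ \text{(unit root)}, \quad H_1 : \gamma < 0 .
\]
```

A small p-value leads to rejecting $H_0$, which is why a p-value at or below 0.05 is read as evidence of stationarity.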
Fig.12: Year vs Jan Temperature of Hyderabad
Fig.13: Rolling Mean and Standard Deviation
The Augmented Dickey-Fuller (ADF) test results in Table I were
assessed based on the p-value for stationarity. The reported
p-value of 0.05 is taken to indicate that the data is stationary
and does not have a unit root. Confirming stationarity
helps improve the accuracy of the forecast. Figure 14
shows the relationship between the year and the indexed
dataset log scale. The overall trend of temperatures across years
and months was found using box plots, line graphs, the rolling mean
and standard deviation, and ARIMA models and their variants.
TABLE I. RESULTS OF THE DICKEY-FULLER TEST
Fig. 14. Indexed Dataset log scale vs year
In a moving average (MA) model, the output is a linear
combination of the present and past values of a stochastic
component. Using the indexed dataset log scale on the y-axis
and the year on the x-axis,
a moving average graph was created, as shown in Fig.15.
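For reference, the standard textbook form of an MA model of order $q$ is given below. The notation is an assumption introduced here: $\mu$ is the series mean, $\varepsilon_t$ the random error terms, and $\theta_i$ the model coefficients. This differs from a rolling (moving) average used for smoothing, which averages the observed values themselves over a window.

```latex
\[
X_t = \mu + \varepsilon_t + \sum_{i=1}^{q} \theta_i\, \varepsilon_{t-i} .
\]
```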
Fig.15: dataset_logscale minus moving average vs year
Fig.16: Rolling mean and standard deviation year
Fig.16 displays a rolling mean and standard deviation
graph, where dataset log scale minus moving average is shown
along the y-axis and year is plotted along the x-axis. According
to Table II, the p-value is less than 0.05, indicating that the data
is stationary and does not have a unit root.
TABLE II. RESULTS OF DICKEY-FULLER TEST FOR ROLLING MEAN
& SD
Fig.17: Exponential decay weighted average graph
The exponential decay weighted average graph shown in
Fig.17 is used to model the time series and is plotted with the
indexed dataset log scale along the y-axis and year along the
x-axis. Rolling mean and standard deviation graphs are
shown in Fig.18, with dataset log scale minus moving average
along the y-axis and year along the x-axis, for further
analysis. The Dickey-Fuller test was conducted and is reported in
Table III, which shows that the p-value <= 0.05: the data does
not have a unit root and is stationary.
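One common recursive form of an exponentially weighted average is sketched below. The smoothing factor `alpha` and the sample values are assumptions for illustration; the paper does not state the decay parameter it used, and library implementations may apply a different weighting convention.

```python
def ewma(values, alpha):
    """Exponentially weighted average: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical log-scaled temperatures.
log_temps = [3.09, 3.11, 3.08, 3.12, 3.14]
weighted_avg = ewma(log_temps, alpha=0.3)  # assumed smoothing factor
log_minus_ewma = [x - s for x, s in zip(log_temps, weighted_avg)]
```

Recent observations receive the largest weights, and the weight of older observations decays geometrically, which is what makes this average respond faster to change than a fixed-window rolling mean.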
Fig.19 shows the dataset Log Diff Shifting vs year graph, with
year along the x-axis and dataset Log Diff Shifting (i.e.,
the indexed dataset log scale minus its one-step shifted copy)
along the y-axis.
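The log-difference transformation can be sketched as follows, assuming that "Log Diff Shifting" means the log-scaled series minus the same series shifted by one step. The function name and sample values are hypothetical.

```python
import math


def log_diff(values):
    """First difference of the natural log of a positive series."""
    logs = [math.log(v) for v in values]
    return [curr - prev for prev, curr in zip(logs, logs[1:])]


# Hypothetical yearly temperatures in degrees Celsius.
temps = [22.1, 22.4, 21.9, 22.8, 23.0]
diffs = log_diff(temps)
```

Differencing removes a slowly varying trend, so the resulting series has one fewer observation and is often closer to stationary than the original.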
Fig.18: Rolling mean and standard deviation
TABLE III. RESULTS OF DICKEY-FULLER TEST FOR ROLLING MEAN
& SD
Rolling mean and standard deviation graphs were
drawn and shown in Fig.20, in which year was taken along the
x-axis and dataset Log scaling Minus Moving Avg along the y-
axis.
Fig.19: Dataset Log Diff Shifting vs year graph
Fig.20: Rolling mean and standard deviation
The Dickey-Fuller test was conducted and is reported in Table
IV, which shows that the p-value <= 0.05: the data does not have
a unit root and is stationary.
TABLE IV. RESULTS OF DICKEY-FULLER TEST FOR ROLLING
MEAN & SD
Fig.21: ACF & PACF graph
ACF and PACF plots are drawn and shown in Fig.21,
which are used to obtain the values of p and q to feed into the
ARIMA model. These graphs were drawn with n lags along the
x-axis and dataset Log Diff Shifting along the y-axis.
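The values plotted in an ACF graph can be sketched with the usual sample autocorrelation estimator. The function name and the test series are assumptions for illustration; the PACF requires an additional recursion and is not shown.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean_value = sum(series) / n
    dev = [x - mean_value for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical differenced series that alternates in sign.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho = acf(alternating, max_lag=2)
```

In the usual reading of these plots, the lag at which the PACF cuts off suggests p and the lag at which the ACF cuts off suggests q.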
The Auto Regression (AR) model predicts future behavior
based on past behavior; it is plotted in Fig.22 with years along the
x-axis and Dataset Log Diff Shifting and results_AR.fittedvalues
along the y-axis. After that, the Moving Average
(MA) model was used to capture the overall trend in the data set;
it is plotted in Fig.23 with year along the x-axis and Dataset Log
Diff Shifting and results_MA.fittedvalues along the y-axis.
Fig.22: AR model
Fig.23: MA model
Auto-Regressive Integrated Moving Average (ARIMA)
uses past time series data to predict future values.
The ARIMA model is often denoted
ARIMA(p, d, q), where p is the lag order of the autoregressive
model AR(p), d is the degree of differencing (the
number of times the data have had past values subtracted),
and q is the order of the moving average model MA(q).
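For reference, the ARIMA(p, d, q) model can be written compactly with the lag operator $L$, where $L X_t = X_{t-1}$. The notation is the standard textbook one and is an assumption introduced here: $\phi_i$ are the AR coefficients, $\theta_j$ the MA coefficients, $c$ a constant, and $\varepsilon_t$ the error term.

```latex
\[
\Bigl( 1 - \sum_{i=1}^{p} \phi_i L^{i} \Bigr) (1 - L)^{d} X_t
  = c + \Bigl( 1 + \sum_{j=1}^{q} \theta_j L^{j} \Bigr) \varepsilon_t .
\]
```

Setting $d = 0$ gives an ARMA(p, q) model, and $d = 1$ corresponds to modeling the first-differenced series.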
The ARIMA model was used to visualize the dataset's time
series data and make predictions about future trends, as shown
in Table V; the graph was constructed with the year along the
x-axis and Dataset Log Diff Shifting and
results_ARIMA.fittedvalues on the y-axis, as shown in Fig.24.
Fig.24: ARIMA model
TABLE V. ARIMA PREDICTED DIFF VALUES
ARIMA model Predictions for 162 years were plotted and
shown in Fig.25. ARIMA model Predictions for 18 years were
plotted and displayed in Fig.26.
Fig.25: ARIMA model predictions for 162 years
Fig.26: ARIMA model predictions for 18 years
The ARIMA model was drawn with year along the x-axis
and Dataset Log Diff Shifting and results_ARIMA.fittedvalues
along the y-axis. ARIMA model predictions for 38
years were plotted and shown in Fig.27.
Fig.27: ARIMA model predictions for 38 years
Fig.28: SARIMA model Prediction for Dec
The Seasonal Autoregressive Integrated Moving Average
(SARIMA) model uses past values and also accounts for
seasonality patterns. Graphs were drawn to predict the average
temperatures for 2003-2010 for December, February, and
January, as shown in Fig.28, Fig.29, and
Fig.30.
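One ingredient that distinguishes SARIMA from ARIMA is seasonal differencing, sketched below. This shows only that single step, not a full SARIMA fit; the period of 12 assumes a monthly series, and the temperatures are hypothetical.

```python
def seasonal_difference(values, period):
    """Subtract from each value the value one seasonal period earlier."""
    return [values[i] - values[i - period] for i in range(period, len(values))]


# Two hypothetical years of monthly temperatures; the second year is
# uniformly 0.5 degrees Celsius warmer than the first.
year_one = [22.0, 24.5, 28.0, 31.0, 33.0, 29.5, 27.0, 26.5, 26.5, 25.5, 23.5, 21.5]
year_two = [t + 0.5 for t in year_one]
monthly = year_one + year_two

seasonal_diffs = seasonal_difference(monthly, period=12)
```

After this step the repeating annual cycle is removed and only the year-over-year change remains, which the non-seasonal part of the model can then describe.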
Fig.29: SARIMA model Prediction for Feb
Fig.30: SARIMA model Prediction for Jan
Then, the unknown parameters of the linear regression model
are estimated using Ordinary Least Squares (OLS). The
objective is to minimize the differences
between the responses predicted by the linear approximation
of the data and the observed values in the data set.
January temperatures were estimated for 2003 to
2010, as shown in Table VI.
The OLS model equation with coefficients and intercept is
JAN = 9.791486 - 0.000424 * Year + 0.527119 * Feb + 0.027214 * Dec    (1)
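Equation (1) can be applied directly, as in the sketch below. The coefficients are those reported in the paper; the function name and the February and December input values are hypothetical and are not taken from Table VI.

```python
# Coefficients of Equation (1) as reported in the paper.
INTERCEPT = 9.791486
COEF_YEAR = -0.000424
COEF_FEB = 0.527119
COEF_DEC = 0.027214


def predict_jan(year, feb, dec):
    """Estimate the January temperature from Equation (1)."""
    return INTERCEPT + COEF_YEAR * year + COEF_FEB * feb + COEF_DEC * dec


# Hypothetical inputs in degrees Celsius, for illustration only.
jan_estimate = predict_jan(year=2003, feb=25.0, dec=21.0)
```

The February coefficient is by far the largest, so in this fitted equation the January estimate is driven mainly by the February value.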
The results were compared with SARIMAX and with the actual
observed values from the weather history of Begumpet Airport,
Secunderabad, India.
TABLE VI. ARIMA PREDICTED TEMPERATURE VALUES OF JAN
MONTH BOTH BY SARIMAX AND OLS.
V. CONCLUSION
The months of May, April, and June have the highest
average temperatures in the Hyderabad region. Average temperatures
rise from January through May, drop from June
through August, and rise very slightly from September
through December. In the count
plot, the highest number of records falls in the
23.085 °C category. The AR model's output depends linearly on
its own previous values, i.e., on a number of lags of the data.
The AR and MA models, the exponential
decay weighted average model, and the dataset's rolling mean
and standard deviation graph were all subjected to a
Dickey-Fuller test. As p = 0.05 from Log Diff Shifting demonstrates,
the data is stationary. ACF and PACF plots were generated
to determine the ARIMA model's p and q
parameters. AR and MA models were constructed for the
Dataset Log Diff Shifting data set to determine the overall
trend. ARIMA is a forecasting method based on the premise
that only the historical values of the time series are necessary
for prediction. ARIMA model predictions were made for 162,
18, and 38 years. The SARIMA
model considers historical data and seasonality
patterns to forecast future average
temperatures. In a linear regression model, the unknown
parameters are estimated via Ordinary Least Squares
(OLS). Specifically, OLS on
SARIMA's projected values yielded the best outcomes. The
OLS forecasted temperatures are close to the observed
temperatures.
Acknowledgements: Special thanks to the IMD & India water
portal for providing the temperature data set.
REFERENCES
[1] Climate and food security. International Rice Research Institute. 1987.
p. 348. ISBN 978-971-10-4210-3.
[2] Norman, Michael John Thornley; Pearson, C.J.; Searle, P.G.E.
(1995). The ecology of tropical food crops. Cambridge University Press.
pp. 249-251. ISBN 978-0-521-41062-5.
[3] Köppen W. Die Wärmezonen der Erde, nach der Dauer der heissen,
gemässigten und kalten Zeit und nach der Wirkung der Wärme auf die
organische Welt betrachtet. Meteorologische Zeitschrift. 1884;1(21):5-
226.
[4] "Hyderabad". India Meteorological Department. Archived from the
original on November 10, 2013. Retrieved June 13, 2012.
[5] "Weatherbase entry for Hyderabad". Canty and Associates LLC.
Archived from the original on October 30, 2013. Retrieved June 13, 2012.
[6] "Extreme weather events Overall". Meteorological Centre, Hyderabad.
December 2013. Archived from the original on April 2, 2015. Retrieved
March 6, 2015.
[7] Didal VK, Brijbhooshan AT, Choudhary K. Weather Forecasting in
India: A Review. Int. J. Curr. Microbiol. App. Sci. 2017;6(11):577-90.
[8] "Weather and Climatology of Telangana", published by Telangana State
Development Planning Society (TSDPS)/Directorate of Economics &
Statistics (DES).
[9] Yamanouchi M, Ochiai H, Reddy YK, Esaki H, Sunahara H. Case study
of constructing weather monitoring system in a difficult environment.
In 2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing
and Communications and Its Associated Workshops 2014 December 9
(pp. 692-696). IEEE.
[10] Bochenek B, Ustrnul Z. Machine learning in weather prediction and
climate analyses: Applications and perspectives. Atmosphere. 2022
January 23;13(2):180.
[11] Singh S, Kaushik M, Gupta A, Malviya AK. Weather forecasting using
machine learning techniques. In Proceedings of 2nd International
Conference on Advanced Computing and Software Engineering
(ICACSE) 2019 March 11.
[12] Rajanikanth J, Kanth TR. An explorative data analysis on Bengaluru
City weather with hybrid data mining techniques using R. In 2017
International Conference on Current Trends in Computer, Electrical,
Electronics and Communication (CTCEEC) 2017 September 8 (pp.
1121-1125). IEEE.
[13] Rajanikanth J, Rajini Kanth TV. Chennai Weather Data Analysis Using
Hybrid Data Mining Techniques. In International Conference on
Application of Robotics in Industry using Advanced Mechanisms 2019
August 16 (pp. 357-367). Springer, Cham.
[14] Singh A, Thakur N, Sharma A. A review of supervised machine learning
algorithms. In 2016 3rd International Conference on Computing for
Sustainable Global Development (INDIACom) 2016 March 16 (pp.
1310-1315). IEEE.
[15] Schultz MG, Betancourt C, Gong B, Kleinert F, Langguth M, Leufen
LH, Mozaffari A, Stadtler S. Can deep learning beat numerical weather
prediction? Philosophical Transactions of the Royal Society A. 2021
April 5;379(2194):20200097.
[16] T V Rajinikanth, V V SSS Balaram, N. Rajasekhar, "Analysis of Indian
Weather Data Sets Using Data Mining Techniques",
Dhinaharan Nagamalai et al. (Eds): ACITY, WiMoN, CSIA, AIAA,
DPPR, NECO, InWeS 2014, pp. 89-94, 2014. © CS & IT-CSCP 2014
DOI: 10.5121/csit.2014.4510.
[17] Singh N, Chaturvedi S, Akhter S. Weather forecasting using machine
learning algorithm. In 2019 International Conference on Signal
Processing and Communication (ICSC) 2019 March 7 (pp. 171-174).
IEEE.
[18] Scher S, Messori G. Predicting weather forecast uncertainty with
machine learning. Quarterly Journal of the Royal Meteorological
Society. 2018 Oct;144(717):2830-41.
[19] Dr. Bindhu V, Design and Development of Automatic Micro Controller
based Weather Forecasting Device, Journal of Electronics and
Informatics (2020) Vol. 02/ No. 01 Pages: 1-9
http://www.irojournals.com/iroei/ DOI: https://doi.org/10.36548/
jei.2020.1.001
[20] Mr. K. Karthiban, Big Data Analytics For Developing Secure Internet
Of Everything, Journal of ISMAC (2019) Vol. 01/ No. 02, June 2019, pp:
129-136 ISSN: 2582-1369 (online) DOI: https://doi.org/10.36548
/jismac.2019.2.006