Predictive Analytics In Weather Forecasting Using
Machine Learning Algorithms
Aastha Sharma1,*, Vijayakumar V1
1SCSE, Vellore Institute of Technology, Chennai, India
Abstract
Agriculture is the backbone of every economy. In a country like India, where the demand for food keeps increasing with the
rising population, advances in the agriculture sector are required to meet that demand. In addition, the present economic
conditions and government policies of India necessitate the adoption of precision farming, or smart farming. A weather
prediction model will enable farmers to maximize their crop yields and minimize their input costs, as well as the losses
caused by uncertain rainfall, droughts and similar events. To forecast the weather we use machine learning algorithms
such as linear regression and decision trees.
Keywords: Predictive Analytics, Weather Forecasting, Machine Learning Algorithms.
Received on 10 April 2019, accepted on 24 April 2019, published on 08 July 2019
Copyright © 2019 Aastha Sharma et al., licensed to EAI. This is an open access article distributed under the terms of the Creative
Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and
reproduction in any medium so long as the original work is properly cited.
doi: 10.4108/eai.7-12-2018.159405
*Aastha Sharma. Email:aastha.sharma2018@vitstudemt.ac.in
1. Introduction
Machine learning is a robust technique for weather
forecasting. In the past we had to give a system explicit
instructions before it produced a result. With machine
learning algorithms we can instead supply the inputs and
features directly, and the result is generated automatically:
we only need to train on the data, and training produces the
model. [1] Most of the work on machine learning for
agriculture either addresses the cultivation of a crop or
suggests weather data based on statistical information.
[2] Most of this work does not handle the planting of crops
based on the climate. [3] Plant diseases and insect pests
cause a significant reduction in both the quality and the
quantity of agricultural products, so forecasting plant
diseases and insect pests is of great significance and quite
necessary.
2. Machine Learning Algorithms
Machine learning algorithms are described as learning a
target function (f) that best maps input variables (X) to an
output variable (Y): Y = f(X).
2.1. Linear Regression
Linear regression is the most basic and frequently used
predictive model for analysis. Regression estimates are
generally used to describe the data and to elucidate the
relationship between one or more independent variables and
a dependent variable. Graphically, linear regression finds
the best-fit line through the points. This best-fit line is
known as the regression line.
OLS Model
Ordinary least squares (OLS) is the most common estimation
method used for linear models. It is used to obtain the best
estimates: it minimizes the sum of the squared distances
between the predicted values and the actual values of the
dependent variable. This helps us to find the relationship
between the dependent variable and the independent
variables.
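The least-squares idea can be illustrated with a minimal sketch. It assumes a single independent variable and uses NumPy's `lstsq` solver; the humidity and temperature readings are hypothetical and are not taken from the paper's dataset.

```python
import numpy as np

def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (intercept) next to the feature.
    X = np.column_stack([np.ones_like(x), x])
    # lstsq returns the coefficients that minimize the sum of squared residuals.
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept, slope = coeffs
    return float(intercept), float(slope)

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared distances between actual and predicted values."""
    predicted = intercept + slope * np.asarray(x, dtype=float)
    return float(np.sum((np.asarray(y, dtype=float) - predicted) ** 2))

# Hypothetical readings: humidity (%) and temperature (deg C).
# They are constructed to lie exactly on temperature = 44 - 0.3 * humidity.
humidity = [30.0, 40.0, 50.0, 60.0]
temperature = [35.0, 32.0, 29.0, 26.0]

intercept, slope = ols_fit(humidity, temperature)
best_ssr = sum_squared_residuals(humidity, temperature, intercept, slope)
# Any other line gives a larger sum of squared residuals.
other_ssr = sum_squared_residuals(humidity, temperature, 44.0, -0.2)
```

Because the sample points are exactly linear, the fitted line reproduces them and its sum of squared residuals is essentially zero, while any other line scores worse.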
Advantages
It has a simple mathematical representation.
It does not need a large amount of memory.
It is very easy to interpret, because it gives numerical
results.
Disadvantages
It requires linearly spread data. When there are many
features, it tends not to give accurate results.
The linear regression model fails when the data is
non-linear.
Algorithm Steps
Import all libraries and read weather data.
Define independent variable.
Define dependent variable.
Split and train data then test the data.
Create linear regression model.
Predict weather for future.
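The steps above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the weather data is synthetic (the paper's dataset is not available here), and the column meanings, coefficients and test split size are hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 1: import libraries and read the weather data.
# Assumption: synthetic data stands in for a real weather file.
rng = np.random.default_rng(0)
humidity = rng.uniform(20, 90, size=200)
pressure = rng.uniform(990, 1030, size=200)
noise = rng.normal(0, 0.5, size=200)
temperature = 60.0 - 0.3 * humidity - 0.02 * pressure + noise

# Step 2: define the independent variables.
X = np.column_stack([humidity, pressure])
# Step 3: define the dependent variable.
y = temperature

# Step 4: split the data into a training part and a test part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 5: create and train the linear regression model, then test it.
model = LinearRegression()
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R^2 on the held-out test data

# Step 6: predict the temperature for a new (hypothetical) reading.
new_reading = np.array([[55.0, 1010.0]])
predicted_temperature = float(model.predict(new_reading)[0])
```

The score on the held-out test set, rather than on the training data, is what indicates how well the model may generalize to future readings.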
2.2. Decision Tree
It is a type of supervised learning algorithm that is mostly
used for classification problems. It works for both
categorical and continuous dependent variables. In this
type of algorithm, we split the population into two or more
homogeneous sets. The split is made on the most significant
attributes (independent variables) so that the groups are as
distinct as possible.
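How a single split into homogeneous sets can be chosen is sketched below. The paper does not state which splitting criterion was used; this sketch assumes Gini impurity, one common choice, and the humidity readings and rain labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, impurity) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped: splitting there leaves the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        # Weighted average impurity of the two groups after the split.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical humidity readings (%) with a rain / no-rain label.
humidity = [35, 42, 50, 71, 78, 85]
rain = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_threshold(humidity, rain)
```

For this sample the best split is `humidity <= 50`, which separates the two classes completely, so the weighted impurity is 0.0. A full decision tree repeats this search on each resulting group.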
Data Shaping
Before labels are selected, the weather data is shaped and
noisy data is removed.
Label Selection
After shaping the data we select labels as features. We create the
labels in order to classify the data, and we select the first two
columns of the data for labelling. After labelling we move on to splitting.
Splitting
After labelling, we split the data on the labels to find the best feature.
The accuracy of the predicted result depends on this best feature.
Advantages
This algorithm handles both continuous and
categorical data.
A decision tree is useful when the data is non-linear,
because it splits the data set to create more
features.
Disadvantages
With more features, the decision tree is likely to
become deeper and bigger.
It tends to overfit, as it creates high-variance
models.
Algorithm Steps
Import all libraries and read weather data.
Shape all data.
Remove noisy data.
Select labels.
Classify and train the data.
Predict the result.
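The steps above can be sketched with the scikit-learn modules named in this paper (`DecisionTreeClassifier`, `train_test_split`, `accuracy_score`). The data is synthetic, and the rain rule, the injected missing values and the `max_depth` setting are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1: import libraries and read the weather data.
# Assumption: synthetic data stands in for a real weather file.
rng = np.random.default_rng(1)
n = 300
data = pd.DataFrame({
    "humidity": rng.uniform(20, 100, size=n),
    "temperature": rng.uniform(10, 40, size=n),
})
# Hypothetical rule used only to generate a rain / no-rain target.
data["rain"] = ((data["humidity"] > 70) & (data["temperature"] < 30)).astype(int)
# A few missing values are injected so that there is something to clean.
data.loc[[3, 17, 42], "humidity"] = np.nan

# Steps 2-3: shape the data and remove noisy (here: incomplete) rows.
data = data.dropna()

# Step 4: select the first two columns as features and 'rain' as the target.
X = data.iloc[:, :2]
y = data["rain"]

# Step 5: classify and train the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
clf = DecisionTreeClassifier(max_depth=3, random_state=1)
clf.fit(X_train, y_train)

# Step 6: predict the result and measure the accuracy.
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

Limiting `max_depth` is one simple way to keep the tree from growing deeper and bigger than the data justifies, which relates to the overfitting disadvantage listed above.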
2.3. Python Libraries Used
scikit-learn (sklearn)
It is a very useful library for machine learning modeling.
It was initially released in 2007 and includes many machine
learning algorithms. From this library we use
DecisionTreeClassifier, train_test_split and accuracy_score.
NumPy
It is mainly used for mathematical operations. It holds
the data as NumPy arrays for manipulation and provides
fast mathematical functions for calculation. It is a very
common library in machine learning.
Pandas
This library makes data analysis easier in Python. It is
also used to read and write files. With its data frames,
data manipulation can be done easily.
3. Proposed System Architecture
3.1. Methodology
Data Preprocessing
The more carefully the data set is preprocessed, the more
accurate the result will be. Preprocessing is the step in
which we remove unwanted, unhelpful or noisy data from the
collected data. If null values or empty fields are not
removed, we cannot get proper results. It is therefore a
very important step in developing the model.
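A minimal preprocessing sketch with Pandas is given below. The CSV content, the column names and the plausible temperature range are hypothetical; they only illustrate removing empty fields and an obviously noisy reading.

```python
import io
import pandas as pd

# Hypothetical raw weather records with two empty fields and one noisy value.
raw_csv = io.StringIO(
    "date,temperature,humidity\n"
    "2019-01-01,24.5,60\n"
    "2019-01-02,,65\n"
    "2019-01-03,25.1,\n"
    "2019-01-04,999.0,58\n"
    "2019-01-05,23.8,62\n"
)
weather = pd.read_csv(raw_csv, parse_dates=["date"])

# Remove rows that contain null values or empty fields.
cleaned = weather.dropna()
# Remove noisy readings outside an assumed plausible range (deg C).
cleaned = cleaned[cleaned["temperature"].between(-60, 60)]
cleaned = cleaned.reset_index(drop=True)
```

Of the five raw rows, only the two complete and plausible ones remain. Which range counts as plausible depends on the sensor and the region, so the bounds here are only placeholders.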
Normalization
This step is also known as the machine learning module. Here
we train on the collected dataset, test on the dataset and
then generate the new model. For cross-validation we again
hold out (blind) part of the dataset.
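The train, test and cross-validation cycle described here can be sketched as follows. The paper does not state how cross-validation was carried out; this sketch assumes 5-fold cross-validation with scikit-learn's `cross_val_score` on synthetic data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Assumption: synthetic humidity -> temperature data, for illustration only.
rng = np.random.default_rng(2)
X = rng.uniform(20, 90, size=(100, 1))
y = 40.0 - 0.2 * X[:, 0] + rng.normal(0, 0.5, size=100)

# Each of the 5 folds is held out ("blinded") once while the model is
# trained on the other 4 folds. The score is R^2 on the held-out fold.
scores = cross_val_score(LinearRegression(), X, y, cv=5)
mean_r2 = float(scores.mean())
```

Averaging over several held-out folds gives a steadier estimate than a single train and test split, because every record is used for testing exactly once.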
Learn Model
This is the last step. In this phase we learn from the model
and predict the result. Learning the model is important for
evaluating a proper result. Here we obtain the model
artefact from the training process.
4. Results
4.1. Scatter plots
Figure 2. Scatter plots for linear regression model
4.2. OLS model Results
Figure 3. OLS model results
4.3. Calculated Errors
Figure 4. Errors for Estimates
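The figure itself is not reproduced here, and the paper does not list which error measures it reports. As an assumption, the sketch below computes three commonly used ones (mean absolute error, mean squared error and root mean squared error) on hypothetical actual and predicted temperatures.

```python
import numpy as np

def regression_errors(actual, predicted):
    """Return MAE, MSE and RMSE between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted
    mae = float(np.mean(np.abs(residuals)))  # mean absolute error
    mse = float(np.mean(residuals ** 2))     # mean squared error
    rmse = float(np.sqrt(mse))               # root mean squared error
    return {"mae": mae, "mse": mse, "rmse": rmse}

# Hypothetical actual and predicted temperatures (deg C).
errors = regression_errors([30.0, 28.0, 25.0, 27.0], [29.0, 30.0, 25.0, 24.0])
```

For these values the residuals are 1, -2, 0 and 3, so the mean absolute error is 1.5 and the mean squared error is 3.5.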
4.4. Accuracy Measurement of Decision Tree
Figure 5. Select features
Figure 6. Trained data
Figure 7. Accuracy measurement of decision tree
5. Future Scope
As for future scope, linear regression cannot be used when
it comes to very large data sets, as it does not give
accurate results. So, for predictions on huge volumes of
data, a neural network system can be developed for better
results and more accurate weather forecasts. We can also
connect the analysis process to IoT technology. Without
data we cannot perform analysis and prediction, and IoT is
a major source of data. IoT devices will generate data that
helps to improve decision making.
6. Conclusion
Machine learning algorithms play a major role in predictive
analytics, which uses current and historical data sets to
discover knowledge and, from that knowledge, to predict
future occurrences. In this paper we have proposed two
algorithms, linear regression and decision tree, for weather
forecasting and prediction. We conclude that linear
regression is best for weather forecasting when the
variables in the data set are dependent, because the data
is then already linear, whereas for the decision tree we
must assign the labels manually. The main disadvantages of
the decision tree are that with more features the tree is
likely to become deeper and bigger, and that it tends to
overfit, as it creates high-variance models.
References
[1] Mark Holmstrom, Dylan Liu, Christopher Vo,
"Machine Learning Applied to Weather Forecasting",
Stanford University (dated December 15, 2016).
[2] Priyanka P. Shinde, "Big Data Predictive
Analysis: Using R Analytical Tool", Assistant
Professor, Department of MCA, Government
College of Engineering Karad, Karad, Maharashtra,
India.
[3] Gauri D. Kalyankar, Shivananda R. Poojara,
Nagaraj V. Dharwadkar, "Predictive Analysis of
Diabetic Patient Data Using Machine Learning and
Hadoop", Dept. of Computer Science and
Engineering, Rajarambapu Institute of Technology,
Sakhrale, Sangli Dist.
[4] Hina Gulati, "Predictive Analytics Using Data
Mining Technique", Computer Science and
Engineering, Amity University, Noida, India.
Aastha Sharma, Vijayakumar V
EAI Endorsed Transactions on
Cloud Systems
Online First