
Ground-based cloud image categorization using deep convolutional visual features

Liang Ye, Zhiguo Cao, Yang Xiao, Wei Li
National Key Laboratory of Science and Technology on Multi-spectral Information Processing,
School of Automation, Huazhong University of Science and Technology, P. R. China
{liang ye, zgcao, Yang Xiao, wlee}
Abstract

Ground-based cloud image categorization is an essential and challenging task in the field of automatic sky and cloud observation. It has not yet been well addressed in either the meteorology or the image processing community, owing to the large variation in cloud appearance. One feasible way to address it is to find more discriminative visual representations to characterize the different kinds of clouds, and many efforts have been made in this direction. However, to our knowledge, most existing methods resort only to hand-crafted visual descriptors (e.g., LBP, CENTRIST, and color histograms), and the resulting performance is unfortunately not satisfactory. Inspired by the great success of deep convolutional neural networks (CNN) in large-scale image classification tasks (e.g., the ImageNet challenge), we propose to transfer a CNN to our relatively small-scale cloud classification problem. Experiments on two challenging cloud datasets demonstrate that the deep convolutional visual features generated by a CNN significantly outperform all the state-of-the-art methods in most cases. Another important contribution of our work is the finding that applying Fisher Vector (FV) encoding to the off-the-shelf CNN features can further boost performance.

Index Terms— Ground-based cloud classification, convolutional neural networks, Fisher Vector
1. Introduction

Ground-based cloud observation plays an important role in the observation, recording, and study of weather phenomena. In recent years, automatic ground-based cloud observation systems have been in demand and under development, with cloud classification as one of their key tasks. Compared with human observers and recorders, such systems can reduce costs significantly, observe more continuously, and record more objectively. Categorization of cloud type is regarded as one of the basic meteorological elements specified by the China Meteorological Administration [1] and has attracted much attention from researchers in recent years. Isosalo et al. used local texture information to classify clouds in sky views and found in 2007 that LBP performed better than LEP [2]. Soon after, statistical texture features were applied by Calbó and Sabburg [3]. In 2010, Heinle et al. presented a cloud classification algorithm based on a set of mainly statistical features describing the color and texture of an image, together with a k-nearest-neighbor classifier [4]. Liu et al. found that several structural features extracted from segmented and edge images are useful for distinguishing cirriform, cumuliform, and waveform clouds [5]. The most recent approach, proposed by Zhuo et al., can capture both texture and structure information from a color cloud image [6]. Despite these efforts, the methods to date still fall short in performance, and their accuracy remains unsatisfactory.

(The corresponding author is Yang Xiao. This work is jointly supported by the Chinese Fundamental Research Funds for the Central Universities (HUST: 2014QNRC035 and 2015QN036) and the National High-tech R&D Program of China (863 Program).)
Image features play a key role in improving the performance of cloud classification. Recently, CNNs have been used to extract image features that have proved very useful for classification. Girshick et al. used CNN features to represent image regions, which were then fed to an SVM classifier [7]. In addition, feature encoding and pooling methods can often improve the quality of features for classification. Fisher Vector (FV) is a state-of-the-art feature encoding technique and has been widely used for image classification, as shown by Sánchez et al. [8]. Recently, Cimpoi et al. proposed a new texture descriptor obtained by Fisher Vector pooling of a convolutional neural network (CNN) filter bank [9]. This approach, which they call D-CNN, combines CNN and FV and obtains good performance on texture and material recognition tasks.
In this paper, we present a novel way to obtain feature descriptors for ground-based cloud image classification. Instead of designing filters and hand-crafted descriptors based on the image characteristics of different clouds, our approach uses CNN features of cloud images, together with their FV encodings, as low-level features, and it outperforms the state-of-the-art approaches for cloud classification.
Fig. 1. The main pipeline of ground-based cloud image classification using deep convolutional features.
2. Approach

As shown in Fig. 1, the whole process of our approach consists of the following steps: resize the input image to meet the input requirements of the CNN; extract features using the Caffe [10] implementation of the CNN configuration described by Simonyan et al. [11]; encode the extracted features with the Fisher Vector; and use a linear SVM classifier to categorize the input images.
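The steps above can be outlined as a minimal sketch. Every component here is a toy stand-in, not the authors' code: the "CNN" only mimics the shape of the conv5 output, the Fisher Vector uses a single fixed Gaussian, and the "SVM" is a fixed linear scorer.

```python
import numpy as np

rng = np.random.default_rng(0)

def resize_to_224(img):
    # step 1: nearest-neighbour warp to 224 x 224, ignoring aspect ratio
    h, w = img.shape[:2]
    ys = np.arange(224) * h // 224
    xs = np.arange(224) * w // 224
    return img[ys][:, xs]

def toy_cnn(x):
    # step 2: stand-in for the pretrained CNN; a real run would return the
    # conv5 feature maps, so we only mimic their 13 x 13 x 512 shape
    return rng.standard_normal((13, 13, 512))

def toy_fisher_vector(feats, mu=0.0, sigma=1.0):
    # step 3: encode the 169 local 512-d descriptors against a single fixed
    # Gaussian (mean and variance gradients): a drastically simplified FV
    z = (feats.reshape(-1, 512) - mu) / sigma
    fv = np.concatenate([z.mean(0), (z ** 2 - 1).mean(0)])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))      # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)    # L2 normalisation

def classify(img, W):
    # step 4: linear scoring in place of a trained SVM
    fv = toy_fisher_vector(toy_cnn(resize_to_224(img)))
    return int(np.argmax(W @ fv))

img = rng.standard_normal((480, 640, 3))        # stand-in "cloud image"
W = rng.standard_normal((6, 1024))              # 6 cloud classes
label = classify(img, W)
```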
2.1. Adapting images to the CNN input

According to [10] and [11], the architecture of the CNN requires inputs of a fixed 224 × 224 pixel size, so the input images need to be resized. Following [7], inputs are directly warped to 224 × 224 pixels. Since the collected cloud images are nearly square, the deformation introduced by warping is trivial. In addition, each warped image is normalized using the parameters of the pretrained CNN model before being fed into the CNN, as described in [12].
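The warping and normalization steps can be sketched as follows. The per-channel mean values are a common ImageNet assumption used only for illustration; in practice the actual values ship with the pretrained model.

```python
import numpy as np

# Illustrative mean values only; the real per-channel means are loaded from
# the pretrained model's metadata in practice.
IMAGENET_MEAN_RGB = np.array([123.68, 116.78, 103.94])

def warp_to_cnn_input(img, size=224, mean_rgb=IMAGENET_MEAN_RGB):
    """Warp an H x W x 3 image to size x size by nearest-neighbour sampling
    (no aspect-ratio preservation, as in the direct warping above) and
    subtract the per-channel mean used by the pretrained CNN."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size   # source row for each output row
    xs = np.arange(size) * w // size   # source column for each output column
    warped = img[ys][:, xs].astype(np.float64)
    return warped - mean_rgb           # broadcasts over the channel axis

sky = np.full((600, 660, 3), 128.0)    # a dummy, nearly square cloud image
x = warp_to_cnn_input(sky)
```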
2.2. Extracting features from the CNN

In the R-CNN described by Girshick et al. [7], the features output by the penultimate fully connected layer are used as the final features of an image region. As discussed in [9], R-CNN, as an object descriptor, does not perform as well as D-CNN as a texture descriptor in texture and material recognition. For distinguishing cloud categories, texture is more discriminative and useful than structure, because cloud shapes change in a disorderly way. Moreover, for some cloud categories the shapes themselves can be regarded as image texture, since clouds may not cover the whole image region. Therefore, our assumption that a D-CNN-like approach can outperform an R-CNN-like approach in cloud classification will be tested in the following experiments.
To obtain these features, we use a simple and efficient MATLAB toolbox implementing CNNs [12], together with the off-the-shelf pretrained model described in [13], which has 5 convolutional layers and 3 fully connected layers and is named “imagenet-vgg-m”. The structure of this model is also shown in Fig. 1.
2.3. Encoding the features with the Fisher Vector

If we used the features from the fully connected layer of the CNN directly, as R-CNN does [7], no Fisher Vector encoding would be needed. Since we instead use the features from the convolutional layer, as D-CNN does [9], the Fisher Vector is applied to obtain the image representation by encoding the CNN features.
The Fisher Vector is a method for encoding features. Under the assumption that all descriptors of an image are independent and identically distributed according to a Gaussian mixture model (GMM), the gradient vectors of the likelihood function with respect to the GMM parameters describe the direction in which the parameters should be modified to best fit the data. Normalizing these vectors yields the Fisher Vector, which has been shown to be very useful in image classification [8].
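The encoding step can be sketched as follows, assuming a diagonal-covariance GMM with K components fitted beforehand. This follows the improved Fisher Vector of [8] (mean and variance gradients, power and L2 normalization); it is a simplified sketch, not the VLFeat implementation actually used in the paper.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma2):
    """Fisher Vector of descriptors X (N x D) under a diagonal GMM with
    mixture weights w (K,), means mu (K x D), and variances sigma2 (K x D).
    Returns the 2*K*D vector of mean and variance gradients, with the
    power and L2 normalisation of the improved FV."""
    N, D = X.shape
    # posterior responsibilities gamma[n, k], computed stably in log space
    diff = (X[:, None, :] - mu) / np.sqrt(sigma2)             # N x K x D
    log_p = np.log(w) - 0.5 * (diff ** 2 + np.log(2 * np.pi * sigma2)).sum(-1)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                 # N x K
    # gradients w.r.t. means and variances, normalised per component
    g_mu = (gamma[:, :, None] * diff).sum(0) / (N * np.sqrt(w)[:, None])
    g_var = (gamma[:, :, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalisation

# toy usage: 169 descriptors (as from conv5), but only 8-dimensional here
rng = np.random.default_rng(0)
X = rng.standard_normal((169, 8))
K, D = 4, 8
w = np.full(K, 1.0 / K)
mu = rng.standard_normal((K, D))
sigma2 = np.ones((K, D))
fv = fisher_vector(X, w, mu, sigma2)              # length 2*K*D = 64
```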
For each cloud image, the features output by the 5th convolutional layer of the “imagenet-vgg-m” model form a 13 × 13 × 512-dimensional volume. This can be regarded as 169 (13 × 13) descriptors, each of which is a 512-dimensional feature. 64 Gaussian components are then used to encode these descriptors via the Fisher Vector, resulting in a 65K-dimensional descriptor per input image. The FV is implemented using the open-source VLFeat library [14].

Fig. 2. The images from 6 different cloud categories.
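The dimensions quoted for the conv5 features and the resulting FV descriptor can be checked directly:

```python
import numpy as np

# conv5 of "imagenet-vgg-m" yields a 13 x 13 x 512 volume per image;
# each spatial position is treated as one local descriptor
conv5 = np.zeros((13, 13, 512))
descriptors = conv5.reshape(-1, conv5.shape[-1])
assert descriptors.shape == (169, 512)

# an FV with K Gaussian components over D-dim descriptors has 2*K*D entries
# (mean and variance gradients), which matches the 65K figure in the text
K, D = 64, 512
fv_dim = 2 * K * D
print(fv_dim)  # 65536
```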
2.4. Classification using SVM
An great effect of dimension raise via Fisher Vector is mak-
ing the samples more linearly separable. Therefore, Linear
SVM (built from the library package of LIBLINEAR [15]) is
applied for classification with the final fea-tures represented
by FV following CNN.
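As a rough stand-in for the LIBLINEAR solver, a one-vs-rest linear SVM can be trained by sub-gradient descent on the regularized hinge loss. The function names and hyperparameters here are illustrative, not the paper's settings.

```python
import numpy as np

def train_linear_svm(X, y, n_classes, lam=1e-3, lr=0.1, epochs=50, seed=0):
    """One-vs-rest linear SVM via sub-gradient descent on the hinge loss.
    X: (N, D) features, y: (N,) integer labels. Returns weights W (C, D)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t = np.where(np.arange(n_classes) == y[i], 1.0, -1.0)  # +1/-1 targets
            grad = lam * W                       # regularisation term
            violated = t * (W @ X[i]) < 1.0      # margin-violating classifiers
            grad[violated] -= t[violated, None] * X[i]
            W -= lr * grad
    return W

def predict(W, X):
    return (X @ W.T).argmax(axis=1)

# toy usage on linearly separable data with 3 classes
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 20)
centers = 6.0 * np.eye(3)
X = centers[y] + rng.standard_normal((60, 3))
W = train_linear_svm(X, y, n_classes=3)
acc = (predict(W, X) == y).mean()
```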
3. Experiments

To verify our approach, which uses CNN features and FV encoding, we test it on a challenging ground-based cloud image dataset, the “6 class HUST cloud” collected by Zhuo et al. [6]. It contains 1231 images from 6 classes: cirrocumulus and altocumulus; cirrus and cirrostratus; cumulus; stratocumulus; stratus and altostratus; and clear sky. Fig. 2 shows some images from each class; more details can be found in [6].
To compare with the approach of Zhuo et al. [6], we use the same experimental setup. The experiments are divided into 5 groups, with 5, 10, 20, 40, and 80 training samples per class, respectively; the remaining samples are used as the testing set. Furthermore, the available samples are split randomly into training and testing sets for 10 rounds. The effectiveness and robustness of the different methods are assessed using the average accuracy and standard deviation over the 10 rounds of testing, for each of the 5 groups with different numbers of training samples.
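The evaluation protocol above can be sketched as a reusable routine, with the classifier plugged in as a pair of fit/predict callables. The nearest-centroid classifier in the usage example is a toy stand-in, not one of the compared methods.

```python
import numpy as np

def evaluate(X, y, fit, predict, n_train_per_class, rounds=10, seed=0):
    """Average accuracy and standard deviation over `rounds` random splits:
    n_train_per_class samples per class for training, the rest for testing."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(rounds):
        train_idx, test_idx = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            train_idx.extend(idx[:n_train_per_class])
            test_idx.extend(idx[n_train_per_class:])
        model = fit(X[train_idx], y[train_idx])
        pred = predict(model, X[test_idx])
        accs.append(float((pred == y[test_idx]).mean()))
    return float(np.mean(accs)), float(np.std(accs))

# toy usage: nearest-centroid classifier on well-separated synthetic data
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 30)
X = rng.standard_normal((90, 2)) + 8.0 * y[:, None]
fit = lambda Xt, yt: np.stack([Xt[yt == c].mean(0) for c in np.unique(yt)])
nearest = lambda C, Xs: ((Xs[:, None, :] - C) ** 2).sum(-1).argmin(1)
mean_acc, std_acc = evaluate(X, y, fit, nearest, n_train_per_class=5)
```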
Because the experimental setup is identical, the performance figures of the methods proposed by Zhuo et al. [6], Liu et al. [5], Heinle et al. [4], Calbó and Sabburg [3], and Isosalo et al. [2] on the “6 class HUST cloud” dataset, as listed in [6], are cited for comparison with ours.
Fig. 3. The image samples of 9 different finer cloud categories from the “6 class HUST cloud” dataset.

In addition, following the comparison made in [9], the approach that directly uses the outputs of the penultimate fully connected layer (named “fc7”) of the CNN as features is also included for comparison in this paper; we call this method “RCNN-like”. Our approach, inspired by D-CNN, uses the Fisher Vector encodings of the outputs of the 5th convolutional layer (named “conv5”) as features, and can be called “DCNN-like”.
Table 1 shows the classification results of the different approaches on the “6 class HUST cloud” dataset. The results clearly show that the DCNN-like approach outperforms the state-of-the-art methods in average accuracy over 10 rounds for all training sample sizes, demonstrating the effectiveness and generality of our approach for ground-based cloud image classification.
On the other hand, as described in [9], D-CNN is a better texture descriptor than R-CNN, which has been considered a good object descriptor. This is because the features output by the fully connected layer of a CNN include the spatial location information of local image patches, which can indicate position, shape, and contour; such location information is naturally less useful for describing texture than for describing objects. R-CNN uses the fully connected layer features directly, whereas D-CNN pools over image parts by encoding the convolutional layer features with the Fisher Vector. This is why D-CNN performs better than R-CNN in texture classification in [9]. For cloud classification, the images usually contain both cloud pixels and sky pixels, and the position of the cloud pixels or patches should not affect the classification of the cloud. Since the clouds in our images rarely include the entire shape of a cloud, and the edges between sky and cloud can also be regarded as image texture, treating cloud classification as texture recognition is better than treating it as object classification. This is also supported by Table 1, which shows that our DCNN-like approach obtains the best performance, while the approach in [5] performs poorly because it uses only structural features, such as shape and positional relations, while ignoring texture information.

Training samples 5 10 20 40 80
Isosalo et al. [2] 39.8(±3.1) 47.7(±4.8) 58.3(±2.1) 66.5(±1.3) 72.2(±1.0)
Calbó et al. [3] 37.4(±2.4) 43.3(±3.8) 50.9(±2.0) 57.6(±3.0) 63.8(±1.2)
Heinle et al. [4] 34.4(±2.3) 37.7(±3.1) 44.9(±1.9) 51.5(±1.5) 56.8(±1.6)
Liu et al. [5] 29.0(±2.5) 30.6(±2.0) 33.8(±1.5) 38.1(±1.1) 41.1(±1.9)
Zhuo et al. [6] 45.2(±3.6) 53.5(±2.8) 66.2(±2.1) 74.7(±1.0) 79.8(±1.3)
RCNN-like 57.9(±4.5) 65.5(±2.5) 70.7(±1.2) 75.5(±1.2) 80.3(±1.3)
DCNN-like 55.8(±3.4) 65.4(±2.0) 74.4(±2.6) 79.2(±1.1) 83.8(±1.3)

Table 1. Cloud classification results (%) on the “6 class HUST cloud” dataset. Each method is tested on 5 groups (5, 10, 20, 40, or 80 samples selected randomly per class for training, with the rest used for testing). For each group, 10 rounds of experiments are carried out, and the average accuracy over the 10 rounds is reported.

cloud types cumulus cirrus cirrostratus cirrocumulus altocumulus clear sky stratocumulus stratus altostratus average
Isosalo et al. [2] 20.0 30.1 23.1 28.1 26.0 48.2 35.9 39.7 67.5 35.4
Calbó et al. [3] 25.2 23.7 46.5 43.9 50.4 61.7 59.7 68.9 66.1 49.6
Heinle et al. [4] 43.7 42.6 43.0 51.3 45.9 57.7 61.3 65.4 73.5 53.8
Liu et al. [5] 42.7 58.3 52.1 60.5 52.1 57.4 62.0 63.8 71.4 57.8
Zhuo et al. [6] 60.5 58.2 60.7 75.1 57.9 72.8 52.5 74.9 64.5 64.1
RCNN-like 77.5 65.0 68.9 90.0 60.5 94.6 63.1 96.8 73.4 76.7
DCNN-like 79.7 74.9 72.1 95.2 68.1 90.1 62.4 98.4 78.6 80.0

Table 2. Cloud classification results (%) under the finer categorization principle on the dataset. Experiments are carried out with 40 randomly selected training samples per class, using the rest of the dataset for testing. The classification accuracy of each cloud category is shown for each method.
Following Zhuo et al. [6], we refine the “6 class HUST cloud” dataset into 9 individual categories. As shown in Fig. 3, altocumulus and cirrocumulus are each treated as separate classes, whereas they are merged into a single class in the “6 class HUST cloud” dataset because these two categories are extremely similar. The same applies to cirrus and cirrostratus, and to stratus and altostratus; all images in the dataset are thus separated into 9 classes. This finer categorization principle makes cloud classification more challenging: the average accuracy of the state-of-the-art approach of Zhuo et al. [6] drops by about ten percentage points, from 74.7% to 64.1%, when 40 samples per class are used for training. Our approach is also tested under the finer categorization principle, and the methods in Table 1 are re-employed for comparison. Experiments are carried out under the same conditions as [6]. The classification results for each class and the overall average accuracy are reported in Table 2.
According to Table 2, our approaches “RCNN-like” and “DCNN-like” both outperform the other methods. Furthermore, compared with the results under the 6-class categorization principle listed in Table 1, the performance of our approach under the 6-class and 9-class principles is comparable. These promising results show the strong descriptive power of CNN features and the effectiveness of Fisher Vector encoding of those features; thus our approach has considerable potential for practical cloud classification applications.
In practice, excluding the loading of the CNN model and the learning of the GMM and SVM, processing a new test cloud image takes less than 3.1 s (about 30 ms to extract the CNN features and about 2 s for FV encoding) on an ordinary computer. This satisfies the requirements of the categorization task in a ground-based automatic cloud observation system.
4. Conclusion

In this paper, deep convolutional features are used for the first time as a new perception mechanism for ground-based cloud image categorization. They serve as strong low-level visual descriptors for cloud images, and Fisher Vector encoding can further improve the classification performance. According to the experimental results, our approach using deep convolutional features and the Fisher Vector significantly outperforms the state-of-the-art methods on the challenging “6 class HUST cloud” dataset, and its advantage is further highlighted under the finer categorization rule (9 classes).
In future work, we intend to extend the ground-based image dataset and test different convolutional neural network models. Moreover, fine-tuning the CNN models on our extended dataset could be considered to further improve cloud categorization performance.
References

[1] China Meteorological Administration, Specification for Ground Meteorological Observation, Cloud, China Meteorological Press, 2003.

[2] Antti Isosalo, Markus Turtinen, and Matti Pietikäinen, “Cloud characterization using local texture information,” in Proc. Finnish Signal Processing Symposium (FINSIG 2007), Oulu, Finland, 2007.

[3] Josep Calbó and Jeff Sabburg, “Feature extraction from whole-sky ground-based images for cloud-type recognition,” Journal of Atmospheric and Oceanic Technology, vol. 25, no. 1, pp. 3–14, 2008.

[4] Anna Heinle, Andreas Macke, and Anand Srivastav, “Automatic cloud classification of whole sky images,” Atmospheric Measurement Techniques Discussions, vol. 3, no. 1, pp. 269–299, 2010.

[5] Lei Liu, Xuejin Sun, Feng Chen, Shijun Zhao, and Taichang Gao, “Cloud classification based on structure features of infrared images,” Journal of Atmospheric and Oceanic Technology, vol. 28, no. 3, pp. 410–417, 2011.

[6] Wen Zhuo, Zhiguo Cao, and Yang Xiao, “Cloud classification of ground-based images using texture–structure features,” Journal of Atmospheric and Oceanic Technology, vol. 31, no. 1, pp. 79–92, 2014.

[7] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” arXiv preprint arXiv:1311.2524, 2013.

[8] Jorge Sánchez, Florent Perronnin, Thomas Mensink, and Jakob Verbeek, “Image classification with the Fisher vector: Theory and practice,” International Journal of Computer Vision, vol. 105, no. 3, pp. 222–245, 2013.

[9] Mircea Cimpoi, Subhransu Maji, and Andrea Vedaldi, “Deep convolutional filter banks for texture recognition and segmentation,” arXiv preprint arXiv:1411.6836, 2014.

[10] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.

[11] Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.

[12] A. Vedaldi and K. Lenc, “MatConvNet – convolutional neural networks for MATLAB,” CoRR, vol. abs/1412.4564, 2014.

[13] Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, “Return of the devil in the details: Delving deep into convolutional nets,” arXiv preprint arXiv:1405.3531, 2014.

[14] A. Vedaldi and B. Fulkerson, “VLFeat: An open and portable library of computer vision algorithms,” 2008.

[15] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
... At present, there are some works that combine deep learning with groundbased cloud type recognition. The deep models adopted include convolutional neural networks (CNN) [17,[39][40][41][42] and graph neural networks (GNN) [4,10,43]. Ye et al. [39] took the lead in introducing the deep learning model into the cloud type recognition of ground-based cloud images, and proposed an extraction method of high-level semantic features of ground cloud images using convolutional neural networks (CNN). ...
... The deep models adopted include convolutional neural networks (CNN) [17,[39][40][41][42] and graph neural networks (GNN) [4,10,43]. Ye et al. [39] took the lead in introducing the deep learning model into the cloud type recognition of ground-based cloud images, and proposed an extraction method of high-level semantic features of ground cloud images using convolutional neural networks (CNN). On this basis, they combined Fisher vector (FV) coding and SVM for cloud classification of ground-based cloud images. ...
Full-text available
Clouds have an enormous influence on the hydrological cycle, Earth’s radiation budget, and climate changes. Accurate automatic recognition of cloud shape based on ground-based cloud images is beneficial to analyze the atmospheric motion state and water vapor content, and then to predict weather trends and identify severe weather processes. Cloud type classification remains challenging due to the variable and diverse appearance of clouds. Deep learning-based methods have improved the feature extraction ability and the accuracy of cloud type classification, but face the problem of lack of labeled samples. In this paper, we proposed a novel classification approach of ground-based cloud images based on contrastive self-supervised learning (CSSL) to reduce the dependence on the number of labeled samples. First, data augmentation is applied to the input data to obtain augmented samples. Then contrastive self-supervised learning is used to pre-train the deep model with a contrastive loss and a momentum update-based optimization. After pre-training, a supervised fine-tuning procedure is adopted on labeled data to classify ground-based cloud images. Experimental results have confirmed the effectiveness of the proposed method. This study can provide inspiration and technical reference for the analysis and processing of other types of meteorological remote sensing data under the scenario of insufficient labeled samples.
... First, with statistics on trash amount, household numbers provided for each garbage collection programme were accessible. Second, in certain number of respondents, the number of homes serviced was smaller than the number of people in the town [25]. ...
... The built-in IoT can sense and sending the trash data over the Internet to the servers. The use of detectors is necessary if the garbage economic is to be sent to servers which are stored and used to monitor and start a removal process on a regular basis [25]. This means that the basket is removed long before it fills up and thus avoids the excessive waste collection when the baskets are not filled. ...
Full-text available
Waste management is one of the world's biggest challenges, either in the developed or the emerging economies. The biggest problem with pollution is that the compost heap is well flowed in public areas before the next sanitation period starts. Demographic expansion has caused the hygienic condition with regard to the waste management system to deteriorate considerably. Disposal of waste is a fundamental element in waste disposal. Gradually, the technologies of artificial intelligence (AI) gained popularity in offering different computer ways to solving intelligent waste problem. The management of misdefined issues, experiences and uncertainties and partial data were efficient for AI. Even though this work did conduct much study, very few evaluations demonstrated the influence of AI to resolve many difficulties of intelligent management of waste. Accurate evaluation of garbage amount and quality is critical to Smart waste management system development and design. However, it is a challenging task to anticipate the quantity of trash created, given the several characteristics and its variability. The framework utilized in this document is the convolution neural network, a suitable approach for estimating the waste mass.
... Among these, Ye et al. [20] employed a CNN to extract cloud map features and Fisher vector coding and an SVM classifier to classify cloud maps. By optimizing the pooled feature map, Shi et al. [21] obtained the depth features of the cloud graph for cloud identification. ...
Full-text available
Ground-based cloud images contain a wealth of cloud information and are an important part of meteorological research. However, in practice, ground cloud images must be segmented and classified to obtain the cloud volume, cloud type and cloud coverage. Existing methods ignore the relationship between cloud segmentation and classification, and usually only one of these is studied. Accordingly, our paper proposes a novel method for the joint classification and segmentation of cloud images, called CloudY-Net. Compared to the basic Y-Net framework, which extracts feature maps from the central layer, we extract feature maps from four different layers to obtain more useful information to improve the classification accuracy. These feature maps are combined to produce a feature vector to train the classifier. Additionally, the multi-head self-attention mechanism is implemented during the fusion process to enhance the information interaction among features further. A new module called Cloud Mixture-of-Experts (C-MoE) is proposed to enable the weights of each feature layer to be automatically learned by the model, thus improving the quality of the fused feature representation. Correspondingly, experiments are conducted on the open multi-modal ground-based cloud dataset (MGCD). The results demonstrate that the proposed model significantly improves the classification accuracy compared to classical networks and state-of-the-art algorithms, with classification accuracy of 88.58%. In addition, we annotate 4000 images in the MGCD for cloud segmentation and produce a cloud segmentation dataset called MGCD-Seg. Then, we obtain a 96.55 mIoU on MGCD-Seg, validating the efficacy of our method in ground-based cloud imagery segmentation and classification.
... Liu et al. [7] produced a MGCD dataset with 8000 groundbased cloud images and corresponding meteorological data, yielding an accuracy as high as 87.9% with multi-modal fusion algorithm. Ye et al. [8] introduced a CNN to extract the features of cloud images; fisher vector coding and SVM classifier are utilized for cloud images classification. Zhang et al. [9] proposed a CloudNet model and obtained a high accuracy on a self-built CCSN dataset containing 2543 cloud images. ...
Full-text available
Changes in clouds can affect the outpower of photovoltaics (PVs). Ground-based cloud images classification is an important prerequisite for PV power prediction. Due to the intra-class difference and inter-class similarity of cloud images, the classical convolutional network is obviously insufficient in distinguishing ability. In this paper, a classification method of ground-based cloud images by improved combined convolutional network is proposed. To solve the problem of sub-network overfitting caused by redundancy of pixel information, overlap pooling kernel is used to enhance the elimination effect of information redundancy in the pooling layer. A new channel attention module, ECA-WS (Efficient Channel Attention–Weight Sharing), is introduced to improve the network’s ability to express channel information. The decision fusion algorithm is employed to fuse the outputs of sub-networks with multi-scales. According to the number of cloud images in each category, different weights are applied to the fusion results, which solves the problem of network scale limitation and dataset imbalance. Experiments are carried out on the open MGCD dataset and the self-built NRELCD dataset. The results show that the proposed model has significantly improved the classification accuracy compared with the classical network and the latest algorithms.
Air quality index is use to identify how polluted the current air is and measures the level of pollution in air. Increasing AQI always been a matter of worry because of rapid increase in traffic, urbanization and pollutants. This paper aims to predict AQI of Delhi region during COVID-19 using time series modelling which is a machine learning algorithm. Time series modelling involves models to fit into collected dataset and use them to predict future values. The research is based on major pollutants like particulate matter, CO, SO, NO, NH3 and ozone. Data of the pollutants are collected from Central Pollution Control Board (CPCB), Government of India. Coefficient of determination of PM 10 is 0.95 and PM 2.5 is 0.82.
Multi-class problem for weather recognition is a challenging task in the area of computer vision and machine learning. This research work, a novel multi-view deep learning method called MMDeep, has been proposed to handle the weather recognition problem in a multi-class scenario. The proposed method obtains the multiple views (multiple distinct interpretations) from the images by using pixel-wise computer vision operations and then deploys multi-view deep learning using the same views. In this method, natural views and mathematical views are utilized to learned models, and collective performance is obtained using ensemble of results obtained from each of the views. The proposed method is compared to the baseline methods as well as other models proposed for the same task. The model produces mean accuracy of 0.9746 with a standard deviation of 0.0127, which is significantly better than all the models compared with this method. The statistical analysis of the results of proposed method also shows its effectiveness as compared to state-of-the-art methods.KeywordsWeather recognitionMulti-view deep learningEnsemble learningComputer visionMachine learning
Clouds are structures formed by ice crystals, water grains, or both that come together in the atmosphere for various reasons. Clouds have a direct impact on areas such as climate, ecological balance, and air traffic. It is now inevitable to knead the devices used to detect cloud types with artificial intelligence technologies. In this process, deep learning models have begun to be used in the detection of cloud types that are the result of meteorological events. In this study, two publicly available datasets of cloud types were used. In the proposed approach, super-resolution and semantic segmentation were applied as pre-processing steps. Then, feature sets were created using the ShuffleNet model. The binary sailfish optimization method was used for efficient feature selection and classification was performed using the linear discriminant analysis method. Overall accuracy successes of 98.56% and 100% were obtained for the two datasets used for cloud type classification. It was concluded that the approach proposed in this study is successful in cloud type detection.
Full-text available
The recently increasing development of whole sky imagers enables temporal and spatial high-resolution sky observations. One application already performed in most cases is the estimation of fractional sky cover. A distinction between different cloud types, however, is still in progress. Here, an automatic cloud classification algorithm is presented, based on a set of mainly statistical features describing the color as well as the texture of an image. The k-nearest-neighbour classifier is used due to its high performance in solving complex issues, simplicity of implementation and low computational complexity. Seven different sky conditions are distinguished: high thin clouds (cirrus and cirrostratus), high patched cumuliform clouds (cirrocumulus and altocumulus), stratocumulus clouds, low cumuliform clouds, thick clouds (cumulonimbus and nimbostratus), stratiform clouds and clear sky. Based on the Leave-One-Out Cross-Validation the algorithm achieves an accuracy of about 97%. In addition, a test run of random images is presented, still outperforming previous algorithms by yielding a success rate of about 75%, or up to 88% if only "serious" errors with respect to radiation impact are considered. Reasons for the decrement in accuracy are discussed, and ideas to further improve the classification results, especially in problematic cases, are investigated.
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
Cloud classification of ground-based images is a challenging task. Recent research has focused on extracting discriminative image features, which are mainly divided into two categories: 1) choosing appropriate texture features and 2) constructing structure features. However, simply using texture or structure features separately may not produce a high performance for cloud classification. In this paper, an algorithm is proposed that can capture both texture and structure information from a color sky image. The algorithm comprises three main stages. First, a preprocessing color census transform (CCT) is applied. The CCT contains two steps: converting red, green, and blue (RGB) values to opponent color space and applying census transform to each component. The CCT can capture texture and local structure information. Second, a novel automatic block assignment method is proposed that can capture global rough structure information. A histogram and image statistics are computed in every block and are concatenated to form a feature vector. Third, the feature vector is fed into a trained support vector machine (SVM) classifier to obtain the cloud type. The results show that this approach outperforms other existing cloud classification methods. In addition, several different color spaces were tested and the results show that the opponent color space is most suitable for cloud classification. Another comparison experiment on classifiers shows that the SVM classifier is more accurate than the k-nearest neighbor (k-NN) and neural network classifiers.
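The two CCT steps can be sketched as follows. This is a minimal sketch assuming one common opponent-colour definition (O1 = (R-G)/sqrt(2), O2 = (R+G-2B)/sqrt(6), O3 = (R+G+B)/sqrt(3)); the paper's exact normalisation and block statistics may differ.

```python
import numpy as np

def opponent_channels(rgb):
    # Assumed opponent-colour definition; rgb has shape (H, W, 3)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return [(R - G) / np.sqrt(2),
            (R + G - 2 * B) / np.sqrt(6),
            (R + G + B) / np.sqrt(3)]

def census_transform(channel):
    # 3x3 census transform: an 8-bit code per interior pixel, one bit per
    # neighbour, set when the neighbour value is >= the centre value
    h, w = channel.shape
    centre = channel[1:-1, 1:-1]
    code = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = channel[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbour >= centre).astype(np.uint8) << bit
    return code

# On a flat region every neighbour equals the centre, so all 8 bits are set
flat_codes = census_transform(np.ones((4, 4)))
grey_opponent = opponent_channels(np.full((2, 2, 3), 0.5))
```

Per-block histograms of these 8-bit codes would then be concatenated into the feature vector fed to the SVM.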
Can a large convolutional neural network trained for whole-image classification on ImageNet be coaxed into detecting objects in PASCAL? We show that the answer is yes, and that the resulting system is simple, scalable, and boosts mean average precision, relative to the venerable deformable part model, by more than 40% (achieving a final mAP of 48% on VOC 2007). Our framework combines powerful computer vision techniques for generating bottom-up region proposals with recent advances in learning high-capacity convolutional neural networks. We call the resulting system R-CNN: Regions with CNN features. The same framework is also competitive with state-of-the-art semantic segmentation methods, demonstrating its flexibility. Beyond these results, we execute a battery of experiments that provide insight into what the network learns to represent, revealing a rich hierarchy of discriminative and often semantically meaningful features.
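Detection quality in pipelines like this is conventionally judged by the overlap between proposed and ground-truth boxes, measured as intersection-over-union (IoU). A minimal sketch of that measure (not code from R-CNN itself):

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)
```

A proposal typically counts as a correct detection when its IoU with the ground-truth box exceeds a threshold such as 0.5, which is how the mAP figures quoted above are computed.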
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
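Why depth with small filters works can be checked with the standard receptive-field recurrence. A sketch (not code from the paper): a stack of three 3x3 convolutions sees the same 7x7 input region as one 7x7 convolution, while using roughly 27C^2 instead of 49C^2 weights per C-channel block.

```python
def receptive_field(layers):
    # layers: (kernel_size, stride) pairs applied input -> output;
    # returns the input-pixel extent seen by a single output unit
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) * jump
        jump *= s             # stride compounds the step between outputs
    return rf

stacked = receptive_field([(3, 1), (3, 1), (3, 1)])  # three 3x3 convs
single = receptive_field([(7, 1)])                   # one 7x7 conv
with_pool = receptive_field([(3, 1), (2, 2), (3, 1)])  # conv, 2x2 pool, conv
```

The extra non-linearities between the stacked layers are the other half of the argument for depth.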
MatConvNet is an implementation of Convolutional Neural Networks (CNNs) for MATLAB. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing linear convolutions with filter banks, feature pooling, and much more. In this manner, MatConvNet allows fast prototyping of new CNN architectures; at the same time, it supports efficient computation on CPU and GPU, allowing complex models to be trained on large datasets such as ImageNet ILSVRC. This document provides an overview of CNNs and how they are implemented in MatConvNet, and gives the technical details of each computational block in the toolbox.
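The two building blocks named above, filter-bank convolution and feature pooling, can be sketched in NumPy rather than MATLAB. This naive loop version illustrates the computation only; it is not MatConvNet's API, and real toolboxes use far faster implementations.

```python
import numpy as np

def conv2d_bank(x, filters):
    # Correlate a single-channel image x (H x W) with a bank of K filters
    # (K x kh x kw); 'valid' padding, stride 1 (the cross-correlation that
    # CNN "convolution" layers actually compute)
    K, kh, kw = filters.shape
    H, W = x.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * filters[k])
    return out

def max_pool(fmaps, p=2):
    # Non-overlapping p x p max pooling applied to each feature map
    K, H, W = fmaps.shape
    cropped = fmaps[:, :H - H % p, :W - W % p]
    return cropped.reshape(K, H // p, p, W // p, p).max(axis=(2, 4))

x = np.arange(16.0).reshape(4, 4)
delta = np.zeros((1, 3, 3))
delta[0, 1, 1] = 1.0            # identity (delta) filter
responses = conv2d_bank(x, delta)  # reproduces the 2x2 interior of x
pooled = max_pool(responses, 2)
```

Stacking such blocks, with non-linearities in between, is exactly what a CNN toolbox automates.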
Research in texture recognition often concentrates on the problem of material recognition in uncluttered conditions, an assumption rarely met by applications. In this work we conduct a first study of material and describable texture attributes recognition in clutter, using a new dataset derived from the OpenSurfaces texture repository. Motivated by the challenge posed by this problem, we propose a new texture descriptor, D-CNN, obtained by Fisher Vector pooling of a Convolutional Neural Network (CNN) filter bank. D-CNN substantially improves the state-of-the-art in texture, material and scene recognition. Our approach achieves 82.3% accuracy on the Flickr material dataset and 81.1% accuracy on MIT indoor scenes, providing absolute gains of more than 10% over existing approaches. D-CNN easily transfers across domains without requiring the feature adaptation needed by methods that build on the fully-connected layers of CNNs. Furthermore, D-CNN can seamlessly incorporate multi-scale information and describe regions of arbitrary shapes and sizes. Our approach is particularly suited to localizing "stuff" categories and obtains state-of-the-art results on the MSRC segmentation dataset, as well as promising results on recognizing materials and surface attributes in clutter on the OpenSurfaces dataset.
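Fisher Vector pooling of local descriptors can be sketched as follows. The GMM parameters are assumed given here (in practice they are fitted to training descriptors), and the random inputs are a stand-in for dense CNN filter-bank responses; this is a hedged sketch of the improved FV, not the paper's code.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    # Improved Fisher Vector of local descriptors X (N x D) under a
    # diagonal-covariance GMM: weights w (K,), means mu (K x D), stds sigma (K x D)
    N, D = X.shape
    K = len(w)
    # Soft-assignment posteriors gamma (N x K), computed stably in log space
    logp = np.stack(
        [-0.5 * np.sum(((X - mu[k]) / sigma[k]) ** 2
                       + np.log(2 * np.pi * sigma[k] ** 2), axis=1)
         for k in range(K)], axis=1) + np.log(w)
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)
    parts = []
    for k in range(K):
        g = gamma[:, [k]]
        z = (X - mu[k]) / sigma[k]
        parts.append((g * z).sum(axis=0) / (N * np.sqrt(w[k])))               # grad wrt mu_k
        parts.append((g * (z ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * w[k])))  # grad wrt sigma_k
    fv = np.concatenate(parts)                # length 2 * K * D
    fv = np.sign(fv) * np.sqrt(np.abs(fv))    # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)  # L2 normalisation

rng = np.random.default_rng(0)
descs = rng.normal(size=(200, 3))  # stand-in for CNN filter-bank responses
fv = fisher_vector(descs, np.array([0.5, 0.5]),
                   np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
                   np.ones((2, 3)))
```

Because the encoding aggregates descriptors from any set of locations, it can pool over regions of arbitrary shape and size, which is the property the abstract highlights.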
The latest generation of Convolutional Neural Networks (CNN) have achieved impressive results in challenging benchmarks on image recognition and object detection, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This paper conducts a rigorous evaluation of these new techniques, exploring different deep architectures and comparing them on a common ground, identifying and disclosing important implementation details. We identify several useful properties of CNN-based representations, including the fact that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance. We also identify aspects of deep and shallow methods that can be successfully shared. A particularly significant one is data augmentation, which achieves a boost in performance in shallow methods analogous to that observed with CNN-based methods. Finally, we are planning to provide the configurations and code that achieve the state-of-the-art performance on the PASCAL VOC Classification challenge, along with alternative configurations trading-off performance, computation speed and compactness.
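The finding that CNN output dimensionality can be reduced without hurting performance is usually exploited with a plain PCA projection. A minimal sketch, where the synthetic low-rank features stand in for real CNN descriptors:

```python
import numpy as np

def pca_reduce(X, d):
    # Project N x D features onto their top-d principal components (via SVD
    # of the mean-centred data matrix)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

rng = np.random.default_rng(1)
# Hypothetical CNN descriptors: 64-D vectors with intrinsic dimension 5
feats = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 64))
reduced = pca_reduce(feats, 5)
```

Since the synthetic features have rank 5, the 5-D projection retains all of their variance; with real descriptors one would keep enough components to cover most of it.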