This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3014136, IEEE Journal
of Selected Topics in Applied Earth Observations and Remote Sensing
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 1
CDL: A cloud detection algorithm over land for
MWHS-2 based on the Gradient Boosting
Decision Tree
Shuxian Liu, Yan Yin*, Zhigang Chu, Shuai An
This study was supported by the National Natural Science Foundation of China
(41590873) and the National Key R&D Program of China (2018YFC1506603).
(Corresponding author: Yan Yin.)
Shuxian Liu, Yan Yin, and Zhigang Chu are with the Collaborative
Innovation Center on Forecast and Evaluation of Meteorological Disasters, Key
Laboratory for Aerosol-Cloud-Precipitation of China Meteorological
Administration, Nanjing University of Information Science & Technology,
Nanjing 210044, Jiangsu, China (e-mails: liushuxian@nuist.edu.cn,
yinyan@nuist.edu.cn, chuzhigang@nuist.edu.cn).
Shuai An is with the Smart Supply Chain Y Bu, JD.COM, Beijing 100000,
China (e-mail: anshuai1@jd.com).
Abstract—This paper presents a stand-alone cloud detection
algorithm over land (CDL) for the Microwave Humidity Sounder-2
(MWHS-2), the first operational satellite sensor measuring at
118.75 GHz. The CDL is based on the Gradient Boosting Decision
Tree (GBDT), an advanced machine learning (ML) algorithm that
achieves state-of-the-art performance on tabular data, with high
accuracy, fast training speed, strong generalization ability, and a
weight-factor ranking of predictors (or features). Given that the
new-generation weather radar of China (CINRAD) provides improved
cloud information with extensive temporal-spatial coverage,
CINRAD observations are used to train the algorithm in this study.
Four groups of radiometric information are employed to evaluate
the CDL: all frequency ranges of MWHS-2 (all-algorithm), the
humidity channels near 183.31 GHz (hum-algorithm), the temperature
channels near 118.75 GHz (tem-algorithm), and the window
channels at 89 and 150 GHz (win-algorithm). The tem-algorithm
(around 118.75 GHz) performs best, with the optimal values of
most evaluation metrics. Although the all-algorithm uses all
available frequencies, it is inferior to the tem-algorithm. The
win-algorithm and hum-algorithm follow, with the win-algorithm
performing better of the two. The analysis also indicates that the
latitude, zenith angle, and azimuth are the top-ranking features
for all four algorithms. The CDL can be applied in the quality
control of microwave (MW) radiance assimilation or for cloud
filtering in the retrieval of atmospheric and surface parameters.
Index Terms—MWHS-2, cloud detection, machine learning,
ground-based radar.
I. INTRODUCTION
In contrast to visible and infrared satellite observations,
which can only sense the radiation from the top of clouds,
microwave radiation can penetrate most non-precipitating
clouds, giving microwave sounders a better ability to sense cloud
particles [1]–[2]. Moreover, microwave temperature and
humidity sounders acquire brightness temperatures (BTs) in
multiple channels, providing rich information for profiling
atmospheric temperature and moisture. However, the measured
brightness temperature is affected by many factors, including
water vapor, cloud and rain contamination, and surface
emissivity.
Over the ocean, a number of cloud detection methods for
MW observations have been developed. The amount of cloud
liquid water path (LWP) can be obtained from the two window
channels (at 23.8 GHz and 31.4 GHz) [3]–[4]. Two other
window channels at 89 GHz and 150 GHz can be used to
retrieve the cloud scattering index [5] and cloud ice water path
(IWP) [6]. Using the dual oxygen absorption bands (at 50–60
GHz and 118.75 GHz) of the microwave temperature sounder
(MWTS) and microwave humidity sounder (MWHS), several
pairs of oxygen channels can be applied to compute the cloud
emission and scattering index at different height levels [7].
Buehler et al. [8] developed a cloud filter method based on the
brightness temperature differences of channels around the water
vapor line at 183.31 GHz.
Nonetheless, cloud detection over land is still challenging,
because the impact of the land surface on the brightness
temperature is much larger than that of clouds and the surface
emissivity changes spatially and temporally with latitude
and season. Despite these difficulties, several approaches have
been proposed, ranging from physically based detection methods
to statistical techniques aided by various cloud products.
Cloud detection is well suited to machine learning
(ML) techniques [9]–[13], as it is a classification problem that
involves multivariate analysis and complex nonlinear
relationships between variables.
Therefore, a number of ML-based cloud detection methods
have been developed for the different frequency ranges
available on MW instruments. The temperature
channels (between 50 and 60 GHz), window channels (at
23.8, 31.4, 89, and 150 GHz), and humidity channels (around
183.31 GHz) have already been employed in a cloud
classification model for the Advanced Microwave Sounding Unit-A
(AMSU-A) and Advanced Microwave Sounding Unit-B
(AMSU-B), using a Neural Network (NN) method trained
with the cloud classification products of the Meteosat Second
Generation Spinning Enhanced Visible and Infrared Imager
(SEVIRI) [1]. Furthermore, the window channels from 19 GHz
to 91 GHz have been shown to perform better than the humidity
channels near 183.31 GHz over land in cloud detection [14],
based on the Naïve Bayes (NB) classifier for the Special Sensor
Microwave Imager/Sounder (SSMIS). Favrichon et al. [15]
used an NN model trained on the SEVIRI cloud products, and
their analysis shows that the confidence of the cloud index increases
with the number of channel frequencies, with an accuracy of more
than 70% in detecting cloud contamination.
However, the channels near 118.75 GHz have
not yet been evaluated or compared with the humidity and
window channels in ML-based cloud detection methods. As
previous studies indicate, the BTs at 118.75 GHz show an
extremely strong dependence on cloud particles [16]. Thus, it is
worthwhile to incorporate the radiometric information at 118.75
GHz into the ML algorithm and assess its performance for
cloud detection. The Gradient Boosting Decision Tree (GBDT)
algorithm achieves state-of-the-art performance on tabular
data (as opposed to image data) across a wide range of sample
sizes, from fewer than 1,000 to more than 1 million, and the ensemble
properties of GBDT may help to avoid overfitting. Radar
measurements are widely recognized as relatively accurate
labels in ML-based cloud classifier models [17]. Compared
with the cloud radar, ground-based weather radar has wider
temporal and spatial coverage, thus having obvious superiority
in collocating with polar-orbiting satellite measurements. Here,
observations from China’s new-generation weather radar
(CINRAD) are used to train the GBDT classifier.
The purpose of this study is to develop a stand-alone cloud
detection algorithm over land for MWHS-2 in order to identify
cloudy scenes prior to assimilating the radiances into
numerical weather prediction (NWP) models or retrieving
atmospheric and surface parameters. The CDL model is
developed for multiple frequency ranges in order to compare
the results of 118.75 GHz with those of the humidity and window channels.
This paper is organized as follows. The satellite and radar
datasets are described in Section II. The cloud detection
algorithm is presented in Section III. In Section IV, the results are
presented, focusing on the performance of CDL for MWHS-2. The
conclusions are summarized in Section V.
II. DATA
The MWHS-2 on board FY-3C is a 15-channel cross-track
scanning microwave radiometer, with eight channels in the
oxygen absorption line (118.75 GHz), five channels in the water
vapor absorption line (183.31 GHz), and two window channels
at 89 and 150 GHz. The characteristics of MWHS-2 are
illustrated in Table 1, containing the peak weighting functions
and the central frequencies for the 15 channels. As also shown
in Table 1, channels 2–9 are for temperature sounding from 20
hPa to 1000 hPa, channels 11–15 are for humidity sounding
from 450 hPa to 800 hPa, and channels 1 and 10 are two
window channels.
In this study, we select eight S-band weather radars located
in East China from the China new-generation weather radar
S-band A-type (CINRAD-SA) network, each with an effective range of
~230 km (Figure 1). The quality control of radar data includes a
fuzzy logic clutter filter (http://www.weather.gov/code88d/), a
median filter, and reflectivity bias correction [18]–[19]. The
two-dimensional (2-D) composite reflectivity is generated
using the Severe Weather Automatic Nowcast System (SWAN)
[20], which was developed by the China Meteorological
Administration (CMA). For each MWHS-2 overpass of East China, the
composite reflectivity field closest in time to the overpass is
selected, and the reflectivity values falling within each MWHS-2
field-of-view (FOV) are averaged to give one value per FOV.
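The collocation step described above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the circular FOV footprint, the `radius_km` parameter, the haversine distance, and averaging directly in dBZ are all assumptions made here, since the text does not specify them.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_in_time(scan_times, overpass_time):
    """Index of the radar composite whose time is closest to the overpass."""
    return min(range(len(scan_times)), key=lambda i: abs(scan_times[i] - overpass_time))


def fov_mean_reflectivity(pixels, fov_lat, fov_lon, radius_km):
    """Mean reflectivity of composite pixels inside a circular FOV.

    pixels: iterable of (lat, lon, dbz). The circular footprint and the
    plain average of dBZ values are assumptions of this sketch.
    Returns None when no pixel falls inside the FOV.
    """
    inside = [dbz for lat, lon, dbz in pixels
              if great_circle_km(lat, lon, fov_lat, fov_lon) <= radius_km]
    return sum(inside) / len(inside) if inside else None


# Hypothetical example values (times in seconds, positions in degrees).
scan_index = nearest_in_time([0, 360, 720], 400)
pixels = [(32.0, 118.0, 10.0), (32.05, 118.0, 20.0), (35.0, 118.0, 50.0)]
mean_dbz = fov_mean_reflectivity(pixels, 32.0, 118.0, 15.0)
```

The distant third pixel is excluded by the radius test, so only the two nearby pixels contribute to the FOV average.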
TABLE I
INSTRUMENT CHARACTERISTICS OF MWHS-2.
Channel   Center Frequency (GHz)   Peak Weighting Function (hPa)   Sounding
1         89.0                     surface                         window
2         118.75 ± 0.08            20                              temperature
3         118.75 ± 0.2             60                              temperature
4         118.75 ± 0.3             100                             temperature
5         118.75 ± 0.8             250                             temperature
6         118.75 ± 1.1             300                             temperature
7         118.75 ± 2.5             700                             temperature
8         118.75 ± 3.0             surface                         temperature
9         118.75 ± 5.0             surface                         temperature
10        150.0                    surface                         window
11        183.31 ± 1               450                             humidity
12        183.31 ± 1.8             500                             humidity
13        183.31 ± 3               600                             humidity
14        183.31 ± 4.5             700                             humidity
15        183.31 ± 7               800                             humidity
Figure 1. Spatial distribution of radar network in East China.
Radar reflectivity factor is often used for cloud detection
[21]–[25]. In this study, a scene is flagged as cloudy when
the radar reflectivity exceeds 5 dBZ. To assess the sensitivity of
each channel to clouds, the Probability Density Functions (PDFs)
of BTs are examined for clear and cloudy scenes, as shown in
Figure 2. It seems that the presence of clouds tends to decrease
the observed BTs over land. The mean value and standard
deviation in cloudy scenes are greater than that in clear scenes,
especially for the window channels 1 and 10, temperature-
sounding channels 7–9, and humidity-sounding channels 13–15.
The rather different distributions for these channels are quite
promising for detecting the cloud contamination.
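As a minimal sketch of the labelling rule and of an empirical PDF like those compared in Figure 2, the following uses only the 5 dBZ threshold stated in the text. The bin edges and the sample BT values are hypothetical, chosen purely for illustration.

```python
def label_cloudy(reflectivity_dbz, threshold_dbz=5.0):
    """Return 1 (cloudy) when reflectivity exceeds the threshold, else 0 (clear)."""
    return 1 if reflectivity_dbz > threshold_dbz else 0


def histogram_pdf(values, lo, hi, n_bins):
    """Normalized histogram: bin densities that integrate to 1 over [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / (total * width) if total else 0.0 for c in counts]


# Hypothetical brightness temperatures (K) for a handful of clear scenes.
clear_bts = [280.0, 282.0, 284.0, 286.0]
pdf = histogram_pdf(clear_bts, lo=270.0, hi=290.0, n_bins=4)
```

Computing one such PDF per channel for the clear and cloudy subsets gives the kind of comparison the paragraph above describes.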
Figure 2. Probability Density Function (PDF) of brightness temperature under
clear (black solid line) and cloudy (green solid line) scenes for the 15 channels
(CHs) of MWHS-2.
III. AN ALGORITHM FOR CLOUD DETECTION OVER LAND (CDL)
A. Gradient Boosting Decision Tree (GBDT) algorithm
The Gradient boosting decision tree (GBDT) [26] is an
iterative decision tree algorithm and is also known as MART
(Multiple Additive Regression Tree). Due to its strong
generalization ability, the GBDT has been widely used in many
machine learning scenarios, such as click-through rate
estimation, search ranking, and commodity sales forecasting
[27]–[30].
The GBDT is based on a boosting strategy, which is a
primary method of ensemble learning [31]. It constructs a set of
weak learners (trees) and accumulates the results of multiple
decision trees as the final predicted output. Specifically, the
new decision tree learns the error residuals of all previous trees
in each iteration, thereby generating a stronger base model. The
boosting tree model can be expressed as an additive model of
the decision tree:
$$f_M(x) = \sum_{m=1}^{M} T(x; \Theta_m), \tag{1}$$
where $T(x; \Theta_m)$ is the base decision tree model; $x$ is the feature
vector; $\Theta_m$ is the parameter of the decision tree; and $M$ is the
number of trees. The boosting tree model algorithm proceeds as
follows:
Initialize the first base model:
$$f_0(x) = 0. \tag{2}$$
For $m = 1$ to $M$, calculate the error residual
$$r_{mi} = y_i - f_{m-1}(x_i), \tag{3}$$
$i = 1, 2, \ldots, N$, where $N$ is the sample size, and $y_i$ is the label of
$x_i$.
Fit the error residual $r_{mi}$ to learn a regression tree and obtain
$T(x; \Theta_m)$. Update
$$f_m(x) = f_{m-1}(x) + T(x; \Theta_m). \tag{4}$$
In the above algorithm, the most important step is to calculate
the error residual $r_{mi}$. For various loss functions $L$, GBDT uses
the idea of steepest descent: that is, it uses the negative gradient
of the loss function
$$-\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{m-1}(x)}, \tag{5}$$
to approximate the residuals, thus obtaining a general
framework.
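The boosting procedure of Eqs. (1)–(4) can be illustrated with a toy implementation. This is a sketch under simplifying assumptions that are not taken from the paper: a single scalar feature, depth-1 regression trees (stumps), squared-error loss (for which the negative gradient equals the residual), and no shrinkage. It is not the LightGBM implementation used in this study.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold, one constant per side."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x identical: predict the mean residual everywhere.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees):
    """Return the list of trees T(x; Theta_m), m = 1..M."""
    trees = []
    preds = [0.0] * len(xs)                               # Eq. (2): f_0(x) = 0
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]    # Eq. (3)
        stump = fit_stump(xs, residuals)                  # fit tree to residuals
        trees.append(stump)
        preds = [p + stump_predict(stump, x)              # Eq. (4)
                 for p, x in zip(preds, xs)]
    return trees


def predict(trees, x):
    """Eq. (1): the additive model is the sum of all trees."""
    return sum(stump_predict(t, x) for t in trees)


def mse(trees, xs, ys):
    return sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Because each stump is a least-squares fit to the current residuals, the training error cannot increase as trees are added, which is the sense in which each new tree "learns the error residuals of all previous trees".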
LightGBM [32] is a widely used gradient
boosting framework that implements the GBDT-based learning
algorithm. It is designed to be distributed and efficient, with
advantages such as good accuracy, fast training speed,
and low memory usage. Although the performance of LightGBM
is problem dependent, in many cases
it has been found to be more accurate and faster
than other GBDT tools, including the Scikit-learn implementation
(https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)
and XGBoost [33].
In this paper, we use LightGBM to learn the cloud classification
model.
The GBDT has many parameters that may affect the performance of the
model, so they generally need to be tuned. In this study, eight
tuning parameters are used, and their
dynamic ranges are summarized in Table 2. A grid search is
used to find the optimal parameter combination, which means
that 29,160 (3×3×4×3×3×3×6×5) parameter combinations are
evaluated, as indicated in Table 2.
TABLE II
SUMMARY OF TUNING PARAMETERS AND THEIR DYNAMIC RANGES
Parameter and dynamic range
1. max number of leaves in one tree (num_leaves): [50, 100, 150]
2. maximum depth of the tree (max_depth): [7, 10, 15]
3. max number of bins (max_bin): [100, 150, 200, 255]
4. minimal number of data in one leaf (min_leaf): [100, 150, 200]
5. fraction of features randomly selected on each tree (feature_fraction): [0.6, 0.8, 1.0]
6. fraction of data used for each iteration, generally used to speed up the training and avoid overfitting (bagging_fraction): [0.6, 0.8, 1.0]
7. shrinkage rate (learning_rate): [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
8. number of boosting iterations (n_estimators): [100, 200, 300, 400, 500]
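The size of the search grid in Table II can be checked by enumerating it. The sketch below only builds the parameter combinations; the dictionary keys follow the names printed in Table II, and whether they match the exact argument names of a given LightGBM version is not assumed here.

```python
from itertools import product

# Dynamic ranges copied from Table II.
PARAM_GRID = {
    "num_leaves": [50, 100, 150],
    "max_depth": [7, 10, 15],
    "max_bin": [100, 150, 200, 255],
    "min_leaf": [100, 150, 200],
    "feature_fraction": [0.6, 0.8, 1.0],
    "bagging_fraction": [0.6, 0.8, 1.0],
    "learning_rate": [0.01, 0.02, 0.05, 0.1, 0.2, 0.5],
    "n_estimators": [100, 200, 300, 400, 500],
}


def iter_param_combinations(grid):
    """Yield one dict per point of the Cartesian product of all ranges."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


n_combinations = sum(1 for _ in iter_param_combinations(PARAM_GRID))
first_combination = next(iter_param_combinations(PARAM_GRID))
```

Each yielded dictionary corresponds to one model fit in the grid search, which is why the search is costly when combined with cross validation.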
B. Training and testing datasets for CDL
The training dataset in this study contains 53,460 samples from 61 days in
July and August 2016, and the testing dataset contains 6,571 samples from 7 days
in September 2016. For robustness, the optimal parameters are
estimated with 5-fold cross validation on the
training data. Figure 3 shows the PDFs of the
training (blue solid line) and testing (red solid line) datasets.
The PDFs of the training and testing
datasets are similar with respect to the radar reflectivity (label).
Figure 3. The Probability Density Functions of training (blue solid line) and
testing (red solid line) datasets with respect to the radar reflectivity.
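A 5-fold split of the training set can be sketched as below. The contiguous, unshuffled folds are an assumption of this sketch; the paper does not state how the folds were drawn.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) for contiguous folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Fold sizes for the 53,460 training samples reported in the text.
folds = list(k_fold_indices(53460, n_folds=5))
fold_sizes = [len(validation) for _, validation in folds]
```

Each parameter combination is then scored on the five validation folds in turn, and the scores are averaged.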
We train the CDL prediction model with four algorithms
based on different frequency ranges: the
window channels (at 89 and 150 GHz) algorithm (win-algorithm),
the humidity channels (around 183.31 GHz)
algorithm (hum-algorithm), the temperature channels
(around 118.75 GHz) algorithm (tem-algorithm), and the all
frequency ranges algorithm (all-algorithm), as indicated in Table 3.
Besides the BT and BT difference (BTD), the latitude, zenith
angle, and azimuth are also included as features in the algorithms.
Figure 4 displays the processing flowcharts of CDL training and
prediction.
TABLE III
FEATURES FOR THE FOUR ALGORITHMS.
(x = feature used; - = not used; membership as given by the split numbers in Table VI)
Features                             Win-algorithm  Hum-algorithm  Tem-algorithm  All-algorithm
BT(89.0 GHz)                         x              -              -              x
BT(150.0 GHz)                        x              -              -              x
BTD(150.0 − 89.0 GHz)                x              -              -              x
BT(183.31±3 GHz)                     -              x              -              x
BT(183.31±4.5 GHz)                   -              x              -              x
BT(183.31±7 GHz)                     -              x              -              x
BTD(183.31±3 − 183.31±1 GHz)         -              x              -              x
BTD(183.31±7 − 183.31±1 GHz)         -              x              -              x
BTD(183.31±7 − 183.31±3 GHz)         -              x              -              x
BT(118.75±2.5 GHz)                   -              -              x              x
BT(118.75±3.0 GHz)                   -              -              x              x
BT(118.75±5.0 GHz)                   -              -              x              x
BTD(118.75±3.0 − 118.75±2.5 GHz)     -              -              x              x
BTD(118.75±5.0 − 118.75±2.5 GHz)     -              -              x              x
BTD(118.75±5.0 − 118.75±3.0 GHz)     -              -              x              x
latitude                             x              x              x              x
zenith angle                         x              x              x              x
azimuth                              x              x              x              x
Figure 4. The strategy and flowchart of the CDL algorithm.
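The feature sets of the four algorithms can be assembled from the 15 MWHS-2 channels as sketched below. The channel numbers follow Table I and the feature names follow Table III; the function name, the dictionary layout, and the example BT values are hypothetical.

```python
def build_features(bt, latitude, zenith_angle, azimuth, algorithm="all"):
    """Build the feature dict for one FOV.

    bt: dict mapping MWHS-2 channel number (1-15) to brightness temperature (K).
    algorithm: one of "win", "hum", "tem", "all".
    """
    groups = {
        "win": {
            "BT(89.0)": bt[1],
            "BT(150.0)": bt[10],
            "BTD(150.0-89.0)": bt[10] - bt[1],
        },
        "hum": {
            "BT(183.31±3)": bt[13],
            "BT(183.31±4.5)": bt[14],
            "BT(183.31±7)": bt[15],
            "BTD(183.31±3-183.31±1)": bt[13] - bt[11],
            "BTD(183.31±7-183.31±1)": bt[15] - bt[11],
            "BTD(183.31±7-183.31±3)": bt[15] - bt[13],
        },
        "tem": {
            "BT(118.75±2.5)": bt[7],
            "BT(118.75±3.0)": bt[8],
            "BT(118.75±5.0)": bt[9],
            "BTD(118.75±3.0-118.75±2.5)": bt[8] - bt[7],
            "BTD(118.75±5.0-118.75±2.5)": bt[9] - bt[7],
            "BTD(118.75±5.0-118.75±3.0)": bt[9] - bt[8],
        },
    }
    selected = ["win", "hum", "tem"] if algorithm == "all" else [algorithm]
    features = {}
    for name in selected:
        features.update(groups[name])
    # Geometry features are shared by all four algorithms.
    features["latitude"] = latitude
    features["zenith_angle"] = zenith_angle
    features["azimuth"] = azimuth
    return features


# Hypothetical brightness temperatures: channel n has 250 + n K.
example_bt = {ch: 250.0 + ch for ch in range(1, 16)}
win_features = build_features(example_bt, 32.0, 10.0, 90.0, algorithm="win")
```

The resulting feature counts (6, 9, 9, and 18) agree with the ranking ranges that appear in Table VI.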
C. Model performance metrics
The algorithms are evaluated quantitatively based on the
Accuracy, Precision, Recall, Area Under Curve (AUC), and log
loss. The Accuracy, Precision, and Recall metrics are calculated
according to the confusion matrix as shown in Figure 5.
Furthermore, the F1 score is also calculated in order to find the
optimum balance, or harmonic mean, between the Precision and
Recall:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{6}$$
AUC is the area under the receiver operating
characteristic (ROC) curve. It can be interpreted as the
probability that a randomly chosen positive sample receives a
higher predicted score than a randomly chosen negative sample. Therefore, AUC represents
the capability of the classifier to rank the samples. The larger the
AUC, the better the classification.
Log loss is defined on probability estimates and is
also called the logistic regression loss or cross-entropy loss.
For binary classification with a probability estimate
$p = \Pr(y = 1)$, the log loss per sample is the
negative log-likelihood of the classifier given the true label
$y \in \{0, 1\}$:
$$L_{\log}(y, p) = -\log \Pr(y \mid p) = -\left(y \log(p) + (1 - y) \log(1 - p)\right). \tag{7}$$
The closer the log loss is to 0, the better the model performs.
Figure 5. Confusion matrix and evaluation metrics.
D. Optimal prediction model parameter selection
For each algorithm, the GBDT model has been tuned over the
29,160 combinations of the eight parameters num_leaves,
bagging_fraction, learning_rate, n_estimators, max_depth,
max_bin, min_leaf, and feature_fraction (Table 2). The optimal
model is the one for which the log loss, which is widely used as an
effective reference, reaches its minimum value. The
optimal parameters selected for the four algorithms are
listed in Table 4.
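The metrics of Section III-C, including the log loss used as the selection criterion, can be sketched in a few lines. This is an illustrative implementation rather than the code used in the study; the pairwise AUC below is the direct probabilistic definition and is only practical for small samples. The one check on real numbers uses the diagonal counts and test-set size quoted in this paper for the tem-algorithm; all other inputs are toy values.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and their harmonic mean F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(n_correct, n_total):
    return n_correct / n_total


def mean_log_loss(y_true, p_pred):
    """Average of -(y log p + (1 - y) log(1 - p)) over all samples."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, p_pred)]
    return sum(losses) / len(losses)


def pairwise_auc(y_true, scores):
    """Probability that a positive sample outscores a negative one (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Tem-algorithm: 4580 + 1303 correctly classified out of 6571 test samples.
tem_accuracy = accuracy(4580 + 1303, 6571)
toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Rounded to three decimals, `tem_accuracy` reproduces the accuracy reported for the tem-algorithm in Table V.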
TABLE IV
OPTIMAL PREDICTION MODEL PARAMETERS SELECTION FOR THE FOUR ALGORITHMS.
Algorithm      num_leaves  max_depth  max_bin  min_leaf  feature_fraction  bagging_fraction  learning_rate  n_estimators
Win-algorithm  100         15         100      100       0.8               0.6               0.05           300
Hum-algorithm  50          15         100      100       0.6               0.6               0.02           500
Tem-algorithm  100         15         100      100       0.6               0.6               0.02           500
All-algorithm  100         15         100      100       0.8               0.6               0.02           500
IV. RESULTS AND DISCUSSION
A. Observations of the BT and radar reflectivity
The spatial distributions of radar reflectivity and brightness
temperature observations for channel 1, channels 7–10, and
channels 13–15 are displayed in Figure 6. The spatial
distribution of brightness temperature
corresponds well with that of the radar reflectivity. In clear scenes (0–5
dBZ), the BTs are ~285 K for channels 1 and 9–10, ~275 K for
channels 7–8, and 255–265 K for channels 13–15.
Generally, BT in the southern region is ~5 K higher than that in the
northern region, which can be explained by the effect of latitude:
in summer, the temperature is higher in regions closer
to the equator. In cloudy regions (>5 dBZ), the
BT depressions increase significantly with the radar
reflectivity due to scattering by cloud
hydrometeors, and the BTs fall to as low as ~230 K when
the radar reflectivity is above 25 dBZ.
Figure 6. Spatial distributions of radar reflectivity (a) and MWHS-2 brightness
temperature observations for channel 1 (b), channel 10 (c), channels 13–15 (d–
f), and channels 7–9 (g–i) on 4 July 2016.
B. Evaluation of the CDL results for MWHS-2
The confusion matrices (Figure 7) present the results of CDL,
showing the true class (y axis) versus the predicted class (x
axis), where 0 denotes the clear scene and 1 denotes the cloudy
scene. Each 2 × 2 matrix shows the overall class
statistics. The diagonal of the confusion matrix gives the
numbers of correctly classified clear (true positive, TP) and
cloudy (true negative, TN) scenes. The tem-algorithm
outperforms the other algorithms, with a relatively
high TP value of 4580 and TN value of 1303. The hum-algorithm
performs worst: its number of wrongly classified
clear scenes (false negative, FN) is as high as
1072, which is about twice that of the other algorithms. This
may be attributed to the fact that the 183.31 GHz channels are
responsive to both water vapor and cloud hydrometeors [8, 35],
so it is difficult for the hum-algorithm to distinguish
high humidity in clear scenes from cloud hydrometeors in
cloudy scenes. A more detailed evaluation of the four
algorithms is described below.
Figure 7. Confusion matrix of the four algorithms for the testing dataset. 0
denotes the clear scene and 1 denotes the cloudy scene. Green boxes represent
the number of pixels which are correctly classified for each of the classes, light
orange boxes represent the false classifications.
Table 5 summarizes the evaluation metrics of the CDL
prediction model for the win-algorithm, hum-algorithm,
tem-algorithm, and all-algorithm. The
tem-algorithm model is superior to the other three
models, with the maximum values of F1 (0.93), accuracy (0.895),
and AUC (0.883). The all-algorithm performs slightly worse than
the tem-algorithm in terms of F1, accuracy, and AUC, even though
it has the largest number of features. This might stem
from the fact that the temperature-sounding channels around
118.75 GHz are the most sensitive to cloud properties and
outperform the other channels. The log loss, however, is lowest
for the all-algorithm (0.28). In addition, the hum-algorithm
shows inferior prediction capability compared
with the win-algorithm.
TABLE V
EVALUATION METRICS OF FOUR ALGORITHMS FOR THE TESTING DATASET.
Figure 8 displays the comparison between observed BTs (O)
and simulated BTs (B), which are calculated by the Community
Radiative Transfer Model (CRTM) in clear-sky scenarios for
MWHS-2. Before cloud filtering, there are many outliers
between O and B, as indicated in Figure 8a. The
number of outliers is significantly reduced after
removing the cloudy radiances identified by the CDL.
Furthermore, the linear correlation coefficient between O and B
increases from 0.86 to 0.92, and the maximum Z-score,
which is calculated with the bi-weight mean and standard
deviation [35, 36], is reduced from 6.98 to 3.01 after the cloud
filtering.
Figure 8. The comparison between observed BTs (O) (x label) and simulated
BTs (B) (y label) before (a) and after (b) the cloud filtering by the CDL.
Additionally, B is calculated using the CRTM in clear sky scenarios.
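A Z-score based on the bi-weight mean and standard deviation can be sketched as follows. The formulas are the commonly used bi-weight estimators built on the median and the median absolute deviation; the censoring constant `c = 7.5` and the application to O − B differences are assumptions of this sketch, and the sample values are hypothetical.

```python
import math
import statistics


def biweight_mean_std(values, c=7.5):
    """Bi-weight (robust) mean and standard deviation of a sample."""
    m = statistics.median(values)
    mad = statistics.median(abs(v - m) for v in values)
    if mad == 0:
        return m, 0.0
    num_mean = den_mean = num_var = den_var = 0.0
    for v in values:
        u = (v - m) / (c * mad)
        if abs(u) >= 1.0:
            continue  # points far from the median receive zero weight
        w = (1 - u * u) ** 2
        num_mean += (v - m) * w
        den_mean += w
        num_var += (v - m) ** 2 * (1 - u * u) ** 4
        den_var += (1 - u * u) * (1 - 5 * u * u)
    mean = m + num_mean / den_mean
    std = math.sqrt(len(values) * num_var) / abs(den_var)
    return mean, std


def z_scores(values, c=7.5):
    """Z-score of each value relative to the bi-weight mean and std."""
    mean, std = biweight_mean_std(values, c)
    return [(v - mean) / std for v in values]


# Hypothetical O - B differences (K) with one cloud-contaminated outlier.
o_minus_b = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 8.0]
scores = z_scores(o_minus_b)
```

Because the outlier gets zero weight in both estimators, it does not inflate the standard deviation and therefore stands out with a large Z-score, which is the property that makes the bi-weight statistics useful for outlier screening.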
Table 6 lists the split numbers of the features for the
win-algorithm, hum-algorithm, tem-algorithm, and all-algorithm.
The split number, i.e., the number of times a feature is used to
split a node, is obtained after fitting the GBDT
model and serves as a weight factor for each feature. Latitude
ranks first for the win-algorithm and all-algorithm,
and the zenith angle ranks first for the hum-algorithm
and tem-algorithm. The strong latitude dependence
may be attributed to the effects of surface emissivity and season,
as evidenced in Figure 6. The significant
dependence of CDL on zenith angle can be explained by the
fact that the optical path length for a cross-track scanning
radiometer varies with the zenith angle, which is known as the
limb effect. The azimuth also ranks very high,
showing a significant connection to CDL.
TABLE VI
SPLIT NUMBERS OF FEATURES IN FOUR ALGORITHMS OF THE OPTIMAL CDL MODEL AND THEIR CORRESPONDING RANKING.
(each algorithm column gives split number / ranking; - = feature not used)
Features                                 Win-algorithm  Hum-algorithm  Tem-algorithm  All-algorithm
BT (89.0 GHz)                            10848 / 4      -              -              3345 / 8
BT (150.0 GHz)                           10801 / 6      -              -              2629 / 13
BTD (150.0 − 89.0 GHz)                   10651 / 5      -              -              3740 / 5
BT (183.31 ± 3 GHz)                      -              3373 / 9       -              2972 / 10
BT (183.31 ± 4.5 GHz)                    -              3473 / 8       -              2761 / 12
BT (183.31 ± 7 GHz)                      -              4432 / 5       -              2559 / 14
BTD (183.31 ± 3 − 183.31 ± 1 GHz)        -              4368 / 6       -              3898 / 4
BTD (183.31 ± 7 − 183.31 ± 1 GHz)        -              3768 / 7       -              3186 / 9
BTD (183.31 ± 7 − 183.31 ± 3 GHz)        -              5943 / 2       -              3605 / 7
BT (118.75 ± 2.5 GHz)                    -              -              1073 / 7       2267 / 18
BT (118.75 ± 3.0 GHz)                    -              -              1442 / 6       2364 / 16
BT (118.75 ± 5.0 GHz)                    -              -              1827 / 4       2315 / 17
BTD (118.75 ± 3.0 − 118.75 ± 2.5 GHz)    -              -              973 / 9        3724 / 6
BTD (118.75 ± 5.0 − 118.75 ± 2.5 GHz)    -              -              1034 / 8       2545 / 15
BTD (118.75 ± 5.0 − 118.75 ± 3.0 GHz)    -              -              1575 / 5       2910 / 11
latitude                                 12635 / 1      5665 / 4       2088 / 3       5903 / 1
zenith angle                             11712 / 2      6586 / 1       2299 / 1       4204 / 3
azimuth                                  11053 / 3      5770 / 3       2214 / 2       4398 / 2
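The ranking columns of Table VI follow from sorting the features by split number. The sketch below reproduces the tem-algorithm ranking from the split numbers listed in the table; the helper name `rank_by_split` is hypothetical.

```python
# Split numbers of the tem-algorithm, copied from Table VI.
TEM_SPLIT_NUMBERS = {
    "BT (118.75 ± 2.5 GHz)": 1073,
    "BT (118.75 ± 3.0 GHz)": 1442,
    "BT (118.75 ± 5.0 GHz)": 1827,
    "BTD (118.75 ± 3.0 − 118.75 ± 2.5 GHz)": 973,
    "BTD (118.75 ± 5.0 − 118.75 ± 2.5 GHz)": 1034,
    "BTD (118.75 ± 5.0 − 118.75 ± 3.0 GHz)": 1575,
    "latitude": 2088,
    "zenith angle": 2299,
    "azimuth": 2214,
}


def rank_by_split(split_numbers):
    """Map each feature to its rank (1 = most splits)."""
    ordered = sorted(split_numbers, key=split_numbers.get, reverse=True)
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


tem_ranking = rank_by_split(TEM_SPLIT_NUMBERS)
```

The same helper applied to the other columns of Table VI gives the rankings of the remaining algorithms, provided the split numbers are taken as listed.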
Apart from the latitude, zenith angle, and azimuth, the BT
and BTD are also important features for CDL. For the all-algorithm,
the BTDs generally rank higher in split number than
the BTs. Specifically, the BTD (183.31 ± 3 −
183.31 ± 1 GHz), BTD (150.0 − 89.0 GHz), and BTD (118.75
± 3.0 − 118.75 ± 2.5 GHz) rank 4, 5, and 6, respectively. For the
hum-algorithm, BTD (183.31 ± 3 − 183.31 ± 1 GHz) and BT at
183.31 ± 7 GHz are relatively top factors, and the BTs are of
minor importance in comparison with the BTDs, with
average rankings for BTDs and BTs of 5 and 7, respectively.
Regarding the tem-algorithm, BT
(118.75 ± 5.0 GHz) and BTD (118.75 ± 5.0 − 118.75 ± 3.0 GHz)
rank 4 and 5, respectively. For the win-algorithm, the BT at 89
GHz contributes more than BTD (150.0 − 89.0 GHz) and BT at
150 GHz.
TABLE V
Algorithm name  F1     Accuracy  AUC    Log loss
Tem-algorithm   0.930  0.895     0.883  0.36
All-algorithm   0.928  0.893     0.881  0.28
Win-algorithm   0.918  0.878     0.861  0.32
Hum-algorithm   0.864  0.810     0.835  0.36
V. SUMMARY
The lower sensitivity of MW radiation to clouds,
compared with visible and infrared radiation, together with the complex
and variable surface emissivity, makes cloud detection over
land much more challenging than that over the ocean. Since the
assimilation of MW measurements in operational NWP
models is independent of other instrument measurements,
a stand-alone algorithm, CDL, is proposed for MWHS-2 in this
study. It is based on the GBDT algorithm and is trained on
CINRAD observations. The CDL has been investigated by
employing the win-algorithm, hum-algorithm, tem-algorithm,
and all-algorithm. For each algorithm, 29,160 parameter
combinations have been evaluated to find the optimal prediction model
parameters.
The CDL estimates are reasonably comparable to the cloud
mask from CINRAD measurements. The results indicate that the
new temperature-sounding channels around 118.75 GHz are
very important for fitting a prediction model for CDL, as
significant improvements have been found for the tem-algorithm
over land. Specifically, the evaluation metrics for the tem-algorithm
are optimal among the four algorithms, including the
F1 score (0.93), accuracy (0.895), and AUC (0.883). It is
notable that the tem-algorithm even outperforms the all-algorithm,
which has the maximum number of features; this
is contradictory to the study by Favrichon et al. [12]. Compared
with the algorithm for window channels, the algorithm for channels
around 183.31 GHz shows inferior cloud prediction
capability over land, which is consistent with the results from
Islam et al. [11]. For all four algorithms, the latitude, zenith
angle, and azimuth are top-ranking features and have relatively
high split numbers. Additionally, after removing the cloudy
radiances identified by the CDL, the linear correlation
coefficient between O and B can reach 0.96 and the maximum
Z-score is reduced to 3.01, which is promising for satellite
data assimilation in forthcoming work.
Although the ML-based CDL algorithm is able to detect
cloud contamination with high accuracy, disadvantages of the
CDL algorithm are still evident. To some extent, the spatial and
temporal distribution of the sample in this study is limited.
What’s more, the weather radar measurements that are used as
the label in this algorithm are unable to detect the thin clouds
under non-precipitating conditions accurately. However, the
important point is that MW radiation can penetrate most non-
precipitating clouds and shows extremely weak sensitivity to
thin clouds [37]. Thus, it is not worth taking them into account
in cloud detection because this cloud information has little
impact on MW measurements [11]. Future work will focus on
multiclass and multilayered cloud classification. Overall, the
ML-based cloud detection method presented in this study can
be used in the quality control processes for assimilating MW
radiances at NWP centers and applied to MW
retrievals of atmospheric and surface parameters over clouds.
The CDL algorithm is also suitable for other MW sounders,
such as those on board FY-3D.
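The quality-control use described above can be sketched as follows: cloud-flagged radiances are removed, after which the O-B correlation and the largest Z-score of the O-B departures are recomputed. This is only an illustration under stated assumptions. The function names and toy brightness temperatures are hypothetical, and the ordinary mean and standard deviation stand in for whatever estimator the study used for its Z-score.

```python
import math
from typing import List, Sequence, Tuple


def screen_cloudy(obs: Sequence[float], bkg: Sequence[float],
                  cloudy: Sequence[bool]) -> Tuple[List[float], List[float]]:
    """Keep only the (O, B) pairs whose cloud flag is False."""
    kept = [(o, b) for o, b, c in zip(obs, bkg, cloudy) if not c]
    return [o for o, _ in kept], [b for _, b in kept]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear (Pearson) correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_zscore(obs: Sequence[float], bkg: Sequence[float]) -> float:
    """Largest |Z| of the O-B departures (assumed: plain mean and population std)."""
    d = [o - b for o, b in zip(obs, bkg)]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    return max(abs(v - mean) / std for v in d)


# Toy brightness temperatures in kelvin (hypothetical values).
obs = [250.0, 255.0, 260.0, 265.0, 230.0]      # observations, O
bkg = [251.0, 254.0, 261.0, 264.0, 262.0]      # background simulations, B
cloudy = [False, False, False, False, True]    # flags from a cloud detection step

clear_obs, clear_bkg = screen_cloudy(obs, bkg, cloudy)
r_all = pearson(obs, bkg)
r_clear = pearson(clear_obs, clear_bkg)
z_max = max_abs_zscore(clear_obs, clear_bkg)
print(f"r(all)={r_all:.3f} r(clear)={r_clear:.3f} max|Z|={z_max:.2f}")
```

In this toy case a single cloud-depressed observation dominates the departures, so removing it raises the correlation sharply.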
REFERENCES
[1] Aires, F., Marquisseau, F., Prigent, C., and Sèze, G., 2011:
A Land and Ocean Microwave Cloud Classification
Algorithm Derived from AMSU-A and -B, Trained Using
MSG-SEVIRI Infrared and Visible Observations. Mon.
Weather Rev., 139, 2347–2366.
[2] Han, H., Li, J., Goldberg, M., Wang, P., Li, J., Li, Z., Sohn,
B.J., Li, J., 2016: Microwave sounder cloud detection
using a collocated high resolution imager and its impact
on radiance assimilation in tropical cyclone forecasts.
Mon. Weather Rev., 144, 3937–3959.
[3] Grody, N., Zhao, J., Ferraro, R., Weng, F., Boers, R., 2001:
Determination of precipitable water and cloud liquid water
over oceans from the NOAA-15 Advanced Microwave
Sounding Unit. J. Geophys. Res., 106, 2943–2953.
[4] Weng, F., Zhao, L., Ferraro, R., Poe, G., Li, X., Grody, N.,
2003: Advanced microwave sounding unit cloud and
precipitation algorithms. Radio Science, 38(4), 8068.
[5] Bennartz R., Thoss A., Dybbroe A., et al., 2002:
Precipitation analysis using the Advanced Microwave
Sounding Unit in support of nowcasting applications.
Meteorological Applications, 9(02):177-189.
[6] Zhao, L., Weng, F., 2002: Retrieval of ice cloud
parameters using the advanced microwave sounding unit
(AMSU), J. Appl. Meteorol., 41, 384-395.
[7] Han, Y., Zou, X., Weng, F., 2015: Cloud and precipitation
features of super typhoon Neoguri revealed from dual
oxygen absorption band sounding instruments on board
FengYun-3C satellite. Geophys. Res. Lett., 42, 916-924.
[8] Buehler, S.A., Kuvatov, M., Sreerekha, T.R., John, V.O.,
Rydberg, B., Eriksson, P., Notholt, J., 2007: A cloud
filtering method for microwave upper tropospheric
humidity measurements. Atmos. Chem. Phys., 7, 5531–
5542.
[9] Ishida H., Oishi Y., Morita K., et al., 2018: Development
of a support vector machine based cloud detection method
for MODIS with the adjustability to various conditions.
Remote Sensing of Environment, 205:390-407.
[10] Xie, F., Shi, M., Shi, Z., Yin, J., Zhao, D., 2017: Multilevel
cloud detection in remote sensing images based on deep
learning. IEEE Journal of Selected Topics in Applied
Earth Observations and Remote Sensing, 10(8), 3631-
3640.
[11] Chen, Y., Rongshuang, F., Muhammad, B., Yang X.,
Jingxue, W., Wei, L., 2018: Multilevel cloud detection for
high-resolution remote sensing imagery using multiple
convolutional neural networks. ISPRS International
Journal of Geo-Information, 7(5), 181.
[12] Chai, D., Newsam, S., Zhang, H. K., Qiu, Y., Huang, J.,
2019: Cloud and cloud shadow detection in Landsat
imagery based on deep convolutional neural networks.
Remote Sensing of Environment, 225, 307-316.
[13] Chen, Y., Tang, L., Kan, Z., Latif, A., Yang X., Bilal, M.,
Li, Q., 2020: Cloud and cloud shadow detection based on
multiscale 3D-CNN for high resolution multispectral
imagery. IEEE Access, PP (99), 1-1.
[14] Islam, Tanvir, Rico-Ramirez, Miguel A, Srivastava,
Prashant K., 2015: CLOUDET: A Cloud Detection and
Estimation Algorithm for Passive Microwave Imagers and
Sounders Aided by Naïve Bayes Classifier and Multilayer
Perceptron. IEEE Journal of Selected Topics in Applied
Earth Observations & Remote Sensing, 8(9):4296-4301.
[15] Favrichon, S., Prigent, C., Jimenez, C., Aires, F., 2019:
Detecting cloud contamination in passive microwave
satellite measurements over land. Atmos. Meas. Tech., 12
(3), 1531 - 1543.
[16] Bauer, P., Mugnai, A, 2003: Precipitation profile
retrievals using temperature-sounding microwave
observations. J. Geophys. Res., 108, 4730.
[17] Islam, T., Srivastava, P K., Dai, Q., Gupta, M., 2014: Ice
cloud detection from AMSU-A, MHS, and HIRS satellite
instruments inferred by cloud profiling radar. Remote
Sensing Letters, 5(12):1012-1021.
[18] Han, J., Chu, Z., Wang, Z., Xu, D., Li, N., Kou, L., Xu, F.,
Zhu, Y., 2018: The establishment of optimal ground-based
radar datasets by comparison and correlation analyses
with space-borne radar data. Meteorological Applications,
25(1):161-170.
[19] Chu, Z., Ma, Y., Zhang, G., Wang, Z., Han, J., Kou, L., Li,
N. Mitigating Spatial Discontinuity of Multi-Radar QPE
Based on GPM/KuPR. Hydrology, 5(3).
[20] Wu T, Wan Y, Wo W, Leng L., 2013: Design and
application of radar reflectivity quality control algorithm
in SWAN. Meteorol. Sci. Technol. 41(5): 809–817 (in
Chinese with an English abstract).
[21] Liu, S., Chu, Z., Yin, Y., Liu, R., 2019: Evaluation of
MWHS-2 Using a Co-located Ground-Based Radar
Network for Improved Model Assimilation. Remote Sens.,
11, 2338.
[22] Martner, B. E., Moran, K. P., 2001: Using cloud radar
polarization measurements to evaluate stratus cloud and
insect echoes. J. Geophys. Res., 106(D5):4891.
[23] Hong, G., Heygster, G., Miao, J.G., Kunzi, K., 2005:
Detection of tropical deep convective clouds from AMSU-
B water vapor channels measurements. J. Geophys. Res.,
110, doi:10.1029/2004JD004949.
[24] Manandhar, S., Yuan, F., Lee, Y. H., Meng, Y. S., 2016:
Weather Radar to detect and differentiate clouds from rain
events. USNC-URSI Radio Science Meeting, Fajardo.
103-104.
[25] Wang Zhe, Wang Zhenhui, Cao Xiaozhong, Tao Fa., 2018:
Comparison of cloud top heights derived from FY-2
meteorological satellites with heights derived from
ground-based millimeter wavelength cloud radar. Atmos.
Res.
[26] Friedman J H., 2001: Greedy Function Approximation: A
Gradient Boosting Machine. The Annals of Statistics,
29(5):1189-1232.
[27] Friedman, J.H., 2002. Stochastic gradient boosting.
Comput. Stat. Data Anal. 38.
[28] Richardson Matthew, Dominowska Ewa, Robert Ragno.
Predicting clicks: estimating the click-through rate for
new ads. In Proceedings of the 16th international
conference on World Wide Web, ACM. 2007, pages 521–
530.
[29] Li Ping. Robust logitboost and adaptive base class (abc)
logitboost. arXiv preprint arXiv:1203.3491, 2012. In UAI,
2010.
[30] Min Min, Jun Li, Fu Wang, Zijing Liu, W. Paul Menzel.,
2020: Retrieval of cloud top properties from advanced
geostationary satellite imager measurements based on
machine learning algorithms. Remote Sensing of
Environment, 239, 111616.
[31] Dietterich T G. Ensemble learning. The handbook of brain
theory and neural networks, Cambridge. 2002, 2: 110-125.
[32] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W.,
Ye, Q., Liu, T.-Y., 2017: LightGBM: A highly efficient
gradient boosting decision tree. Adv. Neur. Inf. Process.
Sys, 30, 3146-3154.
[33] Chen Tianqi, Carlos Guestrin. XGBoost: A scalable tree
boosting system. In Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining, pages 785–794. ACM, 2016.
[34] Bobryshev O, Buehler S A, John V O, et al., 2018: Is
There Really a Closure Gap Between 183.31-GHz
Satellite Passive Microwave and In Situ Radiosonde
Water Vapor Measurements? IEEE Transactions on
Geoscience & Remote Sensing, 1-7.
[35] Lanzante, J. R. 1996: Resistant, robust and nonparametric
techniques for the analysis of climate data: theory and
examples, including applications to historical radiosonde
station data, Int. J. Climatol., 16, 1197 – 1226.
[36] Zou X, Zeng Z. 2006: A quality control procedure for GPS
radio occultation data. Journal of Geophysical Research:
Atmospheres, 111(D2).
[37] Weng, F., 2007: Advances in radiative transfer modeling
in support of satellite data assimilation. J. Atmos. Sci., 64,
3799–3807.
Shuxian Liu received the B.S. degree in
atmospheric physics from the Nanjing
University of Information Science and
Technology, Nanjing, China, in 2016.
She is currently pursuing the Ph.D.
degree with the School of Atmospheric
Physics, Nanjing University of
Information Science and Technology.
Her research interests include satellite
data assimilation and cloud detection.
Yan Yin received the Ph.D. degree from Tel
Aviv University, Israel, in 1999. He worked
at the University of Leeds, U.K., from 1999 to
2004 and as a Lecturer at the
University of Wales, Aberystwyth, U.K.,
from 2004 to 2005. He is currently a Professor of
atmospheric physics and atmospheric
environment with the Nanjing University of
Information Science and Technology. His
main research interests include cloud
and precipitation physics, aerosol properties and their effects on
environment and climate, and weather modification. He has
published more than 250 scientific papers, including 150
in SCI journals and about 50 in EI journals. He has
also authored or coauthored six books.
Zhigang Chu received the M.S. degree in
atmospheric sounding and the Ph.D.
degree in atmospheric physics from the
Nanjing University of Information Science
and Technology, Nanjing, China, in 2009
and 2013, respectively. He is currently a
Lecturer with the School of Atmospheric
Physics, Nanjing University of
Information Science and Technology. His research interests
include remote sensing for understanding and quantifying
weather and cloud/precipitation microphysics.
Shuai An received the M.S. degree in
computer science from Nankai University
in 2018. From 2018 to 2020, he was
an algorithm engineer with JD.COM, Beijing,
China. He is currently a Ph.D. candidate with the
School of Informatics, University of
Edinburgh, U.K. His research interests
include database theory and machine
learning.