
This is a non-peer reviewed preprint submitted to Atmospheric Pollution Research,
Elsevier.
Highlights
Machine-Learning Approaches for Assessing Aerosol Optical Depth (AOD) in Ghana,
West Africa
Jesse Gilbert, Jeffrey N. A. Aryee, Mary Jessie Adjei
MODIS exhibits exceptional performance in retrieving AOD values throughout Ghana.
The research contributes a comprehensive sixteen-year assessment (2003-2019) of AOD
at a 3 km resolution, obtained from MODIS Aqua and Terra satellites.
The examination of MODIS Aqua and Terra AOD retrievals unveiled an overall lower
aerosol burden over Ghana, marked by mean AOD values hovering around 0.35.
The findings indicate that the southwestern part of the country displays elevated aerosol
levels compared to other major cities.
All machine learning models developed in the study exhibited an acceptable level of
accuracy.
Machine-Learning Approaches for Assessing Aerosol Optical Depth
(AOD) in Ghana, West Africa
Jesse Gilbert^a, Jeffrey N. A. Aryee^a, Mary Jessie Adjei^a
^a Department of Meteorology and Climate Science, Kwame Nkrumah University of Science and
Technology, Kumasi, Ghana
Abstract
In the field of environmental health, assessing air pollution exposure has historically posed
challenges, primarily due to sparse ground observation networks. To overcome this limitation,
satellite remote sensing of aerosols provides a valuable tool for monitoring air quality and es-
timating particulate matter concentration (PM) at the surface. In this study, we employ two
predictive models to estimate Aerosol Optical Depth (AOD) levels over Ghana and selected
localities from January 2003 to December 2019. Our investigation focuses on evaluating the
capabilities of multiple linear regression (MLR) and artificial neural network (ANN) models in
predicting AOD levels. Additionally, we introduce a novel approach to constructing the MLR
model by leveraging the ANN architecture. These models utilize meteorological variables as
input, to facilitate accurate predictions. Despite Ghana’s alarming air pollution health ranking
and its substantial role in mortality, routine monitoring remains sparse. This research con-
tributes a comprehensive sixteen-year assessment (2003-2019) of AOD at a 3 km resolution,
obtained from MODIS Aqua and Terra satellites. The findings indicate that the southwest-
ern part of the country displays elevated aerosol levels compared to other major cities. This
phenomenon can be attributed to biogenic emissions, given the region’s dense vegetation. Ad-
ditionally, many small cities within this area are recognized as hotspots for surface mining
operations, potentially contributing to increased local dust loadings in the atmosphere. No-
tably, the MLR model, implemented using the ANN model structure, outperformed the other
utilized models. This endeavor aims to unravel the spatiotemporal distribution patterns of
aerosols across Ghana, and its major urban hubs.
Keywords: Aerosols, Machine Learning, ANN, MLR, MODIS, Ghana
Corresponding author
Email address: jeff.jay8845@gmail.com (Jeffrey N. A. Aryee)
Preprint submitted to Atmospheric Pollution Research January 7, 2024
1. Introduction
Aerosols are airborne particles suspended in the atmosphere, encompassing various compo-
sitions and sizes (Kondratyev et al., 2006; Putaud et al., 2010b). These particles emerge from
two primary sources: direct emission of primary particulate matter (PM) and secondary parti-
cle formation from gaseous precursors. Black carbon (BC), primary biological aerosol particles
(PBAPs), sea salt spray, and mineral dust are some examples of primary aerosols. In
contrast, secondary aerosols arise from processes of sulfate, nitrate, and ammonium formation
(Boucher et al., 2013). Aerosols directly influence the atmosphere’s energy balance through
radiation scattering and absorption (Yu et al., 2006). Furthermore, they act as nuclei for cloud
formation (Lohmann and Feichter, 2005), and indirectly impact atmospheric heat by absorb-
ing radiation, contributing to reduced low cloud cover (Johnson et al., 2004). Consequently,
aerosols impact the earth’s hydrological cycle (Creamean et al., 2013) and, to a considerable
extent, food security (Misra, 2014). Fine PM with diameters below 2.5 µm (PM2.5) has been
linked to negative health outcomes, ranging from immediate to chronic effects (WHO, 2013). These
include aggravated respiratory symptoms (Kato, 2018; Prieto-Parra et al., 2017; Zeng et al.,
2016), worsened asthma (Jung et al., 2017; Williams et al., 2019), heightened cardiovascular
diseases (Dabass et al., 2018; Vidale and Campana, 2018), diminished lung function (Thaller
et al., 2008), and increased premature mortality linked to heart or lung conditions (Apte
et al., 2015; David et al., 2019). The surge in vehicle numbers and land use transformations,
largely driven by rapid urban growth in Ghana’s major cities (Ministry of Transport, 2016),
has led to escalating PM2.5 concentrations. These concentrations are significantly affected by
local traffic emissions, land use practices such as bush burning, and industrial discharges. Rapid
and precise assessment of the spatiotemporal distribution of PM2.5 (Sierra-Vargas and
Teran, 2012; Cai et al., 2017) at a finer resolution can enhance the accuracy of health outcome
studies related to PM2.5 (Williams et al., 2019), especially when conducted on a local
spatial scale. However, PM2.5 ground monitoring stations are sparsely distributed
worldwide, especially in developing countries such as Ghana and its neighbors. This may be
a result of resource constraints because establishing and maintaining a network of monitor-
ing stations requires financial investment for equipment, infrastructure, and personnel, which
may be challenging for countries with limited budgets. Furthermore, the spatio-temporal variation
of PM2.5 is complex, influenced by a combination of factors, including local emissions
from various sources, meteorological influences, topography, and seasonal patterns such as the
movement of the trade winds over Ghana. Long-range transport from diverse geographical
locations and chemical transformations further contribute to this complexity. Without continuous
monitoring of PM2.5, our ability to rapidly assess, model, and forecast PM2.5 levels
for Ghana, particularly over the major cities, is severely limited. As an alternative, satellite
retrievals are used since they offer wider spatial coverage and, in most instances, provide
information even for remote and inaccessible locations. The most relevant satellite-derived parameter for
assessing PM2.5 concentration levels is Aerosol Optical Depth (AOD) (Wu et al., 2016; Mhawish
et al., 2017; Shen et al., 2018; Wei et al., 2019). AOD measures the drop in light intensity
caused by aerosol scattering and absorption throughout the atmospheric column (Lee et al.,
2011). This metric directly reflects the extent of aerosol presence, providing valuable insights
into the overall concentration of optically active particles within each geographical location
(Dandou et al., 2002). In the domain of satellite-based aerosol retrievals, AOD datasets are
predominantly sourced from two key satellite observations: the Moderate Resolution Imaging
Spectroradiometer (MODIS) (Kanabkaew, 2013; Hu et al., 2013; Xu et al., 2016), and the Terra
Multi-angle Imaging SpectroRadiometer (MISR) (Liu et al., 2007; Kahn and Gaitley, 2015).
MODIS, carried aboard the Aqua and Terra satellites, stands out. Its widespread adoption is owed to
its impressive characteristics: a wide swath width covering 2330 kilometers and near-global
coverage every 1 to 2 days (Fosu-Amankwah et al., 2021). MODIS aerosol retrievals rely on
three main algorithms: Dark Target (DT), Deep Blue (DB), and the combined Dark Target and
Deep Blue (DTB) algorithms. Further discussion on these algorithms is provided in Section
2.2.1. The present study utilized the DTB product, which combines the DT and DB algorithms. AOD
has consistently demonstrated strong correlations with PM measurements (Engel-Cox et al.,
2004; Chu et al., 2003; Gupta et al., 2006). Conventional approaches to modeling PM2.5 levels
have predominantly relied on chemical transport models (CTMs) and land-use regression (LUR)
models, with AOD incorporated as a key predictor (Kloog et al., 2012;
Geng et al., 2015). Nevertheless, it is worth noting that Land Use Regression models (LURs)
have inherent limitations in capturing temporal fluctuations, while Chemical Transport Models
(CTMs) may exhibit discrepancies when employed in isolation (Danesh Yazdi et al., 2020).
Various statistical methods, spanning from simple univariate regression to complex non-linear
models have been developed for the estimation of PM (Danesh Yazdi et al., 2020; Lee et al.,
2011; Taheri Shahraiyni and Sodoudi, 2016; Li, 2020; Nabavi et al., 2019). Machine learning
(ML) methods have gained popularity in the study of aerosol dynamics and air quality fore-
casts (Bai et al., 2019; Xiao et al., 2020; Nabavi et al., 2019; Danesh Yazdi et al., 2020). These
techniques offer enhanced accuracy and flexibility while requiring smaller datasets compared to
traditional models. This shift in approach highlights the need for further research to explore the
potential of ML in advancing our understanding of aerosol dynamics and improving air quality
predictions. The application of advanced algorithms enables non-parametric exploration of the
intricate relationship between predictor variables and measured pollutant concentrations (Di
et al., 2016; Taghavi-Shahri et al., 2020). The current investigation focuses on understanding
aerosol dynamics in Ghana, a nation experiencing rapid population growth and economic
expansion (Abokyi et al., 2019). Situated in West Africa, Ghana is influenced
by two major trade winds, notably the northeasterly winds carrying Saharan dust. This
unique positioning exposes the region to diverse aerosol types and concentrations throughout
the year. Agriculture, particularly among young adults, plays a vital role in the economy
(Ghana Statistical Service, 2014), with activities like biomass burning contributing to aerosol
levels. Additionally, urban areas in Ghana are the major hub for economic activities and
substantial emissions from industries, transportation, construction, petroleum extraction, and
mining, further impacting aerosol concentrations (Fosu-Amankwah et al., 2021). This study
aims to explore these aerosol dynamics in Ghana’s evolving environmental landscape, offering
valuable insights for scientific research. Despite the multitude of potential aerosol emission
sources, systematic assessment of aerosols in Ghana is notably lacking in the existing scientific
literature. Hence, there is a need to assess ML performance in forecasting AOD levels with readily
available data for our region. The current study therefore aims to: i) provide a spatio-temporal
assessment of aerosols over Ghana from the MODIS 3 km AOD Aqua and Terra products; ii) train ML
models with meteorological variables; and iii) assess the performance and accuracy of the machine
learning models in predicting AOD at different spatial and temporal scales.
2. Dataset and Methodology
2.1. Study Area
The study was carried out in Ghana, situated on the West African Guinea Coast. Ghana
experiences a monsoonal climate, characterized by two distinct seasons, dry and wet, due to its
tropical location (Aryee et al., 2018). The West African Monsoon (WAM) significantly influ-
ences rainfall in the region, along with various convective activities triggered by the movements
of the Inter-Tropical Discontinuity (ITD), resulting in a mean annual rainfall ranging from 150
to 2500 mm. The WAM is a component of the global monsoon system, and the shift in the trade wind
path in the lower troposphere during the WAM has traditionally been linked to the thermal
contrast between the cooler tropical Atlantic and the warmer North African continent (Sultan
and Janicot, 2003; Janicot et al., 2011; Louvet et al., 2005). The primary driving forces behind
these wind systems are the energy and temperature differentials between the Gulf of Guinea and
the Sahara. Along the ITD, the moist maritime tropical air mass originating from the Atlantic
Ocean converges with the dry northeast continental tropical air mass (Amekudzi et al., 2015).
The seasonal movement of the ITD gives rise to a bi-modal rainfall pattern in the southern
part of Ghana and a uni-modal pattern in the northern region (Owusu and Waylen, 2013). The
onset of the rainy season occurs when the maritime tropical air mass is significantly laden with
water vapor, typically between the second and third dekads (10-day periods) of March, with peak precipitation
recorded in June. The minor season, lasting only a few weeks, begins in the first and second
dekads of September and concludes in the second to third dekads of November (Amekudzi
et al., 2015; Manzanas et al., 2014). The country experiences an annual average rainfall ranging
between 900 and 1900 mm (Baidu et al., 2017). The annual average relative humidity (RH)
varies from 77 percent to 85 percent (Williams et al., 2017; Asante and Amuakwa-Mensah,
2014; Kabo-Bah et al., 2016). Ghana is notably one of the fastest-developing countries on the
African continent (Amoatey et al., 2018). The nation boasts four key urban centers: Accra,
Kumasi, Takoradi, and Tamale. However, this rapid economic growth coupled with population
growth, industrialization, and a surge in vehicular density seems to have contributed to incre-
mental growth in air pollution in the country, particularly in these urban centers. Furthermore,
findings from Agyemang-Bonsu et al. (2010) indicate that a considerable number of vehicles
imported into Ghana are aging, receive limited maintenance, and consequently contribute to
elevated emission levels. As of September 2018, over 28,000 deaths in Ghana were linked to air
pollution (World Health Organisation, 2016; Odonkor et al., 2020). The World Health Organization
(WHO) reported that Ghana’s annual average PM2.5 level in 2016 was 31.1 µg/m³,
well above the recommended guideline of 10 µg/m³ (World Health Organisation, 2016).
2.2. Dataset
2.2.1. MODIS
MODIS serves as the pivotal instrument aboard NASA’s Terra and Aqua satellites, launched
in December 1999 and May 2002, respectively. Terra follows an orbit from north to south
across the equator in the morning, whereas Aqua’s orbit goes from south to north over the
equator in the afternoon. With Terra MODIS and Aqua MODIS working in tandem, they
achieve a comprehensive view of the Earth’s surface approximately every one to two days. This
remarkable feat is achieved by capturing data across 36 spectral bands, spanning wavelengths
from 0.4 to 14.385 µm. MODIS imagery is provided at spatial resolutions
of 250 m, 500 m, and 1 km (Kahn et al., 2009; Zhang and Reid, 2009; Shi et al., 2015; Acharya
and Sreekesh, 2013). Of particular relevance are the channels with wavelengths
from 0.47 to 2.12 µm, which are used to retrieve aerosol characteristics.
The instrument produces daily aerosol optical thickness data
mapped globally at a spatial resolution of 10 km × 10 km. The MODIS swath width
is 2330 km, slightly narrower than that of AVHRR. As a result, the coverage for a single
day is not complete, and any gaps from one day are filled in on the next. For an animation
illustrating the MODIS scan pattern, refer to http://aqua.nasa.gov/sites/default/files/
aqua_modis_h264.mov (last accessed: 27 July 2023). The retrieval of aerosol data through
MODIS employs three distinct algorithms, each designed for specific settings: DT and DB
algorithms are employed over land, while the DT algorithm is used over oceans (Hsu et al.,
2019). Additionally, the DTB algorithm harmonizes these main approaches, selecting the most
suitable one based on the land’s characteristics. The DT algorithm is tailored for dark surfaces
like dark soil and vegetation. Meanwhile, the advanced second-generation DB algorithm is
adept at bright surfaces such as deserts, urban areas, and vegetated regions. The retrieval
process of aerosol properties using the DT algorithm over dense vegetation and dark soil relies
on how reflectance at visible wavelengths, specifically 0.47 and 0.65 µm, correlates with that at the
shortwave infrared wavelength of 2.12 µm (Levy et al., 2007). The present operational MODIS dataset, denoted as
C061, furnishes standard aerosol characteristics with a spatial resolution of 10 km × 10 km within
the Level 2 (L2) datasets, specifically MOD04 for Terra and MYD04 for Aqua. In aggregated
Level 3 (L3) products, this resolution is lowered to 1° × 1°. Furthermore, an additional aerosol
file based on the Dark Target (DT) approach with a resolution of 3 km is included in the
C006 dataset, which has been continued in the current C061. This enhancement serves to offer
air quality insights at local or urban scales. Detailed information about the C006 dataset is
provided by Remer et al. (2013), while the progression from the DT C006 to the current
C061 is detailed in Mattoo (2017). It is noteworthy that the DT product may exhibit
relatively higher uncertainty, particularly when applied to bright underlying surfaces (Levy
et al., 2010). The monthly AOD dataset at a wavelength of λ = 0.55 µm, for the Terra overpass at 10:30
am local time and the Aqua overpass at 1:30 pm local time, was downloaded for the study from
https://ladsweb.modaps.eosdis.nasa.gov/ (last accessed on 27 July 2023).
2.2.2. Input Variables
To predict the AOD from MODIS, we used the following climate variables: temperature (t2m; K),
dewpoint (K), surface net downward shortwave flux (J m⁻²), surface upward longwave flux (J m⁻²),
surface upward latent heat flux (J m⁻²), relative humidity, boundary layer height (m), downward
UV radiation (J m⁻²), evaporation (m), precipitation (m), pressure (hPa), top net solar
radiation (J m⁻²), and low cloud cover. We selected these covariates primarily on the basis of expert
knowledge and data availability. Meteorological variables are known for their influence on the
dispersion and transport of fine particulate matter. Atmospheric heat flux variables are also
known to be influenced by aerosols which can lead to negative or positive radiative forcing.
As previously mentioned, AOD is a key predictor of PM2.5, with higher levels of AOD indicating
higher PM2.5 levels (Wang et al., 2003; Liu et al., 2007; Van Donkelaar et al., 2006, 2010;
Gupta et al., 2006). The MinMaxScaler normalizer and the normalization layer from
Keras (TensorFlow) were used to normalize and standardize the input data. All climate variables
used in prediction were obtained from ERA5 (https://cds.climate.copernicus.eu/) with
a temporal range spanning 2003 to 2019. A total of 80% of the data was used for training and
the remaining 20% for testing. The combined DT and DB (DTB) product was used for this
study. The decision to use this product was mainly due to the highly variable topography of
our region. Here, we incorporate meteorological data to predict AOD levels from 1 January
2003 to 31 December 2019 over the entire country and some selected localities.
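The scaling and splitting steps described above can be sketched in a few lines. This is a minimal illustration only: it uses plain Python rather than the MinMaxScaler or Keras utilities named in the text, the helper names `min_max_scale` and `train_test_split_80_20` are hypothetical, the temperature values are made up, and shuffling before the split is an assumption, since the text does not say how the 80/20 partition was drawn.

```python
import random

def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def train_test_split_80_20(rows, seed=0):
    """Shuffle rows reproducibly (an assumption), then keep the first 80% for training."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Toy 2 m temperature values in kelvin (made up for illustration).
t2m = [299.75, 301.30, 303.30, 304.38, 307.33]
scaled = min_max_scale(t2m)

# 204 monthly samples, matching the count of a 2003-2019 monthly record.
train, test = train_test_split_80_20(list(range(204)))
print(len(train), len(test))
```

In practice the minimum and maximum would normally be taken from the training portion only and then applied to the test portion, so that no information from the test set leaks into the scaling.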
Table 1 presents descriptive statistics of the input variables for the model.
Table 1: Descriptive Statistics of Covariates Used.
Variables  Count  Mean      Standard Deviation  Minimum   25%       50%       75%       Maximum
t2m        204    303.02    1.73                299.75    301.30    303.296   304.38    307.33
d2m        204    295.24    2.087               281.60    294.92    295.79    296.40    297.54
RH         204    62.28     8.97                27.45     56.89     64.79     69.63     73.60
SH         204    -164.16   62.11               -342.88   -212.80   -136.27   -112.29   -81.22
LH         204    -230.03   48.33               -353.90   -263.17   -228.46   -193.69   -78.18
DS         204    542.92    95.05               333.19    456.26    558.37    616.44    745.10
UL         204    -63.65    22.82               -140.17   -76.46    -56.27    -46.11    -34.08
BLH        204    597.63    109.32              410.93    514.22    579.73    673.09    940.14
AQUA       204    0.47      0.27                0.15      0.28      0.39      0.55      1.50
TERRA      204    0.53      0.26                0.16      0.34      0.46      0.61      1.62
2.2.3. Data Preprocessing
The issue of missing (NaN) values is a prominent challenge commonly faced in the analysis
of satellite-borne data products. In this study, we employed a methodology outlined by Fosu-
Amankwah et al. (2021) to ensure a cohesive analysis of Aerosol Optical Depth (AOD) datasets.
This involved partitioning the country into smaller grids (0.04° × 0.04°) and employing an av-
eraging technique on pixel values (a minimum of three) within these grids to address missing
data instances. The practice of averaging data within reduced grids is a conventional approach
in spatial analysis, particularly in the context of remote sensing data and other geospatial
datasets characterized by substantial missing or NaN values. In this research, the entire coun-
try and specific major localities were considered, with AOD datasets from AQUA and TERRA
serving as dependent variables, while other satellite-borne data products, including temper-
ature, dewpoint, surface upward longwave flux, surface upward latent heat flux, surface net
downward shortwave flux, relative humidity, boundary layer height, Downward UV radiation,
Evaporation, Precipitation, Pressure, Top net solar radiation, and low cloud cover, constituted
the independent variables. Given the inherent variation in spatial resolution among the diverse
data products utilized in this study, all data products not conforming to the 1 km × 1 km
grid were regridded to it through bilinear interpolation. The bilinear interpolation method was
chosen primarily for its versatility, as it strikes a balance between computational efficiency and
accuracy (Arif and Akbar, 2005). This makes it suitable for a broad spectrum of applications
where smooth interpolation between gridded data points is essential. Additionally, the tem-
poral resolution of independent variables was adjusted to align with AOD datasets from Aqua
and Terra, ensuring consistency.
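The two spatial operations described in this section, averaging the valid pixels within a grid cell and bilinear interpolation between gridded points, can be sketched as follows. This is an illustrative sketch rather than the processing code used in the study: the function names `cell_mean` and `bilinear` are hypothetical, the pixel values are made up, and reading "a minimum of three" as the minimum number of valid pixels required per cell is an assumption.

```python
import math

def cell_mean(pixels, min_valid=3):
    """Mean of the non-NaN pixel values in one grid cell.

    Returns NaN when fewer than `min_valid` valid pixels are present.
    """
    valid = [p for p in pixels if not math.isnan(p)]
    if len(valid) < min_valid:
        return math.nan
    return sum(valid) / len(valid)

def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Bilinear interpolation at (x, y) inside the box [x0, x1] x [y0, y1].

    q00, q10, q01, q11 are the values at (x0, y0), (x1, y0), (x0, y1), (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    bottom = q00 * (1 - tx) + q10 * tx   # interpolate along x at y0
    top = q01 * (1 - tx) + q11 * tx      # interpolate along x at y1
    return bottom * (1 - ty) + top * ty  # then interpolate along y

nan = math.nan
enough = cell_mean([0.30, nan, 0.40, 0.50])   # three valid pixels: averaged
too_few = cell_mean([0.30, nan, nan, 0.50])   # two valid pixels: stays NaN
centre = bilinear(0.5, 0.5, 0, 1, 0, 1, 1.0, 2.0, 3.0, 4.0)
print(enough, too_few, centre)
```

Keeping a cell as NaN when too few pixels are valid avoids reporting an average that rests on one or two retrievals.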
2.3. Model Development
While performing the data preprocessing, it became evident that the MODIS dataset con-
tained a substantial number of missing values (NaN), which further compounded the scarcity of
ground-level observations for PM2.5. To bridge this disparity, the integration of various machine
learning algorithms emerged as a requisite approach to facilitate the comprehensive evaluation,
continuous monitoring, and precise forecasting of Aerosol Optical Depth (AOD) values. These
prognostic outcomes have multifaceted utility, including but not limited to data imputation,
model parameterization, and broader assessments about regional air quality. One of the main
goals of this study was to use ML to assess Aerosol Optical Depth (AOD) in certain areas of
Ghana. As the no free lunch (NFL) theorem states, no single algorithm performs
best for all possible problems. Hence, we employed two well-known machine learning methods
that are often used to predict AOD based on previous research.
2.3.1. Multiple Linear Regression (MLR)
The MLR model is a well-known ML algorithm utilized in establishing the linear relation-
ships among numerous independent variables and a continuous dependent variable. Prior to
constructing the MLR model, an extensive review of existing literature was undertaken to as-
similate and implement the most effective methodologies outlined in prior research studies.
Notably, the MLR model, unlike certain other machine learning algorithms, does not involve
a lot of hyperparameter tuning. However, akin to various machine learning techniques, proper
normalization of covariates holds paramount importance in ensuring optimal model perfor-
mance. Consequently, the MinMaxScaler normalizer was employed in this study to standardize
our covariates. This choice was motivated by the method’s widespread adoption in numerous
studies, predominantly due to its efficiency and computational expediency, making it a popular
choice in the scientific community. The mathematical representation of the MLR model is given
in equation 1 as:
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon \tag{1} $$
where:
$y$: Dependent variable
$x_1, x_2, \dots, x_p$: Independent variables
$\beta_0, \beta_1, \beta_2, \dots, \beta_p$: Regression coefficients
$\epsilon$: Error term
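For reference, equation 1 can be written compactly for all $n$ samples at once. The expression below assumes that the coefficients are estimated by ordinary least squares, which the text does not state explicitly. The symbols $X$ (the $n \times (p+1)$ matrix of covariates with a leading column of ones), $\mathbf{y}$, and $\hat{\boldsymbol{\beta}}$ are introduced here for illustration only.

```latex
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon},
\qquad
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert_2^2
  = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```

The closed form holds when $X^{\top}X$ is invertible, that is, when no covariate is an exact linear combination of the others.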
2.3.2. Multiple Linear Regression (MLR) Using ANN
Before constructing the neural network using the optimal parameters determined through
the hyperparameter tuning process, we employed the Artificial Neural Network (ANN) framework
to conduct a multiple linear regression analysis. This was done by defining a single input
layer and an output layer while excluding any hidden layers during model construction.
The fundamental principle of linear regression is to capture linear associations between the
dependent variable and the independent variables: the model directly outputs a weighted sum of
its inputs without applying any non-linear transformation. Neural networks, in contrast,
typically apply activation functions in their hidden layers to introduce non-linearity,
which is what allows them to discern complex patterns and relationships in the data. With only
an input and an output layer, and provided that no non-linear activation is applied to the
output, the network reduces to a weighted sum of the inputs plus a bias, so it operates as
a linear regression model whose coefficients are learned by the network optimizer. For this
model, the Adam optimizer was used with a learning rate of 0.01, a batch size of 12, and a
total of 89 epochs. These hyperparameters were tuned using a trial-and-error approach.
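The idea can be illustrated with a small, self-contained sketch: a "network" with no hidden layer and an identity output, trained by mini-batch Adam on the MSE loss. The learning rate, batch size, and epoch count are taken from the text. Everything else is an assumption made for illustration: the code is plain Python rather than Keras, the function names are hypothetical, the Adam constants (`beta1`, `beta2`, `eps`) are the commonly used defaults, and the data are synthetic. With these settings the fit is not guaranteed to reach the exact generating coefficients, so the sketch only compares the loss before and after training.

```python
import random

def predict(weights, bias, x):
    """No hidden layer, identity activation: a weighted sum plus a bias."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def mse(weights, bias, X, y):
    """Mean squared error of the linear layer over a dataset."""
    return sum((predict(weights, bias, x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def fit_linear_adam(X, y, lr=0.01, batch_size=12, epochs=89, seed=0,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """Train the single linear layer with mini-batch Adam on the MSE loss."""
    rng = random.Random(seed)
    n_features = len(X[0])
    params = [0.0] * (n_features + 1)   # weights followed by the bias
    m = [0.0] * len(params)             # first-moment estimates
    v = [0.0] * len(params)             # second-moment estimates
    step = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad = [0.0] * len(params)
            for i in batch:
                err = predict(params[:-1], params[-1], X[i]) - y[i]
                for j in range(n_features):
                    grad[j] += 2 * err * X[i][j] / len(batch)
                grad[-1] += 2 * err / len(batch)
            step += 1
            for j in range(len(params)):
                m[j] = beta1 * m[j] + (1 - beta1) * grad[j]
                v[j] = beta2 * v[j] + (1 - beta2) * grad[j] ** 2
                m_hat = m[j] / (1 - beta1 ** step)   # bias-corrected moments
                v_hat = v[j] / (1 - beta2 ** step)
                params[j] -= lr * m_hat / (v_hat ** 0.5 + eps)
    return params[:-1], params[-1]

# Synthetic data from a known linear rule (illustrative only).
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(204)]
y = [2.0 + 3.0 * a - 1.0 * b for a, b in X]

loss_before = mse([0.0, 0.0], 0.0, X, y)
weights, bias = fit_linear_adam(X, y)
loss_after = mse(weights, bias, X, y)
print(loss_before, loss_after)
```

Because the prediction is exactly the right-hand side of equation 1 without the error term, the learned weights and bias play the role of the regression coefficients.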
2.3.3. Artificial Neural Network (ANN)
ANNs are computational models inspired by the neural networks of living organisms.
They are commonly trained with the error backpropagation technique, which iteratively
adjusts the network’s weights to minimize the discrepancy between predicted and target
outputs. The process involves several steps. In the forward pass, input data
traverses the network layer by layer: neurons calculate weighted sums of their inputs, which are
then processed by activation functions to yield the neuron outputs. The difference between
predicted and target outputs is computed via a designated error or loss function, such as the Mean
Squared Error (MSE) for regression or Cross-Entropy for classification. In the backward pass, the
derivatives of the loss with respect to the weights and biases are calculated and used by an
optimization algorithm, often Gradient Descent or one of its variants, to adjust them in the
direction that reduces the error. Repeating this process over multiple epochs refines the
weights and progressively diminishes the error until the network converges towards a weight
configuration that minimizes the error function. Influential
hyperparameters such as activation functions and learning rates can be selected manually
by trial and error, or through strategies such as grid search, random search,
or Bayesian optimization. In this study, grid search was employed to identify optimal parameters,
as detailed in Table 2. The model was trained with the Adam
optimizer, chosen mainly for its fast convergence, with the MSE
as the loss function. The mathematical representation of a node in
the multilayer perceptron network (MLPN) is given in equation 2 as:
$$ y_j = f\left( \sum_{i=1}^{N} x_{ji} \cdot w_{ji} + b_j \right) \tag{2} $$
where:
$y_j$ is the output of node $j$,
$f$ is the activation function,
$x_{ji}$ represents the input from node $i$ to node $j$,
$w_{ji}$ denotes the weight associated with the connection between node $i$ and node $j$,
$b_j$ stands for the bias of node $j$.
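Equation 2 translates directly into code. The sketch below is illustrative only: the function names are hypothetical, the input values, weights, and biases are made up, and ReLU is used as the activation function for the example.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to zero."""
    return max(0.0, z)

def node_output(inputs, weights, bias, activation=relu):
    """Equation 2 for one node: f(sum_i x_i * w_i + b)."""
    return activation(sum(x * w for x, w in zip(inputs, weights)) + bias)

def layer_output(inputs, weight_rows, biases, activation=relu):
    """Apply equation 2 to every node of a layer (one weight row per node)."""
    return [node_output(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]

x = [0.5, -1.0, 2.0]
hidden = layer_output(x, [[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]], [0.0, 0.5])
print(hidden)
```

Stacking several such layers, each feeding its outputs to the next, gives the forward pass of a multilayer perceptron.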
Figure 1: Artificial Neural Network Structure.
Figure 1 depicts the architecture of the ANN model, commonly referred to as the Multilayer
Perceptron Network (MLPN). This model falls within the category of Feed-forward backprop-
agation Neural Networks (FFNN), characterized by its composition of multiple hidden layers
and input/output layers that enable the seamless flow of data throughout the network (Bedi
et al., 2020).
2.3.4. Hyper-Parameter Tuning
While machine learning methods eliminate the need for certain distribution assumptions,
they introduce another factor known as hyper-parameters, which essentially act as guiding
settings during the learning process. To fine-tune these hyper-parameters, we conducted a
grid search, evaluating model performance based on mean square error and cross-validation R²
values (indicating prediction accuracy). For our neural network component, we took further
steps to enhance its performance. Specifically, we optimized the architecture by selecting three
hidden layers and adjusting the number of neurons within them. Moreover, we tuned
the number of learning iterations (epochs), the activation function for the neurons,
the batch size, the dropout rate, and the learning rate, which controls
the size of each weight update. All of these measures were implemented
to yield the most effective outcomes from our model. This whole tuning process was done using
Keras tuner. Table 2 depicts the result of the tuning process.
Table 2: Best hyperparameters from Keras tuner.
Hidden Layer  Neurons  Learning Rate  Activation Function  Dropout Rate  Batch size
1             71       0.001          ReLU                 0.2           16
2             65       0.001          ReLU                 0.2           16
3             10       0.001          ReLU                 0.2           16
2.4. Validation Metrics
The control parameters of the models were initially chosen and then adjusted through trials
to achieve the most optimal fitness measures. To assess the effectiveness of the proposed
models, four statistical indicators were employed: root mean square error (RMSE), coefficient
of determination (R2), mean absolute error (MAE), and the Kling-Gupta Efficiency (KGE).
2.4.1. RMSE
RMSE quantifies the standard deviation of the disparities between predicted and observed
values, known as residuals. A diminished RMSE signifies a model with reduced errors. The
formula for RMSE is depicted in equation 3 as:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{pred}_i - \mathrm{observed}_i\right)^2} \tag{3} \]
2.4.2. MAE
The MAE signifies the average of the absolute differences between the predicted and ob-
served values in the dataset. A smaller MAE value indicates a more accurate model, as it
reflects reduced errors. The formula for MAE is depicted in equation 4 as:
\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{observed}_i - \mathrm{pred}_i\right| \tag{4} \]
2.4.3. R2
The R2 indicates the fraction of the variance in the dependent variable that can be predicted
by the independent variables in the model. A value of 1 indicates a perfect fit. For a
least-squares fit evaluated on its own training data R2 ranges from 0 to 1; on independent data
it can become negative when the model predicts worse than the mean of the observations.
A higher R2 value suggests that the model explains a larger portion of the variability
in the data. The formula for R2 is depicted in equation 5 as:
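\[ R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\mathrm{observed}_i - \mathrm{pred}_i\right)^2}{\sum_{i=1}^{n}\left(\mathrm{observed}_i - \overline{\mathrm{observed}}\right)^2} \tag{5} \]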
2.4.4. KGE
The KGE is a metric used to evaluate model performance. It combines correlation, mean
ratio, and variability ratio. A perfect match between observed and simulated data yields a
KGE value of 1, indicating optimal model performance. The formula for KGE is depicted in
equation 6 as:
\[ \mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2} \tag{6} \]
Where:
r is the correlation coefficient between observed and predicted values,
β is the ratio of the standard deviation of predicted values to the standard deviation of
observed values,
γ is the ratio of the mean of predicted values to the mean of observed values.
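As a compact illustration of equations 3 to 6, the sketch below computes the four indicators with the Python standard library, following the definitions of r, β, and γ given above. The function names and the sample values in obs and pred are illustrative assumptions and are not taken from the study's data.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    """Population standard deviation."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def rmse(obs, pred):
    """Equation 3: root mean square error."""
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    """Equation 4: mean absolute error."""
    return _mean([abs(o - p) for o, p in zip(obs, pred)])


def r2(obs, pred):
    """Equation 5: coefficient of determination."""
    m = _mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def kge(obs, pred):
    """Equation 6: Kling-Gupta Efficiency."""
    mo, mp = _mean(obs), _mean(pred)
    so, sp = _std(obs), _std(pred)
    cov = _mean([(o - mo) * (p - mp) for o, p in zip(obs, pred)])
    r = cov / (so * sp)  # correlation coefficient
    beta = sp / so       # ratio of standard deviations
    gamma = mp / mo      # ratio of means
    return 1 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# Illustrative AOD-like values only (not from the study).
obs = [0.40, 0.50, 0.60, 0.70]
pred = [0.45, 0.50, 0.55, 0.75]
print(rmse(obs, pred), mae(obs, pred), r2(obs, pred), kge(obs, pred))
```

Because the correlation and both ratios equal 1 for a perfect prediction, KGE then evaluates to 1, while RMSE and MAE evaluate to 0.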
3. Results and Discussion
3.1. AOD distribution over Ghana
The climatological distribution of aerosol levels across Ghana is shown in Figures 2a and
2b. These maps are derived from MODIS AOD retrievals at a wavelength of 550 nm, collected
over sixteen years (2003-2019). They show the geographical distribution of AOD values across
the nation and demonstrate that MODIS captures AOD throughout Ghana. The average AOD values
for Aqua and Terra over Ghana were 0.470 (±0.206 sd) and 0.528 (±0.208 sd), respectively;
that is, the Terra DTB product yielded a higher mean AOD than its Aqua equivalent.
Discrepancies in aerosol statistics between Terra and Aqua, despite their shared data processing
techniques, can be attributed to disparities in their orbital characteristics and the timing of
satellite overpasses. Terra follows a descending orbital trajectory, traversing the equator in a
southward direction around 10:30 am local solar time. Conversely, Aqua ascends northward
and undertakes its data collection roughly at 1:30 pm local time. The relatively short temporal
interval of approximately 3 hours between the overpasses of the Aqua and Terra satellites offers
a unique opportunity to merge data from both sources. This collaborative approach holds the
potential to mitigate data losses, primarily attributed to cloud-related issues. These variations
in overpass timings, shaped by the distinct orbital trajectories of the satellites, introduce the
potential for disparate aerosol statistics to emerge. These differences may arise due to the
influence of diurnal aerosol or cloud cycles, contributing to distinct sampling outcomes (Levy
et al., 2018). Additionally, the Terra satellite, with its morning overpass, is more inclined to
encounter aerosols characterized by notable hygroscopic growth during its observational jour-
ney. This phenomenon can be attributed to the heightened humidity levels typically prevalent
in tropical regions during the morning hours (Moradi et al., 2016). Several studies (Tsai et al.,
2011; Wang et al., 2010; Zhang et al., 2016; Engel-Cox et al., 2006) have highlighted the
significant influence of ambient RH on the correlation between satellite AOD and PM. Furthermore,
the Terra satellite’s morning orbit provides an opportunity to intercept a dense aerosol air mass
generated during the morning "rush hour" period. This air mass includes emissions such as soot
from wood fires, frequently used for domestic and commercial food preparation in the region, as
well as exhaust emissions from older vehicles. These aerosol particles may have been retained
within the stable nighttime and morning atmospheric conditions (Fosu-Amankwah et al., 2021).
The difference we observed in Aqua and Terra AOD retrievals in our study was approximately
6%. This difference, although noteworthy, contrasts with the findings of Levy et al. (2018) and Fosu-
Amankwah et al. (2021), who independently reported a more substantial statistical difference
of around 13% in global Aqua and Terra AOD retrievals. It is important to highlight that
the discrepancy we observed, which is nearly half of their reported difference, may be linked
to the fact that we integrated the DTB comprising the DT and DB products, as opposed to
their utilization of stand-alone products. Our observations reveal a concentration of elevated
AODs in the southwestern region of the country, while sporadically elevated AOD fluctuations
manifest along the mid and southeastern boundaries. These findings align with those of
Aklesso et al. (2018) and Fosu-Amankwah et al. (2021). Aklesso et al. (2018) attributed their
findings to the geographical characteristics of the southernmost regions of southern West Africa.
They pointed out that these areas are predominantly characterized by low elevations. They
observed that the multi-year averaged AOD550 (AOD at 550 nm) over this region exhibits an
upward trend as the elevation decreases. This phenomenon can be explained by the influence
of high terrain, which can either impede or modify wind directions. Such alterations in wind
patterns disrupt the horizontal dispersion of pollutants (Ma et al., 2016; Ning et al., 2018),
consequently reducing pollutant concentration levels. Furthermore, Fosu-
Amankwah et al. (2021) provided additional insights into the factors influencing the elevated
AOD values observed in southwestern Ghana. These heightened AOD values are postulated
to arise from various sources, including the presence of sea salt spray suspended within the
atmospheric column and aerosols originating from specific source regions. These components
not only directly contribute to localized dust accumulations but also exert a significant in-
fluence on overall aerosol loadings. In contrast, the comparatively higher AODs detected in
the middle and eastern sectors of Ghana are likely linked to anthropogenic activities. These
activities encompass emissions of both fine and coarse PM, primarily associated with surface
mining operations. Furthermore, these elevated AOD levels may result from emissions of BC
originating from biomass combustion and aerosols transported from distant source regions. The
complex interplay of these factors, coupled with the unique regional topography, contributes to
the observed AOD patterns. Of particular note is the presence of the Akuapim-Togo mountain
range within the eastern corridors of Ghana. This geographical feature significantly influences
the dispersion and containment of aerosols within the region. Consequently, the topographical
characteristics of the elevated terrain likely play a crucial role in contributing to the relatively
higher AOD levels observed in the eastern sector of the country. These observations underscore
the intricate interrelationships between geographical features, human activities, and aerosol
dynamics in shaping regional aerosol distribution patterns.
Figure 2: Average AOD distribution over Ghana for Aqua (a) and Terra (b) retrievals spanning 2003-2019
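The idea of merging Terra and Aqua retrievals to reduce cloud-related data losses, raised earlier in this section, can be sketched for a single pixel as follows. This is a hypothetical illustration rather than a procedure applied in the study: the function merge_aod, the NaN convention for missing retrievals, and the sample series are all assumptions.

```python
import math


def merge_aod(terra, aqua):
    """Combine same-day Terra and Aqua AOD for one pixel.

    Missing retrievals are represented as NaN. Returns the mean when both
    values are valid, the single valid value when only one is, else NaN.
    """
    valid = [v for v in (terra, aqua) if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


NAN = math.nan
terra_series = [0.52, NAN, 0.61, NAN]  # illustrative values only
aqua_series = [0.48, 0.44, NAN, NAN]

merged = [merge_aod(t, a) for t, a in zip(terra_series, aqua_series)]
coverage_before = sum(not math.isnan(v) for v in terra_series) / len(terra_series)
coverage_after = sum(not math.isnan(v) for v in merged) / len(merged)
print(merged, coverage_before, coverage_after)
```

A plain average ignores the systematic offset between the morning and afternoon overpasses discussed above, so a practical merge would likely need to account for that offset before combining the two records.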
3.2. Feature Importance
According to Mayer and Gróf (2021) and Markovics and Mayer (2022), the process of feature
selection stands out as a crucial phase in machine learning, possibly even more important
than the model selection itself. Understanding the individual contribution of each covariate
to the model’s performance is indispensable. The inclusion of irrelevant variables and highly
correlated variables, often referred to as multi-collinearity, can significantly impair a model’s
effectiveness. In this study, we employed both correlation analysis and the intrinsic feature
importance attribute derived from the random forest (RF) algorithm. These methods were in-
strumental in our quest to identify the most optimal features for our models. Figure 3 provides a
comprehensive visualization of the contributions made by various covariates for both the AQUA
and TERRA datasets, obtained through the RF feature importance property. For TERRA,
the most influential feature was the downward shortwave flux, contributing approximately 50%,
followed by the latent heat flux, which accounted for around 12% of the variance. Notably, the
other features contributed less than 10% individually. Conversely, in the case of AQUA, the
downward shortwave flux remained paramount, contributing about 45%. Here, the contribution
of latent heat flux decreased from 12% to 8%, while that of sensible heat flux rose from 8%
to 17%, making it the second most influential feature for predicting AQUA AOD. The shift
between sensible and latent heat flux for AQUA and TERRA can be ascribed to the difference
in satellite overpass times. For TERRA, the overpass occurs in the morning, before the sun
has reached its zenith. Overnight, surface temperatures tend to decrease, causing moisture
present on surfaces to undergo a phase change that releases latent heat. This latent heat
energy influences the aerosols detected by the satellite during its morning overpass.
Conversely, AQUA’s overpass occurs in the afternoon, by which time the sun has warmed the
surface and the adjacent air. This warm air ascends, carrying heat energy with it, and thereby
influences the aerosols detected by the satellite. Figure 4 illustrates the correlation matrix,
which serves a dual purpose: firstly, it aids in identifying covariates that exhibit strong
correlations with the target variable; secondly, it facilitates the detection of highly
intercorrelated variables, thereby helping to mitigate multicollinearity. After eliminating
highly correlated and redundant variables, the selected set comprises the following key
parameters: downward shortwave flux, latent heat flux, sensible heat flux, evaporation, pressure,
and temperature.
Figure 3: Relative contribution (%) of individual covariates on model performance.
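The correlation-based part of this screening can be sketched as a simple greedy filter: covariates are ranked by the absolute value of their correlation with the target, and a covariate is dropped if it is too strongly correlated with one already kept. The threshold of 0.9, the function names, and the toy data are assumptions for illustration; the study does not report a threshold, and the RF-based importance ranking is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(features, target, max_inter_corr=0.9):
    """Rank by |correlation with target|, then drop near-duplicate covariates."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < max_inter_corr for k in kept):
            kept.append(name)
    return kept


# Toy data (illustrative only): the second covariate nearly duplicates the first.
target = [0.3, 0.4, 0.5, 0.6, 0.7]
features = {
    "shortwave_flux": [1.0, 2.0, 3.0, 4.0, 5.0],
    "shortwave_flux_dup": [1.1, 1.9, 3.2, 3.8, 5.1],
    "pressure": [5.0, 3.0, 6.0, 2.0, 4.0],
}
selected = select_features(features, target)
print(selected)
```

In this toy case the near-duplicate covariate is removed because it carries almost the same information as the one ranked above it, which is the redundancy the correlation matrix is meant to expose.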
3.3. Spatial Patterns of the Predicted AOD Levels
To evaluate the models’ ability to predict the spatiotemporal distribution of AOD at a
1 km resolution over the region, an array of validation and performance metrics was employed,
as previously detailed in section 2.4. Machine learning models were applied across the entirety of
the region. Figures 5a and 5b illustrate the diverse metrics utilized for evaluating the predictive
capabilities of TERRA and AQUA data, respectively. In Fig 5a, the first row showcases RMSE
values. Minimal RMSE values were observed for nearly the entire country, particularly in
the northern and middle sectors, ranging from 0.025 to 0.100. Notably, all models exhibited
suboptimal performance in the southwestern sector of the country, with RMSE values nearing
0.200. Similarly, the second row, representing Mean Absolute Error (MAE) values, displayed
a comparable bias. Models performed exceptionally well in the northern and middle sectors,
exhibiting MAE values ranging from 0.100 to 0.175. Conversely, in the southwestern part of the
country, all models demonstrated elevated MAE values, approaching 0.300. Moreover, in the third
row, which depicts R2, both the MLR and ANN(MLR) models surpassed the traditional ANN
model with hidden layers, yielding R2 values ranging from 0.50 to 0.85 across the entire country.
Notably, in the southwestern part of the country, both the traditional ANN and ANN(MLR)
models outperformed the MLR model. Finally, the last row illustrates KGE values for the
models. It is evident from the figure that both the traditional ANN and ANN(MLR) models
outperformed the MLR model, specifically in the middle and southern sectors of the country,
where KGE values ranged from 0.50 to 0.8. Notably, the MLR model exhibited comparatively
low KGE values in the southwestern part of the country compared to the other models.
In Fig. 5b, the first row depicts RMSE values for various models’ predictions of AOD
from the AQUA satellite across the study area. Among these models, the ANN(MLR) model
demonstrated exceptional performance, displaying RMSE values ranging from 0.025 to 0.125
nationwide. Conversely, both the traditional ANN and MLR models exhibited comparatively
inferior performance, specifically in the middle and southern sectors of the country, where RMSE
values ranged from 0.125 to 0.175. In the second row, denoting MAE, all models performed
remarkably well in the northwestern part of the country, displaying MAE values ranging from
0.125 to 0.200. In the middle and southern sectors, ANN demonstrated superior performance,
closely followed by ANN(MLR). MLR, however, displayed the lowest performance in these
sectors, with values ranging from 0.225 to 0.300. Moving to the third row representing the R2,
the MLR model exhibited superior performance nationwide, closely followed by ANN(MLR).
The traditional ANN model showcased notable proficiency, particularly in the northwestern
and eastern parts of the middle sector, where R2 values ranged from 0.5 to 0.75. Conversely,
in the western part of the middle sector extending towards the southern region, both MLR
and ANN(MLR) models displayed superior performance. In the final row, representing KGE
values, all models demonstrated high KGE values, especially in the northern and southern
sectors, ranging from 0.50 to 0.85. However, within the middle sector, the ANN(MLR) model
displayed superior performance, closely followed by the traditional ANN model. MLR yielded
lower KGE values in this sector, ranging from 0.1 to 0.25.
Figure 4: Correlation Matrix.
Figure 5: Error Metrics for the Predicted Spatial Distribution of AOD levels for Terra (a) and Aqua (b) retrievals
over Ghana spanning 2003-2019
On average, the models demonstrated exceptional predictive accuracy for AQUA and TERRA
data across much of the country, particularly in the northern and middle regions. However,
challenges arose in the southern sector, particularly in the southwestern and specific middle
regions. This difficulty can be attributed to the exclusive utilization of meteorological variables
as input data in our models. The southwestern region, as detailed by Fosu-Amankwah et al.
(2021), is characterized by substantial vegetation cover. Under low to moderate temperatures,
dense vegetation could potentially serve as a significant source of biogenic aerosols, as noted by
Charlson et al. (1992). Consequently, this phenomenon could influence aerosol optical depths,
especially observed by TERRA during its morning overpass. Additionally, the presence of dense
vegetation in the area could lead to challenges related to illegal timber cutting which occurs
mostly during nighttime, directly contributing to local dust loadings and plant debris. These
factors, in turn, impact aerosol loading. Unfortunately, the absence of readily available ground
observation data prevented the incorporation of such meaningful covariates into our models.
Consequently, our models were limited in their ability to efficiently learn from these complex
environmental variables, underscoring the importance of integrating comprehensive datasets
for more accurate and nuanced predictions.
3.4. ML aerosol assessments over selected locales.
Within the framework of our machine learning evaluations, our attention was directed to-
ward Accra and Takoradi, two significant urban hubs within Ghana’s landscape. Accra, being
the nation’s capital, boasts a substantial population of approximately 3 million (Rain et al.,
2011; Ghana Statistical Service, 2019). Its demographic expansion is notably rapid, given its
status as a primary recipient of migrants compared to other regions (Ghana Statistical Service,
2019). In contrast, Takoradi, often referred to as the oil city, is a coastal metropolitan area
situated about 280 km west of Accra, positioned within the southwestern expanse of Ghana.
Upon analyzing the AOD datasets across major Ghanaian cities, our analysis revealed that
these two metropolitan regions exhibited the highest mean aerosol burden. Specifically, Accra
demonstrated readings of 0.56 for TERRA and 0.48 for AQUA, while Takoradi displayed values
of 0.56 for TERRA and 0.51 for AQUA. The escalated aerosol levels in Accra can be attributed
to rapidly increasing population figures, extensive industrial and economic activities, and a high
volume of vehicular traffic, especially during morning rush hours, leading to significant vehicu-
lar emissions. Conversely, in Takoradi, heightened aerosol burdens could result from emissions
originating from offshore oil rigs and gas industries. The transport of these emissions over
the city is influenced by factors such as wind direction, speed, and proximity to the coastline.
This observation motivated our research focus: to assess the predictive capabilities
of the employed machine learning models in simulating and predicting fluctuations in aerosol
burden levels within these locales. Our objective was to critically evaluate the performance
of these models under such high-aerosol conditions. Figure 6 presents a comparative analysis
of observed and simulated AOD levels over Accra and Takoradi, employing machine learning
models developed within this study.
In the case of Accra (AQUA) represented in the first row, the ANN(MLR) model demonstrated
superior performance, exhibiting an RMSE value of 0.18, MAE value of 0.203, R2 value
of 0.75, and a KGE value of 0.72. Following closely was the MLR model, outperforming the
ANN model. Notably, the ANN model displayed noteworthy skill in terms of MAE values,
boasting the lowest MAE value of 0.163. Refer to Table 3 for a comprehensive summary of
the utilized metrics. In the context of Accra (TERRA), all models demonstrated commendable
performance. However, considering KGE values, ANN(MLR) surpassed both the MLR and
ANN models. Notably, the MLR model exhibited the lowest KGE value at 0.45, compared
to the impressive 0.72 for ANN(MLR) and 0.69 for ANN. Moving to Takoradi (AQUA), the
ANN(MLR) model exhibited exceptional proficiency, yielding an RMSE value of 0.110, MAE
value of 0.122, R2 value of 0.72, and the highest recorded KGE value among the models at 0.76.
Subsequently, the MLR model and the ANN model followed suit. Finally, examining Takoradi
(TERRA) in the last row, all models showcased commendable performance in simulating AOD
levels over the region. Yet, the ANN(MLR) model demonstrated superior accuracy, closely
trailed by the MLR model and then the ANN model. Specifically, the ANN(MLR) model displayed
an RMSE value of 0.173, MAE value of 0.273, R2 value of 0.67, and a KGE value of
0.63. Notably, the MLR model recorded the lowest KGE value, standing at 0.31. A detailed
summary of the various metrics employed in this analysis is presented in Table 3.
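The ranking by KGE described above can be checked directly from the values reported in Table 3. The short sketch below transcribes those KGE values and identifies the best model per location as well as the mean KGE per model; the variable names are illustrative, and the unweighted mean over the four cases is an assumption made only for this summary.

```python
# KGE values transcribed from Table 3, ordered as (MLR, ANN(MLR), ANN).
MODELS = ("MLR", "ANN(MLR)", "ANN")
KGE = {
    "Accra (AQUA)": (0.69, 0.72, 0.67),
    "Accra (TERRA)": (0.45, 0.72, 0.69),
    "Takoradi (AQUA)": (0.61, 0.76, 0.67),
    "Takoradi (TERRA)": (0.31, 0.63, 0.60),
}

# Model with the highest KGE at each location.
best_by_kge = {
    site: MODELS[max(range(len(MODELS)), key=lambda i: scores[i])]
    for site, scores in KGE.items()
}

# Unweighted mean KGE of each model over the four location/satellite cases.
mean_kge = {
    model: sum(scores[i] for scores in KGE.values()) / len(KGE)
    for i, model in enumerate(MODELS)
}

print(best_by_kge)
print(mean_kge)
```

On the tabulated values, ANN(MLR) has the highest KGE in all four cases, consistent with the discussion in this section.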
Figure 6: Comparison between predicted and observed variations in AOD levels over selected locales.
Table 3: Summary of the Various Metrics employed over selected locales
                                Results
Location            Metric   MLR    ANN(MLR)   ANN
Accra (AQUA)        RMSE     0.12   0.08       0.12
                    MAE      0.23   0.20       0.16
                    R2       0.73   0.75       0.43
                    KGE      0.69   0.72       0.67
Accra (TERRA)       RMSE     0.08   0.09       0.10
                    MAE      0.18   0.18       0.19
                    R2       0.76   0.72       0.21
                    KGE      0.45   0.72       0.69
Takoradi (AQUA)     RMSE     0.18   0.11       0.17
                    MAE      0.24   0.12       0.21
                    R2       0.74   0.72       0.25
                    KGE      0.61   0.76       0.67
Takoradi (TERRA)    RMSE     0.19   0.17       0.19
                    MAE      0.28   0.27       0.27
                    R2       0.21   0.67       0.24
                    KGE      0.31   0.63       0.60
4. Conclusions
The study delineates a comprehensive spatio-temporal assessment of aerosol distribution
across Ghana and two of its prominent cities using MODIS AOD data at a spatial resolution of
1 km. This investigation spans a sixteen-year period (2003–2019) and delves into the intricate
patterns of aerosol distribution and concentration. AOD is a key predictor of PM2.5, with higher
levels of AOD indicating higher PM2.5 levels, a relationship underscored by several referenced
studies.
Guided by the principle of the No Free Lunch (NFL) theorem, which underscores the ab-
sence of a universally optimal algorithm for all problems, we undertook an evaluation of the
performance of two distinct machine learning algorithms in predicting AOD values over the en-
tire country and some selected locales, specifically, Accra and Takoradi. The selection of these
regions is grounded in their status as major cities in Ghana, characterized by the highest mean
aerosol burden in comparison to other significant urban centers. To the best of our knowledge,
this study is the first to use machine learning models to perform AOD assessments over the
country. The following conclusions are drawn:
1. The analysis of spatio-temporal aerosol distribution revealed noteworthy insights. The
examination of MODIS Aqua and Terra AOD retrievals unveiled an overall lower aerosol
burden over Ghana, marked by mean AOD values hovering around 0.35. Moreover, the
retrieval patterns demonstrated a subtle variance of approximately 0.06 between the mean
Terra and Aqua AODs. Our research reveals distinctive patterns in aerosol concentra-
tion across various regions within our study area. The southwestern part of the country
consistently exhibits elevated aerosol loadings, while the northern, eastern coastal areas
and some parts of the middle sector, consistently display lower aerosol concentrations.
This recurring observation aligns with prior studies undertaken in the same geographical
vicinity (Aklesso et al., 2018; Fosu-Amankwah et al., 2021) and can be ascribed to various
factors. The presence of dense vegetation in the southwestern region likely contributes
to this aerosol distribution pattern, possibly linked to increased biogenic emissions. Ad-
ditionally, Fosu-Amankwah et al. (2021) suggested that elevated aerosol loadings in the
southwestern sector of the country result from a combination of factors, including the
complex dynamics of sea salt spray deposition from oceanic bubble eruptions and emis-
sions from the petroleum and gas sectors along the western coast. The proximity of these
coastal phenomena to the southwestern region significantly contributes to higher aerosol
concentrations. Aklesso et al. (2018) also attributed their findings to geographical char-
acteristics in southern West Africa, where low elevations prevail and elevated terrain can
alter wind patterns, affecting pollutant dispersion.
2. Utilizing a comprehensive range of validation metrics for assessing model performance, we
can conclude that all models developed in the study exhibited an acceptable level
of accuracy. However, the MLR model built on the ANN architecture, ANN(MLR),
exhibited superior predictive capabilities compared to both the original MLR and the
standard Keras-tuned ANN model.
3. The sub-optimal performance of the standard ANN model aligns with previous research
(He et al., 2016; Srivastava and Singh, 2015), indicating that standard feed-forward neu-
ral networks might encounter issues related to saturation and reduced accuracy as the
number of hidden layers increases, as observed in various similar studies. This modest per-
formance of the ANN model might be attributed to inherent limitations. Specifically, two
prominent limitations include overfitting, where the model becomes too closely tailored
to training data, leading to poorer generalization on unseen data, and the requirement of
substantial data, which often necessitates large datasets to achieve meaningful learning.
Other noteworthy limitations include the local minimum trap and the exploding gradient
problems. More information on these limitations can be found in Shang and Wah (1996).
4. In the selected major cities, the presence of elevated pollution levels, predominantly at-
tributed to biogenic and anthropogenic emissions, is a defining characteristic. However,
our preprocessing of the MODIS data revealed a widespread occurrence of NaN
(Not a Number) values. This challenge, coupled with limited ground-level observations,
severely constrains our ability to systematically monitor and forecast air quality in these
urban centers and the broader region. Leveraging machine learning models presents a
promising solution to mitigate this substantial constraint. These models offer an effective
approach for monitoring and predicting air quality using readily accessible data sources,
while maintaining an acceptable level of accuracy, thereby overcoming the limitations
posed by data gaps and enabling more robust environmental assessments.
5. For future studies, it’s worth noting that the datasets we used in our research have a
relatively basic level of detail in terms of spatial resolution. To improve the precision of
upcoming studies, it would be beneficial to consider datasets from other data sources with
finer spatial resolutions. These more detailed datasets would allow for quicker and more
thorough assessments of aerosol distributions and related factors, providing a more com-
prehensive understanding of the subject matter. Also, our analysis primarily concentrated
on meteorological variables as input data for the models. However, it is recommended
that future investigations consider the integration of additional variables, such as land
use characteristics, population growth patterns, vehicular emissions, agricultural residue
burning, domestic waste burning, industrial/biogenic emissions, the density of transportation
hubs, and daily traffic counts. These supplementary variables can provide a more
holistic understanding of the factors influencing aerosol distribution and facilitate more
comprehensive predictive models.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.
Funding
This research received no external funding.
References
Abokyi, E., Appiah-Konadu, P., Abokyi, F., and Oteng-Abayie, E. F. (2019). Industrial growth
and emissions of co2 in ghana: the role of financial development and fossil fuel consumption.
Energy Reports, 5:1339–1353.
Acharya, P. and Sreekesh, S. (2013). Seasonal variability in aerosol optical depth over india:
a spatio-temporal analysis using the modis aerosol product. International journal of remote
sensing, 34(13):4832–4849.
Agyemang-Bonsu, K., Dontwi, I., Tutu-Benefoh, D. a., Bentil, D., Boateng, O., Asuobonteng,
K., and Agyemang, W. (2010). Traffic-data driven modelling of vehicular emissions using
copert iii in ghana: A case study of kumasi.
Aklesso, M., Kumar, K. R., Bu, L., and Boiyo, R. (2018). Analysis of spatial-temporal hetero-
geneity in remotely sensed aerosol properties observed during 2005–2015 over three countries
along the gulf of guinea coast in southern west africa. Atmospheric Environment, 182:313–
324.
Amekudzi, L. K., Yamba, E. I., Preko, K., Asare, E. O., Aryee, J., Baidu, M., and Codjoe, S. N.
(2015). Variabilities in rainfall onset, cessation and length of rainy season for the various
agro-ecological zones of ghana. Climate, 3(2):416–434.
Amoatey, P., Omidvarborna, H., and Baawain, M. (2018). The modeling and health risk
assessment of pm2.5 from tema oil refinery. Human and Ecological Risk Assessment: An
International Journal, 24(5):1181–1196.
Apte, J. S., Marshall, J. D., Cohen, A. J., and Brauer, M. (2015). Addressing global mortality
from ambient pm2.5. Environmental science & technology, 49(13):8057–8066.
Arif, F. and Akbar, M. (2005). Resampling air borne sensed data using bilinear interpolation
algorithm. In IEEE International Conference on Mechatronics, 2005. ICM’05., pages 62–65.
IEEE.
Aryee, J., Amekudzi, L., Quansah, E., Klutse, N., Atiah, W., and Yorke, C. (2018). Develop-
ment of high spatial resolution rainfall data for ghana. International Journal of Climatology,
38(3):1201–1215.
Asante, F. A. and Amuakwa-Mensah, F. (2014). Climate change and variability in ghana:
Stocktaking. Climate, 3(1):78–101.
Bai, Y., Zeng, B., Li, C., and Zhang, J. (2019). An ensemble long short-term memory neural
network for hourly pm2.5 concentration forecasting. Chemosphere, 222:286–294.
Baidu, M., Amekudzi, L. K., Aryee, J. N., and Annor, T. (2017). Assessment of long-term
spatio-temporal rainfall variability over ghana using wavelet analysis. Climate, 5(2):30.
Bedi, S., Samal, A., Ray, C., and Snow, D. (2020). Comparative evaluation of machine learning
models for groundwater quality assessment. Environmental Monitoring and Assessment,
192:1–23.
Boucher, O., Randall, D., Artaxo, P., Bretherton, C., Feingold, G., Forster, P., Kerminen, V.-
M., Kondo, Y., Liao, H., Lohmann, U., et al. (2013). Clouds and aerosols. In Climate change
2013: The physical science basis. Contribution of working group I to the fifth assessment re-
port of the intergovernmental panel on climate change, pages 571–657. Cambridge University
Press.
Cai, S., Wang, Y., Zhao, B., Wang, S., Chang, X., and Hao, J. (2017). The impact of the “air
pollution prevention and control action plan” on pm2.5 concentrations in jing-jin-ji region
during 2012–2020. Science of the Total Environment, 580:197–209.
Charlson, R. J., Schwartz, S., Hales, J., Cess, R. D., Coakley Jr, J., Hansen, J., and Hofmann,
D. (1992). Climate forcing by anthropogenic aerosols. Science, 255(5043):423–430.
Chu, D., Kaufman, Y., Ichoku, C., Remer, L., Tanré, D., and Holben, B. (2002). Validation of
modis aerosol optical depth retrieval over land. Geophysical research letters, 29(12):MOD2–1.
Chu, D. A., Kaufman, Y., Zibordi, G., Chern, J., Mao, J., Li, C., and Holben, B. (2003). Global
monitoring of air pollution over land from the earth observing system-terra moderate reso-
lution imaging spectroradiometer (modis). Journal of Geophysical Research: Atmospheres,
108(D21).
Creamean, J. M., Suski, K. J., Rosenfeld, D., Cazorla, A., DeMott, P. J., Sullivan, R. C., White,
A. B., Ralph, F. M., Minnis, P., Comstock, J. M., et al. (2013). Dust and biological aerosols
from the sahara and asia influence precipitation in the western us. science, 339(6127):1572–
1578.
Dabass, A., Talbott, E. O., Rager, J. R., Marsh, G. M., Venkat, A., Holguin, F., and Sharma,
R. K. (2018). Systemic inflammatory markers associated with cardiovascular disease and
acute and chronic exposure to fine particulate matter air pollution (pm2.5) among us nhanes
adults with metabolic syndrome. Environmental research, 161:485–491.
Dandou, A., Bosioli, E., Tombrou, M., Sifakis, N., Paronis, D., Soulakellis, N., and Sarigiannis,
D. (2002). The importance of mixing height in characterising pollution levels from aerosol
optical thickness derived by satellite. Water, Air and Soil Pollution: Focus, 2:17–28.
Danesh Yazdi, M., Kuang, Z., Dimakopoulou, K., Barratt, B., Suel, E., Amini, H., Lyapustin,
A., Katsouyanni, K., and Schwartz, J. (2020). Predicting fine particulate matter (pm2.5)
in the greater london area: An ensemble approach using machine learning methods. Remote
Sensing, 12(6):914.
David, L. M., Ravishankara, A., Kodros, J. K., Pierce, J. R., Venkataraman, C., and Sadavarte,
P. (2019). Premature mortality due to pm2.5 over india: Effect of atmospheric transport
and anthropogenic emissions. GeoHealth, 3(1):2–10.
De Longueville, F., Hountondji, Y.-C., Henry, S., and Ozer, P. (2010). What do we know
about effects of desert dust on air quality and human health in west africa compared to other
regions? Science of the total environment, 409(1):1–8.
Di, Q., Koutrakis, P., and Schwartz, J. (2016). A hybrid prediction model for pm2.5 mass
and components using a chemical transport model and land use regression. Atmospheric
environment, 131:390–399.
Engel-Cox, J. A., Hoff, R. M., Rogers, R., Dimmick, F., Rush, A. C., Szykman, J. J., Al-Saadi,
J., Chu, D. A., and Zell, E. R. (2006). Integrating lidar and satellite optical depth with
ambient monitoring for 3-dimensional particulate characterization. Atmospheric Environment,
40(40):8056–8067.
Engel-Cox, J. A., Holloman, C. H., Coutant, B. W., and Hoff, R. M. (2004). Qualitative and
quantitative evaluation of modis satellite sensor data for regional and urban scale air quality.
Atmospheric environment, 38(16):2495–2509.
Fosu-Amankwah, K., Bessardon, G. E., Quansah, E., Amekudzi, L. K., Brooks, B. J., and
Damoah, R. (2021). Assessment of aerosol burden over ghana. Scientific African, 14:e00971.
García-Pando, C. P., Stanton, M. C., Diggle, P. J., Trzaska, S., Miller, R. L., Perlwitz, J. P.,
Baldasano, J. M., Cuevas, E., Ceccato, P., Yaka, P., et al. (2014). Soil dust aerosols and wind
as predictors of seasonal meningitis incidence in niger. Environmental health perspectives,
122(7):679–686.
Geng, G., Zhang, Q., Martin, R. V., van Donkelaar, A., Huo, H., Che, H., Lin, J., and He,
K. (2015). Estimating long-term pm2.5 concentrations in china using satellite-based aerosol
optical depth and a chemical transport model. Remote sensing of Environment, 166:262–270.
Ghana Statistical Service (2014). Ghana Statistical Service. https://www.statsghana.gov.gh.
Accessed: October 27, 2023.
Ghana Statistical Service (2019). Ghana Statistical Service. https://www.statsghana.gov.gh.
Accessed: October 27, 2023.
Gupta, P., Christopher, S. A., Box, M. A., and Box, G. P. (2007). Multi year satellite remote
sensing of particulate matter air quality over sydney, australia. International Journal of
Remote Sensing, 28(20):4483–4498.
Gupta, P., Christopher, S. A., Wang, J., Gehrig, R., Lee, Y., and Kumar, N. (2006). Satellite
remote sensing of particulate matter and air quality assessment over global cities. Atmospheric
Environment, 40(30):5880–5892.
He, J., Yu, Y., Xie, Y., Mao, H., Wu, L., Liu, N., and Zhao, S. (2016). Numerical model-based
artificial neural network model and its application for quantifying impact factors of urban
air quality. Water, Air, & Soil Pollution, 227:1–16.
Hinds, W. C. and Zhu, Y. (2022). Aerosol technology: properties, behavior, and measurement
of airborne particles. John Wiley & Sons.
Hsu, N., Lee, J., Sayer, A., Kim, W., Bettenhausen, C., and Tsay, S.-C. (2019). Viirs deep
blue aerosol products over land: Extending the eos long-term aerosol data records. Journal
of Geophysical Research: Atmospheres, 124(7):4026–4053.
Hu, X., Waller, L. A., Al-Hamdan, M. Z., Crosson, W. L., Estes Jr, M. G., Estes, S. M.,
Quattrochi, D. A., Sarnat, J. A., and Liu, Y. (2013). Estimating ground-level pm2.5
concentrations in the southeastern us using geographically weighted regression. Environmental
research, 121:1–10.
Janicot, S., Caniaux, G., Chauvin, F., de Coëtlogon, G., Fontaine, B., Hall, N., Kiladis, G.,
Lafore, J.-P., Lavaysse, C., Lavender, S. L., et al. (2011). Intraseasonal variability of the west
african monsoon. Atmospheric Science Letters, 12(1):58–66.
Johnson, B., Shine, K., and Forster, P. (2004). The semi-direct aerosol effect: Impact of
absorbing aerosols on marine stratocumulus. Quarterly Journal of the Royal Meteorological
Society, 130(599):1407–1422.
Jung, C.-R., Young, L.-H., Hsu, H.-T., Lin, M.-Y., Chen, Y.-C., Hwang, B.-F., and Tsai, P.-J.
(2017). Pm2.5 components and outpatient visits for asthma: A time-stratified case-crossover
study in a suburban area. Environmental Pollution, 231:1085–1092.
Kabo-Bah, A. T., Diji, C. J., Nokoe, K., Mulugetta, Y., Obeng-Ofori, D., and Akpoti, K.
(2016). Multiyear rainfall and temperature trends in the volta river basin and their potential
impact on hydropower generation in ghana. Climate, 4(4):49.
Kahn, R. A. and Gaitley, B. J. (2015). An analysis of global aerosol type as retrieved by misr.
Journal of Geophysical Research: Atmospheres, 120(9):4248–4281.
Kahn, R. A., Nelson, D. L., Garay, M. J., Levy, R. C., Bull, M. A., Diner, D. J., Martonchik,
J. V., Paradise, S. R., Hansen, E. G., and Remer, L. A. (2009). Misr aerosol product
attributes and statistical comparisons with modis. IEEE Transactions on Geoscience and
Remote Sensing, 47(12):4095–4114.
Kanabkaew, T. (2013). Prediction of hourly particulate matter concentrations in chiangmai,
thailand using modis aerosol optical depth and ground-based meteorological data.
EnvironmentAsia, 6(2).
Kato, Y. (2018). Application of dust and pm2.5 detection methods using modis data to the
asian dust events which aggravated respiratory symptoms in western japan in may 2011.
In Remote Sensing of the Atmosphere, Clouds, and Precipitation VII, volume 10776, pages
144–159. SPIE.
Kloog, I., Nordio, F., Coull, B. A., and Schwartz, J. (2012). Incorporating local land use
regression and satellite aerosol optical depth in a hybrid model of spatiotemporal pm2.5
exposures in the mid-atlantic states. Environmental science & technology, 46(21):11913–
11921.
Kondratyev, K. Y., Ivlev, L. S., Krapivin, V. F., and Varotsos, C. A. (2006). Atmospheric
aerosol properties: formation, processes and impacts. Springer Science & Business Media.
Lee, H., Liu, Y., Coull, B., Schwartz, J., and Koutrakis, P. (2011). A novel calibration approach
of modis aod data to predict pm2.5 concentrations. Atmospheric Chemistry and Physics,
11(15):7991–8002.
Levy, R., Mattoo, S., Munchak, L., Remer, L., Sayer, A., Patadia, F., and Hsu, N. (2013).
The collection 6 modis aerosol products over land and ocean. Atmospheric Measurement
Techniques, 6(11):2989–3034.
Levy, R., Remer, L., Kleidman, R., Mattoo, S., Ichoku, C., Kahn, R., and Eck, T. (2010). Global
evaluation of the collection 5 modis dark-target aerosol products over land. Atmospheric
Chemistry and Physics, 10(21):10399–10420.
Levy, R. C., Mattoo, S., Sawyer, V., Shi, Y., Colarco, P. R., Lyapustin, A. I., Wang, Y., and
Remer, L. A. (2018). Exploring systematic offsets between aerosol products from the two
modis sensors. Atmospheric Measurement Techniques, 11(7):4073–4092.
Levy, R. C., Remer, L. A., Mattoo, S., Vermote, E. F., and Kaufman, Y. J. (2007). Second-
generation operational algorithm: Retrieval of aerosol properties over land from inversion of
moderate resolution imaging spectroradiometer spectral reflectance. Journal of Geophysical
Research: Atmospheres, 112(D13).
Li, L. (2020). A robust deep learning approach for spatiotemporal estimation of satellite aod
and pm2.5. Remote Sensing, 12(2):264.
Liu, Y., Franklin, M., Kahn, R., and Koutrakis, P. (2007). Using aerosol optical thickness to
predict ground-level pm2.5 concentrations in the st. louis area: A comparison between misr
and modis. Remote sensing of Environment, 107(1-2):33–44.
Lohmann, U. and Feichter, J. (2005). Global indirect aerosol effects: a review. Atmospheric
Chemistry and Physics, 5(3):715–737.
Louvet, S., Fontaine, B., and Roucou, P. (2005). Active phase and pauses of the west african
monsoon and associated atmospheric dynamics. In European Geoscience Union (EGU).
Ma, Z., Hu, X., Sayer, A. M., Levy, R., Zhang, Q., Xue, Y., Tong, S., Bi, J., Huang, L.,
and Liu, Y. (2016). Satellite-based spatiotemporal trends in pm2.5 concentrations: China,
2004–2013. Environmental health perspectives, 124(2):184–192.
Manzanas, R., Amekudzi, L., Preko, K., Herrera, S., and Gutiérrez, J. M. (2014). Precipitation
variability and trends in ghana: An intercomparison of observational and reanalysis products.
Climatic change, 124:805–819.
Markovics, D. and Mayer, M. J. (2022). Comparison of machine learning methods for
photovoltaic power forecasting based on numerical weather prediction. Renewable and Sustainable
Energy Reviews, 161:112364.
Mattoo, S. (2017). Aerosol dark target (10 km & 3 km) collection 6.1 changes.
Mayer, M. J. and Gróf, G. (2021). Extensive comparison of physical models for photovoltaic
power forecasting. Applied Energy, 283:116239.
Mhawish, A., Banerjee, T., Broday, D. M., Misra, A., and Tripathi, S. N. (2017). Evaluation
of modis collection 6 aerosol retrieval algorithms over indo-gangetic plain: Implications of
aerosols types and mass loading. Remote sensing of environment, 201:297–313.
Ministry of Transport (2016). Ministry of Transport. https://wedocs.unep.org. Accessed:
November 29, 2023.
Misra, A. K. (2014). Climate change and challenges of water and food security. International
Journal of Sustainable Built Environment, 3(1):153–165.
Moore, D., Jerrett, M., Mack, W., and Künzli, N. (2007). A land use regression model for
predicting ambient fine particulate matter across los angeles, ca. Journal of Environmental
Monitoring, 9(3):246–252.
Moradi, I., Arkin, P., Ferraro, R., Eriksson, P., and Fetzer, E. (2016). Diurnal variation
of tropospheric relative humidity in tropical regions. Atmospheric Chemistry and Physics,
16(11):6913–6929.
Nabavi, S. O., Haimberger, L., and Abbasi, E. (2019). Assessing pm2.5 concentrations in
tehran, iran, from space using maiac, deep blue, and dark target aod and machine learning
algorithms. Atmospheric Pollution Research, 10(3):889–903.
Ning, G., Wang, S., Ma, M., Ni, C., Shang, Z., Wang, J., and Li, J. (2018). Characteristics
of air pollution in different zones of sichuan basin, china. Science of the Total Environment,
612:975–984.
Odonkor, S. T., Mahami, T., et al. (2020). Knowledge, attitudes, and perceptions of air
pollution in accra, ghana: a critical survey. Journal of environmental and public health,
2020.
Owusu, K. and Waylen, P. R. (2013). The changing rainy season climatology of mid-ghana.
Theoretical and applied climatology, 112:419–430.
Prieto-Parra, L., Yohannessen, K., Brea, C., Vidal, D., Ubilla, C. A., and Ruiz-Rudolph, P.
(2017). Air pollution, pm2.5 composition, source factors, and respiratory symptoms in
asthmatic and nonasthmatic children in santiago, chile. Environment international, 101:190–
200.
Putaud, J.-P., Van Dingenen, R., Alastuey, A., Bauer, H., Birmili, W., Cyrys, J., Flentje, H.,
Fuzzi, S., Gehrig, R., Hansson, H., Harrison, R., Herrmann, H., Hitzenberger, R., Hüglin, C.,
Jones, A., Kasper-Giebl, A., Kiss, G., Kousa, A., Kuhlbusch, T., Löschau, G., Maenhaut,
W., Molnar, A., Moreno, T., Pekkanen, J., Perrino, C., Pitz, M., Puxbaum, H., Querol,
X., Rodriguez, S., Salma, I., Schwarz, J., Smolik, J., Schneider, J., Spindler, G., ten Brink,
H., Tursic, J., Viana, M., Wiedensohler, A., and Raes, F. (2010a). A european aerosol
phenomenology 3: Physical and chemical characteristics of particulate matter from 60
rural, urban, and kerbside sites across europe. Atmospheric Environment, 44(10):1308–1320.
Putaud, J.-P., Van Dingenen, R., Alastuey, A., Bauer, H., Birmili, W., Cyrys, J., Flentje, H.,
Fuzzi, S., Gehrig, R., Hansson, H.-C., et al. (2010b). A european aerosol phenomenology–3:
Physical and chemical characteristics of particulate matter from 60 rural, urban, and kerbside
sites across europe. Atmospheric Environment, 44(10):1308–1320.
Rain, D., Engstrom, R., Ludlow, C., and Antos, S. (2011). Accra ghana: A city vulnerable to
flooding and drought-induced migration. Case study prepared for cities and climate Change:
Global Report on Human Settlements, 2011:1–21.
Reeves, C., Formenti, P., Afif, C., Ancellet, G., Attié, J.-L., Bechara, J., Borbon, A., Cairo,
F., Coe, H., Crumeyrolle, S., et al. (2010). Chemical and aerosol characterisation of the
troposphere over west africa during the monsoon period as part of amma. Atmospheric
Chemistry and Physics, 10(16):7575–7601.
Remer, L., Mattoo, S., Levy, R., and Munchak, L. (2013). Modis 3 km aerosol product:
algorithm and global perspective. Atmospheric Measurement Techniques, 6(7):1829–1844.
Shang, Y. and Wah, B. W. (1996). Global optimization for neural network training. Computer,
29(3):45–54.
Shen, X., Bilal, M., Qiu, Z., Sun, D., Wang, S., and Zhu, W. (2018). Validation of modis c6
dark target aerosol products at 3 km and 10 km spatial resolutions over the china seas and
the eastern indian ocean. Remote Sensing, 10(4):573.
Shi, K., Zhang, Y., Zhu, G., Liu, X., Zhou, Y., Xu, H., Qin, B., Liu, G., and Li, Y. (2015).
Long-term remote monitoring of total suspended matter concentration in lake taihu using
250 m modis-aqua data. Remote Sensing of Environment, 164:43–56.
Sierra-Vargas, M. P. and Teran, L. M. (2012). Air pollution: Impact and prevention.
Respirology, 17(7):1031–1038.
Srivastava, D. and Singh, R. M. (2015). Groundwater system modeling for simultaneous
identification of pollution sources and parameters with uncertainty characterization. Water resources
management, 29:4607–4627.
Stafoggia, M., Zauli-Sajani, S., Pey, J., Samoli, E., Alessandrini, E., Basagaña, X., Cernigliaro,
A., Chiusolo, M., Demaria, M., Díaz, J., et al. (2016). Desert dust outbreaks in southern
europe: contribution to daily pm10 concentrations and short-term associations with mortality
and hospital admissions. Environmental health perspectives, 124(4):413–419.
Sultan, B. and Janicot, S. (2003). The west african monsoon dynamics. part ii: The “preonset”
and “onset” of the summer monsoon. Journal of climate, 16(21):3407–3427.
Sunnu, A., Afeti, G., and Resch, F. (2008). A long-term experimental study of the saharan
dust presence in west africa. Atmospheric Research, 87(1):13–26.
Taghavi-Shahri, S. M., Fassò, A., Mahaki, B., and Amini, H. (2020). Concurrent
spatiotemporal daily land use regression modeling and missing data imputation of fine particulate
matter using distributed space-time expectation maximization. Atmospheric environment,
224:117202.
Taheri Shahraiyni, H. and Sodoudi, S. (2016). Statistical modeling approaches for pm10
prediction in urban areas; a review of 21st-century studies. Atmosphere, 7(2):15.
Thaller, E. I., Petronella, S. A., Hochman, D., Howard, S., Chhikara, R. S., and Brooks, E. G.
(2008). Moderate increases in ambient pm2.5 and ozone are associated with lung function
decreases in beach lifeguards. Journal of occupational and environmental medicine, pages
202–211.
Tsai, T.-C., Jeng, Y.-J., Chu, D. A., Chen, J.-P., and Chang, S.-C. (2011). Analysis of the
relationship between modis aerosol optical depth and particulate matter from 2006 to 2008.
Atmospheric Environment, 45(27):4777–4788.
Van Donkelaar, A., Martin, R. V., Brauer, M., Kahn, R., Levy, R., Verduzco, C., and
Villeneuve, P. J. (2010). Global estimates of ambient fine particulate matter concentrations
from satellite-based aerosol optical depth: development and application. Environmental
health perspectives, 118(6):847–855.
Van Donkelaar, A., Martin, R. V., and Park, R. J. (2006). Estimating ground-level pm2.5
using aerosol optical depth determined from satellite remote sensing. Journal of Geophysical
Research: Atmospheres, 111(D21).
Vidale, S. and Campana, C. (2018). Ambient air pollution and cardiovascular diseases: From
bench to bedside. European Journal of preventive cardiology, 25(8):818–825.
Wang, J., Liu, X., Christopher, S. A., Reid, J. S., Reid, E., and Maring, H. (2003). The effects
of non-sphericity on geostationary satellite retrievals of dust aerosols. Geophysical Research
Letters, 30(24).
Wang, Z., Chen, L., Tao, J., Zhang, Y., and Su, L. (2010). Satellite-based estimation of regional
particulate matter (pm) in beijing using vertical-and-rh correcting method. Remote sensing
of environment, 114(1):50–63.
Wei, J., Peng, Y., Guo, J., and Sun, L. (2019). Performance of modis collection 6.1 level 3
aerosol products in spatial-temporal variations over land. Atmospheric Environment, 206:30–
44.
WHO (2013). Health effects of particulate matter: Policy implications for countries in eastern
europe, caucasus and central asia. World Health Organization.
Williams, A. P., Abatzoglou, J. T., Gershunov, A., Guzman-Morales, J., Bishop, D. A., Balch,
J. K., and Lettenmaier, D. P. (2019). Observed impacts of anthropogenic climate change on
wildfire in california. Earth’s Future, 7(8):892–910.
Williams, P. A., Crespo, O., Atkinson, C. J., and Essegbey, G. O. (2017). Impact of climate
variability on pineapple production in ghana. Agriculture & Food Security, 6(1):1–14.
World Health Organisation (2016). World Health Statistics 2016. https://www.who.int.
Accessed: November 29, 2023.
Wu, Y., de Graaf, M., and Menenti, M. (2016). Improved modis dark target aerosol optical
depth algorithm over land: Angular effect correction. Atmospheric Measurement Techniques,
9(11):5575–5589.
Xiao, F., Yang, M., Fan, H., Fan, G., and Al-Qaness, M. A. (2020). An improved deep learning
model for predicting daily pm2.5 concentration. Scientific reports, 10(1):20988.
Xu, J., Jiang, H., Xiao, Z., Wang, B., Wu, J., and Lv, X. (2016). Estimating air particulate
matter using modis data and analyzing its spatial and temporal pattern over the yangtze
delta region. Sustainability, 8(9):932.
Yu, H., Kaufman, Y., Chin, M., Feingold, G., Remer, L., Anderson, T., Balkanski, Y., Bellouin,
N., Boucher, O., Christopher, S., et al. (2006). A review of measurement-based assessments of
the aerosol direct radiative effect and forcing. Atmospheric Chemistry and Physics, 6(3):613–
666.
Yung, J. A., Fuseini, H., and Newcomb, D. C. (2018). Hormones, sex, and asthma. Annals of
Allergy, Asthma & Immunology, 120(5):488–494.
Zeng, X., Xu, X., Zheng, X., Reponen, T., Chen, A., and Huo, X. (2016). Heavy metals in
pm2.5 and in blood, and children’s respiratory symptoms and asthma from an e-waste recycling
area. Environmental pollution, 210:346–353.
Zhang, J. and Reid, J. S. (2009). An analysis of clear sky and contextual biases using an
operational over ocean modis aerosol product. Geophysical Research Letters, 36(15).
Zhang, T., Gong, W., Zhu, Z., Sun, K., Huang, Y., and Ji, Y. (2016). Semi-physical estimates
of national-scale pm10 concentrations in china using a satellite-based geographically weighted
regression model. Atmosphere, 7(7):88.