PROMESPAR: A Parallel Implementation Of
The Regional Atmospheric Model PROMES
Juan E. Garrido1, Enrique Arias1, Diego Cazorla1, Fernando Cuartero1,
Iv´an Fern´andez2, Clemente Gallardo2
Proceedings of the World Congress on Engineering 2009 Vol I, WCE 2009, July 1 - 3, 2009, London, U.K. ISBN: 978-988-17012-5-1
Abstract— This paper describes the parallelization
process of the code PROMES, which represents a re-
gional atmospheric model developed by some of the
authors. The parallel code, called PROMESPAR, has
been carried out under a distributed platform (cluster
of PCs) and using Message Passing Interface (MPI)
communication subroutines.
Keywords: Regional atmospheric model, paralleliza-
tion, message passing interface
1 Introduction
Climate change induced by human activities is one of the topics to which scientific research devotes most attention today. This is due not only to the great complexity of the processes affecting the climate, but also to the serious impact it may have on the economy and the environment in many parts of the planet. Three or four decades ago, it was believed that the oceans would be able to absorb the pollutants emitted by human activities; today, however, maritime degradation is undeniable. Even more recently, the idea that humanity could induce a change in climate was a hypothesis that received little scientific support. However, there is now a broad consensus among scientists about the evidence of anthropogenic climate change and the need for better knowledge about likely developments in the coming decades.
To simulate the climate, we use numerical models reproducing the main processes occurring in the five components of the climate system (atmosphere, hydrosphere, cryosphere, geosphere and biosphere) and the exchange of mass and energy between them. The results obtained by the models are evaluated and compared with the observed features of the climate in recent decades. Once the quality of the climate model has been verified, we apply it to simulate potential changes in the climate, considering various scenarios of anthropogenic emissions of greenhouse gases and aerosols. From this information, we can deduce the potential impact of the climate change produced in each of these scenarios.

This work has been supported by National Project CGL2007-66440-C04-03. ¹Instituto de Investigación en Informática de Albacete and ²Instituto de Ciencias Ambientales, University of Castilla-La Mancha, Avda. España s/n, 02071 Albacete, Spain. Email for correspondence: {Enrique.Arias}. Telephone number: +34-967-599200 Ext: {2497}. Fax number: +34-
The history of weather forecasting is intimately associated with the development of high performance and parallel computing [9].

As early as 1922, L. F. Richardson provided a vision of how to partition the large amount of computation required by this task among thousands of computers [1].
However, it was in the late forties when the first steps towards the use of computers in weather forecasting were taken. This beginning was made by von Neumann, Charney and their colleagues on the ENIAC computer and its successors. The work done by these researchers was so important that, thereafter, numerical weather prediction was considered a discipline in its own right, and it was at the origin of the establishment of national prediction centres. In fact, today the major supercomputing centres tend to focus on such tasks.
While the first steps in weather prediction were bearing fruit, it was thought to apply the same methodology to prediction over a longer term, not only predicting changes in the atmosphere (weather) but also in the global system over time (climate change).
Since the forties, there has been a dramatic improvement in numerical methods, algorithms and computer technology, as well as in the physical models and science related to weather and climate.
In fact, scientists working with models of climate and weather are among the main users of parallel platforms. However, it is necessary not only to have a platform, but also parallel algorithms suited to these platforms in order to exploit the full potential of the resources. Scientists in these areas were the first to make effective use of machines with segmented (pipelined) architectures, such as the IBM 360/195, Cray 1, Cyber 205, Cray YMP and Cray 90.
It was not until the nineties that a serious attempt was made to develop an operational parallel model. The Americans were the first to combine the efforts of experts in meteorology with experts in high performance computing, first within the High Performance Computing and Communications (HPCC) program, and then in a more ambitious one, the Computer Hardware, Advanced Mathematics and Model Physics (CHAMMP) program of the U.S. Department of Energy. The result was the development of a set of models designed for scalable parallel computer systems.
Thanks to the parallelization of weather prediction models, scientists have gained the ability to deal with longer simulations, to increase the spatial resolution, etc. Throughout the last decade, several parallel approaches have been developed. Among them, we remark [3]: those based on vector multiprocessors, such as CCM2 and its scalable variants [2, 12, 20], massively parallel computers (adaptation of the spectral model of the National Meteorology Centre) [19], distributed memory multiprocessors [18] (integrated forecasting system) and message passing [15].
From 1995 until now, different paths have been followed in the application of parallelism to weather prediction. These paths have led to new versions of the above mentioned models (e.g. the latest version of the CCM model, called CAM [7]), to applications in our area of interest such as GRID technology (IrisGRID [4] or CrossGrid [5]), to the appearance of the distributed climate computing project [6], to adaptations of different codes to the most powerful machines of the moment [17, 13, 14], and to implementations of meteorological aspects such as weather data assimilation or the transposition of multidimensional arrays [10].
The MM5 fifth-generation mesoscale model [8] deserves special mention. It is relevant because it is the model used as a reference by PROMESPAR (whose implementation takes into account only part of the comprehensive scheme conforming the full model).
Designed to work at high resolution (higher than 5 km), MM5 is a model with very sophisticated physical parameterization schemes, but it needs huge computational power. It was developed by Pennsylvania State University (PSU) and the National Center for Atmospheric Research (NCAR) in the United States.
The MM5 model, running on parallel distributed memory platforms (massively parallel processors (MPP), networks of workstations, etc.), is called MM90 [16]. This code was implemented in Fortran 90, using a communication library developed at Argonne National Laboratory called RSL, which in turn relies on the libraries provided by each vendor (NX for the Intel Paragon, or MPL for the IBM SP2), or on MPI for other platforms. MM90 is the successor of MPMM [11], the implementation of MM5 for massively parallel machines.
The paper is organized as follows. Section 2 introduces
the regional atmospheric model PROMES, and in Section
3 the parallelization of PROMES is presented. The ex-
perimental results are outlined in Section 4. Finally, the
conclusions and future work are commented on in Section 5.
2 The regional atmospheric model
PROMES is a regional atmospheric model developed by
some of the authors and presented in [1]. In particular,
PROMES is a mesoscale forecast model in which several physical phenomena acting on the atmosphere are parametrized, modifying its conditions and behaviour. It becomes evident that, since the model is represented by a set of equations, the larger the number of physical parameters to be parametrized, the more complex its resolution, and obviously the higher its accuracy. The complexity of the solution makes the use of parallel platforms necessary in order to solve the problem and obtain the results in a reasonable time.
Figure 1 shows the physical parameters that are modelled in PROMES.
Figure 1: Physical parameters modelled at PROMES
In order to make the computations easier, the model divides the zone to be studied into a set of vertical columns, each one representing the atmosphere behaviour at an instant of time. This division is known as the grid of calculus and is shown in Figure 2.
Finally, an overview of the structure of the PROMES code is shown in Figure 3.
3 PROMESPAR: a distributed memory
implementation of PROMES
As previously commented, in order to obtain a very accurate solution in a reasonable time, the use of parallel platforms is necessary. In this paper, a distributed memory implementation of the PROMES code, called PROMESPAR, is presented.
Figure 2: Grid of calculus
Figure 3: General scheme of PROMES code
The parallelization of PROMES consists of dividing the domain into a set of subdomains, distributing the work to be carried out among the different processors (see Figure 4).
Once the domain has been divided, the processors just exchange the frontier information.
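As a minimal sketch of this frontier (halo) exchange pattern, the following plain Python stands in for the MPI sends and receives; the function names and the toy one-dimensional row decomposition are illustrative, not taken from PROMESPAR:

```python
# Sketch of the frontier (halo) exchange performed after a domain decomposition.
# Plain Python lists stand in for MPI messages; names are illustrative.

def split_rows(grid, nprocs):
    """Divide the rows of a 2-D grid evenly among nprocs subdomains."""
    n = len(grid)
    size = n // nprocs
    return [grid[p * size:(p + 1) * size] for p in range(nprocs)]

def exchange_frontiers(subdomains):
    """Each subdomain receives a copy of its neighbours' boundary rows
    (the 'ghost' or frontier rows needed by the finite-difference stencil)."""
    halos = []
    for p in range(len(subdomains)):
        lower = subdomains[p - 1][-1] if p > 0 else None                      # row from the neighbour below
        upper = subdomains[p + 1][0] if p < len(subdomains) - 1 else None     # row from the neighbour above
        halos.append((lower, upper))
    return halos

grid = [[i * 4 + j for j in range(4)] for i in range(8)]  # 8x4 toy grid
parts = split_rows(grid, 4)                               # 4 "processors", 2 rows each
halos = exchange_frontiers(parts)
```

In the real code each exchange would be a pair of MPI send/receive calls between neighbouring ranks; here the shared list makes the communication pattern visible in a few lines.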
In order to obtain an even load balance, a constraint is applied to the size of the subdomains and the number of processors to be used. This constraint is given by equations (1) and (2):
ProcXBlockSize = (OrXmatSize / XsizeProc) ± XBorderSize   (1)
ProcYBlockSize = (OrYmatSize / YsizeProc) ± YBorderSize   (2)
where ProcXBlockSize and ProcYBlockSize denote the size of the blocks assigned to each processor in the X or Y coordinate, respectively; they are computed from the original dimensions of the matrix (OrXmatSize and OrYmatSize) and the number of processors along each coordinate (XsizeProc and YsizeProc), taking into account the boundary conditions (XBorderSize and YBorderSize).
Figure 4: Scheme of splitting the domain into subdomains
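The block sizes above can be sketched in a few lines of Python (an illustration under the assumption of an even division of the domain, taking the plus case of the ± for blocks that carry a frontier; the names mirror the symbols in equations (1) and (2)):

```python
def block_size(orig_mat_size, procs_per_coord, border_size):
    """Block size per processor along one coordinate, as in equations (1)-(2):
    the original matrix dimension divided by the number of processors on that
    coordinate, plus the boundary (frontier) cells the block must also store."""
    assert orig_mat_size % procs_per_coord == 0, "assume an even division"
    return orig_mat_size // procs_per_coord + border_size

# Example: a 120 x 96 grid on a 4 x 2 processor mesh with 1-cell frontiers
proc_x_block = block_size(120, 4, 1)   # 31 cells in X per processor
proc_y_block = block_size(96, 2, 1)    # 49 cells in Y per processor
```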
However, processor 0 has additional tasks, since it acts as master, reading the initial conditions, the boundary values for the domain, etc. from files.
In any case, good load balancing can be affected mainly by two factors:
Static imbalance. Those processors whose subdomains contain maritime zones have less computational load. This circumstance is due to the fact that the computations needed to solve the forecasting model are simpler in this kind of cell (some physical phenomena, such as the effect of orography, heat exchange with the masses of plants, etc., are not taken into account).
Dynamic imbalance. This kind of imbalance is determined by the initial conditions. For instance, the effect of solar radiation varies depending on whether a cloudy day or a sunny day is considered. These effects are unpredictable. However, other effects, such as the absence of solar radiation during the night, are predictable.
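The static imbalance described above can in principle be estimated before the run from the land/sea mask of each subdomain. The following is a hypothetical sketch; the cost weights and names are illustrative and are not values from PROMESPAR:

```python
# Estimate the static load of a subdomain from its land/sea mask.
# Illustrative cost weights: sea cells skip orography and vegetation
# heat-exchange computations, so they are cheaper than land cells.
LAND_COST, SEA_COST = 1.0, 0.6

def subdomain_load(mask):
    """mask is a list of rows of 'L' (land) / 'S' (sea) cells."""
    return sum(LAND_COST if c == 'L' else SEA_COST
               for row in mask for c in row)

coastal  = ["LLSS", "LSSS"]   # subdomain that is mostly maritime
interior = ["LLLL", "LLLL"]   # subdomain that is all land
```

Comparing such estimates across subdomains would show which processors are under-loaded, which is exactly the static imbalance the text describes.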
Figure 5 shows the different libraries considered in the implementation of PROMESPAR, all used under the FORTRAN programming language. In particular, the following libraries have been considered:
MPI: Message Passing Interface, used for communication purposes. This library supports the communication between the different processors of the distributed memory platform.
NetCDF: NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.
IOPSL: Library for input/output operations with meteorological data.
Other physical libraries: computation of solar radiation, heat exchange ground-atmosphere, and other physical operations.
Figure 5: Components scheme of PROMESPAR
Figure 6 represents the workflow of the parallel imple-
mentation of PROMES, PROMESPAR.
The workflow in Figure 6 is followed by each processor, and the barriers in Figure 6 represent communication or synchronization tasks among the different processors.
4 Experimental results
The experimental results have been obtained taking into account 24 hours of simulation. The distributed memory implementation has been run on a cluster of PCs with 16 Intel processors at 1.8 GHz, each one with 512 MB of main memory, interconnected by a Myrinet network and using the NFS file system.
Figure 6: Workflow of PROMESPAR
The performance obtained by the parallel implementation is evaluated in terms of:
Execution time: Time spent in solving the problem.
Speed-up: The ratio of the time taken to solve a problem on one processor to the time required to solve the same problem on a parallel computer with p identical processors.
Efficiency: A measure of the fraction of time for which a processor is usefully employed; it is defined as the ratio of the speed-up to the number of processors.
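These two metrics follow directly from measured wall-clock times; a minimal sketch (the timings below are made-up placeholders, not the measured PROMESPAR results):

```python
def speedup(t_serial, t_parallel):
    """Speed-up: time on one processor over time on p processors."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency: speed-up divided by the number of processors p."""
    return speedup(t_serial, t_parallel) / p

# Hypothetical timings (seconds) for the same fixed-size simulation
t1, t8 = 3600.0, 900.0
s = speedup(t1, t8)          # 4.0
e = efficiency(t1, t8, 8)    # 0.5
```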
Most of the execution time is spent in the main loop, which contains the operations with the highest computational cost. In particular, apart from the send and receive operations used for communication, the physical routines are invoked there. These operations are shown in Figures 3 and 6.
The experimental results considered in this section take into account a 24 hour simulation, which is equivalent to carrying out 2881 iterations of the main loop.
Figures 7, 8 and 9 show the results of this experiment (24 hour simulation) in terms of execution time, speed-up and efficiency.
Figure 7: Execution time of PROMESPAR
Figure 8: Speed-up of PROMESPAR
Figure 9: Efficiency of PROMESPAR
From the experimental results, the main conclusion is that the best results in terms of execution time have been obtained with 8 processors. However, in terms of speed-up and efficiency, the best results are obtained with 2 processors. This is a normal circumstance, due to the influence of the communications. However, for this particular application the main goal is to reduce the execution time.
As previously commented, most of the execution time of the PROMESPAR code is spent in the main loop. Figure 10 shows a detailed study of the time spent in the main loop. It is possible to observe that the fisicapal, Coriolis and Difusion functions consume the largest amount of time, and obviously the parallelization approach allows this execution time to be reduced, above all when going from one to two processors. In any case, the reduction of the execution time is quite good.
Figure 10: Execution time of the main loop of PROMESPAR for a one hour simulation
5 Conclusion
PROMES is a mesoscale regional atmospheric model developed by some of the authors of this paper. The high computational cost of the PROMES code and the necessity of obtaining more accurate results both justify the use of parallelism. In this paper, a distributed memory implementation of the regional atmospheric model PROMES has been carried out. This parallel implementation is called PROMESPAR.
The experimental results show a dramatic reduction of the execution time by means of the use of a parallel platform, considering the same configuration as the original PROMES code. These results lead us to think that either longer or more accurate simulations could be carried out in the same time, or that more complex models could be considered. In fact, the authors are extending the PROMES code in order to be able to carry out climate change studies. Climate change studies consider 100-year simulations that obviously take a long time, and if researchers want to draw conclusions from these studies, the use of parallelism becomes essential.
Acknowledgements
The authors would like to thank the Madrid Supercomputing and Visualization Center (CESVIMA) for allowing them to use the supercomputer known as MAGERIT.
References
[1] M. Castro, C. Fernández and M. A. Gaertner. Description of a meso-scale atmospheric numerical model. Mathematics, Climate and Environment, 230–253, 1993.
[2] J. J. Hack, J. M. Rosinski, D. L. Williamson, B. A. Boville and J. E. Truesdale. Computational design of the NCAR Community Climate Model. Parallel Computing, 21:1545–1569, 1995.
[3] Special issue on parallel computing in climate and weather modeling. Parallel Computing, 21(10), 1995.
[4] IrisGRID: Spanish GRID initiative to coordinate both scientific and academic research.
[5] CrossGrid: the primary objective of the CROSSGRID Project is to further extend the Grid environment to a new category of applications of great practical importance, and into 11 new European countries.
[6] A distributed computing project to produce predictions of the Earth's climate up to 2080 and to test the accuracy of climate models.
[7] CAM: Community Atmosphere Model.
[8] G. A. Grell, J. Dudhia and D. R. Stauffer. A description of the fifth-generation Penn State/NCAR mesoscale model (MM5). NCAR/TN-398+STR, National Center for Atmospheric Research, Boulder, Colorado, 1994.
[9] J. Drake and I. Foster. Introduction to the special issue on parallel computing in climate and weather modelling. Parallel Computing, 21:1539–1544, 1995.
[10] Y. He and C. H. Q. Ding. MPI and OpenMP paradigms on clusters of SMP architectures: the vacancy tracking algorithm for multi-dimensional array transposition. In Proceedings of the IEEE/ACM SC2002 Conference, 2002.
[11] J. Michalakes, T. Canfield, R. Nanjundiah, S. Hammond and G. Grell. Parallel implementation, validation and performance of MM5. In Proceedings of the Sixth ECMWF Workshop on the Use of Parallel Processors in Meteorology, World Scientific, River Edge, New Jersey, pages 266–276, 1995.
[12] J. Drake, I. Foster, J. Michalakes, B. Toonen and P. Worley. Design and performance of a scalable parallel community climate model. Parallel Computing, 21:1571–1591, 1995.
[13] M. Kanamitsu, H. Kanamaru, Y. Cui and H. Juang. Parallel implementation of the regional spectral atmospheric model. PIER energy-related environmental research CEC-500-2005-016, Scripps Institution of Oceanography, University of California at San Diego, and National Oceanic and Atmospheric Administration, for the California Energy Commission, 2005.
[14] K. Takahashi, A. Azami, T. Abe, H. Sakuma and T. Sato. Developing coupled ocean-atmospheric global climate models for the Earth Simulator and its computational/physical validation. NEC Res. and Develop., 44(1), 2003.
[15] M. F. Wehner, A. A. Mirin, P. G. Eltgroth, W. P. Dannevik, C. R. Mechoso, J. D. Farrara and J. A. Spahr. Performance of a distributed memory finite difference atmospheric general circulation model. Parallel Computing, 21:1655–1675, 1995.
[16] J. Michalakes. MM90: A scalable parallel implementation of the Penn State/NCAR mesoscale model (MM5). 1997.
[17] N. L. Miller. Recent advances in regional climate modelling and climate change analyses of extreme heat. PIER energy-related environmental research CEC-500-2005-016, Lawrence Berkeley National Laboratory, for the California Energy Commission, 2005.
[18] S. R. M. Barros, D. Dent, L. Isaksen, G. Robinson, G. Mozdzynski and F. Wollenweber. The IFS model: A parallel production weather code. Parallel Computing, 21:1621–1638, 1995.
[19] J. G. Sela. Weather forecasting on parallel architectures. Parallel Computing, 21:1639–1654, 1995.
[20] S. Hammond, R. D. Loft, J. M. Dennis and R. K. Sato. Implementation and performance issues of a massively parallel atmospheric model. Parallel Computing, 21:1593–1619, 1995.