The International Surface Temperature Initiative's Global Land Surface Databank

  • NOAA National Centers for Environmental Information

Citation: AIP Conference Proceedings 1552, 1036 (2013); doi: 10.1063/1.4821420
Published by the AIP Publishing
This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: Downloaded to IP: On: Tue, 29 Oct 2013 07:23:30
The International Surface Temperature Initiative’s
Global Land Surface Databank
J. H. Lawrimore1, J. Rennie2, W. Gambi de Almeida3, J. Christy4, M. Flannery5,
B. Gleason1, A. Klein-Tank6, A. Mhanda7, K. Ishihara8, D. Lister9, M. J. Menne1,
V. Razuvaev10, M. Renom11, M. Rusticucci12, J. Tandy13, P. W. Thorne2,
and S. Worley14
1NOAA’s National Climatic Data Center, Asheville, NC, USA
2Nansen Environmental and Remote Sensing Center, Bergen, Norway
3Instituto Nacional de Pesquisas Espaciais, Centro de Previsão de Tempo e Estudos Climáticos, Brazil
4University of Alabama-Huntsville, Huntsville, AL, USA
5Bureau of Meteorology, Melbourne, Australia
6Royal Netherlands Meteorological Institute (KNMI), De Bilt, Netherlands
7African Centre of Meteorological Applications for Development, Niamey, Niger
8Japan Meteorological Agency, Tokyo, Japan
9Climatic Research Unit, UEA, Norwich, UK
10Russian Research Institute of Hydrometeorological Information, Obninsk, Russia
11Universidad de la Republica, Montevideo, Uruguay
12University of Buenos Aires, Argentina
13Met Office Hadley Centre, Exeter, United Kingdom
14National Center for Atmospheric Research, Boulder, CO, USA
Abstract. The International Surface Temperature Initiative (ISTI) consists of an end-to-end process for land surface air
temperature analyses. The foundation is the establishment of a global land surface Databank. This builds upon the
groundbreaking efforts of scientists in the 1980s and 1990s. While using many of their principles, a primary aim is to
improve aspects including data provenance, version control, openness and transparency, temporal and spatial coverage,
and improved methods for merging disparate sources. The initial focus is on daily and monthly timescales. A Databank
Working Group is focused on establishing Stage-0 (original observation forms) through Stage-3 data (merged dataset
without quality control). More than 35 sources of data have already been added and efforts have now turned to
development of the initial version of the merged dataset. Methods have been established for ensuring to the extent
possible the provenance of all data from the point of observation through all intermediate steps to final archive and
access. Databank submission procedures were designed to make the process of contributing data as easy as possible. All
data are provided openly and without charge. We encourage the use of these data and feedback from interested users.
Keywords: Climate change, climate dataset construction, data provenance.
Universal temperature scales were not developed
until the early 18th century when the scale most closely
resembling today’s Fahrenheit scale was developed.
This was followed by the work of Anders Celsius that
was eventually extended to become today’s standard
scientific temperature scale (1).
These efforts made possible the record of
temperature that today provides insight into the Earth’s
climate. The Central England Temperature record
began in 1659. More than 100 years of this record
were based on instrumental measurements, some
estimated from measurements in indoor unheated
rooms, combined with non-instrumental weather diary
entries. A daily series considered to be truly
representative did not begin until 1772 (2). An even
longer continuous record of temperature for a single
location is the monthly mean temperature series for De
Bilt, Netherlands, which extends from 1706 to the
present (3). Several other long European series exist
going back over 200 years.
Some early records were made in North America in
the late 1700s. Throughout the 1800s measurements
expanded across other continents. These early records
were carefully made by professionals who had the
skills and training to operate and care for the delicate
meteorological instruments. As instruments became
cheaper and more durable, it became possible for an
even greater expansion (4). National Meteorological
and Hydrological Services (NMHS) around the world
have operated networks to support weather and climate
observations since the late 19th century.
Temperature: Its Measurement and Control in Science and Industry, Volume 8
AIP Conf. Proc. 1552, 1036-1041 (2013); doi: 10.1063/1.4821420
© 2013 AIP Publishing LLC 978-0-7354-1178-4/$30.00
It was not until the 1980s and 1990s that major
efforts were made to collect observations and create
consolidated global datasets. The Global Historical
Climatology Network-Monthly dataset contained more
than 6000 stations when it was released in 1992 (5). A
second version contained 7280 stations with monthly
mean, maximum, and minimum temperature (6). An
independent effort was made to create CRUTEM (the
Climatic Research Unit TEMperature record) at about
the same time and this global dataset of more than
4000 stations is still maintained today (7).
In the early part of the 21st century attention turned
to daily data. The Global Historical Climatology
Network-Daily dataset (8) provides daily maximum,
minimum, and mean temperature for more than 25,000
stations. These records are generally of shorter duration
than the monthly records, with most not beginning until the
middle of the 20th century and large gaps still present,
particularly in the Southern Hemisphere.
These and other monthly and daily global datasets
provide the foundation for studying variation and
change in the Earth’s climate over the past 100 to 200
years. While these have led to tremendous advances in
understanding, there remain impediments due to
residual deficiencies in global collections. Regional
and local scale assessments are constrained by limited
spatial coverage in many regions, especially in the
1800s (Figure 1). Although additional sources of data
exist, often in their original manuscript form or more
recently as images of the original forms, digitization
efforts that make the integration into datasets possible
have lagged.
Available metadata records are incomplete and
inadequate for fully characterizing uncertainty
associated with changing observing practices,
instrumentation, and environmental conditions
surrounding the station. Such metadata are especially
important in the assessment and correction of
inhomogeneities in the climate record (9). Although
this information is often maintained in NMHS
archives, in most cases metadata have not been
included in data exchange activities.
There also has been limited attention given to the
need for version control and provenance tracking in
the construction of datasets. For decades, climate
scientists were focused on building datasets with the
best temporal and spatial coverage possible, separating
valid from invalid reports, and developing methods to
remove inhomogeneities from the record. External
pressures that would lead to doubts about the integrity
of the underlying data and the steps involved in the
calculation of global temperatures were never
envisioned. Only in recent years has there developed a
need for scientists to better document the provenance
and implement version control from the point of
measurement through dissemination, quality control,
bias correction, archive and access. The requirement to
be fully open and transparent as to the details
associated with each processing step includes the need
to provide access to the software associated with each
data processing step, quality control, and bias
correction. By putting in place new practices the wider
community will have the opportunity to more fully
engage in the process. This should engender greater
public confidence and understanding.
In response to these needs, efforts to develop a
global land surface Databank were initiated as part of
the International Surface Temperature Initiative
(ISTI). This activity is overseen by a Databank
Working Group (DWG) which reports to the ISTI
Steering Committee (10). It leverages design
principles and lessons learned from the International
Comprehensive Ocean-Atmosphere Data Set
(ICOADS) effort, a highly successful program that has
produced and maintained an integrated and up-to-date
dataset of global ocean measurements since the mid-1980s
(11). It is envisaged that the Databank, to be
successful and garner all available data with good
provenance, will require at least a similar lifetime.
The Databank is being constructed and made
available in six stages, from the original observation
to the final quality controlled and bias corrected
product (refer to Figure 2 in the accompanying ISTI
conference proceedings paper). The latter two stages,
the quality controlled and homogenized products, are
not discussed further herein; see the companion piece
by Thorne et al. The initial focus is on temperature
data on the daily and monthly timescales, although
other elements and timescales will be added later.
Stage-0 consists of observations in their original
form. The historical record consists primarily of
observations recorded on paper and housed in NMHS
archives and other locations such as national
museums. Over the past two decades there has been a
transition towards fully automated networks that
operate without the need for an observer. However,
there remain thousands of stations that continue to rely
on paper records. Many paper records have been
converted to photographic or scanned images over the
past decade through programs such as NOAA’s
Climate Database Modernization Program (12), and
the International Environmental Data Rescue
Organization (IEDRO) (13). Such images are essential
to preserve the original observations. In other cases
only the original paper form or possibly a microfiche
or microfilm copy exists. These sources may not
physically reside on the Databank server, but when the
location is known its archived location is documented.
Stage-1 consists of digital data in native format.
This is beneficial in that it does not require extra effort
on the part of the data provider to perform
reprocessing and reformatting while reducing the
possibility that errors could occur during translation.
FIGURE 1. Number of non-unique stations available to the Databank in digital form as of September 21, 2012 for each of four
periods from 1850 through 2012. Stations have at least one observation of monthly mean temperature during each period.
Databank policy encourages that data be provided in
their rawest form, that is, the form closest to the
measurements that were first reported by the observer (10). Ideally no
quality control or homogenization should be applied
prior to submission so that the provenance can be
better assured leading up to and through the point
where quality control of the Databank is accomplished
in a consolidated and automated way.
However, if the original raw observations do not
exist, quality controlled and/or homogeneity
adjusted data will be accepted. The details of such
processing applied prior to submission are collected
and retained. This information is used in merging and
remains with the source data to support future
decisions regarding its use.
Following Stage-1, all data are converted to a
common format in Stage-2. This step appends data
provenance to help users understand the history of
each observation. Stage-2 format is ASCII and each
data source is in a separate subdirectory. An inventory
file is produced containing any available metadata. At
a minimum this typically consists of a station id, name,
latitude, longitude, elevation, and beginning and
ending year. Accompanying this is a map which shows
the locations and the number of years of data in their
Data Provenance
To provide a traceable record Data Provenance
Tracking (DPT) flags are required. Stage-2 data
provides the first opportunity to assign such flags. A
DPT flag is a 3- to 4-digit numeral or alpha character
representing unique information regarding each
observation. There are currently five DPT flags: (1)
Stage-0 Source, (2) Stage-1 Source, (3) Data Type, (4)
Mode of Digitization, and (5) Mode of
Transmission/Collection. Additional flags can be
added in the future, for example to specify instrument
type as sufficient metadata becomes available. The
information contained within each DPT flag
completely defines an observation.
DPT 1 defines the Stage-0 Source from which the
observation originated. Sources include NMHS hosts
such as the Japan Meteorological Agency and the
Australian Bureau of Meteorology, universities such as
University Rovira I Virgili and the University of
Alabama-Huntsville, and internationally sponsored
programs such as the World Meteorological
Organization’s World Weather Records.
DPT 2 describes the source of the Stage-1 data.
This source may differ from the Stage-0 data provider
or provide additional information such as the name of
the host’s dataset from which the data originated.
DPT 3 indicates whether the data provided by the host
had been previously quality controlled or homogeneity
adjusted.
DPT 4 describes the mode of digitization and the
institution responsible.
DPT 5 provides the mode of transmission and
collection. This describes the process used to transfer
the data to the Databank.
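As a rough illustration of how the five DPT flags might travel with each observation, the following Python sketch attaches them to a single monthly value. The class names, field names, station identifier, and example flag codes are all hypothetical; the only property taken from the text is that each flag is a 3- to 4-character numeral or alpha code.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DPTFlags:
    """Five provenance flags; the codes themselves are hypothetical."""
    stage0_source: str      # DPT 1: where the observation originated
    stage1_source: str      # DPT 2: provider of the digital (Stage-1) data
    data_type: str          # DPT 3: raw, quality controlled, or homogenized
    digitization_mode: str  # DPT 4: how and by whom the data were digitized
    transmission_mode: str  # DPT 5: how the data reached the Databank

    def __post_init__(self):
        # Each flag is described as a 3- to 4-character numeral or alpha code.
        for f in fields(self):
            value = getattr(self, f.name)
            if not (3 <= len(value) <= 4 and value.isalnum()):
                raise ValueError(
                    f"{f.name}: expected 3-4 alphanumeric characters, got {value!r}"
                )


@dataclass(frozen=True)
class Observation:
    station_id: str
    year: int
    month: int
    tavg_c: float
    provenance: DPTFlags


# Made-up codes, used only to show the shape of the record.
flags = DPTFlags("101", "101", "001", "003", "002")
obs = Observation("XX000001", 1950, 7, 18.4, flags)
print(obs.provenance.data_type)
```

Keeping the flags in an immutable record alongside the value is one way to ensure the provenance cannot be separated from the observation during later processing.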
Data Merging
Next, data are merged into a single Stage-3 dataset.
This is fraught with many complexities associated with
the nature of weather and climate data which were
collected by hundreds of thousands of observers in
hundreds of countries often using differing languages,
observing methods, and documenting and archive
procedures. Often metadata provide only the most
basic information such as station name and location,
and often even this information is inaccurately
Because all stages of data are provided within the
Databank, it is possible for any interested individual or
organization to implement their own unique merging
technique for creating a merged dataset and this is
encouraged. Nevertheless, the ISTI is currently
developing a merging methodology which will be
applied to development of a Stage-3 dataset. This will
be fully documented and made available along with all
source code used in performing the merge. This is an
evolving process with refinements expected to be
made on a continuing basis in coming months and
years.
Because many sources may contain records for the
same station it is necessary to create a process for
identifying and removing duplicate stations, merging
some sources to produce a longer station record, and in
other cases for determining when a station should be
brought in as a new record.
First a source hierarchy is created. Prioritization is
based on a number of criteria (provenance, degree of
processing, presence of max and min elements etc.).
Because ISTI places special emphasis on data
provenance, the Stage-3 Databank holdings are
envisaged to be as close to the raw data as
possible, ideally with provenance tracking back to the
raw, hard copy record. Monthly mean maximum and
minimum temperature are preferred because they can
be directly used to calculate monthly mean
temperature. In cases where only monthly mean
temperature data are provided it is often unclear what
method was used for its computation (3). In addition,
biases affecting maximum temperature can differ from
those affecting minimum temperature, necessitating
different corrections (14).
With this framework in mind, prioritization of the
Stage-2 sources for the merge process is
accomplished. Different decisions may lead to a
different hierarchy, and further development is needed
before a final hierarchy is established. The merge
process occurs iteratively, starting from the highest
priority data source (target) and progressing through
all the source decks (candidates). One potential
approach is a Bayesian method based upon
metadata matching and data equivalence criteria.
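The prioritization described in this section could be sketched as a simple sort. The attribute names, the numeric processing levels, and the ordering of the criteria are assumptions made for illustration only; as noted above, different decisions lead to a different hierarchy.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    has_raw_provenance: bool   # traceable back to the original record
    processing_level: int      # 0 = raw, 1 = quality controlled, 2 = homogenized
    has_max_min: bool          # provides maximum and minimum temperature


def priority_key(src):
    # Tuples sort element by element and False sorts before True, so the
    # desirable attributes are negated to place preferred sources first.
    return (not src.has_raw_provenance, src.processing_level, not src.has_max_min)


# Hypothetical sources, listed in arbitrary order.
sources = [
    Source("source_c", False, 2, False),
    Source("source_a", True, 0, True),
    Source("source_b", True, 1, True),
]

ranked = sorted(sources, key=priority_key)
# The highest-priority source becomes the target; the rest are candidates.
target, candidates = ranked[0], ranked[1:]
print(target.name, [c.name for c in candidates])
```

The merge would then iterate over `candidates` in this order, comparing each candidate station against the stations already in the target.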
Metadata Comparisons
There are at least three geolocation characteristics
which can be used to identify potentially matching or
definitively unique stations. The first is the distance
between stations based upon latitude and longitude,
fitted to an exponential decay function that decays from
1 at no distance to zero at 100 km; this value is returned
as the probability that the two stations are the same. This
can be followed by a similar approach using the height
difference between the two stations. A third involves a
test of the similarity of the station name, using a
measure such as the Jaccard Index (JI), which is
defined as the intersection divided by the union of two
sample sets. The Jaccard Index looks for cases in
which certain letters exist in both station names, as
well as the number of times letters occur in one name,
but not in the other.
Each of these three geolocation metrics yields a
probability from 0 to 1. Using a simple Bayesian approach,
they can be multiplied and a combined probability returned
that the two stations are the same. If this surpasses a
threshold, further evaluation based on data
comparisons can begin. This threshold should be set
low enough to account for the possibility that there are
errors in metadata.
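A minimal sketch of the three metadata metrics and their product is given below. Several choices are assumptions rather than the ISTI method: the decay scales (25 km and 100 m), the 500 m height cutoff, forcing the decay to zero at the cutoff (a pure exponential never reaches zero), and computing the Jaccard Index on sets of letters without counting repeats. Only the 100 km distance cutoff comes from the text.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def decay(value, cutoff, scale):
    """Exponential decay from 1 at zero, forced to 0 at or beyond the cutoff."""
    if value >= cutoff:
        return 0.0
    return math.exp(-value / scale)


def jaccard(name_a, name_b):
    """Jaccard Index on the sets of letters in two station names."""
    a = {c for c in name_a.upper() if c.isalpha()}
    b = {c for c in name_b.upper() if c.isalpha()}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def metadata_probability(s1, s2):
    # Assumed scales: 25 km for distance, 100 m for height (cutoff 500 m).
    dist = haversine_km(s1["lat"], s1["lon"], s2["lat"], s2["lon"])
    p_dist = decay(dist, 100.0, 25.0)
    p_height = decay(abs(s1["elev"] - s2["elev"]), 500.0, 100.0)
    p_name = jaccard(s1["name"], s2["name"])
    return p_dist * p_height * p_name


# Illustrative station metadata; coordinates and elevations are approximate.
a = {"name": "DE BILT", "lat": 52.10, "lon": 5.18, "elev": 2.0}
b = {"name": "DEBILT", "lat": 52.10, "lon": 5.18, "elev": 2.0}
c = {"name": "ASHEVILLE", "lat": 35.60, "lon": -82.55, "elev": 650.0}

print(metadata_probability(a, b))  # identical metadata apart from spacing
print(metadata_probability(a, c))  # far apart, so the distance term is zero
```

Because the three terms are multiplied, a single badly recorded coordinate or name can pull the combined value down sharply, which is why the threshold needs to be set fairly low.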
Data Comparisons
There are two distinct scenarios for data
comparisons: those where station data overlap, and
those where they do not. For cases with overlap, a
direct comparison of observations during the same
months and years can be made. For cases in which data
do not overlap, testing for data equivalence is required.
Potential approaches include the generation of a
Bayesian probability, combining both the geolocation
and data probabilities. This probability can be
evaluated to determine whether the target and
candidate observations are from the same or different
stations.
If it is concluded that the candidate station is the
same as the target station a merge can be performed.
Only data not already in the target station record will
be added. Preference is always given to the target,
since it contains data that were higher in priority. If a
candidate station goes through the entire target dataset
and no match is found, then the station is deemed
unique. Further details associated with the merge
process are under development, but it is expected that
the first version of the Databank will show a
significant increase in the number of stations
compared to global datasets such as the Global
Historical Climatology Network-Monthly (Figure 2).
FIGURE 2. Number of stations in the Global
Historical Climatology Network-Monthly version 3
dataset (black) and number of stations in a Stage-3
merged databank still under development (red).
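The merge rule described above, in which only data not already in the target record are added and preference is always given to the target, might look like the following sketch. Representing a station as a dictionary keyed by (year, month), and treating a None value as "not present", are assumptions made for illustration.

```python
def merge_station(target, candidate):
    """Return the target series extended with candidate values for missing months.

    Both arguments map (year, month) to a temperature; None marks a missing value.
    """
    merged = dict(target)
    for key, value in candidate.items():
        if value is None:
            continue
        # Preference is always given to the target: only fill gaps.
        if merged.get(key) is None:
            merged[key] = value
    return merged


# Made-up monthly mean temperatures in degrees Celsius.
target = {(1950, 1): 2.1, (1950, 2): None, (1950, 3): 5.0}
candidate = {(1949, 12): 1.4, (1950, 1): 2.6, (1950, 2): 3.3}

merged = merge_station(target, candidate)
for key in sorted(merged):
    print(key, merged[key])
```

In this example the candidate extends the record back to December 1949 and fills the February 1950 gap, while the target's January 1950 value is retained even though the candidate disagrees with it.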
Data are provided from a primary ftp site hosted by
the Global Observing Systems Information Center
(GOSIC) and World Data Center A at
NOAA/NCDC. In addition, World Data Center B at
Obninsk, Russia, established an ftp site that is routinely
updated to mirror the data on the primary site.
All data are provided in ASCII to facilitate access and
ease of use. Future efforts may include conversion to
NetCDF following the Climate and Forecast (CF) conventions.
In some cases the data provider has agreed to
contribute regular data updates. Upon updates the
previous version is moved to an archive directory and
permanently stored. Within the archive directory each
version is maintained and designated by the year,
month, and day the data were first received. It is
preferable that the entire source dataset be transferred
as updates are made rather than collection of only the
most recent observations. Acquiring the full source
better ensures the most up-to-date data.
A version number is assigned to new sources or
updates to sources as they are added. All files from a
single source are combined into a single tar file
compressed using gzip. The version number is
contained within a naming structure:
1. source identifies the data provider.
2. timescale is monthly, daily, or hourly.
3. stage# is currently either Stage-1 or Stage-2.
4. X is incremented when there is a major change
to the source dataset such as replacement or
addition of a large percentage of data.
5. Y is incremented when there are small updates
to the source dataset such as real-time updates
to existing stations.
6. yyyymmdd is the year, month, and day the
data source was provided or updated.
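The versioning rules in the list above can be sketched as a small record with two increment operations. The exact naming layout is not reproduced in this text, so the underscore-separated label below is purely hypothetical, as are the class name, the example source name, and the choice to reset Y to zero when X is incremented.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class SourceVersion:
    source: str      # data provider
    timescale: str   # "monthly", "daily", or "hourly"
    stage: str       # "stage1" or "stage2"
    major: int       # X: large replacement or addition of data
    minor: int       # Y: small updates such as real-time additions
    updated: date    # day the source was provided or updated

    def bump_major(self, when):
        # Resetting the minor counter is an assumption, not stated in the text.
        return replace(self, major=self.major + 1, minor=0, updated=when)

    def bump_minor(self, when):
        return replace(self, minor=self.minor + 1, updated=when)

    def label(self):
        # Hypothetical layout; the Databank's real naming pattern may differ.
        stamp = self.updated.strftime("%Y%m%d")
        return (
            f"{self.source}_{self.timescale}_{self.stage}"
            f"_v{self.major}.{self.minor}_{stamp}"
        )


v = SourceVersion("examplesource", "monthly", "stage2", 1, 0, date(2012, 9, 21))
v = v.bump_minor(date(2012, 10, 5))   # small real-time update
print(v.label())
v = v.bump_major(date(2013, 1, 15))   # large replacement of data
print(v.label())
```

Each call returns a new record rather than modifying the old one, which mirrors the practice of moving the previous version to the archive directory instead of overwriting it.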
Databank submission procedures are designed to
make the process easy while ensuring the submitted
data are of high quality and traceable. Policies require
submission of information about the contributed data
including file formats and metadata such as station
location and name. Data should be provided in the
original native format, e.g., ASCII text, Microsoft Excel,
XML, or NetCDF. A complete guide to data submission
procedures is available online.
Construction of a land surface databank is a major
undertaking requiring time and international
coordination. It has been preceded by many
groundbreaking efforts and comes at a time when the
need for high quality, traceable, and complete data is
clearer than ever.
As an integral part of the ISTI, the global land
surface Databank provides the foundation from which
new methods of analysis, consistent benchmarking of
performance and data serving to end-users will be
established. Information regarding how the Databank
effort fits within the broad effort of the ISTI is
provided in an accompanying paper. Further
information also can be found online. Constructive comments
are encouraged.
We thank the many contributors of data that have
made establishment of the Databank possible.
1. Knowles Middleton, W.E., A History of the
Thermometer and its use in Meteorology,
Baltimore, Maryland: The Johns Hopkins Press,
2002, pp. 5-104.
2. Parker, D.E., T.P. Legg, and C.K. Folland, Int. J.
Clim., 12, 317-342 (1992).
3. Lawrimore, J. H., M. J. Menne, B. E. Gleason, C.
N. Williams, D. B. Wuertz, R. S. Vose, and J.
Rennie, J. Geophys. Res., 116, D19121,
doi:10.1029/2011JD016187 (2011).
4. National Weather Service, 2011: What is the
COOP Program? [web site]
5. Vose, R. S., R. L. Schmoyer, P. M. Steurer, T. C.
Peterson, R. Heim, T. R. Karl, and J. Eischeid, The
Global Historical Climatology Network: Long-term
monthly temperature, precipitation, sea level
pressure, and station pressure data, 1992,
ORNL/CDIAC-53, NDP-041, 325 pp. [Available
from Carbon Dioxide Information Analysis Center,
Oak Ridge National Laboratory, P.O. Box 2008,
Oak Ridge, TN 37831.]
6. Peterson, T. C., and R. S. Vose, Bull. Amer.
Meteorol. Soc., 78, 2837-2849 (1997).
7. Jones, P. D., D. H. Lister, T. J. Osborn, C.
Harpham, M. Salmon, and C. P. Morice, J.
Geophys. Res., doi:10.1029/2011JD017139 (2012).
8. Menne, M.J., I. Durre, R.S. Vose, B.E. Gleason,
and T.G. Houston, Journal of Atmospheric and
Oceanic Technology, 27, 897-910 (2012).
9. Thorne, Peter W., and Coauthors, Bull. Amer.
Meteor. Soc., 92, ES40-ES47. doi:
10. International Surface Temperature Initiative,
2011: “Databank effort” [web site]
11. Woodruff, S.D., Worley, S.J., Lubker, S.J., Ji, Z.,
Freeman, J.E., Berry, D.I., Brohan, P., Kent, E.C.,
Reynolds, R.W., Smith, S.R. & Wilkinson, C., Int.
J. Climatol., 31, 951-967, doi:10.1002/joc.2103
12. Dupigny-Giroux, Lesley-Ann, Thomas F. Ross,
Joe D. Elms, Raymond Truesdell, Stephen R.
Doty, Bull. Amer. Meteor. Soc., 88, 1015-1017.
13. International Environmental Data Rescue
Organization (IEDRO), 2011: What is IEDRO?
[web site]
14. Williams, C., M. J. Menne, and P. W. Thorne, J.
Geophys. Res., 117, D05116,
doi:10.1029/2011JD016761 (2012).
This article is copyrighted as indicated in the abstract. Reuse of AIP content is subject to the terms at: Downloaded to IP: On: Tue, 29 Oct 2013 07:23:30
... Because the main focus of the paleo-reanalysis is the pre-20th century period, only series starting before 1,880 were kept. In EKF400v2, GHCN-Monthly v3 is replaced with the merged data collection of the international surface temperature initiative (ISTI, Rennie et al., 2014). Like in the case of GHCN-Monthly v3 data, time series that start before 1880 were selected, leaving 619 records. ...
Full-text available
Abstract Data assimilation techniques are becoming increasingly popular for climate reconstruction. They benefit from estimating past climate states from both observation information and from model simulations. The first monthly global paleo‐reanalysis (EKF400) was generated over the 1600 and 2005 time period, and it provides estimates of several atmospheric fields. Here we present a new, considerably improved version of EKF400 (EKF400v2). EKF400v2 uses atmospheric‐only general circulation model simulations with a greatly extended observational network of early instrumental temperature and pressure data, documentary evidences and tree‐ring width and density proxy records. Furthermore, new observation types such as monthly precipitation amounts, number of wet days and coral proxy records were also included in the assimilation. In the version 2 system, the assimilation process has undergone methodological improvements such as the background‐error covariance matrix is estimated with a blending technique of a time‐dependent and a climatological covariance matrices. In general, the applied modifications resulted in enhanced reconstruction skill compared to version 1, especially in precipitation, sea‐level pressure and other variables beside the mostly assimilated temperature data, which already had high quality in the previous version. Additionally, two case studies are presented to demonstrate the applicability of EKF400v2 to analyse past climate variations and extreme events, as well as to investigate large‐scale climate dynamics.
Full-text available
Chapter 2 assesses observed large-scale changes in climate system drivers, key climate indicators and principal modes of variability. Chapter 3 considers model performance and detection/attribution, and Chapter 4 covers projections for a subset of these same indicators and modes of variability. Collectively, these chapters provide the basis for later chapters, which focus upon processes and regional changes. Within Chapter 2, changes are assessed from in situ and remotely sensed data and products and from indirect evidence of longer-term changes based upon a diverse range of climate proxies. The time-evolving availability of observations and proxy information dictate the periods that can be assessed. Wherever possible, recent changes are assessed for their significance in a longer-term context, including target proxy periods, both in terms of mean state and rates of change.
Full-text available
Although climate change is a global phenomenon, its manifestations and consequences are different in different regions, and therefore climate information on spatial scales ranging from sub-continental to local is used for impact and risk assessments. Chapter 10 assesses the foundations of how regional climate information is distilled from multiple, sometimes contrasting, lines of evidence. Starting from the assessment of global-scale observations in Chapter 2, Chapter 10 assesses the challenges and requirements associated with observations relevant at the regional scale. Chapter 10 also assesses the fitness of modelling tools available for attributing and projecting anthropogenic climate change in a regional context starting from the methodologies assessed in Chapters 3 and 4. Regional climate change is the result of the interplay between regional responses to both natural forcings and human influence (considered in Chapters 2, 5, 6 and 7), responses to large-scale climate phenomena characterizing internal variability (considered in Chapters 1–9), and processes and feedbacks of a regional nature.
Full-text available
Gulev, S. K., P. W. Thorne, J. Ahn, F. J. Dentener, C. M. Domingues, S. Gerland, D. Gong, D. S. Kaufman, H. C. Nnamchi, J. Quaas, J. A. Rivera, S. Sathyendranath, S. L. Smith, B. Trewin, K. von Shuckmann, R. S. Vose, 2021, Changing State of the Climate System. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson- Delmotte, V., P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M. I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J. B. R. Matthews, T. K. Maycock, T. Waterfield, O. Yelekçi, R. Yu and B. Zhou (eds.)]. Cambridge University Press. In Press.
Full-text available
This study is an extensive revision of the Climatic Research Unit (CRU) land station temperature database that has been used to produce a grid-box data set of 5° latitude × 5° longitude temperature anomalies. The new database (CRUTEM4) comprises 5583 station records of which 4842 have enough data for the 1961-1990 period to calculate or estimate the average temperatures for this period. Many station records have had their data replaced by newly homogenized series that have been produced by a number of studies, particularly from National Meteorological Services (NMSs). Hemispheric temperature averages for land areas developed with the new CRUTEM4 data set differ slightly from their CRUTEM3 equivalent. The inclusion of much additional data from the Arctic (particularly the Russian Arctic) has led to estimates for the Northern Hemisphere (NH) being warmer by about 0.1°C for the years since 2001. The NH/Southern Hemisphere (SH) warms by 1.12°C/0.84°C over the period 1901-2010. The robustness of the hemispheric averages is assessed by producing five different analyses, each including a different subset of 20% of the station time series and by omitting some large countries. CRUTEM4 is also compared with hemispheric averages produced by reanalyses undertaken by the European Centre for Medium-Range Weather Forecasts (ECMWF): ERA-40 (1958-2001) and ERA-Interim (1979-2010) data sets. For the NH, agreement is good back to 1958 and excellent from 1979 at monthly, annual, and decadal time scales. For the SH, agreement is poorer, but if the area is restricted to the SH north of 60°S, the agreement is dramatically improved from the mid-1970s.
Several topics covered during a meeting that took place at Met Office, on behalf of the UK government, in Turkey in February 2010 are presented. The meeting focused on the changing interests of climate scientists in the datasets. One requirement is openness and transparency of the data, which involves hard work to ascertain provenance and associated quality assurance of observations and to apply strict revision control and versioning. These aspects increase process overheads substantially but add significant value in terms of product robustness, quantifying uncertainties, and user confidence in products. Global data access needs significant improvement. There is still no single recognized data repository for land meteorological data of the kind that exists for ocean data (World Ocean Database), weather balloons (Integrated Global Radiosonde Archive), and surface ocean measurements (International Comprehensive Ocean-Atmosphere Data Set, ICOADS).
Commenced in 2000, the Climate Database Modernization Program (CDMP), managed by the National Climatic Data Center (NCDC), is a joint effort of National Oceanic and Atmospheric Administration (NOAA) personnel and the private sector. This multi-million-dollar program has focused on imaging and keying climate and environmental records worldwide, from the 18th century up to the present, and has also created jobs in various sectors of the national economy. All the recovered data images have been scanned by CDMP contractors to ensure the best image quality and then indexed. CDMP has developed an online system called Web Search Store Retrieve Display (WSSRD®), through which the indexed images, and other keyed data processed via existing control and quality assessment routines at NCDC, are made available. The keying of data and digital preservation of the historic records have helped researchers and data users understand climate variability and change.
Changes in the circumstances behind in situ temperature measurements often lead to biases in individual station records that, collectively, can also bias regional temperature trends. Since these biases are comparable in magnitude to climate change signals, homogeneity "corrections" are necessary to make the records suitable for climate analysis. To quantify the effectiveness of U.S. surface temperature homogenization, a randomized perturbed ensemble of the USHCN pairwise homogenization algorithm was run against a suite of benchmark analogs to real monthly temperature data. Results indicate that all randomized versions of the algorithm consistently produce homogenized data closer to the true climate signal in the presence of widespread systematic errors. When applied to the real-world observations, the randomized ensemble reinforces previous understanding that the two dominant sources of bias in the U.S. temperature records are caused by changes to time of observation (spurious cooling in minimum and maximum) and conversion to electronic resistance thermometers (spurious cooling in maximum and warming in minimum). Error bounds defined by the ensemble output indicate that maximum temperature trends are positive for the past 30, 50 and 100 years, and that these maximums contain pervasive negative biases that cause the unhomogenized (raw) trends to fall below the lower limits of uncertainty. Moreover, because residual bias in the homogenized analogs is one-tailed under biased errors, it is likely that maximum temperature trends have been underestimated in the USHCN. Trends for minimum temperature are also positive over the three periods, but the ensemble error bounds encompass trends from the unhomogenized data.
Since the early 1990s the Global Historical Climatology Network-Monthly (GHCN-M) data set has been an internationally recognized source of data for the study of observed variability and change in land surface temperature. It provides monthly mean temperature data for 7280 stations from 226 countries and territories, ongoing monthly updates of more than 2000 stations to support monitoring of current and evolving climate conditions, and homogeneity adjustments to remove non-climatic influences that can bias the observed temperature record. The release of version 3 monthly mean temperature data marks the first major revision to this data set in over ten years. It introduces a number of improvements and changes that include consolidating "duplicate" series, updating records from recent decades, and the use of new approaches to homogenization and quality assurance. Although the underlying structure of the data set is significantly different than version 2, conclusions regarding the rate of warming in global land surface temperature are largely unchanged.
This NDP contains monthly temperature, precipitation, sea level pressure, and station pressure data for thousands of meteorological stations worldwide. The database was compiled from pre-existing national, regional, and global collections of data as a part of the Global Historical Climatology Network (GHCN) project. It contains data from roughly 6000 temperature stations, 7500 precipitation stations, 1800 sea level pressure stations, and 1800 station pressure stations. Each station has at least 10 years of data, and about 40% have more than 50 years of data. Spatial coverage is good over most of the globe, particularly for the United States and Europe. Data gaps are evident over the Amazon rainforest, the Sahara desert, Greenland, and Antarctica.
Release 2.5 of the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) is a major update (covering 1662–2007) of the world's most extensive surface marine meteorological data collection. Building on extensive national and international partnerships, many new and improved contributing datasets have been processed into a uniform format and combined with the previous Release 2.4. The new data range from early non-instrumental ship observations to measurements initiated in the twentieth century from buoys and other automated platform types. Improvements to existing data include replacing preliminary Global Telecommunication System (GTS) receipts with more reliable, delayed mode reports for post-1997 data, and in the processing and quality control (QC) of humidity observations. Over the entire period of record, spatial and temporal coverage has been enriched and data and metadata quality has been improved. Along with the observations, now updated monthly in near real time, Release 2.5 includes quality-controlled monthly summary products for 2° latitude × 2° longitude (since 1800) and 1° × 1° boxes (since 1960), together with multiple options for access to the data and products. The measured and estimated data in Release 2.5 are subject to many technical changes, multiple archive sources, and historical events throughout the more than three-century record. Some of these data characteristics are highlighted, including known unresolved errors and inhomogeneities, which may impact climate and other research applications. Anticipated future directions for ICOADS aim to continue adding scientific value to the observations, products, and metadata, as well as strengthen the cooperative enterprise through expanded linkages to international initiatives and organisations. Copyright © 2010 Royal Meteorological Society
International Environmental Data Rescue Organization (IEDRO), 2011: What is IEDRO? [web site]
14. Williams, C., M. J. Menne, P. W. Thorne, J. Geophys. Res., 117, D05116, DOI: 10.1029/2011JD016761 (2012)
Parker, D.E., T.P. Legg, C.K. Folland, Int. J. Clim. 12, 317-342 (1992).