Downscaling file descriptions, directory structure, and data access
Stefan Rahimi & Lei Huang (updated 6 October 2023)
Description website: https://dept.atmos.ucla.edu/alexhall/downscaling-cmip6
On AWS: https://registry.opendata.aws/wrf-cmip6, s3://wrf-cmip6-noversioning/
This document describes the directory structure inside the container hosting the
downscaling data and the files therein on the storage medium. All files are in NetCDF-4
format.
1. Data tiers
Each data tier contains output from all grids: the 45-km and 9-km grids and the 3-km California and Wyoming grids. Users of the data may wish to use different tiers depending on their research interests or analysis objectives (see Figure 1 below). Only a subset of GCMs have been downscaled to the highest resolution (3 km), while all GCMs are downscaled to 9 km. The downscaling method mirrors that of Rahimi et al. (2022).
Figure 1: Domain setup of the WRF simulations, with color fill representing the complex topography across the region.
Tier 1
Tier 1 data files contain a datastream of 200+ WRF variables at 6-hour intervals, including variables on 39 vertical levels. Tier 1 files can be used to analyze and process variables of interest that are not provided in the Tier 2 and Tier 3 files. These files may also be used in conjunction with ndown.exe to dynamically downscale to higher resolutions across other regions of western North America, bypassing the need to rerun the coarser-resolution experiments or preprocess the GCM input. These files are date- and time-specific, with the following naming convention:
wrfout_d01_<yyyy>-<mm>-<dd>_<hh>:00:00,
where yyyy, mm, dd, and hh are the 4-digit year, 2-digit month, 2-digit day, and 2-digit hour, respectively.
File sizes for each 6-hourly file are approximately 22, 140, 170, and 90 MB for the 45-km, 9-km, 3-km CA, and 3-km WY domains, respectively. Note that all wrfout* files carry the label ‘d01’ in their names, even though we identify the different grids as d01, d02, d03, and d04 (see below).
Note that all variables in this tier are in their WRF-native form.
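As a quick illustration, here is a minimal Python sketch for opening a single Tier 1 file, assuming the file has been downloaded locally and that xarray with a NetCDF backend is installed; the date shown is hypothetical:

from datetime import datetime

import xarray as xr

# Build the Tier 1 file name for a hypothetical 6-hourly valid time.
valid_time = datetime(2015, 1, 1, 6)
fname = valid_time.strftime("wrfout_d01_%Y-%m-%d_%H:00:00")

# Open the NetCDF-4 file; Tier 1 files hold 200+ WRF-native variables,
# many defined on 39 vertical levels.
ds = xr.open_dataset(fname)
print(list(ds.data_vars)[:10])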
Tier 2
Tier 2 data files contain an auxiliary WRF datastream of 21 variables at 1-hour intervals. These variables are:
1. 2-m temperature [K]
2. 2-m specific humidity [kg kg-1]
3. Surface pressure [Pa]
4. 10-m u-component of the wind (grid relative) [m s-1]
5. 10-m v-component of the wind (grid relative) [m s-1]
6. Snow water equivalent [mm]
7. Skin temperature [K]
8. Non-convective precipitation (cumulative) [mm]
9. Convective precipitation (cumulative) [mm]
10. Cumulative snowfall equivalent [mm]
11. Diffuse downwelled solar radiation [W m-2]
12. Surface upwelled solar radiation (all sky) [W m-2]
13. Surface upwelled solar radiation (clear sky) [W m-2]
14. Surface downwelled solar radiation (all sky) [W m-2]
15. Surface downwelled solar radiation (clear sky) [W m-2]
16. Surface upwelled longwave radiation (all sky) [W m-2]
17. Surface upwelled longwave radiation (clear sky) [W m-2]
18. Surface downwelled longwave radiation (all sky) [W m-2]
19. Surface downwelled longwave radiation (clear sky) [W m-2]
20. Surface runoff [mm s-1]
21. Sub-surface runoff [mm s-1]
In a subset of experiments, we also provide direct downward irradiance [W m-2], downward diffuse irradiance [W m-2], direct normal irradiance [W m-2], model-level boundary layer zonal and meridional winds [m s-1], perturbation potential temperature [K], perturbation pressure [Pa], perturbation geopotential [m2 s-2], and outgoing longwave radiation [W m-2].
As with the Tier 1 data, these files are date- and time-specific, with the following naming convention:
auxhist_d01_<yyyy>-<mm>-<dd>_<hh>:00:00,
where yyyy, mm, dd, and hh are the 4-digit year, 2-digit month, 2-digit day, and 2-digit hour, respectively. Note that all variables in this tier are in their WRF-native form, and that all auxhist* files carry the label ‘d01’ in their names, even though we identify the different grids as d01, d02, d03, and d04 (see below).
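As an illustration, here is a minimal Python sketch for deriving the hourly 10-m wind speed from one Tier 2 file; the date is hypothetical, and the WRF-native names U10 and V10 are assumed for the 10-m wind components:

import numpy as np
import xarray as xr

# Open one hourly auxhist file (hypothetical date).
ds = xr.open_dataset("auxhist_d01_2015-01-01_00:00:00")

# Wind speed is unchanged by the grid-relative rotation, so the grid-relative
# components can be combined directly.
wspd10 = np.sqrt(ds["U10"] ** 2 + ds["V10"] ** 2)
print(float(wspd10.max()))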
Tier 3
Tier 3 data files contain either daily averaged, daily maximum, or daily minimum
quantities for 42 variables. These variables are:
1. 2-m average temperature [K] (‘t2’)
2. 2-m minimum temperature [K] (‘t2min’)
3. 2-m maximum temperature [K] (‘t2max’)
4. Maximum hourly precipitation [mm h-1] (‘prec_max’)
5. 2-m specific humidity [kg kg-1] (‘q2’)
6. Maximum 10-m wind speed [m s-1] (‘wspd10max’)
7. Snow water equivalent [mm] (‘snow’)
8. Precipitation rate [mm d-1] (‘prec’)
9. Snow precipitation rate [mm d-1] (‘prec_snow’)
10. Relative humidity [0-100] (‘rh’)
11. Integrated vapor transport (zonal and meridional components; earth relative) [kg s-1 m-1] (‘ivt’)
12. Ice water path [kg m-2] (‘iwp’)
13. Liquid water path [kg m-2] (‘lwp’)
14. Soil moisture [m3 m-3] (‘soil_m’)
15. Soil temperature [K] (‘soil_t’)
16. Skin temperature [K] (‘tskin’)
17. Surface pressure [Pa] (‘psfc’)
18. Surface runoff [mm d-1] (‘sfc_runoff’)
19. Sub-surface runoff [mm d-1] (‘subsfc_runoff’)
20. Evaporation [mm d-1] (‘evap_sfc’)
21. Evapotranspiration [mm d-1] (‘etrans_sfc’)
22. Downwelled SW at surface (> 0 into sfc) [W m-2] (‘sw_dwn’)
23. Downwelled LW at surface (> 0 into sfc) [W m-2] (‘lw_dwn’)
24. Net SW flux at the surface (> 0 into sfc) [W m-2] (‘sfc_sfc’)
25. Net LW flux at surface (> 0 into atm) [W m-2] (‘lw_sfc’)
26. Sensible heat flux at surface (> 0 into atm) [W m-2] (‘sh_sfc’)
27. Latent heat flux at surface (> 0 into atm) [W m-2] (‘lh_sfc’)
28. Ground heat flux at surface (> 0 into atm) [W m-2] (‘gh_sfc’)
29. 3-D q [kg kg-1] (‘q_3d’)
30. 3-D w [m s-1] (‘w_3d’)
31. 10-m u, v (earth relative) [m s-1] (‘uv10’)
32. 3-D u (earth relative) [m s-1] (‘u_3d’)
33. 3-D v (earth relative) [m s-1] (‘v_3d’)
34. 3-D geopotential height [m2 s-2] (‘phi_3d’)
35. 3-D temperature [K] (‘t_3d’)
36. Convective precipitation* [mm d-1] (‘prec_c’)
37. Mean 10-m wind speed [m s-1] (‘wspd10mean’)
38. Planetary boundary layer height [m] (‘pblh’)
39. Convective available potential energy [J kg-1] (‘cape’)
40. Convective inhibition [J kg-1] (‘cin’)
41. Lifting condensation level [m] (‘lcl’)
42. Level of free convection [m] (‘lfc’)
(*) denotes that the variable is available only for d01 and d02.
Tier 3 files are labeled by variable, GCM, variant, experiment ID (e.g., ‘hist’ for historical or ‘ssp370’ for SSP3-7.0), domain, and year in the file name:
<variable>.daily.<gcm>.<variant>.<exp_id>.<domain>.<year>.nc
For example, wspd10max.daily.mpi-esm1-2-lr.r7i1p1f1.ssp370.d02.2092.nc (see the sketch following the domain list below).
The domain naming convention (d0*) is as follows:
● Domain 1 – d01: The 45-km domain
● Domain 2 – d02: The 9-km domain
● Domain 3 – d03: The 3-km California domain
● Domain 4 – d04: The 3-km Wyoming domain
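For instance, here is a minimal sketch with a hypothetical helper that assembles Tier 3 file names from these components:

# Hypothetical helper following
# <variable>.daily.<gcm>.<variant>.<exp_id>.<domain>.<year>.nc
def tier3_filename(variable, gcm, variant, exp_id, domain, year):
    return f"{variable}.daily.{gcm}.{variant}.{exp_id}.{domain}.{year}.nc"

# Reproduces the example above.
print(tier3_filename("wspd10max", "mpi-esm1-2-lr", "r7i1p1f1", "ssp370", "d02", 2092))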
An additional string, “bias-correct”, may appear in the Tier 3 data file names. It designates GCMs that were bias-corrected to ERA5 prior to dynamical downscaling; the parent directories of these data are appended with the string “_bc”. For instance, we provide both “cesm2_r11i1p1f1_historical” and “cesm2_r11i1p1f1_historical_bc” data. These are downscaled identically, except that bias correction of the mean-state GCM fields following Bruyère et al. (2014) is applied in the latter.
Tier 3 files are much smaller than Tier 1 and Tier 2 files and are therefore well suited to most analyses.
The 3-D variables are interpolated to the 1000, 925, 850, 800, 700, 600, 500, 400, 300,
and 250 hPa isobaric surfaces.
The soil fields are defined on the 4 Noah-MP soil levels of 5, 25, 70, and 150
centimeters. These correspond to the variable ‘ZS’ in the wrfinput files.
The time coordinate for each Tier 3 file is an array of integers containing the 4-digit year, 2-digit month, and 2-digit day (yyyymmdd).
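For reference, here is a minimal sketch (assuming pandas and numpy are installed) for decoding the integer yyyymmdd values into datetimes:

import numpy as np
import pandas as pd

# Convert yyyymmdd integers read from a Tier 3 time coordinate into datetimes.
def decode_yyyymmdd(values):
    return pd.to_datetime(np.asarray(values).astype(str), format="%Y%m%d")

print(decode_yyyymmdd([20880901, 20880902]))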
Tier 4
Tier 4 data files contain decadal averages of variables at an hourly temporal scale, computed from our hourly (Tier 2) datastream, for 11 variables (more to be added). These variables are:
1. Downwelling longwave flux at bottom [W m-2] (‘lw_dwn’)
2. Downwelling shortwave flux at bottom [W m-2] (‘sw_dwn’)
3. Snow water equivalent [mm] (‘snow’)
4. Surface runoff [mm d-1] (‘sfc_runoff’)
5. Sub-surface runoff [mm d-1] (‘subsfc_runoff’)
6. 2-m average temperature [K] (‘t2’)
7. Surface skin temperature [K] (‘tskin’)
8. 10-meter u and v wind (earth relative) [m s-1] (‘uv10’)
9. Total precipitation [mm d-1] (‘prec’)
10. Convective precipitation [mm d-1] (‘prec_c’)
11. Snow precipitation [mm d-1] (‘prec_snow’)
Tier 4 files are labeled by variable, GCM, variant, domain, and start year of the decade
in the file name:
<variable>.decadal_mean.<gcm>.<variant>.<domain>.<start_year>.nc
For example, lw_dwn.decadal_mean.mpi-esm1-2-lr.r7i1p1f1.d02.2000.nc.
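As a further illustration, here is a sketch of differencing two decadal-mean files; the start years shown and the in-file variable name ‘t2’ are assumptions for illustration, not guaranteed by this document:

import xarray as xr

# Decadal-mean change for a single variable, GCM, variant, and domain
# (hypothetical start years; 't2' assumed as the in-file variable name).
early = xr.open_dataset("t2.decadal_mean.mpi-esm1-2-lr.r7i1p1f1.d02.2000.nc")
late = xr.open_dataset("t2.decadal_mean.mpi-esm1-2-lr.r7i1p1f1.d02.2090.nc")
dt2 = late["t2"] - early["t2"]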
An additional string may appear in the Tier 4 data file names: “bias-correct”. This
designates GCMs that have been bias corrected to ERA5 prior to dynamical
downscaling.
Metadata
Invariant fields are provided in the grid-specific wrfinput_<domain> files, as well as in the Tier 1 files. The wrfinput files are located in:
/wrf-cmip6-noversioning/downscaled_products/wrf_coordinates/
2. Directory structure
Each downscaled dataset follows an identical directory structure, beginning at a top-level directory named for the dataset being dynamically downscaled. For instance, the ERA5 downscaled dataset’s parent directory is “era5”, located in the /reanalysis subfolder of /downscaled_products. Because the datasets are still being generated, these directory names are not listed in this document at this time. Please see
https://dept.atmos.ucla.edu/alexhall/downscaling-cmip6
for a list of ongoing and completed dynamically downscaled products.
Within each dataset’s directory live four subdirectories: 6hourly, hourly, postprocessed, and spin_up, which contain the Tier 1, Tier 2, Tier 3, and spin-up files, respectively. The spin_up directories contain the full month of spin-up data used for each year-long simulation. Within the 6hourly, hourly, and spin_up directories, each year is a subdirectory, inside which live domain directories containing the files. The overall directory structure is:
/<data_name>/<file_type>/<year>/<domain>,
where <data_name> is the label of the product being downscaled, <file_type> is either 6hourly, hourly, or spin_up, <year> is the 4-digit year, and <domain> is d01, d02, d03, or d04, the domain number labels. Recall:
● Domain 1 – d01: The 45-km domain
● Domain 2 – d02: The 9-km domain
● Domain 3 – d03: The 3-km California domain
● Domain 4 – d04: The 3-km Wyoming domain
Given their relatively small size, the postprocessed files are all contained in a single directory, /<data_name>/postprocessed/<domain>.
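As an illustration, here is a minimal Python sketch (assuming the s3fs package is installed) that anonymously lists one such directory in the bucket:

import s3fs

# Anonymous (no-credential) access to the public bucket.
fs = s3fs.S3FileSystem(anon=True)

# List hourly (Tier 2) files for one year and domain, following the
# /<data_name>/<file_type>/<year>/<domain> layout described above.
prefix = ("wrf-cmip6-noversioning/downscaled_products/gcm/"
          "mpi-esm1-2-lr_r7i1p1f1_ssp370/hourly/2059/d03")
for key in fs.ls(prefix)[:5]:
    print(key)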
Of note, each simulated year is conducted independently of the other years. Files contained in the 6hourly and hourly directories span 1 September through 1 September of the following year. Consider a non-leap year in an hourly directory: 8,761 files are stored within. For fiscal year 1988 (directory /1988/<domain>), the first auxhist file has a time stamp of 1988-09-01_00, while the last file has a time stamp of 1989-09-01_00. However, the directory for the next fiscal year, 1989, also contains a file time-stamped 1989-09-01_00. This begs the question: which file should you use? The short answer is, it depends.
Suppose you are interested in computing hourly precipitation rates in [mm/h]. Since RAINNC and RAINC are output cumulatively, you need to apply a running difference in time at each grid point to obtain hourly precipitation increments. Since there are 8,761 files in /1988, you can compute the precipitation rate for 8,760 time slices of fiscal year 1988 within a single directory. However, if you are interested in hourly precipitation events over a multi-year period, say fiscal years 1988-1993, you would compute the hourly rates within each yearly directory separately and then concatenate the results across years. This is an artifact of how we output precipitation (and a few other variables) in WRF: we simply let precipitation accumulate in the output. Consequently, if you compare the 1989-09-01_00 file in /1988 with the same file in /1989, you will see a much lower precipitation value in the /1989 file, because that simulation has only run for 1 month as opposed to 13 months.
Aside from the cumulative variables (RAINNC, RAINC, SNOWNC, etc.), feel free to disregard the final 09-01_00 file in each yearly subdirectory in your analysis.
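Here is a minimal Python sketch of this workflow for one yearly directory; the paths are illustrative, and xarray with dask is assumed to be installed:

import glob

import xarray as xr

# Hourly precipitation increments [mm/h] for fiscal year 1988 from the
# cumulative WRF fields RAINNC and RAINC.
files = sorted(glob.glob("1988/d03/auxhist_d01_*"))
ds = xr.open_mfdataset(files, combine="nested", concat_dim="Time")

cumulative = ds["RAINNC"] + ds["RAINC"]  # cumulative precipitation [mm]
hourly = cumulative.diff("Time")         # 8,760 increments for a non-leap year

# Repeat per yearly directory and concatenate the results afterwards; never
# difference across directories, because the accumulation restarts with each
# year's simulation.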
3. Data access
There are two ways to access the data. Because it is open source and, in our experience, the fastest method, we discuss only the Amazon Web Services Command Line Interface (AWS CLI) method here.
The data are located in a publicly accessible bucket on AWS’ Simple Storage Service (S3). Buckets are simply cloud storage containers whose contents can either be downloaded or analyzed in place using Python via Jupyter on Amazon SageMaker.
Focusing on downloading via the AWS CLI, make sure that your system has the appropriate prerequisites to download, unpack, install, and use the AWS CLI (see the official AWS CLI documentation). Begin by downloading AWS CLI version 2 for Linux and creating a directory in your home space; we call it Awscli:
mkdir Awscli
cd Awscli
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
mkdir bin
mkdir aws-cli
You will see both the bin and aws-cli directories within your Awscli directory.
Now it is time to install the software, which will give you access to the “aws” command that allows you to transfer, peruse, inventory, and manage data on S3. Within Awscli, install aws by typing:
./aws/install -i <absolute path to aws-cli>/aws-cli -b <absolute path to bin>/bin
Within Awscli, you will now be able to run:
bin/aws --version
which should return something like:
aws-cli/2.1.6 Python/3.7.3 Linux/4.12.14-95.51-default exe/x86_64.sles.12 prompt/off
Now it is time to configure your AWS CLI. In your home directory, you should see the hidden directory ~/.aws. Add the following lines to the end of the config file therein (~/.aws/config), under your profile section (e.g., [default]):
s3 =
    max_concurrent_requests = 4
    max_queue_size = 1000
The max_concurrent_requests is set to 4 (instead of the default 10) for our purposes
because we are interested in transferring <domain> directories individually. We
recommend transferring no more than 40 directories’ worth of data at a given time. AWS
S3 transfers are multithreaded; that is, file transfers are parallelized. This parallelization
can quickly overwhelm the system if too many transfers are initiated.
Transfers
Use the aws s3 sync and aws s3 cp commands to transfer data from the wrf-cmip6-noversioning bucket to your local machine. For example, say I want to transfer the /mpi-esm1-2-lr_r7i1p1f1_ssp370/hourly/2059/d03 directory to my local machine. I would type:
/glade/u/home/srahimi/Awscli/bin/aws s3 cp s3://wrf-cmip6-noversioning/downscaled_products/gcm/mpi-esm1-2-lr_r7i1p1f1_ssp370/hourly/2059/d03/ d03 --recursive --no-sign-request
Note that I have used the absolute path to invoke my own aws command. The “--recursive” flag recursively copies all of the directory’s contents into a newly created directory, d03; you do not need to create d03 a priori.