User Instructions
Running the Workflow on NSF NCAR HPCs
This section walks a new user through the end-to-end steps required to configure and execute the WPS WRF Workflow on NSF NCAR’s Casper or Derecho HPC Systems. This workflow should also work on any other HPC system that has either a PBS or Slurm queue scheduler.
Prerequisites
Access and Permissions
Login credentials for Casper or Derecho (e.g., via ssh username@casper.hpc.ucar.edu)
Valid PBS project/account codes (e.g., #PBS -A NWS####)
Write permissions on scratch/project directories for workflow outputs and downloaded data
Local Tools
git for cloning the repo
conda (or module) to activate a Python 3.11 environment
PBS commands: qsub, qstat, qdel
Clone the Repository
git clone git@github.com:NCAR/wps_wrf_workflow.git
cd wps_wrf_workflow
Configure the Workflow
Main YAML Configuration
Enter the config directory:
cd config
Open the FastEddy New Mexico config:
vim config_fasteddy_nm.yaml
Key fields to update:
template_dir: Path to ../templates/fasteddy_nm in a local clone, so that the workflow picks up the user’s edits to template namelists, job submission scripts, etc.
exp_name: Optional. Add/modify this field if you want to give this simulation an experiment name (e.g., if using different IC/LBC models or testing different WRF namelist options). (Default: None.)
exp_wrf_only: Optional. If exp_name is set but you only want the experiment name to apply to real/WRF while using the same metgrid output (met_em* files) for all experiments (e.g., if only WRF namelist options are being changed), then set this to True. (Default: False.)
wps_ins_dir and wrf_ins_dir: Directories where WPS and WRF are installed; defaults can remain if they are read-accessible. (Default values: /glade/u/home/jaredlee/programs/WPS-4.6-dmpar and /glade/u/home/jaredlee/programs/WRF-4.6, respectively.)
Note: Currently the workflow assumes WPS is compiled for dmpar execution, not serial, in order to speed up the WPS programs. On NSF NCAR HPCs, WPS should be compiled on Casper, not Derecho, while WRF should be compiled on Derecho, not Casper.
wps_run_dir and wrf_run_dir: Scratch or project paths that are write-accessible. The workflow will create subdirectories automatically. NOTE: These must be updated from the defaults in the repo.
grib_dir: Location to store downloaded HRRR GRIB2 files. This location must be a path that is write-accessible by the user and should be updated from the default value.
sim_hrs: Total simulation length in hours (e.g., 30). This value drives how most subsequent steps run. (Default value: 24.)
do_geogrid: Set to True to run geogrid; if geogrid outputs are already available (e.g., from a colleague), set to False. Ensure opt_output_from_geogrid_path in the namelist.wps template(s) points to the correct geogrid output location. (Default value: False.)
ungrib_domain: full ungribs the complete GRIB file domain (for HRRR this is CONUS, while for GFS and GEFS this is global). The option subset is currently available only for GEFS output, to geographically subset GEFS output to a smaller domain so that ungrib.exe runs faster, which may be helpful in operational configurations (this setting could be added in the future for GFS or other IC/LBC model sources). (Default value: full.)
icbc_model and icbc_source:
  icbc_model: e.g., hrrr, gfs, gfs-fnl, gefs. The script tolerates all-lowercase or all-caps entries here. (Default value: GFS.)
  icbc_source: AWS or GoogleCloud to download from the AWS or GoogleCloud HRRR repositories, or GLADE to link to local RDA archive files (GLADE/RDA repositories are available for gfs and gfs-fnl but not for hrrr or gefs). Also note that icbc_source tolerates all-caps or all-lowercase entries for both AWS and GoogleCloud, in addition to camel-case for GoogleCloud. (Default value: GLADE.)
icbc_analysis: True to initialize from the analysis (forecast hour 0) files from successive initial condition/lateral boundary condition (IC/LBC) model cycles (NOTE: this can only be done retrospectively, and should provide the best-possible ICs/LBCs); False to initialize from forecast files from a single IC/LBC model cycle. (Default value: False.)
icbc_fc_dt: Normally set to 0. If set to some other positive integer N, then ICs/LBCs are obtained from an N-hour-old cycle of the icbc_model. (That situation may be useful to stay ahead of the clock in operational forecast systems.) (Default value: 0.)
get_icbc: True to download or create symbolic links to model data to use as ICs/LBCs. If the specified files already exist locally where expected, then nothing is re-downloaded or re-linked. (Default value: False.)
hrrr_native: True to download the HRRR native (hybrid)-level files for atmospheric variables and pressure-level files for soil variables only (HRRR native-level files have more vertical levels [51] than pressure-level files [40] but do not include soil variables). False to download only the HRRR pressure-level files for both atmospheric and soil variables. Has no effect if icbc_model is set to something other than hrrr. (Default value: True.)
do_geogrid, do_ungrib, do_metgrid, do_real, do_wrf: Control each WPS/WRF step. Geogrid is a one-time domain setup; Ungrib/Metgrid/Real need to be run for each WRF forecast cycle (and potentially also for different WRF configurations for the same WRF cycle, depending on what is different). (Default values: False for all.)
do_avg_tsfc: If True, runs a WPS utility (avg_tsfc.exe) that calculates a 24-hour average surface temperature to better estimate lake-surface temperatures (avoiding interpolation from oceans, which can result in wildly inaccurate surface temperatures for inland lakes). This step is run after Ungrib but before Metgrid. If the output file (TAVGSFC) has already been generated from a previous attempt to run the workflow for this WRF cycle/experiment, then set this to False to save a few minutes. (Default value: False.)
use_tavgsfc: True to use the output from the avg_tsfc.exe utility (a file called TAVGSFC) in Metgrid. This will add the appropriate line to the &metgrid section of namelist.wps if it does not already exist. (Default value: False.)
archive: When True, the workflow automatically moves all output (namelists, wrfout*, logs) into an archival directory (set arc_dir to a write-accessible directory) for easy retrieval. (Default value: False.)
realtime: When True while run_wrf = True, the workflow will hold after submitting WRF and monitor the progress of the WRF job, and not move on to the next step of the workflow or the next model cycle until WRF completes successfully. If realtime = False while run_wrf = True, then the workflow will move to the next step or model cycle immediately after submitting WRF to the queue, without waiting to monitor its status, progress, or success. If run_wrf = False, then realtime has no effect. (Default value: False.)
account: When provided, this HPC account key will be used instead of the account key that is provided in the submit_*.bash scripts in the templates directory. If not provided, then the account key line is not overwritten.
All other fields can remain at their default values unless specialized cases arise.
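The documented defaults and constraints for the IC/LBC fields can be summarized in a short sketch. This is not the workflow's own code: DEFAULTS, GLADE_MODELS, and check_config are hypothetical names, and the sketch encodes only the default values and restrictions stated in the field descriptions above.

```python
# Hypothetical helper; it mirrors only the defaults and constraints documented above.
DEFAULTS = {
    "exp_name": None,
    "exp_wrf_only": False,
    "sim_hrs": 24,
    "ungrib_domain": "full",
    "icbc_model": "GFS",
    "icbc_source": "GLADE",
    "icbc_analysis": False,
    "icbc_fc_dt": 0,
    "get_icbc": False,
    "hrrr_native": True,
}

# Per the text, GLADE/RDA archives are available for gfs and gfs-fnl only.
GLADE_MODELS = {"gfs", "gfs-fnl"}


def check_config(user_cfg):
    """Merge user settings over the documented defaults; return (cfg, problems)."""
    cfg = {**DEFAULTS, **user_cfg}
    problems = []
    model = str(cfg["icbc_model"]).lower()    # entries are case-tolerant
    source = str(cfg["icbc_source"]).lower()
    if source == "glade" and model not in GLADE_MODELS:
        problems.append(f"icbc_source GLADE is not available for {model}")
    if cfg["ungrib_domain"] == "subset" and model != "gefs":
        problems.append("ungrib_domain subset is currently available only for GEFS")
    if not isinstance(cfg["icbc_fc_dt"], int) or cfg["icbc_fc_dt"] < 0:
        problems.append("icbc_fc_dt should be 0 or a positive integer")
    return cfg, problems


cfg, problems = check_config({"icbc_model": "HRRR", "icbc_source": "AWS", "sim_hrs": 30})
print(problems)
```

A check like this could be run before submitting any jobs, so that an unsupported model/source combination is caught before queue time is spent.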
Using Time-Varying (GEOS-5) or Climatological Aerosols with mp_physics=28
This section provides some brief instructions for users who wish to use the Thompson-Eidhammer
aerosol-aware microphysics scheme in WRF (mp_physics=28), whether with climatological
aerosols or with time-varying aerosols from GEOS-5. These instructions presume the user is
using WRF 4.4+, which is when the GEOS-5 and black carbon aerosol capabilities were added.
These instructions are intended to complement helpful information already posted at
https://www2.mmm.ucar.edu/wrf/users/physics/mp28_updated_new.html and at
https://github.com/wrf-model/WRF/pull/1616.
1. Thompson-Eidhammer climatological aerosol
The workflow does not require any special variables to be set in the config yaml file. The only settings to choose this option would come in the template WPS & WRF namelists.
In namelist.wps, set this in the template namelist (assuming Derecho/Casper):
&metgrid
constants_name = '/glade/work/wrfhelp/WPS_files/QNWFA_QNIFA_QNBCA_SIGMA_MONTHLY.dat',
In namelist.input, ensure you have these settings in the template namelist:
&time_control
[any options for auxinput15 or auxinput17 should be commented or removed]
&domains
wif_input_opt = 2,
num_wif_levels = 30,
&physics
mp_physics = 28, 28, 28,
aer_opt = 3,
use_aero_icbc = .true.,
use_rap_aero_icbc = .false.,
wif_fire_emit = .false.,
qna_update = 0,
2. GEOS-5 time-varying aerosols
Using this option presumes that the geos2wrf software has already been utilized to
process GEOS-5 global files (which are in NetCDF from NASA) into regional files in WPS
Intermediate Format, with the variables that are expected to be found by Metgrid, Real, and WRF,
as detailed in: https://github.com/wrf-model/WRF/pull/1616. A link to the geos2wrf
GitHub repository will be posted here after that repo becomes publicly available.
There are some settings in the config yaml file that are required, but first, these are the settings in the template WPS & WRF namelists that should be set prior to running the workflow:
In namelist.wps, set something like this in the template namelist:
&metgrid
fg_name = '/glade/derecho/scratch/jaredlee/ipc/wps/20170301_00/ungrib/GFS_FNL', '/glade/derecho/scratch/jaredlee/ipc/wps/20170301_00/ungrib/GEOS',
Note the user should not supply QNWFA_QNIFA_QNBCA_SIGMA_MONTHLY.dat in
constants_name when using GEOS-5 aerosols, as that file is for the aerosol climatology.
In namelist.input, ensure you have these settings in the template namelist:
&time_control
[any options for auxinput15 commented or deleted]
auxinput17_inname = "wrfqnainp_d0*",
auxinput17_interval_m = 180, 180, 180,
io_form_auxinput17 = 2,
&domains
wif_input_opt = 2,
num_wif_levels = 72,
&physics
mp_physics = 28, 28, 28,
aer_opt = 3,
use_aero_icbc = .false.,
use_rap_aero_icbc = .true.,
wif_fire_emit = .true.,
qna_update = 1,
In addition, these options in the config yaml file need to be considered:
use_geos5_aero_fcst: True to use time-varying aerosol data from GEOS-5 forecasts in Metgrid, for use with the Thompson-Eidhammer aerosol-aware microphysics scheme in WRF v4.4+ (mp_physics=28). These GEOS-5 data files must already be in WPS Intermediate Format. This option will add the appropriate line/entry to the &metgrid section of namelist.wps if it does not already exist. (Default value: False.)
use_geos5_aero_anal: True to use time-varying aerosol data from GEOS-5 analyses in Metgrid, for use with the Thompson-Eidhammer aerosol-aware microphysics scheme in WRF v4.4+ (mp_physics=28). These GEOS-5 data files must already be in WPS Intermediate Format. This option will add the appropriate line/entry to the &metgrid section of namelist.wps if it does not already exist. (Default value: False.)
geos5_int_dir: String specifying the path of the parent directory where the GEOS-5 time-varying aerosol data in WPS Intermediate Format are stored. This option is only used if use_geos5_aero_fcst or use_geos5_aero_anal is set to True. (Default value: None.)
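The "add the entry if it does not already exist" behavior described for the GEOS-5 options can be illustrated with a small idempotent edit of the &metgrid section. This is only a sketch: ensure_fg_name_entry is a hypothetical helper, it assumes fg_name sits on a single line with single-quoted entries, and the prefix paths shown are placeholders rather than paths the workflow is known to generate.

```python
import re


def ensure_fg_name_entry(namelist_text, new_prefix):
    """Append new_prefix to fg_name in &metgrid unless it is already listed."""
    out = []
    in_metgrid = False
    for line in namelist_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("&"):
            # Track which namelist section the current line belongs to.
            in_metgrid = stripped.lower() == "&metgrid"
        if (in_metgrid and re.match(r"fg_name\s*=", stripped)
                and f"'{new_prefix}'" not in stripped):
            # Drop any trailing comma, then add the new quoted entry.
            line = line.rstrip().rstrip(",") + f", '{new_prefix}',"
        out.append(line)
    return "\n".join(out) + "\n"


template = "&metgrid\n fg_name = '/path/to/ungrib/GFS_FNL',\n/\n"
once = ensure_fg_name_entry(template, "/path/to/ungrib/GEOS")
twice = ensure_fg_name_entry(once, "/path/to/ungrib/GEOS")
print(once)
```

Because the second call leaves the text unchanged, the same template can be processed repeatedly (for example, when a cycle is rerun) without accumulating duplicate entries.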
Edit Template Files
Move into the FastEddy template directory:
cd ../templates/fasteddy_nm
Update Account in Submit Scripts:
Open each PBS script (e.g., submit_geogrid.bash.casper, submit_ungrib.bash.casper, etc.) and specify the desired user account to charge for core hours:
#PBS -A <user_account_code>
The user may also wish to adjust the number of nodes and cores per node requested in some of these submit scripts based on runtime, core hour charges, etc.:
#PBS -l select=<# of nodes>:ncpus=<# of CPUs per node>:mpiprocs=<# of MPI processes per node>
[snip]
mpiexec -n <# of nodes * CPUs per node> ./wrf.exe
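The relationship between the select line and the mpiexec rank count can be checked with a few lines of Python. This is a minimal sketch with a hypothetical helper name (pbs_resource_lines); it assumes one MPI rank per requested CPU unless mpiprocs_per_node is given, and the node count in the example is arbitrary.

```python
def pbs_resource_lines(nodes, cpus_per_node, mpiprocs_per_node=None, exe="./wrf.exe"):
    """Return the '#PBS -l select' directive and the matching mpiexec command."""
    if mpiprocs_per_node is None:
        mpiprocs_per_node = cpus_per_node   # assumption: one MPI rank per CPU
    total_ranks = nodes * mpiprocs_per_node
    select = (f"#PBS -l select={nodes}:ncpus={cpus_per_node}"
              f":mpiprocs={mpiprocs_per_node}")
    launch = f"mpiexec -n {total_ranks} {exe}"
    return select, launch


# Example: four full 128-core nodes.
select, launch = pbs_resource_lines(nodes=4, cpus_per_node=128)
print(select)
print(launch)
```

Keeping the two values derived from the same inputs avoids the common mistake of changing the node count in the select line while leaving a stale rank count on the mpiexec line.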
Modify namelist.wps.hrrr:
opt_output_from_geogrid_path:
opt_output_from_geogrid_path = "/path/to/geogrid_output"
&ungrib section:
prefix = "/path/to/ungrib_output/<CYCLE>/ungrib/HRRR"
Note: Workflow will create .../ungrib_output/<CYCLE>/hybrid and .../<CYCLE>/soil subdirectories automatically if needed.
&metgrid section:
fg_name = "/path/to/ungrib_output/<CYCLE>/ungrib/HRRR_hybr", "/path/to/ungrib_output/<CYCLE>/ungrib/HRRR_soil",
opt_output_from_metgrid_path = "/path/to/metgrid_output/<CYCLE>/metgrid"
If using your own WPS installation, then the user should also update these variables:
opt_geogrid_tbl_path = '/path/to/WPS_install/geogrid',
opt_metgrid_tbl_path = '/path/to/WPS_install/metgrid',
Directories specified above need write access; the control script will mkdir -p as
needed and update <CYCLE> in these namelist variables automatically.
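The <CYCLE> substitution and mkdir -p behavior described above might look roughly like the following. The helper names fill_cycle and make_output_dirs are hypothetical, and the example writes only inside a temporary directory; the workflow's own implementation may differ.

```python
from pathlib import Path
import tempfile


def fill_cycle(template_text, cycle):
    """Replace every <CYCLE> placeholder with the cycle string (YYYYMMDD_HH)."""
    return template_text.replace("<CYCLE>", cycle)


def make_output_dirs(paths):
    """Create each directory and any missing parents, like `mkdir -p`."""
    for path in paths:
        Path(path).mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    template = f'opt_output_from_metgrid_path = "{root}/metgrid_output/<CYCLE>/metgrid"'
    filled = fill_cycle(template, "20250324_00")
    out_dir = filled.split('"')[1]          # the quoted path on the namelist line
    make_output_dirs([out_dir])
    created = Path(out_dir).is_dir()

print("directory created:", created)
```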
Python Environment Setup
Activate Python 3.11:
conda activate /glade/work/jaredlee/conda-envs/my-npl-202403
Verify dependencies:
conda env create -f environment.yml # or ensure 'yaml', 'netCDF4', 'numpy', 'pandas', etc., import without errors
Dependencies are declared in environment.yml, which is based on NSF NCAR’s NPL 2024a stack plus extras.
Running the Workflow
From the repository root:
# Display usage/help
python setup_wps_wrf.py -h
# Execute workflow for one cycle
python setup_wps_wrf.py \
-b 20250324_00 \
-c config/config_fasteddy_nm.yaml
-b YYYYMMDD_HH: Start cycle (e.g., 20250324_00)
-c: Workflow config YAML path (can be a relative path from setup_wps_wrf.py)
Automatic Directory Creation: The Python scripts will create all parent directories for
geogrid, ungrib, metgrid, etc., based on the configured paths.
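For users who want to run several cycles in a row, the cycle string format can be handled with the standard datetime module. The sketch below only builds command lists using the -b and -c options shown above; build_commands is a hypothetical wrapper, not part of the repository, and nothing is executed.

```python
from datetime import datetime, timedelta

CYCLE_FMT = "%Y%m%d_%H"   # matches the YYYYMMDD_HH form expected by -b


def build_commands(first_cycle, n_cycles, step_hrs, config_path):
    """Return one setup_wps_wrf.py command (as an argument list) per cycle."""
    start = datetime.strptime(first_cycle, CYCLE_FMT)
    commands = []
    for i in range(n_cycles):
        cycle = (start + timedelta(hours=i * step_hrs)).strftime(CYCLE_FMT)
        commands.append(["python", "setup_wps_wrf.py", "-b", cycle, "-c", config_path])
    return commands


commands = build_commands("20250324_00", 3, 12, "config/config_fasteddy_nm.yaml")
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could then be passed to subprocess.run from the repository root, one cycle at a time.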
Workflow Execution Details
For running geogrid.exe, ungrib.exe, metgrid.exe, real.exe,
and wrf.exe, batch job submission scripts are needed to submit them to the HPC queue.
If running on a non-NSF NCAR HPC system, users will need the following submission script
files in template_dir:
submit_geogrid.bash
submit_ungrib.bash
submit_metgrid.bash
submit_real.bash
submit_wrf.bash
However, if users are running on NSF NCAR HPCs (Casper and/or Derecho), WPS needs to be
compiled on Casper (whose queue allows for single-core jobs without reserving an entire node),
while WRF needs to be compiled on Derecho (whose queues require reserving an entire
128-core node even if only 1 core is used). Set wps_ins_dir and wrf_ins_dir
to point to those installation directories. Both Derecho and Casper allow peer scheduling
to queues on either machine from either machine (see:
Peer scheduling between systems
for more information). To enable transparent-to-the-user execution of the entire workflow
from a login node on either Casper or Derecho, two sets of files are needed. If executing the
workflow on Casper, these files need to be in template_dir, with the
submit_real and submit_wrf scripts including the required syntax to submit
to a queue on Derecho from Casper:
submit_geogrid.bash.casper
submit_ungrib.bash.casper
submit_metgrid.bash.casper
submit_real.bash.casper
submit_wrf.bash.casper
If executing the workflow on Derecho, then these files need to be in template_dir,
with the submit_geogrid, submit_ungrib, and submit_metgrid scripts
including the required syntax to submit to a queue on Casper from Derecho:
submit_geogrid.bash.derecho
submit_ungrib.bash.derecho
submit_metgrid.bash.derecho
submit_real.bash.derecho
submit_wrf.bash.derecho
The workflow will automatically copy the appropriate submission script template to the run
directories and strip the .casper or .derecho file name suffix if present.
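The copy-and-strip step can be pictured as follows. This is a hedged sketch, not the workflow's code: stage_submit_script is a hypothetical name, and falling back to an unsuffixed script when no host-specific one exists is an assumption made for illustration.

```python
from pathlib import Path
import shutil
import tempfile


def stage_submit_script(template_dir, run_dir, step, host=None):
    """Copy submit_<step>.bash[.<host>] into run_dir as submit_<step>.bash."""
    base = f"submit_{step}.bash"
    candidates = [f"{base}.{host}"] if host else []
    candidates.append(base)                  # assumed fallback: unsuffixed script
    for name in candidates:
        src = Path(template_dir) / name
        if src.is_file():
            dest = Path(run_dir) / base      # the host suffix is dropped here
            shutil.copy(src, dest)
            return dest
    raise FileNotFoundError(f"none of {candidates} found in {template_dir}")


with tempfile.TemporaryDirectory() as tmp:
    tdir, rdir = Path(tmp, "templates"), Path(tmp, "run")
    tdir.mkdir()
    rdir.mkdir()
    (tdir / "submit_wrf.bash.casper").write_text("#!/bin/bash\n")
    staged = stage_submit_script(tdir, rdir, "wrf", host="casper")
    staged_name = staged.name
    staged_text = staged.read_text()

print(staged_name)
```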
Additionally, note that in template_dir the namelist templates must have suffixes
corresponding to icbc_model, to enable WRF experiments that can utilize different
models for the ICs/LBCs. Each model’s output typically has a different number of soil
or atmospheric levels, which requires different values for certain namelist settings.
Keeping a separate template per model avoids over-complicating the workflow scripts with
many if/then branches for model-specific namelist changes, which could become even more
complicated with future updates to those external models, the number of output levels, or
other key parameters. For example, if a user wants to be able to run WRF driven by
GFS, GFS-FNL, or HRRR output, the user would need these files in template_dir:
namelist.input.gfs
namelist.input.gfs-fnl
namelist.input.hrrr
namelist.wps.gfs
namelist.wps.gfs-fnl
namelist.wps.hrrr
Note that users only need to have the template files corresponding to the desired icbc_model
variants that they would like to be available to use.
If users use HRRR model output as ICs/LBCs for WRF, note that the number of vertical levels
is different in the native (hybrid)-level output (51) than in the pressure-level output
(40). Therefore, if users want the flexibility to run with either native/hybrid or
pressure-level HRRR output, then two different template WRF namelists in template_dir
are needed:
namelist.input.hrrr.hybr
namelist.input.hrrr.pres
If users do not have either of these files, the workflow defaults to using namelist.input.hrrr,
which then may cause an error when real.exe is run if the wrong value for
num_metgrid_levels is specified in namelist.input.hrrr for the type of HRRR output.
Note that if the user intends to run with ONLY hybrid-level or ONLY pressure-level HRRR output,
then the user will only need to have namelist.input.hrrr present; just ensure that the correct
value for num_metgrid_levels is set in namelist.input.hrrr.
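The template lookup described above can be sketched as a short fallback chain. The function name pick_namelist_input is hypothetical, and the exact fallback order inside the workflow is an assumption based on the description: prefer the .hybr or .pres variant for HRRR, otherwise use namelist.input.<icbc_model>.

```python
from pathlib import Path
import tempfile


def pick_namelist_input(template_dir, icbc_model, hrrr_native=True):
    """Return the name of the WRF namelist template to use for this IC/LBC model."""
    model = icbc_model.lower()
    names = []
    if model == "hrrr":
        # Native (hybrid) and pressure-level HRRR output have different level counts.
        names.append(f"namelist.input.hrrr.{'hybr' if hrrr_native else 'pres'}")
    names.append(f"namelist.input.{model}")
    for name in names:
        if (Path(template_dir) / name).is_file():
            return name
    raise FileNotFoundError(f"none of {names} found in {template_dir}")


with tempfile.TemporaryDirectory() as tmp:
    for name in ("namelist.input.hrrr", "namelist.input.hrrr.hybr", "namelist.input.gfs"):
        (Path(tmp) / name).write_text("&time_control\n/\n")
    native_choice = pick_namelist_input(tmp, "HRRR", hrrr_native=True)
    pres_choice = pick_namelist_input(tmp, "hrrr", hrrr_native=False)
    gfs_choice = pick_namelist_input(tmp, "GFS")

print(native_choice, pres_choice, gfs_choice)
```

In the example, the pressure-level request falls back to namelist.input.hrrr because no .pres template exists, which is exactly the case where num_metgrid_levels in that file must match the HRRR output type.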
Also note that for the WPS and WRF namelists, this workflow does NOT generate grid/domain information from scratch or from any user inputs. The user is required to specify the grid/domain details in advance in these namelist template files. If the expected template namelist files do not exist prior to running the workflow, then the workflow will fail. Other tools already exist for setting grid/domain configurations for WPS and WRF namelists, such as WRF Domain Wizard. Future updates to the workflow may add the capability to specify domain configuration details in a YAML file to automatically update the WPS and WRF namelists.
One final note: If the user desires to control which variables are written out to history streams,
then there should also be a file (or multiple file names separated by commas, which could be the
same or unique for each domain) set by the user in the &time_control section of
namelist.input.{icbc_model}, such as:
iofields_filename = "vars_io.txt",
Any files listed on that line should be stored in template_dir. If any requested files are
not found in template_dir, the workflow will log a warning, and WRF will still run, but
then the default output variables for the specified stream(s) in the file will be written out
for that domain. For more information on this file and its required syntax, see the
WRF Model README.io_config
file.
ICBC Download/Link:
Downloads IC/LBC files from a web server or links to them in a local repository. For example, if icbc_model = hrrr and hrrr_native = True, then download_hrrr_from_aws.py downloads HRRR native-grid (hrrr.YYYYMMDD/CONUS/hrrr.tHHz.wrfnatf00.grib2) then pressure-grid (wrfprs) files for each hour in the requested simulation.
Skips download/linking if files already exist locally (useful for repeated runs).
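The difference between icbc_analysis = True, icbc_analysis = False, and a nonzero icbc_fc_dt can be made concrete by listing which model cycle and forecast hour supply each valid time. This is an illustrative sketch only: icbc_plan and hrrr_native_name are hypothetical helpers, the hourly interval and the inclusive end time are assumptions, and the file name follows the wrfnat pattern quoted above.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d_%H"


def icbc_plan(cycle, sim_hrs, icbc_analysis=False, icbc_fc_dt=0, interval_hrs=1):
    """Return (model_init_time, forecast_hour) pairs, one per IC/LBC valid time."""
    start = datetime.strptime(cycle, FMT)
    plan = []
    for lead in range(0, sim_hrs + 1, interval_hrs):
        if icbc_analysis:
            # Hour-0 analysis from each successive model cycle.
            plan.append((start + timedelta(hours=lead), 0))
        else:
            # Forecast files from one cycle, optionally icbc_fc_dt hours older.
            plan.append((start - timedelta(hours=icbc_fc_dt), lead + icbc_fc_dt))
    return plan


def hrrr_native_name(init, fhr):
    """File name for one HRRR native-level GRIB2 file."""
    return f"hrrr.t{init:%H}z.wrfnatf{fhr:02d}.grib2"


forecast_plan = icbc_plan("20250324_00", 3)
analysis_plan = icbc_plan("20250324_00", 3, icbc_analysis=True)
lagged_plan = icbc_plan("20250324_00", 3, icbc_fc_dt=6)
for init, fhr in lagged_plan:
    print(init.strftime(FMT), hrrr_native_name(init, fhr))
```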
Ungrib:
Ungrib is inherently serial; the workflow subdivides it per hour and runs 2×N jobs (hybrid and soil, if using HRRR native-grid files) or N jobs (for all other IC/LBC models) to make it embarrassingly parallel. Ungrib is run separately as a 1-core job for each icbc_model file in its own directory to avoid ungrib.exe cleanup processes that delete all files matching a starting pattern, which often causes “file not found” errors when running multiple instances of ungrib.exe simultaneously within the same directory.
Includes a short sleep (1–3 s) between qsub calls to avoid overloading the PBS queue.
WPS intermediate format files (YYYYMMDD_HH/ungrib/HRRR_hybrid*, *HRRR_soil*) move into a combined ungrib/ directory once complete.
Geogrid:
Domain setup; runs once per domain. Subsequent simulations using the same domain can skip this step by setting do_geogrid: False.
avg_tsfc:
Calculates a 24-h average surface temperature field to improve lake-surface temps in land masks. Ignores times outside whole 24-h periods by default.
Metgrid:
Uses ungrib (and optionally avg_tsfc) outputs to produce NetCDF files on the WRF horizontal grid but on the vertical levels from the ungribbed WPS intermediate format file.
Real:
Takes output from metgrid (met_em_d0* files) and puts it onto the full 3D WRF grid to generate initial-time (wrfinput_d0*), lateral boundary condition (wrfbdy_d01), and (optionally) lower boundary condition (wrflowinp_d0*) files that span the requested simulation time.
Submits via qsub submit_real.bash; monitors job status.
Logs for every processor executing real.exe will appear in rsl.out.* and rsl.error.* files. Note that WRF writes logs to the same file names, so these will be overwritten unless moved elsewhere.
WRF:
Submits WRF model via qsub submit_wrf.bash; monitors job status.
If a user types CTRL+C, WRF continues running on the compute nodes; logs and wrfout* files appear in the wrf/ subdirectory. Otherwise, the workflow will monitor the WRF simulation’s progress, and only exit upon finding an error or success message in the log files. A future update will clarify how to move on to the next WPS/WRF cycle after submitting WRF, without waiting to monitor the WRF job.
Monitoring and Troubleshooting
Log Locations: Each step (geogrid/, ungrib/, metgrid/, real/, wrf/) has its own *.log files (or rsl.* files for real.exe and wrf.exe). Currently, the workflow scripts only look for key phrases to indicate success or failure of the job, and do not analyze the error messages to provide hints about what might be wrong. Future enhancements to the workflow could include such helpful hints. The WRF & MPAS-A Forum is a useful resource to consult for WPS & WRF troubleshooting issues.
Inspecting Jobs:
qstat -u $USER # List running PBS jobs
tail -f wrf/logs/metgrid.log # Follow metgrid progress
Common Errors:
Error in ext_pkg_open_for_write_begin: Write-permission error on output path; verify wps_run_dir and template prefixes.
Missing Python modules: Ensure the Python 3.11 environment with required packages has been activated.
Slurm vs PBS scripts: A warning that check_job_status.sh references Slurm can be ignored, or the script can be updated for PBS compatibility.
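The key-phrase approach to judging job success or failure, mentioned under Log Locations, can be illustrated with a small log check. This is a sketch under stated assumptions: the phrases below are messages that real.exe and wrf.exe print to their rsl files, but the workflow's own phrase list may differ, and job_state is a hypothetical helper.

```python
from pathlib import Path
import tempfile

# Messages printed by wrf.exe/real.exe; the workflow's own list may differ.
SUCCESS_PHRASES = ("SUCCESS COMPLETE WRF", "SUCCESS COMPLETE REAL_EM")
FAILURE_PHRASES = ("FATAL CALLED FROM FILE",)


def job_state(log_path):
    """Classify a job as 'failed', 'succeeded', or 'running' from its log text."""
    path = Path(log_path)
    text = path.read_text(errors="replace") if path.is_file() else ""
    if any(phrase in text for phrase in FAILURE_PHRASES):
        return "failed"
    if any(phrase in text for phrase in SUCCESS_PHRASES):
        return "succeeded"
    return "running"                         # no verdict yet, or log not written


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "rsl.out.0000"
    state_before = job_state(log)            # log file does not exist yet
    log.write_text("d01 2025-03-25_00:00:00 wrf: SUCCESS COMPLETE WRF\n")
    state_after = job_state(log)

print(state_before, state_after)
```

A monitoring loop would call such a check periodically until the state is no longer "running", which corresponds to the realtime behavior described in the configuration section.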
Reviewing Output
Data Directory: For example for HRRR, data/hrrr/hrrr.YYYYMMDD/conus/ for raw GRIB2 files for ICs/LBCs.
Workflow Directory:
ungrib/, geogrid/, metgrid/ subfolders within wps_run_dir/YYYYMMDD_HH/
Log files, wrfinput*, wrfbdy, and wrfout* files within wrf_run_dir/YYYYMMDD_HH/
Archive: If archive: True, all run artifacts move to arc_dir/YYYYMMDD_HH/ upon completion.