1. Tom Roche
  2. lightningNOx


(part of the AQMEII-NA_N2O family of projects)


Note Bitbucket's reST renderer currently has problems with

so apologies in advance for "typos" below.


Code in this repository supports the CMAQ workflow = "inline with parameters", allowing users to preprocess lightning-NO:sub:x observations and related data into a form consumable by CCTM. Hopefully it will soon migrate to a repository managed by CMAS or EPA.


initial commit

Provenance of files used for initial commit:

  1. I downloaded and expanded the CMAQ-5.0.1 tarballs as suggested on the CMAQ wiki.

  2. I defined the environment variables M3HOME and M3DATA as directed on the CMAQ wiki.

  3. I ran the following bash session:

    $ mkdir -p ~/code/lightningNOx/ # new dir/folder, not in tarball tree
    $ pushd ~/code/lightningNOx/
    $ git init
    # if tarsplat on remote host, use `rsync -avh --append`
    $ cp -r ${M3HOME}/scripts/lnox/* ./
    $ rm -fr ./R-out/
    $ rm -fr ./R-scripts/.Rhistory
    $ mv ./README ./README.0
    $ chmod 444 ./README.0
    $ chmod 444 R-scripts/*.dump
    $ chmod 444 R-scripts/README
    # get ICCG inputs: they are small enough to manage in repo
    $ mkdir -p ./ICCG_in/
    $ cp ${M3DATA}/raw/lnox/input/* ./ICCG_in/
    $ chmod 444 ./ICCG_in/*
    # ocean mask, NLDN inputs are too large

code structure

The current code generates Makefiles to build the various lightning-NO:sub:x artifacts. The primary generators are

  • Makefile.template: this template Makefile composes make variables, targets, rules, and recipes that encode all information required to build lightning-NO:sub:x artifacts for a given temporality (or time period) except the temporal information itself, which is present only as template values.
  • config_lNOx.sh contains one or more temporalities over which it iterates, instantiating a Makefile for each temporality by writing the temporal template values.
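The division of labor between the two primary generators can be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's code: the placeholder tokens `@YYYY@` and `@MM@`, the function name `instantiate_makefile`, and its argument order are hypothetical. Only the names Makefile.template and MMYYYY_ARR, the MM/YYYY element format, and the Makefile.YYYY.MM naming come from this README.

```bash
# Sketch only: the tokens @YYYY@ and @MM@ are hypothetical template values.

# Instantiate one Makefile from a template for one temporality (MM/YYYY).
# Usage: instantiate_makefile TEMPLATE_FP OUT_DIR MM/YYYY
# Prints the path of the Makefile it wrote.
instantiate_makefile() {
  local template_fp="$1" out_dir="$2" mmyyyy="$3"
  # Fail fast unless the temporality looks like MM/YYYY.
  case "${mmyyyy}" in
    [0-1][0-9]/[0-9][0-9][0-9][0-9]) ;;
    *) echo "ERROR: expected MM/YYYY, got '${mmyyyy}'" >&2; return 1 ;;
  esac
  local mm="${mmyyyy%%/*}"    # text before the slash, e.g. 06
  local yyyy="${mmyyyy##*/}"  # text after the slash, e.g. 2006
  local out_fp="${out_dir}/Makefile.${yyyy}.${mm}"
  [ -r "${template_fp}" ] || { echo "ERROR: cannot read '${template_fp}'" >&2; return 2; }
  mkdir -p "${out_dir}" || return 3
  # Write the temporal values over the template tokens.
  sed -e "s/@YYYY@/${yyyy}/g" -e "s/@MM@/${mm}/g" "${template_fp}" > "${out_fp}" || return 4
  echo "${out_fp}"
}

# Example loop over temporalities (values are illustrative):
# for MMYYYY in '06/2006' '07/2006' ; do
#   instantiate_makefile ./Makefile.template ./Makefiles "${MMYYYY}" || exit 1
# done
```

Keeping the temporal values out of the template means one template can serve every temporality, and a bad temporality is rejected before any Makefile is written.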

The primary generators drive one or more secondary generators for either

  • getting (whether creating or retrieving)
  • plotting

each of the major types of lightning-NO:sub:x artifacts:

  1. mask files. These are spatial artifacts; assuming that lightning-NO:sub:x is being built for a normal CMAQ run with a single spatiality, only one mask file will need to be built. Building the mask file requires as input a MET_CRO_2D (or METCRO2D) file, which is assumed to be provided with the rest of the meteorology for one's CMAQ run. The METCRO2D file is only accessed for its (IOAPI-provided, horizontal) grid information; since (again) that information is supposed to be constant over a CMAQ run, any METCRO2D for any temporality in the run should suffice.
  2. ICCG files. These are temporal artifacts, but have inputs for only 'summer' and 'winter', so the user must arbitrarily map their temporality to that space (in config_lNOx.sh).
  3. flash-totals files. These can be either downloaded or built; however, the current code only supports download. (That being said, (Rob Pinder's?) code from the tarballs "benchmark" is in this repo, and could presumably be recruited to build monthly flash-totals from "raw" NLDN data if required.)
  4. flash-parameters files. These are the current primary output of this project, providing lightning-NO:sub:x data suitable for input to CCTM.
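The summer/winter mapping required for ICCG files might look like the following. This is a minimal sketch: the function name `month_to_season` and the April-to-September = summer split are hypothetical, arbitrary choices for illustration, and are not taken from config_lNOx.sh.

```bash
# Sketch only: the season boundaries below are an arbitrary assumption.

# Map a month number (1-12, with or without a leading zero) to the
# ICCG input season it should use.
# Usage: month_to_season MM
month_to_season() {
  local mm="${1#0}"   # strip one leading zero so that 06 matches 6
  case "${mm}" in
    4|5|6|7|8|9)    echo 'summer' ;;
    1|2|3|10|11|12) echo 'winter' ;;
    *) echo "ERROR: bad month '$1'" >&2; return 1 ;;
  esac
}

# Example: month_to_season 06
```

Whatever split is chosen, keeping it in one function makes the arbitrary decision easy to find and change.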

Secondary generators are either

  • a pair of (bash, R) scripts, s.t. the bash script takes and processes arguments from make (failing fast on error), passing them to the R script, which does the real work. (This is done because I find argument handling in R more tedious than argument handling in bash.) This process generates mask and ICCG artifacts, and plots all of the artifacts.
  • a bash script driving LTNG_2D_DATA, to generate flash-parameters files.
  • a bash script driving wget, to download flash-totals files.
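The (bash, R) pairing described above can be sketched as follows. This is a minimal sketch, not one of the repository's scripts: the function name `run_r_payload`, the three-argument layout, and the `RSCRIPT` override variable are hypothetical. The only external facts relied on are that `Rscript` runs an R script with trailing arguments, and that R reads them with `commandArgs(trailingOnly=TRUE)`.

```bash
# Sketch only: argument order and the RSCRIPT override are hypothetical.

# Validate arguments in bash, then hand them to an R script.
# Usage: run_r_payload R_SCRIPT_FP INPUT_FP OUTPUT_FP
run_r_payload() {
  if [ "$#" -ne 3 ]; then
    echo "ERROR: expected 3 arguments, got $#" >&2
    return 2
  fi
  local r_script_fp="$1" input_fp="$2" output_fp="$3"
  # Fail fast, before R starts, on problems bash can detect cheaply.
  [ -r "${r_script_fp}" ] || { echo "ERROR: cannot read R script '${r_script_fp}'" >&2; return 3; }
  [ -r "${input_fp}" ] || { echo "ERROR: cannot read input '${input_fp}'" >&2; return 4; }
  [ -d "$(dirname "${output_fp}")" ] || { echo "ERROR: no directory for '${output_fp}'" >&2; return 5; }
  # The R script receives the two paths, in this order, as positional
  # arguments via commandArgs(trailingOnly=TRUE).
  "${RSCRIPT:-Rscript}" "${r_script_fp}" "${input_fp}" "${output_fp}"
}

# Example (paths are illustrative):
# run_r_payload ./make_mask.R ./METCRO2D.nc ./mask_out/mask.csv
```

Because the R side stays purely positional, all checking lives in one place, and make sees a nonzero exit status as soon as an argument is wrong.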

The current code also incorporates (Rob Pinder's?) previous code for building LTNG_2D_DATA (and the Makefile mechanism ensures that it's only built once).

use on HPCC

The repository code is set up to build/run "out of the box" on EPA AMAD HPCC (as of Mar 2014). Unfortunately, HPCC has some annoying quirks:

  1. amad1 has both {working, not too downlevel} {Intel Fortran, R}, but lacks the necessary links to Dave Wong's magic IOAPI and netCDF libraries.
  2. infinity has the magic libraries and matching Intel Fortran, but its R is broken WRT R package=rgdal, which breaks R package=raster (used for regridding).
  3. No EPA system (EMVL or HPCC) supports ssh-ing out, and therefore also does not support git protocol = git.
  4. No EPA system of which I'm aware (and definitely not EMVL or HPCC) has "most normal" SSL certificates (and certainly not bitbucket's)

which complicate the build/run process unnecessarily, but not unduly. The build/run process is essentially

  1. (application-specific) Set up a workspace
  2. (application-specific) Edit Makefile.template
  3. (generic) Run config_lNOx.sh in a manner complicated by the above caveats.

In the following subsections, I give

  1. the application-specific steps for build/run-ing for the benchmark
  2. the application-specific steps for build/run-ing for AQMEII-NA
  3. the generic/common steps.


Sample lightning-NO:sub:x data and reference output are included with the CMAQ-5.0.1 tarballs. I refer to that as the tarballs "benchmark" since it's not part of the actual CMAQ-5.0.1 benchmark but serves a similar function. To build it on HPCC,

  1. Unpack the CMAQ-5.0.1 tarballs to a shared space (i.e., on /project), and record the location.

  2. Clone branch = tarball-benchmark of this repository to shared space. Given the HPCC usage notes above, one must do something like the following (note line broken to accommodate page width)

    env GIT_SSL_NO_VERIFY=true git clone -b tarball-benchmark \
  3. Note the path created by git clone, i.e., the path to the cloned repository.

  4. In Makefile.template, edit the paths to

    • PROJECT_DIR==path to cloned repository
    • TARBALL_ROOT==path to root of unpacked tarballs
  5. (necessitated by HPCC quirks) Open config_lNOx.sh and note the first element in MMYYYY_ARR, which is 06/2006. Note that this will cause the build process to die on infinity after creating Makefile.2006.06, so we will need to delete that file before restarting on amad1.

  6. With the previous step in mind, run the generic steps of the build.

The above process should result in the following outputs:

path                                                       size
$PROJECT_DIR/ICCG_out/iccg.2006.06.csv                  1453750
$PROJECT_DIR/mask_out/mask.csv                           363420
$PROJECT_DIR/LTNG_2D_DATA_out/LTNG_RATIO.2006.06.ioapi  4404668
$PROJECT_DIR/NLDN_in/NLDN.2006.06.IOAPI                  560160
$PROJECT_DIR/ICCG_out/iccg.2006.06.pdf                   460445
$PROJECT_DIR/LTNG_2D_DATA_out/lNOx_parms.2006.06.pdf    3636240
$PROJECT_DIR/mask_out/mask.pdf                           448648
$PROJECT_DIR/NLDN_out/NLDN.2006.06.pdf                   346148
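
The outputs listed above can be checked mechanically. The following is a minimal sketch, assuming that the listed sizes are byte counts (the listing does not state a unit); the function name `check_size` is hypothetical. Sizes of PDF plots in particular may legitimately differ between systems, so a DIFFERS report is a prompt to look, not proof of failure.

```bash
# Sketch only: assumes the expected sizes are in bytes.

# Compare a file's size with an expected byte count.
# Usage: check_size FILE_PATH EXPECTED_BYTES
check_size() {
  local fp="$1" expected="$2" actual
  [ -f "${fp}" ] || { echo "MISSING ${fp}"; return 1; }
  # wc -c counts bytes; tr removes the padding some wc versions add.
  actual="$(wc -c < "${fp}" | tr -d ' ')"
  if [ "${actual}" -eq "${expected}" ]; then
    echo "OK ${fp}"
  else
    echo "DIFFERS ${fp}: got ${actual}, expected ${expected}"
    return 2
  fi
}

# Example (with PROJECT_DIR set as in Makefile.template):
# check_size "${PROJECT_DIR}/mask_out/mask.csv" 363420
# check_size "${PROJECT_DIR}/ICCG_out/iccg.2006.06.csv" 1453750
```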

The tarballs' reference or "known-good" output (including plots) for the "benchmark" should be at $TARBALL_ROOT/CMAQv5.0.1/data/ref/lnox/. (Note that the "benchmark" used PNG output for some plots, while all my plots are PDF.)


Building lightning-NO:sub:x for the AQMEII-NA_N2O study is quite similar to building the benchmark, except that

  1. There is no need for the CMAQ-5.0.1 tarballs.
  2. Clone project branch = master instead of branch = tarball-benchmark

The build process for AQMEII-NA is

  1. Clone branch = master of this repository to shared space. Given the HPCC usage notes above, one must do something like the following

    env GIT_SSL_NO_VERIFY=true git clone https://bitbucket.org/tlroche/lightningnox.git
  2. Note the path created by git clone, i.e., the path to the cloned repository.

  3. In Makefile.template, edit the path to

    • PROJECT_DIR==path to cloned repository
  4. (necessitated by HPCC quirks) Open config_lNOx.sh and note the first element in MMYYYY_ARR, which is 12/2007. Note that this will cause the build process to die on infinity after creating Makefile.2007.12, so we will need to delete that file before restarting on amad1.

  5. With the previous step in mind, run the generic steps of the build.

Running the above steps should produce many artifacts. The most important outputs will be in $PROJECT_DIR/LTNG_2D_DATA_out (unless you change paths in Makefile.template), including the following plots:


Note that the following are complicated by the HPCC quirks noted above, which hopefully will be fixed Real Soon Now.

  1. Open a shell on infinity, and run (filling-in the envvar appropriately)
    • ls -al $PROJECT_DIR/config_lNOx.sh
  2. Presuming that's found, run config_lNOx.sh
    • $PROJECT_DIR/config_lNOx.sh
    • ... which should build LTNG_2D_DATA, then die when it hits R.
  3. Open a shell on amad1 (or other HPCC box with a working R as defined above), and delete two files created by the previous invocation of config_lNOx.sh:
    • one is the Makefile noted in your application-specific setup, e.g., rm $PROJECT_DIR/Makefiles/Makefile.2006.06
    • the other is common to all applications for this project: rm $PROJECT_DIR/config_lNOx.sh.log
  4. In the same R-worthy shell, run (again) config_lNOx.sh:
    • $PROJECT_DIR/config_lNOx.sh
    • ... which should build the application-specific artifacts noted above (in each application-specific section)
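
The cleanup in step 3 above can be wrapped in a small function to make the restart less error-prone. This is a minimal sketch: the function name `clean_before_rerun` and its argument order are hypothetical, while the two file paths follow the examples given in the steps above.

```bash
# Sketch only: function name and argument order are hypothetical.

# Remove the two files left behind by the first (failed) run of
# config_lNOx.sh, so that the second run starts cleanly.
# Usage: clean_before_rerun PROJECT_DIR YYYY MM
clean_before_rerun() {
  local project_dir="$1" yyyy="$2" mm="$3"
  [ -d "${project_dir}" ] || { echo "ERROR: no such directory '${project_dir}'" >&2; return 1; }
  # -f: do not complain if a file is already gone.
  rm -f "${project_dir}/Makefiles/Makefile.${yyyy}.${mm}" \
        "${project_dir}/config_lNOx.sh.log"
}

# Example for the tarballs "benchmark", run on amad1 before restarting:
# clean_before_rerun "${PROJECT_DIR}" 2006 06 && "${PROJECT_DIR}/config_lNOx.sh"
```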


  1. investigate (ask Pinder?) about data and plot manipulations:

    1. parameters-file plot. my plot of the tarball reference output data (${TARBALL_ROOT}/CMAQv5.0.1/data/ref/lnox/LTNG_RATIO.2006.06.ioapi, mounted here) differs from the plot packaged with the tarball reference output (${TARBALL_ROOT}/CMAQv5.0.1/data/ref/lnox/plot_LNOx_params.pdf, mounted here) in page 5 (CMAQ vs NLDN strike bias). Note

      • all other pages in that plot (including CMAQ-estimated strikes and NLDN strike obs) match.
      • this is the only data calculated (rather than just being retrieved) in plot_lNOx_parameters.R
    2. ICCG. make_ICCG.R reverses data rows in output to match reference data

    3. mask:

      • reference mask plot (and presumably data) is either lower-resolution, or deliberately 'continental-shelf's its output relative to mine, which much more finely resolves coastlines and islands.
      • plot_mask.R transposes data (retrieved via R = read.csv) before its call to R = image
      • but plot_lNOx_parameters.R does not transpose data (retrieved via M3::get.M3.var)
      • neither reverses rows
    4. NLDN monthly. I don't know how data was processed by whoever put the files online for download; I do know that plot_NLDN_monthly.R neither transposes nor reverses.

    5. my plot_lNOx_parameters.R: does not reverse rows when plotting the above data

      • ICCG
      • mask
      • NLDN monthly

      but does reverse rows when plotting data for

      • CMAQ-calculated strikes
      • strike bias
      • moles(NO)/strike
      • strike count parameter
  2. MET_CRO_2D handling: run_make_ICCG.sh and run_plot_ICCG.sh copy the MET_CRO_2D file locally to *.nc (i.e., they change the file extension) to make netCDF tools happy. Having our data files use non-standard file extensions is bad, but there's not much I can do about that. What's egregious, and what I can change, is copying it more than once!

    • Code should check for the file, and copy only if not found, so as to only copy this large file once.
    • Makefile(s) target= clean should remove the *.nc file(s).
  3. Move all these TODOs to this project's issue tracker.

  4. For all code that uses: document (uber.config.cmaq.*, config.cmaq*) from CMAQ-build

  5. Project needs extensive refactoring! lotsa common code :-(

    • write/use more functions, put in 'source'able files, call from {payload, main loop}
    • note lotsa functions in previous version of make_lNOx_parameters.sh that should probably be recovered!
    • R scripts in this project: fix/use common plotting functions, undo row-reversal in get.* functions
  6. Refactor this and my lightning-NO:sub:x wikipage which currently share (e.g.) info on running this code.
  7. R scripts in this project: need better arg parsing. Currently all use vanilla R commandArgs, which are completely positional. Instead, use R package=optparse: see
  8. all my bash scripts (this project and beyond):
    • need better arg parsing: use built-in getopts (though it can't handle long options)
    • if calling a function=$CMD && 'tee-ing eval' (e.g., doing eval $CMD 2>&1 | tee -a $LOG_FP), make sure that $CMD is not writing log within its body (producing doubled lines in log)
    • if calling a function=$CMD && not 'tee-ing eval', make sure that $CMD is writing log within its body (to ensure completeness of log)
  9. make_mask.R: complete vectorization: still loops to calculate weights.
  10. all plot*.R: make subtitles plot
  11. support workflow=[download NLDN hourly, use to build NLDN monthly]. Currently I only support downloading NLDN monthly.
  12. create {*_driver.sh, uber_driver.sh} à la regridders to either {create bare dir for full 'make' testing, test from repo clone}.
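
For the MET_CRO_2D TODO above (copy the file to *.nc only if it is not found), the check could be sketched as follows. This is a minimal sketch, not code from run_make_ICCG.sh or run_plot_ICCG.sh: the function name `copy_once` and its messages are hypothetical.

```bash
# Sketch only: function name and messages are hypothetical.

# Copy a (large) file to a new name only if the target does not exist yet.
# Usage: copy_once SOURCE_FP TARGET_FP
copy_once() {
  local source_fp="$1" target_fp="$2"
  if [ -e "${target_fp}" ]; then
    echo "skip: ${target_fp} already exists"
    return 0
  fi
  [ -r "${source_fp}" ] || { echo "ERROR: cannot read '${source_fp}'" >&2; return 1; }
  cp "${source_fp}" "${target_fp}" || return 2
  echo "copied: ${source_fp} -> ${target_fp}"
}

# Example (file names are illustrative):
# copy_once ./METCRO2D_080101 ./METCRO2D_080101.nc
```

Note that an existence check alone cannot tell a complete copy from a partial or stale one, which is one more reason for the Makefile's clean target to remove the *.nc file(s).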