I added a few test_runner.py flags for controlling the strictness of the testing constraints: --tolerance=int sets the order of magnitude of the allowed relative error, --bitwise sets whether or not bitwise tests are run, and --strict=[low, medium, high] sets both the tolerance and bitwise flags to preset levels. --strict defaults to low, which means tolerance=3 and bitwise=false. In addition, I combined make_new_tests.py into test_runner.py, so there is one fewer step to run. Lastly, I updated the docs to reflect all of these modifications (and to flesh out how to troubleshoot failures).
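To make the meaning of --tolerance concrete, here is a minimal sketch of a relative-error check. It assumes that tolerance=N means "relative error below 10**-N"; the function name within_tolerance and the handling of a zero reference are illustrative and are not taken from test_runner.py.

```python
def within_tolerance(value, reference, tolerance=3):
    """Return True if value agrees with reference to a relative error
    below 10**-tolerance (assumed interpretation of --tolerance)."""
    limit = 10.0 ** -tolerance
    if reference == 0:
        # Relative error is undefined here; fall back to an absolute check.
        return abs(value) < limit
    return abs(value - reference) / abs(reference) < limit


# A relative difference of about 1e-4 passes at tolerance=3
# but fails at tolerance=6.
loose = within_tolerance(1.0001, 1.0, tolerance=3)
tight = within_tolerance(1.0001, 1.0, tolerance=6)
```

Under this reading, raising the tolerance value makes the comparison stricter, which matches the low/medium/high ordering of the presets.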
The result is that nearly all of the tests now pass on my test machines (using the default low strictness), but developers can tighten the precision to arbitrary levels before pushing or while testing their own code. See the docs for more information on all of this.
UPDATE 1: Thanks to Mike Kuhlen and Nathan Goldbaum for homing in on some problematic test problems and coming up with ways of pruning these tests so that they do not cause strange behavior across compilers/platforms.
Fixed a few bugs with the --bitwise and --sim-only flags.
We now purge all of the newly created *__test_standard.py files after run time instead of just ignoring them.
Documentation updated to reflect ngoldbaum's comment (and other slight modifications)
Modified the flags in yt's testing framework so that one can now use --answer-store to designate whether to store or compare and --answer-name=X to designate which reference to store to or compare against. I also renamed --local-store to simply --local. Additionally, I set the default --answer-name to enzogold2.2 (since that will be the cloud gold standard for this version), but this can be changed in future versions as the codebase improves. The default behavior is now to run the quick suite when no other tests are selected. I documented all of these changes. This makes running the test suite a lot easier, because there are sensible defaults.
This is great, Cameron. I've tested with --strict=low (all pass), --strict=medium (all pass), --strict=high (some fail), and --bitwise (more fail) against a slightly different optimization for the quick suite. I think this is good to go.
I set strict to operate as follows (but I am open to changing these values):
--strict=low means --tolerance=3 and --bitwise is not set
--strict=medium means --tolerance=6 and --bitwise is not set
--strict=high means --tolerance=13 and --bitwise is set
In addition, the values used for tolerance and bitwise are printed out to STDOUT at the beginning of a run, and they're included in the test_results.txt file for later use.
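The presets listed above can be sketched as a simple lookup table. This is only an illustration: the names STRICT_PRESETS, resolve_strictness, and describe_settings are hypothetical, and the format of the printed line is an assumption rather than the actual output of test_runner.py.

```python
# Preset values as described in this pull request.
STRICT_PRESETS = {
    "low": {"tolerance": 3, "bitwise": False},
    "medium": {"tolerance": 6, "bitwise": False},
    "high": {"tolerance": 13, "bitwise": True},
}


def resolve_strictness(strict="low"):
    """Return the (tolerance, bitwise) pair for a --strict level."""
    preset = STRICT_PRESETS[strict]
    return preset["tolerance"], preset["bitwise"]


def describe_settings(tolerance, bitwise):
    """Build a one-line summary suitable for STDOUT or a results file
    (hypothetical format)."""
    return "tolerance=%d bitwise=%s" % (tolerance, bitwise)


tolerance, bitwise = resolve_strictness("high")
print(describe_settings(tolerance, bitwise))
```

Keeping the presets in one table means that changing a value, as offered above, only requires editing a single entry.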
Cameron, this looks awesome. One comment, you can simplify checking for valid arguments to the --strict flag with the "choices" keyword in the add_option command. If you set the choices flag, the parser will do the checking itself. You can see an example of this with the --suite flag.
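For reference, a minimal sketch of the "choices" approach using the standard-library optparse module. The option name and default follow this pull request; the build_parser helper, the dest name, and the help text are illustrative and not copied from test_runner.py.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser whose --strict option only accepts known levels."""
    parser = OptionParser()
    parser.add_option(
        "--strict",
        dest="strict",
        type="choice",
        choices=["low", "medium", "high"],
        default="low",
        help="strictness preset: low, medium, or high",
    )
    return parser


# Valid values are accepted as-is.
options, args = build_parser().parse_args(["--strict=medium"])

# With an invalid value such as --strict=extreme, optparse prints a
# usage error and exits, so no hand-written validation is needed.
```

With this in place, the parser itself rejects anything outside the listed choices.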
Britton, I'll look at this. I saw the --suite flag, but it looked like there was a lot going on that I didn't really understand, so I stuck with this homegrown but shorter version... But I'll see if the choices keyword helps me out.
Great work, Cameron. In IRC there was broad consensus. Nice!