It passes the push suite on my laptop. I haven't tried running the full suite.
If this is merged, our minimum required yt version moves up to yt-3.2.2 or newer (currently unreleased, but we need it since I had to update a couple of places in yt's Enzo frontend to get the ShockTubeTest to work). See yt PR 1781.
Sam tells me over GChat that he didn't update the test suite to work with yt-3 because of performance concerns. Before this PR is merged, we should compare its runtime against the 2.x test suite and evaluate whether there is a slowdown; if there is, we should either try to fix it or determine whether it's a big enough problem to worry about.
Those performance concerns were probably reasonable; I suspect things are better now, but not fully remedied.
@ngoldbaum , would you mind looking into this and reporting back? Even if there's a moderate performance hit, it's probably worthwhile to update!
Follow-up on this: I'm seeing a tremendous slowdown when I compare the test suite under yt-2.x to the test suite under yt-3.x. Using the changeset in yt PR 1781 and this enzo PR, when I run the test suite with this line:
(i.e., just doing the tests, having generated the data previously), the yt-3.x test runner takes anywhere between 100 and 400 times longer for any individual test (which can be composed of between 1 and ~30 sub-tests). I haven't done any profiling to drill down to where exactly this is happening, but it is definitely a problem: it's the difference between the push suite taking less than a minute (for the tests only) and taking at least a couple of hours. So, I think we really need to hold off on this PR until this performance issue is resolved. I suspect it's on the yt side of things, though. Do we want to bump this to the yt-dev list?
Any chance you can upload a profile somewhere? You should be able to get one by doing python -m cProfile -o prof.out before ./test_runner.py in the command line example you gave.
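For reference, here is a minimal sketch of generating and then inspecting such a profile, using a stand-in workload in place of the real ./test_runner.py invocation (the function name and dump filename are illustrative):

```python
import cProfile
import pstats

# Stand-in for the real workload (./test_runner.py in the command above).
def workload():
    return sum(i * i for i in range(100000))

# Roughly equivalent to: python -m cProfile -o prof.out ./test_runner.py ...
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()
profiler.dump_stats("prof.out")

# Load the dump and print the most expensive calls by cumulative time.
stats = pstats.Stats("prof.out")
stats.sort_stats("cumulative").print_stats(10)
```

Loading the resulting prof.out with pstats (or a viewer like snakeviz) should make it easy to see which yt function the time is going into.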
So it looks like basically all of the time is spent in one yt function:
In my bastardized yt version with three open pull requests temporarily merged in, I'm able to run the quick suite under yt-3.0 in 200 seconds.
I've pulled in these two PRs (in addition to the original PR above) and have verified that when using yt-3 the total time (using the ... --test-only option in the test suite, which isolates the tests themselves) dropped from ~8,000 seconds to 120 seconds (compared to ~60 seconds for the yt-2-based test suite). So, big win there, and that's a reasonable amount of time.
However, here's something that was actually also a problem in the original PR: it doesn't seem to run all of the tests, and some of the ones that do run seem to fail in odd ways. There are 69 separate simulations run in the "push suite", with ~30 individual tests run per simulation in the yt-2 version of the test suite, for a total of 1779 tests as reported by test_runner.py. The yt-3-based test_runner, with the exact same command line, only runs a total of 253 tests, of which 5 error and 13 fail. Manual inspection of the failures (where the relative difference in dataset values between the 'gold standard' and this version of enzo exceeds 0.001) shows arrays with utterly different answers, which makes me think that something funny is happening in the comparisons.
Pastebin of yt-2 test runner test_results.txt (note: it seems to be too large to paste the entire thing, so this is just the first half or so):
I tried running the test runner with --verbose turned on, but no additional useful output was provided. :-/
I don't have time to look closer at this this week. If someone else wants to take this up, feel free. I will come back to this later.
I think all the tests are running, it's just an issue with tests being yielded or not.
Also you'll likely need to generate a new gold standard, since some of the test names have changed.
Hi @ngoldbaum , I think we're now ready to revisit this PR and get it merged into enzo-dev. I'm seeing a lot of tests failing, and I hope you can take a look. I'm generating the gold standard for the "push" suite using the current tip of enzo-dev (i.e., with the yt-2-based test_runner.py file), and I have a second repository containing the same code but with this PR pulled in. When I run the actual test suite I get a wildly different number of tests, which I now think I understand. However, I also get many failed tests; I see a lot of this error:
ERROR: Failure: AttributeError ('EnzoSimulation' object has no attribute 'length_unit')
This latter problem seems particularly odd because I've verified that, for those calculations, identical datasets are present in both the gold standard and the new simulation sets.
For reference, all of the output from calling the test runner for the PR'd version of the code can be seen here:
I'm traveling back from Canada today. I will try to look at this in the next couple of days.
From what I can tell, EnzoSimulation does have the length_unit attribute, but only when it is a cosmological simulation. @brittonsmith can you look at that? I'm looking at line 103 of yt/frontends/enzo/simulation_handling.py which happens inside the conditional.
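The bug pattern being described reduces to an attribute that is assigned only inside a conditional, so non-cosmological simulations never acquire it. A minimal sketch (hypothetical class, not yt's actual EnzoSimulation code):

```python
# Hypothetical sketch of the bug pattern described above; this is NOT
# yt's actual EnzoSimulation implementation.
class FakeSimulation:
    def __init__(self, cosmological_simulation):
        self.cosmological_simulation = cosmological_simulation
        if cosmological_simulation:
            # Only assigned on the cosmology branch, mirroring the
            # conditional around line 103 of simulation_handling.py.
            self.length_unit = 1.0  # placeholder value

cosmo = FakeSimulation(cosmological_simulation=True)
non_cosmo = FakeSimulation(cosmological_simulation=False)

assert hasattr(cosmo, "length_unit")
# Accessing non_cosmo.length_unit raises AttributeError, matching the
# "'EnzoSimulation' object has no attribute 'length_unit'" failure above.
assert not hasattr(non_cosmo, "length_unit")
```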
I just issued a PR that should fix this and one other EnzoSimulation bug. It's here.
So I think the issue is that you're generating the answers on yt-2, then running the tests with yt-3. I doubt that will work, since we didn't do anything in the yt-2 to yt-3 transition to ensure the testing infrastructure could load test answers produced by yt-2.
It's been a while since I looked at this PR so bear with me, but this morning I'm trying to generate a set of answers using yt-3 and then verifying that the tests all pass that way. I'm also going to try to take a shot at generating more verbose output from the tests so you can verify that the test coverage is the same as it was under yt-2.
Found the issue that caused the test counts not to match. It was an easy fix in the end; updates incoming...
I've fixed this issue and updated the docs. This should be ready for review.
Let me know if regenerating the gold standard after merging this change is going to be an issue, or if we need to make a note about it somewhere in the docs.
I've pulled in all of the changes from all three of your PRs, and have run the test suite using yt-3 (for both the gold standard and the comparison, which is the same codebase but with an extra printf() thrown in so that the changeset hash is different). I get this set of errors for both the gold standard run and the comparison run:
Toro-1-ShockTube__test_standard.test_standard: ERROR (<type 'exceptions.ValueError'>, ValueError('bad axis2 argument to swapaxes',), <traceback object at 0x10eedfe18>)
as well as a couple of these:
PhotonTestAMR__test_amrphotontest.test_amr_photon_test: ERROR (<class 'yt.utilities.exceptions.YTFieldNotFound'>, YTFieldNotFound(), <traceback object at 0x10ac36368>)
The former set of errors seems to go along with 1D and 2D simulations. I'm not sure what's up with the latter error; I've manually verified that the two simulations have the same set of fields. However, when I look at the associated python testing script (PhotonTestAMR__test_amrphotontest.py), I'm reasonably confident that the field 'H_p2_Fraction' does not in fact exist and that 'H_p1_Fraction' should have a lower-case 'f'. As an additional clue, the equivalent script in the PhotonTest directory doesn't have that set of problems. In that directory, the fields are:
HOWEVER, what is more alarming is that the PhotonShadowing test didn't run, and in fact segfaulted for some reason; yet the four tests in that file that involve the PhotonShadowing test all passed. That is worrying. Do you have any idea what's going on there?
I didn't see those yesterday, but then again I only ran the quick suite. Let me try again with the push suite.
I was able to reproduce some of these, but not all. Here are the test results I see:
All of the failures I see are due to RefineBy still being 3 in two tests. That will be fixed when PR 340 is merged.
For the sake of reproducibility, I generated my answers using yt version 3ca8a739e529, grackle 2.1, and enzo version 1c18d141262e. The latter version has been pushed to this pull request. I compared with 1c18d141262e by making a trivial change to the repo with 1c18d141262e checked out, as you did.
Let's go through the failures you saw one by one:
I can't reproduce this one at all. The test runs fine both when I generate the answers and when I compare with the gold standard.
I don't see a seg fault when I run PhotonShadowing and it runs to completion.
Sorry for the delay in response. I'll look into this later today and report back. By the way, what type of computer/operating system are you running on? Some Linux variant or OS X?
OK, following up on this... it turns out that the PhotonTest problem (run not executing) is actually a compiler issue. I recently updated to gcc 5.2.0, and when one uses opt-high this leads to problems with NaNs in the PhotonTest problem (but in no other test problems, interestingly enough). So, that was easily dealt with by running with opt-debug.
That said, I still see 52 errors in the test suite when I generate both the gold standard and the "test" version of the test problems with yt-3 using the following arguments to the test runner:
Traceback (most recent call last):
File "/Users/bwoshea/Desktop/yt-conda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
File "/Users/bwoshea/Desktop/yt-conda/src/yt-hg/yt/utilities/answer_testing/framework.py", line 327, in __call__
nv = self.run()
File "/Users/bwoshea/Desktop/yt-conda/src/yt-hg/yt/utilities/answer_testing/framework.py", line 548, in run
hashes[g.id] = hashlib.md5(g[self.field].tostring()).hexdigest()
File "/Users/bwoshea/Desktop/yt-conda/src/yt-hg/yt/data_objects/grid_patch.py", line 83, in __getitem__
tr = super(AMRGridPatch, self).__getitem__(key)
File "/Users/bwoshea/Desktop/yt-conda/src/yt-hg/yt/data_objects/data_containers.py", line 272, in __getitem__
File "/Users/bwoshea/Desktop/yt-conda/src/yt-hg/yt/data_objects/data_containers.py", line 1178, in get_data
fluids, self, self._current_chunk)
File "/Users/bwoshea/Desktop/yt-conda/src/yt-hg/yt/geometry/geometry_handler.py", line 245, in _read_fluid_fields
File "/Users/bwoshea/Desktop/yt-conda/src/yt-hg/yt/frontends/enzo/io.py", line 161, in _read_fluid_selection
rv[(ftype, fname)] = gds.get(fname).value.swapaxes(0,2)
ValueError: bad axis2 argument to swapaxes
This happens for a bunch of different test problems, all of which are 1D and 2D test problems. I've tried this with my original installation of yt-3, installed ages ago with the old non-conda install script (changeset 428dd2f17465), as well as a fresh install of yt-3 that I installed this morning using the conda-based install script (also changeset 428dd2f17465). Both have this problem. When I turn on the verbose option in the test runner, I get the contents of this file: http://galactica.pa.msu.edu/~bwoshea/data/all_test_runner_output_2016_06_30.txt . Looking through this in detail, I see that all of the errors have to do with GridValues-type tests.
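A quick way to see why 1D and 2D problems trip over that swapaxes(0,2) line in io.py: swapping axes 0 and 2 only makes sense for a 3D array. A minimal numpy illustration (the exact error message varies across numpy versions):

```python
import numpy as np

# A 3D grid swaps its first and last axes without trouble:
grid_3d = np.zeros((4, 8, 16))
assert grid_3d.swapaxes(0, 2).shape == (16, 8, 4)

# A 2D (or 1D) grid has no axis 2, so swapaxes(0, 2) raises.
grid_2d = np.zeros((4, 8))
try:
    grid_2d.swapaxes(0, 2)
except ValueError as err:
    # Older numpy says "bad axis2 argument to swapaxes"; newer numpy
    # raises AxisError, which is still a ValueError subclass.
    print("swapaxes failed:", err)
```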
I suspect that I'm seeing this because I am using the --strict=high argument to the test runner and you aren't. Setting --strict to "high" requires that the bitwise tests get called, and when I dig into the yt codebase I see (in yt/frontends/enzo/answer_testing_support.py) that the GridValuesTest only seems to get called in that circumstance. So, if you're not using --strict=high, that would explain why you're not seeing the error. Would you mind trying to reproduce this?
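For context, the bitwise GridValues comparison in the traceback above reduces to hashing the raw bytes of each field array, so any single-bit difference produces a different digest. A minimal sketch (tostring() in the traceback is the older spelling of tobytes()):

```python
import hashlib

import numpy as np

# Bitwise comparison: hash the raw bytes of a field array, as the
# GridValuesTest does via hashlib.md5(...).hexdigest() in the traceback.
a = np.linspace(0.0, 1.0, 16)
b = a.copy()

assert hashlib.md5(a.tobytes()).hexdigest() == hashlib.md5(b.tobytes()).hexdigest()

# Any perturbation, however small, changes the digest:
b[3] += 1e-15
assert hashlib.md5(a.tobytes()).hexdigest() != hashlib.md5(b.tobytes()).hexdigest()
```

This is why the bitwise tests are gated behind a high strictness setting: they demand exact byte-for-byte agreement, not agreement within a relative tolerance.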
Actually, answering my own question by looking at your pasted output, at the top of your paste it has:
Relative error tolerance: 1e-3
Bitwise tests not included
So that's what's going on. If you turn on --strict=high I bet you'll see my problems. Once that's resolved, we can accept this PR!
Awesome, thank you. I'll give this a try ASAP and let you know how it goes. Just to check, though - if I just merge this PR into my own yt distribution (installed via conda) do I have to do anything other than that? Do I have to do any sort of "python setup.py develop"-type thing?
If you're running out of a repository (and not the conda binary packages or the nightly binary builds), then that's all you need to do, since that PR only modifies python code.
Awesome, thank you. Also, a followup: when I pull yt PR #2258 into my yt repository, all tests pass with flying colors. Thank you, @ngoldbaum , for your work on this one. I'm going to hit approve on both this PR and the yt one, and we can pull the trigger on this when a couple more people sign off. @MatthewTurk , @brittonsmith ?
@bwoshea @ngoldbaum I had a look through the changes to the python scripts and docs and it all looks OK to me. From the source code the changes look pretty minor - just a simple update to yt-3. However, from reading back through the comments it was obviously a lot more work than that! Excellent work!
That's 3 approvals; time to merge. Nice job on this one, @ngoldbaum - this was a tremendous amount of work. Thanks also to @john_regan and @MatthewTurk for reviewing!