Cached output in CarpetIOScalar

Issue #764 closed
Roland Haas created an issue

The attached patch uses a set of ostringstreams to cache CarpetIOScalar's output and writes it to disk in chunks of io_chunk_size bytes.
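A minimal sketch of the caching idea, not the code from the attached patch: text is collected in an `ostringstream` and appended to the file once at least `io_chunk_size` bytes have accumulated. The class name `CachedFile` and its members are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper illustrating chunked output caching.
class CachedFile {
public:
  CachedFile(std::string path, std::size_t io_chunk_size)
      : path_(std::move(path)), chunk_(io_chunk_size) {}

  // Append text to the in-memory cache; write to disk once the cache
  // holds at least io_chunk_size bytes.
  void write(const std::string &text) {
    cache_ << text;
    cached_bytes_ += text.size();
    if (cached_bytes_ >= chunk_) flush();
  }

  // Append everything cached so far to the file and empty the cache.
  void flush() {
    if (cached_bytes_ == 0) return;
    std::ofstream out(path_, std::ios::app);
    out << cache_.str();
    cache_.str("");
    cache_.clear();
    cached_bytes_ = 0;
  }

  std::size_t cached_bytes() const { return cached_bytes_; }

private:
  std::string path_;
  std::size_t chunk_;
  std::ostringstream cache_;
  std::size_t cached_bytes_ = 0;
};
```

Anything still in the cache is lost if the process dies before `flush()` runs, which is the trade-off discussed in the comments.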

The code reproduces the results of the pre-change thorn in the included test suite.

If this is accepted I'll give adding the same functionality to IOASCII a try.


Comments (11)

  1. Erik Schnetter
    • removed comment

    The C++ fstream object performs its own buffering. By closing and re-opening the files every time, we currently ensure that all information is written to disk in case the simulation crashes, is aborted, or runs out of queue time.

    I believe the main feature of your patch is to delay writing things to file, without flushing in between, which will indeed speed up things (which is presumably the intent of your patch). This could, in principle, also be achieved by keeping the output files open at all times, and by not flushing them.

    There is another serious performance problem in the current code that your patch doesn't address: there are too many files created. Creating many small files will always be inefficient. Writing to too many files at once may also make it problematic to keep these files open at all times, because the OS imposes a limit on this number.

    I think the true remedy is to write fewer, larger files. For example, all norms (for a variable or group, or for all variables) could be written to a single file. This would be faster during output, and would also improve file system speed when looking at the simulation later. We can then provide a simple awk script that re-creates the current files from this single file. That means that, in effect, writing to a single file is like caching the output, only that we then cache it in a file, which is safer if a simulation aborts.

    Instead of your approach, I would prefer a solution that would lead to fewer, larger files.

    Regarding your patch: I don't like the reference counting. It seems that this is only necessary because you place cached streams directly into STL containers, which need a copy constructor. You can instead place pointers to cache_streams into the STL container, and then there will be only a single copy of each cached stream. Alternatively, the STL provides some abstractions for this case, e.g. auto_ptr.
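A sketch of the pointer-in-container suggestion, under the assumption that one cache stream is kept per output file name. It uses `std::unique_ptr` rather than `auto_ptr`, which was removed from the language in C++17. The class `StreamRegistry` and the file names in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical registry: one cache stream per output file name.
// The map owns each stream through a pointer, so no stream is ever
// copied and no reference counting is needed.
class StreamRegistry {
public:
  // Return the stream for `name`, creating it on first use.
  std::ostringstream &get(const std::string &name) {
    std::unique_ptr<std::ostringstream> &slot = streams_[name];
    if (!slot) slot.reset(new std::ostringstream);
    return *slot;
  }

  std::size_t size() const { return streams_.size(); }

private:
  std::map<std::string, std::unique_ptr<std::ostringstream>> streams_;
};
```

Calling `reg.get("rho.norm2.asc") << "0 1.5\n";` twice with the same name writes into the same stream object.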

  2. Roland Haas reporter
    • removed comment

    I really really detest this Spam filter. This is the second time I had to type my comments since the spam filter happily removed my posting and my browser (Opera) did not restore the text in the box.

    Using pointers instead of copying the cached_stream objects seems like a good idea.

    The limit on the number of open files can be reached if we request scalar output for the majority of our grid functions. I have seen limits as low as 1024 files per process (reachable with hdf5_slicer and HDF5 3D output from 576 MPI processes).

    In order to avoid loss of data when the simulation fails, any caching should flush its caches during checkpointing (I believe there is a bin for it). This way all currently cached data will be reproduced when the user recovers from the last checkpoint.

    I can easily change CarpetIOScalar's code (trunk version) to have different columns for the individual reductions. This is very straightforward with the exception of the case where the requested reductions change during the run (where I would opt to output a second header with the column numbers). Note that this logic will fail if the user changes the requested reductions when recovering from a checkpoint (unless I track the parameter value out of the checkpoint and what is seen the next time the code is called; I am not sure, though, whether the flesh lets me see the parameter values from the checkpoint).

  3. Roland Haas reporter
    • removed comment

    I also request that the spam filter be removed. I updated the original caching patch to not use reference counting anymore (much nicer this way) and to also flush the caches in the CHECKPOINT bin to ensure that no data is lost when jobs are not cleanly terminated.

    I attach a second patch to implement an option all_reductions_in_one_file that implements Erik's suggestion. The basic idea is simple: open only a single file outside of the loop over all requested reductions instead of a new file for each reduction. However, due to the state-caching, IO timing calls and the desire to have somewhat sensible headers and column arrangements, there are a large number of if (all_reductions_in_one_file) clauses that are used to effectively move code blocks from inside the loop over reductions to the outside.

    I provide a (new) test and have tested that the code passes the old test if the parameter is off. Please note that currently both patches are against a vanilla CarpetIOScalar code. They cannot both be applied (though merging them is simple).
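A sketch of what one-column-per-reduction output could look like. The column layout (iteration, time, then one column per reduction) and the helper names `header_line` and `data_line` are assumptions for illustration, not the format written by the patch.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Header naming one column per reduction. Columns 1 and 2 are assumed
// to hold the iteration and the time.
std::string header_line(const std::vector<std::string> &reductions) {
  std::ostringstream os;
  os << "# 1:iteration 2:time";
  for (std::size_t i = 0; i < reductions.size(); ++i)
    os << ' ' << (i + 3) << ':' << reductions[i];
  os << '\n';
  return os.str();
}

// One output row: all reduction values for a single iteration.
std::string data_line(int iteration, double time,
                      const std::vector<double> &values) {
  std::ostringstream os;
  os << iteration << ' ' << time;
  for (double v : values) os << ' ' << v;
  os << '\n';
  return os.str();
}
```

If the requested reductions change during a run, a second call to `header_line` would emit the new column numbering, as proposed in comment 2.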

  4. Erik Schnetter
    • removed comment

    I like the second patch (but haven't tested it yet). Please apply.

    Furthermore, I think that your spam filter requests should be in the end of your posts, not in the beginning.

  5. Roland Haas reporter
    • changed status to resolved
    • removed comment

    applied.

    You are correct, historical correctness requires the rant to be at the _end_ of any comment :-). However, I complain too much and there is little need to be so very pushy since someone is already working on the project. We also apparently need the spam filter even for the track tickets. A workaround for me has also been suggested. So I am mollified for now (and will limit myself to verbal rants in my office after closing the door :-) ).

    I'll close the ticket since the caching patch would be less effective in speeding up IO, I think. One can get almost the same result by keeping the file streams open and increasing their buffer size via `fstream->rdbuf()->pubsetbuf((bufptr = new char[size]), size); ... delete[] bufptr;`. The remaining logic would be almost identical (with the currently used InternalFlushCache doing fstream->rdbuf()->pubsync()).
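A sketch of the keep-open-with-larger-buffer alternative. The wrapper name `BufferedOutput` is hypothetical. The buffer is installed before the file is opened because the effect of `setbuf` on an already open file is implementation-defined.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical wrapper: a file stream kept open with a caller-sized buffer.
class BufferedOutput {
public:
  BufferedOutput(const std::string &path, std::size_t size) : buffer_(size) {
    // Install the buffer first, then open the file in append mode.
    file_.rdbuf()->pubsetbuf(buffer_.data(),
                             static_cast<std::streamsize>(buffer_.size()));
    file_.open(path, std::ios::out | std::ios::app);
  }

  std::ofstream &stream() { return file_; }

  // Hand all buffered data to the operating system; true on success.
  bool flush() { return file_.rdbuf()->pubsync() == 0; }

private:
  std::vector<char> buffer_;  // declared before file_ so it outlives the stream
  std::ofstream file_;
};
```

Owning the buffer in a `std::vector` avoids the manual `new[]`/`delete[]` pairing, and the member order guarantees the stream is closed before its buffer is released.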

    A patch to IOASCII will be committed eventually as well without further notice.

  6. Erik Schnetter
    • removed comment

    Are you planning to implement something in regard to keeping files open across outputs?

    I don't quite like the idea of only flushing when checkpointing; sometimes one wants to look at output without forcing a checkpoint, or abort a simulation at a time when no checkpoint has been written.

    What exactly are you planning to implement for IOASCII?

  7. Roland Haas reporter
    • removed comment

    I have not started really thinking about anything else. Keeping the files open will require some tracking of failures due to too many open files, which is something the code will not handle gracefully. So I think I would have to add an option maximum_number_of_open_files or so to limit how many files are kept open. Any further files will display the current behaviour. Setting this parameter to 1 would restore the current behaviour and would be the default.
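A sketch of how such a limit might behave, under stated assumptions: the class `OpenFilePool` is hypothetical, and in this sketch the limit counts files kept open, so a limit of 0 (rather than 1) reproduces the open-write-close behaviour for every file.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical pool that keeps at most maximum_number_of_open_files
// streams open; any further file is opened and closed on each write.
class OpenFilePool {
public:
  explicit OpenFilePool(std::size_t maximum_number_of_open_files)
      : max_open_(maximum_number_of_open_files) {}

  void write(const std::string &path, const std::string &text) {
    auto it = open_.find(path);
    if (it == open_.end() && open_.size() < max_open_) {
      std::unique_ptr<std::ofstream> f(
          new std::ofstream(path, std::ios::app));
      it = open_.emplace(path, std::move(f)).first;
    }
    if (it != open_.end()) {
      *it->second << text;  // stays open; the stream buffers the data
    } else {
      std::ofstream f(path, std::ios::app);  // open, write, close
      f << text;
    }
  }

  // Flush every stream that is kept open, e.g. at checkpoint time.
  void flush_all() {
    for (auto &entry : open_) entry.second->flush();
  }

  std::size_t open_count() const { return open_.size(); }

private:
  std::size_t max_open_;
  std::map<std::string, std::unique_ptr<std::ofstream>> open_;
};
```

The sketch does not check whether an open call failed; real code would need the failure tracking mentioned above.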

    Flushing at checkpoint times would be in addition to whatever flushing the stream decides to do on its own. It ensures that one can recover from checkpoints without loss of data. Flushing after each line is possible as well, assuming that most of the IO latency is due to the file open call and not present when doing IO (i.e. the Lustre servers either do not attempt sync writes from different nodes or are smart enough to optimize situations where a file is open on only a single node).

    IOASCII has fewer obvious places where things could be combined. One thing might be to keep files open the way described above.

    Another option would be to first collect all data that needs to be output and hand it over to an auxiliary thread (pthread, not OpenMP). With some blocking mechanism to avoid piling up write requests from different iterations, this would ensure there is no lag between output to IOScalar files and, say, the iteration counter in stdout, as well as avoiding blocking IO (since opening a file is slow and writing might be slow). This would be very easy to implement on top of my first patch.
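A sketch of the auxiliary-thread idea with a bounded queue as the blocking mechanism. It uses `std::thread` instead of raw pthreads, and the class name `AsyncWriter` and the parameter `max_pending` are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical writer: output requests are queued and written by one
// background thread. max_pending must be at least 1.
class AsyncWriter {
public:
  explicit AsyncWriter(std::size_t max_pending)
      : max_pending_(max_pending), worker_(&AsyncWriter::run, this) {}

  // Drain the queue, then stop the worker.
  ~AsyncWriter() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    not_empty_.notify_one();
    worker_.join();
  }

  // Blocks while max_pending requests are already waiting, so requests
  // from many iterations cannot pile up.
  void submit(std::string path, std::string text) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return queue_.size() < max_pending_; });
    queue_.emplace_back(std::move(path), std::move(text));
    not_empty_.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::pair<std::string, std::string> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // done_ is set and nothing is left
        job = std::move(queue_.front());
        queue_.pop_front();
      }
      not_full_.notify_one();
      // The slow part (open and write) happens outside the lock.
      std::ofstream out(job.first, std::ios::app);
      out << job.second;
    }
  }

  std::size_t max_pending_;
  std::mutex mutex_;
  std::condition_variable not_empty_, not_full_;
  std::deque<std::pair<std::string, std::string>> queue_;
  bool done_ = false;
  std::thread worker_;  // declared last so the other members exist first
};
```

A single worker writes requests in submission order, so lines for one file are never reordered.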

    For now I will not do very much, since I will use both the caching and the combining patch in my own repository for a while to see if there are any unexpected side-effects (and if there are noticeable speedups).

  8. Erik Schnetter
    • changed status to resolved
    • removed comment

    Since no immediate action is planned and nothing specific has been proposed I am closing this ticket.
