Read a subset of an AMBER NetCDF trajectory file

Issue #44 resolved
Idé Julien created an issue

Dear bio3d team,

One powerful feature of the NetCDF format is the ability to access a subset of a large variable without loading the full variable into memory. In my opinion this feature is particularly useful when one has to work with very big trajectory files. I think it would be worthwhile and easy to modify the read.ncdf function to include this feature. I have attached my revised read.ncdf function, which extracts only the desired frames and atoms from a NetCDF trajectory file. Lattice parameters can also be extracted on request. I have also attached a revised version of the read.dcd function, which additionally reads lattice parameters from CHARMM dcd files. I hope this will be useful for a future version of the bio3d package.
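
For readers unfamiliar with this feature, the partial read described above can be sketched with the `start` and `count` arguments of `ncvar_get()` from the ncdf4 package. This is only a minimal illustration, not the attached function: the tiny stand-in file, its sizes, and the names `first`, `last` and `atoms` are assumptions made up for the example. The `coordinates` variable follows the AMBER NetCDF layout, which R sees in the order spatial, atom, frame.

```r
library(ncdf4)

# Build a tiny stand-in trajectory so the example is self-contained
# (made-up sizes: 4 atoms, 10 frames). A real AMBER file already has this layout.
n_atom <- 4
n_frame <- 10
d_spatial <- ncdim_def("spatial", "", 1:3, create_dimvar = FALSE)
d_atom    <- ncdim_def("atom", "", 1:n_atom, create_dimvar = FALSE)
d_frame   <- ncdim_def("frame", "", 1:n_frame, create_dimvar = FALSE)
v_xyz <- ncvar_def("coordinates", "angstrom", list(d_spatial, d_atom, d_frame))
trjfile <- tempfile(fileext = ".nc")
nc <- nc_create(trjfile, v_xyz)
ncvar_put(nc, v_xyz, array(seq_len(3 * n_atom * n_frame), c(3, n_atom, n_frame)))
nc_close(nc)

# Request only frames 3..6 of atoms 2..3 (a contiguous block) from the file.
first <- 3
last <- 6
atoms <- 2:3
nc <- nc_open(trjfile)
xyz <- ncvar_get(nc, "coordinates",
                 start = c(1, min(atoms), first),
                 count = c(3, length(atoms), last - first + 1))
nc_close(nc)
dim(xyz)  # 3 coordinates x 2 atoms x 4 frames
```

`start` gives the first index along each dimension and `count` the number of entries to read, so the size of the returned array depends only on the requested block, not on the size of the trajectory.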

Julien

Comments (4)

  1. Barry Grant

    Thanks Julien, we appreciate and welcome this contribution!

    I will note that we have been working on major updates to the package recently (all available here on bitbucket) - one of which was to modify read.ncdf() to do essentially what you propose. It now has 'first', 'last', 'stride' and 'cell' options (see below). Could you possibly try this new version out to see if it meets your requirements?

    > packageVersion("bio3d")
    [1] 2.0
    > args(read.ncdf)
    function (trjfile, headonly = FALSE, verbose = TRUE, time = FALSE,
        first = NULL, last = NULL, stride = 1, cell = FALSE)
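
    As a rough usage sketch of the options listed above (not taken from the package documentation): the trajectory below is random made-up data written with bio3d's write.ncdf(), and the example assumes that, with the default time = FALSE, 'first' and 'last' are frame numbers. Please check ?read.ncdf in your installed version before relying on this.

    ```r
    library(bio3d)

    # Made-up trajectory: 100 frames of 5 atoms (15 xyz columns), one frame per row.
    set.seed(1)
    xyz <- matrix(rnorm(100 * 15), nrow = 100, ncol = 15)
    trjfile <- tempfile(fileext = ".nc")
    write.ncdf(xyz, trjfile = trjfile)

    # Ask for every 10th frame between frames 1 and 50
    # (assumption: frame numbers, since time = FALSE).
    sub <- read.ncdf(trjfile, first = 1, last = 50, stride = 10)
    dim(sub)  # rows are the frames kept, columns the 15 xyz coordinates
    ```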
    

    We will certainly add the 'cell' options to read.ncdf() as you have done. Instructions for obtaining the development version can be found here.

    All best,

    Barry

  2. Idé Julien reporter

    Thank you Barry for your fast reply.

    The new version of the read.ncdf() function is very nice. Of course, the most important thing is being able to extract the frames we need. My function also allows extracting a subset of the structure, but that is probably less interesting.

    I guess when you said "We will certainly add the 'cell' options to read.ncdf() as you have done" you were talking about the read.dcd() function, right?

    Do you think it would be worthwhile to add some functions for converting trajectory files to your package? I usually work with NetCDF files, so I always convert my dcd files into NetCDF files using a Fortran program I have written. In case you are interested, I also have a dcd2ncdf function to do this (it is not as fast as my Fortran code, but it works).

    I have one last question on a different topic. Have you ever considered using the rgl library to provide some tools for visualizing molecular structures?

    Sorry for bothering you with all my suggestions. It is just that I have been using R for about five years now, and the other theoretical chemists I know are usually not interested in this language. I am very excited to see other people using R to do this kind of MD analysis.

    Cheers,

    Julien

  3. Barry Grant

    It would be just great to get you involved here, Julien, and I would be happy to have you play a role in shaping the package into something more useful for you and your co-workers.

    Sorry I missed the substructure reading part of your read.ncdf(). Would you like to add that to the development version of the function here on bitbucket? I think the most useful thing would be if it could take an atom selection object as returned from atom.select() rather than just a start and end index. What do you think?
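
    One way such an interface might work, sketched with the ncdf4 package only: since a NetCDF read takes a contiguous block, read the atom range that spans the selection and then keep just the selected atoms. The function name read_selected_atoms() and the integer vector atom_idx (standing in for the $atom indices of an atom.select() result) are hypothetical and not part of bio3d, and the small file is made-up test data.

    ```r
    library(ncdf4)

    # Hypothetical helper: read frames first..last for an arbitrary set of atoms.
    read_selected_atoms <- function(ncfile, atom_idx, first, last) {
      nc <- nc_open(ncfile)
      on.exit(nc_close(nc))
      lo <- min(atom_idx)
      # Read the contiguous atom range that spans the whole selection.
      block <- ncvar_get(nc, "coordinates",
                         start = c(1, lo, first),
                         count = c(3, max(atom_idx) - lo + 1, last - first + 1),
                         collapse_degen = FALSE)
      # Keep only the requested atoms from the spanning block.
      block[, atom_idx - lo + 1, , drop = FALSE]
    }

    # Tiny made-up file (6 atoms, 5 frames) to exercise the helper.
    d_spatial <- ncdim_def("spatial", "", 1:3, create_dimvar = FALSE)
    d_atom    <- ncdim_def("atom", "", 1:6, create_dimvar = FALSE)
    d_frame   <- ncdim_def("frame", "", 1:5, create_dimvar = FALSE)
    v_xyz <- ncvar_def("coordinates", "angstrom", list(d_spatial, d_atom, d_frame))
    ncfile <- tempfile(fileext = ".nc")
    nc <- nc_create(ncfile, v_xyz)
    ncvar_put(nc, v_xyz, array(seq_len(3 * 6 * 5), c(3, 6, 5)))
    nc_close(nc)

    sel <- read_selected_atoms(ncfile, atom_idx = c(2, 5, 6), first = 2, last = 4)
    dim(sel)  # 3 coordinates x 3 atoms x 3 frames
    ```

    For a sparse selection spread over a very large system, the spanning block can be much bigger than the selection itself, so reading several smaller blocks may be preferable in that case.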

    The dcd2ncdf() function would also be a good addition. We have tried to document the procedure for adding new functions to the development version under the bitbucket "wiki" tab. For example, see "How to add a new function".

    Thanks again! Barry
