Don't rebuild external libraries so often

Issue #136 open
Erik Schnetter created an issue

One way to keep the external libraries that have been built would be the following. Create a dummy configuration "ext" where all external libraries are built. When another configuration is built, it should be easy to specify to look there for the external libraries, or maybe this should even be the default.

Comments (27)

  1. Frank Löffler

    That doesn't work in a general setting. Different configurations might have different configuration options. For example, I have two default configurations on the numrel machine: one with the Intel compiler and one with the GNU compiler. Using libraries from one within the other does not work.

  2. Erik Schnetter reporter

    If you have two incompatible configurations, then you can use two dummy configurations, one for each.

  3. Frank Löffler

    Then I would have to specify the dummy configuration as well, and remember to update it if I update the configuration it belongs to. What exactly do you mean by 'too often'? I can see two possibilities:

    a) You might have several configurations that differ only in their thornlist. In this case external libraries built for one could be used within the other, and rebuilding would probably not be necessary. It wastes compile time and space. Of course, in this case you can always build the library yourself and let Cactus use it instead of the Cactus-built version.

    b) You might have the problem that the libraries within one configuration are rebuilt too often. If that is the case, we should be able to fix this without creating a dummy configuration.

    Should we take this discussion to the Cactus developers mailing list?

  4. Erik Schnetter reporter

    Yes, building the external libraries too often wastes time and space; this is exactly why I opened this issue. The only current remedy is to build the library yourself, outside of Cactus. But this is what I want to avoid: building an external library yourself is often complicated, and Cactus already knows how to do it, so Cactus should do this for you.

    The Cactus-built external libraries are rebuilt after a "make clean" or "make realclean" (or when the library itself is updated, of course). Most people use "make clean" if something inexplicable goes wrong with their build, and they want a fresh start. In this case, rebuilding external libraries is probably a good idea, since it's not clear what actually is going wrong.

  5. Frank Löffler

    We could let Cactus search other existing configurations for identical config-info files (apart from the timestamps...) and copy the relevant files (using hard links if possible), instead of rebuilding them. Of course that would mean that we need a way to let Cactus know a) which thorns build a library and which do not, and b) which files belong to the result of building that library.
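
The comparison and the hard-link copy described above could look roughly like the following sketch. The names `same_options`, `reuse_file` and `IGNORE_PATTERN` are made up for illustration, and the pattern used to recognise timestamp lines is an assumption about the config-info format, not something taken from Cactus itself.

```shell
# Sketch only: IGNORE_PATTERN and both helper names are hypothetical.
# Lines matching IGNORE_PATTERN are assumed to carry timestamps and are ignored.
IGNORE_PATTERN='^# CONFIG-DATE'

# Succeeds if two config-info files agree once the ignored lines are dropped.
same_options() {
    left=$(sed -e "/$IGNORE_PATTERN/d" "$1") || return 2
    right=$(sed -e "/$IGNORE_PATTERN/d" "$2") || return 2
    [ "$left" = "$right" ]
}

# Reuse a built file: hard-link it if possible, otherwise fall back to a copy.
reuse_file() {
    ln "$1" "$2" 2>/dev/null || cp -p "$1" "$2"
}
```

This still leaves open the two questions raised above: which thorns build a library, and which files make up the result.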

  6. Ian Hinder

    I think that is too complicated, and would get in the way of debugging. Is it true that using libraries from one compiler will not work with another? If I install HDF5 from source I'm pretty sure I have been able to use it with the Intel compiler in the past, even though it will have been built with GCC.

    The problem which this ticket attempts to solve is that one does not want to build external libraries outside Cactus because it is sometimes complicated and Cactus already knows how to do it. Currently, the libraries are built within Cactus and stored in each configuration and are rebuilt whenever the configuration is rebuilt.

    How about implementing a command which uses Cactus to build the libraries and install them outside the Cactus tree? For example,

    make HDF5-buildlib prefix=$HOME/software

    This would look for the thorn HDF5 and run its configuration script, telling it to build and install in a particular location. This is independent of any specific configuration.

    If there are corner-cases where the build needs to be customised to a particular configuration, we would need there to be an association with a configuration for the build, and maybe the prefix would be modified to include the config name.

  7. Roland Haas repo owner

    Replying to [comment:6 hinder]:

    I think that is too complicated, and would get in the way of debugging. Is it true that using libraries from one compiler will not work with another? If I install HDF5 from source I'm pretty sure I have been able to use it with the Intel compiler in the past, even though it will have been built with GCC.

    I found that Fortran modules (the .mod files in /include directories, to be specific) are apparently compiler (and version) specific. If I compile HDF5 (+Fortran interface) using gcc 4.1.2 then I cannot use it with gcc 4.5 (the error message is something like "error parsing module description"). I assume that different compilers could also use different methods to mangle e.g. Fortran routine names. I never had problems with C/C++ routines.

  8. Erik Schnetter reporter

    When Cactus builds external libraries, it makes extensive use of the options that the user specified. In many cases, building is not possible without such options, e.g. when it comes to choosing a CPU architecture, switching between 32-bit and 64-bit mode, or finding good C and Fortran compilers. Another problem is finding other libraries on which a certain library depends (e.g. PETSc depends on LAPACK). On some systems, the standard make/ar/ranlib/tar tools do not quite work out of the box.

    In general, when one uses only C, and when a library does not depend on other libraries, then it can be built without problems. Often, this is also the case for C++, but certainly not for Fortran.

    Cactus "knows" how to build and install these libraries because it knows how to configure them.

    What would be possible is to take a certain option list and to build a set of external libraries with these. These can then be installed somewhere, but it does not matter whether this is inside a Cactus source tree, or outside, or whether we call this a "mini-configuration", or whether we look for such mini-configurations in other Cactus source trees, or in the home directory of another user.

  9. Bruno Mundim

    Hi,

    the way I used to deal with this problem was simply to build the libraries and move them elsewhere, outside the configuration directory or the Cactus tree. That had worked fine for me so far, and I think it would work for most people as well, since they can always name the external directory to match the configuration and keep the config-info there as a reminder of the compiler options used. However, recently, after one of those "sim-reconfig" runs that doesn't actually reconfigure anything, I erased the configuration that I had originally used to build the libraries. I then started to have problems with the GSL library, which had already been installed. Investigating this, I found that the options set in GSL.sh come from ${GSL_DIR}/bin/gsl-config, which has the original installation directory hard-coded (the default under config/scratch/GSL). So moving things around ended up not being a good idea.
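
A quick way to spot this kind of stale, hard-coded prefix is to ask gsl-config what it believes its prefix is and compare that with the directory it actually lives in. The helper name `gsl_prefix_matches` below is made up for this sketch; `--prefix` is the standard gsl-config option.

```shell
# Sketch only: gsl_prefix_matches is a hypothetical helper.
# Returns 0 if the gsl-config under $1 still reports $1 as its prefix,
# 1 if it reports some other directory, and 2 if gsl-config cannot be run.
gsl_prefix_matches() {
    dir=$1
    [ -x "$dir/bin/gsl-config" ] || return 2
    reported=$("$dir/bin/gsl-config" --prefix) || return 2
    [ "$reported" = "$dir" ]
}
```

For a GSL tree that was moved after installation, the reported prefix still points at the old location, so the check fails.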

    The best way, in my opinion, to deal with this issue at the moment would be to let the user define the installation directory in their .cactus/config option file. Whenever we build the library, the script also looks for a GSL_INSTALL_DIR variable; if it finds one, it installs the library in that user-specified directory, otherwise it installs in the default location at config/scratch/GSL. We would set something like this in the .cactus/config file:

    GSL_DIR = BUILD
    GSL_INSTALL_DIR = /home/me/local/gsl-1.14

    and after building and installation we would change to only

    GSL_DIR = /home/me/local/gsl-1.14

    in order to use it for most other configurations.

    I have attached a patch with a small change to the GSL.sh script. I encourage you to have a look and express your opinion. My proposal doesn't change the default behaviour. It only adds an option for the user to specify the installation directory. The only drawback I can think of at the moment is that you may forget which configuration was used to build the libraries. However, you could always name the directory accordingly, and we could improve the script to make it dump a similar config-info file there as a reminder, if really necessary.

    Anyway, I find it annoying to have to rebuild libraries frequently. I hope my proposed change to GSL.sh can be ported to the other library scripts as well, and that it can be a real step forward in avoiding all these unnecessary and time-consuming recompilations.
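
The fallback described above might be sketched as follows. This is not the attached patch: `choose_install_dir` and its argument layout are invented for illustration, and only the idea (use the install directory if it is set, otherwise the scratch default) is taken from the description.

```shell
# Sketch only: choose_install_dir is a hypothetical helper, not the patch.
# $1: library name, e.g. GSL
# $2: value of <NAME>_INSTALL_DIR (may be empty)
# $3: scratch directory of the configuration
choose_install_dir() {
    name=$1
    override=$2
    scratch=$3
    if [ -n "$override" ]; then
        # The user asked for a specific location.
        printf '%s\n' "$override"
    else
        # Default behaviour: install inside the configuration.
        printf '%s\n' "$scratch/$name"
    fi
}
```

With an empty second argument the result is the scratch default, so the default behaviour is unchanged.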

  10. Erik Schnetter reporter

    The patch is missing the declaration of GSL_INSTALL_DIR in the configuration.ccl file. Or do you envision this to be a global environment variable instead of a Cactus configuration variable?

    It's a somewhat dangerous option; e.g. we couldn't use it in our standard simfactory option lists, because there is no mechanism to clean these external locations. It's also dangerous because it doesn't take compiler compatibility into account.

    However, these are not arguments against such an option. What do others think?

  11. Bruno Mundim

    Hi Erik:

    The patch is missing the declaration of GSL_INSTALL_DIR in the configuration.ccl file. Or do you envision this to be a global environment variable instead of a Cactus configuration variable?

    I thought of it more as an environment variable, but I am fine with declaring it in configuration.ccl. What is the advantage of doing so, though? It seems to me that it serves only to document the environment or configuration variables used. Is there anything else? What is the idea or road map behind these options in configuration.ccl?

    It's a somewhat dangerous option; e.g. we couldn't use it in our standard simfactory option lists, because there is no mechanism to clean these external locations.

    I am not sure I understand why it is dangerous. Once you know the directory the external libraries were installed in, you know where to clean them. It is all in your GSL_INSTALL_DIR variables (or equivalent).

    It's also dangerous because it doesn't take compiler compatibility into account.

    Yes, you are right about the compiler compatibility, and it is the only problem I see at the moment. However, the user or simfactory option list can still use the default installation directories under config/scratch. Users who see the compatibility issue as a big problem would still configure the same way as before. Those who don't face this issue on a daily basis, or are happy to work around it (for example by having different external locations for different compilers), would now have at least a better alternative (instead of simply moving the library away from its original installation directory).

    Again, I am not advocating changing the default; I am only proposing to add an option so that we can install libraries in directories of our choosing.

  12. Erik Schnetter reporter

    The advantage of a configuration option over an environment variable is that one can have two different install dirs on the same machine, e.g. on Kraken if one wants to experiment with using the Intel vs. the PGI compiler. The respective libraries would be incompatible.

    The idea behind declaring these options in a .ccl file is that, at one point, we may/should/could clean up the configuration mechanism to ensure that only intended variables are passed to the configuration scripts. For example, I found a system that had the R package installed, and had an environment variable RPATH set; RPATH is also a configuration option for (HDF?), which then breaks many things.
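
To illustrate the kind of cleanup meant here, the following sketch passes on only those variables that appear in a list of declared names, so that a stray RPATH in the environment would be dropped unless it was declared. The helper `filter_declared` and the way the list is supplied are assumptions made for this example; this is not how Cactus processes configuration.ccl today.

```shell
# Sketch only: filter_declared is a hypothetical helper.
# $1: space-separated list of declared variable names.
# Prints NAME=value for each declared variable that is currently set.
filter_declared() (
    for name in $1; do
        # Skip anything that is not a plain identifier before using eval.
        case $name in
            ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        eval "is_set=\${$name+yes}"
        if [ "$is_set" = yes ]; then
            eval "value=\$$name"
            printf '%s=%s\n' "$name" "$value"
        fi
    done
)
```

Variables that are set but not declared never reach the output, which is the behaviour the cleanup is meant to guarantee.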

    Again: what do others think?

  13. Frank Löffler

    The patch only handles the problem for one library. Most users would probably like to have this mechanism for all libraries at the same time, without having to specify a lot of installation directories.

    What about the following: within configs/ (or a new subdirectory of the main tree) we create (if requested) directories containing all built libraries of a given configuration. We also save there the configuration file (including options given on the command line) which was used for them. Comparing these saved configuration files to the one of a configuration currently being built shouldn't be hard or take long. Thus, when building a new configuration, we could (if requested) look through these directories for a matching configuration file and, if one is found, let the library scripts know about that directory as the installation directory; they can then decide either to rebuild into it or just to reuse it. One issue with that is to decide what should happen on a make -clean. Right now this also rebuilds the libraries, which is fine, as they are local to one configuration and do not affect others. We should probably still clean the libraries, accepting that this will potentially affect other configurations as well. That shouldn't be too much of a problem, as the same configuration options /should/ result in the same library being built, but that might not always be the case, for example after an update of the library itself.

    On a higher level this could even be done with thorns in theory. I am not suggesting this right now, just would like to note it. I say in theory, because in practice this will not work as there might be dependencies of thorns on each other, changing something if some other thorn is present in a configuration or not, e.g., through the include-file mechanism. We would have to look out for similar dependencies of libraries on each other though.

    All a user would have to do is to enable this mechanism with one directive. Once we tested this for some time I could even think about making this the default, but of course we shouldn't do that too soon.

  14. Bruno Mundim

    The patch only handles the problem for one library. Most users would probably like to have this mechanism for all libraries at the same time, without having to specify a lot of installation directories.

    The patch is easy and simple enough to be ported to all other libraries, and I volunteer to do so if needed. I disagree that it is a lot of installation directories. Right now the default in a .cactus/config file is to have this line to force a library build:

    GSL_DIR = BUILD

    I propose to allow an extra option (that would add one *optional* line to each library line in the configuration file):

    GSL_DIR = BUILD
    GSL_INSTALL_DIR = /home/me/local/intel11/gsl-1.14.etc

    I mean, there is not even a need to have it there. If the library is not found on the system, it is built and installed in the config/scratch directories. My suggestion doesn't change this behaviour; it only adds an option that frees the user to install the libraries wherever he or she wants. The external libraries have nothing to do with Cactus, and it doesn't make sense to me to bury them in the Cactus tree. The only advantage of using Cactus to build the external libraries is that Cactus knows how to do so, as Erik once said. The user may want to use these libraries with other pieces of software, or he/she may just not be willing to build libraries whenever a new configuration is created or an old one cleaned. There is no urgent need to do so and, besides, we may inadvertently erase configs or whole Cactus trees without remembering that the libraries were buried there. So I think we should at least provide a way out of this, and the simplest way I have thought of so far is to let the user specify the library installation directory.

    What about the following: within configs/ (or a new subdirectory of the main tree) we create (if requested) directories containing all built libraries of a given configuration. We also save there the configuration file (including options given on the command line) which was used for them. Comparing these saved configuration files to the one of a configuration currently being built shouldn't be hard or take long. Thus, when building a new configuration, we could (if requested) look through these directories for a matching configuration file and, if one is found, let the library scripts know about that directory as the installation directory; they can then decide either to rebuild into it or just to reuse it. One issue with that is to decide what should happen on a make -clean. Right now this also rebuilds the libraries, which is fine, as they are local to one configuration and do not affect others. We should probably still clean the libraries, accepting that this will potentially affect other configurations as well. That shouldn't be too much of a problem, as the same configuration options /should/ result in the same library being built, but that might not always be the case, for example after an update of the library itself.

    Your suggestion may improve things a bit regarding how often the libraries are built, but it still doesn't give the user the freedom to choose where to install the external libraries; it only changes the current default installation directories. These libraries would still be inside the Cactus/configs tree.

    Concerning make -clean we could add an option to make libraries-clean or similar.

  15. anonymous

    I think we might be converging on a solution, but we also have two problems at hand here. Let's try to disentangle them.

    1. We might want to use a library location outside of Cactus, but still we would like Cactus to build that library.

    A user would have to specify that location. It might be useful to do that for each library separately, but it might also be of interest to have one switch to build all libraries in some specified place.

    2. Regardless of the location being inside or outside of Cactus it should be possible to avoid too many rebuilds (the original intent of the ticket) in an automatic fashion.

    Here we could, as described above, let the user specify that location, or choose some default within the Cactus tree for that, possibly containing the configuration name, but for sure containing the configuration options in a file. Then Cactus can compare configuration data of different configurations within subdirectories there, and figure out by itself if a library needs to be rebuilt, or if an existing version can be used.

    Solving problem 1 needs user interaction to avoid rebuilds, solving problem 2 would make it work automatically, but is also more work to implement.
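
For problem 2, the lookup could be as simple as the following sketch: each directory of shared libraries keeps a copy of the options it was built with, and a new configuration searches for a directory whose saved options are identical to its own. The function name `find_matching_libs` and the file name `options` are placeholders chosen for this example.

```shell
# Sketch only: find_matching_libs and the file name "options" are hypothetical.
# $1: base directory holding one subdirectory per set of built libraries
# $2: option file of the configuration currently being built
# Prints the first subdirectory whose saved options are identical; fails if none is.
find_matching_libs() {
    base=$1
    options=$2
    for dir in "$base"/*/; do
        if [ -f "${dir}options" ] && cmp -s "${dir}options" "$options"; then
            printf '%s\n' "${dir%/}"
            return 0
        fi
    done
    return 1
}
```

If nothing matches, the caller would build the libraries into a new subdirectory and save its options there.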

    I suggest we go ahead with the current patch, and keep this as overwrite even if we implement something for problem 2.

  16. Bruno Mundim

    Ok, I will prepare similar patches for all the libraries in the einsteintoolkit.th and apply them shortly after I test them all. I will follow Erik's suggestion to declare the *_INSTALL_DIR as a configuration option in configuration.ccl.

    We can continue the discussion on how to devise the mechanism suggested in 2.

  17. Erik Schnetter reporter

    Bruno, have you tried whether your patch actually reduces the number of times external libraries are built? The mechanism that detects whether external libraries need rebuilding is unchanged, and will still rebuild them for every new configuration, and after every make clean.

  18. Bruno Mundim

    Yes, it does. I haven't built libraries anymore since then. Note however that my patch doesn't change the default. Maybe SimFactory could add an option for the path where the user wants to install his/her external libraries. That would make it easier to modify the option lists of the machines. Right now I change the configuration file by hand, but I guess we don't want that to happen for a beginner user, right? What do you think about this new option for SimFactory or the mdb entries?

  19. Erik Schnetter reporter

    Which thorn are you using to build e.g. GSL, where your patch helps? Is it in the ExternalLibraries arrangement, or is it in CactusExternal?

  20. Erik Schnetter reporter

    ExternalLibraries/GSL rebuilds a library whenever (a) the GSL thorn changes, or (b) the file configs/*/scratch/done/GSL is missing, e.g. after a make *-clean. The location of GSL_INSTALL_DIR does not factor into this decision at all! In particular:

    - If two Cactus configurations use the same install locations, then the library there will be built twice -- the first will silently be overwritten
    - If you clean a configuration or remove it, and then rebuild it, the GSL library will be built again, even if it already exists there
    - If you install GSL into the Cactus source tree, the same thing happens.

    You can easily see this in the build script. The test whether to build is done before GSL_INSTALL_DIR is checked.
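
The decision described in this comment can be summarised by the following sketch. It is a paraphrase of the two conditions listed above, not the actual GSL.sh; `needs_build` and its arguments are invented for illustration. Note that the install directory is not among the inputs, which is exactly the point being made.

```shell
# Sketch only: needs_build is a hypothetical helper, not the real build script.
# $1: the "done" marker, e.g. configs/<name>/scratch/done/GSL
# $2: a file of the thorn whose change should trigger a rebuild
# Returns 0 if the library would be (re)built, 1 otherwise.
needs_build() {
    done_file=$1
    thorn_file=$2
    if [ ! -e "$done_file" ]; then
        return 0    # marker missing, e.g. after a clean
    fi
    if [ "$thorn_file" -nt "$done_file" ]; then
        return 0    # thorn changed after the last build
    fi
    return 1
}
```

Because the marker lives inside the configuration, a new or cleaned configuration always triggers a build, wherever the library ends up being installed.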

  21. Bruno Mundim

    What my patch does is to provide a way out of the default behaviour, but it does *not* change the default behaviour. What I usually do when building the libraries is the following:

    1) Set in my .cactus/config the path where I want the library built, together with the BUILD statement to force it to be built. For example, I used the following configuration for GSL to be built on my laptop with gcc:

    GSL_DIR = BUILD
    GSL_INSTALL_DIR = /home/bruno/local/gcc4.4.1/gsl-1.14

    2) Once I have built them, I change my .cactus/config file to reflect their location and comment out the two lines above:

    # GSL_DIR = BUILD
    # GSL_INSTALL_DIR = /home/bruno/local/gcc4.4.1/gsl-1.14
    GSL_DIR = /home/bruno/local/gcc4.4.1/gsl-1.14

    3) For all other configurations that are built with gcc, I use the latest version of my .cactus/config file.

    This has worked well for me so far. I didn't think hard about corner cases, and I didn't think of automating this process. I hope this makes clearer what I do. Now your points:

    If two Cactus configurations use the same install locations, then the library there will be built twice -- the first will silently be overwritten

    I didn't run into this problem, since once I have built the libraries I set their location in the .cactus/config file as I described above. However, we could work on making this more robust.

    If you clean a configuration or remove it, and then rebuild it, the GSL library will be built again, even if it already exists there

    It won't be built again, because by this time I have already set GSL_DIR to the library location. Again, we may want to make this more robust, maybe by assigning GSL_DIR = GSL_INSTALL_DIR *after* the library is installed.

    If you install GSL into the Cactus source tree, the same thing happens.

    This is true. Remember that I didn't change the default. I just provided a way out of it.

  22. Erik Schnetter reporter

    I see. Your proposal is then to use Cactus to install (on every system you use) one version of all the libraries you need, instead of installing them yourself, or looking for pre-installed libraries. Later you just use these libraries.

    Please apply this patch.

  23. Bruno Mundim

    No, it isn't. I have committed it on r19. The same applies to the other ET external libraries. I haven't done so to the external libraries exclusive to Cactus.

    We left this ticket open because my solution is just a workaround. People want something more sophisticated. Meanwhile this patch allows you to just use Cactus to install libraries where you want.

  24. Frank Löffler
    • changed status to open

    Ok, since this is committed, I am reopening the ticket as a reminder for the more general solution.
