perftools and cmake

Issue #133 new
Frank Dellaert created an issue

@richardroberts added (experimental) support for tcmalloc in 68a85afe, but

  • It is not apparent to the user that this is an option
  • At least for me, cmake -G Xcode .. does not generate correct linking.

Because malloc is an absolute performance hog, solving this issue could be a great boon. I'm eager to try it, but I think a cmake expert should look at it. I'll assign @amelim for now, but if there is someone better, let me know. Maybe we should find out who is most proficient in cmake and is going to be here for a while. @jdong37 ? @lvzhaoyang ?

Comments (16)

  1. Frank Dellaert reporter

    BTW, given that it's BSD-licensed, if it really makes a big difference over malloc (single-threaded) or TBB's malloc (multi-threaded), we might even want to just include it in the distribution.

  2. Frank Dellaert reporter

    @amelim I vote we check this and add it to 3.2 if there is a performance boost. Malloc is a beast.

  3. Jing Dong

    The user option is added in 37e2f28; the default is OFF. All unit tests passed after switching to tcmalloc.

    1. Is there an existing profiling script I can use to test the speed?
    2. I have no linking issue on my side (Ubuntu). Maybe @lvzhaoyang could help me test on Mac?
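
    For readers following along, an opt-in option of this kind might look roughly like the sketch below. This is not the code from 37e2f28: the option name MYPROJECT_USE_TCMALLOC and the target name mylib are hypothetical placeholders, and the only assumption about tcmalloc is that it is installed as a library named tcmalloc or tcmalloc_minimal.

    ```cmake
    # Hypothetical option name. OFF by default, so existing builds are unchanged.
    # Declaring it with option() makes it visible in ccmake / cmake-gui.
    option(MYPROJECT_USE_TCMALLOC "Link against tcmalloc (gperftools) if found" OFF)

    if(MYPROJECT_USE_TCMALLOC)
      # Try the full library first, then the minimal variant.
      find_library(TCMALLOC_LIBRARY NAMES tcmalloc tcmalloc_minimal)
      if(TCMALLOC_LIBRARY)
        # 'mylib' is a placeholder for the project's own library target.
        target_link_libraries(mylib PUBLIC ${TCMALLOC_LIBRARY})
      else()
        # Stay quiet on success; only report when the request cannot be met.
        message(WARNING "tcmalloc requested but not found; using the default allocator")
      endif()
    endif()
    ```

    With something like this in place, the switch would be enabled at configure time with cmake -DMYPROJECT_USE_TCMALLOC=ON .. and left alone otherwise.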
  4. Jing Dong

    Comments backup:

    Frank Dellaert: Thanks!!! But I think maybe this should be examined in a branch and done via a pull request, as I have a number of comments.

      • cmake messages: I don't think that is the current style we adopt; we are rather silent.
      • Are TBB and tcmalloc mutually exclusive?
      • Do we really want to make it the default? Did you compare the timing?
      • Is the timing equally good on all platforms?

    Esp. the latter should be examined by giving several people on different platforms instructions on how to time with and without, and then come to a conclusion on how to set the default on which platforms. You can implement that and then we can (all) discuss in the pull request, by having the people you asked for timings also as reviewers.

  5. Chris Beall

    Also keep in mind that in base/DerivedValue.h we always use a boost pool, and it's not clear to me how that interacts with tcmalloc. I don't know why this one is always boost pool, instead of getting configured through the CMake allocator choice. Best choice in FastList, FastVector, etc. also needs to be tested. (see base/FastDefaultAllocator.h) @richardroberts might have a few things to add here, since he tested these things.
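
    One way a single CMake allocator choice could be expressed is a string cache variable with a fixed set of values, which also makes the allocators mutually exclusive by construction. The sketch below is an assumption for illustration only: the variable name MYPROJECT_ALLOCATOR and the generated MYPROJECT_ALLOCATOR_* definitions are hypothetical, and the list of values is simply taken from the allocators mentioned in this thread. It does not settle whether DerivedValue should follow the same switch.

    ```cmake
    # Hypothetical cache variable: exactly one allocator is selected at a time.
    set(_allocator_choices STL BoostPool TBB tcmalloc)
    set(MYPROJECT_ALLOCATOR "STL" CACHE STRING
        "Default allocator: STL, BoostPool, TBB, or tcmalloc")
    # Offer the choices as a drop-down in cmake-gui / ccmake.
    set_property(CACHE MYPROJECT_ALLOCATOR PROPERTY STRINGS ${_allocator_choices})

    # Reject values outside the list instead of silently falling back.
    list(FIND _allocator_choices "${MYPROJECT_ALLOCATOR}" _allocator_index)
    if(_allocator_index EQUAL -1)
      message(FATAL_ERROR "Unknown allocator '${MYPROJECT_ALLOCATOR}'")
    endif()

    # Pass the choice to the compiler, e.g. -DMYPROJECT_ALLOCATOR_TCMALLOC,
    # so a header can select the allocator with #if defined(...).
    string(TOUPPER "${MYPROJECT_ALLOCATOR}" _allocator_upper)
    add_definitions(-DMYPROJECT_ALLOCATOR_${_allocator_upper})
    ```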

  6. Jing Dong

    Mac still has a linking issue with gperf; Linux works fine. I am still looking for the correct way to link against gperf on Mac. Once that is done, it should be ready to release to @lvzhaoyang and @cbeall3 for profiling.

  7. Jing Dong

    Hi @cbeall3 @richardroberts, I have tested replacing the boost::singleton_pool malloc in DerivedValue with tcmalloc, and there is no performance improvement (38 s vs. 39 s, even 1 s slower). It looks like we may not want to replace the boost pool allocator in DerivedValue. @richardroberts, do you have any comments on this? Thanks!
