perftools and cmake
@richardroberts added (experimental) support for tcmalloc in 68a85afe, but
- It is not apparent to the user that this is an option
- At least for me, cmake -G Xcode .. does not generate correct linking.
Because malloc is an absolute performance hog, solving this issue could be a great boon. I'm eager to try it, but I think a cmake expert should look at it. I'll assign @amelim for now, but if there is someone better, let me know. Maybe we should find out who is most proficient in cmake and is going to be here for a while. @jdong37 ? @lvzhaoyang ?
Comments (16)
-
reporter BTW, given that it's BSD-licensed, if it really makes a big difference compared with malloc (single-threaded) or TBB's malloc (multi-threaded), we might even want to include it in the distribution.
-
reporter @jdong37 ? @lvzhaoyang ?
-
reporter @amelim I vote that we check this and add it to 3.2 if there is a performance boost. Malloc is a beast.
-
I will have a look, after fixing issue #25 I think
-
The user option was added in 37e2f28; the default is OFF, and all unit tests passed after switching to tcmalloc.
- Is there an existing profiling script I can use to test the speed?
- I have no linking issue on my side (Ubuntu); maybe @lvzhaoyang could help me test on Mac?
-
reporter Please see my comment on 37e2f28. Potentially impactful changes like this should not be made directly on develop.
-
Ah yes, I will purge the changes and start a new branch.
-
Comments backup:
Frank Dellaert: Thanks!!! But I think maybe this should be examined in a branch and done via a pull request, as I have a number of comments.
- cmake messages: I don't think that is the current style we adopt, we are rather silent
- Are TBB and tcmalloc mutually exclusive?
- Do we really want to make it the default? Did you compare the timing?
- Is the timing equally good on all platforms?
Esp. the latter should be examined by giving several people on different platforms instructions on how to time with and without, and then come to a conclusion on how to set the default on which platforms. You can implement that and then we can (all) discuss in the pull request, by having the people you asked for timings also as reviewers.
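For timing "with and without" on several platforms, something along the lines of the sketch below could be handed to reviewers. It is only an illustration, not an existing GTSAM script: the function name timeSmallAllocations, the AllocTiming struct, and the workload (many small fixed-size heap blocks) are assumptions chosen to stress the allocator, and a real comparison should use an actual GTSAM workload.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Result of one timing run (hypothetical helper type, not part of GTSAM).
struct AllocTiming {
  double seconds;        // elapsed wall-clock time
  std::size_t checksum;  // derived from the allocated data so the work is observable
};

// Hypothetical workload: repeatedly allocate and free many small heap blocks,
// which is where the choice of malloc implementation tends to matter most.
AllocTiming timeSmallAllocations(std::size_t rounds, std::size_t objectsPerRound) {
  assert(objectsPerRound > 0);
  using Clock = std::chrono::steady_clock;
  std::size_t checksum = 0;
  const auto start = Clock::now();
  for (std::size_t r = 0; r < rounds; ++r) {
    std::vector<std::unique_ptr<double[]>> blocks;
    blocks.reserve(objectsPerRound);
    for (std::size_t i = 0; i < objectsPerRound; ++i) {
      blocks.emplace_back(new double[6]);  // small, fixed-size block
      blocks.back()[0] = static_cast<double>(i);
    }
    // Read back the last block so the allocations cannot be treated as dead code.
    checksum += static_cast<std::size_t>(blocks.back()[0]);
  }  // all blocks of this round are freed here
  const std::chrono::duration<double> elapsed = Clock::now() - start;
  return AllocTiming{elapsed.count(), checksum};
}
```

One way to use it, assuming gperftools is installed, is to build the same program twice, once linked normally and once linked against tcmalloc, call timeSmallAllocations(100, 100000) from main in both, and compare the reported seconds on each platform.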
-
Also keep in mind that in base/DerivedValue.h we always use a boost pool, and it's not clear to me how that interacts with tcmalloc. I don't know why this one is always a boost pool, instead of being configured through the CMake allocator choice. The best choice for FastList, FastVector, etc. also needs to be tested (see base/FastDefaultAllocator.h). @richardroberts might have a few things to add here, since he tested these things.
-
reporter So @jdong37, any progress? @richardroberts, any comment on @cbeall3 's question?
-
Mac still has a linking issue with gperf; Linux works fine. I am still looking for the correct way to link against gperf on Mac. Once that is done, it should be ready to release to @lvzhaoyang and @cbeall3 for profiling.
-
Hi @cbeall3 @richardroberts, I have tested replacing the boost::singleton_pool malloc in DerivedValue with tcmalloc, and there is no performance improvement (38 s vs. 39 s, even 1 s slower). It looks like we may not want to replace the boost pool allocator in DerivedValue. @richardroberts do you have any comment on it? Thanks!
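A result like this is plausible because a pool that hands out fixed-size blocks does very little work per allocation, so a faster general-purpose malloc has little left to improve. The sketch below illustrates the idea only: FixedPool is a hypothetical stand-in written for this example, not boost::singleton_pool and not GTSAM code, and comparePoolWithGlobalNew measures the global operator new, which is what a tcmalloc build would replace.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for a pool allocator (NOT boost::singleton_pool):
// carves fixed-size blocks out of large chunks and recycles freed blocks
// through a LIFO free list. Not thread-safe.
class FixedPool {
 public:
  explicit FixedPool(std::size_t blockSize, std::size_t blocksPerChunk = 1024)
      : blockSize_(roundUp(blockSize)), blocksPerChunk_(blocksPerChunk) {
    assert(blocksPerChunk_ > 0);
  }
  ~FixedPool() {
    for (void* chunk : chunks_) ::operator delete(chunk);
  }
  FixedPool(const FixedPool&) = delete;
  FixedPool& operator=(const FixedPool&) = delete;

  void* allocate() {
    if (free_ == nullptr) grow();
    Node* node = free_;
    free_ = node->next;
    return node;
  }
  void deallocate(void* p) {
    free_ = new (p) Node{free_};  // reuse the block itself as a list node
  }

 private:
  struct Node { Node* next; };

  // Round up so every block can hold a Node and stays maximally aligned.
  static std::size_t roundUp(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    if (n < sizeof(Node)) n = sizeof(Node);
    return (n + a - 1) / a * a;
  }
  void grow() {
    chunks_.push_back(nullptr);
    chunks_.back() = ::operator new(blockSize_ * blocksPerChunk_);
    char* base = static_cast<char*>(chunks_.back());
    for (std::size_t i = 0; i < blocksPerChunk_; ++i) deallocate(base + i * blockSize_);
  }

  std::size_t blockSize_;
  std::size_t blocksPerChunk_;
  Node* free_ = nullptr;
  std::vector<void*> chunks_;
};

// Time n allocations followed by n deallocations, in seconds.
template <typename Alloc, typename Release>
double timeChurn(std::size_t n, Alloc alloc, Release release) {
  std::vector<void*> live(n);
  const auto start = std::chrono::steady_clock::now();
  for (void*& p : live) p = alloc();
  for (void* p : live) release(p);
  const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

struct ChurnTimes { double pool; double global; };

// Compare the pool against the global operator new/delete for one block size.
ChurnTimes comparePoolWithGlobalNew(std::size_t n, std::size_t blockSize) {
  FixedPool pool(blockSize);
  ChurnTimes t;
  t.pool = timeChurn(n, [&] { return pool.allocate(); },
                     [&](void* p) { pool.deallocate(p); });
  t.global = timeChurn(n, [=] { return ::operator new(blockSize); },
                       [](void* p) { ::operator delete(p); });
  return t;
}
```

Running comparePoolWithGlobalNew in a default build and in a tcmalloc build would show how much of the gap, if any, the allocator swap closes for pool-sized objects; the numbers depend heavily on platform and workload.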
-
- changed milestone to GTSAM 4 Roadmap: Modernization
- changed version to 4.0.0