compiling Baikal with gcc >= 9.3 is very slow
Compiling the Baikal code with gcc 9.3.0 or newer was found to be very slow (10.1 is also affected)
- takes about 30 minutes to compile 8th order FD RHS with gcc 9.3.0 using -O1 and -march=core2
- Zach and Roland have been looking into this
- slowness goes away if one uses the options listed by gcc -Q -O1 --help=optimizers, which claims to report the options that are used by -O1
Zach wanted to look into moving operators into non-inlined files anyway, which may fix the issue.
Comments (7)
-
-
@Erik Schnetter Thanks for the tip! I have just refactored NRPy+'s finite-difference generating code so that it generates CCTK_ATTRIBUTE_NOINLINE finite difference functions within Baikal* instead of inlined code.
The net result is far faster compiles (>10x faster for gcc 10.1 on a Linux machine), and far faster codegens (~2.4x faster to generate Baikal* thorns using NRPy+). Further, in early tests, I have found no degradation in runtime performance (same performance within error bars).
I confirmed that the updated Baikal* thorns still pass the testsuite, so I have replaced the Baikal* thorns in WVUThorns master with the updated ones. @Roland Haas will be retrying on the same machine used to produce the original benchmarks for this ticket.
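As an illustration of the change described above, here is a minimal C sketch of a finite-difference derivative emitted as a non-inlined function and called from a loop. It is not the code NRPy+ generates: the names fd4_d1 and deriv_x, the 4th-order centered stencil, and the fallback definition of CCTK_ATTRIBUTE_NOINLINE (for building outside Cactus with GCC or Clang) are assumptions made for the example.

```c
#include <stddef.h>

/* Assumption: stand-in for the Cactus macro when building outside the toolkit
   (GCC/Clang attribute syntax). */
#ifndef CCTK_ATTRIBUTE_NOINLINE
#define CCTK_ATTRIBUTE_NOINLINE __attribute__((noinline))
#endif

/* Hypothetical helper: 4th-order centered first derivative of u at index i.
   stride is the distance between neighbouring points along the direction,
   inv_dx is 1/dx. */
CCTK_ATTRIBUTE_NOINLINE
double fd4_d1(const double *u, ptrdiff_t i, ptrdiff_t stride, double inv_dx)
{
    return inv_dx * ((1.0 / 12.0) * (u[i - 2 * stride] - u[i + 2 * stride])
                   + (2.0 / 3.0)  * (u[i + stride]     - u[i - stride]));
}

/* Hypothetical RHS-style loop: calls the shared function at each interior
   point instead of expanding the stencil inline. Points within two cells of
   either end are left untouched. */
void deriv_x(const double *u, double *du, size_t n, double dx)
{
    const double inv_dx = 1.0 / dx;
    for (size_t i = 2; i + 2 < n; ++i)
        du[i] = fd4_d1(u, (ptrdiff_t)i, 1, inv_dx);
}
```

Calling deriv_x(u, du, n, dx) fills du[2] through du[n-3]. The stencil body is compiled once in fd4_d1 rather than being expanded at every call site, which is the kind of change credited above with the faster compiles.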
-
reporter Time spent compiling Baikal and BaikalVacuum version 8b2d570 "WVUThorns/Baikal*: Compute finite difference derivatives within functions instead of inlined. Results in ~2.4x faster codegen and much faster compiles with GCC 9.3 and later" using -O1 -march=core with gcc 9.3.0 on the same OSX VM using MacPorts as in the description (only showing files taking more than 1s):

| File name | time to compile (s) |
|---|---|
| Baikal/src/driver_enforcedetgammabar_constraint.c | 12.2738 |
| Baikal/src/BSSN_RHSs_enable_Tmunu_True_FD_order_4.c | 7.50983 |
| Baikal/src/driver_BSSN_T4UU.c | 5.40932 |
| Baikal/src/driver_pt2_BSSN_RHSs.c | 3.18587 |
| Baikal/src/BSSN_Ricci_FD_order_4.c | 2.23229 |
| BaikalVacuum/src/BSSN_RHSs_enable_Tmunu_False_FD_order_8.c | 20.5416 |
| BaikalVacuum/src/driver_pt2_BSSN_RHSs.c | 13.5228 |
| BaikalVacuum/src/BSSN_RHSs_enable_Tmunu_False_FD_order_6.c | 11.0136 |
| BaikalVacuum/src/BSSN_Ricci_FD_order_8.c | 5.86034 |
| BaikalVacuum/src/BSSN_Ricci_FD_order_6.c | 3.13888 |
| BaikalVacuum/src/BSSN_to_ADM.c | 1.22795 |

I am recompiling the release code to compare, but it has been compiling for a couple of minutes already, so it is much slower to compile.
-
reporter Table of compile time for gcc 9.3.0 using -O1 -march=core and the ET_2020_05_v0 version of the code:

| File name | time to compile (s) |
|---|---|
| Baikal/src/driver_enforcedetgammabar_constraint.c | 458.326 |
| Baikal/src/driver_pt2_BSSN_RHSs.c | 65.7796 |
| Baikal/src/BSSN_RHSs_enable_Tmunu_True_FD_order_4.c | 46.865 |
| Baikal/src/BSSN_Ricci_FD_order_4.c | 9.34175 |
| Baikal/src/driver_BSSN_T4UU.c | 7.00304 |
| BaikalVacuum/src/BSSN_RHSs_enable_Tmunu_False_FD_order_4.c | 8947.04 |
| BaikalVacuum/src/BSSN_RHSs_enable_Tmunu_False_FD_order_8.c | 2453.11 |
| BaikalVacuum/src/BSSN_Ricci_FD_order_4.c | 1100.26 |
| BaikalVacuum/src/driver_enforcedetgammabar_constraint.c | 433.173 |
| BaikalVacuum/src/BSSN_Ricci_FD_order_8.c | 305.455 |
| BaikalVacuum/src/driver_pt2_BSSN_RHSs.c | 66.2126 |
| BaikalVacuum/src/BSSN_to_ADM.c | 1.22881 |

The new code (8b2d570) is therefore about a factor of 430 faster for the slowest file (BSSN_RHSs_enable_Tmunu_False_FD_order_4.c), which changes the compile time from several hours to a minute or so.
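The factor of about 430 can be checked with a short calculation. The sketch below assumes the comparison is between the slowest file in the ET_2020_05_v0 timings (8947.04 s) and the slowest file in the 8b2d570 timings (20.5416 s), since the latter do not list BSSN_RHSs_enable_Tmunu_False_FD_order_4.c. The struct and function names are made up for the example; the numbers are copied from the two sets of timings in this ticket.

```c
#include <stddef.h>

/* Compile times in seconds for a few files that appear in both sets of
   timings: old_s is ET_2020_05_v0, new_s is 8b2d570. */
struct timing {
    const char *file;
    double old_s;
    double new_s;
};

static const struct timing timings[] = {
    { "Baikal/src/driver_enforcedetgammabar_constraint.c",          458.326, 12.2738 },
    { "BaikalVacuum/src/BSSN_RHSs_enable_Tmunu_False_FD_order_8.c", 2453.11, 20.5416 },
    { "BaikalVacuum/src/BSSN_Ricci_FD_order_8.c",                   305.455, 5.86034 },
};

/* Ratio of old to new compile time; a value above 1 means the new code
   compiles faster. */
double speedup(double old_s, double new_s)
{
    return old_s / new_s;
}

/* Speedup for entry k of the timings array above. */
double speedup_for(size_t k)
{
    return speedup(timings[k].old_s, timings[k].new_s);
}
```

Under that assumption, speedup(8947.04, 20.5416) comes out a little above 435, consistent with the quoted factor. The per-file ratios for the three entries in the array are smaller, roughly 37, 119 and 52.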
-
430x! I’ve confirmed roundoff-level agreement with the original version (and no *runtime* performance degradation) and pushed this updated version to the WVUThorns repo (master branch). Can we consider this ticket closed, then?
-
reporter At least for master I would say yes, it is resolved.
-
reporter - changed status to resolved
In McLachlan, I find that the derivative operators themselves are quite large. I declare them CCTK_ATTRIBUTE_NOINLINE, but make their definition still available when compiling the caller. GCC then specializes the function, i.e. uses a special calling convention that is more efficient than the regular one.
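A minimal C sketch of the pattern described here, assuming GCC or Clang: the operator is marked CCTK_ATTRIBUTE_NOINLINE but defined in the same translation unit as its caller, so the compiler still sees its body. The function names, the second-order stencil, the use of static, and the fallback macro definition are assumptions for illustration, not McLachlan's actual code.

```c
#include <stddef.h>

/* Assumption: stand-in for the Cactus macro when building outside the toolkit. */
#ifndef CCTK_ATTRIBUTE_NOINLINE
#define CCTK_ATTRIBUTE_NOINLINE __attribute__((noinline))
#endif

/* Hypothetical operator: 2nd-order centered second derivative.
   It is not inlined, but its definition is visible to the caller below. */
static CCTK_ATTRIBUTE_NOINLINE
double d2_second_order(const double *u, ptrdiff_t i, ptrdiff_t stride,
                       double inv_dx2)
{
    return inv_dx2 * (u[i - stride] - 2.0 * u[i] + u[i + stride]);
}

/* Hypothetical caller in the same file. The stride argument is always 1
   here, which is the sort of information a compiler can exploit when it
   can see the callee's body. */
double second_derivative_at(const double *u, ptrdiff_t i, double dx)
{
    return d2_second_order(u, i, 1, 1.0 / (dx * dx));
}
```

Whether GCC actually emits a specialized clone (for example with the constant stride propagated into it) depends on the compiler version and optimization flags; inspecting the generated assembly or symbol table is one way to check.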