Performance regression in KSPSolve with BoomerAMG
(Status changed to "invalid"; see the comment at the end.)
I use the following script to assess, very crudely, the performance of builds on our HPC system.
It solves a Poisson problem with CG preconditioned by BoomerAMG, with some options gleaned from @chris_richardson. Everything else (refinement, assembly, etc.) is much faster in 2016.1.0, so good work on the whole!
But I have noticed a severe performance regression in the KSPSolve stage moving from 1.6.0 (old) to 2016.1.0 (new). On our cluster I use PETSc 3.7.3 (new) with 2016.1.0 and PETSc 3.6.4 (old) with 1.6.0; however, I can create a similar regression even if I compile DOLFIN 2016.1.0 against PETSc 3.6.4. I can also recreate the regression in Docker on my laptop (dual-core i7, Haswell), comparing quay.io/fenicsproject/stable:2016.1.0.r4 against quay.io/fenicsproject/stable:1.6.0 (these images ship PETSc 3.7.3 and 3.6.3 respectively, as the logs below show). I include results below, which you can reproduce in the Docker containers.
On my laptop (1 core, Docker), KSPSolve takes ~30 seconds with new FEniCS + new PETSc and ~16 seconds with old FEniCS + old PETSc.
Running on our cluster (8 cores, single node), KSPSolve takes ~100 seconds with new FEniCS + new PETSc and ~24 seconds with old FEniCS + old PETSc. But I also get ~100 seconds with new FEniCS + old PETSc. I tried various MPI implementations here too (MPICH, OpenMPI, Intel MPI).
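Roughly, the script looks like this (a minimal sketch, not the exact code: the mesh dimensions, refinement count, solver/preconditioner pair, changed tolerance, and PETSc options are all read off the logs below, while the geometry, right-hand side, and exact API calls are only representative, written against the 2016.1.0 Python interface):

    from dolfin import *

    # BoomerAMG options; these match the "#PETSc Option Table entries" in the logs.
    PETScOptions.set("ksp_monitor")
    PETScOptions.set("ksp_view")
    PETScOptions.set("log_summary")
    PETScOptions.set("pc_hypre_boomeramg_strong_threshold", 0.5)
    PETScOptions.set("pc_hypre_boomeramg_truncfactor", 0.9)
    PETScOptions.set("pc_hypre_boomeramg_P_max", 5)
    PETScOptions.set("pc_hypre_boomeramg_agg_nl", 4)
    PETScOptions.set("pc_hypre_boomeramg_agg_num_paths", 2)
    PETScOptions.set("pc_hypre_boomeramg_print_debug", 1)

    # 10 x 10 x 20 initial box, refined three times: 12000 -> 96000 -> 768000
    # -> 6144000 cells and 1056321 P1 dofs, matching the logs. (1.6.0 uses the
    # older BoxMesh(x0, y0, z0, x1, y1, z1, nx, ny, nz) signature.)
    mesh = BoxMesh(Point(0.0, 0.0, 0.0), Point(1.0, 1.0, 2.0), 10, 10, 20)
    for i in range(3):
        info("Refinement %d..." % (i + 1))
        mesh = refine(mesh)

    V = FunctionSpace(mesh, "Lagrange", 1)
    u, v = TrialFunction(V), TestFunction(V)
    # The 2016.1.0 log warns about an Expression without an explicit degree,
    # so the original evidently used one; this formula is made up.
    f = Expression("10*exp(-(pow(x[0] - 0.5, 2) + pow(x[1] - 0.5, 2)) / 0.02)",
                   degree=1)
    a = inner(grad(u), grad(v))*dx
    L = f*v*dx
    bc = DirichletBC(V, Constant(0.0), "on_boundary")

    A, b = assemble_system(a, L, bc)
    u_h = Function(V)

    solver = LinearSolver("cg", "hypre_amg")
    solver.parameters["relative_tolerance"] = 1e-10  # the one changed parameter in the dumps

    t = Timer("Z Solving...")  # the "Z ..." rows in the timing tables come from named Timers
    solver.set_operator(A)
    solver.solve(u_h.vector(), b)
    t.stop()

    list_timings(TimingClear_keep, [TimingType_wall])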
quay.io/fenicsproject/stable:1.6.0
Passing options to PETSc: -ksp_monitor -ksp_view -pc_hypre_boomeramg_print_debug 1
Z Initial Nx: 10
Z Initial Ny: 10
Z Initial Nz: 20
Refinement 1...
Refinement 2...
Refinement 3...
Finished refinement
Z Degrees of freedom: 1056321
Z Degrees of freedom per core: 1056321
Building sparsity pattern over cells [================================] 100.0%
Assembling matrix over cells [========================================] 100.0%
Assembling vector over cells [========================================] 100.0%
Computing sub domain markers [========================================] 100.0%
Applying boundary conditions to linear system.
<Parameter set "linear_solver" containing 8 parameter(s) and 2 nested parameter set(s)>
linear_solver | type value range access change
------------------------------------------------------------------------
absolute_tolerance | double 1e-15 [] 1 0
divergence_limit | double 10000 [] 1 0
error_on_nonconvergence | bool true {true, false} 1 0
maximum_iterations | int 10000 [] 1 0
monitor_convergence | bool false {true, false} 1 0
nonzero_initial_guess | bool false {true, false} 1 0
relative_tolerance | double 1e-10 [] 1 1
report | bool true {true, false} 1 0
<Parameter set "gmres" containing 1 parameter(s) and 0 nested parameter set(s)>
gmres | type value range access change
----------------------------------------------
restart | int 30 [] 1 0
<Parameter set "preconditioner" containing 3 parameter(s) and 2 nested parameter set(s)>
preconditioner | type value range access change
-----------------------------------------------------------------------------------------------------------------------------
report | bool false {true, false} 1 0
shift_nonzero | double 0 [] 1 0
structure | string different_nonzero_pattern [different_nonzero_pattern, same, same_nonzero_pattern] 1 0
<Parameter set "ilu" containing 1 parameter(s) and 0 nested parameter set(s)>
ilu | type value range access change
-------------------------------------------------
fill_level | int 0 [] 1 0
<Parameter set "schwarz" containing 1 parameter(s) and 0 nested parameter set(s)>
schwarz | type value range access change
----------------------------------------------
overlap | int 1 [] 1 0
Solving linear system of size 1056321 x 1056321 (PETSc Krylov solver).
PETSc Krylov solver starting to solve 1056321 x 1056321 system.
Proc = 0 Level = 0 Build Interp Time = 0.630000
Proc = 0 Level = 0 Build Coarse Operator Time = 0.780000
Proc = 0 Level = 1 Build Interp Time = 0.810000
Proc = 0 Level = 1 Build Coarse Operator Time = 0.680000
Proc = 0 Level = 2 Build Interp Time = 0.220000
Proc = 0 Level = 2 Build Coarse Operator Time = 0.280000
Proc = 0 Level = 3 Build Interp Time = 0.050000
Proc = 0 Level = 3 Build Coarse Operator Time = 0.100000
Proc = 0 Level = 4 Coarsen Time = 0.010000
Proc = 0 Level = 4 Build Interp Time = 0.000000
Proc = 0 Level = 4 Build Coarse Operator Time = 0.020000
Proc = 0 Level = 5 Coarsen Time = 0.000000
Proc = 0 Level = 5 Build Interp Time = 0.000000
Proc = 0 Level = 5 Build Coarse Operator Time = 0.010000
Proc = 0 Level = 6 Coarsen Time = 0.000000
Proc = 0 Level = 6 Build Interp Time = 0.000000
Proc = 0 Level = 6 Build Coarse Operator Time = 0.010000
Proc = 0 Level = 7 Coarsen Time = 0.000000
Proc = 0 Level = 7 Build Interp Time = 0.000000
Proc = 0 Level = 7 Build Coarse Operator Time = 0.000000
Proc = 0 Level = 8 Coarsen Time = 0.000000
Proc = 0 Level = 8 Build Interp Time = 0.000000
Proc = 0 Level = 8 Build Coarse Operator Time = 0.000000
Proc = 0 Level = 9 Coarsen Time = 0.000000
Proc = 0 Level = 9 Build Interp Time = 0.000000
Proc = 0 Level = 9 Build Coarse Operator Time = 0.000000
Proc = 0 Level = 10 Coarsen Time = 0.000000
Proc = 0 Level = 10 Build Interp Time = 0.000000
Proc = 0 Level = 10 Build Coarse Operator Time = 0.000000
Proc = 0 Level = 11 Coarsen Time = 0.000000
Proc = 0 Level = 11 Build Interp Time = 0.000000
Proc = 0 Level = 11 Build Coarse Operator Time = 0.000000
Proc = 0 Level = 12 Coarsen Time = 0.000000
Proc = 0 Level = 12 Build Interp Time = 0.000000
Proc = 0 Level = 12 Build Coarse Operator Time = 0.000000
Proc = 0 Level = 13 Coarsen Time = 0.000000
Proc = 0 Level = 13 Build Interp Time = 0.000000
Proc = 0 Level = 13 Build Coarse Operator Time = 0.000000
Residual norms for solve.
0 KSP Residual norm 5.583794485124e+01
1 KSP Residual norm 1.611860391018e+01
2 KSP Residual norm 5.305846131867e+00
3 KSP Residual norm 1.905306416164e+00
4 KSP Residual norm 7.143669810157e-01
5 KSP Residual norm 2.657322220626e-01
6 KSP Residual norm 9.758858276495e-02
7 KSP Residual norm 4.513775590689e-02
8 KSP Residual norm 2.178923305886e-02
9 KSP Residual norm 7.983546156009e-03
10 KSP Residual norm 2.613698083862e-03
11 KSP Residual norm 9.317568226336e-04
12 KSP Residual norm 3.832151753727e-04
13 KSP Residual norm 1.712405504300e-04
14 KSP Residual norm 7.479485497816e-05
15 KSP Residual norm 2.678717319622e-05
16 KSP Residual norm 9.046576049639e-06
17 KSP Residual norm 3.257880478991e-06
18 KSP Residual norm 1.141033761126e-06
19 KSP Residual norm 3.762399501122e-07
20 KSP Residual norm 1.235637444976e-07
21 KSP Residual norm 4.413263355892e-08
22 KSP Residual norm 1.755015266427e-08
23 KSP Residual norm 7.016483228222e-09
24 KSP Residual norm 2.580492440593e-09
KSP Object:() 1 MPI processes
type: cg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-10, absolute=1e-15, divergence=10000
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object:() 1 MPI processes
type: hypre
HYPRE BoomerAMG preconditioning
HYPRE BoomerAMG: Cycle type V
HYPRE BoomerAMG: Maximum number of levels 25
HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
HYPRE BoomerAMG: Threshold for strong coupling 0.5
HYPRE BoomerAMG: Interpolation truncation factor 0.9
HYPRE BoomerAMG: Interpolation: max elements per row 5
HYPRE BoomerAMG: Number of levels of aggressive coarsening 4
HYPRE BoomerAMG: Number of paths for aggressive coarsening 2
HYPRE BoomerAMG: Maximum row sums 0.9
HYPRE BoomerAMG: Sweeps down 1
HYPRE BoomerAMG: Sweeps up 1
HYPRE BoomerAMG: Sweeps on coarse 1
HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
HYPRE BoomerAMG: Relax weight (all) 1
HYPRE BoomerAMG: Outer relax weight (all) 1
HYPRE BoomerAMG: Using CF-relaxation
HYPRE BoomerAMG: Measure type local
HYPRE BoomerAMG: Coarsen type Falgout
HYPRE BoomerAMG: Interpolation type classical
linear system matrix = precond matrix:
Mat Object: () 1 MPI processes
type: seqaij
rows=1056321, cols=1056321
total: nonzeros=1.5585e+07, allocated nonzeros=1.5585e+07
total number of mallocs used during MatSetValues calls =0
not using I-node routines
PETSc Krylov solver (cg, hypre) converged in 24 iterations.
Hypre preconditioner method: boomeramg
[MPI_AVG] Summary of timings | reps wall avg wall tot
------------------------------------------------------------------
Apply (PETScMatrix) | 2 0.030532 0.061064
Apply (PETScVector) | 5 5.7755e-05 0.00028878
Assemble cells | 2 2.9384 5.8767
Assemble exterior facets | 1 0.27444 0.27444
Build mesh number mesh entities | 1 3.696e-06 3.696e-06
Build sparsity | 2 1.178 2.3561
Delete sparsity | 2 1.1487e-05 2.2974e-05
DirichletBC apply | 1 1.5554 1.5554
DirichletBC compute bc | 1 1.5168 1.5168
DirichletBC init facets | 1 1.4913 1.4913
Generate Box mesh | 1 0.0018448 0.0018448
Init PETSc | 1 0.0015295 0.0015295
Init dof vector | 1 0.0070173 0.0070173
Init dofmap | 1 26.141 26.141
Init dofmap from UFC dofmap | 1 1.3064 1.3064
Init tensor | 2 0.021542 0.043084
Krylov solver | 1 16.277 16.277
PETSc Krylov solver | 1 16.276 16.276
PLAZA: Face long edge | 3 2.6711 8.0134
PLAZA: Get tetrahedra | 876000 4.8844e-06 4.2787
PLAZA: refine | 3 7.0296 21.089
SCOTCH graph ordering | 1 0.31682 0.31682
Z Assembling... | 1 10.289 10.289
Z Creating mesh... | 1 0.0024143 0.0024143
Z DirichletBC... | 1 0.0091192 0.0091192
Z Expression... | 1 0.16942 0.16942
Z Forms... | 1 0.00093475 0.00093475
Z FunctionSpace... | 1 26.191 26.191
Z PETSC Options... | 1 0.0075967 0.0075967
Z Refinement 1... | 1 0.26389 0.26389
Z Refinement 2... | 1 2.0922 2.0922
Z Refinement 3... | 1 18.733 18.733
Z Solving... | 1 16.278 16.278
compute connectivity 0 - 1 | 3 0.015244 0.045731
compute connectivity 2 - 1 | 3 1.0456 3.1369
compute connectivity 2 - 3 | 1 5.1591 5.1591
compute entities dim = 1 | 3 0.62157 1.8647
compute entities dim = 2 | 4 4.5958 18.383
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
poisson.py on a arch-linux2-c-opt named 4fbc7e36c67a with 1 processor, by fenics Wed Aug 31 12:56:11 2016
Using Petsc Release Version 3.6.3, Dec, 03, 2015
Max Max/Min Avg Total
Time (sec): 7.487e+01 1.00000 7.487e+01
Objects: 3.000e+01 1.00000 3.000e+01
Flops: 1.027e+09 1.00000 1.027e+09 1.027e+09
Flops/sec: 1.372e+07 1.00000 1.372e+07 1.372e+07
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 2.014e+07 1.00000 0.000e+00 2.014e+07
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 7.4869e+01 100.0% 1.0269e+09 100.0% 0.000e+00 0.0% 0.000e+00 100.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 48 1.0 6.8321e-02 1.0 1.01e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 1484
VecNorm 25 1.0 1.7067e-02 1.0 5.28e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 3095
VecCopy 2 1.0 2.6250e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 31 1.0 1.5567e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 48 1.0 7.3175e-02 1.0 1.01e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 1386
VecAYPX 23 1.0 4.2406e-02 1.0 4.86e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 1146
VecAssemblyBegin 5 1.0 5.9366e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 5 1.0 1.7881e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 6 1.0 2.0504e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 24 1.0 5.8344e-01 1.0 7.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 70 0 0 0 1 70 0 0 0 1239
MatConvert 1 1.0 7.3389e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 2.2173e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 6.0960e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 3.3379e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 7.1886e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 3.8147e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 1 1.0 3.6915e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 5 0 0 0 0 0
PCApply 25 1.0 1.1789e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 0 0 0 0 0
KSPSetUp 1 1.0 1.7953e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 1.6275e+01 1.0 1.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 22100 0 0 0 22100 0 0 0 63
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0
Index Set 6 6 108936 0
IS L to G Mapping 6 6 25355400 0
Vector 12 12 59172536 0
Vector Scatter 2 2 1296 0
Matrix 1 1 203923352 0
Preconditioner 1 1 1488 0
Krylov Solver 1 1 1208 0
========================================================================================================================
Average time to get PetscTime(): 3.00407e-06
#PETSc Option Table entries:
-ksp_monitor
-ksp_view
-log_summary
-pc_hypre_boomeramg_P_max 5
-pc_hypre_boomeramg_agg_nl 4
-pc_hypre_boomeramg_agg_num_paths 2
-pc_hypre_boomeramg_print_debug 1
-pc_hypre_boomeramg_strong_threshold 0.5
-pc_hypre_boomeramg_truncfactor 0.90000000000000002
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --COPTFLAGS=-O2 --CXXOPTFLAGS=-O2 --FOPTFLAGS=-O2 --with-blas-lib=/usr/lib/libopenblas.a --with-lapack-lib=/usr/lib/liblapack.a --with-c-support --with-debugging=0 --with-shared-libraries --download-suitesparse --download-scalapack --download-metis --download-parmetis --download-ptscotch --download-hypre --download-mumps --download-blacs --download-spai --download-ml --prefix=/usr/local
-----------------------------------------
Libraries compiled on Tue Jun 28 09:19:42 2016 on 77705a0c4e38
Machine characteristics: Linux-4.4.6-coreos-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /tmp/petsc-3.6.3
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O2 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O2 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/tmp/petsc-3.6.3/arch-linux2-c-opt/include -I/tmp/petsc-3.6.3/include -I/tmp/petsc-3.6.3/include -I/tmp/petsc-3.6.3/arch-linux2-c-opt/include -I/usr/local/include -I/usr/include/mpich
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/tmp/petsc-3.6.3/arch-linux2-c-opt/lib -L/tmp/petsc-3.6.3/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lHYPRE -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lpthread -lscalapack -lspai -lml -lmpichcxx -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -Wl,-rpath,/usr/lib -L/usr/lib -llapack -lopenblas -lparmetis -lmetis -lpthread -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lm -lmpichfort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -lrt -lm -lz -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lmpich -lgcc_s -ldl
-----------------------------------------
quay.io/fenicsproject/stable:2016.1.0.r4
Passing options to PETSc: -ksp_monitor -ksp_view -pc_hypre_boomeramg_print_debug 1
Z Initial Nx: 10
Z Initial Ny: 10
Z Initial Nz: 20
Refinement 1...
Compute shared mesh entities of dimension 1.
Number of cells increased from 12000 to 96000 (700.0% increase).
Refinement 2...
Compute shared mesh entities of dimension 1.
Number of cells increased from 96000 to 768000 (700.0% increase).
Refinement 3...
Compute shared mesh entities of dimension 1.
Number of cells increased from 768000 to 6144000 (700.0% increase).
Finished refinement
Z Degrees of freedom: 1056321
Z Degrees of freedom per core: 1056321
*** -------------------------------------------------------------------------
*** Warning: Automatic determination of degree for Expressions has been deprecated in FEniCS version 2016.1.
*** It will (likely) be removed in the next FEniCS release.
*** Specify the polynomial degree for interpolation of Expression, or provide a FiniteElement
*** -------------------------------------------------------------------------
Building sparsity pattern over cells [================================] 100.0%
Assembling matrix over cells [========================================] 100.0%
Assembling vector over cells [========================================] 100.0%
Computing sub domain markers [========================================] 100.0%
Applying boundary conditions to linear system.
<Parameter set "linear_solver" containing 8 parameter(s) and 0 nested parameter set(s)>
linear_solver | type value range access change
--------------------------------------------------------------------------
absolute_tolerance | double <unset> [] 0 0
divergence_limit | double <unset> [] 0 0
error_on_nonconvergence | bool <unset> {true, false} 0 0
maximum_iterations | int <unset> [] 0 0
monitor_convergence | bool <unset> {true, false} 0 0
nonzero_initial_guess | bool <unset> {true, false} 0 0
relative_tolerance | double 1e-10 [] 0 1
report | bool <unset> {true, false} 0 0
PETSc Krylov solver starting to solve 1056321 x 1056321 system.
KSP Object: 1 MPI processes
type: cg
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-10, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: hypre
HYPRE BoomerAMG preconditioning
HYPRE BoomerAMG: Cycle type V
HYPRE BoomerAMG: Maximum number of levels 25
HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
HYPRE BoomerAMG: Convergence tolerance PER hypre call 0.
HYPRE BoomerAMG: Threshold for strong coupling 0.25
HYPRE BoomerAMG: Interpolation truncation factor 0.
HYPRE BoomerAMG: Interpolation: max elements per row 0
HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
HYPRE BoomerAMG: Maximum row sums 0.9
HYPRE BoomerAMG: Sweeps down 1
HYPRE BoomerAMG: Sweeps up 1
HYPRE BoomerAMG: Sweeps on coarse 1
HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
HYPRE BoomerAMG: Relax weight (all) 1.
HYPRE BoomerAMG: Outer relax weight (all) 1.
HYPRE BoomerAMG: Using CF-relaxation
HYPRE BoomerAMG: Not using more complex smoothers.
HYPRE BoomerAMG: Measure type local
HYPRE BoomerAMG: Coarsen type Falgout
HYPRE BoomerAMG: Interpolation type classical
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=1056321, cols=1056321
total: nonzeros=15584961, allocated nonzeros=15584961
total number of mallocs used during MatSetValues calls =0
not using I-node routines
[MPI_AVG] Summary of timings | reps wall avg wall tot
-------------------------------------------------------------------
Apply (PETScMatrix) | 2 0.027329 0.054659
Apply (PETScVector) | 6 8.1309e-05 0.00048785
Assemble cells | 2 2.9439 5.8879
Assemble exterior facets | 1 0.31685 0.31685
Build BoxMesh | 1 0.0024531 0.0024531
Build sparsity | 2 1.04 2.0801
Compute SCOTCH graph re-ordering | 1 0.25735 0.25735
Compute connectivity 2-1 | 3 0.63623 1.9087
Compute connectivity 2-3 | 1 0.73728 0.73728
Compute entities dim = 1 | 3 0.35475 1.0643
Compute entities dim = 2 | 4 1.586 6.3439
Computed shared mesh entities | 3 0.00013871 0.00041613
Delete sparsity | 2 7.3415e-06 1.4683e-05
DirichletBC apply | 1 0.9833 0.9833
DirichletBC compute bc | 1 0.94495 0.94495
DirichletBC init facets | 1 0.91345 0.91345
Init PETSc | 1 0.0012785 0.0012785
Init dof vector | 1 0.031135 0.031135
Init dofmap | 1 11.125 11.125
Init dofmap from UFC dofmap | 1 0.97001 0.97001
Init tensor | 2 0.065852 0.1317
Krylov solver | 1 30.748 30.748
Number distributed mesh entities | 1 3.196e-06 3.196e-06
PETSc Krylov solver | 1 30.748 30.748
PLAZA: Face long edge | 3 1.2441 3.7322
PLAZA: Get tetrahedra | 876000 4.054e-06 3.5513
PLAZA: refine | 3 5.1754 15.526
SCOTCH: call SCOTCH_graphBuild | 1 0.0011413 0.0011413
SCOTCH: call SCOTCH_graphOrder | 1 0.18641 0.18641
Z Assembling... | 1 9.6341 9.6341
Z Creating mesh... | 1 0.0032052 0.0032052
Z DirichletBC... | 1 0.00024782 0.00024782
Z Expression... | 1 0.041955 0.041955
Z Forms... | 1 0.0013435 0.0013435
Z FunctionSpace... | 1 11.133 11.133
Z PETSC Options... | 1 0.031352 0.031352
Z Refinement 1... | 1 0.20884 0.20884
Z Refinement 2... | 1 1.5389 1.5389
Z Refinement 3... | 1 13.781 13.781
Z Solving... | 1 30.749 30.749
WARNING: -log_summary is being deprecated; switch to -log_view
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
poisson.py on a arch-linux2-c-opt named 5ab6fa58350d with 1 processor, by fenics Wed Aug 31 12:54:41 2016
Using Petsc Release Version 3.7.3, Jul, 24, 2016
Max Max/Min Avg Total
Time (sec): 6.814e+01 1.00000 6.814e+01
Objects: 1.700e+01 1.00000 1.700e+01
Flops: 3.851e+08 1.00000 3.851e+08 3.851e+08
Flops/sec: 5.652e+06 1.00000 5.652e+06 5.652e+06
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 1.453e+07 1.00000 0.000e+00 1.453e+07
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.8138e+01 100.0% 3.8511e+08 100.0% 0.000e+00 0.0% 0.000e+00 100.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecTDot 18 1.0 2.2055e-02 1.0 3.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 1724
VecNorm 10 1.0 7.0877e-03 1.0 2.11e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 2981
VecCopy 2 1.0 2.8162e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 22 1.0 1.4510e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 18 1.0 3.0588e-02 1.0 3.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 1243
VecAYPX 8 1.0 1.3669e-02 1.0 1.69e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 1236
VecAssemblyBegin 6 1.0 6.8188e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 6 1.0 6.0558e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 9 1.0 2.2938e-01 1.0 2.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 70 0 0 0 0 70 0 0 0 1182
MatConvert 1 1.0 7.7841e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 5.2452e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 5.4584e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 1 1.0 1.4291e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 1 1.0 2.1219e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 1 1.0 1.3331e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20 0 0 0 0 20 0 0 0 0 0
PCApply 10 1.0 1.7107e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 25 0 0 0 0 25 0 0 0 0 0
KSPSetUp 1 1.0 3.4337e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 3.0748e+01 1.0 3.85e+08 1.0 0.0e+00 0.0e+00 0.0e+00 45100 0 0 0 45100 0 0 0 13
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 1 0 0 0.
Index Set 2 2 105880 0.
IS L to G Mapping 4 4 16903632 0.
Vector 7 7 59164672 0.
Matrix 1 1 203923784 0.
Preconditioner 1 1 1384 0.
Krylov Solver 1 1 1232 0.
========================================================================================================================
Average time to get PetscTime(): 3.38554e-06
#PETSc Option Table entries:
-ksp_monitor
-ksp_view
-log_summary
-pc_hypre_boomeramg_agg_nl 4
-pc_hypre_boomeramg_agg_num_paths 2
-pc_hypre_boomeramg_P_max 5
-pc_hypre_boomeramg_print_debug 1
-pc_hypre_boomeramg_strong_threshold 0.5
-pc_hypre_boomeramg_truncfactor 0.90000000000000002
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --COPTFLAGS=-O2 --CXXOPTFLAGS=-O2 --FOPTFLAGS=-O2 --with-blas-lib=/usr/lib/libopenblas.a --with-lapack-lib=/usr/lib/liblapack.a --with-c-support --with-debugging=0 --with-shared-libraries --download-suitesparse --download-scalapack --download-metis --download-parmetis --download-ptscotch --download-hypre --download-mumps --download-blacs --download-spai --download-ml --prefix=/usr/local
-----------------------------------------
Libraries compiled on Sun Aug 28 11:48:54 2016 on 721c971d0d37
Machine characteristics: Linux-4.4.6-coreos-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /tmp/petsc-3.7.3
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -O2 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O2 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/tmp/petsc-3.7.3/arch-linux2-c-opt/include -I/tmp/petsc-3.7.3/include -I/tmp/petsc-3.7.3/include -I/tmp/petsc-3.7.3/arch-linux2-c-opt/include -I/usr/local/include -I/usr/include/mpich
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/tmp/petsc-3.7.3/arch-linux2-c-opt/lib -L/tmp/petsc-3.7.3/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lpthread -lparmetis -lmetis -lHYPRE -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichcxx -lstdc++ -lscalapack -lml -lmpichcxx -lstdc++ -lspai -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -Wl,-rpath,/usr/lib -L/usr/lib -llapack -lopenblas -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lpthread -lm -lmpichfort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -lrt -lm -lz -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -lmpich -lgcc_s -ldl
-----------------------------------------
Comments (1)
DOLFIN is now much more disciplined about parsing PETSc options. If you do not explicitly ask a solver to set_from_options(), DOLFIN will not do it on your behalf. The fix for the above is therefore to use PETScKrylovSolver instead of LinearSolver and then call the set_from_options() method on the solver. This should lead to fewer cases of DOLFIN silently/unexpectedly changing PETSc options.
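A minimal sketch of that fix in the Python interface (assuming a matrix A, right-hand side b, and solution function u_h as in the benchmark sketch above; the exact call order is illustrative):

    from dolfin import *

    # As before, put the BoomerAMG options into the PETSc options database
    # with PETScOptions.set(...) before creating the solver.

    # Use PETScKrylovSolver directly instead of LinearSolver, and explicitly
    # ask it to read the PETSc options database.
    solver = PETScKrylovSolver("cg", "hypre_amg")
    solver.set_from_options()
    solver.set_operator(A)
    solver.solve(u_h.vector(), b)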