Wiki
Clone wikiblopex / HypreTestingLinux
Introduction
Testing new BLOPEX in Hypre 2.6.0b. BLOPEX LOBPCG computes a number of smallest eigenpairs utilizing Hypre's IJ, Struct and SStruct interfaces. Test drivers ij, struct, sstruct in test and ex11 in examples.
Setup Hypre
See http://code.google.com/p/blopex/wiki/LLNL
All of these test were done using the LLNL ATLAS system: 8 AMD Opteron processors and 16G of memory per node, 64bit memory addressing, a total of 1152 nodes and 9216 processors. These tests use the standard BLAS/LAPACK libraries provided with hypre.
Here is a typical msub script:
#!/bin/csh #MSUB -l nodes=1 #MSUB -l walltime=10:00 #MSUB -q pbatch #MSUB -m be #MSUB -V #MSUB -o test_hypre1.out #MSUB -A asccasc ##### These are shell commands date srun -n2 ij -lobpcg -n 64 64 64 -pcgitr 0 -vrand 20 -seed 1 echo 'Done'
Execution of Tests for Hypre's IJ interface
Find 20 smallest eigenpairs of 3-D 7-point 64x64x64 Laplacian. Use 2 cores, random guess and BoomerAMG (default) preconditioner.
$ srun -n2 ij -lobpcg -n 64 64 64 -pcgitr 0 -vrand 20 -seed 1 ........ Initial Max. Residual 2.43837832680517e+00 Iteration 1 bsize 20 maxres 9.27760154340509e-01 Iteration 2 bsize 20 maxres 9.95859757156789e-02 Iteration 3 bsize 20 maxres 5.35809124539832e-02 ........ Iteration 32 bsize 3 maxres 1.63915213644935e-08 Iteration 33 bsize 2 maxres 1.00061006177016e-08 Iteration 34 bsize 1 maxres 9.79423242872933e-09 Eigenvalue lambda Residual 7.00663900604040e-03 3.11322750992554e-09 1.40078232353962e-02 2.18555493423167e-09 1.40078232353966e-02 1.56740554270983e-09 ........ 3.96606622255917e-02 9.79423242872933e-09 3.96606622273213e-02 7.52027698802309e-09 3.96606622335298e-02 7.01148270739451e-09 34 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 157.880000 seconds wall MFLOPS = 0.000000 cpu clock time = 157.840000 seconds cpu MFLOPS = 0.000000
Below are the similar runs and corresponding timings on 4, 8 and 12 cores:
srun -n4 ij -lobpcg -n 64 64 64 -pcgitr 0 -vrand 20 -seed 1 ........ Initial Max. Residual 2.43707798239336e+00 Iteration 1 bsize 20 maxres 9.19578401303560e-01 Iteration 2 bsize 20 maxres 1.05575154554308e-01 Iteration 3 bsize 20 maxres 4.94189137866536e-02 ........ Iteration 39 bsize 1 maxres 2.32192704559918e-08 Iteration 40 bsize 1 maxres 1.47105047511431e-08 Iteration 41 bsize 1 maxres 8.97634657275717e-09 Eigenvalue lambda Residual 7.00663900604048e-03 3.82339243630002e-09 1.40078232353942e-02 4.64989432090594e-09 1.40078232353962e-02 4.36956251539866e-09 ........ 3.96606622168279e-02 8.01279743679407e-09 3.96606622272654e-02 7.83277321700525e-09 3.96606622273116e-02 8.39957241938649e-09 41 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 76.540000 seconds wall MFLOPS = 0.000000 cpu clock time = 76.540000 seconds cpu MFLOPS = 0.000000 srun -n8 ij -lobpcg -n 64 64 64 -pcgitr 0 -vrand 20 -seed 1 ........ Initial Max. Residual 2.43975107136868e+00 Iteration 1 bsize 20 maxres 9.42509159068712e-01 Iteration 2 bsize 20 maxres 1.06547393971217e-01 Iteration 3 bsize 20 maxres 4.20822219529653e-02 ........ Iteration 33 bsize 3 maxres 1.32343428294644e-08 Iteration 34 bsize 1 maxres 1.05484409989219e-08 Iteration 35 bsize 1 maxres 9.11921415936450e-09 Eigenvalue lambda Residual 7.00663900604047e-03 3.70085899255288e-09 1.40078232353957e-02 4.90069048034002e-09 1.40078232353970e-02 1.98359423970836e-09 ........ 3.96606622273044e-02 9.11921415936450e-09 3.96606622273126e-02 6.58539147644802e-09 3.96606622277364e-02 6.55620097895291e-09 35 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 40.970000 seconds wall MFLOPS = 0.000000 cpu clock time = 40.970000 seconds cpu MFLOPS = 0.000000 srun -n12 ij -lobpcg -n 64 64 64 -pcgitr 0 -vrand 20 -seed 1 ........ Initial Max. Residual 2.43769022193455e+00 Iteration 1 bsize 20 maxres 9.36812311008159e-01 Iteration 2 bsize 20 maxres 1.11863318144932e-01 Iteration 3 bsize 20 maxres 5.01242692640469e-02 ........ Iteration 33 bsize 2 maxres 1.75108824274872e-08 Iteration 34 bsize 1 maxres 1.31438097854918e-08 Iteration 35 bsize 1 maxres 9.28416855326690e-09 Eigenvalue lambda Residual 7.00663900603874e-03 1.16582862186407e-09 1.40078232353951e-02 1.97027315050180e-09 1.40078232353964e-02 1.02406385727161e-09 ........ 3.96606622272498e-02 8.57657681057407e-09 3.96606622272987e-02 9.14301724087940e-09 3.96606622273175e-02 6.16593293200329e-09 35 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 61.580000 seconds wall MFLOPS = 0.000000 cpu clock time = 61.570000 seconds cpu MFLOPS = 0.000000
In all the tests the desired tolerance for residual norms (1e-8) was achieved at each eigenpair.
Execution of Tests for Hypre's Struct interface
Find 20 smallest eigenpairs of 3-D 7-point 64x64x64 Laplacian. Use 2 cores, random guess and SMG (default) preconditioner.
srun -n2 struct -lobpcg -P 2 1 1 -n 32 64 64 -pcgitr 0 -tol 1.e-6 -vrand 20 -seed 1 ........ Initial Max. Residual 2.44147768637573e+00 Iteration 1 bsize 20 maxres 9.61366917411159e-01 Iteration 2 bsize 20 maxres 1.16958182015928e-01 Iteration 3 bsize 20 maxres 7.54703862046335e-02 ........ Iteration 36 bsize 1 maxres 1.26361673376445e-06 Iteration 37 bsize 1 maxres 1.01621060896889e-06 Iteration 38 bsize 1 maxres 9.77612585752585e-07 Eigenvalue lambda Residual 7.00663900604137e-03 7.56594294724381e-08 1.40078232354006e-02 3.77279806460283e-07 1.40078232354045e-02 7.15222192077976e-07 ........ 3.96606622278755e-02 6.76919987150030e-07 3.96606622283022e-02 6.36262557074980e-07 3.96606622286556e-02 6.80569313188676e-07 38 iterations ============================================= Solve phase times: ============================================= PCG Solve: wall clock time = 210.590000 seconds wall MFLOPS = 0.000000 cpu clock time = 210.560000 seconds cpu MFLOPS = 0.000000
Below are the similar runs and corresponding timings on 4 and 8 cores:
srun -n4 struct -lobpcg -n 64 64 64 -pcgitr 0 -vrand 20 -seed 1 ........ Initial Max. Residual 2.44389386670883e+00 Iteration 1 bsize 20 maxres 9.08302043327602e-01 Iteration 2 bsize 20 maxres 9.93203912671460e-02 Iteration 3 bsize 20 maxres 6.50486977171459e-02 ........ Iteration 20 bsize 2 maxres 2.00702240659784e-06 Iteration 21 bsize 2 maxres 1.22272570610140e-06 Iteration 22 bsize 2 maxres 9.32297872042726e-07 Eigenvalue lambda Residual 4.82051933120808e-03 4.08419642237034e-07 5.26877698441874e-03 4.92678783383892e-07 6.01579864841470e-03 1.03485402240996e-07 ........ 1.54068271154632e-02 6.79964477667430e-07 1.67625995014159e-02 6.32181629832286e-07 1.70492925785677e-02 6.00467774810507e-07 22 iterations ============================================= Solve phase times: ============================================= PCG Solve: wall clock time = 429.050000 seconds wall MFLOPS = 0.000000 cpu clock time = 429.010000 seconds cpu MFLOPS = 0.000000 srun -n8 struct -lobpcg -P 8 1 1 -n 8 64 64 -pcgitr 0 -tol 1.e-6 -vrand 20 -seed 1 ........ Initial Max. Residual 2.44408933274620e+00 Iteration 1 bsize 20 maxres 1.14006356546377e+00 Iteration 2 bsize 20 maxres 1.49131321672789e-01 Iteration 3 bsize 20 maxres 1.32045118727645e-01 ........ Iteration 26 bsize 1 maxres 1.82839821558345e-06 Iteration 27 bsize 1 maxres 1.24712146542881e-06 Iteration 28 bsize 1 maxres 9.50723501978925e-07 Eigenvalue lambda Residual 7.00663900604126e-03 9.75486466223631e-08 1.40078232353970e-02 2.70584840420372e-07 1.40078232354029e-02 4.31678613414848e-07 ........ 3.96606622279406e-02 9.50723501978925e-07 3.96606622282889e-02 8.40847685491894e-07 3.96606622288377e-02 7.28101790402948e-07 28 iterations ============================================= Solve phase times: ============================================= PCG Solve: wall clock time = 65.620000 seconds wall MFLOPS = 0.000000 cpu clock time = 65.620000 seconds cpu MFLOPS = 0.000000 Done
In all the tests the desired tolerance for residual norms (1e-6) was achieved at each eigenpair.
Note: dimensions -n nx ny nz in the driver's parameter list, are assumed to be per processor, while the processors topology is defined by -P Px Py Pz.
Execution of Tests for Hypre's SStruct interface
To test Hypre's SStruct interface we use the default input file sstuct.in.default (can be found in src/test/TEST_sstruct and copied to src/test). We compute 10 smallest eigenpairs using 2 cores, random initial guess and SMG preconditioner:
srun -n2 sstruct -P 1 1 2 -lobpcg -solver 10 -tol 1.e-6 -pcgitr 0 -seed 1 -vrand 10 ........ Initial Max. Residual 3.66746834197939e+00 Iteration 1 bsize 10 maxres 3.95931766889061e+00 Iteration 2 bsize 10 maxres 2.52086174796120e+00 Iteration 3 bsize 10 maxres 1.82502104263550e+00 ........ Iteration 98 bsize 1 maxres 1.36386601977662e-06 Iteration 99 bsize 1 maxres 1.12325862360037e-06 Iteration 100 bsize 1 maxres 1.01253561108829e-06 Eigenvalue lambda Residual 1.34880848246844e+00 8.63315792564494e-07 1.36560847743383e+00 7.12441389392736e-07 1.47908699641188e+00 8.01558782275141e-07 ........ 1.81237285967070e+00 9.25492569443323e-07 1.82920488646799e+00 8.43603474141757e-07 1.84668108358999e+00 1.01253561108829e-06 100 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 18.640000 seconds wall MFLOPS = 0.000000 cpu clock time = 18.630000 seconds cpu MFLOPS = 0.000000
Compute 10 smallest eigenpairs using 2 cores, random initial guess and PFMG preconditioner:
srun -n2 sstruct -P 1 1 2 -lobpcg -solver 11 -tol 1.e-6 -pcgitr 0 -seed 1 -vrand 10 ........ Initial Max. Residual 3.66746834197939e+00 Iteration 1 bsize 10 maxres 4.01329957316064e+00 Iteration 2 bsize 10 maxres 2.71181526959483e+00 Iteration 3 bsize 10 maxres 1.80491843909126e+00 ........ Iteration 98 bsize 1 maxres 2.43721630804724e-06 Iteration 99 bsize 1 maxres 2.26280601667150e-06 Iteration 100 bsize 1 maxres 2.13094586186140e-06 Eigenvalue lambda Residual 1.34880848246839e+00 7.70861393247927e-07 1.36560847743322e+00 7.60135734276439e-07 1.47908699641186e+00 9.62472424047735e-07 ........ 1.81237285961860e+00 9.97971810445109e-07 1.82920488646791e+00 9.98086964234622e-07 1.84668108359012e+00 2.13094586186140e-06 100 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 5.190000 seconds wall MFLOPS = 0.000000 cpu clock time = 5.190000 seconds cpu MFLOPS = 0.000000
For both cases the desired tolerance (1e-6) for residual norms was achieved at each eigenpair.
Hypre Testing on Larger Problems
These jobs were run on LLNL using the standard BLAS/LAPACK libraries provided with hypre. The following jobs were run on 64 processors to see if we could increase the size of the problem. Runs #2 and #3 crashed (running out of memory). #4 with AMG tuning parameters from LLNL ran ok and #5 using both of the LLNL recommendations ran ok.
1) srun -n64 ij -n 200 200 200 -P 4 4 4 successful 2) srun -n64 ij -n 250 250 250 -P 4 4 4 crashed 3) srun -n64 ij -n 400 400 400 -P 4 4 4 crashed 4) srun -n64 ij -n 400 400 400 -hmis -agg_nl 1 -interptype 6 -Pmx 4 successful 5) srun -n64 ij -n 400 400 400 -P 4 4 4 -hmis -agg_nl 1 -interptype 6 -Pmx 4 successful
Test on LLNL ATLAS system. Execute tests for Hypre's IJ interface. Find 20 smallest eigenpairs of 3-D 7-point 400x400x400 Laplacian. Use 8 nodes and 64 processors, random guess and BoomerAMG (default) preconditioner.
srun -n64 ij -lobpcg -n 400 400 400 -pcgitr 0 -vrand 20 -seed 1 -P 4 4 4 -hmis -agg_nl 1 -interptype 6 -Pmx 4 ........ Laplacian: (nx, ny, nz) = (400, 400, 400) (Px, Py, Pz) = (4, 4, 4) (cx, cy, cz) = (1.000000, 1.000000, 1.000000) ........ Initial Max. Residual 2.44739676813972e+00 Iteration 1 bsize 20 maxres 8.10036373965025e-01 Iteration 2 bsize 20 maxres 2.74010627338399e-02 Iteration 3 bsize 20 maxres 2.38522241400591e-02 ........ Iteration 98 bsize 3 maxres 1.75733759952715e-06 Iteration 99 bsize 3 maxres 1.52178814791134e-06 Iteration 100 bsize 3 maxres 1.41696513919346e-06 ........ Eigenvalue lambda Residual 1.84132323622540e-04 7.88362030113262e-09 3.68260879920936e-04 6.32103299991598e-09 3.68260879931142e-04 6.25805840799856e-09 ........ 1.04338636286942e-03 1.82336151666657e-08 1.04338636342027e-03 4.31563373965021e-07 1.04338637164431e-03 1.41696513919346e-06 100 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 4176.460000 seconds wall MFLOPS = 0.000000 cpu clock time = 4175.910000 seconds cpu MFLOPS = 0.000000
Test on LLNL ATLAS system. Execute tests for Hypre's IJ interface. Find 20 smallest eigenpairs of 3-D 7-point 500x500x500 Laplacian. Use 64 nodes and 512 processors, random guess and BoomerAMG (default) preconditioner.
srun -n512 ij -lobpcg -n 500 500 500 -pcgitr 0 -vrand 20 -seed 1 -P 8 8 8 -hmis -agg_nl 1 -interptype 6 -Pmx 4 ........ Laplacian: (nx, ny, nz) = (500, 500, 500) (Px, Py, Pz) = (8, 8, 8) (cx, cy, cz) = (1.000000, 1.000000, 1.000000) ........ Initial Max. Residual 2.44762003452109e+00 Iteration 1 bsize 20 maxres 7.78059751283823e-01 Iteration 2 bsize 20 maxres 2.24893106059477e-02 Iteration 3 bsize 20 maxres 1.80924248589866e-02 ........ Iteration 98 bsize 3 maxres 1.88845907948380e-06 Iteration 99 bsize 3 maxres 1.70505327519112e-06 Iteration 100 bsize 3 maxres 1.69226446409738e-06 ........ Eigenvalue lambda Residual 1.17962542906779e-04 9.12159288714868e-09 2.35923539283847e-04 9.26846395155813e-09 2.35923539299908e-04 6.48999956327757e-09 ........ 6.68442040292635e-04 4.06660781310217e-07 6.68442041063227e-04 6.45320260847077e-07 6.68442052000848e-04 1.69226446409738e-06 100 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 1166.910000 seconds wall MFLOPS = 0.000000 cpu clock time = 1166.760000 seconds cpu MFLOPS = 0.000000
Hypre Testing on Larger Problems (AMD BLAS/LAPACK libraries)
Test on LLNL ATLAS system. Execute tests for Hypre's IJ interface. Find 20 smallest eigenpairs of 3-D 7-point 400x400x400 Laplacian. Use 8 nodes and 64 processors, random guess and BoomerAMG (default) preconditioner.
srun -n64 ij -lobpcg -n 400 400 400 -pcgitr 0 -vrand 20 -seed 1 -P 4 4 4 -hmis -agg_nl 1 -interptype 6 -Pmx 4 ........ Laplacian: (nx, ny, nz) = (400, 400, 400) (Px, Py, Pz) = (4, 4, 4) (cx, cy, cz) = (1.000000, 1.000000, 1.000000) ........ Initial Max. Residual 2.44739676813973e+00 Iteration 1 bsize 20 maxres 8.10036373965024e-01 Iteration 2 bsize 20 maxres 2.74010627338399e-02 Iteration 3 bsize 20 maxres 2.38522241565166e-02 ........ Iteration 98 bsize 3 maxres 1.76268238384281e-06 Iteration 99 bsize 3 maxres 1.52652960627237e-06 Iteration 100 bsize 3 maxres 1.42073159770084e-06 ........ Eigenvalue lambda Residual 1.84132323721136e-04 7.88363056635417e-09 3.68260879905147e-04 8.50482771093186e-09 3.68260879923057e-04 6.97386968653446e-09 ........ 1.04338636289971e-03 4.71260888702569e-08 1.04338636337951e-03 4.16761054854837e-07 1.04338637175944e-03 1.42073159770084e-06 100 iterations ============================================= Solve phase times: ============================================= LOBPCG Solve: wall clock time = 4109.830000 seconds wall MFLOPS = 0.000000 cpu clock time = 4108.830000 seconds cpu MFLOPS = 0.000000
Updated