This README supports the request for the "SISC Reproducibility Badge: code and data available" for the paper "A Massively Parallel Augmented Subspace Eigensolver and Its Implementation for Large Scale Eigenvalue Problems", authored by Yangfei Liao, Haochen Liu, Zijing Wang, and Hehu Xie.
The GitLab repository is https://gitlab.com/xiegroup/pase1.0.git.
The numerical examples are carried out on LSSC-IV in the State Key Laboratory of Scientific and Engineering Computing, Chinese Academy of Sciences. Each computing node has two 18-core Intel Xeon Gold 6140 processors at 2.3 GHz and 192 GB memory. In the following tests, we use the Intel MPI compiler (mpiicc), and both BLAS and LAPACK are provided by Intel MKL.
In the PASE 1.0 repository, there are a total of two folders, namely OpenPFEM4PASE and PASE. In Section 6 of the paper, a total of five test cases are presented. The corresponding test programs for these cases are as follows:
- ./pase/pase/test/test_gmg1.c
- ./pase/pase/test/test_gmg2.c
- ./pase/pase/test/test_gmg3.c and ./pase/pase/test/test_gmg4.c
- ./OpenPFEM4PASE/test/Eigen3D_adapted_pase.c
- ./OpenPFEM4PASE/test/Eigen3D_adapted_pase2.c

Install MPI, BLAS, LAPACK, PETSc and SLEPc, and export the corresponding paths in the bashrc file.
Note that when installing PETSc, the packages METIS and ParMETIS must be installed as well.
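As a rough illustration of this step, the lines below show what the bashrc exports and a PETSc configure call might look like. PETSC_DIR, PETSC_ARCH and SLEPC_DIR are the variables that PETSc and SLEPc read; the install locations and the arch name are placeholders, not the paths used on LSSC-IV. The `--download-metis` and `--download-parmetis` flags are PETSc configure options; `--download-superlu_dist` is added on the assumption that SuperLU_DIST is the direct solver in use, since the run options in this README refer to it. The helper name `petsc_configure_cmd` is made up for this sketch, and it only prints the configure line.

```shell
# Placeholder install locations: replace with the real absolute paths.
export PETSC_DIR="$HOME/software/petsc"
export PETSC_ARCH="arch-linux-c-opt"
export SLEPC_DIR="$HOME/software/slepc"

# Compose (but do not run) a PETSc configure line that also builds
# METIS, ParMETIS and SuperLU_DIST. MKLROOT is left unexpanded on purpose.
petsc_configure_cmd() {
    printf '%s' "./configure --with-cc=mpiicc --with-blaslapack-dir=\$MKLROOT"
    printf ' %s' --download-metis --download-parmetis --download-superlu_dist
    printf '\n'
}

petsc_configure_cmd
```

The printed line is meant to be reviewed and adapted (for example, C++ and Fortran compilers may also have to be specified) before it is run inside the PETSc source directory.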
Installation for PASE
The source code path for the PASE software package is pase1.0/PASE.
After configuring and installing the external environment, it is necessary to modify the ./CMakeLists.txt in project PASE.
- ./config/LSSC4_oneapi2021 according to the MPI environment.
- ./CMakeLists .
- OPENPFEM_DIR path to pase1.0/OpenPFEM4PASE/shared (absolute path) in bashrc file.

Enter the path and then execute the following commands to install PASE:

    mkdir build
    cd build
    cmake ../
    make

| Parameters | Description |
|---|---|
| nev | number of eigenvalues to compute |
| num_levels | number of multigrid levels |
| initial_level | the layer at which the initial value is calculated |
| aux_rtol | convergence tolerance of the eigenproblem (3.2) in Algorithm 1 |
| pc_type | type of preconditioner |
| if_batches | whether to implement a batching strategy |
| batch_size | size of each batch (the |
| more_aux_nev | number of additional eigenpairs of the eigenvalue problem (5.3) to be solved such that the desired ones are contained (the |
| more_batch_size | number of additional eigenpairs computed during each batch in order to help PASE converge faster (the |
| max_refine | number of refinement times |
| smoothing_type | type of smoothing method |

| Parameters | Description |
|---|---|
| nevConv | number of eigenvalues to compute |
| gapMin | threshold for detecting repeated roots |
| nevGiven | number of known eigenpairs |
| block_size | sizes of |
| nevMax | the maximum number of eigenpairs to compute |
| nevInit | initial size of |
| tol_gcg[0] | absolute tolerance |
| tol_gcg[1] | relative tolerance |
| compW_method | the method for linear solver |

| Parameters | Description |
|---|---|
| nev | number of eigenvalues to compute |
| ncv | the maximum dimension of the subspace to be used by the solver |
| mpd | the maximum dimension allowed for the projected problem |
| maxit | the maximum iteration count |
| blocksize | the block size in LOBPCG |
| restart | the percentage of the block of vectors to force a restart in LOBPCG |

The corresponding test code is ./pase/test/test_gmg1.c in project PASE.

Algorithm-related parameters

Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 200; //800 for table 2

Set the number of multigrid levels on Line 27 to

    int num_levels = 3;

Set the numbers of uniform refinements used during multigrid generation with FEM on Line 29 to

    MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);

The fourth parameter of the function MatrixRead is the number of refinements before the parallelization of the FEM mesh, and the fifth parameter is the number of refinements after the parallelization of the FEM mesh. The sum of the two numbers is the total number of uniform refinements. In this example,

The fourth parameter of PASE_PARAMETER_Create represents the convergence accuracy of PASE, which is set to

    PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);

Set the layer at which the initial value is calculated on Line 39 to

    param->initial_level = 1;

Set the tolerance of the eigenproblem (3.2) in Algorithm 1 on Line 40 to

    param->aux_rtol = 1e-9;

Set the preconditioner type on Line 41 to

    param->pc_type = PRECOND_A; // PRECOND_NONE, PRECOND_B, PRECOND_B_A
    // PRECOND_NONE: Do not impose any preconditioning on A_{Hh}
    // PRECOND_B   : Only impose preconditioning on B_{Hh}
    // PRECOND_B_A : Impose preconditioning on B_{Hh} first, and then impose preconditioning on \tilde{A_{Hh}} during the solution of the linear system of equations

In Table 1 and Table 2, only the numbers of uniform refinements have been altered. Specifically,

    //Line 2 of Table 1,2
    MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
    //Line 3 of Table 1,2
    MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
    //Line 4 of Table 1,2
    MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);

Modify the parameters as mentioned above and recompile.
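The relation between the two refinement arguments of MatrixRead and the total number of uniform refinements can be checked with a few lines of shell arithmetic. The helper name `total_refinements` is made up for this illustration; the three argument pairs are the ones listed for the table rows.

```shell
# total_refinements PRE POST: refinements before plus refinements after
# the parallelization of the FEM mesh.
total_refinements() {
    echo $(( $1 + $2 ))
}

# The three (fourth, fifth) argument pairs passed to MatrixRead.
for pair in "4 1" "5 1" "5 2"; do
    set -- $pair
    echo "MatrixRead(..., $1, $2, ...): $(total_refinements "$1" "$2") uniform refinements in total"
done
```

The three rows therefore correspond to 5, 6 and 7 uniform refinements in total.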
Running process
The command-line parameters for the executable should be

    -mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
    -mat_superlu_dist_replacetinypivot

to get better efficiency when using the direct solver in PETSc (KSP).
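A launch line combining these options might be assembled as follows. This is only a sketch: `mpirun -np` is assumed as the launcher (a batch system may require a different one), the helper name `pase_launch_cmd` is hypothetical, and the process count 36 is a placeholder equal to the core count of one node, not a value taken from the tables. The function prints the command instead of running it, so that it can be inspected first.

```shell
# pase_launch_cmd NPROCS EXECUTABLE: print an MPI launch line that carries
# the SuperLU_DIST options recommended in this README.
pase_launch_cmd() {
    nprocs=$1
    exe=$2
    echo "mpirun -np $nprocs $exe" \
        "-mat_superlu_dist_rowperm NOROWPERM" \
        "-mat_superlu_dist_colperm PARMETIS" \
        "-mat_superlu_dist_replacetinypivot"
}

pase_launch_cmd 36 ./build/test1
```

The printed line can be pasted into a job script once the process count matches the row of the table being reproduced.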
The CPU time of PASE in Table 1
Run the executable program ./build/test1 and set the parameter nev as
The number of runtime processes is set to
The CPU time of PASE in Table 2
Run the executable program ./build/test1 and set the parameter nev as
The number of runtime processes is set to
Figure 2
Run the executable program ./build/test1 with different numbers of processes and set the parameter nev as

Run the executable program ./build/test1, changing the parameter nev and the parameter pc_type. The number of processes is set to

    param->pc_type = PRECOND_A;
    //param->pc_type = PRECOND_NONE;

The corresponding test code is ./pase/test/test_gmg1.c in project PASE.

Algorithm-related parameters

Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 200; //800 for table 2

In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters have already been set based on the size of nev, and the user only needs to provide the convergence accuracy. Specifically, when nev is 200 and 800 respectively, the corresponding parameters are set as follows

    nev = 200;
    nevConv = nev;
    atol = 1e-2;
    rtol = 1e-8;
    block_size = nevConv / 5;
    nevMax = 2 * nevConv;
    nevInit = nevConv;

    nev = 800;
    nevConv = nev;
    atol = 1e-2;
    rtol = 1e-8;
    block_size = nevConv / 10;
    nevInit = 3 * block_size;
    nevMax = nevInit + nevConv;

Modify the parameters as mentioned above and recompile.
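To see the concrete sizes these rules produce, the arithmetic can be replayed in the shell. The helper `gcg_sizes` is a hypothetical name used only here, and it covers just the two documented values of nev; how PASE_DIRECT_GCGE chooses between the rules for other values is not stated in this README, so no guess is made.

```shell
# gcg_sizes NEV: print block_size, nevInit and nevMax for the two
# documented cases (nev = 200 and nev = 800).
gcg_sizes() {
    nevConv=$1
    case "$nevConv" in
        200)
            block_size=$(( nevConv / 5 ))
            nevMax=$(( 2 * nevConv ))
            nevInit=$nevConv
            ;;
        800)
            block_size=$(( nevConv / 10 ))
            nevInit=$(( 3 * block_size ))
            nevMax=$(( nevInit + nevConv ))
            ;;
        *)
            echo "no documented rule for nev=$nevConv" >&2
            return 1
            ;;
    esac
    echo "block_size=$block_size nevInit=$nevInit nevMax=$nevMax"
}

gcg_sizes 200   # block_size=40 nevInit=200 nevMax=400
gcg_sizes 800   # block_size=80 nevInit=240 nevMax=1040
```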
Running process
The CPU time of GCGE in Table 1 and 2
Run the executable program ./build/test1 and set the parameter nev as
The corresponding test code is ./pase/test/test_gmg1.c in project PASE.
We conduct tests separately for the Krylov-Schur and LOBPCG methods and obtain optimal settings for ncv and mpd. Additionally, corresponding parameters were set for the LOBPCG method to achieve maximum efficiency.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 200; //800 for table 2

Set the parameter for selecting the EPS type on line 66 to choose Krylov-Schur or LOBPCG as the solver.

    flag = 2; //flag = 2: krylov-Schur, flag = 6: lobpcg

Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.

    if (flag == 2) // ks
    {
        ncv = 2 * nev;
        mpd = ncv;
    }
    else if (flag == 6) // lobpcg
    {
        ncv = 2 * nev;
        blocksize = nev / 5;
        restart = 0.1;
        mpd = 3 * blocksize;
    }
    max_it = 2000;

Modify the parameters as mentioned above and recompile.
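The same rules can be evaluated outside the program to see which subspace sizes a given nev leads to. The sketch below mirrors the branch for flag 2 (Krylov-Schur) and flag 6 (LOBPCG) with integer shell arithmetic; `eps_sizes` is an illustrative name and not part of PASE or SLEPc.

```shell
# eps_sizes FLAG NEV: print the solver sizes for Krylov-Schur (flag 2)
# or LOBPCG (flag 6), following the rules quoted from PASE_DIRECT_EPS.
eps_sizes() {
    flag=$1
    nev=$2
    case "$flag" in
        2)
            ncv=$(( 2 * nev ))
            mpd=$ncv
            echo "ncv=$ncv mpd=$mpd max_it=2000"
            ;;
        6)
            ncv=$(( 2 * nev ))
            blocksize=$(( nev / 5 ))
            mpd=$(( 3 * blocksize ))
            echo "ncv=$ncv blocksize=$blocksize restart=0.1 mpd=$mpd max_it=2000"
            ;;
        *)
            echo "unsupported flag: $flag" >&2
            return 1
            ;;
    esac
}

eps_sizes 2 200   # ncv=400 mpd=400 max_it=2000
eps_sizes 6 800   # ncv=1600 blocksize=160 restart=0.1 mpd=480 max_it=2000
```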
Running process
The command-line parameters for the executable should be

    -st_type sinvert -st_shift 0.0 -eps_target 0.0

to get better efficiency when using Krylov-Schur.
The CPU time of Krylov-Schur and LOBPCG in Table 1 and 2
Run the executable program ./build/test1 and set the parameter nev as
The corresponding test code is ./pase/test/test_gmg2.c in project PASE.
Algorithm-related parameters

Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 200; //800 for table 4

Set the number of multigrid levels on Line 27 to

    int num_levels = 3;

Set the numbers of uniform refinements used during multigrid generation with FEM on Line 29 to

    MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);

The fourth parameter of the function MatrixRead is the number of refinements before the parallelization of the FEM mesh, and the fifth parameter is the number of refinements after the parallelization of the FEM mesh. The sum of the two numbers is the total number of uniform refinements. In this example,

The fourth parameter of PASE_PARAMETER_Create represents the convergence accuracy of PASE, which is set to

    PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);

Set the layer at which the initial value is calculated on Line 39 to

    param->initial_level = 1;

Set the tolerance of the eigenproblem (3.2) in Algorithm 1 on Line 40 to

    param->aux_rtol = 1e-9;

Set the preconditioner type on Line 41 to

    param->pc_type = PRECOND_A;

In Table 3 and Table 4, only the numbers of uniform refinements have been altered. Specifically,

    //Line 2 of Table 3,4
    MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
    //Line 3 of Table 3,4
    MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
    //Line 4 of Table 3,4
    MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);

Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be

    -mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
    -mat_superlu_dist_replacetinypivot

to get better efficiency when using the direct solver in PETSc (KSP).
The CPU time of PASE in Table 3
Run the executable program ./build/test2 and set the parameter nev as
The number of runtime processes is set to
The CPU time of PASE in Table 4
Run the executable program ./build/test2 and set the parameter nev as
The number of runtime processes is set to
Figure 3
Run the executable program ./build/test2 with different numbers of processes and set the parameter nev as
The corresponding test code is ./pase/test/test_gmg2.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 200; //800 for table 4

In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters have already been set based on the size of nev, and the user only needs to provide the convergence accuracy. Specifically, when nev is

    nev = 200;
    atol = 1e-2;
    rtol = 1e-8;
    nevConv = nev;
    block_size = nevConv / 5;
    nevMax = 2 * nevConv;
    nevInit = nevConv;

    nev = 800;
    atol = 1e-2;
    rtol = 1e-8;
    nevConv = nev;
    block_size = nevConv / 10;
    nevInit = 3 * block_size;
    nevMax = nevInit + nevConv;

Modify the parameters as mentioned above and recompile.
Running process
The CPU time of GCGE in Table 3 and 4
Run the executable program ./build/test2 and set the parameter nev as
The corresponding test code is ./pase/test/test_gmg2.c in project PASE.
We conduct tests separately for the Krylov-Schur and LOBPCG methods and obtain optimal settings for ncv and mpd. Additionally, corresponding parameters were set for the LOBPCG method to achieve maximum efficiency.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 200; //800 for table 4

Set the parameter for selecting the EPS type on line 66 to choose Krylov-Schur or LOBPCG as the solver.

    flag = 2; //flag = 2: krylov-Schur, flag = 6: lobpcg

Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.

    if (flag == 2) // ks
    {
        ncv = 2 * nev;
        mpd = ncv;
    }
    else if (flag == 6) // lobpcg
    {
        ncv = 2 * nev;
        blocksize = nev / 5;
        restart = 0.1;
        mpd = 3 * blocksize;
    }
    max_it = 2000;

Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be

    -st_type sinvert -st_shift 0.0 -eps_target 0.0

to get better efficiency when using Krylov-Schur.
The CPU time of Krylov-Schur and LOBPCG in Table 3 and 4
Run the executable program ./build/test2 and set the parameter nev as
The corresponding test code is ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 2000;

Set the number of multigrid levels on Line 27 to

    int num_levels = 3;

Set the numbers of uniform refinements used during multigrid generation with FEM on Line 29 to

    MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);

The fourth parameter of the function MatrixRead is the number of refinements before the parallelization of the FEM mesh, and the fifth parameter is the number of refinements after the parallelization of the FEM mesh. The sum of the two numbers is the total number of uniform refinements. In this example,

The fourth parameter of PASE_PARAMETER_Create represents the convergence accuracy of PASE, which is set to

    PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);

Set the layer at which the initial value is calculated on Line 39 to

    param->initial_level = 1;

Set the tolerance of the eigenproblem (3.2) in Algorithm 1 on Line 40 to

    param->aux_rtol = 1e-9;

Set the preconditioner type on Line 41 to

    param->pc_type = PRECOND_A;

Set the parameter indicating whether to implement a batching strategy (Algorithm 11) on Line 42 to true

    param->if_batches = true;

Set the size of each batch (the

    param->batch_size = 500;

Set the number of additional eigenpairs of the eigenvalue problem (5.3) to be solved such that the desired ones are contained (the

    param->more_aux_nev = 60;

Set the number of additional eigenpairs computed during each batch in order to help PASE converge faster (the test_gmg3.c and test_gmg4.c.

    param->more_batch_size = 70;
    //param->more_batch_size = 80;

In Table 5 and Table 6, only the numbers of uniform refinements have been altered. Specifically,

    //Line 2 of Table 5,6
    MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
    //Line 3 of Table 5,6
    MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);

Modify the parameters as mentioned above and recompile.
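For orientation, the batching parameters can be turned into a rough work plan. The sketch below rests on two assumptions that this README does not confirm: that the nev wanted eigenpairs are split into consecutive batches of batch_size (so the batch count is a ceiling division), and that more_batch_size extra eigenpairs are carried in every batch. The function name `batch_plan` is hypothetical.

```shell
# batch_plan NEV BATCH_SIZE MORE_BATCH_SIZE
# Assumption: batches partition the nev wanted eigenpairs, and each batch
# additionally computes MORE_BATCH_SIZE eigenpairs.
batch_plan() {
    nev=$1
    batch_size=$2
    more=$3
    # Ceiling division: number of batches needed to cover nev eigenpairs.
    num_batches=$(( (nev + batch_size - 1) / batch_size ))
    per_batch=$(( batch_size + more ))
    echo "batches=$num_batches pairs_per_batch=$per_batch"
}

batch_plan 2000 500 70   # batches=4 pairs_per_batch=570
batch_plan 2000 500 80   # batches=4 pairs_per_batch=580
```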
Running process
The command-line parameters for the executable should be

    -mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
    -mat_superlu_dist_replacetinypivot

to get better efficiency when using the direct solver in PETSc (KSP).
The CPU time of PASE in Table 5
Run the executable program ./build/test3 and change the numbers of uniform refinements as stated above.
The number of runtime processes is set to
The CPU time of PASE in Table 6
Run the executable program ./build/test4 and change the numbers of uniform refinements as stated above.
The number of runtime processes is set to
The corresponding test code is ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 2000;

In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters have already been set based on the size of nev, and the user only needs to provide the convergence accuracy. Specifically, when nev is

    nev = 2000;
    nevConv = nev;
    atol = 1e-2;
    rtol = 1e-8;
    block_size = 200;
    nevInit = 3 * block_size;
    nevMax = nevInit + nevConv;

Modify the parameters as mentioned above and recompile.
Running process
The CPU time of GCGE in Table 5 and 6
Run the executable programs ./build/test3 and ./build/test4, and set the parameter nev as
The corresponding test code is ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.
We conduct tests separately for the Krylov-Schur and LOBPCG methods and obtain optimal settings for ncv and mpd. Additionally, corresponding parameters were set for the LOBPCG method to achieve maximum efficiency.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to

Set the number of eigenpairs on Line 26 to

    int nev = 2000;

Set the parameter for selecting the EPS type on Line 70 to choose Krylov-Schur or LOBPCG as the solver.

    flag = 2; //flag = 2: krylov-Schur, flag = 6: lobpcg

Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.

    if (flag == 2) // ks
    {
        ncv = 2400;
        mpd = 800;
    }
    else if (flag == 6) // lobpcg
    {
        ncv = 2 * nev;
        blocksize = nev / 5;
        restart = 0.1;
        mpd = 3 * blocksize;
    }
    max_it = 2000;

Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be

    -st_type sinvert -st_shift 0.0 -eps_target 0.0

to get better efficiency when using Krylov-Schur.
The CPU time of Krylov-Schur and LOBPCG in Table 5 and 6
Run the executable programs ./build/test3 and ./build/test4, and set the parameter nev as
The corresponding test code is ./test/Eigen3D_adapted_pase.c in project OPENPFEM4PASE.
Compilation process
Install MPI and PASE, and export the corresponding paths in the bashrc file.
Installation for OpenPFEM4PASE
The source code path for the OpenPFEM software package is pase1.0/OpenPFEM4PASE.
After configuring and installing the external environment, it is necessary to modify the ./CMakeLists.txt in project OpenPFEM4PASE.
- ./config/LSSC4_oneapi2021 according to the MPI environment.
- ./CMakeLists .
- PASE_DIR path to pase1.0/PASE/shared (absolute path) in bashrc file.

Enter the path and then execute the following commands to install OpenPFEM4PASE:

    mkdir build
    cd build
    cmake ../
    make

Algorithm-related parameters

Set the number of refinement times on Line 79 to

    INT max_refine = 10;

Set the number of eigenpairs on Line 80 to

    INT nev = 200;

Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be

    -mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
    -mat_superlu_dist_replacetinypivot

to get better efficiency when using the direct solver in PETSc (KSP).
Figure 4
Run the executable program ./build/test4. The data used in Figure 4 will be printed in the output file in the following form:

    element N = [ 196608 236828 319778 444937 639576 940419 1403539 2081119 2974727 4336282
    ];
    PostErr = [ 1.2211 0.8477 0.5667 0.3736 0.2381 0.1534 0.1023 0.069 0.0466 0.0308
    ];
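To turn such output into two plottable columns, a small awk filter is one option. The sketch below assumes the output has been saved to a file (or is piped in) and that the two arrays start with `element N = [` and `PostErr = [` at the beginning of a line; the function name `pair_columns` is made up for this example.

```shell
# pair_columns [FILE]: print "N PostErr" pairs, one refinement step per line.
# Reads standard input when no file is given.
pair_columns() {
    awk '
        /^element N = \[/ { mode = "n" }
        /^PostErr = \[/   { mode = "e" }
        mode != "" {
            closing = ($0 ~ /\]/)        # the array ends on this line
            gsub(/\[|\]|;/, " ")         # drop brackets and semicolons
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+(\.[0-9]+)?$/) {
                    if (mode == "n") n[++nn] = $i
                    else             e[++ne] = $i
                }
            }
            if (closing) mode = ""
        }
        END { for (i = 1; i <= nn && i <= ne; i++) print n[i], e[i] }
    ' "${1:--}"
}

# Shortened sample using the first two entries of each array.
pair_columns <<'EOF'
element N = [ 196608 236828
];
PostErr = [ 1.2211 0.8477
];
EOF
```

Lines outside the two arrays, such as timing messages, are ignored, so the full program output can be passed in unchanged.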
Figure 5
Run the executable program ./build/test4 with different numbers of processes. The data used in Figure 5 will be printed in the output file in the following form:

    The total time of PASE solving after 5 times refinements is 619.915023 sec.
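When several runs with different process counts are collected, the reported time can be pulled out of each output with a one-line filter. The sketch assumes the message has exactly the wording shown in this README; `solve_time` is an illustrative name.

```shell
# solve_time: read program output on standard input and print the PASE
# solve time (in seconds) from the "total time" message.
solve_time() {
    sed -n 's/^The total time of PASE solving after .* is \([0-9.]*\) sec\.$/\1/p'
}

echo "The total time of PASE solving after 5 times refinements is 619.915023 sec." | solve_time
```

For the sample line this prints 619.915023; lines that do not match the message are skipped.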
The corresponding test code is ./test/Eigen3D_adapted_pase2.c in project OPENPFEM4PASE.
Compilation process
Install MPI and PASE, and export the corresponding paths in the bashrc file.
Installation for OpenPFEM4PASE
The source code path for the OpenPFEM software package is pase1.0/OpenPFEM4PASE.
After configuring and installing the external environment, it is necessary to modify the ./CMakeLists.txt in project OpenPFEM4PASE.
- ./config/LSSC4_oneapi2021 according to the MPI environment.
- ./CMakeLists .
- PASE_DIR path to pase1.0/PASE/shared (absolute path) in bashrc file.

Enter the path and then execute the following commands to install OpenPFEM4PASE:

    mkdir build
    cd build
    cmake ../
    make

Make sure the executable in ./CMakeLists.txt is ${CMAKE_CURRENT_SOURCE_DIR}/test/Eigen3D_adapted_pase2.c

Algorithm-related parameters

Set the number of refinement times on Line 79 to

    INT max_refine = 10;

Set the number of eigenpairs on Line 80 to

    INT nev = 200;

Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be

    -mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
    -mat_superlu_dist_replacetinypivot

to get better efficiency when using the direct solver in PETSc (KSP).
Figure 6
Run the executable program ./build/test5. The data used in Figure 6 will be printed in the output file in the following form:

    element N = [ 196608 236828 319778 444937 639576 940419 1403539 2081119 2974727 4336282
    ];
    PostErr = [ 1.2211 0.8477 0.5667 0.3736 0.2381 0.1534 0.1023 0.069 0.0466 0.0308
    ];
Figure 7
Run the executable program ./build/test5 with different numbers of processes. The data used in Figure 7 will be printed in the output file in the following form:

    The total time of PASE solving after 5 times refinements is 619.915023 sec.
By default, PASE prints iteration information. If this output is not needed, set the value of PRINT_INFO to 0 in line 25 of ./pase/src/pase_sol.c.
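The switch can also be flipped from the command line. The sketch below assumes that PRINT_INFO is defined with a `#define PRINT_INFO <value>` line starting in the first column, which this README does not state; if the file uses another form, edit it by hand instead. The helper name `disable_print_info` is hypothetical.

```shell
# disable_print_info FILE: rewrite "#define PRINT_INFO <value>" to use 0.
# Assumption: PRINT_INFO is a preprocessor macro defined on its own line.
disable_print_info() {
    file=$1
    tmp=$(mktemp) || return 1
    sed 's/^#define[[:space:]]\{1,\}PRINT_INFO[[:space:]].*$/#define PRINT_INFO 0/' \
        "$file" > "$tmp" &&
        cat "$tmp" > "$file"     # overwrite in place, keeping file permissions
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (run from the PASE source directory, then rebuild):
#   disable_print_info ./pase/src/pase_sol.c
```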