This README accompanies the request for the "SISC Reproducibility Badge: code and data available" for the paper "A Massively Parallel Augmented Subspace Eigensolver and Its Implementation for Large Scale Eigenvalue Problems", authored by Yangfei Liao, Haochen Liu, Zijing Wang, and Hehu Xie.
The GitLab repository is available at https://gitlab.com/xiegroup/pase1.0.git.
The numerical examples are carried out on LSSC-IV in the State Key Laboratory of Scientific and Engineering Computing, Chinese Academy of Sciences. Each computing node has two 18-core Intel Xeon Gold 6140 processors at 2.3 GHz and 192 GB of memory. In the following tests, we use the Intel MPI compiler wrapper (mpiicc), and both BLAS and LAPACK are provided by Intel MKL.
The PASE 1.0 repository contains two folders, OpenPFEM4PASE and PASE. Section 6 of the paper presents five test cases. The corresponding test programs are as follows:
./pase/pase/test/test_gmg1.c
../pase/pase/test/test_gmg2.c
../pase/pase/test/test_gmg3.c and ./pase/pase/test/test_gmg4.c
../OpenPFEM4PASE/test/Eigen3D_adapted_pase.c
../OpenPFEM4PASE/test/Eigen3D_adapted_pase2.c
Install MPI, BLAS, LAPACK, PETSc and SLEPc, and export the corresponding paths in the bashrc file.
Note that when installing PETSc, it is necessary to install the packages METIS and ParMETIS at the same time.
Installation for PASE
The source code path for the PASE software package is pase1.0/PASE.
After configuring and installing the external environment, it is necessary to modify the ./CMakeLists.txt in project PASE.
./config/LSSC4_oneapi2021 according to the MPI environment.
./CMakeLists.
OPENPFEM_DIR path to pase1.0/OpenPFEM4PASE/shared (absolute path) in the bashrc file.
Enter the path and then execute the following commands to install PASE
mkdir build
cd build
cmake ../
make
Parameters | Description |
---|---|
nev | number of eigenvalues to compute |
num_levels | number of multigrid levels |
initial_level | the layer at which the initial value is calculated |
aux_rtol | convergence tolerance of the eigenproblem (3.2) in Algorithm 1 |
pc_type | type of preconditioner |
if_batches | whether to implement a batching strategy |
batch_size | size of each batch (the |
more_aux_nev | number of additional eigenpairs of the eigenvalue problem (5.3) to be solved such that the desired ones are contained (the |
more_batch_size | number of additional eigenpairs computed during each batch in order to help PASE converge faster (the |
max_refine | number of refinement times |
smoothing_type | type of smoothing method |

Parameters | Description |
---|---|
nevConv | number of eigenvalues to compute |
gapMin | threshold for detecting repeated roots |
nevGiven | number of known eigenpairs |
block_size | sizes of |
nevMax | the maximum number of eigenpairs to compute |
nevInit | initial size of |
tol_gcg[0] | absolute tolerance |
tol_gcg[1] | relative tolerance |
compW_method | the method for linear solver |

Parameters | Description |
---|---|
nev | number of eigenvalues to compute |
ncv | the maximum dimension of the subspace to be used by the solver |
mpd | the maximum dimension allowed for the projected problem |
maxit | the maximum iteration count |
blocksize | the block size in LOBPCG |
restart | the percentage of the block of vectors to force a restart in LOBPCG |
The corresponding test code is ./pase/test/test_gmg1.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 200; // 800 for Table 2
Set the number of multigrid levels on Line 27 to
int num_levels = 3;
Set the numbers of uniform refinements used for multigrid generation with FEM on Line 29 to
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
The fourth parameter of the function MatrixRead is the number of refinements before the parallelization of the FEM mesh, and the fifth parameter is the number of refinements after the parallelization of the FEM mesh. The sum of the two numbers is the total number of uniform refinements. In this example,
The fourth parameter of PASE_PARAMETER_Create represents the convergence accuracy of PASE, which is set to
PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);
Set the layer at which the initial value is calculated on Line 39 to
param->initial_level = 1;
Set the tolerance of the eigenproblem (3.2) in Algorithm 1 on Line 40 to
param->aux_rtol = 1e-9;
Set the preconditioner type on Line 41 to
param->pc_type = PRECOND_A; // PRECOND_NONE, PRECOND_B, PRECOND_B_A
// PRECOND_NONE: Do not impose any preconditioning on A_{Hh}
// PRECOND_B: Only impose preconditioning on B_{Hh}
// PRECOND_B_A: Impose preconditioning on B_{Hh} first, and then impose preconditioning on \tilde{A_{Hh}} during the solution of the linear system of equations
In Table 1 and Table 2, only the numbers of uniform refinements have been altered. Specifically,
// Line 2 of Table 1, 2
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
// Line 3 of Table 1, 2
MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
// Line 4 of Table 1, 2
MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);
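The refinement counts in the three calls above can be tallied with a few lines of C. This is a minimal sketch, not PASE code: `total_uniform_refinements` is a hypothetical helper that only adds the fourth and fifth `MatrixRead` arguments, following the description of those arguments given earlier.

```c
#include <assert.h>

/* Hypothetical helper (not part of PASE): total number of uniform
   refinements implied by the fourth and fifth MatrixRead arguments. */
int total_uniform_refinements(int before_parallel, int after_parallel)
{
    /* Refinement counts cannot be negative. */
    assert(before_parallel >= 0 && after_parallel >= 0);
    return before_parallel + after_parallel;
}
```

With the arguments listed above, table lines 2, 3, and 4 correspond to 4 + 1 = 5, 5 + 1 = 6, and 5 + 2 = 7 uniform refinements in total.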
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
to get better efficiency when using the direct solver in PETSc (KSP).
The CPU time of PASE in Table 1
Run the executable program ./build/test1, set the parameter nev as
The number of runtime processes is set to
The CPU time of PASE in Table 2
Run the executable program ./build/test1, set the parameter nev as
The number of runtime processes is set to
Figure 2
Run the executable program ./build/test1 with different numbers of processes, set the parameter nev as
Run the executable program ./build/test1, change the parameter nev and the parameter pc_type. The number of processes is set to
param->pc_type = PRECOND_A;
//param->pc_type = PRECOND_NONE;
The corresponding test code is ./pase/test/test_gmg1.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 200; // 800 for Table 2
In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters have already been set based on the size of nev, and the user only needs to provide the convergence accuracy. Specifically, when nev is 200 and 800 respectively, the corresponding parameters are set as follows
nev = 200;
nevConv = nev;
atol = 1e-2;
rtol = 1e-8;
block_size = nevConv / 5;
nevMax = 2 * nevConv;
nevInit = nevConv;

nev = 800;
nevConv = nev;
atol = 1e-2;
rtol = 1e-8;
block_size = nevConv / 10;
nevInit = 3 * block_size;
nevMax = nevInit + nevConv;
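The two parameter sets above are simple arithmetic in `nev`. The sketch below recomputes them so the resulting sizes can be checked by hand. `DemoGcgParams` and `demo_gcg_params` are hypothetical names introduced for illustration; only the two documented cases (`nev` equal to 200 and 800) are covered, and how `PASE_DIRECT_GCGE` chooses between them internally is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the GCG settings listed in this README. */
typedef struct {
    int nevConv;     /* number of eigenvalues to compute */
    int block_size;
    int nevInit;
    int nevMax;      /* maximum number of eigenpairs to compute */
    double atol;     /* absolute tolerance */
    double rtol;     /* relative tolerance */
} DemoGcgParams;

/* Fills *p for the documented cases; returns 0 on success and -1 when
   nev is not one of the two values covered by this sketch. */
int demo_gcg_params(int nev, DemoGcgParams *p)
{
    assert(p != NULL);
    p->nevConv = nev;
    p->atol = 1e-2;
    p->rtol = 1e-8;
    if (nev == 200) {
        p->block_size = p->nevConv / 5;
        p->nevMax = 2 * p->nevConv;
        p->nevInit = p->nevConv;
    } else if (nev == 800) {
        p->block_size = p->nevConv / 10;
        p->nevInit = 3 * p->block_size;
        p->nevMax = p->nevInit + p->nevConv;
    } else {
        return -1;
    }
    return 0;
}
```

For nev = 200 this gives block_size = 40, nevInit = 200, and nevMax = 400; for nev = 800 it gives block_size = 80, nevInit = 240, and nevMax = 1040.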
Modify the parameters as mentioned above and recompile.
Running process
The CPU time of GCGE in Table 1 and 2
Run the executable program ./build/test1, set the parameter nev as
The corresponding test code is ./pase/test/test_gmg1.c in project PASE.
We conduct tests separately for the Krylov-Schur and LOBPCG methods and obtain optimal settings for ncv and mpd. Additionally, corresponding parameters were set for the LOBPCG method to achieve maximum efficiency.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 200; // 800 for Table 2
Set the parameter that selects the EPS type on line 66 to choose Krylov-Schur or LOBPCG as the solver.
flag = 2; // flag = 2: Krylov-Schur, flag = 6: LOBPCG
Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.
if (flag == 2) // ks
{
    ncv = 2 * nev;
    mpd = ncv;
}
else if (flag == 6) // lobpcg
{
    ncv = 2 * nev;
    blocksize = nev / 5;
    restart = 0.1;
    mpd = 3 * blocksize;
}
max_it = 2000;
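To see the sizes these formulas produce, the fragment above can be wrapped in a small standalone function. This is a sketch under stated assumptions: `DemoEpsParams` and `demo_eps_params` are hypothetical names, and the code only reproduces the arithmetic shown above without calling SLEPc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the solver dimensions listed in this README. */
typedef struct {
    int ncv;        /* maximum dimension of the subspace */
    int mpd;        /* maximum dimension of the projected problem */
    int blocksize;  /* LOBPCG only; left at 0 for Krylov-Schur */
    double restart; /* LOBPCG only; left at 0.0 for Krylov-Schur */
    int max_it;     /* maximum iteration count */
} DemoEpsParams;

/* flag = 2 selects Krylov-Schur, flag = 6 selects LOBPCG (as in the README).
   Returns 0 on success and -1 for any other flag. */
int demo_eps_params(int flag, int nev, DemoEpsParams *p)
{
    assert(p != NULL && nev > 0);
    p->blocksize = 0;
    p->restart = 0.0;
    p->max_it = 2000;
    if (flag == 2) {
        p->ncv = 2 * nev;
        p->mpd = p->ncv;
    } else if (flag == 6) {
        p->ncv = 2 * nev;
        p->blocksize = nev / 5;
        p->restart = 0.1;
        p->mpd = 3 * p->blocksize;
    } else {
        return -1;
    }
    return 0;
}
```

For nev = 200, Krylov-Schur gets ncv = mpd = 400, while LOBPCG gets ncv = 400, blocksize = 40, and mpd = 120.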
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-st_type sinvert -st_shift 0.0 -eps_target 0.0
to get better efficiency when using Krylov-Schur.
The CPU time of Krylov-Schur and LOBPCG in Table 1 and 2
Run the executable program ./build/test1, set the parameter nev as
The corresponding test code is ./pase/test/test_gmg2.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 200; // 800 for Table 4
Set the number of multigrid levels on Line 27 to
int num_levels = 3;
Set the numbers of uniform refinements used for multigrid generation with FEM on Line 29 to
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
The fourth parameter of the function MatrixRead is the number of refinements before the parallelization of the FEM mesh, and the fifth parameter is the number of refinements after the parallelization of the FEM mesh. The sum of the two numbers is the total number of uniform refinements. In this example,
The fourth parameter of PASE_PARAMETER_Create represents the convergence accuracy of PASE, which is set to
PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);
Set the layer at which the initial value is calculated on Line 39 to
param->initial_level = 1;
Set the tolerance of the eigenproblem (3.2) in Algorithm 1 on Line 40 to
param->aux_rtol = 1e-9;
Set the preconditioner type on Line 41 to
param->pc_type = PRECOND_A;
In Table 3 and Table 4, only the numbers of uniform refinements have been altered. Specifically,
// Line 2 of Table 3, 4
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
// Line 3 of Table 3, 4
MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
// Line 4 of Table 3, 4
MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
to get better efficiency when using the direct solver in PETSc (KSP).
The CPU time of PASE in Table 3
Run the executable program ./build/test2, set the parameter nev as
The number of runtime processes is set to
The CPU time of PASE in Table 4
Run the executable program ./build/test2, set the parameter nev as
The number of runtime processes is set to
Figure 3
Run the executable program ./build/test2 with different numbers of processes, set the parameter nev as
The corresponding test code is ./pase/test/test_gmg2.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 200; // 800 for Table 4
In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters have already been set based on the size of nev, and the user only needs to provide the convergence accuracy. Specifically, when nev is 200 and 800 respectively, the corresponding parameters are set as follows
nev = 200;
atol = 1e-2;
rtol = 1e-8;
nevConv = nev;
block_size = nevConv / 5;
nevMax = 2 * nevConv;
nevInit = nevConv;

nev = 800;
atol = 1e-2;
rtol = 1e-8;
nevConv = nev;
block_size = nevConv / 10;
nevInit = 3 * block_size;
nevMax = nevInit + nevConv;
Modify the parameters as mentioned above and recompile.
Running process
The CPU time of GCGE in Table 3 and 4
Run the executable program ./build/test2, set the parameter nev as
The corresponding test code is ./pase/test/test_gmg2.c in project PASE.
We conduct tests separately for the Krylov-Schur and LOBPCG methods and obtain optimal settings for ncv and mpd. Additionally, corresponding parameters were set for the LOBPCG method to achieve maximum efficiency.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 200; // 800 for Table 4
Set the parameter that selects the EPS type on line 66 to choose Krylov-Schur or LOBPCG as the solver.
flag = 2; // flag = 2: Krylov-Schur, flag = 6: LOBPCG
Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.
if (flag == 2) // ks
{
    ncv = 2 * nev;
    mpd = ncv;
}
else if (flag == 6) // lobpcg
{
    ncv = 2 * nev;
    blocksize = nev / 5;
    restart = 0.1;
    mpd = 3 * blocksize;
}
max_it = 2000;
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-st_type sinvert -st_shift 0.0 -eps_target 0.0
to get better efficiency when using Krylov-Schur.
The CPU time of Krylov-Schur and LOBPCG in Table 3 and 4
Run the executable program ./build/test2, set the parameter nev as
The corresponding test code is ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 2000;
Set the number of multigrid levels on Line 27 to
int num_levels = 3;
Set the numbers of uniform refinements used for multigrid generation with FEM on Line 29 to
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
The fourth parameter of the function MatrixRead is the number of refinements before the parallelization of the FEM mesh, and the fifth parameter is the number of refinements after the parallelization of the FEM mesh. The sum of the two numbers is the total number of uniform refinements. In this example,
The fourth parameter of PASE_PARAMETER_Create represents the convergence accuracy of PASE, which is set to
PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);
Set the layer at which the initial value is calculated on Line 39 to
param->initial_level = 1;
Set the tolerance of the eigenproblem (3.2) in Algorithm 1 on Line 40 to
param->aux_rtol = 1e-9;
Set the preconditioner type on Line 41 to
param->pc_type = PRECOND_A;
Set the parameter indicating whether to implement a batching strategy (Algorithm 11) on Line 42 to true
param->if_batches = true;
Set the size of each batch (the
param->batch_size = 500;
Set the number of additional eigenpairs of the eigenvalue problem (5.3) to be solved such that the desired ones are contained (the
param->more_aux_nev = 60;
Set the number of additional eigenpairs computed during each batch in order to help PASE converge faster (the test_gmg3.c and test_gmg4.c.
param->more_batch_size = 70;
//param->more_batch_size = 80;
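The batching parameters above can be made concrete with a small calculation. The sketch assumes, as one plausible reading of the parameter descriptions, that the `nev` requested eigenpairs are split into consecutive batches of `batch_size` and that each batch additionally carries `more_batch_size` extra eigenpairs. `demo_num_batches` and `demo_pairs_per_batch` are hypothetical helpers, not PASE functions, and the actual batching logic of Algorithm 11 may differ.

```c
#include <assert.h>

/* Number of batches needed to cover nev eigenpairs, i.e. the ceiling of
   nev / batch_size (assumption: batches are consecutive, equal-sized chunks). */
int demo_num_batches(int nev, int batch_size)
{
    assert(nev > 0 && batch_size > 0);
    return (nev + batch_size - 1) / batch_size;
}

/* Eigenpairs handled in one batch when more_batch_size extra pairs are
   computed alongside the batch_size wanted ones. */
int demo_pairs_per_batch(int batch_size, int more_batch_size)
{
    assert(batch_size > 0 && more_batch_size >= 0);
    return batch_size + more_batch_size;
}
```

Under these assumptions, nev = 2000 with batch_size = 500 gives 4 batches, each working with 570 eigenpairs when more_batch_size = 70 and 580 when more_batch_size = 80.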
In Table 5 and Table 6, only the numbers of uniform refinements have been altered. Specifically,
// Line 2 of Table 5, 6
MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
// Line 3 of Table 5, 6
MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
to get better efficiency when using the direct solver in PETSc (KSP).
The CPU time of PASE in Table 5
Run the executable program ./build/test3 and change the numbers of uniform refinements as stated above.
The number of runtime processes is set to
The CPU time of PASE in Table 6
Run the executable program ./build/test4 and change the numbers of uniform refinements as stated above.
The number of runtime processes is set to
The corresponding test code is ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 2000;
In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters have already been set based on the size of nev, and the user only needs to provide the convergence accuracy. Specifically, when nev is 2000, the corresponding parameters are set as follows
nev = 2000;
nevConv = nev;
atol = 1e-2;
rtol = 1e-8;
block_size = 200;
nevInit = 3 * block_size;
nevMax = nevInit + nevConv;
Modify the parameters as mentioned above and recompile.
Running process
The CPU time of GCGE in Table 5 and 6
Run the executable program ./build/test3 and ./build/test4, set the parameter nev as
The corresponding test code is ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.
We conduct tests separately for the Krylov-Schur and LOBPCG methods and obtain optimal settings for ncv and mpd. Additionally, corresponding parameters were set for the LOBPCG method to achieve maximum efficiency.
Algorithm-related parameters
Set the macro definition of TESTPASE on line 11 of the program to
Set the number of eigenpairs on Line 26 to
int nev = 2000;
Set the parameter that selects the EPS type on Line 70 to choose Krylov-Schur or LOBPCG as the solver.
flag = 2; // flag = 2: Krylov-Schur, flag = 6: LOBPCG
Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.
if (flag == 2) // ks
{
    ncv = 2400;
    mpd = 800;
}
else if (flag == 6) // lobpcg
{
    ncv = 2 * nev;
    blocksize = nev / 5;
    restart = 0.1;
    mpd = 3 * blocksize;
}
max_it = 2000;
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-st_type sinvert -st_shift 0.0 -eps_target 0.0
to get better efficiency when using Krylov-Schur.
The CPU time of Krylov-Schur and LOBPCG in Table 5 and 6
Run the executable program ./build/test3 and ./build/test4, set the parameter nev as
The corresponding test code is ./test/Eigen3D_adapted_pase.c in project OpenPFEM4PASE.
Compilation process
Install MPI and PASE, and export the corresponding paths in the bashrc file.
Installation for OpenPFEM4PASE
The source code path for the OpenPFEM software package is pase1.0/OpenPFEM4PASE.
After configuring and installing the external environment, it is necessary to modify the ./CMakeLists.txt in project OpenPFEM4PASE.
./config/LSSC4_oneapi2021 according to the MPI environment.
./CMakeLists.
PASE_DIR path to pase1.0/PASE/shared (absolute path) in the bashrc file.
Enter the path and then execute the following commands to install OpenPFEM4PASE
mkdir build
cd build
cmake ../
make
Algorithm-related parameters
Set the number of refinement times on Line 79 to
INT max_refine = 10;
Set the number of eigenpairs on Line 80 to
INT nev = 200;
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
to get better efficiency when using the direct solver in PETSc (KSP).
Figure 4
Run the executable program ./build/test4. The data used in Figure 4 will be printed in the output file in the following form
element N = [ 196608 236828 319778 444937 639576 940419 1403539 2081119 2974727 4336282
];
PostErr = [ 1.2211 0.8477 0.5667 0.3736 0.2381 0.1534 0.1023 0.069 0.0466 0.0308
];
Figure 5
Run the executable program ./build/test4 with different numbers of processes. The data used in Figure 5 will be printed in the output file in the following form
The total time of PASE solving after 5 times refinements is 619.915023 sec.
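When timings from several runs are collected for Figure 5, the summary line can be parsed mechanically. The following is a minimal sketch that assumes the output line has exactly the format shown above; `demo_parse_total_time` is a hypothetical helper and not part of OpenPFEM4PASE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Extracts the refinement count and the elapsed seconds from a line of the
   form printed by the test program. Returns 1 on success, 0 if the line
   does not match the assumed format. */
int demo_parse_total_time(const char *line, int *refinements, double *seconds)
{
    assert(line != NULL && refinements != NULL && seconds != NULL);
    return sscanf(line,
                  "The total time of PASE solving after %d times refinements is %lf sec.",
                  refinements, seconds) == 2;
}
```

Applied to the sample line above, the helper reports 5 refinements and a time of about 619.915 seconds; lines in any other format are rejected.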
The corresponding test code is ./test/Eigen3D_adapted_pase2.c in project OpenPFEM4PASE.
Compilation process
Install MPI and PASE, and export the corresponding paths in the bashrc file.
Installation for OpenPFEM4PASE
The source code path for the OpenPFEM software package is pase1.0/OpenPFEM4PASE.
After configuring and installing the external environment, it is necessary to modify the ./CMakeLists.txt in project OpenPFEM4PASE.
./config/LSSC4_oneapi2021 according to the MPI environment.
./CMakeLists.
PASE_DIR path to pase1.0/PASE/shared (absolute path) in the bashrc file.
Enter the path and then execute the following commands to install OpenPFEM4PASE
mkdir build
cd build
cmake ../
make
Make sure the executable in ./CMakeLists.txt is ${CMAKE_CURRENT_SOURCE_DIR}/test/Eigen3D_adapted_pase2.c
Algorithm-related parameters
Set the number of refinement times on Line 79 to
INT max_refine = 10;
Set the number of eigenpairs on Line 80 to
INT nev = 200;
Modify the parameters as mentioned above and recompile.
Running process
The command-line parameters for the executable should be
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
to get better efficiency when using the direct solver in PETSc (KSP).
Figure 6
Run the executable program ./build/test5. The data used in Figure 6 will be printed in the output file in the following form
element N = [ 196608 236828 319778 444937 639576 940419 1403539 2081119 2974727 4336282
];
PostErr = [ 1.2211 0.8477 0.5667 0.3736 0.2381 0.1534 0.1023 0.069 0.0466 0.0308
];
Figure 7
Run the executable program ./build/test5 with different numbers of processes. The data used in Figure 7 will be printed in the output file in the following form
The total time of PASE solving after 5 times refinements is 619.915023 sec.
By default, PASE prints its iteration information. If you do not need this output, set the value of PRINT_INFO to 0 in line 25 of ./pase/src/pase_sol.c.