README For PASE1.0

This README accompanies the request for the "SISC Reproducibility Badge: code and data available" for the paper "A Massively Parallel Augmented Subspace Eigensolver and Its Implementation for Large Scale Eigenvalue Problems", authored by Yangfei Liao, Haochen Liu, Zijing Wang, and Hehu Xie.

The GitLab repository is https://gitlab.com/xiegroup/pase1.0.git.

The numerical examples are carried out on LSSC-IV in the State Key Laboratory of Scientific and Engineering Computing, Chinese Academy of Sciences. Each computing node has two 18-core Intel Xeon Gold 6140 processors at 2.3 GHz and 192 GB memory. In the following tests, we use the Intel MPI compiler wrapper (mpiicc), and both BLAS and LAPACK are provided by Intel MKL.

Descriptions of Codes

The PASE 1.0 repository contains two folders, OpenPFEM4PASE and PASE. Section 6 of the paper presents five test cases. The corresponding test programs are as follows:

- The first test case: ./pase/pase/test/test_gmg1.c.
- The second test case: ./pase/pase/test/test_gmg2.c.
- The third test case: ./pase/pase/test/test_gmg3.c and ./pase/pase/test/test_gmg4.c.
- The fourth test case: ./OpenPFEM4PASE/test/Eigen3D_adapted_pase.c.
- The fifth test case: ./OpenPFEM4PASE/test/Eigen3D_adapted_pase2.c.

Preparations For PASE

Install MPI, BLAS, LAPACK, PETSc, and SLEPc, and export the corresponding paths in the bashrc file.
Note that METIS and ParMETIS must be installed together with PETSc.

Installation for PASE

- The source code path for the PASE software package is pase1.0/PASE.
- After configuring and installing the external environment, modify ./CMakeLists.txt in project PASE:
  1. Modify ./config/LSSC4_oneapi2021 according to the MPI environment.
  2. Check the paths for linking PETSc, SLEPc, and OPENPFEM in ./CMakeLists.txt.
  3. Set OPENPFEM_DIR to the absolute path of pase1.0/OpenPFEM4PASE/shared in the bashrc file.
- Enter the source code path and execute the following commands to install PASE:
```
mkdir build
cd build
cmake ../
make
```
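
The export mentioned above might look like the following in the bashrc file. This is only a sketch: `OPENPFEM_DIR` is the variable named in this README, while `PASE_ROOT` and the location under `$HOME` are placeholders assumed for illustration and must be replaced by the actual clone location.

```shell
# Placeholder clone location (assumption); replace with your own absolute path.
PASE_ROOT="${HOME:-/tmp}/pase1.0"

# OPENPFEM_DIR is the variable this README asks for; it must be an absolute path.
export OPENPFEM_DIR="$PASE_ROOT/OpenPFEM4PASE/shared"

# Sanity check: warn if the path is not absolute.
case "$OPENPFEM_DIR" in
  /*) echo "OPENPFEM_DIR=$OPENPFEM_DIR" ;;
  *)  echo "warning: OPENPFEM_DIR is not an absolute path" >&2 ;;
esac
```

The paths for PETSc and SLEPc are exported in the same way, using whatever variable names your CMake configuration expects.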

Descriptions of Parameters

PASE

| Parameters | Description |
| --- | --- |
| nev | number of eigenvalues to compute |
| num_levels | number of multigrid levels |
| initial_level | the level at which the initial value is calculated |
| aux_rtol | convergence tolerance of the eigenproblem (3.2) in Algorithm 1 |
| pc_type | type of preconditioner |
| if_batches | whether to use the batching strategy |
| batch_size | $k$ in Algorithm 11 |
| more_aux_nev | $k_{\text{of}}$ in Algorithm 10 |
| more_batch_size | $k_{\text{ol}}$ in Algorithm 11 |
| max_refine | number of refinements |
| smoothing_type | type of smoothing method |

GCGE

| Parameters | Description |
| --- | --- |
| nevConv | number of eigenvalues to compute |
| gapMin | threshold for detecting repeated roots |
| nevGiven | number of known eigenpairs |
| block_size | size of $P$ and $W$ in the GCG algorithm |
| nevMax | the maximum number of eigenpairs to compute |
| nevInit | size of $X$ in the GCG algorithm |
| tol_gcg[0] | absolute tolerance |
| tol_gcg[1] | relative tolerance |
| compW_method | the method used by the linear solver |

SLEPc

| Parameters | Description |
| --- | --- |
| nev | number of eigenvalues to compute |
| ncv | the maximum dimension of the subspace to be used by the solver |
| mpd | the maximum dimension allowed for the projected problem |
| maxit | the maximum iteration count |
| blocksize | the block size in LOBPCG |
| restart | the percentage of the block of vectors to force a restart in LOBPCG |

The model eigenvalue problem

The corresponding test code is ./pase/test/test_gmg1.c in project PASE

PASE

Algorithm-related parameters

Set TESTPASE to $1$ to test PASE:
```
#define TESTPASE 1
```

Set nev to $200$ ($800$ in Table 2):
```
int nev = 200; // 800 for Table 2
```

Set num_levels to $3$:
```
int num_levels = 3;
```

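Set the numbers of uniform refinements in MatrixRead to $4$ and $1$:
```
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
```
In MatrixRead, a total of $5$ uniform refinements corresponds to $1,698,879$ DoFs, $6$ to $13,785,215$ DoFs, and $7$ to $111,063,295$ DoFs.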
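
As a quick plausibility check on the three DoF counts quoted above, each additional uniform refinement of a three-dimensional mesh should multiply the number of DoFs by roughly $8$. The sketch below only does integer arithmetic on the numbers from this README; the function name `growth_x100` is made up for illustration.

```shell
# Ratio of two DoF counts, scaled by 100 to stay in integer arithmetic.
# $1 = coarser DoF count, $2 = finer DoF count
growth_x100() {
  echo $(( $2 * 100 / $1 ))
}

dofs_a=1698879     # smallest DoF count quoted in this README
dofs_b=13785215    # middle DoF count
dofs_c=111063295   # largest DoF count

echo "first step:  $(growth_x100 "$dofs_a" "$dofs_b") / 100"
echo "second step: $(growth_x100 "$dofs_b" "$dofs_c") / 100"
```

Both ratios come out close to $8$, which is consistent with one extra uniform refinement between consecutive problem sizes.
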
Set the tolerance passed to PASE_PARAMETER_Create to $10^{-8}$ (Line 35 of PASE1.0/pase/test/test_gmg1.c):
```
PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);
```
Set initial_level to $1$:
```
param->initial_level = 1;
```
Set aux_rtol to $10^{-9}$:
```
param->aux_rtol = 1e-9;
```

Set pc_type to $\rm{PRECOND\_A}$:

```
param->pc_type = PRECOND_A; // alternatives: PRECOND_NONE, PRECOND_B, PRECOND_B_A
// PRECOND_NONE: do not impose any preconditioning on A_{Hh}
// PRECOND_B   : only impose preconditioning on B_{Hh}
// PRECOND_B_A : impose preconditioning on B_{Hh} first, and then on \tilde{A_{Hh}} while solving the linear system of equations
```

In Table 1 and Table 2, only the numbers of uniform refinements are changed. Specifically,

```
// Line 2 of Tables 1 and 2
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
// Line 3 of Tables 1 and 2
MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
// Line 4 of Tables 1 and 2
MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);
```

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be
```
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
```
to get better efficiency when using the direct solver in PETSc (KSP).
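
Put together, a launch line for this test might look like the sketch below. It only prints the command instead of running it. The launcher `mpirun -np` is an assumption (a job scheduler or a different launcher may be required on your cluster), the process count 72 is an example value, and `launch_cmd` is a helper name made up for this example; the PETSc options are the ones listed above.

```shell
# Options quoted in this README for the direct solver.
SUPERLU_OPTS="-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_replacetinypivot"

# Hypothetical helper: print (do not execute) a launch line.
# $1 = number of MPI processes, $2 = executable
launch_cmd() {
  echo "mpirun -np $1 $2 $SUPERLU_OPTS"
}

launch_cmd 72 ./build/test1
```
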
- The CPU time of PASE in Table 1
  Run the executable program ./build/test1 with nev set to $200$ and change the numbers of uniform refinements as stated above.
  The numbers of processes are $72$, $288$, and $576$, respectively.
- The CPU time of PASE in Table 2
  Run the executable program ./build/test1 with nev set to $800$ and change the numbers of uniform refinements as stated above.
  The numbers of processes are $72$, $288$, and $1440$, respectively.
- Figure 2
  Run the executable program ./build/test1 with nev set to $200$ and $6$ uniform refinements.
- Run the executable program ./build/test1, changing the parameters nev and pc_type, with $288$ processes.
```
param->pc_type = PRECOND_A;
//param->pc_type = PRECOND_NONE;
```
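
For planning runs, it can help to translate process counts into node counts. The sketch below assumes that the values $72$, $288$, $576$, and $1440$ in this README are numbers of MPI processes and that one process is placed on each core, so a node with two 18-core processors hosts $36$ processes. Both are assumptions, and `nodes_for` is a name invented for this example.

```shell
CORES_PER_NODE=36   # two 18-core processors per node (hardware description above)

# Hypothetical helper: nodes needed for $1 processes, rounded up.
nodes_for() {
  echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

for np in 72 288 576 1440; do
  echo "$np processes -> $(nodes_for "$np") nodes"
done
```

Under these assumptions the four runs occupy 2, 8, 16, and 40 nodes.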

GCGE

The corresponding test code is ./pase/test/test_gmg1.c in project PASE

Algorithm-related parameters

Set TESTPASE to $2$ to test GCGE:

```
#define TESTPASE 2
```

Set nev to $200$ ($800$ in Table 2):

```
int nev = 200; // 800 for Table 2
```

In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters are already set according to the value of nev, and the user only needs to provide the convergence tolerance. Specifically, when nev is $200$ and $800$, the corresponding parameters are set as follows:

```
nev = 200;
nevConv = nev;
atol = 1e-2;
rtol = 1e-8;
block_size = nevConv / 5;
nevMax = 2 * nevConv;
nevInit = nevConv;
```

```
nev = 800;
nevConv = nev;
atol = 1e-2;
rtol = 1e-8;
block_size = nevConv / 10;
nevInit = 3 * block_size;
nevMax = nevInit + nevConv;
```
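
To make the two settings above concrete, the following sketch evaluates the same formulas with shell integer arithmetic. `gcge_params` is a hypothetical helper written for this README, not part of PASE or GCGE; it simply mirrors the assignments listed above for nev $= 200$ and nev $= 800$.

```shell
# Hypothetical helper mirroring the assignments above.
# $1 = nev; prints: block_size nevInit nevMax
gcge_params() {
  nevConv=$1
  if [ "$nevConv" -eq 200 ]; then
    block_size=$(( nevConv / 5 ))
    nevMax=$(( 2 * nevConv ))
    nevInit=$nevConv
  elif [ "$nevConv" -eq 800 ]; then
    block_size=$(( nevConv / 10 ))
    nevInit=$(( 3 * block_size ))
    nevMax=$(( nevInit + nevConv ))
  else
    echo "no setting listed for nev=$nevConv" >&2
    return 1
  fi
  echo "$block_size $nevInit $nevMax"
}

gcge_params 200   # setting used with nev = 200
gcge_params 800   # setting used with nev = 800
```

For nev $= 200$ this gives block_size $= 40$, nevInit $= 200$, and nevMax $= 400$; for nev $= 800$ it gives $80$, $240$, and $1040$.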

Modify the parameters as described above and recompile.

Running process

- The CPU time of GCGE in Table 1 and 2
  Run the executable program ./build/test1 with nev set to $200$ and $800$. The numbers of processes are $72$, $288$, $576$ in Table 1 and $72$, $288$, $1440$ in Table 2, respectively.

SLEPc

The corresponding test code is ./pase/test/test_gmg1.c in project PASE.

We tested the Krylov-Schur and LOBPCG methods separately and tuned ncv and mpd for each of them. The remaining LOBPCG parameters were also set to achieve maximum efficiency.

Algorithm-related parameters

Set TESTPASE to $3$ to test SLEPc:

```
#define TESTPASE 3
```

Set nev to $200$ ($800$ in Table 2):

```
int nev = 200; // 800 for Table 2
```

Set the parameter that selects the EPS type on Line 66 to choose Krylov-Schur or LOBPCG as the solver:

```
flag = 2; // flag = 2: Krylov-Schur, flag = 6: LOBPCG
```

Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.

```
if (flag == 2) // ks
{
  ncv = 2 * nev;
  mpd = ncv;
}
else if (flag == 6) // lobpcg
{
  ncv = 2 * nev;
  blocksize = nev / 5;
  restart = 0.1;
  mpd = 3 * blocksize;
}
max_it = 2000;
```
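
For reference, the settings above can be spelled out for a given nev. The sketch below evaluates the formulas and prints them using standard SLEPc option names (`-eps_type`, `-eps_nev`, `-eps_ncv`, `-eps_mpd`, `-eps_max_it`, `-eps_lobpcg_blocksize`, `-eps_lobpcg_restart`). Whether the test programs read these options from the command line is an assumption; in the repository the values are set inside PASE_DIRECT_EPS. The helper name `eps_opts` is made up for this example.

```shell
# Hypothetical helper: $1 = flag (2: Krylov-Schur, 6: LOBPCG), $2 = nev.
eps_opts() {
  flag=$1
  nev=$2
  ncv=$(( 2 * nev ))
  if [ "$flag" -eq 2 ]; then
    echo "-eps_type krylovschur -eps_nev $nev -eps_ncv $ncv -eps_mpd $ncv -eps_max_it 2000"
  elif [ "$flag" -eq 6 ]; then
    blocksize=$(( nev / 5 ))
    mpd=$(( 3 * blocksize ))
    echo "-eps_type lobpcg -eps_nev $nev -eps_ncv $ncv -eps_mpd $mpd -eps_lobpcg_blocksize $blocksize -eps_lobpcg_restart 0.1 -eps_max_it 2000"
  else
    echo "unknown flag $flag" >&2
    return 1
  fi
}

eps_opts 2 200
eps_opts 6 200
```

For nev $= 200$, Krylov-Schur uses ncv $=$ mpd $= 400$, while LOBPCG uses ncv $= 400$, blocksize $= 40$, and mpd $= 120$.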

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be
```
-st_type sinvert -st_shift 0.0 -eps_target 0.0
```
to get better efficiency when using Krylov-Schur.

- The CPU time of Krylov-Schur and LOBPCG in Table 1 and 2
  Run the executable program ./build/test1 with nev set to $200$ and $800$. The numbers of processes are $72$, $288$, $576$ in Table 1 and $72$, $288$, $1440$ in Table 2, respectively.

A more general eigenvalue problem

PASE

The corresponding test code is ./pase/test/test_gmg2.c in project PASE.

Algorithm-related parameters

Set TESTPASE to $1$ to test PASE:

```
#define TESTPASE 1
```

Set nev to $200$ ($800$ in Table 4):

```
int nev = 200; // 800 for Table 4
```

Set num_levels to $3$:

```
int num_levels = 3;
```

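Set the numbers of uniform refinements in MatrixRead to $4$ and $1$:
```
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
```
In MatrixRead, a total of $5$ uniform refinements corresponds to $1,698,879$ DoFs, $6$ to $13,785,215$ DoFs, and $7$ to $111,063,295$ DoFs.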
Set the tolerance passed to PASE_PARAMETER_Create to $10^{-8}$ (Line 35 of PASE1.0/pase/test/test_gmg2.c):
```
PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);
```

Set initial_level to $1$:

```
param->initial_level = 1;
```

Set aux_rtol to $10^{-9}$:

```
param->aux_rtol = 1e-9;
```

Set pc_type to $\rm{PRECOND\_A}$:

```
param->pc_type = PRECOND_A;
```

In Table 3 and Table 4, only the numbers of uniform refinements are changed. Specifically,

```
// Line 2 of Tables 3 and 4
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
// Line 3 of Tables 3 and 4
MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
// Line 4 of Tables 3 and 4
MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);
```

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be
```
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
```
to get better efficiency when using the direct solver in PETSc (KSP).
- The CPU time of PASE in Table 3
  Run the executable program ./build/test2 with nev set to $200$ and change the numbers of uniform refinements as stated above.
  The numbers of processes are $72$, $288$, and $576$, respectively.
- The CPU time of PASE in Table 4
  Run the executable program ./build/test2 with nev set to $800$ and change the numbers of uniform refinements as stated above.
  The numbers of processes are $72$, $288$, and $1440$, respectively.
- Figure 3
  Run the executable program ./build/test2 with nev set to $200$ and $6$ uniform refinements.

GCGE

The corresponding test code is ./pase/test/test_gmg2.c in project PASE.

Algorithm-related parameters

Set TESTPASE to $2$ to test GCGE:

```
#define TESTPASE 2
```

Set nev to $200$ ($800$ in Table 4):

```
int nev = 200; // 800 for Table 4
```

In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters are already set according to the value of nev. When nev is $200$ and $800$, the corresponding parameters are set as follows:

```
nev = 200;
atol = 1e-2;
rtol = 1e-8;
nevConv = nev;
block_size = nevConv / 5;
nevMax = 2 * nevConv;
nevInit = nevConv;
```

```
nev = 800;
atol = 1e-2;
rtol = 1e-8;
nevConv = nev;
block_size = nevConv / 10;
nevInit = 3 * block_size;
nevMax = nevInit + nevConv;
```

Modify the parameters as described above and recompile.

Running process

- The CPU time of GCGE in Table 3 and 4
  Run the executable program ./build/test2 with nev set to $200$ and $800$. The numbers of processes are $72$, $288$, $576$ in Table 3 and $72$, $288$, $1440$ in Table 4, respectively.

SLEPc

The corresponding test code is ./pase/test/test_gmg2.c in project PASE.

Algorithm-related parameters

Set TESTPASE to $3$ to test SLEPc:

```
#define TESTPASE 3
```

Set nev to $200$ ($800$ in Table 4):

```
int nev = 200; // 800 for Table 4
```

Set the parameter that selects the EPS type on Line 66 to choose Krylov-Schur or LOBPCG as the solver:

```
flag = 2; // flag = 2: Krylov-Schur, flag = 6: LOBPCG
```

Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.

```
if (flag == 2) // ks
{
  ncv = 2 * nev;
  mpd = ncv;
}
else if (flag == 6) // lobpcg
{
  ncv = 2 * nev;
  blocksize = nev / 5;
  restart = 0.1;
  mpd = 3 * blocksize;
}
max_it = 2000;
```

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be
```
-st_type sinvert -st_shift 0.0 -eps_target 0.0
```
to get better efficiency when using Krylov-Schur.

- The CPU time of Krylov-Schur and LOBPCG in Table 3 and 4
  Run the executable program ./build/test2 with nev set to $200$ and $800$. The numbers of processes are $72$, $288$, $576$ in Table 3 and $72$, $288$, $1440$ in Table 4, respectively.

Numerical tests for batch scheme

PASE

The corresponding test codes are ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.

Algorithm-related parameters

Set TESTPASE to $1$ to test PASE:

```
#define TESTPASE 1
```

Set nev to $2000$:

```
int nev = 2000;
```

Set num_levels to $3$:

```
int num_levels = 3;
```

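Set the numbers of uniform refinements in MatrixRead to $4$ and $1$:
```
MatrixRead(&A_array, &B_array, &P_array, 4, 1, num_levels, MPI_COMM_WORLD);
```
In MatrixRead, a total of $6$ uniform refinements corresponds to $13,785,215$ DoFs and $7$ to $111,063,295$ DoFs.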
Set the tolerance passed to PASE_PARAMETER_Create to $10^{-8}$ (Line 35 of PASE1.0/pase/test/test_gmg3.c):
```
PASE_PARAMETER_Create(&param, num_levels, nev, 1e-8, PASE_GMG);
```

Set initial_level to $1$:

```
param->initial_level = 1;
```

Set aux_rtol to $10^{-9}$:

```
param->aux_rtol = 1e-9;
```

Set pc_type to $\rm{PRECOND\_A}$:

```
param->pc_type = PRECOND_A;
```

Set the parameter if_batches on Line 42, which indicates whether to use the batching strategy (Algorithm 11), to true:

```
param->if_batches = true;
```

Set the batch size $k$ (batch_size) to $500$:

```
param->batch_size = 500;
```

Set $k_{\text{of}}$ (more_aux_nev) to $60$:
```
param->more_aux_nev = 60;
```

Set $k_{\text{ol}}$ (more_batch_size) to $70$ in test_gmg3.c and to $80$ in test_gmg4.c:

```
param->more_batch_size = 70;
//param->more_batch_size = 80;
```
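
As a rough illustration of the batch size, the sketch below counts how many batches are needed if the nev requested eigenpairs are split into groups of at most batch_size. This partitioning is an assumption made for illustration only; the actual scheme, including the roles of $k_{\text{of}}$ and $k_{\text{ol}}$, is defined by Algorithms 10 and 11 of the paper. `num_batches` is a hypothetical helper.

```shell
# Hypothetical helper: ceil(nev / batch_size) using integer arithmetic.
# $1 = nev, $2 = batch_size
num_batches() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

nev=2000
batch_size=500
echo "nev=$nev batch_size=$batch_size -> $(num_batches "$nev" "$batch_size") batches"
```

With nev $= 2000$ and batch_size $= 500$, this simple count gives $4$ batches.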

In Table 5 and Table 6, only the numbers of uniform refinements are changed. Specifically,

```
// Line 2 of Tables 5 and 6
MatrixRead(&A_array, &B_array, &P_array, 5, 1, num_levels, MPI_COMM_WORLD);
// Line 3 of Tables 5 and 6
MatrixRead(&A_array, &B_array, &P_array, 5, 2, num_levels, MPI_COMM_WORLD);
```

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be
```
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
```
to get better efficiency when using the direct solver in PETSc (KSP).
- The CPU time of PASE in Table 5
  Run the executable program ./build/test3 and change the numbers of uniform refinements as stated above.
  The numbers of processes are $288$ and $1440$, respectively.
- The CPU time of PASE in Table 6
  Run the executable program ./build/test4 and change the numbers of uniform refinements as stated above.
  The numbers of processes are $288$ and $1440$, respectively.

GCGE

The corresponding test codes are ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.

Algorithm-related parameters

Set TESTPASE to $2$ to test GCGE:

```
#define TESTPASE 2
```

Set nev to $2000$:

```
int nev = 2000;
```

In the function PASE_DIRECT_GCGE, the relevant GCG algorithm parameters are already set according to the value of nev. When nev is $2000$, the corresponding parameters are set as follows:

```
nev = 2000;
nevConv = nev;
atol = 1e-2;
rtol = 1e-8;
block_size = 200;
nevInit = 3 * block_size;
nevMax = nevInit + nevConv;
```

Modify the parameters as described above and recompile.

Running process

- The CPU time of GCGE in Table 5 and 6
  Run the executable programs ./build/test3 and ./build/test4 with nev set to $2000$ for Table 5 and Table 6, respectively. The numbers of processes are $288$ and $1440$.

SLEPc

The corresponding test codes are ./pase/test/test_gmg3.c and ./pase/test/test_gmg4.c in project PASE.

Algorithm-related parameters

Set TESTPASE to $3$ to test SLEPc:

```
#define TESTPASE 3
```

Set nev to $2000$:

```
int nev = 2000;
```

Set the parameter that selects the EPS type on Line 70 to choose Krylov-Schur or LOBPCG as the solver:

```
flag = 2; // flag = 2: Krylov-Schur, flag = 6: LOBPCG
```

Set the parameters for Krylov-Schur and LOBPCG. These parameters have already been set in the function PASE_DIRECT_EPS.

```
if (flag == 2) // ks
{
  ncv = 2400;
  mpd = 800;
}
else if (flag == 6) // lobpcg
{
  ncv = 2 * nev;
  blocksize = nev / 5;
  restart = 0.1;
  mpd = 3 * blocksize;
}
max_it = 2000;
```

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be
```
-st_type sinvert -st_shift 0.0 -eps_target 0.0
```
to get better efficiency when using Krylov-Schur.

- The CPU time of Krylov-Schur and LOBPCG in Table 5 and 6
  Run the executable programs ./build/test3 and ./build/test4 with nev set to $2000$ for Table 5 and Table 6, respectively. The numbers of processes are $288$ and $1440$.

Adaptive finite element method

The corresponding test code is ./test/Eigen3D_adapted_pase.c in project OPENPFEM4PASE.

Compilation process
1. Install MPI and PASE, and export the corresponding paths in the bashrc file.
2. Installation for OpenPFEM4PASE
  - The source code path for the OpenPFEM software package is pase1.0/OpenPFEM4PASE.
  - After configuring and installing the external environment, modify ./CMakeLists.txt in project OpenPFEM4PASE:
    1. Modify ./config/LSSC4_oneapi2021 according to the MPI environment.
    2. Modify the paths for linking PETSc, SLEPc, and PASE in ./CMakeLists.txt.
    3. Set PASE_DIR to the absolute path of pase1.0/PASE/shared in the bashrc file.
3. Enter the source code path and execute the following commands to install OpenPFEM4PASE:
```
mkdir build
cd build
cmake ../
make
```

Algorithm-related parameters

Set max_refine to $10$:

```
INT max_refine = 10;
```

Set nev to $200$:

```
INT nev = 200;
```

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be

```
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
```

to get better efficiency when using the direct solver in PETSc (KSP).

Figure 4

Run the executable program ./build/test4. The data used in Figure 4 will be printed in the output file in the following form:

```
element N = [  196608 236828  319778  444937  639576  940419  1403539 2081119 2974727 4336282
 ];
PostErr = [  1.2211 0.8477  0.5667  0.3736  0.2381  0.1534  0.1023  0.069 0.0466  0.0308
 ];
```
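
The sample output above can be turned into an observed convergence rate by taking the slope between the first and last data points on a log-log scale. The sketch below uses `awk` for the logarithms and takes the two endpoint pairs from the sample output; the helper name `loglog_slope` is made up, and the two-point slope is only a crude estimate, not the procedure used in the paper.

```shell
# Hypothetical helper: slope of log(err) versus log(N) between two points.
# $1 = N_first, $2 = err_first, $3 = N_last, $4 = err_last
loglog_slope() {
  awk -v n0="$1" -v e0="$2" -v n1="$3" -v e1="$4" \
    'BEGIN { printf "%.2f\n", log(e1 / e0) / log(n1 / n0) }'
}

# Endpoints of the sample "element N" and "PostErr" arrays above.
loglog_slope 196608 1.2211 4336282 0.0308
```

For the sample data this prints -1.19, so PostErr decays roughly like $N^{-1.19}$ between the first and last refinement.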

Figure 5

Run the executable program ./build/test4 with different numbers of processes. The data used in Figure 5 will be printed in the output file in the following form:

```
The total time of PASE solving after 5 times refinements is 619.915023 sec.
```
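
When collecting the timings for several process counts, the time can be pulled out of that output line with standard text tools. The sketch below assumes the line has exactly the format shown above; here the sample line is fed in directly, and `extract_time` is a name chosen for this example.

```shell
# Hypothetical helper: read program output on stdin, print the time in seconds.
extract_time() {
  sed -n 's/^The total time of PASE solving after .* refinements is \([0-9.]*\) sec\.$/\1/p'
}

# Sample line from this README; in practice, pipe the saved output file instead.
echo "The total time of PASE solving after 5 times refinements is 619.915023 sec." | extract_time
```

Lines that do not match the pattern are ignored, so the whole output of a run can be piped through the helper.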

Adaptive finite element method for Hydrogen atom

The corresponding test code is ./test/Eigen3D_adapted_pase2.c in project OPENPFEM4PASE.

Compilation process
1. Install MPI and PASE, and export the corresponding paths in the bashrc file.
2. Installation for OpenPFEM4PASE
  - The source code path for the OpenPFEM software package is pase1.0/OpenPFEM4PASE.
  - After configuring and installing the external environment, modify ./CMakeLists.txt in project OpenPFEM4PASE:
    1. Modify ./config/LSSC4_oneapi2021 according to the MPI environment.
    2. Modify the paths for linking PETSc, SLEPc, and PASE in ./CMakeLists.txt.
    3. Set PASE_DIR to the absolute path of pase1.0/PASE/shared in the bashrc file.
3. Enter the source code path and execute the following commands to install OpenPFEM4PASE:
```
mkdir build
cd build
cmake ../
make
```
Make sure the executable source in ./CMakeLists.txt is set to ${CMAKE_CURRENT_SOURCE_DIR}/test/Eigen3D_adapted_pase2.c.

Algorithm-related parameters

Set max_refine to $10$:

```
INT max_refine = 10;
```

Set nev to $200$:

```
INT nev = 200;
```

Modify the parameters as described above and recompile.

Running process

The command-line parameters for the executable should be

```
-mat_superlu_dist_rowperm NOROWPERM -mat_superlu_dist_colperm PARMETIS \
-mat_superlu_dist_replacetinypivot
```

to get better efficiency when using the direct solver in PETSc (KSP).

Figure 6

Run the executable program ./build/test5. The data used in Figure 6 will be printed in the output file in the following form:

```
element N = [  196608 236828  319778  444937  639576  940419  1403539 2081119 2974727 4336282
 ];
PostErr = [  1.2211 0.8477  0.5667  0.3736  0.2381  0.1534  0.1023  0.069 0.0466  0.0308
 ];
```

Figure 7

Run the executable program ./build/test5 with different numbers of processes. The data used in Figure 7 will be printed in the output file in the following form:

```
The total time of PASE solving after 5 times refinements is 619.915023 sec.
```

By default, the iteration information of PASE is printed. If you do not need this output, set the value of PRINT_INFO to 0 in line 25 of ./pase/src/pase_sol.c.
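
If you prefer to flip the switch from the command line, a substitution along the following lines could be used. This is a sketch under the assumption that PRINT_INFO is defined with a plain `#define PRINT_INFO 1` line; check the actual line in ./pase/src/pase_sol.c before applying it. The example works on a temporary stand-in file, so it does not touch the repository.

```shell
# Stand-in for pase_sol.c (assumed form of the definition).
src=$(mktemp)
printf '#define PRINT_INFO 1\n' > "$src"

# Rewrite the definition to 0 and print the result to stdout.
patched=$(sed 's/^#define PRINT_INFO[[:space:]]*1$/#define PRINT_INFO 0/' "$src")
echo "$patched"

rm -f "$src"
```

After changing the real file, recompile as described in the installation steps.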