SSMFE_CORE - Sparse Symmetric Matrix-Free Eigensolver (Core Algorithm)

#include <spral_ssmfe.h> /* or <spral.h> for all packages */

Purpose

This package computes extreme (leftmost and/or rightmost) eigenpairs \(\{\lambda_i, x_i\}\) of the following eigenvalue problems:

  • the standard eigenvalue problem

    \[A x = \lambda x,\]
  • the generalized eigenvalue problem

    \[A x = \lambda B x,\]
  • the eigenvalue problem

    \[AB x = \lambda x\]

where \(A\) and \(B\) are real symmetric (or Hermitian) matrices and \(B\) is positive definite.

The packages SPRAL_SSMFE and SPRAL_SSMFE_EXPERT provide a simplified interface to this routine, and should be used if the user does not require access to low level features provided in this package.

Major version history

2014-11-20 Version 1.0.0

Initial release

Usage overview

This package implements a block iterative algorithm for simultaneous computation of several eigenpairs for the problems above. The block nature of this algorithm allows the user to benefit from highly optimized linear algebra subroutines and from the ubiquitous multicore architecture of modern computers. It also makes this algorithm more reliable than Krylov-based algorithms employed by e.g. ARPACK in the presence of clustered eigenvalues. However, convergence of the iterations may be slow if the density of the spectrum is high.

Thus, good performance (in terms of speed) is contingent on the following two factors:

  1. the number of desired eigenpairs must be substantial (e.g. not fewer than the number of CPU cores), and

  2. the employment of a convergence acceleration technique.

The acceleration techniques that can be used are shift-and-invert and preconditioning.

The former rewrites the eigenvalue problem for a matrix \(M\) as \(Ax = \lambda x\) with \(A = (M - \sigma I)^{-1}\), where \(I\) is the identity matrix and \(\sigma\) is a real value near eigenvalues of interest, and the generalized problem \(M x = \mu B x\) as the \(Ax = \lambda B x\) with \(A = (M - \sigma B)^{-1}\).

The latter applies to the case of positive definite \(A\) and requires a matrix or an operator \(T\), called a preconditioner, such that the vector \(v = T f\) is an approximation to the solution \(u\) of the system \(A u = f\) (see the simple example below). Note: This technique is only recommended for experienced users.

For futher detail on the algorithm, see the outline in the method section below, and associated references.

The package SSMFE provides a user-friendly interface to this algorithm, whilst SSMFE_EXPERT provides an interface that allows users to manage their own memory. If this routine is used instead of SSMFE_EXPERT, the user is additionally responsible for deciding when a sufficient number of eigenpairs have been computed to sufficient accuracy. The amount of computation performed by the solver subroutines in this package and the memory they use are negligible. These features facilitate the use of these subroutines for shared-memory, out-of-core and hybrid computation.

Subroutines

To use the solver procedures, the user must maintain a workspace of (kw+1) blocks each containing m vectors of size n. For notational convienience we refer to this workspace as a Fortran array W[kw+1][m][n], but the user is free to store it as they wish. Note the block dimension is indexed from zero, not from one. The following table provides minimum values of kw for each setup:

minAprod=T

minAprod=F

Problem

minBprod=T

minBprod=F

minBprod=T

minBprod=F

standard

7

5

3

3

standard_shift

7

5

N/A

N/A

generalized

7

5

5

3

generalized_shift

7

7

N/A

N/A

buckling

7

7

N/A

N/A

Further, the user must also store the converged eigenvectors \(X\), and (for generalised problems) their \(B\)-images \(BX\) using separate storage, e.g. X[mep][n], BX[mep][n]. In addition to being output, the routine may need to reorthagonalise against these from time to time.

The first (zero-indexed) block holds the eigenvector approximations: the user must fill this block with \(m\) linearly independent vectors before the first call to a solver procedure.

The number of desired eigenpairs may exceed \(m\): whenever converged eigenpairs have been detected, a solver procedure reports the indices of these eigenpairs and they must be moved by the user to separate storage (X(:)).

void spral_ssmfe_core_default_options(struct spral_ssmfe_core_options *options)

Intialises members of options structure to default values.

Parameters:
  • options – Structure to be initialised.

void spral_ssmfe_double(struct spral_ssmfe_rcid *rci, int problem, int left, int right, int m, double *lambda, double *rr, int *ind, void **keep, const struct spral_ssmfe_core_options *options, struct spral_ssmfe_inform *inform)

Computes specified number of leftmost and rightmost eigenvalues and corresponding eigenvectors.

Uses reverse-communication. Upon return the user must perform a task specified by the rci parameter and recall the routine. Possible values of rci and associated tasks are:

rci.job

Task to be performed

-3

None. Fatal error, see inform.flag.

-2

Failed to converge, see inform.flag.

-1

None. Computation complete.

1

Calculate \(\bar{V} = AU\).

2

Apply preconditioner \(\bar{V} = TU\). (Copy if T=I).

3

Compute \(\bar{V} = BU\)

4

Test convergence for each of \(m\) eigenvalues:

  • If eigenpair i has converged, set inform.converged[i] to a positive number.

  • Otherwise, leave at current value.

Tests may use the estimated eigenvalue errors inform.err_lambda[i] and eigenvector errors inform.err_x[i].

5

Copy converged eigenvectors \(X\) to user storage:

  • If rci.i>0: W[rci.kx][rci.jx:rci.jx+rci.nx-1][:].

  • Else: W[rci.kx][rci.jx-rci.nx+1:rci.jx][:].

Optionally save their \(B\)-images:

  • If rci.i>0: W[rci.ky][rci.jx:rci.jx+rci.nx-1][:].

  • Else: W[rci.ky][rci.jx-rci.nx+1:rci.jx][:].

11

If rci.i == 0, copy \(\bar{V} = U\).

Otherwise, reorder columns of block rci.kx such that column ind[j] becomes the new column j for j=0, …, rci.nx-1

Note: if rci.kx == rci.ky, only reorder once.

12

Compute the dot products

\[r_{ii} = U_i \cdot \bar{V}_i\]

13

Perform the scalings

\[U_i = U_i/\sqrt{(U_i\cdot \bar{V}_i)}\]

and

\[\bar{V}_i = \bar{V}_i/\sqrt{(U_i\cdot \bar{V}_i)}\]

for each column \(U_i\) and \(\bar{V}_i\) of \(U\) and \(\bar{V}\).

Note: if rci.kx == rci.ky, only scale once.

14

Perform the updates

\[\bar{V}_i = \bar{V}_i + r_{ii} U_i\]

for each column \(\bar{V}_i\) of \(\bar{V}\)

15

Perform the update

\[R = \alpha U^T V + \beta R\]

16

Perform the update

\[V = \alpha U R + \beta V\]

17

Perform the update

\[U = \alpha U R\]

Note: \(V\) may be used as a workspace

21

\(B\)-orthogonalize columns of \(V\) to all vectors \(X\) by solving

\[(X^TBX) Q = X^T \bar{V}\]

for \(Q\) and updating

\[\begin{split}U & = & U - XQ \\\end{split}\]

If \(B\ne I\), the :math:bar{V} must also be updated as

\[\bar{V} & = & \bar{V} - BXQ,\]

or

\[\bar{V} = BU\]

22

\(B\)-orthogonalize columns of \(U\) to all vectors \(X\) by solving

\[(X^TBX) Q = X^T U\]

for \(Q\) and updating

\[U = U - BXQ\]

999

Restart:

If rci.k>0: Restart suggested with block size m >= rci.nx + rci.i + rci.j, adjusting workspace size to match. Set rci.i=0 and rci.j=0 and recall the routine. If a restart is not desirable, routine may be recalled with no change to parameters.

If rci.k=0: Restart required with the same block size.

In both cases, the first block W[0][:][:] should retain vectors rci.jx:rci.jx+rci.nx-1, filling remaining vectors randomly such that the entire set of columns is linearly independent from each other and also from the converged eigenvectors.

The matrices are defined as follows:

  • \(U\) = W[rci.kx][rci.jx:rci.jx+rci.nx-1][:]

  • \(V\) = W[rci.ky][rci.jy:rci.jy+rci.ny-1][:]

  • \(\bar{V}\) = W[rci.ky][rci.jy:rci.jy+rci.nx-1][:]

  • \(R\) = rr[rci.k][rci.j:rci.j+rci.ny-1][rci.i:rci.i+rci.nx-1]

and \(\alpha\) and \(\beta\) are given by rci.alpha and rci.beta respectively. We use the notation \(r_{ii}\) to refer to the \(i\)-th diagonal element of \(R\), being rr[rci.k][rci.j+i-1][rci.i+i-1].

Parameters:
  • rci – Reverse communication type. rci.job must be set to 0 before the first call.

  • problem

    Problem to be solved, one of:

    0

    \(Ax=\lambda x\)

    >0

    \(Ax=\lambda Bx\)

    <0

    \(ABx=\lambda x\)

  • left – Number of left eigenpairs to find. On return with rci.job=5, can be set to zero if sufficient have been found.

  • right – Number of left eigenpairs to find. On return with rci.job=5, can be set to zero if sufficient have been found.

  • lambda[m] – Current eigenvalue estimates in ascending order.

  • m – Block size of workspace W. Must be at least 2.

  • rr[3][2*m][2*m] – reverse communication workspace.

  • ind[m] – reverse communication workspace.

  • keep – Internal workspace used by routine.

  • options – specifies algorithm options to be used.

  • inform – returns information about the exection of the routine.

void spral_ssmfe_double_complex(struct spral_ssmfe_rciz *rci, int problem, int left, int right, int m, double *lambda, double complex *rr, int *ind, void **keep, const struct spral_ssmfe_core_options *options, struct spral_ssmfe_inform *inform)

As spral_ssmfe_double(), but types of rci and rr changed to support type double complex.

void spral_ssmfe_largest_double(struct spral_ssmfe_rcid *rci, int problem, int nep, int m, double *lambda, double *rr, int *ind, void **keep, const struct spral_ssmfe_core_options *options, struct spral_ssmfe_inform *inform)

Computes specified number of eigenvalues of largest magnitude and corresponding eigenvectors.

Uses reverse-communication. Upon return the user must perform a task specified by the rci parameter and recall the routine. Possible values of rci and associated tasks are as for spral_ssmfe_double() above.

Parameters:
  • rci – Reverse communication type. rci.job must be set to 0 before the first call.

  • problem

    Problem to be solved, one of:

    0

    \(Ax=\lambda x\)

    >0

    \(Ax=\lambda Bx\)

    <0

    \(ABx=\lambda x\)

  • nep – Number of eigenpairs to find.

  • lambda[m] – Current eigenvalue estimates in ascending order.

  • m – Block size of workspace W. Must be at least 2.

  • rr[3][2*m][2*m] – reverse communication workspace.

  • ind[m] – reverse communication workspace.

  • keep – Internal workspace used by routine.

  • options – specifies algorithm options to be used.

  • inform – returns information about the exection of the routine.

void spral_ssmfe_largest_double_complex(struct spral_ssmfe_rciz *rci, int problem, int nep, int m, double *lambda, spral_double_complex *rr, int *ind, void **keep, const struct spral_ssmfe_core_options *options, struct spral_ssmfe_inform *inform)

As spral_ssmfe_largest_double(), but types of rci and rr changed to support type double complex.

void spral_ssmfe_core_free(void **keep, struct spral_ssmfe_inform *inform)

Free memory allocated in keep and inform.

Parameters:
  • keep – Workspace to be freed.

  • inform – Information type to be freed.

Warning

As memory in keep and inform has been allocated using Fortran functions, this routine must be called to avoid a memory leak.

Derived types

struct spral_ssmfe_core_options

Options that control the algorithm.

int err_est

Error estimation scheme, one of:

1

Residual error bounds: modified Davis-Kahan estimate for eigenvector error and Lehmann bounds for eigenvale error (see method section).

2 (default)

Convergence curve-based estimate.

Default is 2.

int extra_left

Number of extra approximate eigenvectors corresponding to leftmost eigenvalues used to enhance convergence. Default is 0.

int extra_right

Number of extra approximate eigenvectors corresponding to rightmost eigenvalues used to enhance convergence. Default is 0.

bool minAprod

If true, minimize number of multiplications with \(A\) by requiring 2 additional blocks of memory for the workspace W[:][:][:]. If false, three returns with rci.job=1 occur per iteration instead of one. Must be true if problem < 0. Default is true.

bool minBprod

If true, minimize number of multiplications with \(B\) by requiring 2 additional blocks of memory for the workspace W[:][:][:]. If false, at least three returns with rci.job=3 occur per iteration instead of one. Default is true.

double min_gap

Restart sensitivity: if the relative distance between the last eigenvalue of interest on either margin of the spectrum and the rest of the spectrum is smaller than min_gap, the solver procedure suggests restart (rci.job=999). The values rci.i and rci.j are set to the numbers of eigenvalues on the left and right margin of the spectrum that are too close to the eigenvalues of interest, causing slow convergence. The default value of 0.0 means no restart is ever suggested. Must be in the range \([0.0,1.0]\).

double cf_max

Stagnation sensitivity: if the value \(q_{ij}\) (see method section) is greater than cf_max for \(i > 5\), the eigenpair is marked as stagnated by setting inform.converged(j) to a negative value. The default value of 1.0 indicates that the estimated asymptotic convergence factor is not used for stagnation detection. Must be in the range \([0.5, 1.0]\).

struct spral_ssmfe_infrom

Information on progress of the algorithm.

int converged[mep]

Convergence status.

  • If converged[j]>0, the eigenpair (lambda[j], X[j]) converged on iteration converged[j].

  • If converged[j]=0, the eigenpair (lambda[j], X[j]) is still converging.

  • If converged[j]<0, the eigenpair (lambda[j], X[j]) stagnated at iteration converged[j].

This component is allocated by the routine.

double err_lambda[mep]

Estimated eigenvalue errors for converged and stagnated eigenvalues. This component is allocated by the routine.

double err_x[mep]

Estimated eigenvector errors for converged and stagnated eigenvectors. This component is allocated by the routine.

int flag

Return status of algorithm. See table below.

int iteration

Number of iterations.

double residual_norms[mep]

Euclidean norms of residuals for (lambda[:], X[:]) on return with rci.job=4, 5. This component is allocated by the routine.

int stat

Fortran allocation status in event of failure

inform.flag

-1

m is out-of-range.

-2

rci.job is out-of-range.

-3

options.err_est is out-of-range.

-4

options.minAprod is incompatible with selected routine.

-5

options.extra_left or options.extra_right is out-of-range.

-6

options.min_gap is out-of-range.

-7

options.cf_max is out-of-range.

-11

left is out-of-range (spral_ssmfe_double()) or nep is out-of-range (spral_ssmfe_largest_double()).

-12

right is out-of-range.

-100

Not enough memory; inform.stat contains the value of the Fortran stat parameter.

-200

\(B\) is not positive definite or user_x>0 and linearly dependent initial guesses were supplied.

+1

The iterations have been terminated because no further improvement in accuracy is possible (this may happen if \(B\) or the preconditioner is not positive definite, or if the components of the residual vectors are so small that the round-off errors make them essentially random). The value of inform.non_converged is set to the number of non-converged eigenpairs.

Examples

Preconditioning example

The following code computes the 5 leftmost eigenpairs of the matrix \(A\) of order 100 that approximates the two-dimensional Laplacian operator on a 20-by-20 grid. One forward and one backward Gauss-Seidel update are used for preconditioning, which halves the number of iterations compared with solving the same problem without preconditioning. The header laplace2d/h (examples/C/ssmfe/laplace2d.h) supplies the subroutine apply_laplacian() that multiplies a block of vectors by \(A\), and the subroutine apply_gauss_seidel_step() that computes \(y = T x\) for a block of vectors \(x\) by applying one forward and one backward update of the Gauss-Seidel method to the system \(A y = x\).

/* examples/C/ssmfe/precond_core.f90 */
/* Laplacian on a square grid (using SPRAL_SSMFE_CORE routines) */
#include "spral.h"
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <cblas.h>

/* Header that implements Laplacian and preconditioners */
#include "laplace2d.h"

int main(void) {
   const int ngrid = 20;         /* grid points along each side */
   const int n = ngrid*ngrid;    /* problem size */
   const int nep = 5;            /* eigenpairs wanted */
   const int m = 3;              /* dimension of the iterated subspace */
   const double tol = 1.e-6;     /* eigenvector tolerance */

   int state = SPRAL_RANDOM_INITIAL_SEED; /* PRNG state */

   int *ind = malloc(m * sizeof(*ind));          /* permutation index */
   double *lambda = malloc(n * sizeof(*lambda)); /* eigenvalues */
   double (*X)[n][n] = malloc(sizeof(*X));       /* eigenvectors */
   /* Work arrays */
   double *lmd = malloc(m * sizeof(*lmd));
   double (*rr)[3][2*m][2*m] = malloc(sizeof(*rr));
   double (*W)[7][m][n] = malloc(sizeof(*W));
   double (*U)[m][n] = malloc(sizeof(*U));

   /* Derived types */
   struct spral_ssmfe_rcid rci;              /* reverse communication data */
   struct spral_ssmfe_core_options options;  /* options */
   void *keep;                               /* private data */
   struct spral_ssmfe_inform inform;         /* information */

   /* Initialize options to default values */
   spral_ssmfe_core_default_options(&options);

   /* Initialize W to lin indep vectors by randomizing */
   for(int i=0; i<n; i++)
   for(int j=0; j<m; j++)
      (*W)[0][j][i] = spral_random_real(&state, true);

   int ncon = 0;        /* number of converged eigenpairs */
   rci.job = 0; keep = NULL;
   while(true) { /* reverse communication loop */
      spral_ssmfe_double(&rci, 0, nep, 0, m, lmd, &(*rr)[0][0][0], ind,
         &keep, &options, &inform);
      switch ( rci.job ) {
      case 1:
         apply_laplacian(
            ngrid, ngrid, rci.nx, &(*W)[rci.kx][rci.jx][0], &(*W)[rci.ky][rci.jy][0]
            );
         break;
      case 2:
         apply_gauss_seidel_step (
            ngrid, ngrid, rci.nx, &(*W)[rci.kx][rci.jx][0], &(*W)[rci.ky][rci.jy][0]
            );
         break;
      case 4:
         for(int j=0; j<m; j++) {
            if ( inform.converged[j] != 0 ) continue;
            if ( inform.err_X[j] > 0 && inform.err_X[j] < tol )
               inform.converged[j] = 1;
         }
         break;
      case 5:
         if ( rci.i < 0 ) continue;
         for(int k=0; k<rci.nx; k++) {
           int j = ncon + k;
           lambda[j] = lmd[rci.jx + k];
           cblas_dcopy( n, &(*W)[0][rci.jx+k][0], 1, &(*X)[j][0], 1 );
         }
         ncon += rci.nx;
         if ( ncon >= nep || inform.iteration > 300 ) goto finished;
         break;
      case 11:
         if ( rci.i == 0 ) {
            if ( rci.kx != rci.ky || rci.jx > rci.jy ) {
               cblas_dcopy(n*rci.nx, &(*W)[rci.kx][rci.jx][0], 1, &(*W)[rci.ky][rci.jy][0], 1);
            } else if ( rci.jx < rci.jy ) {
               for(int j=rci.nx-1; j>=0; j--)
                  cblas_dcopy(n, &(*W)[rci.kx][rci.jx+j][0], 1, &(*W)[rci.ky][rci.jy+j][0], 1);
            }
         } else {
            for(int i=0; i<n; i++) {
               for(int j=0; j<rci.nx; j++)
                  (*U)[j][i] = (*W)[rci.kx][ind[j]][i];
               for(int j=0; j<rci.nx; j++)
                  (*W)[rci.kx][j][i] = (*U)[j][i];
               if(rci.ky != rci.kx) {
                  for(int j=0; j<rci.nx; j++)
                    (*U)[j][i] = (*W)[rci.ky][ind[j]][i];
                  for(int j=0; j<rci.nx; j++)
                    (*W)[rci.ky][j][i] = (*U)[j][i];
               }
            }
         }
         break;
      case 12:
         for(int i=0; i<rci.nx; i++)
           (*rr)[rci.k][rci.j+i][rci.i+i] =
             cblas_ddot(n, &(*W)[rci.kx][rci.jx+i][0], 1, &(*W)[rci.ky][rci.jy+i][0], 1);
         break;
      case 13:
         for(int i=0; i<rci.nx; i++) {
            if( rci.kx == rci.ky ) {
               double s = cblas_dnrm2(n, &(*W)[rci.kx][rci.jx+i][0], 1);
               if( s > 0 )
                  cblas_dscal(n, 1/s, &(*W)[rci.kx][rci.jx+i][0], 1);
            } else {
               double s = sqrt(fabs(cblas_ddot(
                  n, &(*W)[rci.kx][rci.jx+i][0], 1, &(*W)[rci.ky][rci.jy+i][0], 1)
                  ));
               if ( s > 0 ) {
                  cblas_dscal(n, 1/s, &(*W)[rci.kx][rci.jx+i][0], 1);
                  cblas_dscal(n, 1/s, &(*W)[rci.ky][rci.jy+i][0], 1);
               } else {
                  for(int j=0; j<n; j++)
                     (*W)[rci.ky][rci.jy+i][j] = 0.0;
               }
            }
         }
         break;
      case 14:
         for(int i=0; i<rci.nx; i++) {
           double s = -(*rr)[rci.k][rci.j+i][rci.i+i];
           cblas_daxpy(n, s, &(*W)[rci.kx][rci.jx+i][0], 1, &(*W)[rci.ky][rci.jy+i][0], 1);
         }
         break;
      case 15:
         if ( rci.nx > 0 && rci.ny > 0 )
            cblas_dgemm(
               CblasColMajor, CblasTrans, CblasNoTrans, rci.nx, rci.ny, n,
               rci.alpha, &(*W)[rci.kx][rci.jx][0], n, &(*W)[rci.ky][rci.jy][0], n,
               rci.beta, &(*rr)[rci.k][rci.j][rci.i], 2*m
               );
         break;
      case 16: // Fall through to 17
      case 17:
         if( rci.ny < 1 ) continue;
         if( rci.nx < 1 ) {
            if( rci.job == 17 ) continue;
            if( rci.beta == 1.0 ) continue;
            for(int j=rci.jy; j<rci.jy+rci.ny; j++)
               cblas_dscal(n, rci.beta, &(*W)[rci.ky][j][0], 1);
            continue;
         }
         if( rci.job == 17 ) {
            cblas_dgemm(
               CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.ny, rci.nx,
               1.0, &(*W)[rci.kx][rci.jx][0], n, &(*rr)[rci.k][rci.j][rci.i], 2*m,
               0.0, &(*W)[rci.ky][rci.jy][0], n
               );
            cblas_dcopy(n*rci.ny, &(*W)[rci.ky][rci.jy][0], 1, &(*W)[rci.kx][rci.jx][0], 1);
         } else {
            cblas_dgemm(
               CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.ny, rci.nx,
               rci.alpha, &(*W)[rci.kx][rci.jx][0], n, &(*rr)[rci.k][rci.j][rci.i],
               2*m, rci.beta, &(*W)[rci.ky][rci.jy][0], n
               );
         }
         break;
      case 21: // Fall through to 22
      case 22:
         if( ncon > 0 ) {
            cblas_dgemm(
               CblasColMajor, CblasTrans, CblasNoTrans, ncon, rci.nx, n,
               1.0, &(*X)[0][0], n, &(*W)[rci.ky][rci.jy][0], n, 0.0, &(*U)[0][0], n
               );
            cblas_dgemm(
               CblasColMajor, CblasNoTrans, CblasNoTrans, n, rci.nx, ncon,
               -1.0, &(*X)[0][0], n, &(*U)[0][0], n, 1.0, &(*W)[rci.kx][rci.jx][0], n
               );
         }
         break;
      default:
         goto finished;
      }
   }
finished:
   if(inform.flag != 0) printf("inform.flag = %d\n", inform.flag);
   printf("%3d eigenpairs converged in %d iterations\n", ncon, inform.iteration);
   for(int i=0; i<ncon; i++)
      printf(" lambda[%1d] = %13.7e\n", i, lambda[i]);
   spral_ssmfe_core_free(&keep, &inform);
   free(ind);
   free(lambda);
   free(X);
   free(lmd);
   free(rr);
   free(W);
   free(U);

   /* Success */
   return 0;
}

This code produces the following output:

 5 eigenpairs converged in 72 iterations
lambda[0] = 4.4676695e-02
lambda[1] = 1.1119274e-01
lambda[2] = 1.1119274e-01
lambda[3] = 1.7770878e-01
lambda[4] = 2.2040061e-01

Method

The algorithm

The algorithm implemented by spral_ssmfe_double() and spral_ssmfe_largest_double() is based on the Jacobi-conjugate preconditioned gradients (JCPG) method. The outline of the algorithm, assuming for simplicity that \(0 < left \le m\) and \(right = 0\) and using Matlab notation, is as follows.

  • Initialization. Perform the Rayleigh-Ritz procedure in the trial subspace spanned by the columns of an \({\tt n} \times {\tt m}\) matrix \(X\) i.e. compute

    • \(L = X’*A*X\), if problem >= 0, and \(L = X’*B*A*B*X\) otherwise,

    • \(M = X’*B*X\) (\(B=I\) if problem = 0),

    and solve the generalized eigenvalue problem for the matrix pencil \(L - t*M\), i.e. compute an \(m\times m\) matrix \(Q\) such that \(Q’*M*Q\) is the identity matrix and \(D = Q’*L*Q\) is a diagonal matrix. Compute \(X = X*Q\) and set \(Z = F = []\).

  • Main loop.

    DO

    1. If problem>=0, compute the residual matrix \(R = A*X - B*X*D\), where \(D\) is a diagonal matrix with entries

      \[D(j,j) = \frac{X(:,j)'*A*X(:,j)}{X(:,j)'*B*X(:,j)}\]

      else compute \(R = A*B*X - X*D\), where \(D\) is a diagonal matrix with entries

      \[D(j,j) = \frac{X(:,j)'*B*A*B*X(:,j)}{X(:,j)'*B*X(:,j)}.\]
    2. Perform the orthogonalization of \(R\) to constraints \(C\) by updating \(R = R - B*C*(C’*R)\).

    3. If options.err_est= 1, compute \(R’*B*R\) and use it to compute error bounds; otherwise only compute the diagonal of this matrix and use to compute error bounds. Test for converged eigenpairs and move converged eigenvectors from \(X\) to \(C\) and reduce \(m\) accordingly. Exit the main loop if \(X = []\).

    4. If problem is non-negative, compute the preconditioned gradient matrix \(Y = T*R\).

    5. If \(Z\) is not empty, conjugate \(Y\) to \(Z\), i.e.

      1. if problem is non-negative, then compute \(P = Z’*A*Y\), otherwise compute \(P = Z’*B*A*B*Y\);

      2. compute \(S = Z’*B*Y\);

      3. update \(Y = Y + Z*H\), where

        \[H(i,j) = \frac{P(i,j) - S(i,j)*D(j,j)}{F(i,i) - D(j,j)}\]

        where \(F\) is described in the final step below..

    6. Perform orthogonalization of the search direction matrix \(Y\) to constraints \(C\) by updating \(Y = Y - C*(C’*B*Y)\).

    7. \(B\)-normalize the columns of \(Y\) and reorder them so that the \(B\)-norm of the projection of \(Y(:,j)\) onto the linear span of \(X\) and \(Y(:,1:j-1)\) is an increasing function of \(j\). Compute \(M = [X Y]’*B*[X Y]\). If the condition number of \(M\) is greater than the allowed limit of \(10^4\) then start removing the last columns of \(Y\) and \(M\) and respective rows of \(M\) until the condition number falls below the limit.

    8. Perform the Rayleigh-Ritz procedure in the linear span of columns of \([X Y]\). Update \(X\) by selecting Ritz vectors corresponding to the leftmost \(m\) Ritz values, and place the remaining Ritz vectors into \(Z\) and corresponding Ritz values onto the diagonal of \(F\).

    END DO

The orthogonalization to constraints on step 2 ensures that the algorithm deals with the residuals for the constrained problem rather than with those for the unconstrained one, which may not go to zeros if the constraints are not close enough to exact eigenvectors.

The definition of \(H(i,j)\) in step 5c ensures optimality of the new search direction vector \(Y(:,j)\), notably, the search for the minimum of the Rayleigh quotient \(D(j,j)\) in the direction of this vector produces the asymptotically smallest possible value.

The search directions cleanup procedure employed on step 7 ensures that the convergence of iterations is not damaged by the computational errors of the LAPACK eigensolver _SYGV, in the Rayleigh-Ritz procedure of step 8. The accuracy of _SYGV is affected by the condition number of \(M\). If the latter is very large, poor accuracy in the computed Ritz vectors may lead to convergence failure. The ordering of \(Y(:,j)\) ensures that the ‘least important’ search direction is dropped if the condition number of \(M\) is unacceptably large. In practice, the loss of search direction is rare and does not lead to convergence problems.

If the number of sought eigenpairs exceeds \(m\), then \(m\) is not reduced on step 3. Instead, the approximate eigenvectors moved to \(C\) are replaced with vectors from \(Z\).

Error estimation

Standard problem

If options.err_est = 1, the error estimates for the eigenvalues are based on the eigenvalues of a matrix of the form

\[\hat A = \tilde\Lambda_k - S_k^T S_k,\]

where \(\tilde\Lambda_k\) is a diagonal matrix with the \(k-1\) leftmost Ritz values \(\tilde\lambda_j\) on the diagonal, and the columns of \(S_k\) are the respective residual vectors \(r_j = A \tilde x_j - \tilde\lambda_j \tilde x_j\) divided by \(\sqrt{\lambda_k - \tilde\lambda_j}\). If \(k\) is such that \(\tilde\lambda_{k-1} < \lambda_k\), then the eigenvalues of \(\hat A\) are the left-hand side bounds for eigenvalues \(\lambda_i\), and thus the difference \(\tilde\lambda_j - \hat\lambda_j\) estimates the eigenvalue error \(\tilde\lambda_j - \lambda_j\). The unknown \(\lambda_k\) is replaced by \(\tilde\lambda_k\), and select the maximal \(k \le m\) for which the distance between \(\tilde\lambda_{k-1}\) and \(\tilde\lambda_k\) exceeds the sum of the absolute error tolerance for eigenvalues and the Frobenius norm of the matrix formed by the residuals \(r_j, j = 1,\ldots,k-1\). If \(\tilde\lambda_j - \hat\lambda_j\) is close to the machine accuracy, it may be too polluted by round-off errors to rely upon. In such case, we use instead

\[\tilde\lambda_j - \lambda_j \le \delta_j \approx \frac{\|r_j\|^2}{\tilde\lambda_k - \lambda_j}.\]

The eigenvector errors are estimated based on the Davis-Kahan inequality:

\[\min_{x \in \mathcal{X}_{k-1}} \sin\{\tilde x_j; x\} \le \frac{\|r_j\|}{\lambda_k - \tilde\lambda_j} \approx \frac{\|r_j\|}{\tilde\lambda_k - \tilde\lambda_j},\]

where \(\mathcal{X}_{k-1}\) is the invariant subspace corresponding to \(k-1\) leftmost eigenvalues.

If options.err_est = 2 the errors are estimated based on the eigenvalue decrements history, which produces an estimate for the asymptotic convergence facotr, the geometrical average of the eigenvalue error reduction per iteration:

\[q_{ij} = \left| \frac{\lambda_j^i - \lambda_j^{i-1}} {\lambda_j^i - \lambda_j^0} \right|^{\frac{1}{i}} \approx\left| \frac{\lambda_j - \lambda_j^{i-1}} {\lambda_j - \lambda_j^0} \right|^{\frac{1}{i}} = \left| \frac{\lambda_j - \lambda_j^{i-1}} {\lambda_j - \lambda_j^{i-2}} \cdots \frac{\lambda_j - \lambda_j^1} {\lambda_j - \lambda_j^0} \right|^{\frac{1}{i}}\]

where \(\lambda_j^i\) is the approximation to \(\lambda_j\) on \(i\)-th iteration (see Technical Report [1] for further details). Unlike the residual estimates mentioned in this section, such ‘kinematic’ error estimates are not guaranteed to be upper bounds for the actual errors. However, the numerical tests have demonstrated that kinematic error estimates are significantly more accurate, i.e. closer to the actual error, than the residual-based estimates. Furthermore, they straightforwardly apply to the generalized case as well.

Generalized problems

In the case of the generalized eigenvalue problem , all of the residual norms in the previous section must be replaced with \(\|\cdot\|_{B^{-1}}\)-norm of the residual \(r_j = A \tilde x_j - \tilde\lambda_j B \tilde x_j\) (\(\|r_j\|_{B^{-1}}^2 = r_j^* B^{-1} r_j\)) or its upper estimate, e.g. \(\beta_1^{-1/2}\|\cdot\|\), where \(\beta_1\) is the smallest eigenvalue of \(B\). Hence, if \(\beta_1\) is known, then the error tolerances for eigenvalues and eigenvectors must be multiplied by \(\beta_1\) and \(\sqrt{\beta_1}\) respectively. If no estimate for \(\|\cdot\|_{B^{-1}}\)-norm is available, then the use of non-zero residual tolerances and options.err_est = 1 is not recommended. In the case of problem , the residuals are computed as \(r_j = A B \tilde x_j - \tilde \lambda_j \tilde x_j\), \(B\)-norms of \(r_j\) are used in and , and Lehmann matrix becomes \(\hat A = \tilde\Lambda_k - S_k^T B\ S_k\).

References