SHA#
purpose#
The sha
package
finds a component-wise secant approximation to the Hessian matrix \(H(x)\),
for which \((H(x))_{i,j} = \partial^2 f(x) / \partial x_i \partial x_j\),
\(1 \leq i, j \leq n\),
using values of the gradient \(g(x) = \nabla_x f(x)\)
of the function \(f(x)\) of \(n\) unknowns \(x = (x_1, \ldots, x_n)^T\)
at a sequence of given distinct iterates \(\{x^{(k)}\}\), \(k \geq 0\).
More specifically, given the differences \(s^{(k)} = x^{(k+1)} - x^{(k)}\) and \(y^{(k)} = g(x^{(k+1)}) - g(x^{(k)})\),
the package aims to find an approximation \(B\) to \(H(x)\) for which the secant conditions \(B s^{(k)} \approx y^{(k)}\) hold for a chosen set of values \(k\). The methods provided take advantage of the entries in the Hessian that are known to be zero.
The package is particularly intended to allow gradient-based optimization methods, that generate iterates \(x^{(k+1)} = x^{(k)} + s^{(k)}\) based upon the values \(g( x^{(k)})\) for \(k \geq 0\), to build a suitable approximation to the Hessian \(H(x^{(k+1)})\). This then gives the method an opportunity to accelerate the iteration using the Hessian approximation.
See Section 4 of $GALAHAD/doc/sha.pdf for additional details.
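To make the data concrete, the sketch below forms such difference pairs from a short sequence of iterates; the quadratic gradient function and the iterates are illustrative placeholders, not part of the package.

```python
import numpy as np

# Illustrative gradient of a quadratic f of n = 3 unknowns, whose (sparse)
# Hessian is [[2, 1, 0], [1, 4, 0], [0, 0, 6]].  Any gradient routine would do.
def grad(x):
    return np.array([2.0 * x[0] + x[1], x[0] + 4.0 * x[1], 6.0 * x[2]])

# A short sequence of distinct iterates x^(k), k = 0, ..., 3.
xs = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
      np.array([1.0, 1.0, 0.0]), np.array([1.0, 1.0, 1.0])]

# Difference pairs s^(k) = x^(k+1) - x^(k) and y^(k) = g(x^(k+1)) - g(x^(k)),
# which enter the secant conditions B s^(k) ≈ y^(k).
s_list = [xs[k + 1] - xs[k] for k in range(len(xs) - 1)]
y_list = [grad(xs[k + 1]) - grad(xs[k]) for k in range(len(xs) - 1)]
```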
method#
The package computes the entries in each row of \(B\) one at a time. The entries \(b_{i,j}\) in row \(i\) may be chosen to minimize
\[\sum_{k \in {\cal I}_i} \Big[ \sum_{\mathrm{nonzeros}\; j} b_{i,j} s_j^{(k)} - y_i^{(k)} \Big]^2, \qquad (1)\]
where \({\cal I}_i\) is ideally chosen to be sufficiently large so that (1) has a unique minimizer. Since this requires that there are at least as many \((s^{(k)}, y^{(k)})\) pairs as the maximum number of nonzeros in any row, this may be prohibitive in some cases. We might then be content with a minimum-norm (under-determined) least-squares solution; each row may then be processed in parallel. Or, we may take advantage of the symmetry of the Hessian, and note that if we have already found the values in row \(j\), then the value \(b_{i,j} = b_{j,i}\) in (1) is known before we process row \(i\). Thus, by ordering the rows and exploiting symmetry, we may reduce the number of unknowns in future unprocessed rows.
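As an illustration of the row-by-row subproblem (1), the following numpy sketch solves it for a single row with a dense least-squares routine; the helper name estimate_row and the data layout are illustrative only, and the package itself relies on the LAPACK solvers selected by the dense_linear_solver option.

```python
import numpy as np

def estimate_row(i, nonzero_cols, s_list, y_list):
    """Estimate the entries b_{i,j}, j in nonzero_cols, of row i of B by
    solving the least-squares problem (1) over all available pairs."""
    # One equation per difference pair, one unknown per structural nonzero.
    A = np.array([[s[j] for j in nonzero_cols] for s in s_list])
    rhs = np.array([y[i] for y in y_list])
    # lstsq returns the minimum-norm solution when (1) is under-determined,
    # matching the fallback described in the text.
    b_row, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return b_row
```

With the pairs from the earlier sketch, estimate_row(0, [0, 1], s_list, y_list) returns approximately (2, 1), the nonzero entries in the first row of the underlying Hessian.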
In the analysis phase, we order the rows by constructing the connectivity graph of \(H(x)\): a graph comprising nodes \(1\) to \(n\) and edges connecting nodes \(i\) and \(j\) if \(h_{i,j}\) is everywhere nonzero. The nodes are ordered by increasing degree (that is, the number of edges emanating from the node) using a bucket sort. The row chosen to be ordered next corresponds to a node of minimum degree; the node is removed from the graph, the degrees are updated efficiently, and the process is repeated until all rows have been ordered. This often leads to a significant reduction in the number of unknown values in each row as it is processed in turn, but numerical rounding can lead to inaccurate values in some cases. A useful remedy is to process all rows for which there are sufficient \((s^{(k)}, y^{(k)})\) pairs as before, and then process the remaining rows taking into account the symmetry. That is, the rows and columns are rearranged so that the matrix is in block form
\[B = \begin{pmatrix} B_{11} & B_{12} \\ B_{12}^T & B_{22} \end{pmatrix},\]
the \(( B_{11} \;\; B_{12})\) rows are processed without regard for symmetry but give the \(2,1\) block \(B^T_{12}\), and finally the \(2,2\) block \(B_{22}\) is processed knowing \(B^T_{12}\), again without respecting symmetry. The rows in blocks \(( B_{11} \;\; B_{12})\) and \(B_{22}\) may be computed in parallel. It is also possible to generalise this so that \(B\) is decomposed into \(r\) blocks, and the blocks processed one at a time recursively, using the symmetry from previous rows. More details of the precise algorithms (Algorithms 2.1–2.5) are given in the reference below. The linear least-squares problems (1) themselves are solved by a choice of LAPACK routines.
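The ordering idea can be mimicked with a simple, deliberately naive minimum-degree sweep over the connectivity graph; this sketch only illustrates the principle, since the package's analysis phase uses a bucket sort with efficient degree updates.

```python
def minimum_degree_order(n, edges):
    """Order the rows of a symmetric pattern by repeatedly selecting a node
    of minimum current degree; edges holds (i, j) pairs of off-diagonal
    nonzeros of the connectivity graph."""
    adjacency = {i: set() for i in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    remaining = set(range(n))
    order = []
    while remaining:
        # Degree counted only over nodes not yet ordered; ties broken by index.
        node = min(remaining, key=lambda v: (len(adjacency[v] & remaining), v))
        order.append(node)
        remaining.remove(node)
    return order

# Arrow-head pattern: node 0 is coupled to every other node.  The sweep gives
# [1, 2, 0, 3]: low-degree rows come first, so the entries coupling the dense
# row 0 to rows 1 and 2 are already known by the time row 0 is processed.
print(minimum_degree_order(4, {(0, 1), (0, 2), (0, 3)}))
```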
references#
The method is described in detail in
J. M. Fowkes, N. I. M. Gould and J. A. Scott, “Approximating large-scale Hessians using secant equations”. Preprint TR-2024-001, Rutherford Appleton Laboratory.
functions#
- sha.initialize()#
Set default option values and initialize private data.
Returns:
- options : dict
dictionary containing default control options:
- error : int
error and warning diagnostics occur on stream error.
- out : int
general output occurs on stream out.
- print_level : int
the level of output required. Possible values are:
<=0
no output.
1
a one-line summary for every iteration.
2
a summary of the inner iteration for each iteration.
>=3
increasingly verbose (debugging) output.
- approximation_algorithm : int
which approximation algorithm should be used? Possible values are:
1
unsymmetric, parallel (Alg 2.1 in paper).
2
symmetric (Alg 2.2 in paper).
3
composite, parallel (Alg 2.3 in paper).
4
composite, block parallel (Alg 2.4 in paper).
- dense_linear_solver : int
which dense linear equation solver should be used? Possible values are:
1
Gaussian elimination.
2
QR factorization.
3
singular-value decomposition.
4
singular-value decomposition with divide-and-conquer.
- extra_differences : int
if available, use an additional extra_differences differences.
- sparse_row : int
a row is considered sparse if it has no more than sparse_row entries.
- recursion_max : int
limit on the maximum number of levels of recursion (Alg. 2.4).
- recursion_entries_required : int
the minimum number of entries in a reduced row that are required if a further level of recursion is allowed (Alg. 2.4).
- space_critical : bool
if space_critical is True, every effort will be made to use as little space as possible. This may result in longer computation time.
- deallocate_error_fatal : bool
if deallocate_error_fatal is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.
- prefix : str
all output lines will be prefixed by the string contained in quotes within prefix, e.g. ‘word’ (note the quotes) will result in the prefix word.
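A typical way to obtain and adjust the options, assuming the interface is installed as part of the galahad Python package and imported as below:

```python
from galahad import sha   # assumed import path for the GALAHAD Python interface

# Obtain the default options and adjust a few of them before the analysis call.
options = sha.initialize()
options['approximation_algorithm'] = 4   # composite, block-parallel variant
options['extra_differences'] = 1         # use one extra difference if available
options['print_level'] = 0               # no output
```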
- sha.analyse_matrix(n, ne, row, col, options=None)#
Import problem data into internal storage and compute sparsity-based reorderings prior to recovering the Hessian approximation.
Parameters:
- n : int
holds the dimension of the system, \(n\).
- ne : int
holds the number of entries in the upper triangular part of \(H\).
- row : ndarray(ne)
holds the row indices of the upper triangular part of \(H\).
- col : ndarray(ne)
holds the column indices of the upper triangular part of \(H\).
- options : dict, optional
dictionary of control options (see sha.initialize).
Returns:
- m : int
gives the minimum number of \((s^{(k)},y^{(k)})\) pairs that will be needed to recover a good Hessian approximation.
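For instance, for a \(3 \times 3\) Hessian whose upper triangle has nonzeros on the diagonal and in the entry coupling the first two variables, the analysis call might look as follows; the import line and the use of 0-based indices are assumptions of this sketch about the installed interface.

```python
import numpy as np
from galahad import sha   # assumed import path

n = 3
# Upper-triangular pattern of H: entries (0,0), (0,1), (1,1) and (2,2),
# using 0-based row/column indices (an assumption of this sketch).
ne = 4
row = np.array([0, 0, 1, 2])
col = np.array([0, 1, 1, 2])

options = sha.initialize()
m = sha.analyse_matrix(n, ne, row, col, options)
# m now holds the minimum number of (s, y) pairs needed for a good estimate.
```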
- sha.recover_matrix(ne, m, ls1, ls2, strans, ly1, ly2, ytrans, order=None)#
Compute the Hessian estimate \(B\) from given difference pairs \((s^{(k)},y^{(k)})\).
Parameters:
- ne : int
holds the number of entries in the upper triangular part of the matrix \(H\).
- m : int
gives the number of \((s^{(k)},y^{(k)})\) pairs that are provided to recover the Hessian approximation (see sha.analyse_matrix for the minimum necessary).
- ls1 : int
holds the leading (first) dimension of the array strans (below).
- ls2 : int
holds the trailing (second) dimension of the array strans (below).
- strans : ndarray(ls1,ls2)
holds the values of the vectors \(\{s^{(k) T}\}\). Component [\(k,i\)] should hold \(s_i^{(k)}\).
- ly1 : int
holds the leading (first) dimension of the array ytrans (below).
- ly2 : int
holds the trailing (second) dimension of the array ytrans (below).
- ytrans : ndarray(ly1,ly2)
holds the values of the vectors \(\{y^{(k) T}\}\). Component [\(k,i\)] should hold \(y_i^{(k)}\).
- order : ndarray(m), optional
holds the preferred order of access for the pairs \(\{(s^{(k)},y^{(k)})\}\). The \(k\)-th component of order specifies the row number of strans and ytrans that will be used as the \(k\)-th most favoured. order need not be set if the natural order, \(k = 1,\ldots,m\), is desired; in this case order should be None.
Returns:
- val : ndarray(ne)
holds the values of the nonzeros in the upper triangle of the matrix \(B\) in the same order as specified in the sparsity pattern in sha.analyse_matrix.
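Continuing that example, the difference pairs are stacked as the rows of two \(m \times n\) arrays, so that ls1 = ly1 = \(m\) and ls2 = ly2 = \(n\); the sketch below, under the same assumptions as before about the import and the 0-based indexing, recovers the four upper-triangular values.

```python
import numpy as np
from galahad import sha   # assumed import path

# The pattern analysed above; analyse_matrix must be called before recovery.
n, ne = 3, 4
options = sha.initialize()
m_min = sha.analyse_matrix(n, ne, np.array([0, 0, 1, 2]),
                           np.array([0, 1, 1, 2]), options)

# Three difference pairs (at least m_min are required), stacked by rows so
# that strans[k, i] = s_i^(k) and ytrans[k, i] = y_i^(k).
strans = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
ytrans = np.array([[2.0, 1.0, 0.0],
                   [1.0, 4.0, 0.0],
                   [0.0, 0.0, 6.0]])
m = strans.shape[0]

# val holds the ne upper-triangular entries of B in the order of the pattern
# given to analyse_matrix; for these exact quadratic data it should be close
# to (2, 1, 4, 6).
val = sha.recover_matrix(ne, m, m, n, strans, m, n, ytrans)
```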
- [optional] sha.information()#
Provide optional output information.
Returns:
- inform : dict
dictionary containing output information:
- status : int
the return status. Possible values are:
0
The call was successful.
1
Insufficient data pairs \((s^{(k)},y^{(k)})\) have been provided, as m is too small. The returned \(B\) is likely not fully accurate.
-1
An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.
-2
A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.
-3
The restriction n > 0, ne \(\geq\) 0, or \(0 \leq\) row[i] \(\leq\) col[i] \(\leq\) n-1 has been violated.
-31
The call to sha_estimate was not preceded by a call to sha_analyse_matrix.
- alloc_status : int
the status of the last attempted allocation/deallocation.
- bad_alloc : str
the name of the array for which an allocation/deallocation error occurred.
- max_degree : int
the maximum degree in the adjacency graph.
- differences_needed : int
the number of differences that will be needed.
- max_reduced_degree : int
the maximum reduced degree in the adjacency graph.
- approximation_algorithm_used : int
the approximation algorithm actually used.
- bad_row : int
a failure occurred when forming the bad_row-th row (0 = no failure).
- sha.terminate()#
Deallocate all internal private storage.
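Once the estimate has been recovered, the information and termination calls might be used as follows; this is a fragment that assumes the analyse and recover calls sketched above have already been made, with the same assumed import path.

```python
from galahad import sha   # assumed import path

# ... after sha.analyse_matrix and sha.recover_matrix have been called ...
inform = sha.information()
if inform['status'] == 0:
    print('estimate recovered using approximation algorithm',
          inform['approximation_algorithm_used'])
else:
    print('warning or error: status =', inform['status'])

# Free all internal workspace once the approximation is no longer needed.
sha.terminate()
```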