TREK

purpose

The trek package uses an extended-Krylov-subspace iteration to find the global minimizer of a quadratic objective function within an ellipsoidal region; this is commonly known as the trust-region subproblem. The aim is to minimize the quadratic objective function

\[q(x) = f + c^T x + \frac{1}{2} x^T H x,\]
where the vector \(x\) is required to satisfy the ellipsoidal trust-region constraint \(\|x\|_S \leq \Delta\), where the \(S\)-norm of \(x\) is defined to be \(\|x\|_S = \sqrt{x^T S x}\), and where the radius \(\Delta > 0\). The matrix \(S\) need not be provided in the commonly-occurring \(\ell_2\)-trust-region case for which \(S = I\), the \(n\) by \(n\) identity matrix.
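To make these definitions concrete, the following minimal NumPy sketch evaluates \(q(x)\) and the \(S\)-norm for dense data and tests the trust-region constraint. The function names are hypothetical and not part of trek. Passing S=None stands for the \(\ell_2\) case \(S = I\).

```python
import numpy as np

def quadratic_value(f, c, H, x):
    """q(x) = f + c^T x + 0.5 x^T H x for dense c, H and x."""
    return f + c @ x + 0.5 * x @ (H @ x)

def s_norm(x, S=None):
    """||x||_S = sqrt(x^T S x); S=None means the l2 case S = I."""
    if S is None:
        return float(np.sqrt(x @ x))
    return float(np.sqrt(x @ (S @ x)))

def in_trust_region(x, radius, S=None):
    """True if x satisfies the ellipsoidal constraint ||x||_S <= radius."""
    return s_norm(x, S) <= radius
```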

Factorization of the matrices \(H\) and, if present, \(S\) will be required, so this package is most suited for the case where such a factorization may be found efficiently. If this is not the case, the package gltr may be preferred.

See Section 4 of $GALAHAD/doc/trek.pdf for additional details.

method

The required solution \(x_*\) necessarily satisfies the optimality condition \(H x_* + \lambda_* S x_* + c = 0\), where \(\lambda_* \geq 0\) is a Lagrange multiplier corresponding to the constraint \(\|x\|_S \leq \Delta\), and \(\lambda_* = 0\) whenever \(\|x_*\|_S < \Delta\). In addition, in all cases, the matrix \(H + \lambda_* S\) will be positive semi-definite. In most instances it will actually be positive definite, but in special “hard” cases singularity is a possibility.
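The sketch below gives a point of reference for these conditions. It solves the \(\ell_2\) case (\(S = I\)) densely, using the eigendecomposition of \(H\) and a bisection on \(\|x(\lambda)\| = \Delta\), and it treats the interior and "hard" cases explicitly. This is a textbook method swapped in for illustration, not the extended-Krylov iteration that trek uses. The name dense_trs is hypothetical, and the approach is only sensible for small \(n\).

```python
import numpy as np

def dense_trs(H, c, radius, tol=1e-12):
    """Minimize c^T x + 0.5 x^T H x subject to ||x||_2 <= radius (S = I).

    Dense reference solver via the eigendecomposition of H; returns (x, lam)
    with (H + lam I) x = -c, lam >= 0 and H + lam I positive semi-definite."""
    d, V = np.linalg.eigh(H)              # eigenvalues in ascending order
    g = V.T @ c                           # c expressed in the eigenvector basis
    d_min = d[0]
    norm_c = np.linalg.norm(c)
    if d_min > 0:
        # interior solution x = -H^{-1} c, possible only if H is positive definite
        x = -V @ (g / d)
        if np.linalg.norm(x) <= radius:
            return x, 0.0
    else:
        # possible "hard" case: c has no component along the leftmost eigenvectors
        near = np.abs(d - d_min) <= tol * max(1.0, np.abs(d).max())
        if np.all(np.abs(g[near]) <= tol * max(1.0, norm_c)):
            y = np.zeros_like(g)
            y[~near] = -g[~near] / (d[~near] - d_min)
            if np.linalg.norm(y) <= radius:
                # move along a leftmost eigenvector until the boundary is reached
                tau = np.sqrt(radius**2 - np.linalg.norm(y) ** 2)
                return V @ y + tau * V[:, 0], -d_min
    # "easy" case: bisection on ||x(lam)|| = radius with x(lam) = -(H + lam I)^{-1} c;
    # ||x(lam)|| decreases for lam > max(0, -d_min) and is <= radius at hi
    lo = max(0.0, -d_min)
    hi = lo + norm_c / radius
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(g / (d + mid)) > radius:
            lo = mid
        else:
            hi = mid
    return -V @ (g / (d + hi)), hi
```

The quantities worth checking on any computed pair \((x, \lambda)\) are the residual of the optimality condition, \(\lambda \geq 0\), \(\|x\| \leq \Delta\), and the smallest eigenvalue of \(H + \lambda I\).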

The method is iterative, and is based upon building a solution approximation from an orthogonal basis of the evolving extended Krylov subspaces \({\cal K}_{2m+1}(H,c) = \mbox{span}\{c,H^{-1}c,H c,H^{-2}c,H^2c,\ldots,H^{-m}c,H^{m}c\}\) as \(m\) increases. The key observations are:

(i) the manifold of solutions to the optimality system \[ ( H + \lambda I ) x(\lambda) = - c\] as a function of \(\lambda\) is approximately of very low rank;

(ii) the subspace \({\cal K}_{2m+1}(H,c)\) rapidly gives a very good approximation to this manifold;

(iii) it is straightforward to build an orthogonal basis of \({\cal K}_{2m+1}(H,c)\) using short-term recurrences and a single factorization of \(H\);

(iv) solutions to the trust-region subproblem restricted to this subspace may be found very efficiently using effective high-order root-finding methods.

Because the second element in the subspace is \(H^{-1} c\), it is easy to check for the interior-solution possibility \(x = - H^{-1} c\), which occurs when \(H\) is positive definite and such an \(x\) satisfies \(\|x\| \leq \Delta\). Coping with general scalings \(S\) is a straightforward extension so long as factorization of \(S\) is also possible.
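The subspace itself can be illustrated with a naive sketch. It orthonormalizes the vectors in the order \(c, H^{-1}c, Hc, H^{-2}c, H^2c, \ldots\), producing each new vector by applying \(H^{-1}\) or \(H\) to the most recent basis vector of the same kind. This deliberately ignores the short-term recurrences and single factorization that trek relies on. It uses full Gram-Schmidt and a fresh dense solve for every product with \(H^{-1}\), and it assumes \(H\) is nonsingular. The name extended_krylov_basis is hypothetical.

```python
import numpy as np

def extended_krylov_basis(H, c, m, tol=1e-10):
    """Orthonormal basis (as columns) of K_{2m+1}(H, c) =
    span{c, H^{-1}c, Hc, H^{-2}c, H^2c, ..., H^{-m}c, H^m c}.

    Naive sketch: each new vector is orthogonalized against all previous ones
    (classical Gram-Schmidt, applied twice) and each H^{-1} product is a fresh
    dense solve; trek instead uses short-term recurrences and one factorization."""
    norm_c = np.linalg.norm(c)
    if norm_c == 0:
        raise ValueError("c must be nonzero")
    Q = [c / norm_c]
    last_neg = last_pos = Q[0]
    for _ in range(m):
        # next negative power first, then next positive power
        for step in ("neg", "pos"):
            w = np.linalg.solve(H, last_neg) if step == "neg" else H @ last_pos
            w_norm0 = np.linalg.norm(w)
            for _ in range(2):
                B = np.column_stack(Q)
                w = w - B @ (B.T @ w)
            w_norm = np.linalg.norm(w)
            if w_norm <= tol * w_norm0:
                # the new direction is (numerically) already in the span: stop early
                return np.column_stack(Q)
            q = w / w_norm
            Q.append(q)
            if step == "neg":
                last_neg = q
            else:
                last_pos = q
    return np.column_stack(Q)
```

If \(Q\) is the returned basis, the subproblem restricted to the subspace has data \(Q^T H Q\) and \(Q^T c\), and its solution \(y\) gives \(x = Q y\).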

reference

The method is described in detail in

H. Al Daas and N. I. M. Gould. Extended-Krylov-subspace methods for trust-region and norm-regularization subproblems. Preprint STFC-P-2025-002, Rutherford Appleton Laboratory, Oxfordshire, England.

matrix storage

The symmetric \(n\) by \(n\) matrices \(H\) and, optionally, \(S\) may be presented and stored in a variety of formats. Crucially, symmetry is exploited by only storing values from the lower triangular part (i.e., those entries that lie on or below the leading diagonal). The formats are described below for \(H\). Those for \(S\) are the same, with the prefix H_ replaced by S_ (see trek.load_s).

Dense storage format: The matrix \(H\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. Since \(H\) is symmetric, only the lower triangular part (that is, the part \(H_{ij}\) for \(0 \leq j \leq i \leq n-1\)) need be held. In this case the lower triangle should be stored by rows, that is, component \(i (i+1) / 2 + j\) of the storage array H_val will hold the value \(H_{ij}\) (and, by symmetry, \(H_{ji}\)) for \(0 \leq j \leq i \leq n-1\). The string H_type = ‘dense’ should be specified.
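One quick way to produce H_val for the dense scheme with NumPy is sketched below. The helper names are hypothetical. numpy.tril_indices enumerates the lower triangle row by row, so the position of \(H_{ij}\) in the result is \(i(i+1)/2 + j\).

```python
import numpy as np

def pack_dense_lower(H):
    """H_val for H_type='dense': the lower triangle of H, stored row by row."""
    rows, cols = np.tril_indices(H.shape[0])  # (0,0), (1,0), (1,1), (2,0), ...
    return H[rows, cols]

def dense_position(i, j):
    """Position of H_ij (0 <= j <= i) within H_val."""
    return i * (i + 1) // 2 + j
```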

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 \leq l \leq ne-1\), of \(H\), its row index i, column index j and value \(H_{ij}\), \(0 \leq j \leq i \leq n-1\), are stored as the \(l\)-th components of the integer arrays H_row and H_col and real array H_val, respectively, while the number of nonzeros is recorded as H_ne = \(ne\). Note that only the entries in the lower triangle should be stored. The string H_type = ‘coordinate’ should be specified.
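The coordinate arrays can be extracted from a dense NumPy matrix as follows; the helper name is hypothetical. This sketch lists the entries row by row. The example code at the end of this page gives them in a different order, so any order appears to be accepted.

```python
import numpy as np

def to_coordinate(H):
    """H_row, H_col, H_val and H_ne for H_type='coordinate' (lower triangle only)."""
    H_row, H_col = np.nonzero(np.tril(H))   # indices of nonzeros with j <= i
    H_val = H[H_row, H_col]
    return H_row, H_col, H_val, len(H_val)
```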

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(H\), the i-th component of the integer array H_ptr holds the position of the first entry in this row, while H_ptr(n) holds the total number of entries. The column indices j, \(0 \leq j \leq i\), and values \(H_{ij}\) of the entries in the i-th row are stored in components l = H_ptr(i), …, H_ptr(i+1)-1 of the integer array H_col and real array H_val, respectively. Note that, as before, only the entries in the lower triangle should be stored. For sparse matrices, this scheme almost always requires less storage than its predecessor. The string H_type = ‘sparse_by_rows’ should be specified.
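Building the row-wise arrays by hand makes the role of H_ptr explicit: its entry i counts the entries stored in rows 0 to i-1. This is a sketch with a hypothetical helper name. It assumes 0-based indexing, so that H_ptr starts at 0.

```python
import numpy as np

def to_sparse_by_rows(H):
    """H_ptr, H_col and H_val for H_type='sparse_by_rows' (lower triangle only)."""
    n = H.shape[0]
    H_ptr, H_col, H_val = [0], [], []
    for i in range(n):
        for j in range(i + 1):               # only columns 0 <= j <= i
            if H[i, j] != 0.0:
                H_col.append(j)
                H_val.append(H[i, j])
        H_ptr.append(len(H_col))             # row i occupies H_col[H_ptr[i]:H_ptr[i+1]]
    return (np.array(H_ptr, dtype=int), np.array(H_col, dtype=int),
            np.array(H_val, dtype=float))
```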

Diagonal storage format: If \(H\) is diagonal (i.e., \(H_{ij} = 0\) for all \(0 \leq i \neq j \leq n-1\)), only the diagonal entries \(H_{ii}\), \(0 \leq i \leq n-1\), need be stored, and the first n components of the array H_val may be used for the purpose. The string H_type = ‘diagonal’ should be specified.

Multiples of the identity storage format: If \(H\) is a multiple of the identity matrix (i.e., \(H = \alpha I\), where \(I\) is the n by n identity matrix and \(\alpha\) is a scalar), it suffices to store \(\alpha\) as the first component of H_val. The string H_type = ‘scaled_identity’ should be specified.

The identity matrix format: If \(H\) is the identity matrix, no values need be stored. The string H_type = ‘identity’ should be specified.

The zero matrix format: The same is true if \(H\) is the zero matrix, but now the string H_type = ‘zero’ or ‘none’ should be specified.
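To check that data have been packed as intended, the hypothetical helper below rebuilds the full symmetric matrix from any of the schemes just described. It mirrors the lower triangle into the upper one and accepts upper- or lower-case type strings. It also assumes that no coordinate pair is repeated.

```python
import numpy as np

def unpack_symmetric(n, H_type, H_val=None, H_row=None, H_col=None, H_ptr=None):
    """Rebuild the full symmetric n by n matrix from any of the storage schemes."""
    t = H_type.lower()
    H = np.zeros((n, n))
    diag = np.arange(n)
    if t == "dense":
        rows, cols = np.tril_indices(n)
        H[rows, cols] = np.asarray(H_val, dtype=float)[: len(rows)]
    elif t == "coordinate":
        # assumes no repeated (i, j) pairs; a repeat would simply overwrite here
        H[np.asarray(H_row), np.asarray(H_col)] = H_val
    elif t == "sparse_by_rows":
        for i in range(n):
            for l in range(H_ptr[i], H_ptr[i + 1]):
                H[i, H_col[l]] = H_val[l]
    elif t == "diagonal":
        H[diag, diag] = np.asarray(H_val, dtype=float)[:n]
    elif t == "scaled_identity":
        H[diag, diag] = H_val[0]
    elif t == "identity":
        H[diag, diag] = 1.0
    elif t not in ("zero", "none"):
        raise ValueError(f"unknown storage type {H_type!r}")
    # copy the strict lower triangle into the upper triangle
    return np.tril(H) + np.tril(H, -1).T
```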

functions

trek.initialize()

Set default option values and initialize private data

Returns:

options : dict
dictionary containing default control options:
error : int

error and warning diagnostics occur on stream error.

out : int

general output occurs on stream out.

print_level : int

the level of output required is specified by print_level. Possible values are:

  • <=0

    gives no output,

  • 1

    gives a one-line summary for every iteration.

  • 2

    gives a summary of the inner iteration for each iteration.

  • >=3

    gives increasingly verbose (debugging) output.

eks_max : int

maximum dimension of the extended Krylov space employed. If a negative value is given, the value 100 will be used instead.

it_max : int

the maximum number of iterations allowed. If a negative value is given, the value 100 will be used instead.

f : float

the value of \(f\) in the objective function. This value has no effect on the computed \(x\), and takes the value 0.0 by default.

reduction : float

the value of the reduction factor for a suggested subsequent trust-region radius, see inform[‘next_radius’]. The suggested radius will be reduction times the smaller of the current radius and \(\|x\|_S\) at the output \(x\).

stop_residual : float

the value of the stopping tolerance used by the algorithm. The iteration stops as soon as \(x\) and \(\lambda\) are found to satisfy \(\| ( H + \lambda S ) x + c \| <\) stop_residual \(\times \max( 1, \|c\| )\).

reorthogonalize : bool

should be set to True if the generated basis of the extended-Krylov subspace is to be reorthogonalized at every iteration. This can be very expensive, and is generally not warranted.

s_version_52 : bool

should be set to True if Algorithm 5.2 in the paper by Al Daas and Gould is used to generate the extended Krylov space recurrences when a non-unit \(S\) is given, and False if those from Algorithm B.3 are used instead. In practice, there is very little difference in performance and accuracy.

perturb_c : bool

should be set to True if the user wishes to make tiny pseudo-random perturbations to the components of the term \(c\) to try to protect against the so-called (probability zero) “hard” case. Perturbations are generally not needed, and should only be used in very exceptional cases.

stop_check_all_orders : bool

should be set to True if the algorithm is to check for termination after each new member of the extended Krylov space is added. Such checks incur some extra cost, and experience shows that testing every second member is sufficient.

new_radius : bool

should be set to True if the call retains the previous \(H\), \(S\) and \(c\), but with a new, smaller radius.

new_values : bool

should be set to True if any of the values of \(H\), \(S\) and \(c\) have changed since a previous call.

space_critical : bool

if space_critical is True, every effort will be made to use as little space as possible. This may result in longer computation time.

deallocate_error_fatal : bool

if deallocate_error_fatal is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.

linear_solver : str

linear equation solver used for systems involving \(H\).

linear_solver_for_s : str

linear equation solver used for systems involving \(S\).

prefix : str

all output lines will be prefixed by the string contained in quotes within prefix, e.g. ‘word’ (note the quotes) will result in the prefix word.

sls_options : dict

default control options for SLS (see sls.initialize).

sls_s_options : dict

default control options for SLS applied to \(S\) (see sls.initialize).

trs_options : dict

default control options for TRS (see trs.initialize).
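The rules quoted above for stop_residual and reduction can be written out directly. This sketch assumes Euclidean norms for the residual and for \(c\), which the text does not state explicitly. The function names are hypothetical.

```python
import numpy as np

def residual_small_enough(H, S, c, x, lam, stop_residual):
    """Stopping rule described for options['stop_residual'] (Euclidean norms assumed)."""
    r = (H + lam * S) @ x + c
    return np.linalg.norm(r) < stop_residual * max(1.0, np.linalg.norm(c))

def suggested_radius(radius, x, S, reduction):
    """reduction * min(radius, ||x||_S), the rule described for options['reduction']."""
    x_norm_S = float(np.sqrt(x @ (S @ x)))
    return reduction * min(radius, x_norm_S)
```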

trek.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=None)

Import problem data into internal storage prior to solution.

Parameters:

n : int

holds the number of variables.

H_type : string

specifies the symmetric storage scheme used for the Hessian \(H\). It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’, ‘scaled_identity’, ‘identity’, ‘zero’ or ‘none’; lower or upper case variants are allowed.

H_ne : int

holds the number of entries in the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes.

H_row : ndarray(H_ne)

holds the row indices of the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes, and in this case can be None.

H_col : ndarray(H_ne)

holds the column indices of the lower triangular part of \(H\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the other storage schemes are used, and in this case can be None.

H_ptr : ndarray(n+1)

holds the starting position of each row of the lower triangular part of \(H\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None.

options : dict, optional

dictionary of control options (see trek.initialize).
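Before calling trek.load, it can be worth checking that the index arrays respect the lower-triangle convention. The validator below is a hypothetical helper, not part of trek. It assumes 0-based indices with H_ptr[0] equal to 0.

```python
import numpy as np

def check_lower_pattern(n, H_type, H_ne=None, H_row=None, H_col=None, H_ptr=None):
    """Raise ValueError if the index arrays do not fit the chosen scheme."""
    t = H_type.lower()
    if n <= 0:
        raise ValueError("n must be positive")
    if t == "coordinate":
        H_row, H_col = np.asarray(H_row), np.asarray(H_col)
        if len(H_row) != H_ne or len(H_col) != H_ne:
            raise ValueError("H_row and H_col must have H_ne entries")
        # every entry must satisfy 0 <= j <= i <= n-1
        if np.any(H_col < 0) or np.any(H_row >= n) or np.any(H_col > H_row):
            raise ValueError("entries must satisfy 0 <= j <= i <= n-1")
    elif t == "sparse_by_rows":
        H_ptr, H_col = np.asarray(H_ptr), np.asarray(H_col)
        # assumption: 0-based, so row 0 starts at position 0
        if len(H_ptr) != n + 1 or H_ptr[0] != 0 or np.any(np.diff(H_ptr) < 0):
            raise ValueError("H_ptr must hold n+1 nondecreasing values starting at 0")
        for i in range(n):
            cols = H_col[H_ptr[i]:H_ptr[i + 1]]
            if np.any(cols < 0) or np.any(cols > i):
                raise ValueError(f"row {i} has a column index outside 0..{i}")
    return True
```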

[optional] trek.load_s(n, S_type, S_ne, S_row, S_col, S_ptr, options=None)

Import problem data for the scaling matrix \(S\), if needed, into internal storage prior to solution.

Parameters:

n : int

holds the number of variables.

S_type : string

specifies the symmetric storage scheme used for the scaling matrix \(S\). It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’, ‘scaled_identity’, ‘identity’, ‘zero’ or ‘none’; lower or upper case variants are allowed.

S_ne : int

holds the number of entries in the lower triangular part of \(S\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes.

S_row : ndarray(S_ne)

holds the row indices of the lower triangular part of \(S\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes, and in this case can be None.

S_col : ndarray(S_ne)

holds the column indices of the lower triangular part of \(S\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the other storage schemes are used, and in this case can be None.

S_ptr : ndarray(n+1)

holds the starting position of each row of the lower triangular part of \(S\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None.

options : dict, optional

dictionary of control options (see trek.initialize).
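A non-unit \(S\) only defines a norm if it is symmetric positive definite. The sketch below shows the standard reduction behind the remark that general scalings are straightforward once \(S\) can be factorized. With \(S = L L^T\) and \(y = L^T x\), we have \(\|x\|_S = \|y\|_2\), so the problem becomes an \(\ell_2\) one. The helper name is hypothetical. trek itself uses its own recurrences for non-unit \(S\) (see options[‘s_version_52’]), so this is illustrative only.

```python
import numpy as np

def scale_to_identity(H, c, S):
    """Map the S-norm problem to an l2 one via the Cholesky factor S = L L^T.

    With y = L^T x: ||x||_S = ||y||_2 and c^T x + 0.5 x^T H x equals
    c_hat^T y + 0.5 y^T H_hat y, where H_hat = L^{-1} H L^{-T}, c_hat = L^{-1} c."""
    L = np.linalg.cholesky(S)                # raises LinAlgError unless S is positive definite
    LinvH = np.linalg.solve(L, H)            # L^{-1} H
    H_hat = np.linalg.solve(L, LinvH.T)      # L^{-1} (L^{-1} H)^T = L^{-1} H L^{-T}, as H = H^T
    c_hat = np.linalg.solve(L, c)            # L^{-1} c
    return H_hat, c_hat, L
```

A solution \(y\) of the transformed problem is mapped back by solving \(L^T x = y\).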

[optional] trek.reset_options(options)

Reset control parameters after import if required.

Parameters:

options : dict

dictionary of control options (see trek.initialize).

trek.solve_problem(n, H_ne, H_val, c, radius, S_ne, S_val)

Find the global minimizer of the quadratic objective function \(q(x)\) within the trust-region constraint.

Parameters:

n : int

holds the number of variables.

H_ne : int

holds the number of entries in the lower triangular part of the Hessian \(H\).

H_val : ndarray(H_ne)

holds the values of the nonzeros in the lower triangle of the Hessian \(H\) in the same order as specified in the sparsity pattern in trek.load.

c : ndarray(n)

holds the values of the linear term \(c\) in the objective function.

radius : float

holds the strictly positive trust-region radius, \(\Delta\).

S_ne : int

holds the number of entries in the lower triangular part of the scaling matrix \(S\) if it is not the identity matrix. Otherwise it should be None.

S_val : ndarray(S_ne)

holds the values of the nonzeros in the lower triangle of the scaling matrix \(S\) in the same order as specified in the sparsity pattern in trek.load_s if needed. Otherwise it should be None.

Returns:

x : ndarray(n)

holds the values of the approximate minimizer \(x\) after a successful call.
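solve_problem receives only the lower triangle of \(H\). The quadratic value can therefore be recomputed from the same coordinate data, for example to cross-check inform[‘obj’], which presumably includes \(f\) = options[‘f’]. Off-diagonal entries must be counted twice. The helper below is hypothetical.

```python
import numpy as np

def objective_from_lower_coordinate(f, c, H_row, H_col, H_val, x):
    """q(x) = f + c^T x + 0.5 x^T H x, with H given by its lower-triangle coordinates."""
    H_row, H_col = np.asarray(H_row), np.asarray(H_col)
    H_val, x, c = (np.asarray(a, dtype=float) for a in (H_val, x, c))
    terms = H_val * x[H_row] * x[H_col]            # H_ij x_i x_j for each stored entry
    # an off-diagonal entry represents both H_ij and H_ji, so it contributes twice
    quad = np.sum(np.where(H_row == H_col, terms, 2.0 * terms))
    return f + c @ x + 0.5 * quad
```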

[optional] trek.information()

Provide optional output information

Returns:

inform : dict
dictionary containing output information:
status : int

return status. Possible values are:

  • 0

    The run was successful.

  • -1

    An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -2

    A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -3

    The restriction n > 0 or radius > 0, or the requirement that H_type (or S_type) contains one of the strings ‘dense’, ‘coordinate’, ‘sparse_by_rows’, ‘diagonal’, ‘scaled_identity’, ‘identity’, ‘zero’ or ‘none’, has been violated.

  • -9

    The analysis phase of the factorization failed; the return status from the factorization package is given by inform[‘sls_inform’][‘status’] or inform[‘sls_s_inform’][‘status’] as appropriate.

  • -10

    The factorization failed; the return status from the factorization package is given by inform[‘sls_inform’][‘status’] or inform[‘sls_s_inform’][‘status’] as appropriate.

  • -11

    The solution of a set of linear equations using factors from the factorization package failed; the return status from the factorization package is given by inform[‘sls_inform’][‘status’] or inform[‘sls_s_inform’][‘status’] as appropriate.

  • -15

    \(S\) does not appear to be strictly diagonally dominant.

  • -16

    The problem is so ill-conditioned that further progress is impossible.

  • -18

    Too many iterations have been required. This may happen if options[‘eks_max’] is too small, but may also be symptomatic of a badly scaled problem.

  • -31

    A resolve call has been made before an initial call (see options[‘new_radius’] and options[‘new_values’]).

  • -38

    An error occurred in a call to an LAPACK subroutine.
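When calling trek from a script, the status codes above can be turned into exceptions. The dictionary and function below are a hypothetical convenience, not part of trek, and their messages paraphrase this list.

```python
TREK_STATUS = {
    0: "the run was successful",
    -1: "allocation error",
    -2: "deallocation error",
    -3: "n > 0, radius > 0 or the storage type requirement has been violated",
    -9: "the analysis phase of the factorization failed",
    -10: "the factorization failed",
    -11: "the solution of linear equations using the factors failed",
    -15: "S does not appear to be strictly diagonally dominant",
    -16: "the problem is too ill-conditioned to make further progress",
    -18: "too many iterations have been required (options['eks_max'] may be too small)",
    -31: "a resolve call has been made before an initial call",
    -38: "an error occurred in a call to an LAPACK subroutine",
}

def check_status(inform):
    """Raise RuntimeError unless inform['status'] is 0."""
    status = inform["status"]
    if status != 0:
        message = TREK_STATUS.get(status, "unknown status")
        raise RuntimeError(f"trek status {status}: {message}")
```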

alloc_status : int

the status of the last attempted allocation/deallocation.

bad_alloc : str

the name of the array for which an allocation/deallocation error occurred.

iter : int

the total number of iterations required.

n_vec : int

the number of orthogonal vectors required.

obj : float

the value of the quadratic function.

x_norm : float

the \(S\)-norm of \(x\), \(\|x\|_S\).

multiplier : float

the Lagrange multiplier corresponding to the trust-region constraint.

radius : float

the current trust-region radius.

next_radius : float

the proposed next trust-region radius to be used.

error : float

the maximum relative residual error.

time : dict
dictionary containing timing information:
total : float

total CPU time spent in the package.

assemble : float

CPU time spent building \(H\) and \(S\).

analyse : float

CPU time spent reordering \(H\) and \(S\) prior to factorization.

factorize : float

CPU time spent factorizing \(H\) and \(S\).

solve : float

CPU time spent solving linear systems involving \(H\) and \(S\).

clock_total : float

total clock time spent in the package.

clock_assemble : float

clock time spent building \(H\) and \(S\).

clock_analyse : float

clock time spent reordering \(H\) and \(S\) prior to factorization.

clock_factorize : float

clock time spent factorizing \(H\) and \(S\).

clock_solve : float

clock time spent solving linear systems involving \(H\) and \(S\).

sls_inform : dict

inform parameters for SLS for \(H\) (see sls.information).

sls_s_inform : dict

inform parameters for SLS for \(S\) (see sls.information).

trs_inform : dict

inform parameters for TRS (see trs.information).

trek.terminate()

Deallocate all internal private storage.

example code

from galahad import trek
import numpy as np
np.set_printoptions(precision=4,suppress=True,floatmode='fixed')
print("\n** python test: trek")

# set parameters
n = 3

#  describe objective function

f = 0.96
g = np.array([0.0,2.0,0.0])
H_type = 'coordinate'
H_ne = 4
H_row = np.array([0,1,2,2])
H_col = np.array([0,1,2,0])
H_ptr = None
H_val = np.array([1.0,2.0,3.0,4.0])

#  describe norm

S_type = 'coordinate'
S_ne = 3
S_row = np.array([0,1,2])
S_col = np.array([0,1,2])
S_ptr = None
S_val = np.array([1.0,2.0,1.0])

# allocate internal data and set default options
options = trek.initialize()

# set some non-default options
options['print_level'] = 0
#print("options:", options)

# load data (and optionally non-default options)
trek.load(n, H_type, H_ne, H_row, H_col, H_ptr, options)

# set trust-region radius

radius = 1.0

# find minimum of quadratic within the trust region
print("\n solve problem 1")
x = trek.solve_problem(n, H_ne, H_val, g, radius)
print(" x:",x)

# get information
inform = trek.information()
print(" f: %.4f" % inform['obj'])

# reset trust-region radius to the suggested smaller value

radius = inform['next_radius']
options['new_radius'] = True
trek.reset_options(options)

# find minimum of quadratic within the trust region
print("\n solve problem 2 with smaller radius")
x = trek.solve_problem(n, H_ne, H_val, g, radius)
print(" x:",x)

# get information
inform = trek.information()
print(" f: %.4f" % inform['obj'])

# reinitialize trust-region radius

radius = 1.0
options['new_radius'] = False
trek.reset_options(options)

# load data (and optionally non-default options)
trek.load_s(n, S_type, S_ne, S_row, S_col, S_ptr)

# find minimum of quadratic within the trust region
print("\n solve problem 3 with additional non-unit norm")
x = trek.solve_problem(n, H_ne, H_val, g, radius, S_ne, S_val)
print(" x:",x)

# get information
inform = trek.information()
print(" f: %.4f" % inform['obj'])

# deallocate internal data

trek.terminate()

This example code is available in $GALAHAD/src/trek/Python/test_trek.py.