TRS#

purpose#

The trs package uses matrix factorization to find the global minimizer of a quadratic objective function within an ellipsoidal region; this is commonly known as the trust-region subproblem. The aim is to minimize the quadratic objective function

\[q(x) = f + g^T x + \frac{1}{2} x^T H x,\]
where the vector \(x\) is required to satisfy the ellipsoidal trust-region constraint \(\|x\|_M \leq \Delta\), and optionally affine constraints \(A x = 0\), where the \(M\)-norm of \(x\) is defined to be \(\|x\|_M = \sqrt{x^T M x}\), and where the radius \(\Delta > 0\).

The package may also be used to solve the related problem in which \(x\) is instead required to satisfy the equality constraint \(\|x\|_M = \Delta\). The matrix \(M\) need not be provided in the commonly-occurring \(\ell_2\)-trust-region case for which \(M = I\), the \(n\) by \(n\) identity matrix.

Factorization of matrices of the form \(H + \lambda M\), or

\[\begin{split}\left(\begin{array}{cc} H + \lambda M & A^T \\ A & 0 \end{array}\right)\;\;\mbox{(1)}\end{split}\]
in cases where \(A x = 0\) is imposed, for a succession of scalars \(\lambda\) will be required, so this package is most suited for the case where such a factorization may be found efficiently. If this is not the case, the package gltr may be preferred.

See Section 4 of $GALAHAD/doc/trs.pdf for additional details.

method#

The required solution \(x_*\) necessarily satisfies the optimality condition \(H x_* + \lambda_* M x_* + A^T y_* + g = 0\) and \(A x_* = 0\), where \(\lambda_* \geq 0\) is a Lagrange multiplier corresponding to the constraint \(\|x\|_M \leq \Delta\) and \(y_*\) are Lagrange multipliers for the linear constraints \(A x = 0\), if any; for the equality-constrained problem \(\|x\|_M = \Delta\), the multiplier is unconstrained. In addition in all cases, the matrix \(H + \lambda_* M\) will be positive semi-definite on the null-space of \(A\); in most instances it will actually be positive definite, but in special ``hard’’ cases singularity is a possibility.

The method is iterative, and proceeds in two phases. Firstly, lower and upper bounds, \(\lambda_L\) and \(\lambda_U\), on \(\lambda_*\) are computed using Gershgorin’s theorems and other eigenvalue bounds. The first phase of the computation proceeds by progressively shrinking the bound interval \([\lambda_L,\lambda_U]\) until a value \(\lambda\) for which \(\|x(\lambda)\|_M \geq \Delta\) is found. Here \(x(\lambda)\) and its companion \(y(\lambda)\) are defined to be a solution of

\[(H + \lambda M)x(\lambda) + A^T y(\lambda) = - g \;\;\mbox{and}\;\; A x(\lambda) = 0;\;\;\mbox{(2)}\]
along the way the possibility that \(H\) might be positive definite on the null-space of \(A\) and \(\|x(0)\|_M \leq \Delta\) is examined, and if this transpires the process is terminated with \(x_* = x(0)\). Once the terminating \(\lambda\) from the first phase has been discovered, the second phase consists of applying Newton or higher-order iterations to the nonlinear “secular” equation \(\|x(\lambda)\|_M = \Delta\) with the knowledge that such iterations are both globally and ultimately rapidly convergent. It is possible in the “hard” case that the interval in the first-phase will shrink to the single point \(\lambda_*\), and precautions are taken, using inverse iteration with Rayleigh-quotient acceleration to ensure that this too happens rapidly.

The dominant cost is the requirement that we solve a sequence of linear systems (2). In the absence of linear constraints, an efficient sparse Cholesky factorization with precautions to detect indefinite \(H + \lambda M\) is used. If \(A x = 0\) is required, a sparse symmetric, indefinite factorization of (1) is used rather than a Cholesky factorization.

reference#

The method is described in detail in

H. S. Dollar, N. I. M. Gould and D. P. Robinson. ``On solving trust-region and other regularised subproblems in optimization’’. Mathematical Programming Computation 2(1) (2010) 21–57.

matrix storage#

The unsymmetric \(m\) by \(n\) matrix \(A\), if it is needed, may be presented and stored in a variety of convenient input formats.

Dense storage format: The matrix \(A\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. In this case, component \(n \ast i + j\) of the storage array A_val will hold the value \(A_{ij}\) for \(0 \leq i \leq m-1\), \(0 \leq j \leq n-1\). The string A_type = ‘dense’ should be specified.

Dense by columns storage format: The matrix \(A\) is stored as a compact dense matrix by columns, that is, the values of the entries of each column in turn are stored in order within an appropriate real one-dimensional array. In this case, component \(m \ast j + i\) of the storage array A_val will hold the value \(A_{ij}\) for \(0 \leq i \leq m-1\), \(0 \leq j \leq n-1\). The string A_type = ‘dense_by_columns’ should be specified.

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 \leq l \leq ne-1\), of \(A\), its row index i, column index j and value \(A_{ij}\), \(0 \leq i \leq m-1\), \(0 \leq j \leq n-1\), are stored as the \(l\)-th components of the integer arrays A_row and A_col and real array A_val, respectively, while the number of nonzeros is recorded as A_ne = \(ne\). The string A_type = ‘coordinate’ should be specified.

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(A\) the i-th component of the integer array A_ptr holds the position of the first entry in this row, while A_ptr(m) holds the total number of entries. The column indices j, \(0 \leq j \leq n-1\), and values \(A_{ij}\) of the nonzero entries in the i-th row are stored in components l = A_ptr(i), \(\ldots\), A_ptr(i+1)-1, \(0 \leq i \leq m-1\), of the integer array A_col, and real array A_val, respectively. For sparse matrices, this scheme almost always requires less storage than its predecessor. The string A_type = ‘sparse_by_rows’ should be specified.

Sparse column-wise storage format: Once again only the nonzero entries are stored, but this time they are ordered so that those in column j appear directly before those in column j+1. For the j-th column of \(A\) the j-th component of the integer array A_ptr holds the position of the first entry in this column, while A_ptr(n) holds the total number of entries. The row indices i, \(0 \leq i \leq m-1\), and values \(A_{ij}\) of the nonzero entries in the j-th columnsare stored in components l = A_ptr(j), \(\ldots\), A_ptr(j+1)-1, \(0 \leq j \leq n-1\), of the integer array A_row, and real array A_val, respectively. As before, for sparse matrices, this scheme almost always requires less storage than the co-ordinate format. The string A_type = ‘sparse_by_columns’ should be specified.

The symmetric \(n\) by \(n\) matrices \(H\) and, optionally. \(M\) may also be presented and stored in a variety of formats. But crucially symmetry is exploited by only storing values from the lower triangular part (i.e, those entries that lie on or below the leading diagonal).

Dense storage format: The matrix \(H\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. Since \(H\) is symmetric, only the lower triangular part (that is the part \(H_{ij}\) for \(0 \leq j \leq i \leq n-1\)) need be held. In this case the lower triangle should be stored by rows, that is component \(i * i / 2 + j\) of the storage array H_val will hold the value \(H_{ij}\) (and, by symmetry, \(H_{ji}\)) for \(0 \leq j \leq i \leq n-1\). The string H_type = ‘dense’ should be specified.

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 \leq l \leq ne-1\), of \(H\), its row index i, column index j and value \(H_{ij}\), \(0 \leq j \leq i \leq n-1\), are stored as the \(l\)-th components of the integer arrays H_row and H_col and real array H_val, respectively, while the number of nonzeros is recorded as H_ne = \(ne\). Note that only the entries in the lower triangle should be stored. The string H_type = ‘coordinate’ should be specified.

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(H\) the i-th component of the integer array H_ptr holds the position of the first entry in this row, while H_ptr(n) holds the total number of entries. The column indices j, \(0 \leq j \leq i\), and values \(H_{ij}\) of the entries in the i-th row are stored in components l = H_ptr(i), …, H_ptr(i+1)-1 of the integer array H_col, and real array H_val, respectively. Note that as before only the entries in the lower triangle should be stored. For sparse matrices, this scheme almost always requires less storage than its predecessor. The string H_type = ‘sparse_by_rows’ should be specified.

Diagonal storage format: If \(H\) is diagonal (i.e., \(H_{ij} = 0\) for all \(0 \leq i \neq j \leq n-1\)) only the diagonals entries \(H_{ii}\), \(0 \leq i \leq n-1\) need be stored, and the first n components of the array H_val may be used for the purpose. The string H_type = ‘diagonal’ should be specified.

Multiples of the identity storage format: If \(H\) is a multiple of the identity matrix, (i.e., \(H = \alpha I\) where \(I\) is the n by n identity matrix and \(\alpha\) is a scalar), it suffices to store \(\alpha\) as the first component of H_val. The string H_type = ‘scaled_identity’ should be specified.

The identity matrix format: If \(H\) is the identity matrix, no values need be stored. The string H_type = ‘identity’ should be specified.

The zero matrix format: The same is true if \(H\) is the zero matrix, but now the string H_type = ‘zero’ or ‘none’ should be specified.

functions#

trs.initialize()#

Set default option values and initialize private data

Returns:

optionsdict
dictionary containing default control options:
errorint

error and warning diagnostics occur on stream error.

outint

general output occurs on stream out.

problemint

unit to write problem data into file problem_file.

print_levelint

the level of output required is specified by print_level. Possible values are

  • <=0

    gives no output,

  • 1

    gives a one-line summary for every iteration.

  • 2

    gives a summary of the inner iteration for each iteration.

  • >=3

    gives increasingly verbose (debugging) output.

dense_factorizationint

should the problem be solved by dense factorization? Possible values are

  • 0

    sparse factorization will be used

  • 1

    dense factorization will be used

  • other

    the choice is made automatically depending on the dimension & sparsity.

new_hint

how much of \(H\) has changed since the previous call. Possible values are

  • 0

    unchanged

  • 1

    values but not indices have changed

  • 2

    values and indices have changed.

new_mint

how much of \(M\) has changed since the previous call. Possible values are

  • 0

    unchanged

  • 1

    values but not indices have changed

  • 2

    values and indices have changed.

new_aint

how much of \(A\) has changed since the previous call. Possible values are

  • 0

    unchanged

  • 1

    values but not indices have changed

  • 2

    values and indices have changed.

max_factorizationsint

the maximum number of factorizations (=iterations) allowed. -ve implies no limit.

inverse_itmaxint

the number of inverse iterations performed in the “maybe hard” case.

taylor_max_degreeint

maximum degree of Taylor approximant allowed.

initial_multiplierfloat

initial estimate of the Lagrange multipler.

lowerfloat

lower and upper bounds on the multiplier, if known.

upperfloat

see lower.

stop_normalfloat

stop when \(| ||x|| - \Delta | \leq\) max( stop_normal * \(\Delta\), stop_absolute_normal ).

stop_absolute_normalfloat

see stop_normal.

stop_hardfloat

stop when bracket on optimal multiplier <= stop_hard * max( bracket ends ).

start_invit_tolfloat

start inverse iteration when bracket on optimal multiplier <= stop_start_invit_tol * max( bracket ends ).

start_invitmax_tolfloat

start full inverse iteration when bracket on multiplier <= stop_start_invitmax_tol * max( bracket ends).

equality_problembool

is the solution is required to lie on the boundary (i.e., is the constraint an equality)?.

use_initial_multiplierbool

ignore initial_multiplier?.

initialize_approx_eigenvectorbool

should a suitable initial eigenvector should be chosen or should a previous eigenvector may be used?.

force_Newtonbool

ignore the trust-region if \(H\) is positive definite.

space_criticalbool

if space_critical is True, every effort will be made to use as little space as possible. This may result in longer computation time.

deallocate_error_fatalbool

if deallocate_error_fatal is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.

problem_filestr

name of file into which to write problem data.

symmetric_linear_solverstr

symmetric (indefinite) linear equation solver.

definite_linear_solverstr

definite linear equation solver.

prefixstr

all output lines will be prefixed by the string contained in quotes within prefix, e.g. ‘word’ (note the qutoes) will result in the prefix word.

sls_optionsdict

default control options for SLS (see sls.initialize).

ir_optionsdict

default control options for IR (see ir.initialize).

trs.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=None)#

Import problem data into internal storage prior to solution.

Parameters:

nint

holds the number of variables.

H_typestring

specifies the symmetric storage scheme used for the Hessian \(H\). It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’, ‘scaled_identity’, ‘identity’, ‘zero’ or ‘none’; lower or upper case variants are allowed.

H_neint

holds the number of entries in the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes.

H_rowndarray(H_ne)

holds the row indices of the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes, and in this case can be None.

H_colndarray(H_ne)

holds the column indices of the lower triangular part of \(H\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the other storage schemes are used, and in this case can be None.

H_ptrndarray(n+1)

holds the starting position of each row of the lower triangular part of \(H\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None.

optionsdict, optional

dictionary of control options (see trs.initialize).

trs.load_m(n, M_type, M_ne, M_row, M_col, M_ptr, options=None)#

Import problem data for the scaling matrix \(M\), if needed, into internal storage prior to solution.

Parameters:

nint

holds the number of variables.

M_typestring

specifies the symmetric storage scheme used for the Hessian \(H\). It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’, ‘scaled_identity’, ‘identity’, ‘zero’ or ‘none’; lower or upper case variants are allowed.

M_neint

holds the number of entries in the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes.

M_rowndarray(M_ne)

holds the row indices of the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes, and in this case can be None.

M_colndarray(M_ne)

holds the column indices of the lower triangular part of \(H\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the other storage schemes are used, and in this case can be None.

M_ptrndarray(n+1)

holds the starting position of each row of the lower triangular part of \(H\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None.

optionsdict, optional

dictionary of control options (see trs.initialize).

trs.load_a(m, A_type, A_ne, A_row, A_col, A_ptr, options=None)#

Import problem data for the constraint matrix \(A\), if needed, into internal storage prior to solution.

Parameters:

mint

holds the number of constraints.

A_typestring

specifies the unsymmetric storage scheme used for the Hessian \(A\). It should be one of ‘coordinate’, ‘sparse_by_rows’ or ‘dense’; lower or upper case variants are allowed.

A_neint

holds the number of entries in \(A\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes.

A_rowndarray(A_ne)

holds the row indices of \(A\) in the sparse co-ordinate storage scheme. It need not be set for any of the other schemes, and in this case can be None.

A_colndarray(A_ne)

holds the column indices of \(A\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the other storage schemes are used, and in this case can be None.

A_ptrndarray(m+1)

holds the starting position of \(A\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None.

optionsdict, optional

dictionary of control options (see trs.initialize).

trs.solve_problem(n, radius, f, g, H_ne, H_val, M_ne, M_val, m, A_ne, A_val)#

Find the global minimizer of the quadratic objective function \(q(x)\) within the intersection of the trust-region and affine constraints.

Parameters:

nint

holds the number of variables.

radiusfloat

holds the strictly positive trust-region radius, \(\Delta\).

ffloat

holds the constant term \(f\) in the objective function.

gndarray(n)

holds the values of the linear term \(g\) in the objective function.

H_neint

holds the number of entries in the lower triangular part of the Hessian \(H\).

H_valndarray(H_ne)

holds the values of the nonzeros in the lower triangle of the Hessian \(H\) in the same order as specified in the sparsity pattern in trs.load.

M_neint

holds the number of entries in the lower triangular part of the scaling matrix \(M\) if it is not the identity matrix. Otherwise it should be None.

M_valndarray(M_ne)

holds the values of the nonzeros in the lower triangle of the scaling matrix \(M\) in the same order as specified in the sparsity pattern in trs.load_m if needed. Otherwise it should be None.

mint

holds the number of constraints.

A_neint

holds the number of entries in the lower triangular part of the constraint matrix \(A\) if \(m > 0\). Otherwise it should be None.

A_valndarray(A_ne)

holds the values of the nonzeros in the lower triangle of the constraint matrix \(A\) in the same order as specified in the sparsity pattern in trs.load_a if needed. Otherwise it should be None.

Returns:

xndarray(n)

holds the values of the approximate minimizer \(x\) after a successful call.

yndarray(m)

holds the values of the Lagrange multipliers associated with the affine constraints, if any. Absent if trs.load_a has not been called.

[optional] trs.information()

Provide optional output information

Returns:

informdict
dictionary containing output information:
statusint

return status. Possible values are:

  • 0

    The run was successful.

  • -1

    An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -2

    A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -3 The restriction n > 0, m > 0, radius > 0, or requirement that type contains its relevant string ‘dense’, ‘coordinate’, ‘sparse_by_rows’, ‘diagonal’, ‘scaled_identity’, ‘identity’, ‘zero’ or ‘none’ has been violated.

  • -9

    The analysis phase of the factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -10

    The factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -11

    The solution of a set of linear equations using factors from the factorization package failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -15

    \(M\) does not appear to be strictly diagonally dominant

  • -16

    The problem is so ill-conditioned that further progress is impossible.

  • -19

    The CPU time limit has been reached. This may happen if options[‘cpu_time_limit’] is too small, but may also be symptomatic of a badly scaled problem.

  • -23

    An entry from the strict upper triangle of \(H\) has been specified.

alloc_statusint

the status of the last attempted allocation/deallocation.

bad_allocstr

the name of the array for which an allocation/deallocation error occurred.

factorizationsint

the number of factorizations performed.

max_entries_factorslong

the maximum number of entries in the factors.

len_historyint

the number of \((||x||_M,\lambda)\) pairs in the history.

objfloat

the value of the quadratic function.

x_normfloat

the \(M\)-norm of \(x\), \(||x||_M\).

multiplierfloat

the Lagrange multiplier corresponding to the trust-region constraint.

polefloat

a lower bound \(\max(0,-\lambda_1)\), where \(\lambda_1\) is the left-most eigenvalue of \((H,M)\).

dense_factorizationbool

was a dense factorization used?.

hard_casebool

has the hard case occurred?.

timedict
dictionary containing timing information:
totalfloat

total CPU time spent in the package.

assemblefloat

CPU time spent building \(H + \lambda M\).

analysefloat

CPU time spent reordering \(H + \lambda M\) prior to factorization.

factorizefloat

CPU time spent factorizing \(H + \lambda M\).

solvefloat

CPU time spent solving linear systems inolving \(H + \lambda M\).

clock_totalfloat

total clock time spent in the package.

clock_assemblefloat

clock time spent building \(H + \lambda M\).

clock_analysefloat

clock time spent reordering \(H + \lambda M\) prior to factorization.

clock_factorizefloat

clock time spent factorizing \(H + \lambda M\).

clock_solvefloat

clock time spent solving linear systems inolving \(H + \lambda M\).

historydict
dictionary recording the history of the iterates:
lambdandarray(100)

the values of \(\lambda\) for the first min(100, len_history) iterations.

x_normndarray(100)

the corresponding values of \(\|x(\lambda)\|_M\).

sls_informdict

inform parameters for SLS (see sls.information).

ir_informdict

inform parameters for IR (see ir.information).

trs.terminate()#

Deallocate all internal private storage.

example code#

from galahad import trs
import numpy as np
np.set_printoptions(precision=4,suppress=True,floatmode='fixed')
print("\n** python test: trs")

# set parameters
p = 1.0
n = 3
m = 1
infinity = float("inf")

#  describe objective function

f = 0.96
g = np.array([0.0,2.0,0.0])
H_type = 'coordinate'
H_ne = 4
H_row = np.array([0,1,2,2])
H_col = np.array([0,1,2,0])
H_ptr = None
H_val = np.array([1.0,2.0,3.0,4.0])

#  describe norm

M_type = 'coordinate'
M_ne = 3
M_row = np.array([0,1,2])
M_col = np.array([0,1,2])
M_ptr = None
M_val = np.array([1.0,2.0,1.0])

#  describe constraint

A_type = 'coordinate'
A_ne = 3
A_row = np.array([0,0,0])
A_col = np.array([0,1,2])
A_ptr = None
A_val = np.array([1.0,1.0,1.0])

# set trust-region radius

radius = 1.0

# allocate internal data and set default options
options = trs.initialize()

# set some non-default options
options['print_level'] = 0
#print("options:", options)

# load data (and optionally non-default options)
trs.load(n, H_type, H_ne, H_row, H_col, H_ptr, options)

# find minimum of quadratic within the trust region
print("\n solve problem 1")
x = trs.solve_problem(n, radius, f, g, H_ne, H_val)
print(" x:",x)

# get information
inform = trs.information()
print(" f: %.4f" % inform['obj'])

# load data (and optionally non-default options)
trs.load_m(n, M_type, M_ne, M_row, M_col, M_ptr)

# find minimum of quadratic within the trust region
print("\n solve problem 2 with additional non-unit norm")
x = trs.solve_problem(n, radius, f, g, H_ne, H_val, M_ne, M_val)
print(" x:",x)

# get information
inform = trs.information()
print(" f: %.4f" % inform['obj'])

# load data (and optionally non-default options)
trs.load_a(m, A_type, A_ne, A_row, A_col, A_ptr)

# find minimum of quadratic within the trust region
print("\n solve problem 3 with additional linear constraint")
x, y = trs.solve_problem(n, radius, f, g, H_ne, H_val,
                         M_ne, M_val, m, A_ne, A_val)
print(" x:",x)
print(" y:",y)

# get information
inform = trs.information()
print(" f: %.4f" % inform['obj'])
print('** trs exit status:', inform['status'])

# deallocate internal data

trs.terminate()

This example code is available in $GALAHAD/src/trs/Python/test_trs.py .