TRB#

purpose#

The trb package uses a trust-region method to find a (local) minimizer of a differentiable objective function $f(x)$ of many variables $x$, where the variables satisfy the simple bounds $x^l <= x <= x^u$. The method offers the choice of direct and iterative solution of the key subproblems, and is most suitable for large problems. First derivatives are required, and if second derivatives can be calculated, they will be exploited.

See Section 4 of $GALAHAD/doc/trb.pdf for additional details.

method#

A trust-region method is used. In this, an improvement to a current estimate of the required minimizer, $x_k$ is sought by computing a step $s_k$. The step is chosen to approximately minimize a model $m_k(s)$ of $f(x_k + s)$ within the intersection of the bound constraints $x^l \leq x \leq x^u$ and a trust region $\|s_k\| \leq \Delta_k$ for some specified positive “radius” $\Delta_k$. The quality of the resulting step $s_k$ is assessed by computing the “ratio” $(f(x_k) - f(x_k + s_k))/ (m_k(0) - m_k(s_k))$. The step is deemed to have succeeded if the ratio exceeds a given $\eta_s > 0$, and in this case $x_{k+1} = x_k + s_k$. Otherwise $x_{k+1} = x_k$, and the radius is reduced by powers of a given reduction factor until it is smaller than $\|s_k\|$. If the ratio is larger than $\eta_v \geq \eta_d$, the radius will be increased so that it exceeds $\|s_k\|$ by a given increase factor. The method will terminate as soon as $\|\nabla_x f(x_k)\|$ is smaller than a specified value.

Either linear or quadratic models $m_k(s)$ may be used. The former will be taken as the first two terms $f(x_k) + s^T \nabla_x f(x_k)$ of a Taylor series about $x_k$, while the latter uses an approximation to the first three terms $f(x_k) + s^T \nabla_x f(x_k) + \frac{1}{2} s^T B_k s$, for which $B_k$ is a symmetric approximation to the Hessian $\nabla_{xx} f(x_k)$; possible approximations include the true Hessian, limited-memory secant and sparsity approximations and a scaled identity matrix. Normally a two-norm trust region will be used, but this may change if preconditioning is employed.

The model minimization is carried out in two stages. Firstly, the so-called generalized Cauchy point for the quadratic subproblem is found—the purpose of this point is to ensure that the algorithm converges and that the set of bounds which are satisfied as equations at the solution is rapidly identified. Thereafter an improvement to the quadratic model on the face of variables predicted to be active by the Cauchy point is sought using either a direct approach involving factorization or an iterative (conjugate-gradient/Lanczos) approach based on approximations to the required solution from a so-called Krlov subspace. The direct approach is based on the knowledge that the required solution satisfies the linear system of equations $(B_k + \lambda_k I) s_k = - \nabla_x f(x_k)$, involving a scalar Lagrange multiplier $\lambda_k$, on the space of inactive variables. This multiplier is found by uni-variate root finding, using a safeguarded Newton-like process, by TRS or DPS (depending on the norm chosen). The iterative approach uses GLTR, and is best accelerated by preconditioning with good approximations to $B_k$ using PSLS. The iterative approach has the advantage that only matrix-vector products $B_k v$ are required, and thus $B_k$ is not required explicitly. However when factorizations of $B_k$ are possible, the direct approach is often more efficient.

The iteration is terminated as soon as the Euclidean norm of the projected gradient,

\[\|\min(\max( x_k - \nabla_x f(x_k), x^l), x^u) -x_k\|_2,\]

is sufficiently small. At such a point, $\nabla_x f(x_k) = z_k$, where the $i$-th dual variable $z_i$ is non-negative if $x_i$ is on its lower bound $x^l_i$, non-positive if $x_i$ is on its upper bound $x^u_i$, and zero if $x_i$ lies strictly between its bounds.

reference#

The generic bound-constrained trust-region method is described in detail in

A. R. Conn, N. I. M. Gould and Ph. L. Toint, Trust-region methods. SIAM/MPS Series on Optimization (2000).

matrix storage#

The symmetric $n$ by $n$ matrix $H = \nabla^2_{xx}f$ may be presented and stored in a variety of formats. But crucially symmetry is exploited by only storing values from the lower triangular part (i.e, those entries that lie on or below the leading diagonal).

Dense storage format: The matrix $H$ is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. Since $H$ is symmetric, only the lower triangular part (that is the part $H_{ij}$ for $0 <= j <= i <= n-1$) need be held. In this case the lower triangle should be stored by rows, that is component $i * i / 2 + j$ of the storage array H_val will hold the value $H_{ij}$ (and, by symmetry, $H_{ji}$) for $0 <= j <= i <= n-1$.

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the $l$-th entry, $0 <= l <= ne-1$, of $H$, its row index i, column index j and value $H_{ij}$, $0 <= j <= i <= n-1$, are stored as the $l$-th components of the integer arrays H_row and H_col and real array H_val, respectively, while the number of nonzeros is recorded as H_ne = $ne$. Note that only the entries in the lower triangle should be stored.

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of $H$ the i-th component of the integer array H_ptr holds the position of the first entry in this row, while H_ptr(n) holds the total number of entries. The column indices j, $0 <= j <= i$, and values $H_{ij}$ of the entries in the i-th row are stored in components l = H_ptr(i), …, H_ptr(i+1)-1 of the integer array H_col, and real array H_val, respectively. Note that as before only the entries in the lower triangle should be stored. For sparse matrices, this scheme almost always requires less storage than its predecessor.

functions#

trb.initialize()#

Set default option values and initialize private data

Returns:

optionsdict

dictionary containing default control options:

errorint
error and warning diagnostics occur on stream error.

outint
general output occurs on stream out.

print_levelint
the level of output required. Possible values are:

<= 0

no output

1

a one-line summary for every improvement

2

a summary of each iteration

>= 3

increasingly verbose (debugging) output.

start_printint
any printing will start on this iteration.

stop_printint
any printing will stop on this iteration.

print_gapint
the number of iterations between printing.

maxitint
the maximum number of iterations performed.

alive_unitint
removal of the file alive_file from unit alive_unit terminates execution.

alive_filestr
see alive_unit.

more_toraldoint
more_toraldo >= 1 gives the number of More’-Toraldo projected searches to be used to improve upon the Cauchy point, anything else is for the standard add-one-at-a-time CG search.

non_monotoneint
non-monotone <= 0 monotone strategy used, anything else non-monotone strategy with this history length used.

modelint
the model used. Possible values are

0

dynamic (not yet implemented)

1

first-order (no Hessian)

2

second-order (exact Hessian)

3

barely second-order (identity Hessian)

4

secant second-order (sparsity-based)

5

secant second-order (limited-memory BFGS, with``lbfgs_vectors`` history) (not yet implemented)

6

secant second-order (limited-memory SR1, with lbfgs_vectors history) (not yet implemented).

normint
The norm is defined via $||v||^2 = v^T P v$, and will define the preconditioner used for iterative methods. Possible values for $P$ are

-3

users own preconditioner

-2

$P =$ limited-memory BFGS matrix (with lbfgs_vectors history)

-1

identity (= Euclidan two-norm)

0

automatic (not yet implemented)

1

diagonal, $P =$ diag( max( Hessian, min_diagonal ) )

2

banded, $P =$ band( Hessian ) with semi-bandwidth semi_bandwidth

3

re-ordered band, P=band(order(A)) with semi-bandwidth semi_bandwidth

4

full factorization, $P =$ Hessian, Schnabel-Eskow modification

5

full factorization, $P =$ Hessian, GMPS modification (not yet implemented)

6

incomplete factorization of Hessian, Lin-More’

7

incomplete factorization of Hessian, HSL_MI28

8

incomplete factorization of Hessian, Munskgaard (not yet implemented)

9

expanding band of Hessian (not yet implemented).

semi_bandwidthint
specify the semi-bandwidth of the band matrix $P$ if required.

lbfgs_vectorsint
number of vectors used by the L-BFGS matrix $P$ if required.

max_dxgint
number of vectors used by the sparsity-based secant Hessian if required.

icfs_vectorsint
number of vectors used by the Lin-More’ incomplete factorization matrix $P$ if required.

mi28_lsizeint
the maximum number of fill entries within each column of the incomplete factor L computed by HSL_MI28. In general, increasing mi28_lsize improve the quality of the preconditioner but increases the time to compute and then apply the preconditioner. Values less than 0 are treated as 0.

mi28_rsizeint
the maximum number of entries within each column of the strictly lower triangular matrix $R$ used in the computation of the preconditioner by HSL_MI28. Rank-1 arrays of size mi28_rsize * n are allocated internally to hold $R$. Thus the amount of memory used, as well as the amount of work involved in computing the preconditioner, depends on mi28_rsize. Setting mi28_rsize > 0 generally leads to a higher quality preconditioner than using mi28_rsize = 0, and choosing mi28_rsize >= mi28_lsize is generally recommended.

advanced_startint
iterates of a variant on the strategy of Sartenaer SISC 18(6)1990:1788-1803.

infinityfloat
any bound larger than infinity in modulus will be regarded as infinite.

stop_pg_absolutefloat
overall convergence tolerances. The iteration will terminate when the norm of the gradient of the objective function is smaller than MAX( stop_pg_absolute, stop_pg_relative * norm of the initial gradient ) or if the step is less than stop_s.

stop_pg_relativefloat
see stop_pg_absolute.

stop_sfloat
see stop_pg_absolute.

initial_radiusfloat
initial value for the trust-region radius.

maximum_radiusfloat
maximum permitted trust-region radius.

stop_rel_cgfloat
required relative reduction in the resuiduals from CG.

eta_successfulfloat
a potential iterate will only be accepted if the actual decrease f - f(x_new) is larger than eta_successful times that predicted by a quadratic model of the decrease. The trust-region radius will be increased if this relative decrease is greater than eta_very_successful but smaller than eta_too_successful.

eta_very_successfulfloat
see eta_successful.

eta_too_successfulfloat
see eta_successful.

radius_increasefloat
on very successful iterations, the trust-region radius will be increased the factor radius_increase, while if the iteration is unsucceful, the radius will be decreased by a factor radius_reduce but no more than radius_reduce_max.

radius_reducefloat
see radius_increase.

radius_reduce_maxfloat
see radius_increase.

obj_unboundedfloat
the smallest value the objective function may take before the problem is marked as unbounded.

cpu_time_limitfloat
the maximum CPU time allowed (-ve means infinite).

clock_time_limitfloat
the maximum elapsed clock time allowed (-ve means infinite).

hessian_availablebool
is the Hessian matrix of second derivatives available or is access only via matrix-vector products?.

subproblem_directbool
use a direct (factorization) or (preconditioned) iterative method to find the search direction.

retrospective_trust_regionbool
is a retrospective strategy to be used to update the trust-region radius.

renormalize_radiusbool
should the radius be renormalized to account for a change in preconditioner?.

two_norm_trbool
should an ellipsoidal trust-region be used rather than an infinity norm one?.

exact_gcpbool
is the exact Cauchy point required rather than an approximation?.

accurate_bqpbool
should the minimizer of the quadratic model within the intersection of the trust-region and feasible box be found (to a prescribed accuracy) rather than a (much) cheaper approximation?.

space_criticalbool
if space_critical is True, every effort will be made to use as little space as possible. This may result in longer computation time.

deallocate_error_fatalbool
if deallocate_error_fatal is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.

prefixstr
all output lines will be prefixed by the string contained in quotes within prefix, e.g. ‘word’ (note the qutoes) will result in the prefix word.

trs_optionsdict
default control options for TRS (see trs.initialize).

gltr_optionsdict
default control options for GLTR (see gltr.initialize).

psls_optionsdict
default control options for PSLS (see psls.initialize).

lms_optionsdict
default control options for LMS (see lms.initialize).

lms_prec_optionsdict
default control options for LMS (see lms.initialize).

sha_optionsdict
default control options for SHA (see sha.initialize).

trb.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=None)#

Import problem data into internal storage prior to solution.

Parameters:

nint
holds the number of variables.

H_typestring
specifies the symmetric storage scheme used for the Hessian. It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’ or ‘absent’, the latter if access to the Hessian is via matrix-vector products; lower or upper case variants are allowed.

H_neint
holds the number of entries in the lower triangular part of $H$ in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes.

H_rowndarray(H_ne)
holds the row indices of the lower triangular part of $H$ in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes, and in this case can be None

H_colndarray(H_ne)
holds the column indices of the lower triangular part of $H$ in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the dense or diagonal storage schemes are used, and in this case can be None

H_ptrndarray(n+1)
holds the starting position of each row of the lower triangular part of $H$, as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None

optionsdict, optional
dictionary of control options (see trb.initialize).

trb.solve(n, H_ne, x_l, x_u, x, g, eval_f, eval_g, eval_h))#

Find an approximate local minimizer of a given function subject to simple bounds on the variables using a trust-region method.

Parameters:

nint
holds the number of variables.

H_neint
holds the number of entries in the lower triangular part of $H$.

x_lndarray(n)
holds the values $x^l$ of the lower bounds on the optimization variables $x$.

x_undarray(n)
holds the values $x^u$ of the upper bounds on the optimization variables $x$.

xndarray(n)
holds the values of optimization variables $x$.

eval_fcallable
a user-defined function that must have the signature:

f = eval_f(x)

The value of the objective function $f(x)$ evaluated at $x$ must be assigned to f.

eval_gcallable
a user-defined function that must have the signature:

g = eval_g(x)

The components of the gradient $\nabla f(x)$ of the objective function evaluated at $x$ must be assigned to g.

eval_hcallable
a user-defined function that must have the signature:

h = eval_h(x)

The components of the nonzeros in the lower triangle of the Hessian $\nabla^2 f(x)$ of the objective function evaluated at $x$ must be assigned to h in the same order as specified in the sparsity pattern in trb.load.

Returns:

xndarray(n)
holds the value of the approximate global minimizer $x$ after a successful call.

gndarray(n)
holds the gradient $\nabla f(x)$ of the objective function.

[optional] trb.information()

Provide optional output information

Returns:

informdict

dictionary containing output information:

statusint
return status. Possible values are:

0

The run was successful.

-1

An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

-2

A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

-3

The restriction n > 0 or requirement that type contains its relevant string ‘dense’, ‘coordinate’, ‘sparse_by_rows’, ‘diagonal’ or ‘absent’ has been violated.

-7

The objective function appears to be unbounded from below.

-9

The analysis phase of the factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

-10

The factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

-11

The solution of a set of linear equations using factors from the factorization package failed; the return status from the factorization package is given by inform[‘factor_status’].

-16

The problem is so ill-conditioned that further progress is impossible.

-18

Too many iterations have been performed. This may happen if options[‘maxit’] is too small, but may also be symptomatic of a badly scaled problem.

-19

The CPU time limit has been reached. This may happen if options[‘cpu_time_limit’] is too small, but may also be symptomatic of a badly scaled problem.

-82

The user has forced termination of the solver by removing the file named options[‘alive_file’] from unit options[‘alive_unit’].

alloc_statusint
the status of the last attempted allocation/deallocation.

bad_allocstr
the name of the array for which an allocation/deallocation error occurred.

iterint
the total number of iterations performed.

cg_iterint
the total number of CG iterations performed.

cg_maxitint
the maximum number of CG iterations allowed per iteration.

f_evalint
the total number of evaluations of the objective function.

g_evalint
the total number of evaluations of the gradient of the objective function.

h_evalint
the total number of evaluations of the Hessian of the objective function.

n_freeint
the number of variables that are free from their bounds.

factorization_maxint
the maximum number of factorizations in a sub-problem solve.

factorization_statusint
the return status from the factorization.

max_entries_factorsint
the maximum number of entries in the factors.

factorization_integerint
the total integer workspace required for the factorization.

factorization_realint
the total real workspace required for the factorization.

objfloat
the value of the objective function at the best estimate of the solution determined by trb.solve.

norm_pgfloat
the norm of the projected gradient of the objective function at the best estimate of the solution determined by TRB_solve.

radiusfloat
the current value of the trust-region radius.

timedict

dictionary containing timing information:

totalfloat
the total CPU time spent in the package.

preprocessfloat
the CPU time spent preprocessing the problem.

analysefloat
the CPU time spent analysing the required matrices prior to factorization.

factorizefloat
the CPU time spent factorizing the required matrices.

solvefloat
the CPU time spent computing the search direction.

clock_totalfloat
the total clock time spent in the package.

clock_preprocessfloat
the clock time spent preprocessing the problem.

clock_analysefloat
the clock time spent analysing the required matrices prior to factorization.

clock_factorizefloat
the clock time spent factorizing the required matrices.

clock_solvefloat
the clock time spent computing the search direction.

trs_informdict
inform parameters for TRS (see trs.information).

gltr_informdict
inform parameters for GLTR (see gltr.information).

psls_informdict
inform parameters for PSLS (see psls.information).

lms_informdict
inform parameters for LMS (see lms.information).

lms_prec_informdict
inform parameters for LMS used for preconditioning (see lms.information).

sha_informdict
inform parameters for SHA (see sha.information).

trb.terminate()#

Deallocate all internal private storage.

example code#

from galahad import trb
import numpy as np
np.set_printoptions(precision=4,suppress=True,floatmode='fixed')
print("\n** python test: trb")

# allocate internal data and set default options
options = trb.initialize()

# set some non-default options
options['print_level'] = 0
#options['trs_options']['print_level'] = 0
#print("options:", options)

# set parameters
p = 4
# set bounds
n = 3
x_l = np.array([-np.inf,-np.inf,0.0])
x_u = np.array([1.1,1.1,1.1])

# set Hessian sparsity
H_type = 'coordinate'
H_ne = 5
H_row = np.array([0,2,1,2,2])
H_col = np.array([0,0,1,1,2])
H_ptr = None

# load data (and optionally non-default options)
trb.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=options)

# define objective function and its derivatives
def eval_f(x):
    return (x[0] + x[2] + p)**2 + (x[1] + x[2])**2 + np.cos(x[0])
def eval_g(x):
    return np.array([2.0* ( x[0] + x[2] + p ) - np.sin(x[0]),
                     2.0* ( x[1] + x[2] ),
                     2.0* ( x[0] + x[2] + p ) + 2.0 * ( x[1] + x[2] )])
def eval_h(x):
    return np.array([2. - np.cos(x[0]),2.0,2.0,2.0,4.0])

# set starting point
x = np.array([1.,1.,1.])

# find optimum
x, g = trb.solve(n, H_ne, x_l, x_u, x, eval_f, eval_g, eval_h)
print(" x:",x)
print(" g:",g)

# get information
inform = trb.information()
#print(inform)
print(" f: %.4f" % inform['obj'])
print('** trb exit status:', inform['status'])

# deallocate internal data
trb.terminate()

This example code is available in $GALAHAD/src/trb/Python/test_trb.py .