TRU#

purpose#

The tru package uses a trust-region method to find a (local) minimizer of a differentiable objective function \(f(x)\) of many variables \(x\). The method offers the choice of direct and iterative solution of the key subproblems, and is most suitable for large problems. First derivatives are required, and if second derivatives can be calculated, they will be exploited.

See Section 4 of $GALAHAD/doc/tru.pdf for additional details.

method#

A trust-region method is used. In this, an improvement to a current estimate of the required minimizer, \(x_k\) is sought by computing a step \(s_k\). The step is chosen to approximately minimize a model \(m_k(s)\) of \(f(x_k + s)\) within a trust region \(\|s_k\| \leq \Delta_k\) for some specified positive “radius” \(\Delta_k\). The quality of the resulting step \(s_k\) is assessed by computing the “ratio” \((f(x_k) - f(x_k + s_k))/ (m_k(0) - m_k(s_k))\). The step is deemed to have succeeded if the ratio exceeds a given \(\eta_s > 0\), and in this case \(x_{k+1} = x_k + s_k\). Otherwise \(x_{k+1} = x_k\), and the radius is reduced by powers of a given reduction factor until it is smaller than \(\|s_k\|\). If the ratio is larger than \(\eta_v \geq \eta_d\), the radius will be increased so that it exceeds \(\|s_k\|\) by a given increase factor. The method will terminate as soon as \(\|\nabla_x f(x_k)\|\) is smaller than a specified value.

Either linear or quadratic models \(m_k(s)\) may be used. The former will be taken as the first two terms \(f(x_k) + s^T \nabla_x f(x_k)\) of a Taylor series about \(x_k\), while the latter uses an approximation to the first three terms \(f(x_k) + s^T \nabla_x f(x_k) + \frac{1}{2} s^T B_k s\), for which \(B_k\) is a symmetric approximation to the Hessian \(\nabla_{xx} f(x_k)\); possible approximations include the true Hessian, limited-memory secant and sparsity approximations and a scaled identity matrix. Normally a two-norm trust region will be used, but this may change if preconditioning is employed.

An approximate minimizer of the model within the trust region is found using either a direct approach involving factorization or an iterative (conjugate-gradient/Lanczos) approach based on approximations to the required solution from a so-called Krlov subspace. The direct approach is based on the knowledge that the required solution satisfies the linear system of equations \((B_k + \lambda_k I) s_k = - \nabla_x f(x_k)\) involving a scalar Lagrange multiplier \(\lambda_k\). This multiplier is found by uni-variate root finding, using a safeguarded Newton-like process, by TRS or DPS (depending on the norm chosen). The iterative approach uses GLTR, and is best accelerated by preconditioning with good approximations to \(B_k\) using PSLS. The iterative approach has the advantage that only matrix-vector products \(B_k v\) are required, and thus \(B_k\) is not required explicitly. However when factorizations of \(B_k\) are possible, the direct approach is often more efficient.

reference#

The generic trust-region method is described in detail in

A. R. Conn, N. I. M. Gould and Ph. L. Toint, Trust-region methods. SIAM/MPS Series on Optimization (2000).

matrix storage#

The symmetric \(n\) by \(n\) matrix \(H = \nabla^2_{xx}f\) may be presented and stored in a variety of formats. But crucially symmetry is exploited by only storing values from the lower triangular part (i.e, those entries that lie on or below the leading diagonal).

Dense storage format: The matrix \(H\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. Since \(H\) is symmetric, only the lower triangular part (that is the part \(H_{ij}\) for \(0 <= j <= i <= n-1\)) need be held. In this case the lower triangle should be stored by rows, that is component \(i * i / 2 + j\) of the storage array H_val will hold the value \(H_{ij}\) (and, by symmetry, \(H_{ji}\)) for \(0 <= j <= i <= n-1\).

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 <= l <= ne-1\), of \(H\), its row index i, column index j and value \(H_{ij}\), \(0 <= j <= i <= n-1\), are stored as the \(l\)-th components of the integer arrays H_row and H_col and real array H_val, respectively, while the number of nonzeros is recorded as H_ne = \(ne\). Note that only the entries in the lower triangle should be stored.

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(H\) the i-th component of the integer array H_ptr holds the position of the first entry in this row, while H_ptr(n) holds the total number of entries. The column indices j, \(0 <= j <= i\), and values \(H_{ij}\) of the entries in the i-th row are stored in components l = H_ptr(i), …, H_ptr(i+1)-1 of the integer array H_col, and real array H_val, respectively. Note that as before only the entries in the lower triangle should be stored. For sparse matrices, this scheme almost always requires less storage than its predecessor.

functions#

tru.initialize()#

Set default option values and initialize private data

Returns:

optionsdict
dictionary containing default control options:
errorint

error and warning diagnostics occur on stream error.

outint

general output occurs on stream out.

print_levelint

the level of output required. Possible values are:

  • <= 0

    no output

  • 1

    a one-line summary for every improvement

  • 2

    a summary of each iteration

  • >= 3

    increasingly verbose (debugging) output.

start_printint

any printing will start on this iteration.

stop_printint

any printing will stop on this iteration.

print_gapint

the number of iterations between printing.

maxitint

the maximum number of iterations performed.

alive_unitint

removal of the file alive_file from unit alive_unit terminates execution.

alive_filestr

see alive_unit.

more_toraldoint

more_toraldo >= 1 gives the number of More’-Toraldo projected searches to be used to improve upon the Cauchy point, anything else is for the standard add-one-at-a-time CG search.

non_monotoneint

non-monotone <= 0 monotone strategy used, anything else non-monotone strategy with this history length used.

modelint

the model used. Possible values are

  • 0

    dynamic (not yet implemented)

  • 1

    first-order (no Hessian)

  • 2

    second-order (exact Hessian)

  • 3

    barely second-order (identity Hessian)

  • 4

    secant second-order (sparsity-based)

  • 5

    secant second-order (limited-memory BFGS, with``lbfgs_vectors`` history) (not yet implemented)

  • 6

    secant second-order (limited-memory SR1, with lbfgs_vectors history) (not yet implemented).

normint

The norm is defined via \(||v||^2 = v^T P v\), and will define the preconditioner used for iterative methods. Possible values for \(P\) are

  • -3

    users own preconditioner

  • -2

    \(P =\) limited-memory BFGS matrix (with lbfgs_vectors history)

  • -1

    identity (= Euclidan two-norm)

  • 0

    automatic (not yet implemented)

  • 1

    diagonal, \(P =\) diag( max( Hessian, min_diagonal ) )

  • 2

    banded, \(P =\) band( Hessian ) with semi-bandwidth semi_bandwidth

  • 3

    re-ordered band, P=band(order(A)) with semi-bandwidth semi_bandwidth

  • 4

    full factorization, \(P =\) Hessian, Schnabel-Eskow modification

  • 5

    full factorization, \(P =\) Hessian, GMPS modification (not yet implemented)

  • 6

    incomplete factorization of Hessian, Lin-More’

  • 7

    incomplete factorization of Hessian, HSL_MI28

  • 8

    incomplete factorization of Hessian, Munskgaard (not yet implemented)

  • 9

    expanding band of Hessian (not yet implemented).

semi_bandwidthint

specify the semi-bandwidth of the band matrix \(P\) if required.

lbfgs_vectorsint

number of vectors used by the L-BFGS matrix \(P\) if required.

max_dxgint

number of vectors used by the sparsity-based secant Hessian if required.

icfs_vectorsint

number of vectors used by the Lin-More’ incomplete factorization matrix \(P\) if required.

mi28_lsizeint

the maximum number of fill entries within each column of the incomplete factor L computed by HSL_MI28. In general, increasing mi28_lsize improve the quality of the preconditioner but increases the time to compute and then apply the preconditioner. Values less than 0 are treated as 0.

mi28_rsizeint

the maximum number of entries within each column of the strictly lower triangular matrix \(R\) used in the computation of the preconditioner by HSL_MI28. Rank-1 arrays of size mi28_rsize * n are allocated internally to hold \(R\). Thus the amount of memory used, as well as the amount of work involved in computing the preconditioner, depends on mi28_rsize. Setting mi28_rsize > 0 generally leads to a higher quality preconditioner than using mi28_rsize = 0, and choosing mi28_rsize >= mi28_lsize is generally recommended.

advanced_startint

try to pick a good initial trust-region radius using advanced_start iterates of a variant on the strategy of Sartenaer SISC 18(6) 1990:1788-1803.

stop_g_absolutefloat

overall convergence tolerances. The iteration will terminate when the norm of the gradient of the objective function is smaller than MAX( stop_g_absolute, stop_g_relative * norm of the initial gradient ) or if the step is less than stop_s.

stop_pg_relativefloat

see stop_g_absolute.

stop_sfloat

see stop_g_absolute.

initial_radiusfloat

initial value for the trust-region radius.

maximum_radiusfloat

maximum permitted trust-region radius.

eta_successfulfloat

a potential iterate will only be accepted if the actual decrease f - f(x_new) is larger than eta_successful times that predicted by a quadratic model of the decrease. The trust-region radius will be increased if this relative decrease is greater than eta_very_successful but smaller than eta_too_successful.

eta_very_successfulfloat

see eta_successful.

eta_too_successfulfloat

see eta_successful.

radius_increasefloat

on very successful iterations, the trust-region radius will be increased the factor radius_increase, while if the iteration is unsucceful, the radius will be decreased by a factor radius_reduce but no more than radius_reduce_max.

radius_reducefloat

see radius_increase.

radius_reduce_maxfloat

see radius_increase.

obj_unboundedfloat

the smallest value the objective function may take before the problem is marked as unbounded.

cpu_time_limitfloat

the maximum CPU time allowed (-ve means infinite).

clock_time_limitfloat

the maximum elapsed clock time allowed (-ve means infinite).

hessian_availablebool

is the Hessian matrix of second derivatives available or is access only via matrix-vector products?.

subproblem_directbool

use a direct (factorization) or (preconditioned) iterative method to find the search direction.

retrospective_trust_regionbool

is a retrospective strategy to be used to update the trust-region radius.

renormalize_radiusbool

should the radius be renormalized to account for a change in preconditioner?.

space_criticalbool

if space_critical is True, every effort will be made to use as little space as possible. This may result in longer computation time.

deallocate_error_fatalbool

if deallocate_error_fatal is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.

prefixstr

all output lines will be prefixed by the string contained in quotes within prefix, e.g. ‘word’ (note the qutoes) will result in the prefix word.

trs_optionsdict

default control options for TRS (see trs.initialize).

gltr_optionsdict

default control options for GLTR (see gltr.initialize).

psls_optionsdict

default control options for PSLS (see psls.initialize).

lms_optionsdict

default control options for LMS (see lms.initialize).

lms_prec_optionsdict

default control options for LMS (see lms.initialize).

sha_optionsdict

default control options for SHA (see sha.initialize).

tru.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=None)#

Import problem data into internal storage prior to solution.

Parameters:

nint

holds the number of variables.

H_typestring

specifies the symmetric storage scheme used for the Hessian. It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’ or ‘absent’, the latter if access to the Hessian is via matrix-vector products; lower or upper case variants are allowed.

H_neint

holds the number of entries in the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes.

H_rowndarray(H_ne)

holds the row indices of the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes, and in this case can be None

H_colndarray(H_ne)

holds the column indices of the lower triangular part of \(H\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the dense or diagonal storage schemes are used, and in this case can be None

H_ptrndarray(n+1)

holds the starting position of each row of the lower triangular part of \(H\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None

optionsdict, optional

dictionary of control options (see tru.initialize).

tru.solve(n, H_ne, x, eval_f, eval_g, eval_h))#

Find an approximate local unconstrained minimizer of a given function using a trust-region method.

Parameters:

nint

holds the number of variables.

H_neint

holds the number of entries in the lower triangular part of \(H\).

xndarray(n)

holds the values of optimization variables \(x\).

eval_fcallable

a user-defined function that must have the signature:

f = eval_f(x)

The value of the objective function \(f(x)\) evaluated at \(x\) must be assigned to f.

eval_gcallable

a user-defined function that must have the signature:

g = eval_g(x)

The components of the gradient \(\nabla f(x)\) of the objective function evaluated at \(x\) must be assigned to g.

eval_hcallable

a user-defined function that must have the signature:

h = eval_h(x)

The components of the nonzeros in the lower triangle of the Hessian \(\nabla^2 f(x)\) of the objective function evaluated at \(x\) must be assigned to h in the same order as specified in the sparsity pattern in tru.load.

Returns:

xndarray(n)

holds the value of the approximate global minimizer \(x\) after a successful call.

gndarray(n)

holds the gradient \(\nabla f(x)\) of the objective function.

[optional] tru.information()

Provide optional output information

Returns:

informdict
dictionary containing output information:
statusint

return status. Possible values are:

  • 0

    The run was successful.

  • -1

    An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -2

    A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -3

    The restriction n > 0 or requirement that type contains its relevant string ‘dense’, ‘coordinate’, ‘sparse_by_rows’, ‘diagonal’ or ‘absent’ has been violated.

  • -7

    The objective function appears to be unbounded from below.

  • -9

    The analysis phase of the factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -10

    The factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -11

    The solution of a set of linear equations using factors from the factorization package failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -16

    The problem is so ill-conditioned that further progress is impossible.

  • -18

    Too many iterations have been performed. This may happen if options[‘maxit’] is too small, but may also be symptomatic of a badly scaled problem.

  • -19

    The CPU time limit has been reached. This may happen if options[‘cpu_time_limit’] is too small, but may also be symptomatic of a badly scaled problem.

  • -82

    The user has forced termination of the solver by removing the file named options[‘alive_file’] from unit options[‘alive_unit’].

alloc_statusint

the status of the last attempted allocation/deallocation.

bad_allocstr

the name of the array for which an allocation/deallocation error occurred.

iterint

the total number of iterations performed.

cg_iterint

the total number of CG iterations performed.

f_evalint

the total number of evaluations of the objective function.

g_evalint

the total number of evaluations of the gradient of the objective function.

h_evalint

the total number of evaluations of the Hessian of the objective function.

factorization_maxint

the maximum number of factorizations in a sub-problem solve.

factorization_statusint

the return status from the factorization.

max_entries_factorsint

the maximum number of entries in the factors.

factorization_integerint

the total integer workspace required for the factorization.

factorization_realint

the total real workspace required for the factorization.

objfloat

the value of the objective function at the best estimate of the solution determined by tru.solve.

norm_gfloat

the norm of the gradient of the objective function at the best estimate of the solution determined by tru.solve.

radiusfloat

the current value of the trust-region radius.

timedict
dictionary containing timing information:
totalfloat

the total CPU time spent in the package.

preprocessfloat

the CPU time spent preprocessing the problem.

analysefloat

the CPU time spent analysing the required matrices prior to factorization.

factorizefloat

the CPU time spent factorizing the required matrices.

solvefloat

the CPU time spent computing the search direction.

clock_totalfloat

the total clock time spent in the package.

clock_preprocessfloat

the clock time spent preprocessing the problem.

clock_analysefloat

the clock time spent analysing the required matrices prior to factorization.

clock_factorizefloat

the clock time spent factorizing the required matrices.

clock_solvefloat

the clock time spent computing the search direction.

trs_informdict

inform parameters for TRS (see trs.information).

gltr_informdict

inform parameters for GLTR (see gltr.information).

psls_informdict

inform parameters for PSLS (see psls.information).

lms_informdict

inform parameters for LMS (see lms.information).

lms_prec_informdict

inform parameters for LMS used for preconditioning (see lms.information).

sha_informdict

inform parameters for SHA (see sha.information).

tru.terminate()#

Deallocate all internal private storage.

example code#

from galahad import tru
import numpy as np
np.set_printoptions(precision=4,suppress=True,floatmode='fixed')
print("\n** python test: tru")

# allocate internal data and set default options
options = tru.initialize()

# set some non-default options
options['print_level'] = 0
#options['trs_options']['print_level'] = 0
#print("options:", options)

# set parameters
p = 4
# set bounds
n = 3

# set Hessian sparsity
H_type = 'coordinate'
H_ne = 5
H_row = np.array([0,2,1,2,2])
H_col = np.array([0,0,1,1,2])
H_ptr = None

# load data (and optionally non-default options)
tru.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=options)

# define objective function and its derivatives
def eval_f(x):
    return (x[0] + x[2] + p)**2 + (x[1] + x[2])**2 + np.cos(x[0])
def eval_g(x):
    return np.array([2.0* ( x[0] + x[2] + p ) - np.sin(x[0]),
                     2.0* ( x[1] + x[2] ),
                     2.0* ( x[0] + x[2] + p ) + 2.0 * ( x[1] + x[2] )])
def eval_h(x):
    return np.array([2. - np.cos(x[0]),2.0,2.0,2.0,4.0])

# set starting point
x = np.array([1.,1.,1.])

# find optimum
x, g = tru.solve(n, H_ne, x, eval_f, eval_g, eval_h)
print(" x:",x)
print(" g:",g)

# get information
inform = tru.information()
#print(inform)
print(" f: %.4f" % inform['obj'])
print('** trb exit status:', inform['status'])

# deallocate internal data
tru.terminate()

This example code is available in $GALAHAD/src/tru/Python/test_tru.py .