TRU#
purpose#
The tru
package uses a trust-region method to find a (local)
minimizer of a differentiable objective function \(f(x)\) of
many variables \(x\). The method offers the choice of direct
and iterative solution of the key subproblems, and is most
suitable for large problems. First derivatives are required, and
if second derivatives can be calculated, they will be exploited.
See Section 4 of $GALAHAD/doc/tru.pdf for additional details.
method#
A trust-region method is used. In this, an improvement to a current estimate of the required minimizer, \(x_k\) is sought by computing a step \(s_k\). The step is chosen to approximately minimize a model \(m_k(s)\) of \(f(x_k + s)\) within a trust region \(\|s_k\| \leq \Delta_k\) for some specified positive “radius” \(\Delta_k\). The quality of the resulting step \(s_k\) is assessed by computing the “ratio” \((f(x_k) - f(x_k + s_k))/ (m_k(0) - m_k(s_k))\). The step is deemed to have succeeded if the ratio exceeds a given \(\eta_s > 0\), and in this case \(x_{k+1} = x_k + s_k\). Otherwise \(x_{k+1} = x_k\), and the radius is reduced by powers of a given reduction factor until it is smaller than \(\|s_k\|\). If the ratio is larger than \(\eta_v \geq \eta_d\), the radius will be increased so that it exceeds \(\|s_k\|\) by a given increase factor. The method will terminate as soon as \(\|\nabla_x f(x_k)\|\) is smaller than a specified value.
Either linear or quadratic models \(m_k(s)\) may be used. The former will be taken as the first two terms \(f(x_k) + s^T \nabla_x f(x_k)\) of a Taylor series about \(x_k\), while the latter uses an approximation to the first three terms \(f(x_k) + s^T \nabla_x f(x_k) + \frac{1}{2} s^T B_k s\), for which \(B_k\) is a symmetric approximation to the Hessian \(\nabla_{xx} f(x_k)\); possible approximations include the true Hessian, limited-memory secant and sparsity approximations and a scaled identity matrix. Normally a two-norm trust region will be used, but this may change if preconditioning is employed.
An approximate minimizer of the model within the trust region
is found using either a direct approach involving factorization or an
iterative (conjugate-gradient/Lanczos) approach based on approximations
to the required solution from a so-called Krlov subspace. The direct
approach is based on the knowledge that the required solution
satisfies the linear system of equations \((B_k + \lambda_k I) s_k
= - \nabla_x f(x_k)\) involving a scalar Lagrange multiplier \(\lambda_k\).
This multiplier is found by uni-variate root finding, using a safeguarded
Newton-like process, by TRS
or DPS
(depending on the norm chosen). The iterative approach
uses GLTR
, and is best accelerated by preconditioning
with good approximations to \(B_k\) using PSLS
. The
iterative approach has the advantage that only matrix-vector products
\(B_k v\) are required, and thus \(B_k\) is not required explicitly.
However when factorizations of \(B_k\) are possible, the direct approach
is often more efficient.
reference#
The generic trust-region method is described in detail in
A. R. Conn, N. I. M. Gould and Ph. L. Toint, Trust-region methods. SIAM/MPS Series on Optimization (2000).
matrix storage#
The symmetric \(n\) by \(n\) matrix \(H = \nabla^2_{xx}f\) may be presented and stored in a variety of formats. But crucially symmetry is exploited by only storing values from the lower triangular part (i.e, those entries that lie on or below the leading diagonal).
Dense storage format: The matrix \(H\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. Since \(H\) is symmetric, only the lower triangular part (that is the part \(H_{ij}\) for \(0 <= j <= i <= n-1\)) need be held. In this case the lower triangle should be stored by rows, that is component \(i * i / 2 + j\) of the storage array H_val will hold the value \(H_{ij}\) (and, by symmetry, \(H_{ji}\)) for \(0 <= j <= i <= n-1\).
Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 <= l <= ne-1\), of \(H\), its row index i, column index j and value \(H_{ij}\), \(0 <= j <= i <= n-1\), are stored as the \(l\)-th components of the integer arrays H_row and H_col and real array H_val, respectively, while the number of nonzeros is recorded as H_ne = \(ne\). Note that only the entries in the lower triangle should be stored.
Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(H\) the i-th component of the integer array H_ptr holds the position of the first entry in this row, while H_ptr(n) holds the total number of entries. The column indices j, \(0 <= j <= i\), and values \(H_{ij}\) of the entries in the i-th row are stored in components l = H_ptr(i), …, H_ptr(i+1)-1 of the integer array H_col, and real array H_val, respectively. Note that as before only the entries in the lower triangle should be stored. For sparse matrices, this scheme almost always requires less storage than its predecessor.
functions#
- tru.initialize()#
Set default option values and initialize private data
Returns:
- optionsdict
- dictionary containing default control options:
- errorint
error and warning diagnostics occur on stream error.
- outint
general output occurs on stream out.
- print_levelint
the level of output required. Possible values are:
<= 0
no output
1
a one-line summary for every improvement
2
a summary of each iteration
>= 3
increasingly verbose (debugging) output.
- start_printint
any printing will start on this iteration.
- stop_printint
any printing will stop on this iteration.
- print_gapint
the number of iterations between printing.
- maxitint
the maximum number of iterations performed.
- alive_unitint
removal of the file alive_file from unit alive_unit terminates execution.
- alive_filestr
see alive_unit.
- more_toraldoint
more_toraldo >= 1 gives the number of More’-Toraldo projected searches to be used to improve upon the Cauchy point, anything else is for the standard add-one-at-a-time CG search.
- non_monotoneint
non-monotone <= 0 monotone strategy used, anything else non-monotone strategy with this history length used.
- modelint
the model used. Possible values are
0
dynamic (not yet implemented)
1
first-order (no Hessian)
2
second-order (exact Hessian)
3
barely second-order (identity Hessian)
4
secant second-order (sparsity-based)
5
secant second-order (limited-memory BFGS, with``lbfgs_vectors`` history) (not yet implemented)
6
secant second-order (limited-memory SR1, with
lbfgs_vectors
history) (not yet implemented).- normint
The norm is defined via \(||v||^2 = v^T P v\), and will define the preconditioner used for iterative methods. Possible values for \(P\) are
-3
users own preconditioner
-2
\(P =\) limited-memory BFGS matrix (with
lbfgs_vectors
history)-1
identity (= Euclidan two-norm)
0
automatic (not yet implemented)
1
diagonal, \(P =\) diag( max( Hessian,
min_diagonal
) )2
banded, \(P =\) band( Hessian ) with semi-bandwidth
semi_bandwidth
3
re-ordered band, P=band(order(A)) with semi-bandwidth
semi_bandwidth
4
full factorization, \(P =\) Hessian, Schnabel-Eskow modification
5
full factorization, \(P =\) Hessian, GMPS modification (not yet implemented)
6
incomplete factorization of Hessian, Lin-More’
7
incomplete factorization of Hessian, HSL_MI28
8
incomplete factorization of Hessian, Munskgaard (not yet implemented)
9
expanding band of Hessian (not yet implemented).
- semi_bandwidthint
specify the semi-bandwidth of the band matrix \(P\) if required.
- lbfgs_vectorsint
number of vectors used by the L-BFGS matrix \(P\) if required.
- max_dxgint
number of vectors used by the sparsity-based secant Hessian if required.
- icfs_vectorsint
number of vectors used by the Lin-More’ incomplete factorization matrix \(P\) if required.
- mi28_lsizeint
the maximum number of fill entries within each column of the incomplete factor L computed by HSL_MI28. In general, increasing
mi28_lsize
improve the quality of the preconditioner but increases the time to compute and then apply the preconditioner. Values less than 0 are treated as 0.- mi28_rsizeint
the maximum number of entries within each column of the strictly lower triangular matrix \(R\) used in the computation of the preconditioner by HSL_MI28. Rank-1 arrays of size
mi28_rsize
* n are allocated internally to hold \(R\). Thus the amount of memory used, as well as the amount of work involved in computing the preconditioner, depends onmi28_rsize.
Settingmi28_rsize
> 0 generally leads to a higher quality preconditioner than usingmi28_rsize
= 0, and choosingmi28_rsize
>=mi28_lsize
is generally recommended.- advanced_startint
try to pick a good initial trust-region radius using
advanced_start
iterates of a variant on the strategy of Sartenaer SISC 18(6) 1990:1788-1803.- stop_g_absolutefloat
overall convergence tolerances. The iteration will terminate when the norm of the gradient of the objective function is smaller than MAX(
stop_g_absolute,
stop_g_relative
* norm of the initial gradient ) or if the step is less thanstop_s
.- stop_pg_relativefloat
see stop_g_absolute.
- stop_sfloat
see stop_g_absolute.
- initial_radiusfloat
initial value for the trust-region radius.
- maximum_radiusfloat
maximum permitted trust-region radius.
- eta_successfulfloat
a potential iterate will only be accepted if the actual decrease f - f(x_new) is larger than
eta_successful
times that predicted by a quadratic model of the decrease. The trust-region radius will be increased if this relative decrease is greater thaneta_very_successful
but smaller thaneta_too_successful
.- eta_very_successfulfloat
see eta_successful.
- eta_too_successfulfloat
see eta_successful.
- radius_increasefloat
on very successful iterations, the trust-region radius will be increased the factor
radius_increase,
while if the iteration is unsucceful, the radius will be decreased by a factorradius_reduce
but no more thanradius_reduce_max
.- radius_reducefloat
see radius_increase.
- radius_reduce_maxfloat
see radius_increase.
- obj_unboundedfloat
the smallest value the objective function may take before the problem is marked as unbounded.
- cpu_time_limitfloat
the maximum CPU time allowed (-ve means infinite).
- clock_time_limitfloat
the maximum elapsed clock time allowed (-ve means infinite).
- hessian_availablebool
is the Hessian matrix of second derivatives available or is access only via matrix-vector products?.
- subproblem_directbool
use a direct (factorization) or (preconditioned) iterative method to find the search direction.
- retrospective_trust_regionbool
is a retrospective strategy to be used to update the trust-region radius.
- renormalize_radiusbool
should the radius be renormalized to account for a change in preconditioner?.
- space_criticalbool
if
space_critical
is True, every effort will be made to use as little space as possible. This may result in longer computation time.- deallocate_error_fatalbool
if
deallocate_error_fatal
is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.- prefixstr
all output lines will be prefixed by the string contained in quotes within
prefix
, e.g. ‘word’ (note the qutoes) will result in the prefix word.- trs_optionsdict
default control options for TRS (see
trs.initialize
).- gltr_optionsdict
default control options for GLTR (see
gltr.initialize
).- psls_optionsdict
default control options for PSLS (see
psls.initialize
).- lms_optionsdict
default control options for LMS (see
lms.initialize
).- lms_prec_optionsdict
default control options for LMS (see
lms.initialize
).- sha_optionsdict
default control options for SHA (see
sha.initialize
).
- tru.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=None)#
Import problem data into internal storage prior to solution.
Parameters:
- nint
holds the number of variables.
- H_typestring
specifies the symmetric storage scheme used for the Hessian. It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’ or ‘absent’, the latter if access to the Hessian is via matrix-vector products; lower or upper case variants are allowed.
- H_neint
holds the number of entries in the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes.
- H_rowndarray(H_ne)
holds the row indices of the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes, and in this case can be None
- H_colndarray(H_ne)
holds the column indices of the lower triangular part of \(H\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the dense or diagonal storage schemes are used, and in this case can be None
- H_ptrndarray(n+1)
holds the starting position of each row of the lower triangular part of \(H\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None
- optionsdict, optional
dictionary of control options (see
tru.initialize
).
- tru.solve(n, H_ne, x, eval_f, eval_g, eval_h))#
Find an approximate local unconstrained minimizer of a given function using a trust-region method.
Parameters:
- nint
holds the number of variables.
- H_neint
holds the number of entries in the lower triangular part of \(H\).
- xndarray(n)
holds the values of optimization variables \(x\).
- eval_fcallable
a user-defined function that must have the signature:
f = eval_f(x)
The value of the objective function \(f(x)\) evaluated at \(x\) must be assigned to
f
.- eval_gcallable
a user-defined function that must have the signature:
g = eval_g(x)
The components of the gradient \(\nabla f(x)\) of the objective function evaluated at \(x\) must be assigned to
g
.- eval_hcallable
a user-defined function that must have the signature:
h = eval_h(x)
The components of the nonzeros in the lower triangle of the Hessian \(\nabla^2 f(x)\) of the objective function evaluated at \(x\) must be assigned to
h
in the same order as specified in the sparsity pattern intru.load
.Returns:
- xndarray(n)
holds the value of the approximate global minimizer \(x\) after a successful call.
- gndarray(n)
holds the gradient \(\nabla f(x)\) of the objective function.
- [optional] tru.information()
Provide optional output information
Returns:
- informdict
- dictionary containing output information:
- statusint
return status. Possible values are:
0
The run was successful.
-1
An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.
-2
A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.
-3
The restriction n > 0 or requirement that type contains its relevant string ‘dense’, ‘coordinate’, ‘sparse_by_rows’, ‘diagonal’ or ‘absent’ has been violated.
-7
The objective function appears to be unbounded from below.
-9
The analysis phase of the factorization failed; the return status from the factorization package is given by inform[‘factor_status’].
-10
The factorization failed; the return status from the factorization package is given by inform[‘factor_status’].
-11
The solution of a set of linear equations using factors from the factorization package failed; the return status from the factorization package is given by inform[‘factor_status’].
-16
The problem is so ill-conditioned that further progress is impossible.
-18
Too many iterations have been performed. This may happen if options[‘maxit’] is too small, but may also be symptomatic of a badly scaled problem.
-19
The CPU time limit has been reached. This may happen if options[‘cpu_time_limit’] is too small, but may also be symptomatic of a badly scaled problem.
-82
The user has forced termination of the solver by removing the file named options[‘alive_file’] from unit options[‘alive_unit’].
- alloc_statusint
the status of the last attempted allocation/deallocation.
- bad_allocstr
the name of the array for which an allocation/deallocation error occurred.
- iterint
the total number of iterations performed.
- cg_iterint
the total number of CG iterations performed.
- f_evalint
the total number of evaluations of the objective function.
- g_evalint
the total number of evaluations of the gradient of the objective function.
- h_evalint
the total number of evaluations of the Hessian of the objective function.
- factorization_maxint
the maximum number of factorizations in a sub-problem solve.
- factorization_statusint
the return status from the factorization.
- max_entries_factorsint
the maximum number of entries in the factors.
- factorization_integerint
the total integer workspace required for the factorization.
- factorization_realint
the total real workspace required for the factorization.
- objfloat
the value of the objective function at the best estimate of the solution determined by tru.solve.
- norm_gfloat
the norm of the gradient of the objective function at the best estimate of the solution determined by tru.solve.
- radiusfloat
the current value of the trust-region radius.
- timedict
- dictionary containing timing information:
- totalfloat
the total CPU time spent in the package.
- preprocessfloat
the CPU time spent preprocessing the problem.
- analysefloat
the CPU time spent analysing the required matrices prior to factorization.
- factorizefloat
the CPU time spent factorizing the required matrices.
- solvefloat
the CPU time spent computing the search direction.
- clock_totalfloat
the total clock time spent in the package.
- clock_preprocessfloat
the clock time spent preprocessing the problem.
- clock_analysefloat
the clock time spent analysing the required matrices prior to factorization.
- clock_factorizefloat
the clock time spent factorizing the required matrices.
- clock_solvefloat
the clock time spent computing the search direction.
- trs_informdict
inform parameters for TRS (see
trs.information
).- gltr_informdict
inform parameters for GLTR (see
gltr.information
).- psls_informdict
inform parameters for PSLS (see
psls.information
).- lms_informdict
inform parameters for LMS (see
lms.information
).- lms_prec_informdict
inform parameters for LMS used for preconditioning (see
lms.information
).- sha_informdict
inform parameters for SHA (see
sha.information
).
- tru.terminate()#
Deallocate all internal private storage.
example code#
from galahad import tru
import numpy as np
np.set_printoptions(precision=4,suppress=True,floatmode='fixed')
print("\n** python test: tru")
# allocate internal data and set default options
options = tru.initialize()
# set some non-default options
options['print_level'] = 0
#options['trs_options']['print_level'] = 0
#print("options:", options)
# set parameters
p = 4
# set bounds
n = 3
# set Hessian sparsity
H_type = 'coordinate'
H_ne = 5
H_row = np.array([0,2,1,2,2])
H_col = np.array([0,0,1,1,2])
H_ptr = None
# load data (and optionally non-default options)
tru.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=options)
# define objective function and its derivatives
def eval_f(x):
return (x[0] + x[2] + p)**2 + (x[1] + x[2])**2 + np.cos(x[0])
def eval_g(x):
return np.array([2.0* ( x[0] + x[2] + p ) - np.sin(x[0]),
2.0* ( x[1] + x[2] ),
2.0* ( x[0] + x[2] + p ) + 2.0 * ( x[1] + x[2] )])
def eval_h(x):
return np.array([2. - np.cos(x[0]),2.0,2.0,2.0,4.0])
# set starting point
x = np.array([1.,1.,1.])
# find optimum
x, g = tru.solve(n, H_ne, x, eval_f, eval_g, eval_h)
print(" x:",x)
print(" g:",g)
# get information
inform = tru.information()
#print(inform)
print(" f: %.4f" % inform['obj'])
print('** trb exit status:', inform['status'])
# deallocate internal data
tru.terminate()
This example code is available in $GALAHAD/src/tru/Python/test_tru.py .