ARC#

purpose#

The arc package uses a regularization method to find a (local) minimizer of a differentiable objective function \(f(x)\) of many variables \(x\). The method offers the choice of direct and iterative solution of the key subproblems, and is most suitable for large problems. First derivatives are required, and if second derivatives can be calculated, they will be exploited.

See Section 4 of $GALAHAD/doc/arc.pdf for additional details.

method#

An adaptive cubic regularization method is used. In this, an improvement to a current estimate of the required minimizer, \(x_k\) is sought by computing a step \(s_k\). The step is chosen to approximately minimize a model \(m_k(s)\) of \(f(x_k + s)\) that includes a weighted term \(\sigma_k \|s_k\|^3\) for some specified positive weight \(\sigma_k\). The quality of the resulting step \(s_k\) is assessed by computing the “ratio” \((f(x_k) - f(x_k + s_k))/ (m_k(0) - m_k(s_k))\). The step is deemed to have succeeded if the ratio exceeds a given \(\eta_s > 0\), and in this case \(x_{k+1} = x_k + s_k\). Otherwise \(x_{k+1} = x_k\), and the weight is increased by powers of a given increase factor up to a given limit. If the ratio is larger than \(\eta_v \geq \eta_d\), the weight will be decreased by powers of a given decrease factor again up to a given limit. The method will terminate as soon as \(\|\nabla_x f(x_k)\|\) is smaller than a specified value.

Either linear or quadratic models \(m_k(s)\) may be used. The former will be taken as the first two terms \(f(x_k) + s^T \nabla_x f(x_k)\) of a Taylor series about \(x_k\), while the latter uses an approximation to the first three terms \(f(x_k) + s^T \nabla_x f(x_k) + \frac{1}{2} s^T B_k s\), for which \(B_k\) is a symmetric approximation to the Hessian \(\nabla_{xx} f(x_k)\); possible approximations include the true Hessian, limited-memory secant and sparsity approximations and a scaled identity matrix. Normally a two-norm regularization will be used, but this may change if preconditioning is employed.

An approximate minimizer of the cubic model is found using either a direct approach involving factorization or an iterative (conjugate-gradient/Lanczos) approach based on approximations to the required solution from a so-called Krlov subspace. The direct approach is based on the knowledge that the required solution satisfies the linear system of equations \((B_k + \lambda_k I) s_k = - \nabla_x f(x_k)\) involving a scalar Lagrange multiplier \(\lambda_k\). This multiplier is found by uni-variate root finding, using a safeguarded Newton-like process, by RQS or DPS (depending on the norm chosen). The iterative approach uses GLRT, and is best accelerated by preconditioning with good approximations to \(B_k\) using PSLS. The iterative approach has the advantage that only matrix-vector products \(B_k v\) are required, and thus \(B_k\) is not required explicitly. However when factorizations of \(B_k\) are possible, the direct approach is often more efficient.

references#

The generic adaptive cubic regularization method is described in detail in

C. Cartis, N. I. M. Gould and Ph. L. Toint, ``Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results’’ Mathematical Programming 127(2) (2011) 245–295,

and uses ``tricks’’ as suggested in

N. I. M. Gould, M. Porcelli and Ph. L. Toint, ``Updating the regularization parameter in the adaptive cubic regularization algorithm’’. Computational Optimization and Applications 53(1) (2012) 1-22.

matrix storage#

The symmetric \(n\) by \(n\) matrix \(H = \nabla^2_{xx}f\) may be presented and stored in a variety of formats. But crucially symmetry is exploited by only storing values from the lower triangular part (i.e, those entries that lie on or below the leading diagonal).

Dense storage format: The matrix \(H\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. Since \(H\) is symmetric, only the lower triangular part (that is the part \(H_{ij}\) for \(0 <= j <= i <= n-1\)) need be held. In this case the lower triangle should be stored by rows, that is component \(i * i / 2 + j\) of the storage array H_val will hold the value \(H_{ij}\) (and, by symmetry, \(H_{ji}\)) for \(0 <= j <= i <= n-1\).

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 <= l <= ne-1\), of \(H\), its row index i, column index j and value \(H_{ij}\), \(0 <= j <= i <= n-1\), are stored as the \(l\)-th components of the integer arrays H_row and H_col and real array H_val, respectively, while the number of nonzeros is recorded as H_ne = \(ne\). Note that only the entries in the lower triangle should be stored.

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(H\) the i-th component of the integer array H_ptr holds the position of the first entry in this row, while H_ptr(n) holds the total number of entries. The column indices j, \(0 <= j <= i\), and values \(H_{ij}\) of the entries in the i-th row are stored in components l = H_ptr(i), …, H_ptr(i+1)-1 of the integer array H_col, and real array H_val, respectively. Note that as before only the entries in the lower triangle should be stored. For sparse matrices, this scheme almost always requires less storage than its predecessor.

functions#

arc.initialize()#

Set default option values and initialize private data

Returns:

optionsdict
dictionary containing default control options:
errorint

error and warning diagnostics occur on stream error.

outint

general output occurs on stream out.

print_levelint

the level of output required. Possible values are:

  • <= 0

    no output

  • 1

    a one-line summary for every improvement

  • 2

    a summary of each iteration

  • >= 3

    increasingly verbose (debugging) output.

start_printint

any printing will start on this iteration.

stop_printint

any printing will stop on this iteration.

print_gapint

the number of iterations between printing.

maxitint

the maximum number of iterations performed.

alive_unitint

removal of the file alive_file from unit alive_unit terminates execution.

alive_filestr

see alive_unit.

more_toraldoint

more_toraldo >= 1 gives the number of More’-Toraldo projected searches to be used to improve upon the Cauchy point, anything else is for the standard add-one-at-a-time CG search.

non_monotoneint

non-monotone <= 0 monotone strategy used, anything else non-monotone strategy with this history length used.

modelint

the model used. Possible values are

  • 0

    dynamic (not yet implemented)

  • 1

    first-order (no Hessian)

  • 2

    second-order (exact Hessian)

  • 3

    barely second-order (identity Hessian)

  • 4

    secant second-order (sparsity-based)

  • 5

    secant second-order (limited-memory BFGS, with``lbfgs_vectors`` history) (not yet implemented)

  • 6

    secant second-order (limited-memory SR1, with lbfgs_vectors history) (not yet implemented).

normint

The norm is defined via \(||v||^2 = v^T P v\), and will define the preconditioner used for iterative methods. Possible values for \(P\) are

  • -3

    users own preconditioner

  • -2

    \(P =\) limited-memory BFGS matrix (with lbfgs_vectors history)

  • -1

    identity (= Euclidan two-norm)

  • 0

    automatic (not yet implemented)

  • 1

    diagonal, \(P =\) diag( max( Hessian, min_diagonal ) )

  • 2

    banded, \(P =\) band( Hessian ) with semi-bandwidth semi_bandwidth

  • 3

    re-ordered band, P=band(order(A)) with semi-bandwidth semi_bandwidth

  • 4

    full factorization, \(P =\) Hessian, Schnabel-Eskow modification

  • 5

    full factorization, \(P =\) Hessian, GMPS modification (not yet implemented)

  • 6

    incomplete factorization of Hessian, Lin-More’

  • 7

    incomplete factorization of Hessian, HSL_MI28

  • 8

    incomplete factorization of Hessian, Munskgaard (not yet implemented)

  • 9

    expanding band of Hessian (not yet implemented).

semi_bandwidthint

specify the semi-bandwidth of the band matrix \(P\) if required.

lbfgs_vectorsint

number of vectors used by the L-BFGS matrix \(P\) if required.

max_dxgint

number of vectors used by the sparsity-based secant Hessian if required.

icfs_vectorsint

number of vectors used by the Lin-More’ incomplete factorization matrix \(P\) if required.

mi28_lsizeint

the maximum number of fill entries within each column of the incomplete factor L computed by HSL_MI28. In general, increasing mi28_lsize improve the quality of the preconditioner but increases the time to compute and then apply the preconditioner. Values less than 0 are treated as 0.

mi28_rsizeint

the maximum number of entries within each column of the strictly lower triangular matrix \(R\) used in the computation of the preconditioner by HSL_MI28. Rank-1 arrays of size mi28_rsize * n are allocated internally to hold \(R\). Thus the amount of memory used, as well as the amount of work involved in computing the preconditioner, depends on mi28_rsize. Setting mi28_rsize > 0 generally leads to a higher quality preconditioner than using mi28_rsize = 0, and choosing mi28_rsize >= mi28_lsize is generally recommended.

advanced_startint

try to pick a good initial regularization weight using advanced_start iterates of a variant on the strategy of Sartenaer SISC 18(6) 1990:1788-1803.

stop_g_absolutefloat

overall convergence tolerances. The iteration will terminate when the norm of the gradient of the objective function is smaller than MAX( stop_g_absolute, stop_g_relative * norm of the initial gradient ) or if the step is less than stop_s.

stop_pg_relativefloat

see stop_g_absolute.

stop_sfloat

see stop_g_absolute.

initial_weightfloat

Initial value for the regularisation weight (-ve => 1/||g_0||).

minimum_weightfloat

minimum permitted regularisation weight.

reduce_gapfloat

expert parameters as suggested in Gould, Porcelli & Toint, “Updating the regularization parameter in the adaptive cubic regularization algorithm” RAL-TR-2011-007, Rutherford Appleton Laboratory, England (2011), http://epubs.stfc.ac.uk/bitstream/6181/RAL-TR-2011-007.pdf (these are denoted beta, epsilon_chi and alpha_max in the paper).

tiny_gapfloat

see reduce_gap.

large_rootfloat

see reduce_gap.

eta_successfulfloat

a potential iterate will only be accepted if the actual decrease f - f(x_new) is larger than eta_successful times that predicted by a quadratic model of the decrease. The regularization weight will be decreased if this relative decrease is greater than eta_very_successful but smaller than eta_too_successful (the first is eta in Gould, Porcell and Toint, 2011).

eta_very_successfulfloat

see eta_successful.

eta_too_successfulfloat

see eta_successful.

weight_decrease_minfloat

on very successful iterations, the regularization weight will be reduced by the factor weight_decrease but no more than weight_decrease_min while if the iteration is unsuccessful, the weight will be increased by a factor weight_increase but no more than weight_increase_max (these are delta_1, delta_2, delta3 and delta_max in Gould, Porcelli and Toint, 2011).

weight_decreasefloat

see weight_decrease_min.

weight_increasefloat

see weight_decrease_min.

weight_increase_maxfloat

see weight_decrease_min.

obj_unboundedfloat

the smallest value the objective function may take before the problem is marked as unbounded.

cpu_time_limitfloat

the maximum CPU time allowed (-ve means infinite).

clock_time_limitfloat

the maximum elapsed clock time allowed (-ve means infinite).

hessian_availablebool

is the Hessian matrix of second derivatives available or is access only via matrix-vector products?.

subproblem_directbool

use a direct (factorization) or (preconditioned) iterative method to find the search direction.

renormalize_weightbool

should the weight be renormalized to account for a change in preconditioner?.

quadratic_ratio_testbool

should the test for acceptance involve the quadratic model or the cubic?.

space_criticalbool

if space_critical is True, every effort will be made to use as little space as possible. This may result in longer computation time.

deallocate_error_fatalbool

if deallocate_error_fatal is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.

prefixstr

all output lines will be prefixed by the string contained in quotes within prefix, e.g. ‘word’ (note the qutoes) will result in the prefix word.

trs_optionsdict

default control options for TRS (see trs.initialize).

gltr_optionsdict

default control options for GLTR (see gltr.initialize).

dpsdict

default control options for DPS (see dps.initialize).

psls_optionsdict

default control options for PSLS (see psls.initialize).

lms_optionsdict

default control options for LMS (see lms.initialize).

lms_prec_optionsdict

default control options for LMS (see lms.initialize).

sec_optionsdict

default control options for SEC (see sec.initialize).

sha_optionsdict

default control options for SHA (see sha.initialize).

arc.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=None)#

Import problem data into internal storage prior to solution.

Parameters:

nint

holds the number of variables.

H_typestring

specifies the symmetric storage scheme used for the Hessian. It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’, ‘diagonal’ or ‘absent’, the latter if access to the Hessian is via matrix-vector products; lower or upper case variants are allowed.

H_neint

holds the number of entries in the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes.

H_rowndarray(H_ne)

holds the row indices of the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes, and in this case can be None

H_colndarray(H_ne)

holds the column indices of the lower triangular part of \(H\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the dense or diagonal storage schemes are used, and in this case can be None

H_ptrndarray(n+1)

holds the starting position of each row of the lower triangular part of \(H\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None

optionsdict, optional

dictionary of control options (see arc.initialize).

arc.solve(n, H_ne, x, eval_f, eval_g, eval_h))#

Find an approximate local unconstrained minimizer of a given function using a regularization method.

Parameters:

nint

holds the number of variables.

H_neint

holds the number of entries in the lower triangular part of \(H\).

xndarray(n)

holds the values of optimization variables \(x\).

eval_fcallable

a user-defined function that must have the signature:

f = eval_f(x)

The value of the objective function \(f(x)\) evaluated at \(x\) must be assigned to f.

eval_gcallable

a user-defined function that must have the signature:

g = eval_g(x)

The components of the gradient \(\nabla f(x)\) of the objective function evaluated at \(x\) must be assigned to g.

eval_hcallable

a user-defined function that must have the signature:

h = eval_h(x)

The components of the nonzeros in the lower triangle of the Hessian \(\nabla^2 f(x)\) of the objective function evaluated at \(x\) must be assigned to h in the same order as specified in the sparsity pattern in arc.load.

Returns:

xndarray(n)

holds the value of the approximate global minimizer \(x\) after a successful call.

gndarray(n)

holds the gradient \(\nabla f(x)\) of the objective function.

[optional] arc.information()

Provide optional output information

Returns:

informdict
dictionary containing output information:
statusint

return status. Possible values are:

  • 0

    The run was successful.

  • -1

    An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -2

    A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -3

    The restriction n > 0 or requirement that type contains its relevant string ‘dense’, ‘coordinate’, ‘sparse_by_rows’, ‘diagonal’ or ‘absent’ has been violated.

  • -7

    The objective function appears to be unbounded from below.

  • -9

    The analysis phase of the factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -10

    The factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -11

    The solution of a set of linear equations using factors from the factorization package failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -16

    The problem is so ill-conditioned that further progress is impossible.

  • -18

    Too many iterations have been performed. This may happen if options[‘maxit’] is too small, but may also be symptomatic of a badly scaled problem.

  • -19

    The CPU time limit has been reached. This may happen if options[‘cpu_time_limit’] is too small, but may also be symptomatic of a badly scaled problem.

  • -82

    The user has forced termination of the solver by removing the file named options[‘alive_file’] from unit options[‘alive_unit’].

alloc_statusint

the status of the last attempted allocation/deallocation.

bad_allocstr

the name of the array for which an allocation/deallocation error occurred.

iterint

the total number of iterations performed.

cg_iterint

the total number of CG iterations performed.

f_evalint

the total number of evaluations of the objective function.

g_evalint

the total number of evaluations of the gradient of the objective function.

h_evalint

the total number of evaluations of the Hessian of the objective function.

factorization_maxint

the maximum number of factorizations in a sub-problem solve.

factorization_statusint

the return status from the factorization.

max_entries_factorsint

the maximum number of entries in the factors.

factorization_integerint

the total integer workspace required for the factorization.

factorization_realint

the total real workspace required for the factorization.

objfloat

the value of the objective function at the best estimate of the solution determined by arc.solve.

norm_gfloat

the norm of the gradient of the objective function at the best estimate of the solution determined by arc.solve.

weightfloat

the current value of the regularization weight.

timedict
dictionary containing timing information:
totalfloat

the total CPU time spent in the package.

preprocessfloat

the CPU time spent preprocessing the problem.

analysefloat

the CPU time spent analysing the required matrices prior to factorization.

factorizefloat

the CPU time spent factorizing the required matrices.

solvefloat

the CPU time spent computing the search direction.

clock_totalfloat

the total clock time spent in the package.

clock_preprocessfloat

the clock time spent preprocessing the problem.

clock_analysefloat

the clock time spent analysing the required matrices prior to factorization.

clock_factorizefloat

the clock time spent factorizing the required matrices.

clock_solvefloat

the clock time spent computing the search direction.

trs_informdict

inform parameters for TRS (see trs.information).

gltr_informdict

inform parameters for GLTR (see gltr.information).

dps_informdict

inform parameters for DPS (see dps.information).

psls_informdict

inform parameters for PSLS (see psls.information).

lms_informdict

inform parameters for LMS (see lms.information).

lms_prec_informdict

inform parameters for LMS used for preconditioning (see lms.information).

sec_informdict

inform parameters for SEC (see sec.information).

sha_informdict

inform parameters for SHA (see sha.information).

arc.terminate()#

Deallocate all internal private storage.

example code#

from galahad import arc
import numpy as np
np.set_printoptions(precision=4,suppress=True,floatmode='fixed')
print("\n** python test: arc")

# allocate internal data and set default options
options = arc.initialize()

# set some non-default options
options['print_level'] = 0
#options['rqs_options']['print_level'] = 0
#print("options:", options)

# set parameters
p = 4
# set bounds
n = 3

# set Hessian sparsity
H_type = 'coordinate'
H_ne = 5
H_row = np.array([0,2,1,2,2])
H_col = np.array([0,0,1,1,2])
H_ptr = None

# load data (and optionally non-default options)
arc.load(n, H_type, H_ne, H_row, H_col, H_ptr, options=options)

# define objective function and its derivatives
def eval_f(x):
    return (x[0] + x[2] + p)**2 + (x[1] + x[2])**2 + np.cos(x[0])
def eval_g(x):
    return np.array([2.0* ( x[0] + x[2] + p ) - np.sin(x[0]),
                     2.0* ( x[1] + x[2] ),
                     2.0* ( x[0] + x[2] + p ) + 2.0 * ( x[1] + x[2] )])
def eval_h(x):
    return np.array([2. - np.cos(x[0]),2.0,2.0,2.0,4.0])

# set starting point
x = np.array([1.,1.,1.])

# find optimum
x, g = arc.solve(n, H_ne, x, eval_f, eval_g, eval_h)
print(" x:",x)
print(" g:",g)

# get information
inform = arc.information()
#print(inform)
print(" f: %.4f" % inform['obj'])
print('** arc exit status:', inform['status'])

# deallocate internal data
arc.terminate()

This example code is available in $GALAHAD/src/arc/Python/test_arc.py .