EXPO#

purpose#

The expo package uses an exponential-penalty-function method to solve a given constrained optimization problem. The aim is to find a (local) minimizer of a differentiable objective function \(f(x)\) of \(n\) variables \(x\), subject to \(m\) general constraints \(c_l \leq c(x) \leq c_u\) and simple-bound constraints \(x_l \leq x \leq x_u\) on the variables. Here, any of the components of the vectors of bounds \(c_l\), \(c_u\), \(x_l\) and \(x_u\) may be infinite. The method offers the choice of direct and iterative solution of the key unconstrained-optimization subproblems, and is most suitable for large problems. First derivatives are required; if second derivatives can be calculated, they will be exploited, and if only products of second derivatives with a vector can be formed, these too may be exploited.

N.B. This package is currently a beta release, and aspects may change before it is formally released.

See Section 4 of $GALAHAD/doc/expo.pdf for additional details.

terminology#

The exponential penalty function is defined to be

\[\begin{split}\begin{array}{rl}\phi(x,w,\mu,v,\nu) \!\! & = f(x) + \sum_{i} \mu_{li} w_{li} \exp[(c_{li} - c_i(x))/\mu_{li}] \\ & \;\;\;\;\;\;\;\;\;\;\;\;\; + \sum_{i} \mu_{ui} w_{ui} \exp[(c_i(x) - c_{ui})/\mu_{ui}] \\ & \;\;\;\;\;\;\;\;\;\;\;\;\; + \sum_{j} \nu_{lj} v_{lj} \exp[(x_{lj} - x_j)/\nu_{lj}] \\ & \;\;\;\;\;\;\;\;\;\;\;\;\; + \sum_{j} \nu_{uj} v_{uj} \exp[(x_j - x_{uj})/\nu_{uj}], \end{array} \end{split}\]
where \(c_{li}\), \(c_{ui}\) and \(c_i(x)\) are the \(i\)-th components of \(c_l\), \(c_u\) and \(c(x)\), and \(x_{lj}\), \(x_{uj}\) and \(x_j\) are the \(j\)-th components of \(x_l\), \(x_u\) and \(x\), respectively. Here the components of \(\mu_l\), \(\mu_u\), \(\nu_l\) and \(\nu_u\) are separate penalty parameters for each lower and upper, general and simple-bound constraint, respectively, while those of \(w_l\), \(w_u\), \(v_l\) and \(v_u\) are likewise separate weights for the same. The algorithm iterates by approximately minimizing \(\phi(x,w,\mu,v,\nu)\) for a fixed set of penalty parameters and weights, and then adjusting these parameters and weights. The adjustments are designed so that the sequence of approximate minimizers of \(\phi\) converges to a solution of the specified constrained optimization problem.
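
To make the definition concrete, the following Python sketch (purely illustrative, and not part of the package) evaluates \(\phi\) for given values \(f(x)\) and \(c(x)\), omitting any term whose bound is infinite:

```python
import numpy as np

def exp_penalty(fx, cx, x, c_l, c_u, x_l, x_u,
                w_l, w_u, v_l, v_u, mu_l, mu_u, nu_l, nu_u):
    """Evaluate phi(x,w,mu,v,nu) given fx = f(x) and cx = c(x).
       Terms corresponding to infinite bounds are skipped."""
    phi = fx
    for i in range(len(cx)):
        if np.isfinite(c_l[i]):
            phi += mu_l[i] * w_l[i] * np.exp((c_l[i] - cx[i]) / mu_l[i])
        if np.isfinite(c_u[i]):
            phi += mu_u[i] * w_u[i] * np.exp((cx[i] - c_u[i]) / mu_u[i])
    for j in range(len(x)):
        if np.isfinite(x_l[j]):
            phi += nu_l[j] * v_l[j] * np.exp((x_l[j] - x[j]) / nu_l[j])
        if np.isfinite(x_u[j]):
            phi += nu_u[j] * v_u[j] * np.exp((x[j] - x_u[j]) / nu_u[j])
    return phi
```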

Key constructs are the gradient of the objective function

\[g(x) := \nabla_x f(x),\]
the Jacobian of the vector of constraints,
\[J(x) := \nabla_x c(x),\]
and the gradient and Hessian of the Lagrangian function
\[g_L(x,y,z) := g(x) - J^T(x)y - z \;\;\mbox{and}\;\; H_L(x,y) := \nabla_{xx} \left[ f(x) - \sum_{i} y_i c_i(x)\right]\]
for given vectors \(y\) and \(z\).

Any required solution \(x\) necessarily satisfies the primal optimality conditions

\[c_l \leq c(x) \leq c_u, \;\; x_l \leq x \leq x_u,\;\;\mbox{(1)}\]
the dual optimality conditions
\[g(x) = J^{T}(x) y + z,\;\; y = y_l + y_u \;\;\mbox{and}\;\; z = z_l + z_u,\;\;\mbox{(2a)}\]
and
\[y_l \geq 0, \;\; y_u \leq 0, \;\; z_l \geq 0 \;\;\mbox{and}\;\; z_u \leq 0,\;\;\mbox{(2b)}\]
and the complementary slackness conditions
\[( c(x) - c_l )^{T} y_l = 0,\;\; ( c(x) - c_u )^{T} y_u = 0,\;\; (x -x_l )^{T} z_l = 0 \;\;\mbox{and}\;\;(x -x_u )^{T} z_u = 0,\;\;\mbox{(3)}\]
where the vectors \(y\) and \(z\) are known as the Lagrange multipliers for the general constraints, and the dual variables for the simple bounds, respectively, and where the vector inequalities hold component-wise.
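
As an illustration, the following numpy sketch (not part of the package, and assuming for simplicity that all bounds are finite) measures the residuals of the primal, dual and complementary-slackness conditions; the sign conditions (2b) are easily checked separately:

```python
import numpy as np

def kkt_residuals(x, y_l, y_u, z_l, z_u, g, J, c, c_l, c_u, x_l, x_u):
    """Return the residuals of conditions (1), (2a) and (3);
       all bounds are assumed finite in this simple sketch."""
    # primal residual (1): largest constraint or bound violation
    primal = np.max(np.concatenate([np.maximum(c_l - c, 0.0),
                                    np.maximum(c - c_u, 0.0),
                                    np.maximum(x_l - x, 0.0),
                                    np.maximum(x - x_u, 0.0)]))
    # dual residual (2a): g(x) - J^T(x) y - z with y = y_l + y_u, z = z_l + z_u
    y, z = y_l + y_u, z_l + z_u
    dual = np.linalg.norm(g - J.T @ y - z, np.inf)
    # complementary slackness (3)
    comp = max(abs(np.dot(c - c_l, y_l)), abs(np.dot(c - c_u, y_u)),
               abs(np.dot(x - x_l, z_l)), abs(np.dot(x - x_u, z_u)))
    return primal, dual, comp
```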

method#

The method employed involves a sequential minimization of the exponential penalty function \(\phi(x,w,\mu,v,\nu)\) for a sequence of positive penalty parameters \((\mu_{lk}, \mu_{uk}, \nu_{lk}, \nu_{uk})\) and weights \((w_{lk}, w_{uk}, v_{lk}, v_{uk})\), for increasing \(k \geq 0\). Convergence is ensured if the penalty parameters are forced to zero, and may be accelerated by adjusting the weights. The minimization of \(\phi(x,w,\mu,v,\nu)\) is accomplished using the trust-region unconstrained solver TRU. Although critical points \(\{x_k\}\) of \(\phi(x,w_k,\mu_k,v_k,\nu_k)\) converge to a local solution \(x_*\) of the underlying problem, the reduction of the penalty parameters to zero often results in \(x_k\) being a poor starting point for the minimization of \(\phi(x,w_{k+1},\mu_{k+1},v_{k+1},\nu_{k+1})\). Consequently, a careful extrapolated starting point from \(x_k\) is used instead. Moreover, once the algorithm is confident that it is sufficiently close to \(x_*\), it switches to Newton’s method to accelerate the convergence. Both the extrapolation and the Newton iteration rely on the block-linear-system solver SSLS.

The iteration is terminated as soon as residuals to the optimality conditions (1)–(3) are sufficiently small. For infeasible problems, this will not be possible, and instead the residuals to (1) will be made as small as possible.

references#

The method is described in detail in

N. Gould, S. Leyffer, A. Montoison and C. Vanaret (2025) The exponential multiplier method in the 21st century. RAL Technical Report, in preparation.

matrix storage#

The unsymmetric \(m\) by \(n\) Jacobian matrix \(J = J(x)\) may be presented and stored in a variety of convenient input formats.

Dense storage format: The matrix \(J\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. In this case, component \(n \ast i + j\) of the storage array J_val will hold the value \(J_{ij}\) for \(0 \leq i \leq m-1\), \(0 \leq j \leq n-1\).

Dense by columns storage format: The matrix \(J\) is stored as a compact dense matrix by columns, that is, the values of the entries of each column in turn are stored in order within an appropriate real one-dimensional array. In this case, component \(m \ast j + i\) of the storage array J_val will hold the value \(J_{ij}\) for \(0 \leq i \leq m-1\), \(0 \leq j \leq n-1\).

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 \leq l \leq ne-1\), of \(J\), its row index i, column index j and value \(J_{ij}\), \(0 \leq i \leq m-1\), \(0 \leq j \leq n-1\), are stored as the \(l\)-th components of the integer arrays J_row and J_col and real array J_val, respectively, while the number of nonzeros is recorded as J_ne = \(ne\).

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(J\) the i-th component of the integer array J_ptr holds the position of the first entry in this row, while J_ptr(m) holds the total number of entries. The column indices j, \(0 \leq j \leq n-1\), and values \(J_{ij}\) of the nonzero entries in the i-th row are stored in components l = J_ptr(i), \(\ldots\), J_ptr(i+1)-1, \(0 \leq i \leq m-1\), of the integer array J_col, and real array J_val, respectively. For sparse matrices, this scheme almost always requires less storage than its predecessor.

Sparse column-wise storage format: Once again only the nonzero entries are stored, but this time they are ordered so that those in column j appear directly before those in column j+1. For the j-th column of \(J\) the j-th component of the integer array J_ptr holds the position of the first entry in this column, while J_ptr(n) holds the total number of entries. The row indices i, \(0 \leq i \leq m-1\), and values \(J_{ij}\) of the nonzero entries in the j-th column are stored in components l = J_ptr(j), \(\ldots\), J_ptr(j+1)-1, \(0 \leq j \leq n-1\), of the integer array J_row, and real array J_val, respectively. As before, for sparse matrices, this scheme almost always requires less storage than the co-ordinate format.
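
By way of example (the small matrix below is illustrative only), a 2 by 3 Jacobian may be laid out in the dense, sparse co-ordinate and sparse row-wise schemes as follows:

```python
import numpy as np

# J = [ 1  0  2 ]   (m = 2 rows, n = 3 columns, ne = 4 nonzeros)
#     [ 0  3  4 ]

# dense (by rows): component n*i + j holds J_ij
J_val_dense = np.array([1.0, 0.0, 2.0, 0.0, 3.0, 4.0])

# sparse co-ordinate: row index, column index and value of each nonzero
J_row = np.array([0, 0, 1, 1])
J_col = np.array([0, 2, 1, 2])
J_val = np.array([1.0, 2.0, 3.0, 4.0])

# sparse row-wise: J_ptr[i] is the start of row i in J_col/J_val,
# and J_ptr[m] = 4 records the total number of entries
J_ptr = np.array([0, 2, 4])
J_col_by_rows = np.array([0, 2, 1, 2])
J_val_by_rows = np.array([1.0, 2.0, 3.0, 4.0])
```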

The symmetric \(n\) by \(n\) matrix \(H = H_L(x,y)\) may be presented and stored in a variety of formats. Crucially, symmetry is exploited by only storing values from the lower triangular part (i.e., those entries that lie on or below the leading diagonal).

Dense storage format: The matrix \(H\) is stored as a compact dense matrix by rows, that is, the values of the entries of each row in turn are stored in order within an appropriate real one-dimensional array. Since \(H\) is symmetric, only the lower triangular part (that is the part \(H_{ij}\) for \(0 \leq j \leq i \leq n-1\)) need be held. In this case the lower triangle should be stored by rows, that is, component \(i \ast (i+1) / 2 + j\) of the storage array H_val will hold the value \(H_{ij}\) (and, by symmetry, \(H_{ji}\)) for \(0 \leq j \leq i \leq n-1\).

Sparse co-ordinate storage format: Only the nonzero entries of the matrices are stored. For the \(l\)-th entry, \(0 \leq l \leq ne-1\), of \(H\), its row index i, column index j and value \(H_{ij}\), \(0 \leq j \leq i \leq n-1\), are stored as the \(l\)-th components of the integer arrays H_row and H_col and real array H_val, respectively, while the number of nonzeros is recorded as H_ne = \(ne\). Note that only the entries in the lower triangle should be stored.

Sparse row-wise storage format: Again only the nonzero entries are stored, but this time they are ordered so that those in row i appear directly before those in row i+1. For the i-th row of \(H\) the i-th component of the integer array H_ptr holds the position of the first entry in this row, while H_ptr(n) holds the total number of entries. The column indices j, \(0 \leq j \leq i\), and values \(H_{ij}\) of the entries in the i-th row are stored in components l = H_ptr(i), …, H_ptr(i+1)-1 of the integer array H_col, and real array H_val, respectively. Note that as before only the entries in the lower triangle should be stored. For sparse matrices, this scheme almost always requires less storage than its predecessor.

Diagonal storage format: If \(H\) is diagonal (i.e., \(H_{ij} = 0\) for all \(0 \leq i \neq j \leq n-1\)) only the diagonal entries \(H_{ii}\), \(0 \leq i \leq n-1\), need be stored, and the first n components of the array H_val may be used for the purpose.

Multiples of the identity storage format: If \(H\) is a multiple of the identity matrix, (i.e., \(H = \alpha I\) where \(I\) is the n by n identity matrix and \(\alpha\) is a scalar), it suffices to store \(\alpha\) as the first component of H_val.

The identity matrix format: If \(H\) is the identity matrix, no values need be stored.

The zero matrix format: The same is true if \(H\) is the zero matrix.
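
Similarly, an illustrative 3 by 3 symmetric matrix might be held in the dense, sparse co-ordinate and sparse row-wise schemes (lower triangle only) as follows:

```python
import numpy as np

# H = [ 1  2  0 ]
#     [ 2  3  0 ]   (n = 3; the lower triangle has ne = 4 nonzeros)
#     [ 0  0  4 ]

# dense (lower triangle by rows): component i*(i+1)/2 + j holds H_ij, j <= i
H_val_dense = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 4.0])

# sparse co-ordinate (lower triangle only)
H_row = np.array([0, 1, 1, 2])
H_col = np.array([0, 0, 1, 2])
H_val = np.array([1.0, 2.0, 3.0, 4.0])

# sparse row-wise (lower triangle only); H_ptr[n] = 4 is the total entry count
H_ptr = np.array([0, 1, 3, 4])
H_col_by_rows = np.array([0, 0, 1, 2])
H_val_by_rows = np.array([1.0, 2.0, 3.0, 4.0])
```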

functions#

expo.initialize()#

Set default option values and initialize private data.

Returns:

optionsdict
dictionary containing default control options:
errorint

error and warning diagnostics occur on stream error.

outint

general output occurs on stream out.

print_levelint

the level of output required. Possible values are

  • <= 0

    gives no output.

  • 1

    gives a one-line summary for every iteration.

  • 2

    gives a summary of the inner iteration for each iteration.

  • >=3

    gives increasingly verbose (debugging) output.

start_printint

any printing will start on this iteration.

stop_printint

any printing will stop on this iteration.

print_gapint

the number of iterations between printing.

max_itint

the maximum number of iterations permitted.

max_evalint

the maximum number of function evaluations permitted.

alive_unitint

removal of the file alive_file from unit alive_unit terminates execution.

alive_filestr

see alive_unit.

update_multipliers_itminint

update the Lagrange multipliers/dual variables from iteration update_multipliers_itmin (<0 means never) and once the primal infeasibility is below update_multipliers_tol.

update_multipliers_tolfloat

see update_multipliers_itmin.

infinityfloat

any bound larger than infinity in modulus will be regarded as infinite.

stop_abs_pfloat

the required absolute accuracy for the primal infeasibility.

stop_rel_pfloat

the required relative accuracy for the primal infeasibility.

stop_abs_dfloat

the required absolute accuracy for the dual infeasibility.

stop_rel_dfloat

the required relative accuracy for the dual infeasibility.

stop_abs_cfloat

the required absolute accuracy for the complementarity.

stop_rel_cfloat

the required relative accuracy for the complementarity.

stop_sfloat

the smallest value the norm of the step may take before termination.

initial_mufloat

initial value for the penalty parameter (<=0 means set automatically).

mu_reducefloat

the amount by which the penalty parameter is decreased.

obj_unboundedfloat

the smallest value the objective function may take before the problem is marked as unbounded.

try_advanced_startfloat

try an advanced start at the end of every iteration when the KKT residuals are smaller than try_advanced_start (-ve means never).

try_sqp_startfloat

try an advanced SQP start at the end of every iteration when the KKT residuals are smaller than try_sqp_start (-ve means never).

stop_advanced_startfloat

stop the advanced start search once the residuals are smaller than stop_advanced_start.

cpu_time_limitfloat

the maximum CPU time allowed (-ve means infinite).

clock_time_limitfloat

the maximum elapsed clock time allowed (-ve means infinite).

hessian_availablebool

is the Hessian matrix of second derivatives available, or is access only via matrix-vector products (coming soon)?

subproblem_directbool

use a direct (factorization) method or a (preconditioned) iterative method (coming soon) to find the search direction.

space_criticalbool

if space_critical is True, every effort will be made to use as little space as possible. This may result in longer computation time.

deallocate_error_fatalbool

if deallocate_error_fatal is True, any array/pointer deallocation error will terminate execution. Otherwise, computation will continue.

prefixstr

all output lines will be prefixed by the string contained in quotes within prefix, e.g. ‘word’ (note the quotes) will result in the prefix word.

bsc_optionsdict

default control options for BSC (see bsc.initialize).

tru_optionsdict

default control options for TRU (see tru.initialize).

ssls_optionsdict

default control options for SSLS (see ssls.initialize).
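
For instance, a typical usage pattern (a minimal sketch, assuming the package is available as galahad.expo) is to obtain the defaults and then adjust a few of them before calling expo.load:

```python
from galahad import expo  # assumed import path for the Python interface

options = expo.initialize()       # dictionary of default control options
options['print_level'] = 1        # one-line summary for every iteration
options['max_it'] = 100           # at most 100 iterations
options['stop_abs_p'] = 1.0e-6    # absolute primal-infeasibility tolerance
options['stop_abs_d'] = 1.0e-6    # absolute dual-infeasibility tolerance
```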

expo.load(n, m, J_type, J_ne, J_row, J_col, J_ptr, H_type, H_ne, H_row, H_col, H_ptr, options=None)#

Import problem data into internal storage prior to solution.

Parameters:

nint

holds the number of variables.

mint

holds the number of general constraints.

J_typestring

specifies the unsymmetric storage scheme used for the Jacobian \(J = J(x)\). It should be one of ‘coordinate’, ‘sparse_by_rows’ or ‘dense’; lower or upper case variants are allowed.

J_neint

holds the number of entries in \(J\) in the sparse co-ordinate storage scheme. It need not be set for either of the other two schemes.

J_rowndarray(J_ne)

holds the row indices of \(J\) in the sparse co-ordinate storage scheme. It need not be set for either of the other two schemes, and in this case can be None.

J_colndarray(J_ne)

holds the column indices of \(J\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the dense storage scheme is used, and in this case can be None.

J_ptrndarray(m+1)

holds the starting position of each row of \(J\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None.

H_typestring, optional

specifies the symmetric storage scheme used for the Hessian of the Lagrangian \(H_L(x,y)\). It should be one of ‘coordinate’, ‘sparse_by_rows’, ‘dense’ or ‘diagonal’; lower or upper case variants are allowed. This and the following H_* arguments are only required if second derivatives are to be provided (see the hessian_available option).

H_neint, optional

holds the number of entries in the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes.

H_rowndarray(H_ne), optional

holds the row indices of the lower triangular part of \(H\) in the sparse co-ordinate storage scheme. It need not be set for any of the other three schemes, and in this case can be None.

H_colndarray(H_ne), optional

holds the column indices of the lower triangular part of \(H\) in either the sparse co-ordinate, or the sparse row-wise storage scheme. It need not be set when the dense or diagonal storage schemes are used, and in this case can be None.

H_ptrndarray(n+1), optional

holds the starting position of each row of the lower triangular part of \(H\), as well as the total number of entries, in the sparse row-wise storage scheme. It need not be set when the other schemes are used, and in this case can be None.

optionsdict, optional

dictionary of control options (see expo.initialize).
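
As an illustration (the problem data here are hypothetical), a Jacobian consisting of the single row (1 1) and the lower triangle of a diagonal Lagrangian Hessian might be passed in the sparse row-wise scheme as follows:

```python
import numpy as np
from galahad import expo  # assumed import path for the Python interface

# a hypothetical problem with n = 2 variables and m = 1 general constraint,
# whose Jacobian is the single row (1 1) and whose Lagrangian Hessian is 2I

n, m = 2, 1

# Jacobian structure in the sparse row-wise scheme
J_type, J_ne = 'sparse_by_rows', 2
J_ptr = np.array([0, 2])          # row 0 starts at position 0; 2 entries in all
J_col = np.array([0, 1])
J_row = None                      # not needed for this scheme

# lower triangle of the Hessian in the sparse row-wise scheme
H_type, H_ne = 'sparse_by_rows', 2
H_ptr = np.array([0, 1, 2])
H_col = np.array([0, 1])
H_row = None

options = expo.initialize()
expo.load(n, m, J_type, J_ne, J_row, J_col, J_ptr,
          H_type, H_ne, H_row, H_col, H_ptr, options=options)
```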

expo.solve(n, m, J_ne, H_ne, c_l, c_u, x_l, x_u, x, eval_fc, eval_gj, eval_hl)#

Find an approximate minimizer of a given constrained optimization problem using an exponential penalty method.

Parameters:

nint

holds the number of variables.

mint

holds the number of constraints.

J_neint

holds the number of entries in the Jacobian \(J = J(x)\).

H_neint, optional

holds the number of entries in the lower triangular part of the Hessian \(H = H_L(x,y)\).

c_lndarray(m)

holds the values \(c_l\) of the lower bounds on the constraints \(c(x)\).

c_undarray(m)

holds the values \(c_u\) of the upper bounds on the constraints \(c(x)\).

x_lndarray(n)

holds the values \(x_l\) of the lower bounds on the optimization variables \(x\).

x_undarray(n)

holds the values \(x_u\) of the upper bounds on the optimization variables \(x\).

xndarray(n)

holds the initial values of the optimization variables \(x\).

eval_fccallable

a user-defined function that must have the signature:

f, c = eval_fc(x)

The value of the objective \(f(x)\) and components of the constraints \(c(x)\) evaluated at \(x\) must be assigned to f and c, respectively.

eval_gjcallable

a user-defined function that must have the signature:

g, j = eval_gj(x)

The components of the gradient \(g(x)\) and the nonzeros in the Jacobian \(J(x)\) of the constraint functions evaluated at \(x\) must be assigned to g and j, respectively, the latter in the same order as specified in the sparsity pattern in expo.load.

eval_hlcallable

a user-defined function that must have the signature:

h = eval_hl(x,y)

The components of the nonzeros in the lower triangle of the Hessian \(H_L(x,y)\) evaluated at \(x\) and \(y\) must be assigned to h in the same order as specified in the sparsity pattern in expo.load; an illustrative sketch of all three evaluation functions is given after this entry.

Returns:

xndarray(n)

holds the value of the approximate minimizer \(x\) after a successful call.

yndarray(m)

holds the value of the Lagrange multipliers \(y\) after a successful call.

zndarray(n)

holds the value of the dual variables \(z\) after a successful call.

cndarray(m)

holds the value of the constraints \(c(x)\).

glndarray(n)

holds the gradient \(g_L(x,y,z)\) of the Lagrangian function.
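
As promised above, here is an illustrative sketch of the three user-supplied evaluation functions; the problem is hypothetical and matches the structure loaded in the expo.load example:

```python
import numpy as np

# callbacks for the hypothetical problem:
#   f(x) = (x_0 - 2)^2 + (x_1 - 1)^2,   c(x) = x_0 + x_1

def eval_fc(x):
    """objective and constraint values"""
    f = (x[0] - 2.0)**2 + (x[1] - 1.0)**2
    c = np.array([x[0] + x[1]])
    return f, c

def eval_gj(x):
    """objective gradient and Jacobian nonzeros, ordered as in expo.load"""
    g = np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])
    j = np.array([1.0, 1.0])
    return g, j

def eval_hl(x, y):
    """lower triangle of H_L(x,y); the constraint is linear, so H_L = 2I"""
    return np.array([2.0, 2.0])
```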

[optional] expo.information()

Provide optional output information.

Returns:

informdict
dictionary containing output information:
statusint

return status. Possible values are:

  • 0

    The run was successful.

  • -1

    An allocation error occurred. A message indicating the offending array is written on unit options[‘error’], and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -2

    A deallocation error occurred. A message indicating the offending array is written on unit options[‘error’] and the returned allocation status and a string containing the name of the offending array are held in inform[‘alloc_status’] and inform[‘bad_alloc’] respectively.

  • -3

The restriction n > 0 or m > 0, or the requirement that a storage type contains its relevant string ‘dense’, ‘coordinate’, ‘sparse_by_rows’, ‘diagonal’ or ‘absent’, has been violated.

  • -9

    The analysis phase of the factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -10

    The factorization failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -11

    The solution of a set of linear equations using factors from the factorization package failed; the return status from the factorization package is given by inform[‘factor_status’].

  • -15

    The preconditioner \(S(x)\) appears not to be positive definite.

  • -16

    The problem is so ill-conditioned that further progress is impossible.

  • -18

Too many iterations have been performed. This may happen if options[‘max_it’] is too small, but may also be symptomatic of a badly scaled problem.

  • -19

    The CPU time limit has been reached. This may happen if options[‘cpu_time_limit’] is too small, but may also be symptomatic of a badly scaled problem.

  • -82

    The user has forced termination of the solver by removing the file named options[‘alive_file’] from unit options[‘alive_unit’].

alloc_statusint

the status of the last attempted allocation/deallocation.

bad_allocstr

the name of the array for which an allocation/deallocation error occurred.

bad_evalstr

the name of the user-supplied evaluation routine for which an error occurred.

iterint

the total number of iterations performed.

fc_evalint

the total number of evaluations of the objective function \(f(x)\) and constraint functions \(c(x)\).

gj_evalint

the total number of evaluations of the gradient \(g(x)\) of \(f(x)\) and Jacobian \(J(x)\) of \(c(x)\).

hl_evalint

the total number of evaluations of the Hessian \(Hl(x,y)\) of the Lagrangian.

objfloat

the value of the objective function \(f(x)\) at the best estimate of the solution, x, determined by EXPO_solve.

primal_infeasibilityfloat

the norm of the primal infeasibility at the best estimate of the solution x, determined by EXPO_solve.

dual_infeasibilityfloat

the norm of the dual infeasibility at the best estimate of the solution x, determined by EXPO_solve.

complementary_slacknessfloat

the norm of the complementary slackness at the best estimate of the solution x, determined by EXPO_solve.

timedict
dictionary containing timing information:
totalfloat

the total CPU time spent in the package.

preprocessfloat

the CPU time spent preprocessing the problem.

analysefloat

the CPU time spent analysing the required matrices prior to factorization.

factorizefloat

the CPU time spent factorizing the required matrices.

solvefloat

the CPU time spent computing the search direction.

clock_totalfloat

the total clock time spent in the package.

clock_preprocessfloat

the clock time spent preprocessing the problem.

clock_analysefloat

the clock time spent analysing the required matrices prior to factorization.

clock_factorizefloat

the clock time spent factorizing the required matrices.

clock_solvefloat

the clock time spent computing the search direction.

bsc_informdict

inform parameters for BSC (see bsc.information).

tru_informdict

inform parameters for TRU (see tru.information).

ssls_informdict

inform parameters for SSLS (see ssls.information).

expo.terminate()#

Deallocate all internal private storage.

example code#

This example code is available in $GALAHAD/src/expo/Python/test_expo.py.
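
The following self-contained sketch is in the spirit of that file; the specific problem, data and tolerances here are illustrative and are not taken from it:

```python
import numpy as np
from galahad import expo  # assumed import path for the Python interface

# illustrative problem: minimize (x_0 - 2)^2 + (x_1 - 1)^2
# subject to 0 <= x_0 + x_1 <= 1 and x >= 0

n, m = 2, 1
infinity = 1.0e20                       # intended to exceed options['infinity'],
                                        # so these bounds are treated as infinite
c_l = np.array([0.0]);  c_u = np.array([1.0])
x_l = np.array([0.0, 0.0]);  x_u = np.array([infinity, infinity])
x = np.array([0.5, 0.5])                # starting point

J_ne, H_ne = 2, 2                       # nonzeros in J and in the lower triangle of H_L

def eval_fc(x):
    return (x[0] - 2.0)**2 + (x[1] - 1.0)**2, np.array([x[0] + x[1]])

def eval_gj(x):
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)]), np.array([1.0, 1.0])

def eval_hl(x, y):
    return np.array([2.0, 2.0])         # H_L = 2I (the constraint is linear)

options = expo.initialize()
options['print_level'] = 1

expo.load(n, m, 'coordinate', J_ne, np.array([0, 0]), np.array([0, 1]), None,
          'coordinate', H_ne, np.array([0, 1]), np.array([0, 1]), None,
          options=options)

x, y, z, c, gl = expo.solve(n, m, J_ne, H_ne, c_l, c_u, x_l, x_u, x,
                            eval_fc, eval_gj, eval_hl)

inform = expo.information()
print("x =", x, " status =", inform['status'])
expo.terminate()
```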