IMSTAT Procedure (Analytics)

OPTIMIZE Statement

The OPTIMIZE statement performs a non-linear optimization of an objective function that is defined through a SAS program. The expression defined in the SAS program and its analytic first and second derivatives are compiled into executable code. The code is then executed in multiple threads against the data in an in-memory table. Like all other IMSTAT statements, the calculations are performed by the server. You can choose from several first-order and second-order optimization algorithms.

Syntax

OPTIMIZE <options>;
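For orientation, the following is a minimal sketch of the statement in context. The LIBNAME, table, file reference, column names, and parameter names (mylasr, regdata, objpgm, y, x, b0, b1) are hypothetical; the SAS program referenced by the CODE= option must assign the objective to the reserved symbol _OBJFNC_.

```sas
/* Hypothetical setup: a temporary file holds the SAS program that
   defines the objective. Here _OBJFNC_ is the negative squared
   error, so the optimization fits b0 and b1 by least squares. */
filename objpgm temp;
data _null_;
   file objpgm;
   put '_objfnc_ = -((y - (b0 + b1*x))**2);';
run;

proc imstat;
   table mylasr.regdata;             /* in-memory table (hypothetical) */
   optimize code=objpgm              /* objective program              */
            parameters=(b0=0, b1=1) /* parameters and starting values */
            technique=trureg        /* a second-order technique       */
            itdetail;               /* print the iteration history    */
run;
```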

OPTIMIZE Statement Options

ALPHA=number

specifies a number between 0 and 1 that determines the confidence level for approximate confidence intervals of the parameter estimates. The default is α = 0.05, which leads to 100 × (1 − α)% = 95% confidence limits for the parameter estimates.

Default 0.05

BOUNDS=(boundary-specification<, boundary-specification,...>)

specifies boundary values for the parameters. A boundary-specification is specified in the following form:

parameter-name operator value

parameter-name

specifies the parameter

operator

is one of >=, GE, <=, LE, >, GT, <, LT, =, EQ.

value

specifies the boundary value

Alias BOUND=
Example BOUNDS=(s2 > 0, beta2 >= 0.2)

CODE=file-reference

specifies a file reference to the SAS program that defines the objective function. The program must make an assignment to the reserved symbol _OBJFNC_. The server then minimizes the negative of that function, which is equivalent to maximizing the function itself. In other words, you should specify _OBJFNC_ to be the function that you want to maximize across the in-memory table. The actual optimization is carried out as a minimization problem.

Alias PGM=
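As a sketch of the sign convention (the column name y and the parameters mu and s2 are hypothetical): to maximize a log likelihood, assign the per-row log-likelihood contribution to _OBJFNC_; the server sums the contributions across the table and handles the negation internally.

```sas
/* Hypothetical objective program: per-row contribution to a normal
   log likelihood. The server maximizes the total across the table,
   that is, it minimizes the negative of this quantity. */
_objfnc_ = -0.5*(log(2*constant('pi')) + log(s2) + (y - mu)**2 / s2);
```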

DEFSTART=value

specifies the default starting value for parameters whose starting value has not been specified. The default value, 1, might not work well depending on the optimization.

Alias DEFVAL=
Default 1

DUD

specifies that you do not want to use analytic derivatives in the optimization. The option name is an acronym for "do not use derivatives." Instead, the server calculates gradient vectors and Hessian matrices from finite difference approximations. Generally, you should not rely on derivatives calculated from finite differences if analytic derivatives are available. However, this option is useful in situations where the objective function is not calculated independently for each row of data. If derivatives of the objective function depend on lagged values, which are themselves functions of the parameters, then finite difference derivatives are called for.

Alias NODERIVATIVES
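A sketch of a case where the DUD option is appropriate (column and parameter names are hypothetical): an objective that depends on a lagged quantity, which is itself a function of the parameters, so analytic per-row derivatives are not available.

```sas
/* Hypothetical objective program: the prediction depends on the
   lagged response, and the residual depends on parameters b0 and
   rho, so finite-difference derivatives (the DUD option) are
   called for. */
pred = b0 + rho*lag(y);
_objfnc_ = -((y - pred)**2);
```

The corresponding statement would then be, for example, `optimize code=objpgm dud;`.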

FCONV=r

specifies a relative function convergence criterion. For all techniques except NMSIMP, termination requires a small relative change of the function value in successive iterations. Suppose that Ψ is the p × 1 vector of parameter estimates in the optimization and the objective function at the kth iteration is denoted f(Ψ)^(k). Then, the FCONV criterion is met if

|f(Ψ)^(k) − f(Ψ)^(k−1)| / |f(Ψ)^(k−1)| ≤ r

Default r = 10^(−FDIGITS), where FDIGITS is −log10(ε) and ε is the machine precision.

GCONV=r

specifies a relative gradient convergence criterion. For all optimization techniques except CONGRA and NMSIMP, termination requires that the normalized predicted function reduction is small. The default value is r = 1e−8. Suppose that Ψ is the p × 1 vector of parameter estimates in the optimization with ith element Ψi. The objective function, its p × 1 gradient vector, and its p × p Hessian matrix are denoted f(Ψ), g(Ψ), and H(Ψ), respectively. Then, if superscripts denote the iteration count, the normalized predicted function reduction at iteration k is

[g(Ψ)^(k)]' [H(Ψ)^(k)]^(−1) g(Ψ)^(k) / |f(Ψ)^(k)|

The GCONV convergence criterion is assumed to be met if that value is less than or equal to r.
Note that it is possible for the relative gradient reduction to be small even if one or more gradients are still substantial in absolute value. If this situation occurs, you can disable the GCONV criterion by setting r=0. If the optimization would otherwise have stopped early because the GCONV criterion was met, disabling it usually causes the iterative process to take one or more additional steps until the gradients are small in absolute value.

ITDETAIL

requests that the server produce an iteration history table for the optimization. This table displays the objective function, its absolute change, and the largest absolute gradient across the iterations.

MAXFUNC=n

specifies the maximum number n of function calls in the iterative model fitting process. The default value depends on the optimization technique as follows:

Optimization Technique          Default Number of Function Calls
TRUREG, NRRIDG, and NEWRAP      125
QUANEW and DBLDOG               500
CONGRA                          1000
NMSIMP                          3000
Alias MAXFU=

MAXITER=i

specifies the maximum number of iterations in the iterative model fitting process. The default value depends on the optimization technique as follows:

Optimization Technique          Default Number of Iterations
TRUREG, NRRIDG, and NEWRAP      50
QUANEW and DBLDOG               200
CONGRA                          400
NMSIMP                          1000
Alias MAXIT=

MAXTIME=t

specifies an upper limit of t seconds of CPU time for the optimization process. The default value is the largest floating-point double representation value for the hardware used by the SAS LASR Analytic Server. Note that the time specified by the MAXTIME= option is checked only once at the end of each iteration. The time is measured on the root node for the server. Therefore, the actual running time can be longer than the value specified by the MAXTIME= option.

MINITER=i

specifies the minimum number of iterations.

Alias MINIT=
Default 0

NBEST=k

requests that only the k best points in the starting value grid are reproduced in the "Starting Values" table. By default, the objective function is initially evaluated at all points in the starting value grid and the "Starting Values" table contains one row for each point on the grid. If you specify the NBEST= option, then only the k points with the smallest objective function value are shown.

Alias BEST=

NOEMPTY

requests that result sets for optimizations without usable data are not generated.

NOPREPARSE

prevents the procedure from pre-parsing and pre-generating the program code that is referenced in the CODE= option. If you know the code is correct, you can specify this option to save resources. The code is always parsed by the server, but error messages are typically more detailed when the procedure parses the code than when the server does. With this option, the server assumes that the code is correct. If the code fails to compile, the server indicates that it could not parse the code, but not where the error occurred.

Alias NOPREP

NOSTDERR

specifies to prevent calculating standard errors of the parameter estimates. The calculation of standard errors requires the derivation of the Hessian or cross-product Jacobian. If you do not want standard errors, p-values, or confidence intervals for the parameter estimates, then specifying this option saves computing resources.

Alias NOSTD

PARAMETERS=(parameter-specification <, parameter-specification...>)

specifies the parameters in the optimization and the starting values. You do not have to specify parameters and you do not have to specify starting values. If you omit the starting values, the default starting value is assigned. This default value is 1.0 and can be modified with the DEFSTART= option.

If you do not specify the parameter names, the server assumes that all symbols in your SAS program are parameters if they do not match column names in the in-memory table and are not special or temporary symbols. This might not be what you want, and you should examine the "Starting Values" and "Parameter Estimates" tables in that case to make sure that the server designated the appropriate quantities as parameters in the optimization.
In the first example that follows, Intercept is assigned a starting value of 6. The remaining parameters start at 0 because DEFSTART=0 is specified.
In the second example that follows, the server evaluates the objective function initially for the Cartesian product set of all the parameter vectors. The server evaluates 1 × 3 × 2 × 1 = 6 parameter vectors. The optimization then starts from the vector associated with the best objective function value.
Alias PARMS=
Examples DEFSTART=0; PARMS=(Intercept = 6, a_0, b_0, c_0, x_1, x_2, x_3);
PARMS=(beta1 = -3.22, beta2 = 0.5 0.47 0.6, beta3 = -2.45 -2.0, s2 = 0.5);

RESTRICT=(one-restriction <, one-restriction>)

specifies linear equality and inequality constraints for the optimization. A single restriction takes on the general form

coefficient parameter ... coefficient parameter operator value
Inequality restrictions are expressed as constraints greater than (>) or greater than or equal to (>=) the right-hand-side value.
The first example that follows shows the restriction β1 – 2 β2 > 3.
The second example that follows shows how to use more than one restriction. Restrictions are separated by commas, and the second example requests that the estimates for parameters dose1 and dose2 be the same, as well as the estimates for logd1 and logd2.
Examples RESTRICT=(1 beta1 -2 beta2 > 3)
RESTRICT=(1 dose1 -1 dose2 = 0, 1 logd1 -1 logd2 = 0)

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements like STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.

SETSIZE

requests that the server estimate the size of the result set. The procedure does not create a result table if the SETSIZE option is specified. Instead, the procedure reports the number of rows that are returned by the request and the expected memory consumption for the result set (in KB). If you specify the SETSIZE option, the SAS log includes the number of observations and the estimated result set size. See the following log sample:

NOTE: The LASR Analytic Server action request for the STATEMENT
      statement would return 17 rows and approximately
      3.641 kBytes of data.
The typical use of the SETSIZE option is to get an estimate of the size of the result set in situations where you are unsure whether the SAS session can handle a large result set. Be aware that in order to determine the size of the result set, the server has to perform the work as if you were receiving the actual result set. Requesting the estimated size of the result set does consume resources on the server. The estimated number of KB is very close to the actual memory consumption of the result set. It might not be immediately obvious how this size relates to the displayed table, since many tables contain hidden columns. In addition, some elements of the result set might not be converted to tabular output by the procedure.

TECHNIQUE=

specifies the optimization technique.

Valid values are as follows:
CONGRA (CG) performs a conjugate-gradient optimization.
DBLDOG (DD) performs a version of the double-dogleg optimization.
DUQUANEW (DQN) performs a (dual) quasi-Newton optimization.
NMSIMP (NS) performs a Nelder-Mead simplex optimization.
NONE specifies not to perform any optimization. This value can be used to perform a grid search without optimization.
NEWRAP (NRA) performs a (modified) Newton-Raphson optimization that combines a line-search algorithm with ridging.
NRRIDG (NRR) performs a (modified) Newton-Raphson optimization with ridging.
QUANEW (QN) performs a quasi-Newton optimization.
TRUREG (TR) performs a trust-region optimization.
The factors that go into choosing a particular optimization technique for a particular problem are complex. Trial and error can be involved. For many optimization problems, computing the gradient takes more computer time than computing the function value. Computing the Hessian sometimes takes much more computer time and memory than computing the gradient, especially when there are many parameters. Unfortunately, first-order optimization techniques that do not use some type of Hessian or Hessian approximation usually require more iterations than second-order techniques that use a Hessian matrix. As a result, the total run time of first-order techniques can be longer. Techniques that do not use the Hessian also tend to be less reliable. For example, they can terminate more easily at stationary points than at global optima.
The TRUREG, NEWRAP, and NRRIDG algorithms are second-order algorithms.
The server computes first and second derivatives of the objective function with respect to the parameters in analytic form wherever possible. Finite-difference approximations for derivatives are used only when the derivatives of functions are not known. In most cases, finite-difference approximations are not necessary.
For more information about the algorithms, see SAS/STAT User's Guide.
Alias TECH=
Default DUQUANEW
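For example, a pure grid search can be sketched by combining a starting-value grid with TECHNIQUE=NONE (the table, file reference, and parameter names are hypothetical):

```sas
proc imstat;
   table mylasr.regdata;
   optimize code=objpgm
            parms=(beta1 = -4 -3 -2, beta2 = 0.4 0.5 0.6, s2 = 0.5 1)
            bounds=(s2 > 0)
            technique=none  /* evaluate the 3 x 3 x 2 = 18 grid points only */
            nbest=5;        /* report only the 5 best grid points           */
run;
```

Because TECHNIQUE=NONE suppresses the iterative optimization, the "Starting Values" table then serves as the grid-search result.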