The OPTLSO Procedure

Describing the Objective Function

PROC OPTLSO enables you to define the objective function explicitly by using PROC FCMP. The following statements define the FCMP objective function that is used in the example Linear Constraints and a Nonlinear Objective:

proc fcmp outlib=sasuser.myfuncs.mypkg;    
   function sixhump(x1,x2);   
      return ((4 - 2.1*x1**2 + x1**4/3)*x1**2 
              + x1*x2 + (-4 + 4*x2**2)*x2**2); 
   endsub;  
run; 

Because PROC FCMP writes to an external library that might contain a large number of functions, PROC OPTLSO needs to know which objective function in the FCMP library to use and whether to minimize or maximize the function. You provide this information to PROC OPTLSO by specifying an objective description data set in the OBJECTIVE= option. To minimize the function sixhump, you could specify the OBJECTIVE= objdata option and define the following objective description data set:

data objdata;
   input _id_ $ _function_ $ _sense_ $;
   datalines;
f   sixhump     min
;

In the preceding DATA step, the _ID_ column specifies the function name to be used internally by the solver, the _FUNCTION_ column specifies the corresponding FCMP function name, and the _SENSE_ column specifies whether the objective is to be minimized or maximized.
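As a minimal sketch of how this data set might be passed to the solver, the following statements call PROC OPTLSO with OBJECTIVE=objdata. The VARIABLES= data set vardata, its bounds, and the PRIMALOUT= data set name solution are illustrative assumptions and are not part of the example above:

```sas
/* Hypothetical variable data set: the _ID_ values match the FCMP */
/* argument names; the bounds are illustrative only.              */
data vardata;
   input _id_ $ _lb_ _ub_;
   datalines;
x1  -3  3
x2  -2  2
;

/* Point SAS at the FCMP library that contains sixhump. */
options cmplib=sasuser.myfuncs;

proc optlso
   primalout = solution
   variables = vardata
   objective = objdata;
run;

/* Inspect the best point found (assumed output data set name). */
proc print data=solution;
run;
```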

Intermediate Functions

You can use intermediate functions to simplify the objective function definition and to improve computational efficiency. You specify intermediate functions by using a missing value entry in the _SENSE_ column to denote that the new function is not an objective. The _ID_ column entries for intermediate functions can then be used as arguments for the objective function. The following statements demonstrate how to create an equivalent objective definition for the example Linear Constraints and a Nonlinear Objective by using intermediate functions:

data objdata;
   length _function_ $10;
   input _id_ $ _function_ $ _sense_ $;
   datalines;
f1   sixhump1      .
f2   sixhump2      .
f3   sixhumpNew    min
;
proc fcmp outlib=sasuser.myfuncs.mypkg;  
   function sixhump1(x1,x2);   
      return (4 - 2.1*x1**2 + x1**4/3);
   endsub; 
   function sixhump2(x1,x2);
      return (-4 + 4*x2**2); 
   endsub; 
   function sixhumpNew(x1,x2,f1,f2);   
      return (f1*x1**2 + x1*x2 + f2*x2**2); 
   endsub; 
run;

In this case, PROC OPTLSO first computes the values for sixhump1 and sixhump2, internally assigning the output to f1 and f2, respectively. These values are then passed as the arguments f1 and f2 of the objective function sixhumpNew (whose _ID_ is f3). Because intermediate functions are evaluated before the objective function, they must never depend on output from the objective function.
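Substituting the intermediate values into sixhumpNew shows why the two definitions agree:

\[  f_3(x_1,x_2) = f_1 x_1^2 + x_1 x_2 + f_2 x_2^2 = \left(4 - 2.1 x_1^2 + \frac{x_1^4}{3}\right) x_1^2 + x_1 x_2 + \left(-4 + 4 x_2^2\right) x_2^2  \]

The right-hand side is exactly the expression that sixhump returns.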

Incorporating MPS and QPS Objective Functions

If you use the MPSDATA= or QPSDATA= options to define linear constraints, an objective function $m(x)$ is necessarily defined (see Describing Linear Constraints). If you do not specify the OBJECTIVE= option, PROC OPTLSO optimizes $m(x)$. However, if you specify the OBJECTIVE= option, the objective function $m(x)$ is ignored unless you explicitly include the corresponding objective function name and specify whether the function is to be used as an intermediate or objective function. (See Combining MPS and FCMP Function Definitions.) If the objective name also matches a name in the FCMP library, the FCMP function definition takes precedence.

Using Large Data Sets

When your objective function includes the sum of a family of functions that are parameterized by the rows of a given data set, PROC OPTLSO enables you to include in your objective function definition a single external data set that is specified in the _DATASET_ column of the OBJECTIVE= data set. Consider the following unconstrained optimization problem, where $k$ is a very large integer (for example, $10^6$), $\mathbf{A}$ denotes a $k \times 5$ matrix, and $\mathbf{b}$ denotes the corresponding right-hand-side target vector:

\[  \min _{x\in {\mathbb R}^5} f(x) = \| Ax-b\| _2 + \| x\| _1  \]

To evaluate the objective function, the following operations must be performed:

\[  \begin{aligned} f_1(x) &= \sum _{i=1}^{k} d_i \\ f_2(x) &= \sum _{j=1}^5 |x_j| \\ f(x) &= \sqrt {f_1(x)} + f_2(x) \end{aligned}  \]

where

\[  d_ i = \left(-b_ i + \sum _{j=1}^5 a_{ij} x_ j \right)^2  \]

and $a_{ij}$ denotes the $j$th entry of row $i$ of the matrix $\mathbf{A}$. Assume that there is an existing SAS data set that stores numerical entries for $\mathbf{A}$ and $\mathbf{b}$. The following DATA step shows an example data set, where $k = 3$:

data Abdata;
   input _id_ $ a1 a2 a3 a4 a5 b;
   datalines;
row1  1  2  3  4  5  6
row2  7  8  9  10 11 12
row3  13 14 15 16 17 18 
;

The following statements pass this information to PROC OPTLSO by adding the corresponding functions to the FCMP function library Sasuser.Myfuncs.Mypkg:

proc fcmp outlib=sasuser.myfuncs.mypkg;
   function axbi(x1,x2,x3,x4,x5,a1,a2,a3,a4,a5,b);
      array x[5];
      array a[5]; 
      di = -b;
      do j=1 to 5;
         di = di + a[j]*x[j];
      end;
      return (di*di);
   endsub;

   function onenorm(x1,x2,x3,x4,x5);
      array x[5]; 
      f2 = 0;
      do j=1 to 5;
         f2 = f2 + abs(x[j]);
      end;
      return (f2);
   endsub;

   function combine(f1, f2);
      return (sqrt(f1)+f2);
   endsub;
run;

The next DATA step then defines the objective name with a given target:

data lsqobj1;
   input _id_ $ _function_$ _sense_ $ _dataset_ $;
   datalines;
f1        axbi         .      Abdata 
f2        onenorm      .      .  
f         combine      min    .  
;

The following DATA step declares the variables:

data xvar;
   input _id_ $ @@;
   datalines;
   x1 x2 x3 x4 x5 
   ;

The following statements call the OPTLSO procedure:

options cmplib=sasuser.myfuncs;
proc optlso
   variables = xvar
   objective = lsqobj1;
run;

The contents of the OBJECTIVE= data set (lsqobj1) direct PROC OPTLSO to search for the three FCMP functions AXBI, ONENORM, and COMBINE in the library that is specified by the CMPLIB= option. The missing values in the _SENSE_ column indicate that AXBI and ONENORM are intermediate functions to be used as arguments of the objective function COMBINE, which is of type MIN. Of the three FCMP functions, only AXBI (whose _ID_ is F1) requests data. The entry Abdata specifies that AXBI should be called on each row of the data set Abdata and that the results should be summed. This sum is then passed as the first argument, f1, to the FCMP function COMBINE.
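To see what this row-wise summation amounts to, the following DATA step is a sketch that reproduces the same arithmetic by hand for a single trial point. The trial point x = (1, 0, 0, 0, 0) and the use of a DATA step are assumptions for illustration only; PROC OPTLSO performs this evaluation internally.

```sas
data _null_;
   set Abdata end=last;
   array a[5] a1-a5;
   /* Assumed trial point; held in a temporary array. */
   array x[5] _temporary_ (1 0 0 0 0);

   /* Same residual that AXBI computes for the current row. */
   di = -b;
   do j = 1 to 5;
      di = di + a[j]*x[j];
   end;

   /* Sum statement: f1 starts at 0 and is retained across rows. */
   f1 + di*di;

   if last then do;
      /* Same one-norm that ONENORM computes. */
      f2 = 0;
      do j = 1 to 5;
         f2 = f2 + abs(x[j]);
      end;
      /* Same combination that COMBINE returns. */
      f = sqrt(f1) + f2;
      put f1= f2= f=;
   end;
run;
```

For the three-row Abdata shown earlier, every residual at this trial point is -5, so f1 = 75, f2 = 1, and f = sqrt(75) + 1, which is approximately 9.66.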

In this example, Abdata is a data set that comes from the Work library. However, Abdata could just as easily come from a user-defined library or be a data set that was previously distributed (for example, to a Teradata or Greenplum database). If the data set Abdata is stored in a different library, replace Abdata with libref.Abdata in the data set lsqobj1. The source of the data set is irrelevant to PROC OPTLSO.
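A sketch of that substitution follows. The libref mylib (assumed to have been assigned already by a LIBNAME statement) and the data set name lsqobj3 are hypothetical. Because list input gives character variables a default length of 8, the sketch adds a LENGTH statement so that the two-level name is not truncated:

```sas
data lsqobj3;
   /* Room for an 8-character libref, a period, and a 32-character name. */
   length _dataset_ $ 41;
   input _id_ $ _function_ $ _sense_ $ _dataset_ $;
   datalines;
f1        axbi         .      mylib.Abdata
f2        onenorm      .      .
f         combine      min    .
;
```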

You can omit F2 if you want to form the one-norm directly in the aggregate function. Thus, an equivalent formulation would be as follows:

proc fcmp outlib=sasuser.myfuncs.mypkg; 
   function combine2(x1,x2,x3,x4,x5, f1);
      array x[5];
      f2 = 0;
      do j=1 to 5;
         f2 = f2 + abs(x[j]);
      end;
      return (sqrt(f1)+f2);
   endsub;
run;      

In this case, you define the objective name with a given target in a data set:

data lsqobj2;
   input _id_ $ _function_$ _sense_ $ _dataset_ $;
   datalines;
f1        axbi         .         Abdata
f         combine2     min       . 
;

options cmplib=sasuser.myfuncs;
proc optlso
   variables = xvar
   objective = lsqobj2;
run;

Thus, any of the intermediate functions that are used within the OBJECTIVE= data set (objdata) are permitted to have arguments that form a subset of the variables listed in the VARIABLES= data set (xvar) and the numerical columns from the data set that is specified in the _DATASET_ column of the OBJECTIVE= data set. Only numerical values are supported from external data sets. Only one function can be of type MIN or MAX. This function can take as arguments any of the variables in the VARIABLES= data set, any of the numerical columns from an external data set for the objective (if specified), and any implicit variables that are listed in the _ID_ column of the OBJECTIVE= data set.

The following rules for the objective data set are used during parsing:

  • Only one data set can be used for a given problem definition.

  • The objective function can take a data set as input only if no intermediate functions are being used. Otherwise, only the intermediate functions can be linked to the corresponding data set.

The data set is used in a distributed format if either the NODES= option is specified in the PERFORMANCE statement or the data set resides in a distributed library.
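As a hedged sketch, a call that requests distributed execution through the PERFORMANCE statement might look like the following. The node and thread counts are placeholders, and the statements assume that a distributed computing environment has already been configured for your SAS session:

```sas
options cmplib=sasuser.myfuncs;
proc optlso
   variables = xvar
   objective = lsqobj1;
   /* Placeholder values; adjust to the available grid. */
   performance nodes=4 nthreads=8;
run;
```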