The OPTLSO Procedure

Example 3.8 Johnson’s Systems of Distributions

This example further illustrates the use of external data sets that are specified in the OBJECTIVE= option. For this example, a data set that contains $n=20,000$ randomly generated observations is used to estimate the parameters for the Johnson $S_{U}$ family of distributions (Bowman and Shenton 1983). The objective is the log likelihood for the family, which involves four variables, $x=(x_1,x_2,x_3,x_4)$:

\[ f(x) = n \log (x_4) - n \log (x_2) - \dfrac {1}{2} \sum _{k=1}^ n \left( \left( x_3 + x_4 \log (z_ k) \right)^2 + \log (1+y_ k^2) \right) \]

where

\[ z_ k = y_ k + \sqrt {1+y_ k^2} \text { with } y_ k = \dfrac {d_ k - x_1}{x_2} \]

Here, $d_ k$ denotes the value of d in the kth observation of the data set that is generated by the following DATA step.

 data sudata;
    n=20000;
    theta=-1;
    sigma=1;
    delta=3;
    gamma=5;
    rngSeed=123;
    do i = 1 to n;
       z = rannor(rngSeed);
       a = exp( (z - gamma)/delta );
       d = sigma * ( (a**2 - 1)/(2*a) ) + theta;
       output;
    end;
    keep d;
 run;

This generates a data set called sudata that contains $n=20,000$ observations. You can modify n to increase or decrease the computational work per function evaluation. The following call to PROC FCMP defines the corresponding FCMP function definition:

proc fcmp outlib=sasuser.myfuncs.mypkg;
   function jsu(x4,x2,f1);
      return (20000*(log(x4) - log(x2)) + f1);
   endsub;

   function jsu1(x1,x2,x3,x4,d);
      yk = (d - x1)/x2;
      zk = yk + sqrt(1 + yk**2);
      return (-0.5*(x3 + x4*log(zk))**2 -0.5*log(1 + yk**2));
   endsub;
run;
options cmplib = sasuser.myfuncs;

In the following steps, the assumption for the definition of jsu and jsu1 is that jsu1 is called once for each line of data (in this case 20,000 times) and cumulatively summed. The resulting value is then provided to the function jsu for a final calculation, which is called only once per evaluation of $f(x)$.

 data objdata;
    input _id_ $ _function_ $ _sense_ $ _dataset_ $;
    datalines;
 f1 jsu1 .   sudata
 f  jsu  max .
 ;

 data vardata;
    input _id_ $ _lb_ _ub_;
    datalines;
 x1  .     .
 x2  1e-12 .
 x3  .     .
 x4  1e-12 .
 ;
 proc optlso
    primalout = solution
    objective = objdata
    variables = vardata
    logfreq   = 100
    maxgen    = 1000;
    performance nodes=4 nthreads=8;
 run;

Output 3.8.1 shows the output from running these steps.

Output 3.8.1: Estimation for Johnson $S_{U}$ Family of Distributions

The OPTLSO Procedure

Problem Summary
Problem Type NLP
   
Objective Definition Set OBJDATA
Variables VARDATA
   
Number of Variables 4
Integer Variables 0
Continuous Variables 4
   
Number of Constraints 0
Linear Constraints 0
Nonlinear Constraints 0
   
Objective Definition Source OBJDATA
Objective Sense Maximize
Objective Intermediate Functions 1
Objective Data Set sudata

Solution Summary
Solution Status Function convergence
Objective -8414.726059
Infeasibility 0
Iterations 494
Evaluations 43171
Cached Evaluations 147
Global Searches 1
Population Size 120
Seed 1

Performance Information
Host Node << your grid host >>
Execution Mode Distributed
Number of Compute Nodes 4
Number of Threads per Node 8