The OPTMODEL Procedure

Example 5.7 Sparse Modeling

This example demonstrates how to rewrite certain models for more efficient processing. Sometimes optimization models that run out of memory during problem generation can be rewritten to take advantage of sparsity to use memory more efficiently. This often occurs when a large array is modeled in a dense format but most of its entries are zeros. Usually, the array provides problem coefficients or it contains optimization variables.

The model for this example solves the facility location problem that is described in Example 8.3. This example is concerned with the resources that are required for PROC OPTMODEL problem generation and solver initialization. So the size of the problem has been increased, but the model has also been modified to make it easier to solve. In order to handle the larger problem size, the model eliminates a large number of the potential assignments of customers to facilities based on distance, making the problem sparse.

The following code generates a random instance of the facility location problem:

%let NumCustomers  = 1500;
%let NumSites      = 250;
%let SiteCapacity  = 50;
%let MaxDemand     = 10;
%let xmax          = 200;
%let ymax          = 100;
%let seed          = 938;

/* generate random customer locations */
data cdata(drop=i);
   length name $8;
   call streaminit(&seed);
   do i = 1 to &NumCustomers;
      name = compress('C'||put(i,best.));
      x = rand('UNIFORM') * &xmax;
      y = rand('UNIFORM') * &ymax;
      demand = 1;
      output;
   end;
run;

/* generate random site locations and fixed charge */
data sdata(drop=i);
   length name $8;
   call streaminit(&seed);
   do i = 1 to &NumSites;
      name = compress('SITE'||put(i,best.));
      x = rand('UNIFORM') * &xmax;
      y = rand('UNIFORM') * &ymax;
      fixed_charge = 300 * (abs(&xmax/2-x)/&xmax + abs(&ymax/2-y)/&ymax);
      output;
   end;
run;

The following code uses a dense version of the facility location model. This model is equivalent to the model from Example 8.3 except for the added constraint distance_at_most_30. This constraint eliminates from consideration the assignment of customers to facilities over long distances by forcing the corresponding Assign variables to 0.

proc optmodel;
   performance details;
   profile on percent=0.1;
   set <str> CUSTOMERS;
   set <str> SITES init {};

   /* x and y coordinates of CUSTOMERS and SITES */
   num x {CUSTOMERS union SITES};
   num y {CUSTOMERS union SITES};
   num demand {CUSTOMERS};
   num fixed_charge {SITES};

   /* distance from customer i to site j */
   num dist {i in CUSTOMERS, j in SITES}
       = sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2);

   read data cdata into CUSTOMERS=[name] x y demand;
   read data sdata into SITES=[name] x y fixed_charge;

   var Assign {CUSTOMERS, SITES} binary;
   var Build {SITES} binary;

   min CostNoFixedCharge
       = sum {i in CUSTOMERS, j in SITES} dist[i,j] * Assign[i,j];
   min CostFixedCharge
       = CostNoFixedCharge + sum {j in SITES} fixed_charge[j] * Build[j];

   /* each customer assigned to exactly one site */
   con assign_def {i in CUSTOMERS}:
      sum {j in SITES} Assign[i,j] = 1;

   /* if customer i assigned to site j, then facility must be built at j */
   con link {i in CUSTOMERS, j in SITES}:
      Assign[i,j] <= Build[j];

   /* each site can handle at most &SiteCapacity demand */
   con capacity {j in SITES}:
      sum {i in CUSTOMERS} demand[i] * Assign[i,j] <=
         &SiteCapacity * Build[j];

   /* do not assign customer to site more than 30 units away */
   con distance_at_most_30 {i in CUSTOMERS, j in SITES: dist[i,j] > 30}:
      Assign[i,j] = 0;

   /* solve the MILP */
   solve with milp/timetype=real;

quit;

If you inspect the log after running the preceding code, then you will see that the MILP presolver has pruned down the problem size considerably. If you also run the code with the SAS option FULLSTIMER enabled on a 64-bit system, then you will notice that about 1.3GB of memory is required for the PROC OPTMODEL step when you are running on a single CPU.

The solution and timing results for the dense model are shown in Output 5.7.1. The PERFORMANCE DETAILS statement from the model requests display of the task timing table for the SOLVE statement. The PROFILE ON statement requests further timing details, including evaluation time for declarations used by problem generation.

Output 5.7.1: Dense Model Results

Solution Summary
Solver MILP
Algorithm Branch and Cut
Objective Function CostFixedCharge
Solution Status Optimal within Relative Gap
Objective Value 18001.14035
   
Relative Gap 0.000098081
Absolute Gap 1.7653963848
Primal Infeasibility 1.14871E-12
Bound Infeasibility 9.9999999E-7
Integer Infeasibility 1E-6
   
Best Bound 17999.374954
Nodes 5
Iterations 40131
Presolve Time 1.20
Solution Time 62.73

Procedure Task Timing
Task Time
(sec.)
% Time
Problem Generation 4.57 6.64%
Solver Initialization 0.90 1.30%
Code Generation 0.06 0.08%
Presolve 1.20 1.75%
Root Node Processing 50.70 73.62%
Branch And Cut 9.32 13.54%
Synchronization 0.20 0.30%
Idle 1.13 1.64%
Other Tasks 0.17 0.25%
Solver Postprocessing 0.62 0.89%

Profile Information
Item Line Col. Execution
Count
Net Time
(sec.)
Wait Time
(sec.)
% Total
Time
SOLVE 3546 4 1 64.85 2.17 85.9%
Constraint distance_at_most_30 3542 8   5.59 0.00 7.4%
Constraint link 3533 8   3.22 0.00 4.3%
Number dist 3514 8   0.52 0.00 0.7%
Constraint capacity 3537 8   0.38 0.00 0.5%
Min CostNoFixedCharge 3523 8   0.37 0.00 0.5%
Var Assign 3520 8   0.29 0.00 0.4%
Constraint assign_def 3529 8   0.28 0.00 0.4%
Other profiled items       0.01 0.00 0.0%

Note: Total profiled time is 75.51 seconds.




The best approach for reducing the memory requirements is to eliminate the Assign variables that are always going to be 0. This is accomplished in the following sparse version of the code. Instead of indexing Assign over the crossproduct of CUSTOMERS and SITES, now the code defines a new set of pairs that satisfy the distance requirement, CUSTOMERS_SITES. This set replaces the constraint distance_at_most_30 in the dense model. The objective and constraints have been modified to use the new indexing scheme, with implicit set slicing (as described in the section More on Index Sets) for constraints assign_def and capacity.

proc optmodel;
   performance details;
   profile on percent=0.1;
   set <str> CUSTOMERS;
   set <str> SITES init {};

   /* x and y coordinates of CUSTOMERS and SITES */
   num x {CUSTOMERS union SITES};
   num y {CUSTOMERS union SITES};
   num demand {CUSTOMERS};
   num fixed_charge {SITES};

   /* distance from customer i to site j */
   num dist {i in CUSTOMERS, j in SITES}
       = sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2);

   read data cdata into CUSTOMERS=[name] x y demand;
   read data sdata into SITES=[name] x y fixed_charge;

   set CUSTOMERS_SITES = {i in CUSTOMERS, j in SITES: dist[i,j] <= 30};
   var Assign {CUSTOMERS_SITES} binary;
   var Build {SITES} binary;

   min CostNoFixedCharge
       = sum {<i,j> in CUSTOMERS_SITES} dist[i,j] * Assign[i,j];
   min CostFixedCharge
       = CostNoFixedCharge + sum {j in SITES} fixed_charge[j] * Build[j];

   /* each customer assigned to exactly one site */
   con assign_def {i in CUSTOMERS}:
      sum {<(i),j> in CUSTOMERS_SITES} Assign[i,j] = 1;

   /* if customer i assigned to site j, then facility must be built at j */
   con link {<i,j> in CUSTOMERS_SITES}:
      Assign[i,j] <= Build[j];

   /* each site can handle at most &SiteCapacity demand */
   con capacity {j in SITES}:
      sum {<i,(j)> in CUSTOMERS_SITES} demand[i] * Assign[i,j] <=
         &SiteCapacity * Build[j];

   /* solve the MILP */
   solve with milp/timetype=real;

quit;

The log from running the preceding code shows that the MILP presolver does not find anything to improve with this version of the model. On a 64-bit system, the FULLSTIMER option shows that memory requirements have been reduced to about 580MB when you are running on a single CPU, less than half the requirements of the previous model.

The solution and timing results for the dense model are shown in Output 5.7.2. Note that the dense model (Output 5.7.1) and the sparse model (Output 5.7.2) are equivalent after presolver processing and generate the same result using similar amounts of solver time. On the other hand, problem generation time is significantly reduced as are other times including presolve time. Both models used the solver option TIMETYPE=REAL so that all times are reported in seconds of real time. You can see from the "Profile Information" table that the overhead associated with problem declarations has been significantly reduced.

Output 5.7.2: Sparse Model Results

Solution Summary
Solver MILP
Algorithm Branch and Cut
Objective Function CostFixedCharge
Solution Status Optimal within Relative Gap
Objective Value 18001.14035
   
Relative Gap 0.000098081
Absolute Gap 1.7653963848
Primal Infeasibility 1.14871E-12
Bound Infeasibility 9.9999999E-7
Integer Infeasibility 1E-6
   
Best Bound 17999.374954
Nodes 5
Iterations 40131
Presolve Time 0.29
Solution Time 60.60

Procedure Task Timing
Task Time
(sec.)
% Time
Problem Generation 1.19 1.92%
Solver Initialization 0.10 0.17%
Code Generation 0.00 0.01%
Presolve 0.29 0.47%
Root Node Processing 49.82 80.44%
Branch And Cut 9.02 14.57%
Synchronization 0.20 0.32%
Idle 1.10 1.77%
Other Tasks 0.17 0.28%
Solver Postprocessing 0.04 0.06%

Profile Information
Item Line Col. Execution
Count
Net Time
(sec.)
Wait Time
(sec.)
% Total
Time
SOLVE 3596 4 1 60.79 0.12 97.5%
Number dist 3567 8   0.49 0.00 0.8%
Set CUSTOMERS_SITES 3573 8   0.46 0.00 0.7%
Constraint link 3587 8   0.35 0.00 0.6%
Constraint assign_def 3583 8   0.08 0.00 0.1%
Constraint capacity 3591 8   0.07 0.00 0.1%
Other profiled items       0.08 0.00 0.1%

Note: Total profiled time is 62.32 seconds.