Curve Fitting: Fitting a Curve to a Set of Data Points


PROC OPTMODEL Statements and Output

The following PROC OPTMODEL statements declare an index set and parameters and then read the input data:

proc optmodel;
   set POINTS;
   num x {POINTS};
   num y {POINTS};
   read data xy_data into POINTS=[_N_] x y;

The following NUM statement declares the order parameter, which is later set to 1 for Problems (1) and (2) and set to 2 for Problem (3):

   num order;
   var Beta {0..order};
   impvar Estimate {i in POINTS}
      = Beta[0] + sum {k in 1..order} Beta[k] * x[i]^k;

The following statements encode the linearization of the $L_1$ norm:

   var Surplus {POINTS} >= 0;
   var Slack {POINTS} >= 0;
   min Objective1 = sum {i in POINTS} (Surplus[i] + Slack[i]);
   con Abs_dev_con {i in POINTS}:
      Estimate[i] - Surplus[i] + Slack[i] = y[i];

The following statements (which are not used but are shown here for comparison) encode an alternative linearization of the $L_1$ norm that requires half as many variables but twice as many constraints:

   var AbsDeviation {POINTS} >= 0;
   min Objective1 = sum {i in POINTS} AbsDeviation[i];
   con Abs_dev_con1 {i in POINTS}:
      AbsDeviation[i] >= Estimate[i] - y[i];
   con Abs_dev_con2 {i in POINTS}:
      AbsDeviation[i] >= y[i] - Estimate[i];

The following additional declarations encode the linearization of the $L_\infty $ norm:

   var MinMax;
   min Objective2 = MinMax;
   con MinMax_con {i in POINTS}:
      MinMax >= Surplus[i] + Slack[i];

The following statements (which are not used but match the formulation in Williams (1999)) encode an alternative linearization of the $L_\infty $ norm that requires the same number of variables but twice as many constraints:

   var MinMax;
   min Objective2 = MinMax;
   con MinMax_con1 {i in POINTS}:
      MinMax >= Surplus[i];
   con MinMax_con2 {i in POINTS}:
      MinMax >= Slack[i];

The following NUM statements use the ABS function, the .sol variable suffix, and the MAX aggregation operator to compute the two norms from the optimal solution:

   num sum_abs_dev = sum {i in POINTS} abs(Estimate[i].sol - y[i]);
   num max_abs_dev = max {i in POINTS} abs(Estimate[i].sol - y[i]);

The following PROBLEM statement specifies the variables, objective, and constraints for $L_1$ minimization:

   problem L1 include
      Beta Surplus Slack
      Objective1
      Abs_dev_con;

The following PROBLEM statement specifies the additional variables, objective, and constraints for $L_\infty $ minimization:

   problem Linf from L1 include
      MinMax
      Objective2
      MinMax_con;

The following statements specify a straight line fit, switch the focus to problem L1, solve Problem (1), print the results, and store the y-values that are predicted by the optimal solution:

   order = 1;
   use problem L1;
   solve;
   print sum_abs_dev max_abs_dev;
   print Beta;
   print x y Estimate Surplus Slack;
   create data sol_data1 from [POINTS] x y Estimate;

Figure 11.1 shows the output from the linear programming solver for Problem (1).

Figure 11.1: Output from Linear Programming Solver for Problem (1)

The OPTMODEL Procedure

Problem Summary
Objective Sense Minimization
Objective Function Objective1
Objective Type Linear
   
Number of Variables 40
Bounded Above 0
Bounded Below 38
Bounded Below and Above 0
Free 2
Fixed 0
   
Number of Constraints 19
Linear LE (<=) 0
Linear EQ (=) 19
Linear GE (>=) 0
Linear Range 0
   
Constraint Coefficients 75

Performance Information
Execution Mode Single-Machine
Number of Threads 1

Solution Summary
Solver LP
Algorithm Dual Simplex
Objective Function Objective1
Solution Status Optimal
Objective Value 11.46625
   
Primal Infeasibility 2.220446E-15
Dual Infeasibility 0
Bound Infeasibility 0
   
Iterations 24
Presolve Time 0.00
Solution Time 0.00

sum_abs_dev max_abs_dev
11.466 2.7688

[1] Beta
0 0.58125
1 0.63750

[1] x y Estimate Surplus Slack
1 0.0 1.0 0.58125 0.00000 0.41875
2 0.5 0.9 0.90000 0.00000 0.00000
3 1.0 0.7 1.21875 0.51875 0.00000
4 1.5 1.5 1.53750 0.03750 0.00000
5 1.9 2.0 1.79250 0.00000 0.20750
6 2.5 2.4 2.17500 0.00000 0.22500
7 3.0 3.2 2.49375 0.00000 0.70625
8 3.5 2.0 2.81250 0.81250 0.00000
9 4.0 2.7 3.13125 0.43125 0.00000
10 4.5 3.5 3.45000 0.00000 0.05000
11 5.0 1.0 3.76875 2.76875 0.00000
12 5.5 4.0 4.08750 0.08750 0.00000
13 6.0 3.6 4.40625 0.80625 0.00000
14 6.6 2.7 4.78875 2.08875 0.00000
15 7.0 5.7 5.04375 0.00000 0.65625
16 7.6 4.6 5.42625 0.82625 0.00000
17 8.5 6.0 6.00000 0.00000 0.00000
18 9.0 6.8 6.31875 0.00000 0.48125
19 10.0 7.3 6.95625 0.00000 0.34375



The following statements switch the focus to problem Linf, solve Problem (2), print the results, and store the y-values that are predicted by the optimal solution:

   use problem Linf;
   solve;
   print sum_abs_dev max_abs_dev;
   print Beta;
   print x y Estimate Surplus Slack;
   create data sol_data2 from [POINTS] x y Estimate;

Figure 11.2 shows the output from the linear programming solver for Problem (2).

Figure 11.2: Output from Linear Programming Solver for Problem (2)

Problem Summary
Objective Sense Minimization
Objective Function Objective2
Objective Type Linear
   
Number of Variables 41
Bounded Above 0
Bounded Below 38
Bounded Below and Above 0
Free 3
Fixed 0
   
Number of Constraints 38
Linear LE (<=) 0
Linear EQ (=) 19
Linear GE (>=) 19
Linear Range 0
   
Constraint Coefficients 132

Performance Information
Execution Mode Single-Machine
Number of Threads 1

Solution Summary
Solver LP
Algorithm Dual Simplex
Objective Function Objective2
Solution Status Optimal
Objective Value 1.725
   
Primal Infeasibility 8.881784E-16
Dual Infeasibility 0
Bound Infeasibility 0
   
Iterations 27
Presolve Time 0.00
Solution Time 0.00

sum_abs_dev max_abs_dev
19.95 1.725

[1] Beta
0 -0.400
1 0.625

[1] x y Estimate Surplus Slack
1 0.0 1.0 -0.4000 0.000 1.4000
2 0.5 0.9 -0.0875 0.000 0.9875
3 1.0 0.7 0.2250 0.000 0.4750
4 1.5 1.5 0.5375 0.000 0.9625
5 1.9 2.0 0.7875 0.000 1.2125
6 2.5 2.4 1.1625 0.000 1.2375
7 3.0 3.2 1.4750 0.000 1.7250
8 3.5 2.0 1.7875 0.000 0.2125
9 4.0 2.7 2.1000 0.000 0.6000
10 4.5 3.5 2.4125 0.000 1.0875
11 5.0 1.0 2.7250 1.725 0.0000
12 5.5 4.0 3.0375 0.000 0.9625
13 6.0 3.6 3.3500 0.000 0.2500
14 6.6 2.7 3.7250 1.025 0.0000
15 7.0 5.7 3.9750 0.000 1.7250
16 7.6 4.6 4.3500 0.000 0.2500
17 8.5 6.0 4.9125 0.000 1.0875
18 9.0 6.8 5.2250 0.000 1.5750
19 10.0 7.3 5.8500 0.000 1.4500



The following statements specify a quadratic curve fit, solve both parts of Problem (3), print the results, and store the y-values that are predicted by the optimal solutions:

   order = 2;
   use problem L1;
   solve;
   print sum_abs_dev max_abs_dev;
   print Beta;
   print x y Estimate Surplus Slack;
   create data sol_data3 from [POINTS] x y Estimate;

   use problem Linf;
   solve;
   print sum_abs_dev max_abs_dev;
   print Beta;
   print x y Estimate Surplus Slack;
   create data sol_data4 from [POINTS] x y Estimate;
quit;

Figure 11.3 shows the output from the linear programming solver for the first part of Problem (3).

Figure 11.3: Output from Linear Programming Solver for First Part of Problem (3)

Problem Summary
Objective Sense Minimization
Objective Function Objective1
Objective Type Linear
   
Number of Variables 41
Bounded Above 0
Bounded Below 38
Bounded Below and Above 0
Free 3
Fixed 0
   
Number of Constraints 19
Linear LE (<=) 0
Linear EQ (=) 19
Linear GE (>=) 0
Linear Range 0
   
Constraint Coefficients 93

Performance Information
Execution Mode Single-Machine
Number of Threads 1

Solution Summary
Solver LP
Algorithm Dual Simplex
Objective Function Objective1
Solution Status Optimal
Objective Value 10.458964706
   
Primal Infeasibility 5.329071E-15
Dual Infeasibility 0
Bound Infeasibility 0
   
Iterations 21
Presolve Time 0.00
Solution Time 0.00

sum_abs_dev max_abs_dev
10.459 2.298

[1] Beta
0 0.982353
1 0.294510
2 0.033725

[1] x y Estimate Surplus Slack
1 0.0 1.0 0.98235 0.00000 0.017647
2 0.5 0.9 1.13804 0.23804 0.000000
3 1.0 0.7 1.31059 0.61059 0.000000
4 1.5 1.5 1.50000 0.00000 0.000000
5 1.9 2.0 1.66367 0.00000 0.336329
6 2.5 2.4 1.92941 0.00000 0.470588
7 3.0 3.2 2.16941 0.00000 1.030588
8 3.5 2.0 2.42627 0.42627 0.000000
9 4.0 2.7 2.70000 0.00000 0.000000
10 4.5 3.5 2.99059 0.00000 0.509412
11 5.0 1.0 3.29804 2.29804 0.000000
12 5.5 4.0 3.62235 0.00000 0.377647
13 6.0 3.6 3.96353 0.36353 0.000000
14 6.6 2.7 4.39520 1.69520 0.000000
15 7.0 5.7 4.69647 0.00000 1.003529
16 7.6 4.6 5.16861 0.56861 0.000000
17 8.5 6.0 5.92235 0.00000 0.077647
18 9.0 6.8 6.36471 0.00000 0.435294
19 10.0 7.3 7.30000 0.00000 0.000000



Figure 11.4 shows the output from the linear programming solver for the second part of Problem (3).

Figure 11.4: Output from Linear Programming Solver for Second Part of Problem (3)

Problem Summary
Objective Sense Minimization
Objective Function Objective2
Objective Type Linear
   
Number of Variables 42
Bounded Above 0
Bounded Below 38
Bounded Below and Above 0
Free 4
Fixed 0
   
Number of Constraints 38
Linear LE (<=) 0
Linear EQ (=) 19
Linear GE (>=) 19
Linear Range 0
   
Constraint Coefficients 150

Performance Information
Execution Mode Single-Machine
Number of Threads 1

Solution Summary
Solver LP
Algorithm Dual Simplex
Objective Function Objective2
Solution Status Optimal
Objective Value 1.475
   
Primal Infeasibility 9.769963E-15
Dual Infeasibility 0
Bound Infeasibility 0
   
Iterations 29
Presolve Time 0.00
Solution Time 0.00

sum_abs_dev max_abs_dev
16.758 1.475

[1] Beta
0 2.475
1 -0.625
2 0.125

[1] x y Estimate Surplus Slack
1 0.0 1.0 2.4750 1.47500 0.00000
2 0.5 0.9 2.1938 1.29375 0.00000
3 1.0 0.7 1.9750 1.27500 0.00000
4 1.5 1.5 1.8188 0.31875 0.00000
5 1.9 2.0 1.7388 0.00000 0.26125
6 2.5 2.4 1.6938 0.00000 0.70625
7 3.0 3.2 1.7250 0.00000 1.47500
8 3.5 2.0 1.8188 0.00000 0.18125
9 4.0 2.7 1.9750 0.00000 0.72500
10 4.5 3.5 2.1938 0.00000 1.30625
11 5.0 1.0 2.4750 1.47500 0.00000
12 5.5 4.0 2.8188 0.00000 1.18125
13 6.0 3.6 3.2250 0.00000 0.37500
14 6.6 2.7 3.7950 1.09500 0.00000
15 7.0 5.7 4.2250 0.00000 1.47500
16 7.6 4.6 4.9450 0.34500 0.00000
17 8.5 6.0 6.1938 0.19375 0.00000
18 9.0 6.8 6.9750 0.17500 0.00000
19 10.0 7.3 8.7250 1.42500 0.00000



You can find a higher-order polynomial fit simply by increasing the value of the order parameter. The dimensions of the Beta and Estimate variables are automatically updated when order changes.

The following PROC SGPLOT statements use the output data sets that are created by PROC OPTMODEL to display the results from Problems (1) and (2) in one plot:

data plot1;
   merge sol_data1(rename=(Estimate=Line1)) sol_data2(rename=(Estimate=Line2));
run;

proc sgplot data=plot1;
   scatter x=x y=y;
   series x=x y=Line1 / curvelabel;
   series x=x y=Line2 / curvelabel;
run;

Figure 11.5 shows the regression lines for Problems (1) and (2), as on page 319 of Williams (1999).

Figure 11.5: Regression Lines for Problems (1) and (2)

Regression Lines for Problems (1) and (2)


The following PROC SGPLOT statements use the output data sets that are created by PROC OPTMODEL to display the results from both parts of Problem (3) in one plot:

data plot2;
   merge sol_data3(rename=(Estimate=Curve1))
      sol_data4(rename=(Estimate=Curve2));
run;

proc sgplot data=plot2;
   scatter x=x y=y;
   series x=x y=Curve1 / curvelabel;
   series x=x y=Curve2 / curvelabel;
run;

Figure 11.6 shows the regression curves for Problem (3), as on page 319 of Williams (1999).

Figure 11.6: Regression Curves for Problem (3)

Regression Curves for Problem (3)