Example 31.4 Linear Discriminant Analysis of Remote-Sensing Data on Crops
This example uses remote-sensing data on crops. The observations are grouped into five crops: clover, corn, cotton, soybeans, and sugar beets. Four measures, x1 through x4, make up the descriptive variables.
In the first PROC DISCRIM statement, the DISCRIM procedure uses normal-theory methods (METHOD=NORMAL) and assumes equal covariance matrices (POOL=YES) for the five crops, which yields a linear discriminant function. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The LIST option lists the resubstitution classification results for each observation (Output 31.4.2). The CROSSVALIDATE option displays cross-validation error-rate estimates (Output 31.4.3). The OUTSTAT= option stores the calibration information in a new data set so that future observations can be classified. A second PROC DISCRIM statement uses this calibration information to classify a test data set. Note that the values of the identification variable, xvalues, are obtained by rereading the x1 through x4 fields in the data lines as a single character variable. The following statements produce Output 31.4.1 through Output 31.4.3:
title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
data crops;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Corn      15 23 30 30
Corn      16 27 27 26
Corn      18 20 25 23
Corn      15 15 31 32
Corn      15 32 32 15
Corn      12 15 16 73
Soybeans  20 23 23 25
Soybeans  24 24 25 32
Soybeans  21 25 23 24
Soybeans  27 45 24 12
Soybeans  12 13 15 42
Soybeans  22 32 31 43
Cotton    31 32 33 34
Cotton    29 24 26 28
Cotton    34 32 28 45
Cotton    26 25 23 24
Cotton    53 48 75 26
Cotton    34 35 25 78
Sugarbeets22 23 25 42
Sugarbeets25 25 24 26
Sugarbeets34 25 16 52
Sugarbeets54 23 21 54
Sugarbeets25 43 32 15
Sugarbeets26 54  2 54
Clover    12 45 32 54
Clover    24 58 25 34
Clover    87 54 61 21
Clover    51 31 31 16
Clover    96 48 54 62
Clover    31 31 11 11
Clover    56 13 13 71
Clover    32 13 27 32
Clover    36 26 54 32
Clover    53 08 06 54
Clover    32 32 62 16
;
title2 'Using the Linear Discriminant Function';
proc discrim data=crops outstat=cropstat method=normal pool=yes
             list crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
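The INPUT statement mixes column input and list input: `Crop $ 1-10` reads columns 1 through 10, `x1-x4` then reads four blank-delimited numbers starting at column 11, and `xvalues $ 11-21` rereads columns 11 through 21 as one character value. That is why a line such as `Sugarbeets22 23 25 42`, with no blank after the crop name, is still read correctly. The Python sketch below only mimics that logic for illustration; the helper name `read_crop_line` is hypothetical and is not how SAS processes the statement internally.

```python
def read_crop_line(line):
    """Hypothetical helper mirroring: input Crop $ 1-10 x1-x4 xvalues $ 11-21;"""
    crop = line[0:10].strip()                          # column input: columns 1-10
    x = [int(tok) for tok in line[10:].split()][:4]    # list input from column 11 on
    xvalues = line[10:21].strip()                      # reread columns 11-21 as text
    return crop, x, xvalues


if __name__ == "__main__":
    for raw in ["Corn      16 27 31 33", "Sugarbeets22 23 25 42", "Clover    53 08 06 54"]:
        print(read_crop_line(raw))
```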
Output 31.4.1
Linear Discriminant Function on Crop Data
Class Level Information
| Crop | Frequency | Weight | Proportion | Prior Probability |
| Clover | 11 | 11.0000 | 0.305556 | 0.305556 |
| Corn | 7 | 7.0000 | 0.194444 | 0.194444 |
| Cotton | 6 | 6.0000 | 0.166667 | 0.166667 |
| Soybeans | 6 | 6.0000 | 0.166667 | 0.166667 |
| Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667 |
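
The DISCRIM Procedure
Generalized Squared Distance to Crop
| From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| Clover | 2.37125 | 7.52830 | 4.44969 | 6.16665 | 5.07262 |
| Corn | 6.62433 | 3.27522 | 5.46798 | 4.31383 | 6.47395 |
| Cotton | 3.23741 | 5.15968 | 3.58352 | 5.01819 | 4.87908 |
| Soybeans | 4.95438 | 4.00552 | 5.01819 | 3.58352 | 4.65998 |
| Sugarbeets | 3.86034 | 6.16564 | 4.87908 | 4.65998 | 3.58352 |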
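With PRIORS PROP, each prior is the crop's frequency divided by the 36 calibration observations. The generalized squared distance adds -2 ln(q_t) for the prior q_t of the target crop, so a diagonal entry (a crop mean's distance to itself, where the Mahalanobis part is zero) equals -2 ln(q_t), and two mirrored off-diagonal entries differ only by their target crops' prior terms. The Python check below is a worked-arithmetic sketch using the printed values, not PROC DISCRIM code.

```python
import math

counts = {"Clover": 11, "Corn": 7, "Cotton": 6, "Soybeans": 6, "Sugarbeets": 6}
total = sum(counts.values())  # 36 calibration observations

# PRIORS PROP: prior = class frequency / total frequency
priors = {crop: n / total for crop, n in counts.items()}

# Diagonal of the generalized squared distance matrix: the Mahalanobis part is 0,
# leaving only the prior term -2 ln(q_t) of the target crop.
diag = {crop: -2.0 * math.log(q) for crop, q in priors.items()}

# Mirrored off-diagonal entries differ only by the target crop's prior term, so
# subtracting it recovers the same (symmetric) squared Mahalanobis distance.
d2_clover_to_corn = 7.52830 - diag["Corn"]
d2_corn_to_clover = 6.62433 - diag["Clover"]

for crop in counts:
    print(f"{crop:10s} prior={priors[crop]:.6f}  -2 ln(prior)={diag[crop]:.5f}")
print(f"Clover->Corn {d2_clover_to_corn:.5f}, Corn->Clover {d2_corn_to_clover:.5f}")
```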

Linear Discriminant Function for Crop
| Variable | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| Constant | -10.98457 | -7.72070 | -11.46537 | -7.28260 | -9.80179 |
| x1 | 0.08907 | -0.04180 | 0.02462 | 0.0000369 | 0.04245 |
| x2 | 0.17379 | 0.11970 | 0.17596 | 0.15896 | 0.20988 |
| x3 | 0.11899 | 0.16511 | 0.15880 | 0.10622 | 0.06540 |
| x4 | 0.15637 | 0.16768 | 0.18362 | 0.14133 | 0.16408 |
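Each column of the linear discriminant function gives a score for one crop: the constant plus the sum of each coefficient times its variable. An observation is assigned to the crop with the largest score, and exponentiating and normalizing the scores gives posterior probabilities. The Python sketch below applies the rounded coefficients printed above to the observations 16 27 31 33 and 54 23 21 54; its posteriors agree with the first and twenty-third rows of Output 31.4.2 to roughly three decimals. That the printed constants already include the log priors is an inference from this agreement, not something stated in this example.

```python
import math

CROPS = ["Clover", "Corn", "Cotton", "Soybeans", "Sugarbeets"]
# One list per crop: Constant, then the x1..x4 coefficients from the table.
COEF = {
    "Clover":     [-10.98457, 0.08907, 0.17379, 0.11899, 0.15637],
    "Corn":       [-7.72070, -0.04180, 0.11970, 0.16511, 0.16768],
    "Cotton":     [-11.46537, 0.02462, 0.17596, 0.15880, 0.18362],
    "Soybeans":   [-7.28260, 0.0000369, 0.15896, 0.10622, 0.14133],
    "Sugarbeets": [-9.80179, 0.04245, 0.20988, 0.06540, 0.16408],
}


def scores(x):
    """Linear score for each crop: constant + sum of coefficient * variable."""
    return {c: COEF[c][0] + sum(b * v for b, v in zip(COEF[c][1:], x)) for c in CROPS}


def posteriors(x):
    """Normalized exponentiated scores, shifted by the maximum for stability."""
    s = scores(x)
    top = max(s.values())
    w = {c: math.exp(v - top) for c, v in s.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}


post = posteriors([16, 27, 31, 33])
print(max(post, key=post.get), {c: round(p, 4) for c, p in post.items()})
```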
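
Output 31.4.2
Misclassified Observations: Resubstitution
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.CROPS
Resubstitution Results using Linear Discriminant Function
Posterior Probability of Membership in Crop
| xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| 16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897 |
| 15 23 30 30 | Corn | Corn | 0.0769 | 0.4558 | 0.1421 | 0.2530 | 0.0722 |
| 16 27 27 26 | Corn | Corn | 0.0982 | 0.3422 | 0.1365 | 0.3073 | 0.1157 |
| 18 20 25 23 | Corn | Corn | 0.1052 | 0.3634 | 0.1078 | 0.3281 | 0.0955 |
| 15 15 31 32 | Corn | Corn | 0.0588 | 0.5754 | 0.1173 | 0.2087 | 0.0398 |
| 15 32 32 15 | Corn | Soybeans * | 0.0972 | 0.3278 | 0.1318 | 0.3420 | 0.1011 |
| 12 15 16 73 | Corn | Corn | 0.0454 | 0.5238 | 0.1849 | 0.1376 | 0.1083 |
| 20 23 23 25 | Soybeans | Soybeans | 0.1330 | 0.2804 | 0.1176 | 0.3305 | 0.1385 |
| 24 24 25 32 | Soybeans | Soybeans | 0.1768 | 0.2483 | 0.1586 | 0.2660 | 0.1502 |
| 21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570 |
| 27 45 24 12 | Soybeans | Sugarbeets * | 0.2357 | 0.0547 | 0.1016 | 0.2721 | 0.3359 |
| 12 13 15 42 | Soybeans | Corn * | 0.0549 | 0.4749 | 0.0920 | 0.2768 | 0.1013 |
| 22 32 31 43 | Soybeans | Cotton * | 0.1474 | 0.2606 | 0.2624 | 0.1848 | 0.1448 |
| 31 32 33 34 | Cotton | Clover * | 0.2815 | 0.1518 | 0.2377 | 0.1767 | 0.1523 |
| 29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559 |
| 34 32 28 45 | Cotton | Clover * | 0.3125 | 0.1023 | 0.2404 | 0.1357 | 0.2091 |
| 26 25 23 24 | Cotton | Soybeans * | 0.2121 | 0.1809 | 0.1245 | 0.3045 | 0.1780 |
| 53 48 75 26 | Cotton | Clover * | 0.4837 | 0.0391 | 0.4384 | 0.0223 | 0.0166 |
| 34 35 25 78 | Cotton | Cotton | 0.2256 | 0.0794 | 0.3810 | 0.0592 | 0.2548 |
| 22 23 25 42 | Sugarbeets | Corn * | 0.1421 | 0.3066 | 0.1901 | 0.2231 | 0.1381 |
| 25 25 24 26 | Sugarbeets | Soybeans * | 0.1969 | 0.2050 | 0.1354 | 0.2960 | 0.1667 |
| 34 25 16 52 | Sugarbeets | Sugarbeets | 0.2928 | 0.0871 | 0.1665 | 0.1479 | 0.3056 |
| 54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845 |
| 25 43 32 15 | Sugarbeets | Soybeans * | 0.2258 | 0.1135 | 0.1646 | 0.2770 | 0.2191 |
| 26 54  2 54 | Sugarbeets | Sugarbeets | 0.0850 | 0.0081 | 0.0521 | 0.0661 | 0.7887 |
| 12 45 32 54 | Clover | Cotton * | 0.0693 | 0.2663 | 0.3394 | 0.1460 | 0.1789 |
| 24 58 25 34 | Clover | Sugarbeets * | 0.1647 | 0.0376 | 0.1680 | 0.1452 | 0.4845 |
| 87 54 61 21 | Clover | Clover | 0.9328 | 0.0003 | 0.0478 | 0.0025 | 0.0165 |
| 51 31 31 16 | Clover | Clover | 0.6642 | 0.0205 | 0.0872 | 0.0959 | 0.1322 |
| 96 48 54 62 | Clover | Clover | 0.9215 | 0.0002 | 0.0604 | 0.0007 | 0.0173 |
| 31 31 11 11 | Clover | Sugarbeets * | 0.2525 | 0.0402 | 0.0473 | 0.3012 | 0.3588 |
| 56 13 13 71 | Clover | Clover | 0.6132 | 0.0212 | 0.1226 | 0.0408 | 0.2023 |
| 32 13 27 32 | Clover | Clover | 0.2669 | 0.2616 | 0.1512 | 0.2260 | 0.0943 |
| 36 26 54 32 | Clover | Cotton * | 0.2650 | 0.2645 | 0.3495 | 0.0918 | 0.0292 |
| 53 08 06 54 | Clover | Clover | 0.5914 | 0.0237 | 0.0676 | 0.0781 | 0.2392 |
| 32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206 |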

* Misclassified observation

Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Linear Discriminant Function
Error Count Estimates for Crop
|  | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.4545 | 0.1429 | 0.8333 | 0.5000 | 0.6667 | 0.5000 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |  |
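The Total error count estimate is the prior-weighted average of the per-crop error rates, the sum over crops of q_t times e_t. With PRIORS PROP the weights are n_t/N, so the Total reduces to the overall fraction of misclassified observations. The Python check below uses exact fractions and the per-crop misclassification counts implied by the Rate row (0.4545 = 5/11, 0.1429 = 1/7, 0.8333 = 5/6, 0.5000 = 3/6, 0.6667 = 4/6); it is an arithmetic illustration, not PROC DISCRIM code.

```python
from fractions import Fraction

n = {"Clover": 11, "Corn": 7, "Cotton": 6, "Soybeans": 6, "Sugarbeets": 6}
wrong = {"Clover": 5, "Corn": 1, "Cotton": 5, "Soybeans": 3, "Sugarbeets": 4}
N = sum(n.values())

priors = {c: Fraction(k, N) for c, k in n.items()}       # PRIORS PROP
rates = {c: Fraction(wrong[c], n[c]) for c in n}         # per-crop error rate
total = sum(priors[c] * rates[c] for c in n)             # prior-weighted total

print({c: round(float(r), 4) for c, r in rates.items()})
print("total:", total, "=", round(float(total), 4))
print("misclassified / N:", Fraction(sum(wrong.values()), N))
```

Substituting the cross-validation rates from Output 31.4.3 (7/11, 3/7, 6/6, 3/6, and 5/6) gives 24/36, or 0.6667, in the same way.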

Output 31.4.3
Misclassified Observations: Cross Validation
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Linear Discriminant Function
Error Count Estimates for Crop
|  | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.6364 | 0.4286 | 1.0000 | 0.5000 | 0.8333 | 0.6667 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |  |
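Cross validation classifies each observation with a discriminant function estimated from the other n - 1 observations, which is why its error estimate (0.6667) exceeds the resubstitution estimate (0.5000). The pure-Python sketch below reimplements pooled-covariance linear discriminant analysis with proportional priors, plus a leave-one-out loop, to illustrate the idea. It is an independent illustration of the technique, not PROC DISCRIM's implementation: the pooled covariance uses the divisor N minus the number of crops, and keeping the full-sample priors fixed during leave-one-out is an assumption, so the leave-one-out count is not guaranteed to match Output 31.4.3.

```python
import math

# Calibration data from the DATA step above, grouped by crop in data order.
RAW = {
    "Corn": "16 27 31 33, 15 23 30 30, 16 27 27 26, 18 20 25 23, 15 15 31 32, "
            "15 32 32 15, 12 15 16 73",
    "Soybeans": "20 23 23 25, 24 24 25 32, 21 25 23 24, 27 45 24 12, 12 13 15 42, 22 32 31 43",
    "Cotton": "31 32 33 34, 29 24 26 28, 34 32 28 45, 26 25 23 24, 53 48 75 26, 34 35 25 78",
    "Sugarbeets": "22 23 25 42, 25 25 24 26, 34 25 16 52, 54 23 21 54, 25 43 32 15, 26 54 2 54",
    "Clover": "12 45 32 54, 24 58 25 34, 87 54 61 21, 51 31 31 16, 96 48 54 62, 31 31 11 11, "
              "56 13 13 71, 32 13 27 32, 36 26 54 32, 53 08 06 54, 32 32 62 16",
}
OBS = [(crop, [float(v) for v in rec.split()])
       for crop, text in RAW.items() for rec in text.split(",")]


def inverse(a):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit(obs, priors):
    """Pooled-covariance LDA: returns {crop: (constant, coefficient list)}."""
    crops = sorted({c for c, _ in obs})
    p = len(obs[0][1])
    means = {}
    for c in crops:
        xs = [x for k, x in obs if k == c]
        means[c] = [sum(col) / len(xs) for col in zip(*xs)]
    s = [[0.0] * p for _ in range(p)]
    for c, x in obs:  # within-crop sums of squares and cross products
        d = [xi - mi for xi, mi in zip(x, means[c])]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    dof = len(obs) - len(crops)
    sinv = inverse([[v / dof for v in row] for row in s])
    model = {}
    for c in crops:
        b = [sum(sinv[i][j] * means[c][j] for j in range(p)) for i in range(p)]
        const = -0.5 * sum(bi * mi for bi, mi in zip(b, means[c])) + math.log(priors[c])
        model[c] = (const, b)
    return model


def posterior(model, x):
    s = {c: k + sum(bi * xi for bi, xi in zip(b, x)) for c, (k, b) in model.items()}
    top = max(s.values())
    w = {c: math.exp(v - top) for c, v in s.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}


def classify(model, x):
    post = posterior(model, x)
    return max(post, key=post.get)


N = len(OBS)
PRIORS = {c: sum(1 for k, _ in OBS if k == c) / N for c in RAW}  # PRIORS PROP

full = fit(OBS, PRIORS)
resub_errors = sum(classify(full, x) != c for c, x in OBS)

# Leave-one-out: refit without observation i, then classify observation i.
loo_errors = sum(classify(fit(OBS[:i] + OBS[i + 1:], PRIORS), x) != c
                 for i, (c, x) in enumerate(OBS))

print("resubstitution errors:", resub_errors, "of", N)
print("leave-one-out errors:", loo_errors, "of", N)
```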

Next, you can use the calibration information stored in the Cropstat data set to classify a test data set. The TESTLIST option lists the classification results for each observation in the test data set, and the TESTOUT= option saves the posterior probabilities and classifications in the Tout data set, which PROC PRINT then displays. The following statements produce Output 31.4.4 and Output 31.4.5:
data test;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Soybeans  21 25 23 24
Cotton    29 24 26 28
Sugarbeets54 23 21 54
Clover    32 32 62 16
;
title2 'Classification of Test Data';
proc discrim data=cropstat testdata=test testout=tout testlist;
   class Crop;
   testid xvalues;
   var x1-x4;
run;
proc print data=tout;
   title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
   title2 'Output Classification Results of Test Data';
run;
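Output 31.4.4
Classification of Test Data
The DISCRIM Procedure
Classification Results for Test Data: WORK.TEST
Classification Results using Linear Discriminant Function
Posterior Probability of Membership in Crop
| xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| 16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897 |
| 21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570 |
| 29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559 |
| 54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845 |
| 32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206 |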

* Misclassified observation

Classification Summary for Test Data: WORK.TEST
Classification Summary using Linear Discriminant Function
Error Count Estimates for Crop
|  | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.6389 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |  |

Output 31.4.5
Output Data Set of the Classification Results for Test Data
| Crop | x1 | x2 | x3 | x4 | xvalues | Clover | Corn | Cotton | Soybeans | Sugarbeets | _INTO_ |
| Corn | 16 | 27 | 31 | 33 | 16 27 31 33 | 0.08935 | 0.40543 | 0.17632 | 0.23918 | 0.08972 | Corn |
| Soybeans | 21 | 25 | 23 | 24 | 21 25 23 24 | 0.14811 | 0.24308 | 0.11999 | 0.33184 | 0.15698 | Soybeans |
| Cotton | 29 | 24 | 26 | 28 | 29 24 26 28 | 0.25213 | 0.18420 | 0.15294 | 0.25486 | 0.15588 | Soybeans |
| Sugarbeets | 54 | 23 | 21 | 54 | 54 23 21 54 | 0.62150 | 0.01937 | 0.12498 | 0.04962 | 0.18452 | Clover |
| Clover | 32 | 32 | 62 | 16 | 32 32 62 16 | 0.21633 | 0.31799 | 0.33266 | 0.11246 | 0.02056 | Cotton |
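The Total of 0.6389 in the test-data summary is weighted by the calibration priors; it is not the simple fraction of misclassified test observations (3 of 5, or 0.6). The Python sketch below recomputes _INTO_ as the crop with the largest posterior probability in each row of the Tout data set, then compares the prior-weighted total with the unweighted fraction. The row values are copied from Output 31.4.5; the code is an arithmetic illustration, not part of PROC DISCRIM.

```python
from fractions import Fraction

CROPS = ["Clover", "Corn", "Cotton", "Soybeans", "Sugarbeets"]
# (true Crop, posterior probabilities in CROPS order) for each row of Tout
TOUT = [
    ("Corn",       [0.08935, 0.40543, 0.17632, 0.23918, 0.08972]),
    ("Soybeans",   [0.14811, 0.24308, 0.11999, 0.33184, 0.15698]),
    ("Cotton",     [0.25213, 0.18420, 0.15294, 0.25486, 0.15588]),
    ("Sugarbeets", [0.62150, 0.01937, 0.12498, 0.04962, 0.18452]),
    ("Clover",     [0.21633, 0.31799, 0.33266, 0.11246, 0.02056]),
]
# Calibration priors (PRIORS PROP on the 36 calibration observations)
PRIORS = {"Clover": Fraction(11, 36), "Corn": Fraction(7, 36), "Cotton": Fraction(6, 36),
          "Soybeans": Fraction(6, 36), "Sugarbeets": Fraction(6, 36)}

# _INTO_: the crop with the largest posterior probability in each row
into = [CROPS[max(range(len(CROPS)), key=post.__getitem__)] for _, post in TOUT]

# Per-crop error rate in the test data (one test observation per crop here)
rate = {}
for crop in CROPS:
    rows = [i for i, (c, _) in enumerate(TOUT) if c == crop]
    rate[crop] = Fraction(sum(into[i] != crop for i in rows), len(rows))

weighted = sum(PRIORS[c] * rate[c] for c in CROPS)  # matches the Total in Output 31.4.4
raw = Fraction(sum(i != c for i, (c, _) in zip(into, TOUT)), len(TOUT))
print(into)
print("prior-weighted total:", round(float(weighted), 4), " unweighted:", float(raw))
```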
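
Next, PROC DISCRIM uses normal-theory methods (METHOD=NORMAL) and assumes unequal covariance matrices (POOL=NO) for the remote-sensing data, which yields a quadratic discriminant function. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The CROSSVALIDATE option displays cross-validation error-rate estimates. Note that the total error count estimate by cross validation (0.5556) is much larger than the total error count estimate by resubstitution (0.1111). Resubstitution classifies each observation with a function estimated from data that include that observation, so it tends to be optimistic, and the quadratic function estimates a separate covariance matrix for each crop from only 6 to 11 observations. The following statements produce Output 31.4.6: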
title2 'Using Quadratic Discriminant Function';
proc discrim data=crops method=normal pool=no crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
Output 31.4.6
Quadratic Discriminant Function on Crop Data
Class Level Information
| Crop | Frequency | Weight | Proportion | Prior Probability |
| Clover | 11 | 11.0000 | 0.305556 | 0.305556 |
| Corn | 7 | 7.0000 | 0.194444 | 0.194444 |
| Cotton | 6 | 6.0000 | 0.166667 | 0.166667 |
| Soybeans | 6 | 6.0000 | 0.166667 | 0.166667 |
| Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667 |
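
Within Covariance Matrix Information
| Crop | Covariance Matrix Rank | Natural Log of the Determinant of the Covariance Matrix |
| Clover | 4 | 23.64618 |
| Corn | 4 | 11.13472 |
| Cotton | 4 | 13.23569 |
| Soybeans | 4 | 12.45263 |
| Sugarbeets | 4 | 17.76293 |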

The DISCRIM Procedure
Generalized Squared Distance to Crop
| From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| Clover | 26.01743 | 1320 | 104.18297 | 194.10546 | 31.40816 |
| Corn | 27.73809 | 14.40994 | 150.50763 | 38.36252 | 25.55421 |
| Cotton | 26.38544 | 588.86232 | 16.81921 | 52.03266 | 37.15560 |
| Soybeans | 27.07134 | 46.42131 | 41.01631 | 16.03615 | 23.15920 |
| Sugarbeets | 26.80188 | 332.11563 | 43.98280 | 107.95676 | 21.34645 |
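With POOL=NO, the generalized squared distance to a crop also includes ln|S_t|, the natural log of the determinant of that crop's own covariance matrix, in addition to the prior term -2 ln(q_t). On the diagonal the Mahalanobis part is again zero, so each diagonal entry should equal the crop's log determinant listed earlier in Output 31.4.6 plus -2 ln(q_t). The Python check below confirms this arithmetic for all five crops using the printed values; it is a worked check, not PROC DISCRIM code.

```python
import math

counts = {"Clover": 11, "Corn": 7, "Cotton": 6, "Soybeans": 6, "Sugarbeets": 6}
log_det = {"Clover": 23.64618, "Corn": 11.13472, "Cotton": 13.23569,
           "Soybeans": 12.45263, "Sugarbeets": 17.76293}
printed_diag = {"Clover": 26.01743, "Corn": 14.40994, "Cotton": 16.81921,
                "Soybeans": 16.03615, "Sugarbeets": 21.34645}
N = sum(counts.values())

computed = {}
for crop in counts:
    # Distance of a crop mean to itself: 0 (Mahalanobis part) + ln|S_t| - 2 ln(q_t)
    computed[crop] = log_det[crop] - 2.0 * math.log(counts[crop] / N)
    print(f"{crop:10s} computed={computed[crop]:.5f} printed={printed_diag[crop]:.5f}")
```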

Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Quadratic Discriminant Function
Error Count Estimates for Crop
|  | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.1818 | 0.0000 | 0.0000 | 0.0000 | 0.3333 | 0.1111 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |  |

Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Quadratic Discriminant Function
Error Count Estimates for Crop
|  | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.1818 | 0.7143 | 0.6667 | 0.6667 | 0.8333 | 0.5556 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |  |