Example 31.4 Linear Discriminant Analysis of Remote-Sensing Data on Crops
This example uses remote-sensing data in which the observations are grouped into five crops: clover, corn, cotton, soybeans, and sugar beets. Four measures, x1 through x4, serve as the descriptive variables.
In the first PROC DISCRIM statement, the DISCRIM procedure uses normal-theory methods (METHOD=NORMAL) and assumes equal variances across the five crops, that is, a pooled within-crop covariance matrix (POOL=YES). The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The LIST option lists the resubstitution classification results for each observation (Output 31.4.2). The CROSSVALIDATE option displays cross validation error-rate estimates (Output 31.4.3). The OUTSTAT= option stores the calibration information in a new data set, Cropstat, so that future observations can be classified. A second PROC DISCRIM statement uses this calibration information to classify a test data set. Note that the values of the identification variable, xvalues, are obtained by rereading the x1 through x4 fields (columns 11 through 21) of the data lines as a single character variable. The following statements produce Output 31.4.1 through Output 31.4.3:
   title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
   
   data crops;
      input Crop $ 1-10 x1-x4 xvalues $ 11-21;
      datalines;
   Corn      16 27 31 33
   Corn      15 23 30 30
   Corn      16 27 27 26
   Corn      18 20 25 23
   Corn      15 15 31 32
   Corn      15 32 32 15
   Corn      12 15 16 73
   Soybeans  20 23 23 25
   Soybeans  24 24 25 32
   Soybeans  21 25 23 24
   Soybeans  27 45 24 12
   Soybeans  12 13 15 42
   Soybeans  22 32 31 43
   Cotton    31 32 33 34
   Cotton    29 24 26 28
   Cotton    34 32 28 45
   Cotton    26 25 23 24
   Cotton    53 48 75 26
   Cotton    34 35 25 78
   Sugarbeets22 23 25 42
   Sugarbeets25 25 24 26
   Sugarbeets34 25 16 52
   Sugarbeets54 23 21 54
   Sugarbeets25 43 32 15
   Sugarbeets26 54  2 54
   Clover    12 45 32 54
   Clover    24 58 25 34
   Clover    87 54 61 21
   Clover    51 31 31 16
   Clover    96 48 54 62
   Clover    31 31 11 11
   Clover    56 13 13 71
   Clover    32 13 27 32
   Clover    36 26 54 32
   Clover    53 08 06 54
   Clover    32 32 62 16
   ;
   title2 'Using the Linear Discriminant Function';
   
   proc discrim data=crops outstat=cropstat method=normal pool=yes
                list crossvalidate;
      class Crop;
      priors prop;
      id xvalues;
      var x1-x4;
   run;
    Output 31.4.1
    Linear Discriminant Function on Crop Data
 
 
Class Level Information
| Crop | Frequency | Weight | Proportion | Prior Probability |
| Clover | 11 | 11.0000 | 0.305556 | 0.305556 |
| Corn | 7 | 7.0000 | 0.194444 | 0.194444 |
| Cotton | 6 | 6.0000 | 0.166667 | 0.166667 |
| Soybeans | 6 | 6.0000 | 0.166667 | 0.166667 |
| Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667 |
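Because PRIORS PROP ties the priors to the class sizes, the proportions and prior probabilities reported above (for example, 0.305556 for Clover) can be checked with a simple frequency count. The following is a minimal sketch, not part of the original example; it assumes only the crops data set created earlier and uses PROC FREQ, which reports the shares as percentages rather than proportions.

```sas
/* Illustrative check, not part of the original example.          */
/* With PRIORS PROP, each prior should equal the crop's share of  */
/* the 36 calibration observations (Clover: 11/36 = 0.305556).    */
proc freq data=crops;
   tables Crop / nocum;   /* NOCUM suppresses cumulative columns  */
run;
```

The Percent column divided by 100 should agree with the prior probabilities that PROC DISCRIM displays.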
 
 
 
The DISCRIM Procedure
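Generalized Squared Distance to Crop
| From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| Clover | 2.37125 | 7.52830 | 4.44969 | 6.16665 | 5.07262 |
| Corn | 6.62433 | 3.27522 | 5.46798 | 4.31383 | 6.47395 |
| Cotton | 3.23741 | 5.15968 | 3.58352 | 5.01819 | 4.87908 |
| Soybeans | 4.95438 | 4.00552 | 5.01819 | 3.58352 | 4.65998 |
| Sugarbeets | 3.86034 | 6.16564 | 4.87908 | 4.65998 | 3.58352 |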
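The diagonal entries of the distance table above (2.37125, 3.27522, and 3.58352) are not zero because, with unequal priors, the generalized squared distance adds a prior term to the Mahalanobis distance. The sketch below assumes that this term is -2 ln(prior), so that the distance from a crop to itself reduces to -2 ln(prior); the variable names are illustrative only.

```sas
/* Illustrative check, not part of the original example.          */
/* Assumption: with a pooled covariance matrix and unequal        */
/* priors, the distance from a crop to itself is -2*log(prior).   */
data _null_;
   do class_size = 11, 7, 6;          /* Clover, Corn, other crops */
      prior = class_size / 36;        /* PRIORS PROP: share of 36  */
      self_distance = -2 * log(prior);
      put class_size= prior= 8.6 self_distance= 8.5;
   end;
run;
```

The values written to the log should agree, to five decimal places, with the diagonal of the table.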
 
 
 
Linear Discriminant Function for Crop
| Variable | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| Constant | -10.98457 | -7.72070 | -11.46537 | -7.28260 | -9.80179 |
| x1 | 0.08907 | -0.04180 | 0.02462 | 0.0000369 | 0.04245 |
| x2 | 0.17379 | 0.11970 | 0.17596 | 0.15896 | 0.20988 |
| x3 | 0.11899 | 0.16511 | 0.15880 | 0.10622 | 0.06540 |
| x4 | 0.15637 | 0.16768 | 0.18362 | 0.14133 | 0.16408 |
 
 
 
 
  
    Output 31.4.2
    Misclassified Observations: Resubstitution
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.CROPS
Resubstitution Results using Linear Discriminant Function
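Posterior Probability of Membership in Crop
| xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| 16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897 |
| 15 23 30 30 | Corn | Corn | 0.0769 | 0.4558 | 0.1421 | 0.2530 | 0.0722 |
| 16 27 27 26 | Corn | Corn | 0.0982 | 0.3422 | 0.1365 | 0.3073 | 0.1157 |
| 18 20 25 23 | Corn | Corn | 0.1052 | 0.3634 | 0.1078 | 0.3281 | 0.0955 |
| 15 15 31 32 | Corn | Corn | 0.0588 | 0.5754 | 0.1173 | 0.2087 | 0.0398 |
| 15 32 32 15 | Corn | Soybeans * | 0.0972 | 0.3278 | 0.1318 | 0.3420 | 0.1011 |
| 12 15 16 73 | Corn | Corn | 0.0454 | 0.5238 | 0.1849 | 0.1376 | 0.1083 |
| 20 23 23 25 | Soybeans | Soybeans | 0.1330 | 0.2804 | 0.1176 | 0.3305 | 0.1385 |
| 24 24 25 32 | Soybeans | Soybeans | 0.1768 | 0.2483 | 0.1586 | 0.2660 | 0.1502 |
| 21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570 |
| 27 45 24 12 | Soybeans | Sugarbeets * | 0.2357 | 0.0547 | 0.1016 | 0.2721 | 0.3359 |
| 12 13 15 42 | Soybeans | Corn * | 0.0549 | 0.4749 | 0.0920 | 0.2768 | 0.1013 |
| 22 32 31 43 | Soybeans | Cotton * | 0.1474 | 0.2606 | 0.2624 | 0.1848 | 0.1448 |
| 31 32 33 34 | Cotton | Clover * | 0.2815 | 0.1518 | 0.2377 | 0.1767 | 0.1523 |
| 29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559 |
| 34 32 28 45 | Cotton | Clover * | 0.3125 | 0.1023 | 0.2404 | 0.1357 | 0.2091 |
| 26 25 23 24 | Cotton | Soybeans * | 0.2121 | 0.1809 | 0.1245 | 0.3045 | 0.1780 |
| 53 48 75 26 | Cotton | Clover * | 0.4837 | 0.0391 | 0.4384 | 0.0223 | 0.0166 |
| 34 35 25 78 | Cotton | Cotton | 0.2256 | 0.0794 | 0.3810 | 0.0592 | 0.2548 |
| 22 23 25 42 | Sugarbeets | Corn * | 0.1421 | 0.3066 | 0.1901 | 0.2231 | 0.1381 |
| 25 25 24 26 | Sugarbeets | Soybeans * | 0.1969 | 0.2050 | 0.1354 | 0.2960 | 0.1667 |
| 34 25 16 52 | Sugarbeets | Sugarbeets | 0.2928 | 0.0871 | 0.1665 | 0.1479 | 0.3056 |
| 54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845 |
| 25 43 32 15 | Sugarbeets | Soybeans * | 0.2258 | 0.1135 | 0.1646 | 0.2770 | 0.2191 |
| 26 54  2 54 | Sugarbeets | Sugarbeets | 0.0850 | 0.0081 | 0.0521 | 0.0661 | 0.7887 |
| 12 45 32 54 | Clover | Cotton * | 0.0693 | 0.2663 | 0.3394 | 0.1460 | 0.1789 |
| 24 58 25 34 | Clover | Sugarbeets * | 0.1647 | 0.0376 | 0.1680 | 0.1452 | 0.4845 |
| 87 54 61 21 | Clover | Clover | 0.9328 | 0.0003 | 0.0478 | 0.0025 | 0.0165 |
| 51 31 31 16 | Clover | Clover | 0.6642 | 0.0205 | 0.0872 | 0.0959 | 0.1322 |
| 96 48 54 62 | Clover | Clover | 0.9215 | 0.0002 | 0.0604 | 0.0007 | 0.0173 |
| 31 31 11 11 | Clover | Sugarbeets * | 0.2525 | 0.0402 | 0.0473 | 0.3012 | 0.3588 |
| 56 13 13 71 | Clover | Clover | 0.6132 | 0.0212 | 0.1226 | 0.0408 | 0.2023 |
| 32 13 27 32 | Clover | Clover | 0.2669 | 0.2616 | 0.1512 | 0.2260 | 0.0943 |
| 36 26 54 32 | Clover | Cotton * | 0.2650 | 0.2645 | 0.3495 | 0.0918 | 0.0292 |
| 53 08 06 54 | Clover | Clover | 0.5914 | 0.0237 | 0.0676 | 0.0781 | 0.2392 |
| 32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206 |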
 
 
* Misclassified observation    
 
 
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Linear Discriminant Function
 
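Error Count Estimates for Crop
|   | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.4545 | 0.1429 | 0.8333 | 0.5000 | 0.6667 | 0.5000 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |   |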
 
 
 
 
 
    Output 31.4.3
    Misclassified Observations: Cross Validation
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Linear Discriminant Function
 
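Error Count Estimates for Crop
|   | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.6364 | 0.4286 | 1.0000 | 0.5000 | 0.8333 | 0.6667 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |   |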
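The Total entries in the two error-rate summaries (0.5000 for resubstitution and 0.6667 for cross validation) are the per-crop rates weighted by the prior probabilities. The following DATA step is a minimal sketch of that arithmetic, not part of the original example; the array and variable names are illustrative, and the rates are copied from the summaries above.

```sas
/* Illustrative check, not part of the original example.            */
/* Total error = sum over crops of (prior * error rate).            */
/* Crop order: Clover, Corn, Cotton, Soybeans, Sugarbeets.          */
data _null_;
   array cls_size[5]   _temporary_ (11 7 6 6 6);
   array rate_resub[5] _temporary_ (0.4545 0.1429 0.8333 0.5000 0.6667);
   array rate_xval[5]  _temporary_ (0.6364 0.4286 1.0000 0.5000 0.8333);
   total_resub = 0;
   total_xval  = 0;
   do i = 1 to 5;
      prior = cls_size[i] / 36;       /* PRIORS PROP: share of 36   */
      total_resub = total_resub + prior * rate_resub[i];
      total_xval  = total_xval  + prior * rate_xval[i];
   end;
   put total_resub= 6.4 total_xval= 6.4;
run;
```

With proportional priors the weighted sum is simply the overall fraction of misclassified observations: 18 of 36 under resubstitution and 24 of 36 under cross validation.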
 
 
 
 
Next, you can use the calibration information stored in the Cropstat data set to classify a test data set. The TESTLIST option lists the classification results for each observation in the test data set. The following statements produce Output 31.4.4 and Output 31.4.5:
   data test;
      input Crop $ 1-10 x1-x4 xvalues $ 11-21;
      datalines;
   Corn      16 27 31 33
   Soybeans  21 25 23 24
   Cotton    29 24 26 28
   Sugarbeets54 23 21 54
   Clover    32 32 62 16
   ;
   title2 'Classification of Test Data';
   
   proc discrim data=cropstat testdata=test testout=tout testlist;
      class Crop;
      testid xvalues;
      var x1-x4;
   run;
   
   proc print data=tout;
      title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
      title2 'Output Classification Results of Test Data';
   run;
    Output 31.4.4
    Classification of Test Data
The DISCRIM Procedure
Classification Results for Test Data: WORK.TEST
Classification Results using Linear Discriminant Function
  
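Posterior Probability of Membership in Crop
| xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| 16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897 |
| 21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570 |
| 29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559 |
| 54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845 |
| 32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206 |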
 
 
* Misclassified observation    
The DISCRIM Procedure
Classification Summary for Test Data: WORK.TEST
Classification Summary using Linear Discriminant Function
 
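Error Count Estimates for Crop
|   | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.6389 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |   |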
 
 
 
 
    Output 31.4.5
    Output Data Set of the Classification Results for Test Data
| Crop | x1 | x2 | x3 | x4 | xvalues | Clover | Corn | Cotton | Soybeans | Sugarbeets | _INTO_ |
| Corn | 16 | 27 | 31 | 33 | 16 27 31 33 | 0.08935 | 0.40543 | 0.17632 | 0.23918 | 0.08972 | Corn |
| Soybeans | 21 | 25 | 23 | 24 | 21 25 23 24 | 0.14811 | 0.24308 | 0.11999 | 0.33184 | 0.15698 | Soybeans |
| Cotton | 29 | 24 | 26 | 28 | 29 24 26 28 | 0.25213 | 0.18420 | 0.15294 | 0.25486 | 0.15588 | Soybeans |
| Sugarbeets | 54 | 23 | 21 | 54 | 54 23 21 54 | 0.62150 | 0.01937 | 0.12498 | 0.04962 | 0.18452 | Clover |
| Clover | 32 | 32 | 62 | 16 | 32 32 62 16 | 0.21633 | 0.31799 | 0.33266 | 0.11246 | 0.02056 | Cotton |
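The TESTOUT= data set keeps the true crop next to the assigned crop, so the test results can be summarized as a cross-tabulation. The following is a minimal sketch, not part of the original example; it assumes the tout data set created above and that the assigned class is stored in the variable _INTO_.

```sas
/* Illustrative follow-up, not part of the original example.        */
/* Rows are the true crops and columns are the crops assigned by    */
/* the linear discriminant function; off-diagonal cells are errors. */
proc freq data=tout;
   tables Crop*_INTO_ / norow nocol nopercent;
run;
```

For the five test observations listed above, only Corn and Soybeans should fall on the diagonal.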
 
 
 
 
In this next example, PROC DISCRIM uses normal-theory methods (METHOD=NORMAL) assuming unequal variances, that is, a separate within-crop covariance matrix for each crop (POOL=NO), for the remote-sensing data. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The CROSSVALIDATE option displays cross validation error-rate estimates. Note that the total error count estimate by cross validation (0.5556) is much larger than the total error count estimate by resubstitution (0.1111), which indicates that the resubstitution estimate is optimistic for these data. The following statements produce Output 31.4.6:
   title2 'Using Quadratic Discriminant Function';
   
   proc discrim data=crops method=normal pool=no crossvalidate;
      class Crop;
      priors prop;
      id xvalues;
      var x1-x4;
   run;
    Output 31.4.6
    Quadratic Discriminant Function on Crop Data
 
Class Level Information
| Crop | Frequency | Weight | Proportion | Prior Probability |
| Clover | 11 | 11.0000 | 0.305556 | 0.305556 |
| Corn | 7 | 7.0000 | 0.194444 | 0.194444 |
| Cotton | 6 | 6.0000 | 0.166667 | 0.166667 |
| Soybeans | 6 | 6.0000 | 0.166667 | 0.166667 |
| Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667 |
 
 
 
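Within Covariance Matrix Information
| Crop | Covariance Matrix Rank | Natural Log of the Determinant of the Covariance Matrix |
| Clover | 4 | 23.64618 |
| Corn | 4 | 11.13472 |
| Cotton | 4 | 13.23569 |
| Soybeans | 4 | 12.45263 |
| Sugarbeets | 4 | 17.76293 |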
 
 
 
The DISCRIM Procedure
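Generalized Squared Distance to Crop
| From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets |
| Clover | 26.01743 | 1320 | 104.18297 | 194.10546 | 31.40816 |
| Corn | 27.73809 | 14.40994 | 150.50763 | 38.36252 | 25.55421 |
| Cotton | 26.38544 | 588.86232 | 16.81921 | 52.03266 | 37.15560 |
| Soybeans | 27.07134 | 46.42131 | 41.01631 | 16.03615 | 23.15920 |
| Sugarbeets | 26.80188 | 332.11563 | 43.98280 | 107.95676 | 21.34645 |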
 
 
 
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Quadratic Discriminant Function
 
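Error Count Estimates for Crop
|   | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.1818 | 0.0000 | 0.0000 | 0.0000 | 0.3333 | 0.1111 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |   |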
 
 
 
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Quadratic Discriminant Function
 
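Error Count Estimates for Crop
|   | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total |
| Rate | 0.1818 | 0.7143 | 0.6667 | 0.6667 | 0.8333 | 0.5556 |
| Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |   |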
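The two analyses in this example fix the choice between a pooled covariance matrix (POOL=YES) and separate within-crop covariance matrices (POOL=NO) in advance. As a hedged variation that is not part of the original example, the POOL=TEST option asks PROC DISCRIM to test the homogeneity of the within-crop covariance matrices and to choose between the linear and quadratic functions accordingly; the significance level of 0.10 given in SLPOOL= is an assumption made here for illustration.

```sas
/* Illustrative variation, not part of the original example.        */
/* POOL=TEST tests whether the within-crop covariance matrices are  */
/* equal; SLPOOL= sets the significance level for that test.        */
title2 'Letting PROC DISCRIM Choose the Discriminant Function';

proc discrim data=crops method=normal pool=test slpool=0.10
             crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
```

With only 6 to 11 observations per crop, the result of such a test should be read with caution, and the cross validation error rates remain the more informative basis for comparing the two functions.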
 
 
 
 
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.