Example 31.4 Linear Discriminant Analysis of Remote-Sensing Data on Crops
This example uses the remote-sensing data on crops. The observations are grouped into five crops: clover, corn, cotton, soybeans, and sugar beets. Four measures, x1 through x4, serve as the descriptive variables.
The first PROC DISCRIM step uses normal-theory methods (METHOD=NORMAL) and assumes equal within-class covariance matrices (POOL=YES) across the five crops. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The LIST option lists the resubstitution classification results for each observation (Output 31.4.2). The CROSSVALIDATE option displays cross-validation error-rate estimates (Output 31.4.3). The OUTSTAT= option stores the calibration information in a new data set, Cropstat, so that future observations can be classified. A second PROC DISCRIM step uses this calibration information to classify a test data set. Note that the values of the identification variable, xvalues, are obtained by rereading the x1 through x4 fields (columns 11 through 21) in the data lines as a single character variable. The following statements produce Output 31.4.1 through Output 31.4.3:
title 'Discriminant Analysis of Remote Sensing Data on Five Crops';

data crops;
   /* Crop occupies columns 1-10; columns 11-21 are read twice: */
   /* once as x1-x4 and once as the character variable xvalues. */
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Corn      15 23 30 30
Corn      16 27 27 26
Corn      18 20 25 23
Corn      15 15 31 32
Corn      15 32 32 15
Corn      12 15 16 73
Soybeans  20 23 23 25
Soybeans  24 24 25 32
Soybeans  21 25 23 24
Soybeans  27 45 24 12
Soybeans  12 13 15 42
Soybeans  22 32 31 43
Cotton    31 32 33 34
Cotton    29 24 26 28
Cotton    34 32 28 45
Cotton    26 25 23 24
Cotton    53 48 75 26
Cotton    34 35 25 78
Sugarbeets22 23 25 42
Sugarbeets25 25 24 26
Sugarbeets34 25 16 52
Sugarbeets54 23 21 54
Sugarbeets25 43 32 15
Sugarbeets26 54  2 54
Clover    12 45 32 54
Clover    24 58 25 34
Clover    87 54 61 21
Clover    51 31 31 16
Clover    96 48 54 62
Clover    31 31 11 11
Clover    56 13 13 71
Clover    32 13 27 32
Clover    36 26 54 32
Clover    53 08 06 54
Clover    32 32 62 16
;

title2 'Using the Linear Discriminant Function';

proc discrim data=crops outstat=cropstat method=normal pool=yes
             list crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
Output 31.4.1
Linear Discriminant Function on Crop Data
Class Level Information
Crop | Frequency | Weight | Proportion | Prior Probability
Clover | 11 | 11.0000 | 0.305556 | 0.305556
Corn | 7 | 7.0000 | 0.194444 | 0.194444
Cotton | 6 | 6.0000 | 0.166667 | 0.166667
Soybeans | 6 | 6.0000 | 0.166667 | 0.166667
Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667
The DISCRIM Procedure
Generalized Squared Distance to Crop
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets
Clover | 2.37125 | 7.52830 | 4.44969 | 6.16665 | 5.07262
Corn | 6.62433 | 3.27522 | 5.46798 | 4.31383 | 6.47395
Cotton | 3.23741 | 5.15968 | 3.58352 | 5.01819 | 4.87908
Soybeans | 4.95438 | 4.00552 | 5.01819 | 3.58352 | 4.65998
Sugarbeets | 3.86034 | 6.16564 | 4.87908 | 4.65998 | 3.58352
Linear Discriminant Function for Crop
Variable | Clover | Corn | Cotton | Soybeans | Sugarbeets
Constant | -10.98457 | -7.72070 | -11.46537 | -7.28260 | -9.80179
x1 | 0.08907 | -0.04180 | 0.02462 | 0.0000369 | 0.04245
x2 | 0.17379 | 0.11970 | 0.17596 | 0.15896 | 0.20988
x3 | 0.11899 | 0.16511 | 0.15880 | 0.10622 | 0.06540
x4 | 0.15637 | 0.16768 | 0.18362 | 0.14133 | 0.16408
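The linear discriminant function in Output 31.4.1 can be applied by hand. The following statements are a minimal sketch, not part of the original example: they score the first Corn observation (x1 through x4 equal to 16, 27, 31, and 33) with the printed constants and coefficients and then normalize the exponentiated scores. The data set name score_check and the variable names const, b1 through b4, score, and expscore are illustrative assumptions.

```sas
/* Sketch: posterior probabilities for x1-x4 = 16 27 31 33,          */
/* computed from the coefficients printed in Output 31.4.1.          */
/* All data set and variable names here are hypothetical.            */
data score_check;
   length Crop $10;
   /* one row per crop: constant, then coefficients of x1-x4 */
   input Crop $ const b1-b4;
   /* linear discriminant score for this observation */
   score    = const + b1*16 + b2*27 + b3*31 + b4*33;
   expscore = exp(score);
   datalines;
Clover     -10.98457  0.08907    0.17379  0.11899  0.15637
Corn        -7.72070 -0.04180    0.11970  0.16511  0.16768
Cotton     -11.46537  0.02462    0.17596  0.15880  0.18362
Soybeans    -7.28260  0.0000369  0.15896  0.10622  0.14133
Sugarbeets  -9.80179  0.04245    0.20988  0.06540  0.16408
;

/* divide each exp(score) by the sum over all five crops;            */
/* PROC SQL remerges the summary value with the detail rows          */
proc sql;
   select Crop,
          score format=8.4,
          expscore / sum(expscore) as posterior format=6.4
   from score_check;
quit;
```

Because the prior probabilities are already absorbed into the printed constants, no separate weighting by priors is needed. The resulting posteriors should agree with the values that PROC DISCRIM reports for this observation (0.0894, 0.4054, 0.1763, 0.2392, 0.0897) up to rounding in the printed coefficients.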
Output 31.4.2
Misclassified Observations: Resubstitution
The DISCRIM Procedure
Classification Results for Calibration Data: WORK.CROPS
Resubstitution Results using Linear Discriminant Function
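Posterior Probability of Membership in Crop
xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets
16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897
15 23 30 30 | Corn | Corn | 0.0769 | 0.4558 | 0.1421 | 0.2530 | 0.0722
16 27 27 26 | Corn | Corn | 0.0982 | 0.3422 | 0.1365 | 0.3073 | 0.1157
18 20 25 23 | Corn | Corn | 0.1052 | 0.3634 | 0.1078 | 0.3281 | 0.0955
15 15 31 32 | Corn | Corn | 0.0588 | 0.5754 | 0.1173 | 0.2087 | 0.0398
15 32 32 15 | Corn | Soybeans * | 0.0972 | 0.3278 | 0.1318 | 0.3420 | 0.1011
12 15 16 73 | Corn | Corn | 0.0454 | 0.5238 | 0.1849 | 0.1376 | 0.1083
20 23 23 25 | Soybeans | Soybeans | 0.1330 | 0.2804 | 0.1176 | 0.3305 | 0.1385
24 24 25 32 | Soybeans | Soybeans | 0.1768 | 0.2483 | 0.1586 | 0.2660 | 0.1502
21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570
27 45 24 12 | Soybeans | Sugarbeets * | 0.2357 | 0.0547 | 0.1016 | 0.2721 | 0.3359
12 13 15 42 | Soybeans | Corn * | 0.0549 | 0.4749 | 0.0920 | 0.2768 | 0.1013
22 32 31 43 | Soybeans | Cotton * | 0.1474 | 0.2606 | 0.2624 | 0.1848 | 0.1448
31 32 33 34 | Cotton | Clover * | 0.2815 | 0.1518 | 0.2377 | 0.1767 | 0.1523
29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559
34 32 28 45 | Cotton | Clover * | 0.3125 | 0.1023 | 0.2404 | 0.1357 | 0.2091
26 25 23 24 | Cotton | Soybeans * | 0.2121 | 0.1809 | 0.1245 | 0.3045 | 0.1780
53 48 75 26 | Cotton | Clover * | 0.4837 | 0.0391 | 0.4384 | 0.0223 | 0.0166
34 35 25 78 | Cotton | Cotton | 0.2256 | 0.0794 | 0.3810 | 0.0592 | 0.2548
22 23 25 42 | Sugarbeets | Corn * | 0.1421 | 0.3066 | 0.1901 | 0.2231 | 0.1381
25 25 24 26 | Sugarbeets | Soybeans * | 0.1969 | 0.2050 | 0.1354 | 0.2960 | 0.1667
34 25 16 52 | Sugarbeets | Sugarbeets | 0.2928 | 0.0871 | 0.1665 | 0.1479 | 0.3056
54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845
25 43 32 15 | Sugarbeets | Soybeans * | 0.2258 | 0.1135 | 0.1646 | 0.2770 | 0.2191
26 54  2 54 | Sugarbeets | Sugarbeets | 0.0850 | 0.0081 | 0.0521 | 0.0661 | 0.7887
12 45 32 54 | Clover | Cotton * | 0.0693 | 0.2663 | 0.3394 | 0.1460 | 0.1789
24 58 25 34 | Clover | Sugarbeets * | 0.1647 | 0.0376 | 0.1680 | 0.1452 | 0.4845
87 54 61 21 | Clover | Clover | 0.9328 | 0.0003 | 0.0478 | 0.0025 | 0.0165
51 31 31 16 | Clover | Clover | 0.6642 | 0.0205 | 0.0872 | 0.0959 | 0.1322
96 48 54 62 | Clover | Clover | 0.9215 | 0.0002 | 0.0604 | 0.0007 | 0.0173
31 31 11 11 | Clover | Sugarbeets * | 0.2525 | 0.0402 | 0.0473 | 0.3012 | 0.3588
56 13 13 71 | Clover | Clover | 0.6132 | 0.0212 | 0.1226 | 0.0408 | 0.2023
32 13 27 32 | Clover | Clover | 0.2669 | 0.2616 | 0.1512 | 0.2260 | 0.0943
36 26 54 32 | Clover | Cotton * | 0.2650 | 0.2645 | 0.3495 | 0.0918 | 0.0292
53 08 06 54 | Clover | Clover | 0.5914 | 0.0237 | 0.0676 | 0.0781 | 0.2392
32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206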
* Misclassified observation
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Linear Discriminant Function
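Error Count Estimates for Crop
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total
Rate | 0.4545 | 0.1429 | 0.8333 | 0.5000 | 0.6667 | 0.5000
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |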
Output 31.4.3
Misclassified Observations: Cross Validation
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Linear Discriminant Function
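Error Count Estimates for Crop
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total
Rate | 0.6364 | 0.4286 | 1.0000 | 0.5000 | 0.8333 | 0.6667
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |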
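CROSSVALIDATE reports only the summary error rates. When the individual cross-validated classifications are of interest, PROC DISCRIM also provides the CROSSLISTERR option, which lists the observations misclassified under cross-validation, and the OUTCROSS= option, which saves the cross-validated posterior probabilities and assigned classes. The following statements are a sketch rather than part of the original example; the data set name cvout is hypothetical.

```sas
/* Sketch: same linear discriminant analysis as above, but listing   */
/* the cross-validation misclassifications and saving the            */
/* cross-validated results. The name cvout is an assumption.         */
proc discrim data=crops method=normal pool=yes
             crossvalidate crosslisterr outcross=cvout;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
```

Each observation in the OUTCROSS= data set is classified by a rule fitted without that observation, which is why these results can differ from the resubstitution results produced by the LIST option.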
Next, you can use the calibration information stored in the Cropstat data set to classify a test data set. The TESTLIST option lists the classification results for each observation in the test data set. The following statements produce Output 31.4.4 and Output 31.4.5:
data test;
   input Crop $ 1-10 x1-x4 xvalues $ 11-21;
   datalines;
Corn      16 27 31 33
Soybeans  21 25 23 24
Cotton    29 24 26 28
Sugarbeets54 23 21 54
Clover    32 32 62 16
;

title2 'Classification of Test Data';

proc discrim data=cropstat testdata=test testout=tout testlist;
   class Crop;
   testid xvalues;
   var x1-x4;
run;

proc print data=tout;
   title 'Discriminant Analysis of Remote Sensing Data on Five Crops';
   title2 'Output Classification Results of Test Data';
run;
Output 31.4.4
Classification of Test Data
The DISCRIM Procedure
Classification Results for Test Data: WORK.TEST
Classification Results using Linear Discriminant Function
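Posterior Probability of Membership in Crop
xvalues | From Crop | Classified into Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets
16 27 31 33 | Corn | Corn | 0.0894 | 0.4054 | 0.1763 | 0.2392 | 0.0897
21 25 23 24 | Soybeans | Soybeans | 0.1481 | 0.2431 | 0.1200 | 0.3318 | 0.1570
29 24 26 28 | Cotton | Soybeans * | 0.2521 | 0.1842 | 0.1529 | 0.2549 | 0.1559
54 23 21 54 | Sugarbeets | Clover * | 0.6215 | 0.0194 | 0.1250 | 0.0496 | 0.1845
32 32 62 16 | Clover | Cotton * | 0.2163 | 0.3180 | 0.3327 | 0.1125 | 0.0206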
* Misclassified observation
The DISCRIM Procedure
Classification Summary for Test Data: WORK.TEST
Classification Summary using Linear Discriminant Function
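Error Count Estimates for Crop
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total
Rate | 1.0000 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.6389
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |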
Output 31.4.5
Output Data Set of the Classification Results for Test Data
Crop | x1 | x2 | x3 | x4 | xvalues | Clover | Corn | Cotton | Soybeans | Sugarbeets | _INTO_
Corn | 16 | 27 | 31 | 33 | 16 27 31 33 | 0.08935 | 0.40543 | 0.17632 | 0.23918 | 0.08972 | Corn
Soybeans | 21 | 25 | 23 | 24 | 21 25 23 24 | 0.14811 | 0.24308 | 0.11999 | 0.33184 | 0.15698 | Soybeans
Cotton | 29 | 24 | 26 | 28 | 29 24 26 28 | 0.25213 | 0.18420 | 0.15294 | 0.25486 | 0.15588 | Soybeans
Sugarbeets | 54 | 23 | 21 | 54 | 54 23 21 54 | 0.62150 | 0.01937 | 0.12498 | 0.04962 | 0.18452 | Clover
Clover | 32 | 32 | 62 | 16 | 32 32 62 16 | 0.21633 | 0.31799 | 0.33266 | 0.11246 | 0.02056 | Cotton
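The TESTOUT= data set can be summarized further with other procedures. As a small sketch that is not part of the original example, the following statements cross-tabulate the actual crop against the assigned crop, which PROC DISCRIM stores in the variable _INTO_.

```sas
/* Sketch: actual crop (rows) versus assigned crop (columns)         */
/* for the test observations in the TESTOUT= data set tout.          */
proc freq data=tout;
   tables Crop*_INTO_ / norow nocol nopercent;
run;
```

For the five test observations shown in Output 31.4.5, only Corn and Soybeans fall on the diagonal of this table; Cotton is assigned to Soybeans, Sugarbeets to Clover, and Clover to Cotton.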
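In the next analysis, PROC DISCRIM again uses normal-theory methods (METHOD=NORMAL) but assumes unequal within-class covariance matrices (POOL=NO) for the remote-sensing data, which yields a quadratic discriminant function. The PRIORS statement, PRIORS PROP, sets the prior probabilities proportional to the sample sizes. The CROSSVALIDATE option displays cross-validation error-rate estimates. Note that the total error count estimate by cross-validation (0.5556) is much larger than the total error count estimate by resubstitution (0.1111). Resubstitution evaluates the classification rule on the same observations that were used to fit it, so its estimate is optimistically biased; the bias is pronounced here because a separate covariance matrix is estimated for each crop from only 6 to 11 observations. The following statements produce Output 31.4.6: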
title2 'Using Quadratic Discriminant Function';

proc discrim data=crops method=normal pool=no crossvalidate;
   class Crop;
   priors prop;
   id xvalues;
   var x1-x4;
run;
Output 31.4.6
Quadratic Discriminant Function on Crop Data
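Class Level Information
Crop | Frequency | Weight | Proportion | Prior Probability
Clover | 11 | 11.0000 | 0.305556 | 0.305556
Corn | 7 | 7.0000 | 0.194444 | 0.194444
Cotton | 6 | 6.0000 | 0.166667 | 0.166667
Soybeans | 6 | 6.0000 | 0.166667 | 0.166667
Sugarbeets | 6 | 6.0000 | 0.166667 | 0.166667
Within Covariance Matrix Information
Crop | Covariance Matrix Rank | Natural Log of the Determinant of the Covariance Matrix
Clover | 4 | 23.64618
Corn | 4 | 11.13472
Cotton | 4 | 13.23569
Soybeans | 4 | 12.45263
Sugarbeets | 4 | 17.76293
The DISCRIM Procedure
Generalized Squared Distance to Crop
From Crop | Clover | Corn | Cotton | Soybeans | Sugarbeets
Clover | 26.01743 | 1320 | 104.18297 | 194.10546 | 31.40816
Corn | 27.73809 | 14.40994 | 150.50763 | 38.36252 | 25.55421
Cotton | 26.38544 | 588.86232 | 16.81921 | 52.03266 | 37.15560
Soybeans | 27.07134 | 46.42131 | 41.01631 | 16.03615 | 23.15920
Sugarbeets | 26.80188 | 332.11563 | 43.98280 | 107.95676 | 21.34645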
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Resubstitution Summary using Quadratic Discriminant Function
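Error Count Estimates for Crop
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total
Rate | 0.1818 | 0.0000 | 0.0000 | 0.0000 | 0.3333 | 0.1111
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |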
The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.CROPS
Cross-validation Summary using Quadratic Discriminant Function
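Error Count Estimates for Crop
 | Clover | Corn | Cotton | Soybeans | Sugarbeets | Total
Rate | 0.1818 | 0.7143 | 0.6667 | 0.6667 | 0.8333 | 0.5556
Priors | 0.3056 | 0.1944 | 0.1667 | 0.1667 | 0.1667 |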
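The total error count estimates quoted for the quadratic discriminant function are the per-crop error rates weighted by the prior probabilities. The following DATA step is a minimal sketch of that arithmetic, not part of the original example; the data set name error_check and the variable names prior, resub, crossval, w_resub, and w_cross are illustrative assumptions, and the input values are copied from Output 31.4.6.

```sas
/* Sketch: total error rate = sum over crops of prior * error rate.  */
/* Priors and rates are typed in from Output 31.4.6.                 */
data error_check;
   input Crop :$10. prior resub crossval;
   /* sum statements keep running totals across the rows */
   w_resub + prior*resub;
   w_cross + prior*crossval;
   datalines;
Clover     0.305556 0.1818 0.1818
Corn       0.194444 0.0000 0.7143
Cotton     0.166667 0.0000 0.6667
Soybeans   0.166667 0.0000 0.6667
Sugarbeets 0.166667 0.3333 0.8333
;

/* the last row carries the totals over all five crops */
proc print data=error_check noobs;
   var Crop prior resub crossval w_resub w_cross;
   format w_resub w_cross 6.4;
run;
```

In the last row, w_resub and w_cross should be approximately 0.1111 and 0.5556, matching the Total column of the resubstitution and cross-validation summaries.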