Now that you have chosen a two-factor PLS model for predicting pollutant concentrations from sample spectra, suppose that you have two new samples. The following SAS statements create a data set that contains the spectra for the new samples:
data newobs;
   input obsnam $ v1-v27 @@;
   datalines;
EM17   3933 4518 5637 6006 5721 5187 4641 4149 3789
       3579 3447 3381 3327 3234 3078 2832 2571 2274
       2040 1818 1629 1470 1350 1245 1134 1050  987
EM25   2904 2997 3255 3150 2922 2778 2700 2646 2571
       2487 2370 2250 2127 2052 1713 1419 1200  984
        795  648  525  426  351  291  240  204  162
;
You can apply the PLS model to these samples to estimate pollutant concentration by appending the new samples to the original 16 and specifying that the predicted values for all 18 be output to a data set, as shown in the following statements:
data all;
   set sample newobs;
run;

proc hppls data=all nfac=2;
   model ls ha dt = v1-v27;
   partition roleVar = Role(train='TRAIN' test='TEST');
   output out=result pred=p;
   id obsnam;
run;

proc print data=result;
   where (obsnam in ('EM17','EM25'));
   var obsnam p_ls p_ha p_dt;
run;
The ID statement lists the variable obsnam from the input data set that is transferred to the output data set. The new observations are not used in calculating the PLS model because they have no response values. Their predicted concentrations are shown in Figure 57.10.
Figure 57.10: Predicted Concentrations for New Observations
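As a quick check on the point that the new samples carry no response values, the following sketch counts the nonmissing and missing values of ls, ha, and dt for the two new observations in the combined data set all. It is an illustrative addition rather than part of the original example, and it assumes that newobs contains no ls, ha, or dt variables, so that the SET statement leaves them missing for EM17 and EM25.

```sas
/* Sketch (not part of the original example): confirm that the   */
/* appended samples have missing responses. Assumes the data set */
/* all was created by the preceding DATA step.                   */
proc means data=all n nmiss;
   where obsnam in ('EM17','EM25');   /* only the two new samples */
   var ls ha dt;                      /* the three responses      */
run;
```

Under that assumption, the N column should be 0 and the NMiss column should be 2 for each response, which is consistent with the new samples contributing nothing to the fit while still receiving predicted values.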