BY Statement
A BY statement is used with the FIT statement to obtain separate estimates for observations in groups defined by the BY variables. Note that if an output model file is written with the OUTMODEL= option, the parameter values stored in it are those from the last BY group processed. To save the parameter estimates for every BY group, use the OUTEST= option in the FIT statement.
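As a minimal sketch of the difference, assuming the data set A created in the example later in this section, the following step writes both kinds of output. The names LASTGRP and EST_ALL are hypothetical.

```sas
/* Hypothetical names: LASTGRP (output model file), EST_ALL (estimates) */
proc model data=a outmodel=lastgrp;
   by group;
   y = a0 + a1*x;
   fit y / outest=est_all;   /* one observation of A0 and A1 per BY group */
run;

/* EST_ALL keeps every group's estimates; LASTGRP holds only the
   parameter values from the last BY group processed */
proc print data=est_all; run;
```

EST_ALL should have the same layout as data set B1 in Figure 19.14, with one row per group.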
A BY statement is used with the SOLVE statement to obtain solutions for observations in groups defined by the BY variables. If the BY variables are identical in the DATA= data set and the ESTDATA= data set, then the two data sets are synchronized and the calculations are performed by using the data and parameters for each BY group. This holds for BY variables in the SDATA= data set as well. If the BY variables do not match, BY group processing is abandoned in either the ESTDATA= data set or the SDATA= data set, whichever has the missing BY value. If the DATA= data set does not contain BY variables and the ESTDATA= data set or the SDATA= data set does, then BY processing is performed for the ESTDATA= data set and the SDATA= data set by reusing the data in the DATA= data set for each BY group.
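The last rule above can be sketched as follows, assuming the parameter data set B1 created by the first example in this section. DNOBY and CNOBY are hypothetical names. DNOBY has no GROUP variable, so, under that rule, its single observation should be reused for each of the three BY groups in B1.

```sas
/* Hypothetical one-observation input set with no BY variable */
data dnoby;
   x = 1;
run;

/* B1 contains GROUP; the BY statement after SOLVE requests BY
   processing, and DNOBY is reused for every BY group of B1 */
proc model;
   y = a0 + a1*x;
   solve y / data=dnoby estdata=b1 out=cnoby;
   by group;
run;

proc print data=cnoby; run;
```

With x = 1, each predicted y is simply a0 + a1 for its group, which makes it easy to see which group's parameters were used.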
If both the FIT and SOLVE tasks require BY group processing, two separate BY statements are needed. If the parameters for each BY group in the OUTEST= data set produced by the FIT task are to be used for the corresponding BY group in the SOLVE task, one of the two BY statements must appear after the SOLVE statement.
The following linear regression example illustrates the use of BY group processing. The data sets A and D, used for fitting and solving respectively, each contain three groups.

    /*------ data set for fit task ------*/
    data a;
       do group = 1 to 3;
          do i = 1 to 100;
             x = normal(1);
             y = 2 + 3*x + rannor(1);
             output;
          end;
       end;
    run;

    /*------ data set for solve task ------*/
    data d;
       do group = 1 to 3;
          x = normal(1);
          output;
       end;
    run;


    /*------ 2 BY statements, one of which appears after the SOLVE statement ------*/
    proc model data=a;
       by group;
       y = a0 + a1*x;
       fit y / outest=b1;
       solve y / data=d estdata=b1 out=c1;
       by group;
    run;

    proc print data=b1; run;
    proc print data=c1; run;

The parameter estimates obtained for each BY group by the FIT statement, shown in Figure 19.14, are used for the corresponding BY group in the SOLVE statement. The output data set C1 is shown in Figure 19.15.

Figure 19.14: Data Set B1

Obs | group | _NAME_ | _TYPE_ | _STATUS_ | _NUSED_ | a0 | a1 |
---|---|---|---|---|---|---|---|
1 | 1 | | OLS | 0 Converged | 100 | 2.00338 | 3.00298 |
2 | 2 | | OLS | 0 Converged | 100 | 2.05091 | 3.08808 |
3 | 3 | | OLS | 0 Converged | 100 | 2.15528 | 3.04290 |

Figure 19.15: Data Set C1

Obs | group | _TYPE_ | _MODE_ | _ERRORS_ | y | x |
---|---|---|---|---|---|---|
1 | 1 | PREDICT | SIMULATE | 0 | 7.42322 | 1.80482 |
2 | 2 | PREDICT | SIMULATE | 0 | 1.80413 | -0.07992 |
3 | 3 | PREDICT | SIMULATE | 0 | 3.36202 | 0.39658 |
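The group-by-group pairing described above can be checked by hand with a DATA step merge. This is a sketch that assumes B1 and D as created above; CHECK1 and YHAT are hypothetical names. Because the model is linear and explicit, YHAT should reproduce the y column of C1.

```sas
/* Pair each group's estimates from B1 with the same group's
   observation in D; both data sets are already in GROUP order */
data check1;
   merge b1(keep=group a0 a1) d;
   by group;
   yhat = a0 + a1*x;   /* y = a0 + a1*x, evaluated per group */
run;

proc print data=check1;
   var group a0 a1 x yhat;
run;
```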

    /*------ 1 BY statement that appears before the SOLVE statement ------*/
    proc model data=a;
       by group;
       y = a0 + a1*x;
       fit y / outest=b2;
       solve y / data=d estdata=b2 out=c2;
    run;

    proc print data=b2; run;
    proc print data=c2; run;

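The parameter estimates are shown in Figure 19.16, and the output data set of the SOLVE statement is shown in Figure 19.17. The estimates in B2 are identical to those in B1 because the FIT task is still processed by group. The SOLVE task, however, is not: no BY statement follows the SOLVE statement, so C2 has no GROUP variable, and every observation is solved with the parameter values from the last BY group processed (group 3). Hence only the third observation of C2 matches the corresponding observation of C1; the first two do not.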
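The arithmetic for the first observation (x = 1.80482) makes the difference concrete, using the model y = a0 + a1 x and the estimates printed in Figures 19.14 and 19.16:

$$
\begin{aligned}
\text{C1 (group 1 estimates):}\quad \hat{y} &= 2.00338 + 3.00298 \times 1.80482 \approx 7.42322 \\
\text{C2 (group 3 estimates):}\quad \hat{y} &= 2.15528 + 3.04290 \times 1.80482 \approx 7.64717
\end{aligned}
$$

Both values agree with the first rows of C1 and C2, confirming that C2 applies the group 3 estimates throughout.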

Figure 19.16: Data Set B2

Obs | group | _NAME_ | _TYPE_ | _STATUS_ | _NUSED_ | a0 | a1 |
---|---|---|---|---|---|---|---|
1 | 1 | | OLS | 0 Converged | 100 | 2.00338 | 3.00298 |
2 | 2 | | OLS | 0 Converged | 100 | 2.05091 | 3.08808 |
3 | 3 | | OLS | 0 Converged | 100 | 2.15528 | 3.04290 |

Figure 19.17: Data Set C2

Obs | _TYPE_ | _MODE_ | _ERRORS_ | y | x |
---|---|---|---|---|---|
1 | PREDICT | SIMULATE | 0 | 7.64717 | 1.80482 |
2 | PREDICT | SIMULATE | 0 | 1.91211 | -0.07992 |
3 | PREDICT | SIMULATE | 0 | 3.36202 | 0.39658 |

    /*------ 1 BY statement that appears after the SOLVE statement ------*/
    proc model data=a;
       y = a0 + a1*x;
       fit y / outest=b3;
       solve y / data=d estdata=b3 out=c3;
       by group;
    run;

    proc print data=b3; run;
    proc print data=c3; run;

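The output data sets B3 and C3 are listed in Figure 19.18 and Figure 19.19, respectively. Because the only BY statement follows the SOLVE statement, the FIT task is not processed by group: a single set of estimates is computed from all 300 observations of A. The SOLVE task is processed by group, and these pooled estimates are applied to each group in C3.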

Figure 19.18: Data Set B3

Obs | _NAME_ | _TYPE_ | _STATUS_ | _NUSED_ | a0 | a1 |
---|---|---|---|---|---|---|
1 | | OLS | 0 Converged | 300 | 2.06624 | 3.04219 |

Figure 19.19: Data Set C3

Obs | group | _TYPE_ | _MODE_ | _ERRORS_ | y | x |
---|---|---|---|---|---|---|
1 | 1 | PREDICT | SIMULATE | 0 | 7.55686 | 1.80482 |
2 | 2 | PREDICT | SIMULATE | 0 | 1.82312 | -0.07992 |
3 | 3 | PREDICT | SIMULATE | 0 | 3.27270 | 0.39658 |
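For comparison, the following sketch (again with hypothetical names CHECK3 and YHAT) attaches the single pooled row of B3 to every observation of D. YHAT should reproduce the y column of C3 in Figure 19.19, apart from last-digit differences if the rounded printed estimates are used instead of the stored ones.

```sas
/* B3 has one pooled row and no GROUP variable; reading it once at
   _N_ = 1 retains A0 and A1 for every observation of D */
data check3;
   if _n_ = 1 then set b3(keep=a0 a1);
   set d;
   yhat = a0 + a1*x;   /* same pooled a0, a1 for every group */
run;

proc print data=check3;
   var group a0 a1 x yhat;
run;
```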