The MODEL Procedure


BY Statement

  • BY variables;

A BY statement is used with the FIT statement to obtain separate estimates for observations in groups defined by the BY variables. If an output model file is written using the OUTMODEL= option, the parameter values that are stored are those from the last BY group processed. To save parameter estimates for each BY group, use the OUTEST= option in the FIT statement.

A BY statement is used with the SOLVE statement to obtain solutions for observations in groups defined by the BY variables. If the BY variables in the DATA= data set and the ESTDATA= data set are identical, then the two data sets are synchronized and the calculations are performed by using the data and parameters for each BY group. This holds for BY variables in the SDATA= data set as well. If the BY variables do not match, BY-group processing is abandoned in either the ESTDATA= data set or the SDATA= data set, whichever has the missing BY value. If the DATA= data set does not contain BY variables and the ESTDATA= data set or the SDATA= data set does, then BY-group processing is performed for the ESTDATA= data set and the SDATA= data set by reusing the data in the DATA= data set for each BY group.
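The last case above, in which the DATA= data set has no BY variables but the ESTDATA= data set does, can be sketched as follows. This is a minimal illustration that has not been run here. The data set names FIT_IN, SOLVE_IN, EST, and SOL are hypothetical, and the explicit PARMS statement is an addition for clarity. The sketch uses two BY statements, with the second placed after the SOLVE statement so that BY-group processing applies to the SOLVE task.

```sas
/* hypothetical input for the FIT task: three BY groups */
data fit_in ;
   do group = 1 to 3 ;
      do i = 1 to 100 ;
         x = rannor(1) ;
         y = 2 + 3*x + rannor(1) ;
         output ;
      end ;
   end ;
run ;

/* hypothetical input for the SOLVE task: no BY variable */
data solve_in ;
   x = 1 ;
   output ;
run ;

proc model data = fit_in ;
   by group ;                 /* applies to the FIT task */
   parms a0 a1 ;              /* explicit declaration (an assumption of this sketch) */
   y = a0 + a1*x ;
   fit y / outest = est ;     /* EST holds one set of estimates per group */
   solve y / data = solve_in estdata = est out = sol ;
   by group ;                 /* applies to the SOLVE task */
run ;

proc print data = sol ; run ;
```

According to the description above, the single observation in SOLVE_IN should be reused for each BY group found in EST, so SOL would be expected to hold one solution per group.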

If both the FIT and SOLVE tasks require BY-group processing, then two separate BY statements are needed. If the parameters for each BY group in the OUTEST= data set obtained from the FIT task are to be used for the corresponding BY group in the SOLVE task, then one of the two BY statements must appear after the SOLVE statement.

The following linear regression example illustrates the use of BY-group processing. Data sets A and D, which are used for fitting and solving, respectively, each have three groups.

/*------ data set for fit task------ */
data a ;
   do group = 1 to 3 ;
      do i = 1 to 100 ;
         x = normal(1);
         y = 2 + 3*x + rannor(1) ;
         output ;
      end ;
   end ;
run ;

/*------ data set for solve task------ */
data d ;
   do group = 1 to 3 ;
      x = normal(1) ;
      output ;
   end ;
run ;

/*------ 2 BY statements, one of which appears after the SOLVE statement ------*/
proc model data = a ;
   by group ;
   y = a0 + a1*x ;
   fit y / outest = b1 ;
   solve y / data = d estdata = b1 out = c1 ;
   by group ;
run;

proc print data = b1 ; run;
proc print data = c1 ; run;

The parameter estimates obtained for each BY group in the FIT task, shown in Figure 26.15, are used for the corresponding BY group in the SOLVE task. The output data set is shown in Figure 26.16.

Figure 26.15: Listing of OUTEST= Data Set Created in the FIT Statement with Two BY Statements

Obs  group  _NAME_  _TYPE_  _STATUS_     _NUSED_       a0       a1
  1      1          OLS     0 Converged      100  2.00338  3.00298
  2      2          OLS     0 Converged      100  2.05091  3.08808
  3      3          OLS     0 Converged      100  2.15528  3.04290



Figure 26.16: Listing of OUT= Data Set Created in the SOLVE Statement with Two BY Statements

Obs  group  _TYPE_   _MODE_    _ERRORS_        y         x
  1      1  PREDICT  SIMULATE         0  7.42322   1.80482
  2      2  PREDICT  SIMULATE         0  1.80413  -0.07992
  3      3  PREDICT  SIMULATE         0  3.36202   0.39658



If only one BY statement is used and it appears before the SOLVE statement, then the parameters for the last BY group in the OUTEST= data set are used for all BY groups in the SOLVE task.

/*------ 1 BY statement that appears before SOLVE statement------ */
proc model data = a ;
   by group ;
   y = a0 + a1*x ;
   fit y / outest = b2 ;
   solve y / data = d estdata = b2 out = c2 ;
run;

proc print data = b2 ; run;
proc print data = c2 ; run;

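The parameter estimates are shown in Figure 26.17, and the output data set of the SOLVE statement is shown in Figure 26.18. The estimates in B2 are identical to those in B1. Because the estimates for the last BY group (group 3) are applied to every observation in the SOLVE task, only the predicted value in the last observation (3.36202) is the same in data sets C1 and C2; the predicted values in the first two observations differ. Data set C2 also does not contain the BY variable, because BY-group processing is not applied to the SOLVE task.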

Figure 26.17: Listing of OUTEST= Data Set Created in the FIT Statement with One BY Statement That Appears before the SOLVE Statement

Obs  group  _NAME_  _TYPE_  _STATUS_     _NUSED_       a0       a1
  1      1          OLS     0 Converged      100  2.00338  3.00298
  2      2          OLS     0 Converged      100  2.05091  3.08808
  3      3          OLS     0 Converged      100  2.15528  3.04290



Figure 26.18: Listing of OUT= Data Set Created in the SOLVE Statement with One BY Statement That Appears before the SOLVE Statement

Obs  _TYPE_   _MODE_    _ERRORS_        y         x
  1  PREDICT  SIMULATE         0  7.64717   1.80482
  2  PREDICT  SIMULATE         0  1.91211  -0.07992
  3  PREDICT  SIMULATE         0  3.36202   0.39658
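
The values in Figure 26.18 can be checked by hand, because the model is simply y = a0 + a1*x evaluated with the group 3 estimates from Figure 26.17. The following DATA step is a minimal sketch of that check. The data set name CHECK is hypothetical, and the estimates and x values are typed in from the printed figures.

```sas
/* recompute y with the group 3 estimates for all three x values */
data check ;
   a0 = 2.15528 ;   /* group 3 intercept from Figure 26.17 */
   a1 = 3.04290 ;   /* group 3 slope from Figure 26.17     */
   do x = 1.80482, -0.07992, 0.39658 ;
      y = a0 + a1*x ;
      output ;
   end ;
run ;

proc print data = check ;
   format y 8.5 ;
run ;
```

Because the printed estimates are rounded to five decimal places, the recomputed values should agree with the y column of Figure 26.18 only up to small rounding differences in the last digits.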



If only one BY statement is used and it appears after the SOLVE statement, then BY-group processing does not apply to the FIT task. In this case, the OUTEST= data set does not contain the BY variable, and the single set of parameter estimates obtained from the FIT task is used for all BY groups in the SOLVE task.

/*------ 1 BY statement that appears after SOLVE statement------*/
proc model data = a ;
   y = a0 + a1*x ;
   fit y / outest = b3 ;
   solve y / data = d estdata = b3 out = c3 ;
   by group ;
run;

proc print data = b3 ; run;
proc print data = c3 ; run;

The output data sets B3 and C3 are listed in Figure 26.19 and Figure 26.20, respectively.

Figure 26.19: Listing of OUTEST= Data Set Created in the FIT Statement with One BY Statement That Appears after the SOLVE Statement

Obs  _NAME_  _TYPE_  _STATUS_     _NUSED_       a0       a1
  1          OLS     0 Converged      300  2.06624  3.04219



Figure 26.20: Listing of OUT= Data Set Created in the SOLVE Statement with One BY Statement That Appears after the SOLVE Statement

Obs  group  _TYPE_   _MODE_    _ERRORS_        y         x
  1      1  PREDICT  SIMULATE         0  7.55686   1.80482
  2      2  PREDICT  SIMULATE         0  1.82312  -0.07992
  3      3  PREDICT  SIMULATE         0  3.27270   0.39658
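
To see the effect of the three BY-statement placements side by side, the three OUT= data sets can be combined into one listing. The following is a minimal sketch. The data set name COMPARE and the renamed variables are hypothetical. It uses a one-to-one merge without a BY statement, which assumes that C1, C2, and C3 hold their observations in the same order, as they do in the figures above.

```sas
/* one-to-one merge of the three SOLVE outputs, matched by observation */
data compare ;
   merge c1 (rename = (y = y_two_by))       /* BY before and after SOLVE */
         c2 (rename = (y = y_by_before))    /* BY before SOLVE only      */
         c3 (rename = (y = y_by_after)) ;   /* BY after SOLVE only       */
   keep group x y_two_by y_by_before y_by_after ;
run ;

proc print data = compare ; run ;
```

In the resulting listing, y_two_by uses the estimates for each observation's own group, y_by_before uses the group 3 estimates throughout, and y_by_after uses the single pooled set of estimates.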