The size of a model that PROC MIXED can fit depends on several factors, including the amount of memory and disk space available on your computer, the number of levels in the hierarchy, and the number of levels within each level of the hierarchy.
By taking advantage of the nested structure of the data, PROC MIXED can estimate fairly large models in a reasonable amount of time. For example, the following statements simulate data on students, with students nested within schools and schools nested within school districts.
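data test;
   retain uschool 0;                          /* unique (non-nested) school ID */
   do district=1 to 100;
      rd=rannor(123)*2;                       /* district random effect */
      do school=1 to ceil(ranuni(123)*5)+3;   /* 4 to 8 schools per district */
         uschool+1;
         rsd=rannor(123);                     /* school random effect */
         do student=1 to ceil(ranuni(123)*10)+20;
            x1=rannor(123);
            y=rd + rsd + x1 + rannor(123);
            output;
         end;
      end;
   end;
run;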
The generated data have 100 school districts, with between 4 and 8 schools within each district for a total of 610 schools, and between 21 and 30 students within each school for a total of 15,541 observations. Two variables represent the school level: SCHOOL, numbered from 1 to the number of schools within each district, and USCHOOL, a non-nested school number that uniquely identifies the schools from 1 to 610.
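If you want to confirm the structure described above on your own system, a short PROC SQL query such as the following reports the number of districts, distinct schools, and observations. This is a sketch that assumes the TEST data set from the preceding DATA step is in the WORK library; the output column names N_DISTRICTS, N_SCHOOLS, and N_OBS are illustrative only. The counts come from the RANUNI and RANNOR streams seeded with 123, so they reproduce the values above only when the DATA step is run exactly as shown.

```sas
/* Summarize the simulated TEST data set: number of districts, */
/* number of distinct schools (USCHOOL), and number of rows.   */
proc sql;
   select count(distinct district) as n_districts,
          count(distinct uschool)  as n_schools,
          count(*)                 as n_obs
   from test;
quit;
```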
The following statements fit the hierarchical linear model:
proc mixed data=test;
   class district uschool;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=uschool;   /* no factor in common with DISTRICT */
run;
Note that the SUBJECT= effects in the two RANDOM statements have no common factor. As a result, the model is not processed by subjects, and PROC MIXED processes the data as a single block. This reduces the efficiency of PROC MIXED and increases the time needed to estimate the model.
If you make a slight change to the program, you can fit the same model in less time. In the following statements, since DISTRICT is involved in the SUBJECT= effects in both RANDOM statements, PROC MIXED can block the data on DISTRICT. Processing by subjects allows for more efficient use of resources and less computing time.
proc mixed data=test;
   class district uschool;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=uschool(district);   /* DISTRICT is common to both */
run;
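One way to check whether PROC MIXED is processing the data by subjects is to look at its Dimensions table (ODS table name Dimensions), which reports the number of subjects and the maximum number of observations per subject. The following sketch prints only that table for the two specifications shown above. Based on the explanation above, you would expect the first step to report a single subject containing all observations and the second to report one subject per district; treat this as an expectation to verify, not a guaranteed display. Both steps still fit the full model.

```sas
/* Show only the Dimensions table for each specification. */
/* ODS SELECT applies to the next procedure step only.    */
ods select Dimensions;
proc mixed data=test;
   class district uschool;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=uschool;            /* expected: one block */
run;

ods select Dimensions;
proc mixed data=test;
   class district uschool;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=uschool(district);  /* expected: blocks by DISTRICT */
run;
```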
A further improvement can be made to the coding of this model. The USCHOOL variable is not coded as nested within DISTRICT; instead, each school has its own unique value of USCHOOL. By identifying schools in a nested fashion within districts, PROC MIXED can handle models with many more levels within each level of the hierarchy. The SCHOOL variable is nested within each level of DISTRICT: for instance, district 1 contains schools 1, 2, 3, and 4, district 2 contains schools 1 through 7, and so on. Therefore, a school is uniquely identified by the combination of its DISTRICT and SCHOOL values. Note that school 1 within district 1 is different from school 1 within district 2.
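To see that the combination of DISTRICT and SCHOOL identifies the same schools as USCHOOL, you can count the distinct (DISTRICT, SCHOOL) pairs and compare that count with the number of distinct USCHOOL values. This sketch assumes the TEST data set created above; the column names N_DISTRICT_SCHOOL and N_USCHOOL are illustrative. If the nested coding identifies each school uniquely, the two counts are equal.

```sas
/* Count distinct (DISTRICT, SCHOOL) pairs, then distinct USCHOOL values. */
proc sql;
   select count(*) as n_district_school
   from (select distinct district, school from test);

   select count(distinct uschool) as n_uschool
   from test;
quit;
```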
This nested representation of schools is used in the following PROC MIXED step:
proc mixed data=test;
   class district school;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=school(district);   /* nested school effect */
run;
The results of this model agree with those of the previous two models, but this model converges more quickly. The step above replaces a variable with 610 levels (USCHOOL) in the CLASS statement with a variable (SCHOOL) that has only 8 levels. This change can save a dramatic amount of memory when setting up the model, making it possible to estimate models that would otherwise require too much memory.
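Before choosing which variables to list in the CLASS statement, it can help to check how many levels each candidate has. A minimal sketch using the NLEVELS option of PROC FREQ follows; NOPRINT in the TABLES statement suppresses the frequency tables, so only the Number of Variable Levels table is displayed. It assumes the TEST data set from the first DATA step; according to the counts above, you would expect 100 levels for DISTRICT, 8 for SCHOOL, and 610 for USCHOOL.

```sas
/* Report the number of distinct levels of each potential CLASS variable */
/* without printing the full one-way frequency tables.                   */
proc freq data=test nlevels;
   tables district school uschool / noprint;
run;
```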
The following statements simulate data from 500 districts, 12,790 schools, and 326,053 students.
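data test;
   retain uschool 0;                           /* unique (non-nested) school ID */
   do district=1 to 500;
      rd=rannor(123)*2;                        /* district random effect */
      do school=1 to ceil(ranuni(123)*10)+20;  /* 21 to 30 schools per district */
         uschool+1;
         rsd=rannor(123);                      /* school random effect */
         do student=1 to ceil(ranuni(123)*10)+20;
            x1=rannor(123);
            y=rd + rsd + x1 + rannor(123);
            output;
         end;
      end;
   end;
run;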
If you fit the model using USCHOOL (the variable with a unique value for each school), the model estimation takes several minutes to complete:
proc mixed data=test;
   class district uschool;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=uschool(district);
run;
By using the nested effect of SCHOOL within DISTRICT, model estimation takes just a few seconds:
proc mixed data=test;
   class district school;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=school(district);
run;
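Elapsed times depend heavily on the hardware, so you may want to measure the two specifications on your own system. The following is a minimal sketch that uses the FULLSTIMER system option, which writes extended resource statistics (including memory usage) to the SAS log for each step, and the DATETIME function to record wall-clock time around the step. The macro variable names T0 and T1 are arbitrary. To time the other specification, replace SCHOOL with USCHOOL in the CLASS statement and in the SUBJECT= effect.

```sas
options fullstimer;   /* detailed time and memory statistics in the log */

%let t0=%sysfunc(datetime(), 20.3);   /* start time, in seconds */
proc mixed data=test;
   class district school;
   model y=x1 / ddfm=bw;
   random int / subject=district;
   random int / subject=school(district);
run;
%let t1=%sysfunc(datetime(), 20.3);   /* end time, in seconds */

%put NOTE: PROC MIXED elapsed seconds = %sysevalf(&t1 - &t0);
```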
Processing the extra levels of USCHOOL as a CLASS effect significantly lengthens the time needed to fit the model. By minimizing the number of CLASS levels, performance is considerably improved.
If the data set were larger, with 1000 districts and millions of students, attempting to fit the model with USCHOOL in the CLASS statement would either produce an out-of-memory message in the SAS log or take an unacceptably long time. With the nested school effect, SCHOOL(DISTRICT), the model can still be estimated in an acceptable amount of time.
___________
In these examples, the DDFM=BW option is used, which gives a between- and within-subject breakdown of the degrees of freedom. In many real-world cases, you might want to use DDFM=SATTERTH or DDFM=KR to obtain degrees of freedom that are more appropriate for your analysis. Both of these methods require more time and memory than DDFM=BW, so for very large data sets you might need to switch back to DDFM=BW, or even DDFM=RESIDUAL, to get acceptable performance.
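For example, switching the nested model to Kenward-Roger degrees of freedom changes only the MODEL statement option, as in the sketch below. It assumes the TEST data set from the preceding DATA step; on a data set of this size, expect the step to need noticeably more time and memory than the DDFM=BW version, and substitute DDFM=SATTERTH, DDFM=BW, or DDFM=RESIDUAL as your analysis and resources allow.

```sas
/* Nested school model with Kenward-Roger degrees of freedom. */
/* Only the DDFM= option differs from the DDFM=BW steps.      */
proc mixed data=test;
   class district school;
   model y=x1 / ddfm=kr;
   random int / subject=district;
   random int / subject=school(district);
run;
```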
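Product Family | Product | System | SAS Release Reported | SAS Release Fixed*
SAS System | SAS/STAT | z/OS | |
SAS System | SAS/STAT | OpenVMS VAX | |
SAS System | SAS/STAT | Microsoft® Windows® for 64-Bit Itanium-based Systems | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Datacenter 64-bit Edition | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Enterprise 64-bit Edition | |
SAS System | SAS/STAT | Microsoft Windows XP 64-bit Edition | |
SAS System | SAS/STAT | Microsoft® Windows® for x64 | |
SAS System | SAS/STAT | OS/2 | |
SAS System | SAS/STAT | Microsoft Windows 95/98 | |
SAS System | SAS/STAT | Microsoft Windows 2000 Advanced Server | |
SAS System | SAS/STAT | Microsoft Windows 2000 Datacenter Server | |
SAS System | SAS/STAT | Microsoft Windows 2000 Server | |
SAS System | SAS/STAT | Microsoft Windows 2000 Professional | |
SAS System | SAS/STAT | Microsoft Windows NT Workstation | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Datacenter Edition | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Enterprise Edition | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Standard Edition | |
SAS System | SAS/STAT | Microsoft Windows Server 2008 | |
SAS System | SAS/STAT | Microsoft Windows XP Professional | |
SAS System | SAS/STAT | Windows Millennium Edition (Me) | |
SAS System | SAS/STAT | Windows Vista | |
SAS System | SAS/STAT | 64-bit Enabled AIX | |
SAS System | SAS/STAT | 64-bit Enabled HP-UX | |
SAS System | SAS/STAT | 64-bit Enabled Solaris | |
SAS System | SAS/STAT | ABI+ for Intel Architecture | |
SAS System | SAS/STAT | AIX | |
SAS System | SAS/STAT | HP-UX | |
SAS System | SAS/STAT | HP-UX IPF | |
SAS System | SAS/STAT | IRIX | |
SAS System | SAS/STAT | Linux | |
SAS System | SAS/STAT | Linux for x64 | |
SAS System | SAS/STAT | Linux on Itanium | |
SAS System | SAS/STAT | OpenVMS Alpha | |
SAS System | SAS/STAT | OpenVMS on HP Integrity | |
SAS System | SAS/STAT | Solaris | |
SAS System | SAS/STAT | Solaris for x64 | |
SAS System | SAS/STAT | Tru64 UNIX | |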
Type: | Usage Note
Topic: | Analytics ==> Mixed Models; SAS Reference ==> Procedures ==> MIXED
Date Modified: | 2009-09-08 15:03:16
Date Created: | 2009-09-02 02:36:36