The GLIMMIX Procedure

Example 45.19 Quadrature Method for Multilevel Model

In many fields, data are observed on nested units at multiple levels. For example, in an educational study, classes of students from multiple schools might be assigned to one of two programs. At the end of the program, students take an evaluation test for which they receive a grade of Pass or Fail. Such a study has a three-level structure:

Students are the level-1 units.
Classes are the level-2 clusters.
Schools are the level-3 clusters.

Students are nested within classes, which are further nested within schools.

The following DATA step simulates such data for 500 students from five classes that are selected from 10 schools. The observations of test results (Grade) are simulated from a logistic model, with the variable Program as a fixed effect and the variables School and Class as random effects. The random effects for School and Class are simulated with variances 16.0 and 4.0, respectively.

data test; 
   do school = 1 to 10;
      schef = rannor(1234)*4;
      do class = 1 to 5;
         clsef = rannor(2345)*2;
         program = ranbin(12345,1,.5); 
         do student = 1 to 10;                  
            eta = 3 + program + schef + clsef ;
            p = 1/(1+exp(-eta));               
            grade = ranbin(23456,1,p);               
            output;            
         end;            
      end;           
   end;
run;

The following statements use a three-point adaptive quadrature to estimate this three-level model:

proc glimmix data=test method = quad(qpoints=3);
   class school class program;
   model grade = program/s dist=binomial link=logit solution;
   random int /subject = school;
   random int /subject = class(school);
run;

Output 45.19.1 shows the model information. Note that Gauss-Hermite quadrature is used for likelihood approximation.

Output 45.19.1: Model Information

The GLIMMIX Procedure

Model Information
Data Set	WORK.TEST
Response Variable	grade
Response Distribution	Binomial
Link Function	Logit
Variance Function	Default
Variance Matrix Blocked By	school
Estimation Technique	Maximum Likelihood
Likelihood Approximation	Gauss-Hermite Quadrature
Degrees of Freedom Method	Containment

The covariance parameter estimates and the solutions for fixed effects are shown in Output 45.19.2.

Output 45.19.2: Parameter Estimates

Covariance Parameter Estimates
Cov Parm	Subject	Estimate	Standard Error
Intercept	school	3.7342	3.1391
Intercept	class(school)	6.2835	2.5338

Solutions for Fixed Effects
Effect	program	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Intercept		2.6014	0.9512	9	2.73	0.0230
program	0	-0.2056	0.9009	450	-0.23	0.8195
program	1	0	.	.	.	.

For this model, the number of random effects within each school is $(1+5)=6$ : one for school random intercept and five for random intercepts for classes that are nested within the school. This means that the marginal log-likelihood computation requires the numerical evaluation of a six-dimensional integral. With the number of quadrature points set to three by the QPOINTS=3 suboption, this numerical evaluation computes $3^6 = 729$ conditional log likelihoods for each observation on each pass through the data.

The number of conditional log likelihoods increases exponentially with the number of random effects. For example, suppose that you increase the number of classes from five to ten. Then the number of conditional log-likelihood evaluations becomes $3^{(1+10)}$ = 177,147 for each observation, where 1 is the number of random school intercepts and 10 is the number of random intercepts for classes that are nested within each school. In many applications, it is not uncommon to have more than 10 nested units within a higher-level unit. As the number of nested units increases, the number of nested random effects increases, driving the exponential growth of the computational requirement.

To address the intimidating computational demand for dealing with such multilevel models, Pinheiro and Chao (2006) propose a multilevel adaptive quadrature algorithm. You can use the FASTQUAD suboption in the METHOD=QUAD option to invoke this algorithm. The following DATA step simulates the data set for 10 schools:

data test; 
   do school = 1 to 10;
      schef = rannor(1234)*4;
      do class = 1 to 10;
         clsef = rannor(2345)*2;
         program = ranbin(12345,1,.5); 
         do student = 1 to 10;                  
            eta = 3 + program + schef + clsef ;
            p = 1/(1+exp(-eta));               
            grade = ranbin(23456,1,p);               
            output;            
         end;            
      end;           
   end;
run;

The following statements use the FASTQUAD suboption to request multilevel quadrature estimation for the model:

proc glimmix data=test method = quad(qpoints=3 fastquad);
   class school class program;
   model grade = program/s dist=binomial link=logit solution;
   random int /subject = school;
   random int /subject = class(school);
run;

The "Model Information" table in Output 45.19.3 shows that multilevel Gauss-Hermite quadrature is used for likelihood approximation.

Output 45.19.3: Model Information

The GLIMMIX Procedure

Model Information
Data Set	WORK.TEST
Response Variable	grade
Response Distribution	Binomial
Link Function	Logit
Variance Function	Default
Variance Matrix Blocked By	school
Estimation Technique	Maximum Likelihood
Likelihood Approximation	Multilevel Gauss-Hermite Quadrature
Degrees of Freedom Method	Containment

Output 45.19.3 displays the parameter estimates.

Output 45.19.4: Parameter Estimates

Covariance Parameter Estimates
Cov Parm	Subject	Estimate	Standard Error
Intercept	school	13.4910	8.5545
Intercept	class(school)	2.9375	1.1370

Solutions for Fixed Effects
Effect	program	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Intercept		4.9024	1.4352	9	3.42	0.0077
program	0	0.03780	0.6155	900	0.06	0.9511
program	1	0	.	.	.	.

In multilevel quadrature, the number of random effects within each school is $(1+1)=2$ : one for the school random intercept and one for the class random intercept. That is, the number of class random intercepts does not grow with the number of classes. With three quadrature points, this leads to $3^{(1+1)} = 9$ evaluations of conditional log likelihoods for any number of classes in this example. In addition to reducing computational demand, multilevel adaptive quadrature also reduces memory usage. Both the computational efficiency and the memory-usage efficiency that the FASTQUAD suboption affords enables you to handle much larger multilevel models.