In animal breeding, it is common to model genetic and environmental effects with a random effect for the animal. When many animals are being studied, this can lead to very large mixed model equations. In this example we present an analysis of simulated data with this structure.

Suppose you have 3000 animals from five different genetic species raised on 100 different farms. The following DATA step simulates 40000 observations of milk yield (`Yield`) from a linear mixed model with the variables `Species` and `Farm` in the fixed-effect model and `Animal` as a random effect. The random `Animal` effect is simulated with a variance of 4.0, and the residual error variance is 8.0. These variance component values reflect the fact that typically no more than about 33% (4/(4+8)) of the variation in milk yield is genetically controlled.

    data Sim;
       keep Species Farm Animal Yield;
       array AnimalEffect{3000};
       array AnimalFarm{3000};
       array AnimalSpecies{3000};
       /* Give each animal a random genetic effect (variance 4), a farm, and a species */
       do i = 1 to dim(AnimalEffect);
          AnimalEffect{i}  = sqrt(4.0)*rannor(12345);
          AnimalFarm{i}    = 1 + int(100*ranuni(12345));
          AnimalSpecies{i} = 1 + int(5*ranuni(12345));
       end;
       /* Generate 40000 yield records from randomly chosen animals (residual variance 8) */
       do i = 1 to 40000;
          Animal  = 1 + int(3000*ranuni(12345));
          Species = AnimalSpecies{Animal};
          Farm    = AnimalFarm{Animal};
          Yield   = 1 + Species + Farm/10 + AnimalEffect{Animal}
                    + sqrt(8.0)*rannor(12345);
          output;
       end;
    run;
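For reference, the DATA step corresponds to the following model. The notation is introduced here for illustration only and is not part of the SAS code: $a(k)$ is the animal that produced record $k$, $s_i$ and $f_i$ are the species and farm numbers of animal $i$, and $u_i$ is its genetic effect.

$$
\begin{aligned}
\text{Yield}_k &= 1 + s_{a(k)} + \frac{f_{a(k)}}{10} + u_{a(k)} + e_k, \qquad k = 1,\ldots,40000,\\
u_i &\sim N(0,\ \sigma^2_A = 4), \qquad i = 1,\ldots,3000,\\
e_k &\sim N(0,\ \sigma^2 = 8),\\
\frac{\sigma^2_A}{\sigma^2_A + \sigma^2} &= \frac{4}{4+8} = \frac{1}{3} \approx 0.33
\end{aligned}
$$

The last line is the 33% share of the total yield variance attributable to the animal effect. Because each record draws its animal at random, each animal contributes 40000/3000, or about 13.3, records on average, which is what makes the animal variance estimable.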

A simple linear mixed model analysis is performed by using the following SAS statements:

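    proc hpmixed data=Sim;
       class Species Farm Animal;
       model Yield = Species Species*Farm;
       random Animal;
       /* Type III tests are not computed by default; request one for Species*Farm */
       test Species*Farm;
       /* Two rows: Species1 - Species3 = 0 and Species2 - Species3 = 0 */
       contrast 'Species1 = Species2 = Species3'
                Species 1 0 -1,
                Species 0 1 -1;
    run;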

Selected results from the preceding SAS statements are shown in Figure 55.1 through Figure 55.4.

The "Class Level Information" table in Figure 55.1 shows that the three classification variables `Species`, `Farm`, and `Animal` have 5, 100, and 3000 levels, respectively; by default, only a portion of the levels is displayed. The "Dimensions" table shows that the model contains a single G-side covariance parameter and a single R-side covariance parameter. R-side covariance parameters are those associated with the covariance matrix $\mathbf{R}$ of the conditional distribution of the data, given the random effects. In the case of the HPMIXED procedure, this matrix is simply $\sigma^2\mathbf{I}$, and the single R-side covariance parameter corresponds to the residual variance $\sigma^2$. The G-side parameter is the variance of the random `Animal` effect; the $\mathbf{G}$ matrix is a diagonal matrix with the common variance $\sigma^2_A$ on the diagonal.
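In matrix form, the model fitted by the preceding PROC HPMIXED statements can be sketched as follows. The symbols $\mathbf{y}$, $\mathbf{X}$, $\mathbf{Z}$, $\boldsymbol\beta$, $\boldsymbol\gamma$, and $\boldsymbol\epsilon$ follow the usual linear mixed model notation and are our labels for this example.

$$
\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\boldsymbol\gamma + \boldsymbol\epsilon,\qquad
\boldsymbol\gamma \sim N(\mathbf{0},\ \mathbf{G} = \sigma^2_A\mathbf{I}_{3000}),\qquad
\boldsymbol\epsilon \sim N(\mathbf{0},\ \mathbf{R} = \sigma^2\mathbf{I}_{40000}),
$$

$$
\operatorname{Var}(\mathbf{y}) = \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R} = \sigma^2_A\,\mathbf{Z}\mathbf{Z}' + \sigma^2\mathbf{I}_{40000}
$$

Here $\boldsymbol\gamma$ and $\boldsymbol\epsilon$ are independent, $\mathbf{y}$ is the $40000 \times 1$ vector of yields, and $\mathbf{Z}$ is a $40000 \times 3000$ incidence matrix with a single 1 in each row marking the animal that produced the record. The two covariance parameters $\sigma^2_A$ and $\sigma^2$ are the G-side and R-side parameters counted in the "Dimensions" table.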

Figure 55.1: Class Levels and Dimensions

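Taking into account the intercept as well as the number of levels of the `Species` and `Species*Farm` effects, the $\mathbf{X}$ matrix for this problem has 506 columns, so that the mixed model equations

$$
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}
\end{bmatrix}
\begin{bmatrix}
\widehat{\boldsymbol\beta}\\
\widehat{\boldsymbol\gamma}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}
\end{bmatrix}
$$

have 3506 rows and columns. This is a substantial computational problem: simply storing a single copy of this matrix in dense format requires nearly 50 megabytes of memory. The sparse matrix techniques of PROC HPMIXED use a small fraction of this amount of memory and a similarly small fraction of the CPU time required to solve the equations with dense techniques. For more information about sparse versus dense techniques, see the section "Sparse Matrix Techniques."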
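The size figures in the preceding paragraph can be checked with a little arithmetic. The column count below assumes the non-full-rank coding of $\mathbf{X}$ (one column per class level or level combination), and the storage estimates assume 8-byte double-precision entries; both are our assumptions for this check.

$$
\begin{aligned}
p &= \underbrace{1}_{\text{intercept}} + \underbrace{5}_{\texttt{Species}} + \underbrace{5 \times 100}_{\texttt{Species*Farm}} = 506, \qquad q = 3000 \ (\texttt{Animal}),\\
p + q &= 3506,\\
\text{full dense storage} &= 3506^2 \times 8 \text{ bytes} = 98{,}336{,}288 \text{ bytes} \approx 98.3 \text{ MB},\\
\text{one triangle only} &= \frac{3506 \times 3507}{2} \times 8 \text{ bytes} = 49{,}182{,}168 \text{ bytes} \approx 49.2 \text{ MB}.
\end{aligned}
$$

The "nearly 50 megabytes" quoted above matches the second count, which corresponds to storing only one triangle of the symmetric coefficient matrix.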

Figure 55.2 displays the covariance parameter estimates at convergence of the REML algorithm. The variance component estimates for the `Animal` effect and for the residual are close to the simulated values of 4.0 and 8.0, respectively.

Figure 55.2: Estimates of Variance Components

The TEST statement requests a Type III test of the `Species*Farm` fixed effect. By default, the HPMIXED procedure does not compute Type III tests, because they can be computationally demanding. The test of the `Species*Farm` effect in Figure 55.3 is highly significant, which indicates that animals of a given genetic species perform differently in different environments.

Figure 55.3: Type III Tests of Fixed Effect

You can use the CONTRAST or ESTIMATE statement to test custom linear hypotheses involving the fixed and/or random effects. The CONTRAST statement in the preceding program tests the null hypothesis that there are no differences among the first three genetic species. Results from this analysis are shown in Figure 55.4. The small p-value indicates that there are significant differences among the first three genetic species.
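In matrix terms, the two comma-separated rows of the CONTRAST statement form a two-row coefficient matrix $\mathbf{L}$. The display below shows only the columns of $\mathbf{L}$ for the five `Species` parameters, written here as $\tau_1,\ldots,\tau_5$ (our notation). The F statistic is given in the standard linear mixed model form for such hypotheses, as a sketch rather than a statement about HPMIXED internals.

$$
\mathbf{L}_{\text{Species}} =
\begin{bmatrix}
1 & 0 & -1 & 0 & 0\\
0 & 1 & -1 & 0 & 0
\end{bmatrix},
\qquad
H_0:\ \tau_1 - \tau_3 = 0 \ \text{and}\ \tau_2 - \tau_3 = 0
\iff \tau_1 = \tau_2 = \tau_3
$$

$$
F = \frac{(\mathbf{L}\widehat{\boldsymbol\beta})'\,(\mathbf{L}\widehat{\mathbf{C}}\mathbf{L}')^{-}\,(\mathbf{L}\widehat{\boldsymbol\beta})}{\operatorname{rank}(\mathbf{L})},
\qquad \operatorname{rank}(\mathbf{L}) = 2
$$

Here $\widehat{\mathbf{C}}$ is the estimated covariance matrix of $\widehat{\boldsymbol\beta}$. Two rows suffice because $\tau_1 - \tau_2 = (\tau_1 - \tau_3) - (\tau_2 - \tau_3)$; a third row such as `Species 1 -1 0` would be linearly dependent on the other two and would not change the hypothesis.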

Figure 55.4: Result of CONTRAST Statement