The GLIMMIX procedure computes knots for lowrank smoothing based on the vertices or centroids of a kd tree. The default is to use the vertices of the tree as the knot locations, if you use the TYPE=RSMOOTH covariance structure. The construction of this tree amounts to a partitioning of the random regressor space until all partitions contain at most b observations. The number b is called the bucket size of the kd tree. You can exercise control over the construction of the tree by changing the bucket size with the BUCKET= suboption of the KNOTMETHOD=KDTREE option in the RANDOM statement. A large bucket size leads to fewer knots, but it is not correct to assume that K, the number of knots, is simply . The number of vertices depends on the configuration of the values in the regressor space. Also, coordinates of the bounding hypercube are vertices of the tree. In the onedimensional case, for example, the extreme values of the random effect are vertices.
To demonstrate how the kd tree partitions the randomeffects space based on observed data and the influence of the bucket size, consider the following
example from Chapter 53: The LOESS Procedure. The SAS data set Gas
contains the results of an engine exhaust emission study (Brinkman, 1981). The covariate in this analysis, E
, is a measure of the airfuel mixture richness. The response, NOx
, measures the nitric oxide concentration (in micrograms per joule, and normalized).
data Gas; input NOx E; format NOx E f5.3; datalines; 4.818 0.831 2.849 1.045 3.275 1.021 4.691 0.97 4.255 0.825 5.064 0.891 2.118 0.71 4.602 0.801 2.286 1.074 0.97 1.148 3.965 1 5.344 0.928 3.834 0.767 1.99 0.701 5.199 0.807 5.283 0.902 3.752 0.997 0.537 1.224 1.64 1.089 5.055 0.973 4.937 0.98 1.561 0.665 ;
There are 22 observations in the data set, and the values of the covariate are unique. If you want to smooth these data with
a lowrank radial smoother, you need to choose the number of knots, as well as their placement within the support of the variable
E
. The kd tree construction depends on the observed values of the variable E
; it is independent of the values of nitric oxide in the data. The following statements construct a tree based on a bucket
size of b = 11 and display information about the tree and the selected knots:
ods select KDtree KnotInfo; proc glimmix data=gas nofit; model NOx = e; random e / type=rsmooth knotmethod=kdtree(bucket=11 treeinfo knotinfo); run;
The NOFIT option prevents the GLIMMIX procedure from fitting the model. This option is useful if you want to investigate the knot construction for various bucket sizes. The TREEINFO and KNOTINFO suboptions of the KNOTMETHOD=KDTREE option request displays of the kd tree and the knot coordinates derived from it. Construction of the tree commences by splitting the data in half. For b = 11, n = 22, neither of the two splits contains more than b observations and the process stops. With a single split value, and the two extreme values, the tree has two terminal nodes and leads to three knots (Figure 41.13). Note that for onedimensional problems, vertices of the kd tree always coincide with data values.
Figure 41.13: Kd Tree and Knots for Bucket Size 11
kdTree for RSmooth(E)  

Node Number  Left Child  Right Child  Split Direction  Split Value 
0  1  2  E  0.9280 
1  TERMINAL  
2  TERMINAL 
Radial Smoother Knots for RSmooth(E) 


Knot Number  E 
1  0.6650 
2  0.9280 
3  1.2240 
If the bucket size is reduced to b = 8, the following statements produce the tree and knots in Figure 41.14:
ods select KDtree KnotInfo; proc glimmix data=gas nofit; model NOx = e; random e / type=rsmooth knotmethod=kdtree(bucket=8 treeinfo knotinfo); run;
The initial split value of 0.9280 leads to two sets of 11 observations. In order to achieve a partition into cells that contain at most eight observations, each initial partition is split at its median one more time. Note that one split value is greater and one split value is less than 0.9280.
Figure 41.14: Kd Tree and Knots for Bucket Size 8
kdTree for RSmooth(E)  

Node Number  Left Child  Right Child  Split Direction  Split Value 
0  1  2  E  0.9280 
1  3  4  E  0.8070 
2  5  6  E  1.0210 
3  TERMINAL  
4  TERMINAL  
5  TERMINAL  
6  TERMINAL 
Radial Smoother Knots for RSmooth(E) 


Knot Number  E 
1  0.6650 
2  0.8070 
3  0.9280 
4  1.0210 
5  1.2240 
A further reduction in bucket size to b = 4 leads to the tree and knot information shown in Figure 41.15.
Figure 41.15: Kd Tree and Knots for Bucket Size 4
kdTree for RSmooth(E)  

Node Number  Left Child  Right Child  Split Direction  Split Value 
0  1  2  E  0.9280 
1  3  4  E  0.8070 
2  9  10  E  1.0210 
3  5  6  E  0.7100 
4  7  8  E  0.8910 
5  TERMINAL  
6  TERMINAL  
7  TERMINAL  
8  TERMINAL  
9  11  12  E  0.9800 
10  13  14  E  1.0890 
11  TERMINAL  
12  TERMINAL  
13  TERMINAL  
14  TERMINAL 
Radial Smoother Knots for RSmooth(E) 


Knot Number  E 
1  0.6650 
2  0.7100 
3  0.8070 
4  0.8910 
5  0.9280 
6  0.9800 
7  1.0210 
8  1.0890 
9  1.2240 
The split value for b = 11 is also a split value for b = 8, the split values for b = 8 are a subset of those for b = 4, and so forth. Figure 41.16 displays the data and the location of split values for the three cases. For a onedimensional problem (a univariate smoother), the vertices comprise the split values and the values on the bounding interval.
You might want to move away from the boundary, in particular for an irregular data configuration or for multivariate smoothing.
The KNOTTYPE=CENTER suboption of the KNOTMETHOD= option chooses centroids of the leaf node cells instead of vertices. This tends to move the outer knot locations closer to
the convex hull, but not necessarily to data locations. In the emission example, choosing a bucket size of b = 11 and centroids as knot locations yields two knots at E
=0.7956 and E
=1.076. If you choose the NEAREST suboption, then the nearest neighbor of a vertex or centroid will serve as the knot location. In this case, the knot locations
are a subset of the data locations, regardless of the dimension of the smooth.