The GAMPL Procedure

Thin-Plate Regression Splines

The GAMPL procedure uses thin-plate regression splines (Wood 2003) to construct spline basis expansions. Thin-plate regression splines are based on thin-plate smoothing splines (Duchon 1976, 1977). Compared with thin-plate smoothing splines, thin-plate regression splines use far fewer basis functions and thus make direct fitting of generalized additive models computationally feasible.

Thin-Plate Smoothing Splines

Consider the problem of estimating a smooth function f of $\mb{x}$, a vector of d covariates, from n observations. The model assumes

\[ y_ i = f(\mb{x}_ i)+\epsilon _ i, \quad i=1, \dots , n \]

The thin-plate smoothing spline estimates the smooth function f by minimizing the penalized least squares criterion:

\[ \sum _{i=1}^ n (y_ i-f(\mb{x}_ i))^2 + \lambda J_{m,d}(f) \]

The penalty term $\lambda J_{m,d}(f)$ involves the functional $J_{m,d}$, which measures the roughness of the estimate of f:

\[ J_{m,d}(f)= \int \cdots \int \sum _{\alpha _1+\cdots +\alpha _ d=m} \frac{m!}{\alpha _1!\cdots \alpha _ d !} \left( \frac{\partial ^ m f}{\partial {x_1}^{\alpha _1}\cdots \partial {x_ d}^{\alpha _ d}} \right)^2 \, \mr{d}x_1\cdots \mr{d}x_ d \]

The parameter m (which corresponds to the M= option for a spline effect) specifies how the penalty is applied to the function roughness. Function derivatives whose order is less than m are not penalized. The relation $2m>d$ must be satisfied.
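As a concrete special case (standard, not specific to PROC GAMPL): for a univariate smooth with $m=2$ and $d=1$, the only multi-index is $\alpha _1=2$, and the penalty reduces to the familiar integrated squared second derivative of the cubic smoothing spline:

\[ J_{2,1}(f) = \int \left( f''(x) \right)^2 \, \mr{d}x \]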

The penalty term also includes the smoothing parameter $\lambda \in [0,\infty )$, which controls the trade-off between the model’s fidelity to the data and the smoothness of f. When $\lambda =0$, the function estimate interpolates the data. As $\lambda \to \infty $, the function estimate approaches the least squares polynomial fit. Given the penalized least squares criterion and a fixed $\lambda $ value, you can express the estimate of the smooth function f explicitly in the following form:

\[ f_{\lambda }(\mb{x}) = \sum _{j=1}^ M\theta _ j\phi _ j(\mb{x})+\sum _{i=1}^ n\delta _ i\eta _{m,d}(\| \mb{x}-\mb{x}_ i\| ) \]

In the expression of $f_{\lambda }(\mb{x})$, $\delta _ i$ and $\theta _ j$ are coefficients to be estimated. The functions $\phi _ j(\mb{x})$ correspond to unpenalized polynomials of $\mb{x}$ with degrees up to $m-1$. The total number of these polynomials is $M={m+d-1\choose d}$. The function $\eta _{m,d}$ models the extra nonlinearity besides the polynomials and is a function of the Euclidean distance r between any $\mb{x}$ value and an observed $\mb{x}_ i$ value:
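The count $M$ is easy to check with a few lines of Python (illustrative only; the function name `num_polynomials` is not part of the procedure):

```python
from math import comb

def num_polynomials(m: int, d: int) -> int:
    """Number of monomials in d variables with total degree < m: C(m+d-1, d)."""
    return comb(m + d - 1, d)

# Univariate cubic-spline case (m=2, d=1): constant and linear terms.
print(num_polynomials(2, 1))  # 2
# Bivariate default (m=2, d=2): 1, x1, x2.
print(num_polynomials(2, 2))  # 3
```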

\begin{equation*} \eta _{m,d}(r) = \begin{cases} \frac{(-1)^{m+1+d/2}}{2^{2m-1}\pi ^{d/2}(m-1)!(m-d/2)!}r^{2m-d}\log (r) & \text {if } d \text { is even}\\ \frac{\Gamma (d/2-m)}{2^{2m}\pi ^{d/2}(m-1)!}r^{2m-d} & \text {if } d \text { is odd} \end{cases}\end{equation*}
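A direct transcription of $\eta _{m,d}$ in Python (a sketch, not PROC GAMPL code; it assumes $2m>d$ and $r>0$, which keeps the gamma function in the odd-d branch finite):

```python
import math

def eta(m: int, d: int, r: float) -> float:
    """Thin-plate spline radial basis eta_{m,d}(r); requires 2m > d and r > 0."""
    if d % 2 == 0:
        c = ((-1) ** (m + 1 + d // 2)) / (
            2 ** (2 * m - 1) * math.pi ** (d / 2)
            * math.factorial(m - 1) * math.factorial(m - d // 2)
        )
        return c * r ** (2 * m - d) * math.log(r)
    else:
        c = math.gamma(d / 2 - m) / (
            2 ** (2 * m) * math.pi ** (d / 2) * math.factorial(m - 1)
        )
        return c * r ** (2 * m - d)

# m=2, d=1 recovers the cubic-spline kernel r^3 / 12:
print(abs(eta(2, 1, 2.0) - 8 / 12) < 1e-9)  # True
```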

Define the penalty matrix $\bE $ such that each entry $E_{ij}=\eta _{m,d}(\| \mb{x}_ i-\mb{x}_ j\| )$, let $\mb{y}$ be the response vector, let $\bT $ be the $n\times M$ matrix whose ith row contains the values $\phi _ j(\mb{x}_ i)$ for $j=1,\dots ,M$, and let $\btheta $ and $\bdelta $ be the vectors of coefficients $\theta _ j$ and $\delta _ i$. Then you can obtain the function estimate $f_{\lambda }$ from the following minimization problem:

\[ \min \| \mb{y}-\bT \btheta -\bE \bdelta \| ^2+\lambda \bdelta ’\bE \bdelta \quad \text {subject to} \quad \bT ’\bdelta =\mb{0} \]
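For a toy one-dimensional example ($d=1$, $m=2$, so $\eta _{2,1}(r)=r^3/12$), the matrices $\bE $ and $\bT $ can be formed as follows (a Python/NumPy sketch with made-up data, not PROC GAMPL output):

```python
import numpy as np

# Toy 1-D data (d=1, m=2): eta_{2,1}(r) = r^3 / 12, unpenalized basis {1, x}.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, size=8))

# Penalty matrix E: E_ij = eta(|x_i - x_j|); T: row i is (1, x_i).
R = np.abs(x[:, None] - x[None, :])
E = R ** 3 / 12.0
T = np.column_stack([np.ones_like(x), x])

print(np.allclose(E, E.T))  # True: E is symmetric
```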

For more information about thin-plate smoothing splines, see Chapter 116: The TPSPLINE Procedure in SAS/STAT 14.1 User's Guide.

Low-Rank Approximation

Given the representation of the thin-plate smoothing spline, the estimate of f involves as many parameters as there are unique data points. Solving for $(\btheta ,\bdelta )$ together with an optimal $\lambda $ becomes computationally difficult for large problems.

Because the matrix $\bE $ is symmetric and nonnegative definite, it admits the eigendecomposition $\bE = \bU \bD \bU ’$, where $\bD $ is the diagonal matrix of eigenvalues $d_{i}$ of $\bE $ and $\bU $ is the matrix whose columns are the corresponding eigenvectors. Truncating the eigendecomposition forms $\tilde{\bE }_ k$, an approximation to $\bE $ such that

\[ \tilde{\bE }_ k = \bU _ k \bD _ k \bU _ k’ \]

where $\bD _ k$ is the diagonal matrix that contains the k eigenvalues of largest magnitude, arranged in descending order of absolute value: $|\tilde{d}_1|\ge \cdots \ge |\tilde{d}_ k|$. $\bU _ k$ is the matrix whose columns are the eigenvectors that correspond to the eigenvalues in $\bD _ k$.

The approximation $\tilde{\bE }_ k$ not only reduces the storage from the $n\times n$ matrix $\bE $ to the $n\times k$ matrix $\bU _ k$ but also is optimal in two senses. First, $\tilde{\bE }_ k$ minimizes the spectral norm $\| \bE -\bF _ k\| _2$ over all rank-k matrices $\bF _ k$. Second, $\tilde{\bE }_ k$ also minimizes the worst possible change that the eigenspace truncation introduces, as measured by

\[ \max _{\bdelta \ne \mb{0}} \frac{\bdelta '(\bE -\bG _ k)\bdelta }{\| \bdelta \| ^2} \]

where $\bG _ k$ is formed by any k eigenvalues and corresponding eigenvectors. For more information, see Wood (2003).
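You can verify the truncation and its spectral-norm optimality numerically (a NumPy sketch on made-up data; by the Eckart–Young theorem, the spectral-norm error of the best rank-k approximation equals the (k+1)st largest absolute eigenvalue):

```python
import numpy as np

# Toy penalty matrix for d=1, m=2: E_ij = |x_i - x_j|^3 / 12 (made-up data).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, size=20))
E = np.abs(x[:, None] - x[None, :]) ** 3 / 12.0

# Truncated eigendecomposition: keep the k eigenvalues largest in magnitude.
d_vals, U = np.linalg.eigh(E)
order = np.argsort(-np.abs(d_vals))
k = 5
Ek = U[:, order[:k]] @ np.diag(d_vals[order[:k]]) @ U[:, order[:k]].T

# Spectral-norm error of the rank-k approximation.
err = np.linalg.norm(E - Ek, 2)
print(np.isclose(err, np.abs(d_vals[order[k]])))  # True
```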

Now given $\bE \approx \tilde{\bE }_ k$ and $\tilde{\bE }_ k = \bU _ k \bD _ k \bU _ k’$, and letting $\bdelta _ k=\bU _ k’\bdelta $, the minimization problem becomes

\[ \min \| \mb{y}-\bT \btheta -\bU _ k\bD _ k\bdelta _ k\| ^2+\lambda \bdelta _ k’\bD _ k\bdelta _ k \quad \text {subject to} \quad \bT ’\bU _ k\bdelta _ k=\mb{0} \]

You can turn the constrained optimization problem into an unconstrained one by using an orthogonal basis $\bZ $ for the null space of $\bT ’\bU _ k$. One way to form $\bZ $ is via the full QR decomposition of $\bU _ k’\bT $:

\[ \bU _ k’\bT = [\bQ _1 \bQ _2]\left[\begin{matrix} \bR \\ \mb{0} \end{matrix}\right] \]

Let $\bZ =\bQ _2$. Then you can verify that

\[ \bT ’\bU _ k\bZ = \bR ’\bQ _1’\bQ _2 = \mb{0} \]

So any $\bdelta _ k$ that satisfies $\bT ’\bU _ k\bdelta _ k=\mb{0}$ can be written as $\bdelta _ k=\bZ \tilde{\bdelta }$ for some $\tilde{\bdelta }$. Now the problem becomes the unconstrained optimization,
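The QR-based construction of $\bZ $ can be sketched in NumPy as follows (made-up data; `numpy.linalg.qr` with `mode="complete"` returns the full square $\bQ =[\bQ _1\  \bQ _2]$):

```python
import numpy as np

# Toy setup: d=1, m=2 penalty matrix and unpenalized basis {1, x}.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, size=20))
E = np.abs(x[:, None] - x[None, :]) ** 3 / 12.0
T = np.column_stack([np.ones_like(x), x])          # n x M, here M = 2

d_vals, U = np.linalg.eigh(E)
order = np.argsort(-np.abs(d_vals))
k = 5
Uk = U[:, order[:k]]

# Full QR of U_k' T (k x M); Z = Q2 spans the null space of T' U_k.
Q, _ = np.linalg.qr(Uk.T @ T, mode="complete")     # Q is k x k
M = T.shape[1]
Z = Q[:, M:]                                       # k x (k - M)

print(np.allclose(T.T @ Uk @ Z, 0))  # True: the constraint holds
```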

\[ \min \| \mb{y}-\bT \btheta -\bU _ k\bD _ k\bZ \tilde{\bdelta }\| ^2+ \lambda \tilde{\bdelta }’\bZ ’\bD _ k \bZ \tilde{\bdelta } \]


Define
\[ \bbeta = \left[\begin{matrix} \btheta \\ \tilde{\bdelta } \end{matrix}\right], \quad \bX = [\bT : \bU _ k\bD _ k\bZ ], \quad \text {and} \quad \bS = \left[\begin{matrix} \mb{0} & \mb{0} \\ \mb{0} & \bZ ’\bD _ k \bZ \end{matrix}\right] \]

The optimization problem then simplifies to

\[ \min \| \mb{y} - \bX \bbeta \| ^2 + \lambda \bbeta ’\bS \bbeta \quad \text {with respect to}\quad \bbeta \]
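Putting the pieces together, for a fixed $\lambda $ the unconstrained problem is an ordinary penalized least squares problem whose normal equations are $(\bX ’\bX +\lambda \bS )\bbeta =\bX ’\mb{y}$. The whole pipeline can be sketched in NumPy on made-up one-dimensional data (not PROC GAMPL code; the procedure also selects $\lambda $, which this sketch does not):

```python
import numpy as np

# Made-up 1-D data (d=1, m=2) and a noisy sine response.
rng = np.random.default_rng(0)
n, k = 40, 10
x = np.sort(rng.uniform(0, 1, size=n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=n)

E = np.abs(x[:, None] - x[None, :]) ** 3 / 12.0    # eta_{2,1} penalty matrix
T = np.column_stack([np.ones_like(x), x])          # unpenalized basis {1, x}

# Rank-k truncation and null-space basis Z, as in the derivation above.
d_vals, U = np.linalg.eigh(E)
order = np.argsort(-np.abs(d_vals))
Dk = np.diag(d_vals[order[:k]])
Uk = U[:, order[:k]]
Q, _ = np.linalg.qr(Uk.T @ T, mode="complete")
Z = Q[:, T.shape[1]:]

# Assemble X = [T : Uk Dk Z] and the penalty S, then solve the
# normal equations (X'X + lambda*S) beta = X'y for a fixed lambda.
X = np.hstack([T, Uk @ Dk @ Z])
S = np.zeros((X.shape[1], X.shape[1]))
S[2:, 2:] = Z.T @ Dk @ Z
lam = 1e-4
beta = np.linalg.solve(X.T @ X + lam * S, X.T @ y)
fitted = X @ beta
```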