The GAMPL Procedure

Thin-Plate Regression Splines

The GAMPL procedure uses thin-plate regression splines (Wood 2003) to construct spline basis expansions. The thin-plate regression splines are based on thin-plate smoothing splines (Duchon 1976, 1977). Compared to thin-plate smoothing splines, thin-plate regression splines produce fewer basis expansions and thus make direct fitting of generalized additive models possible.

Thin-Plate Smoothing Splines

Consider the problem of estimating a smoothing function f of $\mb{x}$ with d covariates from n observations. The model assumes

\[ y_ i = f(\mb{x}_ i)+\epsilon _ i, i=1, \dots , n \]

Then the thin-plate smoothing splines estimate the smoothing function f by minimizing the penalized least squares function:

\[ \sum _{i=1}^ n (y_ i-f(\mb{x}_ i))^2 + \lambda J_{m,d}(f) \]

The penalty term $\lambda J_{m,d}(f)$ includes the functional $J_{m,d}(f)$, which measures the roughness of the estimate of f:

\[ J_{m,d}(f)= \int \cdots \int \sum _{\alpha _1+\cdots +\alpha _ d=m} \frac{m!}{\alpha _1!\cdots \alpha _ d !} \left( \frac{\partial ^ m f}{\partial {x_1}^{\alpha _1}\cdots \partial {x_ d}^{\alpha _ d}} \right)^2 \, \mr{d}x_1\cdots \mr{d}x_ d \]

The parameter m (which corresponds to the M= option for a spline effect) specifies the order of derivatives that the roughness penalty applies to: derivatives of f whose order is less than m are not penalized. The relation $2m>d$ must be satisfied.
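For example, with d = 2 covariates and penalty order m = 2 (so that $2m>d$ holds), the roughness functional reduces to the familiar thin-plate bending energy:

\[ J_{2,2}(f)= \int \! \! \int \left( \frac{\partial ^2 f}{\partial x_1^2} \right)^2 + 2\left( \frac{\partial ^2 f}{\partial x_1 \partial x_2} \right)^2 + \left( \frac{\partial ^2 f}{\partial x_2^2} \right)^2 \, \mr{d}x_1\, \mr{d}x_2 \]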

The penalty term also includes the smoothing parameter $\lambda \in [0,\infty )$, which controls the trade-off between the model’s fidelity to the data and the function smoothness of f. When $\lambda =0$, the function estimate corresponds to an interpolation. When $\lambda \to \infty $, the function estimate becomes the least squares fit. By using the defined penalized least squares criterion and a fixed $\lambda $ value, you can explicitly express the estimate of the smooth function f in the following form:

\[ f_{\lambda }(\mb{x}) = \sum _{j=1}^ M\theta _ j\phi _ j(\mb{x})+\sum _{i=1}^ n\delta _ i\eta _{m,d}(\| \mb{x}-\mb{x}_ i\| ) \]

In the expression of $f_{\lambda }(\mb{x})$, $\delta _ i$ and $\theta _ j$ are coefficients to be estimated. The functions $\phi _ j(\mb{x})$ correspond to unpenalized polynomials of $\mb{x}$ with degrees up to $m-1$. The total number of these polynomials is $M={m+d-1\choose d}$. The function $\eta _{m,d}$ models the extra nonlinearity besides the polynomials and is a function of the Euclidean distance r between any $\mb{x}$ value and an observed $\mb{x}_ i$ value:

\begin{equation*} \eta _{m,d}(r) = \begin{cases} \frac{(-1)^{m+1+d/2}}{2^{2m-1}\pi ^{d/2}(m-1)!(m-d/2)!}r^{2m-d}\log (r) & \text {if } d \text { is even}\\ \frac{\Gamma (d/2-m)}{2^{2m}\pi ^{d/2}(m-1)!}r^{2m-d} & \text {if } d \text { is odd} \end{cases}\end{equation*}
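For example, when m = 2 and d = 2, the unpenalized space contains $M={3\choose 2}=3$ polynomials, namely $\phi _1(\mb{x})=1$, $\phi _2(\mb{x})=x_1$, and $\phi _3(\mb{x})=x_2$, and the radial basis function reduces to the well-known thin-plate spline kernel

\[ \eta _{2,2}(r) = \frac{1}{8\pi }r^2\log (r) \]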

Define the penalty matrix $\bE $ such that each entry $E_{ij}=\eta _{m,d}(\| \mb{x}_ i-\mb{x}_ j\| )$, let $\mb{y}$ be the vector of responses, let $\bT $ be the $n\times M$ matrix whose ith row contains the polynomial values $\phi _1(\mb{x}_ i), \dots , \phi _ M(\mb{x}_ i)$, and let $\btheta $ and $\bdelta $ be the vectors of coefficients $\theta _ j$ and $\delta _ i$. Then you can obtain the function estimate $f_{\lambda }$ from the following minimization problem:

\[ \min \| \mb{y}-\bT \btheta -\bE \bdelta \| ^2+\lambda \bdelta ’\bE \bdelta \quad \text {subject to} \quad \bT ’\bdelta =\mb{0} \]

For more information about thin-plate smoothing splines, see Chapter 116: The TPSPLINE Procedure.
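To make the construction concrete, the following NumPy sketch builds the matrices $\bE $ and $\bT $ for the bivariate case m = 2, d = 2. It is an illustration only; the function names (eta22, build_tps_matrices) are not part of PROC GAMPL.

import numpy as np

def eta22(r):
    # Thin-plate spline radial basis for m = 2, d = 2: eta(r) = r^2 log(r) / (8*pi),
    # with eta(0) defined as 0.
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = (r[nz] ** 2) * np.log(r[nz]) / (8.0 * np.pi)
    return out

def build_tps_matrices(x):
    # x is an (n, 2) array of covariate values.
    n = x.shape[0]
    # Pairwise Euclidean distances between observations.
    r = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    E = eta22(r)                                           # n x n penalty matrix
    T = np.column_stack([np.ones(n), x[:, 0], x[:, 1]])    # n x M polynomial matrix (M = 3)
    return E, T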

Low-Rank Approximation

Given the representation of the thin-plate smoothing spline, the estimate of f involves as many coefficients $\delta _ i$ as there are unique data points. Solving for $(\btheta ,\bdelta )$ together with an optimal $\lambda $ therefore becomes computationally expensive for large problems.

Because the matrix $\bE $ is symmetric, it has the eigendecomposition $\bE = \bU \bD \bU ’$, where $\bD $ is the diagonal matrix of eigenvalues $d_{i}$ of $\bE $ and $\bU $ is the orthogonal matrix whose columns are the corresponding eigenvectors. Truncating this eigendecomposition forms $\tilde{\bE }_ k$, an approximation to $\bE $ such that

\[ \tilde{\bE }_ k = \bU _ k \bD _ k \bU _ k’ \]

where $\bD _ k$ is the diagonal matrix that contains the k eigenvalues that are largest in absolute value, arranged in descending order of absolute value: $|\tilde{d}_1|\ge \cdots \ge |\tilde{d}_ k|$. $\bU _ k$ is the matrix whose columns are the eigenvectors that correspond to the eigenvalues in $\bD _ k$.
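As a rough illustration of this truncation (not the implementation that PROC GAMPL uses), the k most extreme eigenvalues and their eigenvectors can be extracted with NumPy as follows:

def truncate_eigen(E, k):
    # Eigendecomposition of the symmetric matrix E; eigh returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(E)
    # Keep the k eigenvalues that are largest in absolute value, in descending order.
    order = np.argsort(-np.abs(vals))[:k]
    D_k = np.diag(vals[order])    # k x k diagonal matrix of retained eigenvalues
    U_k = vecs[:, order]          # n x k matrix of corresponding eigenvectors
    return U_k, D_k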

The approximation $\tilde{\bE }_ k$ not only reduces the effective dimension of the smoothing problem from n to k but also is optimal in two senses. First, $\tilde{\bE }_ k$ minimizes the spectral norm $\| \bE -\bF _ k\| _2$ over all rank-k matrices $\bF _ k$. Second, $\tilde{\bE }_ k$ also minimizes the worst possible change in the penalty term that the eigenspace truncation introduces, as measured by

\[ \max _{\bdelta \ne \mb{0}} \frac{\bdelta '(\bE -\bG _ k)\bdelta }{\| \bdelta \| ^2} \]

where $\bG _ k$ is formed by any k eigenvalues and corresponding eigenvectors. For more information, see Wood (2003).

Now given $\bE \approx \tilde{\bE }_ k = \bU _ k \bD _ k \bU _ k’$, and letting $\bdelta _ k=\bU _ k’\bdelta $ so that $\bE \bdelta \approx \bU _ k\bD _ k\bdelta _ k$ and $\bdelta ’\bE \bdelta \approx \bdelta _ k’\bD _ k\bdelta _ k$, the minimization problem becomes

\[ \min \| \mb{y}-\bT \btheta -\bU _ k\bD _ k\bdelta _ k\| ^2+\lambda \bdelta _ k’\bD _ k\bdelta _ k \quad \text {subject to} \quad \bT ’\bU _ k\bdelta _ k=\mb{0} \]

You can turn the constrained optimization problem into an unconstrained one by reparameterizing with a matrix $\bZ $ whose orthogonal columns span the null space of $\bT ’\bU _ k$. One way to form $\bZ $ is via the QR decomposition of $\bU _ k’\bT $:

\[ \bU _ k’\bT = [\bQ _1 \bQ _2]\left[\begin{matrix} \bR \\ \mb{0} \end{matrix}\right] \]

Let $\bZ =\bQ _2$. Then it follows that

\[ \bT ’\bU _ k\bZ = \bR ’\bQ _1’\bQ _2 = \mb{0} \]

So any $\bdelta _ k$ that satisfies the constraint $\bT ’\bU _ k\bdelta _ k=\mb{0}$ can be written as $\bdelta _ k=\bZ \tilde{\bdelta }$ for some vector $\tilde{\bdelta }$. Now the problem becomes the unconstrained optimization

\[ \min \| \mb{y}-\bT \btheta -\bU _ k\bD _ k\bZ \tilde{\bdelta }\| ^2+ \lambda \tilde{\bdelta }’\bZ ’\bD _ k \bZ \tilde{\bdelta } \]
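In NumPy terms, and assuming the U_k and T arrays from the earlier sketches, the null-space basis $\bZ $ that this reparameterization relies on can be obtained from a complete QR decomposition (the helper name null_space_basis is illustrative):

def null_space_basis(U_k, T):
    # Complete QR decomposition of the k x M matrix U_k' T.
    M = T.shape[1]
    Q, R = np.linalg.qr(U_k.T @ T, mode="complete")   # Q is k x k orthogonal
    Z = Q[:, M:]                                      # Z = Q2 spans the null space of T' U_k
    # Sanity check that mirrors the identity above: T' U_k Z should be (numerically) zero.
    assert np.allclose(T.T @ U_k @ Z, 0.0, atol=1e-8)
    return Z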

Let

\[ \bbeta = \left[\begin{matrix} \btheta \\ \tilde{\bdelta } \end{matrix}\right], \quad \bX = [\bT : \bU _ k\bD _ k\bZ ], \quad \text {and} \quad \bS = \left[\begin{matrix} \mb{0} & \mb{0} \\ \mb{0} & \bZ ’\bD _ k \bZ \end{matrix}\right] \]

The optimization then simplifies to

\[ \min \| \mb{y} - \bX \bbeta \| ^2 + \lambda \bbeta ’\bS \bbeta \quad \text {with respect to}\quad \bbeta \]
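Putting the pieces together, a minimal end-to-end sketch of the reduced fit for a fixed smoothing parameter $\lambda $ might look as follows. It uses the helper functions sketched earlier and solves the normal equations directly for simplicity; it is not how PROC GAMPL is implemented, and in practice $\lambda $ itself must also be estimated.

def fit_tprs(y, x, k, lam):
    # Build the full matrices, truncate the eigendecomposition, and reparameterize.
    E, T = build_tps_matrices(x)
    U_k, D_k = truncate_eigen(E, k)
    Z = null_space_basis(U_k, T)
    # Design matrix X = [T : U_k D_k Z] and block-diagonal penalty matrix S.
    X = np.column_stack([T, U_k @ D_k @ Z])
    p = T.shape[1]
    S = np.zeros((X.shape[1], X.shape[1]))
    S[p:, p:] = Z.T @ D_k @ Z
    # Penalized least squares estimate for the fixed smoothing parameter lam:
    # minimize ||y - X beta||^2 + lam * beta' S beta.
    beta = np.linalg.solve(X.T @ X + lam * S, X.T @ y)
    return beta, X   # beta[:p] holds theta; beta[p:] holds delta-tilde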