ANOVA Decomposition

The model produced by the multivariate adaptive regression splines algorithm can be written as

\begin{align*} \hat{f}(\mb{x}) & = \beta _0 + \sum _{m=1}^ M\beta _ m \mb{B}_ m\\ & = \beta _0 + \sum _{m=1}^ M\beta _ m\prod _{k=1}^{K_ m}\mb{T}_ m(\mb{x}_{k,m},t_{k,m}) \end{align*}

Here $\hat{f}$ is the nonparametric estimate of the response variable in linear models and of the link-transformed mean response in generalized linear models. $M$ is the number of nonconstant bases. For each basis, $K_ m$ is the order of interaction, $\mb{T}_ m$ is the variable transformation function, which depends on the variable type, $\mb{x}_{k,m}$ is the variable for the $k$th component of the basis, and $t_{k,m}$ is the corresponding knot value (for a continuous variable) or subset of categories (for a classification variable).
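To make the notation concrete, the following is a minimal Python sketch that evaluates $\hat{f}(\mb{x})$ from fitted coefficients and bases. It is not the procedure's implementation. It assumes the usual MARS forms for $\mb{T}_ m$: a hinge $\max(\pm(x - t), 0)$ for a continuous variable and a 0/1 indicator of membership in a category subset for a classification variable. The names `Hinge`, `InSubset`, `basis_value`, and `predict` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple, Union

Row = Dict[str, object]  # one observation: variable name -> value


@dataclass(frozen=True)
class Hinge:
    """Assumed continuous transformation: max(sign * (x - knot), 0)."""
    var: str
    knot: float
    sign: int = 1  # +1 gives max(x - t, 0); -1 gives max(t - x, 0)

    def __call__(self, row: Row) -> float:
        return max(self.sign * (float(row[self.var]) - self.knot), 0.0)


@dataclass(frozen=True)
class InSubset:
    """Assumed classification transformation: 1 if the level is in the subset, else 0."""
    var: str
    levels: FrozenSet[str]

    def __call__(self, row: Row) -> float:
        return 1.0 if row[self.var] in self.levels else 0.0


Transform = Union[Hinge, InSubset]
Basis = Tuple[Transform, ...]  # K_m factors; len(basis) is the interaction order


def basis_value(basis: Basis, row: Row) -> float:
    # B_m(x) = product over k = 1..K_m of T_m(x_{k,m}, t_{k,m})
    return math.prod(t(row) for t in basis)


def predict(beta0: float, terms: List[Tuple[float, Basis]], row: Row) -> float:
    # f_hat(x) = beta_0 + sum over m of beta_m * B_m(x)
    return beta0 + sum(beta * basis_value(b, row) for beta, b in terms)
```

For example, with `terms = [(2.0, (Hinge("x1", 0.5),)), (-1.0, (Hinge("x1", 0.5, -1),)), (0.5, (Hinge("x1", 0.5), InSubset("c", frozenset({"a"}))))]`, calling `predict(1.0, terms, {"x1": 2.0, "c": "a"})` gives $1 + 2(1.5) - 1(0) + 0.5(1.5)(1) = 4.75$.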

The function estimate can be recast into the form

\begin{align*} \hat{f}(\mb{x}) & = \beta _0 + \sum _{i:K_ m=1}f_ i(\mb{x}_ i)+ \sum _{i,j:K_ m=2}f_{ij}(\mb{x}_ i,\mb{x}_ j)+ \sum _{i,j,k:K_ m=3}f_{ijk}(\mb{x}_ i,\mb{x}_ j,\mb{x}_ k)+\cdots \end{align*}

where $f_ i$ is the sum of the bases that involve only the single variable $\mb{x}_ i$, $f_{ij}$ is the sum of the bases that involve two-way interactions between transformations of $\mb{x}_ i$ and $\mb{x}_ j$, and so on. The univariate function $f_ i$ is a linear regression spline in $\mb{x}_ i$ and represents the univariate contribution of $\mb{x}_ i$ to the model. Let
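The regrouping amounts to collecting bases by the set of variables they involve. The sketch below illustrates this under simplifying assumptions: only continuous variables with hinge factors $\max(\pm(x - t), 0)$, and, as in standard MARS, each variable appearing at most once within a basis. The tuple encoding `(variable, knot, sign)` and the names `anova_groups` and `anova_function` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Factor = Tuple[str, float, int]          # hypothetical hinge encoding: (variable, knot, sign)
Term = Tuple[float, Tuple[Factor, ...]]  # (beta_m, factors of B_m); intercept excluded


def hinge(row: Dict[str, float], factor: Factor) -> float:
    var, knot, sign = factor
    return max(sign * (row[var] - knot), 0.0)


def anova_groups(terms: List[Term]) -> Dict[FrozenSet[str], List[Term]]:
    """Group bases by variable set: {x_i} -> f_i, {x_i, x_j} -> f_ij, and so on."""
    groups: Dict[FrozenSet[str], List[Term]] = defaultdict(list)
    for beta, factors in terms:
        groups[frozenset(v for v, _, _ in factors)].append((beta, factors))
    return dict(groups)


def anova_function(groups: Dict[FrozenSet[str], List[Term]],
                   variables: Iterable[str], row: Dict[str, float]) -> float:
    """Evaluate one ANOVA function (e.g. f_i or f_ij) as the sum of its bases."""
    total = 0.0
    for beta, factors in groups.get(frozenset(variables), []):
        product = 1.0
        for factor in factors:
            product *= hinge(row, factor)
        total += beta * product
    return total
```

Grouping by the variable set, rather than by $K_ m$ alone, separates, for instance, the two-way functions $f_{12}$ and $f_{13}$, which share the same interaction order.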

\[ f_{ij}^*(\mb{x}_ i,\mb{x}_ j)=f_ i(\mb{x}_ i)+f_ j(\mb{x}_ j)+f_{ij}(\mb{x}_ i,\mb{x}_ j) \]

Then this bivariate function is a tensor product regression spline that represents the joint contribution of $\mb{x}_ i$ and $\mb{x}_ j$. Multivariate functions can be formed similarly if higher-order interaction terms are present in the model. Because of its similarity to the analysis of variance for contingency tables, this representation is referred to as the ANOVA decomposition of the multivariate adaptive regression splines model.
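An equivalent way to state $f_{ij}^*$ is that it sums every basis whose variables all lie in $\{\mb{x}_ i, \mb{x}_ j\}$, and the same rule gives the higher-order joint functions. The following minimal sketch encodes that rule, again assuming hinge factors $\max(\pm(x - t), 0)$ on continuous variables and a hypothetical `(variable, knot, sign)` encoding. The name `joint_contribution` is illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

Factor = Tuple[str, float, int]          # hypothetical hinge encoding: (variable, knot, sign)
Term = Tuple[float, Tuple[Factor, ...]]  # (beta_m, factors of B_m); intercept excluded


def basis_value(factors: Tuple[Factor, ...], row: Dict[str, float]) -> float:
    # Tensor product of hinge functions: prod_k max(sign * (x_k - t_k), 0)
    value = 1.0
    for var, knot, sign in factors:
        value *= max(sign * (row[var] - knot), 0.0)
    return value


def joint_contribution(terms: List[Term], variables: Iterable[str],
                       row: Dict[str, float]) -> float:
    """f*_S(x): sum of every basis whose variables are all in S.

    For S = {x_i, x_j} this equals f_i + f_j + f_ij.
    """
    s = set(variables)
    return sum(beta * basis_value(factors, row)
               for beta, factors in terms
               if {v for v, _, _ in factors} <= s)
```

Because bases that involve other variables are filtered out before evaluation, `row` needs values only for the variables in $S$, so the joint contribution can be evaluated directly over a grid of $(\mb{x}_ i, \mb{x}_ j)$ values, for example to plot it as a surface.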