The IRT Procedure (Experimental)

Notation for the Item Response Theory Model

This section introduces the mathematical notation that is used throughout the chapter to describe the item response theory (IRT) model. For a description of the fitting algorithms and the mathematical-statistical details, see the section Details: IRT Procedure.

A d-dimensional IRT model that has J ordinal responses can be expressed by the equations

\[  y_{ij} = \blambda _ j\bm {\eta }_ i + \epsilon _{ij}  \]
\[  p_{ijk} = \Pr (u_{ij}=k) = \Pr (\alpha _{(j,k-1)}<y_{ij}<\alpha _{(j,k)}), \hspace{3mm} k=1,\ldots ,K  \]

where $u_{ij}$ is the observed ordinal response from subject i for item j, $K$ is the number of ordered response categories, $y_{ij}$ is a continuous latent response that underlies $u_{ij}$, $\balpha _ j=(\alpha _{(j,0)}=-\infty ,\alpha _{(j,1)},\ldots ,\alpha _{(j,K-1)},\alpha _{(j,K)}=\infty )$ is a vector of threshold parameters for item j, $\blambda _ j$ is a vector of slope (or discrimination) parameters for item j, $\bm {\eta }_ i=(\eta _{i1}, \ldots , \eta _{id})$ is a vector of latent factors for subject i, $\bm {\eta }_ i \sim N_ d(\bmu ,\bSigma )$, and $\bepsilon _ i=(\epsilon _{i1},\ldots ,\epsilon _{iJ})$ is a vector of unique factors for subject i. All the unique factors in $\bepsilon _ i$ are independent of one another, which implies that $y_{ij}, j=1,\ldots , J$, are independent conditional on the latent factor $\bm {\eta }_ i$. This is the so-called local independence assumption. Finally, $\bm {\eta }_ i$ and $\bepsilon _ i$ are also independent of each other.
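The data-generating process described above can be sketched in a few lines of Python. This is only an illustration, not the procedure's implementation: the function name `simulate_subject` is hypothetical, and the sketch assumes a probit-type setup in which the unique factors are standard normal and $\bm {\eta }_ i \sim N_ d(\mathbf{0}, \mathbf{I})$.

```python
import random

def simulate_subject(slopes, thresholds, rng):
    """Draw the J ordinal responses of one subject.

    slopes     : list of J slope vectors lambda_j (each of length d)
    thresholds : list of J lists of interior thresholds
                 alpha_(j,1) < ... < alpha_(j,K-1)
    rng        : a random.Random instance
    """
    d = len(slopes[0])
    # Assumption: eta_i ~ N_d(0, I), i.e. independent standard normal factors.
    eta = [rng.gauss(0.0, 1.0) for _ in range(d)]
    responses = []
    for lam_j, alpha_j in zip(slopes, thresholds):
        # y_ij = lambda_j . eta_i + eps_ij; a fresh eps_ij is drawn per item,
        # so the y_ij are independent given eta_i (local independence).
        y = sum(l * e for l, e in zip(lam_j, eta)) + rng.gauss(0.0, 1.0)
        # u_ij = k when alpha_(j,k-1) < y_ij < alpha_(j,k).
        u = 1 + sum(1 for a in alpha_j if y > a)
        responses.append(u)
    return responses

rng = random.Random(2024)
print(simulate_subject([[1.0], [0.5], [2.0]],
                       [[-1.0, 0.0, 1.0], [0.0], [-0.5, 0.5]], rng))
```

All items share the same draw of the latent factors, while each item receives its own unique factor, which is what makes the responses dependent marginally but independent conditionally.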

Based on the preceding model specification,

\[  p_{ijk} = \int _{\alpha _{(j,k-1)}}^{\alpha _{(j,k)}} p(y; \blambda _ j\bm {\eta }_ i,1)dy = \int _{\alpha _{(j,k-1)}-\blambda _ j\bm {\eta }_ i}^{\alpha _{(j,k)}-\blambda _ j\bm {\eta }_ i} p(y; 0,1)dy  \]

where the density $p$ is determined by the link function: it is the density function of the standard normal distribution if the probit link is used, or the density function of the standard logistic distribution if the logistic link is used.
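Because the integral is a difference of cumulative distribution function values at the shifted thresholds, the category probabilities are straightforward to compute. The following Python sketch illustrates this; the function names are hypothetical and chosen only for this example.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution (probit link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x):
    """CDF of the standard logistic distribution (logistic link)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)

def category_probs(slopes, eta, thresholds, cdf=logistic_cdf):
    """Return [p_1, ..., p_K] for one item and one subject.

    slopes     : lambda_j, one slope per latent factor
    eta        : eta_i, one value per latent factor
    thresholds : interior thresholds alpha_(j,1) < ... < alpha_(j,K-1)
    """
    linear = sum(l * e for l, e in zip(slopes, eta))
    # Add the fixed end points alpha_(j,0) = -inf and alpha_(j,K) = +inf.
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cum = [cdf(c - linear) for c in cuts]
    # p_k = F(alpha_k - lambda.eta) - F(alpha_(k-1) - lambda.eta)
    return [cum[k] - cum[k - 1] for k in range(1, len(cuts))]

probs = category_probs([1.2, 0.4], [0.5, -1.0], [-1.0, 0.0, 1.5])
print(probs, sum(probs))
```

Passing `cdf=normal_cdf` switches the sketch from the logistic link to the probit link; in both cases the K probabilities sum to 1 by construction.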

Let $\bLambda = (\blambda _1^ T, \ldots , \blambda _ J^ T)$ denote the slope matrix. To identify the model in exploratory analysis, the upper triangular elements of $\bLambda $ are fixed at zero, the factor mean vector $\bmu $ is fixed at zero, and the factor variance-covariance matrix $\bSigma $ is fixed as an identity matrix. For confirmatory analysis, it is assumed that the identification problem is solved by user-specified constraints.
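To make the exploratory constraint on the slope matrix concrete, the following Python sketch builds the pattern of free and fixed slopes. It rests on two assumptions that the text does not spell out: the slopes are laid out as a J x d array whose row j holds the slopes of item j, and "upper triangular elements" means the entries strictly above the main diagonal. The function names are hypothetical.

```python
def exploratory_slope_pattern(n_items, n_factors):
    """Return a J x d pattern: True = free slope, False = fixed at zero."""
    # Assumption: entries with column index > row index (strictly above
    # the main diagonal) are the ones fixed at zero.
    return [[col <= row for col in range(n_factors)]
            for row in range(n_items)]

def count_free_slopes(pattern):
    """Number of slopes that remain free to be estimated."""
    return sum(cell for row in pattern for cell in row)

pattern = exploratory_slope_pattern(n_items=5, n_factors=3)
for row in pattern:
    print(" ".join("free" if cell else "0" for cell in row))
print("free slopes:", count_free_slopes(pattern))
```

Under these assumptions, d(d-1)/2 slopes are fixed, which removes the rotational indeterminacy of the d factors, while fixing $\bmu $ and $\bSigma $ sets the location and scale of the latent factors.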

The model that is specified by the preceding equations is called the multidimensional graded response model. When the responses are binary and there is only one latent factor, this model reduces to the two-parameter model, which can be expressed as follows:

\[  y_{ij} = \lambda _ j\eta _ i + \epsilon _{ij}  \]
\[  p_{ij} = \Pr (u_{ij}=1) = \Pr (y_{ij}>\alpha _ j)  \]

A different parameterization for the two-parameter model is

\[  y_{ij} = a_ j(\eta _ i - b_ j) + \epsilon _{ij}  \]
\[  p_{ij} = \Pr (u_{ij}=1) = \Pr (y_{ij}>0)  \]

where $b_ j$ is interpreted as item difficulty and $a_ j$ is called the discrimination parameter. The preceding two parameterizations are mathematically equivalent, with $a_ j = \lambda _ j$. For binary response items, you can convert the threshold parameter into the difficulty parameter by $b_ j = \frac{\alpha _ j}{\lambda _ j}$. The IRT procedure uses the first parameterization.
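The equivalence of the two parameterizations can be checked numerically. The following Python sketch assumes the logistic link and uses hypothetical function names; it relies on the symmetry of the link distribution, so that $\Pr (\epsilon > t) = F(-t)$.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def p_slope_threshold(lam, alpha, eta):
    """Pr(y > alpha) with y = lam * eta + eps (first parameterization)."""
    return logistic_cdf(lam * eta - alpha)

def p_discrimination_difficulty(a, b, eta):
    """Pr(y > 0) with y = a * (eta - b) + eps (second parameterization)."""
    return logistic_cdf(a * (eta - b))

def to_difficulty(lam, alpha):
    """Convert a threshold into a difficulty: b = alpha / lambda."""
    return alpha / lam

lam, alpha = 1.7, 0.85
b = to_difficulty(lam, alpha)
for eta in (-2.0, 0.0, 0.5, 3.0):
    print(eta,
          p_slope_threshold(lam, alpha, eta),
          p_discrimination_difficulty(lam, b, eta))
```

Both functions return the same probability for every value of the latent factor, and the probability equals 0.5 when the latent factor equals the difficulty $b_ j$.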

The two-parameter model reduces to a one-parameter model when the slope parameters for all the items are constrained to be equal. When the logistic link is used, the one- and two-parameter models are often abbreviated as 1PL and 2PL. When all the slope parameters are fixed at 1 and the factor variance is a free parameter, the Rasch model is obtained.
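One way to see how the Rasch model relates to the one-parameter model is the following short derivation. It is a sketch that assumes a zero factor mean; the symbols $\sigma ^2$ (the free factor variance) and $\eta _ i^*$ (a standardized latent factor) are introduced only for this illustration.

\[  y_{ij} = \eta _ i + \epsilon _{ij}, \hspace{3mm} \eta _ i \sim N(0,\sigma ^2)  \]
\[  \eta _ i = \sigma \eta _ i^*, \hspace{3mm} \eta _ i^* \sim N(0,1) \hspace{3mm} \Longrightarrow \hspace{3mm} y_{ij} = \sigma \eta _ i^* + \epsilon _{ij}  \]

Under these assumptions, a Rasch model with free factor variance $\sigma ^2$ implies the same response probabilities as a one-parameter model with common slope $\sigma $ and unit factor variance.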

You can obtain three- and four-parameter models by introducing the guessing and ceiling parameters. Let $g_ j$ and $c_ j$ denote the item-specific guessing and ceiling parameters. Then the four-parameter model can be expressed as

\[  p_{ij} = \Pr (u_{ij}=1) = g_ j + (c_ j - g_ j)\Pr (y_{ij}>0)  \]

This model reduces to the three-parameter model when $c_ j=1$.
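The effect of the guessing and ceiling parameters can be illustrated with a short Python sketch. It assumes the logistic link and the discrimination and difficulty parameterization, so that $\Pr (y_{ij}>0) = F(a_ j(\eta _ i - b_ j))$; the function names are hypothetical.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def four_parameter_prob(a, b, eta, guessing=0.0, ceiling=1.0):
    """Pr(u = 1) = g + (c - g) * Pr(y > 0), assuming the logistic link."""
    core = logistic_cdf(a * (eta - b))  # two-parameter probability
    return guessing + (ceiling - guessing) * core

# ceiling = 1 gives the three-parameter model;
# guessing = 0 and ceiling = 1 give the two-parameter model.
for eta in (-3.0, 0.2, 3.0):
    print(eta,
          four_parameter_prob(1.5, 0.2, eta, guessing=0.2, ceiling=0.9),
          four_parameter_prob(1.5, 0.2, eta, guessing=0.2),
          four_parameter_prob(1.5, 0.2, eta))
```

The guessing parameter acts as a lower asymptote and the ceiling parameter as an upper asymptote of the response probability as the latent factor decreases or increases.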