Expectations of Random Variables and Vectors

If Y is a discrete random variable with mass function $p(y)$ and support (possible values) $y_1,y_2,\cdots $, then the expectation (expected value) of Y is defined as

\[  \mr {E}[Y] = \sum _{j=1}^\infty y_ j \,  p(y_ j)  \]

provided that $\sum _ j |y_ j|\, p(y_ j) < \infty $; otherwise the sum in the definition is not well defined. The expected value of a function $h(y)$ is defined similarly: provided that $\sum _ j |h(y_ j)|\, p(y_ j) <\infty $,

\[  \mr {E}[h(Y)] = \sum _{j=1}^\infty h(y_ j)\, p(y_ j)  \]
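As a numerical sketch (an illustrative example that is not from the text; the fair die and its mass function are assumptions), these sums can be computed directly from the definition, here with $h(y) = y^2$:

```python
import numpy as np

# Fair six-sided die: support 1..6, mass p(y_j) = 1/6 (an assumed toy example)
y = np.arange(1, 7)
p = np.full(6, 1 / 6)

ey = np.sum(y * p)       # E[Y]    = sum_j y_j p(y_j)        = 3.5
ehy = np.sum(y**2 * p)   # E[h(Y)] = sum_j y_j^2 p(y_j)      = 91/6

print(ey, ehy)
```

The absolute-summability condition is automatic here because the support is finite.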

For continuous random variables, similar definitions apply, but summation is replaced by integration over the support of the random variable. If X is a continuous random variable with density function $f(x)$, and $\int |x|f(x)dx < \infty $, then the expectation of X is defined as

\[  \mr {E}[X] = \int _{-\infty }^{\infty } x f(x) \,  dx  \]
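For a continuous sketch (again an assumed example, not from the text), the integral can be approximated numerically for an exponential density $f(x) = \lambda e^{-\lambda x}$, whose mean is known to be $1/\lambda$:

```python
import numpy as np

# Illustrative sketch: exponential density f(x) = lam * exp(-lam * x), mean 1/lam
lam = 2.0
x = np.linspace(0.0, 50.0, 200_001)   # truncate the infinite support; tail mass ~ e^{-100}
f = lam * np.exp(-lam * x)

trap = getattr(np, "trapezoid", np.trapz)  # function name differs across numpy versions
ex = trap(x * f, x)                        # E[X] = integral of x f(x) dx

print(ex)  # close to 0.5 = 1/lam
```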

The expected value of a random variable is also called its mean or its first moment. A particularly important function of a random variable is $h(Y) = (Y - \mr {E}[Y])^2$. The expectation of $h(Y)$ is called the variance of Y or the second central moment of Y. When you study multiple random variables, you might be interested in aspects of their joint distribution. The covariance between random variables Y and X is defined as the expected value of the function $(Y-\mr {E}[Y])(X-\mr {E}[X])$, where the expectation is taken under the bivariate joint distribution of Y and X:

\[  \mr {Cov}[Y,X] = \mr {E}[(Y-\mr {E}[Y])(X-\mr {E}[X])] = \mr {E}[YX] - \mr {E}[Y]\mr {E}[X] = \int \int x\, y f(x,y) \, \, dxdy - \mr {E}[Y]\mr {E}[X]  \]

The covariance between a random variable and itself is the variance, $\mr {Cov}[Y,Y] = \mr {Var}[Y]$.
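The two forms of the covariance in the preceding display are algebraically equivalent, which can be checked on simulated data (a sketch with an assumed model, not from the text):

```python
import numpy as np

# Assumed toy model: Y = 2X + noise, so Cov[Y, X] = 2 Var[X] = 2
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Two algebraically equivalent forms of the (sample) covariance
cov1 = np.mean((y - y.mean()) * (x - x.mean()))   # E[(Y - E[Y])(X - E[X])]
cov2 = np.mean(y * x) - y.mean() * x.mean()       # E[YX] - E[Y]E[X]

print(cov1, cov2)  # equal up to rounding; both close to 2
```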

In statistical applications and formulas, random variables are often collected into vectors. For example, a random sample of size n from the distribution of Y generates a random vector of order $(n \times 1)$,

\[  \bY = \left[ \begin{array}{c} Y_1 \cr Y_2 \cr \vdots \cr Y_ n \end{array}\right]  \]

The expected value of the $(n \times 1)$ random vector $\bY $ is the vector of the means of the elements of $\bY $:

\[  \mb {E}[\bY ] = \left[\mr {E}\left[Y_ i\right]\right] = \left[ \begin{array}{c} \mr {E}[Y_1] \cr \mr {E}[Y_2] \cr \vdots \cr \mr {E}[Y_ n] \end{array}\right]  \]

It is often useful to directly apply rules about working with means, variances, and covariances of random vectors. To develop these rules, suppose that $\bY $ and $\mb {U}$ denote two random vectors with typical elements $Y_1,\cdots ,Y_ n$ and $U_1,\cdots ,U_ k$. Further suppose that $\bA $ and $\mb {B}$ are constant (nonstochastic) matrices, that $\mb {a}$ is a constant vector, and that the $c_ i$ are scalar constants.

The following rules enable you to derive the mean of a linear function of a random vector:

\[  \begin{array}{rcl} \mr {E}[\bA ] & = & \bA \cr \mr {E}[\bY + \mb {a}] & = & \mr {E}[\bY ] + \mb {a} \cr \mr {E}[\mb {AY}+\mb {a}] & = & \bA \mr {E}[\bY ]+\mb {a} \cr \mr {E}[\bY + \bU ] & = & \mr {E}[\bY ] + \mr {E}[\bU ] \end{array}  \]
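These linearity rules can be verified by simulation; the following sketch (with made-up values for $\bA$, $\mb{a}$, and $\mr{E}[\bY]$, not from the text) checks $\mr{E}[\mb{AY}+\mb{a}] = \bA\mr{E}[\bY]+\mb{a}$:

```python
import numpy as np

# Sketch: verify E[AY + a] = A E[Y] + a by simulation (A, a, mu are made-up values)
rng = np.random.default_rng(1)
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
a = np.array([5.0, -1.0])
mu = np.array([1.0, 2.0])     # E[Y]

Y = rng.normal(loc=mu, size=(200_000, 2))   # rows are independent draws of Y
lhs = (Y @ A.T + a).mean(axis=0)            # Monte Carlo estimate of E[AY + a]
rhs = A @ mu + a                            # the rule: A E[Y] + a

print(lhs, rhs)  # both close to [10, 5]
```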

The covariance matrix of $\bY $ and $\bU $ is the $(n \times k)$ matrix whose typical element in row i, column j is the covariance between $Y_ i$ and $U_ j$. The covariance matrix between two random vectors is frequently denoted with the $\mr {Cov}$ operator.

\[  \begin{array}{rcl} \mr {Cov}[\bY ,\bU ] & = & \left[\mr {Cov}[Y_ i,U_ j]\right] \cr & = & \mr {E}\left[ \left(\bY -\mr {E}[\bY ]\right) \left(\bU -\mr {E}[\bU ]\right)^\prime \right] = \mr {E}\left[\mb {YU}^\prime \right] - \mr {E}[\bY ]\mr {E}[\bU ]^\prime \cr & = & \left[\begin{array}{lllll} \mr {Cov}[Y_1,U_1] &  \mr {Cov}[Y_1,U_2] &  \mr {Cov}[Y_1,U_3] &  \cdots &  \mr {Cov}[Y_1,U_ k] \cr \mr {Cov}[Y_2,U_1] &  \mr {Cov}[Y_2,U_2] &  \mr {Cov}[Y_2,U_3] &  \cdots &  \mr {Cov}[Y_2,U_ k] \cr \mr {Cov}[Y_3,U_1] &  \mr {Cov}[Y_3,U_2] &  \mr {Cov}[Y_3,U_3] &  \cdots &  \mr {Cov}[Y_3,U_ k] \cr \vdots &  \vdots &  \vdots &  \ddots &  \vdots \cr \mr {Cov}[Y_ n,U_1] &  \mr {Cov}[Y_ n,U_2] &  \mr {Cov}[Y_ n,U_3] &  \cdots &  \mr {Cov}[Y_ n,U_ k] \end{array}\right] \end{array}  \]
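The elementwise definition and the matrix identity $\mr{E}[\mb{YU}'] - \mr{E}[\bY]\mr{E}[\bU]'$ produce the same matrix, as the following sketch illustrates (the model relating $\bY$ and $\bU$ is a made-up assumption):

```python
import numpy as np

# Sketch: build the (2 x 3) covariance matrix Cov[Y, U] two ways (assumed toy model)
rng = np.random.default_rng(2)
n = 500_000
U = rng.normal(size=(n, 3))                     # draws of a (3 x 1) random vector U
Y = U[:, :2] + 0.5 * rng.normal(size=(n, 2))    # Y_i = U_i + noise, i = 1, 2

# Elementwise definition: typical element Cov[Y_i, U_j]
C = np.empty((2, 3))
for i in range(2):
    for j in range(3):
        C[i, j] = np.mean((Y[:, i] - Y[:, i].mean()) * (U[:, j] - U[:, j].mean()))

# Matrix identity: E[YU'] - E[Y]E[U]'
C2 = (Y.T @ U) / n - np.outer(Y.mean(axis=0), U.mean(axis=0))

print(np.round(C, 2))  # close to [[1, 0, 0], [0, 1, 0]]
```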

The variance matrix of a random vector $\bY $ is the covariance matrix between $\bY $ and itself. The variance matrix is frequently denoted with the $\mr {Var}$ operator.

\[  \begin{array}{rcl} \mr {Var}[\bY ] & = & \mr {Cov}[\bY ,\bY ] = \left[\mr {Cov}[Y_ i,Y_ j]\right] \cr & = & \mr {E}\left[ \left(\bY -\mr {E}[\bY ]\right) \left(\bY -\mr {E}[\bY ]\right)^\prime \right] = \mr {E}\left[\mb {YY}^\prime \right] - \mr {E}[\bY ]\mr {E}[\bY ]^\prime \cr & = & \left[\begin{array}{lllll} \mr {Cov}[Y_1,Y_1] &  \mr {Cov}[Y_1,Y_2] &  \mr {Cov}[Y_1,Y_3] &  \cdots &  \mr {Cov}[Y_1,Y_ n] \cr \mr {Cov}[Y_2,Y_1] &  \mr {Cov}[Y_2,Y_2] &  \mr {Cov}[Y_2,Y_3] &  \cdots &  \mr {Cov}[Y_2,Y_ n] \cr \mr {Cov}[Y_3,Y_1] &  \mr {Cov}[Y_3,Y_2] &  \mr {Cov}[Y_3,Y_3] &  \cdots &  \mr {Cov}[Y_3,Y_ n] \cr \vdots &  \vdots &  \vdots &  \ddots &  \vdots \cr \mr {Cov}[Y_ n,Y_1] &  \mr {Cov}[Y_ n,Y_2] &  \mr {Cov}[Y_ n,Y_3] &  \cdots &  \mr {Cov}[Y_ n,Y_ n] \end{array}\right] \cr & = & \left[\begin{array}{lllll} \mr {Var}[Y_1] &  \mr {Cov}[Y_1,Y_2] &  \mr {Cov}[Y_1,Y_3] &  \cdots &  \mr {Cov}[Y_1,Y_ n] \cr \mr {Cov}[Y_2,Y_1] &  \mr {Var}[Y_2] &  \mr {Cov}[Y_2,Y_3] &  \cdots &  \mr {Cov}[Y_2,Y_ n] \cr \mr {Cov}[Y_3,Y_1] &  \mr {Cov}[Y_3,Y_2] &  \mr {Var}[Y_3] &  \cdots &  \mr {Cov}[Y_3,Y_ n] \cr \vdots &  \vdots &  \vdots &  \ddots &  \vdots \cr \mr {Cov}[Y_ n,Y_1] &  \mr {Cov}[Y_ n,Y_2] &  \mr {Cov}[Y_ n,Y_3] &  \cdots &  \mr {Var}[Y_ n] \end{array}\right] \end{array}  \]

Because the variance matrix contains variances on the diagonal and covariances in the off-diagonal positions, it is also referred to as the variance-covariance matrix of the random vector $\bY $.
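In numpy, the sample version of the variance-covariance matrix is returned by `np.cov`; a short sketch with an assumed (made-up) linear transformation whose variance matrix is known:

```python
import numpy as np

# Sketch: the variance-covariance matrix of a random vector (toy transformation)
rng = np.random.default_rng(3)
n = 400_000
M = np.array([[1.0, 0.5],
              [0.0, 1.0]])
Y = rng.normal(size=(n, 2)) @ M   # induces Var[Y] = M'M = [[1, 0.5], [0.5, 1.25]]

V = np.cov(Y, rowvar=False)       # variances on the diagonal, covariances off it

print(np.round(V, 2))             # close to [[1, 0.5], [0.5, 1.25]]
```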

If the elements of the covariance matrix $\mr {Cov}[\bY ,\bU ]$ are zero, the random vectors are uncorrelated. If $\bY $ and $\bU $ are normally distributed, then a zero covariance matrix implies that the vectors are stochastically independent. If the off-diagonal elements of the variance matrix $\mr {Var}[\bY ]$ are zero, the elements of the random vector $\bY $ are uncorrelated. If $\bY $ is normally distributed, then a diagonal variance matrix implies that its elements are stochastically independent.

Suppose that $\bA $ and $\mb {B}$ are constant (nonstochastic) matrices, that $\mb {x}$ is a constant vector, and that the $c_ i$ are scalar constants. The following results are useful in manipulating covariance matrices:

\[  \begin{array}{rcl} \mr {Cov}[\mb {AY},\bU ] & = & \bA \mr {Cov}[\bY ,\bU ] \cr \mr {Cov}[\bY ,\mb {BU}] & = & \mr {Cov}[\bY ,\bU ]\bB ^\prime \cr \mr {Cov}[\mb {AY},\mb {BU}] & = & \bA \mr {Cov}[\bY ,\bU ]\bB ^\prime \cr \mr {Cov}[c_1\bY _1+c_2\bU _1,\, c_3\bY _2+c_4\bU _2] & = & c_1c_3\, \mr {Cov}[\bY _1,\bY _2] + c_1c_4\, \mr {Cov}[\bY _1,\bU _2] \cr & & {} + c_2c_3\, \mr {Cov}[\bU _1,\bY _2] + c_2c_4\, \mr {Cov}[\bU _1,\bU _2] \end{array}  \]
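As a sketch of the rule $\mr{Cov}[\mb{AY},\mb{BU}] = \bA\,\mr{Cov}[\bY,\bU]\bB'$ (the matrices $\bA$, $\bB$ and the model below are made-up values, not from the text):

```python
import numpy as np

# Sketch: check Cov[AY, BU] = A Cov[Y, U] B' on simulated data (A, B are made up)
rng = np.random.default_rng(4)
n = 300_000
U = rng.normal(size=(n, 2))
Y = U + 0.3 * rng.normal(size=(n, 2))   # so Cov[Y, U] is close to the identity
A = np.array([[2.0, 1.0]])              # (1 x 2) constant matrix
B = np.array([[1.0, -1.0]])             # (1 x 2) constant matrix

def cross_cov(X, W):
    """Sample covariance matrix between the columns of X and of W."""
    Xc = X - X.mean(axis=0)
    Wc = W - W.mean(axis=0)
    return (Xc.T @ Wc) / len(X)

lhs = cross_cov(Y @ A.T, U @ B.T)   # Cov[AY, BU] estimated directly
rhs = A @ cross_cov(Y, U) @ B.T     # the rule: A Cov[Y, U] B'

print(lhs, rhs)  # equal up to rounding; close to [[1]]
```

The two sides agree exactly (not just approximately) because the sample cross-covariance obeys the same algebra as its population counterpart.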

Because $\mr {Cov}[\bY ,\bY ] = \mr {Var}[\bY ]$, these rules yield the following results, which are useful in manipulating variances of random vectors:

\[  \begin{array}{rcl} \mr {Var}[\bA ] & = & \mb {0} \cr \mr {Var}[\mb {AY}] & = & \bA \mr {Var}[\bY ]\bA ^\prime \cr \mr {Var}[\bY +\mb {x}] & = & \mr {Var}[\bY ] \cr \mr {Var}[\mb {x}^\prime \bY ] & = & \mb {x}^\prime \mr {Var}[\bY ]\mb {x} \cr \mr {Var}[c_1\bY ] & = & c_1^2\, \mr {Var}[\bY ] \cr \mr {Var}[c_1\bY +c_2\mb {U}] & = & c_1^2\mr {Var}[\bY ] + c_2^2\mr {Var}[\mb {U}] + c_1c_2\left(\mr {Cov}[\bY ,\mb {U}] + \mr {Cov}[\mb {U},\bY ]\right) \end{array}  \]

Note that for random vectors the cross terms $\mr {Cov}[\bY ,\mb {U}]$ and $\mr {Cov}[\mb {U},\bY ] = \mr {Cov}[\bY ,\mb {U}]^\prime $ are not equal in general, so they cannot be collapsed into a single term $2c_1c_2\mr {Cov}[\bY ,\mb {U}]$ unless the covariance matrix is symmetric (for example, in the scalar case).
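The sandwich rule $\mr{Var}[\mb{AY}] = \bA\,\mr{Var}[\bY]\bA'$ can be checked on simulated data (the variance matrix $\mb{\Sigma}$ and the matrix $\bA$ below are made-up values):

```python
import numpy as np

# Sketch: check Var[AY] = A Var[Y] A' by simulation (Sigma and A are made up)
rng = np.random.default_rng(5)
n = 300_000
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])

Y = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)

lhs = np.cov(Y @ A.T, rowvar=False)       # Var[AY] estimated directly
rhs = A @ np.cov(Y, rowvar=False) @ A.T   # the rule: A Var[Y] A'

print(np.round(lhs, 2))  # close to A Sigma A' = [[4.2, 1.0], [1.0, 1.8]]
```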

Another area where expectation rules are helpful is quadratic forms in random variables. These forms arise particularly in the study of linear statistical models and in linear statistical inference. Linear inference is statistical inference about linear functions of random variables, even if those random variables are defined through nonlinear models. For example, the parameter estimator $\widehat{\btheta }$ might be derived in a nonlinear model, but that does not prevent statistical questions from being raised that can be expressed through linear functions of $\btheta $; for example,

\[  H_0\colon \left\{ \begin{array}{cc} \theta _1 - 2\theta _2 = 0 \cr \theta _2 - \theta _3 = 0 \end{array}\right.  \]

If $\bA $ is a matrix of constants and $\bY $ is a random vector, then the expected value of the quadratic form $\bY ^\prime \mb {AY}$ is

\[  \mr {E}[\bY ^\prime \mb {AY}] = \mr {trace}(\bA \mr {Var}[\bY ]) + \mr {E}[\bY ]^\prime \bA \mr {E}[\bY ]  \]
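This identity can be verified by Monte Carlo simulation; in the sketch below, $\bmu$, $\mb{\Sigma}$, and $\bA$ are made-up values, and the closed form evaluates to $\mr{trace}(\bA\mb{\Sigma}) + \bmu'\bA\bmu = 10.3 + 5.75 = 16.05$:

```python
import numpy as np

# Sketch: check E[Y'AY] = trace(A Var[Y]) + E[Y]'A E[Y] (mu, Sigma, A are made up)
rng = np.random.default_rng(6)
n = 500_000
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 1.0],
              [0.0, 1.0, 3.0]])

Y = rng.multivariate_normal(mu, Sigma, size=n)

mc = np.einsum('ni,ij,nj->n', Y, A, Y).mean()   # Monte Carlo estimate of E[Y'AY]
exact = np.trace(A @ Sigma) + mu @ A @ mu       # closed form: 10.3 + 5.75 = 16.05

print(mc, exact)
```

Note that the identity requires only the mean and variance of $\bY$, not its full distribution; normality is used here merely to generate the draws.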