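If $Y$ is a discrete random variable with mass function $p(y)$ and support (possible values) $y_1, y_2, \ldots$, then the expectation (expected value) of $Y$ is defined as

$$E[Y] = \sum_{j=1}^{\infty} y_j\, p(y_j)$$

provided that $\sum_{j=1}^{\infty} |y_j|\, p(y_j) < \infty$; otherwise the sum in the definition is not well-defined. The expected value of a function $h(Y)$ is similarly defined: provided that $\sum_{j=1}^{\infty} |h(y_j)|\, p(y_j) < \infty$,

$$E[h(Y)] = \sum_{j=1}^{\infty} h(y_j)\, p(y_j)$$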

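As a quick check of the discrete definition, the following Python sketch evaluates $E[Y]$ and $E[h(Y)]$ with $h(y) = y^2$ for a fair six-sided die. It uses exact rational arithmetic from the standard `fractions` module. The die and the helper name `expectation` are hypothetical choices for illustration. Because the support is finite, the absolute-convergence condition holds automatically.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, support y_j = 1..6, p(y_j) = 1/6.
p = {y: Fraction(1, 6) for y in range(1, 7)}

def expectation(h, pmf):
    """E[h(Y)] = sum_j h(y_j) p(y_j) for a mass function with finite support."""
    return sum(h(y) * prob for y, prob in pmf.items())

mean = expectation(lambda y: y, p)        # E[Y]
second = expectation(lambda y: y * y, p)  # E[h(Y)] with h(y) = y^2
print(mean, second)  # 7/2 91/6
```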
For continuous random variables, similar definitions apply, but summation is replaced by integration over the support of the random variable. If $X$ is a continuous random variable with density function $f(x)$, and $\int_{-\infty}^{\infty} |x|\, f(x)\, dx < \infty$, then the expectation of $X$ is defined as

$$E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx$$

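The continuous definition can be approximated numerically. This sketch makes three assumptions:

- an exponential density $f(x) = \lambda e^{-\lambda x}$ on $[0, \infty)$ with $\lambda = 2$, whose exact mean is $1/\lambda = 0.5$;
- a truncated support, cut off at a hypothetical upper limit of 40;
- the midpoint rule for the integral.

`expectation_continuous` is an illustrative helper, not a library function.

```python
import math

def expectation_continuous(h, density, lower, upper, steps=100_000):
    """Approximate E[h(X)] = integral of h(x) f(x) dx by the midpoint rule on [lower, upper].

    Truncating the support is itself an approximation; it is accurate only when
    the density puts negligible mass outside [lower, upper].
    """
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += h(x) * density(x)
    return total * width

RATE = 2.0  # hypothetical rate parameter lambda

def exponential_density(x):
    """f(x) = lambda * exp(-lambda * x) for x >= 0."""
    return RATE * math.exp(-RATE * x)

approx_mean = expectation_continuous(lambda x: x, exponential_density, 0.0, 40.0)
print(approx_mean)  # close to 1 / RATE = 0.5
```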
The expected value of a random variable is also called its *mean* or its first moment. A particularly important function of a random variable is $h(Y) = (Y - E[Y])^2$. The expectation of $h(Y)$, $\text{Var}[Y] = E\left[(Y - E[Y])^2\right]$, is called the *variance* of $Y$ or the second central moment of $Y$. When you study the properties of multiple random variables, you might be interested in aspects of their joint distribution. The *covariance* between random variables $Y$ and $X$ is defined as the expected value of the function $(Y - E[Y])(X - E[X])$, where the expectation is taken under the bivariate joint distribution of $Y$ and $X$:

$$\text{Cov}[Y, X] = E\left[(Y - E[Y])(X - E[X])\right] = E[YX] - E[Y]\,E[X]$$

The covariance between a random variable and itself is the variance, $\text{Cov}[Y, Y] = \text{Var}[Y]$.

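The following sketch checks that the defining form and the shortcut $E[YX] - E[Y]E[X]$ agree. It uses a small, made-up joint mass function for two binary variables $Y$ and $X$ and computes both forms exactly, together with $\text{Var}[Y] = \text{Cov}[Y, Y]$. The joint probabilities are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint mass function of two binary variables: {(y, x): p(y, x)}.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(1, 2),
}

def E(h):
    """E[h(Y, X)] under the joint distribution."""
    return sum(h(y, x) * p for (y, x), p in joint.items())

mean_y = E(lambda y, x: y)
mean_x = E(lambda y, x: x)

cov_def = E(lambda y, x: (y - mean_y) * (x - mean_x))  # E[(Y - E[Y])(X - E[X])]
cov_short = E(lambda y, x: y * x) - mean_y * mean_x   # E[YX] - E[Y]E[X]
var_y = E(lambda y, x: (y - mean_y) * (y - mean_y))   # Cov[Y, Y] = Var[Y]

print(cov_def, cov_short, var_y)  # 7/64 7/64 15/64
```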
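In statistical applications and formulas, random variables are often collected into vectors. For example, a random sample of size $n$ from the distribution of $Y$ generates a random vector of order $(n \times 1)$,

$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$

The expected value of the $(n \times 1)$ random vector $\mathbf{Y}$ is the vector of the means of the elements of $\mathbf{Y}$:

$$E[\mathbf{Y}] = \left[E[Y_i]\right] = \begin{bmatrix} E[Y_1] \\ E[Y_2] \\ \vdots \\ E[Y_n] \end{bmatrix}$$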

It is often useful to directly apply rules about working with means, variances, and covariances of random vectors. To develop these rules, suppose that $\mathbf{Y}$ and $\mathbf{U}$ denote two random vectors of orders $(n \times 1)$ and $(k \times 1)$ with typical elements $Y_i$ and $U_j$. Further suppose that $\mathbf{A}$ and $\mathbf{B}$ are constant (nonstochastic) matrices, that $\mathbf{a}$ is a constant vector, and that the $c_i$ are scalar constants.

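The following rules enable you to derive the mean of a linear function of a random vector (the dimensions of $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{a}$ are assumed to be conformable in each expression):

$$
\begin{aligned}
E[\mathbf{A}] &= \mathbf{A} \\
E\left[\mathbf{Y} - E[\mathbf{Y}]\right] &= \mathbf{0} \\
E[\mathbf{A}\mathbf{Y}] &= \mathbf{A}\,E[\mathbf{Y}] \\
E[\mathbf{a}'\mathbf{Y}] &= \mathbf{a}'\,E[\mathbf{Y}] \\
E[\mathbf{A}\mathbf{Y} + \mathbf{a}] &= \mathbf{A}\,E[\mathbf{Y}] + \mathbf{a} \\
E[\mathbf{A}\mathbf{Y} + \mathbf{B}\mathbf{U}] &= \mathbf{A}\,E[\mathbf{Y}] + \mathbf{B}\,E[\mathbf{U}] \\
E\left[\sum_{i=1}^{n} c_i Y_i\right] &= \sum_{i=1}^{n} c_i\, E[Y_i]
\end{aligned}
$$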

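The following sketch checks the rule $E[\mathbf{A}\mathbf{Y} + \mathbf{a}] = \mathbf{A}E[\mathbf{Y}] + \mathbf{a}$ by brute force. The three-point distribution of $\mathbf{Y}$, the matrix `A`, and the vector `a` are hypothetical. The left side transforms every outcome first and averages afterward; the right side averages first and then transforms.

```python
from fractions import Fraction as F

# Hypothetical three-point distribution of a (2 x 1) random vector Y: (outcome, probability).
dist_Y = [((1, 0), F(1, 2)), ((0, 2), F(1, 3)), ((3, 1), F(1, 6))]
A = [[1, 2], [0, -1], [4, 1]]  # hypothetical constant (3 x 2) matrix
a = [5, 0, -2]                 # hypothetical constant (3 x 1) vector

def mat_vec(M, v):
    """Matrix-vector product M v, with M stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mean_vector(dist):
    """E[Y] computed elementwise: entry i is E[Y_i]."""
    n = len(dist[0][0])
    return [sum(p * y[i] for y, p in dist) for i in range(n)]

# Left side: transform every outcome to A y + a, then take the mean vector.
dist_Z = [([z + c for z, c in zip(mat_vec(A, y), a)], p) for y, p in dist_Y]
lhs = mean_vector(dist_Z)
# Right side: A E[Y] + a.
rhs = [z + c for z, c in zip(mat_vec(A, mean_vector(dist_Y)), a)]
print(lhs == rhs)  # True
```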
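The *covariance matrix* of $\mathbf{Y}$ and $\mathbf{U}$ is the $(n \times k)$ matrix whose typical element in row $i$, column $j$ is the covariance between $Y_i$ and $U_j$. The covariance matrix between two random vectors is frequently denoted with the $\text{Cov}$ "operator":

$$
\begin{aligned}
\text{Cov}[\mathbf{Y}, \mathbf{U}] &= \left[\text{Cov}[Y_i, U_j]\right] = E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{U} - E[\mathbf{U}])'\right] = E[\mathbf{Y}\mathbf{U}'] - E[\mathbf{Y}]\,E[\mathbf{U}]' \\
&= \begin{bmatrix} \text{Cov}[Y_1, U_1] & \cdots & \text{Cov}[Y_1, U_k] \\ \vdots & \ddots & \vdots \\ \text{Cov}[Y_n, U_1] & \cdots & \text{Cov}[Y_n, U_k] \end{bmatrix}
\end{aligned}
$$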

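A covariance matrix can be built entry by entry from a joint distribution. In this sketch, the joint distribution of a $(2 \times 1)$ vector $\mathbf{Y}$ and a $(3 \times 1)$ vector $\mathbf{U}$ is invented for illustration. The code computes each $\text{Cov}[Y_i, U_j]$ from the definition and compares the result with $E[\mathbf{Y}\mathbf{U}'] - E[\mathbf{Y}]E[\mathbf{U}]'$.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (Y, U): Y has order (2 x 1), U has order (3 x 1).
# Each entry is ((y, u), probability).
joint = [
    (((1, 0), (2, 0, 1)), F(1, 4)),
    (((0, 1), (1, 1, 0)), F(1, 4)),
    (((2, 2), (0, 3, 1)), F(1, 2)),
]
n, k = 2, 3

def E(h):
    """Expectation of h(y, u) under the joint distribution."""
    return sum(p * h(y, u) for (y, u), p in joint)

mean_y = [E(lambda y, u, i=i: y[i]) for i in range(n)]
mean_u = [E(lambda y, u, j=j: u[j]) for j in range(k)]

# Elementwise definition: row i, column j holds Cov[Y_i, U_j].
cov_elem = [[E(lambda y, u, i=i, j=j: (y[i] - mean_y[i]) * (u[j] - mean_u[j]))
             for j in range(k)] for i in range(n)]

# Shortcut form E[Y U'] - E[Y] E[U]', also built entry by entry.
cov_short = [[E(lambda y, u, i=i, j=j: y[i] * u[j]) - mean_y[i] * mean_u[j]
              for j in range(k)] for i in range(n)]

print(cov_elem == cov_short)  # True
```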
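The *variance matrix* of a random vector $\mathbf{Y}$ is the covariance matrix between $\mathbf{Y}$ and itself. The variance matrix is frequently denoted with the $\text{Var}$ "operator":

$$
\begin{aligned}
\text{Var}[\mathbf{Y}] &= \text{Cov}[\mathbf{Y}, \mathbf{Y}] = E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])'\right] \\
&= \begin{bmatrix} \text{Var}[Y_1] & \text{Cov}[Y_1, Y_2] & \cdots & \text{Cov}[Y_1, Y_n] \\ \text{Cov}[Y_2, Y_1] & \text{Var}[Y_2] & \cdots & \text{Cov}[Y_2, Y_n] \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}[Y_n, Y_1] & \text{Cov}[Y_n, Y_2] & \cdots & \text{Var}[Y_n] \end{bmatrix}
\end{aligned}
$$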

Because the variance matrix contains variances on the diagonal and covariances in the off-diagonal positions, it is also referred to as the *variance-covariance matrix* of the random vector $\mathbf{Y}$.

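Computing the variance matrix is the same as computing a covariance matrix with $\mathbf{U}$ replaced by $\mathbf{Y}$. This sketch uses a hypothetical three-point distribution for a $(3 \times 1)$ vector. It confirms that the diagonal holds the scalar variances and that the matrix is symmetric.

```python
from fractions import Fraction as F

# Hypothetical distribution of a (3 x 1) random vector Y: (outcome, probability) pairs.
dist = [((0, 1, 2), F(1, 2)), ((2, 1, 0), F(1, 4)), ((1, 3, 1), F(1, 4))]
n = 3

mean = [sum(p * y[i] for y, p in dist) for i in range(n)]

# Var[Y] = E[(Y - E[Y])(Y - E[Y])'], computed entry by entry.
V = [[sum(p * (y[i] - mean[i]) * (y[j] - mean[j]) for y, p in dist)
      for j in range(n)] for i in range(n)]

# Diagonal entries are the scalar variances Var[Y_i] = E[Y_i^2] - E[Y_i]^2.
variances = [sum(p * y[i] ** 2 for y, p in dist) - mean[i] ** 2 for i in range(n)]
assert all(V[i][i] == variances[i] for i in range(n))
# Off-diagonal entries are covariances; Cov[Y_i, Y_j] = Cov[Y_j, Y_i] makes V symmetric.
assert all(V[i][j] == V[j][i] for i in range(n) for j in range(n))
print(V[0][0])  # 11/16
```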
If all elements of the covariance matrix $\text{Cov}[\mathbf{Y}, \mathbf{U}]$ are zero, the random vectors are uncorrelated. If $\mathbf{Y}$ and $\mathbf{U}$ are jointly normally distributed, then a zero covariance matrix implies that the vectors are stochastically independent. If the off-diagonal elements of the variance matrix $\text{Var}[\mathbf{Y}]$ are zero, the elements of the random vector $\mathbf{Y}$ are uncorrelated. If $\mathbf{Y}$ has a multivariate normal distribution, then a diagonal variance matrix implies that its elements are stochastically independent.

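The normality assumption matters here, because for non-normal variables zero covariance does not imply independence. A standard counterexample takes $Y$ uniform on $\{-1, 0, 1\}$ and $X = Y^2$. The covariance is exactly zero, yet $P(Y = 0, X = 0) = 1/3$ differs from $P(Y = 0)\,P(X = 0) = 1/9$. The sketch below verifies this with exact arithmetic.

```python
from fractions import Fraction as F

# Y is uniform on {-1, 0, 1} and X = Y**2, so the pair is not jointly normal.
joint = {(y, y * y): F(1, 3) for y in (-1, 0, 1)}

def E(h):
    """E[h(Y, X)] under the joint distribution."""
    return sum(p * h(y, x) for (y, x), p in joint.items())

def prob(event):
    """P(event) for a predicate on (y, x)."""
    return sum(p for (y, x), p in joint.items() if event(y, x))

cov = E(lambda y, x: y * x) - E(lambda y, x: y) * E(lambda y, x: x)
p_joint = prob(lambda y, x: y == 0 and x == 0)
p_product = prob(lambda y, x: y == 0) * prob(lambda y, x: x == 0)
print(cov, p_joint, p_product)  # 0 1/3 1/9
```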
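Suppose that $\mathbf{A}$ and $\mathbf{B}$ are constant (nonstochastic) matrices and that $c$ denotes a scalar constant. The following results are useful in manipulating covariance matrices:

$$
\begin{aligned}
\text{Cov}[\mathbf{A}\mathbf{Y}, \mathbf{U}] &= \mathbf{A}\,\text{Cov}[\mathbf{Y}, \mathbf{U}] \\
\text{Cov}[\mathbf{Y}, \mathbf{B}\mathbf{U}] &= \text{Cov}[\mathbf{Y}, \mathbf{U}]\,\mathbf{B}' \\
\text{Cov}[\mathbf{A}\mathbf{Y}, \mathbf{B}\mathbf{U}] &= \mathbf{A}\,\text{Cov}[\mathbf{Y}, \mathbf{U}]\,\mathbf{B}' \\
\text{Cov}[c\,\mathbf{Y}, \mathbf{U}] &= c\,\text{Cov}[\mathbf{Y}, \mathbf{U}]
\end{aligned}
$$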

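The rule $\text{Cov}[\mathbf{A}\mathbf{Y}, \mathbf{B}\mathbf{U}] = \mathbf{A}\,\text{Cov}[\mathbf{Y}, \mathbf{U}]\,\mathbf{B}'$ can be verified exactly for any discrete joint distribution. The distribution, `A`, and `B` below are hypothetical. The small list-based matrix helpers stand in for a linear algebra library.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (Y, U); both vectors have order (2 x 1).
joint = [
    (((1, 0), (0, 2)), F(1, 3)),
    (((2, 1), (1, 0)), F(1, 3)),
    (((0, 3), (2, 1)), F(1, 3)),
]
A = [[1, -1], [2, 0], [0, 3]]  # hypothetical constant (3 x 2) matrix applied to Y
B = [[1, 1]]                   # hypothetical constant (1 x 2) matrix applied to U

def matmul(M, N):
    """Product of two matrices stored as lists of rows."""
    return [[sum(M[i][t] * N[t][j] for t in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]

def transpose(M):
    return [list(column) for column in zip(*M)]

def mat_vec(M, v):
    """Return M v as a tuple, treating v as a column vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)

def cov_matrix(pairs):
    """Cov[V, W] = E[(V - E[V])(W - E[W])'] for a list of ((v, w), p) outcomes."""
    m, r = len(pairs[0][0][0]), len(pairs[0][0][1])
    mv = [sum(p * v[i] for (v, _), p in pairs) for i in range(m)]
    mw = [sum(p * w[j] for (_, w), p in pairs) for j in range(r)]
    return [[sum(p * (v[i] - mv[i]) * (w[j] - mw[j]) for (v, w), p in pairs)
             for j in range(r)] for i in range(m)]

# Left side: covariance matrix of the transformed vectors AY and BU.
lhs = cov_matrix([((mat_vec(A, y), mat_vec(B, u)), p) for (y, u), p in joint])
# Right side: A Cov[Y, U] B'.
rhs = matmul(matmul(A, cov_matrix(joint)), transpose(B))
print(lhs == rhs)  # True
```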
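Since $\text{Cov}[\mathbf{Y}, \mathbf{Y}] = \text{Var}[\mathbf{Y}]$, these results can be applied to produce the following results, useful in manipulating variances of random vectors:

$$
\begin{aligned}
\text{Var}[\mathbf{A}\mathbf{Y}] &= \mathbf{A}\,\text{Var}[\mathbf{Y}]\,\mathbf{A}' \\
\text{Var}[\mathbf{a}'\mathbf{Y}] &= \mathbf{a}'\,\text{Var}[\mathbf{Y}]\,\mathbf{a} \\
\text{Var}[c\,\mathbf{Y}] &= c^2\,\text{Var}[\mathbf{Y}]
\end{aligned}
$$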

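A common special case is the variance of a linear combination $\mathbf{a}'\mathbf{Y}$. The sketch below uses a made-up distribution and coefficient vector. It computes $\text{Var}[\mathbf{a}'\mathbf{Y}]$ directly from the distribution of the scalar $\mathbf{a}'\mathbf{Y}$ and compares the result with the quadratic form $\mathbf{a}'\,\text{Var}[\mathbf{Y}]\,\mathbf{a}$.

```python
from fractions import Fraction as F

# Hypothetical three-point distribution of a (3 x 1) random vector Y, and a constant vector a.
dist = [((1, 0, 2), F(1, 2)), ((0, 2, 1), F(1, 4)), ((2, 1, 0), F(1, 4))]
a = [2, -1, 3]
n = len(a)

mean = [sum(p * y[i] for y, p in dist) for i in range(n)]
# Var[Y] = E[(Y - E[Y])(Y - E[Y])'], entry by entry.
V = [[sum(p * (y[i] - mean[i]) * (y[j] - mean[j]) for y, p in dist)
      for j in range(n)] for i in range(n)]

# Left side: distribution of the scalar a'Y, then its variance.
scores = [(sum(ai * yi for ai, yi in zip(a, y)), p) for y, p in dist]
mean_score = sum(p * s for s, p in scores)
lhs = sum(p * (s - mean_score) ** 2 for s, p in scores)

# Right side: the quadratic form a' Var[Y] a.
rhs = sum(a[i] * V[i][j] * a[j] for i in range(n) for j in range(n))
print(lhs, rhs)  # 19/2 19/2
```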
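Another area where expectation rules are helpful is quadratic forms in random variables. These forms arise particularly in the study of linear statistical models and in linear statistical inference. Linear inference is statistical inference about linear functions of random variables, even if those random variables are defined through nonlinear models. For example, the parameter estimator $\hat{\boldsymbol{\theta}}$ might be derived in a nonlinear model, but this does not prevent statistical questions from being raised that can be expressed through linear functions of $\hat{\boldsymbol{\theta}}$, such as the hypotheses

$$H_0\colon \theta_1 - 2\theta_2 = 0 \qquad\qquad H_0\colon \theta_2 - \theta_3 = 0$$

The basic expectation rule for a quadratic form is the following: if $\mathbf{A}$ is an $(n \times n)$ matrix of constants and $\mathbf{Y}$ is an $(n \times 1)$ random vector, then

$$E[\mathbf{Y}'\mathbf{A}\mathbf{Y}] = \operatorname{tr}\left(\mathbf{A}\,\text{Var}[\mathbf{Y}]\right) + E[\mathbf{Y}]'\,\mathbf{A}\,E[\mathbf{Y}]$$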
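
The quadratic-form rule $E[\mathbf{Y}'\mathbf{A}\mathbf{Y}] = \operatorname{tr}(\mathbf{A}\,\text{Var}[\mathbf{Y}]) + E[\mathbf{Y}]'\mathbf{A}E[\mathbf{Y}]$ can be checked by enumeration. In this sketch, the distribution of $\mathbf{Y}$ and the matrix `A` are hypothetical; `A` is deliberately nonsymmetric. The code computes $E[\mathbf{Y}'\mathbf{A}\mathbf{Y}]$ outcome by outcome and compares it with the trace-plus-mean expression.

```python
from fractions import Fraction as F

# Hypothetical distribution of a (2 x 1) random vector Y and a constant (2 x 2) matrix A.
dist = [((1, 2), F(1, 2)), ((3, 0), F(1, 4)), ((0, -1), F(1, 4))]
A = [[2, 1], [0, 3]]  # need not be symmetric
n = len(A)

def bilinear(M, u, v):
    """The bilinear form u' M v."""
    return sum(u[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

mean = [sum(p * y[i] for y, p in dist) for i in range(n)]
V = [[sum(p * (y[i] - mean[i]) * (y[j] - mean[j]) for y, p in dist)
      for j in range(n)] for i in range(n)]

# Left side: E[Y'AY] by direct enumeration over the outcomes.
lhs = sum(p * bilinear(A, y, y) for y, p in dist)
# Right side: tr(A Var[Y]) + E[Y]' A E[Y]; the trace sums the diagonal of A V.
trace_AV = sum(A[i][t] * V[t][i] for i in range(n) for t in range(n))
rhs = trace_AV + bilinear(A, mean, mean)
print(lhs, rhs)  # 53/4 53/4
```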