Subsetting Data and Selecting Observations

An analysis often calls for only part of a data set. You might need to subset data to do the following:

  • restrict the time range. For example, you want to perform a time series analysis using only recent data and ignoring observations from the distant past.

  • select cross sections of the data. (See the section Cross-Sectional Dimensions and BY Groups.) For example, you have a data set with observations over time for each of several states, and you want to analyze the data for a single state.

  • select particular kinds of time series from an interleaved-form data set. (See the section Interleaved Time Series.) For example, you have an output data set produced by the FORECAST procedure that contains both forecast and confidence limits observations, and you want to extract only the forecast observations.

  • exclude particular observations. For example, you have an outlier in your time series, and you want to exclude this observation from the analysis.
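As a rough illustration, each of the four cases above can be written as a simple condition. The sketch below uses WHERE statements (introduced later in this section) with PROC PRINT. The data set names FULL and FOREOUT, the variables DATE and STATE, and the specific dates and state code are assumptions made for the example; _TYPE_ is the variable that PROC FORECAST writes to identify the kind of each observation in its output data set.

```sas
/* Restrict the time range: keep only observations from 1990 onward. */
/* FULL and DATE are hypothetical names.                             */
proc print data=full;
   where date >= '01jan1990'd;
run;

/* Select one cross section: a single state (STATE is hypothetical). */
proc print data=full;
   where state = 'NC';
run;

/* Select one kind of series from interleaved PROC FORECAST output.  */
/* FOREOUT is a hypothetical OUT= data set from PROC FORECAST.       */
proc print data=foreout;
   where _type_ = 'FORECAST';
run;

/* Exclude a single outlier, identified here by its date. */
proc print data=full;
   where date ne '19oct1987'd;
run;
```

The same conditions can be used with any procedure in place of PROC PRINT.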

You can subset data either by using the DATA step to create a subset data set or by using a WHERE statement with the SAS procedure that analyzes the data.

A typical WHERE statement used in a procedure has the following form:

proc arima data=full;
   /* keep only observations dated after 31DEC1993 and before 26MAR1994 */
   where '31dec1993'd < date < '26mar1994'd;
   identify var=close;
run;
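The other approach mentioned above, creating a subset data set with the DATA step, might look like the following for the same date range. This is a minimal sketch; the output data set name SUBSET is an assumption chosen for the example.

```sas
/* Copy the selected date range into its own data set. */
/* SUBSET is a hypothetical name.                      */
data subset;
   set full;
   where '31dec1993'd < date < '26mar1994'd;
run;

/* Analyze the subset; no WHERE statement is needed here. */
proc arima data=subset;
   identify var=close;
run;
```

A separate subset data set is convenient when several procedures will analyze the same observations. A WHERE statement in the procedure avoids creating an extra copy of the data.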

For complete reference documentation on the WHERE statement, see SAS Language Reference: Dictionary.