Overview of Data Analysis in SAS Visual Analytics Explorer

Types of Data Analysis

SAS Visual Analytics enables you to perform three basic types of data analysis:
Correlation
identifies the degree of statistical relationship between measures.
Fit Line
plots a model of the relationship between measures. There are many types of fit lines, including linear fit, quadratic fit, cubic fit, and penalized B-spline.
Forecasting
estimates future values for your data based on statistical trends.

Correlation

Correlation identifies the degree of statistical relationship between measures. The strength of a correlation is described as a number between -1 and 1. A value that is close to -1 implies a strong negative correlation, a value that is close to 0 implies little or no correlation, and a value that is close to 1 implies a strong positive correlation.
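As an illustration of the -1 to 1 scale, the following minimal sketch computes a Pearson correlation coefficient in plain Python. It assumes that the correlation reported is the standard Pearson coefficient, which this section does not state. The function name and the sample data are hypothetical and are not part of SAS Visual Analytics.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a measure is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample measures (illustration only).
price = [1.0, 2.0, 3.0, 4.0, 5.0]
units_up = [2.0, 4.0, 6.0, 8.0, 10.0]    # rises in step with price
units_down = [10.0, 8.0, 6.0, 4.0, 2.0]  # falls as price rises
units_flat = [4.0, 1.0, 0.0, 1.0, 4.0]   # no linear relationship with price

for label, units in [("up", units_up), ("down", units_down), ("flat", units_flat)]:
    print(label, round(pearson_correlation(price, units), 3))
```

The three sample measures are chosen to land at the two ends and the middle of the scale: a perfectly increasing relationship, a perfectly decreasing relationship, and a symmetric pattern with no linear relationship.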
To apply correlation to a visualization, add a linear fit line, or select the correlation matrix visualization type.
For a heat map or a simple scatter plot, the correlation is identified by a text label in the visualization legend. Select About these correlation results to view additional details about the correlation, including the exact correlation value.
For a scatter plot matrix, the correlation for each plot is identified by a colored border around the plot. The visualization legend displays a key for the color values. Select About these correlation results to view additional details about the correlation, including the exact correlation values for each plot.
Note: For nonlinear fit types, a scatter plot matrix displays additional plots to show each intersection of variables in two orientations. For example, if a scatter plot matrix plots the variables A, B, and C, then plots are created for both A * B and B * A when a nonlinear fit line is applied.
For a correlation matrix, the correlation for each cell is identified by the color of the cell background. The visualization legend displays a key for the color values. The data tip for each cell displays the correlation value.
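To make the idea of one correlation value per cell concrete, the sketch below builds a small correlation matrix in plain Python. It assumes Pearson correlation. The measure names A, B, and C and their values are hypothetical, and the code does not reproduce how the explorer computes or colors its cells.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(measures):
    """Map each (row, column) pair of measure names to its correlation value."""
    names = list(measures)
    return {(a, b): pearson(measures[a], measures[b]) for a in names for b in names}

# Hypothetical measures (illustration only).
measures = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.1, 5.9, 8.2, 9.8],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
matrix = correlation_matrix(measures)

# Print one row per measure, one cell per pair of measures.
for a in measures:
    print(a, " ".join(f"{matrix[(a, b)]:6.2f}" for b in measures))
```

Each printed cell corresponds to one cell of a correlation matrix. The diagonal is always 1 because every measure is perfectly correlated with itself, and the matrix is symmetric because the correlation of A with B equals the correlation of B with A.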

Fit Lines

A fit line plots a model of the relationship between measures. You can apply fit lines to scatter plots and heat maps.
You can apply the following types of fit line to your visualization:
Best Fit
selects the most appropriate model (linear, quadratic, or cubic) for your data. The Best Fit method uses backward variable selection to select the highest-order model that is significant. To see the final model that was used, select About these regression results from the visualization legend.
Linear
creates a linear fit line from a linear regression algorithm. A linear fit line produces the straight line that best represents the relationship between two measures. For more information about the linear fit line, select About these regression results from the visualization legend.
For a linear fit, correlation is automatically added to the visualization. Correlation is not available with other fit types.
Quadratic
creates a quadratic fit line. A quadratic fit line has a single curve and often has the shape of a parabola. For more information about the quadratic fit line, select About these regression results from the visualization legend.
Cubic
creates a cubic fit line. A cubic fit line has two curves and often has an “S” shape. For more information about the cubic fit line, select About these regression results from the visualization legend.
PSpline
creates a penalized B-spline. A penalized B-spline is a smoothing spline that fits the data closely. A penalized B-spline can display a complex line with many changes in its curvature. For more information about the penalized B-spline, select About these regression results from the visualization legend.
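The linear, quadratic, and cubic fit types can be illustrated with an ordinary least-squares polynomial fit. The sketch below is a minimal plain-Python version under that assumption. The function names and the sample data are hypothetical, and the code does not reproduce the Best Fit selection procedure or the penalized B-spline.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] for y = c0 + c1*x + c2*x^2 + ..."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T y, where X has columns 1, x, x^2, ...
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def r_squared(xs, ys, coeffs):
    """Share of the variance in ys that the fitted polynomial explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data that follows y = 1 + 2x + 0.5x^2 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]

for name, degree in [("Linear", 1), ("Quadratic", 2), ("Cubic", 3)]:
    fitted = polyfit(xs, ys, degree)
    print(name, [round(c, 3) for c in fitted], round(r_squared(xs, ys, fitted), 4))
```

On this sample data the quadratic fit recovers the generating coefficients, the linear fit explains less of the variance, and the cubic fit adds a third-order term that is effectively zero. A higher-order term that contributes nothing is the kind of term that a backward selection procedure would be expected to drop.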

Forecasting

Forecasting estimates future values for your data based on statistical trends. Forecasting is available only for line charts that contain date or datetime data items.
A forecast adds a line with predicted values to your visualization and a colored band that represents the confidence interval. For example, a 95% confidence interval is the data range within which the forecasting model is 95% confident that the future values will fall.
The explorer automatically tests multiple forecasting models against your data, and then selects the best model. To see which forecasting model was used, view the details for the forecast results from the visualization legend.
The forecast model can be any one of the following:
  • Damped-trend exponential smoothing
  • Linear exponential smoothing
  • Seasonal exponential smoothing
  • Simple exponential smoothing
  • Winters method (additive)
  • Winters method (multiplicative)
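The idea of testing several models and keeping the best one can be sketched with two of the models listed above, simple and linear exponential smoothing. The plain-Python example below is an illustration only. The fixed smoothing weights, the scoring by one-step-ahead root mean squared error, and the constant-width band of plus or minus 1.96 times that error are all assumptions, not the explorer's actual selection criterion or interval calculation. All names are hypothetical.

```python
import math

def simple_smoothing(series, alpha):
    """One-step-ahead predictions and a forecast function (level only)."""
    level = series[0]
    predictions = []
    for value in series[1:]:
        predictions.append(level)  # prediction made before seeing value
        level = alpha * value + (1 - alpha) * level
    return predictions, (lambda h: level)

def linear_smoothing(series, alpha, beta):
    """One-step-ahead predictions and a forecast function (level plus trend)."""
    level, trend = series[0], series[1] - series[0]
    predictions = []
    for value in series[1:]:
        predictions.append(level + trend)
        new_level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return predictions, (lambda h: level + h * trend)

def rmse(actual, predicted):
    """Root mean squared error between paired actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(predicted))

def forecast_with_band(series, horizon, alpha=0.5, beta=0.3, z=1.96):
    """Pick the candidate with the lower error; return its name and forecast rows.

    Each row is (point forecast, lower bound, upper bound). The band has a
    constant width, which is a simplification (assumption).
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    candidates = {
        "simple": simple_smoothing(series, alpha),
        "linear": linear_smoothing(series, alpha, beta),
    }
    # The first prediction is skipped because the linear model is initialized
    # from the first two observations and would match it by construction.
    scores = {name: rmse(series[2:], preds[1:]) for name, (preds, _) in candidates.items()}
    best = min(scores, key=scores.get)
    predict = candidates[best][1]
    margin = z * scores[best]
    rows = [(predict(h), predict(h) - margin, predict(h) + margin)
            for h in range(1, horizon + 1)]
    return best, rows

# Hypothetical series with a steady upward trend.
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
model, rows = forecast_with_band(sales, horizon=3)
print(model)
for point, lower, upper in rows:
    print(round(point, 2), round(lower, 2), round(upper, 2))
```

For a steadily trending series, the linear model tracks the data more closely than the simple model, so it is selected. A fuller treatment would also estimate the smoothing weights from the data and widen the band as the forecast horizon grows.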
Note: Forecasting accounts for cyclical patterns by using standard intervals of time (for example, 60 minutes in an hour, 24 hours in a day, and so on). If your data uses nonstandard intervals (for example, 48 30-minute cycles per day), then cyclical patterns are not considered in the forecast.