A linear regression attempts to predict the value of a measure response variable as a linear function of one or more effects. The model is fit by the least squares method, which chooses the parameter estimates that minimize the sum of squared differences between the observed and predicted response values.
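Here is a minimal sketch of the least squares idea in plain Python. It makes three assumptions: the effects are numeric, the model includes an intercept, and the fit comes from forming the normal equations (X^T X) b = X^T y and solving them by Gaussian elimination. The function names and the toy data are hypothetical. This sketch does not show how SAS Visual Analytics fits its models internally. For example, a classification effect such as Vehicle Manufacturer would first have to be encoded as indicator columns.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations update both at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares_fit(effects: Sequence[Sequence[float]],
                      response: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bp] minimizing the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient is the intercept.
    x = [[1.0] + [float(v) for v in row] for row in effects]
    p = len(x[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical toy data with two numeric effects, generated exactly
    # from y = 2 + 3*x1 - 0.5*x2, so the fit should recover those values.
    effects = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
    response = [4.0, 7.5, 8.5, 12.5, 13.0]
    print(least_squares_fit(effects, response))
```

The toy response values lie exactly on a plane, so the printed coefficients should be close to 2, 3, and -0.5, with only floating-point rounding. Solving the normal equations directly is the simplest correct approach for small, well-conditioned problems. Numerical libraries usually prefer a QR decomposition because it is more numerically stable.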
To create the linear
regression for this example, complete the following steps:
-
Click
to create a new visualization.
-
Click
to specify that this visualization is a linear regression.
-
Drag and drop the variable Emission of Total Hydrocarbons (g/mi) into the Response field on the Roles tab.
-
Drag and drop the variables Vehicle Clusters, Vehicle Manufacturer, Test Procedure, Vehicle Cylinders, Vehicle MPG, Vehicle Gears, and Vehicle Weight (lbs) onto the visualization. SAS Visual Analytics automatically creates a linear regression model using these variables as the effects.
-
Drag and drop the variable Vehicle Type into the Group By field on the Roles tab. This specifies that Vehicle Type is the segmentation variable.
The results windows are updated. Instead of one model for the entire input data set, a separate model is created for each level (distinct value) of the group-by variable. In this example, that means a separate model is created for each vehicle classification: car, truck, or both.
-
Select the Properties tab in the right pane. Select Informative missingness. Enabling this property specifies that observations with missing values are used in the model instead of being excluded from it.
-
In the Fit Summary window, click the CAR segment.
In the Influence Plot, Cook’s D is the default influence statistic. Notice that the first bar in this plot is much larger than all the other bars. This suggests that the observations represented by this bar are highly influential outliers. In this example, you exclude them from the model.
To exclude these observations, click the bar to select it. Then right-click the bar and select Exclude Selected.
The model for the CAR segment is refit without the excluded observations.
-