Role
|
Description
|
---|---|
Roles
|
|
Analysis
variables
|
The variables that you assign to this role are the numeric variables for which you want statistics. You must assign at least one variable to this role.
|
Classification
variables
|
The variables that you assign to this role are character or discrete numeric variables that are used to divide the input data into categories
or subgroups. The statistics
are calculated on all selected analysis variables for each unique combination of classification variables.
|
Additional Roles
|
|
Group analysis
by
|
The variables that you assign to this role are used to compute separate statistics
for each distinct value or combination of values of the Group analysis by variables. The data is automatically
sorted by the variables in this role before the statistics are computed.
|
Frequency
count
|
When you assign a variable to this role, each observation in the table is assumed to represent n observations, where n is the value of the frequency count for that row. Statistics are calculated accordingly. You can assign a maximum of
one variable to this role.
|
Weight variable
|
If you assign a variable to this role, the value of the variable for each observation
is used to calculate weighted means, variances, and sums. You can assign a maximum
of one variable to this role.
|
Option Name
|
Description
|
---|---|
Statistics
|
|
Basic Statistics
|
|
Mean
|
is the arithmetic average, calculated by adding the values of an analysis variable and dividing this sum by the number of nonmissing observations.
|
Standard
deviation
|
is a statistical measure of the variability of a group of data values. This measure, which is the most widely used measure of the dispersion of a frequency distribution, is equal to the positive square root of the variance.
|
Minimum
value
|
is the smallest value for an analysis variable.
|
Maximum
value
|
is the largest value for an analysis variable.
|
Median
|
is the middle value for an analysis variable.
|
Number of
observations
|
is the total number of observations with nonmissing values.
|
Number of
missing values
|
is the total number of observations with missing values.
|
Additional Statistics
|
|
Standard
error
|
is the standard deviation of the sample mean. The standard error is defined as the ratio of the sample standard deviation to the square root of the sample size.
Note: This option is available
only if Degrees of freedom is selected in
the Divisor for standard deviation and variance drop-down
list.
|
Variance
|
is a statistical measure of dispersion of data values. This measure is an average
of the total squared dispersion between each observation and the sample mean.
|
Mode
|
is the most frequent value for the analysis variable.
|
Range
|
is the difference between
the largest and smallest values in the data.
|
Sum
|
is the sum of all values in the analysis variable.
|
Sum of weights
|
is the sum of the numeric variable that is used to weight each observation.
Note: You cannot compute the sum
of the weights unless you assign a variable to the Weight
variable role.
|
Confidence
limits for the mean
|
is the two-sided confidence limits for the mean. A two-sided confidence interval for the mean has the following upper and lower limits: , where s is and is the of the Student’s t statistics
with degrees of freedom.
|
Coefficient
of variation
|
is a unitless measure of relative variability. This measure is defined as the ratio
of the standard deviation to the mean expressed as a percentage. The coefficient of variation is meaningful only if the variable is measured on a ratio scale.
|
Skewness
|
is skewness, which measures the tendency of the deviations to be larger in one direction than in the other.
|
Kurtosis
|
is the kurtosis, which
measures the heaviness of tails.
|
Percentiles
|
|
1st, 5th,
10th, Lower quartile, Median, Upper quartile, 90th, 95th, 99th, Interquartile
range
|
choose the percentiles and quantiles to compute.
|
Quantile
method
|
specifies the method
that is used to compute the quantiles, median, and percentiles.
Order statistics
reads all of the data
into memory and sorts it by the unique values.
Piecewise-parabolic algorithm
approximates the quantile and is a less memory-intensive method.
Note: If you assigned a variable
to the Weight variable role, only the Order
statistics method is available.
|
Plots
|
|
Histogram
|
creates a graph that is used to determine the distribution of the data. If you add
a normal density curve, the task uses the sample mean and sample standard deviation for and . If you add a kernel density curve, the task uses the AMISE method to compute the kernel density estimates.
To include the statistics
in the graph, select the Add inset statistics check
box.
|
Comparative
box plot
|
creates a graph that shows a measure of central location (the median), two measures of dispersion (the range and interquartile range), the skewness (from the orientation of the median relative to the quartiles), and potential outliers. Box plots are especially useful in comparing two or more sets of data.
Note: The Comparative
box plot option is available only when a column is assigned
to the Classification variable role.
You can choose to add the overall inset statistics to the graph or only the inset statistics for each group.
|
Histogram
and box plot
|
displays the histogram and box plots together in a single panel, sharing common X axes. You can choose to add the overall
inset statistics to the graph.
Note: The Histogram
and box plot option is available only when no column
is assigned to the Classification variable role.
|
Details
|
|
Divisor
for standard deviation and variance
|
specifies the divisor
to use in the calculation of the variance and standard deviation.
Here are the valid options:
Degrees of freedom
By default, the divisor for the variance is the degrees of freedom.
Number of observations
n
Note: The Sum of weights
minus one and the Sum of weights options
are available only if you assigned a variable to the Weight
variable role.
|