Contents
About
Credits and Acknowledgments
Credits
Documentation
Software
Testing
Internationalization Testing
Technical Support
Acknowledgments
What’s New in SAS/STAT 14.1 High-Performance Procedures
Overview
SAS/STAT 14.1 High-Performance Procedure Enhancements
HPGENSELECT Procedure
HPLOGISTIC Procedure
HPPRINCOMP Procedure
HPSPLIT Procedure
Highlights of Enhancements in SAS/STAT 13.2 High-Performance Procedures
Introduction
Overview of SAS/STAT High-Performance Procedures
About This Book
Chapter Organization
Typographical Conventions
Options Used in Examples
Online Documentation
SAS Technical Support Services
Shared Concepts and Topics
Overview
Processing Modes
Single-Machine Mode
Distributed Mode
Controlling the Execution Mode with Environment Variables and Performance Statement Options
Determining Single-Machine Mode or Distributed Mode
Data Access Modes
Single-Machine Data Access Mode
Distributed Data Access Mode
Determining the Data Access Mode
Alongside-the-Database Execution
Alongside-LASR Distributed Execution
Running High-Performance Analytical Procedures Alongside a SAS LASR Analytic Server in Distributed Mode
Starting a SAS LASR Analytic Server Instance
Associating a SAS Libref with the SAS LASR Analytic Server Instance
Running a High-Performance Analytical Procedure Alongside the SAS LASR Analytic Server Instance
Terminating a SAS LASR Analytic Server Instance
Alongside-LASR Distributed Execution on a Subset of the Appliance Nodes
Running High-Performance Analytical Procedures in Asymmetric Mode
Running in Asymmetric Mode on Distinct Appliances
Alongside-HDFS Execution
Alongside-HDFS Execution by Using the SASHDAT Engine
Alongside-HDFS Execution by Using the Hadoop Engine
Output Data Sets
Working with Formats
PERFORMANCE Statement
Shared Statistical Concepts
Common Features of SAS High-Performance Statistical Procedures
Syntax Common to SAS High-Performance Statistical Procedures
CLASS Statement
FREQ Statement
ID Statement
SELECTION Statement
VAR Statement
WEIGHT Statement
Levelization of Classification Variables
Specification and Parameterization of Model Effects
Effect Operators
Bar and At Sign Operators
Colon, Dash, and Double Dash Operators
GLM Parameterization of Classification Variables and Effects
Intercept
Regression Effects
Main Effects
Interaction Effects
Nested Effects
Continuous-Nesting-Class Effects
Continuous-by-Class Effects
General Effects
Reference Parameterization
Model Selection
Methods
Full Model Fitted
Forward Selection
Backward Elimination
Stepwise Selection
Forward-Swap Selection
Least Angle Regression
Lasso Selection
Adaptive Lasso Selection
References
The HPCANDISC Procedure
Overview: HPCANDISC Procedure
PROC HPCANDISC Features
PROC HPCANDISC Compared with PROC CANDISC
Getting Started: HPCANDISC Procedure
Syntax: HPCANDISC Procedure
PROC HPCANDISC Statement
BY Statement
CLASS Statement
FREQ Statement
ID Statement
PERFORMANCE Statement
VAR Statement
WEIGHT Statement
Details: HPCANDISC Procedure
Missing Values
Computational Method
General Formulas
Multithreading
Output Data Sets
OUT= Data Set
OUTSTAT= Data Set
Displayed Output
ODS Table Names
Examples: HPCANDISC Procedure
Analyzing Iris Data with PROC HPCANDISC
Performing Canonical Discriminant Analysis in Single-Machine and Distributed Modes
References
The HPFMM Procedure
Overview: HPFMM Procedure
Basic Features
PROC HPFMM Contrasted with PROC FMM
Assumptions
Notation for the Finite Mixture Model
Homogeneous Mixtures
Special Mixtures
Getting Started: HPFMM Procedure
Mixture Modeling for Binomial Overdispersion: "Student," Pearson, Beer, and Yeast
Modeling Zero-Inflation: Is it Better to Fish Poorly or Not to Have Fished At All?
Looking for Multiple Modes: Are Galaxies Clustered?
Comparison with Roeder’s Method
Syntax: HPFMM Procedure
PROC HPFMM Statement
BAYES Statement
BY Statement
CLASS Statement
FREQ Statement
ID Statement
MODEL Statement
Response Variable Options
Model Options
OUTPUT Statement
PERFORMANCE Statement
PROBMODEL Statement
RESTRICT Statement
WEIGHT Statement
Details: HPFMM Procedure
A Gentle Introduction to Finite Mixture Models
The Form of the Finite Mixture Model
Mixture Models Contrasted with Mixing and Mixed Models: Untangling the Terminology Web
Overdispersion
Log-Likelihood Functions for Response Distributions
Bayesian Analysis
Conjugate Sampling
Metropolis-Hastings Algorithm
Latent Variables via Data Augmentation
Prior Distributions
Parameterization of Model Effects
Computational Method
Multithreading
Choosing an Optimization Algorithm
First- or Second-Order Algorithms
Algorithm Descriptions
Output Data Set
Default Output
Performance Information
Model Information
Class Level Information
Number of Observations
Response Profile
Default Output for Maximum Likelihood
Default Output for Bayes Estimation
ODS Table Names
ODS Graphics
Examples: HPFMM Procedure
Modeling Mixing Probabilities: All Mice Are Created Equal, but Some Are More Equal
The Usefulness of Custom Starting Values: When Do Cows Eat?
Enforcing Homogeneity Constraints: Count and Dispersion—It Is All Over!
References
The GAMPL Procedure
Overview: GAMPL Procedure
PROC GAMPL Features
PROC GAMPL Contrasted with PROC GAM
Getting Started: GAMPL Procedure
Syntax: GAMPL Procedure
PROC GAMPL Statement
CLASS Statement
FREQ Statement
ID Statement
MODEL Statement
OUTPUT Statement
PERFORMANCE Statement
WEIGHT Statement
Details: GAMPL Procedure
Missing Values
Thin-Plate Regression Splines
Generalized Additive Models
Model Evaluation Criteria
Fitting Algorithms
Degrees of Freedom
Model Inference
Dispersion Parameter
Tests for Smoothing Components
Computational Method: Multithreading
Choosing an Optimization Technique
First- or Second-Order Techniques
Technique Descriptions
Displayed Output
ODS Table Names
ODS Graphics
Examples: GAMPL Procedure
Scatter Plot Smoothing
Nonparametric Logistic Regression
Nonparametric Negative Binomial Model for Mackerel Egg Density
References
The HPGENSELECT Procedure
Overview: HPGENSELECT Procedure
PROC HPGENSELECT Features
PROC HPGENSELECT Contrasted with PROC GENMOD
Getting Started: HPGENSELECT Procedure
Syntax: HPGENSELECT Procedure
PROC HPGENSELECT Statement
BY Statement
CLASS Statement
CODE Statement
FREQ Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
RESTRICT Statement
SELECTION Statement
WEIGHT Statement
ZEROMODEL Statement
Details: HPGENSELECT Procedure
Missing Values
Exponential Family Distributions
Response Distributions
Response Probability Distribution Functions
Log-Likelihood Functions
The LASSO Method of Model Selection
Using Validation and Test Data
Computational Method: Multithreading
Choosing an Optimization Algorithm
First- or Second-Order Algorithms
Algorithm Descriptions
Displayed Output
ODS Table Names
Examples: HPGENSELECT Procedure
Model Selection
Modeling Binomial Data
Tweedie Model
Model Selection by the LASSO Method
References
The HPLMIXED Procedure
Overview: HPLMIXED Procedure
PROC HPLMIXED Features
Notation for the Mixed Model
PROC HPLMIXED Contrasted with Other SAS Procedures
Getting Started: HPLMIXED Procedure
Mixed Model Analysis of Covariance with Many Groups
Syntax: HPLMIXED Procedure
PROC HPLMIXED Statement
CLASS Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARMS Statement
PERFORMANCE Statement
RANDOM Statement
REPEATED Statement
Details: HPLMIXED Procedure
Linear Mixed Models Theory
Matrix Notation
Formulation of the Mixed Model
Estimating Covariance Parameters in the Mixed Model
Estimating Fixed and Random Effects in the Mixed Model
Statistical Properties
Computational Method
Distributed Computing
Multithreading
Displayed Output
Performance Information
Model Information
Class Level Information
Dimensions
Number of Observations
Optimization Information
Iteration History
Convergence Status
Covariance Parameter Estimates
Fit Statistics
Timing Information
ODS Table Names
Examples: HPLMIXED Procedure
Computing BLUPs for a Large Number of Subjects
References
The HPLOGISTIC Procedure
Overview: HPLOGISTIC Procedure
PROC HPLOGISTIC Features
PROC HPLOGISTIC Contrasted with Other SAS Procedures
Getting Started: HPLOGISTIC Procedure
Binary Logistic Regression
Syntax: HPLOGISTIC Procedure
PROC HPLOGISTIC Statement
BY Statement
CLASS Statement
CODE Statement
FREQ Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
SELECTION Statement
WEIGHT Statement
Details: HPLOGISTIC Procedure
Missing Values
Response Distributions
Log-Likelihood Functions
Existence of Maximum Likelihood Estimates
Using Validation and Test Data
Model Fit and Assessment Statistics
The Hosmer-Lemeshow Goodness-of-Fit Test
Computational Method: Multithreading
Choosing an Optimization Algorithm
First- or Second-Order Algorithms
Algorithm Descriptions
Displayed Output
ODS Table Names
Examples: HPLOGISTIC Procedure
Model Selection
Modeling Binomial Data
Ordinal Logistic Regression
Partitioning Data
References
The HPNLMOD Procedure
Overview: HPNLMOD Procedure
PROC HPNLMOD Features
PROC HPNLMOD Contrasted with the NLIN and NLMIXED Procedures
Getting Started: HPNLMOD Procedure
Least Squares Model
Binomial Model
Syntax: HPNLMOD Procedure
PROC HPNLMOD Statement
BOUNDS Statement
BY Statement
ESTIMATE Statement
MODEL Statement
PARAMETERS Statement
PERFORMANCE Statement
PREDICT Statement
RESTRICT Statement
Programming Statements
Details: HPNLMOD Procedure
Least Squares Estimation
Built-In Log-Likelihood Functions
Computational Method
Choosing an Optimization Algorithm
Displayed Output
ODS Table Names
Examples: HPNLMOD Procedure
Segmented Model
References
The HPPLS Procedure
Overview: HPPLS Procedure
PROC HPPLS Features
PROC HPPLS Contrasted with PROC PLS
Getting Started: HPPLS Procedure
Spectrometric Calibration
Fitting a PLS Model
Selecting the Number of Factors by Test Set Validation
Predicting New Observations
Syntax: HPPLS Procedure
PROC HPPLS Statement
BY Statement
CLASS Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
Details: HPPLS Procedure
Regression Methods
Partial Least Squares
SIMPLS
Principal Components Regression
Reduced Rank Regression
Relationships between Methods
Test Set Validation
Centering and Scaling
Missing Values
Computational Method
Multithreading
Output Data Set
Displayed Output
Performance Information
Data Access Information
Centering and Scaling Information
Model Information
Number of Observations
Class Level Information
Dimensions
Test Set Validation
Percent Variation Accounted for by Extracted Factors
Model Details
Parameter Estimates
Timing Information
ODS Table Names
Examples: HPPLS Procedure
Choosing a PLS Model by Test Set Validation
Fitting a PLS Model in Single-Machine and Distributed Modes
References
The HPPRINCOMP Procedure
Overview: HPPRINCOMP Procedure
PROC HPPRINCOMP Features
PROC HPPRINCOMP Contrasted with PROC PRINCOMP
Getting Started: HPPRINCOMP Procedure
Syntax: HPPRINCOMP Procedure
PROC HPPRINCOMP Statement
BY Statement
CODE Statement
FREQ Statement
ID Statement
OUTPUT Statement
PARTIAL Statement
PERFORMANCE Statement
VAR Statement
WEIGHT Statement
Details: HPPRINCOMP Procedure
Computing Principal Components
Eigenvalue Decomposition
NIPALS
ITERGS
Missing Values
Output Data Sets
OUT= Data Set
OUTSTAT= Data Set
Computational Method
Multithreading
Displayed Output
Performance Information
Data Access Information
Model Information
Number of Observations
Number of Variables
Simple Statistics
Centering and Scaling Information
Explained Variation of Variables
Correlation Matrix
Regression Statistics
Regression Coefficients
Partial Correlation Matrix
Total Variance
Eigenvalues
Eigenvectors
Loadings
Timing Information
ODS Table Names
Examples: HPPRINCOMP Procedure
Analyzing Mean Temperatures of US Cities
Computing Principal Components in Single-Machine and Distributed Modes
Extracting Principal Components with NIPALS
References
The HPQUANTSELECT Procedure
Overview: HPQUANTSELECT Procedure
PROC HPQUANTSELECT Features
PROC HPQUANTSELECT Contrasted with Other SAS Procedures
Getting Started: HPQUANTSELECT Procedure
Syntax: HPQUANTSELECT Procedure
PROC HPQUANTSELECT Statement
BY Statement
CLASS Statement
CODE Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
SELECTION Statement
WEIGHT Statement
Details: HPQUANTSELECT Procedure
Quantile Regression
Linear Model with iid Errors
Linear-in-Parameter Model with Non-iid Settings
More Statistics for Parameter Estimates
Criteria Used in Model Selection
Quasi-likelihood Information Criteria
Statistical Tests for Significance Level
Diagnostic Statistics
Classification Variables and the SPLIT Option
Macro Variables That Contain Selected Effects
Using Validation and Test Data
Using the Validation ACL as the STOP= Criterion
Using the Validation ACL as the CHOOSE= Criterion
Using the Validation ACL as the SELECT= Criterion
Computational Method
Multithreading
Output Data Set
Displayed Output
Performance Information
Data Access Information
Model Information
Selection Information
Number of Observations
Class Level Information
Dimensions
Entry and Removal Candidates
Selection Summary
Stop Reason
Selection Reason
Selected Effects
Fit Statistics
Parameter Estimates
Timing Information
ODS Table Names
Examples: HPQUANTSELECT Procedure
Simulation Study
Growth Charts for Body Mass Index
References
The HPREG Procedure
Overview: HPREG Procedure
PROC HPREG Features
PROC HPREG Contrasted with Other SAS Procedures
Getting Started: HPREG Procedure
Syntax: HPREG Procedure
PROC HPREG Statement
BY Statement
CLASS Statement
CODE Statement
FREQ Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
SELECTION Statement
WEIGHT Statement
Details: HPREG Procedure
Criteria Used in Model Selection
Diagnostic Statistics
Classification Variables and the SPLIT Option
Using Validation and Test Data
Computational Method
Output Data Set
Screening
Displayed Output
ODS Table Names
Examples: HPREG Procedure
Model Selection with Validation
Backward Selection in Single-Machine and Distributed Modes
Forward-Swap Selection
Forward Selection with Screening
References
The HPSPLIT Procedure
Overview: HPSPLIT Procedure
PROC HPSPLIT Features
Getting Started: HPSPLIT Procedure
Syntax: HPSPLIT Procedure
PROC HPSPLIT Statement
CLASS Statement
CODE Statement
GROW Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
PRUNE Statement
RULES Statement
Details: HPSPLIT Procedure
Building a Decision Tree
Splitting Criteria
Splitting Strategy
Pruning
Memory Considerations
Primary and Surrogate Splitting Rules
Handling Missing Values
Unknown Values of Categorical Predictors
Scoring
Measures of Model Fit
Variable Importance
ODS Table Names
ODS Graphics
SAS Enterprise Miner Syntax and Notes
Examples: HPSPLIT Procedure
Building a Classification Tree for a Binary Outcome
Cost-Complexity Pruning with Cross Validation
Creating a Regression Tree
Creating a Binary Classification Tree with Validation Data
Assessing Variable Importance
Applying Breiman’s 1-SE Rule with Misclassification Rate
References
Product: SAS/STAT
Release: 14.1
Type: Usage and Reference
Copyright Date: July 2015
Last Updated: 14Jul2015