Contents
About
Credits and Acknowledgments
Credits
Documentation
Software
Testing
Internationalization Testing
Technical Support
Acknowledgments
What’s New in SAS High-Performance Statistics 13.1
Overview
New Procedures
HPCANDISC Procedure
HPFMM Procedure
HPPRINCOMP Procedure
Procedure Enhancements
HPLMIXED Procedure
HPREG Procedure
Introduction
Overview of SAS/STAT High-Performance Procedures
About This Book
Chapter Organization
Typographical Conventions
Options Used in Examples
Online Documentation
SAS Technical Support Services
Shared Concepts and Topics
Overview
Processing Modes
Single-Machine Mode
Distributed Mode
Symmetric and Asymmetric Distributed Modes
Symmetric Mode
Asymmetric Mode
Controlling the Execution Mode with Environment Variables and Performance Statement Options
Determining Single-Machine Mode or Distributed Mode
Alongside-the-Database Execution
Alongside-LASR Distributed Execution
Running High-Performance Analytical Procedures Alongside a SAS LASR Analytic Server in Distributed Mode
Starting a SAS LASR Analytic Server Instance
Associating a SAS Libref with the SAS LASR Analytic Server Instance
Running a High-Performance Analytical Procedure Alongside the SAS LASR Analytic Server Instance
Terminating a SAS LASR Analytic Server Instance
Alongside-LASR Distributed Execution on a Subset of the Appliance Nodes
Running High-Performance Analytical Procedures in Asymmetric Mode
Running in Symmetric Mode
Running in Asymmetric Mode on One Appliance
Running in Asymmetric Mode on Distinct Appliances
Alongside-HDFS Execution
Alongside-HDFS Execution by Using the SASHDAT Engine
Alongside-HDFS Execution by Using the Hadoop Engine
Output Data Sets
Working with Formats
PERFORMANCE Statement
Shared Statistical Concepts
Common Features of SAS High-Performance Statistical Procedures
Syntax Common to SAS High-Performance Statistical Procedures
CLASS Statement
FREQ Statement
ID Statement
SELECTION Statement
VAR Statement
WEIGHT Statement
Levelization of Classification Variables
Specification and Parameterization of Model Effects
Effect Operators
Bar and At Sign Operators
Colon, Dash, and Double Dash Operators
GLM Parameterization of Classification Variables and Effects
Intercept
Regression Effects
Main Effects
Interaction Effects
Nested Effects
Continuous-Nesting-Class Effects
Continuous-by-Class Effects
General Effects
Reference Parameterization
Model Selection
Methods
Full Model Fitted
Forward Selection
Backward Elimination
Stepwise Selection
Forward-Swap Selection
Least Angle Regression
Lasso Selection
Adaptive Lasso Selection
References
The HPCANDISC Procedure
Overview: HPCANDISC Procedure
PROC HPCANDISC Features
PROC HPCANDISC Compared with PROC CANDISC
Getting Started: HPCANDISC Procedure
Syntax: HPCANDISC Procedure
PROC HPCANDISC Statement
BY Statement
CLASS Statement
FREQ Statement
ID Statement
PERFORMANCE Statement
VAR Statement
WEIGHT Statement
Details: HPCANDISC Procedure
Missing Values
Computational Method
General Formulas
Multithreading
Output Data Sets
OUT= Data Set
OUTSTAT= Data Set
Displayed Output
ODS Table Names
Examples: HPCANDISC Procedure
Analyzing Iris Data with PROC HPCANDISC
Performing Canonical Discriminant Analysis in Single-Machine and Distributed Modes
References
The HPFMM Procedure
Overview: HPFMM Procedure
Basic Features
PROC HPFMM Contrasted with PROC FMM
Assumptions
Notation for the Finite Mixture Model
Homogeneous Mixtures
Special Mixtures
Getting Started: HPFMM Procedure
Mixture Modeling for Binomial Overdispersion: “Student,” Pearson, Beer, and Yeast
Modeling Zero-Inflation: Is it Better to Fish Poorly or Not to Have Fished At All?
Looking for Multiple Modes: Are Galaxies Clustered?
Comparison with Roeder’s Method
Syntax: HPFMM Procedure
PROC HPFMM Statement
BAYES Statement
BY Statement
CLASS Statement
FREQ Statement
ID Statement
MODEL Statement
Response Variable Options
Model Options
OUTPUT Statement
PERFORMANCE Statement
PROBMODEL Statement
RESTRICT Statement
WEIGHT Statement
Details: HPFMM Procedure
A Gentle Introduction to Finite Mixture Models
The Form of the Finite Mixture Model
Mixture Models Contrasted with Mixing and Mixed Models: Untangling the Terminology Web
Overdispersion
Log-Likelihood Functions for Response Distributions
Bayesian Analysis
Conjugate Sampling
Metropolis-Hastings Algorithm
Latent Variables via Data Augmentation
Prior Distributions
Parameterization of Model Effects
Computational Method
Multithreading
Choosing an Optimization Algorithm
First- or Second-Order Algorithms
Algorithm Descriptions
Output Data Set
Default Output
Performance Information
Model Information
Class Level Information
Number of Observations
Response Profile
Default Output for Maximum Likelihood
Default Output for Bayes Estimation
ODS Table Names
ODS Graphics
Examples: HPFMM Procedure
Modeling Mixing Probabilities: All Mice Are Created Equal, but Some Are More Equal
The Usefulness of Custom Starting Values: When Do Cows Eat?
Enforcing Homogeneity Constraints: Count and Dispersion—It Is All Over!
References
The HPGENSELECT Procedure
Overview: HPGENSELECT Procedure
PROC HPGENSELECT Features
PROC HPGENSELECT Contrasted with PROC GENMOD
Getting Started: HPGENSELECT Procedure
Syntax: HPGENSELECT Procedure
PROC HPGENSELECT Statement
CLASS Statement
CODE Statement
FREQ Statement
ID Statement
MODEL Statement
OUTPUT Statement
PERFORMANCE Statement
SELECTION Statement
WEIGHT Statement
ZEROMODEL Statement
Details: HPGENSELECT Procedure
Missing Values
Exponential Family Distributions
Response Distributions
Response Probability Distribution Functions
Log-Likelihood Functions
Computational Method: Multithreading
Choosing an Optimization Algorithm
First- or Second-Order Algorithms
Algorithm Descriptions
Displayed Output
ODS Table Names
Examples: HPGENSELECT Procedure
Model Selection
Modeling Binomial Data
Tweedie Model
References
The HPLMIXED Procedure
Overview: HPLMIXED Procedure
PROC HPLMIXED Features
Notation for the Mixed Model
PROC HPLMIXED Contrasted with Other SAS Procedures
Getting Started: HPLMIXED Procedure
Mixed Model Analysis of Covariance with Many Groups
Syntax: HPLMIXED Procedure
PROC HPLMIXED Statement
CLASS Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARMS Statement
PERFORMANCE Statement
RANDOM Statement
REPEATED Statement
Details: HPLMIXED Procedure
Linear Mixed Models Theory
Matrix Notation
Formulation of the Mixed Model
Estimating Covariance Parameters in the Mixed Model
Estimating Fixed and Random Effects in the Mixed Model
Statistical Properties
Computational Method
Distributed Computing
Multithreading
Displayed Output
Performance Information
Model Information
Class Level Information
Dimensions
Number of Observations
Optimization Information
Iteration History
Convergence Status
Covariance Parameter Estimates
Fit Statistics
Timing Information
ODS Table Names
Examples: HPLMIXED Procedure
Computing BLUPs for a Large Number of Subjects
References
The HPLOGISTIC Procedure
Overview: HPLOGISTIC Procedure
PROC HPLOGISTIC Features
PROC HPLOGISTIC Contrasted with Other SAS Procedures
Getting Started: HPLOGISTIC Procedure
Binary Logistic Regression
Syntax: HPLOGISTIC Procedure
PROC HPLOGISTIC Statement
BY Statement
CLASS Statement
CODE Statement
FREQ Statement
ID Statement
MODEL Statement
OUTPUT Statement
PERFORMANCE Statement
SELECTION Statement
WEIGHT Statement
Details: HPLOGISTIC Procedure
Missing Values
Response Distributions
Log-Likelihood Functions
Existence of Maximum Likelihood Estimates
Generalized Coefficient of Determination
The Hosmer-Lemeshow Goodness-of-Fit Test
Computational Method: Multithreading
Choosing an Optimization Algorithm
First- or Second-Order Algorithms
Algorithm Descriptions
Displayed Output
ODS Table Names
Examples: HPLOGISTIC Procedure
Model Selection
Modeling Binomial Data
Ordinal Logistic Regression
Conditional Logistic Regression for Matched Pairs Data
References
The HPNLMOD Procedure
Overview: HPNLMOD Procedure
PROC HPNLMOD Features
PROC HPNLMOD Contrasted with the NLIN and NLMIXED Procedures
Getting Started: HPNLMOD Procedure
Least Squares Model
Binomial Model
Syntax: HPNLMOD Procedure
PROC HPNLMOD Statement
BOUNDS Statement
BY Statement
ESTIMATE Statement
MODEL Statement
PARAMETERS Statement
PERFORMANCE Statement
PREDICT Statement
RESTRICT Statement
Programming Statements
Details: HPNLMOD Procedure
Least Squares Estimation
Built-In Log-Likelihood Functions
Computational Method
Choosing an Optimization Algorithm
Displayed Output
ODS Table Names
Examples: HPNLMOD Procedure
Segmented Model
References
The HPPRINCOMP Procedure
Overview: HPPRINCOMP Procedure
PROC HPPRINCOMP Features
PROC HPPRINCOMP Contrasted with PROC PRINCOMP
Getting Started: HPPRINCOMP Procedure
Syntax: HPPRINCOMP Procedure
PROC HPPRINCOMP Statement
BY Statement
CODE Statement
FREQ Statement
ID Statement
PARTIAL Statement
PERFORMANCE Statement
VAR Statement
WEIGHT Statement
Details: HPPRINCOMP Procedure
Missing Values
Output Data Sets
OUT= Data Set
OUTSTAT= Data Set
Computational Method
Multithreading
Displayed Output
Performance Information
Number of Observations
Number of Variables
Simple Statistics
Correlation Matrix
Regression Statistics
Regression Coefficients
Partial Correlation Matrix
Total Variance
Eigenvalues
Eigenvectors
Timing Information
ODS Table Names
Examples: HPPRINCOMP Procedure
Analyzing Mean Temperatures of US Cities
Computing Principal Components in Single-Machine and Distributed Modes
References
The HPREG Procedure
Overview: HPREG Procedure
PROC HPREG Features
PROC HPREG Contrasted with Other SAS Procedures
Getting Started: HPREG Procedure
Syntax: HPREG Procedure
PROC HPREG Statement
BY Statement
CLASS Statement
CODE Statement
FREQ Statement
ID Statement
MODEL Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
SELECTION Statement
WEIGHT Statement
Details: HPREG Procedure
Criteria Used in Model Selection
Diagnostic Statistics
Classification Variables and the SPLIT Option
Using Validation and Test Data
Computational Method
Output Data Set
Screening
Displayed Output
ODS Table Names
Examples: HPREG Procedure
Model Selection with Validation
Backward Selection in Single-Machine and Distributed Modes
Forward-Swap Selection
Forward Selection with Screening
References
The HPSPLIT Procedure
Overview: HPSPLIT Procedure
PROC HPSPLIT Features
Getting Started: HPSPLIT Procedure
Syntax: HPSPLIT Procedure
PROC HPSPLIT Statement
CODE Statement
CRITERION Statement
ID Statement
INPUT Statement
OUTPUT Statement
PARTITION Statement
PERFORMANCE Statement
PRUNE Statement
RULES Statement
SCORE Statement
TARGET Statement
Details: HPSPLIT Procedure
Building a Tree
Interval Input Binning Details
Input Variable Splitting and Selection
Pruning
Memory Considerations
Handling Missing Values
Handling Unknown Levels in Scoring
Splitting Criteria
Pruning Criteria
Subtree Statistics
Variable Importance
Outputs
Examples: HPSPLIT Procedure
Creating a Node Rules Description of a Tree
Assessing Variable Importance
References
Product: SAS/STAT
Release: 13.1
Type: Usage and Reference
Copyright Date: December 2013
Last Updated: 17 December 2013