SAS Global Forum 2016 Proceedings

A B C D E F G H I L M P Q R S T U W

A

Session 7660-2016:

A SAS^® Macro for Generating Random Numbers of Skew-Normal and Skew-T Distributions

This paper aims to show a SAS^® macro for generating random numbers of skew-normal and skew-t distributions, as well as the quantiles of these distributions. The results are similar to those generated by the sn package of R software.

Read the paper (PDF) | Download the data file (ZIP) | View the e-poster or slides (PDF)

Alan Silva, University of Brasilia

Paulo Henrique Dourado da Silva, Banco do Brasil

Session 6474-2016:

A SAS^® Macro for Generating a Ternary Graph in a Ceramics Industry Process

This work presents a macro for generating a ternary graph that can be used to solve a problem in the ceramics industry. The ceramics industry uses mixtures of clay types to generate a product with good properties that can be assessed according to breakdown pressure after incineration, the porosity, and water absorption. Beyond these properties, the industry is concerned with managing geological reserves of each type of clay. Thus, it is important to seek alternative compositions that present properties similar to the default mixture. This can be done by analyzing the surface response of these properties according to the clay composition, which is easily done in a ternary graph. SAS^® documentation does not describe how to adjust an analysis grid in nonrectangular forms on graphs. A macro triaxial, however, can generate ternary graphical analysis by creating a special grid, using an annotated database, and making linear transformations of the original data. The GCONTOUR procedure is used to generate three-dimensional analysis.

Read the paper (PDF) | Watch the recording

IGOR NASCIMENTO, UNB

Zacarias Linhares, IFPI

ROBERTO SOARES, IFPI

Session 5800-2016:

A Short Introduction to Longitudinal and Repeated Measures Data Analyses

Longitudinal and repeated measures data are seen in nearly all fields of analysis. Examples of this data include weekly lab test results of patients or performance of test score by children from the same class. Statistics students and analysts alike might be overwhelmed when it comes to repeated measures or longitudinal data analyses. They might try to educate themselves by diving into text books or taking semester-long or intensive weekend courses, resulting in even more confusion. Some might try to ignore the repeated nature of data and take short cuts such as analyzing all data as independent observations or analyzing summary statistics such as averages or changes from first to last points and ignoring all the data in-between. This hands-on presentation introduces longitudinal and repeated measures analyses without heavy emphasis on theory. Students in the workshop will have the opportunity to get hands-on experience graphing longitudinal and repeated measures data. They will learn how to approach these analyses with tools like PROC MIXED and PROC GENMOD. Emphasis will be on continuous outcomes, but categorical outcomes will briefly be covered.

Read the paper (PDF)

Leanne Goldstein, City of Hope

Session 10081-2016:

An Application of the PRINQUAL Procedure to Develop a Synthetic Index of Customer Value for a Colombian Financial Institution

Currently Colpatria, as a part of Scotiabank in Colombia, has several methodologies that enable us to have a vision of the customer from a risk perspective. However, the current trend in the financial sector is to have a global vision that involves aspects of risk as well as of profitability and utility. As a part of the business strategies to develop cross-sell and customer profitability under conditions of risk needs, it's necessary to create a customer value index to score the customer according to different groups of business key variables that permit us to describe the profitability and risk of each customer. In order to generate the Index of Customer Value, we propose to construct a synthetic index using principal component analysis and multiple factorial analysis.

Read the paper (PDF)

Ivan Atehortua, Colpatria

Diana Flórez, Colpatria

Session 12000-2016:

Analysis of Grades for University Students Using Administrative Data and the IRT Procedure

The world's capacity to store and analyze data has increased in ways that would have been inconceivable just a couple of years ago. Due to this development, large-scale data are collected by governments. Until recently, this was for purely administrative purposes. This study used comprehensive data files on education. The purpose of this study was to examine compulsory courses for a bachelor's degree in the economics program at the University of Copenhagen. The difficulty and use of the grading scale was compared across the courses by using the new IRT procedure, which was introduced in SAS/STAT^® 13.1. Further, the latent ability traits that were estimated for all students in the sample by PROC IRT are used as predictors in a logistic regression model. The hypothesis of interest is that students who have a lower ability trait will have a greater probability of dropping out of the university program compared to successful students. Administrative data from one cohort of students in the economics program at the University of Copenhagen was used (n=236). Three unidimensional Item Response Theory models, two dichotomous and one polytomous, were introduced. It turns out that the polytomous Graded Response model does the best job of fitting data. The findings suggest that in order to receive the highest possible grade, the highest level of student ability is needed for the course exam in the first-year course Descriptive Economics A. In contrast, the third-year course Econometrics C is the easiest course in which to receive a top grade. In addition, this study found that as estimated student ability decreases, the probability of a student dropping out of the bachelor's degree program increases drastically. However, contrary to expectations, some students with high ability levels also end up dropping out.

Read the paper (PDF)

Sara Armandi, University of Copenhagen

Session 11702-2016:

Analyzing Non-Normal Binomial and Categorical Response Variables under Varying Data Conditions

When dealing with non-normal categorical response variables, logistic regression is the robust method to use for modeling the relationship between categorical outcomes and different predictors without assuming a linear relationship between them. Within such models, the categorical outcome might be binary, multinomial, or ordinal, and predictors might be continuous or categorical. Another complexity that might be added to such studies is when data is longitudinal, such as when outcomes are collected at multiple follow-up times. Learning about modeling such data within any statistical method is beneficial because it enables researchers to look at changes over time. This study looks at several methods of modeling binary and categorical response variables within regression models by using real-world data. Starting with the simplest case of binary outcomes through ordinal outcomes, this study looks at different modeling options within SAS^® and includes longitudinal cases for each model. To assess binary outcomes, the current study models binary data in the absence and presence of correlated observations under regular logistic regression and mixed logistic regression. To assess multinomial outcomes, the current study uses multinomial logistic regression. When responses are ordered, using ordinal logistic regression is required as it allows for interpretations based on inherent rankings. Different logit functions for this model include the cumulative logit, adjacent-category logit, and continuation ratio logit. Each of these models is also considered for longitudinal (panel) data using methods such as mixed models and Generalized Estimating Equations (GEE). The final consideration, which cannot be addressed by GEE, is the conditional logit to examine bias due to omitted explanatory variables at the cluster level. Different procedures for the aforementioned within SAS^® 9.4 are explored and their strengths and limitations are specified for applied researchers finding simil ar data characteristics. These procedures include PROC LOGISTIC, PROC GLIMMIX, PROC GENMOD, PROC NLMIXED, and PROC PHREG.

Read the paper (PDF)

Niloofar Ramezani, University of Northern Colorado

B

Session 7200-2016:

Bayesian Inference for Gaussian Semiparametric Multilevel Models

Bayesian inference for complex hierarchical models with smoothing splines is typically intractable, requiring approximate inference methods for use in practice. Markov Chain Monte Carlo (MCMC) is the standard method for generating samples from the posterior distribution. However, for large or complex models, MCMC can be computationally intensive, or even infeasible. Mean Field Variational Bayes (MFVB) is a fast deterministic alternative to MCMC. It provides an approximating distribution that has minimum Kullback-Leibler distance to the posterior. Unlike MCMC, MFVB efficiently scales to arbitrarily large and complex models. We derive MFVB algorithms for Gaussian semiparametric multilevel models and implement them in SAS/IML^® software. To improve speed and memory efficiency, we use block decomposition to streamline the estimation of the large sparse covariance matrix. Through a series of simulations and real data examples, we demonstrate that the inference obtained from MFVB is comparable to that of PROC MCMC. We also provide practical demonstrations of how to estimate additional posterior quantities of interest from MFVB either directly or via Monte Carlo simulation.

Read the paper (PDF) | Download the data file (ZIP)

Jason Bentley, The University of Sydney

Cathy Lee, University of Technology Sydney

C

Session 8620-2016:

Competing-Risks Analyses: An Overview of Regression Models

Competing-risks analyses are methods for analyzing the time to a terminal event (such as death or failure) and its cause or type. The cumulative incidence function CIF(j, t) is the probability of death by time t from cause j. New options in the LIFETEST procedure provide for nonparametric estimation of the CIF from event times and their associated causes, allowing for right-censoring when the event and its cause are not observed. Cause-specific hazard functions that are derived from the CIFs are the analogs of the hazard function when only a single cause is present. Death by one cause precludes occurrence of death by any other cause, because an individual can die only once. Incorporating explanatory variables in hazard functions provides an approach to assessing their impact on overall survival and on the CIF. This semiparametric approach can be analyzed in the PHREG procedure. The Fine-Gray model defines a subdistribution hazard function that has an expanded risk set, which consists of individuals at risk of the event by any cause at time t, together with those who experienced the event before t from any cause other than the cause of interest j. Finally, with additional assumptions a full parametric analysis is also feasible. We illustrate the application of these methods with empirical data sets.

Read the paper (PDF) | Watch the recording

Joseph Gardiner, Michigan State University

Session 7500-2016:

Customer Lifetime Value Modeling

Customer lifetime value (LTV) estimation involves two parts: the survival probabilities and profit margins. This article describes the estimation of those probabilities using discrete-time logistic hazard models. The estimation of profit margins is based on linear regression. In the scenario, when outliers are present among margins, we suggest applying robust regression with PROC ROBUSTREG.

Read the paper (PDF) | View the e-poster or slides (PDF)

Vadim Pliner, Verizon Wireless

D

Session 11841-2016:

Diagnosing Obstructive Sleep Apnea: Using Predictive Analytics Based on Wavelet Analysis in SAS/IML^® Software and Spectral Analysis in PROC SPECTRA

This paper presents an application based on predictive analytics and feature-extraction techniques to develop the alternative method for diagnosis of obstructive sleep apnea (OSA). Our method reduces the time and cost associated with the gold standard or polysomnography (PSG), which is operated manually, by automatically determining the OSA's severity of a patient via classification models using the time series from a one-lead electrocardiogram (ECG). The data is from Dr. Thomas Penzel of Philipps-University, Germany, and can be downloaded at www.physionet.org. The selected data consists of 10 recordings (7 OSAs, and 3 controls) of ECG collected overnight, and non-overlapping-minute-by-minute OSA episode annotations (apnea and non-apnea states). This accounts for a total of 4,998 events (2,532 non-apnea and 2,466 apnea minutes). This paper highlights the nonlinear decomposition technique, wavelet analysis (WA) in SAS/IML^® software, to maximize the information of OSA symptoms from ECG, resulting in useful predictor signals. Then, the spectral and cross-spectral analyses via PROC SPECTRA are used to quantify important patterns of those signals to numbers (features), namely power spectral density (PSD), cross power spectral density (CPSD), and coherency, such that the machine learning techniques in SAS^® Enterprise Miner™, can differentiate OSA states. To eliminate variations such as body build, age, gender, and health condition, we normalize each feature by the feature of its original signal (that is, ratio of PSD of ECGs WA by PSD of ECG). Moreover, because different OSA symptoms occur at different times, we account for this by taking features from adjacency minutes into analysis, and select only important ones using a decision tree model. The best classification result in the validation data (70:30) obtained from the Random Forest model is 96.83% accuracy, 96.39% sensitivity, and 97.26% specificity. The results suggest our method is well comparable to the gold standard.

Read the paper (PDF)

Woranat Wongdhamma, Oklahoma State University

Session 11670-2016:

Do SAS^® High-Performance Statistics Procedures Really Perform Highly? A Comparison of HP and Legacy Procedures

The new SAS^® High-Performance Statistics procedures were developed to respond to the growth of big data and computing capabilities. Although there is some documentation regarding how to use these new high-performance (HP) procedures, relatively little has been disseminated regarding under what specific conditions users can expect performance improvements. This paper serves as a practical guide to getting started with HP procedures in SAS^®. The paper describes the differences between key HP procedures (HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPREG, HPCORR, HPIMPUTE, and HPSUMMARY) and their legacy counterparts both in terms of capability and performance, with a particular focus on discrepancies in real time required to execute. Simulations were conducted to generate data sets that varied on the number of observations (10,000, 50,000, 100,000, 500,000, 1,000,000, and 10,000,000) and the number of variables (50, 100, 500, and 1,000) to create these comparisons.

Read the paper (PDF) | View the e-poster or slides (PDF)

Diep Nguyen, University of South Florida

Sean Joo, University of South Florida

Anh Kellermann, University of South Florida

Jeff Kromrey, University of South Florida

Jessica Montgomery, University of South Florida

Patricia Rodríguez de Gil, University of South Florida

Yan Wang, University of South Florida

E

Session 11683-2016:

Equivalence Tests

Motivated by the frequent need for equivalence tests in clinical trials, this presentation provides insights into tests for equivalence. We summarize and compare equivalence tests for different study designs, including one-parameter problems, designs with paired observations, and designs with multiple treatment arms. Power and sample size estimations are discussed. We also provide examples to implement the methods by using the TTEST, ANOVA, GLM, and MIXED procedures.

Read the paper (PDF)

Fei Wang, McDougall Scientific Ltd.

John Amrhein, McDougall Scientific Ltd.

Session 9760-2016:

Evaluation of PROC IRT Procedure for Item Response Modeling

The experimental item response theory procedure (PROC IRT), included in recently released SAS/STAT^® 13.1 and 13.2, enables item response modeling and trait estimation in SAS^®. PROC IRT enables you to perform item parameter calibration and latent trait estimation using a wide spectrum of educational and psychological research. This paper evaluates the performance of PROC IRT in item parameter recovery and under various testing conditions. The pros and cons of PROC IRT versus BILOG-MG 3.0 are presented. For practitioners of IRT models, the development of IRT-related analysis in SAS is inspiring. This analysis offers a great choice to the growing population of IRT users. A shift to SAS can be beneficial based on several features of SAS: its flexibility in data management, its power in data analysis, its convenient output delivery, and its increasing richness in graphical presentation. It is critical to ensure the quality of item parameter calibration and trait estimation before you can continue with other components, such as test scoring, test form constructions, IRT equatings, and so on.

View the e-poster or slides (PDF)

Yi-Fang Wu, ACT, Inc.

Session SAS2746-2016:

Everyone CAN be a Data Scientist: Using SAS^® Studio to Create a Custom Task for SAS^® In-Memory Statistics

SAS^® In-Memory Statistics uses a powerful interactive programming interface for analytics, aimed squarely at the data scientist. We show how the custom tasks that you can create in SAS^® Studio (a web-based programming interface) can make everyone a data scientist! We explain the Common Task Model of SAS Studio, and we build a simple task in steps that carries out the basic functionality of the IMSTAT procedure. This task can then be shared amongst all users, empowering everyone on their journey to becoming a data scientist. During the presentation, it will become clear that not only can shareable tasks be created but the developer does not have to understand coding in Java, JavaScript, or ActionScript. We also use the task we created in the Visual Programming perspective in SAS Studio.

Read the paper (PDF)

Stephen Ludlow, SAS

Session 8441-2016:

exSPLINE That: Explaining Geographic Variation in Insurance Pricing

Generalized linear models (GLMs) are commonly used to model rating factors in insurance pricing. The integration of territory rating and geospatial variables poses a unique challenge to the traditional GLM approach. Generalized additive models (GAMs) offer a flexible alternative based on GLM principles with a relaxation of the linear assumption. We explore two approaches for incorporating geospatial data in a pricing model using a GAM-based framework. The ability to incorporate new geospatial data and improve traditional approaches to territory ratemaking results in further market segmentation and a better match of price to risk. Our discussion highlights the use of the high-performance GAMPL procedure, which is new in SAS/STAT^® 14.1 software. With PROC GAMPL, we can incorporate the geographic effects of geospatial variables on target loss outcomes. We illustrate two approaches. In our first approach, we begin by modeling the predictors as regressors in a GLM, and subsequently model the residuals as part of a GAM based on location coordinates. In our second approach, we model all inputs as covariates within a single GAM. Our discussion compares the two approaches and demonstrates visualization of model outputs.

Read the paper (PDF)

Carol Frigo, State Farm

Kelsey Osterloo, State Farm Insurance

Session 9340-2016:

Exact Logistic Models for Nested Binary Data in SAS^®

The use of logistic models for independent binary data has relied first on asymptotic theory and later on exact distributions for small samples, as discussed by Troxler, Lalonde, and Wilson (2011). While the use of logistic models for dependent analysis based on exact analyses is not common, it is usually presented in the case of one-stage clustering. We present a SAS^® macro that allows the testing of hypotheses using exact methods in the case of one-stage and two-stage clustering for small samples. The accuracy of the method and the results are compared to results obtained using an R program.

Read the paper (PDF)

Kyle Irimata, Arizona State University

Jeffrey Wilson, Arizona State University

Session 12490-2016:

Exploring the Factors That Impact Injury Severity Using Hierarchical Linear Modeling (HLM)

Injury severity describes the severity of the injury to the person involved in the crash. Understanding the factors that influence injury severity can be helpful in designing mechanisms to reduce accident fatalities. In this research, we model and analyze the data as a hierarchy with three levels to answer the question what road, vehicle and driver-related factors influence injury severity. In this study, we used hierarchical linear modeling (HLM) for analyzing nested data from Fatality Analysis Reporting System (FARS). The results show that driver-related factors are directly related to injury severity. On the other hand, road conditions and vehicle characteristics have significant moderation impact on injury severity. We believe that our study has important policy implications for designing customized mechanisms specific to each hierarchical level to reduce the occurrence of fatal accidents.

Read the paper (PDF)

Session 12487-2016:

Extracting Useful Information From the Google Ngram Data Set: A General Method to Take the Growth of the Scientific Literature into Account

Recent years have seen the birth of a powerful tool for companies and scientists: the Google Ngram data set, built from millions of digitized books. It can be, and has been, used to learn about past and present trends in the use of words over the years. This is an invaluable asset from a business perspective, mostly because of its potential application in marketing. The choice of words has a major impact on the success of a marketing campaign and an analysis of the Google Ngram data set can validate or even suggest the choice of certain words. It can also be used to predict the next buzzwords in order to improve marketing on social media or to help measure the success of previous campaigns. The Google Ngram data set is a gift for scientists and companies, but it has to be used with a lot of care. False conclusions can easily be drawn from straightforward analysis of the data. It contains only a limited number of variables, which makes it difficult to extract valuable information from it. Through a detailed example, this paper shows that it is essential to account for the disparity in the genre of the books used to construct the data set. This paper argues that for the years after 1950, the data set has been constructed using a much higher proportion of scientific books than for the years before. An ingenious method is developed to approximate, for each year, this unknown proportion of books coming from the scientific literature. A statistical model accounting for that change in proportion is then presented. This model is used to analyze the trend in the use of common words of the scientific literature in the 20th century. Results suggest that a naive analysis of the trends in the data can be misleading.

Read the paper (PDF)

Aurélien Nicosia, Université Laval

Thierry Duchesne, Universite Laval

Samuel Perreault, Université Laval

F

Session 3940-2016:

Fantasizing about the Big Data of NFL Fantasy Football, or Time to Get a Life

With millions of users and peak traffic of thousands of requests a second for complex user-specific data, fantasy football offers many data design challenges. Not only is there a high volume of data transfers, but the data is also dynamic and of diverse types. We need to process data originating on the stadium playing field and user devices and make it available to a variety of different services. The system must be nimble and must produce accurate and timely responses. This talk discusses the strategies employed by and lessons learned from one of the primary architects of the National Football League's fantasy football system. We explore general data design considerations with specific examples of high availability, data integrity, system performance, and some other random buzzwords. We review some of the common pitfalls facing large-scale databases and the systems using them. And we cover some of the tips and best practices to take your data-driven applications from fantasy to reality.

Read the paper (PDF)

Clint Carpenter, Carpenter Programming

Session SAS4720-2016:

Fitting Multilevel Hierarchical Mixed Models Using PROC NLMIXED

Hierarchical nonlinear mixed models are complex models that occur naturally in many fields. The NLMIXED procedure's ability to fit linear or nonlinear models with standard or general distributions enables you to fit a wide range of such models. SAS/STAT^® 13.2 enhanced PROC NLMIXED to support multiple RANDOM statements, enabling you to fit nested multilevel mixed models. This paper uses an example to illustrate the new functionality.

Read the paper (PDF)

Raghavendra Kurada, SAS

Session SAS5601-2016:

Fitting Your Favorite Mixed Models with PROC MCMC

The popular MIXED, GLIMMIX, and NLMIXED procedures in SAS/STAT^® software fit linear, generalized linear, and nonlinear mixed models, respectively. These procedures take the classical approach of maximizing the likelihood function to estimate model. The flexible MCMC procedure in SAS/STAT can fit these same models by taking a Bayesian approach. Instead of maximizing the likelihood function, PROC MCMC draws samples (using a variety of sampling algorithms) to approximate the posterior distributions of model parameters. Similar to the mixed modeling procedures, PROC MCMC provides estimation, inference, and prediction. This paper describes how to use the MCMC procedure to fit Bayesian mixed models and compares the Bayesian approach to how the classical models would be fit with the familiar mixed modeling procedures. Several examples illustrate the approach in practice.

Read the paper (PDF)

Gordon Brown, SAS

Fang K. Chen, SAS

Maura Stokes, SAS

G

Session 7980-2016:

Generating and Testing the Properties of Randomization Sequences Created by the Adaptive Biased Coin Randomization Design

There are many methods to randomize participants in randomized control trials. If it is important to have approximately balanced groups throughout the course of the trial, simple randomization is not a suitable method. Perhaps the most common alternative method that provides balance is the blocked randomization method. A less well-known method called the treatment adaptive randomized design also achieves balance. This paper shows you how to generate an entire randomization sequence to randomize participants in a two-group clinical trial using the adaptive biased coin randomization design (ABCD), prior to recruiting any patients. Such a sequence could be used in a central randomization server. A unique feature of this method allows the user to determine the extent to which imbalance is permitted to occur throughout the sequence while retaining the probabilistic nature that is essential to randomization. Properties of sequences generated by the ABCD approach are compared to those generated by simple randomization, a variant of simple randomization that ensures balance at the end of the sequence, and by blocked randomization.

View the e-poster or slides (PDF)

Gary Foster, St Joseph's Healthcare

H

Session SAS1800-2016:

Highly Customized Graphs Using ODS Graphics

You can use annotation, modify templates, and change dynamic variables to customize graphs in SAS^®. Standard graph customization methods include template modification (which most people use to modify graphs that analytical procedures produce) and SG annotation (which most people use to modify graphs that procedures such as PROC SGPLOT produce). However, you can also use SG annotation to modify graphs that analytical procedures produce. You begin by using an analytical procedure, ODS Graphics, and the ODS OUTPUT statement to capture the data that go into the graph. You use the ODS document to capture the values that the procedure sets for the dynamic variables, which control many of the details of how the graph is created. You can modify the values of the dynamic variables, and you can modify graph and style templates. Then you can use PROC SGRENDER along with the ODS output data set, the captured or modified dynamic variables, the modified templates, and SG annotation to create highly customized graphs. This paper shows you how and provides examples.

Read the paper (PDF) | Download the data file (ZIP)

Warren Kuhfeld, SAS

I

Session 11420-2016:

Integrating SAS^® and R to Perform Optimal Propensity Score Matching

In studies where randomization is not possible, imbalance in baseline covariates (confounding by indication) is a fundamental concern. Propensity score matching (PSM) is a popular method to minimize this potential bias, matching individuals who received treatment to those who did not, to reduce the imbalance in pre-treatment covariate distributions. PSM methods continue to advance, as computing resources expand. Optimal matching, which selects the set of matches that minimizes the average difference in propensity scores between mates, has been shown to outperform less computationally intensive methods. However, many find the implementation daunting. SAS/IML^® software allows the integration of optimal matching routines that execute in R, e.g. the R optmatch package. This presentation walks through performing optimal PSM in SAS^® through implementing R functions, assessing whether covariate trimming is necessary prior to PSM. It covers the propensity score analysis in SAS, the matching procedure, and the post-matching assessment of covariate balance using SAS/STAT^® 13.2 and SAS/IML procedures.

Read the paper (PDF)

Lucy D'Agostino McGowan, Vanderbilt University

Robert Greevy, Department of Biostatistics, Vanderbilt University

L

Session 1720-2016:

Limit of Detection (LoD) Estimation Using Parametric Curve Fitting to (Hit) Rate Data: The LOD_EST SAS^® Macro

The Limit of Detection (LoD) is defined as the lowest concentration or amount of material, target, or analyte that is consistently detectable (for polymerase chain reaction [PCR] quantitative studies, in at least 95% of the samples tested). In practice, the estimation of the LoD uses a parametric curve fit to a set of panel member (PM1, PM2, PM3, and so on) data where the responses are binary. Typically, the parametric curve fit to the percent detection levels takes on the form of a probit or logistic distribution. The SAS^® PROBIT procedure can be used to fit a variety of distributions, including both the probit and logistic. We introduce the LOD_EST SAS macro that takes advantage of the SAS PROBIT procedure's strengths and returns an information-rich graphic as well as a percent detection table with associated 95% exact (Clopper-Pearson) confidence intervals for the hit rates at each level.

Read the paper (PDF) | Download the data file (ZIP)

Jesse Canchola, Roche Molecular Systems, Inc.

Pari Hemyari, Roche Molecular Systems, Inc.

M

Session 9080-2016:

MCMC in SAS^®: From Scratch or by PROC

Markov chain Monte Carlo (MCMC) algorithms are an essential tool in Bayesian statistics for sampling from various probability distributions. Many users prefer to use an existing procedure to code these algorithms, while others prefer to write an algorithm from scratch. We demonstrate the various capabilities in SAS^® software to satisfy both of these approaches. In particular, we first illustrate the ease of using the MCMC procedure to define a structure. Then we step through the process of using SAS/IML^® to write an algorithm from scratch, with examples of a Gibbs sampler and a Metropolis-Hastings random walk.

Read the paper (PDF)

Chelsea Lofland, University of California, Santa Cruz

Session 10761-2016:

Medicare Fraud Analytics Using Cluster Analysis: How PROC FASTCLUS Can Refine the Identification of Peer Comparison Groups

Although limited to a small fraction of health care providers, the existence and magnitude of fraud in health insurance programs requires the use of fraud prevention and detection procedures. Data mining methods are used to uncover odd billing patterns in large databases of health claims history. Efficient fraud discovery can involve the preliminary step of deploying automated outlier detection techniques in order to classify identified outliers as potential fraud before an in-depth investigation. An essential component of the outlier detection procedure is the identification of proper peer comparison groups to classify providers as within-the-norm or outliers. This study refines the concept of peer comparison group within the provider category and considers the possibility of distinct billing patterns associated with medical or surgical procedure codes identifiable by the Berenson-Eggers Type of System (BETOS). The BETOS system covers all HCPCS codes (Health Care Procedure Coding System); assigns a HCPCS code to only one BETOS code; consists of readily understood clinical categories; consists of categories that permit objective assignment& (Center for Medicare and Medicaid Services, CMS). The study focuses on the specialty General Practice and involves two steps: first, the identification of clusters of similar BETOS-based billing patterns; and second, the assessment of the effectiveness of these peer comparison groups in identifying outliers. The working data set is a sample of the summary of 2012 data of physicians active in health care government programs made publicly available by the CMS through its website. The analysis uses PROC FASTCLUS, the SAS^® cubic clustering criterion approach, to find the optimal number of clusters in the data. It also uses PROC ROBUSTREG to implement a multivariate adaptive threshold outlier detection method.

Read the paper (PDF) | Download the data file (ZIP)

Paulo Macedo, Integrity Management Services

Session 11381-2016:

Multiple-Group Multiple Imputation: An Empirical Comparison of Parameter Recovery in the Presence of Missing Data using SAS^®

It is well documented in the literature the impact of missing data: data loss (Kim & Curry, 1977) and consequently loss of power (Raaijmakers, 1999), and biased estimates (Roth, Switzer, & Switzer, 1999). If left untreated, missing data is a threat to the validity of inferences made from research results. However, the application of a missing data treatment does not prevent researchers from reaching spurious conclusions. In addition to considering the research situation at hand and the type of missing data present, another important factor that researchers should also consider when selecting and implementing a missing data treatment is the structure of the data. In the context of educational research, multiple-group structured data is not uncommon. Assessment data gathered from distinct subgroups of students according to relevant variables (e.g., socio-economic status and demographics) might not be independent. Thus, when comparing the test performance of subgroups of students in the presence of missing data, it is important to preserve any underlying group effect by applying separate multiple imputation with multiple groups before any data analysis. Using attitudinal (Likert-type) data from the Civics Education Study (1999), this simulation study evaluates the performance of multiple-group imputation and total-group multiple imputation in terms of item parameter invariance within structural equation modeling using the SAS^® procedure CALIS and item response theory using the SAS procedure MCMC.

Read the paper (PDF)

Patricia Rodriguez de Gil, University of South Florida

Jeff Kromrey, University of South Florida

P

Session 11663-2016:

PROC GLIMMIX as a Teaching and Planning Tool for Experiment Design

In our book SAS for Mixed Models, we state that 'the majority of modeling problems are really design problems.' Graduate students and even relatively experienced statistical consultants can find translating a study design into a useful model to be a challenge. Generalized linear mixed models (GLMMs) exacerbate the challenge, because they accommodate complex designs, complex error structures, and non-Gaussian data. This talk covers strategies that have proven successful in design of experiment courses and in consulting sessions. In addition, GLMM methods can be extremely useful in planning experiments. This talk discusses methods to implement precision and power analysis to help choose between competing plausible designs and to assess the adequacy of proposed designs.

Read the paper (PDF) | Download the data file (ZIP)

Walter Stroup, University of Nebraska, Lincoln

Q

Session 5620-2016:

Quantitle Regression versus Ordinary Least Squares Regression

Regression is used to examine the relationship between one or more explanatory (independent) variables and an outcome (dependent) variable. Ordinary least squares regression models the effect of explanatory variables on the average value of the outcome. Sometimes, we are more interested in modeling the median value or some other quantile (for example, the 10th or 90th percentile). Or, we might wonder if a relationship between explanatory variables and outcome variable is the same for the entire range of the outcome: Perhaps the relationship at small values of the outcome is different from the relationship at large values of the outcome. This presentation illustrates quantile regression, comparing it to ordinary least squares regression. Topics covered will include: a graphical explanation of quantile regression, its assumptions and advantages, using the SAS^® QUANTREG procedure, and interpretation of the procedure's output.

Read the paper (PDF) | Watch the recording

Ruth Croxford, Institute for Clinical Evaluative Sciences

R

Session 6500-2016:

Research Problems Arising in Sports Statistics

With advances in technology, the world of sports is now offering rich data sets that are of interest to statisticians. This talk concerns some research problems in various sports that are based on large data sets. In baseball, PITCHf/x data is used to help quantify the quality of pitches. From this, questions about pitcher evaluation and effectiveness are addressed. In cricket, match commentaries are parsed to yield ball-by-ball data in order to assist in building a match simulator. The simulator can then be used to investigate optimal lineups, player evaluation, and the assessment of fielding.

Read the paper (PDF) | Watch the recording

Session 5621-2016:

Restricted Cubic Spline Regression: A Brief Introduction

Sometimes, the relationship between an outcome (dependent) variable and the explanatory (independent) variable(s) is not linear. Restricted cubic splines are a way of testing the hypothesis that the relationship is not linear or summarizing a relationship that is too non-linear to be usefully summarized by a linear relationship. Restricted cubic splines are just a transformation of an independent variable. Thus, they can be used not only in ordinary least squares regression, but also in logistic regression, survival analysis, and so on. The range of values of the independent variable is split up, with knots defining the end of one segment and the start of the next. Separate curves are fit to each segment. Overall, the splines are defined so that the resulting fitted curve is smooth and continuous. This presentation describes when splines might be used, how the splines are defined, choice of knots, and interpretation of the regression results.

Read the paper (PDF) | Watch the recording

Ruth Croxford, Institute for Clinical Evaluative Sciences

S

Session 10260-2016:

SAS^® Macro for Generalized Method of Moments Estimation for Longitudinal Data with Time-Dependent Covariates

Longitudinal data with time-dependent covariates is not readily analyzed as there are inherent, complex correlations due to the repeated measurements on the sampling unit and the feedback process between the covariates in one time period and the response in another. A generalized method of moments (GMM) logistic regression model (Lalonde, Wilson, and Yin 2014) is one method for analyzing such correlated binary data. While GMM can account for the correlation due to both of these factors, it is imperative to identify the appropriate estimating equations in the model. Cai and Wilson (2015) developed a SAS^® macro using SAS/IML^® software to fit GMM logistic regression models with extended classifications. In this paper, we expand the use of this macro to allow for continuous responses and as many repeated time points and predictors as possible. We demonstrate the use of the macro through two examples, one with binary response and another with continuous response.

Read the paper (PDF)

Katherine Cai, Arizona State University

Jeffrey Wilson, Arizona State University

Session 10960-2016:

SAS^® and R: A Perfect Combination for Sports Analytics

Revolution Analytics reports more than two million R users worldwide. SAS^® has the capability to use R code, but users have discovered a slight learning curve to performing certain basic functions such as getting data from the web. R is a functional programming language while SAS is a procedural programming language. These differences create difficulties when first making the switch from programming in R to programming in SAS. However, SAS/IML^® software enables integration between the two languages by enabling users to write R code directly into SAS/IML. This paper details the process of using the SAS/IML command Submit /R and the R package XML to get data from the web into SAS/IML. The project uses public basketball data for each of the 30 NBA teams over the past 35 years, taken directly from Basketball-Reference.com. The data was retrieved from 66 individual web pages, cleaned using R functions, and compiled into a final data set composed of 48 variables and 895 records. The seamless compatibility between SAS and R provide an opportunity to use R code in SAS for robust modeling. The resulting analysis provides a clear and concise approach for those interested in pursuing sports analytics.

View the e-poster or slides (PDF)

Matt Collins, University of Alabama

Taylor Larkin, The University of Alabama

Session 7640-2016:

Simulation of Imputation Effects under Different Assumptions

Missing data is something that we cannot always prevent. Data can be missing due to subjects' refusing to answer a sensitive question or in fear of embarrassment. Researchers often assume that their data are missing completely at random or missing at random.' Unfortunately, we cannot test whether the mechanism condition is satisfied because missing values cannot be calculated. Alternatively, we can run simulation in SAS^® to observe the behaviors of missing data under different assumptions: missing completely at random, missing at random, and being ignorable. We compare the effects from imputation methods if we assign a set of variable of interests to missing. The idea of imputation is to see how efficient substituted values in a data set affect further studies. This lets the audience decide which method(s) would be best to approach a data set when it comes to missing data.

View the e-poster or slides (PDF)

Danny Rithy, California Polytechnic State University

Soma Roy, California Polytechnic State University

Session SAS4900-2016:

Statistical Model Building for Large, Complex Data: Five New Directions in SAS/STAT^® Software

The increasing size and complexity of data in research and business applications require a more versatile set of tools for building explanatory and predictive statistical models. In response to this need, SAS/STAT^® software continues to add new methods. This presentation takes you on a high-level tour of five recent enhancements: new effect selection methods for regression models with the GLMSELECT procedure, model selection for generalized linear models with the HPGENSELECT procedure, model selection for quantile regression with the HPQUANTSELECT procedure, construction of generalized additive models with the GAMPL procedure, and building classification and regression trees with the HPSPLIT procedure. For each of these approaches, the presentation reviews its key concepts, uses a basic example to illustrate its benefits, and guides you to information that will help you get started.

Read the paper (PDF)

Bob Rodriguez, SAS

T

Session 6541-2016:

The Baker's Dozen: What Every Biostatistician Needs to Know

It's impossible to know all of SAS^® or all of statistics. There will always be some technique that you don't know. However, there are a few techniques that anyone in biostatistics should know. If you can calculate those with SAS, life is all the better. In this session you will learn how to compute and interpret a baker's dozen of these techniques, including several statistics that are frequently confused. The following statistics are covered: prevalence, incidence, sensitivity, specificity, attributable fraction, population attributable fraction, risk difference, relative risk, odds ratio, Fisher's exact test, number needed to treat, and McNemar's test. With these 13 tools in their tool chest, even nonstatisticians or statisticians who are not specialists will be able to answer many common questions in biostatistics. The fact that each of these can be computed with a few statements in SAS makes the situation all the sweeter. Bring your own doughnuts.

Read the paper (PDF)

AnnMaria De Mars, 7 Generation Games

Session 8080-2016:

The HPSUMMARY Procedure: An Old Friend's Younger (and Brawnier) Cousin

The HPSUMMARY procedure provides data summarization tools to compute basic descriptive statistics for variables in a SAS^® data set. It is a high-performance version of the SUMMARY procedure in Base SAS^®. Though PROC SUMMARY is popular with data analysts, PROC HPSUMMARY is still a new kid on the block. The purpose of this paper is to provide an introduction to PROC HPSUMMARY by comparing it with its well-known counterpart, PROC SUMMARY. The comparison focuses on differences in syntax, options, and performance in terms of execution time and memory usage. Sample code, outputs, and SAS log snippets are provided to illustrate the discussion. Simulated large data is used to observe the performance of the two procedures. Using SAS^® 9.4 installed on a single-user machine with four cores available, preliminary experiments that examine performance of the two procedures show that HPSUMMARY is more memory-efficient than SUMMARY when the data set is large. (For example, SUMMARY failed due to insufficient memory, whereas HPSUMMARY finished successfully). However, there is no evidence of a performance advantage of the HPSUMMARY over the SUMMARY procedures in this single-user machine.

Read the paper (PDF) | View the e-poster or slides (PDF)

Anh Kellermann, University of South Florida

Jeff Kromrey, University of South Florida

Session 11770-2016:

Time Series Analysis: U.S. Military Casualties in the Pacific Theater during World War II

This paper shows how SAS^® can be used to obtain a Time Series Analysis of data regarding World War II. This analysis tests whether Truman's justification for the use of atomic weapons was valid. Truman believed that by using the atomic weapons, he would prevent unacceptable levels of U.S. casualties that would be incurred in the course of a conventional invasion of the Japanese islands.

Read the paper (PDF) | View the e-poster or slides (PDF)

Rachael Becker, University of Central Florida

Session SAS6403-2016:

Tips and Strategies for Mixed Modeling with SAS/STAT^® Procedures

Inherently, mixed modeling with SAS/STAT^® procedures (such as GLIMMIX, MIXED, and NLMIXED) is computationally intensive. Therefore, considerable memory and CPU time can be required. The default algorithms in these procedures might fail to converge for some data sets and models. This encore presentation of a paper from SAS Global Forum 2012 provides recommendations for circumventing memory problems and reducing execution times for your mixed-modeling analyses. This paper also shows how the new HPMIXED procedure can be beneficial for certain situations, as with large sparse mixed models. Lastly, the discussion focuses on the best way to interpret and address common notes, warnings, and error messages that can occur with the estimation of mixed models in SAS^® software.

Read the paper (PDF) | Watch the recording

Phil GIbbs, SAS

U

Session 3160-2016:

Using Administrative Databases for Research: Propensity Score Methods to Adjust for Bias

Health care and other programs collect large amounts of information in order to administer the program. A health insurance plan, for example, probably records every physician service, hospital and emergency department visit, and prescription medication--information that is collected in order to make payments to the various providers (hospitals, physicians, pharmacists). Although the data are collected for administrative purposes, these databases can also be used to address research questions, including questions that are unethical or too expensive to answer using randomized experiments. However, when subjects are not randomly assigned to treatment groups, we worry about assignment bias--the possibility that the people in one treatment group were healthier, smarter, more compliant, etc., than those in the other group, biasing the comparison of the two treatments. Propensity score methods are one way to adjust the comparisons, enabling research using administrative data to mimic research using randomized controlled trials. In this presentation, I explain what the propensity score is, how it is used to compare two treatments, and how to implement a propensity score analysis using SAS^®. Two methods using propensity scores are presented: matching and inverse probability weighting. Examples are drawn from health services research using the administrative databases of the Ontario Health Insurance Plan, the single payer for medically necessary care for the 13 million residents of Ontario, Canada. However, propensity score methods are not limited to health care. They can be used to examine the impact of any nonrandomized program, as long as there is enough information to calculate a credible propensity score.

Read the paper (PDF)

Ruth Croxford, Institute for Clinical Evaluative Sciences

Session 1680-2016:

Using GENMOD to Analyze Correlated Data on Military Health System Beneficiaries Receiving Behavioral Health Care in South Carolina Health Care System

Many SAS^® procedures can be used to analyze large amounts of correlated data. This study was a secondary analysis of data obtained from the South Carolina Revenue and Fiscal Affairs Office (RFA). The data includes medical claims from all health care systems in South Carolina (SC). This study used the SAS procedure GENMOD to analyze a large amount of correlated data about Military Health Care (MHS) system beneficiaries who received behavioral health care in South Carolina Health Care Systems from 2005 to 2014. Behavioral health (BH) was defined by Major Diagnostic Code (MDC) 19 (mental disorders and diseases) and 20 (alcohol/drug use). MDCs are formed by dividing all possible principal diagnoses from the International Classification Diagnostic (ICD-9) codes into 25 mutually exclusive diagnostic categories. The sample included a total of 6,783 BH visits and 4,827 unique adult and child patients that included military service members, veterans, and their adult and child dependents who have MHS insurance coverage. PROC GENMOD included a multivariate GEE model with type of BH visit (mental health or substance abuse) as the dependent variable, and with gender, race group, age group, and discharge year as predictors. Hospital ID was used in the repeated statement with different correlation structures. Gender was significant with both independent correlation (p = .0001) and exchangeable structure (p = .0003). However, age group was significant using the independent correlation (p = .0160), but non-significant using the exchangeable correlation structure (p = .0584). SAS is a powerful statistical program for analyzing large amounts of correlated data with categorical outcomes.

Read the paper (PDF) | View the e-poster or slides (PDF)

Abbas Tavakoli, University of South Carolina

Jordan Brittingham, USC/ Arnold School of Public Health

Nikki R. Wooten, USC/College of Social Work

Session 11886-2016:

Using PROC HPSPLIT for Prediction with Classification and Regression

The HPSPLIT procedure, a powerful new addition to SAS/STAT^®, fits classification and regression trees and uses the fitted trees for interpretation of data structures and for prediction. This presentation shows the use of PROC HPSPLIT to analyze a number of data sets from different subject areas, with a focus on prediction among subjects and across landscapes. A critical component of these analyses is the use of graphical tools to select a model, interpret the fitted trees, and evaluate the accuracy of the trees. The use of different kinds of cross validation for model selection and model assessment is also discussed.

Read the paper (PDF)

Session 9520-2016:

Using Parametric and Nonparametric Tests to Assess the Decision of the NBA's 2014-2015 MVP Award

Stephen Curry, James Harden, and LeBron James are considered to be three of the most gifted professional basketball players in the National Basketball Association (NBA). Each year the Kia Most Valuable Player (MVP) award is given to the best player in the league. Stephen Curry currently holds this title, followed by James Harden and LeBron James, the first two runners-up. The decision for MVP was made by a panel of judges comprised of 129 sportswriters and broadcasters, along with fans who were able to cast their votes through NBA.com. Did the judges make the correct decision? Is there statistical evidence that indicates that Stephen Curry is indeed deserving of this prestigious title over James Harden and LeBron James? Is there a significant difference between the two runners-up? These are some of the questions that are addressed through this project. Using data collected from NBA.com for the 2014-2015 season, a variety of parametric and nonparametric k-sample methods were used to test 20 quantitative variables. In an effort to determine which of the three players is the most deserving of the MVP title, post-hoc comparisons were also conducted on the variables that were shown to be significant. The time-dependent variables were standardized, because there was a significant difference in the number of minutes each athlete played. These variables were then tested and compared with those that had not been standardized. This led to significantly different outcomes, indicating that the results of the tests could be misleading if the time variable is not taken into consideration. Using the standardized variables, the results of the analyses indicate that there is a statistically significant difference in the overall performances of the three athletes, with Stephen Curry outplaying the other two players. However, the difference between James Harden and LeBron James is not so clear.

Read the paper (PDF) | View the e-poster or slides (PDF)

Sherrie Rodriguez, Kennesaw State University

Session SAS5480-2016:

Using SAS^® Simulation Studio to Test and Validate SAS/OR^® Optimization Models

In many discrete-event simulation projects, the chief goal is to investigate the performance of a system. You can use output data to better understand the operation of the real or planned system and to conduct various what-if analyses. But you can also use simulation for validation--specifically, to validate a solution found by an optimization model. In optimization modeling, you almost always need to make some simplifying assumptions about the details of the system you are modeling. These assumptions are especially important when the system includes random variation--for example, in the arrivals of individuals, their distinguishing characteristics, or the time needed to complete certain tasks. A common approach holds each random element at some nominal value (such as the mean of its observed values) and proceeds with the optimization. You can do better. In order to test an optimization model and its underlying assumptions, you can build a simulation model of the system that uses the optimal solution as an input and simulates the system's detailed behavior. The simulation model helps determine how well the optimal solution holds up when randomness and perhaps other logical complexities (which the optimization model might have ignored, summarized, or modeled only approximately) are accounted for. Simulation might confirm the optimization results or highlight areas of concern in the optimization model. This paper describes cases in which you can use simulation and optimization together in this manner and discusses the positive implications of this complementary analytic approach. For the reader, prior experience with optimization and simulation is helpful but not required.

Read the paper (PDF) | Download the data file (ZIP)

Ed Hughes, SAS

Emily Lada, SAS Institute Inc.

Leo Lopes, SAS Institute Inc.

Imre Polik, SAS

Session 10729-2016:

Using SAS^® to Conduct Multivariate Statistical Analysis in Educational Research: Exploratory Factor Analysis and Confirmatory Factor Analysis

Multivariate statistical analysis plays an increasingly important role as the number of variables being measured increases in educational research. In both cognitive and noncognitive assessments, many instruments that researchers aim to study contain a large number of variables, with each measured variable assigned to a specific factor of the bigger construct. Recalling the educational theories or empirical research, the factor of each instrument usually emerges the same way. Two types of factor analysis are widely used in order to understand the latent relationships among these variables based on different scenarios. (1) Exploratory factor analysis (EFA), which is performed by using the SAS^® procedure PROC FACTOR, is an advanced statistical method used to probe deeply into the relationship among the variables and the larger construct and then develop a customized model for the specific assessment. (2) When a model is established, confirmatory factor analysis (CFA) is conducted by using the SAS procedure PROC CALIS to examine the model fit of specific data and then make adjustments for the model as needed. This paper presents the application of SAS to conduct these two types of factor analysis to fulfill various research purposes. Examples using real noncognitive assessment data are demonstrated, and the interpretation of the fit statistics is discussed.

Read the paper (PDF)

Jun Xu, Educational Testing Service

Steven Holtzman, Educational Testing Service

Kevin Petway, Educational Testing Service

Lili Yao, Educational Testing Service

Session 10725-2016:

Using SAS^® to Implement Simultaneous Linking in Item Response Theory

The objective of this study is to use the GLM procedure in SAS^® to solve a complex linkage problem with multiple test forms in educational research. Typically, the ABSORB option in the GLM procedure makes this task relatively easy to implement. Note that for educational assessments, to apply one-dimensional combinations of two-parameter logistic (2PL) models (Hambleton, Swaminathan, and Rogers 1991, ch. 1) and generalized partial credit models (Muraki 1997) to a large-scale high-stakes testing program with very frequent administrations requires a practical approach to link test forms. Haberman (2009) suggested a pragmatic solution of simultaneous linking to solve the challenging linking problem. In this solution, many separately calibrated test forms are linked by the use of least-squares methods. In SAS, the GLM procedure can be used to implement this algorithm by the use of the ABSORB option for the variable that specifies administrations, as long as the data are sorted by order of administration. This paper presents the use of SAS to examine the application of this proposed methodology to a simple case of real data.

Read the paper (PDF)

Lili Yao, Educational Testing Service

Session 2101-2016:

Using the Kaplan-Meier Product-Limit Estimator to Adjust NFL Yardage Averages

Average yards per reception, as well as number of touchdowns, are commonly used to rank National Football League (NFL) players. However, scoring touchdowns lowers the player's average since it stops the play and therefore prevents the player from gaining more yardage. If yardages are tabulated in a life table, then yardage from a touchdown play is denoted as a right-censored observation. This paper discusses the application of the SAS/STAT^® Kaplan-Meier product-limit estimator to adjust these averages. Using 15 seasons of NFL receiving data, the relationship between touchdown rates and average yards per reception is compared, before and after adjustment. The modification of adjustments when a player incurred a 2-point safety during the season is also discussed.

Read the paper (PDF)

Keith Curtis, USAA

Session SAS3161-2016:

Using the OPTMODEL Procedure in SAS/OR^® to Find the k Best Solutions

Because optimization models often do not capture some important real-world complications, a collection of optimal or near-optimal solutions can be useful for decision makers. This paper uses various techniques for finding the k best solutions to the linear assignment problem in order to illustrate several features recently added to the OPTMODEL procedure in SAS/OR^® software. These features include the network solver, the constraint programming solver (which can produce multiple solutions), and the COFOR statement (which allows parallel execution of independent solver calls).

Read the paper (PDF) | Download the data file (ZIP)

Rob Pratt, SAS

W

Session SAS4201-2016:

Writing Packages: A New Way to Distribute and Use SAS/IML^® Programs

SAS/IML^® 14.1 enables you to author, install, and call packages. A package consists of SAS/IML source code, documentation, data sets, and sample programs. Packages provide a simple way to share SAS/IML functions. An expert who writes a statistical analysis in SAS/IML can create a package and upload it to the SAS/IML File Exchange. A nonexpert can download the package, install it, and immediately start using it. Packages provide a standard and uniform mechanism for sharing programs, which benefits both experts and nonexperts. Packages are very popular with users of other statistical software, such as R. This paper describes how SAS/IML programmers can construct, upload, download, and install packages. They're not wrapped in brown paper or tied up with strings, but they'll soon be a few of your favorite things!

Read the paper (PDF)

Rick Wicklin, SAS