Education Papers A-Z

A
Paper 1603-2015:
A Model Family for Hierarchical Data with Combined Normal and Conjugate Random Effects
Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering, which in turn might result from repeatedly measuring the outcome, from measuring various members of the same family, and so on. The first issue is dealt with through a variety of overdispersion models, such as the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Such random effects are conventionally, though not always, assumed to be normally distributed. While both phenomena might occur simultaneously, models combining them are uncommon. This paper proposes a broad class of generalized linear models that accommodates overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary, count, and time-to-event cases are given particular emphasis. Apart from model formulation, we present an overview of estimation methods and then settle on maximum likelihood estimation with analytic-numerical integration. Implications for the derivation of marginal correlation functions are discussed. The methodology is applied to data from a study of epileptic seizures, a clinical trial for a toenail infection named onychomycosis, and survival data in children with asthma.
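To give a flavor of the combined-model idea (an illustrative sketch, not the authors' code), a count outcome can be fit in PROC NLMIXED by integrating the conjugate gamma effect analytically, which yields a negative binomial kernel, while the normal random intercept is integrated numerically; the data set and variable names (y, trt, id) are hypothetical:

  proc nlmixed data=epilepsy qpoints=20;
     parms beta0=0 beta1=0 sigma2=1 alpha=1;
     eta = beta0 + beta1*trt + b;       /* normal random intercept b */
     mu  = exp(eta);
     /* negative binomial log likelihood: the conjugate gamma
        overdispersion effect has been integrated out analytically */
     ll = lgamma(y + 1/alpha) - lgamma(1/alpha) - lgamma(y + 1)
          + y*log(alpha*mu) - (y + 1/alpha)*log(1 + alpha*mu);
     model y ~ general(ll);
     random b ~ normal(0, sigma2) subject=id;
  run;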
Read the paper (PDF). | Watch the recording.
Geert Molenberghs, Universiteit Hasselt & KU Leuven
Paper 3492-2015:
Alien Nation: Text Analysis of UFO Sightings in the US Using SAS® Enterprise Miner™ 13.1
Are we alone in this universe? This is a question that undoubtedly passes through every mind several times during a lifetime. We often hear stories about close encounters, Unidentified Flying Object (UFO) sightings, and other mysterious things, but we lack documented evidence for analysis on this topic. UFOs have been a matter of public interest for a long time. The objective of this paper is to analyze a database containing documented reports of UFO sightings to uncover any fascinating stories hidden in the data. Using SAS® Enterprise Miner™ 13.1, the powerful capabilities of text analytics and topic mining are leveraged to summarize the associations between reported sightings. We used PROC GEOCODE to convert the addresses of sightings to locations on the map. Then we used the GMAP procedure to produce a heat map representing the frequency of sightings in various locations. The GEOCODE procedure converts address data to geographic coordinates (latitude and longitude values), which can then be used on a map to calculate distances or to perform spatial analysis. Preliminary analysis of the data found that the most popular words associated with UFOs describe their shapes, formations, movements, and colors. The Text Profile node in SAS Enterprise Miner 13.1 was leveraged to build a model and cluster the data into different levels of a segment variable, to explain how opinions about the UFO sightings change over time, and to find interesting terms and topics used to describe the sightings. Based on feedback received at the SAS® Analytics conference, we plan to incorporate a technique to filter duplicate comments and to include the weather at each sighting location.
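As a rough sketch of the mapping steps described above (data set and variable names are hypothetical, and the raw report data would need cleaning first), the geocoding and heat-map passes might look like this:

  /* assign latitude/longitude from a ZIP variable on each report */
  proc geocode method=zip data=sightings out=sightings_geo
               lookup=sashelp.zipcode;
  run;

  /* shade states by the frequency of reported sightings;
     state_counts must carry the numeric STATE code used by maps.us */
  proc gmap data=state_counts map=maps.us;
     id state;
     choro sightings / levels=5;
  run;
  quit;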
Read the paper (PDF). | Download the data file (ZIP).
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines
Naresh Abburi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Zabiulla Mohammed, Oklahoma State University
Paper 2600-2015:
An Introductory Overview of the Features of Complex Survey Data
A complex survey data set is one characterized by any combination of four features: stratification, clustering, unequal weights, or finite population correction factors. In this paper, we provide context for why these features might appear in data sets produced from surveys, highlight some of the formulaic modifications they introduce, and outline the syntax needed to properly account for them. Specifically, we explain why you should use the SURVEY family of SAS/STAT® procedures, such as PROC SURVEYMEANS or PROC SURVEYREG, to analyze data of this type. Although many of the syntax examples are drawn from a fictitious expenditure survey, we also discuss the origins of complex survey features in three real-world survey efforts sponsored by statistical agencies of the United States government--namely, the National Ambulatory Medical Care Survey, the National Survey of Family Growth, and the Commercial Buildings Energy Consumption Survey.
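As a minimal sketch of the syntax in question (all data set and variable names here are hypothetical), three design statements plus the TOTAL= option cover the four complex survey features:

  /* total= names a data set of stratum population totals,
     which supplies the finite population correction */
  proc surveymeans data=expenditures total=strtotals mean sum;
     strata region;     /* stratification  */
     cluster psu;       /* clustering      */
     weight expwt;      /* unequal weights */
     var amount;
  run;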
Read the paper (PDF). | Watch the recording.
Taylor Lewis, University of Maryland
Paper SAS1759-2015:
An Overview of Econometrics Tools in SAS/ETS®: Explaining the Past and Modeling the Future
The importance of econometrics in the analytics toolkit is increasing every day. Econometric modeling helps uncover structural relationships in observational data. This paper highlights the many recent changes to the SAS/ETS® portfolio that increase your power to explain the past and predict the future. Examples show how you can use Bayesian regression tools for price elasticity modeling, use state space models to gain insight from inconsistent time series, use panel data methods to help control for unobserved confounding effects, and much more.
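For example, a Bayesian log-log regression for price elasticity can be sketched as follows, with hypothetical data set and variable names (in a log-log model the slope is the elasticity; this uses PROC MCMC for concreteness, and the paper's own SAS/ETS examples may differ in detail):

  proc mcmc data=sales nmc=20000 seed=27513 outpost=post;
     parms b0 0 elast 0 s2 1;
     prior b0 elast ~ normal(0, var=1e4);
     prior s2 ~ igamma(2, scale=2);
     mu = b0 + elast*log_price;    /* elast estimates the price elasticity */
     model log_qty ~ normal(mu, var=s2);
  run;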
Read the paper (PDF).
Mark Little, SAS
Kenneth Sanford, SAS
Paper 3327-2015:
Automated Macros to Extract Data from the National (Nationwide) Inpatient Sample (NIS)
The usefulness of administrative databases for understanding practice patterns in the real world has become increasingly apparent, and this is essential in the current health-care environment. The Affordable Care Act has helped us to better understand the current use of technology and different approaches to surgery. This paper describes a method for extracting specific information about surgical procedures from the Healthcare Cost and Utilization Project (HCUP) database (also referred to as the National (Nationwide) Inpatient Sample (NIS)). The analyses provide a framework for comparing the different modalities of surgical procedures of interest. Using an NIS database for a single year, we identify cohorts based on surgical approach by identifying the ICD-9 codes specific to robotic surgery, laparoscopic surgery, and open surgery. After we identify the appropriate codes using an ARRAY statement, a similar array is created based on the ICD-9 codes. Any minimally invasive procedure (robotic or laparoscopic) that results in a conversion is flagged as a conversion. Comorbidities are identified by ICD-9 codes representing the severity of each subject and merged with the NIS inpatient core file. Using a FORMAT statement for all diagnosis variables, we create macros that can be regenerated for each type of complication. These macros are compiled and stored in a SAS® library; four of them are then called to build the tables, each invoked with different macro variables. They also compute the frequencies for all cohorts and create the table structure, including each table's title and number. This paper describes a systematic method in SAS/STAT® 9.2 to extract the data from the NIS using the ARRAY statement for the specific ICD-9 codes, to format the extracted data for analysis, to merge the different NIS databases by procedure, and to use automated macros to generate the report.
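The flavor of the ARRAY-based cohort flagging can be sketched as follows; the data set names, code lists, and field counts are illustrative only, not the paper's actual ICD-9 specifications:

  data cohorts;
     set nis.core;                        /* hypothetical NIS core file     */
     array prcode {15} $ pr1-pr15;        /* ICD-9-CM procedure code fields */
     array dxcode {25} $ dx1-dx25;        /* ICD-9-CM diagnosis code fields */
     robotic = 0; laparoscopic = 0; converted = 0;
     do i = 1 to dim(prcode);
        if prcode{i} in: ('174') then robotic = 1;       /* 17.4x robotic assist */
        if prcode{i} =   '5421'  then laparoscopic = 1;  /* illustrative code    */
     end;
     /* V64.41: laparoscopic procedure converted to open */
     do i = 1 to dim(dxcode);
        if dxcode{i} = 'V6441' and (robotic or laparoscopic) then converted = 1;
     end;
     drop i;
  run;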
Read the paper (PDF).
Ravi Tejeshwar Reddy Gaddameedi, California State University, East Bay
Usha Kreaden, Intuitive Surgical
B
Paper 3120-2015:
"BatchStats": SAS® Batch Statistics, A Click Away!
Over the years, the SAS® Business Intelligence platform has proved its importance in this big data world with its suite of applications that enable us to efficiently process, analyze, and transform huge amounts of business data. Within the data warehouse universe, 'batch execution' sits at the heart of SAS Data Integration technologies. On a day-to-day basis, batches run, and the current status of the batch is generally sent out to the team or to the client as a 'static' e-mail or report. From experience, we know that these don't provide much insight into the real 'bits and bytes' of a batch run. Imagine if the status of the running batch were automatically captured in one central repository and presented in a web browser on your computer or iPad. All this can be achieved without asking anybody to send reports, and with all 'post-batch' queries answered automatically with a click. This paper presents a framework designed specifically to automate the reporting aspects of SAS batches. And yes, it is all about collecting statistics of the batch, so we call it 'BatchStats.'
Prajwal Shetty, Tesco HSC
Paper 3643-2015:
Before and After Models in Observational Research Using Random Slopes and Intercepts
In observational data analyses, it is often helpful to use patients as their own controls by comparing their outcomes before and after some signal event, such as the initiation of a new therapy. It might be useful to have a control group that does not have the event but that is instead evaluated before and after some arbitrary point in time, such as their birthday. In this context, the change over time is a continuous outcome that can be modeled as a (possibly discontinuous) line, with the same or different slope before and after the event. Mixed models can be used to estimate random slopes and intercepts and compare patients between groups. A specific example published in a peer-reviewed journal is presented.
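A minimal sketch of such a model in PROC MIXED, with hypothetical variable names (post flags observations after the event, and time_after holds time elapsed since it, allowing both a jump and a slope change at the event):

  proc mixed data=outcomes;
     class patid group;
     model y = group time post time_after
               group*time group*post group*time_after / solution;
     /* subject-specific random intercepts and slopes */
     random intercept time / subject=patid type=un;
  run;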
Read the paper (PDF).
David Pasta, ICON Clinical Research
C
Paper 2080-2015:
Calculate Decision Consistency Statistics for a Single Administration of a Test
Many certification programs classify candidates into performance levels. For example, the SAS® Certified Base Programmer credential classifies candidates into two performance levels: Pass and Fail. It is important to note that because all test scores contain measurement error, the performance-level categorizations based on those test scores are also subject to measurement error. An important part of psychometric analysis is to estimate the decision consistency of the classifications. This study helps fill a gap by estimating decision consistency statistics for a single administration of a test using SAS.
Read the paper (PDF).
Fan Yang, The University of Iowa
Yi Song, University of Illinois at Chicago
Paper 1329-2015:
Causal Analytics: Testing, Targeting, and Tweaking to Improve Outcomes
This session is an introduction to predictive analytics and causal analytics in the context of improving outcomes. The session covers the following topics: 1) Basic predictive analytics vs. causal analytics; 2) The causal analytics framework; 3) Testing whether the outcomes improve because of an intervention; 4) Targeting the cases that have the best improvement in outcomes because of an intervention; and 5) Tweaking an intervention in a way that improves outcomes further.
Read the paper (PDF).
Jason Pieratt, Humana
Paper 3291-2015:
Coding Your Own MCMC Algorithm
In Bayesian statistics, Markov chain Monte Carlo (MCMC) algorithms are an essential tool for sampling from probability distributions, and PROC MCMC automates much of this sampling. However, it is often desirable to code an algorithm from scratch, especially in academia, where students are expected to be able to understand and code an MCMC algorithm themselves. SAS®'s ability to accomplish this is relatively little known, yet the task is quite straightforward. We use SAS/IML® to demonstrate methods for coding an MCMC algorithm, with examples of a Gibbs sampler and a Metropolis-Hastings random walk.
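A bare-bones Metropolis-Hastings random walk in SAS/IML®, targeting a standard normal density for illustration (a sketch in the spirit of the paper, not its actual code):

  proc iml;
     start logf(x);                /* log target density, up to a constant */
        return(-0.5 * x##2);
     finish;

     call randseed(802);
     n = 10000;
     chain = j(n, 1, 0);           /* chain started at 0 */
     e = 0;  u = 0;  acc = 0;
     do i = 2 to n;
        call randgen(e, "Normal", 0, 0.5);     /* random-walk proposal */
        cand = chain[i-1] + e;
        call randgen(u, "Uniform");
        /* accept with probability min(1, f(cand)/f(current)) */
        if log(u) < logf(cand) - logf(chain[i-1]) then do;
           chain[i] = cand;  acc = acc + 1;
        end;
        else chain[i] = chain[i-1];
     end;
     print (acc/(n-1))[label="acceptance rate"];
  quit;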
Read the paper (PDF).
Chelsea Lofland, University of California Santa Cruz
Paper 3249-2015:
Cutpoint Determination Methods in Survival Analysis Using SAS®: Updated %FINDCUT Macro
Statistical analyses that use data from clinical or epidemiological studies include continuous variables such as a patient's age, blood pressure, and various biomarkers. Over the years, there has been an increase in studies that assess associations between biomarkers and diseases of interest, and many biomarkers are measured as continuous variables. Investigators seek to identify a possible cutpoint for classifying patients as high risk versus low risk based on the value of the biomarker. Several data-oriented techniques, such as the median and upper quartile, and outcome-oriented techniques based on score, Wald, and likelihood ratio tests are commonly used in the literature. Contal and O'Quigley (1999) presented a technique that uses the log-rank test statistic to estimate the cutpoint. Their method was computationally intensive and hence was overlooked due to the unavailability of built-in options in standard statistical software. In 2003, we provided the %FINDCUT macro, which uses Contal and O'Quigley's approach to identify a cutpoint when the outcome of interest is measured as time to event. Over the past decade, demand for this macro has continued to grow, which has led us to update %FINDCUT to incorporate newer SAS® tools and procedures such as array processing, the Graph Template Language, and the REPORT procedure. New and updated features include results presented in a much cleaner report format, user-specified cutpoints, macro parameter error checking, temporary data set cleanup, preservation of current option settings, and increased processing speed. We present the utility and added options of the revised %FINDCUT macro using a real-life data set. In addition, we critically compare this method to some existing methods and discuss the use and misuse of categorizing a continuous covariate.
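To convey the outcome-oriented idea (a simplified sketch, not the %FINDCUT macro itself, and it omits the adjusted inference that Contal and O'Quigley's method requires), one can scan candidate cutpoints and collect the log-rank statistic from PROC LIFETEST via ODS OUTPUT; all names here are hypothetical:

  %macro cutscan(data=, time=, event=, var=, cutlist=);
     %local i cut;
     %do i = 1 %to %sysfunc(countw(&cutlist));
        %let cut = %scan(&cutlist, &i);
        data _grp;  set &data;  highrisk = (&var >= &cut);  run;
        ods output HomTests=_lr&i;
        proc lifetest data=_grp;
           time &time*&event(0);
           strata highrisk;
        run;
        data _lr&i;  set _lr&i;  cutpoint = &cut;  run;
     %end;
     data cutscan;  set _lr:;  where test = 'Log-Rank';  run;
     proc sort data=cutscan;  by descending chisq;  run;  /* best cut first */
  %mend cutscan;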
Read the paper (PDF).
Jay Mandrekar, Mayo Clinic
Jeffrey Meyers, Mayo Clinic
D
Paper 3321-2015:
Data Summarization for a Dissertation: A Grad Student How-To Paper
Graduate students often need to explore data and summarize multiple statistical models into tables for a dissertation. The challenges of data summarization include coding multiple, similar statistical models and summarizing those models into meaningful tables for review. The default method is to type (or copy and paste) results into tables, which often takes longer than creating and running the analyses. Students might spend hours creating tables, only to have to start over when a change or correction in the underlying data requires the analyses to be updated. This paper gives graduate students the tools to efficiently summarize the results of statistical models in tables. These tools include macro-based SAS/STAT® analyses and the ODS OUTPUT statement, which captures statistics into meaningful tables. Specifically, we summarize PROC GLM and PROC LOGISTIC output, converting an analysis of hospital-acquired delirium from hundreds of pages of output into three formatted Microsoft Excel files. This paper is appropriate for users familiar with basic macro language.
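The core trick is the ODS OUTPUT statement, which turns any printed table into a data set that can be formatted once and reused; a minimal sketch with hypothetical names:

  ods output ParameterEstimates=pe OddsRatios=or;
  proc logistic data=analysis descending;
     class exposure / param=ref;
     model delirium = exposure age sex;
  run;

  /* one tidy summary table instead of pages of listing output */
  ods excel file="model_summary.xlsx";  /* SAS 9.4; use tagsets.excelxp earlier */
  proc print data=or noobs;  run;
  ods excel close;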
Read the paper (PDF).
Elisa Priest, Texas A&M University Health Science Center
Ashley Collinsworth, Baylor Scott & White Health/Tulane University
Paper 3201-2015:
Designing Big Data Analytics Undergraduate and Postgraduate Programs for Employability by Using National Skills Frameworks
There is a widely forecast skills gap developing between the number of Big Data Analytics (BDA) graduates and the predicted jobs market. Many universities are developing innovative programs to increase the number of BDA graduates and postgraduates. The University of Derby has recently developed two new programs that aim to be unique and to offer applicants highly attractive and career-enhancing programs of study. One is an undergraduate Joint Honours program that pairs analytics with a range of alternative subject areas; the other is a Master's program with a specific emphasis on governance and ethics. A critical aspect of both programs is the synthesis of a Personal Development Planning (PDP) framework that enables students to evaluate their current status, identifies the steps needed to develop toward their career goals, and provides a means of recording their achievements with evidence that can be used in job applications. In the UK, we have two sources of skills frameworks that can be synthesized to provide a self-assessment matrix for students to use as their PDP toolkit: the Skills Framework for the Information Age (SFIA-Plus) developed by the SFIA Foundation, and the Student Employability Profiles developed by the Higher Education Academy. A new set of National Occupational Standards (NOS) for Data Science, Data Management, and Data Analysis has recently been released by the organization e-Skills UK for consultation, with significant input from SAS® UK. This paper demonstrates how curricula have been developed to meet the Big Data Analytics skills shortfall by using these frameworks and how the frameworks can be used to guide students in the reflective development of their career plans.
Read the paper (PDF).
Richard Self, University of Derby
Paper 3061-2015:
Does Class Rank Align with Future Potential?
Today, employers and universities alike must choose the most talented individuals from a large pool. However, it is difficult to determine whether a student's A+ in English means that he or she can write as proficiently as another student who writes as a hobby. As a result, there are now dozens of ways to compare individuals along one spectrum or another. For example, the ACT and SAT enable universities to view a student's performance on a test given to all applicants in order to help determine whether the student will be successful. High schools use students' GPAs to compare them to one another for academic opportunities. The WorkKeys Exam enables employers to rate prospective employees on their ability to perform day-to-day business activities. Rarely, however, are standardized tests and in-class performance compared to each other. We used SAS® to analyze the GPAs and WorkKeys Exam results of 285 seniors who attend the Phillip O. Berry Academy. The purpose was to compare class standing to what a student can prove he or she knows in a standardized environment. Emphasis is on the use of PROC SQL in SAS® 9.3 rather than DATA step processing.
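As a small taste of the PROC SQL flavor of the analysis (table and variable names are hypothetical), a correlated subquery can compute class rank alongside each student's WorkKeys result:

  proc sql;
     create table rank_compare as
     select s.student_id,
            s.gpa,
            w.workkeys_total,
            (select count(*) from seniors g
             where g.gpa > s.gpa) + 1 as class_rank
     from seniors s
          inner join workkeys w
          on s.student_id = w.student_id
     order by class_rank;
  quit;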
Read the paper (PDF). | Download the data file (ZIP).
Jonathan Gomez Martinez, Phillip O Berry Academy of Technology
Christopher Simpson, Phillip O Berry Academy of Technology
Paper 3347-2015:
Donor Sentiment and Characteristic Analysis Using SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio
It has always been a million-dollar question: what inhibits a donor from donating? Many successful universities have deep roots in annual giving, and we know donor sentiment is a key factor in engaging donors. This paper is a summary of findings about donor behaviors using textual analysis combined with the power of predictive modeling. In addition to identifying the characteristics of donors, the paper focuses on identifying the characteristics of first-time donors, distinguishing their features from the general donor pattern and leveraging the variations in the data to provide deeper insights into behavioral patterns. A data set containing 247,000 records was obtained from the XYZ University Foundation alumni database, Facebook, and Twitter. Solicitation content, such as the email subject lines sent to the prospect base, was considered. Time-dependent and time-independent data were categorized to make unbiased predictions about first-time donors. The predictive models use inputs such as age, educational records, scholarships, events, student memberships, and solicitation methods. Models such as decision trees, Dmine regression, and neural networks were built to predict the prospects. SAS® Sentiment Analysis Studio and SAS® Enterprise Miner™ were used to analyze the sentiment.
Read the paper (PDF).
Ramcharan Kakarla, Comcast
Goutam Chakraborty, Oklahoma State University
E
Paper 3190-2015:
Educating Future Business Leaders in the Era of Big Data
At NC State University, our motto is Think and Do. When it comes to educating students in the Poole College of Management, that means that we want them to not only learn to think critically but also to gain hands-on experience with the tools that will enable them to be successful in their careers. And, in the era of big data, we want to ensure that our students develop skills that will help them to think analytically in order to use data to drive business decisions. One method that lends itself well to thinking and doing is the case study approach. In this paper, we discuss the case study approach for teaching analytical skills and highlight the use of SAS® software for providing practical, hands-on experience with manipulating and analyzing data. The approach is illustrated with examples from specific case studies that have been used for teaching introductory and intermediate courses in business analytics.
Read the paper (PDF).
Tonya Balan, NC State University
Paper 3384-2015:
Exploring Number Theory Using Base SAS®
Long before analysts began mining large data sets, mathematicians sought truths hidden within the set of natural numbers. This exploration was formalized into the mathematical subfield known as number theory. Though this discipline has proven fruitful for many applied fields, number theory delights in numerical truth for its own sake. The austere and abstract beauty of number theory prompted nineteenth-century mathematician Carl Friedrich Gauss to dub it 'The Queen of Mathematics.' This session and the related paper demonstrate that the analytical power of the SAS® engine is well suited to exploring concepts in number theory. In Base SAS®, students and educators will find a powerful arsenal for exploring, illustrating, and visualizing the following: prime numbers, perfect numbers, the Euclidean algorithm, the prime number theorem, Euler's totient function, Chebyshev's theorem, the Chinese remainder theorem, the Gauss circle problem, and more! The paper delivers custom SAS procedures and code segments that generate data sets relevant to the exploration of topics commonly found in elementary number theory texts. The efficiency of these algorithms is discussed, and an emphasis is placed on structuring data sets to maximize flexibility and ease in exploring new conjectures and illustrating known theorems. Last, the power of SAS plotting is put to use in unexpected ways to visualize and convey number-theoretic facts. This session and the related paper are geared toward the academic user or any SAS user who welcomes and revels in mathematical diversions.
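In that spirit, a single DATA step suffices to tabulate primality (by trial division) and Euler's totient for the first thousand integers, ready for plotting; this is an illustrative fragment, not one of the paper's custom procedures:

  data numtheory;
     do n = 2 to 1000;
        isprime = 1;                      /* trial-division primality test   */
        do d = 2 to floor(sqrt(n)) while (isprime);
           if mod(n, d) = 0 then isprime = 0;
        end;
        phi = 0;                          /* Euler's totient by direct count */
        do k = 1 to n;
           if gcd(k, n) = 1 then phi + 1;
        end;
        output;
     end;
     keep n isprime phi;
  run;

  proc sgplot data=numtheory;             /* the totient's familiar fan shape */
     scatter x=n y=phi / markerattrs=(size=3);
  run;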
Read the paper (PDF).
Matthew Duchnowski, Educational Testing Service
F
Paper 3434-2015:
From Backpacks to Briefcases: Making the Transition from Grad School to a Paying Job
During grad school, students learn SAS® in class or on their own for a research project. Time is limited, so faculty have to focus on what they know are the fundamental skills that students need to successfully complete their coursework. However, real-world research projects are often multifaceted and require a variety of SAS skills. When students transition from grad school to a paying job, they might find that in order to be successful, they need more than the basic SAS skills that they learned in class. This paper highlights 10 insights that I've had over the past year during my transition from grad school to a paying SAS research job. I hope this paper will help other students make a successful transition. Top 10 insights: 1. You still get graded, but there is no syllabus. 2. There isn't time for perfection. 3. Learn to use your resources. 4. There is more than one solution to every problem. 5. Asking for help is not a weakness. 6. Working with a team is required. 7. There is more than one type of SAS®. 8. The skills you learned in school are just the basics. 9. Data is complicated and often frustrating. 10. You will continue to learn both personally and professionally.
Read the paper (PDF).
Lauren Hall, Baylor Scott & White Health
Elisa Priest, Texas A&M University Health Science Center
Paper SAS1580-2015:
Functional Modeling of Longitudinal Data with the SSM Procedure
In many studies, a continuous response variable is repeatedly measured over time on one or more subjects. The subjects might be grouped into different categories, such as cases and controls. The study of resulting observation profiles as functions of time is called functional data analysis. This paper shows how you can use the SSM procedure in SAS/ETS® software to model these functional data by using structural state space models (SSMs). A structural SSM decomposes a subject profile into latent components such as the group mean curve, the subject-specific deviation curve, and the covariate effects. The SSM procedure enables you to fit a rich class of structural SSMs, which permit latent components that have a wide variety of patterns. For example, the latent components can be different types of smoothing splines, including polynomial smoothing splines of any order and all L-splines up to order 2. The SSM procedure efficiently computes the restricted maximum likelihood (REML) estimates of the model parameters and the best linear unbiased predictors (BLUPs) of the latent components (and their derivatives). The paper presents several real-life examples that show how you can fit, diagnose, and select structural SSMs; test hypotheses about the latent components in the model; and interpolate and extrapolate these latent components.
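A single-profile flavor of such a model might be sketched as follows (hypothetical data set and variable names; multi-subject decompositions into group and subject-specific curves need additional component specifications):

  proc ssm data=profiles;
     id t;
     /* mean curve as a second-order polynomial smoothing spline */
     trend curve(ps(2));
     irregular wn;                /* white-noise observation error */
     model y = curve wn;
     output out=ssm_fit;          /* smoothed component estimates  */
  run;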
Read the paper (PDF). | Download the data file (ZIP).
Rajesh Selukar, SAS
G
Paper 1886-2015:
Getting Started with Data Governance
While there has been tremendous progress in technologies related to data storage, high-performance computing, and advanced analytic techniques, organizations have only recently begun to comprehend the importance of parallel strategies that help manage the cacophony of concerns around access, quality, provenance, data sharing, and use. While data governance is not new, the drumbeat around it, along with master data management and data quality, is approaching a crescendo. Intensified by the increase in consumption of information, expectations about ubiquitous access, and highly dynamic visualizations, these factors are also circumscribed by security and regulatory constraints. In this paper, we provide a summary of what data governance is and its importance. We go beyond the obvious and provide practical guidance on what it takes to build out a data governance capability appropriate to the scale, size, and purpose of the organization and its culture. Moreover, we discuss best practices in the form of requirements that highlight what we think is important to consider as you provide that tactical linkage between people, policies, and processes to the actual data lifecycle. To that end, our focus includes the organization and its culture, people, processes, policies, and technology. Further, we include discussions of organizational models as well as the role of the data steward, and provide guidance on how to formalize data governance into a sustainable set of practices within your organization.
Read the paper (PDF). | Watch the recording.
Greg Nelson, ThotWave
Lisa Dodson, SAS
H
Paper 3181-2015:
How Best to Effectively Teach the SAS® Language
Learning a new programming language is not an easy task, especially for someone who does not have any programming experience. Learning the SAS® programming language can be even more challenging. One of the reasons is that the SAS System consists of a variety of languages, such as the DATA step language, SAS macro language, Structured Query Language for the SQL procedure, and so on. Furthermore, each of these languages has its own unique characteristics and simply learning the syntax is not sufficient to grasp the language essence. Thus, it is not unusual to hear about someone who has learned SAS for several years and has never become a SAS programming expert. By using the DATA step language as an example, I would like to share some of my experiences on effectively teaching the SAS language.
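For instance, much of the DATA step's essence lies in its implicit per-observation loop and in which variables persist across iterations, something a two-line contrast makes visible (an illustrative example, not taken from the paper):

  data running;
     set scores;          /* implicit loop: these statements run once per row */
     doubled = score * 2; /* reset to missing at the top of each iteration    */
     total + score;       /* sum statement: retained across iterations        */
  run;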
Read the paper (PDF).
Arthur Li, City of Hope National Medical Center
Paper 3214-2015:
How is Your Health? Using SAS® Macros, ODS Graphics, and GIS Mapping to Monitor Neighborhood and Small-Area Health Outcomes
With the constant need to inform researchers about neighborhood health data, the Santa Clara County Health Department created socio-demographic and health profiles for 109 neighborhoods in the county. Data was pulled from many public and county data sets, compiled, analyzed, and automated using SAS®. With over 60 indicators and 109 profiles, an efficient set of macros was used to automate the calculation of percentages, rates, and mean statistics for all of the indicators. Macros were also used to automate individual census tracts into pre-decided neighborhoods to avoid data entry errors. Simple SQL procedures were used to calculate and format percentages within the macros, and output was pushed out using Output Delivery System (ODS) Graphics. This output was exported to Microsoft Excel, which was used to create a sortable database for end users to compare cities and/or neighborhoods. Finally, the automated SAS output was used to map the demographic data using geographic information system (GIS) software at three geographies: city, neighborhood, and census tract. This presentation describes the use of simple macros and SAS procedures to reduce resources and time spent on checking data for quality assurance purposes. It also highlights the simple use of ODS Graphics to export data to an Excel file, which was used to mail merge the data into 109 unique profiles. The presentation is aimed at intermediate SAS users at local and state health departments who might be interested in finding an efficient way to run and present health statistics given limited staff and resources.
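The percentage macros might be sketched along these lines (hypothetical names; the actual profiles involved many more indicators and formats):

  %macro pct(ds=, num=, den=, out=);
     proc sql;
        create table &out as
        select "&num" as indicator length=40,
               sum(&num) / sum(&den) * 100 as pct format=5.1
        from &ds;
     quit;
  %mend pct;

  /* one call per indicator per neighborhood roll-up */
  %pct(ds=tract_rollup, num=obese, den=respondents, out=pct_obese)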
Read the paper (PDF).
Roshni Shah, Santa Clara County
Paper 3252-2015:
How to Use SAS® for GMM Logistic Regression Models for Longitudinal Data with Time-Dependent Covariates
In longitudinal data, it is important to account for the correlation due to repeated measures and time-dependent covariates. The generalized method of moments (GMM) can be used to estimate the coefficients in longitudinal data, although there are currently only limited procedures in SAS® that produce GMM estimates for correlated data. In a recent paper, Lalonde, Wilson, and Yin provided a GMM model for estimating the coefficients in this type of data. PROC IML was used to generate the equations that must be solved to determine which estimating equations to use. In addition, this study extended the classification of moment conditions to include a type IV covariate. Two data sets were evaluated using this method: re-hospitalization rates from a Medicare database, and body mass index and future morbidity rates among Filipino children. Both examples contain binary responses, repeated measures, and time-dependent covariates. However, while this technique is useful, it is tedious, and determining the matrices necessary to obtain the estimating equations can be complicated. We provide a concise and user-friendly macro to fit GMM logistic regression models with extended classifications.
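The authors' GMM macro itself is not reproduced here, but for orientation, the conventional GEE fit that such GMM methods aim to improve on, under working independence, looks like this (hypothetical variable names):

  proc genmod data=long descending;
     class id;
     model rehosp = x1 x2 visit / dist=binomial link=logit;
     repeated subject=id / type=ind;    /* working independence */
  run;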
Read the paper (PDF).
Katherine Cai, Arizona State University
I
Paper 2320-2015:
Implementing a Discrete Event Simulation Using the American Community Survey and the SAS® University Edition
SAS® University Edition is a great addition to the world of freely available analytic software, and this 'how-to' presentation shows you how to implement a discrete event simulation using Base SAS® to model future US Veterans population distributions. Features include generating a slideshow using ODS output to PowerPoint.
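A toy version of the pipeline (a deliberately simplistic cohort-attrition loop plus the ODS POWERPOINT destination of SAS 9.4; the rate and names are made up, not the author's model):

  data projection;
     pop = 1000000;
     do year = 2015 to 2040;
        pop = pop * 0.98;       /* illustrative 2% annual attrition */
        output;
     end;
  run;

  ods powerpoint file="projection.pptx";
  proc sgplot data=projection;
     series x=year y=pop;
  run;
  ods powerpoint close;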
Read the paper (PDF). | Download the data file (ZIP).
Michael Grierson
Paper 3343-2015:
Improving SAS® Global Forum Papers
Just as research is built on existing research, the references section is an important part of a research paper. The purpose of this study is to find the differences between professionals and academicians with respect to the references section of a paper. Data is collected from SAS® Global Forum 2014 Proceedings. Two research hypotheses are supported by the data. First, the average number of references in papers by academicians is higher than those by professionals. Second, academicians follow standards for citing references more than professionals. Text mining is performed on the references to understand the actual content. This study suggests that authors of SAS Global Forum papers should include more references to increase the quality of the papers.
Read the paper (PDF).
Vijay Singh, Oklahoma State University
Pankush Kalgotra, Oklahoma State University
Paper SAS1965-2015:
Improving the Performance of Data Mining Models with Data Preparation Using SAS® Enterprise Miner™
In data mining modelling, data preparation is the most crucial, most difficult, and longest part of the mining process, and many steps are involved. Consider the simple distribution analysis of the variables, the diagnosis and reduction of multicollinearity among variables, the imputation of missing values, and the construction of categories within variables. In this presentation, we use data mining models in different areas such as marketing, insurance, retail, and credit risk. We show how to implement data preparation through SAS® Enterprise Miner™ using different approaches, from simple code routines to complex processes involving statistical insight, variable clustering, variable transformation, graphical analysis, decision trees, and more.
Read the paper (PDF).
Ricardo Galante, SAS
K
Paper 2480-2015:
Kaplan-Meier Survival Plotting Macro %NEWSURV
The research areas of pharmaceuticals and oncology clinical trials greatly depend on time-to-event endpoints such as overall survival and progression-free survival. One of the best graphical displays of these analyses is the Kaplan-Meier curve, which can be simple to generate with the LIFETEST procedure but difficult to customize. Journal articles generally prefer that statistics such as the median time-to-event, the number of patients, and time-point event-free rate estimates be displayed within the graphic itself, and this was previously difficult to do without an external program such as Microsoft Excel. The macro %NEWSURV takes advantage of the Graph Template Language (GTL) that was added with the SG graphics engine to create this level of customizability without the need for back-end manipulation. Taking this one step further, the macro was improved to generate a lattice of multiple unique Kaplan-Meier curves for side-by-side comparisons or for condensing figures for publications. This paper describes the functionality of the macro and explains how its key elements work.
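For reference, the plain LIFETEST display that %NEWSURV re-renders through GTL starts from something like this (hypothetical ADaM-style names; the in-graph statistics are what the macro adds):

  ods graphics on;
  proc lifetest data=adtte plots=survival(atrisk cb);
     time aval*cnsr(1);     /* cnsr=1 flags censored times */
     strata trt;
  run;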
Read the paper (PDF).
Jeffrey Meyers, Mayo Clinic
Paper 3023-2015:
Killing Them with Kindness: Policies Not Based on Data Might Do More Harm Than Good
Educational administrators sometimes have to make decisions based on what they believe is in the best interest of their students because they do not have the data they need at the time. Some administrators do not even know that the data exist to help them make their decisions. However, well-intentioned policies that are not based on facts can sometimes do more harm than good for the students and the institution. This presentation discusses the results of the policy analyses conducted by the Office of Institutional Research at Western Kentucky University using Base SAS®, SAS/STAT®, SAS® Enterprise Miner™, and SAS® Visual Analytics. The researchers analyzed Western Kentucky University's math course placement procedure for incoming students and assessed the criteria used for admissions decisions, including those for first-time first-year students, transfer students, and students readmitted to the University after being dismissed for unsatisfactory academic progress--procedures and criteria previously designed with the students' best interests at heart. The presenters discuss the statistical analyses used to evaluate the policies and the use of SAS Visual Analytics to present their results to administrators in a visual manner. In addition, the presenters discuss subsequent changes in the policies, and where possible, the results of the policy changes.
Read the paper (PDF).
Tuesdi Helbig, Western Kentucky University
Matthew Foraker, Western Kentucky University
L
Paper 3203-2015:
Learning Analytics to Evaluate and Confirm Pedagogic Choices
There are many pedagogic theories and practices that academics research and follow as they strive to ensure excellence in their students' achievements. In order to validate the impact of different approaches, there is a need to apply analytical techniques to evaluate the changing levels of achievement that occur as a result of changes in applied pedagogy. The analytics used should be easily accessible to all academics, with minimal overhead in terms of the collection of new data. This paper is based on a case study of the changing pedagogical approaches of the author over the past five years, using grade profiles from a wide range of modules taught by the author in both the School of Computing and Maths and the Business School at the University of Derby. Base SAS® and SAS® Studio were used to evaluate and demonstrate the impact of the change from a pedagogical position of 'Academic as Domain Expert' to one of 'Academic as Learning-to-Learn Expert.' This change resulted in greater levels of research that supported learning, along with better writing skills. The application of learning analytics in this case study demonstrates a very significant improvement in the grade profiles of all students, of between 15% and 20%. More surprisingly, it demonstrates that the change also eliminates a significant grade deficit in the black and minority ethnic student population, a deficit that is typically about 15% at a large number of UK universities.
Read the paper (PDF).
Richard Self, University of Derby
M
Paper 2400-2015:
Modeling Effect Modification and Higher-Order Interactions: A Novel Approach for Repeated Measures Design Using the LSMESTIMATE Statement in SAS® 9.4
Effect modification occurs when the association between a predictor of interest and the outcome is differential across levels of a third variable--the modifier. Effect modification is statistically tested as the interaction effect between the predictor and the modifier. In repeated measures studies (with more than two time points), higher-order (three-way) interactions must be considered to test effect modification by adding time to the interaction terms. Custom fitting and constructing these repeated measures models are difficult and time consuming, especially with respect to estimating post-fitting contrasts. With the advancement of the LSMESTIMATE statement in SAS®, a simplified approach can be used to custom test for higher-order interactions with post-fitting contrasts within a mixed model framework. This paper provides a simulated example with tips and techniques for using an application of the nonpositional syntax of the LSMESTIMATE statement to test effect modification in repeated measures studies. This approach, which is applicable to exploring modifiers in randomized controlled trials (RCTs), goes beyond the treatment effect on outcome to a more functional understanding of the factors that can enhance, reduce, or change this relationship. Using this technique, we can easily identify differential changes for specific subgroups of individuals or patients that subsequently impact treatment decision making. We provide examples of conventional approaches to higher-order interaction and post-fitting tests using the ESTIMATE statement and compare and contrast this to the nonpositional syntax of the LSMESTIMATE statement. The merits and limitations of this approach are discussed.
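A skeleton of the idea with hypothetical names (two treatments, two modifier levels, three visits): the nonpositional syntax names the levels of each class variable directly, so the three-way difference-in-differences contrast at the final visit reads off cleanly:

  proc mixed data=rmdata;
     class subject trt modifier time;
     model y = trt|modifier|time / solution;
     repeated time / subject=subject type=un;
     /* (trt1 - trt2) at modifier 1 minus (trt1 - trt2) at modifier 2,
        all at time 3: each bracket is [coefficient, trt modifier time] */
     lsmestimate trt*modifier*time "effect modification at final visit"
        [ 1, 1 1 3] [-1, 1 2 3] [-1, 2 1 3] [ 1, 2 2 3] / e;
  run;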
Read the paper (PDF). | Download the data file (ZIP).
Pronabesh DasMahapatra, PatientsLikeMe Inc.
Ryan Black, NOVA Southeastern University
Paper 2900-2015:
Multiple Ways to Detect Differential Item Functioning in SAS®
Differential item functioning (DIF), as an assessment tool, has been widely used in quantitative psychology, educational measurement, business management, insurance, and health care. The purpose of DIF analysis is to detect response differences of items in questionnaires, rating scales, or tests across different subgroups (for example, gender) and to ensure the fairness and validity of each item for those subgroups. The goal of this paper is to demonstrate several ways to conduct DIF analysis by using different SAS® procedures (PROC FREQ, PROC LOGISTIC, PROC GENMOD, PROC GLIMMIX, and PROC NLMIXED) and their applications. There are three general methods to examine DIF: generalized Mantel-Haenszel (MH), logistic regression, and item response theory (IRT). The SAS® System provides flexible procedures for all these approaches. There are two types of DIF: uniform DIF, which remains consistent across ability levels, and non-uniform DIF, which varies across ability levels. Generalized MH is a nonparametric method and is often used to detect uniform DIF, while the other two are parametric methods that examine both uniform and non-uniform DIF. In this study, I first describe the underlying theories and mathematical formulations for each method. Then I show the SAS statements, input data format, and SAS output for each method, followed by a detailed demonstration of the differences among the three methods. Specifically, PROC FREQ is used to calculate generalized MH only for dichotomous items. PROC LOGISTIC and PROC GENMOD are used to detect DIF by using logistic regression. PROC NLMIXED and PROC GLIMMIX are used to examine DIF by applying an exploratory item response theory model. Finally, I use SAS/IML® to call two R packages (difR and lordif) to conduct DIF analysis and then compare the results between the SAS procedures and the R packages. An example data set, the Verbal Aggression assessment, which includes 316 subjects and 24 items, is used in this study. Following the general DIF analysis, the male group is used as the reference group, and the female group is used as the focal group. All analyses are conducted in SAS® 9.3 and R 2.15.3. The paper closes with the conclusion that the SAS System provides different flexible and efficient ways to conduct DIF analysis. However, it is essential for SAS users to understand the underlying theories and assumptions of the different DIF methods and to apply them appropriately in their DIF analyses.
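Two of the approaches can be sketched in a few lines each (hypothetical data set in which item1 is a scored item, score is the matching total score, and gender is the grouping variable):

  /* generalized Mantel-Haenszel DIF, stratifying on the matching score */
  proc freq data=items;
     tables score*gender*item1 / cmh;
  run;

  /* logistic regression DIF: gender tests uniform DIF,
     gender*score tests non-uniform DIF */
  proc logistic data=items descending;
     class gender / param=ref;
     model item1 = score gender gender*score;
  run;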
Read the paper (PDF).
Yan Zhang, Educational Testing Service
P
Paper 2440-2015:
Permit Me to Permute: A Basic Introduction to Permutation Tests with SAS/IML® Software
If your data do not meet the assumptions for a standard parametric test, you might want to consider using a permutation test. By randomly shuffling the data and recalculating a test statistic, a permutation test can calculate the probability of getting a value equal to or more extreme than an observed test statistic. With the power of matrices, vectors, functions, and user-defined modules, the SAS/IML® language is an excellent option. This paper covers two examples of permutation tests: one for paired data and another for repeated measures analysis of variance. For those new to SAS/IML® software, this paper offers a basic introduction and examples of how effective it can be.
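For the paired case, the permutation scheme amounts to randomly flipping the signs of the within-pair differences; a compact SAS/IML® sketch with a hypothetical data set named paired:

  proc iml;
     use paired;  read all var {before after};  close paired;
     d = after - before;
     obs = mean(d);                          /* observed mean difference */
     call randseed(20150426);
     nrep = 10000;
     stat = j(nrep, 1, .);
     flips = j(nrow(d), 1, .);
     do r = 1 to nrep;
        call randgen(flips, "Bernoulli", 0.5);
        stat[r] = mean(d # (2*flips - 1));   /* sign-flipped replicate   */
     end;
     pval = (abs(stat) >= abs(obs))[:];      /* two-sided p-value        */
     print obs pval;
  quit;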
Read the paper (PDF).
John Vickery, North Carolina State University
Paper 3307-2015:
Preparing Output from Statistical Procedures for Publication, Part 1: PROC REG to APA Format
Many scientific and academic journals require that statistical tables be created in a specific format, with one of the most common formats being that of the American Psychological Association (APA). The APA publishes a substantial guide book to writing and formatting papers, including an extensive section on creating tables (Nichol 2010). However, the output generated by SAS® procedures does not match this style. This paper discusses techniques to change the SAS procedure output to match the APA guidelines using SAS ODS (Output Delivery System).
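One way to approach this (a sketch, not the authors' exact templates) is to capture the estimates with ODS OUTPUT and re-lay them with PROC REPORT using APA-style column labels:

  ods output ParameterEstimates=pe;
  proc reg data=study;
     model y = x1 x2 x3;
  run;
  quit;

  proc report data=pe nowd;
     column variable estimate stderr tvalue probt;
     define variable / display 'Variable';
     define estimate / format=8.2 'B';
     define stderr   / format=8.2 'SE B';
     define tvalue   / format=8.2 't';
     define probt    / format=pvalue6.4 'p';
  run;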
Read the paper (PDF).
Vince DelGobbo, SAS
Peter Flom, Peter Flom Consulting
Paper 2103-2015:
Preparing Students for the Real World with SAS® Studio
A common complaint of employers is that educational institutions do not prepare students for the types of messy data and multi-faceted requirements that occur on the job. No organization has data that resembles the perfectly scrubbed data sets in the back of a statistics textbook. The objective of the Annual Report Project is to quickly bring new SAS® users to a level of competence where they can use real data to meet real business requirements. Many organizations need annual reports for stockholders, funding agencies, or donors. Or, they need annual reports at the department or division level for an internal audience. Being tapped as part of the team creating an annual report used to mean weeks of tedium, poring over columns of numbers in 8-point font in (shudder) Excel spreadsheets, but no more. No longer painful, using a few SAS procedures and functions, reporting can be easy and, dare I say, fun. All analyses are done using SAS® Studio (formerly SAS® Web Editor) of SAS OnDemand for Academics. This paper uses an example with actual data for a report prepared to comply with federal grant funding requirements as proof that, yes, it really is that simple.
Read the paper (PDF). | Watch the recording.
AnnMaria De Mars
R
Paper 2601-2015:
Replication Techniques for Variance Approximation
Replication techniques such as the jackknife and the bootstrap have become increasingly popular in recent years, particularly within the field of complex survey data analysis. The premise of these techniques is to treat the data set as if it were the population and repeatedly sample from it in some systematic fashion. From each sample, or replicate, the estimate of interest is computed, and the variability of the estimate from the full data set is approximated by a simple function of the variability among the replicate-specific estimates. An appealing feature is that there is generally only one variance formula per method, regardless of the underlying quantity being estimated. The entire process can be efficiently implemented after appending a series of replicate weights to the analysis data set. As will be shown, the SURVEY family of SAS/STAT® procedures can be exploited to facilitate both the task of appending the replicate weights and approximating variances.
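Once the replicate weights are on the file, the analysis itself takes one statement per design element; a sketch with hypothetical names and 80 jackknife replicates:

  proc surveymeans data=analysis varmethod=jackknife mean;
     weight fullwt;
     repweights repwt_1-repwt_80;   /* appended replicate weights */
     var expenditure;
  run;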
Read the paper (PDF).
Taylor Lewis, University of Maryland
S
Paper 2620-2015:
SAS® Certification as a Tool for Professional Development
In today's competitive job market, both recent graduates and experienced professionals are looking for ways to set themselves apart from the crowd. SAS® certification is one way to do that. SAS Institute Inc. offers a range of exams to validate your knowledge level. In writing this paper, we have drawn upon our personal experiences, remarks shared by new and longtime SAS users, and conversations with experts at SAS. We discuss what certification is and why you might want to pursue it. Then we share practical tips you can use to prepare for an exam and do your best on exam day.
Read the paper (PDF).
Andra Northup, Advanced Analytic Designs, Inc.
Susan Slaughter, Avocet Solutions
Paper 3510-2015:
SAS® Visual Analytics: Emerging Trend in Institutional Research
Institutional research and effectiveness offices at most institutions are often the primary beneficiaries of data warehouse (DW) technologies. However, at many institutions, building the data warehouse for growing accountability, decision support, and institutional effectiveness needs remains unfulfilled, in part due to growing data volumes as well as the prohibitive cost of data warehouses built by university IT departments. In recent years, many institutional research offices have been asked to take a leadership role in building the DW or to partner with the campus IT department to improve the efficiency and effectiveness of DW development. Within this context, the Office of Institutional Research and Effectiveness at a large public research university in the Northeast was entrusted with the responsibility of building the new campus data warehouse for growing needs such as resource allocation, competitive positioning, new program development in emerging STEM disciplines, and accountability reporting. These requirements necessitated the deployment of state-of-the-art analytical decision support applications, such as SAS® Visual Analytics (reporting and analysis) and SAS® Visual Statistics (predictive modeling), in a disparate data environment that includes PeopleSoft (student), Kuali (finance), Genesys (human resources), and a homegrown sponsored-funding database. This presentation focuses on the efforts of institutional research and effectiveness offices in developing decision support applications using the SAS® Enterprise business intelligence and analytical solutions. With users ranging from nontechnical staff to advanced analysts, greater efficiency lies in the ability to get faster and more elegant reporting from those huge stores of data and to share the resulting discoveries across departments. Most of the reporting applications were developed based on the needs of IPEDS, CUPA, the Common Data Set, US News and World Report, graduation and retention reporting, and faculty activity, and were deployed through an online web-based portal. Participants will learn how the University quickly analyzes institutional data through an easy-to-use, drag-and-drop, web-based application. This presentation demonstrates how to use SAS® Visual Analytics to quickly design reports that are attractive, interactive, and meaningful and then distribute those reports via the web or through SAS® Mobile BI on an iPad® or tablet.
Read the paper (PDF).
Sivakumar Jaganathan, University of Connecticut
Thulasi Kumar Raghuraman, University of Connecticut
Paper 3274-2015:
Statistical Analysis of Publicly Released Survey Data with SAS/STAT® Software SURVEY Procedures
Several U.S. federal agencies conduct national surveys to monitor the health status of residents, and many of these agencies release their survey data to the public. Investigators might be able to address their research objectives by conducting secondary statistical analyses with these available data sources. This paper describes the steps in using the SAS SURVEY procedures to analyze publicly released data from surveys that use probability sampling to make statistical inference to a carefully defined population of elements (the target population).
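A typical pattern for such public-use files, with hypothetical variable names standing in for the file's pseudo-stratum, pseudo-PSU, and final weight variables (note the DOMAIN statement: subsetting with a WHERE statement instead would yield incorrect standard errors):

  proc surveymeans data=pufdata mean clm;
     strata vstrat;
     cluster vpsu;
     weight finalwt;
     domain agegroup;    /* subpopulation analysis without subsetting */
     var bmi;
  run;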
Read the paper (PDF). | Watch the recording.
Donna Brogan, Emory University, Atlanta, GA
Paper 2040-2015:
Survival Analysis with Survey Data
Surveys are designed to elicit information about population characteristics. A survey design typically combines stratification and multistage sampling of intact clusters, sub-clusters, and individual units with specified probabilities of selection. A survey sample can produce valid and reliable estimates of population parameters at a fraction of the cost of carrying out a census of the entire population, with clear logistical efficiencies. For analyses of survey data, SAS® software provides a suite of procedures, from SURVEYMEANS and SURVEYFREQ for generating descriptive statistics and conducting inference on means and proportions, to regression-based analysis through SURVEYREG and SURVEYLOGISTIC. For longitudinal surveys and follow-up studies, SURVEYPHREG is designed to incorporate aspects of the survey design for analysis of time-to-event outcomes based on the Cox proportional hazards model, allowing for time-varying explanatory variables. We review the salient features of the SURVEYPHREG procedure with application to survey data from the National Health and Nutrition Examination Survey (NHANES III) Linked Mortality File.
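A sketch of such a fit, with variable names loosely patterned on the NHANES III linked mortality file (treat them as illustrative):

  proc surveyphreg data=nh3mort;
     strata sdpstra;              /* design strata               */
     cluster sdppsu;              /* design PSUs                 */
     weight wtpfqx6;              /* examination sampling weight */
     class smoker;
     model permth_exm*mortstat(0) = smoker age;   /* 0 = assumed alive */
  run;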
Read the paper (PDF).
Joseph Gardiner, Michigan State University
T
Paper 3504-2015:
The %ic_mixed Macro: A SAS Macro to Produce Sorted Information Criteria (AIC and BIC) List for PROC MIXED for Model Selection
PROC MIXED is one of the most popular SAS procedures for performing longitudinal analysis or fitting multilevel models in epidemiology. Model selection is one of the fundamental questions in model building. One of the most popular and widely used strategies is model selection based on information criteria, such as the Akaike information criterion (AIC) and the Schwarz Bayesian information criterion (BIC). This strategy considers both fit and complexity, and it enables multiple models to be compared simultaneously. However, there is no existing SAS procedure to perform model selection automatically based on information criteria for PROC MIXED, given a set of covariates. This paper shows how to use the SAS %ic_mixed macro to select the final model with the smallest AIC and BIC values. Specifically, the %ic_mixed macro does the following: 1) produces a complete list of all possible model specifications given a set of covariates, 2) uses a DO loop to read in one model specification at a time and save it in a macro variable, 3) executes PROC MIXED and uses the Output Delivery System (ODS) to output the AIC and BIC values, 4) appends all outputs and uses the DATA step to create a sorted list of information criteria with model specifications, and 5) runs PROC REPORT to produce the final summary table. Based on the sorted list of information criteria, researchers can easily identify the best model. This paper includes the macro programming language, as well as examples of the macro calls and outputs.
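The heart of steps 2-4 can be sketched as follows (hypothetical data and covariates, not the %ic_mixed macro itself; note METHOD=ML so that models with different fixed effects are comparable):

  %macro fit_one(fixed=);
     ods output FitStatistics=_fit;
     proc mixed data=mydata method=ml;
        class id;
        model y = &fixed / solution;
        random intercept / subject=id;
     run;
     data _fit;  length fixed $100;  set _fit;  fixed = "&fixed";  run;
     proc append base=all_fits data=_fit force;  run;
  %mend fit_one;

  %fit_one(fixed=age)
  %fit_one(fixed=age sex)
  %fit_one(fixed=age sex bmi)

  /* smallest AIC first (AICC rows match the filter too) */
  proc sort data=all_fits(where=(descr contains 'AIC')) out=aic_list;
     by value;
  run;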
Read the paper (PDF).
Qinlei Huang, St Jude Children's Research Hospital
Paper 3278-2015:
The Analytics Behind an NBA Name Change
For the past two academic school years, our SAS® Programming 1 class had a classroom discussion about the Charlotte Bobcats. We wondered aloud: if the Bobcats changed their team name, would the dwindling fan base return? As a class, we created a survey of 10 questions asking people whether they liked the name Bobcats, whether they attended basketball games, and whether they bought merchandise. Within a one-hour class period, our class surveyed 981 of the 1,733 students at Phillip O. Berry Academy of Technology. After collecting the data, we performed advanced analytics using Base SAS® and concluded that 75% of students and faculty at Phillip O. Berry would prefer any name other than the Bobcats. Among other results, 80% of the student body liked basketball, and the most preferred name was the Hornets, followed by the Royals, Flight, Dragons, and finally the Bobcats. The following school year, we conducted another survey to discover whether people's opinions had changed since the previous survey and whether people were happy with the Bobcats changing their name. During this time period, the Bobcats had recently reported that they had been granted the opportunity to change the team name to the Hornets. Once more, we collected and analyzed the data and concluded that 77% of people surveyed were thrilled with the name change. In addition, around 50% of those surveyed were interested in purchasing merchandise. Through this project, SAS® analytics was applied in the classroom to a real-world scenario, and the ability to see how SAS® could be applied to a question of interest and create change inspired the students in our class. The project highlights the economic impact that sports can have on a city; in particular, it focused on the nostalgia that people of the city of Charlotte felt for the name Hornets. The project opened the door for more analysis and questions and continues to spark interest, because the more people feel connected to the team and the more the team flourishes, the more Charlotte benefits.
Read the paper (PDF). | Download the data file (ZIP).
Lauren Cook, Charlotte Mecklenburg School System
Paper 3060-2015:
The Knight's Tour in Chess--Implementing a Heuristic Solution
The knight's tour is a sequence of moves on a chessboard such that a knight visits each square exactly once. Using a heuristic method, it is possible to find a complete path beginning from any arbitrary square on the board and landing on each of the remaining squares only once. However, the implementation poses challenging programming problems. For example, it is necessary to discern viable knight moves, which change throughout the tour. Even worse, the heuristic approach does not guarantee a solution. This paper explains a SAS® solution that finds a knight's tour beginning from every initial square on a chessboard...well, almost.
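The abstract does not name the heuristic, but a classic choice is Warnsdorff's rule: always move the knight to the reachable unvisited square that has the fewest onward moves. The following DATA step is a minimal sketch of that rule (the start square and output layout are arbitrary assumptions, not the paper's implementation):

  data tour;
    array b[8,8] _temporary_;                  /* missing = unvisited square */
    array dx[8] _temporary_ (1 2 2 1 -1 -2 -2 -1);
    array dy[8] _temporary_ (2 1 -1 -2 -2 -1 1 2);
    r = 1; c = 1; move = 1; b[r,c] = 1;        /* arbitrary start square */
    output;
    do move = 2 to 64;
      best = .; bestdeg = 9;
      do k = 1 to 8;                           /* examine each candidate move */
        nr = r + dx[k]; nc = c + dy[k];
        if 1 <= nr <= 8 and 1 <= nc <= 8 then
          if b[nr,nc] = . then do;
            deg = 0;                           /* count onward moves from (nr,nc) */
            do j = 1 to 8;
              mr = nr + dx[j]; mc = nc + dy[j];
              if 1 <= mr <= 8 and 1 <= mc <= 8 then
                if b[mr,mc] = . then deg + 1;
            end;
            if deg < bestdeg then do; bestdeg = deg; best = k; end;
          end;
      end;
      if best = . then do;
        put 'Tour stuck at move ' move;        /* the heuristic can fail */
        stop;
      end;
      r = r + dx[best]; c = c + dy[best]; b[r,c] = move;
      output;
    end;
    keep move r c;
  run;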
Read the paper (PDF).
John R Gerlach, Dataceutics, Inc.
Paper SAS1745-2015:
The SEM Approach to Longitudinal Data Analysis Using the CALIS Procedure
Researchers often use longitudinal data analysis to study the development of behaviors or traits. For example, they might study how an elderly person's cognitive functioning changes over time or how a therapeutic intervention affects a certain behavior over a period of time. This paper introduces the structural equation modeling (SEM) approach to analyzing longitudinal data. It describes various types of latent curve models and demonstrates how you can use the CALIS procedure in SAS/STAT® software to fit these models. Specifically, the paper covers basic latent curve models, such as unconditional and conditional models, as well as more complex models that involve multivariate responses and latent factors. All illustrations use real data that were collected in a study that looked at maternal stress and the relationship between mothers and their preterm infants. This paper emphasizes the practical aspects of longitudinal data analysis. In addition to illustrating the program code, it shows how you can interpret the estimation results and revise the model appropriately. The final section of the paper discusses the advantages and disadvantages of the SEM approach to longitudinal data analysis.
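As a flavor of the approach, an unconditional linear latent curve model for four repeated measures y1-y4 might be specified in the LINEQS language of PROC CALIS roughly as follows; this is a generic sketch with assumed data set and variable names, not the paper's maternal-stress example:

  proc calis data=growth method=ml;
    lineqs                                /* linear latent curve: slope loadings 0,1,2,3 */
      y1 = f_int +            e1,
      y2 = f_int + 1 * f_slp + e2,
      y3 = f_int + 2 * f_slp + e3,
      y4 = f_int + 3 * f_slp + e4;
    variance
      f_int = v_int, f_slp = v_slp, e1-e4 = ve1-ve4;
    cov
      f_int f_slp = c_is;                 /* covariance of initial level and growth */
    mean
      f_int = m_int, f_slp = m_slp;       /* average initial level and growth rate */
  run;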
Read the paper (PDF).
Xinming An, SAS
Yiu-Fai Yung, SAS
Paper 3200-2015:
The Use Of SAS® Maps For University Retention And Recruiting
Universities often have student data that is difficult to represent, including information about students' home locations. Often, student data is presented in tables, where patterns are easily overlooked. This study aimed to represent recruiting and retention data at the county level using SAS® mapping software for a public, land-grant university. Three years of student data from the student records database were used to visually represent enrollment, retention, and other predictors of student success. SAS® Enterprise Guide® was used along with the GMAP procedure to make user-friendly maps. Displaying data on maps at the county level revealed patterns in enrollment, retention, and other factors of interest that might otherwise have been overlooked, which can be beneficial for recruiting purposes.
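A minimal GMAP sketch of the approach (the response data set, its variable names, and the choice of South Dakota are assumptions for illustration):

  data sd_map;                                   /* subset the map to one state */
    set maps.counties;
    where state = stfips('SD');
  run;

  proc gmap data=county_counts map=sd_map all;   /* ALL draws counties with no data */
    id state county;                             /* match on state/county FIPS codes */
    choro enrolled / levels=5 coutline=gray;     /* shade counties by enrollment */
  run;
  quit;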
Read the paper (PDF).
Allison Lempola, South Dakota State University
Thomas Brandenburger, South Dakota State University
Paper 3423-2015:
To Believe or Not To Believe? The Truth of Data Analytics Results
Drawing on results from machine learning, exploratory statistics, and a variety of related methodologies, data analytics is becoming one of the hottest areas in a variety of global industries. The utility and application of these analyses have been extremely impressive and have led to successes ranging from business value generation to hospital infection control. This presentation examines the philosophical foundations (the epistemology) associated with scientific discovery and considers whether currently used analytics techniques rest on a rational philosophy of science. Examples are provided to make the concepts more concrete for the business and scientific user.
Read the paper (PDF). | Watch the recording.
Mike Hardin, The University of Alabama
U
Paper 3320-2015:
Using PROC SURVEYREG and PROC SURVEYLOGISTIC to Assess Potential Bias
The Behavioral Risk Factor Surveillance System (BRFSS) collects data on health practices and risk behaviors via telephone survey. This study focuses on the question, "On average, how many hours of sleep do you get in a 24-hour period?" Recall bias is a potential concern in interviews and questionnaires such as the BRFSS. The 2013 BRFSS data are used to illustrate the proper methods for implementing PROC SURVEYREG and PROC SURVEYLOGISTIC using the complex weighting scheme that the BRFSS provides.
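In outline, the proper setup passes the BRFSS design variables (stratum _STSTR, cluster _PSU, final weight _LLCPWT) to the survey procedures; the outcome and covariates shown here are illustrative stand-ins, not the study's final models:

  proc surveyreg data=brfss2013;        /* hours of sleep as a continuous outcome */
    strata _ststr;                      /* sampling stratum */
    cluster _psu;                       /* primary sampling unit */
    weight _llcpwt;                     /* final survey weight */
    model sleptim1 = age sex / solution;
  run;

  proc surveylogistic data=brfss2013;   /* short sleep as a binary outcome */
    strata _ststr;
    cluster _psu;
    weight _llcpwt;
    model shortsleep(event='1') = age sex;  /* shortsleep derived from the sleep item (assumption) */
  run;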
Read the paper (PDF).
Lucy D'Agostino McGowan, Vanderbilt University
Alice Toll, Vanderbilt University
Paper 3101-2015:
Using SAS® Enterprise Miner™ to Predict Breast Cancer at an Early Stage
Breast cancer is the leading cause of cancer-related deaths among women worldwide, and its early detection can reduce the mortality rate. Using a data set containing information about breast screening provided by the Royal Liverpool University Hospital, we constructed a model that can provide early indication of a patient's tendency to develop breast cancer. This data set has information about breast screening from patients who were believed to be at risk of developing breast cancer. The most important aspect of this work is that we excluded variables that are known to be associated with breast cancer, keeping as input predictors only the variables that are less likely to be associated with breast cancer or whose associations with breast cancer are unknown. The target variable is binary, with two values: 1 (indicating a type of cancer is present) and 0 (indicating a type of cancer is not present). SAS® Enterprise Miner™ 12.1 was used to perform data validation and data cleansing, to identify potentially related predictors, and to build models that can be used to predict at an early stage the likelihood of patients developing breast cancer. We compared two models: the first was built with an interactive node and a cluster node, and the second without. Classification performance was compared using a receiver operating characteristic (ROC) curve and average squared error. Interestingly, we found significantly improved model performance by using only variables that have a lesser or unknown association with breast cancer. The result shows that the logistic model with an interactive node and a cluster node has better performance, with a lower average squared error (0.059614), than the model without them. Among other benefits, this model will assist inexperienced oncologists in saving time in disease diagnosis.
Read the paper (PDF).
Gibson Ikoro, Queen Mary University of London
Beatriz de la Iglesia, University of East Anglia, Norwich, UK
Paper 3440-2015:
Using SAS® Text Analytics to Examine Gubernatorial Rhetoric and Mass Incarceration
Throughout the latter part of the twentieth century, the United States of America experienced an incredible boom in the rate of incarceration of its citizens. This increase arguably began in the 1970s, when the Nixon administration oversaw the beginning of the war on drugs in America. The U.S. now has one of the highest rates of incarceration among industrialized nations. However, the citizens who have been incarcerated on drug charges have disproportionately been African American or other racial minorities, even though many studies have concluded that drug use is fairly equal among racial groups. In order to remedy this situation, it is essential to first understand why so many more people have been arrested and incarcerated. In this research, I explore a potential explanation for the epidemic of mass incarceration. I intend to answer the question: does gubernatorial rhetoric have an effect on the rate of incarceration in a state? More specifically, I am interested in examining the language that the governor of a state uses at the annual State of the State address in order to see if there is any correlation between rhetoric and the subsequent rate of incarceration in that state. In order to understand any possible correlation, I use SAS® Text Miner and SAS® Contextual Analysis to examine the attitude toward crime in each speech. The political phenomenon that I am trying to understand is how state government employees are affected by the tone that the chief executive of a state takes toward crime, and whether the actions of these state employees subsequently lead to higher rates of incarceration. The governor is the top government official in charge of the employees of a state, so when this official addresses the state, the employees may take the governor's message as an order for how to do their jobs. While many political factors can affect legislation and its enforcement, a governor has the ability to set the tone of a state when it comes to policy issues such as crime.
Catherine Lachapelle, UNC Chapel Hill
Paper 3452-2015:
Using SAS® to Increase Profitability through Cluster Analysis and Simplicity: Follow the Money
Developing a quality product or service while at the same time improving cost management and maximizing profit are challenging goals for any company. Finding the optimal balance between efficiency and profitability is not easy. The same can be said of the development of a predictive statistical model. On the one hand, the model should predict as accurately as possible. On the other hand, having too many predictors can end up costing a company money. One of the purposes of this project is to explore the cost of simplicity: when is it worth having a simpler model, and what are some of the costs of using a more complex one? The answer to that question leads to another: how can a predictive statistical model be optimized in order to increase a company's profitability? Using data from the consumer credit risk domain provided by CompuCredit (now Atlanticus), we used logistic regression to build binary classification models to predict the likelihood of default. This project compares two of these models. Although the original data set had several hundred predictor variables and more than a million observations, I chose to use rather simple models. My goal was to develop a model with as few predictors as possible while keeping the concordance level at 80% or higher. Two models were evaluated and compared based on efficiency, simplicity, and profitability. Using the selected model, cluster analysis was then performed in order to maximize the estimated profitability. Finally, the analysis was taken one step further through a supervised segmentation process, in order to target the most profitable segment of the best cluster.
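A rough sketch of the two-stage idea, fitting a parsimonious logistic model and then clustering the scored accounts; all data set and variable names here are hypothetical, not CompuCredit's:

  proc logistic data=credit;                     /* Percent Concordant appears in the output */
    model default(event='1') = util_ratio num_delinq credit_age;
    output out=scored p=p_default;               /* predicted probability of default */
  run;

  proc fastclus data=scored maxclusters=5 out=clustered;
    var p_default balance apr;                   /* profitability-related inputs */
  run;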
Read the paper (PDF).
Sherrie Rodriguez, Kennesaw State University
Paper 1335-2015:
Using the GLIMMIX and GENMOD Procedures to Analyze Longitudinal Data from a Department of Veterans Affairs Multisite Randomized Controlled Trial
Many SAS® procedures can be used to analyze longitudinal data. This study used a multisite randomized controlled trial to demonstrate how two SAS procedures, GLIMMIX and GENMOD, can analyze longitudinal data from five Department of Veterans Affairs Medical Centers (VAMCs). Older male veterans (n = 1222) seen in VAMC primary care clinics were randomly assigned to two behavioral health models, integrated (n = 605) and enhanced referral (n = 617). Data were collected at baseline and at 3-, 6-, and 12-month follow-up. A mixed-effects repeated measures model was used to examine the dependent variable, problem drinking, which was defined both as a count and as a dichotomous outcome from baseline to 12-month follow-up. Sociodemographics and depressive symptoms were included as covariates. First, bivariate analyses included general linear model and chi-square tests to examine covariates by group and group by problem-drinking outcomes. All significant covariates were included in the GLIMMIX and GENMOD models. Then, multivariate analysis included mixed models with generalized estimating equations (GEEs). The effects of group, time, and the group-by-time interaction were examined after controlling for covariates. Multivariate results were inconsistent between GLIMMIX and GENMOD using lognormal, Gaussian, Weibull, and gamma distributions. SAS is a powerful statistical program for analyzing data from longitudinal studies.
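In outline, the two modeling approaches look like the following; the data set, variable names, and distributions are illustrative stand-ins, since the authors compared several distributions:

  proc glimmix data=trialdata method=laplace;    /* subject-specific mixed model */
    class group time id;
    model drinks = group time group*time depress / dist=negbin solution;
    random intercept / subject=id;
  run;

  proc genmod data=trialdata;                    /* population-averaged GEE model */
    class group time id;
    model problem(event='1') = group time group*time depress / dist=bin link=logit;
    repeated subject=id / type=exch;             /* exchangeable working correlation */
  run;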
Read the paper (PDF).
Abbas Tavakoli, University of South Carolina/College of Nursing
Marlene Al-Barwani, University of South Carolina
Sue Levkoff, University of South Carolina
Selina McKinney, University of South Carolina
Nikki Wooten, University of South Carolina
Paper 3376-2015:
Using the SAS-PIRT Macro for Estimating the Parameters of Polytomous Items
Polytomous items have been widely used in educational and psychological settings. As a result, the demand for statistical programs that estimate the parameters of polytomous items has been increasing. For this purpose, Samejima (1969) proposed the graded response model (GRM), in which category characteristic curves are characterized by the difference of the two adjacent boundary characteristic curves. In this paper, we show how the SAS-PIRT macro (a SAS® macro written in SAS/IML®) was developed based on the GRM and how it performs in recovering the parameters of polytomous items using simulated data.
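Under the GRM, the probability of responding in category k is the difference of adjacent boundary characteristic curves, P*k(theta) - P*(k+1)(theta), where each boundary curve is a two-parameter logistic, P*k(theta) = 1 / (1 + exp(-a(theta - bk))), with P*0 = 1 and P*(m+1) = 0. The following SAS/IML fragment is a small sketch of that computation only, not the SAS-PIRT macro; the parameter values are made up:

  proc iml;
    a = 1.5;                                     /* item discrimination */
    b = {-1.0, 0.0, 1.2};                        /* ordered boundary difficulties */
    theta = 0.5;                                 /* examinee ability */
    pstar = 1 / (1 + exp(-a * (theta - b`)));    /* boundary characteristic curves */
    bounds = 1 || pstar || 0;                    /* prepend P*_0 = 1, append P*_{m+1} = 0 */
    ncat = ncol(bounds) - 1;
    p = bounds[1, 1:ncat] - bounds[1, 2:(ncat+1)];  /* category probabilities */
    print p;
  quit;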
Read the paper (PDF).
Sung-Hyuck Lee, ACT, Inc.
V
Paper 3486-2015:
Visualizing Student Enrollment Trends Compared across Calendar Periods and Grouped by Categories with SAS® Visual Analytics
Enrollment management is very important to all colleges. Having the correct tools to help you better understand your past and future enrollment patterns is critical to any school. This session describes how Valencia College went from manually updating static charts for enrollment management to building dynamic, interactive visualizations that compare how students register across different calendar-date periods (current versus previous period), grouped by different start-of-registration dates: from start of registration, days into registration, and calendar date versus previous-year calendar date. This includes being able to see the trend by college campus, instructional method (onsite or online), or type of session (part of semester, full semester, and so on), all available in one visual and sliced and diced via checklists. The process loads 4-6 million rows of data nightly to the SAS® LASR™ Analytics Server with no performance issues on the back end or in the presentation visuals. We give a brief history of how we used to load data into Excel and manually build charts. Then we describe the current environment, an automated approach through SAS® Visual Analytics. We show pictures of our old, static reports and then show the audience the power and functionality of our new, interactive reports in SAS Visual Analytics.
Read the paper (PDF).
Juan Olvera, Valencia College
W
Paper 3216-2015:
Where Did My Students Go?
Many freshmen leave their first college and go on to attend another institution. Some of these students are even successful in earning degrees elsewhere. As there is more focus on college graduation rates, this paper shows how the power of SAS® can pull in data from many disparate sources, including the National Student Clearinghouse, to answer questions on the minds of many institutional researchers. How do we use the data to answer questions such as "What would my graduation rate be if these students graduated at my institution instead of at another one?", "What types of schools do students leave to attend?", and "Are there certain characteristics of students who leave, and are they concentrated in certain programs?" The data-handling capabilities of SAS are perfect for this type of analysis, and this presentation walks you through the process.
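The core data-handling step is essentially a join of the institutional cohort file with the Clearinghouse return file; here is a minimal sketch with hypothetical table and variable names:

  proc sql;
    create table outcomes as
    select a.student_id, a.entry_cohort, a.program,
           b.college_name, b.graduated
    from cohort_file as a
         left join nsc_return as b
         on a.student_id = b.student_id;   /* left join keeps leavers with no NSC match */
  quit;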
Read the paper (PDF).
Stephanie Thompson, Datamum
Paper 3390-2015:
Working with PROC FEDSQL in SAS® 9.4
Working with multiple data sources in SAS® was not straightforward until PROC FEDSQL was introduced in the SAS® 9.4 release. Federated Query Language, or FEDSQL, is a vendor-independent language that provides a common SQL syntax for communicating with multiple relational databases, without the need for vendor-specific SQL syntax. PROC FEDSQL is the SAS implementation of the FEDSQL language. It enables us to write federated queries that join tables from different databases in a single query, without loading the tables into SAS individually and combining them with DATA steps and PROC SQL. The objective of this paper is to demonstrate how PROC FEDSQL fetches data from multiple data sources, such as a Microsoft SQL Server database, a MySQL database, and a SAS data set, and runs federated queries across all of them. Other powerful features of PROC FEDSQL, such as transactions and the FEDSQL pass-through facility, are discussed briefly.
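A minimal sketch of a federated join across the three kinds of sources; the librefs, connection options, paths, tables, and columns below are all assumptions for illustration:

  libname mssql odbc dsn=SqlSrvDsn;            /* Microsoft SQL Server via ODBC */
  libname mydb mysql server=dbhost database=sales user=user1 password=pw1;
  libname local 'C:\data';                     /* ordinary SAS library */

  proc fedsql;
    create table local.combined as             /* one federated query, three sources */
    select c.cust_id, c.region, o.order_total, p.score
    from mssql.customers as c
         inner join mydb.orders as o on c.cust_id = o.cust_id
         inner join local.propensity as p on c.cust_id = p.cust_id;
  quit;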
Read the paper (PDF).
Zabiulla Mohammed, Oklahoma State University
Ganesh Kumar Gangarajula, Oklahoma State University
Pradeep Reddy Kalakota, Federal Home Loan Bank of Des Moines