SAS® Grid Computing is a scale-out SAS®
solution that enables SAS applications, which are extremely I/O and
compute intensive, to make better use of computing resources. It requires
high-performance shared storage (SS) that allows all servers to
access the same file systems. SS may be implemented via traditional NFS
NAS or clustered file systems (CFS) like GPFS. This paper uses the Lustre*
file system, a parallel, distributed CFS, for a case study of performance
scalability of SAS Grid Computing nodes on SS. The paper qualifies the
performance of a standardized SAS workload running on Lustre at scale.
Lustre has traditionally been used for large, sequential I/O. We will
record and present the tuning changes necessary for the optimization of
Lustre for the SAS applications. In addition, results from the scaling of
SAS Cluster jobs running on Lustre will be presented.
Suleyman Sair, Intel Corporation
Brett Lee, Intel Corporation
Ying M. Zhang, Intel Corporation
Nowadays, most corporations build and maintain their own data warehouse,
and an ETL (Extract, Transform, and Load) process plays a critical role in
managing the data. Some people might create a large program and execute
this program from top to bottom. Others might generate a SAS®
driver with several programs included, and then execute this driver. If
some programs can be run in parallel, then developers must write extra
code to handle these concurrent processes. If one program fails, then
users can either rerun the entire process or comment out the successful
programs and resume the job from where the program failed. Usually the
programs are deployed in production with read and execute permission only.
Users do not have the privilege of modifying code on the fly. In this
case, how do you comment out the programs if the job terminates
abnormally? This paper illustrates an approach for managing ETL process
flows. The approach uses a framework based on SAS, on a UNIX platform.
This is a high-level infrastructure discussion with some explanation of
the SAS code that is used to implement the framework. The framework
supports rerunning or partially rerunning the entire process without
changing any source code. It also supports concurrent processes, so no
extra code is needed.
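To make the framework concrete, its driver logic can be sketched as follows (a hypothetical skeleton only; the control table and its columns are illustrative assumptions, not the paper's actual implementation):

   /* CTRL.FLOW is assumed to hold SEQ, PROGRAM (a path), and STATUS.   */
   %macro run_flow;
      %local i n;
      proc sql noprint;
         select program into :prog1-          /* open-ended range, 9.3+ */
            from ctrl.flow
            where status ne 'DONE'
            order by seq;
      quit;
      %let n = &sqlobs;
      %do i = 1 %to &n;
         %include "&&prog&i";           /* run the next unfinished step */
         %if &syserr = 0 %then %do;     /* mark DONE only on success    */
            proc sql;
               update ctrl.flow set status = 'DONE'
                  where program = "&&prog&i";
            quit;
         %end;
         %else %return;                 /* halt the flow at the failure */
      %end;
   %mend run_flow;

A rerun after a failure then resumes automatically from the first program whose status is not DONE, with no source code changes.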
Kevin Chung, Fannie Mae
Keywords: SID file, SAS® Deployment Wizard, SAS®
Migration Utility, SAS® Environment Manager, plan file.
SAS® can seem very mysterious to IT organizations used to
working with other software solutions. The more IT knows and understands
about SAS (how it works, what its system requirements are, how to maintain
it and back it up, and what its value is to the organization), the better
IT can support the SAS shop. This paper provides an introduction to the world
of SAS and sheds light on some of the unique elements of maintaining a SAS
environment.
Lisa Horwitz, SAS
The current study looks at recent health trends and behavior analyses of
youth in America. Data used in this analysis was provided by the Centers
for Disease Control and Prevention and gathered using the Youth Risk
Behavior Surveillance System (YRBSS). A factor analysis was performed to
identify and define latent mental health and risk behavior variables. A
series of logistic regression analyses were then performed using the risk
behavior and demographic variables as potential contributing factors to
each of the mental health variables. Mental health variables included
disordered eating and depression/suicidal ideation data, while the risk
behavior variables included smoking, consumption of alcohol and drugs,
violence, vehicle safety, and sexual behavior data. Implications derived
from the results of this research are a primary focus of this study. Risks
and benefits of using a factor analysis with logistic regression in social
science research will also be discussed in depth. Results included
reporting differences between the years of 1991 and 2011. All results are
discussed in relation to current youth health trend issues. Data was
analyzed using SAS® 9.3.
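As a hedged sketch of the two-stage approach described above (the data set and variable names are illustrative, not the study's actual YRBSS items):

   /* Stage 1: extract latent factors; OUT= appends Factor1-Factor3.   */
   proc factor data=yrbs method=principal rotate=varimax nfactors=3
               out=scores;
      var q1-q20;
   run;

   /* Stage 2: use the factor scores, plus demographics, as predictors */
   /* of a mental health outcome.                                      */
   proc logistic data=scores;
      class sex / param=ref;
      model depressed(event='1') = factor1-factor3 age sex;
   run;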
Deanna Schreiber-Gregory, North Dakota State University
One of the first lessons that SAS® programmers learn on the
job is that numeric and character variables do not play well together, and
that type mismatches are one of the more common sources of errors in their
otherwise flawless SAS programs. Luckily, converting variables from one
type to another in SAS (that is, casting) is not difficult, requiring only
the judicious use of either the input() or put() function. There remains,
however, the danger of data being lost in the conversion process. This
type of error is most likely to occur in cases of character-to-numeric
variable conversion, most especially when the user does not fully
understand the data contained in the data set. This paper will review the
basics of data storage for character and numeric variables in SAS, the use
of formats and informats for conversions, and how to ensure accurate type
conversion of even high-precision numeric values.
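For readers new to casting, the two idioms at the heart of the paper look like this (the data set and variable names are illustrative):

   data converted;
      set raw;
      num_val  = input(char_val, best32.);     /* character -> numeric   */
      char_id  = put(id, z8.);                 /* numeric -> character   */
      safe_val = input(char_val, ?? best32.);  /* ?? suppresses the log  */
   run;                                        /* notes for bad values   */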
Andrew Clapson, Statistics Canada
Complex data manipulations can be resource intensive, both in terms of
development time and processing duration. However, in recent years SAS has
introduced a number of new technologies that, when used together, can
produce a dramatic increase in performance while simultaneously
simplifying program development and maintenance. This paper presents a
development paradigm that utilizes the problem decomposition capabilities
of DS2, the flexibility of SQL, and the performance benefits of in-memory
storage using hash objects.
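The in-memory lookup at the core of this paradigm is easy to see in the DATA step hash object (DS2 offers an equivalent hash package); the data sets and variables below are illustrative assumptions:

   data enriched;
      if 0 then set work.rates;         /* define REGION, RATE in PDV    */
      if _n_ = 1 then do;
         declare hash h(dataset: 'work.rates');  /* load table to memory */
         h.defineKey('region');
         h.defineData('rate');
         h.defineDone();
      end;
      set work.transactions;
      if h.find() = 0 then adjusted = amount * rate;   /* key found      */
      else adjusted = .;
   run;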
Shaun Kaufmann, Farm Credit Canada
Have you ever wished that with one click you could copy any SAS®
data set, including variable names, so that you could paste the text into
a Microsoft Word file, Microsoft PowerPoint slide, or spreadsheet? You can
and, with just Base SAS®, there are some little-known but
easy-to-use methods that are available for automating many of your (or
your users') common tasks.
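One member of this family of techniques, on Windows SAS where the CLIPBRD access method is available, writes tab-delimited text straight to the clipboard (the data set is illustrative):

   filename _cb clipbrd;                 /* Windows-only access method   */
   data _null_;
      file _cb dlm='09'x;                /* tab-delimited list output    */
      set sashelp.class;
      if _n_ = 1 then put 'Name' '09'x 'Sex' '09'x 'Age';  /* header row */
      put name sex age;
   run;
   filename _cb clear;                   /* now paste into Word or Excel */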
Arthur Tabachneck, myQNA, Inc.
Tom Abernathy, Pfizer, Inc.
Matthew Kastin, I-Behavior, Inc.
Hip fractures are a common source of morbidity and mortality among the
elderly. While multiple prior studies have identified risk factors for
poor outcomes, few studies have presented a validated method for
stratifying patient risk. The purpose of this study was to develop a
simple risk score calculator tool predictive of 30-day morbidity after hip
fracture. To achieve this, we prospectively queried a database maintained
by The American College of Surgeons (ACS) National Surgical Quality
Improvement Program (NSQIP) to identify all cases of hip fracture between
2005 and 2010, based on primary Current Procedural Terminology (CPT)
codes. Patient demographics, comorbidities, laboratory values, and
operative characteristics were compared in a univariate analysis, and a
multivariate logistic regression analysis was then used to identify
independent predictors of 30-day morbidity. Weighted values were assigned
to each independent risk factor and were used to create predictive models
of 30-day complication risk. The models were internally validated with
randomly partitioned 80%/20% cohort groups. We hypothesized that
significant predictors of morbidity could be identified and used in a
predictive model for a simple risk score calculator. All analyses were
performed using SAS® software.
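The 80%/20% validation step might be sketched as follows (a hedged outline; the data set and variable names are illustrative, not the authors' actual NSQIP code):

   proc surveyselect data=hipfx out=hipfx2 samprate=0.8 seed=2014
                     outall;               /* SELECTED flags the 80%     */
   run;

   proc logistic data=hipfx2(where=(selected=1));
      class sex asaclass / param=ref;
      model morbid30(event='1') = age sex asaclass albumin;
      store work.riskmodel;                /* save the fitted model      */
   run;

   proc plm restore=work.riskmodel;        /* score the held-out 20%     */
      score data=hipfx2(where=(selected=0)) out=validated / ilink;
   run;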
Yubo Gao, University of Iowa Hospitals and Clinics
This paper shows users how they can use a SAS® macro named
%SURVEYGLM to incorporate information about survey design into generalized
linear models (GLMs). The R function svyglm (Lumley, 2004) was used to
verify the suitability of the %SURVEYGLM macro estimates. The results show
that the estimates are close to those of the R function and that new
distributions can be easily added to the algorithm.
Paulo Henrique Dourado da Silva, University of Brasilia
Alan Ricardo da Silva, Universidade de Brasilia
Influence analysis in statistical modeling looks for observations that
unduly influence the fitted model. Cook's distance is a standard tool for
influence analysis in regression. It works by measuring the difference in
the fitted parameters as individual observations are deleted. You can
apply the same idea to examining influence of groups of observations (for
example, the multiple observations for subjects in longitudinal or
clustered data), but you need to adapt it to the fact that different
subjects can have different numbers of observations. Such an adaptation is
discussed by Zhu, Ibrahim, and Cho (2012), who generalize the subject size
factor as the so-called degree of perturbation, and correspondingly
generalize Cook's distances as the scaled Cook's distance. This paper
presents the %SCDMixed SAS® macro, which implements these
ideas for analyzing influence in mixed models for longitudinal or
clustered data. The macro calculates the degree of perturbation and scaled
Cook's distance measures of Zhu et al. (2012) and presents the results
with useful tabular and graphical summaries. The underlying theory is
discussed, as well as some of the programming tricks useful for computing
these influence measures efficiently. The macro is demonstrated using both
simulated and real data to show how you can interpret its results for
analyzing influence in your longitudinal modeling.
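For orientation, the classical Cook's distance that these measures generalize can be written in one common general form (a reference form, not necessarily the macro's exact internal definition) as

   D_i = \frac{(\hat{\beta} - \hat{\beta}_{(i)})^{\top}\,
               \widehat{\mathrm{Cov}}(\hat{\beta})^{-1}\,
               (\hat{\beta} - \hat{\beta}_{(i)})}{p}

where \hat{\beta}_{(i)} is the estimate computed with subject i's observations deleted and p is the number of parameters; the scaled Cook's distance of Zhu et al. (2012) further adjusts by the degree of perturbation so that subjects with different cluster sizes become comparable.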
Grant Schneider, The Ohio State University
Randy Tobias, SAS Institute
Stepwise regression includes regression models in which the predictive
variables are selected by an automated algorithm. The stepwise method
involves two approaches: backward elimination and forward selection.
Currently, SAS® has three procedures capable of performing
stepwise regression: REG, LOGISTIC, and GLMSELECT. PROC REG handles the
linear regression model, but does not support a CLASS statement. PROC
LOGISTIC handles binary responses and allows for logit, probit, and
complementary log-log link functions. It also supports a CLASS statement.
The GLMSELECT procedure performs selections in the framework of general
linear models. It allows for a variety of model selection methods,
including the LASSO method of Tibshirani (1996) and the related LAR method
of Efron et al. (2004). PROC GLMSELECT also supports a CLASS statement. We
present a stepwise algorithm for generalized linear mixed models for both
marginal and conditional models. We illustrate the algorithm using data
from a longitudinal epidemiology study aimed at investigating parents'
beliefs, behaviors, and feeding practices that are associated positively
or negatively with indices of sleep quality.
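As a point of comparison with the existing procedures, a minimal PROC GLMSELECT stepwise run with a CLASS variable looks like this (the data set and variables are illustrative):

   proc glmselect data=sleepstudy;
      class feeding_style;
      model sleep_quality = belief1-belief5 feeding_style
            / selection=stepwise(select=sl slentry=0.15 slstay=0.15);
   run;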
Nagaraj Neerchal, University of Maryland Baltimore County
Jorge Morel, Procter and Gamble
Xuang Huang, University of Maryland Baltimore County
Alain Moluh, University of Maryland Baltimore County
SAS® functions provide amazing power to your DATA step
programming. Some of these functions are essential; others save you from
writing volumes of unnecessary code. This paper covers some of the most
useful SAS functions. Some of these functions might be new to you, and
they will change the way you program and approach common programming
tasks.
Ron Cody, Camp Verde Associates
This paper develops a new SAS® macro, which allows
you to scrape users' textual reviews from the Apple iTunes Store for iPhone
applications. It not only can help you understand your customers'
experiences and needs, but also can help you be aware of your competitors'
user experiences. The macro uses the iTunes API and PROC HTTP in SAS to
extract and create data sets. This paper also shows how you can use the
application ID and country code to extract user reviews.
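The shape of the PROC HTTP call is roughly as follows (a hedged sketch; the application ID and country code are placeholders, and the iTunes review feed URL may change over time):

   %let appid = 123456789;     /* hypothetical application ID            */
   %let cc    = us;            /* country code                           */

   filename resp temp;
   proc http
      url="https://itunes.apple.com/&cc/rss/customerreviews/id=&appid/json"
      method="GET"
      out=resp;
   run;
   /* The JSON in RESP is then parsed into SAS data sets by the macro.   */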
Jiawen Liu, Qualex Consulting Services, Inc.
Mantosh Kumar Sarkar, Verizon
Meizi Jin, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
The ExcelXP tagset offers several options for controlling column widths,
including Width_Points, Width_Fudge, and Absolute_Column_Width. Although
Absolute_Column_Width might seem unpredictable at first, it is possible to
fix the first two options so that Absolute_Column_Width specifies the exact
column width in pixels. This poster presents these settings and suggests
how to create and manage the integer string of column widths.
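For readers who want to experiment, the relevant syntax is sketched below (the option values are placeholders; the poster derives the combination that makes the widths exact):

   ods tagsets.excelxp file='widths.xml'
       options(width_points='1' width_fudge='1'
               absolute_column_width='120,80,80,200');
   proc print data=sashelp.class noobs;
   run;
   ods tagsets.excelxp close;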
Dylan Ellis, Mathematica Policy Research
Accelerated testing is an effective tool for predicting when systems fail,
where the system can be as simple as an engine gasket or as complex as a
magnetic resonance imaging (MRI) scanner. In particular, you might conduct
an experiment to determine how factors such as temperature and voltage
impose enough stress on the system to cause failure. Because system
components usually meet nominal quality standards, it can take a long time
to obtain failure data under normal-use conditions. An effective strategy
is to accelerate the experiment by testing under abnormally stressful
conditions, such as higher temperatures. Following that approach, you
obtain data more quickly, and you can then analyze the data by using the
RELIABILITY procedure in SAS/QC® software. The analysis is a
three-step process: you establish a probability model, explore the
relationship between stress and failure, and then extrapolate to
normal-use conditions. Graphs are a key component of all three stages: you
choose a model by comparing residual plots from candidate models, use
graphs to examine the stress-failure relationship, and then use an
appropriately scaled graph to extrapolate along a straight line. This
paper guides you through the process, and it highlights features added to
the RELIABILITY procedure in SAS/QC 13.1.
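A hedged sketch of the model-fitting step (the data set, variables, and the Arrhenius transformation shown are illustrative assumptions):

   data fail2;
      set fail;                              /* accelerated test data    */
      invtemp = 11605 / (tempc + 273.15);    /* Arrhenius: 1/kT in 1/eV  */
   run;

   proc reliability data=fail2;
      distribution weibull;                  /* candidate probability    */
      model hours*censor(1) = invtemp;       /* model; 1=right-censored  */
   run;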
Bobby Gutierrez, SAS
Structured Query Language (SQL) does not recognize the concept of row
order. Instead, query results are thought of as unordered sets of rows.
Most workarounds involve including serial numbers, which can then be
compared or subtracted. This presentation illustrates and compares five
techniques for creating serial numbers.
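The simplest of these techniques captures row order in a DATA step before SQL ever sees the data (the data set and variables are illustrative):

   data withseq;
      set events;
      seq = _n_;            /* serial number preserving row order        */
   run;

   proc sql;
      /* e.g., compare each row with its predecessor via the serials    */
      select a.id, a.value - b.value as change
         from withseq a left join withseq b
         on a.seq = b.seq + 1;
   quit;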
Howard Schreier, Howles Informatics
The SAS® Data Quality Server allows SAS®
programmers to integrate the power of DataFlux® into their
data cleaning programs. The power of SAS Data Quality Server enables
programmers to efficiently identify matching records across different
datasets when exact matches are not present. During a recent educational
research project, the DQMATCH function proved very capable when trying to
link records from disparate data sources. Two key insights led to even
greater success in linking records. The first insight was acknowledging
that the hierarchical structure of data can greatly improve success in
matching records. The second insight was that the names of individuals can
be restructured to improve the chances of successful matches. This paper
provides an overview of how these insights were implemented using the
DQMATCH function to link educational data from multiple sources.
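The core of the technique is a single function call (a hedged sketch; it assumes SAS Data Quality Server is licensed and the ENUSA locale has been loaded, for example via the DQSETUPLOC= option, and the data set and variables are illustrative):

   data coded;
      set roster;
      /* 'Name' match definition at sensitivity 85 */
      name_mc = dqMatch(teacher_name, 'Name', 85, 'ENUSA');
   run;
   /* Joining on NAME_MC then links records whose names are not exact   */
   /* matches but generate the same match code.                         */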
Pat Taylor, University of Houston
Lee Branum-Martin, Georgia State University
Cluster (group) randomization in trials is increasingly used over
patient-level randomization. There are many reasons for this, including
more pragmatic trials associated with comparative effectiveness research.
Examples of clusters that could be randomized for study are clinics or
hospitals, counties within a state, and other geographical areas such as
communities. In many of these trials, the number of clusters is relatively
small. This can be a problem if there are important covariates at the
cluster level that are not balanced across the intervention and control
groups. For example, if we randomize eight counties, a simple
randomization could put all counties with high socioeconomic status in one
group or the other, leaving us without good comparison data. There are
strategies to prevent an unlucky cluster randomization. These include
matching, stratification, minimization and covariate-constrained
randomization. Each method is discussed, and a county-level Health
Economics example of covariate-constrained randomization is shown for
intermediate SAS® users working with SAS®
Foundation for Release 9.2 and SAS/STAT® on a Windows
operating system.
Brenda Beaty, University of Colorado
L. Miriam Dickinson, University of Colorado
The Base SAS® 9.4 Output Delivery System (ODS) EPUB
destination enables users to deliver SAS® reports as e-books
on Apple mobile devices. The first maintenance release of SAS®
9.4 adds the ODS EPUB3 destination, which offers powerful new multimedia
and presentation features to report writers. This paper shows you how to
include images, audio, and video in your ODS EPUB3 e-book reports. You
learn how to use publishing presentation techniques such as sidebars and
multicolumn layouts. You become familiar with best practices for
accessibility when employing these new features in your reports. This
paper provides advanced instruction for writing e-books with ODS EPUB.
Please bring your iPad, iPhone, or iPod to the presentation so that you
can download and read the examples.
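Before any of the multimedia features, a minimal ODS EPUB3 e-book looks like this (the file path is illustrative):

   ods epub3 file='/folders/myfolders/report.epub';
   title 'Class Report';
   proc sgplot data=sashelp.class;
      scatter x=height y=weight;
   run;
   ods epub3 close;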
David Kelley, SAS
SAS® 9.4 has overhauled web authentication schemes, and the
integration with enterprise security infrastructure is quite different
from that of SAS® 9.3. This paper examines advanced security
features such as Secure Sockets Layer (SSL) configuration, single sign-on
(SSO) support through Integrated Windows authentication (IWA), and
third-party security packages like CA SiteMinder and IBM Tivoli Access
Manager and WebSEAL. FIPS 140-2 compliance efforts that enforce the use of
a stronger encryption algorithm for web communication and the SAS®
system itself are also described. The authentication support for mobile
devices such as the iPad is different. The secure Wi-Fi connection from a
mobile device to the IT internal resources, as well as how it can be
safely integrated into the enterprise security configuration by using the
same user repository as the SAS web applications, is explained. The
configuration example is shown with SAS® Visual Analytics
6.2.
Heesun Park, SAS
Paper 1832-2014:
Agile Marketing in a Data-Driven World
The operational tempo of marketing in a digital world seems faster every
day. New trends, issues, and ideas appear and spread like wildfire,
demanding that sales and marketing adapt plans and priorities on the fly.
The technology available to us can handle this, but traditional
organizational processes often become the bottleneck. The solution is a
new management approach called agile marketing. Drawing upon the success
of agile software development and the lean start-up movement, agile
marketing is a simple but powerful way to make marketing teams more nimble
and marketing programs more responsive. You don't have to be small to be
agile: agile marketing has thrived at large enterprises such as Cisco and
EMC. This session covers the basics of agile marketing: what it is, why it
works, and how to get started with agile marketing in your own team. In
particular, we look at how agile marketing dovetails with the explosion of
data-driven management in marketing by using the fast feedback from
analytics to iterate and adapt new marketing programs in a rapid
yet focused fashion.
Scott Brinker, ion interactive, inc.
Paper 2165-2014:
Allocation: Getting the Right Products to the Right Locations in the Right
Quantities Is the Retail Brass Ring!
Allocation is key. If the allocation isn't right, it can lead to
out-of-stocks, lost sales, and customer dissatisfaction. Automating your
most complicated and time-consuming tasks, and ensuring you are feeding
the right data at the right time, is critical. This session will review
how Beall's Outlet and Beall's Department Stores are managing and
executing allocations at optimal levels, and how using attributes and group
definitions allows them to be responsive to trends, history, and plans.
Trina Gladwell, Bealls, Inc.
SAS® 9.4 and SAS® Visual Analytics support a
wide list of authentication protocols such as Integrated Windows
authentication (IWA), client certificate, IBM WebSEAL, CA SiteMinder, and
Security Assertion Markup Language (SAML) 2.0. However, advanced customers
might want to use some of these protocols together and also have the
flexibility to select which protocols to use. In this paper, we focus on a
fallback authentication framework that supports IWA as the primary
authentication method. When IWA fails, it uses the X509 client certificate
as the secondary authentication method, and when the client certificate
fails, it uses the form-based username/password as the last option. The
paper first introduces the security architecture of SAS® 9.4
and SAS Visual Analytics. It then reviews the three above-mentioned
security protocols. Further, it introduces the detailed fallback
authentication framework and discusses how to configure it. Finally, we
discuss lessons learned from implementing the fallback authentication
framework in a customer's SAS® 9.4 and SAS Visual Analytics
environment.
Zhiyong Li, SAS
Mike Roda, SAS
The videogame industry is a growing business, with an annual growth rate
that exceeded 16.7% for the period 2005 through 2008. Moreover, revenues
from online games will account for more than 38% of total video game
software revenues by 2013. Because of this growth, online games are
vulnerable to illicit player activity that results in cheating. Cheating
in online games could damage the reputation of the game when honest
players realize that their peers are cheating, resulting in the loss of
trust from honest players, and ultimately reducing revenue for the game
producers. Analysis
of game data is fundamental for understanding player behaviors and for
combating cheating in online games. In this presentation, we propose a
data analysis methodology for detecting cheating in massive multiplayer
online (MMO) racing games. More specifically, our work focuses on bot
detection. A bot controls a player automatically and is characterized by
repetitive behavior. Players in an MMO racing game can use bots to play
during the races, using artificial intelligence that improves their odds
of winning, and to automate the process of starting a new race upon
finishing the last one. This results in a high number of races played,
with race duration showing a low mean and low standard deviation, and time
in between races showing a consistently low median value. A case study is
built upon data from an MMO racing game. Our results indicate that our
methodology successfully characterizes suspicious player behavior.
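The per-player screening statistics can be computed along these lines (an illustrative sketch; the data set, variables, and thresholds are hypothetical):

   proc means data=races noprint;
      class player_id;
      var duration gap;
      output out=playerstats
         n=races_played
         mean(duration)=dur_mean std(duration)=dur_std
         median(gap)=gap_median;
   run;

   data suspects;
      set playerstats(where=(_type_ = 1));
      /* thresholds below stand in for values derived from the data     */
      if races_played > 500 and dur_std < 5 and gap_median < 10
         then bot_flag = 1;
   run;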
Andrea Villanes, North Carolina State University
Finding groups with similar attributes is at the core of knowledge
discovery. To this end, Cluster Analysis automatically locates groups of
similar observations. Despite successful applications, many practitioners
are uncomfortable with the degree of automation in Cluster Analysis, which
causes intuitive knowledge to be ignored. This is especially true in text
mining applications, since individual words have meaning beyond the data set.
Discovering groups with similar text is extremely insightful. However,
blind applications of clustering algorithms ignore intuition and hence are
unable to group similar text categories. The challenge is to integrate the
power of clustering algorithms with the knowledge of experts. We
demonstrate how SAS/STAT® 9.2 procedures and the SAS®
Macro Language are used to ensemble the opinion of domain experts with
multiple clustering models to arrive at a consensus. The method has been
successfully applied to a large data set with structured attributes and
unstructured opinions. The result is the ability to discover observations
with similar attributes and opinions by capturing the wisdom of the
crowds, whether man or model.
Masoud Charkhabi, Canadian Imperial Bank of Commerce (CIBC)
Ling Zhu, Canadian Imperial Bank of Commerce (CIBC)
SAS/ACCESS® Interface to ODBC has been around forever. On
one level, ODBC is very easy to use. That ease hides the flexibility that
ODBC offers. This presentation uses examples to show you how to increase
your program's performance and troubleshoot problems. You will learn the
differences between ODBC and OLE DB; what the odbc.ini file is (and why it
is important); how to discover what your ODBC driver is actually doing; and
the difference between a native ACCESS engine and SAS/ACCESS Interface to ODBC.
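Two of those topics in miniature (the DSN, user, and password are placeholders); the SASTRACE option reveals exactly what SQL the engine hands to the driver:

   options sastrace=',,,d' sastraceloc=saslog nostsuffix;

   libname mydb odbc dsn=mydsn user=myuser password=XXXX;

   proc sql;
      select count(*) from mydb.orders;   /* watch the log for what the */
   quit;                                  /* driver is actually asked   */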
Jeff Bailey, SAS
This paper expands upon "A Multilevel Model Primer Using SAS®
PROC MIXED," in which we presented an overview of estimating two- and
three-level linear models via PROC MIXED. However, in our earlier paper,
we, for the most part, relied on simple options available in PROC MIXED.
In this paper, we present a more advanced look at common PROC MIXED
options used in the analysis of social and behavioral science data, as
well as introduce users to two different SAS macros previously developed
for use with PROC MIXED: one to examine model fit (MIXED_FIT) and the other to
examine distributional assumptions (MIXED_DX). Specific statistical
options presented in the current paper include (a) PROC MIXED statement
options for estimating statistical significance of variance estimates
(COVTEST, including problems with using this option) and estimation
methods (METHOD =), (b) MODEL statement option for degrees of freedom
estimation (DDFM =), and (c) RANDOM statement option for specifying the
variance/covariance structure to be used (TYPE =). Given the importance of
examining model fit, we also present methods for estimating changes in
model fit through an illustration of the SAS macro MIXED_FIT. Likewise,
the SAS macro MIXED_DX is introduced to remind users to examine
distributional assumptions associated with two-level linear models. To
maintain continuity with the 2013 introductory PROC MIXED paper, thus
providing users with a set of comprehensive guides for estimating
multilevel models using PROC MIXED, we use the same real-world data
sources that we used in our earlier primer paper.
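Gathered into one illustrative call (hypothetical two-level data, students nested in schools), the options discussed are:

   proc mixed data=achieve covtest method=reml;
      class schoolid;
      model score = ses / solution ddfm=kr;   /* Kenward-Roger df        */
      random intercept ses / subject=schoolid type=un;  /* unstructured G */
   run;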
Bethany Bell, University of South Carolina
Whitney Smiley, University of South Carolina
Mihaela Ene, University of South Carolina
Genine Blue, University of South Carolina
The use of Bayesian methods has become increasingly popular in modern
statistical analysis, with applications in numerous scientific fields. In
recent releases, SAS® has provided a wealth of tools for
Bayesian analysis, with convenient access through several popular
procedures in addition to the MCMC procedure, which is specifically
designed for complex Bayesian modeling (not discussed here). This paper
introduces the principles of Bayesian inference and reviews the steps in a
Bayesian analysis. It then describes the Bayesian capabilities provided in
four procedures (GENMOD, PHREG, FMM, and LIFEREG), including
the available prior distributions, posterior summary statistics, and
convergence diagnostics. Various sampling methods that are used to sample
from the posterior distributions are also discussed. The second part of
the paper describes how to use the GENMOD and PHREG procedures to perform
Bayesian analyses for real-world examples and how to take advantage of the
Bayesian framework to address scientific questions.
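A minimal example of the Bayesian capabilities in PROC GENMOD (the data set and variables are illustrative):

   proc genmod data=trial;
      class treatment;
      model events = treatment / dist=poisson link=log offset=logtime;
      bayes seed=20140323 nmc=10000 outpost=posterior
            diagnostics=all statistics=summary;  /* posterior summaries  */
   run;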
Maura Stokes, SAS
Fang Chen, SAS
Funda Gunes, SAS
This paper is an introduction to SAS® Studio and covers how
to perform basic programming tasks in SAS Studio. Many people program in
the SAS® language by using SAS Display Manager or SAS®
Enterprise Guide®. SAS Studio is different because it
enables you to write and run SAS code by using the most popular web
browsers, without requiring a SAS® 9.4 installation on your
machine. With SAS Studio, you can access your data files, libraries, and
existing programs, and write new programs while using SAS software behind
the scenes. SAS Studio connects to a SAS server in order to process SAS
programs. The SAS server can be a hosted server in a cloud environment, a
server in your local environment, or a copy of SAS on your local machine.
Michael Monaco, SAS
Marie Dexter, SAS
Jennifer Tamburro, SAS
The Kolmogorov-Smirnov (K-S) test is one of the most useful and general
nonparametric methods for comparing two samples. It is sensitive to all
types of differences between two populations (shift, scale, shape, and so
on). In this paper, we present a thorough investigation into the K-S
test, including derivation of the formal test procedure, practical
demonstration of the test, large-sample approximation of the test, and
ease of use in SAS® via the NPAR1WAY procedure.
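In its simplest form, the SAS side of the test is a single step (GROUP and X are illustrative names):

   proc npar1way data=samples edf;   /* EDF requests Kolmogorov-Smirnov  */
      class group;                   /* the two populations to compare   */
      var x;
   run;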
Tison Bolen, Cardinal Health
Dawit Mulugeta, Cardinal Health
Jason Greenfield, Cardinal Health
Lisa Conley, Cardinal Health
SAS® and SAS® Enterprise Miner™
have provided advanced data mining and machine learning capabilities for
years, beginning long before the current buzz. Moreover, SAS has
continually incorporated advances in machine learning research into its
classification, prediction, and segmentation procedures. SAS Enterprise
Miner now includes many proven machine learning algorithms in its
high-performance environment and is introducing new leading-edge scalable
technologies. This paper provides an overview of machine learning and
presents several supervised and unsupervised machine learning examples
that use SAS Enterprise Miner. So, come back to the future to see machine
learning in action with SAS!
Patrick Hall, SAS
Jared Dean, SAS
Ilknur Kaynar Kabul, SAS
Jorge Silva, SAS
Big data! Hadoop! MapReduce! These are all buzzwords that you've probably
already heard mentioned at SAS® Global Forum 2014. But what
exactly is MapReduce, and what has it got to do with SAS®?
This talk explains how a simple processing framework (created by Google
and more recently popularized by the open-source technology Hadoop) can be
replicated using cornerstone SAS technologies such as Base
SAS®, SAS macros, and SAS/CONNECT®. The talk
explains how, out of the box, the SAS DATA step can replicate the MAP
function. It looks at how well-established SAS procedures can be used to
create reduce-like functionality. Finally, we look at how processing data
in parallel across multiple machines using MPCONNECT can replicate
MapReduce's shared-nothing approach to data processing.
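A toy word-count version of the idea (the input data set and variable are illustrative): the DATA step view "maps" each record to key-value pairs, and PROC SUMMARY "reduces" by key:

   data mapped / view=mapped;         /* map: emit (word, 1) pairs       */
      set docs;                       /* assumed variable: LINE          */
      length word $32;
      do i = 1 to countw(line);
         word = lowcase(scan(line, i));
         count = 1;
         output;
      end;
      keep word count;
   run;

   proc summary data=mapped nway;     /* reduce: sum the counts per key  */
      class word;
      var count;
      output out=wordcounts(drop=_:) sum=total;
   run;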
David Moors, Whitehound Limited
Overdispersion (extra variation) arises in binomial, multinomial, or count
data when variances are larger than those allowed by the binomial,
multinomial, or Poisson model. This phenomenon is caused by clustering of
the data, lack of independence, or both. As pointed out by McCullagh and
Nelder (1989), "Overdispersion is not uncommon in practice. In fact, some
would maintain that over-dispersion is the norm in practice and nominal
dispersion the exception." Several approaches exist for handling
overdispersed data, namely quasi-likelihood and likelihood models,
generalized estimating equations, and generalized linear mixed models.
Some classical likelihood models are presented. Among them are the
beta-binomial, binomial cluster (a.k.a. random clumped binomial),
negative-binomial, zero-inflated Poisson, zero-inflated negative-binomial,
hurdle Poisson, and the hurdle negative-binomial. We focus on how these
approaches or models can be implemented in a practical way using, when
appropriate, the procedures GLIMMIX, GENMOD, FMM, COUNTREG, NLMIXED, and
SURVEYLOGISTIC. Some real data set examples are discussed in order to
illustrate these applications. We also provide some guidance on how to
analyze generalized linear overdispersion mixed models and possible
scenarios where we might encounter them.
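Two of the modeling routes in minimal form (the data sets and variables are illustrative): a negative binomial fit in PROC GENMOD, and a random-intercept binomial fit in PROC GLIMMIX in which the clustering that induces the overdispersion is modeled directly:

   proc genmod data=counts;
      model y = x1 x2 / dist=negbin link=log;
   run;

   proc glimmix data=litters method=laplace;
      class litter;
      model dead/total = dose / dist=binomial link=logit solution;
      random intercept / subject=litter;
   run;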
Jorge Morel, Procter and Gamble
The proliferation of textual data in business is overwhelming.
Unstructured textual data is being constantly generated via call center
logs, emails, documents on the web, blogs, tweets, customer comments,
customer reviews, and so on. While the amount of textual data is
increasing rapidly, businesses' ability to summarize, understand, and make
sense of such data for making better business decisions remains
challenging. This presentation takes a quick look at how to organize and
analyze textual data for extracting insightful customer intelligence from
a large collection of documents and for using such information to improve
business operations and performance. Multiple case studies using real
data demonstrate applications of text
analytics and sentiment mining using SAS® Text Miner and
SAS® Sentiment Analysis Studio. While SAS®
products are used as tools for demonstration only, the topics and theories
covered are generic (not tool specific).
Goutam Chakraborty, Oklahoma State University
Murali Pagolu, SAS
In randomized experiments, it is generally assumed that the hierarchical
structures and variances are the same in the treatment and control groups.
In some situations, however, these structures and variance components can
differ. Consider a randomized experiment in which individuals randomized
to the treatment condition are further assigned to clusters in which the
intervention is administered, but no such clustering occurs in the control
condition. Such a structure can occur, for example, when the individuals
in the treatment condition are randomly assigned to group therapy sessions
or to mathematics tutoring groups; individuals in the control condition do
not receive group therapy or mathematics tutoring and therefore do not
have that level of clustering. In this example, individuals in the
treatment condition have a hierarchical structure, but individuals in the
control condition do not. If the therapists or tutors differ in efficacy,
the clustering in the treatment condition induces an extra source of
variability in the data that needs to be accounted for in the analysis. We
show how special features of SAS® PROC MIXED and PROC
GLIMMIX can be used to analyze data in which one or more treatment groups
have a hierarchical structure that differs from that in the control group.
We also discuss how to code variables in order to increase the
computational efficiency for estimating parameters from these designs.
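One common coding device for such designs, shown here as a hedged sketch rather than necessarily the authors' exact approach, is a dummy variable that switches the cluster random effect on only in the treatment arm:

   data trial2;
      set trial;                      /* assumed: ARM, CLUSTER, Y         */
      treated = (arm = 'T');          /* 0 for controls: no cluster effect */
   run;

   proc mixed data=trial2;
      class arm cluster;
      model y = arm / solution ddfm=kr;
      random treated / subject=cluster;  /* variance applies only where   */
   run;                                  /* TREATED = 1                   */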
Sharon Lohr, Westat
Peter Schochet, Mathematica Policy Research
SAS/STAT® 13.1 includes the new ICLIFETEST procedure, which
is specifically designed for analyzing interval-censored data. This type
of data is frequently found in studies where the event time of interest is
known to have occurred not at a specific time but only within a certain
time period. PROC ICLIFETEST performs nonparametric survival analysis of
interval-censored data and is a counterpart to PROC LIFETEST, which
handles right-censored data. With similar syntax, you use PROC ICLIFETEST
to estimate the survival function and to compare the survival functions of
different populations. This paper introduces you to the ICLIFETEST
procedure and presents examples that illustrate how you can use it to
perform analyses of interval-censored data.
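A minimal call looks like this (LTIME and RTIME hold the interval endpoints, with a missing RTIME indicating right censoring; the names are illustrative):

   proc iclifetest data=study plots=survival;
      time (ltime, rtime);
      strata treatment;    /* compare survival functions across groups   */
   run;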
Changbin Guo, SAS
Ying So, SAS
Gordon Johnston, SAS
Hierarchical data are common in many fields, from pharmaceuticals to
agriculture to sociology. As data sizes and sources grow, information is
likely to be observed on nested units at multiple levels, calling for the
multilevel modeling approach. This paper describes how to use the GLIMMIX
procedure in SAS/STAT® to analyze hierarchical data that
have a wide variety of distributions. Examples are included to illustrate
the flexibility that PROC GLIMMIX offers for modeling within-unit
correlation, disentangling explanatory variables at different levels, and
handling unbalanced data. Also discussed are enhanced weighting options,
new in SAS/STAT 13.1, for both the MODEL and RANDOM statements. These
weighting options enable PROC GLIMMIX to handle weights at different
levels. PROC GLIMMIX uses a pseudolikelihood approach to estimate
parameters, and it computes robust standard error estimators. This new
feature is applied to an example of complex survey data that are collected
from multistage sampling and have unequal sampling probabilities.
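A minimal three-level logistic example (students within classrooms within schools; the names are illustrative):

   proc glimmix data=survey method=laplace;
      class school classroom;
      model passed(event='1') = ses hours / dist=binary link=logit solution;
      random intercept / subject=school;
      random intercept / subject=classroom(school);
   run;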
Min Zhu, SAS
A central component of discussions of healthcare reform in the U.S. is the
estimation of healthcare cost and use at the national or state level, as
well as for subpopulation analyses for individuals with certain
demographic properties or medical conditions. For example, a striking but
persistent observation is that just 1% of the U.S. population accounts for
more than 20% of total healthcare costs, and 5% account for almost 50% of
total costs. In addition to descriptions of specific data sources
underlying this type of observation, we demonstrate how to use SAS®
to generate these estimates and to extend the analysis in various ways;
that is, to investigate costs for specific subpopulations. The goal is to
provide SAS programmers and healthcare analysts with sufficient
data-source background and analytic resources to independently conduct
analyses on a wide variety of topics in healthcare research. For selected
examples, such as the estimates above, we concretely show how to download
the data from federal web sites, replicate published estimates, and extend
the analysis. An added plus is that most of the data sources we describe
are available as free downloads.
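A design-based estimate from survey data of this kind is sketched below (the strata, cluster, and weight variable names follow common federal survey conventions and are assumptions, not the paper's actual code):

   proc surveymeans data=hc mean sum;
      strata varstr;       /* variance estimation stratum                */
      cluster varpsu;      /* primary sampling unit                      */
      weight perwt;        /* person-level analysis weight               */
      var totexp;          /* total annual healthcare expenditure        */
   run;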
Paul Gorrell, IMPAQ International
A SAS® license of any organization consists of a variety of
SAS components such as SAS/STAT®, SAS/GRAPH®,
SAS/OR®, and so on. SAS administrators do not have any
automated tool supplied with Base SAS® software to find how
many licensed copies are being actively used, how many SAS users are
actively utilizing the SAS server, and how many SAS datasets are being
referenced. These questions help a SAS administrator to make important
decisions such as controlling SAS licenses, removing inactive SAS users,
and purging SAS data sets that have long gone unreferenced. With the help
of the RTRACE system option provided by SAS, these questions can be
answered. The goal of this paper is to explain the setup of the RTRACE
option and its use in making the SAS administrator's life easy. This
paper is based on SAS® 9.2 running on the AIX operating system.
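RTRACE is enabled at SAS invocation, for example in the configuration file (the log path is illustrative):

   -rtrace all
   -rtraceloc '/usr/local/sas/logs/rtrace.log'

Every file that the session opens is then recorded in the RTRACELOC file, which can be read back with a DATA step to count active users, licensed components in use, and referenced data sets.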
Airaha Chelvakkanthan Manickam, Cognizant Technology Solutions
Sampling is widely used in different fields for quality control,
population monitoring, and modeling. In practice, however, sampling must
often also be justified by the business scenario, such as legal or
compliance needs. This paper uses one probability sampling method,
stratified sampling, combined with the business cost of quality control
review, to determine an optimized sampling procedure that satisfies both
statistical selection criteria and business needs. The first step is to
determine the total number of strata by grouping together the strata that
have a small number of sample units, identified as outliers in a
box-and-whisker plot. Then, the cost to review the sample in each stratum
is quantified by its business counterpart, the human working hour. Lastly,
using the determined number of strata and sample review cost, optimal
allocation is applied to distribute the predetermined total sample across
the strata.
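The final allocation step can be carried out with PROC SURVEYSELECT (an illustrative sketch with proportional allocation; the paper's allocation additionally weighs the per-stratum review cost):

   proc surveyselect data=frame out=sample method=srs n=500 seed=20149;
      strata stratum / alloc=prop;   /* spread 500 units across strata   */
   run;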
Yi Du, Freddie Mac
The power of social media has increased to such an extent that businesses
that fail to monitor consumer responses on social networking sites are now
clearly at a disadvantage. In this paper, we aim to provide some insights
on the impact of the Digital Rights Management (DRM) policies of Microsoft
and the release of Xbox One on their customers' reactions. We have
conducted preliminary research to compare the basic text mining
capabilities of SAS® and R, two very diverse yet powerful
tools. A total of 6,500 Tweets were collected to analyze the impact of the
DRM policies of Microsoft. The Tweets were segmented into three groups
based on date: before Microsoft announced its Xbox One policies (May 18 to
May 26), after the policies were announced (May 27 to June 16), and after
changes were made to the announced policies (June 16 to July 1). Our
results suggest that SAS works better than R when it comes to extensive
analysis of textual data. In our follow-up work, customers' reactions to
the release of Xbox One will be analyzed using SAS®
Sentiment Analysis Studio. We will collect Tweets on Xbox posted before
and after the release of Xbox One by Microsoft. We will have two
categories, Tweets posted between November 15 and November 21 and those
posted between November 22 and November 29. Sentiment analysis will then
be performed on these Tweets, and the results will be compared between the
two categories.
Aditya Datla, Oklahoma State University
Reshma Palangat, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Paper SAS051-2014:
Ask Vince: Moving SAS® Data and Analytical Results to
Microsoft Excel
This presentation is an open-ended discussion about techniques for
transferring data and analytical results from SAS® to
Microsoft Excel. There will be some introductory comments, but this
presentation does not have any set content. Instead, the topics discussed
are dictated by attendee questions. Come prepared to ask and get answers
to your questions. To submit your questions or suggestions for discussion
in advance, go to http://support.sas.com/surveys/askvince.html.
Vince DelGobbo, SAS
Most marketers today are trying to use Facebook's network of more than 1.1
billion registered users for social media marketing. Local television
stations and newspapers are no exception. This paper investigates what
makes a post effective. A Facebook page that is owned by a brand has fans,
or people who like the page and follow the stories posted on that page.
The posts on a brand page, however, do not appear on all the fans' News
Feeds. Visibility is determined by EdgeRank, a proprietary Facebook
algorithm that determines what content users see and how it's prioritized
on their News Feed. If marketers can understand how EdgeRank works, then they can
develop more impactful posts and ultimately produce more effective social
marketing using Facebook. The objective of this paper is to find the
characteristics of a Facebook post that enhance the efficacy of a news
outlet's page among its fans, using Facebook Power Ratio as the target
variable. Power Ratio, a surrogate to EdgeRank, was developed by experts
at Frank N. Magid Associates, a research-based media consulting firm.
Seventeen variables that describe the characteristics of a post were
extracted from more than 8,000 posts, which were encoded by 10 media
experts at Magid. Numerous models were built and compared to predict Power
Ratio. The most useful model is a polynomial regression; the top three
important factors are whether a post asks fans to like it, the content
category of the post (news, weather, and so on), and the number of fans
of the page.
Dinesh Yadav Gaddam, Oklahoma State University
Yogananda Domlur Seetharama, Oklahoma State University
The Challenge: assigning outbound calling agents in a telemarketing
campaign to geographic districts. The districts have a variable number of
leads, and each agent needs to be assigned entire districts with the total
number of leads being as close as possible to a specified number for each
of the agents (usually, but not always, an equal number). In addition,
there are constraints concerning the distribution of assigned districts
across time zones in order to maximize productivity and availability. Our
Solution: use the SAS/OR® procedure PROC CLP to formulate
the challenge as a constraint satisfaction problem (CSP) since the
objective is not necessarily to minimize a cost function, but rather to
find a feasible solution to the constraint set. The input consists of the
number of agents, the number of districts, the number of leads in each
district, the desired number of leads per agent, the amount by which the
actual number of leads can differ from the desired number, and the time
zone for each district.
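A toy version of the CSP, with three districts and two agents, conveys the PROC CLP formulation (a hedged sketch; the real model adds time-zone constraints and far more districts):

   proc clp out=solution;
      /* x<d>_<a> = 1 if district d is assigned to agent a              */
      var (x1_1 x1_2 x2_1 x2_2 x3_1 x3_2) = [0, 1];
      lincon x1_1 + x1_2 = 1;                  /* each district gets     */
      lincon x2_1 + x2_2 = 1;                  /* exactly one agent      */
      lincon x3_1 + x3_2 = 1;
      /* districts carry 40, 35, and 25 leads; agent 1 must land within */
      /* [45, 55] of the per-agent target                               */
      lincon 40*x1_1 + 35*x2_1 + 25*x3_1 >= 45;
      lincon 40*x1_1 + 35*x2_1 + 25*x3_1 <= 55;
   run;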
Kevin Gillette, Accenture
Stephen Sloan, Accenture
Many different neuroscience researchers have explored how various parts of
the brain are connected, but no one has performed association mining using
brain data. In this study, we used SAS® Enterprise Miner™
7.1 for association mining of brain data collected by a 14-channel EEG
device. An application of the association mining technique is presented in
this novel context of brain activities and by linking our results to
theories of cognitive neuroscience. The brain waves were collected while a
user processed information about Facebook, the most well-known social
networking site. The data was cleaned using Independent Component Analysis
via an open source MATLAB package. Next, by applying the LORETA algorithm,
activations at every fraction of the second were recorded. The data was
codified into transactions to perform association mining. Results showing
how various parts of brain get excited while processing the information
are reported. This study provides preliminary insights into how brain wave
data can be analyzed by widely available data mining techniques to enhance
researchers' understanding of brain activation patterns.
Pankush Kalgotra, Oklahoma State University
Ramesh Sharda, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Suicidal tendency among adolescent girls is a big challenge for
present-day society. The main goal of this paper is to find associations
of the various socio-emotional factors to suicidal tendencies among
adolescent girls in the United States. The data were obtained from the
National Longitudinal Study of Adolescent Health to explore social
behavior among adolescents. The observations from the National Longitudinal
Study of Adolescent Health represent a nationally representative sample of
adolescents in grades 7 through 12 in the U.S. Students in each school
were stratified by grade and sex. About 17 students were randomly chosen
from each stratum so that a total of approximately 200 adolescents were
selected from each of the 80 pairs of schools. The public access sample
includes 6,504 adolescents. Models built via multiple regressions are used
to find the association between depression, religious affiliation, and
suicidal tendency. ANOVA and chi-square tests are conducted to confirm the
associations of religious affiliation and depression with suicidal tendency.
Sesha Sai Ega, Oklahoma State University
Chandra Shekar Pulipati, Oklahoma State University
Venkata Rachapudi, Oklahoma State University
With a growing enterprise analytics environment that comprises global
users and a variety of sensitive data sources, a system administrator is
faced with the challenge of knowing who logs into the system, how often,
and what applications and what data sources are being consumed. This
information is necessary for auditing the consumers of data as well as for
monitoring the growth of data sources for hardware expansion. With the use
of SAS® Audit, Performance and Measurement Package, along
with some additional middle-tier logging and SAS® code,
information about the major consumers of the environment can be loaded
into LASR tables and analyzed with SAS® Visual Analytics
reporting tools.
Dan Lucas, SAS
Brandon Kirk, SAS
Do you have an abstract for an idea that you want to submit as a proposal
to SAS® conferences, but you are not sure which section is
the most appropriate one? In this paper, we discuss a methodology for
automatically identifying the most suitable section or sections for your
proposal content. We use SAS® Text Miner 12.1 and SAS®
Content Categorization Studio 12.1 to develop a rule-based categorization
model. This model is used to automatically score your paper abstract to
identify the most relevant and appropriate conference sections to submit
to for a better chance of acceptance.
Goutam Chakraborty, Oklahoma State University
Murali Pagolu, SAS
In our previous work, we often needed to perform large numbers of
repetitive and data-driven post-campaign analyses to evaluate the
performance of marketing campaigns in terms of customer response. These
routine tasks were usually carried out manually by using Microsoft Excel,
which was tedious, time-consuming, and error-prone. In order to improve
the work efficiency and analysis accuracy, we managed to automate the
analysis process with SAS® programming and replace the
manual Excel work. Through the use of SAS macro programs and other
advanced skills, we successfully automated the complicated data-driven
analyses with high efficiency and accuracy. This paper presents and
illustrates the creative analytical ideas and programming skills for
developing the automatic analysis process, which can be extended to apply
in a variety of business intelligence and analytics fields.
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
This is a simple macro that will examine all fields in a SAS®
dataset that are stored as character data to see if they contain real
character data or if they could be converted to numeric. It then performs
the conversion and reports in the log the names of any fields that could
not be converted. This allows the truly numeric data to be analyzed by
PROC MEANS or PROC UNIVARIATE. It makes use of the SAS dictionary tables,
the SELECT INTO syntax, and the ANYALPHA function.
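A simplified skeleton of the approach (the production macro also swaps the converted variables into place and consolidates the log reporting):

   %macro char_to_num(ds);
      %local charvars i var;
      proc sql noprint;                    /* dictionary tables          */
         select name into :charvars separated by ' '
            from dictionary.columns
            where libname = 'WORK'
              and memname = upcase("&ds")
              and type    = 'char';
      quit;
      data &ds._num;
         set &ds;
      %do i = 1 %to %sysfunc(countw(&charvars));
         %let var = %scan(&charvars, &i);
         if anyalpha(&var) then
            put "NOTE: &var contains letters and was not converted.";
         else n_&var = input(&var, ?? best32.);  /* ?? mutes bad values  */
      %end;
      run;
   %mend char_to_num;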
Andrea Wainwright-Zimmerman, Capital One
The creation of production reports for our organization has historically
been a labor-intensive process. Each month, our team produced around 650
SAS® graphs and 30 tables, which were then copied and pasted
into 16 custom Microsoft PowerPoint presentations, each between 20 and 30
pages. To reduce the number of manual steps, we converted to using stored
processes and the SAS® Add-In for Microsoft Office. This
allowed us to simply refresh those 16 PowerPoint presentations by using
SAS Add-In for Microsoft Office to run SAS® Stored
Processes. The stored processes generate the graphs and tables, while SAS
Add-In for Microsoft Office refreshes the document with updated graphs
already sized and positioned on the slides just as we need them. With this
new process, we are realizing the dream of reducing the amount of time
spent on a single monthly production process. This paper will discuss the
steps to creating a complex PowerPoint presentation that is simply
refreshed rather than created new each month. I will discuss converting
the original code to stored processes using SAS® Enterprise
Guide®, options and style statements that are required to
continue to use a custom style sheet, and how to create the PowerPoint
presentation with an assortment of output types including horizontal bar
charts, control charts, and tables. I will also discuss some of the
challenges and solutions specific to the stored process and PowerPoint
Add-In that we encountered during this conversion process.
Julie VanBuskirk, Baylor Health Care System
This paper kicks off a project to write a comprehensive book of best
practices for documenting SAS® projects. The presenter's
existing documentation styles are explained. The presenter wants to
discuss and gather current best practices used by the SAS user community.
The presenter shows documentation styles at three different levels of
scope. The first is a style used for project documentation, the second a
style for program documentation, and the third a style for variable
documentation. This third style enables researchers to repeat the modeling
in SAS, in an alternative language, or conceptually.
Peter Timusk, Statistics Canada
For IT professionals, saving time is critical. Delivering timely and
quality-looking reports and information to management, end users, and
customers is essential. SAS® provides numerous 'canned'
PROCedures for generating quick results to take care of these needs ...
and more. In this hands-on workshop, attendees acquire basic insights into
the power and flexibility offered by SAS PROCedures using PRINT, FORMS,
and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize
and create tabular and statistical output; and data sets to manage data
libraries. Additional topics include techniques for informing SAS which
data set to use as input to a procedure, how to subset data using a WHERE
statement (or WHERE= data set option), and how to perform BY-group
processing.
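Two of those staples in a single step: pointing a procedure at a data set and subsetting with the WHERE= data set option:

   proc means data=sashelp.cars(where=(origin = 'Asia')) n mean maxdec=1;
      class type;
      var mpg_city mpg_highway;
   run;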
Kirk Paul Lafler, Software Intelligence Corporation
There is an ever-increasing number of study designs and analysis of
clinical trials using Bayesian frameworks to interpret treatment effects.
Many research scientists prefer to understand the power and probability of
taking a new drug forward across the whole range of possible true
treatment effects, rather than focusing on one particular value to power
the study. Examples are used in this paper to show how to compute Bayesian
probabilities using the SAS/STAT® MIXED procedure and
UNIVARIATE procedure. Particular emphasis is given to the application on
efficacy analysis, including the comparison of new drugs to placebos and
to standard drugs on the market.
Howard Liang, inVentiv health Clinical
As complicated as the macro language is to learn, there are very strong
reasons for doing so. At its heart, the macro language is a code
generator. In its simplest uses, it can substitute simple bits of code
like variable names and the names of data sets that are to be analyzed. In
more complex situations, it can be used to create entire statements and
steps based on information that may even be unavailable to the person writing
or even executing the macro. At the time of execution, it can be used to
make queries of the SAS® environment as well as the
operating system, and utilize the gathered information to make informed
decisions about how it is to further function and execute.
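The code-generator idea in its simplest form: the macro substitutes a data set name and a variable list into a template of ordinary SAS code:

   %macro summarize(dsn, vars);
      title "Summary of &vars in &dsn";
      proc means data=&dsn n mean std min max;
         var &vars;
      run;
   %mend summarize;

   %summarize(sashelp.class, height weight)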
Art Carpenter, California Occidental Consultants
Because the macro language is primarily a code generator, it makes sense
that the code that it creates must be generated before it can be executed.
This implies that execution of the macro language comes first. Simple as
this is in concept, timing issues and conflicts are often not so simple to
recognize in application. As we use the macro language to take on more
complex tasks, it becomes even more critical that we have an understanding
of these issues.
Art Carpenter, California Occidental Consultants
Macro variables and their values are stored in symbol tables, which in
turn are held in memory. Not only are there a number of ways to create
macro variables, but they can be created in a wide variety of situations.
How they are created and under what circumstances affects the variable's
scope: how and where the macro variable is stored and retrieved. There are
a number of misconceptions about macro variable scope and about how the
macro variables are assigned to symbol tables. These misconceptions can
cause problems that the new, and sometimes even the experienced, macro
programmer does not anticipate. Understanding the basic rules for macro
variable assignment can help the macro programmer solve some of these
problems that are otherwise quite mystifying.
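A short demonstration of scope: the %LOCAL X lives in the macro's own symbol table and never disturbs the global X:

   %let x = global;
   %macro inner;
      %local x;
      %let x = local;
      %put Inside the macro, x resolves to: &x;
   %mend inner;
   %inner
   %put Outside the macro, x resolves to: &x;   /* still 'global' */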
Art Carpenter, California Occidental Consultants
Paper 2622-2014:
Best Practices for Deploying SAS on Red Hat Enterprise Linux
The number of SAS deployments on Red Hat Enterprise Linux (RHEL) continues
to increase in recent years because more and more customers have found
RHEL to be the best price/performance choice for new and/or updated SAS
deployments on x86 systems. Back for the fourth year at SGF, Barry will
share new performance findings and best practices for deploying SAS on Red
Hat Enterprise Linux and will discuss topics such as virtualization, GFS2
shared file system, SAS Grid Manager, and more. This session will be
beneficial for SAS customers interested in deploying on Red Hat Enterprise
Linux, or existing SAS-on-RHEL customers who want to get more out of their
deployments.
Barry Marson, Red Hat
There are many components that make up the middle tier and server tier of
a SAS® 9.4 deployment. There is also a variety of
technologies that can be used to provide high availability of these
components. This paper focuses on a small set of best practices
recommended by SAS for a consistent high-availability strategy across the
entire SAS 9.4 platform. We focus on two technologies: clustering, as well
as the high-availability features of SAS® Grid Manager. For
the clustering, we detail newly introduced clustering capabilities in SAS
9.4 such as the middle-tier SAS® Web Application Server and
the server-tier SAS® metadata clusters. We also introduce
the small, medium, and large deployment scenarios or profiles, which make
use of each of these technologies. These deployment scenarios reflect the
typical customer's environment and address their high availability,
performance, and scalability requirements.
Cheryl Doninger, SAS
Zhiyong Li, SAS
Bryan Wolfe, SAS
This paper provides an overview of how to create a SAS®
Enterprise Guide® process that is well designed, simple,
documented, automated, modular, efficient, reliable, and easy to maintain.
Topics include how to organize a SAS Enterprise Guide process, how to best
document in SAS Enterprise Guide, when to leverage point-and-click
functionality, and how to automate and simplify SAS Enterprise Guide
processes. This paper has something for any SAS Enterprise Guide user, new
or experienced!
Jennifer First-Kluge, Systems Seminar Consultants
Steven First, Systems Seminar Consultants
You can provide access and visibility to SAS® BI Dashboards,
SAS® Stored Processes, and SAS® Visual
Analytics through the use of SAS® Web Parts for Microsoft
SharePoint. In many organizations, the administrators who are responsible
for SharePoint and SAS® are different. This paper provides
best practices for the deployment of SAS Web Parts for Microsoft
SharePoint. Bridging the gap between SharePoint and SAS is especially
important for people who are not familiar with SharePoint administration.
This paper also provides tips for co-existence between SAS Web Parts for
Microsoft SharePoint 6.1 and 5.1. (The 5.1 release is available in SAS®
9.3. The 6.1 release is available in SAS® 9.4.) Finally,
this paper provides some guidance on DNS, permissions, and installation
techniques: the fine points that make or break your deployment!
Randy Mullis, SAS
The scheduling of surgical operations in a hospital is a complex problem,
with each surgical specialty having to satisfy their demand while
competing for resources with other hospital departments. This project
extends the construction of a weekly timetable, the Master Surgery
Schedule, which assigns surgical specialties to operating theater sessions
by taking into account the post-surgery resource requirements, primarily
post-operative beds on hospital wards. Using real data from the largest
teaching hospital in Wales, UK, this paper describes how SAS®
has been used to analyze large data sets to investigate the relationship
between the operating theater schedule and the demand for beds on wards in
the hospital. By understanding this relationship, a better-informed and
more robust operating theater schedule can be produced that delivers economic
benefit to the hospital and a better experience for the patients by
reducing the number of cancelled operations caused by the unavailability
of beds on hospital wards.
Elizabeth Rowse, Cardiff University
Paul Harper, Cardiff University
SAS® Visual Analytics and the SAS® LASR™
Analytic Server provide many capabilities to analyze data fast. Depending
on your organization, data can be loaded as a self-service operation. Or,
your data can be large and shared with many people. And, as data gets
large, effectively loading it and keeping it updated become important.
This presentation discusses the range of data scenarios from self-service
spreadsheets to very large databases, from single-subject data to large
star schema topologies, and from single-use data to continually updated
data that requires high levels of resilience and monitoring. Fast and easy
access to big data is important to empower your organization to make
better business decisions. Understanding how to have a responsive and
reliable data tier on which to make these decisions is within your reach.
Gary Mehler, SAS
Donna Bennett, SAS
Paper 2069-2014:
Big Data and Data Governance: Managing Expectations for Rampant
Information Consumption
Rapid adoption of high-performance scalable systems supporting big data
acquisition, absorption, management, and analysis has exposed a potential
gap in asserting governance and oversight over the information production
flow. In a traditional organizational data management environment,
business rules encompassed within data controls can be used to govern the
creation and consumption of data across the enterprise. However, as more
big data analytics applications absorb massive data sets from external
sources whose creation points are far removed from their various
repurposed uses, the ability to control the production of data must give
way to a different kind of data governance. In this talk, we discuss a
rational approach to scoping data governance for big data. Instead of
presuming the ability to validate and cleanse data prior to its loading
into the analytical platform, we will explore pragmatic expectations and
measures for scoring data utility and believability within the context of
big data analytics applications. Attendees will learn about: expectations
for data variation; the impact of variance on analytical results; focusing
on the quality of your data management processes and infrastructure; and
measurements for data usability.
David Loshin, Knowledge Integrity, Inc.
Paper 2482-2014:
Big Data at the Speed of Business - IBM's Big Data Platform
IBM is unique in having developed enterprise class big data software and
systems that allow you to address the full spectrum of big data business
challenges. The centerpiece of this strategy is IBM InfoSphere
BigInsights, which brings the power of Hadoop to the enterprise and
reliably manages large volumes of structured and unstructured data.
BigInsights makes it simpler for people to use Hadoop and build big data
applications. It enhances this open source technology to withstand the
demands of your enterprise, adding administrative, discovery, development,
provisioning, and security features. Attend this session and find out how
IBM platforms and SAS software can deliver these capabilities for you.
Marc Andrews, IBM
The emerging discipline of data governance encompasses data quality
assurance, data access and use policy, security risks and privacy
protection, and longitudinal management of an organization's data
infrastructure. In the interests of forestalling another bureaucratic
solution to data governance issues, this presentation features database
programming tools that provide rapid access to big data and make selective
access to and restructuring of metadata practical.
Sigurd Hermansen, Westat
In many organizations, the amount of data we deal with increases far
faster than the hardware and IT infrastructure to support it. As a result,
we encounter significant bottlenecks and I/O-bound processes. However,
clever use of SAS® software can help us find a way around. In this
paper we will look at the clever use of PROC OLAP to show you how to
address I/O-bound processing and spread I/O traffic to different servers
to increase cube-building efficiency. This paper assumes experience with
SAS® OLAP Cube Studio and/or PROC OLAP.
Yunbo (Jenny) Sun, Canada Post
Michael Brule, SAS
Digital data has manifested into a classic BIG DATA challenge for
marketers who want to push past the retroactive analysis limitations of
traditional web analytics. The current groundswell of digital device
adoption and variety of digital interactions grows larger year after year.
The opportunity for 'digital intelligence' has arrived, as traditional web
analytic techniques were not designed for the breadth of channels,
devices, and pace that fuels consumer experiences. In parallel, today's
landscape for data visualization, advanced analytics, and our ability to
process very large amounts of multi-channel information is changing. The
democratization of analytics for the masses is upon us, and marketers have
the opportunity to take advantage of descriptive, predictive, and (most
importantly) prescriptive data-driven insights. This presentation
describes how organizations can use SAS® products,
specifically SAS® Visual Analytics and SAS®
Adaptive Customer Experience, to overcome the limitations of web
analytics, and support data-driven integrated marketing objectives.
Suneel Grover, SAS
Simply using an ODS destination to replay PROC CONTENTS output does not
provide the user with attractive, usable metadata. Harness the power of
SAS® and ODS output objects to create designer multi-tab
metadata workbooks with the click of a mouse!
Louise Hadden, Abt Associates Inc.
In the credit card industry, there is a group of people who use credit
cards as an interest-free loan by transferring their balances between
cards during 0% balance transfer (BT) periods in order to avoid paying
interest. These people are called gamers. Gamers generate losses for banks
due to their behavior of paying no interest and having no purchases. It is
hard to use traditional ways, such as risk scorecards, to identify them
since gamers tend to have very good credit histories. This paper uses a
Naive Bayes classifier to classify gamers into three segments, according
to the proportion of gamers. Using this model, the targeting policy and
underwriting policy can be significantly improved, and the proportion of
gamers in the population can be tracked. This
result has been accomplished by using logistic regression in SAS®
combined with a Microsoft Excel pivot table. The procedure is described in
detail in this paper.
Yang Ge, Lancaster University
Discover how SAS® leverages field marketing programs to
support AllAnalytics.com, a sponsored third-party community. This paper
explores the use of SAS software, including SAS® Enterprise
Guide®, SAS® Customer Experience Analytics,
and SAS® Marketing Automation to enable marketers to have
better insight, better targeting, and better response from SAS programs.
Julie Chalk, SAS
Kristine Vick, SAS
Paper 2162-2014:
Building a Business Justification
At last you found it, the perfect software solution to meet the demands of
your business. Now, how do you convince your company and its leaders to
spend the money to bring it in-house and reap the business benefit? How do
you build a compelling business justification? How should you think
through, communicate, and overcome this common business roadblock?
Join Scott Sanders, a business and IT veteran who has effectively handled
this challenge on numerous occasions. Hear some of his best practices and
review some of his effective methods of approach, as he goes through his
personal experience in building a business case and shepherding it through
to final approval.
Scott Sanders, Sears Holdings
The Affordable Care Act (ACA) contains provisions that have stimulated
interest in analytics among health care providers, especially those
provisions that address quality of outcomes. High Impact Technologies
(HIT) has been addressing these issues since before passage of the ACA and
has a Health Care Data Model recognized by Gartner and implemented at
several health care providers. Recently, HIT acquired SAS®
Visual Analytics, and this paper reports our successful efforts to use SAS
Visual Analytics for visually exploring Big Data for health care
providers. Health care providers can suffer significant financial
penalties for readmission rates above a certain threshold and other
penalties related to quality of care. We have been able to use SAS Visual
Analytics, coupled with our experience gained from implementing the HIT
Healthcare Data Model at a number of Healthcare providers, to identify
clinical measures that are significant predictors for readmission. As a
result, we can help health care providers reduce the rate of 30-day
readmissions.
Joe Whitehurst, High Impact Technologies
Diane Hatcher, SAS
Inference of variance components in linear mixed effect models (LMEs) is
not always straightforward. I introduce and describe a flexible SAS®
macro (%COVTEST) that uses the likelihood ratio test (LRT) to test
covariance parameters in LMEs by means of the parametric bootstrap. Users
must supply the null and alternative models (as macro strings), and a data
set name. The macro calculates the observed LRT statistic and then
simulates data under the null model to obtain an empirical p-value. The
macro also creates graphs of the distribution of the simulated LRT
statistics. The program takes advantage of processing accomplished by PROC
MIXED and some SAS/IML® functions. I demonstrate the syntax
and mechanics of the macro using three examples.
Peter Ott, BC Ministry of Forests, Lands & NRO
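The %COVTEST syntax itself is documented in the paper; as a hedged sketch
of the underlying idea, the observed LRT statistic can be computed from
the -2 residual log likelihoods of two nested PROC MIXED fits (the data
set and variable names below are hypothetical):

  ods output FitStatistics=fit_alt;
  proc mixed data=mydata method=reml;      /* alternative model */
    class subject;
    model y = time;
    random intercept / subject=subject;
  run;

  ods output FitStatistics=fit_null;
  proc mixed data=mydata method=reml;      /* null model: no random
                                              intercept */
    class subject;
    model y = time;
  run;

  data lrt;
    merge fit_null(where=(descr='-2 Res Log Likelihood')
                   rename=(value=neg2ll_null))
          fit_alt (where=(descr='-2 Res Log Likelihood')
                   rename=(value=neg2ll_alt));
    lrt = neg2ll_null - neg2ll_alt;        /* observed LRT statistic */
  run;

The macro then repeats the fit on data simulated under the null model to
build the empirical reference distribution for this statistic.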
The use of Cohen's kappa has enjoyed a growing popularity in the social
sciences as a way of evaluating rater agreement on a categorical scale.
The kappa statistic can be calculated as Cohen first proposed it in his
1960 paper or by using any one of a variety of weighting schemes. The most
popular among these are the linear weighted kappa and the quadratic
weighted kappa. Currently, SAS® users can produce the kappa
statistic of their choice through PROC FREQ and the use of relevant AGREE
options. Complications arise, however, when the data set does not contain
a completely square cross-tabulation of data. That is, this method
requires that both raters have at least one data point for every available
category. There have been many solutions offered for this predicament.
Most suggested solutions include the insertion of dummy records into the
data and then assigning a weight of zero to those records through an
additional class variable. The result is a multi-step macro, extraneous
variable assignments, and potential data integrity issues. The author
offers a much more elegant solution by producing a segment of code which
uses brute force to calculate Cohen's kappa as well as all popular
variants. The code uses nested PROC SQL statements to provide a single
conceptual step which generates kappa statistics of all types, even those
that the user wishes to define for themselves.
Matthew Duchnowski, Educational Testing Service (ETS)
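When the cross-tabulation is square, the standard approach that the paper
improves upon is PROC FREQ with the AGREE option; for example (with a
hypothetical data set and rater variables):

  proc freq data=ratings;
    tables rater1*rater2 / agree(wt=FC);  /* WT=FC requests Fleiss-Cohen
                                             (quadratic) weights */
    test kappa wtkap;                     /* simple and weighted kappa */
  run;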
A case-control study, in its most basic form, compares a case series to a
matched control series and is commonly implemented in the field of public
health. While matching is intended to eliminate confounding, the main
potential benefit of matching in case-control studies is a gain in
efficiency. There are many known methods for selecting a potential match
or matches (in the case of 1:n studies) per case, the most prominent being
the distance-based approach and matching on propensity scores. In this
paper, we will go through both, compare their results, and present a macro
capable of performing both.
Lovedeep Gondara, BC Cancer Agency
Colleen Mcgahan, BC Cancer Agency
This paper demonstrates the new case-level residuals in the CALIS
procedure and how they differ from classic residuals in structural
equation modeling (SEM). Residual analysis has a long history in
statistical modeling for finding unusual observations in the sample data.
However, in SEM, case-level residuals are considerably more difficult to
define because of 1) latent variables in the analysis and 2) the
multivariate nature of these models. Historically, residual analysis in
SEM has been confined to residuals obtained as the difference between the
sample and model-implied covariance matrices. Enhancements to the CALIS
procedure in SAS/STAT® 12.1 enable users to obtain
case-level residuals as well. This enables a more complete residual and
influence analysis. Several examples showing mean/covariance residuals and
case-level residuals are presented.
Catherine Truxillo, SAS
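As a hedged sketch of how this feature might be invoked (the
PLOTS=CASERESID request below is our reading of the SAS/STAT 12.1
enhancements described above, and the data set and model are
hypothetical):

  proc calis data=scores plots=caseresid;
    path
      Ability ===> math reading writing;   /* one latent factor */
  run;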
Often in a clinical trial, measures are needed to describe pain,
discomfort, or physical constraints that are visible but not measurable
through lab tests or other vital signs. In these cases, researchers turn
to questionnaires to provide documentation of improvement or statistically
meaningful change in support of safety and efficacy hypotheses. For
example, in studies (like Parkinson's studies) where pain or depression are
serious
non-motor symptoms of the disease, these questionnaires provide primary
endpoints for analysis. Questionnaire data presents unique challenges in
both collection and analysis in the world of CDISC standards. The
questions are usually aggregated into scale scores, as the underlying
questions by themselves provide little additional usefulness. SAS®
is a powerful tool for extraction of the raw data from the collection
databases and transposition of columns into a basic data structure in
SDTM, which is vertical. The data is then processed further as per the
instructions in the Statistical Analysis Plan (SAP). This involves
translation of the originally collected values into sums, and the values
of some questions need to be reversed. Missing values can be computed as
means of the remaining questions. These scores are then saved as new rows
in the ADaM (analysis-ready) data sets. This paper describes the types of
questionnaires, how data collection takes place, the basic CDISC rules for
storing raw data in SDTM, and how to create analysis data sets with
derived records using ADaM standards, while maintaining traceability to
the original question.
Karin LaPann, PRA International
Terek Peterson, PRA International
Usually, log files are checked by users only when SAS®
completes the execution of programs. If SAS finds any errors in the
current line, it skips the current step and executes the next line, and
the log can be fully checked only after the program finishes executing.
Some programs take more than a day to complete. In such cases, the
user opens the log file in read-only mode frequently to check for errors,
warnings, and unexpected notes, and manually terminates the execution of
the program if any potential messages are identified. Otherwise, the user
is notified of the errors in the log file only at the end of the
execution. Our suggestion is to run a parallel utility program along
with the production program to check the log file of the currently running
program and to notify the user through an e-mail when an error, warning,
or unexpected note is found in the log file. Also, the execution can be
terminated automatically and the user can be notified when potential
messages are identified.
Harun Rasheed, Cognizant Technology Solutions
Amarnath Vijayarangan, Genpact
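A minimal sketch of the log-scanning idea (file paths and the e-mail
address are hypothetical):

  filename prodlog 'production.log';
  filename mail email to='analyst@example.com'
                 subject='Potential issues found in production.log';

  data _null_;
    infile prodlog truncover;
    input line $char256.;
    if index(line, 'ERROR:') > 0 or index(line, 'WARNING:') > 0 then do;
      file mail;
      put line;          /* e-mail each offending log line */
    end;
  run;

In the authors' approach this scan runs in parallel with the production
job and can also terminate it; the sketch covers only the notification
step.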
The life of a SAS® program can be broken down into sets of
changes made over time. Programmers are generally focused on the future,
but when things go wrong, a look into the past can be invaluable.
Determining what changes were made, why they were made, and by whom can
save both time and headaches. This paper discusses version control and the
current options available to SAS® Enterprise Guide®
users. It then highlights the upcoming Program History feature of SAS
Enterprise Guide. This feature enables users to easily track changes made
to SAS programs. Properly managing the life cycle of your SAS programs
will enable you to develop with peace of mind.
Joe Flynn, SAS
Casey Smith, SAS
Alex Song, SAS
Can clustering discharge records improve a predictive model s overall fit
statistic? Do the predictors differ across segments? This talk describes
the methods and results of data mining pediatric IBD patient records from
my Analytics 2013 poster. SAS® Enterprise Miner™
12.1 was used to segment patients and model important predictors for the
length of hospital stay using discharge records from the national Kids'
Inpatient Database. Profiling revealed that patient segments were
differentiated by primary diagnosis, operating room procedure indicator,
comorbidities, and factors related to admission and disposition of
patient. Cluster analysis of patient discharges improved the overall
average square error of predictive models and distinguished predictors
that were unique to patient segments.
Linda Schumacher, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
When a SAS® user asked for help scanning words in textual
data and then matching them to pre-scored keywords, it struck a chord with
SAS programmers! They contributed code that solved the problem using hash
structures, SQL, informats, arrays, and PRX routines. Of course, the next
question was which program is fastest! This paper compares the different
approaches and evaluates the performance of the programs on varying
amounts of data. The code for each program is provided to show how SAS has
a variety of tools available to solve common problems. While this won't
make you an expert on any of these programming techniques, you'll see each
of them in action on a common problem.
Tom Kari, Tom Kari Consulting
The graphical display of the individual data is important in understanding
the raw data and the relationship between the variables in the data set.
You can explore your data to ensure statistical assumptions hold by
detecting and excluding outliers if they exist. Since you can visualize
what actually happens to individual subjects, you can make your
conclusions more convincing in statistical analysis and interpretation of
the results. SAS® provides many tools for creating graphs of
individual data. In some cases, multiple tools need to be combined to make
a specific type of graph that you need. Examples are used in this paper to
show how to create graphs of individual data using the SAS®
ODS Graphics procedures (SG procedures).
Howard Liang, inVentiv health Clinical
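For instance, a spaghetti plot of individual subject profiles (a generic
sketch; the data set and variable names are illustrative) needs only one
SG procedure:

  proc sgplot data=trial;
    series x=visit y=response / group=subject
           lineattrs=(pattern=solid) markers;
  run;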
This paper describes a method that uses some simple SAS®
macros and SQL to merge data sets containing related data, where rows
have varying effective date ranges. The data sets are merged into a
single data set that represents a serial list of snapshots of the merged
data, as of a change in any of the effective dates. While simple
conceptually, this type of merge is often problematic when the effective
date ranges are not consecutive or consistent, when the ranges overlap, or
when there are missing ranges from one or more of the merged data sets.
The technique described was used by the Fairfax County Human Resources
Department to combine various employee data sets (Employee Name and
Personal Data, Personnel Assignment and Job Classification, Personnel
Actions, Position-Related data, Pay Plan and Grade, Work Schedule,
Organizational Assignment, and so on) from the County's SAP-HCM ERP system
into a single Employee Action History/Change Activity file for historical
reporting purposes. The technique currently is used to combine fourteen
data sets, but is easily expandable by inserting a few lines of code using
the existing macros.
James Moon, County of Fairfax, Virginia
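The core of such a merge can be sketched in PROC SQL (the table and
column names below are illustrative, not the county's actual structures):
build the union of all effective dates, then join each source on the
snapshot date falling within its range:

  proc sql;
    create table snap_dates as
    select empid, eff_date from jobhist
    union
    select empid, eff_date from payplan;

    create table snapshots as
    select d.empid, d.eff_date, j.job_class, p.pay_grade
    from snap_dates d
    left join jobhist j
      on d.empid = j.empid
     and d.eff_date between j.eff_date and j.end_date
    left join payplan p
      on d.empid = p.empid
     and d.eff_date between p.eff_date and p.end_date;
  quit;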
Missing data commonly occur in medical, psychiatric, and social research.
The SAS® MI and MIANALYZE procedures are often used to
generate multiple imputations and then provide valid statistical
inferences based on them. However, MIANALYZE cannot be used to combine
type-III analyses obtained from multiply imputed data sets. In this
manuscript, we write a macro to combine the type-III analyses generated
from the SAS MIXED procedure based on multiple imputations. The proposed
method can be extended to other procedures reporting type-III analyses,
such as GENMOD and GLM.
Binhuan Wang, New York University School of Medicine
Yixin Fang, New York University School of Medicine
Man Jin, Forest Research Institute
Graphic software users are confronted with what I call Options
Over-Choice, and with defaults that are designed to easily give you a
result, but not necessarily the best result. This presentation and paper
focus on guidelines for communication-effective data visualization. It
demonstrates their practical implementation, using graphic examples likely
to be adaptable to your own work. Code is provided for the examples.
Audience members will receive the latest update of my tip sheet compendium
of graphic design principles. The examples use SAS® tools
(traditional SAS/GRAPH® or the newer ODS graphics procedures
that are available with Base SAS®), but the design
principles are really software independent. Come learn how to use data
visualization to inform and influence, to reveal and persuade, using tips
and techniques developed and refined over 34 years of working to get the
best out of SAS® graphic software tools.
LeRoy Bessler, Bessler Consulting and Research
There has been debate regarding which method to use to analyze repeated
measures continuous data when the design includes only two measurement
times. Five different techniques can be applied and give similar results
when there is little to no correlation between pre- and post-test
measurements and when data at each time point are complete: 1) analysis of
variance on the difference between pre- and post-test, 2) analysis of
covariance on the differences between pre- and post-test controlling for
pre-test, 3) analysis of covariance on post-test controlling for pre-test,
4) multivariate analysis of variance on post-test and pre-test, and 5)
repeated measures analysis of variance. However, when there is missing
data or if a moderate to high correlation between pre- and post-test
measures exists under an intent-to-treat analysis framework, bias is
introduced in the tests for the ANOVA, ANCOVA, and MANOVA techniques. A
comparison of Type III sum of squares, F-tests, and p-values for a
complete case and an intent-to-treat analysis are presented. The analysis
using a complete case data set shows that all five methods produce similar
results except for the repeated measures ANOVA due to a moderate
correlation between pre- and post-test measures. However, significant bias
is introduced for the tests using the intent-to-treat data set.
J. Madison Hyer, Georgia Regents University
Jennifer Waller, Georgia Regents University
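For reference, method 3 above (analysis of covariance on post-test
controlling for pre-test) takes only a few lines in SAS (a generic sketch
with hypothetical variable names):

  proc glm data=study;
    class treatment;
    model posttest = treatment pretest;  /* ANCOVA: post-test on group,
                                            adjusting for pre-test */
  run;
  quit;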
With the growth in size and complexity of organizations investing in
SAS® platform technologies, the size and complexity of ETL
subsystems and data integration (DI) jobs is growing at a rapid rate.
Developers are pushed to come up with new and innovative ways to improve
process efficiency in their DI jobs to meet increasingly demanding service
level agreements (SLAs). The ability to conditionally execute or switch
paths in a DI job is an extremely useful technique for improving process
efficiency. How can a SAS® Data Integration developer design
a job to best suit conditional execution? This paper discusses a technique
for providing a parameterized dynamic execution custom transformation that
can be easily incorporated into SAS® Data Integration Studio
jobs to provide process path switching capabilities. The aim of any data
integration task is to ensure that all sources of business data are
integrated as efficiently as possible. It is concerned with the
repurposing of data via transformation, should be a value-adding process,
and also should be the product of collaboration. Modularization of common
or repeatable processes is a fundamental part of the collaboration process
in DI design and development. Switch Path, a custom transformation built
to conditionally execute branches or nodes in SAS Data Integration Studio,
provides a reusable module for solving the conditional execution
limitations of standard SAS Data Integration Studio transformations and
jobs. Switch Path logic in SAS Data Integration Studio can serve many
purposes in the day-to-day business needs of a SAS data integration
developer, as it is completely reusable.
Prajwal Shetty, Tesco
Paper SAS2401-2014:
Confessions of a SAS® Dummy
People from all over the world are using SAS® analytics to
achieve great things, such as to develop life-saving medicines, detect and
prevent financial fraud, and ensure the survival of endangered species.
Chris Hemedinger is not one of those people. Instead, Chris has used SAS
to optimize his baby name selections, evaluate his movie rental behavior,
and analyze his Facebook friends. Join Chris as he reviews some of his
personal triumphs over the little problems in life, and learn how these
exercises can help to hone your skills for when it really matters.
Chris Hemedinger, SAS
Non-cognitive assessments, which measure constructs such as time
management, goal-setting, and personality, are becoming more prevalent
today in research within the domains of academic performance and workforce
readiness. Many instruments that are used for this purpose contain a large
number of items that can each be assigned to specific facets of the larger
construct. The factor structure of each instrument emerges from a mixture
of psychological theory and empirical research, often by doing exploratory
factor analysis (EFA) using the SAS® procedure PROC FACTOR.
Once an initial model is established, it is important to perform
confirmatory factor analysis (CFA) to confirm that the hypothesized model
provides a good fit to the data. If outcome data such as grades are
collected, structural equation modeling (SEM) should also be employed to
investigate how well the assessment predicts these measures. This paper
demonstrates how the SAS procedure PROC CALIS is useful for performing
confirmatory factor analysis and structural equation modeling. Examples of
these methods are demonstrated and proper interpretation of the fit
statistics and resulting output is illustrated.
Steven Holtzman, Educational Testing Service
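A minimal sketch of a two-factor CFA specification in PROC CALIS (the
item and factor names are hypothetical):

  proc calis data=survey;
    factor
      TimeMgmt ===> item1-item4,
      GoalSet  ===> item5-item8;
  run;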
If you have an existing SAS® Business Intelligence
environment and you want to add SAS® Visual Analytics, you
need to make some architectural choices. SAS Visual Analytics and SAS
Business Intelligence can share certain components, such as a SAS®
Metadata Server and the SAS® Web Infrastructure Platform.
Sharing metadata eliminates the need to create and maintain duplicate
information, and it enables your users to take advantage of functionality
that can be shared between SAS Visual Analytics and SAS Business
Intelligence. Sharing the SAS Web Infrastructure Platform enables SAS
middle-tier applications such as SAS® Visual Analytics
Services and SAS® Web Report Studio to communicate with each
other. Intended for SAS architects and administrators, this paper explores
supported architecture for SAS Visual Analytics and SAS Business
Intelligence. The paper then identifies areas where the architecture can
be shared as well as where resources should be kept separate. In addition,
the paper offers recommendations and other considerations to keep in mind
when you are managing shared resources.
Christine Vitron, SAS
James Holman, SAS
Paper SAS403-2014:
Consumer Research Tools
The big questions in consumer research lead to statistical methods
appropriate to them. 'What do consumers say?' is all about analyzing
surveys and finding relationships between preferences and background
attributes. 'What do consumers think?' is about looking at higher-level
structures like preference mappings that can be derived from ratings.
'What will consumers pay?' is about conducting choice experiments to pin
down the way consumers trade off among features and with prices, with the
willingness to pay. 'How do you trigger purchases?' is about experiments
that determine which interventions work, and how to target them to
potential consumers, with uplift modeling. The SAS product JMP®
version 11 was released last fall with a new group of modeling tools to
address these and other questions in consumer research. Traditionally JMP
has specialized in engineering tools, but consumer research is an
important part of engineering, in product planning, to make sure you
produce the products with the attributes consumers want.
John Sall, SAS
The CDISC Study Data Tabulation Model (SDTM) provides a standardized
structure and specification for a broad range of human and animal study
data in pharmaceutical research, and is widely adopted in the industry for
the submission of the clinical trial data. Because SDTM requires
additional variables and datasets that are not normally available in the
clinical database, further programming is required to convert the clinical
database into the SDTM datasets. This presentation introduces the concept
and general requirements of SDTM, and the different approaches in the SDTM
data conversion process. The author discusses database design
considerations, implementation procedures, and SAS® macros
that can be used to maximize the efficiency of the process. The creation
of the metadata DEFINE.XML and the final SDTM dataset validation are also
discussed.
Hong Chen, McDougall Scientific Ltd.
The INTCK function is used to obtain the number of time intervals between
two dates. The INTCK function comes with arguments and argument-modifiers
to enable us to perform a variety of date-related manipulations. This
paper deals with a simple real-world usage of the INTCK function to
calculate the frequency of days of the week between the start and end day of a
trip. The INTCK function with its arguments can directly calculate the
number of days of the week as illustrated in this paper. The same usage of
the INTCK function using PROC SQL is also presented in this paper. All the
codes executed and presented in this paper involve Base SAS®
Release 9.3 only.
Jinson Erinjeri, D.K. Shifflet & Associates
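A sketch of the weekday-counting idea described above: INTCK with a
shifted WEEK interval counts how many times a given weekday boundary is
crossed between two dates (WEEK.2 intervals begin on Monday, WEEK.6 on
Friday):

  data _null_;
    start = '05JAN2014'd;             /* a Sunday   */
    finish = '25JAN2014'd;            /* a Saturday */
    mondays = intck('week.2', start, finish);
    fridays = intck('week.6', start, finish);
    put mondays= fridays=;            /* both are 3 for this range */
  run;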
SAS® Visual Analytics Designer enables you to create reports
with different layouts. There are several basic graph objects that you can
include in these reports. What if you wanted to create a report that
wasn't possible with one of the out-of-the-box graph objects? No worries!
The new SAS® Visual Analytics Graph Builder available from
the SAS® Visual Analytics home page lets you create a custom
graph object using built-in sample data. You can then include these graph
objects in SAS Visual Analytics Designer and generate reports using them.
Come see how you can create custom graph objects such as stock plots,
butterfly charts, and more. These custom objects can be easily shared with
others for use in SAS Visual Analytics Designer.
Ravi Devarajan, SAS
Himesh Patel, SAS
Pat Berryman, SAS
Lisa Everdyke, SAS
When submitting clinical data to the Food and Drug Administration (FDA),
besides the usual trial results, we need to submit the information that
helps the FDA to understand the data. The FDA has required the CDISC Case
Report Tabulation Data Definition Specification (Define-XML), which is
based on the CDISC Operational Data Model (ODM), for submissions using
Study Data Tabulation Model (SDTM). Electronic submission to the FDA is
therefore a process of following the guidelines from CDISC and FDA. This
paper illustrates how to create an FDA guidance compliant define.xml v2
from metadata by using SAS®.
Qinghua (Kathy) Chen, Exelixis Inc.
James Lenihan, Exelixis Inc.
The Census Bureau conducts the Common Core of Data surveys for the
National Center for Education Statistics annually. We have written SAS®
programs to automate the database documentation. We try to avoid including
hard-coded values in the programs. Thanks to a record layout spreadsheet,
the analysts can quickly update the survey metadata outside the SAS
programs. This paper explains how SAS can read the record layout
spreadsheet to create formats on the fly. The analysts can update the
values as changes occur over time without having to worry about writing
correct SAS syntax. Behind the scenes, SAS is using dictionary views,
macros, ODS OUTPUT, PROC TEMPLATE, PROC FORMAT, the ODS Report Writing
Interface, and RTF to create the desired results. This paper uses syntax
for SAS® 9.2, written for programmers at the intermediate
level.
Suzanne Dorinski, US Census Bureau
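The heart of the technique is the CNTLIN= option of PROC FORMAT, which
builds formats from a data set; a hedged sketch (the spreadsheet layout
here is hypothetical) is:

  /* The layout sheet supplies one row per format value, with the
     variables FMTNAME, START, and LABEL that CNTLIN= expects. */
  proc import datafile='record_layout.xlsx' out=layout
              dbms=xlsx replace;
  run;

  proc format cntlin=layout;
  run;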
LaTeX is a free document creation package that is often used to create
journal articles. It provides the capability to create very specific
formatting and to write a wide variety of formulas. Using ODS, SAS®
can write documents to a LaTeX file, which can then be compiled through
LaTeX into PDF files. This paper briefly reviews the basic syntax and
options to produce these files. Then, we look at how to create a new
tagset to make changes to the standard ODS LaTeX templates to create the
non-gridded table appearance that is typically seen in journal articles.
We also explore how to write special characters and equations not
otherwise available through ODS LaTeX.
Steven Feder, Federal Reserve Board of Governors
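The basic pattern (a generic sketch) routes any procedure output through
the LaTeX tagset:

  ods tagsets.latex file='example.tex';
  proc print data=sashelp.class(obs=5) noobs;
  run;
  ods tagsets.latex close;

  /* example.tex can then be compiled, for example with pdflatex */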
Paper SAS050-2014:
Creating Multi-Sheet Microsoft Excel Workbooks with SAS®:
The Basics and Beyond Part 1
This presentation explains how to use Base SAS® 9 software to
create multi-sheet Microsoft Excel workbooks. You learn step-by-step
techniques for quickly and easily creating attractive multi-sheet Excel
workbooks that contain your SAS® output using the ExcelXP
ODS tagset. The techniques can be used regardless of the platform on which
your SAS software is installed. You can even use them on a mainframe!
Creating and delivering your workbooks on-demand and in real time using
SAS server technology is discussed. Although the title is similar to
previous presentations by this author, this presentation contains new and
revised material not previously presented.
Vince DelGobbo, SAS
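The core pattern (a generic sketch, not the paper's full example) is:

  ods tagsets.excelxp file='report.xml' style=printer
      options(sheet_name='Class List' embedded_titles='yes');
  title 'Student Listing';
  proc print data=sashelp.class noobs;
  run;
  ods tagsets.excelxp close;

By default each piece of procedure output opens a new worksheet, which is
what makes multi-sheet workbooks straightforward.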
The first table in many research journal articles is a statistical
comparison of demographic traits across study groups. It might not be
exciting, but it's necessary. And although SAS® calculates
these numbers with ease, it is a time-consuming chore to transfer these
results into a journal-ready table. Introducing the time-saving deluxe
%MAKETABLE SAS macro: it does the heavy work for you. It creates a
Microsoft Word table of up to four comparative groups reporting t-tests,
chi-square, ANOVA, or median test results, including a p-value. You
specify only a one-line macro call for each line in the table, and the
macro takes it from there. The result is a tidily formatted journal-ready
Word table that you can easily include in a manuscript, report, or
Microsoft PowerPoint presentation. For statisticians and researchers
needing to summarize group comparisons in a table, this macro saves time
and relieves you from the drudgery of trying to make your output neat and
pretty. And after all, isn't that what we want computing to do for us?
Alan Elliott, Southern Methodist University
Patient safety in a neonatal intensive care unit (NICU) as in any hospital
unit is critically dependent on appropriate staffing. We used SAS®
Simulation Studio to create a discrete-event simulation model of a
specific NICU that can be used to predict the number of nurses needed per
shift. This model incorporates the complexities inherent in determining
staffing needs, including variations in patient acuity, referral patterns,
and length of stay. To build our model, the group first estimated
probability distributions for the number and type of patients admitted
each day to the unit. Using both internal and published data, the team
also estimated distributions for various NICU-specific patient
morbidities, including type and timing of each morbidity event and its
temporal effect on a patient's acuity. We then built a simulation model
that samples from these input distributions and simulates the flow of
individual patients through the NICU (consisting of critical-care and
step-down beds) over a one-year time period. The general basis of our
model represents a method that can be applied to any unit in any hospital,
thereby providing clinicians and administrators with a tool to rigorously
and quantitatively support staffing decisions. With additional
refinements, the use of such a model over time can provide significant
benefits in both patient safety and operational efficiency.
Chris DeRienzo, Duke University Medical Center
David Tanaka, Duke University Medical Center
Emily Lada, SAS
Phillip Meanor, SAS
Business Intelligence platforms provide a bridge between expert data
analysts and decision-makers and other end-users. But what do you do when
you can identify no system that meets both your needs and your budget? If
you are the Consolidated Data Analysis Center in the HHS Office of
Inspector General, you use SAS® Enterprise BI Server and the
SAS® Stored Process Web Application to build your own. This
presentation covers the inception, design, and implementation of the
PAYment by Geographic Area (PAYGAR) system, which uses only SAS®
Enterprise BI tools, namely the SAS Stored Process Web Application, PROC
GMAP, and HTML/JAVA embedded in a DATA step, to create an interactive
platform for presenting and exploring data that has a geographic
component. In particular, the presentation reviews how we created a system
of chained stored processes to enable a user to select the data to be
presented, navigate through different geographic levels, and display
companion reports related to the current data and geographic selections.
It also covers the creation of the HTML front-end that sits over and
manages the system. Throughout, the presentation emphasizes the
scalability of PAYGAR, which the SAS Stored Process Web Application
facilitates.
Scott Hutchison, HHS Office of Inspector General
John Venturini, Piper Enterprise Solutions
Energy companies that operate in a highly regulated environment and are
constrained in pricing flexibility must employ a multitude of approaches
to maintain high levels of customer satisfaction. Many investor-owned
utilities are just starting to embrace a customer-centric business model
to improve the customer experience and hold the line on costs while
operating in an inflationary business setting. Faced with these
challenges, it is natural for utility executives to ask: 'What drives
customer satisfaction, and what is the optimum balance between influencing
customer perceptions and improving actual process performance in order to
be viewed as a top-tier performer by our customers?' J.D. Power, for
example, cites power quality and reliability as the top influencer of
overall customer satisfaction. But studies have also shown that customer
perceptions of reliability do not always match actual reliability
experience. This apparent gap between actual and perceived performance
raises a conundrum: Should the utility focus its efforts and resources on
improving actual reliability performance or would it be better to
concentrate on influencing customer perceptions of reliability? How can
this conundrum be unraveled with an analytically driven approach? In this
paper, we explore how the design of experiment techniques can be employed
to help understand the relationship between process performance and
customer perception, thereby leading to important insights into the energy
customer equation and higher customer satisfaction!
Mark Konya, Ameren Missouri
Kathy Ball, SAS
In this new era of healthcare reform, health insurance companies have
heightened their efforts to pinpoint who their customers are, what their
characteristics are, what they look like today, and how this impacts
business in today's and tomorrow's healthcare environment. The passing of
the Healthcare Reform policies led insurance companies to focus and
prioritize their projects on understanding who the members in their
current population were. The goal was to provide an integrated single view
of the customer that could be used for retention, increased market share,
balancing population risk, improving customer relations, and providing
programs to meet the members' needs. By understanding the customer, a
marketing strategy could be built for each customer segment
classification, as predefined by specific attributes. This paper describes
how SAS® was used to perform the analytics that were used to
characterize their insured population. The high-level discussion of the
project includes regression modeling, customer segmentation, variable
selection, and propensity scoring using claims, enrollment, and
third-party psychographic data.
MaryAnne DePesquo, BlueCross BlueShield of Arizona
Merging or joining data sets is an integral part of the data consolidation
process. Within SAS®, there are numerous methods and
techniques that can be used to combine two or more data sets. We commonly
think that within the DATA step the MERGE statement is the only way to
join these data sets, while in fact, the MERGE is only one of numerous
techniques available to us to perform this process. Each of these
techniques has advantages, and some have disadvantages. The informed
programmer needs to have a grasp of each of these techniques if the
correct technique is to be applied. This paper covers basic merging
concepts and options within the DATA step, as well as a number of
techniques that go beyond the traditional MERGE statement. These include
fuzzy merges, double SET statements, and the use of key indexing. The
discussion will include the relative efficiencies of these techniques,
especially when working with large data sets.
Art Carpenter, California Occidental Consultants
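One DATA step alternative to the MERGE statement in this family of
techniques is a hash-object lookup; a minimal sketch (the data set and
variable names are illustrative):

  data merged;
    if 0 then set lookup;                /* add VALUE to the PDV     */
    if _n_ = 1 then do;
      declare hash h(dataset:'lookup');  /* load lookup table once   */
      h.defineKey('id');
      h.defineData('value');
      h.defineDone();
    end;
    set main;
    if h.find() ne 0 then call missing(value);  /* no match found    */
  run;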
The Washington D.C. aqueduct was completed in 1863, carrying desperately
needed clean water to its many residents. Just as the aqueduct was vital
and important to its residents, a lifeline if you will, so too is the
supply of data to the business. Without the flow of vital information,
many businesses would not be able to make important decisions. The task of
building my company's first dashboard was brought before us by our CIO;
the business had not asked for it. In this poster, I discuss how we were
able to bring fresh ideas and data to our business units by converting the
data they saw on a daily basis in reports to dashboards. The road to
success was long with plenty of struggles from creating our own business
requirements to building data marts, synching SQL to SAS®,
using information maps and SAS® Enterprise Guide®
projects to move data around, all while dealing with technology and other
I.T. team roadblocks. Then on to designing what would become our real-time
dashboards, fighting for SharePoint single sign-on, and, oh yeah, user
adoption. My story of how dashboards revitalized the business is a
refreshing tale for all levels.
Jennifer McBride, Virginia Credit Union
Cross-visit checks are a vital part of data cleaning for longitudinal
studies. The nature of longitudinal studies encourages repeatedly
collecting the same information. Sometimes, these variables are expected
to remain static, go away, increase, or decrease over time. This
presentation reviews the naive and the better approaches to handling
one-variable and two-variable consistency checks. For a single-variable
check, the better approach features the new ALLCOMB function, introduced
in SAS® 9.2. For a two-variable check, the better approach
uses the FIRST. automatic variable to flag inconsistencies. This presentation
will provide you the tools to enhance your longitudinal data cleaning
process.
Lauren Parlett, Johns Hopkins University
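A consistency check of the kind described can be sketched with BY-group
processing (the variable names are hypothetical):

  proc sort data=visits;
    by id visit_date;
  run;

  data flagged;
    set visits;
    by id;
    retain first_dob;
    if first.id then first_dob = dob;        /* baseline value        */
    else if dob ne first_dob then flag = 1;  /* changed across visits */
  run;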
With increased concern about privacy and simultaneous pressure to make
survey data available, statistical disclosure control (SDC) treatments are
performed on survey microdata to reduce disclosure risk prior to
dissemination to the public. This situation is all the more problematic in
the push to provide data online for immediate user query. Two SDC
approaches are data coarsening, which reduces the information collected,
and data swapping, which is used to adjust data values. Data coarsening
includes recodes, top-codes and variable suppression. Challenges related
to creating a SAS® macro for data coarsening include
providing flexibility for conducting different coarsening approaches, and
keeping track of the changes to the data so that variable and value labels
can be assigned correctly. Data swapping includes selecting target records
for swapping, finding swapping partners, and swapping data values for the
target variables. With the goal of minimizing the impact on resulting
estimates, challenges for data swapping are to find swapping partners that
are close matches in terms of both unordered categorical and ordered
categorical variables. Such swapping partners ensure that enough change is
made to the target variables, that data consistency between variables is
retained, and that the pool of potential swapping partners is controlled.
An example is presented using each algorithm.
Tom Krenzke, Westat
Katie Hubbell, Westat
Mamadou Diallo, Westat
Amita Gopinath, Westat
Sixia Chen, Westat
Having data that are consistent, reliable, and well linked is one of the
biggest challenges faced by financial institutions. The paper describes
how the SAS® Data Management offering helps to connect
people, processes, and technology to deliver consistent results for data
sourcing and analytics teams, and minimizes the cost and time involved in
the development life cycle. The paper concludes with best practices
learned from various enterprise data initiatives.
Anand Jagarapu, Arunam Technologies LLC
A revolution is taking place in the U.S. at both the national and state
level in the area of health care transparency. Large amounts of data on
the health of communities, the quality of health care providers, and the
cost of health care are being collected and made available by both
levels of government to a variety of stakeholders. The surfacing of this
data and the consumption of it by health care decision makers unfolds a
new opportunity to view, explore, and analyze health care data in novel
ways. Furthermore, this data provides the health care system an
opportunity to advance the achievement of the Triple Aim. Data
transparency will bring a sea change to the world of health care by
necessitating new ways of communicating information to end users such as
payers, providers, researchers, and consumers of health care. This paper
examines the information needs of public health care payers such as
Medicare and Medicaid, and discusses the convergence of health care and
data visualization in creating consumable health insights that will aid in
achieving cost containment, quality improvement, and increased
accessibility for populations served. Moreover, using claims data and
SAS® Visual Analytics, it examines how data visualization
can help identify the most critical insights necessary to managing
population health. If health care payers can analyze large amounts of
claims data effectively, they can improve service and care delivery to
their recipients.
Krisa Tailor, SAS
Euramax is a global manufacturer of precoated metals that relies on
analytics and data visualization for its decision making. Euramax has
deployed significant innovations in recent years. SAS®
Visual Analytics fits in the innovative culture of Euramax and its need
for information-based decision making. During this presentation, Peter
Wijers shares best practices of the implementation process and several
application areas.
Peter Wijers, Euramax Coated Products BV
Paper 2044-2014:
Dataset Matching and Clustering with PROC OPTNET
We used OPTNET to link hedge fund datasets from four vendors, covering
overlapping populations, but with no universal identifier. This quick tip
shows how to treat data records as nodes, use pairwise identifiers to
generate distance measures, and get PROC OPTNET to assign clusters of
records from all sources to each hedge fund. This proved to be far faster,
and easier, than doing the same task in PROC SQL.
Mark Keintz, Wharton Research Data Services
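A hedged sketch of the approach (the links data set and its FROM/TO
columns are hypothetical; consult the PROC OPTNET documentation for your
release):

  proc optnet data_links=pairs out_nodes=clusters;
    data_links_var from=rec_a to=rec_b;
    concomp;                 /* assign connected-component IDs */
  run;

Each row of PAIRS says that two records refer to the same fund; CONCOMP
then groups all transitively linked records into one cluster.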
Debugging SAS® code contained in a macro can be frustrating
because the SAS error messages refer only to the line in the SAS log where
the macro was invoked. This can make it difficult to pinpoint the problem
when the macro contains a large amount of SAS code. Using a macro that
contains one small DATA step, this paper shows how to use the MPRINT and
MFILE options along with the fileref MPRINT to write just the SAS code
generated by a macro to a file. The 'de-macroified' SAS code can be easily
executed and debugged.
Bruce Gilsen, Federal Reserve Board
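The technique in a nutshell (the macro name is a placeholder):

  filename mprint 'demacro.sas';   /* the fileref must be MPRINT */
  options mprint mfile;

  %mymacro(arg=1)

  options nomfile;
  /* demacro.sas now holds only the SAS code the macro generated,
     ready to run and debug on its own */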
Your company s chronically overloaded SAS® environment,
adversely impacted user community, and the resultant lackluster
productivity have finally convinced your upper management that it is time
to upgrade to a SAS® grid to eliminate all the resource
problems once and for all. But after the contract is signed and
implementation begins, you as the SAS administrator suddenly realize that
your company-wide standard mode of SAS operations, that is, using the
traditional SAS® Display Manager on a server machine, runs
counter to the expectation of the SAS grid: your users are now supposed to
switch to SAS® Enterprise Guide® on a PC. This
is utterly unacceptable to the user community because almost everything
has to change in a big way. If you like to play a hero in your little
world, this is your opportunity. There are a number of things you can do
to make the transition to the SAS grid as smooth and painless as possible,
and your users get to keep their favorite SAS Display Manager.
Houliang Li, HL SASBIPros Inc
The evolution of the mobile landscape has created a shift in the workforce
that now favors mobile devices over traditional desktops. Considering that
today's workforce is not always in the office or at their desks, new
opportunities have been created to deliver report content through
innovative mobile experiences. SAS® Mobile BI for both iOS
and Android tablets complements the SAS® Visual Analytics
offering by providing anytime, anywhere access to reports containing
information that consumers need. This paper presents best practices and
tips on how to optimize reports for mobile users, taking into
consideration the constraints of limited screen real estate and
connectivity, as well as answers a few frequently asked questions.
Discover how SAS Mobile BI captures the power of mobile reporting to
prepare for the vast growth that is predicted in the future.
Peter Ina, SAS
Khaliah Cothran, SAS
New innovative, analytical techniques are necessary to extract patterns in
big data that have temporal and geo-spatial attributes. An approach to
this problem is required when geo-spatial time series data sets, which
have billions of rows and the precision of exact latitude and longitude
data, make it extremely difficult to locate patterns of interest. The usual
temporal bins of years, months, days, hours, and minutes often do not
allow the analyst to have control of the precision necessary to find
patterns of interest. Geohashing is a string representation of
two-dimensional geometric coordinates. Time hashing is a similar
representation, which maps time to preserve all temporal aspects of the
date and time of the data into a one-dimensional set of data points.
Geohashing and time hashing are both forms of a Z-order curve, which maps
multidimensional data into single dimensions and preserves the locality of
the data points. This paper explores the use of a multidimensional Z-order
curve, combining both geohashing and time hashing, that is known as
geo-temporal hashing or space-time boxes using SAS®. This
technique provides a foundation for reducing the data into bins that can
yield new methods for pattern discovery and detection in big data.
Richard La Valley, Leidos
Abraham Usher, Human Geo Group
Don Henderson, Henderson Consulting Services
Paul Dorfman, Dorfman Consulting
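A greatly simplified illustration of the binning idea (not a true
geohash; the data set, variable names, and precision choices are all
illustrative):

  data stboxes;
    set events;                      /* hypothetical LAT/LON/DT data */
    length box $ 40;
    /* space-time box: 0.01-degree cells and 15-minute time bins */
    box = catx('|',
               put(round(lat, 0.01), 8.2),
               put(round(lon, 0.01), 8.2),
               put(intnx('minute15', dt, 0), datetime19.));
  run;

Records sharing a BOX value fall in the same space-time bin, so pattern
search reduces to grouping on a single key.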
Very often, there is a need to present the analysis output from SAS®
through web applications. On these occasions, it would make a lot of
difference to have highly interactive charts over static image charts and
graphs. Not only is this visually appealing, but with features like
zooming and filtering, it enables consumers to have a better understanding of
the output. There are a lot of charting libraries available in the market
which enable us to develop cool charts without much effort. Some of the
packages are Highcharts, Highstock, KendoUI, and so on. They are developed
in JavaScript and use the latest HTML5 components, and they also support a
variety of chart types such as line, spline, area, area spline, column,
bar, pie, scatter, angular gauges, area range, area spline range, column
range, bubble, box plot, error bars, funnel, waterfall, polar chart types
etc. This paper demonstrates how we can combine the data processing and
analytic powers of SAS with the visualization abilities of these charting
libraries. Since most of them consume JSON-formatted data, the emphasis is
on JSON producing capabilities of SAS, both with PROC JSON and other
custom programming methods. The example shows how easy it is to develop a
stored process that produces JSON data to be consumed by the charting
library with minimal change to the sample program.
Rajesh Inbasekaran, Kavi Associates
Naren Mudivarthy, Kavi Associates
Neetha Sindhu, Kavi Associates
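With SAS® 9.4, the simplest route is PROC JSON (a generic sketch):

  proc json out='chartdata.json' pretty;
    export sashelp.class(keep=name height weight);
  run;

The resulting file can then be served to a JavaScript charting library,
for example via a stored process or an AJAX call.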
This paper outlines the techniques that I have used with my clients over
the last five years to build powerful applications that run from a web
browser. The user interface is presented using HTML and JavaScript, which
is generated by SAS® Stored Processes. A JavaScript
framework called Ext JS is used to build components such as tables and
graphs, which have a lot of functionality built in. A range of SAS®
macros are used for building HTML and JavaScript, so the generation of the
user interface is simplified. This technique has been used to create a
medical monitoring system, the UK Census MIS, and a bank's risk management
application. I also discuss some techniques involved with integrating a
system like this with SAS® Portal, cubes, and web reports.
Philip Mason, Wood Street Consultants
Particle swarm optimization is a heuristic global optimization method
introduced by James Kennedy and Russell C. Eberhart in 1995. This paper
develops code for particle swarm optimization in SAS® 9.2.
Anurag Srivastava, Decision Quotient
Sangita Kumbharvadiya, Decision Quotient
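For reference, the standard particle swarm update rules (not necessarily
the exact variant coded in the paper) are:

   v_i <- w*v_i + c1*r1*(p_i - x_i) + c2*r2*(g - x_i)
   x_i <- x_i + v_i

where x_i and v_i are particle i's position and velocity, p_i its personal
best, g the global best, w the inertia weight, c1 and c2 acceleration
constants, and r1 and r2 uniform random numbers on [0,1].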
Evaluation of the efficacy of an intervention is often complicated because
the intervention is not randomly assigned. Usually, interventions in
marketing, such as coupons or retention campaigns, are directed at
customers because their spending is below some threshold or because the
customers themselves make a purchase decision. The presence of nonrandom
assignment of the stimulus can lead to over- or underestimating the value
of the intervention. This can cause future campaigns to be directed at the
wrong customers or cause the impacts of these effects to be over- or
understated. This paper gives a brief overview of selection bias,
demonstrates how selection in the data can be modeled, and shows how to
apply some of the important consistent methods of estimating selection
models, including Heckman's two-step procedure, in an empirical example.
Sample code is provided in an appendix.
Gunce Walton, SAS
Kenneth Sanford, SAS
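In SAS/ETS, one way to fit a Heckman-type selection model is PROC QLIM; a
sketch with hypothetical variable names (works is the selection
indicator):

   proc qlim data=customers;
      model works = age income / discrete;          /* selection equation   */
      model spend = price promo / select(works=1);  /* outcome, if selected */
   run;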
New York City boasts a wide variety of cuisine owing to the rich tourism
and the vibrant immigrant population. The quality of food and hygiene
maintained at the restaurants serving different cuisines has a direct
impact on the people dining in them. The objective of this paper is to
build a model that predicts the grade of the restaurants in New York City.
It also provides deeper statistical insights into the distribution of
restaurants, cuisine categories, grades, criticality of violations, etc.,
and concludes with the sequence analysis performed on the complete set of
violations recorded for the restaurants at different time periods over the
years 2012 and 2013. The data for 2013 is used to test the model. The data
set consists of 15 variables that capture restaurant location-specific
and violation details. The target is an ordinal variable with three
levels, A, B, and C, in descending order of quality.
Various SAS® Enterprise Miner™ models,
logistic regression, decision trees, neural networks, and ensemble models
are built and compared using validation misclassification rate. The
stepwise regression model appears to be the best model, with prediction
accuracy of 75.33%. The regression model is trained at step 3. The number
of critical violations at 8.5 gives the root node for the split of the
target levels, and the rest of the tree splits are guided by the predictor
variables such as number of critical and non-critical violations, number
of critical violations for the year 2011, cuisine group, and the borough.
Pruthvi Bhupathiraju Venkata, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
APP is an unofficial collective abbreviation for the SAS®
functions ADDR, PEEK, PEEKC, the CALL POKE routine, and their so-called
LONG 64-bit counterparts: the SAS tools designed to directly read from and
write to physical memory in the DATA step. APP functions have long been a
SAS dark horse. First, the examples of APP usage in SAS documentation
amount to a few technical report tidbits intended for mainframe system
programming, with nary a hint how the functions can be used for data
management programming. Second, the documentation note on the CALL POKE
routine is so intimidating in tone that many potentially receptive folks
might decide to avoid the allegedly precarious route altogether. However,
little can stand in the way of an inquisitive SAS programmer daring to
take a close look, and it turns out that APP functions are very simple and
useful tools! They can be used to explore how things really work, to make
code more concise, to implement en masse data movement, and they can often
dramatically improve execution efficiency. The author and many other SAS
experts (notably Peter Crawford, Koen Vyverman, Richard DeVenezia, Toby
Dunn, and the fellow masked by his 'Puddin' Man' sobriquet) have been
poking around the SAS APP realm on SAS-L and in their own practices since
1998, occasionally letting the SAS community at large peek at their
findings. This opus is an attempt to circumscribe the results in a
systematic manner. Welcome to the APP world! You are in for a few glorious
surprises.
Paul Dorfman, Dorfman Consulting
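A tiny taste of the APP tools, using the LONG variants that work on both
32-bit and 64-bit systems (a sketch, not the author's code):

   data _null_;
      length addr $8 y $4;
      x = 'ABCD';
      addr = addrlong(x);        /* address of x's value in memory */
      y = peekclong(addr, 4);    /* read those 4 bytes back        */
      put y=;                    /* y=ABCD                         */
   run;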
The health-care industry in the United States is going through a paradigm
shift moving away from its focus on treating diseases and toward promoting
health, wellness, and preventive public health programs, so that both the
individuals and the government can maintain a healthy bottom line. The
high-level business problem is to reduce the expected medical costs and
number of medical services required by the people of New Hampshire by
implementing successful disease prevention programs. The objective is to
identify which among the six prevention programs will successfully improve
the health of the residents of New Hampshire over nine future years (2012
to 2020). The business scenario of the case is to identify the preventive
programs that are most effective in reducing the costs in New Hampshire
and to invest the money in those programs so that the overall health-care
overhead costs can be reduced or controlled. The effectiveness of
implementing the preventive programs was evaluated using SAS®
Enterprise Guide® 5.1 and SAS® Enterprise
Miner™ 12. Time series analysis, in particular,
forecasting, is used to project the future health-care services and costs
for the years from 2012 to 2020. Our analysis showed that all the
preventive programs should be implemented concurrently. The minimum
anticipated savings in cost is approximately $572,111 or 3.3% of the
expected baseline cost of $17,297,931. Therefore, our recommendation is to
use this cost reduction figure, $572,111, as the initial funding
investment toward initiating the six prevention programs concurrently, so
that tangible results can be noticed by 2020.
Rakesh Karn, Oklahoma State University
Rom Khattri, Oklahoma State University
Pradeep Podila, Oklahoma State University
Linda Schumacher, Oklahoma State University
Paper 2443-2014:
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big
Data Analytics
There certainly is no shortage of hype when it comes to the term 'Big
Data' as vendors and enterprises alike highlight the transformative effect
of building actionable insight from the deluge of data that is now
available to us all. But among the hype, practical guidance is often
lacking: why is Apache Hadoop most often the technology underpinning 'Big
Data'? How does it fit into the current landscape of databases and data
warehouses that are already in use? Are there typical usage patterns that
can be used to distill some of the inherent complexity for us all to speak
a common language? And if there are common patterns, what are some ways
that I can apply them to my unique situation? This session has the
following agenda: learn what types of data are being captured to build
'Big Data' applications; discover where Hadoop most often fits into the
data landscape for the typical enterprise; hear how common patterns of use
can simplify your approach and help you to find a usage that makes sense
for your business; and see how other organizations have used the usage
patterns to get started on their Big Data journey.
Shaun Connolly, Hortonworks
'NOTE: No unequal values were found. All values compared are exactly
equal.' Do your eyes automatically drop to the end of your PROC COMPARE
output in search of these words? Do you then conclude that your data sets
match? Be careful here! Major discrepancies might still lurk in the
shadows, and you'll never know about them if you make this common mistake.
This paper describes several of PROC COMPARE's blind spots and how to
steer clear of them. Watch in horror as PROC COMPARE glosses over
important differences while boldly proclaiming that all is well. See the
gruesome truth about what PROC COMPARE does, and what it doesn't do! Learn
simple techniques that allow you to peer into these blind spots and avoid
getting blindsided by PROC COMPARE!
Josh Horstman, Nested Loop Consulting
Roger Muller, Data-To-Events.com
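One classic blind spot of this kind, sketched below: a variable that
exists in only one data set is not compared at all, yet the 'all values
equal' note still appears. The LISTVAR option reveals the unmatched
variable:

   data one; x = 1; run;
   data two; x = 1; y = 99; run;   /* y exists only in TWO */

   proc compare base=one compare=two listvar;
   run;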
Response rates, churn models, customer lifetime value: today's marketing
departments are more analytically driven than ever. Marketers have had
their heads down developing analytic capabilities for some time. The
results have been game-changing. But it's time for marketers to look up
and discover which analytic results from other departments can enhance the
analytics of marketing. What if you knew the demand forecast for your
products? What could you do? What if you understood the price sensitivity
for your products? How would this impact the actions that your marketing
team takes? Using the hospitality industry as an example, we explore how
marketing teams can use the analytic outputs from other departments to get
better results overall.
Natalie Osborn, SAS
Eric Peterson, Pinnacle Entertainment
Vistaprint saw the opportunity in the printing market to get more out of
high-volume printing by grouping similar orders in large groups. They
heavily rely on technology to handle design, printing, and order handling
and use the Internet as a medium. With their successful expansion across
the world, the issue they were facing was a lot of one-time buyers and a
lot of registered users who didn't finish the check-out. The need to
implement a retention strategy was the next logical step, for which they
chose SAS® Campaign Management. In this session, Vistaprint
explains how they use campaign management for retention and how the
project was addressed. They will also touch on how the concept of high
performance could open up new possibilities for them.
Sven Putseys, Vistaprint
Zelia Pellissier, Vistaprint
'Ebony and Ivory' was a number-one song by Paul McCartney and Stevie
Wonder about making music together, proper integration, unity, and harmony on a
deeper level. With SAS® Visual Analytics, current Enterprise
Business Intelligence (BI) customers can rest assured that their years of
existing BI work and content can coexist until they can fully transition
over to SAS Visual Analytics. This presentation covers 10
inter-operability integration points between SAS® BI and SAS
Visual Analytics.
Ted Stolarczyk, SAS
Both recent banking and insurance risk regulations require effective
aggregation of risks. To determine the total enterprise risk for a
financial institution, all risks must be aggregated and analyzed.
Typically, there are two approaches: bottom-up and top-down risk
aggregation. In either approach, financial institutions face challenges
due to various levels of risks with differences in metrics, data source,
and availability. First, it is especially complex to aggregate risk. A
common view of the dependence between all individual risks can be hard to
achieve. Second, the underlying data sources can be updated at different
times and can have different horizons. This in turn requires an
incremental update of the overall risk view. Third, the risk needs to be
analyzed across on-demand hierarchies. This paper presents SAS®
solutions to these challenges. To address the first challenge, we consider
a mixed approach to specify copula dependence between individual risks and
allow step-by-step specification with a minimal amount of information.
Next, the solution leverages an event-driven architecture to update
results on a continuous basis. Finally, the platform provides a
self-service reporting and visualization environment for designing and
deploying reports across any hierarchy and granularity on the fly. These
capabilities enable institutions to create an accurate, timely,
comprehensive, and adaptive risk-aggregation and reporting system.
Wei Chen, SAS
Jimmy Skoglund, SAS
Srinivasan Iyer, SAS
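As a rough illustration of the copula piece only (not the full solution
architecture), SAS/ETS PROC COPULA can fit a dependence structure to
individual risk series and simulate joint draws; a sketch with
hypothetical variable names, to be treated as illustrative:

   proc copula data=risk_history;
      var credit_loss market_loss op_loss;
      fit t;                                   /* t copula for tail dependence */
      simulate / ndraws=10000 seed=42 out=joint_draws;
   run;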
With the introduction of new features in SAS® 9.4 Grid
Manager, administrators of SAS solutions have even better capabilities for
effectively managing the use of SAS® Enterprise Guide®
in a grid environment. In this paper, we explain and demonstrate proven
practices for configuring the SAS 9.4 Grid Manager environment, leveraging
grid options sets and grid-spawned SAS® Workspace Servers.
We walk through the options provided by SAS Enterprise Guide that make the
most effective use of the grid environment.
Edoardo Riva, SAS
The implicit loop refers to the DATA step repetitively reading data and
creating observations, one at a time. The explicit loop, which uses the
iterative DO, DO WHILE, or DO UNTIL statements, is used to repetitively
execute certain SAS® statements within each iteration of the
DATA step execution. Explicit loops are often used to simulate data and to
perform a certain computation repetitively. However, when an explicit loop
is used along with array processing, the applications are extended widely,
which includes transposing data, performing computations across variables,
and so on. To be able to write a successful program that uses loops and
arrays, one needs to know the contents in the program data vector (PDV)
during the DATA step execution, which is the fundamental concept of DATA
step programming. This workshop covers the basic concepts of the PDV,
which is often ignored by novice programmers, and then illustrates how to
use loops and arrays to transform lengthy code into more efficient
programs.
Arthur Li, City of Hope
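A small sketch of the loop-plus-array transpose pattern described above,
assuming a long data set with one row per (id, visit) pair and at most
three visits (names are hypothetical):

   proc sort data=long; by id; run;

   data wide;
      set long;
      by id;
      array v{3} value1-value3;
      retain value1-value3;                 /* keep values across iterations */
      if first.id then call missing(of v{*});
      v{visit} = value;                     /* place each row's value        */
      if last.id then output;               /* one wide row per id           */
      keep id value1-value3;
   run;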
Literature suggests two main approaches, parametric and non-parametric,
for constructing efficiency frontiers on which efficiency scores of other
units can be based. Parametric functions can be either deterministic or
stochastic in nature. However, when multiple inputs and outputs are
encountered, Data Envelopment Analysis (DEA), a non-parametric approach,
is a powerful tool used for decades in measurement of
productivity/efficiency with a wide range of applications. Both approaches
have advantages and limitations. This paper attempts to further explore
and validate a hybrid approach, taking the best of both the DEA and the
parametric approach, in order to estimate efficiency of Decision Making
Units (DMUs) in an even better way.
John Dilip Raj, GE
Data quality depends on review by operational stewards of the content.
Volumes of complex data disappear as e-mail attachments. Is there a
critical data shift that might be missed? Embedding a summary image drives
expert data review from 15% to 87%. Downstream error rate is significantly
reduced. The result is increased accuracy in variable physician
compensation measures.
Amy Swartz, Kaiser Permanente
The availability of specialized programming and analysis resources in
academic medical centers is often limited, creating a significant
challenge for clinical research. The current work describes how Base
SAS® and SAS® Enterprise Guide®
are being used to empower research staff so that they are less reliant on
these scarce resources.
Chris Schacherer, Clinical Data Management Systems, LLC
This session provides an overview of how SAS® environments
can be best integrated with SAP HANA. You learn what is different about
SAP HANA and how SAS users can access and push-down their work, and thus
start to benefit from the in-memory power of SAP HANA. We further
highlight how the SAS® Predictive Modeling Workbench embeds
in the SAP HANA platform and the value this co-innovation delivers to you.
Christoph Morgen, SAP
Typically, it takes a system administrator to understand the graphic data
results that are generated in the Microsoft Windows Performance Monitor.
However, using SAS/GRAPH® software, you can customize
performance results in such a way that makes the data easier to read and
understand than the data that appears in the default performance monitor
graphs. This paper uses a SAS® data set that contains a
subset of the most common performance counters to show how SAS programmers
can create an improved, easily understood view of the key performance
counters by using SAS/GRAPH software. This improved view can help your
organization reduce resource bottlenecks on systems that range from large
servers to small workstations. The paper begins with a concise explanation
of how to collect data with Windows Performance Monitor. Next, examples
are used to illustrate the following topics in detail: converting and
formatting a subset of the performance-monitor data into a SAS data set;
using a SAS program to generate clearly labeled graphs that summarize
performance results; and analyzing results in different combinations that
illustrate common resource bottlenecks.
John Maxwell, SAS
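A hedged sketch of the overall flow described above (the file path and
counter variable names are hypothetical):

   /* Windows Performance Monitor logs can be relogged to CSV */
   proc import datafile='C:\PerfLogs\counters.csv'
               out=perf dbms=csv replace;
      getnames=yes;
   run;

   symbol1 interpol=join value=none;
   proc gplot data=perf;
      plot pct_processor_time * sample_time;   /* assumed variable names */
   run;
   quit;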
For data analysts, one of the most important steps after manipulating and
analyzing the data set is to create a report for it. Nowadays, many
statistics tables and reports are generated as HTML files that can be
easily accessed through the Internet. However, the SAS®
Output Delivery System (ODS) HTML output has many limitations on
interacting with users. In this paper, we introduce a method to enhance
the traditional ODS HTML output by using jQuery (a JavaScript library). A
macro was developed to implement this idea. Compared to the standard HTML
output, this macro can add sort, pagination, search, and even dynamic
drilldown function to the ODS HTML output file.
Yu Fu, Oklahoma State Department of Health
Chao Huang, Oklahoma State University
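The basic hook, sketched below, is the HEADTEXT= option of ODS HTML, which
injects markup such as a jQuery script tag into the page head; the macro
then decorates the output table (the script path is hypothetical, and the
selector and plugin wiring are omitted):

   ods html file='report.html'
       headtext='<script src="jquery.min.js"></script>';
   proc print data=sashelp.class; run;
   ods html close;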
Pipeline parallelism, an extension of MP Connect, is an effective way to
speed processing. Piping allows the typical programming sequence of DATA
step followed by PROC to execute in parallel. Piping uses TCP ports to
pass records directly from the DATA step to the PROC immediately as each
individual record is processed. The DATA step in effect becomes a data
transformation filter for the PROC, running in parallel and incurring no
additional disk storage or related I/O lag. Establishing a pipe with MP
Connect typically requires specifying a physical TCP port to be used by
the writing and by the reading processes. Coding in this style opens the
possibility for users to generate systems conflicts by inadvertently
requesting ports that are in use. SAS® Metadata Server
allows one to allocate ports dynamically; that is, users can use a
symbolic name for the port with the server dynamically determining an
unused port to temporarily assign to the SAS® job. While
this capability is attractive, implementing SAS Metadata Server on a
system which does not use any of the other SAS BI technology can be
inefficient from a cost perspective. To enable dynamic port allocation
without the added cost, we created a UNIX script which can be called from
within SAS to ascertain which ports are available at runtime. The script
returns a list of available ports which is captured in a SAS macro
variable and subsequently used in establishing pipeline parallelism.
Piyush Singh, TATA Consultancy Services Ltd.
Gerhardt Pohl, Eli Lilly and Company
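A sketch of the runtime port discovery described above (the script name is
hypothetical; the SASESOCK engine establishes the pipe once both sessions
use the same port):

   filename ports pipe './find_free_ports.sh';   /* hypothetical UNIX script */

   data _null_;
      infile ports;
      input port : $8.;
      call symputx('pipeport', port);
   run;

   /* writer and reader sessions then share the port via SASESOCK */
   libname outpipe sasesock ":&pipeport";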
This paper is based on the belief that debugging your programs is not only
necessary, but also a good way to gain insight into how SAS®
works. Once you understand why you got an error, a warning, or a note,
you'll be better able to avoid problems in the future. In other words,
people who are good debuggers are good programmers. This paper covers
common problems including missing semicolons and character-to-numeric
conversions, and the tricky problem of a DATA step that runs without
suspicious messages but, nonetheless, produces the wrong results. For each
problem, the message is deciphered, possible causes are listed, and how to
fix the problem is explained.
Lora Delwiche, University of California
Susan Slaughter, Avocet Solutions
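For example, one of the notes discussed, the automatic character-to-numeric
conversion, is silenced by making the conversion explicit:

   data fixed;
      age_char = '42';
      age = input(age_char, 8.);   /* explicit conversion; no conversion NOTE */
   run;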
In evaluation instruments and tests, individual items are often collected
using an ordinal measurement or Likert-type scale. Typically, measures such
as Cronbach's alpha are estimated using the standard Pearson correlation.
Gadderman and Zumbo (2012) illustrate how using the standard Pearson
correlations may yield biased estimates of reliability when the data are
ordinal and present methodology for using the polychoric correlation in
reliability estimates as an alternative. This session shows how to
implement the methods of Gadderman and Zumbo using SAS®
software. An example will be presented that incorporates these methods in
the estimation of the reliability of an active learning post-occupancy
evaluation instrument developed by Steelcase Education Solutions
researchers.
Laura Kapitula, Grand Valley State University
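For reference, one long-standing way to obtain a polychoric correlation
for a pair of ordinal items in SAS is the PLCORR option of PROC FREQ
(item names are hypothetical; this is not necessarily the session's exact
approach):

   proc freq data=items;
      tables item1*item2 / plcorr;   /* polychoric correlation in the output */
   run;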
The worst part of going to school is having to show up. However, data
shows that those who do show up are the ones that are going to be the most
successful (Johnson, 2000). As shown in a study done in Minneapolis,
students who were in class at least 95% of the time were twice as likely
to pass state tests (Johnson, 2000). Studies have been conducted and show
that school districts that show interest in attendance have higher
achievement in students (Reeves, 2008). The goal in doing research on
student attendance is to find out the patterns of when people are missing
class and why they are absent. The data comes directly from the Phillip O
Berry High School Attendance Office, with around 1600 students; there is
plenty of data to be used from the 2012-2013 school year. Using Base
SAS® 9.3, after importing the data from Microsoft Excel,
a series of PROC FORMAT and PROC GCHART steps were used to output and analyze
the data. The data showed the days of the week and period that students
missed the most, depending on grade level. The data shows that freshmen
and seniors were the most likely to be absent on a given day. Based on the
data, attendance continues to be an issue; therefore, school districts need
to take an active role in developing attendance policies.
Jacob Foard, Phillip O. Berry Academy of Technology
Thomas Nix, Phillip O. Berry Academy of Technology
Rachel Simmons, Phillip O. Berry Academy of Technology
Paper SAS029-2014:
Event Stream Processing For Big Data and Real-time Analytics
Gartner claims the 'Internet of Things' trend will add 50 billion
connected devices by 2015 on top of the 2 billion connected people who
currently populate the Internet as we know it. Understanding what is
happening in these environments is a huge challenge because the flow and
volume of data is ever increasing. And while the types of data processing
itself do not change, where you do this processing, how event streams are
captured, and how important events are defined does. Event stream
processing is an important technology for capturing, analyzing, and
processing fast flowing data in motion. This session will give you an
overview of where SAS is going in support of event streaming.
Steve Sparano, SAS
Given a time series data set, you can use automatic time series modeling
software to select an appropriate time series model. You can use various
statistics to judge how well each candidate model fits the data
(in-sample). Likewise, you can use various statistics to select an
appropriate model from a list of candidate models (in-sample or
out-of-sample or both). Finally, you can use rolling simulations to
evaluate ex-ante forecast performance over several forecast origins. This
paper demonstrates how you can use SAS® Forecast Server
Procedures and SAS® Forecast Studio software to perform the
statistical analyses that are related to rolling simulations.
Michael Leonard, SAS
Ashwini Dixit, SAS
Udo Sglavo, SAS
Logistic regression is a powerful technique for predicting the outcome of
a categorical response variable and is used in a wide range of
disciplines. Until recently, however, this methodology was available only
for data that were collected using a simple random sample. Thanks to the
work of statisticians such as Binder (1983), logistic modeling has been
extended to data that are collected from a complex survey design that
includes strata, clusters, and weights. Through examples, this paper
provides guidance on how to use PROC SURVEYLOGISTIC to apply logistic
regression modeling techniques to data that are collected from a complex
survey design. The examples relate to calculating odds ratios for models
with interactions, scoring data sets, and producing ROC curves. As an
extension of these techniques, a final example shows how to fit a
Generalized Estimating Equations (GEE) logit model.
Rob Agnelli, SAS
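A minimal sketch of the design elements involved (all variable names are
hypothetical):

   proc surveylogistic data=survey;
      strata region;                       /* design strata          */
      cluster school;                      /* primary sampling units */
      weight samplewt;                     /* survey weights         */
      class gender (param=ref ref='M');
      model passed(event='1') = gender age gender*age;
   run;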
SAS® is an outstanding suite of software, but not everyone
in the workplace speaks SAS. However, almost everyone speaks Excel. Often,
the data you are analyzing, the data you are creating, and the report you
are producing is a form of a Microsoft Excel spreadsheet. Every year at
SAS® Global Forum, there are SAS and Excel presentations,
not just because Excel is so pervasive in the workplace, but because
there's always something new to learn (or re-learn)! This paper summarizes and
references (and pays homage to!) previous SAS Global Forum presentations,
as well as examines some of the latest Excel capabilities with the latest
versions of SAS® 9.4 and SAS® Visual
Analytics.
Andrew Howell, ANJ Solutions
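Among the newer capabilities, for example, SAS 9.4 introduced the ODS
EXCEL destination, which writes native .xlsx files directly; a minimal
sketch:

   ods excel file='class.xlsx' options(sheet_name='Students');
   proc print data=sashelp.class noobs; run;
   ods excel close;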
Business Intelligence (BI) dashboards serve as an invaluable, high-level,
visual reference tool for decision-making processes in many business
industries. A request was made to our department to develop some BI
dashboards that could be incorporated in an academic setting. These
dashboards would aim to serve various undergraduate executive and
administrative staff at the university. While most business data may lend
itself to work very well and easily in the development of dashboards,
academic data is typically modeled differently and, therefore, faces
unique challenges. In this paper, the authors detail and share the design
and development process of creating dashboards for decision making in an
academic environment utilizing SAS® BI Dashboard 4.3 and
other SAS® Enterprise Business Intelligence 9.2 tools. The
authors also provide lessons learned as well as recommendations for future
implementations of BI dashboards utilizing academic data.
Evangeline Collado, University of Central Florida
Michelle Parente, University of Central Florida
Explore the various DATA step merge and PROC SQL join processes. This
presentation examines the similarities and differences between merges and
joins, and provides examples of effective coding techniques. Attendees
examine the objectives and principles behind merges and joins, one-to-one
merges (joins), and match-merge (equi-join), as well as the coding
constructs associated with inner and outer merges (joins) and PROC SQL set
operators.
Kirk Paul Lafler, Software Intelligence Corporation
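For example, here is the inner-join case side by side, as a match-merge
and as a PROC SQL join (hypothetical data sets A and B sharing key ID,
with B contributing variable Y):

   proc sort data=a; by id; run;
   proc sort data=b; by id; run;

   data inner_merge;
      merge a(in=ina) b(in=inb);
      by id;
      if ina and inb;           /* keep only IDs present in both */
   run;

   proc sql;
      create table inner_join as
      select a.*, b.y
      from a inner join b
        on a.id = b.id;
   quit;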
Potential of One, Power of All. That has a really nice ring to it,
especially as it pertains to accessing all of your corporate data through
one single data access point. It means the potential of having a single
source for all of your data connections from throughout the enterprise. It
also means that the complexities of connecting to these data assets from
the various source systems throughout the enterprise are hidden from the
end user. With this, however, comes the possibility of placing personally
identifiable information in the hands of a user who should not have access
to it. The bottom line is that there is risk and uncertainty with allowing
users to have access to data that is disallowed by your existing data
governance strategy. Blocking these data elements from specific users or
groups of users is a challenge that many corporations face today, whether
it is secure financial information, confidential personnel records, or
personal medical information protected by strict regulations. How do you
surface All necessary data to All necessary users, while at the same time
maintaining the security of the data? SAS® Federation Server
Manager is an easy-to-use interface that allows the data administrator to
manage your data assets in such a way that it alleviates this risk by
controlling access to critical data elements and maintaining the proper
level of data disclosure control. This session focuses on how to employ
various data access control strategies from within SAS Federation Server
Manager.
Mark Craver, SAS
Mike Frost, SAS
SAS® can easily perform calculations and export the result
to Microsoft Excel in a report. However, sometimes you need Excel to have
a formula or a function in a cell and not just a number. Whether it's for
a boss who wants to see a SUM formula in the total cell or for
automatically updating reports that can be completed and sent to people
who don't use SAS, exporting formulas to Excel can be very powerful.
This paper illustrates how, by using PROC REPORT and PROC PRINT along with
the ExcelXP tagset, you can easily export formulas and functions into
Excel directly from SAS. The method outlined in this paper requires Base
SAS® 9.1 or higher and Excel 2002 or later and requires a
basic understanding of the ExcelXP tagset.
Joseph Skopic, Federal Government
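The trick, sketched here with a hypothetical report, is that the ExcelXP
tagset passes cell values beginning with '=' through to Excel as live
formulas (in R1C1-style references):

   data rep;
      length item $10 amount $24;
      item='A';     amount='10';                       output;
      item='B';     amount='20';                       output;
      item='Total'; amount='=SUM(R[-2]C:R[-1]C)';      output;  /* live formula */
   run;

   ods tagsets.excelxp file='formulas.xml';
   proc print data=rep noobs; run;
   ods tagsets.excelxp close;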
The growing adoption of electronic systems for keeping medical records
provides an opportunity for health care practitioners and biomedical
researchers to access traditionally unstructured data in a new and
exciting way. Pathology reports, progress notes, and many other sections
of the patient record that are typically written in a narrative format can
now be analyzed by employing natural language processing contextual
extraction techniques to identify specific concepts contained within the
text. Linking these concepts to a standardized nomenclature (for example,
SNOMED CT, ICD-9, ICD-10, and so on) frees analysts to explore and test
hypotheses using these observational data. Using SAS®
software, we have developed a solution in order to extract data from the
unstructured text found in medical pathology reports, link the extracted
terms to biomedical ontologies, join the output with more structured
patient data, and view the results in reports and graphical
visualizations. At its foundation, this solution employs SAS®
Enterprise Content Categorization to perform entity extraction using both
manually and automatically generated concept definition rules. Concept
definition rules are automatically created using technology developed by
SAS, and the unstructured reports are scored using the DS2/SAS®
Content Categorization API. Results are post-processed and added to tables
compatible with SAS® Visual Analytics, thus enabling users
to visualize and explore data as required. We illustrate the interrelated
components of this solution with examples of appropriate use cases and
describe manual validation of performance and reliability with metrics
such as precision and recall. We also provide examples of reports and
visualizations created with SAS Visual Analytics.
Greg Massey, SAS
Radhikha Myneni, SAS
Adrian Mattocks, SAS
Eric Brinsfield, SAS
When you want to know the details about a small subset of a much larger
data set, it can take a long time to select the records you need. This
paper shows you how to create a user-defined SAS® format to
pull only the observations that you want out of a big data source. Even
when selecting a million records out of data sets that can have more than
100 million records, this method is much quicker than either a PROC SQL
join or a SAS merge.
Sara Boltman, Butterfly Projects
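A sketch of the technique, assuming a data set KEYS holding the wanted
character IDs (all names are hypothetical):

   /* Build a format that maps wanted IDs to 'Y' and everything else to 'N' */
   data keyfmt;
      length fmtname $8 start $20 label $1 hlo $1;
      fmtname = '$want';
      do until (eof);
         set keys(rename=(id=start)) end=eof;
         label = 'Y'; hlo = ' '; output;
      end;
      start = ' '; label = 'N'; hlo = 'O'; output;   /* OTHER catch-all */
   run;

   proc format cntlin=keyfmt; run;

   data subset;
      set bigdata;                       /* hypothetical 100M-row source */
      where put(id, $want.) = 'Y';
   run;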
Each month, our project team delivers updated 5-Star ratings for 15,700+
nursing homes across the United States to Centers for Medicare and
Medicaid Services. There is a wealth of data (and processing) behind the
ratings, and this data is longitudinal in nature. A prior paper in this
series, 'Programming the Provider Previews: Extreme SAS®
Reporting,' discussed one aspect of the processing involved in maintaining
the Nursing Home Compare website. This paper will discuss two other
aspects of our processing: creating an annual data Compendium and
extending the 5-star processing to accommodate several different output
formats for different purposes. Products used include Base
SAS®, SAS/STAT®, ODS Graphics procedures, and
SAS/GRAPH®. New annotate facilities in both SAS/GRAPH and
the ODS Graphics procedures will be discussed. This paper and presentation
will be of most interest to SAS programmers with medium to advanced SAS
skills.
Louise Hadden, Abt Associates Inc.
Paper SAS405-2014:
Financial Crimes Compliance: Track, Monitor, and Audit
The continued expansion of governance associated with the Supervisory
Guidance on Model Risk Management (OCC 2011-12, SR 11-7), which is
published by the Office of the Comptroller of the Currency and the Board
of Governors of the Federal Reserve System, now includes all areas within
a financial institution, including scenarios associated with financial
crimes compliance. There is now an expectation to track, monitor, and
audit the overall scenario management through the entire cycle. This
includes authoring scenarios, managing changes associated with those
scenarios (what if scenarios, champion and challenger scenarios, etc.),
promoting those scenarios to production, and the ongoing measuring and
monitoring of those scenarios. Leveraging the power of SAS®
Visual Scenario Designer, we can execute all of these tasks that
facilitate interaction between the modeling group that manages the
scenarios and the data associated through case investigation. This paper
discusses how to use SAS® Visual Analytics, SAS Visual
Scenario Designer, and SAS® Financial Crimes Suite to
converge traditional business operations approaches and to develop, test,
and promote models to allow for greater control and tracking for the
compliance groups.
Jay Flowe, SAS
Traditionally, web applications interact with back-end databases by means
of JDBC/ODBC connections to retrieve and update data. With the growing
need for real-time charting and complex analysis types of data
representation on these web applications, SAS computing power can be put
to use by adding a SAS web service layer between the application and the
database. With the experience that we have with integrating these
applications to SAS® BI Web Services, this is our attempt to
point out five things to do when using SAS BI Web Services. 1) Input Data
Sources: always enable 'Allow rewinding stream' while creating the stored
process. 2) Use LIBNAME statements to define XML filerefs for the Input
and Output Streams (Data Sources). 3) Define input prompts and output
parameters as global macro variables in the stored process if the stored
process calls macros that use these parameters. 4) Make sure that all of
the output parameters values are set correctly as defined (data type)
before the end of the stored process. 5) The Input Streams (if any) should
have a consistent data type; essentially, every instance of the stream
should have the same structure. This paper consists of examples and
illustrations of errors and warnings associated with the previously
mentioned cases.
Neetha Sindhu, Kavi Associates
Vimal Raj, Kavi Associates
Data is often stored in highly normalized ('tall and skinny') structures
that are not convenient for analysis. The SAS® programmer
frequently needs to transform the data to arrange relevant variables
together in a single row. Sometimes this is a simple matter of using the
TRANSPOSE procedure to flip the values of a single variable into separate
variables. However, when there are multiple variables to be transposed to
a single row, it might require multiple transpositions to obtain the
desired result. This paper describes five different ways to achieve this
flip-flop, explains how each method works, and compares the usefulness of
each method in various situations. Emphasis is given to achieving a
data-driven solution that minimizes hard-coding based on prior knowledge
of the possible values each variable can have and that improves
maintainability and reusability of the code. The intended audience is
novice and intermediate SAS programmers who have a basic understanding of
the DATA step and the TRANSPOSE procedure.
Josh Horstman, Nested Loop Consulting
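One of the methods of this kind, the double transpose, sketched with
hypothetical variables height and weight measured at each visit:

   proc sort data=tall; by id visit; run;

   proc transpose data=tall out=step1;
      by id visit;
      var height weight;
   run;

   proc transpose data=step1 out=wide(drop=_name_) delimiter=_;
      by id;
      var col1;
      id _name_ visit;            /* yields height_1, weight_1, height_2, ... */
   run;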
No way. Not gonna happen. I am a real SAS® programmer.
(Spoken by a Real SAS Programmer.) SAS® Enterprise Guide®
sometimes gets a bad rap. It was originally promoted as a code generator
for non-programmers. The truth is, however, that SAS Enterprise Guide has
always allowed programmers to write their own code. In addition, it offers
many features that are not included in PC SAS®. This
presentation shows you the top ten features that people who like to write
code care about. It will be taught by a programmer who now prefers using
SAS Enterprise Guide.
Christopher Bost, MDRC
For almost two decades, Western Kentucky University's Office of
Institutional Research (WKU-IR) has used SAS® to help shape
the future of the institution by providing faculty and administrators with
information they can use to make a difference in the lives of their
students. This presentation provides specific examples of how WKU-IR has
shaped the policies and practices of our institution and discusses how
WKU-IR moved from a support unit to a key strategic partner. In addition,
the presentation covers the following topics: How the WKU Office of
Institutional Research developed over time; Why WKU abandoned reactive
reporting for a more accurate, convenient system using SAS®
Enterprise Intelligence Suite for Education; How WKU shifted from
investigating what happened to predicting outcomes using SAS®
Enterprise Miner™ and SAS® Text Miner; How
the office keeps the system relevant and utilized by key decision makers;
What the office has accomplished and key plans for the future.
Tuesdi Helbig, Western Kentucky University
Gina Huff, Western Kentucky University
In this interconnected world, it is becoming ever more important to
understand not just details about your data, but also how different parts
of your data are related to each other. From social networks to supply
chains to text analytics, network analysis is becoming a critical
requirement and network visualization is one of the best ways to
understand the results. The new SAS® Visual Analytics
network visualization shows links between related nodes as well as
additional attributes such as color, size, or labels. This paper explains
the basic concepts of networks as well as provides detailed background
information on how to use network visualizations within SAS Visual
Analytics.
Falko Schulz, SAS
Nascif Abousalh-Neto, SAS
Based on selection criteria, the SAS® Data Integration
Studio loop or splitter transformations can be used to generate multiple
output files. The ETL developer or SAS® administrator can
decide which transformation is better suited for the design, priorities,
and SAS configuration at their site. Factors to consider are the setup,
maintenance, and performance of the ETL job. The loop transformation
requires an understanding of macros and a control table. The splitter
transformation is more straightforward and self-documenting. If time
allows, creating and running a job with each transformation can provide
benchmarking to measure performance. For a comparison of these two
options, this paper shows an example of the same job using the loop or
splitter transformation. For added testing metrics, one can adapt the
LOGPARSE SAS macro to parse the job logs.
Laura Liotus, Community Care Behavioral Health
PROC TABULATE is the most widely used reporting tool in
SAS®, along with PROC REPORT. Any kind of report with the
desired statistics can be produced by PROC TABULATE. When we need to
report some summary statistics like mean, median, and range in the
heading, either we have to edit it outside SAS in word processing software
or enter it manually. In this paper, we discuss how we can automate this
to be dynamic by using PROC SQL and some simple macros.
Lovedeep Gondara, BC Cancer Agency
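A minimal sketch of the idea: compute the statistic with PROC SQL, capture
it in a macro variable, and resolve it in the PROC TABULATE heading:

   proc sql noprint;
      select put(mean(age), 4.1) into :avgage trimmed
      from sashelp.class;
   quit;

   proc tabulate data=sashelp.class;
      class sex;
      var age;
      table sex, age="Age (overall mean = &avgage)"*(n mean);
   run;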
This paper shares our experience integrating two leading data analytics
and Geographic Information Systems (GIS) software products SAS®
and ArcGIS to provide integrated reporting capabilities. SAS is a powerful
tool for data manipulation and statistical analysis. ArcGIS is a powerful
tool for analyzing data spatially and presenting complex cartographic
representations. Combining statistical data analytics and GIS provides
increased insight into data and allows for new and creative ways of
visualizing the results. Although products exist to facilitate the sharing
of data between SAS and ArcGIS, there are no ready-made solutions for
integrating the output of these two tools in a dynamic and automated way.
Our approach leverages the individual strengths of SAS and ArcGIS, as well
as the report delivery infrastructure of SAS® Information
Delivery Portal.
Nathan Clausen, CACI
Aaron House, CACI
Paper SAS2203-2014:
Getting Started with Mixed Models
This introductory presentation is intended for an audience new to mixed
models who wants to get an overview of this useful class of models. Learn
about mixed models as an extension of ordinary regression models, and see
several examples of mixed models in social, agricultural, and
pharmaceutical research.
Catherine Truxillo, SAS
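For orientation, a minimal PROC MIXED sketch with a random intercept
(hypothetical trial data):

   proc mixed data=trial;
      class center treatment;
      model response = treatment;          /* fixed effect                  */
      random intercept / subject=center;   /* random center-to-center shift */
   run;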
Paper SAS2204-2014:
Getting Started with Mixed Models in Business
For decades, mixed models have been used by researchers to account for
random sources of variation in regression-type models. Now, they are
gaining favor in business statistics for giving better predictions for
naturally occurring groups of data, such as sales reps, store locations,
or regions. Learn about how predictions based on a mixed model differ from
predictions in ordinary regression and see examples of mixed models with
business data.
Catherine Truxillo, SAS
Paper SAS2206-2014:
Getting Started with Poisson Regression Modeling
When the dependent variable is a count, Poisson regression is a natural
choice of distribution for fitting a regression model. This presentation
is intended for an audience experienced in linear regression modeling, but
new to Poisson regression modeling. Learn the basics of this useful
distribution and see some examples where it is appropriate. Tips for
identifying problems with fitting a Poisson regression model and some
helpful alternatives are provided.
Chris Daman, SAS
Marc Huber, SAS
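A minimal PROC GENMOD sketch of a Poisson model with an exposure offset
(variable names are hypothetical):

   data claims2;
      set claims;
      log_exposure = log(exposure);
   run;

   proc genmod data=claims2;
      class region;
      model nclaims = region age / dist=poisson link=log offset=log_exposure;
   run;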
Paper SAS2205-2014:
Getting Started with Survey Procedures
Analyzing data from a complex probability survey involves weighting
observations so that inferences are correct. This introductory
presentation is intended for an audience new to analyzing survey data.
Learn the essentials of using the SURVEYxx procedures in
SAS/STAT®.
Chris Daman, SAS
Bob Lucas, SAS
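For example, a weighted mean under a stratified cluster design looks like
this (design variables are hypothetical):

   proc surveymeans data=survey;
      strata region;
      cluster psu;
      weight wt;
      var income;
   run;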
Do you need a statistic that is not computed by any SAS®
procedure? Reach for the SAS/IML® language! Many statistics
are naturally expressed in terms of matrices and vectors. For these, you
need a matrix-vector language. This hands-on workshop introduces the
SAS/IML language to experienced SAS programmers. The workshop focuses on
statements that create and manipulate matrices, read and write data sets,
and control the program flow. You will learn how to write user-defined
functions, interact with other SAS procedures, and recognize efficient
programming techniques. Programs are written using the SAS/IML®
Studio development environment. This course covers Chapters 2 4 of
Statistical Programming with SAS/IML Software (Wicklin, 2010).
Rick Wicklin, SAS
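A flavor of the matrix-vector style covered in the workshop:

   proc iml;
      x = {1 2, 3 4, 5 6};             /* 3x2 matrix literal          */
      xpx = x` * x;                    /* crossproducts via transpose */
      b = inv(xpx) * x` * {1, 2, 3};   /* tiny least-squares solve    */
      print xpx b;
   quit;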
Have you ever seen SAS® Visual Analytics reports that are
somehow more elegant than a standard report? Which qualities make reports
easier to navigate, more appealing to the eye, or reveal insights more
quickly? These quick tips will reveal several SAS Visual Analytics report
design characteristics to help make your reports stand out from the pack.
We cover concepts like color palettes, content organization, interactions,
labeling, and branding, just to name a few.
Keith Renison, SAS
With the ever increasing proliferation of disparate complex data being
collected and stored, it has never been more important that this
information is accurate, clean, integrated, and often times in compliance
with an expanding set of government regulations. This means that the data
must be cleaned and standardized, duplicates must be identified and
removed, and the individual data must be able to be joined or merged
together in some way. However, it is often the case that this data does
not have the same variables or values to make this possible with a simple
Join or Merge. To that end, one has to employ a set of fuzzy logics or
fuzzy matching. Simply put, fuzzy matching is the implementation of
algorithmic processes (fuzzy logic) to determine the similarity between
elements of data such as business names, people names, or address
information. Fuzzy logic is used to predict the probability of data with
non-exact matches to help in data cleansing, deduplication, or matching of
disparate data sets. This paper shows the basics of fuzzy logic by
using SAS® functions such as COMPLEV, multiple-variable matches,
and a modified Porter stemming algorithm.
Toby Dunn, Dunn Consulting
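A minimal sketch of a COMPLEV-based comparison (the threshold and variable
names are illustrative):

   data fuzzy;
      set pairs;    /* hypothetical: one row per candidate pair */
      dist = complev(upcase(strip(name_a)), upcase(strip(name_b)));
      likely_match = (dist <= 2);   /* small edit distance => likely match */
   run;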
It is not uncommon to find models with random components like location,
clinic, teacher, etc., not just the single error term we think of in
ordinary regression. This paper uses several examples to illustrate the
underlying ideas. In addition, the response variable might be Poisson or
binary rather than normal, thus taking us into the realm of generalized
linear mixed models. These, too, will be illustrated with examples.
David Dickey, NC State University
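In SAS, such models are typically fit with PROC GLIMMIX; a binary-response
sketch with a random clinic effect (variable names are hypothetical):

   proc glimmix data=study;
      class clinic trt;
      model cured(event='1') = trt / dist=binary link=logit;
      random intercept / subject=clinic;
   run;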
Beginning with SAS® 9.2, ODS Graphics introduces a whole new
way of generating graphs using SAS®. With just a few lines
of code, you can create a wide variety of high-quality graphs. This paper
covers the three basic ODS Graphics procedures SGPLOT, SGPANEL, and
SGSCATTER. SGPLOT produces single-celled graphs. SGPANEL produces
multi-celled graphs that share common axes. SGSCATTER produces
multi-celled graphs that might use different axes. This paper shows how to
use each of these procedures in order to produce different types of
graphs, how to send your graphs to different ODS destinations, how to
access individual graphs, and how to specify properties of graphs, such as
format, name, height, and width.
Lora Delwiche, University of California, Davis
Susan Slaughter, Avocet Solutions
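For example, a grouped scatter plot takes just three lines:

   proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;
   run;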
This paper illustrates some SAS® graphs that can be useful
for variable selection in predictive modeling. Analysts are often
confronted with hundreds of candidate variables available for use in
predictive models, and this paper illustrates some simple SAS graphs that
are easy to create and that are useful for visually evaluating candidate
variables for inclusion or exclusion in predictive models. The graphs
illustrated in this paper are bar charts with confidence intervals using
the GCHART procedure and comparative histograms using the UNIVARIATE
procedure. The graphs can be used for most combinations of categorical or
continuous target variables with categorical or continuous input
variables. This paper assumes the reader is familiar with the basic
process of creating predictive models using multiple (linear or logistic)
regression.
Bob Moore, Thrivent Financial
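A comparative-histogram sketch of the kind described, using a shipped
sample data set in place of real modeling data:

   proc univariate data=sashelp.heart noprint;
      class status;                 /* target: Alive vs. Dead     */
      var cholesterol;
      histogram cholesterol;        /* one panel per target level */
   run;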
Missing data is an ever-present issue, and analysts should exercise proper
care when dealing with it. Depending on the data and the analytical
approach, this problem can be addressed by simply removing records with
missing data. However, in most cases, this is not the best approach. In
fact, this can potentially result in inaccurate or biased analyses. The
SAS® programming language offers many DATA step processes
and functions for handling missing values. However, some analysts might
not like or be comfortable with programming. Fortunately, SAS®
Enterprise Guide® can provide those analysts with a number
of simple built-in tasks for discovering missing data and diagnosing their
distribution across fields. In addition, various techniques are available
in SAS Enterprise Guide for imputing missing values, varying from simple
built-in tasks to more advanced tasks that might require some customized
SAS code. The focus of this presentation is to demonstrate how SAS
Enterprise Guide features such as Query Builder, Filter and Sort Wizard,
Describe Data, Standardize Data, and Create Time Series address missing
data issues through the point-and-click interface. As an example of code
integration, we demonstrate the use of a code node for more advanced
handling of missing data. Specifically, this demonstration highlights the
power and programming simplicity of PROC EXPAND (SAS/ETS®
software) in imputing missing values for time series data.
Elena Shtern, SAS
Matt Hall, SAS
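A minimal PROC EXPAND sketch for interpolating gaps in a daily series
(variable names are hypothetical):

   proc expand data=daily out=filled;
      id date;
      convert sales = sales_filled / method=join;   /* linear interpolation */
   run;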
Have you ever needed additional data that was only accessible via a web
service in XML or JSON? In some situations, the web service is set up to
only accept parameter values that return data for a single observation. To
get the data for multiple values, we need to iteratively pass the
parameter values to the web service in order to build the necessary
dataset. This paper shows how to combine the SAS® hash
object with the FILEVAR= option to iteratively pass a parameter value to a
web service and input the resulting JSON or XML formatted data.
John Vickery, North Carolina State University
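A sketch of the FILEVAR= half of the technique, assuming the URL device
type on the INFILE statement (the endpoint and variable names are
hypothetical; error handling and the hash lookup are omitted):

   data raw_lines;
      length fname $256 line $1000;
      set param_list;                              /* one row per parameter */
      fname = cats('http://example.com/api?id=', param);
      infile dummy url filevar=fname end=done;     /* re-opens per iteration */
      do until (done);
         input line $1000.;
         output;                                   /* parse JSON/XML downstream */
      end;
   run;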
SAS® provides some powerful, flexible tools for creating
reports, like PROC REPORT and PROC TABULATE. With the advent of the Output
Delivery System (ODS), you have almost total control over how the output
from those procedures looks. But there are still times when you need (or
want) just a little more, and that's where the Report Writing Interface
(RWI) can help. The RWI is just a fancy way of saying that you are using
the ODSOUT object in a DATA step. This object enables you to lay out the
page, create tables, embed images, add titles and footnotes, and more, all
from within a DATA step, using whatever DATA step logic you need. Also,
all the style capabilities of ODS are available to you so that the output
created by your DATA step can have fonts, sizes, colors, backgrounds, and
borders that make your report look just like you want. This presentation
quickly covers some of the basics of using the ODSOUT object and then
walks through some of the techniques to create four real-world examples.
Who knows, you might even go home and replace some of your PROC REPORT
code; I know I have!
Pete Lund, Looking Glass Analytics
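A taste of the ODSOUT object, sketched as a one-cell table:

   ods html file='rwi.html';
   data _null_;
      dcl odsout obj();
      obj.table_start();
      obj.row_start();
      obj.format_cell(data: 'Hello from the Report Writing Interface');
      obj.row_end();
      obj.table_end();
   run;
   ods html close;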
Healthcare services data on products and services come in different shapes
and forms. Data cleaning, characterization, massaging, and transformation
are essential precursors to any statistical model-building efforts. In
addition, data size, quality, and distribution influence model selection,
model life cycle, and the ease with which business insights are extracted
from data. Analysts need to examine data characteristics and determine the
right data transformation and methods of analysis for valid interpretation
of results. In this presentation, we demonstrate the common data
distribution types for a typical healthcare services industry such as
Cardinal Health and their salient features. In addition, we use Base
SAS® and SAS/STAT® for data transformation of
both the response (Y) and the explanatory (X) variables in four
combinations [RR (Y and X as raw data), TR (only Y transformed), RT (only
X transformed), and TT (Y and X transformed)] and the practical
significance of interpreting linear, logistic, and completely randomized
design model results using the original and the transformed data values
for decision-making processes. The reality of dealing with diverse forms
of data, the ramification of data transformation, and the challenge of
interpreting model results of transformed data are discussed. Our analysis
showed that the magnitude of data variability is an overriding factor to
the success of data transformation and the subsequent tasks of model
building and interpretation of model parameters. Although data
transformation provided some benefits, it complicated analysis and
subsequent interpretation of model results.
Dawit Mulugeta, Cardinal Health
Jason Greenfield, Cardinal Health
Tison Bolen, Cardinal Health
Lisa Conley, Cardinal Health
When first presented with SAS® Enterprise
Guide®, many existing SAS® programmers don't
know where to begin. They want to understand, 'What's in it for me?' if
they switch over. These longtime users of SAS are accustomed to typing all
of their code into the Program Editor window and clicking Submit. This
beginning tutorial introduces SAS Enterprise Guide 6.1 to old and new
users of SAS who need to code. It points out advantages and tips that
demonstrate why a user should be excited about the switch. This tutorial
focuses on the key points of a session involving coding and introduces new
features. It covers the top three items for a user to consider when
switching over to a server-based environment. Attendees will return to the
office with a new motivation and confidence to start coding with SAS
Enterprise Guide.
Andy Ravenna, SAS
A group tasked with testing SAS® software from the customer
perspective has gathered a number of helpful hints for SAS®
9.4 that will smooth the transition to its new features and products.
These hints will help with the 'huh?' moments that crop up when you're
getting oriented and will provide short, straightforward answers. And we
can share insights about changes in your order contents. Gleaned from
extensive multi-tier deployments, SAS® Customer Experience
Testing shares insiders' practical tips to ensure you are ready to begin
your transition to SAS® 9.4.
Cindy Taylor, SAS
Do you create reports via a mainframe? If so, you can use SAS®
as a one-stop shop for all of your data manipulations. SAS can efficiently
read data, create data sets without the need for multiple DATA steps, and
produce Excel reports without the need for edits. This poster helps novice
mainframe programmers by providing helpful tips to efficiently create
reports using SAS in the mainframe environment. Topics covered are
replacing JCL with SAS for reading data, efficient merging for efficient
programming, and using PROC FREQ for data quality and PROC TABULATE for
superior reporting.
Rahul Pillay, Northrop Grumman
Many organizations need to forecast large numbers of time series that are
organized in a hierarchical fashion. Good forecasting practices recommend
that several hierarchies be used and that each hierarchy contain a
homogeneous set of time series with similar statistical properties.
Modeling and forecasting homogeneous time series hierarchies provide
better out-of-sample forecast performance. Because an organization might
have many time series hierarchies, it is often desirable to model and
forecast these hierarchical time series in parallel for computational
efficiency. Additionally, it is often desirable to aggregate forecasts
from several nonhomogeneous time series hierarchies for report generation.
This paper demonstrates these techniques for forecasting time series
hierarchies in parallel and for aggregating the forecasts by using SAS®
Forecast Server and SAS® Grid Manager.
Michael Leonard, SAS
Cheryl Doninger, SAS
Udo Sglavo, SAS
Portfolio segmentation is key in all forecasting projects. Not all
products are equally predictable. Nestlé uses animal names for its
segmentation, and the animal behavior translates well into how the
planners should plan these products. Mad Bulls are those products that are
tough to predict, if we don't know what is causing their unpredictability.
The Horses are easier to deal with. Modern time series based statistical
forecasting methods can tame Mad Bulls, as they allow to add explanatory
variables into the models. Nestl now complements its Demand Planning
solution based on SAP with predictive analytics technology provided by
SAS®, to overcome these issues in an industry that is highly
promotion-driven. In this talk, we will provide an overview of the
relationship Nestlé is building with SAS, and provide concrete examples of
how modern statistical forecasting methods available in SAS®
Demand-Driven Planning and Optimization help us to increase forecasting
performance, and therefore to provide high service to our customers with
optimized stock, the primary goal of Nestlé's supply chains.
Marcel Baumgartner, Nestlé SA
This case study shows how SAS® Enterprise Guide®
and SAS® Enterprise BI made it possible to easily implement
reports of fraud prevention in BF Financial Services and also how to help
operational areas to increase efficiency through automation of information
delivery. The fraud alert report was made using a program developed in SAS
Enterprise Guide to detect frauds on loan applications and later published
in SAS® Web Report Studio in order to be analyzed by a team.
The second example is the automation by SAS BI of a payment report that
had consumed 30% of a six-person staff's time.
Plinio Faria, Bradesco
SAS offers advanced analytics while Teradata has developed one of the
fastest databases known to mankind. The Office of the Actuary at CMS uses
SAS and Teradata to perform many important tasks such as setting Medicare
Advantage rates for providers, forecasting the cost of closing the Part-D
Donut Hole, and estimating the future cost of healthcare in America. The
major advantage of the Teradata platform is the speed at which data can be
summarized as compared to legacy systems (billions of rows of data can be
summarized in seconds where it once took days), and the SAS system provides
many options for accessing and analyzing this data, whether you're on a
mainframe, Windows®, or UNIX, through SAS®
Enterprise Guide or Business Intelligence.
Richard Andrews, Centers for Medicare and Medicaid Services
The role of the Data Scientist is the viral job description of the decade.
And like LOLcats, there are many types of Data Scientists. What is this
new role? Who is hiring them? What do they do? What skills are required to
do their job? What does this mean for the SAS® programmer
and the statistician? Are they obsolete? And finally, if I am a SAS user,
how can I become a Data Scientist? Come learn about this job of the future
and what you can do to be part of it.
Chuck Kincaid, Experis Business Analytics
Do you have data in SharePoint that you would like to run analysis on with
SAS®? This workshop teaches you how to create a custom task
in SAS® Enterprise Guide® in order to find,
retrieve, and format that data into a SAS data set for use in your SAS
programs.
Bill Reid, SAS
Recent studies suggest that unstructured data, such as customer comments
or feedback, can enhance the power of existing predictive models. SAS®
Text Miner can generate singular value decomposition (SVD) units from text
documents, which is a vectorial representation of terms in documents.
These SVDs, when used as additional inputs along with the existing
structured input variables, often prove to capture the response better.
However, SVD units are essentially black-box variables that are not easy
to interpret or explain. This is a big hindrance when trying to win over
decision makers in an organization to incorporate these derived textual
data components in their models. In this paper, we demonstrate a new and powerful
feature in SAS® Text Miner 12.1 that helps in explaining the
SVDs or the text cluster components. We discuss two important methods that
are useful to interpreting them. For this purpose, we used data from a
television network company that has transcripts of its call center notes
from three prior calls of each customer. We are able to extract the key
terms from the call center notes in the form of Boolean rules, which have
contributed to the prediction of customer churn. These rules provide an
intuitive sense of which set of terms, when occurring in either the
presence or absence of another set of terms in the call center notes,
might lead to a churn. It also provides insights into which customers are
at a greater risk of churning from the company's services and, more
importantly, why.
Murali Pagolu, SAS
Goutam Chakraborty, Oklahoma State University
No matter how long you've been programming in SAS®, using
and manipulating dates still seems to require effort. Learn all about SAS
dates, the different ways they can be presented, and how to make them
useful. This paper includes excellent examples for dealing with raw input
dates, functions to manage dates, and outputting SAS dates into other
formats. Included is all the date information you will need: date and time
functions, Informats, formats, and arithmetic operations.
Jenine Milum, Equifax Inc.
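For instance, a minimal sketch of the kinds of date handling the abstract above covers (names are illustrative): reading a raw date with an informat, doing date arithmetic with INTCK and INTNX, and writing the value back out with a format:

  data dates_demo;
    lit_dt   = '15JAN2014'd;                           /* date literal */
    input_dt = input('2014-01-15', yymmdd10.);         /* raw text to SAS date */
    months_since = intck('month', input_dt, today());  /* elapsed months */
    next_qtr = intnx('qtr', input_dt, 1, 'beginning'); /* start of next quarter */
    format lit_dt input_dt date9. next_qtr yymmdd10.;
  run;
  proc print data=dates_demo; run;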
Retail price setting is influenced by two distinct factors: the regular
price and the promotion price. Together, these factors determine the list
price for a specific item at a specific time. These data are often
reported only as a singular list price. Separating this one price into two
distinct prices is critical for accurate price elasticity modeling in
retail. These elasticities are then used to make sales forecasts, manage
inventory, and evaluate promotions. This paper describes a new time-series
feature extraction utility within SAS® Forecast Server that
allows for automated separation of promotional and regular prices.
Michael Leonard, SAS
Michele Trovero, SAS
The Healthcare and Life Sciences industry is by nature conservative and
tends to change slowly. Driven by the joint needs to improve quality and
lower the costs of their services and products, that rate of change is
rapidly increasing. We are now able to take data in almost any format;
organize it, curate it, and turn it into actionable knowledge. We can now
do this in real time and make a meaningful difference in citizens' lives.
From genomics (all the omics, actually), to operations, to clinical care,
to health and wellness, big data is popping up everywhere. From ICD-10 CAC
(Computer-Assisted Coding) to clinical trials, big data is there. With the
rapid advancement of remote patient monitoring (RPM), the Internet of
Things (IoT), and personalized clinical medicine (PCM), changes in drug
discovery, care delivery, and disease prevention are rapidly moving from
concept to mainstream practice. Come hear about the IT infrastructure and
capabilities your organization needs to develop in order to deliver on
them. Whether you are a researcher, a practicing clinician, a government
official, or a hospital administrator, you will need to understand and use
big data solutions to thrive in the coming era of healthcare and big data.
Mark Blatt, Intel Corporation
This presentation is for users who are familiar with SAS®
Enterprise Guide® but might not be aware of the many useful
new features added in versions 4.2 and beyond. For example, SAS Enterprise
Guide allows you to: Format your SAS® source code to make it
easier to read. Easily schedule a project to run at a given time. Work
with OLAP data in your enterprise. We will overview these and other
features to help you become even more productive using this powerful
application.
Mark Allemang, SAS
The DATA step has served SAS® programmers well over the
years, and although it is powerful, it has not fundamentally changed. With
DS2, SAS has introduced a significant alternative to the DATA step by
introducing an object-oriented programming environment. In this paper, we
share our experiences with getting started with DS2 and learning to use it
to access, manage, and share data in a scalable, threaded, and
standards-based way.
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
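A minimal DS2 sketch in the spirit of the abstract above, assuming an existing table work.in (table and variable names are hypothetical), showing the declared variables and method structure that distinguish DS2 from the traditional DATA step:

  proc ds2;
    data work.out / overwrite=yes;
      dcl double ratio;              /* explicit declaration, a DS2 hallmark */
      method run();
        set work.in;
        if cost > 0 then ratio = revenue / cost;
      end;
    enddata;
  run;
  quit;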
Determining what, when, and how to migrate SAS® software
from one major version to the next is a common challenge. SAS provides
documentation and tools to help make the assessment, planning, and
eventual deployment go smoothly. We describe some of the keys to making
your migration a success, including the effective use of the SAS®
Migration Utility, both in the analysis mode and the execution mode. This
utility is responsible for analyzing each machine in an existing
environment, surfacing product-specific migration information, and
creating packages to migrate existing product configurations to later
versions. We show how it can be used to simplify each step of the
migration process, including recent enhancements to flag product version
compatibility and incompatibility.
Josh Hames, SAS
Gerry Nelson, SAS
This paper illustrates a permutation method for implementing multiple
comparisons on Pearson's chi-square test for an R×C contingency
table, using the SAS® FREQ procedure and a newly developed
SAS macro called CHISQ_MC. This method is analogous to the Tukey-type
multiple comparison method for one-way analysis of variance.
Man Jin, Forest Research Institute
Binhuan Wang, New York University School of Medicine
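The CHISQ_MC macro itself is the authors'; as a hedged illustration of the underlying building block, PROC FREQ produces the overall R×C chi-square test, after which pairwise subtables can be tested and the resulting p-values adjusted (data set, variable, and group names are hypothetical):

  proc freq data=work.trial;
    tables arm*response / chisq;     /* overall R x C test */
  run;
  /* one pairwise comparison: restrict the table to two rows */
  proc freq data=work.trial;
    where arm in ('A' 'B');
    tables arm*response / chisq;     /* repeat per pair, then adjust p-values */
  run;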
Power producers are looking for ways to not only improve the efficiency of
power plant assets but also to address growing concerns about the
environmental impacts of power generation, without compromising their
market competitiveness. To meet this challenge, this study demonstrates the
application of data mining techniques for process optimization in a
coal-fired power plant in Thailand with 97,920 data records. The main
purpose is to determine which factors have a great impact on both (1) heat
rate (kJ/kWh) of electrical energy output and (2) opacity of the flue gas
exhaust emissions. As opposed to the traditional regression analysis
currently employed at the plant and based on Microsoft Excel, more complex
analytical models using SAS® Enterprise Miner™
help support managerial decisions to improve the overall performance of
the existing energy infrastructure while reducing emissions through a
change in the energy supply structure.
Thanrawee Phurithititanapong, National Institute of Development
Administration
Jongsawas Chongwatpol, National Institute of Development Administration
Detecting patterns in graphics output is much easier when numeric data can
be grouped categorically. Such is the case with Body Mass Index and its
four classifications: underweight, normal weight, overweight, and obese.
This presentation goes from conventional histogram to asymmetric violin
plot with coverage of the categorical histogram along the way. HISTOGRAM,
BANDPLOT, and LATTICE statements are described in context. SAS®
9.3 must be used to replicate the graphs.
Perry Watts, Stakana Analytics
Administrators at Western Kentucky University rely on the Institutional
Research department to perform detailed statistical analyses to deepen the
understanding of issues associated with enrollment management, student and
faculty performance, and overall program operations. This paper presents
several instances of analyses performed for the university to help it
identify and recruit suitable candidates, uncover root causes in grade and
enrollment trends, evaluate faculty effectiveness, and assess the impact
of student characteristics, programs, or student activities on retention
and graduation rates. The paper briefly discusses the data infrastructure
created and used by Institutional Research. For each analysis performed,
it reviews the SAS® program and key components of the SAS
code involved. The studies presented include the use of SAS®
Enterprise Miner™ to create a retention model
incorporating dozens of student background variables. It shows an
examination of grade trends in the same courses taught by different
faculty and subsequent student behavior and success, providing insights
into the nuances and subtleties of evaluating faculty performance. Another
analysis uncovers the possible influence of fraternities and sororities in
freshmen algebra courses. Two investigations explore the impact of
programs on student retention and graduation rates. Each example and its
findings illustrate how Institutional Research can support the
administration of university operations. The target audience is any SAS
professional interested in learning more about Institutional Research in
higher education and how SAS software is used by an Institutional Research
department to serve its organization.
Matthew Foraker, Western Kentucky University
SAS® High-Performance Analytics is a significant step
forward in the area of high-speed, analytic processing in a scalable
clustered environment. However, Big Data problems generally come with data
from lots of data sources, at varying levels of maturity. Teradata's
innovative Unified Data Architecture (UDA) represents a significant
improvement in the way that large companies can think about Enterprise
Data Management, including the Teradata Database, Hortonworks Hadoop, and
Aster Data Discovery platform in a seamlessly integrated platform. Together,
the two platforms provide business users, analysts, and data scientists
with the ideally suited data management platforms, targeted specifically
to their analytic needs, based upon analytic use cases, managed in a
single integrated enterprise data management environment. The paper will
focus on how several companies today are using Teradata's Integrated
Hardware and Software UDA Platform to manage a single enterprise analytic
environment, fight the ongoing proliferation of analytic data marts, and
speed their operational analytic processes.
John Cunningham, Teradata Corporation
SAS® solutions are tightly integrated with the scheduling
capabilities provided by SAS® Grid Manager and Platform
Suite for SAS®. Many organizations require that their
corporate scheduler be used to control SAS processing within the
enterprise. Historically this has been a laborious process, requiring
duplication of job and flow information using manual forms and cumbersome
change management. This paper provides proven techniques and methods that
enable tight integration between the corporate scheduler and SAS without
the administrative overhead. Platform Suite for SAS can be used to create
flows which are then executed by the corporate scheduler. The business
unit can tweak the flow without reference to the enterprise scheduling
team. The approaches discussed include using the corporate scheduler to
trigger SAS flows and respond to flow return codes; restarting a SAS flow
that has exited due to error conditions; and enabling and disabling LSF
queues, allowing queued jobs to run within a time window governed by
external dependencies rather than by time. The paper also covers how to
configure your SAS environment to leverage these capabilities, and it
presents real-world use cases that highlight the features and benefits of
this approach. The contents of this paper are of interest to SAS
administrators and IT personnel responsible for enterprise scheduling.
Full code and deployment instructions will be made available.
Paul Northrop, SAS
Controlled vocabularies define a common set of concepts that retain their
meaning across contexts, supporting consistent use of terms to annotate,
integrate, retrieve, and interpret information. Controlled vocabularies
are large hierarchical structures that cannot be represented using typical
SAS® practices (e.g., SAS format statements and hash
objects). This paper compares and contrasts three models for representing
hierarchical structures using SAS data sets: adjacency list, path
enumeration, and nested set (Celko, 2004; Mackey, 2002). Specific
controlled vocabularies include a university organizational structure and
several biological vocabularies (MeSH, NCBI Taxonomy, and GO). The paper
presents data models and SAS code for populating tables and performing
queries. The paper concludes with a discussion of implications for data
warehouse implementation and future work related to efficiency of update
and delete operations.
Glenn Colby, University of Colorado Boulder
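As a brief sketch of the adjacency list model discussed in the abstract above (table and column names are hypothetical), each row stores a node and its parent, and PROC SQL retrieves immediate children; deeper traversal requires repeated joins:

  data work.org;                     /* adjacency list: node, parent */
    length node $20 parent $20;
    input node $ parent $;
    datalines;
  University .
  Sciences University
  Biology Sciences
  Chemistry Sciences
  ;
  run;
  proc sql;                          /* immediate children of Sciences */
    select node from work.org
    where parent = 'Sciences';
  quit;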
The use, limits, and misuse of statistical models in different industries
are propelling new techniques and best practices in forecasting. Until
recently, many factors such as data collection and storage constraints,
poor data synchronization capabilities, technology limitations, and
limited internal analytical expertise have made it impossible to forecast
intermittent demand. In addition, integrating consumer demand data (that
is, point-of-sale [POS]/syndicated scanner data from ACNielsen/
Information Resources Inc. [IRI]/Intercontinental Marketing Services
[IMS]) to shipment forecasts was a challenge. This presentation gives
practical how-to advice on intermittent forecasting and outlines a
framework, using multi-tiered causal analysis (MTCA), that links demand to
supply. The framework uses a process of nesting causal models together by
using data and analytics.
Edward Katz, SAS
This presentation addresses two main topics: The first topic focuses on
the industry's norms and the best practices for building internal credit
ratings (PD, EAD, and LGD). Although there is no capital relief for
local US banks using internal credit ratings (the US has not adopted the
Internal Ratings-Based approach of Basel II, with the exception of the top
10 banks), interest in credit ratings modeling has increased over the last
two years in the US banking industry. The main reason is the
added value a bank can achieve from these ratings, and that is the focus
of the second part of this presentation. It describes our journey (a
client story) for getting there, introducing the SAS®
project. Even more importantly, it describes how we use credit ratings in
order to achieve effective credit risk management and get real added value
out of that investment. The key success factor for achieving it is to
effectively implement ratings within the credit process and throughout
decision making. Only then can ratings be used to improve risk-adjusted
return on capital, which is the ultimate objective for all of us.
Boaz Galinson, Bank Leumi
By understanding the actual gambling behavior of individuals over the
Internet, we develop markers that identify behavioral patterns, which in
turn can be used to predict a subscriber's level of gambling risk. The
data set contains 4,056 subscribers. Using SAS®
Enterprise Miner™ 12.1, a set of models are run to
predict which subscriber is likely to become a high-risk internet gambler.
The data contains 114 variables such as first active date and first active
product used on the website as well as the characteristics of the game
such as fixed odds, poker, casino, games, etc. Other measures of a
subscriber's data, such as money put at stake and what odds are being bet,
are also included. These variables provide a comprehensive view of a
subscriber's behavior while gambling over the website. The target variable
is modeled as a binary variable, 0 indicating a risky gambler and 1
indicating a controlled gambler. The data is a typical example of
real-world data with many missing values and hence had to be transformed,
imputed, and then later considered for analysis. The model comparison
algorithm of SAS Enterprise Miner 12.1 was used to determine the best
model. Stepwise regression performs the best among a set of 25 models
that were run using over 100 permutations of each model. The stepwise
regression model predicts a high-risk Internet gambler with an accuracy of
69.63%, using variables such as wk4frequency and wk3frequency of bets.
Sai Vijay Kishore Movva, Oklahoma State University
Vandana Reddy, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Organizations today make numerous decisions within their businesses that
affect almost every aspect of their daily operations. Many of these
decisions are now automatically generated by sophisticated enterprise
decision management systems. These decisions include what offers to make
to customers, sales transaction processing, payment processing, call
center interactions, industrial maintenance, transportation scheduling,
and thousands of other applications that all have a significant impact on
the business bottom line. Concurrently, many of these same companies have
developed or are now developing analytics that provide valuable insight
into their customers, their products, and their markets. Unfortunately,
many of the decision systems cannot maximize the power of analytics in the
business processes at the point where the decisions are made. SAS®
Decision Manager is a new product that integrates analytical models with
business rules and deploys them to operational systems where the decisions
are made. Analytically driven decisions can be monitored, assessed, and
improved over time. This paper describes the new product and its use and
shows how models and business rules can be joined into a decision process
and deployed to either batch processes or to real-time web processes that
can be consumed by business applications.
Steve Sparano, SAS
Charlotte Crain, SAS
David Duling, SAS
This session introduces frailty models and their use in biostatistics to
model time-to-event or survival data. The session uses examples to review
situations in which a frailty model is a reasonable modeling option, to
describe which SAS® procedures can be used to fit frailty
models, and to discuss the advantages and disadvantages of frailty models
compared to other modeling options.
John Amrhein, McDougall Scientific Ltd.
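As one concrete example of the kind of procedure the session above surveys, recent SAS/STAT releases allow a shared frailty in PROC PHREG via the RANDOM statement; this is a hedged sketch with hypothetical data set and variable names:

  proc phreg data=work.patients;
    class clinic;
    model time*status(0) = age treatment;
    random clinic / dist=lognormal;   /* shared lognormal frailty by clinic */
  run;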
The NLIN procedure fits a wide variety of nonlinear models. However, some
models can be so nonlinear that standard statistical methods of inference
are not trustworthy. That's when you need the diagnostic and inferential
features that were added to PROC NLIN in SAS/STAT® 9.3,
12.1, and 13.1. This paper presents these features and explains how to use
them. Examples demonstrate how to use parameter profiling and confidence
curves to identify the nonlinear characteristics of the model parameters.
They also demonstrate how to use the bootstrap method to study the
sampling distribution of parameter estimates and to make more accurate
statistical inferences. This paper highlights how measures of nonlinearity
help you diagnose models and decide on potential reparameterization. It
also highlights how multithreading is used to tame the large number of
nonlinear optimizations that are required for these features.
Biruk Gebremariam, SAS
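A hedged sketch of the features described above, assuming the PROFILE and BOOTSTRAP statements documented for PROC NLIN in SAS/STAT 12.1 and 13.1 (the data, model form, and starting values are hypothetical):

  proc nlin data=work.growth;
    parameters a=100 b=0.1;              /* starting values */
    model y = a * (1 - exp(-b * x));     /* nonlinear growth curve */
    profile a b;                         /* parameter profiling, confidence curves */
    bootstrap / nsamples=1000 seed=123;  /* bootstrap sampling distribution */
  run;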
Item response theory (IRT) is concerned with accurate test scoring and
development of test items. You design test items to measure various types
of abilities (such as math ability), traits (such as extroversion), or
behavioral characteristics (such as purchasing tendency). Responses to
test items can be binary (such as correct or incorrect responses in
ability tests) or ordinal (such as degree of agreement on Likert scales).
Traditionally, IRT models have been used to analyze these types of data in
psychological assessments and educational testing. With the use of IRT
models, you can not only improve scoring accuracy but also economize test
administrations by adaptively using only the discriminative items. These
features might explain why in recent years IRT models have become
increasingly popular in many other fields, such as medical research,
health sciences, quality-of-life research, and even marketing research.
This paper describes a variety of IRT models, such as the Rasch model,
two-parameter model, and graded response model, and demonstrates their
application by using real-data examples. It also shows how to use the IRT
procedure, which is new in SAS/STAT® 13.1, to calibrate
items, interpret item characteristics, and score respondents. Finally, the
paper explains how the application of IRT models can help improve test
scoring and develop better tests. You will see the value in applying item
response theory, possibly in your own organization!
Xinming An, SAS
Yiu-Fai Yung, SAS
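A minimal sketch of calibrating binary items with the IRT procedure described above, assuming SAS/STAT 13.1 syntax (item names are hypothetical; RESFUNC= selects the response model):

  proc irt data=work.test scoremethod=eap;
    var item1-item20;
    model item1-item20 / resfunc=twop;   /* two-parameter logistic model */
  run;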
Traditional SAS® programs typically consist of a series of
SAS DATA steps, which refine input data sets until the final data set or
report is reached. SAS DATA steps do not run in-database. However, SAS®
Enterprise Guide® users can replicate this kind of iterative
programming and have the resulting process flow run in-database by linking
a series of SAS Enterprise Guide Query Builder tasks that output SAS views
pointing at data that resides in a Teradata database, right up to the last
Query Builder task, which generates the final data set or report. This
session both explains and demonstrates this functionality.
Frank Capobianco, Teradata
Formats are an often under-valued tool in the SAS® toolbox.
They can be used in just about all domains to improve the readability of a
report, or they can be used as a look-up table to recode your data. Out of
the box, SAS includes a multitude of ready-defined formats that can be
applied without modification to address most recode and redisplay
requirements. And if that's not enough, there is also a FORMAT procedure
for defining your own custom formats. This paper looks at using some of
the formats supplied by SAS in some innovative ways, but primarily focuses
on the techniques we can apply in creating our own custom formats.
Brian Bee, The Knowledge Warehouse Ltd
Paper SAS119-2014:
Lessons Learned from SAS® 9.4 High-Availability and Failover
Testing
SAS® 9.4 has improved clustering capabilities that allow for
scalability and failover for middle-tier servers and the metadata server.
In this presentation, we share our experiences with high-availability and
failover testing done prior to SAS 9.4 availability. We discuss what we
tested and lessons learned (good and bad) while doing the testing.
Susan Bartholow, SAS
Arthur Hunt, SAS
Renee Lorden, SAS
Report automation and scheduling are very hot topics in many industries.
They confer many advantages including reduced work load, elimination of
repetitive tasks, generation of accurate results, and better
performance. This paper illustrates how to design an appropriate program
to automate and schedule reports in SAS® 9.1 and SAS®
Enterprise Guide® 5.1 using a SAS® server as
well as the Windows Scheduler. The automation part covers formatting
Microsoft Excel tables using XML, VBA code, or other formats, and
conditional automatic e-mailing with file attachments. We
systematically walk through each step with a clear flow diagram from the
data source to the final destination. We also discuss details of
server-side and PC-side schedulers and how these schedulers involve
invoking batch programs.
Anjan Matlapudi, AmerihealthCaritas
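For the conditional auto-e-mailing piece mentioned above, a minimal sketch using the FILENAME EMAIL access method (addresses and file paths are hypothetical); surrounding macro logic would decide whether the send step runs:

  filename mail email
    to='analytics-team@example.com'
    subject='Daily claims report'
    attach='/reports/claims.xlsx';
  data _null_;
    file mail;
    put 'The attached report was generated automatically.';
  run;
  filename mail clear;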
Despite its popularity in recent years, .NET development has yet to enjoy
the quality, level, and depth of statistical support that has always been
provided by SAS®. And yet, many .NET applications could
benefit greatly from the power of SAS and, likewise, some SAS applications
could benefit from friendly graphical user interfaces (GUIs) supported by
Microsoft's .NET Framework. What the author sets out to do here is to 1)
outline the basic mechanics of automating SAS with .NET, 2) provide a
framework and specific strategies for maintaining parallelism between the
two platforms at runtime, and 3) sketch out some simple applications
that provide an exciting combination of powerful SAS analytics and highly
accessible GUIs. The mechanics of automating SAS with .NET will be covered
briefly. Attendees will learn the required objects and methods needed to
pass information between the two platforms. The attendees will learn some
strategies for organizing their projects and for writing SAS code that
lends itself to automation. This will include embedding SAS scripts within
a .NET project and managing communications between the two platforms.
Specifically, the log and listing output will be captured and handled by
.NET, and user actions will be interpreted and sent to the SAS engine.
Example applications used throughout the session include a tool that
converts between SAS variable types through simple drag-and-drop and an
application that analyzes the growth of the user s computer hard drive.
Matthew Duchnowski, Educational Testing Service (ETS)
Being flexible and highlighting important details in your output is
critical. The use of ODS ESCAPECHAR allows the SAS®
programmer to insert inline formatting functions into variable values
through the DATA step, and it makes for a quick and easy way to highlight
specific data values or modify the style of the table cells in your
output. What is an easier and more efficient way to concatenate those
inline formatting functions to the variable values? This paper shows how
the CAT functions can simplify this task.
Yanhong Liu, Cincinnati Children's Hospital Medical Center
Justin Bates, Cincinnati Children's Hospital Medical Center
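A minimal sketch of the technique described above (the style choice and cutoff are illustrative): declare an escape character, then use CATS to splice an inline style function onto the value in a DATA step:

  ods escapechar='^';
  data work.report;
    set work.results;
    length flagged $200;
    /* highlight large values in red via an inline style function */
    if value > 100 then
      flagged = cats('^{style [color=red font_weight=bold]', put(value, 8.2), '}');
    else flagged = put(value, 8.2);
  run;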
Traditional merchandise planning processes have been primarily product and
location focused, with decisions about assortment selection, breadth and
depth, and distribution based on the historical performance of merchandise
in stores. However, retailers are recognizing that in order to compete and
succeed in an increasingly complex marketplace, assortments must become
customer-centric. Advanced analytics can be leveraged to generate
actionable insights into the relevance of merchandise to a retailer's
various customer segments and purchase channel preferences. These insights
enrich the merchandise and assortment planning process. This paper
describes techniques for using advanced analytics to impact
customer-centric assortments. Topics covered include approaches for
scoring merchandise based on customer relevance and preferences,
techniques for gaining insight into customer relevance without customer
data, and an overall approach to a customer-driven merchandise planning
process.
Christopher Matz, SAS
Ensemble models combine two or more models to enable a more robust
prediction, classification, or variable selection. This paper describes
three types of ensemble models: boosting, bagging, and model averaging. It
discusses go-to methods, such as gradient boosting and random forest, and
newer methods, such as rotational forest and fuzzy clustering. The
examples section presents a quick setup that enables you to take fullest
advantage of the ensemble capabilities of SAS® Enterprise
Miner™ by using existing nodes, Start Groups and End
Groups nodes, and custom coding.
Miguel M. Maldonado, SAS
Jared Dean, SAS
Wendy Czika, SAS
Susan Haller, SAS
Paper 2085-2014:
Leveraging Mathematical Optimization to Innovate our Laundry Portfolio
Architecture
Designing laundry products has become more complex and challenging over
the years. This has occurred for many reasons: portfolio expansions,
rapidly changing supply conditions, and product cost pressures to name a
few. The pace of change is fast and ever-increasing. Simplifying our
approach to our product portfolio is desired in order to increase our
agility and enable us to react to these rapidly changing conditions. This
talk will describe the application of mathematical optimization to create
a more agile and productive approach for managing our product portfolio.
Kevin Norwood, Procter & Gamble
The soaring number of publicly available data sets across disciplines has
allowed for increased access to real-life data for use in both research
and educational settings. These data often leverage cost-effective complex
sampling designs including stratification and clustering, which allow for
increased efficiency in survey data collection and analyses. Weighting
becomes a necessary component in these survey data in order to properly
calculate variance estimates and arrive at sound inferences through
statistical analysis. Generally speaking, these weights are included with
the variables provided in the public use data, though an explanation for
how and when to use these weights is often lacking. This paper presents an
analysis using the California Health Interview Survey to compare weighted
and non-weighted results using SAS® PROC LOGISTIC and PROC
SURVEYLOGISTIC.
Tyler Smith, National University
Besa Smith, Analydata
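A hedged sketch of the comparison made in the paper above (variable names are hypothetical): the same logistic model fit while ignoring and while honoring the complex survey design:

  /* naive model: ignores weights and design */
  proc logistic data=work.chis;
    model diabetes(event='1') = age smoker;
  run;
  /* design-based model: stratification, clustering, and weights */
  proc surveylogistic data=work.chis;
    strata stratum_id;
    cluster psu_id;
    weight sample_wt;
    model diabetes(event='1') = age smoker;
  run;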
There is an increasing interest in exploring healthcare practices and
costs for the American working population and their dependents to improve
the quality and efficiency of care and to compare healthcare performance.
Comparative data is needed to evaluate and benchmark financial and
clinical performance. Because of the large amounts of comparative data
available, it is useful to use data exploration tools. In this paper,
the authors describe their experience building a prototype to extract data
from MarketScan Research Databases, load the data into SAS®
Visual Analytics, and explore this healthcare data to understand drug
adherence for a diabetes population.
Al Cordoba, Truven Health Analytics
Jim Fenton, SAS
William Marder, Truven Health Analytics
Tony Pepitone, Truven Health Analytics
Paper SAS216-2014:
Leveraging SAS® Visualization Technologies to Increase the
Global Competency of the US Workforce
U.S. educators face a critical new imperative: to prepare all students for
work and civic roles in a globalized environment in which success
increasingly requires the ability to compete, connect, and cooperate on an
international scale. The Asia Society and the Longview Foundation are
collaborating on a project to show both the need for and supply of
globally competent graduates. This presentation shows you how SAS assisted
these organizations with a solution that leverages SAS®
visualization technologies in order to produce a heatmap application. The
heatmap application surfaces data from over 300 indicators, comprising
over a quarter million data points, in a highly interactive interface. The
application features a drillable map that shows data at
the state level as well as at the county level for all 50 states. This
endeavor involves new SAS® 9.4 technology to both combine
the data and to create the interface. You'll see how SAS procedures, such
as PROC JSON, which came out in SAS 9.4, were used to prepare the data for
the web application. The user interface demonstrates how SAS/GRAPH®
output can be combined with popular JavaScript frameworks like Dojo and
Twitter Bootstrap to create an HTML5 application that works on desktop,
mobile, and tablet devices.
Jim Bauer, SAS
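As a minimal sketch of the data preparation step mentioned above, PROC JSON (new in SAS 9.4) can export a data set for consumption by a JavaScript front end (file path and data set name are illustrative):

  proc json out='/web/data/indicators.json' pretty;
    export work.state_indicators;    /* writes the table as JSON */
  run;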
SAS® Environment Manager is included with the release of
SAS® 9.4. This exciting new product enables administrators
to monitor the performance and operation of their SAS®
deployments. What very few people are aware of is that the data collected
by SAS Environment Manager is stored in a centralized data mart that's
designed to help administrators better understand the behavior and
performance of the components of their SAS solution stack. This data mart
could also be used to help organizations to meet their ITIL reporting and
measurement requirements. In addition to the information about alerts,
events, and performance metrics collected by the SAS Environment Manager
agent technology, this data mart includes the metadata audit and content
usage data previously available only from the SAS® Audit,
Performance and Measurement Package.
Bob Bonham, SAS
Greg Smith, SAS
This paper provides a set of ideas about design elements of SAS®
macros. This paper is a checklist for programmers who write or test
macros.
Ronald Fehd, Stakana Analytics
Effective graphs are indispensable for modern statistical analysis. They
reveal tendencies that are not readily apparent in simple tables and add
visual clarity to reports. My client is a big graph fan; he always shows
me high-quality, complex sample graphs that were created with other
software and asks me, "Can SAS® duplicate these outputs?"
Often, by leveraging the capabilities of the ODS Graph Template Language
and the SGRENDER procedure, the answer is "Yes." Graph Template Language
offers SAS users a more direct approach to customize the output and to
overlay graphs in different levels. This paper uses cases drawn from a
real work situation to demonstrate how to get seemingly unattainable
results with the power of Graph Template Language: utilizing bubble plots
as distribution density bars; creating refreshing-looking linear
regression graphics with the slope information in the legend; and
overlaying different plots to create sophisticated analytical bottleneck
test output.
Wen Song, ICF International
Ge Wu, Johns Hopkins University
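A minimal GTL sketch of the compile-then-render workflow the paper above relies on (the template name is hypothetical): define a template with PROC TEMPLATE, overlay two plot statements, and render it with PROC SGRENDER:

  proc template;
    define statgraph overlay_demo;
      begingraph;
        layout overlay;
          scatterplot x=height y=weight;
          regressionplot x=height y=weight;   /* overlaid fit line */
        endlayout;
      endgraph;
    end;
  run;
  proc sgrender data=sashelp.class template=overlay_demo;
  run;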
If you have been programming SAS® for years, you have
probably made Display Manager your own: customized window layout, program
text colors, bookmarks, and abbreviations/keyboard macros. Now you are
using SAS® Enterprise Guide®. Did you know you
can have almost all the same modifications you had in Base SAS®
in SAS Enterprise Guide, plus more?
John Ladds, Statistics Canada
How do you compare group responses when the data are unbalanced or when
covariates come into play? Simple averages will not do, but LS-means are
just the ticket. Central to postfitting analysis in SAS/STAT®
linear modeling procedures, LS-means generalize the simple average for
unbalanced data and complicated models. They play a key role both in
standard treatment comparisons and Type III tests and in newer techniques
such as sliced interaction effects and diffograms. This paper reviews the
definition of LS-means, focusing on their interpretation as predicted
population marginal means, and it illustrates their broad range of use
with numerous examples.
Weijie Cai, SAS
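A minimal sketch of the idea above (factor and covariate names are hypothetical): in an unbalanced two-way model with a covariate, the LSMEANS statement produces the predicted population marginal means and pairwise comparisons:

  proc glm data=work.study;
    class treatment center;
    model response = treatment center age;   /* unbalanced data, covariate */
    lsmeans treatment / pdiff adjust=tukey;   /* LS-means and comparisons */
  run;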
A common complaint from users working on identifying fraud and abuse in
Medicare is that teams focus on operational applications, static reports,
and high-level outliers. But, when faced with the need to constantly
evaluate changing Medicare provider and beneficiary or enrollee dynamics,
users are clamoring for more dynamic and accurate detection approaches.
Providing these organizations with a data discovery and predictive
analytics framework that leverages Hadoop and other big data approaches,
along with a clear path for teams to make fact-based decisions more
quickly, is very important in pre- and post-fraud and abuse analysis.
Organizations that pursue a reusable, services-based data discovery and
analytics framework and architecture enjoy
greater success in supporting data management, reporting, and analytics
demands. They can quickly turn models into prioritized alerts and avoid
improper or fraudulent payments. A successful framework should enable
organizations to come up with efficient fraud, waste, and abuse models to
address complex schemes; identify fraud, waste, and abuse vulnerabilities;
and shorten triage efforts using a variety of data sourced from big data
platforms like Hadoop and other relational database management systems.
This paper talks about the data management, data discovery, predictive
analytics, and social network analysis capabilities that are included in
the SAS fraud framework and how a unified approach can significantly
reduce the lifecycle of building and deploying fraud models. We hope this
paper will provide IT leaders with a clear path for resolving issues from
the simple to the incredibly complex, through a measured and scalable
approach for delivering value for fraud, waste, and abuse models by
providing deep insights to support evidence-based investigations.
Vivek Sethunatesan, Northrop Grumman Corp
When we start programming, we simply hope that the log comes out with no
errors or warnings. Yet once we have programmed for a while, especially in
the area of pharmaceutical research, we realize that having a log with
specific, useful information in it improves quality and accountability. We
discuss clearing the log, sending the log to an output file, helpful
information to put in the log, which messages are permissible, automated
log checking, adding messages regarding data changes, whether or not we
want to see source code, and a few other log-related ideas. Hopefully, the
log will become something that we keep in mind from the moment we start
programming.
Emmy Pahmer, inVentiv Health Clinical
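Two of the practices mentioned above, in a minimal sketch (paths and names are hypothetical): PROC PRINTTO redirects the log to a file, and PUT statements add traceable messages about data changes:

  proc printto log='/logs/etl_run.log' new;   /* send the log to a file */
  run;
  data work.clean;
    set work.raw;
    if missing(id) then do;
      put 'NOTE: dropping record with missing id: ' _n_=;
      delete;
    end;
  run;
  proc printto;   /* restore the default log destination */
  run;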
Today's business needs require 24/7 access to your data in order to
perform complex queries to your analytical store. In addition, you might
need to periodically update your analytical store to append new data,
delete old data, or modify some existing data. Historically, the size of
your analytical tables or the window in which the table must be updated
can cause unacceptable downtime for queries. This paper describes how you
can use new SAS® Scalable Performance Data Server 5.1
cluster table features to simulate transaction isolation in order to
modify large sections of your cluster table. These features are optimized
for extremely fast operation and can be done without affecting any
on-going queries. This provides both the continuous query access and
periodic update requirements for your analytical store that your business
model requires.
Guy Simpson, SAS
Email is an important marketing channel for digital marketers. We can stay
connected with our subscribers and attract them with relevant content as
long as they are still subscribed to our email communication. In this
session, we discuss why it's important to manage opt-out risk, how we
predicted opt-out risk, and how we proactively manage opt-out risk using
the models we developed.
Jia Lei (Carol) Li, Gilt Groupe
Data governance combines the disciplines of data quality, data management,
data policy management, business process management, and risk management
into a methodology that ensures important data assets are formally managed
throughout an enterprise. SAS® has developed a cohesive
suite of technologies that can be used to implement efficient and
effective data governance initiatives, thereby improving an enterprise s
overall data management efficiency. This paper discusses data governance
use cases and challenges, and provides an example of how to manage the
data governance lifecycle to ensure success.
Scott Gidley, SAS
The capabilities of SAS® have been extended by the use of
macros and custom formats. SAS macro code libraries and custom format
libraries can be stored in various locations, some of which may or may not
always be easily and efficiently accessed from other operating
environments. Code can be in various states of development ranging from
global organization-wide approved libraries to very elementary
just-getting-started code. Formalized yet flexible file structures for
storing code are needed. SAS user environments range from standalone
systems such as PC SAS or SAS on a server/mainframe to much more complex
installations using multiple platforms. Strictest attention must be paid
to (1) file location for macros and formats and (2) management of the lack
of cross-platform portability of formats. Macros are relatively easy to
run from their native locations. This paper covers methods of doing this
with emphasis on: (a) the option sasautos to define the location and the
search order for identifying macros being called, and (b) even more
importantly the little-known SAS option MAUTOLOCDISPLAY to identify the
location of the macro actually called in the SAS log. Format libraries are
more difficult to manage: a format catalog cannot be used in an operating
system other than the one in which it was created. This paper will
discuss exporting, copying, and importing format libraries to provide
cross-platform capability. A SAS macro used to identify the source of a
format being used will be presented.
Roger Muller, Data-To-Events, Inc.
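A minimal sketch of the two options discussed above, plus moving a format catalog across platforms via a control data set (the paths are hypothetical):

  /* (a) macro search path and (b) echo each macro's source location */
  options sasautos=('/sas/macros/global' '/sas/macros/project' sasautos)
          mautolocdisplay;
  /* export formats to a portable control data set ... */
  libname fmtlib '/sas/formats';
  proc format library=fmtlib cntlout=work.fmtctl;
  run;
  /* ... and rebuild the catalog on the target platform */
  proc format library=fmtlib cntlin=work.fmtctl;
  run;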
This paper describes a technique for calibrating street address match
logic to maximize the match rate without introducing excessive erroneous
matching.
Richard Cadieux, Towers Watson
Dan Bretheim, Towers Watson
One of the most common questions about logistic regression is How do I
know if my model fits the data? There are many approaches to answering
this question, but they generally fall into two categories: measures of
predictive power (like R-squared) and goodness of fit tests (like the
Pearson chi-square). This presentation looks first at R-squared measures,
arguing that the optional R-squares reported by PROC LOGISTIC might not be
optimal. Measures proposed by McFadden and Tjur appear to be more
attractive. As for goodness of fit, the popular Hosmer and Lemeshow test
is shown to have some serious problems. Several alternatives are
considered.
Paul Allison, University of Pennsylvania
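As a hedged sketch of one alternative discussed above: Tjur's R-square is the difference between the mean predicted probability for events and for non-events, which can be computed from PROC LOGISTIC output (data set and variable names are hypothetical):

  proc logistic data=work.model_data;
    model bought(event='1') = income age;
    output out=work.preds p=phat;    /* predicted probabilities */
  run;
  /* Tjur's R-square: mean(phat | y=1) minus mean(phat | y=0) */
  proc means data=work.preds mean;
    class bought;
    var phat;
  run;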
Breast cancer is the most common cancer among females globally. After
being diagnosed and treated for breast cancer, patients fear the
recurrence of breast cancer. Breast cancer recurrence (BCR) can be defined
as the return of breast cancer after primary treatment, and it can recur
within the first three to five years. BCR studies have been conducted
mostly in developed countries such as the United States, Japan, and
Canada. Thus, the primary aim of this study is to investigate the
feasibility of building a medical scorecard to assess the risk of BCR
among Malaysian women. The medical scorecard was developed using data from
454 out of 1,149 patients who were diagnosed and underwent treatment at
the Department of Surgery, Hospital Kuala Lumpur from 2006 until 2011. The
outcome variable is a binary variable with two values: 1 (recurrence) and
0 (remission). Based on the availability of data, only 13 categorical
predictors were identified and used in this study. The predictive
performance of the Breast Cancer Recurrence scorecard (BCR scorecard)
model was compared to the standard logistic regression (LR) model. Both
the BCR scorecard and LR model were developed using SAS®
Enterprise Miner™ 7.1. From this exploratory study,
although the BCR scorecard model has better predictive ability with a
lower misclassification rate (18%) compared to the logistic regression
model (23%), the sensitivity of the BCR scorecard model is still low,
possibly due to the small sample size and small number of risk factors.
Five important risk factors were identified: histological type, race,
stage, tumor size, and vascular invasion in predicting recurrence status.
Nurul Husna Jamian, Universiti Teknologi Mara
Yap Bee Wah, Universiti Teknologi Mara
Nor Aina Emran, Hospital Kuala Lumpur
SAS® has a large portfolio of Java EE applications. In
releases previous to SAS® 9.4, SAS provides support for
configuring, deploying, and running these applications in Oracle WebLogic,
IBM WebSphere, or Red Hat JBoss. Beginning with SAS® 9.4,
SAS has updated the middle-tier architecture to deliver and run these web
applications exclusively in the SAS® Web Application Server
(a specialized, extended configuration of Pivotal tc Server), rather than
in other third-party web application servers. This paper discusses the
motivation, technology selections, and architecture on which this change
is based. It also describes the advantages that the new approach presents
to customers, including increased automation of installation and
configuration tasks, and improved system administration.
Zhiyong Li, SAS
Alec Fernandez, SAS
In applied statistical practice, incomplete measurement sequences are the
rule rather than the exception. Fortunately, in a large variety of
settings, the stochastic mechanism governing the incompleteness can be
ignored without hampering inferences about the measurement process. While
ignorability only requires the relatively general missing at random
assumption for likelihood and Bayesian inferences, this result cannot be
invoked when non-likelihood methods are used. We will first sketch the
framework used for contemporary missing-data analysis. Apart from
revisiting some of the simpler but problematic methods, attention will be
paid to direct likelihood and multiple imputation. Because popular
non-likelihood-based methods do not enjoy the ignorability property in the
same circumstances as likelihood and Bayesian inferences, weighted
versions have been proposed. This holds true in particular for generalized
estimating equations (GEE). Even so-called doubly-robust versions have
been derived. Apart from GEE, also pseudo-likelihood based strategies can
be adapted appropriately. We describe a suite of corrections to the
standard form of pseudo-likelihood, to ensure its validity under
missingness at random. Our corrections follow both single and double
robustness ideas, and are relatively simple to apply.
Geert Molenberghs, Universiteit Hasselt & KU Leuven
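As a brief sketch of the multiple imputation route mentioned above (variables are hypothetical): impute with PROC MI, fit the analysis model per imputation, then combine the results with PROC MIANALYZE:

  proc mi data=work.trial out=work.mi_out nimpute=20 seed=2014;
    var y x1 x2;
  run;
  proc reg data=work.mi_out outest=work.ests covout;
    model y = x1 x2;
    by _imputation_;
  run;
  proc mianalyze data=work.ests;
    modeleffects intercept x1 x2;
  run;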
Mobile devices are taking over conventional ways of sharing and presenting
information in today s businesses and working environments. Accessibility
to this information is a key factor for companies and institutions in
order to reach wider audiences more efficiently. SAS®
software provides a powerful set of tools that allows developers to
fulfill the increasing demand in mobile reporting without needing to
upgrade to the latest version of the platform. Here at University of
Central Florida (UCF), we were able to create reports targeting our iPad
consumers at our executive level by using the SAS® 9.2
Enterprise Business Intelligence environment, specifically SAS®
Web Report Studio 4.3. These reports provide them with the relevant data
for their decision-making process. At UCF, the goal is to provide
executive consumers with reports that fit on one screen in order to avoid
the need for scrolling, and that are easily exportable to PDF. This
responds to their demand to accommodate their increasing use of portable
technology to share sensitive data in a timely
manner. The technical challenge is to provide specific data to those
executive users requesting access through their iPad devices.
Compatibility issues arise but are successfully bypassed. We are able to
provide reports that fit on one screen and that can be opened as a PDF if
needed. These enhanced capabilities were requested and well received by
our users. This paper presents techniques we use in order to create mobile
reports.
Carlos Piemonti, University of Central Florida
Bootstrapped Decision Tree is a variable selection method used to identify
and eliminate uninformative variables from a large number of initial
candidate variables. Candidates for subsequent modeling are identified by
selecting variables consistently appearing at the top of decision trees
created using a random sample of all possible modeling variables. The
technique is best used to reduce hundreds of potential fields to a short
list of 30-50 fields to be used in developing a model. This method for
variable selection has recently become available in JMP®
under the name BootstrapForest; this paper presents an implementation in
Base SAS® 9. The method accepts but does not require a
specific outcome to be modeled, and it therefore works for nearly any type
of model, including segmentation, MCMC, and multiple discrete choice, in
addition to standard logistic regression. Keywords: Bootstrapped Decision
Tree, Variable Selection
David Corliss, Magnify Analytic Solutions
For most practitioners, ordinary least square (OLS) regression with a
Gaussian distributional assumption might be the top choice for modeling
fractional outcomes in many business problems. However, it is conceptually
flawed to assume a Gaussian distribution for a response variable in the
[0, 1] interval. In this paper, several modeling methodologies for
fractional outcomes with their implementations in SAS® are
discussed through a data analysis exercise in predicting corporate
financial leverage ratios. Various empirical and conceptual methods for
the model evaluation and comparison are also discussed throughout the
example. This paper provides a comprehensive survey about how to model
fractional outcomes.
WenSui Liu, Fifth Third Bancorp
Jason Xin, SAS
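One methodology that fits this setting, sketched under the assumption of a strictly (0, 1) response (data set and variable names are hypothetical), is beta regression via PROC GLIMMIX:

  proc glimmix data=work.firms;
    model leverage = size profitability / dist=beta link=logit solution;
  run;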
Predicting loss given default (LGD) is playing an increasingly crucial
role in quantitative credit risk modeling. In this paper, we propose to
apply mixed effects models, as well as other widely used LGD models, to
predict corporate bond LGD. The empirical results show that mixed
effects models are able to explain the unobservable heterogeneity and to
make better predictions compared with linear regression and fractional
response regression. All the statistical models are fitted in
SAS/STAT® (SAS® 9.2), specifically using PROC
REG and PROC NLMIXED, and the model evaluation metrics are calculated in
PROC IML. This paper gives a detailed description on how to use PROC
NLMIXED to build and estimate generalized linear models and mixed effects
models.
Xiao Yao, The University of Edinburgh
Jonathan Crook, The University of Edinburgh
Galina Andreeva, The University of Edinburgh
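A hedged sketch of a mixed effects LGD model in PROC NLMIXED (variable names are hypothetical): a linear predictor plus a normally distributed issuer-level random intercept:

  proc nlmixed data=work.bonds;
    parms b0=0 b1=0 s2e=1 s2u=1;
    mu = b0 + b1*seniority + u;        /* issuer random intercept u */
    model lgd ~ normal(mu, s2e);
    random u ~ normal(0, s2u) subject=issuer_id;
  run;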
While survey researchers make great attempts to standardize their
questionnaires including the usage of ratings scales in order to collect
unbiased data, respondents are still prone to introducing their own
interpretation and bias to their responses. This bias can potentially
affect the understanding of commonly investigated drivers of customer
satisfaction and limit the quality of the recommendations made to
management. One such problem is scale use heterogeneity, in which
respondents do not employ a panoramic view of the entire scale range as
provided, but instead focus on parts of the scale in giving their
responses. Studies have found that bias arising from this phenomenon was
especially prevalent in multinational research, e.g., respondents of some
cultures being inclined to use only the neutral points of the scale.
Moreover, personal variability in response tendencies further complicates
the issue for researchers. This paper describes an implementation that
uses a Bayesian hierarchical model to capture the distribution of
heterogeneity while incorporating the information present in the data.
More specifically, SAS® PROC MCMC is used to carry out a
comprehensive modeling strategy of ratings data that account for
individual-level scale usage. Key takeaways include an assessment of
differences between key driver analyses that ignore this phenomenon versus
the one that results from our implementation. Managerial implications are
also emphasized in light of the prevalent use of more simplistic
approaches.
Jorge Alejandro, Market Probe
Sharon Kim, Market Probe
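A minimal sketch of the hierarchical idea in PROC MCMC (priors, names, and hyperparameters are illustrative, not the authors' model): each respondent gets a random location shift, capturing individual scale usage:

  proc mcmc data=work.ratings nmc=20000 seed=27513 outpost=work.post;
    parms mu 5 tau2 1 s2 1;
    prior mu ~ normal(0, var=100);
    prior tau2 s2 ~ igamma(2, scale=1);
    random b ~ normal(0, var=tau2) subject=resp_id;  /* respondent shift */
    model rating ~ normal(mu + b, var=s2);
  run;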
For over three decades, SAS® has provided capabilities for
beating your data into submission. In June of 2000, SAS acquired a company
called DataFlux in order to add data quality capabilities to its
portfolio. Recently, SAS folded DataFlux into the mother ship. With
SAS® 9.4, SAS® Enterprise Data Integration
Server and baby brother SAS® Data Integration Server were
upgraded into a series of new bundles that still include the former
DataFlux products, but those products have grown. These new bundles
include data management, data governance, data quality, and master data
management, and come in advanced and standard packaging. This paper
explores these offerings and helps you understand what this means to both
new and existing customers of SAS® Data Management and
DataFlux products. We break down the marketing jargon and give you
real-world scenarios of what customers are using today (prior to SAS 9.4)
and walk you through what that might look like in the SAS 9.4 world. Each
scenario includes the software that is required, descriptions of what each
of the components do (features and functions), as well as the likely
architectures that you might want to consider. Finally, for existing SAS
Enterprise Data Integration Server and SAS® Data Integration
Server customers, we discuss implications for migrating to SAS Data
Management and detail some of the functionality that may be new to your
organization.
Greg Nelson, ThotWave Technologies
Lisa Dodson, SAS
Over the past decade, sports analytics has seen an explosion in research
and model development to calculate wins, reaching cult popularity with the
release of the film 'Moneyball.' The purpose of this paper is to explore
the methodology of solving a real-life Moneyball problem in basketball. An
optimal basketball lineup will be selected in an attempt to maximize the
total points per game while maximizing court coverage. We will briefly
review some of the literature that has explored this type of problem,
traditionally called the maximum coverage problem (MCP) in operations
research. An exploratory data analysis will be performed, including
visualizations and clustering in order to prep the modeling dataset for
optimization. Finally, SAS® will be used to formulate an MCP
problem, and additional constraints will be added to run different
business scenarios.
Sabah Sadiq, Deloitte Consulting
Jing Zhao, Deloitte Consulting
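A hedged sketch of the lineup selection formulated as an integer program in PROC OPTMODEL (the data set, parameters, and five-player constraint are illustrative; the paper's full MCP formulation adds coverage constraints):

  proc optmodel;
    set <str> PLAYERS;
    num points {PLAYERS};
    read data work.roster into PLAYERS=[player] points;
    var Select {PLAYERS} binary;       /* 1 if player is in the lineup */
    max TotalPoints = sum {p in PLAYERS} points[p] * Select[p];
    con LineupSize: sum {p in PLAYERS} Select[p] = 5;
    solve with milp;
    print Select;
  quit;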
More organizations are understanding the importance of geo-tagged data and
the need for tools that can successfully combine location data with
business metrics to provide intelligent outputs that are beyond a simple
map. SAS® Visual Analytics provides a robust and powerful
platform for achieving location intelligence performed with a combination
of SAS® Analytics and GIS mapping technologies such as that
offered by Esri. This paper describes the essentials for achieving
location intelligence and demonstrates with industry examples how SAS
Visual Analytics makes it possible.
Falko Schulz, SAS
Anand Chitale, SAS
This paper considers the %MRE macro for estimating multivariate ratio
estimates. Also, we use PROC REG to estimate multivariate regression
estimates and to show that regression estimates are superior to the ratio
estimates.
Alan Silva, Universidade de Brasilia
Two examples of Vector Autoregressive Moving Average modeling with
exogenous variables are given in this presentation. Data is from the real
world. One example is about a two-dimensional time series for wages and
prices in Denmark that spans more than a hundred years. The other is about
the market for agricultural products, especially eggs! These examples give
a general overview of the many possibilities offered by PROC VARMAX, such
as handling of seasonality, causality testing and Bayesian modeling, and
so on.
Anders Milhøj, University of Copenhagen
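A minimal PROC VARMAX sketch for a two-dimensional series like the wage-price example above (data set and variable names are hypothetical):

  proc varmax data=work.denmark;
    id year interval=year;
    model wages prices / p=2;                /* VAR(2) for the pair */
    causal group1=(wages) group2=(prices);   /* Granger causality test */
  run;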
This presentation features implementation leads from SAS®
Professional Services and Health Canada's Non-Insured Health benefits
(NIHB) program, on a joint implementation of SAS® Fraud
Framework for Health Care. The presentation walks through the fast-paced
implementation of NIHB's Pharmacy Surveillance System that guards Canadian
taxpayers from undue costs, and protects the safety of NIHB clients. This
presentation is a blend of project management and technical material, and
presents both the client (NIHB) and consultant (SAS) perspectives
throughout the story. The presentation converges onto several core
principles needed to successfully deliver analytical solutions.
Jeffrey Menzies, Health Canada
Ian Ghent, SAS
Sometimes the notes, warnings, and errors in the SAS® Log
window can be cryptic, at best. Hours of programming and deciphering the
log can make a person feel a little down and somewhat nutty. What if there
was a way to make the SAS log informative and amusing at the same time?
Having the option to change how the SAS log communicates might actually
keep a user from throwing his or her computer out the window. Our aim is
to help thousands of SAS programmers understand how the messages in the
log can be interpreted in an entertaining way.
Ethan Miller, Ethanomics LLC
Rebecca Ottesen, California Polytechnic State University
Greater data availability leads to potentially greater depth and subtlety
of modeling, but building a model and gaining actionable business insight
from analytic data is fundamentally a fixed process (there are no
shortcuts). There are different impacts, however. Big Data analytic processing
taxes the process in one way, while analytic exploration taxes it in
another.
Michael Ralston, HP - Vertica
SAS/OR® software for operations research includes
mathematical optimization, discrete-event simulation, and project and
resource scheduling capabilities. This paper surveys a number of its new
features that better equip you to address decision-making challenges such
as planning, resource management, and asset allocation. Optimization
performance improvements help you solve larger, more detailed problems
more quickly. Improvements encompass linear, mixed integer linear, and
nonlinear optimization, and include multithreading of the mixed integer
linear solver and major improvements in the performance and functionality
of the decomposition algorithm for linear and mixed integer linear
optimization. The OPTMODEL procedure for optimization modeling adds direct
access to the same set of efficient network optimization algorithms
available via the OPTNET procedure in SAS/OR, enabling you to embed
network optimization as a component of larger solution processes. Other
new features enable you to execute multiple optimizations in parallel and
use the FCMP procedure to define functions. The OPTLSO procedure for
global and local search optimization adds the ability to work with
multiple objective functions and produce a set of Pareto-optimal
solutions. This approach enables you to manage the trade-offs that arise
between competing objectives and adds to the range of optimization
problems that you can solve using PROC OPTLSO. Another new feature is
support for the READ_ARRAY function in PROC FCMP, with which you can much
more easily input array-structured data to be used in function
definitions. Finally, SAS® Simulation Studio for
discrete-event simulation enhances its graphical interface to better
support customization and increase ease of use.
Ed Hughes, SAS
Rob Pratt, SAS
Most programmers are familiar with the directive "Know your data." But not
everyone knows about all the data and metadata that a SAS®
data set holds or understands what to do with this information. This
presentation talks about the majority of these attributes, how to obtain
them, why they are important, and what you can do with them. For example,
data sets that have been around for a while might have an inordinate
number of deleted observations that you are carrying around unnecessarily.
Or you might be able to quickly check to determine whether the data set is
indexed and, if so, by what variables, in order to increase your program's
performance. Also, engine-dependent data such as owner name and file size
is found in PROC CONTENTS output, which is useful for understanding and
managing your data. You can also use ODS output in order to use the values
of these many attributes programmatically. This presentation shows you
how.
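Two hedged sketches of these ideas, with hypothetical library and data
set names: capturing PROC CONTENTS tables through ODS, and querying the
same metadata from a dictionary table:

   ods output attributes=work.attrs enginehost=work.engine;
   proc contents data=mylib.mydata;
   run;

   proc sql;
      select memname, nobs, nvar, crdate, modate
      from dictionary.tables
      where libname='MYLIB' and memname='MYDATA';
   quit;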
Diane Olson, SAS
In business environments, a common obstacle to effective data-informed
decision making occurs when key stakeholders are reluctant to embrace
statistically derived predicted values or forecasts. If concerns regarding
model inputs, underlying assumptions, and limitations are not addressed,
decision makers might choose to trust their gut and reject the insight
offered by a statistical model. This presentation explores methods for
converting potential critics into partners by proactively involving them
in the modeling process and by incorporating simple inputs derived from
expert judgment, focus groups, market research, or other directional
qualitative sources. Techniques include biasing historical data, what-if
scenario testing, and Monte Carlo simulations.
John Parker, GSK
It is often the case that parameters in a predictive model should be
restricted to an interval that is either reasonable or necessary given the
model's application. A simple and classic example of such a restriction is
the regression model that requires all parameters to be positive. In
the case of multiple least squares (MLS) regression, the resulting model
is therefore strictly additive and, in certain applications, not only
appropriate but also intuitive. This special case of an MLS model is
commonly referred to as a nonnegative least squares regression. While Base
SAS® contains a multitude of ways to perform a multiple
least squares regression (PROC REG and PROC GLM, to name two), there
exists no native SAS® procedure to conduct a nonnegative
least squares regression. The author offers a concise way to conduct the
nonnegative least squares analysis by using PROC NLIN (the nonlinear
regression procedure), which lets the user place restrictions on
parameter estimates. By fashioning a
linear model in the framework of a nonlinear procedure, the end result can
be achieved. As an additional corollary, the author will show how to
calculate the _RSQUARE_ statistic for the resulting model, which has been
left out of the PROC NLIN output for the reason that it is invalid in most
cases (though not ours).
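A minimal sketch of the technique (hypothetical data set and variable
names; the R-square calculation is covered in the paper itself):

   proc nlin data=scores;
      parms b0=0 b1=0.1 b2=0.1;        /* starting values */
      bounds b1 >= 0, b2 >= 0;         /* nonnegativity restrictions */
      model y = b0 + b1*x1 + b2*x2;    /* linear model, nonlinear proc */
   run;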
Matthew Duchnowski, Educational Testing Service (ETS)
This poster shows the audience step-by-step how to connect to a database
without registering the connection in either the Windows ODBC
Administrator tool or in the Windows Registry database. This poster also
shows how the connection can be more flexible and better managed by
building it into a SAS® macro.
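A minimal sketch of the idea; the driver string and macro parameters
are hypothetical and depend on your database and ODBC driver:

   %macro db_connect(server=, database=);
      /* NOPROMPT supplies the full connection string, so no DSN
         registration is needed in Windows */
      libname mydb odbc noprompt=
         "Driver={SQL Server};Server=&server;Database=&database;Trusted_Connection=yes;";
   %mend db_connect;

   %db_connect(server=prodsrv01, database=sales)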
Jesper Michelsen, Nykredit
When creating an OLAP cube, you have the option of specifying a
drill-through table, also known as a Show Details table. This quick tip
discusses the implications of using your detail table as your
drill-through table and explores some viable alternatives.
Michelle Buchecker, SAS
When viewing and working with SAS® data sets, especially wide
ones, it's often instinctive to rearrange the variables (columns) into some
intuitive order. The RETAIN statement is one of the most commonly cited
methods used for ordering variables. Though RETAIN can perform this task,
its use as an ordering clause can cause a host of easily missed problems
due to its intended function of retaining values across DATA step
iterations. This risk is especially great for the more novice SAS
programmer. Instead, two equally effective and less risky ways to order
data set variables are recommended, namely, the FORMAT and SQL SELECT
statements.
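The two recommended alternatives in miniature (variable and data set
names are hypothetical):

   /* FORMAT: variables referenced before SET come first */
   data want;
      format id visit_date amount;
      set have;
   run;

   /* SQL: column order follows the SELECT list */
   proc sql;
      create table want2 as
      select id, visit_date, amount
      from have;
   quit;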
Andrew Clapson, Statistics Canada
Part of being a good analyst and statistician is being able to understand
the output of a statistical test in SAS®. P-values are
ubiquitous in statistical output as well as medical literature and can be
the deciding factor in whether a paper gets published. This shows a
somewhat dictatorial side of them. But do we really know what they mean?
In a democratic process, people vote for another person to represent them,
their values, and their opinions. In this sense, the sample of research
subjects, their characteristics, and their experience, are combined and
represented to a certain degree by the p-value. This paper discusses
misconceptions about and misinterpretations of the p-value, as well as how
things can go awry in calculating a p-value. Alternatives to p-values are
described, with advantages and disadvantages of each. Finally, some
thoughts about p-value interpretation are given. To disarm the dictator,
we need to understand what the democratic p-value can tell us about what
it represents...and what it doesn't. This presentation is aimed at
beginning to intermediate SAS statisticians and analysts working with
SAS/STAT®.
Brenda Beaty, University of Colorado
Michelle Torok, University of Colorado
Have you ever asked, "Why doesn't my PDF output look just like my HTML
output?" This paper explains the power and differences of each
destination. You'll learn how each destination works and understand why
the output looks the way it does. Learn tips and tricks for how to
modify your SAS® code to make each destination look more like the other.
The tips span from beginner to advanced in all areas of reporting. Each
destination is like a superhero, helping you transform your reports to
meet all your needs. Learn how to use each ODS destination to the fullest
extent of its powers.
Scott Huntley, SAS
Cynthia Zender, SAS
PD_Calibrate is a macro that standardizes the calibration of our
predictive credit-scoring models at Nykredit. The macro is activated with
an input data set, variables, anchor point, specification of method,
number of buckets, kink-value, and so on. The output consists of graphs,
HTML, and two data sets containing key values for the model being
calibrated and values for the use of graphics.
Keld Asnæs, Nykredit a/s
Jesper Michelsen, Nykredit
ODS is a powerful tool for generating HTML-based reports. Quite often,
however, there are exacting requirements for report content, layout, and
placement that can be met with HTML (and especially HTML5) but cannot be
met with ODS alone. This presentation shows several examples that use PROC
STREAM and SAS® Server Pages in a batch (for example,
scheduled tasks, using SAS® Display Manager, using SAS®
Enterprise Guide®) to generate such custom reports. And yes,
despite the name SAS Server Pages, this technology, including the use of
jQuery widgets, does apply to batch environments. This paper describes and
shows several examples that are similar to those presented in the SAS®
Press book SAS Server Pages: Generating Dynamic Content
(http://support.sas.com/publishing/authors/extras/64993b.html) and on the
author's blog Jurassic SAS in the BI/EBI World
(http://hcsbi.blogspot.com/): creating a custom calendar; a sample
mail-merge application; generating a custom Microsoft Excel-based report;
and generating an expanding drill-down table.
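A minimal, self-contained PROC STREAM sketch (far simpler than the
paper's examples); note how macro references resolve inside otherwise
free-form HTML:

   filename rpt temp;
   %let rpttitle = Monthly Summary;
   proc stream outfile=rpt;
   BEGIN
   <html>
   <head><title>&rpttitle</title></head>
   <body><h1>&rpttitle generated &sysdate9</h1></body>
   </html>
   ;;;;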
Don Henderson, Henderson Consulting Services
Quite often when building web applications that use either the SAS®
Stored Process Server or the SAS/IntrNet® Applications
Dispatcher, it is necessary to create a custom user interface to prompt
for the needed parameters. For example, generating a custom user interface
can be accomplished by chaining stored processes together. The first
stored process generates the user interface where the user selects the
desired options and uses PROC STREAM to process and input SAS®
Server Pages to display the user interface. The second (or later) stored
process in the chain generates the desired output. This paper describes
and shows several examples similar to those presented in the SAS®
Press book SAS Server Pages: Generating Dynamic Content
(http://support.sas.com/publishing/authors/extras/64993b.html) and on the
author's blog Jurassic SAS in the BI/EBI World
(http://hcsbi.blogspot.com/).
Don Henderson, Henderson Consulting Services
PROC TABULATE is a powerful tool for creating tabular summary reports. Its
advantages, over PROC REPORT, are that it requires less code, allows for
more convenient table construction, and uses syntax that makes it easier
to modify a table's structure. However, its inability to compute the sum,
difference, product, and ratio of column sums has hindered its use in many
circumstances. This paper illustrates and discusses some creative
approaches and methods for overcoming these limitations, enabling users to
produce needed reports and still enjoy the simplicity and convenience of
PROC TABULATE. These methods and skills can have prominent applications in
a variety of business intelligence and analytics fields.
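For orientation, the kind of compact summary table that PROC TABULATE
produces before any of the paper's workarounds are applied (data set
and variables are hypothetical):

   proc tabulate data=sales;
      class region product;
      var revenue;
      table region all, product*revenue*(sum mean);
   run;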
Justin Jia, Canadian Imperial Bank of Commerce (CIBC)
Amanda Lin, Bell Canada
A time-consuming part of statistical analysis is building an analytic data
set for statistical procedures. Whether it is recoding input values,
transforming variables, or combining data from multiple data sources, the
work to create an analytic data set can take time. The DS2 programming
language in SAS® 9.4 simplifies and speeds data preparation
with user-defined methods, storing methods and attributes in shareable
packages, and threaded execution on multi-core SMP and MPP machines. Come
see how DS2 makes your job easier.
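A small DS2 sketch (hypothetical table and variable names) showing a
user-defined method applied during data preparation:

   proc ds2;
      data work.prepared / overwrite=yes;
         method cap(double x) returns double;
            /* user-defined method: winsorize extreme values */
            return min(max(x, 0), 1000);
         end;
         method run();
            set work.raw;
            amount = cap(amount);
         end;
      enddata;
      run;
   quit;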
Jason Secosky, SAS
Robert Ray, SAS
Greg Otto, Teradata Corporation
The linear logistic test model (LLTM) that incorporates the cognitive task
characteristics into the Rasch model has been widely used for various
purposes in educational contexts. However, the LLTM model assumes that the
variance of item difficulties is completely accounted for by cognitive
attributes. To overcome the disadvantages of the LLTM, Janssen and
colleagues (2004) proposed the crossed random-effects (CRE) LLTM by adding
the error term on item difficulty. This study examines the accuracy and
precision of the CRE-LLTM in terms of parameter estimation for cognitive
attributes. The effect of different factors (for example, sample size,
population distributions, sparse or dense matrices, and test length), is
examined. PROC GLIMMIX was used to do the analysis and SAS/IML®
software was used to generate data.
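A hedged sketch of a crossed random-effects specification in PROC
GLIMMIX, using long-format data and hypothetical names (the study's
actual model may differ):

   proc glimmix data=responses method=laplace;
      class person item;
      model correct(event='1') = attr1 attr2 attr3
            / dist=binary link=logit solution;
      random intercept / subject=person;  /* person ability */
      random intercept / subject=item;    /* residual item difficulty */
   run;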
Chunhua Cao, University of South Florida
Yan Wang, University of South Florida
Yi-hsin Chen, University of South Florida
Isaac Li, University of South Florida
Graphs in oncology studies are essential for getting more insight about
the clinical data. This presentation demonstrates how ODS Graphics can be
effectively and easily used to create graphs used in oncology studies. We
discuss some examples and illustrate how to create plots like drug
concentration versus time plots, waterfall charts, comparative survival
plots, and other graphs using Graph Template Language and ODS Graphics
procedures. These can be easily incorporated into a clinical report.
Debpriya Sarkar, SAS
The effectiveness of visual interpretation of the differences between
pairs of LS-means in a generalized linear model includes the graph's
ability to display four inferential and two perceptual tasks. Among the
types of graphs which display some or all of these tasks are the forest
plot, the mean-mean scatter plot (diffogram), and closely related to it,
the mean-mean multiple comparison (MMC) plot. These graphs provide
essential visual perspectives for interpretation of the differences among
pairs of LS-means from a generalized linear model (GLM). The diffogram is
a graphical option now available through ODS statistical graphics with
linear model procedures such as GLIMMIX. Through combining ODS output
files of the LS-means and their differences, the SGPLOT procedure can
efficiently produce forest and MMC plots.
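A hedged sketch (hypothetical data set and effect names) that requests
the diffogram directly and captures the LS-means tables for custom
SGPLOT graphics:

   ods output lsmeans=lsm diffs=lsmdiffs;
   proc glimmix data=trial plots=diffplot;  /* diffogram request */
      class treatment;
      model response = treatment;
      lsmeans treatment / pdiff cl;
   run;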
Robin High, University of Nebraska Medical Center
Power analysis helps you plan a study that has a controlled probability of
detecting a meaningful effect, giving you conclusive results with maximum
efficiency. SAS/STAT® provides two procedures for performing
sample size and power computations: the POWER procedure provides analyses
for a wide variety of different statistical tests, and the GLMPOWER
procedure focuses on power analysis for general linear models. In SAS/STAT
13.1, the GLMPOWER procedure has been updated to enable power analysis for
multivariate linear models and repeated measures studies. Much of the
syntax is similar to the syntax of the GLM procedure, including both the
new MANOVA and REPEATED statements and the existing MODEL and CONTRAST
statements. In addition, PROC GLMPOWER offers flexible yet parsimonious
options for specifying the covariance. One such option is the
two-parameter linear exponent autoregressive (LEAR) correlation structure,
which includes other common structures such as AR(1), compound symmetry,
and first-order moving average as special cases. This paper reviews the
new repeated measures features of PROC GLMPOWER, demonstrates their use in
several examples, and discusses the pros and cons of the MANOVA and
repeated measures approaches.
John Castelloe, SAS
The SQL procedure contains many powerful and elegant language features for
intermediate and advanced SQL users. This presentation discusses topics
that will help SAS® users unlock the many powerful features,
options, and other gems found in the SQL universe. Topics include CASE
logic; a sampling of summary (statistical) functions; dictionary tables;
PROC SQL and the SAS macro language interface; joins and join algorithms;
PROC SQL statement options _METHOD, MAGIC=101, MAGIC=102, and MAGIC=103;
and key performance (optimization) issues.
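Two of those gems in miniature: a dictionary-table query, plus the
_METHOD option, which writes the chosen query plan and join algorithm
to the log:

   proc sql _method;
      create table col_info as
      select libname, memname, name, type, length
      from dictionary.columns
      where libname = 'WORK';
   quit;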
Kirk Paul Lafler, Software Intelligence Corporation
Paper 1506-2014:
Practical Considerations in the Development of a Suite of Predictive
Models for Population Health Management
The use of predictive models in healthcare has steadily increased over the
decades. Statistical models now are assumed to be a necessary component in
population health management. This session will review practical
considerations in the choice of models to develop, criteria for assessing
the utility of the models for production, and challenges with
incorporating the models into business process flows. Specific examples of
models will be provided based upon work by the Health Economics team at
Blue Cross Blue Shield of North Carolina.
Daryl Wansink, Blue Cross Blue Shield of North Carolina
Over the years, there has been a growing concern about consumption of
tobacco among youth. But no concrete studies have been done to find what
exactly leads children to start consuming tobacco. This study is an
attempt to identify those potential reasons. Through our analysis, we
have also tried to build a model to predict whether a child
would smoke next year or not. This study is based on the 2011 National
Youth Tobacco Survey data of 18,867 observations. In order to prepare data
for insightful analysis, imputation operations were performed on the data
using tree-based imputation methods. From a pool of 197 variables, 48 key
variables were selected using variable selection methods, partial least
squares, and decision tree models. Logistic Regression and Decision Tree
models were built to predict whether a child would smoke in the next year
or not. Comparing the models using misclassification rate as the selection
criterion, we found that the Stepwise Logistic Regression Model
outperformed other models with a Validation Misclassification of 0.028497,
47.19% Sensitivity and 95.80% Specificity. Factors such as company of
friends, cigarette brand ads, accessibility to the tobacco products, and
passive smoking turned out to be the most important predictors in
determining a child smoker. From this study, we could outline some
important findings, such as: the odds of a child taking up smoking are
2.17 times higher when his or her close friends also smoke.
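A hedged sketch of the kind of stepwise model described; the variable
names are hypothetical stand-ins, not the authors' actual predictors:

   proc logistic data=nyts;
      class friends_smoke(ref='0') passive_smoke(ref='0') / param=ref;
      model smoke_next(event='1') = friends_smoke brand_ads
            access_tobacco passive_smoke
            / selection=stepwise slentry=0.05 slstay=0.05;
   run;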
Jin Ho Jung, Oklahoma State University
Gaurav Pathak, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
An increase in sea levels is a potential problem that is affecting the
human race and marine ecosystem. Many models are being developed to find
out the factors that are responsible for it. In this research, the
Memory-Based Reasoning model looks more effective than most other models.
This is because this model takes the previous solutions and predicts the
solutions for forthcoming cases. The data was collected from NASA. The
data contains 1,072 observations and 10 variables such as emissions of
carbon dioxide, temperature, and other contributing factors like electric
power consumption, total number of industries established, and so on.
Results of Memory-Based Reasoning models like RD tree, scan tree, neural
networks, decision tree, and logistic regression are compared. Fit
statistics, such as misclassification rate and average squared error are
used to evaluate the model performance. This analysis is used to predict
the rise in sea levels in the near future and to take the necessary
actions to protect the environment from global warming and natural
disasters.
Prasanna K S Sailaja Bhamidi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Introduction to the PAR Framework, a non-profit, member-driven
collaborative for student success providing affordable predictive
analytics, innovative benchmark reports, and intervention assessment tools
to colleges and universities nationwide.
Heidi Hiemstra, PAR Framework
The steady expansion of electronic health records (EHR) over the past
decade has increased the use of observational healthcare data for
analysis. One of the challenges with EHR data is to combine information
from different domains (diagnosis, procedures, drugs, adverse events,
labs, quality of life scores, and so on) onto a single timeline to get a
longitudinal view of the patient. This enables the physician or researcher
to visualize a patient's health profile, thereby revealing anomalies,
trends, and responses graphically, thus empowering them to treat more
effectively. This paper attempts to provide a composite view of a patient
by using SAS® Graph Template Language to create a profile
graph using the following data elements: key event dates, drugs, adverse
events, Quality of Life (QoL) scores. For visualization, the GTL graph
uses X and X2 axes for dates, vertical reference lines to represent key
dates (for example, when the disease is first diagnosed), horizontal bar
plot for duration of drugs taken and adverse events reported, and a series
plot at the bottom to show the QoL score.
Radhikha Myneni, SAS
Eric Brinsfield, SAS
Sparse data sets are common in applications of text and data mining,
social network analysis, and recommendation systems. In SAS®
software, sparse data sets are usually stored in the coordinate list (COO)
transactional format. Two major drawbacks are associated with this sparse
data representation: First, most SAS procedures are designed to handle
dense data and cannot consume data that are stored transactionally. In
that case, the options for analysis are significantly limited. Second, a
sparse data set in transactional format is hard to store and process in
distributed systems. Most techniques require that all transactions for a
particular object be kept together; this assumption is violated when the
transactions of that object are distributed to different nodes of the
grid. This paper presents some different ideas about how to package all
transactions of an object into a single row. Approaches include storing
the sparse matrix densely, doing variable selection, doing variable
extraction, and compressing the transactions into a few text variables by
using Base64 encoding. These simple but effective techniques enable you to
store and process your sparse data in better ways. This paper demonstrates
how to use SAS® Text Miner procedures to process sparse data
sets and generate output data sets that are easy to store and can be
readily processed by traditional SAS modeling procedures. The output of
the system can be safely stored and distributed in any grid environment.
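One of the packing ideas in a hedged sketch, assuming hypothetical COO
columns objid, varid, and value; the widths are chosen so that the
Base64 output (4/3 of the input length) fits:

   proc sort data=coo; by objid; run;

   data packed(keep=objid blob);
      length raw $12000 blob $16000;
      retain raw;
      set coo;
      by objid;
      if first.objid then raw = '';
      raw = catx(' ', raw, catx(':', varid, value));
      if last.objid then do;
         blob = put(strip(raw), $base64x16000.);  /* one row per object */
         output;
      end;
   run;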
Zheng Zhao, SAS
Russell Albright, SAS
James Cox, SAS
In this session, you learn how Kaiser Permanente has taken a centralized
production support approach to using SAS® Enterprise
Guide® 4.3 in the healthcare industry. Kaiser Permanente
Northwest (KPNW) has designed standardized processes and procedures that
have allowed KPNW to streamline the support of production content, which
enabled KPNW analytical resources to focus more on new content development
rather than on maintenance and support of steady state programs and
processes. We started with over 200 individual SAS®
processes across four different SAS platforms, SAS Enterprise Guide,
Mainframe SAS®, PC SAS® and SAS®
Data Integration Studio, in order to standardize our development approach
on SAS Enterprise Guide and build efficient and scalable processes within
our department and across the region. We walk through the need for change,
how the team was set up, provide an overview of the UNIX SAS platform,
walk through the standard production requirements (developer pack), and
review lessons learned.
Ryan Henderson, Kaiser Permanente
Karl Petith, Kaiser Permanente
In a good clinical study, statisticians and various stakeholders are
interested in assessing and isolating the effect of non-study drugs. One
common practice in clinical trials is that clinical investigators follow
the protocol to taper certain concomitant medications in an attempt to
prevent or resolve adverse reactions and/or to minimize the number of
subject withdrawals due to lack of efficacy or adverse event. To assess
the impact of those tapering medicines during study is of high interest to
clinical scientists and the study statistician. This paper presents the
challenges and caveats of assessing the impact of tapering a certain type
of concomitant medications using SAS® 9.3 based on a
hypothetical case. The paper also presents the advantages of visual graphs
in facilitating communications between clinical scientists and the study
statistician.
Iuliana Barbalau, Santen Inc.
Chen Shi, Santen Inc
Yang Yang, Santen Inc.
Many SAS® procedures use classification variables when they
are processing the data. These variables control how the procedure forms
groupings, summarizations, and analysis elements. For statistics
procedures, they are often used in the formation of the statistical model
that is being analyzed. Classification variables can be explicitly
specified with a CLASS statement, or they can be specified implicitly from
their usage in the procedure. Because classification variables have such a
heavy influence on the outcome of so many procedures, it is essential that
the analyst have a good understanding of how classification variables are
applied. Certainly there are a number of options (system and procedural)
that affect how classification variables behave. While you may be aware of
some of these options, a great many are new, and some of these new options
and techniques are especially powerful. You really need to be open to
learning how to program with CLASS.
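A small illustration of explicit classification variables and two of
the options that change their behavior, using the SASHELP.CLASS sample
data:

   proc means data=sashelp.class missing order=freq mean;
      class sex age;   /* MISSING keeps missing levels;
                          ORDER=FREQ reorders the groups */
      var height;
   run;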
Art Carpenter, California Occidental Consultants
Multi-site health science-related, distributed data networks are becoming
increasingly popular, particularly at a time where big data and privacy
are often competing priorities. Distributed data networks allow
individual-level data to remain behind the firewall of the data holder,
permitting the secure execution of queries against those local data and
the return of aggregated data produced from those queries to the
requester. These networks allow the use of multiple, varied sources of
data for study purposes ranging from public health surveillance to
comparative effectiveness research, without compromising data holders'
concerns surrounding data security, patient privacy, or proprietary
interests. This paper focuses on the experiences of the Mini-Sentinel
pilot project as a case study for using SAS® to design and
build infrastructure for a successful multi-site, collaborative,
distributed data network. Mini-Sentinel is a pilot project sponsored by
the U.S. Food and Drug Administration (FDA) to create an active
surveillance system, the Sentinel System, to monitor the safety of
FDA-regulated medical products. The paper focuses on the data and
programming aspects of distributed data networks but also visits
governance and administrative issues as they relate to the maintenance of
a technical network.
Jennifer Popovic, Harvard Pilgrim Health Care Institute/Harvard Medical
School
Do you find it difficult to dress up your graphs for your reports or
presentations? SAS® 9.4 introduced new capabilities in ODS
Graphics that give you the ability to style your graphs without creating
or modifying ODS styles. The new capabilities include the following: a
new option for controlling how ODS styles are applied; graph syntax for
overriding ODS style attributes for grouped plots; the ability to define
font glyphs and images as plot markers; and enhanced attribute map
support. In this presentation, we discuss these new features in detail,
showing examples in the context of Graph Template Language and ODS
Graphics procedures.
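One of these capabilities in miniature: overriding grouped-plot style
attributes in syntax, without touching the ODS style (SASHELP sample
data):

   proc sgplot data=sashelp.class;
      styleattrs datacontrastcolors=(teal maroon)
                 datasymbols=(circlefilled trianglefilled);
      scatter x=height y=weight / group=sex;
   run;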
Dan Heath, SAS
Can you juggle? Maybe. Can you shuffle a deck of cards? Probably. Can you
do both at the same time? Welcome to the world of SAS® and
LSF! Very few SAS Administrators start out learning LSF at the same time
they learn SAS; most already know SAS, possibly starting out as a
programmer or analyst, but now have to step up to an enterprise platform
with shared resources. The biggest challenge on an enterprise platform?
How to share! How to maximize the utilization of a SAS platform, yet still
ensure everyone gets their fair share? This presentation will boil down
the 2000+ pages of LSF documentation to provide an introduction into
various LSF concepts: * Host * Clusters * Nodes * Queues *
First-Come-First-Serve * Fairshare * and various configuration settings:
UJOB_LIMIT, PJOB_LIMIT, etc. It also offers some insight on where to
configure all these settings, which are set up by the installation
process, and which can be configured by the SAS or LSF administrator.
This session is definitely
NOT for experts. It is for those about to step into an enterprise
deployment of SAS, and want to understand how the SAS server sessions they
know so well can run on a shared platform.
Andrew Howell, ANJ Solutions
Are you time-poor and code-heavy? It's easy to get into a rut with your
SAS® code, and it can be time-consuming to spend your time
learning and implementing improved techniques. This presentation is
designed to share quick improvements that take five minutes to learn and
about the same time to implement. The quick hits are applicable across
versions of SAS and require only Base SAS® knowledge.
Included topics are: simple macro tricks; little-known functions that get
rid of messy coding; dynamic conditional logic; data summarization tips
to reduce data and processing; and testing and space utilization tips. This
presentation has proven valuable to beginner through experienced SAS
users.
Marje Fecht, Prowerk Consulting
If someone comes to you with hundreds of questionnaire forms in Microsoft
Word file format and asks you to extract the data from the forms into a
SAS® data set, you might have several ways to handle this.
However, as a SAS programmer, the shortcut is to write a SAS program to
read the data directly from Word files into a SAS data set. This paper
shows how it can be done with simple SAS programming skills, such as using
FILENAME with the PIPE option, DDE, the CALL EXECUTE routine, and so on.
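A hedged sketch of the FILENAME PIPE plus CALL EXECUTE portion; the
path and the %read_form macro are hypothetical, and the DDE details are
left to the paper (the PIPE device also requires the XCMD system
option):

   filename flist pipe 'dir /b "C:\forms\*.doc"';

   data _null_;
      infile flist truncover;
      input fname $256.;
      /* queue one macro call per Word file */
      call execute(cats('%nrstr(%read_form)(file=C:\forms\',
                        fname, ')'));
   run;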
Sijian Zhang, VA Pittsburgh Healthcare System
The ZIP access method is new with SAS® 9.4. This paper
provides several examples of reading from and writing to ZIP files using
this access method, including the use of the DATA step directory
management macros and the new MEMVAR= option.
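Two minimal sketches with hypothetical paths and member names:

   /* read one member of a ZIP file */
   filename inzip zip "/data/archive.zip" member="records.csv";
   data records;
      infile inzip dsd firstobs=2 truncover;
      input id name :$20. amount;
   run;

   /* write a new member to another ZIP file */
   filename outzip zip "/data/backup.zip" member="records_copy.csv";
   data _null_;
      set records;
      file outzip dsd;
      put id name amount;
   run;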
Rick Langston, SAS
The Department of Market Monitoring (DMM) at California ISO is responsible
for promoting a robust, competitive, and nondiscriminatory electric power
market in California by keeping a close watch on the efficiency and
effectiveness of the ancillary service, congestion management, and
real-time spot markets. We monitor the potential of market participants to
exercise undue market power, the behavior of market participants that is
consistent with attempts to exercise market power and the market
performance that results from the interaction of market structure with
participant behavior. In order to perform monitoring activities
effectively, DMM collects available data and designs and implements reporting
dashboards that track key market metrics. We are using various SAS®
BI tools to develop and employ metrics and analytic tools applicable to
market structure, participant behavior, and market performance. This paper
provides details about the effective use of various SAS BI tools to
implement automated real-time market monitoring functionality.
Amol Deshmukh, California ISO Corp.
Jeff McDonald, California ISO Corp.
Paper SAS368-2014:
Recommendation Systems at Scale
With the coming of very big data sets, timely analysis and fast
recommendation of items to users are of particular importance to
industry. The SAS® LASR Analytic Server, which processes
requests at great speed due to its high-performance, multi-threaded and
gridded analytic code, provides an in-memory analytic platform for our
recommendation system and makes it possible for rapid and accurate
recommendations. This paper describes how the RECOMMEND procedure in
the SAS® LASR Analytic Server works, from loading tables, selecting
models, and tuning parameters to producing final recommendations. The
paper also uses movie rating data
sets to show how to evaluate different models based on various metrics.
Wayne Thompson, SAS
Predicting news articles that customers are likely to view/read next
provides a distinct advantage to news sites. Collaborative filtering is a
widely used technique for this task. This paper details an approach within
collaborative filtering that uses the cosine similarity function to
achieve this purpose. The paper further details two different approaches,
customized targeting and article level targeting, that can be used in
marketing campaigns. Please note that this presentation connects with
Session ID 1887, which takes place immediately after this session.
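The core computation in a hedged PROC SQL sketch, assuming a
hypothetical ratings table with columns userid, item, and rating (note
that the norms here are computed over co-rated users only):

   proc sql;
      create table item_sim as
      select a.item as item1, b.item as item2,
             sum(a.rating*b.rating) /
                (sqrt(sum(a.rating**2)) * sqrt(sum(b.rating**2)))
                as cos_sim
      from ratings a
           inner join ratings b
           on a.userid = b.userid and a.item < b.item
      group by a.item, b.item
      order by cos_sim desc;
   quit;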
Rajendra Ledalla Venkata Naga, GE Capital Retail Finance
Qing Wang, Warwick Business School
John Dilip Raj, GE Capital Retail Finance
Personalized recommender systems are being used in many industries to
increase customer engagement. In the TV industry, this is primarily used
to increase viewership, which in turn increases market share, revenue, and
profit. This paper attempts to develop a recommender system using the
correlation procedure under collaborative filtering methodology. The only
data requirement for this recommendation system would be past viewership
of customers for a given time period. Please note that this session
connects with Session ID 1886, which takes place immediately before
this session.
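A hedged sketch of the correlation step, assuming a hypothetical wide
table with one row per customer and one column per program:

   proc corr data=viewing outp=prog_corr noprint;
      var prog1-prog25;   /* pairwise correlations between programs */
   run;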
John Dilip Raj, GE
Ledalla Venkata Naga Rajendra, GE
Qing Wang, Warwick Business School
Healthcare expenditure growth continues to be a prominent healthcare
policy issue, and the uncertain impact of the Affordable Care Act (ACA)
has put increased pressure on payers to find ways to exercise control over
costs. Fueled by provider performance analytics, BCBSNC has developed
innovative strategies that recognize, reward, and assist providers
delivering high-quality and efficient care. A leading strategy has been
the introduction of a new tiered network product called Blue Select, which
was launched in 2013 and will be featured in the State Health Exchange.
Blue Select is a PPO with differential member cost-sharing for tiered
providers. Tier status of providers is determined by comparing providers
to their peers on the efficiency and quality of the care they deliver.
Providers who meet or exceed the standard for quality, measured using
Healthcare Effectiveness Data and Information Set (HEDIS) adherence rates
and potentially avoidable complication rates for certain procedures, are
then evaluated on their case-mix adjusted costs for total episodic costs.
Each practice's performance is compared, through indirect standardization,
to expected performance, given the patients and conditions treated within
a practice. A ratio of observed to expected performance is calculated for
both cost and quality to use in determining the tier status of providers
for Blue Select. While the primary goal of provider tiering is cost
containment through member steerage, the initiative has also resulted in
new and strengthened collaborative relationships between BCBSNC and
providers. The strategy offers the opportunity to bend the cost curve and
provide meaningful change in the quality of healthcare delivery.
Stephanie Poley, BCBSNC
Analyzing automobile policies and claims is an ongoing area of interest to
the insurance industry. Although there have been many data mining
projects in the insurance sector over the past decade, common questions
remain: How can insurance firms retain their best customers? Will this
damaged car be covered, and will the claim be paid? How large will the
loss on claims associated with this policy be? This study applies data mining
techniques using SAS® Enterprise Miner™ to
enhance insurance policies and claims. The main focus is on assessing how
corporate fleet customers' policy characteristics and claim behavior
differ from those of non-fleet customers. With more than 100,000 data
records, implementing advanced analytics helps create better planning for
policy and claim management strategy.
Kittipong Trongsawad, National Institute of Development Administration
Jongsawas Chongwatpol, National Institute of Development Administration
Duration and severity data arise in several fields including
biostatistics, demography, economics, engineering, and sociology. SAS®
procedures LIFETEST, LIFEREG, and PHREG are the workhorses for analysis of
time to event data in applications in biostatistics. Similar methods apply
to the magnitude or severity of a random event, where the outcome might be
right-, left-, or interval-censored and/or right- or left-truncated. All
combinations of types of censoring and truncation could be present in the
data set. Regression models such as the accelerated failure time model,
the Cox model, and the non-homogeneous Poisson model have extensions to
address time-varying covariates in the analysis of clustered outcomes,
multivariate outcomes of mixed types, and recurrent events. We present an
overview of new capabilities that are available in the procedures QLIM,
QUANTLIFE, RELIABILITY, and SEVERITY with examples illustrating their
application using empirical data sets drawn from easily accessible
sources.
Joseph Gardiner, Michigan State University
In healthcare, we often express our analytics results as being "adjusted."
For example, you might have read a study in which the authors reported the
data as age-adjusted or risk-adjusted. The concept of adjustment is widely
used in program evaluation, comparing quality indicators across providers
and systems, forecasting incidence rates, and in cost-effectiveness
research. In order to make reasonable comparisons across time, place, or
population, we need to account for small sample sizes and case-mix
variation; in other words, we need to level the playing field and account
for differences in health status and for uniqueness in a given population.
If you are new to healthcare, what it really means to adjust the data in
order to make comparisons might not be obvious. In this paper, we explore
the methods by which we control for potentially confounding variables in
our data. We do so through a series of examples from the healthcare
literature in both primary care and health insurance. In this survey of
methods, we discuss the concepts of rates and how they can be adjusted for
demographic strata (such as age, gender, and race), as well as health risk
factors such as case mix.
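As one concrete piece of the adjustment machinery (data sets and
variable names below are hypothetical), indirect standardization of
rates can be carried out with PROC STDRATE:

   proc stdrate data=practice refdata=population
                method=indirect stat=rate(mult=1000);
      population event=cases total=persontime;
      reference  event=cases total=persontime;
      strata age_group sex;
   run;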
Greg Nelson, ThotWave
This presentation takes a look at DirectPay, a company that collects and
buys consumer claims of all types. It developed a model with SAS®
Enterprise Miner™ to determine the risk of fraud by a
debtor and a debtor's creditworthiness. This model is focused on the added
value of more and better data. Since 2010, all credit and fraud scores
have been calculated using DirectPay's own data and models. In addition,
the presentation explores the use of SAS® Visual Analytics
as both a management information and an analytical tool since early 2013.
Colin Nugteren, DirectPay Services BV
Paper SAS2161-2014:
Retailing in the Era of the Tech Titans: An Annual Update
Over the last decade, five companies have begun to aggressively reshape
the landscape of multiple industries and to change retailing forever. They
are the Tech Titans: Amazon, Apple, eBay, Facebook, and Google. At last
year's SAS® Retail Users Group, we spoke of these titans.
They've made such an impact since then that they deserve revisiting.
Several other tech giants who also have this same potential are joining
the battle. These companies are taking a formidable bite not only out of
retailing, but also out of advertising, publishing, movies, television,
communications, financial services, health care, and insurance. This
session highlights the strategies of these companies and what progressive
retailers are doing to not only fight back, but to leverage the titans.
Lori Schafer, SAS
This paper reveals the human mobility behavior in the metropolitan area of
Rio de Janeiro, Brazil. The base for this study is the mobile phone data
provided by one of the largest mobile carriers in Brazil. Mobile phone
data comprises a reasonable variety of information, including data about
time and location for call activity throughout urban areas. This
information might be used to build users' trajectories over time,
describing the major characteristics of the urban mobility within the
city. A variety of distribution analyses is presented in this paper,
aiming to clearly describe the most relevant characteristics of the
overall mobility in the metropolitan area of Rio de Janeiro. In addition,
methods from physics for describing trends in trips, such as gravity and
radiation models, were computed and compared in terms of the granularity
of the geographic scales and also in relation to traditional data mining
approaches such as linear regression. A brief comparison in terms of
performance in predicting the number of trips between pairs of locations
is presented at
the end.
Carlos Andre Reis Pinheiro, KU Leuven
The project focuses on using analytics to reveal unwarranted use of access
to medical records; that is, employees in health organizations who
access information about neighbours, friends, celebrities, and so on,
without a sound reason to do so. The method is based on the natural
assumption that the vast majority of lookups are legitimate; lookups
that differ from a statistically defined normal behavior will be
subject to manual
investigation. The work was carried out in collaboration between SAS
Institute Norway and the largest Norwegian hospital, Oslo University
Hospital (OUS) and was aimed at establishing whether the method is
suitable for unveiling unwarranted lookups in medical records. A number of
so-called scenarios are used to indicate adverse behaviour, each
responsible for looking at one particular aspect of journal access data.
For instance, one scenario determines the timeliness of a lookup relative
to the patient's admission history; another judges whether the medical
competency of the employee is relevant to the situation of the patient at
the time of the lookup. We have so far designed and developed a library of
around 20 scenarios that together are used in weighted combination to
render a final judgment of the appropriateness of the lookup. The approach
has been proven highly successful, and further development of these
ideas is currently under way, with the aim of establishing a joint
Norwegian solution to the problem of unwarranted access. Furthermore, we
believe that the approach and the framework may be utilised in many other
industries where sensitive data is being processed, such as financial,
police, tax and social services. In this paper, the method is outlined, as
well as results of its application on data from OUS.
Heidi Thorstensen, Oslo University Hospital
Torulf Mollestad, SAS
Spinal epidural abscess (SEA) is a serious complication in hemodialysis
(HD) patients, yet there is little medical literature that discusses it.
This analysis identified risk factors and co-morbidities associated with
SEA, as well as risk factors for mortality following the diagnosis. All
incident HD cases from the United States Renal Data System for calendar
years 2005–2008 were queried for a diagnosis of SEA. Potential clinical
covariates, survival, and risk factors were recovered using ICD-9
diagnosis codes. Log-binomial regressions were performed using PROC GENMOD
to assess the relative risks, and Cox regression models were run using
PROC PHREG to estimate hazard ratios for mortality. For the 4-year study
period, 660/355084 (0.19%) HD patients were identified with SEA, the
largest cohort to date. Older age (RR=1.625), infectious comorbidities
including bacteremia (RR=7.7976), methicillin-resistant Staphylococcus
aureus infection (RR=2.6507), hepatitis C (RR=1.545), and non-infectious
factors including diabetes (RR=1.514) and presence of vascular catheters
(RR=1.348) were identified as significant risk factors for SEA. SEA in HD
patients was associated with an increased risk of death (HR=1.20). Older
age (HR=2.269), the presence of dialysis catheters (HR=1.884), cirrhosis
(HR=1.715), decubitus ulcers (HR=1.669), bacteremia (HR=1.407), and total
parenteral nutrition (HR=1.376) constitute the greatest risk factors for
death after SEA diagnosis and thus necessitate a comprehensive approach to
management.
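Hedged sketches of the two model types named above, with hypothetical
variable names rather than the study's actual covariate lists:

   /* log-binomial regression: relative risks */
   proc genmod data=hd;
      class bacteremia(ref='0') / param=ref;
      model sea(event='1') = age_group bacteremia diabetes
            / dist=binomial link=log;
      estimate 'RR: bacteremia' bacteremia 1 / exp;
   run;

   /* Cox regression: hazard ratios for mortality after diagnosis */
   proc phreg data=hd;
      class catheter(ref='0') / param=ref;
      model survtime*death(0) = age catheter cirrhosis;
   run;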
Chan Jin, Georgia Regents University
Jennifer White, Georgia Regents University
Rhonda Colombo, Georgia Regents University
Stephanie Baer, Georgia Regents University and Augusta VAMC
Usman Afza, Georgia Regents University
M. Kheda, Georgia Regents University
Lu Huber, Georgia Regents University
Puja Chebrolu, Georgia Regents University
N. Stanley Nahman, Georgia Regents University and Augusta VAMC
Kristina Kintziger, Georgia Regents University
Guidelines from the International Conference on Harmonisation (ICH)
suggest that clinical trial data should be actively monitored to ensure
data quality. Traditional interpretation of this guidance has often led to
100 percent source data verification (SDV) of respective case report forms
through on-site monitoring. Such monitoring activities can also identify
deficiencies in site training and uncover fraudulent behavior. However,
such extensive on-site review is time-consuming, expensive and, as is true
for any manual effort, limited in scope and prone to error. In contrast,
risk-based monitoring makes use of central computerized review of clinical
trial data and site metrics to determine whether sites should receive more
extensive quality review through on-site monitoring visits. We demonstrate
a risk-based monitoring solution within JMP® Clinical to
assess clinical trial data quality. Further, we describe a suite of tools
used for identifying potentially fraudulent data at clinical sites. Data
from a clinical trial of patients who experienced an aneurysmal
subarachnoid hemorrhage provide illustration.
Richard Zink, SAS
Paper SAS2785-2014:
SAS Hadoop Vision and Direction
The growth in the use of Hadoop is changing the way organizations are
managing data for analytics. More and more data is being captured and
stored in Hadoop -- with the intention of feeding analytics. The way data
must be structured for analytics hasn't changed. But, because of the
volume, there is a clear need for new tools and options for managing data
and analytic base tables (ABTs). Hear what SAS sees in the Hadoop arena
and how we are addressing this space.
Michael Ames, SAS
Donna De Capite, SAS
Paper SAS383-2014:
SAS In-Memory Forests and Beyond
Decision trees and random bootstrap forests are among the most popular
algorithms for data mining applications and competitions. Decision trees
are easy to interpret and bootstrap forests are very competitive automated
classifiers. SAS In-memory Statistics for Hadoop is a new product that
provides a decision tree fit action based on the C4.5 algorithm, and
ensemble of decision trees action based on bootstrap sampling. This paper
introduces how to use SAS In-memory Statistics for Hadoop to apply random
bootstrap forests for feature selection, clustering of unlabeled data, and
outlier detections.
Xiangxiang Meng, SAS
Paper 2462-2014:
SAS Solutions OnDemand: Cloud Focused and Data Driven
Ushering in the age of Agile Analytics, SAS Solutions OnDemand
substantiates Cloud Focused architectures through both
Software-as-a-Service (SaaS) and Enterprise Hosting of SAS Solutions.
Supporting almost 500 customer sites, tens of thousands of users, across
70 countries has yielded a proven track record of success for deploying
solution clouds for our industry-leading business analytics. If you are an
Enterprise Architect, SAS Architect, DBA, data manager, or responsible for
SAS cloud based deployments, come hear about some of our top challenges,
innovative techniques and best practices, database performance
optimizations, and other SAS/ACCESS-related efficiencies from a
data-driven perspective. We'll look at how using new features in Oracle
Database 12c, such as Multitenancy and In-Memory, on our Oracle Exadata
systems will further increase efficiencies toward lowering TCO while
maintaining the highest standards of security, availability, agility,
and performance that our customers demand and expect.
Patrick Wheeler, Oracle Corporation
Randy Wilcox, SAS
SAS® 9.4 brings a lot of progress to the interoperability between SAS
and Hadoop, the industry standard for big data. This talk brings you up
to date with where we are: more distributions, more data types, more
options :-) You now have the choice to run a stand-alone cluster for SAS
HPA and VA, or to co-mingle your SAS processing on your general-purpose
clusters. We'll detail some of the pros and cons of each approach, and
explore how advances in Hadoop like YARN will make managing the shared
cluster easier going forward.
Paul Kent, SAS
Paper 2461-2014:
SAS and Oracle: Big Data and Cloud - Partnering Innovation Targets the 3rd
Platform
Visionaries Paul Kent, SAS Vice President, Big Data, and David Lawler,
Oracle Senior Vice President, Product Management and Strategy, share
their strategic insight as to how and why companies must leverage the
3rd Platform in order to be successful. IDC defines the 3rd Platform as
the convergence of Big Data, Cloud, Mobility, and Social Media, and
predicts accelerating uptake for 2014. This session discusses how SAS
High-Performance Analytics solutions are tackling today's big data
challenges and their requisite union with what IDC refers to as
"data-optimized cloud platforms." The benefits of the collaborative
effort between SAS and
Oracle enable joint customers to realize tangible value by analyzing all
their data, quickly, safely and with the necessary agility to reduce time
to insight. What questions should Data Scientists & IT be asking in their
Big Data pursuits? How does the convergence of In-Memory and In-Database
create the backbone of these data-optimized cloud platforms? #DontMiss.
Paul Kent, SAS
David Lawler, Oracle
Paper SAS399-2014:
SAS and SAP - Long Term Friends
Friends are funny things. They can be fierce rivals or work great together
as a team -- sometimes both at the same time! It's the same with SAS and
SAP. There are cases where SAS and SAP compete, but at the same time, we
both recognize that we can also work together. SAS and SAP have a long
history of working together as we both evolve our technologies. This paper
will provide an overview of how SAS can help derive more value from your
SAP deployment.
Nancy Bremmer, SAS
Diane Hatcher, SAS
SAS/STAT® 13.1 brings valuable new techniques to all sectors of the
audience for SAS statistical software. Updates for survival analysis
include nonparametric methods for interval censoring and models for
competing risks. Multiple imputation methods are extended with the
addition of sensitivity analysis. Bayesian discrete choice models offer
a modern approach for consumer research. Path diagrams are a welcome
addition to structural equation modeling, and item response models are
available for educational assessment. This paper provides overviews and
introductory examples for each of the new focus areas in SAS/STAT 13.1.
The paper also provides a sneak preview of the follow-up release,
SAS/STAT 13.2, which brings additional strategies for missing data
analysis and other important updates to statistical customers.
Bob Rodriguez, SAS
Maura Stokes, SAS
SAS® 9.4 introduces several new software products to better
support SAS® web applications. These products include
SAS® Web Server, SAS® Web Application Server
(with the availability of out-of-the-box clustering), and SAS®
Environment Manager. Even though these products have been tuned and tested
for SAS 9.4 web applications, advanced users might want to know the tools
and techniques that they can use to further monitor, manage, tune, and
improve the performance of their environment. This paper discusses how
customers can achieve that by exploring the following concepts,
activities, techniques, and tools: using SAS Environment Manager to
monitor run-time performance of middle-tier components; using additional
tools to monitor middle-tier components (Apache server-status, Java
VisualVM, Java command-line tools, Java GC logging); identifying
potential bottlenecks and tuning suggestions; identifying an appropriate
clustering strategy (single-server vs. multi-server for homogeneous or
heterogeneous clustering); suggesting the data to collect when analyzing
performance (GC data, thread dumps, heap dumps, system resource
utilization information, log files); and discussing in-depth performance
analysis tools (Thread Dump Analyzer, HPjmeter, Eclipse Memory Analyzer
(MAT), and the IBM Support Assistant tools: GC and Memory Visualizer,
Memory Analyzer, and Thread and Monitor Dump Analyzer).
Rob Sioss, SAS
Why would a SAS® administrator need a dashboard? With the
evolution of SAS®9, the SAS administrator's role has
dramatically changed. Creating a dashboard on a SAS environment gives the
SAS administrator an overview of the environment's health, ensures
resources are used as predicted, and provides a way to explore. SAS®
Visual Analytics allows you to quickly explore, analyze, and visualize
data. So, why not bring the two concepts together? In this session, you
will learn tips for designing dashboards, loading what might seem like
impossible data, and building visualizations that guide users toward the
next level of analysis. Using the dashboard, SAS administrators will learn
ways to determine the system health and how to take advantage of external
tools, such as the Metacoda software, to find additional insights and
explore problem areas.
Tricia Aanderud, And Data Inc.
Michelle Homes, Metacoda
The high school dropout problem has been called a national crisis (Heppen
& Therriault, 2008). Almost one-third of all high school students leave
the public school system before graduating (Swanson, 2004), and the
problem is particularly severe among minority students (Greene & Winters,
2005; U.S. Department of Education, 2006). Educators, researchers, and
policymakers continue to work to identify effective dropout prevention
strategies. One effective approach is to identify high-risk students at an
early stage, and then provide corresponding interventions to keep them in
school. One of the strengths of Educational Data Mining is to reveal
hidden patterns and predict future performance by analyzing accessible
student data. These predictive algorithms generated by predictive modeling
can serve as an early warning system. However, because individual schools
and districts have various combinations of race, gender, and socioeconomic
status, we cannot use a set of standardized predictors and obtain
satisfactory predictive results. Analyzing a limited number of variables
and limited historical data does not generate accurate models.
Additionally, the predictive model might not consider interactions among
predictors. The strength of data mining is the capability to analyze a
large amount of data and variables. Multiple analytic strategies
(including model comparisons) can be applied to maximize model
performance. For future goals, we propose a data mining framework
to construct an early warning and trend analysis system with components of
data warehousing, data mining, and reporting at the levels of individual
students, schools, school districts, and the entire state.
Wendy Dickinson, Ringling College of Art + Design
Morgan Wang, University of Central Florida
SAS® Enterprise Guide® has become the place
through which many SAS® users access the power of SAS. Some
like it, some loathe it, some have never known anything else. In my
experience, the following attitudes prevail regarding the product: 1) I
don't know what SAS is, but I can use a mouse and I know what my business
needs are. 2) I've used SAS before, but now my company has moved to SAS
Enterprise Guide and I love it! 3) I've used SAS before, but now my
company has done something really stupid. SAS Enterprise Guide offers a
place to learn as well as work. The product offers a point-and-click
environment for those who want it, and a type-your-code-with-semicolons
environment for those who prefer that. Even
better, a user can mix and match, using the best of both worlds. I show
that SAS Enterprise Guide is a great place for building up business
solutions using a step-by-step method, how we can make the best of both
environments, and how we can dip our toes into parts of SAS that might
have frustrated us in the past and made us run away and cry 'I'll do it in
Excel!' I demonstrate that there are some very nice aspects to SAS
Enterprise Guide, out of the box, that are often ignored but that can
improve the overall SAS experience. We look at my personal nemeses,
SAS/GRAPH® and PROC TABULATE, with a side-trip to the
mysterious world that is ODS, or the Output Delivery System.
Dave Shea, Skylark Limited
Have you been programming in SAS® for a while and just
aren't sure how SAS® Enterprise Guide® can help
you? This presentation demonstrates how SAS programmers can use SAS
Enterprise Guide 5.1 as their primary interface to SAS, while maintaining
the flexibility of writing their own customized code. We explore:
navigating and customizing the SAS Enterprise Guide environment; using
SAS Enterprise Guide to access existing programs and enhance processing;
exploiting the enhanced development environment, including syntax
completion and built-in function help; using SAS® Code
Analyzer, Report Builder, and Document Builder; adding Project Parameters
to generalize the usability of programs and processes; and leveraging
built-in capabilities available in SAS Enterprise Guide to further enhance
the information you deliver. Our audience is SAS users who understand the
basics of SAS programming and want to learn how to use SAS Enterprise
Guide. This paper is also appropriate for users of earlier versions of SAS
Enterprise Guide who want to try the enhanced features available in SAS
Enterprise Guide 5.1.
Marje Fecht, Prowerk Consulting
Rupinder Dhillon, Dhillon Consulting
Changes in default behavior in the last few SAS® releases
have enabled faster processing of SAS formats, especially for
SAS/ACCESS® customers. But, as with any performance
enhancement, your results may vary. This presentation teaches you: the
differences between two important SAS format optimizations; how to tell
which optimization is in effect; and a simple method to get the behavior
you want. The target audience for this presentation is SAS/ACCESS customers,
particularly those who have also licensed SAS® In-Database
Code Accelerator for Teradata or SAS® In-Database Code
Accelerator for Greenplum.
David Wiehle, SAS
Speed, precision, reliability: these are just three of the many challenges
that today's banking institutions need to face. Join Austria's ERSTE GROUP
Bank on their road from monolithic processing toward a highly flexible
processing infrastructure using SAS® Grid technology. This
paper focuses on the central topics and decisions that go beyond the
standard material about the product that is presented initially to SAS
Grid prospects. Topics covered range from how to choose the correct
hardware and critical architecture considerations to the necessary
adaptations of existing code and logic, all of which have proven to be a
common experience for the members of the SAS Grid community. After
making the initial plans and successfully managing the initial hurdles,
seeing it all come together makes you realize the endless possibilities
for improving your processing landscape.
Manuel Nitschinger, sIT-Solutions
Phillip Manschek, SAS
As organizations deploy SAS® applications to produce the
analytical results that are critical for solid decision making, they are
turning to distributed grid computing operated by SAS® Grid
Manager. SAS Grid Manager provides a flexible, centrally managed computing
environment for processing large volumes of data for analytical
applications. Exceptional storage performance is one of the most critical
components of implementing SAS in a distributed grid environment. When the
storage subsystem is not designed properly or implemented correctly, SAS
applications do not perform well, thereby reducing a key advantage of
moving to grid computing. Therefore, a well-architected SAS environment
with a high-performance storage environment is integral to clients getting
the most out of their investment. This paper introduces concepts from
software storage virtualization in the cloud for the generalized SAS Grid
Manager architecture, highlights platform and enterprise architecture
considerations, and uses the most popularly selected distributed file
system, IBM GPFS, as an example. File system scalability considerations,
configuration details, and tuning suggestions are provided in a manner
that can be applied to a client's own environment. A summary checklist of
important factors to consider when architecting and deploying a shared,
distributed file system is provided.
Gregg Rohaly, IBM
Harry Seifert, IBM
There are exciting new capabilities available from SAS®
High-Performance Analytics and SAS® Visual Analytics.
Current customers seek a deployment strategy that enables gradual
migration to the new technologies. Such a strategy would mitigate the need
for 'rip and replace' and would enable resource utilization to evolve
along a continuum rather than partitioning resources, which would result
in underused computing or storage hardware. New customers who deploy a
combination of SAS® Grid Manager, SAS High-Performance
Analytics, and SAS Visual Analytics seek to reduce the cost of computing
resources and reduce data duplication and data movement by deploying these
solutions on the same pool of hardware. When sharing hardware, it is
important to implement resource management in order to help guarantee that
resources are available for critical applications and processes. This
session discusses various methods for managing hardware resources in a
multi-application environment. Specific strategies are suggested, along
with implementation suggestions.
Ken Gahagan, SAS
Paper SAS2321-2014:
SAS® In-Memory Statistics for Hadoop
In this hands-on workshop, we introduce the highly interactive IMSTAT
procedure for developing a variety of statistical and machine-learning
models. We emphasize collocation and management of interactive analytics
within a Hadoop cluster. You learn how to prepare and load data from HDFS
into a SAS LASR® Analytic Server session, summarize and
explore the data, compute temporary columns, use GROUP BY processing,
develop logistic and OLS regression models, fit decision tree and random
woods models, evaluate and deploy them, and build recommendation models
using PROC RECOMMEND.
Michael Ames, SAS
Hui Li, SAS
Xiangxiang Meng, SAS
Wayne Thompson, SAS
This discussion uses SAS® Office Analytics as an example to
demonstrate the importance of preparing for the SAS®
installation. There are many nuances as well as requirements that need to
be addressed before you do an installation. These requirements are
basically similar, yet they differ according to the target installation
operating system. In other words, there are some differences in
preparation routines for Windows and *Nix flavors. Our discussion focuses
on these three topics: 1. Pre-installation considerations such as sizing,
storage, proper credentials, and third-party requirements; 2. Installation
steps and requirements; and 3. Post-installation configuration. In
addition to preparation, this paper also discusses potential issues and
pitfalls to watch out for, as well as best practices.
Rafi Sheikh, Analytiks International, Inc.
You've been coding in Base SAS® for a while. You've seen it,
maybe even run code written by someone else, but there is something about
the SAS® Macro Language that is preventing you from fully
embracing it. Could it be that % sign that appears everywhere, that &,
that &&, or even that dreaded &&&? Fear no more. This short presentation
will make everything clearer and encourage you to start coding your own
SAS macros.
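As a taste of what the presentation covers, here is a minimal sketch (the
macro name and values are invented for illustration):

  %macro greet(name=);
    %put NOTE: Hello, &name!;
  %mend greet;
  %greet(name=World)

  /* && delays resolution: &&city&i resolves to &city1, then to its value */
  %let city1 = Cary;
  %let i = 1;
  %put &&city&i;   /* writes Cary to the log */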
Alex Chaplin, Bank of America
Are you wondering what is causing your valuable machine asset to fail?
What could those drivers be, and what is the likelihood of failure? Do you
want to be proactive rather than reactive? Answers to these questions have
arrived with SAS® Predictive Asset Maintenance. The solution
provides an analytical framework to reduce the amount of unscheduled
downtime and optimize maintenance cycles and costs. An all new (R&D-based)
version of this offering is now available. Key aspects of this paper
include a discussion of the key business drivers for, and capabilities of,
SAS Predictive Asset Maintenance, followed by a detailed analysis of the
solution covering: the data model; explorations; data selections; Path I:
the analysis workbench for maintenance analysis and stability monitoring;
Path II: the analysis workbench with JMP®, SAS® Enterprise
Guide®, and SAS® Enterprise Miner™;
analytical case development using SAS Enterprise Miner, SAS®
Model Manager, and SAS® Data Integration Studio; and the SAS
Predictive Asset Maintenance Portlet for reports. A realistic business
example in the oil and gas industry is used.
George Habek, SAS
Paper SAS1585-2014:
SAS® Retail Road Map
This presentation provides users with an update on retail solution
releases that have occurred in the past year and a roadmap for moving
forward.
Saurabh Gupta, SAS
Hospital readmission rates have become a key indicator for measuring the
quality of health care. Currently, use of these rates has been adopted by
major healthcare stakeholders, including the Centers for Medicare &
Medicaid Services (CMS), the Agency for Healthcare Research and Quality
(AHRQ), and the National Committee for Quality Assurance (NCQA). In the
calculation of the readmission rate, it is often a challenging task to
identify eligible hospital readmissions from the convoluted administrative
claims data. By taking advantage of the flexibility and power of SAS®
programming tools, this paper proposes three different solutions using
both DATA step and PROC SQL to help identify 30-day hospital readmissions
more efficiently and accurately. Solution 1 (DATA STEP vertically) employs
the LAG function to calculate the gap between the current admission date
and the immediate previous discharge date. This vertical thinking process
is straightforward and does not require additional data management.
Solution 2 (DATA STEP horizontally) uses PROC TRANSPOSE,
ARRAYs, and DO loops to transform claims data from long to wide, and
examines each patient's hospitalization experiences in just one line. A
similar horizontal thinking process has been discussed in previous SAS
papers for calculating medication utilization. Solution 3 (PROC SQL) takes
advantage of a special table joining (self-join) by creating a Cartesian
product further subsetted by a joining condition and WHERE statements. All
three solutions have achieved the same results by correctly identifying
30-day hospital readmissions, and they can be handily applied to tackle
similar programming challenges in research projects.
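As a hedged sketch of Solution 1 (assuming variables named patient_id,
admit_date, and disch_date; the paper's names may differ):

  proc sort data=claims;
    by patient_id admit_date;
  run;

  data readmit;
    set claims;
    by patient_id;
    prev_disch = lag(disch_date);             /* previous discharge date */
    if first.patient_id then prev_disch = .;  /* no prior stay for patient */
    gap = admit_date - prev_disch;
    readmit_30 = (0 <= gap <= 30);            /* flag 30-day readmissions */
    format prev_disch date9.;
  run;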
Weifeng Fan, UMWA Health and Retirement Funds
Maryam Sarfarazi, UMWA Health and Retirement Funds
The UNIX host group delivers many utilities that go unnoticed. What are
these utilities, and what can they tell you about your SAS®
system? Are you having authentication problems? Are you unable to get a
result from a workspace server? What hot fixes have you applied? These are
subjects that come up during a tech support call. It would be good to have
background information about these tools before you have to use them.
Jerry Pendergrass, SAS
Universities in the UK are now subject to League Table reporting by a
range of providers. The criteria used by each League Table differ.
Universities, their faculties, and individual subject areas want to
understand how the different tables are constructed and calculated, and
what is required to maximize their position in each league table in order
to attract the best students to their institution, thereby maximizing
recruitment and student-related income streams. The School of Computing
and Maths at the University of Derby is developing the use of
SAS® Visual Analytics to analyse each league table, providing
actionable insights into the steps that can be taken to improve their
relative standing in the league tables and also to gain insights
into feasible levels of targets relative to the peer groups of
institutions. This paper outlines the approaches taken and some of the
critical insights developed that will be of value to other higher
education institutions in the UK, and suggests useful approaches that
might be valuable in other countries.
Richard Self, University of Derby
Stuart Berry, University of Derby
Claire Foyle, University of Derby
Dave Voorhis, University of Derby
SAS® Visual Analytics delivers the power of approachable
in-memory analytics in an intuitive web interface. The scalable technology
behind SAS Visual Analytics should not benefit just the analyst or data
scientist in your organization but indeed everyone regardless of their
analytical background. This paper outlines a framework for the creation of
a cloud deployment of SAS Visual Analytics using the SAS®
9.4 platform. Based on proven best practices and existing customer
implementations, the paper focuses on architecture, processes, and design
for reliability and scalable multi-tenancy. The framework enables your
organization to move away from the departmental view of the world and to
offer analytical capabilities for consumerization and collaboration across
the enterprise.
Christopher Redpath, SAS
Nicholas Eayrs, SAS
Paper SAS1423-2014:
SAS® Workshop: Data Management
This workshop provides hands-on experience using tools in the SAS®
Data Management offering. Workshop participants will use the following
products: SAS® Data Integration Studio, DataFlux®
Data Management Studio, and SAS® Data Management Console.
Kari Richardson, SAS
Paper SAS1523-2014:
SAS® Workshop: Data Mining
This workshop provides hands-on experience using SAS®
Enterprise Miner™. Workshop participants will do the
following: open a project; create and explore a data source; build and
compare models; and produce and examine score code that can be used for
deployment.
Bob Lucas, SAS
Mike Speed, SAS
Paper SAS1522-2014:
SAS® Workshop: Forecasting
This workshop provides hands-on experience using SAS®
Forecast Server. Workshop participants will do the following: create a
project with a hierarchy; generate multiple forecasts automatically;
evaluate the accuracy of the forecasts; and build a custom model.
Bob Lucas, SAS
Jeff Thompson, SAS
Paper SAS1525-2014:
SAS® Workshop: High-Performance Analytics
This workshop provides hands-on experience using SAS®
Enterprise Miner™ high-performance nodes. Workshop
participants will do the following: learn the similarities and differences
between high-performance nodes and standard nodes; build a project flow
using high-performance nodes; and extract and save score code for model
deployment.
Bob Lucas, SAS
Jeff Thompson, SAS
Paper SAS1393-2014:
SAS® Workshop: SAS® Office Analytics
This workshop provides hands-on experience using SAS® Office
Analytics. Workshop participants will complete the following tasks: use
SAS® Enterprise Guide® to access and analyze
data; create a stored process that can be shared across an organization;
and access and analyze data sources and stored processes using the SAS®
Add-In for Microsoft Office.
Eric Rossland, SAS
Paper SAS1421-2014:
SAS® Workshop: SAS® Visual Analytics
This workshop provides hands-on experience with SAS® Visual
Analytics. Workshop participants will do the following: explore data with
SAS® Visual Analytics Explorer; and design reports with SAS®
Visual Analytics Designer.
Eric Rossland, SAS
Paper SAS1524-2014:
SAS® Workshop: Text Analytics
This workshop provides hands-on experience using SAS® Text
Miner. Workshop participants will do the following: read a collection of
text documents and convert them for use by SAS Text Miner using the Text
Import node; use the simple query language supported by the Text Filter
node to extract information from a collection of documents; use the Text
Topic node to identify the dominant themes and concepts in a collection of
documents; and use the Text Rule Builder node to classify documents that
have pre-assigned categories.
Tom Bohannon, SAS
Bob Lucas, SAS
Due to XML's growing role in data interchange, it is increasingly
important for SAS® programmers to become proficient with SAS
technologies and techniques for creating and consuming XML. The current
work expands on a SAS® Global Forum 2013 presentation that
dealt with these topics, providing additional examples of using XML maps to
read and write XML files and using the Output Delivery System (ODS) to
create custom tagsets for generating XML.
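For instance, reading an XML file through a map with the XMLV2 LIBNAME
engine follows this pattern (the file and map names here are hypothetical):

  libname books xmlv2 "books.xml" xmlmap="books.map" access=readonly;

  proc print data=books.book;
  run;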
Chris Schacherer, Clinical Data Management Systems, LLC
Traditionally, Java web applications interact with back-end databases by
means of JDBC/ODBC connections to retrieve and update data. With the
growing need for real-time charting and complex analysis types of data
representation on these types of web applications, SAS®
computing power can be put to use by adding a SAS web service layer
between the application and the database. This paper shows how a SAS web
service layer can be used to render data to a Java application in a
summarized form using SAS® Stored Processes. This paper also
demonstrates how inputs can be passed to a SAS Stored Process based on
which computations/summarizations are made before output parameter and/or
output data streams are returned to the Java application. SAS Stored
Processes are then deployed as SAS® BI Web Services using
SAS® Management Console, which are available to the Java
application as a URL. We use the SOAP method to interact with the web
services. XML data representation is used as a communication medium. We
then illustrate how RESTful web services can be used with JSON objects
being the communication medium between the Java application and SAS in
SAS® 9.3. Once this pipeline communication between the
application, SAS engine, and database is set up, any complex manipulation
or analysis as supported by SAS can be incorporated into the SAS Stored
Process. We then illustrate how graphs and charts can be passed as outputs
to the application.
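A minimal stored-process sketch with one input parameter gives the flavor;
the actual services, output streams, and metadata registration described
in the paper are more involved:

  *ProcessBody;
  %global region;   /* input parameter supplied by the calling application */
  %stpbegin;
  proc means data=sashelp.shoes sum maxdec=0;
    where region = "&region";
    var sales;
  run;
  %stpend;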
Neetha Sindhu, Kavi Associates
Hari Hara Sudhan, Kavi Associates
Mingming Wang, Kavi Associates
Using Lilypond typesetting software, you can write publication-grade music
scores. The input for Lilypond is a text file that can be written once and
then transferred to SAS® for patterned repetition, so that
you can cycle through patterns that occur in music. The author plays a
sequence of notes and then writes this into Lilypond code. The sequence
starts in the key of C with only a two-note sequence. Then the sequence is
extended to three-, four-, then five-note sequences, always contained in
one octave. SAS is then used to write the same code for all other eleven
keys and in seven scale modes. The method is very simple and does not
require advanced programming. Lookup files are used in the programming,
demonstrating
efficient lookup techniques. The result is a lengthy book or exercise for
practicing music in a PDF file, and a sound source file in midi format is
created that you can hear. This method shows how various programming
languages can be used to write other programming languages.
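A hedged sketch of the core idea, assuming a lookup data set KEYS whose
variables (keyname, note1, note2) are invented for illustration: a DATA
_NULL_ step writes Lilypond source with PUT statements, once per key.

  data _null_;
    file "exercise.ly";
    set keys;                    /* assumed variables: keyname, note1, note2 */
    put "\relative c' {";
    put "  \key " keyname "\major";
    put "  " note1 note2 note1 note2;
    put "}";
  run;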
Peter Timusk, Statistics Canada
Statistical mediation analysis is common in business, social sciences,
epidemiology, and related fields because it explains how and why two
variables are related. For example, mediation analysis is used to
investigate how product presentation affects liking the product, which
then affects the purchase of the product. Mediation analysis evaluates the
mechanism by which a health intervention changes norms that then change
health behavior. Mediation analysis methods remain an active area of
research. Some recent work in statistical mediation analysis
focuses on extracting accurate information from small samples by using
Bayesian methods. The Bayesian framework offers an intuitive solution to
mediation analysis with small samples; namely, incorporating prior
information into the analysis when there is existing knowledge about the
expected magnitude of mediation effects. Using diffuse prior distributions
with no prior knowledge allows researchers to reason in terms of
probability rather than in terms of (or in addition to) statistical power.
Using SAS® PROC MCMC, researchers can choose one of two
simple and effective methods to incorporate their prior knowledge into the
statistical analysis, and can obtain the posterior probabilities for
quantities of interest such as the mediated effect. This project presents
four examples of using PROC MCMC to analyze a single mediator model with
real data using: (1) diffuse prior information for each regression
coefficient in the model, (2) informative prior distributions for each
regression coefficient, (3) diffuse prior distribution for the covariance
matrix of variables in the model, and (4) informative prior distribution
for the covariance matrix.
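A sketch of case (1), the single-mediator model with diffuse priors; the
variable names (x, m, y) and hyperparameter values are assumptions:

  proc mcmc data=med seed=2014 nmc=50000 outpost=post monitor=(a b med);
    parms (a0 a b0 b cp) 0;
    parms s2m 1 s2y 1;
    prior a0 a b0 b cp ~ normal(0, var=1e6);        /* diffuse priors */
    prior s2m s2y ~ igamma(shape=0.01, scale=0.01);
    med  = a*b;                                     /* mediated effect */
    mu_m = a0 + a*x;
    mu_y = b0 + b*m + cp*x;
    model m ~ normal(mu_m, var=s2m);
    model y ~ normal(mu_y, var=s2y);
  run;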
Milica Miočević, Arizona State University
David MacKinnon, Arizona State University
How does the SAS® server architecture fit within your IT
infrastructure? What functional aspects does the architecture support?
This session helps attendees understand the logical server topology of the
SAS technology stack: resource and process management; in-memory
architecture; and in-database processing. The session also discusses process
flows from data acquisition through analytical information to visual
insight. IT architects, data administrators, and IT managers from all
industries should leave with an understanding of how SAS has evolved to
better fit into the IT enterprise and to help IT's internal customers make
better decisions.
Gary Spakes, SAS
To get the full benefit from PROC REPORT, the savvy programmer needs to
master ACROSS usage and the COMPUTE block. Understanding timing issues and
absolute column references can unlock the power of PROC REPORT. This
presentation shows how to make the most of ACROSS usage with PROC REPORT:
use PROC REPORT instead of multiple TRANSPOSE steps; use character
variables with ACROSS; control the column headings for ACROSS usage items;
use aliases; and perform row-wise traffic lighting, including traffic
lighting based on multiple conditions.
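A small sketch of ACROSS with an absolute column reference (the WHERE
clause keeps the example to two ACROSS columns, so _C2_ and _C3_ are known):

  proc report data=sashelp.shoes(where=(product in ('Boot','Sandal'))) nowd;
    column region product,sales total;
    define region  / group;
    define product / across;
    define sales   / analysis sum;
    define total   / computed format=dollar14.;
    compute total;
      total = sum(_c2_, _c3_);  /* absolute references to the ACROSS columns */
    endcomp;
  run;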
Cynthia Zender, SAS
SAS® has a number of procedures for smoothing scatter plots.
In this tutorial, we review the nonparametric technique called LOESS,
which estimates local regression surfaces. We review the LOESS procedure
and then compare it to a parametric regression methodology that employs
restricted cubic splines to fit nonlinear patterns in the data. Not only
do these two methods fit scatterplot data, but they can also be used to
fit multivariate relationships.
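A minimal sketch of both approaches on the SASHELP.ENSO data set (the
smoothing values and knot choices are illustrative):

  ods graphics on;

  proc loess data=sashelp.enso;
    model pressure = year / smooth=0.1 0.3 0.6;  /* compare smoothing values */
  run;

  /* parametric alternative: restricted (natural) cubic splines */
  proc glmselect data=sashelp.enso;
    effect spl = spline(year / naturalcubic basis=tpf(noint)
                               knotmethod=percentiles(5));
    model pressure = spl / selection=none;
  run;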
Jonas Bilenas, Barclays UK&E RBB
The scatter plot is a basic tool for examining the relationship between
two variables. While the basic plot is good, enhancements can make it
better. In addition, there might be problems of overplotting. In this
paper, I cover ways to create basic and enhanced scatter plots and to deal
with overplotting.
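For example, marker transparency in PROC SGPLOT is one simple defense
against overplotting:

  proc sgplot data=sashelp.cars;
    scatter x=horsepower y=mpg_city / transparency=0.8
            markerattrs=(symbol=circlefilled size=5);
  run;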
Peter Flom, Peter Flom Consulting
Linear regression has been a widely used approach in social and medical
sciences to model the association between a continuous outcome and the
explanatory variables. Assessing the model assumptions, such as linearity,
normality, and equal variance, is a critical step for choosing the best
regression model. If any of the assumptions are violated, one can apply
different strategies to improve the regression model, such as performing
transformation of the variables or using a spline model. SAS®
has been commonly used to assess and validate the postulated model, and
SAS® 9.3 provides many new features that increase the
efficiency and flexibility in developing and analyzing the regression
model, such as ODS Statistical Graphics. This paper aims to demonstrate
necessary steps to find the best linear regression model in SAS 9.3 in
different scenarios where variable transformation and the implementation
of a spline model are both applicable. A simulated data set is used to
demonstrate the model developing steps. Moreover, the critical parameters
to consider when evaluating the model performance are also discussed to
achieve accuracy and efficiency.
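A sketch of the workflow on a simulated data set (the names sim, x, and y
are assumptions): ODS Graphics supplies the assumption-checking panels,
and a spline effect is one remedy when linearity fails.

  ods graphics on;

  /* check linearity, normality, and equal variance */
  proc reg data=sim plots(only)=(diagnostics residuals);
    model y = x;
  run; quit;

  /* one remedy: replace x with a spline expansion */
  proc glmselect data=sim;
    effect spl = spline(x / knotmethod=percentiles(5));
    model y = spl / selection=none;
  run;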
Ning Huang, University of Southern California
All successful organizations seek ways of communicating the identity of
subject matter experts to employees. This information exists as common
knowledge when an organization is first starting out, but the common
knowledge becomes fragmented as the organization grows. SAS®
Text Analytics can be used on an organization's internal unstructured data
to reunite these knowledge fragments. This paper demonstrates how to
extract and surface this valuable information from within an organization.
First, the organization's unstructured textual data are analyzed by
SAS® Enterprise Content Categorization to develop a topic
taxonomy that associates subject matter with subject matter experts in the
organization. Then, SAS Text Analytics can be used successfully to build
powerful semantic models that enhance an organization's unstructured data.
This paper shows how to use those models to process and deliver real-time
information to employees, increasing the value of internal company
information.
Richard Crowell, SAS
Saratendu Sethi, SAS
Xu Yang, SAS
Chunqi Zuo, SAS
Fruzsina Veress, SAS
Business analysts commonly use Microsoft Excel with the SAS®
System to answer difficult business questions. While you can use these
applications independently of each other to obtain the information you
need, you can also combine the power of those applications, using the SAS
Output Delivery System (ODS) tagsets, to completely automate the process.
This combination delivers a more efficient process that enables you to
create fully functional and highly customized Excel worksheets within SAS.
This paper starts by discussing common questions and problems that SAS
Technical Support receives from users when they try to generate Excel
worksheets. The discussion continues with methods for automating Excel
worksheets using ODS tagsets and customizing your worksheets using the CSS
style engine and extended tagsets. In addition, the paper discusses tips
and techniques for moving from the current MSOffice2K and ExcelXP tagsets
to the new Excel destination, which generates output in the native Excel
2010 format.
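A small sketch of the newer Excel destination (the suboptions shown are
only a sample of what the paper covers):

  ods excel file="sales.xlsx"
      options(sheet_interval="bygroup" sheet_label="Region"
              embedded_titles="yes");

  title "Sales by Region";
  proc print data=sashelp.shoes noobs;
    by region;
    var product sales;
  run;

  ods excel close;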
Chevell Parker, SAS
SAS® OLAP technology is used to organize and present
summarized data for business intelligence applications. It features
flexible options for creating and storing aggregations to improve
performance and brings a powerful multi-dimensional approach to querying
data. This paper focuses on managing security features available to OLAP
cubes through the combination of SAS metadata and MDX logic.
Stephen Overton, Overton Technologies, LLC
Security-conscious organizations have rigorous IT regulations, especially
when company data is available on the move. This paper explores the
options available to secure a deployment of SAS® Mobile BI
with SAS® Visual Analytics. The setup ensures encrypted
communication from remote mobile clients all the way to backend servers.
Additionally, the integration of SAS Mobile BI with third-party Mobile
Device Management (MDM) software and Virtual Private Network (VPN)
technology enables you to place several layers of security and access
control around your data. The paper also covers the out-of-the-box security
features of the SAS Mobile BI and SAS Visual Analytics administration
applications to help you close the loop on all possible areas of
exploitation.
Christopher Redpath, SAS
Meera Venkataramani, SAS
Even if you are familiar with security considerations for SAS®
BI deployments, such as metadata and file system permissions, there are
additional security aspects to consider when securing any environment that
includes SAS® Visual Analytics. These include files and
permissions to the grid machines in a distributed environment, permissions
on the SAS® LASR™ Analytic Servers, and interactions
with existing metadata types. We approach these security aspects from the
perspective of an administrator who is securing the environment for
himself, a data builder, and a report consumer.
Dawn Schrader, SAS
Universities strive to be competitive in the quality of education as well
as cost of attendance. Peer institutions are selected to make comparisons
pertaining to academics, costs, and revenues. These comparisons lead to
strategic decisions and long-range planning to meet goals. The process of
finding comparable institutions could be completed with cluster analysis,
a statistical technique. Cluster analysis places universities with similar
characteristics into groups or clusters. A process to determine peer
universities will be illustrated using PROC STANDARD, PROC FASTCLUS, and
PROC CLUSTER.
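A sketch of the process with hypothetical peer variables:

  /* put the variables on a common scale */
  proc standard data=institutions mean=0 std=1 out=inst_std;
    var enrollment tuition research_exp;
  run;

  /* k-means style clustering */
  proc fastclus data=inst_std maxclusters=6 out=peers;
    var enrollment tuition research_exp;
  run;

  /* hierarchical clustering for comparison */
  proc cluster data=inst_std method=ward outtree=tree;
    var enrollment tuition research_exp;
    id institution;
  run;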
Diana Suhr, University of Northern Colorado
Multiple imputation, a popular strategy for dealing with missing values,
usually assumes that the data are missing at random (MAR). That is, for a
variable X, the probability that an observation is missing depends only on
the observed values of other variables, not on the unobserved values of X.
It is important to examine the sensitivity of inferences to departures
from the MAR assumption, because this assumption cannot be verified using
the data. The pattern-mixture model approach to sensitivity analysis
models the distribution of a response as the mixture of a distribution of
the observed responses and a distribution of the missing responses.
Missing values can then be imputed under a plausible scenario for which
the missing data are missing not at random (MNAR). If this scenario leads
to a conclusion different from that of inference under MAR, then the MAR
assumption is questionable. This paper reviews the concepts of multiple
imputation and explains how you can apply the pattern-mixture model
approach in the MI procedure by using the MNAR statement, which is new in
SAS/STAT® 13.1. You can specify a subset of the observations
to derive the imputation model, which is used for pattern imputation based
on control groups in clinical trials. You can also adjust imputed values
by using specified shift and scale parameters for a set of selected
observations, which are used for sensitivity analysis with a tipping-point
approach.
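A sketch of the two MNAR uses with assumed variable names (trt, y0-y2):

  /* pattern imputation: build the imputation model from controls only */
  proc mi data=trial seed=14823 nimpute=25 out=mi_mnar;
    class trt;
    monotone reg;
    mnar model(y2 / modelobs=(trt='0'));
    var trt y0 y1 y2;
  run;

  /* tipping-point style sensitivity: shift imputed values in one arm */
  proc mi data=trial seed=14823 nimpute=25 out=mi_tip;
    class trt;
    monotone reg;
    mnar adjust(y2 / shift=-2 adjustobs=(trt='1'));
    var trt y0 y1 y2;
  run;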
Yang Yuan, SAS
The manager of the Purchasing Department is considering contracting with
your team for a new SAS® Enterprise BI application. He's
already met with SAS® and seen the sales pitch, and he is very interested.
But the manager is a tightwad and not sure about spending the money. Also,
he wants his team to be the primary developers for this new application.
Before investing his money on training, programming, and support, he would
like a proof-of-concept. This paper will walk you through the seven steps
to create a SAS Enterprise BI POC project: Develop a kick-off meeting
including a full demo of the SAS Enterprise BI tools. Set up your UNIX
file systems and security. Set up your SAS metadata ACTs, users, groups,
folders, and libraries. Make sure the necessary SAS client tools are
installed on the developers' machines. Hold a SAS Enterprise BI workshop to
introduce them to the basics, including SAS® Enterprise
Guide®, SAS® Stored Processes, SAS®
Information Maps, SAS® Web Report Studio, SAS®
Information Delivery Portal, and SAS® Add-In for Microsoft
Office, along with supporting documentation. Work with them to develop a
simple project, one that highlights the benefits of SAS Enterprise BI and
shows several methods for achieving the desired results. Last but not
least, follow up! Remember, your goal is not to launch a full-blown
application. Instead, we'll strive toward helping them see the potential
in your organization for applying this methodology.
Sheryl Weise, Wells Fargo
SAS® Visual Analytics enables you to conduct ad hoc data
analysis, visually explore data, develop reports, and then share insights
through the web and mobile tablet apps. You can now also share your
insights with colleagues using the SAS® Office Analytics
integration with Microsoft Excel, Microsoft Word, Microsoft PowerPoint,
Microsoft Outlook, and Microsoft SharePoint. In addition to opening and
refreshing reports created using SAS Visual Analytics, a new SAS®
Central view enables you to manage and comment on your favorite and recent
reports from your Microsoft Office applications. You can also view your
SAS Visual Analytics results in SAS® Enterprise
Guide®. Learn more about this integration and what's coming
in the future in this breakout session.
David Bailey, SAS
I-Kong Fu, SAS
Anand Chitale, SAS
SAS® continues to expand and improve its reporting
capability. With new SAS® 9.4 enhancements in ODS (Output
Delivery System), the opportunity to create stunning reports has expanded
even further. If you are charged with creating relevant, informative,
easy-to-read reports for clients or administrators, then the ODS Report
Writing Interface, ODS LAYOUT enhancements, and the new ODSTEXT procedure
are important tools to use. These tools allow you to create reports in a
smart, eye-catching format that can be turned around quite quickly and
programmed to provide optimum flexibility. How many times have you worked
hours to tweak and fine-tune a report directly in Microsoft Excel,
Microsoft Word, Microsoft PowerPoint, or some other similar software, only
to be asked for 'a quick update,' which would then take hours to recreate
because you are manually transferring data? Do you ever dread receiving
the compliment 'This is really wonderful information!' because you know
it will be followed by 'Can you run this for EVERY region?' Well, dread no
more, because when you harness the power of SAS® ODS, you
can create first-rate, flexible, fabulous reports! Join me as I share with
you two real-world examples of ODS capabilities using (1) a marketing
piece I designed to help the president of our university spotlight county-
and region-specific data as he recruited across the state and (2) our
academic program review form, a multi-page report that outputs to Word so
that program coordinators can add personalized commentary to support their
program's effectiveness.
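A sketch combining PROC ODSTEXT with ODS LAYOUT (the data set and text are
placeholders, not the examples from the paper):

  ods pdf file="spotlight.pdf" startpage=never;
  ods layout gridded columns=2;

  ods region;
  proc odstext;
    p "County Spotlight" / style=[fontsize=16pt fontweight=bold];
    p "Enrollment highlights for the current recruiting cycle.";
  run;

  ods region;
  proc sgplot data=counties;        /* hypothetical data set */
    vbar county / response=enrolled;
  run;

  ods layout end;
  ods pdf close;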
Gina Huff, Western Kentucky University
Paper SAS1781-2014:
Simplifying Data Integration in the World of Big Data
Traditional approaches for data integration require an investment in both
tooling for automation and in skills needed to work with and manage those
tools. Organizations want to leverage their current investments in data
integration tooling and apply them to current technology such as big data
and cloud computing without the need to hire new employees or retrain
existing ones. This paper will introduce and demonstrate the intuitive and
easy-to-use interface of an all-new code generation environment from SAS
that simplifies the effort to move, transform, and clean data in place so
that anyone can do it. Whether the data is in SAS, a relational database,
in a Hadoop cluster, in-memory, or in the cloud, this paper will show how
users of any skill level will be able to define and direct powerful
integration algorithms that allow work to be pushed to and executed from
anywhere without the need to learn any specialized skills such as writing
MapReduce code.
Mike Frost, SAS
Companies in the insurance and banking industries need to model the
frequency and severity of adverse events every day. Accurate modeling of
risks and the application of predictive methods ensure the liquidity and
financial health of portfolios. Often, the modeling involves
computationally intensive, large-scale simulation. SAS/ETS®
provides high-performance procedures to assist in this modeling. This
paper discusses the capabilities of the HPCOUNTREG and HPSEVERITY
procedures, which estimate count and loss distribution models in a
massively parallel processing environment. The loss modeling features have
been extended by the new HPCDM procedure, which simulates the probability
distribution of the aggregate loss by compounding the count and severity
distribution models. PROC HPCDM also analyzes the impact of various future
scenarios and parameter uncertainty on the distribution of the aggregate
loss. This paper steps through the entire modeling and simulation process
that is useful in the insurance and banking industries.
Mahesh V. Joshi, SAS
Jan Chvosta, SAS
Big data is all the rage these days, with the proliferation of
data-accumulating electronic gadgets and instrumentation. At the heart of
big data analytics is the MapReduce programming model. As a framework for
distributed computing, MapReduce uses a divide-and-conquer approach to
allow large-scale parallel processing of massive data. As the name
suggests, the model consists of a Map function, which first splits data
into key-value pairs, and a Reduce function, which then carries out the
final processing of the mapper outputs. It is not hard to see how these
functions can be simulated with the SAS® hash objects
technique, and in reality, implemented in the new SAS® DS2
language. This paper demonstrates how hash object programming can handle
data in a MapReduce fashion and shows some potential applications in
physics, chemistry, biology, and finance.
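For example, a word-count style aggregation: each row of TOKENS plays the
role of a mapper's key emission, and the hash object acts as the reducer,
summing by key (the data set and variable names are assumptions):

  data _null_;
    length word $32 count 8;
    declare hash h(ordered:'a');
    h.defineKey('word');
    h.defineData('word', 'count');
    h.defineDone();
    do until (eof);
      set tokens end=eof;              /* "map" output: one key per row */
      if h.find() ne 0 then count = 0; /* new key starts at zero */
      count = count + 1;               /* "reduce": aggregate by key */
      h.replace();
    end;
    h.output(dataset:'word_counts');   /* write the reduced result */
  run;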
Joseph Hinson, Accenture Life Sciences
Paper SAS247-2014:
Smart-Meter Analytical Applications
For electricity retail and distribution companies, the introduction of
smart-meter technologies has been a key investment, reducing the
significant costs associated with meter reading. Electricity companies
continue to look for ways to generate a further dividend from these
investments. This presentation looks at selected practical applications of
smart-meter data: forecasting using smart-meter data as inputs; customer
segmentation; and revenue protection. It aims to show some techniques that
can be used to effectively manage and analyze the large amounts of data
generated by these devices in order to generate business value.
Andrew Cathie, SAS
Distributing SAS® software to a large number of machines can
be challenging at best and exhausting at worst. Common areas of concern
for installers are silent automation, network traffic, ease of setup,
standardized configurations, maintainability, and simply the sheer amount
of time it takes to make the software available to end users. We describe
a variety of techniques for easing the pain of provisioning SAS software,
including the new standalone SAS® Enterprise Guide®
and SAS® Add-in for Microsoft Office installers, as well as
the tried and true SAS® Deployment Wizard record and
playback functionality. We also cover ways to shrink SAS Software Depots,
like the new 'subsetting recipe' feature, in order to ease scenarios
requiring depot redistribution. Finally, we touch on alternate methods for
workstation access to SAS client software, including application
streaming, desktop virtualization, and Java Web Start.
Mark Schneider, SAS
All the documentation about the creation of graphs with SAS®
software states that ODS Graphics is not intended to replace
SAS/GRAPH®. However, ODS Graphics has been included in the Base
SAS® license since SAS® 9.3, while SAS/GRAPH
still requires an additional component license, so there is definitely a
financial incentive to convert to ODS Graphics. This paper gives examples
that can be used to replace commonly created SAS/GRAPH plots, and
highlights the small number of plots that are still very difficult, or
impossible, to create in ODS Graphics.
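As a flavor of the conversions, a SAS/GRAPH line plot and one ODS Graphics
equivalent:

  /* SAS/GRAPH version */
  symbol1 interpol=join value=dot;
  proc gplot data=sashelp.stocks(where=(stock='IBM'));
    plot close*date;
  run; quit;

  /* ODS Graphics equivalent */
  proc sgplot data=sashelp.stocks(where=(stock='IBM'));
    series x=date y=close / markers;
  run;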
Philip Holland, Holland Numerics Ltd
Have you ever needed to use dates as values to loop through a table? For
example, how many events occurred by 1, 2, 3, ... n months ahead? Maybe you
just changed the dates manually and re-ran the query n times. This is a
common need in economic and behavioral sciences. This presentation
demonstrates how to create a table of dates that can be used with SAS®
macro variables to loop through a table. Using this dates table in
combination with the SAS DO loop ensures accuracy and saves time.
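A minimal sketch of the idea, looping a query over month-end cutoffs (the
data set and variable names are invented):

  %macro by_month(start=, n=);
    %do i = 1 %to &n;
      %let cutoff = %sysfunc(intnx(month, "&start"d, &i, end), date9.);
      proc sql;
        create table events_&i as
        select count(*) as n_events
        from events
        where event_date le "&cutoff"d;
      quit;
    %end;
  %mend by_month;

  %by_month(start=01JAN2014, n=6)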
Scott Fawver, Arch Mortgage Insurance Company
Businesses today are inundated with unstructured data: not just social
media, but books, blogs, articles, journals, manuscripts, and even detailed
legal documents. Manually managing unstructured data can be time consuming
and frustrating, and might not yield accurate results. Having an analyst
read documents often introduces bias because analysts have their own
experiences, and those experiences help shape how the text is interpreted.
The fact that people become fatigued can also impact the way that the text
is interpreted. Is the analyst as motivated at the end of the day as they
are at the beginning? Data science involves using data management,
analytical, and visualization strategies to uncover the story that the
data is trying to tell in a more automated fashion. This is important with
structured data but becomes even more vital with unstructured data.
Introducing automated processes for managing unstructured data can
significantly increase the value and meaning gleaned from the data. This
paper outlines the data science processes necessary to ingest, transform,
analyze, and visualize three Star Wars movie scripts: A New Hope, The
Empire Strikes Back, and Return of the Jedi. It focuses on the need to
create structure from unstructured data using SAS® Data
Management, SAS® Text Miner, and SAS® Content
Categorization. The results are featured using SAS® Visual
Analytics.
Adam Maness, SAS
Mary Osborne, SAS
One beautiful graph provides visual clarity of data summaries reported in
tables and listings. Waterfall graphs show, at a glance, the increase or
decrease of data analysis results from various industries. The
introduction of SAS® 9.2 ODS Statistical Graphics enables
SAS® programmers to produce high-quality results with less
coding effort. Also, SAS programmers can create sophisticated graphs in
stylish custom layouts using the SAS® 9.3 Graph Template
Language and ODS style template. This poster presents two sets of example
waterfall graphs in the setting of clinical trials using SAS®
9.3 and later. The first example displays colorful graphs using new SAS
9.3 options. The second example displays simple graphs with gray-scale
color coding and patterns. SAS programmers of all skill levels can create
these graphs on UNIX or Windows.
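A gray-scale sketch in SGPLOT terms (the paper's examples use the Graph
Template Language, and variable names such as pctchg are assumptions):

  proc sort data=tumor out=tumor_s;
    by descending pctchg;
  run;

  data tumor_s;
    set tumor_s;
    subj = _n_;                    /* plotting position, best response first */
  run;

  proc sgplot data=tumor_s;
    vbarparm category=subj response=pctchg / group=trt;
    refline -30 / axis=y lineattrs=(pattern=dash);
    xaxis display=(novalues noticks) label="Subjects (sorted)";
    yaxis label="Best % change from baseline";
  run;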
Setsuko Chiba, Exelixis Inc.
Systematic reviews have become increasingly important in healthcare,
particularly when there is a need to compare new treatment options and to
justify clinical effectiveness versus cost. This paper describes a method
in SAS/STAT® 9.2 for computing weighted averages and
weighted standard deviations of clinical variables across treatment
options while correctly using these summary measures to make accurate
statistical inference. The analyses of data from systematic reviews
typically involve computations of weighted averages and comparisons across
treatment groups. However, the application of the TTEST procedure does not
currently take into account weighted standard deviations when computing
p-values. The use of a default non-weighted standard deviation can lead to
incorrect statistical inference. This paper introduces a method for
computing correct p-values using weighted averages and weighted standard
deviations. Given a data set containing variables for three treatment
options, we want to make pairwise comparisons of three independent
treatments. This is done by creating two temporary data sets using PROC
MEANS, which yields the weighted means and weighted standard deviations.
Subsequently, we perform a t-test on each temporary data set. The
resultant data sets containing all comparisons of each treatment option
are merged and then transposed to obtain the necessary statistics. The
resulting output provides pairwise comparisons of each treatment option
and uses the weighted standard deviations to yield the correct p-values in
a desired format. This method allows the use of correct weighted standard
deviations using PROC MEANS and PROC TTEST in summarizing data from a
systematic review while providing correct p-values.
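The computation reduces to a Satterthwaite t-test applied to the weighted
summaries; a DATA step rendition of the same arithmetic (data set and
variable names assumed, with the two groups' PROC MEANS output merged side
by side):

  /* m1,s1,n1 and m2,s2,n2 come from PROC MEANS with a WEIGHT statement */
  data wtest;
    merge grp1(rename=(mean=m1 std=s1 n=n1))
          grp2(rename=(mean=m2 std=s2 n=n2));
    v1 = s1**2 / n1;
    v2 = s2**2 / n2;
    t  = (m1 - m2) / sqrt(v1 + v2);
    df = (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1));  /* Satterthwaite */
    p  = 2 * (1 - probt(abs(t), df));
  run;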
Ravi Gaddameedi, California State University
Usha Kreaden, Intuitive Surgical
Contrasting two sets of textual data points out important differences. For
example, consider social media data that have been collected on the race
between incumbent Kay Hagan and challenger Thom Tillis in the 2014
election for the seat of US Senator from North Carolina. People talk about
the candidates in different terms for different topics, and you can
extract the words and phrases that are used more in messages about one
candidate than about the other. By using SAS® Sentiment
Analysis on the extracted information, you can discern not only the most
important topics and sentiments for each candidate, but also the most
prominent and distinguishing terms that are used in the discussion. Find
out if Republicans and Democrats speak different languages!
Hilke Reckman, SAS
Michael Wallis, SAS
Richard Crowell, SAS
Linnea Micciulla, SAS
Cheyanne Baird, SAS
Westat utilizes SAS® software as a core capability for
providing clients in government and private industry with analysis and
characterization of survey data. Staff programmers, analysts, and
statisticians use SAS to manage, store, and analyze client data, as well
as to produce tabulations, reports, graphs, and summary statistics.
Because SAS is so widely used at Westat, the organization has built a
comprehensive infrastructure to support its deployment and use. This paper
provides an overview of Westat's SAS support infrastructure, which
supplies resources that are aimed at educating staff, strengthening their
SAS skills, providing SAS technical support, and keeping the staff on the
cutting edge of SAS programming techniques.
Michael Raithel, Westat
One in every four people in the United States dies of heart disease, and
stress is an important factor that contributes to a cardiac event.
Because the condition of the heart gradually worsens with age, we analyze
the factors that lead to a myocardial infarction when patients are
subjected to stress. The data used for this project was obtained from a survey
conducted through the Department of Biostatistics at Vanderbilt
University. The objective of this poster is to predict the chance of
survival of a patient after a cardiac event. Then by using decision trees,
neural networks, regression models, bootstrap decision trees, and ensemble
models, we predict the target which is modeled as a binary variable,
indicating whether a person is likely to survive or die. The top 15
models, each with an accuracy of over 70%, were considered. The final
model gives important survival characteristics of a patient, including
history of diabetes, smoking, hypertension, and angioplasty.
Yogananda Domlur Seetharama, Oklahoma State University
Sai Vijay Kishore Movva, Oklahoma State University
In the day-to-day operations of a Biostatistics and Statistical
Programming department, we are often tasked with generating reports in the
form of tables, listings, and figures (TLFs). Some requests come in the
form of a small number of TLFs, whereas others are more substantial in
magnitude. Regardless, creating a single document for distribution and
review might be required after all TLFs have been completed. A common
setting in the pharmaceutical industry is to develop SAS®
code in which individual programs generate one or more TLFs in some
standard formatted output such as RTF or PDF with a common look and feel.
Furthermore, programs are developed over time, with the production run in
batch mode. The result is a set of TLFs completed at different times.
Creation of a final (single) document with a properly sectioned and
hyperlinked Table of Contents, as well as dynamic page numbering, might be
wanted. The ability to deliver a single document greatly simplifies
document management and electronic review for many users. Many options
have been proposed that post-process individual RTF or PDF files. An
alternative approach, which uses ODS Document, is introduced. Unlike many
other techniques, ODS Document uses intermediate files called 'item
stores' that are independent of the ODS destination. This technique has
proven successful across multiple projects in our specific setting and
continues to show promise in other applications as well.
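A sketch of the pattern: each TLF program writes to its own item store,
and a final assembly program replays the stores into a single destination
(librefs and names are placeholders):

  /* in each TLF program */
  ods document name=tlfstore.t_14_01(write);
  proc freq data=adsl;
    tables trt01p;
  run;
  ods document close;

  /* final assembly */
  ods pdf file="final_tlfs.pdf" contents=yes;
  proc document name=tlfstore.t_14_01;
    replay;
  run;
  proc document name=tlfstore.t_14_02;
    replay;
  run;
  quit;
  ods pdf close;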
William Coar, Axio Research
Before you can analyze your big data, you need to prepare the data for
analysis. This paper discusses capabilities and techniques for using the
power of SAS® to prepare big data for analytics. It focuses
on how a SAS user can write code that will run in a Hadoop cluster and
take advantage of the massive parallel processing power of Hadoop.
Donna De Capite, SAS
SAS® platform installations are large, complex, growing, and
ever-changing enterprise systems that support many diverse groups of users
and content. A reliable metadata security implementation is critical for
providing access to business resources in a methodical, organized,
partitioned, and protected manner. With natural changes to users, groups,
and folders from an organization's day-to-day activities, deviations from
an original metadata security plan are very likely and can put protected
resources at risk. Regular security testing can ensure compliance, but,
given existing administrator commitments and the time consuming nature of
manual testing procedures, it doesn't tend to happen. This paper discusses
concepts and outlines several example test specifications from an
automated metadata security testing framework being developed by Metacoda.
With regularly scheduled, automated testing, using a well-defined set of
test rules, administrators can focus on their other work, and let alerts
notify them of any deviations from a metadata security test specification.
Paul Homes, Metacoda
With the smartphone and mobile app market developing so rapidly,
expectations about the effectiveness of mobile applications are high.
Marketers and app developers need to analyze the huge amount of data
available well before an app's release, not only to better market the
app, but also to avoid costly
mistakes. The purpose of this poster is to build models to predict the
success rate of an app to be released in a particular category. Data has
been collected for 540 Android apps under the 'Top free newly released
apps' category from https://play.google.com/store. The SAS®
Enterprise Miner™ Text Mining node and SAS®
Sentiment Analysis Studio are used to parse and tokenize the collected
customer reviews and also to calculate the average customer sentiment
score for each app. Linear regression, neural, and auto-neural network
models have been built to predict the rank of an app by considering
average rating, number of installations, total number of reviews, number
of 1-5 star ratings, app size, category, content rating, and average
customer sentiment score as independent variables. A linear regression
model with the lowest average squared error is selected as the best model,
and the number of installations and app maturity (content rating) are
considered significant model variables. App category, user reviews, and average
customer sentiment score are also considered as important variables in
deciding the success of an app. The poster summarizes the app success
trends across various factors and also introduces a new SAS®
macro %getappdata, which we have developed for web crawling and text
parsing.
Vandana Reddy, Oklahoma State University
Chinmay Dugar, Oklahoma State University
Global businesses must react to daily changes in market conditions over
multiple geographies and industries. Consuming reputable daily economic
reports assists in understanding these changing conditions, but requires
both a significant human time commitment and a subjective assessment of
each topic area of interest. To combat these constraints, Dow's Advanced
Analytics team has constructed a process to calculate sentence-level topic
frequency and sentiment scoring from unstructured economic reports. Daily
topic sentiment scores are aggregated to weekly and monthly intervals and
used as exogenous variables to model external economic time series data.
These models serve both to validate our sentiment scoring process and to
provide near-term forecasts where daily or weekly variables are
unavailable. This paper will first describe our process of
using SAS® Text Miner to import and discover economic topics
and sentiment from unstructured economic reports. The next section
describes sentiment variable selection techniques that use
SAS/STAT®, SAS/ETS®, and SAS®
Enterprise Miner™ to generate similarity measures to
economic indices. Our process then uses ARIMAX modeling in SAS®
Forecast Studio to create economic index forecasts with topic sentiments.
Finally, we show how the sentiment model components are used as a matrix
of economic key performance indicators by topic and geography.
Michael P. Dessauer, The Dow Chemical Company
Justin Kauhl, Tata Consultancy Services
Nowadays, in the Big Data era, Business Intelligence Departments collect,
store, process, calculate, and monitor massive amounts of data.
Nevertheless, sometimes hundreds of metrics built on the structured data
are insufficient to explain why a deal sold better or worse than
expected. The answer might be found in text data that every company owns,
yet it is not aware of its possible uses or neglects its value. This
project shows text mining methods, implemented in SAS® Text
Miner 12.1, that enable the determination of a deal's success or failure
factors based on in-house or Internet-scattered customers' views and
opinions. The study is conducted on data gathered from Groupon Sp. z o.o.
(Polish business unit) - e-commerce company, as it is assumed that the
market is by and large a customer-driven environment.
Rafal Wojdan, Warsaw School of Economics
'Can I have that in Excel?' This is a request that makes many of us
shudder. Now your boss has discovered Microsoft Excel pivot tables.
Unfortunately, he has not discovered how to make them. So you get to
extract the data, massage the data, put the data into Excel, and then
spend hours rebuilding pivot tables every time the corporate data is
refreshed. In this workshop, you learn to be the armchair quarterback and
build pivot tables without leaving the comfort of your SAS®
environment. You learn the basics of Excel pivot tables and, through a
series of exercises, how to augment basic pivot tables first in Excel,
and then using SAS. No prior knowledge of Excel pivot tables is required.
Peter Eberhardt, Fernwood Consulting Group Inc.
In a clinical study, we often set up multiple hypotheses with regard to
the study result. However, the multiplicity problem arises immediately
when the hypotheses are tested in a univariate manner. Some widely applied
methods for controlling the overall type I error rate are discussed in
this paper; in addition to the methodology, we introduce their application
in a study case and provide the SAS® code.
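While the paper supplies its own code, PROC MULTTEST is one standard tool
for adjusting a set of raw p-values (the INPVALUES= data set must contain
a variable named raw_p; the values here are invented):

  data pvals;
    input test $ raw_p;
    datalines;
  H1 0.012
  H2 0.034
  H3 0.210
  ;
  run;

  proc multtest inpvalues=pvals bonferroni hochberg fdr;
  run;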
Lixiang Yao, icon
Once upon a time, a writer compared a desert to a labyrinth. A desert has
no walls or stairways, but you can still find yourself utterly lost in it.
And oftentimes, when you think you found that oasis you were looking for,
what you are really seeing is an illusion, a mirage. Similarly, logical
fallacies and misleading data patterns can easily deceive the unaware data
explorer. In this paper, we discuss how they can be recognized and
neutralized with the power of the SAS® Visual Analytics
Explorer. Armed with this knowledge, you will be able to safely navigate
the dunes to find true insights and avoid false conclusions.
Nascif Abousalh-Neto, SAS
We continually work with our hardware partners to establish best practices
with regard to tuning the latest hardware components that are released
each year. This paper goes over the latest tuning guidelines for your
hardware infrastructure, including your host computer system, operating
system, and complete I/O infrastructure (from the computer host and
network adapters down through the physical storage). Our findings are
published in SAS® papers on the SAS website,
support.sas.com, with updates posted to the SAS Administration blog.
Margaret Crevar, SAS
Tony Brown, SAS
For decades, SAS® has been the cornerstone of many
organizations for business reporting. In more recent times, the ability to
quickly determine the performance of an organization through the use of
dashboards has become a requirement. Different ways of providing dashboard
capabilities are discussed in this paper: using out-of-the-box solutions
such as SAS® Visual Analytics and SAS® BI
Dashboard, through to alternative solutions using SAS®
Stored Processes, batch processes, and SAS® Integration
Technologies. Extending the available indicators is also discussed, using
Graph Template Language and KPI indicators provided with Base
SAS®, as well as alternatives such as Google Charts and
Flash objects. Real-world field experience, problem areas, solutions, and
tips are shared, along with live examples of some of the different
methods.
Mark Bodt, The Knowledge Warehouse (Knoware)
The FORMAT procedure in SAS® is a very powerful and
productive tool, yet many beginning programmers rarely make use of it. The
FORMAT procedure provides a convenient way to do a table lookup in SAS.
User-generated FORMATS can be used to assign descriptive labels to data
values, create new variables, and find unexpected values. PROC FORMAT can
also be used to generate data extracts and to merge data sets. This paper
provides an introductory look at PROC FORMAT for the beginning user and
provides sample code that illustrates the power of PROC FORMAT in a number
of applications. Additional examples and applications of PROC FORMAT can
be found in the SAS® Press book titled 'The Power of PROC
FORMAT.'
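A minimal sketch of the table-lookup idea described above; the format name
and age ranges are invented for illustration.

   proc format;
      value agegrp
         low -< 13 = 'Child'
         13  -< 20 = 'Teen'
         20 - high = 'Adult';
   run;

   data people;
      set sashelp.class;
      length group $5;
      group = put(age, agegrp.);  /* lookup: map a value to a label */
   run;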
Jonas Bilenas, Barclays UK&E RBB
The SAS® Enterprise Guide® Query Builder is
one of the most powerful components of the software. It enables a user to
bring in data, join, drop and add columns, compute new columns, sort,
filter data, leverage the advanced expression builder, change column
attributes, and more! This presentation provides an overview of the major
features of this powerful tool and how to leverage it every day.
Jennifer First-Kluge, Systems Seminar Consultants
Steven First, Systems Seminar Consultants
Raking (iterative proportional fitting) is a procedure that takes sampling
weights from complex sample surveys and adjusts them so that they add to
known control totals. This process reduces variance and adjusts for
undercoverage. But raking in multiple dimensions can lead to extreme
weights, which increase variance. Trimming is another sample weighting
procedure that reduces extreme weights to cutoffs, thereby improving
variance properties while potentially introducing bias. The RAKE-TRIM
macro combines raking and trimming in an iterative algorithm to achieve
these two goals simultaneously. The raking reduces the bias potential from
trimming, and the trimming reduces the variance inflation from raking.
When convergence occurs, the final weights aggregate to the control
totals, as well as respect the trimming limits. SAS® macros
are well suited for this kind of envelope program: the larger macro
consists of the integration of component macros that were developed for
other applications. A parameter specification sheet enables users to
provide all of the parameters needed to define the algorithm for their
particular situation, and, if necessary, to alter the parameters to
facilitate convergence. Diagnostics are included when convergence fails.
Microsoft Excel tables are imported to provide the cell structure and are
exported to provide statistics for the algorithm's results. This RAKE-TRIM
macro was first developed in 2010 for the 2009 National Household
Transportation Survey and has been used in other studies as well. The
paper describes the algorithm and discusses our experiences with it.
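The RAKE-TRIM macro itself is not reproduced here, but a single raking
step for one margin, followed by trimming to a cutoff, can be sketched as
follows; the data set names, the REGION dimension, and the cutoff of 4 are
all hypothetical.

   proc sql;
      /* current weight totals for one raking dimension */
      create table margins as
      select region, sum(weight) as wtsum
      from work.sample
      group by region;

      /* scale weights so they aggregate to the control totals */
      create table raked as
      select s.*, s.weight * (c.control / m.wtsum) as weight_r
      from work.sample s, margins m, work.controls c
      where s.region = m.region and m.region = c.region;
   quit;

   data trimmed;
      set raked;
      weight_t = min(weight_r, 4);  /* illustrative trimming cutoff */
   run;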
Louis Rizzo, Westat
Direct marketing is the practice of delivering promotional messages
directly to potential customers on an individual basis rather than by
using mass medium. In this project, we build a finely tuned response model
that helps a financial services company to select high-quality receptive
customers for their future campaigns and to identify the important factors
that influence marketing to effectively manage their resources. This study
was based on the customer solicitation center's marketing campaign data
(45,211 observations and 18 variables) available on UC Irvine's web site,
with attributes of present and past campaign information (communication
type, contact duration, previous campaign outcome, and so on) and the
customer's personal and banking information. As part of data preparation,
we performed mean imputation to handle missing values and categorical
recoding to reduce the number of levels of class variables. In this study,
we built several predictive models using the SAS® Enterprise
Miner™ models Decision Tree, Neural Network, Logistic
Regression, and SVM to predict whether the customer responds to the loan
offer by subscribing. The results showed that the Stepwise Logistic
Regression model was the best when chosen based on the misclassification
rate criterion. When the top 3 decile customers were selected based on the
best model, the cumulative response rate was 14.5% in contrast to the
baseline response rate of 5%. Further analysis showed that the customers
are more likely to subscribe to the loan offer if they have the following
characteristics: never been contacted in the past, no default history, and
provided cell phone as primary contact information.
Arun Mandapaka, Oklahoma State University
Amit Kushwah, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
'This is the way I have always done it, and it works fine for me.' Have you
heard yourself or others say this when someone suggests a new technique to
help solve a problem? Most of us have a set of tricks and techniques from
which we draw when starting a new project. Over time we might overlook
newer techniques because our old toolkit works just fine. Sometimes we
actively avoid new techniques because our initial foray leaves us daunted
by the steep learning curve to mastery. For me, the PRX functions and the
SAS® hash object fell into this category. In this workshop,
we address possible objections to learning to use the SAS hash object. We
start with the fundamentals of setting up the hash object and work through
a variety of practical examples to help you master this powerful
technique.
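A minimal example of the setup the workshop starts from, assuming a small
lookup table work.rates keyed by ID; all names are illustrative.

   data priced;
      length rate 8;
      if _n_ = 1 then do;
         declare hash h(dataset: 'work.rates');
         h.defineKey('id');
         h.defineData('rate');
         h.defineDone();
         call missing(rate);
      end;
      set work.transactions;
      if h.find() = 0 then output;  /* keep rows whose key is found */
   run;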
Peter Eberhardt, Fernwood Consulting Group Inc.
Applying models to analyze sports data has always been done by teams
across the globe. The film Moneyball has generated much hype about how a
sports team can use data and statistics to build a winning team. The
objective of this poster is to use the model comparison algorithm of
SAS® Enterprise Miner™ to pick the best
model that can predict the outcome of a soccer game. It is hence important
to determine which factors influence the results of a game. The data set
used contains input variables about a team's offensive and defensive
abilities and the outcome of a game is modeled as a target variable. Using
SAS Enterprise Miner, multinomial regression, neural networks, decision
trees, ensemble models and gradient boosting models are built. Over 100
different versions of these models are run. The data contains statistics
from the 2012-13 English premier league season. The competition has 20
teams playing each other in a home and away format. The season has a total
of 380 games; the first 283 games are used to predict the outcome of the
last 97 games. The target variable is treated as both a nominal and an
ordinal variable with 3 levels for home win, away win, and tie. The
gradient boosting model is the winning model which seems to predict games
with 65% accuracy and identifies factors such as goals scored and ball
possession as more important compared to fouls committed or red cards
received.
Vandana Reddy, Oklahoma State University
Sai Vijay Kishore Movva, Oklahoma State University
In the traveling salesman problem, a salesman must minimize travel
distance while visiting each of a given set of cities exactly once. This
paper uses the SAS/OR® OPTMODEL procedure to formulate and
solve the traveling baseball fan problem, which complicates the traveling
salesman problem by incorporating scheduling constraints: a baseball fan
must visit each of the 30 Major League ballparks exactly once, and each
visit must include watching a scheduled Major League game. The objective
is to minimize the time between the start of the first attended game and
the end of the last attended game. One natural integer programming
formulation involves a binary decision variable for each scheduled game,
indicating whether the fan attends. But a reformulation as a
side-constrained network flow problem yields much better solver
performance.
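The side-constrained network reformulation is the paper's contribution and
is not reproduced here; the sketch below shows only the naive binary-variable
idea on a toy schedule. The parks, game times, and big-M linearization are
invented, and travel-time feasibility constraints are omitted.

   data games;
      input gid park $ gstart gend;
      datalines;
   1 NYY 0 3
   2 BOS 30 33
   3 NYY 50 53
   4 BOS 80 83
   ;

   proc optmodel;
      set GAMES;
      str park {GAMES};
      num gstart {GAMES}, gend {GAMES};
      read data games into GAMES=[gid] park gstart gend;
      set <str> PARKS = setof {g in GAMES} park[g];
      num M = max {g in GAMES} gend[g];
      var Attend {GAMES} binary;
      var TStart >= 0, TEnd >= 0;
      min Span = TEnd - TStart;
      con OnePerPark {p in PARKS}:    /* one game at each park */
         sum {g in GAMES: park[g] = p} Attend[g] = 1;
      con StartDef {g in GAMES}:      /* span covers attended games */
         TStart <= gstart[g] + M*(1 - Attend[g]);
      con EndDef {g in GAMES}:
         TEnd >= gend[g] - M*(1 - Attend[g]);
      solve;
      print Attend;
   quit;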
Tonya Chapman, SAS
Matt Galati, SAS
Rob Pratt, SAS
Identifying claim fraud using predictive analytics represents a unique
challenge. 1. Predictive analytics generally requires that you have a
target variable which can be analyzed. Fraud is unique in this regard in
that there is a lot of fraud that has occurred historically that has not
been identified. Therefore, the definition of the target variable is
difficult. 2. There is also a natural assumption that the past will bear
some resemblance to the future. In the case of fraud, methods of
defrauding insurance companies change quickly and can make the analysis of
a historical database less valuable for identifying future fraud. 3. In an
underlying database of claims that may have been determined to be
fraudulent by an insurance company, there is often an inconsistency
between different claim adjusters regarding which claims are referred for
investigation. This inconsistency can lead to erroneous model results due
to data that is not homogeneous. This paper will demonstrate how analytics
can be used in several ways to help identify fraud: 1. More consistent
referral of suspicious claims 2. Better identification of new types of
suspicious claims 3. Incorporating claim adjuster insight into the
analytics results. As part of this paper, we will demonstrate the
application of several approaches to fraud identification: 1. Clustering
2. Association analysis 3. PRIDIT (Principal Component Analysis of RIDIT
scores).
Roosevelt C. Mosley, Pinnacle Actuarial Resources, Inc.
Nick Kucera, Pinnacle Actuarial Resources, Inc.
HTML5 has become the de facto standard for web applications. As a result,
the lingua franca object notation of the web services that the web
applications call has switched from XML to JSON. JSON is remarkably easy
to parse in JavaScript, but so far SAS doesn't have any native JSON
parsers. The Facebook Graph API dropped XML support a few years ago. This
paper shows how we can parse the JSON in SAS by calling an external
script, using PROC GROOVY to parse it inside of SAS, or by parsing the
JSON manually with a DATA step. We'll extract the data from the Facebook
Graph API and import it into an OLAP data mart to report and analyze a
marketing campaign's effectiveness.
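As a taste of the manual DATA step approach, the sketch below pulls flat
"id"/"message" pairs out of a saved response with Perl regular expressions.
The file name and field names are assumptions, and real Graph API payloads
are nested and need more care.

   data posts;
      infile 'response.json' lrecl=32767 truncover;
      input line $char32767.;
      length id $40 message $200;
      retain rx_id rx_msg;
      if _n_ = 1 then do;
         rx_id  = prxparse('/"id"\s*:\s*"([^"]*)"/');
         rx_msg = prxparse('/"message"\s*:\s*"([^"]*)"/');
      end;
      if prxmatch(rx_id, line) then do;
         id = prxposn(rx_id, 1, line);
         if prxmatch(rx_msg, line) then
            message = prxposn(rx_msg, 1, line);
         output;
      end;
      keep id message;
   run;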
Philihp Busby, SAS
This new SAS® tool is a two-dimensional color chart for
visualizing changes in a population or in a system over time. Data for one
point in time appear as a thin horizontal band of color. Bands for
successive periods are stacked up to make a two-dimensional plot, with the
vertical direction showing changes over time. As a system evolves over
time, different kinds of events have different characteristic patterns.
Creation of Time Contour plots is explained step-by-step. Examples are
given in astrostatistics, biostatistics, econometrics, and demographics.
David Corliss, Magnify Analytic Solutions
Changes in health insurance and other industries often have a spatial
component. Maps can be used to convey this type of information to the user
more quickly than tabular reports and other non-graphical formats. SAS®
provides programmers and analysts with the tools to not only create
professional and colorful maps, but also the ability to display spatial
data on these maps in a meaningful manner that aids in the understanding
of the changes that have transpired. This paper illustrates the creation
of a number of different maps for displaying change over time with
examples from the health insurance arena.
Barbara Okerson, WellPoint
SAS® Management Console was designed to control and monitor
virtually all of the parts and features of the SAS®
Intelligence Platform. However, administering even a small SAS®
Business Intelligence system can be a daunting task. This paper presents a
few techniques that will help you simplify your administrative tasks and
enable you and your user community to get the most out of your system. The
SAS® Metadata Server stores most of the information required
to maintain and run the SAS Intelligence Platform, which is obviously the
heart of SAS BI. It stores information about libraries, users, database
logons, passwords, stored processes, reports, OLAP cubes, and a myriad of
other information. Organization of this metadata is an essential part of
an optimally performing system. This paper discusses ways of organizing
the metadata to serve your organization well. It also discusses some of
the key features of SAS Management Console and best practices that will
assist the administrator in defining roles, promoting, archiving, backing
up, securing, and simply just organizing the data so that it can be found
and accessed easily by administrators and users alike.
Michael Sadof, MGS Associates, Inc.
No need to fret, Base SAS® programmers. Converting to
SAS® Enterprise Guide® is a breeze, and it
provides so many advantages. Coding remote connections to SAS®
servers is a thing of the past. Generate WYSIWYG prompts to increase the
usage of the SAS code and to create reports and SAS® Stored
Processes to share easily with people who don't use SAS Enterprise Guide.
The first and most important thing, however, is to change the default
options and preferences to tame SAS Enterprise Guide, making it behave
similarly to your Base SAS ways. I cover all of these topics and provide
demos along the way.
Angela Hall, SAS
As a longtime Base SAS® programmer, whether to use a
different application for programming is a constant question when powerful
applications such as SAS® Enterprise Guide®
are available. This paper provides some important tips for a programmer,
such as the best way to use the code window and how to take advantage of
system-generated code in SAS Enterprise Guide 5.1. This paper also
explains the differences between some of the functions and procedures in
Base SAS and SAS Enterprise Guide. It highlights features in SAS
Enterprise Guide such as process flow, data access management, and report
automation, including formatting using XML tag sets.
Anjan Matlapudi, AmerihealthCaritas
This paper gives you a better idea of how and where to use the record
lookup functions to locate observations where a variable has some
characteristic. Various related functions are illustrated to search
numeric and character values in this process. Code is shown with time
comparisons. I will discuss three possible ways to retrieve records using
the SAS® DATA step, PROC SQL, and Perl regular expressions.
Real and CPU time processing issues are highlighted when comparing record
retrieval across these methods. Although the program was written for
the PC using SAS® 9.2 in a Windows XP 32-bit environment,
all the functions are applicable to any system. All the tools discussed
are in Base SAS®. The typical attendee or reader will have
some experience in SAS, but not a lot of experience dealing with large
amounts of data.
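The three approaches can be sketched on a shipped data set; here each step
keeps records whose NAME contains 'al', case-insensitively.

   data hits_data_step;
      set sashelp.class;
      if find(name, 'al', 'i') > 0;
   run;

   proc sql;
      create table hits_sql as
      select *
      from sashelp.class
      where upcase(name) like '%AL%';
   quit;

   data hits_prx;
      set sashelp.class;
      if prxmatch('/al/i', name);
   run;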
Anjan Matlapudi, AmerihealthCaritas
Two new production features offered in the Output Delivery System (ODS) in
SAS® 9.4 are ODS LAYOUT and the ODS Report Writing
Interface. This one-two punch gives you power and flexibility in
structuring your SAS® output. What are the strengths for
each? How do they differ? How do they interact? This paper highlights the
similarities and differences between the two and illustrates the
advantages of using them together. Why go twelve rounds? Make your report
a knockout with ODS LAYOUT and the Report Writing Interface.
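A small sketch of the one-two punch, with an illustrative output file name:
ODS LAYOUT arranges two procedure results side by side, and the Report
Writing Interface then writes a free-form line.

   ods pdf file='knockout.pdf';

   ods layout gridded columns=2;
   ods region;
   proc print data=sashelp.class(obs=5); run;
   ods region;
   proc means data=sashelp.class mean max;
      var height;
   run;
   ods layout end;

   data _null_;                      /* Report Writing Interface */
      dcl odsout ob();
      ob.format_text(data: 'Written with the Report Writing Interface');
   run;

   ods pdf close;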
Daniel Kummer, SAS
This paper introduces basic-to-advanced strategies and syntax, the tools
of the SAS® trade, that enable client-quality PDF output to
be delivered through a production system of macro programs. A variety of
PROC REPORT output with proven client value serves to illustrate a
discussion of the fundamental syntax used to create and share formats,
macro programs, PROC REPORT output, inline styles, and style templates.
The syntax is integrated into basic macro programs that demonstrate the
core functionality of the reporting system. Later sections of the
paper describe in detail the macro programs used to start and end a PDF:
(a) programs to save all current titles, footnotes, and option settings,
establish standard titles, footnotes and option settings, and initially
create the PDF document; and (b) programs to create a final standard data
documentation page, end the PDF, and restore all original titles,
footnotes, and option settings. The paper also shows how macro programs
enable the setting of inline styles at the global, macro program, and
macro program call-levels. The paper includes the style template syntax
and the complete PROC REPORT syntax generated by the macro programs, and
is designed for the intermediate to advanced SAS programmer using
Foundation SAS® for Release 9.2 on a Windows operating
system.
Patrick Thornton, SRI International
When assisting SAS® customers who are experiencing
performance issues, we are often asked by the SAS users at a customer site
for the top 10 guidelines to share with those who have taken on the role
of system administrator or SAS administrator. This paper points you to
where you can get more information regarding each of the guidelines and
related details on the SAS website.
Margaret Crevar, SAS
Tony Brown, SAS
One of the most striking features separating SAS® from other
statistical languages is that SAS has native SQL (Structured Query
Language) capacity. In addition to the merging or the querying that a SAS
user commonly applies in daily practice, SQL significantly enhances the
power of SAS in descriptive statistics and data management. In this paper,
we show reproducible examples to introduce 10 useful tips for the SQL
procedure in the BASE module.
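One such tip, as a hedged example with sashelp.cars standing in for real
data: compute and filter grouped descriptive statistics in a single query.

   proc sql;
      select type,
             count(*)                      as n,
             mean(mpg_city)                as avg_city format=5.1,
             max(mpg_city) - min(mpg_city) as spread
      from sashelp.cars
      group by type
      having count(*) > 10
      order by avg_city desc;
   quit;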
Chao Huang, Oklahoma State University
Do you often create SAS® web applications? Do you need to
update or retrieve values from a SAS data set and display them in a
browser? Do you need to show the results of a SAS® Stored
Process in a browser? Are you finding it difficult to figure out how to
pass parameters from a web page to a SAS Stored Process? If you answered
yes to any of these questions, then look no further. Techniques shown in
this paper include: How to take advantage of JavaScript and minimize PUT
statements. How to call a SAS Stored Process from your web page by using
JavaScript and XMLHTTPRequest. How to pass parameters from a web page to a
SAS Stored Process and from a SAS Stored Process back to the web page. How
to use simple Ajax to refresh and update a specific part of a web page
without the need to reload the entire page. How to apply Cascading Style
Sheets (CSS) on your web page. How to use some of the latest HTML5
features, like drag and drop. How to display run-time graphs in your web
page by using STATGRAPH and PROC SGRENDER. This paper contains sample code
that demonstrates each of the techniques.
Yogendra Joshi, SAS
SAS® Add-In for Microsoft Office remains a popular tool for
people who are not SAS® programmers due to its easy
interface with the SAS servers. In this session, you'll learn some of the
many tricks that other organizations use for getting more value out of the
tool.
Tricia Aanderud, And Data Inc
The independent means t-test is commonly used for testing the equality of
two population means. However, this test is very sensitive to violations
of the population normality and homogeneity of variance assumptions. In
such situations, Yuen's (1974) trimmed t-test is recommended as a robust
alternative. The purpose of this paper is to provide a SAS®
macro that allows easy computation of Yuen's symmetric trimmed t-test. The
macro output includes a table with trimmed means for each of two groups,
Winsorized variance estimates, degrees of freedom, and obtained value of t
(with two-tailed p-value). In addition, the results of a simulation study
are presented and provide empirical comparisons of the Type I error rates
and statistical power of the independent samples t-test, Satterthwaite's
approximate t-test, and the trimmed t-test when the assumptions of
normality and homogeneity of variance are violated.
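For reference, the statistic the macro computes has the form given by Yuen
(1974); the notation below is chosen for this summary:

$$ t_Y = \frac{\bar{X}_{t1} - \bar{X}_{t2}}{\sqrt{d_1 + d_2}}, \qquad
   d_j = \frac{(n_j - 1)\, s_{wj}^2}{h_j (h_j - 1)}, $$

where $\bar{X}_{tj}$ is the trimmed mean of group $j$, $s_{wj}^2$ is its
Winsorized variance, and $h_j = n_j - 2g_j$ is the effective sample size
after trimming $g_j$ observations from each tail; the degrees of freedom
follow the Welch-type approximation
$\nu = (d_1 + d_2)^2 / \bigl( d_1^2/(h_1 - 1) + d_2^2/(h_2 - 1) \bigr)$.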
Patricia Rodriguez de Gil, University of South Florida
Anh P. Kellermann, University of South Florida
Diep T. Nguyen, University of South Florida
Eun Sook Kim, University of South Florida
Jeffrey D. Kromrey, University of South Florida
As SAS® professionals, we often wish our clients would make
more use of the many excellent SAS tools at their disposal. However, it
remains an indisputable fact that for many business users, Microsoft Excel
is still their go-to application when it comes to carrying out any form of
data analysis. There have been many attempts to integrate SAS and Excel,
but none of these has up to now been entirely seamless. This paper
addresses that problem by showing how, with a minimum of VBA (Visual Basic
for Applications) code and by using the SAS Integrated Object Model (IOM)
together with Microsoft's ActiveX Data Objects (ADO), we can create an
Excel User Defined Function (UDF) that can accept parameters, carry out
all data manipulations in SAS, and return the result to the spreadsheet in
a way that is completely invisible to the user. They can nest or link
these functions together just as if they were native Excel functions. We
then go on to demonstrate how, using the same techniques, we can create
small Excel applications that can perform sophisticated data analyses in
SAS while not forcing users out of their Excel comfort zones.
Chris Brooks, Melrose Analytics Ltd
As a retailer, your bottom line is determined by supply and demand. Are
you supplying what your customer is demanding? Or do they have to go look
somewhere else? Accurate allocation and size optimization mean your
customer will find what they want more often. And that means more sales,
higher profits, and fewer losses for your organization. In this session,
Linda Canada will share how DSW went from static allocation models without
size capability to precision allocation using intelligent, dynamic models
that incorporate item plans and size optimization.
Linda Canada, DSW Inc.
You don't have to be with the CIA to discover why your SAS®
stored process is producing clandestine results. In this talk, you will
learn how to use prompts to get the results you want, work with the
metadata to ensure correct results, and even pick up simple coding tricks
to improve performance. You will walk away with a new decoder ring that
allows you to discover the secrets of the SAS logs!
Tricia Aanderud, And Data Inc
Angela Hall, SAS
Understanding previous research in key domain areas can help R&D
organizations focus new research in non-duplicative areas and ensure that
future endeavors do not repeat the mistakes of the past. However, manual
analysis of previous research efforts can prove insufficient to meet these
ends. This paper highlights how a combination of SAS® Text
Analytics and SAS® Visual Analytics can deliver the
capability to understand key topics and patterns in previous research and
how it applies to a current research endeavor. We will explore these
capabilities in two use cases. The first will be in uncovering trends in
publicly visible government funded research (SBIR) and how these trends
apply to future research in nanotechnology. The second will be visualizing
past research trends in publicly available NASA publications, and how
these might impact the development of next-generation spacecraft.
Tom Sabo, SAS
SAS® provides a wide variety of products and solutions that
address analytics, data management, and reporting. It can be challenging
to understand how the data and processes in a SAS deployment relate to
each other and how changes in your processes affect downstream consumers.
This paper presents visualization and reporting tools for lineage and
impact analysis. These tools enable you to understand where the data for
any report or analysis originates or how data is consumed by data
management, analysis, or reporting processes. This paper introduces new
capabilities to import metadata from third-party systems to provide
lineage and impact analysis across your enterprise.
Liz McIntosh, SAS
Nancy Rausch, SAS
Bryan Wolfe, SAS
The DOW-loop is not official terminology that one can find in SAS®
documentation, but it has been well known and widely used among
experienced SAS programmers. The DOW-loop was developed over a decade ago
by a few SAS gurus, including Don Henderson, Paul Dorfman, and Ian
Whitlock. A common construction of the DOW-loop consists of a DO-UNTIL
loop with a SET and a BY statement within the loop. This construction
isolates actions that are performed before and after the loop from the
action within the loop, which results in eliminating the need for
retaining or resetting the newly created variables to missing in the DATA
step. In this talk, in addition to explaining the DOW-loop construction,
we review how to apply the DOW-loop to various applications.
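A minimal sketch of that construction; the data set, BY variable, and
accumulator are illustrative, and work.sales is assumed to be sorted by ID.

   data totals;
      do until (last.id);
         set work.sales;
         by id;
         total = sum(total, amount);  /* no RETAIN, no reset needed */
      end;
      output;                         /* exactly one row per ID */
   run;

Because the DATA step iterates once per BY group, TOTAL automatically
starts fresh at each group.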
Arthur Li, City of Hope
Paper 2446-2014:
UniCredit Leverages Teradata Appliance for SAS to Analyze Sales and
Business Network Analysis with SAS® Visual Analytics
UniCredit Group is a large financial institution (G-Sifi) with a clear
focus to develop and execute a data governance strategy. To deliver this
focus, UniCredit implemented a robust environment to support the advanced
analytics process that is directly connected to the Teradata Data
Warehouse. This presentation highlights how UniCredit developed an
analytic program for the Region Italy, covering the business needs in an
integrated and highly governed environment. The CFO's aim is to use the
analytical business tools for monitoring Sales Area and Business Network
analysis with the adoption of SAS® Visual Analytics on the
Teradata Appliance for SAS®, Model 720.
Roberto Monachino, UniCredit Group
SAS® Visual Analytics is a unique tool that provides both
exploratory and predictive data analysis capabilities. As the visual part
of the name suggests, the rendering of this analysis in the form of
visuals (crosstabs, line charts, histograms, scatter plots, geo maps,
treemaps, and so on) makes this a very useful tool. Join me as I walk you
down the path of exploring the capabilities of SAS Visual Analytics 6.3,
starting with data stored in a desktop application as multiple Microsoft
Excel files. Together, we import the data into SAS Visual Analytics,
prepare the data using the data builder, load the data into SAS®
LASR™ Analytic Server, explore data, and create reports.
Beena Mathew, SAS
Michelle Wilkie, SAS
You have built the simple bar chart and mastered the art of layering
multiple plot statements to create complex graphs like the Survival Plot
using the SGPLOT procedure. You know all about how to use plot statements
creatively to get what you need and how to customize the axes to achieve
the look and feel you want. Now it's time to up your game and step into
the realm of the Graphics Wizard. Behold the magical powers of Graph
Template Language Layouts! Here you will learn the esoteric art of
creating complex multi-cell graphs using LAYOUT LATTICE. This is the
incantation that gives you the power to build complex, multi-cell graphs
like the Forest plot, Stock plots with multiple indicators like MACD and
Stochastics, Adverse Events by Relative Risk graphs, and more. If you ever
wondered how the Diagnostics panel in the REG procedure was built, this
paper is for you. Be warned, this is not the realm for the faint of heart!
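A skeletal LAYOUT LATTICE template, two cells only; the template and
variable names are illustrative.

   proc template;
      define statgraph twocell;
         begingraph;
            entrytitle 'Two-Cell Lattice';
            layout lattice / columns=2;
               layout overlay;
                  scatterplot x=height y=weight;
               endlayout;
               layout overlay;
                  histogram weight;
               endlayout;
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=sashelp.class template=twocell;
   run;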
Sanjay Matange, SAS
Paper SAS317-2014:
Use SAS® Studio to Build Analytical Models to Explore and
Analyze Your Data
The new SAS® Studio application is a web-based interface
that provides point-and-click methods that enable you to access a set of
commonly used analytical tasks without having to install SAS®
on your local machine. This paper shows how you can use the analytical
tasks to explore your data, build a model, and analyze the results right
in your web browser on any Windows, Mac, or mobile device. No SAS
programming experience is required to run these tasks, but this
application displays the automatically generated SAS procedure code for
users who are interested in learning and understanding SAS procedure
syntax.
Udo Sglavo, SAS
When deploying SAS® code into a production environment, a
programmer should ensure that the code satisfies the following key
criteria: The code runs without errors. The code performs operations
consistent with the agreed upon business logic. The code is not dependent
on manual human intervention. The code performs necessary checks in order
to provide sufficient quality control of the deployment process. Base
SAS® programming offers a wide range of techniques to
support the last two aforementioned criteria. This presentation
demonstrates the use of SAS® macro variables in combination
with simple macro programs to perform a number of routine automated tasks
that are often part of the production-ready code. Some of the examples to
be demonstrated include the following topics: How to check that required
key parameters for a successful program run are populated in the
parameters file. How to automatically copy the content of the permanent
folder to the newly created backup folder. How to automatically update the
log file with new run information. How to check whether a data set already
exists in the library.
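Two of those routine checks, sketched as simple macros; the parameter and
data set names are invented for illustration.

   %macro check_param(name);
      %if %length(%superq(&name)) = 0 %then
         %put ERROR: Required parameter &name is not populated.;
   %mend check_param;

   %macro check_table(ds);
      %if not %sysfunc(exist(&ds)) %then
         %put ERROR: Data set &ds does not exist in the library.;
   %mend check_table;

   %let outlib = ;            /* left empty: triggers the ERROR line */
   %check_param(outlib)
   %check_table(sashelp.class)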
Elena Shtern, SAS
Paper SAS282-2014:
Useful Tips for Building Your Own SAS® Cloud
Everyone has heard about SAS® Cloud. Now come learn how you
can build and manage your own cloud using the same SAS®
virtual application (vApp) technology.
Brad Murphy, SAS
Peter Villiers, SAS
Epidemic modeling is an increasingly important tool in the study of
infectious diseases. As technology advances and more and more parameters
and data are incorporated into models, it is easy for programs to get
bogged down and become unacceptably slow. The use of arrays for importing
real data and collecting generated model results in SAS® can
help to streamline the process so results can be obtained and analyzed
more efficiently. This paper describes a stochastic mathematical model for
transmission of influenza among residents and healthcare workers in
long-term care facilities (LTCFs) in New Mexico. The purpose of the model
was to determine to what extent herd immunity among LTCF residents could
be induced by varying the vaccine coverage among LTCF healthcare workers.
Using arrays in SAS made it possible to efficiently incorporate real
surveillance data into the model while also simplifying analyses of the
results, which ultimately held important implications for LTCF policy and
practice.
Carl Grafe, University of Utah
This session demonstrates how to use Base SAS® tools to add
functional, reusable extensions to the SAS® system. Learn
how to do the following: Write user-defined macro functions that can be
used inline with any other SAS code. Use PROC FCMP to write and store
user-defined functions that can be used in other SAS programs. Write DS2
user-defined methods and store them in packages for easy reuse in
subsequent DS2 programs.
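A small PROC FCMP example of the second bullet; the library and function
names are illustrative.

   proc fcmp outlib=work.funcs.demo;
      function celsius(f);
         return ((f - 32) * 5 / 9);
      endsub;
   run;

   options cmplib=work.funcs;   /* make the stored function visible */

   data _null_;
      c = celsius(98.6);
      put c= 5.1;
   run;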
Mark Jordan, SAS
Are you a Java programmer who has been asked to work with
SAS®, or a SAS programmer who has been asked to provide an
interface to your IT colleagues? Let's face it, not a lot of Java
programmers are heavy SAS users. If this is the case in your company, then
you are in luck because SAS provides a couple of really slick features to
allow Java programmers to access both SAS data and SAS programming from
within a Java program. This paper walks beginner Java or SAS programmers
through the simple task of accessing SAS data and SAS programs from a Java
program. All that you need is a Java environment and access to a running
SAS process, such as a SAS server. This SAS server can either be a
SAS/SHARE® server or an IOM server. However, if you do not
have either of these two servers, that is okay; with the tools that are
provided by SAS, you can start up a remote SAS session within Java and
harness the power of SAS.
Jeremy Palbicki, Mayo Clinic
Have you found OS file permissions to be insufficient to tailor access
controls to meet your SAS® data security requirements? Have
you found metadata permissions on tables useful for restricting access to
SAS data, but then discovered that SAS programmers can avoid the
permissions by issuing LIBNAME statements that do not use the metadata?
Would you like to ensure that users have access to only particular rows or
columns in SAS data sets, no matter how they access the SAS data sets?
Metadata-bound libraries provide the ability to authorize access to SAS
data by authenticated Metadata User and Group identities that cannot be
bypassed by SAS programmers who attempt to avoid the metadata with direct
LIBNAME statements. They also provide the ability to limit the rows and
columns in SAS data sets that an authenticated user is allowed to see. The
authorization decision is made in the bowels of the SAS® I/O
system, where it cannot be avoided when data is accessed. Metadata-bound
libraries were first implemented in the second maintenance release of
SAS® 9.3 and were enhanced in SAS® 9.4. This
paper overviews the feature and discusses best practices for administering
libraries bound to metadata and user experiences with bound data. It also
discusses enhancements included in the first maintenance release of SAS
9.4.
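Binding is administered with the AUTHLIB procedure; the sketch below
follows the documented CREATE pattern, with the libref, folder, and
password invented, and a live metadata server connection assumed.

   libname salelib 'C:\data\sales';

   proc authlib library=salelib;
      create securedfolder='Departments/Sales'
             securedlibrary='SalesSecure'
             pw=secret1;               /* illustrative password */
   quit;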
Jack Wallace, SAS
SAS® has a wide variety of functions and call routines
available. More and more operating system-level functionality has become
available as part of SAS language and functions over the versions of SAS.
However, there is a wealth of other operating system functionality that
can be accessed from within SAS with some preparation on the part of the
SAS programmer. Much of the Microsoft Windows functionality is stored in
easily re-usable system DLL (Dynamic Link Library) files. This paper
describes some of the Windows functionality that might not be available
directly as part of SAS language. It also describes methods of accessing
that functionality from within SAS code. Using the methods described here,
practically any Windows API should become accessible. User-created DLL
functionality should also be accessible to SAS programs.
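The usual route is an attribute table referenced by the SASCBTBL fileref
plus the MODULE family of functions; the GetTickCount sketch below follows
that documented pattern from memory, so treat the details as an assumption
to verify.

   filename sascbtbl temp;

   data _null_;
      file sascbtbl;
      put 'routine GetTickCount minarg=0 maxarg=0 stackpop=called'
          ' returns=long module=kernel32;';
   run;

   data _null_;
      ms_since_boot = modulen('GetTickCount');
      put ms_since_boot= comma15.;
   run;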
Rajesh Lal, Experis Business Analytics
Regression is a helpful statistical tool for showing relationships between
two or more variables. However, many users can find the barrage of numbers
at best unhelpful, and at worst undecipherable. Using the shipments and
inventories historical data from the U.S. Census Bureau's office of
Manufacturers' Shipments, Inventories, and Orders (M3), we can create a
graphical representation of two time series with PROC GPLOT and map out
reported and expected results. By combining this output with results from
PROC REG, we are able to highlight problem areas that might need a second
look. The resulting graph shows which dates have abnormal relationships
between our two variables and presents the data in an easy-to-use format
that even users unfamiliar with SAS® can interpret. This
graph is ideal for analysts finding problematic areas such as outliers and
trend-breakers or for managers to quickly discern complications and the
effect they have on overall results.
William Zupko II, DHS
The new Markov chain Monte Carlo (MCMC) procedure introduced in
SAS/STAT® 9.2 and further exploited in SAS/STAT®
9.3 enables Bayesian computations to run efficiently with
SAS®. The MCMC procedure allows one to carry out complex
statistical modeling within Bayesian frameworks under a wide spectrum of
scientific research; in psychometrics, for example, the estimation of item
and ability parameters is one such case. This paper describes how to use PROC
MCMC for Bayesian inferences of item and ability parameters under a
variety of popular item response models. This paper also covers how the
results from SAS PROC MCMC are different from or similar to the results
from WinBUGS. For those who are interested in the Bayesian approach to
item response modeling, it is exciting and beneficial to shift to SAS,
given its flexible data management and powerful data analysis
capabilities. Using the resulting item parameter estimates, one can continue
to test form constructions, test equatings, etc., with all these test
development processes being accomplished with SAS!
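A Rasch-type sketch of the PROC MCMC setup; the long-format data set, its
columns, and the priors are assumptions, and the RANDOM statement requires
a recent SAS/STAT release.

   proc mcmc data=work.irt_long nbi=5000 nmc=20000 seed=27513;
      array b[5];                       /* item difficulties */
      parms b: 0;
      prior b: ~ normal(0, var=10);
      random theta ~ normal(0, var=1) subject=person;
      p = logistic(theta - b[item]);    /* item = 1..5, resp = 0/1 */
      model resp ~ binary(p);
   run;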
Yi-Fang Wu, Department of Educational Measurement and Statistics, Iowa
Testing Programs, University of Iowa
Existing health literacy assessment tools developed for research purposes
have constraints that limit their utility for clinical practice. The
measurement of health literacy in clinical practice can be impractical due
to the time requirements of existing assessment tools. Single Item
Literacy Screener (SILS) items, which are self-administered brief
screening questions, have been developed to address this constraint. We
developed a model to predict limited health literacy that consists of two
SILS and demographic information (for example, age, race, and education
status) using a sample of patients in a St. Louis emergency department. In
this paper, we validate this prediction model in a separate sample of
patients visiting a primary care clinic in St. Louis. Using the prediction
model developed in the previous study, we use SAS/STAT®
software to validate this model based on three goodness of fit criteria:
rescaled R-squared, AIC, and BIC. We compare models using two different
measures of health literacy, Newest Vital Sign (NVS) and Rapid Assessment
of Health Literacy in Medicine Revised (REALM-R). We evaluate the
prediction model by examining the concordance, area under the ROC curve,
sensitivity, specificity, kappa, and gamma statistics. Preliminary results
show 69% concordance when comparing the model results to the REALM-R and
66% concordance when comparing to the NVS. Our conclusion is that
validating a prediction model for inadequate health literacy would provide
a feasible way to assess health literacy in fast-paced clinical settings.
This would allow us to reach patients with limited health literacy with
educational interventions and better meet their information needs.
Lucy D'Agostino McGowan, Washington University School of Medicine
Melody S. Goodman, Washington University School of Medicine
Kimberly A. Kaphingst, Washington University School of Medicine
There are 2.35 million road accident cases recorded yearly in the U.S.,
of which 37,000 are considered fatal. Road crashes cost USD 230.6
billion per year, or an average of USD 820 per person. Our efforts are to
identify the important factors that lead to vehicle collisions and to
predict the injury risk involved in them. Data was collected from National
Automotive Sampling System (NASS), containing 20,247 cases with 19
variables. Input variables describe the factors involved in an accident
like Height, Age, Weight, Gender, Vehicle model year, Speed limit, Energy
absorption in Collision & Deformation location, etc. The target variable
is nominal showing levels of injury. Missing values in interval variables
were imputed using mean and class variables using the count method.
Multivariate analysis suggests high correlation between tire footprint and
wheelbase (Corr=0.97, P<0.0001) and original weight of car and curb weight
of car (Corr=0.79, P<0.0001). Variables having high kurtosis values were
transformed using range standardization. Variables were sorted using
variable importance using decision tree analysis. Models like multiple
regression, polynomial regression, neural network, and decision tree were
applied in the dataset to identify the factors that are most significant
in predicting the injury risk. The multilayer perceptron neural network
came out to be the best model for predicting the injury risk index, with
the lowest average squared error (0.086) in the validation data set.
Prateek Khare, Oklahoma State University
Vandana Reddy, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Developing a good graph with ODS statistical graphics becomes a challenge
when the input data maps to crowded displays with overlapping points or
lines. Such is the case with the Framingham Heart Study of 5209 subjects
captured in the Sashelp.Heart data set, a series of 100 booking curves for
the airline industry, and three interleaving series plots that capture
closing stock values over a twenty year period for three giants in the
computer industry. In this paper, transparency, layering, data point
rounding, and color coding are evaluated for their effectiveness to add
visual clarity to graphics output. SAS® Graph Template
Language plotting statements (compatible with SAS® 9.2) that
are referenced in this paper include HISTOGRAM, SCATTERPLOT, BANDPLOT, and
SERIESPLOT, as well as the layout statements OVERLAY, DATAPANEL, LATTICE,
and GRIDDED, which produce single or multiple-panel graphs. SAS Graph
Template Language is chosen over ODS Graphics procedures because of its
greater graphics capability. While the original version of the paper used
SAS 9.2, the latest version incorporates SAS® 9.3 updates
such as HEATMAPPARM for heat maps that add a third dimension to a graph
via color, and the RANGEATTRMAP statement for grouping continuous data in
a legend. If you have a license for SAS 9.3, you automatically have access
to Graph Template Language. Since this is not a tutorial, you will get
more out of this presentation if you have read introductory papers or
Warren Kuhfeld's book 'Statistical Graphics in SAS®: An
Introduction to the Graph Template Language and the Statistical Graphics
Procedures.'
Perry Watts, Stakana Analytics
Nate Derby, Stakana Analytics
Dataprev has become the principal owner of social data on the citizens of
Brazil by collecting information for over forty years in order to
subsidize pension applications for the government. The use of this data
can be expanded to provide new tools to aid policy and assist the
government to optimize the use of its resources. Using SAS®
MDM, we are developing a solution that uniquely identifies the citizens of
Brazil. Overcoming challenges with multiple government agencies and with
the validation of survey records that suggest the same person requires
rules for governance and a definition of what represents a particular
Brazilian citizen. In short, how do you turn a repository of master data
into an efficient catalyst for public policy? This is the goal for
creating a repository focused on identifying the citizens of Brazil.
Ielton de Melo Gonçalves, Dataprev
This presentation will teach the audience how to use SAS®
ODS Graphics. Now part of Base SAS®, ODS Graphics is a great
way to easily create clear graphics that enable any user to tell their
story well. SGPLOT and SGPANEL are two of the procedures that can be used
to produce powerful graphics that used to require a lot of work. The core
of the procedures is explained, as well as the options available.
Furthermore, we explore the ways to combine the individual statements to
make more complex graphics that tell the story better. Any user of Base
SAS on any platform will find great value from the SAS ODS Graphics
procedures.
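Two minimal examples of the core statements, run against a shipped data
set:

   proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;
      reg x=height y=weight;
   run;

   proc sgpanel data=sashelp.class;
      panelby sex / columns=2;
      histogram height;
   run;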
Chuck Kincaid, Experis Business Analytics
This paper discusses the techniques I used at the Census Bureau to
overcome the issue of dealing with large amounts of data while modernizing
some of their public-facing web applications by using service oriented
architecture (SOA) to deploy Flex web applications powered by
SAS®. The paper covers techniques that resulted in reducing
142,293 XML lines (3.6 MB) down to 15,813 XML lines (1.8 MB), a 50% size
reduction on the server side (HTTP Response), and 196,167 observations
down to 283 observations, a reduction of 99.8% in summarized data on the
client side (XML Lookup file).
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
In the past, calibration was done by using extremely complicated macros in
Base SAS® to create a Microsoft Excel workbook with multiple
linked spreadsheets. This process made it hard to audit, was not reliably
replicable, and was open to user error. The task was to create a
replicable, auditable, and locked down application that allowed the user
to change certain parameters and see the impact of those changes without
needing to code. SAS® Stored Processes are used to generate
a screen that is split into three sections: one shows static reporting,
the second is a data-driven custom input form, and the third shows test
results. The initial screen uses a standard stored process that enables
the user to select the model and time period. Macro variables are passed
through to subset the data. The static reports are created from a stored
process that executes two REPORT procedures that subset the data based on
the passed parameters. The form is built using SAS® to
generate HTML and is data driven. The Update button at the end of the form
executes a stored process that collects the data that the user has entered
into the form and updates a database. After the rates have been updated,
they are used to generate test results using PROC REPORT.
Anita Measey, Bank of Montreal
The Affordable Care Act that is being implemented now is expected to
fundamentally reshape the health care industry. All current
participants--providers, subscribers, and payers--will operate differently
under a new set of key performance indicators (KPIs). This paper uses
public data and SAS® software to establish a baseline for
the health care industry today so that structural changes can be measured
in the future to establish the impact of the new laws.
John Cohen, Advanced Data Concepts LLC
Meenal (Mona) Sinha, Independence Blue Cross
Health plans use wide-ranging interventions based on criteria set by
nationally recognized organizations (for example, NCQA and CMS) to change
health-related behavior in large populations. Evaluation of these
interventions has become more important with the increased need to report
patient-centered quality of care outcomes. Findings from evaluations can
detect successful intervention elements and identify at-risk patients for
further targeted interventions. This paper describes how SAS®
was applied to evaluate the effectiveness of a patient-directed
intervention designed to increase medication adherence and a health plan's
CMS Part D Star Ratings. Topics covered include querying data warehouse
tables, merging pharmacy and eligibility claims, manipulating data to
create outcome variables, and running statistical tests to measure
pre-post intervention differences.
Scott Leslie, MedImpact Healthcare Systems, Inc.
Comprehensive cancer centers have been mandated to engage communities in
their work; thus, measurement of community engagement is a priority area.
Siteman Cancer Center's Program for the Elimination of Cancer Disparities
(PECaD) projects seek to align with 11 Engagement Principles (EP)
previously developed in the literature. Participants in a PECaD pilot
project were administered a survey with questions on community engagement
in order to evaluate how well the project aligns with the EPs. Internal
consistency is examined using PROC CORR with the ALPHA option to calculate
Cronbach's alpha for questions that relate to the same EP. This allows
items that have a lack of internal consistency to be identified and to be
edited or removed from the assessment. EP-specific scores are developed on
quantity and quality scales. Lack of internal consistency was found for
six of the 16 EP items examined (alpha<.70). After editing the items,
all EP question groups had strong internal consistency (alpha>.85). There
was a significant positive correlation between quantity and quality scores
(r=.918, P<.001). Average EP-specific scores ranged from 6.87 to 8.06;
this suggests researchers adhered to the 11 EPs between 'sometimes' and
'most of the time' on the quantity scale and between 'good' and 'very good' on the
quality scale. Examining internal consistency is necessary to develop
measures that accurately determine how well PECaD projects align with EPs.
Using SAS® to determine internal consistency is an integral
step in the development of community engagement scores.
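The core of that internal-consistency check is one PROC CORR step per EP
question group; the data set and item names below are illustrative.

   proc corr data=work.survey alpha nomiss;
      var ep1_item1 ep1_item2 ep1_item3 ep1_item4;
   run;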
Renee Gennarelli, Washington University School of Medicine
Melody Goodman, Washington University School of Medicine
Especially in this current financial climate, many of us are being asked
to do more with less. For several years, the Office of Institutional
Research and Testing at Baylor University has been using SAS®
software to increase the efficiency of the office and of the University as
a whole. Reports that were once prepared manually have been automated.
Data quality processes have been implemented in order to reduce the number
of duplicate mailings. Predictive modeling is used to focus recruiting
efforts on those prospective students most likely to respond. A web-based
portal has been created to provide self-service report generation for many
administrators across campus. Along with this, a number of data processing
functions have been centralized, eliminating the need for additional
programming skills and software support. This presentation discusses these
improvements in more detail and provides examples of the end results.
Faron Kincheloe, Baylor University
The Patient-Centered Outcomes Research Institute (PCORI) was created as
part of the Affordable Care Act. PCORI is authorized by Congress to
conduct research to provide information about the best available evidence
to help patients and their health care providers make more informed
decisions. Community Care Behavioral Health Organization in Pittsburgh,
Pennsylvania was awarded a PCORI research grant to investigate health care
system improvements for adults with serious mental illness. The grant,
titled Optimizing Behavioral Health Homes by Focusing on Outcomes that
Matter Most for Adults with Serious Mental Illness, began in January of
2013 and is ongoing. Information Technology staff at Community Care have
leveraged SAS® solutions in providing real-time data
extraction and reports to support the development and implementation of
this research project. SAS tools have been used to merge data from
multiple platforms and database sources, including web data sources. SAS
has also enabled the formatting and traffic lighting of multiple Microsoft
Excel data sets and files, in addition to the creation of many operational
reports and data files needed for study implementation, administration,
and maintenance. The challenges faced and the SAS solutions employed are
the subject of this paper.
Michele Mesiano, Community Care Behavioral Health Organization
Meghna Parthasarathy, Community Care Behavioral Health Organization
Lauren Terhorst, Community Care Behavioral Health Organization
When providing lengthy cost and utilization data to medical providers, it
is ideal to sort the report by descending cost (or utilization) so that
the important categories are at the top. This task can be easily solved
using PROC SORT. However, when you need other variables (such as unit cost
per procedure or national average) to follow the sort but not be sorted
themselves, the solution is not as intuitive. This paper looks at several
sorting algorithms to solve this problem. First, we look at the basic
bubble sort (still effective for smaller data sets), which sets
up arrays for each variable and then sorts on just one of them. Next, we
discuss the quicksort algorithm, which is effective for large data sets,
too. The results of the sorts provide sorted data that is easy to read and
makes for effective analysis.
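A sketch of the tandem bubble sort described above: COST is the sort key,
and UNITCOST merely follows its swaps. The input data set, variables, and
array size are illustrative.

   data sorted;
      array c[10000] _temporary_;
      array u[10000] _temporary_;
      do n = 1 by 1 until (eof);
         set work.report end=eof;
         c[n] = cost;
         u[n] = unitcost;
      end;
      do i = 1 to n - 1;               /* descending bubble sort */
         do j = 1 to n - i;
            if c[j] < c[j+1] then do;
               t = c[j]; c[j] = c[j+1]; c[j+1] = t;   /* key      */
               t = u[j]; u[j] = u[j+1]; u[j+1] = t;   /* follower */
            end;
         end;
      end;
      do i = 1 to n;
         cost = c[i];
         unitcost = u[i];
         output;
      end;
      keep cost unitcost;
   run;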
Matthew Neft, Highmark Inc.
Chelle Pronko, Highmark Inc.
When reading data files or writing SAS® programs, we are
often hunting for the right format or informat. There are so many to
choose from! Does it seem like too many to search the manual? Let SAS help
find the right one! We use the SAS dictionary table VFORMAT and a very
small SAS program. This presentation demonstrates how two simple functions
unlock the potential of this great resource: SASHELP.VFORMAT.
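For example, a short hunt for date-related formats:

   data date_formats;
      set sashelp.vformat;
      where fmtname contains 'DATE';
      keep fmtname source;
   run;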
Peter Crawford, Crawford Software Consultancy Limited
Researchers often rely on self-report for survey based studies. The
accuracy of this self-reported data is often unknown, particularly in a
medical setting that serves an under-insured patient population with
varying levels of health literacy. We recruited participants from the
waiting room of a St. Louis primary care safety net clinic to participate
in a survey investigating the relationship between health environments and
health outcomes. The survey included questions regarding personal and
family history of chronic disease (diabetes, heart disease, and cancer) as
well as BMI and self-perceived weight. We subsequently accessed the
participant's electronic medical record (EMR) and collected
physician-reported data on the same variables. We calculated concordance
rates between participant answers and information gathered from EMRs using
McNemar's chi-squared test. Logistic regression was then performed to
determine the demographic predictors of concordance. Three hundred
thirty-two patients completed surveys as part of the pilot phase of the
study; 64% female, 58% African American, 4% Hispanic, 15% with less than
high school level education, 76% annual household income less than
$20,000, and 29% uninsured. Preliminary findings suggest an 82-94%
concordance rate between self-reported and medical record data across
outcomes, with the exception of family history of cancer (75%) and heart
disease (42%). Our conclusion is that determining the validity of the
self-reported data in the pilot phase influences whether self-reported
personal and family history of disease and BMI are appropriate for use in
this patient population.
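The agreement computations reduce to a PROC FREQ step per outcome; the
data set and variable names below are illustrative.

   proc freq data=work.concord;
      tables self_diabetes * emr_diabetes / agree;  /* McNemar, kappa */
   run;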
Sarah Lyons, Washington University School of Medicine
Kimberly Kaphingst, Washington University School of Medicine
Melody Goodman, Washington University School of Medicine
Paper SAS139-2014:
Visualize, Analyze, and Deploy with New SAS Data Mining and Forecasting
Web Clients
The demand for scalable and approachable analytics through easy-to-use
interfaces has increased exponentially. SAS has developed new web-based
analytic interfaces that extend the capabilities of its data mining and
forecasting web suites to address this demand. By taking advantage of our
latest high-performance analytics technology, SAS users can build scalable
models with an automated approach. Why build one model when you can use
new clients from SAS to build hundreds--incorporating all of your
data--with a few clicks? With a model factory approach, users can build
models down to the product and SKU level, and SAS will produce
exception-based reports to aid adjustments. During this session, you will
gain an early glimpse into the latest analytic web interface development
and have an opportunity to provide feedback.
Jonathan Wexler, SAS
The world's first wind resource assessment buoy, residing in Lake
Michigan, uses a pulsing laser wind sensor to accurately measure wind
speed, direction, and turbulence offshore up to wind turbine hub-height
and across the blade span every second. Understanding wind behavior would
be tedious and fatiguing with such large data sets. However, SAS/GRAPH®
9.4 helps the user grasp wind characteristics over time and at different
altitudes by exploring the data visually. This paper covers graphical
approaches to evaluate wind speed validity, seasonal wind speed variation,
and storm systems to inform engineers on the candidacy of Lake Michigan
offshore wind farms.
Aaron Clark, Grand Valley State University
Volatility estimation plays an important role in the fields of statistics
and finance. Many different techniques address the problem of estimating
volatility of financial assets. Autoregressive conditional
heteroscedasticity (ARCH) models and the related generalized ARCH models
are popular models for volatility. This talk will introduce the need for
volatility modeling as well as introduce the framework of ARCH and GARCH
models. A brief discussion about the structure of ARCH and GARCH models
will then be compared to other volatility modeling techniques.
Aric LaBarr, Institute for Advanced Analytics
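As a hedged sketch (not taken from the talk; the data set returns and the return variable r are hypothetical), a GARCH(1,1) model can be fit in SAS with the AUTOREG procedure:

/* Fit GARCH(1,1): variance h(t) = omega + alpha*e(t-1)**2 + beta*h(t-1) */
proc autoreg data=returns;
   model r = / garch=(p=1, q=1);
run;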
Data quality is at the very heart of accurate, relevant, and trusted
information, but traditional techniques that require the data to be moved,
cleansed, and repopulated simply can't scale up to cover the ultra-jumbo
nature of big data environments. This paper describes how SAS®
Data Quality accelerators for databases like Teradata and Hadoop deliver
data quality for big data by operating in situ and in parallel on each of
the nodes of these clustered environments. The paper shows how data
quality operations can be easily modified to leverage these technologies.
It examines the results of performance benchmarks that show how
in-database operations can scale to meet the demands of any use case, no
matter how big a big data mammoth you have.
Mike Frost, SAS
Missing observations caused by dropouts or skipped visits present a
problem in studies of longitudinal data. When the analysis is restricted
to complete cases and the missing data depend on previous responses, the
generalized estimating equation (GEE) approach, which is commonly used
when the population-average effect is of primary interest, can lead to
biased parameter estimates. The new GEE procedure in SAS/STAT® 13.2
implements a weighted GEE method, which provides consistent parameter
estimates when the dropout mechanism is correctly specified. When none of
the data are missing, the method is identical to the usual GEE approach,
which is available in the GENMOD procedure. This paper reviews the
concepts and statistical methods. Examples illustrate how you can apply
the GEE procedure to incomplete longitudinal data.
Guixian Lin, SAS
Bob Rodriguez, SAS
In connection with the consolidation work at Nykredit, the data stored on
the Nykredit z/OS SAS® installation had to be migrated
(copied) to the new x64 Windows SAS platform storage. However, getting an
overview of these data on the z/OS mainframe can be difficult, and a
series of questions arise during the process. For example: Who is
responsible? How many bytes? How many rows and columns? When were the data
created? And so on. With extensive use of the FILENAME FTP access method,
looping, and metadata extraction, it is possible to get an overview of the
data on the host, presented in a Microsoft Excel spreadsheet.
Jesper Michelsen, Nykredit
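A minimal sketch of the FILENAME FTP technique (the host, user, password, and high-level qualifier are placeholders, and details vary by host; the production code is more elaborate): the LS option returns a directory listing that can be read with an INFILE statement and processed in a loop.

/* List data sets under a high-level qualifier on the z/OS host */
filename dirlist ftp "'PROD.SASDATA'" ls
         host='zos.example.com' user='userid' pass='xxxxxxxx';

data datasets;
   infile dirlist truncover;
   input dsname $80.;
run;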
A Norwegian hospital, Nordlandssykehuset, is using SAS® to
automate the Global Trigger Tool (GTT) method to monitor and reveal
incidents of adverse events in the treatment of patients by search of
structured and unstructured data within medical records.
Tonje Hansen, Nordland Hospital Trust
Do you know everything you need to know about missing values? Do you know
how to assign a missing value to multiple variables with one statement?
Can you display missing values as something other than . or blank? How
many types of missing numeric values are there? This paper reviews
techniques for assigning, displaying, referencing, and summarizing missing
values for numeric variables and character variables.
Christopher Bost, MDRC
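A minimal sketch of a few of the techniques the paper covers (the data values are invented): CALL MISSING assigns missing values to several variables, numeric and character, in one statement, and the MISSING= system option changes how numeric missing values display. Numeric variables have 28 missing values in all: ., ._, and .A through .Z.

data demo;
   input age height name $;
   /* one statement sets all three variables to missing */
   if age > 150 then call missing(age, height, name);
   datalines;
200 72 Pat
34 65 Lee
;

options missing='*';   /* display numeric missing as * instead of . */
proc print data=demo;
run;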
The latest releases of SAS® Data Integration Studio and
SAS® Data Management provide an integrated environment for
managing and transforming your data to meet new and increasingly complex
data management challenges. The enhancements help develop efficient
processes that can clean, standardize, transform, master, and manage your
data. The latest features include: capabilities for building complex job
processes; web and tablet environments for managing your data; enhanced ELT
transformation capabilities; big data transformation capabilities for
Hadoop; integration with the SAS® LASR™ platform;
enhanced features for lineage tracing and impact analysis; and new features
for master data and metadata management. This paper provides an overview of the
latest features of the products and includes use cases and examples for
leveraging product capabilities.
Nancy Rausch, SAS
Mike Frost, SAS
Michael Ames, SAS
Over the last year, the SAS® Enterprise Miner™
development team has made numerous and wide-ranging enhancements and
improvements. New utility nodes that save data, integrate better with
open-source software, and register models make your routine tasks easier.
The area of time series data mining has three new nodes. There are also
new models for Bayesian network classifiers, generalized linear models
(GLMs), support vector machines (SVMs), and more.
Jared Dean, SAS
Jonathan Wexler, SAS
Paper SAS1584-2014:
What's New in SAS® Merchandise Planning
SAS® Merchandise Planning introduces key changes with the
recent 6.4 release and the upcoming 6.5 release. This session highlights
the integration with SAS® Visual Analytics, the analytic
infrastructure that enables users to integrate analytic results into their
planning decisions, as well as multiple usability enhancements. Included
is a look at the first of the packaged analytics, the Recommended
Assortment analytic.
Elaine Markey, SAS
Expensive physical capital must be regularly maintained for optimal
efficiency and long-term insurance against damage. The maintenance process
usually consists of constantly monitoring high-frequency sensor data and
performing corrective maintenance when the expected values do not match
the actual values. An economic system can also be thought of as a system
that requires constant monitoring and occasional maintenance in the form
of monetary or fiscal policy. This paper shows how to use the SSM
procedure in SAS/ETS® to make forecasts of expected values
by using high-frequency multivariate time series. The paper also
demonstrates the functionality of the new SASEFRED interface engine in
SAS/ETS.
Kenneth Sanford, SAS
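A hedged sketch of the SASEFRED interface engine (the API key is a placeholder and the option list is abbreviated; see the SAS/ETS documentation for full syntax): the LIBNAME engine pulls a FRED series, here GDP, into a SAS data set.

libname fred sasefred "%sysfunc(pathname(work))"
        outxml=gdp automap=replace mapref=MyMap
        xmlmap="%sysfunc(pathname(work))/gdp.map"
        apikey='your-fred-api-key' idlist='gdp';

data gdp;
   set fred.gdp;
run;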
SAS® has an amazing arsenal of tools for using and displaying
geographic information, an arsenal that is relatively unknown and
underutilized. This presentation will highlight both new and existing
capabilities for creating stunning, informative maps, as well as for using
geographic data in other ways.
SAS-provided map data files, functions, format libraries, and other
geographic data files will be explored in detail. Custom mapping of
geographic areas will be discussed. Maps produced will include use of both
the annotate facility (including some new functions) and PROC GREPLAY.
Products used are Base SAS® and SAS/GRAPH®.
SAS programmers of any skill level will benefit from this presentation.
Louise Hadden, Abt Associates Inc.
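As one small, hedged example of the SAS-provided map data sets mentioned above (mydata is a hypothetical data set whose STATE variable matches the codes in the SAS-supplied MAPS.US data set), a choropleth US map takes only a few lines:

/* Shade each state by a response variable */
proc gmap data=mydata map=maps.us;
   id state;
   choro response / levels=5;
run;
quit;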
A European utility company has several thousand service engineers who
provide its customers with services that range from performing routine
maintenance to handling emergency breakdowns. Each service engineer is
assigned to a work area that consists of a set of postal sectors. The
company wants to understand how it should configure its work areas to
improve customer satisfaction, minimize travel time for its full-time
service engineers, and minimize the costs of overtime and subcontractor
hours. This paper describes the use of SAS/OR® optimization
procedures to model this problem and configure optimal work areas, and the
use of SAS® Simulation Studio to simulate how the optimal
configurations might satisfy the customer service requirements. The
experimental results show that the proposed solution can satisfy customer
demand within the desired service-time window, with significantly less
travel time for the engineers, and with lower overtime and subcontractor
costs.
Jinxin Yi, SAS
Emily Lada, SAS
Anne Smith, SAS
Colin Gray, SAS
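The paper details the actual formulation; purely as an illustrative sketch (the sets, toy travel times, and the cap of three sectors per engineer are invented), an assignment of postal sectors to engineer work areas might look like this in PROC OPTMODEL:

proc optmodel;
   set SECTORS = 1..6;
   set ENGINEERS = 1..2;
   /* toy travel times from each engineer's base to each sector */
   num travel {s in SECTORS, e in ENGINEERS} = abs(s - 3*e);
   var Assign {SECTORS, ENGINEERS} binary;   /* 1 if sector s is in engineer e's area */
   min TotalTravel = sum {s in SECTORS, e in ENGINEERS} travel[s,e]*Assign[s,e];
   con OneArea {s in SECTORS}: sum {e in ENGINEERS} Assign[s,e] = 1;
   con Balance {e in ENGINEERS}: sum {s in SECTORS} Assign[s,e] <= 3;
   solve with milp;
   print Assign;
quit;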
The DATA step allows one to read, write, and manipulate many types of
data. As data evolves to a more free-form state, the ability of SAS®
to handle character data becomes increasingly important. This paper
addresses character data from multiple vantage points. For example, what
is the default length of a character string, and why does it appear to
change under different circumstances? What type of formatting is available
for character data? How can we examine and manipulate character data? The
audience for this paper is beginner to intermediate, and the goal is to
provide an introduction to the numerous character functions available in
SAS, including the basic LENGTH and SUBSTR functions, plus many others.
Andrew Kuligowski, HSN
Swati Agarwal, Optum
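A minimal sketch of a few of the character functions covered (the values are invented): the first assignment fixes the default length of x at 12, the length of the literal.

data _null_;
   x = 'Hello, world';              /* default length: 12 */
   len = length(x);                 /* 12: position of last non-blank character */
   first5 = substr(x, 1, 5);        /* 'Hello' */
   shout = upcase(x);               /* 'HELLO, WORLD' */
   both = catx(' ', first5, shout); /* concatenate with a single-space delimiter */
   put len= first5= shout= both=;
run;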
SAS® Visual Analytics is one of the newer SAS®
products with a lot of excitement surrounding it. But what is SAS Visual
Analytics really? By examining the similarities, differences, and
synergies between SAS Visual Analytics and other SAS offerings, we can
more clearly understand this new product.
Brian Varney, Experis Business Analytics
We receive a daily file with information about patients who use our drug.
It's updated every day so that we have the most current information.
Nearly every variable on a patient's record can be different from one day
to the next. But what if you wanted to capture information that changed?
For example, what if a patient switched doctors sometime along the way,
and the original prescribing doctor is different from the patient's
present doctor? With this type of daily file, that information is lost. To
avoid losing these changes, you have to build a cumulative data set. I'll
show you how to build it.
Myra Oltsik, Acorda Therapeutics
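A hedged sketch of one way to build such a cumulative data set (not necessarily the author's method; the data sets master and daily and the variables patient_id and doctor are hypothetical): keep a master file keyed by patient and write a change record whenever today's value differs from the stored one.

proc sort data=daily;
   by patient_id;
run;

data master(drop=new_doctor) changes;
   merge master(in=inm) daily(in=ind rename=(doctor=new_doctor));
   by patient_id;
   /* capture the switch before overwriting; changes keeps both old and new doctor */
   if inm and ind and doctor ne new_doctor then output changes;
   if ind then doctor = new_doctor;   /* take today's value (also covers new patients) */
   output master;
run;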
Paper 2166-2014:
You Have an Assortment Plan; Now What?
57 Category teams. 8,500 stores. 10,000 SKUs. 1 integrated Planning
Solution. Deploying a stand-alone Assortment Planning system creates an
isolated planning structure that adds complexity to your ability to
deliver results in a dynamic retail environment. Today's challenging
competitive and economic conditions reward retailers who take the
opportunity to integrate their strategic systems into their downstream
execution. This presentation describes the approach Family Dollar followed
to integrate SAS® Assortment Planning with existing
operational systems. The result? Reduced complexity, improved efficiency,
and better on-time execution in our stores.
Ryan Kehoe, Family Dollar Stores
Wesley Stewart, Family Dollar Stores