Getting Started Papers A-Z

A
Session 8160-2016:
A Descriptive Analysis of Reported Health Issues in Rural Jamaica
There are currently thousands of Jamaican citizens who lack access to basic health care. In order to improve the health-care system, I collect and analyze data from two clinics in remote locations of the island. This report analyzes data collected from Clarendon Parish, Jamaica. To create a descriptive analysis, I use SAS® Studio 9.4. The procedures I use include PROC IMPORT, PROC MEANS, PROC FREQ, and PROC GCHART. After running these procedures, I am able to produce a descriptive analysis of the health issues plaguing the island.
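The workflow the abstract names can be sketched as follows; the file, data set, and variable names here are hypothetical, not from the paper:

```sas
/* Import the clinic spreadsheet, then describe it numerically
   and categorically. */
proc import datafile='clarendon_clinic.xlsx' out=clinic
            dbms=xlsx replace;
run;

proc means data=clinic n mean std;
   var age;                        /* summary statistics */
run;

proc freq data=clinic;
   tables health_issue / nocum;    /* frequency of reported issues */
run;
```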
Read the paper (PDF) | View the e-poster or slides (PDF)
Verlin Joseph, Florida A&M University
Session SAS5280-2016:
A Guide to Section 508 Compliance Using SAS® 9.4 Output Delivery System (ODS)
Disability rights groups are using the court system to exact change. As a result, the enforcement of Section 508 and similar laws around the world has become a priority. When you generate SAS® output, you need to protect your organization from litigation. This paper describes concrete steps that help you use SAS® 9.4 Output Delivery System (ODS) to create SAS output that complies with accessibility standards. It also provides recommendations and code samples that are aligned with the accessibility standards defined by Section 508 and the Web Content Accessibility Guidelines (WCAG 2.0).
Read the paper (PDF)
Glen Walker, SAS
Session 10000-2016:
A Macro That Can Fix Data Length Inconsistency and Detect Data Type Inconsistency
Merging and appending SAS® data sets are common tasks. During these operations, we sometimes get error or warning messages saying that the same fields in different SAS data sets have different lengths or different types. If the problems involve many fields and data sets, we need to spend a lot of time identifying those fields and writing extra SAS code to solve the issues. The macro in this paper can help you identify the fields that have inconsistent data types or lengths. It also solves the length issues automatically by finding the maximum field length among the current data sets and assigning that length to the field. An HTML report is generated after running the macro that shows which fields' lengths have been changed and which fields have inconsistent data types.
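The core idea behind such a macro can be sketched in a few lines; the data set and variable names here are illustrative assumptions, not the paper's actual macro:

```sas
/* Find the maximum declared length of a shared field across the
   data sets to be combined, using the DICTIONARY tables. */
proc sql noprint;
   select max(length) into :maxlen trimmed
   from dictionary.columns
   where libname = 'WORK'
     and memname in ('DEMOG_A', 'DEMOG_B')
     and upcase(name) = 'SUBJID';
quit;

data combined;
   length subjid $ &maxlen;   /* set the common length up front */
   set demog_a demog_b;       /* no truncation warning now      */
run;
```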
Read the paper (PDF) | Watch the recording
Ting Sa, Cincinnati Children's Hospital Medical Center
Session 10082-2016:
A Pseudo-Interactive Approach to Teaching SAS® Programming
With the advent of the exciting new hybrid field of Data Science, programming and data management skills are in greater demand than ever and have never been easier to attain. Online resources like Codecademy and W3Schools offer a host of tutorials and assistance to those looking to develop their programming abilities and knowledge. Though their content is limited to languages and tools suited mostly for web developers, the value and quality of these sites are undeniable. To this end, similar tutorials for other free-to-use software applications are springing up. The interactivity of these tutorials elevates them above most, if not all, other out-of-classroom learning tools. The process of learning programming or a new language can be quite disjointed when trying to pair a textbook or similar walk-through material with matching coding tasks and problems. These sites unify those pieces for users by presenting them with a series of short, simple lessons that always require the user to demonstrate their understanding in a coding exercise before progressing. After teaching SAS® in a classroom environment, I became fascinated by the potential for a similar student-driven approach to learning SAS. This could afford me more time to provide individualized attention, as well as open up additional class time to more advanced topics. In this talk, I discuss my development of a series of SAS scripts that walk the user through learning the basics of SAS and that involve programming at every step of the process. This collection of scripts should serve as a self-contained, pseudo-interactive course in SAS basics that students could be asked to complete on their own in a few weeks, leaving the remainder of the term to be spent on more challenging, realistic tasks.
Read the paper (PDF) | Download the data file (ZIP)
Hunter Glanz, California Polytechnic State University
Session 10340-2016:
Agile BI: How Eandis is using SAS® Visual Analytics for Energy Grid Management
Eandis is a rapidly growing energy distribution grid operator in the heart of Europe, with requirements to manage power distribution on behalf of 229 municipalities in Belgium. With a legacy SAP data warehouse and other diverse data sources, business leaders at Eandis faced challenges with timely analysis of key issues such as power quality, investment planning, and asset management. To face those challenges, a new agile way of thinking about Business Intelligence (BI) was necessary. A sandbox environment was introduced where business key-users could explore and manipulate data. It allowed them to have approachable analytics and to build prototypes. Many pitfalls appeared and the greatest challenge was the change in mindset for both IT and business users. This presentation addresses those issues and possible solutions.
Read the paper (PDF)
Olivier Goethals, Eandis
Session SAS3960-2016:
An Insider's Guide to SAS/ACCESS® Interface to Impala
Impala is an open source SQL engine designed to bring real-time, concurrent, ad hoc query capability to Hadoop. SAS/ACCESS® Interface to Impala allows SAS® to take advantage of this exciting technology. This presentation uses examples to show you how to increase your program's performance and troubleshoot problems. We discuss how Impala fits into the Hadoop ecosystem and how it differs from Hive. Learn the differences between the Hadoop and Impala SAS/ACCESS engines.
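A minimal connection sketch, assuming placeholder server and schema values (the paper's own examples may differ):

```sas
/* Assign a library through SAS/ACCESS Interface to Impala. */
libname imp impala server='impala-host' schema='default';

proc sql;
   /* Implicit pass-through: this count is executed by Impala,
      not pulled row by row into SAS. */
   select count(*) from imp.weblogs;
quit;
```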
Read the paper (PDF)
Jeff Bailey, SAS
Session 6406-2016:
An Introduction to SAS® Arrays
So you've heard about SAS® arrays, but you're not sure when or why you would use them. This presentation provides some background on SAS arrays, from explaining what occurs during compile time to explaining how to use them programmatically. It also includes a discussion about how DO loops and macro variables can enhance array usability. Specific examples, including Fahrenheit-to-Celsius temperature conversion, salary adjustments, and data transposition and counting, demonstrate how you can use SAS arrays effectively in your own work and also provide a few caveats about their use.
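The Fahrenheit-to-Celsius conversion mentioned above is a natural first array example; the data set and variable names are illustrative:

```sas
/* Convert twelve monthly Fahrenheit readings to Celsius
   with one array and one DO loop. */
data celsius;
   set temps;                       /* has temp1-temp12 in F */
   array t{12} temp1-temp12;
   do i = 1 to dim(t);
      t{i} = (t{i} - 32) * 5/9;
   end;
   drop i;
run;
```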
Read the paper (PDF) | Watch the recording
Andrew Kuligowski, HSN
Lisa Mendez, IMS Government Solutions
Session 7320-2016:
Analytics of Things: Golf a Good Walk Spoiled?
This paper demonstrates techniques using SAS® software to combine data from devices and sensors for analytics. Base SAS®, SAS® Data Integration Studio, and SAS® Visual Analytics are used to query external services and to import, fuzzy match, analyze, and visualize data. Finally, the benefits of SAS® Event Stream Processing models and alerts are discussed. To bring the analytics of things to life, the following data are collected: GPS data from countryside walks, GPS and score card data from smart phones while playing golf, and meteorological data feeds. These data are combined to test the old adage that golf is a good walk spoiled. Further, the use of alerts and the potential for predictive analytics are discussed.
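One common device for the fuzzy-matching step the abstract mentions is generalized edit distance; this sketch, with hypothetical data set names and an illustrative cutoff, is one way it could look:

```sas
/* Score candidate pairs with COMPGED and keep the close matches. */
proc sql;
   create table matched as
   select a.course_name, b.gps_label,
          compged(a.course_name, b.gps_label) as distance
   from scorecards as a, gps_points as b
   where calculated distance <= 200;   /* cutoff chosen by inspection */
quit;
```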
Read the paper (PDF)
David Shannon, Amadeus Software
B
Session 2761-2016:
Be More Productive! Tips and Tricks to Improve your SAS® Programming Environment
For me, it's all about avoiding manual effort and repetition. Whether your work involves data exploration, reporting, or analytics, you probably find yourself repeating steps and tasks with each new program, project, or analysis. That repetition adds time to the delivery of results and also contributes to a lack of standardization. This presentation focuses on productivity tips and tricks to help you create a standard and efficient environment for your SAS® work so you can focus on the results and not the processes. Included are the following: setting up your programming environment (comment blocks, environment cleanup, easy movement between test and production, and modularization); sharing easily with your team (format libraries, macro libraries, and common code modules); and managing files and results (date and time stamps for logs and output, project IDs, and titles and footnotes).
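One of the habits described, date- and time-stamping log files so reruns never overwrite earlier evidence, can be sketched like this (the path is a placeholder):

```sas
/* Build a filesystem-safe timestamp, e.g. 20160418T093015. */
%let stamp = %sysfunc(datetime(), B8601DT15.);

proc printto log="C:\project\logs\myjob_&stamp..log" new;
run;

/* ... your program here ... */

proc printto;   /* restore the default log destination */
run;
```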
Read the paper (PDF) | Watch the recording
Marje Fecht, Prowerk Consulting
Session 10861-2016:
Best Practices in Connecting External Databases to SAS®
Connecting database schemas to libraries in the SAS® metadata is a very important part of setting up a functional and useful environment for business users. This task can be quite difficult for the untrained administrator. This paper addresses the key configuration items that often go unnoticed but can make a big difference. Using the wrong options can lead to poor database performance or even to a total lockdown, depending on the number of connections to the database.
Read the paper (PDF) | Watch the recording
Mathieu Gaouette, Videotron
Session 11360-2016:
Breakthroughs at Old Dominion Electric Cooperative with Energy Load Forecasting Innovation
The electrical grid has become more complex; utilities are revisiting their approaches, methods, and technology to accurately predict energy demands across all time horizons in a timely manner. With the advanced analytics of SAS® Energy Forecasting, Old Dominion Electric Cooperative (ODEC) provides data-driven load predictions from next hour to next year and beyond. Accurate intraday forecasts mean meeting daily peak demands, saving millions of dollars at critical seasons and events. Mid-term forecasts provide a baseline to the cooperative and its members to accurately anticipate regional growth and customer needs, in addition to signaling power marketers where, when, and how much to hedge future energy purchases to meet weather-driven demands. Long-term forecasts create defensible numbers for large capital expenditures such as generation and transmission projects. Much of the data for determining load comes from disparate systems such as supervisory control and data acquisition (SCADA) and internal billing systems combined with external market data (PJM Energy Market), weather, and economic data. This data needs to be analyzed, validated, and shaped to fully leverage predictive methods. Business insights and planning metrics are achieved when flexible data integration capabilities are combined with advanced analytics and visualization. These increased computing demands at ODEC are being achieved by leveraging Amazon Web Services (AWS) for expanded business discovery and operational capacity. Flexible and scalable data and discovery environments allow ODEC analysts to efficiently develop and test models that are I/O intensive. SAS® visualization for the analyst is a graphic compute environment for information-sharing that is memory intensive. Also, ODEC IT operations require deployment options tuned for process optimization to meet service level agreements that can be quickly evaluated, tested, and promoted into production. What was once very difficult for most utilities to embrace is now achievable with new approaches, methods, and technology like never before.
Read the paper (PDF) | Watch the recording
David Hamilton, ODEC
Steve Becker, SAS
Emily Forney, SAS
Session SAS5661-2016:
Bringing Google Analytics, Facebook, and Twitter Data to SAS® Visual Analytics
Your marketing team would like to pull data from its different marketing activities into one report. What happens in Vegas might stay in Vegas, but what happens in your data does not have to stay there, locked in different tools or static spreadsheets. Learn how to easily bring data from Google Analytics, Facebook, and Twitter into SAS® Visual Analytics to create interactive explorations and reports on this data along with your other data for better overall understanding of your marketing activity.
Read the paper (PDF) | Watch the recording
I-Kong Fu, SAS
Mark Chaves, SAS
Andrew Fagan, SAS
Session SAS6682-2016:
Bringing the US Department of Defense from PC to the Enterprise!
A United States Department of Defense agency with over USD 40 billion in sales and revenue, 25 thousand employees, and 5.3 million parts to source, partnered with SAS® to turn their disparate PC-based analytic environment into a modern SAS® Grid Computing server-based architecture. This presentation discusses the challenges of under-powered desktops, data sprawl, outdated software, difficult upgrades, and inefficient compute processing, and the solution crafted to enable the agency to run as the Fortune 50 company that its balance sheet (and our nation's security) demands. In the modern architecture, rolling upgrades, high availability, centralized data set storage, and improved performance enable improved forecasting, getting our troops the supplies they need, when and where they need them.
Read the paper (PDF)
Erin Stevens, SAS
Douglas Liming, SAS
Session 4021-2016:
Building and Using User-Defined Formats
Formats are powerful tools within the SAS® System. They can be used to change how information is brought into SAS, to modify how it is displayed, and even to reshape the data itself. Base SAS® comes with a great many predefined formats, and it is even possible for you to create your own specialized formats. This paper briefly reviews the use of formats in general and then covers a number of aspects of user-generated formats. Since formats themselves have a number of uses that are not at first apparent to the new user, we also look at some of the broader applications of formats. Topics include building formats from data sets, using picture formats, transformations using formats, value translations, and using formats to perform table lookups.
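Building a format from a data set, one of the topics listed, works through the CNTLIN= option; this sketch assumes a hypothetical lookup table:

```sas
/* Turn a lookup data set into a control data set: PROC FORMAT
   expects the variables FMTNAME, START, LABEL, and TYPE. */
data ctrl;
   set state_lookup(rename=(state_code=start state_name=label));
   retain fmtname '$stfmt' type 'C';
run;

proc format cntlin=ctrl;
run;

/* A table lookup is now just a PUT function call. */
data report;
   set sales;
   state = put(state_code, $stfmt.);
run;
```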
Read the paper (PDF) | Download the data file (ZIP)
Art Carpenter, California Occidental Consultants
C
Session 11840-2016:
Can You Read This into SAS® for Me: Using INFILE and INPUT to Load Data into SAS®
With all the talk of big data and visual analytics, we sometimes forget how important, and often difficult, it is to get external data into SAS®. In this paper, we review some common data sources such as delimited sources (for example, comma-separated values format [CSV]) as well as structured flat files and the programming steps needed to successfully load these files into SAS. In addition to examining the INFILE and INPUT statements, we look at some methods for dealing with bad data. This paper assumes only basic SAS skills, although the topic can be of interest to anyone who needs to read external files.
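A typical INFILE/INPUT step for a delimited file looks like this; the file and variable names are illustrative:

```sas
/* DSD honors quoted commas and consecutive delimiters;
   TRUNCOVER stops short records from reading into the next line;
   FIRSTOBS=2 skips the header row. */
data claims;
   infile 'claims.csv' dsd truncover firstobs=2;
   input claim_id $ service_date :mmddyy10. amount;
   format service_date date9.;
run;
```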
Read the paper (PDF) | Watch the recording
Peter Eberhardt, Fernwood Consulting Group Inc.
Audrey Yeo, Athene
Session 7940-2016:
Careers in Biostatistics and Clinical SAS® Programming: An Overview for the Uninitiated
In the biopharmaceutical industry, biostatistics plays an important and essential role in the research and development of drugs, diagnostics, and medical devices. Familiarity with biostatistics combined with knowledge of SAS® software can lead to a challenging and rewarding career that also improves patients' lives. This paper provides a broad overview of the different types of jobs and career paths available, discusses the education and skill sets needed for each, and presents some ideas for overcoming entry barriers to careers in biostatistics and clinical SAS programming.
Read the paper (PDF)
Justina Flavin, Independent Consultant
Session 2440-2016:
Change Management: The Secret to a Successful SAS® Implementation
Whether you are deploying a new capability with SAS® or modernizing the tool set that people already use in your organization, change management is a valuable practice. Sharing the news of a change with employees can be a daunting task and is often put off until the last possible second. Organizations frequently underestimate the impact of the change, and the results of that miscalculation can be disastrous. Too often, employees find out about a change just before mandatory training and are expected to embrace it. But change management is far more than training. It is early and frequent communication; an inclusive discussion; encouraging and enabling the development of an individual; and facilitating learning before, during, and long after the change. This paper not only showcases the importance of change management but also identifies key objectives for a purposeful strategy. We outline our experiences with both successful and not-so-successful organizational changes. We present best practices for implementing change management strategies and highlight common gaps. For example, developing and engaging Change Champions from the beginning alleviates many headaches and avoids disruptions. Finally, we discuss how the overall company culture can either support or hinder the positive experience change management should be and how to engender support for formal change management in your organization.
Read the paper (PDF) | Watch the recording
Greg Nelson, ThotWave
Session 9541-2016:
Cleaning Up Your SAS® Log: Note Messages
As a SAS® programmer, you probably spend some of your time reading and possibly creating specifications. Your job also includes writing and testing SAS code to produce the final product, whether it is Study Data Tabulation Model (SDTM) data sets, Analysis Data Model (ADaM) data sets, or statistical outputs such as tables, listings, or figures. You reach the point where you have completed the initial programming, removed all obvious errors and warnings from your SAS log, and checked your outputs for accuracy. You are almost done with your programming task, but one important step remains. It is considered best practice to check your SAS log for any questionable messages generated by the SAS system. In addition to messages that begin with the words WARNING or ERROR, there are also messages that begin with the words NOTE or INFO. This paper will focus on five different types of NOTE messages that commonly appear in the SAS log and will present ways to remove these messages from your log.
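One of the most common of these messages, and its removal, can be illustrated briefly (the data set and variable names are hypothetical):

```sas
/* This assignment triggers
   "NOTE: Numeric values have been converted to character values":
      subjid_c = subjid;            (numeric assigned to character)
   An explicit PUT makes the conversion intentional and removes
   the NOTE from the log. */
data fixed;
   set raw;
   subjid_c = put(subjid, z6.);     /* deliberate conversion */
run;
```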
Read the paper (PDF) | Watch the recording
Jennifer Srivastava, Quintiles
Session 8360-2016:
Creating a Better SAS® Visual Analytics User Experience while Working under HIPAA Data Restrictions
The HIPAA Privacy Rule can restrict geographic and demographic data used in health-care analytics. After reviewing the HIPAA requirements for de-identification of health-care data used in research, this poster guides the beginning SAS® Visual Analytics user through different options to create a better user experience. This poster presents a variety of data visualizations the analyst will encounter when describing a health-care population. We explore the different options SAS Visual Analytics offers and also offer tips on data preparation prior to using SAS® Visual Analytics Designer. Among the topics we cover are SAS Visual Analytics Designer object options (including geo bubble map, geo region map, crosstab, and treemap), tips for preparing your data for use in SAS Visual Analytics, and tips on filtering data after it has been loaded into SAS Visual Analytics.
View the e-poster or slides (PDF)
Jessica Wegner, Optum
Margaret Burgess, Optum
Catherine Olson, Optum
Session SAS4240-2016:
Creating a Strong Business Case for SAS® Grid Manager: Translating Grid Computing Benefits to Business Benefits
SAS® Grid Manager, like other grid computing technologies, has a set of great capabilities that we IT professionals love to have in our systems. This technology increases high availability, allows parallel processing, facilitates meeting increasing demand by scaling out, and offers other features that make life better for those managing and using these environments. However, even when business users take advantage of these features, they are more concerned with the business part of the problem. Most of the time, business groups hold the budgets and are key stakeholders for any SAS Grid Manager project. Therefore, it is crucial to demonstrate to business users how they will benefit from the new technologies and how the features will improve their daily operations, help them be more efficient and productive, and help them achieve better results. This paper guides you through a process to create a strong and persuasive business plan that translates the technology features of SAS Grid Manager into business benefits.
Read the paper (PDF) | Watch the recording
Marlos Bosso, SAS
Session 12521-2016:
Cyclist Death and Distracted Driving: Important Factors to Consider
Introduction: Cycling is on the rise in many urban areas across the United States. The number of cyclist fatalities is also increasing, by 19% in the last 3 years. Given the broad-ranging personal and public health benefits of cycling, it is important to understand the factors associated with these traffic-related deaths. There are more distracted drivers on the road than ever before, but the question remains to what extent these drivers are affecting cycling fatality rates. Methods: This paper uses the Fatality Analysis Reporting System (FARS) data to examine factors related to cyclist death when drivers are distracted. We use a novel machine learning approach, adaptive LASSO, to determine the relevant features and estimate their effect. Results: If a cyclist makes an improper action at or just before the time of the crash, the likelihood that the driver of the vehicle was distracted decreases. At the same time, if the driver is speeding or has failed to obey a traffic sign and fatally hits a cyclist, the likelihood that the driver was also distracted increases. Being distracted is related to other risky driving practices when cyclists are fatally injured. Environmental factors such as weather and road condition did not affect the likelihood that a driver was distracted when a cyclist fatality occurred.
Read the paper (PDF)
Lysbeth Floden, University of Arizona
Melanie Bell, Department of Epidemiology & Biostatistics, University of Arizona
Patrick O'Connor, University of Arizona
D
Session 8761-2016:
Data to Dashboard: Visualizing Classroom Utilization and Diversity Trends with SAS® Visual Analytics
Transforming data into intelligence for effective decision-making support is critically dependent on the role and capacity of the Office of Institutional Research (OIR) in managing the institution's data. The presenters share their journey from providing spreadsheet data to developing SAS® programs and dashboards using SAS® Visual Analytics. They demonstrate two dashboards the OIR office developed: one for classroom utilization and one for the university's diversity initiatives. They describe the steps taken to create the dashboards and the process of getting stakeholders involved in determining the key performance indicators (KPIs) and in evaluating and providing feedback on the dashboards. They also share the experience gained and lessons learned in building the dashboards.
Read the paper (PDF)
Shweta Doshi, University of Georgia
Julie Davis, University of Georgia
Session 10722-2016:
Detecting Phishing Attempts with SAS®: Minimally Invasive Email Log Data
Phishing is the attempt of a malicious entity to acquire personal, financial, or otherwise sensitive information such as user names and passwords from recipients through the transmission of seemingly legitimate emails. By quickly alerting recipients of known phishing attacks, an organization can reduce the likelihood that a user will succumb to the request and unknowingly provide sensitive information to attackers. Methods to detect phishing attacks typically require the body of each email to be analyzed. However, most academic institutions do not have the resources to scan individual emails as they are received, nor do they wish to retain and analyze message body data. Many institutions simply rely on the education and participation of recipients within their network. Recipients are encouraged to alert information security (IS) personnel of potential attacks as they are delivered to their mailboxes. This paper explores a novel and more automated approach that uses SAS® to examine email header and transmission data to determine likely phishing attempts that can be further analyzed by IS personnel. Previously, a collection of 2,703 emails from an external filtering appliance was examined with moderate success. This paper focuses on the gains from analyzing an additional 50,000 emails, with the inclusion of an additional 30 known attacks. Real-time email traffic is exported from Splunk Enterprise into SAS for analysis. The resulting model aids in determining the effectiveness of alerting IS personnel to potential phishing attempts faster than a user simply forwarding a suspicious email to IS personnel.
View the e-poster or slides (PDF)
Taylor Anderson, University of Alabama
Denise McManus, University of Alabama
Session 2742-2016:
Differentiate Yourself
Today's employment and business marketplace is highly competitive. As a result, it is necessary for SAS® professionals to differentiate themselves from the competition. Success depends on a number of factors, including positioning yourself with the necessary technical skills in relation to the competition. This presentation illustrates how SAS professionals can acquire a wealth of knowledge and enhance their skills by accessing valuable and free web content related to SAS. With the aid of a web browser and the Internet, anyone can access published PDF papers, Microsoft Word documents, Microsoft PowerPoint presentations, comprehensive student notes, instructor lesson plans, hands-on exercises, webinars, audios, videos, a comprehensive technical support website maintained by SAS, and more to acquire the essential expertise that is needed to cut through all the marketplace noise and begin differentiating yourself to secure desirable opportunities with employers and clients.
Read the paper (PDF)
Kirk Paul Lafler, Software Intelligence Corporation
E
Session 2760-2016:
Easing into Data Exploration, Reporting, and Analytics Using SAS® Enterprise Guide®
Whether you have been programming in SAS® for years, are new to it, or have dabbled with SAS® Enterprise Guide® before, this hands-on workshop sheds some light on the depth, breadth, and power of the Enterprise Guide environment. With all the demands on your time, you need powerful tools that are easy to learn and deliver end-to-end support for your data exploration, reporting, and analytics needs. Included are the following: data exploration tools; formatting code (cleaning up after your coworkers); the enhanced programming environment (and how to calm it down); easily creating reports and graphics; producing the output formats you need (XLS, PDF, RTF, HTML); workspace layout; start-up processing; and notes to help your coworkers use your processes. This workshop uses SAS Enterprise Guide 7.1, but most of the content is applicable to earlier versions.
Read the paper (PDF)
Marje Fecht, Prowerk Consulting
Session 11662-2016:
Effective Ways of Handling Various File Types and Importing Techniques Using SAS® 9.4
Data-driven decision making is critical for any organization to thrive in this fiercely competitive world. The decision-making process has to be accurate and fast in order to stay a step ahead of the competition. One major problem organizations face is huge data load times in loading or processing the data. Reducing the data loading time can help organizations perform faster analysis and thereby respond quickly. In this paper, we compared the methods that can import data of a particular file type in the shortest possible time and thereby increase the efficiency of decision making. SAS® takes input from various file types (such as XLS, CSV, XLSX, ACCESS, and TXT) and converts that input into SAS data sets. To perform this task, SAS provides multiple solutions (such as the IMPORT procedure, the INFILE statement, and the LIBNAME engine) to import the data. We observed the processing times taken by each method for different file types with a data set containing 65,535 observations and 11 variables. We executed the procedure multiple times to check for variation in processing time. From these tests, we recorded the minimum processing time for the combination of procedure and file type. From our analysis of processing times taken by each importing technique, we observed that the shortest processing times for CSV and TXT files, XLS and XLSX files, and ACCESS files are the INFILE statement, the LIBNAME engine, and PROC IMPORT, respectively.
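Two of the compared techniques, placed side by side on hypothetical files, show the trade-off the paper measures:

```sas
/* PROC IMPORT infers variable types and names for you. */
proc import datafile='sales.xlsx' out=sales_xl
            dbms=xlsx replace;
run;

/* The INFILE statement trades that convenience for explicit
   control over every column, which is what makes it fast. */
data sales_csv;
   infile 'sales.csv' dsd firstobs=2;
   input region $ month :monyy7. revenue;
run;
```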
View the e-poster or slides (PDF)
Divya Dadi, Oklahoma State University
Rahul Jhaver, Oklahoma State University
Session 9180-2016:
Efficiently Create Rates over Different Time Periods (PROC MEANS and PROC EXPAND)
This session illustrates how to quickly create rates over a specified period of time, using the MEANS and EXPAND procedures. For example, do you want to know how to use the power of SAS® to create a year-to-date, rolling 12-month, or monthly rate? At Kaiser Permanente, we use this technique to develop Emergency Department (ED) use rates, ED admit rates, patient day rates, readmission rates, and more. A powerful function of PROC MEANS, given a database table with several dimensions and one or more facts, is to perform a mathematical calculation on fact columns across several different combinations of dimensions. For example, if a membership database table exists with the dimensions member ID, year-month, line of business, medical office building, and age grouping, PROC MEANS can easily determine and output the count of members by every possible dimension combination into a SAS data set. Likewise, if a hospital visit database table exists with the same dimensions and facts, PROC MEANS can output the number of hospital visits by the dimension combinations into a second SAS data set. With the power of PROC EXPAND, each of the data sets above, once sorted properly, can have columns added, which calculate total members and total hospital visits by a time dimension of the analyst's choice. Common time dimensions used for Kaiser Permanente's utilization rates are monthly, rolling 12-months, and year-to-date. The resulting membership and hospital visit data sets can be joined with a MERGE statement, and simple division produces a rate for the given dimensions.
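The two-step pattern described above can be sketched as follows; the data set and variable names are illustrative, and YEARMONTH is assumed to hold a SAS date value:

```sas
/* Step 1: PROC MEANS collapses the fact table to one row per
   month and line of business. */
proc means data=visits nway noprint;
   class yearmonth line_of_business;
   var visit_count;
   output out=monthly(drop=_type_ _freq_) sum=visits;
run;

/* Step 2: PROC EXPAND adds a rolling 12-month total. */
proc expand data=monthly out=rolling method=none;
   by line_of_business;
   id yearmonth;
   convert visits = visits_r12 / transformout=(movsum 12);
run;

/* Dividing visits_r12 by the matching rolling member total
   (built the same way) yields the rolling 12-month rate. */
```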
Read the paper (PDF) | Watch the recording
Thomas Gant, Kaiser Permanente
Session 11683-2016:
Equivalence Tests
Motivated by the frequent need for equivalence tests in clinical trials, this presentation provides insights into tests for equivalence. We summarize and compare equivalence tests for different study designs, including one-parameter problems, designs with paired observations, and designs with multiple treatment arms. Power and sample size estimations are discussed. We also provide examples to implement the methods by using the TTEST, ANOVA, GLM, and MIXED procedures.
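For the two-arm design, PROC TTEST performs the two one-sided tests (TOST) directly; the data set and the ±0.2 equivalence bounds below are illustrative assumptions:

```sas
/* Equivalence test of two treatment arms: the null hypothesis is
   that the mean difference lies outside (-0.2, 0.2). */
proc ttest data=trial tost(-0.2, 0.2);
   class treatment;
   var response;
run;
```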
Read the paper (PDF)
Fei Wang, McDougall Scientific Ltd.
John Amrhein, McDougall Scientific Ltd.
Session 4140-2016:
Event Stream Processing for Enterprise Data Exploitation
With the big data throughputs generated by event streams, organizations can opportunistically respond with low-latency effectiveness. Having the ability to permeate identified patterns of interest throughout the enterprise requires deep integration between event stream processing and foundational enterprise data management applications. This paper describes the innovative ability to consolidate real-time data ingestion with controlled and disciplined universal data access from SAS® and Teradata.
Read the paper (PDF)
Tho Nguyen, Teradata
Fiona McNeill, SAS
Session SAS5060-2016:
Exploring SAS® Embedded Process Technologies on Hadoop
SAS® Embedded Process offers a flexible, efficient way to leverage increasing amounts of data by injecting the processing power of SAS® directly where the data lives. SAS Embedded Process can tap into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. Using SAS® In-Database Technologies for Hadoop, you can run scoring models generated by SAS® Enterprise Miner™ or, with SAS® In-Database Code Accelerator for Hadoop, user-written DS2 programs in parallel. With SAS Embedded Process on Hadoop you can also perform data quality operations, and extract and transform data using SAS® Data Loader. This paper explores key SAS technologies that run inside the Hadoop parallel processing framework and prepares you to get started with them.
Read the paper (PDF)
David Ghazaleh, SAS
Session SAS6380-2016:
Extending the Armed Conflict Location & Event Data Project with SAS® Contextual Analysis
The Armed Conflict Location & Event Data Project (ACLED) is a comprehensive public collection of conflict data for developing states. As such, the data is instrumental in informing humanitarian and development work in crisis and conflict-affected regions. The ACLED project currently manually codes for eight types of events from a summarized event description, which is a time-consuming process. In addition, when determining the root causes of the conflict across developing states, these fixed event types are limiting. How can researchers get to a deeper level of analysis? This paper showcases a repeatable combination of exploratory and classification-based text analytics provided by SAS® Contextual Analysis, applied to the publicly available ACLED data for African states. We depict the steps necessary to determine (using semi-automated processes) the next deeper level of themes associated with each of the coded event types; for example, while there is a broad protests and riots event type, how many of those are associated individually with rallies, demonstrations, marches, strikes, and gatherings? We also explore how to use predictive models to highlight the areas that require aid next. We prepare the data for use in SAS® Visual Analytics and SAS® Visual Statistics using SAS® DS2 code and enable dynamic explorations of the new event sub-types. This ultimately provides the end-user analyst with additional layers of data that can be used to detect trends and deploy humanitarian aid where limited resources are needed the most.
Read the paper (PDF)
Tom Sabo, SAS
Session 12487-2016:
Extracting Useful Information From the Google Ngram Data Set: A General Method to Take the Growth of the Scientific Literature into Account
Recent years have seen the birth of a powerful tool for companies and scientists: the Google Ngram data set, built from millions of digitized books. It can be, and has been, used to learn about past and present trends in the use of words over the years. This is an invaluable asset from a business perspective, mostly because of its potential application in marketing. The choice of words has a major impact on the success of a marketing campaign and an analysis of the Google Ngram data set can validate or even suggest the choice of certain words. It can also be used to predict the next buzzwords in order to improve marketing on social media or to help measure the success of previous campaigns. The Google Ngram data set is a gift for scientists and companies, but it has to be used with a lot of care. False conclusions can easily be drawn from straightforward analysis of the data. It contains only a limited number of variables, which makes it difficult to extract valuable information from it. Through a detailed example, this paper shows that it is essential to account for the disparity in the genre of the books used to construct the data set. This paper argues that for the years after 1950, the data set has been constructed using a much higher proportion of scientific books than for the years before. An ingenious method is developed to approximate, for each year, this unknown proportion of books coming from the scientific literature. A statistical model accounting for that change in proportion is then presented. This model is used to analyze the trend in the use of common words of the scientific literature in the 20th century. Results suggest that a naive analysis of the trends in the data can be misleading.
Read the paper (PDF)
Aurélien Nicosia, Université Laval
Thierry Duchesne, Université Laval
Samuel Perreault, Université Laval
F
Session 9260-2016:
FASHION, STYLE "GOTTA HAVE IT" COMPUTE DEFINE BLOCK
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE BLOCK feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar then come take a look at the COMPUTE BLOCK of PROC REPORT. This paper shows a few tips and tricks of using the COMPUTE DEFINE block with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE BLOCK allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE BLOCK. The paper focuses on how to use the COMPUTE BLOCK to create this stylish Special Census profile. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic-lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE BLOCK.
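A compute block of the kind described above typically pairs IF/THEN logic with CALL DEFINE. This sketch, with invented data set and variable names, bolds the report's summary row and traffic-lights large values:

```sas
proc report data=sales;
   column region amount;
   define region / group;
   define amount / analysis sum format=dollar12.2;
   rbreak after / summarize;            /* grand-total row */
   compute amount;
      /* _BREAK_ identifies the summary row; style it bold */
      if _break_ = '_RBREAK_' then
         call define(_row_, 'style', 'style={font_weight=bold}');
      /* traffic-lighting: highlight cells over a threshold */
      else if amount.sum > 100000 then
         call define(_col_, 'style', 'style={background=yellow}');
   endcomp;
run;
```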
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
Session 2700-2016:
Forecasting Behavior with Age-Period-Cohort Models: How APC Predicted the US Mortgage Crisis, but Also Does So Much More
We introduce age-period-cohort (APC) models, which analyze data in which performance is measured by age of an account, account open date, and performance date. We demonstrate this flexible technique with an example from a recent study that seeks to explain the root causes of the US mortgage crisis. In addition, we show how APC models can predict website usage, retail store sales, salesperson performance, and employee attrition. We even present an example in which APC was applied to a database of tree rings to reveal climate variation in the southwestern United States.
View the e-poster or slides (PDF)
Joseph Breeden, Prescient Models
H
Session 8820-2016:
How Managers and Executives Can Leverage SAS® Enterprise Guide®
SAS® Enterprise Guide® is an extremely valuable tool for programmers, but it should also be leveraged by managers and executives to do data exploration, get information on the fly, and take advantage of the powerful analytics and reporting that SAS® has to offer. This can all be done without learning to program. This paper gives an overview of how SAS Enterprise Guide can improve the process of turning real-time data into real-time business decisions by managers.
Read the paper (PDF)
Steven First, Systems Seminar Consultants, Inc.
Session SAS3560-2016:
How to Find Your Perfect Match Using SAS® Data Management
SAS® Data Management is not a dating application. However, as a data analyst, you do strive to find the best matches for your data. Similar to a dating application, when trying to find matches for your data, you need to specify the criteria that constitutes a suitable match. You want to strike a balance between being too stringent with your criteria and under-matching your data and being too loose with your criteria and over-matching your data. This paper highlights various SAS Data Management matching techniques that you can use to strike the right balance and help your data find its perfect match, and as a result improve your data for reporting and analytics purposes.
Read the paper (PDF)
Mary Kathryn Queen, SAS
Session 10940-2016:
How to Move Data among Client Hard Disk, the Hadoop File System, and SAS® LASR™ Analytic Server
In SAS® LASR™ Analytic Server, data can reside in three types of environments: client hard disk (for example, a laptop), the Hadoop File System (HDFS) and the memory of the SAS LASR Analytic Server. Moving the data efficiently among these is critical for getting insights from the data on time. In this paper, we illustrate all the possible ways to move the data, including 1) moving data from client hard disk to HDFS; 2) moving data from HDFS to client hard disk; 3) moving data from HDFS to SAS LASR Analytic Server; 4) moving data from SAS LASR Analytic Server to HDFS; 5) moving data from client hard disk to SAS LASR Analytic Server; and 6) moving data from SAS LASR Analytic Server to client hard disk.
Read the paper (PDF) | Watch the recording
Yue Qi, SAS
Session 7740-2016:
How to STRIP Your Data: Five Go-To Steps to Assure Data Quality
Managing large data sets comes with the task of providing a certain level of quality assurance, no matter what the data is used for. We present here the fundamental SAS® procedures to perform when determining the completeness of a data set. Even though each data set is unique and has its own variables that need more examination in detail, it is important to first examine the size, time, range, interactions, and purity (STRIP) of a data set to determine its readiness for any use. This paper covers first steps you should always take, regardless of whether you're dealing with health, financial, demographic, or environmental data.
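The kind of first-pass review described above usually starts with three Base SAS procedures. This is a generic sketch (mydata is a placeholder), not the authors' STRIP workflow itself:

```sas
proc contents data=mydata;               /* size: rows, columns, types  */
run;

proc means data=mydata n nmiss min max;  /* range and completeness of  */
run;                                     /* the numeric variables       */

proc freq data=mydata;
   tables _character_ / missing;         /* purity: levels and missing  */
run;                                     /* values of character fields  */
```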
Read the paper (PDF) | Watch the recording
Michael Santema, Kaiser Permanente
Fagen Xie, Kaiser Permanente
I
Session 9060-2016:
Introduction to ODS Graphics
This presentation teaches the audience how to use ODS Graphics. Now part of Base SAS®, ODS Graphics are a great way to easily create clear graphics that enable users to tell their stories well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as some of the many options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS® ODS Graphics procedures.
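As a small taste of the SGPLOT procedure discussed above, the following uses the SASHELP.CLASS sample data shipped with SAS to layer a grouped scatter plot with a fitted regression line:

```sas
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;  /* points colored by group */
   reg x=height y=weight;                  /* overlaid fit line        */
run;
```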
Read the paper (PDF)
Chuck Kincaid, Experis BI & Analytics Practice
Session SAS5200-2016:
It's raining data! Harnessing the Cloud with Amazon Redshift and SAS/ACCESS® Software
Every day, companies all over the world are moving their data into the cloud. While there are many options available, much of this data will wind up in Amazon Redshift. As a SAS® user, you are probably wondering, 'What is the best way to access this data using SAS?' This paper discusses the many ways that you can use SAS/ACCESS® software to get to Amazon Redshift. We compare and contrast the various approaches and help you decide which is best for you. Topics include building a connection, moving data into Amazon Redshift, and applying performance best practices.
Read the paper (PDF)
Chris DeHart, SAS
Jeff Bailey, SAS
M
Session 6643-2016:
Making Better Decisions about Risk Classification Using Decision Trees in SAS® Visual Analytics
SAS® Visual Analytics Explorer puts the robust power of decision trees at your fingertips, enabling you to visualize and explore how data is structured. Decision trees help analysts better understand discrete relationships within data by visually showing how combinations of variables lead to a target indicator. This paper explores the practical use of decision trees in SAS Visual Analytics Explorer through an example of risk classification in the financial services industry. It explains various parameters and implications, explores ways the decision tree provides value, and provides alternative methods to help you handle the reality of imperfect data.
Read the paper (PDF) | Watch the recording
Stephen Overton, Zencos Consulting LLC
Ben Murphy, Zencos Consulting LLC
Session SAS5801-2016:
Minimizing Fraud Risk through Dynamic Entity Resolution and Network Analysis
Every day, businesses have to remain vigilant of fraudulent activity, which threatens customers, partners, employees, and financials. Normally, networks of people or groups perpetrate deviant activity. Finding these connections is now made easier for analysts with SAS® Visual Investigator, an upcoming SAS® solution that ultimately minimizes the loss of money and preserves mutual trust among its stakeholders. SAS Visual Investigator takes advantage of the capabilities of the new SAS® In-Memory Server. Investigators can efficiently investigate suspicious cases across business lines, which has traditionally been difficult. However, the time required to collect, process, and identify emerging fraud and compliance issues has been costly. Making proactive analysis accessible to analysts is now more important than ever. SAS Visual Investigator was designed with this goal in mind, and a key component is the visual social network view. This paper discusses how the network analysis view of SAS Visual Investigator, with all its dynamic visual capabilities, can make the investigative process more informative and efficient.
Read the paper (PDF)
Danielle Davis, SAS
Stephen Boyd, SAS Institute
Ray Ong, SAS Institute
Session 2120-2016:
More Hidden Base SAS® Features to Impress Your Colleagues
Across the languages of SAS® are many golden nuggets--functions, formats, and programming features just waiting to impress your friends and colleagues. While learning SAS for over 30 years, I have collected a few of these nuggets, and I offer a dozen more of them to you in this presentation. I presented the first dozen in a similar paper at SAS Global Forum 2015.
Read the paper (PDF) | Watch the recording
Peter Crawford, Crawford Software Consultancy limited
N
Session 3442-2016:
Name That Tune--Writing Music with SAS®
Writing music with SAS® is a simple process. Including a snippet of music in a program is a great way to signal completion of processing. It's also fun! This paper illustrates a method for translating music into SAS code using the CALL SOUND routine.
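A minimal end-of-job chime along the lines described above might look like this. CALL SOUND takes a frequency in hertz and a duration; the duration units can vary by host, so treat the values as approximate:

```sas
/* Play a short ascending figure to signal completion. */
data _null_;
   do freq = 262, 330, 392, 523;   /* roughly C, E, G, C */
      call sound(freq, 150);
   end;
run;
```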
Read the paper (PDF) | Watch the recording
Dan Bretheim, Towers Watson
Session 10360-2016:
Nine Frequently Asked Questions about Getting Started with SAS® Visual Analytics
You've heard all the talk about SAS® Visual Analytics--but maybe you are still confused about how the product would work in your SAS® environment. Many customers have the same points of confusion about what they need to do with their data, how to get data into the product, how SAS Visual Analytics would benefit them, and even whether they should be considering Hadoop or the cloud. In this paper, we cover the questions we are asked most often about implementation, administration, and usage of SAS Visual Analytics.
Read the paper (PDF) | Watch the recording
Tricia Aanderud, Zencos Consulting LLC
Ryan Kumpfmiller, Zencos Consulting
Nick Welke, Zencos Consulting
O
Session 8220-2016:
Optimizing SAS® on Red Hat Enterprise Linux (RHEL) 6 and 7
Today, companies are increasingly using analytics to discover new revenue-increasing and cost-saving opportunities. Many business professionals turn to SAS, a leader in business analytics software and service, to help them improve performance and make better decisions faster. Analytics are also being used in risk management, fraud detection, life sciences, sports, and many more emerging markets. To maximize their value to a business, analytics solutions need to be deployed quickly and cost-effectively, while also providing the ability to scale readily without degrading performance. Of course, in today's demanding environments, where budgets are shrinking and the number of mandates to reduce carbon footprints is growing, the solution must deliver excellent hardware utilization, power efficiency, and return on investment. To address some of these challenges, Red Hat and SAS have collaborated to recommend the best practices for configuring SAS® 9 running on Red Hat Enterprise Linux. The scope of this document includes Red Hat Enterprise Linux 6 and 7. Researched areas include the I/O subsystem, file system selection, and kernel tuning, in both bare-metal and kernel-based virtual machine (KVM) environments. In addition, we include grid configurations that run with the Red Hat Resilient Storage Add-On, which includes Global File System 2 (GFS2) clusters.
Read the paper (PDF)
Barry Marson, Red Hat, Inc
P
Session 7540-2016:
PROC SQL for SQL DieHards
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adapted the Structured Query Language (SQL) by means of PROC SQL back with SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
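One example of the SAS additions mentioned above is the CALCULATED keyword, a PROC SQL extension that lets a query reuse a column alias defined earlier in the same SELECT, something standard SQL does not allow. A small sketch against the shipped SASHELP.CLASS data:

```sas
proc sql;
   create table tall as
   select name,
          height,
          height * 2.54 as height_cm,
          calculated height_cm / 100 as height_m  /* reuse the alias */
   from sashelp.class
   where age > 12
   order by height desc;
quit;
```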
Read the paper (PDF)
Barbara Ross, NA
Jessica Bennett, Snap Finance
Session 10381-2016:
Pastries, Microbreweries, Diamonds, and More: Small Businesses Can Profit with SAS®
Today, there are 28 million small businesses, which account for 54% of all sales in the United States. The challenge is that small businesses struggle every day to accurately forecast future sales. These forecasts not only drive investment decisions in the business, but also are used in setting daily par, determining labor hours, and scheduling operating hours. In general, owners use their gut instinct. Using SAS® provides the opportunity to develop accurate and robust models that can unlock cost savings for small business owners in a short amount of time. This research examines over 5,000 records from the first year of daily sales data for a start-up small business, while comparing the four basic forecasting models within SAS® Enterprise Guide®. The objective of this model comparison is to demonstrate how quick and easy it is to forecast small business sales using SAS Enterprise Guide. What does that mean for small businesses? More profit. SAS provides cost-effective models for small businesses to better forecast sales, resulting in better business decisions.
View the e-poster or slides (PDF)
Cameron Jagoe, The University of Alabama
Taylor Larkin, The University of Alabama
Denise McManus, University of Alabama
Session 2480-2016:
Performing Pattern Matching by Using Perl Regular Expressions
SAS® software provides many DATA step functions that search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, TRANWRD, etc. Using these functions to perform pattern matching often requires you to use many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
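A single PRX call can replace several SUBSTR/INDEX calls. This sketch (the pattern and sample strings are illustrative) locates a US-style phone number with CALL PRXSUBSTR:

```sas
data phones;
   length phone $12;
   input text $50.;
   /* the trailing o option compiles the pattern only once */
   re = prxparse('/\d{3}-\d{3}-\d{4}/o');
   call prxsubstr(re, text, pos, len);   /* position and length of match */
   if pos > 0 then phone = substrn(text, pos, len);
   datalines;
Call 919-555-0123 after noon
No number here
;
```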
Read the paper (PDF) | Download the data file (ZIP)
Arthur Li, City of Hope
Session 11779-2016:
Predicting Response Time for the First Reply after a Question Is Posted in the SAS® Community Forum
Many inquisitive minds are filled with excitement and anticipation of response every time one posts a question on a forum. This paper explores the factors that impact the response time of the first response for questions posted in the SAS® Community forum. The factors are contributors' availability, nature of topic, and number of contributors knowledgeable for that particular topic. The results from this project help SAS® users receive an estimated response time, and the SAS Community forum can use this information to answer several business questions such as the following: What time of the year is likely to have an overflow of questions? Do specific topics receive delayed responses? On which days of the week is the community most active? To answer such questions, we built a web crawler using Python and Selenium to fetch data from the SAS Community forum, one of the largest analytics groups. We scraped 13,443 queries and solutions posted from January 2014 to the present. We also captured several query-related attributes such as the number of replies, likes, views, bookmarks, and the number of people conversing on the query. Using different tools, we analyzed this data set after clustering the queries into 22 subtopics and found interesting patterns that can help the SAS Community forum in several ways, as presented in this paper.
View the e-poster or slides (PDF)
Praveen Kumar Kotekal, Oklahoma State University
Session 11671-2016:
Predicting the Influence of Demographics on Domestic Violence Using SAS® Enterprise Guide® 6.1 and SAS® Enterprise Miner™ 12.3
The Oklahoma State Department of Health (OSDH) conducts home visiting programs with families that need parental support. Domestic violence is one of the many screenings performed on these visits. The home visiting personnel are trained to do initial screenings; however, they do not have the extensive information required to treat or serve the participants in this arena. Understanding how demographics such as age, level of education, and household income among others, are related to domestic violence might help home visiting personnel better serve their clients by modifying their questions based on these demographics. The objective of this study is to better understand the demographic characteristics of those in the home visiting programs who are identified with domestic violence. We also developed predictive models such as logistic regression and decision trees based on understanding the influence of demographics on domestic violence. The study population consists of all the women who participated in the Children First Program of the OSDH from 2012 to 2014. The data set contains 1,750 observations collected during screening by the home visiting personnel over the two-year period. In addition, they must have completed the Demographic form as well as the Relationship Assessment form at the time of intake. Univariate and multivariate analysis has been performed to discover the influence that age, education, and household income have on domestic violence. From the initial analysis, we can see that women who are younger than 25 years old, who haven't completed high school, and who are somewhat dependent on their husbands or partners for money are most vulnerable. We have even segmented the clients based on the likelihood of domestic violence.
View the e-poster or slides (PDF)
Soumil Mukherjee, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Miriam McGaugh, Oklahoma State Department of Health
Session 3420-2016:
Predictive Analytics in Credit Scoring: Challenges and Practical Resolutions
This session discusses challenges and considerations typically faced in credit risk scoring as well as options and practical ways to address them. No two problems are ever the same even if the general approach is clear. Every model has its own unique characteristics and creative ways to address them. Successful credit scoring modeling projects are always based on a combination of both advanced analytical techniques and data, and a deep understanding of the business and how the model will be applied. Different aspects of the process are discussed, including feature selection, reject inferencing, sample selection and validation, and model design questions and considerations.
Read the paper (PDF)
Regina Malina, Equifax
R
Session 10981-2016:
Rapid Web Services using SAS/IntrNet® Software, jQuery AJAX, and PROC JSON
Creating web applications and web services can sometimes be a daunting task. With the ever-changing facets of web programming, it can be even more challenging. For example, client-side web programming using applets was very popular almost 20 years ago, only to be replaced with server-side programming techniques. With all of the advancements in JavaScript libraries, it seems client-side programming is again making a comeback. Amidst all of the changing web technologies, surprisingly one of the most powerful tools I have found that has provided exceptional capabilities across all types of web development techniques has been SAS/IntrNet®. Traditionally seen as a server-side programming tool for generating complete web pages based on SAS® content, SAS/IntrNet, coupled with jQuery AJAX or AJAX alone, also has the ability to provide client-side web programming techniques, as well as provide RESTful web service implementations. I hope to show that with the combination of these tools, including the JSON procedure from SAS® 9.4, simple yet powerful web services or dynamic content-rich web pages can be created easily and rapidly.
Read the paper (PDF)
Jeremy Palbicki, Mayo Clinic
Session 1660-2016:
Reading JSON in SAS® Using Groovy
JavaScript Object Notation (JSON) has quickly become the de facto standard for data transfer on the Internet, due to an increase in both web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as the Internet of Things and the mobile cloud. JSON offers a light and flexible format for data transfer, and can be processed directly from JavaScript without the need for an external parser. The SAS® JSON procedure lacks the ability to read in JSON. However, the addition of the GROOVY procedure in SAS® 9.3 allows for execution of Java code from within SAS, allowing for JSON data to be read into a SAS data set through XML conversion. This paper demonstrates the method for parsing JSON into data sets with Groovy and the XML LIBNAME Engine, all within Base SAS®.
Read the paper (PDF) | Download the data file (ZIP)
John Kennedy, Mesa Digital
S
Session 2748-2016:
SAS® Debugging 101
SAS® users are always surprised to discover their programs contain bugs (or errors). In fact, when asked, users will emphatically stand by their programs and logic, insisting they are error free. But the vast body of experience, along with the realities of writing code, says otherwise. Errors can appear anywhere in program code, accidentally introduced as developers and programmers write it. No matter where an error occurs, the overriding sentiment among most users is that debugging SAS programs can be a daunting and humbling task. This presentation explores the world of SAS errors, providing essential information about the various error types. Attendees learn how errors are created, their symptoms, and effective techniques to identify, understand, and repair them so that program code works as intended.
Read the paper (PDF)
Kirk Paul Lafler, Software Intelligence Corporation
Session SAS6365-2016:
SAS® Grid Administration Made Simple
Historically, administration of your SAS® Grid Manager environment has required interaction with a number of disparate applications including Platform RTM for SAS, SAS® Management Console, and command line utilities. With the third maintenance release of SAS® 9.4, you can now use SAS® Environment Manager for all monitoring and management of your SAS Grid. The new SAS Environment Manager interface gives you the ability to configure the Load Sharing Facility (LSF), manage and monitor high-availability applications, monitor overall SAS Grid health, define event-based alerts, and much, much more through a single, unified, web-based interface.
Read the paper (PDF) | Watch the recording
Scott Parrish, SAS
Paula Kavanagh, SAS Institute, Inc.
Linda Zeng, SAS Institute, Inc.
Session 8860-2016:
SAS® Metadata Security 101: A Primer for SAS Administrators and Users Not Familiar with SAS
It is not uncommon to hear SAS® administrators complain that their IT department and users just don't get it when it comes to metadata and security. For the administrator or user not familiar with SAS, understanding how SAS interacts with the operating system, the file system, external databases, and users can be confusing. This paper walks you through all the basic metadata relationships and how they are created on a SAS® Enterprise Office Analytics installation in a Windows environment. This guided tour unravels the mystery of how the host system, external databases, and SAS work together to give users what they need, while reliably enforcing the appropriate security.
Read the paper (PDF) | Watch the recording
Charyn Faenza, F.N.B. Corporation
Session 10962-2016:
SAS® Metadata Security 201: Security Basics for a New SAS Administrator
The purpose of this paper is to provide an overview of SAS® metadata security for new or inexperienced SAS administrators. The focus of the discussion is on identifying the most common metadata security objects such as access control entries (ACEs), access control templates (ACTs), metadata folders, authentication domains, etc. and describing how these objects work together to secure the SAS environment. Based on a standard SAS® Enterprise Office Analytics for Midsize Business installation in a Windows environment, this paper walks through a simple example of securing a metadata environment, which demonstrates how security is prioritized, the impact of each security layer, and how conflicts are resolved.
Read the paper (PDF) | Watch the recording
Charyn Faenza, F.N.B. Corporation
Session SAS5780-2016:
SAS® Visual Statistics 8.1: The New Self-Service, Easy Analytics Experience
In today's Business Intelligence world, self-service, which allows an everyday knowledge worker to explore data and personalize business reports without being tech-savvy, is a prerequisite. The new release of SAS® Visual Statistics introduces an HTML5-based, easy-to-use user interface that combines statistical modeling, business reporting, and mobile sharing into a one-stop self-service shop. The backbone analytic server of SAS Visual Statistics is also updated, allowing an end user to analyze data of various sizes in the cloud. The paper illustrates this new self-service modeling experience in SAS Visual Statistics using telecom churn data, including the steps of identifying distinct user subgroups using decision tree, building and tuning regression models, designing business reports for customer churn, and sharing the final modeling outcome on a mobile device.
Read the paper (PDF)
Xiangxiang Meng, SAS
Don Chapman, SAS
Cheryl LeSaint, SAS
Session 10960-2016:
SAS® and R: A Perfect Combination for Sports Analytics
Revolution Analytics reports more than two million R users worldwide. SAS® has the capability to use R code, but users have discovered a slight learning curve when performing certain basic functions such as getting data from the web. R is a functional programming language while SAS is a procedural programming language. These differences create difficulties when first making the switch from programming in R to programming in SAS. However, SAS/IML® software enables integration between the two languages by enabling users to write R code directly into SAS/IML. This paper details the process of using the SAS/IML SUBMIT / R statement and the R package XML to get data from the web into SAS/IML. The project uses public basketball data for each of the 30 NBA teams over the past 35 years, taken directly from Basketball-Reference.com. The data was retrieved from 66 individual web pages, cleaned using R functions, and compiled into a final data set composed of 48 variables and 895 records. The seamless compatibility between SAS and R provides an opportunity to use R code in SAS for robust modeling. The resulting analysis provides a clear and concise approach for those interested in pursuing sports analytics.
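The round trip described above, in miniature: run R inside SAS/IML and pull the result back as a SAS data set. This sketch builds a toy R data frame rather than scraping the web; it requires SAS to be started with the RLANG system option and a local R installation, and the data frame and its values are invented for illustration:

```sas
proc iml;
   submit / R;                 /* everything until ENDSUBMIT runs in R */
      teams <- data.frame(team = c("BOS", "LAL"), wins = c(49, 57))
   endsubmit;
   /* copy the R data frame TEAMS into the SAS data set WORK.TEAMS */
   call ImportDataSetFromR("work.teams", "teams");
quit;
```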
View the e-poster or slides (PDF)
Matt Collins, The University of Alabama
Taylor Larkin, The University of Alabama
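The SUBMIT / R interface the paper relies on can be sketched as follows. This is a generic illustration (the data and names are hypothetical, not the paper's NBA data) and assumes SAS/IML is licensed and the SAS session was started with the RLANG system option enabled:

```sas
/* Minimal sketch of passing data between SAS/IML and R. */
proc iml;
   x = {1 2 3 4 5};                 /* an IML row vector */
   call ExportMatrixToR(x, "x");    /* copy the IML matrix into R as x */
   submit / R;                      /* lines until ENDSUBMIT run in R */
      y <- x * 2                    # ordinary R code
   endsubmit;
   call ImportMatrixFromR(y, "y");  /* copy the R result back into IML */
   print y;
quit;
```

The same SUBMIT / R block is where an R web-scraping package such as XML would be called, with the cleaned result imported back into IML for modeling.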
Session 11762-2016:
Sampling in SAS® Using PROC SURVEYSELECT
This paper examines the various sampling options that are available in SAS® through PROC SURVEYSELECT. We do not cover all of the possible sampling methods or options that PROC SURVEYSELECT features. Instead, we look at Simple Random Sampling, Stratified Random Sampling, Cluster Sampling, Systematic Sampling, and Sequential Random Sampling.
Read the paper (PDF) | View the e-poster or slides (PDF)
Rachael Becker, University of Central Florida
Drew Doyle, University of Central Florida
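Two of the designs listed above can be sketched as follows; the sample sizes and seed are arbitrary illustrations, not values from the paper:

```sas
/* Simple random sampling: draw 5 observations without replacement. */
proc surveyselect data=sashelp.class out=srs_sample
      method=srs sampsize=5 seed=20160101;
run;

/* Stratified random sampling: 3 observations from each level of SEX.
   The input must be sorted by the strata variable. */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

proc surveyselect data=class_sorted out=strat_sample
      method=srs sampsize=(3 3) seed=20160101;
   strata sex;
run;
```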
Session 11741-2016:
Secrets of Efficient SAS® Coding Techniques
Just as there are many ways to solve any problem in any facet of life, most SAS® programming problems have more than one potential solution. Each solution has tradeoffs; a complex program might execute very quickly but prove challenging to maintain, while a program designed for ease of use might require more resources for development, execution, and maintenance. Too often, it seems, those tasked with producing the results are given a delivery date and an estimated development time in advance, but no guidelines for efficiency expectations. This paper provides ways to improve the efficiency of your SAS® programs. It suggests coding techniques, provides guidelines for their use, and shows the results of experiments comparing various coding techniques, with examples of acceptable and improved ways to accomplish the same task.
Read the paper (PDF) | Watch the recording
Andrew Kuligowski, HSN
Swati Agarwal, Optum
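One common technique in the spirit of the paper is subsetting rows and columns as early as possible so that SAS moves less data; the library, data set, and variable names in this sketch are hypothetical:

```sas
/* Read only the needed columns and rows in a single pass, instead of
   copying the full data set and subsetting afterward. */
data work.subset;
   set big.claims(keep=member_id claim_dt amount      /* columns */
                  where=(claim_dt >= '01JAN2015'd));  /* rows */
run;
```

Applying KEEP= and WHERE= on the SET statement filters while reading, which typically reduces I/O compared to filtering in a later step.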
Session 10862-2016:
Setting Up Winning Conditions for a Migration of SAS® from a PC Environment to a Server Environment
Whether it is a question of security or a question of centralizing the SAS® installation to a server, the need to phase out SAS in the PC environment has never been so common. On the surface, this type of migration seems very simple and smooth. However, the effort of migrating SAS from a PC environment to a SAS server environment (SAS® Enterprise Guide®) is easy to underestimate. This paper presents a way to set up the winning conditions to achieve this goal without falling into the common traps. Based on a successful conversion with a previous employer, I have identified a high-level roadmap with specific objectives that will guide people through this important task.
Read the paper (PDF) | Watch the recording
Mathieu Gaouette, Videotron
Session SAS5201-2016:
Size Optimization Made Simple: Creating Omni-Channel and Supply Chain Efficiencies to Increase Consumer Satisfaction and Revenue
Retailers and wholesalers invest heavily in technology, people, processes, and data to create relevant assortments across channels. While technology and vast amounts of data help localize assortments based on consumer preferences, product attributes, and store performance, it's impossible to complete the assortment planning process down to the most granular level of size. The ability to manage millions of size and store combinations is burdensome, not scalable, and not precise. Valuable time and effort are spent creating detailed, insightful assortments, only to marginalize those assortments by applying corporate averages of size selling to the purchasing and distribution of sizes to locations. The result is missed opportunity: disappointed customers, lost revenue, and lost profitability due to missing sizes and markdowns on abundant sizes. This paper shows how retailers and wholesalers can transform historical sales data into true size demand and determine the optimal size demand profile to use in the purchasing and allocation of products. You do not need to be a data scientist, statistician, or hold a PhD to augment the business process with approachable analytics and optimization to yield game-changing results!
Read the paper (PDF)
Donna McGuckin, SAS
Session 11480-2016:
Solving a Business Problem in SAS® Enterprise Guide®: Creating a "Layered" Inpatient Indicator Model
This paper describes a Kaiser Permanente Northwest business problem regarding tracking recent inpatient hospital utilization at external hospitals, and how it was solved with the flexibility of SAS® Enterprise Guide®. The Inpatient Indicator is an estimate of our regional inpatient hospital utilization as of yesterday. It tells us which of our members are in which hospitals. It measures inpatient admissions, which are health care interactions where a patient is admitted to a hospital for bed occupancy to receive hospital services. The Inpatient Indicator is used to produce data and create metrics and analysis essential to the decision making of Kaiser Permanente executives, care coordinators, patient navigators, utilization management physicians, and operations managers. Accurate, recent hospital inpatient information is vital for decisions regarding patient care, staffing, and member utilization. Due to a business policy change, Kaiser Permanente Northwest lost the ability to track urgent and emergent inpatient admits at external, non-plan hospitals through our referral system, which was our data source for all recent external inpatient admits. Without this information, we did not have complete knowledge of whether a member had an inpatient stay at an external hospital until a claim was received, which could be several weeks after the member was admitted. Other sources were needed to understand our inpatient utilization at external hospitals. A tool was needed with the flexibility to easily combine and compare multiple data sets with different field names, formats, and values representing the same metric. The tool needed to be able to import data from different sources and export data to different destinations. We also needed a tool that would allow this project to be scheduled. We chose to build the model with SAS Enterprise Guide.
View the e-poster or slides (PDF)
Thomas Gant, Kaiser Permanente
Session 11700-2016:
Solving the 1,001-Piece Puzzle in 10 (or Fewer) Easy Steps: Using SAS®9 .cfg, autoexec.sas, SAS Registry, and Options to Set Up Base SAS®
Are you frustrated with manually setting options to control your SAS® Display Manager sessions but become daunted every time you look at all the places you can set options and window layouts? In this paper, we look at the various files SAS accesses when starting, what can (and cannot) go into them, and which settings take precedence after all of them are executed. We also look at the SAS Registry and how to programmatically change settings. By the end of the paper, you will be comfortable in knowing where to make the changes that best fit your needs.
Read the paper (PDF)
Peter Eberhardt, Fernwood Consulting Group Inc.
Session SAS5602-2016:
Sparking Analytical Insight with SAS® Data Loader for Hadoop
Sixty percent of organizations will have Hadoop in production by 2016, per a recent TDWI survey, and it has become increasingly challenging to access, cleanse, and move these huge volumes of data. How can data scientists and business analysts clean up their data, manage it where it lives, and overcome the big data skills gap? It all comes down to accelerating the data preparation process. SAS® Data Loader leverages years of expertise in data quality and data integration, a simplified user interface, and the parallel processing power of Hadoop to overcome the skills gap and compress the time taken to prepare data for analytics or visualization. We cover some of the new capabilities of SAS Data Loader for Hadoop including in-memory Spark processing of data quality and master data management functions, faster profiling and unstructured data field extraction, and chaining multiple transforms together for improved productivity.
Read the paper (PDF)
Matthew Magne, SAS
Scott Gidley, SAS
Session SAS6367-2016:
Streaming Decisions: How SAS® Puts Streaming Data to Work
Sensors, devices, social conversation streams, web movement, and all things in the Internet of Things (IoT) are transmitting data at unprecedented volumes and rates. SAS® Event Stream Processing ingests from thousands to hundreds of millions of data events per second, assessing both the content and the value of that data. The benefit to organizations comes from acting on those results and eliminating the latencies associated with storing data before analysis happens. This paper bridges that gap: it describes how to use streaming data as a portable, lightweight micro-analytics service for consumption by other applications and systems.
Read the paper (PDF) | Download the data file (ZIP)
Fiona McNeill, SAS
David Duling, SAS
Steve Sparano, SAS
T
Session SAS6740-2016:
Text Mining Secretary Clinton's Emails
The recent controversy regarding former Secretary Hillary Clinton's use of a non-government, privately maintained email server provides a great opportunity to analyze real-world data using a variety of analytic techniques. This email corpus is interesting because of the challenges in acquiring and preparing the data for analysis as well as the variety of analyses that can be performed, including techniques for searching, entity extraction and resolution, natural language processing for topic generation, and social network analysis. Given the potential for politically charged discussion, rest assured there will be no discussion of politics--just fact-based analysis.
Read the paper (PDF)
Michael Ames, SAS
Session 8080-2016:
The HPSUMMARY Procedure: An Old Friend's Younger (and Brawnier) Cousin
The HPSUMMARY procedure provides data summarization tools to compute basic descriptive statistics for variables in a SAS® data set. It is a high-performance version of the SUMMARY procedure in Base SAS®. Though PROC SUMMARY is popular with data analysts, PROC HPSUMMARY is still a new kid on the block. The purpose of this paper is to provide an introduction to PROC HPSUMMARY by comparing it with its well-known counterpart, PROC SUMMARY. The comparison focuses on differences in syntax, options, and performance in terms of execution time and memory usage. Sample code, outputs, and SAS log snippets are provided to illustrate the discussion. Simulated large data is used to observe the performance of the two procedures. Using SAS® 9.4 installed on a single-user machine with four cores available, preliminary experiments show that HPSUMMARY is more memory-efficient than SUMMARY when the data set is large (for example, SUMMARY failed due to insufficient memory where HPSUMMARY finished successfully). However, there is no evidence of a performance advantage of HPSUMMARY over SUMMARY on this single-user machine.
Read the paper (PDF) | View the e-poster or slides (PDF)
Anh Kellermann, University of South Florida
Jeff Kromrey, University of South Florida
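The syntax similarity mentioned above can be seen side by side; the data set and variable names here are hypothetical, not the paper's simulated data:

```sas
/* Traditional single-threaded summarization. */
proc summary data=work.big nway;
   class region;
   var sales;
   output out=work.sum_stats mean=avg_sales sum=tot_sales;
run;

/* High-performance version: the statements are nearly identical, with an
   optional PERFORMANCE statement to control threading on a multicore box. */
proc hpsummary data=work.big nway;
   performance nthreads=4;
   class region;
   var sales;
   output out=work.hp_stats mean=avg_sales sum=tot_sales;
run;
```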
Session 11940-2016:
The Journey Towards an Omnichannel Reality
Omnichannel, and the omniscient customer experience, is most commonly used as a buzzword to describe the seamless customer experience in a traditional multi-channel marketing and sales environment. With more channels and methods of communication, there is a growing need to establish a more customer-centric way of dealing with all customer interactions, not only 1:1. Telenor, based out of Norway, is one of the world's major mobile operators, with in excess of 200 million mobile subscriptions throughout 13 markets across Europe and Asia. The Norwegian home market is a highly saturated and mature market in which customer demands and expectations are constantly rising. To deal with this and with increased competition, two major initiatives were established together with SAS®. The initiatives addressed both the need for real-time analytics and decision management in our inbound channel and the need to create an omnichannel experience across inbound and outbound channels. The projects were aimed at both business-to-consumer (B2C) and business-to-business (B2B) markets. With a significant legacy of back-end systems and a complex value chain, it was important both to improve the customer experience and to simplify customer treatment, all without impacting the back-end systems at large. The presentation sheds light on how the projects met the technical challenges alongside the need for an optimal customer experience. With results far exceeding expectations, the outcome has established the basis for further Customer Lifecycle Management (CLM) initiatives to strengthen both Net Promoter Score/customer loyalty and revenue.
Read the paper (PDF)
Jørn Tronstad, Telenor
Session SAS2900-2016:
The Six Tenets of a Better Decision
SAS® helps people make better decisions. But what makes a decision better? How can we make sure we are not making worse decisions? There are six tenets to follow to ensure we are making better decisions. Decisions are better when they are: (1) Aligned with your mission; (2) Complete; (3) Faster; (4) Accurate; (5) Accessible; and (6) Recurring, ongoing, or productionalized. By combining all of these aspects of making a decision, you can have confidence that you are making a better decision. The breadth of SAS software is examined to understand how it can be applied toward these tenets. Scorecards are used to ensure that your business stays aligned with goals. Data Management is used to bring together all of the data you have, to provide complete information. SAS® Visual Analytics offerings are unparalleled in their speed to enable you to make faster decisions. Exhaustive testing verifies accuracy. Modern, easy-to-use user interfaces are adapted for multiple languages and designed for a variety of users to ensure accessibility. And the powerful SAS data flow architecture is built for ongoing support of decisions. Several examples from the SAS® Solutions OnDemand group are used as case studies in support of these tenets.
Read the paper (PDF)
Dan Jahn, SAS
Session 11770-2016:
Time Series Analysis: U.S. Military Casualties in the Pacific Theater during World War II
This paper shows how SAS® can be used to obtain a Time Series Analysis of data regarding World War II. This analysis tests whether Truman's justification for the use of atomic weapons was valid. Truman believed that by using the atomic weapons, he would prevent unacceptable levels of U.S. casualties that would be incurred in the course of a conventional invasion of the Japanese islands.
Read the paper (PDF) | View the e-poster or slides (PDF)
Rachael Becker, University of Central Florida
Session 11822-2016:
To Macro or Not to Macro: That Is the Question
Do you need a macro for your program? This paper provides some guidelines, based on user experience, and explores whether it's worth the time to create a macro (for example, a parameter-driven macro or just a simple macro variable). This paper is geared toward new users and experienced users who do not use macros.
Read the paper (PDF)
Claudine Lougee, Dualenic
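The two levels of macro use the paper contrasts can be sketched like this (all names are hypothetical):

```sas
/* A simple macro variable: one value reused in several places. */
%let year = 2016;
title "Claims summary for &year";

/* A parameter-driven macro: worth writing when the same step
   repeats with different data sets or variables. */
%macro freq_by(ds=, var=);
   proc freq data=&ds;
      tables &var / missing;
   run;
%mend freq_by;

%freq_by(ds=work.claims, var=status)
```

The judgment call the paper describes is whether the repetition is frequent enough to justify the second form over simply copying the PROC FREQ step.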
U
Session 9920-2016:
UCF SAS® Visual Analytics: Implementation, Usage, and Performance
At the University of Central Florida (UCF), we recently invested in SAS® Visual Analytics, along with the updated SAS® Business Intelligence platform (from 9.2 to 9.4), a project that took over a year to complete. This project was undertaken to give our users the best and most updated tools available. This paper introduces the SAS Visual Analytics environment at UCF and includes projects created using this product. It answers why we selected SAS Visual Analytics for development over other SAS® applications. It explains the technical environment for our non-distributed SAS Visual Analytics: RAM, servers, benchmarking, sizing, and scaling. It discusses why we chose the non-distributed mode versus distributed mode. Challenges in the design, implementation, usage, and performance are also presented, including the reasons why Hadoop was not adopted.
Read the paper (PDF) | Watch the recording
Scott Milbuta, University of Central Florida
Ulf Borjesson, University of Central Florida
Carlos Piemonti, University of Central Florida
Session 12420-2016:
Using SAS® Programming, SAS® Enterprise Guide®, and SAS® Visual Analytics to Generate and Disseminate National Postgraduate Programs Data in a Mobile Computing Environment
The Coordination for the Improvement of Higher Education Personnel (CAPES) is a foundation within the Ministry of Education in Brazil whose central purpose is to coordinate efforts to promote high standards for postgraduate programs inside the country. Structured in a SAS® data warehouse, vast amounts of information about the National Postgraduate System (SNPG) are collected and analyzed daily. This data must be accessed by different operational and managerial profiles, on desktops and mobile devices (in this case, using SAS® Mobile BI). Therefore, accurate and fresh data must be maintained so that it is possible to calculate statistics and indicators about programs, courses, teachers, students, and intellectual productions. By using SAS programming within SAS® Enterprise Guide®, all statistical calculations are performed and the results become available for exploration and presentation in SAS® Visual Analytics. Using the report designing tool, an excellent user experience is created by integrating the reports into Sucupira Platform, an online tool designed to provide greater data transparency for the academic community and the general public. This integration is made possible through the creation of public access reports with automatic authentication of guest users, presented within iframes inside the Foundation's platform. The content of the reports is grouped by scope, which makes it possible to view the indicators in different forms of presentation, to apply filters (including from URL GET parameters), and to execute stored processes.
Read the paper (PDF) | Download the data file (ZIP)
Leonardo de Lima Aguirre, Coordination for the Improvement of Higher Education Personnel
Sergio da Costa Cortes, Coordination for the Improvement of Higher Education Personnel
Marcus Vinicius de Olivera Palheta, CAPES
Session SAS5963-2016:
Using SAS® Visual Analytics to Develop a Real-Time Business Metrics Command Center for Your Office
Seeing business metrics in real time enables a company to understand and respond to ever-changing customer demands. In reality, though, obtaining such metrics in real time is not always easy. However, SAS Australia and New Zealand Technical Support solved that problem by using SAS® Visual Analytics to develop a 16-display command center in the Sydney office. Using this center to provide real-time data enables the Sydney office to respond to customer demands across the entire South Asia region. The success of this deployment makes reporting capabilities and data available for Technical Support hubs in Wellington, Mumbai, Kuala Lumpur, and Singapore--covering a total distance of 12,360 kilometers (approximately 7,680 miles). By sharing SAS Visual Analytics report metrics on displays spanning multiple time zones that cover a 7-hour time difference, SAS Australia and New Zealand Technical Support has started a new journey of breaking barriers, collaborating more closely, and providing fuel for innovation and change for an entire region. This paper is aimed at individuals or companies who want to learn how SAS Australia & New Zealand Technical Support developed its command center and who are inspired to do the same for their offices!
Read the paper (PDF)
Chris Blake, SAS
Session 10725-2016:
Using SAS® to Implement Simultaneous Linking in Item Response Theory
The objective of this study is to use the GLM procedure in SAS® to solve a complex linkage problem with multiple test forms in educational research. Typically, the ABSORB statement in the GLM procedure makes this task relatively easy to implement. For educational assessments, applying one-dimensional combinations of two-parameter logistic (2PL) models (Hambleton, Swaminathan, and Rogers 1991, ch. 1) and generalized partial credit models (Muraki 1997) to a large-scale, high-stakes testing program with very frequent administrations requires a practical approach to linking test forms. Haberman (2009) suggested a pragmatic solution, simultaneous linking, in which many separately calibrated test forms are linked by the use of least-squares methods. In SAS, the GLM procedure can implement this algorithm through the ABSORB statement for the variable that specifies administrations, as long as the data are sorted by order of administration. This paper presents the use of SAS to examine the application of this proposed methodology to a simple case of real data.
Read the paper (PDF)
Lili Yao, Educational Testing Service
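A minimal sketch of the setup, using hypothetical variable names (ADMIN for administration, FORM for test form); as the abstract notes, the data must be sorted by the absorbed variable first:

```sas
proc sort data=work.calibrations;
   by admin;
run;

proc glm data=work.calibrations;
   absorb admin;        /* one fixed effect per administration, absorbed
                           rather than estimated, which keeps the
                           least-squares problem tractable */
   class form;
   model score = form / solution;
run;
quit;
```

Absorbing ADMIN removes the administration effects without adding a parameter for each one, which is what makes the approach practical when administrations are very frequent.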
Session 10080-2016:
Using the REPORT Procedure to Export a Big Data Set to an External XML File
A number of SAS® tools can be used to report data, such as the PRINT, MEANS, TABULATE, and REPORT procedures. The REPORT procedure is a single tool that can produce many of the same results as the other SAS tools. Not only can it create detailed reports as PROC PRINT can, but it can also summarize and calculate data as the MEANS and TABULATE procedures do. Unfortunately, despite its power, PROC REPORT seems to be used less often than the other tools, possibly due to its seemingly complex coding. This paper uses PROC REPORT and the Output Delivery System (ODS) to export a big data set into a customized XML file that a user who is not familiar with SAS can easily read. Several options for the COLUMN, DEFINE, and COMPUTE statements are shown that enable you to present your data in a more colorful way. We show how to control the format of selected columns and rows, how to make column headings more meaningful, and how to color selected cells differently to bring attention to the most important data.
Read the paper (PDF) | Download the data file (ZIP) | View the e-poster or slides (PDF)
Guihong Chen, TCF Bank
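A sketch of the general pattern, using an XML-producing ODS destination and a COMPUTE block to color cells; the destination, threshold, and styling here are illustrative assumptions, not the paper's exact code:

```sas
/* Route the report to an XML file (the ExcelXP tagset emits
   SpreadsheetML that opens in Excel). */
ods tagsets.excelxp file='sales_report.xml';

proc report data=sashelp.shoes nowd;
   column region sales;
   define region / group 'Sales Region';
   define sales  / analysis sum format=dollar14. 'Total Sales';
   compute sales;
      /* Highlight regions whose total exceeds an arbitrary threshold. */
      if sales.sum > 5000000 then
         call define(_col_, 'style', 'style={background=yellow}');
   endcomp;
run;

ods tagsets.excelxp close;
```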
W
Session SAS2400-2016:
What's New in SAS® Data Management
The latest releases of SAS® Data Integration Studio, SAS® Data Management Studio and SAS® Data Integration Server, SAS® Data Governance, and SAS/ACCESS® software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types, including Hadoop, cloud, RDBMS, files, unstructured data, streaming, and others, and the ability to perform ETL and ELT transformations in diverse run-time environments including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features, as well as enhancements to master data and metadata management support. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Read the paper (PDF)
Nancy Rausch, SAS
Session SAS6223-2016:
What's New in SAS® Federation Server 4.2
SAS® Federation Server is a scalable, threaded, multi-user data access server providing seamlessly integrated data from various data sources. When your data becomes too large to move or copy and too sensitive to allow direct access, the powerful set of data virtualization capabilities allows you to effectively and efficiently manage and manipulate data from many sources, without moving or copying the data. This agile data integration framework allows business users the ability to connect to more data, reduce risks, respond faster, and make better business decisions. For technical users, the framework provides central data control and security, reduces complexity, optimizes commodity hardware, promotes rapid prototyping, and increases staff productivity. This paper provides an overview of the latest features of the product and includes use cases and examples for leveraging product capabilities.
Read the paper (PDF)
Tatyana Petrova, SAS
Session 9960-2016:
Working with Big Data in Near Real Time Using SAS® Event Stream Processing
In the world of big data, real-time processing and event stream processing are becoming the norm, yet few tools available today can handle this type of processing. SAS® Event Stream Processing is designed to do exactly that. In this paper, we use SAS Event Stream Processing to read multiple data sets stored in big data platforms such as Hadoop and Cassandra in real time and to perform transformations on the data, such as joining data sets, filtering data based on preset business rules, and creating new variables as required. We then look at how we can score the data with a machine learning algorithm. This paper shows you how to use the provided Hadoop Distributed File System (HDFS) publisher and subscriber to read data from and push data to Hadoop; the HDFS adapter is discussed in detail. Finally, we use Streamviewer to see how data flows through SAS Event Stream Processing.
View the e-poster or slides (PDF)
Krishna Sai Kishore Konudula, Kavi Associates
Y
Session 10600-2016:
You Can Bet on It: Missing Observations Are Preserved with the PRELOADFMT and COMPLETETYPES Options
Do you write reports that sometimes have missing categories across all class variables? Some programmers write all sorts of additional DATA step code in order to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way to accomplish this? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options in conjunction with a user-defined format in PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows with zeros. This paper uses examples from Census Bureau tabulations to illustrate the use of these procedures and options to preserve missing rows or columns.
Read the paper (PDF) | Watch the recording
Chris Boniface, Census Bureau
Janet Wysocki, U.S. Census Bureau
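The two approaches can be sketched as follows, with a hypothetical SURVEY data set whose REGION variable may not contain every category:

```sas
/* A user-defined format that enumerates every expected category. */
proc format;
   value $regfmt 'N'='North' 'S'='South' 'E'='East' 'W'='West';
run;

/* PROC TABULATE: PRELOADFMT loads all format levels for the class
   variable, and PRINTMISS prints rows even when the count is zero. */
proc tabulate data=work.survey;
   class region / preloadfmt;
   format region $regfmt.;
   table region, n / printmiss;
run;

/* PROC SUMMARY: COMPLETETYPES creates all combinations of class
   levels, including the ones absent from the data. */
proc summary data=work.survey completetypes nway;
   class region / preloadfmt;
   format region $regfmt.;
   output out=work.counts;
run;
```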