The new and highly anticipated SAS® Output Delivery System (ODS) destination for Microsoft Excel is finally here! Available as a production feature in the third maintenance release of SAS® 9.4 (TS1M3), this new destination generates native Excel (XLSX) files that are compatible with Microsoft Office 2010 or later. This paper is written for anyone, from entry-level programmers to business analysts, who uses the SAS® System and Microsoft Excel to create reports. The discussion covers features and benefits of the new Excel destination, differences between the Excel destination and the older ExcelXP tagset, and functionality that exists in the ExcelXP tagset that is not available in the Excel destination. These topics are all illustrated with meaningful examples. The paper also explains how you can bridge the gap that exists as a result of differences in the functionality between the destination and the tagset. In addition, the discussion outlines when it is beneficial for you to use the Excel destination versus the ExcelXP tagset, and vice versa. After reading this paper, you should be able to make an informed decision about which tool best meets your needs.
Chevell Parker, SAS
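For readers who want a quick feel for the destination the abstract describes, a minimal sketch of the ODS Excel syntax follows; the file name, sheet name, and title are illustrative, not from the paper.

```sas
/* Open the ODS Excel destination (production in SAS 9.4 TS1M3) */
ods excel file="class_report.xlsx"            /* illustrative output path */
          options(sheet_name="Class Roster"   /* name the worksheet       */
                  embedded_titles="yes");     /* keep TITLE text in sheet */

title "Heights and Weights of Students";
proc print data=sashelp.class noobs;
run;

ods excel close;  /* close the destination to finish writing the XLSX file */
```

Unlike the ExcelXP tagset, which writes XML that Excel converts on open, this produces a native XLSX file directly.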
This paper presents a SAS® macro for generating random numbers from the skew-normal and skew-t distributions, as well as the quantiles of these distributions. The results are similar to those generated by the sn package of the R software.
Alan Silva, University of Brasilia
Paulo Henrique Dourado da Silva, Banco do Brasil
Geographically Weighted Negative Binomial Regression (GWNBR) was developed by Silva and Rodrigues (2014). It is a generalization of the Geographically Weighted Poisson Regression (GWPR) proposed by Nakaya and others (2005) and of the Poisson and negative binomial regressions. This paper shows a SAS® macro to estimate the GWNBR model encoded in SAS/IML® software and shows how to use the SAS procedure GMAP to draw the maps.
Alan Silva, University of Brasilia
Thais Rodrigues, University of Brasilia
This paper presents how Norway, the world's second-largest seafood-exporting country, shares valuable seafood insight using SAS® Visual Analytics Designer. Three complementary data sources (trade statistics, consumption panel data, and consumer survey data) are used to strengthen knowledge and understanding of the important markets for seafood, which is a potential competitive advantage for the Norwegian seafood industry. The need for information varies across users, and as the amount of available data grows, the challenge is to make the information available to everyone, everywhere, at any time. Some users are interested in only the latest trade developments, while others working with product innovation need deeper consumer insights. Some have quite advanced analytical skills, while others do not. Thus, one of the most important things is to make the information understandable for everyone while at the same time providing in-depth insights for the advanced user. SAS Visual Analytics Designer makes it possible to provide both basic reporting and more in-depth analyses of trends and relationships to cover these various needs. This paper demonstrates how the functionality in SAS Visual Analytics Designer is fully used for this purpose, and presents how data from different sources is visualized in SAS Visual Analytics Designer reports located in the SAS® Information Delivery Portal. The main challenges and suggestions for improvements that were uncovered during the process are also presented.
Kia Uuskartano, Norwegian Seafood Council
Tor Erik Somby, Norwegian Seafood Council
Government funding is a highly sought-after financing medium for entrepreneurs and researchers aspiring to make a breakthrough in the market and compete with larger organizations. The funding is channeled via federal agencies that seek out promising research and products that improve on existing work. This study uses a text analytic approach to analyze the project abstracts that earned government funding over the past three and a half decades in the fields of Defense and Health and Human Services, helping us understand the factors that might be responsible for awards made to a project. We collected 100,279 records from the Small Business Innovation Research (SBIR) website (https://www.sbir.gov). Initially, we analyze the trends in government funding of research by small businesses. Then we perform text mining on the 55,791 abstracts of the projects funded by the Department of Defense and the Department of Health and Human Services. From the text mining, we portray the key topics and patterns related to past research and determine how their trends have changed over time. Through our study, we highlight research trends to enable organizations to shape their business strategy and make better investments in future research.
Sairam Tanguturi, Oklahoma State University
Sagar Rudrake, Oklahoma State University
Nithish Reddy Yeduguri, Oklahoma State University
For far too long, anti-money laundering and terrorist financing solutions have forced analysts to wade through oceans of transactions and alerted work items (alerts). Alert-centered analysis is both ineffective and costly. The goal of an anti-money laundering program is to reduce risk for your financial institution, and to do this most effectively, you must start with analysis at the customer level, rather than simply troll through volumes of alerts and transactions. In this session, discover how a customer-centric approach leads to increased analyst efficiency and streamlined investigations. Rather than starting with alerts and transactions, starting with a customer-centric view allows your analysts to rapidly triage suspicious activities, prioritize work, and quickly move into investigating the highest risk customer activities.
Kathy Hart, SAS
This paper presents supervised and unsupervised pattern recognition techniques that use Base SAS® and SAS® Enterprise Miner™ software. A simple preprocessing technique creates many small image patches from larger images. These patches encourage the learned patterns to have local scale, which follows well-known statistical properties of natural images. In addition, these patches reduce the number of features that are required to represent an image and can decrease the training time that algorithms need in order to learn from the images. If a training label is available, a classifier is trained to identify patches of interest. In the unsupervised case, a stacked autoencoder network is used to generate a dictionary of representative patches, which can be used to locate areas of interest in new images. This technique can be applied to pattern recognition problems in general, and this paper presents examples from the oil and gas industry and from a solar power forecasting application.
Patrick Hall, SAS
Alex Chien, SAS Institute
Ilknur Kabul, SAS Institute
Jorge Silva, SAS Institute
Hospital Episode Statistics (HES) is a data warehouse that contains records of all admissions, outpatient appointments, and accident and emergency (A&E) attendances at National Health Service (NHS) hospitals in England. Each year it processes over 125 million admitted patient, outpatient, and A&E records. Such a large data set gives endless research opportunities for researchers and health-care professionals. However, patient care data is complex and might be difficult to manage. This paper demonstrates the flexibility and power of SAS® programming tools such as the DATA step, the SQL procedure, and macros to help to analyze HES data.
Violeta Balinskaite, Imperial College London
In light of past indiscretions by police departments that led to riots in major U.S. cities, it is important to assess the factors leading to the arrest of a citizen. A police department should understand the arrest situation before it makes any decision, so that it can defuse any possibility of a riot or similar incident. Many such incidents in the past resulted from police departments failing to make the right decisions in emergencies. The primary objective of this study is to understand the key factors that affect the arrest of people in New York City and to understand the various crimes that have taken place in the region in the last two years. The study explores the different regions of New York City where crimes have taken place, as well as the timing of the incidents leading to them. The data set consists of 111 variables and 273,430 observations from 2013 to 2014, with a binary target variable, arrest made. The percentages of Yes and No incidents of arrest made are 6% and 94%, respectively. This study analyzes the reasons for arrest, which include suspicion, timing of frisk activities, identification of threats, basis of search, whether the arrest required physical intervention, location of the arrest, and whether the frisk officer needed any support. Decision tree, regression, and neural network models were built, and the decision tree turned out to be the best model, with a validation misclassification rate of 6.47% and a model sensitivity of 73.17%. The results from the decision tree model showed that suspicion count, search basis count, summons leading to violence, ethnicity, and use of force are some of the important factors influencing the arrest of a person.
Karan Rudra, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Maitreya Kadiyala, Oklahoma State University
The Health Indicators Warehouse (HIW) is part of the US Department of Health and Human Services' (DHHS) response to make federal data more accessible. Through it, users can access data and metadata for over 1,200 indicators from approximately 180 federal and nonfederal sources. The HIW also supports data access by applications such as SAS® Visual Analytics through the use of an application programming interface (API). An API serves as a communication interface for integration. As a result of the API, HIW data consumers using SAS Visual Analytics can avoid difficult manual data processing. This paper provides detailed information about how to access HIW data with SAS Visual Analytics in order to produce easily understood visualizations with minimal effort through a methodology that automates HIW data processing. This paper also shows how to run SAS® macros inside a stored process to make HIW data available in SAS Visual Analytics for exploration and reporting via API calls; the SAS macros are provided. Use cases involving dashboards are also examined in order to demonstrate the value of streaming data directly from the HIW. Both IT professionals and population health analysts will benefit from understanding how to import HIW data into SAS Visual Analytics using SAS® Stored Processes, macros, and APIs. This can be very helpful to organizations that want to lower maintenance costs associated with data management while gaining insights into health data with visualizations. This paper provides a starting point for any organization interested in deriving full value from SAS Visual Analytics while augmenting their work with HIW data.
Li Hui Chen, US Consumer Product Safety Commission
Manuel Figallo, SAS
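The general pattern of calling a REST API from SAS and reading the response, as the paper above describes for the HIW, can be sketched roughly as follows. The URL is a placeholder, not the actual HIW endpoint, and the JSON libname engine shown requires SAS 9.4M4 or later; the paper itself wraps this kind of logic in macros inside a stored process.

```sas
/* Retrieve one indicator from a REST API with PROC HTTP */
filename resp temp;

proc http
  url="https://example.gov/api/v1/indicators/1234/data"  /* hypothetical endpoint */
  method="GET"
  out=resp;
run;

/* Read the JSON response into SAS data sets (JSON engine, SAS 9.4M4+) */
libname hiw json fileref=resp;

proc datasets lib=hiw;  /* inspect the data sets the engine created */
quit;
```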
A United States Department of Defense agency with over USD 40 billion in sales and revenue, 25 thousand employees, and 5.3 million parts to source partnered with SAS® to turn its disparate PC-based analytic environment into a modern SAS® Grid Computing server-based architecture. This presentation discusses the challenges of under-powered desktops, data sprawl, outdated software, difficult upgrades, and inefficient compute processing, and the solution crafted to enable the agency to run as the Fortune 50 company that its balance sheet (and our nation's security) demands. In the modern architecture, rolling upgrades, high availability, centralized data set storage, and improved performance enable improved forecasting, getting our troops the supplies they need, when and where they need them.
Erin Stevens, SAS
Douglas Liming, SAS
This paper is a case study exploring the analytical and reporting capabilities of SAS® Visual Analytics to perform data exploration, determine patterns and trends, and create data visualizations to generate extensive dashboard reports, using the publicly available pollution data from the United States Environmental Protection Agency (EPA). Data collection agencies report their data to the EPA via the Air Quality System (AQS). The EPA makes several types of aggregate (summary) data sets, such as daily and annual pollutant summaries, available in CSV format for public use. We demonstrate SAS Visual Analytics capabilities by using pollution data to create visualizations that compare Air Quality Index (AQI) values for multiple pollutants by location and time period, generate a time series plot by location and time period, compare 8-hour ozone 'exceedances' from this year with previous years, and perform other such analyses. The easy-to-use SAS Visual Analytics web-based interface is leveraged to explore patterns in the pollutant data and obtain insightful information. SAS® Visual Data Builder is used to summarize data, join data sets, and enhance the predictive power within the data. SAS® Visual Analytics Explorer is used to explore data, create calculated data items and aggregated measures, and define geography items. Visualizations such as charts, bar graphs, geo maps, tree maps, correlation matrices, and other graphs are created to graphically visualize pollutant information contaminating the environment; hierarchies are derived from date and time items and across geographies to allow rolling up the data. Various reports are designed for pollution analysis. Viewing on a mobile device such as an iPad is also explored. In conclusion, this paper demonstrates the use of SAS Visual Analytics to determine the impact of pollution on the environment over time using various visualizations and graphs.
Viraj Kumbhakarna, MUFG Union Bank
We set out to model two of the leading causes of fatal automobile accidents in America: drunk driving and speeding. We created a decision tree for each cause that uses situational conditions, such as time of day and type of road, to classify historical accidents. Using this model, a law enforcement or town official can decide whether a speed trap or a DUI checkpoint would be most effective in a particular situation. This proof of concept can easily be applied to specific jurisdictions for customized solutions.
Catherine LaChapelle, NC State University
Daniel Brannock, North Carolina State University
Aric LaBarr, Institute for Advanced Analytics
Craig Shaver, North Carolina State University
Fire department facilities in Belgium are in desperate need of renovation due to relentless budget cuts during the last decade. Times have also changed: current locations are often troubled by city-center traffic jams or narrow streets. In other words, the current locations are not ideal. The approach for our project was to find the best possible locations for new facilities, based on historical interventions and live traffic information for fire trucks. The goal was not to give just one answer, but instead to provide SAS® Visual Analytics dashboards that allow policy makers to play with some of the deciding factors and view the impact of cutting or adding facilities on the estimated intervention time. For each option, the optimal locations are given. SAS® software offers many great products, but the true power of SAS as a data mining tool lies in using the full SAS suite. SAS® Data Integration Studio was used for its parallel processing capabilities in extracting and parsing travel times via an online API to MapQuest. SAS/OR® was used to optimize the model using a Lagrangian multiplier. SAS® Enterprise Guide® was used to combine and integrate the data preparation. SAS Visual Analytics was used to visualize the results and to serve as an interface to execute the model via a stored process.
Wouter Travers, Deloitte
SAS® Grid Manager, like other grid computing technologies, has a set of great capabilities that we, IT professionals, love to have in our systems. This technology provides high availability, allows parallel processing, accommodates increasing demand through scale-out, and offers other features that make life better for those managing and using these environments. However, even when business users take advantage of these features, they are more concerned with the business part of the problem. Most of the time, business groups hold the budgets and are key stakeholders for any SAS Grid Manager project. Therefore, it is crucial to demonstrate to business users how they will benefit from the new technologies and how the features will improve their daily operations, help them be more efficient and productive, and help them achieve better results. This paper guides you through a process for creating a strong and persuasive business plan that translates the technology features of SAS Grid Manager into business benefits.
Marlos Bosso, SAS
As SAS® programmers, we often develop listings, graphs, and reports that need to be delivered frequently to our customers. We might decide to manually run the program every time we get a request, or we might schedule an automatic task to send a report at a specific date and time. Both scenarios have disadvantages. If the report is manual, we have to find and run the program every time someone requests an updated version of the output. It takes some time, and it is not the most interesting part of the job. If we schedule an automatic task in Windows, we still sometimes get an email from the customers because they need the report immediately. That means that we have to find and run the program for them. This paper explains how we developed an on-demand report platform using SAS® Enterprise Guide®, SAS® Web Application Server, and stored processes. We had developed many reports for different customer groups, and we were getting more and more emails asking for updated versions of those reports. We felt we were not using our time wisely and decided to create an infrastructure where users could easily run their programs through a web interface. The tool that we created enables SAS programmers to release on-demand web reports with minimal programming. It has web interfaces developed using stored processes for the administrative tasks, and it also automatically customizes the front end based on the user who connects to the website. One of the challenges of the project was that certain reports had to be available only to a specific group of users.
Romain Miralles, Genomic Health
As data management professionals, you have to comply with new regulations and controls. One such regulation is Basel Committee on Banking Supervision (BCBS) 239. To respond to these new demands, you have to put processes and methods in place to automate metadata collection and analysis, and to provide rigorous documentation around your data flows. You also have to deal with many aspects of data management, including data access, data manipulation (ETL and other), data quality, data usage, and data consumption, often with a variety of toolsets that are not necessarily from a single vendor. This paper shows you how to use SAS® technologies to support data governance requirements, including third-party metadata collection and data monitoring. It highlights best practices such as implementing a business glossary and establishing controls for monitoring data. Attend this session to become familiar with the SAS tools used to meet the new requirements and to implement a more managed environment.
Jeff Stander, SAS
SAS® Embedded Process offers a flexible, efficient way to leverage increasing amounts of data by injecting the processing power of SAS® directly where the data lives. SAS Embedded Process can tap into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. Using SAS® In-Database Technologies for Hadoop, you can run scoring models generated by SAS® Enterprise Miner™ or, with SAS® In-Database Code Accelerator for Hadoop, user-written DS2 programs in parallel. With SAS Embedded Process on Hadoop you can also perform data quality operations, and extract and transform data using SAS® Data Loader. This paper explores key SAS technologies that run inside the Hadoop parallel processing framework and prepares you to get started with them.
David Ghazaleh, SAS
Injury severity describes how badly a person involved in a crash was hurt. Understanding the factors that influence injury severity can be helpful in designing mechanisms to reduce accident fatalities. In this research, we model and analyze the data as a hierarchy with three levels to answer the question of which road-, vehicle-, and driver-related factors influence injury severity. We used hierarchical linear modeling (HLM) to analyze nested data from the Fatality Analysis Reporting System (FARS). The results show that driver-related factors are directly related to injury severity. On the other hand, road conditions and vehicle characteristics have a significant moderating impact on injury severity. We believe that our study has important policy implications for designing customized mechanisms, specific to each hierarchical level, to reduce the occurrence of fatal accidents.
The Armed Conflict Location & Event Data Project (ACLED) is a comprehensive public collection of conflict data for developing states. As such, the data is instrumental in informing humanitarian and development work in crisis and conflict-affected regions. The ACLED project currently manually codes for eight types of events from a summarized event description, which is a time-consuming process. In addition, when determining the root causes of the conflict across developing states, these fixed event types are limiting. How can researchers get to a deeper level of analysis? This paper showcases a repeatable combination of exploratory and classification-based text analytics provided by SAS® Contextual Analysis, applied to the publicly available ACLED for African states. We depict the steps necessary to determine (using semi-automated processes) the next deeper level of themes associated with each of the coded event types; for example, while there is a broad protests and riots event type, how many of those are associated individually with rallies, demonstrations, marches, strikes, and gatherings? We also explore how to use predictive models to highlight the areas that require aid next. We prepare the data for use in SAS® Visual Analytics and SAS® Visual Statistics using SAS® DS2 code and enable dynamic explorations of the new event sub-types. This ultimately provides the end-user analyst with additional layers of data that can be used to detect trends and deploy humanitarian aid where limited resources are needed the most.
Tom Sabo, SAS
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE block feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar, then come take a look at the COMPUTE block of PROC REPORT. This paper shows a few tips and tricks for using the COMPUTE block with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE block allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE block. The paper focuses on how to use the COMPUTE block to create this stylish Special Census profile. The paper shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE block.
Chris Boniface, Census Bureau
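As a small, hedged illustration of the kind of conditional styling the paper above describes (the data set and threshold are ours, not the Census Bureau's), a COMPUTE block can apply traffic lighting with CALL DEFINE:

```sas
proc report data=sashelp.class nowd;
  column name sex age height;
  define age / display;
  compute age;
    /* traffic lighting: boldface and highlight rows where age exceeds 14 */
    if age > 14 then
      call define(_row_, 'style',
                  'style={font_weight=bold background=yellow}');
  endcomp;
run;
```

The same CALL DEFINE mechanism, targeting `_col_` instead of `_row_`, can attach a format or style to a single cell.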
Road safety is a major concern for all United States citizens. According to the National Highway Traffic Safety Administration, automobile accidents cause 30,000 deaths annually. Oftentimes, fatalities occur due to a number of factors, such as driver carelessness, speed of operation, impairment due to alcohol or drugs, and the road environment. Some studies suggest that car crashes are due solely to driver factors, while others suggest they are due to a combination of roadway and driver factors. However, factors not mentioned in previous studies may also be contributing to automobile accident fatalities. The objective of this project was to identify the significant factors that lead to multiple fatalities in the event of a car crash.
Bill Bentley, Value-Train
Gina Colaianni, Kennesaw State University
Cheryl Joneckis, Kennesaw State University
Sherry Ni, Kennesaw State University
Kennedy Onzere, Kennesaw State University
The importance of understanding terrorism in the United States assumed heightened prominence in the wake of the coordinated attacks of September 11, 2001. Yet, surprisingly little is known about the frequency of attacks that might happen in the future and the factors that lead to a successful terrorist attack. This research is aimed at forecasting the frequency of attacks per annum in the United States and determining the factors that contribute to an attack's success. Using the data acquired from the Global Terrorism Database (GTD), we tested our hypothesis by examining the frequency of attacks per annum in the United States from 1972 to 2014 using SAS® Enterprise Miner™ and SAS® Forecast Studio. The data set has 2,683 observations. Our forecasting model predicts that there could be as many as 16 attacks every year for the next 4 years. From our study of factors that contribute to the success of a terrorist attack, we discovered that attack type, weapons used in the attack, place of attack, and type of target play pivotal roles in determining the success of a terrorist attack. Results reveal that the government might be successful in averting assassination attempts but might fail to prevent armed assaults and facilities or infrastructure attacks. Therefore, additional security might be required for important facilities in order to prevent further attacks from being successful. Results further reveal that it is possible to reduce the forecasted number of attacks by raising defense spending and by putting an end to the raging war in the Middle East. We are currently working on gathering data to do in-depth analysis at the monthly and quarterly level.
SriHarsha Ramineni, Oklahoma State University
Koteswara Rao Sriram, Oklahoma State University
If your organization already deploys one or more software solutions via Amazon Web Services (AWS), you know the value of the public cloud. AWS provides a scalable public cloud with a global footprint, allowing users access to enterprise software solutions anywhere at any time. Although SAS® began long before AWS was even imagined, many loyal organizations driven by SAS are moving their local SAS analytics into the public AWS cloud, alongside other software hosted by AWS. SAS® Solutions OnDemand has assisted organizations in this transition. In this paper, we describe how we extended our enterprise hosting business to AWS. We describe the open-source automation framework from which SAS Solutions OnDemand built our automation stack, which simplified the process of migrating a SAS implementation. We also provide the technical details of our automation and network footprint, a discussion of the technologies we chose along the way, and a list of lessons learned.
Ethan Merrill, SAS
Bryan Harkola, SAS
Project management is a hot topic across many industries, and multiple commercial software applications for managing projects are available. The reality, however, is that the majority of project management software is not practical for daily use. SAS® has a solution for this issue that can be used for managing projects graphically in real time. This paper introduces a new paradigm for project management using the SAS® Graph Template Language (GTL). SAS clients can use GTL, in real time, to visualize resource assignments, task plans, delivery tracking, and project status across multiple project levels for more efficient project management.
Zhouming (Victor) Sun, MedImmune
Have you ever wondered how to get the most from Web 2.0 technologies in order to visualize SAS® data? How to make those graphs dynamic, so that users can explore the data in a controlled way, without needing prior knowledge of SAS products or data science? Wonder no more! In this session, you learn how to turn basic sashelp.stocks data into a snazzy Highcharts stock chart in which a user can review any time period, zoom in and out, and export the graph as an image. All of these features are delivered with only two DATA steps and one SORT procedure, in 57 lines of SAS code.
Vasilij Nevlev, Analytium Ltd
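One way such a report might be wired together is sketched below. This is our rough illustration, not the author's 57-line program; the Highcharts CDN URL, output path, and chart options are all assumptions.

```sas
/* Sort one stock's history by date for plotting */
proc sort data=sashelp.stocks(where=(stock='IBM')) out=ibm;
  by date;
run;

/* Emit an HTML page that feeds the data to the Highstock library */
data _null_;
  file 'stocks.html';
  set ibm end=last;
  if _n_ = 1 then put
    '<html><head><script src="https://code.highcharts.com/stock/highstock.js"></script></head>'
    '<body><div id="chart"></div><script>'
    "Highcharts.stockChart('chart', {series: [{name: 'IBM', data: [";
  ms = (date - '01jan1970'd) * 86400000;  /* SAS date -> epoch milliseconds */
  put '[' ms best32. ',' close best32. '],';
  if last then put ']}]});</script></body></html>';
run;
```

Opening the resulting HTML file in a browser gives the zoomable, exportable stock chart without any SAS software on the viewer's side.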
Looking for new ways to improve your business? Try mining your own data! Event log data is a side product of information systems, generated for audit and security purposes, and is seldom analyzed, especially in combination with business data. With the advent of cloud computing, more event log data has accumulated, and analysts are searching for innovative ways to take advantage of all data resources in order to get valuable insights. Process mining, a new field for discovering business patterns from event log data, has recently proved useful for business applications. Process mining shares some algorithms with data mining, but it is more focused on interpretation of the detected patterns than on prediction. Analysis of these patterns can lead to improvements in the efficiency of common existing and planned business processes. Through process mining, analysts can uncover hidden relationships between resources and activities and make changes to improve organizational structure. This paper shows you how to use SAS® Analytics to gain insights from real event log data.
Emily (Yan) Gao, SAS
Robert Chu, SAS
Xudong Sun, SAS
Microsoft Visual Basic Scripting Edition (VBScript) and SAS® software are each powerful tools in their own right. These two technologies can be combined so that SAS code can call a VBScript program or vice versa. This gives a programmer the ability to automate SAS tasks; traverse the file system; send emails programmatically via Microsoft Outlook or SMTP; manipulate Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files; get web data; and more. This paper presents example code to demonstrate each of these capabilities.
Christopher Johnson, BrickStreet Insurance
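As a small taste of the SAS-calls-VBScript direction described above, a DATA step can write a script and the X statement can run it. This is our sketch, not the paper's code; the paths are illustrative, and it requires Windows with the XCMD option enabled.

```sas
/* Generate a VBScript file from a DATA step... */
filename vbs 'C:\temp\hello.vbs';

data _null_;
  file vbs;
  put 'Set fso = CreateObject("Scripting.FileSystemObject")';
  put 'Set f = fso.CreateTextFile("C:\temp\from_vbscript.txt", True)';
  put 'f.WriteLine "Hello from VBScript"';
  put 'f.Close';
run;

/* ...then run it with the Windows script host */
options noxwait;
x 'cscript //nologo C:\temp\hello.vbs';
```

The reverse direction works too: a VBScript program can launch sas.exe with a program file as an argument to automate SAS tasks.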
In studies where randomization is not possible, imbalance in baseline covariates (confounding by indication) is a fundamental concern. Propensity score matching (PSM) is a popular method to minimize this potential bias by matching individuals who received treatment to those who did not, reducing the imbalance in pre-treatment covariate distributions. PSM methods continue to advance as computing resources expand. Optimal matching, which selects the set of matches that minimizes the average difference in propensity scores between matched individuals, has been shown to outperform less computationally intensive methods. However, many find the implementation daunting. SAS/IML® software allows the integration of optimal matching routines that execute in R, for example, the R optmatch package. This presentation walks through performing optimal PSM in SAS® by implementing R functions, assessing whether covariate trimming is necessary prior to PSM. It covers the propensity score analysis in SAS, the matching procedure, and the post-matching assessment of covariate balance using SAS/STAT® 13.2 and SAS/IML procedures.
Lucy D'Agostino McGowan, Vanderbilt University
Robert Greevy, Department of Biostatistics, Vanderbilt University
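The criterion that optimal matching minimizes can be made concrete with a small sketch. The paper's implementation calls the R optmatch package from SAS/IML; the Python analog below, with made-up propensity scores, simply enumerates every possible 1:1 assignment and keeps the one with the smallest total absolute difference in propensity scores, which is the quantity optmatch minimizes far more efficiently at scale.

```python
from itertools import permutations

def optimal_pairs(treated_ps, control_ps):
    """Brute-force optimal 1:1 matching: choose the assignment of
    controls to treated units that minimizes the total (hence average)
    absolute difference in propensity scores."""
    best_cost, best_pairs = float("inf"), None
    for perm in permutations(range(len(control_ps)), len(treated_ps)):
        cost = sum(abs(treated_ps[t] - control_ps[c])
                   for t, c in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs, best_cost

# Hypothetical propensity scores for 2 treated and 3 control units
treated = [0.62, 0.35]
control = [0.30, 0.60, 0.70]
pairs, cost = optimal_pairs(treated, control)
# pairs -> [(0, 1), (1, 0)]: treated 0.62 matches control 0.60,
# treated 0.35 matches control 0.30, total distance 0.07
```

A greedy nearest-neighbor match can get trapped by early choices; the optimal criterion considers all pairings jointly, which is why it tends to produce better overall balance.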
Do you want to see and experience how to configure SAS® Enterprise Miner™ single sign-on? Are you looking to explore setting up Integrated Windows Authentication with SAS® Visual Analytics? This hands-on workshop demonstrates how you can configure Kerberos delegation with SAS® 9.4. You see how to validate the prerequisites, make the configuration changes, and use the applications. By the end of this workshop you will be empowered to start your own configuration.
Stuart Rogers, SAS
Considering that SAS® Grid Manager is becoming more and more popular, it is important to fulfill users' needs for a successful migration to a SAS® Grid environment. This paper focuses on key requirements and common issues for new SAS Grid users, especially those coming from a traditional environment. It describes a few common requirements, such as the need for a current working directory, changing the file system navigation in SAS® Enterprise Guide® to a user-specified location, and receiving a job execution summary email. The GRIDWORK directory introduced in SAS Grid Manager is a bit different from the traditional SAS WORK location; this paper explains how you can use the GRIDWORK location in a more user-friendly way. Users sometimes observe data set size differences during grid migration, and a few important reasons for these differences are demonstrated. We also demonstrate how to create new custom scripts as per business needs and how to incorporate them with the SAS Grid Manager engine.
Piyush Singh, Tata Consultancy Services
Tanuj Gupta, Tata Consultancy Services
Prasoon Sangwan, Tata Consultancy Services
This paper presents the use of latent class analysis (LCA) to base the identification of a set of mutually exclusive latent classes of individuals on responses to a set of categorical, observed variables. The LCA procedure, a user-defined SAS® procedure for conducting LCA and LCA with covariates, is demonstrated using follow-up data on substance use from Monitoring the Future panel data, a nationally representative sample of high school seniors who are followed at selected time points during adulthood. The demonstration includes guidance on data management prior to analysis, PROC LCA syntax requirements and options, and interpretation of output.
Patricia Berglund, University of Michigan
Is uniqueness essential for your reports? SAS® Visual Analytics provides the ability to customize your reports to make them unique by using the SAS® Theme Designer. The SAS Theme Designer can be accessed from the SAS® Visual Analytics Hub to create custom themes to meet your branding needs and to ensure a unified look across your company. The report themes affect the colors, fonts, and other elements that are used in tables and graphs. The paper explores how to access SAS Theme Designer from the SAS Visual Analytics home page, how to create and modify report themes that are used in SAS Visual Analytics, how to create report themes from imported custom themes, and how to import and export custom report themes.
Meenu Jaiswal, SAS
Ipsita Samantarai, SAS Research & Development (India) Pvt Ltd
Business Intelligence users analyze business data in a variety of ways. By some estimates, seventy percent of business data contains location information. For in-depth analysis, it is essential to combine location information with mapping. New analytical capabilities have been added to SAS® Visual Analytics, leveraging the new partnership with Esri, a leader in location intelligence and mapping. The new capabilities enable users to enhance the analytical insights from SAS Visual Analytics. This paper demonstrates and discusses the new partnership with Esri and the new capabilities added to SAS Visual Analytics.
Murali Nori, SAS
Himesh Patel, SAS
For SAS® Enterprise Guide® users, sometimes macro variables and their values need to be brought over to the local workspace from the server, especially when multiple data sets or outputs need to be written to separate files in a local drive. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over. Instead, this task can be achieved in an efficient manner by using dictionary tables and the CALL SYMPUT routine, as illustrated in more detail below. The same approach can also be used to bring macro variables and their values from the local to the server workspace.
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
As part of promoting a data-driven culture and data analytics modernization at its federal sector clientele, Northrop Grumman developed a framework for designing and implementing an in-house Data Analytics and Research Center (DAARC) using a set of SAS® tools. This DAARC provides a complete set of SAS® Enterprise BI (Business Intelligence) and SAS® Data Management tools. The platform can be used for data research, evaluations, analysis, and reviews by federal agencies such as the Social Security Administration (SSA), the Centers for Medicare and Medicaid Services (CMS), and others. The DAARC architecture is based on a SAS data analytics platform with newer capabilities of data mining, forecasting, visual analytics, and data integration using SAS® Business Intelligence. These capabilities enable developers, researchers, and analysts to explore big data sets with varied data sources, create predictive models, and perform advanced analytics including forecasting, anomaly detection, use of dashboards, and creating online reports. The DAARC framework that Northrop Grumman developed enables agencies to implement a self-sufficient 'analytics as a service' approach to meet their business goals by making informed and proactive data-driven decisions. This paper provides a detailed approach to how the DAARC framework was established in strong partnership with federal customers of Northrop Grumman. It also discusses the best practices that were adopted for implementing specific business use cases in order to save taxpayer dollars through the many research-related analytical and statistical initiatives that continue to use this platform.
Vivek Sethunatesan, Northrop Grumman
Business problems have become more stratified and micro-segmentation is driving the need for mass-scale, automated machine learning solutions. Additionally, deployment environments include diverse ecosystems, requiring hundreds of models to be built and deployed quickly via web services to operational systems. The new SAS® automated modeling tool allows you to build and test hundreds of models across all of the segments in your data, testing a wide variety of machine learning techniques. The tool is completely customizable, allowing you transparent access to all modeling results. This paper shows you how to identify hundreds of champion models using SAS® Factory Miner, while generating scoring web services using SAS® Decision Manager. Immediate benefits include efficient model deployments, which allow you to spend more time generating insights that might reveal new opportunities, expose hidden risks, and fuel smarter, well-timed decisions.
Jonathan Wexler, SAS
Steve Sparano, SAS
Every day, businesses have to remain vigilant of fraudulent activity, which threatens customers, partners, employees, and financials. Deviant activity is normally perpetrated by networks of people or groups. Finding these connections is now made easier for analysts with SAS® Visual Investigator, an upcoming SAS® solution that ultimately minimizes the loss of money and preserves mutual trust among its stakeholders. SAS Visual Investigator takes advantage of the capabilities of the new SAS® In-Memory Server. Historically, investigating suspicious cases across business lines has been difficult, and the time required to collect and process data and to identify emerging fraud and compliance issues has been costly. Making proactive analysis accessible to analysts is now more important than ever. SAS Visual Investigator was designed with this goal in mind, and a key component is the visual social network view. This paper discusses how the network analysis view of SAS Visual Investigator, with all its dynamic visual capabilities, can make the investigative process more informative and efficient.
Danielle Davis, SAS
Stephen Boyd, SAS Institute
Ray Ong, SAS Institute
When analyzing data with SAS®, we often encounter missing or null values in data. Missing values can arise from the availability, collectibility, or other issues with the data. They represent the imperfect nature of real data. Under most circumstances, we need to clean, filter, separate, impute, or investigate the missing values in data. These processes can take up a lot of time, and they are annoying. For these reasons, missing values are usually unwelcome and need to be avoided in data analysis. There are two sides to every coin, however. If we can think outside the box, we can take advantage of the negative features of missing values for positive uses. Sometimes, we can create and use missing values to achieve our particular goals in data manipulation and analysis. These approaches can make data analyses convenient and improve work efficiency for SAS programming. This kind of creative and critical thinking is the most valuable quality for data analysts. This paper exploits real-world examples to demonstrate the creative uses of missing values in data analysis and SAS programming, and discusses the advantages and disadvantages of these methods and approaches. The illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
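As a small illustration of deliberately creating missing values, the Python sketch below (hypothetical data and cutoffs, not from the paper) recodes an out-of-range "unknown" code to a missing value so that it drops out of a summary statistic, mirroring how SAS summary functions skip missing values.

```python
def mean_ignoring_missing(values):
    """Mean over the non-missing (None) entries, mimicking how SAS
    summary functions such as MEAN skip missing values."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)

ages = [34, 41, 999, 28]              # 999 is a hypothetical "unknown" code
clean = [a if 0 <= a <= 120 else None for a in ages]  # create missing on purpose
avg = mean_ignoring_missing(clean)    # 999 no longer distorts the mean
```

Here the missing value is an asset rather than a nuisance: creating it is what keeps the sentinel code from contaminating the analysis.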
You've heard all the talk about SAS® Visual Analytics--but maybe you are still confused about how the product would work in your SAS® environment. Many customers have the same points of confusion about what they need to do with their data, how to get data into the product, how SAS Visual Analytics would benefit them, and even should they be considering Hadoop or the cloud. In this paper, we cover the questions we are asked most often about implementation, administration, and usage of SAS Visual Analytics.
Tricia Aanderud, Zencos Consulting LLC
Ryan Kumpfmiller, Zencos Consulting
Nick Welke, Zencos Consulting
In cooperation with the Joint Research Centre - European Commission (JRC), we have developed a number of innovative techniques to detect outliers on a large scale. In this session, we show the power of SAS/IML® Studio as an interactive tool for exploring and detecting outliers using customized algorithms that were built from scratch. The JRC uses this for detecting abnormal trade transactions on a large scale. The outliers are detected using the Forward Search, which starts from a central subset in the data and subsequently adds observations that are close to the current subset based on regression (R-student) or multivariate (Mahalanobis distance) output statistics. The implementation of this algorithm and its applications were done in SAS/IML Studio and converted to a macro for use in the IML procedure in Base SAS®.
Jos Polfliet, SAS
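The Forward Search logic can be sketched in a few lines. The JRC implementation in SAS/IML Studio uses regression and Mahalanobis distances; the simplified one-dimensional Python version below (with toy data) starts from a central subset and repeatedly adds the observation closest to the current subset mean, so an outlier announces itself by a jump in the recorded entry distances.

```python
import statistics

def forward_search(data, start_frac=0.5):
    """1-D Forward Search sketch: begin with the central subset
    (points closest to the median) and add, one at a time, the
    remaining point closest to the current subset mean.  The recorded
    entry distances jump when outliers finally join the subset."""
    med = statistics.median(data)
    ranked = sorted(data, key=lambda x: abs(x - med))
    k = max(2, int(len(data) * start_frac))
    subset, rest = ranked[:k], ranked[k:]
    entry_dist = []
    while rest:
        m = statistics.mean(subset)
        nxt = min(rest, key=lambda x: abs(x - m))
        entry_dist.append(abs(nxt - m))
        subset.append(nxt)
        rest.remove(nxt)
    return entry_dist

dists = forward_search([10, 11, 9, 10, 12, 11, 10, 50])
```

In this toy run the first three entry distances stay below 2 while the final one exceeds 39; that jump is the signal the full algorithm detects with R-student or Mahalanobis distance statistics.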
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adapted the Structured Query Language (SQL) by means of PROC SQL back in SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
Barbara Ross, NA
Jessica Bennett, Snap Finance
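As a flavor of how directly SQL skills carry over, the sketch below runs a standard join in SQLite from Python (the tables and columns are invented for illustration); inside SAS, essentially the same SELECT statement runs unchanged between PROC SQL; and QUIT;.

```python
import sqlite3

# A standard SQL inner join; in SAS the same query runs almost
# verbatim inside PROC SQL (PROC SQL; CREATE TABLE both AS
# SELECT ...; QUIT;), which is what makes the transition natural.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE demo (id INTEGER, age INTEGER);
    CREATE TABLE visits (id INTEGER, n_visits INTEGER);
    INSERT INTO demo VALUES (1, 30), (2, 45);
    INSERT INTO visits VALUES (1, 3), (2, 7);
""")
rows = con.execute("""
    SELECT d.id, d.age, v.n_visits
    FROM demo AS d
    INNER JOIN visits AS v ON d.id = v.id
    ORDER BY d.id
""").fetchall()
```

The differences the paper catalogs show up around the edges: PROC SQL adds SAS-only conveniences (such as the CALCULATED keyword and data set options) while lacking some vendor SQL features.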
SAS® software provides many DATA step functions that search for and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, and TRANWRD. Using these functions to perform pattern matching often requires you to use many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
Arthur Li, City of Hope
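To illustrate the point about reducing function calls, the sketch below uses Python's re module as a stand-in for the PRX functions (the pattern and input string are invented): one compiled pattern extracts all three parts of a phone number, a task that would otherwise take a chain of INDEX and SUBSTR calls.

```python
import re

# One compiled regular expression does what several INDEX/SUBSTR
# calls would: locate and capture the pieces of a US-style phone
# number.  PRXMATCH/PRXPOSN play the same role in a SAS DATA step.
pattern = re.compile(r"\((\d{3})\)\s*(\d{3})-(\d{4})")

def extract_phone(text):
    m = pattern.search(text)
    return "-".join(m.groups()) if m else None

phone = extract_phone("Call (919) 677-8000 for SAS.")
```

Compiling the pattern once and reusing it mirrors the recommended PRXPARSE-once idiom in the DATA step, which is also where the maintainability gain comes from.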
In a data warehousing system, change data capture (CDC) plays an important part, not just in making the data warehouse (DWH) aware of a change but also in providing a means of flowing the change to the DWH marts and reporting tables, so that we see the current and latest version of the truth. Together with slowly changing dimensions (SCD), this creates a cycle that runs the DWH and provides valuable insights into history and a basis for future decision-making. But what if the source has no CDC? It would be an ETL nightmare to identify the exact change and report the absolute truth. If these two processes can be combined into a single process, where one transform does both jobs of identifying the change and applying it to the DWH, then we can save significant processing time and valuable system resources. Hence, I came up with a hybrid SCD-with-CDC approach. My paper focuses on sources that DO NOT have CDC and on performing SCD Type 2 on such records without worrying about data duplication and increased processing times.
Vishant Bhat, University of Newcastle
Tony Blanch, SAS Consultant
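The combined detect-and-apply transform can be sketched as a single pass over a full source extract. The Python sketch below uses invented dictionary-shaped rows and column names; it compares each incoming record to the current dimension row for its key, end-dates that row when an attribute has changed, and inserts the new version, all without any CDC on the source.

```python
def scd2_merge(dimension, source, today):
    """One pass that both detects change (no CDC on the source) and
    applies SCD Type 2: changed keys get their current row end-dated
    and a new current row inserted; unchanged keys are left alone.
    Rows are dicts; valid_to == '9999-12-31' marks the current row."""
    current = {r["key"]: r for r in dimension if r["valid_to"] == "9999-12-31"}
    for rec in source:
        cur = current.get(rec["key"])
        if cur is None:                      # brand-new key
            dimension.append({**rec, "valid_from": today,
                              "valid_to": "9999-12-31"})
        elif cur["attr"] != rec["attr"]:     # detected change: close + insert
            cur["valid_to"] = today
            dimension.append({**rec, "valid_from": today,
                              "valid_to": "9999-12-31"})
    return dimension

dim = [{"key": 1, "attr": "gold",
        "valid_from": "2015-01-01", "valid_to": "9999-12-31"}]
src = [{"key": 1, "attr": "platinum"}, {"key": 2, "attr": "silver"}]
dim = scd2_merge(dim, src, "2016-04-18")
```

Because the comparison and the SCD Type 2 apply happen in the same pass, no intermediate delta table is materialized, which is the processing-time saving the hybrid approach aims for.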
Horizontal data sorting is a very useful technique in advanced data analysis with SAS® programming. Two years ago (SAS® Global Forum Paper 376-2013), we presented and illustrated various methods and approaches to perform horizontal data sorting, and we demonstrated its valuable application in strategic data reporting. However, this technique can also be used as a creative analytic method in advanced business analytics. This paper presents and discusses its innovative and insightful applications in product purchase sequence analyses such as product opening sequence analysis, product affinity analysis, next-best-offer analysis, time-span analysis, and so on. Compared to other analytic approaches, the horizontal data sorting technique has the distinct advantages of being straightforward, simple, and convenient to use. It also produces easy-to-interpret analytic results. Therefore, the technique can have a wide variety of applications in customer data analysis and business analytics fields.
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
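The core operation, sorting values across the columns of one observation rather than down a column, can be shown with a minimal Python sketch (invented data; one common SAS approach uses arrays with the CALL SORTC or CALL SORTN routines in a DATA step).

```python
def sort_horizontally(row):
    """Sort the non-missing values of one observation across its
    columns, e.g. to order the products a customer opened by date;
    missing values are pushed to the end."""
    present = sorted(v for v in row if v is not None)
    return present + [None] * (len(row) - len(present))

# Hypothetical open dates for products A-D, one row per customer;
# None means the customer never opened that product
row = ["2015-06-01", None, "2014-02-15", "2015-01-10"]
ordered = sort_horizontally(row)
# ordered -> ["2014-02-15", "2015-01-10", "2015-06-01", None]
```

Once each row is ordered this way, the first column directly gives the first product opened, which is what makes opening-sequence and next-best-offer analyses straightforward.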
One of the most important factors driving the success of requirements-gathering can be easily overlooked. Your user community needs to have a clear understanding of what is possible: from different ways to represent a hierarchy, to how visualizations can drive an analysis, to newer, but less common, visualizations that are quickly becoming standard. Discussions about desktop access versus mobile deployment and/or which users might need more advanced statistical reporting can lead to a serious case of option overload. One of the best cures for option overload is to provide your user community with access to template reports they can explore themselves. In this paper, we describe how you can take a single rich data set and build a set of template reports that demonstrate the full functionality of SAS® Visual Analytics: a suite of the most common, most useful report structures, from high-level dashboards to statistically deep dynamic visualizations. We show exactly how to build a dozen template reports from a single data source, simultaneously representing options for color schemes, themes, and other choices to consider. Although this template suite approach can apply to any industry, our example data set is publicly available data from the Home Mortgage Disclosure Act, de-identified data on mortgage loan determinations. Instead of beginning requirements-gathering with a blank slate, your users can begin the conversation with, "I would like something like Template #4," greatly reducing the time and effort required to meet their needs.
Elliot Inman, SAS
Michael Drutar, SAS
Our company, Veikkaus, is a state-owned gambling and lottery company in Finland that has a national legalized monopoly for gambling. All the profit we make goes back to Finnish society (for art, sports, science, and culture), distributed by our government. In addition to the government's profit requirements, the state also requires us to address the adverse social aspects of gaming, such as problem gambling. The challenge in our business is to balance these two factors. To address problem gambling, we have used SAS® tools to create a responsible gaming tool, called VasA, based on a logistic regression model. The name VasA is derived from the Finnish words for 'Responsible Customership.' The model identifies problem gamblers from our customer database using data from identified gaming, money transfers, web behavior, and customer records. The variables used in the model are based on the theory behind problem gambling. Our actions for problem gambling include, for example, different CRM activities and personalization of the customer's website in our web service. Several companies offered responsible gambling tools for us to buy, but we wanted to create our own for two reasons. First, we wanted it to cover our whole customer database, meaning all our customers, not just those who chose to take part; the tools on the market normally include only customers who opt in. Second, we saved a considerable amount of money by building the tool ourselves rather than buying one. Throughout this process, SAS played a big role: from gathering the data to constructing the tool, from modeling to creating the VasA variables, on to the database, and finally to the analyses and reporting.
Tero Kallioniemi, Veikkaus
This paper explores some proven methods used to automate complex SAS® Enterprise Guide® projects so that the average Joe can run them with little or no prior experience. There are often times when a programmer is asked to extract data and dump it into Microsoft Excel for a user. Often these data extracts are very similar and can be run with previously saved code. However, the user quite often has to wait until the programmer has time to simply run the code. By automating the code, the programmer regains control over their data requests. This paper discusses the benefits of establishing macro variables and creating stored processes, among other tips.
Jennifer Davies, Department of Education
Mobile devices are an integral part of a business professional's life. These mobile devices are getting increasingly powerful in terms of processor speeds and memory capabilities. Business users can benefit from a more analytical visualization of the data along with their business context. The new SAS® Mobile BI contains many enhancements that facilitate the use of SAS® Analytics in the newest version of SAS® Visual Analytics. This paper demonstrates how to use the new analytical visualization that has been added to SAS Mobile BI from SAS Visual Analytics, for a richer and more insightful experience for business professionals on the go.
Murali Nori, SAS
Disease prevalence is one of the most basic measures of the burden of disease in the field of epidemiology. As an estimate of the total number of cases of disease in a given population, prevalence is a standard in public health analysis. The prevalence of diseases in a given area is also frequently at the core of governmental policy decisions, charitable organization funding initiatives, and countless other aspects of everyday life. However, all too often, prevalence estimates are restricted to descriptive estimates of population characteristics when they could have a much wider application through the use of inferential statistics. As an estimate based on a sample from a population, disease prevalence can vary based on random fluctuations in that sample rather than true differences in the population characteristic. Statistical inference uses a known distribution of this sampling variation to perform hypothesis tests, calculate confidence intervals, and perform other advanced statistical methods. However, there is no agreed-upon sampling distribution of the prevalence estimate. In cases where the sampling distribution of an estimate is unknown, statisticians frequently rely on the bootstrap re-sampling procedure first given by Efron in 1979. This procedure relies on the computational power of software to generate repeated pseudo-samples similar in structure to an original, real data set. These multiple samples allow for the construction of confidence intervals and statistical tests to make statistical determinations and comparisons using the estimated prevalence. In this paper, we use the bootstrapping capabilities of SAS® 9.4 to compare statistically the difference between two given prevalence rates. We create a bootstrap analog to the two-sample t test to compare prevalence rates from two states, despite the fact that the sampling distribution of these estimates is unknown.
Matthew Dutton, Florida A&M University
Charlotte Baker, Florida A&M University
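The bootstrap analog of the two-sample comparison can be sketched compactly. The Python version below (with fabricated 0/1 disease indicators, not the paper's data) resamples each state's sample with replacement, builds the distribution of prevalence differences, and reads off a 95% percentile interval; if the interval excludes zero, the prevalences differ significantly.

```python
import random

def bootstrap_prevalence_diff(sample_a, sample_b, reps=2000, seed=2016):
    """Bootstrap comparison of two prevalence estimates: resample each
    sample of 0/1 disease indicators with replacement, collect the
    resampled prevalence differences, and return a 95% percentile
    confidence interval for the true difference."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        pa = sum(rng.choices(sample_a, k=len(sample_a))) / len(sample_a)
        pb = sum(rng.choices(sample_b, k=len(sample_b))) / len(sample_b)
        diffs.append(pa - pb)
    diffs.sort()
    return diffs[int(0.025 * reps)], diffs[int(0.975 * reps)]

# Fabricated samples: state A prevalence 30%, state B prevalence 10%
state_a = [1] * 30 + [0] * 70
state_b = [1] * 10 + [0] * 90
lo, hi = bootstrap_prevalence_diff(state_a, state_b)
```

Because the interval here lies entirely above zero, this toy comparison would conclude the two prevalence rates differ, without ever assuming a sampling distribution for the prevalence estimate.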
Big data, small data--but what about when you have no data? Survey data commonly include missing values due to nonresponse. Adjusting for nonresponse in the analysis stage might lead different analysts to use different, and inconsistent, adjustment methods. To provide the same complete data to all the analysts, you can impute the missing values by replacing them with reasonable nonmissing values. Hot-deck imputation, the most commonly used survey imputation method, replaces the missing values of a nonrespondent unit by the observed values of a respondent unit. In addition to performing traditional cell-based hot-deck imputation, the SURVEYIMPUTE procedure, new in SAS/STAT® 14.1, also performs more modern fully efficient fractional imputation (FEFI). FEFI is a variation of hot-deck imputation in which all potential donors in a cell contribute their values. Filling in missing values is only a part of PROC SURVEYIMPUTE. The real trick is to perform analyses of the filled-in data that appropriately account for the imputation. PROC SURVEYIMPUTE also creates a set of replicate weights that are adjusted for FEFI. Thus, if you use the imputed data from PROC SURVEYIMPUTE along with the replicate methods in any of the survey analysis procedures--SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, or SURVEYPHREG--you can be confident that inferences account not only for the survey design, but also for the imputation. This paper discusses different approaches for handling nonresponse in surveys, introduces PROC SURVEYIMPUTE, and demonstrates its use with real-world applications. It also discusses connections with other ways of handling missing values in SAS/STAT.
Pushpal Mukhopadhyay, SAS
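Traditional cell-based hot-deck imputation, the starting point for what PROC SURVEYIMPUTE automates, can be sketched as follows (Python, with invented records and an imputation cell defined by region):

```python
import random

def hot_deck_impute(records, cell_key, var, seed=14):
    """Cell-based hot-deck: within each imputation cell, replace a
    nonrespondent's missing value with the observed value of a
    randomly chosen respondent (donor) from the same cell."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[var] is not None:
            donors.setdefault(r[cell_key], []).append(r[var])
    for r in records:
        if r[var] is None:
            r[var] = rng.choice(donors[r[cell_key]])
    return records

data = [
    {"region": "N", "income": 50}, {"region": "N", "income": 60},
    {"region": "N", "income": None}, {"region": "S", "income": 30},
    {"region": "S", "income": None},
]
data = hot_deck_impute(data, "region", "income")
```

FEFI differs in that every potential donor in the cell contributes a fraction of its value, and the real work afterward is analyzing the filled-in data with the FEFI-adjusted replicate weights so that inferences reflect the imputation as well as the survey design.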
Universities, government agencies, and non-profits all require various levels of data analysis, data manipulation, and data management. This requires a workforce that knows how to use statistical packages. The server-based SAS® OnDemand for Academics: Studio offerings are excellent tools for teaching SAS in both postsecondary education and in professional continuing education settings. SAS OnDemand for Academics: Studio can be used to share resources; teach users using Windows, UNIX, and Mac OS computers; and teach in various computing environments. This paper discusses why one might use SAS OnDemand for Academics: Studio instead of SAS® University Edition, SAS® Enterprise Guide®, or Base SAS® for education and provides examples of how SAS OnDemand for Academics: Studio has been used for in-person and virtual training.
Charlotte Baker, Florida A&M University
C. Perry Brown, Florida Agricultural and Mechanical University
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Nancy Rausch, SAS
Wilbram Hazejager, SAS
In this session, we examine the use of text analytics as a tool for strategic analysis of an organization, a leader, a brand, or a process. The software solutions SAS® Enterprise Miner™ and Base SAS® are used to extract topics from text and create visualizations that identify the relative placement of the topics next to various business entities. We review a number of case studies that identify and visualize brand-topic relationships in the context of branding, leadership, and service quality.
Nicholas Evangelopoulos, University of North Texas
The recent controversy regarding former Secretary Hillary Clinton's use of a non-government, privately maintained email server provides a great opportunity to analyze real-world data using a variety of analytic techniques. This email corpus is interesting because of the challenges in acquiring and preparing the data for analysis as well as the variety of analyses that can be performed, including techniques for searching, entity extraction and resolution, natural language processing for topic generation, and social network analysis. Given the potential for politically charged discussion, rest assured there will be no discussion of politics--just fact-based analysis.
Michael Ames, SAS
All public schools in the United States require health and safety education for their students. Furthermore, almost all states require driver education before minors can obtain a driver's license. Through extensive analysis of the Fatality Analysis Reporting System data, we have concluded that from 2011 to 2013 an average of 12.1% of all individuals killed in a motor vehicle accident in the United States, District of Columbia, and Puerto Rico were minors (18 years or younger). Our goal is to offer insight through our analysis in order to improve road safety education and prevent future premature deaths involving motor vehicles.
Molly Funk, Bryant University
Max Karsok, Bryant University
Michelle Williams, Bryant University
VBA has been described as a glue language and has been widely used to exchange data between Microsoft products such as Excel, Word, and PowerPoint. How to trigger a VBA macro from SAS® via DDE has been widely discussed in recent years. However, using SAS to send parameters to a VBA macro has seldom been reported. This paper provides a solution to this problem. Copying Excel tables into PowerPoint using the combination of SAS and VBA is illustrated as an example. The SAS program rapidly scans all Excel files contained in one folder, passes the file information to VBA as parameters, and triggers the VBA macro to write PowerPoint files in a loop. As a result, a batch of PowerPoint files can be generated with just one mouse click.
Zhu Yanrong, Medtronic
For marketers who are responsible for identifying the best customer to target in a campaign, it is often daunting to determine which media channel, offer, or campaign program is the one the customer is more apt to respond to, and therefore, is more likely to increase revenue. This presentation examines the components of designing campaigns to identify promotable segments of customers and to target the optimal customers using SAS® Marketing Automation integrated with SAS® Marketing Optimization.
Pamela Dixon, SAS
This paper is a primer on the practice of designing, selecting, and making inferences on a statistical sample, where the goal is to estimate the magnitude of error in a book value total. Although the concepts and syntax are presented through the lens of an audit of health-care insurance claim payments, they generalize to other contexts. After presenting the fundamental measures of uncertainty that are associated with sample-based estimates, we outline a few methods to estimate the sample size necessary to achieve a targeted precision threshold. The benefits of stratification are also explained. Finally, we compare several viable estimators to quantify the book value discrepancy, making note of the scenarios where one might be preferred over the others.
Taylor Lewis, U.S. Office of Personnel Management
Julie Johnson, OPM - Office of the Inspector General
Christine Muha, U.S. Office of Personnel Management
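One textbook sample-size calculation of the kind the paper outlines: given a pilot estimate of the standard deviation of per-claim error amounts and a targeted precision (the confidence-interval half-width), solve n = (z * sd / d)^2. The figures below are invented for illustration.

```python
import math

def sample_size(sd, precision, confidence_z=1.96):
    """Sample size needed so that a mean-per-unit estimate of the
    error amount achieves the targeted precision (half-width of the
    confidence interval): n = (z * sd / precision)^2, rounded up."""
    return math.ceil((confidence_z * sd / precision) ** 2)

# Hypothetical pilot estimate: claim-payment errors have sd = $25;
# we want the mean error estimated to within +/- $5 at 95% confidence
n = sample_size(sd=25, precision=5)
# n -> 97 sampled claims
```

Stratification helps precisely because it shrinks the within-stratum standard deviation in this formula, so the same precision target is met with fewer sampled claims.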
Specifying colors based on group values is a popular practice in data visualization, but it is not so easy to do, especially when there are multiple group values. This paper explores three methods to dynamically assign colors to plots based on their group values: combining the EVAL and IFN functions in the plot statements; bringing the DISCRETEATTRMAP block into the plot statements; and using the macro from SAS® sample 40255.
Amos Shu, MedImmune
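The idea shared by all three methods, binding each group value to a fixed attribute so colors do not shuffle when the data changes, can be sketched outside SAS as a simple lookup (Python, with invented group names and colors):

```python
# A DISCRETEATTRMAP-style lookup: each group value gets a fixed color
# no matter which groups appear, or in what order, in a given data
# set -- the instability that makes ad hoc style lists fragile.
ATTR_MAP = {"Placebo": "gray", "Low dose": "blue", "High dose": "red"}

def colors_for(groups, default="black"):
    """Resolve each group value to its mapped color, falling back to
    a default for unmapped values."""
    return [ATTR_MAP.get(g, default) for g in groups]

plot_colors = colors_for(["High dose", "Placebo", "High dose"])
# plot_colors -> ["red", "gray", "red"]
```

With a plain style list, dropping one group from the data would shift every remaining group's color; a value-keyed map, like DISCRETEATTRMAP, keeps the assignment stable.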
Are you going to enable HTTPS for your SAS® environment? Looking to improve the security of your SAS deployment? Do you need more details about how to efficiently configure HTTPS? This paper guides you through the configuration of SAS® 9.4 with HTTPS for the SAS middle tier. We examine how best to implement site-signed Transport Layer Security (TLS) certificates and explore how far you can take the encryption. This paper presents tips and proven practices that can help you be successful.
Stuart Rogers, SAS
At the University of Central Florida (UCF), we recently invested in SAS® Visual Analytics, along with the updated SAS® Business Intelligence platform (from 9.2 to 9.4), a project that took over a year to be completed. This project was undertaken to give our users the best and most updated tools available. This paper introduces the SAS Visual Analytics environment at UCF and includes projects created using this product. It answers why we selected SAS Visual Analytics for development over other SAS® applications. It explains the technical environment for our non-distributed SAS Visual Analytics: RAM, servers, benchmarking, sizing, and scaling. It discusses why we chose the non-distributed mode versus distributed mode. Challenges in the design, implementation, usage, and performance are also presented, including the reasons why Hadoop was not adopted.
Scott Milbuta, University of Central Florida
Ulf Borjesson, University of Central Florida
Carlos Piemonti, University of Central Florida
Administrative health databases, including hospital and physician records, are frequently used to estimate the prevalence of chronic diseases. Disease-surveillance information is used by policy makers and researchers to compare the health of populations and develop projections about disease burden. However, not all cases are captured by administrative health databases, which can result in biased estimates. Capture-recapture (CR) models, originally developed to estimate the sizes of animal populations, have been adapted for use by epidemiologists to estimate the total sizes of disease populations for such conditions as cancer, diabetes, and arthritis. Estimates of the number of cases are produced by assessing the degree of overlap among incomplete lists of disease cases captured in different sources. Two- and three-source CR models are most commonly used, often with covariates. Two important assumptions--independence of capture in each data source and homogeneity of capture probabilities, which underlie conventional CR models--are unlikely to hold in epidemiological studies. Failure to satisfy these assumptions bias the model results. Log-linear, multinomial logistic regression, and conditional logistic regression models, if used properly, can incorporate dependency among sources and covariates to model the effect of heterogeneity in capture probabilities. However, none of these models is optimal, and researchers might be unfamiliar with how to use them in practice. This paper demonstrates how to use SAS® to implement the log-linear, multinomial logistic regression, and conditional logistic regression CR models. Methods to address the assumptions of independence between sources and homogeneity of capture probabilities for a three-source CR model are provided. The paper uses a real numeric data set about Parkinson's disease involving physician claims, hospital abstract, and prescription drug records from one Canadian province. 
Advantages and disadvantages of each model are discussed.
Lisa Lix, University of Manitoba
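As a rough illustration of the log-linear approach described above, the following sketch fits a three-source CR model in PROC GENMOD. The data set, its capture-profile counts, and the s1*s2 dependency term are all hypothetical; they stand in for one possible pairwise source dependency, not the paper's actual model.

```
/* Hypothetical three-source CR data: one row per observed capture profile */
data cr;
   input s1 s2 s3 count;
   datalines;
1 1 1  12
1 1 0  25
1 0 1  18
1 0 0 110
0 1 1  20
0 1 0  95
0 0 1 130
;
run;

/* Log-linear (Poisson) model with one pairwise dependency term */
proc genmod data=cr;
   model count = s1 s2 s3 s1*s2 / dist=poisson link=log;
run;
```

The unobserved (0,0,0) cell, and hence the total population size, is then estimated from the fitted parameters.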
The Statistical Graphics (SG) procedures and the Graph Template Language (GTL) are capable of generating powerful individual data displays. What if one wanted to visualize how a distribution changes with different parameters, or to view multiple aspects of a three-dimensional plot, just as two examples? By using macros to generate a graph for each frame, combined with the ODS PRINTER destination, it is possible to produce GIF files that serve as effective animated data displays. This paper outlines the syntax and strategy necessary to generate these displays and provides a handful of examples. Intermediate knowledge of PROC SGPLOT, PROC TEMPLATE, and the SAS® macro language is assumed.
Jesse Pratt, Cincinnati Children's Hospital Medical Center
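A minimal sketch of the macro-plus-ODS-PRINTER technique described above, assuming SAS 9.4 animation options; the frame loop and density example are illustrative, not the paper's own code:

```
/* Route frames to an animated GIF via the PRINTER destination */
options printerpath=gif animation=start animduration=0.5
        animloop=yes nodate nonumber;
ods printer file='normal_anim.gif';

%macro frames;
  %do s=1 %to 5;
    data dens;
      do x=-10 to 10 by 0.1;
        y = pdf('normal', x, 0, &s);
        output;
      end;
    run;
    proc sgplot data=dens;          /* one graph = one GIF frame */
      series x=x y=y;
      yaxis max=0.4;
      title "Normal density, sigma = &s";
    run;
  %end;
%mend frames;
%frames

options animation=stop;
ods printer close;
```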
Many SAS® procedures can be used to analyze large amounts of correlated data. This study was a secondary analysis of data obtained from the South Carolina Revenue and Fiscal Affairs Office (RFA). The data includes medical claims from all health care systems in South Carolina (SC). This study used the SAS procedure GENMOD to analyze a large amount of correlated data about Military Health Care (MHS) system beneficiaries who received behavioral health care in South Carolina Health Care Systems from 2005 to 2014. Behavioral health (BH) was defined by Major Diagnostic Code (MDC) 19 (mental disorders and diseases) and 20 (alcohol/drug use). MDCs are formed by dividing all possible principal diagnoses from the International Classification of Diseases (ICD-9) codes into 25 mutually exclusive diagnostic categories. The sample included a total of 6,783 BH visits and 4,827 unique adult and child patients, including military service members, veterans, and their adult and child dependents who have MHS insurance coverage. PROC GENMOD included a multivariate GEE model with type of BH visit (mental health or substance abuse) as the dependent variable, and with gender, race group, age group, and discharge year as predictors. Hospital ID was used in the REPEATED statement with different correlation structures. Gender was significant with both independent correlation (p = .0001) and exchangeable structure (p = .0003). However, age group was significant using the independent correlation (p = .0160), but non-significant using the exchangeable correlation structure (p = .0584). SAS is a powerful statistical program for analyzing large amounts of correlated data with categorical outcomes.
Abbas Tavakoli, University of South Carolina
Jordan Brittingham, USC/ Arnold School of Public Health
Nikki R. Wooten, USC/College of Social Work
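The GEE setup described above can be sketched in PROC GENMOD roughly as follows; the data set and variable names are hypothetical stand-ins for the study's actual variables:

```
/* GEE model: type of BH visit by demographics, clustered by hospital */
proc genmod data=bh_visits descending;
   class hospital_id gender race_grp age_grp dschg_year;
   model visit_type = gender race_grp age_grp dschg_year
         / dist=binomial link=logit;
   /* exchangeable working correlation; type=ind gives independence */
   repeated subject=hospital_id / type=exch;
run;
```

Swapping TYPE=EXCH for TYPE=IND is how the paper compares results under the two working correlation structures.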
Sensitive data requires elevated security and the flexibility to apply logic that subsets data based on user privileges. Following the instructions in SAS® Visual Analytics: Administration Guide gives you the ability to apply row-level permission conditions. After you have set the permissions, you have to prove through audits who has access and how row-level security is applied. This paper provides you with the ability to easily apply, validate, report, and audit all tables that have row-level permissions, along with the groups, users, and conditions that will be applied. Take the hours of maintenance and lack of visibility out of row-level secure data and build confidence in the data and analytics that are provided to the enterprise.
Brandon Kirk, SAS
For SAS® users, PROC TABULATE and PROC REPORT (and its compute blocks) are probably among the most common procedures for calculating and displaying data. It is, however, pretty difficult to calculate and display changes from one column to another using data from other rows with just these two procedures. Compute blocks in PROC REPORT can calculate additional columns, but it would be challenging to pick up values from other rows as inputs. This presentation shows how PROC TABULATE can work with the lag(n) function to calculate rates of change from one period of time to another. This offers the flexibility of feeding into calculations the data retrieved from other rows of the report. PROC REPORT is then used to produce the desired output. The same approach can also be used in a variety of scenarios to produce customized reports.
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
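A minimal sketch of the LAG-based approach described above, using a hypothetical enrollment data set with one row per term; the variable names are illustrative only:

```
/* Compute period-over-period change with LAG in a DATA step */
data enroll_chg;
   set enroll;                      /* hypothetical: term, headcount */
   prev = lag(headcount);           /* value from the previous row   */
   if _n_ > 1 then
      pct_chg = (headcount - prev) / prev * 100;
run;

/* PROC REPORT then displays the derived rate of change */
proc report data=enroll_chg nowd;
   columns term headcount pct_chg;
   define term      / display  'Term';
   define headcount / analysis 'Headcount';
   define pct_chg   / analysis format=8.1 '% Change';
run;
```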
The Add Health Parent Study is using a new and innovative method to augment our other interview verification strategies. Typical verification strategies include calling respondents to ask questions about their interview, recording pieces of interaction (CARI - Computer Aided Recorded Interview), and analyzing timing data to see that each interview was within a reasonable length. Geocoding adds another tool to the toolbox for verifications. By applying street-level geocoding to the address where an interview is reported to be conducted and comparing that to a captured latitude/longitude reading from a GPS tracking device, we are able to compute the distance between two points. If that distance is very small and time stamps are close to each other, then the evidence points to the field interviewer being present at the respondent's address during the interview. For our project, the street-level geocoding to an address is done using SAS® PROC GEOCODE. Our paper describes how to obtain a US address database from the SAS website and how it can be used in PROC GEOCODE. We also briefly compare this technique to using the Google Map API and Python as an alternative.
Chris Carson, RTI International
Lilia Filippenko, RTI International
Mai Nguyen, RTI International
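The geocode-and-compare verification step can be sketched as follows. The data set names, the LOOKUPSTREET= lookup reference, and the 200-meter threshold are all hypothetical; the US street lookup data itself is obtained from the SAS website as the paper describes.

```
/* Street-level geocoding of the reported interview addresses */
proc geocode method=street data=interviews out=geo_out
             lookupstreet=lookup.usm;   /* US address lookup data */
run;

/* Distance between geocoded address (Y, X) and GPS capture point */
data verify;
   set geo_out;
   dist_km = geodist(gps_lat, gps_long, y, x, 'K');  /* kilometers */
   flag    = (dist_km > 0.2);   /* hypothetical 200 m tolerance */
run;
```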
Marriage laws have changed in 38 states and the District of Columbia, permitting same-sex couples to marry since 2004. However, since 1996, 13 states have banned same-sex marriages, some of which are under legal challenge. A U.S. animated map, made using the SAS® GMAP procedure in SAS® 9.4, shows the changes in the acceptance, rejection, or undecided legal status of this issue, year-by-year. In this study, we also present other SAS data visualization techniques to show how concentrations (percentages) of married same-sex couples and their relationships have evolved over time, thereby instigating changes in marriage laws. Using a SAS GRAPH display, we attempt to show how both the total number of same-sex-couple households and the percent that are married have changed on a state-by-state basis since 2005. These changes are also compared to the year in which marriage laws permitting same-sex marriages were enacted, and are followed over time since that year. It is possible to examine trends on an annual basis from 2005 to 2013 using the American Community Survey one-year estimates. The SAS procedures and SAS code used to create the data visualization and data manipulation techniques are provided.
Bula Ghose, HHS/ASPE
Susan Queen, National Center for Health Statistics
Joan Turek, Health and Human Services
Multivariate statistical analysis plays an increasingly important role as the number of variables being measured increases in educational research. In both cognitive and noncognitive assessments, many instruments that researchers aim to study contain a large number of variables, with each measured variable assigned to a specific factor of the bigger construct. According to educational theory or prior empirical research, the factor structure of each instrument usually emerges in the same way. Two types of factor analysis are widely used in order to understand the latent relationships among these variables based on different scenarios. (1) Exploratory factor analysis (EFA), which is performed by using the SAS® procedure PROC FACTOR, is an advanced statistical method used to probe deeply into the relationship among the variables and the larger construct and then develop a customized model for the specific assessment. (2) When a model is established, confirmatory factor analysis (CFA) is conducted by using the SAS procedure PROC CALIS to examine the model fit of specific data and then make adjustments for the model as needed. This paper presents the application of SAS to conduct these two types of factor analysis to fulfill various research purposes. Examples using real noncognitive assessment data are demonstrated, and the interpretation of the fit statistics is discussed.
Jun Xu, Educational Testing Service
Steven Holtzman, Educational Testing Service
Kevin Petway, Educational Testing Service
Lili Yao, Educational Testing Service
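The EFA-then-CFA workflow described above can be sketched roughly as follows; the data set, item names, and three-factor structure are hypothetical placeholders:

```
/* (1) EFA: extract and rotate factors with PROC FACTOR */
proc factor data=assess method=ml rotate=varimax nfactors=3 scree;
   var item1-item12;
run;

/* (2) CFA: fit the hypothesized three-factor model with PROC CALIS */
proc calis data=assess;
   factor
      F1 ===> item1-item4,
      F2 ===> item5-item8,
      F3 ===> item9-item12;
   pvar F1 = 1.0, F2 = 1.0, F3 = 1.0;  /* fix factor variances for scale */
run;
```

PROC CALIS reports fit statistics such as the chi-square, CFI, and RMSEA, which are the quantities the paper interprets when judging model fit.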
In health care and epidemiological research, there is a growing need for basic graphical output that is clear, easy to interpret, and easy to create. SAS® 9.3 has a very clear and customizable graphic called a Radar Graph, yet it can only display the unique responses of one variable and would not be useful for multiple binary variables. In this paper we describe a way to display multiple binary variables for a single population on a single radar graph. Then we convert our method into a macro with as few parameters as possible to make this procedure available for everyday users.
Kevin Sundquist, Columbia University Medical Center
Jacob E Julian, Columbia University Medical Center
Faith Parsons, Columbia University Medical Center
The latest releases of SAS® Data Integration Studio, SAS® Data Management Studio and SAS® Data Integration Server, SAS® Data Governance, and SAS/ACCESS® software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types including Hadoop, cloud, RDBMS, files, unstructured data, streaming, and others, and the ability to perform ETL and ELT transformations in diverse run-time environments including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features, as well as enhancements to master data and metadata management support. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS
Each night on the news we hear the level of the Dow Jones Industrial Average along with the 'first difference,' which is today's price-weighted average minus yesterday's. It is that series of first differences that excites or depresses us each night as it reflects whether stocks made or lost money that day. Furthermore, the differences form the data series that has the most addressable statistical features. In particular, the differences have the stationarity requirement, which justifies standard distributional results such as asymptotically normal distributions of parameter estimates. Differencing arises in many practical time series because they seem to have what are called 'unit roots,' which mathematically indicate the need to take differences. In 1976, Dickey and Fuller developed the first well-known tests to decide whether differencing is needed. These tests are part of the ARIMA procedure in SAS/ETS® in addition to many other time series analysis products. I'll review a little of what it was like to do the development and the required computing back then, say a little about why this is an important issue, and focus on examples.
David Dickey, NC State University
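The Dickey-Fuller tests mentioned above are available in PROC ARIMA through the STATIONARITY= option; a minimal sketch, with a hypothetical data set of daily closing levels:

```
/* Augmented Dickey-Fuller unit root tests on the level series */
proc arima data=djia;
   identify var=close stationarity=(adf=(0,1,2));
run;

/* If a unit root is indicated, model the first differences instead */
proc arima data=djia;
   identify var=close(1);      /* (1) requests first differencing */
   estimate p=1 method=ml;
run;
```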
For many organizations, the answer to whether to manage their data and analytics in a public or private cloud is going to be both. Both can be the answer for many different reasons: common sense logic not to replace a system that already works just to incorporate something new; legal or corporate regulations that require some data, but not all data, to remain in place; and even a desire to provide local employees with a traditional data center experience while providing remote or international employees with cloud-based analytics easily managed through software deployed via Amazon Web Services (AWS). In this paper, we discuss some of the unique technical challenges of managing a hybrid environment, including how to monitor system performance simultaneously for two different systems that might not share the same infrastructure or even provide comparable system monitoring tools; how to manage authorization when access and permissions might be driven by two different security technologies that make implementation of a singular protocol problematic; and how to ensure overall automation of two platforms that might be independently automated, but not originally designed to work together. In this paper, we share lessons learned from a decade of experience implementing hybrid cloud environments.
Ethan Merrill, SAS
Bryan Harkola, SAS
Even if you're not a GIS mapping pro, it pays to have some geographic problem-solving techniques in your back pocket. In this paper we illustrate a general approach to finding the closest location to any given US zip code, with a specific, user-accessible example of how to do it, using only Base SAS®. We also suggest a method for implementing the solution in a production environment, as well as demonstrate how parallel processing can be used to cut down on computing time if there are hardware constraints.
Andrew Clapson, MD Financial Management
Annmarie Smith, HomeServe USA
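One Base SAS®-only sketch of the closest-location idea described above, using the SASHELP.ZIPCODE centroids shipped with SAS and the GEODIST function; the facilities table, its variables, and the target ZIP are hypothetical:

```
/* Centroid of the target ZIP code (X = longitude, Y = latitude) */
data target;
   set sashelp.zipcode(where=(zip=27513));  /* hypothetical target */
   keep x y;
run;

/* Distance from every candidate location; smallest row is closest */
proc sql;
   create table nearest as
   select f.site_id,
          geodist(t.y, t.x, f.lat, f.long, 'M') as miles
   from facilities as f, target as t
   order by miles;
quit;
```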
This paper shows how to create maps from the output of the SAS/STAT® SPP procedure using the same map layer as the data. The clipping can be done with PROC GINSIDE, and the final plot can be drawn with PROC GMAP. The result is a map that is more polished than the plot produced by PROC SPP directly.
Alan Silva, University of Brasilia
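A rough sketch of the GINSIDE-then-GMAP step described above; the SPP output data set and the map ID variable are hypothetical, and MAPS.BRAZIL stands in for whatever boundary data set matches the study region:

```
/* Keep only the SPP grid points that fall inside the map boundary */
proc ginside data=spp_grid map=maps.brazil out=clipped insideonly;
   id id;
run;

/* Draw the clipped surface as a choropleth map */
proc gmap data=clipped map=maps.brazil;
   id id;
   choro intensity / levels=6;
run;
```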
Do you write reports that sometimes have missing categories across all class variables? Some programmers write all sorts of additional DATA step code in order to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way to accomplish this? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, we can use the PRELOADFMT and PRINTMISS options in conjunction with a user-defined format in PROC FORMAT to accomplish this task. With PROC SUMMARY, we can use the COMPLETETYPES option to get all the rows with zeros. This paper uses examples from Census Bureau tabulations to illustrate the use of these procedures and options to preserve missing rows or columns.
Chris Boniface, Census Bureau
Janet Wysocki, U.S. Census Bureau
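The two options highlighted above can be sketched as follows; the format, data set, and region values are hypothetical placeholders for the Census Bureau examples in the paper:

```
proc format;
   value $regfmt 'NE'='Northeast' 'MW'='Midwest'
                 'SO'='South'     'WE'='West';
run;

/* PROC TABULATE: PRELOADFMT + PRINTMISS show every format level,
   even levels with no observations in the data */
proc tabulate data=responses;
   class region / preloadfmt;
   format region $regfmt.;
   table region, n / printmiss misstext='0';
run;

/* PROC SUMMARY: COMPLETETYPES produces rows for empty class levels */
proc summary data=responses completetypes nway;
   class region / preloadfmt;
   format region $regfmt.;
   output out=counts;
run;
```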