Disability rights groups are using the court system to effect change. As a result, the enforcement of Section 508 and similar laws around the world has become a priority. When you generate SAS® output, you need to protect your organization from litigation. This paper describes concrete steps that help you use the SAS® 9.4 Output Delivery System (ODS) to create SAS output that complies with accessibility standards. It also provides recommendations and code samples that are aligned with the accessibility standards defined by Section 508 and the Web Content Accessibility Guidelines (WCAG 2.0).
Glen Walker, SAS
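As one accessibility-minded illustration (a sketch in the spirit of the paper, not taken from it): route output to the HTML5 destination and supply alternative text for a graph through the DESCRIPTION= option.

ods html5 file='report.html';
proc sgplot data=sashelp.class
    description='Bar chart of mean height for female and male students';
  /* description= provides the text that screen readers announce */
  vbar sex / response=height stat=mean;
run;
ods html5 close;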
Merging or appending SAS® data sets are common tasks. During them, we sometimes get error or warning messages saying that the same fields in different SAS data sets have different lengths or different types. If the problems involve many fields and data sets, we have to spend a lot of time identifying the offending fields and writing extra SAS code to resolve the issues. The macro in this paper, however, identifies the fields that have inconsistent data types or lengths. It also resolves the length issues automatically by finding the maximum length of a field across the current data sets and assigning that length to the field. Running the macro generates an HTML report that lists which fields' lengths were changed and which fields have inconsistent data types.
Ting Sa, Cincinnati Children's Hospital Medical Center
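A minimal sketch of the macro's core idea, assuming two hypothetical WORK data sets DS1 and DS2 that share a character field ID:

proc sql noprint;
  /* find the widest declared length of ID across the data sets */
  select max(length) into :maxlen trimmed
  from dictionary.columns
  where libname='WORK' and memname in ('DS1','DS2') and upcase(name)='ID';
quit;

data combined;
  length id $ &maxlen;  /* widest length wins, so no truncation warning */
  set ds1 ds2;
run;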
Although rapid development of information technologies in the past decade has provided forecasters with both huge amounts of data and massive computing capabilities, these advancements do not necessarily translate into better forecasts. Different industries and products have unique demand patterns. There is not yet a one-size-fits-all forecasting model or technique. A good forecasting model must be tailored to the data in order to capture the salient features and satisfy the business needs. This paper presents a multistage modeling strategy for demand forecasting. The proposed strategy, which has no restrictions on the forecasting techniques or models, provides a general framework for building a forecasting system in multiple stages.
Pu Wang, SAS
With increasing data needs, it becomes more and more unwieldy to ensure that all scheduled jobs are running successfully and on time. Worse, maintaining your reputation as an information provider becomes a precarious prospect as the likelihood increases that your customers alert you to reporting issues before you are even aware of them yourself. By combining various SAS® capabilities and tying them together with concise visualizations, it is possible to track jobs actively and alert customers to issues before they become a problem. This paper introduces a report tracking framework that helps achieve this goal and improve customer satisfaction. The report tracking starts by obtaining table and job statuses and then color-codes them by severity level. Based on the job status, it then goes deeper into the log to search for potential errors or other relevant information to help assess the processes. The last step is to send proactive alerts to users informing them of possible delays or potential data issues.
Jia Heng, Wyndham Destination Network
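A hedged sketch of the log-scanning step described above; the log path is a placeholder:

data log_issues;
  length severity $7;
  infile '/jobs/logs/nightly_job.log' truncover;
  input logline $char256.;
  if logline =: 'ERROR' then severity='ERROR';
  else if logline =: 'WARNING' then severity='WARNING';
  else delete;  /* keep only lines worth alerting on */
run;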
The new and highly anticipated SAS® Output Delivery System (ODS) destination for Microsoft Excel is finally here! Available as a production feature in the third maintenance release of SAS® 9.4 (TS1M3), this new destination generates native Excel (XLSX) files that are compatible with Microsoft Office 2010 or later. This paper is written for anyone, from entry-level programmers to business analysts, who uses the SAS® System and Microsoft Excel to create reports. The discussion covers features and benefits of the new Excel destination, differences between the Excel destination and the older ExcelXP tagset, and functionality that exists in the ExcelXP tagset that is not available in the Excel destination. These topics are all illustrated with meaningful examples. The paper also explains how you can bridge the gap that exists as a result of differences in the functionality between the destination and the tagset. In addition, the discussion outlines when it is beneficial for you to use the Excel destination versus the ExcelXP tagset, and vice versa. After reading this paper, you should be able to make an informed decision about which tool best meets your needs.
Chevell Parker, SAS
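A minimal example of the destination in action; the suboptions shown are two of many documented ones:

ods excel file='class.xlsx' options(sheet_name='Class' embedded_titles='yes');
title 'Class Roster';
proc print data=sashelp.class noobs;
run;
ods excel close;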
This work presents a macro for generating a ternary graph that can be used to solve a problem in the ceramics industry. The ceramics industry uses mixtures of clay types to generate a product with good properties, which can be assessed according to breakdown pressure after incineration, porosity, and water absorption. Beyond these properties, the industry is concerned with managing the geological reserves of each type of clay. Thus, it is important to seek alternative compositions that present properties similar to the default mixture. This can be done by analyzing the surface response of these properties according to the clay composition, which is easily done in a ternary graph. SAS® documentation does not describe how to adjust an analysis grid in nonrectangular forms on graphs. The TRIAXIAL macro presented here, however, can generate ternary graphical analyses by creating a special grid, using an annotate data set, and making linear transformations of the original data. The GCONTOUR procedure is used to generate the three-dimensional analysis.
Igor Nascimento, UNB
Zacarias Linhares, IFPI
Roberto Soares, IFPI
This paper presents how Norway, the world's second-largest seafood-exporting country, shares valuable seafood insight using SAS® Visual Analytics Designer. Three complementary data sources (trade statistics, consumption panel data, and consumer survey data) are used to strengthen the knowledge and understanding of the important markets for seafood, which is a potential competitive advantage for the Norwegian seafood industry. The need for information varies across users, and as the amount of data available grows, the challenge is to make the information available for everyone, everywhere, at any time. Some users are interested in only the latest trade developments, while others working with product innovation need deeper consumer insights. Some have quite advanced analytical skills, while others do not. Thus, one of the most important things is to make the information understandable for everyone and at the same time provide in-depth insights for the advanced user. SAS Visual Analytics Designer makes it possible to provide both basic reporting and more in-depth analyses of trends and relationships to cover the various needs. This paper demonstrates how the functionality in SAS Visual Analytics Designer is fully used for this purpose, and presents how data from different sources is visualized in SAS Visual Analytics Designer reports located in the SAS® Information Delivery Portal. The main challenges and suggestions for improvements that were uncovered during the process are also presented.
Kia Uuskartano, Norwegian Seafood Council
Tor Erik Somby, Norwegian Seafood Council
This paper demonstrates how to use the ODS destination for PowerPoint to create attractive presentations from your SAS® output. Packed with examples, this paper gives you a behind-the-scenes tour of how ODS creates Microsoft PowerPoint presentations. You get an in-depth look at how to customize the ODS PowerPoint style templates that control the appearance of your presentation. With this information, you can quickly turn your SAS output into an engaging and informative presentation. This paper is a follow-on to the SAS® Global Forum 2013 paper 'A First Look at the ODS Destination for PowerPoint.'
Tim Hunter, SAS
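A quick taste of the destination, using one of its documented slide layouts:

ods powerpoint file='slides.pptx' layout=titleandcontent;
title 'Mean City MPG by Vehicle Type';
proc sgplot data=sashelp.cars;
  vbar type / response=mpg_city stat=mean;
run;
ods powerpoint close;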
Longitudinal and repeated measures data are seen in nearly all fields of analysis. Examples include weekly lab test results of patients or performance on test scores by children from the same class. Statistics students and analysts alike might be overwhelmed when it comes to repeated measures or longitudinal data analyses. They might try to educate themselves by diving into textbooks or taking semester-long or intensive weekend courses, resulting in even more confusion. Some might try to ignore the repeated nature of the data and take shortcuts, such as analyzing all data as independent observations or analyzing summary statistics such as averages or changes from first to last points, ignoring all the data in between. This hands-on presentation introduces longitudinal and repeated measures analyses without heavy emphasis on theory. Students in the workshop will have the opportunity to get hands-on experience graphing longitudinal and repeated measures data. They will learn how to approach these analyses with tools like PROC MIXED and PROC GENMOD. Emphasis will be on continuous outcomes, but categorical outcomes will briefly be covered.
Leanne Goldstein, City of Hope
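A sketch of the kind of model the workshop covers, with hypothetical data set and variable names:

proc mixed data=labs;
  class patient week treatment;
  model result = treatment week treatment*week;
  repeated week / subject=patient type=cs;  /* compound-symmetry covariance */
run;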
Amazon Web Services (AWS) as a platform for analytics and data warehousing has gained significant adoption over the years. With SAS® Visual Analytics being one of the preferred tools for data visualization and analytics, it is imperative to be able to deploy SAS Visual Analytics on AWS. This ensures swift analysis and reporting on large amounts of data with SAS Visual Analytics by minimizing the movement of data across environments. This paper focuses on installing SAS Visual Analytics 7.2 in an Amazon Web Services environment, migration of metadata objects and content from previous versions to the deployment on the cloud, and ensuring data security.
Vimal Raj Arockiasamy, Kavi Associates
Rajesh Inbasekaran, Kavi Associates
SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential--they save you from writing volumes of unnecessary code. This talk covers a number of the most useful SAS functions. Some may be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions discussed in this talk work with character data. Some functions search for strings, and others find and replace strings or join strings together. Still others measure the spelling distance between two strings (useful for fuzzy matching). Some of the newest and most amazing functions are not functions at all, but call routines. Did you know that you can sort values within an observation? Did you know that you can identify not only the largest or smallest value in a list of variables, but the second- or third- or nth-largest or smallest value? Knowledge of the functions described here will make you a much better SAS programmer.
Ron Cody, Camp Verde Associates
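Two of the capabilities mentioned above, sketched:

data example;
  x1=5; x2=3; x3=9; x4=1;
  second_largest=largest(2, of x1-x4);  /* nth-largest value in a list */
  call sortn(of x1-x4);                 /* sort values within the observation */
run;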
Government funding is a highly sought-after financing medium for entrepreneurs and researchers aspiring to make a breakthrough in the market and compete with larger organizations. The funding is channeled via federal agencies that seek out promising research and products that improve on existing work. This study uses a text analytic approach to analyze the project abstracts that earned government funding over the past three and a half decades in the fields of Defense and Health and Human Services, helping us understand the factors that might be responsible for awards made to a project. We collected 100,279 records from the Small Business Innovation and Research (SBIR) website (https://www.sbir.gov). First, we analyze the trends in government funding of small-business research. Then we perform text mining on the 55,791 abstracts of the projects funded by the Department of Defense and the Department of Health and Human Services. From the text mining, we portray the key topics and patterns related to past research and determine the changes in their trends over time. Through our study, we highlight research trends to enable organizations to shape their business strategy and make better investments in future research.
Sairam Tanguturi, Oklahoma State University
Sagar Rudrake, Oklahoma State University
Nithish Reddy Yeduguri, Oklahoma State University
The Waze application, purchased by Google in 2013, alerts millions of users about traffic congestion, collisions, construction, and other complexities of the road that can stymie motorists' attempts to get from A to B. From jackknifed rigs to jackalope carcasses, roads can be gnarled by gridlock or littered with obstacles that impede traffic flow and efficiency. Waze algorithms automatically reroute users to more efficient routes based on user-reported events as well as on historical norms that demonstrate typical road conditions. Extract, transform, load (ETL) infrastructures often represent serialized process flows that can mimic highways and that can become similarly snarled by locked data sets, slow processes, and other factors that introduce inefficiency. The LOCKITDOWN SAS® macro, introduced at the Western Users of SAS® Software Conference 2014, detects and prevents data access collisions that occur when two or more SAS processes or users simultaneously attempt to access the same SAS data set. Moreover, the LOCKANDTRACK macro, introduced at the conference in 2015, provides real-time tracking of and historical performance metrics for locked data sets through a unified control table, enabling developers to hone processes in order to optimize efficiency and data throughput. This paper demonstrates the implementation of LOCKANDTRACK and its lock performance metrics to create data-driven, fuzzy logic algorithms that preemptively reroute program flow around inaccessible data sets. Thus, rather than needlessly waiting for a data set to become available or for a process to complete, the software actually anticipates the wait time based on historical norms, performs other (independent) functions, and returns to the original process when it becomes available.
Troy Hughes, Datmesis Analytics
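A much-simplified sketch of the underlying collision test (the published macros are considerably more robust):

%macro trylock(ds);
  lock &ds;  /* the global LOCK statement sets &SYSLCKRC */
  %if &syslckrc ne 0 %then %put NOTE: &ds is busy--reroute or wait.;
  %else %do;
    %put NOTE: &ds is available.;
    lock &ds clear;
  %end;
%mend trylock;
%trylock(work.mydata)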
Although business intelligence experts agree that empowering businesses through a well-constructed semantic layer has undisputed benefits, a successful implementation has always been a formidable challenge. This presentation highlights the best practices to follow and mistakes to avoid, leading to a successful semantic layer implementation by using SAS® Visual Analytics. A correctly implemented semantic layer provides business users with quick and easy access to information for analytical and fact-based decision-making. Today, everyone talks about how the modern data platform enables businesses to store and analyze big data, but we still see most businesses trying to generate value from the data that they already store. From self-service to data visualization, business intelligence and descriptive analytics are still the key requirements for any business, and we discuss how to use SAS Visual Analytics to address them all. We also describe the key considerations in strategy, people, process, and data for a successful semantic layer rollout that uses SAS Visual Analytics.
Arun Sugumar, Kavi Associates
Vignesh Balasubramanian, Kavi Global
Harsh Sharma, Kavi Global
The presentation illustrates techniques and technology for managing complex, large-scale ETL workloads using SAS® and IBM Platform LSF. These provide greater control of flow triggering and support more complex triggering logic based on a rules-engine approach that parses pending workloads and current activity. Key techniques, configuration steps, and design and development patterns are demonstrated, and the pros and cons of the techniques used are discussed. Benefits include increased throughput, reduced support effort, and the ability to support more sophisticated inter-flow dependencies and conflicts.
Angus Looney, Capgemini
Since it was first released three years ago, SAS® Environment Manager has been widely used in many customers' production environments. Customers are now familiar with the basic functions of the product and are asking for more information about advanced topics such as how to manage users, roles, and permissions; how to secure the product; and how to interact with their existing system management tools in their enterprise environment. This paper addresses those advanced but practical issues from a real customer's perspective. The paper first briefly lists what's new in the most current release of SAS Environment Manager. It then discusses the new user management design introduced in the third maintenance release of SAS 9.4. We explain how that design addresses customers' concerns, along with best practices for using that design and for setting up user roles and permissions. We also discuss the new process and best practices for configuring SSL for SAS Environment Manager. More and more customers have expressed an interest in integrating SAS Environment Manager with their own system management tools, so our discussion ends by explaining how customers can do that through the SNMP interface built into SAS Environment Manager.
Zhiyong Li, SAS
Gilles Chrzaszcz, SAS Institute
Over the last few years, both Microsoft Excel file formats and the SAS® interfaces to those Excel formats have changed. SAS has worked hard to make the interface between the two systems easier to use. Interfaces have progressed from comma-separated values (CSV) files to PROC IMPORT and PROC EXPORT, LIBNAME processing, SQL processing, SAS® Enterprise Guide®, JMP®, and then on to HTML and XML tagsets like MSOFFICE2K and EXCELXP. Well, there is now a new entry among the processes available for SAS users to send data directly to Excel. This new entry in the ODS arena of data transfer to Excel is the ODS destination called EXCEL. It is included within SAS ODS and produces native-format Excel files for Excel 2007 and later. It was first shipped as an experimental version with the first maintenance release of SAS® 9.4. This ODS destination has many features similar to the EXCELXP tagset.
William E Benjamin Jr, Owl Computer Consultancy LLC
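For comparison, one of the newer interfaces mentioned above: the XLSX LIBNAME engine (SAS 9.4 with SAS/ACCESS® Interface to PC Files) writes a native .xlsx file directly; the path is a placeholder:

libname out xlsx '/data/class.xlsx';
data out.class;
  set sashelp.class;
run;
libname out clear;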
Impala is an open source SQL engine designed to bring real-time, concurrent, ad hoc query capability to Hadoop. SAS/ACCESS® Interface to Impala allows SAS® to take advantage of this exciting technology. This presentation uses examples to show you how to increase your program's performance and troubleshoot problems. We discuss how Impala fits into the Hadoop ecosystem and how it differs from Hive. Learn the differences between the Hadoop and Impala SAS/ACCESS engines.
Jeff Bailey, SAS
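A hedged connection sketch; the server name, credentials, schema, and table are placeholders, and the exact connection options available depend on your SAS/ACCESS release:

libname imp impala server='impala.example.com' schema=default
        user=myuser password=mypwd;
proc sql;
  select count(*) from imp.web_logs;  /* work is pushed down to Impala */
quit;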
So you've heard about SAS® arrays, but you're not sure when or why you would use them. This presentation provides some background on SAS arrays, from explaining what occurs during compile time to explaining how to use them programmatically. It also includes a discussion about how DO loops and macro variables can enhance array usability. Specific examples, including Fahrenheit-to-Celsius temperature conversion, salary adjustments, and data transposition and counting, demonstrate how you can use SAS arrays effectively in your own work and also provide a few caveats about their use.
Andrew Kuligowski, HSN
Lisa Mendez, IMS Government Solutions
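The temperature-conversion example, sketched with a hypothetical input data set TEMPS containing F1-F12:

data celsius;
  set temps;
  array f{12} f1-f12;
  array c{12} c1-c12;
  do i=1 to dim(f);
    c{i}=(f{i}-32)*5/9;  /* Fahrenheit to Celsius, one element at a time */
  end;
  drop i;
run;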
Organizations are collecting more structured and unstructured data than ever before, presenting great opportunities and challenges in analyzing all of that complex data. In this volatile and competitive economy, there has never been a bigger need for proactive and agile strategies that overcome these challenges by applying the analytics directly to the data rather than shuffling data around. Teradata and SAS® have joined forces to revolutionize your business by providing enterprise analytics in a harmonious data management platform, delivering critical strategic insights by applying advanced analytics inside the database or data warehouse where the vast volume of data is fed and resides. This paper highlights how Teradata and SAS strategically integrate in-memory, in-database, and the Teradata relational database management system (RDBMS) in a box, giving IT departments a simplified footprint and reporting infrastructure, as well as lower total cost of ownership (TCO), while offering SAS users increased analytic performance by eliminating the shuffling of data over external network connections.
Greg Otto, Teradata Corporation
Tho Nguyen, Teradata
Paul Segal, Teradata
This paper demonstrates techniques that use SAS® software to combine data from devices and sensors for analytics. Base SAS®, SAS® Data Integration Studio, and SAS® Visual Analytics are used to query external services and to import, fuzzy match, analyze, and visualize data. Finally, the benefits of SAS® Event Stream Processing models and alerts are discussed. To bring the analytics of things to life, the following data are collected: GPS data from countryside walks, GPS and scorecard data from smart phones while playing golf, and meteorological data feeds. These data are combined to test the old adage that golf is a good walk spoiled. Further, the use of alerts and the potential for predictive analytics are discussed.
David Shannon, Amadeus Software
For some users, an annotation facility is an integral part of creating polished graphics for their work. To meet that need, we created a new annotation facility for the ODS Graphics procedures in SAS® 9.3. Now, with SAS® 9.4, the Graph Template Language (GTL) supports annotation as well! In fact, the GTL annotation facility has some unique features not available in the ODS Graphics procedures, such as using multiple sets of annotation in the same graph and the ability to bind annotation to a particular cell in the graph. This presentation covers some basic concepts of annotating that are common to both GTL and the ODS Graphics procedures. I apply those concepts to demonstrate some unique abilities of GTL annotation. Come see annotation in action!
Dan Heath, SAS
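A tiny SG-annotation sketch (the ODS Graphics procedures' side of the facility), with the annotation data set built by hand; the label text is arbitrary:

data anno;
  length function $10 label $40 drawspace $14;
  function='text'; label='Heights and weights, ages 11-16';
  x1=50; y1=95; drawspace='graphpercent'; textsize=11;
run;

proc sgplot data=sashelp.class sganno=anno;
  scatter x=height y=weight;
run;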
Seven principles frequently manifest themselves in data management projects. These principles improve two aspects: the development process, enabling the programmer to deliver better code faster with fewer issues; and the actual performance of programs and jobs. Using examples from Base SAS®, the SAS® Macro Language, DataFlux®, and SQL, this paper presents the seven principles: environments, job control data, test data, improvement, data locality, minimal passes, and indexing. Readers with an intermediate knowledge of the SAS DataFlux Management Platform and/or Base SAS and the SAS Macro Language, as well as SQL, will understand these principles and find ways to use them in their data management projects.
Bob Janka, Modern Analytics
With the popularity of network-attached storage for shared systems, IT shops are increasingly turning to them for SAS® Grid implementations. Given the high I/O demand of SAS large block processing, how do you ensure good performance between network-attached SAS Grid nodes and storage? This paper discusses different types of network-attached implementations, what works, and what does not. It provides advice for bandwidth planning, types of network technology to use, and what has been practically successful in the field.
Tony Brown, SAS
This paper demonstrates the deployment of SAS® Visual Analytics 7.2 with a distributed SAS® LASR™ Analytic Server and an internet-facing web tier. The key factor in this deployment is to establish the secure web server connection using a third-party certificate to perform the client and server authentication. The deployment process involves the following steps: 1) Establish the analytics cluster, which consists of SAS® High-Performance Deployment of Hadoop and the deployment of high-performance analytics environment master and data nodes. 2) Get the third-party signed certificate and the key files. 3) Deploy the SAS Visual Analytics server tier and middle tier. 4) Deploy the standalone web tier with HTTP protocol configured using secure sockets. 5) Deploy the SAS® Web Infrastructure Platform. 6) Perform post-installation validation and configuration to handle the certificate between the servers.
Vimal Raj Arockiasamy, Kavi Associates
Ratul Saha, Kavi Associates
This presentation is an open-ended discussion about techniques for creating graphical reports using SAS® ODS Graphics components. After some introductory comments, this presentation does not have any set content. Instead, the topics discussed are dictated by attendee questions. Come prepared to ask and get answers to your questions.
This paper provides tools and guidelines to assess SAS® skill level during the interview process. Included are interview questions for Base SAS® developers, statisticians, and SAS administration candidates. Whether the interviewer has a deep understanding of or just some familiarity with these areas, the questions provided will allow discussions that uncover the skills and usage experience your company requires.
Jenine Milum, Citi
The SAS® Visual Analytics autoload feature loads any files placed in the autoload location into the public SAS® LASR™ Analytic Server. But what if we want to autoload tables into a private SAS LASR Analytic Server, load database tables into SAS Visual Analytics, and reload them every day so that the tables reflect any changes in the database? One option is to create a query in SAS® Visual Data Builder and schedule the table for load into the SAS LASR Analytic Server, but in this case, a query has to be created for each table that has to be loaded. It would be far more efficient if all tables could be loaded simultaneously; that would make the job a lot easier. This paper provides a custom solution for autoloading the tables in one step, so that tables in SAS Visual Analytics don't have to be reloaded manually and multiple queries don't have to be rerun whenever there is a change in the source data. This solution automatically unloads and reloads all the tables listed in a text file every day so that the data available in SAS Visual Analytics is always up to date. This paper focuses on autoloading database tables into the private SAS LASR Analytic Server in SAS Visual Analytics hosted in a UNIX environment and uses the SASIOLA engine to perform the load. The process involves four steps: initial load, unload, reload, and scheduling of the unload and reload. This paper provides the scripts associated with each step. The initial load needs to be performed only once. The unload and reload have to be performed periodically to refresh the data source, so these steps have to be scheduled. Once the tables are loaded, a description is added to each table detailing the table name, the time of the load, and the user ID performing the load, similar to the description added when loading tables manually into the SAS LASR Analytic Server using SAS Visual Analytics Administrator.
Swetha Vuppalanchi, SAS Institute Inc.
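A hedged sketch of the unload/reload step using the SASIOLA engine; the host, port, tag, table, and source library are placeholders:

libname lasr sasiola host='lasr.example.com' port=10011 tag='vapriv';
proc datasets lib=lasr nolist;
  delete sales;                /* unload yesterday's copy */
quit;
data lasr.sales;
  set dblib.sales;             /* reload from the database library */
run;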
Developers working on a production process need to think carefully about ways to avoid future changes that require change control, so it's always important to make the code dynamic rather than hardcoding items into the code. Even if you are a seasoned programmer, the hardcoded items might not always be apparent. This paper assists in identifying the harder-to-reach hardcoded items and addresses ways to effectively use control tables within the SAS® software tools to deal with sticky areas of coding such as formats, parameters, grouping/hierarchies, and standardization. The paper presents examples of several ways to use the control tables and demonstrates why this usage prevents the need for coding changes. Practical applications are used to illustrate these examples.
Frank Ferriola, Financial Risk Group
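A bare-bones illustration of the control-table idea, with hypothetical parameter names: the values live in data, not in code, so changing them requires no change control on the program.

data control;
  length parm $20 value $40;
  parm='CUTOFF_DATE'; value='01JAN2016'; output;
  parm='REGION';      value='NORTHEAST'; output;
run;

proc sql noprint;
  select value into :cutoff_date trimmed
  from control where parm='CUTOFF_DATE';
quit;
%put NOTE: Using cutoff date &cutoff_date;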
For me, it's all about avoiding manual effort and repetition. Whether your work involves data exploration, reporting, or analytics, you probably find yourself repeating steps and tasks with each new program, project, or analysis. That repetition adds time to the delivery of results and also contributes to a lack of standardization. This presentation focuses on productivity tips and tricks to help you create a standard and efficient environment for your SAS® work so you can focus on the results and not the processes. Included are the following: setting up your programming environment (comment blocks, environment cleanup, easy movement between test and production, and modularization); sharing easily with your team (format libraries, macro libraries, and common code modules); and managing files and results (date and time stamps for logs and output, project IDs, and titles and footnotes).
Marje Fecht, Prowerk Consulting
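One of the managing-files tips, sketched: a date-stamped log file so each run's log is preserved (the log path is a placeholder):

%let stamp=%sysfunc(today(), yymmddn8.);
proc printto log="/project/logs/daily_run_&stamp..log" new;
run;
/* ...program steps... */
proc printto;  /* restore the default log */
run;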
When you were a kid, did you dress up as a unicorn for Halloween? Did you ever imagine you were a fairy living at the bottom of your garden? Maybe your dreams can come true! Modern-day marketers need a variety of complementary (and sometimes conflicting) skill sets. Forbes and others have started talking about unicorns, a rare breed of marketer--marketing technologists who understand both marketing and marketing technology. A good marketing analyst is part of this new breed; a unicorn isn't a horse with a horn but truly a new breed. It is no longer enough to be good at coding in SAS®--or a whiz with Excel--and to know a few marketing buzzwords. In this session, we explore the skills an analyst needs to turn himself or herself into one of these mythical, highly sought-after creatures.
Emma Warrillow, Data Insight Group Inc. (DiG)
The power of SAS®9 applications allows information and knowledge creation from very large amounts of data. Analysis that used to involve tens to hundreds of gigabytes (GB) of supporting data has rapidly grown to tens to hundreds of terabytes (TB). This data expansion has resulted in more and larger SAS® data stores. Setting up file systems to support these large volumes of data with adequate performance, as well as ensuring adequate storage space for the SAS temporary files, can be very challenging. Technology advancements in storage and system virtualization, flash storage, and hybrid storage management require continual updating of best practices for configuring I/O subsystems. This paper presents updated best practices for configuring the I/O subsystem for your SAS®9 applications, ensuring adequate capacity, bandwidth, and performance for your SAS®9 workloads. We have found that very few storage systems work ideally with SAS with their out-of-the-box settings, so it is important to convey these general guidelines.
Tony Brown, SAS
Building representative machine learning models that generalize well on future data requires careful consideration both of the data at hand and of assumptions about the various available training algorithms. Data are rarely in an ideal form that enables algorithms to train effectively. Some algorithms are designed to account for important considerations such as variable selection and handling of missing values, whereas other algorithms require additional preprocessing of the data or appropriate tweaking of the algorithm options. Ultimate evaluation of a model's quality requires appropriate selection and interpretation of an assessment criterion that is meaningful for the given problem. This paper discusses the most common issues faced by machine learning practitioners and provides guidance for using these powerful algorithms to build effective models.
Brett Wujek, SAS
Funda Gunes, SAS Institute
Patrick Hall, SAS
SAS® solutions that run in Hadoop provide you with the best tools to transform data in Hadoop. They also provide insights to help you make the right decisions for your business. It is possible to incorporate SAS products and solutions into your shared Hadoop cluster in a cooperative fashion with YARN to manage the resources. Best practices and customer examples are provided to show how to build and manage a shared cluster with SAS applications and products.
James Kochuba, SAS
Connecting database schemas to libraries in the SAS® metadata is a very important part of setting up a functional and useful environment for business users. This task can be quite difficult for the untrained administrator. This paper addresses the key configuration items that often go unnoticed but can make a big difference. Using the wrong options can lead to poor database performance or even to a total lockdown, depending on the number of connections to the database.
Mathieu Gaouette, Videotron
Architects do not see a single architectural solution, such as SAS® Grid Manager, satisfying the varied needs of users across the enterprise. Multi-tenant environments need to support data movement (such as ETL), analytic processing (such as forecasting and predictive modeling), and reporting, which can include everything from visual data discovery to standardized reporting. SAS® users have a myriad of choices that might seem at odds with one another, such as in-database versus in-memory, or data warehouses (tightly structured schemes) versus data lakes and event stream processing. Whether fit for purpose or for potential, these choices force us as architects to modernize our thinking about the appropriateness of architecture, configuration, monitoring, and management. This paper discusses how SAS® Grid Manager can accommodate the myriad use cases and the best practices used in large-scale, multi-tenant SAS environments.
Jan Bigalke, Allianz Managed Operations & Services SE
Greg Nelson, ThotWave
Higher education is taking on big data initiatives to help improve student outcomes and operational efficiencies. This paper shows how one institution went from white paper to action. Challenges such as developing a common definition of big data, gaining access to certain types of data, and bringing together institutional groups who had little previous contact are highlighted, along with the stepwise, collaborative approach taken. How the data was eventually brought together and analyzed to generate actionable results is also covered. Learn how to make big data a part of your analytics toolbox.
Stephanie Thompson, Datamum
The surge of data and data sources in marketing has created an analytical bottleneck in most organizations. Analytics departments have been pushed into a difficult decision: either purchase black-box analytical tools to generate efficiencies or hire more analysts, modelers, and data scientists. Knowledge gaps stemming from restrictions in black-box tools or from backlogs in the work of analytical teams have resulted in lost business opportunities. Existing big data analytics tools respond well when dealing with large record counts and small variable counts, but they fall short in bringing efficiencies when dealing with wide data. This paper discusses the importance of an agile modeling engine designed to deliver productivity, irrespective of the size of the data or the complexity of the modeling approach.
Mariam Seirafi, Cornerstone Group of Companies
Your marketing team would like to pull data from its different marketing activities into one report. What happens in Vegas might stay in Vegas, but what happens in your data does not have to stay there, locked in different tools or static spreadsheets. Learn how to easily bring data from Google Analytics, Facebook, and Twitter into SAS® Visual Analytics to create interactive explorations and reports on this data along with your other data for better overall understanding of your marketing activity.
I-Kong Fu, SAS
Mark Chaves, SAS
Andrew Fagan, SAS
Microsoft Office has over 1 billion users worldwide, making it one of the most successful pieces of software on the market today. Imagine combining the familiarity and functionality of Microsoft Office with the power of SAS® to include SAS content in a Microsoft Office document. By using SAS® Office Analytics, you can create Microsoft Excel worksheets that are not just static reports, but interactive documents. This paper looks at opening, filtering, and editing data in an Excel worksheet. It shows how to create an interactive experience in Excel by leveraging Visual Basic for Applications using SAS data and stored processes. Finally, this paper shows how to open SAS® Visual Analytics reports in Excel, so the interactive benefits of SAS Visual Analytics are combined with the familiar interface of an Excel worksheet. All of these interactions with SAS content are possible without leaving Microsoft Excel.
Tim Beese, SAS
Surveys are a critical research component when trying to gather information about a population of people and their knowledge, attitudes, and experiences. Researchers often implement surveys after a population has participated in a class or educational program. Survey developers often create sets of items which, when analyzed as a group, can measure a construct that describes the underlying behavior, attribute, or characteristic of a study participant. After testing for the reliability and validity of individual items, the group of items can be combined to create a scale score that measures a predefined construct. For example, in education research, a construct can measure students' use of global reading strategies or teachers' confidence in literacy instruction. The number of items that compose a construct can vary widely. Construct scores can be used as outcome measures to assess the impact of a program on its participants. Our example is taken from a project evaluating a teacher professional development program aimed at improving students' literacy in target subject classes. The programmer is tasked with creating such scores quickly. With unlimited time to spend, a programmer could operationalize the creation of these constructs by manually entering predefined formulas in the DATA step. More often, time and resources are limited, so it is more efficient to automate the process with macro programs. In this paper, we present a technique that uses an externally created key data set containing the construct specifications and the corresponding items of each construct. By iterating through macro variables created from this key data set, we can create construct score variables for varying numbers of items. This technique can be generalized to other processing tasks that involve any number of mathematical combinations of variables to create one single score.
Vincent Chan, IMPAQ International
Lorena Ortiz, IMPAQ International
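A condensed sketch of the key-data-set technique (the paper iterates through macro variables; CALL EXECUTE is a closely related generation approach), assuming a hypothetical data set SCORED that already contains items Q1-Q10:

data key;
  length construct $32 items $200;
  construct='global_strategies'; items='q1 q4 q7 q10'; output;
  construct='instr_confidence';  items='q2 q5 q8';     output;
run;

data _null_;  /* generate one scoring step per construct */
  set key;
  call execute('data scored; set scored; ' || strip(construct) ||
               '=mean(of ' || strip(items) || '); run;');
run;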
Nowadays, recommender systems are popular tools for online retail businesses to predict customers' next-product-to-buy (NPTB). Based on statistical techniques and the information collected by the retailer, an efficient recommender system can suggest a meaningful NPTB to customers. A useful suggestion reduces the customer's search time for a wanted product and improves the buying experience, thus increasing the chance of cross-selling for online retailers and helping them build customer loyalty. Within a recommender system, the combination of advanced statistical techniques with available information (such as customer profiles, product attributes, and popular products) is the key element in using the retailer's database to produce a useful NPTB suggestion for customers. This paper illustrates how to create a recommender system with the SAS® RECOMMEND procedure for an online business. Using the recommender system, we can produce predictions, compare the performance of different predictive models (such as decision trees or multinomial discrete-choice models), and make business-oriented recommendations from the analysis.
Miao Nie, ABN AMRO Bank
Shanshan Cong, SAS Institute
Formats are powerful tools within the SAS® System. They can be used to change how information is brought into SAS, to modify how it is displayed, and even to reshape the data itself. Base SAS® comes with a great many predefined formats, and it is even possible for you to create your own specialized formats. This paper briefly reviews the use of formats in general and then covers a number of aspects of user-generated formats. Since formats themselves have a number of uses that are not at first apparent to the new user, we also look at some of the broader applications of formats. Topics include building formats from data sets, using picture formats, transformations using formats, value translations, and using formats to perform table lookups.
Art Carpenter, California Occidental Consultants
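One of the topics above in miniature: building a format from a data set (CNTLIN=) and using it as a table lookup, with hypothetical region codes and a hypothetical SALES data set:

data cntlin;
  retain fmtname 'regfmt' type 'C';  /* TYPE='C' makes it a character format */
  length start $2 label $12;
  start='NE'; label='Northeast'; output;
  start='SW'; label='Southwest'; output;
run;

proc format cntlin=cntlin;
run;

data lookup;
  set sales;
  region_name=put(region, $regfmt.);  /* format as table lookup */
run;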
Have you ever wondered about the best way to secure SAS® software? Many pieces need to be secured--from passwords and authentication to encryption of data at rest and in transit. During this presentation, we discuss several security tasks that are important when setting up SAS, and we provide some tips for finding the information you need in the mountains of SAS documentation. The tasks include 1) enabling basic network security (TLS) using the automation options in the latest release of SAS® 9.4; 2) configuring HTTPS for the SAS middle tier (once the TLS connection is established); and 3) setting up user accounts and IDs with an eye toward auditing user activity. Whether you are an IT expert who is concerned about corporate security policies, a SAS administrator who needs to be able to describe SAS capabilities and configure SAS software to work with corporate IT standards, or an auditor who needs to review specific questions about security vulnerabilities, this presentation is for you.
Robin Crumpton, SAS
Donna Bennett, SAS
Qiana Eaglin, SAS
You would think that training as a code breaker, similar to those employed during the Second World War, wouldn't be necessary for the routine SAS® programming duties of your job, such as debugging code. However, if the author of the code doesn't incorporate good elements of style in his or her program, the task of reviewing code becomes arduous and tedious for others. Style touches upon very specific aspects of writing code--indentation of code and comments; casing of keywords, variables, and data set names; and spacing between PROC and DATA steps. Paying attention to these specific issues enhances the reusability and lifespan of your code. By using style to make your code readable, you'll impress your superiors and grow as a programmer.
Jay Iyengar, Data Systems Consultants
With all the talk of big data and visual analytics, we sometimes forget how important, and often difficult, it is to get external data into SAS®. In this paper, we review some common data sources such as delimited sources (for example, comma-separated values format [CSV]) as well as structured flat files and the programming steps needed to successfully load these files into SAS. In addition to examining the INFILE and INPUT statements, we look at some methods for dealing with bad data. This paper assumes only basic SAS skills, although the topic can be of interest to anyone who needs to read external files.
Peter Eberhardt, Fernwood Consulting Group Inc.
Audrey Yeo, Athene
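A small sketch in the paper's spirit: reading a delimited file while flagging bad records; the path and record layout are hypothetical:

data donations;
  infile '/data/donations.csv' dsd firstobs=2 truncover;
  input donor_id amount :comma10. date :mmddyy10.;
  if missing(amount) then putlog 'WARNING: bad amount: ' _infile_;
  format date date9.;
run;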
Marketing is inevitable in every business, yet every year the marketing budget is limited, and prudent fund allocation is required to optimize marketing investment. In many businesses, the marketing fund is allocated based on the marketing manager's experience, departmental budget allocation rules, and sometimes the 'gut feelings' of business leaders. Those traditional ways of budget allocation yield suboptimal results and, in many cases, lead to money being wasted on irrelevant marketing efforts. Marketing mix models can be used to understand the effects of marketing activities and identify the key marketing efforts that drive the most sales among a group of competing marketing activities. The results can be used in marketing budget allocation to take out the guesswork that typically goes into it. In this paper, we illustrate practical methods for developing and implementing marketing mix modeling using SAS® procedures. Real-life challenges of marketing mix model development and execution are discussed, and several recommendations are provided to overcome some of those challenges.
Delali Agbenyegah, Alliance Data Systems
Packing a carry-on suitcase for air travel and designing a report for mobile devices have a lot in common. Your carry-on suitcase contains indispensable items for your journey, and its contents are limited by tight space. Your reports for mobile devices face similar challenges--data display is governed by tight real estate, and other factors, such as users' shorter attention spans and information density, come into play. How do you overcome these challenges while displaying data effectively for your mobile users? This paper demonstrates how the smaller real estate on mobile devices, as well as device orientation in portrait or landscape mode, influences best practices for designing reports. The use of containers, layouts, filters, information windows, and carefully selected objects enables you to design and guide user interaction effectively. Appropriate selection of font styles, font sizes, and colors reduces distraction and enhances quick user comprehension. By incorporating these recommendations into your report design, you can produce reports that display seamlessly on mobile devices and browsers.
Lavanya Mandavilli, SAS
Anand Chitale, SAS
We use SAS® Forecast Studio to develop time series models for the daily number of taxi trips and the daily taxi fare revenue of Yellow Cabs in New York City, using publicly available data from 1/1/2011 to 6/30/2015. Interest centers on trying to assess the effects (if any) of the Uber ride-hailing service on Yellow Cabs in New York City.
Simon Sheather, Texas A&M University
In healthcare and other fields, the importance of cell suppression as a means to avoid unintended disclosure or identification of Protected Health Information (PHI) or any other sensitive data has grown as we move toward dynamic query systems and reports. Organizations such as the Centers for Medicare & Medicaid Services (CMS), the National Center for Health Statistics (NCHS), and the Privacy Technical Assistance Center (PTAC) have outlined best practices to help researchers, analysts, report writers, and others avoid unintended disclosure for privacy reasons and maintain statistical validity. Cell suppression is a crucial consideration during report design and can be a substantial hurdle in the dissemination of information. Often, the goal is to display as much data as possible without enabling the identification of individuals while maintaining statistical validity. When designing reports using SAS® Visual Analytics, suppression can be handled in multiple ways. One way is to suppress the data before loading it into the SAS® LASR™ Analytic Server. This has the drawback that a user cannot take full advantage of the dynamic filtering and aggregation available with SAS Visual Analytics. Another method is to create formulas that govern how SAS Visual Analytics displays cells within a table (crosstab) or bars within a chart. The logic can be complex and can meet a variety of needs. This presentation walks through examples of the latter methodology, namely the creation of suppression formulas and how to apply them to report objects.
Marc Flore, University of New Hampshire
Whether you are deploying a new capability with SAS® or modernizing the tool set that people already use in your organization, change management is a valuable practice. Sharing the news of a change with employees can be a daunting task and is often put off until the last possible second. Organizations frequently underestimate the impact of the change, and the results of that miscalculation can be disastrous. Too often, employees find out about a change just before mandatory training and are expected to embrace it. But change management is far more than training. It is early and frequent communication; an inclusive discussion; encouraging and enabling the development of an individual; and facilitating learning before, during, and long after the change. This paper not only showcases the importance of change management but also identifies key objectives for a purposeful strategy. We outline our experiences with both successful and not-so-successful organizational changes. We present best practices for implementing change management strategies and highlight common gaps. For example, developing and engaging Change Champions from the beginning alleviates many headaches and avoids disruptions. Finally, we discuss how the overall company culture can either support or hinder the positive experience change management should be, and how to engender support for formal change management in your organization.
Greg Nelson, ThotWave
Graphs are essential for many clinical and health care domains, including analysis of clinical trials safety data and analysis of the efficacy of the treatment, such as change in tumor size. Creating such graphs is a breeze with procedures from SAS® 9.4 ODS Graphics. This paper shows how to create many industry-standard graphs such as Lipid Profile, Swimmer Plot, Survival Plot, Forest Plot with Subgroups, Waterfall Plot, and Patient Profile using Study Data Tabulation Model (SDTM) data with just a few lines of code.
Sanjay Matange, SAS
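A hedged waterfall-plot sketch in the paper's spirit; the tumor-response data set and variables are hypothetical:

proc sgplot data=tumor;
  vbar patient / response=pct_change group=response
                 categoryorder=respdesc;            /* sort bars by response */
  refline 20 -30 / axis=y lineattrs=(pattern=dash); /* RECIST-style thresholds */
  xaxis display=(nolabel noticks novalues);
run;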
Managing and coordinating various aspects of a report can be challenging. This is especially true when the structure and composition of the report are data driven. For complex reports, the inclusion of attributes such as color, labeling, and the ordering of items complicates the coding process. Fortunately we have some powerful reporting tools in SAS® that allow the process to be automated to a great extent. In the example presented in this paper, we are tasked with generating an Excel spreadsheet that ranks types of injuries within age groups. A given injury type is to be assigned a constant color regardless of its rank, and the labeling is to include not only the injury label but the actual count as well. Of course the user needs to be able to control such things as the age groups, color selection and order, and number of desired ranks.
Art Carpenter, California Occidental Consultants
Suppose that you have a very large data set with specific values in one of its columns, and you want to split the entire data set into different comma-separated values (CSV) files based on the values in that column. You might write code using IF-THEN/ELSE conditions in SAS®, along with some OUTPUT statements. But dividing the data set that way leaves you with the frustrating, conventional manual process of converting each of the separated data sets into a CSV file. This paper presents a comparative study of two automated approaches: using the SAS macro facility with PROC EXPORT, and using the Output Delivery System (ODS) with PROC TABULATE. In both approaches, the whole tedious process is done automatically by SAS code.
Saurabh Nandy, Oklahoma State University
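A hedged sketch of the macro-plus-PROC EXPORT approach, assuming a character splitting column and a placeholder output path:

%macro split_csv(ds, var);
  proc sql noprint;
    select distinct &var into :vals separated by '|' from &ds;
  quit;
  %do i=1 %to %sysfunc(countw(&vals, |));
    %let v=%scan(&vals, &i, |);
    proc export data=&ds(where=(&var="&v"))
        outfile="/out/&v..csv" dbms=csv replace;
    run;
  %end;
%mend split_csv;
%split_csv(work.claims, state)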
SAS® macros offer a very flexible way of developing, organizing, and running SAS code. In many systems, programming components are stored as macros in files and called as macros via a variety of means to perform the task at hand. Consideration must be given to organizing these files in a manner consistent with proper use and retention. An example might be the development of some macros that are company-wide in potential application, others that are departmental, and still others that are for personal or individual use. Superimposed on this structure in a factorial fashion is the need to have areas that are considered (1) production (validated), (2) archived (retired), and (3) developmental. Several proposals for accomplishing this are discussed, as well as how this structure might interrelate with stored processes available in some SAS systems. The use of these macros in systems ranging from simple background batch processing to highly interactive SAS and BI processing is discussed. Specifically, the drivers or driver files are addressed, and the pros and cons of all approaches are considered.
Roger Muller, Data-To-Events, Inc.
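The tiered organization described above might surface to programs through the autocall facility; the paths here are purely illustrative:

options mautosource
        sasautos=('/macros/production' '/macros/departmental'
                  '/macros/personal' sasautos);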
Within SAS® literally millions of colors are available for use in our charts, graphs, and reports. We can name these colors using techniques that include color wheels, RGB (Red, Green, Blue) HEX codes, and HLS (Hue, Lightness, Saturation) HEX codes. But sometimes I just want to use a color by name. When I want purple, I want to be able to ask for purple, not CX703070 or H03C5066. But am I limiting myself to just one purple? What about light purple or pinkish purple? Do those colors have names or must I use the codes? It turns out that they do have names. Names that we can use. Names that we can select, names that we can order, names that we can use to build our graphs and reports. This paper shows you how to gather color names and manipulate them so that you can take advantage of your favorite purple, be it purple, grayish purple, vivid purple, or pale purplish blue.
Art Carpenter, California Occidental Consultants
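Using a predefined color name directly, as the paper advocates:

proc sgplot data=sashelp.class;
  vbar sex / response=height stat=mean
             fillattrs=(color=purple);  /* a name, not CX703070 */
run;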
Quasi-complete separation is a commonly detected issue in logit/probit models. It occurs when a dependent variable separates an independent variable or a combination of several independent variables to a certain degree. In other words, levels in a categorical variable or values in a numeric variable are separated by groups of discrete outcome variables. Most of the time, this happens in categorical independent variables. Quasi-complete separation can cause convergence failures in logistic regression, which consequently result in large coefficient estimates and inflated standard errors. Therefore, it can potentially yield biased results. This paper provides a comprehensive review of various approaches for correcting quasi-complete separation in binary logistic regression models and presents hands-on examples from the health insurance industry. First, it introduces the concept of quasi-complete separation and how to diagnose the issue. Then, it provides step-by-step guidelines for fixing the problem, from a straightforward data configuration approach to complicated statistical modeling approaches such as the EXACT method, the FIRTH method, and so on.
Xinghe Lu, AmeriHealth Caritas
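One of the fixes discussed, sketched with a hypothetical health-plan data set: Firth's penalized likelihood in PROC LOGISTIC:

proc logistic data=claims;
  class plan_type / param=ref;
  model denied(event='1') = plan_type age / firth clparm=pl;
run;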
You might already know that SAS® Studio tasks provide you with prompts to fill in the blanks in SAS® code and help you navigate SAS syntax. But did you know that SAS Studio tasks are not only designed to allow you to modify them, but there is an entire common task model (CTM) provided for you to build your own? You can build basic utilities or complex reports. You can just run SAS code or you can have elaborate prompts to dynamically generate code. And since SAS Studio runs in a web browser, your tasks are browser-based and can easily be shared with others. In this paper, you learn how to take a typical SAS program and create a SAS Studio task from it. When the task is run, you see web-based prompts for the dynamic portions of the program and HTML output delivered to your browser.
Kris Kiser, SAS
Christie Corcoran, SAS
Marie Dexter, SAS
Amy Peters, SAS
This workshop shows you how to create powerful interactive visualizations using SAS® Stored Processes to deliver data to JavaScript objects. We construct some simple HTML to make a simple dashboard layout with a range of connected graphs and some static data. Then, we replace the static data with SAS Stored Processes that we build, which use the STREAM and JSON procedures in SAS® 9.4 to deliver the data to the objects. You will see how easy it is to build a bespoke dashboard that resembles one in SAS® Visual Analytics using only SAS Stored Processes and some basic HTML, JavaScript, and CSS3.
Phil Mason, Wood Street Consultants Ltd.
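The JSON half of that data delivery, in miniature (PROC JSON, SAS 9.4); the output path is a placeholder:

proc json out='/web/data/class.json' pretty;
  export sashelp.class;
run;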
Discover how to answer the 'Where?' question of data visualization by leveraging SAS® Visual Analytics. Geographical data elements within SAS Visual Analytics provide users with the capability to quickly map data by countries and regions, by states or provinces, and by the centroid of US ZIP codes. This paper demonstrates how easy it is to map by these elements. Of course, once your manager sees your new maps, they will ask for more granular shapes (such as US counties or US ZIP codes). Respond with 'Here it is!' Follow the steps provided to add custom boundaries by parsing the shape files into consumable data objects and loading these custom boundaries into SAS Visual Analytics.
Angela Hall, SAS
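The parsing step might look like this sketch, which reads a shapefile into a SAS data set with PROC MAPIMPORT (SAS/GRAPH®); the path is a placeholder:

proc mapimport datafile='/shapes/us_counties.shp' out=work.counties;
run;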
This paper shows how to program a powerful Q-gram algorithm for measuring the similarity of two character strings. Along with built-in SAS® functions--such as SOUNDEX, SPEDIS, COMPGED, and COMPLEV--Q-gram can be a valuable tool in your arsenal of string comparators. Q-gram is especially useful when measuring the similarity of strings that are intuitively identical but have a different word order, such as 'John Smith' and 'Smith, John'.
Joe DeShon, Boehringer-Ingelheim
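The following is an illustrative sketch, not the author's exact implementation: it scores two strings by the share of q-grams (here q=2) of the first string that appear anywhere in the second, which already rates 'John Smith' and 'Smith, John' as highly similar.

data qgram;
   length s1 s2 $40 gram $2;
   s1 = 'John Smith';
   s2 = 'Smith, John';
   q  = 2;
   n1 = length(s1) - q + 1;          /* number of q-grams in s1 */
   hits = 0;
   do i = 1 to n1;
      gram = substr(s1, i, q);       /* extract each q-gram     */
      if find(s2, gram) then hits + 1;
   end;
   similarity = hits / n1;           /* ranges from 0 to 1      */
   put s1= s2= similarity=;
run;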
SAS® Grid Manager, like other grid computing technologies, has a set of great capabilities that we IT professionals love to have in our systems. This technology increases high availability, allows parallel processing, accommodates increasing demand through scale-out, and offers other features that make life better for those managing and using these environments. However, even when business users take advantage of these features, they are more concerned with the business side of the problem. Most of the time, business groups hold the budgets and are key stakeholders for any SAS Grid Manager project. Therefore, it is crucial to demonstrate to business users how they will benefit from the new technologies, how the features will improve their daily operations, help them be more efficient and productive, and help them achieve better results. This paper guides you through a process for creating a strong and persuasive business plan that translates the technology features of SAS Grid Manager into business benefits.
Marlos Bosso, SAS
You've heard that SAS® Output Delivery System (ODS) Graphics provides a powerful and detailed syntax for creating custom graphs, but for whatever reason you still haven't added them to your bag of SAS® tricks. Let's change that! We present a code playground based on Microsoft Office that enables you to quickly try out a variety of prepared SAS ODS Graphics examples, tweak the code, and see the results--all directly from Microsoft Excel. More experienced users will also find the code playground (which is similar in spirit to Google Code Playground or JSFiddle) useful for compiling SAS ODS Graphics code snippets for themselves and for sharing with colleagues, as well as for creating dashboards hosted by Microsoft Excel or Microsoft PowerPoint that contain precisely sized and placed SAS graphics.
Ted Conway, Discover Financial Services
Ezequiel Torres, MPA Healthcare Solutions, Inc.
Customer lifetime value (LTV) estimation involves two parts: estimating survival probabilities and estimating profit margins. This article describes the estimation of survival probabilities using discrete-time logistic hazard models. The estimation of profit margins is based on linear regression. When outliers are present among the margins, we suggest applying robust regression with PROC ROBUSTREG.
Vadim Pliner, Verizon Wireless
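A minimal sketch of the robust-regression piece, assuming a hypothetical MARGINS data set of per-customer profit margins and covariates; METHOD=M requests M estimation, which downweights outlying margins.

proc robustreg data=margins method=m;
   model margin = tenure monthly_fee / diagnostics;
run;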
The DATA step and DS2 both offer the user a built-in general purpose hash object that has become the go-to tool for many data analysis problems. However, there are occasions where the best solution would require a custom object specifically tailored to the problem space. The DS2 Package syntax allows the user to create custom objects that can form linked structures in memory. With DS2 Packages it is possible to create lists or tree structures that are custom tailored to the problem space. For data that can describe a great many more states than actually exist, dynamic structures can provide an extremely compact way to manipulate and analyze the data. The SAS® In-Database Code Accelerator allows these custom packages to be deployed in parallel on massive data grids.
Robert Ray, SAS
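To give a flavor of the syntax (a toy sketch, not the paper's linked-structure packages), here is a trivial custom DS2 package with a constructor, a mutator, and an accessor, followed by a data program that uses an instance of it.

proc ds2;
   package counter / overwrite=yes;
      dcl double count;
      method counter();              /* constructor */
         count = 0;
      end;
      method add(double x);
         count = count + x;
      end;
      method total() returns double;
         return count;
      end;
   endpackage;

   data _null_;
      method init();
         dcl package counter c();    /* instantiate the package */
         dcl double t;
         c.add(2);
         c.add(3);
         t = c.total();
         put t=;
      end;
   enddata;
run;
quit;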
Transforming data into intelligence for effective decision-making support depends critically on the role and capacity of the Office of Institutional Research (OIR) in managing the institution's data. The presenters share their journey from providing spreadsheet data to developing SAS® programs and dashboards using SAS® Visual Analytics. They demonstrate two dashboards the OIR office developed: one for classroom utilization and one for the university's diversity initiatives. They describe the steps taken to create the dashboards, the process the office used to get stakeholders involved in determining the key performance indicators (KPIs) and in evaluating and providing feedback on the dashboards, and the experience gained and lessons learned along the way.
Shweta Doshi, University of Georgia
Julie Davis, University of Georgia
This presentation describes a technique for dealing with the precision problems inherent in datetime values containing nanosecond data. Floating-point values cannot store sufficient precision for this, and this limitation can be a problem for transactional data where nanoseconds are pertinent. Methods discussed include separation of variables and using the special GROUPFORMAT feature of the BY statement with MERGE in the DATA step.
Rick Langston, SAS
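A hedged sketch of the approach the abstract describes, with hypothetical TRADES and QUOTES data sets that carry a datetime-valued DT plus a separate numeric variable for the nanoseconds: grouping on the formatted (seconds-precision) value lets the merge align records whose fractional seconds would otherwise defeat exact matching.

data matched;
   merge trades(in=a) quotes(in=b);
   by groupformat dt;          /* group on the formatted value */
   format dt datetime20.;      /* displays to the second       */
   if a and b;
run;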
Do you know how many different ways SAS® Studio can run your programs with SAS® Grid Manager? SAS Studio is the latest and coolest interface to SAS® software. As such, we want to use it in most situations, including sites that leverage SAS Grid Manager. Are you new to SAS and want to be guided by a modern GUI? SAS Studio is here to help you. Are you a code fanatic who wants total control of how your program runs so that you can harness the full power of SAS Grid Manager? Sure, SAS Studio is for you, too. This paper covers all the different SAS Studio editions. You can learn how to connect each of them to SAS Grid Manager and discover best practices for harnessing a high-performance SAS analytics environment, while avoiding potential pitfalls.
Edoardo Riva, SAS
Metadata is an integral and critical part of any environment. Metadata facilitates resource discovery and provides unique identification of every single digital component of a system, from simple to complex. SAS® Visual Analytics, one of the most powerful analytics visualization platforms, leverages the power of metadata to provide a plethora of functionalities for all types of users. The possibilities range from real-time advanced analytics and power-user reporting to advanced deployment features for a robust and scalable distributed platform serving internal and external users. This paper explains best practices and advanced approaches for designing and managing metadata for a distributed global SAS Visual Analytics environment. Designing and building the architecture of such an environment requires attention to important factors like user groups and roles, access management, data protection, data volume control, and performance requirements. This paper covers how to build a sustainable and scalable metadata architecture through a top-down hierarchical approach. It helps SAS Visual Analytics data administrators improve the platform benchmark through memory mapping, perform administrative data loads (AUTOLOAD, Unload, Reload-on-Start, and so on), monitor artifacts of distributed SAS® LASR™ Analytic Servers on co-located Hadoop Distributed File System (HDFS), optimize high-volume access via FullCopies, build customized FLEX themes, and more. It showcases practical approaches to managing distributed SAS LASR Analytic Servers, offering guest access for global users, managing host accounts, enabling Mobile BI, using power-user reporting features, customizing formats, enabling home page customization, and using best practices for environment migration.
Ratul Saha, Kavi Associates
Vimal Raj Arockiasamy, Kavi Associates
Vignesh Balasubramanian, Kavi Global
When you create reports in SAS® Visual Analytics, you automatically have reports that work on mobile devices. How do you ensure that the reports are easy to use and understand on all of your desktops, tablets, and phones? This paper describes how you can design powerful reports that your users can easily view on all their devices. You also learn how to deliver reports to users effortlessly, ensuring that they always have the latest reports. Examples show you tips and techniques to use that create the best possible reports for all devices. The paper provides sample reports that you can download and interactively view on your own devices. These reports include before and after examples that illustrate why the recommended best practices are important. By using these tips and techniques you learn how to design a report once and have confidence that it can be viewed anywhere.
Karen Mobley, SAS
Rich Hogan, SAS
Pratik Phadke, SAS
This paper covers developing SAS® Studio repositories. SAS Studio introduced a new way of developing custom tasks using an XML markup specification and the Apache Velocity templating language. SAS Studio repositories build on this framework and provide a flexible way to package custom tasks and snippets. After tasks and snippets are packaged into a repository, they can be shared with users inside your organization or outside your organization. This paper uses several examples to help you create your first repository.
Swapnil Ghan, SAS
Michael Monaco, SAS
Amy Peters, SAS
As SAS® programmers, we often develop listings, graphs, and reports that need to be delivered frequently to our customers. We might decide to manually run the program every time we get a request, or we might easily schedule an automatic task to send a report at a specific date and time. Both scenarios have disadvantages. If the report is manual, we have to find and run the program every time someone requests an updated version of the output. That takes time, and it is not the most interesting part of the job. If we schedule an automatic task in Windows, we still sometimes get an email from the customers because they need the report immediately. That means that we have to find and run the program for them. This paper explains how we developed an on-demand report platform using SAS® Enterprise Guide®, SAS® Web Application Server, and stored processes. We had developed many reports for different customer groups, and we were getting more and more emails from them asking for updated versions of their reports. We felt we were not using our time wisely and decided to create an infrastructure where users could easily run their programs through a web interface. The tool that we created enables SAS programmers to easily release on-demand web reports with minimal programming. It has web interfaces developed using stored processes for the administrative tasks, and it also automatically customizes the front end based on the user who connects to the website. One of the challenges of the project was that certain reports had to be available to a specific group of users only.
Romain Miralles, Genomic Health
Today's employment and business marketplace is highly competitive. As a result, it is necessary for SAS® professionals to differentiate themselves from the competition. Success depends on a number of factors, including positioning yourself with the necessary technical skills in relation to the competition. This presentation illustrates how SAS professionals can acquire a wealth of knowledge and enhance their skills by accessing valuable and free web content related to SAS. With the aid of a web browser and the Internet, anyone can access published PDF papers, Microsoft Word documents, Microsoft PowerPoint presentations, comprehensive student notes, instructor lesson plans, hands-on exercises, webinars, audios, videos, a comprehensive technical support website maintained by SAS, and more to acquire the essential expertise that is needed to cut through the marketplace noise and begin differentiating themselves to secure desirable opportunities with employers and clients.
Kirk Paul Lafler, Software Intelligence Corporation
The new SAS® High-Performance Statistics procedures were developed to respond to the growth of big data and computing capabilities. Although there is some documentation regarding how to use these new high-performance (HP) procedures, relatively little has been disseminated regarding the specific conditions under which users can expect performance improvements. This paper serves as a practical guide to getting started with HP procedures in SAS®. The paper describes the differences between key HP procedures (HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPREG, HPCORR, HPIMPUTE, and HPSUMMARY) and their legacy counterparts, both in terms of capability and performance, with a particular focus on discrepancies in the real time required to execute. Simulations were conducted to generate data sets that varied in the number of observations (10,000; 50,000; 100,000; 500,000; 1,000,000; and 10,000,000) and the number of variables (50, 100, 500, and 1,000) to create these comparisons.
Diep Nguyen, University of South Florida
Sean Joo, University of South Florida
Anh Kellermann, University of South Florida
Jeff Kromrey, University of South Florida
Jessica Montgomery, University of South Florida
Patricia Rodríguez de Gil, University of South Florida
Yan Wang, University of South Florida
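The pattern of the comparisons is simple to reproduce; a sketch with a hypothetical simulated table SIM10M and predictors X1-X50 follows, where the PERFORMANCE statement (HP procedures only) surfaces threading details in the log.

proc logistic data=sim10m;            /* legacy procedure */
   model y(event='1') = x1-x50;
run;

proc hplogistic data=sim10m;          /* HP counterpart   */
   model y(event='1') = x1-x50;
   performance details;
run;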
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how your programs can automatically update your processing log. If you have ever wondered who ran a program that overwrote your data, SAS has the answer! And if you don't want to trace back through a year's worth of code to produce a codebook for your client at the end of a contract, SAS has the answer for that, too!
Roberta Glass, Abt Associates
Louise Hadden, Abt Associates
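One flavor of this automation, sketched with an assumed libref MYLIB: DICTIONARY tables supply a codebook skeleton, and automatic macro variables record who ran what, when, and under which SAS release.

proc sql;
   create table codebook as
   select memname, name, type, length, format, label
   from dictionary.columns
   where libname = 'MYLIB'
   order by memname, name;
quit;

%put NOTE: Run by &sysuserid on &sysdate9 at &systime using &sysvlong.;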
In any organization where people work with data, it is extremely unlikely that there will be only one way of doing things. Things like divisional silos and differences in education, training, and subject matter often result in a diversity of tools and levels of expertise. One situation that frequently arises is that of 'code versus click.' Let's call it the difference between code-based, 'power user' data tools and simpler, purely graphic point-and-click tools such as Microsoft Excel. Even though the work itself might be quite similar, differences in analysis tools often mean differences in vocabulary and experience, and it can be difficult to convert users of one tool to another. This discussion will highlight the potential challenges of SAS® adoption in an Excel-based workplace and propose strategies to gain new SAS advocates in your organization.
Andrew Clapson, MD Financial Management
Dynamic interactive visual displays known as dashboards are most effective when they show essential graphs, tables, statistics, and other information where data is the star. The first rule for creating an effective SAS® dashboard is to keep it simple. Striking a balance between content and style, a dashboard should be void of clutter so as not to distract from or obscure the information displayed. The second rule of effective dashboard design is to display data that meets one or more business or organizational objectives. To accomplish this, the elements in a dashboard should be presented in a format easily understood by its intended audience. Attendees learn how to create dynamic, interactive, user- and data-driven dashboards, graphical and table-driven dashboards, statistical dashboards, and drill-down dashboards with a purpose by using Base SAS® programming techniques including the DATA step, PROC FORMAT, PROC PRINT, PROC MEANS, PROC SQL, ODS, statistical graphics, and HTML.
Kirk Paul Lafler, Software Intelligence Corporation
Many organizations need to report forecasts of large numbers of time series at various levels of aggregation. Numerous model-based forecasts that are statistically generated at the lowest level of aggregation need to be combined to form an aggregate forecast that is not required to follow a fixed hierarchy. The forecasts need to be dynamically aggregated according to any subset of the time series, such as from a query. This paper proposes a technique for large-scale automatic forecast aggregation and uses SAS® Forecast Server and SAS/ETS® software to demonstrate this technique.
Michael Leonard, SAS
For any SAS® Platform Administrator, the challenge is to ensure the right jobs run for the right users with the right resources at the right time. Running on a grid-enabled SAS platform greatly assists this task, allowing prioritization of processes to ensure an optimal mix of critical batch processing and interactive user sessions. A key feature of SAS® Grid Computing 9.4 is grid options sets, which make it even easier to manage options, resources, and queues. Grid options sets allow different user groups to run SAS applications on a common shared SAS Application Server Context (such as SASApp), but the applications are still tailored to suit the requirements of each application and user group. Offering much more than just queue selection, grid options sets in SAS Grid Computing 9.4 now allow SAS Platform Administrators to effectively have 'conditional queues,' 'conditional configuration options,' and 'conditional resource requirements,' all centrally located and managed within a single SAS Application Server Context. This paper examines the benefits of SAS Grid Computing processing, looks at some of the issues encountered in previous versions of SAS Grid Computing, and explains how these issues are managed more effectively on a SAS 9.4 platform, thanks to SAS grid options sets.
Andrew Howell, ANJ Solutions P/L
Whether you have been programming in SAS® for years, are new to it, or have dabbled with SAS® Enterprise Guide® before, this hands-on workshop sheds some light on the depth, breadth, and power of the Enterprise Guide environment. With all the demands on your time, you need powerful tools that are easy to learn and deliver end-to-end support for your data exploration, reporting, and analytics needs. Topics include: data exploration tools; formatting code (cleaning up after your coworkers); the enhanced programming environment (and how to calm it down); easily creating reports and graphics; producing the output formats you need (XLS, PDF, RTF, HTML); workspace layout; start-up processing; and notes to help your coworkers use your processes. This workshop uses SAS Enterprise Guide 7.1, but most of the content is applicable to earlier versions.
Marje Fecht, Prowerk Consulting
Data-driven decision making is critical for any organization that wants to thrive in this fiercely competitive world. The decision-making process has to be accurate and fast in order to stay a step ahead of the competition. One major problem organizations face is long data load times when loading or processing data. Reducing data loading time can help organizations perform faster analysis and thereby respond more quickly. In this paper, we compare the methods that can import data of a particular file type in the shortest possible time and thereby increase the efficiency of decision making. SAS® takes input from various file types (such as XLS, CSV, XLSX, ACCESS, and TXT) and converts that input into SAS data sets. To perform this task, SAS provides multiple solutions (such as the IMPORT procedure, the INFILE statement, and the LIBNAME engine) to import the data. We observed the processing times taken by each method for different file types with a data set containing 65,535 observations and 11 variables. We executed each procedure multiple times to check for variation in processing time, and we recorded the minimum processing time for each combination of method and file type. From our analysis, we observed that the shortest processing times for CSV and TXT files, XLS and XLSX files, and ACCESS files are achieved by the INFILE statement, the LIBNAME engine, and PROC IMPORT, respectively.
Divya Dadi, Oklahoma State University
Rahul Jhaver, Oklahoma State University
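For reference, hedged sketches of the three routes compared (paths, sheet, and table names are hypothetical):

/* INFILE statement: fastest for CSV and TXT in the authors' tests */
data work.from_csv;
   infile 'C:\data\sample.csv' dsd firstobs=2 truncover;
   input id amount date :mmddyy10.;
run;

/* LIBNAME engine: fastest for XLS and XLSX in the authors' tests */
libname xl xlsx 'C:\data\sample.xlsx';
data work.from_xlsx;
   set xl.'Sheet1'n;
run;
libname xl clear;

/* PROC IMPORT: fastest for Microsoft Access in the authors' tests */
proc import out=work.from_access datatable='mytable'
     dbms=access replace;
   database='C:\data\sample.accdb';
run;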
This session illustrates how to quickly create rates over a specified period of time, using the MEANS and EXPAND procedures. For example, do you want to know how to use the power of SAS® to create a year-to-date, rolling 12-month, or monthly rate? At Kaiser Permanente, we use this technique to develop Emergency Department (ED) use rates, ED admit rates, patient day rates, readmission rates, and more. A powerful function of PROC MEANS, given a database table with several dimensions and one or more facts, is to perform a mathematical calculation on fact columns across several different combinations of dimensions. For example, if a membership table exists with the dimensions member ID, year-month, line of business, medical office building, and age grouping, PROC MEANS can easily determine and output the count of members by every possible dimension combination into a SAS data set. Likewise, if a hospital visit table exists with the same dimensions and facts, PROC MEANS can output the number of hospital visits by the same dimension combinations into a second SAS data set. With the power of PROC EXPAND, each of these data sets, once sorted properly, can have columns added that calculate total members and total hospital visits by a time dimension of the analyst's choice. Common time dimensions used for Kaiser Permanente's utilization rates are monthly, rolling 12-month, and year-to-date. The resulting membership and hospital visit data sets can be joined with a MERGE statement, and simple division produces a rate for the given dimensions.
Thomas Gant, Kaiser Permanente
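A condensed sketch of the two-step pattern with hypothetical names: PROC MEANS rolls the facts up to the month level, and PROC EXPAND appends a rolling 12-month total without interpolating.

proc means data=hospital_visits noprint nway;
   class lob yearmonth;
   var visit;
   output out=visits_by_month(drop=_type_ _freq_) sum=visits;
run;

proc sort data=visits_by_month;
   by lob yearmonth;
run;

proc expand data=visits_by_month out=visits_rolling method=none;
   by lob;
   id yearmonth;
   convert visits = visits_12m / transformout=(movsum 12);
run;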
You have SAS® Enterprise Guide® installed. You use SAS Enterprise Guide in your day-to-day work. You see how Enterprise Guide can be an aid to accessing data and insightful analytics. You have people you work with or support who are new to SAS® and want to learn. You have people you work with or support who don't particularly want to code but use the GUI and wizard within Enterprise Guide. And then you have the spreadsheet addict, the person or group who refuse to even sign on to SAS. These people need to consume the data sitting in SAS, and they need to do analysis, but they want to do it all in a spreadsheet. But you need to retain an audit trail of the data, and you have to reduce the operational risk of using spreadsheets for reporting. What do you do? This paper shares some of the challenges and triumphs in empowering these very different groups of people using SAS.
Anita Measey, Bank of Montreal
What will your customer do next? Customers behave differently; they are not all average. Segmenting your customers into different groups enables you to provide different communications and interactions for the different segments, resulting in greater customer satisfaction as well as increased profits. Using SAS® Visual Analytics and SAS® Visual Statistics to visualize your segments with respect to customer attributes enables you to create more useful segments for customer relationship management and to understand the value and relative importance of different customer attributes. You can segment your customers by using the following methods: 1) business rules; 2) supervised clustering--decision trees and so on; 3) unsupervised clustering; 4) creating segments based on quantile membership. Whatever way you choose, SAS Visual Analytics enables you to graphically represent your customer data with respect to demographic, geographic, and customer behavioral dimensions. This paper covers the four segmentation techniques and demonstrates how SAS Visual Analytics and SAS Visual Statistics can be used for easy and comprehensive understanding of your customers.
Darius Baer, SAS
Suneel Grover, SAS
Ensemble models are a popular class of methods for combining the posterior probabilities of two or more predictive models in order to create a potentially more accurate model. This paper summarizes the theoretical background of recent ensemble techniques and presents examples of real-world applications. Examples of these novel ensemble techniques include weighted combinations (such as stacking or blending) of predicted probabilities in addition to averaging or voting approaches that combine the posterior probabilities by adding one model at a time. Fit statistics across several data sets are compared to highlight the advantages and disadvantages of each method, and process flow diagrams that can be used as ensemble templates for SAS® Enterprise Miner™ are presented.
Wendy Czika, SAS
Ye Liu, SAS Institute
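The simplest technique the paper names, averaging posterior probabilities, reduces to a few DATA step lines; SCORED_MODEL1 and SCORED_MODEL2 are hypothetical scored tables keyed by ID.

data ensemble;
   merge scored_model1(rename=(p_target=p_model1))
         scored_model2(rename=(p_target=p_model2));
   by id;
   p_ensemble = mean(p_model1, p_model2);   /* averaging ensemble */
   predicted  = (p_ensemble >= 0.5);
run;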
Now that you have deployed SAS®, what should you do to ensure it continues to meet your SAS users' performance expectations? This paper discusses how to proactively monitor your SAS infrastructure, with tools that should be used on a regular basis to keep tabs on infrastructure performance and housekeeping. Basic SAS administration concepts are discussed, with references for blogs, communities, and the new visual learning sites.
Margaret Crevar, SAS
As Data Management professionals, you have to comply with new regulations and controls. One such regulation is Basel Committee on Banking Supervision (BCBS) 239. To respond to these new demands, you have to put processes and methods in place to automate metadata collection and analysis, and to provide rigorous documentation around your data flows. You also have to deal with many aspects of data management including data access, data manipulation (ETL and other), data quality, data usage, and data consumption, often from a variety of toolsets that are not necessarily from a single vendor. This paper shows you how to use SAS® technologies to support data governance requirements, including third party metadata collection and data monitoring. It highlights best practices such as implementing a business glossary and establishing controls for monitoring data. Attend this session to become familiar with the SAS tools used to meet the new requirements and to implement a more managed environment.
Jeff Stander, SAS
With the big data throughputs generated by event streams, organizations can opportunistically respond with low-latency effectiveness. Having the ability to permeate identified patterns of interest throughout the enterprise requires deep integration between event stream processing and foundational enterprise data management applications. This paper describes the innovative ability to consolidate real-time data ingestion with controlled and disciplined universal data access from SAS® and Teradata.
Tho Nguyen, Teradata
Fiona McNeill, SAS
SAS® In-Memory Statistics uses a powerful interactive programming interface for analytics, aimed squarely at the data scientist. We show how the custom tasks that you can create in SAS® Studio (a web-based programming interface) can make everyone a data scientist! We explain the Common Task Model of SAS Studio, and we build a simple task in steps that carries out the basic functionality of the IMSTAT procedure. This task can then be shared amongst all users, empowering everyone on their journey to becoming a data scientist. During the presentation, it will become clear that not only can shareable tasks be created but the developer does not have to understand coding in Java, JavaScript, or ActionScript. We also use the task we created in the Visual Programming perspective in SAS Studio.
Stephen Ludlow, SAS
Every day, we are bombarded by pundits pushing big data as the cure for all research woes and heralding the death of traditional quantitative surveys. We are told how big data, social media, and text analytics will make the Likert scale go the way of the dinosaur. This presentation makes the case for evolving our surveys and data sets to embrace new technologies and modes while still welcoming new advances in big data and social listening. Examples from the global automotive industry are discussed to demonstrate the pros and cons of different types of data in an automotive environment.
Will Neafsey, Ford Motor Company
Does the rapidly changing fraud and compliance landscape make it difficult to adapt your applications to meet your current needs? Fraudsters are constantly evolving and your applications need to be able to keep up. No two businesses are ever the same. They all have different business drivers, customer needs, market demands, and strategic objectives. Using SAS® Visual Investigator, business units can quickly adapt to their ever-changing needs and deliver what end users and customers need to be effective investigators. Using the administrative tools provided, users can quickly and easily adapt the pages and data structures that underpin applications to fit their needs. This presentation walks through the process of updating an insurance fraud detection system using SAS Visual Investigator, including changing the way alerts are routed to users, pulling in additional information, and building out application pages. It includes an example of how an end user can customize a solution due to changing fraud threats.
Gordon Robinson, SAS
SAS® Embedded Process offers a flexible, efficient way to leverage increasing amounts of data by injecting the processing power of SAS® directly where the data lives. SAS Embedded Process can tap into the massively parallel processing (MPP) architecture of Hadoop for scalable performance. Using SAS® In-Database Technologies for Hadoop, you can run scoring models generated by SAS® Enterprise Miner™ or, with SAS® In-Database Code Accelerator for Hadoop, user-written DS2 programs in parallel. With SAS Embedded Process on Hadoop you can also perform data quality operations, and extract and transform data using SAS® Data Loader. This paper explores key SAS technologies that run inside the Hadoop parallel processing framework and prepares you to get started with them.
David Ghazaleh, SAS
The Armed Conflict Location & Event Data Project (ACLED) is a comprehensive public collection of conflict data for developing states. As such, the data is instrumental in informing humanitarian and development work in crisis and conflict-affected regions. The ACLED project currently manually codes for eight types of events from a summarized event description, which is a time-consuming process. In addition, when determining the root causes of the conflict across developing states, these fixed event types are limiting. How can researchers get to a deeper level of analysis? This paper showcases a repeatable combination of exploratory and classification-based text analytics provided by SAS® Contextual Analysis, applied to the publicly available ACLED for African states. We depict the steps necessary to determine (using semi-automated processes) the next deeper level of themes associated with each of the coded event types; for example, while there is a broad protests and riots event type, how many of those are associated individually with rallies, demonstrations, marches, strikes, and gatherings? We also explore how to use predictive models to highlight the areas that require aid next. We prepare the data for use in SAS® Visual Analytics and SAS® Visual Statistics using SAS® DS2 code and enable dynamic explorations of the new event sub-types. This ultimately provides the end-user analyst with additional layers of data that can be used to detect trends and deploy humanitarian aid where limited resources are needed the most.
Tom Sabo, SAS
Recent years have seen the birth of a powerful tool for companies and scientists: the Google Ngram data set, built from millions of digitized books. It can be, and has been, used to learn about past and present trends in the use of words over the years. This is an invaluable asset from a business perspective, mostly because of its potential application in marketing. The choice of words has a major impact on the success of a marketing campaign and an analysis of the Google Ngram data set can validate or even suggest the choice of certain words. It can also be used to predict the next buzzwords in order to improve marketing on social media or to help measure the success of previous campaigns. The Google Ngram data set is a gift for scientists and companies, but it has to be used with a lot of care. False conclusions can easily be drawn from straightforward analysis of the data. It contains only a limited number of variables, which makes it difficult to extract valuable information from it. Through a detailed example, this paper shows that it is essential to account for the disparity in the genre of the books used to construct the data set. This paper argues that for the years after 1950, the data set has been constructed using a much higher proportion of scientific books than for the years before. An ingenious method is developed to approximate, for each year, this unknown proportion of books coming from the scientific literature. A statistical model accounting for that change in proportion is then presented. This model is used to analyze the trend in the use of common words of the scientific literature in the 20th century. Results suggest that a naive analysis of the trends in the data can be misleading.
Aurélien Nicosia, Université Laval
Thierry Duchesne, Universite Laval
Samuel Perreault, Université Laval
Do you create complex reports using PROC REPORT? Are you confused by the COMPUTE block feature of PROC REPORT? Are you even aware of it? Maybe you already produce reports using PROC REPORT, but suddenly your boss needs you to modify some of the values in one or more of the columns. Maybe your boss needs to see the values of some rows in boldface and others highlighted in a stylish yellow. Perhaps one of the columns in the report needs to display a variety of fashionable formats (some with varying decimal places and some without any decimals). Maybe the customer needs to see a footnote in specific cells of the report. Well, if this sounds familiar, then come take a look at the COMPUTE block of PROC REPORT. This paper shows a few tips and tricks for using the COMPUTE block with conditional IF/THEN logic to make your reports stylish and fashionable. The COMPUTE block allows you to use DATA step code within PROC REPORT to provide customization and style to your reports. We'll see how the Census Bureau produces a stylish demographic profile for customers of its Special Census program using PROC REPORT with the COMPUTE block. The paper focuses on how to use the COMPUTE block to create this stylish Special Census profile. It shows quick tips and simple code to handle multiple formats within the same column, make the values in the Total rows boldface, apply traffic-lighting, and add footnotes to any cell based on the column or row. The Special Census profile report is an Excel table created with ODS tagsets.ExcelXP that is stylish and fashionable, thanks in part to the COMPUTE BLOCK.
Chris Boniface, Census Bureau
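A small hedged sketch of two of the compute-block tricks described, using SASHELP.CARS rather than Census data: CALL DEFINE traffic-lights a cell and boldfaces the summary row.

proc report data=sashelp.cars nowd;
   column make msrp mpg_city;
   define make     / group;
   define msrp     / analysis mean format=dollar10.;
   define mpg_city / analysis mean;

   compute mpg_city;
      if _c3_ < 18 then                 /* third report column */
         call define(_col_, 'style', 'style={background=yellow}');
   endcomp;

   rbreak after / summarize;
   compute after;                       /* boldface the total row */
      call define(_row_, 'style', 'style={font_weight=bold}');
   endcomp;
run;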
Road safety is a major concern for all United States citizens. According to the National Highway Traffic Safety Administration, 30,000 deaths are caused by automobile accidents annually. Fatalities often occur due to a number of factors such as driver carelessness, speed of operation, impairment due to alcohol or drugs, and road environment. Some studies suggest that car crashes are solely due to driver factors, while other studies suggest they are due to a combination of roadway and driver factors. However, factors not mentioned in previous studies may also be contributing to automobile accident fatalities. The objective of this project was to identify the significant factors that lead to multiple fatalities in the event of a car crash.
Bill Bentley, Value-Train
Gina Colaianni, Kennesaw State University
Cheryl Joneckis, Kennesaw State University
Sherry Ni, Kennesaw State University
Kennedy Onzere, Kennesaw State University
With millions of users and peak traffic of thousands of requests a second for complex user-specific data, fantasy football offers many data design challenges. Not only is there a high volume of data transfers, but the data is also dynamic and of diverse types. We need to process data originating on the stadium playing field and user devices and make it available to a variety of different services. The system must be nimble and must produce accurate and timely responses. This talk discusses the strategies employed by and lessons learned from one of the primary architects of the National Football League's fantasy football system. We explore general data design considerations with specific examples of high availability, data integrity, system performance, and some other random buzzwords. We review some of the common pitfalls facing large-scale databases and the systems using them. And we cover some of the tips and best practices to take your data-driven applications from fantasy to reality.
Clint Carpenter, Carpenter Programming
SAS® for Windows is extremely powerful software, not only for analyzing data, but also for organizing and maintaining output and permanent data sets. By using pipes and operating system (x) commands within a SAS session, you can easily and effectively manage files of all types stored on your local network.
Emily Sisson, Boston University School of Public Health Data Coordinating Center
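Two hedged one-liners of the kind the paper builds on (Windows paths hypothetical): a pipe turns a directory listing into a data set, and the X statement issues a command directly.

filename dirlist pipe 'dir /b "C:\projects\output"';

data files;
   infile dirlist truncover;
   input filename $256.;
run;

x 'copy "C:\projects\output\report.pdf" "Z:\archive\"';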
Logistic regression models are commonly used in direct marketing and consumer finance applications. In this context, the paper discusses two topics related to the fitting and evaluation of logistic regression models. Topic 1 is a comparison of two methods for finding multiple candidate models. The first method is the familiar best-subsets approach. Best subsets is then compared to a proposed new method that combines the models produced by backward and forward selection with the predictors considered by backward and forward selection. This second method uses the HPLOGISTIC procedure with model selection by the Schwarz Bayes criterion (SBC). Topic 2 is a discussion of model evaluation statistics that measure predictive accuracy and goodness of fit in support of the choice of a final model. Base SAS® and SAS/STAT® software are used.
Bruce Lund, Magnify Analytic Solutions, Division of Marketing Associates
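A sketch of the HPLOGISTIC piece of the second method, with a hypothetical MODELING data set; SELECT= and CHOOSE= tie both the stepwise decisions and the final choice to SBC.

proc hplogistic data=modeling;
   model response(event='1') = x1-x25;
   selection method=backward(select=sbc choose=sbc);
run;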
In this era of data analytics, you are often faced with a challenge of joining data from multiple legacy systems. When the data systems share a consistent merge key, such as ID or SSN, the solution is straightforward. However, what do you do when there is no common merge key? If one data system has a character value ID field, another has an alphanumeric field, and the only common fields are the names or addresses or dates of birth, a standard merge query does not work. This paper demonstrates fuzzy matching methods that can overcome this obstacle and build your master record through Base SAS® coding. The paper also describes how to leverage the SAS® Data Quality Server in SAS® code.
Elena Shtern, SAS
Kim Hare, SAS
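As a hedged sketch of the SAS Data Quality Server flavor (the match definition, sensitivity, and variable names are assumptions, and a QKB locale must already be loaded, for example with the %DQLOAD autocall macro), matchcodes group records whose names are fuzzily equivalent:

data matchcodes;
   set customers;
   length mc $30;
   mc = dqMatch(cust_name, 'Name', 85, 'ENUSA');
run;

proc sort data=matchcodes;   /* rows sharing MC are candidate matches */
   by mc;
run;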
The popular MIXED, GLIMMIX, and NLMIXED procedures in SAS/STAT® software fit linear, generalized linear, and nonlinear mixed models, respectively. These procedures take the classical approach of maximizing the likelihood function to estimate model parameters. The flexible MCMC procedure in SAS/STAT can fit these same models by taking a Bayesian approach. Instead of maximizing the likelihood function, PROC MCMC draws samples (using a variety of sampling algorithms) to approximate the posterior distributions of model parameters. Like the mixed modeling procedures, PROC MCMC provides estimation, inference, and prediction. This paper describes how to use the MCMC procedure to fit Bayesian mixed models and compares the Bayesian approach to how the classical models would be fit with the familiar mixed modeling procedures. Several examples illustrate the approach in practice.
Gordon Brown, SAS
Fang K. Chen, SAS
Maura Stokes, SAS
Analytics is now pervasive in our society, informing millions of decisions every year, and even making decisions without human intervention in automated systems. The implicit assumption is that all completed analytical projects are accurate and successful, but the reality is that some are not. Bad decisions that result from faulty analytics can be costly, embarrassing, and even dangerous. By understanding the root causes of analytical failures, we can minimize the chances of repeating the mistakes of the past.
SAS® Forecast Server provides an excellent tool for forecasting time series across a variety of scenarios and industries. Given sufficient data, SAS Forecast Server can provide insights into seasonality, trend, and the effects of input variables and events. In some business cases, there might not be sufficient data within individual series to produce these complex models. This paper presents an approach to forecasting such short time series. Using sufficiently long time series as a training data set, forecasted shape and volumes are created through clustering and attribute-based regressions, allowing new series to have forecasts based on their characteristics. These forecasts are passed into SAS Forecast Server through preprocessing. Production of artificial history and input variables, matched with a custom model added to the standard model repository, allows future values of the artificial variable to be realized as the forecast. This process not only relieves the need for manual entry of overrides, but also allows SAS Forecast Server to choose alternative models as additional observations are provided through incremental loads.
Cobey Abramowski, SAS
Christian Haxholdt, SAS Institute
Chris Houck, SAS Institute
Benjamin Perryman, SAS Institute
The importance of understanding terrorism in the United States assumed heightened prominence in the wake of the coordinated attacks of September 11, 2001. Yet, surprisingly little is known about the frequency of attacks that might happen in the future and the factors that lead to a successful terrorist attack. This research is aimed at forecasting the frequency of attacks per annum in the United States and determining the factors that contribute to an attack's success. Using the data acquired from the Global Terrorism Database (GTD), we tested our hypothesis by examining the frequency of attacks per annum in the United States from 1972 to 2014 using SAS® Enterprise Miner™ and SAS® Forecast Studio. The data set has 2,683 observations. Our forecasting model predicts that there could be as many as 16 attacks every year for the next 4 years. From our study of factors that contribute to the success of a terrorist attack, we discovered that attack type, weapons used in the attack, place of attack, and type of target play pivotal roles in determining the success of a terrorist attack. Results reveal that the government might be successful in averting assassination attempts but might fail to prevent armed assaults and facilities or infrastructure attacks. Therefore, additional security might be required for important facilities in order to prevent further attacks from being successful. Results further reveal that it is possible to reduce the forecasted number of attacks by raising defense spending and by putting an end to the raging war in the Middle East. We are currently working on gathering data to do in-depth analysis at the monthly and quarterly level.
SriHarsha Ramineni, Oklahoma State University
Koteswara Rao Sriram, Oklahoma State University
Collection of customer data is often done in status tables or snapshots, where, for example, for each month, the values for a handful of variables are recorded in a new status table whose name is marked with the value of the month. In this QuickTip, we present how to construct a table of last occurrence times for customers using a DATA step merge of such status tables and the colon (':') wildcard. If the status tables are sorted, this can be accomplished in four lines of code (where RUN; is the fourth). Also, we look at how to construct delta tables (for example, from one time period to another, or which customers have arrived or left) using a similar method of merge and colons.
Jingyu She, Danica Pension
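The four-line pattern reads as follows, assuming sorted monthly tables named STATUS_201501, STATUS_201502, and so on; because later contributors overwrite earlier ones in a match-merge, each customer keeps the values from the last table in which they appear.

data last_status;
   merge status_: ;
   by customer_id;
run;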
In the fast-changing world of data management, new technologies to handle increasing volumes of (big) data are emerging every day. Many companies are struggling to deal with these technologies and, more importantly, to integrate them into their existing data management processes. Moreover, they want to rely on the knowledge their teams have built in existing products, so implementing change in order to learn new technologies can be a costly procedure. SAS® is a perfect fit in this situation, offering a suite of software that can be set up to work with any third-party database through the corresponding SAS/ACCESS® interface. Indeed, for new database technologies, SAS releases a specific SAS/ACCESS interface, allowing users to develop and migrate SAS solutions almost transparently. Your users need to know only a few techniques in order to combine the power of SAS with a third-party (big) database. This paper helps companies integrate emerging technologies into their current SAS® Data Management platform using SAS/ACCESS. More specifically, the paper focuses on best practices for success in such an implementation. Integrating SAS Data Management with the IBM PureData for Analytics appliance is provided as an example.
Magali Thesias, Deloitte
When attempting to match names and addresses from different files, we often run into a situation where the names are similar, but not exactly the same. Sometimes there are additional words in the names, sometimes there are different spellings, and sometimes the businesses have the same name but are located thousands of miles apart. The files that contain the names might have numeric keys that cannot be matched. Therefore, we need to use a process called fuzzy matching to match the names from different files. The SAS® function COMPGED, combined with SAS character-handling functions, provides a straightforward method of applying business rules and testing for similarity.
Stephen Sloan, Accenture
Daniel Hoicowitz, Accenture
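A hedged sketch of the core pattern (file, variable, and threshold values are assumptions): standardize the strings, generate candidate pairs, and keep those whose generalized edit distance falls under a cutoff.

proc sql;
   create table possible_matches as
   select a.name as name_a,
          b.name as name_b,
          compged(upcase(compbl(a.name)),
                  upcase(compbl(b.name))) as dist
   from file_a as a, file_b as b
   where calculated dist < 200;
quit;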
SAS® Customer Intelligence 360 is the new cloud-based customer data gathering application that uses the Amazon Web Services Cloud as a hosting platform. Learn how you can instrument your mobile application to gather usage data from your users as well as provide targeted application content based on either customer behavior or marketing campaigns.
Jim Adams, SAS
Color is an important aspect of data visualization and provides an analyst with another tool for identifying data trends. But is the default option the best for every case? Default color scales may be familiar to us; however, they can have inherent flaws that skew our perception of the data. The impact of data visualizations can be considerably improved with just a little thought toward choosing the correct colors. Selecting an appropriate color map can be difficult, and you might decide that it is easier to generate your own custom color scale. After a brief introduction to the red, green, and blue (RGB) color space, we discuss the strengths and weaknesses of some widely used color scales, as well as what to keep in mind when designing your own. A simple technique is presented, detailing how you can use SAS® to generate a number of color scales and apply them to your data. Using just a few graphics procedures, you can transform almost any complex data into an easily digestible set of visuals. The techniques used in this discussion were developed and tested using SAS® Enterprise Guide® 5.1.
Jeff Grant, Bank of Montreal
Mahmoud Mamlouk, BMO Harris Bank
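As a minimal sketch of the generation step, a DATA step can compute an n-step ramp and emit the CXrrggbb codes that SAS graphics syntax accepts; the white-to-red ramp here is only an example.

data ramp;
   length color $8;
   do i = 0 to 9;
      r = 255;                       /* hold red constant   */
      g = round(255 * (1 - i/9));    /* fade green and blue */
      b = g;
      color = cats('CX', put(r, hex2.), put(g, hex2.), put(b, hex2.));
      output;
   end;
run;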
SAS® programmers spend hours developing DATA step code to clean up their data. Save time and money and enhance your data quality results by leveraging the power of the SAS Quality Knowledge Base (QKB). Knowledge Engineers at SAS have developed advanced context-specific data quality operations and packaged them in callable objects in the QKB. In this session, a Senior Software Development Manager at SAS explains the QKB and shows how to invoke QKB operations in a DATA step and in various SAS® Data Management offerings.
Brian Rineer, SAS
SAS® Output Delivery System (ODS) Graphics started appearing in SAS® 9.2. When first starting to use these tools, the traditional SAS/GRAPH® software user might encounter some significant challenges in learning the new way to do things. This is further complicated by the lack of simple demonstrations of capabilities. Most graphs in training materials and publications are rather complicated graphs that, while useful, are not good teaching examples. This paper contains many examples of very simple ways to get very simple things accomplished. Over 20 different graphs are developed using only a few lines of code each, using data from the SASHELP data sets. The use of the SGPLOT, SGPANEL, and SGSCATTER procedures is shown. In addition, the paper addresses those situations in which the user must instead use a combination of the TEMPLATE and SGRENDER procedures to accomplish the task at hand. Most importantly, the use of ODS Graphics Designer as a teaching tool and a generator of sample graphs and code is covered. The emphasis in this paper is the simplicity of the learning process. Users will be able to take the included code and run it immediately on their personal machines to achieve an instant sense of gratification.
Roger Muller, Data-To-Events, Inc.
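In the paper's few-lines-per-graph spirit, two representative examples using SASHELP data:

proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
run;

proc sgpanel data=sashelp.cars;
   panelby origin;
   vbar type / response=msrp stat=mean;
run;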
As more of your work is performed in an off-premises cloud environment, understanding how to get the data you want to analyze and report on becomes important. In addition, working in a cloud environment like Amazon Web Services might be something that is new to you or to your organization when you use a product like SAS® Visual Analytics. This presentation discusses how to get the best use out of cloud resources, how to efficiently transport information between systems, and how to continue to leverage data in an on-premises database management system (DBMS) in your future cloud system.
Gary Mehler, SAS
Since its inception, SAS® Text Miner has used the singular value decomposition (SVD) to convert a term-document matrix to a representation that is crucial for building successful supervised and unsupervised models. In this presentation, using SAS® code and SAS Text Miner, we compare these models with those that are based on SVD representations of subcomponents of documents. These more granular SVD representations are also used to provide further insights into the collection. Examples featuring visualizations, discovery of term collocations, and near-duplicate subdocument detection are shown.
Russell Albright, SAS
James Cox, SAS Institute
Ning Jin, SAS Institute
If your organization already deploys one or more software solutions via Amazon Web Services (AWS), you know the value of the public cloud. AWS provides a scalable public cloud with a global footprint, allowing users access to enterprise software solutions anywhere at any time. Although SAS® began long before AWS was even imagined, many loyal organizations driven by SAS are moving their local SAS analytics into the public AWS cloud, alongside other software hosted by AWS. SAS® Solutions OnDemand has assisted organizations in this transition. In this paper, we describe how we extended our enterprise hosting business to AWS. We describe the open-source automation framework on which SAS Solutions OnDemand built our automation stack, which simplified the process of migrating a SAS implementation. We provide the technical details of our automation and network footprint, a discussion of the technologies we chose along the way, and a list of lessons learned.
Ethan Merrill, SAS
Bryan Harkola, SAS
In today's world of torrential data, plotting large amounts of data has its own challenges. The SGPLOT procedure in ODS Graphics offers a simple yet powerful tool for your graphing needs. In this paper we present some graphs that work better for large data sets. We create heat maps, box plots, and parallel coordinate plots that visualize large data. You can make your graphs resilient to your growing data with ODS Graphics!
Prashant Hebbar, SAS Institute Inc
Lingxiao Li, SAS
Sanjay Matange, SAS
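For instance, a heat map bins the points so the rendered graph stays small no matter how many observations feed it; SASHELP.HEART stands in here for truly large data.

proc sgplot data=sashelp.heart;
   heatmap x=height y=weight / colormodel=(white yellow red);
run;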
Project management is a hot topic across many industries, and there are multiple commercial software applications for managing projects. The reality, however, is that the majority of project management software is not practical for daily use. SAS® has a solution for this issue that can be used to manage projects graphically in real time. This paper introduces a new paradigm for project management using the SAS® Graph Template Language (GTL). In real time, SAS clients can use GTL to visualize resource assignments, task plans, delivery tracking, and project status across multiple project levels for more efficient project management.
Zhouming(Victor) Sun, Medimmune
SAS® programs can be very I/O intensive. SAS data sets with inappropriate variable attributes can degrade the performance of SAS programs. Using SAS compression offers some relief but does not eliminate the issues caused by inappropriately defined SAS variables. This paper examines the problems that inappropriate SAS variable attributes can cause and introduces a macro to tackle the problem of minimizing the footprint of a SAS data set.
Brian Varney, Experis BI & Analytics Practice
Would you like to be more confident in producing graphs and figures? Do you understand the differences between the OVERLAY, GRIDDED, LATTICE, DATAPANEL, and DATALATTICE layouts? Would you like to know how to easily create life sciences industry standard graphs such as adverse event timelines, Kaplan-Meier plots, and waterfall plots? Finally, would you like to learn all these methods in a relaxed environment that fosters questions? Great--this topic is for you! In this hands-on workshop, you will be guided through the Graph Template Language (GTL). You will also complete fun and challenging SAS graphics exercises to enable you to more easily retain what you have learned. This session is structured so that you will learn how to create the standard plots that your manager requests, how to easily create simple ad hoc plots for your customers, and also how to create complex graphics. You will be shown different methods to annotate your plots, including how to add Unicode characters to your plots. You will find out how to create reusable templates, which can be used by your team. Years of information have been carefully condensed into this 90-minute hands-on, highly interactive session. Feel free to bring some of your challenging graphical questions along!
Kriss Harris, SAS Specialists Ltd
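For orientation before the workshop, the minimal GTL pattern is a template definition paired with PROC SGRENDER:

proc template;
   define statgraph simple_scatter;
      begingraph;
         entrytitle 'Weight by Height';
         layout overlay;
            scatterplot x=height y=weight / group=sex;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.class template=simple_scatter;
run;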
Annotating a blank Case Report Form (blankcrf.pdf, which is required for a new drug application to the US Food and Drug Administration) by hand is an arduous process that can take many hours of precious time. Once again SAS® comes to the rescue! You can have SAS use the Study Data Tabulation Model (SDTM) specifications data to create all the necessary annotations and place them on the appropriate pages of the blankcrf.pdf. In addition, you can dynamically color and adjust the font size of these annotations. This approach uses SAS, Adobe Acrobat's Forms Data Format (FDF) language, and Adobe Reader to complete the process. In this paper I go through each of the steps needed and explain in detail exactly how to accomplish each task.
Steven Black, Agility Clinical
SAS® provides some powerful, flexible tools for creating reports, like the REPORT and TABULATE procedures. With the advent of the Output Delivery System (ODS), you have almost total control over how the output from those procedures looks. But there are still times when you need (or want) just a little more, and that's where the Report Writing Interface (RWI) can help. The Report Writing Interface is just a fancy way of saying you're using the ODSOUT object in a DATA step. This object allows you to lay out the page, create tables, embed images, add titles and footnotes, and more--all from within a DATA step, using whatever DATA step logic you need. Also, all the style capabilities of ODS are available to you, so the output created by your DATA step can have the fonts, sizes, colors, backgrounds, and borders that make your report look just the way you want. This presentation quickly covers some of the basics of using the ODSOUT object and then walks through some of the techniques used to create four real-world examples. Who knows, you might even go home and replace some of your PROC REPORT code--I know I have!
Pete Lund, Looking Glass Analytics
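A bare-bones hedged sketch of the ODSOUT object (the cell values are placeholders): the object writes a styled line of text and a one-row table from ordinary DATA step logic.

ods html file='rwi_demo.html';

data _null_;
   dcl odsout obj();
   obj.format_text(data: 'Report Writing Interface demo',
                   style_attr: 'font_weight=bold font_size=14pt');
   obj.table_start();
      obj.row_start();
         obj.format_cell(data: 'Mean MPG');
         obj.format_cell(data: '21.3');
      obj.row_end();
   obj.table_end();
run;

ods html close;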
A group tasked with testing SAS® software from the customer perspective has gathered a number of helpful hints for SAS® 9.4 that will smooth the transition to its new features and products. These hints will help with the 'huh?' moments that crop up when you are getting oriented and will provide short, straightforward answers. We also share insights about changes in your order contents. Gleaned from extensive multi-tier deployments, SAS® Customer Experience Testing shares insiders' practical tips to ensure that you are ready to begin your transition to SAS 9.4 and guidance for after you are at SAS 9.4. The target audience for this paper is primarily system administrators who will be installing, configuring, or administering the SAS 9.4 environment. This paper was first published in 2012; it has been revised each year since with new information.
Lisa Cook, SAS
Edith Jeffries, SAS
Cindy Taylor, SAS
SAS® Federated Query Language (FedSQL) is a SAS proprietary implementation of the ANSI SQL:1999 core standard capable of providing a scalable, threaded, high-performance way to access relational data in multiple data sources. This paper explores the FedSQL language in a three-step approach. First, we introduce the key features of the language and discuss the value each feature provides. Next, we present the FEDSQL procedure (the Base SAS® implementation of the language) and compare it to PROC SQL in terms of both syntax and performance. Finally, we examine how FedSQL queries can be embedded in DS2 programs to merge the power of these two high-performance languages.
Shaun Kaufmann, Farm Credit Canada
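For orientation, PROC FEDSQL reads much like PROC SQL, and the same query text can be embedded in a DS2 SET statement; the table and column names below are assumptions:

    proc fedsql;
       create table work.region_sales as
       select region, sum(sales) as total_sales
       from work.orders
       group by region;
    quit;

    proc ds2;
       data work.big_orders / overwrite=yes;
          method run();
             set {select * from work.orders where sales > 1000};  /* embedded FedSQL */
          end;
       enddata;
    run;
    quit;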
You can use annotation, modify templates, and change dynamic variables to customize graphs in SAS®. Standard graph customization methods include template modification (which most people use to modify graphs that analytical procedures produce) and SG annotation (which most people use to modify graphs that procedures such as PROC SGPLOT produce). However, you can also use SG annotation to modify graphs that analytical procedures produce. You begin by using an analytical procedure, ODS Graphics, and the ODS OUTPUT statement to capture the data that go into the graph. You use the ODS document to capture the values that the procedure sets for the dynamic variables, which control many of the details of how the graph is created. You can modify the values of the dynamic variables, and you can modify graph and style templates. Then you can use PROC SGRENDER along with the ODS output data set, the captured or modified dynamic variables, the modified templates, and SG annotation to create highly customized graphs. This paper shows you how and provides examples.
Warren Kuhfeld, SAS
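A heavily simplified sketch of the capture-and-rerender flow follows; the ODS OUTPUT object name applies to a one-regressor PROC REG fit plot, while the DYNAMIC name and value shown are assumptions that would normally be read from the ODS document:

    ods graphics on;
    ods output FitPlot=work.fitdata;      /* capture the data behind the graph */
    proc reg data=sashelp.class;
       model weight = height;
    run; quit;

    proc sgrender data=work.fitdata template=Stat.Reg.Graphics.Fit;
       dynamic _DEPNAME='Weight';         /* dynamic name and value assumed */
    run;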
SAS® Enterprise Guide® is an extremely valuable tool for programmers, but it should also be leveraged by managers and executives to do data exploration, get information on the fly, and take advantage of the powerful analytics and reporting that SAS® has to offer. This can all be done without learning to program. This paper gives an overview of how SAS Enterprise Guide can improve the process of turning real-time data into real-time business decisions by managers.
Steven First, Systems Seminar Consultants, Inc.
As SAS® programmers we often want our code or program logic to be driven by the data at hand, rather than be directed by us. Such dynamic code enables the data to drive the logic or sequence of execution. This type of programming can be very useful when creating lists of observations, variables, or data sets from ever-changing data. Whether these lists are used to verify the data at hand or are to be used in later steps of the program, dynamic code can write our lists once and ensure that the values change in tandem with our data. This Quick Tip paper will present the concepts of creating data-driven lists for observations, variables, and data sets, the code needed to execute these tasks, and examples to better explain the process and results of the programs we will create.
Kate Burnett-Isaacs, Statistics Canada
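The workhorse of this style of programming is SELECT INTO with a dictionary table. For example, this builds a list of the numeric variables in a data set and then reuses it:

    proc sql noprint;
       select name into :numvars separated by ' '
       from dictionary.columns
       where libname='SASHELP' and memname='CLASS' and type='num';
    quit;
    %put Numeric variables: &numvars;

    proc means data=sashelp.class;
       var &numvars;
    run;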
SAS® Data Management is not a dating application. However, as a data analyst, you do strive to find the best matches for your data. Similar to a dating application, when trying to find matches for your data, you need to specify the criteria that constitutes a suitable match. You want to strike a balance between being too stringent with your criteria and under-matching your data and being too loose with your criteria and over-matching your data. This paper highlights various SAS Data Management matching techniques that you can use to strike the right balance and help your data find its perfect match, and as a result improve your data for reporting and analytics purposes.
Mary Kathryn Queen, SAS
Today's SAS® environment has large numbers of concurrent SAS processes that have to process ever-growing data volumes. To help SAS users remain productive, SAS administrators must ensure that SAS applications have sufficient computer resources that are properly configured and regularly monitored. Understanding how all of the components of SAS work and how they are used by your users is the first step. The guidance offered in this paper helps SAS administrators evaluate hardware, operating system, and infrastructure options for a SAS environment that will keep their SAS applications running at optimal performance and keep their user community happy.
Margaret Crevar, SAS
Most SAS® products are offered as web-based applications these days. Even though a web-based interface provides unmatched accessibility, it comes with known security vulnerabilities. This paper examines the most common exploitation scenarios and suggests ways to make web applications more secure. The top ten focus areas include protection of user credentials, use of stronger authentication methods, implementation of SSL for all communications between client and server, understanding of attack mechanisms, penetration testing, adoption and integration of third-party security packages, encryption of any sensitive data, security logging and auditing, mobile device access management, and prevention of threats from inside.
Heesun Park, SAS
In SAS® LASR™ Analytic Server, data can reside in three types of environments: the client hard disk (for example, a laptop), the Hadoop Distributed File System (HDFS), and the memory of the SAS LASR Analytic Server. Moving data efficiently among these environments is critical for getting timely insights from the data. In this paper, we illustrate all the possible ways to move the data, including 1) moving data from the client hard disk to HDFS; 2) moving data from HDFS to the client hard disk; 3) moving data from HDFS to the SAS LASR Analytic Server; 4) moving data from the SAS LASR Analytic Server to HDFS; 5) moving data from the client hard disk to the SAS LASR Analytic Server; and 6) moving data from the SAS LASR Analytic Server to the client hard disk.
Yue Qi, SAS
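Two of the six paths, sketched with assumed host names and ports: the SASIOLA engine moves a client-side table into LASR memory, and PROC LASR loads data directly into a running server:

    /* client disk (or WORK) -> LASR memory, via the SASIOLA engine */
    libname lasr sasiola host="lasrhead.example.com" port=10010 tag="hps";
    data lasr.cars;
       set sashelp.cars;
    run;

    /* load another table with PROC LASR */
    proc lasr add data=sashelp.class port=10010;
       performance host="lasrhead.example.com";
    run;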
Managing large data sets comes with the task of providing a certain level of quality assurance, no matter what the data is used for. We present here the fundamental SAS® procedures to perform when determining the completeness of a data set. Even though each data set is unique and has its own variables that need more examination in detail, it is important to first examine the size, time, range, interactions, and purity (STRIP) of a data set to determine its readiness for any use. This paper covers first steps you should always take, regardless of whether you're dealing with health, financial, demographic, or environmental data.
Michael Santema, Kaiser Permanente
Fagen Xie, Kaiser Permanente
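A first pass at the STRIP checks can be as simple as three familiar procedures (the data set and variable names here are stand-ins):

    proc contents data=work.claims;                   /* size and structure    */
    run;
    proc means data=work.claims n nmiss min max;      /* range and purity      */
    run;
    proc freq data=work.claims;
       tables service_year gender / missing;          /* time and interactions */
    run;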
This paper provides tips and techniques to speed up the validation process both without and with automation. For validation without automation, it introduces standard and clever uses of COMPARE procedure options and statements that can speed up the validation process. For validation with automation, a macro named %QCDATA is introduced for individual data set validation, and a macro named %QCDIR is introduced for comparing the data sets in two different directories. This section also defines &SYSINFO and explains how it can be used to interpret the result of the comparison.
Alice Cheng, Portola Pharmaceuticals
Justina Flavin, Independent Consultant
Michael Wise, Experis BI & Analytics Practice
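For context, a bare-bones PROC COMPARE run and a look at &SYSINFO (data set names assumed) illustrate the starting point that the paper's options and macros build on:

    proc compare base=prod.ae compare=qc.ae listall criterion=1e-10;
    run;
    %put &=sysinfo;  /* 0 means the data sets were judged equal;
                        nonzero bits flag specific kinds of differences */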
Perl is a good language for working with text, which in SAS® means character variables. Perl was widely used in website programming in the early days of the World Wide Web, and SAS provides a set of Perl regular expression functions for use in DATA step statements. These functions are useful for finding text patterns, or simply strings of text, and returning the strings themselves or their positions within a character variable. I often use them to locate a string of text in a character variable; knowing this location tells me where to start a SUBSTR function. The poster covers a number of the Perl regular expression functions available in SAS® 9.2 and later. Each function is explained in writing, and then a brief code demonstration shows how I use it. The examples draw on both SAS documentation and my best practices.
Peter Timusk, Independent
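A typical use of the pattern-location idiom described above looks like this:

    data _null_;
       text = 'Subject 101-4782 enrolled 14MAR2016';
       pos = prxmatch('/\d{3}-\d{4}/', text);      /* position where the pattern starts */
       if pos > 0 then id = substr(text, pos, 8);  /* SUBSTR now knows where to begin   */
       put pos= id=;
    run;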
Have you ever wondered how to get the most from Web 2.0 technologies in order to visualize SAS® data? How to make those graphs dynamic, so that users can explore the data in a controlled way, without needing prior knowledge of SAS products or data science? Wonder no more! In this session, you learn how to turn basic sashelp.stocks data into a snazzy HighCharts stock chart in which a user can review any time period, zoom in and out, and export the graph as an image. All of these features are achieved with only two DATA steps and one SORT procedure--57 lines of SAS code in total.
Vasilij Nevlev, Analytium Ltd
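The underlying pattern is a DATA step that streams data values into an HTML/JavaScript page with PUT statements. This fragment shows only the data-array portion; a real HighCharts page needs the surrounding markup and library references, and SAS dates converted to the epoch milliseconds the library expects:

    filename out 'stocks.html';
    data _null_;
       file out;
       set sashelp.stocks(where=(stock='IBM')) end=last;
       if _n_ = 1 then put '<script>var seriesData = [';
       if not last then put '[' date ',' close '],';
       else put '[' date ',' close ']];</script>';
    run;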
Your organization already uses SAS® Visual Analytics and you have designed reports for internal use. Now you want to publish a report on your external website. How do you design a report for the general public considering the wide range of education and abilities? This paper defines best practices for designing reports that are universally accessible to the broadest audience. You learn tips and techniques for designing reports that the general public can easily understand and use to gain insight. You also learn how to leverage features that help you comply with your legal obligations regarding users with disabilities. The paper includes recommendations and examples that you can apply to your own reports.
Jesse Sookne, SAS
Julianna Langston, SAS Institute
Karen Mobley, SAS
Ed Summers, SAS
Although hashing methods such as SHA256 and MD5 are already available in SAS®, other methods may be needed by SAS users. This presentation shows how such hashing methods can be implemented using SAS DATA steps and macros, with judicious use of the bitwise functions (BAND, BOR, and so on) and by referencing public domain sources.
Rick Langston, SAS
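For comparison, here are the built-in functions and a toy roll-your-own hash in the spirit of the presentation; the 31-multiplier scheme below is illustrative only, not one of the presented algorithms:

    data _null_;
       s = 'The quick brown fox';
       md5_hex = put(md5(s),    $hex32.);  /* built-in 128-bit MD5 as hex           */
       sha_hex = put(sha256(s), $hex64.);  /* built-in SHA-256 (SAS 9.4M1 or later) */
       h = 0;                              /* toy hash from MOD and BXOR            */
       do i = 1 to length(s);
          h = bxor(mod(h*31, 2**32), rank(char(s, i)));
       end;
       put md5_hex= / sha_hex= / h= hex8.;
    run;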
In the pharmaceutical industry, generating summary tables or listings in PDF format is becoming increasingly popular. You can create various PDF output files with different styles by combining the ODS PDF statement and the REPORT procedure in SAS® software. In order to validate those outputs, you need to import the PDF summary tables or listings into SAS data sets. This paper presents an overview of a utility that reads in an uncompressed PDF file that is generated by SAS, extracts the data, and converts it to SAS data sets.
William Wu, Herodata LLC
Yun Guan, Theravance Biopharm
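A very rough sketch of the starting point: read the uncompressed PDF as text and pull the strings attached to text-show (Tj) operators. Real PDFs need far more parsing than this, which is exactly what the utility handles; the file path is assumed:

    data pdf_text;
       infile 'table1.pdf' lrecl=32767 truncover;
       input line $char32767.;
       if find(line, ') Tj') or find(line, ')Tj') then do;
          text = prxchange('s/.*\((.*?)\)\s*Tj.*/$1/', 1, line);
          output;
       end;
       keep text;
    run;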
Importing metadata can be a time-consuming, painstaking, and fraught task when you have multiple SAS packages to deal with. Fortunately, SAS® software provides a batch facility to import (and export) metadata by using the command line, thereby saving you from many a mouse click and potential repetitive strain injury when you use the Metadata Import Wizard. This paper aims to demystify the SAS batch import tool. It also introduces a template-driven process for programmatically building a batch file that contains the syntax needed to import metadata and then automatically executing the batch import scripts.
David Moors, Whitehound Limited
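The batch tool can even be driven from a SAS session. The sketch below assumes the install path, a stored connection profile named MyMetadataServer, and the package and target locations:

    systask command '"C:\SASHome\SASPlatformObjectFramework\9.4\ImportPackage" -profile "MyMetadataServer" -package "C:\packages\etl_jobs.spk" -target "/Shared Data/ETL"' wait status=rc;
    %put Import return code: &rc;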
Looking for new ways to improve your business? Try mining your own data! Event log data is a side product of information systems generated for audit and security purposes and is seldom analyzed, especially in combination with business data. With the arrival of the cloud computing era, more event log data has accumulated, and analysts are searching for innovative ways to take advantage of all data resources in order to get valuable insights. Process mining, a new field for discovering business patterns from event log data, has recently proved useful for business applications. Process mining shares some algorithms with data mining, but it is more focused on interpretation of the detected patterns than on prediction. Analysis of these patterns can lead to improvements in the efficiency of common existing and planned business processes. Through process mining, analysts can uncover hidden relationships between resources and activities and make changes to improve organizational structure. This paper shows you how to use SAS® Analytics to gain insights from real event log data.
Emily (Yan) Gao, SAS
Robert Chu, SAS
Xudong Sun, SAS
Memory-based reasoning (MBR) is an empirical classification method that works by comparing cases in hand with similar examples from the past and then applying that information to the new case. MBR modeling is based on the assumptions that the input variables are numeric, orthogonal to each other, and standardized. The latter two assumptions are taken care of by principal components transformation of raw variables and using the components instead of the raw variables as inputs to MBR. To satisfy the first assumption, the categorical variables are often dummy coded. This raises issues such as increasing dimensionality and overfitting in the training data by introducing discontinuity in the response surface relating inputs and target variables. The Weight of Evidence (WOE) method overcomes this challenge. This method measures the relative response of the target for each group level of a categorical variable. Then the levels are replaced by the pattern of response of the target variable within that category. The SAS® Enterprise Miner™ Interactive Grouping Node is used to achieve this. With this method, the categorical variables are converted into numeric variables. This paper demonstrates the improvement in performance of an MBR model when categorical variables are WOE coded. A credit screening data set obtained from SAS® Education that comprises 25 attributes for 3000 applicants is used for this study. Three different types of MBR models were built using the SAS Enterprise Miner MBR node to check the improvement in performance. The results show the MBR model with WOE coded categorical variables is the best, based on misclassification rate. Using this data, when WOE coding is adopted, the model misclassification rate decreases from 0.382 to 0.344 while the sensitivity of the model increases from 0.552 to 0.572.
Vinoth Kumar Raja, West Corporation
Goutam Chakraborty, Oklahoma State University
Vignesh Dhanabal, Oklahoma State University
Many organizations want to innovate with the analytics solutions from SAS®. However, many companies are facing constraints in time and money to build an innovation strategy. In this session, you learn how SaasNow can offer you a flexible solution to become innovative again. During this session, you experience how easy it is to deploy a SAS® Visual Analytics and SAS® Visual Statistics environment in just 30 minutes in a pay-per-month model. Visitors to this session receive a free voucher to test-drive SaasNow!
A picture is worth a thousand words, but what if there are a billion words? This is where the picture becomes even more important, and this is where infographics step in. Infographics are a representation of information in a graphic format designed to make the data easily understandable, at a glance, without having to have a deep knowledge of the data. Due to the amount of data available today, more infographics are being created to communicate information and insight from all available data, both in the boardroom and on social media. This session shows you how to create information graphics that can be printed, shared, and dynamically explored with objects and data from SAS® Visual Analytics. Connect your infographics to the high-performance analytical engine from SAS® for repeatability, scale, and performance on big data, and for ease of use. You will see how to leverage elements of your corporate dashboards and self-service analytics while communicating subjective information and adding the context that business teams require, in a highly visual format. This session looks at how SAS® Office Analytics enables a Microsoft Office user to create infographics for all occasions. You will learn the workflow that lets you get the most from your SAS Visual Analytics system without having to code anything. You will leave this session with the perfect blend of creative freedom and data governance that comes from leveraging the power of SAS Visual Analytics and the familiarity of Microsoft Office.
Travis Murphy, SAS
One of the tedious but necessary things that SAS® programmers must do is to trace and monitor SAS runs: counting observations and columns, checking performance, drawing flowcharts, and diagramming data flows. This is required in order to audit results, find errors, find long-running steps, identify large data files, and to simply understand what a particular job does. SAS® Enterprise Guide® includes functionality to help with some of this, on some SAS jobs. This paper presents an innovative custom tool that automatically produces flowcharts--showing all SAS steps in a job, file counts in and out, variable counts, recommendations and generated code for performance improvements, and more. The system was written completely in SAS using SAS source programs, SAS logs, PROC SCAPROC output, and more as input. This tool can eliminate much of the manual work needed in job optimization, space issues, debugging, change management, documentation, and job check-out. As has been said many times before: We have computers, let's use them.
Steven First, Systems Seminar Consultants, Inc.
Jennifer First-Kluge, Systems Seminar Consultants, Inc.
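The data collection behind such a tool is largely PROC SCAPROC, which records what a job does as it runs:

    proc scaproc;
       record 'job_analysis.txt' attr opentimes expandmacros;  /* start recording */
    run;

    /* ... the production job being analyzed ... */
    proc sort data=sashelp.class out=work.sorted; by age; run;

    proc scaproc;
       write;                             /* flush the analysis records to the file */
    run;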
An interactive SAS® environment is preferred for developing programs because it gives the flexibility of instantly viewing the log in the Log window. The programmer must review the Log window to ensure that every line of the program ran successfully without producing any of the messages defined by SAS as potential errors. Reviewing the Log window every time is not only time consuming but also prone to manual error for programmers at any level; just to confirm that the log is free from error, the programmer must check it. Currently, the interactive SAS environment provides no instant notification about whether the generated log contains any such messages. This paper introduces an instant approach to analyzing the Log window using the SAS macro %ICHECK, which displays its reports instantly in the same SAS session. The report summarizes all the messages defined by SAS that appear in the Log window. The programmer does not even need to add %ICHECK at the end of the program: whether a single DATA step, a single PROC step, or a whole program is submitted, the %ICHECK macro is automatically executed at the end of every submission. It might surprise you that a compiled macro can be executed without being called in the Editor window, but it is possible with %ICHECK, and you can develop it using only SAS products. It can be used in a Windows or UNIX interactive SAS environment without requiring any user input. The proposed approach brings a significant benefit to the log review process and substantial time savings for programmers at all levels, because the log is confirmed free from error. Similar functionality could be introduced in the SAS product itself.
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
Palanisamy Mohan, ICON Clinical Research India Pvt Ltd
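This is not the authors' %ICHECK, but a minimal sketch of the underlying idea: route the log to a file, then read it back and flag suspect lines:

    proc printto log='session.log' new; run;
    /* ... the code being checked ... */
    proc printto; run;                       /* restore the log */

    data log_issues;
       infile 'session.log' truncover;
       input line $char256.;
       if index(line, 'ERROR') = 1
          or index(line, 'WARNING') = 1
          or find(line, 'uninitialized') then output;
    run;

    proc print data=log_issues; run;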
Microsoft Visual Basic Scripting Edition (VBScript) and SAS® software are each powerful tools in their own right. These two technologies can be combined so that SAS code can call a VBScript program or vice versa. This gives a programmer the ability to automate SAS tasks; traverse the file system; send emails programmatically via Microsoft Outlook or SMTP; manipulate Microsoft Word, Microsoft Excel, and Microsoft PowerPoint files; get web data; and more. This paper presents example code to demonstrate each of these capabilities.
Christopher Johnson, BrickStreet Insurance
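One direction of the integration, sketched with assumed paths and addresses: SAS writes a VBScript that creates an Outlook message, then runs it with cscript:

    filename vbs 'C:\temp\notify.vbs';
    data _null_;
       file vbs;
       put 'Set ol = CreateObject("Outlook.Application")';
       put 'Set msg = ol.CreateItem(0)';
       put 'msg.To = "analyst@example.com"';
       put 'msg.Subject = "SAS job complete"';
       put 'msg.Display';
    run;

    options noxwait;
    x 'cscript //nologo C:\temp\notify.vbs';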
In studies where randomization is not possible, imbalance in baseline covariates (confounding by indication) is a fundamental concern. Propensity score matching (PSM) is a popular method to minimize this potential bias, matching individuals who received treatment to those who did not, to reduce the imbalance in pre-treatment covariate distributions. PSM methods continue to advance as computing resources expand. Optimal matching, which selects the set of matches that minimizes the average difference in propensity scores between mates, has been shown to outperform less computationally intensive methods. However, many find the implementation daunting. SAS/IML® software allows the integration of optimal matching routines that execute in R, e.g., the R optmatch package. This presentation walks through performing optimal PSM in SAS® by implementing R functions, including assessing whether covariate trimming is necessary prior to PSM. It covers the propensity score analysis in SAS, the matching procedure, and the post-matching assessment of covariate balance using SAS/STAT® 13.2 and SAS/IML procedures.
Lucy D'Agostino McGowan, Vanderbilt University
Robert Greevy, Department of Biostatistics, Vanderbilt University
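The core round trip looks roughly like this; it requires SAS started with the RLANG option and R with the optmatch package installed, and the data set and variable names are assumptions:

    proc iml;
       call ExportDataSetToR('work.ps_data', 'psdat');
       submit / R;
          library(optmatch)
          psdat$match <- pairmatch(treat ~ ps, data = psdat)
       endsubmit;
       call ImportDataSetFromR('work.matched', 'psdat');
    quit;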
Companies looking for an optimal solution to run their SAS® Analytics need a seamless way to manage their data between many different systems, including commodity Hadoop storage and the more traditional data warehouse. This presentation outlines a simple path for building a single platform that integrates SAS®, Hadoop, and the data warehouse into a single, pre-configured solution, as well as strategies for querying data within multiple existing systems and combining the results to produce even more powerful decision-making possibilities.
This paper builds on the knowledge gained in the paper 'Introduction to ODS Graphics.' The capabilities in ODS Graphics grow with every release as both new paradigms and smaller tweaks are introduced. After talking with the ODS developers, I have chosen a selection of the many wonderful capabilities to highlight here. This paper provides the reader with more tools for his or her tool belt. Visualization of data is an important part of telling the story seen in the data. And while the standards and defaults in ODS Graphics are very well done, sometimes the user has specific nuances for characters in the story or additional plot lines they want to incorporate. Almost any possibility, from drama to comedy to mystery, is available in ODS Graphics if you know how. We explore tables, annotation and changing attributes, as well as the block plot. Any user of Base SAS® on any platform will find great value from the SAS® ODS Graphics procedures. Some experience with these procedures is assumed, but not required.
Chuck Kincaid, Experis BI & Analytics Practice
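As one small taste, the BLOCK plot mentioned above draws shaded bands behind a series (the era split below is invented for illustration):

    data ibm;
       set sashelp.stocks(where=(stock='IBM'));
       era = ifc(year(date) < 2002, 'Dot-com', 'Post-crash');
    run;

    proc sgplot data=ibm;
       block x=date block=era / transparency=0.7;
       series x=date y=close;
    run;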
Organizations view Hadoop as a lower cost means to assemble their data in one location. Increasingly, the Hadoop environment also provides the processing horsepower to analyze the patterns contained in this data to create business value. SAS® Grid Manager for Hadoop allows you to co-locate your SAS® Analytics workload directly on your Hadoop cluster via an integration with the YARN and Oozie components of the Hadoop ecosystem. YARN allows Hadoop to become the operating system for your data--it is a tool that manages and mediates access to the shared pool of data, and manages and mediates the resources that are used to manipulate the pool. Learn the capabilities and benefits of SAS Grid Manager for Hadoop as well as some configuration tips. In addition, sample SAS Grid jobs are provided to illustrate different ways to access and analyze your Hadoop data with your SAS Grid jobs.
Cheryl Doninger, SAS
Doug Haigh, SAS
This presentation teaches the audience how to use ODS Graphics. Now part of Base SAS®, ODS Graphics are a great way to easily create clear graphics that enable users to tell their stories well. SGPLOT and SGPANEL are two of the procedures that can be used to produce powerful graphics that used to require a lot of work. The core of the procedures is explained, as well as some of the many options available. Furthermore, we explore the ways to combine the individual statements to make more complex graphics that tell the story better. Any user of Base SAS on any platform will find great value in the SAS® ODS Graphics procedures.
Chuck Kincaid, Experis BI & Analytics Practice
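Two tiny examples convey the flavor: SGPLOT overlays statements on one set of axes, while SGPANEL repeats a plot across classification panels:

    proc sgplot data=sashelp.class;
       scatter x=height y=weight / group=sex;
       reg x=height y=weight;
    run;

    proc sgpanel data=sashelp.class;
       panelby sex;
       histogram weight;
    run;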
The SAS® Output Delivery System (ODS) ExcelXP tagset offers users the ability to export a SAS data set, with all of the accompanying functions and calculations, to a Microsoft Excel spreadsheet. This is particularly useful in industry because not everyone is a SAS programmer, but many people need to access and manipulate data that comes from SAS. The ExcelXP tagset is one of several built-in templates for exporting data in SAS. The tagset gives programmers the ability to export functions and calculations into the cells of an Excel spreadsheet. Several options within the tagset enable the programmer to customize the Excel file. Some of these options enable the programmer to name the worksheet, style each table, embed titles and footnotes, and export multiple data tables to the same Excel worksheet.
Veronica Renauldo, Grand Valley State University
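A representative invocation, showing a few of the tagset options described above:

    ods tagsets.excelxp file='class_report.xml' style=statistical
        options(sheet_name='Students' embedded_titles='yes' autofilter='all');
    title 'Class Listing';
    proc print data=sashelp.class noobs; run;
    ods tagsets.excelxp close;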
While there is no shortage of MS and MA programs in data science, PhD programs have been slow to develop. Is there a role for a PhD in data science? How would this differ from an advanced degree in statistics? Computer science? This session explores the issues related to the curriculum content for a PhD in data science and what those students would be expected to do after graduation. Come join the conversation.
Jennifer Priestley, Kennesaw State University
Every day, companies all over the world are moving their data into the cloud. While there are many options available, much of this data will wind up in Amazon Redshift. As a SAS® user, you are probably wondering, 'What is the best way to access this data using SAS?' This paper discusses the many ways that you can use SAS/ACCESS® software to get to Amazon Redshift. We compare and contrast the various approaches and help you decide which is best for you. Topics include building a connection, moving data into Amazon Redshift, and applying performance best practices.
Chris DeHart, SAS
Jeff Bailey, SAS
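A minimal connection sketch (all connection values are placeholders):

    libname rs redshift server='mycluster.abc123.us-east-1.redshift.amazonaws.com'
            port=5439 database=analytics user=myuser password='XXXXXX' schema=public;

    proc sql;
       create table work.big_orders as
       select * from rs.orders
       where order_total > 1000;    /* a filter like this can be passed to Redshift */
    quit;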
Gone are the days when the only method of receiving a loan was by visiting your local branch and working with a loan officer. In today's economy, financial institutions increasingly rely on online channels to interact with their customers. The anonymity that is inherent in this channel makes it a prime target for fraudsters. The solution is to profile the behavior of internet banking in real time and assess each transaction for risk as it is processed in order to prevent financial loss before it occurs. SAS® Visual Scenario Designer enables you to create rules, scenarios, and models, test their impact, and inject them into real-time transaction processing using SAS® Event Stream Processing.
Sam Atassi, SAS
Jamie Hutton, SAS
Do you want to see and experience how to configure SAS® Enterprise Miner™ single sign-on? Are you looking to explore setting up Integrated Windows Authentication with SAS® Visual Analytics? This hands-on workshop demonstrates how you can configure Kerberos delegation with SAS® 9.4. You see how to validate the prerequisites, make the configuration changes, and use the applications. By the end of this workshop you will be empowered to start your own configuration.
Stuart Rogers, SAS
Considering the fact that SAS® Grid Manager is becoming more and more popular, it is important to fulfill users' needs for a successful migration to a SAS® Grid environment. This paper focuses on key requirements and common issues for new SAS Grid users, especially those coming from a traditional environment. It describes a few common requirements, such as the need for a current working directory, changes to file system navigation in SAS® Enterprise Guide® with user-specified locations, and getting a job execution summary email. SAS Grid Manager introduces the GRIDWORK directory, which is a bit different from the traditional SAS WORK location; this paper explains how you can use the GRIDWORK location in a more user-friendly way. Users sometimes experience data set size differences during grid migration, and a few important reasons for these differences are demonstrated. We also demonstrate how to create new custom scripts to meet business needs and how to incorporate them with the SAS Grid Manager engine.
Piyush Singh, TATA Consultancy Services Ltd
Tanuj Gupta, TATA Consultancy Services
Prasoon Sangwan, Tata consultancy services limited
You are a skilled SAS® programmer. You can code circles around those newbies who point and click in SAS® Enterprise Guide®. And yet... there are tasks you struggle with on a regular basis, such as 'Is the name of that data set DRUG or DRUGS?' and 'What intern wrote this code? It's not formatted well at all and is hard to read.' In this seminar you learn how to program, yes program, more efficiently. You learn the benefits of autocomplete and inline help, as well as how to easily format that hard-to-read code you inherited from the intern. In addition, you learn how to create a process flow of a program to identify any dead ends, i.e., data sets that get created but are not used in that program.
Michelle Buchecker, ThotWave Technologies
From stock price histories to hospital stay records, analysis of time series data often requires use of lagged (and occasionally lead) values of one or more analysis variables. For the SAS® user, the central operational task is typically getting lagged (lead) values for each time point in the data set. While SAS has long provided a LAG function, it has no analogous lead function--an especially significant problem in the case of large data series. This paper reviews the LAG function, in particular the powerful, but non-intuitive implications of its queue-oriented basis. The paper demonstrates efficient ways to generate leads with the same flexibility as the LAG function, but without the common and expensive recourse to data re-sorting. It also shows how to dynamically generate leads and lags through use of the hash object.
Mark Keintz, Wharton Research Data Services
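One of the simpler lead-generating idioms discussed: merge the series with itself, offset by one row, so no re-sort is needed:

    data ibm;
       set sashelp.stocks(where=(stock='IBM') keep=stock date close);
    run;

    data ibm_lead;
       merge ibm
             ibm(firstobs=2 keep=close rename=(close=close_lead));
    run;   /* close_lead is missing on the final observation, as expected */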
In this paper, a SAS® macro is introduced that can help users find and access their folders and files very easily. By providing a path to the macro and letting the macro know which folders and files you are looking for under this path, the macro creates an HTML report that lists the matched folders and files. The best part of this HTML report is that it also creates a hyperlink for each folder and file so that when a user clicks the hyperlink, it directly opens the folder or file. Users can also ask the macro to find certain folders or files by providing part of the folder or file name as the search criterion. The results shown in the report can be sorted in different ways so that it can further help users quickly find and access their folders and files.
Ting Sa, Cincinnati Children's Hospital Medical Center
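At the heart of such a macro are the directory functions; this fragment lists one folder (path assumed):

    data folder_contents;
       length name $256;
       rc  = filename('dir', 'C:\projects');
       did = dopen('dir');
       do i = 1 to dnum(did);
          name = dread(did, i);   /* one row per folder or file name */
          output;
       end;
       rc = dclose(did);
       keep name;
    run;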
Are you still using TRIM, LEFT, and vertical bar operators to concatenate strings? It's time to modernize and streamline that clumsy code by using the string concatenation functions introduced in SAS®9. This paper is an overview of the CAT, CATS, CATT, and CATX functions introduced in SAS®9, and the new CATQ function added in SAS® 9.2. In addition to making your code more compact and readable, this family of functions offers some new tricks for accomplishing previously cumbersome tasks.
Josh Horstman, Nested Loop Consulting
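The before-and-after is striking even in a one-line case:

    data _null_;
       first = 'John   ';  last = '  Smith';
       old_way = trim(left(first)) || ' ' || trim(left(last));
       new_way = catx(' ', first, last);   /* strips blanks and inserts the delimiter */
       put old_way= / new_way=;
    run;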
Report automation and scheduling are very hot topics in many corporate industries. Automating reports has many advantages, including reducing workload, eliminating repetitive tasks, generating accurate results, and offering better performance. In recent years SAS® launched more powerful tools to present and share business analytics data. This paper illustrates the stepwise process of how to deploy and schedule reports on a server using SAS® Management Console 9.4. Scheduling jobs can be done on Windows as well as at the server level, and it is important to note that server-side scheduling has more advantages. The Windows scheduler invokes SAS programs on a local PC, where they are more often subject to system crashes. The main advantage of scheduling on the server is that most scheduled jobs run overnight, when record retrieval is faster and the load on database servers is lighter. Other advantages of server-side scheduling are that all scheduled jobs are in one location, and it is easy to maintain and keep track of log files if any scheduled jobs fail to run. This paper includes an overview of the Schedule Manager in SAS Management Console 9.4, a description of system tools and their options, instructions for converting SAS® Enterprise Guide® point-and-click programs into a consolidated SAS program for deployment, and several other related topics.
Anjan Matlapudi, AmeriHealth Caritas Family of Companies
Is uniqueness essential for your reports? SAS® Visual Analytics provides the ability to customize your reports to make them unique by using the SAS® Theme Designer. The SAS Theme Designer can be accessed from the SAS® Visual Analytics Hub to create custom themes to meet your branding needs and to ensure a unified look across your company. The report themes affect the colors, fonts, and other elements that are used in tables and graphs. The paper explores how to access SAS Theme Designer from the SAS Visual Analytics home page, how to create and modify report themes that are used in SAS Visual Analytics, how to create report themes from imported custom themes, and how to import and export custom report themes.
Meenu Jaiswal, SAS
Ipsita Samantarai, SAS Research & Development (India) Pvt Ltd
We spend so much time talking about grid environments, distributed jobs, and huge data volumes that we ignore the thousands of relatively tiny programs scheduled to run every night, which produce only a few small data sets but make all the difference to the users who depend on them. Individually, these jobs might place a negligible load on the system, but by their sheer number they can often account for a substantial share of overall resources, sometimes even impacting the performance of the bigger, more important jobs. SAS® programs, by their varied nature, use available resources in varied ways. Depending on whether a SAS procedure is CPU-, disk-, or memory-intensive, chunks of memory or CPU can sometimes remain unused for quite a while. Bigger jobs leave bigger chunks, and this is where being small and able to effectively exploit those leftovers can be a great advantage. We call our main optimization technique the Fast Lane: a queue configuration dedicated to jobs with consistently small SASWORK directories that, when resources allow, lets them use unused RAM in place of their SASWORK disk. The approach improves overall CPU saturation considerably, takes load off the I/O subsystem, and consistently results in improved runtimes for big jobs and small jobs alike, without requiring any changes to deployed code. This paper explores the practical aspects of implementing the Fast Lane in your environment: time-series profiling of the overnight workload to find available resource gaps, identifying candidate jobs to fill those gaps, scheduler and queue triggers, application server configuration changes, and the monitoring and control techniques that keep everything running smoothly. And, of course, it presents the very real, tangible performance gains that the Fast Lane has delivered in real-world scenarios.
Nikola Markovic, Boemska Ltd.
Greg Nelson, ThotWave
Business Intelligence users analyze business data in a variety of ways. Seventy percent of business data contains location information. For in-depth analysis, it is essential to combine location information with mapping. New analytical capabilities are added to SAS® Visual Analytics, leveraging the new partnership with Esri, a leader in location intelligence and mapping. The new capabilities enable users to enhance the analytical insights from SAS Visual Analytics. This paper demonstrates and discusses the new partnership with Esri and the new capabilities added to SAS Visual Analytics.
Murali Nori, SAS
Himesh Patel, SAS
Markov chain Monte Carlo (MCMC) algorithms are an essential tool in Bayesian statistics for sampling from various probability distributions. Many users prefer to use an existing procedure to code these algorithms, while others prefer to write an algorithm from scratch. We demonstrate the various capabilities in SAS® software to satisfy both of these approaches. In particular, we first illustrate the ease of using the MCMC procedure to define a structure. Then we step through the process of using SAS/IML® to write an algorithm from scratch, with examples of a Gibbs sampler and a Metropolis-Hastings random walk.
Chelsea Lofland, University of California, Santa Cruz
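A small PROC MCMC example of the define-a-structure approach, fitting a normal mean-variance model with vague priors:

    proc mcmc data=sashelp.class nmc=20000 seed=1234 outpost=post;
       parms mu 60 s2 100;
       prior mu ~ normal(0, var=1e6);
       prior s2 ~ igamma(0.01, scale=0.01);
       model weight ~ normal(mu, var=s2);
    run;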
For SAS® Enterprise Guide® users, sometimes macro variables and their values need to be brought over to the local workspace from the server, especially when multiple data sets or outputs need to be written to separate files in a local drive. Manually retyping the macro variables and their values in the local workspace after they have been created on the server workspace would be time-consuming and error-prone, especially when we have quite a number of macro variables and values to bring over. Instead, this task can be achieved in an efficient manner by using dictionary tables and the CALL SYMPUT routine, as illustrated in more detail below. The same approach can also be used to bring macro variables and their values from the local to the server workspace.
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
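In outline, with an assumed RPT prefix for the macro variables of interest:

    /* on the server session: save selected macro variables to a data set */
    proc sql;
       create table work.mvars as
       select name, value
       from dictionary.macros
       where scope = 'GLOBAL' and name like 'RPT%';
    quit;

    /* in the local session, after the table is downloaded: re-create them */
    data _null_;
       set work.mvars;
       call symputx(name, value);
    run;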
Propensity scores produced by logistic models have been used intensively to assist direct marketing name selection. As a result, only customers with a higher absolute likelihood to respond are mailed offers, in order to achieve cost reduction; thus, event ROI is increased. There is a fly in the ointment, however. Compared to the model-building performance time window, usually 6 to 12 months, a marketing event time period is usually much shorter. As such, this approach lacks the ability to deselect those who have a high propensity score but are unlikely to respond to an upcoming campaign. Dynamically building a complete propensity model for every upcoming campaign is nearly impossible. But incorporating time to respond, which adds another dimension to response prediction, has been of great interest to marketers. Hence, this paper presents an inventive modeling technique combining logistic regression and the Cox proportional hazards model. The objective of the fusion approach is to allow a customer's shorter time to next repurchase to compensate for a marginally lower propensity score when competing for selection opportunities. The method is accomplished using PROC LOGISTIC, PROC LIFETEST, PROC LIFEREG, and PROC PHREG to build the fusion model in a SAS® environment. This paper also touches on how to use the results to predict repurchase response by demonstrating a case of repurchase time-shift prediction on the 12-month inactive customers of a big box store retailer. The paper also shares a comparison of results between the fusion approach and logit alone. Comprehensive SAS macros are provided in the appendix.
Hsin-Yi Wang, Alliance Data Systems
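A hedged sketch of the two model pieces (the variable names are assumptions, and the paper's fusion logic that combines the scores is not shown):

    proc logistic data=work.train;
       model respond(event='1') = recency frequency monetary;
       output out=work.scored p=p_respond;      /* the propensity piece      */
    run;

    proc phreg data=work.train;
       model days_to_repurchase*censored(1) = recency frequency monetary;
       baseline out=work.surv survival=s;       /* the time-to-respond piece */
    run;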
When I help users design or debug their SAS® programs, they are sometimes unable to provide relevant SAS data sets because they contain confidential information. Sometimes, confidential data values are intrinsic to their problem, but often the problem could still be identified or resolved with innocuous data values that preserve some of the structure of the confidential data. Or the confidential values are in variables that are unrelated to the problem. While techniques for masking or disguising data exist, they are often complex or proprietary. In this paper, I describe a very simple macro, REVALUE, that can change the values in a SAS data set. REVALUE preserves some of the structure of the original data by ensuring that for a given variable, observations with the same real value have the same replacement value, and if possible, observations with a different real value have a different replacement value. REVALUE enables the user to specify which variables to change and whether to order the replacement values for each variable by the sort order of the real values or by observation order. I discuss the REVALUE macro in detail and provide a copy of the macro.
Bruce Gilsen, Federal Reserve Board
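Not the REVALUE macro itself, but a one-variable sketch of the property it guarantees, namely that equal real values always map to the same replacement:

    proc sort data=sashelp.class(keep=name) out=vals nodupkey;
       by name;
    run;

    data key;
       set vals;
       masked = cats('NAME', put(_n_, z3.));   /* NAME001, NAME002, ... */
    run;

    /* merging KEY back by NAME replaces each real value consistently */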
Business problems have become more stratified and micro-segmentation is driving the need for mass-scale, automated machine learning solutions. Additionally, deployment environments include diverse ecosystems, requiring hundreds of models to be built and deployed quickly via web services to operational systems. The new SAS® automated modeling tool allows you to build and test hundreds of models across all of the segments in your data, testing a wide variety of machine learning techniques. The tool is completely customizable, allowing you transparent access to all modeling results. This paper shows you how to identify hundreds of champion models using SAS® Factory Miner, while generating scoring web services using SAS® Decision Manager. Immediate benefits include efficient model deployments, which allow you to spend more time generating insights that might reveal new opportunities, expose hidden risks, and fuel smarter, well-timed decisions.
Jonathan Wexler, SAS
Steve Sparano, SAS
The SQL procedure is extremely powerful when it comes to summarizing and aggregating data, but it can be a little daunting for programmers who are new to SAS® or for more experienced programmers who are more familiar with using the SUMMARY or MEANS procedure for aggregating data. This hands-on workshop demonstrates how to use PROC SQL for a variety of summarization and aggregation tasks. These tasks include summarizing multiple measures for groupings of interest, combining summary and detail information (via several techniques), nesting summary functions by using inline views, and generating summary statistics for derived or calculated variables. Attendees will learn how to use a variety of PROC SQL summary functions, how to effectively use WHERE and HAVING clauses in constructing queries, and how to exploit the PROC SQL remerge. The examples become progressively more complex over the course of the workshop as you gain mastery of using PROC SQL for summarizing data.
Christianna Williams, Self-Employed
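Two of the workshop's themes in miniature, using SASHELP.SHOES:

    proc sql;
       /* group summaries filtered on an aggregate with HAVING */
       select region,
              sum(sales)  as total_sales format=dollar16.,
              mean(sales) as avg_sales   format=dollar12.
       from sashelp.shoes
       group by region
       having sum(sales) > 5000000;

       /* an inline view compares each region's average to the overall average */
       select region, avg_sales, overall
       from (select region, mean(sales) as avg_sales
             from sashelp.shoes group by region),
            (select mean(sales) as overall from sashelp.shoes);
    quit;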
It is of paramount importance for brand managers to measure and understand consumer brand associations and the mindspace their brand captures. Brands are encoded in memory on a cognitive and emotional basis. Traditionally, brand tracking has been done by surveys and feedback, resulting in a direct communication that covers the cognitive segment and misses the emotional segment. Respondents generally behave differently under observation and in solitude. In this paper, a new brand-tracking technique is proposed that involves capturing public data from social media that focuses more on the emotional aspects. For conceptualizing and testing this approach, we downloaded nearly one million tweets for three major brands--Nike, Adidas, and Reebok--posted by users. We proposed a methodology and calculated metrics (benefits and attributes) using this data for each brand. We noticed that generally emoticons are not used in sentiment mining. To incorporate them, we created a macro that automatically cleans the tweets and replaces emoticons with an equivalent text. We then built supervised and unsupervised models on those texts. The results show that using emoticons improves the efficiency of predicting the polarity of sentiments as the misclassification rate was reduced from 0.31 to 0.24. Using this methodology, we tracked the reactions that are triggered in the minds of customers when they think about a brand and thus analyzed their mind share.
Sharat Dwibhasi, Oklahoma State University
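The flavor of the emoticon-replacement step (the tweet data set, variable, and substitution words are stand-ins for the authors' macro):

    data clean;
       set work.tweets;
       text = tranwrd(text, ':)', ' smilingface ');
       text = tranwrd(text, ':(', ' frowningface ');
       text = tranwrd(text, ':D', ' laughingface ');
    run;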
Although limited to a small fraction of health care providers, the existence and magnitude of fraud in health insurance programs requires the use of fraud prevention and detection procedures. Data mining methods are used to uncover odd billing patterns in large databases of health claims history. Efficient fraud discovery can involve the preliminary step of deploying automated outlier detection techniques in order to classify identified outliers as potential fraud before an in-depth investigation. An essential component of the outlier detection procedure is the identification of proper peer comparison groups to classify providers as within-the-norm or outliers. This study refines the concept of the peer comparison group within the provider category and considers the possibility of distinct billing patterns associated with medical or surgical procedure codes identifiable by the Berenson-Eggers Type of Service (BETOS) system, which 'covers all HCPCS codes (Healthcare Common Procedure Coding System); assigns a HCPCS code to only one BETOS code; consists of readily understood clinical categories; [and] consists of categories that permit objective assignment...' (Centers for Medicare and Medicaid Services, CMS). The study focuses on the specialty of General Practice and involves two steps: first, the identification of clusters of similar BETOS-based billing patterns; and second, the assessment of the effectiveness of these peer comparison groups in identifying outliers. The working data set is a sample of the 2012 summary data on physicians active in government health care programs, made publicly available by CMS through its website. The analysis uses PROC FASTCLUS and the SAS® cubic clustering criterion approach to find the optimal number of clusters in the data, and it uses PROC ROBUSTREG to implement a multivariate adaptive threshold outlier detection method.
Paulo Macedo, Integrity Management Services
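The two steps, heavily simplified and with assumed variable names:

    /* step 1: cluster the BETOS-based billing profiles */
    proc fastclus data=work.betos_profiles maxclusters=6 out=work.clustered;
       var betos_pct1-betos_pct10;
    run;

    /* step 2: flag outliers within the peer groups via robust regression */
    proc robustreg data=work.clustered method=mm;
       class cluster;
       model total_payment = cluster total_services num_beneficiaries;
       output out=work.flagged outlier=is_outlier;
    run;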
Code review is an important tool for ensuring the quality and maintainability of an organization's ETL processes. This paper introduces a method that analyzes the metadata to assign an alert score to each job. For the jobs of each flow, the metadata is collected, giving information about the number of transformations in a job, the amount of user-written code, the number of loops, the number of empty description fields, whether naming conventions are followed, and so on. This is aggregated into complexity indicators per job. Together with other information about the jobs (e.g. the CPU-time used from logging applications) this information can be made available in SAS® Visual Analytics, creating a dashboard that highlights areas for closer inspection.
Frank Poppe, PW Consulting
Joris Huijser, De Nederlandsche Bank N.V.
Microsoft Office products play an important role in most enterprises. SAS® is combined with Microsoft Office to assist in decision making in everyday life. Most SAS users have moved from SAS® 9.3, or 32-bit SAS, to SAS® 9.4 for the exciting new features. This paper describes a few things that do not work quite the same in SAS 9.4 as they did in the 32-bit version. It discusses the reasons for the differences and the new approaches that SAS 9.4 provides. The paper focuses on how SAS 9.4 works with Excel and Access. Furthermore, this presentation summarizes methods by LIBNAME engines and by import or export procedures. It also demonstrates how to interactively import or export PC files with SAS 9.4. The issue of incompatible format catalogs is addressed as well.
Hong Zhang, Mathematica Policy Research
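Two of the 64-bit-friendly approaches discussed, with assumed file and server names:

    /* DBMS=XLSX reads .xlsx natively in SAS 9.4, with no PC Files Server */
    proc import datafile='C:\data\budget.xlsx' out=work.budget dbms=xlsx replace;
       sheet='FY2016';
    run;

    /* the PCFILES engine reaches Access databases through the PC Files Server */
    libname db pcfiles server=pcfsrv port=9621 path='C:\data\tracking.accdb';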
Every day, businesses have to remain vigilant of fraudulent activity, which threatens customers, partners, employees, and financials. Normally, networks of people or groups perpetrate deviant activity. Finding these connections is now made easier for analysts with SAS® Visual Investigator, an upcoming SAS® solution that ultimately minimizes the loss of money and preserves mutual trust among its shareholders. SAS Visual Investigator takes advantage of the capabilities of the new SAS® In-Memory Server. Investigators can efficiently investigate suspicious cases across business lines, something that has traditionally been difficult because the time required to collect and process data and to identify emerging fraud and compliance issues has been costly. Making proactive analysis accessible to analysts is now more important than ever. SAS Visual Investigator was designed with this goal in mind, and a key component is the visual social network view. This paper discusses how the network analysis view of SAS Visual Investigator, with all its dynamic visual capabilities, can make the investigative process more informative and efficient.
Danielle Davis, SAS
Stephen Boyd, SAS Institute
Ray Ong, SAS Institute
When analyzing data with SAS®, we often encounter missing or null values in data. Missing values can arise from the availability, collectibility, or other issues with the data. They represent the imperfect nature of real data. Under most circumstances, we need to clean, filter, separate, impute, or investigate the missing values in data. These processes can take up a lot of time, and they are annoying. For these reasons, missing values are usually unwelcome and need to be avoided in data analysis. There are two sides to every coin, however. If we can think outside the box, we can take advantage of the negative features of missing values for positive uses. Sometimes, we can create and use missing values to achieve our particular goals in data manipulation and analysis. These approaches can make data analyses convenient and improve work efficiency for SAS programming. This kind of creative and critical thinking is the most valuable quality for data analysts. This paper exploits real-world examples to demonstrate the creative uses of missing values in data analysis and SAS programming, and discusses the advantages and disadvantages of these methods and approaches. The illustrated methods and advanced programming skills can be used in a wide variety of data analysis and business analytics fields.
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
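One creative use from this toolbox: special missing values let a single 'missing' carry more than one meaning:

    data survey;
       missing R N;              /* allow R and N codes in the input data */
       input subject age;
       datalines;
    1 25
    2 R
    3 N
    ;

    proc freq data=survey;
       tables age / missing;     /* .R (refused) and .N (not asked) tally separately */
    run;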
Specialized access requirements to tap into event streams vary depending on the source of the events. Open-source approaches from Spark, Kafka, Storm, and others can connect event streams to big data lakes, like Hadoop and other common data management repositories. But a different approach is needed to ensure that latency and throughput are not adversely affected when processing streaming data with different open-source frameworks--that is, they need to scale. This talk distinguishes the advantages of adapters and connectors and shows how SAS® can leverage both Hadoop and YARN technologies to scale while still meeting the needs of streaming data analysis and large, distributed data repositories.
Evan Guarnaccia, SAS
Fiona McNeill, SAS
Steve Sparano, SAS
Across the languages of SAS® are many golden nuggets--functions, formats, and programming features just waiting to impress your friends and colleagues. While learning SAS for over 30 years, I have collected a few of these nuggets, and I offer a dozen more of them to you in this presentation. I presented the first dozen in a similar paper at SAS Global Forum 2015.
Peter Crawford, Crawford Software Consultancy limited
The impact of missing data is well documented in the literature: data loss (Kim & Curry, 1977) and the consequent loss of power (Raaijmakers, 1999), as well as biased estimates (Roth, Switzer, & Switzer, 1999). If left untreated, missing data is a threat to the validity of inferences made from research results. However, the application of a missing data treatment does not prevent researchers from reaching spurious conclusions. In addition to considering the research situation at hand and the type of missing data present, another important factor that researchers should consider when selecting and implementing a missing data treatment is the structure of the data. In the context of educational research, multiple-group structured data is not uncommon. Assessment data gathered from distinct subgroups of students according to relevant variables (e.g., socio-economic status and demographics) might not be independent. Thus, when comparing the test performance of subgroups of students in the presence of missing data, it is important to preserve any underlying group effect by applying separate multiple imputation with multiple groups before any data analysis. Using attitudinal (Likert-type) data from the Civics Education Study (1999), this simulation study evaluates the performance of multiple-group imputation and total-group multiple imputation in terms of item parameter invariance within structural equation modeling using the SAS® procedure CALIS and item response theory using the SAS procedure MCMC.
Patricia Rodriguez de Gil, University of South Florida
Jeff Kromrey, University of South Florida
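Separate (multiple-group) imputation is straightforward with a BY statement; the data set and variable names below are assumptions:

    proc sort data=work.civics; by ses; run;

    proc mi data=work.civics nimpute=5 seed=2016 out=work.mi_out;
       by ses;           /* impute within each group to preserve group effects */
       var q1-q10;
    run;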
We asked for it, and once again, SAS delivers! Now production in SAS® 9.4M3, the ODS EXCEL statement provides users with an extremely easy way to export output to Microsoft Excel. Using existing ODS techniques that you may already be familiar with (like predefined styles, traffic-lighting, and custom formatting) in tandem with common Excel features (like formulas, frozen headers and footers, and page setup options), you can create a publication-ready document in a snap. As a matter of fact, the new ODS EXCEL statement is such a handy tool, you may find yourself using it for more than large-scope production programs. This statement is so straightforward that it may be a good choice for outputting quick ad hoc requests as well. Join me on my adventure as I explore how to use the ODS EXCEL statement to create institutional research reports. I offer tips and techniques you can use to make your output excel-lent, too.
Gina Huff, Western Kentucky University
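A small taste of the statement, with a SASHELP table standing in for institutional data:

    ods excel file='enrollment.xlsx'
        options(sheet_name='Fall 2016' frozen_headers='on' embedded_titles='on');
    title 'Enrollment by College';
    proc print data=sashelp.class noobs; run;
    ods excel close;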
Writing music with SAS® is a simple process. Including a snippet of music in a program is a great way to signal completion of processing. It's also fun! This paper illustrates a method for translating music into SAS code using the CALL SOUND routine.
Dan Bretheim, Towers Watson
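For example, a three-note completion signal:

    data _null_;
       /* CALL SOUND(frequency-in-hertz, duration) */
       call sound(523.25, 150);   /* C5 */
       call sound(587.33, 150);   /* D5 */
       call sound(659.25, 300);   /* E5 */
    run;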
A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release of SAS® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on-demand and in real time using SAS server technology is discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vince DelGobbo, SAS Institute Inc.
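One multi-sheet idiom from this family of techniques: let BY groups split the output across worksheets:

    ods excel file='class_by_sex.xlsx' options(sheet_interval='bygroup');
    proc sort data=sashelp.class out=work.class; by sex; run;
    proc print data=work.class noobs; by sex; run;
    ods excel close;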
Customers are under increasing pressure to determine how data, reports, and other assets are interconnected, and to determine the implications of making changes to these objects. Of special interest is performing lineage and impact analysis on data from disparate systems, for example, SAS®, IBM InfoSphere DataStage, Informatica PowerCenter, SAP BusinessObjects, and Cognos. SAS provides utilities to identify relationships between assets that exist within the SAS system, and beginning with the third maintenance release of SAS® 9.4, these utilities support third-party products. This paper provides step-by-step instructions, tips, and techniques for using these utilities to collect metadata from these disparate systems, and to perform lineage and impact analysis.
Liz McIntosh, SAS
You've heard all the talk about SAS® Visual Analytics--but maybe you are still confused about how the product would work in your SAS® environment. Many customers have the same points of confusion about what they need to do with their data, how to get data into the product, how SAS Visual Analytics would benefit them, and even whether they should be considering Hadoop or the cloud. In this paper, we cover the questions we are asked most often about implementation, administration, and usage of SAS Visual Analytics.
Tricia Aanderud, Zencos Consulting LLC
Ryan Kumpfmiller, Zencos Consulting
Nick Welke, Zencos Consulting
In the consumer credit industry, privacy is key and the scrutiny increases every day. When returning files to a client, we must ensure that they are depersonalized so that the client cannot match back to any personally identifiable information (PII). This means that we must locate any values for a variable that occur on a limited number of records and null them out (i.e., replace them with missing values). When you are working with large files that have more than one million observations and thousands of variables, locating variables with few unique values is a difficult task. While the FREQ procedure and DATA step merging can accomplish the task, using first./last. BY-variable processing to locate the suspect values and hash objects to merge the data set back together might offer increased efficiency.
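A minimal sketch of the approach, assuming a character variable zip whose rare values must be nulled out (the names and the rarity threshold are hypothetical):

    proc sort data=have out=sorted;
      by zip;
    run;

    data suspect(keep=zip);           /* values occurring on fewer than 3 records */
      set sorted;
      by zip;
      if first.zip then count = 0;
      count + 1;
      if last.zip and count < 3 then output;
    run;

    data cleaned;
      if _n_ = 1 then do;
        declare hash h(dataset: 'suspect');
        h.defineKey('zip');
        h.defineDone();
      end;
      set have;
      if h.find() = 0 then call missing(zip);  /* null out the rare value */
    run;

The hash lookup avoids a second sort and merge of the full file.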
Renee Canfield, Experian
The SAS® macro language is an efficient and effective way to handle repetitive processing tasks. One such task is conducting the same DATA steps or procedures for different time periods, such as every month, quarter, or year. SAS® dates are based on the number of days between January 1st, 1960, and the target date value, which, while very simple for a computer, is not a human-friendly representation of dates. SAS dates and macro processing are not simple concepts themselves, so mixing the two can be particularly challenging, and it can be very easy to end up working with bad dates! Understanding how macros and SAS dates work individually and together can greatly improve the efficiency of your coding and data processing tasks. This paper covers the basics of SAS macro processing, SAS dates, and the benefits and difficulties of using SAS dates in macros, to ensure that you never have a bad date again.
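A minimal sketch of mixing the two, building a date boundary once and reusing it (the input data set and variable are hypothetical):

    %let run_month = %sysfunc(intnx(month, %sysfunc(today()), -1, b), date9.);
    %put NOTE: Processing the month beginning &run_month..;

    data last_month;
      set transactions;
      where txn_date >= "&run_month"d;  /* date literal built from the macro variable */
    run;

The key point is that macro variables hold only text, so a SAS date must be formatted (here with DATE9.) before it can be used in a date literal.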
Kate Burnett-Isaacs, Statistics Canada
Andrew Clapson, MD Financial Management
Many databases include default values that are set inappropriately. For example, a top-level Yes/No question might be followed by a group of check boxes, to which a respondent might indicate multiple answers. When creating the database, a programmer might choose to set a default value of 0 for each box that is not checked. One should interpret these default values with caution, however, depending on whether the top-level question is answered Yes or No or is missing. A similar scenario occurs with a Yes/No question where No is the default value if the question is not answered (but actually the value should be missing). These default values might be scattered throughout the database; there might be no pattern to their occurrence. Records without valid information should be omitted from statistical analysis. Programmers should understand the difference between missing values and invalid values (that is, incorrectly set defaults) because it is important to handle these records differently. Choosing the best method to omit records with invalid values can be difficult. Manual corrections are often prohibitively time-consuming. SAS® offers a useful approach: a combined DATA step and SQL procedure. This paper provides a step-by-step method to accomplish the task.
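A minimal sketch of the combined approach, assuming a top-level question toplevel and check-box variables box1-box3 that default to 0 (all names are hypothetical):

    proc sql;
      update responses
        set box1 = ., box2 = ., box3 = .
        where toplevel is missing or toplevel = 'No';
    quit;

When the top-level answer is No or missing, the zeros are not real answers, so the update replaces them with missing values before analysis.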
Lily Yu, Statistics Collaborative, Inc.
In today's fast paced work environment, time management is crucial to the success of the project. Sending requests to SAS® programmers to run reports every time you need to get the most current data can be a stretch sometimes on an already strained schedule. Why bother to contact the programmer? Why not build the execution of the SAS program into the report itself so that when the report is launched, real-time data is retrieved and the report shows the most recent data. This paper demonstrates that by opening an existing SAS report in Microsoft Word or Microsoft Excel, the data in the report refreshes automatically. Simple Visual Basic for Applications (VBA) code is written in Word or Excel. When an existing SAS report is opened, this VBA code calls the SAS programs that create the report from within a Microsoft Office product and overwrites the existing report data with the most current data.
Ron Palanca, Mathematica Policy Research
Today, companies are increasingly using analytics to discover new revenue-increasing and cost-saving opportunities. Many business professionals turn to SAS, a leader in business analytics software and services, to help them improve performance and make better decisions faster. Analytics are also being used in risk management, fraud detection, life sciences, sports, and many more emerging markets. To maximize their value to a business, analytics solutions need to be deployed quickly and cost-effectively, while also providing the ability to scale readily without degrading performance. Of course, in today's demanding environments, where budgets are shrinking and the number of mandates to reduce carbon footprints is growing, the solution must deliver excellent hardware utilization, power efficiency, and return on investment. To address some of these challenges, Red Hat and SAS have collaborated to recommend the best practices for configuring SAS® 9 running on Red Hat Enterprise Linux. The scope of this document includes Red Hat Enterprise Linux 6 and 7. Researched areas include the I/O subsystem, file system selection, and kernel tuning, in both bare-metal and kernel-based virtual machine (KVM) environments. In addition, we include grid configurations that run with the Red Hat Resilient Storage Add-On, which includes Global File System 2 (GFS2) clusters.
Barry Marson, Red Hat, Inc
In cooperation with the Joint Research Centre - European Commission (JRC), we have developed a number of innovative techniques to detect outliers on a large scale. In this session, we show the power of SAS/IML® Studio as an interactive tool for exploring and detecting outliers using customized algorithms that were built from scratch. The JRC uses this for detecting abnormal trade transactions on a large scale. The outliers are detected using the Forward Search, which starts from a central subset in the data and subsequently adds observations that are close to the current subset based on regression (R-student) or multivariate (Mahalanobis distance) output statistics. The implementation of this algorithm and its applications were done in SAS/IML Studio and converted to a macro for use in the IML procedure in Base SAS®.
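A small SAS/IML sketch of the multivariate ingredient of the Forward Search, ranking observations by squared Mahalanobis distance from the center (the data set name is hypothetical; the full algorithm iteratively refits on a growing subset):

    proc iml;
      use have;
      read all var _num_ into X;
      close have;
      center = mean(X);
      S      = cov(X);
      dif    = X - repeat(center, nrow(X), 1);
      d2     = vecdiag(dif * inv(S) * dif`);   /* squared Mahalanobis distances */
      call sortndx(ndx, d2, 1);                /* closest observations first */
      print (d2[ndx[1:5]])[label='Five smallest squared distances'];
    quit;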
Jos Polfliet, SAS
This paper highlights many of the major capabilities of the DATASETS procedure. It discusses how it can be used as a tool to update variable information in a SAS® data set; to provide information on data set and catalog contents; to delete data sets, catalogs, and indexes; to repair damaged SAS data sets; to rename files; to create and manage audit trails; to add, delete, and modify passwords; to add and delete integrity constraints; and more. The paper contains examples of the various uses of PROC DATASETS that programmers can cut and paste into their own programs as a starting point. After reading this paper, a SAS programmer will have practical knowledge of the many different facets of this important SAS procedure.
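A few of these tasks in one step, sketched on copies of SASHELP.CLASS so the code runs as-is:

    data work.class work.scratch;
      set sashelp.class;
    run;

    proc datasets library=work nolist;
      modify class;
        rename age = age_years;
        label age_years = 'Age (years)';
      run;
      delete scratch;
    run;
    quit;

Because PROC DATASETS changes only descriptor information, the RENAME and LABEL updates do not rewrite the data.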
Michael Raithel, Westat
The DATA step has served SAS® programmers well over the years. Now there is a new, exciting, and powerful alternative: the DS2 procedure. By introducing an object-oriented programming environment, PROC DS2 enables users to effectively manipulate complex data and efficiently manage programming through additional data types, programming structure elements, user-defined methods, and shareable packages, as well as threaded execution. This hands-on workshop was developed based on our experiences with getting started with PROC DS2 and learning to use it to access, manage, and share data in a scalable and standards-based way. It will help SAS users of all levels easily get started with PROC DS2 and understand its basic functionality by practicing with its features.
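A minimal first DS2 step of the kind the workshop starts from, showing the method-based program structure:

    proc ds2;
      data _null_;
        method init();
          dcl varchar(40) msg;
          msg = 'Hello from DS2';
          put msg;
        end;
      enddata;
    run;
    quit;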
Peter Eberhardt, Fernwood Consulting Group Inc.
Xue Yao, Winnipeg Regional Health Authority
In our book SAS for Mixed Models, we state that 'the majority of modeling problems are really design problems.' Graduate students and even relatively experienced statistical consultants can find translating a study design into a useful model to be a challenge. Generalized linear mixed models (GLMMs) exacerbate the challenge, because they accommodate complex designs, complex error structures, and non-Gaussian data. This talk covers strategies that have proven successful in design of experiment courses and in consulting sessions. In addition, GLMM methods can be extremely useful in planning experiments. This talk discusses methods to implement precision and power analysis to help choose between competing plausible designs and to assess the adequacy of proposed designs.
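A compact sketch of the exemplary-data power approach for a randomized complete block design; the data set, means, and variance components are values a planner would assume, and this illustrates the general method rather than the author's exact code:

    proc glimmix data=exemplary;        /* one row per block x treatment; y holds assumed means */
      class block trt;
      model y = trt;
      random intercept / subject=block;
      parms (0.25) (1.0) / hold=1,2;    /* assumed block and residual variances */
      ods output tests3=f_overall;
    run;

    data power;
      set f_overall;
      alpha = 0.05;
      ncp   = numdf * fvalue;                     /* noncentrality parameter */
      fcrit = finv(1 - alpha, numdf, dendf);
      power = 1 - probf(fcrit, numdf, dendf, ncp);
    run;

Rerunning the sketch for competing designs lets you compare their power before any data are collected.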
Walter Stroup, University of Nebraska, Lincoln
Inspired by Christianna Williams's paper on transitioning to PROC SQL from the DATA step, this paper aims to help SQL programmers transition to SAS® by using PROC SQL. SAS adapted the Structured Query Language (SQL) by means of PROC SQL back in SAS® 6. PROC SQL syntax closely resembles SQL. However, there are some SQL features that are not available in SAS. Throughout this paper, we outline common SQL tasks and how they might differ in PROC SQL. We also introduce useful SAS features that are not available in SQL. Topics covered are appropriate for novice SAS users.
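One PROC SQL extension the paper highlights is the CALCULATED keyword, which reuses a column alias within the same query; standard SQL would require repeating the expression:

    proc sql;
      select name,
             weight / 2.2 as weight_kg,
             calculated weight_kg / height as kg_per_inch
      from sashelp.class;
    quit;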
Barbara Ross, NA
Jessica Bennett, Snap Finance
Often SAS® users within business units find themselves running wholly within a single production environment, where non-production environments are reserved for the technology support teams to roll out patches, maintenance releases, and so on. So what is available for business units to set up and manage their intra-platform business systems development life cycle (SDLC)? Another scenario is when a new platform is provisioned, but the business units are responsible for promoting their own content from the old to the new platform. How can this be done without platform administrator roles? This paper examines some typical issues facing business users trying to manage their SAS content, and it explores the capabilities of various SAS tools available to promote and manage metadata, mid-tier content, and SAS® Enterprise Guide® projects.
Andrew Howell, ANJ Solutions P/L
Pedal-to-the-metal analytics is the notion that analytics can be quickly and easily achieved using web technologies. SAS® web technologies seamlessly integrate with each other through a web browser and with data via web APIs, enabling organizations to leapfrog traditional, manual analytic and data processes. Because of this integration (and the operational efficiencies obtained as a result), pedal-to-the-metal analytics dramatically accelerates the analytics lifecycle, which consists of these steps: 1) Data Preparation; 2) Exploration; 3) Modeling; 4) Scoring; and 5) Evaluating results. In this paper, data preparation is accomplished with SAS® Studio custom tasks (reusable drag-and-drop visual components or interfaces for underlying SAS code). This paper shows users how to create and implement these for public health surveillance. With data preparation complete, explorations of the data can be performed using SAS® Visual Analytics. Explorations provide insights for creating, testing, and comparing models in SAS® Visual Statistics to predict or estimate risk. The model score code produced by SAS Visual Statistics can then be deployed from within SAS Visual Analytics for scoring. Furthermore, SAS Visual Analytics provides the necessary dashboard and reporting capabilities to evaluate modeling results. In conclusion, the approach presented in this paper provides both new and long-time SAS users with easy-to-follow guidance and a repeatable workflow to maximize the return on their SAS investments while gaining actionable insights on their data. So, fasten your seat belts and get ready for the ride!
Manuel Figallo, SAS
Today many analysts in information technology are facing challenges working with large amounts of data. Most analysts are smart enough to write queries using correct table join conditions, text mining, index keys, and hash objects for quick data retrieval. The SAS® system is now able to work with Hadoop data storage to provide efficient processing power to SAS analytics. In this paper we demonstrate differences in data retrieval between the SQL procedure, the DATA step, and the Hadoop environment. In order to test data retrieval time, we randomly generate ten million observations with character and numeric variables using the RANUNI function in a DO loop (DO LOOP = 1 TO 1E7); changing the exponent generates any number of records. We use the most commonly used functions and procedures to retrieve records on a test data set and we illustrate real- and CPU-time processing. All PROC SQL and Hadoop queries are run in SAS® Enterprise Guide® 6.1 to record processing time. This paper includes an overview of Hadoop data architecture and describes how to connect to a Hadoop environment through SAS. It provides sample queries for data retrieval, explains the differences between using Hadoop pass-through and the Hadoop LIBNAME engine, and covers the key challenges of real-time and CPU-time comparison using the most commonly used SAS functions and procedures.
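A minimal version of the test-data generator described above (the seed and variable definitions are illustrative):

    data test;
      do loop = 1 to 1e7;                 /* change the exponent to scale the row count */
        num  = ranuni(12345);
        char = put(ceil(num * 100), z3.);
        output;
      end;
    run;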
Anjan Matlapudi, AmeriHealth Caritas Family of Companies
SAS® provides in-database processing technology in the SQL procedure, which allows the SQL explicit pass-through method to push some or all of the work to a database management system (DBMS). This paper focuses on using the SAS SQL explicit pass-through method to transform Teradata table columns into rows. There are two common approaches for transforming table columns into rows. The first approach is to create narrow tables, one for each column that requires transposition, and then use UNION or UNION ALL to append all the tables together. This approach is straightforward but can be quite cumbersome, especially when there is a large number of columns that need to be transposed. The second approach is using the Teradata TD_UNPIVOT function, which makes the wide-to-long table transposition an easy job. However, TD_UNPIVOT allows you to transpose only columns with the same data type from wide to long. This paper presents a SAS macro solution to the wide-to-long table transposition involving different column data types. Several examples are provided to illustrate the usage of the macro solution. This paper complements the author's SAS paper Performing Efficient Transposes on Large Teradata Tables Using SQL Explicit Pass-Through in which the solution of performing the long-to-wide table transposition method is discussed. SAS programmers who are working with data stored in an external DBMS and would like to efficiently transpose their data will benefit from this paper.
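A sketch of the pass-through pattern; the server, credentials, table, and column names are all hypothetical, and the TD_UNPIVOT clause should be verified against your Teradata release:

    proc sql;
      connect to teradata (server='tdprod' user=myuser password=XXXXXX);
      create table long as
      select * from connection to teradata (
        select id, attrib, val
        from TD_UNPIVOT(
          ON ( select id, jan, feb, mar from mydb.wide )
          USING
            VALUE_COLUMNS('val')
            UNPIVOT_COLUMN('attrib')
            COLUMN_LIST('jan','feb','mar')
        ) as t
      );
      disconnect from teradata;
    quit;

Because the query runs inside Teradata, only the final long table crosses the network to SAS.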
Tao Cheng, Accenture
SAS® software provides many DATA step functions that search and extract patterns from a character string, such as SUBSTR, SCAN, INDEX, and TRANWRD. Using these functions to perform pattern matching often requires many function calls to match a character position. However, using the Perl regular expression (PRX) functions or routines in the DATA step improves pattern-matching tasks by reducing the number of function calls and making the program easier to maintain. This talk, in addition to discussing the syntax of Perl regular expressions, demonstrates many real-world applications.
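A minimal sketch of the compile-once pattern the talk builds on (the phone-number pattern is illustrative):

    data _null_;
      retain re;
      if _n_ = 1 then re = prxparse('/^\(\d{3}\) \d{3}-\d{4}$/');
      input phone $char20.;
      if prxmatch(re, strip(phone)) then put phone 'is valid';
      else put phone 'is NOT valid';
      datalines;
    (919) 555-1234
    919-555-1234
    ;
    run;

Compiling the expression once with PRXPARSE and retaining it avoids recompiling on every observation.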
Arthur Li, City of Hope
SAS® 9.4 allows for extensive customization of configuration settings. These settings are changed as new products are added into a deployment and upgrades to existing products are deployed into the SAS® infrastructure. The ability to track which configuration settings change during the addition of a specific product or the installation of a particular platform maintenance release can be very useful. Often, customers run a SAS deployment step and wonder what exactly changed and in which files. The use of version control systems is becoming increasingly popular for tracking configuration settings. This paper demonstrates how to use Git, a hugely popular, open-source version control system to manage, track, audit, and revert changes to your SAS configuration infrastructure. Using Git, you can quickly list which files were changed by the addition of a product or maintenance package and inspect the differences. You can then revert to the previous settings if that becomes desirable.
Alec Fernandez, SAS
Automation of everyday activities holds the promise of consistency, accuracy, and relevancy. When applied to business operations, the additional benefits of governance, adaptability, and risk avoidance are realized. Prescriptive analytics empowers both systems and front-line workers to take the desired company action each and every time. And with data streaming from transactional systems, from the Internet of Things (IoT), and any other source, doing the right thing with exceptional processing speed produces the responsiveness that customers depend on. This talk describes how SAS® and Teradata are enabling prescriptive analytics in current business environments and in the emerging IoT.
Tho Nguyen, Teradata
Fiona McNeill, SAS
In customer relationship management (CRM) or consumer finance it is important to predict the time of repeated events. These repeated events might be a purchase, service visit, or late payment. Specifically, the goal is to find the probability density for the time to first event, the probability density for the time to second event, and so on. Two approaches are presented and contrasted. One approach uses discrete time hazard modeling (DTHM). The second, a distinctly different approach, uses a multinomial logistic model (MLM). Both DTHM and MLM satisfy an important consistency criterion, which is to provide probability densities whose average across all customers equals the empirical population average. DTHM requires fewer models if the number of repeated events is small. MLM requires fewer models if the time horizon is small. SAS® code is provided through a simulation in order to illustrate the two approaches.
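A minimal sketch of the discrete time hazard side, which fits a logistic model to a person-period data set (the data set and covariate names are hypothetical):

    proc logistic data=person_period;      /* one record per customer per period at risk */
      class period / param=ref;
      model event(event='1') = period x1 x2;
    run;

The PERIOD effects trace out the baseline hazard, and the fitted per-period probabilities yield the event-time densities that are averaged across customers.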
Bruce Lund, Magnify Analytic Solutions, Division of Marketing Associates
In a data warehousing system, change data capture (CDC) plays an important part not just in making the data warehouse (DWH) aware of the change but also in providing a means of flowing the change to the DWH marts and reporting tables so that we see the current and latest version of the truth. This, together with slowly changing dimensions (SCD), creates a cycle that runs the DWH and provides valuable insight into the history and into decision-making for the future. What if the source has no CDC? It would be an ETL nightmare to identify the exact change and report the absolute truth. If these two processes can be combined into a single process, where one transform does both jobs of identifying the change and applying the change to the DWH, then we can save significant processing time and valuable system resources. Hence, I came up with a hybrid SCD-with-CDC approach. My paper focuses on sources that DO NOT have CDC and on performing SCD Type 2 on such records without data duplication or increased processing times.
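In miniature, the change-detection half of the idea can use an MD5 digest of the attribute columns (this is an illustration, not the author's transform; the key and attribute names are hypothetical):

    data stage;
      set incoming;
      length row_hash $32;
      row_hash = put(md5(catx('|', id, a, b, c)), $hex32.);
    run;

    proc sql;
      /* rows that are new or whose attributes changed */
      create table changed as
      select s.*
      from stage s
           left join dim d
             on s.id = d.id and d.current_flag = 1
      where d.id is null or s.row_hash ne d.row_hash;

      /* SCD Type 2: close out the superseded versions */
      update dim
        set valid_to = today(), current_flag = 0
        where current_flag = 1 and id in (select id from changed);
    quit;

Comparing one digest per row replaces column-by-column comparisons, which is where the single-transform savings come from.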
Vishant Bhat, University of Newcastle
Tony Blanch, SAS Consultant
Horizontal data sorting is a very useful SAS® technique in advanced data analysis when you are using SAS programming. Two years ago (SAS® Global Forum Paper 376-2013), we presented and illustrated various methods and approaches to perform horizontal data sorting, and we demonstrated its valuable application in strategic data reporting. However, this technique can also be used as a creative analytic method in advanced business analytics. This paper presents and discusses its innovative and insightful applications in product purchase sequence analyses such as product opening sequence analysis, product affinity analysis, next best offer analysis, time-span analysis, and so on. Compared to other analytic approaches, the horizontal data sorting technique has the distinct advantages of being straightforward, simple, and convenient to use. This technique also produces easy-to-interpret analytic results. Therefore, the technique can have a wide variety of applications in customer data analysis and business analytics fields.
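A minimal sketch of a horizontal sort, ordering each customer's product columns within the row (the variable names are hypothetical):

    data sorted_within_row;
      set purchases;
      array prod[5] $ prod1-prod5;
      call sortc(of prod[*]);   /* sorts the five values left to right within each observation */
    run;

Once values are ordered within the row, sequence-style comparisons across columns reduce to simple array logic.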
Justin Jia, Trans Union Canada
Shan Shan Lin, CIBC
A bubble map is a useful tool for identifying trends and visualizing the geographic proximity and intensity of events. This session describes how to use readily available map data sets in SAS® along with PROC GEOCODE and PROC GMAP to turn a data set of addresses and events into a bubble map of the United States with scaled bubbles depicting the location and intensity of events.
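A minimal sketch of the geocoding step, assuming the input data set has a ZIP variable (the ZIP method looks up centroids in SASHELP.ZIPCODE):

    proc geocode data=events out=events_geo method=zip;
    run;

The X and Y coordinates added to EVENTS_GEO can then be projected with the map data set and drawn as scaled bubbles via PROC GMAP with an Annotate data set, as the session demonstrates.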
Caroline Walker, Warren Rogers Associates
Regression is used to examine the relationship between one or more explanatory (independent) variables and an outcome (dependent) variable. Ordinary least squares regression models the effect of explanatory variables on the average value of the outcome. Sometimes, we are more interested in modeling the median value or some other quantile (for example, the 10th or 90th percentile). Or, we might wonder if a relationship between explanatory variables and outcome variable is the same for the entire range of the outcome: Perhaps the relationship at small values of the outcome is different from the relationship at large values of the outcome. This presentation illustrates quantile regression, comparing it to ordinary least squares regression. Topics covered will include: a graphical explanation of quantile regression, its assumptions and advantages, using the SAS® QUANTREG procedure, and interpretation of the procedure's output.
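A small example with the QUANTREG procedure, modeling the 10th, 50th, and 90th percentiles of birth weight with the SASHELP.BWEIGHT sample data:

    proc quantreg data=sashelp.bweight;
      model weight = momage momsmoke / quantile=0.1 0.5 0.9;
    run;

Each quantile gets its own coefficients, so you can see directly whether covariate effects differ across the outcome distribution.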
Ruth Croxford, Institute for Clinical Evaluative Sciences
One of the most important factors driving the success of requirements-gathering can be easily overlooked. Your user community needs to have a clear understanding of what is possible: from different ways to represent a hierarchy to how visualizations can drive an analysis to newer, but less common, visualizations that are quickly becoming standard. Discussions about desktop access versus mobile deployment and/or which users might need more advanced statistical reporting can lead to a serious case of option overload. One of the best cures for option overload is to provide your user community with access to template reports they can explore themselves. In this paper, we describe how you can take a single rich data set and build a set of template reports that demonstrate the full functionality of SAS® Visual Analytics, a suite of the most common, most useful SAS Visual Analytics report structures, from high-level dashboards to statistically deep dynamic visualizations. We show exactly how to build a dozen template reports from a single data source, simultaneously representing options for color schemes, themes, and other choices to consider. Although this template suite approach can apply to any industry, our example data set will be publicly available data from the Home Mortgage Disclosure Act, de-identified data on mortgage loan determinations. Instead of beginning requirements-gathering with a blank slate, your users can begin the conversation with, I would like something like Template #4, greatly reducing the time and effort required to meet their needs.
Elliot Inman, SAS
Michael Drutar, SAS
Creating web applications and web services can sometimes be a daunting task. With the ever-changing facets of web programming, it can be even more challenging. For example, client-side web programming using applets was very popular almost 20 years ago, only to be replaced with server-side programming techniques. With all of the advancements in JavaScript libraries, it seems client-side programming is again making a comeback. Amidst all of the changing web technologies, surprisingly, one of the most powerful tools I have found, providing exceptional capabilities across all types of web development techniques, has been SAS/IntrNet®. Traditionally seen as a server-side programming tool for generating complete web pages based on SAS® content, SAS/IntrNet, coupled with jQuery AJAX or AJAX alone, also has the ability to provide client-side web programming techniques, as well as RESTful web service implementations. I hope to show that with the combination of these tools, including the JSON procedure in SAS® 9.4, simple yet powerful web services and dynamic, content-rich web pages can be created easily and rapidly.
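A minimal sketch of the SAS® 9.4 JSON procedure mentioned above, which turns a data set into JSON that a jQuery AJAX call can consume (the output path is illustrative):

    proc json out='/tmp/class.json' pretty;
      export sashelp.class;
    run;

Writing such output to a web-accessible location gives client-side JavaScript a lightweight data feed.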
Jeremy Palbicki, Mayo Clinic
JavaScript Object Notation (JSON) has quickly become the de-facto standard for data transfer on the Internet, due to an increase in both web data and the usage of full-stack JavaScript. JSON has become dominant in the emerging technologies of the web today, such as the Internet of Things and the mobile cloud. JSON offers a light and flexible format for data transfer, and can be processed directly from JavaScript without the need for an external parser. The SAS® JSON procedure lacks the ability to read in JSON. However, the addition of the GROOVY procedure in SAS® 9.3 allows for execution of Java code from within SAS, allowing for JSON data to be read into a SAS data set through XML conversion. This paper demonstrates the method for parsing JSON into data sets with Groovy and the XML LIBNAME Engine, all within Base SAS®.
John Kennedy, Mesa Digital
Logistic regression is one of the most widely used regression models in statistics. Its dependent variable is a discrete variable with at least two levels. It measures the relationship between categorical dependent variables and independent variables and predicts the likelihood of having the event associated with an outcome variable. Variable reduction and screening are techniques that remove redundant independent variables and filter out independent variables with less predictive power. For several reasons, variable reduction and screening are important and critical, especially when there are hundreds of covariates available that could possibly fit into the model. These techniques decrease model estimation time, help avoid non-convergence, and support unbiased inference about the outcome. This paper reviews various approaches for variable reduction, including traditional methods starting from univariate single logistic regressions and methods based on variable clustering techniques. It also presents hands-on examples using SAS/STAT® software and illustrates three selection nodes available in SAS® Enterprise Miner™: the variable selection node, the variable clustering node, and the decision tree node.
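A minimal sketch of the variable clustering approach to reduction (the data set and variable list are hypothetical):

    proc varclus data=modeling maxeigen=0.7 short;
      var x1-x100;
    run;

A common follow-up is to keep one representative per cluster, typically the variable with the lowest 1-R**2 ratio, before fitting the logistic model.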
Xinghe Lu, Amerihealth Caritas
SAS® Visual Analytics offers many new and exciting ways to look at your data. Users are able to load data at will to explore and ask questions of their data. But what happens if you need to prevent users from viewing all data? What can you do to prevent users from loading too much data? What happens if a user loads data that exceeds the hardware capacity of your environment? This session covers practical ways to limit available resources and secure SAS LASR data. Real-world scenarios are covered and attendees can participate in an open discussion.
David Franklin, SAS
SharePoint is a web application framework and platform developed by Microsoft, mostly used for content and document management by mid-size businesses and large departments. Linking SAS® with SharePoint combines the power of these two into one. This paper shows users how to send PDF reports and files in other formats (such as Microsoft Excel files, HTML files, JPEG files, zipped files, and so on) from SAS to a SharePoint Document Library. The paper demonstrates how to configure SharePoint Document Library settings to receive files from SAS. A couple of SAS code examples are included to show how to send files from SAS to SharePoint. The paper also introduces a framework for creating data visualization on SharePoint by feeding SAS data into HTML pages on SharePoint. An example of how to create an infographics SharePoint page with data from SAS is also provided.
Xiaogang Tang, Wyndham Worldwide
Sometimes, the relationship between an outcome (dependent) variable and the explanatory (independent) variable(s) is not linear. Restricted cubic splines are a way of testing the hypothesis that the relationship is not linear or summarizing a relationship that is too non-linear to be usefully summarized by a linear relationship. Restricted cubic splines are just a transformation of an independent variable. Thus, they can be used not only in ordinary least squares regression, but also in logistic regression, survival analysis, and so on. The range of values of the independent variable is split up, with knots defining the end of one segment and the start of the next. Separate curves are fit to each segment. Overall, the splines are defined so that the resulting fitted curve is smooth and continuous. This presentation describes when splines might be used, how the splines are defined, choice of knots, and interpretation of the regression results.
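A minimal sketch of restricted (natural) cubic splines via the EFFECT statement in PROC LOGISTIC, with five knots placed at percentiles; the data set and variable names are hypothetical, and the spline options should be checked against your SAS/STAT release:

    proc logistic data=study;
      effect spl_age = spline(age / naturalcubic basis=tpf(noint)
                              knotmethod=percentilelist(5 27.5 50 72.5 95));
      model outcome(event='1') = spl_age;
    run;

Because the spline is just a transformation of AGE, the same EFFECT statement carries over to other procedures that support it.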
Ruth Croxford, Institute for Clinical Evaluative Sciences
Create a model with SAS® Contextual Analysis, deploy it to your Hadoop cluster, and run it using MapReduce driven by SAS® Code Accelerator--not alongside Hadoop, but in it. Hadoop took the world by storm as a cost-efficient software framework for storing data, but Hadoop is more than that: it's an analytics platform. And analytics is more than just crunching numbers; it's also about gaining knowledge from your organization's unstructured data. SAS has the tools to do both. In this paper, we'll demonstrate how to access your unstructured data, automatically create a text analytics model, deploy that model in Hadoop, and run SAS Code Accelerator to give you insights into the big and unstructured data lying dormant in Hadoop. We'll show you how the insights gained can help you make better-informed decisions for your organization.
David Bultman, SAS
Adam Pilz, SAS
High-quality documentation of SAS® code is standard practice in multi-user environments for smoother group collaborations. One of the documentation items that facilitate program sharing and retrospective review is a header section at the beginning of a SAS program highlighting the main features of the program, such as the program's name, its creation date, the program's aims, the programmer's identification, and the project title. In this header section, it is helpful to keep a list of the inputs and outputs of the SAS program (for example, SAS data sets and files that the program used and created). This paper introduces SAS-IO, a browser-based HTML/JavaScript tool that can automate production of such an input/output list. This can save the programmers' time, especially when working with long SAS programs.
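For reference, a header of the kind such a tool is designed to parse (the field names here are generic; the tool's expected layout may differ):

    /*--------------------------------------------------------------
      Program : revenue_summary.sas
      Author  : jdoe
      Created : 01JAN2016
      Purpose : Summarize monthly revenue by region
      Inputs  : rawdata.transactions, lookup.region_map
      Outputs : report.revenue_summary
    --------------------------------------------------------------*/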
Mohammad Reza Rezai, Institute for Clinical Evaluative Sciences
This presentation describes the new SAS® Customer Intelligence 360 solution for multi-armed bandit analysis of controlled experiments. Multi-armed bandit analysis has been in existence since the 1950s and is sometimes referred to as the K- or N-armed bandit problem. In this problem, a gambler at a row of slot machines (sometimes known as 'one-armed bandits') has to decide which machines to play, as well as the frequency and order in which to play each machine, with the objective of maximizing returns. In practice, multi-armed bandits have been used to model the problem of managing research projects in a large organization. However, the same technology can be applied within the marketing space where, for example, variants of campaign creatives or variants of customer journey sub-paths are compared to better understand customer behavior by collecting their respective response rates. Distribution weights of the variants are adjusted as time passes and conversions are observed to reflect what customers are responding to, thus optimizing the performance of the campaign. The SAS Customer Intelligence 360 analytic engine collects data at regularly scheduled time intervals and processes it through a simulation engine to return an updated weighting schema. The process is automated in the digital space and led through the solution's content serving and execution engine for interactive communication channels. Both marketing gains and costs are studied during the process to illustrate the business value of multi-armed bandit analysis. The presentation also shows how results from traditional A/B testing are coupled to feed the multi-armed bandit engine.
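For intuition, here is a generic epsilon-greedy simulation of the bandit idea in a DATA step; this is an illustration only, not the algorithm inside SAS Customer Intelligence 360, and the arm conversion rates are invented:

    data _null_;
      call streaminit(2016);
      array p[3] _temporary_ (0.02 0.035 0.05);   /* true, unknown conversion rates */
      array wins[3]   w1-w3;
      array trials[3] t1-t3;
      do i = 1 to 3;  wins[i] = 0;  trials[i] = 0;  end;
      do rep = 1 to 100000;
        if rand('uniform') < 0.1 then arm = ceil(3 * rand('uniform'));   /* explore */
        else do;                                                         /* exploit */
          arm = 1;
          do i = 2 to 3;
            if divide(wins[i], trials[i]) > divide(wins[arm], trials[arm]) then arm = i;
          end;
        end;
        trials[arm] = trials[arm] + 1;
        wins[arm]   = wins[arm] + rand('bernoulli', p[arm]);
      end;
      do i = 1 to 3;
        share = trials[i] / 100000;
        put 'Arm ' i ': share of traffic = ' share percent8.1;
      end;
    run;

Over time the best-converting arm absorbs most of the traffic, which is the behavior the automated weighting schema produces.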
Thomas Lehman, SAS
If, like the character from George Orwell's novel, you need to control what your users are doing and also need to report on it, then this is the paper for you. Using a combination of resources available in SAS® 9.4, any administrator can control what users are allowed to perform within SAS, and then can create comprehensive and customized reports detailing what was done. This paper discusses how metadata roles can be used to control users' capabilities. Particular attention is given to the user roles available to SAS® Enterprise Guide® and the SAS® Add-in for Microsoft Office, as well as administrator roles available to the SAS® Management Console. Best practices are discussed when it comes to the creation of these roles and how they should be applied to groups and users. The second part of this paper explores how to monitor SAS® utilization through SAS® Environment Manager. It investigates the data collected through its extended monitoring and how this can be harvested to create reports that can track sessions launched, procedures used, data accessed, and other purpose-built reports. This paper is for SAS Administrators who are responsible for the maintenance of systems, system architects who need to design new deployments, and users interested in understanding how to use SAS in a secure organization.
Elena Muriel, Amadeus Software Limited
SAS® Customer Intelligence 360 enables marketers to create activity maps to delight customers with a consistent experience on digital channels such as web, mobile, and email. Activity maps enable the user to create a customer journey across digital channels and assign conversion measures as individual customers and prospects navigate the customer experience. This session details how those interactions are constructed in terms of what types of interaction tasks are supported and how those interaction tasks can be related to each other in a visual paradigm with built-in goal splits, analytics, wait periods, and A/B testing. Business examples are provided for common omni-channel problems such as cross-selling and customer segmentation. In addition, we cover the integration of SAS® Marketing Automation and other marketing products, which enables sites to leverage their current segments and treatments for the digital channels.
Mark Brown, SAS
Brian Chick, SAS
SAS® users are always surprised to discover their programs contain bugs (or errors). In fact, when asked, users will emphatically stand by their programs and logic by saying they are error free. But the vast number of experiences, along with the realities of writing code, say otherwise. Errors can appear anywhere in program code, introduced accidentally as developers and programmers write it. No matter where an error occurs, the overriding sentiment among most users is that debugging SAS programs can be a daunting and humbling task. This presentation explores the world of SAS errors, providing essential information about the various error types. Attendees learn how errors are created, their symptoms, identification techniques, and how to apply effective techniques to better understand, repair, and enable program code to work as intended.
Kirk Paul Lafler, Software Intelligence Corporation
Historically, administration of your SAS® Grid Manager environment has required interaction with a number of disparate applications including Platform RTM for SAS, SAS® Management Console, and command line utilities. With the third maintenance release of SAS® 9.4, you can now use SAS® Environment Manager for all monitoring and management of your SAS Grid. The new SAS Environment Manager interface gives you the ability to configure the Load Sharing Facility (LSF), manage and monitor high-availability applications, monitor overall SAS Grid health, define event-based alerts, and much, much more through a single, unified, web-based interface.
Scott Parrish, SAS
Paula Kavanagh, SAS Institute, Inc.
Linda Zeng, SAS Institute, Inc.
It is not uncommon to hear SAS® administrators complain that their IT department and users just don't get it when it comes to metadata and security. For the administrator or user not familiar with SAS, understanding how SAS interacts with the operating system, the file system, external databases, and users can be confusing. This paper walks you through all the basic metadata relationships and how they are created on a SAS® Enterprise Office Analytics installation in a Windows environment. This guided tour unravels the mystery of how the host system, external databases, and SAS work together to give users what they need, while reliably enforcing the appropriate security.
Charyn Faenza, F.N.B. Corporation
The purpose of this paper is to provide an overview of SAS® metadata security for new or inexperienced SAS administrators. The focus of the discussion is on identifying the most common metadata security objects such as access control entries (ACEs), access control templates (ACTs), metadata folders, authentication domains, etc. and describing how these objects work together to secure the SAS environment. Based on a standard SAS® Enterprise Office Analytics for Midsize Business installation in a Windows environment, this paper walks through a simple example of securing a metadata environment, which demonstrates how security is prioritized, the impact of each security layer, and how conflicts are resolved.
Charyn Faenza, F.N.B. Corporation
Mobile devices are an integral part of a business professional's life. These mobile devices are getting increasingly powerful in terms of processor speeds and memory capabilities. Business users can benefit from a more analytical visualization of the data along with their business context. The new SAS® Mobile BI contains many enhancements that facilitate the use of SAS® Analytics in the newest version of SAS® Visual Analytics. This paper demonstrates how to use the new analytical visualization that has been added to SAS Mobile BI from SAS Visual Analytics, for a richer and more insightful experience for business professionals on the go.
Murali Nori, SAS
Spontaneous combustion describes combustion that occurs without an external ignition source. With the right combination of fire tetrahedron components--including fuel, oxidizer, heat, and chemical reaction--it can be a deadly yet awe-inspiring phenomenon, and differs from traditional combustion that requires a fire source, such as a match, flint, or spark plugs (in the case of combustion engines). Likewise, SAS® code often requires a 'spark' the first time it is run or when it is run within a new environment. Thus, SAS® programs might operate correctly in an organization's original development environment, but might fail in its production environment until folders are created, SAS® libraries are assigned, control tables are constructed, or configuration files are built or modified. And, if software portability is problematic within a single organization, imagine the complexities that exist when SAS code is imported from a blog, white paper, textbook, or other external source into one's own infrastructure. The lack of software portability and the complexities of initializing new code often compel development teams to build code from scratch rather than attempting to rehabilitate or customize existing code to run in their environment. A solution is to develop SAS code that flexibly builds and validates its environment and required components during execution. To that end, this text describes techniques that increase the portability, reusability, and maintainability of SAS code by demonstrating self-extracting, spontaneously combustible code that requires no spark.
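A minimal sketch of code that builds its own environment, creating a needed subdirectory if it does not already exist (the macro and directory names are illustrative):

    %macro ensure_dir(parent, child);
      %if not %sysfunc(fileexist(&parent./&child)) %then %do;
        %let newdir = %sysfunc(dcreate(&child, &parent));
        %if %length(&newdir) = 0 %then
          %put ERROR: Could not create &parent./&child..;
      %end;
    %mend ensure_dir;

    %ensure_dir(%sysfunc(pathname(work)), staging)

Validating and constructing prerequisites at run time is what lets the same program ignite in any environment without a manual spark.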
Troy Hughes, Datmesis Analytics
Over the past two years, the Analytics group at 89 Degrees has completely overhauled the toolset we use in our day-to-day work. We implemented SAS® Visual Analytics, initially to reduce the time to create new reports, and then to increase access to the data so that users not familiar with SAS® can create their own explorations and reports. SAS Visual Analytics has become a collaboration tool between our analysts and their business partners, with proportionally more time spent on the interpretation. (We show an example of this in this presentation.) Flush with success, we decided to tackle another area where processing times were longer than we would like, namely weblog data. We were treating weblog data in one of two ways: (1) creating a structured view of unstructured data by saving a handful of predefined variables from each session (for example, session and customer identifiers, page views, time on site, and so on), or (2) storing the granular weblog data in a Hadoop environment and relying on our data management team to fulfill data requests. We created a business case for SAS/ACCESS® Interface for Hadoop, invested in extra hardware, and created a big data environment that could be accessed directly from SAS by the analysts. We show an example of how we created new variables and used them in a logistic regression analysis to predict combined online and offline shopping rates using online and offline historic behavior. Our next dream is to bring all of that together by linking SAS Visual Statistics to a Hadoop environment other than the SAS® LASR™ Analytic server. We share our progress, and hopefully our success, as part of the session.
Rosie Poultney, 89 Degrees
In today's Business Intelligence world, self-service, which allows an everyday knowledge worker to explore data and personalize business reports without being tech-savvy, is a prerequisite. The new release of SAS® Visual Statistics introduces an HTML5-based, easy-to-use user interface that combines statistical modeling, business reporting, and mobile sharing into a one-stop self-service shop. The backbone analytic server of SAS Visual Statistics is also updated, allowing an end user to analyze data of various sizes in the cloud. The paper illustrates this new self-service modeling experience in SAS Visual Statistics using telecom churn data, including the steps of identifying distinct user subgroups using decision tree, building and tuning regression models, designing business reports for customer churn, and sharing the final modeling outcome on a mobile device.
Xiangxiang Meng, SAS
Don Chapman, SAS
Cheryl LeSaint, SAS
The third maintenance release of SAS® 9.4 was a huge release with respect to the interoperability between SAS® and Hadoop, the industry standard for big data. This talk brings you up-to-date with where we are: more distributions, more data types, more options. Come and learn about the exciting new developments for blending your SAS processing with your shared Hadoop cluster. Grid processing. Check. SAS data sets on HDFS. Check.
Paul Kent, SAS
Coronal mass ejections (CMEs) are massive explosions of magnetic field and plasma from the Sun. While responsible for the northern lights, these eruptions can cause geomagnetic storms and cataclysmic damage to Earth's telecommunications systems and power grid infrastructures. Hence, it is imperative to construct highly accurate predictive processes to determine whether an incoming CME will produce devastating effects on Earth. One such process, called stacked generalization, trains a variety of models, or base-learners, on a data set. Then, using the predictions from the base-learners, another model is trained to learn from the metadata. The goal of this meta-learner is to deduce information about the biases from the base-learners to make more accurate predictions. Studies have shown success in using linear methods, especially within regularization frameworks, at the meta-level to combine the base-level predictions. Here, SAS® Enterprise Miner™ 13.1 is used to reinforce the advantages of regularization via the Least Absolute Shrinkage and Selection Operator (LASSO) on this type of metadata. This work compares the LASSO model selection method to other regression approaches when predicting the occurrence of strong geomagnetic storms caused by CMEs.
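In miniature, LASSO selection over base-learner predictions at the meta-level might look like this (the data set and predictor names are hypothetical; the paper's actual comparison runs in SAS® Enterprise Miner™):

    proc glmselect data=meta_predictions seed=2016;
      model storm = p1-p9 / selection=lasso(choose=cv stop=none);
    run;

CHOOSE=CV picks the shrinkage step by cross-validated prediction error, which is the regularization benefit under study.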
Taylor Larkin, The University of Alabama
Denise McManus, University of Alabama
This paper examines the various sampling options that are available in SAS® through PROC SURVEYSELECT. We do not cover all of the possible sampling methods or options that PROC SURVEYSELECT features. Instead, we look at Simple Random Sampling, Stratified Random Sampling, Cluster Sampling, Systematic Sampling, and Sequential Random Sampling.
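Two minimal examples, a simple random sample and a stratified one, using SASHELP.CLASS so the code runs as-is:

    proc surveyselect data=sashelp.class out=srs method=srs n=5 seed=27;
    run;

    proc sort data=sashelp.class out=class;
      by sex;
    run;

    proc surveyselect data=class out=strat method=srs n=(4 4) seed=27;
      strata sex;
    run;

The STRATA statement requires the input sorted by the strata variables, and N=(4 4) requests four selections per stratum.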
Rachael Becker, University of Central Florida
Drew Doyle, University of Central Florida
Just as there are many ways to solve any problem in any facet of life, most SAS® programming problems have more than one potential solution. Each solution has tradeoffs; a complex program might execute very quickly but prove challenging to maintain, while a program designed for ease of use might require more resources for development, execution, and maintenance. Too often, those tasked with producing the results are given a delivery date and an estimated development time in advance, but no guidelines for efficiency expectations. This paper provides ways to improve the efficiency of your SAS® programs. It suggests coding techniques, provides guidelines for their use, and shows the results of experimentation to compare various coding techniques, with examples of acceptable and improved ways to accomplish the same task.
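One simple way to compare candidate techniques, shown here as an illustration rather than taken from the paper, is to bracket each version with timing code:

    options fullstimer;   /* adds memory and I/O statistics to the log */
    %let t0 = %sysfunc(datetime());

    /* ... candidate technique under test ... */

    %put NOTE: Elapsed seconds: %sysevalf(%sysfunc(datetime()) - &t0);

Running each version against the same data several times smooths out caching effects before you compare the numbers.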
Andrew Kuligowski, HSN
Swati Agarwal, Optum
With the ever-growing global tourism sector, the hotel industry is also flourishing by leaps and bounds, and tourists' standards for quality service keep rising. In today's cyber age, where so many travelers join online communities, the influence of word of mouth has steadily increased. According to a recent survey, approximately 46% of travelers look for online reviews before traveling. A one-star rating from a user can influence peer consumers more than the five-star brand image a hotel markets for itself. In this paper, we segment customers and create clusters based on their reviews. We then create an online survey targeting the interests of the customers in those different clusters, which helps hoteliers identify guests' interests and their hospitality expectations of a hotel. For this paper, we use a data set of reviews for 4,096 different hotels, with a total of 60,239 user reviews.
Saurabh Nandy, Oklahoma State University
Neha Singh, Oklahoma State University
Whether it is a question of security or a question of centralizing the SAS® installation to a server, the need to phase out SAS in the PC environment has never been so common. On the surface, this type of migration seems very simple and smooth. However, migrating SAS from a PC environment to a SAS server environment (SAS® Enterprise Guide®) is really easy to underestimate. This paper presents a way to set up the winning conditions to achieve this goal without falling into the common traps. Based on a successful conversion with a previous employer, I have identified a high-level roadmap with specific objectives that will guide people in this important task.
Mathieu Gaouette, Videotron
Sampling, whether with or without replacement, is an important component of the hypothesis testing process. In this paper, we demonstrate the mechanics and outcome of sampling without replacement, where sample values are not independent. In other words, what we get in the first sample affects what we can get for the second sample, and so on. We use the popular variant of poker known as No Limit Texas Hold'em to illustrate.
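As a worked example of the dependence, the chance of being dealt two aces is (4/52) x (3/51), because the first card drawn changes the deck for the second:

    data _null_;
      p_aa = (4/52) * (3/51);   /* about 0.45 percent */
      put p_aa= percent8.2;
    run;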
Dan Bretheim, Towers Watson
Similarity analysis is used to classify an entire time series by type. In this method, a distance measure is calculated to reflect the difference in form between two ordered sequences, such as are found in time series data. The mutual differences between many sequences or time series can be compiled to a similarity matrix, similar in form to a correlation matrix. When cluster methods are applied to a similarity matrix, time series with similar patterns are grouped together and placed into clusters distinct from others that contain times series with different patterns. In this example, similarity analysis is used to classify supernovae by the way the amount of light they produce changes over time.
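A minimal sketch with the SAS/ETS SIMILARITY procedure, comparing one light curve to several others; the data set and series names are hypothetical, and the measure option should be checked against your release:

    proc similarity data=lightcurves outsum=simmatrix;
      input sn_ref;
      target sn_a sn_b sn_c / measure=msqrdev;
    run;

Accumulating such measures over all pairs builds the similarity matrix that the clustering step consumes.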
David Corliss, Ford Motor Company
Missing data is something that we cannot always prevent. Data can be missing when subjects refuse to answer a sensitive question or for fear of embarrassment. Researchers often assume that their data are missing completely at random or missing at random. Unfortunately, we cannot test whether that mechanism assumption is satisfied, because the missing values themselves cannot be observed. Alternatively, we can run simulations in SAS® to observe the behavior of missing data under different assumptions: missing completely at random, missing at random, and whether the mechanism is ignorable. We compare the effects of imputation methods after assigning a set of variables of interest to missing. The idea of imputation is to see how the substituted values in a data set affect further studies. This lets the audience decide which method(s) would be best for approaching a data set when it comes to missing data.
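A minimal sketch of how such missingness can be injected in a simulation (the variable names are hypothetical):

    data with_missing;
      set complete;
      call streaminit(123);
      /* MCAR: missingness unrelated to anything */
      x_mcar = x;
      if rand('uniform') < 0.2 then call missing(x_mcar);
      /* MAR: missingness depends on the observed covariate z */
      x_mar = x;
      if rand('uniform') < logistic(-2 + 0.5*z) then call missing(x_mar);
    run;

Because the complete values are retained, estimates after imputation can be compared against the truth under each mechanism.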
Danny Rithy, California Polytechnic State University
Soma Roy, California Polytechnic State University
In forecasting, there are often situations where several time series are interrelated: components of one time series can transition into and from other time series. A Markov chain forecast model may readily capture such intricacies through the estimation of a transition probability matrix, which enables a forecaster to forecast all the interrelated time series simultaneously. A Markov chain forecast model is flexible in accommodating various forecast assumptions and structures. Implementation of a Markov chain forecast model is straightforward using SAS/IML® software. This paper demonstrates a real-world application in forecasting a community supervision caseload in Washington State. A Markov model was used to forecast five interrelated time series in the midst of turbulent caseload changes. This paper discusses the considerations and techniques in building a Markov chain forecast model at each step. Sample code using SAS/IML is provided. Anyone interested in adding another tool to their forecasting technique toolbox will find that the Markov approach is useful and has some unique advantages in certain settings.
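A minimal SAS/IML sketch of the mechanics; the transition matrix and starting caseloads are invented for illustration:

    proc iml;
      P = {0.90 0.10,
           0.20 0.80};     /* P[i,j] = probability of moving from state i to state j */
      x = {1000, 500};     /* current counts in each state */
      do t = 1 to 12;
        x = P` * x;        /* one-step-ahead forecast for all series at once */
      end;
      print x[label='Forecast after 12 periods'];
    quit;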
Gongwei Chen, Caseload Forecast Council
Are you frustrated with manually setting options to control your SAS® Display Manager sessions, but daunted every time you look at all the places where options and window layouts can be set? In this paper, we look at the various files SAS accesses when starting, what can (and cannot) go into them, and what takes precedence after all are executed. We also look at the SAS Registry and how to programmatically change settings. By the end of the paper, you will be comfortable knowing where to make the changes that best fit your needs.
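One programmatic route is PROC REGISTRY, which can round-trip registry settings through a text file (the paths are illustrative):

    proc registry export='sasregs.txt';   /* dump the current settings */
    run;

    /* edit the exported file, then load the changes back */
    proc registry import='sasregs.txt';
    run;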
Peter Eberhardt, Fernwood Consulting Group Inc.
The report looks simple enough--a bar chart and a table, like something created with the GCHART and REPORT procedures. But there are some twists to the reporting requirements that make those procedures not quite flexible enough. It's often the case that the programming tools and techniques we envision using for a project, or are most comfortable with, aren't necessarily the best for the job. Fortunately, SAS® provides many ways to get results. Rather than procedure-based output, the solution here was to mix 'old' and 'new' DATA step-based techniques to solve the problem. Annotate data sets are used to create the bar chart, and the Report Writing Interface (RWI) is used to create the table. Without a whole lot of additional code, you gain an enormous amount of flexibility.
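A minimal Report Writing Interface sketch, building a small table cell by cell (the output file name is illustrative):

    ods html file='rwi_demo.html';
    data _null_;
      dcl odsout ob();
      ob.table_start();
        ob.row_start();
          ob.format_cell(data: 'Metric');
          ob.format_cell(data: 'Value');
        ob.row_end();
        ob.row_start();
          ob.format_cell(data: 'Rows processed');
          ob.format_cell(data: 12345);
        ob.row_end();
      ob.table_end();
    run;
    ods html close;

Because every cell is emitted from DATA step logic, layout decisions can depend on the data itself, which is exactly the flexibility this report required.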
Pete Lund, Looking Glass Analytics
Big data is often distinguished as encompassing high volume, high velocity, or high variability of data. While big data can signal big business intelligence and big business value, it can also wreak havoc on systems and software ill-prepared for its profundity. Scalability describes the ability of a system or software to adequately meet the needs of additional users or its ability to use additional processors or resources to fulfill those added requirements. Scalability also describes the adequate and efficient response of a system to increased data throughput. Because sorting data is one of the most common and most resource-intensive operations in any software language, inefficiencies or failures caused by big data often are first observed during sorting routines. Much SAS® literature has been dedicated to optimizing big data sorts for efficiency, including minimizing execution time and, to a lesser extent, minimizing resource usage (that is, memory and storage consumption). However, less attention has been paid to implementing big data sorting that is reliable and robust even when confronted with resource limitations. To that end, this text introduces the SAFESORT macro, which facilitates a priori exception-handling routines (which detect environmental and data set attributes that could cause process failure) and post hoc exception-handling routines (which detect actual failed sorting routines). If exception handling is triggered, SAFESORT automatically reroutes program flow from the default sort routine to a less resource-intensive routine, thus sacrificing execution speed for reliability. Moreover, macro modularity enables developers to select their favorite sort procedure and, for data-driven disciples, to build fuzzy logic routines that dynamically select a sort algorithm based on environmental and data set attributes.
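The SAFESORT macro itself is in the paper; the fallback idea can be sketched in miniature as follows (TAGSORT trades execution speed for a much smaller sort workspace):

    %macro fallback_sort(data=, by=);
      %if not %sysfunc(exist(&data)) %then %do;   /* a priori check */
        %put ERROR: &data does not exist.;
        %return;
      %end;
      proc sort data=&data;  by &by;  run;
      %if &syserr > 4 %then %do;                  /* post hoc check: the sort failed */
        proc sort data=&data tagsort;  by &by;  run;
      %end;
    %mend fallback_sort;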
Troy Hughes, Datmesis Analytics
Sixty percent of organizations will have Hadoop in production by 2016, per a recent TDWI survey, and it has become increasingly challenging to access, cleanse, and move these huge volumes of data. How can data scientists and business analysts clean up their data, manage it where it lives, and overcome the big data skills gap? It all comes down to accelerating the data preparation process. SAS® Data Loader leverages years of expertise in data quality and data integration, a simplified user interface, and the parallel processing power of Hadoop to overcome the skills gap and compress the time taken to prepare data for analytics or visualization. We cover some of the new capabilities of SAS Data Loader for Hadoop including in-memory Spark processing of data quality and master data management functions, faster profiling and unstructured data field extraction, and chaining multiple transforms together for improved productivity.
Matthew Magne, SAS
Scott Gidley, SAS
SAS/ETS® 14.1 delivers a substantial number of new features to researchers who want to examine causality with observational data in addition to forecasting the future. This release adds count data models with spatial effects, new linear and nonlinear models for panel data, the X13 procedure for seasonal adjustment, and many more new features. This paper highlights the many enhancements to SAS/ETS software and demonstrates how these features can help your organization increase revenue and enhance productivity.
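A minimal sketch of the new X13 procedure for seasonal adjustment; the data set and variable names are hypothetical, and statement details should be checked against your SAS/ETS release:

    proc x13 data=sales date=date;
      var sales;
      x11;
    run;

The X11 statement requests the seasonal decomposition and adjustment of the series.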
Jan Chvosta, SAS
The increasing size and complexity of data in research and business applications require a more versatile set of tools for building explanatory and predictive statistical models. In response to this need, SAS/STAT® software continues to add new methods. This presentation takes you on a high-level tour of five recent enhancements: new effect selection methods for regression models with the GLMSELECT procedure, model selection for generalized linear models with the HPGENSELECT procedure, model selection for quantile regression with the HPQUANTSELECT procedure, construction of generalized additive models with the GAMPL procedure, and building classification and regression trees with the HPSPLIT procedure. For each of these approaches, the presentation reviews its key concepts, uses a basic example to illustrate its benefits, and guides you to information that will help you get started.
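As one runnable taste of the list above, a classification tree with the HPSPLIT procedure on SASHELP.IRIS:

    proc hpsplit data=sashelp.iris seed=123;
      class species;
      model species = sepallength sepalwidth petallength petalwidth;
    run;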
Bob Rodriguez, SAS
A stored process is a SAS® program that can be executed as required by different applications. Stored processes have been making SAS users' lives easier for decades. In SAS® Visual Analytics, stored processes can be used to enhance the user experience, create custom functionality and output, and expand the usefulness of reports. This paper discusses a technique for loading data on demand into memory for SAS Visual Analytics so that it can be accessed by reports as needed using stored processes. Loading tables into memory so that they can be used to create explorations or reports is a common task in SAS Visual Analytics. This task is usually done by an administrator, enabling the SAS Visual Analytics user to have a seamless transition from data to report. At times, however, users need tables to be partially loaded or modified in memory on demand. The step-by-step instructions in this paper make it easy enough for any SAS Visual Analytics report builder to include these stored processes in their work. By using this technique, SAS Visual Analytics users have the freedom to access their data without having to work through an administrator for every little task, helping everyone be more effective and productive.
Renato Luppi, SAS
Varsha Chawla, SAS Institute Inc.
Sensors, devices, social conversation streams, web movement, and all things in the Internet of Things (IoT) are transmitting data at unprecedented volumes and rates. SAS® Event Stream Processing ingests thousands--even hundreds of millions--of data events per second, assessing both the content and the value. The benefit to organizations comes from doing something with those results, and eliminating the latencies associated with storing data before analysis happens. This paper bridges that gap. It describes how to use streaming data as a portable, lightweight micro-analytics service for consumption by other applications and systems.
Fiona McNeill, SAS
David Duling, SAS
Steve Sparano, SAS
This paper compares different solutions to a data transpose puzzle presented to the SAS® User Group at the United States Census Bureau. The presented solutions range from a SAS 101 multi-step solution to an advanced solution using techniques that are not widely known, which yields run-time savings of 85 percent!
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
Big data, small data--but what about when you have no data? Survey data commonly include missing values due to nonresponse. Adjusting for nonresponse in the analysis stage might lead different analysts to use different, and inconsistent, adjustment methods. To provide the same complete data to all the analysts, you can impute the missing values by replacing them with reasonable nonmissing values. Hot-deck imputation, the most commonly used survey imputation method, replaces the missing values of a nonrespondent unit by the observed values of a respondent unit. In addition to performing traditional cell-based hot-deck imputation, the SURVEYIMPUTE procedure, new in SAS/STAT® 14.1, also performs more modern fully efficient fractional imputation (FEFI). FEFI is a variation of hot-deck imputation in which all potential donors in a cell contribute their values. Filling in missing values is only a part of PROC SURVEYIMPUTE. The real trick is to perform analyses of the filled-in data that appropriately account for the imputation. PROC SURVEYIMPUTE also creates a set of replicate weights that are adjusted for FEFI. Thus, if you use the imputed data from PROC SURVEYIMPUTE along with the replicate methods in any of the survey analysis procedures--SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, or SURVEYPHREG--you can be confident that inferences account not only for the survey design, but also for the imputation. This paper discusses different approaches for handling nonresponse in surveys, introduces PROC SURVEYIMPUTE, and demonstrates its use with real-world applications. It also discusses connections with other ways of handling missing values in SAS/STAT.
Pushpal Mukhopadhyay, SAS
Cancer is the second-leading cause of death in the United States. About 10% to 15% of all lung cancers are non-small cell lung cancer, but they constitute about 26.8% of cancer deaths. An efficient cancer treatment plan is therefore of prime importance for increasing the survival chances of a patient. Cancer treatments are generally given to a patient in multiple sittings, and doctors often tend to make decisions based on the improvement over time. Calculating the survival chances of the patient with respect to time and determining the various factors that influence the survival time would help doctors make more informed decisions about the further course of treatment and also help patients develop a proactive approach in making choices for the treatment. The objective of this paper is to analyze the survival time of patients suffering from non-small cell lung cancer, identify the time interval crucial for their survival, and identify the influence of factors such as age, gender, and treatment type. The performances of the models built are analyzed to understand the significance of cubic splines used in these models. The data set is from the Cerner database with 548 records and 12 variables from 2009 to 2013. The patient records with loss to follow-up are censored. The survival analysis is performed using parametric and nonparametric methods in SAS® Enterprise Miner™ 13.1. The analysis revealed that the survival probability of a patient is high within two weeks of hospitalization and that the probability of survival goes down by 60% between weeks 5 and 9 of admission. Age, gender, and treatment type play prominent roles in influencing the survival time and risk. The probability of survival for female patients decreases by 70% and 80% during weeks 6 and 7, respectively, and for male patients the probability of survival decreases by 70% and 50% during weeks 8 and 13, respectively.
Raja Rajeswari Veggalam, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Akansha Gupta, Oklahoma State University
So, you've encrypted your data. But is it safe? What would happen if that anonymous data you've shared isn't as anonymous as you think? Senior SAS® Consultant Andy Smith from Amadeus Software discusses the approaches hackers take to break encryption, and he shares simple, practical techniques to foil their efforts.
Andy Smith, Amadeus Software Limited
When business rules are deployed and executed, unintended business impacts can accumulate over time if the rule-fire outcomes--whether a rule fires or not--are never monitored, validated, or challenged, because the profiles and characteristics of the input data for the rules change. Comparing scenarios using modified rules and visually comparing how they might impact your business can aid you in meeting compliance regulations, knowing your customers, and staying relevant or accurate in your particular business context. Visual analysis of rule-fire outcomes is a powerful way to validate what is expected or to compare potential impacts that could lead to further investigation and refinement of existing rules. This paper shows how to use SAS® Visual Analytics and other visualizations to perform various types of meaningful and useful rule-fire outcome analysis with rules created in SAS® Business Rules Manager. Using visual graphical capabilities can give organizations or businesses a straightforward way to validate, monitor, and keep rules from running wild.
Charlotte Crain, SAS
Chris Upton, SAS
Universities, government agencies, and non-profits all require various levels of data analysis, data manipulation, and data management. This requires a workforce that knows how to use statistical packages. The server-based SAS® OnDemand for Academics: Studio offerings are excellent tools for teaching SAS in both postsecondary education and in professional continuing education settings. SAS OnDemand for Academics: Studio can be used to share resources; teach users using Windows, UNIX, and Mac OS computers; and teach in various computing environments. This paper discusses why one might use SAS OnDemand for Academics: Studio instead of SAS® University Edition, SAS® Enterprise Guide®, or Base SAS® for education and provides examples of how SAS OnDemand for Academics: Studio has been used for in-person and virtual training.
Charlotte Baker, Florida A&M University
C. Perry Brown, Florida Agricultural and Mechanical University
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Nancy Rausch, SAS
Wilbram Hazejager, SAS
In this session, we examine the use of text analytics as a tool for strategic analysis of an organization, a leader, a brand, or a process. The software solutions SAS® Enterprise Miner™ and Base SAS® are used to extract topics from text and create visualizations that identify the relative placement of the topics next to various business entities. We review a number of case studies that identify and visualize brand-topic relationships in the context of branding, leadership, and service quality.
Nicholas Evangelopoulos, University of North Texas
The recent controversy regarding former Secretary Hillary Clinton's use of a non-government, privately maintained email server provides a great opportunity to analyze real-world data using a variety of analytic techniques. This email corpus is interesting because of the challenges in acquiring and preparing the data for analysis as well as the variety of analyses that can be performed, including techniques for searching, entity extraction and resolution, natural language processing for topic generation, and social network analysis. Given the potential for politically charged discussion, rest assured there will be no discussion of politics--just fact-based analysis.
Michael Ames, SAS
Are you living in Heartbreak Hotel because your boss wants different statistics in the SAME column on your report? Need a currency symbol in one cell of your pre-summarized data, but a percent sign in another cell? Large blocks of text on your report have you all shook up because they wrap badly? Have you hit the wall with PROC PRINT? Well, rock out of your jailhouse with ODS, DATA step, and PROC REPORT. This paper is a sequel to the popular 2008 paper Creating Complex Reports. The paper presents a nuts-and-bolts look at more complex report examples gleaned from SAS® Community Forum questions and questions from students. Examples include the use of DATA step manipulation to produce PROC REPORT and PROC SGPLOT output, as well as examples of ODS LAYOUT and the new report writing interface. And PROC TEMPLATE makes a special guest appearance. Even though the King of Rock 'n' Roll won't be there for the presentation, perhaps we'll hear his ghost say "Thank you very much, I always wanted to know how to do that" at the end.
Cynthia Zender, SAS
It's impossible to know all of SAS® or all of statistics. There will always be some technique that you don't know. However, there are a few techniques that anyone in biostatistics should know. If you can calculate those with SAS, life is all the better. In this session you will learn how to compute and interpret a baker's dozen of these techniques, including several statistics that are frequently confused. The following statistics are covered: prevalence, incidence, sensitivity, specificity, attributable fraction, population attributable fraction, risk difference, relative risk, odds ratio, Fisher's exact test, number needed to treat, and McNemar's test. With these 13 tools in their tool chest, even nonstatisticians or statisticians who are not specialists will be able to answer many common questions in biostatistics. The fact that each of these can be computed with a few statements in SAS makes the situation all the sweeter. Bring your own doughnuts.
AnnMaria De Mars, 7 Generation Games
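As a hedged sketch of how several of these measures can be requested (the data set and variable names are hypothetical), PROC FREQ covers much of the list in a handful of statements:

   /* relative risk, odds ratio, risk difference, chi-square,
      and Fisher's exact test for a 2x2 table */
   proc freq data=study;
      tables exposure*outcome / relrisk riskdiff chisq;
      exact fisher;
   run;

   /* McNemar's test for paired binary responses */
   proc freq data=paired;
      tables before*after / agree;
   run;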
VBA has been described as a glue language and has been widely used to exchange data between Microsoft products such as Excel, Word, and PowerPoint. How to trigger a VBA macro from SAS® via DDE has been widely discussed in recent years. However, using SAS to send parameters to a VBA macro has seldom been reported. This paper provides a solution for this problem. Copying Excel tables to PowerPoint using the combination of SAS and VBA is illustrated as an example. The SAS program rapidly scans all Excel files that are contained in one folder, passes the file information to VBA as parameters, and triggers the VBA macro to write PowerPoint files in a loop. As a result, a batch of PowerPoint files can be generated with just one mouse click.
Zhu Yanrong, Medtronic
Like a good pitcher and catcher in baseball, ODS layout and the ODS destination for PowerPoint are a winning combination in SAS® 9.4. With this dynamic duo, you can go straight from performing data analysis to creating a quality presentation. The ODS destination for PowerPoint produces native PowerPoint files from your output. When you pair it with ODS layout, you are able to dynamically place your output on each slide. Through code examples this paper shows you how to create a custom title slide, as well as place the desired number of graphs and tables on each slide. Don't be relegated to the sidelines--increase your winning percentage by learning how ODS layout works with the ODS destination for PowerPoint.
Jane Eslinger, SAS
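A minimal sketch of the pairing described above, assuming a hypothetical output file name: ODS LAYOUT GRIDDED defines the regions on the slide, and each ODS REGION receives one piece of output.

   ods powerpoint file='results.pptx' style=powerpointlight;
   ods layout gridded columns=2;   /* two side-by-side regions */
   ods region;
   proc sgplot data=sashelp.class;
      vbar age;
   run;
   ods region;
   proc means data=sashelp.class mean;
      var height weight;
   run;
   ods layout end;
   ods powerpoint close;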
Like all skilled tradespeople, SAS® programmers have many tools at their disposal. Part of their expertise lies in knowing when to use each tool. In this paper, we use a simple example to compare several common approaches to generating the requested report: the TABULATE, TRANSPOSE, REPORT, and SQL procedures. We investigate the advantages and disadvantages of each method and consider when applying it might make sense. A variety of factors are examined, including the simplicity, reusability, and extensibility of the code in addition to the opportunities that each method provides for customizing and styling the output. The intended audience is beginning to intermediate SAS programmers.
Josh Horstman, Nested Loop Consulting
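For a flavor of the comparison, here is the same grouped summary produced two of the ways the paper considers, assuming a hypothetical WORK.SALES data set:

   proc tabulate data=work.sales;
      class region product;
      var amount;
      table region, product*amount*sum;
   run;

   proc sql;
      select region, product, sum(amount) as total
      from work.sales
      group by region, product;
   quit;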
SAS® Visual Analytics can display maps with your location information. However, you might need to display locations that do not match the categories found in the SAS Visual Analytics interface, such as street address locations or non-US postal code locations. You might also wish to display custom locations that are specific to your business or industry, such as the locations of power grid substations or railway mile markers. Alternatively, you might want to validate your address data. This presentation shows how PROC GEOCODE can be used to simplify the geocoding by processing your location information before putting your data into SAS Visual Analytics.
Darrell Massengill, SAS
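A minimal sketch of ZIP-level geocoding, assuming a hypothetical WORK.ADDRESSES data set that contains a ZIP variable; the output gains X and Y coordinate variables that can then be used in SAS Visual Analytics:

   proc geocode method=zip data=work.addresses out=work.geocoded
                lookup=sashelp.zipcode;
   run;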
The HPSUMMARY procedure provides data summarization tools to compute basic descriptive statistics for variables in a SAS® data set. It is a high-performance version of the SUMMARY procedure in Base SAS®. Though PROC SUMMARY is popular with data analysts, PROC HPSUMMARY is still a new kid on the block. The purpose of this paper is to provide an introduction to PROC HPSUMMARY by comparing it with its well-known counterpart, PROC SUMMARY. The comparison focuses on differences in syntax, options, and performance in terms of execution time and memory usage. Sample code, outputs, and SAS log snippets are provided to illustrate the discussion. Simulated large data is used to observe the performance of the two procedures. Using SAS® 9.4 installed on a single-user machine with four cores available, preliminary experiments that examine performance of the two procedures show that HPSUMMARY is more memory-efficient than SUMMARY when the data set is large (for example, SUMMARY failed due to insufficient memory, whereas HPSUMMARY finished successfully). However, there is no evidence of a performance advantage of PROC HPSUMMARY over PROC SUMMARY on this single-user machine.
Anh Kellermann, University of South Florida
Jeff Kromrey, University of South Florida
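The two procedures share most of their syntax, so switching between them is largely a matter of the procedure name; this sketch assumes a hypothetical WORK.BIGDATA data set:

   proc summary data=work.bigdata;
      class region;
      var sales;
      output out=work.stats mean=avg_sales sum=tot_sales;
   run;

   proc hpsummary data=work.bigdata;
      class region;
      var sales;
      output out=work.hpstats mean=avg_sales sum=tot_sales;
      performance details;   /* prints a timing breakdown for the step */
   run;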
SAS customers have a growing need for accurate and quality SAS® maps. SAS has licensed map data from a third party to satisfy this need. This presentation explores the new map data by discussing the problems and limitations of the old map data and demonstrating, with examples, the features of the new data.
Darrell Massengill, SAS
For marketers who are responsible for identifying the best customer to target in a campaign, it is often daunting to determine which media channel, offer, or campaign program is the one the customer is most apt to respond to and is therefore most likely to increase revenue. This presentation examines the components of designing campaigns to identify promotable segments of customers and to target the optimal customers using SAS® Marketing Automation integrated with SAS® Marketing Optimization.
Pamela Dixon, SAS
The SAS® procedure PROC TREE sketches parent-child lineage--also known as trees--from hierarchical data. Hierarchical relationships can be difficult to flatten into a data set, but with PROC TREE, its accompanying ODS table TREELISTING, and some creative yet simple handcrafting, a flattened set of parent-child variables can be derived. Because the focus of PROC TREE is to provide the tree structure in graphical form, it does not explicitly output the depth of the tree, although the depth is visualized in the accompanying graph. Perhaps unknown to most, the path length variable, or simply the height of the tree, can be extracted from PROC TREE merely by capturing it from the ODS output, as demonstrated in this paper.
Can Tongur, Statistics Sweden
Time flies. Thirteen years have passed since the first introduction of the hash method by SAS®. Dozens of papers have been written offering new (sometimes weird) applications for hash. Even beyond lookups, we can use the hash object for summation, splitting files, sorting files, fuzzy matching, and other data processing problems. In SAS, almost any task can be solved by more than one method. For a class of problems where hash can be applied, is it the best tool to use? We present a variety of problems and provide several solutions, using both hash tables and more traditional methods. Comparing the solutions, you must consider computing time as well as the time required to code and validate each solution. Which solution is better? You decide!
David Izrael, Abt Associates
Elizabeth Axelrod, Abt Associates
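As one example of the kind of comparison the paper makes, a lookup can be coded as a hash table probe (no sorting required) or as a traditional sort-and-merge; the data sets here are hypothetical:

   data joined;
      if _n_ = 1 then do;
         if 0 then set work.lookup;          /* hosts DESCRIPTION in the PDV */
         declare hash h(dataset:'work.lookup');
         h.defineKey('id');
         h.defineData('description');
         h.defineDone();
      end;
      set work.big;
      if h.find() ne 0 then call missing(description);
   run;

The traditional equivalent sorts both tables by ID and uses MERGE with IN= flags; the hash version avoids both sorts at the cost of holding the lookup table in memory.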
This SAS® test begins where the SAS® Advanced Certification test leaves off. It has 25 questions to challenge even the most experienced SAS programmers. Most questions are about the DATA step, but there are a few macro and SAS/ACCESS® software questions thrown in. The session presents each question on a slide, then the answer on the next. You WILL be challenged.
Glen Becker, USAA
The Scheduler is an innovative tool that maps linear data (that is, time stamps) to an intuitive three-dimensional representation. This transformation enables the user to identify relationships, conflicts, gaps, and so on, that were not readily apparent in the data's native form. This tool has applications in operations research and litigation-related analyses. This paper discusses why the Scheduler was created, the data that is available to analyze the issue, and how this code can be used in other types of applications. Specifically, this paper discusses how the Scheduler can be used by supervisors to maintain the presence of three or more employees at all times to ensure that all federal- and state-mandated work breaks are taken. Additional examples include helping construction foremen create schedules that visualize the presence of contractors in a logical sequence, eliminating overlap and ensuring cushions of time between contractors, and matching items to non-uniform information.
Ariel Kumpinsky, The Claro Group, LLC
SAS® helps people make better decisions. But what makes a decision better? How can we make sure we are not making worse decisions? There are six tenets to follow to ensure we are making better decisions. Decisions are better when they are: (1) Aligned with your mission; (2) Complete; (3) Faster; (4) Accurate; (5) Accessible; and (6) Recurring, ongoing, or productionalized. By combining all of these aspects of making a decision, you can have confidence that you are making a better decision. The breadth of SAS software is examined to understand how it can be applied toward these tenets. Scorecards are used to ensure that your business stays aligned with goals. Data Management is used to bring together all of the data you have, to provide complete information. SAS® Visual Analytics offerings are unparalleled in their speed to enable you to make faster decisions. Exhaustive testing verifies accuracy. Modern, easy-to-use user interfaces are adapted for multiple languages and designed for a variety of users to ensure accessibility. And the powerful SAS data flow architecture is built for ongoing support of decisions. Several examples from the SAS® Solutions OnDemand group are used as case studies in support of these tenets.
Dan Jahn, SAS
I like to think that SAS® error messages come in three flavors: IMPORTANT, INTERESTING, and IRRELEVANT. SAS calls its messages NOTES, WARNINGS, and ERRORS. I intend to show you that not all NOTES are IRRELEVANT, nor are all ERRORS IMPORTANT. This paper walks through many different scenarios and explains in detail the meaning and impact of messages presented in the SAS log. I show you how to locate, classify, analyze, and resolve many different SAS message types. And for those brave enough, I go on to teach you how to both generate and suppress messages sent to the SAS log. This paper presents an overview of messages that can often be found in a SAS log window or output file. The intent of this presentation is to familiarize you with common messages, the meaning of the messages, and how to determine the best way to react to the messages. Notice I said "react," not necessarily "correct." Code examples and log output are presented to aid in the explanations.
William E Benjamin Jr, Owl Computer Consultancy LLC
Specifying colors based on group values is a popular practice in visualizing data, but it is not so easy to do, especially when there are multiple group values. This paper explores three different methods to dynamically assign colors to plots based on their group values: combining the EVAL and IFN functions in the plot statements; bringing the DISCRETEATTRMAP block into the plot statements; and using the macro from SAS® Sample 40255.
Amos Shu, MedImmune
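A minimal sketch of the DISCRETEATTRMAP approach (the group values, colors, and data set names are hypothetical):

   proc template;
      define statgraph grouped;
         begingraph;
            discreteattrmap name='grpcolors';
               value 'A' / lineattrs=(color=blue) markerattrs=(color=blue);
               value 'B' / lineattrs=(color=red)  markerattrs=(color=red);
            enddiscreteattrmap;
            discreteattrvar attrvar=grpvar var=group attrmap='grpcolors';
            layout overlay;
               seriesplot x=x y=y / group=grpvar;
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=work.mydata template=grouped;
   run;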
Since it makes login transparent and does not send passwords over the wire, Integrated Windows Authentication (IWA) is both extremely convenient for end users and highly secure. However, for administrators, it is not easy to set up and is rarely successful on the first attempt, so being able to troubleshoot effectively is imperative. In this paper, we take a step-by-step approach to configuring IWA, explain how and where to get useful debugging output, and share our hard-earned knowledge base of known problems and deciphered error messages.
Mike Roda, SAS
Inherently, mixed modeling with SAS/STAT® procedures (such as GLIMMIX, MIXED, and NLMIXED) is computationally intensive. Therefore, considerable memory and CPU time can be required. The default algorithms in these procedures might fail to converge for some data sets and models. This encore presentation of a paper from SAS Global Forum 2012 provides recommendations for circumventing memory problems and reducing execution times for your mixed-modeling analyses. This paper also shows how the new HPMIXED procedure can be beneficial for certain situations, as with large sparse mixed models. Lastly, the discussion focuses on the best way to interpret and address common notes, warnings, and error messages that can occur with the estimation of mixed models in SAS® software.
Phil Gibbs, SAS
SAS® DS2 is a powerful new object-oriented programming language that was introduced with SAS® 9.4. Designed for advanced data manipulation, it enables the programmer not only to use a much wider range of data types than the traditional Base SAS® language offers, but also to create custom methods and packages. These packages are analogous to classes in other object-oriented languages such as C# or Ruby and can be used to massively improve your programming effectiveness. This paper demonstrates a number of techniques that can be used to take full advantage of DS2's object-oriented user-defined package capabilities. Examples include creating new data types, storing and reusing packages, using simple packages as building blocks in the creation of more complex packages, a technique to overcome DS2's lack of support for inheritance, and building lightweight packages that facilitate method overloading and make parameter passing simpler.
Chris Brooks, Melrose Analytics Ltd
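A minimal sketch of a user-defined DS2 package with one method; the package and method names are hypothetical:

   proc ds2;
      package mathutil / overwrite=yes;      /* stored as WORK.MATHUTIL */
         method cube(double x) returns double;
            return x*x*x;
         end;
      endpackage;

      data _null_;
         method run();
            dcl package mathutil m();        /* instantiate the package */
            dcl double y;
            y = m.cube(3);
            put y=;                          /* y=27 */
         end;
      enddata;
   run;
   quit;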
Are you going to enable HTTPS for your SAS® environment? Looking to improve the security of your SAS deployment? Do you need more details about how to efficiently configure HTTPS? This paper guides you through the configuration of SAS® 9.4 with HTTPS for the SAS middle tier. We examine how best to implement site-signed Transport Layer Security (TLS) certificates and explore how far you can take the encryption. This paper presents tips and proven practices that can help you be successful.
Stuart Rogers, SAS
Do you need a macro for your program? This paper provides some guidelines, based on user experience, and explores whether it's worth the time to create a macro (for example, a parameter-driven macro or just a simple macro variable). This paper is geared toward new users and experienced users who do not use macros.
Claudine Lougee, Dualenic
Data transformations serve many functions in data analysis, including improving the normality of a distribution and equalizing variance to meet assumptions and improve effect sizes. Traditionally, the first step in the analysis is to preprocess and transform the data to derive different representations for further exploration. But now, in this era of big data, it is not always feasible to have transformed data available beforehand. Analysts need to conduct exploratory data analysis and subsequently transform data on the fly according to their needs. SAS® Visual Analytics has an expression builder component, integrated into SAS® Visual Data Builder, SAS® Visual Analytics Explorer, and SAS® Visual Analytics Designer, that helps you transform data on the fly. The expression builder enables you to create expressions that you can use to aggregate columns, perform multiple operations on data, and perform conditional processing. It supports numeric, comparison, Boolean, text, and datetime operators, as well as functions such as LOG, LN, MOD, EXP, POWER, and ROOT. This paper demonstrates how you can use the expression builder that is integrated into the data builder, the explorer, and the designer to create different types of expressions and transform data for analysis and reporting purposes.
Atul Kachare, SAS
Sometimes data are not arranged the way you need them to be for the purpose of reporting or combining with other data. This presentation examines just such a scenario. We focus on the TRANSPOSE procedure and its ability to transform data to meet our needs. We explore this method as an alternative to using longhand code involving arrays and OUTPUT statements in a SAS® DATA step.
Faron Kincheloe, Baylor University
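A minimal sketch of the long-to-wide case, assuming a hypothetical WORK.SCORES data set with STUDENT, TEST, and SCORE variables:

   proc sort data=work.scores;
      by student;
   run;

   proc transpose data=work.scores out=work.wide(drop=_name_) prefix=test;
      by student;
      id test;
      var score;
   run;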
At the University of Central Florida (UCF), we recently invested in SAS® Visual Analytics, along with the updated SAS® Business Intelligence platform (from 9.2 to 9.4), a project that took over a year to complete. This project was undertaken to give our users the best and most updated tools available. This paper introduces the SAS Visual Analytics environment at UCF and includes projects created using this product. It explains why we selected SAS Visual Analytics for development over other SAS® applications. It describes the technical environment for our non-distributed SAS Visual Analytics: RAM, servers, benchmarking, sizing, and scaling. It discusses why we chose the non-distributed mode versus distributed mode. Challenges in the design, implementation, usage, and performance are also presented, including the reasons why Hadoop was not adopted.
Scott Milbuta, University of Central Florida
Ulf Borjesson, University of Central Florida
Carlos Piemonti, University of Central Florida
If data sets are being merged and a variable occurs in more than one of the data sets, the last value (from the variable in the right-most data set listed in the MERGE statement) is kept. It is critical, therefore, to know the name and sources of any common variables. This paper reviews simple techniques for noting overwriting in the log, summarizing the names and sources of all variables, and identifying common variables before merging files.
Christopher Bost, MDRC
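Two of the simple techniques the paper reviews can be sketched as follows (the data set and variable names are hypothetical): the MSGLEVEL=I system option writes an INFO note to the log whenever MERGE overwrites a common variable, and DICTIONARY.COLUMNS lists variables that occur in more than one data set:

   options msglevel=i;

   data combined;
      merge work.base work.update;
      by id;
   run;

   proc sql;
      select name, count(*) as n_datasets
      from dictionary.columns
      where libname='WORK' and memname in ('BASE','UPDATE')
      group by name
      having count(*) > 1;
   quit;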
Machine learning is gaining popularity, fueled by the rapid advancement of computing technology. Machine learning offers the ability to continuously adjust predictive models based on real-time data, and to visualize changes and take action on the new information. Hear from two PhDs from SAS and Intel about how SAS® machine learning, working with SAS streaming analytics and Intel microprocessor technology, makes this possible.
This session describes the construction of a program that converts any set of relational tables into a single flat file, using Base SAS® in SAS® 9.3. The program gets the information it needs from the data tables themselves, with a minimum of user configuration. It automatically detects one-to-many relationships and creates sets of replicate variables in the flat file. The program illustrates the use of macro programming and the SQL, DATASETS, and TRANSPOSE procedures, among others. The output is sent to a Microsoft Excel spreadsheet using the Output Delivery System (ODS).
Dav Vandenbroucke, HUD
SAS® High-Performance Risk distributes financial risk data and big data portfolios with complex analyses across a networked Hadoop Distributed File System (HDFS) grid to support rapid in-memory queries for hundreds of simultaneous users. This data is extremely complex and must be stored in a proprietary format to guarantee data affinity for rapid access. However, customers still desire the ability to view and process this data directly. This paper demonstrates how to use the HPRISK custom file reader to directly access risk data in Hadoop MapReduce jobs, using the HPDS2 procedure and the LASR procedure.
Mike Whitcher, SAS
Stacey Christian, SAS
Phil Hanna, SAS Institute
Don McAlister, SAS
This paper proposes a technique to implement wavelet analysis (WA) for improving the forecasting accuracy of the autoregressive integrated moving average (ARIMA) model in nonlinear time series. With the assumption of linear correlation, and with the conventional seasonality adjustment methods used in ARIMA (that is, differencing, X11, and X12), the model might fail to capture any nonlinear pattern. Rather than directly model such a signal, we decompose it into less complex components such as trend, seasonality, process variations, and noise, using WA. Then, we use them as exogenous variables in the autoregressive integrated moving average with explanatory variable (ARIMAX) model. We describe the background of WA. Then, the code and a detailed explanation of WA based on multi-resolution analysis (MRA) in SAS/IML® software are demonstrated. The idea and mathematical basis of ARIMA and ARIMAX are also given. Next, we demonstrate our technique in forecasting applications using SAS® Forecast Studio. The demonstrated time series, drawn from different fields, are nonlinear in nature. The results suggest that WA effects are good regressors in ARIMAX, which captures nonlinear patterns well.
Woranat Wongdhamma, Oklahoma State University
Health care and other programs collect large amounts of information in order to administer the program. A health insurance plan, for example, probably records every physician service, hospital and emergency department visit, and prescription medication--information that is collected in order to make payments to the various providers (hospitals, physicians, pharmacists). Although the data are collected for administrative purposes, these databases can also be used to address research questions, including questions that are unethical or too expensive to answer using randomized experiments. However, when subjects are not randomly assigned to treatment groups, we worry about assignment bias--the possibility that the people in one treatment group were healthier, smarter, more compliant, etc., than those in the other group, biasing the comparison of the two treatments. Propensity score methods are one way to adjust the comparisons, enabling research using administrative data to mimic research using randomized controlled trials. In this presentation, I explain what the propensity score is, how it is used to compare two treatments, and how to implement a propensity score analysis using SAS®. Two methods using propensity scores are presented: matching and inverse probability weighting. Examples are drawn from health services research using the administrative databases of the Ontario Health Insurance Plan, the single payer for medically necessary care for the 13 million residents of Ontario, Canada. However, propensity score methods are not limited to health care. They can be used to examine the impact of any nonrandomized program, as long as there is enough information to calculate a credible propensity score.
Ruth Croxford, Institute for Clinical Evaluative Sciences
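A minimal sketch of the inverse probability weighting approach described above (the variables and data sets are hypothetical): fit a treatment model, convert predicted probabilities to weights, and compare weighted outcomes:

   /* step 1: model the probability of treatment */
   proc logistic data=work.cohort;
      class sex;
      model treated(event='1') = age sex comorbidity;
      output out=work.ps p=pscore;
   run;

   /* step 2: inverse probability of treatment weights */
   data work.weighted;
      set work.ps;
      if treated = 1 then ipw = 1 / pscore;
      else ipw = 1 / (1 - pscore);
   run;

   /* step 3: weighted comparison of the outcome */
   proc means data=work.weighted mean;
      class treated;
      var outcome;
      weight ipw;
   run;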
Someone has aptly said, "Las Vegas looks the way one would imagine heaven must look at night." What if you knew the secret to running a plethora of various businesses in the entertainment capital of the world? Nothing better, right? Well, we have what you want: all the necessary ingredients for you to know precisely what business to target in a particular locality of Las Vegas. Yelp, a community portal, wants to help people find great local businesses. They cover almost everything from dentists and hair stylists through mechanics and restaurants. Yelp's users, Yelpers, write reviews and give ratings for all types of businesses. Yelp then uses this data to make recommendations to the Yelpers about which institutions best fit their individual needs. We have the Yelp academic data set comprising 1.6 million reviews and 500K tips by 366K users for 61K businesses across several cities. We combine current Yelp data from all the various data sets for Las Vegas to create an interactive map that provides an overview of how a business runs in a locality and how ratings and reviews affect a business. We answer the following questions: Where is the most appropriate neighborhood to start a new business (such as cafes, bars, and so on)? Which category of business has the greatest total count of reviews, that is, the most talked-about (trending) business in Las Vegas? How do a business's working hours affect the customer reviews and the corresponding rating of the business? Our findings further the understanding of how the perceptions of various users, expressed through reviews and ratings, relate to the growth of a business, encompassing a variety of topics in data mining and data visualization.
Anirban Chakraborty, Oklahoma State University
Sensitive data requires elevated security and the flexibility to apply logic that subsets data based on user privileges. Following the instructions in SAS® Visual Analytics: Administration Guide gives you the ability to apply row-level permission conditions. After you have set the permissions, you have to prove through audits who has access and which row-level security conditions apply. This paper provides you with the ability to easily apply, validate, report, and audit all tables that have row-level permissions, along with the groups, users, and conditions that are applied. Take the hours of maintenance and lack of visibility out of row-level secure data and build confidence in the data and analytics that are provided to the enterprise.
Brandon Kirk, SAS
The HPSPLIT procedure, a powerful new addition to SAS/STAT®, fits classification and regression trees and uses the fitted trees for interpretation of data structures and for prediction. This presentation shows the use of PROC HPSPLIT to analyze a number of data sets from different subject areas, with a focus on prediction among subjects and across landscapes. A critical component of these analyses is the use of graphical tools to select a model, interpret the fitted trees, and evaluate the accuracy of the trees. The use of different kinds of cross validation for model selection and model assessment is also discussed.
For SAS® users, PROC TABULATE and PROC REPORT (and its compute blocks) are probably among the most common procedures for calculating and displaying data. It is, however, pretty difficult to calculate and display changes from one column to another using data from other rows with just these two procedures. Compute blocks in PROC REPORT can calculate additional columns, but it would be challenging to pick up values from other rows as inputs. This presentation shows how PROC TABULATE can work with the lag(n) function to calculate rates of change from one period of time to another. This offers the flexibility of feeding into calculations the data retrieved from other rows of the report. PROC REPORT is then used to produce the desired output. The same approach can also be used in a variety of scenarios to produce customized reports.
Khoi To, Office of Planning and Decision Support, Virginia Commonwealth University
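A minimal sketch of the idea, assuming a hypothetical WORK.ENROLLMENT data set: LAG retrieves the prior period's value in a DATA step, and the derived change then flows into the reporting procedure:

   proc sort data=work.enrollment;
      by department period;
   run;

   data work.change;
      set work.enrollment;
      by department;
      prev = lag(count);                /* value from the previous row */
      if first.department then prev = .;
      pct_change = (count - prev) / prev;
   run;

   proc report data=work.change;
      column department period count pct_change;
      define pct_change / display format=percent8.1 'Change from Prior Period';
   run;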
Have you questioned the Read throughput or Write throughput for your Windows system drive? What about the input/output (I/O) throughput of a non-system drive? One solution is to use SASIOTEST.EXE to measure the Read or Write throughput for any drive connected to your system. Since SAS® 9.2, SASIOTEST.EXE has been included with each release of SAS for Windows. This paper explains the options supported by SASIOTEST and the various ways to use it. It also describes how this I/O relates to SAS I/O on Windows.
Mike Jones, SAS
The increasing popularity and affordability of wearable devices, together with their ability to provide granular physical activity data down to the minute, have enabled researchers to conduct advanced studies on the effects of physical activity on health and disease. This presents statistical programmers with the challenge of processing data and translating it into analyzable measures. One such measure is the number of time-specific bouts of moderate to vigorous physical activity (MVPA) (similar to exercise), which is needed to determine whether a participant meets current physical activity guidelines (for example, 150 minutes of MVPA per week performed in bouts of at least 20 minutes). In this paper, we illustrate how we used SAS® arrays to calculate the number of 20-minute bouts of MVPA per day. We provide working code showing how we processed Fitbit Flex data from 63 healthy volunteers whose physical activities were monitored daily for a period of 12 months.
Faith Parsons, Columbia University Medical Center
Keith M Diaz, Columbia University Medical Center
Jacob E Julian, Columbia University Medical Center
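A minimal sketch of the array logic (not the authors' code): with one record per participant-day holding 1,440 minute-level values, count runs of at least 20 consecutive minutes at or above an MVPA threshold; the variable names and the activity-count cutoff are hypothetical:

   data work.bouts;
      set work.minute_level;
      array min{1440} min1-min1440;
      run_len = 0;
      n_bouts = 0;
      do i = 1 to 1440;
         if min{i} >= 1952 then run_len + 1;   /* minute meets MVPA cutoff */
         else do;
            if run_len >= 20 then n_bouts + 1; /* a bout just ended */
            run_len = 0;
         end;
      end;
      if run_len >= 20 then n_bouts + 1;       /* bout still open at midnight */
      drop i run_len;
   run;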
Our daily work in SAS® involves the manipulation of many independent data sets, and a lot of time can be saved if independent data sets can be manipulated simultaneously. This paper presents our interface RunParallel, which opens multiple SAS sessions and controls which SAS procedures and DATA steps to run on which sessions by parsing comments such as /*EP.SINGLE*/ and /*EP.END*/. The user can easily parallelize any code by simply wrapping procedure steps and DATA steps in such comments and executing in RunParallel. The original structure of the SAS code is preserved so that it can be developed and run in serial regardless of the RunParallel comments. When applied in SAS programs with many external data sources and heavy computations, RunParallel can give major performance boosts. Among our examples we include a simulation that demonstrates how to run DATA steps in parallel, where the performance gain greatly outweighs the single minute it takes to add RunParallel comments to the code. In a world full of big data, a lot of time can be saved by running in parallel in a comprehensive way.
Jingyu She, Danica Pension
Tomislav Kajinic, Danica Pension
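RunParallel itself is the authors' own interface, but the underlying idea can be sketched with the SYSTASK statement (which requires the XCMD system option; the program names are hypothetical): launch independent batch sessions and wait for them to finish:

   systask command "sas prog1.sas -log prog1.log" taskname=t1;
   systask command "sas prog2.sas -log prog2.log" taskname=t2;
   waitfor _all_ t1 t2;    /* resume only when both sessions finish */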
In many discrete-event simulation projects, the chief goal is to investigate the performance of a system. You can use output data to better understand the operation of the real or planned system and to conduct various what-if analyses. But you can also use simulation for validation--specifically, to validate a solution found by an optimization model. In optimization modeling, you almost always need to make some simplifying assumptions about the details of the system you are modeling. These assumptions are especially important when the system includes random variation--for example, in the arrivals of individuals, their distinguishing characteristics, or the time needed to complete certain tasks. A common approach holds each random element at some nominal value (such as the mean of its observed values) and proceeds with the optimization. You can do better. In order to test an optimization model and its underlying assumptions, you can build a simulation model of the system that uses the optimal solution as an input and simulates the system's detailed behavior. The simulation model helps determine how well the optimal solution holds up when randomness and perhaps other logical complexities (which the optimization model might have ignored, summarized, or modeled only approximately) are accounted for. Simulation might confirm the optimization results or highlight areas of concern in the optimization model. This paper describes cases in which you can use simulation and optimization together in this manner and discusses the positive implications of this complementary analytic approach. For the reader, prior experience with optimization and simulation is helpful but not required.
Ed Hughes, SAS
Emily Lada, SAS Institute Inc.
Leo Lopes, SAS Institute Inc.
Imre Polik, SAS
SAS® Studio includes tasks that can be used to generate SAS® programs to process SAS data sets. The graph tasks generate SAS programs that use ODS Graphics to produce a range of plots and charts. SAS® Enterprise Guide 7.1 and SAS® Add-In for Microsoft Office 7.1 can also make use of these SAS Studio tasks to generate graphs with ODS Graphics, even though their built-in tasks use SAS/GRAPH®. This paper describes these SAS Studio graph tasks.
Philip Holland, Holland Numerics Ltd
Seeing business metrics in real time enables a company to understand and respond to ever-changing customer demands. In reality, though, obtaining such metrics in real time is not always easy. However, SAS Australia and New Zealand Technical Support solved that problem by using SAS® Visual Analytics to develop a 16-display command center in the Sydney office. Using this center to provide real-time data enables the Sydney office to respond to customer demands across the entire South Asia region. The success of this deployment makes reporting capabilities and data available for Technical Support hubs in Wellington, Mumbai, Kuala Lumpur, and Singapore--covering a total distance of 12,360 kilometers (approximately 7,680 miles). By sharing SAS Visual Analytics report metrics on displays spanning multiple time zones that cover a 7-hour time difference, SAS Australia and New Zealand Technical Support has started a new journey of breaking barriers, collaborating more closely, and providing fuel for innovation and change for an entire region. This paper is aimed at individuals or companies who want to learn how SAS Australia & New Zealand Technical Support developed its command center and who are inspired to do the same for their offices!
Chris Blake, SAS
Multivariate statistical analysis plays an increasingly important role as the number of variables being measured increases in educational research. In both cognitive and noncognitive assessments, many instruments that researchers aim to study contain a large number of variables, with each measured variable assigned to a specific factor of the bigger construct. According to educational theory and empirical research, the factor structure of each instrument usually emerges in the same way. Two types of factor analysis are widely used to understand the latent relationships among these variables, depending on the scenario. (1) Exploratory factor analysis (EFA), which is performed by using the SAS® procedure PROC FACTOR, is an advanced statistical method used to probe deeply into the relationship among the variables and the larger construct and then develop a customized model for the specific assessment. (2) When a model is established, confirmatory factor analysis (CFA) is conducted by using the SAS procedure PROC CALIS to examine the model fit of specific data and then make adjustments to the model as needed. This paper presents the application of SAS to conduct these two types of factor analysis to fulfill various research purposes. Examples using real noncognitive assessment data are demonstrated, and the interpretation of the fit statistics is discussed.
Jun Xu, Educational Testing Service
Steven Holtzman, Educational Testing Service
Kevin Petway, Educational Testing Service
Lili Yao, Educational Testing Service
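As a hedged sketch of the two steps (the item names, two-factor structure, and data set are hypothetical):

   /* exploratory factor analysis */
   proc factor data=work.survey method=ml rotate=varimax nfactors=2;
      var q1-q10;
   run;

   /* confirmatory factor analysis for a two-factor model */
   proc calis data=work.survey;
      factor
         F1 ===> q1-q5,
         F2 ===> q6-q10;
   run;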
This paper introduces an extremely fast and simple implementation of the survey cell collapsing process. Prior implementations had used either several SQL queries or numerous DATA step arrays, with multiple data reads. This new approach uses a single hash object with a maximum of two data reads. The hash object provides an efficient and convenient mechanism for quick data storage and retrieval (sub-second total run time).
Ahmed Al-Attar, AnA Data Warehousing Consulting, LLC
The DOCUMENT procedure is a little-known procedure that can save you vast amounts of time and effort when managing the output of your SAS® programming efforts. This procedure is deeply associated with the mechanism by which SAS controls output in the Output Delivery System (ODS). Have you ever wished you didn't have to modify and rerun the report-generating program every time there was some tweak in the desired report? PROC DOCUMENT enables you to store one version of the report as an ODS document object and then call it out in many different output forms, such as PDF, HTML, listing, RTF, and so on, without rerunning the code. Have you ever wished you could extract those pages of the output that apply to certain BY variables such as State, StudentName, or CarModel? With PROC DOCUMENT, you have WHERE capabilities to extract these. Do you want to customize the table of contents that assorted SAS procedures produce when you make frames for the table of contents with HTML, or use the facilities available for PDF? PROC DOCUMENT enables you to get to the inner workings of ODS and manipulate them. This paper addresses PROC DOCUMENT from the viewpoint of end results, rather than providing a complete technical review of how to do the task at hand. The emphasis is on the benefits of using the procedure, not on detailed mechanics.
Roger Muller, Data-To-Events, Inc.
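A minimal sketch of the store-once, replay-many pattern; the document name and output file are hypothetical:

   ods document name=work.rptdoc(write);   /* capture the output once */
   proc means data=sashelp.class;
      class sex;
      var height weight;
   run;
   ods document close;

   ods pdf file='report.pdf';              /* replay without rerunning */
   proc document name=work.rptdoc;
      replay;
   run;
   quit;
   ods pdf close;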
Because optimization models often do not capture some important real-world complications, a collection of optimal or near-optimal solutions can be useful for decision makers. This paper uses various techniques for finding the k best solutions to the linear assignment problem in order to illustrate several features recently added to the OPTMODEL procedure in SAS/OR® software. These features include the network solver, the constraint programming solver (which can produce multiple solutions), and the COFOR statement (which allows parallel execution of independent solver calls).
Rob Pratt, SAS
The SAS® Deployment Backup and Recovery tool in the third maintenance release of SAS® 9.4 helps SAS administrators collectively take backups of important data artifacts in SAS deployments, including SAS® Metadata Server, SAS® Content Server, SAS® Web Infrastructure Platform Data Server, and physical data in the SAS configuration directory. The tool supports all types of deployments, from single-tier to multi-tier clustered heterogeneous host deployments. The new configuration options in the tool give administrators more control over the data that is being backed up, from the SAS tiers down to the individual directory level. The new options allow administrators to filter out old log directories and to choose the databases to back up. This paper talks about not only how to use the configuration options but also how to mitigate the effects that SAS deployment configuration changes have on the backups and how to optimize the backups in terms of size and time.
Bhaskar Kulkarni, SAS
By default, the SAS® hash object permits only entries whose keys, defined in its key portion, are unique. While in certain programming applications this is a rather utile feature, there are also others for which being able to insert and manipulate entries with duplicate keys is imperative. Such an ability, facilitated in SAS since SAS® 9.2, was a welcome development: It vastly expanded the functionality of the hash object and eliminated the necessity to work around the distinct-key limitation using custom code. However, nothing comes without a price, and the ability of the hash object to store duplicate-key entries is no exception. In particular, additional hash object methods had to be--and were--developed to handle specific entries sharing the same key. The extra price is that using these methods is surely not quite as straightforward as the simple corresponding operations on distinct-key tables, and the documentation alone is rather poor help for making them work in practice. Rather extensive experimentation and investigative coding are necessary to make that happen. This paper is the result of such an endeavor, and hopefully, it will save those who delve into it a good deal of time and frustration.
Paul Dorfman, Dorfman Consulting
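A minimal sketch of the duplicate-key retrieval loop the paper investigates (the data set and variable names are hypothetical): with MULTIDATA:'Y', FIND retrieves the first entry for a key and FIND_NEXT walks the remaining entries:

   data matches;
      if _n_ = 1 then do;
         if 0 then set work.donors;       /* hosts VALUE in the PDV */
         declare hash h(dataset:'work.donors', multidata:'y');
         h.defineKey('id');
         h.defineData('value');
         h.defineDone();
      end;
      set work.keys;
      rc = h.find();                      /* first entry for this key */
      do while (rc = 0);
         output;
         rc = h.find_next();              /* next entry with the same key */
      end;
      drop rc;
   run;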
The SURVEYSELECT procedure is useful for sample selection in a wide variety of applications. This paper presents the application of PROC SURVEYSELECT to a complex scenario involving a state-wide educational testing program, namely, drawing interdependent stratified cluster samples of schools for the field-testing of test questions, which we call items. These stand-alone field tests are given to only small portions of the testing population, and as such, a stratified procedure was used to ensure the representativeness of the field-test samples. Because the field-test statistics for these items are evaluated for use in future operational tests, an efficient procedure is needed to sample schools while satisfying predefined sampling criteria and targets. This paper provides an adaptive sampling application and then generalizes the methodology, as much as possible, for potential use in other industries.
Chuanjie Liao, Pearson
Brian Patterson, Pearson
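A minimal sketch of a stratified school sample (the frame, strata, and per-stratum sizes are hypothetical); the paper's adaptive procedure layers its criteria and targets on top of selections like this:

   proc sort data=work.schools;
      by region;
   run;

   proc surveyselect data=work.schools out=work.sample
                     method=srs sampsize=(10 15 20) seed=20160401;
      strata region;   /* one sample size per stratum, in sort order */
   run;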
One of the truths about SAS® is that there are, at a minimum, three approaches to achieve any intended task, and each approach has its own pros and cons. Identifying and using efficient SAS programming techniques is recommended, and efficient programming becomes mandatory when larger data sets are accessed. This paper describes the efficiency of virtual access and various situations in which to use virtual access of data sets, using the OPEN, FETCH, and CLOSE functions with %SYSFUNC and %DO loops. There are several ways to get data set attributes such as the number of observations, the number of variables, the types of variables, and so on. It is more efficient to access larger data sets using the OPEN function with %SYSFUNC. The FETCH function, used with OPEN, provides the ability to access the values of the variables from a SAS data set for conditional and iterative execution with %DO loops. In the following situations, virtual access becomes more efficient on larger data sets. Situation 1: It is often required to split a master data set into several independent pieces, depending upon a certain criterion, for batch submission. Situation 2: For many reports, dashboards, and array processes, the relevant variables must be kept together rather than in the original data set order. Situation 3: Much statistical analysis, particularly correlation and regression, relies on widely used SAS procedures that require a dynamic variable list to be passed. Creating a single macro variable list might run into macro variable length limits on larger transactional data sets. It is therefore recommended that you prepare the list of variables as a data set and access it using the proposed approach, instead of creating several macro variables.
Amarnath Vijayarangan, Emmes Services Pvt Ltd, India
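A minimal sketch of the virtual-access pattern: OPEN, ATTRN, FETCH, and GETVARN read a data set's attributes and values entirely through %SYSFUNC, without a DATA step; the macro name is hypothetical:

   %macro peek(ds, var);
      %local dsid nobs rc val;
      %let dsid = %sysfunc(open(&ds));
      %if &dsid %then %do;
         %let nobs = %sysfunc(attrn(&dsid, nlobs));
         %put &ds has &nobs observations.;
         %let rc  = %sysfunc(fetch(&dsid));   /* read the first observation */
         %let val = %sysfunc(getvarn(&dsid, %sysfunc(varnum(&dsid, &var))));
         %put First value of &var is &val.;
         %let rc = %sysfunc(close(&dsid));    /* always close the data set */
      %end;
   %mend peek;

   %peek(sashelp.class, age)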
Few data visualizations are as striking or as useful as heat maps, and fortunately there are many applications for them. A heat map displaying eye-tracking data is a particularly potent example: the intensity of a viewer's gaze is quantified and superimposed over the image being viewed, and the resulting data display is often stunning and informative. This paper shows how to use Base SAS® to prepare, smooth, and transform eye-tracking data and ultimately render it on top of a corresponding image. By customizing a graphical template and using specific Graph Template Language (GTL) options, a heat map can be drawn precisely so that the user maintains pixel-level control. In this talk, and in the related paper, eye-tracking data is used primarily, but the techniques provided are easily adapted to other fields such as sports, weather, and marketing.
Matthew Duchnowski, Educational Testing Service
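A minimal sketch of the GTL rendering step, assuming hypothetical X, Y, and GAZE variables that hold the prepared and smoothed eye-tracking values:

   proc template;
      define statgraph gazemap;
         begingraph;
            entrytitle 'Gaze Intensity';
            layout overlay;
               heatmapparm x=x y=y colorresponse=gaze /
                  colormodel=(blue green yellow red);
            endlayout;
         endgraph;
      end;
   run;

   proc sgrender data=work.eyetrack template=gazemap;
   run;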
SAS® Web Application Server goes down, and the user is presented with an error message. The error messages in the SAS® 9.4 middle tier are the default ones that are shipped with the underlying VMware vFabric Web Server and are seen by many users as too technical and uninformative. This paper describes an application called 'Errors' that was developed at the Royal Bank of Scotland and has been implemented across its 9.4 estate to provide a much better user experience when things go wrong. In addition, regardless of communications sent out, users always try to access an application if it appears to be available. This paper goes into detail about a feature of the Errors application that RBS uses to prevent this: the feature controls access to the web applications during scheduled outage windows, providing capability for IP- and location-based access, among others. This paper also documents features and capabilities that RBS would like to introduce to the application.
Christopher Blake, RBS
Making a data delivery to a client is a complicated endeavor. There are many aspects that must be carefully considered and planned for: de-identification, public use versus restricted access, documentation, ancillary files such as programs, formats, and so on, and methods of data transfer, among others. This paper provides a blueprint for planning and executing your data delivery.
Louise Hadden, Abt Associates
The latest releases of SAS® Data Integration Studio, SAS® Data Management Studio and SAS® Data Integration Server, SAS® Data Governance, and SAS/ACCESS® software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types, including Hadoop, cloud, RDBMS, files, unstructured data, and streaming, and the ability to perform ETL and ELT transformations in diverse run-time environments, including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, and clustering, along with other data governance features that enhance master data and metadata management. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS
SAS® Federation Server is a scalable, threaded, multi-user data access server that provides seamlessly integrated data from various data sources. When your data becomes too large to move or copy and too sensitive to allow direct access, its powerful set of data virtualization capabilities allows you to manage and manipulate data from many sources effectively and efficiently, without moving or copying the data. This agile data integration framework allows business users to connect to more data, reduce risks, respond faster, and make better business decisions. For technical users, the framework provides central data control and security, reduces complexity, optimizes commodity hardware, promotes rapid prototyping, and increases staff productivity. This paper provides an overview of the latest features of the product and includes use cases and examples for leveraging product capabilities.
Tatyana Petrova, SAS
Software quality comprises a combination of functional and performance requirements that specify not only what software should accomplish, but also how well it should accomplish it. Recoverability, a common performance objective, represents the timeliness and efficiency with which software or a system can resume functioning following a catastrophic failure. Thus, requirements for high-availability software often specify the recovery time objective (RTO): the maximum amount of time that software might be down following an unplanned failure or a planned outage. While systems demanding high or near-perfect availability require redundant hardware, network resources, and additional infrastructure, the software itself must also facilitate rapid recovery. And in environments in which system or hardware redundancy is infeasible, recoverability can be improved only through effective software development practices. Because even the most robust code can fail under duress or due to unavoidable or unpredictable circumstances, software reliability must incorporate recoverability principles and methods. This text introduces the TEACH mnemonic, which holds that software recovery should be timely, efficient, autonomous, constant, and harmless. It also introduces the SPICIER mnemonic, which describes discrete phases in the recovery period, each of which can benefit from and be optimized with TEACH principles. Software failure is inevitable, but its negative impacts can be minimized through SAS® development best practices.
Troy Hughes, Datmesis Analytics
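As one illustration of these principles (a sketch, not the paper's own code), a checkpoint-restart pattern makes a batch job's recovery autonomous and constant: each step records its completion in a permanent table, so a rerun after a failure skips finished work and resumes where it stopped. The CHKPT library path and the step macros are hypothetical.

   libname chkpt '/data/checkpoints';        /* hypothetical permanent path */

   %macro run_step(step);
      %local done;
      %let done = 0;
      %if not %sysfunc(exist(chkpt.done)) %then %do;
         data chkpt.done;                    /* first run: empty table      */
            length step $32;
            stop;
         run;
      %end;
      proc sql noprint;                      /* autonomous: detect success  */
         select count(*) into :done trimmed
         from chkpt.done where step = "&step";
      quit;
      %if &done > 0 %then %do;
         %put NOTE: Step &step already complete, skipping.;
         %return;
      %end;
      %&step                                 /* run the step macro itself   */
      %if &syscc > 4 %then %do;              /* timely: halt at failure     */
         %put ERROR: Step &step failed with SYSCC=&syscc..;
         %abort cancel;
      %end;
      proc sql;                              /* constant: record completion */
         insert into chkpt.done set step = "&step";
      quit;
   %mend run_step;

   %run_step(extract)
   %run_step(transform)
   %run_step(load)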
For many organizations, the answer to whether to manage their data and analytics in a public or a private cloud is going to be both, and for many different reasons: the common-sense logic of not replacing a system that already works just to incorporate something new; legal or corporate regulations that require some data, but not all data, to remain in place; and even a desire to provide local employees with a traditional data center experience while providing remote or international employees with cloud-based analytics easily managed through software deployed via Amazon Web Services (AWS). In this paper, we discuss some of the unique technical challenges of managing a hybrid environment: how to monitor system performance simultaneously for two systems that might not share the same infrastructure or even provide comparable system monitoring tools; how to manage authorization when access and permissions might be driven by two different security technologies, making implementation of a single protocol problematic; and how to ensure overall automation of two platforms that might be independently automated but were not originally designed to work together. Throughout, we share lessons learned from a decade of experience implementing hybrid cloud environments.
Ethan Merrill, SAS
Bryan Harkola, SAS
Even if you're not a GIS mapping pro, it pays to have some geographic problem-solving techniques in your back pocket. In this paper, we illustrate a general approach to finding the location closest to any given US zip code, with a specific, user-accessible example that uses only Base SAS®. We also suggest a method for implementing the solution in a production environment and demonstrate how parallel processing can cut down on computing time when hardware constraints make run time an issue.
Andrew Clapson, MD Financial Management
Annmarie Smith, HomeServe USA
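As a flavor of the approach, this minimal sketch uses the SASHELP.ZIPCODE lookup table and the GEODIST function to find the site nearest to one hard-coded zip code; the WORK.LOCATIONS data set (numeric ID, with LAT and LON in degrees) is a hypothetical list of candidate locations.

   data nearest;
      /* the centroid (y = latitude, x = longitude) of the input zip code */
      set sashelp.zipcode(keep=zip x y rename=(x=zlon y=zlat));
      where zip = 27513;                              /* hypothetical input */
      do i = 1 to nobs;
         set work.locations(keep=id lat lon) point=i nobs=nobs;
         dist = geodist(zlat, zlon, lat, lon, 'DM');  /* degrees in, miles out */
         if i = 1 or dist < min_dist then do;
            min_dist   = dist;
            nearest_id = id;
         end;
      end;
      keep zip nearest_id min_dist;
   run;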
In the world of big data, real-time processing and event stream processing are becoming the norm, yet few tools available today can handle this type of processing. SAS® Event Stream Processing is designed for it. In this paper, we use SAS Event Stream Processing to read, in real time, multiple data sets stored in big data platforms such as Hadoop and Cassandra, and to perform transformations on the data: joining data sets, filtering data based on preset business rules, and creating new variables as required. We also look at how to score the data with a machine learning algorithm. The paper shows how to use the provided Hadoop Distributed File System (HDFS) publisher and subscriber to read data from and push data to Hadoop, and it discusses the HDFS adapter in detail. Finally, we use Streamviewer to watch how data flows through SAS Event Stream Processing.
Krishna Sai Kishore Konudula, Kavi Associates
SAS/IML® 14.1 enables you to author, install, and call packages. A package consists of SAS/IML source code, documentation, data sets, and sample programs. Packages provide a simple way to share SAS/IML functions. An expert who writes a statistical analysis in SAS/IML can create a package and upload it to the SAS/IML File Exchange. A nonexpert can download the package, install it, and immediately start using it. Packages provide a standard and uniform mechanism for sharing programs, which benefits both experts and nonexperts. Packages are very popular with users of other statistical software, such as R. This paper describes how SAS/IML programmers can construct, upload, download, and install packages. They're not wrapped in brown paper or tied up with strings, but they'll soon be a few of your favorite things!
Rick Wicklin, SAS
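For a flavor of the syntax, here is a minimal sketch of the package statements in SAS/IML 14.1; the ZIP file path is hypothetical, and "polygon" is used purely as an example package name.

   proc iml;
      package install '/downloads/polygon.zip';  /* install once from a ZIP     */
      package load polygon;                      /* load its functions          */
      package list;                              /* show installed packages     */
      package help polygon;                      /* open the documentation      */
   quit;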
I get annoyed when macros tread all over my SAS® environment, overwriting my macro variables and data sets. So how do you write macros that play nicely and clean up afterwards? This paper describes techniques for writing macros that do just that.
Philip Holland, Holland Numerics Ltd
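A minimal sketch of the idea, not the paper's own code: the macro declares every macro variable %LOCAL, gives its temporary data set a prefixed name that is unlikely to collide with the caller's, and deletes it before exiting. The macro and data set names are hypothetical.

   %macro tidy_freq(data=, var=);
      %local nlevels;                        /* never leak macro variables  */
      proc freq data=&data noprint;
         tables &var / out=work._tf_counts;  /* prefixed temporary data set */
      run;
      proc sql noprint;
         select count(*) into :nlevels trimmed from work._tf_counts;
      quit;
      %put NOTE: &var has &nlevels levels.;
      proc datasets library=work nolist;     /* clean up before returning   */
         delete _tf_counts;
      quit;
   %mend tidy_freq;

   %tidy_freq(data=sashelp.class, var=sex)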
Do you write reports that sometimes have missing categories across the class variables? Some programmers write all sorts of additional DATA step code to show the zeros for the missing rows or columns. Did you ever wonder whether there is an easier way? PROC MEANS and PROC TABULATE, in conjunction with PROC FORMAT, can handle this situation with a couple of powerful options. With PROC TABULATE, the PRELOADFMT and PRINTMISS options, used with a user-defined format from PROC FORMAT, accomplish this task. With PROC MEANS or PROC SUMMARY, the COMPLETETYPES option produces all the rows, including those with zeros. This paper uses examples from Census Bureau tabulations to illustrate the use of these procedures and options to preserve missing rows or columns.
Chris Boniface, U.S. Census Bureau
Janet Wysocki, U.S. Census Bureau
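As a flavor of both techniques, here is a minimal sketch on a hypothetical SURVEY data set: because the user-defined format enumerates every expected category, regions with no observations still appear, with zeros.

   proc format;                             /* enumerate every expected level */
      value regfmt 1='Northeast' 2='Midwest' 3='South' 4='West';
   run;

   proc summary data=survey completetypes nway;   /* build all class levels   */
      class region / preloadfmt;
      format region regfmt.;
      var persons;
      output out=all_rows sum=;
   run;

   proc tabulate data=survey;               /* print the empty cells as 0     */
      class region / preloadfmt;
      format region regfmt.;
      table region, n / printmiss misstext='0';
   run;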