SAS Global Forum 2015 Proceedings

Are we alone in this universe? This is a question that undoubtedly passes through every mind several times during a lifetime. We often hear stories about close encounters, Unidentified Flying Object (UFO) sightings, and other mysterious things, but we lack documented evidence for analysis on this topic. UFOs have been a matter of public interest for a long time. The objective of this paper is to analyze one database that has a collection of documented reports of UFO sightings to uncover any fascinating story related to the data. Using SAS^® Enterprise Miner™ 13.1, the powerful capabilities of text analytics and topic mining are leveraged to summarize the associations between reported sightings. We used PROC GEOCODE to convert addresses of sightings to locations on the map. The GEOCODE procedure converts address data to geographic coordinates (latitude and longitude values). These geographic coordinates can then be used on a map to calculate distances or to perform spatial analysis. We then used the GMAP procedure to produce a heat map that represents the frequency of sightings in various locations. On preliminary analysis of the data associated with sightings, it was found that the most popular words associated with UFOs tell us about their shapes, formations, movements, and colors. The Text Profile node in SAS Enterprise Miner 13.1 was leveraged to build a model and cluster the data into different levels of the segment variable. We also explain how opinions about the UFO sightings change over time using text profiling. Further, this analysis uses the Text Profile node to find interesting terms or topics that were used to describe the UFO sightings. Based on the feedback received at the SAS^® analytics conference, we plan to incorporate a technique to filter duplicate comments and to include weather data for each sighting location.
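The abstract notes that geocoded coordinates can be used to calculate distances. As a rough illustration of that idea only, here is a minimal Python sketch (not SAS, and not the authors' code) of the haversine great-circle distance between two latitude/longitude pairs. The function name, the Earth-radius constant, and the two sample coordinates are assumptions made for this example and do not come from the paper's data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation chosen for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical geocoded sightings (illustrative coordinates, not real report data).
sighting_a = (36.1156, -97.0584)
sighting_b = (35.4676, -97.5164)

distance = haversine_km(*sighting_a, *sighting_b)
print(f"Distance between sightings: {distance:.1f} km")
```

The haversine formula treats the Earth as a sphere, which is usually adequate for summarizing how far apart reported sightings are; a projection-aware method would be needed for precise spatial analysis.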

Read the paper (PDF). | Download the data file (ZIP).

This presentation provides a brief introduction to logistic regression analysis in SAS. Learn the differences between linear regression and logistic regression, including ordinary least squares versus maximum likelihood estimation. Learn to: understand LOGISTIC procedure syntax, use continuous and categorical predictors, and interpret output from ODS Graphics.

For decades, mixed models have been used by researchers to account for random sources of variation in regression-type models. Now they are gaining favor in business statistics to give better predictions for naturally occurring groups of data, such as sales reps, store locations, or regions. Learn how predictions based on a mixed model differ from predictions in ordinary regression, and see examples of mixed models with business data.
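One way to see how mixed-model predictions differ from ordinary per-group estimates is the shrinkage of group means in a random-intercept model. The Python sketch below is a simplified illustration, not the presenter's material: it assumes the between-group and within-group variances are already known, uses the raw grand mean in place of a properly estimated fixed effect, and works on made-up sales figures. All names and numbers are hypothetical.

```python
from statistics import mean


def shrunken_group_means(groups, var_between, var_within):
    """Random-intercept style predictions with assumed (known) variance components.

    Each group's prediction is the grand mean plus a shrunken deviation, where
    weight = n * var_between / (n * var_between + var_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand = mean(all_values)  # simplification: raw grand mean, not a GLS estimate
    predictions = {}
    for name, values in groups.items():
        n = len(values)
        weight = n * var_between / (n * var_between + var_within)
        predictions[name] = grand + weight * (mean(values) - grand)
    return grand, predictions


# Hypothetical sales per rep; rep_b has a single, unusually high observation.
sales = {
    "rep_a": [10.0, 12.0, 11.0, 13.0],
    "rep_b": [20.0],
    "rep_c": [9.0, 8.0],
}

grand, predictions = shrunken_group_means(sales, var_between=4.0, var_within=4.0)
for rep, value in predictions.items():
    print(rep, round(value, 2), "raw mean:", round(mean(sales[rep]), 2))
```

Groups with few observations are pulled more strongly toward the overall mean, whereas an ordinary regression with a separate indicator per group would simply return each group's raw mean.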

Text data constitutes more than half of the unstructured data held in organizations. Buried within the narrative of customer inquiries, the pages of research reports, and the notes in servicing transactions are the details that describe concerns, ideas, and opportunities. The manual effort historically needed to develop a training corpus is no longer required, making it simpler to gain insight buried in unstructured text. With the ease of machine learning refined with the specificity of linguistic rules, SAS Contextual Analysis helps analysts identify and evaluate the meaning of the electronic written word. From a single point-and-click interface, the process of developing text models is guided and visually intuitive. This presentation will walk through the text model development process with SAS Contextual Analysis. The results are in SAS format, ready for text-based insights to be used in any other SAS application.

SAS/ETS provides many tools to improve the productivity of the analyst who works with time series data. This tutorial will take an analyst through the process of turning transaction-level data into a time series. The session will then cover some basic forecasting techniques that use past fluctuations to predict future events. We will then extend this modeling technique to include explanatory factors in the prediction equation.
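The first step described above, turning transaction-level data into a time series, amounts to accumulating timestamped records into equally spaced periods. The tutorial itself uses SAS/ETS; the following is only a minimal Python sketch of the idea, with a hypothetical function name and invented transactions. Filling months that have no transactions with zero is an assumption of this sketch; treating them as missing is an equally valid choice in some settings.

```python
from collections import defaultdict
from datetime import date


def to_monthly_series(transactions):
    """Sum (date, amount) transactions into a gap-free list of (year, month, total)."""
    totals = defaultdict(float)
    for day, amount in transactions:
        totals[(day.year, day.month)] += amount
    if not totals:
        return []
    (year, month), last = min(totals), max(totals)
    series = []
    while (year, month) <= last:
        # Months with no transactions get 0.0 (an assumption of this sketch).
        series.append((year, month, totals.get((year, month), 0.0)))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series


# Hypothetical transactions; note that December 2014 has none.
transactions = [
    (date(2014, 11, 3), 25.0),
    (date(2014, 11, 20), 10.0),
    (date(2015, 1, 15), 40.0),
]

monthly = to_monthly_series(transactions)
for year, month, total in monthly:
    print(f"{year}-{month:02d}: {total}")
```

Once the data are on a regular monthly grid like this, forecasting techniques that rely on past fluctuations can be applied to the resulting series.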

SAS^® University Edition is a great addition to the world of freely available analytic software, and this 'how-to' presentation shows you how to implement a discrete event simulation using Base SAS^® to model future US Veterans population distributions. Features include generating a slideshow using ODS output to PowerPoint.

Read the paper (PDF). | Download the data file (ZIP).

Just as research is built on existing research, the references section is an important part of a research paper. The purpose of this study is to find the differences between professionals and academicians with respect to the references section of a paper. Data is collected from SAS^® Global Forum 2014 Proceedings. Two research hypotheses are supported by the data. First, the average number of references in papers by academicians is higher than that in papers by professionals. Second, academicians follow standards for citing references more than professionals do. Text mining is performed on the references to understand the actual content. This study suggests that authors of SAS Global Forum papers should include more references to increase the quality of the papers.

Read the paper (PDF).

Educational administrators sometimes have to make decisions based on what they believe is in the best interest of their students because they do not have the data they need at the time. Some administrators do not even know that the data exist to help them make their decisions. However, well-intentioned policies that are not based on facts can sometimes do more harm than good for the students and the institution. This presentation discusses the results of the policy analyses conducted by the Office of Institutional Research at Western Kentucky University using Base SAS^®, SAS/STAT^®, SAS^® Enterprise Miner™, and SAS^® Visual Analytics. The researchers analyzed Western Kentucky University's math course placement procedure for incoming students and assessed the criteria used for admissions decisions, including those for first-time first-year students, transfer students, and students readmitted to the University after being dismissed for unsatisfactory academic progress--procedures and criteria previously designed with the students' best interests at heart. The presenters discuss the statistical analyses used to evaluate the policies and the use of SAS Visual Analytics to present their results to administrators in a visual manner. In addition, the presenters discuss subsequent changes in the policies, and where possible, the results of the policy changes.

Read the paper (PDF).

Multilevel models (MLMs) are frequently used in social and health sciences where data are typically hierarchical in nature. However, the commonly used hierarchical linear models (HLMs) are appropriate only when the outcome of interest is normally distributed. When you are dealing with outcomes that are not normally distributed (binary, categorical, ordinal), a transformation and an appropriate error distribution for the response variable need to be incorporated into the model. Therefore, hierarchical generalized linear models (HGLMs) need to be used. This paper provides an introduction to specifying HGLMs using PROC GLIMMIX, following the structure of the primer for HLMs previously presented by Bell, Ene, Smiley, and Schoeneberger (2013). A brief introduction to the field of multilevel modeling and HGLMs with both dichotomous and polytomous outcomes is followed by a discussion of the model-building process and appropriate ways to assess the fit of these models. Next, the paper provides a discussion of PROC GLIMMIX statements and options as well as concrete examples of how PROC GLIMMIX can be used to estimate (a) two-level organizational models with a dichotomous outcome and (b) two-level organizational models with a polytomous outcome. These examples use data from High School and Beyond (HS&B), a nationally representative longitudinal study of American youth. For each example, narrative explanations accompany annotated examples of the GLIMMIX code and corresponding output.

Read the paper (PDF).

Join us for lunch as we discuss the benefits of being part of the elite group that is SAS Certified Professionals. The SAS Global Certification program has awarded more than 79,000 credentials to SAS users across the globe. Come listen to Terry Barham, Global Certification Manager, give an overview of the SAS Certification program, explain the benefits of becoming SAS certified and discuss exam preparation tips. This session will also include a Q&A section where you can get answers to your SAS Certification questions.

Institutional research and effectiveness offices are often the primary beneficiaries of data warehouse (DW) technologies. However, at many institutions, the need for a data warehouse that supports growing accountability, decision support, and institutional effectiveness requirements is still unfulfilled, in part due to growing data volumes and the prohibitively expensive cost of data warehouses built by UIT departments. In recent years, many institutional research offices in the country have been asked to take a leadership role in building the DW or to partner with the campus IT department to improve the efficiency and effectiveness of DW development. Within this context, the Office of Institutional Research and Effectiveness at a large public research university in the Northeast was entrusted with the responsibility to build the new campus data warehouse for growing needs such as resource allocation, competitive positioning, new program development in emerging STEM disciplines, and accountability reporting. These requirements necessitated the deployment of state-of-the-art analytical decision support applications, such as SAS^® Visual Analytics (reporting and analysis) and SAS^® Visual Statistics (predictive), in a disparate data environment, including PeopleSoft (student), Kuali (finance), Genesys (human resources), and a homegrown sponsored funding database. This presentation focuses on the efforts of institutional research and effectiveness offices in developing the decision support applications using the SAS^® Enterprise business intelligence and analytical solutions. With users ranging from nontechnical to advanced analysts, greater efficiency lies in the ability to get faster and more elegant reporting from those huge stores of data and to share the resulting discoveries across departments.
Most of the reporting applications were developed based on the needs of IPEDS, CUPA, Common Data Set, US News and World Report, graduation and retention, and faculty activity, and deployed through an online web-based portal. The participants will learn how the University quickly analyzes institutional data through an easy-to-use, drag-and-drop, web-based application. This presentation demonstrates how to use SAS^® Visual Analytics to quickly design reports that are attractive, interactive, and meaningful and then distribute those reports via the web, or through SAS^® Mobile BI on an iPad^® or tablet.

Read the paper (PDF).

This workshop provides hands-on experience performing statistical analysis with SAS University Edition and SAS Studio. Workshop participants will learn to: install and setup, perform basic statistical analyses using tasks, connect folders to SAS Studio for data access and results storage, invoke code snippets to import CSV data into SAS, and create a code snippet.

Read the paper (PDF).

This workshop provides hands-on experience using SAS^® Text Miner. Workshop participants will learn to: read a collection of text documents and convert them for use by SAS Text Miner using the Text Import node, use the simple query language supported by the Text Filter node to extract information from a collection of documents, use the Text Topic node to identify the dominant themes and concepts in a collection of documents, and use the Text Rule Builder node to classify documents having pre-assigned categories.

Read the paper (PDF).

A leading killer in the United States is smoking. Moreover, over 8.6 million Americans live with a serious illness caused by smoking or second-hand smoking. Despite this, over 46.6 million U.S. adults smoke tobacco, cigars, and pipes. The key analytic question in this paper is, How would e-cigarettes affect this public health situation? Can monitoring public opinions of e-cigarettes using SAS^® Text Analytics and SAS^® Visual Analytics help provide insight into the potential dangers of these new products? Are e-cigarettes an example of Big Tobacco up to its old tricks or, in fact, a cessation product? The research in this paper was conducted on thousands of tweets from April to August 2014. It includes API sources beyond Twitter--for example, indicators from the Health Indicators Warehouse (HIW) of the Centers for Disease Control and Prevention (CDC)--that were used to enrich Twitter data in order to implement a surveillance system developed by SAS^® for the CDC. The analysis is especially important to the Office on Smoking and Health (OSH) at the CDC, which is responsible for tobacco control initiatives that help states to promote cessation and prevent initiation in young people. To help the CDC succeed with these initiatives, the surveillance system also: 1) automates the acquisition of data, especially tweets; and 2) applies text analytics to categorize these tweets using a taxonomy that provides the CDC with insights into a variety of relevant subjects. Twitter text data can help the CDC look at the public response to the use of e-cigarettes and examine general discussions regarding smoking and public health, as well as potential controversies (involving tobacco exposure to children, increasing government regulations, and so on). SAS^® Content Categorization helps health care analysts review large volumes of unstructured data by categorizing tweets in order to monitor and follow what people are saying and why they are saying it.
Ultimately, it is a solution intended to help the CDC monitor the public's perception of the dangers of smoking and e-cigarettes. In addition, it can identify areas where OSH can focus its attention in order to fulfill its mission and track the success of CDC health initiatives.

Read the paper (PDF).

Universities often have student data that is difficult to represent, which includes information about the student's home location. Often, student data is represented in tables, and patterns are easily overlooked. This study aimed to represent recruiting and retention data at a county level using SAS^® mapping software for a public, land-grant university. Three years of student data from the student records database were used to visually represent enrollment, retention, and other predictors of student success. SAS^® Enterprise Guide^® was used along with the GMAP procedure to make user-friendly maps. Displaying data using maps on a county level revealed patterns in enrollment, retention, and other factors of interest that might have otherwise been overlooked, which might be beneficial for recruiting purposes.

Read the paper (PDF).

Becoming one of the best memorizers in the world doesn't happen overnight. With hard work, dedication, a bit of obsession, and the assistance of some clever analytics metrics, Nelson Dellis was able to climb to the top of the memory rankings in under a year to become the now 3x USA Memory Champion. In this talk, he explains what it takes to become the best at memory, what is involved in such grueling memory competitions, and how analytics helped him get there.

Breast cancer is the leading cause of cancer-related deaths among women worldwide, and its early detection can reduce the mortality rate. Using a data set containing information about breast screening provided by the Royal Liverpool University Hospital, we constructed a model that can provide early indication of a patient's tendency to develop breast cancer. This data set has information about breast screening from patients who were believed to be at risk of developing breast cancer. The most important aspect of this work is that we excluded variables that are in one way or another associated with breast cancer, while keeping the variables that are less likely to be associated with breast cancer or whose associations with breast cancer are unknown as input predictors. The target variable is a binary variable with two values, 1 (indicating a type of cancer is present) and 0 (indicating a type of cancer is not present). SAS^® Enterprise Miner™ 12.1 was used to perform data validation and data cleansing, to identify potentially related predictors, and to build models that can be used to predict at an early stage the likelihood of patients developing breast cancer. We compared two models: the first model was built with an interactive node and a cluster node, and the second was built without an interactive node and a cluster node. Classification performance was compared using a receiver operating characteristic (ROC) curve and average squared error. Interestingly, we found significantly improved model performance by using only variables that have a lesser or unknown association with breast cancer. The result shows that the logistic model with an interactive node and a cluster node has better performance with a lower average squared error (0.059614) than the model without an interactive node and a cluster node. Among other benefits, this model will assist inexperienced oncologists in saving time in disease diagnosis.
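The two comparison criteria mentioned above can be illustrated with a short Python sketch (not SAS Enterprise Miner output, and not the authors' code). It assumes a binary target coded 1/0 and predicted probabilities of the event, defines average squared error as the plain mean of squared differences, and computes the area under the ROC curve by pairwise comparison. The tool's own computation may differ in detail, and the sample values are invented.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between 0/1 outcomes and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def roc_auc(actual, predicted):
    """Area under the ROC curve via pairwise comparison.

    Equals the share of (positive, negative) pairs in which the positive case
    receives the higher score; ties count as half.
    """
    pos = [p for y, p in zip(actual, predicted) if y == 1]
    neg = [p for y, p in zip(actual, predicted) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation cases: 1 = cancer present, 0 = not present.
actual = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.6, 0.4]

ase = average_squared_error(actual, predicted)
auc = roc_auc(actual, predicted)
print("ASE:", ase, "AUC:", auc)
```

A lower average squared error and a higher area under the ROC curve both indicate better classification performance, which is the sense in which the abstract ranks the two models.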

Read the paper (PDF).

Many freshmen leave their first college and go on to attend another institution. Some of these students are even successful in earning degrees elsewhere. As there is more focus on college graduation rates, this paper shows how the power of SAS^® can pull in data from many disparate sources, including the National Student Clearinghouse, to answer questions on the minds of many institutional researchers. How do we use the data to answer questions such as "What would my graduation rate be if these students graduated at my institution instead of at another one?", "What types of schools do students leave to attend?", and "Are there certain characteristics of students who leave, and are they concentrated in certain programs?" The data-handling capabilities of SAS are perfect for this type of analysis, and this presentation walks you through the process.