SAS Global Forum 2017 Proceedings

In this paper, a SAS^® macro is introduced that can search and replace any string in a SAS program. To use the macro, the user needs only to pass the search string to a folder. If the user wants to use the replacement function, the user also needs to pass the replacement string. The macro checks all of the SAS programs in the folder and subfolders to find out which files contain the search string. The macro generates new SAS files for replacements so that the old files are not affected. An HTML report is generated by the macro to include the original file locations, the line numbers of the SAS code that contain the search string, and the SAS code with search strings highlighted in yellow. If you use the replacement function, the HTML report also includes the location information for the new SAS files. The location information in the HTML report is created with hyperlinks so that the user can directly open the files from the report.

Read the paper (PDF) | View the e-poster or slides (PDF)

We have a lot of chances to use time-to-event (survival) analysis, especially in the biomedical and pharmaceutical fields. SAS^® provides the LIFETEST procedure to calculate Kaplan-Meier estimates for survival function and to delineate a survival plot. The PHREG procedure is used in Cox regression models to estimate the effect of predictors in hazard rates. Programs with ODS tables that are defined by PROC LIFETEST and PROC PHREG can provide more statistical information from the generated data sets. This paper provides a macro that uses PROC LIFETEST and PROC PHREG with ODS. It helps users to have a survival plot with estimates that include the subject at risk, events and total subject number, survival rate with median and 95% confidence interval, and hazard ratio estimates with 95% confidence interval. Some of these estimates are optional in the macro, so users can select what they need to display in the output. (Subject at risk and event and subject number are not optional.) Users can also specify the tick marks in the X-axis and subject at risk table, for example, every 10 or 20 units. The macro dynamic calculates the maximum for the X-axis and uses the interval that the user specified. Finally, the macro uses ODS and can be output in any document files, including JPG, PDF, and RTF formats.

Read the paper (PDF) | View the e-poster or slides (PDF)

The hospital Medicare readmission rate has become a key indicator for measuring the quality of health care in the US. This rate is currently used by major health-care stakeholders including the Centers for Medicare and Medicaid Services (CMS), the Agency for Healthcare Research and Quality (AHRQ), and the National Committee for Quality Assurance (NCQA) (Fan and Sarfarazi, 2014). Although many papers have been written about how to calculate readmissions, this paper provides updated code that includes ICD-10 (International Classification of Diseases) code and offers a novel and comprehensive approach using SAS^® DATA step options and PROC SQL. We discuss: 1) De-identifying patient data 2) Calculating sequential admissions 3) Subsetting criteria required to report for CMS 30-day readmissions. In addition, this papers demonstrates: 1) Using the output delivery system (ODS) to create a labeled and de-identified data set 2) Macro variables to examine data quality 3) Summary statistics for further reporting and analysis.

Read the paper (PDF) | View the e-poster or slides (PDF)

Duplicates in a clinical trial or survey database could jeopardize data quality and integrity, and they can induce biased analysis results. These complications often happen in clinical trials, meta analyses, and registry and observational studies. Common practice to identify possible duplicates involves sensitive personal information, such as name, Social Security number (SSN), date of birth, address, telephone number, etc. However, access to this sensitive information is limited. Sometimes, it is even restricted. As a measure of data quality control, a SAS^® program was developed to identify duplicated individuals using non-sensitive information, such as age, gender, race, medical history, vital signs, and laboratory measurements. A probabilistic approach was used by calculating weights for data elements used to identify duplicates based on two probabilities (probability of agreement for an element among matched pairs and probability of agreement purely by chance among non-matched pairs). For elements with categorical values, agreement was defined as matching pairs sharing the same value. For elements with interval values, agreement was defined as matching values within 1% of measurement precision range. Probabilities used to compute matching element weights were estimated using an expectation-maximization (EM) algorithm. The method was then tested on a survey and clinical trial data from hypertension studies.

View the e-poster or slides (PDF)

Technology plays an integral role in every aspect of daily life. As a result, educators should leverage technology-based learning to ensure that students are provided with authentic, engaging, and meaningful learning experiences (Pringle, Dawson, and Ritzhaupt, 2015).The significance and value of computer science understanding continue to increase. A major resource that can be credited with spreading support for computer science is the site Code.org. Its mission is to enable every student in every school to have the opportunity to learn computer science (https://code.org/about). Two years ago, our mentor partnered with Code.org to conduct workshops within the Charlotte, NC area to educate teachers on how to teach computer science activities and concepts in their classrooms. We had the opportunity to assist during the workshops to provide student perspectives and opinions. As we look back on the workshops, we wondered, How are the teachers who attended the workshops implementing the concepts they were taught? After each workshop, a survey was distributed to the attendees to receive workshop feedback and to follow up. We collected the data from the surveys sent to participants and analyzed it using SAS^® University Edition. The results of the survey concluded that the workshops were beneficial and that the educators had implemented a concept that they learned. We believe that computer science activity implementations will assist students across the curriculum.

View the e-poster or slides (PDF)

To determine whether there is a correlation between the repetitiveness of a song s lyrics and its popularity, the top 10 songs from the Billboard Hot 100 songs chart from 2006 to 2015 were collected. Song lyrics were assessed to determine the count of the top 10 words used. Word counts were used to predict the number of weeks the song was on the chart. The prediction model was analyzed to determine the quality of the model and whether word count was a significant predictor of a song s popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were performed to see whether the average word count has been changing over the years. All analysis was completed in SAS^® using various procedures.

View the e-poster or slides (PDF)

Are you tired of copying PROC FREQ or PROC MEANS output and pasting it into your tables? Do you need to produce summary tables repeatedly? Are you spending a lot of your time generating the same summary tables for different subpopulations? This paper introduces an easy-to-use macro to generate a descriptive statistics table. The table reports counts and percentages for categorical variables, and means, standard deviations, medians, and quantiles for continuous variables. For variables with missing values, the table also includes the count and percentage missing. Customization options allow for the analysis of stratified data, specification of variable output order, and user-defined formats. In addition, this macro incorporates the SAS^® Output Delivery System (ODS) to automatically produce a Rich Text Format (RTF) file, which can be further edited by a word processor for the purpose of publication.

Read the paper (PDF) | View the e-poster or slides (PDF)

A major issue in America today is the growing gap between the rich and the poor. Even though the basic concept has entered the public consciousness, the effects of highly concentrated wealth are hotly debated and poorly understood by the general public. The goal of this paper is to get a fair picture of the wealth gap and its ill effects on American society. Before visualizing the financial gap, an exploration and descriptive analysis is carried out. By considering the data (gross annual income, taxable income, and taxes paid), which is available on the website of United States Census Bureau, we try to find out the actual spending capacity of the people in America. We visualize the financial gap on the basis of the spending capacity. With the help of this analysis we try to answer the following questions. Why is it important to have a fair idea of this gap? At what rate is the average wealth of the American population increasing? How does it affect the tax system? Insights generated from answering these questions will be used for further analysis.

View the e-poster or slides (PDF)

Manufacturers of any product from toys to medicine to automobiles must create items that are, above all else, safe to use. Not only is this essential to long-term brand value and corporate success, but it's also required by law. Although perfection is the goal, defects are bound to occur, especially in advanced products such as automobiles. Automobiles are the largest purchase most people make, next to a house. When something that costs tens of thousands of dollars runs into problems, you tend to remember. Recalls in part reflect growing pains after decades of consolidation in the auto industry. Many believe that recalls are the culmination of years of neglect by manufacturers and the agencies that regulate them. For several reasons, automakers are acting earlier and more often in issuing recalls. In the past 20 years, the number of voluntarily recalled vehicles has steadily grown. The automotive-recall landscape changed dramatically in 2000 with the passage of the federal TREAD Act. Before that, federal law required that automakers issue a recall only when a consumer reported a problem. TREAD requires that companies identify potential problems and promptly notify the NHTSA. This is largely due to stricter laws, heavier fines, and more cautious car makers. This study helps automobile manufacturers understand customers who are talking about defects in their cars and to be proactive in recalling the product at the right time before the Government acts.

Read the paper (PDF) | View the e-poster or slides (PDF)

Uber has changed the face of taxi ridership, making it more convenient and comfortable for riders. But, there are times when customers are dissatisfied because of a shortage of Uber vehicles, which ultimately leads to Uber surge pricing. It's a very difficult task to forecast the number of riders at different locations in a city at different points in time. This gets more complicated with changes in weather. In this paper, we attempt to estimate the number of trips per borough on a daily basis in New York City. We add an exogenous factor weather to this analysis to see how it impacts the changes in the number of trips. We fetched six months worth of data (approximately 9.7 million records) of Uber rides in New York City ranging from January 2015 to June 2015 from GitHub. We gathered weather data (about 3.5 million records) for New York City for the same period from the National Climatic Data Center. We analyzed Uber data and weather data together to estimate the change in the number of trips per borough due to changing weather conditions. We built a model to predict the number of trips per day for a one-week-ahead forecast for each borough of New York City. As part of a further analysis, we got the number of trips on a particular day for each borough. Using time series analysis, we forecast the number of trips that might be required in the near future (probably one week).

Read the paper (PDF) | View the e-poster or slides (PDF)

Sovereign risk rating and country risk rating are conceptually distinct in that the former captures the risk of a country defaulting on its commercial debt obligations using economic variables while the latter covers the downside of a country's business environment including political and social variables alongside economic variables. Through this paper we would like to understand the differences between these risk approaches in assessing a country's credit worthiness by statistically examining the predictive power of political and social variables in determining country risk. To do this, we wish to build two models, first model with economic variables as regressors (sovereign risk model) and the second model with economic, political and social variables as regressors (country risk model) to compare the predictive power of regressors and model performance metrics between both the models. This will be an OLS regression model with country risk rating obtained from S&P as the target variable. With a general assumption that economic variables are driven by political processes and social factors, we would like to see if the second model has better predictive power. The economic, political and social indicators data that will be used as independent variables in the model will be obtained from world bank open data and target variable (country risk rating) will be obtained from S&P country risk ratings data.

View the e-poster or slides (PDF)

Research frequently shows that exposure to sunlight contributes to non-melanoma skin cancer. But, it also shows that sunlight might protect you against multiple sclerosis and breast, ovarian, prostate, and colon cancer. In my study, I explored whether mortality from skin cancer, myocardial infarction, atrial fibrillation, and stroke is associated with exposure to sunlight. I used SAS^® 9.4 and RStudio to conduct the entire study. I collected mortality data including cause of death in Los Angeles from 2000 to 2003. In addition, I collected sunlight data for Los Angeles for the same period. There are three types of sunlight in my data global sunlight, diffuse sunlight, and direct sunlight. Data was collected at three different times morning, middle of day, and afternoon. I used two models the Poisson time series regression model and a logistic regression model to investigate the association. I considered a one-year and two-year lag of sunlight association with the types of diseases. I adjusted for age, sex, race, education, temperature, and day of week. Results show that stroke is statistically and significantly associated with a one-year lag of sunlight (p<0.001). Previous epidemiological studies have found that sunlight exposure can ameliorate osteoporosis in stroke patients, and my study provides the protective effects of sunlight on stroke patients.

View the e-poster or slides (PDF)

Understanding customer behavior profiles is of great value to companies. Customer behavior is influenced by a multitude of elements-some are capricious, presumably resulting from environmental, economic, and other factors, while others are more fundamentally aligned with value and belief systems. In this paper, we use unstructured textual cheque card data to model and estimate latent spending behavioral profiles of banking customers. These models give insight into unobserved spending habits and patterns. SAS^® Text Miner is used in an atypical manner to determine the buying segments of customers and the latent buying profile using a clustering approach. Businesses benefit in the way the behavioral spend model is used. The model can be used for market segmentation, where each cluster is seen as a target marketing segment, leads optimization, or product offering where products are specifically compiled to align to each customer's requirements. It can also be used to predict future spend or to align customer needs with business offerings, supported by signing customers onto loyalty programs. This unique method of determining the spend behavior of customers makes it ideal for companies driving retention and loyalty in their customers.

Read the paper (PDF) | View the e-poster or slides (PDF)

Rapid advances in technology have empowered musicians all across the globe to share their music easily, resulting in intensified competition in the music industry. For this reason, musicians and record labels need to be aware of factors that can influence the popularity of their songs. The focus of our study is to determine how themes, topics, and terms within song lyrics have changed over time and how these changes might have influenced the popularity of songs. Moreover, we plan to run time series analysis on the numeric attributes of Billboard Top 100 songs in order to determine the appropriate combination of relevant attributes that influences a song's popularity. The findings of our study can potentially benefit musicians and record labels in understanding the necessary lyrical construction, overall themes, and topics that might enable a song to reach the highest chart position on the Billboard Top 100. The Billboard Top 100 is an optimal source of data, as it is an objective measure of popularity. Our data has been collected from open sources. Our data set consists of all 334,784 Billboard Top 100 observations for the years 1955-2015, with metadata covering all 26,869 unique songs that have appeared on the chart for that period. Our expanding lyric data set currently contains 18,002 of those songs, which were used to conduct our analysis. SAS^® Enterprise Miner and SAS^® Sentiment Analysis Studio were the primary tools of our analysis.

View the e-poster or slides (PDF)

SAS^® Output Delivery System (ODS) Graphics started appearing in SAS^® 9.2. Collectively these new tools were referred to as 'ODS Graphics,' 'SG Graphics' and 'Statistical Graphics'. When first starting to use these tools, the traditional SAS/GRAPH^® software user might come upon some very significant challenges in learning the new way to do things. This is further complicated by the lack of simple demonstrations of capabilities. Most graphs in training materials and publications are rather complicated graphs that, while useful, are not good teaching examples for starting purposes. This paper contains many examples of very simple ways to get very simple things accomplished. Many different graphs are developed using only a few lines of code each, using data from the SASHELP data sets. The use of the SGPLOT, SGPANEL, and SGSCATTER procedures are shown. In addition, the paper addresses those situations in which the user must alternatively use a combination of the TEMPLATE and SGRENDER procedures to accomplish the task at hand. Most importantly, the use of the 'ODS Graphics Designer' as a teaching tool and a generator of sample graphs and code are covered. This tool makes use of the TEMPLATE and SGRENDER Procedures, generating Graphics Template Language (GTL) code. Users get extremely productive fast. The emphasis in this paper is the simplicity of the learning process. Users will be able to take the generated code and run it immediately on their personal machines.

Read the paper (PDF) | View the e-poster or slides (PDF)

In this paper, we introduce a SAS/IML^® program of Classification Accuracy and Classification Consistency (CA/CC) that provides useful resources to test analysts or psychometricians. Our program optimizes functions of SAS^® by offering the CA/CC statistics not only with dichotomous items, but also with polytomous items. Classification Decision (CD) is a method to categorize examinees into achievement groups based on cut scores (Quinn and Cheng, 2013). CD has been predominantly used in educational and vocational situations such as admissions, selection, placement, or certification. This method needs to be accurate because its use has been important to examinees' professional and academic futures. Classification Accuracy and Classification Consistency (CA/CC) statistics are indices representing the precision of CD, and they need to be reported in order to affirm the validity of the CD. Classification Accuracy is referred to as the degree to which the classification of observed scores matches with the classification of true scores, and Classification Consistency is defined as the degree to which examinees are classified in the same category when taking two parallel test forms (Lee, 2010). Under item response theory (IRT), there are two methods to calculate CA/CC: Rudner (2001) and Lee (2010) approaches. This research deals with these two approaches for CA/CC with the examinee level.

View the e-poster or slides (PDF)

Computer and video games are complex these days. Events in video games are in some cases recorded automatically in text files, creating a history or story of game play. There are countable items in these event records that can be used as data for statistics and other types of modeling. This E-Poster shows you how to statistically analyze text files for video game events using SAS^®. Two games are analyzed. EVE Online, a massive multi-user online role-playing spaceship game, is one. The other game is Ingress, a cell phone game that combines exercise with a GPS and real-world environments. In both examples, the techniques involve parsing large amounts of text data to examine recurring patterns in text that describe events in the game play.

View the e-poster or slides (PDF)

SAS^® has a very efficient and powerful way to get distances between an event and a customer. Using the tables and code located at http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode.html#street, you can load latitude and longitude to addresses that you have for your events and customers. Once you have the tables downloaded from SAS, and you have run the code to get them into SAS data sets, this paper helps guide you through the rest using PROC GEOCODE and the GEODIST function. This can help you determine to whom to market an event. And, you can see how far a client is from one of your facilities.

Read the paper (PDF) | View the e-poster or slides (PDF)

Discover how to document your SAS^® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how SAS system macro variables can provide valuable information embedded in your programs, logs, lists, catalogs, data sets and ODS output; how your programs can automatically update a processing log; and how to generate two different types of codebooks.

Read the paper (PDF) | View the e-poster or slides (PDF)

This paper illustrates proper applications of multidimensional item response theory (MIRT), which is available in SAS^® PROC IRT. MIRT combines item response theory (IRT) modeling and factor analysis when the instrument carries two or more latent traits. Although it might seem convenient to accomplish two tasks simultaneously by using one procedure, users should be cautious of misinterpretations. This illustration uses the 2012 Program for International Student Assessment (PISA) data set collected by Organisation for Economic Co-operation and Development (OECD). Because there are two known sub-domains in the PISA test (reading and math), PROC IRT was programmed to adopt a two-factor solution. In additional, the loading plot, dual plot, item difficulty/discrimination plot, and test information function plot in JMP^® were used to examine the psychometric properties of the PISA test. When reading and math items were analyzed in SAS MIRT, seven to 10 latent factors are suggested. At first glance, these results are puzzling because ideally all items should be loaded into two factors. However, when the psychometric attributes yielded from a two-parameter IRT analysis are examined, it is evident that both the reading and math test items are well written. It is concluded that even if factor indeterminacy is present, it is advisable to evaluate its psychometric soundness based on IRT because content validity can supersede construct validity.

Read the paper (PDF) | View the e-poster or slides (PDF)

Student growth percentile (SGP) is one of the most widely used score metrics for measuring a student's academic growth. Using longitudinal data, SGP describes a student's growth as the relative standing among students who had a similar level of academic achievement in previous years. Although several models for SGP estimation have been introduced, and some models have been implemented with R, no studies have yet described using SAS^®. As a result, this research describes various types of SGP models and demonstrates how practitioners can use SAS procedures to fit these models. Specifically, this study covers three types of statistical models for SGP: 1) quantile regression-based model 2) conditional cumulative density function-based model 3) multidimensional item response theory-based model. Each of the three models partly uses procedures in SAS, such as PROC QUANTREG, PROC LOGISTIC, PROC TRANSREG, PROC IRT, or PROC MCMC, for its computation. The program code is illustrated using a simulated longitudinal data set over two consecutive years, which is generated by SAS/IML^®. In addition, the interpretation of the estimation results and the advantages and disadvantages of implementing these three approaches in SAS are discussed.

View the e-poster or slides (PDF)

The DOCUMENT procedure is a little known procedure that can save you vast amounts of time and effort when managing the output of your SAS^® programming efforts. This procedure is deeply associated with the mechanism by which SAS controls output in the Output Delivery System (ODS). Have you ever wished you didn't have to modify and rerun the report-generating program every time there was some tweak in the desired report? PROC DOCUMENT enables you to store one version of the report as an ODS Document Object and then call it out in many different output forms, such as PDF, HTML, listing, RTF, and so on, without rerunning the code. Have you ever wished you could extract those pages of the output that apply to certain BY variables such as State, StudentName, or CarModel? With PROC DOCUMENT, you have where capabilities to extract these. Do you want to customize the table of contents that assorted SAS procedures produce when you make frames for the table of contents with HTML, or use the facilities available for PDF? PROC DOCUMENT enables you to get to the inner workings of ODS and manipulate them. This paper addresses PROC DOCUMENT from the viewpoint of end results, rather than provide a complete technical review of how to do the task at hand. The emphasis is on the benefits of using the procedure, not on detailed mechanics.

Read the paper (PDF) | View the e-poster or slides (PDF)

Having crossed the spectrum from an epidemiologist and researcher (where ad hoc is a way of life and where research is the main focus) to a SAS^® programmer (writing reusable code for automation and batch jobs, which require no manual interventions), I have learned a few things that I wish I had known as a researcher. These things would not only have helped me to be a better SAS programmer, but they also would have saved me time and effort as a researcher by enabling me to have well-organized, accurate code (that I didn't accidentally remove) and code that would work when I ran it again on another date. This poster presents five SAS tips that are common practice among SAS programmers. I provide researchers who use SAS with tips that are handy and useful, and I provide code (where applicable) that they can try out at home. Using the tips provided will make any SAS programmer smile when they are presented with your code (not guaranteed, but your results should not vary by using these tips).

View the e-poster or slides (PDF)

The mean-variance model might be the most famous model in the financial field. It can determine the optimal portfolio if you know every asset's expected return and its covariance matrix. The tangency portfolio is a type of optimal portfolio, which means that it has the maximum expected return (mean) and the minimial risk (variance) among all portfolios. This paper uses sample data to get the tangency portfolio using SAS/IML^® code.

Read the paper (PDF) | View the e-poster or slides (PDF)

When creating statistical models that include multiple covariates (for example, Cox proportional hazards models or multiple linear regression), it is important to address which variables are categorical and continuous for proper analysis and interpretation in SAS^®. Categorical variables, regardless of SAS data type, should be added in the MODEL statement with an additional CLASS statement. In larger models containing many continuous or categorical variables, it is easy to overlook variables that should be added to the CLASS statement. To solve this problem, we have created a macro that uses simple input from the model variables, with PROC CONTENTS and additional logic checks, to create the necessary CLASS statement and to run the desired model. With this macro, variables are evaluated on multiple conditions to see whether they should be considered class variables. Then, they are added automatically to the CLASS statement.

Read the paper (PDF) | View the e-poster or slides (PDF)

Because many SAS^® users either work for or own companies that house big data, the threat that malicious software poses becomes even more extreme. Malicious software, often abbreviated as malware, includes many different classifications, ways of infection, and methods of attack. This E-Poster highlights the types of malware, detection strategies, and removal methods. It provides guidelines to secure essential assets and prevent future malware breaches.

Read the paper (PDF) | View the e-poster or slides (PDF)

Throughout history, the phrase know thyself has been the aspiration of many. The trend of wearable technologies has certainly provided the opportunity to collect personal data. These technologies enable individuals to know thyself on a more sophisticated level. Specifically, wearable technologies that can track a patient's medical profile in a web-based environment, such as continuous blood glucose monitors, are saving lives. The main goal for diabetics is to replicate the functions of the pancreas in a manner that allows them to live a normal, functioning lifestyle. Many diabetics have access to a visual analytics website to track their blood glucose readings. However, they often are unreadable and overloaded with information. Analyzing these readings from the glucose monitor and insulin pump with SAS^®, diabetics can parse their own information into more simplified and readable graphs. This presentation demonstrates the ease in creating these visualizations. Not only is this beneficial for diabetics, but also for the doctors that prescribe the necessary basal and bolus levels of insulin for a patient s insulin pump.

View the e-poster or slides (PDF)

Linear regression, which is widely used, can be improved by the inclusion of the penalizing parameter. This helps reduce variance (at the cost of a slight increase in bias) and improves prediction accuracy and model interpretability. The regularization model is implemented on the sample data set, and recommendations for the practice are included.

View the e-poster or slides (PDF)

String externalization is the key to making your SAS^® applications speak multiple languages, even if you can't. Using the new features in SAS^® 9.3 for internationalization, your SAS applications can be written to adapt to whatever environment they are found in. String externalization is the process of identifying and separating translatable strings from your SAS program. This paper outlines the four steps of string externalization: create a Microsoft Excel spreadsheet for messages (optional), create SMD files, convert SMD files, and create the final SAS data set. Furthermore, it briefly shows you a real-world project on applying the concept. Using the Excel spreadsheet message text approach, professional translators can work more efficiently translating text in a friendlier and more comfortable environment. Subsequently, a programmer can also fully concentrate on developing and maintaining SAS code when your application is traveling to a new country.

View the e-poster or slides (PDF)

Prince Niccolo Machiavelli said things on the order of, The promise given was a necessity of the past: the word broken is a necessity of the present. His utilitarian philosophy can be summed up by the phrase, The ends justify the means. As a personality trait, Machiavelianism is characterized by the drive to pursue one's own goals at the cost of others. In 1970, Richard Christie and Florence L. Geis created the MACH-IV test to assign a MACH score to an individual, using 20 Likert-scaled questions. The purpose of this study was to build a regression model that can be used to predict the MACH score of an individual using fewer factors. Such a model could be useful in screening processes where personality is considered, such as in job screening, offender profiling, or online dating. The research was conducted on a data set from an online personality test similar to the MACH-IV test. It was hypothesized that a statistically significant model exists that can predict an average MACH score for individuals with similar factors. This hypothesis was accepted.

View the e-poster or slides (PDF)

Dynamic social networks can be used to monitor the constantly changing nature of interactions and relationships between people and groups. The size and complexity of modern dynamic networks can make this task extremely challenging. Using the combination of SAS/IML^®, SAS/QC^®, and R, we propose a fast approach to monitor dynamic social networks. A discrepancy score at edge level was developed to measure the unusualness of the observed social network. Then, multivariate and univariate change-point detection methods were applied on the aggregated discrepancy score to identify the edges and vertices that have experienced changes. Stochastic block model (SBM) networks were simulated to demonstrate this method using SAS/IML and R. PROC SHEWHART and PROC CUSUM in SAS/QC and PROC SGRENDER heat maps were applied on the aggregated discrepancy score to monitor the dynamic social network. The combination of SAS/IML, SAS/QC, and R make it an ideal tool to monitor dynamic social networks.

View the e-poster or slides (PDF)

The EXPAND procedure is very useful when handling time series data and is commonly used in fields such as finance or economics, but it can also be applied to medical encounter data within a health research setting. Medical encounter data consists of detailed information about healthcare services provided to a patient by a managed care entity and is a rich resource for epidemiologic research. Specific data items include, but are not limited to, dates of service, procedures performed, diagnoses, and costs associated with services provided. Drug prescription information is also available. Because epidemiologic studies generally focus on a particular health condition, a researcher using encounter data might wish to distinguish individuals with the health condition of interest by identifying encounters with a defining diagnosis and/or procedure. In this presentation, I provide two examples of how cases can be identified from a medical encounter database. The first uses a relatively simple case definition, and then I EXPAND the example to a more complex case definition.

View the e-poster or slides (PDF)

In item response theory (IRT), the distribution of examinees' abilities is needed to estimate item parameters. However, specifying the ability distribution is difficult, if not impossible, because examinees' abilities are latent variables. Therefore, IRT estimation programs typically assume that abilities follow a standard normal distribution. When estimating item parameters using two separate computer runs, one problem with this approach is that it causes item parameter estimates obtained from two groups that differ in ability level to be on different scales. There are several methods that can be used to place the item parameter estimates on a common scale, one of which is multi-group calibration. This method is also called concurrent calibration because all items are calibrated concurrently with a single computer run. There are two ways to implement multi-group calibration in SAS^®: 1) Using PROC IRT. 2) Writing an algorithm from scratch using SAS/IML^®. The purpose of this study is threefold. First, the accuracy of the item parameter estimates are evaluated using a simulation study. Second, the item parameter estimates are compared to those produced by an item calibration program flexMIRT. Finally, the advantages and disadvantages of using these two approaches to conduct multi-group calibration are discussed.

View the e-poster or slides (PDF)

When first learning SAS^®, programmers often see the proprietary DATA step as a foreign and nonstandard concept. The introduction of the SAS^® 9.4 DS2 language eases the transition for traditional programmers delving into SAS for the first time. Object Oriented Programming (OOP) has been an industry mainstay for many years, and the DS2 procedure provides an object-oriented environment for the DATA step. In this poster, we go through a business case to show how DS2 can be used to define a reusable package following object-oriented principles.

View the e-poster or slides (PDF)

Multicategory logit models extend the techniques of logistic regression to response variables with three or more categories. For ordinal response variables, a cumulative logit model assumes that the effect of an explanatory variable is identical for all modeled logits (known as the assumption of proportional odds). Past research supports the finding that as the sample size and number of predictors increase, it is unlikely that proportional odds can be assumed across all predictors. An emerging method to effectively model this relationship uses a partial proportional odds model, fit with unique parameter estimates at each level of the modeled relationship only for the predictors in which proportionality cannot be assumed. First used in SAS/STAT^® 12.1, PROC LOGISTIC in SAS^® 9.4 now extends this functionality for variable selection methods in a manner in which all equal and unequal slope parameters are available for effect selection. Previously, the statistician was required to assess predictor non-proportionality a priori through likelihood tests or subjectively through graphical diagnostics. Following a review of statistical methods and limitations of other commercially available software to model data exhibiting non-proportional odds, a public-use data set is used to examine the new functionality in PROC LOGISTIC using stepwise variable selection methods. Model diagnostics and the improvement in prediction compared to a general cumulative model are noted.

Read the paper (PDF) | Download the data file (ZIP) | View the e-poster or slides (PDF)

Organizations that create and store personally identifiable information (PII) are often required to de-identify sensitive data to protect an individual s privacy. There are multiple methods in SAS^® that can be used to de-identify PII depending on data types and encryption needs. The first method is to apply crosswalk mapping by linking a data set with PII to a secured data set that contains the PII and its corresponding surrogate. Then, the surrogate replaces the PII in the original data set. A second method is SAS encryption, which involves translating PII into an encrypted string using SAS functions. This could be a one-byte-to-one-byte swap or a one-byte-to-two-byte swap. The third method is in-database encryption, which encrypts the PII in a data warehouse, such as Oracle and Teradata, using SAS tools before any information is imported into SAS for users to see. This paper discusses the advantages and disadvantages of these three methods, provides sample SAS code, and describes the corresponding methods to decrypt the encrypted data.

Read the paper (PDF) | View the e-poster or slides (PDF)

Are secondary schools in the United States hiring enough qualified math teachers? In which regions is there a disparity of qualified teachers? Data from an extensive survey conducted by the National Center for Education Statistics (NCES) was used for predicting qualified secondary school teachers across public schools in the US. The three criteria examined to determine whether a teacher is qualified to teach a given subject are: 1) Whether the teacher has a degree in the subject he or she is teaching 2) Whether he or she has a teaching certification in the subject 3) Whether he or she has five years of experience in the subject. A qualified teacher is defined as one who has all three of the previous qualifications. The sample data included socioeconomic data at the county level, which was used as predictors for hiring a qualified teacher. Data such as the number of students on free or reduced lunch at the school was used to assign schools as high-needs or low-needs schools. Other socioeconomic factors included were the income and education levels of working adults within a given school district. Some of the results show that schools with higher-needs students (a school that has more than 40% of the students on some form of reduced lunch program) have less-qualified teachers. The resultant model is used to score other regions and is presented on a heat map of the US. SAS^® procedures such as PROC SURVEYFREQ and PROC SURVEYLOGISTIC are used.

View the e-poster or slides (PDF)

Research using electronic health records (EHR) is emerging, but questions remain about its completeness, due in part to physicians' time to enter data in all fields. This presentation demonstrates the use of SAS^® Enterprise Miner to predict completeness of clinical data using claims data as the standard 'source of truth' against which to compare it. A method for assessing and predicting the completeness of clinical data is presented using the tools and techniques from SAS Enterprise Miner. Some of the topics covered include: tips for preparing your sample data set for use in SAS Enterprise Miner; tips for preparing your sample data set for modeling, including effective use of the Input Data, Data Partition, Filter, and Replacement nodes; and building predictive models using Stat Explore, Decision Tree, Regression, and Model Compare nodes.

View the e-poster or slides (PDF)

Production forecasts that are based on data analytics are able to capture the character of the patterns that are created by past behavior of wells and reservoirs. Future trends are a reflection of past trends unless operating principles have changed. Therefore, the forecasts are more accurate than the monotonous, straight line that is provided by decline curve analysis (DCA). The patterns provide some distinct advantages: they provide a range instead of an absolute number, and the periods of high and low performance can be used for better planning. When used together with DCA, the current method of using data driven production forecasting can certainly enhance the value tremendously for the oil and gas industry, especially in times of volatility in the global oil and gas industry.

View the e-poster or slides (PDF)

Bayesian inference has become ubiquitous in applied science because of its flexibility in modeling data and advances in computation that allow special methods of simulation to obtain sound estimates when more mathematical approaches are intractable. However, when the sample size is small, the choice of a prior distribution becomes difficult. Computationally convenient choices for prior distributions can overstate prior beliefs and bias the estimates. We propose a simple form of prior distribution, a mixture of two uniform distributions, that is weakly informative, in that the prior distribution has a relatively large standard deviation. This choice leads to closed-form expressions for the posterior distribution if the observed data follow a normal, binomial, or Poisson distribution. The explicit formulas are easily encoded in SAS^®. For a small sample size of 10, we illustrate how to elicit the mixture prior and indicate that the resulting posterior distribution is insensitive to minor misspecification of input values. Weakly informative prior distributions suitable for small sample sizes are easy to specify and appear to provide robust inference.

View the e-poster or slides (PDF)

SAS^® Management Console has been a key tool to interact with SAS^® Metadata Server. But sometimes users need much more than what SAS Management Console can do. This paper contains a couple of SAS^® macros that can be used in SAS^® Enterprise Guide^® and PC SAS to read SAS metadata. These macros read users, roles, and groups registered in metadata. This paper explains how these macros can be executed in SAS Enterprise Guide and how to change these macros to meet other business requirements. There might be some tools available in the market that can be used to read SAS metadata, but this paper helps in achieving most of them within a SAS client like PC SAS and SAS Enterprise Guide, without requiring any additional plug-ins.

Read the paper (PDF) | View the e-poster or slides (PDF)

If you've got an iPhone, you might have noticed that the Health app is hard at work collecting data on every step you take. And, of course, the data scientist inside you is itching to analyze that data with SAS^®. This paper and an accompanying E-Poster show you how to get step data out of your iPhone Health app and into SAS. Once it's there, you can have at it with all things SAS. In this presentation, we show you how a (what else?) step plot can be used to visualize the 73,000+ steps the author took at SAS^® Global Forum 2016.

Read the paper (PDF) | View the e-poster or slides (PDF)

SAS^® In-Memory Analytics for Hadoop is an analytical programming environment that enables a user to use many components of an analytics project in a single environment, rather than switching between different applications. Users can easily prepare raw data for different types of analytics procedures. These techniques explore the data to enhance the information extractions. They can apply a large variety of statistical and machine learning techniques to the data to compare different analytical approaches. The model comparison capabilities let them quickly find the best model, which they can deploy and score in the Hadoop environment. All of these different components of the analytics project are supported in a distributed in-memory environment for lightning-fast processing. This paper highlights tips for working with the interaction between Hadoop data and for dealing with SAS^® LASR Analytic Server. It contains multiple scenarios with elementary but pragmatic approaches that enable SAS^® programmers to work efficiently within the SAS^® In-Memory Analytics environment.

Read the paper (PDF) | View the e-poster or slides (PDF)

Hospital Information Technologists are faced with a dilemma: how to get the many pharmacy databases, dynamic data sets, and software systems to communicate with each other and generate useful, automated, real-time output. SAS^® serves as a unifying tool for our hospital pharmacy. It brings together data from multiple sources, generates output in multiple formats, analyzes trends, and generates summary reports to meet workload, quality, and regulatory requirements. Data sets originate from multiple sources, including drug and device wholesalers, web-based drug information systems, dumb machine output, pharmacy drug-dispensing platforms, hospital administration systems, and others. SAS output includes CSV files that can be read by dispensing machines, report output for Pharmacy and Therapeutics committees, graphs to summarize year-to-year dispensing and quality trends, emails to customers with inventory and expiry date notifications, investigational drug information summaries for hospital staff, inventory trending with restock alerts, and quality assurance summary reports. For clinical trial support, additional output includes randomization codes, data collection forms, blinded enrollment summaries, study subject assignment lists, and others. For business operations, output includes invoices, shipping documents, and customer metrics. SAS brings our pharmacy information systems together and supports an efficient, cost-effective, flexible, and reliable workflow.

Read the paper (PDF) | View the e-poster or slides (PDF)

Self-driving cars are no longer a futuristic dream. In the recent past, Google has launched a prototype of the self-driving car, while Apple is also developing its own self-driving car. Companies like Tesla have just introduced an Auto Pilot version in their newer version of electric cars which have created quite a buzz in the car market. This technology is said to enable aging or disabled people to remain mobile, while also increasing overall traffic saftery. But many people are still skeptical about the idea of self-driving cars, and that's our area of interest. In this project, we plan to do sentiment analysis on thoughts voiced by people on the Internet about self-driving cars. We have obtained the data from http://www.crowdflower.com/data-for-everyone which contain these reviews about the self-driving cars. Our dataset contains 7,156 observations and 9 variables. We plan to do descriptive analysis of the reviews to identify key topics and then use supervised sentiment analysis. We also plan to track and report how the topics and the sentiments change over time.

View the e-poster or slides (PDF)

The University of Central Florida (UCF) Institutional Knowledge Management (IKM) office provides data analysis and reporting for all UCF divisions. These projects are logged and tracked through the Oracle PeopleSoft content management system (CMS). In the past, projects were monitored via a weekly query pulled using SAS^® Enterprise Guide^®. The output would be filtered and prioritized based on project importance and due dates. A project list would be sent to individual staff members to make updates in the CMS. As data requests were increasing, UCF IKM needed a tool to get a broad overview of the entire project list and more efficiently identify projects in need of immediate attention. A project management dashboard that all IKM staff members can access was created in SAS^® Visual Analytics. This dashboard is currently being used in weekly project management meetings and has eliminated the need to send weekly staff reports.

View the e-poster or slides (PDF)

Every sourcing and procurement department has limited resources to use for realizing productivity (cost savings). In practice, many organizations simply schedule yearly pricing negotiations with their main suppliers. They do not deviate from that approach unless there is a very large swing in the underlying commodity. Using cost data gleaned from previous quotes and SAS^® Enterprise Guide^®, we can put in place a program and methodology that move the practice from gut instinct to quantifiable and justifiable models that can easily be updated on a monthly basis. From these updated models, we can print a report of suppliers or categories that we should consider for cost downs, and suppliers or categories that we should work on to hold current pricing. By having all cost models, commodity data, and reporting functions within SAS Enterprise Guide, we are able to not only increase the precision and effectiveness of our negotiations, but also to vastly decrease the load of repetitive work that has been traditionally placed on supporting analysts. Now the analyst can execute the program, send the initial reports to the management team, and be leveraged for other projects and tasks. Moreover, the management team can have confidence in the analysis and the recommended plan of action.

View the e-poster or slides (PDF)

Survival analysis differs from other types of statistical analysis, including graphical summaries and regression modeling procedures, because data is almost always censored. The purpose of this project is to apply survival analysis techniques in SAS^® to practical survival data, aiming to understand the effects of gender and age on lung cancer patient survival at different cancer sites. Results show that both gender and age are significant variables in predicting lung cancer patient survival using the Cox proportional hazards model. Females have better survival than males when other variables in the model are fixed (p-value 0.0254). Moreover, the hazard of patients who are over 65 is 1.385 times that of patients who are under 65 (p-value 0.0145).

View the e-poster or slides (PDF)

SAS^® Macro Language can be used to enhance many report-generating processes. This presentation showcases the potential that macros have in populating predesigned RTF templates. If you have multiple report templates saved, SAS^® can choose and populate the correct ones based on macro programming and DATA _NULL_ using the TRANSTRN function. The autocall macro %TRIM, combined with a macro (for example, &TEMPLATE), can be attached to the output RTF template name. You can design and save as many templates as you like or need. When SAS assigns the macro variable TEMPLATE a value, the %TRIM(&TEMPLATE) statement in the output pathway correctly populates the appropriate template. This can make life easy if you create multiple different reports based on one data set. All that's required are stored templates on accessible pathways.

View the e-poster or slides (PDF)

SAS offers generation data set structure as part of the language feature that many users are familiar with. They use it in their organizations and manage it using keywords such as GENMAX and GENNUM. While SAS operates in a mainframe environment, users also have the ability to tap into the GDG (generation data group) feature available on z/OS, OS/390, OS/370, IBM 3070, or IBM 3090 machines. With cost-saving initiatives across businesses and due to some scaling factors, many organizations are in the process of migrating to mid-tier platforms to cheaper operating platforms such as UNIX and AIX. Because Linux is open source and is a cheaper alternative, several organizations have opted for the UNIX distribution of SAS that can work in UNIX and AIX environments. While this might be a viable alternative, there are certain nuances that the migration effort brings to the technical conversion teams. On UNIX, the concept of GDGs does not exist. While SAS offers generation data sets, they are good only for SAS data sets. If the business organization needs to house and operate with a GDG-like structure for text data sets, there isn't one available. While my organization had a similar initiative to migrate programs used to run the subprime mortgage analytic, incentive, and regulatory reporting, we identified the paucity of literature and research on this topic. Hence, I ended up developing the utility that addresses this need. This is a simple macro that helps us closely simulate a GDG/GDS.

Read the paper (PDF) | View the e-poster or slides (PDF)

This project described the method to classify movie genres based on synopses text data by two approaches: term frequency, and inverse document frequency (tf-idf) and C4.5 decision tree. Using the performance comparison of the classifiers by manipulating the different parameters, the strength and improvement of this method in substantial text analysis were also interpreted. As the result, these two approaches are powerful to identify movie genres.

Read the paper (PDF) | View the e-poster or slides (PDF)

Every visualization tells a story. The effectiveness of showing data through visualization becomes clear as these visualizations will tell stories about differences in US mortality using the National Longitudinal Mortality Study (NLMS) data, using the Public-Use Microdata Samples (PUMS) of 1.2 million cases and 122 thousand records of mortality. SAS^® Visual Analytics is a versatile and flexible tool that easily displays the simple effects of differences in mortality rates between age groups, genders, races, places of birth (native or foreign), education and income levels, and so on. Sophisticated analyses including logistical regression (with interactions), decision trees, and neural networks that are displayed in a clear, concise manner help describe more interesting relationships among variables that influence mortality. Some of the most compelling examples are: Males who live alone have a higher mortality rate than females. White men have higher rates of suicide than black men.

Read the paper (PDF) | View the e-poster or slides (PDF)

In the 2015-2016 season of the National Basketball Association (NBA), the Golden State Warriors achieved a record-breaking 73 regular-season wins. This accomplishment would not have been possible without their reigning Most Valuable Player (MVP) champion Stephen Curry and his historic shooting performance. Shattering his previous NBA record of 286 three-point shots made during the 2014-2015 regular season, he accrued an astounding 402 in the next season. With an increased emphasis on the advantages of the three-point shot and guard-heavy offenses in the NBA today, organizations are naturally eager to investigate player statistics related to shooting at long ranges, especially for the best of shooters. Furthermore, the addition of more advanced data-collecting entities such as SportVU creates an incredible opportunity for data analysis, moving beyond simply using aggregated box scores. This work uses quantile regression within SAS^® 9.4 to explore the relationships between the three-point shot and other relevant advanced statistics, including some SportVU player-tracking data, for the top percentile of three-point shooters from the 2015-2016 NBA regular season.

View the e-poster or slides (PDF)

You might encounter people who used SAS^® long ago (perhaps in university), or people who had a very limited use of SAS in a job. Some of these people with limited knowledge and experience think that SAS is just a statistics package or just a GUI. Those that think of it as a GUI usually reference SAS^® Enterprise Guide^® or, if it was a really long time ago, SAS/AF^® or SAS/FSP^®. The reality is that the modern SAS system is a very large and complex ecosystem, with hundreds of software products and diversified tools for programmers and users. This poster provides diagrams and tables that illustrate the complexity of the SAS system from the perspective of a programmer. Diagrams and illustrations include the functional scope and operating systems in the ecosystem; different environments that program code can run in; cross-environment interactions and related tools; SAS^® Grid Computing: parallel processing; how SAS can run with files in memory (the legacy SAFILE statement and big data and Hadoop); and how some code can run in-database. We end with a tabulation of the many programming languages and SQL dialects that are directly or indirectly supported within SAS. This poster should enlighten those who think that SAS is an old, dated statistics package or just a GUI.

View the e-poster or slides (PDF)

Using SAS^® to query relational databases can be challenging, even for seasoned SAS programmers. SAS/ACCESS^® software makes it easy to directly access data on nearly any platform, but there is a lot of under-the-hood functionality that takes time to learn. Here are tips that will get you on your way fast, including understanding and mastering SQL pass-through; efficiently bulk-loading data from SAS into other databases; tuning your SQL queries; and when to use native database versus SAS functionality.

View the e-poster or slides (PDF)

Public water supplies contain disease-causing microorganisms in the water or distribution ducts. To kill off these pathogens, a disinfectant, such as chlorine, is added to the water. Chlorine is the most widely used disinfectant in all US water treatment facilities. Chlorine is known to be one of the most powerful disinfectants to restrict harmful pathogens from reaching the consumer. In the interest of obtaining a better understanding of what variables affect the levels of chlorine in the water, this presentation analyzed a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples were collected and their chlorine level, temperature, and pH were recorded. A linear regression analysis was performed on the data collected with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level were the independent variables collected from each water sample. All data collected was analyzed using various SAS^® procedures. Partial residual plots were used to determine possible relationships between the chlorine level and the independent variables. A stepwise selection was used to eliminate possible insignificant predictors. From there, several possible models for the data were selected. F-tests were conducted to determine which of the models appeared to be the most useful.

View the e-poster or slides (PDF)

Knowing which SAS^® products are being used in your organization, by whom, and how often helps you decide whether you have the right mix and quantity licensed. These questions are not easy to answer. We present an innovative technique using three SAS utilities to answer these questions. This paper includes example code written for Linux that can easily be modified for Windows and other operating systems.

Read the paper (PDF) | View the e-poster or slides (PDF)

One of the research goals in public health is to estimate the burden of diseases on the US population. We describe burden of disease by analyzing the statistical association of various diseases with hospitalizations, emergency department (ED) visits, ambulatory/outpatient (doctors' offices) visits, and deaths. In this short paper, we discuss the use of large, nationally representative databases, such as those offered by the National Center for Health Statistics (NCHS) or the Agency for Healthcare Research and Quality (AHRQ), to produce reliable estimates of diseases for studies. In this example, we use SAS^® and SUDAAN to analyze the Nationwide Emergency Department Sample (NEDS), offered by AHRQ, to estimate ED visits for hand, foot, and mouth disease (HFMD) in children less than five years old.

Read the paper (PDF) | View the e-poster or slides (PDF)

Chemical incidents involving irritant chemicals such as chlorine pose a significant threat to life and require rapid assessment. Data from the Validating Triage for Chemical Mass Casualty Incidents A First Step R01 grant was used to determine the most predictive signs and symptoms (S/S) for a chlorine mass casualty incident. SAS^® 9.4 was used to estimate sensitivity, specificity, positive and negative predictive values, and other statistics of irritant gas syndrome agent S/S for two exiting systems designed to assist emergency responders in hazardous material incidents (Wireless Information System for Emergency Responders (WISER) and CHEMM Intelligent Syndrome Tool (CHEMM-IST)). The results for WISER showed the sensitivity was .72 to 1.0; specificity .25 to .47; and the positive predictive value and negative predictive value were .04 to .87 and .33 to 1.0, respectively. The results for CHEMM-IST showed the sensitivity was .84 to .97; specificity .29 to .45; and the positive predictive value and negative predictive value were .18 to .42 and .86 to .97, respectively.

Read the paper (PDF) | View the e-poster or slides (PDF)

It is often necessary to assess multi-rater agreement for multiple-observation categories in case-controlled studies. The Kappa statistic is one of the most common agreement measures for categorical data. The purpose of this paper is to show an approach for using SAS^® 9.4 procedures and the SAS^® Macro Language to estimate Kappa with 95% CI for pairs of nurses that used two different triage systems during a computer-simulated chemical mass casualty incident (MCI). Data from the Validating Triage for Chemical Mass Casualty Incidents A First Step R01 grant was used to assess the performance of a typical hospital triage system called the Emergency Severity Index (ESI), compared with an Irritant Gas Syndrome Agent (IGSA) triage algorithm being developed from this grant, to quickly prioritize the treatment of victims of IGSA incidents. Six different pairs of nurses used ESI triage, and seven pairs of nurses used the IGSA triage prototype to assess 25 patients exposed to an IGSA and 25 patients not exposed. Of the 13 pairs of nurses in this study, two pairs were randomly selected to illustrate the use of the SAS Macro Language for this paper. If the data was not square for two nurses, a square-form table for observers using pseudo-observations was created. A weight of 1 for real observations and a weight of .0000000001 for pseudo-observations were assigned. Several macros were used to reduce programming. In this paper, we show only the results of one pair of nurses for ESI.

Read the paper (PDF) | View the e-poster or slides (PDF)

It has become a need-it-now world, and many managers and decision-makers need their reports and information quicker than ever before to compete. As SAS^® developers, we need to acknowledge this fact and write code that gets us the results we need in seconds or minutes, rather than in hours. SAS is a great tool for extracting, transferring, and loading data, but as with any tool, it is most efficient when used in the most appropriate way. Using the SQL pass-through techniques presented in this paper can reduce run time by up to 90% by passing the processing to the database instead of moving the data back to SAS to be consumed. You can reap these benefits with only a minor increase in coding difficulty.

Read the paper (PDF) | View the e-poster or slides (PDF)

Have you ever been working on a task and wondered whether there might be a SAS^® function that could save you some time? Let alone, one that might be able to do the work for you? Data review and validation tasks can be time-consuming efforts. Any gain in efficiency is highly beneficial, especially if you can achieve a standard level where the data itself can drive parts of the process. The ANY and NOT functions can help alleviate some of the manual work in many tasks such as data review of variable values, data compliance, data formats, and derivation or validation of a variable's data type. The list goes on. In this poster, we cover the functions and their details and use them in an example of handling date and time data and mapping it to ISO 8601 date and time formats.

Read the paper (PDF) | View the e-poster or slides (PDF)

The SAS RAKING macro, introduced in 2000, has been implemented by countless survey researchers worldwide. The authors receive messages from users who tirelessly rake survey data using all three generations of the macro. In this poster, we present the fourth generation of the macro, cleaning up remnants from the previous versions, and resolving user-reported confusion. Most important, we introduce a few helpful enhancements including: 1) An explicit indicator for trimming (or not trimming) the weight that substantially saves run time when no trimming is needed. 2) Two methods of weight trimming, AND and OR, that enable users to overcome a stubborn non-convergence. When AND is indicated, weight trimming occurs only if both (individual and global) high weight cap values are true. Conversely, weight increase occurs only if both low weight cap values are true. When OR is indicated, weight trimming occurs if either of the two (individual or global) high weight cap values is true. Conversely, weight increase occurs if either of the two low weight cap values is true. 3) Summary statistics related to the number of cases with trimmed or increased weights have been expanded. 4) We introduce parameters that enable users to use different criteria of convergence for different raking marginal variables. We anticipate that these innovations will be enthusiastically received and implemented by the survey research community.