SAS Global Forum 2017 Proceedings

A D E F G H I M N R S T U

A

Session SAS0478-2017:

Advanced Hierarchical Modeling with the MCMC Procedure

Hierarchical models, also known as random-effects models, are widely used for data that consist of collections of units and are hierarchically structured. Bayesian methods offer flexibility in modeling assumptions that enable you to develop models that capture the complex nature of real-world data. These flexible modeling techniques include choice of likelihood functions or prior distributions, regression structure, multiple levels of observational units, and so on. This paper shows how you can fit these complex, multilevel hierarchical models by using the MCMC procedure in SAS/STAT^® software. PROC MCMC easily handles models that go beyond the single-level random-effects model, which typically assumes the normal distribution for the random effects and estimates regression coefficients. This paper shows how you can use PROC MCMC to fit hierarchical models that have varying degrees of complexity, from frequently encountered conditional independent models to more involved cases of modeling intricate interdependence. Examples include multilevel models for single and multiple outcomes, nested and non-nested models, autoregressive models, and Cox regression models with frailty. Also discussed are repeated measurement models, latent class models, spatial models, and models with nonnormal random-effects prior distributions.

Read the paper (PDF)

Fang Chen, SAS

Maura Stokes, SAS

Session 1161-2017:

An Analysis of the Repetitiveness of Lyrics in Predicting a Song's Popularity

To determine whether there is a correlation between the repetitiveness of a song s lyrics and its popularity, the top 10 songs from the Billboard Hot 100 songs chart from 2006 to 2015 were collected. Song lyrics were assessed to determine the count of the top 10 words used. Word counts were used to predict the number of weeks the song was on the chart. The prediction model was analyzed to determine the quality of the model and whether word count was a significant predictor of a song s popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were performed to see whether the average word count has been changing over the years. All analysis was completed in SAS^® using various procedures.

View the e-poster or slides (PDF)

Drew Doyle, University of Central Florida

Session SAS0752-2017:

Analytics Using SAS^® Customer Intelligence: Multivariate Testing Processes for Digital Marketing Campaigns

This presentation illustrates new technology that has been added to the SAS^® Customer Intelligence 360 analytics testing suite for digital campaign marketing. SAS Customer Intelligence 360 now has a multivariate testing tool (MVT). In digital marketing, MVT has become an increasingly popular process by which multiple components of a campaign can be tested in a live environment with the goal of finding an optimal mix, which drives a defined response metric. In simple terms, MVT is equivalent to running numerous A/B tests performed simultaneously. In theory, MVT can test the effectiveness of limitless combinations of factors. The number of factor-level combinations determines the test duration and the number of samples required to make statistically sound predictions for all permutations of a full factorial design. SAS^® applies experimental design analytics, driven by an interactive process that considers constraints and control cell definition, to guide the user toward an optimal reduced design of the test that can still adequately predict all factor-level combinations, given available resources. The results of the analysis enable the marketer to compare responses for both observed factor-level combinations to the predicted responses to the untested combinations.

Read the paper (PDF)

Thomas Lehman, SAS

Session 1031-2017:

Analyzing the Effectiveness of COPD Drugs Through Statistical Tests and Sentiment Analysis

Chronic obstructive pulmonary disease (COPD) is the third leading cause of death in the US. An estimated 24 million people suffer from COPD, and the medical cost associated with it stands at a whopping $36 billion. Besides the emotional and physical impact, a patient with COPD has to undergo severe economic burden to pay for the medication. Hospitals are subjected to heavy penalties for high re-admissions. Identifying the best medicine combinations to treat COPD benefits patients and hospitals. This paper deals with analyzing the effectiveness of three popular drugs prescribed for COPD patients in terms of mortality rates and re-admission within 30 days of discharge. The data from Cerner Health Facts consists of over 1 million real-world, anonymized patient records collected in a real-world health environment. Base SAS^® is used to perform statistical analysis and data processing; re-admission of patients is analyzed using a lag function. The preliminary results show a re-admission rate of 5.96% and a mortality rate of 3.3% among all patients. The odds ratios computed using logistic regression show an increased mortality rate 2.4 times more for patients using Symbicort compared to Spiriva and Advair. This paper also uses text mining of social media, drug portals, and blogs to gauge the sentiments of patients using these drugs. The results obtained through sentiment analysis are then compared with the statistical analysis to determine the effectiveness of drugs prescribed to the COPD patients.

Read the paper (PDF)

Indra Kiran Chowdavarapu, Oklahoma State University

Dursun Delen, Oklahoma State University

Vivek Manikandan Damodaran, Oklahoma State University

Session 0873-2017:

Auto Telematics: Deviations Drive Success

The use of telematics data within the insurance industry is becoming prevalent as insurers use this data to give discounts, categorize drivers, and provide feedback to improve customers' driving. The data captured through in-vehicle or mobile devices includes acceleration, braking, speed, mileage, and many other events. Data elements are analyzed to determine high-risk events such as rapid acceleration, hard braking, quick turning, and so on. The time between these successive high-risk events is a function of the mileage driven and time in the telematics program. Our discussion highlights how we treated these high-risk events as recurrent events and analyzed them using the RELIABILITY procedure within SAS/QC^® software. The RELIABILITY procedure is used to determine a nonparametric mean cumulative function (MCF) of high-risk events. We illustrate the use of the MCF for identifying and categorizing average driver behavior versus individual driver behavior. We also discuss the use of the MCF to evaluate how a loss event or driver feedback can affect future driving behavior.

Read the paper (PDF)

Kelsey Osterloo, State Farm Insurance Company

Deovrat Kakde, SAS

D

Session 0957-2017:

Does Factor Indeterminacy Matter in Multidimensional Item Response Theory?

This paper illustrates proper applications of multidimensional item response theory (MIRT), which is available in SAS^® PROC IRT. MIRT combines item response theory (IRT) modeling and factor analysis when the instrument carries two or more latent traits. Although it might seem convenient to accomplish two tasks simultaneously by using one procedure, users should be cautious of misinterpretations. This illustration uses the 2012 Program for International Student Assessment (PISA) data set collected by Organisation for Economic Co-operation and Development (OECD). Because there are two known sub-domains in the PISA test (reading and math), PROC IRT was programmed to adopt a two-factor solution. In additional, the loading plot, dual plot, item difficulty/discrimination plot, and test information function plot in JMP^® were used to examine the psychometric properties of the PISA test. When reading and math items were analyzed in SAS MIRT, seven to 10 latent factors are suggested. At first glance, these results are puzzling because ideally all items should be loaded into two factors. However, when the psychometric attributes yielded from a two-parameter IRT analysis are examined, it is evident that both the reading and math test items are well written. It is concluded that even if factor indeterminacy is present, it is advisable to evaluate its psychometric soundness based on IRT because content validity can supersede construct validity.

Read the paper (PDF) | View the e-poster or slides (PDF)

Chong Ho Yu, Azusa Pacific University

E

Session SAS0462-2017:

Evaluating Predictive Accuracy of Survival Models with PROC PHREG

Model validation is an important step in the model building process because it provides opportunities to assess the reliability of models before their deployment. Predictive accuracy measures the ability of the models to predict future risks, and significant developments have been made in recent years in the evaluation of survival models. SAS/STAT^® 14.2 includes updates to the PHREG procedure with a variety of techniques to calculate overall concordance statistics and time-dependent receiver operator characteristic (ROC) curves for right-censored data. This paper describes how to use these criteria to validate and compare fitted survival models and presents examples to illustrate these applications.

Read the paper (PDF)

Changbin Guo, SAS

Ying So, SAS

Woosung Jang, SAS

F

Session 1108-2017:

Fitting a Cumulative Logistic Regression

Cumulative logistic regression models are used to predict an ordinal response. They have the assumption of proportional odds. Proportional odds means that the coefficients for each predictor category must be consistent or have parallel slopes across all levels of the response. This paper uses a sample data set to demonstrate how to test the proportional odds assumption. It shows how to use the UNEQUALSLOPES option when the assumption is violated. A cumulative logistic regression model is built, and then the performance of the model on a test set is compared to the performance of a generalized multinomial model. This shows the utility and necessity of the UNEQUALSLOPES option when building a cumulative logistic regression model. The procedures shown are produced using SAS^® Enterprise Guide^® 7.1.

Read the paper (PDF)

Shana Kelly, Spectrum Health

Session 0202-2017:

Fitting a Flexible Model for Longitudinal Count Data Using the NLMIXED Procedure

Longitudinal count data arise when a subject's outcomes are measured repeatedly over time. Repeated measures count data have an inherent within subject correlation that is commonly modeled with random effects in the standard Poisson regression. A Poisson regression model with random effects is easily fit in SAS^® using existing options in the NLMIXED procedure. This model allows for overdispersion via the nature of the repeated measures; however, departures from equidispersion can also exist due to the underlying count process mechanism. We present an extension of the cross-sectional COM-Poisson (CMP) regression model established by Sellers and Shmueli (2010) (a generalized regression model for count data in light of inherent data dispersion) to incorporate random effects for analysis of longitudinal count data. We detail how to fit the CMP longitudinal model via a user-defined log-likelihood function in PROC NLMIXED. We demonstrate the model flexibility of the CMP longitudinal model via simulated and real data examples.

Read the paper (PDF)

Darcy Morris, U.S. Census Bureau

Session SAS0525-2017:

Five Things You Should Know about Quantile Regression

The increasing complexity of data in research and business analytics requires versatile, robust, and scalable methods of building explanatory and predictive statistical models. Quantile regression meets these requirements by fitting conditional quantiles of the response with a general linear model that assumes no parametric form for the conditional distribution of the response; it gives you information that you would not obtain directly from standard regression methods. Quantile regression yields valuable insights in applications such as risk management, where answers to important questions lie in modeling the tails of the conditional distribution. Furthermore, quantile regression is capable of modeling the entire conditional distribution; this is essential for applications such as ranking the performance of students on standardized exams. This expository paper explains the concepts and benefits of quantile regression, and it introduces you to the appropriate procedures in SAS/STAT^® software.

Read the paper (PDF)

Robert Rodriguez, SAS

Yonggang Yao, SAS

G

Session 1128-2017:

Geospatial Analysis: Linear, Nonlinear, or Both?

An important component of insurance pricing is the insured location and the associated riskiness of that location. Recently, we have experienced a large increase in the availability of external risk classification variables and associated risk factors by geospatial location. As additional geospatial data becomes available, it is prudent for insurers to take advantage of the new information to better match price to risk. Generalized additive models using penalized likelihood (GAMPL) have been explored as a way to incorporate new location-based information. This type of model can leverage the new geospatial information and incorporate it with traditional insurance rating variables in a regression-based model for rating. In our method, we propose a local regression model in conjunction with our GAMPL model. Our discussion demonstrates the use of the LOESS procedure as well as the GAMPL procedure in a combined solution. Both procedures are in SAS/STAT^® software. We discuss in detail how we built a local regression model and used the predictions from this model as an offset into a generalized additive model. We compare the results of the combined approach to results of each model individually.

Read the paper (PDF)

Kelsey Osterloo, State Farm Insurance Company

Angela Wu, State Farm Insurance Company

Session 0890-2017:

Getting Classy: A SAS^® Macro for CLASS Statement Automation

When creating statistical models that include multiple covariates (for example, Cox proportional hazards models or multiple linear regression), it is important to address which variables are categorical and continuous for proper analysis and interpretation in SAS^®. Categorical variables, regardless of SAS data type, should be added in the MODEL statement with an additional CLASS statement. In larger models containing many continuous or categorical variables, it is easy to overlook variables that should be added to the CLASS statement. To solve this problem, we have created a macro that uses simple input from the model variables, with PROC CONTENTS and additional logic checks, to create the necessary CLASS statement and to run the desired model. With this macro, variables are evaluated on multiple conditions to see whether they should be considered class variables. Then, they are added automatically to the CLASS statement.

Read the paper (PDF) | View the e-poster or slides (PDF)

Erica Goodrich, Brigham and Women's Hospital

Daniel Sturgeon, Brigham and Women's Hospital

Kathryn Schurr, Quest Diagnostics

Session 1529-2017:

Getting Started with Bayesian Analytics

The presentation will give a brief introduction to Bayesian Analysis within SAS. Participants will learn the difference between Bayesian and Classical Statistics and be introduced to PROC MCMC.

Danny Modlin, SAS

Session 1527-2017:

Getting Started with Multilevel Modeling

In this presentation you will learn the basics of working with nested data, such as students within classes, customers within households, or patients within clinics through the use of multilevel models. Multilevel models can accommodate correlation among nested units through random intercepts and slopes, and generalize easily to 2, 3, or more levels of nesting. These models represent a statistically efficient and powerful way to test your key hypotheses while accounting for the hierarchical nesting of the design. The GLIMMIX procedure is used to demonstrate analyses in SAS.

Catherine Truxillo, SAS

H

Session SAS2009-2017:

Hands-On Workshop: Statistical Analysis using SAS^® University Edition

This workshop provides hands-on experience performing statistical analysis with the Statistics tasks in SAS Studio. Workshop participants will learn to perform statistical analyses using tasks, evaluate which tasks are ideal for different kinds of analyses, edit the generated code, and customize a task.

Danny Modlin, SAS

I

Session SAS0722-2017:

Investigating Big-Data Crime Scenes

Statistical analysis is like detective work, and a data set is like the crime scene. The data set contains unorganized clues and patterns that can, with proper analysis, ultimately lead to meaningful conclusions. Using SAS^® tools, a statistical analyst (like any good crime scene investigator) performs a preliminary analysis of the data set through visualization and descriptive statistics. Based on the preliminary analysis, followed by a detailed analysis, both the crime scene investigator (CSI) and the statistical analyst (SA) can use scientific or analytical tools to answer the key questions: What happened? What were the causes and effects? Why did this happen? Will it happen again? Applying the CSI analogy, this paper presents an example case study using a two-step process to investigate a big-data crime scene. Part I shows the general procedures that are used to identify clues and patterns and to obtain preliminary insights from those clues. Part II narrows the focus on the specific statistical analyses that provide answers to different questions.

Read the paper (PDF)

Theresa Ngo, SAS

M

Session 1155-2017:

Meta-Analysis of Human Trafficking in the United States

Meta-analysis is a method for combining multiple independent studies on the same subject or question, producing a single large study with increased accuracy and enhanced ability to detect overall trends and smaller effects. This is done by treating the results of each study as a single observation and performing analysis on the set, while controlling for differences between individual studies. These differences can be treated as either fixed or random effects, depending on context. This paper demonstrates the process and techniques used in meta-analysis using human trafficking studies. This problem has seen increasing interest in the past few years, and there are now a number of localized studies for one state or a metropolitan area. This meta-analysis combines these to begin development of a comprehensive analytic understanding of human trafficking across the United States. Both fixed and random effects are described. All elements of this analysis were performed using SAS^® University Edition.

Read the paper (PDF)

David Corliss, Peace-Work

Heather Hill, Peace-Work

Session 1027-2017:

Monitoring Dynamic Social Networks Using SAS/IML^®, SAS/QC^®, and R

Dynamic social networks can be used to monitor the constantly changing nature of interactions and relationships between people and groups. The size and complexity of modern dynamic networks can make this task extremely challenging. Using the combination of SAS/IML^®, SAS/QC^®, and R, we propose a fast approach to monitor dynamic social networks. A discrepancy score at edge level was developed to measure the unusualness of the observed social network. Then, multivariate and univariate change-point detection methods were applied on the aggregated discrepancy score to identify the edges and vertices that have experienced changes. Stochastic block model (SBM) networks were simulated to demonstrate this method using SAS/IML and R. PROC SHEWHART and PROC CUSUM in SAS/QC and PROC SGRENDER heat maps were applied on the aggregated discrepancy score to monitor the dynamic social network. The combination of SAS/IML, SAS/QC, and R make it an ideal tool to monitor dynamic social networks.

View the e-poster or slides (PDF)

Huan Li, The University of Alabama

Michael Porter, The University of Alabama

Session 0764-2017:

Multi-Group Calibration in SAS^®: The IRT Procedure and SAS/IML^®

In item response theory (IRT), the distribution of examinees' abilities is needed to estimate item parameters. However, specifying the ability distribution is difficult, if not impossible, because examinees' abilities are latent variables. Therefore, IRT estimation programs typically assume that abilities follow a standard normal distribution. When estimating item parameters using two separate computer runs, one problem with this approach is that it causes item parameter estimates obtained from two groups that differ in ability level to be on different scales. There are several methods that can be used to place the item parameter estimates on a common scale, one of which is multi-group calibration. This method is also called concurrent calibration because all items are calibrated concurrently with a single computer run. There are two ways to implement multi-group calibration in SAS^®: 1) Using PROC IRT. 2) Writing an algorithm from scratch using SAS/IML^®. The purpose of this study is threefold. First, the accuracy of the item parameter estimates are evaluated using a simulation study. Second, the item parameter estimates are compared to those produced by an item calibration program flexMIRT. Finally, the advantages and disadvantages of using these two approaches to conduct multi-group calibration are discussed.

View the e-poster or slides (PDF)

Kyung Yong Kim, University of Iowa

Seohee Park, University of Iowa

Jinah Choi, University of Iowa

Hongwook Seo, ACT

Session 1404-2017:

Multicollinearity: What Is It, Why Should We Care, and How Can It Be Controlled?

Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, and tells how to detect multicollinearity and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, this paper explores the proposed techniques by using the Behavioral Risk Factor Surveillance System data set. This paper is intended for any level of SAS^® user. This paper is also written to an audience with a background in behavioral science or statistics.

Read the paper (PDF)

Deanna Schreiber-Gregory, National University

N

Session 1470-2017:

N-Stage Machine Learning Analysis with the LUA Procedure Helps Solve Big Data Analysis Problems

Data has to be a moderate size when you estimate parameters with machine learning. For data with huge numbers of records like healthcare big data (for example, receipt data) or super multi-dimensional data like genome big data, it is important to follow a procedure in which data is cleaned first and then selection of data or variables for modeling is performed. Big data often consists of macroscopic and microscopic groups. With these groups, it is possible to increase the accuracy of estimation by following the above procedure in which data is cleaned from a macro perspective and the selection of data or variables for modeling is performed from a micro perspective. This kind of step-wise procedure can be expected to help reduce bias. We also propose a new analysis algorithm with N-stage machine learning. For simplicity, we assume N =2. Note that different machine learning approaches should be applied; that is, a random forest method is used at the first stage for data cleaning, and an elastic net method is used for the selection of data or variables. For programming N-stage machine learning, we use the LUA procedure that is not only efficient, but also enables an easily readable iteration algorithm to be developed. Note that we use well-known machine learning methods that are implementable with SAS^® 9.4, SAS^® In-Memory Statistics, and so on.

Read the paper (PDF)

Ryo Kiguchi, Shionogi & Co., LTD

Eri Sakai, Shionogi & Co., LTD

Yoshitake Kitanishi, Shionogi & Co., LTD

Akio Tsuji, Shionogi & Co., LTD

R

Session 0242-2017:

Random Forests with Approximate Bayesian Model Averaging

A random forest is an ensemble of decision trees that often produce more accurate results than a single decision tree. The predictions of the individual trees in the forest are averaged to produce a final prediction. The question now arises whether a better or more accurate final prediction cannot be obtained by a more intelligent use of the trees in the forest. In particular, in the way random forests are currently defined, every tree contributes the same fraction to the final result (for example, if there are 50 trees, each tree contributes 1/50th to the final result). This ignores model uncertainty as less accurate trees are treated exactly like more accurate trees. Replacing averaging with Bayesian Model Averaging will give better trees the opportunity to contribute more to the final result, which might lead to more accurate predictions. However, there are several complications to this approach that have to be resolved, such as the computation of an SBC value for a decision tree. Two novel approaches to solving this problem are presented and the results compared to that obtained with the standard random forest approach.

Read the paper (PDF)

Tiny Du Toit, North-West University

Andre De Waal, SAS

S

Session 1005-2017:

SAS^® Macros for Computing the Mediated Effect in the Pretest-Posttest Control Group Design

Mediation analysis is a statistical technique for investigating the extent to which a mediating variable transmits the relation of an independent variable to a dependent variable. Because it is useful in many fields, there have been rapid developments in statistical mediation methods. The most cutting-edge statistical mediation analysis focuses on the causal interpretation of mediated effect estimates. Cause-and-effect inferences are particularly challenging in mediation analysis because of the difficulty of randomizing subjects to levels of the mediator (MacKinnon, 2008). The focus of this paper is how incorporating longitudinal measures of the mediating and outcome variables aides in the causal interpretation of mediated effects. This paper provides useful SAS^® tools for designing adequately powered studies to detect the mediated effect. Three SAS macros were developed using the powerful but easy-to-use REG, CALIS, and SURVEYSELECT procedures to do the following: (1) implement popular statistical models for estimating the mediated effect in the pretest-posttest control group design; (2) conduct a prospective power analysis for determining the required sample size for detecting the mediated effect; and (3) conduct a retrospective power analysis for studies that have already been conducted and a required sample to detect an observed effect is desired. We demonstrate the use of these three macros with an example.

Read the paper (PDF)

David MacKinnon, Arizona State University

Session SAS0521-2017:

Step Up Your Statistical Practice with Today's SAS/STAT^® Software

Has the rapid pace of SAS/STAT^® releases left you unaware of powerful enhancements that could make a difference in your work? Are you still using PROC REG rather than PROC GLMSELECT to build regression models? Do you understand how the GENMOD procedure compares with the newer GEE and HPGENSELECT procedures? Have you grasped the distinction between PROC PHREG and PROC ICPHREG? This paper will increase your awareness of modern alternatives to well-established tools in SAS/STAT by using succinct, high-level comparisons rather than detailed descriptions to explain the relative benefits of procedures and methods. The paper focuses on alternatives in the areas of regression modeling, mixed models, generalized linear models, and survival analysis. When you see the advantages of these newer tools, you will want to put them into practice. This paper points you to helpful resources for getting started.

Read the paper (PDF)

Robert Rodriguez, SAS

Phil Gibbs, SAS

T

Session SAS0427-2017:

Telling the Story of Your Process with Graphical Enhancements of Control Charts

Have you ever used a control chart to assess the variation in a process? Did you wonder how you could modify the chart to tell a more complete story about the process? This paper explains how you can use the SHEWHART procedure in SAS/QC^® software to make the following enhancements: display multiple sets of control limits that visualize the evolution of the process, visualize stratified variation, explore within-subgroup variation with box-and-whisker plots, and add information that improves the interpretability of the chart. The paper begins by reviewing the basics of control charts and then illustrates the enhancements with examples drawn from real-world quality improvement efforts.

Read the paper (PDF)

Bucky Ransdell, SAS

Session 1160-2017:

To Hydrate or Chlorinate: A Regression Analysis of the Levels of Chlorine in the Public Water Supply

Public water supplies contain disease-causing microorganisms in the water or distribution ducts. To kill off these pathogens, a disinfectant, such as chlorine, is added to the water. Chlorine is the most widely used disinfectant in all US water treatment facilities. Chlorine is known to be one of the most powerful disinfectants to restrict harmful pathogens from reaching the consumer. In the interest of obtaining a better understanding of what variables affect the levels of chlorine in the water, this presentation analyzed a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples were collected and their chlorine level, temperature, and pH were recorded. A linear regression analysis was performed on the data collected with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level were the independent variables collected from each water sample. All data collected was analyzed using various SAS^® procedures. Partial residual plots were used to determine possible relationships between the chlorine level and the independent variables. A stepwise selection was used to eliminate possible insignificant predictors. From there, several possible models for the data were selected. F-tests were conducted to determine which of the models appeared to be the most useful.

View the e-poster or slides (PDF)

Drew Doyle, University of Central Florida

U

Session 1039-2017:

Using PROC SEVERITY to Evaluate Quantile Approximation Techniques for Compound Distributions

This paper uses a simulation comparison to evaluate quantile approximation methods in terms of their practical usefulness and potential applicability in an operational risk context. A popular method in modeling the aggregate loss distribution in risk and insurance is the Loss Distribution Approach (LDA). Many banks currently use the LDA for estimating regulatory capital for operational risk. The aggregate loss distribution is a compound distribution resulting from a random sum of losses, where the losses are distributed according to some severity distribution and the number (of losses) distributed according to some frequency distribution. In order to estimate the regulatory capital, an extreme quantile of the aggregate loss distribution has to be estimated. A number of numerical approximation techniques have been proposed to approximate the extreme quantiles of the aggregate loss distribution. We use PROC SEVERITY to fit various severity distributions to simulated samples of individual losses from a preselected severity distribution. The accuracy of the approximations obtained is then evaluated against a Monte Carlo approximation of the extreme quantiles of the compound distribution resulting from the preselected severity distribution. We find that the second-order perturbative approximation, a closed-form approximation, performs very well at the extreme quantiles and over a wide range of distributions and is very easy to implement.

Read the paper (PDF)

Helgard Raubenheimer, Center for BMI, North-West University

Riaan de Jongh, Center for BMI, North-West University

Session 0869-2017:

Using a Population Average Model to Investigate the Success of a Customer Retention Strategy

In many healthcare settings, patients are like customers they have a choice. One example is whether to participate in a procedure. In population-based screening in which the goal is to reduce deaths, the success of a program hinges on the patient's choice to accept and comply with the procedure. Like in many other industries, this not only relies on the program to attract new eligible patients to attend for the first time, but it also relies on the ability of the program to retain existing customers. The success of a new customer retention strategy within a breast screening environment is examined by applying a population averaged model (also know as marginal models), which uses generalized estimating equations (GEEs) to account for the lack of independence of the observations. Arguments for why a population average model was applied instead of a mixed effects model (or random effects model) are provided. This business case provides a great introductory session for people to better understand the difference between mixed effects and marginal models, and illustrates how to implement a population average model within SAS^® by using the GENMOD procedure.

Read the paper (PDF)

Colleen McGahan, BC CANCER AGENCY