When large amounts of data are available, choosing the variables for inclusion in model building can be problematic. In this analysis, a subset of variables had to be selected from a larger set for use in a later cluster analysis aimed at extracting dimensions of human flourishing. A genetic algorithm (GA), written in SAS®, was used to select the variables in terms of their association with the dependent variable, life satisfaction, which served as a proxy for the as yet undefined quantity of human flourishing. The data were divided into subject areas (for example, health and environment), and the GA was applied separately to each subject area to ensure that each is adequately represented in the future analysis when the human flourishing dimensions are defined.
Lisa Henley, University of Canterbury
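For readers who want a concrete starting point, the following is a minimal SAS/IML sketch of the kind of GA-based subset search the abstract above describes, not the author's macro. It assumes a hypothetical data set WORK.SURVEY with candidate variables x1-x20 and the proxy outcome life_sat, and it uses the R-square of an OLS fit as the fitness measure.

proc iml;
use work.survey;                            /* hypothetical input data set */
read all var ("x1":"x20") into X;           /* candidate predictors        */
read all var {life_sat} into y;             /* proxy outcome               */
close work.survey;

start fitness(mask) global(X, y);           /* fitness = OLS R-square      */
   idx = loc(mask);
   if ncol(idx) = 0 then return(0);
   Z = j(nrow(X), 1, 1) || X[, idx];        /* intercept + chosen columns  */
   b = ginv(Z) * y;                         /* least squares solution      */
   return(1 - ssq(y - Z*b) / ssq(y - y[:]));
finish;

call randseed(1234);
popSize = 30;  half = 15;  nGen = 50;  mutRate = 0.05;
nVar = ncol(X);
pop = j(popSize, nVar);
call randgen(pop, "Bernoulli", 0.5);        /* random initial population   */

do gen = 1 to nGen;
   fit = j(popSize, 1, 0);
   do i = 1 to popSize;  fit[i] = fitness(pop[i, ]);  end;
   call sortndx(ndx, fit, 1, 1);            /* rank descending by fitness  */
   elite = pop[ndx[1:half], ];              /* truncation selection        */
   child = elite;
   do i = 1 to half;
      p = sample(1:half, 2);                /* two random parents          */
      u = j(1, nVar);  call randgen(u, "Uniform");
      child[i, ] = (u < 0.5)#elite[p[1], ] + (u >= 0.5)#elite[p[2], ];
      m = j(1, nVar);  call randgen(m, "Uniform");
      child[i, ] = abs(child[i, ] - (m < mutRate));   /* bit-flip mutation */
   end;
   pop = elite // child;
end;

fit = j(popSize, 1, 0);
do i = 1 to popSize;  fit[i] = fitness(pop[i, ]);  end;
best = pop[fit[<:>], ];                     /* best member of final pop    */
print "Selected variables (1 = keep):", best;
quit;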
Companies that offer subscription-based services (such as telecom and electric utilities) must evaluate the tradeoff between month-to-month (MTM) customers, who yield a high margin at the expense of a shorter expected lifetime, and customers who commit to a longer-term contract in return for a lower price. The objective, of course, is to maximize the Customer Lifetime Value (CLV). This tradeoff must be evaluated not only at the time of customer acquisition, but throughout the customer's tenure, particularly for fixed-term contract customers whose contracts are due for renewal. In this paper, we present a mathematical model that optimizes the CLV against this tradeoff between margin and lifetime. The model is presented in the context of a cohort of existing customers, some of whom are MTM customers and others of whom are approaching contract expiration. The model optimizes the number of MTM customers to be swapped to fixed-term contracts, as well as the number of contract renewals that should be pursued, at various term lengths and price points, over a period of time. We estimate customer life using discrete-time survival models with time-varying covariates related to contract expiration and product changes. Thereafter, an optimization model is used to find the optimal trade-off between margin and customer lifetime. Although we specifically present the contract expiration case, this model can easily be adapted for customer acquisition scenarios as well.
Atul Thatte, TXU Energy
Goutam Chakraborty, Oklahoma State University
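As a rough illustration of the survival component described above (not the authors' implementation), the sketch below expands each customer into person-month records and fits a discrete-time hazard model with PROC LOGISTIC; all data set and variable names are hypothetical. The fitted monthly hazards h(t) yield a survivor curve S(t) = (1-h(1))(1-h(2))... from which expected customer life can be computed for the optimization step.

/* One row per customer becomes one row per month survived (person-    */
/* period format); churn = 1 only in a churned customer's final month. */
data person_month;
   set customers;                          /* one row per customer      */
   do month = 1 to tenure_months;
      months_to_expiry = max(contract_end_month - month, 0);
      at_expiry = (months_to_expiry <= 1); /* time-varying covariate    */
      churn = (month = tenure_months and churned = 1);
      output;
   end;
run;

proc logistic data=person_month;
   class month contract_type / param=ref;  /* month = baseline hazard   */
   model churn(event='1') = month contract_type at_expiry price;
run;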
This session will describe an innovative way to identify groupings of customer offerings using SAS® software. The authors investigated customer enrollments in nine different programs offered by a large energy utility. These programs included levelized billing plans, electronic payment options, renewable energy, energy efficiency programs, a home protection plan, and a home energy report for managing usage. Of the 640,788 residential customers, 374,441 had been solicited for a program and had adequate data for analysis. Nearly half of these eligible customers (49.8%) enrolled in some type of program. To examine the commonality among programs based on the characteristics of customers who enroll, cluster analysis procedures and correlation matrices are often used. However, the value of these procedures was greatly limited by the binary nature of enrollments (enroll or no enroll), as well as by the fact that some programs are mutually exclusive (limiting cross-enrollments for correlation measures). To overcome these limitations, PROC LOGISTIC was run for each program, using a common set of predictor variables, to generate a predicted enrollment score for every customer on every program. This provided a broad range of scores for each program, under the assumption that customers who are likely to join similar programs would have similar predicted scores for those programs. PROC FASTCLUS was then used to build k-means cluster models based on these predicted logistic scores. Two distinct clusters were identified from the nine programs. These clusters not only aligned with the hypothesized model, but were also generally supported by correlations (using PROC CORR) among program predicted scores as well as program enrollments.
Brian Borchers, PhD, Direct Options
Ashlie Ossege, Direct Options
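A minimal sketch of the scoring-and-clustering flow described above, with hypothetical program and predictor names: each PROC LOGISTIC run appends one program's score to the data, and PROC FASTCLUS then clusters customers on the accumulated scores.

proc logistic data=customers noprint;
   model enroll_budget(event='1') = income home_age usage_kwh tenure;
   output out=s1 p=score_budget;
run;

proc logistic data=s1 noprint;
   model enroll_paperless(event='1') = income home_age usage_kwh tenure;
   output out=s2 p=score_paperless;
run;
/* ...repeat for the remaining seven programs, chaining the data sets... */

proc fastclus data=s2 maxclusters=2 out=clusters;
   var score_budget score_paperless /* ...remaining score variables */ ;
run;

Chaining each OUTPUT data set into the next PROC LOGISTIC keeps all program scores in a single data set without a separate merge step.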
Data mining and predictive models are extensively used to find the optimal customer targets in order to maximize the return on investment. Traditional direct marketing techniques target all customers who are likely to buy, regardless of whether they would have bought anyway. Because this mechanism cannot identify the customers who are going to buy even without a marketing contact, part of the investment is wasted. This paper focuses on the incremental lift modeling approach, using weight of evidence (WOE) coding and information value (IV), followed by incremental response and outcome model diagnostics. This model identifies the additional purchases that would not have taken place without a marketing campaign. Modeling work was conducted using a combined model. The research was carried out on Travel Center data. Compared with the results from a traditional campaign that targeted everyone, this approach increased the average response rate by 2.8% and the number of fuel gallons by 244. This paper discusses in detail the implementation of the 'Incremental Response' node to direct marketing campaigns, along with the associated incremental revenue and profit analysis.
Sravan Vadigepalli, Best Buy
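The paper above uses the Incremental Response node in SAS® Enterprise Miner™. As a rough code-level stand-in, the classic two-model uplift approach can be sketched in Base SAS as follows; the data set and variable names are hypothetical assumptions.

/* Fit separate models on treated and control customers, score         */
/* everyone with both, and take the difference as the incremental      */
/* response probability.                                                */
proc logistic data=campaign(where=(treated=1)) noprint;
   model purchase(event='1') = recency frequency monetary;
   score data=campaign out=s_treat(keep=cust_id treated purchase
                                        recency frequency monetary p_1
                                   rename=(p_1=p_treat));
run;

proc logistic data=campaign(where=(treated=0)) noprint;
   model purchase(event='1') = recency frequency monetary;
   score data=s_treat out=s_both(keep=cust_id treated purchase p_treat p_1
                                 rename=(p_1=p_ctrl));
run;

data uplift;
   set s_both;
   incr_resp = p_treat - p_ctrl;   /* expected lift from contacting     */
run;

Customers are then ranked and targeted by incr_resp rather than by raw purchase probability, which is the essence of the incremental approach.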
The Ebola virus outbreak is producing some of the most significant and fastest-trending news throughout the globe today. There is a lot of buzz surrounding the deadly disease and the drastic consequences that it potentially poses to mankind. Social media provides the basic platforms for millions of people to discuss the issue and allows them to openly voice their opinions. There has been a significant increase in the magnitude of responses all over the world since the death of an Ebola patient in a Dallas, Texas hospital. In this paper, we aim to analyze the overall sentiment prevailing in the world of social media. For this, we extracted live streaming data from Twitter at two different times using the Python scripting language: once before the death of the patient and once after. We used SAS® Text Miner nodes to parse, filter, and analyze the data and to get a feel for the patterns that exist in the tweets. We then used SAS® Sentiment Analysis Studio to further analyze and predict the sentiment of the Ebola outbreak in the United States. In our results, we found that the issue was not taken very seriously until the death of the Ebola patient in Dallas. After the death, we found that prominent personalities across the globe were talking about the disease and raising funds to fight it. We are continuing to collect tweets, and we analyze their locations to produce a heat map that shows the intensity of the varying sentiment across locations.
Dheeraj Jami, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Shivkanth Lanka, Oklahoma State University
With governments and commissions increasingly incentivizing electric utilities to get consumers to save energy, there has been a large increase in the number of energy saving programs. Some are structural, incentivizing consumers to make improvements to their home that result in energy savings. Others, called behavioral programs, are designed to get consumers to change their behavior to save energy. Within behavioral programs, Home Energy Reports are a good method to achieve behavioral savings as well as to educate consumers on structural energy savings. This paper examines the different Home Energy Report communication channels (direct mail and e-mail) and the effect of the marketing channel on energy savings, using SAS® for linear models. For consumer behavioral change, we often hear these questions: 1) Are the people who responded to a direct mail solicitation saving at a higher rate than people who responded to an e-mail solicitation? 1a) Hypothesis: Because e-mail is easy to respond to, the customers who enroll through this channel will exert less effort on the behavior changes that require more time and investment toward energy efficiency, and thus will save less. 2) Does the mode of the ongoing dialog (mail versus e-mail) affect the amount of consumer savings? 2a) Hypothesis: E-mail is more likely to be ignored, and thus these recipients will save less. Because savings is most often calculated by comparing the treatment group to a control group (to account for weather and economic impact over time), and by definition you cannot have a dialog with a control group, the answers are not a simple PROC FREQ away. Also, people who responded to mail look very different demographically from people who responded to e-mail. So, is the driver of the savings differences the channel, or is it the demographics of the customers who happen to use those channels? This study used clustering (PROC FASTCLUS) to segment the consumers by mail versus e-mail and append cluster assignments to the respective control group. It also used difference-in-differences (DID) as well as billing analysis (PROC GLM) to calculate the savings of these groups.
Angela Wells, Direct Options
Ashlie Ossege, Direct Options
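A minimal sketch of the difference-in-differences billing analysis mentioned above, assuming a stacked billing data set with a treatment flag, a pre/post period flag, and weather covariates (all names hypothetical); the treat*post interaction estimates the savings effect.

proc glm data=billing;
   class treat post;
   model kwh = treat post treat*post hdd cdd;  /* hdd/cdd = weather controls */
   lsmeans treat*post;                         /* cell means for the DID     */
run; quit;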
The impact of price on brand sales is not always linear or independent of other brand prices. We demonstrate, using sales information and SAS® Enterprise Miner, how to uncover relative price bands where prices might be increased without losing market share or decreased slightly to gain share.
Ryan Carr, SAS
Charles Park, Lenovo
A common problem when developing classification models is the imbalance of classes in the classification variable: one class is represented by a large number of cases while the other is represented by very few. When this happens, the predictive power of the developed model can be biased, because classification methods tend to favor the majority class and are designed to minimize the error on the total data set regardless of the class proportions. Due to this problem, several techniques are used to balance the distribution of the classification variable. One method is to reduce the size of the majority class (under-sampling), another is to increase the number of cases in the minority class (over-sampling), and a third is to combine these two methods. There is also a more complex technique called SMOTE (Synthetic Minority Over-sampling Technique) that intelligently generates new synthetic records of the minority class using a nearest-neighbors approach. In this paper, we present a SAS® implementation of a combination of SMOTE and under-sampling techniques as applied to a churn model. Then, we compare the predictive power of the model using this proposed balancing technique against models developed with other data sampling techniques.
Lina Maria Guzman Cartagena, DIRECTV
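A compact SAS/IML sketch of the SMOTE idea described above (interpolating between a minority record and one of its k nearest minority neighbors), paired with PROC SURVEYSELECT for the under-sampling side. The data set and variable names, k, and the sampling rate are all illustrative assumptions, not the paper's values.

proc iml;
use train;                                 /* hypothetical training data */
read all var {x1 x2 x3} into M where(churn=1);   /* minority class only  */
close train;
call randseed(42);
k = 5;
n = nrow(M);
D = distance(M);                           /* pairwise Euclidean matrix  */
synth = j(n, ncol(M), .);
do i = 1 to n;
   d = D[i, ];
   d[i] = max(d) + 1;                      /* exclude self from the knn  */
   call sortndx(ndx, d`, 1);               /* ascending distance order   */
   nb = sample(ndx[1:k]`, 1);              /* random one of the k nn     */
   lam = j(1, 1);  call randgen(lam, "Uniform");
   synth[i, ] = M[i, ] + lam * (M[nb, ] - M[i, ]);   /* interpolate      */
end;
create smote_new from synth[colname={"x1" "x2" "x3"}];
append from synth;
close smote_new;
quit;

/* Under-sample the majority class, then stack the pieces together.     */
proc surveyselect data=train(where=(churn=0)) out=major_down
                  method=srs samprate=0.3 seed=42;
run;

data balanced;
   set train(where=(churn=1)) smote_new(in=s) major_down;
   if s then churn = 1;        /* synthetic records belong to minority  */
run;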
Analyzing the key success factors for hit songs in the Billboard music charts is an ongoing area of interest to the music industry. Although there have been many studies over the past decades on predicting whether a song has the potential to become a hit, the research question remains: Can hit songs be predicted? And, if the answer is yes, what are the characteristics of those hit songs? This study applies data mining techniques using SAS® Enterprise Miner™ to understand why some music is more popular than other music. In particular, certain songs are considered one-hit wonders, appearing in the Billboard music charts only once, while other songs are acknowledged as masterpieces. With 2,139 data records, the results demonstrate the practical validity of our approach.
Piboon Banpotsakun, National Institute of Development Administration
Jongsawas Chongwatpol, NIDA Business School, National Institute of Development Administration
It has always been a million-dollar question: What inhibits a donor from donating? Many successful universities have deep roots in annual giving. We know donor sentiment is a key factor in drawing attention to engage donors. This paper summarizes findings about donor behaviors using textual analysis combined with the power of predictive modeling. In addition to identifying the characteristics of donors, the paper focuses on identifying the characteristics of first-time donors, distinguishing their features from the general donor pattern and leveraging the variations in the data to provide deeper insights into behavioral patterns. A data set containing 247,000 records was obtained from the XYZ University Foundation alumni database, Facebook, and Twitter. Solicitation content, such as the subject lines of emails sent to the prospect base, was also considered. Time-dependent and time-independent data were categorized to make unbiased predictions about first-time donors. The predictive models use inputs such as age, educational records, scholarships, events, student memberships, and solicitation methods. Models such as decision trees, Dmine regression, and neural networks were built to predict the prospects. SAS® Sentiment Analysis Studio and SAS® Enterprise Miner™ were used to analyze the sentiment.
Ramcharan Kakarla, Comcast
Goutam Chakraborty, Oklahoma State University
My SAS® Global Forum 2013 paper 'Variable Reduction in SAS® by Using Weight of Evidence (WOE) and Information Value (IV)' has become the most sought-after online article on variable reduction in SAS since its publication. But the methodology provided by that paper is limited to the reduction of numeric variables for logistic regression only. Built on a similar process, the current paper adds several major enhancements: 1) The use of WOE and IV has been expanded to the analytics and modeling of continuous dependent variables. After standardization of a continuous outcome, all records can be divided into two groups: positive performance (outcome y above the sample average) and negative performance (outcome y below the sample average). This treatment is rigorously consistent with the concept of entropy in information theory: the juxtaposition of two opposite forces in one equation, where a stronger contrast between the two suggests a higher intensity, that is, more information delivered by the variable in question. Because the standardization keeps the outcome variable continuous and quantified, the revised formulas for WOE and IV can be used in the analytics and modeling of continuous outcomes such as sales volume, claim amount, and so on. 2) Categorical and ordinal variables can be assessed together with numeric ones. 3) Users of big data usually need to evaluate hundreds or thousands of variables, and it is not uncommon that over 90% of the variables contain little useful information. We have added a SAS macro that trims these variables efficiently in a broad-brushed manner without a thorough examination. Afterward, we examine the retained variables more carefully with respect to their behavior toward the target outcome. 4) We add chi-square analysis for categorical and ordinal variables and Gini coefficients for numeric variables in order to provide additional suggestions for segmentation and regression. With the above enhancements, a SAS macro program is provided at the end of the paper as a complete suite for variable reduction and selection that efficiently evaluates all variables together. The paper provides a detailed explanation of how to use the SAS macro and how to read the SAS outputs that provide useful insights for subsequent linear regression, logistic regression, or scorecard development.
Alec Zhixiao Lin, PayPal Credit
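To make the revised computation concrete, here is a hedged sketch of WOE and IV for a continuous outcome along the lines the paper describes: records above the sample mean of y are treated as 'positive,' those below as 'negative,' and a candidate variable x is decile-binned. Data set and variable names are hypothetical.

proc sql;
   create table flagged as
   select *, (y > (select mean(y) from have)) as pos
   from have;
quit;

proc rank data=flagged groups=10 out=binned;
   var x;  ranks bin_x;                    /* decile bins of x           */
run;

proc sql;
   create table woe as
   select bin_x,
          sum(pos = 1) / (select sum(pos = 1) from binned) as pct_pos,
          sum(pos = 0) / (select sum(pos = 0) from binned) as pct_neg,
          log(calculated pct_pos / calculated pct_neg)     as woe,
          (calculated pct_pos - calculated pct_neg)
             * calculated woe                              as iv_term
   from binned
   group by bin_x;

   select sum(iv_term) as IV from woe;     /* information value of x     */
quit;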
Design of experiments (DOE) is an essential component of laboratory, greenhouse, and field research in the natural sciences. It has also been an integral part of scientific inquiry in diverse social science fields such as education, psychology, marketing, pricing, and social work. The principles and practices of DOE are among the oldest and most advanced tools within the realm of statistics. DOE classification schemes, however, are diverse and, at times, confusing. In this presentation, we provide a simple conceptual classification framework in which experimental methods are grouped into classical and statistical approaches. The classical approach is further divided into pre-, quasi-, and true experiments. The statistical approach is divided into one-factor, two-factor, and multifactor experiments. Within these broad categories, we review several contemporary and widely used designs and their applications. The optimal use of Base SAS® and SAS/STAT® to analyze, summarize, and report these diverse designs is demonstrated. The prospects and challenges of such diverse and critically important analytics tools for business insight extraction in marketing and pricing research are discussed.
Max Friedauer
Jason Greenfield, Cardinal Health
Yuhan Jia, Cardinal Health
Joseph Thurman, Cardinal Health
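As one small example of the kind of analysis the presentation above demonstrates, a two-factor factorial experiment can be analyzed in SAS/STAT® as follows; the pricing-test data set and factor names are hypothetical.

proc glm data=price_test;
   class discount channel;
   model sales = discount | channel;              /* main effects + interaction */
   lsmeans discount*channel / pdiff adjust=tukey; /* Tukey-adjusted comparisons */
run; quit;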
The PROPCASE function is useful when you are cleansing a database of names and addresses in preparation for mailing. But it does not know the difference between a proper name (in which initial capitalization should be used) and an acronym (which should be all uppercase). This paper explains an algorithm that determines with reasonable accuracy whether a word is an acronym and, if it is, converts it to uppercase.
Joe DeShon, Boehringer Ingelheim Vetmedica
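The paper's exact algorithm is not reproduced here, but a simple heuristic in the same spirit, treating a short word with no vowels as an acronym, can be sketched as follows. It is reasonable but imperfect (it would not catch acronyms such as NASA that contain vowels), and the input variable name is an assumption.

data cleaned;
   set rawlist;                            /* assumes a variable NAME    */
   length out $100 word $50;
   out = '';
   do i = 1 to countw(name, ' ');
      word = scan(name, i, ' ');
      if length(word) <= 4 and countc(upcase(word), 'AEIOUY') = 0 then
         word = upcase(word);              /* looks like an acronym      */
      else word = propcase(word);          /* otherwise proper-case it   */
      out = catx(' ', out, word);
   end;
   drop i word;
run;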
Big data is quickly moving from buzzword to critical tool for today's analytics applications. It can be easy to get bogged down in Apache Hadoop terminology, but when you get down to it, big data is about empowering organizations to deliver the right message or product to the right audience at the right time. Find out how Epsilon's data science and analytics team built a next-generation marketing application on Cloudera, taking advantage of SAS® capabilities, to provide its clients with a 360-degree view of their customers. Join Bob Zurek, Senior Vice President of Products at Epsilon, to hear how this new big data solution is enhancing customer service and providing significant competitive differentiation.
Bob Zurek, Epsilon
SAS® blogs (hosted at http://blogs.sas.com/content) attract millions of page views annually. With hundreds of authors, thousands of posts, and constant chatter within the blog comments, it's impossible for one person to keep track of all of the activity. In this paper, you learn how SAS technology is used to gather data and report on SAS blogs from the inside out. The beneficiaries include personnel from all over the company, including marketing, technical support, customer loyalty, and executives. The author describes the business case for tracking and reporting on the activity of blogging. You learn how SAS tools are used to access the WordPress database and how to create a 'blog data mart' for reporting and analytics. The paper includes specific examples of the insight that you can gain from examining the blogs analytically, and which techniques are most useful for achieving that insight. For example, the blog transactional data are combined with social media metrics (also gathered by using SAS) to show which blog entries and authors yield the most engagement on Twitter, Facebook, and LinkedIn. In another example, we identified the growing trend of 'blog comment spam' on the SAS blog properties and measured its cost to the business. These metrics helped to justify the investment in a solution. Many of the tools used are part of SAS® Foundation, including SAS/ACCESS®, the DATA step and SQL, PROC REPORT, PROC SGPLOT, and more. The results are shared in static reports, automated daily email summaries, dynamic reports hosted in SAS/IntrNet®, and even a corporate dashboard hosted in SAS® Visual Analytics.
Chris Hemedinger, SAS
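As a flavor of the extraction step described above, the sketch below reads the standard WordPress tables through SAS/ACCESS® to MySQL and builds one table of a possible 'blog data mart'. The server, credentials, and output names are placeholders; the table and column names follow the standard WordPress schema.

libname wp mysql server='blogsrv' database=wordpress
               user=rptuser password='*****';   /* placeholder connection */

proc sql;
   create table work.blog_posts as
   select p.ID as post_id, p.post_title, p.post_date,
          u.display_name as author,
          count(c.comment_ID) as comment_count
   from wp.wp_posts p
        inner join wp.wp_users    u on p.post_author = u.ID
        left  join wp.wp_comments c on c.comment_post_ID = p.ID
   where p.post_status = 'publish'
   group by p.ID, p.post_title, p.post_date, u.display_name;
quit;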
In data mining, data preparation is the most crucial, most difficult, and longest part of the mining process, and it involves many steps: distribution analysis of the variables, diagnosis and reduction of the influence of multicollinearity, imputation of missing values, and construction of categories within variables, to name a few. In this presentation, we use data mining models in different areas such as marketing, insurance, retail, and credit risk. We show how to implement data preparation in SAS® Enterprise Miner™ using different approaches, from simple code routines to complex processes involving statistical insights, variable clustering, variable transformation, graphical analysis, decision trees, and more.
Ricardo Galante, SAS
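Two of the preparation steps listed above can be sketched in a few lines, assuming a hypothetical data set RAW with a target and three numeric inputs: median imputation of missing values with PROC STDIZE, and a multicollinearity check with the VIF and COLLIN options of PROC REG.

proc stdize data=raw out=imputed reponly method=median;
   var income age balance;        /* replace missing values only        */
run;

proc reg data=imputed;
   model target = income age balance / vif collin;   /* collinearity diagnostics */
run; quit;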
Although today's marketing teams enjoy large-scale campaign relationship management systems, many are still left with the task of bridging the well-known gap between campaigns and customer purchasing decisions. During this session, we discuss how Slalom Consulting and Celebrity Cruises decided to take a bold step and bridge that gap. We show how marketing efforts are distorted when a team considers only the last campaign sent to a customer that later booked a cruise. Then we lay out a custom-built SAS 9.3 solution that scales to process thousands of campaigns per month using a stochastic attribution technique. This approach considers all of the campaigns that touch the customer, assigning a single campaign or a set of campaigns that contributed to their decision.
Christopher Byrd, Slalom Consulting
Various economic factors affect retail sales. One important factor that is expected to correlate with sales is overall customer sentiment toward a brand. In this paper, we analyze how location-specific customer sentiment varies and correlates with sales at retail stores. In our attempt to find any dependency, we used location-specific Twitter feeds related to a national-brand chain retail store and opinion-mined their overall sentiment using SAS® Sentiment Analysis Studio. We estimate the correlation between the opinion index and retail sales within the studied geographic areas. Later in the analysis, using ArcGIS Online from Esri, we assess whether other location-specific variables that could potentially correlate with customer sentiment toward the brand are significant predictors of the brand's retail sales.
Asish Satpathy, University of California, Riverside
Goutam Chakraborty, Oklahoma State University
Tanvi Kode, Oklahoma State University
There are few business environments more dynamic than that of a casino. Serving a multitude of entertainment options to thousands of patrons every day results in a lot of customer interaction points, all occurring in a highly competitive environment where, if a patron doesn't feel that he is getting the recognition he deserves, he can easily walk across the street to a competitor. Add to this the expected amount of reinvestment per patron in the form of free meals and free play. Making high-quality real-time decisions during each customer interaction is critical to the success of a casino. Such decisions need to be relevant to customers' needs and values, reflect the strategy of the business, and help maximize the organization's profitability. Being able to make those decisions repeatedly is what separates highly successful businesses from those that flounder or fail. Casinos have a great deal of information about a patron's history, behaviors, and preferences. Being able to react in real time to newly gathered information captured in ongoing dialogues opens up new opportunities regarding what offers should be extended and how patrons are treated. In this session, we provide an overview of real-time decisioning and its capabilities, review the various opportunities for real-time interaction in a casino environment, and explain how to incorporate the outputs of analytics processes into a real-time decision engine.
Natalie Osborn, SAS
Predictive analytics has been widely studied in recent years, and it has been applied to solve a wide range of real-world problems. Nevertheless, current state-of-the-art predictive analytics models are not well aligned with managers' requirements, in that the models fail to include the real financial costs and benefits during the training and evaluation phases. Churn predictive modeling is one example: evaluating a model based on a traditional measure such as accuracy or predictive power does not yield the best results when measured by the investment per subscriber in a loyalty campaign and the financial impact of failing to detect a real churner versus wrongly predicting a non-churner as a churner. In this paper, we propose a new financially based measure for evaluating the effectiveness of a voluntary churn campaign, taking into account the available portfolio of offers, their individual financial costs, and the probability of acceptance depending on the customer profile. Then, using a real-world churn data set, we compare different cost-insensitive and cost-sensitive predictive analytics models and measure their effectiveness based on their predictive power and cost optimization. The results show that using a cost-sensitive approach yields an increase in profitability of up to 32.5%.
Alejandro Correa Bahnsen, University of Luxembourg
Darwin Amezquita, DIRECTV
Juan Camilo Arias, Smartics
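The paper's financial measure is not reproduced here, but the flavor of a cost-sensitive targeting rule can be sketched as follows: the expected profit of an offer weighs churn probability, acceptance probability, the customer's value, and the offer's cost. All names and values below are illustrative assumptions.

data campaign_value;
   set scored;                       /* p_churn from any fitted model    */
   offer_cost = 20;                  /* cost of the retention offer      */
   p_accept   = 0.35;                /* offer acceptance probability     */
   clv        = 480;                 /* customer lifetime value          */
   /* target only if the expected saved value exceeds the offer cost    */
   exp_profit = p_churn * p_accept * clv - offer_cost;
   target     = (exp_profit > 0);
run;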
Customer Long-Term Value (LTV) is a concept that is readily explained at a high level to marketing management of a company, but its analytic development is complex. This complexity involves the need to forecast customer behavior well into the future. This behavior includes the timing, frequency, and profitability of a customer's future purchases of products and services. This paper describes a method for computing LTV. First, a multinomial logistic regression provides probabilities for time-of-first-purchase, time-of-second-purchase, and so on, for each customer. Then the profits for the first purchase, second purchase, and so on, are forecast but only after adjustment for non-purchaser selection bias. Finally, these component models are combined in the LTV formula.
Bruce Lund, Marketing Associates, LLC
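A rough sketch of the first component and the final roll-up described above (not the author's full method, which also adjusts for non-purchaser selection bias): a multinomial logistic model produces time-of-first-purchase probabilities, which are then combined with margins and a discount rate. Names, the eight-quarter horizon, and the 2% quarterly rate are assumptions.

proc logistic data=history;
   class segment / param=ref;
   model first_purchase_qtr = segment recency tenure / link=glogit;
   output out=probs predprobs=individual;   /* creates IP_1-IP_8 if the  */
run;                                        /* target levels are 1-8     */

data ltv;
   set probs;
   array ip ip_1-ip_8;               /* predicted probability per quarter */
   ltv = 0;
   do t = 1 to 8;
      ltv + ip[t] * margin / (1.02)**t;     /* discounted expected margin */
   end;
run;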
Retailers proactively seek a data-driven approach to providing customized product recommendations that increase sales and customer loyalty. Product affinity models have been recognized as one of the vital tools for this purpose. The algorithm assigns a customer to a product affinity group when the likelihood of purchasing is the highest and meets a minimum absolute requirement. In practice, however, valuable customers, up to 30% of the total universe, who buy across multiple product categories and have two or more balanced product affinity likelihoods, remain unassigned and cannot be effectively targeted with product recommendations. This paper presents multiple product affinity models, developed using the SAS® macro language, to address the problem. We demonstrate how the innovative assignment algorithm successfully assigns these undefined customers to appropriate multiple product affinity groups using nationwide retailer transactional data. In addition, the results show that customers establish loyalty by migrating from a single product affinity group to multiple groups. This comprehensive and insightful business solution is shared in the paper, along with a clustering algorithm and a nonparametric tree model for model building. The SAS macro code for customer assignment is provided in an appendix.
Hsin-Yi Wang, Alliance Data Systems
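The assignment rule above can be illustrated with a small DATA step sketch: a customer joins every affinity group whose likelihood clears a floor and sits within a tolerance of that customer's best likelihood. The three groups, the floor (0.20), and the tolerance (0.05) are illustrative assumptions, not the paper's values.

data assigned;
   set scores;                        /* per-customer affinity scores    */
   array p{3}       p_apparel p_home p_beauty;
   array grp{3} $1  g_apparel g_home g_beauty;
   maxp = max(of p{*});
   do i = 1 to 3;
      grp{i} = ifc(p{i} >= 0.20 and p{i} >= maxp - 0.05, 'Y', 'N');
   end;
   multi = (countc(catt(of grp{*}), 'Y') > 1);   /* multi-affinity flag  */
   drop i;
run;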
Faced with diminishing forecast returns from the forecast engine within its existing replenishment application, Tractor Supply Company (TSC) engaged SAS® Institute to deliver a fully integrated forecasting solution that promised a significant improvement in chain-wide forecast accuracy. The end-to-end forecast implementation, including problems faced, solutions delivered, and results realized, will be explored.
Chris Houck, SAS
Diabetes is a chronic condition affecting people of all ages, with around 25.8 million people estimated to have it in the U.S. The objective of this research is to predict the probability of a diabetic patient being readmitted. The results will help hospitals design a follow-up protocol to ensure that patients with a higher readmission probability are doing well, in order to promote a healthy doctor-patient relationship. The data were obtained from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. The data set contains over 100,000 instances and 55 variables, such as insulin use and length of stay. The data set was split into training and validation partitions to provide an honest assessment of the models. Various variable selection techniques, such as stepwise regression, forward regression, LARS, and LASSO, were used; using LARS, prominent factors in determining the patient readmission rate were identified. Numerous predictive models were built: decision tree, logistic regression, gradient boosting, MBR, SVM, and others. The model comparison algorithm in SAS® Enterprise Miner™ 13.1 recognized that the high-performance support vector machine outperformed the other models, with the lowest misclassification rate of 0.363. The chosen model has a sensitivity of 49.7% and a specificity of 75.1% on the validation data.
Hephzibah Munnangi, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
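Outside SAS® Enterprise Miner™, a comparable LASSO screening on a training/validation split can be sketched with PROC GLMSELECT, which treats the binary target as interval (a linear probability model); the data set and variable names are hypothetical.

proc glmselect data=diab seed=1;
   partition fraction(validate=0.3);        /* honest assessment split   */
   class race gender age_group;
   model readmit30 = insulin num_meds time_in_hospital
                     race gender age_group
         / selection=lasso(choose=validate stop=none);
run;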
Retailers, among nearly every other consumer business, are under more pressure and competition than ever before. Today's consumer is more connected, informed, and empowered, and the pace of innovation is rapidly changing the way consumers shop. Retailers are expected to sift through and implement digital technology, make sense of their big data with analytics, change processes, and cut costs, all at the same time. Today's session, 'Retail 2015: The Landscape, Trends, and Technology,' will cover major issues retailers are facing today as well as the business and technology trends that will shape their future.
Lori Schafer
This paper discusses the selection and transformation of continuous predictor variables for the fitting of binary logistic models. The paper has two parts: (1) A procedure and associated SAS® macro are presented that can screen hundreds of predictor variables and 10 transformations of these variables to determine their predictive power for a logistic regression. The SAS macro passes the training data set twice to prepare the transformations and one more time through PROC TTEST. (2) The FSP (function selection procedure) and a SAS implementation of FSP are discussed. The FSP tests all transformations from among a class of FSP transformations and finds the one with maximum likelihood when fitting the binary target. In a 2008 book, Patrick Royston and Willi Sauerbrei popularized the FSP.
Bruce Lund, Marketing Associates, LLC
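A sketch of the screening idea described above: build a few candidate transformations of x, fit each one-variable logistic model, and rank the candidates by -2 Log L. The transformation list here is illustrative, not the paper's full set of ten, and the data set and variable names are assumptions.

data trans;
   set train;
   x_log  = log(max(x, 1e-4));        /* guard against non-positive x   */
   x_sqrt = sqrt(max(x, 0));
   x_sq   = x**2;
   x_inv  = 1 / max(x, 1e-4);
run;

%macro rank_fit(vars=x x_log x_sqrt x_sq x_inv);
   %let n = %sysfunc(countw(&vars));
   ods exclude all;                   /* suppress listing, keep ODS OUTPUT */
   %do i = 1 %to &n;
      %let v = %scan(&vars, &i);
      ods output FitStatistics=fs&i;
      proc logistic data=trans;
         model y(event='1') = &v;
      run;
      data fs&i; set fs&i; length xvar $32; xvar = "&v"; run;
   %end;
   ods exclude none;
   data fit_rank;
      set fs1-fs&n;
      where criterion = '-2 Log L';
   run;
   proc sort data=fit_rank; by InterceptAndCovariates; run;
%mend;
%rank_fit()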
The era of mass marketing is over. Welcome to the new age of relevant marketing, where whispering matters far more than shouting. At ZapFi, using the combination of sponsored free Wi-Fi and real-time consumer analytics, we help businesses to better understand who their customers are. This gives businesses the opportunity to send highly relevant marketing messages based on the profile and the location of the customer. It also leads to new ways to build deeper and more intimate, one-on-one relationships between the business and the customer. During this presentation, ZapFi will use a few real-world examples to demonstrate that the future of mobile marketing is much more about data and far less about advertising.
Gery Pollet, ZapFi
Product affinity segmentation is a powerful technique for marketers and sales professionals to gain a good understanding of customers' needs, preferences, and purchase behavior. Performing product affinity segmentation is quite challenging in practice because product-level data usually exhibits high skewness, high kurtosis, and a large percentage of zero values. The Doughnut Clustering method has been shown to be effective on real data and was presented at SAS® Global Forum 2013 in the paper titled 'Product Affinity Segmentation Using the Doughnut Clustering Approach.' However, the Doughnut Clustering method is not a panacea for the product affinity segmentation problem, and there is a clear need for a comprehensive evaluation of the method in order to develop generic guidelines for practitioners about when to apply it. In this paper, we meet that need by evaluating the Doughnut Clustering method on simulated data with different levels of skewness, kurtosis, and percentage of zero values. We developed a five-step approach based on Fleishman's power method to generate synthetic data with prescribed parameters. Subsequently, we designed and conducted a set of experiments that run the Doughnut Clustering method, as well as the traditional k-means method as a benchmark, on the simulated data. We draw conclusions on the performance of the Doughnut Clustering method by comparing its clustering validity metric (the ratio of between-cluster variance to within-cluster variance) as well as the relative proportion of cluster sizes against those of k-means.
Darius Baer, SAS
Goutam Chakraborty, Oklahoma State University
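The heart of the five-step approach mentioned above is Fleishman's polynomial transform: a standard normal Z is mapped to Y = a + b*Z + c*Z^2 + d*Z^3, with coefficients solved to hit target skewness and kurtosis. The sketch below uses illustrative coefficients (with a = -c so the mean stays near zero), not values from the paper.

data sim;
   call streaminit(2015);
   a = -0.2523; b = 0.8263; c = 0.2523; d = 0.0226;  /* illustrative     */
   do i = 1 to 10000;
      z = rand('Normal');
      y = a + b*z + c*z**2 + d*z**3;   /* non-normal variate             */
      output;
   end;
run;

proc means data=sim n mean std skew kurt;   /* check the achieved shape  */
   var y;
run;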
Marketers often face a cross-channel challenge in making sense of the behavior of web visitors who spend considerable time researching an item online, even putting the item in a wish list or checkout basket, but fail to follow up with an actual purchase online, instead opting to purchase the item in the store. This research shows how SAS® Visual Analytics can address this challenge. It uses a large data set of simulated web transactional data, joins it by common IDs to in-store retail data, and studies the result in SAS Visual Analytics. In this presentation, we go over tips and tricks for using SAS Visual Analytics on a non-distributed server. The loaded data set is analyzed step by step to show how to draw correlations in the web browsing behavior of customers and how to link that behavior to their subsequent in-store behavior, drawing inferences between web visits and in-store visits by department. You'll change your marketing strategy as a result of this research.
Tricia Aanderud, Zencos
Johann Pasion, 89 Degrees
Understanding the behavior of your customers is key to improving and maintaining revenue streams. It is a critical requirement in the crafting of successful marketing campaigns. Using SAS® Visual Analytics, you can analyze and explore user behavior, click paths, and other event-based scenarios. Flow visualizations help you to best understand hotspots, highlight common trends, and find insights in individual user paths or in aggregated paths. This paper explains the basic concepts of path analysis as well as provides detailed background information about how to use flow visualizations within SAS Visual Analytics.
Falko Schulz, SAS
Olaf Kratzsch, SAS
The measurement of factors influencing consumer purchasing decisions is of interest to all manufacturers of goods, retailers selling these goods, and consumers buying these goods. In the past decade, conjoint analysis has become one of the most commonly used statistical techniques for analyzing the decisions, or trade-offs, consumers make when they purchase products. Although recent years have seen increased use of conjoint analysis and conjoint software, there is limited work that spells out a systematic procedure for doing a conjoint analysis or using conjoint software. This paper reviews basic conjoint analysis concepts, describes the mathematical and statistical framework on which conjoint analysis is built, and introduces the TRANSREG and PHREG procedures, their syntax, and the output they generate, using simplified real-life data examples. The paper concludes by highlighting some of the substantive issues related to the application of conjoint analysis in a business environment and the autocall macros available in SAS/STAT®, SAS/IML®, and SAS/QC® software that can handle more complex conjoint designs and analyses. The paper will benefit the basic SAS user, as well as statisticians and research analysts in every industry, especially those in marketing and advertising.
Delali Agbenyegah, Alliance Data Systems
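For the ratings-based case described above, the core PROC TRANSREG call is compact. The sketch below assumes a hypothetical data set with a rating response and two attributes, requests part-worth utilities, and codes effects so that each attribute's utilities sum to zero.

proc transreg data=ratings utilities short;
   model identity(rating) = class(brand price / zero=sum);
run;

The output includes part-worth utilities and the relative importance of each attribute; PROC PHREG covers the choice-based conjoint case.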
Tracking responses is one of the most important aspects of the campaign life cycle for a marketing analyst, yet it is often a daunting task. This paper provides guidance on how to determine what constitutes a response, how to define responses for your business, and how to collect the data to support them, in the context of SAS® Marketing Automation and beyond.
Pamela Dixon, SAS
Currently, there are several methods for reading JSON-formatted files into SAS®, depending on the version of SAS and which products are licensed. These methods include user-defined macros, visual analytics, PROC GROOVY, and more. The user-defined macro %GrabTweet, in particular, provides a simple way to directly read JSON-formatted tweets into SAS® 9.3. The main limitation of %GrabTweet is that it requires the user to repeatedly run the macro in order to download large amounts of data over time. Manually downloading tweets while conforming to the Twitter rate limits might cause missing observations and is time-consuming overall. Imagine having to sit by your computer the entire day just to continuously grab data every 15 minutes and download a complete data set of tweets for a popular event. Fortunately, the %GrabTweet macro can be modified to automate the retrieval of Twitter data based on the rate at which the tweets are coming in. This paper describes the application of the %GrabTweet macro combined with batch processing to download tweets without manual intervention. Users can specify the phrase parameters they want, run the batch processing macro, leave their computer to download tweets automatically overnight, and return to a complete data set of recent Twitter activity. The batch processing implements automated retrieval of tweets through an algorithm that assesses the rate of tweets for the specified topic, making the download of large amounts of data simple and effortless for the user.
Isabel Litton, California Polytechnic State University, SLO
Rebecca Ottesen, City of Hope and Cal Poly SLO
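The batch idea above can be sketched as a wrapper macro that alternates %GrabTweet calls with SLEEP pauses sized to the 15-minute rate-limit window. The %GrabTweet parameters shown are placeholders, since the real macro's signature is defined in the earlier paper; 32 cycles of 15 minutes covers roughly an 8-hour overnight run.

%macro grab_overnight(phrase=, cycles=32, waitsec=900);
   %do c = 1 %to &cycles;
      %GrabTweet(phrase=&phrase, out=tweets_&c)   /* placeholder call    */
      data _null_;
         rc = sleep(&waitsec, 1);    /* pause 900 seconds = 15 minutes   */
      run;
   %end;
   data all_tweets;                  /* stack the per-cycle downloads    */
      set tweets_1-tweets_&cycles;
   run;
%mend;
%grab_overnight(phrase=Ebola, cycles=32)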
In 2012, the Obama campaign used advanced analytics to target voters, especially in social media channels. Millions of voters were scored on models each night to predict their voting patterns. These models were used as the driver for all campaign decisions, including TV ads, budgeting, canvassing, and digital strategies. This presentation covers how the Obama campaign strategies worked, what's in store for analytics in future elections, and how these strategies can be applied in the business world.
Peter Tanner, Capital One
Real-time web content personalization has come into its teen years, and recently a spate of marketing solutions have enabled marketers to finely personalize web content for visitors based on browsing behavior, geo-location, preferences, and so on. In an age where the attention span of a web visitor is measured in seconds, marketers hope that tailoring the digital experience will pique each visitor's interest just long enough to increase corporate sales. The range of solutions spans the entire spectrum from completely cloud-based installations to completely on-premises installations. Marketers struggle to find the solution that best meets their corporation's marketing objectives, provides the highest agility and fastest time-to-market, and still keeps the marketing budget low. In the last decade or so, marketing strategies that involved personalizing using purely on-premises customer data were quickly replaced by ones that personalize using only web-browsing behavior (a.k.a. clickstream data). This was possible because of a spate of cloud-based solutions that enabled marketers to decouple themselves from the underlying IT infrastructure and the storage issues of capturing large volumes of data. However, this new trend meant that corporations weren't using much of their treasure trove of on-premises customer data. Of late, enterprises have been trying hard to find solutions that give them the best of both: the ease of gathering clickstream data using cloud-based applications and the richness of on-premises customer data. This paper explains a process that attempts to address this rapidly evolving need. The paper assumes that the enterprise already has tools for capturing clickstream data, developing analytical models, and presenting the content. It provides a roadmap for implementing a phased approach in which enterprises continue to capture clickstream data but bring that data in-house to be merged with customer data, enabling their analytics team to build sophisticated predictive models that can be deployed into the real-time web personalization application. The final phase requires enterprises to continuously improve their predictive models on a periodic basis.
Mahesh Subramanian, SAS Institute Inc.
Suneel Grover, SAS
Companies are increasingly relying on analytics as the right solution to their problems. In order to use analytics and create value for the business, companies first need to store, transform, and structure the data to make it available and functional. This paper shows a successful business case in which data extraction and transformation, combined with analytical solutions, were developed to automate and optimize the management of the collections cycle for a telecom company (DIRECTV Colombia). SAS® Data Integration Studio is used to extract, process, and store information from a diverse set of sources. SAS® Information Maps are used to integrate and structure the created databases. SAS® Enterprise Guide® and SAS® Enterprise Miner™ are used to analyze the data, find patterns, create profiles of clients, and develop churn predictive models. SAS® Customer Intelligence Studio is the platform on which the collection campaigns are created, tested, and executed. SAS® Web Report Studio is used to create a set of operational and management reports.
Darwin Amezquita, DIRECTV
Paulo Fuentes, Directv Colombia
Andres Felipe Gonzalez, Directv
Many retail and consumer packaged goods (CPG) companies are now keeping track of what their customers purchased in the past, often through some form of loyalty program. This record keeping is one example of how modern corporations are building data sets that have a panel structure, a data structure that is also pervasive in insurance and finance organizations. Panel data (sometimes called longitudinal data) can be thought of as the joining of cross-sectional and time series data. Panel data enable analysts to control for factors that cannot be considered by simple cross-sectional regression models that ignore the time dimension. These factors, which are unobserved by the modeler, might bias regression coefficients if they are ignored. This paper compares several methods of working with panel data in the PANEL procedure and discusses how you might benefit from using multiple observations for each customer. Sample code is available.
Bobby Gutierrez, SAS
Kenneth Sanford, SAS
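A minimal sketch of the setup discussed above, with hypothetical loyalty-program data: the ID statement declares the cross-section and time indexes, and the FIXONE and RANONE options request one-way fixed-effects and random-effects estimators that control for unobserved customer-level factors.

proc panel data=purchases;           /* assumes sort by cust_id, month  */
   id cust_id month;                 /* cross-section and time indexes  */
   model spend = price promo income / fixone ranone;
run;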