SAS® Grid Computing is a scale-out SAS® solution that enables SAS applications, which are extremely I/O and compute intensive, to better utilize computing resources. It requires high-performance shared storage (SS) that allows all servers to access the same file systems. SS may be implemented via traditional NFS NAS or clustered file systems (CFS) like GPFS. This paper uses the Lustre* file system, a parallel, distributed CFS, for a case study of performance scalability of SAS Grid Computing nodes on SS. The paper qualifies the performance of a standardized SAS workload running on Lustre at scale. Lustre has traditionally been used for large, sequential I/O. We record and present the tuning changes necessary to optimize Lustre for SAS applications. In addition, results from the scaling of SAS Cluster jobs running on Lustre are presented.
Suleyman Sair, Intel Corporation
Brett Lee, Intel Corporation
Ying M. Zhang, Intel Corporation
One of the first lessons that SAS® programmers learn on the job is that numeric and character variables do not play well together, and that type mismatches are one of the more common sources of errors in their otherwise flawless SAS programs. Luckily, converting variables from one type to another in SAS (that is, casting) is not difficult, requiring only the judicious use of either the input() or put() function. There remains, however, the danger of data being lost in the conversion process. This type of error is most likely to occur in character-to-numeric variable conversion, especially when the user does not fully understand the data contained in the data set. This paper reviews the basics of data storage for character and numeric variables in SAS, the use of formats and informats for conversions, and how to ensure accurate type conversion of even high-precision numeric values.
Andrew Clapson, Statistics Canada
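The precision risk described in the abstract above can be illustrated outside SAS. The following is a minimal sketch in Python, not SAS code and not the author's method. It assumes the numeric target is an IEEE 754 8-byte double, which is what Python's float is. The helper name `survives_round_trip` and the sample values are hypothetical.

```python
from decimal import Decimal

def survives_round_trip(text: str) -> bool:
    """Return True if a digit string is stored exactly in an 8-byte double."""
    # Decimal(float(...)) is the exact decimal expansion of the stored double,
    # so comparing it with Decimal(text) reveals any silent loss of precision.
    return Decimal(float(text)) == Decimal(text)

# Hypothetical identifiers held as character data.
short_id = "1234567890123456"      # 16 digits, below 2**53, stored exactly
long_id = "12345678901234567890"   # 20 digits, beyond exact integer range

for value in (short_id, long_id):
    status = "exact" if survives_round_trip(value) else "precision lost"
    print(value, "->", status)
```

A check of this kind, run before converting a character column, shows which values are safer left as character data.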
Finding groups with similar attributes is at the core of knowledge discovery. To this end, Cluster Analysis automatically locates groups of similar observations. Despite successful applications, many practitioners are uncomfortable with the degree of automation in Cluster Analysis, which causes intuitive knowledge to be ignored. This is even more true in text mining applications, since individual words have meaning beyond the data set. Discovering groups with similar text is extremely insightful. However, blind applications of clustering algorithms ignore intuition and hence are unable to group similar text categories. The challenge is to integrate the power of clustering algorithms with the knowledge of experts. We demonstrate how SAS/STAT® 9.2 procedures and the SAS® Macro Language are used to ensemble the opinions of domain experts with multiple clustering models to arrive at a consensus. The method has been successfully applied to a large data set with structured attributes and unstructured opinions. The result is the ability to discover observations with similar attributes and opinions by capturing the wisdom of the crowds, whether man or model.
Masoud Charkhabi, Canadian Imperial Bank of Commerce
Ling Zhu, Canadian Imperial Bank of Commerce
SAS/ACCESS® Interface to ODBC has been around forever. On one level, ODBC is very easy to use. That ease hides the flexibility that ODBC offers. This presentation uses examples to show you how to increase your program's performance and troubleshoot problems. You will learn the differences between ODBC and OLE DB; what the odbc.ini file is (and why it is important); how to discover what your ODBC driver is actually doing; and the difference between a native ACCESS engine and SAS/ACCESS Interface to ODBC.
Jeff Bailey, SAS
This paper kicks off a project to write a comprehensive book of best practices for documenting SAS® projects. The presenter's existing documentation styles are explained. The presenter wants to discuss and gather current best practices used by the SAS user community. The presenter shows documentation styles at three different levels of scope. The first is a style used for project documentation, the second a style for program documentation, and the third a style for variable documentation. This third style enables researchers to repeat the modeling in SAS research, in an alternative language, or conceptually.
Peter Timusk, Statistics Canada
The emerging discipline of data governance encompasses data quality assurance, data access and use policy, security risks and privacy protection, and longitudinal management of an organization's data infrastructure. In the interests of forestalling another bureaucratic solution to data governance issues, this presentation features database programming tools that provide rapid access to big data and make selective access to and restructuring of metadata practical.
Sigurd Hermansen, Westat
This paper describes a method that uses some simple SAS® macros and SQL to merge data sets of related data whose rows have varying effective date ranges. The data sets are merged into a single data set that represents a serial list of snapshots of the merged data, as of a change in any of the effective dates. While simple conceptually, this type of merge is often problematic when the effective date ranges are not consecutive or consistent, when the ranges overlap, or when ranges are missing from one or more of the merged data sets. The technique described was used by the Fairfax County Human Resources Department to combine various employee data sets (Employee Name and Personal Data, Personnel Assignment and Job Classification, Personnel Actions, Position-Related data, Pay Plan and Grade, Work Schedule, Organizational Assignment, and so on) from the County's SAP-HCM ERP system into a single Employee Action History/Change Activity file for historical reporting purposes. The technique is currently used to combine fourteen data sets, but is easily expandable by inserting a few lines of code using the existing macros.
James Moon, County of Fairfax, Virginia
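The idea of a serial list of snapshots in the abstract above can be sketched in a few lines. The following is a minimal illustration in Python, not the SAS macros and SQL the paper describes. It assumes half-open date ranges (start inclusive, end exclusive) and, where ranges overlap within one data set, simply takes the first matching row. The function names and sample data are hypothetical.

```python
from datetime import date

def active_value(rows, day):
    """Return the value of the first row whose [start, end) range covers day."""
    for start, end, value in rows:
        if start <= day < end:
            return value
    return None  # no row in this data set is effective on that day

def merge_effective(datasets):
    """Merge {name: [(start, end, value), ...]} into serial snapshots."""
    # Every start or end date is a point where the merged picture may change.
    cuts = sorted({d for rows in datasets.values() for s, e, _ in rows for d in (s, e)})
    snapshots = []
    for start, end in zip(cuts, cuts[1:]):
        snap = {name: active_value(rows, start) for name, rows in datasets.items()}
        # Skip gaps where no data set has an effective row.
        if any(v is not None for v in snap.values()):
            snapshots.append((start, end, snap))
    return snapshots

# Hypothetical employee data with ranges that do not line up.
datasets = {
    "job": [(date(2020, 1, 1), date(2020, 7, 1), "Analyst"),
            (date(2020, 7, 1), date(2021, 1, 1), "Senior Analyst")],
    "grade": [(date(2020, 3, 1), date(2020, 10, 1), "G5")],
}
snapshots = merge_effective(datasets)
for start, end, snap in snapshots:
    print(start, end, snap)
```

Adding another data set only means adding another entry to the dictionary, which mirrors the expandability the abstract mentions.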
Your company's chronically overloaded SAS® environment, adversely impacted user community, and the resultant lackluster productivity have finally convinced your upper management that it is time to upgrade to a SAS® grid to eliminate all the resource problems once and for all. But after the contract is signed and implementation begins, you as the SAS administrator suddenly realize that your company-wide standard mode of SAS operations, that is, using the traditional SAS® Display Manager on a server machine, runs counter to the expectation of the SAS grid: your users are now supposed to switch to SAS® Enterprise Guide® on a PC. This is utterly unacceptable to the user community because almost everything has to change in a big way. If you like to play a hero in your little world, this is your opportunity. There are a number of things you can do to make the transition to the SAS grid as smooth and painless as possible, and your users get to keep their favorite SAS Display Manager.
Houliang Li, HL SASBIPros Inc
SAS® is an outstanding suite of software, but not everyone in the workplace speaks SAS. However, almost everyone speaks Excel. Often, the data you are analyzing, the data you are creating, and the report you are producing are all some form of Microsoft Excel spreadsheet. Every year at SAS® Global Forum, there are SAS and Excel presentations, not just because Excel is so pervasive in the workplace, but because there's always something new to learn (or re-learn)! This paper summarizes and references (and pays homage to!) previous SAS Global Forum presentations, as well as examines some of the latest Excel capabilities with the latest versions of SAS® 9.4 and SAS® Visual Analytics.
Andrew Howell, ANJ Solutions
Business Intelligence (BI) dashboards serve as an invaluable, high-level, visual reference tool for decision-making processes in many business industries. A request was made to our department to develop some BI dashboards that could be incorporated in an academic setting. These dashboards would aim to serve various undergraduate executive and administrative staff at the university. While most business data may lend itself to work very well and easily in the development of dashboards, academic data is typically modeled differently and, therefore, faces unique challenges. In this paper, the authors detail and share the design and development process of creating dashboards for decision making in an academic environment utilizing SAS® BI Dashboard 4.3 and other SAS® Enterprise Business Intelligence 9.2 tools. The authors also provide lessons learned as well as recommendations for future implementations of BI dashboards utilizing academic data.
Evangeline Collado, University of Central Florida
Michelle Parente, University of Central Florida
Based on selection criteria, the SAS® Data Integration Studio loop or splitter transformations can be used to generate multiple output files. The ETL developer or SAS® administrator can decide which transformation is better suited for the design, priorities, and SAS configuration at their site. Factors to consider are the setup, maintenance, and performance of the ETL job. The loop transformation requires an understanding of macros and a control table. The splitter transformation is more straightforward and self-documenting. If time allows, creating and running a job with each transformation can provide benchmarking to measure performance. For a comparison of these two options, this paper shows an example of the same job using the loop or splitter transformation. For added testing metrics, one can adapt the LOGPARSE SAS macro to parse the job logs.
Liotus Laura, Community Care Behavioral Health
The role of the Data Scientist is the viral job description of the decade. And like LOLcats, there are many types of Data Scientists. What is this new role? Who is hiring them? What do they do? What skills are required to do their job? What does this mean for the SAS® programmer and the statistician? Are they obsolete? And finally, if I am a SAS user, how can I become a Data Scientist? Come learn about this job of the future and what you can do to be part of it.
Chuck Kincaid, Experis Business Analytics
Traditional SAS® programs typically consist of a series of SAS DATA steps, which refine input data sets until the final data set or report is reached. SAS DATA steps do not run in-database. However, SAS® Enterprise Guide® users can replicate this kind of iterative programming and have the resulting process flow run in-database by linking a series of SAS Enterprise Guide Query Builder tasks that output SAS views pointing at data that resides in a Teradata database, right up to the last Query Builder task, which generates the final data set or report. This session both explains and demonstrates this functionality.
Frank Capobianco, Teradata
This poster shows the audience step-by-step how to connect to a database without registering the connection in either the Windows ODBC Administrator tool or in the Windows Registry database. This poster also shows how the connection can be more flexible and better managed by building it into a SAS® macro.
Jesper Michelsen, Nykredit
When viewing and working with SAS® data sets (especially wide ones), it's often instinctive to rearrange the variables (columns) into some intuitive order. The RETAIN statement is one of the most commonly cited methods used for ordering variables. Though RETAIN can perform this task, its use as an ordering clause can cause a host of easily missed problems due to its intended function of retaining values across DATA step iterations. This risk is especially great for the more novice SAS programmer. Instead, two equally effective and less risky ways to order data set variables are recommended, namely, the FORMAT and SQL SELECT statements.
Andrew Clapson, Statistics Canada
In a good clinical study, statisticians and various stakeholders are interested in assessing and isolating the effect of non-study drugs. One common practice in clinical trials is that clinical investigators follow the protocol to taper certain concomitant medications in an attempt to prevent or resolve adverse reactions and/or to minimize the number of subject withdrawals due to lack of efficacy or adverse events. Assessing the impact of tapering those medications during the study is of high interest to clinical scientists and the study statistician. This paper presents the challenges and caveats of assessing the impact of tapering a certain type of concomitant medication using SAS® 9.3, based on a hypothetical case. The paper also presents the advantages of visual graphs in facilitating communication between clinical scientists and the study statistician.
Iuliana Barbalau, Santen Inc.
Chen Shi, Santen Inc.
Yang Yang, Santen Inc.
The Department of Market Monitoring (DMM) at California ISO is responsible for promoting a robust, competitive, and nondiscriminatory electric power market in California by keeping a close watch on the efficiency and effectiveness of the ancillary service, congestion management, and real-time spot markets. We monitor the potential of market participants to exercise undue market power, the behavior of market participants that is consistent with attempts to exercise market power, and the market performance that results from the interaction of market structure with participant behavior. In order to perform monitoring activities effectively, DMM collects available data and designs and implements reporting dashboards that track key market metrics. We are using various SAS® BI tools to develop and employ metrics and analytic tools applicable to market structure, participant behavior, and market performance. This paper provides details about the effective use of various SAS BI tools to implement automated real-time market monitoring functionality.
Amol Deshmukh, California ISO Corp.
Jeff McDonald, California ISO Corp.
Changes in default behavior in the last few SAS® releases have enabled faster processing of SAS formats, especially for SAS/ACCESS® customers. But, as with any performance enhancement, your results may vary. This presentation teaches you the differences between two important SAS format optimizations; how to tell which optimization is in effect; and a simple method to get the behavior you want. The target audience for this presentation is SAS/ACCESS customers, particularly those who have also licensed SAS® In-Database Code Accelerator for Teradata or SAS® In-Database Code Accelerator for Greenplum.
David Wiehle, SAS
Using Lilypond typesetting software, you can write publication-grade music scores. The input for Lilypond is a text file that can be written once and then transferred to SAS® for patterned repetition, so that you can cycle through patterns that occur in music. The author plays a sequence of notes and then writes this into Lilypond code. The sequence starts in the key of C with only a two-note sequence. Then the sequence is extended to three-, four-, then five-note sequences, always contained in one octave. SAS is then used to write the same code for the other eleven keys and in seven scale modes. The method is very simple and does not require advanced programming. Lookup files are used in the programming, demonstrating efficient lookup techniques. The result is a lengthy book or exercise for practicing music in a PDF file, together with a sound file in MIDI format that you can listen to. This method shows how various programming languages can be used to write other programming languages.
Peter Timusk, Statistics Canada
Big data is all the rage these days, with the proliferation of data-accumulating electronic gadgets and instrumentation. At the heart of big data analytics is the MapReduce programming model. As a framework for distributed computing, MapReduce uses a divide-and-conquer approach to allow large-scale parallel processing of massive data. As the name suggests, the model consists of a Map function, which first splits data into key-value pairs, and a Reduce function, which then carries out the final processing of the mapper outputs. It is not hard to see how these functions can be simulated with the SAS® hash objects technique, and in reality, implemented in the new SAS® DS2 language. This paper demonstrates how hash object programming can handle data in a MapReduce fashion and shows some potential applications in physics, chemistry, biology, and finance.
Joseph Hinson, Accenture Life Sciences
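The Map and Reduce roles described in the abstract above can be sketched with an in-memory hash table. The following is a minimal illustration in Python, where a dictionary stands in for the SAS hash object. It is not the paper's SAS or DS2 code, it runs on a single machine with no distribution, and the function names and word-count example are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each record and yield its key-value pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key in a hash table, as a hash object lookup would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's list of values into one final result."""
    return {key: reducer(values) for key, values in groups.items()}

def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    return [(word, 1) for word in line.split()]

lines = ["to be or not to be", "to do"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum)
print(counts)
```

Swapping in a different mapper and reducer (for example, keying on a measurement type and averaging) reuses the same three phases unchanged.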
This session demonstrates how to use Base SAS® tools to add functional, reusable extensions to the SAS® system. Learn how to write user-defined macro functions that can be used inline with any other SAS code; use PROC FCMP to write and store user-defined functions that can be used in other SAS programs; and write DS2 user-defined methods and store them in packages for easy reuse in subsequent DS2 programs.
Mark Jordan, SAS