Data Management Papers A-Z

A
Paper SAS1777-2015:
An Insider's Guide to SAS/ACCESS® Interface to Hadoop
In the very near future you will likely encounter Hadoop. It is rapidly displacing database management systems in the corporate world and is rearing its head in the SAS® world. If you think now is the time to learn how to use SAS with Hadoop, you are in luck. This workshop is the jump start you need. It introduces Hadoop and shows you how to access it by using SAS/ACCESS® Interface to Hadoop. During the workshop, we show you how to do the following: configure your SAS environment so that you can access Hadoop data; use the Hadoop FILENAME statement; use the HADOOP procedure; and use SAS/ACCESS Interface to Hadoop (including performance tuning).
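As a flavor of the three access methods the workshop covers, here is a minimal sketch; the server, paths, and credentials are placeholders, and the exact options depend on your Hadoop distribution and configuration files.

/* LIBNAME access to Hive tables via SAS/ACCESS Interface to Hadoop */
libname hdp hadoop server="hive-node.example.com" port=10000
        user=sasdemo password=XXXXXX;

/* FILENAME access to a file stored in HDFS */
filename raw hadoop "/user/sasdemo/sales.csv"
         cfg="/opt/sas/hadoop/hadoop-site.xml";

/* PROC HADOOP for HDFS commands */
proc hadoop cfg="/opt/sas/hadoop/hadoop-site.xml" username="sasdemo" verbose;
   hdfs mkdir="/user/sasdemo/staging";
run;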
Diane Hatcher, SAS
Paper 3327-2015:
Automated Macros to Extract Data from the National (Nationwide) Inpatient Sample (NIS)
The use of administrative databases for understanding practice patterns in the real world has become increasingly apparent. This is essential in the current health-care environment. The Affordable Care Act has helped us to better understand the current use of technology and different approaches to surgery. This paper describes a method for extracting specific information about surgical procedures from the Healthcare Cost and Utilization Project (HCUP) database (also referred to as the National (Nationwide) Inpatient Sample (NIS)). The analyses provide a framework for comparing the different modalities of surgical procedures of interest. Using an NIS database for a single year, we want to identify cohorts based on surgical approach. We do this by identifying the ICD-9 codes specific to robotic surgery, laparoscopic surgery, and open surgery. After we identify the appropriate codes using an ARRAY statement, a similar array is created based on the ICD-9 codes. Any minimally invasive procedure (robotic or laparoscopic) that results in a conversion is flagged as a conversion. Comorbidities are identified by ICD-9 codes representing the severity of each subject and merged with the NIS inpatient core file. Using a FORMAT statement for all diagnosis variables, we create macros that can be regenerated for each type of complication. These macros are compiled in SAS® and stored in a library of four macros that the table programs call with different macro variables. The table programs also generate the frequencies for all cohorts and create the table structure with each table's title and number. This paper describes a systematic method in SAS/STAT® 9.2 to extract the data from NIS using the ARRAY statement for the specific ICD-9 codes, to format the extracted data for the analysis, to merge the different NIS databases by procedures, and to use automatic macros to generate the report.
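As a hedged illustration of the ARRAY approach described above, the sketch below scans the NIS procedure-code variables for surgical-approach codes; the data set name, variable names, and ICD-9 codes are placeholders, not the paper's actual lists.

data cohorts;
   set nis.core;                          /* hypothetical NIS core file */
   array pr{*} pr1-pr15;                  /* NIS procedure-code variables */
   robotic=0; laparoscopic=0; open_=0;
   do i = 1 to dim(pr);
      if pr{i} in ('1741','1742','1744') then robotic=1;        /* illustrative codes */
      else if pr{i} in ('5421')          then laparoscopic=1;   /* illustrative codes */
      else if pr{i} in ('6851','6859')   then open_=1;          /* illustrative codes */
   end;
   /* a minimally invasive case that also carries an open code is a conversion */
   conversion = (robotic or laparoscopic) and open_;
   drop i;
run;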
Read the paper (PDF).
Ravi Tejeshwar Reddy Gaddameedi, California State University, East Bay
Usha Kreaden, Intuitive Surgical
B
Paper 1384-2015:
Building Flexible Jobs in DataFlux®
Creating DataFlux® jobs that can be executed from job scheduling software can be challenging. This presentation guides participants through the creation of a template job that accepts an email distribution list, subject, email verbiage, and attachment file name as macro variables. It also demonstrates how to call this job from a command line.
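A sketch of what such a parameterized call might look like follows, assuming the DataFlux batch runner dmpexec with its -j (job), -l (log), and -i (input variable) options; the job, paths, and variable names are hypothetical, and option syntax can vary by release.

rem Run the template job, passing the email settings as input variables
dmpexec -j "C:\DataFlux\jobs\email_report.ddf" ^
        -l "C:\DataFlux\logs\email_report.log" ^
        -i "DIST_LIST=team@example.com" ^
        -i "SUBJECT=Nightly data quality report" ^
        -i "BODY_TEXT=Please see the attached report." ^
        -i "ATTACH_FILE=C:\reports\dq_summary.pdf"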
Read the paper (PDF).
Jeanne Estridge, Sinclair Community College
Paper SAS1824-2015:
Bust Open That ETL Black Box and Apply Proven Techniques to Successfully Modernize Data Integration
So you are still writing SAS® DATA steps and SAS macros and running them through a command-line scheduler. When work comes in, there is only one person who knows that code, and they are out--what to do? This paper shows how SAS applies extract, transform, load (ETL) modernization techniques with SAS® Data Integration Studio to gain resource efficiencies and to break down the ETL black box. We are going to share the fundamentals (metadata foldering and naming standards) that ensure success, along with steps to ease into the pool while iteratively gaining benefits. Benefits include self-documenting code visualization, impact analysis on jobs and tables impacted by change, and being supportable by interchangeable bench resources. We conclude with demonstrating how SAS® Visual Analytics is being used to monitor service-level agreements and provide actionable insights into job-flow performance and scheduling.
Read the paper (PDF).
Brandon Kirk, SAS
D
Paper 3104-2015:
Data Management Techniques for Complex Healthcare Data
Data sharing through healthcare collaboratives and national registries creates opportunities for secondary data analysis projects. These initiatives provide data for quality comparisons as well as endless research opportunities to external researchers across the country. The possibilities are bountiful when you join data from diverse organizations and look for common themes related to illnesses and patient outcomes. With these great opportunities comes great pain for data analysts and health services researchers tasked with compiling these data sets according to specifications. Patient care data is complex, and, particularly at large healthcare systems, might be managed with multiple electronic health record (EHR) systems. Matching data from separate EHR systems while simultaneously ensuring the integrity of the details of that care visit is challenging. This paper demonstrates how data management personnel can use traditional SAS PROCs in new and inventive ways to compile, clean, and complete data sets for submission to healthcare collaboratives and other data sharing initiatives. Traditional data matching methods such as the SPEDIS function are uniquely combined with iterative SQL joins using the SAS® functions INDEX, COMPRESS, CATX, and SUBSTR to get the most out of complex patient and physician name matches. Recoding, correcting missing items, and formatting data can be efficiently achieved by using traditional tools such as the MAX and FIND functions and PROC FORMAT in new and inventive ways.
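As a hedged sketch of the kind of fuzzy name matching described above (table and variable names are hypothetical):

proc sql;
   create table name_matches as
   select a.patid, a.physname as name_ehr1, b.physname as name_ehr2
   from ehr1 a inner join ehr2 b
     on a.dob = b.dob                                   /* block on an exact key first */
    and substr(compress(upcase(a.physname)),1,1) =
        substr(compress(upcase(b.physname)),1,1)        /* same first letter */
   where spedis(upcase(a.physname), upcase(b.physname)) le 15;  /* spelling distance */
quit;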
Read the paper (PDF).
Gabriela Cantu, Baylor Scott & White Health
Christopher Klekar, Baylor Scott & White Health
Paper 3261-2015:
Data Quality Scorecard
Many users would like to check the quality of data after the data integration process has loaded the data into a data set or table. The approach in this paper shows users how to develop a process that scores columns based on rules judged against a set of standards set by the user. Each rule has a standard that determines whether it passes, fails, or needs review (a green, red, or yellow score). A rule can be as simple as: Is the value for this column missing, or is this column within a valid range? Further, it includes comparing a column to one or more other columns, or checking for specific invalid entries. It also includes rules that compare a column value to a lookup table to determine whether the value is in the lookup table. Users can create their own rules, and each column can have any number of rules. For example, a rule can be created to compare a dollar column to a range of acceptable values. The user can determine that up to two percent of the values are allowed to be out of range. If two to five percent of the values are out of range, then the data should be reviewed. And, if over five percent of the values are out of range, the data is not acceptable. The entire table has a color-coded scorecard showing each rule and its score. Summary reports show columns by score and distributions of key columns. The scorecard enables the user to quickly assess whether the SAS data set is acceptable, or whether specific columns need to be reviewed. Drill-down reports enable the user to drill into the data to examine why the column scored as it did. Based on the scores, the data set can be accepted or rejected, and the user will know where and why the data set failed. The process can store each scorecard's data in a data mart. This data mart enables the user to review the quality of their data over time. It can answer questions such as: Is the quality of the data improving overall? Are there specific columns that are improving or declining over time? What can we do to improve the quality of our data? This scorecard is not intended to replace the quality control of the data integration or ETL process; it is a supplement to the ETL process. The programs are written using only Base SAS®, the Output Delivery System (ODS), macro variables, and formats. This presentation shows how to: (1) use ODS HTML; (2) color-code cells with the use of formats; (3) use formats as lookup tables; (4) use %INCLUDE statements to make use of template code snippets to simplify programming; and (5) use hyperlinks to launch stored processes from the scorecard.
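A minimal sketch of the color-coding technique follows, using the abstract's two-and-five-percent thresholds; the report layout and names are hypothetical.

proc format;
   value pctcolor low -< 0.02 = 'lightgreen'   /* pass   */
                 0.02 -< 0.05 = 'yellow'       /* review */
                 0.05 -  high = 'salmon';      /* fail   */
run;

ods html file='scorecard.html';
proc report data=work.rule_scores nowd;
   column column_name rule_desc pct_out_of_range;
   define pct_out_of_range / display format=percent8.1
          style(column)={background=pctcolor.};   /* the format drives the cell color */
run;
ods html close;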
Read the paper (PDF). | Download the data file (ZIP).
Tom Purvis, Qualex Consulting Services, Inc.
Paper 1762-2015:
Did Your Join Work Properly?
We have many options for performing merges or joins these days, each with various advantages and disadvantages. Depending on how you perform your joins, different checks can help you verify whether the join was successful. In this presentation, we look at some sample data, use different methods, and see what kinds of tests can be done to ensure that the results are correct. If the join is performed with PROC SQL and two criteria are fulfilled (the number of observations in the primary data set has not changed [presuming a one-to-one or many-to-one situation], and a variable that should be populated is not missing), then the merge was successful.
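A hedged sketch of those two checks with PROC SQL (data set and variable names are hypothetical):

proc sql noprint;
   select count(*) into :n_before from work.primary;

   create table work.joined as
   select a.*, b.region
   from work.primary a
        left join work.lookup b on a.id = b.id;   /* many-to-one join */

   select count(*), nmiss(region) into :n_after, :n_miss
   from work.joined;
quit;

%put NOTE: rows before=&n_before, after=&n_after, missing REGION values=&n_miss;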
Read the paper (PDF).
Emmy Pahmer, inVentiv Health
Paper 3457-2015:
Documentation as You Go: aka Dropping Breadcrumbs!
Your project ended a year ago, and now you need to explain what you did, or rerun some of your code. Can you remember the process? Can you describe what you did? Can you even find those programs? Visually presented here are examples of tools and techniques that follow best practices to help us, as programmers, manage the flow of information from source data to a final product.
Read the paper (PDF).
Elizabeth Axelrod, Abt Associates Inc.
E
Paper 3329-2015:
Efficiently Using SAS® Data Views
For the Research Data Centers (RDCs) of the United States Census Bureau, the demand for disk space substantially increases with each passing year. Efficiently using the SAS® data view might successfully address disk space challenges within the RDCs. This paper discusses the usage and benefits of the SAS data view to save disk space and reduce the time and effort required to manage large data sets. The ability and efficiency of the SAS data view to process regular ASCII, compressed ASCII, and other commonly used file formats are analyzed and evaluated in detail. The authors discuss ways in which using SAS data views is more efficient than the traditional methods in processing and deploying the large census and survey data in the RDCs.
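For readers new to views, here is a minimal sketch of a DATA step view that reads a large ASCII file on demand instead of materializing it on disk; the path and record layout are hypothetical.

/* The view stores only the program, not the rows */
data rdc.persons / view=rdc.persons;
   infile '/rdc/data/persons.dat' lrecl=200 truncover;
   input serialno 1-8 age 10-12 income 14-22;
run;

/* Rows are produced on the fly each time the view is read */
proc means data=rdc.persons n mean;
   var age income;
run;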
Read the paper (PDF).
Shigui Weng, US Bureau of the Census
Shy Degrace, US Bureau of the Census
Ya Jiun Tsai, US Bureau of the Census
Paper 3920-2015:
Entity Resolution and Master Data Life Cycle Management in the Era of Big Data
Proper management of master data is a critical component of any enterprise information system. However, effective master data management (MDM) requires that both IT and Business understand the life cycle of master data and the fundamental principles of entity resolution (ER). This presentation provides a high-level overview of current practices in data matching, record linking, and entity information life cycle management that are foundational to building an effective strategy to improve data integration and MDM. Particular areas of focus are: 1) The need for ongoing ER analytics--the systematic and quantitative measurement of ER performance; 2) Investing in clerical review and asserted resolution for continuous improvement; and 3) Addressing the large-scale ER challenge through distributed processing.
Read the paper (PDF). | Watch the recording.
John Talburt, Black Oak Analytics, Inc.
Paper 3306-2015:
Extending the Scope of Custom Transformations
Building and maintaining a data warehouse can require a complex series of jobs. Having an ETL flow that is reliable and well integrated is one big challenge. An ETL process might need some pre- and post-processing operations on the database to be well integrated and reliable. Some might handle this via maintenance windows. Others like us might generate custom transformations to be included in SAS® Data Integration Studio jobs. Custom transformations in SAS Data Integration Studio can be used to speed ETL process flows and reduce the database administrator's intervention after ETL flows are complete. In this paper, we demonstrate the use of custom transformations in SAS Data Integration Studio jobs to handle database-specific tasks for improving process efficiency and reliability in ETL flows.
Read the paper (PDF).
Emre Saricicek, University of North Carolina at Chapel Hill
Dean Huff, UNC
F
Paper 1752-2015:
Fixing Lowercase Acronyms Left Over from the PROPCASE Function
The PROPCASE function is useful when you are cleansing a database of names and addresses in preparation for mailing. But it does not know the difference between a proper name (in which initial capitalization should be used) and an acronym (which should be all uppercase). This paper explains an algorithm that determines with reasonable accuracy whether a word is an acronym and, if it is, converts it to uppercase.
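The paper's algorithm is more refined, but a simple hedged heuristic, treating short vowel-less words as acronyms, might look like this (input names are hypothetical):

data fixed;
   set mail.names;
   length word $40;
   company = propcase(company);
   do i = 1 to countw(company, ' ');
      word = scan(company, i, ' ');
      /* no vowels and five letters or fewer: probably an acronym */
      if lengthn(compress(word, 'AEIOUYaeiouy', 'k')) = 0
         and lengthn(word) <= 5 then
         company = tranwrd(company, strip(word), upcase(strip(word)));
   end;
   drop i word;
run;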
Read the paper (PDF). | Download the data file (ZIP).
Joe DeShon, Boehringer Ingelheim Vetmedica
Paper 3348-2015:
From ETL to ETL VCM: Ensure the Success of a Data Migration Project through a Flexible Data Quality Framework Using SAS® Data Management
Data quality is now more important than ever before. According to Gartner (2011), poor data quality is the primary reason why 40% of all business initiatives fail to achieve their targeted benefits. As a response, Deloitte Belgium has created an agile data quality framework using SAS® Data Management to rapidly identify and resolve root causes of data quality issues to jump-start business initiatives, especially data migration initiatives. Moreover, the approach uses both standard SAS Data Management functionalities (such as standardization and parsing) and advanced features (such as using macros, dynamic profiling in deployment mode, and extracting the profiling results), allowing the framework to be agile and flexible and to maximize the reusability of specific components built in SAS Data Management.
Read the paper (PDF).
Yves Wouters, Deloitte
Valérie Witters, Deloitte
G
Paper SAS1852-2015:
Garbage In, Gourmet Out: How to Leverage the Power of the SAS® Quality Knowledge Base
Companies spend vast amounts of resources developing and enhancing proprietary software to clean their business data. Save time and obtain more accurate results by leveraging the SAS® Quality Knowledge Base (QKB), formerly a DataFlux® Data Quality technology. Tap into the existing QKB rules for cleansing contact information or product data, or easily design your own custom rules using the QKB editing tools. The QKB enables data management operations such as parsing, standardization, and fuzzy matching for contact information such as names, organizations, addresses, and phone numbers, or for product data attributes such as materials, colors, and dimensions. The QKB supports data in native character sets in over 38 locales. A single QKB can be shared by multiple SAS® Data Management installations across your enterprise, ensuring consistent results on workstations, servers, and massive parallel processing systems such as Hadoop. In this breakout, a SAS R&D manager demonstrates the power and flexibility of the QKB, and answers your questions about how to deploy and customize the QKB for your environment.
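From code, the QKB is typically reached through SAS® Data Quality Server functions; a hedged sketch follows, in which the setup path is site-specific and the definition names depend on your QKB version and locale.

/* Load the English (US) locale from the QKB */
%dqload(dqlocale=(ENUSA), dqsetuploc='/opt/sas/qkb/ci');

data clean;
   set crm.contacts;                                  /* hypothetical input */
   name_std  = dqStandardize(name,  'Name');          /* standardization definitions */
   phone_std = dqStandardize(phone, 'Phone');
   matchcode = dqMatch(name, 'Name', 85);             /* match code at sensitivity 85 */
run;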
Read the paper (PDF).
Brian Rineer, SAS
Paper 1886-2015:
Getting Started with Data Governance
While there has been tremendous progress in technologies related to data storage, high-performance computing, and advanced analytic techniques, organizations have only recently begun to comprehend the importance of parallel strategies that help manage the cacophony of concerns around access, quality, provenance, data sharing, and use. While data governance is not new, the drumbeat around it, along with master data management and data quality, is approaching a crescendo. Intensified by the increase in consumption of information, expectations about ubiquitous access, and highly dynamic visualizations, these factors are also circumscribed by security and regulatory constraints. In this paper, we provide a summary of what data governance is and its importance. We go beyond the obvious and provide practical guidance on what it takes to build out a data governance capability appropriate to the scale, size, and purpose of the organization and its culture. Moreover, we discuss best practices in the form of requirements that highlight what we think is important to consider as you provide that tactical linkage between people, policies, and processes to the actual data lifecycle. To that end, our focus includes the organization and its culture, people, processes, policies, and technology. Further, we include discussions of organizational models as well as the role of the data steward, and provide guidance on how to formalize data governance into a sustainable set of practices within your organization.
Read the paper (PDF). | Watch the recording.
Greg Nelson, ThotWave
Lisa Dodson, SAS
H
Paper SAS1812-2015:
Hey! SAS® Federation Server Is Virtualizing 'Big Data'!
In this session, we discuss the advantages of SAS® Federation Server and how it makes it easier for business users to access secure data for reports and use analytics to drive accurate decisions. This frees up IT staff to focus on other tasks by giving them a simple method of sharing data using a centralized, governed, security layer. SAS Federation Server is a data server that provides scalable, threaded, multi-user, and standards-based data access technology in order to process and seamlessly integrate data from multiple data repositories. The server acts as a hub that provides clients with data by accessing, managing, and sharing data from multiple relational and non-relational data sources as well as from SAS® data. Users can view data in big data sources like Hadoop, SAP HANA, Netezza, or Teradata, and blend them with existing database systems like Oracle or DB2. Security and governance features, such as data masking, ensure that the right users have access to the data and reduce the risk of exposure. Finally, data services are exposed via a REST API for simpler access to data from third-party applications.
Read the paper (PDF).
Ivor Moan, SAS
I
Paper SAS1845-2015:
Introduction to SAS® Data Loader: The Power of Data Transformation in Hadoop
Organizations are loading data into Hadoop platforms at an extraordinary rate. However, in order to extract value from these platforms, the data must be prepared for analytic exploitation. As the volume of data grows, it becomes increasingly important to reduce data movement, as well as to leverage the computing power of these distributed systems. This paper provides a cursory overview of SAS® Data Loader, a product specifically aimed at these challenges. We cover the underlying mechanisms of how SAS Data Loader works, as well as how it's used to profile, cleanse, transform, and ultimately prepare data for analytics in Hadoop.
Read the paper (PDF).
Keith Renison, SAS
M
Paper SAS1785-2015:
Make Better Decisions with Optimization
Automated decision-making systems are now found everywhere, from your bank to your government to your home. For example, when you inquire about a loan through a website, a complex decision process likely runs combinations of statistical models and business rules to make sure you are offered a set of options with tantalizing terms and conditions. To make that happen, analysts diligently strive to encode their complex business logic into these systems. But how do you know if you are making the best possible decisions? How do you know if your decisions conform to your business constraints? For example, you might want to maximize the number of loans that you provide while balancing the risk among different customer categories. Welcome to the world of optimization. SAS® Business Rules Manager and SAS/OR® software can be used together to manage and optimize decisions. This presentation demonstrates how to build business rules and then optimize the rule parameters to maximize the effectiveness of those rules. The end result is more confidence that you are delivering an effective decision-making process.
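To make the loan example concrete, here is a minimal hedged PROC OPTMODEL sketch: maximize approved loans subject to a portfolio risk budget. Every number is illustrative.

proc optmodel;
   set <str> CATS = {'low','med','high'};
   num risk{CATS}   = [0.01 0.05 0.12];      /* expected loss rate per category */
   num demand{CATS} = [5000 3000 2000];      /* applications available */
   var N{c in CATS} >= 0 <= demand[c];       /* loans approved per category */
   max TotalLoans = sum{c in CATS} N[c];
   /* portfolio-average expected loss must stay under 3% */
   con RiskBudget: sum{c in CATS} risk[c]*N[c] <= 0.03 * sum{c in CATS} N[c];
   solve;
   print N;
quit;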
Read the paper (PDF). | Download the data file (ZIP).
David Duling, SAS
Paper 2481-2015:
Managing Extended Attributes With a SAS® Enterprise Guide® Add-In
SAS® 9.4 introduced extended attributes, which are name-value pairs that can be attached to either the data set or to individual variables. Extended attributes are managed through PROC DATASETS and can be viewed through PROC CONTENTS or through Dictionary.XATTRS. This paper describes the development of a SAS® Enterprise Guide® custom add-in that allows for the entry and editing of extended attributes, with the possibility of using a controlled vocabulary. The controlled vocabulary used in the initial application is derived from the lifecycle branch of the Data Documentation Initiative metadata standard (DDI-L).
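A brief hedged sketch of the syntax the add-in drives (attribute names and values are hypothetical):

proc datasets lib=work nolist;
   modify survey;
   xattr set ds  StudyTitle='Household Survey 2015';               /* data set level */
   xattr set var income (Units='USD' DDIConcept='PersonalIncome'); /* variable level */
quit;

/* View the attributes through the dictionary table */
proc sql;
   select * from dictionary.xattrs
   where libname='WORK' and memname='SURVEY';
quit;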
Read the paper (PDF).
Larry Hoyle, IPSR, University of Kansas
Paper SAS1822-2015:
Master Data and Command Results: Combine Master Data Management with SAS® Analytics for Improved Insights
It's well known that SAS® is the leader in advanced analytics, but often overlooked is the intelligent data preparation that combines information from disparate sources to enable confident creation and deployment of compelling models. Improving data-based decision making is among the top reasons why organizations decide to embark on master data management (MDM) projects and why you should consider incorporating MDM functionality into your analytics-based processes. MDM is a discipline that includes the people, processes, and technologies for creating an authoritative view of core data elements in enterprise operational and analytic systems. This paper demonstrates why MDM functionality is a natural fit for many SAS solutions that need to have access to timely, clean, and unique master data. Because MDM shares many of the same technologies that power SAS analytic solutions, it has never been easier to add MDM capabilities to your advanced analytics projects.
Read the paper (PDF).
Ron Agresta, SAS
N
Paper 3455-2015:
Nifty Uses of SQL Reflexive Join and Subquery in SAS®
SAS® SQL is so powerful that you hardly miss using Oracle PL/SQL. One SAS SQL forte can be found in using the SQL reflexive join. Another area of SAS SQL strength is the SQL subquery concept. The focus of this paper is to show alternative approaches to data reporting and to show how to surface data quality problems using reflexive join and subquery SQL concepts. The target audience for this paper is the intermediate SAS programmer or the experienced ANSI SQL programmer new to SAS programming.
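Two hedged sketches of the concepts named above (table and column names are hypothetical):

proc sql;
   /* Reflexive (self) join: surface duplicate visits per subject */
   select a.subjid, a.visit, a.labdate as date1, b.labdate as date2
   from lab a inner join lab b
     on a.subjid = b.subjid
    and a.visit  = b.visit
    and a.labdate < b.labdate;

   /* Correlated subquery: results above the subject's own average */
   select a.subjid, a.visit, a.result
   from lab a
   where a.result > (select avg(b.result)
                     from lab b
                     where b.subjid = a.subjid);
quit;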
Read the paper (PDF).
Cynthia Trinidad, Theorem Clinical Research
Paper SAS1866-2015:
Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables?
Well, Hadoop community, now that you have your data in Hadoop, how are you staging your analytical base tables? In my discussions with clients about this, we all agree on one thing: Data sizes stored in Hadoop prevent us from moving that data to a different platform in order to generate the analytical base tables. To address this dilemma, I want to introduce to you the SAS® In-Database Code Accelerator for Hadoop.
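In practice this means writing DS2 thread programs and letting PROC DS2 push them into the cluster. A minimal hedged sketch follows; the library, table, and option assume a configured Hadoop connection.

proc ds2 ds2accel=yes;                    /* request in-database execution */
   thread compute / overwrite=yes;
      method run();
         set hdp.transactions;            /* hypothetical Hive table */
         total = qty * price;             /* row logic runs on the data nodes */
         output;
      end;
   endthread;

   data hdp.abt / overwrite=yes;          /* the base table stays in Hadoop */
      dcl thread compute t;
      method run();
         set from t;
         output;
      end;
   enddata;
run;
quit;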
Read the paper (PDF).
Steven Sober, SAS
Donna DeCapite, SAS
P
Paper 3340-2015:
Performing Efficient Transposes on Large Teradata Tables Using SQL Explicit Pass-Through
It is a common task to reshape your data from long to wide for the purpose of reporting or analytical modeling, and PROC TRANSPOSE provides a convenient way to accomplish this. However, when performing the transpose action on large tables stored in a database management system (DBMS) such as Teradata, the performance of PROC TRANSPOSE can be significantly compromised. In this case, it is more efficient for the DBMS to perform the transpose task. SAS® provides in-database processing technology in PROC SQL, which allows the SQL explicit pass-through method to push some or all of the work to the DBMS. This technique has facilitated integration between SAS and a wide range of data warehouses and databases, including Teradata, EMC Greenplum, IBM DB2, IBM Netezza, Oracle, and Aster Data. This paper uses the Teradata database as an example DBMS and explains how to transpose a large table that resides in it using the SQL explicit pass-through method. The paper begins with comparing the execution time using PROC TRANSPOSE with the execution time using SQL explicit pass-through. From this comparison, it is clear that SQL explicit pass-through is more efficient than the traditional PROC TRANSPOSE when transposing Teradata tables, especially large tables. The paper explains how to use the SQL explicit pass-through method and discusses the types of data columns that you might need to transpose, such as numeric and character. The paper presents a transpose solution for these types of columns. Finally, the paper provides recommendations on packaging the SQL explicit pass-through method by embedding it in a macro. SAS programmers who are working with data stored in an external DBMS and who would like to efficiently transpose their data will benefit from this paper.
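The heart of the technique is to let Teradata pivot the rows with grouped CASE expressions inside explicit pass-through; a hedged sketch follows (connection options, table, and columns are hypothetical).

proc sql;
   connect to teradata (user=XXXX password=XXXX server=tdprod);
   create table work.sales_wide as
   select * from connection to teradata (
      select id,
             max(case when month = 1 then sales end) as sales_jan,
             max(case when month = 2 then sales end) as sales_feb,
             max(case when month = 3 then sales end) as sales_mar
      from mydb.sales_long
      group by id
   );
   disconnect from teradata;
quit;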
Read the paper (PDF).
Tao Cheng, Accenture
R
Paper SAS1941-2015:
Real Time--Is That Your Final Decision?
Streaming data is becoming more and more prevalent. Everything is generating data now--social media, machine sensors, the 'Internet of Things'. And you need to decide what to do with that data right now. And 'right now' could mean 10,000 times or more per second. SAS® Event Stream Processing provides an infrastructure for capturing streaming data and processing it on the fly--including applying analytics and deciding what to do with that data. All in milliseconds. There are some basic tenets on how SAS® provides this extremely high-throughput, low-latency technology to meet whatever streaming analytics your company might want to pursue.
Read the paper (PDF). | Watch the recording.
Diane Hatcher, SAS
Jerry Baulier, SAS
Steve Sparano, SAS
S
Paper SAS4642-2015:
SAS and Hadoop: 4th Annual State of the Union
SAS® 9.4M2 brings even more progress to the interoperability between SAS and Hadoop, the industry standard for big data. This talk brings you up to date on where we are: more distributions, more data types, and more options. You now have the choice to run a stand-alone cluster or to co-mingle your SAS processing with your general-purpose clusters, managing it all with YARN. Come and learn about our new developments and get a glimpse of what we are looking at for the future.
Read the paper (PDF).
Paul Kent, SAS
Paper SAS4643-2015:
SAS in the Cloud -- Innovating Towards Easy
The real promise of the 'Cloud' is to make things easy. Easy for IT to manage. Easy for users to use. Easy for self-service deployments and access to software. SAS's approach to the Cloud is all about 'easy'. See what you can do today in the Cloud, and where we are going. What 'easy' do you want with SAS?
Diane Hatcher, SAS
Paper 3269-2015:
SASTRACE: Your Key to RDBMS Empowerment
For the many relational database products that SAS/ACCESS® supports (Oracle, Teradata, DB2, MySQL, SQL Server, Hadoop, Greenplum, PC Files, to name but a few), there are myriad RDBMS-specific options at your disposal, but how do you know the right options for any given situation? How much data should you transfer at a time? Which SAS® functions can be passed through to the database, and which cannot? How do you verify that your processes are running efficiently? How do you test and validate any changes? The answer lies with the feedback capabilities of the SASTRACE system option.
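A hedged sketch of turning on the trace (the LIBNAME is illustrative):

/* ',,,d' echoes the SQL sent to the DBMS in the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

libname ora oracle user=XXXX password=XXXX path=proddb;

proc means data=ora.sales mean;   /* check the log: was the summary passed down? */
   var amount;
run;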
Read the paper (PDF).
Andrew Howell, ANJ Solutions
Paper SAS1907-2015:
SAS® Data Management: Technology Options for Ensuring a Quality Journey Through the Data Management Process
When planning for a journey, one of the main goals is to get the best value possible. The same thing could be said for your corporate data as it journeys through the data management process. It is your goal to get the best data in the hands of decision makers in a timely fashion, with the lowest cost of ownership and the minimum number of obstacles. The SAS® Data Management suite of products provides you with many options for ensuring value throughout the data management process. The purpose of this session is to focus on how the SAS® Data Management solution can be used to ensure the delivery of quality data, in the right format, to the right people, at the right time. The journey is yours, the technology is ours--together, we can make it a fulfilling and rewarding experience.
Read the paper (PDF).
Mark Craver, SAS
Paper SAS4280-2015:
SAS® Workshop: SAS Data Loader for Hadoop
This workshop provides hands-on experience with SAS® Data Loader for Hadoop. Workshop participants will configure SAS Data Loader for Hadoop and use various directives inside SAS Data Loader for Hadoop to interact with data in the Hadoop cluster.
Read the paper (PDF).
Kari Richardson, SAS
Paper SAS1856-2015:
SAS® and SAP Business Warehouse on SAP HANA--What's in the Handshake?
Is your company using or considering using SAP Business Warehouse (BW) powered by SAP HANA? SAS® provides various levels of integration with SAP BW in an SAP HANA environment. This integration enables you to not only access SAP BW components from SAS, but to also push portions of SAS analysis directly into SAP HANA, accelerating predictive modeling and data mining operations. This paper explains the SAS toolset for different integration scenarios, highlights the newest technologies contributing to integration, and walks you through examples of using SAS with SAP BW on SAP HANA. The paper is targeted at SAS and SAP developers and architects interested in building a productive analytical environment with the help of the latest SAS and SAP collaborative advancements.
Read the paper (PDF).
Tatyana Petrova, SAS
Paper 3309-2015:
Snapshot SNAFU: Preventative Measures to Safeguard Deliveries
Little did you know that your last delivery ran on incomplete data. To make matters worse, the client realized the issue first. Sounds like a horror story, no? A few preventative measures can go a long way in ensuring that your data are up-to-date and progressing normally. At the data set level, metadata comparisons between the current and previous data cuts will help identify observation and variable discrepancies. Comparisons will also uncover attribute differences at the variable level. At the subject level, they will identify missing subjects. By compiling these comparison results into a comprehensive scheduled e-mail, a data facilitator need only skim the report to confirm that the data is good to go--or in need of some corrective action. This paper introduces a suite of checks contained in a macro that will compare data cuts at the data set, variable, and subject levels and produce an e-mail report. The wide use of this macro will help all SAS® users create better deliveries while avoiding rework.
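A hedged sketch of one such check, comparing variable metadata between two cuts and e-mailing the differences; the e-mail setup is site-dependent and all names are hypothetical.

proc contents data=prev.demog out=work.meta_prev(keep=name type length) noprint; run;
proc contents data=curr.demog out=work.meta_curr(keep=name type length) noprint; run;

proc compare base=work.meta_prev compare=work.meta_curr
             out=work.meta_diff outnoequal noprint;
   id name;
run;

filename report email to='facilitator@example.com'
                subject='DEMOG data cut comparison';
data _null_;
   file report;
   if nobs = 0 then put 'No attribute differences detected.';
   set work.meta_diff nobs=nobs;
   put name= type= length=;
run;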
Read the paper (PDF). | Download the data file (ZIP).
Spencer Childress, Rho, Inc.
Alexandra Buck, Rho, Inc.
Paper 3209-2015:
Standardizing the Standardization Process
A prevalent problem surrounding Extract, Transform, and Load (ETL) development is the ability to apply consistent logic and manipulation of source data when migrating to target data structures. Certain inconsistencies that add a layer of complexity include, but are not limited to, naming conventions and data types associated with multiple sources, numerous solutions applied by an array of developers, and multiple points of updating. In this paper, we examine the evolution of implementing a best practices solution during the process of data delivery, with respect to standardizing data. The solution begins with injecting the transformations of the data directly into the code at the standardized layer via Base SAS® or SAS® Enterprise Guide®. A more robust method that we explore is to apply these transformations with SAS® macros. This provides the capability to apply these changes in a consistent manner across multiple sources. We further explore this solution by implementing the macros within SAS® Data Integration Studio processes on the DataFlux® Data Management Platform. We consider these issues within the financial industry, but the proposed solution can be applied across multiple industries.
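As a tiny hedged illustration of pushing a transformation into a macro so that every source applies it identically (names are hypothetical):

/* One definition of a "standardized" name, reused by every source job */
%macro std_name(in=, out=);
   &out = compress(upcase(strip(&in)), , 'p');   /* trim, uppercase, drop punctuation */
%mend std_name;

data staging.cust_a;
   set src_a.customers;
   length cust_name_std $60;
   %std_name(in=cust_name, out=cust_name_std)
run;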
Read the paper (PDF).
Avery Long, Financial Risk Group
Frank Ferriola, Financial Risk Group
Paper SAS1789-2015:
Step into the Cloud: Ways to Connect to Amazon Redshift with SAS/ACCESS®
Every day, companies all over the world are moving their data into the Cloud. While there are many options available, much of this data will wind up in Amazon Redshift. As a SAS® user, you are probably wondering, 'What is the best way to access this data using SAS?' This paper discusses the many ways that you can use SAS/ACCESS® to get to Amazon Redshift. We compare and contrast the various approaches and help you decide which is best for you. Topics discussed include building a connection, choosing appropriate data types, and using SQL functions.
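A hedged sketch of building such a connection; the cluster endpoint and credentials are placeholders, and option names can vary by SAS/ACCESS release.

libname rs redshift
   server='examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com'
   port=5439 database=dev user=XXXX password=XXXX;

proc sql;
   create table work.top_sales as
   select * from rs.sales(obs=100);   /* quick sanity check of the connection */
quit;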
Read the paper (PDF).
James (Ke) Wang, SAS
Salman Maher, SAS
T
Paper SAS4644-2015:
The "Waiting Game" is Over. The New Game is "Now" with Event Stream Processing.
Streaming data and real-time analytics are being talked about in a lot of different situations. SAS has been exploring a variety of use cases that meld together streaming data and analytics to drive immediate action. We want to share these stories with you and get your feedback on what's going on in your world, where the waiting game is no longer what's being played.
Steve Sparano, SAS
Paper 3860-2015:
The Challenges with Governing Big Data: How SAS® Can Help
In this session, I discuss an overall approach to governing Big Data. I begin with an introduction to Big Data governance and the governance framework. Then I address the disciplines of Big Data governance: data ownership, metadata, privacy, data quality, and master and reference data management. Finally, I discuss the reference architecture of Big Data, and how SAS® tools can address Big Data governance.
Sunil Soares, Information Asset
Paper 3820-2015:
Time to Harvest: Operationalizing SAS® Analytics on the SAP HANA Platform
A maximum harvest in farming analytics is achieved only if analytics can also be operationalized at the level of core business applications. Mapped to the use of SAS® Analytics, the fruits of SAS can be shared with Enterprise Business Applications by SAP. Learn how your SAS environment, including the latest of SAS® In-Memory Analytics, can be integrated with SAP applications based on the SAP In-Memory Platform SAP HANA. We'll explore how a SAS® Predictive Modeling environment can be embedded inside SAP HANA and how native SAP HANA data management capabilities such as SAP HANA Views, Smart Data Access, and more can be leveraged by SAS applications and contribute to an end-to-end in-memory data management and analytics platform. Come and see how you can extend the reach of your SAS® Analytics efforts with the SAP HANA integration!
Read the paper (PDF).
Morgen Christoph, SAP SE
U
Paper 3202-2015:
Using SAS® Mapping Functionality to Measure and Present the Veracity of Location Data
Crowd sourcing of data is growing rapidly, enabled by smart devices equipped with assisted GPS location, tagging of photos, and mapping other aspects of the users' lives and activities. When such data is used, a fundamental assumption is made that the reported locations are accurate within the usual GPS limitation of approximately 10m. However, as a result of a wide range of technical issues, it turns out that the accuracy of the reported locations is highly variable and cannot be relied on; some locations are accurate, but many are highly inaccurate, and that can affect many of the decisions that are being made based on the data. An analysis of a set of data is presented that demonstrates that this assumption is flawed, and examples are provided of the levels of inaccuracy that have significant consequences in a range of contexts. By using Base SAS®, the paper demonstrates the quality and veracity of the data and the scale of the errors that can be present. This analysis has critical significance in fields such as mobile location-based marketing, forensics, and law.
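One hedged way to quantify such location error in Base SAS is the GEODIST function, comparing each reported point against a trusted reference point (data set and variable names are hypothetical):

data accuracy;
   set gps.points;
   /* distance in meters between reported and reference coordinates */
   error_m = geodist(rep_lat, rep_lon, ref_lat, ref_lon, 'K') * 1000;
   beyond_10m = (error_m > 10);
run;

proc means data=accuracy n mean median p95 max;
   var error_m;
run;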
Read the paper (PDF).
Richard Self, University of Derby
W
Paper SAS1440-2015:
Want an Early Picture of the Data Quality Status of Your Analysis Data? SAS® Visual Analytics Shows You How
When you are analyzing your data and building your models, you often find out that the data cannot be used in the intended way. Systematic patterns, incomplete data, and inconsistencies from a business point of view are often the reasons. You wish you could get a complete picture of the quality status of your data much earlier in the analytic lifecycle. SAS® analytics tools like SAS® Visual Analytics help you to profile and visualize the quality status of your data in an easy and powerful way. In this session, you learn advanced methods for analytic data quality profiling. You will see case studies based on real-life data, where we look at time series data from a bird's-eye view and interactively profile GPS trackpoint data from a sail race.
Read the paper (PDF). | Download the data file (ZIP).
Gerhard Svolba, SAS
Paper SAS1390-2015:
What's New in SAS® Data Management
The latest releases of SAS® Data Integration Studio and DataFlux® Data Management Platform provide an integrated environment for managing and transforming your data to meet new and increasingly complex data management challenges. The enhancements help develop efficient processes that can clean, standardize, transform, master, and manage your data. The latest features include capabilities for building complex job processes, new web-based development and job monitoring environments, enhanced ELT transformation capabilities, big data transformation capabilities for Hadoop, integration with the analytic platform provided by SAS® LASR™ Analytic Server, enhanced features for lineage tracing and impact analysis, and new features for master data and metadata management. This paper provides an overview of the latest features of the products and includes use cases and examples for leveraging product capabilities.
Read the paper (PDF). | Watch the recording.
Nancy Rausch, SAS
Mike Frost, SAS
Y
Paper 3407-2015:
You Say Day-ta, I Say Dat-a: Measuring Coding Differences, Considering Integrity and Transferring Data Quickly
We all know there are multiple ways to use SAS® language components to generate the same values in data sets and output (for example, using the DATA step versus PROC SQL, IF-THEN/ELSE logic versus format table conversions, PROC MEANS versus PROC SQL summarizations, and so on). However, do you compare those different ways to determine which are the most efficient in terms of computer resources used? Do you ever consider the time a programmer takes to develop or maintain code? In addition to determining efficient syntax, do you validate your resulting data sets? Do you ever check data values that must remain the same after being processed by multiple steps and verify that they really don't change? We share some simple coding techniques that have proven to save computer and human resources. We also explain some data validation and comparison techniques that ensure data integrity. In our distributed computing environment, we show a quick way to transfer data from a SAS server to a local client by using PROC DOWNLOAD and then PROC EXPORT on the client to convert the SAS data set to a Microsoft Excel file.
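A hedged sketch of that transfer, assuming a SAS/CONNECT session is already signed on; the library and paths are hypothetical.

/* On the client, inside the signed-on SAS/CONNECT session */
rsubmit;
   proc download data=serverlib.results out=work.results;
   run;
endrsubmit;

/* Convert the local copy to Excel */
proc export data=work.results
   outfile='C:\temp\results.xlsx'
   dbms=xlsx replace;
run;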
Read the paper (PDF).
Jason Beene, Wells Fargo
Mary Katz, Wells Fargo Bank