The Clinical Data Interchange Standards Consortium (CDISC) encompasses a variety of standards for medical research. Among the standards developed by CDISC are standards for data collection (Clinical Data Acquisition Standards Harmonization, CDASH), data submission (Study Data Tabulation Model, SDTM), and data analysis (Analysis Data Model, ADaM). These standards were originally developed with drug development in mind. Therapeutic Area User Guides (TAUGs) have been a recent focus, providing advice, examples, and explanations for collecting and submitting data for specific diseases. Even data on non-subjects can be collected using the Associated Persons Implementation Guide (APIG). SDTM domains for medical devices were published in 2012. Interestingly, 14 of the 18 TAUGs use device domains, providing examples of the use of the various device domains. Drug-device studies also provide a contrast in the adoption of CDISC standards for drug submissions versus device submissions. Adoption of SDTM in general, and of the seven device SDTM domains in particular, by the medical device industry has been slow. Reasons for this slow adoption are discussed in this paper.
Carey Smoak, DataCeutics
In this paper, a SAS® macro is introduced that can search for, and replace, any string in a SAS program. To use the macro, the user needs only to pass the search string and a folder location to the macro. If the user wants to use the replacement function, the user also needs to pass the replacement string. The macro checks all of the SAS programs in the folder and its subfolders to identify the files that contain the search string. The macro generates new SAS files for replacements so that the old files are not affected. An HTML report is generated by the macro that includes the original file locations, the line numbers of the SAS code that contain the search string, and the SAS code with the search strings highlighted in yellow. If the replacement function is used, the HTML report also includes the locations of the new SAS files. The location information in the HTML report is rendered as hyperlinks so that the user can open the files directly from the report.
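A minimal sketch of the search step, assuming a single folder and a plain search string (the paper's macro also recurses into subfolders, writes the replacement files, and builds the HTML report); the macro and data set names here are illustrative, not the paper's:

    %macro find_string(dir=, str=);
      filename _d "&dir";
      data work.hits(keep=fname linenum code);
        length fname fpath $256 code $2000;
        did = dopen('_d');                               /* open the folder           */
        do i = 1 to dnum(did);
          fname = dread(did, i);                         /* next file in the folder   */
          if upcase(scan(fname, -1, '.')) ne 'SAS' then continue;
          fpath = catx('/', "&dir", fname);
          infile tmp filevar=fpath end=done truncover;   /* switch input to this file */
          linenum = 0;
          do until (done);
            input code $char2000.;
            linenum + 1;
            if find(code, "&str", 'i') then output;      /* case-insensitive match    */
          end;
        end;
        rc = dclose(did);
      run;
    %mend find_string;

    %find_string(dir=/projects/study01/programs, str=oldlib.demog)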
Ting Sa, Cincinnati Children's Hospital Medical Center
Healthcare is weird. Healthcare data is even more so. The digitization of the data that describes the patient experience is a modern phenomenon, and most healthcare organizations are still in the early stages of that journey. While the business of healthcare is already a century old, most organizations have focused their efforts on the financial aspects of healthcare rather than on stakeholder experience or clinical outcomes. Think of the workflow you might have experienced, from scheduling an appointment through doctor visits, lab tests, and prescriptions to interventions such as surgery or physical therapy. The modern healthcare system creates a digital footprint of administrative, process, quality, epidemiological, financial, clinical, and outcome measures, which range in size, cleanliness, and usefulness. Whether you are new to healthcare data or are looking to advance your knowledge of healthcare data and the techniques used to analyze it, this paper serves as a practical guide to understanding and using healthcare data. We explore common methods for structuring and accessing data, discuss common challenges such as aggregating data into episodes of care, describe how to reverse-engineer real-world events, and talk about dealing with the myriad unstructured data found in nursing notes. Finally, we discuss the ethical uses of healthcare data and the limits of informed consent, which are critically important for those of us in analytics.
Greg Nelson, Thotwave Technologies, LLC.
SAS/ACCESS® software grants access to data in third-party database management systems (DBMSs), but how do you access data in a DBMS that is not supported by a SAS/ACCESS product? The introduction of the GROOVY procedure in SAS® 9.3 lets you retrieve this formerly inaccessible data through a JDBC connection. Groovy is an object-oriented, dynamic programming language executed on the Java Virtual Machine (JVM). Using Microsoft Azure HDInsight as an example, this paper demonstrates how to access and read data into a SAS data set using PROC GROOVY and a JDBC connection.
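As a hedged sketch of the approach (the driver jar, JDBC URL, credentials, table, and landing file below are all placeholders, not specifics from the paper), the Groovy code queries the source over JDBC and lands the rows in a CSV file that a DATA step then reads:

    filename jdbcout '/tmp/jdbc_out.csv';

    proc groovy;
      add classpath="/path/to/jdbc-driver.jar";        /* placeholder driver jar */
      submit;
        import groovy.sql.Sql
        def out = new File('/tmp/jdbc_out.csv')        // placeholder landing file
        out.delete()
        def sql = Sql.newInstance(
          'jdbc:hive2://mycluster.azurehdinsight.net:443/default',  // placeholder URL
          'myuser', 'mypassword', 'org.apache.hive.jdbc.HiveDriver')
        sql.eachRow('select id, name from mytable') { row ->
          out << "${row.id},${row.name}\n"             // one CSV line per row
        }
        sql.close()
      endsubmit;
    quit;

    data work.mytable;                                 /* read the landed CSV */
      infile jdbcout dsd truncover;
      input id :8. name :$50.;
    run;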
Lilyanne Zhang, SAS
The SAS® code looks perfect. You submit it, and to your amazement, there is a problem with the CREATE TABLE statement. You need to change the table definition, ever so slightly, but how? Explicit pass-through? That's not an option. Fortunately, there are a handful of SAS options that can save the day. This presentation covers everything you need to know in order to adjust SAS CREATE TABLE statements using SAS options, specifically DBCREATE_TABLE_OPTS=, POST_STMT_OPTS=, POST_TABLE_OPTS=, PRE_STMT_OPTS=, and PRE_TABLE_OPTS=. We use Hadoop and Oracle examples to show why these options can make your life easier. From there, we use real code to show you how to use them.
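For instance, here is a hedged sketch of the most commonly needed option, DBCREATE_TABLE_OPTS=, used as a data set option to append Hive syntax to the CREATE TABLE statement that SAS generates (the server and table names are placeholders):

    libname hdp hadoop server="hivehost" user=myuser;   /* placeholder connection */

    /* Appends STORED AS PARQUET to the generated CREATE TABLE statement */
    data hdp.sales_copy (dbcreate_table_opts="STORED AS PARQUET");
      set work.sales;
    run;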
Jeff Bailey, SAS
Interactively redeploying SAS® Data Integration Studio jobs can be a slow and tedious process. The updated batch deployment utility gives the ETL Tech Lead a more efficient and repeatable method for administering batch jobs. This improved tool became available in SAS® Data Integration Studio 4.901.
Jeff Dyson, The Financial Risk Group
In the pharmaceutical industry, the Clinical Data Interchange Standards Consortium's (CDISC) Study Data Tabulation Model (SDTM) is required by the US Food and Drug Administration (FDA) as the standard data structure for regulatory submission of clinical data. Manually mapping raw data to SDTM domains can be time-consuming and error-prone, considering the increasing complexity of clinical data. However, this process can be much more efficient if the raw data is collected using the Clinical Data Acquisition Standards Harmonization (CDASH) standard, allowing for automatic conversion to the SDTM data structure. This paper introduces a macro that automatically creates a SAS® program for each SDTM domain (for example, dm.sas for Demographics [DM]) that maps CDASH data to SDTM data. The macro compares the attributes of the CDASH raw data sets with the SDTM domains to generate SAS code that performs the mapping. Each SAS program does the following: 1) sets up variables and assigns their proper order; 2) converts dates and times to the ISO 8601 standard; 3) converts numeric variables to character variables; and 4) transposes the data sets from wide to long for the Findings and Events domains. This macro, which sets up the basic frame of the SDTM mapping, can minimize the manual work for SAS programmers or, in some cases, completely handle simple domains without any further modification. This greatly increases the efficiency and speed of the SDTM conversion process.
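A hedged fragment of the kind of mapping code such a macro might generate (the raw data set and variable names are hypothetical): an ISO 8601 date conversion for DM, and a wide-to-long transpose for a Findings domain:

    data work.dm;
      set raw.dm;                                          /* hypothetical CDASH raw data */
      length usubjid $40 rfstdtc $10;
      usubjid = catx('-', studyid, subjid);                /* unique subject identifier   */
      rfstdtc = put(input(rfstdat, mmddyy10.), yymmdd10.); /* ISO 8601 date (YYYY-MM-DD)  */
    run;

    proc transpose data=raw.vs
                   out=work.vs(rename=(_name_=vstestcd col1=vsorres));
      by studyid subjid visit;                             /* wide to long for Findings   */
      var sysbp diabp pulse;
    run;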
Hao Xu, McDougall Scientific
Hong Chen, McDougall Scientific
You have SAS® software. You have databases or data platforms like Hadoop, possibly with some large distributed data. If you already know how to make SAS code talk to your data platforms, you have already taken a solid step toward a successful integration. But you might also want to know how to take this communication to a different level. If your data platform is one that is built for massively parallel processing, chances are that SAS has already created the SAS® Embedded Process framework for it, which allows SAS tasks to be embedded next to your data sources for execution. SAS® In-Database Technologies is a family of products that use this framework and provide an accelerated level of integration. This paper explains the core principles behind these technologies and presents application scenarios for each of these products. We use a variety of examples to highlight the specifics of individual SAS accelerators (SAS® Scoring Accelerator, SAS® In-Database Code Accelerator, and others) across the data platforms.
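As a hedged illustration of one member of the family, the SAS In-Database Code Accelerator lets a DS2 thread program run inside the data platform (assuming the accelerator is licensed and the libname points at a Hadoop cluster; all names are placeholders):

    libname hdp hadoop server="hivehost" user=myuser;   /* placeholder Hadoop libname */

    proc ds2 ds2accel=yes;                  /* request in-database execution */
      thread score_th / overwrite=yes;
        dcl double flag;
        method run();
          set hdp.transactions;             /* runs next to the data via the Embedded Process */
          if amount > 1000 then flag = 1;
          else flag = 0;
        end;
      endthread;
      data hdp.scored / overwrite=yes;
        dcl thread score_th t;
        method run();
          set from t;
        end;
      enddata;
    run;
    quit;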
Tatyana Petrova, SAS
Session 0921-2017:
Bridging the Gap between Agile Model Development and IT Productionisation
Often the burden of productionisation of analytical models falls upon the analyst, so every Monday morning the analyst comes in and presses the Run button. This is obviously fraught with danger (for example, the source data isn't available, the analyst goes on holidays, or the analyst resigns) and might lead to invalid results being consumed by downstream systems. There are many reasons that this might occur, but the most common one is that it takes IT too long to put a model into full production (especially if that model contains new data sources). In this presentation, I show a tested architecture that allows for the typical rapid development of models (in fact, it significantly speeds up the discovery phase) as well as an orderly handover to IT to productionise the models without disrupting their regular runs. This allows downstream users to be notified if there is a delay in the arrival of data, and enables a rapid IT Operations response if there is a problem during the loading and creation steps.
Paul Segal, Teradata
SAS® Visual Analytics is a very powerful tool for visually exploring data, but in some organizations not all data should be available to everybody. And although it is relatively easy to scale up a SAS Visual Analytics environment when the need for data increases, it is still beneficial to set up a structure that lets the organization keep control over who actually has the right to load data, rather than granting everybody that right. This breakout session shows a potential solution by providing a high-level overview of the SAS Visual Analytics data access management solution for the Risk Services Organization at ING Bank in the Netherlands.
Chun-Yian Liew, ING Bank N.V.
As an information security or data professional, you have seen and heard about how advanced analytics has impacted nearly every business domain. You recognize the potential of insights derived from advanced analytics to improve the information security of your organization. You want to realize these benefits and to understand their pitfalls. To successfully apply advanced analytics to the information security business problem, proper application of data management processes and techniques is of paramount importance. Based on professional services experience in implementing SAS® Cybersecurity, this session teaches you about the data sources used, the activities involved in properly managing this data, and the means by which these processes address information security business problems. You will come to appreciate how using advanced analytics in the information security domain requires more than just the application of tools or modeling techniques. Using a data management regime for information security concerns can benefit your organization by providing insights into IT infrastructure, enabling successful data science activities, and providing greater resilience by way of improved information security investigations.
Alex Anglin, SAS
Data validation plays a key role as an organization engages in a data governance initiative. Better data leads to better decisions. This applies to public schools as well as business entities. Each Local Educational Agency (LEA) in Pennsylvania reports children with disabilities to the Pennsylvania Department of Education (PDE) in compliance with IDEA (Individuals with Disabilities Education Act). PDE provides a Comparison Report to each LEA to assist in their data validation process. This Comparison Report provides counts of various categories for the most recent and previous year. LEAs use the Comparison Report to validate data submitted to PDE. This paper discusses how the Base SAS® SORT procedure and MERGE statement extract hidden information behind the counts to assist LEAs in their data validation process.
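The core pattern is a simple sort and match-merge; a minimal sketch with hypothetical data set and variable names follows:

    proc sort data=counts_2016; by category; run;
    proc sort data=counts_2017; by category; run;

    data comparison;
      merge counts_2016(in=a rename=(count=count_2016))
            counts_2017(in=b rename=(count=count_2017));
      by category;
      length note $40;
      change = sum(count_2017, -count_2016);            /* year-over-year difference */
      if not (a and b) then note = 'Category missing in one year';
    run;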
Barry Frye, Appalachia Intermediate Unit 8
The Analysis Data Model (ADaM) Basic Data Structure (BDS) can be used for many analysis needs. We all know that the SAS® DATA step is a very flexible and powerful tool for data processing. In fact, the DATA step is very useful in the creation of a non-trivial BDS data set. This paper walks through a series of examples showing use of the SAS DATA step when deriving rows in BDS. These examples include creating new parameters, new time points, and changes from multiple baselines.
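For example, a typical BDS derivation of change from a single baseline uses RETAIN and BY-group logic; a minimal sketch using standard ADaM variable names:

    proc sort data=adlb; by usubjid paramcd avisitn; run;

    data adlb_chg;
      set adlb;
      by usubjid paramcd;
      retain base;
      if first.paramcd then call missing(base);        /* reset for each parameter          */
      if ablfl = 'Y' then base = aval;                 /* baseline record sets the reference */
      else if not missing(base) then chg = aval - base;  /* post-baseline change            */
    run;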
Sandra Minjoe
The ever-growing volume of data challenges us to keep pace in ensuring that we use it to its full advantage. Unfortunately, our response to new data sources, data types, and applications is often somewhat reactionary. There exists a misperception that organizations have precious little time to consider a purposeful strategy without disrupting business continuity. Strategy is a term that is often misused and ill-defined. However, it is nothing more than a set of integrated choices that help position an initiative for future success. This presentation covers the key elements defining data strategy, including the following topics: What data should we keep or toss? How should we structure data (warehouse versus data lake versus real-time event streaming)? How do we store data (cloud, virtualization, federation, Hadoop)? What approach do we use to integrate and cleanse data (ETL versus cognitive/automated profiling)? How do we protect and share data? These topics ensure that the organization gets the most value from our data, and they explore how we prioritize and adapt our strategy to meet unanticipated needs in the future. As with any strategy, we need to make sure that we have a roadmap or plan for execution, so we talk specifically about the tools, technologies, methods, and processes that are useful as we design a data strategy that is both relevant and actionable for your organization.
Greg Nelson, Thotwave Technologies, LLC.
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how SAS system macro variables can provide valuable information embedded in your programs, logs, lists, catalogs, data sets and ODS output; how your programs can automatically update a processing log; and how to generate two different types of codebooks.
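Two hedged examples of the idea, with placeholder library and member names: stamping output with automatic macro variables, and pulling codebook-style metadata from the DICTIONARY tables:

    /* Provenance from automatic macro variables */
    footnote "Run by &sysuserid on &sysdate9 at &systime with SAS &sysvlong";

    /* Codebook-style metadata for one data set */
    proc sql;
      create table work.codebook as
      select name, type, length, format, label
      from dictionary.columns
      where libname = 'WORK' and memname = 'MYDATA';
    quit;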
Louise Hadden, Abt Associates
Roberta Glass, Abt Associates
The British Airways (BA) revenue management team is responsible for surfacing prices made available in the market with the objective of maximizing revenue from our 40,000,000 passenger journeys. BA is currently working to understand how competitor data can be exploited to help facilitate better decision making. Due to the low level of aggregation, competitor data is too large (and consequently too expensive) to store on conventional relational databases. Therefore, it has been stored on a small Hadoop installation at BA. Thanks to SAS/ACCESS® Interface to Hadoop, we have been able to run our complex algorithms on these large data sets without changing the way we work and whilst exploiting the full capabilities of SAS®.
Kayne Putman, British Airways
Are you a marketing analyst who speaks SAS®? Congratulations, you are in high demand! Or are you? Marketing analysts with programming skills are critical today. The ability to extract large volumes of data, massage it into a manageable format, and display it simply are necessary skills in the world of big data. However, programming skills are not nearly enough. In fact, some marketing managers are putting less and less weight on them and are focusing more on the softer skills that they require. This session will help ensure that you are not left out. In this session, Emma Warrillow shares why being a good programmer is only the beginning. She provide practical tips on moving from being a someone who is good at coding to becoming a true collaborator with marketing taking your marketing analytics to the next level. In 2016, Emma Warrillow's presentation at SAS® Global Forum was very well received (http://blogs.sas.com/content/sgf/2016/04/21/always-be-yourself-unless-you-can-be-a-unicorn/). In this follow-up, she revisits some of the highlights from 2016 and shares some new ideas. You can be sure of an engaging code-free session!
Emma Warrillow, Data Insight Group Inc. (DiG)
This paper demonstrates how to use the capabilities of SAS® Data Integration Studio to extract, load, and transform your data within a Hadoop environment. Which transformations can be used in each layer of the ELT process is illustrated using a sample use case, and the functionality of each is described. The use case steps through the process from source to target.
Darryl Yewchin, SAS
Todd Foreman, SAS
Session SAS2010-2017:
Hands-On Workshop: Accessing and Manipulating Data in SAS® Viya™
In this workshop, you will learn how to access and manage SAS and Excel data in SAS® Viya™.
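A hedged sketch of the basics covered (the session, caslib, and file names are placeholders): start a CAS session, load a SAS data set through the CAS engine, and load a server-side Excel file with PROC CASUTIL:

    cas mysess;                              /* start a CAS session       */
    libname mycas cas caslib=casuser;        /* CAS engine libname        */

    data mycas.class;                        /* load SAS data into memory */
      set sashelp.class;
    run;

    proc casutil;
      load file="/data/sales.xlsx"           /* server-side Excel file    */
           outcaslib="casuser" casout="sales";
    quit;

    cas mysess terminate;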
Davetta Dunlap, SAS
Session 2001-2017:
Hands-On Workshop: SAS® Data Loader for Hadoop
This workshop provides hands-on experience with some basic functionality of SAS Data Loader for Hadoop. You will learn how to copy data to Hadoop, profile data in Hadoop, and cleanse data in Hadoop.
Kari Richardson, SAS
Secondary use of administrative claims data, EHRs and EMRs, registry data, and other data sources within the health data ecosystem provides rich opportunity and potential to study topics ranging from public health surveillance to comparative effectiveness research. Data sourced from individual sites can be limited in scope, coverage, and statistical power. Sharing and pooling data from multiple sites and sources, however, present administrative, governance, analytic, and patient-privacy challenges. Distributed data networks represent a paradigm shift in health-care data sharing. They have evolved at a critical time when big data and patient privacy are often competing priorities. A distributed data network is one that has no central repository of data. Data reside behind the firewall of each data-contributing partner in a network. Each partner transforms its source data in accordance with a common data model and allows indirect access to data through a standard query approach using flexibly designed informatics tools. This presentation discusses how distributed data networks have matured to make important contributions to the health-care data ecosystem and the evolving Learning Healthcare System. The presentation focuses on: 1) the distributed data network, including its purpose, concept, guiding principles, and benefits; 2) common data models, including their concepts, designs, and benefits; 3) analytic tool development, including its design and implementation considerations; and 4) analytic challenges.
Jennifer Popovic, Harvard Medical School / Harvard Pilgrim Health Care Institute
Another year implementing, validating, securing, optimizing, migrating, and adopting the Hadoop platform. What are the top 10 Hadoop accomplishments we have seen over the last year? We also review issues, concerns, and resolutions from the past year. We discuss where implementations stand and some best practices for moving forward with Hadoop and SAS® releases.
Howard Plemmons, SAS
Mauro Cazzari, SAS
The clock is ticking! Is your company ready for May 25, 2018, when the General Data Protection Regulation that affects data privacy laws across Europe comes into force? If companies fail to comply, they incur very large fines and might lose customer trust if sensitive information is compromised. With data streaming in from multiple channels in different formats, sizes, and varying quality, it is increasingly difficult to keep track of personal data so that you can protect it. SAS® Data Management helps companies on their journey toward governance and compliance, supporting tasks such as detection, quality assurance, and protection of personal data. This paper focuses on using SAS® Federation Server and SAS® Data Management Studio in the SAS data management suite of products to surface and manage that hard-to-find personal data. SAS Federation Server provides you with a universal way to access data in Hadoop, Teradata, SQL Server, Oracle, SAP HANA, and other types of data sources without data movement during processing. The advanced data masking and encryption capabilities of SAS Federation Server can be used when virtualizing data for users. Purpose-built data quality functions are used to perform identification analysis, parsing, matching, and extraction of personal data in real time. We also provide insight into how the exploratory data analysis capability of SAS® Data Management Studio enables you to scan through your investigation hub to identify and categorize personal data.
Cecily Hoffritz, SAS
Many SAS® environments are set up for single-byte character sets (SBCS), but many organizations now have to process names of people and companies with characters outside that set. You can solve this problem by changing the configuration to the UTF-8 encoding, which is a multi-byte character set (MBCS). However, the commonly used text-manipulation functions like SUBSTR, INDEX, FIND, and so on act on bytes and should no longer be used; SAS has provided new functions to replace them (the K-functions). Also, character fields have to be enlarged to make room for multi-byte characters. This paper describes the problems and gives guidelines for a change strategy. It also presents code to analyze existing programs for functions that might cause problems. For those interested, a short historical background and a description of the UTF-8 encoding are also provided. Conclusions focus on the positioning of SAS environments configured with UTF-8 versus single-byte encodings, the strategy for organizations faced with a necessary change, and the documentation.
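A small example of why the K-functions matter; the values in the comments assume a UTF-8 session, where LENGTH counts bytes while KLENGTH counts characters:

    data _null_;
      s = '東京SAS';              /* two multi-byte characters plus 'SAS'        */
      bytes = length(s);          /* 9 in UTF-8: 3 bytes per Japanese character  */
      chars = klength(s);         /* 5 characters                                */
      head  = ksubstr(s, 1, 2);   /* first two characters, not first two bytes   */
      put bytes= chars= head=;
    run;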
Frank Poppe, PW Consulting
Traditionally, role-based access control is implemented as group memberships. Access to SAS® data sets or metadata libraries requires membership in the group that 'owns' the resources. From the point of view of a SAS process, these authorizations are additive. If a user is a member in two distinct groups, her SAS processes have access to the data resources of both groups simultaneously. This happens every time the user runs a SAS process; even when the code in question is meant to be used with only one group's resources. As a consequence, having a master data source defining data flows between groups becomes futile, as any SAS process of the user can bypass said definitions. In addition, as it is not possible to reduce the user's authorizations to match those of only the relevant group, it becomes challenging to determine whether other members of the group have sufficient authorization. Furthermore, it becomes difficult to audit statistics production, as it cannot be automatically determined which of the groups owns a certain log file. All these problems can be avoided by using role-based access control with dynamic separation of duties (RBAC DSoD). In DSoD, the user is able to activate only one group membership at a time. This paper describes one way to implement an RBAC with DSoD schema in a UNIX server environment.
Perttu Muurimaki, Statistics Finland
XML documents are becoming increasingly popular for transporting data between different operating systems. In the pharmaceutical industry, the Food and Drug Administration (FDA) requires pharmaceutical companies to submit certain types of data in XML format. This paper provides insights into XML documents and summarizes different methods of importing and exporting XML documents with SAS®, including: using the XML LIBNAME engine to translate between the XML markup and the SAS format; creating an XML map and using the XML92 LIBNAME engine to read XML documents into SAS data sets; and using Clinical Data Interchange Standards Consortium (CDISC) procedures to import and export XML documents. An example of importing OpenClinica data into SAS by implementing these methods is provided.
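A hedged sketch of the XML map approach (file paths are placeholders; XMLV2 is the production name of the engine introduced experimentally as XML92):

    /* Read an XML document into SAS data sets using an XML map */
    filename stmap '/data/study.map';
    libname study xmlv2 '/data/study.xml' xmlmap=stmap access=readonly;

    proc copy in=study out=work;    /* materialize each mapped table */
    run;

    /* Export a SAS data set as XML */
    libname outx xmlv2 '/data/export.xml';
    data outx.dm;
      set work.dm;
    run;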
Fei Wang, McDougall Scientific
How can we run traditional SAS® jobs, including SAS® Workspace Servers, on Hadoop worker nodes? The answer is SAS® Grid Manager for Hadoop, which is integrated with the Hadoop ecosystem to provide resource management, high availability and enterprise scheduling for SAS customers. This paper provides an introduction to the architecture, configuration, and management of SAS Grid Manager for Hadoop. Anyone involved with SAS and Apache Hadoop should find the information in this paper useful. The first area covered is a breakdown of each required SAS and Hadoop component. From the Hadoop ecosystem, we define the role of Hadoop YARN, Hadoop Distributed File System (HDFS) storage, and Hadoop client services. We review SAS metadata definitions for SAS Grid Manager, SAS® Object Spawner, and SAS® Workspace Servers. We cover required Kerberos security, as well as SAS® Enterprise Guide® and the SAS® Grid Manager Client Utility. YARN queues and the SAS Grid Policy file for optimizing job scheduling are also reviewed. And finally, we discuss traditional SAS math running on a Hadoop worker node, and how it can take advantage of high-performance math to accelerate job execution. By leveraging SAS Grid Manager for Hadoop, sites are moving SAS jobs inside a Hadoop cluster. This will ultimately cut down on data movement and provide more consistent job execution. Although this paper is written for SAS and Hadoop administrators, SAS users can also benefit from this session.
Mark Lochbihler, Hortonworks
For many years now you have learned the ins and outs of using SAS/ACCESS® software to move data into SAS® to do your analytics. With the new open, cloud-ready SAS® Viya platform comes a new set of data access technologies known as SAS data connectors and SAS data connect accelerators. This paper describes what these new data access products are and how they integrate with the SAS Viya platform. After reading this paper, you will have the foundation needed to load data from third-party data sources into SAS Viya.
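As a hedged sketch (connection parameters vary by connector and are placeholders here), a caslib defined against an Oracle source uses the data connector, and PROC CASUTIL then loads a table into memory:

    caslib oralib datasource=(srctype="oracle",
                              username="myuser", password="mypass",
                              path="//dbhost:1521/orcl",
                              schema="SALES");

    proc casutil;
      load casdata="ORDERS" incaslib="oralib"    /* read via the data connector */
           outcaslib="oralib" casout="orders";   /* in-memory result table      */
    quit;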
Salman Maher, SAS
Chris DeHart, SAS
Barbara Kemper, SAS
Geofencing is one of the most promising and exciting concepts to have developed with the advent of the internet of things. Like John Anderton in the 2002 movie Minority Report, you can now enter a mall and immediately receive commercial ads and offers based on your personal taste and past purchases. Authorities can track vessels' positions and detect when a ship is not in the area it should be, or they can forecast and optimize harbor arrivals. When a truck driver breaks from the route, the dispatcher can be alerted and can act immediately. And there are countless examples from manufacturing, industry, security, or even households. All of these applications are based on the core concept of geofencing, which consists of detecting whether a device's position is within a defined geographical boundary. Geofencing requires real-time processing in order to react appropriately. In this session, we explain how to implement real-time geofencing on streaming data with SAS® Event Stream Processing and achieve high-performance processing, in terms of millions of events per second, over hundreds of millions of geofences.
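The core computation is a point-in-polygon test; the sketch below shows the classic ray-casting algorithm in a DATA step with a hypothetical four-vertex geofence, purely to illustrate the logic (SAS Event Stream Processing applies this kind of test to streaming events, and its actual syntax differs):

    data _null_;
      array px[4] _temporary_ (4.83 4.95 4.95 4.83);     /* geofence longitudes */
      array py[4] _temporary_ (52.33 52.33 52.42 52.42); /* geofence latitudes  */
      x = 4.89; y = 52.37;                    /* device position to test        */
      inside = 0;
      j = dim(px);
      do i = 1 to dim(px);                    /* cast a ray across each edge    */
        if ((py[i] > y) ne (py[j] > y)) and
           (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])
          then inside = not inside;
        j = i;
      end;
      put inside=;                            /* 1 = inside the geofence        */
    run;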
Frederic Combaneyre, SAS
No Batch Scheduler? No problem! This paper describes the use of a SAS® Data Integration Studio job that can be started by a time-dependent scheduler like Windows Scheduler (or crontab in UNIX) to mimic a batch scheduler using SAS® Grid Manager.
Patrick Cuba, Cuba BI Consulting
In this paper, we explore advantages of the DS2 procedure over DATA step programming in SAS®. DS2 is a SAS proprietary programming language appropriate for advanced data manipulation. We explore the use of PROC DS2 to execute queries in databases using SAS FedSQL. Several DS2 language elements accept embedded FedSQL syntax, and the run-time-generated queries can exchange data interactively between DS2 and the supported database. This enables SQL preprocessing of input tables, which effectively allows processing data from multiple tables in different databases within the same query, thereby drastically reducing processing times and improving performance. We explore the use of DS2 for creating tables, bulk loading tables, manipulating tables, and querying data in an efficient manner. We explore advantages of PROC DS2 over DATA step programming such as support for different data types, ANSI SQL types, programming structure elements, and the benefits of using new expressions or writing one's own methods or packages available in the DS2 system. The DS2 procedure enables requests to be processed by the DS2 data access technology, which supports a scalable, threaded, high-performance, and standards-based way to access, manage, and share relational data. In the end, we empirically measure the performance benefits of using PROC DS2 over PROC SQL for processing queries in-database by taking advantage of threaded processing in supported databases such as Oracle.
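A minimal sketch of the embedded FedSQL pattern (the table and column names are hypothetical); the braces hand a FedSQL query to the DS2 SET statement:

    proc ds2;
      data work.high_balance / overwrite=yes;
        method run();
          /* embedded FedSQL; runs in the database when the source supports it */
          set {select acct_id, balance
               from work.accounts
               where balance > 10000};
        end;
      enddata;
    run;
    quit;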
Viraj Kumbhakarna, MUFG Union Bank
As a programmer specializing in tracking systems for research projects, I was recently given the task of implementing a newly developed, web-based tracking system for a complex field study. This tracking system uses an SQL database on the back end to hold a large set of related tables. As I learned about the new system, I found that there were deficiencies to overcome to make the system work on the project. Fortunately, I was able to develop a set of utilities in SAS® to bridge the gaps in the system and to integrate it with the other systems used for field survey administration on the project. The utilities helped to do the following: 1) connect schemas and compare cases across subsystems; 2) compare the statuses of cases across multiple tracked processes; 3) generate merge input files to be used to initiate follow-up activities; 4) prepare and launch SQL stored procedures from a running SAS job; and 5) develop complex queries in Microsoft SQL Server Management Studio and make them run in SAS. This paper puts each of these needs into a larger context by describing the gap in the tracking system that each utility addresses. Then, each program is explained and documented, with comments.
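For item 4, the pattern is explicit pass-through with an EXECUTE statement; a hedged sketch with a hypothetical DSN, stored procedure, and view follows:

    proc sql;
      connect to odbc as trk (dsn="TrackingDB");            /* hypothetical DSN       */
      execute (exec dbo.usp_refresh_case_status) by trk;    /* launch the stored proc */
      create table work.case_status as                      /* pull results back      */
        select * from connection to trk
          (select case_id, process, status from dbo.v_case_status);
      disconnect from trk;
    quit;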
Chris Carson, RTI International
Organizations that create and store personally identifiable information (PII) are often required to de-identify sensitive data to protect an individual's privacy. There are multiple methods in SAS® that can be used to de-identify PII depending on data types and encryption needs. The first method is to apply crosswalk mapping by linking a data set that contains PII to a secured data set that contains the PII and its corresponding surrogate. Then, the surrogate replaces the PII in the original data set. A second method is SAS encryption, which involves translating PII into an encrypted string using SAS functions; this could be a one-byte-to-one-byte swap or a one-byte-to-two-byte swap. The third method is in-database encryption, which encrypts the PII in a data warehouse, such as Oracle or Teradata, using SAS tools before any information is imported into SAS for users to see. This paper discusses the advantages and disadvantages of these three methods, provides sample SAS code, and describes the corresponding methods to decrypt the encrypted data.
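A hedged sketch of the first (crosswalk) method, with hypothetical table and identifier names:

    proc sql;
      create table work.deidentified as
      select cw.surrogate_id,                  /* replaces the direct identifier */
             ev.visit_date,
             ev.diagnosis
      from work.events ev
           inner join secured.crosswalk cw     /* secured PII-to-surrogate map   */
           on ev.patient_id = cw.patient_id;
    quit;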
Shuhua Liang, Kaiser Permanente
Zoe Bider-Canfield, Kaiser Permanente
SAS® Decision Manager includes a hidden gem: a web service for high-speed online scoring of business events. The fourth maintenance release of SAS® 9.4 represents the third release of the SAS® Micro Analytics Service for scoring SAS® DS2 code decisions in a standard JSON web service. Users will learn how to create decisions, deploy modules to the web service, test the service, and record business events.
Prasenjit Sen, SAS
Chris Upton, SAS
This paper compiles information from documents produced by the U.S. Food and Drug Administration (FDA), the Clinical Data Interchange Standards Consortium (CDISC), and Computational Sciences Symposium (CSS) workgroups to identify what analysis data and other documentation is to be included in submissions and where it all needs to go. It not only describes requirements, but also includes recommendations for things that aren't so cut-and-dried. It focuses on the New Drug Application (NDA) submissions and a subset of Biologic License Application (BLA) submissions that are covered by the FDA binding guidance documents. Where applicable, SAS® tools are described and examples given.
Sandra Minjoe
John Troxell, Accenture Accelerated R&D Services
There are so many ways for SAS/ACCESS® users to read and write data from and to Microsoft Excel files: SAS® PC Files Server, XLS and XLSX engines, the SAS IMPORT and EXPORT procedures, various Excel file formats (.xls, .xlsx, .xlsb, .xlsm), and more. Many users ask, 'Which is best for me?' This paper explores the requirements and limitations of each engine, along with performance considerations and some of the not-so-obvious things to consider. It also includes a brief analogous discussion on Microsoft Access databases, which share some of the same mechanisms.
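Two of the simpler routes, sketched with placeholder paths and sheet names: the XLSX libname engine (no PC Files Server required) and PROC IMPORT:

    /* Read a worksheet directly with the XLSX engine */
    libname wb xlsx '/data/budget.xlsx';
    data work.q1;
      set wb.'Q1 Detail'n;        /* sheet names become member names */
    run;
    libname wb clear;

    /* Or let PROC IMPORT handle it */
    proc import datafile='/data/budget.xlsx' out=work.q1 dbms=xlsx replace;
      sheet='Q1 Detail';
    run;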
Joe Schluter, SAS
Henry Feldman, SAS
SAS® Data Integration Studio jobs are not always linear. While Loop transformations have been part of SAS Data Integration Studio for ages, only more recently has SAS Data Integration Studio included the Conditional Control transformations to control logic flow within a job. This paper demonstrates the use of both the Loop and Conditional transformations in a real-world example.
Harry Droogendyk, Stratia Consulting Inc
A common issue in data integration is that the documentation and the SAS® data integration job source code often start to diverge and eventually become out of sync. At Capgemini, working for a specific client, we developed a solution to address this challenge. We proposed moving all necessary documentation into the SAS® Data Integration Studio job itself. In this way, all documentation becomes part of the metadata we have created, with the possibility of automatically generating job and release documentation from the metadata. This presentation therefore focuses on the metadata documentation generator. Specifically, it: 1) looks at how to use programming and documentation standards in SAS data integration jobs to enable the generation of documentation from the metadata; and 2) shows how the documentation is generated from the metadata, and the challenges that were encountered creating the code. I draw on our hands-on experience: Capgemini has implemented this for a customer in the Netherlands, and we are rolling it out as an accelerator in other SAS data integration projects worldwide. I share examples of the generated documentation, which contains functional and technical designs, including a list of all source tables, a list of the target tables, all transformations with their own documentation, job dependencies, and more.
Richard Hogenberg, Capgemini
The fourth maintenance release for SAS® 9.4 and the new SAS® Viya platform bring even more progress with respect to the interoperability between SAS® and Hadoop, the industry standard for big data. This talk brings you up to date with where we are: more distributions, more data types, more options, and then there is the cloud. Come and learn about the exciting new developments for blending your SAS processing with your shared Hadoop cluster.
Paul Kent, SAS
Hospital Information Technologists are faced with a dilemma: how to get the many pharmacy databases, dynamic data sets, and software systems to communicate with each other and generate useful, automated, real-time output. SAS® serves as a unifying tool for our hospital pharmacy. It brings together data from multiple sources, generates output in multiple formats, analyzes trends, and generates summary reports to meet workload, quality, and regulatory requirements. Data sets originate from multiple sources, including drug and device wholesalers, web-based drug information systems, dumb machine output, pharmacy drug-dispensing platforms, hospital administration systems, and others. SAS output includes CSV files that can be read by dispensing machines, report output for Pharmacy and Therapeutics committees, graphs to summarize year-to-year dispensing and quality trends, emails to customers with inventory and expiry date notifications, investigational drug information summaries for hospital staff, inventory trending with restock alerts, and quality assurance summary reports. For clinical trial support, additional output includes randomization codes, data collection forms, blinded enrollment summaries, study subject assignment lists, and others. For business operations, output includes invoices, shipping documents, and customer metrics. SAS brings our pharmacy information systems together and supports an efficient, cost-effective, flexible, and reliable workflow.
Robert MacArthur, Rockefeller University
Arman Altincatal, Evidera
The syntax to combine SAS® data sets is simple: use the SET statement to concatenate, and use the MERGE and BY statements to merge. The data sets themselves, however, might be complex. Combining the data sets might not result in what you need. This paper reviews techniques to perform before you combine data sets, including checking for the following: common variables; common variables with different attributes; duplicate identifiers; duplicate observations; and acceptable match rates.
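Two of the checks, sketched with hypothetical data set names: comparing attributes of common variables via DICTIONARY.COLUMNS, and isolating duplicate identifiers before a merge:

    /* Do common variables have conflicting attributes? */
    proc sql;
      create table work.attr_check as
      select name, memname, type, length
      from dictionary.columns
      where libname = 'WORK' and memname in ('ONE', 'TWO')
      order by name, memname;
    quit;

    /* Which BY values occur more than once? */
    proc sort data=work.one out=work.dup_ids nouniquekey;
      by id;                      /* keeps only observations with duplicated keys */
    run;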
Christopher Bost, Independent SAS Consultant
This paper discusses a set of practical recommendations for optimizing the performance and scalability of your Hadoop system using SAS®. Topics include recommendations gleaned from actual deployments from a variety of implementations and distributions. Techniques cover tips for improving performance and working with complex Hadoop technologies such as Kerberos, techniques for improving efficiency when working with data, methods to better leverage the SAS in Hadoop components, and other recommendations. With this information, you can unlock the power of SAS in your Hadoop system.
Wilbram Hazejager, SAS
Nancy Rausch, SAS
Testing is a weak spot in many data warehouse environments. A lot of the testing is focused on the correct implementation of requirements. But due to the complex nature of analytics environments, a change in a data integration process can lead to unexpected results in totally different and untouched areas. We developed a method to identify unexpected changes often and early by doing a nightly regression test. The test does a full ETL run, compares all output from the test to a baseline, and reports all the changes. This paper describes the process and the SAS® code needed to back up existing data, trigger ETL flows, compare results, and restore situations after a nightly regression test. We also discuss the challenges we experienced while implementing the nightly regression test framework.
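The comparison step can be as simple as looping PROC COMPARE over every table in the baseline library; a hedged sketch follows (the library paths and macro are illustrative, not our production framework):

    libname base '/etl/baseline';
    libname nightly '/etl/nightly';

    proc sql noprint;                        /* list all baseline tables */
      select memname into :tabs separated by ' '
      from dictionary.tables
      where libname = 'BASE';
    quit;

    %macro regress;
      %do i = 1 %to %sysfunc(countw(&tabs));
        %let t = %scan(&tabs, &i);
        proc compare base=base.&t compare=nightly.&t
                     noprint outstats=work.stat_&t;   /* differences summary */
        run;
      %end;
    %mend regress;
    %regress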
Laurent de Walick, PW Consulting
Bas Marsman, NN Bank
Stephan Minnaert, PW Consulting
As a SAS® Visual Analytics administrator, how do you efficiently manage your SAS® LASR environment? How do you ensure reliable data availability to your end users? How do you ensure that your users have the proper permissions to perform their tasks in SAS Visual Analytics? This paper covers some common management issues in SAS Visual Analytics, why and how they might arise, and how to resolve them. It discusses methods of programmatically managing your SAS® LASR Analytic Server and tables, as well as using SAS® Visual Analytics Administrator. Furthermore, it provides a better understanding of the roles in SAS Visual Analytics and demonstrates how to set up appropriate user permissions. Using the methods discussed in this paper can help you improve the end-user experience as well as system performance.
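For example, tables can be loaded, refreshed, and dropped programmatically through the SASIOLA engine (the host, port, and tag below are placeholders for your deployment):

    libname lasr sasiola host="lasrhost" port=10010 tag="vapublic";

    data lasr.sales;                /* load (or reload) an in-memory table */
      set work.sales;
    run;

    proc datasets lib=lasr nolist;  /* drop it when no longer needed       */
      delete sales;
    quit;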
Beena Mathew, SAS
Zuzu Williams, SAS
Amy Gabig, SAS
Using SAS® to query relational databases can be challenging, even for seasoned SAS programmers. SAS/ACCESS® software makes it easy to directly access data on nearly any platform, but there is a lot of under-the-hood functionality that takes time to learn. Here are tips that will get you on your way fast, including understanding and mastering SQL pass-through; efficiently bulk-loading data from SAS into other databases; tuning your SQL queries; and when to use native database versus SAS functionality.
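The first tip, explicit pass-through, looks like this hedged sketch (credentials, path, and tables are placeholders); everything inside the inner parentheses runs in the database:

    proc sql;
      connect to oracle (user=myuser password=mypass path="dwh");
      create table work.top_custs as
      select * from connection to oracle
        ( select cust_id, sum(sales) as total_sales   /* runs entirely in Oracle */
          from sales_fact
          group by cust_id
          having sum(sales) > 100000 );
      disconnect from oracle;
    quit;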
Andrew Clapson, MD Financial Management
The traditional model of SAS® source-code production is for all code to be directly written by users or indirectly written (that is, generated by user-written macros, Lua code, or DATA steps). This model was recently extended to enable SAS macro code to operate on arbitrary text (for example, on HTML) using the STREAM procedure. In contrast, SAS includes many products that operate in the client/server environment and function as follows: 1) the user interacts with the product via a GUI to specify the processing desired; 2) the product saves the user specifications in metadata and generates SAS source code for the target processing; 3) the source code is then run (per user directions) to perform the processing. Many of these products give users the ability to modify the generated code and/or insert their own user-written code. Also, the target code (system-generated plus optional user-written) can be exported or deployed to be run as a stored process, in batch, or in another SAS environment. In this paper, we review the SAS ecosystem contexts where source code is produced, discuss the pros and cons of each approach and why some system-generated code is inelegant, and make some suggestions for determining when to write code manually and when and how to use system-generated code.
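A hedged example of the PROC STREAM extension mentioned above: macro triggers are resolved inside arbitrary text, here a fragment of HTML (the fileref and macro variable are illustrative):

    %let status = PASSED;
    filename rpt temp;

    proc stream outfile=rpt;
    BEGIN
    <html><body>
      <h1>Nightly run: &status</h1>
      <p>Generated on &sysdate9 by &sysuserid..</p>
    </body></html>
    ;;;;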
Thomas Billings, MUFG Union Bank
If you run SAS® and Teradata software with default application and database client encodings, some operations with international character sets will appear to work because you are actually treating character strings as streams of bytes instead of streams of characters. As long as no one in the chain of custody tries to interpret the data as anything other than a stream of bytes, then data can sometimes flow in and out of a database without being altered, giving the appearance of correct operation. But when you need to compare a particular character to an international character, or when your data approaches the maximum size of a character field, then you will run into trouble. To correctly handle international character sets, every layer of software that touches the data needs to agree on the encoding of the data. UTF-8 encoding is a flexible way to handle many single-byte and multi-byte international character sets. To use UTF-8 encoding with international character sets, we need to configure the SAS session encoding, Teradata client encoding, and Teradata server encoding to all agree, so that they are handling UTF-8 encoded data. This paper shows you how to configure SAS and Teradata so that your applications run successfully with international characters.
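On the SAS side, the first check is the session encoding, which is typically set by starting SAS with the UTF-8 configuration shipped in the nls/u8 directory; a quick verification:

    %put Session encoding is &sysencoding;   /* expect utf-8 */
    proc options option=encoding value;
    run;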
Greg Otto, Teradata
Salman Maher, SAS
Austin Swift, SAS
Data is a valuable corporate asset that, when managed improperly, can detract from a company's ability to achieve strategic goals. At 1-800-Flowers.com, Inc. (18F), we have embarked on a journey toward data governance through embracing Master Data Management (MDM). Along the path, we've recognized that in order to protect and increase the value of our data, we must take data quality into consideration at all aspects of data movement in the organization. This presentation discusses the ways that SAS® Data Management is being leveraged by the team at 18F to create and enhance our data quality strategy to ensure data quality for MDM.
Brian Smith, 1800Flowers.com
The latest releases of SAS® Data Management software provide a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features in the product suite include capabilities for working with data from a wide variety of environments and types, including Hadoop, cloud data sources, RDBMSs, files, unstructured data, and streaming, and the ability to perform ETL and ELT transformations in diverse run-time environments, including SAS®, database systems, Hadoop, Spark, SAS® Analytics, cloud, and data virtualization environments. There are also new capabilities for lineage, impact analysis, clustering, and other data governance features, along with enhancements to master data and metadata management support. This paper provides an overview of the latest features of the SAS® Data Management product suite and includes use cases and examples for leveraging product capabilities.
Nancy Rausch, SAS
High-quality analytics works best with the best-quality data. Preparing your data ranges from activities like text manipulation and filtering to creating calculated items and blending data from multiple tables. This paper covers the range of activities you can easily perform to get your data ready. High-performance analytics works best with in-memory data, so getting your data into an in-memory server, and keeping it fresh and secure, are key considerations for in-memory data management. This paper covers how to make small or large data available and how to manage it for analytics. You can choose to perform these activities in a graphical user interface or via batch scripts; this paper describes both approaches. You'll be well-prepared to get your data wrangled into shape for analytics!
Gary Mehler, SAS
Since databases often lack the extensive string-handling capabilities available in SAS®, SAS users are often forced to extract complex character data from the database into SAS for string manipulation. As database vendors make regular expression functionality more widely available for use in SQL, data needs to be moved into SAS for pattern matching, string replacement, and character extraction less often. This paper covers enough regular expression patterns to make you dangerous, demonstrates the various REGEXP SQL functions, and provides practical applications for each.
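A hedged taste of the approach via Oracle pass-through (connection details, table, and columns are placeholders); REGEXP_SUBSTR, REGEXP_REPLACE, and REGEXP_LIKE do the pattern work in-database:

    proc sql;
      connect to oracle (user=myuser password=mypass path="dwh");
      create table work.parsed as
      select * from connection to oracle
        ( select cust_id,
                 regexp_substr(address, '[0-9]+')    as street_no,
                 regexp_replace(phone, '[^0-9]', '') as phone_digits
          from customers
          where regexp_like(email, '@example[.]com$') );
      disconnect from oracle;
    quit;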
Harry Droogendyk, Stratia Consulting Inc
How often have you pulled oodles of data out of the corporate data warehouse down into SAS® for additional processing? This additional processing, sometimes thought to be unique to SAS, includes FIRST. logic, cumulative totals, lag functionality, specialized summarization, and advanced date manipulation. Using the analytical/OLAP and windowing functionality available in many databases (for example, Teradata and Netezza), all of this processing can be performed directly in the database without moving and reprocessing detail data unnecessarily. This presentation illustrates how to increase your coding and execution efficiency by using the database's power through your SAS environment.
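As a hedged sketch of the idea (the Teradata connection and tables are placeholders), a windowed SUM and ROW_NUMBER reproduce cumulative totals and FIRST.-style logic in-database; seq = 1 corresponds to FIRST.cust_id:

    proc sql;
      connect to teradata (user=myuser password=mypass server=tdprod);
      create table work.cum_sales as
      select * from connection to teradata
        ( select cust_id, sale_date, amount,
                 sum(amount) over (partition by cust_id order by sale_date
                                   rows unbounded preceding) as running_total,
                 row_number() over (partition by cust_id
                                    order by sale_date) as seq
          from sales );
      disconnect from teradata;
    quit;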
Harry Droogendyk, Stratia Consulting Inc