Specifying the functional form of a covariate is a fundamental part of developing a regression model. The choice to include a variable as continuous, categorical, or as a spline can be guided by model fit. This paper offers an efficient and user-friendly SAS® macro (%SPECI) to help analysts determine how best to specify the functional form of a covariate in linear, logistic, and survival analysis models. For each model, our macro provides a single-page graphical and statistical comparison report of the covariate treated as a continuous, categorical, and restricted cubic spline variable so that users can easily compare and contrast results. The report includes the residual plot and the distribution of the covariate. You can also include other covariates in the model for multivariable adjustment. The output displays the likelihood ratio statistic, the Akaike information criterion (AIC), and other model-specific statistics. The %SPECI macro is demonstrated using an example data set. The macro uses the REG, LOGISTIC, PHREG, REPORT, SGPLOT, and other procedures in SAS® 9.4.
Sai Liu, Stanford University
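Not the %SPECI macro itself, but a minimal sketch of the kind of side-by-side comparison it automates, assuming a binary outcome event, a covariate age, a pre-created categorical version age_cat, and an example data set work.mydata (all names are illustrative). Each fit's AIC can then be pulled from the FitStatistics table for a one-page comparison:

   /* Fit the covariate three ways and capture the model fit statistics */
   proc logistic data=work.mydata;
      model event(event='1') = age;                      /* continuous */
      ods output FitStatistics=fit_cont;
   run;

   proc logistic data=work.mydata;
      class age_cat / param=ref;
      model event(event='1') = age_cat;                  /* categorical */
      ods output FitStatistics=fit_cat;
   run;

   proc logistic data=work.mydata;
      effect age_spl = spline(age / naturalcubic basis=tpf(noint)
                                    knotmethod=percentiles(5));
      model event(event='1') = age_spl;                  /* restricted cubic spline */
      ods output FitStatistics=fit_spl;
   run;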
Duplicate records in a clinical trial or survey database can jeopardize data quality and integrity and can bias analysis results. These complications often arise in clinical trials, meta-analyses, and registry and observational studies. The common practice for identifying possible duplicates relies on sensitive personal information, such as name, Social Security number (SSN), date of birth, address, and telephone number. However, access to this sensitive information is often limited or restricted entirely. As a data quality control measure, a SAS® program was developed to identify duplicated individuals using non-sensitive information, such as age, gender, race, medical history, vital signs, and laboratory measurements. A probabilistic approach was used by calculating weights for the data elements used to identify duplicates, based on two probabilities: the probability of agreement for an element among matched pairs and the probability of agreement purely by chance among non-matched pairs. For elements with categorical values, agreement was defined as matching pairs sharing the same value. For elements with interval values, agreement was defined as matching values within 1% of the measurement precision range. The probabilities used to compute the matching element weights were estimated using an expectation-maximization (EM) algorithm. The method was then tested on survey and clinical trial data from hypertension studies.
Xiaoli Lu, VA CSPCC
SAS/ACCESS® software grants access to data in third-party database management systems (DBMS), but how do you access data in a DBMS that is not supported by a SAS/ACCESS product? The GROOVY procedure, introduced in SAS® 9.3, lets you retrieve this formerly inaccessible data through a JDBC connection. Groovy is an object-oriented, dynamic programming language executed on the Java Virtual Machine (JVM). Using Microsoft Azure HDInsight as an example, this paper demonstrates how to access and read data into a SAS data set using PROC GROOVY and a JDBC connection.
Lilyanne Zhang, SAS
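A minimal sketch of the pattern, under stated assumptions: the JDBC URL, driver JAR path, driver class, credentials, table, and file paths below are all hypothetical placeholders to adapt to your own HDInsight (or other JDBC) source. The Groovy block extracts the query result to a flat file, which a second step reads into a SAS data set:

   proc groovy;
      add classpath="/drivers/hive-jdbc-standalone.jar";   /* JDBC driver JAR (illustrative path) */
      submit;
         import groovy.sql.Sql
         def sql = Sql.newInstance("jdbc:hive2://myserver:443/default",   // illustrative URL
                                   "user", "password",
                                   "org.apache.hive.jdbc.HiveDriver")
         new File("/tmp/extract.csv").withWriter { w ->
            sql.eachRow("select id, amount from sales") { row ->
               w.writeLine("${row.id},${row.amount}")
            }
         }
         sql.close()
      endsubmit;
   quit;

   /* Read the extracted file into a SAS data set */
   proc import datafile="/tmp/extract.csv" out=work.sales dbms=csv replace;
      getnames=no;
   run;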
Hierarchical models, also known as random-effects models, are widely used for data that consist of collections of units and are hierarchically structured. Bayesian methods offer flexibility in modeling assumptions that enable you to develop models that capture the complex nature of real-world data. These flexible modeling techniques include choice of likelihood functions or prior distributions, regression structure, multiple levels of observational units, and so on. This paper shows how you can fit these complex, multilevel hierarchical models by using the MCMC procedure in SAS/STAT® software. PROC MCMC easily handles models that go beyond the single-level random-effects model, which typically assumes the normal distribution for the random effects and estimates regression coefficients. This paper shows how you can use PROC MCMC to fit hierarchical models that have varying degrees of complexity, from frequently encountered conditional independent models to more involved cases of modeling intricate interdependence. Examples include multilevel models for single and multiple outcomes, nested and non-nested models, autoregressive models, and Cox regression models with frailty. Also discussed are repeated measurement models, latent class models, spatial models, and models with nonnormal random-effects prior distributions.
Fang Chen, SAS
Maura Stokes, SAS
Data is generated every second. The term big data refers to the volume, variety, and velocity of data that is being produced. Now woven into every sector, big data's size and complexity leave organizations struggling to create, manipulate, and manage it. This research identifies and reviews a range of big data techniques within SAS®, highlighting the fundamental opportunities that SAS provides for overcoming a variety of business challenges. Insurance is a data-dependent industry. This research focuses on understanding what SAS can offer to insurance companies and how it could interact with existing customer databases and online, user-generated content. A range of data sources have been identified for this purpose. The research demonstrates how models can be built based on relationships found in past data and then used to identify prospective customers. Principal component analysis, cluster analysis, and neural networks are all considered. You will learn how these techniques can be used to help capture valuable insight, create firm relationships, and support customer feedback. Whether it is prescriptive, predictive, descriptive, or diagnostic analytics, harnessing big data can add background and depth, providing insurance companies with a more complete story. You will see that you can reduce the complexity and dimensionality of data, provide actionable intelligence, and ultimately make more informed business decisions.
Rebecca Peters, University of South Wales
Penny Holborn, University of South Wales
As you know, real-world data (RWD) provides highly valuable and practical insights. But as valuable as RWD is, it still has limitations. It is encounter-based, and we are largely blind to what happens between encounters in the health-care system. The encounters generally occur in a clinical setting that might not reflect actual patient experience. Many of the encounters are subjective interviews, observations, or self-reports rather than objective data. Information flow can be slow (even real time is not fast enough in health care anymore). And some data that could be transformative cannot be captured currently. Select Internet of Things (IoT) data can fill the gaps in our current RWD for certain key conditions and provide missing components that are key to conducting Analytics of Healthcare Things (AoHT), such as direct, objective measurements; data collected in usual patient settings rather than artificial clinical settings; data collected continuously in the patient's setting; insights that carry greater weight in Regulatory and Payer decision-making; and insights that lead to greater commercial value. Teradata has partnered with an IoT company whose technology generates unique data for conditions impacted by mobility or activity. This data can fill important gaps and provide new insights that can help distinguish your value in your marketplace. Join us to hear details of successful pilots that have been conducted as well as ongoing case studies.
Joy King, Teradata
Correlated data arise across disciplines whenever observations are related due to clustering or repeated measurements. When modeling clustered data, hierarchical linear modeling (HLM) is a popular multilevel modeling technique that is widely used in fields such as education and health studies (Gibson and Olejnik, 2003). A typical example of multilevel data involves students nested within classrooms that behave similarly due to shared situational factors. Ignoring this correlation can result in underestimated standard errors and an inflated type-I error rate (Raudenbush and Bryk, 2002). For longitudinal data, many studies have been conducted on continuous outcomes, but fewer have addressed discrete responses over time. Such analyses require models from the conditional, transitional, and marginal classes (Fitzmaurice et al., 2009). Examples of models that enable researchers to account for the autocorrelation among repeated observations include the generalized linear mixed model (GLMM), generalized estimating equations (GEE), alternating logistic regression (ALR), and fixed effects with conditional logit analysis. This study explores these methods, as well as several other options for modeling correlated longitudinal and hierarchical data, within SAS® 9.4 using real data sets. The procedures covered include PROC GLIMMIX, PROC GENMOD, PROC NLMIXED, PROC GEE, PROC PHREG, and PROC MIXED.
Niloofar Ramezani, University of Northern Colorado
It's essential that SAS® users enhance their skills to implement best-practice programming techniques when using Base SAS® software. This presentation illustrates core concepts with examples to ensure that code is readable, clearly written, understandable, structured, portable, and maintainable. Attendees learn how to apply good programming techniques including implementing naming conventions for data sets, variables, programs, and libraries; code appearance and structure using modular design, logic scenarios, controlled loops, subroutines and embedded control flow; code compatibility and portability across applications and operating platforms; developing readable code and program documentation; applying statements, options, and definitions to achieve the greatest advantage in the program environment; and implementing program generality into code to enable its continued operation with little or no modifications.
Kirk Paul Lafler, Software Intelligence Corporation
A/B testing is a form of statistical hypothesis testing, widely used in the modern Internet age, that compares two business options (A and B) to determine which is more effective. The challenge for startups or new-product businesses leveraging A/B testing is twofold: a small number of customers and a poor understanding of their responses. This paper shows you how to use the IML and POWER procedures to reassess the sample size for adaptive multiple business-stage designs based on conditional power arguments, using the data observed at the previous business stage.
Bo Zhang, IBM
Liwei Wang, Pharmaceutical Product Development Inc
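The fixed-design sample-size piece can be illustrated with PROC POWER; the adaptive reassessment based on conditional power is then layered on top with PROC IML. A minimal sketch, assuming hypothetical conversion rates for options A and B:

   proc power;
      twosamplefreq test=pchi
         groupproportions = (0.10 0.13)   /* assumed conversion rates for A and B */
         alpha            = 0.05
         power            = 0.80
         npergroup        = .;            /* solve for the per-group sample size */
   run;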
This presentation discusses the options for including continuous covariates in regression models. In his book, 'Clinical Prediction Models,' Ewout Steyerberg presents a hierarchy of procedures for continuous predictors, starting with dichotomizing the variable and moving to modeling the variable using restricted cubic splines or using a fractional polynomial model. This presentation discusses all of the choices, with a focus on the last two. Restricted cubic splines express the relationship between the continuous covariate and the outcome using a set of cubic polynomials, which are constrained to meet at pre-specified points, called knots. Between the knots, each curve can take on the shape that best describes the data. A fractional polynomial model is another flexible method for modeling a relationship that is possibly nonlinear. In this model, polynomials with noninteger and negative powers are considered, along with the more conventional square and cubic polynomials, and the small subset of powers that best fits the data is selected. The presentation describes and illustrates these methods at an introductory level intended to be useful to anyone who is familiar with regression analyses.
Ruth Croxford, Institute for Clinical Evaluative Sciences
Session 1472-2017:
Differential Item Functioning Using SAS®: An Item Response Theory Approach for Graded Responses
Until recently, psychometric analyses of test data within the Item Response Theory (IRT) framework were conducted using specialized commercial software. However, with the inclusion of the IRT procedure in the suite of SAS® statistical tools, SAS users can explore the psychometric properties of test items using modern test theory, or IRT. Considering the item as the unit of analysis, the relationship between test items and the constructs they measure can be modeled as a function of an unobservable, or latent, variable. This latent variable or trait (for example, ability or proficiency) varies in the population. When examinees with the same trait level do not have the same probability of answering an item correctly or endorsing it, the item is said to be functioning differently, or exhibiting differential item functioning (DIF) (Thissen, Steinberg, and Wainer, 2012). This study introduces the implementation of PROC IRT for conducting a DIF analysis for graded responses, using Samejima's graded response model (GRM; Samejima, 1969, 2010). The effectiveness of PROC IRT for evaluating DIF items is assessed in terms of the Type I error and statistical power of the likelihood ratio test for testing DIF in graded responses.
Patricia Rodríguez de Gil, University of South Florida
Randomized controlled trials have long been considered the gold standard for establishing causal treatment effects. Can causal effects be reasonably estimated from observational data too? In observational studies, you observe treatment T and outcome Y without controlling confounding variables that might explain the observed associations between T and Y. Estimating the causal effect of treatment T therefore requires adjustments that remove the effects of the confounding variables. The new CAUSALTRT (causal-treat) procedure in SAS/STAT® 14.2 enables you to estimate the causal effect of a treatment decision by modeling either the treatment assignment T or the outcome Y, or both. Specifically, modeling the treatment leads to the inverse probability weighting methods, and modeling the outcome leads to the regression methods. Combined modeling of the treatment and outcome leads to doubly robust methods that can provide unbiased estimates for the treatment effect even if one of the models is misspecified. This paper reviews the statistical methods that are implemented in the CAUSALTRT procedure and includes examples of how you can use this procedure to estimate causal effects from observational data. This paper also illustrates some other important features of the CAUSALTRT procedure, including bootstrap resampling, covariate balance diagnostics, and statistical graphics.
Michael Lamm, SAS
Yiu-Fai Yung, SAS
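A minimal sketch of a doubly robust (AIPW) analysis with PROC CAUSALTRT, using illustrative variable names; treat, outcome, and the covariates are assumptions, not the paper's example:

   proc causaltrt data=work.study method=aipw;
      class treat sex;
      psmodel treat = age sex income;     /* model for the treatment assignment T */
      model   outcome = age sex income;   /* model for the outcome Y */
   run;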
Model validation is an important step in the model-building process because it provides opportunities to assess the reliability of models before their deployment. Predictive accuracy measures the ability of the models to predict future risks, and significant developments have been made in recent years in the evaluation of survival models. SAS/STAT® 14.2 includes updates to the PHREG procedure with a variety of techniques to calculate overall concordance statistics and time-dependent receiver operating characteristic (ROC) curves for right-censored data. This paper describes how to use these criteria to validate and compare fitted survival models and presents examples to illustrate these applications.
Changbin Guo, SAS
Ying So, SAS
Woosung Jang, SAS
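A minimal sketch of these options, assuming a right-censored data set with survival time time, censoring indicator status (0 = censored), and illustrative covariates; the time points in ROCOPTIONS are placeholders:

   proc phreg data=work.surv concordance=uno plots=roc rocoptions(at=1 3 5);
      model time*status(0) = age trt biomarker;
   run;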
SAS/STAT® software has several procedures that estimate parameters from generalized linear models designed for both continuous and discrete response data (including proportions and counts). Procedures such as LOGISTIC, GENMOD, GLIMMIX, and FMM, among others, offer a flexible range of analysis options for data from a variety of distributions and for correlated or clustered data. SAS® procedures can also model zero-inflated and truncated distributions. This paper demonstrates how PROC NLMIXED statements can be written to match the output results from these procedures, including the LS-means. Situations arise where the flexible programming statements of PROC NLMIXED are needed, such as zero-inflated or hurdle models, truncated counts, or proportions (including legitimate zeros) that have random effects, as well as probability distributions not available elsewhere. A useful consequence of these coding techniques is that programming statements from NLMIXED can often be transferred directly into PROC MCMC with little or no modification, to perform analyses of these various types of complex models from a Bayesian perspective.
Robin High, University of Nebraska Medical Center
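As a flavor of the approach, here is a Poisson log-linear model written with NLMIXED programming statements (illustrative variable names); the estimates should match PROC GENMOD with DIST=POISSON LINK=LOG:

   proc genmod data=work.counts;
      model y = x1 x2 / dist=poisson link=log;
   run;

   proc nlmixed data=work.counts;
      parms b0=0 b1=0 b2=0;
      eta    = b0 + b1*x1 + b2*x2;   /* linear predictor */
      lambda = exp(eta);             /* log link */
      model y ~ poisson(lambda);
   run;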
Cumulative logistic regression models are used to predict an ordinal response. They rely on the assumption of proportional odds: the coefficient of each predictor must be consistent (parallel slopes) across all levels of the response. This paper uses a sample data set to demonstrate how to test the proportional odds assumption and shows how to use the UNEQUALSLOPES option when the assumption is violated. A cumulative logistic regression model is built, and its performance on a test set is compared to the performance of a generalized multinomial model. This comparison shows the utility and necessity of the UNEQUALSLOPES option when building a cumulative logistic regression model. The procedures shown are produced using SAS® Enterprise Guide® 7.1.
Shana Kelly, Spectrum Health
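A minimal sketch with illustrative variable names: the score test for the proportional odds assumption is reported by default, and UNEQUALSLOPES relaxes the assumption for the named predictors only.

   proc logistic data=work.train;
      class region / param=ref;
      model rating = income region / unequalslopes=(region);
   run;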
Longitudinal count data arise when a subject's outcomes are measured repeatedly over time. Repeated measures count data have an inherent within-subject correlation that is commonly modeled with random effects in standard Poisson regression. A Poisson regression model with random effects is easily fit in SAS® using existing options in the NLMIXED procedure. This model allows for overdispersion via the nature of the repeated measures; however, departures from equidispersion can also exist due to the underlying count process mechanism. We present an extension of the cross-sectional COM-Poisson (CMP) regression model established by Sellers and Shmueli (2010), a generalized regression model for count data in light of inherent data dispersion, to incorporate random effects for the analysis of longitudinal count data. We detail how to fit the CMP longitudinal model via a user-defined log-likelihood function in PROC NLMIXED, and we demonstrate its flexibility with simulated and real data examples.
Darcy Morris, U.S. Census Bureau
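Not the authors' exact parameterization, but a minimal sketch of how a COM-Poisson likelihood with a random intercept can be coded as a user-defined log-likelihood in PROC NLMIXED (the normalizing constant is approximated by a truncated sum; all variable names are illustrative):

   proc nlmixed data=work.long_counts;
      parms b0=0 b1=0 lognu=0 logs2=-1;
      nu     = exp(lognu);               /* dispersion parameter */
      s2     = exp(logs2);               /* random-intercept variance */
      lambda = exp(b0 + b1*time + u);
      Z = 1;                             /* j = 0 term of the normalizing constant */
      do j = 1 to 100;                   /* truncated infinite sum */
         Z = Z + exp(j*log(lambda) - nu*lgamma(j+1));
      end;
      ll = y*log(lambda) - nu*lgamma(y+1) - log(Z);
      model y ~ general(ll);
      random u ~ normal(0, s2) subject=id;
   run;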
The increasing complexity of data in research and business analytics requires versatile, robust, and scalable methods of building explanatory and predictive statistical models. Quantile regression meets these requirements by fitting conditional quantiles of the response with a general linear model that assumes no parametric form for the conditional distribution of the response; it gives you information that you would not obtain directly from standard regression methods. Quantile regression yields valuable insights in applications such as risk management, where answers to important questions lie in modeling the tails of the conditional distribution. Furthermore, quantile regression is capable of modeling the entire conditional distribution; this is essential for applications such as ranking the performance of students on standardized exams. This expository paper explains the concepts and benefits of quantile regression, and it introduces you to the appropriate procedures in SAS/STAT® software.
Robert Rodriguez, SAS
Yonggang Yao, SAS
The analysis of longitudinal data requires a model that correctly accounts for both the inherent correlation among the responses that results from the repeated measurements and the feedback between the responses and predictors at different time points. Lalonde, Wilson, and Yin (2013) developed an approach based on the generalized method of moments (GMM) for identifying and using valid moment conditions to account for time-dependent covariates in longitudinal data with binary outcomes. However, the model developed using this approach does not provide information about the specific relationships that exist across time points. We present a SAS® macro that extends the work of Lalonde, Wilson, and Yin by using valid moment conditions to estimate and evaluate the relationships between the response and predictors at different time periods. The performance of this method is compared to previously established results.
Jeffrey Wilson, Arizona State University
An important component of insurance pricing is the insured location and the associated riskiness of that location. Recently, we have seen a large increase in the availability of external risk classification variables and associated risk factors by geospatial location. As additional geospatial data becomes available, it is prudent for insurers to take advantage of the new information to better match price to risk. Generalized additive models using penalized likelihood (GAMPL) have been explored as a way to incorporate new location-based information. This type of model can leverage the new geospatial information and combine it with traditional insurance rating variables in a regression-based rating framework. We propose a local regression model used in conjunction with a GAMPL model. Our discussion demonstrates the use of the LOESS procedure as well as the GAMPL procedure, both in SAS/STAT® software, in a combined solution. We discuss in detail how we built a local regression model and used the predictions from this model as an offset in a generalized additive model, and we compare the results of the combined approach to the results of each model individually.
Kelsey Osterloo, State Farm Insurance Company
Angela Wu, State Farm Insurance Company
When creating statistical models that include multiple covariates (for example, Cox proportional hazards models or multiple linear regression), it is important to identify which variables are categorical and which are continuous for proper analysis and interpretation in SAS®. Categorical variables, regardless of SAS data type, should be specified in the MODEL statement and also listed in a CLASS statement. In larger models containing many continuous or categorical variables, it is easy to overlook variables that should be added to the CLASS statement. To solve this problem, we have created a macro that uses simple input from the model variables, PROC CONTENTS, and additional logic checks to create the necessary CLASS statement and run the desired model. With this macro, variables are evaluated on multiple conditions to see whether they should be considered class variables, and if so, they are added automatically to the CLASS statement.
Erica Goodrich, Brigham and Women's Hospital
Daniel Sturgeon, Brigham and Women's Hospital
Kathryn Schurr, Quest Diagnostics
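Not the authors' macro, but a minimal sketch of the idea for a Cox model: character variables among the requested covariates are treated as CLASS variables (numeric coded categories would need the additional logic checks the paper describes). All names are illustrative.

   %macro run_cox(data=, timevar=, censvar=, xvars=);
      %local classvars;
      %let classvars=;
      proc contents data=&data out=_meta(keep=name type) noprint;
      run;
      proc sql noprint;
         select name into :classvars separated by ' '
         from _meta
         where type = 2                                         /* character variables */
           and indexw(upcase("&xvars"), strip(upcase(name))) > 0;
      quit;
      proc phreg data=&data;
         %if %length(&classvars) %then %do;
            class &classvars;
         %end;
         model &timevar*&censvar(0) = &xvars;
      run;
   %mend run_cox;

   %run_cox(data=work.cohort, timevar=time, censvar=death, xvars=age sex stage);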
Session 1530-2017:
Getting Started with Machine Learning
Machine learning algorithms have been available in SAS® software since 1979. This session provides practical examples of machine learning applications. The evolution of machine learning at SAS is illustrated with examples ranging from nearest-neighbor discriminant analysis in SAS/STAT® PROC DISCRIM to advanced predictive modeling in SAS® Enterprise Miner. Machine learning techniques addressed include memory-based reasoning, decision trees, neural networks, and gradient boosting algorithms.
Terry Woodfield, SAS
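For example, a k-nearest-neighbor discriminant analysis with cross validation, using a data set shipped with SAS:

   proc discrim data=sashelp.iris method=npar k=5 crossvalidate;
      class Species;
      var SepalLength SepalWidth PetalLength PetalWidth;
   run;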
Session 1527-2017:
Getting Started with Multilevel Modeling
In this presentation you will learn the basics of working with nested data, such as students within classes, customers within households, or patients within clinics, through the use of multilevel models. Multilevel models can accommodate correlation among nested units through random intercepts and slopes, and they generalize easily to two, three, or more levels of nesting. These models represent a statistically efficient and powerful way to test your key hypotheses while accounting for the hierarchical nesting of the design. The GLIMMIX procedure is used to demonstrate the analyses in SAS®.
Catherine Truxillo, SAS
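A minimal two-level sketch (students nested within classes) with illustrative variable names; the RANDOM statement adds a class-level random intercept and random slope:

   proc glimmix data=work.schools;
      class class_id;
      model score = ses hours_studied / solution;
      random intercept ses / subject=class_id type=un;
   run;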
Because many SAS® users either work for or own companies that house big data, the threat that malicious software poses becomes even more extreme. Malicious software, often abbreviated as malware, includes many different classifications, ways of infection, and methods of attack. This E-Poster highlights the types of malware, detection strategies, and removal methods. It provides guidelines to secure essential assets and prevent future malware breaches.
Ryan Lafler
Session SAS2009-2017:
Hands-On Workshop: Statistical Analysis using SAS® University Edition
This workshop provides hands-on experience performing statistical analysis with the Statistics tasks in SAS Studio. Workshop participants will learn to perform statistical analyses using tasks, evaluate which tasks are ideal for different kinds of analyses, edit the generated code, and customize a task.
Danny Modlin, SAS
Another year implementing, validating, securing, optimizing, migrating, and adopting the Hadoop platform. What were the top 10 accomplishments with Hadoop over the last year? We also review issues, concerns, and resolutions from the past year, and we discuss where implementations stand and some best practices for moving forward with Hadoop and SAS® releases.
Howard Plemmons, SAS
Mauro Cazzari, SAS
Investors usually trade stocks or exchange-traded funds (ETFs) based on a methodology, such as a theory, a model, or a specific chart pattern. There are more than 10,000 securities listed on the US stock market. Picking the right one based on a methodology from so many candidates is usually a big challenge. This paper presents a methodology based on the CANSLIM theorem and the momentum trading (MT) theorem. We often hear of the cup-and-handle shape (C&H), double bottoms and multiple bottoms (MB), support and resistance lines (SRL), market direction (MD), fundamental analyses (FA), and technical analyses (TA). These are all covered in the CANSLIM theorem. MT is a trading theorem based on stock moving direction, or momentum. Both theorems are easy to learn but difficult to apply without an appropriate tool. Brokers' application systems usually cannot provide such filtering because of its complexity. For example, for C&H, where is the handle located? For MB, where is the last bottom you should trade at? Now, this challenging task can be accomplished through SAS®. This paper presents methods for applying the logic and presenting the results graphically through SAS. All SAS users, especially those who work directly in the capital markets business, can benefit from reading this document to achieve their investment goals. Much of the programming logic can also be adapted in SAS finance packages for clients.
Brian Shen, Merlin Clinical Service LLC
In the increasingly competitive environment for banks and credit unions, every potential advantage should be pursued. One of these advantages is to market additional products to your existing customers rather than to new customers, since your existing customers already know (and hopefully trust) you, and you have so much data on them. But how can this best be done? How can you market the right products to the right customers at the right time? Predictive analytics can do this by forecasting which customers have the highest chance of purchasing a given financial product. This paper provides a step-by-step overview of a relatively simple but comprehensive approach to maximize cross-sell opportunities among your customers. We first prepare the data for a statistical analysis. With some basic predictive analytics techniques, we can then identify those members who have the highest chance of buying a financial product. For each of these members, we can also gain insight into why they would purchase, thus suggesting the best way to market to them. We then make suggestions to improve the model for better accuracy.
Nate Derby
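One simple version of the scoring step, assuming a historical data set of members flagged by whether they bought the product and a current member file to score (all names are illustrative):

   proc logistic data=work.history;
      class segment / param=ref;
      model bought(event='1') = age tenure balance n_products segment;
      score data=work.current_members out=work.scored;
   run;

   /* rank members by their predicted probability of purchase */
   proc sort data=work.scored;
      by descending P_1;
   run;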
Meta-analysis is a method for combining multiple independent studies on the same subject or question, producing a single large study with increased accuracy and enhanced ability to detect overall trends and smaller effects. This is done by treating the results of each study as a single observation and performing analysis on the set, while controlling for differences between individual studies. These differences can be treated as either fixed or random effects, depending on context. This paper demonstrates the process and techniques used in meta-analysis using human trafficking studies. This problem has seen increasing interest in the past few years, and there are now a number of localized studies for one state or a metropolitan area. This meta-analysis combines these to begin development of a comprehensive analytic understanding of human trafficking across the United States. Both fixed and random effects are described. All elements of this analysis were performed using SAS® University Edition.
David Corliss, Peace-Work
Heather Hill, Peace-Work
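For the fixed-effect case, inverse-variance pooling can be done in a short DATA step: one record per study, with an effect estimate and its standard error (a random-effects analysis adds a between-study variance component). Variable names are illustrative:

   data pooled;
      set studies end=last;          /* variables: effect (e.g., log odds ratio), se */
      w       = 1 / se**2;           /* inverse-variance weight */
      sum_w  + w;
      sum_wy + w*effect;
      if last then do;
         pooled_effect = sum_wy / sum_w;
         pooled_se     = sqrt(1 / sum_w);
         lower = pooled_effect - 1.96*pooled_se;
         upper = pooled_effect + 1.96*pooled_se;
         output;
      end;
      keep pooled_effect pooled_se lower upper;
   run;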
A microservice architecture prescribes the design of your software application as suites of independently deployable services. In this paper, we detail how you can design your SAS® 9.4 programs so that they adhere to a microservice architecture. We also describe how you can leverage Many-Task Computing (MTC) in your SAS® programs to gain a high level of parallelism. Under these paradigms, your SAS code will gain encapsulation, robustness, reusability, and performance. The design principles discussed in this paper are implemented in the SAS® Infrastructure for Risk Management (IRM) solution. Readers with an intermediate knowledge of Base SAS® and the SAS macro language will understand how to design their SAS code so that it follows these principles and reaps the benefits of a microservice architecture.
Henry Bequet, SAS
Dynamic social networks can be used to monitor the constantly changing nature of interactions and relationships between people and groups. The size and complexity of modern dynamic networks can make this task extremely challenging. Using the combination of SAS/IML®, SAS/QC®, and R, we propose a fast approach to monitoring dynamic social networks. A discrepancy score at the edge level was developed to measure the unusualness of the observed social network. Multivariate and univariate change-point detection methods were then applied to the aggregated discrepancy score to identify the edges and vertices that have experienced changes. Stochastic block model (SBM) networks were simulated to demonstrate this method using SAS/IML and R. PROC SHEWHART and PROC CUSUM in SAS/QC and PROC SGRENDER heat maps were applied to the aggregated discrepancy score to monitor the dynamic social network. The combination of SAS/IML, SAS/QC, and R makes this an ideal toolset for monitoring dynamic social networks.
Huan Li, The University of Alabama
Michael Porter, The University of Alabama
The SAS/IML® language excels in handling matrices and performing matrix computations. A new feature in SAS/IML 14.2 is support for nonmatrix data structures such as tables and lists. In a matrix, all elements are of the same type: numeric or character. Furthermore, all rows have the same length. In contrast, SAS/IML 14.2 enables you to create a structure that contains many objects of different types and sizes. For example, you can create an array of matrices in which each matrix has a different dimension. You can create a table, which is an in-memory version of a data set. You can create a list that contains matrices, tables, and other lists. This paper describes the new data structures and shows how you can use them to emulate other structures such as stacks, associative arrays, and trees. It also presents examples of how you can use collections of objects as data structures in statistical algorithms.
Rick Wicklin, SAS
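A minimal sketch of the new structures, using the list and table functions introduced in SAS/IML 14.2 (the data-set example uses Sashelp.Class):

   proc iml;
      /* a list can hold matrices of different types and sizes */
      L  = ListCreate();
      m1 = {1 2 3, 4 5 6};                        /* 2 x 3 numeric matrix */
      c1 = {"a" "b" "c" "d"};                     /* 1 x 4 character matrix */
      call ListAddItem(L, m1);
      call ListAddItem(L, c1);
      m  = ListGetItem(L, 1);                     /* retrieve the first item */
      print (ListLen(L))[label="Number of items"], m;

      /* a table is an in-memory version of a data set */
      tbl = TableCreateFromDataSet("Sashelp", "Class");
      age = TableGetVarData(tbl, "Age");          /* extract one column as a vector */
      print (age[:])[label="Mean age"];
   quit;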
In item response theory (IRT), the distribution of examinees' abilities is needed to estimate item parameters. However, specifying the ability distribution is difficult, if not impossible, because examinees' abilities are latent variables. Therefore, IRT estimation programs typically assume that abilities follow a standard normal distribution. When item parameters are estimated in two separate computer runs, one problem with this approach is that the item parameter estimates obtained from two groups that differ in ability level end up on different scales. There are several methods that can be used to place the item parameter estimates on a common scale, one of which is multi-group calibration. This method is also called concurrent calibration because all items are calibrated concurrently in a single computer run. There are two ways to implement multi-group calibration in SAS®: 1) using PROC IRT, or 2) writing an algorithm from scratch using SAS/IML®. The purpose of this study is threefold. First, the accuracy of the item parameter estimates is evaluated using a simulation study. Second, the item parameter estimates are compared to those produced by the item calibration program flexMIRT. Finally, the advantages and disadvantages of using these two approaches to conduct multi-group calibration are discussed.
Kyung Yong Kim, University of Iowa
Seohee Park, University of Iowa
Jinah Choi, University of Iowa
Hongwook Seo, ACT
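A minimal sketch of the PROC IRT route, assuming the two groups' responses are stacked in one data set with a group identifier (item count, response model, and variable names are illustrative):

   proc irt data=work.both_groups;
      var item1-item30;
      model item1-item30 / resfunc=twop;   /* two-parameter logistic model */
      group formid;                        /* concurrent (multi-group) calibration */
   run;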
Multicollinearity can be briefly described as the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, and tells how to detect multicollinearity and how to reduce it once it is found. In order to demonstrate the effects of multicollinearity and how to combat it, this paper explores the proposed techniques by using the Behavioral Risk Factor Surveillance System data set. This paper is intended for any level of SAS® user. This paper is also written to an audience with a background in behavioral science or statistics.
Deanna Schreiber-Gregory, National University
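For example, variance inflation factors, tolerances, and collinearity diagnostics are requested with MODEL statement options in PROC REG (variable names are illustrative, not the actual BRFSS fields):

   proc reg data=work.brfss;
      model health_score = age bmi income exercise smoke / vif tol collin;
   run;
   quit;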
A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release for SAS® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation, you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Using earlier versions of SAS to create multi-sheet workbooks is also discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Vince DelGobbo, SAS
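A minimal sketch of the multi-sheet pattern, with an illustrative file path; SHEET_INTERVAL and SHEET_NAME control how output is split across worksheets:

   ods excel file="/reports/summary.xlsx"
       options(sheet_interval="proc" sheet_name="Class Listing" embedded_titles="yes");
   title "Student Listing";
   proc print data=sashelp.class noobs; run;

   ods excel options(sheet_name="Summary Statistics");
   proc means data=sashelp.class mean min max;
      class sex;
      var height weight;
   run;
   ods excel close;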
Multicategory logit models extend the techniques of logistic regression to response variables with three or more categories. For ordinal response variables, a cumulative logit model assumes that the effect of an explanatory variable is identical for all modeled logits (known as the assumption of proportional odds). Past research supports the finding that as the sample size and number of predictors increase, it is unlikely that proportional odds can be assumed across all predictors. An emerging method to model this relationship effectively uses a partial proportional odds model, fit with unique parameter estimates at each level of the modeled relationship only for the predictors for which proportionality cannot be assumed. First available in SAS/STAT® 12.1, this functionality is extended in PROC LOGISTIC in SAS® 9.4 to variable selection methods, so that all equal-slope and unequal-slope parameters are available for effect selection. Previously, the statistician was required to assess predictor non-proportionality a priori through likelihood tests or subjectively through graphical diagnostics. Following a review of statistical methods and of the limitations of other commercially available software for modeling data that exhibit non-proportional odds, a public-use data set is used to examine the new functionality in PROC LOGISTIC using stepwise variable selection methods. Model diagnostics and the improvement in prediction compared to a general cumulative model are noted.
Paul Hilliard, Educational Testing Service (ETS)
This paper explores the utilization of medical services, which has a characteristic exponential distribution. Because of this characteristic, a generalized linear model can be applied to obtain self-managed health plan rates. This approach differs from what is generally used to set health plan rates, and it captures qualitative characteristics of the exposed participants that older rate-making methods cannot capture. This paper also uses generalized linear models to estimate the number of days that individuals remain hospitalized. The method is developed in a SAS® Enterprise Guide® project, in which the utilization of medical services by the insured population during the years 2012 through 2015 (the last year in the database) is compared with the Hospital Cost Index of Variation. The results show that, among the variables chosen for the model, income has an inverse relationship with the risk of health care expenses: individuals with higher earnings tend to use fewer of the services offered by the health plan. Male individuals have higher expenditures than female individuals, and this is reflected in the statistically determined rate. Finally, the model is able to generate tables of rates that can be charged to plan participants for health plans that cover all average risks.
Luiz Carlos Leao, Universidade Federal Fluminense (UFF)
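The core of such a rating model is a right-skewed GLM with a log link; a minimal sketch using a gamma distribution (of which the exponential is a special case) and illustrative variable names:

   proc genmod data=work.claims;
      class sex;
      model annual_cost = age sex income / dist=gamma link=log;
   run;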
Bayesian inference has become ubiquitous in applied science because of its flexibility in modeling data and advances in computation that allow special methods of simulation to obtain sound estimates when more mathematical approaches are intractable. However, when the sample size is small, the choice of a prior distribution becomes difficult. Computationally convenient choices for prior distributions can overstate prior beliefs and bias the estimates. We propose a simple form of prior distribution, a mixture of two uniform distributions, that is weakly informative, in that the prior distribution has a relatively large standard deviation. This choice leads to closed-form expressions for the posterior distribution if the observed data follow a normal, binomial, or Poisson distribution. The explicit formulas are easily encoded in SAS®. For a small sample size of 10, we illustrate how to elicit the mixture prior and indicate that the resulting posterior distribution is insensitive to minor misspecification of input values. Weakly informative prior distributions suitable for small sample sizes are easy to specify and appear to provide robust inference.
Robert Lew, U.S. Department of Veterans Affairs
Hongsheng Wu, Wentworth Institute of Technology
Jones Yu, Wentworth Institute of Technology
In a randomized study, subjects are randomly assigned to either a treated group or a control group. Random assignment ensures that the distribution of the covariates is the same in both groups and that the treatment effect can be estimated by directly comparing the outcomes for the subjects in the two groups. In contrast, subjects in an observational study are not randomly assigned. In order to establish causal interpretations of the treatment effects in observational studies, special statistical approaches that adjust for the covariate confounding are required to obtain unbiased estimation of causal treatment effects. One strategy for correctly estimating the treatment effect is based on the propensity score, which is the conditional probability of the treatment assignment given the observed covariates. Prior to the analysis, you use propensity scores to adjust the data by weighting observations, stratifying subjects that have similar propensity scores, or matching treated subjects to control subjects. This paper reviews propensity score methods for causal inference and introduces the PSMATCH procedure, which is new in SAS/STAT® 14.2. The procedure provides methods of weighting, stratification, and matching. Matching methods include greedy matching, matching with replacement, and optimal matching. The procedure assesses covariate balance by comparing distributions between the adjusted treated and control groups.
Yang Yuan, SAS
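A minimal sketch of a greedy 1:1 matching analysis in PROC PSMATCH, with illustrative variable names and caliper:

   proc psmatch data=work.study region=allobs;
      class treat sex;
      psmodel treat(treated='1') = age sex bmi income;
      match method=greedy(k=1) distance=lps caliper=0.25;
      assess lps var=(age bmi income) / plots=all;      /* covariate balance diagnostics */
      output out(obs=match)=work.matched matchid=pair_id;
   run;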
Predictive analytics has been evolving in property and casualty (P&C) insurance for the past two decades. This paper first provides a high-level overview of predictive analytics in each of the core business operations of the P&C insurance industry: marketing, underwriting, actuarial pricing, actuarial reserving, and claims. Then, a common P&C insurance predictive modeling technical process in SAS® for dealing with large data sets is introduced. The steps of this process include data acquisition, data preparation, variable creation, variable selection, model building (also known as model fitting), model validation, model testing, and so on. Finally, some successful models are introduced. Base SAS®, SAS/STAT® software, SAS® Enterprise Guide®, and SAS® Enterprise Miner are presented as the main tools for this process. This predictive modeling process could be tweaked or used directly in many other industries, because the statistical foundations of predictive analytics overlap greatly across the P&C insurance, health care, life insurance, banking, pharmaceutical, and genetics industries. This paper is intended for SAS® users of any level and for business people from different industries who are interested in learning about general predictive analytics.
Mei Najim, Gallagher Bassett
We live in a world of data: small data, big data, and data of every conceivable size in between. In today's world, data finds its way into our lives wherever we are. We talk about data, create data, read data, transmit data, receive data, and save data constantly during any given hour in a day, and we still want and need more. So, we collect even more data at work, in meetings, at home, on our smartphones, in emails, in voice messages, sifting through financial reports, analyzing profits and losses, watching streaming videos, playing computer games, comparing sports teams and favorite players, and in countless other ways. Data is growing and being collected at such astounding rates, all in the hope of being able to better understand the world around us. As SAS® professionals, we find that the world of data offers many new and exciting opportunities, but it also presents the frightening realization that data sources might very well contain a host of integrity issues that need to be resolved first. This presentation describes the available methods for removing duplicate observations (or rows) from data sets (or tables), based on the rows' values and keys, using SAS.
Kirk Paul Lafler, Software Intelligence Corporation
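For example, PROC SORT can remove rows that duplicate a key (or that are identical across all variables) and set the removed rows aside for review; the key variables below are illustrative:

   /* remove rows that duplicate the key, saving the duplicates for inspection */
   proc sort data=work.have out=work.deduped nodupkey dupout=work.dups;
      by cust_id visit_date;
   run;

   /* or remove rows that are identical across all variables */
   proc sort data=work.have out=work.unique noduprecs;
      by _all_;
   run;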
The purpose of this paper is to present a SAS® macro named %SURVEYGENMOD, implemented with the SAS/IML® procedure, as an upgrade of the %SURVEYGLM macro developed by Silva and Silva (2014) to handle complex survey designs in generalized linear models (GLMs). The new capabilities are the inclusion of the negative binomial distribution, the zero-inflated Poisson (ZIP) model, the zero-inflated negative binomial (ZINB) model, and the possibility of obtaining estimates for domains. The R function svyglm (Lumley, 2004) and Stata software were used as benchmarks, and the results showed that the estimates generated by the %SURVEYGENMOD macro are close to those from the R function and Stata.
Alan Ricardo da Silva, University of Brasilia
Has the rapid pace of SAS/STAT® releases left you unaware of powerful enhancements that could make a difference in your work? Are you still using PROC REG rather than PROC GLMSELECT to build regression models? Do you understand how the GENMOD procedure compares with the newer GEE and HPGENSELECT procedures? Have you grasped the distinction between PROC PHREG and PROC ICPHREG? This paper will increase your awareness of modern alternatives to well-established tools in SAS/STAT by using succinct, high-level comparisons rather than detailed descriptions to explain the relative benefits of procedures and methods. The paper focuses on alternatives in the areas of regression modeling, mixed models, generalized linear models, and survival analysis. When you see the advantages of these newer tools, you will want to put them into practice. This paper points you to helpful resources for getting started.
Robert Rodriguez, SAS
Phil Gibbs, SAS
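For example, PROC GLMSELECT extends PROC REG's model building with modern selection methods such as the lasso (variable names are illustrative):

   proc glmselect data=work.train plots=coefficients;
      model y = x1-x20 / selection=lasso(choose=aicc stop=none);
   run;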
There is an industry-wide push toward making workflows seamless and reproducible. Incorporating reproducibility into the workflow has many benefits; among them are increased transparency, time savings, and accuracy. We walk through how to seamlessly integrate SAS®, LaTeX, and R into a single reproducible document. We also discuss best practices for general principles such as literate programming and version control.
Lucy D'Agostino McGowan, Vanderbilt University
Every visualization tells a story. The effectiveness of showing data through visualization becomes clear in these stories about differences in US mortality, told using the National Longitudinal Mortality Study (NLMS) data: the Public-Use Microdata Samples (PUMS) of 1.2 million cases and 122 thousand mortality records. SAS® Visual Analytics is a versatile and flexible tool that easily displays the simple effects of differences in mortality rates between age groups, genders, races, places of birth (native or foreign), education and income levels, and so on. Sophisticated analyses, including logistic regression (with interactions), decision trees, and neural networks, displayed in a clear, concise manner, help describe more interesting relationships among the variables that influence mortality. Some of the most compelling examples: males who live alone have a higher mortality rate than females, and white men have higher rates of suicide than black men.
Catherine Loveless-Schmitt, U.S. Census Bureau
Does your job require you to create reports in Microsoft Excel on a quarterly, monthly, or even weekly basis? Are you creating all or part of these reports by hand, referencing another sheet containing rows and rows and rows of data? If so, stop! There is a better way! The new ODS destination for Excel enables you to create native Excel files directly from SAS®. Now you can include just the data you need, create great-looking tabular output, and do it all in a fraction of the time! This paper shows you how to use the REPORT procedure to create polished tables that contain formulas, colored cells, and other customized formatting. Also presented in the paper are the destination options used to create various workbook structures, such as multiple tables per worksheet. Using these techniques to automate the creation of your Excel reports will save you hours of time and frustration, enabling you to pursue other endeavors.
Jane Eslinger, SAS
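One of the techniques, letting Excel compute a column through a formula attached with the TAGATTR style attribute, looks roughly like this (data set, column names, and the formula are illustrative):

   ods excel file="/reports/sales.xlsx" options(sheet_name="Q1");
   proc report data=work.sales nowd;
      column region revenue cost profit;
      define region  / group;
      define revenue / analysis sum format=dollar12.2;
      define cost    / analysis sum format=dollar12.2;
      /* let Excel compute the column: relative references to the two cells to the left */
      define profit  / computed format=dollar12.2
                       style(column)={tagattr='formula:=RC[-2]-RC[-1]'};
      compute profit;
         profit = revenue.sum - cost.sum;   /* value shown if formulas are stripped */
      endcomp;
   run;
   ods excel close;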
One of the research goals in public health is to estimate the burden of diseases on the US population. We describe burden of disease by analyzing the statistical association of various diseases with hospitalizations, emergency department (ED) visits, ambulatory/outpatient (doctors' office) visits, and deaths. In this short paper, we discuss the use of large, nationally representative databases, such as those offered by the National Center for Health Statistics (NCHS) or the Agency for Healthcare Research and Quality (AHRQ), to produce reliable estimates of disease burden for such studies. As an example, we use SAS® and SUDAAN to analyze the Nationwide Emergency Department Sample (NEDS), offered by AHRQ, to estimate ED visits for hand, foot, and mouth disease (HFMD) in children less than five years old.
Jessica Rudd, Kennesaw State University
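A minimal SAS sketch of a design-based estimate from NEDS, assuming the standard NEDS design variables (discharge weight DISCWT, stratum NEDS_STRATUM, hospital-based ED identifier HOSP_ED) and two user-created indicators, age_under5 and hfmd_visit, which are assumptions here:

   proc surveyfreq data=work.neds;
      strata  neds_stratum;
      cluster hosp_ed;
      weight  discwt;
      tables  age_under5*hfmd_visit / cl;   /* domain analysis: children under 5 */
   run;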
In many healthcare settings, patients are like customers: they have a choice. One example is whether to participate in a procedure. In population-based screening, where the goal is to reduce deaths, the success of a program hinges on the patient's choice to accept and comply with the procedure. As in many other industries, this relies not only on the program's ability to attract new eligible patients to attend for the first time, but also on its ability to retain existing customers. The success of a new customer retention strategy within a breast screening environment is examined by applying a population-averaged model (also known as a marginal model), which uses generalized estimating equations (GEEs) to account for the lack of independence among the observations. Arguments for why a population-averaged model was applied instead of a mixed effects model (or random effects model) are provided. This business case provides a great introduction for people who want to better understand the difference between mixed effects and marginal models, and it illustrates how to implement a population-averaged model within SAS® by using the GENMOD procedure.
Colleen McGahan, BC CANCER AGENCY
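A minimal sketch of the population-averaged model in PROC GENMOD, with illustrative variable names; the REPEATED statement requests GEE estimation with an exchangeable working correlation:

   proc genmod data=work.screening;
      class patient_id strategy;
      model attended(event='1') = strategy visit age / dist=binomial link=logit;
      repeated subject=patient_id / type=exch corrw;
   run;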
Bivariate Cox proportional hazards models are used to test the association between a single covariate and the outcome, and the test is repeated for each covariate of interest. SAS® uses the last category as the default reference level, which raises problems when we want to keep using 0 as the reference for each covariate. The reference group can be changed in the CLASS statement, but if a format is associated with a covariate, we have to specify the corresponding formatted value instead of the raw numeric value. This problem becomes even worse when we have to repeat the test and manually enter the reference every single time. This presentation demonstrates one way of fixing the problem using the SAS macro facility and the SYMPUT routine.
Zhongjie Cai, University of Southern California
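Not the presenter's exact code, but a sketch of the idea: capture the formatted value of the desired reference level once, and reuse it in the CLASS statement each time a model is run. The format, data set, and variables are illustrative:

   proc format;
      value yn 0='No' 1='Yes';
   run;

   data _null_;
      call symputx('refval', put(0, yn.));   /* formatted value of the reference level 0 */
   run;

   proc phreg data=work.cohort;
      class smoker(ref="&refval") / param=ref;
      format smoker yn.;
      model time*death(0) = smoker;
   run;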