Sample Survey Researcher Papers A-Z

C
Session 0829-2017:
Choose Carefully! An Assessment of Different Sample Designs on Estimates of Official Statistics
Designing a survey is a meticulous process involving a number of steps and many complex choices. For most survey researchers, the choice between a probability and a non-probability sample is relatively simple. However, the sample design process involves more complex choices, each of which can introduce bias into the survey estimates. For example, the sampling statistician must decide whether to stratify the frame and, if so, how many strata to use, whether to stratify explicitly, and how each stratum should be defined. The statistician must also decide whether to use clusters and, if so, how to define a cluster and what the ideal cluster size should be. The factors affecting these choices, along with the impact of different sample designs on survey estimates, are explored in this paper. The SURVEYSELECT procedure in SAS/STAT® 14.1 is used to select a number of samples based on different designs using data from Jamaica's 2011 Population and Housing Census. Census results are assumed to be equal to the true population parameter. The estimates from each selected sample are evaluated against this parameter to assess the impact of different sample designs on point estimates. Design-adjusted survey estimates are computed using the SURVEYMEANS and SURVEYFREQ procedures in SAS/STAT 14.1. The resultant variances are evaluated to determine the sample design that yields the most precise estimates.
Read the paper (PDF)
Leesha Delatie-Budair, Statistical Institute of Jamaica
Jessica Campbell, Statistical Institute of Jamaica
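A stratified design of the kind the paper evaluates can be sketched with PROC SURVEYSELECT. The data set and variable names below (census_frame, parish, my_sample) are hypothetical stand-ins, not the actual census fields used in the paper:

```sas
/* The frame must be sorted by the stratification variable. */
proc sort data=census_frame;
   by parish;
run;

/* Select a stratified simple random sample:
   5% of households within each parish. */
proc surveyselect data=census_frame
                  method=srs       /* simple random sampling within strata */
                  samprate=0.05    /* 5% sampling fraction per stratum */
                  seed=2011        /* fixed seed for a reproducible sample */
                  out=my_sample;
   strata parish;                  /* explicit stratification variable */
run;
```

Changing METHOD= (for example, to PPS for probability-proportional-to-size cluster selection) or adding a CLUSTER statement lets the same procedure generate the alternative designs being compared.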
E
Session 0767-2017:
Estimation Strategies Involving Pooled Survey Data
Pooling two or more cross-sectional survey data sets (that is, stacking the data sets on top of one another) is a strategy often used by researchers for one of two purposes: (1) to more efficiently conduct significance tests on point-estimate changes observed over time, or (2) to increase the sample size in hopes of improving the precision of a point estimate. The latter purpose is especially common when making inferences on a subgroup, or domain, of the target population insufficiently represented by a single survey data set. Using data from the National Survey of Family Growth (NSFG), this paper walks through a series of practical estimation objectives that can be tackled by analyzing data from two or more pooled survey data sets. Where applicable, we comment on the resulting interpretive nuances.
Read the paper (PDF)
Taylor Lewis, George Mason University
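A minimal sketch of the stacking strategy, assuming two survey files that share a weight and design-variable layout. The data set and variable names are hypothetical, and the weight rescaling (dividing by the number of pooled files so that pooled totals reflect an average over the combined period) is one common convention, not necessarily the paper's:

```sas
/* Stack two survey cycles and rescale the analysis weight. */
data nsfg_pooled;
   set nsfg_cycle1 nsfg_cycle2;
   pooled_wgt = wgt / 2;   /* divide by the number of pooled data sets */
run;

/* Design-adjusted estimate from the pooled file. */
proc surveymeans data=nsfg_pooled;
   strata stratvar;        /* design stratum identifier */
   cluster psu;            /* primary sampling unit */
   weight pooled_wgt;
   var outcome;
run;
```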
Session 1308-2017:
Exploration of Information Technology-Related Barriers Affecting Rural Primary Care Clinics
With an aim to improve rural healthcare, Oklahoma State University (OSU) Center for Health Systems Innovation (CHSI) conducted a study with primary care clinics (n=35) in rural Oklahoma to identify possible impediments to clinic workflows. The study entailed semi-structured personal interviews (n=241) and administered an online survey using an iPad (n=190). Respondents encompassed all consenting clinic constituents (physicians, nurses, practice managers, schedulers). Quantitative data from surveys revealed that electronic medical records (EMRs) are well accepted and contributed to increasing workflow efficiency. However, the qualitative data from interviews reveals that there are IT-related barriers like Internet connectivity, hardware problems, and inefficiencies in information systems. Interview responses identified six IT-related response categories (computer, connectivity, EMR-related, fax, paperwork, and phone calls) that routinely affect clinic workflow. These categories together account for more than 50% of all the routine workflow-related problems faced by the clinics. Text mining was performed on transcribed Interviews using SAS® Text Miner to validate these six categories and to further identify concept linking for a quantifiable insight. Two variables (Redundancy Reduction and Idle Time Generation) were derived from survey questions with low scores of -129 and -64 respectively out of 384. Finally, ANOVA was run using SAS® Enterprise Guide® 6.1 to determine whether the six qualitative categories affect the two quantitative variables differently.
Read the paper (PDF)
Ankita Srivastava, Oklahoma State University
Ipe Paramel, Oklahoma State University
Onkar Jadhav, Oklahoma State University
Jennifer Briggs, Oklahoma State University
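The closing ANOVA step can be sketched as follows; the data set and variable names are hypothetical stand-ins for the study's fields, and the Tukey adjustment is an illustrative choice rather than the authors' stated method:

```sas
/* One-way ANOVA: does mean Redundancy Reduction differ
   across the six IT-barrier categories? */
proc glm data=clinic_survey;
   class barrier_category;            /* six response categories */
   model redundancy_reduction = barrier_category;
   means barrier_category / tukey;    /* pairwise comparisons */
run;
quit;
```

A second run with idle_time_generation as the response would cover the other derived variable.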
N
Session SAS0127-2017:
New for SAS® 9.4: Including Text and Graphics in Your Microsoft Excel Workbooks, Part 2
A new ODS destination for creating Microsoft Excel workbooks is available starting in the third maintenance release for SAS® 9.4. This destination creates native Microsoft Excel XLSX files, supports graphic images, and offers other advantages over the older ExcelXP tagset. In this presentation, you learn step-by-step techniques for quickly and easily creating attractive multi-sheet Excel workbooks that contain your SAS® output. The techniques can be used regardless of the platform on which SAS software is installed. You can even use them on a mainframe! Creating and delivering your workbooks on demand and in real time using SAS server technology is discussed. Using earlier versions of SAS to create multi-sheet workbooks is also discussed. Although the title is similar to previous presentations by this author, this presentation contains new and revised material not previously presented.
Read the paper (PDF) | Download the data file (ZIP)
Vince DelGobbo, SAS
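A minimal sketch of the ODS Excel destination described above, using the SASHELP.CLASS sample data so it runs anywhere; the file name and option choices are illustrative:

```sas
/* Open the ODS Excel destination: one worksheet per procedure. */
ods excel file='multi_sheet.xlsx'
          options(sheet_interval='proc'
                  embedded_titles='yes');

title 'Class Summary';
proc means data=sashelp.class;
   var height weight;
run;

/* Graphics are written natively into the workbook. */
proc sgplot data=sashelp.class;
   scatter x=height y=weight;
run;

/* Close the destination to finish writing the XLSX file. */
ods excel close;
```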
Session 0970-2017:
Not So Simple: Intervals You Can Have Confidence In with Real Survey Data
Confidence intervals are critical to understanding your survey data. If your intervals are too narrow, you might inadvertently judge a result to be statistically significant when it is not. While many familiar SAS® procedures, such as PROC MEANS and PROC REG, provide statistical tests, they rely on the assumption that the data comes from a simple random sample. However, almost no real-world survey uses such sampling. Learn how to use the SURVEYMEANS procedure and its SURVEY cousins to estimate confidence intervals and perform significance tests that account for the structure of the underlying survey, including the replicate weights now supplied by some statistical agencies. Learn how to extract the results you need from the flood of output that these procedures deliver.
Read the paper (PDF)
David Vandenbroucke, U.S. Department of Housing and Urban Development
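A sketch of the replicate-weight approach mentioned above, assuming an agency-supplied file with a base weight and 160 balanced repeated replication (BRR) weights; all data set and variable names are hypothetical:

```sas
/* Design-based mean and confidence limits using replicate weights. */
proc surveymeans data=housing_survey varmethod=brr mean clm;
   weight basewgt;                    /* full-sample analysis weight */
   repweights repwgt1-repwgt160;      /* agency-supplied BRR weights */
   var monthly_rent;
run;
```

With REPWEIGHTS supplied, no STRATA or CLUSTER statements are needed; the replicate weights carry the design information.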
P
Session 1252-2017:
Predicting Successful Math Teachers in Secondary Schools in the United States
Are secondary schools in the United States hiring enough qualified math teachers? In which regions is there a disparity of qualified teachers? Data from an extensive survey conducted by the National Center for Education Statistics (NCES) were used to predict qualified secondary school teachers across public schools in the US. The three criteria examined to determine whether a teacher is qualified to teach a given subject are: (1) whether the teacher has a degree in the subject he or she is teaching; (2) whether he or she has a teaching certification in the subject; and (3) whether he or she has five years of experience in the subject. A qualified teacher is defined as one who meets all three criteria. The sample data included socioeconomic data at the county level, which were used as predictors for hiring a qualified teacher. Data such as the number of students on free or reduced lunch at the school were used to classify schools as high-needs or low-needs. Other socioeconomic factors included the income and education levels of working adults within a given school district. Some of the results show that schools with higher-needs students (schools with more than 40% of students on some form of reduced lunch program) have less-qualified teachers. The resultant model is used to score other regions and is presented on a heat map of the US. SAS® procedures such as PROC SURVEYFREQ and PROC SURVEYLOGISTIC are used.
View the e-poster or slides (PDF)
Bogdan Gadidov, Kennesaw State University
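A design-adjusted logistic model of the kind the abstract describes can be sketched with PROC SURVEYLOGISTIC; the data set and variable names below are hypothetical stand-ins for the NCES survey fields and county-level predictors:

```sas
/* Probability that a school employs a fully qualified math teacher,
   accounting for the survey's strata, clusters, and weights. */
proc surveylogistic data=nces_schools;
   strata region;                      /* design stratum */
   cluster school_id;                  /* sampled unit */
   weight final_wgt;                   /* survey analysis weight */
   class high_needs (ref='0') / param=ref;
   model qualified(event='1') = high_needs median_income pct_college;
run;
```

The predicted probabilities from a model like this are what would be scored for other regions and mapped.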
S
Session 0268-2017:
%SURVEYGENMOD Macro: An Alternative to Deal with Complex Survey Design for the GENMOD Procedure
This paper presents %SURVEYGENMOD, a SAS® macro developed with the SAS/IML® procedure as an upgrade of the %SURVEYGLM macro of Silva and Silva (2014) for handling complex survey designs in generalized linear models (GLMs). The new capabilities are the inclusion of the negative binomial distribution, the zero-inflated Poisson (ZIP) model, the zero-inflated negative binomial (ZINB) model, and the possibility of obtaining estimates for domains. The R function svyglm (Lumley, 2004) and Stata software were used as benchmarks, and the results show that estimates generated by the %SURVEYGENMOD macro are close to those from the R function and Stata software.
Read the paper (PDF)
Alan Ricardo da Silva, University of Brasilia
T
Session 0802-2017:
The Truth Is Out There: Leveraging Census Data Using PROC SURVEYLOGISTIC
The advent of robust and thorough data collection has given rise to the term big data. With Census data becoming richer, more nationally representative, and voluminous, we need methodologies designed to handle the manifold survey designs that Census data sets implement. PROC SURVEYLOGISTIC, introduced as an experimental procedure in SAS®9 and fully supported since SAS 9.1, addresses some of these methodologies, including clusters, strata, and replicate weights, and handles data that do not come from a straightforward random sample. Using Census data sets, this paper provides examples highlighting the appropriate use of survey weights to calculate various estimates, as well as the calculation and interpretation of odds ratios for categorical variable interactions when predicting a binary outcome.
Read the paper (PDF)
Richard Dirmyer, Rochester Institute of Technology
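A sketch of a weighted interaction model on Census-style microdata, assuming a public-use file that ships a person weight plus 80 jackknife replicate weights; all data set and variable names are hypothetical:

```sas
/* Binary outcome with an interaction between two categorical
   predictors, variance estimated from replicate weights. */
proc surveylogistic data=census_pums varmethod=jackknife;
   weight pwgtp;                       /* full-sample person weight */
   repweights pwgtp1-pwgtp80;          /* replicate weights */
   class sex disability / param=ref;
   model employed(event='1') = sex disability sex*disability;
run;
```

With an interaction in the model, the odds ratio for one predictor depends on the level of the other, which is the interpretive point the paper works through.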