Smart clothing with integrated wearable medical sensors for tracking human health is attracting growing research interest. However, body movement during daily activities introduces noise into physiological signals, which affects the output of a health monitoring/alert system. To overcome this problem, recognizing human activities, determining the relationship between activities and physiological signals, and removing noise from the collected signals are essential steps. This paper focuses on the first step, human activity recognition. Our research found no other study that used SAS® to classify human activities. For this study, two data sets were collected from an open repository. Both data sets have 561 input variables and one nominal target variable with four levels. Principal component analysis, along with other variable reduction and selection techniques, was applied to reduce dimensionality in the input space. Several modeling techniques with different optimization parameters were used to classify human activity. The gradient boosting model was selected as the best model based on a test misclassification rate of 0.1233; that is, 87.67% of all events were classified correctly.
Minh Pham, Oklahoma State University
Mary Ruppert-Stroescu, Oklahoma State University
Mostakim Tanjil, Oklahoma State University
At a community college, there was a need for college employees to quickly and easily find available classroom time slots for the purposes of course scheduling. The existing method was time-consuming and inefficient, and there were no available IT resources to implement a solution. The Office of Institutional Research, which had already been delivering reports using SAS® Enterprise BI Server, created a report called Find an Open Room to fill the need. By combining SAS® programming techniques, a scheduled SAS® Enterprise Guide® project, and a SAS® Web Report Studio report delivered within the SAS® Information Delivery Portal, a report was created that allowed college users to search for available time slots.
Nicole Jagusztyn, Hillsborough Community College
As part of promoting a data-driven culture and data analytics modernization among its federal sector clientele, Northrop Grumman developed a framework for designing and implementing an in-house Data Analytics and Research Center (DAARC) using a set of SAS® tools. This DAARC provides a complete set of SAS® Enterprise BI (Business Intelligence) and SAS® Data Management tools. The platform can be used for data research, evaluations, analysis, and reviews by federal agencies such as the Social Security Administration (SSA), the Centers for Medicare & Medicaid Services (CMS), and others. The DAARC architecture is based on a SAS data analytics platform with newer capabilities for data mining, forecasting, visual analytics, and data integration using SAS® Business Intelligence. These capabilities enable developers, researchers, and analysts to explore big data sets from varied data sources, create predictive models, and perform advanced analytics, including forecasting, anomaly detection, dashboards, and online reports. The DAARC framework that Northrop Grumman developed enables agencies to implement a self-sufficient 'analytics as a service' approach to meet their business goals by making informed and proactive data-driven decisions. This paper provides a detailed account of how the DAARC framework was established in strong partnership with federal customers of Northrop Grumman. This paper also discusses the best practices that were adopted for implementing specific business use cases in order to save taxpayer dollars through many research-related analytical and statistical initiatives that continue to use this platform.
Vivek Sethunatesan, Northrop Grumman
In cooperation with the Joint Research Centre - European Commission (JRC), we have developed a number of innovative techniques to detect outliers on a large scale. In this session, we show the power of SAS/IML® Studio as an interactive tool for exploring and detecting outliers using customized algorithms that were built from scratch. The JRC uses this for detecting abnormal trade transactions on a large scale. The outliers are detected using the Forward Search, which starts from a central subset in the data and subsequently adds observations that are close to the current subset based on regression (R-student) or multivariate (Mahalanobis distance) output statistics. The implementation of this algorithm and its applications were done in SAS/IML Studio and converted to a macro for use in the IML procedure in Base SAS®.
Jos Polfliet, SAS
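The multivariate variant of the Forward Search described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' SAS/IML implementation: the function name `forward_search`, the rule for choosing the starting subset (points closest to the coordinate-wise median, scaled by the median absolute deviation), and the demo data are all assumptions made for illustration.

```python
import numpy as np


def forward_search(X, m0):
    """Forward Search with Mahalanobis distances (illustrative sketch).

    Returns a list of (m, d_min) pairs, where d_min is the smallest
    Mahalanobis distance among observations outside the size-m subset.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Assumed starting rule: the m0 points closest to the coordinate-wise
    # median, with each coordinate scaled by its median absolute deviation.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad[mad == 0] = 1.0
    start = np.sqrt((((X - med) / mad) ** 2).sum(axis=1))
    subset = np.argsort(start)[:m0]

    trace = []
    while len(subset) < n:
        # Estimate location and scatter from the current subset only.
        mu = X[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(X[subset], rowvar=False))
        diff = X - mu
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
        d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off

        # Record how far the nearest excluded observation is.
        outside = np.ones(n, dtype=bool)
        outside[subset] = False
        trace.append((len(subset), float(np.sqrt(d2[outside].min()))))

        # Grow the subset by one: keep the m + 1 smallest distances.
        subset = np.argsort(d2)[: len(subset) + 1]
    return trace


# Hypothetical demo data: 50 clean points plus 3 planted outliers.
rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 2))
planted = np.array([[15.0, 15.0], [16.0, 14.0], [14.0, 16.0]])
trace = forward_search(np.vstack([clean, planted]), m0=10)
for m, d in trace[-5:]:
    print(m, round(d, 2))
```

In this sketch, a sharp jump in the recorded minimum distance near the end of the search signals that the remaining observations are far from the bulk of the data. A regression version would monitor residual-based statistics instead of Mahalanobis distances.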
The North Carolina Community College System office can quickly and easily enable colleges to compare their programs' success to that of other college programs. Institutional researchers can now spend their days looking at trends, abnormalities, and other colleges instead of digging for data to load into a Microsoft Excel spreadsheet. We look at performance measures and how programs are being graded using SAS® Visual Analytics.
Bill Schneider, North Carolina Community College System
All public schools in the United States require health and safety education for their students. Furthermore, almost all states require driver education before minors can obtain a driver's license. Through extensive analysis of the Fatality Analysis Reporting System data, we have concluded that from 2011 to 2013 an average of 12.1% of all individuals killed in motor vehicle accidents in the United States, District of Columbia, and Puerto Rico were minors (18 years or younger). Our goal is to offer insight from our analysis to improve road safety education and prevent future premature deaths involving motor vehicles.
Molly Funk, Bryant University
Max Karsok, Bryant University
Michelle Williams, Bryant University
Educational systems at the district, state, and national levels all report possessing amazing student-level longitudinal data systems (LDS). Are these systems improving educational outcomes for students? Are they guiding development of effective instructional practices? Are the standardized exams measuring student knowledge relative to the learning expectations? Many questions exist about the effective use of LDS and educational data, but data architecture and analytics (including the products developed by SAS®) are not designed to answer any of these questions. However, the ability to develop more effective educational interfaces, improve use of data at the classroom level, and improve student outcomes might only be available through use of SAS. The purpose of this session and paper is to demonstrate an integrated use of SAS tools to guide the transformation of data into analytics that improve educational outcomes for all students.
Sean Mulvenon, University of Arkansas
The Add Health Parent Study is using a new and innovative method to augment our other interview verification strategies. Typical verification strategies include calling respondents to ask questions about their interview, recording portions of the interaction (CARI, Computer Aided Recorded Interview), and analyzing timing data to check that each interview was of a reasonable length. Geocoding adds another tool to the toolbox for verifications. By applying street-level geocoding to the address where an interview is reported to have been conducted and comparing that to a latitude/longitude reading captured from a GPS tracking device, we are able to compute the distance between the two points. If that distance is very small and the time stamps are close to each other, then the evidence points to the field interviewer being present at the respondent's address during the interview. For our project, the street-level geocoding of an address is done using SAS® PROC GEOCODE. Our paper describes how to obtain a US address database from the SAS website and how it can be used in PROC GEOCODE. We also briefly compare this technique to using the Google Maps API and Python as an alternative.
Chris Carson, RTI International
Lilia Filippenko, RTI International
Mai Nguyen, RTI International
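The distance check described in the abstract above can be sketched in Python as follows. This is a minimal illustration, not the authors' PROC GEOCODE workflow: it assumes the address has already been geocoded to a latitude/longitude pair, uses the haversine great-circle formula as one possible distance measure, and the function names and the thresholds (`max_distance_m`, `max_gap_s`) are hypothetical values chosen for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def interview_verified(geocoded, gps, interview_ts, gps_ts,
                       max_distance_m=100.0, max_gap_s=900.0):
    """Return True when the GPS reading is close in space and time.

    geocoded, gps: (latitude, longitude) tuples in decimal degrees.
    interview_ts, gps_ts: time stamps in seconds (e.g. Unix time).
    The default thresholds are hypothetical, not taken from the paper.
    """
    distance = haversine_m(geocoded[0], geocoded[1], gps[0], gps[1])
    gap = abs(interview_ts - gps_ts)
    return distance <= max_distance_m and gap <= max_gap_s


# Hypothetical example: GPS fix about 15 m north of the geocoded address,
# captured five minutes after the recorded interview start.
address = (35.9049, -78.8633)
fix = (35.90503, -78.8633)
print(round(haversine_m(*address, *fix), 1), "m")
print(interview_verified(address, fix, interview_ts=0, gps_ts=300))
```

Suitable thresholds depend on geocoding accuracy and GPS precision, so in practice they would be tuned against interviews whose location is already known.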
Multivariate statistical analysis plays an increasingly important role in educational research as the number of measured variables increases. In both cognitive and noncognitive assessments, many instruments that researchers aim to study contain a large number of variables, with each measured variable assigned to a specific factor of the bigger construct. Consistent with educational theory or empirical research, the factor structure of each instrument usually emerges the same way. Two types of factor analysis are widely used, depending on the scenario, to understand the latent relationships among these variables. (1) Exploratory factor analysis (EFA), which is performed by using the SAS® procedure PROC FACTOR, is an advanced statistical method used to probe deeply into the relationships between the variables and the larger construct and then develop a customized model for the specific assessment. (2) When a model is established, confirmatory factor analysis (CFA) is conducted by using the SAS procedure PROC CALIS to examine the model fit for specific data and then adjust the model as needed. This paper presents the application of SAS to conduct these two types of factor analysis to fulfill various research purposes. Examples using real noncognitive assessment data are demonstrated, and the interpretation of the fit statistics is discussed.
Jun Xu, Educational Testing Service
Steven Holtzman, Educational Testing Service
Kevin Petway, Educational Testing Service
Lili Yao, Educational Testing Service