An Analysis of Airline Delays with SAS/IML® Studio
Rick Wicklin, SAS Institute Inc., 2009
The Data Expo is a biennial poster session usually sponsored jointly by the ASA Sections on Statistical Graphics and Statistical Computing. The purpose of the poster session is to distribute an interesting data set to many researchers and to challenge them to use statistical graphics to describe and visualize the data concisely on a single poster. The session helps highlight the importance of statistical graphics in data analysis.
This year's Data Expo was organized by Hadley Wickham, who assembled a truly massive set of data from the Research and Innovative Technology Administration (RITA), which coordinates the U.S. Department of Transportation (DOT) research programs. The data (available from http://stat-computing.org/dataexpo/2009/) consist of 123 million records of U.S. domestic commercial flights between 1987 and 2008. Each record contains information about 29 variables, including the following:
SAS software excels at handling massive data. This proved to be an advantage: the poster by Wicklin and Allison (2009) was awarded first place. This article describes several graphs in the poster. You can browse the electronic version of all the entries at http://stat-computing.org/dataexpo/2009/posters/.
The poster graphically presents ways in which flight delays and cancellations vary in time, among airports, and among airline carriers. The article also describes graphical methods and features of the data that elicited the most comments from visitors to the Data Expo.
All but one of the graphs in this paper were created by SAS/IML® Studio (formerly named SAS® Stat Studio), which is new software in SAS 9.2. SAS/IML Studio is intended for data analysts who are familiar with SAS/STAT® software but who need a versatile programming environment to develop new computational algorithms or statistical graphics. See Wicklin (2008) for more information on SAS/IML Studio.