Introduction
Stat Studio provides the data analyst with interactive and dynamic statistical graphics. By definition, interactive graphics must respond quickly to the changes and manipulations of the analyst. The need for a quick response limits the size of data sets that can be analyzed while still maintaining interactivity.
Wegman (1995) points out that the number of observations you can analyze depends on the algorithmic complexity of the statistical algorithms you are using. For example, if you have $n$ observations, computing a mean and variance is $O(n)$, sorting is $O(n \log n)$, and solving a least squares regression on $p$ variables is $O(np^2)$. Furthermore, visualization of individual observations is limited by the number of pixels that can be represented on a display device. Wegman's conclusion is that "visualization of data sets say of size $10^6$ or more is clearly a wide open field." More recently, Unwin, Theus, and Hofmann (2006) discuss the challenges of "visualizing a million," including a chapter dedicated to interactive graphics.
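The following minimal Python sketch shows where these complexity figures come from. It is an illustration under our own assumptions, not Stat Studio's implementation, and the function names `mean_variance`, `normal_equations`, and `solve` are hypothetical. `mean_variance` makes one pass over the data (Welford's algorithm), so its cost is $O(n)$. `normal_equations` forms $X^T X$ and $X^T y$; each of the $n$ rows costs about $p^2$ multiply-adds, which gives the $O(np^2)$ term. `solve` then works only on the $p \times p$ system, so its $O(p^3)$ cost does not grow with $n$. Sorting, for comparison, is what Python's built-in `sorted` does in $O(n \log n)$ time.

```python
import math

def mean_variance(xs):
    """One pass over the data (Welford's algorithm): O(n) time, O(1) extra memory."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")  # sample variance
    return mean, variance

def normal_equations(rows, ys):
    """Accumulate X'X and X'y: each of the n rows costs p*p multiply-adds, so O(n p^2)."""
    p = len(rows[0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for row, y in zip(rows, ys):
        for i in range(p):
            xty[i] += row[i] * y
            for j in range(p):
                xtx[i][j] += row[i] * row[j]
    return xtx, xty

def solve(a, b):
    """Gaussian elimination with partial pivoting on the p x p system: O(p^3), independent of n."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [A | b]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        s = m[r][p] - sum(m[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = s / m[r][r]
    return beta

# Usage: fit y = b0 + b1*x to data generated from y = 1 + 2x.
rows = [[1.0, float(x)] for x in range(10)]  # intercept column plus one regressor
ys = [1.0 + 2.0 * x for x in range(10)]
beta = solve(*normal_equations(rows, ys))
print(beta)
print(mean_variance([2, 4, 4, 4, 5, 5, 7, 9]))
```

For large $n$ and modest $p$, the pass that forms $X^T X$ dominates, which is why the regression cost is usually quoted as $O(np^2)$.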
On a typical PC (for example, a 1.8 GHz CPU with 512 MB of RAM), Stat Studio can help you analyze dozens of variables and tens of thousands of observations. Visualization of data with graphics such as histograms and box plots remains feasible for hundreds of thousands of observations, although the interactive graphics become less responsive. Scatter plots of this many observations suffer from overplotting: many observations are drawn on the same pixels, which hides the density of the data.
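The pixel limit behind overplotting can be seen with a short Python sketch. It assumes a hypothetical 400 x 300 pixel plot area and simulated correlated normal data; neither comes from Stat Studio, and `pixel_counts` is a hypothetical helper that maps each observation to an integer pixel and counts how many observations share each pixel.

```python
import random
from collections import Counter

def pixel_counts(xs, ys, width, height):
    """Map each (x, y) point to an integer pixel in a width x height plot area
    and count how many observations land on each pixel."""
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    xspan = (xmax - xmin) or 1.0  # avoid division by zero for constant data
    yspan = (ymax - ymin) or 1.0
    counts = Counter()
    for x, y in zip(xs, ys):
        px = min(width - 1, int((x - xmin) / xspan * width))
        py = min(height - 1, int((y - ymin) / yspan * height))
        counts[(px, py)] += 1
    return counts

# 100,000 simulated, correlated observations in a hypothetical 400 x 300 plot area.
rng = random.Random(1)
n = 100_000
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [x + rng.gauss(0, 0.5) for x in xs]
counts = pixel_counts(xs, ys, 400, 300)
print(f"{n} points occupy {len(counts)} distinct pixels")
print(f"busiest pixel holds {max(counts.values())} points")
```

Because the data concentrate near the center of the plot, far fewer pixels are lit than there are observations, and a single pixel can stand for many points; a plain scatter plot cannot show that difference.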
Stat Studio uses the RAM on your PC to facilitate interaction and linking between plots and data tables. If you routinely analyze large data sets, increasing the RAM on your PC might increase Stat Studio's interactivity. For example, if you routinely examine hundreds of thousands of observations in dozens of variables, 1 GB of RAM is preferable to 512 MB.
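A rough lower bound on the memory needed just to hold the data values can be computed as observations x variables x bytes per value. The Python sketch below assumes that each value is stored as an 8-byte double and uses 40 variables as a hypothetical stand-in for "dozens"; Stat Studio's actual internal storage is not described here, and the plots, linked tables, and the application itself need memory beyond this raw figure. `raw_data_bytes` is a hypothetical helper.

```python
def raw_data_bytes(n_obs, n_vars, bytes_per_value=8):
    """Lower bound on memory for an n_obs x n_vars table of numeric values.
    Assumes 8-byte doubles by default; real usage adds overhead on top."""
    return n_obs * n_vars * bytes_per_value

MIB = 2**20
for n_obs in (50_000, 300_000, 900_000):
    size = raw_data_bytes(n_obs, 40)  # 40 variables is an illustrative assumption
    print(f"{n_obs:>7} obs x 40 vars: {size / MIB:7.1f} MiB "
          f"({size / (512 * MIB):.0%} of 512 MB, {size / (1024 * MIB):.0%} of 1 GB)")
```

Under these assumptions, a few hundred thousand observations in dozens of variables already consume a noticeable share of 512 MB before any graphics are created, which is consistent with the recommendation to prefer 1 GB.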
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.