Most SAS programmers have used the FREQ procedure, perhaps to evaluate basic data quality, but many valuable features of PROC FREQ are underutilized, even by experienced SAS users who turn to potentially complex routines when similar results could be produced with a simpler approach.
One traditional technique that yields a count of the number of distinct levels of a variable involves sorting the data prior to running a DATA step. Another possible approach is a GROUP BY clause in a PROC SQL query.
The major disadvantage to these approaches is that a separate analysis is required for each variable of interest. Furthermore, to identify the presence of missing values, more complexity must be introduced into each routine.
/* Repeat this for each variable of interest */ proc sort data=maps.metamaps out=sorted nodupkey; by contnent; run; data levels(keep= Contnent_Levels); set sorted end=no_more; if no_more; Contnent_Levels=_N_; label Contnent_Levels="Levels of CONTNENT"; run; proc print data=levels label noobs; run;
Levels of CONTNENT 7 |
The NLEVELS option is supported with PROC FREQ starting in SAS9 and generates a summary of the level counts at the start of the procedure's output, including an indicator of the presence of missing values.
/* One PROC FREQ step can handle many variables */ proc freq data=maps.metamaps nlevels ; tables type contnent gisimprt / noprint; run;
Number of Variable Levels Missing Nonmissing Variable Label Levels Levels Levels ------------------------------------------------------------------------------------ TYPE Type of Map Dataset: GIS or GRAPH 2 0 2 CONTNENT Numeric rep. for Continent 7 1 6 GISIMPRT Imports into GIS 6 1 5 |
A huge advantage of this approach is obtaining information for many variables in a single step. The NLEVELS information is printed even when you suppress the detailed frequency reports with the NOPRINT option in the TABLES statement. The ODS output object named NLEVELS can be captured as a data set, too.
/* One PROC FREQ step can handle many variables */ ods output nlevels=Levels; proc freq data=maps.metamaps nlevels; tables type contnent gisimprt / noprint; run; proc print data=Levels; run;
NNon NMiss Miss Obs TableVar TableVarLabel NLevels Levels Levels 1 TYPE Type of Map Dataset: GIS or GRAPH 2 0 2 2 CONTNENT Numeric rep. for Continent 7 1 6 3 GISIMPRT Imports into GIS 6 1 5 |
To emphasize the most common, or least common, values within your data, sequencing PROC FREQ results based on order of frequency is helpful.
A traditional technique is to sort and print the output data set produced through the OUT= option in the TABLE statement. This approach requires multiple steps to generate and order the counts, plus display the results. In addition, cumulative counts and percentages are not included in the output data set and would need to be calculated from the raw counts and percentages.
/* Repeat this for each variable of interest */ proc freq data=maps.metamaps; tables contnent / noprint out=byvalue; run; proc sort data=byvalue out=bycount; by descending count; run; proc print data=bycount; run;
Obs CONTNENT COUNT PERCENT 1 93 99 31.5287 2 95 83 26.4331 3 91 75 23.8854 4 92 27 8.5987 5 94 19 6.0510 6 96 11 3.5032 7 . 2 . |
For many years, PROC FREQ has had a built-in capability to accomplish this task: the ORDER=FREQ option. SAS programmers could apply this option within their mainframe batch jobs more than a quarter century ago!
/* One PROC FREQ step can handle many variables */ proc freq data=maps.metamaps order=freq ; tables type contnent gisimprt; run;
With this technique it is almost trivial to determine values that occur most often or least often. Likewise, because the cumulative statistics are displayed within the standard report, little effort is needed to determine what percentage of the data falls into the top N groups.
Type of Map Dataset: GIS or GRAPH Cumulative Cumulative TYPE Frequency Percent Frequency Percent ---------------------------------------------------------- GRAPH 281 88.92 281 88.92 GIS 35 11.08 316 100.00 Numeric rep. for Continent Cumulative Cumulative CONTNENT Frequency Percent Frequency Percent ------------------------------------------------------------- 93 99 31.53 99 31.53 95 83 26.43 182 57.96 91 75 23.89 257 81.85 92 27 8.60 284 90.45 94 19 6.05 303 96.50 96 11 3.50 314 100.00 Frequency Missing = 2 Imports into GIS Cumulative Cumulative GISIMPRT Frequency Percent Frequency Percent -------------------------------------------------------------------- NA 179 60.07 179 60.07 ASIS 97 32.55 276 92.62 WHERE LAKE NE 1 20 6.71 296 99.33 WHERE ID NE 999 1 0.34 297 99.66 WHERE LAKE=0 1 0.34 298 100.00 Frequency Missing = 18 |
Product Family | Product | System | SAS Release | |
Reported | Fixed* | |||
SAS System | Base SAS | OpenVMS Alpha | 9 TS M0 | |
Linux | 9 TS M0 | |||
HP-UX IPF | 9 TS M0 | |||
64-bit Enabled Solaris | 9 TS M0 | |||
64-bit Enabled HP-UX | 9 TS M0 | |||
64-bit Enabled AIX | 9 TS M0 | |||
Microsoft Windows XP Professional | 9 TS M0 | |||
Microsoft Windows Server 2003 Standard Edition | 9 TS M0 | |||
Microsoft Windows Server 2003 Enterprise Edition | 9 TS M0 | |||
Microsoft Windows Server 2003 Datacenter Edition | 9 TS M0 | |||
Microsoft Windows NT Workstation | 9 TS M0 | |||
Microsoft Windows 2000 Professional | 9 TS M0 | |||
Microsoft Windows 2000 Server | 9 TS M0 | |||
Microsoft Windows 2000 Datacenter Server | 9 TS M0 | |||
Microsoft Windows 2000 Advanced Server | 9 TS M0 | |||
Microsoft Windows Server 2003 Enterprise 64-bit Edition | 9 TS M0 | |||
Microsoft Windows Server 2003 Datacenter 64-bit Edition | 9 TS M0 | |||
Microsoft® Windows® for 64-Bit Itanium-based Systems | 9 TS M0 | |||
z/OS | 9 TS M0 | |||
Tru64 UNIX | 9 TS M0 |
Type: | Usage Note |
Priority: | |
Topic: | SAS Reference ==> Procedures ==> FREQ Common Programming Tasks |
Date Modified: | 2008-01-04 15:41:25 |
Date Created: | 2008-01-04 15:09:15 |