SUPPORT / SAMPLES & SAS NOTES
 

Support

Usage Note 30867: Modernizing Your SAS Code: PROC FREQ Applications

DetailsAboutRate It

Most SAS programmers have used the FREQ procedure, perhaps to evaluate basic data quality, but many valuable features of PROC FREQ are underutilized, even by experienced SAS users who turn to potentially complex routines when similar results could be produced with a simpler approach.

Counting the Number of Distinct Levels of a Classification Variable

One traditional technique that yields a count of the number of distinct levels of a variable involves sorting the data prior to running a DATA step. Another possible approach is a GROUP BY clause in a PROC SQL query.

The major disadvantage to these approaches is that a separate analysis is required for each variable of interest. Furthermore, to identify the presence of missing values, more complexity must be introduced into each routine.

   /* Repeat this for each variable of interest */
     proc sort data=maps.metamaps out=sorted nodupkey;
        by contnent;
     run;
     data levels(keep= Contnent_Levels);
        set sorted end=no_more;
        if no_more;
        Contnent_Levels=_N_;
        label Contnent_Levels="Levels of CONTNENT";
     run;
     proc print data=levels label noobs;
     run;

Levels
of
CONTNENT

7

The NLEVELS option is supported with PROC FREQ starting in SAS9 and generates a summary of the level counts at the start of the procedure's output, including an indicator of the presence of missing values.

   /* One PROC FREQ step can handle many variables */
     proc freq data=maps.metamaps nlevels ;
        tables type contnent gisimprt / noprint;
     run;

                             Number of Variable Levels

                                                               Missing    Nonmissing
Variable    Label                                   Levels      Levels        Levels
------------------------------------------------------------------------------------
TYPE        Type of Map Dataset: GIS or GRAPH            2           0             2
CONTNENT    Numeric rep. for Continent                   7           1             6
GISIMPRT    Imports into GIS                             6           1             5

A huge advantage of this approach is obtaining information for many variables in a single step. The NLEVELS information is printed even when you suppress the detailed frequency reports with the NOPRINT option in the TABLES statement. The ODS output object named NLEVELS can be captured as a data set, too.

 /* One PROC FREQ step can handle many variables */
  ods output nlevels=Levels;
  proc freq data=maps.metamaps nlevels;
     tables type contnent gisimprt / noprint;
  run;
  proc print data=Levels;
  run;

                                                                                     NNon
                                                                        NMiss        Miss
Obs    TableVar    TableVarLabel                          NLevels      Levels      Levels

  1    TYPE        Type of Map Dataset: GIS or GRAPH            2           0           2
  2    CONTNENT    Numeric rep. for Continent                   7           1           6
  3    GISIMPRT    Imports into GIS                             6           1           5

Displaying Counts in Descending Order of Frequency

To emphasize the most common, or least common, values within your data, sequencing PROC FREQ results based on order of frequency is helpful.

A traditional technique is to sort and print the output data set produced through the OUT= option in the TABLE statement. This approach requires multiple steps to generate and order the counts, plus display the results. In addition, cumulative counts and percentages are not included in the output data set and would need to be calculated from the raw counts and percentages.

  /* Repeat this for each variable of interest */
     proc freq data=maps.metamaps;
        tables contnent / noprint out=byvalue; 
     run;
    proc sort data=byvalue out=bycount;
       by descending count;
    run;
    proc print data=bycount;
    run;

Obs    CONTNENT    COUNT    PERCENT

 1        93         99     31.5287
 2        95         83     26.4331
 3        91         75     23.8854
 4        92         27      8.5987
 5        94         19      6.0510
 6        96         11      3.5032
 7         .          2       .
 
 

For many years, PROC FREQ has had a built-in capability to accomplish this task: the ORDER=FREQ option. SAS programmers could apply this option within their mainframe batch jobs more than a quarter century ago!

  /* One PROC FREQ step can handle many variables */
    proc freq data=maps.metamaps order=freq ;
       tables type contnent gisimprt;
    run;

With this technique it is almost trivial to determine values that occur most often or least often. Likewise, because the cumulative statistics are displayed within the standard report, little effort is needed to determine what percentage of the data falls into the top N groups.

            Type of Map Dataset: GIS or GRAPH

                                  Cumulative    Cumulative
TYPE     Frequency     Percent     Frequency      Percent
----------------------------------------------------------
GRAPH         281       88.92           281        88.92
GIS            35       11.08           316       100.00


                  Numeric rep. for Continent

                                     Cumulative    Cumulative
CONTNENT    Frequency     Percent     Frequency      Percent
-------------------------------------------------------------
      93          99       31.53            99        31.53
      95          83       26.43           182        57.96
      91          75       23.89           257        81.85
      92          27        8.60           284        90.45
      94          19        6.05           303        96.50
      96          11        3.50           314       100.00

Frequency Missing = 2


                          Imports into GIS

                                            Cumulative    Cumulative
GISIMPRT           Frequency     Percent     Frequency      Percent
--------------------------------------------------------------------
NA                      179       60.07           179        60.07
ASIS                     97       32.55           276        92.62
WHERE LAKE NE 1          20        6.71           296        99.33
WHERE ID NE 999           1        0.34           297        99.66
WHERE LAKE=0              1        0.34           298       100.00

Frequency Missing = 18

 


Operating System and Release Information

Product FamilyProductSystemSAS Release
ReportedFixed*
SAS SystemBase SASOpenVMS Alpha9 TS M0
Linux9 TS M0
HP-UX IPF9 TS M0
64-bit Enabled Solaris9 TS M0
64-bit Enabled HP-UX9 TS M0
64-bit Enabled AIX9 TS M0
Microsoft Windows XP Professional9 TS M0
Microsoft Windows Server 2003 Standard Edition9 TS M0
Microsoft Windows Server 2003 Enterprise Edition9 TS M0
Microsoft Windows Server 2003 Datacenter Edition9 TS M0
Microsoft Windows NT Workstation9 TS M0
Microsoft Windows 2000 Professional9 TS M0
Microsoft Windows 2000 Server9 TS M0
Microsoft Windows 2000 Datacenter Server9 TS M0
Microsoft Windows 2000 Advanced Server9 TS M0
Microsoft Windows Server 2003 Enterprise 64-bit Edition9 TS M0
Microsoft Windows Server 2003 Datacenter 64-bit Edition9 TS M0
Microsoft® Windows® for 64-Bit Itanium-based Systems9 TS M0
z/OS9 TS M0
Tru64 UNIX9 TS M0
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.