![]() | ![]() | ![]() | ![]() |
Whenever you use numeric data such as age, height, or revenue it may be useful to quickly obtain a SAS format such as a decile, duo-decile or any equally sized grouping. Having a tool to automatically create a format and output it to the work library may be useful and could also save time.
Whenever I encounter numeric data and need to do a preliminary univariate analysis or even finalize deciles or groupings in a report I use a macro program I developed called format.sas.
The macro works best when the numeric variable is densely popluated. The new format is output to the work library and the corresponding PROC FORMAT code is output to the print window.
Program Name: format.sas
Purpose: The program creates equally sized groups for non-missing values of continuous numeric variables. Formats are output to work.formats
Required Parameters:
DATAVARFMTGROUPS
******************************************************************************;
* Program: format.sas *;
* Author: Ken Moore *;
* Created: 08/28/03 *;
******************************************************************************;
%MACRO FORMAT(DATA, /* INPUT DATA SET */
VAR, /* CONTINUOUS NUMERIC VARIABLE TO BE FORMATTED */
FMT, /* FORMAT NAME */
GROUPS /* NUMBER OF GROUPS */
);
**********************************************************************;
* ASSIGN RANK FOR NON-MISSINGS ;
**********************************************************************;
proc rank data=&DATA (keep=&VAR where=(&VAR ne .)) out=rank groups=&GROUPS;
var &VAR;
ranks RANK;
run;
*********************************************************************;
* OBTAIN GROUP MEANS FOR FORMAT BOUNDARIES ;
*********************************************************************;
proc means data=rank missing noprint;
var &VAR;
class RANK;
output out=mnrank (where=(RANK ne .))
min=min;
run;
data mnrank;
set mnrank;
RANK_=_N_;
run;
data getrange;
set mnrank;
RANK_=RANK_-1;
run;
proc sort data=getrange;
by RANK_;
run;
proc sort data=mnrank out=smnrank;
by RANK_;
run;
*********************************************************************;
* CREATE FORMAT DATASET ;
*********************************************************************;
data miss;
format START END $10. LABEL $20.;
FMTNAME="&FMT"; START='.'; END='.'; LABEL='Missing';
run;
data getrange2;
format START END $10. LABEL $20. MIN MAX 8.3;
merge getrange (in=a rename=(MIN=MAX_))
smnrank (in=b rename=(MIN=MIN_));
by RANK_;
retain FMTNAME "&FMT" SEXCL EEXCL 'Y';
MAX=MAX_;
MIN=MIN_;
if RANK_>0;
START=PUT(MIN,8.3);
END=PUT(MAX,8.3);
LABEL=left(compbl(put(MIN,8.3)||' -< '||put(MAX,8.3)));
if RANK=0 then do;
START='LOW';
HLO='L';
SEXCL='N';
EEXCL='N';
LABEL=compbl('< '||put(MAX,8.3));
end;
if RANK=%eval(&GROUPS-1) then do;
END='HIGH';
LABEL=compbl(put(MIN,8.3)||' +');
HLO='H';
SEXCL='N';
EEXCL='N';
end;
run;
data getrange3;
set miss getrange2;
run;
proc format cntlin=getrange3;
run;
*********************************************************************;
* OUTPUT SUMMARY TABLE ;
*********************************************************************;
title "VAR: %UPCASE(&VAR.), FORMAT NAME: %UPCASE(&FMT.), GROUPS: &GROUPS ";
proc freq data=&data.;
tables &var. / missing;
format &var. &fmt..;
run;
title;
*********************************************************************;
* OUTPUT PROC FORMAT TO PRINT ;
*********************************************************************;
data _null_;
set getrange3 end=eof;
file print;
TO=' -< ';
if END='HIGH' then TO=' - ';
SEMI='';
if eof then SEMI=';';
if _n_=1 then put "PROC FORMAT;" /
" VALUE %quote(&FMT.)" /
" " START TO END '="'LABEL'"';
else put
" " START TO END '="'LABEL'"' SEMI;
if eof then put "RUN;";
run;
*********************************************************************;
* CLEANUP ;
*********************************************************************;
proc datasets library=work nolist;
delete rank mnrank smnrank getrange miss getrange2 getrange3;
run; quit;
%MEND FORMAT;
|
* sample dataset;
data TEST;
do i=1 to 1308;
RAND=rannor(1);
* set some observations to missing;
if ranuni(1)<.085 then rand=.;
output;
end;
run;
%format(TEST, RAND, DECILE, 10);
|
About the Author
Ken Moore has been programming SAS in the banking, marketing, and healthcare industries for 3 ½ years. He became a SAS Certified professional in November, 2001 and is in the process of obtaining an MS in statistics with concentrations in multivariate analysis and statistical computing at the George Washington University.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
******************************************************************************;
* Program: format.sas *;
* Author: Ken Moore *;
* Created: 08/28/03 *;
******************************************************************************;
%MACRO FORMAT(DATA, /* INPUT DATA SET */
VAR, /* CONTINUOUS NUMERIC VARIABLE TO BE FORMATTED */
FMT, /* FORMAT NAME */
GROUPS /* NUMBER OF GROUPS */
);
**********************************************************************;
* ASSIGN RANK FOR NON-MISSINGS ;
**********************************************************************;
proc rank data=&DATA (keep=&VAR where=(&VAR ne .)) out=rank groups=&GROUPS;
var &VAR;
ranks RANK;
run;
*********************************************************************;
* OBTAIN GROUP MEANS FOR FORMAT BOUNDARIES ;
*********************************************************************;
proc means data=rank missing noprint;
var &VAR;
class RANK;
output out=mnrank (where=(RANK ne .))
min=min;
run;
data mnrank;
set mnrank;
RANK_=_N_;
run;
data getrange;
set mnrank;
RANK_=RANK_-1;
run;
proc sort data=getrange;
by RANK_;
run;
proc sort data=mnrank out=smnrank;
by RANK_;
run;
*********************************************************************;
* CREATE FORMAT DATASET ;
*********************************************************************;
data miss;
format START END $10. LABEL $20.;
FMTNAME="&FMT"; START='.'; END='.'; LABEL='Missing';
run;
data getrange2;
format START END $10. LABEL $20. MIN MAX 8.3;
merge getrange (in=a rename=(MIN=MAX_))
smnrank (in=b rename=(MIN=MIN_));
by RANK_;
retain FMTNAME "&FMT" SEXCL EEXCL 'Y';
MAX=MAX_;
MIN=MIN_;
if RANK_>0;
START=PUT(MIN,8.3);
END=PUT(MAX,8.3);
LABEL=left(compbl(put(MIN,8.3)||' -< '||put(MAX,8.3)));
if RANK=0 then do;
START='LOW';
HLO='L';
SEXCL='N';
EEXCL='N';
LABEL=compbl('< '||put(MAX,8.3));
end;
if RANK=%eval(&GROUPS-1) then do;
END='HIGH';
LABEL=compbl(put(MIN,8.3)||' +');
HLO='H';
SEXCL='N';
EEXCL='N';
end;
run;
data getrange3;
set miss getrange2;
run;
proc format cntlin=getrange3;
run;
*********************************************************************;
* OUTPUT SUMMARY TABLE ;
*********************************************************************;
title "VAR: %UPCASE(&VAR.), FORMAT NAME: %UPCASE(&FMT.), GROUPS: &GROUPS ";
proc freq data=&data.;
tables &var. / missing;
format &var. &fmt..;
run;
title;
*********************************************************************;
* OUTPUT PROC FORMAT TO PRINT ;
*********************************************************************;
data _null_;
set getrange3 end=eof;
file print;
TO=' -< ';
if END='HIGH' then TO=' - ';
SEMI='';
if eof then SEMI=';';
if _n_=1 then put "PROC FORMAT;" /
" VALUE %quote(&FMT.)" /
" " START TO END '="'LABEL'"';
else put
" " START TO END '="'LABEL'"' SEMI;
if eof then put "RUN;";
run;
*********************************************************************;
* CLEANUP ;
*********************************************************************;
proc datasets library=work nolist;
delete rank mnrank smnrank getrange miss getrange2 getrange3;
run; quit;
%MEND FORMAT;
* sample dataset;
data TEST;
do i=1 to 1308;
RAND=rannor(1);
* set some observations to missing;
if ranuni(1)<.085 then rand=.;
output;
end;
run;
%format(TEST, RAND, DECILE, 10);
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
| Type: | Sample |
| Topic: | SAS Reference ==> Macro SAS Reference ==> Formats Non SAS Authors ==> Ken Moore |
| Date Modified: | 2005-02-01 08:47:35 |
| Date Created: | 2004-10-13 16:16:52 |
| Product Family | Product | Host | SAS Release | |
| Starting | Ending | |||
| SAS System | Base SAS | Tru64 UNIX | n/a | n/a |
| Solaris | n/a | n/a | ||
| Linux | n/a | n/a | ||
| OpenVMS Alpha | n/a | n/a | ||
| HP-UX IPF | n/a | n/a | ||
| HP-UX | n/a | n/a | ||
| AIX | n/a | n/a | ||
| ABI+ for Intel Architecture | n/a | n/a | ||
| 64-bit Enabled Solaris | n/a | n/a | ||
| 64-bit Enabled HP-UX | n/a | n/a | ||
| 64-bit Enabled AIX | n/a | n/a | ||
| Windows | n/a | n/a | ||
| Microsoft® Windows® for 64-Bit Itanium-based Systems | n/a | n/a | ||
| OpenVMS VAX | n/a | n/a | ||
| CMS | n/a | n/a | ||
| z/OS | n/a | n/a | ||



