Contents: Purpose / History / Requirements / Usage / Details / Missing Values / Limitations
%cton(v)
The CtoN macro always attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro issues the following message:
NOTE: Unable to check for newer version of the CTON macro.
The computations performed by the macro are not affected by the appearance of this message. However, this check can be avoided by specifying nochk as the first macro argument. This can be useful if your machine has no connection to the internet.
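As a minimal sketch of skipping the version check, the call below passes nochk positionally, ahead of the keyword parameters. It assumes the macro has already been defined in the session; the data set name a is only a placeholder.

```sas
/* nochk is given first, as the positional (version) argument. */
/* Keyword parameters follow it. Data set a is hypothetical.   */
%cton(nochk, data=a, out=a)
```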
Version | Update Notes |
2.0 | Variable lists are allowed in var=. Added numvals= and numvaldat=. Revamped options= and added desc, asis, nonotes, and summary options. nochk can be specified as the first (version) parameter. |
1.1 | Added datanote/nodatanote and newcheck/nonewcheck options. |
1.0 | Initial coding |
%inc "<location of your file containing the CtoN macro>";
Following this statement, you can call the CtoN macro. See the Results tab for examples.
The following parameter is required when using the CtoN macro:
The following parameters are optional:
In the special case where the character variables contain only valid numeric values, specify options=asis to effectively convert these variables to numeric using the original numeric values. Valid numeric values include dollar values that use an initial dollar sign and commas (such as $1,000). Values in scientific notation (such as 1.2E3) are also valid. Any nonnumeric or missing values are assigned missing numeric values. The character variable is replaced by a numeric variable with the same name if options=noreplace is not also specified. See the example in the Results tab.
To effectively replace all character variables in data set a with numerically-coded variables as described above, specify %CtoN(data=a, out=a). By using the same data set name in out=, you overwrite the data= data set rather than create a new, separate data set. For example, if you have a character variable, C, that contains numeric values and just need to convert it to a numeric variable, specify %CtoN(data=a, out=a, var=c, options=asis).
The character values can be ordered in one of several ways as specified in order=. Other than the default ordering by unformatted, internal values, you can order them by formatted values (if the variable has a format), or by decreasing frequency count. You can also choose to use the order that the values appear in the data= data set. Any of the possible orderings can be reversed by also specifying options=desc. The default or specified numeric values are then applied to the resulting ordering of the character values.
To specify numeric coding other than the default 1, 2, 3, ..., use either numvals= or numvaldat=. With numvals=, you can specify one list of numeric values that will be used for all character variables specified in var=. numvaldat= allows you to specify different numeric lists for different character variables. If different numeric value lists are needed for multiple character variables, either use numvaldat= or run the macro multiple times using numvals= and specify only one variable in var= each time. See the example in the Results tab. numvals= and numvaldat= are ignored when options=asis is specified. With either parameter, if a specified list contains fewer numeric values than the character variable has distinct values, then missing numeric values are assigned to the unmatched character values. If the list contains more values than the character variable has, then the extra numeric values are ignored. Note that you can assign the same numeric value to multiple character values, but in this case options=noformat must be specified to prevent creating a format that can result in errors. An example of this appears on the Results tab.
By default, the CtoN macro does not provide any displayed output. However, if you specify options=summary, a table is displayed showing the unique values in each character variable and the numeric values assigned to them. The table displays internal, unformatted values for both the character and new numeric variables. When the original character variables are being replaced (the default when options=noreplace is not specified), the columns of numeric values are labeled with the original character variable name preceded by CODED, or by ASIS if options=asis is specified. Note that any formats associated with the input character variables are not used for the new numeric variables created by the macro. However, if options=noreplace is specified, the original character variables retain any associated formats.
When not replacing the original character variables (via options=noreplace), the new numeric variables add the specified prefix and/or suffix to the original variable names as specified in prefix= and/or suffix=. By default, only the suffix _N is added to the original variable name. If the resulting variable name would be too long to be valid (exceeding 32 characters) then the original name is truncated as needed to allow for the addition of the prefix and/or suffix. Be aware that if several character variable names are long and differ only in the final characters, this truncation can result in numeric variables with the same name. In this case, only the last is retained. All variable names and associated format names can be seen by running PROC CONTENTS on the out= data set.
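The name collision described above can be illustrated with a short DATA step. This is only a sketch: it assumes that truncation keeps the leading characters of the original name, which the macro's actual logic may or may not match, and both variable names are hypothetical.

```sas
/* Sketch only: assumes truncation keeps the leading characters of */
/* the original name. The variable names below are hypothetical.   */
data _null_;
   length orig newname $32 suffix $8;
   suffix = "_N";
   do orig = "customer_satisfaction_score_2023",
             "customer_satisfaction_score_2024";
      /* Keep only as many leading characters as leave room for the suffix. */
      newname = cats(substr(orig, 1, 32 - length(suffix)), suffix);
      put orig= newname=;
   end;
run;
```

Under that assumption, both 32-character names reduce to the same 30 leading characters before the suffix is appended, so the two derived names are identical. Choosing a shorter suffix= does not help when the names differ only in their last characters; renaming the input variables beforehand does.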
Long data= data set names or var= character variable names might be truncated to create valid out= data set names or (when options=noreplace is specified) numeric variable names. See the details in the description of out= and in the Details section regarding prefix= and suffix=.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
%cton(data=neuralgia, options=summary)
The next call selects a subset of the character variables in the sashelp.cars data set and produces temporary data set cars containing all the original numeric variables and new numeric variables (using the default coding) that replace the original character variables. A table summarizing the conversions is again produced.
%cton(data=sashelp.cars, out=cars, var=type--drivetrain, options=summary)
Finally, this call creates data set A with a character variable that contains mostly numeric values, some of which have embedded commas or dollar signs or use scientific notation. One value is not numeric. Rather than using the default coding, options=asis is specified so that the new numeric variable replacing the character variable in output data set A_N contains the actual numeric values. Any nonnumeric values are converted to missing values. The macro uses the COMMA informat which allows numeric values to contain characters that are often used in valid numeric values.
data a;
   length x $12;
   input x $ @@;
   datalines;
1 10 100,000,000 $14.25 xyz 1.2E3
;
%cton(data=a, options=asis summary)
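For comparison, a hand-written approximation of the as-is conversion for a single variable might look like the following. It is a sketch rather than the macro's own code: the data set names raw and raw_manual, the variable x_num, and the informat width of 32 are all illustrative choices.

```sas
/* Illustrative input; raw, raw_manual, and x_num are hypothetical names. */
data raw;
   length x $12;
   input x $ @@;
   datalines;
1 10 100,000,000 $14.25 xyz 1.2E3
;

data raw_manual;
   set raw;
   /* The COMMA informat strips dollar signs and embedded commas.   */
   /* The ?? modifier suppresses invalid-data messages, so text     */
   /* that cannot be read as a number is left as a missing value.   */
   x_num = input(x, ?? comma32.);
run;

proc print data=raw_manual noobs;
run;
```

Unlike the macro call, this keeps the character variable and adds a separately named numeric variable, because a DATA step cannot change the type of an existing variable in place.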
data Ex1;
   infile datalines missover;
   input id:1. a:$1.;
   datalines;
1 b
2 c
3
4 a
;
The first macro call creates data set Ex1_N replacing character variable a with numeric variable a using the default coding and with an associated format that uses the original character values as formatted values.
In the table produced by options=summary the original character values are displayed in sorted order and the replacing numeric variable, labeled CODED a, shows the unformatted numeric values assigned.
The table from PROC CONTENTS shows the variables in data set Ex1_N. Note that original character variable a is now numeric variable a which has assigned format A. PROC PRINT displays data set Ex1_N showing the formatted values of numeric variable a which are the original character values. Notice that the missing value in the original character variable is also missing in the new numeric variable.
%cton(data=Ex1, options=summary)
proc contents data=Ex1_N;
   ods select variables;
run;
proc print data=Ex1_N noobs;
   title "Data set Ex1_N";
run;
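To make the default coding and its format concrete, the same result can be approximated by hand as sketched below. The names Ex1_copy, Ex1_manual, a_code, and aFmt are hypothetical, and the steps illustrate the idea only; they are not taken from the macro.

```sas
/* Rebuild the example data so that the sketch is self-contained. */
data Ex1_copy;
   infile datalines missover;
   input id:1. a:$1.;
   datalines;
1 b
2 c
3
4 a
;

/* Format that displays the codes as the original character values. */
proc format;
   value aFmt 1 = 'a'
              2 = 'b'
              3 = 'c';
run;

data Ex1_manual;
   set Ex1_copy;
   /* Code the sorted character values as 1, 2, 3. A blank stays missing. */
   select (a);
      when ('a') a_code = 1;
      when ('b') a_code = 2;
      when ('c') a_code = 3;
      otherwise  a_code = .;
   end;
   format a_code aFmt.;
   drop a;
run;

proc print data=Ex1_manual noobs;
run;
```

The numeric variable is named a_code rather than a because a DATA step cannot reuse the name of the character variable that it reads with a different type.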
The following repeats the above adding options=noreplace noformat.
%cton(data=Ex1, options=noreplace noformat summary)
proc contents data=Ex1_N;
   ods select variables;
run;
proc print data=Ex1_N noobs;
   title "Data set Ex1_N";
run;
Because of the noreplace option, a new numeric variable, a_N, is created using the default suffix, and the original character variable is retained. Note that the noformat option prevents the assignment of a format to the numeric variable, so the values displayed by PROC PRINT show the numeric coding.
The following repeats the last call but adds the desc option to apply the default numeric coding to the original character values after sorting them in descending order. It also uses prefix= and suffix= to assign a custom prefix and suffix to the new numeric variable name.
%cton(data=Ex1, prefix=New_, suffix=_Num, options=desc noreplace noformat summary)
Note that, unlike with the previous call, the last character value, c, is now assigned the first numeric value, 1.
In this last call, the desc option is removed and order=data is added. The result is to apply the default numeric coding to the order of the character values as seen in the data= data set.
%cton(data=Ex1, prefix=New_, suffix=_Num, order=data, options=noreplace noformat summary)
As shown in the data Ex1 step above, the order of the character values in the data set is b, c, a. The summary table shows the result of applying the coding to this ordering of the original values.
%cton(data=neuralgia, numvals=0 1 2, options=summary)
In the summary table, note that the variables with only two distinct values use only the first two numeric values provided in numvals=. If more values are provided in numvals= than there are in a var= variable, the extra numeric values are ignored.
However, if there are more distinct values in a variable than values in numvals=, then the unmatched variable values are assigned missing values.
%cton(data=neuralgia, numvals=0 1, options=summary)
If different numeric coding is needed for different character variables, two solutions are available. First, the CtoN macro can be called multiple times with one variable specified in var= and the appropriate numeric coding for that variable specified in numvals=. When this is done, specify the same data set name in data= and out= in the second and subsequent calls. After the following calls, data set neuralgia_n replaces the three character variables with numeric variables using a distinct coding for each.
%cton(data=neuralgia, var=treatment, numvals=1 2 0)
%cton(data=neuralgia_n, out=neuralgia_n, var=sex, numvals=0 1)
%cton(data=neuralgia_n, out=neuralgia_n, var=pain, numvals=1 0)
Alternatively, the CtoN macro can be called once using numvaldat= which specifies a data set containing the coding to be used for each variable. The data set must contain two character variables named VAR and NUMVALS. Each observation in the data set contains the coding for one of the variables to be coded. In an observation, the VAR variable specifies the name of the variable that the coding should apply to and the NUMVALS variable provides the list of numeric values in the coding.
The lengths of the VAR and NUMVALS variables must be sufficient to fully record the specified values. If the name of a variable specified in var= is not found in the numvaldat= data set, it is given the default coding. If a variable name in the numvaldat= data set is not specified in var=, then it is ignored.
The following creates data set ND with the needed coding information. Note that the three observations provide the distinct coding needed for the three character variables in the neuralgia data set. This data set is then specified in numvaldat= in the macro call.
data nd;
   length var $32 numvals $32767;
   infile datalines delimiter=',';
   input var numvals;
   datalines;
sex, 0 1
pain, 1 0
treatment, 1 2 0
;
%cton(data=neuralgia, numvaldat=nd, options=summary)
For example, the character variable, treatment, in the neuralgia data set has categories "A", "B", and "P" representing two test treatments, A and B, and the placebo. For some purposes, it might be desirable to combine the test treatments into a single category. This is done in the next call of the CtoN macro.
%cton(data=neuralgia, var=treatment, numvals=1 1 0, options=noformat summary)
Right-click on the link below and select Save to save the CtoN macro definition to a file. It is recommended that you name the file CtoN.sas.
Type: | Sample |
Topic: | Data Management ==> Data Sources ==> SAS Data Sets/Tables; Data Management ==> Manipulation and Transformation |
Date Modified: | 2025-06-03 16:02:26 |
Date Created: | 2017-06-23 16:08:52 |
Product Family | Product | Host | SAS Release | |
Starting | Ending | |||
SAS System | Base SAS | z/OS | ||
z/OS 64-bit | ||||
OpenVMS VAX | ||||
Macintosh | ||||
Microsoft® Windows® for 64-Bit Itanium-based Systems | ||||
Microsoft Windows Server 2003 Datacenter 64-bit Edition | ||||
Microsoft Windows Server 2003 Enterprise 64-bit Edition | ||||
Microsoft Windows XP 64-bit Edition | ||||
Microsoft® Windows® for x64 | ||||
OS/2 | ||||
Microsoft Windows 8 Enterprise 32-bit | ||||
Microsoft Windows 8 Enterprise x64 | ||||
Microsoft Windows 8 Pro 32-bit | ||||
Microsoft Windows 8 Pro x64 | ||||
Microsoft Windows 8.1 Enterprise 32-bit | ||||
Microsoft Windows 8.1 Enterprise x64 | ||||
Microsoft Windows 8.1 Pro 32-bit | ||||
Microsoft Windows 8.1 Pro x64 | ||||
Microsoft Windows 10 | ||||
Microsoft Windows 95/98 | ||||
Microsoft Windows 2000 Advanced Server | ||||
Microsoft Windows 2000 Datacenter Server | ||||
Microsoft Windows 2000 Server | ||||
Microsoft Windows 2000 Professional | ||||
Microsoft Windows NT Workstation | ||||
Microsoft Windows Server 2003 Datacenter Edition | ||||
Microsoft Windows Server 2003 Enterprise Edition | ||||
Microsoft Windows Server 2003 Standard Edition | ||||
Microsoft Windows Server 2003 for x64 | ||||
Microsoft Windows Server 2008 | ||||
Microsoft Windows Server 2008 R2 | ||||
Microsoft Windows Server 2008 for x64 | ||||
Microsoft Windows Server 2012 Datacenter | ||||
Microsoft Windows Server 2012 R2 Datacenter | ||||
Microsoft Windows Server 2012 R2 Std | ||||
Microsoft Windows Server 2012 Std | ||||
Microsoft Windows XP Professional | ||||
Windows 7 Enterprise 32 bit | ||||
Windows 7 Enterprise x64 | ||||
Windows 7 Home Premium 32 bit | ||||
Windows 7 Home Premium x64 | ||||
Windows 7 Professional 32 bit | ||||
Windows 7 Professional x64 | ||||
Windows 7 Ultimate 32 bit | ||||
Windows 7 Ultimate x64 | ||||
Windows Millennium Edition (Me) | ||||
Windows Vista | ||||
Windows Vista for x64 | ||||
64-bit Enabled AIX | ||||
64-bit Enabled HP-UX | ||||
64-bit Enabled Solaris | ||||
ABI+ for Intel Architecture | ||||
AIX | ||||
HP-UX | ||||
HP-UX IPF | ||||
IRIX | ||||
Linux | ||||
Linux for x64 | ||||
Linux on Itanium | ||||
OpenVMS Alpha | ||||
OpenVMS on HP Integrity | ||||
Solaris | ||||
Solaris for x64 | ||||
Tru64 UNIX |