NOTE: Beginning with SAS/STAT 13.1 in SAS 9.4 TS1M1, the functionality of these macros has been updated and added to the ICLIFETEST procedure. For details, see the ICLIFETEST documentation.
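As a rough sketch of what the equivalent analysis might look like with the procedure, the following step assumes a data set named BreastCancer with interval endpoints lTime and rTime and a grouping variable Therapy (the names used in the %ICSTEST call in the errata on this page). It is an illustration only; check the statement and option names against the ICLIFETEST documentation for your release.

```sas
/* Sketch only: assumes SAS/STAT 13.1 or later and a data set BreastCancer */
/* with variables lTime, rTime, and Therapy.                               */
proc iclifetest data=BreastCancer method=emicm;
   time (lTime, rTime);   /* left and right endpoints of the interval */
   test Therapy;          /* compare survival curves across groups    */
run;
```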
Version | Update Notes |
2.2 | Added fuzzing in the %EMICM macro to avoid floating point errors. |
2.1 | The %EMICM and %ICSTEST macros are now numerically more stable. Both macros now accept a missing LEFT= value to represent left censoring. Observations with improper time or frequency values are rejected, and the number of observations deleted is reported. %EMICM now gives the standard error of the survival probability rather than the variance. |
2.0 | Added the %EMICM macro allowing estimation of survival curves using the ICM and EM-ICM algorithms. This macro is recommended over the %ICE macro as described below. Added the %ICSTEST macro allowing comparison of survival curves using the generalized log-rank tests of Zhao and Sun (2004) and Sun, Zhao, and Zhao (2005). The %ICE macro remains at Version 1. |
1.0 | Initial coding of %ICE macro (13JUL93). |
%EMICM Macro
Follow the instructions in the Downloads tab of this sample to save the %EMICM macro definition. Before calling the %EMICM macro, specify the following %INC statement in your SAS program or in the SAS editor window to define the macro and make it available for use. In the %INC statement, replace the text within quotes with the location of the %EMICM macro definition file on your system.
%inc "<location of your file containing the EMICM macro>";
Following this statement, you can call the %EMICM macro using the following syntax:
%EMICM(<list of macro arguments separated by commas>)
To use the %EMICM macro, prepare your data such that:
For discussion and an example, see So, Johnston, and Kim (2010).
The following arguments may be listed within parentheses in any order, separated by commas. The LEFT= and RIGHT= arguments are required. All other arguments are optional.
DATA= SAS data set to be analyzed. Default is DATA=_last_.
LEFT= A numeric variable representing the left endpoint of the time interval. Left-censored observations have a missing value or a value of 0.
RIGHT= A numeric variable representing the right endpoint of the time interval. Right-censored observations have a missing value (note this is different from the %ICE macro).
FREQ= A single numeric variable whose values represent the frequency of occurrence of the observations.
GROUP= A variable identifying different treatment groups. A separate NPMLE is computed for each value of the GROUP= variable.
METHOD= Optimization technique for computing the NPMLE:
    EM -- Turnbull's self-consistency algorithm
    ICM -- Iterative Convex Minorant algorithm
    EMICM -- EM-ICM algorithm
    Default is METHOD=EMICM.
OUT= The output data set containing the NPMLE.
OUTITER= The output data set containing the history of iterations. Variables ERROR1, ERROR2, ERROR3, and ERROR4 represent the convergence measures of ERRORTYPE=1, ERRORTYPE=2, ERRORTYPE=3, and ERRORTYPE=4, respectively.
ERRORTYPE= Convergence criterion to be used:
    1 -- The maximum of the closeness of consecutive estimates,
    2 -- The closeness of the log likelihood function,
    3 -- The gradient of the log likelihood function,
    4 -- The maximum measures of ERRORTYPE=1, ERRORTYPE=2, and ERRORTYPE=3.
    Default is ERRORTYPE=1.
RATECONV= Rate of convergence for the selected ERRORTYPE. Default is RATECONV=1e-7.
mRS= Number of resampling for the generalized Greenwood formula. Default is mRS=50.
SEED= Random seed for resampling for the generalized Greenwood formula. Default is SEED=8375.
TITLE= Primary title for the NPMLE plot.
TITLE2= Secondary title for the NPMLE plot.
TIMELABEL= Label of time axis in the NPMLE plot.
OPTIONS= List of display options (separated by blanks):
    NOTABLE -- Suppressing the table of the NPMLE.
    PLOT -- Graphical display of the estimated survival curve.
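A call using several of these arguments might look like the following minimal sketch. The data set name trial, the variables lTime, rTime, and trt, the data values, and the output data set name npmle are all hypothetical. The sketch assumes the macro definition has already been loaded with the %INC statement described above. In each group, the first observation is left-censored (missing LEFT= value), the second is interval-censored, and the third is right-censored (missing RIGHT= value).

```sas
/* Hypothetical data: one row per subject, interval endpoints in lTime and rTime */
data trial;
   input lTime rTime trt;
   datalines;
.   5  1
4   8  1
6   .  1
.   7  2
3   9  2
10  .  2
;

/* Assumes emicm.sas has already been loaded with %INC */
%EMICM(data=trial, left=lTime, right=rTime, group=trt,
       method=EMICM, out=npmle, options=PLOT);
```

Because GROUP= is specified, a separate NPMLE is computed for each value of trt.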
%ICSTEST Macro
Follow the instructions in the Downloads tab of this sample to save the %ICSTEST macro definition. Before calling the %ICSTEST macro, specify the following %INC statement in your SAS program or in the SAS editor window to define the macro and make it available for use. In the %INC statement, replace the text within quotes with the location of the %ICSTEST macro definition file on your system.
%inc "<location of your file containing the ICSTEST macro>";
Following this statement, you can call the %ICSTEST macro using the following syntax:
%ICSTEST(<list of macro arguments separated by commas>)
To use the %ICSTEST macro, prepare your data such that:
For discussion and an example, see So, Johnston, and Kim (2010).
The following arguments may be listed within parentheses in any order, separated by commas. The LEFT=, RIGHT=, and GROUP= arguments are required. All other arguments are optional.
DATA= SAS data set to be analyzed. Default is DATA=_last_.
LEFT= A numeric variable representing the left endpoint of the time interval. Left-censored observations have a missing value or a value of 0.
RIGHT= A numeric variable representing the right endpoint of the time interval. Right-censored observations have a missing value (note this is different from the %ICE macro).
FREQ= A single numeric variable whose values represent the frequency of occurrence of the observations.
GROUP= A variable identifying different treatment groups.
ERRORTYPE= Convergence criterion to be used:
    1 -- The maximum of the closeness of consecutive estimates,
    2 -- The closeness of the log likelihood function,
    3 -- The gradient of the log likelihood function,
    4 -- The maximum measures of ERRORTYPE=1, ERRORTYPE=2, and ERRORTYPE=3.
    Default is ERRORTYPE=1.
RATECONV= Rate of convergence for the selected ERRORTYPE. Default is RATECONV=1e-7.
mRS= Number of resampling for the generalized Greenwood formula. Default is mRS=50.
SEED= Random seed for resampling for the generalized Greenwood formula. Default is SEED=8375.
%ICE Macro
First, save the XMACRO set of utility macros to a file on your system. Then follow the instructions in the Downloads tab of this sample to save and edit the %ICE macro definition. Before calling the %ICE macro, specify the following %INC statement in your SAS program or in the SAS editor window to define the macro and make it available for use. In the %INC statement, replace the text within quotes with the location of the %ICE macro definition file on your system.
%inc "<location of your file containing the ICE macro>";
Following this statement, you can call the %ICE macro using the following syntax:
%ICE(<list of macro arguments separated by commas>)
To use the %ICE macro, prepare your data such that:
For an example, see the Results tab.
The following arguments may be listed within parentheses in any order, separated by commas. Only the TIME= argument is required; all other arguments are optional.
DATA= SAS data set to be analyzed.
BY= List of variables for BY groups.
TIME= Two variables (separated by blanks) representing the left and right endpoints of the time interval. You may enclose these variable names in a pair of parentheses, brackets, or braces, but a comma should not be used to separate the names. See above for how to represent exact, left, and right censored times.
FREQ= A single numeric variable whose values represent the frequency of occurrence of the observations.
TECH= Optimization technique for maximizing the likelihood. Valid values are:
    NRA -- Newton-Raphson Ridge
    QN -- Quasi-Newton
    CG -- Conjugate Gradient
    EM -- Self-Consistency Algorithm of Turnbull
    NRA, QN and CG are NLP optimization routines. EM is the self-consistency algorithm. With m as the number of estimated parameters, the default technique is
    NRA if m <= 30
    QN if 30 < m <= 200
    CG if m > 200
LBOUND= Lower bound for the estimated parameters. The default is 1e-6. Only used in the NRA, QN and CG techniques.
ALPHA= A number between 0 and 1 that sets the level of the confidence intervals for the survival curve. The confidence level for the intervals is 1-ALPHA. The default is .05.
OPTIONS= List of display options (separated by blanks):
    NOPRINT -- Suppress printing of the parameter estimates, the survival curve estimates and confidence limits for the survival curve.
    PLOT -- Graphical display of the estimated survival curve.
NLPOPT= An IML row vector to be passed into the OPT argument of the NLP optimization routines. This vector controls the option vector of the NLP optimization routine. The default is {1 0}.
NLPTC= An IML row vector to be passed into the TC argument of the NLP optimization routine. This vector controls the termination criteria of the NLP optimization routine. The default is {2000 5000}.
EMCONV= Convergence criterion for the EM technique. Convergence is declared if the increase in the log-likelihood is less than the convergence criterion. The default is 1e-8.
OUTE= A SAS data set name containing the parameter estimates.
OUTS= A SAS data set name containing the estimates of the survival curve and the corresponding confidence limits.
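As an illustration of combining these arguments, the following sketch requests the EM technique, 90% confidence limits, and an output data set instead of printed results. It assumes a data set named ex1 with endpoint variables l and r and a frequency variable f (the names used in the sample code on this page), and that the %ICE macro has already been loaded with %INC. The output data set name surv is hypothetical.

```sas
/* Assumes ice.sas (and the XMACRO utilities it includes) is already loaded */
%ice(data=ex1, time=(l r), freq=f,
     tech=EM, emconv=1e-8,        /* self-consistency algorithm and its tolerance */
     alpha=.10,                   /* 1-ALPHA = 90% confidence limits              */
     options=noprint, outs=surv); /* suppress printing, save the survival curve   */

/* Inspect the saved survival estimates and confidence limits */
proc print data=surv;
run;
```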
The following statements, specified before the %ICE macro call, may be useful for diagnosing errors:
%let _notes_=1;             Prints SAS notes for all steps
%let _echo_=1;              Prints the arguments to the ICE macro
%let _echo_=2;              Prints the arguments to the ICE macro after defaults have been set
options mprint;             Prints SAS code generated by the macro
options mlogic symbolgen;   Prints lots of macro debugging info
To turn off the extra information produced by the above statements, specify these statements as needed:
%let _notes_=0;
%let _echo_=0;
options nomprint;
options nomlogic nosymbolgen;
For the %EMICM and %ICSTEST macros, the version of the macro that you are using is displayed when you specify version (or any string) as the first argument. For example:
%EMICM(version, data=mydata, ...other options...)
Estimation of survival curves
The data for the $i$th subject consist of an interval of the form $[L_i, R_i]$. Under the survival curve $G$, the likelihood for the $i$th observation is

$$G(L_i-) - G(R_i+)$$

Let $[q_1,p_1], [q_2,p_2], \ldots, [q_m,p_m]$ be the set of disjoint intervals whose left and right end points lie in the sets $\{L_i : 1 \le i \le N\}$ and $\{R_i : 1 \le i \le N\}$, respectively, and which contain no other members of $\{L_i\}$ or $\{R_i\}$ except at their end points. For $1 \le j \le m$, define $\Theta_j = G(q_j-) - G(p_j+)$. Then $\Theta_j \ge 0$ and $\sum_j \Theta_j = 1$. The likelihood is proportional to

$$\mathrm{Lik}(\Theta_1, \Theta_2, \ldots, \Theta_m) = \prod_i \Bigl[ \sum_j \alpha_{ij} \Theta_j \Bigr]$$

where $\alpha_{ij} = 1$ if $[q_j, p_j]$ lies in $[L_i, R_i]$ and 0 otherwise. The nonparametric maximum likelihood estimates (NPMLEs) of $\Theta_1, \ldots, \Theta_m$ are obtained by an NLP optimization routine or using Turnbull's (1976) self-consistency equation, which can be solved using an expectation-maximization (EM) algorithm. These are available in the %ICE macro. The estimated variance is computed by inverting the negative of the Hessian matrix evaluated at the NPMLE.
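The self-consistency iteration can be sketched in SAS/IML as follows. This is an illustration of the general EM update for the likelihood above, not the code used by the macros. The indicator matrix is written by hand (an assumption) for the four observed intervals [0,2], [1,3], [2,10], and [4,10] from the sample code on this page, with frequencies 1, 1, 4, and 4, and for the candidate intervals [1,2], [2,3], and [4,10] listed in the sample output. The matrix names, starting values, tolerance, and iteration cap are arbitrary choices.

```sas
proc iml;
   /* alpha[i,j] = 1 if candidate interval j lies in observed interval i */
   alpha = {1 0 0,
            1 1 0,
            0 1 1,
            0 0 1};
   f = {1, 1, 4, 4};                             /* observation frequencies */
   n = sum(f);
   theta = j(ncol(alpha), 1, 1/ncol(alpha));     /* equal starting masses   */

   change = 1;
   iter = 0;
   do while (change > 1e-10 & iter < 5000);      /* tolerance and cap are arbitrary */
      denom    = alpha * theta;                  /* likelihood term for each row    */
      post     = (alpha # theta`) / denom;       /* E-step: share of each interval  */
      thetaNew = (post` * f) / n;                /* M-step: weighted average        */
      change   = max(abs(thetaNew - theta));
      theta    = thetaNew;
      iter     = iter + 1;
   end;
   print iter theta;
quit;
```

The printed masses can be compared with the THETA column of the parameter estimates in the sample output on this page.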
Even with a moderate number of parameters, the EM algorithm is very slow. The iterative convex minorant (ICM) algorithm of Groeneboom and Wellner (1992) and the EM iterative convex minorant algorithm (EM-ICM) of Wellner and Zhan (1997) are much more efficient methods of computing the NPMLE than the EM algorithm. The latter algorithm converges to the NPMLE if it exists and is unique. The EM algorithm, the ICM, and the EM-ICM methods are available in the %EMICM macro. The estimated variance is computed based on the generalized Greenwood formula (Sun 2001).
While the %ICE and %EMICM macros both estimate the survival curve for interval-censored data, the %EMICM macro is recommended because of the increased efficiency of its optimization algorithm as described above. In addition, the %EMICM macro can create a plot with overlaid survival curves for multiple treatment groups.
Comparison of survival curves
Zhao and Sun (2004) generalized the log-rank test of Sun (1996) to include exact failure times in the interval-censored data. They also use an imputation approach to compute the variance of this generalized log-rank statistic. Sun, Zhao, and Zhao (2005) propose a new class of k-sample tests for interval-censored data. Both tests are available in the %ICSTEST macro.
Macro updates
The %EMICM and %ICSTEST macros attempt to check for a later version of themselves. If a macro is unable to do this (such as if there is no active internet connection available), the macro will issue a message like the following:
EMICM: Unable to check for newer version
The computations performed by the macro are not affected by the appearance of this message.
For additional references, see the References section of So, Johnston, and Kim (2010). A few errors appear in the paper and are presented below:
Errata for So, Johnston, and Kim (2010)
In the call to the %ICSTEST macro, the comma after Therapy should be omitted as follows:
***************************************************
* Generalized Logrank Test I                      *
***************************************************;
%ICSTEST(data=BreastCancer,
         left=lTime,
         right=rTime,
         group=Therapy
         );
The results shown in the paper as "Figure 4: Generalized Log-Rank Test I for the Breast Cancer Data" are incorrect for the data as presented in the paper. Correct results for these data are computed by the macro.
The paper shows the %ICE macro producing the generalized log-rank test of Sun, Zhao, and Zhao (2005). This test is now provided, along with the test of Zhao and Sun (2004), by the %ICSTEST macro.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
data ex1;
   input l r f;
   datalines;
0 2 1
1 3 1
2 10 4
4 10 4
;

%inc "<location of your file containing the ICE macro>";

%ice(data=ex1,time=(l r),freq=f);
Nonparametric Survival Curve for Interval Censoring

Number of Observations: 4
Number of Parameters: 3
Optimization Technique: Newton Raphson Ridge

Parameter Estimates

 Q    P    THETA
 1    2    0.1999995
 2    3    0.0000010
 4   10    0.7999995

Survival Curve Estimates and 95% Confidence Intervals

LEFT   RIGHT   ESTIMATE   LOWER    UPPER
   0       1     1.0000        .        .
   2       2     0.8000   0.4494   1.0000
   3       4     0.8000   0.4494   1.0000
  10      10     0.0000        .        .
Right-click on the links below and select Save to save the %EMICM, %ICSTEST, and %ICE macro definitions to files. It is recommended that you name the files emicm.sas, icstest.sas, and ice.sas.
For the %ICE macro only, edit the %INC statement in the first line and change it to point to your local file containing the XMACRO macro definitions.
Download and save emicm.sas Version 2.2
Type: | Sample |
Topic: | Analytics ==> Nonparametric Analysis; Analytics ==> Survival Analysis |
Date Modified: | 2016-06-01 14:55:48 |
Date Created: | 2005-01-13 15:02:37 |
Product Family | Product | Host | SAS Release (Starting) | SAS Release (Ending) |
SAS System | SAS/IML | All | n/a | n/a |
SAS System | SAS/STAT | All | n/a | n/a |