
Sample 24980: Nonparametric estimation and comparison of survival curves from interval-censored data


Contents: Purpose / History / Requirements / Usage / Details / See Also / References

NOTE: Beginning with SAS/STAT 13.1 in SAS 9.4 TS1M1, the functionality of these macros has been updated and added to the ICLIFETEST procedure. For details, see the ICLIFETEST documentation.

PURPOSE:
These macros compute nonparametric maximum likelihood estimates (NPMLEs) of survival curves from interval-censored data. Confidence intervals for survival curves and log-rank tests comparing survival curves from several groups are also provided.
HISTORY:
Version 2.2: Added fuzzing in the %EMICM macro to avoid floating-point errors.
Version 2.1: The %EMICM and %ICSTEST macros are now numerically more stable. Both macros now accept a missing LEFT= value to represent left censoring. Observations with improper time or frequency values are rejected, and the number of deleted observations is reported. %EMICM now gives the standard error of the survival probability rather than the variance.
Version 2.0: Added the %EMICM macro, which estimates survival curves by using the ICM and EM-ICM algorithms. This macro is recommended over the %ICE macro, as described below. Added the %ICSTEST macro, which compares survival curves by using the generalized log-rank tests of Zhao and Sun (2004) and Sun, Zhao, and Zhao (2005). The %ICE macro remains at Version 1.
Version 1.0: Initial coding of the %ICE macro (13JUL93).
REQUIREMENTS:
All macros require Base SAS and SAS/IML. The %ICE macro also requires the XMACRO set of utility macros. The %EMICM macro requires SAS/GRAPH and SAS/IML in SAS 9.2 or later.
USAGE:

%EMICM Macro

Follow the instructions in the Downloads tab of this sample to save the %EMICM macro definition. Before calling the %EMICM macro, specify the following %INC statement in your SAS program or in the SAS editor window to define the macro and make it available for use. In the %INC statement, replace the text within quotes with the location of the %EMICM macro definition file on your system.

%inc "<location of your file containing the EMICM macro>";

Following this statement, you can call the %EMICM macro using the following syntax:

%EMICM(<list of macro arguments separated by commas>)

To use the %EMICM macro, prepare your data such that:

  • Li=0 or Li=. for a left-censored time,
  • Ri=. for a right-censored time, and
  • Li=Ri for an exact survival time.
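
As an illustration of this coding, the following DATA step builds a small hypothetical data set. The data set name ic_example, the variables Therapy, lTime, and rTime, and all data values are invented for this sketch. The derived variable CensorType is not used by the macros; it only labels the censoring type that each row represents.

```sas
/* Hypothetical data illustrating the %EMICM/%ICSTEST coding rules. */
/* lTime and rTime bound the unobserved event time of each subject. */
data ic_example;
   length Therapy $ 1 CensorType $ 8;
   input Therapy $ lTime rTime;
   /* Label each row for documentation only; the macros themselves  */
   /* read just the variables given in LEFT= and RIGHT=.            */
   if missing(rTime) then CensorType = 'Right';                  /* Ri = .       */
   else if missing(lTime) or lTime = 0 then CensorType = 'Left'; /* Li = 0 or .  */
   else if lTime = rTime then CensorType = 'Exact';              /* Li = Ri      */
   else CensorType = 'Interval';                                 /* Li < Ri      */
   datalines;
A  0  7
A  .  5
A  4  9
A  6  .
A  8  8
B  3  6
B  5 12
B 10  .
B  7  7
B  0  4
;
run;
```

The same data set can be passed to %ICSTEST, because that macro uses the same coding.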

For discussion and an example, see So, Johnston, and Kim (2010).

The following arguments may be listed within parentheses in any order, separated by commas. The LEFT= and RIGHT= arguments are required. All other arguments are optional.

DATA=      SAS data set to be analyzed. Default is DATA=_last_.

LEFT=      A numeric variable representing the left endpoint of the time interval. 
           Left-censored observations have a missing value or a value of 0.

RIGHT=     A numeric variable representing the right endpoint of the time interval.
           Right-censored observations have a missing value (note this is different 
           from the %ICE macro).

FREQ=      A single numeric variable whose values represent the frequency of occurrence 
           of the observations.

GROUP=     A variable identifying different treatment groups. A separate NPMLE is computed 
           for each value of the GROUP= variable.

METHOD=    Optimization technique for computing the NPMLE:
               EM     -- Turnbull's self-consistency algorithm
               ICM    -- Iterative Convex Minorant algorithm
               EMICM  -- EM-ICM algorithm
               Default is METHOD=EMICM.

OUT=       The output data set containing the NPMLE.

OUTITER=   The output data set containing the history of iterations. Variables ERROR1, 
           ERROR2, ERROR3, and ERROR4 represent the convergence measures of ERRORTYPE=1,
           ERRORTYPE=2, ERRORTYPE=3, and ERRORTYPE=4, respectively.

ERRORTYPE= Convergence criterion to be used:
              1 -- The maximum of the closeness of consecutive estimates,
              2 -- The closeness of the log likelihood function,
              3 -- The gradient of the log likelihood function,
              4 -- The maximum of the measures for ERRORTYPE=1, ERRORTYPE=2, and ERRORTYPE=3.
              Default is ERRORTYPE=1.

RATECONV=   Rate of convergence for the selected ERRORTYPE. Default is RATECONV=1e-7.

mRS=        Number of resamples for the generalized Greenwood formula. Default is mRS=50.

SEED=       Random number seed for the resampling in the generalized Greenwood formula.
            Default is SEED=8375.

TITLE=      Primary title for the NPMLE plot.

TITLE2=     Secondary title for the NPMLE plot.

TIMELABEL=  Label of time axis in the NPMLE plot.

OPTIONS=    List of display options (separated by blanks):
                 NOTABLE -- Suppresses the table of the NPMLE.
                 PLOT    -- Displays a plot of the estimated survival curve.
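
A call that combines several of the arguments above might look like the following sketch. It assumes a data set named mydata with interval endpoints lTime and rTime and a grouping variable Therapy (all hypothetical names), coded as described earlier. The output data set name npmle and the axis label are arbitrary choices.

```sas
/* Define the macro; replace the placeholder with your file location */
%inc "<location of your file containing the EMICM macro>";

/* Hypothetical call: one NPMLE per Therapy group, computed with the  */
/* EM-ICM algorithm, saved to NPMLE, and plotted as overlaid curves.  */
%EMICM(data=mydata,
       left=lTime,
       right=rTime,
       group=Therapy,
       method=EMICM,
       errortype=1,
       out=npmle,
       timelabel=Months,
       options=plot);
```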

%ICSTEST Macro

Follow the instructions in the Downloads tab of this sample to save the %ICSTEST macro definition. Before calling the %ICSTEST macro, specify the following %INC statement in your SAS program or in the SAS editor window to define the macro and make it available for use. In the %INC statement, replace the text within quotes with the location of the %ICSTEST macro definition file on your system.

%inc "<location of your file containing the ICSTEST macro>";

Following this statement, you can call the %ICSTEST macro using the following syntax:

%ICSTEST(<list of macro arguments separated by commas>)

To use the %ICSTEST macro, prepare your data such that:

  • Li=0 or Li=. for a left-censored time,
  • Ri=. for a right-censored time, and
  • Li=Ri for an exact survival time.

For discussion and an example, see So, Johnston, and Kim (2010).

The following arguments may be listed within parentheses in any order, separated by commas. The LEFT=, RIGHT=, and GROUP= arguments are required. All other arguments are optional.

DATA=      SAS data set to be analyzed. Default is DATA=_last_.

LEFT=      A numeric variable representing the left endpoint of the time interval. 
           Left-censored observations have a missing value or a value of 0.

RIGHT=     A numeric variable representing the right endpoint of the time interval.
           Right-censored observations have a missing value (note this is different 
           from the %ICE macro).

FREQ=      A single numeric variable whose values represent the frequency of occurrence 
           of the observations.

GROUP=     A variable identifying different treatment groups.

ERRORTYPE= Convergence criterion to be used:
              1 -- The maximum of the closeness of consecutive estimates,
              2 -- The closeness of the log likelihood function,
              3 -- The gradient of the log likelihood function,
              4 -- The maximum of the measures for ERRORTYPE=1, ERRORTYPE=2, and ERRORTYPE=3.
              Default is ERRORTYPE=1.

RATECONV=  Rate of convergence for the selected ERRORTYPE. Default is RATECONV=1e-7.

mRS=       Number of resamples for the generalized Greenwood formula. Default is mRS=50.

SEED=      Random number seed for the resampling in the generalized Greenwood formula.
           Default is SEED=8375.
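
When many subjects share the same interval, you can collapse the data and pass the counts with FREQ=. The following sketch does this with PROC SQL before calling %ICSTEST. The names mydata, mydata_counts, lTime, rTime, Therapy, and Count are hypothetical. PROC SQL puts missing values in the same GROUP BY group, so right-censored rows that have the same lTime are counted together.

```sas
/* Define the macro; replace the placeholder with your file location */
%inc "<location of your file containing the ICSTEST macro>";

/* Collapse identical (Therapy, lTime, rTime) rows into one row with a count */
proc sql;
   create table mydata_counts as
   select Therapy, lTime, rTime, count(*) as Count
   from mydata
   group by Therapy, lTime, rTime;
quit;

/* Compare the survival curves of the Therapy groups, weighting by Count */
%ICSTEST(data=mydata_counts,
         left=lTime,
         right=rTime,
         group=Therapy,
         freq=Count);
```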

%ICE Macro

First, save the XMACRO set of utility macros to a file on your system. Then follow the instructions in the Downloads tab of this sample to save and edit the %ICE macro definition. Before calling the %ICE macro, specify the following %INC statement in your SAS program or in the SAS editor window to define the macro and make it available for use. In the %INC statement, replace the text within quotes with the location of the %ICE macro definition file on your system.

%inc "<location of your file containing the ICE macro>";

Following this statement, you can call the %ICE macro using the following syntax:

%ICE(<list of macro arguments separated by commas>)

To use the %ICE macro, prepare your data such that:

  • Li=0 for a left-censored time,
  • For a right-censored time, set Ri to an arbitrary fixed value beyond the last examination time, and
  • Li=Ri for an exact survival time.
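
Data that are coded for %EMICM can be recoded for %ICE in a DATA step. This sketch assumes that the largest observed endpoint equals the last examination time. It places right-censored right endpoints one time unit beyond that value, which is an arbitrary choice. The data set names and values are hypothetical.

```sas
/* Hypothetical data coded for %EMICM: Li = . left-censored, Ri = . right-censored */
data mydata;
   input Therapy $ lTime rTime;
   datalines;
A  .  5
A  4  9
A  6  .
A  8  8
B  3  6
B 10  .
;
run;

/* Pass 1: store the largest observed endpoint in macro variable LASTEXAM. */
/* This sketch assumes that value equals the last examination time.        */
data _null_;
   set mydata end=last;
   retain maxtime 0;
   maxtime = max(maxtime, lTime, rTime);   /* MAX ignores missing values */
   if last then call symputx('lastexam', maxtime);
run;

/* Pass 2: recode the endpoints the way %ICE expects */
data mydata_ice;
   set mydata;
   if missing(lTime) then lTime = 0;             /* left-censored: Li = 0                */
   if missing(rTime) then rTime = &lastexam + 1; /* right-censored: Ri beyond last exam  */
run;
```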

For an example, see the Results tab.

The following arguments may be listed within parentheses in any order, separated by commas. Only the TIME= argument is required; all other arguments are optional.

   DATA=      SAS data set to be analyzed.
   
   BY=        List of variables for BY groups.

   TIME=      Two variables (separated by blanks) representing the left
              and right endpoints of the time interval. You may enclose
              these variable names in a pair of parentheses, brackets, or
              braces, but do not separate the names with a comma. See
              above for how to represent exact, left-censored, and
              right-censored times.

   FREQ=      A single numeric variable whose values represent the
              frequency of occurrence of the observations.

   TECH=      Optimization technique for maximizing the likelihood. Valid
              values are:

              NRA --  Newton-Raphson Ridge
              QN  --  Quasi-Newton
              CG  --  Conjugate Gradient
              EM  --  Self-Consistency Algorithm of Turnbull

              NRA, QN and CG are NLP optimization routines. EM is the
              self-consistency algorithm. With m as the number of
              estimated parameters, the default technique is

                    NRA  if m <=30
                    QN   if 30 < m <= 200
                    CG   if m > 200


   LBOUND=    Lower bound for the estimated parameters. The
              default is 1e-6. Only used in the NRA, QN and
              CG techniques.

   ALPHA=     A number between 0 and 1 that sets the level of the
              confidence intervals for the survival curve. The
              confidence level for the intervals is 1-ALPHA. The default
              is .05.

   OPTIONS=   List of display options (separated by blanks):

              NOPRINT  Suppress printing of the parameter estimates,
                       the survival curve estimates and confidence
                       limits for the survival curve.

              PLOT     Graphical display of the estimated survival
                       curve.

   NLPOPT=    An IML row vector to be passed into the OPT argument of
              the NLP optimization routines. This vector controls the
              option vector of the NLP optimization routine. The default
              is {1 0}.

   NLPTC=     An IML row vector to be passed into the TC argument
              of the NLP optimization routine. This vector controls
              the termination criteria of the NLP optimization
              routine. The default is {2000 5000}.

   EMCONV=    Convergence criterion for the EM technique. Convergence
              is declared if the increase in the log-likelihood
              is less than the convergence criterion. The default is
              1e-8.

   OUTE=      Name of the output SAS data set that contains the parameter
              estimates.

   OUTS=      Name of the output SAS data set that contains the estimates
              of the survival curve and the corresponding confidence limits.
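
A sketch of an %ICE call follows. It assumes a data set named mydata_ice that is already coded as %ICE requires, with hypothetical variables lTime, rTime, and Therapy. The data are sorted first because SAS BY-group processing generally expects sorted input. This sample does not say whether %ICE sorts the data internally. The output data set names params and survest are arbitrary.

```sas
/* Define %ICE after saving XMACRO and editing %ICE as described above */
%inc "<location of your file containing the ICE macro>";

/* Sort by the BY variable before BY-group analysis */
proc sort data=mydata_ice;
   by Therapy;
run;

/* Turnbull EM estimate per Therapy group with 95% confidence limits */
%ICE(data=mydata_ice,
     by=Therapy,
     time=lTime rTime,
     tech=EM,
     emconv=1e-8,
     alpha=0.05,
     options=plot,
     oute=params,
     outs=survest);
```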

The following statements, specified before the %ICE macro call, may be useful for diagnosing errors:

   %let _notes_=1;            /* Prints SAS notes for all steps */
   %let _echo_=1;             /* Prints the arguments to the ICE macro */
   %let _echo_=2;             /* Prints the arguments to the ICE macro
                                 after defaults have been set */
   options mprint;            /* Prints SAS code generated by the macro */
   options mlogic symbolgen;  /* Prints extensive macro debugging information */

To turn off the extra information produced by the above statements, specify these statements as needed:

   %let _notes_=0;
   %let _echo_=0;
   options nomprint; 
   options nomlogic nosymbolgen;

For the %EMICM and %ICSTEST macros, the version of the macro that you are using is displayed when you specify version (or any string) as the first argument. For example:

   %EMICM(version, data=mydata, ...other options...)
DETAILS:

Estimation of survival curves

The data for the ith subject consist of an interval of the form [Li,Ri]. Under the survival curve G, the likelihood for the ith observation is

$$G(L_i^-) - G(R_i^+)$$
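
This single expression covers every type of observation described under USAGE. The short derivation below shows why. It writes the survival curve as G(t) = P(T > t) for a nonnegative event time T, so that G(0-) = 1, and it reads a missing right endpoint as R = ∞, with G(∞) = 0.

```latex
\begin{aligned}
\text{Interval-censored } (L_i < R_i):\quad & G(L_i^-) - G(R_i^+) = P(L_i \le T \le R_i) \\
\text{Left-censored } (L_i = 0):\quad & G(0^-) - G(R_i^+) = 1 - G(R_i^+) = P(T \le R_i) \\
\text{Right-censored } (R_i = \infty):\quad & G(L_i^-) - G(\infty) = G(L_i^-) = P(T \ge L_i) \\
\text{Exact } (L_i = R_i = t):\quad & G(t^-) - G(t^+) = P(T = t)
\end{aligned}
```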

Let $[q_1,p_1], [q_2,p_2], \ldots, [q_m,p_m]$ be the set of disjoint intervals whose left and right endpoints lie in the sets $\{L_i : 1 \le i \le N\}$ and $\{R_i : 1 \le i \le N\}$, respectively, and which contain no other members of $\{L_i\}$ or $\{R_i\}$ except at their endpoints. For $1 \le j \le m$, define $\Theta_j = G(q_j^-) - G(p_j^+)$. Then $\Theta_j \ge 0$ and $\sum_j \Theta_j = 1$. The likelihood is proportional to

$$\mathrm{Lik}(\Theta_1, \Theta_2, \ldots, \Theta_m) = \prod_{i=1}^{N} \Big[ \sum_{j=1}^{m} \alpha_{ij} \Theta_j \Big]$$

where $\alpha_{ij} = 1$ if $[q_j,p_j]$ lies in $[L_i,R_i]$ and $\alpha_{ij} = 0$ otherwise. The nonparametric maximum likelihood estimates (NPMLEs) of $\Theta_1, \ldots, \Theta_m$ can be obtained either by an NLP optimization routine or from Turnbull's (1976) self-consistency equation, which can be solved by an expectation-maximization (EM) algorithm. Both approaches are available in the %ICE macro, which computes the estimated variance by inverting the negative of the Hessian matrix evaluated at the NPMLE.
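
For reference, here is the EM update that follows from the self-consistency equation, written for unweighted data in which each observation counts once. At iteration k+1, each observation's unit mass is redistributed over the intervals [q_j,p_j] that it contains, in proportion to the current estimates. Summing the right side over j gives (1/N)·N = 1, so the update keeps the constraint that the Θ_j sum to 1.

```latex
\Theta_j^{(k+1)} = \frac{1}{N} \sum_{i=1}^{N}
  \frac{\alpha_{ij}\,\Theta_j^{(k)}}{\sum_{l=1}^{m} \alpha_{il}\,\Theta_l^{(k)}},
  \qquad j = 1, \ldots, m
```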

Even with a moderate number of parameters, the EM algorithm is very slow. The iterative convex minorant (ICM) algorithm of Groeneboom and Wellner (1992) and the EM iterative convex minorant (EM-ICM) algorithm of Wellner and Zhan (1997) compute the NPMLE much more efficiently than the EM algorithm. The EM-ICM algorithm converges to the NPMLE if the NPMLE exists and is unique. The EM, ICM, and EM-ICM algorithms are all available in the %EMICM macro, which computes the estimated variance by using the generalized Greenwood formula (Sun 2001).
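
To make the EM approach concrete, the following SAS/IML program is a minimal sketch of Turnbull's self-consistency iteration for a single sample. It is not the code of the %ICE or %EMICM macros. It implements only EM (not ICM or EM-ICM), stops when the largest change in Θ is small, computes no standard errors, and uses invented data. The intervals [q_j,p_j] are found by sorting all endpoints and keeping each left endpoint that is immediately followed by a right endpoint. The program assumes a SAS/IML release that supports CALL SORT.

```sas
proc iml;
   /* Hypothetical intervals [L, R]: L = 0 or . left-censored, R = . right-censored */
   L = {0, 4, 6, 8, 3, 5, 10, 7, 0, 2};
   R = {7, 9, ., 8, 6, 12, ., 7, 4, 5};
   n = nrow(L);

   /* Code left censoring as L = 0 and right censoring as a very large R */
   big  = constant('BIG');
   idxL = loc(L = .);
   if ncol(idxL) > 0 then L[idxL] = 0;
   idxR = loc(R = .);
   if ncol(idxR) > 0 then R[idxR] = big;

   /* Sort all endpoints by (value, type), with type 0 = left and 1 = right, */
   /* so that a left endpoint precedes a right endpoint at the same value.   */
   pts = (L || j(n, 1, 0)) // (R || j(n, 1, 1));
   call sort(pts, {1 2});
   n2 = nrow(pts);
   isInner = (pts[1:(n2-1), 2] = 0) & (pts[2:n2, 2] = 1);
   idx = loc(isInner);
   q = pts[idx, 1];          /* left endpoints q_j  */
   p = pts[idx + 1, 1];      /* right endpoints p_j */
   m = nrow(q);

   /* alpha[i,j] = 1 if [q_j, p_j] lies inside [L_i, R_i] */
   alpha = (repeat(L, 1, m) <= repeat(q`, n, 1)) &
           (repeat(p`, n, 1) <= repeat(R, 1, m));

   /* Self-consistency (EM) iteration, starting from equal masses */
   theta = j(m, 1, 1/m);
   do iter = 1 to 10000 until(maxdiff < 1e-8);
      denom    = alpha * theta;                       /* probability of each observation */
      newtheta = theta # (alpha` * (1 / denom)) / n;  /* EM update                       */
      maxdiff  = max(abs(newtheta - theta));
      theta    = newtheta;
   end;

   /* Survival probability just after each interval [q_j, p_j] */
   S = 1 - cusum(theta);
   p = choose(p = big, ., p);   /* show infinite right endpoints as missing */
   print q p theta S;

   create turnbull var {q p theta S};
   append;
   close turnbull;
quit;
```

The estimated survival curve is determined only between the intervals [q_j,p_j]. Within each interval, the NPMLE does not say how the mass Θ_j is distributed.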

While the %ICE and %EMICM macros both estimate the survival curve for interval-censored data, the %EMICM macro is recommended because of the increased efficiency of its optimization algorithm as described above. In addition, the %EMICM macro can create a plot with overlaid survival curves for multiple treatment groups.

Comparison of survival curves

Zhao and Sun (2004) generalized the log-rank test of Sun (1996) to interval-censored data that also include exact failure times, and they used an imputation approach to compute the variance of this generalized log-rank statistic. Sun, Zhao, and Zhao (2005) proposed a new class of k-sample tests for interval-censored data. Both tests are available in the %ICSTEST macro.

Macro updates

The %EMICM and %ICSTEST macros attempt to check for a later version of themselves. If a macro cannot do this (for example, when no internet connection is available), it issues a message like the following:

   EMICM: Unable to check for newer version

The computations performed by the macro are not affected by the appearance of this message.

SEE ALSO:
See the XMACRO utility macros.
REFERENCES:
So, Y., Johnston, G., and Kim, S.H. (2010), "Analyzing Interval-Censored Survival Data with SAS® Software," Proceedings of the SAS® Global Forum 2010 Conference, Cary, NC: SAS Institute Inc.

For additional references, see the References section of So, Johnston, and Kim (2010). A few errors appear in the paper and are presented below:

Errata for So, Johnston, and Kim (2010)

In the call to the %ICSTEST macro, the comma after Therapy should be omitted as follows:

      ***************************************************
      *          Generalized Logrank Test I             *
      ***************************************************;
      %ICSTEST(data=BreastCancer,
               left=lTime,
               right=rTime,
               group=Therapy
               );

The results shown in the paper as "Figure 4: Generalized Log-Rank Test I for the Breast Cancer Data" are incorrect for the data as presented in the paper. Correct results for these data are computed by the macro.

The paper shows the %ICE macro producing the generalized log-rank test of Sun, Zhao, and Zhao (2005). This test is now provided, along with the test of Zhao and Sun (2004), by the %ICSTEST macro.




These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.