### Recurrence Data from Repairable Systems

Failures in a system that can be repaired are sometimes modeled as recurrence data, also called recurrent events data. When a repairable system fails, it is repaired and placed back in service. As a repairable system ages, it accumulates a history of repairs and repair costs. The mean cumulative function (MCF) is defined as the population mean of the cumulative number (or cost) of repairs up to time t. You can use the RELIABILITY procedure to compute and plot nonparametric estimates of the MCF for the number of repairs or the cost of repairs. The Nelson (1995) confidence limits for the MCF are also computed and plotted. You can also compute and plot estimates of the difference of two MCFs, with confidence intervals, which is useful for comparing the repair performance of two systems.

See Nelson (2002, 1995, 1988), Doganaksoy and Nelson (1998), and Nelson and Doganaksoy (1989) for discussions and examples of nonparametric analysis of recurrence data.

You can also fit a parametric model for recurrent event data and display the resulting model on a plot, along with nonparametric estimates of the MCF.

See Rigdon and Basu (2000), Tobias and Trindade (1995), and Meeker and Escobar (1998) for discussions of parametric models for recurrent events data.

#### Nonparametric Analysis

##### Recurrent Events Data with Exact Ages

See the section Analysis of Recurrence Data on Repairs and the section Comparison of Two Samples of Repair Data for examples of the analysis of recurrence data with exact ages.

Formulas for the MCF estimator $\hat{M}(t)$ and the variance of the estimator $\mathrm{Var}(\hat{M}(t))$ are given in Nelson (1995). Table 16.70 shows a set of artificial repair data from Nelson (1988). For each system, the data consist of the system age and cost for each repair. If you want to compute the MCF for the number of repairs, rather than the cost of repairs, set the cost equal to 1 for each repair. A plus sign (+) in place of a cost indicates that the age is a censoring time. The repair history of each system ends with a censoring time.

Table 16.70: System Repair Histories for Artificial Data

| Unit | (Age in Months, Cost in \$100) |
|------|--------------------------------|
| 6 | (5,\$3) (12,\$1) (12,+) |
| 5 | (16,+) |
| 4 | (2,\$1) (8,\$1) (16,\$2) (20,+) |
| 3 | (18,\$3) (29,+) |
| 2 | (8,\$2) (14,\$1) (26,\$1) (33,+) |
| 1 | (19,\$2) (39,\$2) (42,+) |

Table 16.71 illustrates the calculation of the MCF estimate from the data in Table 16.70. The RELIABILITY procedure uses the following rules for computing the MCF estimates.

1. Order all events (repairs and censoring) by age from smallest to largest.

• If the event ages of the same or different systems are equal, the corresponding data are sorted from the largest repair cost to the smallest. Censoring events always sort as smaller than repair events with equal ages.

• When event ages and values of more than one system coincide, the corresponding data are sorted from the largest system identifier to the smallest. The system IDs can be numeric or character, but they are always sorted in ASCII order.

2. Compute the number of systems I in service at the current age as the number in service at the last repair time minus the number of censored units in the intervening times.

3. For each repair, compute the mean cost as the cost of the current repair divided by the number in service I.

4. Compute the MCF for each repair as the previous MCF plus the mean cost for the current repair.

Table 16.71: Calculation of MCF for Artificial Data

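| Event | (Age,Cost) | Number I in Service | Mean Cost | MCF |
|-------|------------|---------------------|-----------|-----|
| 1 | (2,\$1) | 6 | \$1/6=0.17 | 0.17 |
| 2 | (5,\$3) | 6 | \$3/6=0.50 | 0.67 |
| 3 | (8,\$2) | 6 | \$2/6=0.33 | 1.00 |
| 4 | (8,\$1) | 6 | \$1/6=0.17 | 1.17 |
| 5 | (12,\$1) | 6 | \$1/6=0.17 | 1.33 |
| 6 | (12,+) | 5 | | |
| 7 | (14,\$1) | 5 | \$1/5=0.20 | 1.53 |
| 8 | (16,\$2) | 5 | \$2/5=0.40 | 1.93 |
| 9 | (16,+) | 4 | | |
| 10 | (18,\$3) | 4 | \$3/4=0.75 | 2.68 |
| 11 | (19,\$2) | 4 | \$2/4=0.50 | 3.18 |
| 12 | (20,+) | 3 | | |
| 13 | (26,\$1) | 3 | \$1/3=0.33 | 3.52 |
| 14 | (29,+) | 2 | | |
| 15 | (33,+) | 1 | | |
| 16 | (39,\$2) | 1 | \$2/1=2.00 | 5.52 |
| 17 | (42,+) | 0 | | |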
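The sorting and accumulation rules above can be traced in a few lines of code. The following Python sketch is an independent illustration, not the RELIABILITY procedure's implementation; the names `HISTORIES` and `mcf_table` are hypothetical, and censoring times are encoded as a cost of `None`. It uses the repair histories from Table 16.70.

```python
# Repair histories from Table 16.70: unit -> list of (age, cost).
# A cost of None marks a censoring time (the "+" entries).
HISTORIES = {
    "6": [(5, 3), (12, 1), (12, None)],
    "5": [(16, None)],
    "4": [(2, 1), (8, 1), (16, 2), (20, None)],
    "3": [(18, 3), (29, None)],
    "2": [(8, 2), (14, 1), (26, 1), (33, None)],
    "1": [(19, 2), (39, 2), (42, None)],
}


def mcf_table(histories):
    """Return rows of (age, cost, number in service, mean cost, MCF)."""
    events = [(age, cost, unit)
              for unit, history in histories.items()
              for age, cost in history]
    # Tie-break on system ID first (largest to smallest), then rely on the
    # stable sort to order by age, with repairs (largest cost first) placed
    # ahead of censoring times at the same age.
    events.sort(key=lambda e: e[2], reverse=True)
    events.sort(key=lambda e: (e[0], e[1] is None, -(e[1] or 0)))

    in_service = len(histories)
    mcf = 0.0
    rows = []
    for age, cost, _unit in events:
        if cost is None:
            in_service -= 1  # a censored unit leaves service
            rows.append((age, None, in_service, None, None))
        else:
            mean_cost = cost / in_service
            mcf += mean_cost
            rows.append((age, cost, in_service, mean_cost, mcf))
    return rows


rows = mcf_table(HISTORIES)
for number, (age, cost, in_service, mean_cost, mcf) in enumerate(rows, start=1):
    if cost is None:
        print(f"{number:2d} ({age},+) {in_service}")
    else:
        print(f"{number:2d} ({age},${cost}) {in_service} {mean_cost:.2f} {mcf:.2f}")
```

Decrementing the in-service count at each censoring time is equivalent to rule 2, because every censored unit between two repairs is subtracted before the next repair is processed.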

If you specify the VARIANCE=NELSON option, the variance of the estimator of the MCF, $\mathrm{Var}(\hat{M}(t))$, is computed as in Nelson (1995). If the VARIANCE=LAWLESS or VARMETHOD2 option is specified, the method of Lawless and Nadeau (1995) is used to compute the variance of the estimator of the MCF. This method is recommended if the number of systems or events is large or if a FREQ statement is used to specify a frequency variable. If you do not specify a variance computation method, the method of Lawless and Nadeau (1995) is used.

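Default approximate two-sided $100(1-\alpha)\%$ pointwise confidence limits for $M(t)$ are computed as

$$M_L(t) = \hat{M}(t) - z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat{M}(t))}$$

$$M_U(t) = \hat{M}(t) + z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat{M}(t))}$$

where $z_{1-\alpha/2}$ represents the $100(1-\alpha/2)$ percentile of the standard normal distribution.

If you specify the LOGINTERVALS option in the MCFPLOT statement, alternative confidence intervals based on the asymptotic normality of $\log(\hat{M}(t))$, rather than of $\hat{M}(t)$, are computed. Let

$$w = \exp\left[\frac{z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat{M}(t))}}{\hat{M}(t)}\right]$$

Then the limits are computed as

$$M_L(t) = \frac{\hat{M}(t)}{w}, \qquad M_U(t) = \hat{M}(t) \times w$$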

These alternative limits are always positive, and can provide better coverage than the default limits when the MCF is known to be positive, such as for counts or for positive costs. They are not appropriate for MCF differences, and are not computed in this case.
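The difference between the two kinds of limits is easiest to see numerically. The following Python sketch assumes the usual normal-approximation form $\hat{M}(t) \pm z\sqrt{\mathrm{Var}(\hat{M}(t))}$ for the default limits and the log-transformed form $\hat{M}(t)/w$ and $\hat{M}(t) \times w$, with $w = \exp[z\sqrt{\mathrm{Var}(\hat{M}(t))}/\hat{M}(t)]$, for the alternative limits. The function name `mcf_limits` and the input values are hypothetical; the variance in particular is an invented number, not output from the procedure.

```python
import math
from statistics import NormalDist


def mcf_limits(m_hat, var, conf=0.95, log_intervals=False):
    """Pointwise two-sided limits for an MCF estimate with known variance."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # standard normal percentile
    se = math.sqrt(var)
    if log_intervals:
        # Limits based on approximate normality of log(m_hat); always positive.
        w = math.exp(z * se / m_hat)
        return m_hat / w, m_hat * w
    # Limits based on approximate normality of m_hat; can go below zero.
    return m_hat - z * se, m_hat + z * se


# Hypothetical inputs: an MCF estimate of 0.67 with an assumed variance of 0.16.
normal_lower, normal_upper = mcf_limits(0.67, 0.16)
log_lower, log_upper = mcf_limits(0.67, 0.16, log_intervals=True)
print(normal_lower, normal_upper)
print(log_lower, log_upper)
```

With these inputs the default lower limit is negative, while the log-based lower limit stays positive, which is the behavior described above for counts and positive costs.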

The following SAS statements create the tabular output shown in Figure 16.58 and the plot shown in Figure 16.59:

```sas
data Art;
datalines;
eq    . 1.5
;
```

##### Recurrent Events Data with Exact Event Ages

Let there be $m$ independent systems observed, and let $t_{i1} \le t_{i2} \le \cdots \le t_{in_i}$ be the times of observed events for system $i$. Let the last time of observation of system $i$ be $T_i$, with $t_{in_i} \le T_i$.

If there are no regression parameters in the model, or there are regression parameters and they are constant for each system, then the log-likelihood function is

$$LL = \sum_{i=1}^{m} \left\{ \sum_{j=1}^{n_i} \log \lambda(t_{ij}) - M(T_i) \right\}$$

where $M(t)$ is the mean function and $\lambda(t) = dM(t)/dt$ is the corresponding intensity function.

If there are regression parameters that can change over time for individual systems, the RELIABILITY procedure uses the convention that a covariate value specified at a given event time takes effect immediately after the event time; that is, the value of a covariate used at an event time is the value specified at the previous event time. The value used at the first event time is the value specified at that event time. You can establish a different value for the first event time by specifying a zero cost event previous to the first actual event. The zero cost event is not used in the analysis, but it is used to establish a covariate value for the next event time. The covariate value used at the end time is the value established at the last event time.

With these conventions, the log likelihood is

$$LL = \sum_{i=1}^{m} \left\{ \sum_{j=1}^{n_i} \log \lambda(t_{ij}) - \sum_{j=1}^{n_i+1} \left[ M(t_{ij}) - M(t_{i,j-1}) \right] \right\}$$

with $t_{i0} = 0$ and $t_{i,n_i+1} = T_i$ for each $i = 1, \ldots, m$. Each term is evaluated with the covariate values in effect for the corresponding interval under the conventions above. Note that this log likelihood reduces to the previous log likelihood if covariate values do not change over time for each system.
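As an illustration of the case without time-varying covariates, the following Python sketch evaluates the standard log likelihood of a nonhomogeneous Poisson process with exact event times: the sum of the log intensities at the event times minus the mean function at each system's end of observation. The power law form $M(t) = (t/\eta)^\beta$ and every function name here are assumptions made for the example, not the procedure's parameterization or code. The event and end times are taken from Table 16.70.

```python
import math


def power_law_mean(t, beta, eta):
    """Assumed mean function M(t) = (t / eta) ** beta."""
    return (t / eta) ** beta


def power_law_intensity(t, beta, eta):
    """Derivative of the assumed mean function with respect to t."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def loglik_exact(systems, beta, eta):
    """Sum over systems of log intensities at events minus M(end time)."""
    total = 0.0
    for event_times, end_time in systems:
        total += sum(math.log(power_law_intensity(t, beta, eta))
                     for t in event_times)
        total -= power_law_mean(end_time, beta, eta)
    return total


# (event ages, end-of-observation age) for each system in Table 16.70.
SYSTEMS = [
    ([5, 12], 12),
    ([], 16),
    ([2, 8, 16], 20),
    ([18], 29),
    ([8, 14, 26], 33),
    ([19, 39], 42),
]

print(loglik_exact(SYSTEMS, beta=1.0, eta=10.0))
print(loglik_exact(SYSTEMS, beta=1.2, eta=10.0))
```

Setting `beta=1.0` gives a constant intensity of `1 / eta`, so the first value can be checked by hand as the number of events times `log(1 / eta)` minus the total observation time divided by `eta`.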

In order to specify a parametric model for recurrence data with exact event times, you specify the event times, end of observation times, and regression model, if any, with a MODEL statement, as described in the section MODEL Statement. In addition, you specify a variable that uniquely identifies each system with a UNITID statement. See the section Parametric Model for Recurrent Events Data for an example of fitting a parametric recurrent events model to data with exact recurrence times.

##### Recurrent Events Data with Interval Event Ages

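If $n$ independent and statistically identical systems are observed in the time interval $(t_a, t_b]$, then the number $r$ of events that occur in the interval is a Poisson random variable with mean $n[M(t_b) - M(t_a)]$, where $M(t)$ is the cumulative mean function for an individual system.

Let $(t_0, t_1], (t_1, t_2], \ldots, (t_{k-1}, t_k]$ be nonoverlapping time intervals for which $r_i$ events are observed among the $n_i$ systems observed in time interval $(t_{i-1}, t_i]$. The parameters in the mean function $M(t)$ are estimated by maximizing the log likelihood

$$LL = \sum_{i=1}^{k} \left\{ r_i \log(n_i) + r_i \log\left[ M(t_i) - M(t_{i-1}) \right] - n_i \left[ M(t_i) - M(t_{i-1}) \right] - \log(r_i!) \right\}$$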

The time intervals do not have to be of the same length, and they do not have to be adjacent, although the preceding formula shows them as adjacent.
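Because each interval count is Poisson, the interval log likelihood is a sum of Poisson log probabilities with means equal to the number of systems times the increment of the mean function. The following Python sketch shows that calculation under stated assumptions: `loglik_interval` is a hypothetical name, the interval data are invented for illustration, and the linear mean function is chosen only to keep the arithmetic easy to check.

```python
import math


def loglik_interval(intervals, mean_fn):
    """Poisson log likelihood for interval recurrence data.

    intervals: list of (start, end, number of systems, number of events).
    mean_fn:   cumulative mean function M(t) for a single system.
    """
    total = 0.0
    for start, end, n_systems, n_events in intervals:
        # Expected number of events in the interval across all systems.
        mu = n_systems * (mean_fn(end) - mean_fn(start))
        # log of the Poisson probability: r*log(mu) - mu - log(r!)
        total += n_events * math.log(mu) - mu - math.lgamma(n_events + 1)
    return total


# Hypothetical data: intervals need not be adjacent or of equal length.
INTERVALS = [
    (0, 10, 5, 3),
    (10, 15, 5, 1),
    (20, 40, 4, 6),
]


def linear_mean(t):
    """Assumed mean function with a constant rate of 0.1 events per unit time."""
    return 0.1 * t


print(loglik_interval(INTERVALS, linear_mean))
```

In practice the mean function would depend on parameters, and the log likelihood would be maximized over them with a numerical optimizer.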

If you have data from groups of systems to which you are fitting a regression model (for example, to model the effects of different manufacturing lines or different vendors), the time intervals in the different groups do not have to coincide. The only requirement is that the data in the different groups be independent; for example, you cannot have data from the same systems in two different groups.

In order to specify a parametric model for interval recurrence data, you specify the time intervals and regression model, if any, with a MODEL statement, as described in the section MODEL Statement. In addition, you specify a variable that contains the number of systems under observation in time interval i with an NENTER statement, and the number of events observed with a FREQ statement. See the section Parametric Model for Interval Recurrent Events Data for an example of fitting a parametric recurrent events model to data with interval recurrence times.

#### Duane Plots

A Duane plot is defined as a graph of the quantity $H(t) = M(t)/t$ versus $t$, where $M(t)$ is the MCF. The graph axes are usually both on the log scale, so that if $M(t)$ is the power law type in Table 16.72, a linear graph is produced. Duane plots are traditionally used as a visual assessment of the goodness of fit of a power law model. You should exercise caution in using Duane plots, because even if the underlying model is a power law process, a nonlinear Duane plot can result. See Rigdon and Basu (2000, section 4.1.1) for a discussion of Duane plots.

You can create a Duane plot by specifying the DUANE option in the MCFPLOT statement. A scatter plot of nonparametric estimates of $\hat{H}(t_i) = \hat{M}(t_i)/t_i$ versus $t_i$ is created on a log-log scale, where $\hat{M}(t_i)$ are the nonparametric estimates of the MCF that are described in the section Nonparametric Analysis. If you specify a parametric model with the FIT=MODEL option in the MCFPLOT statement, the corresponding parametric estimate of $H(t)$ is plotted on the same graph as the scatter plot.
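The linearity claim can be checked numerically. The following Python sketch assumes the usual Duane definition, in which the MCF divided by age is plotted against age on log-log axes, and an assumed power law mean function $M(t) = (t/\eta)^\beta$. Under those assumptions the log-log points fall on a line with slope $\beta - 1$. The function names are hypothetical, and the sketch computes plot coordinates only; it does not draw a graph or reproduce the procedure's output.

```python
import math


def duane_points(ages, mcf_values):
    """Return (log age, log(MCF / age)) pairs, the log-log plot coordinates."""
    return [(math.log(t), math.log(m / t)) for t, m in zip(ages, mcf_values)]


def least_squares_slope(points):
    """Ordinary least squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Exact power law MCF with assumed parameters beta = 1.5 and eta = 3.
ages = [1, 2, 5, 10, 20]
power_law_mcf = [(t / 3.0) ** 1.5 for t in ages]
slope = least_squares_slope(duane_points(ages, power_law_mcf))
print(slope)  # close to beta - 1 = 0.5

# Nonparametric estimates at the repair ages in Table 16.71.
repair_ages = [2, 5, 8, 8, 12, 14, 16, 18, 19, 26, 39]
mcf_estimates = [0.17, 0.67, 1.00, 1.17, 1.33, 1.53, 1.93, 2.68, 3.18, 3.52, 5.52]
scatter = duane_points(repair_ages, mcf_estimates)
```

The `scatter` list holds the coordinates that a Duane scatter plot of the Table 16.71 estimates would display; with so few systems, departures from a straight line are weak evidence either way.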