In fitting a Cox model, the phenomenon of monotone likelihood is observed if the likelihood converges to a finite value while at least one parameter diverges (Heinze and Schemper, 2001).
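As a small illustration under assumed data (a hypothetical example, not taken from the cited papers): suppose there are six subjects with a single binary covariate $x$ and no tied times, the three earliest times are events from subjects with $x = 1$, and the three subjects with $x = 0$ are censored later. With one regression parameter $\beta$, the partial likelihood is then
$$
L(\beta) = \frac{e^{\beta}}{3e^{\beta}+3} \cdot \frac{e^{\beta}}{2e^{\beta}+3} \cdot \frac{e^{\beta}}{e^{\beta}+3} \;\longrightarrow\; \frac{1}{6} \quad \text{as } \beta \to \infty
$$
Each factor is strictly increasing in $\beta$, so $L(\beta)$ increases toward the finite limit $1/6$ without attaining a maximum at any finite $\beta$.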
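Let $x_l(t)$ denote the vector of explanatory variables for the $l$th individual at time $t$. Let $t_1 < t_2 < \cdots < t_k$ denote the $k$ distinct, ordered event times. Let $d_j$ denote the multiplicity of failures at $t_j$; that is, $d_j$ is the size of the set $D_j$ of individuals that fail at $t_j$. Let $R_j$ denote the risk set just before $t_j$. Let $\beta = (\beta_1, \ldots, \beta_p)'$ be the vector of regression parameters. The Breslow log partial likelihood is given by
$$
l(\beta) = \log L(\beta) = \sum_{j=1}^{k} \left\{ \beta' \sum_{l \in D_j} x_l(t_j) - d_j \log \left[ \sum_{h \in R_j} e^{\beta' x_h(t_j)} \right] \right\}
$$
Denote
$$
S_j^{(a)}(\beta) = \sum_{h \in R_j} e^{\beta' x_h(t_j)} \left[ x_h(t_j) \right]^{\otimes a}, \quad a = 0, 1, 2, 3
$$
where $x^{\otimes 0} = 1$, $x^{\otimes 1} = x$, $x^{\otimes 2} = x x'$, and $x^{\otimes 3}$ is the corresponding three-way outer product.
Then the score function is given by
$$
U(\beta) \equiv (U(\beta_1), \ldots, U(\beta_p))' = \frac{\partial l(\beta)}{\partial \beta} = \sum_{j=1}^{k} \left\{ \sum_{l \in D_j} x_l(t_j) - d_j \frac{S_j^{(1)}(\beta)}{S_j^{(0)}(\beta)} \right\}
$$
and the Fisher information matrix is given by
$$
I(\beta) = -\frac{\partial^2 l(\beta)}{\partial \beta^2} = \sum_{j=1}^{k} d_j \left\{ \frac{S_j^{(2)}(\beta)}{S_j^{(0)}(\beta)} - \left[ \frac{S_j^{(1)}(\beta)}{S_j^{(0)}(\beta)} \right] \left[ \frac{S_j^{(1)}(\beta)}{S_j^{(0)}(\beta)} \right]' \right\}
$$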
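
The Breslow log partial likelihood, the score function, and the Fisher information matrix can be sketched in NumPy as follows. This is a minimal illustration, not the procedure's implementation: it assumes time-fixed covariates, right-censored data, and a risk set consisting of all subjects whose observed time is at least the event time. The function name `breslow_loglik_score_info` and its arguments are hypothetical.

```python
import numpy as np

def breslow_loglik_score_info(beta, times, events, X):
    """Breslow log partial likelihood, score vector, and information matrix.

    Assumptions (illustrative only): covariates are time-fixed, `events`
    is 1 for a failure and 0 for censoring, and the risk set at an event
    time t contains every subject with observed time >= t.
    """
    p = X.shape[1]
    loglik, U, I = 0.0, np.zeros(p), np.zeros((p, p))
    for t in np.unique(times[events == 1]):      # distinct event times
        risk = times >= t                        # risk set just before t
        fail = (times == t) & (events == 1)      # subjects failing at t
        d = int(fail.sum())                      # multiplicity of failures
        Xr = X[risk]
        w = np.exp(Xr @ beta)                    # exp(beta' x_h) over the risk set
        S0 = w.sum()                             # a = 0 sum
        S1 = w @ Xr                              # a = 1 sum (vector)
        S2 = (Xr * w[:, None]).T @ Xr            # a = 2 sum (matrix)
        sum_x = X[fail].sum(axis=0)              # covariate sum over failures
        loglik += sum_x @ beta - d * np.log(S0)
        U += sum_x - d * S1 / S0
        I += d * (S2 / S0 - np.outer(S1, S1) / S0**2)
    return loglik, U, I
```

A quick sanity check is to compare the returned score with a finite-difference gradient of the returned log partial likelihood; the two should agree up to numerical error.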
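Heinze (1999) and Heinze and Schemper (2001) applied the idea of Firth (1993) by maximizing the penalized partial likelihood
$$
l^*(\beta) = l(\beta) + 0.5 \log \left( |I(\beta)| \right)
$$
where $l(\beta)$ is the Breslow log partial likelihood and $I(\beta)$ is the Fisher information matrix.
The score function $U(\beta)$ is replaced by the modified score function $U^*(\beta) \equiv (U^*(\beta_1), \ldots, U^*(\beta_p))'$, where
$$
U^*(\beta_r) = U(\beta_r) + 0.5 \, \mathrm{tr} \left\{ I^{-1}(\beta) \frac{\partial I(\beta)}{\partial \beta_r} \right\}, \quad r = 1, \ldots, p
$$
The Firth estimate is obtained iteratively as
$$
\beta^{(s+1)} = \beta^{(s)} + I^{-1}(\beta^{(s)}) \, U^*(\beta^{(s)})
$$
The covariance matrix $\hat{V}$ is computed as $I^{-1}(\hat{\beta})$, where $\hat{\beta}$ is the maximum penalized partial likelihood estimate.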
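
The iteration and the covariance computation can be sketched in NumPy as below. This is a hedged illustration rather than the procedure's algorithm: it assumes time-fixed covariates, starts from a zero vector, and omits the step-halving and step-size limits that a production implementation would likely need. The names `breslow_parts` and `firth_cox` are hypothetical. The derivative of the information matrix with respect to each parameter is computed as a weighted third central moment of the covariates over the risk set, which is algebraically equivalent to differentiating the ratio form of the information matrix.

```python
import numpy as np

def breslow_parts(beta, times, events, X):
    """Log partial likelihood, score U, information I, and dI[:, :, r] = dI/dbeta_r."""
    p = X.shape[1]
    loglik, U = 0.0, np.zeros(p)
    I, dI = np.zeros((p, p)), np.zeros((p, p, p))
    for t in np.unique(times[events == 1]):      # distinct event times
        risk = times >= t                        # risk set (assumed: time >= t)
        fail = (times == t) & (events == 1)      # failures at t
        d = int(fail.sum())
        eta = X[risk] @ beta
        e = np.exp(eta - eta.max())              # shifted for numerical stability
        log_S0 = eta.max() + np.log(e.sum())
        w = e / e.sum()                          # normalized risk-set weights
        mean = w @ X[risk]                       # weighted covariate mean
        Z = X[risk] - mean                       # centered covariates
        sum_x = X[fail].sum(axis=0)
        loglik += sum_x @ beta - d * log_S0
        U += sum_x - d * mean
        I += d * np.einsum("h,ha,hb->ab", w, Z, Z)           # weighted covariance
        dI += d * np.einsum("h,ha,hb,hr->abr", w, Z, Z, Z)   # third central moment
    return loglik, U, I, dI

def firth_cox(times, events, X, max_iter=200, tol=1e-8):
    """Iterate beta <- beta + I^{-1} U* from zero; return (beta_hat, I^{-1}(beta_hat))."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        _, U, I, dI = breslow_parts(beta, times, events, X)
        I_inv = np.linalg.inv(I)
        # Modified score: U*_r = U_r + 0.5 * tr(I^{-1} dI/dbeta_r)
        U_star = U + 0.5 * np.einsum("ab,bar->r", I_inv, dI)
        step = I_inv @ U_star
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, _, I, _ = breslow_parts(beta, times, events, X)
    return beta, np.linalg.inv(I)

# Assumed toy data with monotone likelihood: only subjects with x = 1 fail.
times = np.array([1., 2., 3., 4., 5., 6.])
events = np.array([1, 1, 1, 0, 0, 0])
X = np.array([[1.], [1.], [1.], [0.], [0.], [0.]])
beta_hat, V_hat = firth_cox(times, events, X)
print(beta_hat, V_hat)
```

On this toy data the unpenalized estimate diverges, whereas the penalized iteration settles on a finite value. Because the update uses $I^{-1}$ rather than the exact Hessian of the penalized log partial likelihood, convergence is typically linear rather than quadratic.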