The do file with all the code from this tutorial is available to download here

An area of intense research in recent years is in the field of joint frailty survival models, for the analysis of joint recurrent event and terminal event data. Here I’ll describe and implement the two most popular approaches, originally proposed by Liu et al. (2004) and Mazroui et al. (2012). We have a survival model for the recurrent event process, and a survival model for the terminal event process, linked through shared random effects.

I’ll illustrate these models using the `readmission`

dataset that comes with the extensive `frailtypack`

in `R`

(Król et al. (2017)), developed by Virginie Rondeau and her group.
The dataset has information on rehospitalisation times after surgery in patients diagnosed with colorectal cancer.
Covariates of interest include gender, Dukes’ tumour stage and comorbidity Charlson index. I’ll load the dataset
and generate binary indicator variables for use later.

```
. use "https://www.mjcrowther.co.uk/data/jointfrailty_example",clear
. tab sex, gen(gender)
sex | Freq. Percent Cum.
------------+-----------------------------------
Female | 312 36.24 36.24
Male | 549 63.76 100.00
------------+-----------------------------------
Total | 861 100.00
. tab dukes, gen(dk)
dukes | Freq. Percent Cum.
------------+-----------------------------------
A-B | 324 37.63 37.63
C | 331 38.44 76.07
D | 206 23.93 100.00
------------+-----------------------------------
Total | 861 100.00
. tab charlson, gen(ch)
charlson | Freq. Percent Cum.
------------+-----------------------------------
0 | 577 67.02 67.02
1-2 | 46 5.34 72.36
3 | 238 27.64 100.00
------------+-----------------------------------
Total | 861 100.00
```

The time of rehospitalisation is stored in `time`

, with corresponding event indicator `event`

. This is in clock-reset
formulation, i.e. each time a patient has a cancer recurrence the clock is reset to zero, so it’s a semi-Markov model.
Our overall survival time is stored in `stime`

, with corresponding event indicator stored in `death`

.

In the first specification, we have one random effect which accounts for the correlation between recurrent events, and which is then included in the linear predictor for the terminal event model, with an accompanying estimated coefficient, which estimates the strength of the relationship between the two processes. So for example,

$$ h_{ij}(t) = h_{0}(t) \exp (X_{1ij} \beta_{1} + b_{i}) $$ and $$ \lambda_{i}(u) = \lambda_{0}(u) \exp (X_{1i} \beta_{2} + \alpha b_{i}) $$

where $h_{ij}(t)$ is the hazard function for the $j$th event of the $i$th patient, $\lambda_{i}(u)$ is the hazard function for the terminal event, and $b_{i} \sim N(0,\sigma^2)$.

We can fit such a model with `merlin`

, adjusting for all available covariates in each outcome model,

```
. merlin (time gender1 dk2 dk3 ch2 ch3 M1[id]@1 ///
> , family(rp, df(3) failure(event))) ///
> (stime gender1 dk2 dk3 ch2 ch3 M1[id] ///
> , family(rp, df(1) failure(death)))
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_1
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -4188.9255
Iteration 1: log likelihood = -4175.3708 (not concave)
Iteration 2: log likelihood = -4148.1729
Iteration 3: log likelihood = -4143.6946
Iteration 4: log likelihood = -4143.6239
Iteration 5: log likelihood = -4143.6239
Mixed effects regression model Number of obs = 861
Log likelihood = -4143.6239
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time: |
gender1 | -.4603478 .1407838 -3.27 0.001 -.7362791 -.1844166
dk2 | .4080402 .157101 2.60 0.009 .100128 .7159525
dk3 | 1.395737 .2068641 6.75 0.000 .9902903 1.801183
ch2 | .3244107 .2581746 1.26 0.209 -.1816021 .8304235
ch3 | .3478034 .1353953 2.57 0.010 .0824335 .6131732
M1[id] | 1 . . . . .
_cons | -2.126217 .1482472 -14.34 0.000 -2.416776 -1.835657
-------------+----------------------------------------------------------------
stime: |
gender1 | -.3533375 .2254812 -1.57 0.117 -.7952725 .0885975
dk2 | .9249671 .3356416 2.76 0.006 .2671216 1.582813
dk3 | 2.96099 .4042653 7.32 0.000 2.168644 3.753335
ch2 | .7341089 .6293614 1.17 0.243 -.4994168 1.967635
ch3 | 1.058569 .2494584 4.24 0.000 .5696393 1.547498
M1[id] | .9107484 .2383294 3.82 0.000 .4436314 1.377865
_cons | -3.24061 .3218232 -10.07 0.000 -3.871372 -2.609849
-------------+----------------------------------------------------------------
id: |
sd(M1) | .8002135 .082537 .6537471 .9794943
------------------------------------------------------------------------------
Warning: Baseline spline coefficients not shown - use ml display
```

We now have two model specifications, each of which can be completely different. I’ve used Royston-Parmar survival models. We define a random effect with the same name in each linear predictor, which means only one random effect has been defined, but it is included in both outcome models. In the first model, I’ve simply constrained the coefficient of the random effect to be 1. This provides a highly convenient way of linking random effects between outcome models. Our results show a highly positive estimate of associtaion of 0.910 (95% CI: 0.444, 1.378) showing substantial association between the two event processes, i.e. a higher rate of rehospitalisation also increases the mortality rate.

The second approach consists of two random effects included in the recurrent event model, one of which is also included in the linear predictor of the terminal event model, as follows,

$$ h_{ij}(t) = h_{0}(t) \exp (X_{1ij} \beta_{1} + b_{1i} + b_{2i}) $$ and $$ \lambda_{i}(u) = \lambda_{0}(u) \exp (X_{1i} \beta_{2} + b_{2i}) $$

where $b_{1i} \sim N(0,\sigma_{1}^2)$ and $b_{2i} \sim N(0,\sigma_{2}^2)$. Our $\sigma_{2}$ parameter then
indicates the association between the two processes. We can fit this model with `merlin`

, this time illustrating
how to use different distributions for the recurrent event and terminal event processes,

```
. merlin (time gender1 dk2 dk3 ch2 ch3 M1[id]@1 M2[id]@1 ///
> , family(rp, df(3) failure(event))) ///
> (stime gender1 dk2 dk3 ch2 ch3 M2[id]@1 ///
> , family(rp, df(1) failure(death)))
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_1
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -4211.9343
Iteration 1: log likelihood = -4149.7324
Iteration 2: log likelihood = -4142.7635
Iteration 3: log likelihood = -4142.5391
Iteration 4: log likelihood = -4142.5347
Iteration 5: log likelihood = -4142.5346
Mixed effects regression model Number of obs = 861
Log likelihood = -4142.5346
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time: |
gender1 | -.461712 .1439655 -3.21 0.001 -.7438791 -.1795448
dk2 | .4038239 .1597776 2.53 0.011 .0906655 .7169823
dk3 | 1.370284 .2084699 6.57 0.000 .9616907 1.778878
ch2 | .3259101 .2628604 1.24 0.215 -.1892868 .8411069
ch3 | .3437384 .1366295 2.52 0.012 .0759494 .6115273
M1[id] | 1 . . . . .
M2[id] | 1 . . . . .
_cons | -2.153936 .1511434 -14.25 0.000 -2.450172 -1.857701
-------------+----------------------------------------------------------------
stime: |
gender1 | -.3586172 .228271 -1.57 0.116 -.8060201 .0887856
dk2 | .9218063 .3362933 2.74 0.006 .2626836 1.580929
dk3 | 2.992598 .371162 8.06 0.000 2.265133 3.720062
ch2 | .7382451 .6340333 1.16 0.244 -.5044374 1.980928
ch3 | 1.130689 .2543359 4.45 0.000 .6321992 1.629178
M2[id] | 1 . . . . .
_cons | -3.267832 .2988515 -10.93 0.000 -3.85357 -2.682094
-------------+----------------------------------------------------------------
id: |
sd(M1) | .3860378 .1480128 .1820816 .8184524
sd(M2) | .7341392 .0978276 .5653946 .9532463
------------------------------------------------------------------------------
Warning: Baseline spline coefficients not shown - use ml display
```

This defines two random effects at the `id`

level, which by default have an independent covariance structure.
Our estimate of $\sigma_{1}$ is 0.386 (95% CI: 0.182, 0.818),
and similarly we obtain an estimate for $\sigma_{2}$ of 0.734 (95% CI: 0.565, 0.953).

From both a theoretical and interpretation perspective, I prefer the first of these two joint frailty models. If there was no association, then $\sigma_{1} = 0$, which is at the boundary of its parameter space, and hence unlikely to converge, compared to $\alpha = 0$ from model 1.

Extensions including higher levels, time-dependent effects, non-linear covariate effects and the relative survival extension can be included very simply.

This example used `merlin`

version 2.0.2.