Survival Analysis

Survival analysis

So far, have seen:
- response variable counted or measured (regression)
- response variable categorized (logistic regression)
But what if response is time until event (eg. time of survival after surgery)?
Additional complication: event might not have happened at end of study (eg. patient still alive). But knowing that patient has “not died yet” presumably informative. Such data called censored.

… continued

Enter survival analysis, in particular the “Cox proportional hazards model”.
Explanatory variables in this context often called covariates.

Packages

Install package survival if not done. Also use broom and marginaleffects from earlier.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(survival)
library(broom)
library(marginaleffects)

Example: still dancing?

12 women who have just started taking dancing lessons are followed for up to a year, to see whether they are still taking dancing lessons, or have quit. The “event” here is “quit”.
This might depend on:
- a treatment (visit to a dance competition)
- woman’s age (at start of study).

Data

Months  Quit   Treatment Age
1        1        0      16
2        1        0      24
2        1        0      18
3        0        0      27
4        1        0      25
7        1        1      26
8        1        1      36
10       1        1      38
10       0        1      45
12       1        1      47

About the data

months and quit are kind of combined response:
- Months is number of months a woman was actually observed dancing
- quit is 1 if woman quit, 0 if still dancing at end of study.
Treatment is 1 if woman went to dance competition, 0 otherwise.
Fit model and see whether Age or Treatment have effect on survival.
Want to do predictions for probabilities of still dancing as they depend on whatever is significant, and draw plot.

Read data

Column-aligned:

url <- "http://ritsokiguess.site/datafiles/dancing.txt"
dance <- read_table(url)


── Column specification ────────────────────────────────────────────────────────
cols(
  Months = col_double(),
  Quit = col_double(),
  Treatment = col_double(),
  Age = col_double()
)

The data

dance

# A tibble: 12 × 4
   Months  Quit Treatment   Age
    <dbl> <dbl>     <dbl> <dbl>
 1      1     1         0    16
 2      2     1         0    24
 3      2     1         0    18
 4      3     0         0    27
 5      4     1         0    25
 6      5     1         0    21
 7     11     1         0    55
 8      7     1         1    26
 9      8     1         1    36
10     10     1         1    38
11     10     0         1    45
12     12     1         1    47

Fit model

Response variable has to incorporate both the survival time (Months) and whether or not the event, quitting, happened (that is, if Quit is 1).
This is made using Surv from survival package, with two inputs:
- the column that has the survival times
- something that is TRUE or 1 if the event happened.
Easiest for us to create this when we fit the model, predicting response from explanatories:

dance.1 <- coxph(Surv(Months, Quit) ~ Treatment + Age, 
                 data = dance)

What does `Surv` output actually look like?

dance %>% mutate(y = Surv(Months, Quit)) %>% 
  slice(1:6) # top 6 rows to fit

# A tibble: 6 × 5
  Months  Quit Treatment   Age      y
   <dbl> <dbl>     <dbl> <dbl> <Surv>
1      1     1         0    16     1 
2      2     1         0    24     2 
3      2     1         0    18     2 
4      3     0         0    27     3+
5      4     1         0    25     4 
6      5     1         0    21     5

Output looks a lot like regression

summary(dance.1)

Call:
coxph(formula = Surv(Months, Quit) ~ Treatment + Age, data = dance)

  n= 12, number of events= 10 

              coef exp(coef) se(coef)      z Pr(>|z|)  
Treatment -4.44915   0.01169  2.60929 -1.705   0.0882 .
Age       -0.36619   0.69337  0.15381 -2.381   0.0173 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
Treatment   0.01169     85.554 7.026e-05    1.9444
Age         0.69337      1.442 5.129e-01    0.9373

Concordance= 0.964  (se = 0.039 )
Likelihood ratio test= 21.68  on 2 df,   p=2e-05
Wald test            = 5.67  on 2 df,   p=0.06
Score (logrank) test = 14.75  on 2 df,   p=6e-04

Conclusions

Use \(\alpha=0.10\) here since not much data.
Three tests at bottom like global F-test. Consensus that something predicts survival time (whether or not dancer quit and/or how long it took).
Age (definitely), Treatment (marginally) both predict survival time.

Behind the scenes

All depends on hazard rate, which is based on probability that event happens in the next short time period, given that event has not happened yet:
\(X\) denotes time to event, \(\delta\) is small time interval:
\(h(t) = P(X \le t + \delta | X \ge t) / \delta\)
if \(h(t)\) large, event likely to happen soon (lifetime short)
if \(h(t)\) small, event unlikely to happen soon (lifetime long).

Modelling lifetime

want to model hazard rate
but hazard rate always positive, so actually model log of hazard rate
modelling how (log-)hazard rate depends on other things eg \(X_1 =\) age, \(X_2 =\) treatment, with the \(\beta\) being regression coefficients:
Cox model \(h(t)=h_0(t)\exp(\beta_0+\beta_1X_1+\beta_2 X_2+\cdots)\), or:
\(\log(h(t)) = \log(h_0(t)) + \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots\)
like a generalized linear model with log link.

Predictions with `marginaleffects`

Predicted survival probabilities depend on:
- the combination of explanatory variables you are looking at
- the time at which you are looking at them (when more time has passed, it is more likely that the event has happened, so the “survival probability” should be lower).
look at effect of age by comparing ages 20 and 40, and later look at the effect of treatment (values 1 and 0).
Also have to provide some times to predict for, in Months.

Effect of age

new <- datagrid(model = dance.1, 
                Age = c(20, 40), 
                Months = c(3, 5, 7))
new

  Quit Treatment Age Months rowid
1    1         0  20      3     1
2    1         0  20      5     2
3    1         0  20      7     3
4    1         0  40      3     4
5    1         0  40      5     5
6    1         0  40      7     6

These are actually for women who did not go to the dance competition.

The predictions

cbind(predictions(dance.1, newdata = new, type = "survival")) %>% 
  select(Age, Treatment, Months, estimate)

  Age Treatment Months      estimate
1  20         0      3  3.987336e-01
2  20         0      5  2.934959e-02
3  20         0      7 2.964394e-323
4  40         0      3  9.993936e-01
5  40         0      5  9.976749e-01
6  40         0      7  6.126327e-01

The estimated survival probabilities go down over time. For example a 20-year-old woman here has estimated probability 0.0293 of still dancing after 5 months.

A graph

We can plot the predictions over time for an experimental condition such as age. The key for plot_predictions is to put time first in the condition:

plot_predictions(dance.1, condition = c("Months", "Age"), 
                 type = "survival") +
     coord_cartesian(ylim = c(0, 1)) # y-axis from 0 to 1

Ignoring unknown labels:
• linetype : "Age"

Warning: Removed 60 rows containing missing values or values outside the scale range
(`geom_ribbon()`).

Comments

The plot picks some representative ages.
It is (usually) best to be up and to the right (has the highest chance of surviving longest).
Hence the oldest women have the best chance to still be dancing longest (the youngest women are most likely to quit soonest).

The effect of treatment

The same procedure will get predictions for women who did or did not go to the dance competition, at various times:

new <- datagrid(model = dance.1, Treatment = c(0, 1), 
                Months = c(3, 5, 7))
new

  Quit  Age Treatment Months rowid
1    1 31.5         0      3     1
2    1 31.5         0      5     2
3    1 31.5         0      7     3
4    1 31.5         1      3     4
5    1 31.5         1      5     5
6    1 31.5         1      7     6

The age used for predictions is the mean of all ages.

The predictions

cbind(predictions(dance.1, newdata = new, type = "survival")) %>% 
  select(Age, Treatment, Months, estimate)

   Age Treatment Months     estimate
1 31.5         0      3 9.864573e-01
2 31.5         0      5 9.490195e-01
3 31.5         0      7 1.646297e-05
4 31.5         1      3 9.998406e-01
5 31.5         1      5 9.993886e-01
6 31.5         1      7 8.792014e-01

Women of this age have a high (0.879) chance of still dancing after 7 months if they went to the dance competition, but much lower (almost zero) if they did not.

A graph

Again, time first, effect of interest second (as colours):

plot_predictions(dance.1, 
                 condition = c("Months", "Treatment"), 
                 type = "survival") + 
  coord_cartesian(ylim = c(0, 1)) -> g

The graph

Ignoring unknown labels:
• linetype : "Treatment"

Warning: Removed 50 rows containing missing values or values outside the scale range
(`geom_ribbon()`).

Comments

The survival curve for Treatment 1 is higher all the way along
Hence at any time, the women who went to the dance competition have a higher chance of still dancing than those who did not.

The model summary again

summary(dance.1)

Call:
coxph(formula = Surv(Months, Quit) ~ Treatment + Age, data = dance)

  n= 12, number of events= 10 

              coef exp(coef) se(coef)      z Pr(>|z|)  
Treatment -4.44915   0.01169  2.60929 -1.705   0.0882 .
Age       -0.36619   0.69337  0.15381 -2.381   0.0173 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
Treatment   0.01169     85.554 7.026e-05    1.9444
Age         0.69337      1.442 5.129e-01    0.9373

Concordance= 0.964  (se = 0.039 )
Likelihood ratio test= 21.68  on 2 df,   p=2e-05
Wald test            = 5.67  on 2 df,   p=0.06
Score (logrank) test = 14.75  on 2 df,   p=6e-04

Comments

The numbers in the coef column describe effect of that variable on log-hazard of quitting.
Both numbers are negative, so a higher value on both variables goes with a lower hazard of quitting:
- an older woman is less likely to quit soon (more likely to be still dancing)
- a woman who went to the dance competition (Treatment = 1) is less likely to quit soon vs. a woman who didn’t (more likely to be still dancing).

Model checking

With regression, usually plot residuals against fitted values.
Not quite same here (nonlinear model), but “martingale residuals” should have no pattern vs. “linear predictor”.
Use broom ideas to get them, in .resid and .fitted as below.
Martingale residuals can go very negative, so won’t always look normal.

Martingale residuals

dance.1 %>% augment(dance) %>% 
  ggplot(aes(x = .fitted, y = .resid)) + geom_point() + geom_smooth()

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

A more realistic example: lung cancer

When you load in an R package, get data sets to illustrate functions in the package.
One such is lung. Data set measuring survival in patients with advanced lung cancer.
Along with survival time, number of “performance scores” included, measuring how well patients can perform daily activities.
Sometimes high good, but sometimes bad!
Variables below, from the data set help file (?lung).

The variables

Uh oh, missing values

lung %>% select(meal.cal, wt.loss)

    meal.cal wt.loss
1       1175      NA
2       1225      15
3         NA      15
4       1150      11
5         NA       0
6        513       0
7        384      10
8        538       1
9        825      16
10       271      34
11      1025      27
12        NA      23
13        NA       5
14      1225      32
15      2600      60
16        NA      15
17      1150      -5
18      1025      22
19       238      10
20      1175      NA
21      1025      17
22      1175      -8
23        NA      16
24       975      13
25        NA       0
26       825       6
27      1025     -13
28      1075      20
29       875      -7
30       305      20
31      1025      -1
32       388      20
33        NA     -11
34       925     -15
35      1075      10
36      1025      NA
37       513      28
38       875       4
39      1225      24
40      1175      15
41       975      10
42      1075      11
43       363      27
44        NA      NA
45      1025       7
46       625     -24
47       463      30
48      1025      10
49      1425       2
50      1175       4
51        NA       9
52        NA       0
53      1025       0
54      1175       7
55      1300      15
56       725      NA
57       338       5
58        NA      18
59      1225      10
60      1075      -3
61       438       8
62      1300      68
63      1025      NA
64      1025       0
65      1125       0
66       825       8
67       538       2
68      1025       3
69      1039       0
70       488      23
71      1175      -1
72       538      29
73      1175       0
74        NA       3
75       825       3
76      1075      19
77      1300       0
78      1225      -2
79       731      15
80       169      30
81       768       5
82      1500      15
83      1425       8
84       588      -1
85      1025       1
86      1100      14
87      1150       1
88      1175       4
89       588      39
90       910       2
91       975      -1
92        NA      23
93       875       8
94       280      14
95        NA      13
96       288       7
97        NA      25
98        NA       0
99       910       0
100       NA      10
101      710      15
102     1175       3
103       NA       4
104       NA       0
105       NA      32
106      875      14
107      975      -3
108      925      NA
109      975       5
110      925      11
111      575      10
112     1175       5
113     1030       6
114       NA       1
115       NA      15
116      413      20
117      675      20
118     1300      30
119      613      24
120      346      11
121       NA       0
122      675      10
123      910       0
124      768      -3
125     1025      17
126      925      20
127     1075      13
128      993       0
129      750      28
130       NA       4
131      925      52
132       NA      20
133     1225       5
134       NA      49
135      313       6
136       96      37
137       NA       0
138     1075      NA
139      975      -5
140     1500      15
141     1225     -16
142      413      38
143     1500       8
144     1075       0
145      513      30
146       NA       2
147      775       2
148     1225      13
149      413      27
150       NA       0
151     1175      -2
152       NA       7
153       NA       0
154       NA       4
155     1025      10
156      713      20
157       NA       7
158      475      27
159      538      -2
160      825      17
161      588       8
162     2450       2
163     2450      36
164      875       2
165      413      16
166     1075       3
167       NA      33
168      860       4
169      730       0
170     1025       0
171      825       2
172     1225      10
173      768      37
174      338       6
175     1225      12
176     1025       0
177     1225      -2
178       NA      NA
179      588      13
180      588       0
181      975       5
182     1225      -5
183     1025      NA
184       NA      -1
185     1225       0
186      975       5
187      463      20
188     1300       8
189     1025      12
190     1225       8
191      488      14
192     1075      NA
193      513      NA
194      825      33
195     1300      -2
196     1175       6
197      825       0
198       NA       4
199      975       0
200     1275       0
201      488      37
202     2200       5
203     1025       0
204      635       1
205      413       0
206       NA      NA
207       NA      23
208     1025      -3
209       NA      NA
210       NA      10
211      488      -2
212      413      23
213     1075       0
214       NA      31
215       NA      10
216     1025      18
217       NA     -10
218      825       7
219      131       3
220      725      11
221     1500       2
222     1025       0
223       NA       0
224       NA       3
225     2350      -5
226     1025       5
227     1075       1
228     1060       0

A closer look

summary(lung)

      inst            time            status           age       
 Min.   : 1.00   Min.   :   5.0   Min.   :1.000   Min.   :39.00  
 1st Qu.: 3.00   1st Qu.: 166.8   1st Qu.:1.000   1st Qu.:56.00  
 Median :11.00   Median : 255.5   Median :2.000   Median :63.00  
 Mean   :11.09   Mean   : 305.2   Mean   :1.724   Mean   :62.45  
 3rd Qu.:16.00   3rd Qu.: 396.5   3rd Qu.:2.000   3rd Qu.:69.00  
 Max.   :33.00   Max.   :1022.0   Max.   :2.000   Max.   :82.00  
 NA's   :1                                                       
      sex           ph.ecog          ph.karno        pat.karno     
 Min.   :1.000   Min.   :0.0000   Min.   : 50.00   Min.   : 30.00  
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 75.00   1st Qu.: 70.00  
 Median :1.000   Median :1.0000   Median : 80.00   Median : 80.00  
 Mean   :1.395   Mean   :0.9515   Mean   : 81.94   Mean   : 79.96  
 3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.: 90.00   3rd Qu.: 90.00  
 Max.   :2.000   Max.   :3.0000   Max.   :100.00   Max.   :100.00  
                 NA's   :1        NA's   :1        NA's   :3       
    meal.cal         wt.loss       
 Min.   :  96.0   Min.   :-24.000  
 1st Qu.: 635.0   1st Qu.:  0.000  
 Median : 975.0   Median :  7.000  
 Mean   : 928.8   Mean   :  9.832  
 3rd Qu.:1150.0   3rd Qu.: 15.750  
 Max.   :2600.0   Max.   : 68.000  
 NA's   :47       NA's   :14

Remove obs with any missing values

lung %>% drop_na() -> lung.complete
lung.complete %>%
  select(meal.cal:wt.loss) %>%
  slice(1:10)

   meal.cal wt.loss
2      1225      15
4      1150      11
6       513       0
7       384      10
8       538       1
9       825      16
10      271      34
11     1025      27
15     2600      60
17     1150      -5

Check!

summary(lung.complete)

      inst            time            status           age       
 Min.   : 1.00   Min.   :   5.0   Min.   :1.000   Min.   :39.00  
 1st Qu.: 3.00   1st Qu.: 174.5   1st Qu.:1.000   1st Qu.:57.00  
 Median :11.00   Median : 268.0   Median :2.000   Median :64.00  
 Mean   :10.71   Mean   : 309.9   Mean   :1.719   Mean   :62.57  
 3rd Qu.:15.00   3rd Qu.: 419.5   3rd Qu.:2.000   3rd Qu.:70.00  
 Max.   :32.00   Max.   :1022.0   Max.   :2.000   Max.   :82.00  
      sex           ph.ecog          ph.karno        pat.karno     
 Min.   :1.000   Min.   :0.0000   Min.   : 50.00   Min.   : 30.00  
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.: 70.00   1st Qu.: 70.00  
 Median :1.000   Median :1.0000   Median : 80.00   Median : 80.00  
 Mean   :1.383   Mean   :0.9581   Mean   : 82.04   Mean   : 79.58  
 3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.: 90.00   3rd Qu.: 90.00  
 Max.   :2.000   Max.   :3.0000   Max.   :100.00   Max.   :100.00  
    meal.cal         wt.loss       
 Min.   :  96.0   Min.   :-24.000  
 1st Qu.: 619.0   1st Qu.:  0.000  
 Median : 975.0   Median :  7.000  
 Mean   : 929.1   Mean   :  9.719  
 3rd Qu.:1162.5   3rd Qu.: 15.000  
 Max.   :2600.0   Max.   : 68.000

No missing values left.

Model 1: use everything except `inst`

names(lung.complete)

 [1] "inst"      "time"      "status"    "age"       "sex"      
 [6] "ph.ecog"   "ph.karno"  "pat.karno" "meal.cal"  "wt.loss"

Event was death, goes with status of 2:

lung.1 <- coxph(
  Surv(time, status == 2) ~ . - inst - time - status,
  data = lung.complete
)

Warning in terms.formula(formula, specials = ss, data = data):
'varlist' has changed (from nvar=9) to new 11 after EncodeVars() --
should no longer happen!

Warning in terms.formula(formula, special, data = data): 'varlist'
has changed (from nvar=9) to new 11 after EncodeVars() -- should no
longer happen!

Warning in coxph(Surv(time, status == 2) ~ . - inst - time - status,
data = lung.complete): a variable appears on both the left and right
sides of the formula

“Dot” means “all the other variables”.

`summary` of model 1

summary(lung.1)

Call:
coxph(formula = Surv(time, status == 2) ~ . - inst - time - status, 
    data = lung.complete)

  n= 167, number of events= 120 

                coef  exp(coef)   se(coef)      z Pr(>|z|)   
age        1.080e-02  1.011e+00  1.160e-02  0.931  0.35168   
sex       -5.536e-01  5.749e-01  2.016e-01 -2.746  0.00603 **
ph.ecog    7.395e-01  2.095e+00  2.250e-01  3.287  0.00101 **
ph.karno   2.244e-02  1.023e+00  1.123e-02  1.998  0.04575 * 
pat.karno -1.207e-02  9.880e-01  8.116e-03 -1.488  0.13685   
meal.cal   2.835e-05  1.000e+00  2.594e-04  0.109  0.91298   
wt.loss   -1.420e-02  9.859e-01  7.766e-03 -1.828  0.06748 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

          exp(coef) exp(-coef) lower .95 upper .95
age          1.0109     0.9893    0.9881    1.0341
sex          0.5749     1.7395    0.3872    0.8534
ph.ecog      2.0950     0.4773    1.3479    3.2560
ph.karno     1.0227     0.9778    1.0004    1.0455
pat.karno    0.9880     1.0121    0.9724    1.0038
meal.cal     1.0000     1.0000    0.9995    1.0005
wt.loss      0.9859     1.0143    0.9710    1.0010

Concordance= 0.653  (se = 0.029 )
Likelihood ratio test= 28.16  on 7 df,   p=2e-04
Wald test            = 27.5  on 7 df,   p=3e-04
Score (logrank) test = 28.31  on 7 df,   p=2e-04

Overall significance

The three tests of overall significance:

glance(lung.1) %>% select(starts_with("p.value"))

# A tibble: 1 × 4
  p.value.log p.value.sc p.value.wald p.value.robust
        <dbl>      <dbl>        <dbl>          <dbl>
1    0.000205   0.000193     0.000271             NA

All strongly significant. Something predicts survival.

Coefficients for model 1

tidy(lung.1) %>% select(term, p.value) %>% arrange(p.value)

# A tibble: 7 × 2
  term      p.value
  <chr>       <dbl>
1 ph.ecog   0.00101
2 sex       0.00603
3 ph.karno  0.0457 
4 wt.loss   0.0675 
5 pat.karno 0.137  
6 age       0.352  
7 meal.cal  0.913

sex and ph.ecog definitely significant here
age, pat.karno and meal.cal definitely not
Take out definitely non-sig variables, and try again.

Model 2

lung.2 <- update(lung.1, . ~ . - age - pat.karno - meal.cal)
summary(lung.2)

Call:
coxph(formula = Surv(time, status == 2) ~ sex + ph.ecog + ph.karno + 
    wt.loss, data = lung.complete)

  n= 167, number of events= 120 

              coef exp(coef)  se(coef)      z Pr(>|z|)    
sex      -0.570881  0.565028  0.198842 -2.871 0.004091 ** 
ph.ecog   0.844660  2.327188  0.218644  3.863 0.000112 ***
ph.karno  0.017877  1.018038  0.010887  1.642 0.100584    
wt.loss  -0.012048  0.988025  0.007495 -1.607 0.107975    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
sex          0.565     1.7698    0.3827    0.8343
ph.ecog      2.327     0.4297    1.5161    3.5722
ph.karno     1.018     0.9823    0.9965    1.0400
wt.loss      0.988     1.0121    0.9736    1.0026

Concordance= 0.646  (se = 0.031 )
Likelihood ratio test= 24.9  on 4 df,   p=5e-05
Wald test            = 24.19  on 4 df,   p=7e-05
Score (logrank) test = 24.61  on 4 df,   p=6e-05

Compare with first model:

anova(lung.2, lung.1)

Analysis of Deviance Table
 Cox model: response is  Surv(time, status == 2)
 Model 1: ~ sex + ph.ecog + ph.karno + wt.loss
 Model 2: ~ (inst + age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + wt.loss) - inst - time - status
   loglik Chisq Df Pr(>|Chi|)
1 -495.67                    
2 -494.03 3.269  3      0.352

No harm in taking out those variables.

Model 3

Take out ph.karno and wt.loss as well.

lung.3 <- update(lung.2, . ~ . - ph.karno - wt.loss)

tidy(lung.3) %>% select(term, estimate, p.value)

# A tibble: 2 × 3
  term    estimate  p.value
  <chr>      <dbl>    <dbl>
1 sex       -0.510 0.00958 
2 ph.ecog    0.483 0.000266

summary(lung.3)

Call:
coxph(formula = Surv(time, status == 2) ~ sex + ph.ecog, data = lung.complete)

  n= 167, number of events= 120 

           coef exp(coef) se(coef)      z Pr(>|z|)    
sex     -0.5101    0.6004   0.1969 -2.591 0.009579 ** 
ph.ecog  0.4825    1.6201   0.1323  3.647 0.000266 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
sex        0.6004     1.6655    0.4082    0.8832
ph.ecog    1.6201     0.6172    1.2501    2.0998

Concordance= 0.641  (se = 0.031 )
Likelihood ratio test= 19.48  on 2 df,   p=6e-05
Wald test            = 19.35  on 2 df,   p=6e-05
Score (logrank) test = 19.62  on 2 df,   p=5e-05

Check whether that was OK

anova(lung.3, lung.2)

Analysis of Deviance Table
 Cox model: response is  Surv(time, status == 2)
 Model 1: ~ sex + ph.ecog
 Model 2: ~ sex + ph.ecog + ph.karno + wt.loss
   loglik  Chisq Df Pr(>|Chi|)  
1 -498.38                       
2 -495.67 5.4135  2    0.06675 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Just OK.

Commentary

OK (just) to take out those two covariates.
Both remaining variables strongly significant.
Nature of effect on survival time? Consider later.
Picture?

Plotting survival probabilities

Assess (separately) the effect of sex and ph.ecog score using plot_predictions
Don’t forget to add time (here actually called time) to the condition.

Effect of `sex`:

plot_predictions(lung.3, condition = c("time", "sex"), 
                 type = "survival")

Ignoring unknown labels:
• linetype : "sex"

Females (sex = 2) have better survival than males.
This graph from a mean ph.ecog score, but the male-female comparison is the same for any score.

Effect of `ph.ecog` score:

plot_predictions(lung.3, condition = c("time", "ph.ecog"), 
                 type = "survival")

Ignoring unknown labels:
• linetype : "ph.ecog"

Comments

A lower ph.ecog score is better.
For example, a patient with a score of 0 has almost a 50-50 chance of living 500 days, but a patient with a score of 3 has almost no chance to survive that long.
Is this for males or females? See over. (The comparison of scores is the same for both.) How many males and females did we observe?

lung %>% count(sex)

  sex   n
1   1 138
2   2  90

Sex and `ph.ecog` score

plot_predictions(lung.3, condition = c("time", "ph.ecog", "sex"), type = "survival")

Ignoring unknown labels:
• linetype : "ph.ecog"

Comments

The previous graph was (closer to) males. There were more males in the dataset (sex of 1).
This pair of graphs shows the effect of ph.ecog score (above and below on each facet), and the effect of males (left) vs. females (right).
The difference between males and females is about the same as 1 point on the ph.ecog scale (compare the red curve on the left facet with the green curve on the right facet).

A second go

Can we put the male and female survival curves on the same plot?

Try building plot ourselves:

plot_predictions(lung.3, condition = c("time", 
                                       "ph.ecog", "sex"), 
                 type = "survival", draw = FALSE) %>% 
  select(estimate, time, ph.ecog, sex) -> dd
dd

        estimate       time ph.ecog sex
1   0.9958261142    5.00000       0   1
2   0.9974917544    5.00000       0   2
3   0.9932464368    5.00000       1   1
4   0.9959394283    5.00000       1   2
5   0.9823698047    5.00000       3   1
6   0.9893765992    5.00000       3   2
7   0.9789845105   25.75510       0   1
8   0.9873280367   25.75510       0   2
9   0.9661742275   25.75510       1   1
10  0.9795503225   25.75510       1   2
11  0.9136340405   25.75510       3   1
12  0.9472099608   25.75510       3   2
13  0.9661295016   46.51020       0   1
14  0.9795230953   46.51020       0   2
15  0.9457035459   46.51020       1   1
16  0.9670355445   46.51020       1   2
17  0.8636936182   46.51020       3   1
18  0.9157735025   46.51020       3   2
19  0.9267257131   67.26531       0   1
20  0.9553363782   67.26531       0   2
21  0.8840076417   67.26531       1   1
22  0.9286461474   67.26531       1   2
23  0.7235244504   67.26531       3   1
24  0.8233995695   67.26531       3   2
25  0.9133835942   88.02041       0   1
26  0.9470540489   88.02041       0   2
27  0.8634800465   88.02041       1   1
28  0.9156375270   88.02041       1   2
29  0.6802517059   88.02041       3   1
30  0.7934668952   88.02041       3   2
31  0.8908692350  108.77551       0   1
32  0.9329674478  108.77551       0   2
33  0.8292607431  108.77551       1   1
34  0.8936741699  108.77551       1   2
35  0.6117493425  108.77551       3   1
36  0.7444765673  108.77551       3   2
37  0.8816909074  129.53061       0   1
38  0.9271840885  129.53061       0   2
39  0.8154631163  129.53061       1   1
40  0.8847161548  129.53061       1   2
41  0.5853916747  129.53061       3   1
42  0.7250475842  129.53061       3   2
43  0.8629103866  150.28571       0   1
44  0.9152747741  150.28571       0   2
45  0.7875077652  150.28571       1   1
46  0.8663784959  150.28571       1   2
47  0.5341717373  150.28571       3   1
48  0.6862617397  150.28571       3   2
49  0.8196382068  171.04082       0   1
50  0.8874331503  171.04082       0   2
51  0.7245277561  171.04082       1   1
52  0.8240849587  171.04082       1   2
53  0.4292008284  171.04082       3   1
54  0.6017777642  171.04082       3   2
55  0.7892492197  191.79592       0   1
56  0.8675283428  191.79592       0   2
57  0.6815089413  191.79592       1   1
58  0.7943470971  191.79592       1   2
59  0.3654954737  191.79592       3   1
60  0.5464350861  191.79592       3   2
61  0.7563407324  212.55102       0   1
62  0.8456245898  212.55102       0   2
63  0.6360689298  212.55102       1   1
64  0.7621085230  212.55102       1   2
65  0.3049446998  212.55102       3   1
66  0.4901259821  212.55102       3   2
67  0.7204837357  233.30612       0   1
68  0.8213200400  233.30612       0   2
69  0.5879357460  233.30612       1   1
70  0.7269379227  233.30612       1   2
71  0.2480375550  233.30612       3   1
72  0.4329586470  233.30612       3   2
73  0.7016083887  254.06122       0   1
74  0.8083319531  254.06122       0   2
75  0.5631842470  254.06122       1   1
76  0.7084049045  254.06122       1   2
77  0.2215573960  254.06122       3   1
78  0.4045817809  254.06122       3   2
79  0.6757023797  274.81633       0   1
80  0.7902764248  274.81633       0   2
81  0.5298810001  274.81633       1   1
82  0.6829465638  274.81633       1   2
83  0.1887990922  274.81633       3   1
84  0.3675227343  274.81633       3   2
85  0.6210609836  295.57143       0   1
86  0.7512601197  295.57143       0   2
87  0.4622176443  295.57143       1   1
88  0.6291609413  295.57143       1   2
89  0.1319048250  295.57143       3   1
90  0.2963280480  295.57143       3   2
91  0.5901730364  316.32653       0   1
92  0.7285976124  316.32653       0   2
93  0.4255516646  316.32653       1   1
94  0.5987004224  316.32653       1   2
95  0.1061799383  316.32653       3   1
96  0.2601361715  316.32653       3   2
97  0.5742800696  337.08163       0   1
98  0.7167524761  337.08163       0   2
99  0.4071406034  337.08163       1   1
100 0.5830106141  337.08163       1   2
101 0.0945418290  337.08163       3   1
102 0.2426206019  337.08163       3   2
103 0.5344001653  357.83673       0   1
104 0.6864379324  357.83673       0   2
105 0.3623290354  357.83673       1   1
106 0.5435876863  357.83673       1   2
107 0.0696139813  357.83673       3   1
108 0.2018892480  357.83673       3   2
109 0.5020868773  378.59184       0   1
110 0.6612060731  378.59184       0   2
111 0.3275042864  378.59184       1   1
112 0.5115861543  378.59184       1   2
113 0.0533949742  378.59184       3   1
114 0.1721651713  378.59184       3   2
115 0.4936458661  399.34694       0   1
116 0.6545089702  399.34694       0   2
117 0.3186304180  399.34694       1   1
118 0.5032174892  399.34694       1   2
119 0.0496805215  399.34694       3   1
120 0.1648705559  399.34694       3   2
121 0.4936458661  420.10204       0   1
122 0.6545089702  420.10204       0   2
123 0.3186304180  420.10204       1   1
124 0.5032174892  420.10204       1   2
125 0.0496805215  420.10204       3   1
126 0.1648705559  420.10204       3   2
127 0.4573995848  440.85714       0   1
128 0.6252148959  440.85714       0   2
129 0.2815971793  440.85714       1   1
130 0.4672367198  440.85714       1   2
131 0.0359201070  440.85714       3   1
132 0.1356977243  440.85714       3   2
133 0.4111651258  461.61224       0   1
134 0.5864641075  461.61224       0   2
135 0.2369455698  461.61224       1   1
136 0.4212272355  461.61224       1   2
137 0.0228309781  461.61224       3   1
138 0.1033710371  461.61224       3   2
139 0.3916013154  482.36735       0   1
140 0.5695461585  482.36735       0   2
141 0.2189508192  482.36735       1   1
142 0.4017170583  482.36735       1   2
143 0.0185560557  482.36735       3   1
144 0.0912717331  482.36735       3   2
145 0.3916013154  503.12245       0   1
146 0.5695461585  503.12245       0   2
147 0.2189508192  503.12245       1   1
148 0.4017170583  503.12245       1   2
149 0.0185560557  503.12245       3   1
150 0.0912717331  503.12245       3   2
151 0.3707322430  523.87755       0   1
152 0.5511226818  523.87755       0   2
153 0.2003610440  523.87755       1   1
154 0.3808759027  523.87755       1   2
155 0.0147007746  523.87755       3   1
156 0.0793605958  523.87755       3   2
157 0.3602173146  544.63265       0   1
158 0.5416832058  544.63265       0   2
159 0.1912353880  544.63265       1   1
160 0.3703630509  544.63265       1   2
161 0.0130076654  544.63265       3   1
162 0.0737389975  544.63265       3   2
163 0.3368569279  565.38776       0   1
164 0.5203088489  565.38776       0   2
165 0.1715500939  565.38776       1   1
166 0.3469770087  565.38776       1   2
167 0.0097805746  565.38776       3   1
168 0.0621358485  565.38776       3   2
169 0.2995378383  586.14286       0   1
170 0.4848894092  586.14286       0   2
171 0.1418317249  586.14286       1   1
172 0.3095238540  586.14286       1   2
173 0.0059361210  586.14286       3   1
174 0.0460395091  586.14286       3   2
175 0.2995378383  606.89796       0   1
176 0.4848894092  606.89796       0   2
177 0.1418317249  606.89796       1   1
178 0.3095238540  606.89796       1   2
179 0.0059361210  606.89796       3   1
180 0.0460395091  606.89796       3   2
181 0.2866664919  627.65306       0   1
182 0.4722690927  627.65306       0   2
183 0.1320898416  627.65306       1   1
184 0.2965775466  627.65306       1   2
185 0.0049247410  627.65306       3   1
186 0.0411551034  627.65306       3   2
187 0.2602043507  648.40816       0   1
188 0.4455882750  648.40816       0   2
189 0.1129073130  648.40816       1   1
190 0.2699107246  648.40816       1   2
191 0.0032621539  648.40816       3   1
192 0.0321379612  648.40816       3   2
193 0.2469821941  669.16327       0   1
194 0.4318515988  669.16327       0   2
195 0.1037593994  669.16327       1   1
196 0.2565590656  669.16327       1   2
197 0.0026132760  669.16327       3   1
198 0.0281309773  669.16327       3   2
199 0.2205587262  689.91837       0   1
200 0.4034858063  689.91837       0   2
201 0.0863795680  689.91837       1   1
202 0.2298173390  689.91837       1   2
203 0.0016151071  689.91837       3   1
204 0.0210718816  689.91837       3   2
205 0.1934926947  710.67347       0   1
206 0.3729818188  710.67347       0   2
207 0.0698698561  710.67347       1   1
208 0.2023344863  710.67347       1   2
209 0.0009255371  710.67347       3   1
210 0.0150838905  710.67347       3   2
211 0.1784507574  731.42857       0   1
212 0.3552913458  731.42857       0   2
213 0.0612840957  731.42857       1   1
214 0.1870164629  731.42857       1   2
215 0.0006560357  731.42857       3   1
216 0.0122678474  731.42857       3   2
217 0.1784507574  752.18367       0   1
218 0.3552913458  752.18367       0   2
219 0.0612840957  752.18367       1   1
220 0.1870164629  752.18367       1   2
221 0.0006560357  752.18367       3   1
222 0.0122678474  752.18367       3   2
223 0.1621079396  772.93878       0   1
224 0.3353806267  772.93878       0   2
225 0.0524523175  772.93878       1   1
226 0.1703336708  772.93878       1   2
227 0.0004360433  772.93878       3   1
228 0.0095995812  772.93878       3   2
229 0.1458111707  793.69388       0   1
230 0.3147094706  793.69388       0   2
231 0.0441790062  793.69388       1   1
232 0.1536521892  793.69388       1   2
233 0.0002778739  793.69388       3   1
234 0.0073241602  793.69388       3   2
235 0.1259201026  814.44898       0   1
236 0.2881805051  814.44898       0   2
237 0.0348352820  814.44898       1   1
238 0.1332219486  814.44898       1   2
239 0.0001489253  814.44898       3   1
240 0.0050363066  814.44898       3   2
241 0.1259201026  835.20408       0   1
242 0.2881805051  835.20408       0   2
243 0.0348352820  835.20408       1   1
244 0.1332219486  835.20408       1   2
245 0.0001489253  835.20408       3   1
246 0.0050363066  835.20408       3   2
247 0.1259201026  855.95918       0   1
248 0.2881805051  855.95918       0   2
249 0.0348352820  855.95918       1   1
250 0.1332219486  855.95918       1   2
251 0.0001489253  855.95918       3   1
252 0.0050363066  855.95918       3   2
253 0.1259201026  876.71429       0   1
254 0.2881805051  876.71429       0   2
255 0.0348352820  876.71429       1   1
256 0.1332219486  876.71429       1   2
257 0.0001489253  876.71429       3   1
258 0.0050363066  876.71429       3   2
259 0.1259201026  897.46939       0   1
260 0.2881805051  897.46939       0   2
261 0.0348352820  897.46939       1   1
262 0.1332219486  897.46939       1   2
263 0.0001489253  897.46939       3   1
264 0.0050363066  897.46939       3   2
265 0.1259201026  918.22449       0   1
266 0.2881805051  918.22449       0   2
267 0.0348352820  918.22449       1   1
268 0.1332219486  918.22449       1   2
269 0.0001489253  918.22449       3   1
270 0.0050363066  918.22449       3   2
271 0.1259201026  938.97959       0   1
272 0.2881805051  938.97959       0   2
273 0.0348352820  938.97959       1   1
274 0.1332219486  938.97959       1   2
275 0.0001489253  938.97959       3   1
276 0.0050363066  938.97959       3   2
277 0.1259201026  959.73469       0   1
278 0.2881805051  959.73469       0   2
279 0.0348352820  959.73469       1   1
280 0.1332219486  959.73469       1   2
281 0.0001489253  959.73469       3   1
282 0.0050363066  959.73469       3   2
283 0.1259201026  980.48980       0   1
284 0.2881805051  980.48980       0   2
285 0.0348352820  980.48980       1   1
286 0.1332219486  980.48980       1   2
287 0.0001489253  980.48980       3   1
288 0.0050363066  980.48980       3   2
289 0.1259201026 1001.24490       0   1
290 0.2881805051 1001.24490       0   2
291 0.0348352820 1001.24490       1   1
292 0.1332219486 1001.24490       1   2
293 0.0001489253 1001.24490       3   1
294 0.0050363066 1001.24490       3   2
295 0.1259201026 1022.00000       0   1
296 0.2881805051 1022.00000       0   2
297 0.0348352820 1022.00000       1   1
298 0.1332219486 1022.00000       1   2
299 0.0001489253 1022.00000       3   1
300 0.0050363066 1022.00000       3   2

The plot

ggplot(dd, aes(x = time, y = estimate, 
               colour = ph.ecog, 
               linetype = sex)) + geom_line()

The summary again

summary(lung.3)

Call:
coxph(formula = Surv(time, status == 2) ~ sex + ph.ecog, data = lung.complete)

  n= 167, number of events= 120 

           coef exp(coef) se(coef)      z Pr(>|z|)    
sex     -0.5101    0.6004   0.1969 -2.591 0.009579 ** 
ph.ecog  0.4825    1.6201   0.1323  3.647 0.000266 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
sex        0.6004     1.6655    0.4082    0.8832
ph.ecog    1.6201     0.6172    1.2501    2.0998

Concordance= 0.641  (se = 0.031 )
Likelihood ratio test= 19.48  on 2 df,   p=6e-05
Wald test            = 19.35  on 2 df,   p=6e-05
Score (logrank) test = 19.62  on 2 df,   p=5e-05

Comments

A higher-numbered sex (female) has a lower hazard of death (negative coef). That is, females are more likely to survive longer than males.
A higher ph.ecog score goes with a higher hazard of death (positive coef). So patients with a lower score are more likely to survive longer.
These are consistent with the graphs we drew.

Martingale residuals for this model

No problems here:

lung.3 %>% augment(lung.complete) %>% 
  ggplot(aes(x = .fitted, y = .resid)) + 
    geom_point() + geom_smooth()

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

When the Cox model fails (optional)

Invent some data where survival is best at middling age, and worse at high and low age:

age <- seq(20, 60, 5)
survtime <- c(10, 12, 11, 21, 15, 20, 8, 9, 11)
stat <- c(1, 1, 1, 1, 0, 1, 1, 1, 1)
d <- tibble(age, survtime, stat)
d

# A tibble: 9 × 3
    age survtime  stat
  <dbl>    <dbl> <dbl>
1    20       10     1
2    25       12     1
3    30       11     1
4    35       21     1
5    40       15     0
6    45       20     1
7    50        8     1
8    55        9     1
9    60       11     1

Small survival time 15 in middle was actually censored, so would have been longer if observed.

Fit Cox model

d.1 <- coxph(Surv(survtime, stat) ~ age, data = d)
summary(d.1)

Call:
coxph(formula = Surv(survtime, stat) ~ age, data = d)

  n= 9, number of events= 8 

       coef exp(coef) se(coef)     z Pr(>|z|)
age 0.01984   1.02003  0.03446 0.576    0.565

    exp(coef) exp(-coef) lower .95 upper .95
age      1.02     0.9804    0.9534     1.091

Concordance= 0.545  (se = 0.105 )
Likelihood ratio test= 0.33  on 1 df,   p=0.6
Wald test            = 0.33  on 1 df,   p=0.6
Score (logrank) test = 0.33  on 1 df,   p=0.6

Martingale residuals

Down-and-up indicates incorrect relationship between age and survival:

d.1 %>% augment(d) %>% 
  ggplot(aes(x = .fitted, y = .resid)) + 
    geom_point() + geom_smooth()

Attempt 2

Add squared term in age:

d.2 <- update(d.1, . ~ . + I(age^2))
summary(d.2)

Call:
coxph(formula = Surv(survtime, stat) ~ age + I(age^2), data = d)

  n= 9, number of events= 8 

              coef exp(coef)  se(coef)      z Pr(>|z|)  
age      -0.380184  0.683736  0.241617 -1.573   0.1156  
I(age^2)  0.004832  1.004844  0.002918  1.656   0.0977 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
age         0.6837     1.4626    0.4258     1.098
I(age^2)    1.0048     0.9952    0.9991     1.011

Concordance= 0.758  (se = 0.123 )
Likelihood ratio test= 3.26  on 2 df,   p=0.2
Wald test            = 3.16  on 2 df,   p=0.2
Score (logrank) test = 3.75  on 2 df,   p=0.2

(Marginally) helpful.

Martingale residuals this time

Not great, but less problematic than before:

d.2 %>% augment(d) %>% 
  ggplot(aes(x = .fitted, y = .resid)) + 
  geom_point() + geom_smooth()

Survival Analysis