--- title: "Repeated measures analysis" editor: markdown: wrap: 72 --- ## Repeated measures - More than one response *measurement* for each subject, same thing at different times - Generalization of matched pairs ("matched triples", etc.). - Expect measurements on same subject to be correlated, so assumptions of independence will fail. - *Repeated measures*. *Profile analysis* uses `Manova` (set up). - Another approach uses *mixed models* (random effects). - Variation: each subject does all treatments at different times (called *crossover design*). ## Packages ```{r bProfile-1, results="hide", message=FALSE} library(car) library(tidyverse) library(lme4) # for mixed models later ``` ## Example: histamine in dogs - 8 dogs take part in experiment. - Dogs randomized to one of 2 different drugs. - Response: log of blood concentration of histamine 0, 1, 3 and 5 minutes after taking drug. (Repeated measures.) - Data in `dogs.txt`, column-aligned. ## Read in data ```{r bProfile-2 } my_url <- "http://ritsokiguess.site/datafiles/dogs.txt" dogs <- read_table(my_url) dogs ``` ## Setting things up ```{r bProfile-3} response <- with(dogs, cbind(lh0, lh1, lh3, lh5)) response ``` ## Another way to make response ```{r} dogs %>% select(starts_with("lh")) %>% as.matrix() -> response response ``` ## The repeated measures MANOVA Get list of response variable names; we call them `times`. Save in data frame. ```{r bProfile-4, echo=FALSE} options(width = 70) ``` \small ```{r bProfile-5, error=TRUE} times <- colnames(response) times times.df <- data.frame(times=factor(times)) times.df ``` \normalsize ## Fitting the model ```{r} dogs.1 <- lm(response ~ drug, data = dogs) dogs.2 <- Manova(dogs.1, idata = times.df, idesign = ~times ) ``` ## The output (there is a lot) - normally you just run ```{r} #| output: false summary(dogs.2) ``` and pull out what you need to answer the question. - But you can grab just individual pieces as shown below: \small ```{r} names(summary(dogs.2)) ``` \normalsize ## What there is here - three sets of tests, for - times; drug; their interaction - two *types* of test for each of these: - univariate; multivariate - univariate is more powerful *if* it applies; if it doesn't, can make adjustments to it ## Sphericity - The thing that decides whether the univariate tests apply is called "sphericity". - This holds if the outcomes have equal variance (to each other) and have the same (positive) correlation across subjects. - Tested using Mauchly's test (part of output) - If sphericity rejected, there are adjustments to the univariate P-values due to Huynh-Feldt and Greenhouse-Geisser. Huynh-Feldt better if responses not actually normal (safer). ## Sphericity tests ```{r} summary(dogs.2)$sphericity.tests ``` Sphericity is not rejected; proceed to univariate tests. ## Univariate tests \scriptsize ```{r} summary(dogs.2)$univariate.tests ``` - Significant interaction between `drug` and time: the pattern of log-histamine over time is different for the different drugs. \normalsize ## If sphericity had been rejected then we would use the H-F adjusted P-values: ```{r} summary(dogs.2)$pval.adjustments ``` In this case (because sphericity was not rejected), these are very similar to the ones from the univariate tests, and the conclusion (significant interaction) was the same. ## Comments - If the interaction had not been significant: - cannot remove interaction with time - so look at univariate (or adjusted for sphericity) tests of main effects in model with non-significant interaction ## Next - investigate interaction with graph - but dataframe has several observations per line ("wide"). - Plotting works with data in "long format": one response per line. - The responses are log-histamine at different times, labelled `lh`-something. Call them all `lh` and put them in one column, with the time they belong to labelled. ## Running `pivot_longer`, try 1 ```{r bProfile-7, size="footnotesize"} dogs %>% pivot_longer(starts_with("lh"), names_to = "time", values_to = "lh") ``` \normalsize ## Getting the times Not quite right: want new variable containing just number in `time`: `parse_number`. (Top 5 rows shown.) \footnotesize ```{r bProfile-8} #| output: false dogs %>% pivot_longer(starts_with("lh"), names_to = "timex", values_to = "lh") %>% mutate(time = parse_number(timex)) ``` ```{r bProfile-8a} #| echo: false dogs %>% pivot_longer(starts_with("lh"), names_to = "timex", values_to = "lh") %>% mutate(time = parse_number(timex)) %>% slice(1:5) ``` \normalsize ## What I did differently - I realized that `pivot_longer` was going to produce something like `lh1`, which I needed to do something further with, so this time I gave it a temporary name `timex` (which we actually *do* use later). - This enabled me to use the name `time` for the actual numeric time. - This works now, so next save into a new data frame `dogs.long`. ## Saving ```{r bProfile-9 } dogs %>% pivot_longer(starts_with("lh"), names_to = "timex", values_to = "lh") %>% mutate(time = parse_number(timex)) -> dogs.long ``` ## Comments This says: - Take data frame dogs, and then: - Combine the columns `lh0` through `lh5` into one column called `lh`, with the column that each `lh` value originally came from labelled by `timex`, and then: - Pull out numeric values in `timex`, saving in `time` and then: - save the result in a data frame `dogs.long`. ## Interaction plot ```{r bProfile-10, fig.height=4} ggplot(dogs.long, aes(x = time, y = lh, colour = drug, group = drug)) + stat_summary(fun = mean, geom = "point") + stat_summary(fun = mean, geom = "line") ``` ## Comments - Plot mean `lh` value at each time, joining points on same drug by lines. - drugs same at time 0 - after that, Trimethaphan higher than Morphine. - Effect of drug not consistent over time: significant interaction. ## Take out time zero - Lines on interaction plot would then be parallel, and so interaction should no longer be significant. - Go back to original "wide" `dogs` data frame. ```{r bProfile-11, size="footnotesize", error=TRUE} response <- with(dogs, cbind(lh1, lh3, lh5)) # excl time 0 dogs.1 <- lm(response ~ drug, data = dogs) times <- colnames(response) times.df <- data.frame(times=factor(times)) dogs.2 <- Manova(dogs.1, idata = times.df, idesign = ~times ) ``` ## Results (univariate) \scriptsize ```{r} summary(dogs.2)$sphericity.tests # summary(dogs.2)$pval.adjustments summary(dogs.2)$univariate.tests ``` \normalsize ## Comments - sphericity: no problem (P-value 0.25) - univariate test for interaction no longer significant (P-value 0.082) - look at main effects: - strong significance of time, even after taking out time 0 - actually *not* significant drug effect, despite interaction plot ## Non-significant drug effect reasonable? - Plot *actual data*: `lh` against `days`, labelling observations by drug: "spaghetti plot". - Uses long data frame: - Plot (`time`, `lh`) points coloured by *drug* - connecting measurements for each *dog* by lines. - Hence, `group = dog`, but `colour = drug`: ```{r platanias} ggplot(dogs.long, aes(x = time, y = lh, colour = drug, group = dog)) + geom_point() + geom_line() -> g ``` ## The spaghetti plot ```{r hoverla,fig.height=5} g ``` ## Comments - For each dog over time, gradual decrease in log-histamine from time 1: significant time effect after we took out time 0. - Pattern about same for each dog, regardless of drug, hence non-significant interaction. - Most trimethaphan dogs (blue) have higher log-histamine throughout (time 1 and after), some morphine dogs (red) have lower. - *But* two morphine dogs have log-histamine profiles like trimethaphan dogs. This ambiguity probably why `drug` effect not quite significant. ## Mixed models - Another way to fit repeated measures - Subjects (on whom repeated measures taken) are *random sample of all possible subjects* (random effects) - Times and treatments are *the only ones we care about* (fixed effects) - Use package `lme4` function `lmer` (like `lm` in some ways) - Uses long-format "tidy" data ## Fitting the model (uses `lme4`) ```{r bProfile-13, message=FALSE} # dogs.long including time zero with categorical timex dogs.3 <- lmer(lh ~ drug * timex + (1|dog), data=dogs.long) ``` - note specification of random effect: each dog has "random intercept" that moves log-histamine up or down for that dog over all times ## What can we drop? - using `drop1`: ```{r bProfile-14} drop1(dogs.3, test="Chisq") ``` - Interaction very significant. Including time zero, the pattern of log-histamine over time is different for the two drugs (as we found before). ## Omitting time zero Let's pretend we are working at $\alpha = 0.01$: ```{r} dogs.long %>% filter(timex != "lh0") -> dogs.long.no0 dogs.4 <- lmer(lh ~ drug * timex + (1|dog), data=dogs.long.no0) drop1(dogs.4, test = "Chisq") ``` Interaction is not quite significant at $\alpha = 0.01$. So we could remove it. ## Removing the interaction ```{r} dogs.5 <- update(dogs.4, . ~ . - drug:timex) drop1(dogs.5, test = "Chisq") ``` - Definitely an effect of time, but drug is not quite significant (at $\alpha = 0.01$). - More or less same conclusions as from MANOVA. ## The exercise data - 30 people took part in an exercise study. - Each subject randomly assigned to one of two diets ("low fat" or "non-low fat") and to one of three exercise programs ("at rest", "walking", "running"). - $2\times3 = 6$ experimental treatments, and thus each one replicated $30/6=5$ times. (Two-way ANOVA, so far?) - However, each subject had pulse rate measured at three different times (1, 15 and 30 minutes after starting their exercise), so have repeated measures. ## Reading the data Separated by *tabs*: ```{r bProfile-16 } url <- "http://ritsokiguess.site/datafiles/exercise2.txt" exercise.long <- read_tsv(url) exercise.long %>% slice(1:7) # top 7 rows ``` ## Comments - "Long format", usually what we want. - But for repeated measures analysis, we want *wide* format! - Keep track of which is which: - `Manova` analysis: wider - graphs and `lmer` analysis: longer. - `pivot_wider`. ## Making wide format - `pivot_wider` needs: a column that is going to be split, and the column to make the values out of: ```{r bProfile-18} exercise.long %>% pivot_wider(names_from=time, values_from=pulse) -> exercise.wide exercise.wide %>% sample_n(5) # random 5 rows ``` ## Setting up - Make response variable from `min01, min15, min30`: ```{r bProfile-19 } response <- with(exercise.wide, cbind(min01, min15, min30)) ``` - Predict from `diet`, `exertype`, interaction using `lm`: ```{r bProfile-20 } exercise.1 <- lm(response ~ diet * exertype, data = exercise.wide ) ``` ## ... continued - Run this through `Manova`: ```{r bProfile-21, error=TRUE} times <- colnames(response) times.df <- data.frame(times=factor(times)) exercise.2 <- Manova(exercise.1, idata = times.df, idesign = ~times) ``` ## Sphericity tests ```{r} summary(exercise.2)$sphericity.tests ``` No problem with sphericity; go to univariate tests. ## Univariate tests \footnotesize ```{r} summary(exercise.2)$univariate.tests ``` \normalsize ## Comments - The three-way interaction is significant - the effect of diet on pulse rate over time is different for the different exercise types ## Making some graphs - Three-way interactions are difficult to understand. To make an attempt, look at some graphs. - Plot time trace of pulse rates for each individual, joined by lines, and make *separate* plots for each `diet-exertype` combo. - `facet_grid(diet~exertype)`: do a separate plot for each combination of diet and exercise type, with diets going down the page and exercise types going across. (Graphs are usually landscape, so have the factor `exertype` with more levels going across.) ## ... continued - `ggplot` again. Using *long* data frame: ```{r bProfile-24 } g <- ggplot(exercise.long, aes( x = time, y = pulse, group = id )) + geom_point() + geom_line() + facet_grid(diet ~ exertype) ``` ## The graph(s) ```{r bProfile-25, fig.height=5} g ``` ## Comments on graphs - At rest: no change in pulse rate over time - Walking: not much change in pulse rates over time. - Running: overall increase in pulse rate over time, but increase stronger for `lowfat` group. - No consistent effect of: - diet over all exercise groups. - exercise type over both diet groups. - time over all diet-exercise type combos. ## "Simple effects" of diet for the subjects who ran - Looks as if there is only any substantial time effect for the runners. For them, does diet have an effect? - Pull out only the runners from the wide data: ```{r bProfile-26 } exercise.wide %>% filter(exertype == "running") -> runners.wide ``` ## ... continued - Create response variable and do MANOVA. Some of this looks like before, but I have different data now: ```{r bProfile-27} response <- with(runners.wide, cbind(min01, min15, min30)) runners.1 <- lm(response ~ diet, data = runners.wide) times <- colnames(response) times.df <- data.frame(times=factor(times)) runners.2 <- Manova(runners.1, idata = times.df, idesign = ~times ) ``` ## Sphericity tests ```{r} summary(runners.2)$sphericity.tests ``` - No problem, look at univariate tests. ## Univariate tests \footnotesize ```{r} summary(runners.2)$univariate.tests ``` \normalsize - Interaction still significant - dependence of pulse rate on time still different for the two diets ## How is the effect of diet different over time? - Table of means. Only I need long data for this: ```{r bProfile-29 } runners.wide %>% pivot_longer(starts_with("min"), names_to = "time", values_to = "pulse") %>% group_by(time, diet) %>% summarize( mean = mean(pulse), sd = sd(pulse) ) -> summ ``` - Result of `summarize` is data frame, so can save it (and do more with it if needed). ## Interaction plot - We went to trouble of finding means by group, so making interaction plot is now mainly easy: ```{r bProfile-31, fig.height=4} ggplot(summ, aes(x = time, y = mean, colour = diet, group = diet)) + geom_point() + geom_line() ``` ## Comment on interaction plot - The lines are not parallel, so there is interaction between diet and time for the runners. - The effect of time on pulse rate is different for the two diets, even though all the subjects here were running.