---
title: "Repeated measures analysis"
editor: 
  markdown: 
    wrap: 72
---

## Repeated measures 

-   More than one response *measurement* for each subject,
    same thing at different times

-   Generalization of matched pairs ("matched triples", etc.).


-   Expect measurements on same subject to be correlated, so assumptions
    of independence will fail.

-   *Repeated measures*.  *Profile
    analysis* uses `Manova` (set up).

-   Another approach uses *mixed models* (random effects).

-   Variation: each subject does all treatments at
    different times (called *crossover design*).

## Packages

```{r bProfile-1, results="hide", message=FALSE}
library(car)
library(tidyverse)
library(lme4) # for mixed models later
```

## Example: histamine in dogs

-   8 dogs take part in experiment.

-   Dogs randomized to one of 2 different drugs.

-   Response: log of blood concentration of histamine 0, 1, 3 and 5
    minutes after taking drug. (Repeated measures.)

-   Data in `dogs.txt`, column-aligned.

## Read in data

```{r bProfile-2 }
my_url <- "http://ritsokiguess.site/datafiles/dogs.txt"
dogs <- read_table(my_url)
dogs
```

## Setting things up

```{r bProfile-3}
response <- with(dogs, cbind(lh0, lh1, lh3, lh5))
response
```

## Another way to make response

```{r}
dogs %>% select(starts_with("lh")) %>% 
  as.matrix() -> response
response
```


## The repeated measures MANOVA

Get list of response variable names; we call them `times`. Save in data
frame.

```{r bProfile-4, echo=FALSE}
options(width = 70)
```

\small

```{r bProfile-5, error=TRUE}
times <- colnames(response)
times
times.df <- data.frame(times=factor(times))
times.df
```

\normalsize

## Fitting the model

```{r}
dogs.1 <- lm(response ~ drug, data = dogs)
dogs.2 <- Manova(dogs.1,
  idata = times.df,
  idesign = ~times
)
```

## The output (there is a lot)

- normally you just run

```{r}
#| output: false

summary(dogs.2)
```

and pull out what you need to answer the question. 

- But you can grab just individual pieces as shown below:

\small

```{r}
names(summary(dogs.2))
```

\normalsize

## What there is here

-   three sets of tests, for

    -   times; drug; their interaction

-   two *types* of test for each of these:

    -   univariate; multivariate

-   univariate is more powerful *if* it applies; if it doesn't, can make adjustments to it

## Sphericity

-   The thing that decides whether the univariate tests apply is called
    "sphericity".
-   This holds if the outcomes have equal variance (to each other) and
    have the same (positive) correlation across subjects.
-   Tested using Mauchly's test (part of output)
-   If sphericity rejected, there are adjustments to the univariate
    P-values due to Huynh-Feldt and Greenhouse-Geisser. Huynh-Feldt
    better if responses not actually normal (safer).
    
## Sphericity tests

```{r}
summary(dogs.2)$sphericity.tests
```

Sphericity is not rejected; proceed to univariate tests.


## Univariate tests

\scriptsize

```{r}
summary(dogs.2)$univariate.tests
```

- Significant interaction between `drug` and time: the pattern of log-histamine over time is different for the different drugs.

\normalsize

## If sphericity had been rejected

then we would use the H-F adjusted P-values:

```{r}
summary(dogs.2)$pval.adjustments
```

In this case (because sphericity was not rejected), these are very similar to the ones from the univariate tests, and the conclusion (significant interaction) was the same.


## Comments

-   If the interaction had not been significant:
    -   cannot remove interaction with time
    -   so look at univariate (or adjusted for
        sphericity) tests of main effects in model with non-significant interaction

## Next

  - investigate interaction with graph
  - but dataframe has several observations per line ("wide").
  -   Plotting works with data in "long format": one response per line.
  -   The responses are log-histamine at different times, labelled
    `lh`-something. Call them all `lh` and put them in one column, with
    the time they belong to labelled.

## Running `pivot_longer`, try 1


```{r bProfile-7, size="footnotesize"}
dogs %>% pivot_longer(starts_with("lh"), 
                      names_to = "time", values_to = "lh") 
```

\normalsize

## Getting the times

Not quite right: want new variable containing just number in
`time`: `parse_number`. (Top 5 rows shown.)

\footnotesize

```{r bProfile-8}
#| output: false
dogs %>%
  pivot_longer(starts_with("lh"), 
               names_to = "timex", values_to = "lh") %>% 
  mutate(time = parse_number(timex)) 
```

```{r bProfile-8a}
#| echo: false
dogs %>%
  pivot_longer(starts_with("lh"), 
               names_to = "timex", values_to = "lh") %>% 
  mutate(time = parse_number(timex)) %>% 
  slice(1:5)
```

\normalsize

## What I did differently

-   I realized that `pivot_longer` was going to produce something like
    `lh1`, which I needed to do something further with, so this time I
    gave it a temporary name `timex` (which we actually *do* use later).

-   This enabled me to use the name `time` for the actual numeric time.

-   This works now, so next save into a new data frame `dogs.long`.

## Saving 

```{r bProfile-9 }
dogs %>%
  pivot_longer(starts_with("lh"), 
               names_to = "timex", values_to = "lh") %>% 
  mutate(time = parse_number(timex)) -> dogs.long
```

## Comments

This says:

-   Take data frame dogs, and then:

-   Combine the columns `lh0` through `lh5` into one column called `lh`,
    with the column that each `lh` value originally came from labelled
    by `timex`, and then:

-   Pull out numeric values in `timex`, saving in `time` and then:

-   save the result in a data frame `dogs.long`.

## Interaction plot


```{r bProfile-10, fig.height=4}
ggplot(dogs.long, aes(x = time, y = lh, 
                      colour = drug, group = drug)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line")
```


## Comments

-   Plot mean `lh` value at each time, joining points on same drug by
    lines.

-   drugs same at time 0

-   after that, Trimethaphan higher than Morphine.

-   Effect of drug not consistent over time: significant interaction.

## Take out time zero

-   Lines on interaction plot would then be parallel, and so interaction
    should no longer be significant.

-   Go back to original "wide" `dogs` data frame.

```{r bProfile-11, size="footnotesize", error=TRUE}
response <- with(dogs, cbind(lh1, lh3, lh5)) # excl time 0
dogs.1 <- lm(response ~ drug, data = dogs)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
dogs.2 <- Manova(dogs.1,
  idata = times.df,
  idesign = ~times
)
```

## Results (univariate)

\scriptsize

```{r}
summary(dogs.2)$sphericity.tests
# summary(dogs.2)$pval.adjustments
summary(dogs.2)$univariate.tests
```

\normalsize

## Comments

-   sphericity: no problem (P-value 0.25)
-   univariate test for interaction no longer significant (P-value
    0.082)
-   look at main effects:
    -   strong significance of time, even after taking out time 0
    -   actually *not* significant drug effect, despite interaction plot

## Non-significant drug effect reasonable?

-   Plot *actual data*: `lh` against `days`, labelling observations by
    drug: "spaghetti plot".

-   Uses long data frame:

    -   Plot (`time`, `lh`) points coloured by *drug*
    -   connecting measurements for each *dog* by lines.
    -   Hence, `group = dog`, but `colour = drug`:

```{r platanias}
ggplot(dogs.long, aes(x = time, y = lh,
  colour = drug, group = dog)) +
  geom_point() + geom_line() -> g
```

## The spaghetti plot

```{r hoverla,fig.height=5}
g
```

## Comments

-   For each dog over time, gradual decrease in log-histamine from time 1:
    significant time effect after we took out time 0.

-   Pattern about same for each dog, regardless of
    drug, hence non-significant interaction.

-   Most trimethaphan dogs (blue) have higher log-histamine
    throughout (time 1 and after), some morphine dogs (red) have
    lower.

-   *But* two morphine dogs have log-histamine profiles like
    trimethaphan dogs. This ambiguity probably why `drug` effect
    not quite significant.

## Mixed models

-   Another way to fit repeated measures
-   Subjects (on whom repeated measures taken) are *random sample of all
    possible subjects* (random effects)
-   Times and treatments are *the only ones we care about* (fixed
    effects)
-   Use package `lme4` function `lmer` (like `lm` in some ways)
-   Uses long-format "tidy" data

## Fitting the model (uses `lme4`)

```{r bProfile-13, message=FALSE}
# dogs.long including time zero with categorical timex
dogs.3 <- lmer(lh ~ drug * timex + (1|dog), data=dogs.long)
```

-   note specification of random effect: each dog has "random intercept"
    that moves log-histamine up or down for that dog over all times

## What can we drop?

-   using `drop1`:

```{r bProfile-14}
drop1(dogs.3, test="Chisq")
```

-   Interaction very significant. Including time zero, the pattern of log-histamine over time is different for the two drugs (as we found before).

## Omitting time zero

Let's pretend we are working at $\alpha = 0.01$:

```{r}
dogs.long %>% filter(timex != "lh0") -> dogs.long.no0
dogs.4 <- lmer(lh ~ drug * timex + (1|dog), data=dogs.long.no0)
drop1(dogs.4, test = "Chisq")
```

Interaction is not quite significant at $\alpha = 0.01$. So we could remove it.

## Removing the interaction

```{r}
dogs.5 <- update(dogs.4, . ~ . - drug:timex)
drop1(dogs.5, test = "Chisq")
```

- Definitely an effect of time, but drug is not quite significant (at $\alpha = 0.01$).
- More or less same conclusions as from MANOVA.

## The exercise data

-   30 people took part in an exercise study.

-   Each subject randomly assigned to one of two diets ("low fat" or
    "non-low fat") and to one of three exercise programs ("at rest",
    "walking", "running").

-   $2\times3 = 6$ experimental treatments, and thus each one
    replicated $30/6=5$ times. (Two-way ANOVA, so far?)

-   However, each subject had pulse rate measured at three
    different times (1, 15 and 30 minutes after starting their
    exercise), so have repeated measures.

## Reading the data

Separated by *tabs*:

```{r bProfile-16 }
url <- "http://ritsokiguess.site/datafiles/exercise2.txt"
exercise.long <- read_tsv(url)
exercise.long %>% slice(1:7) # top 7 rows
```

## Comments

-   "Long format", usually what we want.

-   But for repeated measures analysis, we want *wide* format!

- Keep track of which is which:
  - `Manova` analysis: wider
  - graphs and `lmer` analysis: longer.

-   `pivot_wider`.

## Making wide format

-   `pivot_wider` needs: a column that is going to be split, and the
    column to make the values out of:


```{r bProfile-18}
exercise.long %>% pivot_wider(names_from=time, 
                              values_from=pulse) -> exercise.wide
exercise.wide %>% sample_n(5) # random 5 rows
```


## Setting up 

-   Make response variable from `min01, min15, min30`:

```{r bProfile-19 }
response <- with(exercise.wide, cbind(min01, min15, min30))
```

-   Predict from `diet`, `exertype`, interaction using `lm`:

```{r bProfile-20 }
exercise.1 <- lm(response ~ diet * exertype,
  data = exercise.wide
)
```

## ... continued

-   Run this through `Manova`:

```{r bProfile-21, error=TRUE}
times <- colnames(response)
times.df <- data.frame(times=factor(times))
exercise.2 <- Manova(exercise.1, 
                     idata = times.df, 
                     idesign = ~times)
```

## Sphericity tests

```{r}
summary(exercise.2)$sphericity.tests
```

No problem with sphericity; go to univariate tests.

## Univariate tests

\footnotesize

```{r}
summary(exercise.2)$univariate.tests
```

\normalsize

## Comments

-   The three-way interaction is significant
-   the effect of diet on pulse rate over time is different for the
        different exercise types

## Making some graphs

-   Three-way interactions are difficult to understand. To make an
    attempt, look at some graphs.

-   Plot time trace of pulse rates for each individual, joined by lines,
    and make *separate* plots for each `diet-exertype` combo.

-   `facet_grid(diet~exertype)`: do a separate plot for each combination
    of diet and exercise type, with diets going down the page and
    exercise types going across. (Graphs are usually landscape, so have
    the factor `exertype` with more levels going across.)

## ... continued

-   `ggplot` again. Using *long* data frame:

```{r bProfile-24 }
g <- ggplot(exercise.long, aes(
  x = time, y = pulse,
  group = id
)) + geom_point() + geom_line() +
  facet_grid(diet ~ exertype)
```


## The graph(s)

```{r bProfile-25, fig.height=5}
g
```

## Comments on graphs

-   At rest: no change in pulse rate over time

-   Walking: not much change in pulse rates over time.

-   Running: overall increase in pulse rate over
    time, but increase stronger for `lowfat` group.

-   No consistent effect of: 
    - diet over all exercise groups.
    -   exercise type over both diet groups.
    -   time over all diet-exercise type combos.

## "Simple effects" of diet for the subjects who ran

-   Looks as if there is only any substantial time effect for the
    runners. For them, does diet have an effect?

-   Pull out only the runners from the wide data:

```{r bProfile-26 }
exercise.wide %>%
  filter(exertype == "running") -> runners.wide
```

## ... continued

-   Create response variable and do MANOVA. Some of this looks like
    before, but I have different data now:

```{r bProfile-27}
response <- with(runners.wide, cbind(min01, min15, min30))
runners.1 <- lm(response ~ diet, data = runners.wide)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
runners.2 <- Manova(runners.1,
  idata = times.df,
  idesign = ~times
)
```

## Sphericity tests

```{r}
summary(runners.2)$sphericity.tests
```

-   No problem, look at univariate tests.

## Univariate tests

\footnotesize

```{r}
summary(runners.2)$univariate.tests
```

\normalsize

-   Interaction still significant
    -   dependence of pulse rate on time still different for the two
        diets

## How is the effect of diet different over time?

-   Table of means. Only I need long data for this:

```{r bProfile-29 }
runners.wide %>%
  pivot_longer(starts_with("min"), 
               names_to = "time", values_to = "pulse") %>%
  group_by(time, diet) %>%
  summarize(
    mean = mean(pulse),
    sd = sd(pulse)
  ) -> summ
```

-   Result of `summarize` is data frame, so can save it (and do more
    with it if needed).


## Interaction plot 

-   We went to trouble of finding means by group, so making interaction
    plot is now mainly easy:

```{r bProfile-31, fig.height=4}
ggplot(summ, aes(x = time, y = mean, colour = diet,
                 group = diet)) + geom_point() + geom_line()
```

## Comment on interaction plot

-   The lines are not parallel, so there is interaction between diet and
    time for the runners.
-   The effect of time on pulse rate is different for the two diets,
    even though all the subjects here were running.