Bayesian Statistics with Stan

Packages for this section

Installation instructions for the last three of these are below.

library(tidyverse)
library(cmdstanr)
library(posterior)
library(bayesplot)

Installation 1/2

cmdstanr:

install.packages("cmdstanr", 
                 repos = c("https://stan-dev.r-universe.dev",
                           "https://cloud.r-project.org"))

posterior and bayesplot, from the same place:

install.packages("posterior", 
                 repos = c("https://stan-dev.r-universe.dev",
                           "https://cloud.r-project.org"))
install.packages("bayesplot", 
                 repos = c("https://stan-dev.r-universe.dev",
                           "https://cloud.r-project.org"))

Installation 2/2

Then, to check that you have the C++ stuff needed to compile Stan code:

check_cmdstan_toolchain()

which should produce output like The C++ toolchain required for CmdStan is setup properly!, and then:

install_cmdstan(cores = 6)

If you happen to know how many cores (processors) your computer has, insert the appropriate number. (My new laptop has 8 and my desktop 6.)

All of this is done once. If you have problems, go here (link).

Bayesian and frequentist inference 1/2

The inference philosophy that we have learned so far says that:
- parameters to be estimated are fixed but unknown
- Data random; if we took another sample we’d get different data.
This is called “frequentist” or “repeated-sampling” inference.

Bayesian and frequentist inference 2/2

Bayesian inference says:
- parameters are random, data is given
Ingredients:
- prior distribution: distribution of parameters before seeing data.
- likelihood: model for data if the parameters are known
- posterior distribution: distribution of parameters after seeing data.

Distribution of parameters

Instead of having a point or interval estimate of a parameter, we have an entire distribution
so in Bayesian statistics we can talk about eg.
- probability that a parameter is bigger than some value
- probability that a parameter is close to some value
- probability that one parameter is bigger than another
Name comes from Bayes’ Theorem, which here says

posterior is proportional to likelihood times prior

An example

We have these (integer) observations:

(x <- c(0, 4, 3, 6, 3, 3, 2, 4))

[1] 0 4 3 6 3 3 2 4

Suppose we believe that these come from a Poisson distribution with a mean \(\lambda\) that we want to estimate.
We need a prior distribution for \(\lambda\). I will take an exponential distribution with parameter \(\beta = 0.5\). Stan uses the form of exponential density \(\beta e^{-\beta x}\) which has mean \(1/\beta\).
Normally prior would come from your knowledge of the data-generating process.
The Poisson likelihood can be written down (see over).

Some algebra

We have \(n=8\) observations \(x_i\), so the Poisson likelihood is proportional to

\[ \prod_{i=1}^n e^{-\lambda} \lambda^{x_i} = e^{-n\lambda} \lambda^S, \] where \(S=\sum_{i=1}^n x_i\).

then you write the exponential prior density (as a function of \(\lambda\)):

\[ 0.5 e^{-0.5 \lambda} \]

and then you multiply these together and try to recognize the distributional form:

\[ e^{-n\lambda} \lambda^S (0.5) e^{-0.5 \lambda}\] which is \[ 0.5 \lambda^S e^{-(n + 0.5)\lambda}.\]

If you are lucky, you might recognize this as a gamma distribution, but you might not be so lucky.

Sampling from the posterior distribution

Wouldn’t it be nice if we could just sample from the posterior distribution? Then we would be able to compute it as accurately as we want.
Metropolis and Hastings: devise a Markov chain (C62) whose limiting distribution is the posterior you want, and then sample from that Markov chain (easy), allowing enough time to get close enough to the limiting distribution.
Stan: uses a modern variant that is more efficient (called Hamiltonian Monte Carlo), implemented in R packages cmdstanr.
Write Stan code in a file, compile it and sample from it.

Components of Stan code: the model

model {
  // likelihood
  x ~ poisson(lambda);
}

This is how you say “\(X\) has a Poisson distribution with mean \(\lambda\)”. Note that lines of Stan code have semicolons on the end.

Components of Stan code: the prior distribution

model {
  // prior
  lambda ~ exponential(0.5);
  // likelihood
  x ~ poisson(lambda);
}

Components of Stan code: data and parameters

first in the Stan code:

data {
  array[8] int x;
}

parameters {
  real<lower=0> lambda;
}

Compile and sample from the model 1/2

compile

poisson1 <- cmdstan_model("poisson1.stan")

poisson1

// Estimating Poisson mean

data {
  array[8] int x;
}

parameters {
  real<lower=0> lambda;
}

model {
  // prior
  lambda ~ exponential(0.5);
  // likelihood
  x ~ poisson(lambda);
}

Compile and sample from the model 2/2

set up data

poisson1_data <- list(x = x)
poisson1_data

$x
[1] 0 4 3 6 3 3 2 4

sample (output is (very) long)

poisson1_fit <- poisson1$sample(data = poisson1_data)

Running MCMC with 4 sequential chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 finished in 0.0 seconds.
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 finished in 0.0 seconds.
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

The output

poisson1_fit

 variable mean median   sd  mad   q5  q95 rhat ess_bulk ess_tail
   lp__   2.59   2.86 0.69 0.29 1.24 3.07 1.00     1566     1542
   lambda 3.07   3.02 0.59 0.57 2.17 4.12 1.00     1589     1692

Comments

This summarizes the posterior distribution of \(\lambda\)
the posterior mean is 3.07
with a 90% posterior interval of 2.17 to 4.12.
The probability that \(\lambda\) is between these two values really is 90%.

Making the code more general

The coder in you is probably offended by hard-coding the sample size and the parameters of the prior distribution. More generally:

data {
  int<lower=1> n;
  real<lower=0> beta;
  array[n] int x;
}
...
model {
// prior
lambda ~ exponential(beta);
// likelihood
x ~ poisson(lambda);
}

Set up again and sample:

Compile again:

poisson2 <- cmdstan_model("poisson2.stan")

set up the data again including the new things we need:

poisson2_data <- list(x = x, n = length(x), beta = 0.5)
poisson2_data

$x
[1] 0 4 3 6 3 3 2 4

$n
[1] 8

$beta
[1] 0.5

Sample again

Output should be the same (to within randomness):

poisson2_fit <- poisson2$sample(data = poisson2_data)

poisson2_fit

 variable mean median   sd  mad   q5  q95 rhat ess_bulk ess_tail
   lp__   2.55   2.83 0.77 0.32 1.09 3.07 1.00     1749     1885
   lambda 3.08   3.04 0.61 0.60 2.16 4.13 1.00     1395     1662

Picture of posterior

mcmc_hist(poisson2_fit$draws("lambda"), binwidth = 0.25)

Extracting actual sampled values

A little awkward at first:

str(poisson2_fit$draws())

 'draws_array' num [1:1000, 1:4, 1:2] 3.06 2.95 3.06 2.9 3.07 ...
 - attr(*, "dimnames")=List of 3
  ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
  ..$ chain    : chr [1:4] "1" "2" "3" "4"
  ..$ variable : chr [1:2] "lp__" "lambda"

A 3-dimensional array. A dataframe would be much better.

Sampled values as dataframe

as_draws_df(poisson2_fit$draws()) %>% 
  as_tibble() -> poisson2_draws
poisson2_draws

Posterior predictive distribution

Another use for the actual sampled values is to see what kind of response values we might get in the future. This should look something like our data. For a Poisson distribution, the response values are integers:

poisson2_draws %>% 
  rowwise() %>% 
  mutate(xsim = rpois(1, lambda)) -> d

The simulated posterior distribution (in `xsim`)

d %>% select(lambda, xsim)

Comparison

Our actual data values were these:

[1] 0 4 3 6 3 3 2 4

None of these are very unlikely according to our posterior predictive distribution, so our model is believable.
Or make a plot: a bar chart with the data on it as well (over):

ggplot(d, aes(x = xsim)) + geom_bar() +
  geom_dotplot(data = tibble(x), aes(x = x), binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) -> g

This also shows that the distribution of the data conforms well enough to the posterior predictive distribution (over).

The plot

Analysis of variance, the Bayesian way

Recall the jumping rats data:

my_url <- 
  "http://ritsokiguess.site/datafiles/jumping.txt"
rats0 <- read_delim(my_url, " ")
rats0

Our aims here

Estimate the mean bone density of all rats under each of the experimental conditions
Model: given the group means, each observation normally distributed with common variance \(\sigma^2\)
Three parameters to estimate, plus the common variance.
Obtain posterior distributions for the group means.
Ask whether the posterior distributions of these means are sufficiently different.

Numbering the groups 1/2

Stan doesn’t handle categorical variables (everything is real or int).
Turn the groups into group numbers first.
Take opportunity to put groups in logical order:

rats0 %>% mutate(
  group_fct = fct_inorder(group),
  group_no = as.integer(group_fct)
) -> rats

Numbering the groups 2/2

rats

Plotting the data 1/2

Most obviously, boxplots:

ggplot(rats, aes(x = group_fct, y = density)) + 
  geom_boxplot()

Plotting the data 2/2

Another way: density plot (smoothed out histogram); can distinguish groups by colours:

ggplot(rats, aes(x = density, fill = group_fct)) +
  geom_density(alpha = 0.6)

The procedure

For each observation, find out which (numeric) group it belongs to,
then model it as having a normal distribution with that group’s mean and the common variance.
Stan does for loops.

The model part

Suppose we have n_obs observations:

model {
  // likelihood
  for (i in 1:n_obs) {
    g = group_no[i];
    density[i] ~ normal(mu[g], sigma);
  }
}

The variables here

n_obs is data.
g is a temporary integer variable only used here
i is only used in the loop (integer) and does not need to be declared
density is data, a real vector of length n_obs
mu is a parameter, a real vector of length 3 (3 groups)
sigma is a real parameter

mu and sigma need prior distributions:

for mu, each component independently normal with mean 600 and SD 50 (my guess at how big and variable they will be)
for sigma, chi-squared with 50 df (my guess at typical amount of variability from obs to obs)

Complete the `model` section:

model {
  int g;
  // priors
  mu ~ normal(600, 50);
  sigma ~ chi_square(50);
  // likelihood
  for (i in 1:n_obs) {
    g = group_no[i];
    density[i] ~ normal(mu[g], sigma);
  }
}

Parameters

The elements of mu, one per group, and also sigma, scalar, lower limit zero:

parameters {
  array[n_group] real mu;
  real<lower=0> sigma;
}

Declare sigma to have lower limit zero here, so that the sampling runs smoothly.
declare n_group in data section

Data

Everything else:

data {
  int n_obs;
  int n_group;
  array[n_obs] real density;
  array[n_obs] int<lower=1, upper=n_group> group_no;
}

Compile

Arrange these in order data, parameters, model in file anova.stan, then:

anova <- cmdstan_model("anova.stan")

Set up data and sample

Supply values for everything declared in data:

anova_data <- list(
  n_obs = 30,
  n_group = 3,
  density = rats$density,
  group_no = rats$group_no
)
anova_fit <- anova$sample(data = anova_data)

Running MCMC with 4 sequential chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 finished in 0.0 seconds.
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 finished in 0.0 seconds.
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 finished in 0.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.5 seconds.

Check that the sampling worked properly

anova_fit$cmdstan_diagnose()

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Rank-normalized split effective sample size satisfactory for all parameters.

Rank-normalized split R-hat values satisfactory for all parameters.

Processing complete, no problems detected.

Look at the results

anova_fit

 variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
    lp__  -41.02 -40.64 1.48 1.26 -43.96 -39.32 1.00     1827     2574
    mu[1] 601.00 601.26 8.85 8.56 586.09 615.69 1.00     4682     3241
    mu[2] 611.95 611.95 9.05 8.72 597.20 626.65 1.00     3983     2569
    mu[3] 637.54 637.61 8.79 8.63 623.11 652.17 1.00     4368     2760
    sigma  28.48  28.12 4.23 4.08  22.21  36.20 1.00     3623     2632

Comments

The posterior 90% intervals for control (group 1) and highjump (group 3) do not quite overlap, suggesting that these exercise groups really are different.
Bayesian approach does not normally do tests: look at posterior distributions and decide whether they are different enough to be worth treating as different.

Plotting the posterior distributions for the `mu`

mcmc_hist(anova_fit$draws("mu"), binwidth = 5)

Extract the sampled values

as_draws_df(anova_fit$draws()) %>% as_tibble() -> anova_draws
anova_draws

Estimated probability that \(\mu_3 > \mu_1\)

anova_draws %>% 
  count(`mu[3]`>`mu[1]`) %>% 
  mutate(prob = n/sum(n))

High jumping group almost certainly has larger mean than control group.

Compare lowjump and control the same way

anova_draws %>% 
  count(`mu[2]`>`mu[1]`) %>% 
  mutate(prob = n/sum(n))

Likely that lowjump mean higher than control mean, but not a certainty.

More organizing

for another plot
- make longer
- give group values their proper names back

anova_draws %>% 
  pivot_longer(starts_with("mu"), 
               names_to = "group", 
               values_to = "bone_density") %>% 
  mutate(group = fct_recode(group,
    Control = "mu[1]",
    Lowjump = "mu[2]",
    Highjump = "mu[3]"
  )) -> sims

What we have now:

sims

Density plots of posterior mean distributions

ggplot(sims, aes(x = bone_density, fill = group)) + 
  geom_density(alpha = 0.6)

Posterior predictive distributions

Randomly sample from posterior means and SDs in sims. There are 12000 rows in sims:

sims %>% mutate(sim_data = rnorm(12000, bone_density,
                                 sigma)) -> ppd
ppd

Compare posterior predictive distribution with actual data

Check that the model works: distributions of data similar to what we’d predict
Idea: make plots of posterior predictive distribution, and plot actual data as points on them
Use facets, one for each treatment group:

my_binwidth <- 15
ggplot(ppd, aes(x = sim_data)) +
  geom_histogram(binwidth = my_binwidth) +
  geom_dotplot(
    data = rats, aes(x = density),
    binwidth = my_binwidth
  ) +
  facet_wrap(~group) +
  scale_y_continuous(NULL, breaks = NULL) -> g

See (over) that the data values are mainly in the middle of the predictive distributions.
Even for the control group that had outliers.

The plot

Extensions

if you want a different model other than normal, change distribution in model section
if you want to allow unequal spreads, create sigma[n_group] and in model density[i] ~ normal(mu[g], sigma[g]);
Stan will work just fine after you recompile
very flexible.
Typical modelling strategy: start simple, add complexity as warranted by data.

Bayesian Statistics with Stan

Packages for this section

Installation 1/2

Installation 2/2

Bayesian and frequentist inference 1/2

Bayesian and frequentist inference 2/2

Distribution of parameters

An example

Some algebra

Sampling from the posterior distribution

Components of Stan code: the model

Components of Stan code: the prior distribution

Components of Stan code: data and parameters

Compile and sample from the model 1/2

Compile and sample from the model 2/2

The output

Comments

Making the code more general

Set up again and sample:

Sample again

Picture of posterior

Extracting actual sampled values

Sampled values as dataframe

Posterior predictive distribution

The simulated posterior distribution (in xsim)

Comparison

The plot

Analysis of variance, the Bayesian way

Our aims here

Numbering the groups 1/2

Numbering the groups 2/2

Plotting the data 1/2

Plotting the data 2/2

The procedure

The model part

The variables here

Complete the model section:

Parameters

Data

Compile

Set up data and sample

Check that the sampling worked properly

Look at the results

Comments

Plotting the posterior distributions for the mu

Extract the sampled values

Estimated probability that \(\mu_3 > \mu_1\)

Compare lowjump and control the same way

More organizing

What we have now:

Density plots of posterior mean distributions

Posterior predictive distributions

Compare posterior predictive distribution with actual data

The plot

Extensions

The simulated posterior distribution (in `xsim`)

Complete the `model` section:

Plotting the posterior distributions for the `mu`