<<<<<<< HEAD <<<<<<< HEAD ======= ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 <<<<<<< HEAD >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 Alternative tests ======= ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 code{white-space: pre-wrap;} span.smallcaps{font-variant: small-caps;} div.columns{display: flex; gap: min(4vw, 1.5em);} div.column{flex: auto; overflow-x: auto;} div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} ul.task-list{list-style: none;} ul.task-list li input[type="checkbox"] { width: 0.8em; margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } /* CSS for syntax highlighting */ html { -webkit-text-size-adjust: 100%; } pre > code.sourceCode { white-space: pre; position: relative; } pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } pre > code.sourceCode > span:empty { height: 1.2em; } .sourceCode { overflow: visible; } code.sourceCode > span { color: inherit; text-decoration: inherit; } div.sourceCode { margin: 1em 0; } pre.sourceCode { margin: 0; } @media screen { div.sourceCode { overflow: auto; } } @media print { pre > code.sourceCode { white-space: pre-wrap; } pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } pre.numberSource code > span { position: relative; left: -4em; counter-increment: source-line; } pre.numberSource code > span > a:first-child::before { content: counter(source-line); position: relative; left: -1em; text-align: right; vertical-align: baseline; border: none; display: inline-block; -webkit-touch-callout: none; -webkit-user-select: none; -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none; padding: 0 4px; width: 4em; color: #aaaaaa; } pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; } div.sourceCode { color: #003b4f; background-color: #f1f3f5; } @media screen { pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } } code span { color: #003b4f; } /* Normal */ code span.al { color: #ad0000; } /* Alert */ code span.an { color: #5e5e5e; } /* Annotation */ code span.at { color: #657422; } /* Attribute */ code span.bn { color: #ad0000; } /* BaseN */ code span.bu { } /* BuiltIn */ code span.cf { color: #003b4f; font-weight: bold; } /* ControlFlow */ code span.ch { color: #20794d; } /* Char */ code span.cn { color: #8f5902; } /* Constant */ code span.co { color: #5e5e5e; } /* Comment */ code span.cv { color: #5e5e5e; font-style: italic; } /* CommentVar */ code span.do { color: #5e5e5e; font-style: italic; } /* Documentation */ code span.dt { color: #ad0000; } /* DataType */ code span.dv { color: #ad0000; } /* DecVal */ code span.er { color: #ad0000; } /* Error */ code span.ex { } /* Extension */ code span.fl { color: #ad0000; } /* Float */ code span.fu { color: #4758ab; } /* Function */ code span.im { color: #00769e; } /* Import */ code span.in { color: #5e5e5e; } /* Information */ code span.kw { color: #003b4f; font-weight: bold; } /* Keyword */ code span.op { color: #5e5e5e; } /* Operator */ code span.ot { color: #003b4f; } /* Other */ code span.pp { color: #ad0000; } /* Preprocessor */ code span.sc { color: #5e5e5e; } /* SpecialChar */ code span.ss { color: #20794d; } /* SpecialString */ code span.st { color: #20794d; } /* String */ code span.va { color: #111111; } /* Variable */ code span.vs { color: #20794d; } /* VerbatimString */ code span.wa { color: #5e5e5e; font-style: italic; } /* Warning */ <<<<<<< HEAD >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Alternative tests

When there isn’t sufficient normality

  • When your samples are not normal enough, cannot use \(t\) procedures
  • Sometimes transforming the data (eg taking logs of all the values) will help
  • Or, use test with no assumptions about normality:
    • for one sample, use sign test for median
    • for two samples, use Mood’s median test
    • for matched pairs, use sign test on differences.

One-sample: the IRS data

  • The IRS (“Internal Revenue Service”) is the US authority that deals with taxes (like Revenue Canada).
  • One of their forms is supposed to take no more than 160 minutes to complete. A citizen’s organization claims that it takes people longer than that on average.
  • Sample of 30 people; time to complete form recorded.
  • Read in data, and do \(t\)-test of \(H_0 : \mu = 160\) vs. \(H_a : \mu > 160\).
  • Only one column, so pretend it is delimited by something.

Packages

<<<<<<< HEAD <<<<<<< HEAD
library(tidyverse)
library(smmr) 
=======
library(tidyverse)
library(smmr) 
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
library(tidyverse)
library(smmr) 
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
  • installation instructions for smmr later

Read in data

<<<<<<< HEAD <<<<<<< HEAD
my_url <- "http://ritsokiguess.site/datafiles/irs.txt"
irs <- read_csv(my_url)
irs
=======
my_url <- "http://ritsokiguess.site/datafiles/irs.txt"
irs <- read_csv(my_url)
irs
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
my_url <- "http://ritsokiguess.site/datafiles/irs.txt"
irs <- read_csv(my_url)
irs
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Test whether mean is 160 or greater

<<<<<<< HEAD <<<<<<< HEAD
with(irs, t.test(Time, mu = 160, 
                 alternative = "greater"))
=======
with(irs, t.test(Time, mu = 160, 
                 alternative = "greater"))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
with(irs, t.test(Time, mu = 160, 
                 alternative = "greater"))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

    One Sample t-test

data:  Time
t = 1.8244, df = 29, p-value = 0.03921
alternative hypothesis: true mean is greater than 160
95 percent confidence interval:
 162.8305      Inf
sample estimates:
mean of x 
 201.2333 

Reject null; mean (for all people to complete form) greater than 160.

But, look at a graph

<<<<<<< HEAD <<<<<<< HEAD
ggplot(irs, aes(x = Time)) + geom_histogram(bins = 6)
=======
ggplot(irs, aes(x = Time)) + geom_histogram(bins = 6)

>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
ggplot(irs, aes(x = Time)) + geom_histogram(bins = 6)

>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Comments

The sign test

Getting a P-value for sign test 1/3

Getting P-value for sign test 2/3

<<<<<<< HEAD <<<<<<< HEAD
irs %>% count(Time > 160)
=======
irs %>% count(Time > 160)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
irs %>% count(Time > 160)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Getting P-value for sign test 3/3

<<<<<<< HEAD <<<<<<< HEAD
dbinom(17, 30, 0.5)
=======
dbinom(17, 30, 0.5)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
dbinom(17, 30, 0.5)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 0.1115351
<<<<<<< HEAD <<<<<<< HEAD
tibble(x=17:30) %>% 
  mutate(prob=dbinom(x, 30, 0.5)) %>% 
  summarize(total=sum(prob))
=======
tibble(x=17:30) %>% 
  mutate(prob=dbinom(x, 30, 0.5)) %>% 
  summarize(total=sum(prob))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
tibble(x=17:30) %>% 
  mutate(prob=dbinom(x, 30, 0.5)) %>% 
  summarize(total=sum(prob))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

… or

use cumulative distribution

<<<<<<< HEAD <<<<<<< HEAD
pbinom(17, 30, 0.5) # prob of <= 17
=======
pbinom(17, 30, 0.5) # prob of <= 17
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
pbinom(17, 30, 0.5) # prob of <= 17
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 0.8192027

and hence (note first input):

<<<<<<< HEAD <<<<<<< HEAD
pbinom(16, 30, 0.5, lower.tail = FALSE)
=======
pbinom(16, 30, 0.5, lower.tail = FALSE)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
pbinom(16, 30, 0.5, lower.tail = FALSE)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 0.2923324

This last is \(P(X \ge 17) = P(X > 16)\).

Using my package smmr

<<<<<<< HEAD <<<<<<< HEAD
install.packages("smmr", repos = "nxskok.r-universe.dev")
=======
install.packages("smmr", repos = "nxskok.r-universe.dev")
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
install.packages("smmr", repos = "nxskok.r-universe.dev")
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
<<<<<<< HEAD <<<<<<< HEAD
library(smmr)
=======
library(smmr)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
library(smmr)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

smmr for sign test

<<<<<<< HEAD <<<<<<< HEAD
sign_test(irs, Time, 160)
=======
sign_test(irs, Time, 160)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
sign_test(irs, Time, 160)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
$above_below
below above 
   13    17 

$p_values
  alternative   p_value
1       lower 0.8192027
2       upper 0.2923324
3   two-sided 0.5846647

Comments (1/4)

Comments (2/4)

Test P-value
\(t\) 0.0392
Sign 0.2923

Comments (3/4)

<<<<<<< HEAD <<<<<<< HEAD
irs %>% summarize(mean_time = mean(Time), 
                  median_time = median(Time))
=======
irs %>% summarize(mean_time = mean(Time), 
                  median_time = median(Time))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
irs %>% summarize(mean_time = mean(Time), 
                  median_time = median(Time))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Comments (4/4)

CI for median 1/2

CI for median 2/2

Test decision Confidence interval
Reject \(H_0\) at level \(\alpha\) \(\iff\) \(C\%\) CI does not contain \(H_0\) value
Do not reject \(H_0\) at level \(\alpha\) \(\iff\) \(C\%\) CI contains \(H_0\) value

The value of this

For our data

<<<<<<< HEAD <<<<<<< HEAD
pval_sign(160, irs, Time)
=======
pval_sign(160, irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
pval_sign(160, irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 0.5846647
<<<<<<< HEAD <<<<<<< HEAD
pval_sign(200, irs, Time)
[1] 0.3615946
pval_sign(300, irs, Time)
=======
pval_sign(200, irs, Time)
[1] 0.3615946
pval_sign(300, irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
pval_sign(200, irs, Time)
[1] 0.3615946
pval_sign(300, irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 0.001430906

Doing a whole bunch

<<<<<<< HEAD <<<<<<< HEAD
(d <- tibble(null_median=seq(100,300,20)))
=======
(d <- tibble(null_median=seq(100,300,20)))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
(d <- tibble(null_median=seq(100,300,20)))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

… and then

“for each null median, run the function pval_sign for that null median and get the P-value”:

<<<<<<< HEAD <<<<<<< HEAD
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time))
=======
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Results

Make it easier for ourselves

<<<<<<< HEAD <<<<<<< HEAD
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time)) %>% 
  mutate(in_out = ifelse(p_value > 0.05, "inside", "outside"))
=======
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time)) %>% 
  mutate(in_out = ifelse(p_value > 0.05, "inside", "outside"))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
d %>% rowwise() %>% 
  mutate(p_value = pval_sign(null_median, irs, Time)) %>% 
  mutate(in_out = ifelse(p_value > 0.05, "inside", "outside"))
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Confidence interval for median?

A more efficient way: bisection

<<<<<<< HEAD <<<<<<< HEAD
lo <- 200 
hi <- 220
=======
lo <- 200 
hi <- 220
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
lo <- 200 
hi <- 220
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
<<<<<<< HEAD <<<<<<< HEAD
try <- (lo + hi) / 2
try
[1] 210
pval_sign(try,irs,Time)
=======
try <- (lo + hi) / 2
try
[1] 210
pval_sign(try,irs,Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
try <- (lo + hi) / 2
try
[1] 210
pval_sign(try,irs,Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 0.09873715

… bisection continued

<<<<<<< HEAD <<<<<<< HEAD
lo <- try
try <- (lo + hi) / 2
try
[1] 215
pval_sign(try, irs, Time)
=======
lo <- try
try <- (lo + hi) / 2
try
[1] 215
pval_sign(try, irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
lo <- try
try <- (lo + hi) / 2
try
[1] 215
pval_sign(try, irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 0.06142835

Bisection automatically

<<<<<<< HEAD <<<<<<< HEAD
lo = 200
hi = 220
while (hi - lo > 1) { # replace 1 by desired accuracy
  try = (hi + lo) / 2
  ptry = pval_sign(try, irs, Time)
  print(c(try, ptry))
  if (ptry <= 0.05)
    hi = try
  else
    lo = try
}
======= ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
lo = 200
hi = 220
while (hi - lo > 1) { # replace 1 by desired accuracy
  try = (hi + lo) / 2
  ptry = pval_sign(try, irs, Time)
  print(c(try, ptry))
  if (ptry <= 0.05)
    hi = try
  else
    lo = try
}
<<<<<<< HEAD >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

The output from this loop

[1] 210.00000000   0.09873715
[1] 215.00000000   0.06142835
[1] 217.50000000   0.04277395
[1] 216.25000000   0.04277395
[1] 215.62500000   0.04277395

Using smmr

<<<<<<< HEAD <<<<<<< HEAD
ci_median(irs, Time)
=======
ci_median(irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
ci_median(irs, Time)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 119.0065 214.9955
<<<<<<< HEAD <<<<<<< HEAD
ci_median(irs, Time, conf.level=0.90)
=======
ci_median(irs, Time, conf.level=0.90)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
ci_median(irs, Time, conf.level=0.90)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 123.0031 208.9960

Bootstrap

<<<<<<< HEAD <<<<<<< HEAD
tibble(sim = 1:10000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(irs$Time, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  ggplot(aes(sample = my_mean)) + 
    stat_qq() + stat_qq_line() -> g
======= ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
tibble(sim = 1:10000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(irs$Time, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  ggplot(aes(sample = my_mean)) + 
    stat_qq() + stat_qq_line() -> g
<<<<<<< HEAD >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

The sampling distribution

<<<<<<< HEAD <<<<<<< HEAD
g
=======
g

>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
g

>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

Comments

Two samples: Mood’s median test

The chi-squared test for independence

Suppose we want to know whether people are in favour of having daylight savings time all year round. We ask 20 males and 20 females whether they each agree with having DST all year round (“yes”) or not (“no”).

Some of the data:

<<<<<<< HEAD <<<<<<< HEAD
my_url <- "http://ritsokiguess.site/datafiles/dst.txt"
dst <- read_delim(my_url," ")
dst %>% slice_sample(n = 10)
=======
my_url <- "http://ritsokiguess.site/datafiles/dst.txt"
dst <- read_delim(my_url," ")
dst %>% slice_sample(n = 10)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
my_url <- "http://ritsokiguess.site/datafiles/dst.txt"
dst <- read_delim(my_url," ")
dst %>% slice_sample(n = 10)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

… continued

Count up individuals in each category combination, and arrange in contingency table:

<<<<<<< HEAD <<<<<<< HEAD
tab <- with(dst, table(gender, agree))
tab
=======
tab <- with(dst, table(gender, agree))
tab
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
tab <- with(dst, table(gender, agree))
tab
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
        agree
gender   no yes
  female 11   9
  male    3  17

… continued

…And finally

<<<<<<< HEAD <<<<<<< HEAD
chisq.test(tab, correct=FALSE)
=======
chisq.test(tab, correct=FALSE)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
chisq.test(tab, correct=FALSE)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

    Pearson's Chi-squared test

data:  tab
X-squared = 7.033, df = 1, p-value = 0.008002

Mood’s median test

Why it works

Mood’s median test for reading data

<<<<<<< HEAD <<<<<<< HEAD
kids %>% summarize(med=median(score)) %>% pull(med) -> m
m
=======
kids %>% summarize(med=median(score)) %>% pull(med) -> m
m
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
kids %>% summarize(med=median(score)) %>% pull(med) -> m
m
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
[1] 47
<<<<<<< HEAD <<<<<<< HEAD
tab <- with(kids, table(group, score > m))
tab
=======
tab <- with(kids, table(group, score > m))
tab
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
tab <- with(kids, table(group, score > m))
tab
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
     
group FALSE TRUE
    c    15    8
    t     7   14

The test

<<<<<<< HEAD <<<<<<< HEAD
chisq.test(tab, correct=FALSE)
=======
chisq.test(tab, correct=FALSE)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
chisq.test(tab, correct=FALSE)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

    Pearson's Chi-squared test

data:  tab
X-squared = 4.4638, df = 1, p-value = 0.03462

Or by smmr

<<<<<<< HEAD <<<<<<< HEAD
median_test(kids,score,group)
=======
median_test(kids,score,group)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
median_test(kids,score,group)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
$grand_median
[1] 47

$table
     above
group above below
    c     8    15
    t    14     7

$test
       what      value
1 statistic 4.46376812
2        df 1.00000000
3   P-value 0.03462105

Comments 1/2

Comments 2/2

Matched pairs: the pain relief data

Values aligned in columns:

<<<<<<< HEAD <<<<<<< HEAD
my_url <- 
  "http://ritsokiguess.site/datafiles/analgesic.txt"
pain <- read_table(my_url)
pain %>% mutate(diff = druga - drugb) -> pain
glimpse(pain)
======= ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
my_url <- 
  "http://ritsokiguess.site/datafiles/analgesic.txt"
pain <- read_table(my_url)
pain %>% mutate(diff = druga - drugb) -> pain
glimpse(pain)
<<<<<<< HEAD >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
Rows: 12
Columns: 4
$ subject <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
$ druga   <dbl> 2.0, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2.0, 6.8, 8.5
$ drugb   <dbl> 3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6.0, 3.8, 4.0, 9.1, 20.9
$ diff    <dbl> -1.5, -2.1, -0.3, 0.2, -2.6, 0.1, -1.8, 0.6, -1.5, -2.0, -2.3,…

Assessing normality

The normal quantile plot (of differences)

<<<<<<< HEAD <<<<<<< HEAD
ggplot(pain, aes(sample = diff)) + stat_qq() + stat_qq_line()

What to do instead?

… continued

<<<<<<< HEAD <<<<<<< HEAD
sign_test(pain, diff, 0)
=======
sign_test(pain, diff, 0)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 =======
sign_test(pain, diff, 0)
>>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
$above_below
below above 
    9     3 

$p_values
  alternative    p_value
1       lower 0.07299805
2       upper 0.98071289
3   two-sided 0.14599609

Did we really need to worry about that outlier?

Bootstrap sampling distribution of sample mean differences:

<<<<<<< HEAD <<<<<<< HEAD
tibble(sim = 1:10000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(pain$diff, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  ggplot(aes(sample = my_mean)) + stat_qq() + stat_qq_line() -> g
======= ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729
tibble(sim = 1:10000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(sample(pain$diff, replace = TRUE))) %>% 
  mutate(my_mean = mean(my_sample)) %>% 
  ggplot(aes(sample = my_mean)) + stat_qq() + stat_qq_line() -> g
<<<<<<< HEAD >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729

The normal quantile plot

<<<<<<< HEAD <<<<<<< HEAD
g

Yes we did need to worry; this is clearly skewed left and not normal.

<<<<<<< HEAD <<<<<<< HEAD ======= ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 <<<<<<< HEAD >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 <<<<<<< HEAD <<<<<<< HEAD <<<<<<< HEAD <<<<<<< HEAD ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729 ======= >>>>>>> 33d33e2e1c2e32a86e951a45e0354efbb512c729