Rows: 15 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): smoke
dbl (1): cotanine
Looking for an association between smoke and vs_median, both categorical.
Suggestion: count up the number of values above and below the overall median in each group, and then test, but how?
Chi-squared test for independence (as it turns out).
Aside: explore chi-squared test, then apply here.
The chi-squared test for independence
Suppose we want to know whether people are in favour of having daylight saving time all year round. We ask 20 males and 20 females whether they each agree with having DST all year round (“yes”) or not (“no”). Some randomly chosen rows of data:
Reject the null hypothesis of no association (P-value 0.008);
therefore there is evidence of a difference in rates of agreement between (all) males and females (or, equivalently, that gender and agreement are associated).
This calculation gives the same answer as you would get by hand. (Omitting correct = FALSE applies the “Yates correction”, a continuity correction that R uses by default for 2×2 tables.)
Chi-squared test for independence is always two-sided.
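To make the by-hand comparison concrete, here is a minimal sketch in base R. The counts in tab are hypothetical: they respect the 20/20 split described above and were chosen so that the P-value comes out close to the 0.008 quoted, but they are not claimed to be the data from the slides. The object names (tab, res, expected, and so on) are likewise illustrative.

```r
# Hypothetical 2x2 table of counts (illustration only, not the slides' data):
# 20 females and 20 males, each answering "no" or "yes".
tab <- matrix(c(11,  9,
                 3, 17),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("female", "male"),
                              agree  = c("no", "yes")))

# Chi-squared test for independence, without the Yates correction
res <- chisq.test(tab, correct = FALSE)

# The same statistic by hand:
# expected count = row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
stat_by_hand <- sum((tab - expected)^2 / expected)
p_by_hand <- pchisq(stat_by_hand, df = 1, lower.tail = FALSE)

# Default call: Yates continuity correction is applied to a 2x2 table
res_yates <- chisq.test(tab)

# Compare the two versions of the statistic and the two P-values
c(chisq_test = unname(res$statistic), by_hand = stat_by_hand)
c(uncorrected = res$p.value, yates = res_yates$p.value)
```

With the correction left on, the statistic shrinks and the P-value grows, which is why the uncorrected call is the one that matches a hand calculation.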
Mood’s median test
Earlier: compare medians of two groups.
Sign test: count number of values above and below something (there, hypothesized median).
Mood’s median test:
Find “grand median” of all the data, regardless of group
Count data values in each group above/below grand median.
Make contingency table of group vs. above/below.
Test for association.
If the group medians are equal, each group should have about half its observations above the grand median and about half below. If not, one group will be mostly above the grand median and the other mostly below.
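The four steps above can be sketched in base R as follows. The data frame d, its columns group and y, and all the values in it are hypothetical and exist only to show the mechanics. No value here equals the grand median, so a simple above/below split is enough.

```r
# Hypothetical data for illustration only: two groups of 10 values each
d <- data.frame(
  group = rep(c("A", "B"), each = 10),
  y = c(1, 2, 3, 4, 5, 6, 7, 8, 15, 16,
        9, 10, 11, 12, 13, 14, 17, 18, 19, 20)
)

# Step 1: grand median of all the data, regardless of group
grand_median <- median(d$y)

# Step 2: label each value as above or below the grand median
# (no value equals the grand median in this example)
d$vs_median <- ifelse(d$y > grand_median, "above", "below")

# Step 3: contingency table of group vs. above/below
tab <- table(d$group, d$vs_median)

# Step 4: test for association (chi-squared test, no Yates correction)
mood <- chisq.test(tab, correct = FALSE)

tab
mood
```

In this made-up example group A sits mostly below the grand median and group B mostly above, which is the pattern the test is designed to detect.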
Mood’s median test for smoke exposure data
Any data values equal to the overall median carry no information for or against the null hypothesis, so remove them first.
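A minimal sketch of that removal step in base R. The column names smoke and cotanine follow the column specification shown earlier; the values, the "yes"/"no" labels, and the object names smoke_data, kept and tab are hypothetical and are not the course data.

```r
# Hypothetical values for illustration only (not the real smoke exposure data)
smoke_data <- data.frame(
  smoke    = c(rep("yes", 5), rep("no", 6)),
  cotanine = c(10, 20, 30, 40, 50, 1, 2, 3, 4, 5, 10)
)

# Overall median is computed from ALL the data, before anything is removed
overall_median <- median(smoke_data$cotanine)

# Drop the values exactly equal to the overall median
kept <- subset(smoke_data, cotanine != overall_median)

# Classify the remaining values and tabulate against group
kept$vs_median <- ifelse(kept$cotanine > overall_median, "above", "below")
tab <- table(kept$smoke, kept$vs_median)

overall_median
nrow(kept)
tab
```

The resulting table is what would then be passed to the test for association.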
Comments on boxplot