The Not so Tiny t-test

---


# Packages needed and a Note about Icons

The packages needed for this lesson are listed on the **Packages** slide below. Remember to first install any you don't already have.

<br>
<br>
You may come across the following icons. The table below lists what each means.

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;background-color: transparent !important;"> Icon </th>
   <th style="text-align:left;background-color: transparent !important;"> Description </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;width: 10em; background-color: #transparent !important;"> <svg aria-hidden="true" role="img" viewbox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:
#4682b4;overflow:visible;position:relative;"><path d="M52.51 440.6l171.5-142.9V214.3L52.51 71.41C31.88 54.28 0 68.66 0 96.03v319.9C0 443.3 31.88 457.7 52.51 440.6zM308.5 440.6l192-159.1c15.25-12.87 15.25-36.37 0-49.24l-192-159.1c-20.63-17.12-52.51-2.749-52.51 24.62v319.9C256 443.3 287.9 457.7 308.5 440.6z"></path></svg> </td>
   <td style="text-align:left;width: 40em; background-color: #transparent !important;"> Indicates that an example continues on the following slide. </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; background-color: #transparent !important;"> <svg aria-hidden="true" role="img" viewbox="0 0 384 512" style="height:1em;width:0.75em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#ff6347;overflow:visible;position:relative;"><path d="M384 128v255.1c0 35.35-28.65 64-64 64H64c-35.35 0-64-28.65-64-64V128c0-35.35 28.65-64 64-64H320C355.3 64 384 92.65 384 128z"></path></svg> </td>
   <td style="text-align:left;width: 40em; background-color: #transparent !important;"> Indicates that a section using common syntax has ended. </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; background-color: #transparent !important;"> <svg aria-hidden="true" role="img" viewbox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#5cb85c;overflow:visible;position:relative;"><path d="M172.5 131.1C228.1 75.51 320.5 75.51 376.1 131.1C426.1 181.1 433.5 260.8 392.4 318.3L391.3 319.9C381 334.2 361 337.6 346.7 327.3C332.3 317 328.9 297 339.2 282.7L340.3 281.1C363.2 249 359.6 205.1 331.7 177.2C300.3 145.8 249.2 145.8 217.7 177.2L105.5 289.5C73.99 320.1 73.99 372 105.5 403.5C133.3 431.4 177.3 435 209.3 412.1L210.9 410.1C225.3 400.7 245.3 404 255.5 418.4C265.8 432.8 262.5 452.8 248.1 463.1L246.5 464.2C188.1 505.3 110.2 498.7 60.21 448.8C3.741 392.3 3.741 300.7 60.21 244.3L172.5 131.1zM467.5 380C411 436.5 319.5 436.5 263 380C213 330 206.5 251.2 247.6 193.7L248.7 192.1C258.1 177.8 278.1 174.4 293.3 184.7C307.7 194.1 311.1 214.1 300.8 229.3L299.7 230.9C276.8 262.1 280.4 306.9 308.3 334.8C339.7 366.2 390.8 366.2 422.3 334.8L534.5 222.5C566 191 566 139.1 534.5 108.5C506.7 80.63 462.7 76.99 430.7 99.9L429.1 101C414.7 111.3 394.7 107.1 384.5 93.58C374.2 79.2 377.5 59.21 391.9 48.94L393.5 47.82C451 6.731 529.8 13.25 579.8 63.24C636.3 119.7 636.3 211.3 579.8 267.7L467.5 380z"></path></svg> </td>
   <td style="text-align:left;width: 40em; background-color: #transparent !important;"> Indicates that there is an active hyperlink on the slide. </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; background-color: #transparent !important;"> <svg aria-hidden="true" role="img" viewbox="0 0 384 512" style="height:1em;width:0.75em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#faffbd;overflow:visible;position:relative;"><path d="M384 48V512l-192-112L0 512V48C0 21.5 21.5 0 48 0h288C362.5 0 384 21.5 384 48z"></path></svg> </td>
   <td style="text-align:left;width: 40em; background-color: #transparent !important;"> Indicates that a section covering a concept has ended. </td>
  </tr>
</tbody>
</table>

---

# Comparing the Means Between Groups of Things

The `$t$`-test is:

- One of the most common tests in statistics

- Used to test whether the means of two groups differ, or whether a single mean differs from a specified value

---

# Ideas

> **One-sample *t*-tests**: Compare a sample mean with a known (hypothesized) value when the population variance is unknown

> **Two-sample *t*-tests**: Compare the means of two groups, assuming both samples are random, independent, and normally distributed; the classic (Student) version also assumes equal but unknown variances, while Welch's version does not

> **Paired *t*-tests**: Compare the means of two sets of paired observations (e.g., the same subjects measured twice) by testing whether the mean of the within-pair differences is zero

---

## Packages

Please load up the following packages

```r
library(tidyverse)
library(patchwork)
```

---

# The Base R `t.test` command

```r
t.test(x, y = NULL, 
       alternative = c("two.sided", "less", "greater"), 
       mu = 0, 
       paired = FALSE, 
       var.equal = FALSE, 
       conf.level = 0.95)
```

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;"> Option </th>
   <th style="text-align:left;"> Function </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;width: 10em; font-family: monospace;color: #b2dfdb !important;"> x </td>
   <td style="text-align:left;width: 40em; "> a numeric vector from a data set </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; font-family: monospace;color: #b2dfdb !important;"> y </td>
   <td style="text-align:left;width: 40em; "> an optional numeric vector from a data set </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; font-family: monospace;color: #b2dfdb !important;"> mu </td>
   <td style="text-align:left;width: 40em; "> a number indicating the hypothesized value of the mean (or of the difference in means for a two-sample test) </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; font-family: monospace;color: #b2dfdb !important;"> alternative </td>
   <td style="text-align:left;width: 40em; "> the alternative hypothesis: <code>"two.sided"</code> (default), <code>"less"</code>, or <code>"greater"</code> </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; font-family: monospace;color: #b2dfdb !important;"> paired </td>
   <td style="text-align:left;width: 40em; "> a logical indicating whether to perform a paired <i>t</i>-test </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; font-family: monospace;color: #b2dfdb !important;"> var.equal </td>
   <td style="text-align:left;width: 40em; "> a logical indicating whether to assume equal variances when performing a two-sample <i>t</i>-test </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 10em; font-family: monospace;color: #b2dfdb !important;"> conf.level </td>
   <td style="text-align:left;width: 40em; "> the confidence level of the reported confidence interval </td>
  </tr>
</tbody>
</table>
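
---

`t.test` returns an object of class `htest`, and the printed report is built from its components (`statistic`, `parameter`, `p.value`, `conf.int`, `estimate`, as documented in `?t.test`). The minimal sketch below uses a small hypothetical vector `x` (made-up values, not course data) to show how `mu` and `conf.level` feed into those components.

```r
# Hypothetical data (made-up values for illustration only)
x <- c(12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.7, 9.5)

res_95 <- t.test(x, mu = 10)                     # defaults: two-sided, 95% CI
res_99 <- t.test(x, mu = 10, conf.level = 0.99)  # same test, 99% CI

res_95$statistic  # t statistic
res_95$parameter  # degrees of freedom (n - 1 = 7)
res_95$p.value    # two-sided p-value for H0: mu = 10
res_95$conf.int   # 95% confidence interval for the mean
res_95$estimate   # sample mean

# A higher confidence level widens the interval; t and p are unchanged
diff(res_99$conf.int) > diff(res_95$conf.int)
```

Only the interval depends on `conf.level`; the test itself is driven by `mu` and `alternative`.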

---

### Notes

- The `var.equal` argument defaults to <span style='color: #64b5f6;'>FALSE</span>, so equal variances are not assumed and R runs Welch's test, which uses the Welch (Satterthwaite) approximation to the degrees of freedom
  
  - To assume equal variances (the classic Student's *t*-test with a pooled variance estimate), set it to <span style='color: #64b5f6;'>TRUE</span>
  
--

- The `conf.level` argument defaults to 0.95 (a 95% confidence interval), corresponding to `$\alpha = 0.05$` for a two-sided test

- The reported confidence interval is for
  - `$\mu$` in a one-sample *t*-test
  - `$\mu_1-\mu_2$` in a two-sample *t*-test
  - the mean of the paired differences in a paired *t*-test
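
---

To see what the Welch approximation does, the sketch below computes the Welch-Satterthwaite degrees of freedom by hand for two hypothetical groups `g1` and `g2` (made-up values) and compares them with what `t.test` reports, alongside the pooled Student test:

`$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$`

```r
# Two hypothetical groups with clearly different spreads (made-up values)
g1 <- c(14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 14.4)
g2 <- c(17.9, 12.3, 20.4, 15.2, 22.1, 11.8, 19.6, 16.5)

welch   <- t.test(g1, g2)                    # default: var.equal = FALSE
student <- t.test(g1, g2, var.equal = TRUE)  # pooled-variance Student test

# Welch-Satterthwaite degrees of freedom, computed by hand
v1 <- var(g1) / length(g1)   # squared standard error of the mean, group 1
v2 <- var(g2) / length(g2)   # squared standard error of the mean, group 2
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

df_welch           # should match welch$parameter
welch$parameter    # typically not a whole number
student$parameter  # n1 + n2 - 2 = 7 + 8 - 2 = 13
```

The Welch degrees of freedom always fall between `min(n1, n2) - 1` and `n1 + n2 - 2`, shrinking toward the lower end when one group is much more variable than the other.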
      
---

### Be Aware!

> The `wilcox.test` function has a very similar interface and arguments

<br>
<br>
> However, it runs a nonparametric (rank-based) Wilcoxon test, which is used when we ***do not want to assume that the data follow a normal distribution***

<br>
<br>
> Here, we're assuming normality

<br>
<br>
> So please ignore it for now!

---

# Assumptions

<br>
<br>
<br>
<br> 
.pull-left[**Independent observations**] .pull-right[*Observations are independent of one another*]

<br>
<br>
<br>
<br> 
.pull-left[**Normality**] .pull-right[*Observations are from a normally distributed population*]

<br>
<br>
<br>
<br> 
.pull-left[**Homogeneity**] .pull-right[*When two populations are sampled, they have equal variances (aka **homogeneity of variances**); Student's test requires this, while Welch's test, R's default, does not*]
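
---

Independence comes from the study design and cannot be checked from the data alone, but normality and equal variances are often screened with `shapiro.test()` and `var.test()` from base R's stats package. A minimal sketch on simulated, hypothetical groups `group_a` and `group_b`; keep in mind that these tests have little power in small samples and flag trivial departures in large ones, so histograms or Q-Q plots are a useful companion.

```r
# Hypothetical data simulated for illustration only
set.seed(613)
group_a <- rnorm(30, mean = 50, sd = 5)
group_b <- rnorm(30, mean = 53, sd = 5)

# Normality: Shapiro-Wilk test within each group (H0: the data are normal)
sw_a <- shapiro.test(group_a)
sw_b <- shapiro.test(group_b)
c(a = sw_a$p.value, b = sw_b$p.value)

# Homogeneity of variances: F test (H0: ratio of the two variances = 1)
f_test <- var.test(group_a, group_b)
f_test$estimate  # ratio of the sample variances
f_test$p.value
```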

---

## One- or Two-sample *t*-tests

If `y` is

- excluded, `t.test` runs a one-sample *t*-test

- included, `t.test` runs a two-sample *t*-test

By default, `t.test` runs a two-sided *t*-test. To run a one-sided *t*-test, set the `alternative` option to `"greater"` or `"less"`.
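
---

A minimal sketch of how `t.test` switches between the one- and two-sample tests, using a hypothetical long-format data frame `dat` (made-up values). It also shows that the formula interface used later in these slides (`value ~ group`) splits the response by the two levels of the grouping variable, with the first level playing the role of `x`.

```r
# Hypothetical long-format data: a numeric column and a two-level grouping column
dat <- data.frame(
  value = c(5.1, 4.8, 6.2, 5.5, 5.9, 6.8, 7.1, 6.4, 7.5, 6.9),
  group = rep(c("a", "b"), each = 5)
)
x <- dat$value[dat$group == "a"]
y <- dat$value[dat$group == "b"]

one_sample  <- t.test(x, mu = 5)                  # y omitted: one-sample test of H0: mu = 5
two_sample  <- t.test(x, y)                       # y supplied: Welch two-sample test
via_formula <- t.test(value ~ group, data = dat)  # same two-sample test via a formula

one_sample$method
two_sample$method
c(two_sample$statistic, via_formula$statistic)   # identical t statistics
```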

---

#### Example

`t.test(x, alternative = "greater", mu = 47)` performs a one-sample *t*-test on the data contained in `x` where

`$$H_0: \mu = 47$$`
  
`$$H_1: \mu > 47$$`
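
---

Running that call on a small hypothetical vector (made-up values with a sample mean of 50) shows how the one-sided result relates to the default two-sided test: when the *t* statistic falls in the hypothesized direction, the one-sided *p*-value is half the two-sided one, and the confidence interval becomes one-sided (its upper bound is `Inf`).

```r
# Hypothetical sample (made-up values); mean(x) is 50
x <- c(48, 52, 50, 49, 51, 53, 47, 50)

greater   <- t.test(x, alternative = "greater", mu = 47)  # H1: mu > 47
two_sided <- t.test(x, mu = 47)                           # H1: mu != 47

greater$statistic                    # positive, since mean(x) > 47
greater$p.value / two_sided$p.value  # 0.5 when t > 0
greater$conf.int                     # lower bound only; upper bound is Inf
```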
      
---

## Example

```r
midwest %>%
  head()
```

```
## # A tibble: 6 × 28
##     PID county    state  area poptotal popdens…¹ popwh…² popbl…³ popam…⁴ popas…⁵
##   <int> <chr>     <chr> <dbl>    <int>     <dbl>   <int>   <int>   <int>   <int>
## 1   561 ADAMS     IL    0.052    66090     1271.   63917    1702      98     249
## 2   562 ALEXANDER IL    0.014    10626      759     7054    3496      19      48
## 3   563 BOND      IL    0.022    14991      681.   14477     429      35      16
## 4   564 BOONE     IL    0.017    30806     1812.   29344     127      46     150
## 5   565 BROWN     IL    0.018     5836      324.    5264     547      14       5
## 6   566 BUREAU    IL    0.05     35688      714.   35157      50      65     195
## # … with 18 more variables: popother <int>, percwhite <dbl>, percblack <dbl>,
## #   percamerindan <dbl>, percasian <dbl>, percother <dbl>, popadults <int>,
## #   perchsd <dbl>, percollege <dbl>, percprof <dbl>, poppovertyknown <int>,
## #   percpovertyknown <dbl>, percbelowpoverty <dbl>, percchildbelowpovert <dbl>,
## #   percadultpoverty <dbl>, percelderlypoverty <dbl>, inmetro <int>,
## #   category <chr>, and abbreviated variable names ¹popdensity, ²popwhite,
## #   ³popblack, ⁴popamerindian, ⁵popasian
```

---

### Purpose

We want to compare the average percentage of college-educated adults in Ohio counties versus Michigan counties

```r
*midwest
```
 
.panel2-sw1-auto[

```
## # A tibble: 437 × 28
##      PID county    state  area poptotal popden…¹ popwh…² popbl…³ popam…⁴ popas…⁵
##    <int> <chr>     <chr> <dbl>    <int>    <dbl>   <int>   <int>   <int>   <int>
##  1   561 ADAMS     IL    0.052    66090    1271.   63917    1702      98     249
##  2   562 ALEXANDER IL    0.014    10626     759     7054    3496      19      48
##  3   563 BOND      IL    0.022    14991     681.   14477     429      35      16
##  4   564 BOONE     IL    0.017    30806    1812.   29344     127      46     150
##  5   565 BROWN     IL    0.018     5836     324.    5264     547      14       5
##  6   566 BUREAU    IL    0.05     35688     714.   35157      50      65     195
##  7   567 CALHOUN   IL    0.017     5322     313.    5298       1       8      15
##  8   568 CARROLL   IL    0.027    16805     622.   16519     111      30      61
##  9   569 CASS      IL    0.024    13437     560.   13384      16       8      23
## 10   570 CHAMPAIGN IL    0.058   173025    2983.  146506   16559     331    8033
## # … with 427 more rows, 18 more variables: popother <int>, percwhite <dbl>,
## #   percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
## #   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
## #   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
## #   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
## #   percelderlypoverty <dbl>, inmetro <int>, category <chr>, and abbreviated
## #   variable names ¹popdensity, ²popwhite, ³popblack, ⁴popamerindian, …
```
]

---
count: false

```r
midwest %>%
* filter(state == "OH" | state == "MI")
```
 
.panel2-sw1-auto[

```
## # A tibble: 171 × 28
##      PID county  state  area poptotal popdensity popwh…¹ popbl…² popam…³ popas…⁴
##    <int> <chr>   <chr> <dbl>    <int>      <dbl>   <int>   <int>   <int>   <int>
##  1  1197 ALCONA  MI    0.041    10145       247.   10026      27      56      26
##  2  1198 ALGER   MI    0.051     8972       176.    8422     213     304      24
##  3  1199 ALLEGAN MI    0.049    90509      1847.   86760    1448     543     411
##  4  1200 ALPENA  MI    0.034    30605       900.   30372      35      93      85
##  5  1201 ANTRIM  MI    0.031    18185       587.   17895      23     211      24
##  6  1202 ARENAC  MI    0.021    14931       711    14695      10     139      38
##  7  1203 BARAGA  MI    0.054     7954       147.    6971      49     918      10
##  8  1204 BARRY   MI    0.034    50057      1472.   49429     104     188     144
##  9  1205 BAY     MI    0.026   111723      4297.  107747    1242     726     428
## 10  1206 BENZIE  MI    0.02     12200       610    11863      30     237      35
## # … with 161 more rows, 18 more variables: popother <int>, percwhite <dbl>,
## #   percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
## #   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
## #   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
## #   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
## #   percelderlypoverty <dbl>, inmetro <int>, category <chr>, and abbreviated
## #   variable names ¹popwhite, ²popblack, ³popamerindian, ⁴popasian
```
]

---
count: false

```r
midwest %>%
  filter(state == "OH" | state == "MI") %>%
* select(state, percollege)
```
 
.panel2-sw1-auto[

```
## # A tibble: 171 × 2
##    state percollege
##    <chr>      <dbl>
##  1 MI          14.1
##  2 MI          16.3
##  3 MI          18.1
##  4 MI          18.9
##  5 MI          19.0
##  6 MI          11.8
##  7 MI          14.6
##  8 MI          17.3
##  9 MI          18.2
## 10 MI          21.4
## # … with 161 more rows
```
]

---

```r
ohio_mi <- 
  midwest %>%
  filter(state == "OH" | state == "MI") %>%
  select(state, percollege)
```

---

### Descriptives

.pull-left[
<img src="Slides-Week-10R_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

```r
ohio_mi %>% 
  filter(state == "OH") %>%
  summary()
```

```
##     state             percollege    
##  Length:88          Min.   : 7.913  
##  Class :character   1st Qu.:13.089  
##  Mode  :character   Median :15.462  
##                     Mean   :16.890  
##                     3rd Qu.:18.995  
##                     Max.   :32.205
```
]

.pull-right[
<img src="Slides-Week-10R_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

```r
ohio_mi %>% 
  filter(state == "MI") %>%
  summary()
```

```
##     state             percollege   
##  Length:83          Min.   :11.31  
##  Class :character   1st Qu.:14.61  
##  Mode  :character   Median :17.43  
##                     Mean   :19.42  
##                     3rd Qu.:21.31  
##                     Max.   :48.08
```
]

<br>
<br>
<center>
Ohio counties appear to have a slightly lower percentage of college-educated adults than Michigan counties (mean of 16.89% vs. 19.42%), but let's test whether that difference is statistically significant
</center>

---

### Boxplot

```r
*ggplot(ohio_mi, aes(x = state,
*                   y = percollege,
*                   fill = state))
```
 
.panel2-sw2-auto[
![](Slides-Week-10R_files/figure-html/sw2_auto_01_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = state,
                    y = percollege,
                    fill = state)) +
*       geom_boxplot(alpha = 0.7,
*                    outlier.size = 2.5)
```
 
.panel2-sw2-auto[
![](Slides-Week-10R_files/figure-html/sw2_auto_02_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = state,
                    y = percollege,
                    fill = state)) +
        geom_boxplot(alpha = 0.7,
                     outlier.size = 2.5) +
*       scale_fill_manual(values = c("#00274C",
*                                    "#BB0000"))
```
 
.panel2-sw2-auto[
![](Slides-Week-10R_files/figure-html/sw2_auto_03_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = state,
                    y = percollege,
                    fill = state)) +
        geom_boxplot(alpha = 0.7,
                     outlier.size = 2.5) +
        scale_fill_manual(values = c("#00274C",
                                     "#BB0000")) +
*       theme_minimal()
```
 
.panel2-sw2-auto[
![](Slides-Week-10R_files/figure-html/sw2_auto_04_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = state,
                    y = percollege,
                    fill = state)) +
        geom_boxplot(alpha = 0.7,
                     outlier.size = 2.5) +
        scale_fill_manual(values = c("#00274C",
                                     "#BB0000")) +
        theme_minimal() +
*       theme(panel.grid.major.x = element_blank(),
*             panel.grid.minor.x = element_blank())
```
 
.panel2-sw2-auto[
![](Slides-Week-10R_files/figure-html/sw2_auto_05_output-1.png)
]

---

### Shifting Skewness

```r
*ggplot(ohio_mi, aes(x = percollege))
```
 
.panel2-sw3-auto[
![](Slides-Week-10R_files/figure-html/sw3_auto_01_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
*       geom_histogram(aes(fill = after_stat(count)),
*                      bins = 20)
```
 
.panel2-sw3-auto[
![](Slides-Week-10R_files/figure-html/sw3_auto_02_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
*       scale_fill_viridis_c("Frequency")
```
 
.panel2-sw3-auto[
![](Slides-Week-10R_files/figure-html/sw3_auto_03_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
        scale_fill_viridis_c("Frequency") +
*       facet_wrap(. ~ state)
```
 
.panel2-sw3-auto[
![](Slides-Week-10R_files/figure-html/sw3_auto_04_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
        scale_fill_viridis_c("Frequency") +
        facet_wrap(. ~ state) +
*       theme_minimal()
```
 
.panel2-sw3-auto[
![](Slides-Week-10R_files/figure-html/sw3_auto_05_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
        scale_fill_viridis_c("Frequency") +
        facet_wrap(. ~ state) +
        theme_minimal() +
*       scale_x_log10()
```
 
.panel2-sw3-auto[
![](Slides-Week-10R_files/figure-html/sw3_auto_06_output-1.png)
]

---

### More Shifting Skewness

```r
*ggplot(ohio_mi, aes(x = percollege))
```
 
.panel2-sw4-auto[
![](Slides-Week-10R_files/figure-html/sw4_auto_01_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
*       geom_histogram(aes(fill = after_stat(count)),
*                      bins = 20)
```
 
.panel2-sw4-auto[
![](Slides-Week-10R_files/figure-html/sw4_auto_02_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
*       scale_fill_viridis_c("Frequency")
```
 
.panel2-sw4-auto[
![](Slides-Week-10R_files/figure-html/sw4_auto_03_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
        scale_fill_viridis_c("Frequency") +
*       facet_wrap(. ~ state, ncol = 1)
```
 
.panel2-sw4-auto[
![](Slides-Week-10R_files/figure-html/sw4_auto_04_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
        scale_fill_viridis_c("Frequency") +
        facet_wrap(. ~ state, ncol = 1) +
*       theme_minimal()
```
 
.panel2-sw4-auto[
![](Slides-Week-10R_files/figure-html/sw4_auto_05_output-1.png)
]

---
count: false

```r
ggplot(ohio_mi, aes(x = percollege)) +
        geom_histogram(aes(fill = after_stat(count)),
                       bins = 20) +
        scale_fill_viridis_c("Frequency") +
        facet_wrap(. ~ state, ncol = 1) +
        theme_minimal() +
*       scale_x_log10()
```
 
.panel2-sw4-auto[
![](Slides-Week-10R_files/figure-html/sw4_auto_06_output-1.png)
]

---

```r
regularplot <- 
  ggplot(ohio_mi, aes(x = percollege)) +
  geom_histogram(aes(fill = after_stat(count)),
                 bins = 20) +
  scale_fill_viridis_c("Frequency") +
  facet_wrap(~ state, ncol = 1) +
  theme_minimal() +
  ggtitle("Regular")
```

```r
scaledplot <- 
  ggplot(ohio_mi, aes(x = percollege)) +
  geom_histogram(aes(fill = after_stat(count)),
                 bins = 20) +
  scale_fill_viridis_c("Frequency") +
  facet_wrap(~ state, ncol = 1) + 
  theme_minimal() +
  ggtitle("Scaled") +
  scale_x_log10() # this line was added
```

---

```r
regularplot + scaledplot
```

---

### Testing as is

```r
t.test(percollege ~ state, data = ohio_mi)
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  percollege by state
## t = 2.5953, df = 161.27, p-value = 0.01032
## alternative hypothesis: true difference in means between group MI and group OH is not equal to 0
## 95 percent confidence interval:
##  0.6051571 4.4568579
## sample estimates:
## mean in group MI mean in group OH 
##         19.42146         16.89045
```

<br>
> Results show a *p*-value of about .010, which is below .05 (but not below .01), so at `$\alpha = .05$` **there is a statistically significant difference between the two means**

<br>
> This supports the alternative hypothesis: on average, Michigan counties have about 2.5 percentage points more college-educated adults than Ohio counties (95% CI: 0.61 to 4.46)
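
---

The pieces of this output fit together through `$t = (\bar{x}_{MI} - \bar{x}_{OH}) / SE$`. As a check, the sketch below copies the printed estimates into R, recovers the standard error implied by *t*, and rebuilds the 95% confidence interval and *p*-value with `qt()` and `pt()`; small discrepancies come only from the rounding of the printed values.

```r
# Values copied from the Welch output above
mean_mi <- 19.42146
mean_oh <- 16.89045
t_stat  <- 2.5953
df_w    <- 161.27

diff_means <- mean_mi - mean_oh      # estimated difference (MI - OH)
se_diff    <- diff_means / t_stat    # standard error implied by t = diff / SE
t_crit     <- qt(0.975, df = df_w)   # critical value for a two-sided 95% CI

ci <- diff_means + c(-1, 1) * t_crit * se_diff
round(ci, 3)                         # about 0.605 and 4.457, as printed

p_two <- 2 * pt(-abs(t_stat), df = df_w)
round(p_two, 4)                      # about 0.0103, as printed
```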

---

### Testing using a `log` function

```r
t.test(log(percollege) ~ state, data = ohio_mi)
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  log(percollege) by state
## t = 2.9556, df = 168.98, p-value = 0.003567
## alternative hypothesis: true difference in means between group MI and group OH is not equal to 0
## 95 percent confidence interval:
##  0.04724892 0.23732151
## sample estimates:
## mean in group MI mean in group OH 
##         2.915873         2.773587
```

<br>
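> Results show a *p*-value < .01, so **there is a statistically significant difference between the two means** of the log-transformed percentages

<br>
> On the original scale, the geometric mean percentage of college-educated adults is roughly 15% higher in Michigan counties than in Ohio counties (`$e^{2.916 - 2.774} \approx 1.15$`)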
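
---

Differences on the log scale are easier to report after back-transforming: `exp()` of a difference in mean (natural) logs is a ratio of geometric means. A short sketch using the values printed above:

```r
# Values copied from the log-scale Welch output above
mean_log_mi <- 2.915873
mean_log_oh <- 2.773587
ci_log      <- c(0.04724892, 0.23732151)  # 95% CI for the difference in mean logs

ratio    <- exp(mean_log_mi - mean_log_oh)  # geometric mean ratio, MI / OH
ratio_ci <- exp(ci_log)                     # back-transformed 95% CI for the ratio

round(ratio, 3)     # about 1.153, i.e. roughly 15% higher in Michigan
round(ratio_ci, 3)  # about 1.048 to 1.268
```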

---

## Paired-samples *t*-test

Use the same `t.test` command as in the previous sections, but set `paired =` <span style='color: #64b5f6;'>TRUE</span> and supply both `x` and `y`

```r
t.test(x, y, 
       alternative = c("two.sided", "less", "greater"), 
       mu = 0, 
       paired = TRUE, 
       var.equal = FALSE, 
       conf.level = 0.95)
```
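
---

A paired *t*-test is a one-sample *t*-test on the within-pair differences, testing whether their mean is zero. A minimal sketch with hypothetical before/after scores for the same six subjects (made-up values) confirms that both calls give the same result:

```r
# Hypothetical before/after scores for the same 6 subjects (made-up values)
before <- c(12, 15, 11, 14, 13, 16)
after  <- c(14, 16, 13, 15, 15, 19)

res_paired <- t.test(after, before, paired = TRUE)  # paired test: needs both x and y
res_diff   <- t.test(after - before, mu = 0)        # one-sample test on the differences

c(res_paired$statistic, res_diff$statistic)  # same t statistic
c(res_paired$parameter, res_diff$parameter)  # same df: n - 1 = 5
c(res_paired$p.value, res_diff$p.value)      # same p-value
```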

---

## Example

```r
sleep %>%
  head()
```

```
##   extra group ID
## 1   0.7     1  1
## 2  -1.6     1  2
## 3  -0.2     1  3
## 4  -1.2     1  4
## 5  -0.1     1  5
## 6   3.4     1  6
```

---

## Purpose

We are assessing whether two drugs differ in their effect on sleep (increase in hours of sleep compared to control) when each is given to the same 10 patients

```r
*sleep
```
 
.panel2-sw5-auto[

```
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10
```
]

---
count: false

```r
sleep %>%
* select(-ID)
```
 
.panel2-sw5-auto[

```
##    extra group
## 1    0.7     1
## 2   -1.6     1
## 3   -0.2     1
## 4   -1.2     1
## 5   -0.1     1
## 6    3.4     1
## 7    3.7     1
## 8    0.8     1
## 9    0.0     1
## 10   2.0     1
## 11   1.9     2
## 12   0.8     2
## 13   1.1     2
## 14   0.1     2
## 15  -0.1     2
## 16   4.4     2
## 17   5.5     2
## 18   1.6     2
## 19   4.6     2
## 20   3.4     2
```
]

---

## Descriptives

```r
sleep %>% 
  summary()
```

```
##      extra        group        ID   
##  Min.   :-1.600   1:10   1      :2  
##  1st Qu.:-0.025   2:10   2      :2  
##  Median : 0.950          3      :2  
##  Mean   : 1.540          4      :2  
##  3rd Qu.: 3.400          5      :2  
##  Max.   : 5.500          6      :2  
##                          (Other):8
```
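
---

`summary()` pools both groups (its mean of 1.54 averages all 20 rows), but the comparison is about per-group values. A minimal base-R sketch with `tapply()`; dplyr's `group_by()` and `summarise()` would work just as well:

```r
# sleep ships with base R (datasets package); group is a factor with levels "1" and "2"
group_means <- tapply(sleep$extra, sleep$group, mean)
group_sds   <- tapply(sleep$extra, sleep$group, sd)

group_means                             # 0.75 for group 1, 2.33 for group 2
group_sds
group_means["1"] - group_means["2"]     # -1.58, the mean difference tested later
```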

---

## Boxplot

```r
sleep %>%
  ggplot(aes(group, extra, fill = group)) +
  geom_boxplot(alpha = 0.8) +
  scale_fill_manual(
    values = c("#428bca", "#d9534f")
    ) +
  theme_minimal() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank()
    )
```

.pull-right[
<img src="Slides-Week-10R_files/figure-html/unnamed-chunk-23-1.png" width="68%" style="display: block; margin: auto;" />
]

<br>
<br>
Assessing whether two drugs differ in their effect on sleep (increase in hours of sleep compared to control) when each is given to the same 10 patients

---

### Testing

We want to see whether the mean of the `extra` variable differs between group 1 and group 2, with each patient's two measurements treated as a pair

```r
t.test(extra ~ group, data = sleep, paired = TRUE)
```

```
## 
## 	Paired t-test
## 
## data:  extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean difference 
##           -1.58
```

<br>
> Results show a *p*-value < .01 so **there is a statistical difference between the two means**

<br>
> This supports the alternative hypothesis: on average, patients gained about 1.58 more hours of extra sleep with the group 2 drug than with the group 1 drug (95% CI: 0.70 to 2.46 hours)
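
---

One caution about the formula call above: recent R releases (4.4.0 and later, to our knowledge) stop with an error when `paired = TRUE` is passed to the formula method of `t.test`, since the formula relies on row order to match patients. A version-robust sketch reshapes `sleep` to one row per patient with base R's `reshape()`, so the pairing by `ID` is explicit, and then passes the two columns as `x` and `y`:

```r
# One row per patient: columns extra.1 (group 1) and extra.2 (group 2), matched by ID
sleep_wide <- reshape(sleep, direction = "wide",
                      idvar = "ID", timevar = "group")

# Paired test of group 1 minus group 2, as in the formula call above
res <- t.test(sleep_wide$extra.1, sleep_wide$extra.2, paired = TRUE)
res
```

With the pairing made explicit, this should reproduce the output above: *t* = -4.0621, df = 9, and a mean difference of -1.58.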

---

## That's it!