Estimations

class: center, middle, inverse, title-slide

.title[
# Estimations
]
.subtitle[
## EDP 613
]
.author[
### Week 8
]

---

<div>
<style type="text/css">.xaringan-extra-logo {
width: 110px;
height: 128px;
z-index: 0;
background-image: url(/Users/skynet/Documents/WVU/Teaching/GitHub.nosync/edp613/static/img/course_hex_alpha.png);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
bottom:1em;left:1em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.title-slide):not(.inverse):not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('a')
          logo.classList = 'xaringan-extra-logo'
          logo.href = 'https://edp613.asocialdatascientist.com'
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

# <span style='color:#bff4ee;'>A Note About The Slides</span>

Currently the equations may not show up properly in Firefox. Other browsers such as Chrome and Safari do appear to render them correctly.

---

# <span style='color:#bff4ee;'>A Note About Probability</span>

We're going to introduce some concepts from Chapter 8 here.

---

# <span style='color:#bff4ee;'>From</span> To

<br>
<center>
<b style="color:#bff4ee;">Descriptive Statistics</b><br><br><i style="color:#bff4ee;">mathematical techniques for organizing and summarizing a set of numerical data</i>
</center>
<br>

<br>
<center>
<b style="color:#f4eebf;">Inferential Statistics</b><br><br><i style="color:#f4eebf;">generalizing from a sample to a population</i>
</center>

---

# Terms

- **Statistic** - Mathematical expression that describes some aspect of a set of scores for a sample

- **Parameter** - Describes some aspect of a set of scores for a population

---

# First a Brief Intro to Hypothesis Testing

>- Formally - Testing an assumption about a population parameter

>- In Better Terms - Checking whether a testable assumption about some situation in the world holds up

---

# The Null Hypothesis

>- Represented as `$H_0$`

>- is basically what you expect to happen before you run an experiment

>- *You have to know what the Null is!*

---

# The Alternative Hypothesis

>- Represented as `$H_1$` (or `$H_A$`)

>- is basically what else could happen if what you expect doesn't occur

>- *You don't have to know this!*

---

# Tests of Statistical Significance

>- *Formally* - Done to determine whether `$H_0$` can be rejected in favor of `$H_1$`

>- *Better Explanation* - Test to figure out whether you can reasonably say that your initial assumption doesn't hold

>- *Results* - If the outcomes of a study don't go against what you expected to happen, then you aren't finding anything new or surprising
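A minimal sketch of one such test, assuming the simplest case: a two-sided one-sample z-test where the population standard deviation is known. The function name `z_test` and the numbers in the usage line are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test of H0: population mean == mu0 (sigma assumed known)."""
    se = sigma / sqrt(n)                        # standard error of the mean
    z = (sample_mean - mu0) / se                # distance from mu0 in standard errors
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision


# Hypothetical numbers: H0 says the mean is 100, sigma = 15, n = 100, observed mean 103
z, p, decision = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(f"z = {z:.2f}, p = {p:.4f} -> {decision}")
```

Note that the only two possible decisions are "reject" and "fail to reject" `$H_0$`; the test never directly proves `$H_1$`.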

---

# Term

**(Statistical) estimation** is the use of a sample statistic to estimate the value of an unknown population parameter.

---

# Idea of Positive and Negative Outcomes

- The Null hypothesis `$H_0$` typically assumes nothing is going to happen

- If `$H_0$` turns out to be right, then it's called a ***negative*** outcome because nothing changed.

- If `$H_1$` turns out to be right, then it's called a ***positive*** outcome because something changed, contrary to what you expected.

>- Experiment: Over the span of one year, a group of people with ADHD gets an experimental pill that may help them focus better than their current medication

>>- `$H_0$`: Group stays the same (expected)

>>- `$H_A$`: Group is more focused (what we want to happen)

>- Results: After an assessment

>>- if the Group doesn't show greater focus, then we have a ***negative*** outcome because that's what was expected to happen

>>- if the Group shows greater focus, then we have a ***positive*** outcome because that's NOT what was expected to happen

---

# Notes about `$H_0$` and `$H_A$`
<center>
`$H_A$` is typically not the only alternative explanation
</center>
<br>

- What if the Group was found to be more focused?

>- As a rule of thumb, don't say that `$H_A$` is correct unless you absolutely know there are only two possible outcomes (aka *binary outcomes*)

>- Instead write that "we reject `$H_0$`" because you don't know if that's the ONLY alternative hypothesis. 
>>- It could also be that in other experiments, groups are found to be less focused!

<br>
- What if nothing happened to the Group?

>- The data are consistent with what you expected, but that alone doesn't prove `$H_0$` is correct

>- So the standard wording is "we fail to reject `$H_0$`" (some texts say "retain" or "accept" `$H_0$`)

---

# Formal Table of Statistical Error Types

.center2[

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;background-color: #212121 !important;"> Decision </th>
   <th style="text-align:left;background-color: #212121 !important;"> Null is True </th>
   <th style="text-align:left;background-color: #212121 !important;"> Null is False </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;width: 10em; "> Reject Null </td>
   <td style="text-align:left;width: 10em; "> <b style="color:#f4bfc5;">Type I Error</b><br>(aka <b><i style="color:#f4bfc5;">False Positive</i></b>) </td>
   <td style="text-align:left;width: 10em; "> <span style="color:#bfe0f4;">Correct Outcome</span><br>(aka <i style="color:#bfe0f4;">True Positive</i>) </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; background-color: #212121 !important;"> Fail to Reject Null </td>
   <td style="text-align:left;width: 10em; background-color: #212121 !important;"> <span style="color:#bfe0f4;">Correct Outcome</span><br>(aka <i style="color:#bfe0f4;">True Negative</i>) </td>
   <td style="text-align:left;width: 10em; background-color: #212121 !important;"> <b style="color:#f4bfc5;">Type II Error</b><br>(aka <b><i style="color:#f4bfc5;">False Negative</i></b>) </td>
  </tr>
</tbody>
</table>
]
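The table is really a lookup on two yes/no questions: what you decided, and what is actually true. A small sketch of that lookup (the function name `classify` is hypothetical); in a real study you never get to observe `null_is_true`, which is exactly why these errors can happen.

```python
def classify(rejected_null: bool, null_is_true: bool) -> str:
    """Return the cell of the error table for a decision and the (normally unknown) truth."""
    if rejected_null:
        return "Type I error (false positive)" if null_is_true else "Correct (true positive)"
    return "Correct (true negative)" if null_is_true else "Type II error (false negative)"


# Walk through all four cells of the table
for rejected in (True, False):
    for null_true in (True, False):
        print(f"reject H0={rejected}, H0 true={null_true}: {classify(rejected, null_true)}")
```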

---

# Nutshell Table of Statistical Error Types

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;background-color: #212121 !important;"> Decision </th>
   <th style="text-align:center;background-color: #212121 !important;"> Your first thought was right </th>
   <th style="text-align:center;background-color: #212121 !important;"> Your first thought was wrong </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;width: 20em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; background-color: #212121 !important;">  </td>
   <td style="text-align:center;width: 30em; background-color: #212121 !important;">  </td>
   <td style="text-align:center;width: 30em; background-color: #212121 !important;">  </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; "> You changed your mind </td>
   <td style="text-align:center;width: 30em; "> <b style="color:#f4bfc5;">You changed your mind<br>BUT<br>the reality is you shouldn't have</b> </td>
   <td style="text-align:center;width: 30em; "> <span style="color:#bfe0f4;">You changed your mind<br>AND<br>in reality that was the right decision</span> </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; "> <center>Results in a</center> </td>
   <td style="text-align:center;width: 30em; "> <center><b><i style="color:#f4bfc5;">False Positive / Type I Error</i></b></center> </td>
   <td style="text-align:center;width: 30em; "> <center><i style="color:#bfe0f4;">True Positive</i></center> </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; "> You didn't change your mind </td>
   <td style="text-align:center;width: 30em; "> <span style="color:#bfe0f4;">You didn't change your mind<br>AND<br>in reality that was the right decision</span> </td>
   <td style="text-align:center;width: 30em; "> <b style="color:#f4bfc5;">You didn't change your mind<br>BUT<br>the reality is that you should have</b> </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
   <td style="text-align:center;width: 30em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:center;width: 20em; "> <center>Results in a</center> </td>
   <td style="text-align:center;width: 30em; "> <center><i style="color:#bfe0f4;">True Negative</i></center> </td>
   <td style="text-align:center;width: 30em; "> <center><b><i style="color:#f4bfc5;">False Negative / Type II Error</i></b></center> </td>
  </tr>
</tbody>
</table>

---

## Example

---

# Alpha

Formally

>- rejecting `$H_0$` when it is true
  
--

>- the probability of making a <b style='color:#f4bfc5;'>Type I Error</b>

<br>
In Better Terms

>- the chance of making the wrong decision when what was initially expected to happen actually occurs

>- Given by `$\alpha$`

>- Ranges from 0-1 like all other probabilities

<br>
<center>
Typically `$\alpha = 0.05$`, but it's really context dependent
</center>
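One way to see that `$\alpha$` really is the Type I error rate is to simulate it. This is a sketch under assumed conditions (a normal population with `$\mu = 100$` and `$\sigma = 15$`, samples of 30, a two-sided z-test); the function name and settings are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(alpha, mu=100, sigma=15, n=30, trials=10_000, seed=1):
    """Estimate how often a two-sided z-test rejects H0 (mean == mu) when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Every sample comes from a population where H0 really is true
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(sample_mean - mu) / se > z_crit:
            rejections += 1                         # rejecting a true H0 is a Type I error
    return rejections / trials


for a in (0.10, 0.05, 0.01):
    print(f"alpha = {a:.2f}: observed Type I error rate = {type_i_error_rate(a):.3f}")
```

The observed rates should land close to each `$\alpha$`, because the cutoff `z_crit` is chosen to allow exactly that much chance of rejecting a true `$H_0$`.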

---

# Example

<center>
For airplanes
</center>
<br>

.pull-left[
- if they fly people around, then when **analyzing failures**

>- you may want to lower the probability of making a wrong decision

>- use a **smaller** `$\alpha$`
]

.pull-right[
- if they're made of paper, then when **analyzing failures**

>- you might be willing to accept a higher risk of making the wrong decision

>- use a **larger** `$\alpha$`
]

---

# Beta

Formally

>- not rejecting `$H_0$` when `$H_1$` is true

--
  
>-  the probability of making a <b style='color:#f4bfc5;'>Type II Error</b>

<br>
In Better Terms

>- the chance of making the wrong decision when something else actually occurs

>- Given by `$\beta$`

>- Ranges from 0-1 like all other probabilities
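Unlike `$\alpha$`, `$\beta$` is not chosen directly; it depends on how far the truth is from `$H_0$` and on the sample size. A sketch, assuming a two-sided z-test with known `$\sigma$`; the function name and the scenario numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_rate(mu0, mu_true, sigma, n, alpha=0.05):
    """beta for a two-sided z-test of H0: mean == mu0 when the true mean is mu_true."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # true difference, in standard errors
    # We fail to reject H0 when the z statistic lands between -z_crit and z_crit
    return std.cdf(z_crit - shift) - std.cdf(-z_crit - shift)


# Hypothetical scenario: H0 says the mean is 100, but it is really 103 (sigma = 15)
for n in (25, 100, 400):
    print(f"n = {n:3d}: beta = {type_ii_error_rate(100, 103, 15, n):.3f}")
```

If `mu_true` equals `mu0`, the same formula returns `1 - alpha`: then `$H_0$` is true, and not rejecting it is the correct decision rather than an error.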

---

# Power

- `$1-\beta$` is called **statistical power**

- extremely important!
  
--

- Formally - the probability of NOT making a Type II error
  
--

- In Better Terms - the chance that you can tell an outcome is the result of something real occurring vs. pure luck!
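Power can be estimated by simulation: generate many studies in which `$H_0$` is false and count how often the test catches it. A sketch, assuming a normal population and a two-sided z-test; the function name and numbers are hypothetical, and each result is an estimate of `$1-\beta$` for that scenario.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_power(mu0, mu_true, sigma, n, alpha=0.05, trials=5_000, seed=2):
    """Share of simulated studies that correctly reject H0 (mean == mu0) when the true mean is mu_true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu_true, sigma) for _ in range(n))  # H0 is false here
        if abs(sample_mean - mu0) / se > z_crit:
            rejections += 1                                                 # correct rejection
    return rejections / trials


# Hypothetical scenario: H0 says 100, the truth is 103, sigma = 15
for n in (25, 100):
    print(f"n = {n:3d}: estimated power = {simulated_power(100, 103, 15, n):.3f}")
```

With everything else fixed, the larger sample gives noticeably higher power.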

---
  
# Decision Making

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Reality </th>
   <th style="text-align:center;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Rejected Null </th>
   <th style="text-align:center;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Did Not Reject Null </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; "> Type I Error </td>
   <td style="text-align:center;width: 10em; "> Correct decision </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; "> `$H_0$` is true </td>
   <td style="text-align:center;width: 10em; "> `$\alpha$` </td>
   <td style="text-align:center;width: 10em; "> `$1-\alpha$` </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; "> Chance of rejecting `$H_0$` when it is true /<br><span style="color:#f4eebf;"><b><i>Level of Significance</i></b></span> </td>
   <td style="text-align:center;width: 10em; "> <span style="color:#f4eebf;"><b><i>Level of Confidence</i></b></span> </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; "> Correct Decision </td>
   <td style="text-align:center;width: 10em; "> Type II Error </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; "> `$H_0$` is false </td>
   <td style="text-align:center;width: 10em; "> `$1-\beta$` </td>
   <td style="text-align:center;width: 10em; "> `$\beta$` </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; "> <span style="color:#f4eebf;"><b><i>Statistical Power!</i></b></span> </td>
   <td style="text-align:center;width: 10em; "> <span style="color:#f4eebf;"><b><i>Rate of a Type II Error</i></b></span> /<br>Chance of not rejecting `$H_0$` when it is false </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
   <td style="text-align:center;width: 10em; ">  </td>
  </tr>
</tbody>
</table>

---

# Decision Making

| | | |
|---------|---------|---------|
| Null | `$H_0 =$` | "Forecast says it's NOT going to rain" |
| Alternative | `$H_1 =$` | "Something else will happen" |
| | | |
<br style="line-height: 3px" />

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;border-bottom: 0;">
 <thead>
  <tr>
   <th style="text-align:left;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Reality </th>
   <th style="text-align:left;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Did not reject the forecast </th>
   <th style="text-align:left;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Rejected forecast </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;width: 10em; "> Forecast was right </td>
   <td style="text-align:left;width: 10em; "> Did not take an umbrella and you're dry </td>
   <td style="text-align:left;width: 10em; "> Took an umbrella AND you're dry but may look silly or possibly fancy </td>
  </tr>
  <tr>
   <td style="text-align:left;width: 10em; border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Forecast was wrong </td>
   <td style="text-align:left;width: 10em; border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Did not take an umbrella AND you're wet </td>
   <td style="text-align:left;width: 10em; border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Took an umbrella AND you're dry </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<span style="font-style: italic;"><small>Note: </small></span> <small><i>You could have also gotten wet from snow, a flood, etc., so again <b>the alternative hypothesis generally does not imply the opposite!</b></i></small>
</td></tr></tfoot>
</table>
<br style="line-height: 3px" />

---

# Estimation

- **(Statistical) Estimation** - a sample statistic is used to estimate the value of an unknown population parameter

- **Point estimation** - use of sample data to calculate a single value
  
--

- **Interval estimation** - use of sample data to calculate a possible range of values
  
--

<br>
<center>
<i>Selecting a sample mean</i>
</center>
<br>

---

# Updating Estimation for Sample Means

- **Point estimation** - use of sample data to calculate a single **mean** value

- Benefit - the sample mean will equal the population mean on average
  
--
  
- Drawback - unable to tell whether any single sample mean actually equals the population mean
  
--

- **Interval estimation** - use of sample data to calculate a possible range of **mean** values
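A quick simulation can show both the benefit and the drawback of a point estimate. This sketch assumes a made-up normal population (mean about 100, SD about 15) and samples of 25; nothing here comes from real data, and the helper name `sample_means` is hypothetical.

```python
import random
from statistics import fmean


def sample_means(population, n, reps, seed=0):
    """Draw `reps` random samples of size n and return each sample's mean (a point estimate)."""
    rng = random.Random(seed)
    return [fmean(rng.sample(population, n)) for _ in range(reps)]


# Hypothetical population of 100,000 scores with mean near 100 and SD near 15
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(100_000)]
mu = fmean(population)

means = sample_means(population, n=25, reps=5_000)
print(f"population mean         : {mu:.2f}")
print(f"average of sample means : {fmean(means):.2f}")                    # the benefit
print(f"smallest / largest mean : {min(means):.2f} / {max(means):.2f}")   # the drawback
```

On average the sample means sit right on the population mean, yet any one sample mean can be several points off, and a single sample gives no way to tell how far.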

---

# The Characteristic of Hypothesis Testing and Estimation

---

# Confidence

- **Confidence Interval** - a range of values expected to contain an unknown parameter (e.g. `$\mu$`) with a certain degree of confidence

- **Level of Confidence** - probability or likelihood that an interval estimate will contain an unknown population parameter

---

# Determining the Confidence Interval

1. Calculate the standard error of the mean `$$\sigma_{\overline{Y}} =\dfrac{\sigma}{\sqrt{N}}$$`

2. Decide on a level of confidence

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:center;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> Level of Confidence </th>
   <th style="text-align:center;border-bottom: solid;
           border-bottom-width:1px;
           border-bottom-color: #666666;"> `z`-score </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 0.90 </td>
   <td style="text-align:center;"> 1.645 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 0.95 </td>
   <td style="text-align:center;"> 1.96 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 0.99 </td>
   <td style="text-align:center;"> 2.576 </td>
  </tr>
</tbody>
</table>

<br>
<br>
<center>
Again, it's typical to use a 95% level of confidence, which makes `$\alpha = 1 - 0.95 = 0.05$`
</center>
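The `z`-scores in the table can be recovered from the standard normal distribution rather than memorized. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z-score leaving (1 - confidence) / 2 of the standard normal in each tail."""
    alpha = 1 - confidence                     # e.g. 0.05 for 95% confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```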

---

# Determining the Confidence Interval (continued)

<ol start=3>
<li> Calculate the interval `$CI = \overline{Y} \pm z\cdot\sigma_{\overline{Y}}$`, where `$z$` is the `z`-score for your level of confidence
</ol>

<ol start=4>
<li> Interpret the results
</ol>
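Steps 1 through 3 fit in a few lines. This sketch assumes, as the slides do, that the population standard deviation `$\sigma$` is known; the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    se = sigma / sqrt(n)                                   # step 1: standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)     # step 2: z for the chosen level
    margin = z * se                                        # step 3: margin of error
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(103, sigma=15, n=100, confidence=0.90)
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

Step 4, interpreting the interval, is still up to you.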

---

# Example

IQ scores in the general healthy population are approximately normally distributed with a mean of 100 and a standard deviation of 15. A sample of 100 students has a mean IQ of 103. Find the 90% confidence interval for the population mean IQ.

First, we have `$N = 100$`, `$\mu=100$`, `$\sigma = 15$`, and `$\overline{Y} = 103$`.

1. `$$\sigma_{\overline{Y}} = \dfrac{\sigma}{\sqrt{N}} =\dfrac{15}{\sqrt{100}} = 1.50$$`

2. We want a 90% confidence interval, so we use the 90% level of confidence, whose `z`-score is 1.645.

`$$z\cdot \sigma_{\overline{Y}} = 1.645\cdot 1.50 = 2.47$$`
---

<ol start=3>
<li> So

$$ 90\%\, CI = 103\pm2.47 = (100.53,\ 105.47) $$
</ol>

<ol start=4>
<li> We are 90% confident that the population mean IQ is between 100.53 and 105.47.
</ol>
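What does "90% confident" mean? One way to see it is to repeat the study many times. This sketch assumes the IQ setting from the example (`$\mu = 100$`, `$\sigma = 15$`, `$N = 100$`) with simulated, not real, data; the function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=100, sigma=15, n=100, confidence=0.90, reps=5_000, seed=7):
    """Fraction of z-intervals, each built from a fresh sample, that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        ybar = fmean(rng.gauss(mu, sigma) for _ in range(n))   # one simulated study
        if ybar - margin <= mu <= ybar + margin:
            hits += 1                                            # this interval captured mu
    return hits / reps


print(f"share of 90% intervals containing mu: {coverage():.3f}")
```

The printed share should be close to 0.90: the confidence level describes how often the *procedure* captures `$\mu$` in the long run, not a 90% probability that one particular interval such as (100.53, 105.47) contains it.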

---

## That's it. Take a break before our R session!