# Intro to Statistics

Statistics is all about understanding change and chance. At its core, it involves **differences (subtraction)** and **similarities (averages)**. If you know how to subtract and find averages, you will be off to a great start!

**Samples vs. Populations**

With statistics, we use information we know about a **sample** (a smaller group) to make predictions about its larger **population**. This process is called inference. It can also work the other way around: we can use information about a population to make predictions about a sample drawn from it. (This direction is usually described as probability rather than inference.)

**Populations**

Populations have a mean and a standard deviation. The **mean** is the average of the population. The **standard deviation** measures how far members of the population typically fall from the mean. Formally, it is the square root of the average squared distance from the mean.

**μ = Population Mean**

**σ = Population Standard Deviation**

The population statistics are represented by Greek symbols.
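As a rough sketch, Python's standard-library `statistics` module can compute both population values. The data below is made up purely for illustration. Note that `statistics.pstdev` computes the square root of the average *squared* distance from the mean, dividing by the number of members N.

```python
import statistics

# A small, made-up population (hypothetical data for illustration only)
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(population)       # population mean, μ
sigma = statistics.pstdev(population)  # population standard deviation, σ (divides by N)

print(mu, sigma)
```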

**Samples**

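Samples also have a mean and a standard deviation. The **sample mean** is the average of all the data points in the sample. The **sample standard deviation** measures how far the data points typically fall from the sample mean. It is computed like σ, except that the sum of squared distances is divided by n − 1 (one less than the number of data points) instead of n. Both are examples of descriptive statistics, which describe the data set.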

**x - Individual data points**

**x̄ - Sample Mean** (average of the data points)

**s - Sample Standard Deviation** (typical distance of the data points from the sample mean)

The sample statistics are represented by letters.

*Note: Sometimes the sample mean is represented by a capital X or M.*
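Here is a similar sketch for a sample, again with made-up numbers. The key difference is that `statistics.stdev` divides by n − 1, which is the usual formula for the sample standard deviation s.

```python
import statistics

# A hypothetical sample of 5 data points (made-up values)
sample = [3, 5, 6, 8, 8]

x_bar = statistics.mean(sample)  # sample mean, x̄
s = statistics.stdev(sample)     # sample standard deviation, s (divides by n - 1)

print(x_bar, round(s, 3))
```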

## The Key to Statistics: Understanding the **Differences** Between **Averages**.

In statistics, we want to understand the *differences between the averages* of samples and their populations (**X̄ − μ**). Specifically, we'd like to know whether those differences are large enough to be statistically **significant** (unlikely to have happened by chance alone).

To decide whether a difference between averages could easily occur by chance, we must account for how much the data points vary (the **standard deviation**, **σ**, or typical distance from the mean).

If the difference between the averages is substantially larger than this typical variation, the difference is likely to be significant.

For example, a difference of 20 points between the averages of a sample and its population may be significant if the typical difference (standard deviation) is only 4 points. However, if the standard deviation is, say, 30 points, then a difference of 20 points would not seem significant at all.
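To make the comparison concrete, this tiny sketch expresses the 20-point difference as a multiple of each standard deviation from the example. The helper name `spreads` is hypothetical, and the ratio is only the intuition, not yet a full test statistic.

```python
def spreads(difference: float, standard_deviation: float) -> float:
    """How many standard deviations a difference spans (hypothetical helper)."""
    return difference / standard_deviation

# The 20-point difference from the example, against two different spreads
print(f"{spreads(20, 4):.2f}")   # 5.00 -> large relative to the spread
print(f"{spreads(20, 30):.2f}")  # 0.67 -> small relative to the spread
```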

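This relationship is expressed in the equation for a z-score:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

The *differences between the averages* of samples and their populations (**X̄ − μ**) are *adjusted for (divided by)* the **standard deviation**, **σ**. Because X̄ is the average of n data points, σ is scaled down by √n; the quantity σ/√n is called the **standard error**.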

The bigger the standard deviation, the smaller your test statistic (Z). **The smaller your test statistic (Z), the less significant the difference will be**. If you think about it, this makes sense because **the more change that is happening (bigger σ), the less meaningful the difference**.
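A minimal Python sketch of the one-sample z statistic follows. It assumes the population standard deviation σ is known and divides by the standard error σ/√n, the usual form when X̄ is the mean of n data points. All numbers are hypothetical. Holding the difference fixed, a bigger σ gives a smaller Z, just as described above.

```python
import math

def z_score(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """One-sample z statistic: (x̄ - μ) / (σ / √n)."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Hypothetical numbers: a sample of 25 with mean 105, population mean 100
print(z_score(105, 100, 10, 25))  # SE = 10/5 = 2, so Z = 2.5
print(z_score(105, 100, 30, 25))  # SE = 30/5 = 6, so Z ≈ 0.83 (smaller)
```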

Also, the larger your Z statistic (in absolute value), the further it falls into the tails of the distribution, toward the **rejection regions**. If the statistic lands in a rejection region, it **is significant**.
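Where the rejection regions start depends on the significance level α you choose. As a sketch, assuming a two-tailed test at α = 0.05 (a common convention, not something fixed by these notes), the cutoff can be found from the standard normal distribution with Python's `statistics.NormalDist`. The helper `in_rejection_region` is a hypothetical name.

```python
from statistics import NormalDist

alpha = 0.05  # significance level (assumed; 0.05 is a common convention)

# Two-tailed test: split alpha evenly between the two tails
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

def in_rejection_region(z: float) -> bool:
    """True if |z| falls beyond the critical value in either tail."""
    return abs(z) > z_critical

print(round(z_critical, 2))       # 1.96
print(in_rejection_region(2.5))   # True
print(in_rejection_region(0.83))  # False
```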

**Significance Tests / Hypothesis Testing**

To determine whether the difference between the sample mean and the population mean is significant, we perform **significance tests**, also known as **hypothesis tests** (the terms are used interchangeably). With significance tests, we first decide how confident we want to be about something happening (our hypothesis). This is our **confidence level**; one minus the confidence level is the **significance level** (α). For example, a 95% confidence level corresponds to α = 0.05. Then we use the test to decide whether we can actually be *that* confident in our hypothesis (or, equivalently, that the opposite is not happening).

One of the most confusing parts of statistics is that we do not test our hypothesis directly. Instead, we start by assuming the opposite, that there is no real difference (the **null hypothesis**), and calculate how likely results like ours would be if that were true. That's right: to be confident that our hypothesis is correct, we must show that results like ours would be incredibly (significantly) unlikely if it were not correct.

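If that probability (the **p-value**) is smaller than the significance level, we reject the null hypothesis and conclude that the data support our hypothesis. Strictly speaking, this does not prove our hypothesis true; it shows that the data would be very surprising if it were false.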
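In practice, the probability used here is called a **p-value**: the chance of getting a Z at least as extreme as ours if the null hypothesis (no real difference) were true. A minimal sketch for a two-tailed z test, using only the standard library and assuming Z follows a standard normal distribution under the null hypothesis; `two_sided_p_value` is a hypothetical helper name and α = 0.05 is an assumed choice.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Chance of a Z at least this extreme (either tail) if the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # assumed significance level
p = two_sided_p_value(2.5)

print(round(p, 4))  # 0.0124
print("reject null" if p < alpha else "fail to reject null")  # reject null
```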

**Types of Significance Tests**

**Z Tests**

**One Sample Z Test**

What it is

Formula

How to Interpret

When to use

**Two Sample Z Test**

What it is

Formula

How to Interpret

When to use

**One Proportion Z Test**

What it is

Formula

How to Interpret

When to use

**Two Proportion Z Test**

What it is

Formula

How to Interpret

When to use

**T Tests**

**One-Sample T Test**

What it is

Formula

How to Interpret

When to use

**Two-Sample T Test**

What it is

Formula

How to Interpret

When to use

**One Proportion T Test**

What it is

Formula

How to Interpret

When to use

**Two Proportion T Test**

What it is

Formula

How to Interpret

When to use

**Paired T Test (Repeated Measures)**

What it is

Formula

How to Interpret

When to use

**F Tests/ANOVA**

What it is

Formula

How to Interpret

When to use

**Pearson's Correlation Coefficient**

What it is

Formula

How to Interpret

When to use

**Simple Regression & Multiple Regression**

What it is

Formula

How to Interpret

When to use