In this lecture, we will
Study the normal distribution and learn how to compute probability values for a random variable that follows a normal distribution.
The normal distribution is a continuous distribution. We will discuss the so-called normal curve, which is the probability density function for a normal distribution.
We will also learn about the expected value and variance for a normal random variable.
This lecture corresponds to section 4.1 in the textbook, and you are encouraged to watch the lecture video included in the next slide.
library(tidyverse); library(ggformula)  # assumed setup: tibble() and %>% from tidyverse, gf_histogram() from ggformula
tibble(x = rnorm(5000, 66, 3.5)) %>% gf_histogram(~ x, color = "black")
Think of the histogram on the previous slide as showing sample data for measurements of human heights in inches.
What are the key features of the histogram?
It is unimodal, highly symmetric, and centered at the mean.
Do you think that such data could reasonably correspond to measurements of human heights in inches?
Do you think it is reasonable to treat measurements of human heights in inches as a continuous variable?
Many random variables are nearly normal, but none are exactly normal. Thus the normal distribution, while not perfect for any single problem, is very useful for a variety of problems. We will use it in data exploration and to solve important problems in statistics.
Let's spend some time developing intuition for how the normal distribution is used in practice.
We start by generating some data that is not necessarily normally distributed.
We are going to play a game. We proceed as follows:
1) Sample 15 values from a binomial random variable with n=25 and p=0.5. You can think of this as doing 15 rounds of an experiment where each time we flip a fair coin 25 times and count the number of heads that we obtain.
2) We compute and record the mean of the 15 values obtained in step 1.
3) We repeat steps 1 & 2 a very large number of times, say 2,500.
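The three steps above can be sketched in base R (a minimal sketch; the seed is an arbitrary choice made only for reproducibility):

```r
# Play the game: 2500 repetitions of "mean of 15 draws from Binomial(25, 0.5)"
set.seed(1)  # arbitrary seed so the run is reproducible
means <- replicate(2500, mean(rbinom(15, size = 25, prob = 0.5)))
head(round(means, 1))  # the first few recorded means, each near 25 * 0.5 = 12.5
```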
Here are the first few rows of a data frame that contains the data we acquire by playing our game.
## # A tibble: 6 x 1
##   means
##   <dbl>
## 1  12.5
## 2  12.5
## 3  13.1
## 4  12.2
## 5  12
## 6  12.3
gf_dist("norm")
We can think of the sample mean as a random variable. The sample mean inputs sample data of a fixed size from a population and returns the mean of the data.
The point is that the sample mean will vary as the sample varies.
What the last slide shows us is that, at least in the particular example, the distribution of the sample mean (viewed as a random variable) is very close to a random variable that is normally distributed.
In general, we call the distribution of the sample mean (viewed as a random variable) the sampling distribution of the mean. It is a general fact (the central limit theorem) that, for independent samples of sufficiently large size, the sampling distribution of the mean is close to a normal distribution, regardless of the distribution from which the data are sampled.
This is the reason why the normal distribution plays such a central role in statistics.
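As a quick sanity check of this fact, here is a sketch with an assumed skewed source distribution (the exponential): the standardized sample means still have quantiles close to those of N(0, 1).

```r
# Even when the source distribution is skewed (exponential with rate 1),
# the sampling distribution of the mean is close to normal.
set.seed(42)  # arbitrary seed for reproducibility
exp_means <- replicate(2500, mean(rexp(30, rate = 1)))
z <- (exp_means - mean(exp_means)) / sd(exp_means)  # standardize the means
# Empirical quantiles of z are close to the matching N(0, 1) quantiles
round(quantile(z, c(0.025, 0.5, 0.975)), 2)
round(qnorm(c(0.025, 0.5, 0.975)), 2)
```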
If X is a normal random variable with expected value μ=E(X) and standard deviation σ=√Var(X), then we write X∼N(μ,σ).
For example, if we have a normal random variable X with expected value 12.5 and standard deviation 0.5, then we write X∼N(μ=12.5,σ=0.5).
We call a random variable Z that satisfies Z∼N(μ=0,σ=1) a standard normal variable and we call the normal distribution with μ=0 and σ=1 the standard normal distribution.
Our next goal is to see how to use the normal density function to compute probability values for a random variable that follows a normal distribution.
Let X∼N(μ,σ) be a normal random variable.
Define a new random variable Z = (X − μ)/σ. We can compute the expectation and variance of Z as follows:
E(Z) = E((X − μ)/σ) = (1/σ)E(X − μ) = (1/σ)(E(X) − μ) = (1/σ)(μ − μ) = 0
Var(Z) = Var((X − μ)/σ) = (1/σ²)Var(X − μ) = (1/σ²)Var(X) = σ²/σ² = 1
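A quick numeric sketch of this derivation, using the N(μ=12.5, σ=0.5) example from earlier: standardized draws should have mean near 0 and standard deviation near 1.

```r
set.seed(7)  # arbitrary seed for reproducibility
x <- rnorm(100000, mean = 12.5, sd = 0.5)
z <- (x - 12.5) / 0.5        # standardize: Z = (X - mu) / sigma
round(c(mean(z), sd(z)), 2)  # close to 0 and 1, as derived
```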
Suppose person 1 scored 1300 on the SAT and person 2 scored 24 on the ACT. Whose score is better relative to the other test takers?

|      | SAT  | ACT |
|------|------|-----|
| Mean | 1100 | 21  |
| SD   | 200  | 6   |

person 1 Z-score = (1300 − 1100)/200 = 1, person 2 Z-score = (24 − 21)/6 = 1/2

So we conclude that person 1 had the better exam score.
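The two Z-scores can be checked directly in R:

```r
z_sat <- (1300 - 1100) / 200  # person 1's SAT score standardized
z_act <- (24 - 21) / 6        # person 2's ACT score standardized
c(z_sat, z_act)               # 1 and 0.5: person 1 is further above the mean
```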
Note that observations above the mean have a positive Z-score while observations below the mean have a negative Z-score.
In the last slide, the shaded area on the left represents the probability of an outcome being less than or equal to -1.5.
In the last slide, the shaded area on the right represents the probability of an outcome being greater than or equal to 1.8.
We will use Z-scores and the standard normal distribution to compute such areas.
We proceed by first learning to compute tail areas under the standard normal density curve which corresponds to a random variable Z∼N(μ=0,σ=1).
We spell out the steps for computing tail areas under the standard normal density function.
First, draw a picture of the tail area you want to compute.
Decide if it's a left tail area or a right tail area.
If it's a left tail area, use pnorm(z) in R to compute the value.
If it's a right tail area, use 1 - pnorm(z) in R to compute the value.
The next slide explains why this approach works.
pnorm(z) computes the area under the standard normal density function for values less than or equal to z. That is, it computes the tail area to the left of the value z.
Since pnorm(z) gives the left tail area, it must be that 1 - pnorm(z) gives the right tail area, so we use 1 - pnorm(z) to compute right tail areas.
For example, pnorm(-2) = 0.0227501 and 1 - pnorm(1) = 0.1586553.
To compute a middle area, subtract both tail areas from 1:
l_area <- pnorm(-0.5)                # compute left tail area
r_area <- 1 - pnorm(1.2)             # compute right tail area
(mid_area <- 1 - (l_area + r_area))  # subtract tail areas from 1
## [1] 0.5763928
pnorm(1.2) - pnorm(-0.5)
## [1] 0.5763928
Take a minute to think about why this works and then we will discuss together.
What does pnorm(1.2) represent? Draw the area under the standard normal density curve that it represents.
What does pnorm(-0.5) represent? Draw the area under the standard normal density curve that it represents.
Now what does the difference pnorm(1.2) - pnorm(-0.5) represent?
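One way to see the answer (a sketch; the discussion should arrive at the same picture): subtracting the two left tail areas removes everything to the left of -0.5, leaving exactly the middle area computed earlier by subtracting both tails from 1.

```r
# pnorm(1.2) is the area to the left of 1.2; pnorm(-0.5) is the area to the
# left of -0.5. Their difference is the area between -0.5 and 1.2.
mid_via_diff  <- pnorm(1.2) - pnorm(-0.5)
mid_via_tails <- 1 - (pnorm(-0.5) + (1 - pnorm(1.2)))
c(mid_via_diff, mid_via_tails)  # both equal the same middle area
```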
To compute the area under a normal density curve for N(μ,σ):
First draw a picture and determine if it is a left tail area, right tail area, or middle area.
Then standardize by subtracting the mean μ and dividing by the standard deviation σ.
Finally, use pnorm in R as we have explained over the last several slides.
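These steps can be combined in a short sketch (numbers assumed from the earlier SAT example, X∼N(μ=1100, σ=200)):

```r
# P(X <= 1300) for X ~ N(1100, 200): a left tail area
z <- (1300 - 1100) / 200  # standardize: z = 1
pnorm(z)                  # left tail area under the standard normal curve
# pnorm also accepts mean and sd directly, skipping manual standardization
pnorm(1300, mean = 1100, sd = 200)
```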
We take a break from the slides to work out some examples together.