In this lecture, we will examine the most important concepts related to our study of random variables.
Recall from the last lecture that we introduced the notion of a random variable, that is, something that assigns a numerical value to events from a random process.
We typically denote random variables by capital letters at the end of the alphabet such as X, Y, or Z.
Our primary goal is to study methods that allow us to better understand the distribution of a random variable.
Specifically, we will cover expectation, variance, discrete and continuous distributions, and some common random variables and their distributions. See textbook sections 3.4, 3.5, 4.1, 4.2, and 4.3.
If a random variable has only a very small number of outcomes, then we can simply list its distribution.
For example, reconsider the process of rolling two six-sided dice. Let X be the random variable that records the sum of the values shown by the two dice. Then the distribution for X is
| Dice sum | X=2 | X=3 | X=4 | X=5 | X=6 | X=7 | X=8 | X=9 | X=10 | X=11 | X=12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Probability | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |
From this table we can compute probabilities. For example,

P(X=3) = 2/36 = 1/18

or

P(X ≤ 5) = 1/36 + 2/36 + 3/36 + 4/36 = 10/36 = 5/18
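In R, we can encode this distribution as a pair of vectors and compute such probabilities directly. A minimal sketch:

```r
x <- 2:12                 # possible values of the dice sum
p <- c(1:6, 5:1) / 36     # P(X = x) from the table above

p[x == 3]        # P(X = 3): 2/36, about 0.0556
sum(p[x <= 5])   # P(X <= 5): 10/36, about 0.2778
```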
Consider the random process of tossing a coin where the probability of landing heads is a number p. Let X be the random variable that counts the number of heads after a single toss.
Construct the probability distribution for X. Note that the only possible outcomes for X are 0 and 1.
Obviously P(X=1)=p.
By the complement rule, we must have P(X=0)=1−p. Therefore,
| Num Heads | X=0 | X=1 |
|---|---|---|
| Probability | 1−p | p |
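As a quick illustration, we can simulate tosses of such a coin in R using rbinom (which we will meet again later in this lecture); the value of p here is an assumption for the example:

```r
p <- 0.7                        # assumed probability of heads
rbinom(5, size = 1, prob = p)   # five simulated tosses: 1 = heads, 0 = tails
```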
In cases where it is not easy to completely write down the probability distribution for a random variable, it is useful to be able to characterize the distribution.
The two most common characteristics we consider for the distribution of a random variable are its expectation or expected value, and its variance.
We will discuss expectation first.
Before we define the expectation of a random variable, it is helpful to distinguish two types of random variables.
A random variable X is called discrete if its outcomes form a discrete set.
A set is discrete if it can be labeled by the whole numbers 1, 2, 3, ...
For example, the random variable that adds the values after a roll of two six-sided dice is a discrete random variable. Additionally, the random variable that counts the number of heads after tossing a coin 10 times is a discrete random variable.
Later we will describe continuous random variables. However, it's important to note that there are random variables that are neither discrete nor continuous.
Expectation, or the expected value of a random variable X measures the average outcome for X. We typically denote the expectation of X by E(X), or sometimes by μ.
The expected value of a discrete random variable X is the sum, over all outcomes, of each outcome times its probability.
Mathematically,
E(X) = x_1 P(X=x_1) + x_2 P(X=x_2) + ⋯ + x_n P(X=x_n)
For the dice-sum example,

E(X) = 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36) = 252/36 = 7
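We can check this arithmetic in R with the same vectors as before:

```r
x <- 2:12               # possible dice sums
p <- c(1:6, 5:1) / 36   # their probabilities
sum(x * p)              # expected value: 7
```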
Simulating a large number of rolls of two dice (recorded as a tibble with columns die_1, die_2, and sum) and computing the sample mean of the sums gives, for one run,

## [1] 7.0548

The point is that expected value is to random variables what the mean is to data.
That is, if we take a very large number of samples from a random variable and compute the sample mean, then this will give us an accurate (but not exact) estimate for the expected value.
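Here is a minimal sketch of such a simulation. The tibble-based approach and the column names die_1, die_2, and sum follow the output shown above; the exact number of rolls is an assumption, and without a fixed seed the sample mean will vary from run to run:

```r
library(dplyr)   # for tibble(), mutate(), and %>%

n <- 10000   # number of simulated rolls (assumed)
rolls <- tibble(die_1 = sample(1:6, n, replace = TRUE),
                die_2 = sample(1:6, n, replace = TRUE)) %>%
  mutate(sum = die_1 + die_2)

head(rolls)       # first few simulated rolls
mean(rolls$sum)   # sample mean of the sums, close to 7
```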
Suppose we let X be the random variable that counts the number of heads after a single toss of a coin with probability of getting heads p.
Then,
E(X)=1⋅p+0⋅(1−p)=p
If our coin is fair, then p = 1/2 and E(X) = 1/2.
Here's the mean of 1,000 samples from this random variable (number of heads for a fair coin):
## [1] 0.527
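A sketch of a computation that produces such a value (no seed is fixed, so the exact number will vary between runs):

```r
mean(rbinom(1000, size = 1, prob = 0.5))   # e.g. 0.527 in the run above
```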
The expected value satisfies some important properties; among the most important are:

1. If we multiply a random variable X by a number a, and then add another number b, then we can compute the expected value in either of two ways and get the same answer. Mathematically, E(aX+b) = aE(X) + b.
2. For any two random variables X and Y, E(aX+bY) = aE(X) + bE(Y).
3. For any random variables X_1, X_2, …, X_n, E(X_1 + X_2 + ⋯ + X_n) = E(X_1) + E(X_2) + ⋯ + E(X_n).
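These properties are easy to check by simulation. A minimal sketch for the first one, using rolls of a single fair die (for which E(X) = 3.5):

```r
x <- sample(1:6, 1e5, replace = TRUE)   # simulated rolls of one fair die
mean(2 * x + 3)                         # approximately 2 * 3.5 + 3 = 10
2 * mean(x) + 3                         # same quantity, computed the other way
```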
We have seen that expected value is to random variables what the mean is to data. What is the analog of the sample variance of data for a random variable?
The answer is the variance of a random variable. If X is a random variable and μ is its expected value, then the variance of X is
Var(X) = E((X − μ)²)
The standard deviation of a random variable X is the square root of its variance sd(X)=√Var(X).
It is helpful to know that if a and b are numbers and if X is a random variable, then
Var(aX+b)=a2Var(X)
You can take it on faith that if X is the random variable that returns the number of heads after a single toss of a fair coin, then Var(X) = 1/4 = 0.25. (Or check directly: since X only takes the values 0 and 1, X² = X, so E(X²) = E(X) = 1/2 and Var(X) = E(X²) − μ² = 1/2 − 1/4 = 1/4.) Let's see how this compares with the sample variance of some data:
The sample variance after 1,000 sample tosses is
## [1] 0.2502142
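A sketch of a computation producing such a value:

```r
var(rbinom(1000, size = 1, prob = 0.5))   # close to Var(X) = 0.25
```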
Now that we have covered the principal concepts regarding random variables, we introduce some famous types of random variables and describe their distributions.
We begin with some famous discrete distributions: Bernoulli, Geometric, and Binomial.
Then we discuss continuous random variables and the most famous continuous distribution, the normal distribution.
A Bernoulli random variable (section 4.2.1) is a random variable X corresponding to a random process with exactly two possible outcomes typically labeled "success" and "failure", a so-called Bernoulli trial. We define X by counting the number of successes after a single trial so that X=1 (for a success) and X=0 for failure.
| Num Successes | X=0 | X=1 |
|---|---|---|
| Probability | 1−p | p |

Its mean and variance are

μ = E(X) = p and σ² = Var(X) = p(1−p)
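In R, a Bernoulli random variable is a binomial with a single trial (size = 1). A sketch checking the mean and variance formulas by simulation; the value p = 0.3 is an assumption:

```r
p <- 0.3
x <- rbinom(1e5, size = 1, prob = p)   # 100,000 simulated Bernoulli trials
mean(x)   # approximately p = 0.3
var(x)    # approximately p * (1 - p) = 0.21
```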
The geometric distribution is used to describe how many trials it takes to observe a success.
Suppose we conduct a sequence of n independent Bernoulli trials with probability of success p. What is the probability that it takes n trials to obtain the first success?
Let A be the event that the first success occurs on the n-th trial. Then A can be realized as the event A = F_1 and F_2 and ⋯ and F_{n−1} and S_n, where an F corresponds to a failure event and an S corresponds to a success event. Since the trials are all independent, we have

P(A) = P(F_1)P(F_2)⋯P(F_{n−1})P(S_n) = (1−p)^(n−1) p

If X is the geometric random variable recording the number of trials needed to obtain the first success, then

μ = E(X) = 1/p and σ² = Var(X) = (1−p)/p²
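We can evaluate the probability above in R. Note that R's geometric functions count the number of failures *before* the first success, so a first success on trial n corresponds to n − 1 failures. The values of p and n here are assumptions for the example:

```r
p <- 0.2; n <- 4
(1 - p)^(n - 1) * p      # direct formula: 0.1024
dgeom(n - 1, prob = p)   # same probability via R's dgeom
```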
The binomial distribution is used to describe the number of successes in a fixed number of trials. This is different from the geometric distribution, which describes the number of trials we must wait before we observe a success.
For a binomial distribution,

1. The number of trials, n, is fixed.
2. The trials are independent.
3. Each trial outcome can be classified as a success or failure.
4. The probability of a success, p, is the same for each trial.
If X is a binomial random variable with n trials and success probability p, then the probability of observing exactly k successes is

P(X=k) = (n choose k) p^k (1−p)^(n−k) = n!/(k!(n−k)!) · p^k (1−p)^(n−k)
μ = np, σ² = np(1−p), σ = √(np(1−p))
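As a worked example (the numbers here are assumptions): the probability of exactly 3 heads in 10 tosses of a fair coin, computed from the formula and then with dbinom, which we introduce at the end of this lecture:

```r
choose(10, 3) * 0.5^3 * (1 - 0.5)^(10 - 3)   # 120 / 1024 = 0.1171875
dbinom(3, size = 10, prob = 0.5)             # same value
```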
We take a break from the slides to work out some examples related to Binomial random variables.
We will also be interested in continuous random variables.
Continuous random variables are tricky to define precisely. Roughly, a random variable X is a continuous random variable if its outcomes can take any value in a continuum, such as an interval of real numbers.
Consider for example a random variable X whose outcomes can be any real number in the interval [0,1] and with each outcome equally likely. Such a random variable is said to follow a uniform distribution on [0,1].
If x is any real number in [0,1], then P(X=x)=0. However, if a,b are any two real numbers in [0,1] with a≤b, then P(a≤X≤b)=b−a.
The quantity P(a≤X≤b) is interpreted as the probability of randomly selecting any real number in [0,1] that lies between a and b.
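We can check this fact numerically with R's punif, which defaults to the uniform distribution on [0,1]; the endpoints a and b are assumptions for the example:

```r
a <- 0.3; b <- 0.8
punif(b) - punif(a)   # P(a <= X <= b) = 0.5, matching b - a
```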
The following plot shows a histogram of 10,000 random samples from a uniform distribution on [0,1]:


Hopefully, the last few slides provide intuition for the following facts:

1. The distribution of a continuous random variable X is described by a density function f.
2. The probability P(a ≤ X ≤ b) is given by the area under the graph of f between a and b.
Note that for any density function f, we require that the total area under the graph of f is 1.
We now show how you can use R to work with the distributions we have introduced so far.
Here is what you need to know:
Each distribution has a shorthand name such as binom, geom, unif, or norm.
Each distribution has four functions associated with it. For example, the four functions associated with binom are
rbinom - draws random samples from a binomial random variable
dbinom - implements the probability function for a binomial random variable
pbinom & qbinom - implement the distribution function and quantile function, respectively, for a binomial random variable. We have not yet discussed the concepts behind these two functions.
Let's go to R together and see how these all work.
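As a preview, here is a sketch of all four binomial functions in action (the parameter values are assumptions):

```r
rbinom(5, size = 10, prob = 0.5)     # five random draws of the number of successes
dbinom(3, size = 10, prob = 0.5)     # P(X = 3)
pbinom(3, size = 10, prob = 0.5)     # P(X <= 3), the distribution function
qbinom(0.5, size = 10, prob = 0.5)   # smallest k with P(X <= k) >= 0.5 (the median)
```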