
Lecture 10

JMG

MATH 204

1 / 41

Linear Regression: Introduction

  • Linear regression is a statistical method for fitting a line to data.

  • Recall that a (non-vertical) line in the x,y-plane is determined by

y=slope×x+intercept

  • There are two aspects to fitting a line to data that we will study:

    • Estimating the slope and intercept values, and

    • Assessing the uncertainty of our estimates for the slope and intercept values.

  • In this lecture, we cover all of the concepts necessary to understand how to carry out and interpret linear regression.

  • We encourage you to watch the video on the next slide as an introduction to linear regression.

2 / 41

Regression Intro Video

3 / 41

Learning Objectives

  • After this lecture, you should

    • Understand the basic principles of simple linear regression: parameter estimates, residuals, and correlation. (8.1, 8.2)

    • Know the conditions for least squares regression: linearity, normality, constant variance, and independence. (8.2)

    • Know how to diagnose problems with a linear fit by least squares regression. (8.3)

    • Understand the methods of inference of least squares regression. (8.4)

    • Know how to obtain a linear fit using R with the lm function.

    • Be able to assess and interpret the results of a linear fit using R with the lm function.

4 / 41

Simple Regression Model

  • Simple linear regression models the relationship between two (numerical) variables x and y by the formula

y = β0 + β1·x + ϵ
where

  • β0 (intercept) and β1 (slope) are the model parameters

  • ϵ is the error

  • The parameters are estimated using data and their point estimates are denoted by b0 (intercept estimate) and b1 (slope estimate).

  • In linear regression, x is called the explanatory or predictor variable while y is called the response variable.

  • Let's look at an example data set for which linear regression is a potentially useful model.

5 / 41
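
To make the roles of β0, β1, and ϵ concrete, here is a minimal simulation sketch (an added illustration, not part of the original slides; the parameter values are arbitrary). The lm command used at the end is introduced later in this lecture; its estimates should land close to the true values.

set.seed(1)                              # for reproducibility
beta_0 <- 40; beta_1 <- 0.6              # arbitrary "true" parameters
x <- runif(100, min = 75, max = 100)     # simulated predictor values
eps <- rnorm(100, mean = 0, sd = 2.5)    # normal error term
y <- beta_0 + beta_1 * x + eps           # response generated from the model
lm(y ~ x)                                # estimates should be near 40 and 0.6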

Australian Brushtail Possum

  • The possum data set records measurements of 104 brushtail possums from Australia and New Guinea; the first few rows of the data are shown below:
head(possum,4)
## # A tibble: 4 x 8
## site pop sex age head_l skull_w total_l tail_l
## <int> <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 Vic m 8 94.1 60.4 89 36
## 2 1 Vic f 6 92.5 57.6 91.5 36.5
## 3 1 Vic f 6 94 60 95.5 39
## 4 1 Vic f 6 93.2 57.1 92 38
6 / 41

Possum Data Example

  • Suppose that we as researchers are interested to study the relationship between the head length (head_l) and total length (total_l) measurements of the brushtail possum of Australia.

  • Note that head length (head_l) and total length (total_l) are both (continuous) numerical variables.

  • The next slide displays a scatterplot for head_l versus total_l.

7 / 41

Possum Data Scatterplot

  • Describe the features of any association that there appears to be between the two variables in the plot.
8 / 41

Possum Data Regression Line

  • We have added the "best fit" line to the scatter plot of head_l versus total_l. Later we discuss how this line is obtained.

  • A residual is the vertical distance between a data point and the best fit line. The next slide shows all of the residuals for the data.
9 / 41

Possum Regression Residuals

  • The best fit or regression line is the line that minimizes all of the residuals simultaneously.
10 / 41

Possum Regression Residual Plot

  • A residual plot displays the residual values versus the x values from the data.

  • As we will see, residual plots play an important role in assessing the results of a regression model.
11 / 41
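
As a hedged preview (the lm command is introduced later in this lecture), here is one way to draw a residual plot like this in base R, assuming the possum data is loaded as in the earlier slides:

lm_possum <- lm(head_l ~ total_l, data = possum)   # least squares fit (details later)
plot(possum$total_l, resid(lm_possum),
     xlab = "total_l", ylab = "residual")          # residuals versus the x values
abline(h = 0, lty = 2)                             # horizontal reference line at zero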

Fits and Residual Plots

  • Let's look at some linear fits and their corresponding residual plots.

  • In the first column, the residuals show no obvious pattern, which is desirable. In the second column, the residuals show a pattern that suggests a linear model is inappropriate. In the third column, it's not clear whether the linear fit is statistically significant.
12 / 41

Correlation

  • Correlation, which always takes values between -1 and 1, is a statistic that describes the strength of the linear relationship between two variables. Correlation is denoted by R.

  • In R, correlation is computed with the cor command. For example, the correlation between the head_l and total_l variables in the possum data set is computed as

cor(possum$head_l,possum$total_l)
## [1] 0.6910937
  • The plot in the next slide shows several scatter plots together with the corresponding correlation value.
13 / 41

Correlation Illustrations

14 / 41

Strongly Related Variables with Weak Correlations

  • It is important to note that two variables may have a strong association even if their correlation is relatively weak. This is because correlation measures linear association, and two variables may have a strong nonlinear association; see the short R sketch below.

15 / 41
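
A quick illustrative sketch (not from the slides): below, y is completely determined by x, yet the correlation is essentially zero because the association is nonlinear.

x <- seq(-3, 3, by = 0.1)
y <- x^2            # y is a deterministic function of x, but not a linear one
cor(x, y)           # approximately 0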

Least Squares Regression

  • We now begin to discuss the details of how to fit a simple linear regression model to data.

  • The approach we take is called least squares regression.

  • The idea is to choose parameter estimates that minimize all of the residuals simultaneously. That is, writing ŷi = b0 + b1·xi for each observed data point (xi, yi), we find b0 and b1 such that the residual sum of squares

RSS = Σᵢ₌₁ⁿ (ŷi − yi)²

is as small as possible. (A short numerical illustration follows this slide.)

  • For this to work out well, several conditions need to be met. These conditions are spelled out on the next slide.
16 / 41
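
A short numerical illustration of the least squares criterion, using the possum variables from earlier (an added sketch, not part of the original slides). Any line other than the least squares line produces a larger RSS.

x <- possum$total_l; y <- possum$head_l
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)   # RSS for a candidate line y = b0 + b1*x
rss(42.71, 0.573)    # close to the least squares estimates: nearly minimal RSS
rss(40, 0.60)        # a nearby but different line: larger RSS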

Conditions for Least Squares

  • Linearity. The data should show a linear trend.

  • Normality. Generally, the distribution of the residuals should be close to normal.

  • Constant Variance. The variability of points around the least squares line remains roughly constant. Residual plots are a good way to check this condition.

  • Independence. We want to avoid fitting a line to data via least squares whenever there is dependence between consecutive data points.

  • The next slide shows plots of data where at least one of the conditions for least squares regression fails to hold.

17 / 41

Regression Assumption Failures

  • In the first column, the linearity condition fails. In the second column, the normality condition fails.

  • In the third column, the constant variance condition fails. In the fourth column, the independence condition fails.

  • Notice how in each case, the residual plot can be used to diagnose problems with a least squares linear regression fit.

18 / 41

Fitting a Linear Model

  • For simple least squares linear regression with one predictor variable (x), one can fit the model to data "by hand". The mathematical formulas are

b1 = (sy / sx) · R,

b0 = ȳ − b1 · x̄,

where

  • R is the correlation between x and y,

  • sy and sx are the sample standard deviations for y and x, and

  • ȳ and x̄ are the sample means for y and x.

  • Let's apply these formulas to the possum data set with x the total_l variable and y the head_l variable.
19 / 41

Applying the Regression Formulas

  • We need to compute the correlation, sample means, and sample standard deviations:
x <- possum$total_l; y <- possum$head_l
x_bar <- mean(x); y_bar <- mean(y)
s_x <- sd(x); s_y <- sd(y)
R <- cor(x,y)
  • Now we can compute our estimates b0 and b1:
(b_1 <- (s_y/s_x)*R)
## [1] 0.5729013
(b_0 <- y_bar - b_1*x_bar)
## [1] 42.70979
  • There is an R command, lm (linear model) that will compute these values and much more for us.
20 / 41

The lm Command

  • Let's see an example of how the lm command is used:
lm(head_l ~ total_l, data=possum)
##
## Call:
## lm(formula = head_l ~ total_l, data = possum)
##
## Coefficients:
## (Intercept) total_l
## 42.7098 0.5729
  • Notice that this returns the point estimates for the intercept b0 (Intercept) and the slope b1 (total_l), and that these values match what we obtained using the mathematical formulas on the previous slide. A short sketch of saving and reusing the fit follows this slide.
21 / 41
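
A brief sketch of saving and reusing the fit (the object name lm_fit matches the one used later in the lecture):

lm_fit <- lm(head_l ~ total_l, data = possum)   # store the fitted model in an object
coef(lm_fit)                                    # named vector with the intercept and slope estimates
coef(lm_fit)["total_l"]                         # just the slope estimate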

Interpreting Model Parameters

  • For a linear model,

    • The slope describes the estimated difference in the y variable if the explanatory variable x for a case happened to be one unit larger.

    • The intercept describes the average outcome of y if x=0 and the linear model is valid all the way to x=0, which in many applications is not the case.

  • To evaluate the strength of a linear fit, we compute R² (R-squared). The value of R² tells us the percent of variation in the response that is explained by the explanatory variable.

  • There are some pitfalls in interpreting the results of a linear model. In particular,

    • Applying a model to estimate values outside of the realm of the original data is called extrapolation. Generally, extrapolation is unreliable.

    • In many cases, even when there is a real association between variables, we cannot interpret a causal connection between the variables.

22 / 41
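
For simple linear regression, R² is just the square of the correlation R. A quick check with the possum fit, assuming it has been saved as lm_fit (as in the sketch after the lm slide):

cor(possum$head_l, possum$total_l)^2   # 0.6910937^2, about 0.478
summary(lm_fit)$r.squared              # matches the Multiple R-squared reported by summary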

Example

  • Before we discuss further details of regression, let's look at a detailed example of fitting and interpreting a linear model.

  • We will look at a linear model for the data set, cheddar from the faraway package.

  • Let's do this example together in RStudio.

23 / 41
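
A hedged sketch of how this example might begin (the particular predictor chosen below is just for illustration): the cheddar data in the faraway package records a taste score along with chemical measurements (Acetic, H2S, Lactic) for a number of cheese samples.

library(faraway)                                  # provides the cheddar data set
head(cheddar)                                     # taste plus chemical concentrations
cheddar_fit <- lm(taste ~ H2S, data = cheddar)    # one possible simple linear model
summary(cheddar_fit)                              # estimates, standard errors, R-squared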

Outlier Issues

  • Outliers in regression are observations that fall far from the cloud of points.

  • Outliers can have a strong influence on the least squares line.

  • Points that fall horizontally away from the center of the cloud tend to pull harder on the line, so we call them points with high leverage.

  • A data point is called an influential point if, had we fitted the line without it, the influential point would have been unusually far from the least squares line.

  • The next slide shows data with outliers together with the corresponding regression line and residual plot.

24 / 41
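
A small simulated sketch (not from the slides; all values are arbitrary) showing how a single high-leverage point can pull on the least squares line:

set.seed(2)
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30)          # data that follow a clear linear trend
x_out <- c(x, 25); y_out <- c(y, 0)   # add one point far to the right and well below the trend
coef(lm(y ~ x))                        # fit without the added point
coef(lm(y_out ~ x_out))                # the slope is pulled down by the influential point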

Regression Outliers

25 / 41

Inference for Regression

  • Recall that a simple linear model has the form

y = β0 + β1·x + ϵ

  • Least squares is a method for obtaining point estimates b0 and b1 for the parameters β0 and β1. Thus, β0 and β1 are unknowns that correspond to population values that we want to infer information about.

  • A somewhat subtle point is that we also do not know the population standard deviation σ for the error ϵ. This is an additional model parameter.

  • We would like answers to the following questions:

    • How do we obtain confidence intervals for β0 and β1? Specifically, how do we get the standard error?

    • How do we conduct hypothesis tests related to the parameters β0 and β1?

26 / 41

Confidence Intervals for Model Coefficients

  • Fact: The sampling distribution for the estimates of β0 and β1 is a t-distribution. So we can obtain confidence intervals with

bi ± t*df × SE(bi)

  • All the numerical information you need to obtain confidence intervals for β0 and β1 is provided in the output of the summary command for a linear model fit with lm.

  • Suppose we fit a linear model for the possum data again:

(lm_fit <- lm(head_l ~ total_l, data=possum))
##
## Call:
## lm(formula = head_l ~ total_l, data = possum)
##
## Coefficients:
## (Intercept) total_l
## 42.7098 0.5729
  • On the next slide, we print the output of summary(lm_fit)
27 / 41

lm summary output

summary(lm_fit)
##
## Call:
## lm(formula = head_l ~ total_l, data = possum)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1877 -1.5340 -0.3345 1.2788 7.3968
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.70979 5.17281 8.257 5.66e-13 ***
## total_l 0.57290 0.05933 9.657 4.68e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.595 on 102 degrees of freedom
## Multiple R-squared: 0.4776, Adjusted R-squared: 0.4725
## F-statistic: 93.26 on 1 and 102 DF, p-value: 4.681e-16
  • For obtaining a confidence interval, the relevant information is provided by Estimate, Std. Error, and the reported degrees of freedom.
28 / 41

Regression CI Example

  • Based on the information provided in the summary command output, we can construct confidence intervals for β0 and β1. First we observe that the degrees of freedom is 102; then the t*102 value for a 95% CI is
(t_ast <- -qt((1.0-0.95)/2,df=102))
## [1] 1.983495
  • Now we can obtain our confidence intervals for β0 and β1:
(beta_0_CI <- 42.71 + 1.98*c(-1,1)*5.17)
## [1] 32.4734 52.9466
(beta_1_CI <- 0.57 + 1.98*c(-1,1)*0.06)
## [1] 0.4512 0.6888
29 / 41
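
R can also compute these intervals directly with confint; the result should closely match the hand computation above (small differences come from rounding 42.71, 5.17, 0.57, 0.06, and 1.98).

confint(lm_fit, level = 0.95)   # 95% confidence intervals for the intercept and slope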

Hypothesis Testing for Linear Regression

  • There are actually several types of hypothesis tests one can conduct relating to linear regression models.

  • The most common test is of the form

    • H0: β1 = 0. The true linear model has slope zero. Versus

    • HA: β1 ≠ 0. The true linear model has a slope different from zero.

  • The summary command output includes a p-value for testing such a hypothesis. However, be aware that the lm command does not check whether the conditions for a linear model are met; the results of inference on the model parameters are only valid if those conditions hold.

30 / 41

Hypothesis Test for possum Data

  • Again, the output for summary(lm_fit) is
summary(lm_fit)
##
## Call:
## lm(formula = head_l ~ total_l, data = possum)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1877 -1.5340 -0.3345 1.2788 7.3968
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.70979 5.17281 8.257 5.66e-13 ***
## total_l 0.57290 0.05933 9.657 4.68e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.595 on 102 degrees of freedom
## Multiple R-squared: 0.4776, Adjusted R-squared: 0.4725
## F-statistic: 93.26 on 1 and 102 DF, p-value: 4.681e-16
  • The p-value corresponding to the slope estimate b1 is much smaller than 0.05, so we reject the null hypothesis H0: β1 = 0 at the α = 0.05 significance level.
31 / 41

Inference for Regression Video

  • Please watch this video to gain further perspective on inference for linear regression:
32 / 41

More R For Linear Models

  • To fit a linear model in R we use the lm command. The necessary input is a formula of the form y ~ x and the data. The summary command outputs all of the information relevant for inferential purposes.

  • However, the output from summary is not necessarily formatted in the most convenient way.

  • Another approach to working with regression model output is provided by functions in the broom package. For example, the tidy function from broom displays the results of the model fit:

tidy(lm_fit)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 42.7 5.17 8.26 5.66e-13
## 2 total_l 0.573 0.0593 9.66 4.68e-16
  • Let's go to R together, work some more examples, and learn to work with the broom functions.
33 / 41
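
Two other broom functions that are often handy (a brief sketch):

library(broom)
glance(lm_fit)    # one-row model summary: r.squared, sigma, F statistic, p-value, ...
augment(lm_fit)   # the data plus fitted values (.fitted) and residuals (.resid), useful for residual plots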

Worked Examples

  • Let's work some more examples relating to linear regression.
34 / 41

Comparing Many Means

  • In previous lectures, we studied inference for a difference of means. There are also statistical methods for comparing more than two means. The primary method is called analysis of variance (ANOVA), see 7.5.

  • ANOVA uses a single hypothesis to check whether the means across many groups are equal:

    • H0: μ1 = μ2 = ⋯ = μk. The mean outcome is the same across all groups. Versus

    • HA: At least one mean is different.

  • We must check three conditions for ANOVA:

  • (1) Observations are independent across groups. (2) The data within each group are nearly normal. (3) The variability across each group is about equal.

35 / 41

Example Data Motivating ANOVA

  • Let's consider our chicken feed data again. The first few rows are shown below:
head(chickwts)
## weight feed
## 1 179 horsebean
## 2 160 horsebean
## 3 136 horsebean
## 4 227 horsebean
## 5 217 horsebean
## 6 168 horsebean
  • The next slide shows a plot of this data.
36 / 41

Plot of Data

chickwts %>% ggplot(aes(x=feed,y=weight)) + geom_boxplot()

  • It appears that the mean weight is not the same across the feed groups. The question we would like to answer is: is the observed difference statistically significant?
37 / 41

The F Statistic and the F-Test

  • Analysis of variance (ANOVA) is used to test whether the mean outcome differs across 2 or more groups.

  • ANOVA uses a test statistic denoted by F, which represents a standardized ratio of variability in the sample means relative to the variability within groups.

  • ANOVA uses an F distribution to compute a p-value that corresponds to the probability of observing an F statistic value that is as or more extreme than the sample F statistic value under the assumption that the null hypothesis is true.

  • We will see how to conduct ANOVA and an F-test using R.

  • Before conducting ANOVA, we should discuss the necessary conditions for an ANOVA analysis.

38 / 41

Conditions for ANOVA

  • There are three conditions we must check for an ANOVA:

    • Independence. If the data are a simple random sample, this condition is satisfied.

    • Normality. As with one- and two-sample testing for means, the normality assumption is especially important when the sample size is small. Grouped histograms are a good way to diagnose potential problems with the normality assumption for ANOVA.

    • Constant variance. The variance in the groups should be close to equal. This assumption can be checked with side-by-side box plots.

39 / 41

ANOVA Examples

  • Let's see some examples of conducting ANOVA. We will do this together in R.
40 / 41
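
As a hedged preview of what we will do together in R: the standard tool for one-way ANOVA is aov, and summary of the fitted object reports the F statistic and p-value for the null hypothesis that all group means are equal.

chick_aov <- aov(weight ~ feed, data = chickwts)   # one-way ANOVA for the chicken feed data
summary(chick_aov)                                  # ANOVA table with the F statistic and p-value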

Next Time: Inference for Categorical Data

  • Now that we have explored inference for numerical data in some depth, our next topic is methods of inference for categorical data. The included video is a good place to start:
41 / 41
