Lecture 12

JMG

MATH 204


Multiple Regression

  • Multiple regression builds on the foundations of simple linear regression to allow for more than one predictor. Watch the following video to get started.

Learning Objectives

  • After this lecture, you should

    • know how to fit a multiple regression model using lm,

    • understand and be able to interpret adjusted R², and

    • be able to use diagnostic plots to assess the validity of a linear fit.


Motivating Data

  • Consider the mariokart data set, which consists of eBay auction data for the game Mario Kart for the Nintendo Wii, collected in early October 2009.
head(mariokart)
## # A tibble: 6 x 12
## id duration n_bids cond start_pr ship_pr total_pr ship_sp seller_rate
## <dbl> <int> <int> <fct> <dbl> <dbl> <dbl> <fct> <int>
## 1 1.50e11 3 20 new 0.99 4 51.6 standard 1580
## 2 2.60e11 7 13 used 0.99 3.99 37.0 firstCl~ 365
## 3 3.20e11 3 16 new 0.99 3.5 45.5 firstCl~ 998
## 4 2.80e11 3 18 new 0.99 0 44 standard 7
## 5 1.70e11 1 20 new 0.01 0 71 media 820
## 6 3.60e11 3 19 new 0.99 4 45 standard 270144
## # ... with 3 more variables: stock_photo <fct>, wheels <int>, title <fct>
  • Let's obtain another view of this data.

Glimpse of mariokart

glimpse(mariokart)
## Rows: 141
## Columns: 12
## $ id <dbl> 150377422259, 260483376854, 320432342985, 280405224677, 17~
## $ duration <int> 3, 7, 3, 3, 1, 3, 1, 1, 3, 7, 1, 1, 1, 1, 7, 7, 3, 3, 1, 1~
## $ n_bids <int> 20, 13, 16, 18, 20, 19, 13, 15, 29, 8, 15, 15, 13, 16, 6, ~
## $ cond <fct> new, used, new, new, new, new, used, new, used, used, new,~
## $ start_pr <dbl> 0.99, 0.99, 0.99, 0.99, 0.01, 0.99, 0.01, 1.00, 0.99, 19.9~
## $ ship_pr <dbl> 4.00, 3.99, 3.50, 0.00, 0.00, 4.00, 0.00, 2.99, 4.00, 4.00~
## $ total_pr <dbl> 51.55, 37.04, 45.50, 44.00, 71.00, 45.00, 37.02, 53.99, 47~
## $ ship_sp <fct> standard, firstClass, firstClass, standard, media, standar~
## $ seller_rate <int> 1580, 365, 998, 7, 820, 270144, 7284, 4858, 27, 201, 4858,~
## $ stock_photo <fct> yes, yes, no, yes, yes, yes, yes, yes, yes, no, yes, yes, ~
## $ wheels <int> 1, 1, 1, 1, 2, 0, 0, 2, 1, 1, 2, 2, 2, 2, 1, 0, 1, 1, 2, 0~
## $ title <fct> "~~ Wii MARIO KART &amp; WHEEL ~ NINTENDO Wii ~ BRAND NEW ~
  • Question: What features affect the final price (total_pr) at which a game is sold?

A First Model

  • As a start, we fit a linear model with the game condition (cond) as the only predictor:
lm_fit <- lm(total_pr ~ cond, data = mariokart)
summary(lm_fit)
##
## Call:
## lm(formula = total_pr ~ cond, data = mariokart)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.8911 -5.8311 0.1289 4.1289 22.1489
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 53.7707 0.9596 56.034 < 2e-16 ***
## condused -10.8996 1.2583 -8.662 1.06e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.371 on 139 degrees of freedom
## Multiple R-squared: 0.3506, Adjusted R-squared: 0.3459
## F-statistic: 75.03 on 1 and 139 DF, p-value: 1.056e-14

Results

  • Our first fit predicts that a used game will, on average, sell for about $10.90 less than a new game.

  • Question: Do you think that the condition of the game alone is sufficient to predict the price of the game? Explain why or why not.
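
  • As a quick check of this interpretation, we can compare the model's predictions for a new and a used game (a minimal sketch using base R's predict):

# Predicted mean total price for a new game and a used game;
# the difference between the two is the condused estimate, about -10.90
predict(lm_fit, newdata = data.frame(cond = c("new", "used")))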

Adding Predictors

  • As we will see, in R it is extremely easy to fit a model with many predictors. Why might we want to do this?

  • We would like to fit a model that includes all potentially important variables simultaneously.

  • Multiple regression can help us evaluate the relationship between a predictor variable and the outcome while controlling for the potential influence of other variables.

  • Let's fit a more complicated linear model.


Multiple Regression Model

  • A multiple regression model is a linear model with many predictors. In general, we write the model as

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

when there are k predictors. We always estimate the $\beta_i$ parameters using statistical software.

  • For example, we may want to use cond, stock_photo (whether the auction feature photo was a stock photo or not), duration (auction length, in days), and wheels (number of Wii wheels included in the auction) all as predictors of price for the mariokart data.

  • Let's obtain a linear fit with these predictors using lm.

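  • As an aside, the design matrix that lm builds from such a formula can be inspected with model.matrix; it shows how a factor like cond enters the model as a 0/1 indicator column (a minimal sketch):

# First rows of the design matrix for the four-predictor model;
# cond and stock_photo become indicator columns
head(model.matrix(total_pr ~ cond + stock_photo + duration + wheels,
                  data = mariokart))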

Another Fit

lm_fit2 <- lm(total_pr ~ cond + stock_photo + duration + wheels, data = mariokart)
summary(lm_fit2)
##
## Call:
## lm(formula = total_pr ~ cond + stock_photo + duration + wheels,
## data = mariokart)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3788 -2.9854 -0.9654 2.6915 14.0346
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.34153 1.71167 24.153 < 2e-16 ***
## condused -5.13056 1.05112 -4.881 2.91e-06 ***
## stock_photoyes 1.08031 1.05682 1.022 0.308
## duration -0.02681 0.19041 -0.141 0.888
## wheels 7.28518 0.55469 13.134 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.901 on 136 degrees of freedom
## Multiple R-squared: 0.719, Adjusted R-squared: 0.7108
## F-statistic: 87.01 on 4 and 136 DF, p-value: < 2.2e-16

Results

  • Notice that once we control for other features, the condition of the game (new versus used) has a smaller estimated effect on price: the slope estimate drops from -10.90 to -5.13.

  • For simple linear regression, we used R² to determine the amount of variability in the response that was explained by the model. Recall that

$$R^2 = 1 - \frac{\text{variability in residuals}}{\text{variability in the response}}$$

  • R² does not work well for multiple regression because it can only increase as predictors are added. Instead, we use adjusted R².
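
  • For instance, the multiple R² reported for lm_fit can be recomputed directly from this definition (a minimal sketch):

# R^2 from the definition: one minus the ratio of residual
# variability to response variability
1 - var(residuals(lm_fit)) / var(mariokart$total_pr)
# approximately 0.3506, matching summary(lm_fit)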

Adjusted R²

  • The adjusted R² is computed as

$$R^2_{\text{adj}} = 1 - \frac{s^2_{\text{residuals}}}{s^2_{\text{response}}} \times \frac{n-1}{n-k-1}$$

where n is the number of observations and k is the number of predictor variables. Remember that a categorical predictor with p levels will contribute p − 1 to the number of variables in the model.

  • Notice that the adjusted R² will be smaller than the unadjusted R².

  • One of the main benefits of using adjusted R² for multiple regression is that it accounts for model complexity.

  • The best model is not always the most complicated one. For one, more complex models are more likely to overfit.

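  • We can verify the adjusted R² reported for lm_fit2 directly from this formula (a minimal sketch; here n = 141 and k = 4):

n <- nrow(mariokart)  # 141 observations
k <- 4                # cond, stock_photo, duration, wheels
1 - var(residuals(lm_fit2)) / var(mariokart$total_pr) * (n - 1) / (n - k - 1)
# approximately 0.7108, matching summary(lm_fit2)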

Model Selection

  • Model selection seeks to identify variables in the model that may not be helpful.

  • The model that includes all available explanatory variables is referred to as the full model.

  • There are a variety of model selection strategies that are used in practice. We will discuss two of the more common approaches.


Model Selection Video

  • This video provides further perspective on model selection.

Model Selection Strategies

  • Backward Elimination. In this approach, we would identify the predictor corresponding to the largest p-value. If the p-value is above the significance level (usually α=0.05), then we drop that variable, refit the model, and repeat the process. If the largest p-value is less than the significance level, then we would not eliminate any predictors.

  • Forward Selection. This approach begins with no predictors. We fit a model with each individual predictor one at a time and keep the predictor with the smallest p-value. At each subsequent step, we add the remaining predictor whose addition yields the smallest p-value, provided that p-value is below the significance level. We stop when no remaining predictor can be added with a p-value below the significance level.

  • It is important to note that backward elimination and forward selection may not produce the same final model.

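  • Applied to lm_fit2, for example, duration has the largest p-value (0.888), so backward elimination would drop it first (a minimal sketch of one step):

# Step 1 of backward elimination: drop duration and refit
lm_fit3 <- lm(total_pr ~ cond + stock_photo + wheels, data = mariokart)
summary(lm_fit3)$coefficients  # inspect the new p-values and repeat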

Model Selection Example

  • Let's see both backward elimination and forward selection applied to the mariokart data. Note that the full model is (in R formula notation) total_pr ~ cond + stock_photo + duration + wheels.
  • Let's work out the details together in R.

  • Backward elimination and forward selection use p-values to decide which variables make up the final model, but other approaches to model selection use different criteria. For example, one could seek the model with the largest adjusted R² value. Information-theoretic measures such as AIC and BIC are also commonly used. A full discussion of these criteria falls outside the scope of this course.

  • We note that many statistical software environments provide packages implementing variable selection algorithms. For example, olsrr is an R package that implements a variety of variable selection methods.

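  • As a sketch of what this looks like with olsrr (assuming the package is installed; consult its documentation for setting the significance threshold):

# install.packages("olsrr")  # if needed
library(olsrr)

# p-value-based backward elimination and forward selection,
# starting from the full model
ols_step_backward_p(lm_fit2)
ols_step_forward_p(lm_fit2)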

Checking Model Conditions

  • Multiple regression methods using the model

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

generally depend on the following four conditions:

  • the residuals of the model are nearly normal,

  • the variability of the residuals is nearly constant,

  • the residuals are independent, and

  • each variable is linearly related to the response.

  • Diagnostic plots can be used to check each of these conditions.
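
  • In R, calling plot on an lm object produces a standard set of diagnostic plots (a minimal sketch):

# Residuals vs fitted, normal Q-Q, scale-location, and
# residuals vs leverage, arranged in a 2-by-2 grid
par(mfrow = c(2, 2))
plot(lm_fit2)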

Histogram of Residuals

  • A histogram of the residuals can be used to check for outliers. For example, the residuals for the final model for the mariokart data have the following histogram:

  • There are no extreme outliers present.
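
  • A sketch of how to produce such a histogram, assuming (as suggested by the backward elimination sketch above) that the final model keeps cond and wheels:

# Hypothetical final model after selection (an assumption here)
lm_final <- lm(total_pr ~ cond + wheels, data = mariokart)
hist(residuals(lm_final),
     xlab = "Residuals", main = "Residuals of the final model")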

Absolute Value of Residuals

  • A plot of the absolute value of residuals versus fitted values is helpful to check the condition that the variance of residuals is approximately constant.

  • There is no evident pattern, so the constant variability condition appears reasonable.
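
  • This plot can be produced as follows (a minimal sketch, using lm_final from above):

# Absolute residuals against fitted values
plot(fitted(lm_final), abs(residuals(lm_final)),
     xlab = "Fitted values", ylab = "|Residuals|")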

Additional Diagnostic Plots

  • It can also be useful to examine the following types of plots (sketched in code after this list):

    • Residuals in the order of their data collection. Such a plot is helpful in identifying any connection between cases that are close to one another.

    • Residuals against each predictor. We are looking for any notable change in variability between groups.

  • These plots are shown for model results for the mariokart data on pages 369 and 370 of the textbook. Let's look at these together and discuss.

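  • A minimal sketch of these two kinds of plots in base R, again using lm_final from above (row order stands in for data-collection order, which the data set does not record explicitly):

# Residuals in row order (a stand-in for data-collection order)
plot(residuals(lm_final), xlab = "Order in data set", ylab = "Residuals")

# Residuals against each remaining predictor; plotting against a
# factor such as cond yields side-by-side boxplots
plot(mariokart$cond,   residuals(lm_final), ylab = "Residuals")
plot(mariokart$wheels, residuals(lm_final), ylab = "Residuals")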

Diagnostic Plots Video

  • Watch the following video for further perspective on assessing multiple regression with plots.

Further Regression

  • When it comes to regression, we have only scratched the surface. There is more we could discuss regarding multiple regression, and there are also other types of regression. The following video provides an introduction to logistic regression.

  • For even more on regression, we recommend the text Linear Models with R by Faraway.

