Welcome to MATH 204 Introduction to Statistics!
The course description, learning outcomes, grade scheme, etc. may be found in the course syllabus posted on the course learning management system.
Please make sure you have read the syllabus carefully before the next class meeting.
If you have any questions regarding the syllabus, feel free to ask the instructor in person (LSC 319A) or via email (jason.graham@scranton.edu).
Our first quiz will contain questions about the syllabus.
The number one rule for this course: ask a lot of questions.
The number two rule for this course: bring your computer to class each day.
The required textbook for this course is the 4th edition of OpenIntro Statistics, which is available for free and can be downloaded as a PDF file if you wish. You may also purchase a print copy of the book for a modest price.
There are a number of very useful resources associated with this book such as lecture videos, lecture slides, data sets, etc. We will make extensive use of many of these additional resources.
In this course, we will exploit the power of the R statistical computing environment and the interface to R provided by RStudio. These can both be accessed via a web browser by using RStudio Cloud.
R, RStudio, and RStudio Cloud are all free.
You must sign up for a free RStudio Cloud account (you can use an existing Google account if you have one). My plan this semester is to add everyone to an RStudio Cloud workspace where you will be able to access homework and lab assignments.
What can you do with R?
Great question!
2 + 2
## [1] 4
mean(c(5,7,2,3,2,5,4,7))
## [1] 4.375
rnorm(10)
##  [1]  0.6341227  0.1434006 -0.9974754  2.4204931 -2.0205946 -3.2013405
##  [7] -0.4428438  1.8298432 -1.7529270  1.3838205
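Because rnorm(10) draws ten values at random from a standard normal distribution, the output changes every time the command is run. Here is a minimal sketch of how to make such results repeatable, using only base R. The seed value 204 and the variable names are arbitrary choices made for this illustration.

```r
# Fix the seed of R's random number generator so the draws can be repeated.
# The seed value 204 is an arbitrary choice for this sketch.
set.seed(204)
first_draw <- rnorm(10)

# Resetting the same seed reproduces exactly the same ten numbers
set.seed(204)
second_draw <- rnorm(10)

identical(first_draw, second_draw)

# Sample mean and sample standard deviation of the ten draws
mean(first_draw)
sd(first_draw)
```

Setting a seed is a common habit when sharing an analysis, since it lets someone else rerun the code and see the same numbers.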
R is a programming language, and it takes a little time to learn. We will soon work through an introduction to R and RStudio.
Once we get over the starting hurdle for learning R, I think you will really enjoy it. Plus, R will make learning statistics a much more enjoyable and useful experience.
If you want to start learning some R, or if you find that you want to learn more R than what we cover in this course, a great resource is the swirl package which allows you to learn R interactively within R. Now that's meta 😄.
I'm also happy to provide you with other resources upon request.
Our dog Ziggy says, "hello!"
Obviously MATH 204 is about statistics, but what does this mean?
Statistics is fundamentally about data: how to collect data, how to analyze data, and how to use data to make inferences and draw conclusions about the real world.
Section 1.1 of the textbook presents a case study to motivate the study of statistics. There is also a corresponding lecture video which you are asked to watch on your own time. For your convenience, the video is included in the next slide.
Some things to think about when watching the video are:
The term "data" can be interpreted very broadly. However, the data that one typically analyzes using statistics or statistical methods have some common features that we will take a moment to point out:
The data is "structured" in the sense that it has an underlying order to it. This will be explained in greater detail soon.
The data is typically a "sample" in that it is only a small part of all of the data one could possibly collect.
Consider the following question:
"How much sleep do University of Scranton students get during the first week of classes?"
Can you think of a way to answer this question?
One way, at least in principle, would be to ask every single student at the University of Scranton to tell us how much they sleep during the first week of classes.
Do you think this is a good way to answer our question or not?
Observe that our question[1] is about a population[2], in this case, all University of Scranton students. If we (hypothetically) record how much every single UofS student sleeps during the first week of classes, it might look something like this:
##    r_number  Sun Mon Tues Wed Thurs Fri Sat   living  year college
## 1 R01920433  7.4 6.0  4.8 7.7   5.3 9.6 9.0   Martin First    PCPS
## 2 R01780024 10.0 7.2  3.9 7.3   4.9 9.3 9.9   Lynett First    KSOM
## 3 R01816495  8.6 6.7  7.2 5.6   5.3 8.5 8.7   Lynett First     CAS
## 4 R01647948  6.8 6.5  8.3 5.6   4.8 9.4 9.0 Driscoll First    PCPS
## 5 R01782171  8.2 6.6  6.1 5.6   4.8 8.3 9.4    Fitch First     CAS
(Note: There would be around 4,000 rows, so only some are shown here.)
A population is often large, and it is usually infeasible to observe every individual in it.
In practice, we take a sample and try to use the sample to infer something about the population.
[1] Recall that our question is "How much sleep do University of Scranton students get during the first week of classes?"
[2] Read section 1.3.1 for a discussion of populations and samples.
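To make the idea of using a sample to learn about a population concrete, here is a small sketch in R. The population below is simulated, not real student data: the population size of 4000 echoes the "around 4,000 rows" mentioned above, while the mean of 7 hours, the standard deviation of 1.2 hours, and the seed are assumptions made purely for illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Hypothetical population: average nightly sleep (in hours) for 4000 students.
# These values are simulated, not real measurements.
population <- rnorm(4000, mean = 7, sd = 1.2)

# A random sample of 25 students, drawn without replacement
sleep_sample <- sample(population, size = 25)

mean(population)    # population mean (usually unknown in practice)
mean(sleep_sample)  # sample mean, used as an estimate of the population mean
```

The two means will generally be close but not equal. How close we can expect them to be is one of the main questions this course addresses.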
Here is a sample of the population of size n=25:
##     r_number  Sun Mon Tues Wed Thurs  Fri Sat        living   year college
##  1 R01718887  7.9 6.3  5.3 5.0   4.6  9.7 9.2 Junior/Senior Fourth     CAS
##  2 R01943517  7.4 5.8  5.9 6.2   4.9  7.5 8.9       McCourt  First     CAS
##  3 R01866257  9.0 6.5  6.5 6.4   4.0  8.5 9.7 Junior/Senior Fourth    KSOM
##  4 R01942243  5.8 5.5  5.4 6.0   4.5  9.1 8.6    Off Campus  Third     CAS
##  5 R01934011  6.4 5.1  5.5 7.3   5.2  9.6 8.9    Off Campus  Third    PCPS
##  6 R01691560  9.4 7.2  6.0 7.0   5.2  9.5 9.2       Gavigan Second    PCPS
##  7 R01718539  7.7 5.2  5.8 5.7   5.4 10.2 8.5 Junior/Senior Fourth    KSOM
##  8 R01615516  6.4 5.5  5.2 7.4   4.8 11.7 9.1      Driscoll  First     CAS
##  9 R01769847  6.6 5.7  7.6 5.1   5.1  8.4 9.3    Off Campus Fourth     CAS
## 10 R01850111  9.3 6.4  7.0 6.6   5.2  7.5 9.5    Off Campus  Third    KSOM
## 11 R01574251  9.5 5.3  6.1 2.8   5.0  9.3 9.3            DE  First     CAS
## 12 R01599827  8.6 7.2  6.0 7.4   3.2  9.5 8.8       Gavigan Second    KSOM
## 13 R01590698  9.0 6.3  6.0 6.7   5.6 12.3 9.5         Casey  First     CAS
## 14 R01950549  6.6 6.3  7.3 6.1   4.6  9.7 8.9    Off Campus Second     CAS
## 15 R01804370 10.4 7.2  4.8 4.2   4.8 10.4 9.1 Junior/Senior Fourth    KSOM
## 16 R01839042  6.9 6.2  5.7 6.6   5.3  9.5 9.2       Gavigan Second     CAS
## 17 R01937793  7.8 4.4  8.0 4.3   5.0  8.1 8.8    Off Campus  Third    KSOM
## 18 R01716376  8.9 6.2  5.2 5.1   5.3  7.0 8.1    Off Campus  Third     CAS
## 19 R01937156  8.2 7.1  4.6 5.5   4.6  8.1 9.0       Gavigan Second    KSOM
## 20 R01589019  6.7 7.2  5.5 6.3   4.6  7.5 8.9 Junior/Senior  Third     CAS
## 21 R01632235  7.9 5.5  6.3 5.1   4.7  7.9 8.3  Giblin-Kelly  First     CAS
## 22 R01624959  8.7 6.9  6.3 6.6   5.5  8.6 8.5       McCourt  First     CAS
## 23 R01913524  6.0 4.4  6.6 7.0   5.7 10.1 9.6 Junior/Senior Fourth     CAS
## 24 R01578674  5.1 6.4  5.6 7.8   4.5  7.2 9.2 Junior/Senior  Third     CAS
## 25 R01752767  7.4 5.4  5.1 5.2   5.3  9.5 8.4 Junior/Senior Fourth     CAS
How did we obtain the sample data? We selected 25 UofS students at random. The "at random" part is important and we will return to this point shortly.
Section 1.3 covers sampling principles in detail.
Before we get into sampling principles, let's take a moment to reflect on how our (sample) data is represented.
Let's look at the first five rows of our sample data:
##    r_number Sun Mon Tues Wed Thurs Fri Sat        living   year college
## 1 R01718887 7.9 6.3  5.3 5.0   4.6 9.7 9.2 Junior/Senior Fourth     CAS
## 2 R01943517 7.4 5.8  5.9 6.2   4.9 7.5 8.9       McCourt  First     CAS
## 3 R01866257 9.0 6.5  6.5 6.4   4.0 8.5 9.7 Junior/Senior Fourth    KSOM
## 4 R01942243 5.8 5.5  5.4 6.0   4.5 9.1 8.6    Off Campus  Third     CAS
## 5 R01934011 6.4 5.1  5.5 7.3   5.2 9.6 8.9    Off Campus  Third    PCPS
Our data is organized into rows and columns, a so-called data matrix or data frame. Each row corresponds to a single observation which in this example is a single student.
The columns of our data correspond to variables, that is, the information or characteristics we observe and record about our observations.
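The row and column structure described above can be explored directly in R. The following sketch rebuilds the first five rows of our sample by hand (the values are copied from the table above; the object name sleep is a choice made for this example) and then asks R about its rows and columns.

```r
# The first five rows of the sample, typed in by hand from the table above
sleep <- data.frame(
  r_number = c("R01718887", "R01943517", "R01866257", "R01942243", "R01934011"),
  Sun   = c(7.9, 7.4, 9.0, 5.8, 6.4),
  Mon   = c(6.3, 5.8, 6.5, 5.5, 5.1),
  Tues  = c(5.3, 5.9, 6.5, 5.4, 5.5),
  Wed   = c(5.0, 6.2, 6.4, 6.0, 7.3),
  Thurs = c(4.6, 4.9, 4.0, 4.5, 5.2),
  Fri   = c(9.7, 7.5, 8.5, 9.1, 9.6),
  Sat   = c(9.2, 8.9, 9.7, 8.6, 8.9),
  living  = c("Junior/Senior", "McCourt", "Junior/Senior", "Off Campus", "Off Campus"),
  year    = c("Fourth", "First", "Fourth", "Third", "Third"),
  college = c("CAS", "CAS", "KSOM", "CAS", "PCPS")
)

nrow(sleep)   # number of rows, i.e. observations (students)
ncol(sleep)   # number of columns
names(sleep)  # column names
sleep[2, ]    # the second observation: one student, all columns
sleep$Mon     # one column: hours of sleep on Monday for every student
```

Selecting a row returns everything recorded about one student, while selecting a column returns one recorded characteristic across all students.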
Variables can usually be classified according to a type system.
r_number - not a variable
Sun - Sat - continuous numerical
living & college - nominal categorical
year - ordinal categorical
We repeat the first five rows of our sample data:
##    r_number Sun Mon Tues Wed Thurs Fri Sat        living   year college
## 1 R01718887 7.9 6.3  5.3 5.0   4.6 9.7 9.2 Junior/Senior Fourth     CAS
## 2 R01943517 7.4 5.8  5.9 6.2   4.9 7.5 8.9       McCourt  First     CAS
## 3 R01866257 9.0 6.5  6.5 6.4   4.0 8.5 9.7 Junior/Senior Fourth    KSOM
## 4 R01942243 5.8 5.5  5.4 6.0   4.5 9.1 8.6    Off Campus  Third     CAS
## 5 R01934011 6.4 5.1  5.5 7.3   5.2 9.6 8.9    Off Campus  Third    PCPS
r_number is not a variable because it is a unique identifier for each observation.
Note that just because the value of a variable is a number does not necessarily make it numerical. For example, we could have recorded year as 1, 2, 3, or 4 instead of "First", "Second", "Third", or "Fourth". A useful rule of thumb: if it doesn't make sense to compute the average of a variable, then it is not numerical.
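Here is a short sketch of the point about year in R. The codes below correspond to the class years in the first five rows of our sample (Fourth, First, Fourth, Third, Third). Storing them as an ordered factor is one common way to tell R that a variable is ordinal categorical; the variable names are chosen just for this example.

```r
# Class year recorded as numeric codes for five students
year_code <- c(4, 1, 4, 3, 3)

# R will happily average the codes, but the result has no sensible meaning
mean(year_code)

# Storing the variable as an ordered factor records that it is
# categorical with a natural ordering (ordinal)
year <- factor(year_code,
               levels = 1:4,
               labels = c("First", "Second", "Third", "Fourth"),
               ordered = TRUE)

table(year)        # counts per category are meaningful
year[1] > year[2]  # order comparisons are meaningful for ordinal data
```

Once year is stored as a factor, R treats it as categorical rather than as a number (mean(year) then returns NA with a warning), which guards against accidentally averaging it.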
You should watch the following video on your own time.
Which variable in the county data set discussed in the video is discrete, and why?
Consider the possum data set from the openintro R package that accompanies the course textbook. The first few rows are shown here:
head(possum) # R command used to print first few rows of a data frame
## # A tibble: 6 × 8
##    site pop   sex     age head_l skull_w total_l tail_l
##   <int> <fct> <fct> <int>  <dbl>   <dbl>   <dbl>  <dbl>
## 1     1 Vic   m         8   94.1    60.4    89     36
## 2     1 Vic   f         6   92.5    57.6    91.5   36.5
## 3     1 Vic   f         6   94      60      95.5   39
## 4     1 Vic   f         6   93.2    57.1    92     38
## 5     1 Vic   f         2   91.5    56.3    85.5   36
## 6     1 Vic   f         1   93.1    54.8    90.5   35.5
State the type of each variable in the data matrix.
site, pop, and sex are nominal categorical
age is discrete numerical
head_l to tail_l are continuous numerical
"How much sleep do University of Scranton students get during the first week of classes?"
We need to consider how data are collected and how samples are obtained.
Importantly, we need to avoid, as much as possible, selecting a biased sample.
Watch this video on your own time in order to reinforce the concepts of:
Consider the question:
"What is the most popular Starbucks drink for current students at the University?"
One approach to data collection for answering this question is to select some subset of students at the U and ask them about their favorite Starbucks drink.
For example, out of convenience we can ask this question to everyone in this class. In this case, our sample would be the students in this class.
However, restricting our sample to students in a single class might not produce a sample that is sufficiently representative of the entire population. In other words, we may introduce bias by taking as our sample only students in this class.
It is preferable, and often essential, to obtain a sample by choosing individuals at random from the population. Random samples reduce bias!
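A small simulation can illustrate why a convenience sample may be biased. Everything in the sketch below is hypothetical: the population of 4000 students, the preference rates of 0.7 and 0.4, and the assumption that an introductory class consists only of first-year students are made up for illustration and are not based on real data.

```r
set.seed(2)  # arbitrary seed so the sketch is repeatable

# Hypothetical population of 4000 students. TRUE means the student's
# favorite drink is an iced coffee. First-year students are simulated
# to prefer it more often (70%) than everyone else (40%).
year <- sample(c("First", "Second", "Third", "Fourth"), 4000, replace = TRUE)
likes_iced <- runif(4000) < ifelse(year == "First", 0.7, 0.4)

# Convenience sample: 25 students who are all first-years,
# as might happen if we only ask one introductory class
convenience <- sample(which(year == "First"), 25)

# Random sample: 25 students chosen at random from the whole population
random <- sample(4000, 25)

mean(likes_iced)               # proportion in the whole population
mean(likes_iced[convenience])  # tends to overestimate the population proportion
mean(likes_iced[random])       # tends to land closer to the population proportion
```

With only 25 students, any single random sample can still miss the population proportion by chance. The difference is that the convenience sample is off systematically, no matter how many times we repeat it.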
Let's watch this video together and discuss.
Here we list and describe some of the most common sampling strategies:
Simple random sampling. In a simple random sample, each case in the population has an equal chance of being included in the sample.
Stratified sampling. The population is divided into groups called strata that are chosen so that similar cases are grouped together. Then, some other sampling method such as simple random sampling is used to select a sample from within each group. (In our Starbucks example we could first divide students by cohort, First year, Second year, etc. and then take a sample from each cohort).
Cluster sampling. The population is broken up into many groups called clusters. Then we sample a fixed number of clusters and include all observations from each of those clusters.
Multistage sampling. This is like cluster sampling but rather than keeping all observations in each cluster, we collect a random sample within each cluster.
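The four strategies above can each be sketched in a few lines of base R. The data frame below is a made-up sampling frame (400 students, 100 per class year, spread evenly over 20 hypothetical residence halls that play the role of clusters), and the sample sizes are arbitrary choices for illustration.

```r
set.seed(3)  # arbitrary seed so the sketch is repeatable

# Hypothetical sampling frame: 400 students, 100 per class year,
# 20 students in each of 20 made-up halls (the clusters)
students <- data.frame(
  id   = 1:400,
  year = rep(c("First", "Second", "Third", "Fourth"), each = 100),
  hall = rep(paste0("Hall", 1:20), times = 20),
  stringsAsFactors = FALSE
)

# Simple random sampling: every student has the same chance of selection
srs <- students[sample(nrow(students), 40), ]

# Stratified sampling: a simple random sample of 10 within each class year
by_year <- split(students, students$year)
stratified <- do.call(rbind, lapply(by_year, function(s) s[sample(nrow(s), 10), ]))

# Cluster sampling: choose 4 halls at random and keep every student in them
chosen_halls <- sample(unique(students$hall), 4)
cluster <- students[students$hall %in% chosen_halls, ]

# Multistage sampling: within the same 4 chosen halls,
# take a random sample of 5 students from each hall
by_hall <- split(cluster, cluster$hall)
multistage <- do.call(rbind, lapply(by_hall, function(s) s[sample(nrow(s), 5), ]))

table(stratified$year)   # 10 students from each class year
table(cluster$hall)      # all 20 students from each chosen hall
table(multistage$hall)   # 5 students from each chosen hall
```

Notice that the stratified sample guarantees every class year is represented, while the cluster and multistage samples only ever visit a few halls, which can make data collection cheaper.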
Many analyses are motivated by a researcher looking for a relationship between two or more variables.
When two variables show some connection with one another, they are called associated variables.
"A pair of variables are either related in some way (associated) or not (independent). No pair of variables is both associated and independent."
It is important to point out that association does not imply causation.
In the next class meeting, we will go through an introduction to R where we will work with some data and see explicit examples where two (or more) variables in a data set might be related.
When we ask questions about the relationship between two variables, we sometimes also want to determine if the change in one variable causes a change in the other.
For example, consider the county data set from the openintro R package. The first few rows of select columns of this data set are shown below:
## # A tibble: 6 × 4
##   name           state   pop_change median_hh_income
##   <chr>          <fct>        <dbl>            <int>
## 1 Autauga County Alabama       1.48            55317
## 2 Baldwin County Alabama       9.19            52562
## 3 Barbour County Alabama      -6.22            33368
## 4 Bibb County    Alabama       0.73            43404
## 5 Blount County  Alabama       0.68            47412
## 6 Bullock County Alabama      -2.28            29655
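One common way to begin exploring whether two numerical variables are associated is to compute their correlation, a tool we have not formally introduced yet. The sketch below types in only the six rows printed above (the object name county_head is made up for this example). Six counties from a single state are far too few to draw any conclusion, so treat this purely as an illustration of the mechanics.

```r
# The six rows printed above, typed in by hand
county_head <- data.frame(
  name = c("Autauga County", "Baldwin County", "Barbour County",
           "Bibb County", "Blount County", "Bullock County"),
  pop_change = c(1.48, 9.19, -6.22, 0.73, 0.68, -2.28),
  median_hh_income = c(55317, 52562, 33368, 43404, 47412, 29655)
)

# Correlation coefficient: a number between -1 and 1 that measures the
# strength and direction of the linear association between two
# numerical variables
r <- cor(county_head$median_hh_income, county_head$pop_change)
r
```

For these six rows the correlation is positive, meaning the counties with higher median household income tended to have larger population change. Even so, an association like this does not by itself tell us whether one variable causes the other.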
"If there is an increase in the median household income in a county, does this drive an increase in its population?"
In this lecture, we have covered the following topics and concepts:
Course logistics, syllabus, textbook, use of R, etc.
Data and its structure, e.g., data matrices and variable types.
Sampling principles and strategies.
Relationship between variables.
Before the next class meeting, please complete the following tasks:
Read the syllabus carefully.
Respond via email to my "getting to know you" prompt.
Review Chapter 1 of the textbook.
Register for RStudio Cloud. If you already have a Google account, you can log in with it.
Accept my invitation to the MATH204 RStudio Cloud Workspace.
Make sure to bring your computer with you to class.
Have a great rest of the day.
If you want to get a head start for the next class or two, watch this video: