# Lesson 8: Review for Exam 1

(Redirected from Lesson 8)

To prepare for the exam, you are strongly encouraged to review the readings, homework exercises, and other activities from Unit 1. Use the to review definitions of important terms. Here are the lesson outcomes that you should prepare to be tested on for the exam.

## 1 Lesson Outcomes

The expectation on the exam is that you should be able to do the following:

• Distinguish between a categorical and a quantitative variable.
• Distinguish between an observational study and an experiment.
• Distinguish between a population and a sample.
• Distinguish between a parameter and a statistic.
• Distinguish and give an example of each of the following sampling schemes:
• Simple random sampling
• Systematic sampling
• Cluster sampling
• Stratified sampling
• Convenience sampling
• Explain the significance of using a random sample.
• Determine the shape of a distribution using a histogram and/or box plot.
• Determine the centers of a given histogram and/or box plot.
• Identify the mean, median and standard deviation in skewed or normal histograms.
• Calculate the mean, median and standard deviation from quantitative data.
• Calculate a percentile from a quantitative data set.
• Calculate a five-number summary from quantitative data with Excel, or by hand.
• Create a histogram and a box-plot from quantitative data.
• State and apply the three axioms of probability.
• State the properties of a normal density curve.
• Calculate the z-score of an individual observation, given the mean and standard deviation.
• Interpret a z-score.
• Calculate probability as area under a normal density curve.
• Assess normality using a QQ-plot.
• Explain how a sampling distribution is created.
• Determine the mean, standard deviation and shape of a distribution of sample means.
• State and apply the Central Limit Theorem and the Law of Large numbers.
• Calculate probabilities using a distribution of sample means.

## 2 Lesson Summaries

Click on the link at right for a review of the summaries from each lesson.

Here are the summaries for each lesson in unit 1. Reviewing these key points from each lesson will help you in your preparation for the exam.

Lesson 01 Recap
• Each lesson follows the same schedule: Individual and Group Preparation, Class Meeting, and Homework Assignment and Quiz. Understanding this layout will help you successfully manage the workload of this class.
• In this class you will use the online textbook that has been written for you by your statistics teachers. All of the assignments and quizzes will be based on the readings, so study it well.
• By doing the work, staying on schedule, and living the Honor Code you can succeed in this class!

Lesson 02 Recap
• The Statistical Process has five steps: Design the study, Collect the data, Describe the data, Make inferences, Take action.
• In a designed experiment, researchers control the conditions of the study. In an observational study, researchers don't control the conditions but only observe what happens.
• There are many sampling methods used to obtain a sample from a population:
• A simple random sample (SRS) is a random selection taken from a population
• A systematic sample is every kth item in the population, beginning at a random starting point
• A cluster sample is all items in one or more randomly selected clusters, or blocks
• A stratified sample divides data into similar groups and an SRS is taken from each group
• A convenience sample is one easily obtained in a less-than-systematic way and should be avoided whenever possible
• Quantitative variables represent things that are numeric in nature, such as the value of a car or the number of students in a classroom. Categorical variables represent nonnumerical data that can only be considered as labels, such as colors or brands of shoes.
• The null hypothesis ($H_0$) is the foundational assumption about a population and represents the status quo. The alternative hypothesis ($H_a$) is a different assumption about a population. Using a hypothesis test, we determine whether it is more likely that the null hypothesis or the alternative hypothesis is true.

Lesson 03 Recap
• A histogram allows us to visually interpret data. Histograms can be left-skewed, right-skewed, or symmetrical and bell-shaped.
• The mean, median, and mode are measures of the center of a distribution. The mean is the most common measure of center, and is computed by adding up the observed data and dividing by the number of observations in the data set.
• The standard deviation is a number that describes how spread out the data are. A larger standard deviation means the data are more spread out than data with a smaller standard deviation.
• A parameter is a true (but usually unknown) number that describes a population. A statistic is an estimate of a parameter obtained from a sample.
• Quartiles/percentiles, Five-Number Summaries, and Boxplots are tools that help us understand data. The five-number summary of a data set contains the minimum value, the first quartile, the median, the third quartile, and the maximum value. A boxplot is a graphical representation of the five-number summary.

Lesson 04 Recap
• The three rules of probability are:
1. A probability is a number between 0 and 1.

$$0 ≤ P(X) ≤ 1$$

2. If you list all the outcomes of a probability experiment (such as rolling a die) the probability that one of these outcomes will occur is 1. In other words, the sum of the probabilities in any probability is 1.

$$\sum P(X) = 1$$

3. The probability that an outcome will not occur is 1 minus the probability that it will occur.

$$P(\text{not}~X) = 1 - P(X)$$

Lesson 05 Recap
• A normal density curve is symmetric and bell-shaped. The curve lies above the horizontal axis and the total area under the curve is equal to 1.
• A standard normal distribution has a mean of 0 and a standard deviation of 1. The 68-95-99.7% rule states that when data are normally distributed, approximately 68% of the data lie within 1 standard deviation from the mean, approximately 95% of the data lie within 2 standard deviations from the mean, and approximately 99.7% of the data lie within 3 standard deviations from the mean.
• A z-score tells us how many standard deviations away from the mean a given value is. It is calculated as: $\displaystyle{z = \frac{\text{value}-\text{mean}}{\text{standard deviation}} = \frac{x-\mu}{\sigma}}$
• The probability applet allows us to use z-scores to calculate proportions, probabilities, and percentiles.
• A Q-Q plot is used to assess whether or not a set of data is normally distributed.

Lesson 06 Recap
• The distribution of sample means is a distribution of all possible sample means ($\bar x$) for a particular sample size. It has a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$.
• The distribution of sample means is normal when $\bar x$ is normally distributed or when, thanks to the Central Limit Theorom (CLT), our sample size ($n$) is large.
• The Law of Large Numbers states that as the sample size ($n$) gets larger, the sample mean ($\bar x$) will get closer to the population mean ($\mu$).

Lesson 07 Recap
• When the distribution of sample means is normally distributed, we can use z-scores and the probability applet to calculate proportions and probabilities. A z-score is calculated as: $\displaystyle{z = \frac{\text{value}-\text{mean}}{\text{standard deviation}} = \frac{\bar x-\mu}{\sigma/\sqrt{n}}}$
• The $P$-value is the probability of getting a test statistic at least as extreme as the one you got, assuming $H_0$ is true. A $P$-value is calculated by finding the area under the normal distribution curve that is more extreme (farther away from the mean) than the z-score.