# Lesson 10: Inference for One Mean: Sigma Known (Confidence Interval)



## 1 Lesson Outcomes

By the end of this lesson, you should be able to:

• Confidence Intervals for a single mean with σ known:
• Calculate and interpret a confidence interval for a population mean given a confidence level.
• Explain how the margin of error changes with the sample size and the level of confidence.
• Identify a point estimate and margin of error for the confidence interval.
• Show the appropriate connections between the numerical and graphical summaries that support this confidence interval.
• Check the requirements for the confidence interval.
• Calculate a desired sample size given a level of confidence and margin of error.

## 2 Political Polls

During an election in the United States, many polls are conducted to determine the attitudes of likely voters. Poll results are usually reported as percentages. For example, a poll might state that 49% favor the Republican candidate and 51% favor the Democratic candidate.

Polls always include a margin of error. The margin of error gives an estimate of the variability in the responses. A common value for the margin of error in political polls is 3%.

When we incorporate the margin of error, we estimate that the true proportion of people who favor the Republican candidate is 49% ± 3%, or in other words between 46% and 52%. For the Democratic candidate, we get 48% to 54%. There is a lot of overlap in these numbers. In this case, the political race is too close to know who might win.

In this reading, we will explore the margin of error and its role in estimating a parameter.

## 3 Background

### 3.1 Point Estimators

We have learned about several statistics. Remember, a statistic is any number computed based on data. The sample statistics we have discussed are used to estimate population parameters.

| | Sample Statistic | Population Parameter |
|---|---|---|
| Mean | $\bar x$ | $\mu$ |
| Standard Deviation | $s$ | $\sigma$ |
| Variance | $s^2$ | $\sigma^2$ |
| $\vdots$ | $\vdots$ | $\vdots$ |

The statistics above are called point estimators because they are just one number (one point on a number line) that is used to estimate a parameter. Parameters are generally unknown values. Think about the mean. If $\mu$ is unknown, how do we know if $\bar x$ is close to it?

The short answer is that we will never know for sure if $\bar x$ is close to $\mu$. This does not mean that we are helpless. The laws of probability and the normal distribution provide a way for us to create a range of plausible values for a parameter (e.g. $\mu$) based on a statistic (e.g. $\bar x$).

### 3.2 Interval Estimators

A point estimator gives one specific value as an estimate of a parameter. An interval estimator is a range of plausible values for a parameter. We can create an interval estimate by starting with a point estimate and adding or subtracting the margin of error.

In the political poll mentioned above, the point estimate for the support of the Republican candidate is 49%. By adding and subtracting the margin of error, we get the interval estimate: 46% to 52%.

A confidence interval is a commonly used interval estimator. In this reading, we will explore how to create a confidence interval for the mean when $\sigma$ is known.

### 3.3 The Margin of Error

#### 3.3.1 Properties of Bell-shaped Curves

1. Fill in the blank in the following sentence.
"The 68-95-99.7% rule only applies for distributions that are _________."
The 68-95-99.7% rule only applies for distributions that are bell-shaped.
In the past, some students have answered that the data must be normally distributed. Actually, the 68-95-99.7% rule will work for any distribution that is mound-shaped and symmetrical. As long as the data are unimodal and not skewed left or skewed right, this rule works well.

2. Approximately what percentage of data from a bell-shaped distribution will lie within two standard deviations of the mean?
Using the 68-95-99.7% rule, about 95% of the observations will lie within two standard deviations of the mean.

#### 3.3.2 The Distribution of the Sample Mean

We learned in the reading Distribution of Sample Means & The Central Limit Theorem about the characteristics of the sample mean, $\bar x$. Specifically, if (i) the population from which the data are drawn is approximately normal or (ii) the sample size is large, then $\bar x$ will be approximately normally distributed. Furthermore, if the original population has mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of $\bar x$ will have mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

So, if either condition (i) or (ii) is met, then we can consider the sample mean $\bar x$ as a normal random variable with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. According to the 68-95-99.7% rule for symmetric bell-shaped distributions, about 95% of the time, the sample mean ($\bar x$) will lie within two standard deviations of the population mean ($\mu$).

This is an important concept. Make sure that you understand the logic above before you continue reading.

#### 3.3.3 How Far is $\bar x$ from $\mu$, or in other words, How Far is $\mu$ from $\bar x$?

Assuming that $\bar x$ is approximately normally distributed, about 95% of the time, it will be within two standard deviations of $\mu$.

Remember, the standard deviation of $\bar x$ is $\frac{\sigma}{\sqrt{n}}$. For the variable $\bar x$, two standard deviations would be equal to $2 \frac{\sigma}{\sqrt{n}}$.

If we collect a random sample from a population and $\bar x$ is normally distributed, then about 95% of the time the sample mean $\bar x$ will be less than $2 \frac{\sigma}{\sqrt{n}}$ units away from the population mean $\mu$. Notice that this is true, whether or not we know $\mu$.

We can express this as a probability statement:

$$P\left( \mu - 2 \frac{\sigma}{\sqrt{n}} < \bar X < \mu + 2 \frac{\sigma}{\sqrt{n}} \right) \approx 0.95$$
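The probability statement above can be checked with a small simulation. The sketch below is only an illustration: the population values (`MU = 100`, `SIGMA = 15`), the sample size, and the number of trials are hypothetical choices, not values from this lesson. It draws many samples from a normal population and counts how often $\bar x$ lands within $2 \frac{\sigma}{\sqrt{n}}$ of $\mu$.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

MU = 100.0       # hypothetical population mean
SIGMA = 15.0     # hypothetical (known) population standard deviation
N = 25           # hypothetical sample size
TRIALS = 10_000  # number of simulated samples

def sample_mean(n):
    """Draw n values from a normal population and return their mean."""
    return sum(random.gauss(MU, SIGMA) for _ in range(n)) / n

# Two standard deviations of the sample mean: 2 * sigma / sqrt(n)
two_sd = 2 * SIGMA / math.sqrt(N)

# Count the samples whose mean falls within two_sd of MU
hits = sum(abs(sample_mean(N) - MU) < two_sd for _ in range(TRIALS))
proportion_within = hits / TRIALS

print(f"Proportion of sample means within 2*sigma/sqrt(n) of mu: {proportion_within:.3f}")
```

The printed proportion should come out close to 0.95, although the exact value depends on the seed and the number of trials.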

Here is the magic: If $\bar x$ is within 2 standard deviations of $\mu$, then $\mu$ is within 2 standard deviations of $\bar x$. It may seem silly to state it, but this is very important.

Starting with
$$\mu - 2 \frac{\sigma}{\sqrt{n}} < \bar X < \mu + 2 \frac{\sigma}{\sqrt{n}}$$
we subtract $\mu$ from all parts of the inequality:
$$\mu - 2 \frac{\sigma}{\sqrt{n}} - \mu < \bar X - \mu < \mu + 2 \frac{\sigma}{\sqrt{n}} - \mu$$
which reduces to
$$- 2 \frac{\sigma}{\sqrt{n}} < \bar X - \mu < 2 \frac{\sigma}{\sqrt{n}}$$
Now, subtract $\bar X$ from all the terms:
$$- 2 \frac{\sigma}{\sqrt{n}} - \bar X < \bar X - \mu -\bar X < 2 \frac{\sigma}{\sqrt{n}} -\bar X$$
and this simplifies to
$$- 2 \frac{\sigma}{\sqrt{n}}-\bar X < -\mu < 2 \frac{\sigma}{\sqrt{n}}-\bar X$$
Multiplying all the terms by $-1$ (which reverses the inequalities), we get
$$2 \frac{\sigma}{\sqrt{n}}+\bar X > \mu > -2 \frac{\sigma}{\sqrt{n}}+\bar X$$
Rewriting this with the order of the terms reversed, we get
$$\bar X - 2 \frac{\sigma}{\sqrt{n}} < \mu < \bar X + 2 \frac{\sigma}{\sqrt{n}}$$

So, the statement $$P\left( \mu - 2 \frac{\sigma}{\sqrt{n}} < \bar X < \mu + 2 \frac{\sigma}{\sqrt{n}} \right) \approx 0.95$$ is equivalent to the statement $$P\left( \bar X - 2 \frac{\sigma}{\sqrt{n}} < \mu < \bar X + 2 \frac{\sigma}{\sqrt{n}} \right) \approx 0.95$$

## 4 Putting It All Together: Confidence Intervals for $\mu$ when $\sigma$ is Known

Notice that if we know $\sigma$, then with approximately 95% confidence, $\mu$ will be between the following two values: $$\left( \bar X - 2 \frac{\sigma}{\sqrt{n}}, ~ \bar X + 2 \frac{\sigma}{\sqrt{n}} \right)$$ This equation gives a confidence interval for $\mu$.

A confidence interval is actually a point estimate $\left( \bar x \right)$ plus or minus the margin of error $\left( 2 \frac{\sigma}{\sqrt{n}} \right)$.

We use the letter $m$ to denote the margin of error: $$m = 2 \frac{\sigma}{\sqrt{n}}$$ Using this definition for $m$, our confidence interval can be written as $$( \bar x - m, ~ \bar x + m )$$ or $$\bar x \pm m$$ This pattern will be repeated throughout the course as we create confidence intervals of the form: $$\left( \text{point estimate} - \text{margin of error}, ~ \text{point estimate} + \text{margin of error} \right)$$ or $$(\text{point estimate}) \pm (\text{margin of error})$$

### 4.1 A Little More Precision

#### 4.1.1 What $z$ Corresponds to 95% of the Area?

The 68-95-99.7% rule for bell-shaped distributions is just a quick approximation. This is useful for estimation, but more precision is usually required.

3. For a standard normal distribution, between what two $z$-scores will 95% of the data fall? In other words, find the values $-z$ and $z$ such that the area between them is equal to 0.95. Use the Normal Probability Applet.
To find the value of $z^*$, given a specific confidence level, do the following:
• Open the applet at Normal Probability Applet
• Shade the area between the two red lines (the middle portion of the graph) only and enter the decimal value for the desired percentage (e.g. 0.95 for 95%) in the Area box.
• The value of $z^*$ will be displayed below. Two numbers will be given along the horizontal axis. The only difference in these numbers is their sign: one is positive and one is negative. The positive value is the desired $z^*$.
95% of the area under the standard normal curve is between $z=-1.96$ and $z=1.96$.
For normally distributed data, 95% of the observations will fall within 1.96 standard deviations of the mean.

The actual formula for a 95% confidence interval for $\mu$ (when $\sigma$ is known) is: $$\left( \bar x - 1.96 \frac{\sigma}{\sqrt{n}}, \bar x + 1.96 \frac{\sigma}{\sqrt{n}} \right)$$ Please notice that the number 2 was used as an approximation for the actual value of 1.96. When computing a 95% confidence interval for a mean with $\sigma$ known, please use 1.96 in the equation.

#### 4.1.2 Worked Example: Rolling a Die 25 Times

We will compute a 95% confidence interval for the mean of $n=25$ rolls of a fair die. A die was rolled 25 times. The values rolled were: $$3 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 3 ~ ~ ~ 4 ~ ~ ~ 2 ~ ~ ~ 2 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 5$$ The mean of these values is $\bar x = 3.04$. It is a fact that for the outcome of a six-sided die, $\sigma=\sqrt{\frac{35}{12}} \approx 1.7078$.

Applying the formula for a 95% confidence interval, $$\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, ~ \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)$$ we get: $$\left( 3.04 - 1.96 \frac{1.7078}{\sqrt{25}}, ~ 3.04 + 1.96 \frac{1.7078}{\sqrt{25}} \right)$$ or $$(2.37, ~ 3.71)$$
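The arithmetic in this worked example can be reproduced in a few lines. The following is a minimal sketch that uses the 25 rolls listed above, the known $\sigma=\sqrt{35/12}$, and $z^* = 1.96$; the variable names are illustrative only.

```python
import math

# The 25 die rolls from the worked example
rolls = [3, 1, 2, 1, 1, 2, 4, 1, 1, 2, 6, 5, 6,
         3, 4, 2, 2, 5, 6, 2, 6, 4, 1, 1, 5]

n = len(rolls)
x_bar = sum(rolls) / n       # point estimate of mu
sigma = math.sqrt(35 / 12)   # known standard deviation of a fair die
z_star = 1.96                # multiplier for 95% confidence

# Margin of error and the two endpoints of the interval
margin = z_star * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, margin of error = {margin:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimal places, the endpoints agree with the interval $(2.37, ~ 3.71)$ found above.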

#### 4.1.3 What about Other Confidence Levels?

Most of the time, researchers report 95% confidence intervals. The number 95% is called the confidence level. Sometimes it is desirable to use a level of confidence that is different than 95%. In that case, we need to use a number besides 1.96 in the calculations.

Suppose we want to create a 90% confidence interval. What value would we put in the blanks in the confidence interval formula below? $$\left( \bar x - \underline{~ ~ ~ ? ~ ~ ~} \frac{\sigma}{\sqrt{n}}, ~ \bar x + \underline{~ ~ ~ ? ~ ~ ~} \frac{\sigma}{\sqrt{n}} \right)$$ Another way to state this question is to ask, between what two $z$-scores will 90% of the data in a standard normal distribution fall? We need to find the values of $-z$ and $z$ such that the area between them is equal to 0.90. Again, we use the Normal Probability Applet.

So, for a 90% confidence interval, the value of $z^*$ would be 1.645. Note: We use an asterisk ($*$) to indicate that $z$ was not computed from data, but was determined based on a chosen confidence level, 90%.

The formula for a 90% confidence interval for a mean when $\sigma$ is known is: $$\left( \bar x - 1.645 \frac{\sigma}{\sqrt{n}}, ~ \bar x + 1.645 \frac{\sigma}{\sqrt{n}} \right)$$

When $\sigma$ is known, we use $z^* = 1.96$ to create 95% confidence intervals, and we use $z^* = 1.645$ to create 90% confidence intervals.

#### 4.1.4 Formula for the Confidence Interval for $\mu$ ($\sigma$ Known)

With this notation, the confidence interval formula generalizes to: $$\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, ~ \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)$$ where $z^*$ is determined by the level of confidence. If you want a 90% confidence interval, then $z^*$ is the number of standard deviations on either side of the mean that you must go to capture 90% of the data.

4. If $\bar x$ follows a normal distribution and $\sigma$ is known, what is the equation for a 99% confidence interval for the true population mean?
For a 99% confidence level, $z^* = 2.576$. The equation for the 99% confidence interval becomes:
$\left( \bar x - 2.576 \frac{\sigma}{\sqrt{n}}, ~ \bar x + 2.576 \frac{\sigma}{\sqrt{n}} \right)$

5. Apply the equation from the previous problem to find a 99% confidence interval for the mean value rolled on the die. Use the data given above. For convenience, they are reproduced here:

$3 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 3 ~ ~ ~ 4 ~ ~ ~ 2 ~ ~ ~ 2 ~ ~ ~ 5 ~ ~ ~ 6 ~ ~ ~ 2 ~ ~ ~ 6 ~ ~ ~ 4 ~ ~ ~ 1 ~ ~ ~ 1 ~ ~ ~ 5$

To make a 99% confidence interval, we follow the same procedure as for a 95% confidence interval, but we use $z^*=2.576$. The 99% confidence interval for the true mean is:
$\left( 3.04 - 2.576 \frac{1.7078}{\sqrt{25}}, ~ 3.04 + 2.576 \frac{1.7078}{\sqrt{25}} \right)$
which simplifies to:
$(2.16, ~ 3.92)$

The most commonly used values for $z^*$ are:

| Confidence Level | $z^*$ |
|---|---|
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |

Any other values that you need can be determined using the Normal Probability Applet.
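If the applet is not at hand, the same values can be computed in software. The sketch below is one possible alternative, not part of the lesson's own toolkit: it uses `NormalDist.inv_cdf` from Python's standard `statistics` module, and the helper name `z_star` is made up for this illustration.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Return z* such that the area between -z* and z* under the
    standard normal curve equals confidence_level (e.g. 0.95)."""
    upper_tail = (1 - confidence_level) / 2  # area to the right of z*
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

The idea mirrors the applet: a middle area of 0.95 leaves 0.025 in each tail, so $z^*$ is the $z$-score with area 0.975 to its left.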

### 4.2 Factors Affecting the Width of the Confidence Interval

Recall that the formula to compute the confidence interval for a mean, where $\sigma$ is known is: $$\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, ~ \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)$$

6. What would happen to the confidence interval if the sample size $n$ was increased, but the other values were still the same?
Since the sample size $n$ is in the denominator, if $n$ is increased, $\sqrt{n}$ would increase, and the fraction $\frac{\sigma}{\sqrt{n}}$ would decrease, which would lead to a smaller margin of error. In other words, if the sample size is increased, the width of a confidence interval will decrease--it will become narrower.

7. What would happen to the confidence interval if the confidence level were increased, say from 95% to 99%?
Increasing the confidence level will lead to larger values of $z^*$. If $z^*$ is increased, then the margin of error $z^* \frac{\sigma}{\sqrt{n}}$ would increase. This would lead to a wider confidence interval.
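Both effects can be seen numerically. The sketch below computes the margin of error $z^* \frac{\sigma}{\sqrt{n}}$ for a few sample sizes and confidence levels. The value of $\sigma$ is the one for a fair die, and the sample sizes are arbitrary choices made for illustration.

```python
import math

def margin_of_error(z_star, sigma, n):
    """Margin of error for a mean when sigma is known."""
    return z_star * sigma / math.sqrt(n)

sigma = math.sqrt(35 / 12)  # standard deviation of a fair die

# Effect of sample size at a fixed 95% confidence level
for n in (25, 100, 400):
    print(f"n = {n:3d}: m = {margin_of_error(1.96, sigma, n):.3f}")

# Effect of confidence level at a fixed sample size of 25
for level, z in (("90%", 1.645), ("95%", 1.960), ("99%", 2.576)):
    print(f"{level}: m = {margin_of_error(z, sigma, 25):.3f}")
```

Because $n$ enters through $\sqrt{n}$, multiplying the sample size by four cuts the margin of error in half, while moving from 95% to 99% confidence widens the interval.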

### 4.3 Interpretation of Confidence Intervals

How do we interpret confidence intervals? What do they really mean?


Consider a coin with two sides: one called "heads" and the other called "tails". Imagine that you flipped this coin, but you have not looked at it yet. What is the probability that the coin shows heads?

Strangely enough, the answer is, it depends! If the head is facing up, then the probability that the coin shows heads is 1. If the head is facing down, then the probability that it shows heads is 0.

The coin has been tossed. There is no randomness left in the process. So, either the head is facing up or it is facing down. So, the probability that the coin shows heads is either 1 or 0. (We just don't know which.) The fact that we do not know the outcome does not change it or make it random.

Before we toss the coin, the probability that the coin will show heads is $\frac{1}{2}=0.5$. After we toss the coin, the probability that we get heads is 1 or 0.

Transferring this reasoning to confidence intervals, we get a similar result.

Once we have collected data on something, there is no randomness in the system. Any confidence interval that is created using that data will either contain the true parameter ($\mu$) or it will not. After collecting data, the probability that a specific confidence interval will contain $\mu$ is either 1 or 0.

The correct interpretation of a 95% confidence interval is to say, "We are 95% confident that the true mean lies within the lower and upper bounds of the confidence interval."

Consider the 95% confidence interval for the true mean of 25 rolls of a fair die. We found the 95% confidence interval to be: $(2.37, 3.71)$. When we interpret this confidence interval, we say, "We are 95% confident that the true mean is between 2.37 and 3.71."

The word, "confident" implies that if we repeated this process many, many times, 95% of the confidence intervals we would get would contain the true mean $\mu$. It does not imply anything about whether or not one specific confidence interval will contain the true mean.

We do not say that "there is a 95% probability (or chance) that the true mean is between 2.37 and 3.71." The probability that the true mean $\mu$ is between 2.37 and 3.71 is either 1 or 0.
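The "repeat this process many, many times" idea can be illustrated with a simulation. The sketch below assumes a fair die, for which the true mean is $\mu = 3.5$ and $\sigma=\sqrt{35/12}$; the number of repetitions and the random seed are arbitrary. It builds a 95% confidence interval from each of many samples of 25 rolls and records how often the interval contains $\mu$.

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is reproducible

MU = 3.5                    # true mean of a fair six-sided die
SIGMA = math.sqrt(35 / 12)  # known standard deviation of a fair die
N = 25                      # rolls per sample
Z_STAR = 1.96               # multiplier for 95% confidence
REPEATS = 10_000            # hypothetical number of repeated samples

def interval_from_new_sample():
    """Roll the die N times and return the 95% confidence interval."""
    rolls = [random.randint(1, 6) for _ in range(N)]
    x_bar = sum(rolls) / N
    margin = Z_STAR * SIGMA / math.sqrt(N)
    return x_bar - margin, x_bar + margin

covered = 0
for _ in range(REPEATS):
    lower, upper = interval_from_new_sample()
    if lower < MU < upper:  # this particular interval either contains mu or not
        covered += 1

coverage = covered / REPEATS
print(f"Fraction of intervals containing mu: {coverage:.3f}")
```

Each individual interval either contains 3.5 or it does not; the 95% describes the long-run fraction of intervals that do, which should be roughly what the simulation reports.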

### 4.4 Requirements

There are three requirements that need to be checked when computing a confidence interval for a mean with $\sigma$ known:

1. A simple random sample was drawn from the population
2. $\bar x$ is normally distributed
3. $\sigma$ is assumed to be known

The requirement of normality is satisfied if (a) the raw data are normally distributed or (b) the sample size is large. This procedure is robust to moderate departures from normality. Even if the requirement that $\bar x$ is normally distributed is not satisfied perfectly, it is usually okay to compute the confidence interval.

In practice, we never really know $\sigma$. This procedure is primarily used to help you understand the idea of confidence intervals. When $\sigma$ is unknown, we use a slightly different computation.

### 4.5 Example: Costs of CABG Surgery


This reading focuses on an important aspect of designing a study: determining the sample size.

It is important for health care administrators to know the mean hospital costs for patients who have coronary artery bypass graft (CABG) surgery. A large hospital is planning a study to determine their mean costs for patients who have CABG surgery.

A study will be conducted in which the charts of patients who had CABG surgery will be sampled, and their hospital costs will be recorded. For budgetary reasons, the hospital administrators do not want to collect a sample that is too large. However, if the sample size is not large enough, the confidence interval will be too wide to be useful as a planning tool.

After a discussion among the senior administration, they have determined that they want to estimate the mean hospital costs of CABG surgery within \$2000 (i.e., plus or minus \$2000). In other words, they want the confidence interval for the true mean to have a margin of error of \$2000. Recall the equation for the confidence interval is: $$\left( \bar x - z^* \frac{\sigma}{\sqrt{n}}, \bar x + z^* \frac{\sigma}{\sqrt{n}} \right)$$ The part of the equation that is added to and subtracted from $\bar x$ is called the margin of error. We will denote the margin of error by the letter $m$. $$m=z^* \frac{\sigma}{\sqrt{n}}$$ To use this formula, the parameter $\sigma$ must be given to us. It is the true standard deviation of the data you are observing. If you do not know $\sigma$, you can estimate it using the standard deviation reported in a previous study or by conducting a pilot study. A study published by another hospital reported that the standard deviation of the costs for CABG surgery was \$28,705.

In the following questions, you will compute the margin of error, $m$, for a future study of the hospital costs for CABG surgery. The hospital administrators want to use a 95% level of confidence. Assume the standard deviation can be estimated to be $\sigma = \$28,705$. Answer the following questions:

8. If the hospital administrators want to be 95% confident in the results, what should the value of $z^*$ be?
$z^*=1.96$

9. If the hospital collects a sample of $n=100$ patients' costs, what would the margin of error be?
$$\begin{aligned} m &= z^* \frac{\sigma}{\sqrt{n}} \\ &= 1.96 \cdot \frac{28705}{\sqrt{100}} \\ &= \$5626.18 \end{aligned}$$

10. If the hospital collects a sample of $n=1000$ patients' costs, what would the margin of error be?

$$\begin{aligned} m &= z^* \frac{\sigma}{\sqrt{n}} \\ &= 1.96 \cdot \frac{28705}{\sqrt{1000}} \\ &= \$1779.15 \end{aligned}$$

11. Choose any other value for the sample size, $n$. Find the margin of error for your sample size.
Answers will vary.

12. Repeat Question 11 until you find the sample size that will yield a margin of error as close to \$2000 as possible, without going over.

$n = 792$
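The guess-and-check process in Questions 11 and 12 can be automated. The sketch below is one possible way to do it, using the values from this example ($z^* = 1.96$, $\sigma = 28705$, target margin of error 2000): it increases $n$ one patient at a time until the margin of error no longer exceeds the target.

```python
import math

z_star = 1.96    # 95% confidence level
sigma = 28705    # standard deviation reported by the other hospital
target = 2000    # desired margin of error, in dollars

# Increase n until the margin of error drops to the target or below
n = 1
while z_star * sigma / math.sqrt(n) > target:
    n += 1

margin = z_star * sigma / math.sqrt(n)
print(f"Smallest n with margin of error at most {target}: n = {n}")
print(f"Margin of error at that n: {margin:.2f}")
```

The loop stops at the first sample size whose margin of error does not go over \$2000, which matches the answer $n = 792$.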

## 5 Sample Size Calculations

The process you followed in Questions 8 - 12 is effective, but tedious. There must be an easier way! The trick is to solve the margin of error equation $m = z^* \frac{\sigma}{\sqrt{n}}$ for $n$.

Starting with
$$m = z^* \frac{\sigma}{\sqrt{n}}$$
multiply both sides of the equation by $\sqrt{n}$:
$$m \cdot \sqrt{n} = z^* \frac{\sigma}{\sqrt{n}} \cdot \sqrt{n}$$
which reduces to:
$$m \cdot \sqrt{n} = z^* \cdot \sigma$$
Divide both sides of the equation by $m$, cancelling the $m$'s on the left-hand side:
$$\sqrt{n} = \frac{z^* \cdot \sigma}{m}$$
Squaring both sides:
$$\left( \sqrt{n} \right)^2 = \left( \frac{z^* \cdot \sigma}{m} \right)^2$$
which simplifies to the desired result:
$$n = \left( \frac{z^* \cdot \sigma}{m} \right)^2$$

This gives us the sample size formula, which tells the number of observations required in order to obtain a specified margin of error: $$n = \left( \frac{z^*\sigma}{m} \right)^2$$

Once the level of confidence is selected, $z^*$ is automatically determined. In practice, the most common level of confidence is 95%, which means that $z^*$ would equal 1.96.

If you need a reminder on finding the value of $z^*$, see the steps in Section 4.1.1.

13. In Question 12, you used the process of guess-and-check to find the sample size. For this question, use the sample size formula to compute the sample size required to estimate the mean cost of CABG surgery, $\mu$, within \$2000 with 95% confidence. Recall that in a previous study, the standard deviation was found to be $\sigma = \$28,705$.

\begin{align} n &= \left( \frac{z^*\sigma}{m} \right)^2 \\ &= \left( \frac{1.96 \cdot 28705}{2000} \right)^2 \\ &= 791.3 \end{align}

### 5.1 Rounding Up

In Question 13, you computed the sample size required to get a margin of error of \$2000. Notice that the result was 791.3, which is not a whole number. What does that mean? Does that suggest that you will survey only a fraction of the last patient's costs? Of course not! When doing sample size calculations, if the answer is not a whole number, you always round up to the next highest whole number. This will allow you to get a sample size that is large enough to attain your desired margin of error. If you want to estimate the mean cost of CABG surgery within \$2000 with 95% confidence, you will need to survey the files of 792 patients.

When doing sample size calculations, you always round up.
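The sample size formula and the round-up rule fit together in one short function. The sketch below is illustrative: the function name `required_sample_size` is made up here, and `math.ceil` is used to round up to the next whole number.

```python
import math

def required_sample_size(z_star, sigma, m):
    """Smallest whole-number sample size giving a margin of error of at most m."""
    exact = (z_star * sigma / m) ** 2  # usually not a whole number
    return math.ceil(exact)            # always round up, never to the nearest

# CABG example: 95% confidence, sigma = 28705, margin of error = 2000
n = required_sample_size(1.96, 28705, 2000)
print(f"Required sample size: n = {n}")
```

Rounding 791.3 to the nearest whole number would give 791, which is slightly too small to reach the desired margin of error; rounding up gives 792.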

14. Answer this question without doing any computations. If the hospital administrators wanted to estimate the mean cost of CABG surgery with a margin of error of \$1000 at the 95% confidence level, should the sample size be larger or smaller than $n=792$?
In Question 13, we determined that the sample size required to have a margin of error of \$2000 is 792. The margin of error is in the denominator of the sample size formula. If we want to reduce the margin of error, we need to increase the sample size. The sample size should be larger than $n=792$.

15. Find the sample size required to estimate the mean cost of CABG surgery with a margin of error of \$1000 and with a 95% confidence level.
$$\begin{aligned} n &= \left( \frac{z^*\sigma}{m} \right)^2 \\ &= \left( \frac{1.96 \cdot 28705}{1000} \right)^2 \\ &= 3165.4 \end{aligned}$$
We always round up in sample size calculations, so the required sample size is $n=3166$.

16. What happened to the sample size required when the margin of error is cut in half from \$2000 to \$1000?
If we divide the two sample sizes, we get: $\frac{3166}{792} \approx 4$. In order to reduce the margin of error by half, we need to quadruple the sample size.

17. Remember the sample size required to have a margin of error of \$2000 at the 95% level of confidence is $n=792$. If we wanted to estimate the mean cost with the same margin of error at the 99% level of confidence, would the sample size be larger or smaller?
The value of $z^*$ would increase from 1.96 to 2.576, which would increase the required sample size.
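The last few questions can be checked numerically. The sketch below works with the unrounded value of $\left( \frac{z^*\sigma}{m} \right)^2$ so that the scaling is easy to see; the function name `exact_sample_size` is chosen for this illustration only.

```python
import math

def exact_sample_size(z_star, sigma, m):
    """Unrounded value of (z* * sigma / m)^2."""
    return (z_star * sigma / m) ** 2

sigma = 28705

n_2000_95 = exact_sample_size(1.96, sigma, 2000)   # margin 2000, 95% confidence
n_1000_95 = exact_sample_size(1.96, sigma, 1000)   # margin halved
n_2000_99 = exact_sample_size(2.576, sigma, 2000)  # confidence raised to 99%

print(f"m = 2000, 95%: round up {n_2000_95:.1f} to {math.ceil(n_2000_95)}")
print(f"m = 1000, 95%: round up {n_1000_95:.1f} to {math.ceil(n_1000_95)}")
print(f"m = 2000, 99%: round up {n_2000_99:.1f} to {math.ceil(n_2000_99)}")
print(f"Ratio when m is halved: {n_1000_95 / n_2000_95:.2f}")
```

Because $m$ is squared in the denominator, halving the margin of error multiplies the unrounded sample size by exactly four, and raising the confidence level from 95% to 99% increases the required sample size as well.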