Lesson 18: Inference for Two Proportions

From BYU-I Statistics Text

1 Lesson Outcomes

By the end of this lesson, you should be able to:

  • Confidence Intervals for a comparison of two proportions:
    • Calculate and interpret a confidence interval for a comparison of two proportions given a confidence level.
    • Identify a point estimate and margin of error for the confidence interval.
    • Show the appropriate connections between the numerical and graphical summaries that support the confidence interval.
    • Check the requirements for the confidence interval.
  • Hypothesis Testing for a comparison of two proportions:
    • State the null and alternative hypothesis for the chosen test.
    • Calculate the test-statistic and p-value of the hypothesis test.
    • Assess the statistical significance by comparing the p-value to the α-level.
    • Check the requirements for the hypothesis test.
    • Show the appropriate connections between the numerical and graphical summaries that support the hypothesis test.
    • Draw a correct conclusion for the hypothesis test.


2 Hypothesis Tests

StepsAll.png

2.1 Another Taste of PTC

Step1.png

Phenylthiocarbamide-3D-balls.png

The ability to taste the chemical Phenylthiocarbamide (PTC) is hereditary. Some people can taste it, while others cannot. Even though the ability to taste PTC was observed in all age, race, and sex groups, this does not address the question of whether men or women are more likely to be able to taste PTC.

Further exploration of the PTC data allows us to investigate if there is a difference in the proportion of men and women who can taste PTC. The following contingency table summarizes Elise Johnson's results:

Gender Data Table
Can Taste PTC? Female Male Total
No 15 14 29
Yes 51 38 89
Total 66 52 118

These data are available in the file PTCTasting. Note the way the data are organized in the file. One column gives the gender, another column indicates if the individual can taste PTC, and a third column gives counts for each group.

Researchers want to know if the ability to taste PTC is a sex-linked trait. This can be summarized in the following research question: Is there a difference in the proportion of men and the proportion of women who can taste PTC? The hypothesis is that there is no difference in the true proportion of men who can taste PTC compared to the true proportion of women who can taste PTC.


Step2.png

A sample of 66 females and 52 males were provided with PTC strips and asked to indicate if they could taste the chemical or not. (This research was approved by the BYU-Idaho Institutional Review Board.)


Step3.png

When working with categorical data, it is natural to summarize the data by computing proportions. If someone has the ability to taste PTC, we will call this a success. The sample proportion is defined as the number of successes observed divided by the total number of observations. For the females, the proportion of the sample who could taste the PTC was: $$ \hat p_1 = \frac{x_1}{n_1} = \frac{51}{66} $$ This is approximately 77.3% of the females who were surveyed. For the males, the proportion who could taste PTC was: $$ \hat p_2 = \frac{x_2}{n_2} = \frac{38}{52} $$ This works out to be about 73.1%.
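The two sample proportions can also be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are chosen here for illustration and are not part of the course materials.

```python
# Counts taken from the Gender Data Table above.
x1, n1 = 51, 66   # females: number who can taste PTC, sample size
x2, n2 = 38, 52   # males: number who can taste PTC, sample size

# Sample proportion = successes / total observations
p_hat_1 = x1 / n1
p_hat_2 = x2 / n2

print(f"p_hat_1 = {p_hat_1:.3f}")
print(f"p_hat_2 = {p_hat_2:.3f}")
```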

When working with data for two proportions, graphically displaying the data can help us compare each proportion. Pie charts and bar charts are essential tools for describing our data.

Side-by-Side Pie Charts



Excel Instructions
To create side-by-side pie charts in Excel, do the following:
For this example we will use the PTCTasting data set.
  • First, make sure the data are organized so that one column contains the first variable (gender) and another column contains the second variable (whether or not they can taste PTC).
  • If a third column has been used to store counts of the number of observations in each group, then highlight the "CanTaste" column and "Count" column just for females, or cells B2, B3, C2, and C3.
  • Now click on the "Insert" tab and then the "pie chart" button. Please select the most basic 2D pie chart.
  • Next you can repeat the process above but highlighting the male data or cells B4, B5, C4, and C5.
  • To keep the two pie charts straight, you can add a title by clicking on the "Layout" tab under chart tools. Next click on "Chart Title" and then "Above Chart".
  • You can now type in the titles, "Females Taste PTC?" and "Males Taste PTC?" or something similar.
  • Now you should be able to visually compare the proportion of females that can taste PTC to the proportion of males that can taste PTC.
Your pie charts should look like this:
SidebysidePieCharts - Excel.PNG



Side-by-Side Bar Charts

Step4.png

The null and alternative hypotheses for a test of equality of two proportions are: $$ \begin{array}{rl} H_0: & p_1 = p_2 \\ H_a: & p_1 \ne p_2 \\ \end{array} $$

If the null hypothesis is true, then the proportion of females who can taste PTC is the same as the proportion of males who can taste PTC.

The test statistic is a $z$, and is given by: $$ z = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } $$ where $$ \begin{array}{lll} n_1= \text{sample size for group 1:} & n_1 = 66 & \text{(number of females)} \\ n_2= \text{sample size for group 2:} & n_2 = 52 & \text{(number of males)} \\ \hat p_1= \text{sample proportion for group 1:} & \hat p_1 = \frac{x_1}{n_1} = \frac{51}{66} & \text{(proportion of females who can taste PTC)}\\ \hat p_2= \text{sample proportion for group 2:} ~ & \hat p_2 = \frac{x_2}{n_2} = \frac{38}{52} & \text{(proportion of males who can taste PTC)}\\ \hat p= \text{overall sample proportion:} & \hat p = \frac{x_1+x_2}{n_1+n_2} = \frac{89}{118} & \text{(overall proportion who can taste PTC)}\\ \end{array} $$

Substituting these values into the equation for the test statistic, $z$, we get: $$ \begin{align} z & = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \\ & = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( 0 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \\ & ~ ~ ~ ~ ~ \textrm{In the null hypothesis, we assumed that} ~ p_1=p_2. \\ & ~ ~ ~ ~ ~ \textrm{Or after subtracting,} ~ p_1-p_2=0. \\ & ~ ~ ~ ~ ~ \textrm{So, we substituted} ~ 0 ~ \textrm{for} ~ p_1-p_2 ~ \text{in the previous step.} \\ & = \frac{ \left( \frac{51}{66} - \frac{38}{52} \right) - (0) }{ \sqrt{\frac{89}{118} \left( 1-\frac{89}{118} \right) \left( \frac{1}{66} + \frac{1}{52} \right) } } \\ & = 0.526 \\ \end{align} $$

The test statistic is $z=0.526$. Under the null hypothesis, this follows a standard normal distribution. So, we can use the Normal Probability applet to compute the $P$-value. We are conducting a two-sided test, so we will shade both tails in the applet.

ShadeBothZ-0-5256.png

Since $P\textrm{-value} = 0.599 > 0.05 = \alpha$, we fail to reject the null hypothesis. In English we say, there is insufficient evidence to suggest that the true proportion of males who can taste PTC is different from the true proportion of females who can taste PTC.
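For readers who want to check the hand calculation outside the applet, here is a minimal Python sketch of the pooled two-proportion $z$-test applied to the PTC counts. The function name `two_proportion_z` is an illustrative choice, not something from the course spreadsheet, and the standard library's `NormalDist` stands in for the Normal Probability applet.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (hypothetical helper name)."""
    p_hat_1 = x1 / n1
    p_hat_2 = x2 / n2
    # Overall (pooled) sample proportion, used because H0 assumes p1 = p2.
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    # Under H0 the hypothesized difference p1 - p2 is 0.
    return (p_hat_1 - p_hat_2 - 0) / standard_error


# PTC data: group 1 = females, group 2 = males.
z = two_proportion_z(51, 66, 38, 52)

# Two-sided alternative: add the areas in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, P-value = {p_value:.3f}")
```

The printed values should agree with the $z$-score and $P$-value found above, up to rounding.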


Step5.png

The data do not show a difference in the proportions of men and women who can taste PTC. There is not enough evidence to say that one gender is able to taste PTC more than the other, so these data give no indication that the ability to taste PTC is a sex-linked trait.

2.2 Using Excel to perform these calculations

Just like we did for one proportion, we will use the Excel spreadsheet CategoricalDataAnalysis to perform hypothesis tests for two proportions.


For this example we will consider the "PTC" data above.

Step 1: Open the Excel file CategoricalDataAnalysis and click on the "Two Proportions" tab at the bottom of the spreadsheet.
Lesson 18 pic 1.PNG
The blue boxes indicate the input spaces. These are the only cells into which you will enter data. For this example we will be considering all the blue boxes.


Step 2: Input the appropriate values into the designated cells. We will input the values of $x_1$, $n_1$, $x_2$, and $n_2$ from the data above. The fifth cell indicates our desired level of confidence for our confidence interval and is also used to give the level of significance, $\alpha$. The level of significance will be $1-\text{value of the cell}$. By default, the cell contains the value 0.95. This means we have a level of significance of $\alpha=1-0.95=0.05$. The sixth cell contains the value of our null hypothesis and is a drop-down list where we can select the appropriate hypothesis test.
Lesson 18 pic 2.PNG
You may have noticed that after you input the values into the blue cells that the values of the other cells changed automatically. Excel performs all of the necessary calculations for you.


Step 3: The results of the hypothesis test are given in the output boxes.
Lesson 18 pic 3.PNG
Compare the z-score and $P$-value with the one you calculated by hand. They're the same!


StepsAll.png


2.3 Mortality Rates and Day of Admission: Aortic Aneurysms

Some people have claimed that mortality (death) rates are higher for patients admitted to a hospital on a weekend compared to patients admitted on a weekday. Researchers Chaim Bell and Donald Redelmeier analyzed admission data from hospital emergency rooms in Ontario, Canada.

This CT scan image shows an abdominal aortic aneurysm.
The aorta is the large artery coming from the heart, along the center of the body.
The aneurysm appears as a bulge in the aorta.
(Image source: Michel de Villeneuve)

The aorta is a major artery that takes oxygen-rich blood from the heart to the entire body. In some patients, this artery can swell like a balloon and burst. If this occurs in the abdomen, the technical term for the event is a ruptured abdominal aortic aneurysm. Although this condition is treatable, it requires immediate action, or the patient will die rapidly.


Step1.png

The problem is that the quality of care in an emergency care facility may differ at different times of the week. Doctors Bell and Redelmeier hypothesized that the probability that a patient with an aortic aneurysm will die is greater if they are admitted to a hospital on a weekend compared to a weekday.


Hypothesis: The proportion of patients with a ruptured abdominal aortic aneurysm who will die is greater on the weekends than on weekdays.


Step2.png

To test this claim, the researchers accessed medical records for several patients admitted to the emergency department of the hospitals in Ontario, Canada. They recorded the number of patients admitted with an aortic aneurysm on weekdays compared to weekends.

Data representative of their results are given below.

Aortic Aneurysm Outcomes
Outcome Weekday Admission Weekend Admission
Died (x) $x_1 = 1476$ $x_2 = 553$
Survived $2669$ $756$
Total (n) $n_1 = 4145$ $n_2 = 1309$


Step3.png

Answer the following questions:
1. Use the data above to find the estimated proportion of patients admitted with an aortic aneurysm on a weekday who will die, $\hat p_1$.
$\hat p_1 = 0.3561$


2. Use the data to compute the estimated proportion of the patients admitted on a weekend that will die, $\hat p_2$.
$\hat p_2 = 0.4225$


3. What do you notice about $\hat p_1$ and $\hat p_2$?
$\hat p_2~>~\hat p_1$


4. Without doing any more calculations, do you think that there is a significant increase in the death rates of patients admitted on a weekend compared to those admitted on a weekday? Justify your answer.
It appears that those admitted on a weekend have a greater death rate than those who are admitted on a weekday, but we do not know if it is a statistically significant difference until we do a hypothesis test.

 


2.3.1 Side-by-side Bar Charts

If the data are considered counts, then a side-by-side bar chart is usually the preferred plot.



Excel Instructions
To create side-by-side bar charts in Excel, do the following:
  • First, make sure the data are organized so that the first column gives the outcome (died or survived), the next column indicates the day of admittance (weekday or weekend) and the last column shows the counts.
  • Then highlight all three columns, but just for the "Weekday" data.
  • Now click on the "Insert" tab and then the "Chart" button, and select "column". Please select the most basic 2D column chart.
  • Next you can repeat the process above but highlighting the "Weekend" data for all three columns.
  • To keep the two bar charts straight, you can add a title by clicking on the "Layout" tab under chart tools. Next click on "Chart Title" and then "Above Chart".
  • You can now type in the Titles, "Aortic Aneurysm Outcomes Weekday Admittance" and "Aortic Aneurysm Outcomes Weekend Admittance" or something similar.



Step4.png


$p_1$ is the true proportion of deaths in group 1, the weekday group. $p_2$ is the true proportion of deaths in the weekend group.



We now conduct a formal hypothesis test to determine if the mortality (death) rate is greater on a weekend compared to a weekday. First, we state the null and alternative hypotheses: $$ \begin{align} H_0: & p_1 = p_2 \\ H_a: & p_1 < p_2 \end{align} $$ where group 1 represents the patients admitted on a weekday and group 2 represents patients admitted on a weekend. Note that if $p_1 < p_2$, then the risk of death is greater in group 2 than in group 1. We will use the 0.05 level of significance.

2.3.2 Checking Requirements for the Hypothesis Test


When you check the requirements for this procedure, you are actually checking that you have at least 10 successes and 10 failures in both Group 1 and Group 2. All four conditions must be true in order to conduct the test.



If the sample size is large in both groups, then we can use the normal distribution to compute the $P$-value. To check if the sample size is large enough, we need to check the following requirements: $$ \begin{array}{rrr} n_1 \cdot \hat p_1 \ge 10 && n_2 \cdot \hat p_2 \ge 10 \\ n_1 \cdot \left(1-\hat p_1\right) \ge 10 && n_2 \cdot \left(1-\hat p_2\right) \ge 10 \end{array} $$ If these requirements are satisfied, the $z$-statistic can be used to assess whether the true population proportions are equal or if the risk of death is greater on weekends.

Note that the requirements are all satisfied: $$ \begin{array}{rr} n_1 \cdot \hat p_1 = 4145 \cdot 0.3561 = 1476 \ge 10 & n_2 \cdot \hat p_2 = 1309 \cdot 0.4225 = 553 \ge 10 \\ n_1 \cdot \left(1-\hat p_1\right) = 4145 \cdot 0.6439 = 2669 \ge 10 & n_2 \cdot \left(1-\hat p_2\right) = 1309 \cdot 0.5775 = 756 \ge 10 \end{array} $$
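Because $n \cdot \hat p$ is just the number of successes and $n \cdot (1-\hat p)$ is the number of failures, the four requirements reduce to checking four counts. A small Python sketch of that check follows; the function name `meets_requirements` and the `minimum` parameter are illustrative assumptions, not part of the course materials.

```python
def meets_requirements(x1, n1, x2, n2, minimum=10):
    """Return True if each group has at least `minimum` successes and failures."""
    counts = [
        x1,        # successes in group 1 (n1 * p_hat_1)
        n1 - x1,   # failures in group 1  (n1 * (1 - p_hat_1))
        x2,        # successes in group 2 (n2 * p_hat_2)
        n2 - x2,   # failures in group 2  (n2 * (1 - p_hat_2))
    ]
    return all(count >= minimum for count in counts)


# Aortic aneurysm data: group 1 = weekday, group 2 = weekend.
print(meets_requirements(1476, 4145, 553, 1309))
```

A made-up data set with only 3 successes in one group, such as `meets_requirements(3, 20, 12, 25)`, would fail the check.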

Reminder: $~ x_1 = 1476,~x_2 =553$, $n_1=4145,~n_2 = 1309$, $\hat p_1 = \frac{x_1}{n_1},~\hat p_2 = \frac{x_2}{n_2}$, and $\hat p = \frac{x_1+x_2}{n_1+n_2}$.

The test statistic can be computed by following these steps:

  • First, find the combined proportion of "successes." This is computed as:

$$ \hat p = \frac{x_1+x_2}{n_1+n_2} = \frac{1476+553}{4145+1309} = \frac{2029}{5454} $$

  • Next, enter the appropriate values into the equation for the $z$-score.

$$ \begin{array}{rcll} z &=& \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \\ &=& \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( 0 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } & \text{We assumed } p_1=p_2. \\ &=& \frac{ \left( \hat p_1 - \hat p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \\ &=& \frac{ \left( \frac{1476}{4145} - \frac{553}{1309} \right) }{ \sqrt{\frac{2029}{5454} \left( 1-\frac{2029}{5454} \right) \left( \frac{1}{4145} + \frac{1}{1309} \right) } } \\ &=& -4.331 \end{array} $$

(Make sure you can get this value using your calculator.)

Remember...
The alternative hypothesis determines which area in the tails of the $z$-distribution will be shaded when you calculate the $P$-value.
If the alternative hypothesis is $\ldots$
  • $p_1 \ne p_2$, shade both tails.
  • $p_1 < p_2$, shade the left tail.
  • $p_1 > p_2$, shade the right tail.
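The shading rule can be written as a short Python function. This is a sketch only: the labels `"less"`, `"greater"`, and `"two-sided"` are names invented here to stand for the three alternatives, and `NormalDist` from the standard library plays the role of the Normal Probability applet.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """Area under the standard normal curve selected by the alternative."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":        # Ha: p1 < p2, shade the left tail
        return std_normal.cdf(z)
    if alternative == "greater":     # Ha: p1 > p2, shade the right tail
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":   # Ha: p1 != p2, shade both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Left-tailed test using the z-score computed above for the aneurysm data.
print(p_value(-4.331, "less"))
```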


This $z$-score can be substituted into the Normal Probability applet to find the $P$-value. Since the alternative hypothesis is that $p_1 < p_2$, we consider only the area to the left of $z=-4.331$. The applet gives this area (our $P$-value) as $7.42 \times 10^{-6} = 0.00000742$.

ShadeLeftZ-4-331.png

Note that $P$-value$ = 0.00000742 < 0.05 = \alpha$, so we reject the null hypothesis.


Step5.png

There is sufficient evidence to suggest that the true proportion of patients who die from an aortic aneurysm is greater on the weekends than on the weekdays. One possible explanation is a difference in the quality of care available to patients on the weekends compared to patients on weekdays, although the hypothesis test by itself does not identify the cause.


StepsAll.png

2.4 Mortality Rates and Day of Admission: Heart Attacks

A microscopic view of tissue from the heart of a heart attack victim.
Damage can be seen as constricted bands of tissue. (Image credit: KGH)


Heart attacks are a leading cause of death in many areas of the world. The study by Doctors Bell and Redelmeier included an analysis of the risk of dying of a heart attack, after admission to a hospital. The researchers reported the following death rates, depending on whether the heart attack occurred on a weekday or a weekend.


Heart Attack Outcomes
Outcome Weekday Admission Weekend Admission
Died (x) 17,113 6,289
Survived 100,596 36,222
Total (n) 117,709 42,511


In this section, you will conduct a hypothesis test to determine if the proportion of patients who die of a heart attack is greater on weekends than on weekdays. Use the 0.05 level of significance.


Answer the following questions:

Step1.png

5. Summarize the relevant background information
This was a study conducted by Doctors Bell and Redelmeier in which they analyzed the death rates depending on which day (weekday or weekend) the heart attack occurred and when the patients were admitted into the hospital.


Step2.png

6. Describe the data collection process.


The researchers collected data on patients admitted to hospitals for heart attacks and whether or not they died. Separate data was kept for weekday hospital admissions and weekend hospital admissions.


Step3.png

7. The value of $\hat p_1$, the sample proportion of patients who died of a heart attack on a weekday, is $\hat p_1 = \frac{17113}{117709} = 0.14538$. Find the value of $\hat p_2$.
$\displaystyle{\hat p_2 = \frac{6289}{42511}=0.14794}$


8. Create a side-by-side pie chart illustrating the data.
Heart-Attack-Side-By-Side-Pies-Excel.png


9. Based on your answers to questions 7 and 8, does it appear that the risk of dying is greater if a heart attack occurs on a weekend than on a weekday?
The sample proportions, $\hat p_1$ and $\hat p_2$ are very close. Visually, there does not appear to be a difference.


10. If the proportion of patients who die of a heart attack is greater on weekends than on weekdays, which of the following would best describe the relationship?
A. $p_1 = p_2$
B. $p_1 > p_2$
C. $p_1 < p_2$
D. $p_1 \ne p_2$
$p_1 < p_2$ is correct. Note that group 1 is the weekday group and group 2 is the weekend group.


Step4.png

11. Replace the circles ($\bigcirc$) in the following null and alternative hypotheses with two of the following symbols: $=><\ne$.

$$ \begin{array}{rl} H_0: & p_1 \bigcirc p_2 \\ H_a: & p_1 \bigcirc p_2 \end{array} $$

$ \begin{array}{rl} H_0: & p_1 = p_2 \\ H_a: & p_1 < p_2 \end{array} $


12. What are the requirements for this test?

$ \begin{array}{rrr} n_1 \cdot \hat p_1 \ge 10 && n_2 \cdot \hat p_2 \ge 10 \\ n_1 \cdot \left(1-\hat p_1\right) \ge 10 && n_2 \cdot \left(1-\hat p_2\right) \ge 10 \end{array} $


13. Are the requirements for this hypothesis test satisfied? Justify your answer.

$ \begin{array}{rrr} n_1 \cdot \hat p_1 = 17113 \ge 10 && n_2 \cdot \hat p_2 = 6289 \ge 10 \\ n_1 \cdot \left(1-\hat p_1\right) = 100596 \ge 10 && n_2 \cdot \left(1-\hat p_2\right) = 36222 \ge 10 \end{array} $

Yes, all of the calculated quantities from the previous question are at least 10.


14. Write the equation for the test statistic, $z$.

$ z = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) }} $


15. Compute the value of the test statistic.
$z=-1.278$


16. Present a sketch of the sampling distribution, showing the test statistic and the $P$-value.
SamplingDistributionHeartAttack-Applet.png


17. Find the $P$-value.
$P\textrm{-value} = 0.1006$


18. Compare the $P$-value to the level of significance. Which is smaller? Will you reject, or fail to reject, the null hypothesis?
$P\textrm{-value} = 0.1006 > 0.05 = \alpha$


19. What is your decision?
Since the $P$-value is greater than $\alpha$, we fail to reject the null hypothesis.


20. Fill in the blanks in the following sentence:
There is $\text{____________}$ evidence to suggest that the true proportion of patients who die of a heart attack is $\text{__________}$ on weekends than on weekdays.
There is insufficient evidence to suggest that the true proportion of patients who die of a heart attack is greater on weekends than on weekdays.


Step5.png

21. If you were to have a heart attack, would you be more concerned if it occurred on a weekend than on a weekday?
No, there is insufficient evidence that the probability of dying of a heart attack is greater on the weekends.

 


StepsAll.png

3 Confidence Intervals: Managing Fox Populations

During the mid-1800s, European foxes were introduced to the Australian mainland. These predators have been responsible for the reduction or extinction of several species of native wildlife.

Royal Botanic Gardens in Cranbourne, Victoria, Australia
From flickr.com

The Royal Botanic Gardens Cranbourne is a 914 acre (370 ha) conservation reserve outside Melbourne, Australia. Predation by foxes has been an ongoing problem in the gardens. To reduce the risk to native species, a systematic program of killing foxes was implemented.

One way to monitor the presence of foxes is to look for fox tracks in specific sandy areas, called sand-pads. Before beginning a systematic effort to reduce the fox population, ecologists observed fox tracks in the sand-pads 576 out of the 950 times the sand-pads were observed. After eliminating some of the foxes, the ecologists observed fox tracks in the sand-pads 268 times out of the 1359 times they checked the sand-pads. The ecologists want to know if there is a difference in the proportion of times fox tracks are observed before versus after the intervention to reduce the fox population.

One way to compare two proportions is to make a confidence interval for the difference in the proportions.

The equation for the confidence interval for the difference of two proportions may look a little daunting at first, but with some practice, it is not too difficult.

Before we compute the confidence interval, we first organize our data and calculate some statistics that will be useful later. We divide the data into two groups: before foxes were targeted (Group 1) and after (Group 2). Let $x_1$ and $x_2$ represent the number of times fox prints were observed in the sand-pads before and after the ecologists began systematically eliminating the foxes, respectively. Similarly, let $n_1$ and $n_2$ be the number of times the ecologists checked the sand-pads in the before and after periods, respectively.


Fox Tracks Data
Before Intervention After Intervention Combined Data
Fox Tracks Observed $x_1 = 576$ $x_2 = 268$ $x_1 + x_2 = 576 + 268 = 844$
Total Observations $n_1 = 950$ $n_2 = 1359$ $n_1 + n_2 = 950 + 1359 = 2309$


Similar to what we did in the lesson for Inference for One Proportion, we compute $\hat p$ for each group.

For group 1: $$ \hat p_1 = \frac{x_1}{n_1} = \frac{576}{950}$$

For group 2: $$ \hat p_2 = \frac{x_2}{n_2} = \frac{268}{1359}$$

The confidence interval for the difference between two proportions is computed by combining all the information above: $$ \left( \left( \hat p_1 -\hat p_2 \right) - z^* \sqrt{ \frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1} + \frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2} } , ~ \left( \hat p_1 -\hat p_2 \right) + z^* \sqrt{ \frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1} + \frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2} } \right) $$


Remember, for a 95% confidence interval, $z^*=1.96$. If you need to review how to find the value of $z^*$ for other confidence levels, see page Inference for One Mean: Sigma Known (Confidence Interval).
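As a cross-check on table or applet values, $z^*$ can be found from the inverse of the standard normal cumulative distribution. The sketch below uses Python's standard library; the function name `z_star` is an illustrative choice.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Critical value z* that leaves (1 - confidence_level) / 2 in each tail."""
    tail_area = (1 - confidence_level) / 2
    # inv_cdf returns the z value with the given area to its left.
    return NormalDist().inv_cdf(1 - tail_area)


print(f"{z_star(0.95):.2f}")   # 95% confidence level
print(f"{z_star(0.90):.3f}")   # 90% confidence level
```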



The lower bound for a 95% confidence interval for the difference of the proportions of times fox prints are observed in the sand-pads is:

$$ \displaystyle{ \left( \hat p_1 -\hat p_2 \right) - z^* \sqrt{ \frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1} + \frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2} } } $$

$$ \displaystyle{ = \left( \frac{576}{950} - \frac{268}{1359} \right) - 1.96 \sqrt{ \frac{\frac{576}{950} \left( 1 - \frac{576}{950} \right)}{950} + \frac{\frac{268}{1359} \left( 1 - \frac{268}{1359} \right)}{1359} } } $$

$$ \displaystyle{ = 0.372 } $$

and the upper bound is:

$$ \displaystyle{ \left( \hat p_1 -\hat p_2 \right) + z^* \sqrt{ \frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1} + \frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2} } } $$

$$ \displaystyle{ = \left( \frac{576}{950} - \frac{268}{1359} \right) + 1.96 \sqrt{ \frac{\frac{576}{950} \left( 1 - \frac{576}{950} \right)}{950} + \frac{\frac{268}{1359} \left( 1 - \frac{268}{1359} \right)}{1359} } } $$

$$ \displaystyle{ = 0.447 } $$


Make sure you can compute these two bounds before reading on.



So, the 95% confidence interval for the difference in the proportions is:

$$ (0.372, 0.447) $$
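The same interval can be reproduced with a short Python sketch. The function name `two_proportion_ci` is hypothetical, and the code computes $z^*$ from the normal distribution rather than using the rounded value 1.96, so the bounds may differ from a hand calculation in the later decimal places.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence_level=0.95):
    """Confidence interval for p1 - p2 (unpooled standard error)."""
    p_hat_1 = x1 / n1
    p_hat_2 = x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Each group contributes its own p_hat * (1 - p_hat) / n term.
    standard_error = sqrt(
        p_hat_1 * (1 - p_hat_1) / n1 + p_hat_2 * (1 - p_hat_2) / n2
    )
    point_estimate = p_hat_1 - p_hat_2
    margin_of_error = z_star * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Fox tracks data: group 1 = before intervention, group 2 = after.
lower, upper = two_proportion_ci(576, 950, 268, 1359)
print(f"({lower:.3f}, {upper:.3f})")
```

Note that the interval uses the separate sample proportions $\hat p_1$ and $\hat p_2$ in the standard error, not the pooled $\hat p$ from the hypothesis test.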


If we switch the way we label group 1 and group 2, then our confidence interval would have the opposite signs: $(-0.447, -0.372)$.



To interpret this confidence interval, we say, "We are 95% confident that the true difference in the proportions of times fox prints will appear in the sand-pads is between 0.372 and 0.447."

Notice that zero is not in this confidence interval, so zero is not a plausible value for $p_1 - p_2$. Based on this result, it is reasonable to conclude that the proportion of times foxes are observed in the sand-pads is not the same before and after the effort to reduce their population. It seems that the work to reduce the number of foxes is having an effect on their presence in the reserve.

As you may have guessed, the spreadsheet CategoricalDataAnalysis can be used to calculate confidence intervals for the difference of two proportions.

3.1 Checking Requirements for a Confidence Interval

The requirements for computing a confidence interval for two proportions are the same as the requirements for doing a hypothesis test.

$$ \begin{array}{rrr} n_1 \cdot \hat p_1 \ge 10 && n_2 \cdot \hat p_2 \ge 10 \\ n_1 \cdot \left(1-\hat p_1\right) \ge 10 && n_2 \cdot \left(1-\hat p_2\right) \ge 10 \end{array} $$

In this example, all of the requirements are satisfied:

$$ \begin{array}{rr} n_1 \cdot \hat p_1 = 950 \cdot 0.606 = 576 \ge 10 & n_2 \cdot \hat p_2 = 1359 \cdot 0.197 = 268 \ge 10 \\ n_1 \cdot \left(1-\hat p_1\right) = 950 \cdot (1-0.606) = 374 \ge 10 & n_2 \cdot \left(1-\hat p_2\right) = 1359 \cdot (1-0.197) = 1091 \ge 10 \end{array} $$

Answer the following questions:
22. What is the value of $z^*$ for a 93% confidence interval?
$z^* = 1.8119$


23. Find the 93% confidence interval for the difference in the proportions of the times fox prints are observed in the sand-pads before and after the effort to reduce the fox population.
$(0.374, 0.444)$

 

4 Summary

Remember...
  • When conducting hypothesis tests using two proportions, the null hypothesis is always $p_1=p_2$, indicating that there is no difference between the two proportions. The alternative hypothesis can be left-tailed ($<$), right-tailed ($>$), or two-tailed ($\ne$).
  • For a hypothesis test and confidence interval of two proportions, we use the following symbols:

$$ \begin{array}{lcl} \text{Sample proportion for group 1:} & \hat p_1 = \displaystyle{\frac{x_1}{n_1}} \\ \text{Sample proportion for group 2:} & \hat p_2 = \displaystyle{\frac{x_2}{n_2}} \end{array} $$

  • For a hypothesis test only, we use the following symbols:

$$ \begin{array}{lcl} \text{Overall sample proportion:} & \hat p = \displaystyle{\frac{x_1+x_2}{n_1+n_2}} \end{array} $$

  • Whenever zero is contained in the confidence interval of the difference of the true proportions we conclude that there is no significant difference between the two proportions.
  • You will use the Excel spreadsheet CategoricalDataAnalysis to perform hypothesis testing and calculate confidence intervals for problems involving two proportions.

