Module 13: F-Distribution and One-Way ANOVA

# Facts about the F Distribution

Barbara Illowsky & OpenStax et al.

Here are some facts about the *F* distribution.

- The curve is not symmetrical but skewed to the right.
- There is a different curve for each set of *df*s.
- The *F* statistic is greater than or equal to zero.
- As the degrees of freedom for the numerator and for the denominator get larger, the curve approximates the normal.
- Other uses for the *F* distribution include comparing two variances and two-way Analysis of Variance. Two-way Analysis of Variance is beyond the scope of this chapter.
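
To make the first few facts concrete, here is a minimal sketch that evaluates the *F* density using only Python's standard library. The function names (`f_pdf`, `f_mode`, `f_mean`) and the degree-of-freedom pairs are illustrative choices, not part of the original text. For each pair it compares the peak of the curve (the mode) with the mean; a mode below the mean is what a right-skewed curve looks like numerically.

```python
import math

def f_pdf(x, df_num, df_den):
    """Density of the F distribution with df_num and df_den degrees of freedom."""
    if x <= 0:
        return 0.0  # simplification: treat the density as zero at and below 0
    a, b = df_num / 2, df_den / 2
    # log of the beta function B(a, b), computed with log-gamma for stability
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_pdf = (a * math.log(df_num / df_den)
               + (a - 1) * math.log(x)
               - (a + b) * math.log1p(df_num * x / df_den)
               - log_beta)
    return math.exp(log_pdf)

def f_mode(df_num, df_den):
    """Location of the peak of the curve (formula valid for df_num > 2)."""
    return (df_num - 2) / df_num * df_den / (df_den + 2)

def f_mean(df_den):
    """Mean of the distribution (formula valid for df_den > 2)."""
    return df_den / (df_den - 2)

# Illustrative (numerator df, denominator df) pairs: each gives a different curve.
for df_num, df_den in [(4, 10), (3, 16), (20, 60)]:
    mode, mean_value = f_mode(df_num, df_den), f_mean(df_den)
    print(f"F({df_num},{df_den}): mode={mode:.3f}, mean={mean_value:.3f}, "
          f"density at mode={f_pdf(mode, df_num, df_den):.3f}")
```

The mode and mean move closer together as both degrees of freedom grow, which is consistent with the curve becoming more symmetric.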

### try it

MRSA (methicillin-resistant *Staphylococcus aureus*) can cause serious bacterial infections in hospital patients. This table shows various colony counts from different patients who may or may not have MRSA.

Conc = 0.6 | Conc = 0.8 | Conc = 1.0 | Conc = 1.2 | Conc = 1.4 |
---|---|---|---|---|
9 | 16 | 22 | 30 | 27 |
66 | 93 | 147 | 199 | 168 |
98 | 82 | 120 | 148 | 132 |

Plot of the data for the different concentrations:

Test whether the mean numbers of colonies are the same or different. Construct the ANOVA table (by hand or by using a TI-83, 83+, or 84+ calculator), find the *p*-value, and state your conclusion. Use a 5% significance level.

While there are differences in the spreads between the groups, the differences do not appear to be big enough to cause concern.

We test for the equality of mean number of colonies:

*H*<sub>0</sub>: *μ*<sub>1</sub> = *μ*<sub>2</sub> = *μ*<sub>3</sub> = *μ*<sub>4</sub> = *μ*<sub>5</sub>

*H*<sub>a</sub>: *μ*<sub>i</sub> ≠ *μ*<sub>j</sub> for some *i* ≠ *j*

The one-way ANOVA table results are shown below.

Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F |
---|---|---|---|---|
Factor (Between) | 10,233 | 5 – 1 = 4 | [latex]\displaystyle\frac{10{,}233}{4}=2{,}558.25[/latex] | [latex]\displaystyle\frac{2{,}558.25}{4{,}194.9}=0.6099[/latex] |
Error (Within) | 41,949 | 15 – 5 = 10 | [latex]\displaystyle\frac{41{,}949}{10}=4{,}194.9[/latex] | |
Total | 52,182 | 15 – 1 = 14 | | |
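
The entries in the ANOVA table can be reproduced from the raw colony counts. The sketch below uses the definitional sums of squares (between-group and within-group deviations from the means), which work for any group sizes. The function name `one_way_anova` and the dictionary layout are illustrative assumptions, not something the text prescribes.

```python
from statistics import mean

# Colony counts from the table above, one list per concentration column.
groups = {
    0.6: [9, 66, 98],
    0.8: [16, 93, 82],
    1.0: [22, 147, 120],
    1.2: [30, 199, 148],
    1.4: [27, 168, 132],
}

def one_way_anova(samples):
    """Return the pieces of a one-way ANOVA table for a list of samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    k, n = len(samples), len(all_values)
    # Between-group variation: group means around the grand mean, weighted by size.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "ss_total": ss_between + ss_within, "df_total": n - 1,
        "f": ms_between / ms_within,
    }

table = one_way_anova(list(groups.values()))
for name, value in table.items():
    print(f"{name:>11}: {value:,.4f}")
```

Rounded to whole numbers, the sums of squares should agree with the table (10,233, 41,949, and 52,182), and the ratio of mean squares gives the *F* statistic.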

**Distribution for the test:** *F*<sub>4,10</sub>

**Probability statement:** *p*-value = *P*(*F* > 0.6099) = 0.6649.

**Compare α and the p-value:**

*α* = 0.05

*p*-value = 0.6649

*α* < *p*-value

**Make a decision:** Since *α* < *p*-value, we do not reject *H*<sub>0</sub>.

**Conclusion:** At the 5% significance level, there is insufficient evidence from these data that different levels of tryptone will cause a significant difference in the mean number of bacterial colonies formed.
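
For readers without a calculator, the *p*-value and the decision can be checked with a short standard-library sketch. It relies on the identity *P*(*F* > *f*) = *I*<sub>x</sub>(*df*<sub>denom</sub>/2, *df*<sub>num</sub>/2) with *x* = *df*<sub>denom</sub> / (*df*<sub>denom</sub> + *df*<sub>num</sub>·*f*), where *I* is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_upper_tail` and the step count are assumptions made for this illustration; statistical software would normally be used instead.

```python
import math

def f_upper_tail(f_stat, df_num, df_den, steps=2000):
    """Approximate P(F > f_stat) for an F(df_num, df_den) distribution.

    Assumes f_stat > 0, df_den >= 2, and an even number of steps,
    so the integrand stays bounded and Simpson's rule applies.
    """
    a, b = df_den / 2, df_num / 2
    x = df_den / (df_den + df_num * f_stat)  # upper limit of integration, below 1
    h = x / steps

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Composite Simpson's rule on [0, x].
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3

    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return integral / beta

alpha = 0.05
p_value = f_upper_tail(0.6099, 4, 10)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p-value = {p_value:.4f}; alpha = {alpha}; decision: {decision}")
```

The decision rule is the same one used throughout this section: reject the null hypothesis only when the *p*-value is smaller than *α*.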

### Example

Four sororities took a random sample of sisters regarding their grade means for the past term. The results are shown in the table.

Mean Grades for Four Sororities

Sorority 1 | Sorority 2 | Sorority 3 | Sorority 4 |
---|---|---|---|
2.17 | 2.63 | 2.63 | 3.79 |
1.85 | 1.77 | 3.78 | 3.45 |
2.83 | 3.25 | 4.00 | 3.08 |
1.69 | 1.86 | 2.55 | 2.26 |
3.33 | 2.21 | 2.45 | 3.18 |

Using a significance level of 1%, is there a difference in mean grades among the sororities?

Solution:

Let *μ*<sub>1</sub>, *μ*<sub>2</sub>, *μ*<sub>3</sub>, *μ*<sub>4</sub> be the population means of the sororities. Remember that the null hypothesis claims that the sorority groups are from the same normal distribution. The alternate hypothesis says that at least two of the sorority groups come from populations with different normal distributions. Notice that the four sample sizes are each five.

#### Note

This is an example of a **balanced design**, because each level of the factor (i.e., each sorority) has the same number of observations.

*H*<sub>0</sub>: *μ*<sub>1</sub> = *μ*<sub>2</sub> = *μ*<sub>3</sub> = *μ*<sub>4</sub>

*H*<sub>a</sub>: Not all of the means *μ*<sub>1</sub>, *μ*<sub>2</sub>, *μ*<sub>3</sub>, *μ*<sub>4</sub> are equal.

**Distribution for the test:** *F*<sub>3,16</sub>

where *k* = 4 groups and *n* = 20 observations in total

*df*(*num*) = *k* – 1 = 4 – 1 = 3

*df*(*denom*) = *n* – *k* = 20 – 4 = 16

**Calculate the test statistic:** *F* = 2.23

**Graph:**

**Probability statement:** *p*-value = *P*(*F* > 2.23) = 0.1241

**Compare α and the p-value:**

*α*= 0.01

*p*-value = 0.1241

*α* < *p*-value

**Make a decision:** Since *α* < *p*-value, you cannot reject *H*<sub>0</sub>.

**Conclusion:** There is not sufficient evidence to conclude that there is a difference among the mean grades for the sororities.

#### Using a Calculator

Put the data into lists L1, L2, L3, and L4. Press `STAT` and arrow over to `TESTS`. Arrow down to `F:ANOVA`. Press `ENTER` and enter (`L1,L2,L3,L4`).

The calculator displays the F statistic, the *p*-value and the values for the one-way ANOVA table:

*F* = 2.2303

*p* = 0.1241 (*p*-value)

Factor *df* = 3

*SS* = 2.88732

*MS* = 0.96244

Error *df* = 16

*SS* = 6.9044

*MS* = 0.431525
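
The calculator output can also be checked by hand-style arithmetic. The sketch below uses the computational (totals-based) formulas for the sums of squares, which avoid computing deviations from the means; this is one common way to organize the work, not necessarily how the calculator does it internally. Variable names are illustrative.

```python
# Grade means from the table, one list per sorority (columns of the table).
sororities = [
    [2.17, 1.85, 2.83, 1.69, 3.33],
    [2.63, 1.77, 3.25, 1.86, 2.21],
    [2.63, 3.78, 4.00, 2.55, 2.45],
    [3.79, 3.45, 3.08, 2.26, 3.18],
]

k = len(sororities)                       # number of groups
n = sum(len(g) for g in sororities)       # total number of observations
grand_total = sum(sum(g) for g in sororities)
correction = grand_total ** 2 / n         # (sum of all values)^2 / n

# SS(total) = sum of squared values - correction term
ss_total = sum(x ** 2 for g in sororities for x in g) - correction
# SS(factor) = sum over groups of (group total)^2 / group size - correction term
ss_factor = sum(sum(g) ** 2 / len(g) for g in sororities) - correction
# SS(error) is what remains
ss_error = ss_total - ss_factor

df_factor, df_error = k - 1, n - k
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_stat = ms_factor / ms_error

print(f"Factor: df={df_factor}, SS={ss_factor:.5f}, MS={ms_factor:.5f}")
print(f"Error:  df={df_error}, SS={ss_error:.4f}, MS={ms_error:.6f}")
print(f"F = {f_stat:.4f}")
```

Because the totals-based formulas subtract two large, nearly equal quantities, they can lose precision on data with many digits; for grade data of this size the effect is negligible.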

### try it

Four sports teams took a random sample of players regarding their GPAs for the last year. The results are shown below:

GPAs for Four Sports Teams

Basketball | Baseball | Hockey | Lacrosse |
---|---|---|---|
3.6 | 2.1 | 4.0 | 2.0 |
2.9 | 2.6 | 2.0 | 3.6 |
2.5 | 3.9 | 2.6 | 3.9 |
3.3 | 3.1 | 3.2 | 2.7 |
3.8 | 3.4 | 3.2 | 2.5 |

Use a significance level of 5%, and determine if there is a difference in GPA among the teams.

With a *p*-value of 0.9271, we decline to reject the null hypothesis. There is not sufficient evidence to conclude that there is a difference among the GPAs for the sports teams.

### Example

A fourth grade class is studying the environment. One of the assignments is to grow bean plants in different soils. Tommy chose to grow his bean plants in soil found outside his classroom mixed with dryer lint. Tara chose to grow her bean plants in potting soil bought at the local nursery. Nick chose to grow his bean plants in soil from his mother’s garden. No chemicals were used on the plants, only water. They were grown inside the classroom next to a large window. Each child grew five plants. At the end of the growing period, each plant was measured, producing the data (in inches) in this table.

Tommy’s Plants | Tara’s Plants | Nick’s Plants |
---|---|---|
24 | 25 | 23 |
21 | 31 | 27 |
23 | 23 | 22 |
30 | 20 | 30 |
23 | 28 | 20 |

Does it appear that the three media in which the bean plants were grown produce the same mean height? Test at a 3% level of significance.

Solution:

This time, we will perform the calculations that lead to the *F’* statistic. Notice that each group has the same number of plants, so we will use the formula [latex]\displaystyle F'=\frac{n\cdot s_{\overline{x}}^{2}}{s_{\text{pooled}}^{2}}[/latex].

First, calculate the sample mean and sample variance of each group.

Statistic | Tommy’s Plants | Tara’s Plants | Nick’s Plants |
---|---|---|---|
Sample Mean | 24.2 | 25.4 | 24.4 |
Sample Variance | 11.7 | 18.3 | 16.3 |

Next, calculate the variance of the three group means (that is, the variance of 24.2, 25.4, and 24.4). **Variance of the group means = 0.413** = [latex]\displaystyle s_{\overline{x}}^{2}[/latex]

Then [latex]\displaystyle MS_{\text{between}}=n s_{\overline{x}}^{2}=(5)(0.413)[/latex], where [latex]n=5[/latex] is the sample size (number of plants each child grew).

Calculate the mean of the three sample variances (that is, the mean of 11.7, 18.3, and 16.3). **Mean of the sample variances = 15.433** = [latex]\displaystyle s_{\text{pooled}}^{2}[/latex]

Then [latex]\displaystyle MS_{\text{within}}=s_{\text{pooled}}^{2}=15.433[/latex].

The *F* statistic (or *F* ratio) is

[latex]\displaystyle F=\frac{MS_{\text{between}}}{MS_{\text{within}}}=\frac{n s_{\overline{x}}^{2}}{s_{\text{pooled}}^{2}}=\frac{(5)(0.413)}{15.433}=0.134[/latex]

The *dfs* for the numerator = the number of groups – 1 = 3 – 1 = 2.

The *dfs* for the denominator = the total number of observations – the number of groups = 15 – 3 = 12.

The distribution for the test is *F*<sub>2,12</sub> and the *F* statistic is *F* = 0.134.

The *p*-value is *P*(*F* > 0.134) = 0.8759.

**Decision:** Since *α* = 0.03 and the *p*-value = 0.8759, do not reject *H*<sub>0</sub>. (Why?)

**Conclusion:** With a 3% level of significance, from the sample data, the evidence is not sufficient to conclude that the mean heights of the bean plants are different.
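
The balanced-design shortcut used in this example can be sketched in a few lines with Python's `statistics` module. The dictionary layout and variable names are illustrative. The *p*-value line uses a closed form that holds only when the numerator has 2 degrees of freedom, as it does here: *P*(*F* > *f*) = (1 + 2*f*/*df*<sub>denom</sub>)<sup>–*df*<sub>denom</sub>/2</sup>. It is a convenience for this case, not a general method.

```python
from statistics import mean, variance

# Plant heights in inches, one list per child (columns of the data table).
plants = {
    "Tommy": [24, 21, 23, 30, 23],
    "Tara": [25, 31, 23, 20, 28],
    "Nick": [23, 27, 22, 30, 20],
}

n = 5                      # plants per child (balanced design)
k = len(plants)            # number of groups

group_means = [mean(heights) for heights in plants.values()]
group_variances = [variance(heights) for heights in plants.values()]  # sample variances

ms_between = n * variance(group_means)   # n times the variance of the group means
ms_within = mean(group_variances)        # pooled variance = mean of group variances
f_stat = ms_between / ms_within

df_num, df_den = k - 1, n * k - k

# Closed form for the upper tail, valid ONLY because df_num == 2 in this example.
p_value = (1 + df_num * f_stat / df_den) ** (-df_den / 2)

alpha = 0.03
print(f"group means: {group_means}")
print(f"group variances: {group_variances}")
print(f"F = {f_stat:.3f} with df = ({df_num}, {df_den}), p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Small differences in the last decimal place relative to the hand calculation come from rounding the intermediate values (0.413, 15.433, and 0.134) in the text.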

#### Using a Calculator

To calculate the *p*-value:

- Press `2nd DISTR`.
- Arrow down to `Fcdf(` and press `ENTER`.
- Enter 0.134, `E99`, 2, 12`)`.
- Press `ENTER`.

The *p*-value is 0.8759.

### try it

Another fourth grader also grew bean plants, but this time in a jelly-like mass. The heights were (in inches) 24, 28, 25, 30, and 32. Do a one-way ANOVA test on the four groups. Are the heights of the bean plants different? Use the same method as shown in the preceding example.

*F* = 0.9496

*p*-value = 0.4402

From the sample data, the evidence is not sufficient to conclude that the mean heights of the bean plants are different.

## References

Data from a fourth grade classroom in 1994 in a private K – 12 school in San Jose, CA.

Hand, D.J., F. Daly, A.D. Lunn, K.J. McConway, and E. Ostrowski. *A Handbook of Small Datasets: Data for Fruitfly Fecundity.* London: Chapman & Hall, 1994.

Hand, D.J., F. Daly, A.D. Lunn, K.J. McConway, and E. Ostrowski. *A Handbook of Small Datasets.* London: Chapman & Hall, 1994, pg. 50.

Hand, D.J., F. Daly, A.D. Lunn, K.J. McConway, and E. Ostrowski. *A Handbook of Small Datasets.* London: Chapman & Hall, 1994, pg. 118.

“MLB Standings – 2012.” Available online at http://espn.go.com/mlb/standings/_/year/2012.

Mackowiak, P. A., Wasserman, S. S., and Levine, M. M. (1992), “A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich,” *Journal of the American Medical Association*, 268, 1578-1580.

## Concept Review

The graph of the *F* distribution is always positive and skewed right, though the shape can be mounded or exponential depending on the combination of numerator and denominator degrees of freedom. The *F* statistic is the ratio of a measure of the variation in the group means to a similar measure of the variation within the groups. If the null hypothesis is correct, then the numerator should be small compared to the denominator. A small *F* statistic will result, and the area under the *F* curve to the right will be large, representing a large *p*-value. When the null hypothesis of equal group means is incorrect, then the numerator should be large compared to the denominator, giving a large *F* statistic and a small area (small *p*-value) to the right of the statistic under the *F* curve.
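
A small simulation can illustrate this behavior. The sketch below draws balanced normal samples, once with equal population means and once with unequal ones, and averages the resulting *F* statistics. All of the settings (the population means, the standard deviation of 4, the seed, and the number of repetitions) are hypothetical values chosen only for illustration.

```python
import random
from statistics import mean, variance

def simulated_f(population_means, n, sigma, rng):
    """Draw one balanced normal sample per group and return its F statistic."""
    samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in population_means]
    ms_between = n * variance([mean(s) for s in samples])
    ms_within = mean([variance(s) for s in samples])
    return ms_between / ms_within

rng = random.Random(13)   # fixed seed so the run is repeatable
reps = 2000

# Null hypothesis true: all three population means are equal.
null_avg = mean(simulated_f((25, 25, 25), n=5, sigma=4.0, rng=rng) for _ in range(reps))
# Null hypothesis false: the population means differ.
alt_avg = mean(simulated_f((20, 25, 30), n=5, sigma=4.0, rng=rng) for _ in range(reps))

print(f"average F when the means are equal:  {null_avg:.2f}")
print(f"average F when the means differ:     {alt_avg:.2f}")
```

When the means are equal, the average *F* stays near 1; when they differ, the numerator grows and the average *F* is much larger, which corresponds to small *p*-values.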

When the data have unequal group sizes (unbalanced data), the general sum-of-squares formulas need to be used for hand calculations. In the case of balanced data (the groups are the same size), however, simplified calculations based on group means and variances may be used. In practice, of course, software is usually employed in the analysis. As in any analysis, graphs of various sorts should be used in conjunction with numerical techniques. Always look at your data!

OpenStax, Statistics, “Facts About the F Distribution,” licensed under a CC BY 3.0 license.