Module 2: Descriptive Statistics

# Section Exercises

Barbara Illowsky & OpenStax et al.

## Stem-and-Leaf Graphs (Stemplots), Line Graphs, and Bar Graphs

For each of the following data sets, create a stem plot and identify any outliers.

1. The miles per gallon ratings for 30 cars are shown below (lowest to highest).  19, 19, 19, 20, 21, 21, 25, 25, 25, 26, 26, 28, 29, 31, 31, 32, 32, 33, 34, 35, 36, 37, 37, 38, 38, 38, 38, 41, 43, 43

2. The height in feet of 25 trees is shown below (lowest to highest).  25, 27, 33, 34, 34, 34, 35, 37, 37, 38, 39, 39, 39, 40, 41, 45, 46, 47, 49, 50, 50, 53, 53, 54, 54

3. The data are the prices of different laptops at an electronics store. Round each value to the nearest ten.  249, 249, 260, 265, 265, 280, 299, 299, 309, 319, 325, 326, 350, 350, 350, 365, 369, 389, 409, 459, 489, 559, 569, 570, 610

4. The data are daily high temperatures in a town for one month.  61, 61, 62, 64, 66, 67, 67, 67, 68, 69, 70, 70, 70, 71, 71, 72, 74, 74, 74, 75, 75, 75, 76, 76, 77, 78, 78, 79, 79, 95
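A stem plot like the ones requested above can be checked with a short script. This is only a sketch: the helper name `stem_plot` is hypothetical, and it assumes whole-number data where the last digit is the leaf and the remaining digits are the stem. It uses the temperature data from exercise 4.

```python
from collections import defaultdict

def stem_plot(values):
    """Return stem-and-leaf rows as strings (stem = value // 10, leaf = value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # Walk every stem between the smallest and largest so that gaps stay visible;
    # a value isolated by empty stems is a candidate outlier.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Daily high temperatures from exercise 4.
temps = [61, 61, 62, 64, 66, 67, 67, 67, 68, 69, 70, 70, 70, 71, 71, 72, 74, 74, 74,
         75, 75, 75, 76, 76, 77, 78, 78, 79, 79, 95]

for row in stem_plot(temps):
    print(row)
```

Keeping the empty stems in the output is a deliberate choice here, because the gap is what makes an isolated value stand out.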

For the next three exercises, use the data to construct a line graph.

In a survey, 40 people were asked how many times they visited a store before making a major purchase. The results are shown in the table.

Number of times in store Frequency
1 4
2 10
3 16
4 6
5 4

In a survey, several people were asked how many years it has been since they purchased a mattress. The results are shown in the table.

Years since last purchase Frequency
0 2
1 8
2 13
3 22
4 16
5 9

Several children were asked how many TV shows they watch each day. The results of the survey are shown in the table.

Number of TV Shows Frequency
0 12
1 18
2 36
3 7
4 2

The students in Ms. Ramirez’s math class have birthdays in each of the four seasons. The table shows the four seasons, the number of students who have birthdays in each season, and the percentage (%) of students in each group. Construct a bar graph showing the number of students.

Seasons Number of students Percentage of students
Spring 8 24%
Summer 9 26%
Autumn 11 32%
Winter 6 18%

5. Using the data from Ms. Ramirez’s math class supplied in the table, construct a bar graph showing the percentages.

6. David County has six high schools. Each school sent students to participate in a county-wide science competition. The table shows the percentage breakdown of competitors from each school, and the percentage of the entire student population of the county that goes to each school. Construct a bar graph that shows the population percentage of competitors from each school.
High School Science competition population Overall student population
Alabaster 28.9% 8.6%
Concordia 7.6% 23.2%
Genoa 12.1% 15.0%
Mocksville 18.5% 14.3%
Tynneson 24.2% 10.1%
West End 8.7% 28.8%

7. Use the data from the David County science competition supplied in the table above. Construct a bar graph that shows the county-wide population percentage of students at each school.

8. Student grades on a chemistry exam were: 77, 78, 76, 81, 86, 51, 79, 82, 84, 99

1. Construct a stem-and-leaf plot of the data.
2. Are there any potential outliers? If so, which scores are they? Why do you consider them outliers?

9. The table contains the 2010 obesity rates in U.S. states and Washington, DC.

State Percent (%) State Percent (%) State Percent (%)
Alabama 32.2 Kentucky 31.3 North Dakota 27.2
Alaska 24.5 Louisiana 31.0 Ohio 29.2
Arizona 24.3 Maine 26.8 Oklahoma 30.4
Arkansas 30.1 Maryland 27.1 Oregon 26.8
California 24.0 Massachusetts 23.0 Pennsylvania 28.6
Colorado 21.0 Michigan 30.9 Rhode Island 25.5
Connecticut 22.5 Minnesota 24.8 South Carolina 31.5
Delaware 28.0 Mississippi 34.0 South Dakota 27.3
Washington, DC 22.2 Missouri 30.5 Tennessee 30.8
Florida 26.6 Montana 23.0 Texas 31.0
Georgia 29.6 Nebraska 26.9 Utah 22.5
Hawaii 22.7 Nevada 22.4 Vermont 23.2
Idaho 26.5 New Hampshire 25.0 Virginia 26.0
Illinois 28.2 New Jersey 23.8 Washington 25.5
Indiana 29.6 New Mexico 25.1 West Virginia 32.5
Iowa 28.4 New York 23.9 Wisconsin 26.3
Kansas 29.4 North Carolina 27.8 Wyoming 25.1
1. Use a random number generator to randomly pick eight states. Construct a bar graph of the obesity rates of those eight states.
2. Construct a bar graph for all the states beginning with the letter “A.”
3. Construct a bar graph for all the states beginning with the letter “M.”

## Histograms, Frequency Polygons, and Time Series Graphs

10. Sixty-five randomly selected car salespersons were asked the number of cars they generally sell in one week. Fourteen people answered that they generally sell three cars; nineteen generally sell four cars; twelve generally sell five cars; nine generally sell six cars; eleven generally sell seven cars. Complete the table.

Data Value (# cars) Frequency Relative Frequency Cumulative Relative Frequency

11. What does the frequency column in the table from exercise 10 sum to? Why?

12. What does the relative frequency column in the table from exercise 10 sum to? Why?

13. What is the difference between relative frequency and frequency for each data value in the table from exercise 10?

14. What is the difference between cumulative relative frequency and relative frequency for each data value?

15. To construct the histogram for the data in the table from exercise 10, determine appropriate minimum and maximum x and y values and the scaling. Sketch the histogram. Label the horizontal and vertical axes with words. Include numerical scaling.
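The three columns of the car-sales table can be cross-checked with a small script. This is a sketch rather than a required method; the variable names are illustrative, and the frequencies are the ones stated in exercise 10.

```python
from itertools import accumulate

# Frequencies from exercise 10: cars sold per week -> number of salespersons.
frequency = {3: 14, 4: 19, 5: 12, 6: 9, 7: 11}

total = sum(frequency.values())  # should match the sample size of 65

# Relative frequency = frequency / total number of responses.
relative = {cars: count / total for cars, count in frequency.items()}

# Cumulative relative frequency = running sum of the relative frequencies.
cumulative = dict(zip(frequency, accumulate(relative.values())))

for cars in frequency:
    print(f"{cars}  {frequency[cars]:>2}  {relative[cars]:.4f}  {cumulative[cars]:.4f}")
```

The last cumulative relative frequency should come out as 1 (up to rounding), which is a quick check that no response was dropped.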

16. Construct a frequency polygon for the following:

1. Pulse Rates for Women Frequency
60–69 12
70–79 14
80–89 11
90–99 1
100–109 1
110–119 0
120–129 1
2. Actual Speed in a 30 MPH Zone Frequency
42–45 25
46–49 14
50–53 7
54–57 3
58–61 1
3. Tar (mg) in Nonfiltered Cigarettes Frequency
10–13 1
14–17 0
18–21 15
22–25 7
26–29 2

17. Construct a frequency polygon from the frequency distribution for the 50 highest ranked countries for depth of hunger.

Depth of Hunger Frequency
230–259 21
260–289 13
290–319 5
320–349 7
350–379 1
380–409 1
410–439 1
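A frequency polygon plots each frequency above its class midpoint and is usually closed by adding a zero-frequency class at each end. The sketch below only computes the points to plot for the depth-of-hunger table; the variable names are illustrative, and it assumes all classes have the same width.

```python
# (lower limit, upper limit, frequency) copied from the depth-of-hunger table.
classes = [(230, 259, 21), (260, 289, 13), (290, 319, 5), (320, 349, 7),
           (350, 379, 1), (380, 409, 1), (410, 439, 1)]

# Class width = distance between consecutive lower limits (assumed constant).
width = classes[1][0] - classes[0][0]

# One point per class: (midpoint, frequency).
points = [((low + high) / 2, freq) for low, high, freq in classes]

# Close the polygon with a zero-frequency class one width below and above.
first_mid, last_mid = points[0][0], points[-1][0]
points = [(first_mid - width, 0)] + points + [(last_mid + width, 0)]

for midpoint, freq in points:
    print(f"{midpoint:6.1f}  {freq}")
```

Connecting these points with straight line segments, in order, gives the polygon.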

18. Use the two frequency tables to compare the life expectancy of men and women from 20 randomly selected countries. Include an overlaid frequency polygon and discuss the shapes of the distributions, the center, the spread, and any outliers. What can we conclude about the life expectancy of women compared to men?

Life Expectancy at Birth – Women Frequency
49–55 3
56–62 3
63–69 1
70–76 3
77–83 8
84–90 2
Life Expectancy at Birth – Men Frequency
49–55 3
56–62 3
63–69 1
70–76 1
77–83 7
84–90 5

19. Construct a time series graph for (a) the number of male births, (b) the number of female births, and (c) the total number of births.

Sex/Year 1855 1856 1857 1858 1859 1860 1861
Female 45,545 49,582 50,257 50,324 51,915 51,220 52,403
Male 47,804 52,239 53,158 53,694 54,628 54,409 54,606
Total 93,349 101,821 103,415 104,018 106,543 105,629 107,009

Sex/Year 1862 1863 1864 1865 1866 1867 1868 1869
Female 51,812 53,115 54,959 54,850 55,307 55,527 56,292 55,033
Male 55,257 56,226 57,374 58,220 58,360 58,517 59,222 58,321
Total 107,069 109,341 112,333 113,070 113,667 114,044 115,514 113,354

Sex/Year 1870 1871 1872 1873 1874 1875
Female 56,431 56,099 57,472 58,233 60,109 60,146
Male 58,959 60,029 61,293 61,467 63,602 63,432
Total 115,390 116,128 118,765 119,700 123,711 123,578

20. The following data sets list full-time police per 100,000 citizens along with homicides per 100,000 citizens for the city of Detroit, Michigan, during the period from 1961 to 1973.

Year 1961 1962 1963 1964 1965 1966 1967
Police 260.35 269.8 272.04 272.96 272.51 261.34 268.89
Homicides 8.6 8.9 8.52 8.89 13.07 14.57 21.36

Year 1968 1969 1970 1971 1972 1973
Police 295.99 319.87 341.43 356.59 376.69 390.19
Homicides 28.03 31.49 37.39 46.26 47.24 52.33

1. Construct a double time series graph using a common x-axis for both sets of data.
2. Which variable increased the fastest? Explain.
3. Did Detroit’s increase in police officers have an impact on the murder rate? Explain.

21. Suppose that three book publishers were interested in the number of fiction paperbacks adult consumers purchase per month. Each publisher conducted a survey. In the survey, adult consumers were asked the number of fiction paperbacks they had purchased the previous month. The results are as follows:

Publisher A
# of books Freq. Rel. Freq.
0 10
1 12
2 16
3 12
4 8
5 6
6 2
8 2
Publisher B
# of books Freq. Rel. Freq.
0 18
1 24
2 24
3 22
4 15
5 10
7 5
9 1
Publisher C
# of books Freq. Rel. Freq.
0–1 20
2–3 35
4–5 12
6–7 2
8–9 1
1. Find the relative frequencies for each survey. Write them in the charts.
2. Working with a graphing calculator, with a computer, or by hand, use the frequency column to construct a histogram for each publisher’s survey. For Publishers A and B, make bar widths of one. For Publisher C, make bar widths of two.
3. In complete sentences, give two reasons why the graphs for Publishers A and B are not identical.
4. Would you have expected the graph for Publisher C to look like the other two graphs? Why or why not?
5. Make new histograms for Publisher A and Publisher B. This time, make bar widths of two.
6. Now, compare the graph for Publisher C to the new graphs for Publishers A and B. Are the graphs more similar or more different? Explain your answer.

22. Often, cruise ships conduct all on-board transactions, with the exception of gambling, on a cashless basis. At the end of the cruise, guests pay one bill that covers all onboard transactions. Suppose that 60 single travelers and 70 couples were surveyed as to their on-board bills for a seven-day cruise from Los Angeles to the Mexican Riviera. Following is a summary of the bills for each group.

Singles
Amount ($) Frequency Rel. Frequency
51–100 5
101–150 10
151–200 15
201–250 15
251–300 10
301–350 5
Couples
Amount ($) Frequency Rel. Frequency
100–150 5
201–250 5
251–300 5
301–350 5
351–400 10
401–450 10
451–500 10
501–550 10
551–600 5
601–650 5
1. Fill in the relative frequency for each group.
2. Construct a histogram for the singles group. Scale the x-axis by $50 widths. Use relative frequency on the y-axis.
3. Construct a histogram for the couples group. Scale the x-axis by $50 widths. Use relative frequency on the y-axis.
4. Compare the two graphs:
1. List two similarities between the graphs.
2. List two differences between the graphs.
3. Overall, are the graphs more similar or different?
5. Construct a new graph for the couples by hand. Since each couple is paying for two individuals, instead of scaling the x-axis by $50, scale it by $100. Use relative frequency on the y-axis.
6. Compare the graph for the singles with the new graph for the couples:
1. List two similarities between the graphs.
2. Overall, are the graphs more similar or different?
7. How did scaling the couples graph differently change the way you compared it to the singles graph?
8. Based on the graphs, do you think that individuals spend the same amount, more, or less when traveling as singles than they do per person when traveling as a couple? Explain why in one or two complete sentences.

22. Twenty-five randomly selected students were asked the number of movies they watched the previous week. The results are as follows.

# of movies Frequency Relative Frequency Cumulative Relative Frequency
0 5
1 9
2 6
3 4
4 1
1. Construct a histogram of the data.
2. Complete the columns of the chart.

Use the following information to answer the next two exercises:

Suppose one hundred eleven people who shopped in a special t-shirt store were asked the number of t-shirts they own costing more than $19 each.

23. The percentage of people who own at most three t-shirts costing more than $19 each is approximately:

1. 21
2. 59
3. 41
4. Cannot be determined

24. If the data were collected by asking the first 111 people who entered the store, then the type of sampling is:

1. cluster
2. simple random
3. stratified
4. convenience

25. Following are the 2010 obesity rates by U.S. states and Washington, DC.

State Percent (%) State Percent (%) State Percent (%)
Alabama 32.2 Kentucky 31.3 North Dakota 27.2
Alaska 24.5 Louisiana 31.0 Ohio 29.2
Arizona 24.3 Maine 26.8 Oklahoma 30.4
Arkansas 30.1 Maryland 27.1 Oregon 26.8
California 24.0 Massachusetts 23.0 Pennsylvania 28.6
Colorado 21.0 Michigan 30.9 Rhode Island 25.5
Connecticut 22.5 Minnesota 24.8 South Carolina 31.5
Delaware 28.0 Mississippi 34.0 South Dakota 27.3
Washington, DC 22.2 Missouri 30.5 Tennessee 30.8
Florida 26.6 Montana 23.0 Texas 31.0
Georgia 29.6 Nebraska 26.9 Utah 22.5
Hawaii 22.7 Nevada 22.4 Vermont 23.2
Idaho 26.5 New Hampshire 25.0 Virginia 26.0
Illinois 28.2 New Jersey 23.8 Washington 25.5
Indiana 29.6 New Mexico 25.1 West Virginia 32.5
Iowa 28.4 New York 23.9 Wisconsin 26.3
Kansas 29.4 North Carolina 27.8 Wyoming 25.1

26. Construct a bar graph of obesity rates of your state and the four states closest to your state. Hint: Label the x-axis with the states.

## Measures of the Location of the Data

27. Listed are 29 ages for Academy Award winning best actors in order from smallest to largest.

18; 21; 22; 25; 26; 27; 29; 30; 31; 33; 36; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77

1. Find the 40th percentile.
2. Find the 78th percentile.

28. Listed are 32 ages for Academy Award winning best actors in order from smallest to largest.

18; 18; 21; 22; 25; 26; 27; 29; 30; 31; 31; 33; 36; 37; 37; 41; 42; 47; 52; 55; 57; 58; 62; 64; 67; 69; 71; 72; 73; 74; 76; 77

1. Find the percentile of 37.
2. Find the percentile of 72.
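The two percentile exercises above can be cross-checked with a short script. This sketch assumes one common textbook convention that this section does not state explicitly: the k-th percentile is located at position i = k(n + 1)/100 in the ordered data (averaging the two neighboring values when i is not a whole number), and the percentile of a value is (x + 0.5y)/n × 100, where x counts the values below it and y counts the values equal to it. Other software may use different interpolation rules, and the function names are hypothetical.

```python
def kth_percentile(sorted_data, k):
    """Value at the k-th percentile using the locator i = k(n + 1)/100 (assumed convention)."""
    n = len(sorted_data)
    # Integer arithmetic keeps the whole-number test exact (k must be an integer).
    whole, remainder = divmod(k * (n + 1), 100)
    if whole < 1 or whole > n or (remainder != 0 and whole == n):
        raise ValueError("locator falls outside the data")
    if remainder == 0:
        return sorted_data[whole - 1]          # positions are 1-based
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

def percentile_of(sorted_data, value):
    """Percentile of a value: (x + 0.5y) / n * 100 (assumed convention)."""
    below = sum(1 for v in sorted_data if v < value)
    equal = sum(1 for v in sorted_data if v == value)
    return (below + 0.5 * equal) / len(sorted_data) * 100

# Ages from exercise 27 (n = 29) and exercise 28 (n = 32).
ages_29 = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47, 52, 55, 57,
           58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]
ages_32 = [18, 18, 21, 22, 25, 26, 27, 29, 30, 31, 31, 33, 36, 37, 37, 41, 42, 47,
           52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

print(kth_percentile(ages_29, 40), kth_percentile(ages_29, 78))
print(percentile_of(ages_32, 37), percentile_of(ages_32, 72))
```

Percentiles of a value are conventionally rounded to the nearest whole number when reported.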

29. Jesse was ranked 37th in his graduating class of 180 students. At what percentile is Jesse’s ranking?

1. For runners in a race, a low time means a faster run. The winners in a race have the shortest running times. Is it more desirable to have a finish time with a high or a low percentile when running a race?
2. The 20th percentile of run times in a particular race is 5.2 minutes. Write a sentence interpreting the 20th percentile in the context of the situation.
3. A bicyclist in the 90th percentile of a bicycle race completed the race in 1 hour and 12 minutes. Is he among the fastest or slowest cyclists in the race? Write a sentence interpreting the 90th percentile in the context of the situation.
4. For runners in a race, a higher speed means a faster run. Is it more desirable to have a speed with a high or a low percentile when running a race?
5. The 40th percentile of speeds in a particular race is 7.5 miles per hour. Write a sentence interpreting the 40th percentile in the context of the situation.

30. On an exam, would it be more desirable to earn a grade with a high or low percentile? Explain.

31. Mina is waiting in line at the Department of Motor Vehicles (DMV). Her wait time of 32 minutes is the 85th percentile of wait times. Is that good or bad? Write a sentence interpreting the 85th percentile in the context of this situation.

32. In a survey collecting data about the salaries earned by recent college graduates, Li found that her salary was in the 78th percentile. Should Li be pleased or upset by this result? Explain.

33. In a study collecting data about the repair costs of damage to automobiles in a certain type of crash test, a certain model of car had $1,700 in damage and was in the 90th percentile. Should the manufacturer and the consumer be pleased or upset by this result? Explain and write a sentence that interprets the 90th percentile in the context of this problem.

34. The University of California has two criteria used to set admission standards for freshmen to be admitted to a college in the UC system:

1. Students’ GPAs and scores on standardized tests (SATs and ACTs) are entered into a formula that calculates an “admissions index” score. The admissions index score is used to set eligibility standards intended to meet the goal of admitting the top 12% of high school students in the state. In this context, what percentile does the top 12% represent?
2. Students whose GPAs are at or above the 96th percentile of all students at their high school are eligible (called eligible in the local context), even if they are not in the top 12% of all students in the state. What percentage of students from each high school are “eligible in the local context”?

35. Suppose that you are buying a house. You and your realtor have determined that the most expensive house you can afford is the 34th percentile. The 34th percentile of housing prices is $240,000 in the town you want to move to. In this town, can you afford 34% of the houses or 66% of the houses?

36. Use 35 to calculate the following values:
First quartile = _______
Second quartile = median = 50th percentile = _______
Third quartile = _______
Interquartile range (IQR) = _____ – _____ = _____
10th percentile = _______

70th percentile = _______

37. The median age for U.S. blacks currently is 30.9 years; for U.S. whites it is 42.3 years. Based upon this information, give two reasons why the black median age could be lower than the white median age. Does the lower median age for blacks necessarily mean that blacks die younger than whites? Why or why not? How might it be possible for blacks and whites to die at approximately the same age, but for the median age for whites to be higher?

38. Six hundred adult Americans were asked by telephone poll, “What do you think constitutes a middle-class income?” The results are in the table. Each interval includes its left endpoint but not its right endpoint.

Salary ($) Relative Frequency
< 20,000 0.02
20,000–25,000 0.09
25,000–30,000 0.19
30,000–40,000 0.26
40,000–50,000 0.18
50,000–75,000 0.17
75,000–99,999 0.02
100,000+ 0.01
1. What percentage of the survey answered “not sure”?
2. What percentage think that middle-class is from $25,000 to $50,000?
3. Construct a histogram of the data.
1. Should all bars have the same width, based on the data? Why or why not?
2. How should the <20,000 and the 100,000+ intervals be handled? Why?
4. Find the 40th and 80th percentiles.
5. Construct a bar graph of the data.

39. Given the following box plot:

1. which quarter has the smallest spread of data? What is that spread?
2. which quarter has the largest spread of data? What is that spread?
3. find the interquartile range (IQR).
4. are there more data in the interval 5–10 or in the interval 10–13? How do you know this?
5. which interval has the fewest data in it? How do you know this?
1. 0–2
2. 2–4
3. 10–12
4. 12–13

40. The following box plot shows the U.S. population for 1990, the latest available year.

1. Are there fewer or more children (age 17 and under) than senior citizens (age 65 and over)? How do you know?
2. 12.6% are age 65 and over. Approximately what percentage of the population are working age adults (above age 17 to age 65)?

## Box Plots

41. In a survey of 20-year-olds in China, Germany, and the United States, people were asked the number of foreign countries they had visited in their lifetime. The following box plots display the results.

1. In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected.
2. Have more Americans or more Germans surveyed been to over eight foreign countries?
3. Compare the three box plots. What do they imply about the foreign travel of 20-year-old residents of the three countries when compared to each other?

42. Given the following box plot, answer the questions.

1. Think of an example (in words) where the data might fit into the above box plot. In 2–5 sentences, write down the example.
2. What does it mean to have the first and second quartiles so close together, while the second to third quartiles are far apart?

43. Given the following box plots, answer the questions.

1. In complete sentences, explain why each statement is false.
1. Data 1 has more data values above two than Data 2 has above two.
2. The data sets cannot have the same mode.
3. For Data 1, there are more data values below four than there are above four.
2. For which group, Data 1 or Data 2, is the value of “7” more likely to be an outlier? Explain why in complete sentences.

44. A survey was conducted of 130 purchasers of new BMW 3 series cars, 130 purchasers of new BMW 5 series cars, and 130 purchasers of new BMW 7 series cars. In it, people were asked the age they were when they purchased their car. The following box plots display the results.

1. In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected for that car series.
2. Which group is most likely to have an outlier? Explain how you determined that.
3. Compare the three box plots. What do they imply about the age of purchasing a BMW from the series when compared to each other?
4. Look at the BMW 5 series. Which quarter has the smallest spread of data? What is the spread?
5. Look at the BMW 5 series. Which quarter has the largest spread of data? What is the spread?
6. Look at the BMW 5 series. Estimate the interquartile range (IQR).
7. Look at the BMW 5 series. Are there more data in the interval 31 to 38 or in the interval 45 to 55? How do you know this?
8. Look at the BMW 5 series. Which interval has the fewest data in it? How do you know this?
1. 31–35
2. 38–41
3. 41–64

45. Twenty-five randomly selected students were asked the number of movies they watched the previous week. The results are as follows:

# of movies Frequency
0 5
1 9
2 6
3 4
4 1

Construct a box plot of the data.
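A box plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes that summary for the movie data, assuming the convention that each quartile is the median of the lower or upper half of the ordered data, with the middle value left out of both halves when the number of values is odd. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median

# Frequency table from exercise 45: number of movies -> number of students.
table = {0: 5, 1: 9, 2: 6, 3: 4, 4: 1}

# Expand the table into the individual responses, in ascending order.
data = sorted(value for value, freq in table.items() for _ in range(freq))

# Split into halves; with an odd count the middle value belongs to neither half.
half = len(data) // 2
lower, upper = data[:half], data[-half:]

# (minimum, Q1, median, Q3, maximum)
summary = (data[0], median(lower), median(data), median(upper), data[-1])
print(summary)
```

When two of the five numbers coincide, the corresponding part of the box collapses to a single line, which is worth noting when sketching the plot.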

46. Santa Clara County, CA, has approximately 27,873 Japanese-Americans. Their ages are as follows:

Age Group Percent of Community
0–17 18.9
18–24 8.0
25–34 22.8
35–44 15.0
45–54 13.1
55–64 11.9
65+ 10.3
1. Construct a histogram of the Japanese-American community in Santa Clara County, CA. The bars will not be the same width for this example. Why not? What impact does this have on the reliability of the graph?
2. What percentage of the community is under age 35?
3. Which box plot most resembles the information above?