Chapter 4: Numerical Methods for Describing Data Review Pack Name _________________________

The following questions are in a True / False format. The answers to these questions will
frequently depend on remembering facts, understanding the concepts, and knowing the
statistical vocabulary. Before answering these questions, be sure to read them carefully!
T F  1. The trimmed mean is less sensitive to outliers than is the mean.
T F  2. The mean is the middle value of an ordered data set.
T F  3. One disadvantage of using the mean as a measure of center for a data set is that
its value is affected by the presence of even a single outlier in the data set.
T F  4. The variance is the positive square root of the standard deviation.
T F  5. For any given data set, the median must be greater than or equal to the lower
quartile, and less than or equal to the upper quartile.
T F  6. For data that is skewed to the right, Σ(x − x̄) [relation symbol missing] 0.
T F  7. By definition, an outlier is "extreme" if it is more than 3.0 iqr away from the
closest quartile.
T F  8. According to Chebyshev’s rule, the fraction of observations that are within 3
standard deviations of the mean is at least eight-ninths.
T F  9. When using a 20% trimmed mean, the largest 10% and the smallest 10% of the
observations are discarded for calculation purposes.
T F  10. When the histogram of a data set is closely approximated by a normal curve, the
standard deviation and the interquartile range are very close to equal on
average.
T F  11. The interquartile range is resistant to the effect of outliers.
T F  12. If there are no outliers, a skeletal boxplot and a modified boxplot can differ in
the length of the box, but not in the whisker lengths.
Chapter 4, Review Pack
Page 1 of 9
1. Astronomers are interested in the recessional velocity of
galaxies – that is, the speed at which they are moving away
from the Milky Way. The accompanying table contains the
recessional velocities for a sample of galaxies, measured in
km/sec. Negative velocity indicates the galaxy is moving
towards us.
Recessional velocities (km/sec)
 170    150
 290    500
-130    920
 -70    500
-220    960
 200    500
 290    850
 200    800
 300   1090
 650
(a) Calculate these numerical summaries:
The mean                   _______________
The standard deviation     _______________
The median                 _______________
The interquartile range    _______________
(b) Construct a skeletal boxplot for these data.
(c) Judging from the data and your responses in parts (a) and (b), would you say this distribution
is skewed or approximately symmetric? Justify your response using appropriate statistical
terminology.
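One way to check the part (a) summaries by machine is sketched below in Python. It assumes the sample consists of the 19 velocities printed with the table, counting the value 650 as part of the data, and it takes the quartiles to be the medians of the lower and upper halves of the ordered data, leaving the overall median out when n is odd. Other quartile conventions give slightly different values. The names `velocities` and `quartiles` are illustrative only.

```python
from statistics import mean, median, stdev

# Recessional velocities (km/sec); 650 is assumed to belong to the sample.
velocities = [170, 150, 290, 500, -130, 920, -70, 500, -220, 960,
              200, 500, 290, 850, 200, 800, 300, 1090, 650]

def quartiles(data):
    """Lower and upper quartiles as medians of the two halves of the sorted data.

    For odd n the overall median is excluded from both halves (assumed convention).
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(upper)

x_bar = mean(velocities)
s = stdev(velocities)          # sample standard deviation (divides by n - 1)
med = median(velocities)
q1, q3 = quartiles(velocities)
iqr = q3 - q1

print(f"mean = {x_bar:.1f}, s = {s:.1f}, median = {med}, iqr = {iqr}")
# Comparing the mean with the median is one numerical hint about skewness.
print("mean - median =", round(x_bar - med, 1))
```

The difference between the mean and the median, together with the relative lengths of the two halves of the box, is the kind of evidence part (c) asks for.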
2. A wide variety of oak trees grow in the United States. In one study a
sample of acorns was collected from different locations, and their
volumes, in cm³, were recorded. The table below gives summary
statistics for these data.
Acorn Statistics
Statistic       Value
N               38
Mean            3.0
Median          1.8
St. Dev.        2.6
Minimum         0.3
Maximum         10.5
1st Quartile    1.1
3rd Quartile    4.3
(a) Describe a procedure that uses some or all of these summary
statistics to determine whether outliers are present in the data.
(b) Using your procedure from part (a), determine if there are outliers in
these data.
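The usual boxplot convention flags an observation as a mild outlier when it lies more than 1.5 iqr beyond the nearest quartile, and as extreme when it lies more than 3 iqr beyond it. A minimal Python sketch of that check, using only the acorn summary statistics (the variable names are illustrative), might look like this:

```python
# Summary statistics for acorn volumes (cm^3), copied from the table.
minimum, maximum = 0.3, 10.5
q1, q3 = 1.1, 4.3

iqr = q3 - q1

# Fences 1.5 iqr (mild) and 3 iqr (extreme) beyond the quartiles.
mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
extreme_low, extreme_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

# Only the extremes are known, so we can say whether at least one outlier
# exists on each side, not how many there are.
low_outlier = minimum < mild_low
high_outlier = maximum > mild_high
high_extreme = maximum > extreme_high

print(f"iqr = {iqr:.1f}")
print(f"mild fences: ({mild_low:.1f}, {mild_high:.1f})")
print(f"outlier below: {low_outlier}, outlier above: {high_outlier}, "
      f"extreme above: {high_extreme}")
```

Because only the minimum and maximum are available, this approach can detect that an outlier exists on a given side but cannot count the outliers.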
3. An insurance agent is studying fire damage claims in a major city to see if the insurance
premiums are matched to the company's risk. She takes a random sample of 20 claims, and
finds the amount of each claim, in thousands of dollars. Her results are shown below:
Fire Damage Claims in a major city ($1,000)
52   59   32   54   45
73   39   62   97   65
58   48   62   28   30
69   13   41   75   36
(a) Under what circumstances should one consider using a trimmed mean as a description of the
center of a distribution?
(b) Does the fire damage data exhibit the characteristic(s) that suggest a trimmed mean is the
appropriate statistic to calculate? Explain.
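For comparison, the sketch below computes the mean, the median, and a 10% trimmed mean of the 20 claims in Python. It assumes the convention that a p% trimmed mean drops p% of the observations from each end of the ordered data; `trimmed_mean` is an illustrative helper, not a library function.

```python
from statistics import mean, median

# Fire damage claims ($1,000), in the order given in the table.
claims = [52, 59, 32, 54, 45, 73, 39, 62, 97, 65,
          58, 48, 62, 28, 30, 69, 13, 41, 75, 36]

def trimmed_mean(data, trim_fraction):
    """Mean after dropping trim_fraction of the values from EACH end (assumed convention)."""
    ordered = sorted(data)
    k = int(trim_fraction * len(ordered))   # number dropped per end, rounded down
    kept = ordered[k:len(ordered) - k]
    return mean(kept)

x_bar = mean(claims)
med = median(claims)
trimmed = trimmed_mean(claims, 0.10)

print(f"mean = {x_bar}, median = {med}, 10% trimmed mean = {trimmed}")
```

How far the trimmed mean moves away from the ordinary mean is one informal indication of whether trimming matters for a given data set.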
1. Consider a study in which the heights of a sample of 1000 high school seniors were recorded.
The mean height is 70" and the standard deviation of the heights is 3". It is observed that
the height distribution is approximately normal.
(a) Approximately what percent of heights in this sample would exceed 79"?
(b) What is the approximate percentile of a senior who is 73" tall?
(c) When the data were summarized, the value of the first quartile was written down but then
smudged. There is general agreement that the writer meant to indicate either 66" or 68".
Which of these values is more likely the correct one? Justify your answer with appropriate
statistical reasoning.
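The three parts can be checked against a normal model with mean 70 and standard deviation 3. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Treating the sample as exactly normal is an approximation, so the outputs are rough guides rather than exact answers.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)   # model for senior heights, in inches

# (a) Proportion of heights above 79", three standard deviations above the mean.
p_above_79 = 1 - heights.cdf(79)

# (b) Percentile rank of a 73" senior, one standard deviation above the mean.
pct_73 = heights.cdf(73)

# (c) First quartile implied by the model, to compare with the candidates 66" and 68".
q1_model = heights.inv_cdf(0.25)

print(f"P(height > 79) = {p_above_79:.4f}")
print(f"percentile of 73 in. = {100 * pct_73:.0f}")
print(f"model first quartile = {q1_model:.2f} in.")
```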
2. In recent years there has been considerable discussion about the appropriateness of the
body shapes and proportions of Ken and Barbie dolls. These dolls are very popular, and
there is some concern that the dolls may be viewed as having the "ideal body shape,"
potentially leading young children to risk anorexia in pursuit of that ideal. Researchers
investigating the dolls' body shapes scaled Ken and Barbie up to a common height of 170.18
cm (5' 7") and compared them to body measurements of active adults. Common
measures of body shape are the chest (bust), waist, and hip circumferences. These
measurements for Ken and Barbie and their reference groups are presented in the table
below:
Doll and Human Reference Group Measurements (cm)
                  Ken                      Barbie
            Chest  Waist  Hips       Chest  Waist  Hips
Doll         75.0   56.5  72.0        82.3   40.7  72.7
Human x̄      91.2   80.9  93.7        90.3   69.8  97.9
Human s       4.8    9.8   6.8         5.5    4.7   5.4
For the following questions, suppose that the researchers' scaled-up dolls suddenly found
themselves in the human world of actual men and women.
(a) Convert Ken's chest, waist, and hips measurements to z-scores. Which of those
measures appears to be the most different from Ken's reference group? Justify your
response with an appropriate statistical argument.
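A z-score measures how many reference-group standard deviations a measurement lies from the reference-group mean, z = (x − x̄)/s. A short Python sketch for Ken's three measurements follows, with the numbers copied from the table; the dictionary layout and the function name `z_score` are just for illustration.

```python
# (doll value, human mean, human standard deviation), all in cm, for Ken.
ken = {
    "chest": (75.0, 91.2, 4.8),
    "waist": (56.5, 80.9, 9.8),
    "hips":  (72.0, 93.7, 6.8),
}

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z = {part: z_score(*values) for part, values in ken.items()}

for part, value in z.items():
    print(f"{part}: z = {value:.2f}")

# The measurement with the largest |z| is the most unusual relative to its group.
most_extreme = max(z, key=lambda part: abs(z[part]))
print("largest |z|:", most_extreme)
```

Comparing absolute z-scores, rather than raw differences in centimetres, accounts for the fact that the three measurements have different amounts of natural variability.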
(b) The z-scores for Barbie's chest, waist, and hips when compared to active female adults are
approximately -1.4, -6.2, and -4.7, respectively. Do these z-scores
provide evidence to justify the claim that the Barbie doll is a thin representation
of adult women? Justify your response with an appropriate statistical argument.
(c) If men's waist measurements are approximately normally distributed, based on the sample
above, what is the approximate percentile of a 100 cm waist?
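Under the normality assumption stated in part (c), the percentile follows from the z-score of 100 cm relative to the men's waist mean (80.9 cm) and standard deviation (9.8 cm) in the table. A hedged Python sketch using `statistics.NormalDist` (Python 3.8 or later):

```python
from statistics import NormalDist

# Men's waist circumference (cm): reference-group mean and standard deviation.
waist = NormalDist(mu=80.9, sigma=9.8)

z = (100 - 80.9) / 9.8             # standard deviations above the mean
percentile = 100 * waist.cdf(100)  # percent of the model at or below 100 cm

print(f"z = {z:.2f}, percentile = {percentile:.0f}")
```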
3. The Territory of Iowa was initially surveyed in the 1830s. The surveyors were very careful to
note the trees and vegetation; it was believed at that time that the richness of the soil could
be measured by the density of trees encountered. The sample of ash tree diameters from
the original survey of what is now Linn County, Iowa, is presented in the stem-and-leaf plot
below. The display uses five lines for each stem. Thus, "1t|" is the stem for diameters of 12
and 13, "1f|" for 14 and 15, "1s|" for 16 and 17, and so on. (The "t" then stands for leaves
that are twos and threes, the "f" for leaves of fours and fives, etc.)
The mean diameter of ash trees in this sample is 11.500 inches, and the standard deviation
is 3.842 inches.
Linn County Trees in 1830
Ash Diameters
1|0 = 10 inches
N = 102
0.|
0t|2
0f|44
0s|666777
0*|8888888888888888888999
1.|000000000000000000
1t|22222222222222222222222
1f|444444444445
1s|666666
1*|8888888888
2.|0
2t|
2f|4
2s|
2*|
(a) What is the approximate diameter of an ash tree at the 20th percentile in this distribution?
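One way to read a percentile off the display is to expand it into an ordered list and take the value at rank ⌈0.20·n⌉. This is the nearest-rank definition, which is only one of several conventions. The Python sketch below does this; the row strings are transcribed from the display, and `expand` and `nearest_rank_percentile` are illustrative helpers.

```python
import math

# Rows of the stem-and-leaf display: (tens digit, leaves).
rows = [
    (0, "2"), (0, "44"), (0, "666777"),
    (0, "8888888888888888888999"),
    (1, "000000000000000000"),
    (1, "22222222222222222222222"),
    (1, "444444444445"), (1, "666666"), (1, "8888888888"),
    (2, "0"), (2, "4"),
]

def expand(rows):
    """Turn (tens digit, leaves) rows into a sorted list of diameters in inches."""
    values = [10 * tens + int(leaf) for tens, leaves in rows for leaf in leaves]
    return sorted(values)

def nearest_rank_percentile(ordered, p):
    """Value at rank ceil(p * n) in the ordered data (nearest-rank definition)."""
    rank = math.ceil(p * len(ordered))
    return ordered[rank - 1]

diameters = expand(rows)
p20 = nearest_rank_percentile(diameters, 0.20)
print(f"n = {len(diameters)}, 20th percentile = {p20} inches")
```

The count printed by the script can be compared with N = 102 as a check on the transcription of the leaves.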
(b) The Empirical Rule would suggest that 68% of ash tree diameters are between what two
values?
(c) Chebyshev's Rule would suggest that at least 75% of the data are between what two values?
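Both rules give intervals of the form x̄ ± k·s. The Empirical Rule places about 68% of a roughly normal distribution within k = 1 standard deviation of the mean, and Chebyshev's Rule guarantees at least 1 − 1/k² of any distribution within k standard deviations, so 75% corresponds to k = 2. A small Python sketch using the mean and standard deviation quoted for the ash tree sample (function names are illustrative):

```python
mean_diameter = 11.500   # inches
sd_diameter = 3.842      # inches

def interval(mean, sd, k):
    """Interval extending k standard deviations on either side of the mean."""
    return mean - k * sd, mean + k * sd

def chebyshev_fraction(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

empirical_68 = interval(mean_diameter, sd_diameter, 1)
chebyshev_75 = interval(mean_diameter, sd_diameter, 2)

print(f"Empirical Rule, about 68%: {empirical_68[0]:.3f} to {empirical_68[1]:.3f} inches")
print(f"Chebyshev, at least {chebyshev_fraction(2):.0%}: "
      f"{chebyshev_75[0]:.3f} to {chebyshev_75[1]:.3f} inches")
```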