Worksheet I

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Introduction to Statistics and Probability, Stat101
Course Number: Stat101
Course Title: Introduction to Statistics and Probability
1. Define the following terms
a. Quantitative variable
b. Categorical variable
2. Are the following variables quantitative or categorical? Put a “Q” or a “C” in the space
provided.
a. Number of cigarettes smoked per day.
b. Grade point average (on a scale of 0-4).
c. A person’s occupation.
d. Marital status
e. A person’s height (in inches)
f. Number of children in a family
g. Type of newspaper read
h. Hair color
i. Pulse rate (beats per minute)
j. Size of a car’s gas tank (in liters)
3. Define the following terms
a. Discrete Variable
b. Continuous Variable
4. Indicate whether the following variables are discrete or continuous:
a. Number of stocks sold every day in the stock exchange.
b. Hourly temperatures recorded at an observatory.
c. Lifetime of a car.
d. The diameter of the wheels of several cars.
e. Number of children from 50 families.
f. Annual Census of Americans.
5. In an internet poll from quibblo.com, 629 respondents indicated the following as their
favorite Harry Potter character. Find the percent for Severus, rounded to the nearest ones
place. What kind of graph (among bar graph, pie chart and line graph) would be most
appropriate to represent this data?
Favorite Harry Potter Character    Count    Percent
Hermione                           164
Harry                              129
The Dark Lord                      43
Dumbledore                         52
Severus                            241
6. Match the distribution
7. The interquartile range (IQR) is the difference between the upper and lower quartiles.
Find the IQR for each of the following data sets:
a. 5,2,5,4,3,5,5,4,2,2,2,5,2,4,5
b. 5,3,6,6,2,2,6,3,5,2,4
c. 2,5,7,7,5,2,7
d. 5,8,5,8,8,5,4
e. 8,2,6,10,2,6,5,6,6,2,4,9,4,3,4
f. −12,−3,−7,−12,−13,−9,−6,−9,−13,−3,−7
g. −1.1,−0.2,−0.3,−1.4,−0.4,−0.4,−1.2
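One way to check answers to question 7 is sketched below in Python. It assumes the median-of-halves convention for quartiles, where the median is left out of both halves when the number of values is odd. Other textbooks and software use different quartile conventions and may give slightly different results. The function name `iqr` is only an illustrative choice.

```python
from statistics import median

def iqr(values):
    """Interquartile range, assuming the median-of-halves quartile convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                # values below the median position
    upper = data[len(data) - half:]    # values above the median position
    return median(upper) - median(lower)

# Data set (a) from question 7
print(iqr([5, 2, 5, 4, 3, 5, 5, 4, 2, 2, 2, 5, 2, 4, 5]))  # 3
```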
8. Based on the following box and whisker plot, list the 5 number summary and the
interquartile range.
9. For questions A through D, find the 5 number summary, the interquartile range, and then
construct a box and whisker plot for the data given.
A. 8, 15, 12, 10, 7, 6, 4, 10, 15
B. 85, 92, 97, 100, 70, 60, 85, 95, 90
C. 180, 150, 100, 250, 275, 325, 460, 540, 500, 410, 150
D. 2.3, 8.6, 5.4, 3.1, 2.7, 9.3, 7.4, 8.1, 10.2, 11.3
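The five-number summary in question 9 can also be checked with Python's standard library. The sketch below uses `statistics.quantiles` with its default "exclusive" method, which interpolates between data points. That is an assumption about which quartile convention is wanted, and it can differ from a hand method for some data sizes. The helper name `five_number_summary` is hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using statistics.quantiles."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4)  # default method="exclusive" (assumed convention)
    return data[0], q1, med, q3, data[-1]

# Data set A from question 9
summary = five_number_summary([8, 15, 12, 10, 7, 6, 4, 10, 15])
print(summary)                   # (4, 6.5, 10.0, 13.5, 15)
print(summary[3] - summary[1])   # IQR: 7.0
```

The five values are exactly what a box and whisker plot needs: the box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.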
10. What percent of the data is…
A. higher than the lower quartile?
B. lower than the maximum?
C. lower than the upper quartile?
D. higher than the upper quartile?
E. higher than the median?
11. Draw a box and whisker plot for the data set: 40, 42, 28, 38, 41, 39, 41, 47, 44.
12. Given the statistical distribution in the table below, calculate:
xi    61    64    67    70    73
fi     5    18    42    27     8
a. The mode, median and mean.
b. The range, variance and standard deviation.
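For a frequency table like the one in question 12, each value x_i is weighted by its frequency f_i. The Python sketch below shows one way to carry out the calculation. It assumes the population variance (dividing by the total count n) is intended; dividing by n - 1 instead would give the sample variance.

```python
from math import sqrt

x = [61, 64, 67, 70, 73]   # values x_i from the table
f = [5, 18, 42, 27, 8]     # frequencies f_i from the table

n = sum(f)
mean = sum(xi * fi for xi, fi in zip(x, f)) / n
mode = x[f.index(max(f))]  # value with the largest frequency

# Expand the table into a sorted list of observations to locate the median
expanded = [xi for xi, fi in zip(x, f) for _ in range(fi)]
mid = n // 2
median = expanded[mid] if n % 2 else (expanded[mid - 1] + expanded[mid]) / 2

# Population variance (assumption): divide by n rather than n - 1
variance = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / n
std_dev = sqrt(variance)
value_range = max(x) - min(x)

print(mean, median, mode, value_range, round(variance, 4), round(std_dev, 4))
```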
13. Calculate the mean, median and mode for the following set of numbers: 5, 3, 6, 5, 4, 5, 2,
8, 6, 5, 4, 8, 3, 4, 5, 4, 8, 2, 5, 4.
14. Tell whether the correlation coefficient for the data is closest to -1, -0.5, 0, 0.5, or 1.
15. The table shows the number y (in thousands) of alternative-fueled vehicles in use in the
United States x years after 1997. Approximate the best-fitting line for the data.
x    0     1     2     3     4     5     6     7
y    280   295   322   395   425   471   511   548
[Figure: coordinate grid with x from 0 to 7 and y from 300 to 550]
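Question 15 asks for an approximate best-fitting line, which is often drawn by eye through the scatter plot. As a check, the least-squares line can be computed directly. The sketch below uses the x and y values from the table; the function name `least_squares_line` is an illustrative choice, and least squares is only one way to define "best-fitting".

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

years = [0, 1, 2, 3, 4, 5, 6, 7]                      # x: years after 1997
vehicles = [280, 295, 322, 395, 425, 471, 511, 548]   # y: vehicles in thousands
slope, intercept = least_squares_line(years, vehicles)
print(f"y = {slope:.1f}x + {intercept:.1f}")  # y = 40.9x + 262.8
```

A line drawn by eye should have a slope and intercept reasonably close to these values.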
16. In Exercises a-d, (a) draw a scatter plot of the data, (b) approximate the
best-fitting line, and (c) estimate y when x = 20.
a.
b.
c.
d.
17. Make a scatter plot showing the number of homeowners on one axis and the number of vacation
homeowners on the other axis. If there is a trend, draw a trend line.
18. Calculate the standard deviation of the following test data by hand. Test Scores: 22, 99,
102, 33, 57, 75, 100, 81, 62, 29. Mean:_____________
n:_______________
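The hand calculation in question 18 follows a fixed sequence: find the mean, subtract it from each score, square the differences, add them up, divide, and take the square root. The Python sketch below mirrors those steps so each intermediate value can be compared with hand work. The worksheet does not say whether the sample (n - 1) or population (n) formula is intended, so both are shown.

```python
from math import sqrt

scores = [22, 99, 102, 33, 57, 75, 100, 81, 62, 29]

n = len(scores)
mean = sum(scores) / n
squared_deviations = [(s - mean) ** 2 for s in scores]
sum_sq = sum(squared_deviations)

sample_sd = sqrt(sum_sq / (n - 1))   # sample formula: divide by n - 1
population_sd = sqrt(sum_sq / n)     # population formula: divide by n

print(n, mean, sum_sq)               # 10 66.0 8338.0
print(round(sample_sd, 2), round(population_sd, 2))
```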
19. For the following sets of data, calculate the mean and standard deviation of the data.
Describe the mean and standard deviation in words after calculating it.
a. The data set below gives the prices (in dollars) of cordless phones at an
electronics store. 35, 50, 60, 60, 75, 65, 80.
b. The data set below gives the numbers of home runs for the 10 batters who hit the
most home runs during the 2005 Major League Baseball regular season. 51, 48,
47, 46, 45, 43, 41, 40, 40, 39.
c. The data set below gives the waiting times (in minutes) of several people at a
department of motor vehicles service center. 11, 7, 14, 2, 8, 13, 3, 6, 10, 3, 8, 4, 8,
4, 7
d. The data set below gives the calories in a 1-ounce serving of several breakfast
cereals. 135, 115, 120, 110, 110, 100, 105, 110, 125
20. The table below displays data on the temperature (°F) reached on a given day and the
number of cans of soft drink sold from a particular vending machine in front of a grocery
store.
a. Draw a scatterplot of the data.
b. Compute the correlation coefficient r
c. Based on the computed value of r, what can you say about the association
between the temperature and the number of soft drinks sold?
d. Compute the slope b for the least squares regression line and give an
interpretation of the slope within the context of the problem
e. Compute the intercept a for the least squares regression line and give an
interpretation of the intercept within the context of the problem.
f. State the least squares regression line.
g. For a temperature of 85°F, predict how many cans of soft drinks will be sold.
h. Can you predict the number of soft drinks being sold for a temperature of 62°F?
Explain why or why not.
i. Compute the residual for the predicted values based on x= 72 and x= 91.
j. Compute the coefficient of determination and give an interpretation of the
coefficient of determination.
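The correlation coefficient r and the coefficient of determination r² in question 20 can be computed from sums of squared deviations. The sketch below is a minimal Python illustration. The numbers in `xs` and `ys` are made-up placeholders, not the worksheet's temperature and sales data, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Placeholder toy data only (NOT the table from the worksheet)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

r = pearson_r(xs, ys)
r_squared = r ** 2   # share of the variation in y explained by the line
print(round(r, 3), round(r_squared, 3))
```

For simple linear regression, r has the same sign as the slope b, and a residual is the observed y minus the value predicted by the line.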
21. You flip four coins. Let X, the random variable, be the number of heads on all four
coins.
a. List the sample space for the experiment.
b. What are the possible values for X?
c. Is the random variable, X, continuous or discrete?
d. Construct a probability distribution for this experiment.
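The sample space in question 21 is small enough to enumerate by machine, which gives a way to check a hand-built table. The Python sketch below lists all outcomes of four coin flips and counts heads in each, assuming fair coins so that every outcome is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All sequences of H/T of length 4, e.g. "HHTH"
sample_space = ["".join(flips) for flips in product("HT", repeat=4)]

# Count how many outcomes give each number of heads
counts = Counter(outcome.count("H") for outcome in sample_space)

# Equally likely outcomes (fair coins assumed), so P(X = x) = count / 16
distribution = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

print(len(sample_space))   # 16
for x, p in distribution.items():
    print(x, p)
```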
22. Let the random variable X denote the sum of two fair dice.
a. What are the possible values of X?
b. Find the probability mass function for X.
c. Draw probability histogram of the random variable X.
d. Find the cumulative distribution function for X.
e. Find the probability, P [2X < 8].
f. Find the mean of the random variable X.
g. Find the standard deviation of the random variable X.
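The quantities in question 22 can be verified by enumerating all 36 equally likely rolls of two fair dice. The sketch below builds the probability mass function with exact fractions, accumulates it into a cumulative distribution, and then applies the definitions of the mean and variance of a discrete random variable.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# Count how often each sum occurs among the 36 equally likely rolls
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(totals[x], 36) for x in sorted(totals)}

# Cumulative distribution: running total of the pmf
cdf = {}
running = Fraction(0)
for x, p in pmf.items():
    running += p
    cdf[x] = running

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 3))   # 7 35/6 2.415
```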
23. An SRS (simple random sample) of 600 people was asked how much they make in a year. In the sample, no one
made less than $20,000, and people who made more than $60,000 are counted in the
$60,000 category.
X       20,000   30,000   40,000   50,000   60,000
P(X)    0.14     0.23     0.24     0.27     0.12
a. Verify that the probability distribution is legitimate.
b. Draw a probability histogram for the data.
c. Find P(X = 40,000).
d. Find P(X > 30,000).
e. Find P(20,000 < X < 50,000).
f. Find P(X < 40,000).
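A probability distribution is legitimate when every probability lies between 0 and 1 and the probabilities sum to 1. The Python sketch below checks both conditions for the table in question 23 and adds up the probabilities for an event. The helper names `is_legitimate` and `prob` are illustrative, and `math.isclose` is used because decimal probabilities rarely sum to exactly 1 in floating point.

```python
from math import isclose

# Income distribution from the table in question 23
dist = {20_000: 0.14, 30_000: 0.23, 40_000: 0.24, 50_000: 0.27, 60_000: 0.12}

def is_legitimate(d):
    """Each probability is in [0, 1] and the total is 1 (within rounding)."""
    return all(0 <= p <= 1 for p in d.values()) and isclose(sum(d.values()), 1.0)

def prob(d, condition):
    """Sum the probabilities of all values x for which condition(x) is true."""
    return sum(p for x, p in d.items() if condition(x))

print(is_legitimate(dist))                                   # True
print(round(prob(dist, lambda x: x > 30_000), 2))            # 0.63
print(round(prob(dist, lambda x: 20_000 < x < 50_000), 2))   # 0.47
print(round(prob(dist, lambda x: x < 40_000), 2))            # 0.37
```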
24. Suppose you toss a fair coin 8 times. Let X = the number of heads.
a. Make a probability distribution table for X.
b. What is the probability that you get all heads?
c. What is the probability that you get 4 heads?
d. What is the probability that you get at least 1 head?
e. What is the probability that you get 7 or 8 heads?
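The number of heads in 8 tosses of a fair coin follows a binomial distribution with n = 8 and p = 1/2. The sketch below builds the full table with exact fractions, using `math.comb` for the binomial coefficient. The function name `binomial_pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 8, Fraction(1, 2)
table = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob_k in table.items():
    print(k, prob_k)

print(table[8])              # all heads: 1/256
print(1 - table[0])          # at least one head: 255/256
print(table[7] + table[8])   # 7 or 8 heads: 9/256
```

"At least 1 head" is computed through the complement, 1 - P(X = 0), which avoids adding eight separate terms.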
25. Discrete Probability Distribution: The following data set presents the probability
distribution for the number of students absent from our statistics class. Calculate the
mean, the variance, and standard deviation of the data using the standard method.
26. In the following table, income units are in thousands of dollars, and each interval goes up
to but does not include the given high value. Calculate the mean, the variance, and
standard deviation of the data using the standard method.
27. A fair coin is flipped six times and the number of heads is counted.
a. Calculate the probability of exactly two heads.
b. Calculate the probability that the coin will land heads fewer than three times.
c. Calculate the probability that the coin will land heads more than four times.
d. Calculate the mean, variance, and standard deviation of the binomial distribution.
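For reference, the standard binomial formulas that apply to question 27 are collected below, with n the number of flips and p the probability of heads. The last line substitutes n = 6 and p = 1/2, assuming the coin is fair as stated.

```latex
\begin{align*}
P(X = k) &= \binom{n}{k} p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n \\
\mu &= np, \qquad \sigma^{2} = np(1 - p), \qquad \sigma = \sqrt{np(1 - p)} \\
n = 6,\ p = \tfrac{1}{2}: \quad \mu &= 3, \qquad \sigma^{2} = 1.5, \qquad \sigma = \sqrt{1.5} \approx 1.22
\end{align*}
```

Probabilities such as "fewer than three heads" are sums of the first formula over the relevant values of k.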