DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Introduction to Statistics and Probability, Stat101
Course Number: Stat101
Course Title: Introduction to Statistics and Probability
Worksheet I

1. Define the following terms:
   a. Quantitative variable
   b. Categorical variable

2. Are the following variables quantitative or categorical? Put a "Q" or a "C" in the space provided.
   a. Number of cigarettes smoked per day
   b. Grade point average (on a scale of 0-4)
   c. A person's occupation
   d. Marital status
   e. A person's height (in inches)
   f. Number of children in a family
   g. Type of newspaper read
   h. Hair color
   i. Pulse rate (beats per minute)
   j. Size of a car's gas tank (in liters)

3. Define the following terms:
   a. Discrete variable
   b. Continuous variable

4. Indicate whether the following variables are discrete or continuous:
   a. Number of stocks sold every day in the stock exchange
   b. Hourly temperatures recorded at an observatory
   c. Lifetime of a car
   d. The diameter of the wheels of several cars
   e. Number of children from 50 families
   f. Annual Census of Americans

5. In an internet poll on quibblo.com, 629 respondents indicated the following as their favorite Harry Potter character. Find the percent for Severus, rounded to the nearest ones place. What kind of graph (bar graph, pie chart, or line graph) would be most appropriate to represent these data?

   Favorite Harry Potter Character   Count   Percent
   Hermione                            164
   Harry                               129
   The Dark Lord                        43
   Dumbledore                           52
   Severus                             241

6. Match the distribution.

7. The interquartile range (IQR) is the difference between the upper and lower quartiles. Find the IQR for each of the following data sets:
   a. 5, 2, 5, 4, 3, 5, 5, 4, 2, 2, 2, 5, 2, 4, 5
   b. 5, 3, 6, 6, 2, 2, 6, 3, 5, 2, 4
   c. 2, 5, 7, 7, 5, 2, 7
   d. 5, 8, 5, 8, 8, 5, 4
   e. 8, 2, 6, 10, 2, 6, 5, 6, 6, 2, 4, 9, 4, 3, 4
   f. −12, −3, −7, −12, −13, −9, −6, −9, −13, −3, −7
   g. −1.1, −0.2, −0.3, −1.4, −0.4, −0.4, −1.2

8. Based on the following box-and-whisker plot, list the five-number summary and the interquartile range.

9. For questions A through D, find the five-number summary and the interquartile range, and then construct a box-and-whisker plot for the data given.
   A. 8, 15, 12, 10, 7, 6, 4, 10, 15
   B. 85, 92, 97, 100, 70, 60, 85, 95, 90
   C. 180, 150, 100, 250, 275, 325, 460, 540, 500, 410, 150
   D. 2.3, 8.6, 5.4, 3.1, 2.7, 9.3, 7.4, 8.1, 10.2, 11.3

10. What percent of the data is…
   A. higher than the lower quartile?
   B. lower than the maximum?
   C. lower than the upper quartile?
   D. higher than the upper quartile?
   E. higher than the median?

11. Draw the box-and-whisker plot for the data set: 40, 42, 28, 38, 41, 39, 41, 47, 44.

12. Given the statistical distribution in the table, find:

   x_i:  61   64   67   70   73
   f_i:   5   18   42   27    8

   a. The mode, median, and mean.
   b. The range, variance, and standard deviation.

13. Calculate the mean, median, and mode for the following set of numbers: 5, 3, 6, 5, 4, 5, 2, 8, 6, 5, 4, 8, 3, 4, 5, 4, 8, 2, 5, 4.

14. Tell whether the correlation coefficient for the data is closest to -1, -0.5, 0, 0.5, or 1.

15. The table shows the number y (in thousands) of alternative-fueled vehicles in use in the United States x years after 1997. Approximate the best-fitting line for the data.

   x:    0     1     2     3     4     5     6     7
   y:  280   295   322   395   425   471   511   548

   [Graph: y (300 to 550) plotted against x (0 to 7)]

16. In Exercises a-d, (a) draw a scatter plot of the data, (b) approximate the best-fitting line, and (c) estimate y when x = 20.
   a.
   b.
   c.
   d.

17. Make a scatter plot showing the number of homeowners on one axis and the number of vacation homeowners on the other axis. If there is a trend, draw a trend line.

18. Calculate the standard deviation of the following test data by hand.
   Test Scores: 22, 99, 102, 33, 57, 75, 100, 81, 62, 29
   Mean: _____________   n: _______________

19. For each of the following data sets, calculate the mean and standard deviation. After calculating them, describe the mean and standard deviation in words.
   a. The data set below gives the prices (in dollars) of cordless phones at an electronics store.
      35, 50, 60, 60, 75, 65, 80
   b. The data set below gives the numbers of home runs for the 10 batters who hit the most home runs during the 2005 Major League Baseball regular season.
      51, 48, 47, 46, 45, 43, 41, 40, 40, 39
   c. The data set below gives the waiting times (in minutes) of several people at a department of motor vehicles service center.
      11, 7, 14, 2, 8, 13, 3, 6, 10, 3, 8, 4, 8, 4, 7
   d. The data set below gives the calories in a 1-ounce serving of several breakfast cereals.
      135, 115, 120, 110, 110, 100, 105, 110, 125

20. The table below displays data on the temperature (°F) reached on a given day and the number of cans of soft drink sold from a particular vending machine in front of a grocery store.
   a. Draw a scatterplot of the data.
   b. Compute the correlation coefficient r.
   c. Based on the computed value of r, what can you say about the association between the temperature and the number of soft drinks sold?
   d. Compute the slope b for the least squares regression line and give an interpretation of the slope within the context of the problem.
   e. Compute the intercept a for the least squares regression line and give an interpretation of the intercept within the context of the problem.
   f. State the least squares regression line.
   g. For a temperature of 85°F, predict how many cans of soft drinks will be sold.
   h. Can you predict the number of soft drinks sold for a temperature of 62°F? Explain why or why not.
   i. Compute the residuals for the predictions at x = 72 and x = 91.
   j. Compute the coefficient of determination and give an interpretation of it.

21. You flip four coins. Let the random variable X be the number of heads on all four coins.
   a. List the sample space for the experiment.
   b. What are the possible values for X?
   c. Is the random variable X continuous or discrete?
   d. Construct a probability distribution for this experiment.

22. Let the random variable X denote the sum of two fair dice.
   a. What are the possible values of X?
   b. Find the probability mass function for X.
   c. Draw a probability histogram of the random variable X.
   d. Find the cumulative distribution function for X.
   e. Find the probability P(2X < 8).
   f. Find the mean of the random variable X.
   g. Find the standard deviation of the random variable X.

23. An SRS (simple random sample) of 600 people was asked how much they make in a year. In the sample, no one made less than $20,000, and anyone who made more than $60,000 is counted in the $60,000 category.

   X         P(X)
   20,000    0.14
   30,000    0.23
   40,000    0.24
   50,000    0.27
   60,000    0.12

   a. Verify that the probability distribution is legitimate.
   b. Draw a probability histogram for the data.
   c. Find P(X = 40,000).
   d. Find P(X > 30,000).
   e. Find P(20,000 < X < 50,000).
   f. Find P(X < 40,000).

24. Suppose you toss a fair coin 8 times. Let X = the number of heads.
   a. Make a probability distribution table for X.
   b. What is the probability that you get all heads?
   c. What is the probability that you get 4 heads?
   d. What is the probability that you get at least 1 head?
   e. What is the probability that you get 7 or 8 heads?

25. Discrete Probability Distribution: The following data set presents the probability distribution for the number of students absent from our statistics class. Calculate the mean, the variance, and the standard deviation of the data using the standard method.

26. In the following table, income units are in thousands of dollars, and each interval goes up to but does not include the given high value. Calculate the mean, the variance, and the standard deviation of the data using the standard method.

27. A fair coin is flipped six times and the number of heads is counted.
   a. Calculate the probability of exactly two heads.
   b. Calculate the probability that the coin will land heads fewer than three times.
   c. Calculate the probability that the coin will land heads more than four times.
   d. Calculate the mean, variance, and standard deviation of the binomial distribution.
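Questions 21, 24, and 27 all count the heads in n flips of a fair coin, so X follows a binomial distribution with p = 0.5, where P(X = k) = C(n, k) p^k (1 - p)^(n - k), the mean is np, and the variance is np(1 - p). The short Python sketch below is one way to check hand calculations against these formulas; the function names `binomial_pmf` and `binomial_summary` are illustrative, not part of the course materials.

```python
from math import comb, sqrt


def binomial_pmf(n, k, p=0.5):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_summary(n, p=0.5):
    """Return (mean, variance, standard deviation) of Binomial(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)


if __name__ == "__main__":
    n = 6  # question 27: six flips of a fair coin

    # Probability distribution table: one row per possible number of heads.
    for k in range(n + 1):
        print(k, round(binomial_pmf(n, k), 4))

    # "Fewer than three heads" means k = 0, 1, or 2.
    print("P(X < 3) =", sum(binomial_pmf(n, k) for k in range(3)))

    # "More than four heads" means k = 5 or 6.
    print("P(X > 4) =", sum(binomial_pmf(n, k) for k in range(5, n + 1)))

    print("mean, variance, sd =", binomial_summary(n))
```

Setting n = 8 gives the distribution table for question 24, and n = 4 gives the one for question 21.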
© Copyright 2024