Lecture Notes on CENG 272 Statistical Computations
Prepared by: Dr. Emre Sermutlu
Based on the book: Probability and Statistics for Engineers and Scientists, Ninth Edition, Walpole, Myers, Myers, Ye, Pearson Education
Last Update: March 12, 2015

Week 1 – Introduction

A population is a collection of individual items of a particular type. A sample is a subset of the population, selected by a definite procedure. In a biased sample, the members of the population do not all have the same probability of being selected.

Sample Mean:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Sample Median: If the observations in the sample are ordered as $x_1 \le x_2 \le \dots \le x_n$, the median is:
$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & \text{if } n \text{ is even} \end{cases}$$

An alternative to the mean and the median is the trimmed mean. For example, we can eliminate the largest 10% and the smallest 10% of the data and find the mean of the remaining elements. This is called the 10% trimmed mean.

Variance:
$$s^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n-1}$$

Standard Deviation:
$$s = \sqrt{s^2}$$

In statistics, any process that generates a set of data is called an experiment. The set of all possible outcomes of a statistical experiment is called the sample space and is denoted by $S$. For example, if we toss a coin twice, the sample space is $S = \{HH, HT, TH, TT\}$.

An event is a subset of a sample space. The complement of an event $A$ is the set of all elements of $S$ that are not in $A$, denoted by $A'$. Two events $A$ and $B$ are mutually exclusive or disjoint if $A \cap B = \emptyset$.

Exercise 1-1: An experiment consists of tossing a die and then flipping a coin. Describe the sample space.

Exercise 1-2: An experiment consists of tossing a die and then flipping a coin once if the number is even, twice if it is odd. Describe the sample space.

Exercise 1-3: A student is registered for 2 courses. For each course he can get one of 5 letter grades (A, B, C, D, F). Describe the sample space of his grades. Find the events that he passes all courses and that he fails all courses, and the complements of these two events.
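As a numerical illustration of the sample statistics defined at the start of this week, the following Python sketch computes the mean, median, 10% trimmed mean, variance and standard deviation of a small data set. The data values and the helper name `trimmed_mean` are made up for this illustration and do not come from the notes; the other functions belong to Python's standard `statistics` and `math` modules.

```python
import math
import statistics


def trimmed_mean(data, fraction):
    """Mean after dropping `fraction` of the smallest and of the largest values."""
    ordered = sorted(data)
    k = int(len(ordered) * fraction)      # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical sample (not from the notes); 32 acts as an outlier.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15, 32]

mean = statistics.mean(sample)
median = statistics.median(sample)        # average of the two middle values, n is even
trimmed = trimmed_mean(sample, 0.10)      # drops one value from each end here
variance = statistics.variance(sample)    # sample variance, divides by n - 1
std_dev = math.sqrt(variance)

print("mean:", mean)
print("median:", median)
print("10% trimmed mean:", trimmed)
print("variance:", variance)
print("standard deviation:", round(std_dev, 4))
```

With this sample, the single large value pulls the mean above the median, while the trimmed mean stays close to the median. This is the usual reason for preferring a trimmed mean when outliers are suspected.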
Multiplication Rule: If an operation can be performed in $n$ ways, and if for each of these ways a second operation can be performed in $m$ ways, then the two operations can be performed together in $nm$ ways.

Permutation: A permutation is an arrangement of a set of objects. The number of permutations of $n$ objects is $n!$. The number of permutations of $n$ objects taken $r$ at a time is:
$$P(n, r) = \frac{n!}{(n-r)!}$$

The number of distinct permutations of $n$ things of which $n_1$ are of one kind, $n_2$ are of a second kind, etc. is:
$$\frac{n!}{n_1!\, n_2! \cdots n_k!}$$

Combination: The number of combinations of $n$ distinct objects taken $r$ at a time is:
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

Exercise 1-4: How many 12 digit numbers contain exactly four 9's?

Exercise 1-5: A football team plays 20 matches in a season. The matches result in win, loss or tie. In how many different ways can the team end the season with:
a) No loss?
b) 10 wins, 4 losses, 6 ties?

Exercise 1-6: 6 people, A, B, C, D, E, F will sit around a circular table.
a) In how many ways can they do that?
b) A wants to sit together with B. In how many ways can they do that?
c) C does not want to sit together with D. In how many ways can they do that?

Probability

The probability of an event $A$ denotes the weight of $A$ in $S$. Therefore:
$$P(S) = 1, \quad P(\emptyset) = 0, \quad 0 \le P(A) \le 1$$

Furthermore, if $A$ and $B$ are mutually exclusive events:
$$P(A \cup B) = P(A) + P(B)$$

In general, for two events $A$ and $B$ we have:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

and for three events $A$, $B$ and $C$ we have:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$

For complementary events, $P(A) + P(A') = 1$.

Exercise 1-7: There are 5 black and 4 red balls in a bag. We randomly choose three balls without replacement. Find the probabilities that we get
a) 3 black
b) At least 2 black
c) At least 1 black

Exercise 1-8: We toss a pair of dice. What is the probability that:
a) The sum is 7?
b) The maximum number is 4?
c) We have a double number?

Exercise 1-9: A fair coin is tossed 5 times.
Find the probability of getting
a) No heads
b) Exactly one head
c) 3 or more heads.

Exercise 1-10: We choose a number from $\{1, 2, \dots, n\}$ randomly. We repeat this $n$ times. What is the probability that we choose 1 at least once?

Exercise 1-11: In a game of chance, your probability of winning is 0.7. You play this game five times. What is the probability that
a) You win 3 or more games?
b) You lose all of them?

Solution:
a) $\binom{5}{3} 0.7^3\, 0.3^2 + \binom{5}{4} 0.7^4\, 0.3 + \binom{5}{5} 0.7^5 = 0.83692$
b) $0.3^5 = 0.00243$

Exercise 1-12: There are 17 balls in a box. 5 are blue, 8 are red and 4 are green. We randomly choose 5 balls. What is the probability that we choose equal numbers of blue and red balls?

Solution: The possible choices are 1 blue, 1 red (and 3 green) or 2 blue, 2 red (and 1 green).
$$\frac{\binom{5}{1}\binom{8}{1}\binom{4}{3}}{\binom{17}{5}} + \frac{\binom{5}{2}\binom{8}{2}\binom{4}{1}}{\binom{17}{5}} = 0.0259 + 0.1810 = 0.2069$$

Exercise 1-13: A file server has 4 hard disks. Each disk has a 5% probability of failure within one year. If one (or more) disk fails, the whole system fails.
a) What is the probability that the system will fail within one year?
b) We decide to improve the system reliability by adding an extra disk. Now we have 5 disks, and the system works if 4 or 5 disks are working, fails otherwise. What is the probability that the new system will fail within one year?

Solution:
a) $1 - 0.95^4 = 0.1855$
b) $1 - 0.95^5 - 5 \cdot 0.95^4 \cdot 0.05 = 0.0226$

Exercise 1-14: In a computer game, there are three results: Win, Draw, Lose. The probabilities are 0.4, 0.5 and 0.1. You get 2 points for Win, 1 for Draw and 0 for Lose. What is the probability that you get 16 points after playing this game for 10 rounds?

Solution: To get 16 points, we may get 8W + 0D + 2L or 7W + 2D + 1L or 6W + 4D + 0L. We can find the probabilities using the multinomial distribution:
$$\frac{10!}{8!\,0!\,2!}\, 0.4^8\, 0.5^0\, 0.1^2 + \frac{10!}{7!\,2!\,1!}\, 0.4^7\, 0.5^2\, 0.1^1 + \frac{10!}{6!\,4!\,0!}\, 0.4^6\, 0.5^4\, 0.1^0$$
$$= 0.0003 + 0.0147 + 0.0538 = 0.0688$$

Week 2 – Conditional Probability

The conditional probability of $B$, given $A$, denoted by $P(B \mid A)$, is defined as:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{provided } P(A) > 0$$

Two events $A$ and $B$ are independent if and only if
$$P(B \mid A) = P(B) \quad \text{or} \quad P(A \mid B) = P(A)$$
Otherwise, $A$ and $B$ are dependent.

Theorem: Two events $A$ and $B$ are independent if and only if
$$P(A \cap B) = P(A)\,P(B)$$

Exercise 2-1: In a classroom of 50 students, 28 are girls and 22 are boys. 16 of the girls are from Ankara, and 10 of the boys are from Ankara. We randomly choose a student.
a) Given that the student is a girl, what is the probability that she's from Ankara?
b) Given that the student is from Ankara, what is the probability that the student is a girl?
c) Are these events independent?

Exercise 2-2: In a city, cars are colored black, white or red. 10% of all cars are black, 60% are white, the rest are red. In the past year, 4% of all cars had an accident. 15% of all cars that had an accident are black, 45% are white, the rest are red.
a) Given that a car is red, what is the probability that it had an accident?
b) Are these events independent?

Solution: Using the values 0.04 × 0.15 = 0.006, 0.04 × 0.45 = 0.018 and 0.04 × 0.40 = 0.016, we can fill the table as follows:

              Black    White    Red
Accident      0.006    0.018    0.016
No Accident   0.094    0.582    0.284

a) $P(\text{Acc.} \mid R) = \dfrac{P(\text{Acc.} \cap R)}{P(R)} = \dfrac{0.016}{0.016 + 0.284} = 0.053$
b) These events are dependent: $P(\text{Acc.}) = 4\%$ and $P(\text{Acc.} \mid R) = 5.3\%$, so $P(\text{Acc.}) \ne P(\text{Acc.} \mid R)$.

Exercise 2-3: In a country, people are unemployed with probability 0.20. 60% of the population is young. The probability that a person is unemployed given that he/she is young is 0.25. What is the probability that an old person is unemployed? Are these events independent?
Answer: 0.125. No, they are dependent.

Exercise 2-4: A driver uses road 1 with probability 0.3 and road 2 with probability 0.7. On road 1, the probability that he sees a police car is 0.5, on road 2 it is 0.2.
Given that he saw a police car, what is the probability he took road 1?

Exercise 2-5: There are 20 balls in a box. There is a 50% probability that all are white, 30% probability that 18 are white and 2 are black, 20% probability that 15 are white and 5 are black. We randomly choose two balls and see that both are white. What is the probability that all balls are white?

Exercise 2-6: The probability that a married man watches Muhteşem Yüzyıl is 0.5. The probability that a married woman watches it is 0.6. The probability that a man watches it, given that his wife does, is 0.7. Given that a married man watches Muhteşem Yüzyıl, what is the probability that his wife watches it? Are these events independent?

Exercise 2-7: There are two roads, A and B, that I can take in the mornings. I prefer A 80% of the time. If I choose A, I arrive at work early with probability 0.1, on time with probability 0.8 and late with probability 0.1. For road B, these probabilities are 0.5, 0.3 and 0.2.
a) What is the probability that I arrive at work early?
b) Given that I arrived early, what is the probability that I have taken B?

Solution:
a) $0.8 \cdot 0.1 + 0.2 \cdot 0.5 = 0.18$
b) $\dfrac{0.2 \cdot 0.5}{0.18} = \dfrac{0.1}{0.18} = 0.5556$

Exercise 2-8: We have a shipment of 20 components. There is a 60% probability that it is good (no defectives), 30% probability that it is medium (1 defective) and 10% probability that it is bad (2 defectives). We randomly choose two components, test them, and see that neither is defective.
a) What is the probability that the shipment is good?
b) What is the probability that the shipment is medium?
c) What is the probability that the shipment is bad?
Solution: The probability that there are no defectives (ND) among the two components we test is:
$$P(ND) = 0.60 \times 1 + 0.30 \times \frac{\binom{19}{2}}{\binom{20}{2}} + 0.10 \times \frac{\binom{18}{2}}{\binom{20}{2}} = 0.60 + 0.27 + 0.0805 = 0.9505$$
a) $P(G \mid ND) = \dfrac{P(G \cap ND)}{P(ND)} = \dfrac{0.60}{0.9505} = 0.631$
b) $P(M \mid ND) = \dfrac{P(M \cap ND)}{P(ND)} = \dfrac{0.27}{0.9505} = 0.284$
c) $P(B \mid ND) = \dfrac{P(B \cap ND)}{P(ND)} = \dfrac{0.0805}{0.9505} = 0.085$

Exercise 2-9: 65% of the customers of a coffee shop are women, the rest are men. A woman orders cappuccino with 50% probability, espresso with 30% probability and something else with 20% probability. For a man these numbers are 25%, 50% and 25%. Given that a customer ordered espresso, what is the probability that the customer is a man?

Solution: Let $E$ denote that the customer orders espresso, $M$ that the customer is a man and $W$ that the customer is a woman.
$$P(M \mid E) = \frac{P(M \cap E)}{P(E)} = \frac{0.35 \times 0.5}{0.35 \times 0.5 + 0.65 \times 0.3} = 0.47$$

Exercise 2-10: Three assistants, Buğra, Alphan and Oğuz, are grading homeworks. Buğra grades 30% of the homeworks, Alphan grades 45% and Oğuz grades the rest. Buğra makes 2 mistakes per 100 homeworks, Alphan makes 3 and Oğuz makes 5. I have a homework that was graded wrongly, but I don't know who graded it. What is the probability it was Oğuz?

Solution: Let $O$ denote that Oğuz has graded the homework and $M$ that a mistake was made.
$$P(O \mid M) = \frac{P(O \cap M)}{P(M)} = \frac{0.25 \times 0.05}{0.25 \times 0.05 + 0.45 \times 0.03 + 0.30 \times 0.02} = 0.39$$

Exercise 2-11: I feel sad 5%, happy 35% and normal 60% of the time. On any given day, if I feel normal, I do not go to the canteen. If I feel sad, I go to the canteen with probability 70%. If I feel happy, I go to the canteen with probability 30%. At the canteen, I order either black coffee or coffee with milk. On happy days the probabilities are 50% - 50%, on sad days 10% - 90%. Today I was at the canteen, drinking coffee with milk. What is the probability I am feeling happy?
Solution: The event Sad + Canteen + Coffee with Milk has probability 0.05 × 0.70 × 0.90 = 0.0315. The alternative event Happy + Canteen + Coffee with Milk has probability 0.35 × 0.30 × 0.50 = 0.0525. We know one of these happened, so using the conditional probability formula, I am happy with probability:
$$\frac{0.0525}{0.0525 + 0.0315} = 0.625 = 62.5\%$$

Week 3 – Probability Distributions

A Random Variable is a function that associates a real number with each element in the sample space. Random variables can be discrete or continuous.

Exercise 3-1: There are 10 balls in a box. 4 of them are black, 6 are white. We draw two balls without replacement. The number of black balls is a random variable. (discrete)

Exercise 3-2: We take the cell phones of 4 students and redistribute them randomly. The number of students getting the correct phone is a random variable. (discrete)

Exercise 3-3: The time between the passing of two trucks along the way is a random variable. (continuous)

Discrete Probability Distributions

$f(x)$ is a probability function, or probability distribution, of the discrete random variable $X$ if:
• $f(x) \ge 0$
• $\sum_x f(x) = 1$
• $P(X = x) = f(x)$

Exercise 3-4: Among a shipment of 20 laptops, 3 are defective. We purchase 2. Find the probability distribution for the number of defectives.

Let the discrete random variable $X$ have the probability distribution $f(x)$. The cumulative distribution function $F(x)$ is defined as
$$F(x) = P(X \le x) = \sum_{t \le x} f(t)$$

Exercise 3-5: A box contains 2 black and 5 white balls. We randomly select 3. If $X$ is the number of black balls we choose, find the probability distribution of $X$. Then, find the cumulative distribution function.
Answer: $f(0) = 2/7$, $f(1) = 4/7$, $f(2) = 1/7$; $F(x)$ takes the values 0, 2/7, 6/7, 1.

Continuous Probability Distributions

$f(x)$ is a probability density function of the continuous random variable $X$ if:
• $f(x) \ge 0$
• $\int_{-\infty}^{\infty} f(x)\,dx = 1$
• $P(a < X < b) = \int_a^b f(x)\,dx$

Exercise 3-6: Determine $c$ such that $f(x) = c(x^2 + 4)$ for $x = 0, 1, 2, 3$ is a probability distribution.
Answer: 1/30

Exercise 3-7: Let the error in an experiment be given by
$$f(x) = \begin{cases} \dfrac{x^2}{3} & -1 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
a) Verify that $f(x)$ is a density function.
b) Find $P(0 < X \le 1)$

Exercise 3-8: Consider the density function
$$f(x) = \begin{cases} k\sqrt{x} & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
a) Find $k$.
b) Find $P(0.3 < X < 0.6)$
Answer: 3/2, 0.3

The Cumulative Distribution Function $F(x)$ of a continuous random variable $X$ with density function $f(x)$ is:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
Therefore $P(a < X < b) = F(b) - F(a)$.

Exercise 3-9: The time to failure in hours of an electronic equipment has the density
$$f(x) = \begin{cases} 0 & x < 0 \\ \exp(-x/2000)/2000 & x \ge 0 \end{cases}$$
a) Find $F(x)$
b) Find the probability that the equipment lasts at least 1000 hours.
c) Find the probability that it fails before 2000 hours.
Answer: 0.6065, 0.6321

Exercise 3-10: The probability distribution for a continuous random variable $X$ is:
$$f(x) = \begin{cases} k(1 - x)^4 & 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
a) Find $k$
b) Find $P(0.8 < X)$

Solution:
a) With the substitution $u = 1 - x$:
$$\int_0^1 k(1 - x)^4\,dx = k\int_1^0 u^4\,(-du) = \frac{k}{5} = 1 \;\Rightarrow\; k = 5$$
b) $P(0.8 < X) = \int_{0.8}^{\infty} f(x)\,dx = \int_{0.8}^{1} 5(1 - x)^4\,dx = 0.2^5 = 0.00032$

Exercise 3-11: The waiting time, in hours, for a police radar is a continuous random variable with probability density function:
$$f(x) = \begin{cases} 0 & x < 0 \\ 8\exp(-8x) & x \ge 0 \end{cases}$$
Find the probability of waiting less than 12 minutes.
Answer: 0.7981

Exercise 3-12: The particle size (in micrometers) distribution in a chemical mixture is given by
$$f(x) = \begin{cases} 3x^{-4} & x > 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the probability that the particle size is greater than 4 micrometers.

Exercise 3-13: A continuous random variable $X$ has the probability distribution
$$f(x) = \begin{cases} 0 & x < 2 \\ k(1 + x) & 2 \le x \le 5 \\ 0 & 5 < x \end{cases}$$
a) Find $k$.
b) Find $P(4 \le X \le 8)$

Solution:
a)
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_2^5 k(1 + x)\,dx = \left. \frac{k(1 + x)^2}{2} \right|_2^5 = \frac{27k}{2} = 1 \;\Rightarrow\; k = \frac{2}{27}$$
b)
$$P(4 \le X \le 8) = \int_4^8 f(x)\,dx = \int_4^5 \frac{2}{27}(1 + x)\,dx = \frac{11}{27} = 0.4074$$

Exercise 3-14: Emre hoca announces exam results $x$ hours after the exam ends. The probability density function of $x$ is:
$$f(x) = \begin{cases} k e^{-x/24} & 18 < x \\ 0 & \text{elsewhere} \end{cases}$$
a) Find $k$.
b) Find the probability that an exam result is announced within 36 hours of the end of the exam.

Solution:
a) $k\int_{18}^{\infty} e^{-x/24}\,dx = 1 \;\Rightarrow\; k = \dfrac{e^{3/4}}{24} = 0.0882$
b) $P = k\int_{18}^{36} e^{-x/24}\,dx = 0.5276$

Exercise 3-15: The concentration of a pollutant is a continuous random variable with probability density function:
$$f(x) = \begin{cases} \dfrac{c}{x^4} & x > 1 \\ 0 & \text{elsewhere} \end{cases}$$
a) Find $c$.
b) Find $P(3 < X < 4)$

Solution:
a)
$$\int_1^{\infty} \frac{c}{x^4}\,dx = \left. \frac{c}{-3x^3} \right|_1^{\infty} = \frac{c}{3} = 1 \;\Rightarrow\; c = 3$$
b)
$$P(3 < X < 4) = \int_3^4 \frac{3}{x^4}\,dx = \left. -\frac{1}{x^3} \right|_3^4 = -\frac{1}{64} + \frac{1}{27} = 0.0214$$

Exercise 3-16: The time to failure in years of an electronic equipment has the density
$$f(t) = \begin{cases} 0 & t < 0 \\ \dfrac{e^{-t/3}}{3} & t \ge 0 \end{cases}$$
The company will replace any product that had a lifetime of less than 1 year. What proportion of the products will they replace?

Solution:
$$\int_0^1 \frac{1}{3} e^{-t/3}\,dt = \left. -e^{-t/3} \right|_0^1 = 1 - e^{-1/3} = 0.283$$
They will replace 28.3% of the products.

Week 4 – Joint Probability Distributions

Discrete case: $f(x, y)$ is a probability mass function, or joint probability distribution, of the discrete random variables $X$ and $Y$ if:
• $f(x, y) \ge 0$
• $\sum_x \sum_y f(x, y) = 1$
• $P(X = x, Y = y) = f(x, y)$

Continuous case: $f(x, y)$ is a joint density function of the continuous random variables $X$ and $Y$ if:
• $f(x, y) \ge 0$
• $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
• $P((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy$ for any region $A$ in the $xy$-plane

Exercise 4-1: A box contains 3 blue, 2 red and 3 green pens. We randomly choose 2 pens. If $X$ is the number of blue and $Y$ is the number of red pens, find
a) the joint probability function $f(x, y)$.
b) $P((X, Y) \in A)$ where $A$ is the region $\{(x, y) \mid x + y \le 1\}$

Exercise 4-2: Let
$$f(x, y) = \begin{cases} \dfrac{2}{5}(2x + 3y) & 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
a) Verify that it is a probability density function.
b) Calculate the probability that $0 < x < \frac{1}{2}$ and $\frac{1}{4} < y < \frac{1}{2}$.
Answer: 13/160

Marginal Distributions

Given a joint probability distribution $f(x, y)$, we can find the probability distribution of $x$ only or $y$ only as follows. In the discrete case:
$$g(x) = \sum_y f(x, y) \quad \text{and} \quad h(y) = \sum_x f(x, y)$$
In the continuous case:
$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

Exercise 4-3: Let
$$f(x, y) = \begin{cases} \dfrac{x(1 + 3y^2)}{4} & 0 < x < 2,\; 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the marginal distributions $g(x)$ and $h(y)$.
Answer: $x/2$, $(1 + 3y^2)/2$

Exercise 4-4: Let
$$f(x, y) = \begin{cases} 10xy^2 & 0 < x < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the marginal distributions $g(x)$ and $h(y)$.
Answer: $10x(1 - x^3)/3$, $5y^4$

Statistical Independence

The random variables $X$ and $Y$ are said to be statistically independent if and only if
$$f(x, y) = g(x)\,h(y)$$

Exercise 4-5: Let $X$ and $Y$ have the distribution given in the table:

f(x, y)   x = 1   x = 2
y = 1     0.2     0.3
y = 2     0.4     0.1

a) Find the marginal distributions of $X$ and $Y$.
b) Are they statistically independent?

Exercise 4-6: Given the joint density function
$$f(x, y) = \begin{cases} \dfrac{3y^2}{26e^x} & 0 \le x,\; 1 \le y \le 3 \\ 0 & \text{elsewhere} \end{cases}$$
are $X$ and $Y$ statistically independent?

Exercise 4-7: Given the joint density function
$$f(x, y) = \begin{cases} \dfrac{4 + 6x + 3y}{128} & 0 \le x + y \le 4 \\ 0 & \text{elsewhere} \end{cases}$$
are $X$ and $Y$ statistically independent?

Exercise 4-8: The age and income distribution in a country is given by the following table in percentages:

Income \ Age            20-34   35-49   50-64   65-
Less than $20 000         8       7       4      3
$20 000-$40 000          13      10       8      6
$40 000-$60 000           5       6       8      7
Greater than $60 000      2       2       5      6

a) Find the marginal distributions for age and income.
b) Are they independent?

Exercise 4-9: A coffee factory investigates the relation between wind speed and quality of coffee produced that day.
They obtain the following table for probabilities:

Quality \ Wind   Calm (No wind)   Light Wind   Strong Wind
Low              0.03             0.05         0.02
Average          0.225            0.375        0.15
High             0.045            0.075        0.03

a) Find the marginal distributions for quality and wind speed.
b) Are they independent?
c) Find the probability that we obtain high quality coffee, given that there is strong wind.
d) Find the probability that there is strong wind, given that we obtain high quality coffee.

Solution:
a) Quality $g(x)$: Low: 0.1, Average: 0.75, High: 0.15. Wind $h(y)$: Calm: 0.3, Light: 0.5, Strong: 0.2.
b) Multiplying these marginals gives exactly the table above. In other words, $f(x, y) = g(x) \cdot h(y)$. Therefore, wind speed and quality are independent.
c) $\dfrac{0.03}{0.02 + 0.15 + 0.03} = 0.15$
d) $\dfrac{0.03}{0.045 + 0.075 + 0.03} = 0.20$

Exercise 4-10: Given the joint density function
$$f(x, y) = \begin{cases} \dfrac{x^2 + 2y^2}{2500} & -5 < x < 5,\; -5 < y < 5 \\ 0 & \text{elsewhere} \end{cases}$$
what is the probability that $2 < x$ and $3 < y$?

Solution:
$$\int_3^5\int_2^5 \frac{x^2 + 2y^2}{2500}\,dx\,dy = \frac{1}{2500}\int_3^5 \left[ \frac{x^3}{3} + 2y^2 x \right]_2^5 dy = \frac{1}{2500}\left[ \frac{117y}{3} + 2y^3 \right]_3^5 = \frac{274}{2500} = 0.1096$$

Exercise 4-11: Given the joint density function
$$f(x, y) = \begin{cases} \dfrac{x^2 + 2y^2}{2500} & -5 < x < 5,\; -5 < y < 5 \\ 0 & \text{elsewhere} \end{cases}$$
are $X$ and $Y$ statistically independent?

Solution:
$$g(x) = \int_{-5}^{5} \frac{x^2 + 2y^2}{2500}\,dy = \frac{x^2}{250} + \frac{1}{15}, \quad -5 < x < 5$$
$$h(y) = \int_{-5}^{5} \frac{x^2 + 2y^2}{2500}\,dx = \frac{y^2}{125} + \frac{1}{30}, \quad -5 < y < 5$$
$g(x) \cdot h(y) \ne f(x, y)$ ⇒ They are dependent.

Week 5 – Mathematical Expectation

Expected Value

Let $X$ be a random variable with probability distribution $f(x)$. The mean, or expected value, of $X$ is
$$\mu = E(X) = \sum_x x f(x)$$
if $X$ is discrete and
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
if $X$ is continuous.

Exercise 5-1: A lot of 7 components contains 4 good and 3 defective ones. We take a sample of 3. Find the expected value of the number of good components.
Answer: 12/7

Exercise 5-2: Let $X$ be the random variable that denotes the life in hours of a certain electronic device.
The probability density function is
$$f(x) = \begin{cases} \dfrac{20000}{x^3} & x > 100 \\ 0 & \text{elsewhere} \end{cases}$$
Find the expected life of this type of device.
Answer: 200

Let $X$ be a random variable with probability distribution $f(x)$. The expected value of $g(X)$ is
$$E(g(X)) = \sum_x g(x) f(x)$$
if $X$ is discrete and
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
if $X$ is continuous.

Exercise 5-3: The number of sales per month has the probability distribution:

x      4      5      6     7     8     9
f(x)   1/12   1/12   1/4   1/4   1/6   1/6

If the salesman is paid a bonus of $2X - 1$, find the expected amount of the bonus.
Answer: 12.67

Theorem: If $a$ and $b$ are constants, $E(aX + b) = aE(X) + b$.

Exercise 5-4: Solve the previous problem with a second method.

Variance

Let $X$ be a random variable with probability distribution $f(x)$ and mean $\mu$. The variance of $X$ is
$$\sigma^2 = E[(X - \mu)^2] = \sum_x (x - \mu)^2 f(x), \quad \text{if } X \text{ is discrete}$$
$$\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx, \quad \text{if } X \text{ is continuous}$$
The square root of the variance, $\sigma$, is called the standard deviation of $X$.

Theorem: The variance of a random variable $X$ is $\sigma^2 = E(X^2) - \mu^2$.

Exercise 5-5: Let the random variable $X$ represent the number of typographical errors on a page. The probability distribution is given as:

x      0      1      2      3
f(x)   0.51   0.38   0.10   0.01

Calculate $\sigma^2$.
Answer: 0.4979

Exercise 5-6: The weekly demand for a product is a continuous random variable $X$ having the probability density
$$f(x) = \begin{cases} 2(x - 1) & 1 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
Find the mean and variance of $X$.
Answer: 5/3, 1/18

Exercise 5-7: A random variable $X$ has the density function
$$f(x) = \begin{cases} x & 0 < x < 1 \\ 2 - x & 1 \le x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
Find the mean and the variance of $X$.

Exercise 5-8: A random variable $X$ has the density function
$$f(x) = \begin{cases} x & 0 < x < 1 \\ 2 - x & 1 \le x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
Find the expected value of $Y = 3X^2 - 4X$.
Answer: $3 \cdot \dfrac{7}{6} - 4 \cdot 1 = -\dfrac{1}{2}$

Chebyshev's Theorem

Theorem: The probability that any random variable $X$ will take a value within $k$ standard deviations of the mean is at least $1 - \dfrac{1}{k^2}$.
That is:
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

Exercise 5-9: A random variable $X$ has a mean $\mu = 10$ and a variance $\sigma^2 = 4$. Using Chebyshev's theorem, find a bound for $P(5 < X < 15)$.
Answer: $P \ge 21/25$

Exercise 5-10: Compute $P(\mu - 2\sigma < X < \mu + 2\sigma)$ where $X$ has the density function
$$f(x) = \begin{cases} 6x(1 - x) & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
and compare with the result given by Chebyshev's theorem.

Exercise 5-11: Find the mean and variance of a random variable $X$ whose probability distribution is:

x      0      5      10     20
f(x)   0.17   0.33   0.41   0.09

Solution:
$$\mu = E(X) = 0 \times 0.17 + 5 \times 0.33 + 10 \times 0.41 + 20 \times 0.09 = 7.55$$
$$\sigma^2 = E(X^2) - \mu^2 = 0 \times 0.17 + 25 \times 0.33 + 100 \times 0.41 + 400 \times 0.09 - 7.55^2 = 28.2475$$

Exercise 5-12: The length of time, in seconds, that cars have to wait at a traffic light has the density function:
$$f(x) = \begin{cases} \dfrac{1}{5} e^{-x/5} & 0 < x \\ 0 & \text{elsewhere} \end{cases}$$
a) Find $E(X)$
b) Find $E(X^2)$

Solution: Using integration by parts, we can show that, for any nonzero $a$:
$$\int x e^{ax}\,dx = \frac{x e^{ax}}{a} - \frac{e^{ax}}{a^2}$$
$$\int x^2 e^{ax}\,dx = \frac{x^2 e^{ax}}{a} - \frac{2x e^{ax}}{a^2} + \frac{2 e^{ax}}{a^3}$$
a) $E(X) = \displaystyle\int_0^{\infty} \frac{x e^{-x/5}}{5}\,dx = 5$
b) $E(X^2) = \displaystyle\int_0^{\infty} \frac{x^2 e^{-x/5}}{5}\,dx = 50$

Exercise 5-13: The probability density function of a random variable is:
$$f(x) = \begin{cases} \dfrac{3}{116}\left(1 + 7x - x^2\right) & 0 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
Find $\sigma^2$ (the variance).

Solution:
$$\mu = \int_0^4 \frac{3}{116}\left(x + 7x^2 - x^3\right)dx = 2.4138$$
$$\sigma^2 = E(X^2) - \mu^2 = \int_0^4 \frac{3}{116}\left(x^2 + 7x^3 - x^4\right)dx - \mu^2 = 1.0150$$

Week 6 – Binomial Distributions

Bernoulli Process: In a Bernoulli process, we make repeated trials. The result of each trial is either success or failure (there are two options). The probability of success ($p$) remains constant from trial to trial.

Exercise 6-1: We select a card from a standard deck. We replace the card and shuffle after each trial. What is the probability that we get 3 hearts after 6 trials?
(If we do this without replacement, the trials are no longer a Bernoulli process.)

Binomial Distribution: In a Bernoulli process, if the probability of success is $p$ and the probability of failure is $q = 1 - p$, the probability of $x$ successes in $n$ trials is given by:
$$b(x; n, p) = \binom{n}{x} p^x q^{n-x}$$
Note that
$$\sum_{x=0}^{n} b(x; n, p) = 1$$
The mean of the binomial distribution is $\mu = np$ and the variance is $\sigma^2 = npq$.

Exercise 6-2: We conjecture that 30% of the wells in an area are impure. We randomly select 10 wells and test them. The results show that 6 have impurity. What can we say about the conjecture?
(If it were correct, we would see 6 or more impure wells with only a 4.7% chance.)

Exercise 6-3: Suppose airplane engines operate independently and fail with probability $p = 0.4$. Assuming that a plane makes a safe flight if at least one-half of its engines run, determine whether a 4-engine plane or a 2-engine plane has a better chance.

Exercise 6-4: The probability that a patient recovers after a heart operation is 0.9. Find the probability that
a) Out of the next 10 patients, 5 or more recover.
b) Out of the next 8 patients, 4 or more recover.

Exercise 6-5: Tests show that only 30% of cars have the correct tire pressure. We test 7 cars. Find the probability that
a) 2 or more have the correct pressure
b) 3 to 6 have the correct pressure.

Exercise 6-6: According to the statistics of the finance ministry, one in five cars has unpaid tax. Suppose we check 10 randomly chosen cars.
a) What is the probability that exactly 4 of them have unpaid tax?
b) What is the probability that 4 or more of them have unpaid tax?
Answer: 0.088, 0.121

Multinomial Distribution

If each trial has more than 2 possible outcomes, we have a multinomial distribution. If $k$ outcomes occur with probabilities $p_1, \dots, p_k$, then after $n$ independent trials
$$f(x_1, \dots, x_k;\; p_1, \dots, p_k;\; n) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}$$
where $\sum x_i = n$ and $\sum p_i = 1$.

Exercise 6-7: At a traffic light, the green signal stays on for 15 seconds, the yellow for 5 seconds and the red for 40 seconds.
We pass through it 5 times. We encounter the green light $X_1$ times, yellow $X_2$ times and red $X_3$ times. Find the distribution of $X_1, X_2, X_3$.
Answer: $\dfrac{5!}{x_1!\,x_2!\,x_3!}\, 0.25^{x_1}\, 0.083^{x_2}\, 0.67^{x_3}$

Exercise 6-8: In a large classroom, 55% of the students are from the CENG, 35% from the ECE and 10% from the IE departments. We randomly choose 6 students. What is the probability that 3 are from CENG, 2 are from ECE and 1 is from IE?

Solution: Using the multinomial distribution,
$$p = \frac{6!}{3!\,2!\,1!}\, 0.55^3\, 0.35^2\, 0.10 = 0.1223$$

Exercise 6-9: Jale and Seçil are testing some equipment. Jale estimates that 10% are defective, Seçil estimates that 15% are defective. They test 30 items and find 4 defective ones. What is the probability of this outcome
a) Assuming Jale is right?
b) Assuming Seçil is right?
c) Who is right? (Assuming one of them is right)

Solution:
a) $\binom{30}{4}\, 0.1^4\, 0.9^{26} = 0.1771$
b) $\binom{30}{4}\, 0.15^4\, 0.85^{26} = 0.2028$
c) Seçil is right with probability:
$$\frac{0.2028}{0.2028 + 0.1771} = 0.53$$

Exercise 6-10: You receive a large shipment of electronic components. It is either "good", which means 5% is defective, or "bad", which means 15% is defective. You randomly choose a sample of 20 components and test them. You reject the shipment if there are 2 or more defectives, and accept it otherwise.
a) Suppose the shipment is good. What is the probability of rejecting?
b) Suppose the shipment is bad. What is the probability of accepting?

Solution:
a) $1 - \left[ 0.95^{20} + \binom{20}{1}\, 0.05 \times 0.95^{19} \right] = 1 - 0.7358 = 0.2642$
b) $0.85^{20} + \binom{20}{1}\, 0.15 \times 0.85^{19} = 0.1756$

Exercise 6-11: In a court, there are 9 judges. They make the decision "Guilty" or "Innocent" independently. Each judge has the same rate of error. They find an innocent person guilty 20% of the time, and a guilty person innocent 30% of the time. An accused person is considered guilty if 7 or more judges find him guilty.
a) Suppose you are innocent. What is the probability that the court will find you guilty?
b) Suppose you are guilty.
What is the probability that the court will find you innocent?

Solution:
a) $\binom{9}{7}\, 0.2^7 \times 0.8^2 + \binom{9}{8}\, 0.2^8 \times 0.8 + \binom{9}{9}\, 0.2^9 = 0.000314$
b) $1 - \left[ \binom{9}{7}\, 0.7^7 \times 0.3^2 + \binom{9}{8}\, 0.7^8 \times 0.3 + \binom{9}{9}\, 0.7^9 \right] = 1 - 0.4628 = 0.5372$

Week 7 – Hypergeometric and Negative Binomial Dist.

Hypergeometric Distribution

The hypergeometric distribution is based on sampling without replacement.

Exercise 7-1: There are 2 black and 8 white balls in a basket. We randomly choose 3. What is the probability that all are white?
Answer: $\dfrac{\binom{2}{0}\binom{8}{3}}{\binom{10}{3}} = 0.467$

In general, there are $N$ items. We consider $k$ of them as successes and $N - k$ as failures. We randomly choose $n$ items without replacement. What is the probability that there are $x$ successes?
$$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \quad \max\{0,\, n - (N - k)\} \le x \le \min\{n, k\}$$

Exercise 7-2: A lot of 40 components is unacceptable if there are 3 or more defectives. We test 5 randomly chosen elements and reject the lot if one is defective. What is the probability that exactly one defective is found, assuming there are 3 defectives in total?
Answer: 0.3011

Theorem: The mean and variance of the hypergeometric distribution $h(x; N, n, k)$ are
$$\mu = \frac{nk}{N}, \quad \sigma^2 = \frac{N - n}{N - 1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right)$$

There is a close relationship between the binomial and hypergeometric distributions. If $n \ll N$, the distinction between sampling with and without replacement disappears.

Exercise 7-3: A factory reports that of the 5000 tires sent to a local distributor, 1000 are slightly blemished. You purchase 10. What is the probability that exactly 3 are blemished?
Answer: 0.2015 ≈ 0.2013

Exercise 7-4: There are 500 students in a CENG department. 150 use Linux and the rest use Windows on their personal computers. We randomly choose 7 students. What is the probability that 4 of them use Linux? Answer this using
a) Hypergeometric distribution.
b) Binomial distribution approximation.
Answer: 0.09659, 0.09724

Exercise 7-5: There are 600 cars in the parking area. 150 are Turkish and the rest are foreign made. We randomly choose 12 cars.
What is the probability that 6 of them are Turkish? Answer this using
a) Hypergeometric distribution.
b) Binomial distribution approximation.

Exercise 7-6: A network makes errors in 1500 bits per 100000 bits transmitted. Each packet consists of 100 bits. If there are 4 or more errors per packet, we request retransmission.
a) Assuming we can detect all errors, what is the probability of a retransmission request?
b) Assuming we can detect at most 6 errors per packet, what is the probability of a retransmission request? What is the probability of accepting a packet with errors?

Solution: For one bit, the error probability is $q = 1500/100000 = 0.015$ and the probability of correct arrival is $p = 1 - q = 0.985$.
a) If there are 0, 1, 2 or 3 errors, we do not request a retransmission. If there are 4, 5, 6, ..., or 100 errors, we do.
$$1 - \left[ \binom{100}{0} p^{100} + \binom{100}{1} p^{99} q + \binom{100}{2} p^{98} q^2 + \binom{100}{3} p^{97} q^3 \right] = 0.0642$$
b) Now we assume that we request a retransmission if there are 4, 5 or 6 errors:
$$\binom{100}{4} p^{96} q^4 + \binom{100}{5} p^{95} q^5 + \binom{100}{6} p^{94} q^6 = 0.0634$$
The probability that there are 7 or more errors is: 0.0642 − 0.0634 = 0.0008

Exercise 7-7:
a) Of the 50 cars in the parking lot, 13 are using diesel fuel and 37 gasoline. We randomly choose 10. What is the probability that 5 are using diesel?
b) Of the 500 people working at a hospital, 220 are female and 280 are male. We randomly choose 10. What is the probability that 5 are female?
c) If you had to solve one of the above problems using an approximation, which one would you choose, a) or b)? Which approximation would you use? Explain.

Solution: Using the hypergeometric distribution,
a) $p = \dfrac{\binom{13}{5}\binom{37}{5}}{\binom{50}{10}} = 0.0546$
b) $p = \dfrac{\binom{220}{5}\binom{280}{5}}{\binom{500}{10}} = 0.2309$
c) We can use the binomial approximation to the hypergeometric distribution. We should prefer part b) because $N$ is much larger relative to $n$, and therefore we expect the approximation to be better. This approximation gives 0.0664 for part a), which means a 22% relative error. It gives 0.2289 for part b), which means a 0.8% relative error.
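The comparison in part c) of Exercise 7-7 can be reproduced numerically. The Python sketch below is not part of the original notes: the helper names `hypergeom_pmf` and `binom_pmf` are illustrative, and the only library call is `math.comb` from the standard library (Python 3.8 or later). It evaluates $h(x; N, n, k)$ exactly and compares it with the binomial approximation using $p = k/N$.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k): probability of x successes in n draws without replacement."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """b(x; n, p): probability of x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


# (N, n, k, x) for parts a) and b) of Exercise 7-7
cases = {"a": (50, 10, 13, 5), "b": (500, 10, 220, 5)}

results = {}
for label, (N, n, k, x) in cases.items():
    exact = hypergeom_pmf(x, N, n, k)
    approx = binom_pmf(x, n, k / N)        # binomial approximation with p = k/N
    rel_err = abs(approx - exact) / exact
    results[label] = (exact, approx, rel_err)
    print(f"{label}) hypergeometric = {exact:.4f}, "
          f"binomial = {approx:.4f}, relative error = {rel_err:.1%}")
```

The sample size is $n = 10$ in both parts; only the population size $N$ differs, which is why the approximation is much closer in part b).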
Negative Binomial Distribution

Consider an experiment where the probability of success is fixed, as in the binomial case. We are interested in $k$ successes in $x$ trials, but this time we want the $k$th success to occur on the $x$th trial.

Exercise 7-8: In the NBA championship, the team that wins four out of seven games is the winner. Suppose team A has probability 0.55 of winning a game over team B.
a) What is the probability that team A will win the series in 6 games?
b) What is the probability that team A will win the series?
Answer: 0.1853, 0.6083

If the probabilities of success ($p$) and failure ($q = 1 - p$) are fixed, the probability that the $k$th success occurs on trial $x$ is:
$$b^*(x; k, p) = \binom{x-1}{k-1} p^k q^{x-k}, \quad x = k, k+1, \dots$$
The average number of trials until the $k$th success is:
$$\mu = \frac{k}{p}$$
We can prove this starting with
$$\mu = \sum_{i=k}^{\infty} i \binom{i-1}{k-1} p^k (1 - p)^{i-k}$$
and using derivatives of geometric series.

Exercise 7-9: In a sports tournament, the team that wins 5 out of 9 games advances to the next round. Team A has probability 0.6 of winning any one game against team B. What is the probability that this round ends in exactly 7 games?

Solution: Team A may win in 7 games or team B may win in 7 games. The winner must win the 7th game, so
$$p = \binom{6}{4}\, 0.6^5\, 0.4^2 + \binom{6}{4}\, 0.4^5\, 0.6^2 = 0.1866 + 0.0553 = 0.2419$$

Exercise 7-10: Suppose that the probability of a male or a female birth is 0.5. A couple wishes to have exactly two daughters, and they will continue to have babies until this condition is satisfied. What is the probability that the family has 2 sons?
Answer: 0.188

Exercise 7-11: We throw a pair of dice until we get 6-6. What is the expected value of the number of throws?
Answer: 36

Exercise 7-12: An oil company drills wells. Their probability of success is 0.2. They will stop at the third success. What is the average number of wells they drill?
Answer: 15

Exercise 7-13: On a Saturday night, Alphan is playing a game on his phone. He wins with probability 0.23.
His friends are waiting for him to go out, but Alphan says "I will continue until I win 7 times". Çağatay says: "We will wait exactly 25 games." Oğuz says: "We will wait exactly 30 games." Alphan is more optimistic; he thinks his friends will wait at most 10 games.
a) What is the probability that Çağatay is right?
b) What is the probability that Oğuz is right?
c) What is the probability that Alphan is right?
d) What is the probability that all are wrong?

Solution:
a) $\binom{24}{6} 0.23^{7}\, 0.77^{18} = 0.0415$
b) $\binom{29}{6} 0.23^{7}\, 0.77^{23} = 0.0396$
c) $\binom{9}{6} 0.23^{7}\, 0.77^{3} + \binom{8}{6} 0.23^{7}\, 0.77^{2} + \binom{7}{6} 0.23^{7}\, 0.77 + \binom{6}{6} 0.23^{7} = 0.0021$
d) $1 - 0.0415 - 0.0396 - 0.0021 = 0.9168$

Exercise 7-14: You have started using your father's car today. On each day, there is a probability of 0.01 that you have an accident. Your father says "You can make a mistake at most twice. At your third mistake, I will take back the car". If you use the car every day, what is the probability you lose it on day 100?
Solution:
$$\binom{99}{2} \times 0.01^{3} \times 0.99^{97} = 0.0018$$

Exercise 7-15: A biased coin has probability 0.7 of coming up Heads. We start tossing this coin. We will stop when we obtain 10 Tails. What is the probability we stop after 20 tosses?
Solution:
$$\binom{19}{10} \times 0.7^{10} \times 0.3^{10} = 0.0154$$

Week 8– Poisson Distribution

Properties of Poisson Process
• The number of outcomes occurring in one time interval is independent of the number that occurs in any other interval.
• The probability that a single outcome will occur in a very short time interval is proportional to the length of the time interval.
• The probability that more than one outcome will occur in such a short time interval is negligible.

The probability distribution of the Poisson random variable $X$ is:
$$p(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$$
where $\mu$ is the average number of outcomes per unit time.

Exercise 8-1: During an experiment, the average number of radioactive particles passing through a counter in 1 millisecond is 4.
What is the probability that 6 particles enter the counter in any given millisecond?
Answer: 0.1042

Exercise 8-2: The average number of tankers arriving at a port per day is 10. The facilities can handle at most 15 tankers per day. What is the probability that tankers have to be turned away on any given day?
Answer: 0.0487

Theorem: Both the mean and the variance of the Poisson distribution are $\mu$.

Theorem: Let $X$ be a binomial random variable with probability distribution $b(x; n, p)$. When $n \to \infty$, $p \to 0$ and $np \to \mu$ remains constant,
$$b(x; n, p) \to p(x; \mu)$$

Exercise 8-3: In a factory, the probability of an accident on a given day is 0.005 and accidents are independent of each other. What is the probability that in any given period of 400 days
a) There will be one accident?
b) There will be at most three accidents?
Answer: 0.271, 0.857

Exercise 8-4: For a certain type of copper wire, it is known that, on average, 1.5 flaws occur per millimeter. Assuming that the number of flaws is a Poisson random variable, what is the probability that in a certain portion of the wire of length 5 millimeters
a) No flaw occurs?
b) 10 or more flaws occur?
Solution: $\lambda t = 5 \times 1.5 = 7.5$
a) $\dfrac{e^{-\lambda t} (\lambda t)^{x}}{x!} = e^{-7.5} = 5.5308 \times 10^{-4}$
b) $\displaystyle\sum_{x=10}^{\infty} \frac{e^{-7.5}\, 7.5^{x}}{x!} = 1 - 0.7764 = 0.2236$

Exercise 8-5: Of all the computers on the campus, 2% have Ubuntu installed. We randomly select 250 and test. What is the probability that we observe Ubuntu in 13 of them? Answer using
a) Binomial distribution.
b) Poisson distribution.
Solution:
a) $\binom{250}{13} 0.02^{13}\, 0.98^{237} = 1.189 \times 10^{-3}$
b) $\lambda = 250 \times 0.02 = 5$, $\quad \dfrac{e^{-5}\, 5^{13}}{13!} = 1.321 \times 10^{-3}$

Exercise 8-6: On average, 1 person in 1000 makes a numerical error while preparing an income tax form. If 10000 forms are selected at random and examined, what is the probability that 15 or more contain an error?

Exercise 8-7: The number of customers arriving per hour at an auto service follows a Poisson distribution with mean $\lambda = 7$.
a) Find the probability that, at a certain hour, no customers come.
b) Find the probability that, within two hours, at least 10 and at most 20 customers come.
c) Find the mean number of arrivals during a 2-hour period.
Answer: $9.12 \times 10^{-4}$, 0.8427, 14

Exercise 8-8: Aysun has analyzed several years' lists and found that Emre hoca fails 15 students per semester on average. But this year he failed 25. So Aysun thinks Emre hoca must have started using different limits, because the probability of such an outcome is very low assuming he is using the old system. Nesib points out that the probability that exactly 15 students fail is also low. He says Emre hoca is probably using the usual system. Let the number of failed students be $n$. Assuming $\mu = 15$, find the probability that
a) $n = 25$
b) $n = 15$
c) $n \geq 25$
d) $10 \leq n \leq 20$
e) Who is right, Aysun or Nesib?
Solution: We have to use the Poisson distribution, because only the average is given.
a) $0.9938 - 0.9888 = 0.0050$
b) $0.5681 - 0.4657 = 0.1024$
c) $1 - 0.9888 = 0.0112$
d) $0.9170 - 0.0699 = 0.8471$
e) Probably Aysun is right, because part c) gives about 1% probability for a result this extreme.

Exercise 8-9: The probability that a cell phone rings in any given second is 0.0025. Find the probability that it rings 4 times or more in an hour, using:
a) Exact method.
b) An approximation.
Solution:
a) Using the binomial distribution, we find the probability as:
$$1 - \left[ \binom{3600}{0} 0.0025^{0}\, 0.9975^{3600} + \binom{3600}{1} 0.0025^{1}\, 0.9975^{3599} + \binom{3600}{2} 0.0025^{2}\, 0.9975^{3598} + \binom{3600}{3} 0.0025^{3}\, 0.9975^{3597} \right]$$
$$= 1 - [0.0001 + 0.0011 + 0.0050 + 0.0149] = 0.9789$$
b) Using the Poisson distribution, the average per hour is $0.0025 \times 3600 = 9$. Using the table, we find the probability as:
$$1 - 0.0212 = 0.9788$$

Exercise 8-10: You are in the real estate business and on average, you sell 17 houses per month. Find the probability of a
a) Good month. (25 or more sales)
b) Normal month. (10 to 24 sales)
c) Bad month. (2 to 9 sales)
d) Terrible month.
(0 to 1 sales) (Include 8 digits for part d)
Solution: Using the Table on Poisson Probability Sums, we obtain:
a) $1 - 0.9594 = 0.0406$
b) $0.9594 - 0.0261 = 0.9333$
c) $0.0261 - 0.0000 = 0.0261$
Using the Poisson formula, we obtain:
d) $e^{-17} \left( \dfrac{17^{0}}{0!} + \dfrac{17^{1}}{1!} \right) = 7.45 \times 10^{-7}$

Exercise 8-11: You work in a warehouse which receives 2 orders per hour on average. It is open 8 hours per day. If on any given day you receive 24 or more orders, you call it a difficult day. If you receive 8 to 23 orders, you call it a normal day. If you receive 7 or fewer, you call it an easy day. Find the probability of experiencing
a) A difficult day
b) A normal day
c) An easy day
d) No orders. (8 digits)
Solution: Average per day $= 2 \times 8 = 16$. Using the Table on Poisson Probability Sums, we obtain:
a) $1 - 0.9633 = 0.0367$
b) $0.9633 - 0.0100 = 0.9533$
c) $0.0100$
Using the Poisson formula, we obtain:
d) $e^{-16}\, \dfrac{16^{0}}{0!} = 1.12 \times 10^{-7}$

Week 9– Normal Distribution

The normal distribution is the most important continuous probability distribution in statistics. The density of the normal random variable $X$, with mean $\mu$ and variance $\sigma^{2}$, is
$$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}, \quad -\infty < x < \infty$$
The curve is symmetric about $x = \mu$, where it attains its maximum. It asymptotically approaches the $x$-axis as we move away from the center. The total area under the curve is 1. We can prove this using the substitution $z = \dfrac{x-\mu}{\sigma}$.

Areas Under the Normal Curve

To find the probability that $x_1 < X < x_2$, we have to compute
$$P(x_1 < X < x_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{x_1}^{x_2} e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}\, dx$$
This can be transformed into
$$P(z_1 < Z < z_2) = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{1}{2} z^{2}}\, dz$$
where $Z$ is a normal random variable with mean 0 and variance 1. This is called the standard normal distribution.
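The standardization step can be illustrated with a short sketch. It assumes Python's standard `statistics.NormalDist` class as a stand-in for the printed normal table; the helper names and the values $\mu = 50$, $\sigma = 10$ are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def normal_prob_direct(x1, x2, mu, sigma):
    """P(x1 < X < x2) computed from the N(mu, sigma^2) distribution itself."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(x2) - dist.cdf(x1)

def normal_prob_standardized(x1, x2, mu, sigma):
    """Same probability after the change of variable z = (x - mu) / sigma."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    return std.cdf(z2) - std.cdf(z1)

# Hypothetical example: one standard deviation on either side of the mean.
print(normal_prob_direct(40, 60, mu=50, sigma=10))
print(normal_prob_standardized(40, 60, mu=50, sigma=10))
```

Both calls should agree up to floating point error, which is why a single table for $Z$ is enough for every choice of $\mu$ and $\sigma$.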
Using polar coordinates, we can prove that
$$\int_{-\infty}^{\infty} e^{-ax^{2}}\, dx = \sqrt{\frac{\pi}{a}}$$
Taking the derivative with respect to $a$ gives
$$\int_{-\infty}^{\infty} x^{2} e^{-ax^{2}}\, dx = \frac{\sqrt{\pi}}{2a\sqrt{a}}$$

Exercise 9-1: Given a standard normal distribution, find the area under the curve
a) to the right of $z = 1.84$
b) between $z = -1.97$ and $z = 0.86$
Answer: 0.0329, 0.7807

Exercise 9-2: Given a standard normal distribution, find $k$ such that $P(Z > k) = 0.3015$.
Answer: $k = 0.52$

Exercise 9-3: A certain type of battery lasts, on average, 3 years with a standard deviation of 0.5 years. Assuming battery life is normally distributed, find the probability that a given battery lasts less than 2.3 years.
Answer: 0.0808

Exercise 9-4: The average grade for an exam is 74 and the standard deviation is 7. If 12% of the class get A, what is the lowest possible A and highest possible B? Assume grades are distributed normally.
Answer: 83, 82

Exercise 9-5: Find the value of $k$ such that the area under the standard normal curve between $-k < z < k$ is equal to 0.762.
Solution:
$P(-k < z < k) = 0.762$
$P(0 < z < k) = 0.381$
$P(z < k) = 0.5 + 0.381 = 0.881$
Using the table we find $k = 1.18$.

Exercise 9-6: The IQs of 600 applicants to a certain college are approximately normally distributed with $\mu = 115$ and $\sigma = 12$. If the college requires an IQ of at least 95, how many of them will be rejected? Note that IQs are rounded to the nearest integer.
Solution:
$$Z = \frac{94.5 - 115}{12} = -1.71$$
$$P(Z < -1.71) = 0.0436$$
$$0.0436 \times 600 = 26$$

Exercise 9-7: The average time for a trip from your home to work is 24 minutes with a standard deviation of 3.8 minutes. Assume the trip times are normally distributed. You leave home at 08:35 and you must be at work by 09:00. What is the probability that you will be late?
Answer: $P(z > 0.26) = 1 - 0.6026 = 0.3974$

Exercise 9-8: The average life of a small motor is 10 years with a standard deviation of 2 years. The manufacturer replaces free of charge all motors that fail while under guarantee. To replace only 3%, how long a guarantee should be offered?
Assume the lifetime of a motor follows a normal distribution.

Exercise 9-9: Let the random variable $x$ have a normal distribution with $\mu = 710$ and $\sigma = 93$.
a) Find $a$ such that $P(710 - a < x < 710 + a) = 0.76$.
b) Find $b$ such that $P(710 < x < b) = 0.36$.
c) Find $c$ such that $P(x < c) = 0.14$.
d) Find the probability that $x > 1000$.
Solution:
a) $\dfrac{0.76}{2} + 0.5 = 0.88$, $\quad P(z < k) = 0.88 \Rightarrow k = 1.175$, $\quad a = 1.175 \times 93 = 109.275$
b) $0.36 + 0.5 = 0.86$, $\quad P(z < k) = 0.86 \Rightarrow k = 1.08$, $\quad b = 710 + 1.08 \times 93 = 810.44$
c) $z = -1.08$, $\quad c = 710 - 1.08 \times 93 = 609.56$
d) $\dfrac{1000 - 710}{93} = 3.12$, $\quad P(z > 3.12) = 1 - 0.9991 = 0.0009$

Exercise 9-10: If the function $f(x) = k e^{-x^{2}/3}$ is a probability distribution, what is $k$?
Solution: Let $I = \displaystyle\int_{-\infty}^{\infty} e^{-x^{2}/3}\, dx$. Then
$$I^{2} = \int_{-\infty}^{\infty} e^{-x^{2}/3}\, dx \int_{-\infty}^{\infty} e^{-y^{2}/3}\, dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^{2}+y^{2})/3}\, dx\, dy = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^{2}/3}\, r\, dr\, d\theta = 3\pi$$
Therefore $I = \sqrt{3\pi}$ and $k = \dfrac{1}{\sqrt{3\pi}}$.
Second Method: We know that the normal distribution
$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}$$
is a probability distribution. If we choose $\mu = 0$ and $2\sigma^{2} = 3$ we obtain the given function, therefore
$$\sigma = \sqrt{\frac{3}{2}}, \quad k = \frac{1}{\sqrt{2\pi}}\sqrt{\frac{2}{3}} = \frac{1}{\sqrt{3\pi}}$$

Exercise 9-11: The average height of women is 161 cm with a standard deviation of 6 cm and the average height of men is 173 cm with a standard deviation of 7 cm. A mirror in a shopping mall has dimensions such that 85% of women (equally distributed between higher and lower than average values) can use it comfortably. What percentage of men can use it comfortably? (Assume the height distribution is normal.)
Solution:
$$\frac{0.85}{2} + 0.5 = 0.925, \quad P(-k < z < k) = 0.85 \Rightarrow k = 1.44$$
$$161 + 1.44 \times 6 = 169.64, \quad 161 - 1.44 \times 6 = 152.36$$
So the mirror was designed for people with heights in $[152.36, 169.64]$. For men, these correspond to the $z$ values:
$$\frac{152.36 - 173}{7} = -2.95, \quad \frac{169.64 - 173}{7} = -0.48$$
$$P(-2.95 < z < -0.48) = 0.3156 - 0.0016 = 0.314 = 31.4\%$$

Exercise 9-12: In a large scale international examination, students in the top 1.5% get A and the students in the top 3.5% after them get B.
We are given that the limits of B are $[509.44, 534.16]$.
a) What is the average ($\mu$) of this distribution?
b) What is the standard deviation ($\sigma$) of this distribution?
(Assume the grade distribution is normal.)
Solution:
$$P(z > z_1) = 1.5\% = 0.015 \Rightarrow z_1 = 2.17$$
$$P(z > z_2) = 5\% = 0.05 \Rightarrow z_2 = 1.645$$
$$2.17 = \frac{534.16 - \mu}{\sigma}, \quad 1.645 = \frac{509.44 - \mu}{\sigma} \quad \Rightarrow \quad \mu = 431.98, \quad \sigma = 47.09$$

Week 10– Normal Approximation to the Binomial

Theorem: If $X$ is a binomial random variable with mean $\mu = np$ and variance $\sigma^{2} = npq$, then the limiting form of the distribution of
$$Z = \frac{X - np}{\sqrt{npq}}$$
as $n \to \infty$ is the standard normal distribution $n(z; 0, 1)$.

Exercise 10-1: The probability that a patient recovers from a rare blood disease is 0.4. If 100 people contract this disease, what is the probability that fewer than 30 survive?
Answer: $x = 29.5$, $z = -2.14$, $P = 0.0162$. Exact result $= 0.0148$

Exercise 10-2: In a multiple choice exam, a student answers 80 questions randomly. There are 4 answers for each question. What is the probability that the student guesses between 25 and 30 (inclusive) of the questions correctly?
Answer: 0.1196. Exact result $= 0.1193$

Exercise 10-3: A company produces component parts for an engine. Part specifications suggest that 95% of items meet specifications. The parts are shipped to customers in lots of 100.
a) What is the probability that more than 2 items in a lot will be defective?
b) What is the probability that more than 10 items in a lot will be defective?
Answer: 0.8749, 0.0059

Exercise 10-4: In a digital communication channel, the probability that a bit is received in error is $10^{-5}$. If 16 million bits are transmitted, what is the probability that more than 150 errors occur?
Answer: 0.7734

Exercise 10-5: Statistics show that on a Saturday night 1 out of every 10 drivers on the road is drunk. 400 drivers are randomly checked. Let's call the number of drunk drivers $n$. What is the probability that
a) $n < 32$?
b) $49 < n$?
c) $35 \leq n \leq 46$?
Solution: Using the normal approximation to the binomial, we find
$$\mu = 400 \times 0.1 = 40, \quad \sigma = \sqrt{400 \times 0.1 \times 0.9} = 6$$
a) $x = 31.5$, $\quad z = \dfrac{31.5 - 40}{6} = -1.42$, $\quad P(z < -1.42) = 0.0778$
b) $x = 49.5$, $\quad z = \dfrac{49.5 - 40}{6} = 1.58$, $\quad P(1.58 < z) = 1 - 0.9429 = 0.0571$
c) $x = 34.5$, $z = -0.92$; $\quad x = 46.5$, $z = 1.08$; $\quad P(-0.92 < z < 1.08) = 0.8599 - 0.1788 = 0.6811$

Exercise 10-6: In a shipment of 500 identical products, 30 are defective. We randomly choose 20. Find the probability that 2 of the 20 are defective
a) Using the exact method.
b) Using an approximation.

Exercise 10-7: A coin is tossed 400 times. We obtain $n$ heads. Use the normal curve approximation to find the probability that $185 \leq n \leq 210$.

Exercise 10-8: Suppose 15% of all cars in Ankara are white. We observe the Eskişehir Road and count passing cars. We observe $n$ white cars in a total of 400. What is the probability that $50 \leq n \leq 70$?
Answer: $P(-1.47 < z < 1.47) = 0.8584$

Exercise 10-9: There are 3000 students in a university and 750 are freshmen. We randomly choose 10 students. What is the probability that 8 of them are freshmen? Solve in two different ways. (6 digits after the decimal point)
Solution: The hypergeometric distribution gives:
$$\frac{\binom{750}{8}\binom{2250}{2}}{\binom{3000}{10}} = 0.000377$$
The binomial approximation with $p = \dfrac{750}{3000} = 0.25$ gives
$$\binom{10}{8} 0.25^{8}\, 0.75^{2} = 0.000386$$

Exercise 10-10: We toss a single die 90 times. What is the probability we obtain 20 or more sixes?
Solution: Use the normal approximation to the binomial:
$$\mu = 90 \cdot \frac{1}{6} = 15, \quad \sigma = \sqrt{90 \cdot \frac{1}{6} \cdot \frac{5}{6}} = 3.54$$
$$z = \frac{19.5 - 15}{3.54} = 1.27$$
$$P(z > 1.27) = 1 - 0.8980 = 0.1020$$

Week 11– Gamma and Exponential Distributions

The gamma function is defined by:
$$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x}\, dx, \quad \alpha > 0$$
Using integration by parts, we can show that
$$\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1), \quad \Gamma(1) = 1$$
Therefore $\Gamma(n) = (n-1)!$
for a positive integer $n$.

Gamma Distribution: The continuous random variable $X$ has a gamma distribution with parameters $\alpha$ and $\beta$ if its density function is given by
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
where $\alpha > 0$, $\beta > 0$.

Theorem: The mean and variance of the gamma distribution are
$$\mu = \alpha\beta, \quad \sigma^{2} = \alpha\beta^{2}$$

Exercise 11-1: In a biomedical research study, it was determined that the survival time, in weeks, of an animal subjected to radiation has a gamma distribution with $\alpha = 5$ and $\beta = 10$. What is the probability that an animal survives more than 30 weeks?
Hint: $\displaystyle\int x^{4} e^{-x}\, dx = -e^{-x}\left(x^{4} + 4x^{3} + 12x^{2} + 24x + 24\right)$
Answer: 0.8153

Exponential Distribution: The continuous random variable $X$ has an exponential distribution with parameter $\beta$ if its density function is given by
$$f(x; \beta) = \begin{cases} \dfrac{e^{-x/\beta}}{\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
where $\beta > 0$.

Theorem: The mean and variance of the exponential distribution are
$$\mu = \beta, \quad \sigma^{2} = \beta^{2}$$

Exercise 11-2: A system contains a component with time to failure $T$. The random variable $T$ is modeled by an exponential distribution with mean time to failure $\beta = 5$ years. If 5 of these components are installed, what is the probability that at least 2 are functioning at the end of 8 years?
Answer: 0.2667

Exercise 11-3: The length of time for one individual to be served at a cafeteria is a random variable having an exponential distribution with a mean of 6 minutes. What is the probability that a person is served in less than 4 minutes on at least 5 of the next 7 days?
Answer: 0.2052

Exercise 11-4: The lifetime of an electronic component has $\mu = 40$ and $\sigma = 20\sqrt{2}$. Nilay thinks that the distribution is gamma, but Mehmet thinks it is normal. The only other information they have about the population is that 1.7% of the components have a lifetime larger than 120. Who is right and why?
Solution: According to Nilay:
$$\alpha\beta = 40, \quad \alpha\beta^{2} = 800 \quad \Rightarrow \quad \alpha = 2, \quad \beta = 20$$
$$P(x > 120) = \frac{1}{20^{2}\, \Gamma(2)} \int_{120}^{\infty} x^{1} e^{-x/20}\, dx = 7e^{-6} = 0.017$$
According to Mehmet:
$$Z = \frac{120 - 40}{20\sqrt{2}} = 2.83$$
$$P(Z > 2.83) = 1 - 0.9977 = 0.0023$$
The observed proportion is $1.7\% = 0.017$, so clearly Nilay is right.

Exercise 11-5: The length of time you have to wait at the cafeteria is a random variable having an exponential distribution with a mean of 120 seconds. If you wait more than 400 seconds, you call it an unlucky day. If you eat at the cafeteria 20 days a month, what is the probability that you experience 2 or more unlucky days in a month?
Solution: First we have to find the probability of an unlucky day:
$$\int_{400}^{\infty} \frac{1}{120} e^{-x/120}\, dx = \left[ -e^{-x/120} \right]_{400}^{\infty} = e^{-400/120} = 0.0357$$
The probability that a day is not unlucky is $1 - 0.0357 = 0.9643$. Therefore
$$1 - \left[ \binom{20}{0} 0.0357^{0}\, 0.9643^{20} + \binom{20}{1} 0.0357^{1}\, 0.9643^{19} \right] = 0.1588$$

Exercise 11-6: A random variable $X$ is modeled by a gamma distribution with $\alpha = 2$, $\beta = 8$. Find $P(X < 7)$.
Solution:
$$P(X < 7) = \int_{0}^{7} \frac{x e^{-x/8}}{8^{2}\, \Gamma(2)}\, dx$$
$\Gamma(2) = 1$. Using integration by parts with $u = x$, $dv = e^{-x/8}\, dx$ we obtain:
$$P(X < 7) = \left[ -e^{-x/8}\left( \frac{x}{8} + 1 \right) \right]_{0}^{7} = 1 - \frac{15}{8} e^{-7/8} = 0.2184$$

Exercise 11-7: A random variable $X$ is modeled by a gamma distribution with $\alpha = 2$, $\beta = 6$. Find $P(X < 7)$.
Solution:
$$P(X < 7) = \int_{0}^{7} \frac{x e^{-x/6}}{6^{2}\, \Gamma(2)}\, dx$$
$\Gamma(2) = 1$. Using integration by parts with $u = x$, $dv = e^{-x/6}\, dx$ we obtain:
$$P(X < 7) = \left[ -e^{-x/6}\left( \frac{x}{6} + 1 \right) \right]_{0}^{7} = 1 - \frac{13}{6} e^{-7/6} = 0.3253$$

Week 12– Sampling Distributions

A population consists of the totality of the observations with which we are concerned. A sample is a subset of the population.
Any sampling procedure that produces inferences that consistently overestimate or underestimate some characteristic of the population is said to be biased.
Any function of the random variables constituting a random sample is called a statistic.
The probability distribution of a statistic is called a sampling distribution.
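The idea of a sampling distribution can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean $\beta = 5$ (so its standard deviation is also 5), the sample size 30 and the 2000 repetitions are arbitrary, and the helper names are hypothetical.

```python
import random
from statistics import mean, pstdev

def exponential_draw(rng, beta=5.0):
    """One observation from an exponential population with mean beta."""
    return rng.expovariate(1.0 / beta)  # expovariate takes the rate 1/beta

def sample_means(draw, n, repetitions, seed=0):
    """Repeatedly take a random sample of size n and record its mean.
    The recorded means approximate the sampling distribution of the
    statistic 'sample mean'."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [mean(draw(rng) for _ in range(n)) for _ in range(repetitions)]

means = sample_means(exponential_draw, n=30, repetitions=2000)
print("average of the sample means:", mean(means))
print("spread of the sample means: ", pstdev(means))
print("sigma / sqrt(n):            ", 5.0 / 30 ** 0.5)
```

The sample means should cluster around the population mean with a spread close to $\sigma/\sqrt{n}$, much narrower than the population itself.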
Theorem: (Central Limit Theorem) If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^{2}$, then the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution $n(z; 0, 1)$.
In other words, for large $n$ the sampling distribution of $\bar{X}$ will be approximately normal even if the population distribution is not.

Exercise 12-1: An electrical firm manufactures light bulbs that have a lifetime with mean 800 hours and standard deviation 40 hours. Find the probability that a random sample of 16 bulbs will have an average lifetime of less than 775 hours.
Answer: 0.0062

Exercise 12-2: An auto part must have a diameter of 5 mm. We know that the population $\sigma = 0.1$ mm. We choose 100 parts randomly. The sample average is $\bar{x} = 5.027$ mm. Can we say the population mean is 5 mm?
Answer: 0.007

Exercise 12-3: The bus trip to a campus takes, on average, 28 minutes with a standard deviation of 5 minutes. In a week, the bus makes 40 trips. What is the probability that the weekly average is above 30 minutes? Assume the mean is measured to the nearest minute.
Answer: $P(z > 3.16) = 0.0008$

Theorem: If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^{2}$ and $\sigma_2^{2}$ respectively, then the sampling distribution of the difference of means $\bar{X}_1 - \bar{X}_2$ is approximately normal, with mean and variance given by $\mu_1 - \mu_2$ and $\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}$. In other words,
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}}$$
is approximately a standard normal variable.

Exercise 12-4: We test the strength of steel cables manufactured by companies A and B. The standard deviations of both are 5 and we test 30 cables from each. The results are:
$$\bar{x}_A = 49.5, \quad \bar{x}_B = 45.5, \quad \bar{x}_A - \bar{x}_B = 4$$
Company B claims the population means are the same. What is the probability of seeing this result if they are really the same?
Solution:
$$Z = \frac{4 - 0}{\sqrt{\dfrac{25}{30} + \dfrac{25}{30}}} = 3.10$$
$$P(Z > 3.10) = 1 - 0.999 = 0.001$$

Exercise 12-5: The televisions of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 years. Those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 years. We take a random sample of 36 from A and 49 from B. What is the probability that the sample from A will have a mean lifetime at least 1 year more than the sample from B?
Answer: $P(z > 2.65) = 0.0040$

Exercise 12-6: A certain machine makes electrical resistors having a mean resistance of 50 ohms and a standard deviation of 3 ohms. We choose a random sample of size $n$. What is the probability that the average resistance of the sample is less than 49.7 ohms, if the sample size is
a) $n = 10$?
b) $n = 50$?
c) $n = 250$?
Answer:
a) $P(z < -0.32) = 0.3745$
b) $P(z < -0.71) = 0.2389$
c) $P(z < -1.58) = 0.0571$

Exercise 12-7: We randomly choose 35 students from school A and 45 students from school B. (There are thousands of students in each school.) We give them a mathematics test and find that the sample averages are 55 and 60. The standard deviations are 18 and 17 respectively. What is the probability of seeing this result if the schools have the same average?
Solution:
$$Z = \frac{(60 - 55) - 0}{\sqrt{\dfrac{18^{2}}{35} + \dfrac{17^{2}}{45}}} = 1.26$$
$$P(Z > 1.26) = 1 - 0.8962 = 0.1038$$

Exercise 12-8: The average lifetime of an electronic component is 87.0 months, with a standard deviation of 9.0 months. Assume a normal distribution.
a) What is the probability that a single component will have a lifetime between 86.5 and 87.5 months?
b) What is the same probability for a sample average if the sample size is 100?
Solution:
a) $z_1 = \dfrac{86.5 - 87}{9} = -0.06$, $\quad z_2 = \dfrac{87.5 - 87}{9} = 0.06$
$$P(-0.06 < z < 0.06) = 0.5239 - 0.4761 = 0.0478$$
b) $z_3 = \dfrac{86.5 - 87}{9/\sqrt{100}} = -0.56$, $\quad z_4 = \dfrac{87.5 - 87}{9/\sqrt{100}} = 0.56$
$$P(-0.56 < z < 0.56) = 0.7123 - 0.2877 = 0.4246$$

Week 13– Confidence Intervals

Suppose we know the variance $\sigma^{2}$ of a population and we are trying to find the mean.
The sample mean is distributed normally around the population mean, so
$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$
where
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
and $z_{\alpha/2}$ denotes the $z$-value such that the area to the right is $\alpha/2$.

[Figure: standard normal curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail.]

We can rewrite this as:
$$P\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$$
This is called the $100(1 - \alpha)\%$ confidence interval for $\mu$.

Exercise 13-1: The average zinc concentration from a sample of 36 measurements is 2.6 grams per milliliter. Find the 95% and 99% confidence intervals for the mean zinc concentration in the river. Assume $\sigma = 0.3$.
Answer: $[2.50, 2.70]$, $[2.47, 2.73]$

Exercise 13-2: A population has $\sigma = 40$ and we are trying to determine the mean. How large a sample do we need if we want to be 95% sure that we are making an error of 15 or less?
Solution:
$$P(Z < k) = 0.975 \Rightarrow k = 1.96$$
$$1.96 \times \frac{40}{\sqrt{n}} \leq 15 \Rightarrow n \geq 28$$

Exercise 13-3: A random sample of 130 units has average 36 and standard deviation 0.7.
a) Find a 90% confidence interval.
b) How large a sample do we need if we want to be 90% sure that the sample mean is within 0.05 of the true mean?
Answer: $[35.90, 36.10]$, 531

Exercise 13-4: 200 high school students in a city are randomly chosen and given a mathematics test. The mean and standard deviation of the sample are 46 and 14.
a) Find a 99% confidence interval for the mean.
b) Find the necessary sample size if we want the 99% confidence interval to have size 1.

Exercise 13-5: A sample of apple juice is tested for arsenic content. The standard deviation is 1.8 ppb (parts per billion), the sample size is 94 and the sample average is 9 ppb. The distribution is normal.
a) Find an 80% confidence interval for the population average.
b) Find a sample size such that the 80% confidence interval will be half the size of what you found in part a).
c) Find a sample size such that the 99.5% confidence interval will be the same size as what you found in part a).
Solution: We will use Area $= 0.8/2 + 0.5 = 0.9 \Rightarrow z = 1.28$.
a) $\dfrac{\sigma}{\sqrt{n}}\, z = \dfrac{1.8}{\sqrt{94}} \times 1.28 = 0.2376$, therefore the confidence interval is:
$$[9 - 0.2376,\ 9 + 0.2376] = [8.7624,\ 9.2376]$$
b) We have to find $n$ such that
$$\frac{1.8}{\sqrt{n}} \times 1.28 = 0.2376/2 = 0.1188 \quad \Rightarrow \quad n = 376$$
Or, we can simply multiply 94 by 4: $94 \times 4 = 376$
c) Area $= 0.995/2 + 0.5 = 0.9975 \Rightarrow z = 2.81$.
$$\frac{1.8}{\sqrt{n}} \times 2.81 = 0.2376 \quad \Rightarrow \quad n = 453$$

Week 14– Prediction Intervals

Suppose we know the variance $\sigma^{2}$ of a population but do not know the mean $\mu$. We have a random sample of size $n$ with average $\bar{x}$. We want to predict the value of a single future observation $x_0$.
If we define a new variable $\bar{x} - x_0$, its variance will be
$$\sigma^{2} + \frac{\sigma^{2}}{n}$$
therefore the standard deviation is:
$$\sigma \sqrt{1 + \frac{1}{n}}$$
So the $100(1 - \alpha)\%$ prediction interval is:
$$\bar{X} - z_{\alpha/2}\, \sigma \sqrt{1 + \frac{1}{n}} < x_0 < \bar{X} + z_{\alpha/2}\, \sigma \sqrt{1 + \frac{1}{n}}$$

Exercise 14-1: Let the sample size be 100, the sample average 290, and the population standard deviation 32.
a) Construct a 99% confidence interval.
b) Construct a 99% prediction interval.
Answer: $[282, 298]$, $[207, 373]$

Exercise 14-2: The average weight gain for a sample of 40 mice is 5.6 grams. The population standard deviation is 1.3.
a) Construct a 90% confidence interval.
b) Construct a 90% prediction interval.
Answer: $[5.26, 5.94]$, $[3.43, 7.77]$
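The confidence and prediction interval formulas differ only in the factor multiplying $z_{\alpha/2}$, which the following sketch makes explicit. It assumes a known population standard deviation, as in the exercises, and uses Python's standard `statistics.NormalDist` in place of the printed table; the function names are illustrative. Because `inv_cdf` returns an unrounded $z$ value, results may differ from table-based answers in the last digit.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence):
    """z such that the area to the right is alpha/2, with alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(xbar, sigma, n, confidence):
    """Interval for the population mean when sigma is known."""
    half = z_value(confidence) * sigma / sqrt(n)
    return xbar - half, xbar + half

def prediction_interval(xbar, sigma, n, confidence):
    """Interval for a single future observation when sigma is known."""
    half = z_value(confidence) * sigma * sqrt(1 + 1 / n)
    return xbar - half, xbar + half

# Data of Exercise 14-1: n = 100, sample average 290, sigma = 32, 99% level.
print(confidence_interval(290, 32, 100, 0.99))
print(prediction_interval(290, 32, 100, 0.99))
```

The prediction interval is much wider because the variability of the single future observation, $\sigma^{2}$, does not shrink as $n$ grows.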