A Solution Manual For the Book: Probability and Statistics: For Engineering and the Sciences (7th Edition) by Jay L. Devore

John L. Weatherwax∗

October 18, 2005

Introduction

This solution manual was prepared for the seventh edition of Devore's textbook. I would suspect that other editions would be very similar.

Some Useful Formulas in Linear Regression

SST = Syy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n   (total sum of squares)   (1)
SSE = Σ(yi − ŷi)²   (error sum of squares)   (2)
SSR = SST − SSE   (regression sum of squares)   (3)
R² = SSR/SST = 1 − SSE/SST.   (4)

Note that R² is the percent of variance explained and can be calculated both in and out of sample (with coefficients estimated using the in-sample data).

∗ [email protected]

Probability Problem Solutions

Note all R scripts for this chapter (if they exist) are denoted as ex2_NN.R where NN is the exercise number.

Exercise 2.1

Part (a): The sample space S is the set of all possible outcomes. Thus, using the integer shorthand suggested in the problem for all of the possible outcomes, we have

S = {1324, 1342, 3124, 3142, 1423, 1432, 4123, 4132, 2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231},

from which we see that there are 16 elements.

Part (b): This would be all number strings that begin with a 1, or A = {1324, 1342, 1423, 1432}.

Part (c): This would be all number strings from S with a 2 in the first or second position, or B = {2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231}.

Part (d): This is the union of the two sets A and B, or

A ∪ B = {1324, 1342, 1423, 1432, 2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231}.

Now A ∩ B = ∅ and A′ = {3124, 3142, 4123, 4132, 2314, 2341, 3214, 3241, 2413, 2431, 4213, 4231}.

Exercise 2.2

Part (a): These would be A = {LLL, RRR, SSS}.

Part (b): This would be B = {LRS, LSR, RLS, RSL, SLR, SRL}.

Part (c): This would be C = {RRL, RRS, RLR, RSR, LRR, SRR}.

Part (d): This would be

D = {RRL, RRS, RLR, RSR, LRR, SRR, LLR, LLS, LRL, LSL, RLL, SLL, SSR, SSL, SLS, SRS, LSS, RSS}.
Part (e): We have D′ = {LLL, RRR, SSS, LRS, RSL, SLR, LSR, RLS, SRL}, that is, the event that all three cars go in the same direction or all three cars go in different directions. We have that C ∪ D = D and C ∩ D = C, as C is a subset of D.

Exercise 2.3

The total sample space for this problem would be

S = {(S, S, S), (S, S, F), (S, F, S), (F, S, S), (F, F, S), (F, S, F), (S, F, F), (F, F, F)}.

Part (a): This would be A = {(S, S, F), (S, F, S), (F, S, S)}.

Part (b): This would be B = {(S, S, S), (S, S, F), (S, F, S), (F, S, S)}.

Part (c): This would happen if component 1 functions and component 2 or 3 (or both) function. This event is C = {(S, S, S), (S, S, F), (S, F, S)}.

Part (d): We have

C′ = {(F, S, S), (F, F, S), (F, S, F), (S, F, F), (F, F, F)}
A ∪ C = {(S, S, S), (S, S, F), (S, F, S), (F, S, S)}
A ∩ C = {(S, S, F), (S, F, S)}
B ∪ C = B since C is a subset of B
B ∩ C = C since C is a subset of B.

Exercise 2.4

Part (a): We have

S = {FFFF, FFFV, FFVF, FVFF, VFFF, FFVV, FVFV, VFFV, FVVF, VFVF, VVFF, VVVF, VVFV, VFVV, FVVV, VVVV}.

Part (b): These would be {FFFV, FFVF, FVFF, VFFF}.

Part (c): These would be {FFFF, VVVV}.

Part (d): These would be {FFFF, FFFV, FFVF, FVFF, VFFF}.

Part (e): The union is {FFFF, FFFV, FFVF, FVFF, VFFF, VVVV}, while the intersection is {FFFF}.

Part (f): The union is {FFFF, FFFV, FFVF, FVFF, VFFF, VVVV}, while the intersection is the empty set ∅.

Exercise 2.5

Part (a): We could have 1, 2, or 3 for the first person's station assignment, 1, 2, or 3 for the second person's station assignment, and 1, 2, or 3 for the third person's assignment. Thus the sample space would be tuples of the form (i, j, k) where i, j, and k are taken from {1, 2, 3}.

Part (b): This would be the outcomes {(1, 1, 1), (2, 2, 2), (3, 3, 3)}.

Part (c): This would be the outcomes {(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)}.
Part (d): This could be obtained by enumerating all elements from S (as in Part (a)) and removing any elementary events that have a 2 in them.

Exercise 2.6

Part (a): Our sample space is S = {3, 4, 5, 13, 14, 15, 23, 24, 25, 123, 124, 125, 213, 214, 215}.

Part (b): This would be A = {3, 4, 5}.

Part (c): This would be B = {5, 15, 25, 125, 215}.

Part (d): This would be C = {3, 4, 5, 23, 24, 25}.

Exercise 2.8 (the language of sets)

Part (a): A1 ∪ A2 ∪ A3.

Part (b): A1 ∩ A2 ∩ A3.

Part (c): A1 ∩ (A2 ∪ A3)′.

Part (d): (A1 ∩ A2′ ∩ A3′) ∪ (A1′ ∩ A2 ∩ A3′) ∪ (A1′ ∩ A2′ ∩ A3).

Part (e): A1 ∪ (A2 ∩ A3).

Exercise 2.10

Part (a): Three events that are mutually exclusive are the type of car bought from the types

A = {Chevrolet, Pontiac, Buick}
B = {Ford, Mercury}
C = {Plymouth, Chrysler}.

Then A, B, and C are mutually exclusive.

Part (b): No. Consider the sets A, B = A, and C defined as in Part (a); that is, take B to be the same set as A. Then A ∩ B ∩ C = ∅, but A and B are equal and so cannot be mutually exclusive.

Exercise 2.11

Part (a): 0.07.

Part (b): 0.15 + 0.1 + 0.05 = 0.3.

Part (c): 1 − 0.18 − 0.25 = 1 − 0.43 = 0.57.

Exercise 2.12

Part (a): This is

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.5 + 0.4 − 0.25 = 0.65.   (5)

Part (b): This would be P((A ∪ B)′) = 1 − P(A ∪ B) = 1 − 0.65 = 0.35.

Part (c): This would be the event A ∩ B′. To compute its probability we recall that P(A) = P(A ∩ B) + P(A ∩ B′), or 0.5 = 0.25 + P(A ∩ B′), so P(A ∩ B′) = 0.25.

Exercise 2.13

Part (a): A1 ∪ A2 is the event that we are awarded project 1 or project 2. Its probability can be calculated as

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2) = 0.22 + 0.25 − 0.11 = 0.36.

Part (b): Since A1′ ∩ A2′ = (A1 ∪ A2)′, this event is the outcome that we get neither project 1 nor project 2. This probability is then given by 1 − P(A1 ∪ A2) = 0.64.

Part (c): The event A1 ∪ A2 ∪ A3 is the outcome that we get at least one of projects 1, 2, or 3.
Its probability is given by

P(A1 ∪ A2 ∪ A3) = P(A1) + P(A2) + P(A3) − P(A1 ∩ A2) − P(A1 ∩ A3) − P(A2 ∩ A3) + P(A1 ∩ A2 ∩ A3)
= 0.22 + 0.25 + 0.28 − 0.11 − 0.05 − 0.07 + 0.01 = 0.53.

Part (d): This event is that we don't get any of the three projects. Using the identity A1′ ∩ A2′ ∩ A3′ = (A1 ∪ A2 ∪ A3)′, its probability is given by

P(A1′ ∩ A2′ ∩ A3′) = 1 − P(A1 ∪ A2 ∪ A3) = 1 − 0.53 = 0.47.

Part (e): This is the event that we get neither project 1 nor project 2 but do get project 3. Its probability is given by using the fact that

P((A1′ ∩ A2′) ∩ A3) + P((A1′ ∩ A2′) ∩ A3′) = P(A1′ ∩ A2′),

or with what we know P(A1′ ∩ A2′ ∩ A3) + 0.47 = 0.64, so P(A1′ ∩ A2′ ∩ A3) = 0.17.

Part (f): This is the event that we either get neither of projects 1 and 2, or we get project 3. To find its probability we first notice that

(A1′ ∩ A2′) ∪ A3 = (A1 ∪ A2)′ ∪ A3 = ((A1 ∪ A2) ∩ A3′)′.

Thus if we can compute the probability of (A1 ∪ A2) ∩ A3′ we can compute the desired probability. To compute this probability note that

[(A1 ∪ A2) ∩ A3′] ∪ [(A1 ∪ A2) ∩ A3] = A1 ∪ A2,

and the two sets on the left-hand side are disjoint, so we have

P((A1 ∪ A2) ∩ A3′) + P((A1 ∪ A2) ∩ A3) = P(A1 ∪ A2).   (6)

From Part (a) we know the value of the right-hand side is 0.36. To compute P((A1 ∪ A2) ∩ A3) we note that by distributing the intersection over the union we have (A1 ∪ A2) ∩ A3 = (A1 ∩ A3) ∪ (A2 ∩ A3). We can now use Equation 5 to write the probability of the above event as

P((A1 ∪ A2) ∩ A3) = P(A1 ∩ A3) + P(A2 ∩ A3) − P([A1 ∩ A3] ∩ [A2 ∩ A3])
= 0.05 + 0.07 − P(A1 ∩ A2 ∩ A3) = 0.05 + 0.07 − 0.01 = 0.11.

Using this in Equation 6 we have P((A1 ∪ A2) ∩ A3′) + 0.11 = 0.36, so P((A1 ∪ A2) ∩ A3′) = 0.25. Finally, with this we have the desired probability of

P((A1′ ∩ A2′) ∪ A3) = 1 − P((A1 ∪ A2) ∩ A3′) = 1 − 0.25 = 0.75.

A more direct method: with E = A1′ ∩ A2′ and F = A3 we have P(E ∪ F) = P(E) + P(F) − P(E ∩ F) = 0.64 + 0.28 − 0.17 = 0.75, using the results of Parts (b) and (e).
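The inclusion-exclusion bookkeeping in Exercise 2.13 is easy to check numerically. The following sketch (not from the book; written in the spirit of the chapter's ex2_NN scripts, but in Python) recomputes each part directly from the probabilities given in the problem statement.

```python
# Given probabilities from Exercise 2.13.
pA1, pA2, pA3 = 0.22, 0.25, 0.28
pA1A2, pA1A3, pA2A3 = 0.11, 0.05, 0.07
pA1A2A3 = 0.01

# Part (a): P(A1 ∪ A2) by two-event inclusion-exclusion.
p_union12 = pA1 + pA2 - pA1A2                      # ≈ 0.36

# Part (c): P(A1 ∪ A2 ∪ A3) by three-event inclusion-exclusion.
p_union123 = (pA1 + pA2 + pA3
              - pA1A2 - pA1A3 - pA2A3
              + pA1A2A3)                           # ≈ 0.53

# Part (d): no projects is the complement of Part (c).
p_none = 1 - p_union123                            # ≈ 0.47

# Part (e): P(A1' ∩ A2' ∩ A3) = P(A1' ∩ A2') − P(A1' ∩ A2' ∩ A3').
p_e = (1 - p_union12) - p_none                     # ≈ 0.17

# Part (f): P((A1' ∩ A2') ∪ A3) by two-event inclusion-exclusion.
p_f = (1 - p_union12) + pA3 - p_e                  # ≈ 0.75

print(round(p_union123, 2), round(p_none, 2), round(p_e, 2), round(p_f, 2))
```

The last two lines use the same direct inclusion-exclusion shortcut noted at the end of Part (f).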
Exercise 2.14

Part (a): Using P(A ∪ B) = P(A) + P(B) − P(A ∩ B) we have 0.9 = 0.8 + 0.7 − P(A ∩ B), or P(A ∩ B) = 0.6.

Part (b): This would be the event (A ∩ B′) ∪ (A′ ∩ B). Since these two events are disjoint, its probability is given by P(A ∩ B′) + P(A′ ∩ B). Let's compute each one. Using A = (A ∩ B′) ∪ (A ∩ B) we get P(A) = P(A ∩ B′) + P(A ∩ B), so with what we know 0.8 = P(A ∩ B′) + 0.6. Thus we get that P(A ∩ B′) = 0.2. Using the same method we compute that P(A′ ∩ B) = 0.1. Thus the probability we want is given by 0.2 + 0.1 = 0.3.

Exercise 2.15

Let G stand for a gas dryer and E stand for an electric dryer.

Part (a): We are told that P({GGGGG, EGGGG, GEGGG, GGEGG, GGGEG, GGGGE}) = 0.428, and the event we want is the complement of the above event and thus has a probability given by 1 − 0.428 = 0.572.

Part (b): This would be 1 − P({GGGGG}) − P({EEEEE}) = 1 − 0.116 − 0.005 = 0.879.

Exercise 2.16

Part (a): The set would be {CDP, CPD, DCP, DPC, PCD, PDC}, and each would get a probability of 1/6.

Part (b): This happens in two of the six samples so our probability is 2/6 = 1/3.

Part (c): This happens in only one sample so our probability is 1/6.

Exercise 2.17

Part (a): There could be other statistical software besides SPSS and SAS.

Part (b): P(A′) = 1 − P(A) = 0.7.

Part (c): We have P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.3 + 0.5 − 0 = 0.8, since P(A ∩ B) = 0 as there are no events in the set A ∩ B.

Part (d): We have P(A′ ∩ B′) = P((A ∪ B)′) = 1 − P(A ∪ B) = 1 − 0.8 = 0.2.

Exercise 2.18

This event will happen if we don't select a bulb rated 75 W on the first draw. That we select a bulb rated 75 W on the first draw will happen with probability 6/(4 + 5 + 6) = 6/15 = 2/5. The probability that we require at least two draws is then 1 − 2/5 = 3/5.

Exercise 2.19

Let A be the event that inspector A found a defect and similarly for B. Then in the problem statement we are told that

P(A) = 724/10000 = 0.0724
P(B) = 751/10000 = 0.0751
P(A ∪ B) = 1159/10000 = 0.1159.

Part (a): P((A ∪ B)′) = 1 − P(A ∪ B) = 0.8841.

Part (b): We need to compute P(B ∩ A′). To do this note that B = (B ∩ A′) ∪ (B ∩ A) and thus P(B) = P(B ∩ A′) + P(B ∩ A). To use this we need to compute P(B ∩ A). We can get that from

P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.0724 + 0.0751 − 0.1159 = 0.0316.

Using this we have 0.0751 = P(B ∩ A′) + 0.0316, so P(B ∩ A′) = 0.0435.

Exercise 2.20

Part (a): The simple events for this problem are tuples containing the shift and whether or not the conditions of the accident were "unsafe" or "unrelated". Thus we would have

S = {(Day, Unsafe), (Swing, Unsafe), (Night, Unsafe), (Day, Unrelated), (Swing, Unrelated), · · · }.

Part (b): This would be 0.1 + 0.08 + 0.05 = 0.23.

Part (c): We have P(Day) = 0.1 + 0.35 = 0.45, so P(Day′) = 1 − P(Day) = 0.55.

Exercise 2.21

Part (a): This would be 0.1.

Part (b): This would be

P(Low Auto) = 0.04 + 0.06 + 0.05 + 0.03 = 0.18
P(Low Homeowners) = 0.06 + 0.10 + 0.03 = 0.19.

Part (c): This would be

P((Low Auto, Low Home)) + P((Medium Auto, Medium Home)) + P((High Auto, High Home)) = 0.06 + 0.2 + 0.15 = 0.41.

Part (d): This is 1 − 0.41 = 0.59.

Part (e): This is 0.04 + 0.06 + 0.05 + 0.03 + 0.1 + 0.03 = 0.31.

Part (f): This is 1 − 0.31 = 0.69.

Exercise 2.22 (stopping at traffic lights)

Part (a): This is P(A ∩ B), which we can evaluate using

P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.4 + 0.5 − 0.6 = 0.3.

Part (b): This is P(A ∩ B′). To compute this recall that P(A) = P(A ∩ B′) + P(A ∩ B), so 0.4 = P(A ∩ B′) + 0.3. Thus P(A ∩ B′) = 0.1.

Part (c): This is the probability of the event (A ∩ B′) ∪ (A′ ∩ B), or P(A ∩ B′) + P(A′ ∩ B). We know P(A ∩ B′) from Part (b). To compute P(A′ ∩ B) recall that P(B) = P(B ∩ A′) + P(B ∩ A), or 0.5 = P(B ∩ A′) + 0.3, so P(B ∩ A′) = 0.2. Thus

P(A ∩ B′) + P(A′ ∩ B) = 0.1 + 0.2 = 0.3.
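Exercises 2.14, 2.19, and 2.22 all lean on the same two identities: inclusion-exclusion for P(A ∩ B), and the decomposition P(A) = P(A ∩ B) + P(A ∩ B′). A small helper (my own sketch, not from the book) makes the shared pattern explicit:

```python
def event_probs(p_a, p_b, p_a_or_b):
    """Given P(A), P(B), P(A ∪ B), return P(A ∩ B), P(A ∩ B'), P(exactly one)."""
    p_a_and_b = p_a + p_b - p_a_or_b      # inclusion-exclusion
    p_a_only = p_a - p_a_and_b            # from P(A) = P(A ∩ B) + P(A ∩ B')
    p_b_only = p_b - p_a_and_b            # from P(B) = P(A ∩ B) + P(A' ∩ B)
    return p_a_and_b, p_a_only, p_a_only + p_b_only

# Exercise 2.14: P(A) = 0.8, P(B) = 0.7, P(A ∪ B) = 0.9
print(event_probs(0.8, 0.7, 0.9))   # ≈ (0.6, 0.2, 0.3)
# Exercise 2.22: P(A) = 0.4, P(B) = 0.5, P(A ∪ B) = 0.6
print(event_probs(0.4, 0.5, 0.6))   # ≈ (0.3, 0.1, 0.3)
```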
Exercise 2.23

Part (a): This would happen in one way out of the C(6, 2) = 15 possible selections (where C(n, k) denotes the binomial coefficient "n choose k"), for a probability of 1/15.

Part (b): This would happen with probability C(4, 2)/C(6, 2) = 6/15 = 2/5.

Part (c): This is the complement of the event that both selected computers are laptops, so this gives a probability of 1 − 1/15 = 14/15.

Part (d): This is 1 − 1/15 − 6/15 = 8/15.

Exercise 2.24

Since B = A ∪ (B ∩ A′) when A ⊂ B, and the events A and B ∩ A′ are disjoint, we have P(B) = P(A) + P(B ∩ A′). As P(B ∩ A′) ≥ 0 we have that P(A) ≤ P(B). Since for general events A and B we have (A ∩ B) ⊂ A ⊂ (A ∪ B), applying the above result twice we have P(A ∩ B) ≤ P(A) ≤ P(A ∪ B).

Exercise 2.25

From the problem statement we are told that

P(A) = 0.7, P(B) = 0.8, P(C) = 0.75
P(A ∪ B) = 0.85, P(A ∪ C) = 0.9, P(B ∪ C) = 0.95
P(A ∪ B ∪ C) = 0.98.

Part (a): This is P(A ∪ B ∪ C) = 0.98.

Part (b): This is 1 − P(A ∪ B ∪ C) = 0.02.

Part (c): We want to evaluate P(A ∩ B′ ∩ C′). Drawing a Venn diagram with three sets we get the following mutually exclusive sets: A ∩ B ∩ C′, A ∩ B′ ∩ C, A′ ∩ B ∩ C, and A ∩ B ∩ C. To evaluate all of these we first compute P(A ∩ B), P(A ∩ C), and P(B ∩ C) using

P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.7 + 0.8 − 0.85 = 0.65.

In the same way we find

P(A ∩ C) = 0.7 + 0.75 − 0.9 = 0.55
P(B ∩ C) = 0.8 + 0.75 − 0.95 = 0.6.

Now using what we know in

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C),

we have 0.98 = 0.7 + 0.8 + 0.75 − 0.65 − 0.55 − 0.6 + P(A ∩ B ∩ C), and find P(A ∩ B ∩ C) = 0.53. Let's now compute P(A ∩ B ∩ C′) using the Venn diagram. We have

P(A ∩ B ∩ C′) = P(A ∩ B) − P(A ∩ B ∩ C) = 0.65 − 0.53 = 0.12.

In the same way we have

P(A ∩ B′ ∩ C) = P(A ∩ C) − P(A ∩ B ∩ C) = 0.55 − 0.53 = 0.02
P(A′ ∩ B ∩ C) = P(B ∩ C) − P(A ∩ B ∩ C) = 0.6 − 0.53 = 0.07.

Using these computed probabilities we get

P(A ∩ B′ ∩ C′) = P(A) − P(A ∩ B′ ∩ C) − P(A ∩ B ∩ C′) − P(A ∩ B ∩ C) = 0.7 − 0.02 − 0.12 − 0.53 = 0.03.
Part (d): This would be the probability of the event

(A ∩ B′ ∩ C′) ∪ (A′ ∩ B ∩ C′) ∪ (A′ ∩ B′ ∩ C).

Notice that this is the union of disjoint sets and we have computed P(A ∩ B′ ∩ C′) in Part (c). Following the steps of Part (c) (for the other two sets) we find

P(A′ ∩ B ∩ C′) = P(B) − P(A ∩ B ∩ C′) − P(A′ ∩ B ∩ C) − P(A ∩ B ∩ C) = 0.8 − 0.12 − 0.07 − 0.53 = 0.08
P(A′ ∩ B′ ∩ C) = P(C) − P(A ∩ B′ ∩ C) − P(A′ ∩ B ∩ C) − P(A ∩ B ∩ C) = 0.75 − 0.02 − 0.07 − 0.53 = 0.13.

Given these numbers we have that

P(A ∩ B′ ∩ C′) + P(A′ ∩ B ∩ C′) + P(A′ ∩ B′ ∩ C) = 0.03 + 0.08 + 0.13 = 0.24.

Exercise 2.26

Part (a): P(A1′) = 1 − P(A1) = 0.78.

Part (b): P(A1 ∩ A2) = P(A1) + P(A2) − P(A1 ∪ A2) = 0.12 + 0.07 − 0.13 = 0.06.

Part (c): Using the identity P(A1 ∩ A2) = P(A1 ∩ A2 ∩ A3) + P(A1 ∩ A2 ∩ A3′), or 0.06 = 0.01 + P(A1 ∩ A2 ∩ A3′), we get P(A1 ∩ A2 ∩ A3′) = 0.05.

Part (d): This is 1 − P(have three defects) = 1 − P(A1 ∩ A2 ∩ A3) = 1 − 0.01 = 0.99.

Exercise 2.27

Part (a): This is 1/C(5, 2) = 1/10.

Part (b): This would be

[C(2, 1)·C(3, 1) + C(2, 2)]/C(5, 2) = (6 + 1)/10 = 7/10.

We can also evaluate this probability as

1 − P(no members have a name starting with C) = 1 − C(3, 2)·C(2, 0)/C(5, 2) = 1 − 3/10 = 7/10.

Part (c): For this event we can select from the pairs {(1, 5), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)}. Thus we will have a probability for this event of 6/10 = 3/5.

Exercise 2.28

Part (a): This will happen if we have one of the elementary events (1, 1, 1), (2, 2, 2), and (3, 3, 3), and thus this event will happen with a probability of 3/27 = 1/9.

Part (b): This is the complement of the event that all family members are assigned the same section. We calculated this probability in Part (a) above. Thus the probability we want is then 1 − 1/9 = 8/9.
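The counting in Exercise 2.28 is small enough to confirm by brute force. The following sketch (not from the book) enumerates all 27 possible assignments of three family members to three sections and checks Parts (a) and (b):

```python
from itertools import product

# All ways to assign each of the 3 family members to one of 3 sections.
outcomes = list(product([1, 2, 3], repeat=3))
assert len(outcomes) == 27

# Part (a): all three members land in the same section.
n_all_same = sum(1 for o in outcomes if len(set(o)) == 1)
p_all_same = n_all_same / len(outcomes)      # 3/27 = 1/9

# Part (b): the complement of Part (a).
p_not_all_same = 1 - p_all_same              # 8/9

print(p_all_same, p_not_all_same)
```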
Part (c): This event is represented by the following set of elementary events

{(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)}.

Thus this event will happen with a probability of 6/27 = 2/9.

Exercise 2.29

Part (a): We have 26 choices for the first letter A-Z and then 26 choices for the second letter, making 26² = 676 total choices. If we allow digits we must add another 10 characters (the digits 0-9), giving 36 total choices for the characters in the first and second locations and giving 36² = 1296 total choices.

Part (b): These would be 26³ = 17576 and 36³ = 46656.

Part (c): These would be 26⁴ = 456976 and 36⁴ = 1679616.

Part (d): This would be 1 − 97786/1679616.

Exercise 2.30

Part (a): This would be 8(7)(6) = 336.

Part (b): This would be C(30, 6) = 593775.

Part (c): This would be C(8, 2)·C(10, 2)·C(12, 2).

Part (d): This would be the number from Part (c) over the number from Part (b).

Part (e): This would be

[C(8, 6) + C(10, 6) + C(12, 6)]/593775.

Exercise 2.31

Part (a): This would be 9(27) = 243.

Part (b): This would be 9(27)(15) = 3645. If we divide by 365 we get the number of years, which is 9.986, or about 10 years.

Exercise 2.32

Part (a): This would be 5(4)(3)(4) = 240.

Part (b): This would be 1(1)(3)(4) = 12.

Part (c): This would be 4(3)(3)(3) = 108.

Part (d): The number of systems with at least one Sony component is equal to the total number of systems minus the number of ways to select components without a Sony component, or 240 − 108 = 132.

Part (e): The probability that we have at least one Sony component is 132/240 = 0.55. The probability that we have exactly one Sony component is given by

[1(3)(3)(3) + 4(1)(3)(3) + 4(3)(3)(1)]/240 = 99/240 = 0.4125.

Exercise 2.33

Warning: The solutions to Part (a) and (b) do not match the ones given in the back of the book. If anyone sees what I did that is incorrect please contact me.

Part (a): This would be C(15, 9)·9! = 1816214400, since we can pick the 9 players to be on the field in C(15, 9) ways and then order them (select the pitcher, the first, second, and third baseman, etc.) in 9! ways.

Part (b): This would be the number in Part (a) above multiplied by 9!, the number of ways we can specify the batting order. This would give 6.590679 × 10¹⁴.

Part (c): This would be C(5, 3)·C(10, 6) = 2100.

Exercise 2.34

Part (a): This would be C(25, 5) = 53130.

Part (b): This would be C(5, 4) = 5.

Part (c): This would be 5/53130 = 9.41 × 10⁻⁵.

Part (d): This would be

[C(5, 4) + C(5, 5)]/53130 = 6/53130 = 0.0001129.

Exercise 2.35

Part (a): We have C(20, 6) = 38760 selections of six workers all coming from the day shift. The probability that all 6 selected workers will be from the day shift in our sample is

C(20, 6)/C(45, 6) = 0.004758.

Part (b): This would be

[C(20, 6) + C(15, 6) + C(10, 6)]/C(45, 6) = 0.005398.

Part (c): This is 1 − P(all workers come from the same shift) = 0.9946.

Part (d): We want to compute the probability that at least one of the shifts will be unrepresented in the sample of workers. This is the union of the events that a given shift (day, swing, or graveyard) is unrepresented in the sample:

{at least one unrepresented shift} = {the day shift is unrepresented}
∪ {the swing shift is unrepresented}
∪ {the graveyard shift is unrepresented}.

Note that these events are not mutually exclusive, so in the counts below we must be careful to avoid double counting. We can count the number of samples in the first event on the right-hand side of the above as

C(15, 0)C(10, 6) + C(15, 1)C(10, 5) + · · · + C(15, 5)C(10, 1) + C(15, 6)C(10, 0) = Σ_{k=0}^{6} C(15, k)C(10, 6 − k) = C(25, 6).

Note that the first term above, i.e. C(15, 0)C(10, 6), has two shifts unrepresented. The number of samples in the second event (excluding samples already counted in the first event) can be computed as

C(20, 1)C(10, 5) + C(20, 2)C(10, 4) + · · · + C(20, 5)C(10, 1) + C(20, 6)C(10, 0) = Σ_{k=1}^{6} C(20, k)C(10, 6 − k).

Finally, the number of samples in the third event (again excluding samples already counted) can be computed as

C(20, 1)C(15, 5) + C(20, 2)C(15, 4) + · · · + C(20, 5)C(15, 1) = Σ_{k=1}^{5} C(20, k)C(15, 6 − k).

Note that the above summation does not include the terms corresponding to k = 0 or k = 6. We have been very careful in the above to avoid double counting the number of samples. When we add these all up we get 2350060. This is to be divided by the number of ways to select six members from the 45 total, which is C(45, 6). This gives a probability of 0.2885258.

Exercise 2.36

See the Python code ex2_36.py. When we run that code we get the output

All possible orderings of the votes= ['AAABB', 'AABAB', 'AABBA', 'ABAAB', 'ABABA', 'ABBAA', 'BAAAB', 'BAABA', 'BABAA', 'BBAAA']
Orderings where A leads (or equals) B= ['AAABB', 'AABAB', 'AABBA', 'ABAAB', 'ABABA']
Probability that A always leads (or equals) B= 0.5

Exercise 2.37

Part (a): There are 3(4)(5) = 60 possible experiments.

Part (b): There are 1(2)(5) = 10 possible experiments.

Part (c): Note that we have 60 total experiments and thus 60! ways of ordering these experiments. We need to count the number of ways the first five experiments can have one of each of the five catalysts. Imagine the experiment that uses the first type of catalyst. Once that is fixed we have 3(4) = 12 possible choices for the temperature and pressure values that would go with this catalyst. Thus for each of the five catalysts we have 12 choices for the other two variables. In total we have 12⁵ choices for the two other variables over all five experiments. We have 5! ways of ordering the five different catalysts, giving a total of 5!·12⁵ ways to order the first five experiments. Following these five experiments we have (60 − 5)! ways to order the remaining experiments. This gives a probability of

5!·12⁵·(60 − 5)!/60! = 5!·12⁵/(60 · 59 · 58 · 57 · 56) = 0.04556.

Exercise 2.38

Part (a): This is

C(6, 2)·[C(4, 1) + C(5, 1)]/C(15, 3) = 15(9)/455 = 0.2967.

Part (b): This is

[C(4, 3) + C(5, 3) + C(6, 3)]/C(15, 3) = (4 + 10 + 20)/455 = 0.074725.

Part (c): This is

C(4, 1)·C(5, 1)·C(6, 1)/C(15, 3) = 120/455 = 0.2637.

Part (d): From the problem statement there are six 75 W bulbs and nine bulbs of other wattages.
Let Si be the event that we select the first 75 W bulb on the ith draw. We have

P(S1) = 6/15
P(S2|S1′) = 6/14
P(S3|S1′S2′) = 6/13
P(S4|S1′S2′S3′) = 6/12
P(S5|S1′S2′S3′S4′) = 6/11.

Then the probability P that it is necessary to examine at least 6 bulbs is given by

P = 1 − P(S1) − P(S1′S2) − P(S1′S2′S3) − P(S1′S2′S3′S4) − P(S1′S2′S3′S4′S5)
= 1 − P(S1) − P(S2|S1′)P(S1′) − P(S3|S1′S2′)P(S1′S2′) − P(S4|S1′S2′S3′)P(S1′S2′S3′) − P(S5|S1′S2′S3′S4′)P(S1′S2′S3′S4′)
= 1 − P(S1) − P(S2|S1′)P(S1′) − P(S3|S1′S2′)P(S2′|S1′)P(S1′) − P(S4|S1′S2′S3′)P(S3′|S1′S2′)P(S2′|S1′)P(S1′) − P(S5|S1′S2′S3′S4′)P(S4′|S1′S2′S3′)P(S3′|S1′S2′)P(S2′|S1′)P(S1′)
= 0.04195804,

when we use the numbers above.

Exercise 2.39

Part (a): First pick from the first ten spots the five spots where we will put the cordless phones. These five spots can be picked in C(10, 5) ways. We can order these five cordless phones in 5! ways. The other 10 phones can be placed in 10! ways. In total we have C(10, 5)·5!·(10!) ways in which the five cordless phones can be placed in the first ten spots. There are 15! ways to order all the phones. Thus the probability is given by

C(10, 5)·5!·(10!)/15! = 0.0839.

Part (b): For this part of the exercise we want the probability that after we service ten phones we will have serviced all phones of one type. This means that in the first ten phones there must be all five phones of one given type. To compute this probability note that we have three choices of the type of phone which will have all of its repairs done in the first ten. Once we specify that phone type we have C(10, 5) sets of locations in which we can place these five phones and 5! ways in which to order them. This gives 3·C(10, 5)·5! ways to place the type of five phones that will get fully serviced. We now have to place the remaining phones. We have 10!
ways to place these phones, but some of these ways will give only a single phone type for the last five spots. Thus removing the 5! permutations for each of the two other phone types finally gives us the probability

3·C(10, 5)·5!·(10! − 2(5!))/15! = 0.2517316.

Warning: This is not the same answer as in the back of the book. If anyone sees an error in what I have done please contact me.

Part (c): To have two phones of each type in the first six serviced means that we must have two cordless, two corded, and two cellular phones. Now C(5, 2)³ = 10³ is the number of ways to pick the two phones from each phone type that will go in the first six spots. We can place the two cordless phones in C(6, 2)·2! ways, then once these are placed we can place the two corded phones in C(4, 2)·2! ways, and finally the two cellular phones can be placed in 2! ways. With the phones in the first six locations determined we have to place the other 15 − 6 = 9 phones, which can be done in 9! ways. This gives the probability

10³·C(6, 2)·2!·C(4, 2)·2!·(2!)·9!/15! = 0.19980.

Exercise 2.40

Part (a): Since 3 + 3 + 3 + 3 = 12 we have

12!/(3!)⁴

chain molecules. First we assume that each of the molecules is distinguishable to get 12!, and then divide by the number of orderings (3!) of each of the A, B, C, and D type molecules.

Part (b): We would have

(4 · 3 · 2 · 1)/[12!/(3!)⁴] = 4!(3!)⁴/12!.

The denominator above is the number of chain molecules and the numerator is the number of ways of picking the ordering of the four molecule types. We have four choices for the first molecule type, three for the second, etc.

Exercise 2.41

Part (a): This is

1 − P(no female assistant is selected) = 1 − C(4, 3)/C(8, 3) = 1 − 4/56 = 0.92857.

Part (b): This probability is

C(4, 4)·C(4, 1)/C(8, 5) = 4/56 = 0.0714.

Part (c): This would be

1 − P(orderings are the same between semesters) = 1 − 1/8! = 0.9999752.

Exercise 2.42

The probability that Jim and Paula sit at the two seats on the far left is

2(4!)/6! = 1/15 = 0.0667,

since there are two permutations of Jim and Paula (where they are sitting together in the two seats on the far left) and then 4! orderings of the other people.

For Jim and Paula to sit next to each other, as a couple they can be at the positions (1, 2), (2, 3), (3, 4), (4, 5), and (5, 6), each of which has the same probability (as calculated above), giving for the probability that Jim and Paula sit next to each other 5(1/15) = 1/3. Another method to get the same probability is to consider Jim and Paula (together) as one unit. Then the number of orderings of this group is 2(5!), since we have five items (Jim and Paula together and the other four people) and two orderings of Jim and Paula where they are sitting together. This gives a probability of 2(5!)/6! = 1/3, the same as before.

We now want the probability that at least one wife ends up sitting next to her husband. Let Ci be the event that couple i (for i = 1, 2, 3) sits next to each other. Then we want to evaluate P(C1 ∪ C2 ∪ C3), or

P(C1) + P(C2) + P(C3) − P(C1 ∩ C2) − P(C1 ∩ C3) − P(C2 ∩ C3) + P(C1 ∩ C2 ∩ C3).

By symmetry this is equal to 3P(C1) − 3P(C1 ∩ C2) + P(C1 ∩ C2 ∩ C3). We computed P(C1) above. Now P(C1 ∩ C2) can be computed by considering two "fused" couples as items along with the two other people. There are 4! ways to order these items and two orderings of each couple within its fused group. This gives

P(C1 ∩ C2) = 4!·2²/6! = 2/15.

In the same way we can evaluate P(C1 ∩ C2 ∩ C3), since it is three fused couples, and we get

P(C1 ∩ C2 ∩ C3) = 3!·2³/6! = 1/15.

Thus we compute

P(C1 ∪ C2 ∪ C3) = 1 − 3(2/15) + 1/15 = 2/3.

Exercise 2.43

We have four ways to pick the 10 to use in the hand, four ways to pick the nine to use, etc. We have C(52, 5) ways to draw five cards. Thus the probability of a straight with ten as the high card is

4⁵/C(52, 5) = 0.000394.

To be a straight we can have five, six, seven, eight, nine, ten, jack, queen, king, or ace be the high card.
This is ten possible cards. The probability of a straight with any one of these as the high card is the same as we just calculated. Thus the probability of a straight is given by 10(0.000394) = 0.00394.

The probability that we have a straight flush, where all cards are of the same suit and 10 is the high card, is 4/C(52, 5) = 1.539 × 10⁻⁶. To have any possible high card we multiply this by 10 as before to get 1.539 × 10⁻⁵.

Exercise 2.44

Recall that C(n, k) is the number of ways to draw a set of size k from n items. Once this set is drawn, what remains is a set of size n − k. Thus for every set of size k we have a set of size n − k; this fact gives the equivalence C(n, k) = C(n, n − k).

Notes on Example 2.26

The book provides probabilities for the various intersections P(A ∩ B), P(A ∩ C), P(B ∩ C), and P(A ∩ B ∩ C) in the table given for this example, but does not explain how it calculated the probabilities of the three-way intersections, i.e. P(A ∩ B ∩ C′), P(A ∩ B′ ∩ C), and P(A′ ∩ B ∩ C). We can do this by setting up equations for the three-way intersections in terms of the two-way intersections (by reading from the Venn diagram) as follows

P(A ∩ B ∩ C′) + P(A ∩ B ∩ C) = P(A ∩ B)
P(A ∩ B′ ∩ C) + P(A ∩ B ∩ C) = P(A ∩ C)
P(A′ ∩ B ∩ C) + P(A ∩ B ∩ C) = P(B ∩ C).

This allows us to solve for the three-way intersections.

Exercise 2.45

Part (a): From the given table we have

P(A) = 0.106 + 0.141 + 0.2 = 0.447
P(C) = 0.215 + 0.2 + 0.065 + 0.02 = 0.5
P(A ∩ C) = 0.2.

Part (b): We find

P(A|C) = P(A ∩ C)/P(C) = 0.2/0.5 = 2/5,

which is the probability of blood type A given that we are from the third ethnic group, and

P(C|A) = P(A ∩ C)/P(A) = 0.2/0.447 = 0.4474273,

which is the probability we are from the third ethnic group given we have blood type A.

Part (c): Let G1 be the event that the individual is from ethnic group one. Then we want to evaluate

P(G1|B′) = P(G1 ∩ B′)/P(B′).

We need to compute the various parts of the above expression.
First we have P(B) = 0.008 + 0.018 + 0.065 = 0.091, so P(B′) = 1 − P(B) = 0.909. Next, as the blood types are mutually exclusive, we have

G1 ∩ B′ = G1 ∩ (O ∪ A ∪ AB) = (G1 ∩ O) ∪ (G1 ∩ A) ∪ (G1 ∩ AB),

so

P(G1 ∩ B′) = P(G1 ∩ O) + P(G1 ∩ A) + P(G1 ∩ AB) = 0.082 + 0.106 + 0.04 = 0.228.

Thus we get

P(G1|B′) = 0.228/0.909 = 0.2508251.

Warning: The answer to this problem does not match that in the back of the book. If anyone sees anything wrong with what I have done please contact me.

Exercise 2.46

From the description of the events given in the book, P(A|B) is the probability a person is over six feet tall given they are a professional basketball player, and P(B|A) is the probability a person is a professional basketball player given they are over six feet tall. We would expect P(A|B) > P(B|A).

Exercise 2.47

From Exercise 2.12 we were told that P(A) = 0.5, P(B) = 0.4, and P(A ∩ B) = 0.25.

Part (a): P(B|A) is the probability we have a MasterCard given we have a Visa card and is given by

P(A ∩ B)/P(A) = 0.25/0.5 = 1/2.

Part (b): P(B′|A) is the probability we don't have a MasterCard given we have a Visa card and is given by P(A ∩ B′)/P(A). To compute P(A ∩ B′) write A as A = (A ∩ B) ∪ (A ∩ B′), so P(A) = P(A ∩ B) + P(A ∩ B′). Using this we have

P(A ∩ B′) = P(A) − P(A ∩ B) = 0.5 − 0.25 = 0.25.

Thus P(B′|A) = 0.25/0.5 = 1/2. Note that this is also equal to 1 − P(B|A), as it should be.

Part (c): P(A|B) is the probability we have a Visa card given we have a MasterCard and is given by

P(A ∩ B)/P(B) = 0.25/0.4 = 0.625.   (7)

Part (d): P(A′|B) is the probability we don't have a Visa card given we have a MasterCard and is given by 1 − P(A|B) = 1 − 0.625 = 0.375.

Part (e): For this part we want to evaluate

P(A|A ∪ B) = P(A ∩ (A ∪ B))/P(A ∪ B) = P((A ∩ A) ∪ (A ∩ B))/P(A ∪ B) = P(A ∪ (A ∩ B))/P(A ∪ B) = P(A)/P(A ∪ B) = 0.5/(0.5 + 0.4 − 0.25) = 0.7692308.

Exercise 2.48

Part (a): This would be

P(A2|A1) = P(A1 ∩ A2)/P(A1) = [P(A1) + P(A2) − P(A1 ∪ A2)]/P(A1) = (0.12 + 0.07 − 0.13)/0.12 = 0.5.

Part (b): This would be

P(A1 ∩ A2 ∩ A3|A1) = P(A1 ∩ A2 ∩ A3)/P(A1) = 0.01/0.12 = 0.0833.

Part (c): Denote the probability we want to calculate by P. Then P is given by

P = P{(A1 ∩ A2′ ∩ A3′) ∪ (A1′ ∩ A2 ∩ A3′) ∪ (A1′ ∩ A2′ ∩ A3) | A1 ∪ A2 ∪ A3}
= P{[(A1 ∩ A2′ ∩ A3′) ∪ (A1′ ∩ A2 ∩ A3′) ∪ (A1′ ∩ A2′ ∩ A3)] ∩ [A1 ∪ A2 ∪ A3]}/P(A1 ∪ A2 ∪ A3).

The numerator of the above fraction is given by

P((A1 ∩ A2′ ∩ A3′) ∩ (A1 ∪ A2 ∪ A3)) + P((A1′ ∩ A2 ∩ A3′) ∩ (A1 ∪ A2 ∪ A3)) + P((A1′ ∩ A2′ ∩ A3) ∩ (A1 ∪ A2 ∪ A3)),

or

P(A1 ∩ A2′ ∩ A3′) + P(A1′ ∩ A2 ∩ A3′) + P(A1′ ∩ A2′ ∩ A3).

We now need to compute each of the above probabilities. Given the assumptions of the problem we can derive all of the intersections we might need:

P(A1 ∩ A2) = P(A1) + P(A2) − P(A1 ∪ A2) = 0.12 + 0.07 − 0.13 = 0.06
P(A1 ∩ A3) = 0.12 + 0.05 − 0.14 = 0.03
P(A2 ∩ A3) = 0.07 + 0.05 − 0.1 = 0.02
P(A1 ∩ A2′) = P(A1) − P(A1 ∩ A2) = 0.12 − 0.06 = 0.06
P(A1 ∩ A3′) = 0.12 − 0.03 = 0.09
P(A2 ∩ A3′) = 0.07 − 0.02 = 0.05
P(A1′ ∩ A2) = P(A2) − P(A1 ∩ A2) = 0.07 − 0.06 = 0.01
P(A1′ ∩ A3) = P(A3) − P(A3 ∩ A1) = 0.05 − 0.03 = 0.02
P(A2′ ∩ A3) = P(A3) − P(A2 ∩ A3) = 0.05 − 0.02 = 0.03.

Now with these and using P(A1 ∩ A2) = P(A1 ∩ A2 ∩ A3) + P(A1 ∩ A2 ∩ A3′) we get 0.06 = 0.01 + P(A1 ∩ A2 ∩ A3′), so P(A1 ∩ A2 ∩ A3′) = 0.05. In the same way we get

P(A1 ∩ A2′ ∩ A3) = P(A1 ∩ A3) − P(A1 ∩ A3 ∩ A2) = 0.03 − 0.01 = 0.02,

and

P(A1′ ∩ A2 ∩ A3) = P(A2 ∩ A3) − P(A1 ∩ A2 ∩ A3) = 0.02 − 0.01 = 0.01
P(A1 ∩ A2′ ∩ A3′) = P(A1 ∩ A2′) − P(A1 ∩ A2′ ∩ A3) = 0.06 − 0.02 = 0.04
P(A1′ ∩ A2 ∩ A3′) = P(A1′ ∩ A2) − P(A1′ ∩ A2 ∩ A3) = 0.01 − 0.01 = 0.0
P(A1′ ∩ A2′ ∩ A3) = P(A2′ ∩ A3) − P(A1 ∩ A2′ ∩ A3) = 0.03 − 0.02 = 0.01.
Using everything we have thus far we can compute the probability we need. The numerator is

P(A1 ∩ A2′ ∩ A3′) + P(A1′ ∩ A2 ∩ A3′) + P(A1′ ∩ A2′ ∩ A3) = 0.04 + 0.0 + 0.01 = 0.05,

and by inclusion-exclusion the denominator is P(A1 ∪ A2 ∪ A3) = 0.12 + 0.07 + 0.05 − 0.06 − 0.03 − 0.02 + 0.01 = 0.14, so P = 0.05/0.14 = 0.3571.

Part (d): We have

P(A3′|A1 ∩ A2) = 1 − P(A3|A1 ∩ A2) = 1 − P(A1 ∩ A2 ∩ A3)/P(A1 ∩ A2)
              = 1 − 0.01/(0.12 + 0.07 − 0.13) = 1 − 0.01/0.06 = 0.8333333.

Exercise 2.49

Let A be the event that at least one of the two bulbs selected is found to be 75 Watts and B the event that both bulbs are 75 Watts. Then we want to evaluate P(B|A) = P(A ∩ B)/P(A). The denominator of P(B|A) can be evaluated as

P(A) = [C(6,1) C(9,1) + C(6,2)] / C(15,2) = (54 + 15)/105 = 0.657143.

For the numerator note that A ∩ B = B, since if B is true then A is true. Thus we have

P(B|A) = [C(6,2)/C(15,2)] / 0.657143 = (15/105)/0.657143 = 0.21739.

Let C be the event that at least one of the two bulbs is not 75 Watts and D the event that both bulbs are the same rating. Then

D = {both bulbs 40W} ∪ {both bulbs 60W} ∪ {both bulbs 75W}.

For this part of the problem we want to evaluate P(D|C) = P(C ∩ D)/P(C). Now

P(C) = [C(9,2) + C(9,1) C(6,1)] / C(15,2) = (36 + 54)/105 = 0.8571429.

Now D ∩ C is the event that both bulbs are 40 Watt or both are 60 Watt, as C contradicts the event that both bulbs are 75 Watts. Thus we have

P(C ∩ D) = P(both bulbs 40W) + P(both bulbs 60W) = [C(4,2) + C(5,2)] / C(15,2) = (6 + 10)/105 = 0.152381.

Thus we have P(D|C) = 0.152381/0.8571429 = 0.1777778.

Exercise 2.50

Let LS represent long-sleeved shirts and SS represent short-sleeved shirts.

Part (a): From the given table we have P(M, LS, Pr) = 0.05.

Part (b): P(M, Pr) = 0.07 + 0.05 = 0.12.

Part (c): For P(SS) we would add all the numbers in the short-sleeved table. For P(LS) we would add all of the numbers in the long-sleeved table.

Part (d): We have

P(M) = 0.08 + 0.07 + 0.12 + 0.1 + 0.05 + 0.07 = 0.49
P(Pr) = 0.02 + 0.07 + 0.07 + 0.02 + 0.05 + 0.02 = 0.25.

Part (e): We want to evaluate

P(M|SS, Pl) = P(M, SS, Pl)/P(SS, Pl) = 0.08/(0.04 + 0.08 + 0.03) = 0.5333333.

Part (f): We have

P(SS|M, Pl) = P(M, SS, Pl)/P(M, Pl) = 0.08/(0.08 + 0.1) = 0.4444444,

and

P(LS|M, Pl) = P(M, LS, Pl)/P(M, Pl) = 0.10/(0.08 + 0.1) = 0.5555556.

Exercise 2.51

Part (a): Let R1 be the event that we draw a red ball on the first draw and R2 that we draw a red ball on the second draw. Then we have

P(R1, R2) = P(R2|R1) P(R1) = (8/11)(6/10) = 24/55 = 0.4363.

Part (b): We want the probability that we have the same number of red and green balls after the two draws as before. This is given by

P(R1, R2) + P(G1, G2) = P(R2|R1) P(R1) + P(G2|G1) P(G1) = 24/55 + (4/11)(4/10) = 24/55 + 8/55 = 32/55 = 0.5818.

Exercise 2.52

Let F1 be the event that the first pump fails and F2 the event that the second pump fails. Then we are told that P(F1 ∪ F2) = 0.07 and P(F1 ∩ F2) = 0.01. Now assuming that P(F1) = P(F2) we get

P(F1 ∪ F2) = P(F1) + P(F2) − P(F1 ∩ F2) = 2 P(F1) − P(F1 ∩ F2), so 0.07 = 2 P(F1) − 0.01,

and we get that P(F1) = P(F2) = 0.04. We can check that with this numerical value we have P(F2|F1) > P(F2), as we should. Note that

P(F2|F1) = P(F1 ∩ F2)/P(F1) = 0.01/0.04 = 0.25 > P(F2) = 0.04.

Exercise 2.53

Here B ⊂ A, so that A ∩ B = B and thus

P(B|A) = P(A ∩ B)/P(A) = P(B)/P(A) = 0.05/0.6 = 0.083.

Exercise 2.54

Part (a): The expression P(A2|A1) is the probability we are awarded project 2 given that we are awarded project 1. We can compute it from

P(A2|A1) = P(A1 ∩ A2)/P(A1) = 0.11/0.22 = 1/2.

Part (b): The expression P(A2 ∩ A3|A1) is the probability we are awarded projects two and three given that we were awarded project one. We can compute it as

P(A1 ∩ A2 ∩ A3)/P(A1) = 0.01/0.22 = 0.04545.

Part (c): The expression P(A2 ∪ A3|A1) is the probability we are awarded projects two or three given that we were awarded project one. We can compute it as P(A1 ∩ (A2 ∪ A3))/P(A1). To compute P(A1 ∩ (A2 ∪ A3)) we note that

A1 ∩ (A2 ∪ A3) = (A1 ∩ A2) ∪ (A1 ∩ A3),

so

P(A1 ∩ (A2 ∪ A3)) = P(A1 ∩ A2) + P(A1 ∩ A3) − P((A1 ∩ A2) ∩ (A1 ∩ A3))
                  = 0.11 + 0.05 − P(A1 ∩ A2 ∩ A3) = 0.16 − 0.01 = 0.15.

Thus we get P(A2 ∪ A3|A1) = 0.15/0.22 = 0.6818.

Part (d): The expression P(A1 ∩ A2 ∩ A3|A1 ∪ A2 ∪ A3) is the probability we are awarded projects one, two, and three given that we were awarded at least one of the three projects. We can compute it as

P((A1 ∩ A2 ∩ A3) ∩ (A1 ∪ A2 ∪ A3))/P(A1 ∪ A2 ∪ A3) = P(A1 ∩ A2 ∩ A3)/P(A1 ∪ A2 ∪ A3) = 0.01/0.53 = 0.01886792,

where we computed P(A1 ∪ A2 ∪ A3) in Exercise 13 Page 6.

Exercise 2.55

Let L be the event that a tick carries Lyme disease and H the event that a tick carries HGE. Then from the problem we are told that

P(L) = 0.16, P(H) = 0.1, P(H ∩ L|H ∪ L) = 0.1.

We want to compute P(L|H). Now from the definition of conditional probability we have

P(H ∩ L|H ∪ L) = P((H ∩ L) ∩ (H ∪ L))/P(H ∪ L) = P(H ∩ L)/P(H ∪ L) = 0.1,

so

P(H ∩ L) = 0.1 P(H ∪ L). (8)

Next, using P(H ∩ L) = P(H) + P(L) − P(H ∪ L) with the above we get

0.1 P(H ∪ L) = 0.16 + 0.1 − P(H ∪ L), so P(H ∪ L) = 0.26/1.1 = 0.23636.

Using Equation 8 we get P(H ∩ L) = 0.023636. For the probability we want to evaluate we find

P(L|H) = P(L ∩ H)/P(H) = 0.023636/0.1 = 0.23636.

Exercise 2.56

Using the definition of conditional probability we have

P(A|B) + P(A′|B) = [P(A ∩ B) + P(A′ ∩ B)]/P(B) = P(B)/P(B) = 1,

as we were to show.

Exercise 2.57

If P(B|A) > P(B), then adding P(B′|A) to both sides of this expression and using P(B|A) + P(B′|A) = 1 gives

1 > P(B) + P(B′|A),

or 1 − P(B) > P(B′|A), that is P(B′) > P(B′|A), as we were to show.

Exercise 2.58

We have

P(A ∪ B|C) = P((A ∪ B) ∩ C)/P(C) = P((A ∩ C) ∪ (B ∩ C))/P(C)
           = [P(A ∩ C) + P(B ∩ C) − P(A ∩ B ∩ C)]/P(C)
           = P(A|C) + P(B|C) − P(A ∩ B|C),

as we were to show.

Exercise 2.59

Part (a): We have P(A2 ∩ B) = P(A2) P(B|A2) = 0.35(0.6) = 0.21.
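The small algebra steps in Exercises 2.52 and 2.55 are easy to get wrong, so here is a quick sanity check of the arithmetic (in Python rather than R, purely for illustration):

```python
# Exercise 2.52: P(F1 u F2) = 0.07, P(F1 n F2) = 0.01, with P(F1) = P(F2) = p.
# Inclusion-exclusion gives 0.07 = 2p - 0.01, so p = 0.04.
p_union, p_inter = 0.07, 0.01
p = round((p_union + p_inter) / 2, 6)
print(p)                      # 0.04
print(round(p_inter / p, 6))  # P(F2 | F1) = 0.25 > P(F2) = 0.04

# Exercise 2.55: 0.1 * P(H u L) = P(H) + P(L) - P(H u L) with P(H) = 0.1, P(L) = 0.16.
p_HuL = (0.1 + 0.16) / 1.1
p_HnL = 0.1 * p_HuL
print(round(p_HuL, 5))        # 0.23636
print(round(p_HnL / 0.1, 5))  # P(L | H) = 0.23636
```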
Part (b): We have

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + P(B|A3)P(A3) = 0.3(0.4) + 0.21 + 0.5(0.25) = 0.455.

Part (c): We have

P(A1|B) = P(A1 ∩ B)/P(B) = 0.12/0.455 = 0.2637
P(A2|B) = P(A2 ∩ B)/P(B) = 0.21/0.455 = 0.4615
P(A3|B) = P(A3 ∩ B)/P(B) = 0.125/0.455 = 0.274725.

Exercise 2.60

Let D be the event that an aircraft is discovered and L be the event that the aircraft has an emergency locator. Then we are told that P(D) = 0.7, P(L|D) = 0.6, and P(L|D′) = 0.9. From these we can conclude that P(L ∩ D) = 0.7(0.6) = 0.42 and P(L ∩ D′) = 0.3(0.9) = 0.27.

Part (a): We want to evaluate P(D′|L) = P(L ∩ D′)/P(L). Now

P(L) = P(L|D)P(D) + P(L|D′)P(D′) = 0.6(0.7) + 0.9(0.3) = 0.69,

so P(D′|L) = 0.27/0.69 = 0.3913.

Part (b): We want P(D|L′) = P(D ∩ L′)/P(L′). Now

P(D ∩ L′) = P(D) − P(D ∩ L) = 0.7 − 0.42 = 0.28,

so P(D|L′) = 0.28/0.31 = 0.9032.

Exercise 2.61

Let D0, D1, and D2 be the events that there are no defective, one defective, and two defective items in the batch of 10 items. Then we are told that P(D0) = 0.5, P(D1) = 0.3, and P(D2) = 0.2.

Part (a): Let N be the event that neither tested component is defective. We want to evaluate P(Di|N) for i = 0, 1, 2. We have

P(D0|N) = P(D0 ∩ N)/P(N) = P(N|D0)P(D0)/[P(N|D0)P(D0) + P(N|D1)P(D1) + P(N|D2)P(D2)],

and the same type of expression for P(D1|N) and P(D2|N). To use the above note that

P(N|D0) = 1, P(N|D1) = C(9,2)/C(10,2) = 36/45, P(N|D2) = C(8,2)/C(10,2) = 28/45.

Thus using these we have

P(N) = 1(0.5) + (36/45)(0.3) + (28/45)(0.2) = 0.8644,

and then

P(D0|N) = 0.5/0.8644 = 0.5784359
P(D1|N) = (36/45)(0.3)/0.8644 = 0.277649
P(D2|N) = (28/45)(0.2)/0.8644 = 0.14396.

Part (b): Let O (an upper case letter "o" and not a zero) be the event that exactly one of the two tested items is defective. Then

P(O) = P(O|D0)P(D0) + P(O|D1)P(D1) + P(O|D2)P(D2)
     = 0 + [C(1,1)C(9,1)/C(10,2)](0.3) + [C(2,1)C(8,1)/C(10,2)](0.2) = 0.06 + 0.0711 = 0.1311.

Using this we have

P(D0|O) = 0
P(D1|O) = 0.06/0.1311 = 0.4576
P(D2|O) = 0.0711/0.1311 = 0.5424.

Exercise 2.62

Let B be the event that the camera is a basic model, and W the event that a warranty was purchased. Then from the problem statement we have P(B) = 0.4 (so P(B′) = 0.6), P(W|B) = 0.3, and P(W|B′) = 0.5. Then we want to evaluate

P(B|W) = P(B ∩ W)/P(W) = P(W|B)P(B)/[P(W|B)P(B) + P(W|B′)P(B′)] = 0.3(0.4)/[0.3(0.4) + 0.5(0.6)] = 0.2857.

Exercise 2.63

Part (a): In words, we would draw a diagram with A going up (with a 0.75) and A′ going down (with a 0.25). Then from the A branch we would draw B going up (with a 0.9) and B′ going down (with a 0.1). From the A′ branch we would draw B going up (with a 0.8) and B′ going down (with a 0.2). From the AB branch we would draw C going up (with a 0.8) and C′ going down (with a 0.2). From the AB′ branch we would draw C going up (with a 0.6) and C′ going down (with a 0.4). From the A′B branch we would draw C going up (with a 0.7) and C′ going down (with a 0.3). From the A′B′ branch we would draw C going up (with a 0.3) and C′ going down (with a 0.7).

Part (b): We could compute this as

P(A ∩ B ∩ C) = P(C|A ∩ B)P(B|A)P(A) = 0.8(0.9)(0.75) = 0.54.

Part (c): Using the tree diagram we would compute

P(B ∩ C) = 0.75(0.9)(0.8) + 0.25(0.8)(0.7) = 0.68.

Or algebraically we could use

P(B ∩ C) = P(B ∩ C|A)P(A) + P(B ∩ C|A′)P(A′) = P(C|A ∩ B)P(B|A)P(A) + P(C|A′ ∩ B)P(B|A′)P(A′).

Part (d): Algebraically we have

P(C) = P(C|A ∩ B)P(A ∩ B) + P(C|A ∩ B′)P(A ∩ B′) + P(C|A′ ∩ B)P(A′ ∩ B) + P(C|A′ ∩ B′)P(A′ ∩ B′)
     = 0.8 P(B|A)P(A) + 0.6 P(B′|A)P(A) + 0.7 P(B|A′)P(A′) + 0.3 P(B′|A′)P(A′)
     = 0.8(0.9)(0.75) + 0.6(0.1)(0.75) + 0.7(0.8)(0.25) + 0.3(0.2)(0.25) = 0.74.
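The tree computations in parts (b) through (d) can be cross-checked by brute-force enumeration of the eight leaves of the tree; a sketch (in Python rather than R), where the dictionaries simply encode the conditional probabilities listed in part (a):

```python
from itertools import product

pA = 0.75
pB_given = {True: 0.9, False: 0.8}                    # P(B | A), P(B | A')
pC_given = {(True, True): 0.8, (True, False): 0.6,    # P(C | A,B) for each branch
            (False, True): 0.7, (False, False): 0.3}

pABC = pBC = pC = 0.0
for a, b, c in product([True, False], repeat=3):
    prob = pA if a else 1 - pA
    prob *= pB_given[a] if b else 1 - pB_given[a]
    prob *= pC_given[(a, b)] if c else 1 - pC_given[(a, b)]
    if a and b and c:
        pABC += prob
    if b and c:
        pBC += prob
    if c:
        pC += prob

print(round(pABC, 4))  # 0.54  (part b)
print(round(pBC, 4))   # 0.68  (part c)
print(round(pC, 4))    # 0.74  (part d)
```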
Part (e): This would be

P(A|B ∩ C) = P(A ∩ B ∩ C)/P(B ∩ C) = 0.54/0.68 = 0.7941.

Exercise 2.64

Let A1 be the event that we have the disease and B the event that our test gives a positive result. Then P(A1) = 1/25, P(B|A1) = 0.99, and P(B|A1′) = 0.02. Now

P(B) = P(B|A1)P(A1) + P(B|A1′)P(A1′) = 0.99(1/25) + 0.02(24/25) = 0.0588.

With this we have for the two probabilities requested

P(A1|B) = P(B|A1)P(A1)/P(B) = 0.99(1/25)/0.0588 = 0.6734
P(A1′|B′) = P(B′|A1′)P(A1′)/P(B′) = 0.98(24/25)/(1 − 0.0588) = 0.9408/0.9412 = 0.9996.

Exercise 2.65

From the given problem statement we get

P(mean) = 500/(500 + 300 + 200) = 1/2
P(median) = 300/1000 = 3/10
P(mode) = 200/1000 = 1/5
P(S|mean) = 200/500 = 2/5
P(S|median) = 150/300 = 1/2
P(S|mode) = 160/200 = 4/5.

Here S is the event that a given student was satisfied with the book. We then compute

P(mean|S) = P(S|mean)P(mean)/[P(S|mean)P(mean) + P(S|median)P(median) + P(S|mode)P(mode)]
          = (2/5)(1/2)/[(2/5)(1/2) + (1/2)(3/10) + (4/5)(1/5)] = 0.2/0.51 = 0.3921569.

In the same way we get P(median|S) = 0.2941176 and P(mode|S) = 0.3137255.

Exercise 2.66

There are various ways to stay connected to one's work while on vacation. While on vacation, let E be the event that a person checks their email to stay connected, C the event that a person connects to work with their cell phone, and L the event that a person uses their laptop to stay connected. Then from the problem statement we are told that

P(E) = 0.4, P(C) = 0.3, P(L) = 0.25
P(E ∩ C) = 0.23
P((E ∪ C ∪ L)′) = 0.51 so P(E ∪ C ∪ L) = 0.49
P(E|L) = 0.88, P(L|C) = 0.7.

Using the above we can derive the probability of some intersections:

P(E ∩ L) = P(E|L)P(L) = 0.88(0.25) = 0.22
P(L ∩ C) = P(L|C)P(C) = 0.7(0.3) = 0.21.

Part (a): This would be P(C|E) = P(E ∩ C)/P(E) = 0.23/0.4 = 0.575.

Part (b): This would be P(C|L) = P(C ∩ L)/P(L) = 0.21/0.25 = 0.84.

Part (c): This would be P(C|E ∩ L) = P(E ∩ L ∩ C)/P(E ∩ L). The numerator in the above fraction can be computed with

P(E ∪ L ∪ C) = P(E) + P(L) + P(C) − P(E ∩ L) − P(E ∩ C) − P(L ∩ C) + P(E ∩ L ∩ C),

so

0.49 = 0.4 + 0.25 + 0.3 − 0.22 − 0.23 − 0.21 + P(E ∩ L ∩ C);

on solving we find P(E ∩ L ∩ C) = 0.2. Using this we find P(C|E ∩ L) = 0.2/0.22 = 0.90909.

Exercise 2.67

Let T be the event that a person is a terrorist (so that T′ is the event that the person is not a terrorist). Then from the problem statement we have that

P(T) = 100/(300 × 10⁶) = 3.33 × 10⁻⁷
P(T′) = 1 − P(T) = 1 − 3.33 × 10⁻⁷.

Let D (for detect) be the event that our system identifies a person as a terrorist; then D′ is the event the system does not identify the person as a terrorist. Also from the problem statement we have P(D|T) = 0.99 and P(D′|T′) = 0.999. We want to evaluate P(T|D), i.e. the probability the person is actually a terrorist given that our system identifies them as one. We have

P(T|D) = P(D|T)P(T)/[P(D|T)P(T) + P(D|T′)P(T′)]
       = 0.99(3.33 × 10⁻⁷)/[0.99(3.33 × 10⁻⁷) + (1 − 0.999)(1 − 3.33 × 10⁻⁷)] = 0.0003298912.

Exercise 2.68

From the problem statement we have P(A1) = 0.5, P(A2) = 0.3, and P(A3) = 0.2. Let Ld be the event the flight is late into D.C. and La be the event the flight is late into L.A. Then we have

P(Ld|A1) = 0.3 and P(La|A1) = 0.1
P(Ld|A2) = 0.25 and P(La|A2) = 0.2
P(Ld|A3) = 0.4 and P(La|A3) = 0.25.

We want to evaluate P{Ai|(Ld ∩ La′) ∪ (Ld′ ∩ La)} for i = 1, 2, 3. By Bayes' Rule

P{A1|(Ld ∩ La′) ∪ (Ld′ ∩ La)} = P{(Ld ∩ La′) ∪ (Ld′ ∩ La)|A1}P{A1}/P{(Ld ∩ La′) ∪ (Ld′ ∩ La)}
  = (P{Ld ∩ La′|A1} + P{Ld′ ∩ La|A1})P{A1}/P{(Ld ∩ La′) ∪ (Ld′ ∩ La)}   (disjoint sets)
  = (P{Ld|A1}P{La′|A1} + P{Ld′|A1}P{La|A1})P{A1}/P{(Ld ∩ La′) ∪ (Ld′ ∩ La)}
  = (0.3(0.9) + 0.7(0.1))(0.5)/P{(Ld ∩ La′) ∪ (Ld′ ∩ La)}.

By conditioning on the airline taken Ai, the denominator of the above can be computed as

P{(Ld ∩ La′) ∪ (Ld′ ∩ La)} = (0.3(0.9) + 0.7(0.1))(0.5) + (0.25(0.8) + 0.75(0.2))(0.3) + (0.4(0.75) + 0.6(0.25))(0.2)
                           = 0.17 + 0.105 + 0.09 = 0.365.

With this we then get for the posterior probabilities

P{A1|(Ld ∩ La′) ∪ (Ld′ ∩ La)} = 0.17/0.365 = 0.466
P{A2|(Ld ∩ La′) ∪ (Ld′ ∩ La)} = 0.105/0.365 = 0.288
P{A3|(Ld ∩ La′) ∪ (Ld′ ∩ La)} = 0.09/0.365 = 0.247.

Exercise 2.69

From the definitions of A1, A2, A3, and B in the previous exercise we have

P(A1) = 0.4, P(A2) = 0.35, P(A3) = 0.25
P(B|A1) = 0.3, P(B|A2) = 0.6, P(B|A3) = 0.5.

Let C be the event the customer uses a credit card; then from this exercise we have

P(C|A1 ∩ B) = 0.7 and P(C|A1 ∩ B′) = 0.5
P(C|A2 ∩ B) = 0.6 and P(C|A2 ∩ B′) = 0.5
P(C|A3 ∩ B) = 0.5 and P(C|A3 ∩ B′) = 0.4.

Part (a): We want to compute

P(A2 ∩ B ∩ C) = P(C|A2 ∩ B)P(B|A2)P(A2) = 0.6(0.6)(0.35) = 0.126.

Part (b): We want to compute

P(A3 ∩ B′ ∩ C) = P(C|A3 ∩ B′)P(B′|A3)P(A3) = 0.4(0.5)(0.25) = 0.05.

Part (c): We want to compute

P(A3 ∩ C) = P(A3 ∩ C ∩ B) + P(A3 ∩ C ∩ B′) = P(C|A3 ∩ B)P(B|A3)P(A3) + 0.05 = 0.5(0.5)(0.25) + 0.05 = 0.1125.

Part (d): We want to compute

P(B ∩ C) = P(C|B ∩ A1)P(B|A1)P(A1) + P(C|B ∩ A2)P(B|A2)P(A2) + P(C|B ∩ A3)P(B|A3)P(A3)
         = 0.7(0.3)(0.4) + 0.6(0.6)(0.35) + 0.5(0.5)(0.25) = 0.2725.

Part (e): We want to compute

P(C) = P(C ∩ B) + P(C ∩ B′) = 0.2725 + 0.5(0.7)(0.4) + 0.5(0.4)(0.35) + 0.4(0.5)(0.25) = 0.5325.

Part (f): We want to compute

P(A3|C) = P(A3 ∩ C)/P(C) = 0.1125/0.5325 = 0.2112676.

Exercise 2.70

From the definition of independence we have that events A and B are dependent if P(A|B) ≠ P(A), or equivalently if P(A ∩ B) ≠ P(A)P(B).
In Exercise 47 we computed P(A|B) and found it to be 0.625 (see Equation 7), which is not equal to P(A) = 0.5. Note also that, using the other expression, we have P(A ∩ B) = 0.25 ≠ P(A)P(B) = 0.5(0.4) = 0.2.

Exercise 2.71

Part (a): Since the events are independent, what happens with the Asia project does not affect the European project, and thus P(B′) = 1 − P(B) = 0.3.

Part (b): We have (using independence) that

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B) = 0.4 + 0.7 − 0.28 = 0.82.

Part (c): We have

P(A ∩ B′|A ∪ B) = P((A ∩ B′) ∩ (A ∪ B))/P(A ∪ B) = P(A ∩ B′)/P(A ∪ B) = P(A)P(B′)/P(A ∪ B) = 0.4(0.3)/0.82 = 0.14634.

Exercise 2.72

From Exercise 13 we have

P(A1 ∩ A2) = 0.11 vs. P(A1)P(A2) = 0.22(0.25) = 0.055
P(A1 ∩ A3) = 0.05 vs. P(A1)P(A3) = 0.22(0.28) = 0.0616
P(A2 ∩ A3) = 0.07 vs. P(A2)P(A3) = 0.25(0.28) = 0.07.

Thus A2 and A3 are independent while the other pairs are not.

Exercise 2.73

We have

P(A′ ∩ B) = P(B) − P(A ∩ B) = P(B) − P(A)P(B) = P(B)(1 − P(A)) = P(B)P(A′),

showing that the events A′ and B are independent.

Exercise 2.74

The probability that both phenotypes are O is given by

P(O1 ∩ O2) = P(O1)P(O2) = 0.44² = 0.1936.

The probability that the two phenotypes match is given by

P(A1 ∩ A2) + P(B1 ∩ B2) + P(AB1 ∩ AB2) + P(O1 ∩ O2) = 0.42² + 0.10² + 0.04² + 0.44² = 0.3816.

Exercise 2.75

From the problem statement, the probability that a point does not signal a problem (when the process is running correctly) is 0.95. The probability that in ten points none indicate a problem is 0.95¹⁰ = 0.5987369. Thus the probability that at least one of the ten points signals a problem is the complement, 1 − 0.95¹⁰ = 0.4012631.
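Computations of the form 1 − (1 − p)ⁿ recur in this and the next few exercises; a one-loop check (in Python rather than R) for the n = 10 and n = 25 cases with p = 0.05:

```python
# P(at least one of n independent points signals) = 1 - (1 - p)^n.
p = 0.05
for n in (10, 25):
    print(n, round(1 - (1 - p) ** n, 4))  # 10 -> 0.4013, 25 -> 0.7226
```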
For 25 points the probability that at least one point signals a problem is 1 − 0.95²⁵ = 0.7226.

Exercise 2.76

From the problem statement, the probability that a grader will not make an error on one question is 0.9. The probability they make no errors in ten questions is then 0.9¹⁰ = 0.34867, and the probability of at least one error in ten questions is 1 − 0.9¹⁰ = 0.6513. In general, if the probability that the grader makes an error on one question is p, the probability of no error on a question is 1 − p, the probability of no errors in n questions is (1 − p)ⁿ, and the probability of at least one error is 1 − (1 − p)ⁿ.

Exercise 2.77

Part (a): Let p be the probability that an individual rivet is defective; then 1 − p is the probability a single rivet is not defective and (1 − p)²⁵ is the probability that all 25 rivets are not defective. Then 1 − (1 − p)²⁵ is the probability that at least one rivet is defective and the entire seam will need reworking. We are told that 1 − (1 − p)²⁵ = 0.2, so solving for p gives p = 0.008886.

Note that this is a different number than the one given in the back of the book. If anyone sees anything wrong with what I have done please contact me.

Part (b): In this case we want 1 − (1 − p)²⁵ = 0.1, so p = 0.004205.

Exercise 2.78

From the problem statement we have

P(at least one valve opens) = 1 − P(no valve opens) = 1 − 0.05⁵ ≈ 0.9999997,

and

P(at least one valve fails to open) = 1 − P(all valves open) = 1 − 0.95⁵ = 0.226.

Exercise 2.79

Let Fo be the event that the older pump fails and Fn the event that the newer pump fails. We are told that these events are independent and that P(Fo) = 0.1 and P(Fn) = 0.05, thus

P(Fo ∩ Fn) = P(Fo)P(Fn) = 0.1(0.05) = 0.005.

Note that this is a different number than the one given in the back of the book. If anyone sees anything wrong with what I have done please contact me.

Exercise 2.80

Let Ci be the event that component i works and let p = 0.9 be the probability that it works.
Then from the diagram given we have that

P(system works) = P(C1 ∪ C2 ∪ (C3 ∩ C4))
 = P(C1) + P(C2) + P(C3 ∩ C4) − P(C1 ∩ C2) − P(C1 ∩ (C3 ∩ C4)) − P(C2 ∩ (C3 ∩ C4)) + P(C1 ∩ C2 ∩ (C3 ∩ C4))
 = 2p + p² − p² − 2p³ + p⁴ = 2p − 2p³ + p⁴ = 0.9981.

Exercise 2.81

Based on the figure in Example 2.36 we have (assuming independence) that

P(system works) = P{(components 1 and 2 work) or (components 3 and 4 work)}
 = P{(A1 ∩ A2) ∪ (A3 ∩ A4)} = P(A1 ∩ A2) + P(A3 ∩ A4) − P(A1 ∩ A2 ∩ A3 ∩ A4) = p² + p² − p⁴ = 2p² − p⁴.

We want this to equal 0.99, which gives the equation

p⁴ − 2p² + 0.99 = 0.

This has roots p² = 0.9 and p² = 1.1. Taking the square root of the only valid value gives p = 0.94868.

Exercise 2.82

These events are not pairwise independent, since the events A and C are not independent. To be mutually independent all pairs of events must be independent, which they are not.

Exercise 2.83

Let D be the event a defect is present and Ii the event that inspector i detects a defect. Then from the problem statement we are told

P(I1|D) = P(I2|D) = 0.9,

and

P((I1′ ∩ I2) ∪ (I1 ∩ I2′) ∪ (I1′ ∩ I2′)|D) = 0.2.

Using a Venn diagram, the set on the left-hand side of the above expression is the complement of the set I1 ∩ I2; thus the above is equivalent to

1 − P(I1 ∩ I2|D) = 0.2 or P(I1 ∩ I2|D) = 0.8.

Part (a): To have only the first inspector detect the defect we want to evaluate

P(I1 ∩ I2′|D) = P(I1|D) − P(I1 ∩ I2|D) = 0.9 − 0.8 = 0.1.

To have only one of the two inspectors detect the defect we need to compute

P((I1 ∩ I2′) ∪ (I1′ ∩ I2)|D) = P(I1 ∩ I2′|D) + P(I1′ ∩ I2|D) − P((I1 ∩ I2′) ∩ (I1′ ∩ I2)|D) = 0.1 + 0.1 − 0 = 0.2.

Part (b): The probability that both inspectors do not find the defect in one defective component is given by

P(I1′ ∩ I2′|D) = P((I1 ∪ I2)′|D) = 1 − P(I1 ∪ I2|D) = 1 − (P(I1|D) + P(I2|D) − P(I1 ∩ I2|D)) = 1 − (0.9 + 0.9 − 0.8) = 0.

Thus the probability that all three defective components are missed is P(I1′ ∩ I2′|D)³ = 0.
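The algebraic reliability expressions in Exercises 2.80 and 2.81 can be verified by enumerating all component states, assuming independent components that each work with probability p = 0.9 (the helper `system_prob` below is mine, not from the text):

```python
from itertools import product

def system_prob(works, n, p=0.9):
    """Sum P(state) over all 2^n component states in which the system works."""
    total = 0.0
    for state in product([True, False], repeat=n):
        if works(state):
            prob = 1.0
            for up in state:
                prob *= p if up else 1 - p
            total += prob
    return total

# Exercise 2.80: components 1, 2 in parallel with the series pair 3-4.
p80 = system_prob(lambda s: s[0] or s[1] or (s[2] and s[3]), 4)
print(round(p80, 4))  # 0.9981, matching 2p - 2p^3 + p^4

# Exercise 2.81: the pair (1 and 2) in parallel with the pair (3 and 4).
p81 = system_prob(lambda s: (s[0] and s[1]) or (s[2] and s[3]), 4)
print(round(p81, 4))  # 0.9639, matching 2p^2 - p^4
```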
Exercise 2.84

Let A be the event a vehicle passes inspection; then P(A) = 0.7.

Part (a): This would be P(A)³ = 0.7³ = 0.343.

Part (b): This would be 1 − P(all pass inspection) = 1 − 0.343 = 0.657.

Part (c): This would be C(3,1)(0.7)¹(0.3)² = 0.189.

Part (d): This would be

P(zero pass inspection) + P(one passes inspection) = 0.3³ + 0.189 = 0.216.

Part (e): We have

P(A1 ∩ A2 ∩ A3|A1 ∪ A2 ∪ A3) = P(A1 ∩ A2 ∩ A3 ∩ (A1 ∪ A2 ∪ A3))/P(A1 ∪ A2 ∪ A3) = P(A1 ∩ A2 ∩ A3)/P(A1 ∪ A2 ∪ A3)
 = 0.343/[P(A1) + P(A2) + P(A3) − P(A1 ∩ A2) − P(A1 ∩ A3) − P(A2 ∩ A3) + P(A1 ∩ A2 ∩ A3)]
 = 0.343/[3(0.7) − 3(0.7²) + 0.7³] = 0.352518.

Exercise 2.85

Part (a): This would be p + (1 − p)p = 2p − p² = p(2 − p).

Part (b): One way to derive this would be to evaluate

p + (1 − p)p + (1 − p)²p + ... + (1 − p)ⁿ⁻¹p = p(1 + (1 − p) + (1 − p)² + ... + (1 − p)ⁿ⁻¹)
 = p [1 − (1 − p)ⁿ]/[1 − (1 − p)] = 1 − (1 − p)ⁿ.

Note that we get the same result by evaluating 1 − P(flaw not detected in n fixations) = 1 − (1 − p)ⁿ.

Part (c): This would be 1 − P(flaw is detected in three fixations) = 1 − (1 − (1 − p)³) = (1 − p)³.

Part (d): This would be

P(pass inspection) = P(pass inspection|flawed)P(flawed) + P(pass inspection|flawed′)P(flawed′) = (1 − p)³(0.1) + 1(0.9).

Part (e): This would be

P(flawed|pass inspection) = P(pass inspection|flawed)P(flawed)/P(pass inspection) = (1 − p)³(0.1)/[(1 − p)³(0.1) + 0.9].

Exercise 2.86

Part (a): From the problem statement we have that

P(A) = 2000/10000 = 0.2
P(B) = P(B|A)P(A) + P(B|A′)P(A′) = (1999/9999)(0.2) + (2000/9999)(0.8) = 0.2
P(A ∩ B) = P(B|A)P(A) = (1999/9999)(0.2) = 0.039984.

To see if A and B are independent we next compute P(A)P(B) = 0.2(0.2) = 0.04. Since the two expressions P(A ∩ B) and P(A)P(B) are not equal, the events A and B are not independent.

Part (b): If P(A) = P(B) = 0.2 and A and B are independent, then P(A ∩ B) = P(A)P(B) = 0.04.
The difference between this and the value computed in Part (a) is 0.04 − 0.039984 = 1.6 × 10⁻⁵. Since this difference is so small we might conclude that A and B are (approximately) independent.

Part (c): If we now have only two green boards then we have

P(A) = 2/10 = 0.2
P(B) = P(B|A)P(A) + P(B|A′)P(A′) = (1/9)(0.2) + (2/9)(0.8) = 0.2
P(B|A) = 1/9 = 0.1111, so
P(A ∩ B) = P(B|A)P(A) = 0.0222.

If we assume that A and B are independent (as we did in Part (b)) we would again compute P(A)P(B) = 0.2² = 0.04. Note that this is not very close in value to P(A ∩ B). Removing one green board in the 10-board case changes the distribution of the number of green boards still remaining quite significantly, while when there are 2000 green boards initially, removing one does not change the distribution of green boards remaining very much.

Exercise 2.87

As earlier, let Ci be the event that component i works. We want to compute P(system works). Using P(Ci) = p we find

P(system works) = P(C1 ∪ C2) P{(C3 ∩ C4) ∪ (C5 ∩ C6)} P(C7).

Let's compute the different parts in turn. First we have

P(C1 ∪ C2) = P(C1) + P(C2) − P(C1 ∩ C2) = 2p − p².

Next we have

P{(C3 ∩ C4) ∪ (C5 ∩ C6)} = P(C3 ∩ C4) + P(C5 ∩ C6) − P((C3 ∩ C4) ∩ (C5 ∩ C6)) = p² + p² − p⁴ = 2p² − p⁴.

Thus we get

P(system works) = (2p − p²)(2p² − p⁴)p = p⁴(2 − p)(2 − p²).

When p = 0.9 the above becomes 0.85883. If this system were connected in parallel with the system in Figure 2.14, then defining the events

S1 ≡ system in Figure 2.14 (a) works
S2 ≡ system in Problem 87 works,

we would have

P(S1 ∪ S2) = P(S1) + P(S2) − P(S1 ∩ S2) = P(S1) + P(S2) − P(S1)P(S2) = 0.927 + 0.85883 − 0.927(0.85883) = 0.9896946.

Exercise 2.88

Route 1 has four railway crossings and route 2 has only two railway crossings but is longer. Let Ti be the event that we are slowed by a train at crossing i, and we will take P(Ti) = 0.1.
Part (a): The probability we are late given we take each route can be computed by

P(late|route 1) = C(4,2)(0.1)²(0.9)² + C(4,3)(0.1)³(0.9)¹ + C(4,4)(0.1)⁴(0.9)⁰ = 0.0523
P(late|route 2) = C(2,1)(0.1)¹(0.9)¹ + C(2,2)(0.1)²(0.9)⁰ = 0.19.

Since the probability we are late under route 1 is smaller, we should take route 1.

Part (b): If we toss a coin to decide which route to take, then the probability we took route one given that we are late is

P(route 1|late) = P(late|route 1)P(route 1)/[P(late|route 1)P(route 1) + P(late|route 2)P(route 2)]
               = 0.0523(0.5)/[0.0523(0.5) + 0.19(0.5)] = 0.215848.

Exercise 2.89

We want the probability that exactly one tag was lost given that at most one is lost, that is

P((C1 ∩ C2′) ∪ (C1′ ∩ C2) | (C1′ ∩ C2′) ∪ (C1′ ∩ C2) ∪ (C1 ∩ C2′))
 = P{(C1 ∩ C2′) ∪ (C1′ ∩ C2)} / P{(C1′ ∩ C2′) ∪ (C1′ ∩ C2) ∪ (C1 ∩ C2′)},

since the event in the numerator is a subset of the conditioning event. The numerator N can be expanded as

N = P(C1 ∩ C2′) + P(C1′ ∩ C2) − P((C1 ∩ C2′) ∩ (C1′ ∩ C2)) = P(C1)P(C2′) + P(C1′)P(C2) − 0 = 2π(1 − π),

since the last term is zero. The denominator D can be expanded as

D = P(C1′ ∩ C2′) + P(C1′ ∩ C2) + P(C1 ∩ C2′) = P(C1′)P(C2′) + P(C1′)P(C2) + P(C1)P(C2′)
  = (1 − π)² + 2π(1 − π) = (1 − π)(1 + π),

since the three events are mutually exclusive. Dividing these two expressions we get

P(exactly one tag lost | at most one tag lost) = 2π(1 − π)/[(1 − π)(1 + π)] = 2π/(1 + π).

Exercise 2.90

Part (a): This would be C(20,3) = 1140.

Part (b): This would be C(19,3) = 969.

Part (c): This would be C(20,3) − C(10,3) = 1140 − 120 = 1020, or the total number of shifts minus the number of shifts that don't have one of the 10 best machinists.

Part (d): This would be C(19,3)/C(20,3) = 969/1140 = 0.85.

Exercise 2.91

Let Li be the event that a can comes from line i.
Note that the numbers given in the table are P(defect type|Li).

Part (a): This would be

P(L1) = 500/1500 = 1/3.

The probability that the reason for nonconformance was a crack is given by

P(crack) = P(crack|L1)P(L1) + P(crack|L2)P(L2) + P(crack|L3)P(L3) = 0.5(5/15) + 0.44(4/15) + 0.4(6/15) = 0.444.

Part (b): If the can came from line one, the probability the defect was a blemish can be read from the table. We have P(blemish|L1) = 0.15.

Part (c): We have

P(L1|surface defect) = P(L1 ∩ surface defect)/P(surface defect)
 = 0.10(5/15)/[0.10(5/15) + 0.08(4/15) + 0.15(6/15)] = 0.2906977.

Exercise 2.92

Part (a): We have C(10,6) = 210 ways of choosing the six forms from the ten we have to hand off. If we want the remaining four forms to be all of the same type, we have to choose all six of the withdrawal petitions (leaving the four substitution requests) or all four of the substitution requests together with any two withdrawal petitions (leaving four withdrawal petitions). We can do this in

C(6,6)C(4,0) + C(4,4)C(6,2) = 1 + 15 = 16

ways. The probability this happens is then 16/210 = 0.07619048.

Part (b): We have 10! ways of arranging all ten forms. The event that the first four forms alternate in type can start in two ways: with a withdrawal petition or with a substitution request. Thus the number of ways that we can have the first four forms alternating is given by

6(4)(5)(3) + 4(6)(3)(5).

The first product 6(4)(5)(3) counts by first selecting one of the six withdrawal petitions, then one of the four substitution requests, then another withdrawal petition (five remain), and finally another substitution request (three remain). The second product is derived in a similar way but starting with a substitution request. Thus the probability we are looking for is given by

[6(4)(5)(3) + 4(6)(3)(5)] 6!/10! = 0.1428571.

Exercise 2.93

We know that when A and B are independent we have

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B).
With P(A)P(B) = 0.144 we have P(B) = 0.144/P(A), and since we are told that P(A ∪ B) = 0.626, using the above we get

0.626 = P(A) + 0.144/P(A) − 0.144.

If we write this as a quadratic equation we get

P(A)² − 0.770 P(A) + 0.144 = 0.

If we solve for P(A) we get P(A) = 0.45 and P(B) = 0.32 (enforcing P(A) > P(B)).

Exercise 2.94

Let Ci be the event that the ith relay works correctly for i = 1, 2, 3. Then from the problem statement we know that P(Ci) = 0.8 (so P(Ci′) = 0.2).

Part (a): This would be P(C1)P(C2)P(C3) = 0.8³ = 0.512.

Part (b): To have the correct output we can have no errors (all relays work correctly) or two errors (two relays work incorrectly), giving a probability of

C(3,0)(0.2)⁰(0.8)³ + C(3,2)(0.2)²(0.8)¹ = 0.512 + 0.096 = 0.608.

Part (c): Let T be the event that we transmit a one and R the event that we receive a one. We are told that P(T) = 0.7 and we want to evaluate

P(T|R) = P(R|T)P(T)/[P(R|T)P(T) + P(R|T′)P(T′)].

In Part (b) we computed P(R|T) = 0.608. In the same way we can compute P(R|T′) = 0.392; thus we get

P(T|R) = 0.608(0.7)/[0.608(0.7) + 0.392(0.3)] = 0.7835052.

Exercise 2.95

Part (a): This would be 1/5! = 0.008333333.

Part (b): This would be 1(4!)/5! = 1/5 = 0.2.

Part (c): This would be (4!)(1)/5! = 1/5 = 0.2, since we specify that F is the last person to hear the rumor and then have 4! ways of arranging the other four people.

Exercise 2.96

At each telling the person F has a probability of 1/5 of hearing the rumor and a probability of 4/5 of not hearing it. The probability that F has not heard the rumor after ten tellings is then (4/5)¹⁰ = 0.1073742.

Exercise 2.97

Let E be the event that we have the trace impurity in our sample and D the event that we detect it on a given trial. Then from the problem statement we are told that P(D|E) = 0.8, P(D′|E′) = 0.9, and P(E) = 0.4. In the experiment performed we found two detections in three trials; let V be this event. Then we have

P(E|V) = P(V|E)P(E)/[P(V|E)P(E) + P(V|E′)P(E′)].

We have that

P(V|E) = C(3,2) P(D|E)² P(D′|E) = 3(0.8²)(0.2) = 0.384
P(V|E′) = C(3,2) P(D|E′)² P(D′|E′) = 3(0.1²)(0.9) = 0.027.

Thus we compute

P(E|V) = 0.384(0.4)/[0.384(0.4) + 0.027(0.6)] = 0.9045936.

Exercise 2.98

This would be

3(1)(5²)/6³ = 0.3472222.

For the denominator note that there are 6³ ways for the three contestants to select their categories, since each contestant has six choices. For the numerator we have three choices for which contestant will select category one, and then the other two contestants each have five choices for their categories.

Exercise 2.99

Part (a): We break down the ways a fastener can pass inspection as

P(pass inspection) = P(pass initially) + P(pass after recrimping|pass initially′, recrimped)P(pass initially′, recrimped)
                  = 0.95 + 0.05(0.8)(0.6) = 0.974.

Part (b): This would be

P(passed initially|passed inspection) = 0.95/0.974 = 0.9753593.

Note this result is different than that in the back of the book. If anyone sees anything that I did wrong please contact me.

Exercise 2.100

Let D (for disease) be the event that we are a carrier of the disease and T (for test) be the event that the test comes back positive. Then we are told that P(D) = 0.01, P(T|D) = 0.9, and P(T|D′) = 0.05.

Part (a): This would be

P(T1, T2) + P(T1′, T2′) = P(T1)P(T2) + P(T1′)P(T2′).

Now

P(T1) = P(T|D)P(D) + P(T|D′)P(D′) = 0.9(0.01) + 0.05(0.99) = 0.0585.

Thus P(T1′) = 1 − 0.0585 = 0.9415, and the probability we want is given by

0.0585² + 0.9415² = 0.88984.

Part (b): This would be

P(D|T1, T2) = P(T1, T2|D)P(D)/[P(T1, T2|D)P(D) + P(T1, T2|D′)P(D′)]
           = P(T|D)²P(D)/[P(T|D)²P(D) + P(T|D′)²P(D′)] = 0.9²(0.01)/[0.9²(0.01) + 0.05²(0.99)] = 0.7659.

Exercise 2.101

Let C1 and C2 be the events that components one and two function.
Then from the problem statement we have that the probability that the second component functions is P(C2) = 0.9, the probability that both components function is P(C1 ∩ C2) = 0.75, and the probability that at least one component functions is

P((C1 ∩ C2) ∪ (C1' ∩ C2) ∪ (C1 ∩ C2')) = 0.96 .

We want to evaluate

P(C2|C1) = P(C1 ∩ C2) / P(C1) .

To do this recognize that C1 ∩ C2, C1' ∩ C2, and C1 ∩ C2' are mutually exclusive events and we can write the probability that at least one component functions as

P(C1 ∩ C2) + P(C1' ∩ C2) + P(C1 ∩ C2') = 0.96 ,

or since we know P(C1 ∩ C2) this becomes

P(C1' ∩ C2) + P(C1 ∩ C2') = 0.96 − 0.75 = 0.21 . (9)

Now we can write the event C2 as the union of two mutually exclusive events so that we get

P(C2) = 0.9 = P(C2 ∩ C1) + P(C2 ∩ C1') = 0.75 + P(C2 ∩ C1') ,

which gives P(C2 ∩ C1') = 0.15. Using this and Equation 9 we have P(C1 ∩ C2') = 0.21 − 0.15 = 0.06. Putting everything together we have

P(C1) = P(C1 ∩ C2) + P(C1 ∩ C2') = 0.75 + 0.06 = 0.81 .

Thus the conditional probability we want is then given by P(C2|C1) = 0.75/0.81 = 0.92592.

Exercise 2.102

If we draw a diagram to represent the information given then from the diagram we have

P(E1 ∩ L) = P(L|E1)P(E1) = 0.02(0.4) = 0.008 .

Exercise 2.103

Part (a): We would have (recall L is the event that a parcel is late)

P(L) = P(L|E1)P(E1) + P(L|E2)P(E2) + P(L|E3)P(E3)
= 0.02(0.4) + 0.01(0.5) + 0.05(0.1) = 0.018 .

Part (b): We can compute the requested probability as

P(E1'|L') = P(L' ∩ E1') / P(L') = P(L' ∩ (E2 ∪ E3)) / (1 − P(L)) = P((L' ∩ E2) ∪ (L' ∩ E3)) / (1 − P(L))
= [ P(L' ∩ E2) + P(L' ∩ E3) ] / (1 − P(L)) = [ P(L'|E2)P(E2) + P(L'|E3)P(E3) ] / (1 − P(L))
= [ 0.99(0.5) + 0.95(0.1) ] / (1 − 0.018) = 0.6008 .

Exercise 2.104

This is an application of Bayes' rule where we want to evaluate

P(Ai|R) = P(R|Ai)P(Ai) / Σ_{j=1}^{3} P(R|Aj)P(Aj) .
Using the numbers given in this problem we find these values to be

[1] 0.3623188 0.3478261 0.2898551

Exercise 2.105

Part (a): We would find

P(All have different birthdays) = C(365,10) 10! / 365^10 = 0.88305
P(At least two people have the same birthday) = 1 − P(all people have different birthdays) = 1 − 0.88305 = 0.1169 .

Part (b): For a general number of people k we would have

P(all k have different birthdays) = C(365,k) k! / 365^k
P(at least two people have the same birthday) = 1 − C(365,k) k! / 365^k .

We can consider different values of k and find the smallest value where the above probability is larger than one-half using the following R code

ks = 1:40
prob = 1 - ( choose(365,ks) * factorial(ks) ) / ( 365^ks )
plot(ks,prob)
grid()
which( prob > 0.5 )

Running the above we find that k = 23.

Part (c): Let E be the event that at least two people have the same birthday or at least two people have the same last three digits of their SSN. Then we have

P(E) = 1 − P(all have different birthdays) P(all have different SSNs)
= 1 − ( C(365,10) 10! / 365^10 ) ( C(1000,10) 10! / 1000^10 )
= 1 − (0.88305)(0.9558606) = 0.155927 .

Exercise 2.106

In the tables for this problem we are given the values of P(Oi|G) and P(Oi|B) for i = 1, 2, 3. To save typing I'll denote each of the possible observation ranges as Oi for i = 1, 2, 3. For example, O1 means the event that the observations were such that R1 < R2 < R3.

Part (a): In the notation of this problem we want to show that P(G|O1) > P(B|O1). Using Bayes' rule we see that this inequality is equivalent to

P(O1|G)P(G) / P(O1) > P(O1|B)P(B) / P(O1) .

We will evaluate the left-hand-side and the right-hand-side of the above and show that it is true. Note that we have

P(O1) = P(O1|G)P(G) + P(O1|B)P(B) = 0.6(0.25) + 0.1(0.75) = 0.225 ,

with this the left-hand-side is given by

0.6(0.25)/0.225 = 0.666 ,

while the right-hand-side is given by

0.1(0.75)/0.225 = 0.333 .
And we see that the requested inequality is true as we were to show. If we receive the measurement O1, we would classify the sample as Granite since (as we just showed) its posterior probability is larger.

Part (b): The first question corresponds to the observation we are calling O2. In this case we have

P(O2) = 0.25(0.25) + 0.2(0.75) = 0.2125 ,

so that

P(G|O2) = 0.25(0.25)/0.2125 = 0.294 and P(B|O2) = 0.2(0.75)/0.2125 = 0.705 .

In this case we classify as Basalt. The second question corresponds to the observation we are calling O3. We have

P(O3) = 0.15(0.25) + 0.7(0.75) = 0.5625 ,

so that

P(G|O3) = 0.15(0.25)/0.5625 = 0.066 and P(B|O3) = 0.7(0.75)/0.5625 = 0.933 .

In this case we also classify as Basalt.

Part (c): Since if we receive the observation O1 we classify the rock as Granite, an error will happen if in fact the rock is Basalt. Generalizing this for the other two types of observations, the probability of error is given by

P(O1 ∩ B) + P(O2 ∩ G) + P(O3 ∩ G) = P(O1|B)P(B) + P(O2|G)P(G) + P(O3|G)P(G)
= 0.1(0.75) + 0.25(0.25) + 0.15(0.25) = 0.175 .

Part (d): As in Part (a) we classify as Granite if P(G|O1) > P(B|O1), or

P(O1|G)P(G)/P(O1) > P(O1|B)P(B)/P(O1) ,

or, writing P(G) = p and P(B) = 1 − p,

0.6p / [ 0.6p + 0.1(1 − p) ] > 0.1(1 − p) / [ 0.6p + 0.1(1 − p) ] ,

which holds whenever 0.6p > 0.1(1 − p), that is when p > 1/7. Next we would need to set up the same inequalities for observations O2 and O3 and see what restrictions they impose on the value of p. A value of p large enough should make us always classify every rock as Granite.

Exercise 2.107

The probability we want is given by the expression

P(detected) = P(G1 ∪ G2 ∪ · · · ∪ Gn)
= 1 − P(G1' ∩ G2' ∩ · · · ∩ Gn')
= 1 − P(G1')P(G2') · · · P(Gn')
= 1 − (1 − p1)(1 − p2) · · · (1 − pn) .

Exercise 2.108

Part (a): We would need to get four balls in a row, which happens with probability 0.5^4 = 0.0625.

Part (b): This would be

C(5,2) (0.5)^2 (0.5)^3 (0.5) = 0.15625 .

Part (c): This would be

C(3,0) (0.5)^0 (0.5)^3 (0.5) + C(4,1) (0.5)^1 (0.5)^3 (0.5) + C(5,2) (0.5)^2 (0.5)^3 (0.5) = 0.0625 + 0.125 + 0.15625 = 0.34375 .

Where in the above the first term corresponds to four total pitches (all balls), the second term corresponds to five total pitches one of which is a strike and the last of which is a ball, and the third term corresponds to six total pitches two of which are strikes and the last of which is a ball. Given this value, the probability of a strikeout is then given by 1 − 0.34375 = 0.65625.

Part (d): To have the first batter score a run with each batter not swinging, the pitcher must walk four batters before he gets three strikeouts. Let pw = 0.34375 and ps = 0.65625 be the probabilities that the pitcher walks or strikes out a given batter (which we computed in Part (c)). Then the probability we want is given by

C(3,0) ps^0 pw^3 pw + C(4,1) ps^1 pw^3 pw + C(5,2) ps^2 pw^3 pw = pw^4 (1 + 4 ps + 10 ps^2) = 0.1107 .

Exercise 2.109

Part (a): This is 1/4! = 0.0416.

Part (b): It is not possible to have exactly three engineers (but not four) in correct rooms. The probability we have exactly two engineers in correct rooms is C(4,2)/4! = 6/24 = 0.25. The probability we have exactly one engineer in a correct room is given by

C(4,1) [ 3! − 1 − 0 − C(3,1) ] / 4! = 4(6 − 1 − 0 − 3)/24 = 8/24 = 0.333 .

In the bracket above, from the 3! arrangements of the remaining three engineers we subtract 1 (the number of ways to have all three of them in correct rooms), then 0 (the number of ways to have exactly two of them in correct rooms) and finally C(3,1) (the number of ways to have exactly one of them in a correct room). This then gives

1 − 1/24 − 0.25 − 0.333 = 0.375 ,

for the probability we seek.

Exercise 2.110

Part (a): By independence we have P(A ∩ B ∩ C) = P(A)P(B)P(C) = 0.6(0.5)(0.4) = 0.12, so 1 − P(A ∩ B ∩ C) = 1 − 0.12 = 0.88.

Part (b): We have

P(A ∩ B' ∩ C') = P(A)P(B')P(C') = 0.6(1 − 0.5)(1 − 0.4) = 0.18 .

What we need to evaluate is

P(A ∩ B' ∩ C') + P(A' ∩ B ∩ C') + P(A' ∩ B' ∩ C) ,

or

0.6(1 − 0.5)(1 − 0.4) + (1 − 0.6)(0.5)(1 − 0.4) + (1 − 0.6)(1 − 0.5)(0.4) = 0.18 + 0.12 + 0.08 = 0.38 .
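The counts behind Exercise 2.109 (9, 6, 8, and 1 arrangements with zero, two, one, and four engineers in correct rooms) can be checked by brute force. A minimal Python sketch, enumerating all 4! room assignments:

```python
from itertools import permutations
from collections import Counter

# Exercise 2.109: four engineers are assigned to four rooms at random.
# For each of the 24 assignments, count how many engineers land in their
# own room, then tally how often each count occurs.
counts = Counter(sum(room == eng for eng, room in enumerate(perm))
                 for perm in permutations(range(4)))
probs = {m: counts[m] / 24 for m in sorted(counts)}
print(counts)  # 9 ways for zero matches, 8 for one, 6 for two, 1 for four
print(probs)   # matches 0.375, 0.333, 0.25 and 1/24 from above
```

Note that a count of exactly three matches never occurs, in agreement with the observation made in Part (b).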
Exercise 2.111

See the python code ex2 111.py where the given strategy is implemented for any value of n, the number of people we interview. For example, when we run that code with n = 10 we get the following output

s= 0; prob_best_hire= 362880/3628800 = 0.100000
s= 1; prob_best_hire= 1026576/3628800 = 0.282897
s= 2; prob_best_hire= 1327392/3628800 = 0.365794
s= 3; prob_best_hire= 1446768/3628800 = 0.398690
s= 4; prob_best_hire= 1445184/3628800 = 0.398254
s= 5; prob_best_hire= 1352880/3628800 = 0.372817
s= 6; prob_best_hire= 1188000/3628800 = 0.327381
s= 7; prob_best_hire= 962640/3628800 = 0.265278
s= 8; prob_best_hire= 685440/3628800 = 0.188889
s= 9; prob_best_hire= 362880/3628800 = 0.100000

where we see that in this case if we take s = 3 we maximize the probability that we hire the best person. If n = 4 then s = 1 maximizes the probability we hire the best worker.

Exercise 2.112

The probability of at least one event is given by

P(at least one event happens) = 1 − P(no events happen)
= 1 − P(A1' ∩ A2' ∩ A3' ∩ A4')
= 1 − P(A1')P(A2')P(A3')P(A4')
= 1 − (1 − p1)(1 − p2)(1 − p3)(1 − p4) .

For the probability that at least two events happen, first note that

P(only one event happens) = P(A1 ∩ A2' ∩ A3' ∩ A4') + P(A1' ∩ A2 ∩ A3' ∩ A4') + P(A1' ∩ A2' ∩ A3 ∩ A4') + P(A1' ∩ A2' ∩ A3' ∩ A4)
= p1(1 − p2)(1 − p3)(1 − p4) + (1 − p1)p2(1 − p3)(1 − p4) + (1 − p1)(1 − p2)p3(1 − p4) + (1 − p1)(1 − p2)(1 − p3)p4 .

Using these we have that

P(at least two events happen) = 1 − P(no events happen) − P(only one event happens) .

Exercise 2.113

We have

P(A1 ∩ A2) = P(win prize 1 and win prize 2) = 1/4 ,

since in this case we have to draw the fourth slip of paper for this event to happen. Now P(A1) = 2/4 = 1/2 and P(A2) = 2/4 = 1/2 and so we have P(A1 ∩ A2) = P(A1)P(A2). In the same way we have

P(A1 ∩ A3) = P(win prize 1 and win prize 3) = 1/4 ,

since in this case we have to draw the fourth slip of paper for this event to happen.
Now P(A3) = 2/4 = 1/2 and so we have P(A1 ∩ A3) = P(A1)P(A3). Now for A2 ∩ A3 we have the same conclusion. For P(A1 ∩ A2 ∩ A3) we have

P(A1 ∩ A2 ∩ A3) = P(win prizes 1, 2 and 3) = 1/4 ≠ P(A1)P(A2)P(A3) = 1/8 ,

as we were to show.

Exercise 2.114

Using the definition of conditional probability we have

P(A1|A2 ∩ A3) = P(A1 ∩ A2 ∩ A3) / P(A2 ∩ A3) = P(A1)P(A2)P(A3) / [ P(A2)P(A3) ] = P(A1) .

Discrete Random Variables and Probability Distributions

Problem Solutions

Exercise 3.1

We would have the following outcomes

SSS, SSF, SFS, FSS, FFS, FSF, SFF, FFF

with the following values for the random variable X

3, 2, 2, 2, 1, 1, 1, 0

Exercise 3.2

The event of having a child of a specified sex (like a boy or a girl) can be viewed as a "success" and having a child of the opposite sex viewed as a "failure". Catching a train (or not) can be viewed as an experiment where the outcome is either a "success" or a "failure". Making a passing grade in a statistics class can be viewed as an experiment where the outcome is either a "success" or a "failure".

Exercise 3.3

The minimum number of cars at the two pumps or the product of the number of cars at the two pumps.

Exercise 3.4

A zip code is a five digit number. Assuming that all possible five digit numbers are possible zip codes, the number of nonzero digits in a zip code could be 0, 1, 2, 3, 4, 5. Having zero nonzero digits means that we have the zip code 00000, which is probably not realistic, thus X = 0 is not an allowable outcome. There may be other restrictions on the numerical form that a valid zip code can take.

Exercise 3.5

No. The mapping can take more than one sample in the sample space to the same numerical value.

Exercise 3.6

X would take the values 1, 2, 3, . . . . A few examples of experimental outcomes might be L, RL, RSSL and their X values are 1, 2, 4.

Exercise 3.7

Part (a): The variable is discrete and ranges over 0 ≤ X ≤ 12.
Part (b): The variable is discrete and ranges from zero to the number of students in the class.

Part (c): The variable is discrete and ranges from one to +∞ (where the golfer never hits the ball).

Part (d): The variable is continuous and ranges from zero to the largest known length of a rattlesnake.

Part (e): The variable is discrete and ranges from zero (for no books being sold) to the largest known amount for sales (which is 10000c where c is the royalty per book), in increments of c.

Part (f): The variable is continuous and ranges over the range of the pH scale.

Part (g): The variable is continuous and ranges from the smallest to the largest possible tension.

Part (h): The variable is discrete and ranges over the values three to ∞ (if a match is never obtained).

Exercise 3.8

We can get Y = 3 for the outcome SSS. We can get Y = 4 for the outcome FSSS. We can get Y = 5 for the outcomes FFSSS, SFSSS. We can get Y = 6 for the outcomes SSFSSS, FSFSSS, SFFSSS, FFFSSS. We can get Y = 7 for the outcomes FFFFSSS, FFSFSSS, FSFFSSS, SFFFSSS, SSFFSSS, SFSFSSS, FSSFSSS.

Exercise 3.9

Part (a): X would take the values 2, 4, 6, . . . .

Part (b): X would take the values 2, 3, 4, 5, 6, . . . .

Exercise 3.10

Part (a): T would have a range of 0, 1, . . . , 9, 10.

Part (b): X would have a range of −3, −2, . . . , 4, 5.

Part (c): U would have a range of 0, 1, 2, . . . , 5, 6.

Part (d): Z would have a range of 0, 1, 2.

Exercise 3.11

Part (a): This would be P(X = 4) = 0.45, P(X = 6) = 0.4 and P(X = 8) = 0.15.

Part (c): This would be P(X ≥ 6) = 0.4 + 0.15 = 0.55 and P(X > 6) = P(X = 8) = 0.15.

Exercise 3.12

Part (a): This would be P(Y ≤ 50) = 0.05 + 0.1 + 0.12 + 0.14 + 0.25 + 0.17 = 0.83.

Part (b): This would be P(Y > 50) = 1 − P(Y ≤ 50) = 0.17.

Part (c): If we are the first on the standby list then there must be at least one seat available or P(Y ≤ 49) = 0.66.
If we are the third person on the standby list then there must be at least three seats available or P(Y ≤ 50 − 3) = P(Y ≤ 47) = 0.27.

Exercise 3.13 (phone lines in use)

Part (a): This would be P(X ≤ 3) = 0.1 + 0.15 + 0.2 + 0.25 = 0.7.

Part (b): This would be P(X < 3) = 0.1 + 0.15 + 0.2 = 0.45.

Part (c): This would be P(X ≥ 3) = 0.25 + 0.2 + 0.06 + 0.04 = 0.55.

Part (d): This would be P(2 ≤ X ≤ 5) = 0.2 + 0.25 + 0.2 + 0.06 = 0.71.

Part (e): This would be 1 − P(2 ≤ X ≤ 4) = 1 − (0.2 + 0.25 + 0.2) = 0.35.

Part (f): This would be P(X = 0) + P(X = 1) + P(X = 2) = 0.1 + 0.15 + 0.2 = 0.45.

Exercise 3.14

Part (a): We must have k such that Σ_{y=1}^{5} k y = 1. Solving for k we get k = 1/15.

Part (b): This would be P(Y ≤ 3) = k(1 + 2 + 3) = (1/15)(6) = 2/5.

Part (c): This would be P(2 ≤ Y ≤ 4) = (1/15)(2 + 3 + 4) = 3/5.

Part (d): We would need to check that Σ p(y) = 1. If we put in the suggested form for p(y) we find Σ_{y=1}^{5} y^2/50 = 1.1 ≠ 1 so no.

Exercise 3.15

Part (a): These would be the selections

(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5) .

Part (b): From the above sample set we have that

P(X = 0) = 3/10
P(X = 1) = 6/10 = 3/5
P(X = 2) = 1/10 .

Part (c): We would have

F(0) = 3/10
F(1) = 9/10
F(2) = 1 .

Exercise 3.16

Part (a): We could tabulate the possible sequences of S and F and count, or recognize that X is a binomial random variable and thus

P(X = x) = C(4,x) 0.3^x 0.7^(4−x) for 0 ≤ x ≤ 4 .

Part (c): Evaluating the above and looking for the largest value of P(X = x) we see that it is when x = 1.

Part (d): This would be P(X ≥ 2) = 0.3483.

Exercise 3.17

Part (a): This would be p(2) = 0.9^2 = 0.81.

Part (b): This would be p(3) = 2(0.1)(0.9)^2 = 0.162.

Part (c): If Y = 5 we must have the fifth battery be acceptable. Thus in the previous four we need to have one other acceptable battery. These would be the events

UUUAA, UUAUA, UAUUA, AUUUA .

Thus p(5) = 4(0.1)^3(0.9)^2 = 0.00324.
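The value p(5) just computed can also be verified by brute-force enumeration. A small sketch, assuming (as in the exercise) that a battery is acceptable ("A") with probability 0.9 and unacceptable ("U") with probability 0.1:

```python
from itertools import product

# Exercise 3.17: Y = number of batteries tested until the second
# acceptable one is found.  Y = 5 means the fifth battery is acceptable
# and exactly one of the first four is acceptable.
p5 = 0.0
for seq in product("AU", repeat=5):
    if seq[4] == "A" and seq[:4].count("A") == 1:
        pr = 1.0
        for s in seq:
            pr *= 0.9 if s == "A" else 0.1
        p5 += pr
print(round(p5, 5))   # 0.00324
```

The loop finds exactly the four sequences UUUAA, UUAUA, UAUUA, AUUUA listed above.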
Part (d): p(y) = (y − 1)(0.1)^(y−2)(0.9)^2.

Exercise 3.18

Part (a): To solve this part we can make a matrix where the outcome of the first die corresponds to a row, the value of the second die corresponds to a column, and the matrix holds the value of the maximum of these two elements. We can then count the number of times each possible maximum value occurs. When we do that we get

p(1) = 1/36
p(2) = 3/36
p(3) = 5/36
p(4) = 7/36
p(5) = 9/36
p(6) = 11/36 .

Part (b): This would be

F(1) = 1/36
F(2) = 4/36
F(3) = 9/36
F(4) = 16/36
F(5) = 25/36
F(6) = 1 .

Exercise 3.19

We can construct a matrix where the rows represent the day (Wednesday, Thursday, Friday, or Saturday) when the first magazine arrives and the columns represent the day when the second magazine arrives. Then from the given probabilities on each day and assuming a product model for the joint events we have

P(Y = 0) = 0.09
P(Y = 1) = 0.12 + 0.16 + 0.12 = 0.4
P(Y = 2) = 0.06 + 0.08 + 0.04 + 0.08 + 0.06 = 0.32
P(Y = 3) = 0.03 + 0.04 + 0.02 + 0.03 + 0.04 + 0.02 + 0.01 = 0.19 .

Note that Σ_{y=0}^{3} P(Y = y) = 1 as it must.

Exercise 3.20

Part (a): Following the hint, we will label the couples as #1, #2, and #3 and the two individuals as #4 and #5.

• To have no one arrive late X = 0 will happen with probability 0.6^5.

• To have only one person arrive late X = 1 will happen if either #4 or #5 arrives late and thus with a probability 2(0.4)(0.6)^4.

• To have two people arrive late X = 2 will happen if one of #1, #2, or #3 arrives late or both #4 and #5 arrive late. This combined event happens with a probability 3(0.4)(0.6)^4 + (0.4)^2(0.6)^3.

• To have three people arrive late X = 3 will happen if one of #1, #2, or #3 arrives late and one of #4 or #5 arrives late. This will happen with a probability of 6(0.4)^2(0.6)^3.

• To have four people arrive late X = 4 will happen if two of #1, #2, or #3 are late with #4 and #5 on time, or one of #1, #2, or #3 is late with both of #4 and #5 late.
This happens with a probability of

C(3,2) (0.4)^2 (0.6)^3 + C(3,1) (0.4)^3 (0.6)^2 .

• To have five people arrive late X = 5 will happen if two of #1, #2, or #3 are late with one of #4 and #5 also late. This will happen with a probability 2 C(3,2) (0.4)^3 (0.6)^2.

• To have six people arrive late will happen if all of #1, #2, and #3 are late with #4 and #5 on time, or two of #1, #2, and #3 are late with both of #4 and #5 late. This will happen with probability

(0.4)^3 (0.6)^2 + C(3,2) (0.4)^2 (0.6) (0.4)^2 .

• To have seven people arrive late will happen if all of #1, #2, and #3 are late and one of #4 and #5 is late. This will happen with probability 2 (0.4)^3 (0.6) (0.4).

• To have eight people arrive late will happen with probability of 0.4^5.

One can check that Σ_{x=0}^{8} P(X = x) = 1 as it should. As an R array these probabilities are

[1] 0.07776 0.10368 0.19008 0.20736 0.17280 0.13824 0.06912 0.03072 0.01024

Exercise 3.21

Part (a): Using R we could compute p(x) as

xs = 1:9
ps = log( 1 + 1/xs, base=10 )
cdf = cumsum(ps)

which gives

[1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 0.06694679 0.05799195
[8] 0.05115252 0.04575749

These numbers are to be compared with 1/9 = 0.1111111. Notice that starting with a one has a much higher probability than 1/9.

Part (b): The cumulative distribution function is given by

[1] 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980 0.9030900
[8] 0.9542425 1.0000000

Part (c): This would be F(3) = 0.6020600 and P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − F(4) = 1 − 0.6989700 = 0.30103.

Exercise 3.23

Part (a): p(2) = F(2) − F(1) = 0.39 − 0.19 = 0.2.

Part (b): P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − 0.39 = 0.61.

Part (c): P(2 ≤ X ≤ 5) = F(5) − F(1) = 0.97 − 0.19 = 0.78.

Part (d): P(2 < X < 5) = P(3 ≤ X ≤ 4) = F(4) − F(2) = 0.92 − 0.39 = 0.53.

Exercise 3.24

Part (a): This would be

p(x) = 0.3 for x = 1
p(x) = 0.1 for x = 3
p(x) = 0.05 for x = 4
p(x) = 0.15 for x = 6
p(x) = 0.4 for x = 12 .

Part (b): These would be

P(3 ≤ X ≤ 6) = F(6) − F(2) = 0.6 − 0.3 = 0.3
P(4 ≤ X) = 1 − P(X ≤ 3) = 1 − 0.4 = 0.6 .
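The Benford probabilities of Exercise 3.21 are easy to reproduce outside of R. A minimal Python sketch of the same computations:

```python
import math

# Exercise 3.21: Benford's law, p(x) = log10(1 + 1/x) for x = 1, ..., 9.
ps = [math.log10(1 + 1 / x) for x in range(1, 10)]
cdf = [sum(ps[: k + 1]) for k in range(9)]

print(round(ps[0], 5))       # 0.30103, much larger than 1/9 = 0.1111
print(round(cdf[2], 5))      # F(3) = 0.60206
print(round(1 - cdf[3], 5))  # P(X >= 5) = 0.30103
```

A nice closed-form check: the partial sums telescope, so F(x) = log10(x + 1); for example F(3) = log10(4).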
Exercise 3.25

We would have

P(Y = 0) = p
P(Y = 1) = (1 − p)p
P(Y = 2) = (1 − p)^2 p
...
P(Y = y) = (1 − p)^y p .

Exercise 3.26

Part (a): Alvie will visit at least one friend since he moves from the center to either A, B, C, or D on the first step. There he might go back to the center (visiting only one friend) or he might go to another friend. Logic like this gives rise to the following probability distribution

P(X = 0) = 0
P(X = 1) = 1/3
P(X = 2) = (2/3)(1/3)
P(X = 3) = (2/3)^2 (1/3)
P(X = 4) = (2/3)^3 (1/3)
...
P(X = x) = (2/3)^(x−1) (1/3) .

Part (b): Alvie will have to cross at least two segments (visiting a friend and then coming back home). He can cross three total segments if after visiting the first friend he visits one more and then goes home. Logic like this gives rise to the following probability distribution

P(Y = 1) = 0
P(Y = 2) = 1/3
P(Y = 3) = (2/3)(1/3)
P(Y = 4) = (2/3)^2 (1/3)
P(Y = 5) = (2/3)^3 (1/3)
...
P(Y = y) = (2/3)^(y−2) (1/3) .

Exercise 3.27 (the matching distribution)

Part (a-b): See the python code ex3 27.py where we implement this. When we run that code we get the following output

counts of number of different matches= Counter({0: 9, 1: 8, 2: 6, 4: 1})
probability of different n_matches values= {0: 0.375, 1: 0.3333333333333333, 2: 0.25, 4: 0.041666666666666664}

This indicates that there is a probability of 0.375 that there will be no matches (there were 9 permutations of the numbers 1-4 with no matches), a probability of 0.33333 that there will be exactly one match (there were 8 permutations of the numbers 1-4 with one match), etc.

Exercise 3.28

From the definition of the cumulative distribution function we have

F(x2) = Σ_{y: y ≤ x2} p(y) = Σ_{y: y ≤ x1} p(y) + Σ_{y: x1 < y ≤ x2} p(y) .

As p(y) ≥ 0 for all y, the second sum in the above expression is nonnegative and the first sum is the definition of F(x1). Thus we have shown that F(x2) ≥ F(x1). We will have F(x1) = F(x2) if this second sum is zero, that is if Σ_{y: x1 < y ≤ x2} p(y) = 0.
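The distribution for X in Exercise 3.26 is geometric with success probability 1/3, so it should sum to one, and (although the exercise does not ask for it) its mean should be 1/(1/3) = 3. A quick numerical sanity check, truncating the infinite sum:

```python
# Exercise 3.26 (a): P(X = x) = (2/3)**(x - 1) * (1/3) for x = 1, 2, 3, ...
def p(x):
    return (2 / 3) ** (x - 1) * (1 / 3)

# Truncate at x = 199; the neglected tail is astronomically small.
total = sum(p(x) for x in range(1, 200))
mean = sum(x * p(x) for x in range(1, 200))
print(round(total, 6), round(mean, 6))   # 1.0 3.0
```

The same check applies to Part (b) after shifting the index by one (Y = X + 1, so E(Y) = 4).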
Exercise 3.29

Part (a): From the numbers given we get E(X) = 2.06.

Part (b): From the numbers given we get Var(X) = 0.9364.

Part (c): This would be the square root of the above or 0.9676776.

Part (d): This gives the same answer as in Part (b) as it must. We computed these using the R code

xs = 0:4
ps = c( 0.08, 0.15, 0.45, 0.27, 0.05 )
ex = sum( ps * xs )
v_x = sum( ps * ( xs - ex )^2 )
ex2 = sum( ps * xs^2 )
v_x_2 = ex2 - ex^2
c( ex, v_x, sqrt(v_x), v_x_2 )

Exercise 3.30

Part (a): From the numbers given we get E(Y) = 0.6.

Part (b): From the numbers given we get E(100 Y^2) = 110.

Exercise 3.31

From Exercise 12 on Page 60 we compute Var(Y) = 4.4944 and σY = 2.12. Then the probability that Y is within one standard deviation of the mean is 0.65. These were calculated with the following R code

ys = 45:55
ps = c( 0.05, 0.1, 0.12, 0.14, 0.25, 0.17, 0.06, 0.05, 0.03, 0.02, 0.01 )
ey = sum( ys * ps )
ey2 = sum( ys^2 * ps )
var_y = ey2 - ey^2
sqrt( var_y )
inds = abs( ys - ey ) < sqrt( var_y )
sum( ps[inds] )

Note that this answer is different than the one in the back of the book. If anyone sees anything wrong with what I have done please contact me.

Exercise 3.32

Part (a): For the given numbers we have E(X) = 16.38, E(X^2) = 272.298 and Var(X) = 3.9936.

Part (b): This would be 25 E(X) − 8.5 = 401.

Part (c): This would be 25^2 Var(X) = 2496.

Part (d): This would be E(X) − 0.01 E(X^2) = 13.65702.

Exercise 3.33

Part (a): E(X^2) = 0^2 (1 − p) + 1^2 p = p.

Part (b): Var(X) = E(X^2) − E(X)^2 = p − p^2 = p(1 − p).

Part (c): E(X^79) = 0^79 (1 − p) + 1^79 p = p.

Exercise 3.34

To have E(X) finite we would need to be able to evaluate the following sum

E(X) = Σ_{x=1}^{∞} x (c/x^3) = c Σ_{x=1}^{∞} 1/x^2 .

This sum converges and thus E(X) is finite.

Exercise 3.35

Let R3 be the revenue if we order three copies. Then we have

R3 = −3 + 2 min(X, 3) .

Taking the expectation of this we get E[R3] = 2.466667. The same type of calculation if we order four copies gives E[R4] = 2.666667.
Thus we should order four if we want the largest expected revenue.

Exercise 3.36

The policy profit in terms of the claims X is given by

policy profit = policy cost − max(X − 500, 0) .

If the company wants to have an expected profit of 100, then taking the expectation of the above gives

100 = policy cost − E[max(X − 500, 0)] .

From the numbers given, using the R code

xs = c(0, 1000, 5000, 10000)
ps = c( 0.8, 0.1, 0.08, 0.02 )
expectation = sum( ps * pmax( xs - 500, 0 ) )

we find the expectation to be 600. Thus solving for the policy cost we get that it should be 700.

Exercise 3.37

Using the summation formulas given we have

E(X) = (1/n) Σ_{x=1}^{n} x = (1/n) ( n(n + 1)/2 ) = (n + 1)/2
E(X^2) = (1/n) Σ_{x=1}^{n} x^2 = (1/n) ( n(n + 1)(2n + 1)/6 ) = (n + 1)(2n + 1)/6
Var(X) = E(X^2) − E(X)^2 = (n^2 − 1)/12 ,

when we use the two expressions above and simplify.

Exercise 3.38

We want to know if

E(1/X) > 1/3.5 .

If it is then we should gamble and if not we should take the fixed amount. We find this expectation given by

(1/6) Σ_{x=1}^{6} 1/x = 0.4083333 > 1/3.5 = 0.2857143 ,

thus one should gamble.

Exercise 3.39

Using the numbers given in the book we compute E(X) = 2.3 and Var(X) = 0.81. The number of pounds left after selling X lots is 100 − 5X. Then the expected number of pounds left is 88.5 with a variance of 20.25.

Exercise 3.40

Part (a): A plot of the pmf of −X would be the same as a plot of the pmf of X but reflected about the X = 0 axis. Since the spread of these two distributions is the same we would conclude that Var(X) = Var(−X).

Part (b): Let a = −1 and b = 0 to get

Var(−X) = (−1)^2 Var(X) = Var(X) .

Exercise 3.41

Expression 3.13 from the book is

Var(h(X)) = Σ_{x∈D} { h(x) − E[h(X)] }^2 p(x) .

Following the hint let h(X) = aX + b. Then E[h(X)] = a E[X] + b = aµ + b and

Var(h(X)) = Σ_D ( a(x − µ) )^2 p(x) = a^2 Σ_D (x − µ)^2 p(x) = a^2 σX^2 . (10)

Exercise 3.42

Part (a): Following the hint we have E(X(X − 1)) = E(X^2) − 5 so E(X^2) = 27.5 + 5 = 32.5.

Part (b): Var(X) = 32.5 − 25 = 7.5.
Part (c): We have E(X^2) = E(X(X − 1)) + E(X) so

Var(X) = E(X^2) − E(X)^2 = E(X(X − 1)) + E(X) − E(X)^2 .

Exercise 3.43

We have

E(X − c) = E(X) − c = µ − c .

If c = µ then E(X − c) = 0.

Exercise 3.44

Part (a): For these values of k we find upper bounds of 1/k^2 given by

[1] 0.2500000 0.1111111 0.0625000 0.0400000 0.0100000

Part (b): Using Exercise 13 on Page 60 we compute µ = 2.64 and σ = 1.53961 and then P(|X − µ| ≥ kσ) for the values of k suggested to get

[1] "k= 2, P(|X-mu|>=k sigma)= 0.040000"
[1] "k= 3, P(|X-mu|>=k sigma)= 0.000000"
[1] "k= 4, P(|X-mu|>=k sigma)= 0.000000"
[1] "k= 5, P(|X-mu|>=k sigma)= 0.000000"
[1] "k= 10, P(|X-mu|>=k sigma)= 0.000000"

These suggest that the upper bound of 1/k^2 is relatively loose.

Part (c): For this given distribution (p(−1) = 1/18, p(0) = 8/9, p(+1) = 1/18) we find

µ = (1/18)(−1) + (8/9)(0) + (1/18)(+1) = 0
E(X^2) = (1/18)(+1) + (8/9)(0) + (1/18)(+1) = 1/9
Var(X) = 1/9 so σ = 1/3 .

Then we find

P(|X − µ| ≥ 3σ) = P(|X| ≥ 1) = 2/18 = 1/9 ≤ 1/9 .

This shows that the upper bound in Chebyshev's inequality can sometimes be achieved.

Part (d): To do this we let p(x) be given by

p(x) = 1/50 for x = −1
p(x) = 24/25 for x = 0
p(x) = 1/50 for x = +1 .

Then with this density we have E(X) = 0 and E(X^2) = (1/50)(2) = 1/25, thus σ = 1/5. With these we have

P(|X − µ| ≥ 5σ) = P(|X| ≥ 1) = 1/25 = 0.04 ,

as we were to show.

Exercise 3.45

We have

E(X) = Σ_{x∈D} x p(x) ≤ Σ_{x∈D} b p(x) = b ,

since x ≤ b for all x ∈ D. In the same way, since x ≥ a for all x ∈ D, we have

Σ_{x∈D} a p(x) ≤ Σ_{x∈D} x p(x) so a ≤ E(X) .

Exercise 3.46

We will use R notation to evaluate these. We would compute

c( dbinom(3,8,0.35), dbinom(5,8,0.6), sum(dbinom(3:5,7,0.6)), sum(dbinom(0:1,9,0.1)) )
[1] 0.2785858 0.2786918 0.7451136 0.7748410

Exercise 3.47

We will use R notation to evaluate these.
For Parts (a)-(d) we would compute

c( pbinom(4,15,0.3), dbinom(4,15,0.3), dbinom(6,15,0.7), pbinom(4,15,0.3) - pbinom(1,15,0.3) )
[1] 0.51549106 0.21862313 0.01159000 0.48022346

For Parts (e)-(g) we would compute

c( 1-pbinom(1,15,0.3), pbinom(1,15,0.7), pbinom(5,15,0.3) - pbinom(2,15,0.3) )
[1] 9.647324e-01 5.165607e-07 5.947937e-01

Exercise 3.48

Part (a): P(X ≤ 2) is given by pbinom(2,25,0.05) = 0.8728935.

Part (b): P(X ≥ 5) = 1 − P(X ≤ 4) is given by 1 - pbinom(4,25,0.05) = 0.007164948.

Part (c): We would compute this using

P(1 ≤ X ≤ 4) = P(X ≤ 4) − P(X ≤ 0) .

This is given in R by pbinom(4,25,0.05) - pbinom(0,25,0.05) = 0.7154455.

Part (d): This would be given by pbinom(0,25,0.05) = 0.2773896.

Part (e): We have E(X) = np = 25(0.05) = 1.25 and Var(X) = np(1 − p) = 1.1875 so SD(X) = 1.089725.

Exercise 3.49

Using R to evaluate these probabilities we have

Part (a): This is given by

dbinom(1,6,0.1)
[1] 0.354294

Part (b): We want to evaluate P(X ≥ 2) = 1 − P(X ≤ 1). In R this is

1 - pbinom(1,6,0.1)
[1] 0.114265

Part (c): Let X be the number of goblets examined to find four that are in fact "good" (i.e. not defective). Then we want to compute P(X ≤ 5). We can have the event X ≤ 5 happen if the first four goblets are "good" or there is one defective goblet in the first four but the fifth is "good". These two mutually exclusive events have probabilities

P(X = 4) = 0.9^4 = 0.6561
P(X = 5) = P(one defective goblet in first four examined) P(fifth goblet is good) = C(4,1) (0.1)(0.9^3) (0.9) = 0.26244 .

Taking the sum of these two numbers we have P(X ≤ 5) = 0.91854.

Exercise 3.50

Let X be the number of fax messages received. Then using R to evaluate these probabilities we have the following

Part (a): For P(X ≤ 6) we use pbinom(6,25,0.25) = 0.5610981.

Part (b): For P(X = 6) we use dbinom(6,25,0.25) = 0.1828195.

Part (c): For P(X ≥ 6) = 1 − P(X ≤ 5) we use 1 - pbinom(5,25,0.25) = 0.6217215.

Part (d): For P(X > 6) = P(X ≥ 7) = 1 − P(X ≤ 6) we use 1 - pbinom(6,25,0.25) = 0.4389019.
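The R calls dbinom and pbinom used throughout these exercises are just the binomial pmf and cdf. For readers without R, a minimal Python equivalent (the function names are chosen here to mirror the R ones):

```python
from math import comb

# Python stand-ins for R's dbinom (binomial pmf) and pbinom (binomial cdf).
def dbinom(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def pbinom(x, n, p):
    return sum(dbinom(k, n, p) for k in range(x + 1))

# Reproduce two of the Exercise 3.48 values:
print(pbinom(2, 25, 0.05))   # compare with pbinom(2,25,0.05) = 0.8728935
print(dbinom(0, 25, 0.05))   # compare with pbinom(0,25,0.05) = 0.2773896
```

For large n this direct summation is fine; a production implementation would use a regularized incomplete beta function instead, as R does internally.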
Exercise 3.51

Again let X be the number of fax messages received. Then we have

Part (a): E(X) = np = 25(0.25) = 6.25.

Part (b): Var(X) = np(1 − p) = 25(0.25)(0.75) = 4.6875, thus SD(X) = sqrt(Var(X)) = 2.165064.

Part (c): For this part we want to evaluate P(X ≥ 6.25 + 2(2.165064)) = P(X ≥ 10.58013). We can evaluate this using

sum( dbinom( 11:25, 25, 0.25 ) )
[1] 0.02966991

Exercise 3.52

Let X be the number of students that buy a new textbook.

Part (a): E(X) = np = 25(0.3) = 7.5 and SD(X) = sqrt(np(1 − p)) = sqrt(25(0.3)(0.7)) = 2.291288.

Part (b): Using the above we have E(X) + 2 SD(X) = 12.08258 so that the probability we
x=0 Exercise 3.53 Part (a): Let X be the number of individuals with no traffic citations (in three years). We would like to compute P (X ≥ 10). From Exercise 30 Section 3.3 the probability a given individual haves no citations (in last three years) is 0.6. The probability we then want is given by (in R notation) P (X ≥ 10) = 1 − pbinom(9, 15, 0.6) = 0.4032156 . 75 Part (b): Let Y = 15 − X then Y is the random variable representing the number with at least one citation. We want to evaluate P (Y ≤ 7) = pbinom(7, 15, 0.4) = 0.7868968 . Part (c): For this we want P (5 ≤ Y ≤ 10) = pbinom(10, 15, 0.4) − pbinom(4, 15, 0.4) = 0.7733746 . Exercise 3.54 Let X be the random variable representing the number of customers who want/prefer the oversized version of the racket. Part (a): For this part we want to P (X ≥ 6) = 1 − P (X ≤ 5) = 0.6331033 Part (b): We have µX = np = 10(0.6) = 6 p √ σX = npq = 10(0.6)(0.4) = 1.549193 . Using these numbers we have that µX − σX = 4.450807 and µx + σX = 7.549193 so that we next want to evaluate P (5 ≤ X ≤ 7). Using R we find this to be equal to 0.6664716. Part (c): Let Y be the number of customers that get the midsized version of the racket. In terms of X we know that Y = 10 − X. We then want to know for which values of X do we have 0 ≤ X ≤ 7 and 0 ≤ Y ≤ 7 . Using the fact that Y = 10 − X we get that the second expression is equivalent to 3 ≤ X ≤ 10 . Intersecting this with the requirement 0 ≤ X ≤ 7 we want to compute P (3 ≤ X ≤ 7) = pbinom(7, 10, 0.6) − pbinom(2, 10, 0.6) = 0.8204157 . Exercise 3.55 Our phone must first be submitted for service which happens with a probability of 20% and then once submitted there is a 40% chance that it will be replaced. Thus the initial probability that a phone is replaces is then 0.2(0.4) = 0.08. The probability that two from ten will be replaced is given by (in R) as dbinom(2,10,0.08) = 0.147807. 
Exercise 3.56

Let X be the random variable representing the number of students (from 25) that received special accommodation.

Part (a): This would be P(X = 1) = dbinom(1, 25, 0.02) = 0.3078902.

Part (b): This would be P(X ≥ 1) = 1 − P(X = 0) = 1 − dbinom(0, 25, 0.02) = 0.3965353.

Part (c): This would be P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − pbinom(1, 25, 0.02) = 0.0886451.

Part (d): We have µX = np = 25(0.02) = 0.5 and σX = sqrt(npq) = 0.7. With these we have that µX − 2σX = −0.9 and µX + 2σX = 1.9. Thus the probability we want to evaluate is given by P(0 ≤ X ≤ 1) = 0.9113549.

Part (e): As before X is the number of students that are allowed special accommodations; let Y be the number of students not allowed special accommodations, so Y = 25 − X. The total exam time T (in hours) is given by

T = 3Y + 4.5X = 3(25 − X) + 4.5X = 75 + 1.5X .

Thus the expectation of T is given by

E(T) = 75 + 1.5 E(X) = 75 + 1.5(25)(0.02) = 75.75 hours.

Exercise 3.57

Both batteries will work with probability 0.9^2 = 0.81. The probability a flashlight works is then 0.81. If we let X be the random variable specifying the number of flashlights that work from n = 10, then X is a binomial random variable with n = 10 and p = 0.81. Thus we conclude that

P(X ≥ 9) = 1 − P(X ≤ 8) = 1 − pbinom(8, 10, 0.81) = 0.4067565 .

Exercise 3.58

Let X denote the number of defective components in our batch of size n = 10.

Part (a): If the actual proportion of defectives is p = 0.01 then the probability we accept a given batch is given by

P(X ≤ 2) = pbinom(2, 10, 0.01) = 0.9998862 .

Figure 1: Operating characteristic curves for Exercise 58, i.e. the acceptance probability as a function of the proportion of defectives p for the four sampling plans P(X ≤ 2; 10, p), P(X ≤ 1; 10, p), P(X ≤ 2; 15, p), and P(X ≤ 1; 15, p).
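The curves in Figure 1 can be reproduced without R; a minimal Python sketch (standard library only) that evaluates one operating characteristic curve, P(accept) = P(X ≤ c), for a plan with sample size n and acceptance number c:

```python
from math import comb

def binom_cdf(c, n, p):
    # P(X <= c) for X ~ Bin(n, p): the batch-acceptance probability
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# Acceptance probabilities for the plan n = 10, c = 2 at several defect rates
accept = {p: binom_cdf(2, 10, p) for p in (0.01, 0.05, 0.10, 0.20, 0.25)}
print(accept[0.01])  # approximately 0.9998862
```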
For the other values of p the values of the above expression are given by

[1] 0.9884964 0.9298092 0.6777995 0.5255928

Part (b-d): For this part we plot P(batch is accepted) = P(X ≤ x) = pbinom(x, n, p) for different values of x and n as a function of p in Figure 1.

Part (e): Of the choices it looks like sampling with n = 15 and X ≤ 1 is the "best" in that it has the curve with the lowest acceptance probability when p ≥ 0.1, which is the range of p under which there are too many defective components.

Exercise 3.59

Part (a): P(reject claim) = P(X ≤ 15) = pbinom(15, 25, 0.8) = 0.01733187.

Part (b):

P(not reject claim) = 1 − P(reject claim) = 1 − P(X ≤ 15) = 1 − pbinom(15, 25, 0.7) = 0.810564 .

If p = 0.6 then the above becomes 1 − pbinom(15, 25, 0.6) = 0.424617.

Exercise 3.60

Let X be the number of passenger cars; then Y = 25 − X is the number of other vehicles. Our revenue h(X) is given by

h(X) = X + 2.5(25 − X) = 62.5 − 1.5X .

Then the expectation of the above is given by

E(h(X)) = 62.5 − 1.5 E(X) = 62.5 − 1.5(25)(0.6) = 40 .

Exercise 3.61

We compute the probability of a good paper depending on whether our student picks topic A or topic B

P(good paper) = P(A) P(X ≥ 1|A) + P(B) P(X ≥ 2|B) .

Here X is the number of books that arrive from inter-library loan. We can compute

P(X ≥ 1|A) = 1 − pbinom(0, 2, 0.9) = 0.99
P(X ≥ 2|B) = 1 − pbinom(1, 4, 0.9) = 0.9963 .

If we assume that we can pick P(A) or P(B) to be zero or one to maximize P(good paper), the student should choose the larger of P(X ≥ 1|A) or P(X ≥ 2|B) and thus should choose topic B. If p = 0.5 then the above values become

P(X ≥ 1|A) = 1 − pbinom(0, 2, 0.5) = 0.75
P(X ≥ 2|B) = 1 − pbinom(1, 4, 0.5) = 0.6875 .

Thus in this case the student should choose topic A.

Exercise 3.62

Part (a): Since Var(X) = np(1 − p) we will have Var(X) = 0 if p = 0 or p = 1, which means that the result of the experiment is deterministic and not really random.
Part (b): We compute

d Var(X)/dp = n(1 − p − p) = n(1 − 2p) = 0 so p = 1/2 ,

which is where Var(X) is maximized.

Exercise 3.63

We first recall that

b(x; n, p) = C(n, x) p^x (1 − p)^(n−x) .

Part (a): Consider

b(x; n, 1 − p) = C(n, x) (1 − p)^x p^(n−x) = C(n, n − x) p^(n−x) (1 − p)^x = b(n − x; n, p) .

Part (b): For this part consider

B(x; n, 1 − p) = Σ_{k=0}^{x} b(k; n, 1 − p) = Σ_{k=0}^{n} b(k; n, 1 − p) − Σ_{k=x+1}^{n} b(k; n, 1 − p)
             = 1 − Σ_{k=x+1}^{n} b(k; n, 1 − p) .

Now by Part (a) b(k; n, 1 − p) = b(n − k; n, p) so we have the above equal to

1 − Σ_{k=x+1}^{n} b(n − k; n, p) .

Now let v = n − k; then the limits of the above summation become

k = x + 1 ⇒ v = n − x − 1
k = n ⇒ v = 0 ,

to give

1 − Σ_{v=0}^{n−x−1} b(v; n, p) = 1 − B(n − x − 1; n, p) .

Part (c): We don't need tables for p > 0.5 since if p > 0.5 we can transform the expression we need to evaluate into one with p < 0.5.

Exercise 3.64

We have

E(X) = Σ_{x=0}^{n} x b(x; n, p) = Σ_{x=0}^{n} x [n!/((n − x)! x!)] p^x (1 − p)^(n−x)
     = Σ_{x=1}^{n} [n!/((n − x)! (x − 1)!)] p^x (1 − p)^(n−x)
     = np Σ_{x=1}^{n} [(n − 1)!/((n − x)! (x − 1)!)] p^(x−1) (1 − p)^(n−x)
     = np Σ_{x=1}^{n} [(n − 1)!/((n − 1 − (x − 1))! (x − 1)!)] p^(x−1) (1 − p)^(n−1−(x−1)) .

Let y = x − 1 and the above becomes

E(X) = np Σ_{y=0}^{n−1} [(n − 1)!/((n − 1 − y)! y!)] p^y (1 − p)^(n−1−y)
     = np Σ_{y=0}^{n−1} b(y; n − 1, p) = np ,

as we were to show.

Exercise 3.65

Part (a): People can pay with a debit card (with probability 0.2) or something else (with probability 0.8). Let X represent the number of people who pay with a debit card. Then

E(X) = np = 100(0.2) = 20
Var(X) = npq = 100(0.2)(0.8) = 16 .

Part (b): The probability a person does not pay with cash is given by 1 − 0.3 = 0.7. If we let Y be the number of people who don't pay with cash we have

E(Y) = 100(0.7) = 70
Var(Y) = 100(0.7)(0.3) = 21 .

Exercise 3.66

Part (a): Let X be the number of people that actually show up for the trip (and have reservations). Then X is a binomial random variable with n = 6 and p = 0.8. If X > 4, i.e. X ≥ 5, then at least one person cannot be accommodated since there are only four seats.
The probability this happens is given by

P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − pbinom(4, 6, 0.8) = 0.65536 .

Part (b): We have E(X) = 6(0.8) = 4.8 so the expected number of available places is 4 − 4.8 = −0.8.

Part (c): Let Y be the number of passengers that show up to take the trip. Then to compute P(Y = y) we have to condition on the number of reservations R. For example for Y = 0 we have

P(Y = 0) = P(Y = 0|R = 3)P(R = 3) + P(Y = 0|R = 4)P(R = 4) + P(Y = 0|R = 5)P(R = 5) + P(Y = 0|R = 6)P(R = 6)
         = dbinom(0, 3, 0.8)(0.1) + dbinom(0, 4, 0.8)(0.2) + dbinom(0, 5, 0.8)(0.3) + dbinom(0, 6, 0.8)(0.4)
         = 0.0012416 .

Doing the same for the other possible values y ∈ {0, 1, 2, 3, 4, 5, 6} we get the probabilities

P(Y=0) = 0.001242
P(Y=1) = 0.017254
P(Y=2) = 0.090624
P(Y=3) = 0.227328
P(Y=4) = 0.303104
P(Y=5) = 0.255590
P(Y=6) = 0.104858

Now since we can only take four people, if more than four show up we can only accommodate four. Thus if we let Z be a random variable representing the number of people who actually take the trip (extra people are sent away) then we have

P(Z=0) = 0.001242
P(Z=1) = 0.017254
P(Z=2) = 0.090624
P(Z=3) = 0.227328
P(Z=4) = 0.303104 + 0.255590 + 0.104858 = 0.663552

Exercise 3.67

Recall that Chebyshev's inequality is

P(|X − µ| ≥ kσ) ≤ 1/k^2 .

If X ~ Bin(20, 0.5) then we have µ = 10 and σ = 2.236068 so that when k = 2 the left-hand-side of the above becomes

P(|X − 10| ≥ 4.472136) = P(|X − 10| ≥ 5)
                       = P(X − 10 ≥ 5 or 10 − X ≥ 5)
                       = P(X ≥ 15 or X ≤ 5)
                       = P(X ≤ 5) + (1 − P(X ≤ 14)) = 0.04138947 .

If X ~ Bin(20, 0.75) then the above becomes

P(|X − 15| ≥ 3.872983) = P(|X − 15| ≥ 4) = P(X ≥ 19 or X ≤ 11) = 0.06523779 .

This is to be compared with 1/k^2 = 1/4 = 0.25. The calculations for k = 3 are done in the same way.

Exercise 3.68

Part (a): X is a hypergeometric random variable with N = 15, M = 6, and n = 5.
The probability density function for this random variable is given by

h(x; n, M, N) = h(x; 5, 6, 15) = C(6, x) C(9, 5 − x) / C(15, 5) for 0 ≤ x ≤ 5 .

Part (b): We can evaluate these using the R expressions

P(X = 2) = dhyper(2, 6, 15 − 6, 5) = 0.4195804
P(X ≤ 2) = phyper(2, 6, 15 − 6, 5) = 0.7132867
P(X ≥ 2) = 1 − phyper(1, 6, 15 − 6, 5) = 0.7062937 .

Part (c): Using the formulas in the book we compute µ = n(M/N) = 5(6/15) = 2 and σ = sqrt(n (M/N)(1 − M/N)((N − n)/(N − 1))) = 0.9258201.

Exercise 3.69

Part (a): We have this given by dhyper( 5, 7, 12-7, 6 ) = 0.1136364.

Part (b): We have this given by phyper( 4, 7, 12-7, 6 ) = 0.8787879.

Part (c): From the formulas in the book for a hypergeometric random variable the mean and standard deviation of this distribution are given by µ = n(M/N) = 3.5 and σ = sqrt(n (M/N)(1 − M/N)((N − n)/(N − 1))) = 0.8918826, which gives µ + σ = 4.391883. Thus we want to compute P(X ≥ 5) = 1 − P(X ≤ 4) = 0.1212121.

Part (d): In this case we have n/N = 15/400 = 0.0375 < 0.05 and M/N = 0.1, which is not too close to either 0 or 1, so we can use the binomial approximation to the hypergeometric distribution. Using that approximation we compute

P(X ≤ 5) ≈ pbinom(5, 15, 0.1) = 0.9977503 .

Exercise 3.70

Part (a): If X is the number of second section papers then X is a hypergeometric random variable with M = 30, N = 50, and n = 15. Thus using R we compute

M = 30 # second session numbers
N = 20 + M # total number of students
n = 15
x = 10
dhyper( x, M, N-M, n ) # gives 0.2069539

Part (b): This would be P(X ≥ 10) = 1 − P(X ≤ 9) and is given by

1 - phyper( x-1, M, N-M, n ) # gives 0.3798188

Part (c): In this case we could have at least 10 from the first or at least 10 from the second session. This will happen with a probability of

( 1 - phyper( x-1, 30, 20, n ) ) + ( 1 - phyper( x-1, 20, 30, n ) )

which gives 0.3938039.

Part (d): In this case, using the hypergeometric mean and variance formulas from the book we compute

m = n * ( M / N )
s = sqrt( n * ( M / N ) * ( 1 - M / N ) * ( ( N - n ) / ( N - 1 ) ) )

which give µ = 9 and σ = 1.603567.
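The hypergeometric mean and standard deviation in Exercise 3.70 Part (d) can be computed directly from the pmf, using the standard variance n(M/N)(1 − M/N)(N − n)/(N − 1); a sketch with only the Python standard library (the helper hyper_pmf is ours):

```python
from math import comb, sqrt

def hyper_pmf(x, M, N, n):
    # P(X = x) = C(M, x) C(N - M, n - x) / C(N, n)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

M, N, n = 30, 50, 15  # 30 second-session papers out of 50; 15 papers graded
support = range(max(0, n - (N - M)), min(n, M) + 1)

mean = sum(x * hyper_pmf(x, M, N, n) for x in support)
var = sum((x - mean)**2 * hyper_pmf(x, M, N, n) for x in support)
sd = sqrt(var)

print(mean, sd)  # mean = n M / N; sd = sqrt(n (M/N)(1-M/N)(N-n)/(N-1))
```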
Part (e): When we draw our fifteen papers this leaves 50 − 15 = 35 remaining papers to grade. The number of second section papers in this group is another hypergeometric random variable with M and N the same as before but now with n = 35. This distribution has mean and standard deviation given by

n = 35
m = n * ( M / N )
s = sqrt( n * ( M / N ) * ( 1 - M / N ) * ( ( N - n ) / ( N - 1 ) ) )

which give µ = 21 and σ = 1.603567.

Exercise 3.71

Part (a): The pmf of the number of granite specimens X is that of a hypergeometric random variable with M = 10, N = 20, and n = 15, thus

P(X = x) = C(M, x) C(N − M, n − x) / C(N, n) = C(10, x) C(10, 15 − x) / C(20, 15) for 5 ≤ x ≤ 10 .

Part (b): This could happen if we get all ten granite specimens (x = 10) or all ten basaltic specimens (x = 5). Since the distribution of the basaltic count is the same as that of the granite count, the probability requested would be given by

2 * dhyper( 10, 10, 10, 15 ) # gives 0.03250774

Part (c): For the number of granite rocks we find µ = n(M/N) = 7.5 and σ = sqrt(n (M/N)(1 − M/N)((N − n)/(N − 1))) = 0.9933993, so µ − σ = 6.506601 and µ + σ = 8.493399. Thus the probability we want to calculate is P(7 ≤ X ≤ 8). Using R we find this to be 0.6965944.

Exercise 3.72

Part (a): Here X (the number of the top four candidates interviewed on the first day) is a hypergeometric random variable with N = 11 (the total number interviewed), M = 4 (the number of top candidates), and n = 6 (the number selected for first day interviews), and thus the pmf for X has a hypergeometric form.

Part (b): We want the expected number of the top four candidates interviewed on the first day. This is given by

M = 4
N = 11
n = 6
n * ( M / N )
[1] 2.181818

Exercise 3.73

Part (a): Note that X, the number of the top ten pairs that are found playing east-west, is a hypergeometric random variable with M = 10, N = 20, and n = 10.
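The expectation in Exercise 3.72 Part (b) can be confirmed directly from the hypergeometric pmf; a minimal standard-library sketch:

```python
from math import comb

def hyper_pmf(x, M, N, n):
    # P(X = x) = C(M, x) C(N - M, n - x) / C(N, n)
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

M, N, n = 4, 11, 6  # 4 top candidates among 11; 6 interviewed on day one
expected_top = sum(x * hyper_pmf(x, M, N, n) for x in range(0, min(M, n) + 1))
print(expected_top)  # n M / N = 24/11, approximately 2.181818
```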
Part (b): The number of top five pairs that end up playing east-west is another hypergeometric random variable, this time with M = 5, N = 20, and n = 10, so the probability that all five end up playing east-west is

dhyper( 5, 5, 15, 10 ) # gives 0.01625387

We could also have all five top pairs play north-south with the same probability. This gives a probability that the top five pairs all end up playing the same direction of 2(0.01625387) = 0.03250774.

Part (c): Assume we have 2ñ pairs (we use the notation ñ to avoid confusion with the parameter n found in the hypergeometric pmf). In this more general case we have another hypergeometric random variable with M = ñ, N = 2ñ, and n = ñ. This distribution has an expectation and variance given by

µ = ñ(ñ/2ñ) = ñ/2
Var(X) = ñ (ñ/2ñ)(1 − ñ/2ñ)((2ñ − ñ)/(2ñ − 1)) = ñ^2/(4(2ñ − 1)) .

Exercise 3.74

Part (a): This is a hypergeometric random variable with N = 500, M = 150, and n = 10.

Part (b): We can approximate a hypergeometric random variable with a binomial random variable with p = M/N = 150/500 = 0.3 if this p is not too close to 0 or 1 (which it is not) and if n/N = 10/500 = 0.02 < 0.05 (which it is). We would have n = 10 in this binomial approximation.

Part (c): For both the exact and approximate pmfs the mean is

µ = n(M/N) = 10(150/500) = 3 .

For the exact pmf we have

σ^2 = n (M/N)(1 − M/N)((N − n)/(N − 1)) = 10(0.3)(0.7)(490/499) = 2.062124 ,

while for the approximate binomial pmf we have σ^2 = np(1 − p) = 2.1.

Exercise 3.75

Part (a): For this problem we can model a success as having a girl; then X, the number of boys in the family, is the number of failures before we have r = 2 successes. Thus the pmf for X is a negative binomial and we have

P(X = x) = C(x + 2 − 1, 2 − 1) p^2 (1 − p)^x = (x + 1) p^2 (1 − p)^x ,

for x = 0, 1, 2, . . . .

Part (b): With four children we must have two boys (with the two required girls), so X = 2 and we want to evaluate the negative binomial density with x = 2.
In R the negative binomial functions live in the stats package (see ?NegBinomial), and the probability mass function is dnbinom. Thus we get

dnbinom(2,2,0.5)
[1] 0.1875

Part (c): The statement "at most four children" means we have at most two boys, and this probability is given by

pnbinom(2,2,0.5)
[1] 0.6875

Part (d): The expected number of failures is

E(X) = r(1 − p)/p = 2(1/2)/(1/2) = 2 .

Thus the expected number of children is given by 2 + 2 = 4.

Exercise 3.76

Let Y be the number of boys had before the third girl is had. Then Y is a negative binomial random variable with r = 3 and p = 1/2. Let X be the total number of children; then X = Y + r, and so the pmf of X is that of a negative binomial random variable shifted by r.

Exercise 3.77

Here X = X1 + X2 + X3, which shows X is a sum of three negative binomial random variables each with r = 2 and p = 1/2. From this decomposition we have

E(X) = 3 r(1 − p)/p = 3(2)(1/2)/(1/2) = 6 .

The expected number of male children born to each couple is r(1 − p)/p = 2, or 1/3 the average number born to all three couples.

Exercise 3.78

Each "double" will happen with a probability of 1/36. Let Y be the number of failures before we roll five "doubles". Then Y is a negative binomial random variable with r = 5 and p = 1/36. Let X = Y + 5 be the total number of die rolls. We have that

E(X) = E(Y) + 5 = r(1 − p)/p + 5 = 5(35/36)/(1/36) + 5 = 180 ,

and

Var(X) = Var(Y) = r(1 − p)/p^2 = 5(35/36)/(1/36)^2 = 6300 .

Exercise 3.79

Part (a): P(X ≤ 8) = ppois(8, 5) = 0.9319064.

Part (b): P(X = 8) = dpois(8, 5) = 0.06527804.

Part (c): P(9 ≤ X) = 1 − P(X ≤ 8) = 1 − ppois(8, 5) = 0.06809363.

Part (d): P(5 ≤ X ≤ 8) = ppois(8, 5) − ppois(4, 5) = 0.4914131.

Part (e): P(5 < X < 8) = P(6 ≤ X ≤ 7) = ppois(7, 5) − ppois(5, 5) = 0.2506677.

Exercise 3.80

Part (a): P(X ≤ 5) = ppois(5, 8) = 0.1912361.
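The Poisson values in Exercise 3.79 can be reproduced without R; a minimal sketch using only the Python standard library:

```python
from math import exp, factorial

def pois_pmf(x, lam):
    # P(X = x) = e^(-lambda) lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

def pois_cdf(x, lam):
    # P(X <= x), the analogue of R's ppois(x, lam)
    return sum(pois_pmf(k, lam) for k in range(x + 1))

print(pois_cdf(8, 5))  # approximately 0.9319064 (Part (a))
print(pois_pmf(8, 5))  # approximately 0.06527804 (Part (b))
```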
Part (b): P(6 ≤ X ≤ 9) = ppois(9, 8) − ppois(5, 8) = 0.5253882.

Part (c): P(10 ≤ X) = 1 − P(X ≤ 9) = 1 − ppois(9, 8) = 0.2833757.

Part (d): This would be P(X > 10.82843) = P(X ≥ 11) = 1 − P(X ≤ 10) = 0.1841142.

Exercise 3.81

Part (a): P(X ≤ 10) = ppois(10, 20) = 0.01081172.

Part (b): P(X > 20) = 1 − P(X ≤ 19) = 1 − ppois(19, 20) = 0.5297427. The answer in the back of the book corresponds to P(X ≥ 20).

Part (c): We have P(10 ≤ X ≤ 20) = P(X ≤ 20) − P(X ≤ 9) = ppois(20, 20) − ppois(9, 20) = 0.5540972, and P(10 < X < 20) = P(X ≤ 19) − P(X ≤ 10) = ppois(19, 20) − ppois(10, 20) = 0.4594455.

Part (d): We have µX = 20 and σX = sqrt(20) = 4.472136 so that µX − 2σX = 11.05573 and µX + 2σX = 28.94427. Thus we want to evaluate P(12 ≤ X ≤ 28) = 0.9442797.

Exercise 3.82

Part (a): P(X = 1) = dpois(1, 0.2) = 0.1637462.

Part (b): P(X ≥ 2) = 1 − P(X ≤ 1) = 1 − ppois(1, 0.2) = 0.0175231.

Part (c): P(X1 = 0 and X2 = 0) = P(X1 = 0)P(X2 = 0) = dpois(0, 0.2)^2 = 0.67032.

Exercise 3.83

From the given description X is a binomial random variable with p = 1/200 and n = 1000, thus λ = np = 5. In using the Poisson approximation to the binomial distribution the book states that we should have n > 50 and np < 5. The first condition is true here while the second condition is not strictly true.

Part (a): P(5 ≤ X ≤ 8) = ppois(8, 5) − ppois(4, 5) = 0.4914131.

Part (b): P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − ppois(7, 5) = 0.1333717.

Exercise 3.84

We have p = 0.1 × 10^−2 = 0.001 and so λ = np = 10^4 (10^−3) = 10.

Part (a): E(X) = np = 10 and Var(X) = npq = 10(1 − 0.001) = 9.99 so that SD(X) = 3.160696.

Part (b): X is approximately Poisson with λ = np = 10. Then P(X > 10) = 1 − P(X ≤ 9) = 1 − ppois(9, 10) = 0.5420703.

Part (c): This is P(X = 0) = dpois(0, 10) = 4.539993 × 10^−5.

Exercise 3.85

Part (a): Since λ = 8 these would be

P(X = 6) = dpois(6, 8) = 0.1221382
P(X ≥ 6) = 1 − P(X ≤ 5) = 1 − ppois(5, 8) = 0.8087639
P(X ≥ 10) = 1 − ppois(9, 8) = 0.2833757 .
Part (b): Now λ = 8(1.5) = 12, so E(X) = λ = 12 and SD(X) = sqrt(λ) = 3.464102.

Part (c): Now λ = 8(2.5) = 20 so we get

P(X ≥ 20) = 1 − P(X ≤ 19) = 1 − ppois(19, λ) = 0.5297427
P(X ≤ 10) = ppois(10, λ) = 0.01081172 .

Exercise 3.86

Part (a): Since λ = 5 this is P(X = 4) = dpois(4, 5) = 0.1754674.

Part (b): This is P(X ≥ 4) = 1 − P(X ≤ 3) = 1 − ppois(3, 5) = 0.7349741.

Part (c): This is λ = 5(3/4) = 3.75.

Exercise 3.87

Part (a): The number of calls received, X, is a Poisson random variable with λ = 4(2) = 8. Thus we want P(X = 10) = dpois(10, 8) = 0.09926153.

Part (b): The number of calls received, X, during the break is a Poisson random variable with λ = 4(0.5) = 2. Thus we want P(X = 0) = dpois(0, 2) = 0.1353353.

Part (c): This would be the expectation of the random variable X which is λ = 2.

Exercise 3.88

Part (a): If X is the number of diodes that will fail then X is a binomial random variable with n = 200 and p = 0.01. Thus E(X) = np = 2 and SD(X) = sqrt(npq) = 1.407125.

Part (b): We could approximate the true pmf of X as a Poisson random variable with λ = np = 2. Thus we want to compute P(X ≥ 4) = 1 − P(X ≤ 3) = 0.1428765.

Part (c): The probability that all diodes work (using the Poisson approximation) is P(X = 0) = 0.1353353. The number of boards that work (from five) is a binomial random variable with n = 5 and a probability of "success" given by P(X = 0) just calculated. Thus the probability we seek (if N is a random variable denoting the number of working boards) is

P(N ≥ 4) = 1 − P(N ≤ 3) = 1 − pbinom(3, 5, 0.1353353) = 0.001495714 .

Exercise 3.89

Part (a): This would be 2/0.5 = 4.

Part (b): This would be P(X > 5) = 1 − P(X ≤ 5) = 1 − ppois(5, 4) = 0.2148696.

Part (c): For this we want t such that

P(X = 0) ≤ 0.1 or dpois(0, t/0.5) ≤ 0.1 .

From the known functional form for the Poisson pdf we have

e^(−t/0.5) ≤ 0.1 or t ≥ −0.5 ln(0.1) = 1.151293 years.
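The threshold in Exercise 3.89 Part (c) can be checked numerically; a small standard-library sketch (treating t in years, with one expected event per 0.5 years):

```python
from math import exp, log

# Smallest t with P(no events in (0, t]) <= 0.1, where X ~ Poisson(t / 0.5)
t_min = -0.5 * log(0.1)
print(t_min)  # approximately 1.151293 years

# Sanity check: at t = t_min the no-event probability is exactly 0.1
assert abs(exp(-t_min / 0.5) - 0.1) < 1e-12
```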
Exercise 3.90 (deriving properties of a Poisson random variable)

If X is a Poisson random variable then from the definition of expectation we have that

E[X^n] = Σ_{i=0}^∞ i^n e^(−λ) λ^i / i! = Σ_{i=1}^∞ i^n e^(−λ) λ^i / i! ,

since (assuming n ≠ 0) the i = 0 term vanishes. Continuing our calculation we can cancel a factor of i and find that

E[X^n] = e^(−λ) Σ_{i=1}^∞ i^(n−1) λ^i / (i − 1)! = e^(−λ) Σ_{i=0}^∞ (i + 1)^(n−1) λ^(i+1) / i!
       = λ Σ_{i=0}^∞ (i + 1)^(n−1) e^(−λ) λ^i / i! .

Now this sum can be recognized as the expectation of the variable (X + 1)^(n−1) so we see that

E[X^n] = λ E[(X + 1)^(n−1)] . (11)

From this result we have E[X] = λ E[1] = λ and

E[X^2] = λ E[X + 1] = λ(λ + 1) .

Thus the variance of X is given by

Var[X] = E[X^2] − E[X]^2 = λ .

We find the characteristic function for a Poisson random variable is given by

ζ(t) = E[e^(itX)] = Σ_{x=0}^∞ e^(itx) e^(−λ) λ^x / x! = e^(−λ) Σ_{x=0}^∞ (e^(it) λ)^x / x!
     = e^(−λ) e^(λ e^(it)) = exp{λ(e^(it) − 1)} . (12)

Above we explicitly calculated E(X) and Var(X) but we can also use the above characteristic function to derive them. For example, we find

E(X) = (1/i) ∂ζ(t)/∂t |_{t=0} = (1/i) λ i e^(it) exp{λ(e^(it) − 1)} |_{t=0}
     = λ e^(it) exp{λ(e^(it) − 1)} |_{t=0} = λ ,

for E(X) and

E(X^2) = (1/i^2) ∂^2 ζ(t)/∂t^2 |_{t=0} = (1/i) ∂/∂t [ λ e^(it) exp{λ(e^(it) − 1)} ] |_{t=0}
       = (1/i) [ i λ e^(it) exp{λ(e^(it) − 1)} + λ e^(it) (λ i e^(it)) exp{λ(e^(it) − 1)} ] |_{t=0}
       = [ λ e^(it) exp{λ(e^(it) − 1)} + λ^2 e^(2it) exp{λ(e^(it) − 1)} ] |_{t=0}
       = λ + λ^2 ,

for E(X^2), the same two results as before.

Exercise 3.91

Part (a): This number X will be distributed as a Poisson random variable with λ = 80(0.25) = 20. Thus we want to evaluate P(X ≤ 16) = ppois(16, 20) = 0.2210742.

Part (b): This number X will be distributed as a Poisson random variable with λ = 80(85000) = 6800000 and this is also the expectation of X.

Part (c): This circle would have an area (in square miles) of π(0.1)^2 = 0.03141593, which is 20.10619 acres. Thus the number of trees is a Poisson random variable with λ = 20.10619.
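The moment recursion (11) from Exercise 3.90 is easy to confirm numerically; a minimal Python sketch that truncates the Poisson sums at a point where the tail is negligible:

```python
from math import exp, factorial

lam, K = 2.0, 100  # K large enough that the truncated tail is negligible

def pois_pmf(i, lam):
    # P(X = i) = e^(-lambda) lambda^i / i!
    return exp(-lam) * lam**i / factorial(i)

# E[X^2] directly, and via the recursion E[X^2] = lam * E[(X + 1)^1]
e_x2 = sum(i**2 * pois_pmf(i, lam) for i in range(K))
e_x2_recursion = lam * sum((i + 1) * pois_pmf(i, lam) for i in range(K))

print(e_x2, e_x2_recursion)  # both equal lam + lam^2 = 6
```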
Exercise 3.92

Part (a): We need ten vehicles to arrive and then, once these ten are inspected, we need to have no violations found. Let X be the number of vehicles that arrived and N the number of cars without violations. Then N is a binomial random variable with n = 10 and p = 0.5. Using this we have

P(X = 10 ∩ N = 10) = P(N = 10|X = 10)P(X = 10) = dbinom(10, 10, 0.5) dpois(10, 10) = 0.0001221778 .

Part (b): Based on the arguments above this would be

P(X = y)P(N = 10|X = y) = (e^(−10) 10^y / y!) C(y, 10) 0.5^10 0.5^(y−10)
                        = dpois(y, 10) dbinom(10, y, 0.5) .

Part (c): Summing the above for y = 10 to y = 30 (a small approximation to ∞) we get the value 0.01813279.

Exercise 3.93

Part (a): For there to be no events in the interval (0, t + ∆t) there must be no events in the interval (0, t) and no events in the interval (t, t + ∆t). Using this and the independence of disjoint intervals in the Poisson process we have

P0(t + ∆t) = P0(t) P0(∆t) .

Part (b): Following the book's suggestions we have

[P0(t + ∆t) − P0(t)] / ∆t = −P0(t) [1 − P0(∆t)] / ∆t .

By property 2 of the Poisson process we have

1 − P0(∆t) = 1 − (1 − α∆t + o(∆t)) = α∆t + o(∆t) ,

and the above becomes

[P0(t + ∆t) − P0(t)] / ∆t = −P0(t) [α + o(∆t)/∆t] .

Taking the limit as ∆t → 0 we get

dP0(t)/dt = −α P0(t) .

Part (c): The expression e^(−αt) satisfies the above equation.

Part (d): For Pk(t) = e^(−αt) (αt)^k / k! we have

d Pk(t)/dt = −α Pk(t) + e^(−αt) k α (αt)^(k−1) / k!
           = −α Pk(t) + α e^(−αt) (αt)^(k−1) / (k − 1)!
           = −α Pk(t) + α P_{k−1}(t) ,

the desired expression.

Exercise 3.94

Recall that the number of elements (unique unordered tuples of size three from seven elements) is given by

C(7, 3) = 35 ,

which is the total number of outcomes. Each tuple thus has a probability of 1/35. In the python code ex3 94.py we enumerate each possible tuple of three numbers and compute its sum. When we run the above code we get

(1, 2, 3) 6
(1, 2, 4) 7
(1, 2, 5) 8
(1, 2, 6) 9
(1, 2, 7) 10
(1, 3, 4) 8
... output omitted ...
(3, 5, 7) 15
(3, 6, 7) 16
(4, 5, 6) 15
(4, 5, 7) 16
(4, 6, 7) 17
(5, 6, 7) 18

If we next count up the number of times each potential sum occurs and compute the probability of getting this sum we compute

sum   numb  prob
6     1     0.028571
7     1     0.028571
8     2     0.057143
9     3     0.085714
10    4     0.114286
11    4     0.114286
12    5     0.142857
13    4     0.114286
14    4     0.114286
15    3     0.085714
16    2     0.057143
17    1     0.028571
18    1     0.028571

In the same code we compute µ and σ^2 and find

mu = 12.0
sigma2 = 8.0

Exercise 3.95

Part (a): Following the hint we have

P(1) = C(4, 1) P(all five cards are spades) = C(4, 1) C(13, 5) / C(52, 5) = 0.001980792 .

The factor C(4, 1) is the number of ways to select the suit of cards for the hand (here we selected spades) from the four possible suits. Next we have

P(2) = C(4, 2) P(only spades and hearts with at least one of each suit)
     = C(4, 2) [ Σ_{k=1}^{4} C(13, k) C(13, 5 − k) ] / C(52, 5) = 0.1459184 .

The factor C(4, 2) is the number of ways to select the two suits of cards in the hand (here we selected spades and hearts). In the numerator the factor C(13, k) is the number of ways to select the k spades in the hand and C(13, 5 − k) is the number of ways to select the hearts in the hand. Next we have

P(4) = C(4, 1) P(two spades and one card from each of the other three suits)
     = C(4, 1) C(13, 2) C(13, 1) C(13, 1) C(13, 1) / C(52, 5) = 0.2637455 .

The factor C(4, 1) is the number of ways to select the suit of cards for the hand that will be duplicated (here we selected spades) from the four possible suits. Then P(3) = 1 − P(1) − P(2) − P(4) = 0.5883553.

Part (b): We find µ = 3.113866 and σ^2 = 0.4046217.

Exercise 3.96

We must have r successes and we stop when we get them. Thus the last trial will be the r-th success.

P(Y = r) = p^r
P(Y = r + 1) = C(r, 1) (1 − p) p^r
P(Y = r + 2) = C(r + 1, 2) (1 − p)^2 p^r
...
P(Y = r + k) = C(r + k − 1, k) (1 − p)^k p^r for k ≥ 0 .

If we want this written in terms of just y, the total number of trials, then y = r + k and we have

P(Y = y) = C(y − 1, y − r) (1 − p)^(y−r) p^r for y ≥ r .

Exercise 3.97

Part (a): This is a binomial random variable with n = 15 and p = 0.75.
Part (b): We have 1 − P(X ≤ 9) = 1 − pbinom(9, 15, 0.75) = 0.8516319.

Part (c): We have P(6 ≤ X ≤ 10) = P(X ≤ 10) − P(X ≤ 5) = 0.3127191.

Part (d): We have

µ = np = 15(0.75) = 11.25
σ^2 = np(1 − p) = 2.8125 .

Part (e): We have 10 chain driven models and 8 shaft driven models in the existing stock. Note that Y = 15 − X is the number of shaft driven models bought by the next fifteen customers. Thus to have enough product on hand we must have

0 ≤ X ≤ 10 and 0 ≤ Y ≤ 8 .

Since Y = 15 − X this last inequality is equivalent to 7 ≤ X ≤ 15. Combining this with the condition 0 ≤ X ≤ 10 we must have 7 ≤ X ≤ 10. Thus the probability we want to compute is given by

P(7 ≤ X ≤ 10) = P(X ≤ 10) − P(X ≤ 6) = pbinom(10, 15, 0.75) − pbinom(6, 15, 0.75) = 0.309321 .

Exercise 3.98

The probability that a six volt flashlight works is one minus the probability that both six volt batteries fail, or 1 − (1 − p)^2. The probability that the two D-cell flashlight works is the probability that we have at least two working batteries from the four given. If X is the number of working batteries then X is a binomial random variable with parameters n = 4 and p. Thus the probability we want is

P(X ≥ 2) = 1 − P(X ≤ 1) = 1 − Σ_{x=0}^{1} C(4, x) p^x (1 − p)^(4−x) .

Using the simple R code

ps = seq( 0, 1, length.out=100 )
P_six_volt = 1 - (1-ps)^2
P_D_cell = 1 - pbinom( 1, 4, ps )
plot( ps, P_six_volt, type='l', col='blue' )
lines( ps, P_D_cell, type='l', col='red' )
legend( 'topleft', c('probability six volt works','probability D cell works'), lty=c(1,1), col=c('blue','red') )
grid()

we can plot each of these expressions as a function of p. When we do that we get the plot given in Figure 2. There we see that for low values of p, i.e. less than about 0.65, the six volt flashlight has a larger probability of working. If p is greater than about 0.65 then the D-cell flashlight has a larger probability.
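The crossover point eyeballed from Figure 2 can be found exactly: with q = 1 − p, setting (1 − p)^2 equal to P(X ≤ 1) gives q^2 = q^4 + 4(1 − q)q^3, i.e. 3q^2 − 4q + 1 = 0, so q = 1/3 and p = 2/3 ≈ 0.667. A small sketch confirming this numerically:

```python
from math import comb

def p_six_volt(p):
    # works unless both six-volt batteries fail
    return 1 - (1 - p)**2

def p_d_cell(p):
    # works if at least 2 of 4 D-cell batteries work: 1 - P(X <= 1)
    return 1 - sum(comb(4, x) * p**x * (1 - p)**(4 - x) for x in range(2))

# the two curves cross at p = 2/3, where both probabilities equal 8/9
print(p_six_volt(2 / 3), p_d_cell(2 / 3))
```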
Exercise 3.99

We want P(X ≥ 3) where X is binomial with n = 5 and p = 0.9, so

P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − pbinom(2, 5, 0.9) = 0.99144 .

Exercise 3.100

From the problem statement a lot will be rejected if X ≥ 5.

Part (a): We have

P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − pbinom(4, 25, 0.05) = 0.007164948 .

Figure 2: The probability that each flashlight works, as a function of p, for Exercise 98.

Part (b): This is

P(X ≥ 5) = 1 − pbinom(4, 25, 0.1) = 0.09799362 .

Part (c): This is

P(X ≥ 5) = 1 − pbinom(4, 25, 0.2) = 0.5793257 .

Part (d): If we change the four in the above expressions to a five, all of the probabilities would decrease, since we now require more defective batteries before rejecting the lot.

Exercise 3.101

Part (a): X is a binomial random variable with n = 500 and p = 0.005, which we can approximate using a Poisson random variable with λ = np = 2.5 since we have n > 50 and np = 2.5 < 5.

Part (b): P(X = 5) = dpois(5, 2.5) = 0.06680094.

Part (c): This is

P(5 ≤ X) = P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − ppois(4, 2.5) = 0.108822 .

Exercise 3.102

Note that X is a binomial random variable with n = 25 and p = 0.5.

Part (a): This is 0.9853667.

Part (b): This is 0.2199647.

Part (c): If p = 0.5 then by chance we would have

P(X ≤ 7 or X ≥ 18) = P(X ≤ 7) + P(X ≥ 18)
                   = pbinom(7, 25, 0.5) + (1 − pbinom(17, 25, 0.5)) = 0.04328525 .

Part (d): We reject the claim if the inequalities in the previous part are true. This can happen with a probability (when p = 0.6) given by

pbinom(7,25,0.6) + ( 1 - pbinom(17,25,0.6) )
[1] 0.1547572

If p = 0.8 this becomes 0.8908772.

Part (e): We would want to construct a test that picks an integer value of c such that when p = 0.5 we have

P(µ − c ≤ X ≤ µ + c) ≈ 1 − 0.01 = 0.99 .

This means that we expect to fall in the region µ − c ≤ X ≤ µ + c with probability 0.99 and outside of this region with probability 0.01.
Then if our sample had x ≥ µ + c or x ≤ µ − c we would reject H0 : p = 0.5. Exercise 3.103 Let T be the random variable specifying the number of tests that will be run. Then we have E(T ) = 1P (none of the n members has the disease) + (n + 1)P (at least one of the n members has the disease) = (1 − p)n + (n + 1)(1 − (1 − p)n ) = n + 1 − n(1 − p)n . If n = 3 and p = 0.1 the above is E(T ) = 1.813. If n = 5 and p = 0.1 the above is E(T ) = 3.04755. 99 Exercise 3.104 We receive a correct symbol with a probability 1 − p1 . We receive an incorrect symbol with probability p1 but this incorrect symbol can be corrected with a probability p2 . In total then, we receive a correct symbol with probability p given by p = 1 − p1 + p1 p2 . Then the number of correct symbols X is a binomial random variable with parameters n and p (given by the above expression). Exercise 3.105 In our sequence of trials the last two must be successes and thus the probability we perform just two trials to accept will have a probability of p2 . We will perform three trials to accept if the first trial is a failure followed by two successes which happens with probability (1 − p)p2 . We will perform four trials to accept if the first two trials are F, F or S, F followed by two successes. Thus this event has the probability (1 − p)2 p2 + (1 − p)pp2 = (1 − p)p2 . Thus so far we have P (2) = p2 P (3) = (1 − p)p2 P (4) = (1 − p)p2 . For P (x) = P {X = x} when x ≥ 5 we must have the last two trials a success and the trial before the last two trials must be a failure. If it was a success we would have the sequence S, S, S and would stop before the final trial. In the first x − 3 trials we cannot have a sequential run of two successes. Thus we get for the probability of the event X = x P (x) = p2 (1 − p) [1 − P (2) − P (3) − · · · − P (x − 4) − P (x − 3)] , for x ≥ 5. 
For p = 0.9 we can evaluate these expressions to compute P (X ≤ 8) with the following R code p = 0.9 # x=2 x=3, x=4 p_x = c( p^2, (1-p)*p^2, (1-p)*p^2 ) for( x in 5:8 ){ x_to_index = x-1 # location of x in p_x vector previous_x_indices = 1:(x_to_index-3) prob_no_acceptance = 1 - sum( p_x[ previous_x_indices ] ) p_x = c( p_x, prob_no_acceptance * (1-p) * p^2 ) } print( sum(p_x) ) # gives 0.9995084 100 Exercise 3.106 Part (a): The number of customers that qualify for membership X is a binomial random variable with n = 25 and p = 0.1. Thus we want compute P (2 ≤ X ≤ 6) = 0.7193177. Part (b): In this case n = 100 and the same considerations as above gives µ = np = 10 and σ 2 = np(1 − p) = 9. Part (c): We want to compute P (X ≥ 7) when X ∼ Bin(25, 0.1). We find this to be 0.009476361. Part (d): In this case using R we compute P (X ≤ 6) to be pbinom(6,25,0.2) [1] 0.7800353 Exercise 3.107 Let S correspond to the event that a seed of maize has a single spikelet and P correspond to the event that a seed of maize has a paired spikelet. Then we are told that P (S) = 0.4 P (P ) = 0.6 . We are also told that after the seed has grown it will produce an ear of corn with single or paired spikelets with the following probabilities P (S|S) = 0.29 so P (P |S) = 0.71 P (S|P ) = 0.26 so P (P |P ) = 0.74 . We next select n = 10 seeds. Part (a): For each seed the probability that we are of type S and produce kernels of type S is given by the probability p computed as p = P (S|S)P (S) = 0.29(0.4) = 0.116 . Then the probability that exactly X of these seeds from the 10 do this is a binomial random variable with n = 10 and p given above. Thus we compute dbinom(5,10,0.116) [1] 0.002857273 101 Part (b): Next we want the probability that we produce kernels of type S. This can be computed p = P (S|S)P (S) + P (S|P )P (P ) = 0.29(0.4) + 0.26(0.6) = 0.272 . 
The two desired probabilities are then given by

c( dbinom(5,10,0.272), pbinom(5,10,0.272) )
[1] 0.07671883 0.97023725

Exercise 3.108

X is a hypergeometric random variable with M = 4, N = 8 + 4 = 12, and n = 4. We are asked about the mean number of jurors favoring acquittal who will be interviewed. This is

\mu = n \frac{M}{N} = 4 \left( \frac{4}{12} \right) = 1.333333 .

Exercise 3.109

Part (a): The number of calls (say X) that any one of the operators receives during one minute will be a Poisson random variable with \lambda = 2(1) = 2. Thus the probability that the first (or any) operator receives no calls is P(X = 0) = e^{-\lambda} = 0.1353353.

Part (b): If we consider receiving no calls a "success" then the number of operators (from five) that receive no calls will be a binomial random variable with n = 5 and p = 0.1353353 (the probability from Part (a) above). Thus the probability we seek is given by 0.001450313.

Part (c): Let E be the event that all operators receive the same number of calls. Let c be the number of calls each operator receives in the first minute. Then we have (using R notation)

P(E) = \sum_{c=0}^{\infty} dbinom(5, 5, dpois(c, 2)) = 0.00314835 .

Exercise 3.110

For a radius of size R the number of grasshoppers found will be a Poisson random variable with parameter \lambda = \alpha \pi R^2 = 2 \pi R^2. Since for a Poisson random variable the probability of getting at least one count is P(X \geq 1) = 1 - P(X = 0) = 1 - e^{-\lambda}, in this case we need to find R such that

P(X \geq 1) = 1 - e^{-2 \pi R^2} = 0.99 .

Solving for R in the above expression we find R = 0.8561166 yards.

Exercise 3.111

The expected number of copies sold is given by

E(\text{number sold}) = \sum_{k=0}^{5} k P(X = k) + 5 \sum_{k=6}^{\infty} P(X = k) = 2.515348 + 1.074348 = 3.589696 .

Exercise 3.112

Part (a): For x = 10 we can have A win the first ten games or B win the first ten games and thus

P(X = 10) = p^{10} + (1 - p)^{10} .

For x = 11 the opponent of the player that ultimately wins must win one game in the first ten games, thus
P(X = 11) = \binom{10}{1} p^{10} (1 - p) + \binom{10}{1} (1 - p)^{10} p .

In the same way, if x = 12 the opponent of the player that ultimately wins must win two games in the first eleven games, thus

P(X = 12) = \binom{11}{2} p^{10} (1 - p)^2 + \binom{11}{2} (1 - p)^{10} p^2 .

The pattern is now clear. We have

P(X = x) = \binom{x-1}{x-10} p^{10} (1 - p)^{x-10} + \binom{x-1}{x-10} (1 - p)^{10} p^{x-10} ,

for 10 \leq x \leq 19. Let's check empirically that what we have constructed is a probability mass function

p = 0.9
xs = 10:19
pt_1 = choose( xs-1, xs-10 ) * (1-p)^(xs-10) * p^10
pt_2 = choose( xs-1, xs-10 ) * (1-p)^10 * p^(xs-10)
sum( pt_1 + pt_2 ) # gives 1

Part (b): In this case X \in \{10, 11, 12, 13, \ldots\} since we can imagine as many draws as needed to make X as large as desired.

Exercise 3.113

Let T be the event that the result of the test is positive and let D be the event that the person has the disease. Then we are told that

P(T|D^c) = 0.2 \Rightarrow P(T^c|D^c) = 0.8 ,

and

P(T^c|D) = 0.1 \Rightarrow P(T|D) = 0.9 .

Part (a): No, since we have a different probability of "success" on each trial: this probability depends on whether or not the person has the disease.

Part (b): This probability p (that a total of three tests are positive) would be given by

p = \sum_{k=0}^{3} P(k \text{ positives from the diseased group of 5}) P(3-k \text{ positives from the non-diseased group of 5})
  = \sum_{k=0}^{3} \binom{5}{k} 0.9^k 0.1^{5-k} \binom{5}{3-k} 0.2^{3-k} 0.8^{2+k} = 0.0272983 .

We computed this in R using

ks = 0:3
sum( dbinom( ks, 5, 0.9 ) * dbinom( 3-ks, 5, 0.2 ) )

Exercise 3.114

In R the dnbinom function does not require its size parameter (here the value of r) to be an integer. Thus we compute P(X = 4) = dnbinom(4, 2.5, 0.3) = 0.106799 and P(X \geq 1) = 1 - P(X \leq 0) = 1 - pnbinom(0, 2.5, 0.3) = 0.950705.

Exercise 3.115

Part (a): We have p(x; \lambda, \mu) \geq 0 and \sum_x p(x; \lambda, \mu) = 1.

Part (b): This would be

p(x; \lambda, \mu) = 0.6 \frac{e^{-\lambda} \lambda^x}{x!} + 0.4 \frac{e^{-\mu} \mu^x}{x!} .

Part (c): Using the expectation of a Poisson random variable we have

E(X) = \frac{1}{2} E(\text{Part 1}) + \frac{1}{2} E(\text{Part 2}) = \frac{1}{2} \lambda + \frac{1}{2} \mu = \frac{1}{2} (\lambda + \mu) .

Part (d): To compute the variance we need the expectation of the random variable squared.
To compute this recall that if Y is a Poisson random variable with parameter κ then E(Y 2 ) = Var (Y ) + E(Y )2 = κ + κ2 . Thus for our variable X in this problem we have 1 1 1 E(X 2 ) = (λ + λ2 ) + (µ + µ2 ) = (λ + µ + λ2 + µ2 ) . 2 2 2 104 Given this we have Var (X) = E(X 2 ) − E(X)2 1 1 = (λ + µ + λ2 + µ2 ) − (λ2 + µ2 + 2λµ) 2 4 1 1 2 = (λ + µ) + (λ − µ) , 2 4 when we simplify. Exercise 3.116 Part (a): To prove this lets first consider the ratio b(x + 1; n, p)/b(x; n, p) which is given by n px+1 (1 − p)n−x−1 x+1 b(x + 1; n, p) = b(x; n, p) n px (1 − p)n−x x p n−x . = x+1 1−p Now b(x + 1; n, p) will be larger than b(x; n, p) if this ratio is larger than one or n−x p > 1. x+1 1−p This is equivalent to x < np − (1 − p) . Thus the mode x∗ is the integer that is larger than or equal to np − (1 − p) but less than (or equal to) this number plus one. That is it must satisfy np − (1 − p) ≤ x∗ ≤ np − (1 − p) + 1 = p(n + 1) . −λ x Part (b): For P (X = x) = e x!λ the mode is the value x∗ that gives the largest P (X = x∗ ) value. Consider for what values of x the ratio of P (X = x + 1) and P (X = x) is increasing we have P (X = x + 1) λ = > 1. P (X = x) x+1 This means that λ > x + 1 so x < λ − 1. Thus the mode is the integer x∗ such that λ − 1 ≤ x∗ ≤ λ − 1 + 1 = λ . If λ is an integer then λ − 1 is also an integer so the bounds above puts x∗ between two integers and thus either one can be the mode. 105 Exercise 3.117 Recall that X is the number of tracks the arm will pass during a new disk track “seek”. Then we can compute P (X = j) by conditioning on the track that the disk head is currently on as P (X = j) = = 10 X i=1 10 X P (arm is now on track i and X = j) P (X = j|arm is now on track i)pi . i=1 Next we need to evaluate P (X = j|arm is now on track i). To do that we consider several cases. Starting on track i then for the head: • to move over j = 0 tracks means the head does not actually move and it stays on track i. 
This will happen with probability pi since we have to have a seek request to the same track index i as we are currently on. • to move over j = 1 tracks means that we have to have requests to the tracks i + 1 or i − 1. This will happen with probability pi+1 + pi−1 . • to move over j = 2 tracks means that we have to have requests to the tracks i + 2 or i − 2. This will happen with probability pi+2 + pi−2 . The pattern above continues for other values of j. In general, to move over j tracks means that we receive a seek request to go to tracks i + j or i − j. Note that if one of these values is less than 1 or larger than 10 it would indicate a track that does not exist and we must take the values of pi+j or pi−j as zero. Thus we have P (X = 0) = P (X = j) = 10 X i=1 10 X p2i for j ≥ 1 , (pi+j + pi−j )pi i=1 with pk = 0 if k ≤ 0 or k ≥ 11. Note that this results is slightly different than the one in the back of the book. If anyone sees anything wrong with what I have done please let me know. Exercise 3.118 Since X is a hypergeometric random variable we have E(X) = X xp(x) = x Xx x 106 M x N −M n−x N n . Now the limits of X are max(0, n − N + M) ≤ x ≤ min(n, M). Since we are told that n < M we know that the upper limit of our summation is min(n, M) = n. Thus we have N −M n X x M x n−x . E(X) = N n x=max(0,n−N +M ) Expanding the binomial coefficients this becomes (N −M )! ! n x x!(MM−x)! X (n−x)!(N −M −n+x)! E(X) = N! n!(N −n)! x=max(0,n−N +M ) = (N −1−(M −1))! M (M −1)! (x−1)!(M −1−(x−1))! (n−1−(x−1))!(N −1−(M −1)−(n−1)+(x−1))! N (N −1)! n(n−1)!(N −1−(n−1))! x=max(1,n−N +M ) n X nM = N M −1 x−1 n X x=max(1,n−N +M ) Let y = x − 1 then the above is nM E(X) = N nM = N n−1 X N −1−(M −1) n−1−(x−1) N −1 n−1 M −1 y y=max(1,n−N +M )−1 . N −1−(M −1) n−1−y N −1 n−1 M −1 y n−1 X y=max(0,n−1−(N −1)+(M −1)) N −1−(M −1) n−1−y N −1 n−1 = nM , N since the sum above is the sum of the hypergeometric density h(y; n − 1, M − 1, N − 1) over all possible y values (and hence sums to one). 
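The closing identity E(X) = nM/N can also be confirmed numerically by summing x h(x; n, M, N) directly over the support. This Python sketch reuses the values M = 4, N = 12, n = 4 from Exercise 3.108 above:

```python
from math import comb

# Direct check that the hypergeometric mean equals n*M/N,
# using M = 4, N = 12, n = 4 from Exercise 3.108.
N, M, n = 12, 4, 4
mean = sum(x * comb(M, x) * comb(N - M, n - x) / comb(N, n)
           for x in range(max(0, n - N + M), min(n, M) + 1))
print(mean)  # 1.3333... = 4*4/12
```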
Exercise 3.119 From the given expression X all x (x − µ)2 p(x) ≥ X (x − µ)2 p(x) , x:|x−µ|≥kσ since the left-hand-side is the definition of σ 2 we have X X σ2 ≥ (x − µ)2 p(x) ≥ x:|x−µ|≥kσ The right-hand-side of the above is X k2σ2 x:|x−µ|≥kσ (k 2 σ 2 )p(x) . x:|x−µ|≥kσ p(x) = k 2 σ 2 P (|X − µ| ≥ kσ) . If we divide both sides by k 2 σ 2 we get P (|X − µ| ≥ kσ) ≤ or Chebyshev’s inequality. 107 1 , k2 Exercise 3.120 Part (a): For the given functional expression for α(t) we get t Z t2 ea ebt 2 ea bt2 a+bt e dt = λ= = (e − ebt1 ) . a b t1 t1 Which due to the properties of the Poisson distribution is also the expectation of the number of events between the two times [t1 , t2 ]. With the values of a and b given and for t1 = 0 and t2 = 4 we get λ = 123.4364. If t1 = 2 and t2 = 6 we get λ = 409.8231. Part (b): In the interval [0, 0.9907] the number of events X is a Poisson random variable with (calculated as above) λ = 9.999606. Thus we want to evaluate P (X ≤ 15) = ppois(15, 9.999606) = 0.9512733. Exercise 3.121 Part (a): The expectation is given by E(call time) = 0.75E(call time|call is voice) + 0.25E(call time|call is data) = 0.75(3) + 0.25(1) = 2.5 . minutes. Part (b): Let C be the random variable representing the number of chocolate chips found and the three cookie types denoted by C1 , C2 , and C3 . Then we get E(C) = E(C|C1 )P (C1 )+E(C|C2 )P (C2 )+E(C|C3 )P (C3 ) = 0.2(1+1)+0.5(2+1)+0.3(3+1) = 3.1 . Exercise 3.122 We compute P (X = 1) = P (X = 2) = P (X = 3) = .. . P (X = k) = .. . p (1 − p)p (1 − p)2 p (1 − p)k−1 p P (X = 10) = 1 − 9 X P (X = k) = 1 − k=1 8 X = 1−p k=0 k 9 X (1 − p)k−1 p k=1 (1 − p) = 1 − p 108 1 − (1 − p)9 1 − (1 − p) = (1 − p)9 . The average will then be given by µ= 10 X kP (X = k) = p k=1 8 X =p k=0 9 X k=1 k(1 − p)k−1 + 10(1 − p)9 k(1 − p)k + 10(1 − p)9 . To evaluate the first summation we use the identity n X k=0 k kr = r 1 − rn nr n − (1 − r)2 1 − r We then get when we simplify. 1 1 µ= −1+ 2− (1 − p)9 , p p 109 . 
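The closed form for the mean at the end of Exercise 3.122 is hard to read in this rendering. Writing the sum out directly in Python shows it agrees with the compact simplification \mu = 1/p - (1/p - 1)(1-p)^9 (our algebra, not a formula quoted from the text; the values of p below are arbitrary test points):

```python
# Exercise 3.122: mu = sum_{k=1}^{9} k (1-p)^(k-1) p + 10 (1-p)^9,
# which simplifies to 1/p - (1/p - 1)(1-p)^9.
def mu_direct(p):
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, 10)) + 10 * (1 - p) ** 9

def mu_closed(p):
    return 1 / p - (1 / p - 1) * (1 - p) ** 9

print(mu_direct(0.2), mu_closed(0.2))
```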
(13) Continuous Random Variables and Probability Distributions Problem Solutions Exercise 4.5 Part (a): We must have a value of k such that Z 2 kx2 dx = 1 . 0 Integrating the left-hand-side we get 2 x3 k = 1. 3 0 or 3 8k = 1 so k = . 3 8 Part (b): We find P (X < 1) = Z 0 1 3 3 2 x dx = 8 8 Part (c): We find P (1 < X < 1.5) = Z 1.5 1 3 3 2 x dx = 8 8 Part (d): For this we find P (X > 1.5) = Z 2 1.5 1 1 x3 = . 3 0 8 1.5 x3 1 = (1.53 − 1) = 0.296875 . 3 1 8 1 2 1 3 2 x dx = x3 1.5 = (8 − 1.53 ) = 0.578125 . 8 8 8 Exercise 4.11 Part (a): For this we have P (X ≤ 1) = F (1) = 1 . 4 Part (b): 1 1 P (0.5 ≤ X ≤ 1) = F (1) = F (0.5) = − 4 4 110 2 1 3 . = 2 16 Part (c): 1 P (X > 0.5) = 1 − P (X ≤ 0.5) = 1 − F (0.5) = 1 − 4 1 15 = . 4 16 Part (d): For this part we want to solve 0.5 = F (˜ µ) for µ ˜ or which means that µ ˜= √ µ ˜2 1 = , 2 4 2. Part (e): We have 0 x<0 0≤x<2 f (x) = F (x) = 0 x≥2 ′ Part (f): E(X) = Z xf (x)dx = x 2 Z 2 0 Part (g): We first compute Z Z 2 2 E(X ) = x f (x)dx = so that Var (X) = E(X 2 ) − E(X)2 = 2 − 2 1 dx = 2 2 Z 1 x dx = 2 2 2 2 x 0 16 9 Z x x x2 dx = 0 4 . 3 2 x3 dx = 2 , 0 = 0.2222222 and σX = 0.4714045. Part (h): This would be the same as E(X 2 ) which we computed above as 2. Exercise 4.12 Part (a): This is P (X < 0) = F (0) = 1 . 2 Part (b): This is given by 3 P (−1 < X < +1) = F (1) − F (−1) = 32 1 4− 3 3 − 32 3 1 + 2 32 1 −4 + 3 = 11 , 16 when we simplify. Part (c): This is P (0.5 < X) = 1 − P (X < 0.5) = 1 − F (0.5) = 1 − 111 4 1 1 − · 2 3 8 = 81 , 256 when we simplify. Part (d): This is f (x) = F ′ (x) = 3 (4 − x2 ) . 32 Part (e): Now µ ˜ is the solution to F (˜ µ) = 21 . For this problem this equation is 3 µ ˜3 4˜ µ− = 0. 32 3 √ This has solutions µ ˜ = 0 or µ ˜ = ± 12 = ±3.464102. Note that these last two solutions don’t satisfy |˜ µ| < 2 and are not valid. Thus µ ˜ = 0 is the only solution. Exercise 4.13 Part (a): We must have k such that Z ∞ kx−4 dx = 1 . 1 Evaluating the left-hand-side of this gives ∞ Z ∞ k k x−3 −4 = k (0 − 1) = . 
x dx = k −3 1 −3 3 1 To make this equal to one means that k = 3. Part (b): Our cumulative distribution for this density is given by x Z x 3ξ −3 1 1 −4 F (x) = 3ξ dξ = =− −1 =1− 3 , 3 −3 1 x x 1 for x > 1. Part (c): These are given by 1 1 P (X > 2) = 1 − P (X < 2) = 1 − F (2) = 1 − 1 − 3 = , 2 8 and P (2 < X < 3) = F (3) − F (2) = 1− 1 27 1 19 1 1 = = 0.08796296 . − 1− = − 8 8 27 216 112 Part (d): These can be computed using Z ∞ E(X) = x(3x−4 )dx 1 ∞ Z ∞ x−2 3 3 −3 =3 x dx = 3 = − (0 − 1) = (−2) 1 2 2 1 −1 ∞ Z ∞ Z ∞ x = −3(0 − 1) = 3 . E(X 2 ) = 3 x2 (x−4 )dx = 3 x−2 dx = 3 (−1) 1 1 1 Thus and σX = √ 3 2 Var (X) = E(X 2 ) − E(X)2 = 3 − 3 9 = , 4 4 = 0.8660254. Part (e): The domain we are interested in is |X − µ| < σ or µ − σ < X < µ + σ. Note that √ 3 3 = 0.6339746 < 1 , µ−σ = − 2 2 and is outside of the feasible domain. Thus we compute this using √ ! 3 3 P (µ − σ < X < µ + σ) = F (µ + σ) − F (1) = F − F (1) + 2 2 = 1 − 1 3 2 + √ 3 2 3 − 0 = 0.9245009 . Exercise 4.14 Part (a): From properties of a uniform distribution we have 7.5 + 20 1 = 13.75 E(X) = (a + b) = 2 2 (20 − 7.5)2 (b − a)2 = = 13.02083 . Var (X) = 12 12 Part (b): We find x dξ x − 7.5 = for 7.5 < x < 20 , 12.5 7.5 20 − 7.5 together with F = 0 if x < 7.5 and F = 1 if x > 20. FX (x) = Z Part (c): We have P (X < 10) = P (10 < X < 15) = Z 10 Z7.515 10 10 − 7.5 dξ = = 0.2 12.5 12.5 5 dξ = = 0.4 . 12.5 12.5 113 4 3 2 f(x) 1 0 0.0 0.2 0.4 0.6 0.8 1.0 x Figure 3: A plot of the density f (x) given in Exercise 4.15. Part (d): We have P (|X − µ| < nσ) = P (µ − nσ < X < µ + nσ) = Z min(µ+nσ,20) max(µ−nσ,7.5) dξ . 12.5 When n = 1 this is 0.5773503 and when n = 2 this is 1.0. These were computed with the following R code f_fn = function(n){ m = ( 7.5 + 20 )/2 # the mean s = sqrt( ( 20 - 7.5 )^2 / 12 ) # the standard deviation ll = max( m - n * s, 7.5 ) ul = min( m + n * s, 20 ) result = ( ul - ll )/12.5 } lapply( c(1,2), f_fn ) 114 Exercise 4.15 Part (a): We first plot 90x8 (1 − x) for 0 < x < 1. 
See Figure 3 where we do this. Next we compute the cdf of X as x 9 9 Z x x ξ 10 x10 ξ 8 = 90 = 10x9 − 9x10 . FX (x) = 90ξ (1 − ξ)dξ = 90 − − 9 10 0 9 10 0 With the conditions that FX (x) = 0 for x < 0 and FX (x) = 1 for x > 1. Part (b): This is P (X ≤ 0.5) = F (0.5) = 10(0.5)9 − 9(0.5)10 = 0.01074219 . Part (c): This is P (0.25 < X ≤ 0.5) = F (0.5) − F (0.25) = 0.01071262 , when we use the above expression for F (x). The second requested expression has the same numerical value as the probability just computed above since our density is continuous. Part (d): We want to find x0.75 which is the point x that satisfy F (x) = 0.75. This means that we need to solve 10x9 − 9x10 = 0.75 , for x. We do that with the following R code coefs = rep( 0, 11 ) coefs[1] = -0.75 # constant term coefs[10] = 10 # coefficient of x^9 coefs[11] = -9 # coefficient of x^10 polyroot( coefs ) The only real root that satisfies 0 < x < 1 is x = 0.9035961. Part (e): We can compute these using 1 10 Z 1 x11 x 9 − E(X) = 90 x (1 − x)dx = 90 10 11 0 0 1 1 = 90 = 0.8181818 , − 10 11 and E(X 2 ) = 90 Z 1 1 x11 x12 − 11 12 0 x10 (1 − x)dx = 90 1 1 = 90 = 0.6818182 . − 11 12 0 115 Thus Var (X) = E(X 2 ) − E(X)2 = 0.6818182 − 0.81818182 = 0.01239674 √ SD (X) = 0.01239674 = 0.1113406 . Part (f): For this we have 1 − P (|X − µ| < σ) = 1 − P (µ − σ < X < µ + σ) Z min(µ+σ,1) =1− 90x8 (1 − x)dx max(µ−σ,0) = 1 − (FX (min(µ + σ, 1)) − FX (max(µ − σ, 0))) = 0.3136706 . Exercise 4.16 In Exercise 5 above we had the pdf given by f (x) = 38 x2 for 0 < x < 2. Part (a): Our cdf for this pdf is computed as Z x 1 x x3 3 2 FX (x) = ξ dξ = ξ 3 0 = . 8 8 0 8 Part (b): For this we have P (X ≤ 0.5) = FX (0.5) = 0.015625 . Part (c): For this we have P (0.25 ≤ X ≤ 0.5) = FX (0.5) − FX (0.25) = 0.01367188 . Part (d): We want to find the value of x0.75 which is the solution of FX (x) = 0.75. Using the above form for FX (x) we find that the value of x that solves this equation is x = 1.817121. 
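The percentile just found can also be obtained in closed form: F(x) = x^3/8 = 0.75 gives x^3 = 6, i.e. x = 6^{1/3}. A quick Python check:

```python
# Exercise 4.16 Part (d): solve F(x) = x^3/8 = 0.75 exactly.
x75 = 6 ** (1 / 3)
print(x75)           # 1.817121...
print(x75 ** 3 / 8)  # 0.75
```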
Part (e): To compute these we need 2 Z Z 2 3 3 2 3 3 1 4 3 3 2 x dx = x dx = x = (16) = , E(X) = x 8 8 0 8 4 0 32 2 0 and 2 E(X ) = Z 0 2 2 x Thus from these we compute 2 Z 3 2 3 3 2 4 3 1 5 12 x dx = x dx = x = (32) = . 8 8 0 8 5 0 40 5 Var (X) = So σX = √ 3 12 9 − = = 0.15 . 5 4 20 0.15 = 0.3872983. Part (f): Calculated the same way as in problem 15 for this probability we get 0.3319104. 116 Exercise 4.17 For the uniform distribution we have a cdf given by FX (x) = x−A B−A for A < x < B . The value of xp is given by solving FX (xp ) = xp −A B−A = p. This has the solution xp = (B − A)p + A . Part (b): For this we need to compute B Z B x2 dx = A+B . = E(X) = x B−A 2(B − A) A 2 A and 2 E(X ) = Z B 2 x A Using these we have that dx B−A = 1 1 (B 3 − A3 ) = (B 2 + AB + A2 ) . 3(B − A) 3 Var (X) = E(X 2 ) − E(X)2 1 1 1 = (B 2 + AB + A2 ) − (A + B)2 = (A − B)2 , 3 4 12 when we simplify. Using the expression for the variance we have that B−A σX = √ . 12 Part (c): For this we compute n+1 B Z B 1 1 x dx n n = = (B n+1 − An+1 ) . E(X ) = x B−A (B − A) n + 1 A (B − A)(n + 1) A Exercise 4.19 Part (a): Using the given cdf for this we have 1 P (X ≤ 1) = F (1) = (1 + log(4)) = 0.5965736 . 4 Part (b): This is 3 P (1 ≤ X ≤ 3) = F (3) − F (1) = 4 4 1 1 + log − (1 + log(4)) = 0.369188 . 3 4 Part (c): This is given by 4 x 1 1 1 4 1 + log + − = log . f (x) = F (x) = 4 x 4 x 4 x ′ 117 Exercise 4.20 Part (a): The cdf for this pdf is different in different regions of y. First we have FY (y) = 0 if y < 0. Next we have y Z y y2 1 ξ 2 1 = ξdξ = for 0 < y < 5 . FY (y) = 25 2 0 50 0 25 Next if 5 < y < 10 we have y Z y 25 1 1 2 1 ξ 2 2 FY (y) = + − ξ dξ = + (y − 5) − 50 5 25 2 5 25 2 5 5 2 1 2 1 2 1 = + (y − 5) − y − 25 = y − y 2 − 1 . 2 5 50 5 50 As a sanity check on our work we find FY (10) = 4 − 1 − 2 = 1 as it should. Part (b): If 0 < p < 0.5 then we have to solve 1 2 y = p, 50 √ for y. Solving this we get y = 5 2p. 
If 0.5 < p < 1.0 then again we need to solve FY (y) = p which in this case is given by 1 2 y − y2 − 1 = p . 5 50 Solving this with the quadratic equation we get y = W W XF inishthis Exercise 4.21 Since the area of a circle is πr 2 the expected area is given by Z 11 3 501π 3π 668 2 2 2 2 E(πr ) = πE(r ) = π r = (1 − (10 − r) ) dr = = 314.7876 , 4 4 5 5 9 when we perform the needed integration. Exercise 4.22 Part (a): For this we have −1 x Z x Z x ξ 1 ξ −2 dξ = 2(x − 1) − 2 F (x) = 2 1 − 2 dξ = 2(x − 1) − 2 ξ −1 1 1 1 1 2 = 2(x − 1) − 2 − 1 = 2x + − 4 . x x 118 Lets check this expression for F (x) at a few special points. We have F (1) = 2 + 2 − 4 = 0 2 F (2) = 4 + − 4 = 1 . 2 both of which must be true. Part (b): For this part of the problem we want to find xp such that F (xp ) = p. From the above expression for F (x) this means that 2xp + 2 − 4 = p, xp or p x2p + −2 − xp + 1 = 0 . 2 When we solve this with the quadratic formula we get r p p2 p + . xp = 1 + ± 4 2 16 We need to determine which of the two signs in the above formula to use. Lets check a few “easy cases” [?] and see if they will tell us this information. If p = 0 then from the above we have x0 = 1 which does not tell us the information we wanted. If p = 1 then the above formula gives r 1 1 1 5 3 x1 = 1 + ± + = ± . 4 2 16 4 4 Must take the plus sign so that x1 = 2 otherwise x1 < 2 which is not possible. x1/2 1 2 in the above to compute s 1 1 9 1 3 3 1 1 =µ ˜ =1+ + = + = . + 8 8 16 4 8 2 4 2 To find the median µ ˜ we take p = Part (c): We compute these expectations as Z 2 Z 2 Z 2 dx 1 xdx − 2 E(X) = 2 1 − 2 dx = 2 x x 1 1 1 2 2 2x = − 2 ln(x)|21 = (4 − 1) − 2 ln(2/1) = 3 − 2 ln(2) = 1.613706 . 2 1 Next to compute Var (X) we need to compute E(X 2 ) we do this as Z 2 Z 2 Z 2 1 2 2 2 E(X ) = x 2 1 − 2 dx = 2 x dx − 2 dx x 1 1 1 8 2 2 2 = x3 1 − 2x|21 = (8 − 1) − 2(2 − 1) = . 3 3 3 119 Thus using these two we have 8 − (3 − 2 ln(2))2 = 0.06262078 . 
3 Part (d): We have h(X) = max(1.5 − X, 0) in stock so at the end of the week we expect to have Var (X) = E(h(X)) = max(1.5 − E(X), 0) = max(1.5 − (3 − 2 ln(2)), 0) = max(−0.1137056, 0) = 0 . So none left at the end of the week. Exercise 4.23 When we have F = 1.8C + 32 then E(F ) = 1.8E(C) + 32 = 1.8(120) + 32 = 248.0 . Var (F ) = 1.82 Var (C) = 1.82 (22 ) = 12.96 . Thus SD (F ) = √ 12.96 = 3.6 . Exercise 4.24 Part (a): The expectation of X can be computed as −k+1 ∞ Z ∞ Z ∞ k x kθ k −k k dx = kθ x dx = kθ E(X) = x xk+1 −k + 1 θ θ θ 1 k kθk 0 − k−1 = θ. =− k−1 θ k−1 As long as −k + 1 < 0 or k > 1 so that we can evaluate the integral in the limit as x → ∞. Part (b): If k = 1 this expectation is undefined. Part (c): To compute the variance we need to compute E(X 2 ). We can compute this using Z ∞ Z ∞ Z ∞ k kθ k 2−k−1 k 2 2 dx = kθ E(X ) = x dx = kθ x−k+1 dx x k+1 x θ θ θ −k+2 ∞ k 1 k 2 x = kθ 0 − k−2 = θ . = kθk −k + 2 θ −k + 2 θ k−2 Using this we then have k 2 1 k2 k 2 2 Var (X) = θ − θ = kθ − k−2 (k − 1)2 k − 2 (k − 1)2 1 2 = kθ . (k − 2)(k − 1)2 120 The expression we were trying to show. Part (d): If k = 2 then the variance does not exist. Part (e): If we try to compute E(X n ) then we need to evaluate Z ∞ Z ∞ k kθ k n n dx = kθ xn−k−1 dx . E(X ) = x k+1 x θ θ This integral will only converge if the power on x is “large enough negative”. Specifically we need to have n − k − 1 < −1 for convergence. This is equivalent to k > n. This condition satisfies what we found for for Part (a) and (b) above. Exercise 4.25 Part (a): The distribution function for Y is given by y − 32 y − 32 FY (y) = P {Y ≤ y} = P {1.8X + 32 ≤ y} = P X ≤ = FX . 1.8 1.8 Now µ ˜Y is the number such that FY (˜ µY ) = 0.5. This means that µ ˜Y − 32 FX = 0.5 . 1.8 From the distribution of X this means that µ ˜Y − 32 =µ ˜X , 1.8 so solving for µ ˜Y we get µ ˜ Y = 32 + 1.8˜ µX . Part (b): To find yp we need to solve FY (yp ) = FX yp − 32 1.8 = p. 
Using the distribution of X this means that yp − 32 = xp , 1.8 or yp = 1.8xp + 32 . Part (c): If xp is the p-percentile of the X distribution then yP = axp + b is the p-th percentile of the Y distribution. 121 Exercise 4.28 Using R notation to evaluate all of these we have Part (a): P (0 ≤ Z ≤ 2.17) = pnorm(2.17) − pnorm(0) = 0.4849966 . Part (b): P (0 ≤ Z ≤ 1) = pnorm(1) − pnorm(0) = 0.3413447 . Part (c): pnorm(0) − pnorm(−2.5) = 0.4937903 . Part (d): pnorm(2.5) − pnorm(−2.5) = 0.9875807 . Part (e): pnorm(1.37) = 0.9146565 . Part (f): P (−1.75 ≤ Z) = 1 − P (Z < −1.75) = 1 − pnorm(−1.75) = 0.9599408 . Part (g): pnorm(2.0) − pnorm(−1.5) = 0.9104427 . Part (h): pnorm(2.5) − pnorm(1.37) = 0.07913379 . Part (i): P (1.5 ≤ Z) = 1 − P (Z < 1.5) = 1 − pnorm(1.5) = 0.0668072 . Part (j): P (|Z| ≤ 2.5) = P (−2.5 < Z < +2.5) = pnorm(2.5) − pnorm(−2.5) = 0.9875807 . Exercise 4.29 Using R notation/functions we can answer these questions Part (a): qnorm(0.9838) = 2.139441 . 122 Part (b): pnorm(c) − pnorm(0) = 0.291 , so pnorm(c) = 0.291 + pnorm(0) = 0.791 so c = qnorm(0.791) = 0.8098959 . Part (c): P (c ≤ Z) = 0.121 or 1 − P (Z ≤ c) = 0.121 or P (Z ≤ c) = 0.879 . Thus c = qnorm(0.879) = 1.170002. Part (d): P (−c ≤ Z ≤ c) = 1 − 2Φ(−c) = 0.668 so Φ(−c) = 0.166 , so c = −qnorm(0.166) = 0.9700933. Part (e): P (c ≤ |Z|) = 0.016 or 1 − P (|Z| ≤ c) = 0.016 or P (|Z| ≤ c) = 0.984 . Then following the same steps as in Part (d) we find c = 2.408916. Exercise 4.30 Using R notation for these we have qnorm( c(0.91, 0.09, 0.75, 0.25, 0.06) ) [1] 1.3407550 -1.3407550 0.6744898 -0.6744898 -1.5547736 Exercise 4.31 Using R notation we have qnorm( c( 1-0.0055, 1-0.09, 1-0.663 ) ) [1] 2.5426988 1.3407550 -0.4206646 Exercise 4.32 In R notation these are given by • pnorm((100 − 80)/10) = 0.9772499. 123 • pnorm((80 − 80)/10) = 0.5. • pnorm((100 − 80)/10) − pnorm((65 − 80)/10) = 0.9104427. • 1 − pnorm((70 − 80)/10) = 0.8413447. • pnorm((95 − 80)/10) − pnorm((85 − 80)/10) = 0.2417303. 
• pnorm((90 − 80)/10) − pnorm((70 − 80)/10) = 0.6826895. Exercise 4.33 In R notation these are given by Part (a): P (X < 18) = pnorm((18−15)/1.25) = 0.9918025. Part (b): pnorm((12 − 15)/1.25) − pnorm((10 − 15)/1.25) = 0.008165865. Part (c): This would be P (|X − 15| ≤ 1.5(1.25)) = P (15 − 1.875 < X < 15 + 1.875) = P (13.125 < X < 16.875) = pnorm((16.875 − 15)/1.25) − pnorm((13.125 − 15)/1.25) = 0.8663856 . Exercise 4.34 In R notation these are given by Part (a): P (X > 0.25) = 1 − P (X < 0.25) = 1 − pnorm((0.25 − 0.3)/0.06) = 0.7976716. Part (b): P (X < 0.1) = pnorm((0.1 − 0.3)/0.06) = 0.0004290603. Part (c): We would want a value of t such that P (Z > t) = 0.05. This is equivalent to P (Z < t) = 0.95. This has a solution t = qnorm(0.95) = 1.644854. In terms of X this means that the largest 5% of concentration values will satisfy X − 0.3 > 1.644854 so X > 0.3986912 . 0.06 Exercise 4.35 Part (a): These two wordings are the same and are computed as P (X > 10) = 1 − P (X < 10) = 1 − pnorm((10 − 8.8)/2.8) = 0.3341176. Part (b): P (X > 20) = 1 − P (X < 20) = 1 − pnorm((20 − 8.8)/2.8) = 3.167124 10−5. 124 Part (c): This would be P (5 < X < 10) = pnorm((10 − 8.8)/2.8) − pnorm((5 − 8.8)/2.8) = 0.5785145. Part (d): We want the value of c such that P (8.8 − c < X < 8.8 + c) = 0.98 . Converting to a standard normal variable Z this is equivalent to 8.8 + c − 8.8 8.8 − c − 8.8 = 0.98 or <Z< P 2.8 2.8 +c −c = 0.98 or <Z< P 2.8 2.8 c 1 − 2Φ − = 0.98 or 2.8 c Φ − = 0.98 . 2.8 This last equation has the solution − c = qnorm(0.01) = −2.326348 so c = 6.513774 . 2.8 Part (e): From Part (a) we have P (X > 10) = 0.3341176. The event that at least one tree has a diameter exceeding 10 is the complement of the event that none of the selected trees has a diameter that large. Thus the probability we seek is 1 − (1 − P (X > 10))4 = 1 − (1 − 0.3341176)4 = 0.803397 . 
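The Exercise 4.35 values can be reproduced without R by building the standard normal cdf from the error function. This is a minimal sketch (the helper name pnorm is ours, chosen to mirror the R calls above):

```python
from math import erf, sqrt

def pnorm(z):
    # Standard normal CDF via the error function (mirrors R's pnorm).
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 8.8, 2.8
p_a = 1 - pnorm((10 - mu) / sigma)  # Part (a): ~ 0.3341176
p_b = 1 - pnorm((20 - mu) / sigma)  # Part (b): ~ 3.167e-05
p_e = 1 - (1 - p_a) ** 4            # Part (e): ~ 0.803397
print(p_a, p_b, p_e)
```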
Exercise 4.36 Part (a): For these we would have 1500 − 1050 = 0.9986501 and P (X < 1500) = Φ 150 1000 − 1050 P (X > 1000) = 1 − P (X < 1000) = 1 − Φ = 0.6305587 . 150 Part (b): This is given by 1500 − 1050 P (1000 < X < 1500) = Φ 150 −Φ 1000 − 1050 150 = 0.6292088 . Part (c): For this part we want to find a t such that P (X < t) = 0.02 or t − 1050 X − 1050 = 0.02 . < P 150 150 125 In R notation this becomes t − 1050 = qnorm(0.02) 150 Solving for t gives t = 741.9377. Part (d): Now let p be defined as p = P (X > 1500) = 1 − P (X < 1500) = 0.001349898 , using the result from Part (a). Then if N is the number of droplets (from five) that have a size greater than 1500 µm we can compute P (N > 1) as 5 0 p (1 − p)5 = 0.006731292 . P (N > 1) = 1 − P (N = 0) = 1 − 0 Exercise 4.37 Part (a): Now we have P (X = 105) = 0 since X is a continuous. Next we compute 105 − 104 P (X < 105) = Φ = 0.5792597 . 5 The statement “X is at most 105” is the same condition as X < 105 and the probability we seek is the same as the one above. Part (b): We can compute this using |X − µ| |X − µ| > 1 = 1−P < 1 = 1−(Φ(1)−Φ(−1)) = 0.3173105 . P (|X−µ| > σ) = P σ σ Part (c): We would want to find the value of t such that |X − µ| P > t = 0.001 . σ This means that t is the solution to −qnorm(0.001/2) = 3.290527. Once we have this value of t the extream values of X are given by X < µ − σt = 87.54737 and X > µ + σt = 120.45263. Exercise 4.46 Part (a): This would be P (67 < X < 75) = pnorm((75 − 70)/3) − pnorm((67 − 70)/3) = 0.7935544 . 126 Part (b): We want a value of c such that c c −Φ − P (70 − c < X < 70 + c) = Φ 3 3c c c =Φ − 1−Φ = 2Φ −1, 3 3 3 equals 0.95. Solving for c in the above we get c = 5.879892. Part (c): The number of acceptable specimens is a binomial random variable which has an expectation of np or 100.95 = 9.5. Part (d): Following the hint we have p = P (X < 73.84) = pnorm((73.84 − 70)/3) = 0.8997274 and we want to evaluate P (Y ≤ 8) = 8 X dbinom(y, 10, 0.8997274) = 0.2649573 . 
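The two binomial computations above, P(N >= 1) in Exercise 4.36 Part (d) and P(Y <= 8) in Exercise 4.46 Part (d), can be cross-checked from the pmf directly (a Python sketch of the same calculations, not part of the original R workflow):

```python
from math import comb

# Exercise 4.46 Part (d): P(Y <= 8) for Y ~ Bin(10, 0.8997274).
p46 = 0.8997274
p_le_8 = sum(comb(10, y) * p46 ** y * (1 - p46) ** (10 - y) for y in range(9))

# Exercise 4.36 Part (d): P(N >= 1) = 1 - (1-p)^5 for N ~ Bin(5, p).
p36 = 0.001349898
p_ge_1 = 1 - (1 - p36) ** 5

print(p_le_8, p_ge_1)
```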
y=0 Exercise 4.47 We want the value of c such that P (X + 1 < c) = 0.99 or P (X < c − 1) = 0.99. This means that c − 1 − 12 P Z< = 0.99 , 3.5 so c − 13 = qnorm(0.99) = 2.326348 . 3.5 and c = 21.14222. Exercise 4.48 To solve these we will use Φ(−c) = 1 − Φ(c) . (14) Part (a): We have P (−1.72 < Z < −0.55) = Φ(−0.55)−Φ(−1.72) = (1−Φ(0.55))−(1−Φ(1.72)) = 0.2484435 . Part (b): We have P (−1.75 < Z < 0.55) = Φ(0.55) − Φ(−1.75) = Φ(0.55) − (1 − Φ(1.75)) = 0.6687812 . 127 Exercise 4.98 Part (a): P (10 ≤ X ≤ 20) = Part (b): P (X ≥ 10) = Part (c): This would be R 25 R 20 1 dξ 10 25 1 dξ 10 25 F (x) = = Z 0 = 1 (25 25 x 1 (20 25 − 10) = 52 . − 10) = 53 . 1 dξ = 25 x 25 0 0 ≤ x ≤ 25 . otherwise Part (d): Using the formulas for the mean and variance of a uniform distribution we have 1 E(X) = (25 + 0) = 12.5 . 2 Var (X) = 1 (25 − 0)2 = 52.08333 . 12 So σX = 7.216878. 128 Y =0 Y =1 Y =2 Y =3 Y =4 X = 0 0.6(0.5) = 0.3 0.1(0.5) = 0.05 0.1(0.5) = 0.05 0.1(0.5) = 0.05 0.1(0.5) = 0.05 X = 1 0.6(0.3) = 0.18 0.1(0.3) = 0.03 0.05(0.3) = 0.015 0.05(0.3) = 0.015 0.2(0.3) = 0.06 X = 2 0.6(0.2) = 0.12 0.1(0.2) = 0.02 0.05(0.2) = 0.01 0.05(0.2) = 0.01 0.2(0.2) = 0.04 Table 1: The joint probability distribution requested in Exercise 5.2 Part (b). Joint Probability Distributions and Random Sampling Problem Solutions Exercise 5.1 Part (a): 0.02 Part (b): This would be 0.1 + 0.04 + 0.08 + 0.2 = 0.42. Part (c): The combined condition {X 6= 0 and Y 6= 0} is the event that there is a person at each pump. We can then compute P {X 6= 0 and Y 6= 0} = 0.2 + 0.06 + 0.14 + 0.3 = 0.7 . Part (d): For pX (x) we would compute (for the values of x ∈ {0, 1, 2}) 0.16 , 0.34 , 0.5 . For pY (y) we would compute (for the values of y ∈ {0, 1, 2}) 0.24 , 0.38 , 0.38 . Using these we can compute P (X ≤ 1) = pX (0) + pX (1) = 0.16 + 0.34 = 0.5 . Part (e): To be independent we would need to check if p(x, y) = pX (x)pY (y) for all x and y. Consider x = 0 and y = 0 then from the table we have p(0, 0) = 0.1. 
Does this equal pX (0)pY (0) = 0.16(0.24) = 0.0384 . As these two numbers are not equal we have that X and Y are not independent. Exercise 5.2 Part (a): See Table 1 for the requested table. 129 x1 pX1 (x1 ) 0 1 2 3 4 0.19 0.3 0.25 0.14 0.12 Table 2: The marginal probability distribution pX1 (x1 ) requested in Problem 4 Part (a). Part (b): From the table above we have P (X ≤ 1 and Y ≤ 1) = 0.3 + 0.05 + 0.18 + 0.03 = 0.56 , while at the same time we have P (X ≤ 1)P (Y ≤ 1) = (0.5 + 0.3)(0.6 + 0.1) = 0.56 , which are the same as they should be. Part (c): P (X + Y = 0) = P (X = 0 and Y = 0) = 0.3. Part (d): We have P (X + Y ≤ 1) = P (X = 0 and Y = 0) + P (X = 0 and Y = 1) + P (X = 1 and Y = 0) = 0.3 + 0.05 + 0.18 = 0.53 . Exercise 5.3 Part (a): 0.15. Part (b): P (X1 = X2 ) = 0.08 + 0.15 + 0.1 + 0.07 = 0.4. Part (c): This would be P (A) = P {X1 − X2 ≥ 2 or X2 − X1 ≥ 2} = P {|X1 − X2 | ≥ 2} = P (0, 2) + P (0, 3) + P (1, 3) + P (2, 0) + P (3, 1) + P (3, 0) + P (4, 2) + P (4, 1) + P (4, 0) = 0.04 + 0.00 + 0.04 + 0.05 + 0.03 + 0.00 + 0.05 + 0.01 + 0.00 = 0.22 . Part (d): The first part would be P {X1 + X2 = 4} = P (1, 3) + P (2, 2) + P (3, 1) + P (4, 0) = 0.04 + 0.1 + 0.03 + 0.00 = 0.17 . Exercise 5.4 Part (a): See Table 2 for a tabular representation of pX1 (x1 ). Using that table we can compute E[X1 ] = 0(0.19) + 1(0.3) + 2(0.25) + 3(0.14) + 4(0.12) = 1.7 . 130 x2 0 1 2 pX2 (x2 ) 0.19 0.3 0.28 3 0.23 Table 3: The marginal probability distribution pX2 (x2 ) requested in Problem 4 Part (b). x pX (x) 0 1 0.1 0.2 2 3 4 0.3 0.25 0.15 Table 4: The marginal probability distribution pX (x) requested in Problem 5 Part (a). Part (b): See Table 3 for a tabular representation of pX2 (x2 ). Part (c): We have P (X1 = 4, X2 = 0) = 0 while the product P (X1 = 4)P (X2 = 0) = 0.12(0.19) = 0.0228 6= 0 and thus the random variables X1 and X2 are not independent. 
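The expectation in Exercise 5.4 Part (a) is easy to re-verify from the marginal pmf of Table 2, which also confirms the marginal probabilities sum to one:

```python
# Exercise 5.4 Part (a): E[X1] from the marginal pmf in Table 2.
x1_vals = [0, 1, 2, 3, 4]
probs = [0.19, 0.30, 0.25, 0.14, 0.12]
mean_x1 = sum(x * p for x, p in zip(x1_vals, probs))
print(mean_x1)  # 1.7
```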
Exercise 5.5 Recall that X is the number of customers waiting in line and each customer can have 1,2, or 3 packages each with probabilities 0.6, 0.3, 0.1. The random variable Y is the total number of packages. To help solving this problem see Table 4 for a tabular representation of pX (x), and Table 5 for a tabular representation of pY (y). Part (a): P (X = 3, Y = 3) = P (Y = 3|X = 3)P (X = 3) = P (probability each customer has only one package to be wrapped)P (X = 3) = 0.63 (0.25) = 0.054 . Part (b): We want to compute P (X = 4, Y = 11) which equals the probability all but one customer in line has three packages to be wrapped and the one customer with less than three packages to be wrapped has two packages to be wrapped. This probability is given by 4 3 0.1 0.3 0.15 = 0.00018 . P (Y = 11|X = 4)P (X = 4) = 1 1 2 3 y pY (y) 0.6 0.3 0.1 Table 5: The marginal probability distribution pY (y) requested in Problem 5 Part (a). 131 Exercise 5.6 Part (a): Using the hint we have 4 (0.6)2 (0.4)2 · 0.15 = 0.05184 . P (X = 4, Y = 2) = P (Y = 2|X = 4)P (X = 4) = 2 Part (b): We have P (X = Y ) = 4 X P (Y = k|X = k)P (X = k) k=0 1 2 3 2 = 0.1 + 0.2 0.6 + 0.3 (0.6) + 0.25 (0.6)3 + 0.15(0.6)4 1 2 3 = 0.40144 . Part (c): We have n m n−m P (X = n) . 0.6 0.4 P (X = n, Y = m) = P (Y = m|X = n)P (X = n) = m To compute this as a table we would let n ∈ {0, 1, 2, 3, 4} and 0 ≤ m ≤ n and evaluate the above expression. The marginal probability mass function for Y can be computed once we have evaluated P (X = n, Y = m) above. For fY (Y = m) we would compute fY (Y = m) = 4 X P (X = n, Y = m) for m = 0, 1, 2, 3, 4 . n=m Exercise 5.7 Recall that X is the number of cars and Y is the number of buses at the proposed left-turn lane. Part (a): From the given table we have P (X = 1, Y = 1) = 0.03. Part (b): We have P (X ≤ 1, Y ≤ 1) = 0.025 + 0.015 + 0.05 + 0.03 = 0.12. 
Part (c): We have for one car
\[ P(X = 1) = \sum_{y=0}^{2} P(X = 1, Y = y) = 0.05 + 0.03 + 0.02 = 0.1\,, \]
and for one bus
\[ P(Y = 1) = \sum_{x=0}^{5} P(X = x, Y = 1) = 0.015 + 0.03 + 0.075 + 0.09 + 0.06 + 0.03 = 0.3\,. \]
Note this is the sum of the numbers in the column with $Y = 1$.

Part (d): Summing over the cases where there would be overflow we have
\[ P(\text{capacity is exceeded}) = \sum_{x=3}^{5} P(X = x, Y = 1) + \sum_{x=0}^{5} P(X = x, Y = 2) \]
\[ = 0.09 + 0.06 + 0.03 + 0.01 + 0.02 + 0.05 + 0.06 + 0.04 + 0.02 = 0.38\,. \]

Part (e): From the given table $p_X(x)$ is the sum along rows and $p_Y(y)$ the sum along columns. To be independent we need $p_{X,Y}(x,y) = p_X(x)p_Y(y)$ for all $x$ and $y$. Considering one specific case, from Part (a) we have $P(X = 1, Y = 1) = 0.03$, while
\[ p_X(1)p_Y(1) = (0.05 + 0.03 + 0.02)(0.015 + 0.03 + 0.075 + 0.09 + 0.06 + 0.03) = 0.1(0.3) = 0.03\,. \]
Note that these two results are equal. To fully show independence one would need to verify this calculation for all $x$ and $y$.

Exercise 5.8

Part (a): We have
\[ p(3,2) = \frac{\binom{8}{3}\binom{10}{2}\binom{12}{1}}{\binom{30}{6}}\,. \]

Part (b): For general $x$ and $y$ we would have
\[ p(x,y) = \frac{\binom{8}{x}\binom{10}{y}\binom{12}{6-x-y}}{\binom{30}{6}}\,, \]
for $0 \le x$, $0 \le y$, and $x + y \le 6$. Note that we must have $x + y + z = 6$, where $z$ is the number of components selected from the third supplier.

Exercise 5.9

Part (a): We must select $K$ to ensure that
\[ \int_{20}^{30}\int_{20}^{30} f(x,y)\,dx\,dy = 1\,. \]
The left-hand side of this can be evaluated as follows
\[ K\int_{20}^{30}\int_{20}^{30} (x^2 + y^2)\,dx\,dy = K\int_{20}^{30}\left(\frac{30^3 - 20^3}{3} + 10y^2\right)dy = \frac{10K}{3}(19000) + \frac{10K}{3}(19000) = \frac{20K}{3}(19000) = 1\,. \]
So solving for $K$ we get $K = \frac{3}{380000} = 7.894737\times 10^{-6}$.

Part (b): Using the above value of $K$ we find
\[ P(20 \le X \le 26,\ 20 \le Y \le 26) = K\int_{20}^{26}\int_{20}^{26}(x^2+y^2)\,dx\,dy = 38304K = 0.3024\,. \]
Part (c): This would be
\[ P(|Y - X| \le 2) = 1 - P(Y \ge X + 2) - P(Y \le X - 2) = 1 - \int_{x=20}^{28}\int_{y=x+2}^{30} K(x^2+y^2)\,dy\,dx - \int_{x=22}^{30}\int_{y=20}^{x-2} K(x^2+y^2)\,dy\,dx\,. \]
Evaluating the two integrals gives
\[ 1 - 40576K - 40576K = 1 - 81152K = 1 - 0.6406737 = 0.3593263\,. \]

Part (d): We would need to compute
\[ f_X(x) = \int_{20}^{30} K(x^2+y^2)\,dy = K\left[x^2 y + \frac{y^3}{3}\right]_{y=20}^{30} = K\left(10x^2 + \frac{30^3 - 20^3}{3}\right) = K\left(10x^2 + \frac{19000}{3}\right)\,, \tag{15} \]
for $20 \le x \le 30$. Note that as $X$ and $Y$ enter $f(x,y)$ symmetrically, the above functional form is also the functional form for the marginal distribution of the pressure in the left tire, i.e. $f_Y(y)$.

Part (e): As the functional expression for the joint density $f(x,y) = K(x^2+y^2)$ does not factor into a function of $x$ alone multiplied by a function of $y$ alone, we conclude that the random variables $X$ and $Y$ are not independent.

Exercise 5.10

Part (a): We would have $f_{X,Y}(x,y) = 1$ when $5 \le x \le 6$ and $5 \le y \le 6$, and zero otherwise.

Part (b): This would be calculated as
\[ P(5.25 \le X \le 5.75 \text{ and } 5.25 \le Y \le 5.75) = 0.5^2 = 0.25\,. \]

Part (c): Following the hint we would need to evaluate
\[ P\left(|X - Y| \le \tfrac{1}{6}\right) = 1 - 2\cdot\frac{1}{2}\left(1 - \tfrac{1}{6}\right)^2 = 1 - \left(\tfrac{5}{6}\right)^2 = 1 - \tfrac{25}{36} = \tfrac{11}{36}\,. \]
Note that since our probability density is constant, in evaluating the above probability we can use the geometric view of the integration region (i.e. that it is a square region with two right triangles, each of area $\frac{1}{2}(\frac{5}{6})^2$, removed).

Exercise 5.11

Part (a): By independence we would have
\[ p_{X,Y}(x,y) = \frac{\lambda^x e^{-\lambda}}{x!}\cdot\frac{\theta^y e^{-\theta}}{y!}\,, \]
for $x \ge 0$ and $y \ge 0$.

Part (b): This would be
\[ P(\text{at most one error}) = P(X=0,Y=0) + P(X=0,Y=1) + P(X=1,Y=0) = e^{-\lambda-\theta} + e^{-\lambda-\theta}(\theta + \lambda)\,. \]

Part (c): For the event $A$ defined in the problem we have
\[ P(A) = P\{(X,Y) : X + Y = m\} = \sum_{x=0}^{m} P(X = x, Y = m - x) = e^{-\lambda-\theta}\sum_{x=0}^{m} \frac{\lambda^x \theta^{m-x}}{x!(m-x)!}\,. \]
Note that we can write $\frac{1}{x!(m-x)!} = \frac{1}{m!}\binom{m}{x}$,
so that the above expression for $P(A)$ becomes
\[ \frac{e^{-\lambda-\theta}}{m!}\sum_{x=0}^{m}\binom{m}{x}\lambda^x\theta^{m-x} = \frac{e^{-\lambda-\theta}}{m!}(\lambda+\theta)^m\,, \]
using the binomial theorem. Notice that this is the pmf of a Poisson random variable with parameter $\lambda + \theta$. Let us check that this result gives the same as we obtained in Part (b) of this problem. In that part the probability we want to compute is
\[ P\{m = 0\} + P\{m = 1\} = e^{-\lambda-\theta} + e^{-\lambda-\theta}(\lambda + \theta)\,, \]
which is the same expression that we got earlier.

Exercise 5.12

Part (a): To solve this we will first compute the pdf of $X$. To do this we have
\[ f_X(x) = \int_0^\infty f(x,y)\,dy = \int_0^\infty x e^{-x(1+y)}\,dy = x e^{-x}\left[\frac{e^{-xy}}{-x}\right]_{y=0}^{\infty} = -e^{-x}(0-1) = e^{-x}\,, \]
for $x \ge 0$. From this we want to compute
\[ P(X \ge 3) = 1 - P(X < 3) = 1 - \int_0^3 e^{-x}\,dx = 1 + \left[e^{-x}\right]_0^3 = 1 + (e^{-3} - 1) = e^{-3}\,. \]

Part (b): Note that we computed $f_X(x)$ above. Now to compute $f_Y(y)$ we have to evaluate (integrating by parts)
\[ f_Y(y) = \int_0^\infty x e^{-x(1+y)}\,dx = \left[\frac{x e^{-x(1+y)}}{-(1+y)}\right]_{x=0}^{\infty} + \frac{1}{1+y}\int_0^\infty e^{-x(1+y)}\,dx = \frac{1}{(1+y)^2}\,. \]
Now for $X$ and $Y$ to be independent we would need to have $f_{X,Y}(x,y) = f_X(x)f_Y(y)$, which is not true in this case. Thus $X$ and $Y$ are not independent.

Part (c): For this part we want to evaluate
\[ P(X \ge 3 \text{ or } Y \ge 3) = 1 - P(X \le 3 \text{ and } Y \le 3) = 1 - \int_0^3\int_0^3 x e^{-x(1+y)}\,dy\,dx\,. \]
Since $\int_0^3 e^{-xy}\,dy = \frac{1 - e^{-3x}}{x}$, the double integral equals $\int_0^3 (e^{-x} - e^{-4x})\,dx$, and so
\[ P(X \ge 3 \text{ or } Y \ge 3) = 1 - \int_0^3 (e^{-x} - e^{-4x})\,dx = 1 + (e^{-3} - 1) - \frac{1}{4}(e^{-12} - 1) = \frac{1}{4} + e^{-3} - \frac{1}{4}e^{-12}\,. \]

Exercise 5.13

Part (a): Since $X$ and $Y$ are independent random variables we have
\[ f(x,y) = \lambda e^{-\lambda x}\,\lambda e^{-\lambda y} = e^{-x-y}\,, \]
when we take $\lambda = 1$.

Part (b): For this we want to calculate
\[ P(X \le 1 \text{ and } Y \le 1) = \lambda^2\int_0^1\int_0^1 e^{-\lambda x} e^{-\lambda y}\,dx\,dy = (1 - e^{-\lambda})^2 = (1 - e^{-1})^2 = 0.3995764\,. \]
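The claim above (Exercise 5.11 Part (c)) — that $X + Y$ is again Poisson, now with parameter $\lambda + \theta$ — can be checked numerically by convolving the two pmfs; the rate values below are arbitrary illustrative choices:

```python
import math

def poisson_pmf(k, mu):
    # P(N = k) for a Poisson(mu) random variable
    return math.exp(-mu) * mu**k / math.factorial(k)

lam, theta = 2.0, 3.5   # arbitrary illustrative rates

def pmf_of_sum(m):
    # P(X + Y = m) = sum_x P(X = x) P(Y = m - x), using independence
    return sum(poisson_pmf(x, lam) * poisson_pmf(m - x, theta)
               for x in range(m + 1))
```

For every $m$, `pmf_of_sum(m)` agrees with `poisson_pmf(m, lam + theta)` up to floating-point error, which is exactly the binomial-theorem collapse shown above.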
Part (c): Following the hint, if we draw the region $A$ then the probability $P$ that the total lifetime of the two bulbs is at most two is given by the integral
\[ P = \int_{x=0}^{2}\int_{y=0}^{2-x} \lambda^2 e^{-\lambda(x+y)}\,dy\,dx\,. \]
The inner integral is $\lambda e^{-\lambda x}\left(1 - e^{-\lambda(2-x)}\right)$, so
\[ P = \int_0^2 \left(\lambda e^{-\lambda x} - \lambda e^{-2\lambda}\right)dx = (1 - e^{-2\lambda}) - 2\lambda e^{-2\lambda} = 1 - e^{-2\lambda} - 2\lambda e^{-2\lambda} = 0.5939942\,, \]
when we take $\lambda = 1$.

Part (d): If we draw this region in the $X$-$Y$ plane then this probability can be expressed as an integral of the joint probability density function. As such we would need to evaluate
\[ P\{1 \le X + Y \le 2\} = \int_{x=0}^{1}\int_{y=1-x}^{2-x} \lambda^2 e^{-\lambda x} e^{-\lambda y}\,dy\,dx + \int_{x=1}^{2}\int_{y=0}^{2-x} \lambda^2 e^{-\lambda x} e^{-\lambda y}\,dy\,dx\,. \]
The first integral evaluates to $\lambda(e^{-\lambda} - e^{-2\lambda})$ and the second to $(e^{-\lambda} - e^{-2\lambda}) - \lambda e^{-2\lambda}$, so that
\[ P\{1 \le X + Y \le 2\} = (\lambda + 1)e^{-\lambda} - (2\lambda + 1)e^{-2\lambda} = 0.329753\,, \]
when we take $\lambda = 1$.

Exercise 5.14

Part (a): Following Example 5.11 in this section we have
\[ f(x_1, x_2, \ldots, x_{10}) = \lambda^{10} e^{-\lambda\sum_{i=1}^{10} x_i}\,, \]
for $x_i \ge 0$. Now we want to evaluate
\[ P(X_1 \le t, X_2 \le t, \ldots, X_{10} \le t) = \int_0^t\cdots\int_0^t \lambda^{10} e^{-\lambda\sum_{i=1}^{10} x_i}\,dx_1\cdots dx_{10} = \prod_{i=1}^{10}\int_0^t \lambda e^{-\lambda x_i}\,dx_i = (1 - e^{-\lambda t})^{10}\,. \]

Part (b): For this part we want exactly $k$ relationships of the form $X_i \le t$ and then $10 - k$ relationships of the form $X_i \ge t$. The probability is then given by
\[ \binom{10}{k}(1 - e^{-\lambda t})^k (e^{-\lambda t})^{10-k}\,, \]
where the factor $\binom{10}{k}$ is needed since that is the number of ways we can select the $k$ light bulbs that will fail.
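The binomial expression in Exercise 5.14 Part (b) can be sanity-checked numerically; $\lambda$ and $t$ below are arbitrary illustrative values, with $p = 1 - e^{-\lambda t}$ the failure probability of a single bulb:

```python
import math

lam, t = 0.5, 2.0            # illustrative parameter values
p = 1 - math.exp(-lam * t)   # P(X_i <= t) for an exponential(lam) lifetime

def exactly_k(k, n=10):
    # Exercise 5.14 Part (b): exactly k of the n bulbs fail by time t
    return math.comb(n, k) * p**k * (1 - p)**(n - k)
```

Summing `exactly_k(k)` over $k = 0,\ldots,10$ recovers 1, and the $k = 10$ term reduces to the $(1-e^{-\lambda t})^{10}$ of Part (a).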
Part (c): As in the previous part we still need to have five relationships of the form $X_i \le t$ and five others of the form $X_i \ge t$. To evaluate the total probability we can condition on whether the single bulb with parameter $\theta$ is in the set that fails before $t$ or in the set that fails after $t$. The probability that it is in either set is $\frac{1}{2}$. Thus we get
\[ \frac{1}{2}\binom{9}{5}(1 - e^{-\lambda t})^5 (e^{-\lambda t})^4\, e^{-\theta t} + \frac{1}{2}\binom{9}{4}(1 - e^{-\lambda t})^4 (e^{-\lambda t})^5\, (1 - e^{-\theta t})\,. \]

Exercise 5.15

Part (a): From the drawing (and the hint) we can write
\[ F(y) = P(Y \le y) = P(\{X_1 \le y\} \cup (\{X_2 \le y\} \cap \{X_3 \le y\})) \]
\[ = P(X_1 \le y) + P(X_2 \le y, X_3 \le y) - P(X_1 \le y, X_2 \le y, X_3 \le y) \]
\[ = (1 - e^{-\lambda y}) + (1 - e^{-\lambda y})^2 - (1 - e^{-\lambda y})^3 = 1 - 2e^{-2\lambda y} + e^{-3\lambda y}\,, \]
when we expand and simplify. Let us check some properties of $F(y)$ to verify that the above gives reasonable results. Note that $F(0) = 1 - 2 + 1 = 0$ and that $\lim_{y\to\infty} F(y) = 1$, as a cumulative distribution function should.

Part (b): Using the expression for $F(y)$ we find
\[ f(y) = F'(y) = 4\lambda e^{-2\lambda y} - 3\lambda e^{-3\lambda y}\,. \]
For the expectation of $Y$, using $\int_0^\infty y e^{-ay}\,dy = \frac{1}{a^2}$, we then find
\[ E[Y] = \int_0^\infty y\left(4\lambda e^{-2\lambda y} - 3\lambda e^{-3\lambda y}\right)dy = \frac{4\lambda}{(2\lambda)^2} - \frac{3\lambda}{(3\lambda)^2} = \frac{1}{\lambda} - \frac{1}{3\lambda} = \frac{2}{3\lambda}\,. \]

Exercise 5.16

Part (a): In Example 5.10 we are given $f(x_1, x_2, x_3)$ and in this part we want to compute $f(x_1, x_3)$ by "integrating out" $x_2$. We can do that as follows
\[ f(x_1, x_3) = \int_{x_2=0}^{1-x_1-x_3} k\,x_1 x_2 (1 - x_3)\,dx_2 = k\,x_1(1 - x_3)\left[\frac{x_2^2}{2}\right]_0^{1-x_1-x_3} = \frac{k\,x_1 (1 - x_3)(1 - x_1 - x_3)^2}{2}\,, \]
for $x_1 \ge 0$, $x_3 \ge 0$, and $x_1 + x_3 \le 1$. Here, as derived in Example 5.10, we have $k = 144$.

Part (b): We want $P(X_1 + X_3 < 0.5)$, which we calculate as
\[ P(X_1 + X_3 < 0.5) = \int_{x_1=0}^{0.5}\int_{x_3=0}^{0.5-x_1} \frac{k\,x_1}{2}(1 - x_3)(1 - x_1 - x_3)^2\,dx_3\,dx_1 = 72\int_{x_1=0}^{0.5}\int_{x_3=0}^{0.5-x_1} x_1(1 - x_3)(1 - x_1 - x_3)^2\,dx_3\,dx_1\,, \]
which would need to be integrated.
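The double integral left unevaluated in Exercise 5.16 Part (b) can be approximated with a midpoint Riemann sum; the grid step below is an arbitrary accuracy/speed trade-off, and the event is passed in as a predicate so the same routine also verifies that the marginal density integrates to one:

```python
# Midpoint Riemann sum for probabilities under the marginal density
# f(x1, x3) = 72 x1 (1 - x3) (1 - x1 - x3)^2 on x1, x3 >= 0, x1 + x3 <= 1.
def f(x1, x3):
    return 72.0 * x1 * (1.0 - x3) * (1.0 - x1 - x3) ** 2

def riemann(event, h=2e-3):
    # Sum f over midpoints of an h-by-h grid restricted to the support
    # and to the region described by the predicate `event`.
    total = 0.0
    n = int(round(1.0 / h))
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x3 = (j + 0.5) * h
            if x1 + x3 <= 1.0 and event(x1, x3):
                total += f(x1, x3)
    return total * h * h

prob = riemann(lambda x1, x3: x1 + x3 < 0.5)
```

Carrying the integration out analytically gives $P(X_1 + X_3 < 0.5) = \frac{17}{32} = 0.53125$, and the grid sum lands within a few thousandths of that.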
Part (c): For this part we have
\[ f_{X_1}(x_1) = \int_{x_3=0}^{1-x_1} f(x_1, x_3)\,dx_3 = 72\int_{x_3=0}^{1-x_1} x_1(1 - x_3)(1 - x_1 - x_3)^2\,dx_3 = 6x_1(1 - x_1)^3(x_1 + 3)\,, \]
when we integrate.

Exercise 5.17

Part (a): Using the "area" representation of probability we have
\[ \frac{\pi (R/2)^2}{\pi R^2} = \frac{1}{4} = 0.25\,. \]

Part (b): This would be
\[ P\left(|X| \le \frac{R}{2} \cap |Y| \le \frac{R}{2}\right) = \frac{R^2}{\pi R^2} = \frac{1}{\pi} = 0.3183099 > 0.25\,. \]

Part (c): The probability is then
\[ \frac{(\sqrt{2}\,R)^2}{\pi R^2} = \frac{2}{\pi} = 0.6366198\,. \]

Part (d): The marginal pdfs of $X$ and $Y$ are given by
\[ f_X(x) = \int_{y=-\sqrt{R^2-x^2}}^{+\sqrt{R^2-x^2}} \frac{1}{\pi R^2}\,dy = \frac{2\sqrt{R^2-x^2}}{\pi R^2}\,, \qquad f_Y(y) = \int_{x=-\sqrt{R^2-y^2}}^{+\sqrt{R^2-y^2}} \frac{1}{\pi R^2}\,dx = \frac{2\sqrt{R^2-y^2}}{\pi R^2}\,. \]
To have $X$ and $Y$ independent would mean that $f_{X,Y}(x,y) = f_X(x)f_Y(y)$. From the expressions for these pdfs above we see that this is not true.

Exercise 5.18

Part (a): We have
\[ p_{Y|X}(0|1) = \frac{p_{X,Y}(1,0)}{p_X(1)} = \frac{0.08}{0.08+0.2+0.06} = \frac{0.08}{0.34} = 0.2352941\,, \]
\[ p_{Y|X}(1|1) = \frac{p_{X,Y}(1,1)}{p_X(1)} = \frac{0.2}{0.34} = 0.5882353\,, \qquad p_{Y|X}(2|1) = \frac{p_{X,Y}(1,2)}{p_X(1)} = \frac{0.06}{0.34} = 0.1764706\,. \]

Part (b): We are told that $X = 2$ and we want to evaluate $p_{Y|X}(y|2)$. We have
\[ p_{Y|X}(y|2) = \frac{p_{X,Y}(2,y)}{p_X(2)} = \frac{p_{X,Y}(2,y)}{0.06+0.14+0.3} = \frac{p_{X,Y}(2,y)}{0.5}\,. \]
When we evaluate the above for $y \in \{0,1,2\}$ we get the values 0.12, 0.28, 0.60.

Part (c): This would be
\[ P(Y \le 1 \mid X = 2) = \sum_{y=0}^{1} p_{Y|X}(y|2) = 0.12 + 0.28 = 0.4\,. \]

Part (d): We are told that $Y = 2$ and we want to evaluate $p_{X|Y}(x|2)$. We have
\[ p_{X|Y}(x|2) = \frac{p_{X,Y}(x,2)}{p_Y(2)} = \frac{p_{X,Y}(x,2)}{0.02+0.06+0.3} = \frac{p_{X,Y}(x,2)}{0.38}\,. \]
When we evaluate the above for $x \in \{0,1,2\}$ we get the values 0.05263158, 0.15789474, 0.78947368.

Exercise 5.19

Part (a): Using $f_X(x)$ and $f_Y(y)$ from Exercise 5.9 we have
\[ f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)} = \frac{K(x^2+y^2)}{K\left(10x^2 + \frac{19000}{3}\right)} = \frac{x^2+y^2}{10x^2 + \frac{19000}{3}}\,, \qquad f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} = \frac{x^2+y^2}{10y^2 + \frac{19000}{3}}\,. \]

Part (b): We first evaluate
\[ f_{Y|X}(y|22) = \frac{22^2 + y^2}{10(22^2) + \frac{19000}{3}} = \frac{484 + y^2}{11173.33}\,. \]
Then we want to evaluate
\[ P(Y \ge 25 \mid X = 22) = \int_{25}^{30} f_{Y|X}(y|22)\,dy = 0.555937\,. \]
This is to be compared with $P(Y \ge 25)$, where we have no information on the value of $X$. We can evaluate this latter probability as
\[ P(Y \ge 25) = \int_{y=25}^{30}\int_{x=20}^{30} K(x^2+y^2)\,dx\,dy = 0.5493418\,, \]
which is smaller than the value of $P(Y \ge 25 \mid X = 22)$.

Part (c): In this case we want to compute
\[ E[Y \mid X = 22] = \int_{20}^{30} y\,f_{Y|X}(y|22)\,dy = \int_{20}^{30} y\left(\frac{484+y^2}{11173.33}\right)dy = 25.3729\,. \]
To compute the standard deviation of the pressure in the left tire we first compute
\[ E[Y^2 \mid X = 22] = \int_{20}^{30} y^2\left(\frac{484+y^2}{11173.33}\right)dy = 652.029\,, \]
then using this we have
\[ \text{Var}(Y \mid X = 22) = E[Y^2 \mid X = 22] - E[Y \mid X = 22]^2 = 652.029 - 25.3729^2 = 8.244946\,. \]
Thus the standard deviation is the square root of that, or 2.871401.

Exercise 5.20

Part (a): Evaluating the pmf of a multinomial distribution we have
\[ \frac{n!}{x_1!\,x_2!\,x_3!\,x_4!\,x_5!\,x_6!}\,p_1^{x_1} p_2^{x_2} p_3^{x_3} p_4^{x_4} p_5^{x_5} p_6^{x_6} = \frac{12!}{(2!)^6}\,0.24^2\,0.13^2\,0.16^2\,0.2^2\,0.13^2\,0.14^2 = 0.002471206\,. \]

Part (b): In this case a success is getting an orange candy and a failure is drawing any other color. Since we get an orange candy with probability 0.2, and hence any other candy with probability $1 - 0.2 = 0.8$, we have
\[ P(\text{at most five orange candies}) = \sum_{k=0}^{5}\binom{20}{k} 0.2^k\,0.8^{20-k}\,. \]

Part (c): For this part we want to compute $P(X_1 + X_3 + X_4 \ge 10)$. Now the probability that we get a blue, a green, or an orange candy is given by the sum of their individual probabilities, $0.24 + 0.16 + 0.13 = 0.53$, so the probability we don't get one of these colored candies is $1 - 0.53 = 0.47$. We then can compute
\[ P(X_1 + X_3 + X_4 \ge 10) = \sum_{k=10}^{20}\binom{20}{k} 0.53^k\,0.47^{20-k}\,. \]

Exercise 5.21

Part (a): This would be
\[ p(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) = \frac{p(X_1 = x_1, X_2 = x_2, X_3 = x_3)}{p(X_1 = x_1, X_2 = x_2)}\,. \]

Part (b): This would be
\[ p(X_2 = x_2, X_3 = x_3 \mid X_1 = x_1) = \frac{p(X_1 = x_1, X_2 = x_2, X_3 = x_3)}{p(X_1 = x_1)}\,. \]
p(X1 = x1 ) Exercise 5.22 Part (a): We need to evaluate E[X + Y ] = X X (x + y)f (x, y) = 14.1 . x∈{0,5,10} y∈{0,5,10,15} Part (b): We need to evaluate E[X + Y ] = X X max(x, y)f (x, y) = 9.6 . x∈{0,5,10} y∈{0,5,10,15} We have evaluated each of these using the R code: P = matrix( data=c( 0.02, 0.06, 0.02, 0.1, 0.04, 0.15, 0.20, 0.1, 0.01, 0.15, 0.14, 0.01 ), nrow=3, ncol=4, byrow=T ) X = matrix( data=c( rep(0,4), rep(5,4), rep(10,4) ), nrow=3, ncol=4, byrow=T ) Y = matrix( data=c( rep(0,3), rep(5,3), rep(10,3), rep(15,3) ), nrow=3, ncol=4, byrow=F ) print( sum( ( X + Y ) * P ) ) M_X_Y = matrix( data=c( 0, 5, 10, 15, 5, 5, 10, 15, 10, 10, 10, 15 ), nrow=3, ncol=4, byrow=T ) print( sum( M_X_Y * P ) ) Exercise 5.23 We have E[X1 − X2 ] = XX x1 x2 (x1 − x2 )f (x1 , x2 ) = 0.15 . We have evaluated each of these using the R code: P = matrix( data=c( 0.08, 0.07, 0.06, 0.15, 0.05, 0.04, 0.0, 0.03, 0.0, 0.01, X_1 = matrix( data=c( rep(0,4), X_2 = matrix( data=c( rep(0,5), sum( ( X_1 - X_2 ) * P ) 0.04, 0.00, 0.05, 0.04, 0.10, 0.06, 0.04, 0.07, 0.05, 0.06 ), nrow=5, ncol=4, byrow=T ) rep(1,4), rep(2,4), rep(3,4), rep(4,4) ), nrow=5, ncol=4, byrow=T ) rep(1,5), rep(2,5), rep(3,5) ), nrow=5, ncol=4 ) Exercise 5.24 Let D be the distance in seats between A and B. The number of seats separating the two individuals when sitting at locations X and Y is given in Table 6. To count the number of 143 x=1 x=2 x=3 x=4 x=5 x=6 y=1 0 1 2 1 0 y=2 0 0 1 2 1 y=3 y=4 y=5 1 2 1 0 1 2 0 1 0 0 1 0 2 1 0 y=6 0 1 2 1 0 - Table 6: The number of seats between A and B when A sits at location X and B sits at location Y . people who handle the message between A and B we would need to add two to the numbers given in Table 6. The probability that A sits at X and that Y sits at y is given by p(x, y) = 1 1 = . 6(5) 30 We want to evaluate E[h(x, y)] where we find the value 1.6. Exercise 5.25 The area of the rectangle is given by the product XY . 
The expected area is then given by
\[ E[\text{Area}] = \int_{x=L-A}^{L+A}\int_{y=L-A}^{L+A} xy\left(\frac{1}{2A}\right)^2 dx\,dy = \frac{1}{4A^2}\left(\frac{(L+A)^2 - (L-A)^2}{2}\right)^2 = \frac{(2AL)^2}{4A^2} = L^2\,. \]

Exercise 5.26

Revenue for the ferry is given by the expression $3X + 10Y$, so its expected value would be
\[ E(\text{Revenue}) = \sum_{x}\sum_{y} (3x + 10y)\,p(x,y) = 15.4\,. \]
We can evaluate this using the following R code:

P = matrix( data=c( 0.025, 0.015, 0.010,
                    0.050, 0.030, 0.020,
                    0.125, 0.075, 0.050,
                    0.150, 0.090, 0.060,
                    0.100, 0.060, 0.040,
                    0.050, 0.030, 0.020 ), nrow=6, ncol=3, byrow=T )
X = matrix( data=c( rep(0,3), rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3) ),
            nrow=6, ncol=3, byrow=T )
Y = matrix( data=c( rep(0,6), rep(1,6), rep(2,6) ), nrow=6, ncol=3 )
sum( ( 3 * X + 10 * Y ) * P )

Exercise 5.27

We want to evaluate
\[ E[h(X,Y)] = \iint h(x,y)f(x,y)\,dx\,dy = \int_0^1\int_0^1 |x-y|\,f_X(x)f_Y(y)\,dy\,dx = \int_0^1\int_0^1 |x-y|(3x^2)(2y)\,dy\,dx\,. \]
Splitting the integral at $y = x$ gives
\[ 6\int_0^1\int_0^x (x-y)x^2 y\,dy\,dx + 6\int_0^1\int_x^1 (y-x)x^2 y\,dy\,dx = \frac{1}{4}\,, \]
when we further integrate and simplify.

Exercise 5.28

Using independence we have
\[ E(XY) = \iint xy\,f(x,y)\,dx\,dy = \iint xy\,f_X(x)f_Y(y)\,dx\,dy = \left(\int x f_X(x)\,dx\right)\left(\int y f_Y(y)\,dy\right) = E(X)E(Y)\,. \]
In Exercise 5.25, with independence we have $E(\text{Area}) = E(X)E(Y) = L^2$, the same result as we found there.

Exercise 5.29

For this exercise we want to evaluate $\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y}$ for Example 5.16. In that example we calculated $\mu_X = \mu_Y = \frac{2}{5}$ and $\text{Cov}(X,Y) = -\frac{2}{75}$. Thus to compute $\rho_{X,Y}$ we need to evaluate $\sigma_X$ and $\sigma_Y$. To do that we recall that
\[ \sigma_X^2 = E(X^2) - \mu_X^2\,. \]
Thus we need
\[ E(X^2) = \int_0^1 x^2\left(12x(1-x)^2\right)dx = 12\int_0^1 x^3(1 - 2x + x^2)\,dx = 12\left[\frac{x^4}{4} - \frac{2x^5}{5} + \frac{x^6}{6}\right]_0^1 = 12\left(\frac{1}{4} - \frac{2}{5} + \frac{1}{6}\right) = \frac{1}{5}\,. \]
Since the densities for $X$ and $Y$ are the same, the above is also equal to $E(Y^2)$. Using this we have $\sigma_X^2 = \sigma_Y^2 = \frac{1}{5} - \frac{4}{25} = \frac{1}{25}$, so $\sigma_X = \frac{1}{5}$. We can now evaluate $\rho_{X,Y}$ and find
\[ \rho_{X,Y} = \frac{-\frac{2}{75}}{\frac{1}{25}} = -\frac{2}{3}\,. \]

Exercise 5.30

For these two parts recall that
\[ \text{Cov}(X,Y) = E(XY) - E(X)E(Y)\,, \qquad \rho = \frac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y}\,. \]
We can compute all that is needed with the following R code:

P = matrix( data=c( 0.02, 0.06, 0.02, 0.10,
                    0.04, 0.15, 0.20, 0.10,
                    0.01, 0.15, 0.14, 0.01 ), nrow=3, ncol=4, byrow=T )
X = matrix( data=c( rep(0,4), rep(5,4), rep(10,4) ), nrow=3, ncol=4, byrow=T )
Y = matrix( data=c( rep(0,3), rep(5,3), rep(10,3), rep(15,3) ), nrow=3, ncol=4, byrow=F )
E_X = sum( X * P )
E_Y = sum( Y * P )
E_XY = sum( X * Y * P )
print( E_XY - E_X * E_Y )
E_X2 = sum( X^2 * P )
E_Y2 = sum( Y^2 * P )
rho = ( E_XY - E_X * E_Y ) / sqrt( ( E_X2 - E_X^2 ) * ( E_Y2 - E_Y^2 ) )
print(rho)

Part (a-b): We find $\text{Cov}(X,Y) = 44.25 - 5.55(8.55) = -3.2025$ and $\rho = -0.2074$. (An earlier version of the code above set E_Y = sum( X * P ), which gave the incorrect values 13.4475 and 0.4862374.)

Exercise 5.31

Part (a-b): Using the results from Exercise 5.9 we find
\[ E(X) = E(Y) = \frac{1925}{76} = 25.32\,, \qquad E(X^2) = E(Y^2) = \frac{37040}{57} = 649.825\,, \]
\[ \text{Var}(X) = \text{Var}(Y) = 649.825 - 25.32^2 = 8.7226\,, \]
\[ E(XY) = 641.447\,, \qquad \text{Cov}(X,Y) = 641.447 - 25.32^2 = 0.3446\,, \]
\[ \text{Corr}(X,Y) = \frac{0.3446}{8.7226} = 0.03950657\,. \]
Note that these numbers do not agree with the ones given in the back of the book. If anyone sees a mistake in what I have done here please contact me.

Exercise 5.32

Using the results of Exercise 5.12 we can compute
\[ E(XY) = \int_0^\infty\int_0^\infty xy\left(x e^{-x(1+y)}\right)dy\,dx = \int_0^\infty x^2 e^{-x}\left(\int_0^\infty y e^{-xy}\,dy\right)dx = \int_0^\infty x^2 e^{-x}\cdot\frac{1}{x^2}\,dx = \int_0^\infty e^{-x}\,dx = 1\,, \]
when we simplify some. From the densities $f_X(x)$ and $f_Y(y)$ we can compute $E(X) = 1$ and $E(Y) = 1$, thus
\[ \text{Cov}(X,Y) = E(XY) - E(X)E(Y) = 1 - 1 = 0\,. \]
We would then also have $\rho = 0$.

Exercise 5.33

We have
\[ \text{Cov}(X,Y) = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0\,, \]
when we use independence.
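The moment computations of Exercise 5.29 are easy to confirm by numerically integrating the marginal density $f(x) = 12x(1-x)^2$ from Example 5.16; the grid size below is an arbitrary choice:

```python
# Numerical check of the moments used in Exercise 5.29 for the marginal
# density f(x) = 12 x (1 - x)^2 on [0, 1] (from Example 5.16).
def f(x):
    return 12.0 * x * (1.0 - x) ** 2

def moment(k, n=50000):
    # midpoint rule for E[X^k] = integral of x^k f(x) over [0, 1]
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** k * f((i + 0.5) * h) for i in range(n)) * h

mean = moment(1)       # analytically 2/5
ex2 = moment(2)        # analytically 1/5
var = ex2 - mean ** 2  # analytically 1/25
```

With $\sigma_X^2 = \frac{1}{25}$ recovered this way and $\text{Cov}(X,Y) = -\frac{2}{75}$ taken from the example, $\rho = -\frac{2}{3}$ follows as in the text.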
Exercise 5.34

Part (a): We would have
\[ \sigma^2 = \iint \left(h(x,y) - E[h(X,Y)]\right)^2 f(x,y)\,dx\,dy = \iint h(x,y)^2 f(x,y)\,dx\,dy - E[h(X,Y)]^2\,. \]

Part (b): We compute
\[ E[h] = 0(0.02) + 5(0.06) + 10(0.02) + 15(0.1) + 5(0.04) + 5(0.15) + 10(0.2) + 15(0.1) + 10(0.01) + 10(0.15) + 10(0.14) + 15(0.1) = 10.95\,, \]
and
\[ E[h^2] = 0^2(0.02) + 5^2(0.06) + 10^2(0.02) + 15^2(0.1) + 5^2(0.04) + 5^2(0.15) + 10^2(0.2) + 15^2(0.1) + 10^2(0.01) + 10^2(0.15) + 10^2(0.14) + 15^2(0.1) = 125.75\,. \]
Thus $\sigma^2 = 125.75 - 10.95^2 = 5.8475$.

Exercise 5.35

Part (a): We can show the desired result with the following manipulations
\[ \text{Cov}(aX+b, cY+d) = E[(aX+b)(cY+d)] - E(aX+b)E(cY+d) \]
\[ = acE(XY) + adE(X) + bcE(Y) + bd - \left(acE(X)E(Y) + adE(X) + bcE(Y) + bd\right) = ac\left(E(XY) - E(X)E(Y)\right) = ac\,\text{Cov}(X,Y)\,. \]

Part (b): To begin recall that $\text{Var}(aX+b) = a^2\text{Var}(X)$ and $\text{Var}(cY+d) = c^2\text{Var}(Y)$. Thus using this and the result from Part (a) we have
\[ \text{Corr}(aX+b, cY+d) = \frac{\text{Cov}(aX+b, cY+d)}{\sqrt{\text{Var}(aX+b)\,\text{Var}(cY+d)}} = \frac{ac\,\text{Cov}(X,Y)}{|a||c|\,\sigma_X\sigma_Y} = \text{sign}(a)\,\text{sign}(c)\,\text{Corr}(X,Y)\,. \]
Here the function $\text{sign}(x)$ is one if $x > 0$, zero if $x = 0$, and minus one if $x < 0$. Thus if $a$ and $c$ have the same sign we have $\text{sign}(a)\text{sign}(c) = 1$ and we have the requested result.

Part (c): If $a$ and $c$ have opposite signs then $\text{sign}(a)\text{sign}(c) = -1$ and the correlation of the linear combination is the negative of the correlation of the random variables $X$ and $Y$.

Exercise 5.36

If $Y = aX + b$ then from Exercise 5.35 above we have
\[ \text{Corr}(X,Y) = \text{Corr}(X, aX+b) = \text{sign}(a)\,\text{Corr}(X,X)\,. \]
Notice that $\text{Corr}(X,X) = 1$ and thus $\text{Corr}(X,Y) = \pm 1$ depending on the sign of $a$. Specifically if $a > 0$ the correlation of $X$ and $Y$ will be $+1$, while if $a < 0$ it will be $-1$.

Notes on Example 5.21

Since the two events $\{\bar X \le \bar x\}$ and $\{T_0 \le 2\bar x\}$ are equivalent, the cumulative probability
This means we distribution for X can be derived from that of T0 namely FX¯ (¯ evaluate the function FT0 (·) computed in Example 5.21 at the value 2¯ x. We get x) = FT0 (2¯ x) = 1 − e−2λ¯x − 2λ¯ xe−2λ¯x . FX¯ (¯ ¯ as With this we can get the density function for X x) = fX¯ (¯ x) dFX¯ (¯ = 2λe−2λ¯x − 2λe−2λ¯x − 2λ¯ x(−2λ)e−2λ¯x = 4λ2 x¯e−2λ¯x , d¯ x the same as the books equation 5.6. Exercise 5.37 Part (a-b): See Table 7 where we present all possible two element samples we could draw from this population. With each sample of two we compute the statistics x¯ = 12 (x1 + x2 ) and s2 = (x1 − x¯)2 + (x2 − x¯)2 . 1 Note that since n = 2 the normalization factor of n−1 in the unbiased variance estimate ¯ in Table 8. becomes 1. Once we have this data we display the sampling distribution for X Using that table we can compute ¯ = 0.04(25) + 0.2(32.5) + 0.25(40) + 0.12(45) + 0.3(52.5) + 0.09(65) = 44.5 . E(X) Notice that this equals the population mean µ given by µ = 0.2(25) + 0.5(40) + 0.3(65) = 44.5 . 149 x1 25 25 25 40 40 40 65 65 65 x2 25 40 65 25 40 65 25 40 65 p(x1 , x2 ) 0.2(0.2) = 0.04 0.2(0.5) = 0.10 0.2(0.3) = 0.06 0.5(0.2) = 0.10 0.5(0.5) = 0.25 0.5(0.3) = 0.15 0.3(0.2) = 0.06 0.3(0.5) = 0.15 0.3(0.3) = 0.09 x¯ 25 32.5 45 32.5 40 52.5 45 52.5 65 s2 0 112.5 800 112.5 0 312.5 800 312.5 0 Table 7: The possible two element samples we could draw from Exercise 37. 25 32.5 40 45 52.5 x¯ x) 0.04 0.2 0.25 0.12 0.30 pX¯ (¯ 65 0.09 ¯ for Exercise 37. Table 8: The sampling distribution of X The sampling distribution of S 2 is given in Table 9. From the sampling distribution of S 2 we find E(S 2 ) = 0(0.38) + 112.5(0.2) + 312.5(0.3) + 800(0.12) = 212.25 . Notice that this equals the population variance σ 2 is given by σ 2 = 0.2(25 − 44.5)2 + 0.5(40 − 44.5)2 + 0.3(65 − 44.5)2 = 212.25 . Exercise 5.38 Part (a): See Table 10 where we present all possible two element samples we could obtain from the given distribution are presented. 
Based on this result the probability distribution for $T_0$ is given in Table 11.

  x1   x2   p(x1,x2)          t0
  0    0    0.2(0.2) = 0.04   0
  0    1    0.2(0.5) = 0.10   1
  0    2    0.2(0.3) = 0.06   2
  1    0    0.5(0.2) = 0.10   1
  1    1    0.5(0.5) = 0.25   2
  1    2    0.5(0.3) = 0.15   3
  2    0    0.3(0.2) = 0.06   2
  2    1    0.3(0.5) = 0.15   3
  2    2    0.3(0.3) = 0.09   4

Table 10: The possible two-element samples we could draw in Exercise 5.38.

  t0     : 0     1     2     3     4
  p(t0)  : 0.04  0.20  0.37  0.30  0.09

Table 11: The sampling distribution of $T_0$ for Exercise 5.38; e.g. $p(2) = 0.06 + 0.25 + 0.06 = 0.37$ and $p(3) = 0.15 + 0.15 = 0.3$.

Part (b): We compute
\[ \mu_{T_0} = 0(0.04) + 1(0.2) + 2(0.37) + 3(0.3) + 4(0.09) = 2.2\,. \]
Note that we are told the population mean is $\mu = 1.1$, so $\mu_{T_0} = 2\mu$.

Part (c): We compute
\[ \sigma_{T_0}^2 = E(T_0^2) - E(T_0)^2 = 0^2(0.04) + 1^2(0.2) + 2^2(0.37) + 3^2(0.3) + 4^2(0.09) - 2.2^2 = 5.82 - 4.84 = 0.98\,. \]
Note that $\sigma_{T_0}^2 = 2(0.49) = 2\sigma^2$.

Exercise 5.39

$X$ is a binomial random variable with $p = 0.8$ and $n = 10$, representing the number of successes (a drive that works in a satisfactory manner). Now $V \equiv \frac{X}{n}$ is a scaled binomial random variable, so the probability of getting a given value of $V$ equals the corresponding binomial probability. We tabulate these values in Table 12. We generated the probabilities using the R code

dbinom(0:10,10,0.8)

Exercise 5.40

Let the type of envelope opened be denoted by 0, 5, or 10, representing the dollar amount. We generate the samples for this problem using the python code ex5 40.py. When we run that code we get the following (partial) output.
v1= 0, v2= 0, v3= 0, prob= 0.125, max(v1,v2,v3)= 0
v1= 0, v2= 0, v3= 5, prob= 0.075, max(v1,v2,v3)= 5
v1= 0, v2= 0, v3= 10, prob= 0.050, max(v1,v2,v3)= 10
v1= 0, v2= 5, v3= 0, prob= 0.075, max(v1,v2,v3)= 5
v1= 0, v2= 5, v3= 5, prob= 0.045, max(v1,v2,v3)= 5
v1= 0, v2= 5, v3= 10, prob= 0.030, max(v1,v2,v3)= 10

  v = x/n : pV(v)
  0.0     : C(10,0) 0.8^0 0.2^10 = 0.0000001024
  0.1     : C(10,1) 0.8^1 0.2^9  = 0.0000040960
  0.2     : C(10,2) 0.8^2 0.2^8  = 0.0000737280
  0.3     : 0.0007864320
  0.4     : 0.0055050240
  0.5     : 0.0264241152
  0.6     : 0.0880803840
  0.7     : 0.2013265920
  0.8     : 0.3019898880
  0.9     : 0.2684354560
  1.0     : 0.1073741824

Table 12: The sampling distribution of $V = \frac{X}{n}$ for Exercise 5.39.

Part (a): If we accumulate the probabilities of the various possible maximums we get the following

{0: 0.125, 5: 0.387, 10: 0.488}

Thus the probability of getting a maximum of 5 is 0.387.

Part (b): The above python code could be modified to compare the values of $M$ for different sample sizes. We would expect that as we draw more samples it becomes more likely that we get larger values for $M$, and thus more probability weight should be placed on the higher values of $M$.

Exercise 5.41

Part (a): We generate the samples for this problem using the python code ex5 41.py. When we run that code we get the following sampling distribution for $\bar X$ (as a python defaultdict):

{1.0: 0.16, 1.5: 0.24, 2.0: 0.25, 2.5: 0.2, 3.0: 0.1, 3.5: 0.04, 4.0: 0.01}

Part (b): We compute 0.85.

Part (c): In the same python code we compute

{0: 0.3, 1: 0.4, 2: 0.22, 3: 0.08}

Part (d): The only samples where $\bar X \le 1.5$ would be "samples of the form"
\[ (1,1,1,1)\ [1\text{ way}]\,, \quad (2,1,1,1)\ [4\text{ ways}]\,, \quad (3,1,1,1)\ [4\text{ ways}]\,, \quad (2,2,1,1)\ \left[\binom{4}{2} = 6\text{ ways}\right]. \]
I've listed the number of samples of each type in brackets. Now we get all ones with a probability of $0.4^4$, etc.
Thus the desired probability is given by
\[ 1(0.4^4) + 4(0.4^3)(0.3) + 4(0.4^3)(0.2) + 6(0.4^2)(0.3^2) = 0.24\,. \]

Exercise 5.42

Part (a): We generate the samples for this problem using the python code ex5 42.py. When we run that code we get the following sampling distribution for $\bar X$, given as $(\bar x, p(\bar x))$ pairs:

(27.75, 0.0667), (28.0, 0.0333), (29.7, 0.0333), (29.7, 0.0667), (29.95, 0.0667), (31.65, 0.1333), (31.9, 0.0667), (33.6, 0.0333)

We find that $E(\bar X) = 15.21$.

Part (b): For this part we select an office first and then average the two salaries in that office. There are only three choices we can make (which office to select), and thus the sampling distribution of $\bar X$ under this sampling method puts the values

27.75, 31.65, 31.9

each with a probability of $\frac{1}{3}$. We find that in this case $E(\bar X) = 30.43$.

Part (c): Notice that the population mean is $\mu = 30.43$.

Exercise 5.43

For $b = 1, 2, \ldots, B-1, B$ we would draw $n$ samples from our dispensing machine. For each of these samples we would compute the fourth spread. This fourth spread would be the statistic of interest, and we would have $B$ values of this statistic. These $B$ values could be used to compute/display the sampling distribution of the fourth spread from a uniform random variable.

Exercise 5.44

See the R code ex5 44.R, where we perform the requested simulation. When that code is run we get the result given in Figure 4.

Figure 4: The sampling distribution of $\bar X$ for Exercise 5.44: histograms of colMeans(WD) for $n = 5$, $n = 10$, $n = 20$, and $n = 30$.

The histogram with $n = 30$ looks the most approximately normal.
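A Python analogue of the ex5 44.R simulation may make the setup clearer; the population, number of replications, and seed below are arbitrary illustrative choices, not values from the textbook:

```python
import random

# Draw B samples of size n with replacement, record each sample mean,
# and observe the sampling distribution of the mean tightening as n grows.
def sample_means(population, n, B, seed=0):
    rng = random.Random(seed)
    return [sum(rng.choice(population) for _ in range(n)) / n for _ in range(B)]

def var(xs):
    # population-style variance of a list of numbers
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

population = [25, 40, 40, 65]            # hypothetical skewed population
mu = sum(population) / len(population)   # population mean
means_n5 = sample_means(population, 5, 2000)
means_n30 = sample_means(population, 30, 2000)
```

The simulated means center on $\mu$ for both sample sizes, while the spread of the $n = 30$ means is markedly smaller, consistent with $\sigma_{\bar X} = \sigma/\sqrt{n}$.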
Figure 5: The sampling distribution of $\bar X$ for Exercise 5.45: histograms of colMeans(WD) for $n = 10$, $n = 20$, $n = 30$, and $n = 50$.

Exercise 5.45

See the R code ex5 45.R, where we perform the requested simulation. When that code is run we get the result given in Figure 5. The histogram with $n = 50$ looks the most approximately normal.

Exercise 5.46

Part (a): $\bar X$ is centered on $\mu_{\bar X} = 12$ cm with $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{0.04}{4} = 0.01$ cm.

Part (b): When $n = 64$ we have that $\bar X$ is centered on $\mu_{\bar X} = 12$ cm with $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{0.04}{8} = 0.005$ cm.

Part (c): When $n = 64$ the sample value of $\bar X$ will likely be closer to $\mu = 12$, as $\sigma_{\bar X}$ in that case is smaller.

Exercise 5.47

Part (a): We have
\[ P(11.99 \le \bar X \le 12.01) = \Phi\left(\frac{12.01 - 12}{0.01}\right) - \Phi\left(\frac{11.99 - 12}{0.01}\right) = 0.6826895\,. \]

Part (b): We have
\[ P(\bar X \ge 12.01) = 1 - P(\bar X \le 12.01) = 1 - \Phi\left(\frac{12.01 - 12}{0.04/\sqrt{25}}\right) = 0.1056498\,. \]

Exercise 5.48

Part (a): Using the Central Limit Theorem (CLT) we have $\mu_{\bar X} = 50$ and $\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{1}{10}$. Then we have
\[ P(49.9 \le \bar X \le 50.1) = \Phi\left(\frac{50.1 - 50}{1/10}\right) - \Phi\left(\frac{49.9 - 50}{1/10}\right) = 0.6826895\,. \]

Part (b): This would be
\[ P(49.9 \le \bar X \le 50.1) = \Phi\left(\frac{50.1 - 49.8}{1/10}\right) - \Phi\left(\frac{49.9 - 49.8}{1/10}\right) = 0.1573054\,. \]

Exercise 5.49

Here $n = 40$ and the time to grade all the papers, $T_0 = X_1 + X_2 + \cdots + X_{40}$, is the sum of $n$ random variables, so $\mu_{T_0} = n\mu = 40(6) = 240$ minutes and $\sigma_{T_0} = \sqrt{n}\,\sigma = \sqrt{40}\,(6) = 37.947$ minutes.

Part (a): To finish grading by 11:00 P.M. we have to finish grading in ten minutes plus four hours, or 250 minutes. Thus we want to calculate
\[ P(T_0 \le 250) = \Phi\left(\frac{250 - 240}{6\sqrt{40}}\right) = 0.6039263\,. \]

Part (b): In this case we want to calculate
\[ P(T_0 \ge 260) = 1 - P(T_0 < 260) = 1 - \Phi\left(\frac{260 - 240}{6\sqrt{40}}\right) = 0.2990807\,. \]
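The $\Phi(\cdot)$ evaluations in these exercises can be reproduced without R by expressing the standard normal cdf through math.erf; the helper below uses that standard identity, with Exercise 5.49's numbers as a check:

```python
import math

def phi(z):
    # standard normal CDF via the error function:
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, mu, sigma = 40, 6.0, 6.0
mu_T0 = n * mu                   # 240 minutes
sd_T0 = sigma * math.sqrt(n)     # roughly 37.95 minutes
p_done_by_250 = phi((250 - mu_T0) / sd_T0)
p_takes_over_260 = 1 - phi((260 - mu_T0) / sd_T0)
```

These reproduce the 0.6039263 and 0.2990807 obtained above with R's pnorm.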
Exercise 5.50

Part (a): We have
\[ P(9900 \le \bar X \le 10200) = \Phi\left(\frac{10200 - 10000}{500/\sqrt{40}}\right) - \Phi\left(\frac{9900 - 10000}{500/\sqrt{40}}\right) = 0.8913424\,. \]

Part (b): This would be given by changing $n$ to 15:
\[ \Phi\left(\frac{10200 - 10000}{500/\sqrt{15}}\right) - \Phi\left(\frac{9900 - 10000}{500/\sqrt{15}}\right) = 0.7200434\,. \]

Exercise 5.51

On the first day we want to calculate $P(\bar X \le 11)$, which can be done using
\[ \Phi\left(\frac{11 - 10}{2/\sqrt{5}}\right)\,, \]
with a similar expression for the second day. If we want the sample average to be at most 11 minutes on both days then we would multiply these two results. When we do that we get the value 0.7724277.

Exercise 5.52

We want to find $L$ such that $P(T_0 \ge L) = 0.05$, i.e. $P(T_0 < L) = 0.95$, or
\[ P\left(\frac{T_0 - n\mu}{\sqrt{n}\,\sigma} < \frac{L - n\mu}{\sqrt{n}\,\sigma}\right) = 0.95\,, \quad\text{so that}\quad \frac{L - n\mu}{\sqrt{n}\,\sigma} = 1.644854\,, \]
using R's qnorm function. We can solve the above for $L$ when we take $n = 4$, $\mu = 10$, and $\sigma = 1$, where we find $L = 43.28971$.

Exercise 5.53

Part (a): We would compute
\[ P(\bar X \ge 51) = 1 - P(\bar X < 51) = 1 - \Phi\left(\frac{51 - 50}{1.2/\sqrt{9}}\right) = 0.006209665\,. \]

Part (b): We could use the same formula as above but with $n = 40$ to get $6.8\times 10^{-8}$.

Exercise 5.54

Part (a): We could compute these as
\[ P(\bar X \le 3.0) = \Phi\left(\frac{3.0 - 2.65}{0.85/\sqrt{25}}\right) = 0.9802444\,, \]
\[ P(2.65 \le \bar X \le 3.0) = \Phi\left(\frac{3.0 - 2.65}{0.85/\sqrt{25}}\right) - \Phi\left(\frac{2.65 - 2.65}{0.85/\sqrt{25}}\right) = 0.4802444\,. \]

Part (b): We would want to pick the value of $n$ such that
\[ P(\bar X \le 3.0) = \Phi\left(\frac{3.0 - 2.65}{0.85/\sqrt{n}}\right) = 0.99\,, \]
or, using qnorm,
\[ 3.0 - 2.65 = \frac{0.85}{\sqrt{n}}(2.326348)\,. \]
We can solve for $n$ to find $n = 31.91913$. Since $n$ must be an integer we take $n \ge 32$.

Exercise 5.55

Part (a): Since the number of parking tickets is Poisson we have a mean of $\lambda = 50$ and a variance of $\lambda = 50$ (the standard deviation is then $\sqrt{50} = 7.071068$). Using the normal approximation we can compute
\[ P(35 \le N \le 70) = \Phi\left(\frac{70 - 50}{\sqrt{50}}\right) - \Phi\left(\frac{35 - 50}{\sqrt{50}}\right) = 0.9807137\,. \]

Part (b): We have
\[ P(225 \le T_0 \le 275) = \Phi\left(\frac{275 - 5(50)}{\sqrt{5}\sqrt{50}}\right) - \Phi\left(\frac{225 - 5(50)}{\sqrt{5}\sqrt{50}}\right) = 0.8861537\,. \]
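Exercise 5.55 Part (a) replaces an exact Poisson computation with a normal approximation; the sketch below computes both, so the quality of the approximation can be seen directly (the pmf is evaluated via logs to avoid overflow):

```python
import math

# Compare the exact Poisson(50) probability P(35 <= N <= 70) with the
# normal approximation used in Exercise 5.55 Part (a).
lam = 50.0

def poisson_pmf(k):
    # exp(-lam) lam^k / k!, computed via logs for numerical stability
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

exact = sum(poisson_pmf(k) for k in range(35, 71))

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

approx = phi((70 - lam) / math.sqrt(lam)) - phi((35 - lam) / math.sqrt(lam))
```

The approximation reproduces the 0.9807137 above and sits within about 0.003 of the exact sum; a continuity correction would close most of the remaining gap.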
Exercise 5.56

Part (a): Let T0 be the total number of errors. Then we have

μ_T0 = nμ = 1000(1/10) = 100
σ_T0 = (1000 (1/10)(9/10))^(1/2) = 9.486833
P(T0 ≤ 125) = Φ((125 − 100)/9.486833) = 0.995796.

Part (b): Let T1 and T2 be the number of errors in the first and second message respectively. Now define X to be X ≡ T1 − T2. Notice that X has an expectation of zero and a variance (due to the independence of T1 and T2) given by

Var(X) = Var(T1) + Var(T2) = 2(9.486833²) = 180.

Then the probability we want is given by

P(|X| ≤ 50) = P(−50 ≤ X ≤ 50) = Φ((50 − 0)/√180) − Φ((−50 − 0)/√180) = 0.9998061.

Exercise 5.57

From properties of the gamma distribution

μ_X = αβ = 100
σ_X² = αβ² = 200.

We want to evaluate

P(X ≤ 125) ≈ Φ((125 − μ)/σ) = 0.9614501.

Exercise 5.58

Part (a): We have

E(volume) = 27μ1 + 125μ2 + 512μ3 = 27(200) + 125(250) + 512(100) = 87850,

and

Var(volume) = 27²σ1² + 125²σ2² + 512²σ3² = 27²(10²) + 125²(12²) + 512²(8²) = 19100116.

Part (b): No, we would need to know the covariances between two different variables Xi and Xj for i ≠ j.

Exercise 5.59

Part (a): We have

P(X1 + X2 + X3 ≤ 200) = Φ((200 − 3(60))/√(3(15))) = 0.9985654,

and

P(150 ≤ X1 + X2 + X3 ≤ 200) = Φ((200 − 3(60))/√(3(15))) − Φ((150 − 3(60))/√(3(15))) = 0.9985616.

Part (b): Since X̄ = (1/3)(X1 + X2 + X3) we have μ_X̄ = (1/3)(3(60)) = 60 and

σ_X̄² = (1/9)(3(15)) = 5.

Then using these we have

P(55 ≤ X̄) = 1 − P(X̄ < 55) = 1 − Φ((55 − 60)/√5) = 0.9873263,

and

P(58 ≤ X̄ ≤ 62) = Φ((62 − 60)/√5) − Φ((58 − 60)/√5) = 0.6289066.

Note that these numbers do not agree with the ones given in the back of the book. If anyone sees anything incorrect in what I have done please let me know.

Part (c): If we define the random variable V as V ≡ X1 − 0.5X2 − 0.5X3, then we have E(V) = 60 − 0.5(60) − 0.5(60) = 0 and

Var(V) = σ1² + (1/4)σ2² + (1/4)σ3² = 15(1 + 1/4 + 1/4) = 45/2.

The probability we want to calculate is then given by

Φ((5 − 0)/√(45/2)) − Φ((−10 − 0)/√(45/2)) = 0.8365722.

Part (d): In this case let V ≡ X1 + X2 + X3 so that E(V) = 40 + 50 + 60 = 150 and Var(V) = σ1² + σ2² + σ3² = 10 + 12 + 14 = 36. Thus

P(X1 + X2 + X3 ≤ 160) = Φ((160 − 150)/6) = 0.9522096.

Next let V ≡ X1 + X2 − 2X3 so that E(V) = 40 + 50 − 2(60) = −30 and Var(V) = σ1² + σ2² + 4σ3² = 10 + 12 + 4(14) = 78. Using these we have

P(X1 + X2 ≥ 2X3) = P(X1 + X2 − 2X3 ≥ 0) = 1 − P(X1 + X2 − 2X3 < 0)
= 1 − Φ((0 − (−30))/√78) = 0.0003408551.

Exercise 5.60

With the given definition of Y we have

E(Y) = (1/2)(μ1 + μ2) − (1/3)(μ3 + μ4 + μ5) = (1/2)(2(20)) − (1/3)(3(21)) = 20 − 21 = −1,

and

Var(Y) = (1/4)σ1² + (1/4)σ2² + (1/9)σ3² + (1/9)σ4² + (1/9)σ5² = (1/4)(2(4)) + (1/9)(3(3.5)) = 3.166.

Then we have

P(0 ≤ Y) = 1 − P(Y < 0) = 1 − Φ((0 − (−1))/√3.166) = 0.2870544,

and

P(−1 ≤ Y ≤ +1) = Φ((1 − (−1))/√3.166) − Φ((−1 − (−1))/√3.166) = 0.369498.

Exercise 5.61

Part (a): The total number of vehicles is given by X + Y. We can compute the requested information using the following R code

P = matrix( data=c( 0.025, 0.015, 0.010,
                    0.050, 0.030, 0.020,
                    0.125, 0.075, 0.050,
                    0.150, 0.090, 0.060,
                    0.100, 0.060, 0.040,
                    0.050, 0.030, 0.020 ), nrow=6, ncol=3, byrow=T )
X = matrix( data=c( rep(0,3), rep(1,3), rep(2,3), rep(3,3), rep(4,3), rep(5,3) ), nrow=6, ncol=3, byrow=T )
Y = matrix( data=c( rep(0,6), rep(1,6), rep(2,6) ), nrow=6, ncol=3 )
E_T = sum( ( X + Y ) * P )
E_T2 = sum( ( X + Y )^2 * P )
Var_T = E_T2 - E_T^2
Std_T = sqrt( Var_T )

Numerically we get

> c( E_T, E_T2, Var_T, Std_T )
[1] 3.500000 14.520000 2.270000 1.506652

Part (b): The revenue is given by 3X + 10Y. We can compute the requested information using the following R code

E_R = sum( ( 3 * X + 10 * Y ) * P )
E_R2 = sum( ( 3 * X + 10 * Y )^2 * P )
Var_R = E_R2 - E_R^2
Std_R = sqrt( Var_R )

Numerically we get

> c( E_R, E_R2, Var_R, Std_R )
[1] 15.400000 313.100000 75.940000 8.714356

Exercise 5.62

We compute

P(X1 + X2 + X3 ≤ 60) = Φ((60 − (15 + 30 + 20))/√(1² + 2² + 1.5²)) = 0.9683411.
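The two linear-combination probabilities of Exercise 5.59 parts (c) and (d) can be verified the same way; this sketch uses a standard-library normal CDF (phi) in place of R's pnorm:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Part (c): V = X1 - 0.5 X2 - 0.5 X3, each Xi ~ N(60, 15), so E(V) = 0
var_c = 15 * (1 + 0.25 + 0.25)                       # 45/2
p_c = phi(5 / sqrt(var_c)) - phi(-10 / sqrt(var_c))

# Part (d): V = X1 + X2 - 2 X3 with E(V) = -30 and Var(V) = 78
p_d = 1 - phi((0 - (-30)) / sqrt(78.0))

print(round(p_c, 7), round(p_d, 7))  # 0.8365722 0.0003409
```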
Exercise 5.63

Part (a): To compute this we will use

Cov(X1, X2) = E(X1 X2) − E(X1)E(X2).

To evaluate each of the expressions on the right-hand-side of the above we have used the following R code:

P = matrix( data=c( 0.08, 0.07, 0.04, 0.00,
                    0.06, 0.15, 0.05, 0.04,
                    0.05, 0.04, 0.10, 0.06,
                    0.00, 0.03, 0.04, 0.07,
                    0.00, 0.01, 0.05, 0.06 ), nrow=5, ncol=4, byrow=T )
X_1 = matrix( data=c( rep(0,4), rep(1,4), rep(2,4), rep(3,4), rep(4,4) ), nrow=5, ncol=4, byrow=T )
X_2 = matrix( data=c( rep(0,5), rep(1,5), rep(2,5), rep(3,5) ), nrow=5, ncol=4 )
E_X1 = sum( X_1 * P )
E_X2 = sum( X_2 * P )
E_X1_X2 = sum( X_1 * X_2 * P )
Cov_X1_X2 = E_X1_X2 - E_X1 * E_X2

Numerically we get

> c( E_X1, E_X2, E_X1_X2, Cov_X1_X2 )
[1] 1.700 1.550 3.330 0.695

Part (b): To compute Var(X1 + X2) we will use

Var(X1 + X2) = Var(X1) + Var(X2) + 2Cov(X1, X2).

Continuing with the calculations started above we have

E_X1_Sq = sum( X_1^2 * P )
E_X2_Sq = sum( X_2^2 * P )
Var_X1 = E_X1_Sq - E_X1^2
Var_X2 = E_X2_Sq - E_X2^2

Numerically these give

> c( Var_X1, Var_X2, Var_X1 + Var_X2, Var_X1 + Var_X2 + 2 * Cov_X1_X2 )
[1] 1.5900 1.0875 2.6775 4.0675

Exercise 5.64

Part (a): Let Xi be the waiting times for the morning bus and Yi the waiting times for the evening bus for i = 1, 2, 3, 4, 5 (Monday through Friday). Let the total waiting time be denoted W so that W ≡ Σ_{i=1}^{5} Xi + Σ_{i=1}^{5} Yi. Then

E(W) = 5E(Xi) + 5E(Yi) = 5(4) + 5(5) = 45 minutes.

Part (b): We use the formula for the variance of a uniform distribution and independence to get

Var(W) = Σ Var(Xi) + Σ Var(Yi) = 5(8²/12) + 5(10²/12) = 820/12 = 68.33.

Part (c): On a given day i the difference between the morning and evening waiting times would be Vi = Xi − Yi. Thus E(Vi) = E(Xi) − E(Yi) = 4 − 5 = −1 and Var(Vi) = Var(Xi) + Var(Yi) = 8²/12 + 10²/12 = 13.6667.

Part (d): This would be V = Σ_{i=1}^{5} Xi − Σ_{i=1}^{5} Yi. Thus E(V) = 5E(Xi) − 5E(Yi) = 5(4) − 5(5) = −5 and Var(V) = 5(8²/12) + 5(10²/12) = 820/12 = 68.33.

Exercise 5.65

Part (a): Note that

μ_{X̄−Ȳ} = E(X̄) − E(Ȳ) = 5 − 5 = 0,

and

Var(X̄ − Ȳ) = Var(X̄) + Var(Ȳ) = 2(0.2²/25) = 0.0032.

Using these we have

P(−0.1 ≤ X̄ − Ȳ ≤ +0.1) = Φ((0.1 − 0)/√0.0032) − Φ((−0.1 − 0)/√0.0032) = 0.9229001.

Part (b): In this case n = 36 and the variance of the difference changes to give

Var(X̄ − Ȳ) = 2(0.2²/36) = 0.00222,

then we have

P(−0.1 ≤ X̄ − Ȳ ≤ +0.1) = Φ((0.1 − 0)/√0.00222) − Φ((−0.1 − 0)/√0.00222) = 0.9661943.

Exercise 5.66

Part (a): From the problem statement we have

E(Bending Moment) = a1 E(X1) + a2 E(X2) = 5(2) + 10(4) = 50
Var(Bending Moment) = a1² Var(X1) + a2² Var(X2) = 5²(0.5²) + 10²(1²) = 106.25
std(Bending Moment) = √106.25 = 10.307.

Part (b): This is given by

P(Bending Moment > 75) = 1 − P(Bending Moment < 75) = 1 − Φ((75 − 50)/10.307) = 0.007646686.

Part (c): This would be

E(Bending Moment) = E(A1)E(X1) + E(A2)E(X2) = 5(2) + 10(4) = 50.

Part (d): To compute this we will use the formula

Var(Bending Moment) = E(Bending Moment²) − E(Bending Moment)².

First we need to compute

E(Bending Moment²) = E((A1 X1 + A2 X2)²) = E(A1² X1² + 2A1 A2 X1 X2 + A2² X2²)
= E(A1²)E(X1²) + 2E(A1)E(A2)E(X1)E(X2) + E(A2²)E(X2²).

Now to use the above we compute

E(A1²) = Var(A1) + E(A1)² = 0.5² + 5² = 25.25
E(A2²) = 0.5² + E(A2)² = 0.5² + 10² = 100.25
E(X1²) = 0.5² + 2² = 4.25
E(X2²) = 1² + 4² = 17.

Thus we can use the above to compute

E(Bending Moment²) = 25.25(4.25) + 2(5)(10)(2)(4) + (100.25)(17) = 2611.562.

Thus

Var(Bending Moment) = E(Bending Moment²) − E(Bending Moment)² = 2611.562 − 50² = 111.5625.

Part (e): Now if Corr(X1, X2) = 0.5 then

Cov(X1, X2) = σ1 σ2 Corr(X1, X2) = 0.5(1)(0.5) = 0.25.

Using this we compute

Var(Bending Moment) = Var(a1 X1 + a2 X2) = a1² Var(X1) + a2² Var(X2) + 2a1 a2 Cov(X1, X2)
= 5²(0.5²) + 10²(1²) + 2(5)(10)(0.25) = 131.25.
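Exercise 5.63's R matrix computation is easy to mirror in plain Python (standard library only), which gives an independent check of the covariance:

```python
# Joint pmf of (X1, X2) from Exercise 5.63; rows are X1 = 0..4, columns X2 = 0..3
P = [[0.08, 0.07, 0.04, 0.00],
     [0.06, 0.15, 0.05, 0.04],
     [0.05, 0.04, 0.10, 0.06],
     [0.00, 0.03, 0.04, 0.07],
     [0.00, 0.01, 0.05, 0.06]]

E_x1 = sum(i * p for i, row in enumerate(P) for p in row)
E_x2 = sum(j * p for row in P for j, p in enumerate(row))
E_x1x2 = sum(i * j * p for i, row in enumerate(P) for j, p in enumerate(row))
cov = E_x1x2 - E_x1 * E_x2
print(round(E_x1, 3), round(E_x2, 3), round(E_x1x2, 3), round(cov, 3))
# 1.7 1.55 3.33 0.695
```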
Exercise 5.67 I think this problem means that we will connect a length “20” pipe to a length “15” pipe in such a way that they overlap by “1” inch. Let the first pipe length by denoted as X1 , the second pipes length be denoted as X2 and the connectors length be denoted as O (for overlap). Then the total length when all three are connected is then given by L = X1 + X2 − O . Thus E(L) = 20 + 15 − 1 = 34 and Var (L) = Var (X1 ) + Var (X2 ) + Var (O) = 0.52 + 0.42 + 0.12 = 0.42 . We want to compute P (34.5 ≤ L ≤ 35) = Φ 35 − 34 √ 0.42 −Φ 34.5 − 34 √ 0.42 = 0.158789 . Exercise 5.68 If the velocities of the first and second plane are given by the random variables V1 and V2 respectively then the distance between the two planes after a time t is D = (10 + V1 t) − V2 t = 10 + (V1 − V2 )t . Now D is normally distributed with a mean E(D) = 10 + (E(V1 ) − E(V2 ))t = 10 + (520 − 500)t = 10 + 20t , 165 and a variance given by Var (D) = t2 (Var (V1 ) + Var (V2 )) = t2 (102 + 102 ) = 200t2 . Part (a): We want to compute the probability that D ≥ 0 when t = 2. We find 0 − (10 + 20(2)) √ = 0.9614501 . P (D ≥ 0) = 1 − P (D < 0) = 1 − Φ 200 22 Part (b): We want to compute the probability that D ≤ 10 when t = 2. We find 10 − (10 − 20(2)) √ P (D ≤ 10) = Φ = 0.0786496 . 200 22 Exercise 5.69 Part (a): The expected total number of cars entering the freeway is given by E(T ) = E(X1 ) + E(X2 ) + E(X3 ) = 800 + 1000 + 600 = 2400 . Part (b): Assuming independence we can compute Var (T ) = Var (X1 ) + Var (X2 ) + Var (X3 ) = 162 + 252 + 182 = 1205 . Part (c): The value of E(T ) does not change from the value computed above if the number of cars on each road is correlated. The variance of T is now given by E(T ) = 162 + 252 + 182 + 2Cov (X1 , X2 ) + 2Cov (X1 , X3 ) + 2Cov (X2 , X3 ) = 1205 + 2(80) + 2(90) + 2(100) = 1745 . √ So the standard deviation is 1745 = 41.7732. 
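Exercise 5.68's two answers follow from the same standard-normal-CDF helper; a sketch with the distance D = 10 + (V1 − V2)t evaluated at t = 2:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# D = 10 + (V1 - V2) t with E(V1) = 520, E(V2) = 500, and sd 10 for each plane
t = 2.0
mean_d = 10 + (520 - 500) * t            # 50
var_d = t**2 * (10**2 + 10**2)           # 200 t^2 = 800

p_a = 1 - phi((0 - mean_d) / sqrt(var_d))   # P(D >= 0)
p_b = phi((10 - mean_d) / sqrt(var_d))      # P(D <= 10)
print(round(p_a, 7), round(p_b, 7))  # 0.9614501 0.0786496
```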
Exercise 5.70 Part (a): From the definition of W given by W = E(W ) = n X iE(Yi ) = i=1 n X Pn i(0.5) = i=1 i=1 iYi n X 1 2 we have i= i=1 n(n + 1) . 4 Part (b): Since Yi is a binomial random variable from the properties of the binomial random variable we have Var (Yi ) = pq = p(1 − p). Thus for the variance of W we have Var (W ) = n X i2 Var (Yi ) = i=1 = p(1 − p) n X i=1 n X i=1 2 i2 p(1 − p) i = p(1 − p) n(n + 1)(2n + 1) 6 166 = n(n + 1)(2n + 1) . 24 Exercise 5.71 Part (a): The bending moment would be given by Bending Moment = a1 X1 + a2 X2 + W Z 12 xdx 2 12 x = a1 X1 + a2 X2 + W 2 0 144 = a1 X1 + a2 X2 + W 2 = 5X1 + 10X2 + 72W . 0 With this expression we have that E(Bending Moment) = 5E(X1 ) + 10E(X2 ) + 72E(W ) = 5(2) + 10(4) + 72(1.5) = 158 , and Var (Bending Moment) = 52 Var (X1 ) + 102 Var (X2 ) + 722 Var (X3 ) = 25(0.52) + 100(12 ) + 722 (0.252 ) = 430.25 . Part (b): Using the above we have that 200 − 158 P (bending moment ≤ 200) = Φ √ 430.25 = 0.9785577 . Exercise 5.72 Let T be the total time taken to run all errands and return to the office then T = X1 + X2 + X3 + X4 with T measured in minutes. We want to compute the value of t such that P (T ≥ t) = 0.01 , or P (T < t) = 0.99 , or t − (15 + 5 + 8 + 12) Φ √ < 0.99 42 + 12 + 22 + 32 We can solve the above for t to find t = 52.74193 minutes. Thus the sign should say “I will return by 10:53 A.M.”. 167 Exercise 5.73 ¯ is approximately normal with a mean of 105 and a variance of Part (a): X 62 . approximately normal with a mean of 100 and a variance of 35 82 . 40 Y¯ is ¯ − Y¯ is approximately normal with a mean of 105 − 100 = 5 and a variance of Part (b): X 2 2 ¯ − Y¯ = 8 + 6 = 2.628 . Var X 40 35 Part (c): Using the above results we compute −1 − 5 1−5 ¯ ¯ −Φ √ = 0.006701698 . P (−1 ≤ X − Y ≤ +1) = Φ √ 2.628 2.628 Part (d): We calculate 10 − 5 ¯ ¯ ¯ ¯ P (X − Y ≥ +10) = 1 − P (X − Y ≤ +10) = 1 − Φ √ = 0.001021292 . 2.628 Since this is so small we would doubt the hypothesis that µ1 − µ2 = 5. 
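Exercise 5.73's parts (c) and (d) can be redone the same way. The solution above rounds Var(X̄ − Ȳ) to 2.628 before using it; keeping the exact 8²/40 + 6²/35 moves the answers only in the sixth decimal place. A sketch (phi is again the standard normal CDF built from math.erf):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Xbar - Ybar is approximately normal with mean 105 - 100 = 5 and
# variance 8^2/40 + 6^2/35 (about 2.628)
var = 8**2 / 40 + 6**2 / 35
p_c = phi((1 - 5) / sqrt(var)) - phi((-1 - 5) / sqrt(var))   # P(-1 <= diff <= 1)
p_d = 1 - phi((10 - 5) / sqrt(var))                          # P(diff >= 10)
print(round(p_c, 5), round(p_d, 5))  # 0.0067 0.00102
```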
Exercise 5.74

If X and Y are binomial random variables and we let Z = X − Y then we have

E(Z) = n(0.7) − n(0.6) = 50(0.1) = 5
Var(Z) = n(0.7)(0.3) + n(0.6)(0.4) = 22.5,

where we have used the result that the variance of a binomial random variable is given by npq. Using these results we can approximate

P(−5 ≤ X − Y ≤ +5) = Φ((5 − 5)/√22.5) − Φ((−5 − 5)/√22.5) = 0.4824925.

Exercise 5.75

Part (a): We compute the marginal pmf for X in Table 13 and for Y in Table 14.

x     fX(x)
12    0.05 + 0.05 + 0.1 = 0.2
15    0.05 + 0.1 + 0.35 = 0.5
20    0 + 0.2 + 0.1 = 0.3

Table 13: The expression for fX(x).

y     fY(y)
12    0.05 + 0.05 + 0 = 0.1
15    0.05 + 0.1 + 0.2 = 0.35
20    0.1 + 0.35 + 0.1 = 0.55

Table 14: The expression for fY(y).

Part (b): This would be

P(X ≤ 15 ∩ Y ≤ 15) = 0.05 + 0.05 + 0.05 + 0.1 = 0.25.

Part (c): We need to check if fX,Y(x, y) = fX(x) fY(y) for all x and y.

Part (d): We have

E(X + Y) = 24(0.05) + 27(0.05) + 32(0.1) + 27(0.05) + 30(0.1) + 35(0.35) + 32(0) + 35(0.2) + 40(0.1) = 33.35.

Part (e): We have

E(|X − Y|) = 0(0.05) + 3(0.05) + 8(0.1) + 3(0.05) + 0(0.1) + 5(0.35) + 8(0) + 5(0.2) + 0(0.1) = 3.85.

Exercise 5.76

Let X1 and X2 be independent standard normal random variables, so that X1 + X2 is a normal random variable with a mean of zero and a variance of 2. The 75th percentile of either X1 or X2 is given in R by qnorm(0.75, 0, 1), which evaluates to 0.6744898; two of these sum to 1.34898. Note, however, that the third argument of qnorm is the standard deviation, not the variance, so the 75th percentile of X1 + X2 is qnorm(0.75, 0, sqrt(2)) = 0.9538727, which is smaller than 1.34898. Thus percentiles do not simply add when we add random variables: the means add, but the z-score term scales with √(σ1² + σ2²) rather than with σ1 + σ2.

Figure 6: The region of nonzero probability for Exercise 5.77.

Exercise 5.77

Part (a): See Figure 6 for the region of positive density.
From that region the value of k is given by evaluating the following Z 20 Z 30−x Z 30 Z 30−x 1= kxydydx + kxydydx x=0 y=20−x Z 20 2 30−x x=20 y=0 Z 30 2 30−x y dx 2 y=0 x=20 x=0 Z 20 Z 30 x x 2 2 =k ((30 − x) − (20 − x) )dx + k (30 − x)2 dx 2 2 x=20 x=0 81250k 70000 + 3750k = . =k 3 3 =k Thus k = 3 81250 x y dx + k 2 y=20−x = 3.692308 10−5. 170 x Part (b): The marginal pdf of X is given by ( R 30−x kxydy 0 < x < 20 Ry=20−x fX (x) = 30−x kxydy 20 < x < 30 y=0 2 30−x kxy kxydy = kx ((30 − x)2 − (20 − x)2 ) 2 2 y=20−x 30−x . = kxy 2 kx 2 = (30 − x) 2 2 y=0 In the same way we have the marginal pdf of Y given by ( R 30−y kxydx 20 < y < 30 x=0 R fY (y) = 30−y kxydx 0 < y < 20 x=20−y ky (30 − y)2 2 . = ky ((30 − y)2 − (20 − y)2) 2 Note that f (x, y) 6= fX (x)fY (y) and X and Y are not independent. Part (c): We need to evaluate P (X + Y ≤ 15) = Z 25 x=0 25−x Z kxydydx . y=20−x Part (d): We need to evaluate E(X + Y ) = Z 25 x=0 Z 25−x (x + y)kxydydx . y=20−x Part (e): We need to compute E(X), E(Y ), E(XY ), E(X 2 ), E(Y 2 ), Var (X) and Var (Y ) to evaluate these. Part (f): We first need to evaluate 2 E((X + Y ) ) = Z 25 x=0 Z 25−x (x + y)2kxydydx , y=20−x and then use the formula for the variance expressed as the difference of expectations to evaluate Var (X + Y ). Exercise 5.78 By the argument given in the problem statement we would have FY (y) = P {Y ≤ y} = 171 n Y i=1 P {Xi ≤ y} . Since each Xi is a uniform random variable we have y < 100 0 y−100 P {Xi ≤ y} = 100 < y < 200 100 1 y > 200 Using this we have y − 100 100 Our pdf for Y is given by fY (y) = or FY (y) = dFY dy fY (y) = n for 100 ≤ y ≤ 200 . n (y − 100)n−1 . n 100 We then get the expectation of Y to be given by Z 200 Z 200 yn (y − 100)n−1dy E(Y ) = yfY (y)dy = n 100 100 100 Z 200 Z 200 n n n−1 = (y − 100) dy + 100 (y − 100) dy 100n 100 100 2n + 1 100n+1 100n+1 n = 100 . + = 100n n + 1 n n+1 Exercise 5.79 Let the random variable representing the average calorie intake be given by 365 1 X (Xi + Yi + Zi ) . 
V = 365 i=1 Where Xi , Yi , and Zi are defined in the problem. Then we have 365 1 X E(V ) = (E(Xi ) + E(Yi ) + E(Zi )) 365 i=1 1 (365(500) + 365(900) + 365(2000)) = 3400 , 365 = and 1 Var (V ) = 3652 1 = 3652 We want to calculate 365 X i=1 365 X i=1 (σx2 + σY2 + σZ2 ) ! (502 + 1002 + 1802 ) ! = 1 (502 + 1002 + 1802 ) = 123.0137 . 365 3500 − 3400 P (V < 3500) = Φ √ 123.0137 172 = 1. Exercise 5.80 P50 P Part (a): Let T0 be equal the total luggage weight or T0 = 12 i=1 Yi where Xi is i=1 Xi + the weight of the ith business class customers luggage and Yi is the weight of the ith tourist class customers luggage weight. Now with this definition we have that E(T0 ) = 12E(Xi ) + 50E(Yi ) = 12(40) + 50(30) = 1980 2 Var (T0 ) = 12σX + 50σY2i = 12(102) + 50(62) = 3000 . i Part (b): For this part we want to compute 2500 − 1980 √ P (T0 ≤ 2500) = Φ = 1. 3000 Exercise 5.81 Part (a): We can use the expression E(X1 + X2 + · · · + XN ) = E(N)µ , to compute the desired expected total repair time. Let Xi be the length of time taken to repair the ith component. We want to compute E(X1 + X2 + · · · + XN ) = E(N)µ = 10(40) = 400 , minutes. Part (b): Let Xi be the number of defects found in the ith component. Then the total number of defects in four hours is T = X1 + · · · + XN where N is the number of components that come in during the four hour period. We don’t know the value of N since it is a random variable. We know however that E(N) = 4E(N1 ) = 4(5) = 20 , where N1 is the number of components submitted in one hour. Using this we have E(T ) = E(N)µ = 20E(X1 ) = 20(3.5) = 70 . Exercise 5.82 Let total number of voters that favor this candidate be denoted T and then T = Tr + Tu where Tr are the number of voters form the rural area and Tu are the number of voters from the urban area. From the description in the problem Tr is a binomial random variable with 173 n = 200 and p = 0.45 and Tu is a binomial random variable with n = 300 and p = 0.6. 
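The closed form E(Y) = 100(2n + 1)/(n + 1) derived in Exercise 5.78 can be sanity-checked by numerically integrating y fY(y) over [100, 200]; the choice n = 5 below is only for illustration:

```python
# Exercise 5.78: Y = max of n i.i.d. Uniform(100, 200) variables, so
# f_Y(y) = (n / 100^n) (y - 100)^(n-1) on [100, 200].
# Midpoint-rule integral of y f_Y(y) versus the closed form 100 (2n+1)/(n+1).
n = 5
N = 200_000                      # number of midpoint panels
h = 100.0 / N
ey = sum((100 + (k + 0.5) * h) * (n / 100.0**n) * ((k + 0.5) * h)**(n - 1) * h
         for k in range(N))
print(round(ey, 3), round(100 * (2 * n + 1) / (n + 1), 3))  # 183.333 183.333
```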
We want to compute P (Tr + Tu ≥ 250) = 1 − P (Tr + Tu < 250) . To use the central limit theorem (CLT) we need to know E(Tr + Tu ) = 0.45(200) + 0.6(300) = 270 Var (Tr + Tu ) = 200(0.45)(0.55) + 300(0.6)(0.4) = 121.5 . With these (and the CLT) the probability we want can be approximated by 250 − 270 √ = 0.9651948 . 1−Φ 121.5 Exercise 5.83 ¯ − µ| < 0.02) = 0.95. Now X ¯ has a mean of We want to find a value of n such that P (|X 0.1 ¯ − µ has a mean of µ and a standard deviation given by √n so that the random variable X zero (and the same standard deviation). Thus we can divide by the standard deviation to write the probability above as ¯ |X − µ| 0.02 √ < √ P = 0.95 . 0.1/ n 0.1/ n Since the random variable be written ¯ |X−µ| √ 0.1/ n is the absolute value of a standard normal the above can 0.02 √ P |Z| < = 0.95 . 0.1/ n Based on properties of the standard normal we can show that P (|Z| < c) = 1 − 2P (Z < −c) = 1 − 2Φ(−c) . Thus we can write the above as 0.02 √ = 0.95 . 1 − 2Φ − 0.1/ n Simplifying some we get 0.02 √ Φ − = 0.025 . 0.1/ n In the above we can solve for n. When we do this we get n = 96.03647. Thus as n must be an integer we would take n = 97. 174 Exercise 5.84 The P14 amount of soft drink consumed (in ounces) in two weeks (14 days) is given by T0 = i=1 Xi where Xi is a normal random variable with a mean of 13 oz and a standard deviation of 2 oz. Thus E(T0 ) = 14(13) = 182 and Var (T0 ) = 14(22 ) = 56. The total amount of soft drink we have in the two six-packs is 2(6)(16) = 192 oz. For the problem we want to compute 192 − 182 √ = 0.9092754 . P (T0 < 192) = Φ 56 Exercise 5.85 Exercise 58 is worked on Page 159. The total volume is given by V = 27X1 + 125X2 + 512X3 and we want to compute 100000 − 87850 √ P (V ≤ 100000) = Φ = 0.9972828 . 
19100116 Exercise 5.86 To make it to class we must have X2 − X1 or the amount of time between the end of the first class and the start of the second class larger than the time it takes to get the second class from the first class. From the problem statement X1 − X2 is a normal random variable with a mean given by difference between the two means or 9:10 - 9:02 = 8 , 2 minutes. The variable X1 − X2 has a variance of σX = 12 + 1.52 = 3.25. To compute 2 −X1 the probability of interest want to compute P (X2 − X1 > X3 ). To find this probability consider the random variable X2 − X1 − X3 . This is a normal random variable with a mean of 8 − 6 = 2 minutes and a variance (by independence) of 12 + 1.52 + 12 = 4.25. Thus we have 0−2 P (X2 − X1 − X3 > 0) = 1 − P (X2 − X1 − X3 < 0) = 1 − Φ √ = 0.8340123 . 4.25 Exercise 5.87 Part (a): Note that we can write Var (aX + Y ) = a2 Var (X) + Var (Y ) + 2aCov (X, Y ) . 175 σY σX and the above becomes 2 σY σY 2 2 Cov (X, Y ) Var (aX + Y ) = σX + σY + 2 2 σX σX σY 2 = 2σY + 2 Cov (X, Y ) . σX In the above let a = Next we recall that Cov (X, Y ) = σX σY ρ and that the variance is always positive so that the above becomes σY 2 ρσX σY ≥ 0 , 2σY + 2 σX or when we cancel positive factors we get 1 + ρ ≥ 0 so ρ ≥ −1 . Part (b): Using the fact that Var (aX − Y ) ≥ 0 and the same type of expansion as above we have 2 a2 σX + σY2 − 2aCov (X, Y ) ≥ 0 . As before let a = σY σX to get σY2 + σY2 −2 σY σX σX σY ρ ≥ 0 . Which simplifies to 1 − ρ ≥ 0 so ρ ≤ 1 . Exercise 5.88 To minimize E((X + Y − t)2 ] with respect to t we will take the derivative with respect to t, set the result equal to zero, and then solve for t. Taking the derivative and setting the result equal to zero gives 2E((X + Y − t)) = 0 . The left-hand-side of this (after we divide by two) is given by ZZ 2 (x + y − t)(2x + 3y)dydx . 5 To integrate this more easily with respect to y we will write the integrand as (y + x − t)(3y + 2x) = 3y 2 + 2xy + 3y(x − t) + 2x(x − t) . 
Integrating with respect to y over 0 < y < 1 gives 1 3 3 3 y + xy + y(x − t) + 2x(x − t)y = 1 + x + (x − t) + 2x(x − t) 2 2 y=0 5 3 − 2t x + 2x2 . = 1− t + 2 2 176 We now integrate this with respect to x over 0 < x < 1 to get 1 3 35 5 5 x 2 3 1− t x+ = − 2t + x − t, 2 2 2 3 x=0 12 2 when we simplify. Setting this equal to zero and solving for t gives t = minimizes the error of prediction. 7 6 for the value that Exercise 5.89 Part (a): To do this part one needs to derive the cumulative density function for X1 + X2 by computing P (X1 + X2 ≤ t) and the fact that f (x1 , x2 ) is the product of two chi-squared distributions with parameters ν1 and ν2 . Part (b): From Part (a) of this problem we know that Z12 +Z22 +· · ·+Zn2 will be a chi-squared random variables with parameter ν = n. Part (c): As Xiσ−µ is a standard normal random variable when we square this we will get a chi-squared random variable with ν = 1. First recall that the distribution of the sum of chi-squared random variables is another chi-squared random variable with its degree equal to the sum of the degrees of the chi-squared random variables in the sum. Because of this 2 the sum of n variables of the form Xiσ−µ is another chi-squared random variable with parameter ν = n. Exercise 5.90 Part (a): We have Cov(X, Y + Z) = E(X(Y + Z)) − E(X)E(Y + Z) = E(XY ) + E(XZ) − E(X)E(Y ) − E(X)E(Z) = E(XY ) − E(X)E(Y ) + E(XZ) − E(X)E(Z) = Cov (X, Y ) + Cov (X, Z) . Part (b): Using the given covariance values we have Cov (X1 + X2 , Y1 + Y2 ) = Cov (X1 , Y1 ) + Cov (X1 , Y2 ) + Cov (X2 , Y1 ) + Cov (X2 , Y2 ) = 5 + 1 + 2 + 8 = 16 . 177 Exercise 5.91 Part (a): As a first step we use the definition of the correlation coefficient ρ as Cov (X1 , X2 ) Cov (W + E1 , W + E2 ) = σX1 σX2 σX1 σX2 Cov (W, W ) + Cov (W, E2 ) + Cov (E1 , W ) + Cov (E1 , E2 ) = σX1 σX2 2 Cov (W, W ) σW = = . σX1 σX2 σX1 σX2 ρ= Since E1 and E2 are independent of one another and from W . 
Now for i = 1, 2 note that 2 2 σX = Var (W + Ei ) = Var (W ) + Var (Ei ) = σW + σE2 . i Thus we have that ρ is given by ρ= 2 σW . 2 σW + σE2 Part (b): Using the above formula we have ρ = 1 1+0.012 = 0.9999. Exercise 5.93 Following the formulas given in the book when Y = X4 1 X1 + 1 X2 1 1 1 E(Y ) = h(µ1 , µ2 , µ3 , µ4 ) = 120 + + 10 15 20 + 1 X3 we have = 26 . Next to compute the variance we need ∂h 1 120 ∂h so = X4 − 2 (µ1 , µ2 , µ3 , µ4 ) = − 2 = −1.2 ∂X1 X1 ∂X1 10 ∂h ∂h 1 120 so = X4 − 2 (µ1 , µ2 , µ3 , µ4 ) = − 2 = −0.5333333 ∂X2 X2 ∂X2 15 ∂h 1 120 ∂h so = X4 − 2 (µ1 , µ2 , µ3 , µ4 ) = − 2 = −0.3 ∂X3 X3 ∂X3 20 ∂h 1 1 1 ∂h 1 1 1 = + + so (µ1 , µ2 , µ3 , µ4 ) = + + = 0.2166667 . ∂X4 X1 X2 X3 ∂X4 10 15 20 Using these we get 2 2 2 2 ∂h ∂h ∂h ∂h 2 2 2 σ1 + σ2 + σ3 + σ42 V (Y ) = ∂x1 ∂x2 ∂x3 ∂x4 2 2 2 2 2 2 = (−1.2) 1 + (−0.5333333) 1 + (−0.3) 1.5 + 0.2166667242 = 2.678056 . 178 Exercise 5.94 For this problem we will use the more accurate expression for the expectation of a function of several random variables given by 1 2 ∂2h 1 ∂2h + · · · + σ . E[h(X1 , . . . , Xn )] = h(µ1 , . . . , µn ) + σ12 2 ∂x1 2 2 n ∂xn 2 In Exercise 93 above we computed h(µ1 , . . . , µn ) and all of the first derivatives of h. To use the above we need to compute the second derivatives of h. We find ∂2h ∂2h 2X4 2(120) so = 0.24 = 2 2 (µ1 , µ2 , µ3 , µ4 ) = 3 X1 103 ∂X1 ∂X1 ∂2h 2X4 2(120) ∂2h = 0.07111111 = so 2 2 (µ1 , µ2 , µ3 , µ4 ) = 3 X2 153 ∂X2 ∂X2 ∂2h 2X4 2(120) ∂2h so = 0.03 = 2 2 (µ1 , µ2 , µ3 , µ4 ) = 3 X3 203 ∂X3 ∂X3 ∂2h = 0. ∂X4 2 Using these in the above formula we have 1 1 1 E(Y ) = 26 + 12 (0.24) + 12 (0.07111111) + (1.52 )(0.03) = 26 + 0.1893056 = 26.18931 . 2 2 2 Exercise 5.95 Part (a-b): To start with let U = αX + βY where X and Y are independent standard normal random variables. Then we have 2 Cov (U, X) = αCov (X, X) + βCov (X, Y ) = ασX = α, and 2 Cov (U, U) = α2 σX + β 2 σY2 = α2 + β 2 . To have Corr (X, U) = ρ we pick α and β such that α ρ= p . 
2 α + β2 If we take α = ρ then we have ρ= p ρ ρ2 + β 2 . When we solve for β in the above we get Thus the linear combination p β = ± 1 − ρ2 . U = ρX ± p 1 − ρ2 Y will have Corr (X, U) = ρ. Now if α = 0.6 and β = 0.8 using Equation 16 we get 0.6 = 0.6 . Corr (U, X) = √ 0.62 + 0.82 179 (16) Tests of Hypotheses Based on a Single Sample Problem Solutions Exercise 8.1 To be a statistical hypothesis the statement must be an assumption about the value of a single parameter or several parameters from a population. Part (b): This is not since x˜ is the sample median and not a population parameter. Part (c): This is not since s is the sample standard deviation and not a population parameter. ¯ and Y¯ are sample means and not population parameters. Part (e): This is not since X Exercise 8.2 For this books purpose the hypothesis H0 will always be an equality claim while Ha will look like one of the following Ha : θ > θ0 Ha : θ < θ0 Ha : θ = 6 θ0 . Part (a): Yes. Part (b): No since Ha is a less than or equal statement. Part (c): No since H0 is not an equality statement. Part (d): Yes. Part (e): No since S1 and S2 are not population statistics. Part (f): No as Ha is not of the correct form. Part (g): Yes. Part (h): Yes. 180 Exercise 8.3 If we reject H0 we want to be certain that µ > 100 since that is the requirement. Exercise 8.4 With the hypothesis test H0 : µ = 5 Ha : µ > 5 . In a type I error we classify the water as contaminated when it is not. This seems like not a very dangerous error. A type II error will have us not rejecting Ha when we should i.e. we classify the contaminated water as clean. This second error seems more dangerous. With the hypothesis test H0 : µ = 5 Ha : µ < 5 , under a type I error we classify the water as safe when it is not (and is the dangerous error). A type II error will have us failing to classify safe water when it is. 
In general it is easier to specify a fixed value of α (the probability of a type I) error rather than β the probability of a type II error. Thus we would prefer to use the second test where we could make α very small (resulting in very few waters classified as safe when they are not). Exercise 8.5 We would test the hypothesis that H0 : σ = 0.05 Ha : σ < 0.05 . A type I error will reject H0 when it is true and thus conclude that the standard deviation is smaller than 0.05 when it is not. A type II error will fail to reject H0 when it is not true or we will fail to notice that the standard deviation is smaller than 0.05 when it is. Exercise 8.6 The hypothesis test we would specify would be H0 : µ = 40 Ha : µ 6= 40 . 181 The type I error would be to reject H0 when it is true or to state that the manufacturing process is producing fuses that are not in specification. A type II error would be to accept H0 when it is not true or Ha is true. This would mean that we are producing fuses outside specifications and we don’t detect this. Exercise 8.7 For this problem a type I error would indicate that we assume that the mean water temperature is too hot when in fact it is not. This would result in attempts to cool the water and result in even cooler water. A type II error would result in failing to reject H0 when in fact the water is too hot. Thus we could be working with water that is too hot and never know it. This second error would seem more serious. Exercise 8.8 Let µr and µs be the average warpage for the regular and the special laminate. Then we hope that µs < µr and our hypothesis test could be H0 : µ r − µ s = 0 Ha : µ r − µ s > 0 . For this problem a type I error would indicate that we assume that the warpage under the new laminate is less when in fact it is not. This we would switch to the new laminate when it is not actually better. A type II error would result in failing to reject H0 when in fact the new laminate is better. 
Exercise 8.9

Part (a): Since we need a two-sided test we would select R1.

Part (b): A type I error is to conclude that the proportion of customers favors one cable company over the other when in fact neither is favored. A type II error is to conclude that there is no favored company when in fact there is.

Part (c): When H0 is true we have X ~ Bin(25, 0.5) and

α = Σ_{x ∈ R1} P(X = x | H0) ,

which we computed with the following R code

x = c( 0:7, 18:25 )
sum( dbinom( x, 25, 0.5 ) )
[1] 0.04328525

Part (d): These β values would be computed with

x = 8:17
c( sum( dbinom( x, 25, 0.3 ) ), sum( dbinom( x, 25, 0.4 ) ),
   sum( dbinom( x, 25, 0.6 ) ), sum( dbinom( x, 25, 0.7 ) ) )
[1] 0.4881334 0.8452428 0.8452428 0.4881334

Part (e): According to R1 we would reject H0 in favor of Ha.

Exercise 8.10

Part (a): The hypothesis test we would specify would be

H0: µ = 1300 vs. Ha: µ > 1300 .

Part (b): When H0 is true, X̄ is normal with a mean of 1300 and a standard deviation of 60/√20 = 13.41641. With this we find

α = P(x̄ ≥ 1331.26 | H0) = P( (x̄ − 1300)/13.41641 ≥ (1331.26 − 1300)/13.41641 | H0 ) = 1 − Φ( (1331.26 − 1300)/13.41641 ) = 0.009903529 ,

or about 1%.

Part (c): In this case X̄ is normal with a mean of 1350 and the same standard deviation 60/√20 = 13.41641. Following the same steps as above we find

1 − β = P(x̄ ≥ 1331.26 | Ha) = 1 − Φ( (1331.26 − 1350)/13.41641 ) .

This gives β = 0.08123729.

Part (d): We could change the critical value x_c so that

α = 0.05 = P(x̄ ≥ x_c | H0) = P( (x̄ − 1300)/13.41641 ≥ (x_c − 1300)/13.41641 | H0 ) = 1 − Φ( (x_c − 1300)/13.41641 ) .

Solving for x_c gives x_c = 1322.068. This would make β smaller since we would be rejecting H0 more often.

Part (e): We would put x̄ = 1331.26 into the expression for Z, giving the rejection region {z ≥ 2.329359}.

Exercise 8.11

Part (a): The hypothesis test we would specify would be

H0: µ = 10 vs. Ha: µ ≠ 10 .

Part (b): Here σ/√n = 0.2/√25 = 0.04, so this would be

α = P(x̄ ≤ 9.8968 | H0) + P(x̄ ≥ 10.1032 | H0) = Φ( (9.8968 − 10)/0.04 ) + 1 − Φ( (10.1032 − 10)/0.04 ) = 0.009880032 .

Part (c): If µ = 10.1 then we have

β = P(9.8968 ≤ x̄ ≤ 10.1032) = P( (9.8968 − 10.1)/0.04 ≤ Z ≤ (10.1032 − 10.1)/0.04 ) = Φ( (10.1032 − 10.1)/0.04 ) − Φ( (9.8968 − 10.1)/0.04 ) = 0.5318812 .

The same manipulations for µ = 9.8 give β = 0.007760254.

Part (d): We define z ≡ (x̄ − 10)/0.04 and translate the critical region of Part (b), stated in terms of x̄, into a critical region in terms of z. Letting x̄ equal the two endpoints of that critical region we get the numerical values −2.58 and +2.58, thus c = 2.58.

Part (e): We would now want to find a value of c such that

1 − α = P(−c ≤ z ≤ +c) ,

or, if α = 0.05,

0.95 = Φ(c) − Φ(−c) = (1 − Φ(−c)) − Φ(−c) = 1 − 2Φ(−c) .

When we solve this for c we find c = 1.959964. Then using this value for c and n = 10 we get critical values on x̄ given by

10 ± (0.2/√10)(1.959964) = {9.876041, 10.123959} .

Our rejection region would then be to reject H0 if either x̄ ≥ 10.123959 or x̄ ≤ 9.876041.

Part (f): For the given data set we compute x̄ = 10.0203. Since this is not in the rejection region we have no evidence to reject H0.

Part (g): This would be to reject H0 if either z ≥ +2.58 or z ≤ −2.58.

Exercise 8.12

Part (a): Our hypothesis test would be

H0: µ = 120 vs. Ha: µ < 120 .

Part (b): We reject H0 if X̄ is sufficiently small, which happens in region R2.

Part (c): We have

α = P(x̄ ≤ 115.20 | H0) = Φ( (115.20 − 120)/(10/6) ) = 0.001988376 .

To have a test with α = 0.001 we would pick a different critical value x_c, i.e. we pick x_c such that

Φ( (x_c − 120)/(10/6) ) = 0.001 .

Solving for x_c in the above gives x_c = 114.8496.

Part (d): This would be the value of β where

1 − β = P(x̄ ≤ 115.20 | Ha) = Φ( (115.20 − 115)/(10/6) ) = 0.5477584 .

Thus β = 0.4522416.

Part (e): These would be (using R notation)

α = pnorm(-2.33) = 0.0099 ≈ 0.01
α = pnorm(-2.88) = 0.00198 ≈ 0.002 .

Exercise 8.13

Part (a): We compute

α = P( x̄ > µ0 + 2.33 σ/√n | H0 ) = 1 − Φ(2.33) = 0.009903 ≈ 0.01 .

Part (b): This would be

α = P( x̄ ≥ µ0 + 2.33 σ/√n | µ = 99 ) = P( (x̄ − µ)/(σ/√n) ≥ (µ0 − µ)/(σ/√n) + 2.33 ) = 1 − Φ( 2.33 + (µ0 − µ)/(σ/√n) ) .

When µ0 = 100, n = 25, σ = 5 and µ = 99 the above gives 0.0004342299. When µ = 98 the above gives 7.455467 × 10^-6. If the actual µ is less than µ0 we have (µ0 − µ)/(σ/√n) > 0, so α in the above formula gets smaller. This makes sense since we are then less likely to get a large reading for X̄ and less likely to reject H0.

Exercise 8.14

Part (a): We have

α = P(z ≤ −2.65 or z ≥ 2.51 | H0) = Φ(−2.65) + (1 − Φ(2.51)) = 0.01006115 .

Part (b): The probability we don't reject H0 given that H0 is not true is given by

β = P(9.894 ≤ x̄ ≤ 10.1004 | µ = 10.1) = Φ( (10.1004 − 10.1)/(0.2/5) ) − Φ( (9.894 − 10.1)/(0.2/5) ) = 0.5031726 .

Exercise 8.15 Using R notation we have

Part (a): α = P(z ≥ 1.88) = 1 - pnorm(1.88) = 0.03005404 .

Part (b): α = P(z ≤ −2.75) = pnorm(-2.75) = 0.002979763 .

Part (c): α = P(z ≤ −2.88 or z ≥ 2.88) = Φ(−2.88) + (1 − Φ(2.88)) = 0.003976752 .

Exercise 8.16 Using R notation we have

Part (a): α = P(t ≥ 3.733) = 1 - pt(3.733, 15) = 0.0009996611 .

Part (b): α = P(t ≤ −2.5) = pt(-2.5, 23) = 0.009997061 .

Part (c): α = P(t ≤ −1.697 or t ≥ 1.697) = pt(-1.697, 30) + (1 - pt(1.697, 30)) = 0.1000498 .

Exercise 8.17

Part (a): When x̄ = 30960 we have z = (30960 − 30000)/(1500/√16) = 2.56. As the rejection region for this test is z ≥ z_α = z_0.01 = 2.33 we can reject H0 in favor of Ha.

Part (b): Since z_0.01 = 2.33, from the formulas in the book we have

β = Φ( z_α + (µ0 − µ′)/(σ/√n) ) = Φ( 2.33 + (30000 − 30500)/(1500/√16) ) = 0.8405368 .

Part (c): From the formulas in the text we would have z_β = 1.644854 and n given by

n = ( σ(z_α + z_β)/(µ0 − µ′) )² = ( 1500(2.33 + 1.644)/(30000 − 30500) )² = 142.1952 ,

so we would need to take n = 143.

Part (d): For this given value of x̄ we had z = 2.56; using R we find a P-value of 0.005233608, indicating that we should reject H0 for any α larger than this value.
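The power and sample-size arithmetic of Exercise 8.17 is easy to check without R. A minimal sketch in stdlib Python (using `statistics.NormalDist` for Φ and its inverse; the manual's own snippets are in R, so this is only a supplementary check):

```python
from math import ceil, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf        # standard normal CDF
inv_Phi = NormalDist().inv_cdf

mu0, mu1, sigma, n = 30000, 30500, 1500, 16
z_alpha = 2.33                # the book's rounded z_{0.01}

# Part (a): test statistic for x-bar = 30960
z = (30960 - mu0) / (sigma / sqrt(n))    # 2.56 > 2.33, so reject H0

# Part (b): type II error probability at mu' = 30500
beta = Phi(z_alpha + (mu0 - mu1) / (sigma / sqrt(n)))

# Part (c): sample size for beta = 0.05, so z_beta = z_{0.05}
z_beta = inv_Phi(1 - 0.05)
n_req = ceil((sigma * (z_alpha + z_beta) / (mu0 - mu1)) ** 2)

# Part (d): one-sided P-value of z = 2.56
p_value = 1 - Phi(z)
```

This reproduces z = 2.56, β ≈ 0.8405, n = 143 and the P-value 0.00523 quoted above.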
Exercise 8.18

Part (a): Here we have z = (x̄ − µ0)/(σ/√n) = (72.3 − 75)/(9/5) = −1.5.

Part (b): Since z_α = 2.326348 and z > −z_α we cannot reject H0 in favor of Ha.

Part (c): Using the R command pnorm(-2.88) we find α = 0.001988376.

Part (d): Using the formulas in the book we have

β = 1 − Φ( −z_α + (µ0 − µ′)/(σ/√n) ) = 1 − Φ( −2.88 + (75 − 70)/(9/5) ) = 0.5407099 .

Part (e): We would first compute z_β = qnorm(1 - 0.01) = 2.326348 and then

n = ( σ(z_α + z_β)/(µ0 − µ′) )² = ( 9(2.88 + 2.326348)/(75 − 70) )² = 87.82363 .

Thus we would take n = 88.

Part (f): We would compute the probability we reject H0 when the true mean is µ = 76, i.e.

P(z ≤ −z_α | µ) = P( (x̄ − µ0)/(σ/√n) ≤ −z_α | µ ) = P( x̄ ≤ µ0 − (σ/√n) z_α | µ ) = P( (x̄ − µ)/(σ/√n) ≤ −z_α + (µ0 − µ)/(σ/√n) | µ ) = Φ( −z_α + (µ0 − µ)/(σ/√n) ) = Φ( −2.326 + (75 − 76)/(9/5) ) = 0.001976404 .

It makes sense that this probability is small since as µ gets larger it is less likely that z ≤ −z_α.

Exercise 8.19

Part (a): We first need to compute z_{α/2} = z_0.005 = 2.575829. Now z = (x̄ − 95)/(1.2/4) = −2.266667. As |z| ≤ z_{α/2} we cannot reject H0 in favor of Ha.

Part (b): From the formulas in the book we have

β(µ) = Φ( z_{α/2} + (95 − 94)/(1.2/4) ) − Φ( −z_{α/2} + (95 − 94)/(1.2/4) ) = 0.224374 .

Part (c): From the formulas in the book we first need to compute z_β = qnorm(1 - 0.1) = 1.281552. Then for the value of n we have

n = ( σ(z_{α/2} + z_β)/(µ0 − µ′) )² = ( 1.2(z_{α/2} + z_β)/(95 − 94) )² = 21.42632 .

Thus we would want to take n = 22.

Exercise 8.20

Note that the P-value of this test is 0.016, which is less than 0.05, so we can reject H0 in favor of Ha. This P-value is not smaller than 0.01 and thus we cannot reject H0 at the 1% significance level.

Exercise 8.21

We will assume that the hypothesis test we are performing is

H0: µ = 0.5 vs. Ha: µ ≠ 0.5 .

Part (a-b): Note that using R notation we have t_{α/2} = qt(1 - 0.5(0.05), 12) = 2.178813. Since |t| < t_{α/2} we cannot reject H0 in favor of Ha. We thus conclude that the ball bearings are manufactured correctly.
Part (c): In this case we have t_{α/2} = qt(1 - 0.5(0.01), 24) = 2.79694. Since |t| < t_{α/2} we again cannot reject H0 in favor of Ha.

Part (d): In this case |t| > t_{α/2} so we can reject H0 in favor of Ha.

Exercise 8.22

We will assume that the hypothesis test we are performing is

H0: µ = 200 vs. Ha: µ ≠ 200 .

Part (a): From the given box plot it looks like the average coating weight is greater than 200.

Part (b): We compute z = (206.73 − 200)/1.16 = 5.801724, which has a P-value of 6.563648 × 10^-9. To compute this we used the following R code

z = ( 206.73 - 200 ) / 1.16
p_value = pnorm( -z ) + ( 1 - pnorm( z ) )

Exercise 8.23

The hypothesis test we are performing is

H0: µ = 6(60) = 360 seconds vs. Ha: µ > 360 seconds .

From the numbers in the problem we compute x̄ = 370.69 > 360, which indicates that from this sample the response time might be greater than six minutes. Since we are given the sample standard deviation we will assume that we don't know the population standard deviation and should be working with the t-distribution. We compute t = (x̄ − 360)/(24.36/√26) = 2.237624. This has a P-value given by 0.01719832. Since this is less than 0.05 we should reject H0 in favor of Ha, and this result contradicts the prior belief.

Exercise 8.24

The hypothesis test we are performing is

H0: µ = 3000 vs. Ha: µ ≠ 3000 .

Since we are given a sample of data we will assume that we don't know the population standard deviation and should be working with the t-distribution. With n = 5 we compute t = (x̄ − 3000)/(s/√5) = −2.991161. This has a P-value given by 0.02014595. Since this is less than 0.05 we should reject H0 in favor of Ha at the 5% level. Thus we have evidence that the true average viscosity is not 3000.

Exercise 8.25

Part (a): The hypothesis test we would perform is

H0: µ = 5.5 vs. Ha: µ ≠ 5.5 .
Since we assume we know the population standard deviation we will work with the normal distribution (and not the t-distribution). From the numbers given we compute z = (x̄ − 5.5)/(0.3/√16) = −3.333333. This has a P-value given by 0.0008581207. Since this is less than 0.01 we should reject H0 in favor of Ha, indicating that the true average differs from 5.5.

Part (b): From the formulas in the text we have

β(µ′) = Φ( z_{α/2} + (µ0 − µ′)/(σ/√n) ) − Φ( −z_{α/2} + (µ0 − µ′)/(σ/√n) ) .

Using R we compute z_{α/2} = qnorm(1 - 0.5(0.01)) = 2.575829. Using this value we have

β(5.6) = Φ( z_{α/2} + (5.5 − 5.6)/(0.3/√16) ) − Φ( −z_{α/2} + (5.5 − 5.6)/(0.3/√16) ) = 0.8929269 .

Since this is the error, the probability we detect a difference is one minus this amount, or 0.1070731.

Part (c): From the formulas in the text we have

n = ( σ(z_{α/2} + z_β)/(µ0 − µ′) )² .

Using the numbers in this problem we get n = 216.2821, thus we should take n = 217.

Exercise 8.26

The hypothesis test we will perform is

H0: µ = 50 vs. Ha: µ > 50 .

Since we are only told the sample standard deviation we will use the t-distribution (rather than the normal one) to compute the P-values. We find t = 3.773365, which has a P-value of 0.0002390189. Since this is smaller than 0.01 we can reject H0 for Ha at the 1% level.

Exercise 8.27

Part (a-b): A plot of the histogram of the data shows a distribution with a single peak and a longish right tail. All of the theorems in this text are proven assuming normality conditions, but using the normal results on non-normal distributions may still be approximately correct, especially when n is large.

Part (c): The hypothesis test we will perform is

H0: µ = 1 vs. Ha: µ < 1 .

Using the data in this exercise we compute t = −5.790507; this has a P-value of 2.614961 × 10^-7, thus there is strong evidence against H0 and we should reject it in favor of Ha.

Part (d): A 95% confidence interval for µ is given by

x̄ ± t_{α/2, n−1} s/√n = {0.6773241, 0.8222677} .

Exercise 8.28

The hypothesis test we would perform is

H0: µ = 20 vs. Ha: µ < 20 .

As we don't know the population standard deviation we will use the t-distribution. From the numbers given in this exercise we compute t = (x̄ − 20)/(8.6/√73) = −1.132577. This has a P-value given by 0.1305747, indicating that the evidence on lateral recumbency time is not inconsistent with H0.

Exercise 8.29

Part (a): The hypothesis test we would perform is

H0: µ = 3.5 vs. Ha: µ > 3.5 .

As we don't know the population standard deviation we will use hypothesis tests based on the t-distribution. From the numbers given in this exercise we compute t = (x̄ − 3.5)/(1.25/√8) = 0.4978032. This has a P-value given by 0.316939, indicating that we cannot reject H0 at the 5% significance level.

Part (b): From the formulas in the book we have

β(µ′) = Φ( z_α + (µ0 − µ′)/(σ/√n) ) = Φ( 1.644854 + (3.5 − 4)/(1.25/√8) ) = 0.6961932 .

Note that this result is slightly different from that given in the back of the book.

Exercise 8.30

The hypothesis test we would perform is

H0: µ = 15 vs. Ha: µ < 15 .

As we don't know the population standard deviation we will use hypothesis tests based on the t-distribution. From the numbers given in this exercise we compute t = (x̄ − 15)/(6.43/√115) = −6.170774. This has a P-value given by 5.337559 × 10^-9, indicating that we can reject H0 at most significance levels.

Exercise 8.31

The hypothesis test we would perform is

H0: µ = 7 vs. Ha: µ < 7 .

As we don't know the population standard deviation we will use hypothesis tests based on the t-distribution. From the numbers given in this exercise we compute t = (x̄ − 7)/(1.65/√9) = −1.236364. This has a P-value given by 0.1256946, indicating that we cannot reject H0 at the 10% significance level.

Exercise 8.32

Part (a): The hypothesis test we would perform is

H0: µ = 100 vs. Ha: µ ≠ 100 .

As we don't know the population standard deviation we will use hypothesis tests based on the t-distribution. From the numbers given in this exercise we compute t = (x̄ − 100)/(s/√12) = −0.9213828. This has a P-value given by 0.3766161, indicating that we cannot reject H0 at the 5% significance level.

Part (b): Using the formula in the book we have

n = ( σ(z_{α/2} + z_β)/(µ0 − µ′) )² .

We can compute z_{α/2} and z_β using the R code

z_alpha_over_two = qnorm( 1-0.05/2 )  # gives 1.959964
z_beta = qnorm( 1 - 0.1 )             # gives 1.281552

Then we compute

n = ( 7.5(1.959964 + 1.281552)/(100 − 95) )² = 23.6417 .

Thus we would take n = 24.

Exercise 8.33

From the formula in the book we have

β(µ′) = Φ( z_{α/2} + (µ0 − µ′)/(σ/√n) ) − Φ( −z_{α/2} + (µ0 − µ′)/(σ/√n) ) .

Using this we evaluate

β(µ0 − ∆) = Φ( z_{α/2} + ∆/(σ/√n) ) − Φ( −z_{α/2} + ∆/(σ/√n) )
β(µ0 + ∆) = Φ( z_{α/2} − ∆/(σ/√n) ) − Φ( −z_{α/2} − ∆/(σ/√n) ) .

Since Φ(c) = 1 − Φ(−c), by applying this twice we can write β(µ0 + ∆) as

β(µ0 + ∆) = ( 1 − Φ( −z_{α/2} + ∆/(σ/√n) ) ) − ( 1 − Φ( z_{α/2} + ∆/(σ/√n) ) ) = Φ( z_{α/2} + ∆/(σ/√n) ) − Φ( −z_{α/2} + ∆/(σ/√n) ) = β(µ0 − ∆) ,

as we were to show.

Exercise 8.34

Consider the case where Ha: µ > µ0; then we must have µ′ > µ0 to make a type II error. In this case as n → ∞ we have

Φ( z_α + √n (µ0 − µ′)/σ ) → Φ(−∞) = 0 ,

since µ0 − µ′ < 0. The other cases are shown in the same way.

Exercise 8.35

The hypothesis test to perform is

H0: p = 0.7 vs. Ha: p ≠ 0.7 .

With n = 200 and p̂ = 124/200 = 0.62 we compute

z = (p̂ − p0)/√(p0(1 − p0)/n) = −2.468854 ,   (17)

which has a P-value of 0.01355467, indicating we should reject H0 at the 5% level.

Exercise 8.36

Part (a): The hypothesis test to perform is

H0: p = 0.1 vs. Ha: p > 0.1 .

With n = 100 and p̂ = 0.14 we compute z in Equation 17 and get z = 1.333333, which has a P-value of 0.09121122. This does not provide compelling evidence that more than 10% of all plates blister under similar circumstances.

Part (b): In this case we have p′ = 0.15 and so using the formulas in the book we need to compute

β(p′) = Φ( ( p0 − p′ + z_α √(p0(1 − p0)/n) ) / √(p′(1 − p′)/n) ) .   (18)

When we do that we find β = 0.4926891; if n = 200 instead, we find β = 0.274806.
Part (c): We have z_β = qnorm(1 - 0.01) = 2.326348 and then n is given by

n = [ ( z_α √(p0(1 − p0)) + z_β √(p′(1 − p′)) ) / (p′ − p0) ]² = 701.3264 .

Rounding upwards we would take n = 702.

Exercise 8.37

Part (a): The hypothesis test to perform is

H0: p = 0.4 vs. Ha: p ≠ 0.4 .

We take n = 150 and p̂ = 82/150. Note that p0 n = 60 > 10 and (1 − p0)n = 90 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = 3.666667, which has a P-value of 0.0002457328. This does provide compelling evidence that we can reject H0.

Exercise 8.38

Part (a): The hypothesis test to perform is

H0: p = 2/3 vs. Ha: p ≠ 2/3 .

We take n = 124 and p̂ = 80/124. Note that p0 n = 82.66667 > 10 and (1 − p0)n = 41.33333 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = −0.5080005, which has a P-value of 0.611453. This does not provide compelling evidence that we can reject H0.

Part (b): With a P-value that large we cannot reject H0, and the value of 2/3 is plausible for kissing behavior.

Exercise 8.39

Part (a): The hypothesis test to perform is

H0: p = 0.02 vs. Ha: p < 0.02 .

We take n = 1000 and p̂ = 15/1000 = 0.015. Note that p0 n = 20 > 10 and (1 − p0)n = 980 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = −1.129385, which has a P-value of 0.1293678. This does not provide compelling evidence that we can reject H0.

Part (b-c): Now we assume p′ = 0.01 and use the formula in the book to compute β(p′), the probability we take the inventory when we don't need to, i.e. we fail to reject H0. The formula we use is given by

β(p′) = 1 − Φ( ( p0 − p′ − z_α √(p0(1 − p0)/n) ) / √(p′(1 − p′)/n) ) .   (19)

With the numbers from this problem we compute β(0.01) = 0.1938455. In the same way we compute 1 − β(0.05) = 3.160888 × 10^-8, which is the probability that when p = 0.05 we will reject H0 in favor of Ha.
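The one-proportion test statistic of Equation 17 and the type II error formula of Equation 19 can be replicated without R. A stdlib-Python sketch using the Exercise 8.39 numbers (the level α = 0.05 is an assumption, since the extract does not state it, but it reproduces the β quoted above):

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf
inv_Phi = NormalDist().inv_cdf

# Exercise 8.39: H0: p = 0.02 vs Ha: p < 0.02, with 15 successes in n = 1000
p0, n = 0.02, 1000
p_hat = 15 / n

# Equation 17: large-sample test statistic
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = Phi(z)                       # lower-tail P-value

# Equation 19: type II error probability at p' = 0.01 (alpha = 0.05 assumed)
p1 = 0.01
z_alpha = inv_Phi(1 - 0.05)
beta = 1 - Phi((p0 - p1 - z_alpha * sqrt(p0 * (1 - p0) / n))
               / sqrt(p1 * (1 - p1) / n))
```

This gives z ≈ −1.1294, a P-value ≈ 0.1294, and β(0.01) ≈ 0.1938, matching the values above.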
Exercise 8.40

The hypothesis test to perform is

H0: p = 0.25 vs. Ha: p ≠ 0.25 .

We take n = 1050 and p̂ = 177/1050 = 0.1685714. Note that p0 n = 262.5 > 10 and (1 − p0)n = 787.5 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = −6.093556, which has a P-value of 1.104295 × 10^-9. This does provide compelling evidence that we can reject H0.

Exercise 8.41

Part (a): The hypothesis test to perform is

H0: p = 0.05 vs. Ha: p ≠ 0.05 .

We take n = 500 and p̂ = 40/500 = 0.08. Note that p0 n = 25 > 10 and (1 − p0)n = 475 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = 3.077935, which has a P-value of 0.002084403. This does provide compelling evidence that we can reject H0 at the 1% level.

Part (b): We use the formula from the book to compute

β(p′) = Φ( ( p0 − p′ + z_{α/2} √(p0(1 − p0)/n) ) / √(p′(1 − p′)/n) ) − Φ( ( p0 − p′ − z_{α/2} √(p0(1 − p0)/n) ) / √(p′(1 − p′)/n) ) .   (20)

From the numbers given in this problem we compute β(0.1) = 0.03176361.

Exercise 8.42

Part (a): The hypothesis test to perform is

H0: p = 0.5 vs. Ha: p ≠ 0.5 .

To test this we would use the third region, where extreme values of X in either direction (small or large) indicate departures from p = 0.5.

Part (b): We would compute

α = P( Reject H0 | H0 is true ) = Σ_{k=0}^{3} C(20,k) (1/2)^k (1/2)^{20−k} + Σ_{k=17}^{20} C(20,k) (1/2)^k (1/2)^{20−k} = 0.002576828 .

We computed the above with the following R code

sum( c( dbinom( 0:3, 20, 0.5 ), dbinom( 17:20, 20, 0.5 ) ) )

Since this value is less than 0.05 the region gives a valid 5% level test. Note, however, that this region can be enlarged while keeping α below 0.05: the symmetric region {0, …, 5} ∪ {15, …, 20} has α = 0.0413895 < 0.05, while adding 6 and 14 to it pushes α above 0.05. Thus {0, …, 5} ∪ {15, …, 20} is the largest symmetric rejection region at the 5% level (and it is the region that appears in Part (d)).

Part (c): We would compute

β = P( Accept H0 | Ha is true ) = Σ_{k=4}^{16} C(20,k) 0.6^k 0.4^{20−k} = 0.9839915 .

We computed the above with the following R code

sum( dbinom( 4:16, 20, 0.6 ) )

In the same way we find β(0.8) = 0.5885511.

Part (d): For α = 0.1 the rejection region should be {0, 1, 2, 3, 4, 5, 15, 16, 17, 18, 19, 20}. Since 13 is not in this rejection region we do not reject H0 in favor of Ha.

Exercise 8.43

The hypothesis test to perform is

H0: p = 0.1 vs. Ha: p > 0.1 .

The probability of not proceeding when p = 0.1 is the value of α, and we want α ≤ 0.1. This means that we must pick a decision threshold c such that

α = P( Reject H0 | H0 is true ) = Σ_{k=c}^{n} C(n,k) 0.1^k 0.9^{n−k} ≤ 0.1 .

In addition we are told that if p = 0.3 the probability of proceeding should be at most 0.1, which means that

β = P( Accept H0 | Ha is true ) = Σ_{k=0}^{c−1} C(n,k) 0.3^k 0.7^{n−k} ≤ 0.1 .

We can find whether a value of c exists for a given value of n by looping over all possible values of c and printing the ones that satisfy the above conditions. We can do this with the following R code

search_for_critical_value = function( n=10 ){
  for( thresh in 0:n ){
    alpha = sum( dbinom( thresh:n, n, 0.1 ) )
    if( alpha <= 0.1 ){
      beta = sum( dbinom( 0:(thresh-1), n, 0.3 ) )
      if( beta <= 0.1 ){
        print( c(thresh,alpha,beta) )
      }
    }
  }
}

Using the above function I find that the call search_for_critical_value(25) gives the output

[1] 5.00000000 0.09799362 0.09047192

Thus the value of c should be 5, and with this value we have α = 0.09799362 and β = 0.09047192.

Exercise 8.44

The hypothesis test to perform is

H0: p = 0.035 vs. Ha: p < 0.035 .

We take n = 500 and p̂ = 15/500 = 0.03. Note that p0 n = 17.5 > 10 and (1 − p0)n = 482.5 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = −0.6083553, which has a P-value of 0.2714759. This does not provide compelling evidence that we can reject H0 at the 1% level.

Exercise 8.45

The null is rejected if the P-value is small, so in this case we will reject when the P-value is less than 0.05.
Thus we will reject for (a), (b), and (d).

Exercise 8.46

We would reject H0 in cases (c), (d), and (f).

Exercise 8.47

We can compute the P-value in each of these cases using R with the expression 1-pnorm(z). For the given values we compute

z_values = c( 1.42, 0.9, 1.96, 2.48, -0.11 )
p_values = 1 - pnorm( z_values )
p_values
[1] 0.077803841 0.184060125 0.024997895 0.006569119 0.543795313

Exercise 8.48

We can compute the P-value in each of these cases using R with the expression 2*(1-pnorm(abs(z))). For the given values we compute

z_values = c( 2.10, -1.75, -0.55, 1.41, -5.3 )
p_values = 2 * ( 1 - pnorm( abs(z_values) ) )
p_values
[1] 3.572884e-02 8.011831e-02 5.823194e-01 1.585397e-01 1.158027e-07

Exercise 8.49

We can evaluate the P-values using the following R code

c( 1-pt(2.0,8), pt(-2.4,11), 2*(1-pt(1.6,15)), 1-pt(-0.4,19), 1-pt(5,5), 2*(1-pt(4.8,40)) )
[1] 4.025812e-02 1.761628e-02 1.304450e-01 6.531911e-01 2.052358e-03 2.234502e-05

Exercise 8.50

Using R we compute

c( 1-pt(3.2,15), 1-pt(1.8,9), 1-pt(-0.2,24) )
[1] 0.002981924 0.052695336 0.578417211

As the first number is less than 0.05 we can reject H0 at the 5% level; as the second number is larger than 0.01 we cannot reject H0 at the 1% level; and as the third number is so large we would not reject H0 at any reasonable level.

Exercise 8.51

Since our P-value is larger than 0.1 it is certainly larger than 0.01. In either case we cannot reject H0 in favor of Ha at that significance level.

Exercise 8.52

The hypothesis test to perform is

H0: p = 1/3 vs. Ha: p ≠ 1/3 .

We take n = 855 and p̂ = 346/855 = 0.4046784. Note that p0 n = 285 > 10 and (1 − p0)n = 570 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = 4.425405, which has a P-value of 4.813073 × 10^-6. A P-value this small indicates that we should reject H0 and that there seems to be evidence of an ability to distinguish between reserve and regular wines.
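The one- and two-sided P-value conversions used in Exercises 8.47 and 8.48 can also be checked in stdlib Python (a supplementary sketch; the manual's own computations are the R one-liners above):

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def p_upper(z):
    """One-sided P-value for Ha: mu > mu0, as in Exercise 8.47."""
    return 1 - Phi(z)

def p_two_sided(z):
    """Two-sided P-value, as in Exercise 8.48."""
    return 2 * (1 - Phi(abs(z)))

p47 = [p_upper(z) for z in (1.42, 0.90, 1.96, 2.48, -0.11)]
p48 = [p_two_sided(z) for z in (2.10, -1.75, -0.55, 1.41, -5.3)]
```

These reproduce the R vectors printed above, e.g. p47[0] ≈ 0.0778 and p48[4] ≈ 1.158 × 10^-7.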
Exercise 8.53

The hypothesis test to perform assumes the null hypothesis that the pills each weigh 5 grains, or

H0: µ = 5 vs. Ha: µ ≠ 5 .

We compute t = (x̄ − µ0)/(s/√n) = −3.714286, which has a P-value of 0.0003372603. A P-value this small indicates that we should reject H0 and that there seems to be evidence that the pills are smaller than they should be.

Exercise 8.54

The hypothesis test to perform is

H0: p = 0.2 vs. Ha: p > 0.2 .

Part (a): We take n = 60 and p̂ = 15/60 = 0.25. Note that p0 n = 12 > 10 and (1 − p0)n = 48 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = 0.9682458, which has a P-value of 0.1664608. A P-value this large indicates that we don't have enough evidence to reject H0 and we don't need to modify the manufacturing process.

Part (b): We want to compute 1 − β where β is given by Equation 18. When we use that with the numbers from this problem we get 1 − β = 0.995158.

Exercise 8.55

The hypothesis test to perform is

H0: p = 0.5 vs. Ha: p < 0.5 .

We take n = 102 and p̂ = 47/102 = 0.4607843. Note that p0 n = (1 − p0)n = 51 > 10, so we can use large sample asymptotics for this problem. We compute z in Equation 17 and get z = −0.792118, which has a P-value of 0.2141459. A P-value this large indicates that we don't have enough evidence to reject H0.

Exercise 8.56

The hypothesis test to perform is

H0: µ = 3 vs. Ha: µ ≠ 3 .

We compute z = (x̄ − 3)/0.295 = −1.759322, which has a P-value of 0.07852283. Since this is smaller than 0.1 we would reject H0 at the 10% level. It is not smaller than 0.05, so we would not reject H0 in that case.

Exercise 8.57

The hypothesis test to perform is

H0: µ = 25 vs. Ha: µ > 25 .

Using the numbers in this problem we compute x̄ = 27.923077 and s = 5.619335, and we have n = 13. This gives t = (x̄ − µ0)/(s/√n) = 1.875543, which has a P-value of 0.04262512. Since this is smaller than 0.05 we would reject H0 and conclude that the mean response time seems greater than 25 seconds.
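The t statistics in Exercises 8.23 and 8.57 come straight from summary statistics; a small stdlib-Python check of that arithmetic (the P-values still need a t CDF, which the manual gets from R's pt):

```python
from math import sqrt

def t_stat(xbar, mu0, s, n):
    """One-sample t statistic from summary statistics."""
    return (xbar - mu0) / (s / sqrt(n))

# Exercise 8.23: response-time data, n = 26
t_823 = t_stat(370.69, 360, 24.36, 26)

# Exercise 8.57: summary values computed from the 13 observations
t_857 = t_stat(27.923077, 25, 5.619335, 13)
```

This reproduces t = 2.237624 for Exercise 8.23 and t = 1.875543 for Exercise 8.57.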
Exercise 8.58

Part (a): The hypothesis test to perform is

H0: µ = 10 vs. Ha: µ ≠ 10 .

Part (b-d): Since this is a two-sided test the P-values for each part are given by the following R expressions

c( 2*(1 - pt(2.3,17)), 2*(1 - pt(1.8,17)), 2*(1 - pt(3.6,17)) )
[1] 0.034387033 0.089632153 0.002208904

Thus we would reject H0, accept H0, and reject H0 (under reasonable values for α).

Exercise 8.59

The hypothesis test to perform is

H0: µ = 70 vs. Ha: µ ≠ 70 .

Using the numbers in this problem we compute x̄ = 75.5 and s = 7.007139, and we have n = 6. This gives t = (x̄ − µ0)/(s/√n) = 1.922638, which has a P-value of 0.1125473. Since this is larger than 0.05 we would accept H0 and conclude that the spectrophotometer is working correctly.

Exercise 8.60

Since the P-value given by the SAS output is larger than 0.01 and 0.05 we cannot reject H0 at the 1% or the 5% level. Since the P-value is smaller than 0.1 we can reject H0 at the 10% level.

Exercise 8.61

Part (a): We would compute

β = P( accept H0 | µ = 74 ) = P( (x̄ − 75)/(σ/√n) > −z_α | µ = 74 ) = P( x̄ > 75 − z_α (9/√n) | µ = 74 ) = P( (x̄ − 74)/(9/√n) > −z_α + 1/(9/√n) ) = 1 − Φ( √n/9 − z_α ) .

Taking z_α = qnorm(1 - 0.01) = 2.326348 and using the three values of n suggested in the exercise, we compute the above to be

[1] 0.8878620986 0.1569708809 0.0006206686

Part (b): We have z = (74 − 75)/(σ/√n) = −5.555556, which has a P-value of 1.383651 × 10^-8. Yes.

Part (c): This part has to do with the comments in the book about rejecting H0 when the sample size is large: when n is very large almost any departure will be detected (even if the departure is not practically significant).
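The three sample sizes behind Part (a) of Exercise 8.61 are not printed in this extract; n = 100, 900, 2500 reproduce the three β values quoted there (that is an assumption checked by the computation below). A stdlib-Python sketch:

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf
inv_Phi = NormalDist().inv_cdf

# Exercise 8.61: H0: mu = 75 vs Ha: mu < 75, sigma = 9, alpha = 0.01
mu0, mu1, sigma = 75, 74, 9
z_alpha = inv_Phi(1 - 0.01)

def beta(n):
    # beta = P(x-bar > mu0 - z_alpha * sigma/sqrt(n) | mu = mu1)
    #      = 1 - Phi(sqrt(n)/9 - z_alpha) for these numbers
    return 1 - Phi((mu0 - mu1) / (sigma / sqrt(n)) - z_alpha)

betas = [beta(n) for n in (100, 900, 2500)]   # assumed n values
```

These evaluate to 0.8879, 0.1570, and 0.00062, matching the R output above.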
Exercise 8.62

Part (a): Using Equation 18 we would compute, for the given values of n,

[1] 0.979279599 0.854752242 0.432294184 0.004323811

Part (b): I compute P-values of

[1] 4.012937e-01 1.056498e-01 6.209665e-03 2.866516e-07

Part (c): This has to do with the comments in the book about rejecting H0 when the sample size is large: when n is very large almost any departure will be detected (even if the departure is not practically significant).

Exercise 8.63

The hypothesis test to perform is

H0: µ = 3.2 vs. Ha: µ ≠ 3.2 .

Using the numbers in this problem we compute t = −3.119589, which has a P-value of 0.003031901. Since this is smaller than 0.05 we would reject H0 and conclude that the average lens thickness is something other than 3.2 mm.

Exercise 8.64

We compute

z_{α/2} = qnorm(1 - 0.5(0.05)) = 1.959964
z_β = qnorm(1 - 0.05) = 1.644854 ,

then using the formulas in the book for sample size determination we have

n = ( σ(z_{α/2} + z_β)/(µ0 − µ′) )² = ( 0.3(1.959964 + 1.644854)/(3.2 − 3.0) )² = 29.2381 .

Thus we need only around 30 samples.

Exercise 8.65

Part (a): The hypothesis test to perform is

H0: µ = 0.85 vs. Ha: µ ≠ 0.85 .

Part (b): Since the given P-value is larger than both 0.05 and 0.1 we cannot reject H0 at either level.

Exercise 8.66

Part (a): The hypothesis test we would perform is

H0: µ = 2150 vs. Ha: µ > 2150 .

Part (b-c): We would need to compute z = (x̄ − 2150)/(30/√16) = 1.333333.

Part (d): The P-value is given in R by 1 - pnorm(z) = 0.09121122.

Part (e): Since the above P-value is larger than 0.05 we cannot reject H0 in favor of Ha.

Exercise 8.67

The hypothesis test we would perform is

H0: µ = 548 vs. Ha: µ > 548 .

Part (a): We compute z = (587 − 548)/(10/√11) = 12.93484, which has a P-value of essentially zero.

Part (b): We assumed a normal distribution for the errors in the measurement of phosphorus levels.

Exercise 8.68

The hypothesis test we would perform is

H0: µ = 29 vs. Ha: µ > 29 .

Part (a): We compute z = (x̄ − µ0)/(s/√n) = 0.7742408, which has a P-value given by 0.2193942. Since this P-value is not smaller than 0.05 we cannot reject H0 in favor of Ha.

Exercise 8.69

Part (a): This distribution does not look normal, since there are no negative values in a distribution that has a mean value around 215 and a standard deviation around 235. With a standard deviation this large and a normal distribution we would expect some negative samples.

Part (b): The hypothesis test we would perform is

H0: µ = 200 vs. Ha: µ > 200 .

We compute z = (x̄ − µ0)/(s/√n) = 0.437595, which has a P-value given by 0.33084. Since this P-value is not smaller than 0.1 we cannot reject H0 in favor of Ha.

Exercise 8.70

From the given output the P-value is 0.043, which is smaller than 0.05, and thus we can reject H0 in favor of Ha at the 5% level. The P-value is not less than 0.01, and thus we cannot reject H0 in favor of Ha at the 1% level.

Exercise 8.71

The hypothesis test we would perform for this problem is

H0: µ = 10 vs. Ha: µ < 10 .

We would have z_α = qnorm(1 - 0.01) = 2.326348.

Part (a): We want to compute β(9.5), the probability we don't reject H0 in favor of Ha when we should. From the formulas in the book we have

β(µ′ = 9.5) = 1 − Φ( −z_α + (µ0 − µ′)/(σ/√n) ) = 1 − Φ( −2.326348 + (10 − 9.5)/(0.8/√10) ) = 0.6368023 .

The same calculation when µ′ = 9.0 gives β = 0.05192175.

Part (b): We first need to compute z_β = qnorm(1 - 0.25) = 0.6744898, since we would allow a 25% chance of making this error. Then from the formulas in the book we calculate

n = ( σ(z_α + z_β)/(µ0 − µ′) )² = ( 0.8(z_α + z_β)/(10 − 9.5) )² = 23.05287 .

Thus we would take n = 24. Note this result is slightly different than that given in the back of the book, which might be due to the book's use of the tables in the Appendix.

Exercise 8.72

The hypothesis test we would perform for this problem is

H0: µ = 9.75 vs. Ha: µ > 9.75 .

From the data given we compute x̄ = 9.8525 and s = 0.09645697 with n = 20, so that z = (x̄ − 9.75)/(s/√n) = 4.752315. This has a P-value of 1.005503 × 10^-6. Thus we would reject H0 in favor of Ha.

Exercise 8.73

Part (a-b): The hypothesis test we would perform for this problem is

H0: p = 1/75 vs. Ha: p ≠ 1/75 .
Thus we would reject H0 in z = x¯s/ n favor of Ha . Exercise 8.73 Part (a-b): The hypothesis test we would perform for this problem is 1 75 1 Ha : µ = 6 . 75 H0 : p = 206 We compute z = √ pˆ−p0 p0 (1−p0 )/n = 1.64399. Using the normal distribution this has a P-value of 0.1001783. As this is larger than 0.05 we can not reject H0 at the 5% level. We computed this using the following R code n = 800 p_hat = 16/800 p_0 = 1/75 z = ( p_hat - p_0 ) / sqrt( p_0 * ( 1 - p_0 ) / n ) pnorm( -z ) + ( 1 - pnorm( z ) ) n * p_0 n * (1-p_0) Note that np0 = 10.66667 > 10 and n(1 − p0 ) = 789.3333 > 10 so we can use this large sample tests. Exercise 8.74 The hypothesis test we would perform for this problem is H0 : µ = 1.75 Ha : µ > 1.75 . √ We compute t = 1.89−1.75 = 1.699673. Using the t-distribution this has a P-value of 0.42/ 26 0.05080173. As this is larger than 0.05 we cannot reject H0 at the 5% level but we could at the 10% level. We computed this using the following R code t = ( 1.89 - 1.75 )/( 0.42/sqrt(26) ) 1 - pt( t, 25 ) Exercise 8.75 The hypothesis test we would perform for this problem is H0 : µ = 3200 Ha : µ < 3200 . √ We compute z = 3107−3200 = −3.31842. This has a P-value of 0.0004526412. Thus we can 188/ 45 reject H0 at α = 0.001. 207 Exercise 8.76 The hypothesis test we would perform for this exercise is H0 : p = 0.75 Ha : p < 0.75 . In the following R code n = 72 p_0 = 0.75 p_hat = 42 / 72 z = ( p_hat - p_0 ) / sqrt( p_0 * ( 1 - p_0 ) / n ) p_value = pnorm( z ) We compute z = −3.265986 which has a P-value 0.0005454176. Thus we can reject H0 at the 1% level and the true proportion of mechanics that could identify the given problem is likely less than 0.75. Exercise 8.77 The hypothesis test we would perform for this problem is H0 : λ = 4 Ha : λ > 4 . We compute x¯ = 160 36 = 4.444444 and then compute z = √x¯−4 = 1.333333. The P-value for 4/36 this is 0.09121122. Thus we can reject H0 at the 10% level but not the 2% level. 
Exercise 8.78

Part (a): The hypothesis test we would perform for this exercise is

H0: p = 0.02 vs. Ha: p > 0.02 .

In the following R code

n = 200
p_0 = 0.02
p_hat = 0.083
z = ( p_hat - p_0 ) / sqrt( p_0 * ( 1 - p_0 ) / n )
p_value = 1 - pnorm( z )

we compute z = 6.363961, which has a P-value of 9.830803 × 10^-11. Thus we can reject H0 at all reasonable levels and certainly at 5%.

Part (b): For the value of α = 0.05 we have z_α = qnorm(1 - 0.05) = 1.644854, and the formulas in the book give

β(p′) = Φ( ( p0 − p′ + z_α √(p0(1 − p0)/n) ) / √(p′(1 − p′)/n) ) = 0.1867162 .

Exercise 8.79

The hypothesis test we would perform for this problem is

H0: µ = 15 vs. Ha: µ > 15 .

We compute t = (17.5 − 15)/(2.2/√32) = 6.428243. The P-value for this is 1.824991 × 10^-7. Thus we can (easily) reject H0 at the 5% level.

Exercise 8.80

The hypothesis test we would perform for this problem is

H0: σ = 0.5 vs. Ha: σ > 0.5 .

We compute χ² = (n − 1)s²/σ0² = 9(0.58)²/0.5² = 12.1104. The P-value for this is 0.2071568. Thus the observation is not large enough for us to reject H0.

Exercise 8.81

In this case we are told that χ² = 8.58. The P-value for this is 0.01271886. We computed this with the following simple R code

# the probability our statistic is less than or equal to the observed 8.58
chi2 = 8.58
pchisq( chi2, 20 )

Since this is larger than 0.01 (but not by much) we would not be able to reject H0. We could reject H0 at the 5% level, however.

Exercise 8.82

Part (a): We would use as our estimator of θ

θ̂ = X̄ + 2.33 S ,

where S is the sample standard deviation.

Part (b): When we assume independence we get

Var(θ̂) = Var(X̄) + 2.33² Var(S) = σ²/n + 2.33² σ²/(2n) = 3.71445 σ²/n .

Using this we have

σ_θ̂ = 1.927291 σ/√n .

Part (c): The hypothesis test we would perform for this problem is

H0: µ + 2.33σ = 6.75 vs. Ha: µ + 2.33σ < 6.75 .

We compute z = (x̄ + 2.33s − 6.75)/σ_θ̂ = −1.224517. The P-value for this is 0.1103787. Since this is larger than 0.01 we cannot reject H0 in favor of Ha.
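The variance propagation in Part (b) of Exercise 8.82 is just a sum of two independent variances; a one-line stdlib-Python check of the constants:

```python
from math import sqrt

# Exercise 8.82: theta-hat = X-bar + 2.33 S, with Var(X-bar) = sigma^2/n and
# (approximately) Var(S) = sigma^2/(2n), the two assumed independent.
c = 2.33
var_factor = 1 + c**2 / 2     # Var(theta-hat) = var_factor * sigma^2 / n
sd_factor = sqrt(var_factor)  # sigma_theta-hat = sd_factor * sigma / sqrt(n)
```

This reproduces the constants 3.71445 and 1.927291 used above.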
Exercise 8.83

Part (a): We will consider hypothesis tests of the form

H0: µ = µ0 vs. Ha: µ < µ0 ,

and the other two forms. To do this we will consider the statistic

Z = ( 2λ Σ_{i=1}^{n} X_i − 2n ) / (2√n) ,

where λ = 1/µ0. We use this form since, under the assumptions of the problem, the expression 2λ Σ_{i=1}^{n} X_i has a mean of 2n and a variance of 4n.

Part (b): Using the following R code

data = c( 95, 16, 11, 3, 42, 71, 225, 64, 87, 123 )
n = length( data )
z = ( 2 * sum( data ) / 75 - 2 * n ) / ( 2 * sqrt( n ) )
pnorm( z )

we find that z = −0.05481281, which has a P-value of 0.4781438. This is so large that we cannot reject H0.

Simple Linear Regression and Correlation Problem Solutions

Note all R scripts for this chapter (if they exist) are denoted as ex12 NN.R where NN is the section number.

Exercise 12.1

Part (a-b): Using the stem command in R we would get for the Temp feature

17 | 02344
17 | 567
18 | 0000011222244
18 | 568

while for the Ratio feature we would get

0 | 889
1 | 0011344
1 | 55668899
2 | 12
2 | 57
3 | 01

It looks like there are a good number of temperatures around 180 and most ratios are around 1. The Ratio feature also looks slightly skewed to the right (it has more larger values than smaller values). A scatter plot of the data shows that Ratio is not determined only by Temp.

Part (c): See Figure 7 (left) for a scatter plot of Ratio as a function of Temp. From that plot it looks like a line would do a decent but not great job at modeling this data.

Exercise 12.2

See Figure 7 (right) for a scatter plot of Baseline as a function of Age. From that plot it looks like a line would do a decent job of fitting the data if the two points with values of Age around 7 were removed (perhaps they are outliers).

Figure 7: Left: A scatter plot of the data for Exercise 12.1. Right: A scatter plot of the data for Exercise 12.2.
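The regression exercises that follow lean on the SST/SSE/R² identities quoted at the front of this manual. A minimal stdlib-Python illustration of those formulas on a small made-up data set (the five points below are hypothetical, chosen only to exercise the formulas, not data from any exercise):

```python
# Least-squares line and R^2 = 1 - SSE/SST, per the formulas at the
# front of this manual. The data are hypothetical.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
Sxx = sum((xi - x_bar) ** 2 for xi in x)

b1 = Sxy / Sxx               # least-squares slope
b0 = y_bar - b1 * x_bar      # least-squares intercept

y_hat = [b0 + b1 * xi for xi in x]
SST = sum((yi - y_bar) ** 2 for yi in y)            # total sum of squares
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
R2 = 1 - SSE / SST           # fraction of variance explained
```

For these nearly collinear points the fit gives b1 = 1.97 and R² just under 1.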
Exercise 12.3

When this data is plotted it looks like it is well approximated by a line.

Exercise 12.4

Part (a-b): See Figure 8 for each of the requested plots. In the box plot it looks like the amount removed is smaller than the amount loaded, and the amount removed seems to have a smaller spread of the data around its center value. In the scatter plot we have drawn the line y = x to emphasize that the amount removed is less than the amount loaded. The linear fit would seem to be influenced by the point near (x, y) = (40, 10).

Figure 8: Left: A box plot of the two features x (BOD mass loading) and y (BOD mass removed). Right: A scatter plot of y as a function of x for the data for Exercise 12.4.

Exercise 12.5

Part (a-b): See Figure 9 for each of the requested plots. In the plot with the origin at (55, 100) we see the potential for a quadratic fit as the points increase and then decrease again.

Figure 9: The data for Exercise 12.5. Left: With the origin at (0, 0). Right: With the origin at (55, 100).

Exercise 12.6 (tennis elbow)

From the given scatter plot in the text we see two points with very large x values (relative to the other data points). Their presence could strongly affect the estimated linear relationship.

Exercise 12.7

Part (a): 1800 + 1.3(2500) = 5050.

Part (b): 1.3(1) = 1.3.

Part (c): 1.3(100) = 130.

Part (d): 1.3(−100) = −130.

Exercise 12.8

Part (a): When the accelerated strength is 2000 the distribution of the 28-day strength Y is normal with a mean given by 1800 + 1.3(2000) = 4400 and a standard deviation of 350. Then

P(Y ≥ 5000) = 1 − P(Y ≤ 5000) = 1 − Φ((5000 − 4400)/350) = 0.04323813.

Part (b): We have E(Y) = 1800 + 1.3(2500) = 5050 and the standard deviation is the same as before (350). Then

P(Y ≥ 5000) = 1 − Φ((5000 − 5050)/350) = 0.5567985.

Part (c): We have

E(Y1) = 1800 + 1.3(2000) = 4400
E(Y2) = 1800 + 1.3(2500) = 5050.

We want the probability that the first strength exceeds the second by more than 1000, i.e. P(Y1 − Y2 ≥ 1000). Now as Y1 and Y2 are independent normal random variables, Y1 − Y2 is also a normal random variable with a mean of E(Y1) − E(Y2) = 4400 − 5050 = −650 and a variance of 2(350²) = 245000 (so the standard deviation is 494.9747). Using these we would find

P(Y1 − Y2 ≥ 1000) = 1 − Φ((1000 − (−650))/494.9747) = 0.0004287981.

Part (d): We have

P(Y2 > Y1) = P(Y2 − Y1 > 0) = 1 − P(Y2 − Y1 < 0) = 1 − Φ((0 − (E(Y2) − E(Y1)))/√(2(350²))).

Since E(Y2) − E(Y1) = 1800 + 1.3x2 − (1800 + 1.3x1) = 1.3(x2 − x1), we have

P(Y2 − Y1 > 0) = 1 − Φ(−1.3(x2 − x1)/√(2(350²))).

We want to find x2 − x1 such that the above equals 0.95. We can do this with the R code

- qnorm( 0.05 ) * sqrt( 2 * 350^2 ) / 1.3

which gives the value 626.2777.

Exercise 12.9

Part (a): 0.095(1) = 0.095.

Part (b): 0.095(−5) = −0.475.

Part (c): We would have −0.12 + 0.095(10) = 0.83 and −0.12 + 0.095(15) = 1.305.

Part (d): We have

P(Y > 0.835) = 1 − P(Y < 0.835) = 1 − Φ((0.835 − (−0.12 + 0.095(10)))/0.025) = 0.4207403.

To get the other part we would change the value of 0.835 in the above to 0.840.

Part (e): We have

P(Y10 > Y11) = P(Y10 − Y11 > 0) = 1 − Φ((0 − (E(Y10) − E(Y11)))/(√2 (0.025))) = 0.003604785.

Exercise 12.10

If the given expression is true then we must have

P(Y > 5500 when x = 100) = 1 − Φ((5500 − (4000 + 10(100)))/SD(Y)) = 0.05.

This requires the standard deviation of Y to be 303.9784. Now that we have the given value for the standard deviation we can check if the second probability statement is true under that condition.
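The normal probabilities in Parts (a) and (b) of Exercise 12.8 above are single pnorm calls; a sketch:

```r
# P(Y >= 5000) when Y ~ N(mean = 1800 + 1.3 * x, sd = 350),
# for the two accelerated-strength values of Exercise 12.8.
p_a <- 1 - pnorm((5000 - (1800 + 1.3 * 2000)) / 350)   # x = 2000
p_b <- 1 - pnorm((5000 - (1800 + 1.3 * 2500)) / 350)   # x = 2500
c(p_a, p_b)
```

Both values agree with the text's 0.04323813 and 0.5567985.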
Using R we can evaluate that statement with

1 - pnorm( ( 6500 - (4000 + 10*200) )/ 303.9784 )

which gives the value of 0.04999999. This is less than the claimed value of 0.1, and thus the two statements given are inconsistent.

Exercise 12.11

Part (a): We would have −0.01(1) = −0.01 and −0.01(10) = −0.1.

Part (b): We would have 5.00 − 0.01(200) = 3 and 5.00 − 0.01(250) = 2.5.

Part (c): All measurements are independent and thus the probability that all five times are between 2.4 and 2.6 is the fifth power of the probability that one time is between these two values. To calculate this latter value we have

P(2.4 < Y < 2.6) = P(Y < 2.6) − P(Y < 2.4) = Φ((2.6 − E(Y))/sd(Y)) − Φ((2.4 − E(Y))/sd(Y)) = 0.8175776,

using the R code

pnorm( ( 2.6 - (5-0.01*250) )/0.075 ) - pnorm( ( 2.4 - (5-0.01*250) )/0.075 )

Then the probability that all five measurements are between these two times is the fifth power of the above number or 0.3652959.

Part (d): We would evaluate

P(Y2 > Y1) = P(Y2 − Y1 > 0) = 1 − P(Y2 − Y1 < 0) = 1 − Φ((0 − (−0.01(1)))/(√2 (0.075))) = 0.4624.

Here we have used the fact that E(Y2) − E(Y1) = −0.01(x2 − x1) = −0.01(1), and that the difference of two independent observations has standard deviation √2 (0.075), just as in Part (e) of Exercise 12.9.

Estimating Model Parameters

Exercise 12.12

Part (a): We would calculate

β̂1 = Sxy/Sxx = (Σxiyi − (Σxi)(Σyi)/n)/(Σxi² − (Σxi)²/n) = (25825 − (517)(346)/14)/(39095 − 517²/14) = 0.6522902 and
β̂0 = ȳ − β̂1 x̄ = 0.6261405.

Thus the regression line is given by y = β̂0 + β̂1 x = 0.6261405 + 0.6522902x.

Part (b): We get when x = 35 a prediction given by ŷ = 0.6261405 + 0.6522902(35) = 23.45. Since this measurement is the 9th measurement it has a measured value of y = 21 and thus a residual of 21 − 23.45 = −2.45.

Part (c): We compute

SSE = Σyi² − β̂0 Σyi − β̂1 Σxiyi = 17454 − 0.6226(346) − 0.65228(25825) = 393.45,

and

σ̂² = SSE/(n − 2) = 393.45/12 = 32.79.

A square root then gives σ̂ = 5.726.

Part (d): The proportion of explained variation or r² can be written as r² = 1 − SSE/SST. We compute

SST = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n = 17454 − 346²/14 = 8902.86.

Thus r² = 1 − 393.45/8902.86 = 0.9558.

Part (e): Note that the x value of 103 has a y value of 75 and the x value of 142 has a y value of 90. Thus the new sums without these two sample points are given by

Σxi = 517 − 103 − 142 = 272
Σyi = 346 − 75 − 90 = 181
Σxi² = 39095 − 103² − 142² = 8322
Σyi² = 17454 − 75² − 90² = 3729
Σxiyi = 25825 − 103(75) − 142(90) = 5320,

with n = 12. Using these values we get

β̂1 = −0.428, β̂0 = 24.78, and r² = 0.6878;

note that these are quite different values than what we obtained before (when we included these two points in the regression).

Exercise 12.13

We can plot the points to see what they look like. If we do, we see that this data looks like it comes from a line. Fitting a linear regression model gives for the proportion of explained variance r² = 0.9716, indicating that a line fits very well.

Exercise 12.14

We will work this problem with R.

Part (a): We find the regression line given by

Ratio = −15.2449 + 0.09424 Temp.

Part (c): These have y values on different sides of the least-squares line.

Part (d): This is given by r² = 0.4514.

Exercise 12.15

Part (a): Using R a stem plot gives

2 | 034566668899
3 | 0133466789
4 | 2
5 | 3
6 |
7 | 0
8 | 00

which shows a cluster of points around MOE ≈ 30-40 and an island of other points near 70-80.

Part (b): From the scatter plot of strength as a function of MOE, strength is not uniquely determined by MOE, i.e. at a given MOE value there look to be several possible values for strength.

Part (c): The least-squares line is y = 3.2925 + 0.10748x. At x = 40 we get y = 7.5917. The point x = 40 is inside the range of the data used to build our linear model and thus using our model at that point should not cause worry. The value of x = 100 is outside of the data used to build the least-squares model, so we would not be comfortable using the least-squares line to predict strength in that case.
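The slope and intercept of Exercise 12.12 above can be recomputed from the summary sums alone; a sketch (the SSE quoted in the manual differs slightly from the value these sums imply, so only β̂1, β̂0, and SST are checked):

```r
# Least-squares estimates from summary statistics (Exercise 12.12).
n   <- 14
Sx  <- 517; Sy <- 346; Sxy <- 25825
Sxx_raw <- 39095; Syy_raw <- 17454

b1  <- (Sxy - Sx * Sy / n) / (Sxx_raw - Sx^2 / n)   # slope
b0  <- Sy / n - b1 * Sx / n                          # intercept
SST <- Syy_raw - Sy^2 / n                            # total sum of squares
c(b1, b0, SST)
```

This reproduces the quoted β̂1 = 0.6522902 and β̂0 = 0.6261405.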
Part (d): From the MINITAB output we have SSE = 18.736, SST = 71.605, and r² = 0.738, so yes, the relationship is relatively good at fitting the data.

Exercise 12.16

We will use the R command lm to compute all needed items.

Part (a): A scatter plot of the data points looks very linear and has an r² of 0.953.

Part (b-d): See the lm output.

Part (e): This proportion is the same as the value of r² reported above or 0.953.

Exercise 12.17

Part (b): 3.678 + 0.144(50) = 10.878.

Part (c): We want to evaluate σ̂² = SSE/(n − 2). Since we have

SSE = SST − SSR = 320.398 − r² SST = 320.398 − 320.398(0.860) = 44.85572,

we get σ̂² = 44.85572/(23 − 2) = 2.135987 and σ̂ = 1.4615.

Exercise 12.18

Part (a): We would calculate

β̂1 = Sxy/Sxx = (Σxiyi − (Σxi)(Σyi)/n)/(Σxi² − (Σxi)²/n) = (987.645 − (1425)(10.68)/15)/(139037.25 − 1425²/15) = −0.0001938795
β̂0 = ȳ − β̂1 x̄ = 10.68/15 − (−0.0001938795)(1425/15) = 0.7304186.

Thus the regression line is given by

y = β̂0 + β̂1 x = 0.7304186 − 0.0001938795x.

Part (b): This would be Δy = β̂1(1) = −0.0001938795.

Part (c): With x the temperature in Fahrenheit and x̃ the temperature in Celsius we have x = (9/5)x̃ + 32, so

y = β̂0 + β̂1 x = β̂0 + β̂1((9/5)x̃ + 32) = (β̂0 + 32β̂1) + (9/5)β̂1 x̃,

which gives the new intercept β̂0 + 32β̂1 and the new slope (9/5)β̂1.

Part (d): I would think that we could use the regression results since 200 is inside the range of x values.

Exercise 12.19

Part (a): We would use

β̂1 = Sxy/Sxx = (Σxiyi − (Σxi)(Σyi)/n)/(Σxi² − (Σxi)²/n)
β̂0 = ȳ − β̂1 x̄.

Using the numbers from the given dataset we compute β̂1 = 1.711432 and β̂0 = −45.55.

Part (b): We need to evaluate β̂0 + β̂1(225) = −45.55 + 1.711432(225) = 339.52.

Part (c): This would be the value of β̂1 times the change in liberation area or 1.711432(−50) = −85.57.

Part (d): The value of 500 is beyond the range of inputs and thus the regression line predictions are not to be trusted.

Exercise 12.20

Part (a): From the MINITAB output we have β̂1 = 0.9668 and β̂0 = 0.36510.
Part (b): We have β̂0 + β̂1(0.5) = 0.36510 + 0.9668(0.5) = 0.8485.

Part (c): We find σ̂ = 0.1932.

Part (d): We have SST = 1.4533 and r² = 0.717.

Exercise 12.21

Part (a): The value of r² is 0.985, a very "large" number. A scatter plot of the data looks like a line would fit quite well.

Part (b): We have ŷ = β̂0 + β̂1(0.3) = 32.1878 + 156.71(0.3) = 368.891.

Part (c): This would be the same as in Part (b).

Exercise 12.22

Part (a): This is

s = σ̂ = √(SSE/(n − 2)).

Now

SSE = Σyi² − β̂0 Σyi − β̂1 Σxiyi = 7.8518 − 1.41122(10.68) − (−0.00736)(987.64) = 0.049245.

So we can evaluate s = √(0.049245/(15 − 2)) = 0.0615.

Part (b): Using r² = 1 − SSR/SST... wait, rather r² = 1 − SSE/SST. To evaluate this we first compute

SST = Σyi² − (Σyi)²/n = 7.8518 − 10.68²/15 = 0.24764,

thus r² = 1 − 0.049245/0.24764 = 0.8011.

Exercise 12.23

Part (a): We want to consider the two definitions

SSE = Σ(yi − ŷi)²
SSE = Σyi² − β̂0 Σyi − β̂1 Σxiyi.

When I use R I get the same value for each of these, given by

[1] "SSE_1= 16205.453351; SSE_2= 16205.453351"

Part (b): We have

SST = Σyi² − (Σyi)²/n = 533760.0,

and

r² = 1 − SSE/SST = 0.9696.

With an r² this "large" we expect our linear model to be quite good. Note that these results don't exactly match the back of the book. I'm not sure why. If anyone sees anything wrong with what I have done please contact me.

Exercise 12.24

We will work this problem using R.

Part (a): Based on the scatter plot a linear fit looks like a good model.

Part (b): We get y = 137.876 + 9.312x.

Part (c): This is r². From the R output this looks to be given by 0.9897.

Part (d): Dropping the sample where x = 112 and y = 1200 and refitting gives the least-squares line

y = 190.352 + 7.581x,

which is very different from the line we obtained with that data point included in the fit. See Figure 10, where the line using all of the data points is given in red and the line with the deleted point in green. Notice how different the two lines are.
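The equivalence of the two SSE definitions used in Exercise 12.23 above can be checked numerically on any dataset; the small dataset below is purely hypothetical, chosen only to exercise the identity.

```r
# Verify SSE = sum((y - yhat)^2) = sum(y^2) - b0*sum(y) - b1*sum(x*y)
# on a small hypothetical dataset (not the exercise data).
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.3)

fit   <- lm(y ~ x)
b0    <- unname(coef(fit)[1])
b1    <- unname(coef(fit)[2])
SSE_1 <- sum(residuals(fit)^2)                       # definitional form
SSE_2 <- sum(y^2) - b0 * sum(y) - b1 * sum(x * y)    # computational form
c(SSE_1, SSE_2)
```

The two values agree to floating-point precision, which is the check performed in Part (a) of that exercise.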
Exercise 12.25

That b1 and b0 satisfy the normal equations can be shown by solving the normal equations to obtain solutions that agree with the book's equations 12.2 and 12.3.

Exercise 12.26

Let's check that (x̄, ȳ) is on the line. Consider the right-hand-side of the least-squares line evaluated at x = x̄. We have

β̂0 + β̂1 x̄ = (ȳ − β̂1 x̄) + β̂1 x̄ = ȳ,

showing that the given point (x̄, ȳ) is on the least-squares regression line.

Figure 10: The two least squares fits of the data in Exercise 12.24. See the text for details.

Exercise 12.27 (regression through the origin)

The least-squares estimator for β1 is obtained by finding the value of β̂1 such that the given SSE(β1) is minimized. Here SSE(β̂1) is given by

SSE(β̂1) = Σ(yi − β̂1 xi)².

Taking the derivative of the given expression for SSE(β̂1) with respect to β̂1 and setting the resulting expression equal to zero we find

d SSE(β̂1)/dβ̂1 = 2 Σ(yi − β̂1 xi)(−xi) = 0,

or

−Σ yi xi + β̂1 Σ xi² = 0.

Solving this expression for β̂1 we find

β̂1 = Σxiyi / Σxi².   (21)

To study the bias introduced by this estimator of β1 we compute

E(β̂1) = Σxi E(yi) / Σxi² = β1 Σxi² / Σxi² = β1,

showing that this estimator is unbiased. To study the variance of this estimator we compute

Var(β̂1) = (1/(Σxi²)²) Σ xi² Var(yi) = (σ²/(Σxi²)²) Σxi² = σ²/Σxi²,   (22)

the requested expression. An estimate of σ is given by the usual

σ̂² = SSE/(n − 1),

which has n − 1 degrees of freedom.

Figure 11: Left: A scatter plot of the data (xi, yi) (in green) and the points (xi − x̄, yi) (in red) for the data in Exercise 12.28. Right: The three data sets of Exercise 12.29 (1: r² = 0.43, σ = 4.03; 2: r² = 0.99, σ = 4.03; 3: r² = 0.99, σ = 1.90).
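The closed form β̂1 = Σxiyi/Σxi² derived in Exercise 12.27 agrees with R's built-in no-intercept fit; a quick check on hypothetical data:

```r
# Regression through the origin: compare the closed-form estimator
# with lm()'s no-intercept fit (y ~ 0 + x). The data are hypothetical.
x <- c(1, 2, 3, 4, 5)
y <- c(2.2, 3.9, 6.1, 8.2, 9.7)

b1_manual <- sum(x * y) / sum(x^2)        # estimator from Equation 21
b1_lm     <- unname(coef(lm(y ~ 0 + x)))  # R's through-origin fit
c(b1_manual, b1_lm)
```

Both approaches give the same slope, confirming that lm(y ~ 0 + x) minimizes the same SSE(β̂1).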
Exercise 12.28

Part (a): See Figure 11 (left), where the original (xi, yi) points are plotted in green and the (xi − x̄, yi) points are plotted in red. The red points are a shift leftward of the green points. Thus the least-squares line using (xi − x̄, yi) will be a shift to the left of the least-squares line using the data (xi, yi).

Part (b): In the second model we have

Yi = β0* + β1*(xi − x̄) = (β0* − β1* x̄) + β1* xi.

For this to match the first model means that

β0 = β0* − β1* x̄ and β1 = β1*.

Solving for β0* and β1* we have β1* = β1 and β0* = β0 + β1 x̄.

Exercise 12.29

From the plot given in Figure 11 (right) for the three data sets we see that two of the plots have a large r² value of 0.99 (plots 2 and 3). Of these two, the value of σ is smaller in the third one. In that case the linear fit would be the "best". In the first data set the application of the linear fit would be the "worst".

Inferences About the Slope Parameter β1

Exercise 12.30

Part (a): From the text we have that

σβ̂1 = σ/√Sxx with Sxx = Σ_{i=1}^n (xi − x̄)².

From the given data in this problem we compute x̄ = 2500 and Sxx = 7 × 10⁶, thus

σβ̂1 = 350/√(7 × 10⁶) = 0.132228.

Part (b): We have

P(1.0 ≤ β̂1 ≤ 1.5) = P((1.0 − β1)/σβ̂1 ≤ (β̂1 − β1)/σβ̂1 ≤ (1.5 − β1)/σβ̂1)
= pt((1.5 − 1.25)/0.132228, n − 2) − pt((1.0 − 1.25)/0.132228, n − 2) = 0.8826,

using R notation pt for the cumulative t-distribution.

Part (c): In the case of the eleven measurements we have x̄ = 2500 (the same as before) but Sxx = 1.1 × 10⁶, which is smaller than before. Thus σβ̂1 = σ/√Sxx in this case will be larger than in the seven-measurement case. What is interesting about this is that one might think that having more measurements is always better, and here is an example where that is not true, since the spread in x of the eleven points is less than that of the seven-point case.

Exercise 12.31

Part (a): To evaluate sβ̂1 we use

sβ̂1 = s/√Sxx = s/√(Σxi² − (Σxi)²/n).

To use this we need to first compute s. This is computed using s = √(SSE/(n − 2)) and thus we need to evaluate SSE, where we get

SSE = Σyi² − β̂0 Σyi − β̂1 Σxiyi = 0.04924502.

Using this value we find sβ̂1 = 0.001017034.

Part (b): To calculate a confidence interval we use the fact that the fraction (β1 − β̂1)/sβ̂1 follows a t-distribution with n − 2 degrees of freedom. Using this we can derive the result in the book that a 100(1 − α)% confidence interval for the slope β1 is given by

β̂1 ± tα/2,n−2 sβ̂1.   (23)

For this part we want α = 0.05 and have n = 15, so in R notation we find that

tα/2,n−2 = qt(1 − 0.025, 13) = 2.1603.

With these the confidence interval for β1 is then given by

[1] -0.009557398 -0.005163061

Exercise 12.32

Note that from the MINITAB output the p-value/t-statistic for the rainfall variable is 0.0/22.64, showing that the linear model is significant. Now the change in runoff associated with the given change in rainfall would be approximated using

Δrunoff = β̂1 Δrainfall = β̂1(1) = 0.827.

A 95% confidence interval for this change would thus be the same as a 95% confidence interval for β1 and is given by β̂1 ± tα/2,n−2 sβ̂1. Here from Exercise 16 we have that n = 16, so that tα/2,n−2 = 2.14478 and the confidence interval obtained using the above is then (0.748, 0.905).

Exercise 12.33

Part (a): From the MINITAB output for Exercise 15 we have n = 27 and thus can compute

tα/2,n−2 = qt(1 − 0.025, 25) = 2.059,

so a 95% confidence interval for β1 using Equation 23 is given by (0.0811, 0.1338).

Part (b): For this part we want to know if the given measurements would reject the hypothesis that β1 = 0.1. As the above confidence interval in fact contains this value we cannot reject the hypothesis at the 5% level. Thus this sample of data does not contradict the belief that β1 = 0.1.
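The interval in Part (b) of Exercise 12.31 above follows directly from Equation 23; a sketch. The manual does not print β̂1 itself, so the midpoint of the quoted interval, −0.00736023, is used for it here.

```r
# 95% CI for the slope (Exercise 12.31): b1 +/- t_{alpha/2, n-2} * s_b1.
# b1 below is taken as the midpoint of the interval quoted in the manual,
# since the point estimate itself is not printed there.
n    <- 15
b1   <- -0.00736023
s_b1 <- 0.001017034
ci   <- b1 + c(-1, 1) * qt(1 - 0.025, n - 2) * s_b1
ci
```

This reproduces the quoted endpoints (−0.009557398, −0.005163061).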
Exercise 12.34

Part (a): In the rejection region approach we compute the confidence interval for β1 and see if the value of zero is inside or outside of it. If the point zero is inside the confidence interval we cannot reject the hypothesis that β1 = 0 and the linear model is not appropriate. Here we desire to have α = 0.01, so that with n = 13 we get tα/2,n−2 = qt(1 − 0.005, 11) = 3.105807 and the 99% confidence interval is given by (0.3987, 1.534). As zero is not included in this interval we reject the hypothesis that β1 = 0; the linear model is useful.

Part (c): From the statement in the problem we are told that the previous belief on the value of β1 is such that β1 = 0.15/0.1 = 1.5. Since the value of 1.5 is inside the 99% confidence interval for β1 computed above, this new data does not contradict our original belief.

Exercise 12.35

Part (a): Using the given summary statistics we compute β̂1 = 1.536018 with a 95% confidence interval given by (0.6321, 2.439).

Part (b): Our p-value for β1 is computed as

p-value = P(β̂1 ≥ 1.536 | β1 = 0) = P(β̂1/sβ̂1 ≥ 1.536/sβ̂1 | β1 = 0) = P(β̂1/sβ̂1 ≥ 3.622 | β1 = 0) = 1 − pt(3.622, n − 2) = 0.00125.

As the p-value is less than 0.05 we conclude that this result is significant at the 95% level.

Part (c): The value of the dose at 5.0 is outside of the range of observed samples, so it would not be sensible to use linear regression since it would be extrapolation.

Part (d): Without this observation we get β̂1 = 1.683 with a 95% confidence interval for β1 given by (0.53, 2.834). As the value of zero is not inside this 95% confidence interval, the regression is still significant even without this point and it does not seem to be exerting undue influence.

Exercise 12.36

Part (a): If we plot the data (see the R code) a scatter plot looks like it can be well modeled by a linear curve. From the scatter plot we notice that there is one point (965, 99) that could be an outlier or might exert undue influence on the linear regression.
Part (b): The proportion asked for is the value of the r² coefficient, which from the output of the lm function is given by 0.9307.

Part (c): We want to know the increase in y when x is increased by 1000 − 100 = 900, which has a point estimate given by

β̂1(900) = 6.211 × 10⁻⁴ (900) = 0.55899.

The 95% confidence interval on this product is

900 β̂1 ± tα/2,n−2 (900 sβ̂1) = (0.233, 0.8845).

Since the value of 0.6 is inside of this confidence interval there is not substantial evidence from the data that the true average increase in y would be less than 0.6.

Part (d): This would be the point estimate and a confidence interval for the parameter β1.

Exercise 12.37

Part (a): We can compute a t-test on the difference between the two levels. We find a t-value given by -7.993149, which due to its magnitude is significant.

Part (b): To answer this question we will compute the 95% confidence interval for β1. Using the given data points we find this confidence interval given by (0.467, 0.840). Since all points in this interval are less than one we conclude that β1 is less than 1. Since the value of zero is not inside this interval we can also conclude that the linear relationship is significant.

Exercise 12.38

Part (a): Using R and the data from Exercise 19 we get a t-value for β1 given by 17.168 and a p-value of 8.23 × 10⁻¹⁰. Thus this estimate of β1 is significant.

Part (b): A point estimate of the change in emission rate would be estimated by 10β̂1. This point estimate would have a 95% confidence interval given by (14.94231, 19.28633).

Exercise 12.39

Part (a): From Example 12.6 we have n = 20 and calculated SSE = 7.968. Next we compute

SST = Syy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n = 124039.58 − 1574.8²/20 = 39.828.

Using these two results we have SSR = SST − SSE = 31.86. Now that we have these items we can use Table 12.2 as a template, where we find an f value for the linear fit given by

f = SSR / (SSE/(n − 2)) = 31.86 / (7.968/18) = 71.97.

The model utility test would compare this value to

Fα,1,n−2 = qf(1 − 0.05, 1, 20 − 2) = 4.413.

Since f ≥ Fα,1,n−2 the linear regression is significant. We find a p-value given by

p-value = P(F ≥ 71.97) = 1 − pf(71.97, 1, 18) = 1.05 × 10⁻⁷.

To compare this result to that when we use the t-test of significance we first have to estimate β1 and σβ1 for the linear regression. From the text we have that β̂1 = 0.04103 and sβ̂1 = 0.0048367, so the t-statistic for β1 is given by

t = β̂1/sβ̂1 = 8.483.

Note that t² = 71.97 = f, as it should. Using this t-statistic we compute the p-value using

p-value = P(|T| ≥ 8.483) = 1 − (pt(8.483, 18) − pt(−8.483, 18)) = 1.05 × 10⁻⁷,

the same value as calculated using the F-statistic.

Exercise 12.40

We start with the expression for β̂0 derived in this section of the text

β̂0 = (ΣYi − β̂1 Σxi)/n.

Then taking the expectation of both sides of this expression gives

E[β̂0] = (1/n)(Σ E[Yi] − β1 Σxi).

But by the linear hypothesis E[Yi] = β0 + β1 xi, and using this in the above we get

E[β̂0] = (1/n)(Σ(β0 + β1 xi) − β1 Σxi) = β0,

as it should.

Exercise 12.41

Part (a): We start with the expression for β̂1 derived in this chapter given by

β̂1 = Σ ci Yi with ci = (xi − x̄)/Sxx.

Taking the expectation of the above expression we get

E[β̂1] = Σ ci E[Yi] = Σ ci (β0 + β1 xi) = β0 Σci + β1 Σ ci xi.

Note that Σci = 0 since

Σci = (1/Sxx) Σ(xi − x̄) = (1/Sxx)(n x̄ − n x̄) = 0.

Next consider the sum Σ ci xi. We have

Σ ci xi = (1/Sxx) Σ(xi − x̄)xi = (1/Sxx) Σ(xi − x̄)(xi − x̄ + x̄)
= (1/Sxx)(Σ(xi − x̄)² + x̄ Σ(xi − x̄)) = (1/Sxx) Σ(xi − x̄)² = 1.

Combining these two results we have shown that E[β̂1] = β1.

Part (b): By the independence of the Yi's we have

Var(β̂1) = Σ ci² Var(Yi) = σ² Σ ci² = (σ²/Sxx²) Σ(xi − x̄)² = (σ²/Sxx²) Sxx = σ²/Sxx,

the same expression for the variance of β̂1 as given in the text.
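The equality t² = f noted in Exercise 12.39 can be verified numerically, along with the matching p-values:

```r
# Model utility test (Exercise 12.39): F statistic vs. squared t statistic.
n   <- 20
SSE <- 7.968
SSR <- 31.86
f   <- SSR / (SSE / (n - 2))          # F statistic

b1   <- 0.04103                        # slope estimate from the text
s_b1 <- 0.0048367                      # its standard error
t    <- b1 / s_b1                      # t statistic

p_f <- 1 - pf(f, 1, n - 2)             # p-value from the F test
p_t <- 2 * (1 - pt(abs(t), n - 2))     # two-sided p-value from the t test
c(f, t^2, p_f, p_t)
```

Up to the rounding in the quoted inputs, f and t² agree, and the two p-values coincide at about 1.05 × 10⁻⁷.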
Exercise 12.42

The t-statistic for testing the hypothesis H0: β1 = 0 vs. Ha: β1 ≠ 0 is given by the expression β̂1/sβ̂1. We will compute the new estimates of β1 and sβ̂1 under the suggested transformation and show that the t-statistic does not change.

We compute the least-squares estimate of β1 using β̂1 = Sxy/Sxx, so under the given transformation our new β1 estimate (denoted β̂1′) is related to the old β1 estimate (denoted β̂1) as

β̂1′ = (cd/c²) β̂1 = (d/c) β̂1.

The estimate of β0 transforms as

β̂0′ = d ȳ − (d/c) β̂1 (c x̄) = d β̂0.

The error sum of squares (SSE) transforms as

SSE′ = Σ(d yi)² − d β̂0 Σ(d yi) − (d/c) β̂1 Σ(c xi)(d yi) = d² SSE.

Thus using s = √(SSE/(n − 2)) we have s′ = ds. Finally note that Sxx′ = c² Sxx. Using all of these results we have

sβ̂1′ = s′/√Sxx′ = ds/(c√Sxx) = (d/c) sβ̂1.

Combining these results, the new t-statistic of the transformed points is given by

β̂1′/sβ̂1′ = ((d/c)β̂1)/((d/c)sβ̂1) = β̂1/sβ̂1,

the same expression for the t-statistic before the transformation.

Exercise 12.43

Using the formula given with β10 = 1, β1′ = 2 and data from this problem we have a value of d given by

d = |1 − 2| / (4 √((15 − 1)/(11098 − 402²/15))) = 1.2034.

As we have n − 2 = 13 degrees of freedom and are considering a two-tailed test, we would consult the curves in the lower-right of Appendix A.17. There we read β ≈ 0.1.

Inferences Concerning µY·x∗ and the Prediction of Future Y Values

Exercise 12.44

Part (a): The formula for sŶ is given by

sŶ = s √(1/n + (x∗ − x̄)²/Sxx),   (24)

and depends on the value of x∗.

Part (b): The confidence interval, with tα/2,n−2 = qt(1 − 0.025, 25) = 2.059539, is given by

ŷ ± tα/2,n−2 sŶ,   (25)

which evaluates to (7.223343, 7.960657).

Part (c): A 100(1 − α)% prediction interval is given by

ŷ ± tα/2,n−2 √(s² + sŶ²).   (26)

From Exercise 15 in Section 12.2 we have s = √(SSE/(n − 2)) = √(18.736/(27 − 2)) = 0.865702, and the prediction interval for Y is given by (5.771339, 9.412661).
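Equation 24 can be wrapped as a small helper; this is a generic sketch, not code from the manual:

```r
# Standard error of the estimated mean response at x_star (Equation 24).
s_yhat <- function(s, n, x_star, x_bar, Sxx) {
  s * sqrt(1 / n + (x_star - x_bar)^2 / Sxx)
}

# At x_star = x_bar the formula reduces to s / sqrt(n); moving x_star
# away from x_bar can only increase the standard error.
s_yhat(2, 10, 5, 5, 30)
```

This makes the behavior discussed in the surrounding exercises explicit: intervals are narrowest at x̄ and widen as x∗ moves away from it.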
Part (d): We now ask: if we also compute a 95% confidence interval for the average strength when the modulus of elasticity is 60, what is the simultaneous confidence level of the two intervals? We can use the Bonferroni inequality to conclude that the joint confidence level is guaranteed to be at least 100(1 − kα)%, or since k = 2 this is 100(1 − 2(0.05))% = 90%.

Exercise 12.45

Part (a): Here α = 0.1 and we want the confidence interval for the mean response, which is computed using Equation 24. In the R code for this section we compute the confidence interval

[1] 77.82559 78.34441

Part (b): Here we want a prediction interval. Since all that changes is the standard error, we get

[1] 76.90303 79.26697

This is an interval with the same center as the confidence interval but one that is wider.

Part (c): These two intervals would have the same center but the prediction interval would be wider than the confidence interval. In addition, since x∗ = 115 is farther from the mean x̄ = 140.895, these intervals will be wider than the ones computed at x∗ = 125.

Part (d): We compute the 99% confidence interval of β0 + 125β1 and see where the value of 80 falls relative to it. We find the needed confidence interval given by (77.65439, 78.51561). Note that every point in this interval is less than 80. Thus we can reject the null hypothesis in favor of the alternative.

Exercise 12.46

Part (a): We compute a confidence interval for the true average given by (0.320892, 0.487108).

Part (b): The value of x∗ = 400 is farther from the sample mean x̄ = 471.5385 than the value x∗ = 500. Thus the confidence interval of the true mean at this point will be wider.

Part (c): Here we are interested in δy = β1 δx = β1 and thus want to consider a confidence interval on β1. This confidence interval is given by β̂1 ± tα/2,n−2 sβ̂1 with sβ̂1 = s/√Sxx. Using the numbers in the text we compute this to be

(0.0006349512, 0.0022250488).
Part (d): We now need to compute a confidence interval on the true average silver content when the crystallization temperature is 400. We find this given by (0.1628683, 0.3591317). Since the value of 0.25 is inside of this interval, the given data does not contradict the prior belief.

Exercise 12.47

Part (a): We could report the 95% prediction interval for Y when x = 40 and find (20.29547, 43.60613). This is a rather wide interval and thus we don't expect to be able to make very accurate predictions of the future Y value.

Part (b): With this new x value we compute the prediction interval given by (28.63353, 51.80747). The simultaneous prediction intervals will hold with confidence level at least 100(1 − 2α)%, or (100 − 10)% = 90% in this case.

Exercise 12.48

Part (a): Plotting the data points shows that a line is not perfect but a good representation of the data points.

Part (b): Using the lm function in R we get a linear fit given by

ŷ = 52.62689 − 0.22036x.

Part (c): This is given by the value of r², which is r² = 0.701.

Part (d): This is the model utility test. From the summary command we see that the t-value for β1 is -4.331 with a p-value of 0.00251, indicating that the linear model is significant at the 0.3% level.

Part (e): We want to consider Δy = β1 Δx = β1(10) = 10β1. We can compute a confidence interval for 10β1. From the summary command we see that sβ̂1 = 0.05088, thus s10β̂1 = 0.5088 and our confidence interval is given by (−3.376883, −1.030294). Since the value of −2 is inside this interval, there is not strong evidence that the value of Δy is less than −2.

Part (f): We want the confidence intervals for the true mean given x = 100 and x = 200. We compute these in the R code and find

[1] 22.36344 38.81858
[1] -3.117929 20.228174

The mean of the x values is 125.1, which is closer to the value x = 100 than to x = 200. We expect the confidence interval when x = 200 to be larger (as it is).
Part (g): We want the prediction intervals for Y when x = 100 and x = 200. For each of these we find

[1] 4.941855 56.240159
[1] -18.39754 35.50779

These intervals are larger than the corresponding confidence intervals (as they must be).

Exercise 12.49

The midpoint of the given confidence interval is the estimate ŷ and is given by 529.9. The width of the interval is 2 tα/2,n−2 sŶ. For the 95% confidence interval the value of tα/2,n−2 is 2.306004. Thus we can compute sŶ and find the value 29.40151. For the 99% confidence interval we compute a new value of tα/2,n−2 (since α is now different) and use ŷ and sŶ to compute the new interval. We get (431.2466, 628.5534).

Exercise 12.50

Part (a): We want to consider the confidence interval for Δy = β1 Δx = β1(0.1) = 0.1β1. In the R code we compute a 95% confidence interval for 0.1β1 given by (−0.05755671, −0.02898329).

Part (b): This is a confidence interval for the true mean of Y when x = 0.5. For a 95% confidence interval we find (0.4565869, 0.5541655).

Part (c): This is a prediction interval for the value of Y when x = 0.5. For a 95% prediction interval we find (0.3358638, 0.6748886).

Part (d): We want to guarantee 100(1 − 0.03)% joint confidence for intervals at Y|x = 0.3, Y|x = 0.5, and Y|x = 0.7. Using the Bonferroni inequality we need to compute each individual confidence interval at the 99%, or α = 0.01, level to guarantee that the joint confidence level is at least 97%. We compute the three individual confidence intervals in the same way as the earlier ones.

Exercise 12.51

Part (a): The mean of the x values in this example is x̄ = 0.7495, from which we see that the points x = 0.4 and x = 1.2 are 0.3495 and 0.4505 away from x̄. Thus the confidence interval for x = 0.4 (since it is closer to the mean) should be smaller than the confidence interval for x = 1.2.

Part (b): We get (0.7444347, 0.8750933).

Part (c): We get (0.05818062, 0.52320338).
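The conversion from a 95% to a 99% interval in Exercise 12.49 above can be sketched as follows. The original 95% endpoints are not printed in the manual, so they are reconstructed here as 529.9 ± 2.306004 × 29.40151.

```r
# Convert a 95% CI for the mean response into a 99% CI (Exercise 12.49).
# The 95% endpoints below are reconstructed from the quoted midpoint
# (529.9) and s_yhat (29.40151); they are not printed in the manual.
n      <- 10                      # implied by t_{0.025, 8} = 2.306004
ci_95  <- 529.9 + c(-1, 1) * qt(1 - 0.025, n - 2) * 29.40151
y_hat  <- mean(ci_95)             # midpoint recovers the point estimate
s_yhat <- diff(ci_95) / 2 / qt(1 - 0.025, n - 2)
ci_99  <- y_hat + c(-1, 1) * qt(1 - 0.005, n - 2) * s_yhat
ci_99
```

The 99% interval reproduces the quoted (431.2466, 628.5534) and is wider than the 95% one, as it must be.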
Exercise 12.52

We will use the R command lm to answer these questions.

Part (a): A scatter plot of the points looks like a linear model would do a good job fitting the data. The r² of the linear fit is 0.9415 and the p-value for β1 is 1.44 × 10⁻⁵, all of which indicate that the linear model is a decent one.

Part (b): We want to consider a confidence interval of ∆y = β1 ∆x = β1 (1) = β1. A point estimate of β1 is given by 10.6026. The 95% confidence interval for β1 is given by (8.504826, 12.700302).

Part (c): We get (36.34145, 40.17137).

Part (d): We get (32.57571, 43.93711).

Part (e): This depends on which point is farther from x̄ = 2.666667. Since x = 3 is farther from x̄ than x = 2.5, its intervals will be the wider ones.

Part (f): The value x = 6 is outside of the x data used to fit the model. Thus it would not be recommended.

Exercise 12.53

In the given example we find that x̄ = 109.34 and thus the point x = 115 is farther away from x̄ than x = 110 is. Given this we can state that (a) will be smaller than (b), (c) will be smaller than (d), (a) will be smaller than (c), and (b) will be smaller than (d).

Exercise 12.54

Part (a): A scatter plot indicates that a linear model would fit relatively well.

Part (b): Using R we find that the p-value for β1 is given by 0.000879 indicating that the model is significant.

Part (c): We compute (404.1301, 467.7439).

Exercise 12.55 (the variance of Ŷ)

Using the expression Ŷ = β̂0 + β̂1 x = Σ_{i=1}^n d_i Y_i derived in this section we have

  Var(Ŷ) = Var(β̂0 + β̂1 x) = Var( Σ_{i=1}^n d_i Y_i ) = Σ_{i=1}^n d_i² Var(Y_i) = σ² Σ_{i=1}^n d_i².

From the expression in the book for d_i we have

  d_i² = ( 1/n + (x − x̄)(x_i − x̄)/Σ(x_i − x̄)² )²
       = 1/n² + 2 (x − x̄)(x_i − x̄)/( n Σ(x_i − x̄)² ) + (x − x̄)²(x_i − x̄)²/( Σ(x_i − x̄)² )².

Summing this expression for i = 1, 2, . . . , n (the cross term sums to zero since Σ(x_i − x̄) = 0) we get

  Σ_{i=1}^n d_i² = 1/n + 0 + (x − x̄)²/Σ(x_i − x̄)² = 1/n + (x − x̄)²/Sxx.

Using the above we have shown that

  Var(Ŷ) = Var(β̂0 + β̂1 x) = σ² ( 1/n + (x − x̄)²/Σ(x_i − x̄)² ),    (27)

as we were to show.

Exercise 12.56

Part (b): The 95% confidence interval of the population mean for Torque is given by (440.4159, 555.9841).

Part (c): The t-value of β1 is 3.88 with a p-value of 0.002 which is less than 0.01; thus the result is significant at the 1% level.

Part (d): Let's look at the prediction interval when the torque is 2.0. We compute (−295.2629, 1313.0629). This range is much larger than the range of observed y-values.

Correlation

Exercise 12.58

Part (a): We get r = 0.9231564 indicating that the two variables are highly correlated.

Part (b): If we exchange the two variables we get the same value for r (as we should).

Part (c): If we change the scale of one of the variables the value of the sample correlation coefficient does not change.

Part (e): Using the R command lm we find that the t-value on the β1 coefficient is given by 7.594 with a p-value of 1.58 × 10⁻⁵ indicating that the linear model is significant.

Exercise 12.59

Part (a): We compute r = 0.9661835 indicating a strong linear relationship between the two variables.

Part (b): Since r > 0 we expect that if the percent of dry fiber is larger in the first specimen it would also be larger in the second specimen.

Part (c): r would not change since the value of r is independent of the units used to measure x or y.

Part (d): This would be the value of r² which is given by 0.9335106.

Part (e): Let's compute a test for the absence of correlation. We compute the sample correlation R and then our test statistic T is given by

  T = R √(n − 2) / √(1 − R²).    (28)

For the data here we compute t = 14.98799. Under the null hypothesis T has a t-distribution with n − 2 degrees of freedom. If we take α = 0.01 we find t_{α,n−2} = 2.583487. Since our t ≥ t_{α,n−2} we can reject the null hypothesis of no correlation ρ = 0 in favor of the hypothesis of a positive correlation ρ > 0.
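Equation 28 is easy to evaluate directly. The sketch below (Python rather than R; the sample size n = 18 is inferred here from the quoted t value, since the exercise data itself is in the R scripts) reproduces the t = 14.98799 statistic of Exercise 12.59:

```python
import math

def corr_t_stat(r, n):
    """Equation 28: t = r * sqrt(n - 2) / sqrt(1 - r^2); under
    H0: rho = 0 this has a t-distribution with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

t = corr_t_stat(0.9661835, 18)  # reproduces the t = 14.98799 of Exercise 12.59
```

With r = 0.7217517 and the 14 samples of ex12-60.txt the same function gives the t = 3.612242 quoted in Exercise 12.60 below.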
Exercise 12.60

We want to compute the confidence interval for the population correlation coefficient ρ. The null hypothesis is that ρ = 0 and from the data (since r = 0.7217517) we hypothesize that maybe ρ > 0. To test this we compute T given by Equation 28 and find t = 3.612242. The p-value for this value is given by 0.001782464.

Note that the summary statistics given in this problem don't match the data from the book provided text file ex12-60.txt. For example, the data file has only 14 samples rather than the 24 claimed in the problem statement.

Exercise 12.61

Part (a): The data itself gives r = 0.7600203.

Part (b): The proportion of observed variation in endurance that could be attributed to a linear relationship is given by r² = 0.5776308. As the value of r does not change if we exchange the meaning of the x and y variables, the proportion of variation explained would not change either.

Note that the summary statistics given in this problem (for example Sxx = 36.9839) don't match what one would compute directly from the data (Sxx = 2628930). In fact it looks like x and y are exchanged.

Exercise 12.62

Part (a): We compute t = 1.740712 and a p-value given by 0.0536438. Thus at the 5% level the population correlation does not differ from 0.

Part (b): This would be r² = 0.201601.

Exercise 12.63

For this data we get r = 0.7728735 with a t-value of t = 2.435934. This gives a p-value of 0.03576068. This is significant at the 5% level. Note that the book computes t_{α,n−2} to be 2.776 (indicating no rejection of the null hypothesis) while I compute it to be 2.131847, which would allow rejection (at the 5% level).

Exercise 12.64

Part (a): From the numbers given we compute r = −0.5730081. Using this we get ν = −0.65199 using

  ν = (1/2) ln( (1 + r)/(1 − r) ).    (29)

Then compute the endpoints

  ( ν − z_{α/2}/√(n − 3), ν + z_{α/2}/√(n − 3) ),

from which we get (c1, c2) = (−0.9949657, −0.3090143). Using these the confidence interval for ρ is given by

  ( (e^{2 c1} − 1)/(e^{2 c1} + 1), (e^{2 c2} − 1)/(e^{2 c2} + 1) ).

For this we get (−0.7594717, −0.2995401).

Part (b): We now want to test H0 : ρ = −0.5 versus Ha : ρ < −0.5 with α = 0.05. Following the section in the book, we compute z = −0.4924543. This is to be compared with −zα = −1.959964. Since our value of z is not less than this we cannot reject the null hypothesis.

Part (c): Since we are considering in-sample results this would be r² = 0.3283383.

Part (d): Since now we are considering the population value of ρ we only know a range in which it must lie and thus a range for the variance that can be explained. This range would be the square of the confidence interval for ρ computed above, or (0.08972426, 0.57679732).

Exercise 12.65

Part (a): Analysis is difficult due to the small sample size. A histogram of the data for x seems to be skewed, i.e. it does not seem to have values above 110. A qqnorm plot of x has some curvature in it. A qqnorm plot of y looks very straight, and the histogram of y looks normal also.

Part (b): Let's test the hypothesis H0 : ρ = 0 vs. the alternative Ha : ρ > 0. Computing the confidence interval for ρ we get (0.518119, 0.987159). Since zero is not in this range we can conclude that there is a linear relationship between x and y and reject the null hypothesis.

Exercise 12.67

Part (a): The P-value is less than the value of 0.001 and thus we can reject the null hypothesis at this level of significance.

Part (b): Given the P-value we can compute the t value and then the value of r. The P-value in this case is given by

  P-value = P(T ≥ t) + P(T ≤ −t) = 2 P(T ≤ −t) = 2 pt(−t, n − 2).

Using this we compute t = 3.623906. Then since T is related to R using Equation 28 we can solve for R to get R = 0.1603017, so that the variance explained is given by R² = 0.02569664, or only about 2%.

Part (c): We get a P-value now given by 0.01390376, and thus it is significant at the 5% level.
Notice however that the value of r is very small and the fraction of variance explained would be r² = 0.000484, so small that this result is of no practical value.

Supplementary Exercises

Exercise 12.68

Part (a): When we plot Price as a function of Height we see that there is not a deterministic relationship, but a relationship that is very close to linear.

Part (b): In the R code we plot a scatterplot of the data. It suggests that a linear relationship will work quite well.

Part (c): Using the R command lm we get the model

  Price = 23.77215 + 0.98715 Height.

Part (d): Using the R function predict with the option interval='prediction' we get a prediction interval given by

         fit     lwr      upr
  1 50.42522 47.3315 53.51895

Part (e): This is the value of R², which we can read from the summary report to be 0.9631.

Exercise 12.69

Part (a): We are to compute the 95% confidence interval on the coefficient β1. When we do that we get the values

  [1] 0.8883228 1.0859790

Part (b): In this case we want the confidence interval of the mean value of Price when Height = 25. Using the R function predict with the option interval='confidence' we get a confidence interval for the mean given by

         fit      lwr      upr
  1 48.45092 47.73001 49.17183

Part (c): This is the prediction interval and we find

         fit      lwr      upr
  1 48.45092 45.37787 51.52397

Notice that the prediction interval and the confidence interval have the same center, but the prediction interval is wider than the confidence interval.

Part (d): The mean of the Height variable is 22.73684. The sample point x = 30 is farther from the mean than the point x = 25, thus the confidence and the prediction intervals for this point will be wider than the ones when x = 25.

Part (e): We can calculate the sample correlation coefficient by taking the square root of the coefficient of determination r² = 0.9631 to get r = 0.9813766.

Exercise 12.70

First we follow the strategy suggested in this exercise.
For this we will compute the prediction interval of the linear regression when X.DAA = 2.01 and see where the value 22 falls relative to it. We find the prediction interval given by

         fit      lwr      upr
  1 28.76064 17.39426 40.12701

Since the value of 22 is inside of this prediction range it is possible that the individual is of age 22 years or less.

Next we consider the "inverse" use of the regression line by first fitting the model

  %DAA = β0 + β1 Age.

Then with this model we estimate Age given a value of %DAA = 2.01 by solving the above for Age. The linear regression we obtain is given by

  %DAA = 0.58174 + 0.04973 Age,

so that when we take %DAA = 2.01 and solve for Age we get Age = 28.72179. Using the formula in this exercise we get a standard error of SE = 0.4380511. This gives us a 95% prediction interval given by

  [1] 17.12924 40.31435

Note that this range is about the same as the range computed first using the prediction interval approach. Thus we again conclude that since the value of 22 is inside this prediction range it is possible that the individual is of age 22 years or less.

Exercise 12.71

Part (a): The change in the rate of the CI-trace method ∆y would equal β1 ∆x, or β1 times the change in the rate of the drain-weight method. If these two rate changes are to be equal we must have β1 = 1. We will thus compute a confidence interval on β1 and observe if the value of 1 is inside of it. If so, this indicates that the data does not contradict the hypothesis that β1 = 1. We find our confidence interval for β1 given by

  [1] 0.7613274 1.0663110

Since the value of 1 is inside of this region the stated result is plausible.

Part (b): This would be given by the square root of the coefficient of determination r². Taking this square root we get r = 0.9698089.

Exercise 12.72

Part (a): From the degrees of freedom column we see that the C Total degrees of freedom is 7. Since this is equal to n − 1 we have that n = 8.
Part (b): Using the given parameter estimates we have 326.976038 − 8.403964 (35.5) = 28.63532.

Part (c): Given that the R-squared is 0.9134 we would conclude that yes, there seems to be a significant linear relationship.

Part (d): The sample correlation coefficient would be the square root of the coefficient of determination r². From the SAS output we get r = 0.9557196.

Part (e): Loading in the data we see that the smallest value of the variable CO is 50. As we are asked to evaluate the model where CO is 40, this would not be advised.

Exercise 12.73

Part (a): This would be the value of the coefficient of determination, r² = 0.5073.

Part (b): This would be the square root of the value of r², where we find r = 0.71225.

Part (c): We can observe the P-value for either the F-value of the entire model or the T-value of the coefficient β1 (they must be the same for simple linear regression). We see that this P-value is 0.0013. Since this is smaller than the value of 0.01 we have that the linear model is significant.

Part (d): We would want to compute a confidence interval when the content is x = 0.5. Using the numbers from the SAS output we find a 95% confidence interval given by (see the R code)

  [1] 1.056113 1.275323

The center of this confidence interval is the point prediction.

Part (e): We would predict 1.014318. The residual of the prediction when x = 30 is 0.8 − 1.014318 = −0.214318.

Exercise 12.74

Part (a): When we plot the data for this problem we see one point with a very large value of CO that could be an outlier. It does appear to be close to the linear fit that would be present if it were removed and thus might not be problematic to the linear fit. Using the R command lm we get a P-value for the fit of O(10⁻⁶) and an r² = 0.9585, indicating a good linear fit.
Part (b): Computing a prediction interval when CO is 400 gives

         fit      lwr     upr
  1 17.22767 11.95623 22.4991

Part (c): We can remove the largest observation and refit to see if it has any impact on the coefficients of the linear model. When we do this we get

  > m$coefficients
  (Intercept)          CO
   -0.2204143   0.0436202
  > m2$coefficients
  (Intercept)          CO
   1.00140419  0.03461564

Here m is the model fit to all of the data and m2 is the model with the potential outlier removed. We see that the value of the intercept coefficient has changed. This is not surprising since the P-value for this coefficient is large in both fits, and it's also not necessarily significant since the data points are relatively far from the origin. We also see that the slope coefficient has not changed that much, in line with what we argued above. Thus for the data that remains (after we remove the potential outlier) the removed point does not appear to have a significant impact.

Exercise 12.75

Part (a): This would be given by the linear regression

  rate = 1.693872 + 0.080464 speed.

Part (b): This would be given by the linear regression

  speed = −20.056777 + 12.116739 rate.

Part (c): The coefficient of determination r² would be the same for each regression. We find it given by 0.974967.

Exercise 12.76

Part (a): Using the R function lm we get a linear fit given by

  y = −115.195 + 38.068 x,

with an R² = 0.5125. Given this low value of R² we conclude that a linear fit is not that great of a modeling procedure for this data. A scatterplot of the data points confirms this in that there seem to be two clusters of points.

Part (b): If a linear model is appropriate this would be a test on the value of the parameter β1. We can compute a confidence interval on this parameter and find

  [1] 16.78393 59.35216

This is a rather wide confidence interval and the value of 50 is inside of it, so we cannot reject the null hypothesis that β1 = 50.
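Tests like the ones in Exercises 12.71 and 12.76 only need the slope estimate and its standard error s/√Sxx. A self-contained sketch (Python rather than R, and with a small synthetic data set since the exercise data is only in the R scripts):

```python
import math

def slope_ci(x, y, t_crit):
    """Confidence interval for the slope: b1 ± t_crit * s / sqrt(Sxx)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    half = t_crit * s / math.sqrt(Sxx)
    return b1 - half, b1 + half

# Synthetic data with true slope near 2; t_crit = t_{0.025,3} = 3.182:
lo, hi = slope_ci([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0], 3.182)
# a hypothesized slope (1 in Ex. 12.71, 50 in Ex. 12.76) is plausible
# exactly when it falls inside (lo, hi)
```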
Part (c): Since the expression for the standard error of β̂1 is given by

  s_β̂1 = s / √Sxx,

under the given suggestion for changing the sample x values the expression for s should not change while Sxx will. In the original formulation we compute that Sxx = 148.9108 while under the new suggested sampling scheme we compute Sxx = 144. Since this value is smaller than the original value of Sxx we will have the new value of s_β̂1 larger and thus have lost accuracy. It is better to stick with the more dispersed original x values.

Part (d): We want confidence and prediction intervals when x = 18 and x = 22. We find the confidence intervals for these two points given by

         fit      lwr      upr
  1 570.0294 451.8707 688.1881
  2 722.3016 655.9643 788.6388

The prediction intervals for these two points are given by

         fit      lwr      upr
  1 570.0294 284.6873 855.3715
  2 722.3016 454.2358 990.3673

Exercise 12.77

Part (a): A scatter plot of this data looks very linear.

Part (b): The linear fit would be given by

  y = −0.082594 + 0.044649 x.

Part (c): This would be R², which is given by R² = 0.9827.

Part (d): The prediction from the linear model when x = 19.1 is given by ŷ = 0.7701935. The residual would be 0.68 − 0.7701935 = −0.09019349.

Part (e): The P-value for the linear fit is O(10⁻⁷), indicating a strong linear relationship.

Part (f): If a linear model holds then we would have ∆y = β1 ∆x = β1, and a confidence interval on this change is a confidence interval on β1. We compute this to be

  [1] 0.03935878 0.04993829

The center value of this interval, 0.04464854, is the point estimate.

Part (g): In this case we want a confidence interval on 20 β1, which is just 20 times the interval from above, or

  [1] 0.7871756 0.9987658

Exercise 12.78 (the standard deviation of β0)

Using the formulas in the book we have

  Var(β̂0) = Var(β̂0 + β̂1 x*) |_{x* = 0} = σ² ( 1/n + x̄²/Sxx ),

so the standard error of β̂0 is estimated by

  s_β̂0 = s ( 1/n + x̄²/Sxx )^{1/2},

and the confidence interval for β0 is then given by β̂0 ± t_{α/2,n−2} s_β̂0. For the data in Example 12.11 we get an estimated standard deviation of β̂0 given by 0.187008 and a 95% confidence interval for β0 given by

  [1] 125.8067 126.6911

Exercise 12.79 (an expression for SSE)

From the definition of SSE we have that

  SSE = Σ (y_i − ŷ_i)² = Σ (y_i − (β̂0 + β̂1 x_i))².

With β̂0 = ȳ − β̂1 x̄ the above is

  Σ [y_i − ȳ + β̂1 x̄ − β̂1 x_i]² = Σ [y_i − ȳ − β̂1 (x_i − x̄)]²
    = Σ [(y_i − ȳ)² − 2 β̂1 (y_i − ȳ)(x_i − x̄) + β̂1² (x_i − x̄)²]
    = Syy − 2 β̂1 Σ (y_i − ȳ)(x_i − x̄) + β̂1² Sxx
    = Syy − 2 β̂1 Sxy + β̂1² Sxx.

Since β̂1 = Sxy/Sxx we get

  SSE = Syy − 2 (Sxy/Sxx) Sxy + (Sxy²/Sxx²) Sxx = Syy − Sxy²/Sxx = Syy − β̂1 Sxy,

as we were to show.

Exercise 12.80

I think the answer is no. If x and y are such that r ≈ 1 then x and y almost lie on a straight line with a positive slope. Thus the variable y is linearly related to x. The variable y² would then be quadratically related to x and thus would not lie on a straight line when regressed against x, as it would have to if its correlation with x were ≈ 1. Thus the correlation between x and y² should be different from 1.

Exercise 12.81

Part (a): We start with y = β̂0 + β̂1 x and replace β̂0 = ȳ − β̂1 x̄ to get

  y = ȳ + β̂1 (x − x̄).

Next recall that β̂1 and r are given/defined by

  β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)² = Sxy/Sxx   and   r = Sxy / √(Sxx Syy),

so we can write β̂1 as

  β̂1 = r √(Sxx Syy) / Sxx = r √(Syy/Sxx)
     = r √( (1/(n−1)) Σ (y_i − ȳ)² / ( (1/(n−1)) Σ (x_i − x̄)² ) )
     = r √(s_y²/s_x²) = r s_y/s_x.

Thus

  y = ȳ + r (s_y/s_x)(x − x̄),

as we were to show.

Part (b): Using the data from Ex. 12.64 we compute that r = −0.5730081. If our patient's age is below the average age by one standard deviation then (x − x̄)/s_x = −1. Thus

  r (s_y/s_x)(x − x̄) = −r s_y = +0.5730081 s_y,

and the patient's predicted ∆CBG is then 0.5730081 standard deviations above the average ∆CBG.

Exercise 12.82

We start with the t-statistic for the test of H0 : ρ = 0, where

  T = R √(n − 2) / √(1 − R²).

Since R = Sxy / (Sxx^{1/2} Syy^{1/2}) we have R² = Sxy² / (Sxx Syy), so

  1 − R² = (Sxx Syy − Sxy²) / (Sxx Syy).

Thus T becomes

  T = ( Sxy √(n − 2) / (Sxx Syy)^{1/2} ) / ( (Sxx Syy − Sxy²)^{1/2} / (Sxx Syy)^{1/2} )
    = Sxy √(n − 2) / √(Sxx Syy − Sxy²).

For the t-statistic of the test H0 : β1 = 0 we have

  T = (β̂1 − β1)/s_β̂1 |_{β1 = 0} = β̂1 / ( s/√Sxx ) = (Sxy/Sxx) √Sxx / √(SSE/(n − 2))
    = Sxy √(n − 2) / ( Sxx^{1/2} SSE^{1/2} ).

Using various relationships we have

  SSE = Σ y_i² − β̂0 Σ y_i − β̂1 Σ x_i y_i
      = Syy + (Σ y_i)²/n − (ȳ − β̂1 x̄) n ȳ − β̂1 Σ x_i y_i
      = Syy + n ȳ² − n ȳ² + n x̄ ȳ β̂1 − β̂1 Σ x_i y_i
      = Syy + β̂1 ( n x̄ ȳ − Σ x_i y_i ).

Using Equation 30 we have

  SSE = Syy − β̂1 Sxy = Syy − Sxy²/Sxx.

Thus the t-statistic of the test H0 : β1 = 0 becomes

  T = Sxy √(n − 2) / √(Sxx Syy − Sxy²),

the same as we had earlier, showing the two are the same.

Exercise 12.83

We start with the "computational" expression for SSE (the expression the book recommends to use in computing SSE)

  SSE = Σ (y_i − ŷ_i)² = Σ [y_i − (β̂0 + β̂1 x_i)]² = Σ y_i² − β̂0 Σ y_i − β̂1 Σ x_i y_i.

Next recall that the total sum of squares SST is given by Σ (y_i − ȳ)² = Syy by the definition of Syy. Thus we have that

  SSE/SST = ( Σ y_i² )/Syy − β̂0 n ȳ/Syy − β̂1 ( Σ x_i y_i )/Syy.

Using the facts that

  β̂0 = ȳ − β̂1 x̄,   β̂1 = Sxy/Sxx,   Σ y_i² = Syy + (Σ y_i)²/n = Syy + n ȳ²,

we get for the ratio SSE/SST above

  SSE/SST = ( Syy + n ȳ² )/Syy − ( ȳ − (Sxy/Sxx) x̄ ) n ȳ/Syy − (Sxy/Sxx)( Σ x_i y_i )/Syy
          = 1 + ( Sxy/(Sxx Syy) ) ( n x̄ ȳ − Σ x_i y_i ).

Next let's consider Sxy. We find

  Sxy = Σ (x_i − x̄)(y_i − ȳ) = Σ (x_i y_i − x_i ȳ − x̄ y_i + x̄ ȳ)
      = Σ x_i y_i − n x̄ ȳ − n x̄ ȳ + n x̄ ȳ = Σ x_i y_i − n x̄ ȳ.    (30)

Using this by solving for Σ x_i y_i and putting it in the expression for SSE/SST we get

  SSE/SST = 1 + ( Sxy/(Sxx Syy) ) ( n x̄ ȳ − (Sxy + n x̄ ȳ) ) = 1 − Sxy²/(Sxx Syy).

From the definition of r we see that this expression is 1 − r².

Exercise 12.84

Part (a): A scatter plot of the data looks like a linear fit might be appropriate.
Part (b): Using the R command lm we find that a point prediction is given by 98.29335.

Part (c): This would be the estimate of σ, which from the summary command we see is given by 0.1552.

Part (d): This would be the value of R², which from the summary command we see is given by 0.7937.

Part (e): A 95% confidence interval for β1 is given by

  [1] 0.06130152 0.09008124

The center of this interval is 0.07569138.

Part (g): When we add this new data point and then refit our linear model we get two different models (the first is the original model and the second is the new model):

  Call:
  lm(formula = Removal ~ Temp, data = DF)

  Coefficients:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept) 97.498588   0.088945 1096.17  < 2e-16 ***
  Temp         0.075691   0.007046   10.74 8.41e-12 ***

  Residual standard error: 0.1552 on 30 degrees of freedom
  Multiple R-squared: 0.7937, Adjusted R-squared: 0.7868

  Call:
  lm(formula = Removal ~ Temp, data = DF_new)

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept) 97.27831    0.16026 607.018  < 2e-16 ***
  Temp         0.09060    0.01284   7.057 6.33e-08 ***

  Residual standard error: 0.2911 on 31 degrees of freedom
  Multiple R-squared: 0.6163, Adjusted R-squared: 0.604

This new point changes the model coefficients (especially β1), increases the estimate of σ, and decreases the value of R². All of these indicate that this data point does not fit well with the other data points.

Exercise 12.85

A plot of the data looks like a linear fit would do a reasonable job at modeling this data. From the summary command in R we see that a linear model is significant (the P-value for the F-statistic is small) and that the coefficients are estimated well (small P-values). The percentage of the observed variance explained by the linear model is 0.6341.

Exercise 12.86

Part (a): A plot of the data points indicates that a linear model is appropriate. The linear fit gives

  HW = 4.7878 + 0.7432 BOD.
If the two techniques measure the same amount of fat then we would expect β1 ≈ 1. A 95% confidence interval for β1 is given by

  [1] 0.5325632 0.9539095

Since the value of 1 is not in this interval, the two techniques do not appear to measure the same amount of fat.

Part (b): We could fit a linear model with y = BOD and x = HW, which is the "inverse" of the model above. We get that this model is given by

  BOD = 0.581739 + 0.049727 HW.

Exercise 12.87

From the given description and the data we compute that T = 0.2378182, while when H0 is true T is given by a t-distribution with n1 + n2 − 4 degrees of freedom. Since a 95% confidence interval would constrain T to the values such that |t| ≤ 2.048407 we see that we cannot reject H0 and therefore conclude that the two slopes are equal.

Nonlinear and Multiple Regression Problem Solutions

Note all R scripts for this chapter (if they exist) are denoted as ex13 NN.R where NN is the section number.

Exercise 13.1

Part (a): Recall the equation for the variance of the ith residual given in the book

  V(Y_i − Ŷ_i) = σ² ( 1 − 1/n − (x_i − x̄)²/Sxx ).    (31)

Using this on the given data gives standard deviations for each residual given by

  [1] 6.324555 8.366600 8.944272 8.366600 6.324555

Part (b): For this different set of data (where we have replaced the point x5 = 25 with x5 = 50) we get

  [1] 7.874008 8.485281 8.831761 8.944272 2.828427

Notice that in general all standard deviations (except the last) are larger than they were before.

Part (c): Since the last point has a smaller standard deviation in the second case, the least squares fit line must have moved to be "closer" to this data point. Thus the deviation about the estimated line at this point must be less than previously.

Exercise 13.2

The standardized residual plot for this question would plot the pairs (x_i, e*_i). When we do that we get a plot where all e*_i values are bounded between −2 and +2.
Thus there is nothing alarming (in the sense that the model was not fitting the data) to be seen in this plot.

Exercise 13.3

Part (a): A plot of the raw residuals as a function of x shows samples above and below the horizontal line y = 0. This plot does not show any atypical behavior of the model.

Part (b): Computing the standardized residuals and the ratio e_i/s gives values that are very close.

Part (c): The plot of the standardized residuals as a function of x looks much the same as the plot of the residuals as a function of x.

Exercise 13.4

Part (a): I don't see much pattern in the plot of the residuals as a function of x. Thus this plot does not contradict the assumed linear fit.

Part (b): The standardized residuals when plotted seem to have values bounded by O(1.5), which does not contradict the assumed linear model.

Exercise 13.5

Part (a): The fact that the R² is so large indicates that the linear model is quite a good one for this data.

Part (b): A plot of the residuals as a function of Time indicates that there might be some type of nonlinear relationship of the form (Time − 3)², and thus we might want to add this quadratic predictor. The location of the "peak" of this quadratic term (at 3) was specified "by eye". If one wanted to not have to specify that free parameter (the peak of the quadratic) one could simply add the term Time² to the existing regression, or use the mean or the median of the x values for the centering location.

Exercise 13.6

Part (a): The R² = 0.9678 being so large indicates that a linear model fits quite well to this data.

Part (b): A plot of the residuals as a function of x shows that a model with a quadratic term might be more appropriate for this data (rather than just a linear model).

Part (c): A plot of the standardized residuals as a function of x does not show any significant deviation from the expected behavior, i.e. the values are not too large (bounded by ±2). This plot still indicates that a quadratic model should be considered.
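The standardized residuals used throughout these exercises combine the raw residuals with the per-point variance of Equation 31. A sketch of the computation (Python rather than R; R's rstandard computes the equivalent quantity):

```python
import math

def standardized_residuals(x, y):
    """e_i* = e_i / (s * sqrt(1 - 1/n - (x_i - xbar)^2 / Sxx)),
    the quantity R's rstandard() returns for a simple linear fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(ei ** 2 for ei in e) / (n - 2))
    return [ei / (s * math.sqrt(1 - 1 / n - (xi - xbar) ** 2 / Sxx))
            for xi, ei in zip(x, e)]
```

On well-behaved data (for example the small synthetic set x = 1..5, y ≈ 2x) all the standardized residuals stay well inside the ±2 band discussed above.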
Figure 12: Scatter plots (first column) and standardized residual plots (second column) for the four data sets of Exercise 13.9.

Exercise 13.7

Part (a): A scatter plot of the data does not look that linear. In fact, it seems to have an increasing component and then a decreasing component, indicating that a quadratic model may be more appropriate.

Part (b): A plot of the standardized residuals as a function of x doesn't show any very large values for the standardized residuals. The residual plot still indicates that a quadratic term might be needed in the regression.

Exercise 13.8

A plot of the standardized residuals as a function of HR shows that there are two values of HR that have relatively large values of the standardized residual. These standardized residuals have values near ±2, and while this is not large in an absolute measure they are the largest found in the data set. Thus if one has to look for outliers these would be the points to consider.

Exercise 13.9

In Figure 12 (in the first column) we present scatter plots of each of the four data sets given in this problem. From this, we can see that even though each data set has the "same" second order statistics the scatter plots look quite different. In the second column we present residual plots of this data.¹ The residual plots give an indication as to how well the linear fit is performing. Starting at the first row we see that the data has a linear form and the residual plot shows no interesting behavior (i.e. it looks like the expected plot when a linear model is a decent approximation to the data).
In the second row our scatter plot has a curvature to it that will not be captured by a linear model. The residual plot shows this in that it also has a quadratic (curved) shape. The third row shows a scatter plot that seems to look very linear but that has an extreme outlier near the right end of the x domain. The standardized residual plot shows this and we see a relatively large (order of 3) value for this same sample. This gives an indication that this point is an outlier. The fourth plot has almost all of the data at a single point x = 8 with a single point at the value x = 19. The R implementation of the standardized residuals in rstandard gives NaN for this data point. To alleviate this we compute the standardized residuals "by hand" using the formula in the book. In the plot that results we see that this point is separated from the main body of the data just as it is in the scatter plot.

Exercise 13.10

Part (a): Our predictions are given by ŷ_i = β̂0 + β̂1 x_i with β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1 x̄. Now consider the residual e_i given by

  e_i = y_i − (β̂0 + β̂1 x_i).

We have that the sum of the e_i is given by

  Σ e_i = Σ y_i − n β̂0 − β̂1 Σ x_i = n ȳ − n (ȳ − β̂1 x̄) − β̂1 n x̄ = 0,

as we were to show.

Part (b): I believe that because the error terms that produce Y_i from β0 + β1 x_i are independent between Y_i and Y_j (under the assumptions of simple linear regression) the residuals would be also, but I'm not sure how to prove this. If anyone knows how to prove this or can supply any more information on this please contact me.

Part (c): Consider the given sum. We have from the definition of e_i that

  Σ x_i e_i = Σ (x_i y_i − x_i β̂0 − x_i² β̂1)
           = Σ x_i y_i − β̂0 n x̄ − β̂1 Σ x_i²
           = Σ x_i y_i − n x̄ (ȳ − β̂1 x̄) − β̂1 Σ x_i².

Next using the definitions of Sxy and Sxx we can show

  Sxy = Σ x_i y_i − n x̄ ȳ   and   Sxx = Σ x_i² − n x̄².

Using these expressions in what we have derived for Σ x_i e_i gives

  Σ x_i e_i = Σ x_i y_i − n x̄ ȳ + n x̄² β̂1 − β̂1 Σ x_i²
           = Sxy + n x̄ ȳ − n x̄ ȳ + β̂1 ( n x̄² − Σ x_i² )
           = Sxy − β̂1 Sxx = Sxy − Sxy = 0,

as we were to show.

Part (d): It is not true that Σ e*_i = 0. We can see this by actually computing the standardized residuals e*_i for a linear regression example and showing that Σ e*_i ≠ 0. This is done in the R code for Ex. 13.3 where we get Σ e*_i = 0.2698802 ≠ 0.

Exercise 13.11

Part (a): Using the expression from the previous chapter that relates Ŷ_i to the Y_j we have that

  Ŷ_i = Σ_{j=1}^n d_j Y_j   with   d_j = 1/n + (x_i − x̄)(x_j − x̄)/Sxx.

Using this we have the ith residual given by

  Y_i − Ŷ_i = (1 − d_i) Y_i − Σ_{j ≠ i} d_j Y_j.

As the Y_i and Y_j are independent, using the rules of variances we have that

  Var(Y_i − Ŷ_i) = (1 − d_i)² σ² + Σ_{j ≠ i} d_j² σ²
                = σ² ( 1 − 2 d_i + d_i² + Σ_{j ≠ i} d_j² )
                = σ² ( 1 − 2 d_i + Σ_{j=1}^n d_j² ).

Using the result from Exercise 12.55 we have that

  Σ_{j=1}^n d_j² = 1/n + (x_i − x̄)²/Sxx.

Thus the variance of the residual is given by

  Var(Y_i − Ŷ_i) = σ² ( 1 − 2 ( 1/n + (x_i − x̄)²/Sxx ) + 1/n + (x_i − x̄)²/Sxx )
                = σ² ( 1 − 1/n − (x_i − x̄)²/Sxx ),

the expression we were to show.

Part (b): If Ŷ_i and Y_i − Ŷ_i are independent then by taking the variance of both sides of the identity Y_i = Ŷ_i + (Y_i − Ŷ_i) we get σ² = Var(Ŷ_i) + Var(Y_i − Ŷ_i), or

  Var(Y_i − Ŷ_i) = σ² − Var(Ŷ_i).

Using Equation 27 in the above we get

  Var(Y_i − Ŷ_i) = σ² − σ² ( 1/n + (x_i − x̄)²/Sxx ),

which when we simplify is the same as before.

Part (c): From Equation 27 we see that as x moves away from x̄, Var(Ŷ_i) increases, while from the equation above we see that under the same condition Var(Y_i − Ŷ_i) decreases.

¹ For this problem we assume a residual plot means a plot with the fitted values ŷ on the x-axis and the standardized residuals of the linear fit on the y-axis. Another interpretation of a residual plot could be to plot the independent variable on the x-axis and the standardized residuals from the model fit on the y-axis.
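The key identity in this derivation, Σ_j d_j² = 1/n + (x_i − x̄)²/Sxx (the cross term sums to zero because Σ_j (x_j − x̄) = 0), can be checked numerically. A sketch in Python with made-up design points:

```python
# Numerical check of the identity sum_j d_j^2 = 1/n + (x_i - xbar)^2 / Sxx
# for d_j = 1/n + (x_i - xbar)(x_j - xbar)/Sxx, at hypothetical design points.
x = [1.0, 2.0, 4.0, 7.0]  # made-up x values
n = len(x)
xbar = sum(x) / n
Sxx = sum((xj - xbar) ** 2 for xj in x)
i = 2  # check the identity at the third design point
d = [1 / n + (x[i] - xbar) * (xj - xbar) / Sxx for xj in x]
lhs = sum(dj ** 2 for dj in d)
rhs = 1 / n + (x[i] - xbar) ** 2 / Sxx
assert abs(lhs - rhs) < 1e-12
```

The same check passes at any index i and for any choice of design points, as the algebra above guarantees.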
Exercise 13.12

For this exercise we will use properties derived in Exercise 13.10 above.

Part (a): Note that we don't have Σ e_i = 0 as we should for residuals. Thus the given values cannot be residuals.

Part (b): Note that we don't have Σ x_i e_i = 0 as we should for residuals. Thus the given values cannot be residuals.

Exercise 13.13

A similar expression for the residuals would be

  Z = ( y_i − ŷ_i − 0 ) / ( σ ( 1 − 1/n − (x_i − x̄)²/Sxx )^{1/2} ),

which would have a standard normal distribution. If we estimate σ with s = √(SSE/(n − 2)) then the above expression would have a t-distribution with n − 2 degrees of freedom.

The requested probability is given by 2 * pt(-2.5, 25-2) and equals 0.01999412.

Exercise 13.14

Part (a): Here we follow the prescription given in the problem and compute SSE = 7241.013, SSPE = 4361, and SSLF = 2880.013. This gives an F statistic of f = 3.30201. If we compute the critical value of the F-distribution with the given degrees of freedom (at the 5% level) we get a critical value of 4.102821. As our statistic is not larger than this we cannot conclude that the results are significant, cannot reject the H0 hypothesis, and have no evidence to conclude that the true relationship is nonlinear. Note that if we instead test at the 10% level we get a critical value of 2.924466, which is less than our statistic, indicating we can reject the hypothesis H0.

To make using this analysis easier the test of H0 vs. Ha was written in a function H0 linear model.R. This function computes the F statistic for our test and the probability of receiving a value of F that large or larger by chance. For this data we get that probability to be 0.07923804. Since this is larger than 5% and less than 10% we cannot reject H0 at the 5% level but we can at the 10% level, in agreement with the statement earlier.

Part (b): A scatter plot of this data does look nonlinear in that the response y seems to decrease as x increases.
This is in contradiction to the above test at a significance of 5% but not at 10%.

Notes on regressions with transformed variables

If our probabilistic model is $Y = \alpha e^{\beta x}\epsilon$ then
$$E[Y] = \alpha e^{\beta x} E[\epsilon].$$
If $\epsilon$ is log-normal, i.e. $\log(\epsilon)\sim N(\mu, \sigma^2)$, then it can be shown that $E[\epsilon] = e^{\mu + \sigma^2/2}$. This latter expression will be near one if $\mu = 0$ and $\sigma^2 \ll 1$, and we can conclude that in such cases $E[Y] \approx \alpha e^{\beta x}$.

Exercise 13.15

Part (a): A scatter plot shows what looks to be exponential decay.

Part (b): A scatter plot of the logs of both x and y looks much more linear.

Part (c): We would have $\log(Y) = \beta_0 + \beta_1\log(X) + \epsilon'$, or $Y = \alpha x^{\beta}\epsilon$, a multiplicative power law.

Part (d): Our estimated model is $\log(Y) = 4.638 - 1.049\log(X) + \epsilon'$ with $\sigma = 0.1449077$. If we want a prediction of moisture content when x = 20 we have log(20) = 2.995732, and the prediction of $\hat{y}'$ from our linear model gives 1.495298 with a 95% prediction interval of (1.119196, 1.8714). If we transform back to the original coordinates we get a mean value for y of 4.460667 and a prediction interval for Y of (3.062392, 6.497386).

Part (e): Given the small amount of data, the residual plots look good.

Exercise 13.16

The plot of Load vs. log(Time) looks like it could be taken as linear. The estimate of $\beta_1$ is -0.4932, which has a p-value of 0.000253, indicating that it is estimated well with the given data. When the load is 80 I get a center of 2.688361 and a 95% prediction interval of (-2.125427, 7.50215) for the linear model, which become 14.70756 and (0.119382, 1811.933) when we transform back to the original variables. Notice that this last prediction interval is so wide that it is unlikely to be of any practical utility (it covers almost the range of all the data).

Exercise 13.17

Part (a): If we want to assume a multiplicative power law model $Y = \alpha x^{\beta}\epsilon$ then to linearize we will need to take logarithms of both the original variables x and y.
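Linearizing a power law by taking logs, as in Part (a), can be sketched numerically. Here numpy.polyfit stands in for R's lm, and the data are synthetic, chosen to lie exactly on $y = 3x^2$ so the fit recovers the true parameters:

```python
import numpy as np

# Synthetic data lying exactly on the power law y = 3 * x^2 (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x ** 2.0

# Fit log(y) = log(alpha) + beta * log(x), then undo the log transform.
beta, log_alpha = np.polyfit(np.log(x), np.log(y), 1)
alpha = np.exp(log_alpha)
print(round(alpha, 6), round(beta, 6))  # recovers alpha = 3, beta = 2
```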
Part (b): The model parameters for the transformed variables are well estimated and the variance explained by the model is large ($R^2 \approx 0.9596$), indicating that the linear model for the transformed variables is a good one. We can next consider some diagnostic plots of the transformed linear model to see if there are any ways it could be improved. A plot of the fitted values on the x-axis and the standardized residuals on the y-axis does not indicate any model difficulties.

Part (c): For the one-sided test asked for here the T statistic we compute is given by
$$T = \frac{\hat\beta - \beta_0}{s_{\hat\beta}},$$
where $\beta_0 = \frac{4}{3}$. We reject H0 if the t-statistic for the test is less than the threshold $-t_{\alpha,n-2}$ of the t-distribution. Here we have n = 13 data points and I compute our t-statistic to be t = -1.02918 and $-t_{\alpha,n-2} = -1.795885$. Since our t-statistic is not less than this number we cannot reject H0 in favor of Ha.

Part (d): The given statement will be true when $y(5) = 2y(2.5)$. Since the model fit is a power law, the above expression is equivalent to
$$\alpha 5^{\beta} = 2\alpha(2.5)^{\beta} \quad\text{or}\quad 2^{\beta} = 2 \quad\text{or}\quad \beta = 1,$$
when we solve for $\beta$. Thus the hypothesis test we want to do is H0: $\beta = 1$ vs. Ha: $\beta \ne 1$. The t-statistic for this test is 3.271051 and the critical value for a two-sided test is $t_{\alpha/2,n-2} = 2.200985$. Since our t-statistic is larger than the critical value we reject H0 in favor of Ha.

Exercise 13.18

In plotting the raw data we see that a direct linear model would not fit the data very well. The range of the variable Cycfail is quite large, and thus a better model will result if logarithms are applied to that variable. It is more difficult to decide whether or not to take a logarithm of the variable Strampl, since both scatter plots look very much the same qualitatively. One could fit linear models to both transforms and compare the $R^2$ between the two, selecting the model with the larger $R^2$.
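The one-sided rejection rule used in Exercise 13.17 Part (c) can be sketched as follows (scipy standing in for R's t tables; the numbers are the ones quoted above, and the helper function name is my own):

```python
from scipy.stats import t

def reject_lower_tail(t_stat, n, alpha=0.05):
    """Reject H0 in favor of Ha: beta < beta0 when t < -t_{alpha, n-2}."""
    threshold = -t.ppf(1 - alpha, df=n - 2)
    return t_stat < threshold

# With n = 13 the threshold is -t_{0.05, 11} = -1.795885; the observed
# t-statistic -1.02918 is not below it, so H0 is not rejected.
print(reject_lower_tail(-1.02918, n=13))  # False
```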
The $R^2$ of the model
$$\text{Strampl} = \beta_0 + \beta_1\log(\text{Cycfail})$$
is 0.4966854, while that of the model
$$\log(\text{Strampl}) = \beta_0 + \beta_1\log(\text{Cycfail})$$
is 0.468891. Since the two models give similar values for $R^2$ it is not clear which is the better choice. Since it is somewhat simpler (and has a larger $R^2$) we will consider the first model. For a value of Cycfail of 5000 we get a mean value for Strampl of 0.008802792 and a 95% prediction interval of (0.003023201, 0.01458238).

Exercise 13.19

Part (a): A scatter plot shows an exponential or power law decay of Lifetime as a function of Temp.

Part (b): If we think that the relationship between the predictor 1/Temp and the response log(Lifetime) is linear then we would have
$$\log(\text{Lifetime}) = \beta_0 + \frac{\beta_1}{\text{Temp}} + \epsilon'.$$
Solving for Lifetime we get
$$\text{Lifetime} = e^{\beta_0} e^{\beta_1/\text{Temp}}\,\epsilon = \alpha e^{\beta_1/\text{Temp}}\,\epsilon.$$
A scatter plot of the data transformed in the above manner looks linear.

Part (c): The predicted value for Lifetime is 875.5128.

Part (d): When we run the R code H0 linear model.R on the suggested data we find an F statistic of 0.3190668 and a probability of getting a value this large or larger (under the H0 hypothesis) of 0.5805192. Thus we don't have evidence to reject H0, and there is no evidence that we need a nonlinear model to fit this data.

Exercise 13.20

Under the suggested transformations we get scatter plots that look more or less linear. One could fit a linear model to each transformation and compute the $R^2$. The transformation that resulted in the largest value of $R^2$ could be declared the best. This is a model selection problem and there may be better methods of selecting the model to use. "By eye" the best model to use looked to be one of
$$y = \beta_0 + \beta_1\left(\frac{1}{x}\right) \quad\text{or}\quad \log(y) = \beta_0 + \beta_1\left(\frac{1}{x}\right).$$

Exercise 13.21

For this problem I chose to model
$$y = \beta_0 + \beta_1\left(\frac{10^4}{x}\right),$$
where I found $\beta_0 = 18.139488$ and $\beta_1 = -0.148517$. The predicted value of y when x = 500 is 15.16915.

Exercise 13.22

Part (a): Yes; this is the model $\frac{1}{y} = \alpha + \beta x$.

Part (b): Yes; this is the model $\log\left(\frac{1}{y} - 1\right) = \alpha + \beta x$.

Part (c): Yes; this is the model $\log(\log(y)) = \alpha + \beta x$.

Part (d): No, unless $\lambda$ is given.

Exercise 13.23

Under both of these models, if $\epsilon$ is independent of x then we can calculate the variance of the two suggested models. We find
$$\mathrm{Var}(Y) = \mathrm{Var}\left(\alpha e^{\beta x}\epsilon\right) = \alpha^2 e^{2\beta x}\sigma^2$$
$$\mathrm{Var}(Y) = \mathrm{Var}\left(\alpha x^{\beta}\epsilon\right) = \alpha^2 x^{2\beta}\sigma^2,$$
both of which are values that depend on x. Sometimes if you find that simple linear regression gives you a variance that depends on x, you can try to find a transformation (like the ones discussed in this section) that removes this dependence.

Exercise 13.24 (does age impact kyphosis)

Looking at the MINITAB output we see that the coefficient of age in this logistic regression is 0.004296, which has a Z value of 0.73 and a p-value of 0.463. Thus we have a 46.3% chance of getting an estimated coefficient of age in this model this large or larger when it is in fact zero. We conclude that age does not significantly impact kyphosis.

Exercise 13.25

In this case (in contrast to the previous exercise) we see that age does influence the regression results, in that the p-value for the age coefficient is small (0.007), indicating that there is less than a 1% chance that we got a coefficient this large (or larger) by chance.

Exercise 13.26

Part (a): A scatter plot of the data appears consistent with a quadratic regression model.

Part (b): This is given by $R^2$, which from the MINITAB output is 0.931.

Part (c): The full model has a p-value of 0.016, which is significant at the 5% level.

Part (d): From the output we have the center of the prediction interval at 491.10 and $\sigma_{\hat{Y}} = 6.52$. We need to compute the critical value $t_{\alpha/2,n-(k+1)}$ for $\alpha = 0.01$ to compute the 99% confidence interval, where we find (453.0173, 529.1827).

Part (e): We would keep the quadratic term at the significance level of 5%. The p-value for the quadratic term is 0.032, thus we would not keep it at the 2.5% level.
Exercise 13.27

Part (a): A scatter plot looks very much like a quadratic fit will be appropriate.

Part (b): Using the given regression we have $\hat{y} = 52.8764$, which gives a residual of 0.1236.

Part (c): This is the $R^2$, which we calculate to be 0.8947476.

Part (d): The plot of the standardized residuals as a function of x shows two points with values close to 2. A normal probability plot also shows two points that deviate from the line y = x. These two points could be considered in further depth if needed. Given the small data set size I don't think these values are too worrisome.

Part (e): For a confidence interval we get (48.53212, 57.22068).

Part (f): For a prediction interval we get (42.8511, 62.9017).

Exercise 13.28

Part (a): This is given by 39.4113.

Part (b): We would predict 24.9303.

Part (c): Using the formula given for SSE we get SSE = 217.8198, $s^2$ = MSE = 72.6066, and s = 8.52095.

Part (d): From the given value of SST we get $R^2 = 0.7793895$.

Part (e): From the value of $s_{\hat\beta_2}$ we can compute $t_{\hat\beta_2} = -7.876106$. The p-value for a t-value this "large" is 0.002132356. Since this is less than 0.01 (the desired significance) we can conclude that our results are significant at the 1% level.

Exercise 13.29

Part (a): For the predicted values and residuals I get
[1] 82.12449 80.76719 79.84026 72.84800 72.14642 43.62904 21.57265
and
[1] -1.1244938 2.2328118 -0.8402602 2.1520000 -2.1464152 -0.6290368 0.4273472
We get SSE = 16.77249 and $s^2 = 5.590829$.

Part (b): We get $R^2 = 0.9948128$. This is a very large value, indicating quite a good fit to the data.

Part (c): We get a t-value for $\beta_2$ of -6.548501; this has a p-value of 0.003619986 and is significant at the 1% level. This indicates that the quadratic term does belong in the model.

Part (d): Using the Bonferroni correction we need to compute each confidence interval at at least the 97.5% level so that our combined confidence level will be at least 95%.
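The Bonferroni logic of Part (d), running each of m intervals at level $1 - \alpha/m$ so the joint coverage is at least $1 - \alpha$, can be sketched as follows (the helper function is hypothetical, not from the manual's R code):

```python
def bonferroni_individual_confidence(joint_confidence, m):
    """Confidence level each of m intervals must carry so that the Bonferroni
    inequality guarantees joint coverage of at least joint_confidence."""
    alpha = 1.0 - joint_confidence
    return 1.0 - alpha / m

# Two intervals with joint 95% coverage each need 97.5% individually.
print(bonferroni_individual_confidence(0.95, 2))  # 0.975
```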
We find the two confidence intervals (for $\beta_1$ and then for $\beta_2$) given by
[1] 0.4970034 3.8799966
[1] -0.005185555 -0.001146845
The guarantee that we are within both of these intervals holds at the 95% level.

Part (e): I get a confidence interval of (69.03543, 76.66057) and a prediction interval of (64.4124, 81.2836).

Exercise 13.30

Note that the data given in the file ex13-30.txt does not seem to match the numbers quoted in this problem.

Part (a): From the output given we see that $R^2 = 0.853$, indicating a good fit to the data.

Part (b): I compute a confidence interval for $\beta_2$ of (-316.02234, 45.14234).

Part (c): I compute a t-value of -1.828972. For the one-sided test suggested here I compute a 5% critical threshold of -2.919986. Since the t-value we compute is not less than this number there is no evidence to reject H0 and accept Ha.

Part (d): I compute this to be (1171.958, 1632.342).

Exercise 13.31

Note that the data given in the file ex13-31.txt does not seem to match the numbers quoted in this problem.

Part (a): Using the numbers given in this problem statement and the R command lm we get the quadratic model
$$Y = 13.6359 + 11.4065x - 1.7155x^2 + \epsilon.$$

Part (b): A residual vs. x plot does not show any interesting features. A scatter plot of the raw data shows that the point with the largest value of x might have large influence.

Part (c): From the output of the summary command we can read that $\sigma = 1.428435$, so that $s^2 = 2.040426$, and $R^2 = 0.9473$. Such a large value for $R^2$ indicates that the quadratic fit is a good one.

Part (d): We are given the values of $\mathrm{Var}(\hat{Y}_j)$ (when we square the reported standard deviations) and will use the independence of $\hat{Y}_j$ and $Y_j - \hat{Y}_j$ to write
$$\sigma^2 = \mathrm{Var}\left(\hat{Y}_j\right) + \mathrm{Var}\left(Y_j - \hat{Y}_j\right).$$
If we replace $\sigma^2 \approx s^2$ we can solve for $\mathrm{Var}(Y_j - \hat{Y}_j)$ in the above to get
$$\mathrm{Var}\left(Y_j - \hat{Y}_j\right) = s^2 - \mathrm{Var}\left(\hat{Y}_j\right).$$
This gives
[1] 1.12840095 1.12840095 1.53348195 1.43669695 1.43669695 1.43669695 0.06077695
for the variances of each residual.
To get the standard deviation of each residual we take the square root of the above numbers to get
[1] 1.0622622 1.0622622 1.2383384 1.1986229 1.1986229 1.1986229 0.2465298
A plot of the standardized residuals looks much like the plot in Part (b) of this problem. I don't get a huge difference between using the correct standard deviation of each residual and the value estimated by s.

Part (e): I get a prediction interval of (27.29929, 36.32877).

Exercise 13.32

Part (a): The estimated regression function for the "centered" regression is
$$Y = 0.3463 - 1.2933(x - 4.3456) + 2.3964(x - 4.3456)^2 - 2.3968(x - 4.3456)^3 + \epsilon.$$

Part (b): If we expand the above polynomials we can estimate the coefficient of any power of x. In fact, since we want the coefficient of $x^3$, this is just the coefficient of the monomial $(x - 4.3456)^3$, which from the above is -2.3968. To compute the estimate of $\beta_2$ we have to compute the second order term from $(x - 4.3456)^3$ and then add it to the second order term from $(x - 4.3456)^2$. Expanding the cubic term we have
$$(x - \bar{x})^3 = x^3 - 3\bar{x}x^2 + 3\bar{x}^2 x - \bar{x}^3,$$
which gives a coefficient for the quadratic term of $-3\bar{x}$. We need to multiply this by $\beta_3^*$ and then add $\beta_2^*$ to get the full coefficient of $x^2$. We get the value 33.64318.

Part (c): I would predict the value $\hat{y} = 0.1948998$.

Part (d): The t-value for the cubic coefficient is -0.975, which is not that large. The p-value for this number is 0.348951, indicating that with almost 35% chance we can get a result this large or larger for the coefficient $\beta_3^*$ when in fact it is zero.

Exercise 13.33

Part (a): We would compute $\hat{y} = 0.8762501$ and $\hat{y} = 0.8501021$, which are different from what the book reports. I'm not sure why; if anyone sees an error I made please contact me.

Part (b): The estimated regression function for the unstandardized model would be obtained by expanding each of the terms $\left(\frac{x - \bar{x}}{s_x}\right)^p$ for $p \in \{0, 1, 2, 3\}$ to get polynomials in x and then adding these polynomials.
Part (c): The t-value for the cubic term is 2 and the critical value for a t-distribution with $n - (k+1) = 7 - 4 = 3$ degrees of freedom is $t_{\alpha/2,n-(k+1)} = 3.182446$. Notice that our t-value is less than this, indicating that the cubic term is not significant at the 5% level and should be dropped.

Part (d): The values of $R^2$ and MSE for each model would be the same, since the two models are equivalent in their predictive power.

Part (e): If we compute the adjusted $R^2$ for each model we find
$$\text{adjusted } R^2 \text{ (quadratic)} = 0.9811126 \qquad \text{adjusted } R^2 \text{ (cubic)} = 0.9889571.$$
Since the adjusted $R^2$ is larger for the cubic model than for the quadratic model, the increase in $R^2$ resulting from the addition of the cubic term is worth the "cost" of adding this variable. This result is different from what the book reports. I'm not sure why; if anyone sees an error I made please contact me.

Exercise 13.34

Part (a): I would compute $\hat{y} = 0.8726949$.

Part (b): We compute SSE = 0.1176797 for the transformed regression (the one in terms of x'). This gives $R^2 = 0.9192273$.

Part (c): The estimated regression function for the unstandardized model would be obtained by expanding each of the terms $\left(\frac{x - \bar{x}}{s_x}\right)^p$ for $p \in \{0, 1, 2\}$ to get polynomials in x and then adding these polynomials.

Part (d): This would be the same as the estimated standard deviation of $\beta_2^*$, which we see is 0.0319.

Part (e): The t-value for $\beta_2$ is 1.404389, while the $t_{\alpha/2,n-(k+1)}$ (for $\alpha = 0.05$) critical value is 2.228139. Since our t-value is less than this critical value we cannot reject H0, and we conclude that we should take $\beta_2 = 0$. For the second order term the two tests specified are the same and would reach the same conclusions.

Exercise 13.35

Using the R command lm I find the model
$$\log(Y) = 0.2826 - 0.008509x - 0.000003449x^2;$$
we note that the $O(x^2)$ coefficient is not significant, however. This result is different from what the book reports. I'm not sure why; if anyone sees an error I made please contact me.
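The adjusted $R^2$ comparison in Exercise 13.33 Part (e) uses the standard penalty for added terms. A quick sketch of that formula (the numbers below are illustrative, not the exercise's):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - (k + 1))."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - (k + 1))

# Example: with n = 7 points, raising R^2 from 0.99 (k = 2 predictors) to
# 0.995 (k = 3) must overcome the loss of one error degree of freedom.
print(round(adjusted_r2(0.99, 7, 2), 6))   # 0.985
print(round(adjusted_r2(0.995, 7, 3), 6))  # 0.99
```

Here the cubic term "pays for itself", just as in the exercise, because the adjusted value still increases.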
Exercise 13.36

Part (a): The value of $\beta_1$ represents the increase in maximum oxygen uptake for a unit increase in weight (i.e. increasing weight by one kilogram), holding all other predictors in the model fixed. Here the estimated value $\beta_1 = 0.1 > 0$ indicates that as weight increases, maximum oxygen uptake should increase. The value of $\beta_3$ represents the change in maximum oxygen uptake when the time necessary to walk 1 mile increases by one unit (one minute), again holding all other predictors in the model fixed. Here the estimated value $\beta_3 = -0.13$ means that for every additional minute needed to walk one mile, the maximum oxygen uptake decreases by 0.13.

Part (b): I compute 1.8.

Part (c): The mean value of our distribution when the x values are as specified is 1.8 and our variance is $\sigma^2 = 0.4^2 = 0.16$. Thus the probability is 0.9544997.

Exercise 13.37

Part (a): I compute 4.9.

Part (b): The value of $\beta_1$ represents the increase in total daily travel time when the distance traveled in miles increases by one mile (holding all other predictors in the model constant). The same type of statement can be made for $\beta_2$.

Part (c): I compute this to be 0.9860966.

Exercise 13.38

Part (a): I compute this to be 143.5.

Part (b): When the viscosity is 30 we have $x_1 = 30$ and our expression for the mean becomes
$$Y = 125 + 7.75(30) + 0.095x_2 - 0.009(30)x_2 = 357.5 - 0.175x_2.$$
Thus the change in Y associated with a unit increase in $x_2$ is -0.175.

Exercise 13.39

Part (a): I compute this mean to be 77.3.

Part (b): I compute this mean to be 40.4.

Part (c): The numerical value of $\beta_3$ represents the increase in sales (1000s of dollars) that the fast food outlet gets from having a drive-up window.

Exercise 13.40

Part (a): I compute the expected error percentage to be 1.96.

Part (b): I compute the expected error percentage to be 1.402.

Part (c): This would be the coefficient of the $x_4$ variable, or the value $\beta_4 = -0.0006$.
If we increase $x_4$ by 100 (rather than 1) we would get $\Delta Y = 100\beta_4 = -0.06$.

Part (d): The answers in Part (c) do not depend on the other x values since the model is purely linear. They would depend on the other x values if there were interaction terms (like $x_1x_4$ or $x_2x_4$, etc.).

Part (e): I get $R^2 = 0.4897959$. For the model utility test I get a test statistic f of 6. The critical value is $F_{\alpha,k,n-(k+1)} = 2.75871$. Since our value of f is larger than this value we can reject the H0 hypothesis (the model is not predictive) in favor of Ha (the model is useful).

Exercise 13.41

For the model utility test I get a test statistic f of 24.41176. The critical value is $F_{\alpha,k,n-(k+1)} = 2.420523$. Since our value of f is larger than this value we can reject the H0 hypothesis (the model is not predictive) in favor of Ha (the model is useful).

Exercise 13.42

Part (a): From the MINITAB output we have that the f value is 319.31, which has a p-value of zero to three digits. Thus this model is significant to an accuracy of at least 0.05% (otherwise the given p-value would have been written as 0.001).

Part (b): I get a 95% confidence interval for $\beta_2$ of (2.117534, 3.882466). Since this interval does not include 0 we can be confident (with an error of less than or equal to 5%) that $\beta_2 \ne 0$.

Part (c): I get a 95% confidence interval for Y at the given values of x of (-15.26158, 203.26158).

Part (d): From the MINITAB output we have MSE $= \hat\sigma^2 = 1.12$. Using this we can estimate a 95% prediction interval to be (-15.28295, 203.28295).

Exercise 13.43

Part (a): I get the value 48.31 and a residual of 3.689972.

Part (b): We cannot conclude this since there is an interaction term $x_1x_2$ in the regression.

Part (c): There appears to be a useful relationship of the type that is suggested (with an interaction term). We can see this from the p-value for the model as a whole.

Part (d): The t-value and p-value for the coefficient $\beta_3$ are given in the SAS output.
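The model utility f statistics in Exercises 13.40(e) and 13.41 can be computed directly from $R^2$. A sketch of that formula follows; the sample size n = 30 and k = 4 predictors are assumed values that reproduce the quoted f = 6 for $R^2 = 0.4897959$, since the text here does not restate them:

```python
def model_utility_f(r2, n, k):
    """F statistic for H0: beta_1 = ... = beta_k = 0,
    f = (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))

# Assumed n and k consistent with the quoted R^2 = 0.4897959 and f = 6.
print(round(model_utility_f(0.4897959, n=30, k=4), 4))  # 6.0
```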
From that we see that this term is significant at the $\alpha = 0.003$ level.

Part (e): I compute a confidence interval of (21.70127, 41.49134).

Exercise 13.44

Part (a): The estimate of $\beta_1$ is the numerical change in the water absorption for wheat flour for a unit change in $x_1$ (flour protein). The interpretation of $\beta_2$ is the same as for $\beta_1$ (but now for a unit change in starch damage).

Part (b): This is the $R^2$, which we find to be 0.98207.

Part (c): The p-value for the model fit is very small (zero to the precision given), thus the model is a useful one.

Part (d): No. The 95% confidence interval for starch damage does not include zero, indicating that the predictor is useful.

Part (e): From the SPSS output we have MSE $= 1.1971 = \hat\sigma^2$, or $\hat\sigma = 1.09412$ from the value reported in the Standard Error output. Using this we can compute the prediction and confidence intervals and find
[1] 21.70127 41.49134
[1] -15.28295 203.28295

Part (f): For the estimated coefficient $\beta_3$ we have a t-value of -2.427524, while the $\alpha = 0.01$ threshold $t_{\alpha/2,n-(k+1)}$ (when we have three predictors) is 2.79694. Since our t-value is not larger than this in magnitude we conclude that we cannot reject the null hypothesis that $\beta_3 = 0$, and we should not include this term in the regression.

Exercise 13.45

Part (a): Given the value of $R^2$ we can compute a model utility test. We find f = 87.59259 and a critical value $F_{\alpha,k,n-(k+1)} = 2.866081$. As f is greater than this critical value the model has utility.

Part (b): I compute 0.9352.

Part (c): I get a prediction interval of (9.095131, 11.086869).

Exercise 13.46

Part (a): From the given numbers I compute $R^2 = 0.835618$ and an f value for the model utility test of f = 22.88756. This has a p-value of 0.000295438. Since this p-value is less than 0.05 there seems to be a useful linear relationship.

Part (b): The t-value for $\beta_2$ is found to be 4.00641, where the $t_{\alpha/2,n-(k+1)}$ critical value is 2.262157.
Since our t-value is larger than this critical value we conclude that the type of repair provides useful information about repair time (when considered with elapsed time since the last service).

Part (c): I get a confidence interval for $\beta_2$ of (0.544207, 1.955793).

Part (d): I get a prediction interval of (2.914196, 6.285804).

Exercise 13.47

Part (a): $\beta_1$ is the change in y (energy content) for a one unit change in $x_1$ (% plastics by weight). $\beta_4$ is the same for the fourth variable.

Part (b): The MINITAB output shows a p-value for the entire model of zero to three digits. This indicates that there is a useful linear relationship between y and at least one of the four predictors.

Part (c): The p-value for % garbage is 0.034, and thus it would stay in the model at a significance of 5% but not at a significance of 1%.

Part (d): I compute a confidence interval of (1487.636, 1518.364).

Part (e): I compute a prediction interval of (1436.37, 1569.63).

Exercise 13.48

Part (a): The f-value for the given regression is 8.405018, which has a p-value of 0.01522663. Note that this is not significant at the 1% level but is significant at the 5% level.

Part (b): I compute a confidence interval of (18.75891, 25.17509) for expected weight loss.

Part (c): I compute a prediction interval of (15.54913, 28.38487) for expected weight loss.

Part (d): We are asked for an F-test for a group of predictors. That is, we have a null hypothesis H0 that
$$\beta_{l+1} = \beta_{l+2} = \cdots = \beta_k = 0,$$
so that the reduced model
$$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_l x_l + \epsilon$$
is correct, vs. the alternative Ha that at least one of $\beta_{l+1}, \beta_{l+2}, \ldots, \beta_k$ is not 0. Given the values of SSE I get an f-value for this test of f = 6.431734. This has a p-value of 0.02959756, indicating that there is a useful relationship between weight loss and at least one of the second-order predictors.

Exercise 13.49

Part (a): I compute $\hat\mu_{Y\cdot 18.9,43} = 96.8303$ and a residual of -5.8303.
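The group F test of Exercise 13.48 Part (d) compares the SSE of the reduced and full models. A generic sketch (the numbers below are illustrative, not the exercise's data):

```python
def partial_f(sse_reduced, sse_full, n, k, l):
    """F statistic for H0: beta_{l+1} = ... = beta_k = 0,
    f = ((SSE_reduced - SSE_full) / (k - l)) / (SSE_full / (n - (k + 1)))."""
    return ((sse_reduced - sse_full) / (k - l)) / (sse_full / (n - (k + 1)))

# Illustrative: dropping 3 of 5 predictors raises SSE from 10 to 20
# with n = 16 observations, giving f = (10/3) / (10/10) = 3.333...
print(round(partial_f(20.0, 10.0, n=16, k=5, l=2), 4))  # 3.3333
```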
Part (b): This is a model utility test for the model as a whole. I compute an f value of 14.89655, which has a p-value of 0.001395391. Thus this model is "useful" at the 1% level.

Part (c): I compute a confidence interval of (78.28061, 115.37999).

Part (d): I compute a prediction interval of (38.49285, 155.16775).

Part (e): To do this problem we first have to solve for the estimated standard deviation of $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$ at the given point in question. I compute this to be 25.57073. Once we have this we can compute the 90% prediction interval as we normally do. I compute (46.88197, 140.63003).

Part (f): For this we look at the t-value for $\beta_1$. From the MINITAB output we see this is -1.36, which has a p-value of 0.208; thus we should probably drop this predictor from the analysis.

Part (g): The F-statistic for the test of H0: $Y = \beta_0 + \beta_2 x_2 + \epsilon$ vs. Ha: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$ is 1.823276, while if we square the t-value for $\beta_1$ in the MINITAB output we get 1.8496. These two numbers are the same to the first decimal point. I'm not sure why they are not closer in value. If anyone knows the answer to this please contact me.

Exercise 13.50

Part (a): I get a t-value for $\beta_5$ of 0.5925532 with a p-value of 0.5751163, indicating that we should not keep this term (when the others are included in the model).

Part (b): Each t-value is computed with all other variables in the model. Thus each can be small but the overall model can still be a good one.

Part (c): I get an f-value for this test of 1.338129, which has a p-value of 0.3472839, indicating that we should not keep the quadratic terms in the model.

Exercise 13.51

Part (a): The plots suggested don't indicate that the model should be changed.

Part (b): I get an f-value for the model utility test of 5.039004 and a p-value of 0.02209509. This indicates that the model is valid at the 5% level.
Part (d): I get an f-value for this test of 3.452831 and a p-value of 0.07151767; thus there is at least a 7% chance that the reduction in SSE when we use the full model is due to chance. At the 5% level we should drop these quadratic terms.

Exercise 13.52

Part (a): We would want to do a model comparison test between the full quadratic model with $k = \binom{3}{2} + 3 = 6$ quadratic and cross-product predictors added (I'm assuming all cross product terms in addition to all quadratic terms) and the linear model with $l = 3$ predictors. I get an f-value of 124.4872 and a p-value of $8.26697\times 10^{-9}$, indicating that we should keep the quadratic terms in the model.

Part (b): I get a prediction interval of (0.5618536, 0.7696064).

Exercise 13.53

The model as a whole is good (low p-value). The predictor $x_2$ could perhaps be dropped: it is not significant at the 1% level but is significant at the 5% level.

Exercise 13.54

Part (a): We would map the numerical values given to the coded variables and then evaluate the expression for y given in this problem.

Part (b): For each variable one would need to produce a mapping from the coded variables to the uncoded variables. For $x_1$ such a mapping could be
$$\text{uncoded value} = \frac{0.3 - 0.1}{0 - (-2)}(\text{coded value} - 0) + 0.3 = 0.1\times\text{coded value} + 0.3.$$
If we solve for the coded value in terms of the uncoded value, we can replace each coded value in the regression with an expression in terms of its uncoded value.

Part (c): I get an f-value of 2.281739 with a p-value of 0.06824885. This means that at the 5% level we cannot conclude that the quadratic and cross-product terms add significant improvement to the linear model.

Part (d): I get a 99% confidence interval of (84.34028, 84.76932); since this does not include the value 85.0 we can conclude that the given information does contradict this belief (with a chance of being wrong of 1%).

Exercise 13.55

Part (a): We would take the logarithm of the multiplicative power model, which gives
$$\log(Q) = \log(\alpha) + \beta\log(a) + \gamma\log(b) + \log(\epsilon).$$
Then if we find $\beta_0$, $\beta_1$, and $\beta_2$ using linear regression we would have
$$\alpha = e^{\beta_0}, \qquad \beta = \beta_1, \qquad \gamma = \beta_2.$$
We can use the R command lm to estimate the above parameters and find
$$\alpha = 4.783400, \qquad \beta = 0.9450026, \qquad \gamma = 0.1815470.$$
We can also use this linear model to make predictions. For the requested inputs we predict a value of Q = 18.26608.

Part (b): Again taking logarithms of the suggested model we would have
$$\log(Q) = \log(\alpha) + \beta a + \gamma b + \log(\epsilon).$$
Again we could fit the above model with least squares.

Part (c): We can get this by transforming the confidence interval computed in terms of log(Q) back into the variable of interest Q by taking the exponential of each endpoint, which gives (1.242344, 5.783448).

Exercise 13.56

Part (a): I compute an f-statistic (given the value of $R^2$) of 9.321212; this has a p-value of 0.0004455507, indicating that there is a linear relationship between y and at least one of the predictors.

Part (b): I compute $R_a^2 = 0.6865$ for the original model (the one with $x_2$) and $R_a^2 = 0.7074$ for the model when we drop $x_2$.

Part (c): This would be a model utility test where the null hypothesis is that the coefficients of $x_1$, $x_2$, and $x_4$ are all zero. For this test I get an f-value of 2.323232 with a p-value of 0.1193615. Since the p-value is so large we cannot reject H0 and must conclude that the coefficients stated are "in fact" zero.

Part (d): Using the transformed equation I predict $\hat{y} = 0.5386396$.

Part (e): I compute a confidence interval for $\beta_3$ of (-0.03330515, -0.01389485).

Part (f): From the transformed equation we can write
$$y = \beta_0 + \beta_3\left(\frac{x_3 - \bar{x}_3}{s_3}\right) + \beta_5\left(\frac{x_5 - \bar{x}_5}{s_5}\right) = \left(\beta_0 - \frac{\beta_3\bar{x}_3}{s_3} - \frac{\beta_5\bar{x}_5}{s_5}\right) + \frac{\beta_3}{s_3}x_3 + \frac{\beta_5}{s_5}x_5,$$
which shows how to transform coefficients from the standardized model to the unstandardized model. The coefficient of $x_3$ in the unstandardized model is then
$$\frac{\beta_3}{s_3} = -\frac{0.0236}{5.4447} = -0.00433449.$$
We would compute the value of $s_{\hat\beta_3}$ using the normal rules for how variables transform under multiplication, i.e.
$$s_{\hat\beta_3} = \frac{1}{s_3}\,s_{\hat\beta_3'} = \frac{1}{5.4447}(0.0046) = 0.0008448583.$$

Part (g): I compute this prediction interval to be (0.4901369, 0.5769867).

Exercise 13.57

Part (a): We first convert each SSE into $MSE_k$ using
$$MSE_k = \frac{SSE}{n - k - 1}.$$
We then pick the model with the smallest $MSE_k$ as a function of k. This seems to be when k = 2.

Part (b): No. Forward selection, when considering subsets of two variables, would start with the result of the best single variable, which in this case is $x_4$. Forward selection would then add each other variable to it, keeping the variable that gave the best metric ($R^2$ or MSE) among all pairs that include $x_4$ as one of the two variables. Thus there is no way to test the variable pairing $(x_1, x_2)$, which is the optimal one for this problem if we search over all subsets of size two.

Exercise 13.58

In the first step we dropped the variable $x_3$ since it had the smallest t-value and was below the threshold $t_{\text{out}}$. In the second step we dropped the variable $x_4$ for the same reason. After that no t-value was less than $t_{\text{out}}$ and the procedure stopped with three variables $x_1$, $x_2$, and $x_4$. Note that it looks like $t_{\text{out}} = 2.0$.

Exercise 13.59

It looks like the MINITAB output is presenting the best 3 models over all model subsets of size k for k = 1, 2, 3, 4, 5. From the best Cp result with k = 4 it looks like $x_1$, $x_3$, $x_4$, and $x_5$ are important. The best Cp result with k = 3 indicates that $x_1$, $x_3$, and $x_5$ are important. These are the two models I would investigate more fully.

Exercise 13.60

The first "block" of outputs (from the first "Step" row down to the second "Step" row) represents the backward elimination method, and the second block of outputs represents the forward selection method. In backward elimination we first remove the feature sumrfib, then the feature splitabs, then the feature sprngfib, and then stop. At each step (it looks like) we are removing the feature with the smallest t-value under 2.0.
In the forward selection method we first select the feature %sprwood, then the feature sumltabs, and then stop. If we assume the minimum t-value for inclusion in the forward selection method is the same as in the backward elimination procedure, no single additional feature (if added) would have a t-value larger than 2.0.

Exercise 13.61

Severe multicollinearity is a problem if $R_i^2$ is larger than 0.9. All $R_i^2$ values given here are smaller than this threshold.

Exercise 13.62

We consider a sample to be unusual if its $h_{ii}$ value is larger than the value $\frac{2(k+1)}{n} = 0.4210526$. From the values given we see that observations 14, 15, and 16 are candidates for unusual observations.

Exercise 13.63

The presence of an observation with a large influence means that this point could adversely affect the estimates of the parameters in the linear regression. That we have points like this casts some doubt on the appropriateness of the previously given model estimates. The observation with a large standardized residual is a point that does not fit the model very well and (if you believe your model) could be an outlier point, i.e. one that is not really generated by the process we are attempting to study. Dropping the outlier point makes sense if it is found that it is not really representative of the system we are trying to study.

Exercise 13.64

Part (a): We would want to compare the given $h_{ii}$ values to $\frac{2(k+1)}{n} = 0.6$ in this problem. The observation with $h_{ii} = 0.604$ appears to be influential.

Part (b): Let's look at the change in the values of $\beta_i$ when we include the second point and when we exclude it (divided by the standard error of $\beta_i$). We find these numbers given by
[1] 0.4544214 0.5235602 0.7248858
The value of $\beta_3$ changes by almost 70% of a standard error. This is a relatively large change. The other values of $\beta$ change quite a bit also. This observation does seem influential.
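The leverage values $h_{ii}$ used in Exercises 13.62 and 13.64 are the diagonal entries of the hat matrix $H = X(X'X)^{-1}X'$. A sketch with illustrative data (the flagging threshold $2(k+1)/n$ is the one used above; the helper function name is my own):

```python
import numpy as np

def leverages(x):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X' for simple regression."""
    X = np.column_stack([np.ones(len(x)), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

# Illustrative predictor values; the last point is far from the rest.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
h = leverages(x)
threshold = 2 * (1 + 1) / len(x)  # 2(k+1)/n with k = 1 predictor

print(round(h.sum(), 6))  # leverages always sum to k + 1 = 2
print(h[-1] > threshold)  # the outlying x value is flagged: True
```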
Part (c): In this case the relative changes in the values of βi are given by

[1] -0.1446507 0.2617801 -0.1712329

These are all relatively small in value and this observation does not seem influential.

Exercise 13.65

Part (a): See Figure 13 for the boxplots for this problem. From this plot we see that when a crack appeared the value of ppv was larger "on average" than when a crack did not appear. To test if the means of the two distributions are different we use the Z-test described in an earlier chapter. This gives a 95% confidence interval for the difference in mean ppv between samples where cracking happens and samples where it does not of (146.6056, 545.4499). Thus we can be "certain" that the ppv for cracked samples is larger than for un-cracked samples. Note this result is different from what the book reports. If anyone sees anything wrong with what I have done please contact me.

Figure 13: Boxplots of ppv by whether a crack appeared (0 = False; 1 = True), for the data from Exercise 13.65.

Part (b): A scatter plot with a linear fit of ppv on the x-axis and Ratio on the y-axis looks like a good model fit. To test if this can be improved we plot the standardized residuals e* as a function of the fitted values ŷ. This plot does not show anything that would make us think the linear fit was not a good one. Note that one of the standardized residuals is -4, indicating that we should probably investigate that point.

Exercise 13.66

Part (a): The slope β̂1 = 0.26 is the estimate of how much flux will change for a unit increase in the value of inverse foil thickness. The coefficient of determination R² = 0.98 indicates that this is a good model fit.

Part (b): I compute this to be 5.712.

Part (c): Yes.

Part (d): Using the MINITAB output for sample 7 we have σ_Ŷ = 0.253 when invthick = 45.0. This gives a 95% confidence interval of (10.68293, 11.92107).
Part (e): Note that the value found for β0 is perhaps not significant.

Exercise 13.67

Part (a): By increasing x3 by one unit we expect y to decrease by 0.0996, holding all other variables constant.

Part (b): By increasing x1 by one unit we expect y to increase by 0.6566, holding all other variables constant.

Part (c): I would have predicted the value of 3.6689, which gives a residual of -0.5189.

Part (d): I compute R² = 0.7060001.

Part (e): We compute an f-statistic of 9.005105, which has a p-value of 0.0006480373; thus we have a useful model.

Exercise 13.68

Part (a): A scatter plot of log(time) versus log(edges) does suggest a linear relationship between the two variables.

Part (b): This would be time = α edge^β ε, or a power law.

Part (c): Using the R function lm I compute the model log(time) = −0.7601 + 0.7984 log(edge) + ε, with σε² = 0.01541125. A point prediction for log(time) is given by 3.793736, which gives a point prediction of time of 44.42206.

Exercise 13.69

Part (a): A plot of Temperature as a function of Pressure does not look linear but seems to take a curved shape.

Part (b): I find that a plot of Temperature as a function of log(Pressure) looks very linear. Using the R function lm I find a model given by Temperature = −297.273 + 108.282 log(Pressure) + ε, with σε² = 72.45414. Note that a plot of the fitted values ŷ on the x-axis and the standardized residuals on the y-axis shows a cluster of points with an almost linear shape, and we see a point with a very large standardized residual. If we ignore these difficulties and use this model anyway, with a measured Pressure of 200 we find a 95% prediction interval for Temperature of (257.8399, 295.0415). We could try to fit a polynomial model to this data. A second order model has the same point with a large standardized residual that the log model did. A cubic model does not have any single point with a sufficiently large standardized residual but the intercept term (i.e.
β0) is no longer significant. Residual plots of the second order model don't seem to indicate the need for a cubic term and thus I would probably stop with a quadratic model.

Exercise 13.70

Part (a): For the model without the interaction term I get R² = 0.394152 and for the model with the interaction term I get R² = 0.6409357.

Part (b): For the reduced model (with only two terms) I find an f-statistic of 1.951737, which has a p-value of 0.2223775, indicating that this model is perhaps spurious (will not hold up out of sample). When we add the interaction term our f-statistic becomes 2.975027 with a p-value of 0.1355349. This second model appears to be much more significant than the first. Note that neither model is significant at the 5% level.

Exercise 13.71

Part (a): The p-value for the fit using all five predictors is 2.18 × 10⁻⁸, indicating that there is "utility" in this model. The R command summary gives

> summary(m)

Call:
lm(formula = pallcont ~ pdconc + niconc + pH + temp + currdens, data = DF)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  25.0031    20.7656   1.204 0.239413
pdconc        2.2979     0.2678   8.580 4.64e-09 ***
niconc       -1.1417     0.2678  -4.263 0.000235 ***
pH            3.0333     2.1425   1.416 0.168710
temp          0.4550     0.2143   2.124 0.043375 *
currdens     -2.8000     1.0713  -2.614 0.014697 *

Residual standard error: 5.248 on 26 degrees of freedom
Multiple R-squared: 0.8017, Adjusted R-squared: 0.7636
F-statistic: 21.03 on 5 and 26 DF, p-value: 2.181e-08

Note that some of the predictors don't appear to be as important. For example pH does not seem to be significant at the 10% level (when the other predictors are included).

Part (b): I'll assume the second order model includes the C(5, 2) = 10 cross-terms and 5 quadratic terms. In that case I'm getting R² = 0.801736 for the model without the quadratic and interaction terms and R² = 0.9196336 for the model with them. This increase is expected and is not by itself sufficient to conclude that the more complex model is better.
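The claim in Part (b), that adding terms always raises the in-sample R², can be seen numerically: the smaller model is a special case of the larger one, so least squares can only lower the SSE. A sketch with random data (the data and variable names here are fabricated for illustration):

```python
import numpy as np

# Adding a predictor can only reduce SSE, and so can only raise R^2,
# because the smaller design is nested inside the larger one.
rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)          # an irrelevant extra predictor
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def sse(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
sst = np.sum((y - y.mean()) ** 2)
r2_small = 1 - sse(X_small, y) / sst
r2_big = 1 - sse(X_big, y) / sst   # always >= r2_small
```

This is exactly why a higher R² alone cannot justify the second order model; a test that accounts for the extra degrees of freedom is needed.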
Part (c): We can perform a model utility test with the null hypothesis that the coefficients of all quadratic terms are zero. The f-statistic for this test is 1.075801 with a p-value of 0.4608437. This indicates that the reduction in SSE due to the quadratic terms is not sufficient to warrant the more complex model.

Part (d): The model fit using all five predictors plus pH² has all predictors important at the 5% level. The least important predictor is temp, which has a p-value of 0.015562.

Exercise 13.72

Part (a): I compute R² = 0.9505627; thus the model seems to have utility.

Part (b): I get an f-value of 57.68292 with a p-value of 2.664535 × 10⁻¹⁵.

Part (c): Following the hint we take the square root of the given F-value to get 1.523155; this gives the t-value of the test β_current-time = 0. A p-value for this t-value is 0.1393478, which indicates that this predictor can be eliminated.

Part (d): I'm not sure how to compute an estimate of σ² given the inputs to this problem, since we have removed predictors from the complete second-order model where we were told that SSE = 0.80017, and thus the value of SSE would change from that value (it would have to increase). If we assume that it does not change, then with a new value of k (the number of predictors used in our model) we get a prediction interval for the given point of (7.146174, 7.880226). This seems relatively accurate.

Exercise 13.73

Part (a): I compute a value of R² = 0.9985706 for the quadratic model. This implies an f-statistic of 1746.466 with a p-value of 7.724972 × 10⁻⁸. All of which indicates a good fit to the data and a useful model.

Part (b): The quadratic coefficient has a t-value of -48.11 and a p-value of 7.330253 × 10⁻⁸. This indicates that it is estimated well and a linear model would not do as well as the quadratic model. In addition, the F-test (for the inclusion of the quadratic terms) has an f-value given by t² and would be significant.
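The partial F test used repeatedly above (does adding a group of l extra terms reduce SSE enough to justify them?) can be sketched as follows; the SSE values and dimensions below are illustrative stand-ins, not the book's:

```python
from scipy import stats

# Partial F test for a group of l extra predictors:
#   F = [(SSE_reduced - SSE_full) / l] / [SSE_full / (n - k_full - 1)].
# When l = 1 this F equals t^2 for the added coefficient.
def partial_f_test(sse_reduced, sse_full, n, k_full, l):
    f = ((sse_reduced - sse_full) / l) / (sse_full / (n - k_full - 1))
    p = stats.f.sf(f, l, n - k_full - 1)
    return f, p

# Stand-in numbers: 5 quadratic terms added to a model, full k = 20, n = 32.
f, p = partial_f_test(sse_reduced=1500.0, sse_full=1000.0, n=32, k_full=20, l=5)
```

A small p-value here says the SSE drop is too large to attribute to the extra degrees of freedom alone.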
Part (c): Only if the residual plots indicated that one might be needed. We only have n = 8 data points, so by adding another predictor we might start to overfit the data.

Part (d): I get a confidence interval for the mean value when x = 100 of (21.0700, 21.6566).

Part (e): I get a prediction interval for a new observation when x = 100 of (20.67826, 22.04834).

Exercise 13.74

We could perhaps model log10(y) as a function of x. This gives the model log10(y) = −9.12196 + 0.08857x + ε. The prediction of y when x = 35 is then given by y = 9.508204 × 10⁻⁷.

Exercise 13.75

Part (a): This is a model utility test where we consider whether or not the expanded model including x² decreases the mean square error enough to be included. Recall that the f-value for the test of including an additional predictor in a model is equal to t². From the information given here we compute our f-value to be f = 59.08421. This has a p-value of 0.0001175359, indicating that we should include the quadratic term.

Part (b): No.

Part (c): I compute a confidence interval for E[Y] given the value of x to be (43.52317, 48.39903).

Exercise 13.76

Part (a): This is the R², which we can read from the MINITAB output to be 0.807.

Part (b): The f-value for this fit is 12.51, which has a p-value of 0.007, and the model appears to be useful.

Part (c): The p-value for this variable is 0.043, which is less than 5%, and thus we would keep it at that level of significance.

Part (d): I compute a confidence interval for β1 given by (0.06089744, 0.22244256).

Part (e): I compute a prediction interval for Y given by (5.062563, 7.565437).

Exercise 13.77

Part (a): I compute this point estimate to be 231.75.

Part (b): I compute R² = 0.9029993.

Part (c): For this we need to consider a model utility test. We find an f-value of 41.8914, which gives a p-value of 2.757324 × 10⁻⁵. This indicates that there is utility to this model.

Part (d): I compute a prediction interval to be (226.7854, 232.2146).
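Several of the exercises above ask for both a confidence interval for the mean response and a (wider) prediction interval for a new observation. A minimal sketch for simple linear regression, with fabricated data rather than the book's:

```python
import numpy as np
from scipy import stats

# CI for E[Y | x0] vs. PI for a new Y at x0 in simple linear regression;
# the data below are made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1])

n = x.size
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))       # residual std. error
sxx = np.sum((x - x.mean()) ** 2)

x0 = 4.5
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
t = stats.t.ppf(0.975, n - 2)
yhat = b0 + b1 * x0
ci = (yhat - t * se_mean, yhat + t * se_mean)   # for E[Y | x0]
pi = (yhat - t * se_pred, yhat + t * se_pred)   # wider, for a new Y
```

The extra "1 +" under the square root is what makes the prediction interval strictly wider than the confidence interval, as seen in Parts (d) and (e) above.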
Exercise 13.78

This is the application of the F test for a group of predictors, where we want to see if the addition of the quadratic terms has sufficiently reduced the mean square error to justify their inclusion. I compute an f-statistic of 1.056076 and a p-value for this of 0.4465628. This indicates there is not sufficient evidence to reject the hypothesis that the coefficients of the quadratic terms are all zero. We should therefore not include them in the model.

Exercise 13.79

Part (a): When we plot the various metrics we don't see an isolated extremum for any of them. Thus we have to pick the number of predictors to use where the metric curves start to asymptote (i.e. where the marginal improvement from adding additional predictors begins to decrease). From these plots this looks to be when we have around 4 predictors.

Part (b): Following the same logic as above, but with the new values of Rk², MSEk, and Ck corresponding to the expanded model, I would select five predictors.

Exercise 13.80

Part (a): I get an f-statistic of 2.4, which has a p-value of 0.2059335. This indicates that there is a greater than 20% chance that the model extracted is simply noise and will have no out-of-sample predictive power.

Part (b): A high R² is meaningful when the number of predictors k is small relative to n. In this case k and n are about the same value, indicating that the in-sample R² might overestimate the out-of-sample R².

Part (c): We can repeat the analysis above for larger and larger values of R². We do that in the R code for this problem. When we do that we find that for an R² value of around 0.96 the p-value is less than 0.05.

Exercise 13.81

Part (a): We have k = 4, and for the value of n given we get an f-statistic of 106.1237. This has a p-value that is zero to the accuracy of the numerical codes used to calculate it. Thus this model is significant.

Part (b): I find a 90% confidence interval for β1 given by (0.01446084, 0.06753916).
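The model utility test of Exercise 13.80 can be computed directly from R². The specific values below (R² = 0.90 with n = 20 and k = 15) are an assumption on my part; I chose them because they reproduce the f-statistic of 2.4 quoted in Part (a):

```python
from scipy import stats

# Model utility test from R^2 alone:
#   F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
def utility_f(r2, n, k):
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return f, stats.f.sf(f, k, n - k - 1)

# Assumed inputs: a seemingly impressive R^2 with k nearly as large as n.
f, p = utility_f(r2=0.90, n=20, k=15)
```

With so few residual degrees of freedom, even R² = 0.90 is unconvincing, which is the point of Part (b).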
Part (c): I compute a t-value for β4 given by 5.857143, which has a p-value of 4.883836 × 10⁻⁸, indicating that this variable is important in the model.

Part (d): I compute ŷ = 99.514.

Exercise 13.82

Part (a): This is a model in which we have combined all predictors from the previous two models. The R² in that case must be at least the larger of the two R²s. This is because this expanded model has more degrees of freedom and can fit the data "no worse" than each of the component models. Thus we can conclude that for the combined model R² > max(0.723, 0.689) = 0.723.

Part (b): In this case, as x1 and x4 are both predictors in the first model with R² = 0.723, removing the predictors x5 and x8 can only cause our R² to decrease; thus with this smaller model we would expect R² < 0.723.

Distribution-Free Procedures

Note on the Text

The Large Sample Approximation in the Wilcoxon Signed-Rank Test

Recall that Wi is the indicator random variable representing whether or not the ith sample contributes its rank to S+. Under H0, in forming the statistic S+ every sample contributes its rank to S+ with probability 1/2 (and with probability 1/2 it contributes the value 0 to S+). Thus

E[S+] = E[ Σ_{i=1}^n Wi ] = Σ_{i=1}^n E[Wi] = Σ_{i=1}^n [ (1/2) i + (1/2) 0 ] = (1/2) Σ_{i=1}^n i = (1/2) · n(n + 1)/2 = n(n + 1)/4 .

Now the variance of S+ can be computed using independence as

Var(S+) = Σ_{i=1}^n Var(Wi) .

To use this we need to evaluate Var(Wi):

Var(Wi) = E[Wi²] − E[Wi]² = i²/2 − (i/2)² = i²/2 − i²/4 = i²/4 .

Thus

Var(S+) = (1/4) Σ_{i=1}^n i² = (1/4) · n(n + 1)(2n + 1)/6 = n(n + 1)(2n + 1)/24 .

Each of these are the results given in the text.

Notes on Ties in Absolute Magnitude

Given the signed ranks

1, 2, −4, −4, +4, 6, 7, 8.5, 8.5, 10 ,

the duplicated rank four appears because we had three tied values with sequential ranks 3, 4, and 5. When we average these three values we get (3 + 4 + 5)/3 = 4, and we get the signed ranks −4, −4, and 4; thus τ1 = 3. The negative numbers (vs.
the positive numbers) are due to the fact that those samples lie below the hypothesized median. Next we must have had two tied values with sequential ranks of 8 and 9, which have an average rank of (8 + 9)/2 = 8.5, and we would have τ2 = 2.

Problem Solutions

The Wilcoxon signed-rank test is in the R routine wilcoxon_signed_rank_test.R, which can be used to help in solving some of these problems. The function will return the numerical value of s+ given a sample of data and the mean of the null hypothesis. One will still need to look up critical values of S+ from the tables in the appendix to determine if we should reject or accept the null hypothesis.

Some of the computational steps needed for the Wilcoxon rank-sum test are in the R routine wilcoxon_rank_sum_test.R, which can be used to help in solving some of these problems.

All R scripts for this chapter (if they exist) are denoted as ex15_NN.R where NN is the section number.

Exercise 15.1

Our hypothesis test for this data is

H0: µ = 100
Ha: µ ≠ 100 .

We compute a value of s+ = 27 (the back of the book has the value s+ = 35 but I think that is a typo). We are told to consider α = 0.05, so α/2 = 0.025. When we look in Table A.13 with n = 12 for the row with an α value near 0.025, we find that α = 0.026 gives c = 64, and we reject when s+ is greater than c or less than n(n + 1)/2 − c = 14. Since our value of s+ is between these two limits we cannot reject H0 in favor of Ha. This agrees with the result (using the t-distribution) found when working Exercise 32 (see Page 193).

Exercise 15.2

Our hypothesis test for this data is

H0: µ = 25
Ha: µ > 25 .

We compute a value of s+ = 11. We want to perform a one-sided test with α = 0.05. When we look in Table A.13 with n = 5 we find for α = 0.062 that c1 = 14. Since our value of s+ is smaller than this we cannot reject H0 in favor of the conclusion Ha. This result agrees with the conclusions of Example 8.9 in the book.
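The null mean and variance of S+ derived in the note above can be checked by brute force: under H0 each rank i is included with probability 1/2, so enumerating all 2^n sign patterns gives the exact distribution of S+.

```python
from itertools import product

# Brute-force check of E[S+] = n(n+1)/4 and Var(S+) = n(n+1)(2n+1)/24
# by enumerating every equally likely inclusion pattern of the ranks.
n = 6
values = [sum(i for i, w in zip(range(1, n + 1), pattern) if w)
          for pattern in product([0, 1], repeat=n)]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, n * (n + 1) / 4)                   # both 10.5
print(var, n * (n + 1) * (2 * n + 1) / 24)     # both 22.75
```

This kind of enumeration is exactly what the exact tables (Table A.13) tabulate for small n.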
Exercise 15.3

Our hypothesis test for this data is

H0: µ = 7.39
Ha: µ ≠ 7.39 .

We compute a value of s+ = 18. We want to consider a two-sided test with α = 0.05, so α/2 = 0.025. When we look in Table A.13 with n = 14 for the row with an α value near 0.025, we find that α = 0.025 gives c = 84. We will reject H0 when s+ is greater than c or less than n(n + 1)/2 − c = 21. Since our sample value of s+ = 18 is smaller than this smallest value we can reject H0 in favor of Ha.

Exercise 15.4

Our hypothesis test for this data is

H0: µ = 30
Ha: µ < 30 .

This is a one-sided test with α = 0.1. When we look in Table A.13 with n = 15 for the row with an α value near 0.1, we find that α = 0.104 gives c1 = 83. Since our sample value of s+ = 39 is larger than the critical value c2 = n(n + 1)/2 − c1 = 37, we cannot reject H0 in favor of Ha.

Exercise 15.5

This is a two-sided test, so if we take α = 0.05 we would have α/2 = 0.025. Then for n = 12 in Table A.13 we find that α = 0.026 gives c1 = 64. Thus we will reject H0 if s+ is larger than c1 = 64 or smaller than c2 = n(n + 1)/2 − c1 = 14. Since we have s+ = 72 we can reject H0 in favor of Ha.

Exercise 15.6

This is a two-sided test, so if we take α = 0.05 we would have α/2 = 0.025. Then for n = 9 in Table A.13 we find that α = 0.027 gives c1 = 39. Thus we will reject H0 if s+ is larger than c1 = 39 or smaller than c2 = n(n + 1)/2 − c1 = 6. Since in this case we have s+ = 45 we can reject H0 in favor of Ha.

Exercise 15.7

For this exercise we will use the large sample version of the Wilcoxon test (ignoring the correction to the estimate of σ_S+ due to duplicate differences). For this data we compute s+ = 443 and a sample z = 2.903522, which has a P-value of 0.003689908. Thus we should reject H0 in favor of Ha in this case.

Exercise 15.8

For this hypothesis test we compare

H0: µ = 75
Ha: µ > 75 .

Since we have n = 25 > 20 we will use the large sample test (ignoring the correction to σ_S+ due to duplicate Xi − 75 values).
For the given data set I compute s+ = 226.5, which gives z = 1.722042, which has a P-value of 0.04253092. Thus at the 5% level we can reject H0 in favor of Ha.

Exercise 15.9

See the python code ex15_9.py where we enumerate the various ranks that the Xi could have. For each of these ranks we then evaluate the D statistic. Possible values for the D statistic (and a count of the number of times D takes on each value when n = 4) are given by

counts of number of different D values=
Counter({6: 4, 14: 4, 2: 3, 18: 3, 8: 2, 10: 2, 12: 2, 0: 1, 4: 1, 16: 1, 20: 1})

We can then obtain the probability that we get each D value and find

probability of different D values=
{0: 0.041666666666666664, 2: 0.125, 4: 0.041666666666666664,
 6: 0.16666666666666666, 8: 0.08333333333333333, 10: 0.08333333333333333,
 12: 0.08333333333333333, 14: 0.16666666666666666, 16: 0.041666666666666664,
 18: 0.125, 20: 0.041666666666666664}

If we want to pick a value c such that under the null hypothesis we would have P(D ≤ c) ∼ 0.1, we have to take c = 0, for if c = 2 then P(D ≤ 2) ≈ 0.16666667 > 0.1.

Exercise 15.10

Our hypothesis test for this data is

H0: µ1 = µ2 ⇒ H0: µ1 − µ2 = 0
Ha: µ1 > µ2 ⇒ Ha: µ1 − µ2 > 0 .

We have five values for the adhesive strength for sample 1 (call these the xi's). We also have five values for the adhesive strength for sample 2 (which we call the yi's). We don't need to use the large sample test (since the sample sizes are so small). For α = 0.05 we use Appendix Table A.14 with m = 5 and n = 5 to find

P(W ≥ 36 when H0 is true) = 0.048 ≈ 0.05 .

For this data we have Table 15:

value: 163 179 213 225 229 245 247 250 286 299
label:  y   y   y   y   x   x   y   x   x   x
rank:   1   2   3   4   5   6   7   8   9   10

Table 15: The ranks of each xi and yj sample when combined.

Thus w = 5 + 6 + 8 + 9 + 10 = 38. Since w is greater than the critical value of 36 we can reject H0 in favor of Ha.

Exercise 15.11

We label the "Pine" data as the xi's (since there are fewer samples) and the "Oak" data as the yi's.
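The large-sample signed-rank calculation used in Exercises 15.7 and 15.8 is short enough to sketch directly; the numbers below are Exercise 15.8's (n = 25, s+ = 226.5):

```python
import math
from scipy import stats

# Large-sample Wilcoxon signed-rank test: standardize s+ by its null
# mean n(n+1)/4 and sd sqrt(n(n+1)(2n+1)/24), ignoring the tie correction.
def signed_rank_z(s_plus, n):
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (s_plus - mean) / sd

z = signed_rank_z(226.5, 25)   # ≈ 1.722042
p = stats.norm.sf(z)           # one-sided P-value, ≈ 0.04253092
```

This reproduces the z and P-value quoted above.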
Then our hypothesis test for this data is

H0: µ1 = µ2 ⇒ µ1 − µ2 = 0
Ha: µ1 ≠ µ2 ⇒ µ1 − µ2 ≠ 0 .

Thus for α = 0.05 we use Appendix Table A.14 with m = 6 and n = 8 to find

P(W ≥ 61 when H0 is true) = 0.021 ≈ 0.025 ,

thus c = 61 and we reject H0 if the rank-sum w is such that w ≥ 61 or w ≤ m(m + n + 1) − c = 29. For the data given here, using wilcoxon_rank_sum_test.R we find that w = 37 and thus we do not reject H0 in favor of Ha.

Exercise 15.12

We label the data from the "Original Process" as the xi's and the data from the "Modified Process" as the yi's. Then our hypothesis test for this data is

H0: µ1 − µ2 = 1
Ha: µ1 − µ2 > 1 .

For α = 0.05 we use Appendix Table A.14 with m = 8 and n = 8 to find

P(W ≥ 84 when H0 is true) = 0.052 ≈ 0.05 ,

thus c1 = 84 and we reject H0 if the rank-sum w is larger than this number. For the data given here we find that w = 65 (remembering to subtract the one) and thus we do not reject H0 in favor of Ha.

Exercise 15.13

We label the "Orange Juice" data as the xi's and the "Ascorbic Acid" data as the yi's. Then our hypothesis test for this data is

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0 .

Since m = n = 10 we can use the large sample normal approximation to compute z = 2.267787. For this value of z we compute a P-value of 0.0233422. As this is larger than 0.01 we cannot reject H0 in favor of Ha.

Exercise 15.14

Again we label the "Orange Juice" data as the xi's and the "Ascorbic Acid" data as the yi's. Then our hypothesis test for this data is again

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0 .

Again, since m = n = 10, we can use the large sample normal approximation to compute z = 2.532362. For this value of z we compute a P-value of 0.0113297. As this is larger than 0.01 we still cannot reject H0 in favor of Ha.

Exercise 15.15

Again we label the "Unexposed" data as the xi's and the "Exposed" data as the yi's.
Then our hypothesis test for this data is

H0: µ1 − µ2 = −25
Ha: µ1 − µ2 < −25 .

For α = 0.05 we use Appendix Table A.14 with m = 7 and n = 8 to find

P(W ≥ 71 when H0 is true) = 0.047 ≈ 0.05 ,

thus c1 = 71 and we reject H0 if the rank-sum w is smaller than m(m + n + 1) − c1 = 41. For the data given here we find that w = 39. Thus we can reject H0 in favor of Ha in this case.

Exercise 15.16

Again we label the "good" data as the xi's and the "poor" data as the yi's. Then our hypothesis test for this data is again

H0: µ̃1 − µ̃2 = 0
Ha: µ̃1 − µ̃2 < 0 .

Part (a): Here m = n = 8, and using wilcoxon_rank_sum_test.R we get w = 41.

Part (b): For α = 0.01 we use Appendix Table A.14 to find

P(W ≥ 90 when H0 is true) = 0.01 ,

thus c1 = 90 and we reject H0 if the rank-sum w is smaller than m(m + n + 1) − c1 = 46. For the data given here, since w = 41, we can reject H0 in favor of Ha in this case.

Exercise 15.17

When we have n = 8, using Appendix Table A.15 we find c = 32. With this value and using the R script wilcoxon_signed_rank_interval.R we get a 95% confidence interval for µ of (11.15, 23.8).

Exercise 15.18

When we have n = 14, using Appendix Table A.15 we find c = 93. With this value and using the R script wilcoxon_signed_rank_interval.R we get a 99% confidence interval for µ of (7.095, 7.43).

Exercise 15.19

When we have n = 8, using Appendix Table A.15 we find c = 32. With this value and using the R script wilcoxon_signed_rank_interval.R we get a 95% confidence interval for µ of (−0.585, 0.025). Note this is significantly different than the answer in the back of the book. If anyone sees anything wrong with what I have done please contact me.

Exercise 15.21

For this problem we have m = n = 5; using Appendix Table A.16 for α = 0.1 we find c = 21. With this value and using the R script wilcoxon_rank_sum_interval.R we get a 90% confidence interval for µ1 − µ2 of (16, 87).
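The rank-sum statistic w used throughout these exercises is just the sum of the ranks of the x's in the combined sample. Using the values from Table 15 (Exercise 15.10) this is:

```python
import numpy as np
from scipy.stats import rankdata

# Rank-sum statistic for Exercise 15.10: rank the combined sample and
# sum the ranks of the x observations (the values come from Table 15).
x = np.array([229.0, 245.0, 250.0, 286.0, 299.0])
y = np.array([163.0, 179.0, 213.0, 225.0, 247.0])
ranks = rankdata(np.concatenate([x, y]))
w = ranks[:len(x)].sum()   # = 38, matching the table
```

rankdata also handles ties by averaging, matching the convention used in the text.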
Exercise 15.22

For this problem we have m = 6 and n = 8; using Appendix Table A.16 for α = 0.01 we find c = 44. With this value and using the R script wilcoxon_rank_sum_interval.R we get a 99% confidence interval for µ1 − µ2 of (−0.79, 0.73).

Exercise 15.23

For this problem I used the R code kruskal_wallis_test.R to compute the Kruskal-Wallis K statistic and the α = 0.1 critical value. For the given data we find k = 14.06286 and kcrit = 6.251389. Since k > kcrit we reject H0 in favor of Ha.

Exercise 15.24

For this problem I used the R code kruskal_wallis_test.R to compute the Kruskal-Wallis K statistic and the α = 0.05 critical value. For the given data we find k = 7.586587 and kcrit = 7.814728. Since k < kcrit we cannot reject H0 in favor of Ha and conclude that diet makes no difference on nitrogen production.

Exercise 15.25

For this problem I used the R code kruskal_wallis_test.R to compute the Kruskal-Wallis K statistic and the α = 0.05 critical value. For the given data we find k = 9.734049 and kcrit = 5.991465. Since k > kcrit we can reject H0 in favor of Ha.

Exercise 15.26

For this problem I used the R code friedmans_test.R to compute the Friedman Fr statistic and the α = 0.01 critical value. For the given data we find fr = 28.92 and fcrit = 11.34487. Since fr > fcrit we can reject H0 in favor of Ha.

Exercise 15.27

For this problem I used the R code friedmans_test.R to compute the Friedman Fr statistic and the α = 0.05 critical value. For the given data we find fr = 2.6 and fcrit = 5.991465. Since fr < fcrit we cannot reject H0 in favor of Ha.

Exercise 15.28

We label the "Potato" data as the xi's and the "Rice" data as the yi's. Then our hypothesis test for this data is

H0: µ1 − µ2 = 0
Ha: µ1 − µ2 ≠ 0 .
For the data given here we find that w = 73 and thus we do not reject H0 in favor of Ha . Exercise 15.29 For this problem I used the R code kruskal wallis test.R (this test ignores the salesperson information) to compute the Kruskal Wallis K statistic and the α = 0.05 critical value. For the given data we find k = 7.814728 and kcrit = 11.95095. Since k > kcrit we can reject H0 in favor of Ha and conclude that the average cancellation does depend on year. 294 The book seems to have used Friedman’s test on this problem where the blocks are the sales people and the treatments are the year. If we do that we find fr = 9.666667 and fcrit = 7.814728. Since ff > fcrit we again conclude that we reject H0 in favor of Ha . Exercise 15.30 For this problem I used the R code kruskal wallis test.R to compute the Kruskal Wallis K statistic and the α = 0.05 critical value. For the given data we find k = 7.814728 and kcrit = 17.85714. Since k > kcrit we can reject H0 in favor of Ha and conclude that the true mean phosphorus concentration does depend on treatment type. Exercise 15.31 For this problem we have m = n = 5, using Appendix Table A.16 for α = 0.05 we find c = 22. With this value and using the R command wilcoxon rank sum interval.R we get a 95% confidence interval for µII − µIII of (−5.9, −3.8). Exercise 15.32 We label the “Diagonal” data as the xi s and the “Lateral” data as the yi s. Then our hypothesis test for this data is H0 : µ 1 − µ 2 = 0 Ha : µ1 − µ2 6= 0 . Part (a): For α = 0.05 we use Appendix Table A.14 with m = 6 and n = 7 to find P (W ≥ 56 when H0 is true) = 0.026 ≈ 0.025 , thus c = 56 and we reject H0 if the rank-sum w is larger than this number or smaller than m(m + n + 1) − c = 28. For the data given here we find that w = 43 and thus we do not reject H0 in favor of Ha . 
Part (a): For α = 0.05 we use Appendix Table A.14 with m = 6 and n = 7 to find

P(W ≥ 56 when H0 is true) = 0.026 ≈ 0.025 ,

thus c = 56 and we reject H0 if the rank-sum w is larger than this number or smaller than m(m + n + 1) − c = 28. For the data given here we find that w = 43 and thus we do not reject H0 in favor of Ha.

Part (b): To get a 95% confidence interval for the difference in means we need to compute the dij(n) values; our confidence interval is then

(dij(mn−c+1), dij(c)) = (dij(8), dij(35)) = (−0.29, 0.41) ,

for c = 35 (using Appendix Table A.16) and with mn − c + 1 = 8.

Exercise 15.33

Part (a): Here Y counts the number of successes in n trials with a probability of success p = 0.5; thus Y is a binomial random variable. We can therefore compute α as

α = Σ_{y=15}^{20} dbinom(y, 20, 0.5) = 1 − pbinom(14, 20, 0.5) = 0.02069473 .

Part (b): We want to find a value of c such that 1 − P(Y ≤ c) ≈ 0.05. We find that c = 13 gives P(Y ≤ 13) = 0.9423409 and c = 14 gives P(Y ≤ 14) = 0.9793053. If we use c = 14 then, since y = 12, we cannot reject H0 in favor of Ha.

Exercise 15.34

Part (a): When H0 is true this means that µ̃ = 25, so Y is a binomial random variable with n = 20 and p = 0.5. Using this we can compute α as

α = Σ_{y=0}^{5} dbinom(y, 20, 0.5) + Σ_{y=15}^{20} dbinom(y, 20, 0.5) = 0.04138947 .

Part (b): For this part we pick a value of µ̃0 and see if H0 would be rejected, i.e. we count the number of times our sample xi is larger than µ̃0; call this value Y. We reject if Y ≥ 15 or Y ≤ 5. Then we let µ̃0 range over all possible values, and the confidence interval is the minimum and maximum of the values of µ̃0 such that H0 is not rejected. Note that we only need to test values for µ̃0 that are the same as our sample xi (since these are the only ones where the value of Y will change). The values of µ̃0 where we did not reject H0 are given by

> did_not_reject
[1] 14.4 16.4 24.6 26.0 26.5 32.1 37.4 40.1 40.5

Thus the 95.9% confidence interval (1 − α with α = 0.0414) for our value of µ̃ is (14.4, 40.5).

Exercise 15.35

When I sort and label our combined data in the manner suggested I get the following:

value: 3.7 4.0 4.1 4.3 4.4 4.8 4.9 5.1 5.6
label:  y   x   y   y   x   x   x   y   y
rank:   1   3   5   7   9   8   6   4   2

Thus w′ = 3 + 8 + 9 + 6 = 26. Using Appendix Table A.14 to find c such that P(W ≥ c) = 0.05 with m = 4 and n = 5 gives c = 27.
Since w ′ < c we don’t reject H0 . Exercise 15.36 When we have m = 3 and n = 4 we have seven total observations and so there are 73 = 35 possible rank locations for the x samples. In the python code ex15 36.py we explicitly enumerate all of the possible values for the W ′′ variable when H0 is true. When we run this code we get the null distribution for W ′′ given by probability of different W’’ values= {4: 0.05714285714285714, 5: 0.11428571428571428, 6: 0.2571428571428571, 7: 0.22857142857142856, 8: 0.2, 9: 0.11428571428571428, 10: 0.02857142857142857} From the above probability we see that if c = 10 we have P {w ′′ ≥ c} = 0.02857 and if c = 9 that P {w ′′ ≥ c9} = 0.14286. Neither of these probabilities is very close to the value of 0.1. 297