UNIVERSITY OF TORONTO AT SCARBOROUGH
Sample Exam
STAC52H3 EXPERIMENTAL DESIGNS
Duration - 3 hours

LAST NAME_____________________________________________________
FIRST NAME_____________________________________________________
STUDENT NUMBER___________________________________________

There are 23 pages including this page and t tables. Please check to see if you have all the pages.

Aids allowed: You are allowed to use the textbook (Design and Analysis of Experiments by Douglas Montgomery), the class notes and a non-communicating calculator. No other material will be allowed during the test.

All your work must be presented clearly in order to get credit. An answer with no work shown will receive zero credit. Show your work and answer in the space provided, in ink. Pencil may be used, but then re-grading will NOT be allowed.

PLEASE CHECK AND MAKE SURE THAT THERE ARE NO MISSING PAGES IN THIS BOOKLET.

1) Many studies have suggested that there is a link between exercise and healthy bones. One study examined the effect of jumping on the bone density of growing rats. There were three treatments: a control with no jumping, a low-jump condition (the jump height was 30 cm), and a high-jump condition (60 cm). Thirty rats were randomly divided into three treatment groups (10 in each). After 8 weeks of 10 jumps per day, 5 days per week, the bone density of the rats (in mg/cm^3) was measured. The summary statistics shown below were obtained from the analysis of the data from this experiment. This question and the next two questions are based on this study. Assume that the data satisfies the necessary assumptions.
Note: In this question µc, µl, and µh denote the means of the control, low-jump and high-jump populations respectively.

Summary Statistics:
Treatment   N    Mean     StDev
Control     10   601.10   27.36
Highjump    10   638.70   16.59
Lowjump     10   612.50   19.33

a) [5 points] Test whether the mean bone densities are the same for all three treatments. Test at α = 0.05.
State the null and alternative hypotheses and show your work clearly.
b) [2 points] Calculate a 95% confidence interval for µh.

Sol
Grand Total (GT) = 10*(601.1+638.7+612.5) = 18523
a) H0: µc = µl = µh versus Ha: not all three means are equal.
SSTrt = (6011^2)/10+(6387^2)/10+(6125^2)/10 - GT^2/30 = 7433.866667
MSTrt = 7433.866667/(3-1) = 3716.933334
MSE = (27.36^2+16.59^2+19.33^2)/3 = 465.8155333
dfError = 30-3 = 27
F = 3716.933334/465.8155333 = 7.97941045
The critical value is F(0.05; 2, 27) = 3.35. Since 7.98 > 3.35, the treatment effect is significant at the 5% level.
b) S = sqrt((27.36^2+16.59^2+19.33^2)/3) = 21.5827601
t(27, 0.025) = 2.052
ME = t*S/sqrt(n) = 2.052*21.58/sqrt(10) = 14.00
CI = 638.70 ± ME = (624.70, 652.70)

2) An experiment was conducted to test the effects of five different diets for turkeys. Six turkeys were randomly assigned to each of five diet groups (labeled 1, 2, …, 5) and were fed for a period of time (i.e. a group of 30 turkeys was divided at random into the five treatment groups, each with 6 turkeys). Their weight gains were measured at the end of this period. A part of the SAS output used in the analysis of this data set is given below. In this question µ1, µ2, …, µ5 denote the means of the diet groups 1, 2, …, 5 respectively. Assume that the data satisfies the necessary assumptions.

The GLM Procedure
t Tests (LSD) for WtGain
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha                           0.05
Error Degrees of Freedom        25
Error Mean Square               0.3154
Critical Value of t             2.05954
Least Significant Difference    0.6678

Comparisons significant at the 0.05 level are indicated by ***.
diet          Difference
Comparison    Between Means    95% Confidence Limits
5 - 4          2.3833           1.7155     3.0511    ***
5 - 3          2.4000           omitted    omitted
5 - 2          3.8833           3.2155     4.5511    ***
5 - 1          5.6000           4.9322     6.2678    ***
4 - 5         -2.3833          -3.0511    -1.7155    ***
4 - 3          0.0167          -0.6511     0.6845
4 - 2          1.5000           0.8322     2.1678    ***
4 - 1          3.2167           2.5489     3.8845    ***
3 - 5         -2.4000           omitted    omitted
3 - 4         -0.0167          -0.6845     0.6511
3 - 2          1.4833           0.8155     2.1511    ***
3 - 1          3.2000           2.5322     3.8678    ***
2 - 5         -3.8833          -4.5511    -3.2155    ***
2 - 4         -1.5000          -2.1678    -0.8322    ***
2 - 3         -1.4833          -2.1511    -0.8155    ***
2 - 1          1.7167           1.0489     2.3845    ***
1 - 5         -5.6000          -6.2678    -4.9322    ***
1 - 4         -3.2167          -3.8845    -2.5489    ***
1 - 3         -3.2000          -3.8678    -2.5322    ***
1 - 2         -1.7167          -2.3845    -1.0489    ***

a) [2 points] Use Fisher's method to calculate a 95% confidence interval for the difference between the means of diet 3 and diet 5.
b) [2 points] Use Tukey's method to calculate a 95% confidence interval for the difference between the means of diet 3 and diet 5.
c) [3 points] Calculate a 95% confidence interval for 2µ1 − µ2 − µ3 using Scheffe's method.

Sol
a) For µ5 − µ3: 2.4 ± 0.6678 = (1.7322, 3.0678). Equivalently, for µ3 − µ5: (-3.0678, -1.7322).

Descriptive Statistics: Weight Gain (pounds)
Variable: Weight Gain (pou
Group   N   Mean    StDev
1       6   3.783   0.527
2       6   5.500   0.867
3       6   6.983   0.578
4       6   7.000   0.358
5       6   9.383   0.293

For part c) note that 2µ1 − µ2 − µ3 = (µ1 − µ2) − (µ3 − µ1),
and so L = 2ȳ1 − ȳ2 − ȳ3 = (ȳ1 − ȳ2) − (ȳ3 − ȳ1) = -1.7167 - 3.2000 = -4.917
F(5-1, 30-5, 0.05) = 2.76
S = sqrt((5-1)*2.76*0.3154*(4/6+1/6+1/6)) = 1.866
CI = L ± S = -4.917 ± 1.866 = (-6.783, -3.051)

3) [5 points] A researcher conducted a study of two weight reduction programs (program A and program B) with two diets (diet I and diet II). 16 subjects were randomly divided into four groups, each group receiving one of the four treatments (i.e. a program-diet combination) in a 2 x 2 factorial design with four observations for each treatment. The amount of weight each subject lost was measured after one month. The table below gives treatment means and standard deviations for each of the four treatments. The values in parentheses are the standard deviations.
Assume that the data satisfies the necessary assumptions.

           Program A         Program B
Diet I     4.750 (0.957)     5.500 (2.380)
Diet II    11.250 (2.986)    13.000 (2.160)

Test whether the interaction of program and diet is significant. Test at α = 0.05.

SS_int = (x_bar(I,A) + x_bar(II,B) − x_bar(II,A) − x_bar(I,B))^2/(1/n1+1/n2+1/n3+1/n4) = (4.750+13.000−11.250−5.500)^2/(4*(1/4)) = 1
with df = (2-1)(2-1) = 1 and so MS(int) = 1
MSE = pooled variance of the 4 treatment variances = the average of the four variances when the sample sizes are equal = 5.04
F = MS(int)/MSE = 1/5.04 = 0.2 approx
Since 0.2 < F(0.05; 1, 12) = 4.75, the interaction is not significant at the 5% level.

Here is the MINITAB output for ANOVA

Two-way ANOVA: Weight loss versus Exercise Program, Deit
Source              DF   SS       MS        F       P
Exercise Program     1   6.25     6.250     1.24    0.287
Deit                 1   196.00   196.000   38.88   0.000
Interaction          1   1.00     1.000     0.20    0.664
Error               12   60.50    5.042
Total               15   263.75
S = 2.245   R-Sq = 77.06%   R-Sq(adj) = 71.33%

Question 4 is not in the exam coverage for this year

4) In assignment 3 we showed that for a BIBD $V(\hat{\tau}_{i_1} - \hat{\tau}_{i_2}) = \frac{2k}{\lambda a}\sigma^2$.
a) [4 points] Prove that for a randomized complete block design $V(\hat{\tau}_{i_1} - \hat{\tau}_{i_2}) = \frac{2\sigma^2}{b}$.
b) [5 points] Prove that $\frac{k}{\lambda a} \ge \frac{1}{b}$ and give it a statistical interpretation.
Note: The expressions above are in the usual notation we discussed in class. You may use without proof any results given in class, but you should state them all clearly with all the assumptions under which they are valid.

Sol
a) For RCBD, $\hat{\tau}_{i_1} = \bar{y}_{i_1 .} - \bar{y}_{..}$
$\Rightarrow \hat{\tau}_{i_1} - \hat{\tau}_{i_2} = \bar{y}_{i_1 .} - \bar{y}_{i_2 .}$
$\Rightarrow Var(\hat{\tau}_{i_1} - \hat{\tau}_{i_2}) = Var(\bar{y}_{i_1 .} - \bar{y}_{i_2 .}) = Var(\bar{y}_{i_1 .}) + Var(\bar{y}_{i_2 .}) = \frac{\sigma^2}{b} + \frac{\sigma^2}{b} = \frac{2\sigma^2}{b}$
(Note: $\bar{y}_{i_1 .} - \bar{y}_{i_2 .} = \tau_{i_1} - \tau_{i_2} + \bar{\varepsilon}_{i_1 .} - \bar{\varepsilon}_{i_2 .}$, since the block effects cancel.)
b) From class we have $\lambda = \frac{r(k-1)}{a-1}$ and $ar = bk$
$\Rightarrow \lambda a = \frac{ar(k-1)}{a-1} = \frac{bk(k-1)}{a-1}$ (since $ar = bk$)
$\le \frac{bk(a-1)}{a-1} = bk$ (since $a \ge k$)
$\Rightarrow \frac{1}{b} \le \frac{k}{\lambda a}$
This implies that, for the same number of blocks, the estimates of treatment differences in a RCBD have a standard error no larger than that of the corresponding estimates from a BIBD.
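As a numerical illustration of part b), the following minimal Python sketch (not part of the exam) evaluates the two variance multipliers, 2k/(λa) for a BIBD and 2/b for an RCBD, using only the identities ar = bk and λ = r(k − 1)/(a − 1) quoted in the solution. The function name `variance_factors` is hypothetical, and the parameters a = 4, b = 4, k = 3 are those of the resistor design in question 6.

```python
from fractions import Fraction


def variance_factors(a, b, k):
    """Return the multipliers of sigma^2 in Var(tau_i1_hat - tau_i2_hat).

    a = number of treatments, b = number of blocks, k = block size.
    r and lambda are derived from ar = bk and lambda = r(k-1)/(a-1).
    """
    r = Fraction(b * k, a)               # replications per treatment
    lam = r * (k - 1) / (a - 1)          # times each pair shares a block
    bibd = Fraction(2 * k) / (lam * a)   # BIBD: 2k / (lambda * a)
    rcbd = Fraction(2, b)                # RCBD: 2 / b
    return bibd, rcbd


bibd, rcbd = variance_factors(a=4, b=4, k=3)
print(bibd, rcbd)  # 3/4 1/2
```

With these values the BIBD multiplier (3/4) exceeds the RCBD multiplier (1/2), in line with the inequality k/(λa) ≥ 1/b.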
(2 points)

5) This question is question 15.37 p914 Ott.
[5 points] A food-processing plant has tested several different formulations of a new breakfast drink. A panel of six members rated these formulations. Each member of the panel rated 12 different formulations obtained from combining one of the three levels of sweetness, one of two levels of caloric content, and one of the two colours. Each member of the panel rated the formulations in a random order. Identify the experimental design. Give the analysis of variance table showing the sources of variation and the respective degrees of freedom.

Ans: This is a randomized block design (RBD) with three factors (a 3 x 2 x 2 factorial treatment structure); the panel members are the blocks.

SV               df
Sweetness (S)     2
Caloric (Ca)      1
Colour (Co)       1
S x Ca            2
Ca x Co           1
S x Co            2
S x Ca x Co       2
Blocks            5
Error            55
Tot              71

6) An experiment was performed to determine the effects of four different geometrical shapes of a certain film-type resistor on the current-noise of the resistors. A BIBD was used because only three resistors could be mounted on one plate. The design layout below shows the observations on noise measurements from this experiment (a dash marks a shape that was not mounted on that plate). Assume that the data satisfies the necessary assumptions.

                     Shapes (Treatments)
Plates (Blocks)    A       B       C       D       Total
1                  1.11    -       0.95    0.82    2.88
2                  1.70    1.22    -       0.97    3.89
3                  1.66    1.11    1.52    -       4.29
4                  -       1.22    1.54    1.18    3.94
Total              4.47    3.55    4.01    2.97    15.00

The sum of squares of all observations = 19.6768

a) [10 points] Test whether the treatment effect is significant. (Use α = 0.05)
b) [5 points] Calculate a 95% C.I. (Fisher-type) for τ1 − τ2, where τ1 and τ2 are the effects of shapes A and B respectively.
Sol
a) SSTot = 19.6768-(15^2)/12 = 0.9268, dfTot = N-1 = 12-1 = 11
Q1 = 4.47-(2.88+3.89+4.29)/3 = 0.7833333333
Q2 = 3.55-(3.89+4.29+3.94)/3 = -0.49
Q3 = 4.01-(2.88+4.29+3.94)/3 = 0.3066666667
Q4 = 2.97-(2.88+3.89+3.94)/3 = -0.6
k = 3, a = 4, λ = 2
SSTr(Adj) = $\frac{k}{\lambda a}\sum_{i=1}^{a} Q_i^2$ = (3/8)*(0.7833333333^2+0.49^2+0.3066666667^2+0.6^2) = 0.4904083333
dfTrt = a-1 = 4-1 = 3
SSBlock = (2.88^2+3.89^2+4.29^2+3.94^2)/3-(15^2)/12 = 0.3680666667
dfBlock = b-1 = 4-1 = 3
SSE = SSTot - SSBlock - SSTr(Adj) = 0.0683250003
dfError = dfTot - dfBlock - dfTrt = 11-3-3 = 5
F = MSTrt(Adj)/MSE = (0.4904083333/3)/(0.0683250003/5) = 11.96263767
Since 11.96 > F(0.05; 3, 5) = 5.41, the treatment (shape) effect is significant at the 5% level.

Here is the full SAS output

The SAS System        03:23 Sunday, December 12, 2010
The GLM Procedure
Class Level Information
Class    Levels   Values
shape    4        A B C D
plate    4        1 2 3 4
Number of Observations Read   12
Number of Observations Used   12

The GLM Procedure
Dependent Variable: noise
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               6   0.85847500       0.14307917    10.47     0.0104
Error               5   0.06832500       0.01366500
Corrected Total    11   0.92680000

R-Square   Coeff Var   Root MSE   noise Mean
0.926279   9.351791    0.116897   1.250000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
plate     3   0.36806667    0.12268889    8.98      0.0186
shape     3   0.49040833    0.16346944    11.96     0.0102

Source   DF   Type III SS   Mean Square   F Value   Pr > F
plate     3   0.44700833    0.14900278    10.90     0.0124
shape     3   0.49040833    0.16346944    11.96     0.0102

b) $\hat{\tau}_i = \frac{kQ_i}{\lambda a}$, $V(\hat{\tau}_i - \hat{\tau}_j) = \frac{2k}{\lambda a}\sigma^2$
Hence the estimate of τ1 − τ2 is 3*(0.7833+0.49)/8 = 0.4775, with estimated standard error sqrt(2*3*0.013665/8) = 0.1012.
With t(5, 0.025) = 2.571, CI = 0.4775 ± 2.571*0.1012 = (0.217, 0.738)

7) In a large study of a health awareness program, three states (factor A, a random factor) were randomly selected from all states of the country. Each selected state independently devised its own health awareness program. From each selected state (factor A), three cities (factor B) were chosen at random from all the cities in the state, and five households within each city were randomly selected to evaluate the effectiveness of the program.
All members of the selected households were interviewed before and after participation in the program, and a composite index was formed for each household measuring the impact of the health awareness program. The data were analyzed using the model:

$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}$, $i = 1, \dots, a$, $j = 1, \dots, b$, $k = 1, \dots, n$,

where a is the number of levels for factor A, b is the number of levels for factor B and n is the number of replicates. We also assume $\alpha_i \overset{iid}{\sim} N(0, \sigma_\alpha^2)$, $\beta_{j(i)} \overset{iid}{\sim} N(0, \sigma_\beta^2)$, $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma_\varepsilon^2)$, and that $\alpha_i$, $\beta_{j(i)}$ and $\varepsilon_{ijk}$ are independent.

The ANOVA table constructed from the resulting data set (with some entries deleted) is shown below. You may assume that the model is appropriate (necessary assumptions are satisfied) when answering questions related to this study. (See page 987 Ott)

Analysis of Variance for index
Source         DF   SS         MS       F
state           2   6976.8     3488.4   omitted
city(state)     6   167.6      27.9     omitted
Error          36   3893.2     108.1
Total          44   11037.6

For this design $E(MSE) = \sigma_\varepsilon^2$, $E(MSB(A)) = \sigma_\varepsilon^2 + n\sigma_\beta^2$, and $E(MSA) = \sigma_\varepsilon^2 + n\sigma_\beta^2 + nb\sigma_\alpha^2$ in usual notation.

a) [3 points] Test the null hypothesis $H_0: \sigma_\alpha^2 = 0$ against $H_a: \sigma_\alpha^2 > 0$. Use α = 0.05.
Ans n = 5, b = 3. From the expected mean squares, the test statistic is F = MSA/MSB(A) = 3488.4/27.9 = 125.0 with (2, 6) df. Since 125.0 > F(0.05; 2, 6) = 5.14, reject H0.
b) [2 points] Estimate the variance component $\sigma_\alpha^2$.
Ans (MSA − MSB(A))/(nb) = (3488.4 − 27.9)/15 = 230.7
c) [5 points] Calculate the value of the constant c such that
$P\left[\left(1 + c\frac{\sigma_\beta^2}{\sigma_\varepsilon^2}\right)F_{\alpha/2,6,36} < \frac{MSB(A)}{MSE} < \left(1 + c\frac{\sigma_\beta^2}{\sigma_\varepsilon^2}\right)F_{1-(\alpha/2),6,36}\right] = 1 - \alpha$
where $F_{p,6,36}$ denotes the value of the inverse cdf of the F distribution with 6 df in the numerator and 36 df in the denominator evaluated at p. Show your work clearly.
Sol Use
$P\left[F_{\alpha/2,6,36} < \frac{MSB(A)/(\sigma_\varepsilon^2 + n\sigma_\beta^2)}{MSE/\sigma_\varepsilon^2} < F_{1-(\alpha/2),6,36}\right] = 1 - \alpha$ with n = 5
$\Rightarrow P\left[\frac{\sigma_\varepsilon^2 + n\sigma_\beta^2}{\sigma_\varepsilon^2}F_{\alpha/2,6,36} < \frac{MSB(A)}{MSE} < \frac{\sigma_\varepsilon^2 + n\sigma_\beta^2}{\sigma_\varepsilon^2}F_{1-(\alpha/2),6,36}\right] = 1 - \alpha$
$\Rightarrow P\left[\left(1 + n\frac{\sigma_\beta^2}{\sigma_\varepsilon^2}\right)F_{\alpha/2,6,36} < \frac{MSB(A)}{MSE} < \left(1 + n\frac{\sigma_\beta^2}{\sigma_\varepsilon^2}\right)F_{1-(\alpha/2),6,36}\right] = 1 - \alpha$
i.e. c = n = 5

8) A petroleum company was interested in comparing the miles per gallon (MPG) achieved by four different gasoline blends (1, 2, 3, and 4). Because there can be considerable variability due to drivers and due to car models, these two extraneous sources of variability were included as "blocking" variables in a Latin square design. Each of the four drivers who participated in the experiment drove each of the four car models with the assigned gasoline blend in a Latin square design. Some useful SAS outputs and the SAS code that produced them are given below. Assume that the data satisfies the necessary assumptions.

options ls=75;
data a;
infile 'latin1.txt' firstobs=2;
input Row Driver Model Blend MPG;
PROC GLM data=a;
class Driver;
model MPG = Driver;
PROC GLM data=a;
class Model;
model MPG = Model;
PROC GLM data=a;
class Blend;
model MPG = Blend;
PROC GLM data=a;
class Driver Model Blend;
model MPG = Blend Driver Model;
run;
quit;

The GLM Procedure
Class Level Information
Class     Levels   Values
Driver    4        1 2 3 4
Number of Observations Read   16
Number of Observations Used   16

The GLM Procedure
Dependent Variable: MPG
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   5.8968750        1.9656250     0.03      0.9936
Error              12   869.7025000      72.4752083
Corrected Total    15   875.5993750

Source   DF   Type I SS     Mean Square   F Value   Pr > F
Driver    3   5.89687500    1.96562500    0.03      0.9936

Source   DF   Type III SS   Mean Square   F Value   Pr > F
Driver    3   5.89687500    1.96562500    0.03      0.9936

The GLM Procedure
Class Level Information
Class    Levels   Values
Model    4        1 2 3 4
Number of Observations Read   16
Number of Observations Used   16

The GLM Procedure
Dependent Variable: MPG
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   736.9118750      245.6372917   21.25     <.0001
Error              12   138.6875000      11.5572917
Corrected Total    15   875.5993750

Source   DF   Type I SS     Mean Square   F Value   Pr > F
Model     3   736.9118750   245.6372917   21.25     <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F
Model     3   736.9118750   245.6372917   21.25     <.0001

The GLM Procedure
Class Level Information
Class    Levels   Values
Blend    4        1 2 3 4
Number of Observations Read   16
Number of Observations Used   16

The GLM Procedure
Dependent Variable: MPG
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3   108.9818750      36.3272917    0.57      0.6462
Error              12   766.6175000      63.8847917
Corrected Total    15   875.5993750

Source   DF   Type I SS     Mean Square   F Value   Pr > F
Blend     3   108.9818750   36.3272917    0.57      0.6462

Source   DF   Type III SS   Mean Square   F Value   Pr > F
Blend     3   108.9818750   36.3272917    0.57      0.6462

[3 points] Which of the following numbers is closest to the appropriate F-statistic for testing the equality of means of the four blends? Circle your answer. [Note: This is a multiple choice question.
You do not have to show your work in this question and no part marks are given for showing work]
A) 7   B) 9   C) 11   D) 13   E) 0.57

Ans B) 9
Note: This is example 15.4 p865 Ott
SS_Blend = 109.0, df_Blend = 4-1 = 3, MS_Blend = 109.0/3 = 36.3
SSE = SSTot - SS_Driver - SS_Model - SS_Blend = 875.6 - 5.9 - 736.9 - 109.0 = 23.8
df_error = 15-3-3-3 = 6 and MSE = SSE/df_error = 23.8/6 = 3.97
F = MS_Blend/MSE = 36.3/3.97 = 9.14 = 9 (approx)

Here is the full output

The GLM Procedure
Dependent Variable: MPG
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               9   851.7906250      94.6434028    23.85     0.0005
Error               6   23.8087500       3.9681250
Corrected Total    15   875.5993750

R-Square   Coeff Var   Root MSE   MPG Mean
0.972809   8.950364    1.992015   22.25625

Source   DF   Type I SS     Mean Square   F Value   Pr > F
Blend     3   108.9818750   36.3272917    9.15      0.0117
Driver    3   5.8968750     1.9656250     0.50      0.6987
Model     3   736.9118750   245.6372917   61.90     <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F
Blend     3   108.9818750   36.3272917    9.15      0.0117
Driver    3   5.8968750     1.9656250     0.50      0.6987
Model     3   736.9118750   245.6372917   61.90     <.0001

9) (This is from q13.8 p798 Ott) [3 points] An experiment was conducted to compare the number of major defectives obtained along each of three production lines in which changes were being instituted. Production was monitored continuously during the period of changes, and the number of major defectives was recorded per day for each line. The data are given below:

Production Line
1     2     3
34    54    75
44    41    62
32    38    45
36    33    10
51    56    68

The data showed violations of the assumptions of the one-way ANOVA procedure for comparing the three lines, and so it was decided to use the Kruskal-Wallis test. Which of the following numbers is closest to the value of the Kruskal-Wallis statistic for comparing the three production lines?
A) 2.0   B) 2.5   C) 3.0   D) 3.5   E) 4.0

Ans B) 2.5
The formula on page 191 supp Vukov gives H = 2.66, which is exactly the same as in the MINITAB output.
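As a cross-check on the value H = 2.66, the following minimal Python sketch (not part of the exam) ranks the 15 pooled observations and applies the rank-sum form of the Kruskal-Wallis statistic. The function name `kruskal_wallis_h` is hypothetical, and the sketch assumes there are no tied observations, which holds for this data set.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H = 12/(N(N+1)) * sum(T_i^2 / n_i) - 3(N+1).

    T_i is the sum of the pooled ranks in group i.
    Assumes no ties, so every value gets a distinct integer rank.
    """
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}  # valid only without ties
    n_total = len(pooled)
    weighted = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


line1 = [34, 44, 32, 36, 51]
line2 = [54, 41, 38, 33, 56]
line3 = [75, 62, 45, 10, 68]

h = kruskal_wallis_h([line1, line2, line3])
print(round(h, 2))  # 2.66
```

The rank sums are 29, 39 and 52, which correspond to the average ranks 5.8, 7.8 and 10.4 reported by MINITAB.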
$H = \frac{12}{N(N+1)}\sum_i \frac{T_i^2}{n_i} - 3(N+1)$, where $T_i$ is the rank sum of line i, so that $T_i^2/n_i$ = $n_i$ × (average rank)^2:
H = (12/(15*16))*5*(5.8^2+7.8^2+10.4^2)-3*16 = 2.66

Kruskal-Wallis Test: defects versus Line
Kruskal-Wallis Test on defects
Line       N    Median   Ave Rank   Z
1          5    36.00    5.8        -1.35
2          5    41.00    7.8        -0.12
3          5    62.00    10.4        1.47
Overall    15            8.0
H = 2.66   DF = 2   P = 0.264

10) An area in a greenhouse has 10 benches. On each bench, there are 3 pots. Each pot contains 1 plant. Researchers want to compare the effect of three treatments (A, B, and C) on the growth of the plants. The researchers randomly assign the three treatments to the three plants on each bench so that each treatment is represented exactly once on each bench. State whether each of the following statements based on this information is true or false. [1 point for each]
i) This is a completely randomized 1-factor design. [True / False]
Ans F
ii) This is a randomized block design. [True / False]
Ans T
iii) This is a two factor factorial design. [True / False]
Ans F
iv) This is an incomplete block design. [True / False]
Ans F
v) The hypothesis of equality of treatment means can be tested using a test statistic that has an F-distribution with 2 degrees of freedom in the numerator and 18 degrees of freedom in the denominator. [True / False]
Ans T: F(3-1 = 2, 29-2-9 = 18)

11) This question is based on 15.25 p910 Ott. An experiment was set up to compare the effect of different soil pH and calcium additives on the trunk diameters of orange trees. Sulfur, gypsum and other ingredients were applied to provide pH levels of 4, 5, 6, and 7. Three levels of calcium supplement (100, 200 and 300 pounds per acre) were also applied. All factor level combinations were used in the experiment. The trees were assigned to treatments at random. At the end of a 2-year period, three diameters were measured at each factor level combination.
Some useful outputs from the analysis of this data set are given below.

ANOVA Table: diameter versus pH, Calcium
Source         DF   SS       MS        F       P
pH             A    4.4608   1.48694   21.94   0.000
Calcium        B    1.4672   0.73361   10.82   0.000
Interaction    C    3.2550   0.54250   8.00    0.000
Error          D    1.6267   0.06778

[Figure: Normal Scores Plot of Residual (Residual versus Normal Score)]
[Figure: Interaction Plot (data means) for diameter (Mean versus pH at 4.0, 5.0, 6.0, 7.0, with one line for each Calcium level 100, 200, 300)]

State whether each of the following statements based on this information is true or false. Just circle your answer. [1 point for each]
i) The trunk diameter of orange trees is the explanatory variable in this study. [True / False]
Ans F, it is the response variable.
ii) There are two factors of interest in the study. [True / False]
Ans T, pH and calcium.
iii) There are 12 treatments in this study. [True / False]
Ans T
iv) The degrees of freedom for the numerator and the denominator of the F-statistic for testing the interaction effect are 6 and 24 respectively. [True / False]
Ans T, numerator df = 2 x 3 = 6, denominator df = (3 x 12 - 1) - 3 - 2 - 6 = 24
v) Since the normality of residuals is questionable, we should try a transformation or some other remedy. [True / False]
Ans F, the normality is good.
vi) The effect of calcium on trunk diameter is about the same for all pH levels. [True / False]
Ans F, the interaction is significant.
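The degrees of freedom hidden behind A, B, C and D in the ANOVA table of question 11 can be recovered with the following minimal Python sketch (not part of the exam), which also recomputes the F ratios from the printed mean squares. The function name `two_factor_df` is hypothetical; the sketch assumes a balanced two-factor factorial with a = 4 pH levels, b = 3 calcium levels and n = 3 measurements per combination, as described in the question.

```python
def two_factor_df(a, b, n):
    """Degrees of freedom for a balanced a x b factorial with n replicates."""
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }


# pH has 4 levels, calcium has 3 levels, 3 diameters per combination
df = two_factor_df(a=4, b=3, n=3)
print(df)

# Mean squares copied from the printed ANOVA table
mean_squares = {"pH": 1.48694, "Calcium": 0.73361, "Interaction": 0.54250}
mse = 0.06778
f_ratios = {name: round(ms / mse, 2) for name, ms in mean_squares.items()}
print(f_ratios)
```

Under these assumptions the interaction F ratio is compared with an F distribution on `df["AB"]` and `df["Error"]` degrees of freedom, which is the pair stated in part iv).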