Example #1 – Confidence Interval
A sample of 20 students were asked to fill in a survey on gender, how many hours they spent studying for the test, and the grade earned on the same test that they had studied for.

Hours: Male 14, Female 17, Female 3, Female 6, Male 17, Male 3, Male 8, Male 4, Female 20, Male 15, Female 7, Male 9, Male 0, Male 5, Female 11, Female 15, Male 18, Female 13, Male 8, Male 4

Construct a 91% confidence interval estimate of the proportion of all female students.
a) What parameter are you estimating here? π
b) State/check the necessary assumptions required to construct the confidence interval:
   np > 5? 20(8/20) = 8
   n(1 – p) > 5? 20(1 – 8/20) = 12
   The conditions are met; therefore, the sample proportion is approximately normally distributed.
c) Before you construct a 91% confidence interval estimate of the proportion of all female students, state the critical value: z = ±1.6954 (InvN, Area: 0.91, σ = 1, μ = 0)
d) The point estimate for a 91% confidence interval estimate of the proportion of all female students is: p = 8/20 = 0.40. (The sample mean of hours studied, x̄ = 9.85, estimates μ, not π.)

Notes on critical values and terms
To find Z values, use the "InvN" function and input "area" (the confidence level, usually expressed in %), std dev = 1, and mean = 0.
To find the t values, you have two options: (1) the t table (which will be provided in the quiz/test/exam) and (2) the "Invt" function, with inputs "upper area" and "df" (degrees of freedom = n – 1).
*The t-distribution is a continuous probability distribution that arises when the mean is estimated from a small sample and the population standard deviation is unknown. It is more spread out than a normal distribution; the larger the degrees of freedom, the closer it is to the normal distribution.
Margin of error: measures the uncertainty in estimating the population parameter. Margin of error = (critical value) x (standard error).
Point estimate: using the sample statistic (e.g. x̄ or p) to estimate the corresponding population parameter.

Example #2
The director of patient services of a large health maintenance organization wants to evaluate patient waiting time at a local facility. A random sample of 25 patients is selected from the appointment book. The waiting time is defined as the time from when the patient signs in to when he or she is seen by the doctor. The following data represents the waiting times (in minutes); σ is unknown:

19.5  30.5  45.6  39.8  29.6  41.3  13.8  17.4  10.7
25.4  21.8  28.6  52.0  25.4  39.0  36.6   1.9
45.9  42.5  12.1  26.1   4.9  12.7  31.1  43.1

Construct a 95% confidence interval estimate of the population average waiting time.
1. State the necessary assumption(s)/condition(s) required to construct a 95% confidence interval estimate for the population average waiting time.
   - The CLT cannot be applied because n = 25 is less than 30.
   - Assume x (waiting time) is normally distributed.
2. State the critical value needed to construct the 95% confidence interval estimate for the population average waiting time.
   t critical values = ±2.064 (Calc: Dist, t, Invt, Area: 0.025, df: 24)
3. State the point estimate and its value used to construct a 95% confidence interval estimate for the population average waiting time.
   x̄ = 27.892 (Calc: INTR, t, 1-s, List, C-level: 0.95)
4. The 95% confidence interval estimate for the population average waiting time is:
   22.1668939 < μ < 33.6171261 (Calc: INTR, t, 1-s, List, C-level: 0.95, left and right values)

Example #3
A survey is planned to determine the mean annual family medical expenses of employees of a large company. The management of the company wishes to be 95% confident that the sample mean is correct to within ±50 of the population mean annual family medical expenses. A previous study indicates that the standard deviation is approx. $400.
a) How large a sample size is necessary?
   *In this question, you are estimating μ (the mean).
   z = 1.9599, e = 50, σ = 400
   n = z²σ² / e² = (1.9599)²(400)² / (50)² = 245.8, rounded up to 246
b) If management wants to be correct to within ±25, how many employees need to be selected?

Example #4
What proportion of people hit snags with online transactions? According to a poll, 89% hit snags with online transactions.
a) To conduct a follow-up study that would provide 95% confidence that the point estimate is correct to within ±0.04 of the population proportion, how large a sample size is required?
   *In this question, you are estimating π (the proportion).
   π = 0.89, z = 1.9599, e = 0.04
   n = z²π(1 – π) / e² = (1.9599)²(0.89)(0.11) / (0.04)² = 235.0339, rounded up to 236

Hypothesis Testing - Mean (Z-Test)
A new battery has been developed to power laptop computers. It will sell in a certain price range. It is hoped that the battery can be used for more than 4.00 hours before it needs to be recharged. We will assume that the battery lives are normally distributed with a standard deviation of 0.25 hours. A random sample of 50 batteries is tested. The sample batteries lasted an average of 4.12 hours before they required recharging.

SPSS Example
The table below is a random sample of 20 companies whose stock is traded on the New York Stock Exchange. For each company, the number of shares traded on May 25, 1999, and May 26, 1999, is given.
a. At the .05 level of significance, is there evidence that the average number of shares traded on May 25 is higher than the average number of shares traded on May 26?
b. Determine the p-value in (a).
Ho: µD ≤ 0
Ha: µD > 0
Assume the distribution of the differences between the number of shares traded on the two days is at least approximately normally distributed.
Decision Rule: Reject Ho if the p-value is less than the level of significance (0.05); otherwise fail to reject Ho.
p-value = 0.4645 (Calc: TEST, t, 1-s, µ: >µ0, µ0: 0, x̄: 6090, sx: 301997.99, n: 20)
Decision: Fail to reject Ho

Hypothesis Testing - Mean (T-Test)
A new battery has been developed to power laptop computers. It will sell in a certain price range.
It is hoped that the battery can be used for more than 4.00 hours before it needs to be recharged. We will assume that the battery lives are normally distributed. A random sample of 50 batteries is tested. The sample batteries lasted an average of 4.12 hours with a standard deviation of 0.25 hours before they required recharging.

Hypothesis Testing for Z-Test – σ known
A company that makes bolts that are used on an automotive component uses two machines to make these bolts. It has been determined by past studies that the standard deviation of the bolt diameters made by machine 1 is 0.025 mm and the standard deviation of the bolt diameters of machine 2 is 0.022 mm. Both machines have a dial to set for the desired diameter. Recently they used both machines to fill a large order. The customer found that many of the bolts from a certain package were too large and made a complaint. It was determined that the package in question was made by machine 2. The manufacturer decided to take samples of the bolts from both machines to test whether the mean diameter of the bolts from machine 2 was significantly larger than the mean diameter from machine 1 when the dial was set to the same diameter on each machine. The sample of 100 bolts from machine 1 had a mean diameter of 5.023 mm and a …

Hypothesis Testing – One Population (flow chart)
Population mean µ:
  Is X normal, i.e. X is or can be assumed normal, or n ≥ 30 (CLT)?
    NO:  Use a distribution-free test or, if appropriate, assume the population is normally distributed and proceed through the flow chart.
    YES: Is σ known?
           YES: Z-Test
           NO:  T-Test
Population proportion π:
  Z-Test when the normal approximation applies; otherwise convert to the underlying binomial distribution.

Null and alternative hypotheses
- In hypothesis testing we begin with a tentative assumption about a population parameter = NULL hypothesis
- Denoted as Ho
- The null hypothesis is written in terms of the population mean, not the sample mean
- After specifying Ho, we have to specify the Alternative Hypothesis
- Denoted Ha OR H1
- The alternative hypothesis is also known as the research hypothesis
Example:
- Ho: The average age of the population is equal to 45 years old
- Ha: The average age of the population is NOT equal to 45 years old
Important note: In any situation that involves testing the validity of a claim, the null hypothesis is based on the assumption that the claim is true. The alternative hypothesis is formulated so that rejection of Ho will …

Hypothesis Testing – Proportion (Z-Test)
A random sample of 300 retail outlets indicated that 165 outlets included the GST in their prices while the others did not. Can one conclude at the 5% level of significance that more retail outlets include the GST in their prices than do not? What is the corresponding p-value?

Hypothesis Testing – Two Population Means
The two samples are either independent or dependent (paired).

Independent Samples
The samples chosen at random are not related to each other. We wish to study the mean incomes of companies X and Y. We select a random sample of 28 employees from Company X and a sample of 19 employees in Company Y. A person cannot be an employee in both companies.

Dependent Samples
Dependent samples are characterized by a measurement, then some type of intervention, followed by another measurement. Paired samples are also dependent because the same individual or item is a member of both samples.
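The one-sample proportion Z-test from the GST example above can be worked numerically. This is a minimal sketch, assuming the hypotheses are Ho: π ≤ 0.5 and Ha: π > 0.5 (the notes pose the question but do not write the hypotheses out) and using Python's standard library in place of the calculator; the function name one_proportion_z_test is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, pi0):
    """Upper-tail Z-test for one proportion (Ha: pi > pi0).

    x   = number of successes in the sample
    n   = sample size
    pi0 = hypothesized population proportion under Ho
    """
    p_hat = x / n                              # sample proportion
    std_error = sqrt(pi0 * (1 - pi0) / n)      # standard error under Ho
    z_cal = (p_hat - pi0) / std_error          # test statistic
    p_value = 1 - NormalDist().cdf(z_cal)      # area to the right of z_cal
    return z_cal, p_value


# GST example: 165 of 300 outlets include the GST in their prices.
z_cal, p_value = one_proportion_z_test(x=165, n=300, pi0=0.5)

# Upper-tail critical value at the 5% level (standard normal).
z_crit = NormalDist().inv_cdf(1 - 0.05)

reject_h0 = p_value < 0.05
print(f"Zcal = {z_cal:.4f}, critical value = {z_crit:.4f}, "
      f"p-value = {p_value:.4f}, reject Ho: {reject_h0}")
```

Under these assumed hypotheses the p-value approach and the critical value approach lead to the same decision, since Zcal falls in the upper rejection region exactly when the p-value is below the level of significance.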
Examples of dependent samples: 10 participants in a marathon were weighed prior to and after competing in the race. We wish to study the mean …

Hypothesis Testing – Two Population Means (flow chart)
Dependent/paired samples: Paired T-test
Independent samples:
  Are X1 and X2 normal?
    NO:  Advanced Statistics
    YES: Are σ1 and σ2 known?
           YES: Z-Test
           NO:  Is σ1 = σ2?
                  YES: T-Test, with "Pooled Variance"
                  NO:  T-Test, with "No Pooled Variance"

Statistical Decision
1. If you reject the Ho, you have statistical evidence that the alternative hypothesis is correct.
2. If you do not reject the Ho, then you have failed to prove the Ha. The failure to prove the alternative hypothesis does not mean that you have proven the null hypothesis.
*We can never prove that Ho is true.

Risks in Decision Making Using Hypothesis-Testing Methodology
- Since our statistical evidence is based on sample data and the corresponding sample variability, there is a risk that we may make the wrong conclusion.
Type I error and Type II error
- Type I error: Reject Ho when Ho is true.
  Prob of committing Type I error = α (level of significance)
  You control the Type I error by deciding the risk level α that you are willing to have in rejecting the null hypothesis when it is true.
  Example: Ho: µ = 4.5 and Ha: µ ≠ 4.5. A Type I error occurs if you say that µ ≠ 4.5 when µ = 4.5.
- Type II error: Did not reject Ho when Ho is false.
  Prob of committing Type II error = β
  It depends on the difference between the hypothesized and actual value of the population parameter. If the difference between the hypothesized and actual value of the population parameter is large, then β is small.

Hypothesis Testing for T-Test (equal variance)
A work team has developed a new process to assemble a certain component. They would like to know if this new process has significantly reduced the time to assemble the component. They have taken samples of 50 components produced by the existing process and of 40 components produced by the new process. The mean and standard deviation of the assembly times for the existing process were 73.2 minutes and 3.6 minutes, respectively. The mean time was 71.4 minutes with a standard deviation of 3.2 minutes for the components assembled by the new process. Assume that the times for the …

Regression Analysis
Regression Analysis (RA) is a statistical forecasting model that is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable, denoted as Y) and one or more other variables (usually known as the independent/explanatory variables, denoted as X).
• RA can predict the outcome of a given key business indicator (dependent variable) based on the interactions of other related business drivers (independent/explanatory variables)
• The relationship can be described as a function of a linear (straight-line) equation, called linear regression; with a single X it is "simple linear regression"

Dependent Variable (Notation: Y) – The variable you wish to predict
Independent Variable (Notation: X) – Variable used to make the prediction
Simple Linear Regression – A single numerical independent variable X is used to predict the numerical dependent variable Y
Multiple Regression – Use several independent variables to predict a numerical dependent variable Y
*When changes in the variable X lead to predictable change in the variable Y, then we say "X can be used to explain Y"

Regression analysis allows you to identify the type of relationship that exists between a dependent variable (Y) and an independent variable (X).
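The least-squares idea behind simple linear regression can be sketched in a few lines. The sketch below assumes the shelf-space data from the pet food example later in these notes and uses plain Python rather than the calculator's REG procedure; the function name least_squares is hypothetical.

```python
from math import sqrt


def least_squares(x, y):
    """Fit y = b0 + b1*x by least squares; also return the correlation r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)                 # variation in X
    syy = sum((yi - y_bar) ** 2 for yi in y)                 # variation in Y
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # slope
    b0 = y_bar - b1 * x_bar         # Y intercept
    r = sxy / sqrt(sxx * syy)       # sample correlation coefficient
    return b0, b1, r


# Shelf space X (feet) and weekly sales Y ($00) for the 12 stores.
shelf_space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
weekly_sales = [1.6, 2.2, 1.4, 1.9, 2.4, 2.6, 2.3, 2.7, 2.8, 2.6, 2.9, 3.1]

b0, b1, r = least_squares(shelf_space, weekly_sales)
r_squared = r ** 2

# t statistic for Ho: beta1 = 0 with n - 2 degrees of freedom.
n = len(shelf_space)
t_cal = r * sqrt(n - 2) / sqrt(1 - r_squared)

# Predicted weekly sales ($00) for a store with 8 feet of shelf space.
y_hat_8 = b0 + b1 * 8

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, r2 = {r_squared:.4f}, "
      f"tcal = {t_cal:.4f}, prediction at x = 8: {y_hat_8:.3f}")
```

The slope, intercept, r2 and tcal computed this way can be compared directly with the calculator values quoted in the worked example.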
The simplest relationship is the straight line or linear relationship.

Concepts                        Population            Sample
Y intercept                     β0                    b0
Slope                           β1                    b1
Regression (least squares) line Y = β0 + β1x + ε      Y = b0 + b1x + e
Random error (residual)         ε                     e
Forecasting/prediction line                           ŷ = b0 + b1x

Rule of Thumb
Type of relationship   X           Y            X, Y
Direct                 Increases   Increases    Move in the same direction
                       Decreases   Decreases
Inverse                Increases   Decreases    Move in the opposite direction
                       Decreases   Increases
No relationship        Increases   Cannot tell  No apparent relationship
                       Decreases   Cannot tell

ASSUMPTIONS
General Assumptions of the Simple Linear Regression Model <similar to ANOVA>, referring to the residuals:
1. Linearity - The mean of the model error terms is 0.
2. Independence - The model error terms are independent.
3. Normality - The regression model errors are normally distributed.
4. Equal variance - The model error terms have a constant variance, σε², for all values of the independent variable.

The Coefficient of Determination
r² = SSR/SST = 1 – SSE/SST, with SSE = Σ(yi – ŷi)², where yi is the actual data value and ŷi is the predicted data value.

Testing whether there is a linear relationship
Test for significance of the correlation between x and y:
- Ho: ρ = 0 (There is no linear relationship)
- Ha: ρ ≠ 0 (There is a linear relationship)
Test for significance of the regression slope coefficient:
- Ho: β1 = 0 (There is no linear relationship)
- Ha: β1 ≠ 0 (There is a linear relationship)
Note: For a simple linear regression model (one independent variable), these two are equivalent methods.

Example: Simple Linear Regression
The marketing manager of a large supermarket chain would like to determine the effect of shelf space on the sales of pet food. A random sample of 12 equal-sized stores is selected with the following results:

Store   Shelf Space, X (feet)   Weekly Sales, Y ($00)
1       5                       1.6
2       5                       2.2
3       5                       1.4
4       10                      1.9
5       10                      2.4
6       10                      2.6
7       15                      2.3
8       15                      2.7
9       15                      2.8
10      20                      2.6
11      20                      2.9
12      20                      3.1

**First determine which are the independent and dependent variables.
a. Assuming a linear relationship, use the least-squares method to find the regression coefficients b0 and b1, where b0 = intercept and b1 = slope (stat, F3, TEST, T, REG, ≠, List # x, List # y, Freq-1).
   b1 = 0.0740
   b0 = 1.450
b. Interpret the meaning of the slope b1 in this problem.
   Positive relationship - For every increase of one foot in shelf space, there is an expected increase of 0.074 hundreds of dollars ($7.40) in weekly sales.
c. Use the regression model developed in (a) to predict the average weekly sales (in hundreds of dollars) of pet food for stores with 8 feet of shelf space for pet food.
   (x = 8): ŷ = 1.45 + 0.074(8) = 2.042; times 100 because sales is in ($00) = $204.20
   *Predict the average weekly sales (in hundreds of dollars) of pet food for store 12, with 20 feet of shelf space for pet food. What is the residual error?
   (x = 20): ŷ = 1.45 + 0.074(20) = 2.93; times 100 = $293
   Residual error = e = y – ŷ = 3.1 – 2.93 = 0.17
d. Compute the coefficient of determination r², and interpret its meaning in this problem.
   NOTE: r² is called the coefficient of determination.
   MEANING: The percentage of variation in the dependent variable explained by its relationship to the independent variables in the regression model.
   r² = 0.6839, so about 68.39% of the variation in y (weekly sales) is explained by x (shelf space).
e. Compute the coefficient of correlation r.
   NOTE: r is the sample correlation coefficient (also called the Pearson coefficient of correlation). It is used to measure the strength of association between two variables.
   r = 0.82700 (positive relationship)
f. At the 0.05 level of significance, is there evidence of a linear relationship between shelf space and sales?
   T-Test
   Ho: β1 = 0
   Ha: β1 ≠ 0
   tcal and the p-value are the test statistics used to make the statistical decision: 1. Reject Ho or 2. Don't reject Ho
   tcal = 4.6517273
   p-value = 0.00090566 < α = 0.05
   Reject Ho; there is evidence of a linear relationship between x and y.

Analyzing a Multiple Regression Model
Step 1: Collect sample data: the values of Y, X1, X2, X3, …, Xk.
Step 2: Hypothesize the form of the model. This includes choosing which independent variables to include in the model.
        Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε (Linear)
Step 3: Use the method of least squares to estimate the unknown parameters β0, β1, β2, …, βk.
Step 4: Specify the probability distribution of the random error component ε and estimate its variance σ². *σ² = variance of the random error ε
Step 5: Statistically evaluate the utility (or usefulness) of the model.
Step 6: Check that the assumptions on ε are satisfied and make model modifications, if necessary.
Step 7: Finally, if the model is deemed adequate, use the fitted model to estimate the mean value of y or to predict a particular value of y for given values of the independent variables, and to make other inferences.

ASSUMPTIONS About the Random Error ε
Referring to the probability distribution of the random error component ε and the estimate of its variance σ²:
1. Linearity - The mean of the model error terms is 0 (E(ε) = 0).
2. Independence - The model error terms are independent.
3. Normality - The regression model errors are normally distributed.
4. Equal variance - The model error terms have a constant variance, σ², for all combinations of values of the independent variables.

Estimator of σ² for multiple regression with k independent variables
s² = SSE / (n – (k + 1))

Caution: A rejection of the null hypothesis Ho: β1 = β2 = … = βk = 0 in the global F-test leads to the conclusion that the model is statistically useful.
- However, statistically "useful" does not necessarily mean "best".
- Another model may prove even more useful in terms of providing more reliable estimates and predictions.
- The global F-test is usually regarded as a test the model must pass to merit further consideration.

Example: Multiple Regression
The following are data on horsepower (x1), time from zero to 60 miles per hour (x2), top speed (x3), miles per gallon (x4), and price (y) in thousands of dollars for 10 sports cars (Road & Track, October 1994).

Car              X1    X2    X3    X4     Y
BMW M3           240   6.0   120   24.6   38.4
Corvette         300   5.7   170   16.8   41.4
Dodge Viper      400   4.8   160   14.0   54.8
Ford Mustang     240   6.9   140   18.0   25.8
Honda Prelude    190   7.1   139   24.0   25.6
Mitsubishi GT    320   5.7   159   16.3   43.7
Toyota Supra     320   5.3   155   18.8   48.2
Nissan 300ZX     300   6.0   155   18.7   0.8
Alfa Romeo       320   7.6   150   17.5   38.1
Mazda RX 7       255   5.5   158   17.0   35.0

a. Develop an estimated regression equation with horsepower, time from zero to 60 miles per hour, top speed, and miles per gallon as the four independent variables to predict …

Hypothesis Testing for Dependent Samples
The process improvement team selects 12 cars at random and uses both procedures on each car. There are two procedures: A and B. We record the time (in mins) each procedure takes for an oil and filter change. The results are shown in the table below. At the 1% level of significance, can we conclude that there is a difference in the average time for an oil change and filter change? The times are normally distributed.

Time (mins) for an oil and filter change
Automobile   Procedure A   Procedure B   D = Time A – Time B
1            28.2          25.4           2.8
2            27.1          27.0           0.1
3            26.4          25.5           0.9
4            27.3          27.1           0.2
5            24.8          26.5          -1.7
6            23.4          27.4          -4.0
7            26.8          26.2           0.6
8            27.2          26.8           0.4
9            25.5          28.9          -3.4
10           25.8          26.1          -0.3
11           26.0          24.7           1.3
12           25.4          26.6          -1.2

Hypothesis Two Sample Testing – Proportion
A human resources director decided to investigate employee perception of the fairness of two performance evaluation methods. To test for the differences between the two methods, 160 employees were randomly assigned to be evaluated by one of the methods: 78 were assigned to method 1, where individuals provide feedback to supervisory queries as part of the evaluation process; 82 were assigned to method 2, where individuals provided self-assessments of their work performances. Following the evaluations, employees were asked whether they considered the performance evaluation fair or unfair. Of the 78 employees in method 1, there were 63 fair ratings. Of the 82 employees in method 2, there were 49 fair ratings. Using a 0.05 level of significance, is there evidence of a significant difference between the two methods in the proportion of fair ratings?
a) What type of parameter is being tested here? π
b) Define the variable or parameter associated with this test:
   π1 = population proportion of fair ratings for method 1
   π2 = population proportion of fair ratings for method 2
c) State the hypotheses:
   Null Hypothesis: Ho: π1 = π2
   Alternative Hypothesis: Ha: π1 ≠ π2
d) State the condition(s) required to be true for this procedure to be legitimate: Apply CLT; x1 and x2 are normally distributed.
e) What is the calculator procedure that you used for doing this test? z, 2-p, Var, ≠
f) What is the critical value for this test? z = ±1.9599 (Calc: DIST, Norm, InvN, Tail: Central, Area: 1-0.05, σ = 1, µ = 0)
g) What is/are the rejection region(s)? Zcal < -1.9599 or Zcal > 1.9599
h) What is the value of the test statistic? Zcal = 2.8991
i) What is the p-value for the test? p-value = 0.0037413 (Calc: TEST, z, 2-p, p1: ≠p2, x1: 63, n1: 78, x2: 49, n2: 82)
j) What was the statistical decision made and why? Reject Ho because the p-value is less than the level of significance (0.05).
k) Using specific references to the appropriate population parameters, state the test's conclusion: There is sufficient sample evidence to indicate there is a significant difference between the two methods in the proportion of fair ratings (π1 ≠ π2).

Hypothesis Testing for T-Test (unequal variance)
We wish to determine if there is a difference in the braking distances for two types of tires. Use the 5% level of significance and assume that the braking distances for each type of tire are normally distributed with unequal variances. Based on the data for the samples of tires shown, at the 5% level of significance, should we conclude that there is a difference in the mean braking distance?

Braking Distance (meters)
Tire (A)   Tire (B)
83         75
79         84
82         76
84         83
80         85
81         78
83

Hypothesis Two Sample Testing – Variances
A carpet manufacturer is studying differences between two of its major outlet stores. The company is particularly interested in the time it takes customers to receive carpeting that was ordered from the plant. Data concerning a sample of delivery times for the most popular type of carpet are summarized as follows. At the 0.01 level of significance, is there evidence of a difference in the variances of the shipping time between the two outlets?
a) What type of parameter is being tested here? σ²
b) Define the variable or parameter associated with this test:
   σ1² = population variance in the shipping times for store A
   σ2² = population variance in the shipping times for store B
c) State the hypotheses:
   Null Hypothesis: Ho: σ1² = σ2²
   Alternative Hypothesis: Ha: σ1² ≠ σ2²
d) State the condition(s) required to be true for this procedure to be legitimate: Assume that the populations are normally distributed.
e) What is the calculator procedure that you used for doing this test? F, Var
f) What is the critical value for this test?
   To get Fu go to InvF, Area: 0.01/2, n: df = 40, d: df = 30
   Fu = F(0.01/2, 40, 30) = 2.52
   FL = 1 / F(0.01/2, 30, 40) = 0.416 (the reciprocal of the upper critical value with the degrees of freedom swapped)
g) What is/are the rejection region(s)?
Fcal > 2.52 or Fcal < 0.416
h) What is the value of the test statistic? Fcal = 0.5993756
i) What is the p-value for the test? p-value = 0.129946737 (Calc: TEST, f, Variable, σ1: ≠σ2, sx1: 2.4, n1: 41, sx2: 3.1, n2: 31)
j) What was the statistical decision made and why? Fail to reject Ho because the p-value is greater than the level of significance (0.01).
k) Using specific references to the appropriate population parameters, state the test's conclusion: There is not enough evidence to conclude that the population variances in the shipping times of the two major outlet stores are different.
*Note: assuming the underlying normality in the 2 populations is met, based on results above, it is appropriate …

Delivery-time summary data:
          x̄           S          n
Store A   34.3 days   2.4 days   41
Store B   43.7 days   3.1 days   31

Confidence Interval Estimate (flow chart)
What population parameter are you estimating?
  Mean: Is X normal, i.e. X is or can be assumed normal, or n ≥ 30 (CLT)? If not, a non-standard procedure is required. If so, is σ known? Each answer leads to its own limits.
  Proportion: Is p normal, i.e. np ≥ 5 and n(1-p) ≥ 5? If so, compute the limits.
  A separate branch covers finding the sample size.

Chi Square Testing
Chi Square χ² proportion test: qualitative data, "counting" of an attribute.

Example #3
A survey is taken in three different locations in Nassau County in New York to determine whether there is a relationship between architectural style of houses and geographic location. The results for a sample of 233 houses are as follows:

Style            East Meadow   Farmingdale   Levittown   Total
Cape             31            14            52          97
Expanded ranch   2             1             12          15
Colonial         6             8             9           23
Ranch            16            20            24          60
Split-level      19            17            2           38
Total            74            60            99          233

At the 0.05 level of significance, is there evidence of a relationship between architectural style and geographic location?
a) What test procedure was used for doing this test? Chi Square
b) State the hypothesis using statistical symbols:
   Null Hypothesis: Ho: style is not related to location (Ho: π1 = π2 = π3)
   Alternative Hypothesis: Ha: style is related to location (Ha: at least one πj is different / not all πj are equal)
c) State the condition(s) and assumption(s) required for this procedure to be legitimate: All fe > 5 for all cells. The expected frequencies (row total x column total / 233) are:

   Style            East Meadow   Farmingdale   Levittown
   Cape             30.807        24.979        41.215
   Expanded ranch   4.764         3.863         6.373
   Colonial         7.305         5.923         9.773
   Ranch            19.056        15.451        25.494
   Split-level      12.069        9.785         16.146

   Note that two expected frequencies in the Expanded ranch row (4.764 and 3.863) fall below 5.
d) Draw a graph of the most appropriate distribution clearly showing the value(s) for the critical value(s) and the rejection region(s).
   How to obtain df? df = (5-1)(3-1) = 4 x 2 = 8

One-Way ANOVA (ANalysis Of VAriance)
Oneway ANOVA – may not be given on test
- Compare means of more than two groups
- One-way ANOVA deals with one factor of interest (e.g. performance, salary, etc.)
- Analyzing the variation "within groups" and "between groups"
- c groups represent populations whose values are randomly and independently selected, follow a normal distribution, and have equal variances
  o For equal variances, use Levene's test (SPSS output)
p-value (p-value approach)
- The p-value allows you to make direct conclusions
*When you reject Ho in a global test, it means that the means are different.

Tukey-Kramer Procedure
Post Hoc Tests – will be given on test
• To determine which means differ, you can use the Tukey-Kramer procedure.
• SPSS output

Example #1
A snack foods company that supplies stores in a metropolitan area with "healthy" snack products was interested in improving the shelf life of its tortilla chips product. Six batches (each batch containing one pound) of the product were made under each of four different formulations. The batches were then kept under the same conditions of storage. Product condition was checked each day for freshness. The shelf life in days until the product was deemed to be lacking in freshness was as follows: At the .05 level of significance, completely analyze the data to determine whether there is evidence of a difference in the average shelf life among the formulations. If appropriate, determine which groups differ in average shelf life.

Calculator Input:
List 1: A, B, C, D
List 2: 111111, 222222, etc
Factor A: List 2
Dependent: List 1

MEAN
x̄a = 95.33
x̄b = 84.833
x̄c = 75.33
x̄d = 81.833

Define the parameters being tested:
μa = population average shelf life for formulation A
μb = population average shelf life for formulation B
μc = population average shelf life for formulation C
μd = population average shelf life for formulation D

State the hypothesis:
Null Hypothesis: Ho: μa = μb = μc = μd
Alternative Hypothesis: Ha: at least one μj is different

What is the test(s) procedure for this set of hypotheses?
a. Z test
b. χ² test
c. T test
d. F test
e. A & B

What condition(s) are required to be true for this procedure to be legitimate?
- Randomness and independence
- Normality (visual: box-and-whisker plot)
- Homogeneity of variance
  Ho: σa² = σb² = σc² = σd²
  Ha: not all the σj² are equal

Test of Homogeneity of Variances (SPSS output), Life (days)
Levene Statistic   df1   df2   Sig.
.528               3     20    .668

What standardized test statistic is being used by this test?
F = 23.274 (given in the "Oneway …
This is an upper tail test (always an upper tail test).
To find Fu: Dist, F, InvF, Area: 0.05, n: df: 3, d: df: 20
Fu = 3.098
What is the p-value for the test? 0.000
Reject Ho since the p-value is less than the level of significance.
p-value = 0.000, Level of Significance = 0.05

If appropriate, determine which method differs in average shelf life. Use 0.05 level of significance.
Since Ho is rejected, it is appropriate to use the Tukey procedure.

Post Hoc Tests: Multiple Comparisons (SPSS output)
Dependent Variable: Life
Tukey HSD
(I) Method   (J) Method   Mean Difference (I – J)   Std. Error   Sig.   Lower Bound   Upper Bound
A            B             10.50*                   2.44         .002     3.66         17.34
A            C             20.00*                   2.44         .000    13.16         26.84
A            D             13.50*                   2.44         .000     6.66         20.34
B            A            -10.50*                   2.44         .002   -17.34         -3.66
B            C              9.50*                   2.44         .005     2.66         16.34
B            D              3.00                    2.44         .617    -3.84          9.84
C            A            -20.00*                   2.44         .000   -26.84        -13.16
C            B             -9.50*                   2.44         .005   -16.34         -2.66
C            D             -6.50                    2.44         .066   -13.34           .34
D            A            -13.50*                   2.44         .000   -20.34         -6.66
D            B             -3.00                    2.44         .617    -9.84          3.84
D            C              6.50                    2.44         .066     -.34         13.34
* The mean difference is significant at the 0.05 level.

Using the Tukey output (at the .05 level of significance), formulation A has a longer shelf life than formulations B, C, and D. Formulation B has a longer shelf life than formulation C.
At the 5% level of significance formulation A appears to have a longer shelf life than formulations
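Reading the Tukey output can be sketched as a simple rule: a pair of formulations differs when its Sig. value is below the level of significance, which goes together with a 95% interval that excludes 0. The sketch below hard-codes the six unique pairs from the Multiple Comparisons output above; the names tukey_rows and significant_pairs are hypothetical.

```python
ALPHA = 0.05

# (I, J, mean difference I - J, Sig., lower bound, upper bound)
# Values copied from the Tukey HSD output; each pair is listed once.
tukey_rows = [
    ("A", "B", 10.50, 0.002, 3.66, 17.34),
    ("A", "C", 20.00, 0.000, 13.16, 26.84),
    ("A", "D", 13.50, 0.000, 6.66, 20.34),
    ("B", "C", 9.50, 0.005, 2.66, 16.34),
    ("B", "D", 3.00, 0.617, -3.84, 9.84),
    ("C", "D", -6.50, 0.066, -13.34, 0.34),
]


def significant_pairs(rows, alpha=ALPHA):
    """Return (longer, shorter) shelf-life pairs whose means differ."""
    pairs = []
    for i, j, diff, sig, lower, upper in rows:
        if sig < alpha:
            # A positive difference means formulation I lasts longer than J.
            longer, shorter = (i, j) if diff > 0 else (j, i)
            pairs.append((longer, shorter))
    return pairs


pairs = significant_pairs(tukey_rows)

# Cross-check: Sig. < alpha exactly when the interval excludes 0.
intervals_agree = all(
    (sig < ALPHA) == (lower > 0 or upper < 0)
    for _, _, _, sig, lower, upper in tukey_rows
)

for longer, shorter in pairs:
    print(f"Formulation {longer} has a longer shelf life than formulation {shorter}")
```

Pairs that are not returned (here B versus D and C versus D) show no significant difference at the 0.05 level, which is not the same as showing that their means are equal.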