SPSS Data Analysis Workshop
Continuous Data Analysis: Parametric Approaches

1. One-Sample T Test

• This test is a statistical procedure usually performed for testing the mean value of a distribution.
• Statistical Hypotheses: 𝐻0 : 𝜇 = 𝜇0 versus 𝐻1 : 𝜇 ≠ 𝜇0
• Assumption: The sampled data come from a normal distribution. For large samples, the procedure often performs well even for non-normal populations.

Example 1: The data below are the breakdown voltages of 12 diodes:

  9.099  9.514  9.174  8.928  9.327  8.800
  9.377  8.920  8.471  9.913  9.575  8.306

(Source: Montgomery et al. (2010), p. 185.)

a. Check the normality assumption for the data.
b. Test the claim that the mean breakdown voltage is different from 9 volts.

Steps:

I. Enter the data in SPSS, with the variable “BV” taking up one column.

II. Check the normality assumption:
SPSS: Analyze ⊳ Descriptive Statistics ⊳ Explore…
  i. In the Explore dialog box, select “BV” into the Dependent List.
  ii. For the normality test, click the Plots button.
  iii. The Explore: Plots dialog box will appear on the screen.
  iv. Check Normality plots with tests and click Continue in the Explore: Plots dialog box.
  v. Then, click OK in the Explore dialog box.

Result: The normality test result will appear in the SPSS Output window. The p-values 0.200 (Kolmogorov-Smirnov test) and 0.995 (Shapiro-Wilk test) are both greater than the standard significance level 0.05. These imply that it is acceptable to assume that the Breakdown Voltage is normally distributed (or bell-shaped).

Tests of Normality

                     Kolmogorov-Smirnov(a)        Shapiro-Wilk
                     Statistic   df   Sig.        Statistic   df   Sig.
Breakdown Voltage    .091        12   .200*       .984        12   .995

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

III. Test the claim that the mean breakdown voltage is different from 9 volts.
SPSS: Analyze ⊳ Compare Means ⊳ One-Sample T Test…
  i. Select the variable “BV” to be analyzed into the Test Variable(s) box.
  ii. Enter the Test Value, which is the hypothesized mean value to be tested.
In this example the value is 9.
  iii. Click Continue, and then OK to perform the test and estimation.

One-Sample Test (Test Value = 9)

                                                                   95% Confidence Interval
                                                                   of the Difference
                     t      df   Sig. (2-tailed)   Mean Difference   Lower     Upper
Breakdown Voltage    .874   11   .401              .117000           -.17769   .41169

Result: The one-sample t-test statistic is 0.874 and its p-value is 0.401, which is greater than 0.05 (the level of significance usually used for the test). This p-value indicates that the mean breakdown voltage of the sampled population is not statistically significantly different from 9 volts. The 95% confidence interval for the difference between the population mean breakdown voltage and 9 volts is (-0.17769, 0.41169); the interval contains 0, which agrees with the test result.

2. Two Independent Samples T Test

• This test is used for comparing the means of two independent normally distributed populations.
• Statistical Hypotheses: 𝐻0 : 𝜇1 = 𝜇2 versus 𝐻1 : 𝜇1 ≠ 𝜇2
• Assumption: Both populations are normally distributed. For large samples, the procedure usually performs well even for non-normal populations.

Example 2: Two different etching solutions have been compared using two random samples of 10 wafers, one sample for each solution. The data below are the observed etch rates (in mils/sec) from the two etching solutions:

  Solution 1    Solution 2
   9.9          10.2
   9.4          10.6
   9.3          10.7
   9.6          10.4
  10.2          10.5
  10.6          10.0
  10.3          10.2
  10.0          10.7
  10.3          10.4
  10.1          10.3

(Source: Montgomery et al. (2010), p. 236.)

a. Test the normality assumption for the two samples.
b. Do the data support the claim that the mean etch rate is the same for both solutions? Check if the assumption of equal variances holds.

Steps:

I. Enter the data in SPSS, with the variable “ER” taking up one column and the variable “SolType”, which identifies whether an etch rate came from Solution 1 or 2, taking up another column. “ER” is the dependent, response or outcome variable, whereas “SolType” is the independent or factor variable.
The inserted data should look like the figure overleaf:

II. Test the normality assumption: This should be done first, since the test is designed for samples from normally distributed populations.
SPSS: Analyze ⊳ Descriptive Statistics ⊳ Explore…
  a. In the Explore dialog box, select “ER” into the Dependent List and “SolType” into the Factor List.
  b. For the normality test, click the Plots button.
  c. The Explore: Plots dialog box will appear on the screen.
  d. Check Normality plots with tests and click Continue in the Explore: Plots dialog box.
  e. Then, click OK in the Explore dialog box.

Tests of Normality

                             Kolmogorov-Smirnov(a)        Shapiro-Wilk
             Solution Type   Statistic   df   Sig.        Statistic   df   Sig.
Etch Rates   1               .134        10   .200*       .949        10   .660
             2               .107        10   .200*       .953        10   .705

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

Result: The normality test result is shown in the table above and appears in the SPSS Output window. The p-values 0.660 and 0.705 from the Shapiro-Wilk test of normality are both greater than 0.05. These imply that it is acceptable to assume that the “ER” distributions for Solution 1 and 2 are both normal (or bell-shaped).

III. Answering Q(b): Do the data support the claim that the mean etch rates are the same for both solutions? Check if the assumption of equal variances holds.
SPSS: Analyze ⊳ Compare Means ⊳ Independent-Samples T Test…
  i. Select the variable “ER” to be analyzed into the Test Variable(s) box.
  ii. Select “SolType” into the Grouping Variable box, then click Define Groups… and enter the values of the grouping variable that identify the groups to be compared. Enter 1 in the Group 1 box and 2 in the Group 2 box, since for the “SolType” variable 1 means Solution 1 and 2 means Solution 2.
  iii. Click Continue, and then OK to perform the test and estimation. The results are displayed in the SPSS Output window.

Independent Samples Test (Etch Rates)

                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                                   95% Confidence Interval
                                                                                                                   of the Difference
                              F       Sig.            t        df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   Lower    Upper
Equal variances assumed       3.487   .078            -2.828   18       .011              -.4300       .1521              -.7495   -.1105
Equal variances not assumed                           -2.828   13.952   .013              -.4300       .1521              -.7562   -.1038

Results: The statistics for the test are in the table above.
• Levene’s Test for Equality of Variances yields a p-value of 0.078, which is greater than 0.05. The difference between the two variances is therefore not statistically significant, and the assumption of equal variances is retained. In this example, we should use the statistics in the first row (Equal variances assumed).
• The p-value 0.011 for the test statistic t = -2.828 is less than 0.05, which indicates that there is a significant difference between the mean etch rates for Solution 1 and 2.
• The 95% confidence interval for the difference between the two mean etch rates (Solution 1 minus Solution 2) is (-0.7495, -0.1105). The interval does not contain 0.
Thus, we have strong evidence from the sample data to reject the claim that the mean etch rates are the same for both solutions.
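
The t statistics, degrees of freedom and two-tailed p-values reported in Examples 1 and 2 can be cross-checked outside SPSS. The sketch below is not part of the workshop material; it is a minimal illustration in Python using only the standard library. The function names (t_pdf, two_sided_p, one_sample_t, two_sample_t) are hypothetical helpers introduced here, and the p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is an assumption of this sketch rather than the method SPSS uses internally. The normality tests and Levene's test are not reproduced.

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1 divisor)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, n=2000):
    """Two-sided p-value: 1 - 2 * (area under the t density on [0, |t|]).

    The area is approximated with Simpson's rule on n (even) subintervals.
    """
    b = abs(t)
    h = b / n
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def one_sample_t(data, mu0):
    """Return (t, df, two-sided p) for H0: mu = mu0."""
    n = len(data)
    se = math.sqrt(variance(data) / n)  # standard error of the mean
    t = (mean(data) - mu0) / se
    return t, n - 1, two_sided_p(t, n - 1)


def two_sample_t(x, y):
    """Pooled-variance and Welch (unequal-variance) t tests for H0: mu1 = mu2."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)
    diff = mean(x) - mean(y)
    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch-Satterthwaite degrees of freedom
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return {
        "diff": diff,
        "t": t_pooled, "df": df_pooled, "p": two_sided_p(t_pooled, df_pooled),
        "t_welch": t_welch, "df_welch": df_welch,
        "p_welch": two_sided_p(t_welch, df_welch),
    }


# Example 1: breakdown voltages, test value 9 (SPSS table: t = .874, df = 11, Sig. = .401)
bv = [9.099, 9.514, 9.174, 8.928, 9.327, 8.800,
      9.377, 8.920, 8.471, 9.913, 9.575, 8.306]
t1, df1, p1 = one_sample_t(bv, 9)
print(f"One-sample: t = {t1:.3f}, df = {df1}, p = {p1:.3f}")

# Example 2: etch rates (SPSS table: t = -2.828; df = 18 and 13.952; Sig. = .011 and .013)
sol1 = [9.9, 9.4, 9.3, 9.6, 10.2, 10.6, 10.3, 10.0, 10.3, 10.1]
sol2 = [10.2, 10.6, 10.7, 10.4, 10.5, 10.0, 10.2, 10.7, 10.4, 10.3]
r = two_sample_t(sol1, sol2)
print(f"Pooled: t = {r['t']:.3f}, df = {r['df']}, p = {r['p']:.3f}")
print(f"Welch:  t = {r['t_welch']:.3f}, df = {r['df_welch']:.3f}, p = {r['p_welch']:.3f}")
```

Because both samples in Example 2 have the same size, the pooled and Welch standard errors coincide, which is why the two rows of the SPSS table show the same t statistic and differ only in the degrees of freedom, the p-value and the confidence limits.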