Question 2: What is the variance and standard deviation of a dataset?

The variance uses all of the data to compute a measure of the spread in the data. The variance may be computed for a sample of data or a population of data. In either case, we must compute how much each data value differs from the mean and square that difference.

Let’s compute the variance for the mileage of Toyota sedans.

| Vehicle | Miles per Gallon, $x$ |
|---|---|
| Prius | 50 |
| Camry Hybrid LE – 2.5 liter, automatic | 41 |
| Camry Hybrid XLE – 2.5 liter, automatic | 40 |
| Yaris – 1.5 liter, manual | 33 |
| Yaris – 1.5 liter, automatic | 32 |
| Corolla – 1.8 liter, manual | 30 |
| Corolla – 1.8 liter, automatic | 29 |
| Camry – 2.5 liter, automatic | 28 |
| Camry – 3.5 liter, automatic | 25 |
| Avalon – 3.5 liter, automatic | 23 |

Start by computing the mean of this population,

$$\mu = \frac{50 + 41 + 40 + 33 + 32 + 30 + 29 + 28 + 25 + 23}{10} = 33.1$$

Next we subtract the mean from each data value and square the result.

| Miles per Gallon, $x$ | $x - \mu$ | $(x - \mu)^2$ |
|---|---|---|
| 50 | 16.9 | 285.61 |
| 41 | 7.9 | 62.41 |
| 40 | 6.9 | 47.61 |
| 33 | -0.1 | 0.01 |
| 32 | -1.1 | 1.21 |
| 30 | -3.1 | 9.61 |
| 29 | -4.1 | 16.81 |
| 28 | -5.1 | 26.01 |
| 25 | -8.1 | 65.61 |
| 23 | -10.1 | 102.01 |
| | Sum = 0 | Sum = 616.9 |

The sum at the bottom is found by adding the values in the column. The second column measures how much each data value deviates from the mean. Values higher than the mean give a positive deviation and values lower than the mean give a negative deviation. Since the mean is in the center of the data, the sum of the deviations is zero.

Whether a data value falls above or below the mean should not affect the spread of the data. For this reason, each deviation is squared. The farther the data value is from the mean, the larger the squared deviation is. Values like 23 or 50 have a high squared deviation since they are farther from the mean of 33.1.

Population Variance

The population variance $\sigma^2$ (sigma squared) of data $x_i$ is the mean of the squared deviations,

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean and $N$ is the population size.

The variance measures the average squared distance of the data values from the mean. Based on the table above,

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{616.9}{10} = 61.69$$

The sum in the numerator is the sum of the entries in the third column of the table. On average, the squared distance of each data value from the mean is 61.69 mpg².

Working in terms of the squared distance is inconvenient. To remedy this, take the square root of the variance. This measure is called the population standard deviation, and it measures the spread of the data in the same units as the data.

Population Standard Deviation

The population standard deviation $\sigma$ is the square root of the population variance,

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $\mu$ is the population mean and $N$ is the population size.

For the Toyota fleet, the standard deviation is

$$\sigma = \sqrt{61.69} \approx 7.85 \text{ miles per gallon}$$

The larger the variance or standard deviation is, the more spread out the data values are about the mean.

If the data is from a sample instead of a population, the definitions for variance and standard deviation are slightly different.

Sample Variance

The sample variance $s^2$ of data $x_i$ is the sum of the squared deviations divided by one less than the sample size,

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean and $n$ is the sample size.

Sample Standard Deviation

The sample standard deviation $s$ is the square root of the sample variance,

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the sample mean and $n$ is the sample size.

The main difference between the sample and population expressions is the denominator. In the population expressions, the sum of the squared deviations from the mean is divided by the population size $N$. In the sample expressions, the sum of the squared deviations from the mean is divided by one less than the sample size, $n - 1$. Although the reason for this difference is beyond the scope of this text, using $n - 1$ instead of $n$ ensures that the variance is well behaved.
Specifically, if we were to average the sample variances from all possible samples of a population, the resulting average would equal the population variance. Despite this difference, the steps for calculating variance and standard deviation for samples or populations are very similar.

Steps for Computing the Variance and Standard Deviation

1. Identify the data values $x_i$.
2. Find the mean of the data values.
3. Compute the difference between the data and the mean for each data value.
4. Square each difference between the data and the mean.
5. Sum the squares of the differences.
6. If the data is a population, divide the sum by the number of data values $N$ to find the variance. If the data is a sample, divide the sum by one less than the sample size, $n - 1$.
7. To find the standard deviation, take the square root of the variance.

Let’s apply these steps to compute the spread in several datasets.

Example 1 Compute the Sample Variance and Sample Standard Deviation

The table below shows the dividend yields of six companies in the New York Stock Exchange energy sector.

| Company | Dividend Yield, July 2012 (%) |
|---|---|
| BP | 4.80 |
| Chevron | 3.41 |
| Exxon Mobil | 2.66 |
| PetroChina | 3.50 |
| Petroleo Brasileiro | 1.20 |
| Royal Dutch Shell | 4.30 |

a. Find the sample mean.

Solution

The data in this example are the dividend yields for each company. The sample mean is

$$\bar{x} = \frac{\sum x}{n} = \frac{4.80 + 3.41 + 2.66 + 3.50 + 1.20 + 4.30}{6} \approx 3.312$$

The mean has been rounded to three decimal places.

b. Find the sample variance.

Solution

Use a table to compute the differences from the mean and the squared differences from the mean.

| $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ |
|---|---|---|
| 4.80 | 1.488 | 2.214 |
| 3.41 | 0.098 | 0.010 |
| 2.66 | -0.652 | 0.425 |
| 3.50 | 0.188 | 0.035 |
| 1.20 | -2.112 | 4.461 |
| 4.30 | 0.988 | 0.976 |
| | | Sum = 8.121 |

Divide the sum at the bottom of the third column by 5 to give the sample variance,

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{8.121}{6 - 1} \approx 1.624$$

c. Find the sample standard deviation.
Solution

The sample standard deviation is the square root of the sample variance,

$$s = \sqrt{s^2} = \sqrt{1.624} \approx 1.27$$

In this example, the original data was written to two decimal places. To ensure that we can write the standard deviation to the same number of decimal places, we write numbers in the intermediate steps to one extra decimal place.

Example 2 Compute the Population Variance and Population Standard Deviation

Stock quotes also give the percentage change in a stock from the previous day’s closing price. For instance, the quote above indicates that Ford closed at $9.33 per share. This was down from $9.35 per share at the previous day’s close. This is a percentage change of

$$\text{Percent Change} = \frac{9.33 - 9.35}{9.35} \approx -0.21\%$$

Percentage changes are often used to determine the volatility of a company’s stock. By computing some statistics on the percentage change, we can get an idea of whether a change in the price is normal or not.

Consider the percentage changes in Ford’s price per share over ten trading days in June.

| Date | % Change |
|---|---|
| 6/1 | -4.17 |
| 6/4 | -0.79 |
| 6/5 | 1.49 |
| 6/6 | 3.73 |
| 6/7 | -0.19 |
| 6/8 | 1.04 |
| 6/11 | -1.97 |
| 6/12 | 0.48 |
| 6/13 | -1.90 |
| 6/14 | 1.07 |

a. Find the population mean.

Solution

For the purpose of this example, we’ll consider the percentage changes over the ten-day period to be a population. The mean is

$$\mu = \frac{\sum x}{N} = \frac{(-4.17) + (-0.79) + 1.49 + 3.73 + (-0.19) + 1.04 + (-1.97) + 0.48 + (-1.90) + 1.07}{10} = -0.121$$

b. Find the population variance.

Solution

Calculate the difference from the mean and the squared difference from the mean.

| $x$ | $x - \mu$ | $(x - \mu)^2$ |
|---|---|---|
| -4.17 | -4.049 | 16.394 |
| -0.79 | -0.669 | 0.448 |
| 1.49 | 1.611 | 2.595 |
| 3.73 | 3.851 | 14.830 |
| -0.19 | -0.069 | 0.005 |
| 1.04 | 1.161 | 1.348 |
| -1.97 | -1.849 | 3.419 |
| 0.48 | 0.601 | 0.361 |
| -1.90 | -1.779 | 3.165 |
| 1.07 | 1.191 | 1.418 |

The sum of the third column is 43.983. The population variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{43.983}{10} = 4.3983$$

c. Find the population standard deviation.
Solution

The standard deviation is the square root of the variance,

$$\sigma = \sqrt{\sigma^2} = \sqrt{4.3983} \approx 2.10$$

We’ll see in later chapters that stock traders assume that 68% of stock changes lie within one standard deviation of the mean. A change in price of greater than 2.10% indicates above normal strength or weakness, depending on whether the price rises or falls.
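
As a cross-check on the hand calculations in this section, the steps for computing the variance and standard deviation can be sketched in Python. This is a minimal illustration and not part of the original text: the function names `variance` and `standard_deviation` and the `population` flag are assumptions chosen for this sketch, and the data lists are copied from the tables in this section.

```python
import math

def variance(data, population=True):
    """Mean, deviations, squares, sum, then divide by N or n - 1."""
    n = len(data)
    mean = sum(data) / n
    # Squared difference between each data value and the mean.
    squared_deviations = [(x - mean) ** 2 for x in data]
    # A population divides by N; a sample divides by n - 1.
    divisor = n if population else n - 1
    return sum(squared_deviations) / divisor

def standard_deviation(data, population=True):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(variance(data, population))

# Data copied from the tables in this section.
toyota_mpg = [50, 41, 40, 33, 32, 30, 29, 28, 25, 23]
dividend_yields = [4.80, 3.41, 2.66, 3.50, 1.20, 4.30]
ford_changes = [-4.17, -0.79, 1.49, 3.73, -0.19, 1.04, -1.97, 0.48, -1.90, 1.07]

datasets = [
    ("Toyota mileage", toyota_mpg, True),        # treated as a population
    ("Dividend yields", dividend_yields, False),  # treated as a sample
    ("Ford % change", ford_changes, True),        # treated as a population
]

for label, data, is_population in datasets:
    var = variance(data, population=is_population)
    sd = standard_deviation(data, population=is_population)
    kind = "population" if is_population else "sample"
    print(f"{label} ({kind}): variance = {var:.4f}, standard deviation = {sd:.2f}")
```

Because the code carries full precision through every step, its results may differ in the last digit from a hand calculation that rounds the mean and the squared deviations along the way. Python's standard library offers the same calculations as `statistics.pvariance` and `statistics.pstdev` for populations and `statistics.variance` and `statistics.stdev` for samples.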