Question 2: What is the variance and standard deviation of a dataset?
The variance of the data uses all of the data to compute a measure of the spread in the
data. The variance may be computed for a sample of data or a population of data. In
either case, we must compute how much each data value differs from the mean and
square that difference.
Let’s compute the variance for the mileage of Toyota sedans.
Vehicle                                      Miles per Gallon, x
Prius                                        50
Camry Hybrid LE – 2.5 liter, automatic       41
Camry Hybrid XLE – 2.5 liter, automatic      40
Yaris – 1.5 liter, manual                    33
Yaris – 1.5 liter, automatic                 32
Corolla – 1.8 liter, manual                  30
Corolla – 1.8 liter, automatic               29
Camry – 2.5 liter, automatic                 28
Camry – 3.5 liter, automatic                 25
Avalon – 3.5 liter, automatic                23
Start by computing the mean of this population,
$$\mu = \frac{50 + 41 + 40 + 33 + 32 + 30 + 29 + 28 + 25 + 23}{10} = 33.1$$
Next we subtract the mean from each data value and square the result.
Miles per Gallon, x     x - μ      (x - μ)²
50                       16.9      285.61
41                        7.9       62.41
40                        6.9       47.61
33                       -0.1        0.01
32                       -1.1        1.21
30                       -3.1        9.61
29                       -4.1       16.81
28                       -5.1       26.01
25                       -8.1       65.61
23                      -10.1      102.01
Sum                       0        616.90
The sum at the bottom of each column is found by adding the values in that column. The second column
measures how much each data value deviates from the mean. Values higher than the
mean give a positive deviation, and values lower than the mean give a negative
deviation. Because the mean is the balance point of the data, the positive and negative deviations cancel, so the sum of the deviations is always zero: $\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{N} x_i - N\mu = 0$.
Whether a data value falls above or below the mean should not affect the spread of the
data. For this reason, each deviation is squared. The farther a data value is from the
mean, the larger its squared deviation. Values like 23 and 50 have large squared
deviations because they are far from the mean of 33.1.
Population Variance
The population variance σ² (sigma squared) of data x_i is
the mean of the squared deviations,
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where μ is the population mean and N is the population size.
The variance measures the average squared distance of the data values from the mean. Based on the table above,
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{616.9}{10} = 61.69$$
The sum in the numerator is the sum of the entries in the third column of the table. On
average, the squared distance of each data value from the mean is 61.69 mpg².
Working with squared distances is inconvenient because the units are squared (here, mpg²). To remedy this, take the
square root of the variance. This measure is called the population standard deviation,
and it measures the spread of the data in the same units as the data.
Population Standard Deviation
The population standard deviation σ is the square root of
the population variance,
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
where μ is the population mean and N is the population size.
For the Toyota fleet, the standard deviation is
$$\sigma = \sqrt{61.69} \approx 7.85 \text{ miles per gallon}$$
The larger the variance or standard deviation, the more spread out the data values
are about the mean.
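As a check on the Toyota calculation, here is a minimal Python sketch (standard library only) that rebuilds the three table columns with exact fractions, so the deviations sum to exactly zero, then divides by N and takes the square root. The variable names are our own, not from the text.

```python
from fractions import Fraction
import math

# Miles-per-gallon values from the Toyota table, treated as a population.
mpg = [50, 41, 40, 33, 32, 30, 29, 28, 25, 23]

N = len(mpg)
mu = Fraction(sum(mpg), N)  # exact population mean

# One row per vehicle: (x, x - mu, (x - mu)^2), kept as exact fractions.
rows = [(x, x - mu, (x - mu) ** 2) for x in mpg]

sum_dev = sum(dev for _, dev, _ in rows)  # the deviations cancel exactly
sum_sq = sum(sq for _, _, sq in rows)     # numerator of the variance

variance = sum_sq / N                   # population variance: divide by N
std_dev = math.sqrt(float(variance))    # population standard deviation, in mpg

for x, dev, sq in rows:
    print(f"{x:>3} {float(dev):>6.1f} {float(sq):>8.2f}")
print("sum of deviations:", sum_dev)
print("sum of squared deviations:", float(sum_sq))
print(f"variance = {float(variance):.2f} mpg^2, standard deviation = {std_dev:.2f} mpg")
```

Exact fractions are used only so the zero sum of deviations shows up exactly; ordinary floats would give the same variance up to tiny rounding error.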
If the data is from a sample instead of a population, the definitions for variance and
standard deviation are slightly different.
Sample Variance
The sample variance s² of data x_i is found by dividing the sum of the squared
deviations from the sample mean by one less than the sample size,
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where x̄ is the sample mean and n is the sample size.
Sample Standard Deviation
The sample standard deviation s is the square root of the
sample variance,
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
where x̄ is the sample mean and n is the sample size.
The main difference between the sample and population formulas is the
denominator. In the population expressions, the sum of the squared deviations from the
mean is divided by the population size N. In the sample expressions, the sum of the
squared deviations from the mean is divided by one less than the sample size, n − 1.
Although the reason for this difference is beyond the scope of this text, using n − 1
instead of n ensures that the sample variance is well behaved. Specifically, if we were to
average the sample variances of all possible samples of size n drawn (with replacement) from a
population, the result would equal the population variance.
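The averaging claim can be checked by brute force on a tiny example. The Python sketch below assumes sampling with replacement and uses a made-up population [2, 4, 9] with samples of size 2; the helper names `population_variance` and `sample_variance` are our own. It averages s² over every possible ordered sample and compares the result with the population variance, using exact fractions so no rounding gets in the way.

```python
from fractions import Fraction
from itertools import product

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# A small hypothetical population, chosen only for illustration.
population = [2, 4, 9]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement (3**2 = 9 samples).
samples = list(product(population, repeat=n))

avg_sample_var = sum(sample_variance(s) for s in samples) / len(samples)
avg_divide_by_n = sum(population_variance(s) for s in samples) / len(samples)

print("population variance:             ", population_variance(population))
print("average of s^2 (divide by n - 1):", avg_sample_var)
print("average when dividing by n:      ", avg_divide_by_n)
```

Dividing each sample's sum of squares by n instead gives an average that is too small, by a factor of (n − 1)/n; that shortfall is what the n − 1 corrects.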
Despite this difference, the steps for calculating the variance and standard deviation are
very similar for samples and populations.
Steps for Computing the Variance and Standard Deviation
1. Identify the data values x_i.
2. Find the mean of the data values.
3. Compute the difference between each data value and the mean.
4. Square each difference.
5. Sum the squared differences.
6. If the data is a population, divide the sum by the number
of data values N to find the variance. If the data is a
sample, divide the sum by one less than the sample size,
n − 1.
7. To find the standard deviation, take the square root of the
variance.
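The seven steps translate almost line for line into code. The Python function below is a hypothetical helper (`variance_and_std` is our name, not a library function); the `population` flag selects the divisor in step 6. The last two lines apply it to the Toyota data as a population and to the Example 1 dividend yields as a sample.

```python
import math

def variance_and_std(data, population=True):
    """Apply steps 1-7 from the list above; return (variance, standard deviation).

    population=True divides by N; population=False divides by n - 1.
    """
    values = list(data)                           # Step 1: the data values x_i
    count = len(values)
    if count == 0 or (not population and count < 2):
        raise ValueError("not enough data values")
    mean = sum(values) / count                    # Step 2: the mean
    diffs = [x - mean for x in values]            # Step 3: differences from the mean
    squares = [d * d for d in diffs]              # Step 4: square each difference
    total = sum(squares)                          # Step 5: sum the squares
    divisor = count if population else count - 1  # Step 6: N or n - 1
    variance = total / divisor
    return variance, math.sqrt(variance)          # Step 7: square root

pop_var, pop_sd = variance_and_std([50, 41, 40, 33, 32, 30, 29, 28, 25, 23], population=True)
smp_var, smp_sd = variance_and_std([4.80, 3.41, 2.66, 3.50, 1.20, 4.30], population=False)
```

The function carries full floating-point precision through every step, so its results can differ in the last digit from hand calculations that round intermediate values.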
Let’s apply these steps to compute the spread in several datasets.
Example 1
Compute the Sample Variance and Sample Standard Deviation
The table below shows the dividend yields of six companies in the New
York Stock Exchange energy sector.
Company                  Dividend Yield, July 2012 (%)
BP                       4.80
Chevron                  3.41
Exxon Mobil              2.66
PetroChina               3.50
Petroleo Brasileiro      1.20
Royal Dutch Shell        4.30
a. Find the sample mean.
Solution The data in this example are the dividend yields for each
company. The sample mean is
$$\bar{x} = \frac{\sum x}{n} = \frac{4.80 + 3.41 + 2.66 + 3.50 + 1.20 + 4.30}{6} \approx 3.312$$
The mean has been rounded to three decimal places.
b. Find the sample variance.
Solution Use a table to compute the differences from the mean and the
squared differences from the mean.
x       x - x̄      (x - x̄)²
4.80     1.488      2.214
3.41     0.098      0.010
2.66    -0.652      0.425
3.50     0.188      0.035
1.20    -2.112      4.461
4.30     0.988      0.976
Sum                 8.121
Divide the sum at the bottom of the third column by 5 to give the sample
variance,
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{8.121}{6 - 1} \approx 1.624$$
c. Find the sample standard deviation.
Solution The sample standard deviation is the square root of the sample
variance,
$$s = \sqrt{s^2} = \sqrt{1.624} \approx 1.27$$
In this example, the original data was written to two decimal places. To ensure that we
can write the standard deviation to the same number of decimal places, we write
numbers in the intermediate steps to one extra decimal place.
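Python's standard-library `statistics` module provides `mean`, `variance` (which divides by n − 1) and `stdev`, and it carries full precision internally, so it is a convenient way to confirm the hand-rounded results of Example 1. A minimal sketch:

```python
import statistics

# Dividend yields (%) from the Example 1 table, treated as a sample.
yields = [4.80, 3.41, 2.66, 3.50, 1.20, 4.30]

x_bar = statistics.mean(yields)
s2 = statistics.variance(yields)  # sample variance: divides by n - 1
s = statistics.stdev(yields)      # sample standard deviation: square root of s2

print(f"mean = {x_bar:.3f}, s^2 = {s2:.3f}, s = {s:.2f}")  # mean = 3.312, s^2 = 1.624, s = 1.27
```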
Example 2
Compute the Population Variance and Population Standard Deviation
Stock quotes also give the percentage change in a stock's price from the
previous day's closing price.
For instance, a quote for Ford showed that it closed at $9.33 per
share, down from $9.35 per share at the previous day's close.
This is a percentage change of
$$\text{Percent Change} = \frac{9.33 - 9.35}{9.35} \times 100\% \approx -0.21\%$$
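The percentage-change formula is a one-liner in code. The Python function below is a hypothetical helper (`percent_change` is our name) that applies the formula to the Ford closing prices:

```python
def percent_change(close, previous_close):
    """Percentage change from the previous day's close to today's close."""
    return (close - previous_close) / previous_close * 100

change = percent_change(9.33, 9.35)
print(f"{change:.2f}%")  # -0.21%
```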
Percentage changes are often used to gauge the volatility of a
company's stock. By computing some statistics on the percentage
changes, we can get an idea of whether a change in the price is normal or
not. Consider the percentage changes in Ford's price per share over
ten trading days in June.
Date     % Change
6/1      -4.17
6/4      -0.79
6/5       1.49
6/6       3.73
6/7      -0.19
6/8       1.04
6/11     -1.97
6/12      0.48
6/13     -1.90
6/14      1.07
a. Find the population mean.
Solution For the purpose of this example, we'll consider the percentage
changes over the ten-day period to be a population. The mean is
$$\mu = \frac{\sum x}{N} = \frac{(-4.17) + (-0.79) + 1.49 + 3.73 + (-0.19) + 1.04 + (-1.97) + 0.48 + (-1.90) + 1.07}{10} = -0.121$$
b. Find the population variance.
Solution Calculate the difference from the mean and the squared
difference from the mean.
x        x - μ      (x - μ)²
-4.17    -4.049     16.394
-0.79    -0.669      0.448
1.49      1.611      2.595
3.73      3.851     14.830
-0.19    -0.069      0.005
1.04      1.161      1.348
-1.97    -1.849      3.419
0.48      0.601      0.361
-1.90    -1.779      3.165
1.07      1.191      1.418
The sum of the squared differences is 43.983. The population variance is
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{43.983}{10} = 4.3983$$
c. Find the population standard deviation.
Solution The standard deviation is the square root of the variance,
$$\sigma = \sqrt{\sigma^2} = \sqrt{4.3983} \approx 2.10$$
We'll see in later chapters that stock traders assume that 68% of stock
changes lie within one standard deviation of the mean. Since the mean change here is close to zero,
a change in price of more than about 2.10% in either direction indicates above-normal strength or
weakness, depending on whether the price rises or falls.
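A short Python sketch, using only the standard-library `statistics` module, can confirm the Example 2 results and make the interpretation above concrete. It measures the band from μ − σ to μ + σ, so it accounts for the slightly negative mean rather than using ±2.10% around zero, and lists the days that fall outside it. The sample standard deviation `s` is computed only for contrast, since the example treats the data as a population. Counting ten days does not test the 68% assumption itself.

```python
import statistics

# Example 2 data: daily percentage changes in Ford's share price, treated as a population.
dates = ["6/1", "6/4", "6/5", "6/6", "6/7", "6/8", "6/11", "6/12", "6/13", "6/14"]
changes = [-4.17, -0.79, 1.49, 3.73, -0.19, 1.04, -1.97, 0.48, -1.90, 1.07]

mu = statistics.mean(changes)
sigma2 = statistics.pvariance(changes)  # population variance: divides by N
sigma = statistics.pstdev(changes)      # population standard deviation
s = statistics.stdev(changes)           # same data treated as a sample: divides by n - 1

# Band of changes within one standard deviation of the mean.
low, high = mu - sigma, mu + sigma
outside = [d for d, c in zip(dates, changes) if not low <= c <= high]

print(f"mu = {mu:.3f}%, sigma^2 = {sigma2:.4f}, sigma = {sigma:.2f}%, s = {s:.2f}%")
print(f"one-SD band: {low:.2f}% to {high:.2f}%")
print("days outside the band:", outside)
```

For these ten days, only the changes on 6/1 and 6/6 fall outside the band. Because `stdev` divides by n − 1, `s` comes out slightly larger than σ for the same data.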