
SUMMER
KNOWHOW
STUDY AND LEARNING CENTRE
An introduction to
STATISTICS
Contents
Data
Summation Notation
Measures of Spread
Introductory Probability
Sample Spaces
Conditional Probability
Binomial Distribution
Normal Distribution
Standard Normal Distribution
Probability and Normal Distribution
Sampling Distributions
Confidence Intervals
Hypothesis Testing
DATA
Definitions:
Population: the total group of individuals or items.
Sample: a group of individuals or items chosen from the population.
Data: the information collected from the sample or population.
Statistic: a number calculated from the sample data.
Parameter: a number calculated from the population data.
Types of data:
Data may be either qualitative (categorical) or quantitative (numerical).

Qualitative Data (classified or labelled).
Data is put into non-numerical categories. Blood type, religion and cause of death are all examples
of qualitative data.

Quantitative Data (counted or measured).
There are two types of quantitative data.
o Discrete Data: data is put into categories depending on its counted number; for
example, the number of children in a family.
o Continuous Data: data is put into categories depending on its measured size; for
example, height.
Graphical Representation
Qualitative/Categorical data is often represented by means of a bar chart or a pie chart.
Example 1
The table shows the percentage of imports from various countries. This data can be represented on a
pie chart so that comparisons are easier:
Country        Imports (%)
USA            25
Japan          20
Germany        10
UK             7
China          6
New Zealand    4
Italy          3
Other          25
[Pie chart: percentage of imports by country]
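To draw the pie chart by hand, each percentage is turned into a sector angle; since a full circle is 360°, 1% of the data corresponds to 3.6°. The following is a minimal Python sketch of that conversion for the Example 1 data; the names `imports` and `sector_angles` are illustrative only.

```python
# Percentage of imports by country (Example 1 data)
imports = {
    "USA": 25, "Japan": 20, "Germany": 10, "UK": 7,
    "China": 6, "New Zealand": 4, "Italy": 3, "Other": 25,
}


def sector_angles(percentages):
    """Convert each category's share into a pie-chart sector angle in degrees."""
    total = sum(percentages.values())  # 100 for this table
    return {name: 360 * value / total for name, value in percentages.items()}


angles = sector_angles(imports)
for country, angle in angles.items():
    print(f"{country:<12} {angle:5.1f} degrees")
```

USA and Other each come out at 90°, a quarter of the circle, as their 25% shares suggest.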
Quantitative/Numerical Data is often represented by means of a frequency bar chart called a
histogram.
Example 2
A group of school students were surveyed to find the number of children in their families. This data
can be represented using a histogram.
No. of children    Frequency
1                  13
2                  21
3                  11
4                  4
5                  3
6                  1
7                  1
Total              54
[Histogram: No. of Children in a Family (frequency against number of children)]
Exercises
1. Label each of the following as either a categorical or numerical variable. For the numerical variables
label each as either discrete or continuous.
(a) Hair colour
(b) A person’s religion
(c) A person’s height
(d) Number of children in a family
(e) The weights of babies born on a particular day
(f) The number of crimes committed in Victoria each week
(g) The distance travelled to work by the employees of a large company
(h) The make of car driven by students at RMIT
2. Represent the data in Example 1 in a bar graph.
Answers
1. (a) Categorical
(b) Categorical
(c) Numerical – continuous
(d) Numerical – discrete
(e) Numerical – continuous
(f) Numerical – discrete
(g) Numerical – continuous
(h) Categorical
2. [Bar graph: Percentage Imports by Country, with one bar each for China, Germany, Italy, Japan, NZ, UK, USA and Other]
SUMMATION NOTATION
Summation notation or sigma notation is a shorthand method of writing the sum or addition of a string
of similar terms. A typical element of the sequence which is being summed appears to the right of the
summation sign.
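$\sum_{i=1}^{5} 2i$
• The sigma symbol $\sum$ indicates a sum of terms.
• $2i$ shows what each term looks like: the value of i changes with each term, while the 2 remains constant.
• $i = 1$ (below the sigma) is the first value of i.
• 5 (above the sigma) is the last value of i.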
To expand, we replace i by its starting value (below the sigma symbol) and obtain each successive term
by adding 1 to the previous value of i, until we reach the final value of i (above the sigma symbol).
For the above sequence:
$\sum_{i=1}^{5} 2i = 2\times1 + 2\times2 + 2\times3 + 2\times4 + 2\times5 = 30$
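The expansion rule (start at the lower limit, add 1 each time, stop at the upper limit) can be mirrored in a few lines of Python. This is a sketch: the helper `expand_sum` is a hypothetical name, and the typical term is passed in as a function of i.

```python
def expand_sum(term, first, last):
    """Expand sum_{i=first}^{last} term(i): return the individual terms and their total."""
    terms = [term(i) for i in range(first, last + 1)]  # i goes up by 1 each time
    return terms, sum(terms)


terms, total = expand_sum(lambda i: 2 * i, 1, 5)
print(terms, total)  # [2, 4, 6, 8, 10] 30

terms, total = expand_sum(lambda i: i**2 - 3, 0, 3)
print(terms, total)  # [-3, -2, 1, 6] 2
```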
Examples:
1. Expand and evaluate $\sum_{i=0}^{3} (i^2 - 3)$
$\sum_{i=0}^{3} (i^2 - 3) = (0^2 - 3) + (1^2 - 3) + (2^2 - 3) + (3^2 - 3)$
$= (-3) + (-2) + 1 + 6$
$= 2$
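2. Given the set of data $x_1 = 1$, $x_2 = 2$, $x_3 = 4$, $x_4 = 5$, evaluate
(a) $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
(b) $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

(a) $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4}{n} = \frac{1 + 2 + 4 + 5}{4} = 3$
(b) $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2}{4 - 1}$
$= \frac{(1 - 3)^2 + (2 - 3)^2 + (4 - 3)^2 + (5 - 3)^2}{4 - 1} = \frac{4 + 1 + 1 + 4}{3} = \frac{10}{3}$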
NB: If n is not specified, it is assumed to be the number of scores or values.
$\sum x$ (written without limits) means the sum of all the scores.
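The two formulas can be checked in Python. The sketch below codes them directly (the function names are illustrative) and compares the variance with the standard library's `statistics.variance`, which also divides by n - 1.

```python
import statistics


def mean(xs):
    """x-bar = (sum of x_i) / n"""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """s^2 = sum((x_i - x-bar)^2) / (n - 1)"""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


data = [1, 2, 4, 5]
print(mean(data))             # 3.0
print(sample_variance(data))  # 3.3333333333333335, i.e. 10/3
# The standard library gives the same sample variance (dividing by n - 1):
print(statistics.variance(data))
```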
Exercise
1. Find (a) $\sum_{i=1}^{3} (5i - 2)$ (b) $\left(\sum_{i=1}^{3} 5i\right) - 2$
2. Given $x_1 = -2$, $x_2 = 0$, $x_3 = 1$, $x_4 = 3$, $x_5 = 3$, find
(a) $\sum_{i=1}^{5} 10x_i$ (b) $10\sum_{i=1}^{5} x_i$ (c) $\sum_{i=1}^{5} x_i^2$
(d) $\left(\sum_{i=1}^{5} x_i\right)^2$ (e) $\sum_{i=1}^{5} i\,x_i$ (f) $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Answers: 1. (a) 24 (b) 28
2. (a) 50 (b) 50 (c) 23 (d) 25 (e) 28 (f) 1
MEAN, MODE & MEDIAN
The mean, mode and median are measures of the centre or middle of a set of data. They are
sometimes called measures of central tendency and they provide a single value that is typical of the
data.
The mode is the value that occurs most often.
The median is the middle value when the data is arranged in order.
The mean (or average) is the sum of all the scores divided by the number of scores in the data set:
$\bar{x} = \frac{\sum x}{n}$
Examples
1. Consider the data set 3, 2, 0, 5, 2.
The mode is 2 because it has the highest frequency.
Rearranging the data in order gives 0, 2, 2, 3, 5, so the median (the middle score) is 2.
The mean is $\bar{x} = \frac{\sum x}{n} = \frac{3 + 2 + 0 + 5 + 2}{5} = \frac{12}{5} = 2.4$
2. Find the mean, mode and median of the data displayed in the frequency table.
x            -3   -1    4    5   24
frequency     1    3    1    2    1
n = ∑f = 8   [n, the number of scores, is the sum of all the frequencies]
The mode is -1 [this score occurs most often: it has the highest frequency].
There are 8 scores and so two 'middle' scores, the 4th and 5th. The median is the average of
these two scores: median = (-1 + 4)/2 = 1.5
The mean is $\bar{x} = \frac{(-3)\times 1 + (-1)\times 3 + 4\times 1 + 5\times 2 + 24\times 1}{8} = \frac{32}{8} = 4$
[NB: A disadvantage of the mean is that it is affected or distorted by extreme or outlying values.]
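Python's standard `statistics` module has `mean`, `median` and `mode` functions that can be used to check both examples. The sketch assumes a frequency table is stored as a dictionary of score: frequency pairs; `expand` is a hypothetical helper that lists each score as many times as it occurs.

```python
from statistics import mean, median, mode


def expand(freq_table):
    """Turn a {score: frequency} table into the full list of scores."""
    scores = []
    for score, freq in freq_table.items():
        scores.extend([score] * freq)
    return scores


# Example 1: raw data
data1 = [3, 2, 0, 5, 2]
print(mean(data1), median(data1), mode(data1))  # 2.4 2 2

# Example 2: frequency table
data2 = expand({-3: 1, -1: 3, 4: 1, 5: 2, 24: 1})
print(len(data2), mean(data2), median(data2), mode(data2))  # n = 8, mean 4, median 1.5, mode -1
```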
Graphs and the mode, median and mean
For symmetrical, bell-shaped graphs such as this one, the mode, median and mean all have the same value, 100.
[Graph: symmetrical bell-shaped curve centred at 100]
The data set 1, 1, 2 ….86, 94, 96 is shown in the stem and leaf plot below.
[Stem-and-leaf plot]
A scan of the data organised into the plot reveals that the mode is 35.
There are 21 scores less than or equal to 38 and 20 scores greater than or equal to 50. There are also
7 scores in the forties, so altogether there are 48 scores. The median will be midway between the 24th
and 25th scores, which are easy to locate once we know the 21st score is 38.
The median is 47.5.
Exercise
1. Given the following scores: 12, 12, 13, 14, 14, 15, 15, 15, 16.
(a) Find the mean score
(b) Find the median
(c) What is the mode?
2. Determine the mean, mode and median for the data in the frequency table.
Score        40   50   60   70   80   90   Total
Frequency     1    4    8    3    3    1      20
3. Find the mode and median for the data displayed in the stem and leaf plot for which the smallest
score is 10 and the largest 69.
[Stem-and-leaf plot]
Answers
1. (a) 14 (b) 14 (c) 15
2. mean 63, mode 60, median 60
3. mode 41, median 31
MEASURES OF SPREAD
Measuring spread or dispersion in data:
Consider the two sets of values below:
Set A: 4, 4, 5, 5, 5, 6, 6.
Set B: 1, 3, 4, 5, 6, 7, 9.
Both sets A and B have mean = median = 5, but the data sets are quite different. The values
in Set A are less spread out than those in Set B.
Range
To compare data sets it is also useful to look at the measure of spread. The most basic measure of
spread is the range, the distance from the smallest to the largest value.
Range = Largest value - Smallest value

Set A: Range = 6 - 4 = 2
Set B: Range = 9 - 1 = 8
We can see that Set B has greater spread than Set A.
But the problem with the range is that it uses only two of the values in the data set. One of these may
be an odd or unusual value called an outlier.
Consider the two sets of values below:
Set Y: 1, 1, 2, 2, 2, 2, 2, 100.
Set Z: 1, 18, 23, 41, 59, 63, 87, 100.
The range for both is 99 because Set Y has one unusual value.
Interquartile Range
The Interquartile Range (IQR) is the distance between the first quartile Q1 and the third
quartile Q3.
IQR = Q3 - Q1
[Diagram: ordered data running from the lowest value to the highest value, divided into quarters by Q1, Q2 (the median) and Q3]
The first and third quartiles are values that are ¼ and ¾ of the way through the ordered data.
Q1 is the median of the lower half of the data and Q3 is the median of the upper half of the data.
(NB: there are other ways to find Q1 and Q3, so check with your program.)

Set Y: 1 1 2 2 | 2 2 2 100
Q1 = (1 + 2)/2 = 1.5    Q3 = (2 + 2)/2 = 2
IQR = Q3 - Q1 = 2 - 1.5 = 0.5

Set Z: 1 18 23 41 | 59 63 87 100
Q1 = (18 + 23)/2 = 20.5    Q3 = (63 + 87)/2 = 75
IQR = Q3 - Q1 = 75 - 20.5 = 54.5
So for data sets such as Y and Z, the median together with the IQR summarises the data better than the range does.
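The "median of each half" method described above can be sketched in Python as follows. When n is odd, this sketch leaves the middle value out of both halves; that is only one of several conventions (as the NB warns), so results may differ slightly from your program. The function names are illustrative.

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data.

    Assumption: for odd n the middle value is excluded from both halves.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)


def iqr(data):
    q1, q3 = quartiles(data)
    return q3 - q1


set_y = [1, 1, 2, 2, 2, 2, 2, 100]
set_z = [1, 18, 23, 41, 59, 63, 87, 100]
for name, s in [("Y", set_y), ("Z", set_z)]:
    print(name, "range =", max(s) - min(s), "IQR =", iqr(s))
```

The printed values should match the worked answers: a range of 99 for both sets, but an IQR of 0.5 for Y and 54.5 for Z.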
Standard Deviation
A measure of dispersion or spread in a data set that takes into account all of the data is the standard
deviation. It gives an indication of the typical or average distance of each score from the mean for the
data.
The standard deviation can be calculated using the formula $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, but it is much more
convenient to use your calculator or the computer.
Some statistical tests make use of the variance, which is the square of the standard deviation:
Variance = $s^2$
Set A: 4 4 5 5 5 6 6    $\bar{x} = 5$, s = 0.82
Set B: 1 3 4 5 6 7 9    $\bar{x} = 5$, s = 2.65
We can interpret the standard deviation: the scores in set A are typically 0.82 away from the mean,
but the scores in set B are typically 2.65 away from the mean. Though sets A and B have the same
centre, the scores in set B are clearly more dispersed, or have greater spread, than those in set A.
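The formula for s can be coded directly and checked against `statistics.stdev`, which also divides by n - 1. Many calculators show both s (dividing by n - 1) and σ (dividing by n); the values in this section use s. A minimal sketch, with an illustrative function name:

```python
import statistics
from math import sqrt


def sample_sd(xs):
    """s = sqrt( sum((x - x-bar)^2) / (n - 1) )"""
    n = len(xs)
    xbar = sum(xs) / n
    return sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


set_a = [4, 4, 5, 5, 5, 6, 6]
set_b = [1, 3, 4, 5, 6, 7, 9]
print(round(sample_sd(set_a), 2), round(sample_sd(set_b), 2))  # 0.82 2.65
print(round(statistics.stdev(set_b), 2))  # the standard library agrees: 2.65
```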
Exercise
1. Given the following scores: 12, 12, 13, 14, 14, 15, 15, 15, 16, find the standard deviation.
2. A class of 22 students gained the following scores, out of 10, on a test:
5, 7, 8, 7, 6, 5, 6, 4, 7, 4, 8, 3, 7, 9, 4, 9, 7, 3, 6, 8, 7, 5.
Find the (a) range
(b) IQR
(c) standard deviation.
3. Pistol Pete is the star full-forward for the local football team. Last season he played 20 games and
kicked the following number of goals in each game: 5, 6, 6, 5, 7, 4, 3, 1, 3, 8, 7, 8, 6, 0, 5, 2, 7, 6, 5, 6.
(a) Find the mean and the standard deviation for the number of goals that Pete kicked per game.
(b) This season the mean number of goals Pete kicks per game is 5, with a standard deviation of 2.7.
In which year was his performance more consistent?
Answers
1. 1.414
2. (a) 6 (b) 2 (c) 1.807
3. (a) $\bar{x} = 5$, s = 2.22 (b) Last year: the smaller standard deviation means less variation
Parts of this resource were adapted from materials created by the Academic Skills Unit at Southern Cross University.
INTRODUCTORY PROBABILITY
A probability is written as a number between zero and one: 0 ≤ Pr(A) ≤ 1
Pr(A) = 0 means that event A is impossible.
Pr(A) = 1 means that event A is certain.
When considering a set of all possible outcomes, an event is a particular outcome of interest.
For example,
In tossing a coin the particular event of interest might be 'obtaining a head'
In considering the weather for Saturday the event of interest might be 'it doesn't rain'
In planning a two child family the particular event of interest might be 'a boy and a girl'.
The probability of an event E can be found with the formula:
Pr(E) = (number of outcomes in which E occurs)/(total number of possible outcomes)
[assuming all outcomes are equally likely]
Examples:
1. If two coins are tossed, find the probability of obtaining two heads.
Let E be the event 'two heads'.
The possible outcomes are HH HT TH TT.
Pr(E) = (number of outcomes with two heads)/(total number of outcomes) = 1/4
2. If a die is thrown, find the probability of obtaining an odd number.
Let E be the event 'an odd number'.
The possible outcomes are 1 2 3 4 5 6.
Pr(E) = 3/6 = 1/2
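When all outcomes are equally likely, the formula amounts to counting. A small Python sketch (the helper `prob` is hypothetical) lists the outcomes and counts those in the event, using `Fraction` so that answers appear as exact fractions:

```python
from fractions import Fraction
from itertools import product


def prob(event, outcomes):
    """Pr(E) = favourable outcomes / total outcomes, assuming all are equally likely."""
    outcomes = list(outcomes)
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


two_coins = list(product("HT", repeat=2))           # HH, HT, TH, TT
print(prob(lambda o: o == ("H", "H"), two_coins))   # 1/4

die = range(1, 7)
print(prob(lambda d: d % 2 == 1, die))              # 1/2
```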
The multiplication principle
Two events, A and B, are independent if the fact that A occurs does not affect the probability of B
occurring. Because successive tosses of a coin are independent events, an alternative way of calculating
the probability in the first example is to use the multiplication principle.
If A and B are independent events then
Pr(A and B) = Pr(A ∩ B) = Pr(A) × Pr(B)
The probability of a head on the first toss (H1) and a head on the second toss (H2)
= Pr(H1 ∩ H2)
= 1/2 × 1/2
= 1/4
The addition principle
Pr(A or B) = Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
If A and B are mutually exclusive (cannot happen together) then
Pr(A or B) = Pr(A ∪ B) = Pr(A) + Pr(B)
If we toss a single die twice and want to calculate the probability that at least one 6 occurs, then the 6
could occur on the first toss (S1) or on the second toss (S2):
Pr(S1 or S2) = Pr(S1 ∪ S2) = Pr(S1) + Pr(S2) - Pr(S1 ∩ S2) [because the events are not mutually exclusive]
= 1/6 + 1/6 - 1/36
= 11/36
Complementary events
If E is an event, then 'not E', written E′, is called the complement of E.
Examples of complementary events:
• 'winning the grand final' and 'not winning the grand final'
• 'passing a test' and 'failing a test'
• 'being left handed' and 'being right handed'
Because Pr(E) + Pr(E′) = 1, it follows that
Pr(E′) = 1 - Pr(E)
In the previous example, where a die was tossed twice, the probability of not getting a 6 on either the
first or the second toss = 1 - 11/36 = 25/36
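The addition principle and the complement rule can both be checked by listing all 36 equally likely results of two tosses. A sketch in Python, assuming a fair die:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first toss, second toss) results for a fair die
rolls = list(product(range(1, 7), repeat=2))

pr_s1 = Fraction(1, 6)       # 6 on the first toss
pr_s2 = Fraction(1, 6)       # 6 on the second toss
pr_both = pr_s1 * pr_s2      # independent tosses, so multiply
addition_rule = pr_s1 + pr_s2 - pr_both

# Check by counting outcomes directly in the sample space
by_counting = Fraction(sum(1 for a, b in rolls if a == 6 or b == 6), len(rolls))

print(addition_rule, by_counting)  # 11/36 11/36
print(1 - addition_rule)           # 25/36, the complement: no 6 on either toss
```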
Exercise
1. If 1000 tickets are sold in a raffle and one winning ticket is chosen at random, what is my
probability of winning the raffle if I buy 5 tickets?
2. If I roll a die, what is the probability that the number uppermost is greater than 4?
3. A bag contains 6 white marbles and 4 black marbles. A marble is chosen, its colour recorded and
the marble replaced; this is done three times. What is the probability that all three marbles are white?
4. The probability that person A is alive in 30 years' time is 0.7. The probability that
person B is alive in 30 years' time is 0.4.
Find the probability that:
(a) both are alive in 30 years' time
(b) neither is alive in 30 years' time
(c) only one is alive in 30 years' time
(d) at least one is alive in 30 years' time.
Answers
1. 5/1000 = 1/200
2. 2/6 = 1/3
3. 0.216
4. (a) 0.28 (b) 0.18 (c) 0.54 (d) 0.82
SAMPLE SPACES
A list or diagram showing all possible outcomes in a probability experiment is called a sample space.
Then Pr(E) = n(E)/n(S) = (number of outcomes in E)/(number of outcomes in the sample space S)
For tossing a single die the sample space is {1, 2, 3, 4, 5, 6}
and Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = Pr(6) = 1/6
For a spinner with 4 equal sectors, the sample space is {Red, Green, Yellow, Blue}
and Pr(R) = Pr(G) = Pr(Y) = Pr(B) = 1/4
NB: The sum of the probabilities of the distinct outcomes within a sample space is 1.
Tree diagrams
A tree diagram can be used to find the sample space.
For example, if two coins are tossed there are four possible outcomes:
[Tree diagram: first toss H or T, each followed by second toss H or T]
The sample space for tossing two coins is HH HT TH TT.
If E is the event 'at least one head' then Pr(E) = Pr(HH or HT or TH) = 1/4 + 1/4 + 1/4 = 3/4
The sample space for a three child family is shown below:
[Tree diagram: GGG GGB GBG GBB BGG BGB BBG BBB]
If E is the event 'first child a girl' then Pr(E) = 4/8 = 1/2
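A tree diagram's branches can be generated with `itertools.product`, which lists every combination of choices in order. The sketch below assumes, as the examples do, that a girl (G) and a boy (B) are equally likely for each child.

```python
from fractions import Fraction
from itertools import product

# Each level of the tree is one child: G (girl) or B (boy)
sample_space = ["".join(kids) for kids in product("GB", repeat=3)]
print(sample_space)
# ['GGG', 'GGB', 'GBG', 'GBB', 'BGG', 'BGB', 'BBG', 'BBB']

first_girl = [s for s in sample_space if s[0] == "G"]
print(Fraction(len(first_girl), len(sample_space)))  # 1/2
```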
Other sample spaces and diagrams
If a single card is drawn from a standard deck of 52 cards and
(a) D is the event 'the card is a diamond', then Pr(D) = 13/52 = 1/4
(b) E is the event 'the card is a diamond (D) or an ace (A)', then
Pr(E) = Pr(D or A)
= Pr(D ∪ A)
= Pr(D) + Pr(A) - Pr(D ∩ A)
= 13/52 + 4/52 - 1/52
= 16/52
= 4/13
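The same counting idea works for a standard 52-card deck. This sketch (the deck representation and helper names are illustrative) finds Pr(D) and Pr(D or A) both by counting and by the addition rule:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["diamonds", "hearts", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def pr(event):
    """Proportion of the 52 equally likely cards for which event(card) is True."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_diamond(card):
    return card[1] == "diamonds"


def is_ace(card):
    return card[0] == "A"


print(pr(is_diamond))                                # 1/4
print(pr(lambda c: is_diamond(c) or is_ace(c)))      # 4/13, by counting
print(pr(is_diamond) + pr(is_ace)
      - pr(lambda c: is_diamond(c) and is_ace(c)))   # 4/13, by the addition rule
```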
Tables and Venn diagrams can also be used to organise information in a way that makes finding probabilities
easier.
The diagram shows the number of people in a survey of 256 who regularly ate Kit Kats (K), Mars Bars (M)
or Rocky Road (R).
[Venn diagram: Kit Kats, Mars Bars and Rocky Road]
From the diagram we can see
Pr(K) =
Pr(M  R) =
=
=
=
Pr(Kit Kat and Mars Bar but not Rocky Road) =
Pr(at least one of these things) = 1 - Pr(none of these things) [using complementary events]
The table shows the results of a study that looked at the association between smoking (S) and lung
cancer (C).
[Table: smoking (S) against lung cancer (C)]
From the table we can see
Pr(S) =
=
Pr(C’) =
Pr (S  C) =
=
=
Exercise
1. Use a tree diagram to find the sample space for a two child family. Hence find
(a) The probability that both children are girls
(b) The probability that the oldest child is a girl
(c) The probability that at least one child is a girl
2. The diagram shows the sample space for tossing a single die twice.
Find the probability that
(a) the first toss is a 4
(b) the sum of the two tosses is 5
(c) at least one toss is a 6
(d) neither toss is a 6
3. In a classroom of 20 Yr 12 VCE students, 10 study Maths Methods, 7 study Specialist Maths and
5 study both. Organise the information in a Venn diagram and find the probability that a
student chosen at random
(a) Studies neither of these maths subjects
(b) Studies Maths Methods but not Specialist Maths
4. Find the probability that a card drawn at random from a pack is
(a) A red card
(b) Lower than a 5 (ace low)
Answers
1. (a) 1/4 (b) 1/2 (c) 3/4
2. (a) 1/6 (b) 4/36 = 1/9 (c) 11/36 (d) 25/36
3. (a) 8/20 = 2/5 (b) 5/20 = 1/4
4. (a) 1/2 (b) 16/52 = 4/13
CONDITIONAL PROBABILITY
Dependent events
Two events are dependent if the outcome or occurrence of the first affects the outcome or occurrence
of the second so that the probability is changed.
Example
A card is chosen at random from a pack. If the first card chosen is the jack of diamonds and it is not
replaced, what is the probability that the second card is
(a) a diamond?
(b) a jack?
(c) the queen of clubs?
(a) Pr(♦) = 12/51 = 4/17 [one less diamond in the pack, and one less card in the pack]
(b) Pr(jack) = 3/51 = 1/17
(c) Pr(Q♣) = 1/51
The events J1 'jack of diamonds on the first draw' and D2 'a diamond on the second draw' are
dependent when there is no replacement. The probability of choosing a diamond on the second draw
given that the jack of diamonds was chosen on the first draw is called a conditional probability. We
say Pr(D2|J1) = 12/51 … "The probability of D2 given J1 is 12/51".
Multiplication Rule
When two events, A and B, are dependent, the probability of both occurring is:
Pr(A and B) = Pr(A ∩ B) = Pr(A) × Pr(B|A)
Example
Find the probability of obtaining two jacks if two cards are drawn in succession from a pack
(a) with replacement
(b) without replacement
(a) If the cards are replaced then the events are independent:
Pr(J1 ∩ J2) = Pr(J1) × Pr(J2) = 4/52 × 4/52 = 1/169
(b) If the cards are not replaced then the probability of the second draw depends on the first
draw:
Pr(J1 ∩ J2) = Pr(J1) × Pr(J2|J1) = 4/52 × 3/51 = 1/221
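Drawing with and without replacement can be compared by listing every ordered pair of draws: `itertools.product` allows the same card twice (replacement), while `itertools.permutations` never reuses a card. Only whether a card is a jack matters here, so this sketch labels the other 48 cards simply as "x".

```python
from fractions import Fraction
from itertools import permutations, product

deck = ["J"] * 4 + ["x"] * 48  # 4 jacks and 48 other cards

with_replacement = list(product(deck, repeat=2))   # 52 * 52 ordered draws
without_replacement = list(permutations(deck, 2))  # 52 * 51 ordered draws


def pr_two_jacks(draws):
    """Proportion of equally likely ordered draws in which both cards are jacks."""
    return Fraction(sum(1 for a, b in draws if a == b == "J"), len(draws))


print(pr_two_jacks(with_replacement))     # 1/169
print(pr_two_jacks(without_replacement))  # 1/221
```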
Conditional probability
The multiplication rule for dependent events can be rearranged to find a conditional probability:
$\Pr(B|A) = \frac{\Pr(A \cap B)}{\Pr(A)}$ or $\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}$
Examples
1. Find Pr(A|B) if Pr(A) = 0.7, Pr(B) = 0.5 and Pr(A ∪ B) = 0.8.
[we must first find Pr(A ∩ B)]
Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
0.8 = 0.7 + 0.5 - Pr(A ∩ B)
Pr(A ∩ B) = 0.4
and Pr(A|B) = Pr(A ∩ B)/Pr(B) = 0.4/0.5 = 0.8
2. In a class of 15 boys and 12 girls two students are to be randomly chosen to collect homework.
What is the probability that both students chosen are boys?
Pr(B1 ∩ B2) = Pr(B1) × Pr(B2|B1) = 15/27 × 14/26 = 210/702 = 35/117
Another way to do conditional probability problems is to reduce the sample space:
3. Given the information in the following table, find the probability that someone was sunburnt
given that they were not wearing a hat.
                 Sunburnt face
                 Yes     No     Total
Hat    Yes        3      77       80
       No        12       8       20
Total            15      85      100
Highlight the part of the table that satisfies the condition "not wearing a hat".
This becomes the sample space for the question.
Pr(S|H′) = 12/20 = 3/5
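The reduced-sample-space method and the conditional probability formula can be checked against each other. The sketch stores the table's counts in a nested dictionary (an illustrative layout: the outer key is hat yes/no, the inner key is sunburnt yes/no):

```python
from fractions import Fraction

counts = {
    "yes": {"yes": 3, "no": 77},   # wearing a hat
    "no": {"yes": 12, "no": 8},    # not wearing a hat
}
total = sum(sum(row.values()) for row in counts.values())  # 100 people

# Method 1: formula Pr(S | H') = Pr(S and H') / Pr(H')
pr_s_and_no_hat = Fraction(counts["no"]["yes"], total)
pr_no_hat = Fraction(sum(counts["no"].values()), total)
print(pr_s_and_no_hat / pr_no_hat)  # 3/5

# Method 2: reduced sample space, using only the "no hat" row
row = counts["no"]
print(Fraction(row["yes"], sum(row.values())))  # 3/5
```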
Exercise
1. The results of a survey of music preferences are displayed in the Venn diagram. Find the
probability that a student likes rock music given that they like dance music.
Image Source: Passy’s World of Mathematics
2. Three cards are chosen at random from a pack without replacement. What is the
probability of choosing 3 aces?
3. In a maths class of 20 students 5 failed the final exam. If two students are chosen at
random without replacement, what is the probability that the first passed but the second
failed?
4. If Pr(X) = 0.5, Pr(Y) = 0.5 and Pr(X ∩ Y) = 0.2, find
(a) Pr(X|Y)
(b) Pr(X ∪ Y)
(c) Pr(X) × Pr(Y|X)
5. In a three child family, what is the probability that all three children will be girls given that
the first child is a girl? [Hint: Draw a tree diagram to find the sample space]
Answers
1.
2. 4/52 × 3/51 × 2/50 = 1/5525
3. 15/20 × 5/19 = 15/76
4. (a) 0.4 (b) 0.8 (c) 0.2
5. 0.25
BINOMIAL DISTRIBUTION
A variable X has a binomial distribution when it counts how many times an outcome of interest occurs in a
fixed number of independent trials, each of which has only two possible outcomes, with the same probability
on every trial. The following are all examples of probability questions about binomial data:
• What is the probability of obtaining 5 heads in 6 tosses of a coin?
• What is the probability that in a randomly selected group of 30 people none of them will have a
particular disease?
• What is the probability that in a sample of 100 manufactured components no more than 2 will
be defective?
Suppose that in a particular family the probability that a child will have red hair is ¼. If the parents
have three children…
(i) the probability that all three will have red hair is
Pr(R and R and R) = ¼ × ¼ × ¼ = 1/64
(ii) the probability that none will have red hair is (R′ means 'does not have red hair')
Pr(R′ and R′ and R′) = ¾ × ¾ × ¾ = 27/64
(iii) the probability that at least one child will have red hair is
1 - Pr(none with red hair) = 1 - Pr(R′ and R′ and R′)
= 1 - (¾ × ¾ × ¾)
= 1 - 27/64
= 37/64
[NB: 'at least one child…' and 'no children…' are complementary events]
(iv) the probability that only the first child will have red hair is
Pr(R and R′ and R′) = ¼ × ¾ × ¾ = 9/64
(v) the probability that exactly one child will have red hair is
Pr(R and R′ and R′) + Pr(R′ and R and R′) + Pr(R′ and R′ and R)
= ¼ × ¾ × ¾ + ¾ × ¼ × ¾ + ¾ × ¾ × ¼
= 3 × 9/64
= 27/64
[NB: The last example demonstrates that it is important to consider all the ways in
which the child with red hair might be selected.]
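The "all the ways" idea in (v) can be checked by listing all eight red-hair/not-red-hair sequences for three children and adding the probabilities of the sequences with exactly one R. A sketch, assuming (as above) that the children are independent and each has probability ¼ of red hair:

```python
from fractions import Fraction
from itertools import product

p_red = Fraction(1, 4)


def pr_sequence(seq):
    """Probability of one ordered outcome such as ('R', 'N', 'N'); children independent."""
    result = Fraction(1)
    for child in seq:
        result *= p_red if child == "R" else 1 - p_red
    return result


families = list(product("RN", repeat=3))  # R = red hair, N = not red hair
exactly_one = sum(pr_sequence(f) for f in families if f.count("R") == 1)
at_least_one = sum(pr_sequence(f) for f in families if "R" in f)
print(exactly_one, at_least_one)  # 27/64 37/64
```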
Binomial Probability Formula
If n is the number of trials (e.g. the number of tosses of a coin, the number of children in a family or the
number of items in a sample) and p is the probability of the outcome of interest on each trial, then the
probability of x such outcomes is given by the formula
$\Pr(X = x) = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x}$
Here ⁿCₓ (a calculator button) counts the number of ways the desired outcome can occur.
Example
One in every hundred items a machine produces is defective. What is the probability that, in a sample
of five items produced by this machine,
(a) exactly three are defective?
(b) none are defective?
(c) at least one is defective?
n = 5, p = 1/100 = 0.01, x = 3
(a) $\Pr(X = x) = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x}$
$\Pr(X = 3) = {}^{5}C_{3}\, (0.01)^{3} (1 - 0.01)^{5 - 3} \approx 0.00001$
[so if we obtained three defective items in a sample of 5 we might be suspicious of the
claim that only one in a hundred is defective!]
(b) $\Pr(X = 0) = {}^{5}C_{0}\, (0.01)^{0} (1 - 0.01)^{5} \approx 0.95$
(c) Pr(X ≥ 1) = 1 - Pr(X = 0) ['none defective' and 'at least one defective' are complementary events]
= 1 - 0.95
= 0.05
Probability Distribution
A list of all possible outcomes of an event and their associated probabilities is called a probability
distribution. The probability distribution table for the random variable X (the number of defectives) in the example is
x (no. of defectives)   0         1         2         3         4         5
Pr(X = x)               0.95099   0.04803   0.00097   0.00001   0.00000   0.00000
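The binomial formula translates directly into Python using `math.comb` (Python 3.8 or later) for ⁿCₓ. This sketch (the function name `binomial_pmf` is illustrative) reproduces the answers to the example and the probability distribution table above:

```python
from math import comb


def binomial_pmf(x, n, p):
    """Pr(X = x) = nCx * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.01
print(f"{binomial_pmf(3, n, p):.5f}")      # 0.00001
print(f"{binomial_pmf(0, n, p):.5f}")      # 0.95099
print(f"{1 - binomial_pmf(0, n, p):.5f}")  # 0.04901

# The whole distribution, x = 0, 1, ..., 5
for x in range(n + 1):
    print(x, f"{binomial_pmf(x, n, p):.5f}")
```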
For a binomial distribution the mean and standard deviation are found using the formulae:
$\mu = E(X) = np$
$\sigma = \sqrt{np(1 - p)}$
For the previous example, the expected value and standard deviation of the number of defectives in a
batch of one thousand would be
$\mu = E(X) = 1000 \times 0.01 = 10$
$\sigma = \sqrt{np(1 - p)} = \sqrt{1000(0.01)(0.99)} \approx 3.15$
So in a batch of 1000 we would expect to get 10 defectives, and the number of defectives will
deviate from this amount by an 'average' of 3.15. We would expect most batches to have between 7
and 13 defective items.
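The formulas μ = np and σ = √(np(1 - p)) can be checked numerically. The sketch below also recomputes the mean as the sum of x × Pr(X = x) over the whole distribution, which should agree with np; it uses `math.comb` (Python 3.8 or later), and the names are illustrative.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """Pr(X = x) = nCx * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 1000, 0.01
mu = n * p                     # expected number of defectives
sigma = sqrt(n * p * (1 - p))  # standard deviation
print(mu, round(sigma, 2))     # 10.0 3.15

# Cross-check: the mean is also the sum of x * Pr(X = x) for x = 0, 1, ..., n
mu_check = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
print(round(mu_check, 4))
```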
Exercise
The probability that an archer will hit a bullseye is 0.7. If he is allowed ten attempts, find the
probability that he
(a) hits it every time
(b) misses each time
(c) scores at least two bullseyes
Answers
(a) 0.028 (b) 0.000006
(c) 0.99986
NORMAL DISTRIBUTION
Graphical data can display different forms:
[Figures: example histograms of data with different shapes]
But many things that can be measured, such as
• heights of people
• blood pressure
• errors in measurements
• scores on a test
follow a bell-shaped curve. Such data is said to be normally distributed.
[Graph: a bell-shaped normal curve. Graph from Wolfram Alpha]
Properties of a normal distribution
• Symmetry about the mean
• Mean = median = mode
• 50% of values greater than the mean and 50% less than the mean
• 68% of values fall within one standard deviation either side of the mean
• 95% of values fall within two standard deviations either side of the mean
• 99.7% of values fall within three standard deviations either side of the mean
The last three properties are sometimes known as the empirical or 68-95-99.7 rule.
NB: Even though most of the data will fall within three standard deviations of the mean there is in
theory no upper or lower bound to a normal distribution. We are just less and less likely to find
values beyond these points.
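The 68%, 95% and 99.7% figures are rounded values. In Python, the area under the standard normal curve to the left of z can be computed from the error function as 0.5 × (1 + erf(z/√2)); the sketch below (with a hypothetical helper `phi`) uses this to find the proportion within k standard deviations of the mean.

```python
from math import erf, sqrt


def phi(z):
    """Pr(Z < z) for the standard normal distribution (area to the left of z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


for k in (1, 2, 3):
    within = phi(k) - phi(-k)  # area between mean - k sd and mean + k sd
    print(f"within {k} sd: {within:.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```

Two standard deviations actually cover about 95.4% of values, which the rule rounds to 95%.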
Example
If scores on an IQ test are normally distributed with mean = 100 and standard deviation = 10, what
percentage of people would we expect to
(a) score between 90 and 110?
(b) score less than 80?
(a) Because 90 = 100 - 10 and 110 = 100 + 10 are both one standard deviation from the mean,
68% of people would be expected to score between 90 and 110.
(b) 80 = 100 - 2 × 10 is two standard deviations below the mean. We know that 95% of scores fall
between 80 and 120, so 5% must fall outside this range. Half of these, 2.5%, will be below 80.
Therefore we would expect 2.5% of people to have IQ scores less than 80.
Exercise
1. Scores on a general achievement test are normally distributed with a mean of 80 and a
standard deviation of 15. Adam scored 95. What proportion of students had a higher score
than Adam?
2. The actual weights of cereal boxes that are supposed to contain 500 g are normally distributed
with a mean of 510 g and a standard deviation of 5 g. What proportion of boxes are underfilled?
3. In a maths class the bottom 16% of students are given an F grade. If the class mean is 63 and
the standard deviation is 18 what score must a student get to pass?
4. If newborn birth weights in a certain hospital are normally distributed with a mean of 3200g
and a standard deviation of 400g
(a) what percentage of babies weigh more than 3200g?
(b) what percentage of babies weigh between 2400g and 4000g?
(c) what percentage of babies weigh less than 3600g?
(d) if the 16% of babies with the lowest birth weights are placed in the special care nursery
will a baby that weighs 2500g need special care?
5. 95% of people in a clinical study had systolic blood pressure readings between 116 and 144. If
the blood pressure measurements follow a normal distribution, what are the mean and standard
deviation of the blood pressures for this group?
6. A class of ten students get the following marks in a test: 13, 23, 41, 55, 66, 78, 49, 33, 35, 67.
If anyone who scored less than one standard deviation below the mean fails, how many
students will fail?
Answers
1. 16%
2. 2.5%
3. 45
4. (a) 50% (b) 95% (c) 84% (d) yes
5. μ = 130, σ = 7
6. 2
STANDARD NORMAL DISTRIBUTION
The standard normal distribution (sometimes called a z-distribution) has a mean of zero
( μ = 0) and a standard deviation of 1 (σ = 1).
If we are working with the standard normal distribution we are not restricted to the 68-95-99.7 rule,
because tables are available to enable us to find proportions, percentages or probabilities for any
value in the distribution. Tables come in different layouts, but the table at the end of this section gives
the proportion to the left of a chosen z-value given to 2 decimal places.
We can also interpret our proportions or percentages as probabilities:
Pr(z < 0) = 0.5
Pr(z < 0.03) = 0.512
Pr(z < 0.75) = 0.7734
NB: It is also possible to use a graphics calculator or a computer to find areas, proportions and
probabilities in a normal distribution.
Example:
In a standard normal distribution what percentage of values will be
(a) less than 1.28?
(b) more than 1.28?
(c) between 0 and 1.28?
(d) greater than -1.28?
(e) between -1.28 and 1.28?
(a) First draw a diagram:
We are looking for the percentage of
the graph to the left of 1.28
We can see from the table that the percentage of values less than 1.28 is 89.97%
(b)
First draw a diagram:
We are looking for the percentage of
the graph to the right of 1.28
However, we cannot read areas to the right of a z-value directly from the table.
Instead we must observe that
• 100% of all values lie under the curve
• the area to the left of the shaded region is the same as in part (a)
So the percentage of values to the right of 1.28 will be 100% minus the percentage to
the left of 1.28.
Therefore the percentage of values more than 1.28 is 100% - 89.97% = 10.03%
(c)
First draw a diagram:
We are looking for the percentage of
the graph between 0 and 1.28
We cannot look up areas between two values directly from the table.
But we know
• from part (a) that 89.97% of values are less than 1.28
• 50% of values lie to the left of the mean, because this is a property of our symmetrical bell
curve.
Therefore the percentage of values between 0 and 1.28 is 89.97% - 50% = 39.97%
(d)
First draw a diagram:
We are looking for the percentage of
the graph to the right of -1.28
We cannot look up negative values in the table.
But we know
• that the bell curve is symmetrical
• that the area to the right of -1.28 is the same as the area to the left of 1.28
Therefore the percentage of values greater than -1.28 is 89.97%
(e)
First draw a diagram:
We are looking for the percentage of
the graph between -1.28 and 1.28
To find this area we use the symmetry of the graph. Observe that the area between -1.28
and 0 is exactly the same as the area between 0 and 1.28. We also know the area between
0 and 1.28 because we found it in part (c).
Therefore the percentage of the graph between -1.28 and 1.28 is 2 × 39.97% = 79.94%
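The table look-ups in parts (a) to (e) can be checked on a computer. This sketch computes Pr(z < value) from the error function (the helper name `phi` is illustrative) and applies the same reasoning as each part:

```python
from math import erf, sqrt


def phi(z):
    """Pr(Z < z) for the standard normal distribution (area to the left of z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


z = 1.28
answers = {
    "(a) less than 1.28": phi(z),
    "(b) more than 1.28": 1 - phi(z),              # whole area is 1
    "(c) between 0 and 1.28": phi(z) - 0.5,        # half the area lies left of 0
    "(d) greater than -1.28": 1 - phi(-z),         # equals (a) by symmetry
    "(e) between -1.28 and 1.28": phi(z) - phi(-z),
}
for label, prop in answers.items():
    print(f"{label}: {prop:.2%}")
```

Parts (a) to (d) print 89.97%, 10.03%, 39.97% and 89.97%. Part (e) comes out as 79.95% here because the unrounded area is used; doubling the rounded table value gives 2 × 39.97% = 79.94%.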
Exercise
1. In a standard normal distribution what percentage of values will be
(a) less than 1.95?
(b) less than -1.95?
(c) between -1.95 and 0?
(d) greater than 1.95?
(e) between -1.95 and 1.95?
2. In a standard normal distribution, what proportion of values lie between -0.5 and 1.5?
3. In a standard normal distribution, what proportion of values lie outside the interval ±1.7?
4. Given that a value in a standard normal distribution is greater than -1, what is the
probability that it will be less than 2? [Hint: use the conditional probability formula]
Answers (these may vary slightly depending on whether calculators or tables are used)
1. (a) 97.44% (b) 2.56% (c) 47.44% (d) 2.56% (e) 94.88%
2. 0.6247
3. 0.0892
4. 0.9729
The proportion of the area under the curve to the left of a chosen value of z is given in the table
below.
[Table: proportion of the area under the standard normal curve to the left of z]
PROBABILITY AND THE NORMAL DISTRIBUTION
Even when data follows a normal distribution, different data sets will have their own mean and
standard deviation, and so a different bell-shaped curve.
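But every score in a normally distributed data set, regardless of the shape of its curve, has an equivalent
score in the standard normal distribution. The mean of a normal distribution corresponds to a standardised
score of 0, and we can see that μ + σ → 1, μ + 2σ → 2 and μ + 3σ → 3.
BUT for a score such as x = 1.7 in the distribution below, the equivalent z-score is not obvious:
[Diagram: a normal distribution with mean 1.2 and standard deviation 0.4, in which 0.8 corresponds to z = -1, 1.2 to z = 0, and x = 1.7 to z = ?]
For other values we can use the formula
$z = \frac{x - \mu}{\sigma}$
to find z-scores.
To find the standardised score for x = 1.7: $z = \frac{1.7 - 1.2}{0.4} = 1.25$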
A score of 1.7 in the distribution with mean 1.2 and standard deviation 0.4 is equivalent to a
standardised score of 1.25. Alternatively, the score 1.7 is 1.25 standard deviations above the mean for
that distribution.
Once we have converted the scores of our distribution into standard scores or z-scores we can use
normal distribution tables to calculate precise percentages and probabilities. The normal distribution
is a continuous distribution, so we can find the probability that x is greater than or less than a
particular value, but the probability that x is exactly equal to any particular value is zero. Because the
total area under the standardised curve is 1, Pr(z < β) is equal to the area to the left of β.
[Diagram: shaded area to the left of β under the standard normal curve]
Examples
1. If the mean maximum temperature for Melbourne in January is 25.9°C with a standard
deviation of 2.1°C, what is the probability that the mean maximum temperature for January 2015
will be above 28°C?
First draw a diagram: [Diagram: normal curve with x = 28 marked]
Then standardise x = 28:
$z = \frac{28 - 25.9}{2.1} = \frac{2.1}{2.1} = 1$
Pr(x > 28) = Pr(z > 1)
= 1 – 0.84 [from tables]
= 0.16
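Example 1 can be reproduced with Python's statistics.NormalDist (Python 3.8 or later), which standardises internally. The sketch assumes a mean of 25.9°C, the value that gives z = 1 with a standard deviation of 2.1°C. The exact tail area is about 0.1587, which rounds to the 0.16 obtained from the tables.

```python
from statistics import NormalDist  # Python 3.8+

# Parameters assumed from Example 1: mean 25.9 °C, standard deviation 2.1 °C.
january_max = NormalDist(mu=25.9, sigma=2.1)

# Pr(x > 28) is the area to the right of 28, i.e. 1 minus the area to the left.
p_above_28 = 1 - january_max.cdf(28)
print(f"Pr(x > 28) = {p_above_28:.4f}")
```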
2. The top 0.5% of students applying for Stato University are given full scholarships. If the mean
score on the entrance exam is 372 and the standard deviation is 40, what mark is needed to
obtain a scholarship?
First draw a diagram: [Diagram: normal curve with the top 0.005 of the area, above x_s, shaded]
In this question we know that Pr(x > x_s) = 0.005 but must work backwards to find the
cut-off score that defines that area on the graph.
First we find the z-score: Pr(z > z_s) = 0.005 ⇒ z_s = 2.58 [using the tables in reverse]
Then substituting into the formula $z = \frac{x - \mu}{\sigma}$:
$2.58 = \frac{x_s - 372}{40}$
$2.58 \times 40 = x_s - 372$
$103.2 = x_s - 372$
$x_s = 103.2 + 372 = 475.2$
Applicants who score more than 475.2 will obtain a scholarship.
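Python's NormalDist.inv_cdf performs the "tables in reverse" step. This sketch computes the cut-off directly. Because it uses the unrounded z ≈ 2.576 rather than the table value 2.58, it gives about 475.0 rather than 475.2.

```python
from statistics import NormalDist  # Python 3.8+

exam_scores = NormalDist(mu=372, sigma=40)

# The cut-off has 0.5% of the area above it, so 99.5% of the area below it.
cutoff = exam_scores.inv_cdf(1 - 0.005)
print(f"Scholarship cut-off: {cutoff:.1f}")
```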
Exercise
1. If a population has a mean I.Q. of 100 and a standard deviation of 15,
(a) find the probability that an individual chosen at random will have an I.Q. between 110
and 130.
(b) find the probability that an individual chosen at random will have an I.Q. greater than 87.
2. A coffee machine is regulated to deliver 200 mL per cup. In fact, the amount of coffee varies,
following a normal distribution with a mean of 200 mL and a standard deviation of 10 mL.
(a) What is the probability that a cup contains less than 195 mL?
(b) What is the probability that a cup will contain more than 220 mL?
(c) What is the probability of a cup containing between 195 and 215 mL?
3. (a) The heights of a group of men follow a normal distribution with a mean of 180 cm and a
standard deviation of 6 cm. What is the probability that a man chosen from this group is
less than 185 cm tall?
(b) If the tallest 10% of this group are automatically eligible for a basketball team, what is the
qualifying height?
Answers
1. (a) 0.2297
(b) 0.8069
2. (a) 0.3085
(b) 0.0228
(c) 0.6247
3. (a) 0.7977 (b) 187.69 cm
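The exercise answers can be checked in the same way. This is a sketch using NormalDist from the Python standard library. Small differences in the fourth decimal place from the answers above are expected because the tables round z to two decimal places.

```python
from statistics import NormalDist  # Python 3.8+

iq = NormalDist(100, 15)
coffee = NormalDist(200, 10)
height = NormalDist(180, 6)

answers = {
    "1(a)": iq.cdf(130) - iq.cdf(110),
    "1(b)": 1 - iq.cdf(87),
    "2(a)": coffee.cdf(195),
    "2(b)": 1 - coffee.cdf(220),
    "2(c)": coffee.cdf(215) - coffee.cdf(195),
    "3(a)": height.cdf(185),
    "3(b)": height.inv_cdf(0.90),  # height with 10% of the men above it
}
for part, value in answers.items():
    print(part, round(value, 4))
```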
SAMPLING DISTRIBUTIONS
A sampling distribution is the probability distribution for the means of all samples of size n from a
given distribution. The sampling distribution will be normally distributed with parameters $\mu_{\bar{x}}$
and $\sigma_{\bar{x}}$ if either
⦁ the population from which the samples are drawn is normally distributed, or
⦁ the samples are large (n ≥ 30),
where $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ [for large samples].
NB: ⦁ the sampling distribution has the same centre as the population
⦁ the measure of variability of a sampling distribution, $\sigma_{\bar{x}}$, is called the standard error.
The distribution of means is not as spread out as the values in the population from
which the sample was drawn.
⦁ if we do not know the population standard deviation, we approximate it with the sample
standard deviation: $s_{\bar{x}} \approx \sigma_{\bar{x}}$, that is, $\frac{s}{\sqrt{n}} \approx \frac{\sigma}{\sqrt{n}}$
Consider the little ‘population’ of values P = {1, 2, 3, 4, 5}.
This population has μ = 3 and σ = 1.41.
If a sample of size n = 3 was drawn from this population (without replacement), it could be any one of:
(1 2 3) (1 2 4) (1 2 5) (1 3 4) (1 3 5) (1 4 5) (2 3 4) (2 3 5) (2 4 5) (3 4 5)
The means of each of the samples, and a histogram of the distribution of means, are shown in the table
and graph below:
Sample      Mean (x̄)
(1 2 3)     2
(1 2 4)     2.33
(1 2 5)     2.67
(1 3 4)     2.67
(1 3 5)     3
(1 4 5)     3.33
(2 3 4)     3
(2 3 5)     3.33
(2 4 5)     3.67
(3 4 5)     4
Mean of the means: $\bar{\bar{x}} = 3$; standard deviation of the means: $\sigma_{\bar{x}} = 0.61$

The sampling distribution of the means for samples of size 3 is:
x̄             2     2.33   2.67   3     3.33   3.67   4
P(X̄ = x̄)     0.1   0.1    0.2    0.2   0.2    0.1    0.1
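The table can be reproduced by listing every sample with itertools.combinations. In this sketch, dividing by all 10 equally likely samples gives a standard deviation of the means of about 0.58. The 0.61 quoted above is what you get by dividing by 9, as you would for a sample standard deviation.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

population = [1, 2, 3, 4, 5]
n = 3

# All samples of size 3 drawn without replacement (order ignored): 10 of them.
samples = list(combinations(population, n))
means = [sum(sample) / n for sample in samples]

# Each sample is equally likely, so each mean has probability count / 10.
dist = Counter(round(m, 2) for m in means)
for value in sorted(dist):
    print(value, dist[value] / len(samples))

mean_of_means = sum(means) / len(means)
ss = sum((m - mean_of_means) ** 2 for m in means)
sd_all = sqrt(ss / len(means))  # divide by 10: all possible samples
sd_n_minus_1 = sqrt(ss / (len(means) - 1))  # divide by 9: sample formula
print(round(mean_of_means, 2), round(sd_all, 2), round(sd_n_minus_1, 2))
```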
Even though these samples are small, and the population is not normally distributed (though it is
symmetric), the sampling distribution is reasonably close to a normal distribution:
[Histogram: probability (0 to 0.2) of each sample mean (2 to 4)]
We can see that the mean of the sampling distribution (the mean of all the means) is the same as the
population mean, $\bar{\bar{x}} = \mu = 3$. But the variability in the sampling distribution is less than that of the
population: $\sigma_{\bar{x}} = 0.61$ and σ = 1.41. Because the means of larger samples, or of samples drawn from normally
distributed populations, follow a normal distribution, we can use the properties of normal
distributions to find probabilities relating to samples:
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
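Here is a short sketch of the standardising formula for sample means. The helper names standard_error and z_for_sample_mean are hypothetical and were chosen for this illustration. The usage lines use the figures from exercise 2 below.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def z_for_sample_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """Standardise a sample mean, using the standard error in place of sigma."""
    return (xbar - mu) / standard_error(sigma, n)


# Samples of size 40 from a population with mu = 50 and sigma = 5.
print(round(standard_error(5, 40), 2))
print(round(z_for_sample_mean(48.5, 50, 5, 40), 2))
```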
Example
The shire of Bondara has 1200 preschoolers. The mean weight of preschoolers is known to be 18 kg
with a standard deviation of 3 kg. What is the probability that a random sample of 50 preschoolers will
have a mean weight of more than 19 kg?
n = 50, μ = 18 and σ = 3
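The sampling distribution of the means for samples of size 50 will have $\mu_{\bar{x}} = \mu = 18$ and standard
error $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{50}} = 0.42$.
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{19 - 18}{0.42} \approx 2.38$
$\Pr(\bar{x} > 19) = \Pr(z > 2.38)$
= 1 – 0.9913 [from tables]
= 0.0087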
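The Bondara example in Python, as a sketch using NormalDist from the standard library. If the standard error is not rounded (0.4243 rather than 0.42), z ≈ 2.36 and the probability is about 0.0092. This is slightly higher than the table-based 0.0087, a difference caused only by rounding.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

mu, sigma, n = 18, 3, 50
se = sigma / sqrt(n)  # standard error, about 0.4243
sample_means = NormalDist(mu, se)  # sampling distribution of the mean
p = 1 - sample_means.cdf(19)  # Pr(sample mean > 19)
print(f"standard error = {se:.4f}, Pr(mean > 19) = {p:.4f}")
```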
Exercise
1. List all samples of size 2 for the population {1, 2, 3, 4, 5, 6}. What is the probability of obtaining
a sample mean of less than 3?
2. Samples of size 40 are drawn from a population with μ = 50 and σ = 5.
(a) What are the mean and standard error of the sampling distribution?
(b) What is the probability that a particular sample has a mean less than 48.5?
3. If IQ in the general population of secondary students is known to follow a normal distribution
with μ = 100 and σ = 10,
(a) find the mean and standard error for random samples of size 100.
(b) To test whether a secondary school is representative of the general population, a sample of
100 students from that school is chosen. What is the probability of the mean IQ being more
than 105?
(c) What would be your conclusion?
Answers
1. 4/15
2. (a) $\mu_{\bar{x}} = 50$ and $\sigma_{\bar{x}} = 0.79$ (b) 0.0288
3. (a) $\mu_{\bar{x}} = 100$ and $\sigma_{\bar{x}} = 1$ (b) 0.0000003 (z = 5) (c) either the sample was not random (perhaps all the smartest students were in the
sample) or this school has a higher IQ than the general population.
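This sketch checks the answers above: exercise 1 by listing every sample, and exercises 2(b) and 3(b) with NormalDist. For 3(b) the sample mean is 5 standard errors above μ, so the tail area is only about 0.0000003.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt
from statistics import NormalDist  # Python 3.8+

# Exercise 1: every sample of size 2 from {1, ..., 6}, without replacement.
pairs = list(combinations(range(1, 7), 2))
below_3 = [pair for pair in pairs if sum(pair) / 2 < 3]
print(Fraction(len(below_3), len(pairs)))  # 4/15

# Exercise 2(b): samples of size 40 from a population with mu = 50, sigma = 5.
se2 = 5 / sqrt(40)
p2 = NormalDist(50, se2).cdf(48.5)
print(round(se2, 2), round(p2, 4))

# Exercise 3(b): samples of size 100 with mu = 100, sigma = 10, so z = 5.
se3 = 10 / sqrt(100)
p3 = 1 - NormalDist(100, se3).cdf(105)
print(p3)
```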
CONFIDENCE INTERVALS
We use the statistics we obtain from samples to make inferences or estimates about the population
from which the sample was drawn.
For example:
⦁ A batch may be selected in a factory production process to assess how the process is operating.
⦁ Surveys of consumers are used to determine the preferred brands in the population.
⦁ Polls are conducted on samples of the voting population before elections to predict the result
of the election.
Together with our estimate of the population parameter, it is often helpful to provide a confidence
interval.
After constructing a confidence interval we are able to make statements such as:
“we are 95% confident that the true mean weight of boxes of cocobix cereal labelled 450g is in the
interval [44 .5, 453.8]”.
For large samples (n ≥ 30) we can use the mean of a sample, $\bar{x}$, to estimate the mean of the population,
μ, using the formula:
$\mu = \bar{x} \pm z\frac{\sigma}{\sqrt{n}}$ or $\mu = \bar{x} \pm z\frac{s}{\sqrt{n}}$ [when σ is not known]
The value of z is determined by the level of confidence and can be found using normal tables, a
graphics calculator or an online statistics program such as Stat Trek:
For a 95% confidence interval z = 1.96
For a 99% confidence interval z = 2.575
For a 90% confidence interval z = 1.645
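The z-value for any confidence level can be found with the inverse normal function, in the same way a graphics calculator does. This sketch uses Python's NormalDist, and z_for_confidence is a hypothetical helper name. For 99% it gives 2.576, which tables round to 2.575 or 2.58.

```python
from statistics import NormalDist  # Python 3.8+


def z_for_confidence(level: float) -> float:
    """Two-sided z: leaves (1 - level) / 2 of the area in each tail."""
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```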
Example
Thirty-six fish of a certain type were caught in Port Phillip Bay. This sample had a mean length of 30 cm
and a standard deviation of 3 cm.
(a) What is the 95% confidence interval for the true mean length of this type of fish?
(b) What is the 98% confidence interval for the true mean length of this type of fish?
(a)
Confidence interval for μ: $\bar{x} \pm z\frac{s}{\sqrt{n}} = 30 \pm 1.96 \times \frac{3}{\sqrt{36}} = 30 \pm 0.98$
We can state with 95% confidence that the mean of the entire population of fish will be between
29.02 cm and 30.98 cm.
(b)
CI for μ: $\bar{x} \pm z\frac{s}{\sqrt{n}} = 30 \pm 2.326 \times \frac{3}{\sqrt{36}} = 30 \pm 1.163$
We can state with 98% confidence that the mean of the entire population of fish will be between
28.84 cm and 31.16 cm.
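Here is a minimal sketch of the large-sample interval x̄ ± z s/√n, applied to the fish data. The name confidence_interval is illustrative. The sketch takes z from the inverse normal function rather than from tables.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def confidence_interval(xbar: float, s: float, n: int, level: float):
    """Large-sample interval xbar ± z * s / sqrt(n) for the population mean."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # e.g. about 1.96 for 95%
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


# Fish example: n = 36, sample mean 30 cm, sample standard deviation 3 cm.
for level in (0.95, 0.98):
    low, high = confidence_interval(30, 3, 36, level)
    print(f"{level:.0%} CI: [{low:.2f}, {high:.2f}]")
```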
Exercises
1. In an effort to improve appointment scheduling, a doctor agreed to estimate the average time spent
with each patient. A random sample of 49 patients yielded a mean of 30 minutes and a standard
deviation of 7 minutes.
(a) Construct a 95% confidence interval for the true mean.
(b) Construct an 80% confidence interval for the true mean.
2. To estimate the average weight of males in the town of Cityville, a random sample of 100 men was
drawn from the population of 10 000 men and their weights recorded. The mean weight was found to be
83 kg and the standard deviation 12 kg.
(a) What is the 99% confidence interval for the mean weight of the male population?
(b) In two of the suburbs of Cityville, Subtown and Tubtown, the mean weights for males were found
to be 80 kg and 88 kg respectively. Comment on these results.
3. A market research company conducted a randomised survey of 50 regular smokers to find the
amount spent on cigarettes per week. They found that the smokers spent on average $22 each week
and the standard deviation was $4.50.
Using a 95% level of confidence calculate the confidence interval for the true mean amount spent
on cigarettes by regular smokers.
4. After randomly sampling 400 individuals and obtaining a sample mean of 56.5, a research
company was able to claim they were 90% certain that the true mean of the population was between
56.089 and 56.911. What was the standard deviation of the sample?
Answers
1. (a) [28.04, 31.96]
(b) [28.72, 31.28]
2. (a) [79.91, 86.09]
(b) The mean weight for Subtown men is within the expected range but men who live in Tubtown appear to be extremely heavy
compared with the general population. This may reflect lifestyle differences or a failure to select a random and representative
sample.
3. [20.75, 23.25]
4. 5
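Exercise 4 works the interval formula backwards. Half the width of the interval equals z s/√n, so s = (half-width) × √n / z. The arithmetic can be sketched as follows, taking z from NormalDist rather than from tables.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

lower, upper, n, level = 56.089, 56.911, 400, 0.90
half_width = (upper - lower) / 2  # equals z * s / sqrt(n)
z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.645 for 90%
s = half_width * sqrt(n) / z
print(round(s, 2))
```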
HYPOTHESIS TESTING
Consider statements such as:
⦁ Teenagers aged 13-15 spend no more than 10 hours a week on Facebook.
⦁ The average weight of Australian men is the same as it was in 1990.
⦁ Students from private schools have the same mean ATAR score as the Victorian average.
⦁ The mean winter rainfall for the last 10 years is the same as the historical mean.
Knowing the probabilities of values drawn from normally distributed populations and
sampling distributions enables us to formally test hypotheses (or claims) such as these.
When we perform an ‘experiment’ we know there will be chance variation. For example, if we toss a
supposedly fair coin 100 times, we would not be surprised to obtain 48 or 45 or perhaps even 40
heads. However, we would be surprised to obtain only 5 heads. If we were testing a coin for ‘fairness’,
we might even like to decide beforehand what we would consider a reasonable number of heads. In
hypothesis testing, ‘reasonable’ is defined as what we could expect 95% (or 99%, or 90%, etc.) of the
time.
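The coin-tossing intuition can be made precise with the binomial distribution. This sketch (Python 3.8 or later, for math.comb) computes the probability of at most k heads in 100 tosses of a fair coin. Forty heads or fewer happens only about 3% of the time, while 5 or fewer is vanishingly unlikely. The helper name prob_at_most is hypothetical.

```python
from math import comb


def prob_at_most(k: int, n: int = 100, p: float = 0.5) -> float:
    """Binomial Pr(X <= k): at most k heads in n tosses with Pr(head) = p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


for heads in (48, 45, 40, 5):
    print(heads, f"{prob_at_most(heads):.3g}")
```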
In a hypothesis test we are concerned to assess how unusual our result is, whether it is reasonable
chance variation (obtaining 45 heads in 100 tosses of a coin) or whether the result is too extreme to be
considered chance variation (obtaining 5 heads in 100 tosses of a coin). The experiment may consist
of drawing a sample and comparing the sample mean with the population mean for ‘reasonableness’.
A hypothesis test is a formal process with the following steps:
1. State the null and alternative hypotheses
Ho: $\bar{x} = \mu$ [the sample mean is the same as the population mean allowing for chance variation]
Ha: $\bar{x} \neq \mu$ [the sample mean is not the same as the population mean after allowing for chance
variation]
2. A significance level α is chosen [α = 0.05 ⇒ we are defining reasonable as what we can expect
95% of the time]
3. Tables or a calculator or a computer are used to find the z-value that corresponds to the chosen
significance level, e.g. for α = 0.05 the values are z = -1.96 and z = 1.96.
[Diagram: normal curve with a central area of 0.95 between -1.96 and 1.96 and an area of 0.025 in each tail]
These are the thresholds for ‘reasonableness’ and are called the critical values.
4. The test statistic is the standardised difference between the sample mean (calculated
from the given data) and the known population mean: $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, with $\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}}$ [for a large sample where
σ is unknown]
5. A decision is made regarding the ‘reasonableness’ of the test statistic:
“Is the test statistic more extreme than the critical value?”
Yes ⇒ Reject Ho
No ⇒ Do not reject Ho
6. State your conclusion: There is (if you reject)/is not (if you do not reject) evidence to suggest
that….
NB: (i) The steps for hypothesis testing may differ from course to course so check with your
Program.
(ii) The decision relates only to rejecting or not rejecting Ho. Ha is not mentioned in the
decision, and we do not accept Ho or Ha.
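Steps 3 to 5 can be sketched as a two-tailed z-test in Python. The function name and the use of NormalDist for the critical value are our own choices. The usage line applies the function to the bridging-program figures from the example that follows.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def two_tailed_z_test(xbar: float, mu: float, sd: float, n: int, alpha: float):
    """Steps 3 to 5: critical value, test statistic and decision."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 when alpha = 0.05
    z = (xbar - mu) / (sd / sqrt(n))  # standardised difference
    reject = abs(z) > critical  # is the test statistic more extreme?
    return z, critical, reject


z, critical, reject = two_tailed_z_test(xbar=50, mu=48, sd=12, n=120, alpha=0.01)
print(f"z = {z:.2f}, critical values = ±{critical:.2f}, reject Ho: {reject}")
```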
Example
Students had previously found a statistics course very difficult: the average score
over many years was 48% with a standard deviation of 12%. A bridging program was
introduced, and the 120 students who attended achieved a mean score of 50% in the final
exam. Is there evidence, at the 1% level of significance (99% confidence), that the scores of those
who attended the bridging program have changed?
1. Hypotheses: Ho: μ = 48
Ha: μ ≠ 48
2. α = 0.01 [a 1% level of significance, i.e. 99% confidence]
3. Critical values: α = 0.01 ⇒ z = -2.58 or z = 2.58
4. Test statistic: $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{50 - 48}{12/\sqrt{120}} = 1.83$
5. Decision: Is 1.83 more extreme than 2.58? No, therefore do not reject Ho.
6. Conclusion: There is not evidence to suggest that the scores of those who attended the
bridging program have changed*. It is reasonable that the apparent improvement is due to
chance variation.
*[use the wording of the question]
Exercise
1. Repeat the example to decide if there is evidence at the 10% level of significance (90% confidence) that
attending the bridging program is associated with the change in scores.
2. A random sample of 36 soft drinks from vending machines had an average content of 370 mL
with a standard deviation of 20 mL. Test the null hypothesis that μ = 375 mL against the
alternative hypothesis μ ≠ 375 mL at the % significance level.
3. A bank manager has historical data showing that over lunchtime Mon–Fri the mean number of
customers who come into the bank is 32. Accordingly, he believes he has no need to change the
number of tellers. However, a branch survey conducted every lunchtime over eight weeks
found that the mean number of customers was 36 with a standard deviation of 8.2. Conduct a
hypothesis test at the 5% level of significance to test whether the mean number of lunchtime
customers has changed. What recommendation would you make to the bank manager?
4. The manufacturer of ‘longlast’ batteries claims the mean lifetime of his batteries is 450 hours.
A consumer interest magazine samples 100 batteries and finds that they have a mean of 444
hours with a standard deviation of 28 hours. Do the sample data contradict the manufacturer’s
claim? [use α = 0.02]
Answers
Your answers should be set out and contain all the steps shown above. A brief outline of the main features is given below:
1. Test statistic = 1.83 ⇒ reject Ho: evidence of change in scores.
2. Test statistic = -1.5 ⇒ do not reject Ho: difference consistent with chance variation
3. Test statistic = 3.09 ⇒ reject Ho: evidence of increase in number of lunchtime customers and therefore need more
tellers
4. Test statistic = -2.14 ⇒ do not reject Ho: the difference is consistent with chance variation and there is no evidence
to contradict the claim that the mean battery life is 450 hours.
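The following sketch checks the exercise answers. For exercise 3 it assumes n = 40, i.e. eight weeks of Monday to Friday lunchtimes. For exercise 2 it assumes a hypothesised mean of 375 mL, the value consistent with the quoted test statistic of -1.5. Because the significance level for exercise 2 is not legible in the source, only its test statistic is computed. A value of |z| = 1.5 would not lead to rejecting Ho at any of the usual levels (10%, 5% or 1%).

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def z_statistic(xbar: float, mu: float, sd: float, n: int) -> float:
    """Standardised difference between a sample mean and the hypothesised mean."""
    return (xbar - mu) / (sd / sqrt(n))


def critical_value(alpha: float) -> float:
    """Two-tailed critical value: alpha / 2 of the area in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# (test statistic, alpha) for each exercise; n = 40 in exercise 3 is an assumption.
checks = {
    "1": (z_statistic(50, 48, 12, 120), 0.10),
    "3": (z_statistic(36, 32, 8.2, 40), 0.05),
    "4": (z_statistic(444, 450, 28, 100), 0.02),
}

results = {}
for label, (z, alpha) in checks.items():
    results[label] = abs(z) > critical_value(alpha)
    print(f"Exercise {label}: z = {z:.2f}, reject Ho: {results[label]}")

# Exercise 2: significance level not legible in the source, so only z is shown.
print(f"Exercise 2: z = {z_statistic(370, 375, 20, 36):.2f}")
```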