HOW-TO-DO-IT
Building a Phylogenetic Tree
of the Human & Ape Superfamily
Using DNA-DNA Hybridization Data
Systematics is a significant and dynamic discipline within the field of evolutionary biology
whose products provide the foundation for other
evolutionary and biological studies (Futuyma,
1998). Like many other biology textbooks,
Campbell’s Biology (Sixth Edition) has an informative section on phylogeny, with diagrams of several
phylogenetic trees and a good general discussion of
the principle that DNA sequence similarity provides
exceptionally powerful insights into species’ relationships.
But specifically, how can the genetic difference
among a group of related species be determined?
How do DNA difference data reveal evolutionary
relationships? And, how are phylogenetic trees created? To truly understand the principles of phylogeny, students need to go beyond textbook reading assignments and use real data to develop phylogenetic trees of actual species. Unfortunately, activities for high school students and undergraduates
that demonstrate principles of phylogeny are hard to come by, and many of those that exist are simulations employing hypothetical species.
CAROLINE ALEXANDRA MAIER teaches undergraduate ecology courses and is a graduate student in evolutionary biology at Rutgers University, Newark, NJ 07102; e-mail: cmaier@andromeda.rutgers.edu.
560 THE AMERICAN BIOLOGY TEACHER, VOLUME 66, NO. 8, OCTOBER 2004
In this 90-minute activity, students learn how
the early technique of DNA-DNA hybridization
reveals the genetic difference between species. Then
they use actual genetic difference data to create a
phylogenetic tree of the human and ape superfamily, Hominoidea. They calibrate their DNA clock and
use it to estimate the divergence dates of the various
branches on the tree. The activity expands theoretical studies of DNA-DNA hybridization, molecular
phylogeny, and DNA clocks by allowing students to
use real DNA difference data to understand species’
evolutionary relationships. Since the generated tree
shows the relationships among humans and our
closest relatives, the activity can also serve as an
introduction to a unit on human evolution.
This activity addresses multiple National Science
Education Standards for grades 9-12: A—Science as
Inquiry (Abilities Necessary to do Scientific Inquiry,
Understandings about Scientific Inquiry), C—Life
Science (The Molecular Basis of Heredity, Biological
Evolution), and G—History and Nature of Science
(Historical Perspectives). Although developed for
advanced high school students and undergraduates
in general biology courses, this activity can be
adapted for younger students by leaving out the
concept-laden section on DNA-DNA hybridization.
Instead of generating their own genetic difference table,
the students can use the data in completed tables to create phylogenetic trees, turning this into a simpler puzzle-solving activity.
Background
One of the important aspects of evolution is that it
gives a historical context to biological diversity. In the
eighteenth century, Carolus Linnaeus organized species
into a hierarchy of increasingly inclusive groups based
on physical similarity. However, because Linnaeus predated Darwin, his taxonomy did not address familial
relationships among species. For example, Linnaean
classification could recognize the striking physical similarities between pygmy and common chimpanzees, but
it could not address why these similarities exist. By recognizing that physical similarity can mirror familial relationships, Darwin’s Law of Common Descent added a
new historical dimension to Linnaean taxonomy and
initiated the discipline of systematics (Campbell, 2002).
The goal of systematics is to understand the history, or phylogeny, of groups of related species. To do this,
systematists compare physical similarities among several species, use the comparisons to infer the species’ evolutionary backgrounds, and express the results in a phylogenetic tree (Diamond, 1992; Campbell, 2002).
The earliest systematists compared large-scale morphological characteristics. Current researchers still use
these valuable features, but advances in molecular biology now allow them to also include DNA sequence comparisons in their studies (Diamond, 1992; Campbell,
2002). Since DNA is the fundamental unit of inheritance, sequence comparisons provide
powerful insights into hereditary relationships. At the time two populations split from their common ancestor, they initially carry nearly identical pools of DNA sequences inherited from the ancestral species. Over time, independent mutations occur throughout the genomes of the two lineages, and this decreases their genetic similarity. The longer the time since two species diverged from their common ancestor, the greater the genetic difference. Since the average rate of DNA sequence evolution appears to be uniform and constant in related lineages, DNA divergence acts like a smoothly ticking clock (Sibley & Ahlquist, 1984). Therefore, difference in DNA sequence can be used to determine the pattern of relative divergence events among a group of related species.
This can be depicted as a branching pattern that shows
the group’s evolutionary relationships. Figure 1 illustrates how DNA difference data are used to develop a
phylogenetic tree for a hypothetical group of species.
If the actual divergence time of a pair of species in
the group is known from the fossil record, the DNA
clock can be calibrated to absolute values, so that DNA
difference data estimate the date when pairs of species
diverged from their common ancestor (Sibley & Ahlquist,
1984). Figure 1 explains how a hypothetical DNA clock
is calibrated.
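The calibration arithmetic in the Figure 1 example can be checked with a minimal Python sketch. The numbers (a W-Z split dated to 8,000 years ago, four character states for W and Z, three for Y) come from the Figure 1 caption; the function and variable names are illustrative assumptions, not part of the original activity.

```python
def years_per_state(known_divergence_years, states_since_divergence):
    """Calibrate the clock: years that elapse per genetic character state."""
    return known_divergence_years / states_since_divergence

# Figure 1 example: the hypothetical fossil record dates the W-Z split to
# 8,000 years ago, and four character states have accumulated since then.
rate = years_per_state(8_000, 4)

# The lineage leading to species Y shows three character states.
y_divergence_years = 3 * rate

print(rate)                # 2000.0 years per character state
print(y_divergence_years)  # 6000.0 years since Y split from the main trunk
```

The same proportional reasoning is what students apply later when they scale the time axis of the hominoid tree.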
DNA difference data were first applied to taxonomy
in the 1970s by Charles Sibley and Jon Ahlquist. They
used the newly developed technique of DNA-DNA
hybridization to measure the amount of DNA sequence
difference among bird species, and their work in avian
taxonomy was pivotal. By the 1980s, Sibley and Ahlquist
shifted their attention to the history of humans and our
closest relatives: the two chimpanzee species, gorilla,
orangutan, and the two gibbon species (Sibley &
Ahlquist, 1984; Diamond, 1992). They constructed a
phylogenetic tree of the human and ape superfamily,
Hominoidea, using the Old World monkeys
(Cercopithecoidea) as an outgroup. Since the dates at
which the Old World monkey and orangutan clades
diverged from the main Hominoidea trunk are known
from the fossil record, the DNA clock could be calibrated and used to date the divergence times of members of
the superfamily (Sibley & Ahlquist, 1984).
In this activity, students learn how DNA-DNA
hybridization measures genetic difference, then use
Sibley and Ahlquist’s genetic distance data to create their
own phylogenetic tree of the hominoid superfamily.
Figure 1. The use of genetic difference data in creating and calibrating a phylogenetic tree.
When two populations initially diverge from their common ancestor, they carry nearly identical pools of DNA sequences inherited from the common ancestor. After divergence, independent mutations occur continuously throughout the genomes of the two lineages, and this gradually decreases their genetic similarity over time. The longer the time since two species diverged from their common ancestor, the greater the genetic difference (A). In this example, we can assume the lineage leading to modern species Y diverged more recently from those of species W and X than did Z, since species Y is more genetically similar to species W and X than is Z. Since the average rate of DNA sequence evolution appears to be uniform and constant in related lineages, genomic difference data can be used to determine relative divergence events among a group of related species (Sibley & Ahlquist, 1984). This information can then be used to develop a branching pattern that shows the group’s evolutionary relationships. Notice that the principle of constant average rate of DNA sequence evolution means that all branches stemming from a divergence point on the tree (B) have identical numbers of genetic character states, as represented by horizontal slashes. In this example, the lineages of species W, X, and Y all reflect three genetic character states since the time they diverged from their common ancestor, represented on the tree by a solid circle. Dashed arrows trace the evolution of species W, X, and Y. Numbers on each arrow reflect the order of the three character states in each lineage. If the actual divergence times of a pair of species in the group are known from the fossil record, the DNA clock can be calibrated to absolute values. A calibrated clock uses DNA difference data to estimate the actual date when pairs of species diverged from their common ancestor. In the example (B), the hypothetical fossil record shows that species W and Z diverged 8,000 years ago. Since W and Z have experienced four genetic character states since they diverged from their common ancestor, genetic character states occur approximately every 2,000 years within this group. The lineage leading to modern species Y must have diverged from the main trunk of the tree approximately 6,000 years ago since three character states have occurred.
(A) Percent Genetic Difference Between Pairs of Hypothetical Species
       W     X     Y     Z
W      -     10    30    40
X      10    -     30    40
Y      30    30    -     40
Z      40    40    40    -
(B) [Tree diagram: species W, X, Y, and Z at the branch tips. Horizontal slashes numbered 1 to 3 mark the character states along the lineages of W, X, and Y, and a solid circle marks their common ancestor.]
They calibrate the DNA clock and use it to
estimate the divergence dates of the various
branches on the tree.
The Activity
After a general overview of systematics, I
use a simulation activity to introduce the idea
of genetic difference, the technique of DNA-DNA hybridization, and the principles of creating phylogenetic trees from genetic difference data. Pairs of students each receive an
envelope containing homologous 20-base
pair DNA fragments from five hypothetical
species (Figure 2). The 5'-3' and 3'-5' strands
of each fragment are on separate strips of
paper, so we begin by reviewing DNA structure and base pairing rules as the students
match complementary strands. The students
count the number of hydrogen bonds holding
each double-stranded fragment together and
discover that the fragments from species B
and C would denature at a higher temperature than the others since they are held by a
greater number of H bonds (Figure 3).
I explain that DNA-DNA hybridization
was an early method for comparing the similarity between two DNA sequences before
sequencing technology was developed. DNA
from two species is denatured, brought
together, and allowed to condense into a
hybrid molecule composed of a single
strand from each species. The homologous
sequences from the two species are similar,
but not identical due to the independent
mutations that have occurred since the two
lineages split from their common ancestor.
Hydrogen bonds do not form when mutations have resulted in the pairing of noncomplementary bases, so hybrid molecules
denature at a lower temperature than
parental fragments. There is a direct relationship between the genetic difference of
two species’ DNA and the amount by which
the melting point of the hybrid is reduced,
called the melting point depression. A mathematical model converts actual depression
values to percent genetic difference between
two species’ DNA. So even though DNA
sequencing technology was not yet readily
available in the 1970s, researchers were
using melting point depression to accurately
estimate genetic difference among groups of
related species.
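The counting that students do with the paper strips can be mirrored in a short Python sketch. It models the classroom simulation only, not the laboratory measurement of melting point depression: it assumes, as the text does, that mismatched positions in a hybrid form no hydrogen bonds. The two strands are the 5'-3' strands of hypothetical Species D and E from Figure 2; the function names are illustrative assumptions.

```python
# Hydrogen bonds per correctly paired base: A-T pairs form 2, C-G pairs form 3.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}

def percent_difference(strand_x, strand_y):
    """Percent of aligned positions at which two homologous strands differ."""
    mismatches = sum(a != b for a, b in zip(strand_x, strand_y))
    return 100 * mismatches / len(strand_x)

def hybrid_h_bonds(strand_x, strand_y):
    """H bonds in a hybrid of strand_x paired with the complement of strand_y.

    Positions where the two species differ are mismatched in the hybrid
    and are counted as contributing no bonds.
    """
    return sum(H_BONDS[a] for a, b in zip(strand_x, strand_y) if a == b)

# 5'-3' strands of hypothetical Species D and E, copied from Figure 2.
species_d = "TGGCCCATTATACTGGGCAA"
species_e = "TGGCCCATTATATCGGGCAA"

print(hybrid_h_bonds(species_d, species_d))      # native D molecule: 50
print(hybrid_h_bonds(species_d, species_e))      # D-E hybrid: 45
print(percent_difference(species_d, species_e))  # 10.0
```

These values agree with the D and E entries of Figure 3: 50 bonds in the native molecule, 45 in the hybrid, and a 10% genetic difference.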
Figure 2. Homologous 20-base pair DNA fragments from five hypothetical species.
To use these fragments in the DNA-DNA hybridization simulation, enlarge and copy each species’ fragment on its own color of paper, then cut the 5'-3' and 3'-5' strands apart so that students can form hybrid molecules.
Species A   5' GCGACCCTTTTCAAAGGCCA 3'
            3' CGCTGGGAAAAGTTTCCGGT 5'
Species B   5' AGGGCCTTTATCCCGGTCAA 3'
            3' TCCCGGAAATAGGGCCAGTT 5'
Species C   5' TAGGCCATTCTACCGGGCAA 3'
            3' ATCCGGTAAGATGGCCCGTT 5'
Species D   5' TGGCCCATTATACTGGGCAA 3'
            3' ACCGGGTAATATGACCCGTT 5'
Species E   5' TGGCCCATTATATCGGGCAA 3'
            3' ACCGGGTAATATAGCCCGTT 5'
Students see this direct relationship between the
number of H bonds and genetic difference as they collect data on the hybrids formed by the five hypothetical
species and fill in Figure 3, written on the board as a
large table.
Once the students understand how DNA-DNA
hybridization generates genetic difference data, I show
them how to use the data to create a phylogenetic tree
of the five hypothetical species. On the finished tree
(Figure 4), as on all phylogenetic trees, time is represented vertically. Modern species are listed along the
top of the space delineated by the axes, connected by
branches that reach down through the space, and back
into time, eventually splitting from the main trunk.
Divergence points on the tree represent the common
ancestor shared by the splitting lineages,
and their vertical placement indicates
the relative time the event occurred.
Since the genetic difference among
species is related to the time since lineages split, it is also represented as a vertical axis in Figure 4. Tracing any two
species back to their common ancestor
on the branching diagram reveals the
percentage by which their DNA
sequences differ.
The genetic difference table for the
five hypothetical species (Figure 3) contains all the information needed to create the branching pattern on a copy of
the blank axes (Figure 5). First, students
root the tree by identifying the outgroup, a species or group only distantly
related to the study species. In the simulation, Species A is the outgroup since
its DNA differs from the others by the
greatest amount (50%). Students begin
creating the diagram by drawing a deep
“V” in the area delineated by the axes,
with the divergence point positioned
across from a 50% genetic difference
point on the vertical axis, as shown in
Figure 6. They label the top of the right
arm “Species A.” The empty left arm of
the V is now ready to be expanded into
a tree.
Creating the tree is much like solving a jigsaw puzzle. When all the species
are in the correct place on the tree, it is
possible to trace any pair back to their
common ancestor and read their genetic
difference off the left axis. Students can
most easily build the tree by first identifying the two most closely related
species on the genetic difference table
(D and E), drawing a branch from the main trunk of the
tree at a point corresponding to the correct genetic distance between the taxa, and labeling the end of the
branch with the name of the appropriate species. More
distantly related lineages are then added to the tree until
it is complete (Figure 4).
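The pencil-and-paper procedure (join the closest pair first, then add more distant lineages) resembles a simple clustering routine. The sketch below is an assumption on my part rather than a method named in the activity: it uses average-linkage clustering, in the spirit of UPGMA, on the percent differences from Figure 3. All function names are hypothetical.

```python
from itertools import combinations

# Percent genetic difference between the hypothetical species (Figure 3).
DIFF = {
    ("A", "B"): 50, ("A", "C"): 50, ("A", "D"): 50, ("A", "E"): 50,
    ("B", "C"): 30, ("B", "D"): 30, ("B", "E"): 30,
    ("C", "D"): 20, ("C", "E"): 20,
    ("D", "E"): 10,
}

def species_difference(x, y):
    """Look up a pair in either order."""
    return DIFF[(x, y)] if (x, y) in DIFF else DIFF[(y, x)]

def cluster_difference(group_x, group_y):
    """Average percent difference between the members of two groups."""
    pairs = [(x, y) for x in group_x for y in group_y]
    return sum(species_difference(x, y) for x, y in pairs) / len(pairs)

def build_tree(species):
    """Join the two closest groups repeatedly; return the joins in order."""
    groups = [(name,) for name in species]
    joins = []
    while len(groups) > 1:
        x, y = min(combinations(groups, 2),
                   key=lambda pair: cluster_difference(*pair))
        joins.append((x, y, cluster_difference(x, y)))
        groups = [g for g in groups if g not in (x, y)] + [x + y]
    return joins

for left, right, level in build_tree("ABCDE"):
    print(left, "joins", right, "at", level, "% difference")
```

Each join corresponds to a branch point drawn across from that percentage on the left axis: D and E at 10%, C at 20%, B at 30%, and the outgroup A at 50%.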
As students finish the sample tree, I point out that
Species B has the same genetic difference to Species C as
it does to Species D and E (30%). Since evolutionary
rates are fairly uniform in related lineages, all groups
branching from a single point should have approximately equal genetic difference. Slight differences in evolutionary rates and chance events, among other things,
can perturb this pattern, but it is widespread enough that systematists use this “relative rate test” to double-check the trees they develop.
Now that students have learned to create phylogenetic trees from DNA difference data, they are ready to work with real data. I introduce this part of the activity by showing students pictures of the common chimpanzee, pygmy chimpanzee, common gibbon, siamang gibbon, gorilla, human, and orangutan. I ask student groups to predict the evolutionary relationships among the species by organizing them into a possible tree. After recognizing the diversity in the students’ ideas, I explain that systematists traditionally argued over the relationships among these species. In an attempt to clarify the phylogeny of this group, Charles Sibley and Jon Ahlquist used DNA-DNA hybridization to collect genetic difference data on these species. I challenge the groups to use Sibley and Ahlquist’s data to create a phylogenetic tree of the hominoid superfamily. Copies of the pair-wise genetic difference data table (Figure 7) and empty axes (Figure 5) provide the necessary information and structure. This tree is more difficult than the sample the students have just completed, so it is best to work in pencil with an eraser nearby. The split into the two species within the main gibbon lineage is perhaps the greatest challenge because divergence within a side branch, although common in nature, is initially unexpected by the students. This might confuse some groups, but with a hint or two these students will solve the problem and end up with the correct final tree (Figure 8), which they can confirm with a relative rate test.
Figure 3. The genetic difference and number of hydrogen bonds between homologous 20-base pair DNA fragments from the five hypothetical species used in the DNA-DNA hybridization simulation.
Complementary strands from two different species were aligned. To determine percent genetic difference, the number of mismatched bases in the hybrid molecule was divided by the total number of bases in the fragment (20) and the resulting quotient was multiplied by 100. To determine the number of hydrogen bonds that would hold the hybrid molecule together, two bonds for each A-T pair and three bonds for each C-G pair were summed. The diagonal (darkly shaded in the original figure, shown in brackets here) gives the number of hydrogen bonds found in the native, non-hybrid molecules. Boxes below the diagonal show the number of hydrogen bonds holding together each hybrid molecule, while boxes above the diagonal show the percent genetic difference between pairs of molecules. Since the bottom left and top right portions of the table are mirror images of each other, it is possible to compare the number of hydrogen bonds between hybrid molecules to their percent genetic difference by looking across the diagonal.
       A      B      C      D      E
A     [50]    50     50     50     50
B      24    [51]    30     30     30
C      26     37    [51]    20     20
D      26     36     40    [50]    10
E      26     36     40     45    [50]
Above the diagonal: % Genetic Difference. Below the diagonal: Number of Hydrogen Bonds.
Figure 4. Phylogenetic tree of the five hypothetical species.
This diagram shows the branching pattern of the lineages leading to modern species A, B, C, D, and E. Tracing any two species back to their branch point allows their genetic difference to be read from the left vertical axis.
[Tree diagram: species E, D, C, B, and A across the top (Present). Left axis: % Difference in DNA, 0 to 50. Right axis: Time, from Present to Deep Past.]
Figure 5. Blank axes to use when creating the phylogenetic trees of the five hypothetical species (A) and of the Hominoidea superfamily (B).
These axes are adapted from the diagram used by Sibley and Ahlquist (1984) and shown in The Third Chimpanzee (Diamond, 1992). The left axis corresponds to percentage of DNA difference, while the right axis depicts time and is used to calibrate a DNA clock to the tree. An enlarged copy of these axes guides students in creating their simulated phylogenetic tree.
[Blank axes (A): left axis % Difference in DNA, 0 to 50; right axis Time, from Present to Deep Past. Blank axes (B): left axis % Difference in DNA, 0 to 8; right axis Millions of Years Ago, starting at 0 (Present).]
Figure 6. Rooting the simulated phylogenetic tree of the five hypothetical species.
This diagram shows how the tree is started using Species A as the outgroup. The tree will be built along the left arm of the V. Notice that the branching point of the tree corresponds to a genetic difference of 50% since Species A differs from the other species by this amount.
[Diagram: a deep V with Species A labeled at the top of the right arm (Present). Left axis: % Difference in DNA, 0 to 50. Right axis: Time, from Present to Deep Past.]
As each group finishes its tree, I give the students
the information they need to calibrate the DNA clock.
This is a simple matter of relating the two vertical axes
of genetic difference and time to each other. I tell the
students that the fossil record shows that the Old World
monkey and ape lineages, which differ in 7.3% of their
DNA, diverged approximately 30 million years ago. The
fossil record also indicates that the orangutan lineage
branched from the gorilla/chimp lineage approximately
16 million years ago. Orangutans differ genetically from
the other great apes by approximately 3.7% (Sibley & Ahlquist,
1984; Diamond, 1992). I challenge the students to calibrate the DNA clock so that it converts the genetic difference between pairs of species into their approximate
divergence date. This is easily done by scaling the Years
axis on the right side of the tree diagram. Students need
only to place 30 on the Year axis across from the corresponding genetic difference of 7.3%, 16 across from the
genetic difference of 3.7%, and divide the rest of the axis
proportionally (Figure 8).
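Scaling the time axis amounts to interpolating between the calibration points. The following Python sketch shows one way to do that, using the two fossil dates given in the text (7.3% at 30 million years and 3.7% at 16 million years). The origin point (0% difference at the present day), the piecewise-linear interpolation, and the function name are my assumptions about what "divide the rest of the axis proportionally" means in practice.

```python
# Calibration points: (percent DNA difference, millions of years ago).
# The origin (0 % difference, present day) is an added assumption.
CALIBRATION = [(0.0, 0.0), (3.7, 16.0), (7.3, 30.0)]

def divergence_mya(percent_difference):
    """Estimate a divergence date by interpolating between calibration points."""
    for (d0, t0), (d1, t1) in zip(CALIBRATION, CALIBRATION[1:]):
        if d0 <= percent_difference <= d1:
            fraction = (percent_difference - d0) / (d1 - d0)
            return t0 + fraction * (t1 - t0)
    raise ValueError("difference lies outside the calibrated range")

# Example percent differences taken from Figure 7.
for pair, diff in [("human / chimpanzees", 1.6),
                   ("common / siamang gibbon", 2.2),
                   ("gibbons / other apes", 5.0)]:
    print(f"{pair}: about {divergence_mya(diff):.1f} million years ago")
```

The estimates are only as good as the two fossil dates and the assumption of a steady clock, so students should treat them as approximate.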
Students then use their calibrated DNA clock to
determine the approximate divergence time for the various lineages on the tree. They identify our closest and
most distant relatives among the ape species and determine whether the two chimpanzee species are most
closely related to gorillas or to humans. Finally, they
identify which pair is most closely related: the two gibbon species or humans and chimpanzees.
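Students can also confirm their finished tree with the relative rate test described earlier: every lineage that branches from a single point should be about equally different from the lineage on the other side of that point. The Python sketch below applies that check to rows of Figure 7. The tolerance value and the function name are hypothetical choices for illustration.

```python
# Rows copied from Figure 7: percent DNA difference from one reference
# lineage to each member of the group that branched away from it.
OLD_WORLD_MONKEYS = {
    "Human": 7.3, "Common Chimp": 7.3, "Pygmy Chimp": 7.3, "Gorilla": 7.3,
    "Orangutan": 7.3, "Common Gibbon": 7.3, "Siamang Gibbon": 7.3,
}
COMMON_GIBBON = {
    "Human": 5.0, "Common Chimp": 5.0, "Pygmy Chimp": 5.0,
    "Gorilla": 5.0, "Orangutan": 5.0,
}

def passes_relative_rate_test(differences, tolerance=0.2):
    """True if every lineage is about equally different from the reference."""
    values = list(differences.values())
    return max(values) - min(values) <= tolerance

print(passes_relative_rate_test(OLD_WORLD_MONKEYS))  # True
print(passes_relative_rate_test(COMMON_GIBBON))      # True

# A misdrawn tree that grouped chimpanzees and gorillas on one branch
# opposite humans would fail, because humans differ from them unequally.
print(passes_relative_rate_test({"Common Chimp": 1.6, "Gorilla": 2.3}))  # False
```

A failed check tells a group that the lineages they placed on one branch do not actually share a single divergence point.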
Figure 7. Genetic difference matrix of species in the superfamily Hominoidea.
This matrix, developed from Sibley and Ahlquist’s DNA-DNA hybridization data (Sibley & Ahlquist, 1984) and used in The Third Chimpanzee (Diamond, 1992), shows the approximate genetic difference, in percent, between the genomes of all pairs of ape and human species.
                   Human      Common     Pygmy      Common     Siamang    Gorilla    Old World  Orangutan
                              Chimp      Chimp      Gibbon     Gibbon                Monkeys
Human              -          1.6        1.6        5.0        5.0        2.3        7.3        3.6
Common Chimp       1.6        -          0.7        5.0        5.0        2.3        7.3        3.6
Pygmy Chimp        1.6        0.7        -          5.0        5.0        2.3        7.3        3.6
Common Gibbon      5.0        5.0        5.0        -          2.2        5.0        7.3        5.0
Siamang Gibbon     5.0        5.0        5.0        2.2        -          5.0        7.3        5.0
Gorilla            2.3        2.3        2.3        5.0        5.0        -          7.3        3.6
Old World Monkeys  7.3        7.3        7.3        7.3        7.3        7.3        -          7.3
Orangutan          3.6        3.6        3.6        5.0        5.0        3.6        7.3        -
Figure 8. Phylogenetic tree of the superfamily Hominoidea.
This diagram, based on Sibley and Ahlquist’s DNA-DNA hybridization data and tree (1984) and included in The Third Chimpanzee (Diamond, 1992), shows the branching pattern of human and ape lineages. Tracing any two species back to their branch point allows their genetic difference to be read from the left axis. The right axis calibrates the molecular clock by relating actual divergence dates for the Old World monkey and orangutan clades known from the fossil record to their percentage of genetic difference. The date when any pair of species diverged from their common ancestor can be determined by tracing the two lineages back to their divergence point and reading the divergence date off the right axis.
[Tree diagram: Common Chimp, Pygmy Chimp, Human, Gorilla, Orangutan, Common Gibbon, Siamang Gibbon, and Old World Monkeys across the top. Left axis: % Difference in DNA, 0 to 8. Right axis: Millions of Years Ago, 0 to 30.]
Extension
As groups finish their trees, they check their work by reading pages 16-25 of Jared Diamond’s The Third Chimpanzee. This delightful section describes the process of DNA-DNA hybridization and the history of its use by Sibley and Ahlquist in simple, straightforward, and interesting language that students easily understand. It describes Sibley and Ahlquist’s data, shows a diagram of the tree generated by the data, and explains how a clock can be applied to the tree. In addition, it describes the tree’s implications for human and chimpanzee classification. Having just finished using the actual data to create the tree described by Diamond, students become engrossed in the reading and understand the concepts more thoroughly than they otherwise would. Many students develop a real interest in The Third Chimpanzee and continue reading it outside of class. When this happens, I know the activity has done what I hoped it would: make evolution relevant and interesting to my students.
References
Campbell, N. A. & Reece, J. B. (2002). Biology (Sixth Edition). San Francisco, CA: Benjamin/Cummings Publishing Company, Inc.
Diamond, J. (1992). The Third Chimpanzee. New York City, NY: Harper Collins Publishers, Inc.
Futuyma, D. J. (1998). Evolutionary Biology (Third Edition). Sunderland, MA: Sinauer Associates, Inc.
Sibley, C. G. & Ahlquist, J. E. (1984). The phylogeny of hominoid primates, as indicated by DNA-DNA hybridization. Journal of Molecular Evolution, 20(1), 2-15.