Electronic Supplementary Material Details of sample collection, lab, and analytical methods

Sample collection—Stool samples were collected opportunistically from the forest floor in
Ndundulu in 2005-2007 while following groups of kipunji for ecological study (Jones, unpubl.).
The locations of freshly excreted samples (Supplementary Table 1) were recorded using
handheld GPS units (Garmin 60Cx, Garmin, UK), before placing each individual scat in
RNAlater stabilization solution (Ambion, Inc., Austin, TX). Based on the collection locations,
dates, and conditions, we are confident that the sequenced samples come from six different
animals. All samples were exported under permits from the Wildlife Division of Tanzania and
the Convention on International Trade in Endangered Species.
A male subadult kipunji was found dead by Claire Bracebridge on 15 July 2008 in Livingstone
Forest (Rungwe District, Mbeya Region, Tanzania; 9.20483° S, 33.89046° E, WGS84, 1872 m),
approximately 2.5 km east of Kiguru village. This specimen was prepared as a museum study
skin, skull, and fluid-preserved carcass and is deposited at the Wildlife Conservation Society
(WCS) Southern Highlands Conservation Programme office, Mbeya, Tanzania (SHCP 2458).
Muscle and kidney samples were preserved in EDTA buffer solution. The second specimen was
a living but injured kipunji being held by residents of Syukula village on Mt. Rungwe
(09.166510° S, 33.63304° E, 1770 m), discovered by Noah Mpunga on 1 May 2007. The tail tip
of this animal had been cut, and Mpunga was able to remove a small amount of tissue from the
wound before releasing it otherwise unharmed (tissue sampling was conducted in accordance
with the guidelines of the American Society of Mammalogists' Animal Care and Use Committee
[Gannon et al. 2007]). This sample was delivered to and exported by WTS and is logged in his
field catalogue (archived at the Field Museum, Chicago, IL, USA) as WTS 9308.
Lab methods—We extracted DNA from stool samples using the Qiagen DNA Stool Mini
(QIAGEN, Hilden, Germany) protocol for liquid samples, and from tissue samples using the
PureGene Animal Tissue kit (Gentra Systems Inc., Minneapolis, MN, USA). Extractions from
stool samples were done in a separate lab in a PCR-free building. We PCR-amplified DNA
fragments in 10-25 µL reactions with Promega GoTaq (Promega Corp., Madison, WI) following
the manufacturer’s recommended PCR protocol. PCR amplification of DNA from stool samples
also included bovine serum albumin (BSA; New England Biolabs, Ipswich, MA) at a
concentration of 0.1 mg/mL. We used the following primers: for 12S rRNA, L1091 and H1478
(Kocher et al. 1989); for COI, OWMCO-If and OWMCO-Ir (Lorenz et al. 2005); for COII, CO2F2
(Davenport et al. 2006) and BCO2R1 (Switzer et al. 2005); and for ND4/5, A896LF,
A896HR, B896LF, B896HR, or H12652 (Newman et al. 2004). We purified 20 µL of the
amplicons using 0.25 µL exonuclease I, 0.50 µL shrimp alkaline phosphatase, and 2.0 µL 10x
buffer (USB Corp., Cleveland, OH, USA) at 37°C for 15 minutes followed by 80°C for 15
minutes, or by vacuum filtration through a Millipore HTS plate (Millipore Corp., Billerica, MA,
USA) following the manufacturer’s instructions. Amplicons were cycle sequenced in both
directions using the amplification primers and ABI BigDye Terminator v3.1 chemistry (Applied
Biosystems, Foster City, CA), purified by centrifugal filtration through Sephadex G-50 fine
(Amersham Biosciences, Uppsala, Sweden) in a multiscreen filter plate (Millipore Corp.,
Billerica, MA, USA), and sequenced on an ABI 3130 or ABI 3130xl automated sequencer.
Sequences were visualized, edited, and assembled in Sequencher 4.8 (GeneCodes, Ann Arbor,
MI). Not all fragments were sequenced for all stool samples (Table 1). Sequences have been
deposited in GenBank under accession numbers GU068059–GU068086.
Taxon sampling—We downloaded representative sequences for non-papionin outgroups and all
available papionin sequences for the sequenced fragments, excluding those thought to be nuclear
copies (numts) by their authors. These fragments allow us to maximize our comparison to Papio
sequences previously published by other investigators, especially Zinner et al. (2009), Burrell et
al. (2009), Wildman et al. (2004), Newman et al. (2004), Switzer et al. (2005), van der Kuyl et
al. (1995), and Lorenz et al. (2005), so the data sets for different fragments contain substantially
different sampling within Papio.
All mitochondrial genes are linked and therefore share a phylogenetic history, so concatenation
is justified and should provide more power than analyses of single fragments. However,
individual analyses can help to identify unexpected conflict, which could be a sign of
contamination or the amplification of nuclear copies of mitochondrial genes (numts). A different
sample of Papio sequences is available for each of these fragments, so individual analyses also
allowed us to make the greatest use of comparative sequence data. For outgroups and
cercopithecines outside Papio, we concatenated sequence fragments from a single individual (for
example, from complete mitochondrial genomes) wherever possible but from different
individuals where necessary to maintain adequate taxon sampling. Sampling of individuals
within Papio is critical in this study, and concatenating fragments from different individuals at
this level could influence results, so we excluded Papio individuals in the combined analysis for
which COI, COII, and 12S were not all available. Only one Papio sequence includes all four
fragments, so we excluded the ND4/ND5 fragment from the combined analysis. Because all
Rungwecebus stool samples were identical for all sequenced genes, we included this Ndundulu
haplotype only once in the analyses.
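The inclusion rule for the combined analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout, and the toy sequences are hypothetical and are not taken from the scripts used in this study. An individual is kept only if 12S, COI, and COII are all available, and its fragments are then concatenated in a fixed order.

```python
from typing import Dict, Tuple

# Fragments required for the combined analysis (ND4/ND5 is excluded, as in the text).
REQUIRED: Tuple[str, ...] = ("12S", "COI", "COII")


def concatenate_complete(
    records: Dict[str, Dict[str, str]],
    required: Tuple[str, ...] = REQUIRED,
) -> Dict[str, str]:
    """Concatenate fragments for individuals that have every required fragment.

    `records` maps an individual's label to a {fragment name: aligned sequence}
    dictionary. Individuals missing any required fragment are dropped.
    """
    combined = {}
    for individual, fragments in records.items():
        if all(fragments.get(name) for name in required):
            combined[individual] = "".join(fragments[name] for name in required)
    return combined


# Hypothetical toy data, not real sequences.
toy = {
    "papio_A": {"12S": "ACGT", "COI": "GGCC", "COII": "TTAA"},
    "papio_B": {"12S": "ACGT", "COI": "GGCC"},  # lacks COII, so it is dropped
}
print(concatenate_complete(toy))
```

Dropping incomplete individuals rather than padding them with missing data mirrors the decision described above for Papio; for outgroups, where fragments from different individuals were combined, the input dictionary would simply be built per taxon instead of per individual.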
Mitochondrial sequence verification—The amplification of nuclear copies of mitochondrial
genes (numts) is often a concern with mitochondrial data. We used two analytical approaches to
screen for possible numt contamination. First, we checked every individual fragment to make
sure it conformed to the expected characteristics of a true mitochondrial sequence, including base
composition, absence of frameshift mutations and stop codons (for coding genes), and absence of
mutations in conserved, pairing stem regions (for 12S rRNA and the three tRNAs). Second, we
analyzed the four individual fragments separately. It is unlikely that all four of our sequence
fragments are parts of a single numt insertion, as most numts are shorter than the span of the
mitochondrial genome that would involve. However, it is also unlikely that four independent
numt insertions would result in the same gene tree topology for the separate fragments. Our
sequences have the characteristics of typical mammalian mtDNA, and the four individual data
sets yield consistent phylogenetic results for the stool samples; together, these factors make it
extremely unlikely that our results are due to accidental numt amplification.
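One part of the first screen, the check for premature stop codons in coding fragments, can be illustrated as follows. This is a minimal sketch and not the procedure actually used here: the function name and example sequences are hypothetical, and it assumes the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA, and AGG are stop codons while TGA encodes tryptophan.

```python
from typing import List

# Stop codons under the vertebrate mitochondrial genetic code.
VERT_MT_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def in_frame_stops(seq: str, frame: int = 0) -> List[int]:
    """Return 0-based start positions of in-frame stop codons.

    `frame` (0, 1, or 2) is the offset of the first complete codon.
    Codons containing gaps or ambiguity codes never match and are ignored.
    """
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in VERT_MT_STOPS
    ]


# Hypothetical fragments, not real data.
print(in_frame_stops("ATGTGAGCC"))  # TGA is not a stop in this code
print(in_frame_stops("ATGAGAGCC"))  # AGA is flagged at position 3
```

A stop codon inside a fragment that should be an open reading frame, or a frame that shifts partway through, would flag the sequence as a candidate numt for closer inspection.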
Alignment—We manually aligned all sequences. For COI, COII, ND4, tRNA-His, and
tRNA-Ser (AGY), alignment is unambiguous as there is no length variation. In ND5, there is a 3-bp
deletion in three P. ursinus sequences (AY212057-AY212059), which we parsimoniously
inferred as a single codon deletion. We aligned 12S and tRNA sequences manually to detailed
secondary structure models (12S model based on Springer & Douzery 1996; tRNA structure
information from Mamit, http://mamit-trna.u-strasbg.fr/), which can provide a more accurate and
biologically realistic alignment and analysis by explicitly including the stem-and-loop structure
of ribosomal sequences (Kjer et al. 2009). We excluded 56 alignment positions from the 12S
analysis for which we considered the alignment ambiguous. Our alignments, including details of
excluded sites and secondary structure, are available from Dryad (http://www.datadryad.org).
Phylogenetics—We used PAUP* 4.0b10 (Swofford 2002) and MrBayes 3.1 (Ronquist &
Huelsenbeck 2003) for phylogenetic analyses. We selected substitution models for likelihood
bootstrapping and for individual-gene Bayesian analyses with the Akaike Information Criterion
(AIC) by scoring possible models on a maximum parsimony tree. For COI and COII we applied
an HKY model, with gamma rate variation (G) and invariant sites (I). For 12S, we applied the
doublet model (Ronquist & Huelsenbeck 2003) to paired nucleotides in the stem regions, with
separate HKY parameters for the paired and unpaired partitions and a single gamma shape
parameter, and with relative rates varying between the two partitions. For ND4/5 we used a
single GTR+I+G model. For the combined data, we used a partitioned model split into 12S
pairing, 12S nonpairing, and 1st, 2nd, and 3rd codon positions. The 12S pairing partition was
assigned a doublet model. All partitions were assigned an HKY model, with the rate ratio linked
for the two 12S partitions and for the three coding-gene partitions. Base frequencies were
estimated separately for the five partitions, and a single gamma shape parameter was estimated
across all partitions. Among-partition rate variation was modeled by using a variable (Dirichlet)
prior on relative rates.
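The AIC comparison used for model selection can be sketched as below. The log-likelihood values are hypothetical placeholders, not scores from this study; the free-parameter counts are the standard ones for each substitution model (branch lengths are omitted because every model is scored on the same tree, so they add the same constant to each score).

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# model name -> (hypothetical lnL on a fixed tree, free substitution parameters)
# HKY: kappa + 3 base frequencies = 4; +I and +G add one parameter each.
# GTR: 5 relative rates + 3 base frequencies = 8.
candidates = {
    "HKY": (-5100.0, 4),
    "HKY+I+G": (-5000.0, 6),
    "GTR+I+G": (-4998.5, 10),
}

scores = {model: aic(lnl, k) for model, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)
```

With these placeholder values the small likelihood gain of GTR+I+G does not offset its four extra parameters, which is the kind of trade-off the AIC is designed to capture.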
In all MrBayes analyses, we used two separate runs from random starting parameters, each with
four chains (1 cold, 3 heated; heating parameter 0.2). We ran each analysis for 20 million
generations, sampling every 1000 generations, and excluded the first 1001 samples (1 million
generations) as burnin. We used our own scripts and the package coda (Plummer et al. 2009) in
R 2.8.1 (R Development Core Team 2008) to assess convergence between runs and the behavior
of Markov chains within runs, including effective sample sizes, autocorrelation, and parameter
variances. In addition to posterior probability for the combined data, we estimated bootstrap
support with 1000 parsimony bootstrap replicates and 500 likelihood bootstrap replicates. For
likelihood bootstrapping, we used a Tamura-Nei + I + G model.
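The sampling arithmetic for the MrBayes runs can be checked with a short calculation. This sketch assumes that the starting state is recorded as the first sample, which is what makes 1001 discarded samples correspond to 1 million generations; the function name is hypothetical.

```python
def retained_samples(generations: int, sample_every: int, burnin_samples: int) -> int:
    """Number of post-burn-in samples in one run.

    Assumes the starting state is written as the first sample, so a run of
    `generations` generations yields generations // sample_every + 1 samples.
    """
    total = generations // sample_every + 1
    if burnin_samples > total:
        raise ValueError("burn-in exceeds the number of samples")
    return total - burnin_samples


per_run = retained_samples(20_000_000, 1000, 1001)
print(per_run)       # 19000 samples retained per run
print(2 * per_run)   # 38000 samples across the two runs
```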
We tested whether an unconstrained topology was significantly better than one with a single
monophyletic Rungwecebus clade using a Shimodaira-Hasegawa test in PAUP* 4.0b10. Under a
GTR+I+G model, the best unconstrained topology (log-likelihood -9972.00572) was
significantly better than the best constrained topology (log-likelihood -10012.20392; RELL
bootstrap P = 0.003). In a Bayes factor test, the log-Bayes factor for the unconstrained topology
compared to the constrained topology, based on 15,000,000 post-burnin generations in MrBayes
with identical likelihood models, was 37.77, overwhelmingly supporting the unconstrained
topology.
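The size of the reported differences can be made concrete with the values given above. This is only a worked calculation, not a reimplementation of the Shimodaira-Hasegawa test or of the Bayes factor estimation, and it assumes that the reported log-Bayes factor is a natural logarithm.

```python
import math

# Values reported in the text.
LNL_UNCONSTRAINED = -9972.00572
LNL_CONSTRAINED = -10012.20392
LOG_BAYES_FACTOR = 37.77  # assumed to be a natural log

# Log-likelihood difference evaluated by the SH test.
delta_lnl = LNL_UNCONSTRAINED - LNL_CONSTRAINED
print(f"delta lnL = {delta_lnl:.5f}")  # 40.19820

# Bayes factor on the ordinary (non-log) scale.
bayes_factor = math.exp(LOG_BAYES_FACTOR)
print(f"Bayes factor = {bayes_factor:.2e}")
```

A difference of about 40 log-likelihood units, and a Bayes factor above 10^16, both indicate that constraining Rungwecebus to be monophyletic fits the data far worse than the unconstrained topology.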
Supplementary references:
Gannon, W. L., Sikes, R.S., & the Animal Care and Use Committee of the American Society of
Mammalogists. 2007 Guidelines of the American Society of Mammalogists for the use of
wild mammals in research. J. Mamm. 88, 809-823.
Kjer, K.M., Roshan, U., & Gillespie, J.G. 2009 Structural and evolutionary considerations for
multiple sequence alignment of RNA, and the challenges for algorithms that ignore them. In
Sequence alignment: Methods, models, concepts, and strategies (ed. Rosenberg, M.S.), pp.
105-150. Berkeley: University of California Press.
Kocher, T.D., Thomas, W.K., Meyer, A., Edwards, S.V., Pääbo, S., Villablanca, F.X., & Wilson,
A.C. 1989 Dynamics of mitochondrial DNA evolution in animals: amplification and
sequencing with conserved primers. Proc. Natl. Acad. Sci. USA 86, 6196-6200.
Lorenz, J.G., Jackson, W.E., Beck, J.C., & Hanner, R. 2005 The problems and promise of DNA
barcodes for species diagnosis of primate biomaterials. Phil. Trans. R. Soc. B 360, 1869-1878.
Newman, T.K., Jolly, C.J., & Rogers, J. 2004 Mitochondrial phylogeny and systematics of
baboons (Papio). Am. J. Phys. Anthro. 124, 17-27.
Plummer, M., Best, N., Cowles, K., & Vines, K. 2009. coda: output analysis and diagnostics for
MCMC. R package version 0.13-4.
R Development Core Team 2008. R: a language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
Ronquist, F. & Huelsenbeck, J.P. 2003 MrBayes 3: Bayesian phylogenetic inference under
mixed models. Bioinformatics 19, 1572-1574.
Springer, M.S. & Douzery, E. 1996 Secondary structure and patterns of evolution among mammalian
mitochondrial 12S rRNA molecules. J. Mol. Evol. 43, 357-373.
Switzer, W.M., Salemi, M., Shanmugam, V., Gao, F., Cong, M., Kuiken, C., Bhullar, V., Beer,
B.E., Vallet, D., Gautier-Hion, A., Tooze, Z., Villinger, F., Holmes, E.C., & Heneine, W.
2005 Ancient co-speciation of simian foamy viruses and primates. Nature 434, 376-380.
Swofford, D.L. 2002 PAUP*: Phylogenetic analysis using parsimony (*and other methods),
version 4.0b10. Sunderland, MA: Sinauer Associates.
van der Kuyl, A.C., Kuiken, C.L., Dekker, J.T., & Goudsmit, J. 1995 Phylogeny of African
monkeys based upon mitochondrial 12S rRNA sequences. J. Mol. Evol. 40, 173-180.
Wildman, D.E., Bergmann, T.J., al-Aghbari, A., Sterner, K.N., Newman, T.K., Phillips-Conroy,
J.E., Jolly, C.J., & Disotell, T.R. 2004 Mitochondrial evidence for the origin of hamadryas
baboons. Mol. Phylogenet. Evol. 32, 287-296.