Lab Retriever

Lab Retriever Manual
Proper use of this software assumes prior training and education in analyzing and exporting STR genotype
data, formulating hypotheses, and the use of likelihood ratios. This manual does not provide, nor is it a
substitute for, that training and education.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This program is free software: you can redistribute it and/or modify it under the terms of the Creative
Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
Summary: http://creativecommons.org/licenses/by-nc-sa/4.0/
Full text: http://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
You are free to:
Share – copy and redistribute the material in any medium or format
Adapt – remix, transform, and build upon the material
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if
changes were made. You may do so in any reasonable manner, but not in any way that suggests
the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your
contributions under the same license as the original.
Before implementing for casework, users are advised to check results from this software using an
independent calculation or software program.
Credits:
Based on the original work of:
• David Balding
• John Buckleton
Research and development:
• Keith Inman
• Kirk Lohmueller
• Norah Rudin
Programmers:
• Ken Cheng
• Luke Inman-Semerau
• Chris Robinson
• Adam Kirschner
Data assistance:
• Allison Bricker
Applies to Lab Retriever version 2.2
10/5/2014
Introduction
Lab Retriever is a program that calculates likelihood ratios incorporating a probability of drop-out
(P(DO)). It is based on R code originally created by David Balding (Balding, D.J., Buckleton, J.,
Interpreting low template DNA profile, Forensic Science International: Genetics 4 (2009) 1–10). The code
has been rewritten in C++, and a graphical user interface (GUI) has been added for ease of use.
This manual provides basic instructions for using Lab Retriever. It assumes that the user has received
the appropriate education and training to understand the underlying principles and is competent in their
application.
I.
Determine an empirical threshold
A. An integral part of calculating a likelihood ratio incorporating a P(DO) is applying an
empirical threshold to the data so that the information content is maximized.
B. Many methods exist to estimate an empirical threshold. One easy procedure is to
take twice the height of the maximum non-artifactual peak in one or more negative samples. The
empirical threshold, by definition, can vary with any particular combination of
hardware/software/chemistry. Based on our experience with current systems (e.g., 3130,
Identifiler, Identifiler Plus, PowerPlex 16), this result is frequently about 30 RFU.
C. Re-analyze the evidence sample of interest, along with any associated negative controls
at the new empirically determined threshold. In some instances you may wish to turn off
the stutter filter. This is appropriate if a minor component is in the same peak height
range as stutter peaks from the major component.
Note: create a profile in your .csv input data file that contains peaks in stutter positions in
the evidence sample that are:
1. Above the empirical analytical threshold (AT)
2. Below stutter threshold
3. Also present in the suspected contributor
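The "twice the maximum non-artifactual peak" procedure in I.B can be sketched in a few lines of Python. This is only an illustration: the function name and the peak heights are hypothetical, and it assumes artifacts have already been removed from the negative-control data by the analyst.

```python
def empirical_threshold(negative_peak_heights, multiplier=2.0):
    """Return multiplier x the tallest non-artifactual peak (RFU) in the negatives.

    negative_peak_heights: one list of peak heights per negative sample,
    with artifactual peaks already excluded by the analyst.
    """
    peaks = [h for sample in negative_peak_heights for h in sample]
    if not peaks:
        raise ValueError("no non-artifactual peaks supplied")
    return multiplier * max(peaks)


# Hypothetical peak heights (RFU) from two negative controls.
negatives = [[9, 12, 15], [11, 14]]
threshold = empirical_threshold(negatives)
print(threshold)  # 30.0
```

The multiplier is left as a parameter because other estimation methods exist; 2 is simply the value named above.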
II.
Determine an empirical P(DO)
A. It is useful to derive an empirical estimate of the probability of drop-out P(DO) for the
sample or sub-sample peaks of interest. This makes a good starting point for your
calculations.
B. The most reliable approach to determining a P(DO) is to use the empirical end product,
that is, the peak heights of the sample or sub-sample peaks of interest as an indicator of
the P(DO). It is possible for each laboratory to determine a P(DO) function from their
internal validation data. Fortunately, the systems used in forensic DNA testing are highly
standardized, thus a universal P(DO) function derived from NIST data is a reasonable
substitute.
C. If you have not generated your own function, you can download a P(DO) calculator based
on NIST data at http://scieg.org/lab_retriever.html.
D. To calculate P(DO)
1. Calculate the average peak height:
a.
Add the peak heights of either all the peaks in the profile or of the
relevant component.
b.
Divide by the number of peaks to obtain the average
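As a minimal sketch of step 1, the average peak height is the sum of the relevant peak heights divided by the number of peaks. The function name and the example heights below are hypothetical.

```python
def average_peak_height(peak_heights):
    """Average height (RFU) of the peaks in the profile or relevant component."""
    heights = list(peak_heights)
    if not heights:
        raise ValueError("at least one peak height is required")
    return sum(heights) / len(heights)


# Hypothetical peak heights (RFU) for the minor component of a mixture.
minor_component = [62, 48, 75, 55]
print(average_peak_height(minor_component))  # 60.0
```

The resulting value is what is entered into the P(DO) calculator in the next step.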
2. Launch the Excel file that contains the P(DO) calculator found at
http://scieg.org/lab_retriever.html.
a.
It will open on the first tab labeled “Average RFU”
b.
Input the average peak height in the indicated cell.
c.
Press Return.
d.
Click on either the Identifiler or PowerPlex 16 tab to see the calculated
P(DO)
i.
Results are displayed for an AT of 50 RFU and 30 RFU.
ii.
Use the value closest to your empirically determined AT.
e.
“Save as” in the location of your choice to save your work.
III.
Prepare data for export from your genetic analysis program
Note 1: The following instructions are for GeneMapper ID (GMID). The data can be prepared
and exported from any genetic analysis program that can export a .csv file. The following
instructions should be easy to adapt to other programs.
Note 2: Although Lab Retriever is fairly forgiving in terms of input format, preparing your
data properly streamlines the process. As with any computer program, “garbage in, garbage out”
(GIGO) applies: Lab Retriever will return a result for properly formatted data even if the data
is incorrect. The user is
responsible for confirming that the correct data is in the correct place in the input file.
A. Export the data
1. Select and open the HID Table.
a.
As of Lab Retriever version 1.1.2, it is no longer necessary to check
“Duplicate homozygous alleles” prior to exporting your data.
2. Select the Genotypes tab.
3. Check:
a.
Sample File
b.
Sample Name
c.
Marker
d.
Allele
e.
Height
i.
Lab Retriever does not use the height values, but you should
export them for use in calculating an empirical threshold.
4. Make sure you allow for a sufficient number of alleles to accommodate complex
mixtures. Ten (10) is usually sufficient.
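To confirm that an exported table has the expected layout before loading it, a short script along these lines may help. It is a sketch under assumptions: the numbered column headers ("Allele 1", "Height 1", and so on) and the example rows are hypothetical, so check them against your own export.

```python
import csv
import io

REQUIRED = ["Sample File", "Sample Name", "Marker"]


def read_genotypes(handle, max_alleles=10):
    """Map sample name -> marker -> list of allele calls from an exported table."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    profiles = {}
    for row in reader:
        alleles = []
        for i in range(1, max_alleles + 1):
            # Assumed header naming; empty cells mean no allele in that slot.
            value = (row.get(f"Allele {i}") or "").strip()
            if value:
                alleles.append(value)
        profiles.setdefault(row["Sample Name"], {})[row["Marker"]] = alleles
    return profiles


# Hypothetical two-row export, truncated to three allele columns.
example = io.StringIO(
    "Sample File,Sample Name,Marker,Allele 1,Allele 2,Allele 3,Height 1,Height 2,Height 3\n"
    "ev1.fsa,Evidence,D8S1179,12,13,15,210,95,60\n"
    "ev1.fsa,Evidence,TH01,6,,,180,,\n"
)
profiles = read_genotypes(example)
print(profiles["Evidence"]["D8S1179"])  # ['12', '13', '15']
```

Alleles are kept as strings so that microvariants such as 9.3 are not altered by numeric conversion.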
IV.
Export the data
Note: Lab Retriever will accept exported files that include all samples, ladders, controls etc.
However, it may be expedient to prepare a project containing only the samples of interest.
This is not absolutely necessary, but generates a simplified list from which to choose later in
Lab Retriever. The samples in a single project can reside in different run folders. Samples
need not all reside in the same project. Multiple .csv files can later be imported into Lab
Retriever.
A. If you do wish to simplify the project by removing ladders, controls, and other extraneous
files:
1. Options exist to simplify your data prior to import. You may either:
a.
Delete extraneous files in the genetic analysis program itself, or
b.
After export, delete extraneous files and shorten names in the .csv file
(see below)
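If you prefer to remove extraneous rows after export, the filtering can be scripted rather than done by hand. The sketch below assumes that ladders and controls can be recognized from the Sample Name; the keyword list and sample names are hypothetical and must be adapted to your laboratory's naming scheme.

```python
import csv
import io


def drop_extraneous(rows, unwanted=("ladder", "neg", "pos")):
    """Keep only rows whose Sample Name contains none of the unwanted keywords."""
    keep = []
    for row in rows:
        name = row["Sample Name"].lower()
        if not any(tag in name for tag in unwanted):
            keep.append(row)
    return keep


# Hypothetical export containing a ladder, an evidence sample, and a negative control.
source = io.StringIO(
    "Sample File,Sample Name,Marker,Allele 1\n"
    "l1.fsa,Ladder,TH01,6\n"
    "e1.fsa,Evidence_item_4_swab,TH01,9.3\n"
    "n1.fsa,Neg control,TH01,\n"
)
rows = drop_extraneous(csv.DictReader(source))
print([r["Sample Name"] for r in rows])  # ['Evidence_item_4_swab']
```

Substring matching is deliberately crude; review the retained rows before importing, since a keyword can also match part of a legitimate sample name.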
B. To export the sample from GMID:
1. Select the Genotypes tab.
a.
This is a good time to scan for and delete any missed OL peaks.
2. Under File, select Export Table.
a.
Under Export File As, select .csv
b.
Navigate to an appropriate folder to save your exported file and name it
in a way that is meaningful to you.
c.
Click Export File.
V.
Import the data into Lab Retriever.
A. Launch the Lab Retriever program
1. Both Windows and Mac versions are available and work in exactly the same
way.
2. Input Case ID, Sample ID, and Analyst in a way that is meaningful to you.
3. Unless you have some particular reason to believe that the probability of drop-in
P(DI) is high, you may leave the default value of 0.01. Alternatively, insert a
laboratory-specific empirically-derived value.
4. Input the estimated P(DO)
a.
The sample can be run multiple times with different P(DO)s.
b.
It is useful to check how the calculated LR responds to a range of
values on either side of the empirically determined P(DO) to understand
the effect of a specific P(DO) on the final LR.
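One way to choose the range of P(DO) values to try is to step symmetrically around the empirical estimate. The sketch below only generates the list of values to enter; it does not run Lab Retriever. The step size, the number of steps, and the exclusion of exactly 0 and 1 are assumptions made for illustration.

```python
def pdo_grid(estimate, step=0.1, n_each_side=2):
    """P(DO) values centred on the estimate, kept strictly between 0 and 1."""
    values = []
    for k in range(-n_each_side, n_each_side + 1):
        # Round to suppress floating-point noise such as 0.30000000000000004.
        value = round(estimate + k * step, 6)
        if 0.0 < value < 1.0:
            values.append(value)
    return values


print(pdo_grid(0.4))  # [0.2, 0.3, 0.4, 0.5, 0.6]
```

Each value is then entered in the P(DO) field in turn, and the resulting LRs are compared.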
5. You can select a single race (AA, Cauc, Hisp) or leave the default selection of
All.
6. Input the desired value for FST (or θ)
a.
This parameter is intended to correct for population substructure that
derives from distant ancestry. Formally, θ is a particular derivation of
FST (Weir and Cockerham 1984), but the terms are considered
approximately equivalent for this purpose.
7. The Identical by Descent (IBD) drop-down menu may be used when the
denominator (H2) is defined as a random close relative rather than a random
unrelated person.
a.
IBD can be used when the numerator (H1) contains only one suspected
contributor.
b.
Similarly, only one contributor in the denominator can share alleles
IBD with the suspected contributor (i.e. be a relative of the suspected
contributor), even when multiple unknown contributors are present. For
example, if 3 unknown contributors are specified for H2, one will be a
relative (as specified by the IBD field) of the suspected contributor and
two of them will be random individuals, unrelated to each other and
unrelated to the suspected contributor.
c.
The 0, 1 and 2 next to the IBD fields specify the number of alleles
Identical by Descent between the suspected contributor and the
hypothesized alternate contributor.
i.
The default IBD value is for unrelated individuals. This is
represented as a “1” in the 0 IBD field to indicate no alleles in
the profile should be hypothesized to be IBD.
ii.
For example, a parent-child relationship would, by standard
inheritance rules, share 1 allele IBD at each locus. So the
values to input for this hypothesized relationship would be
0 in the 0 IBD field, 1 in the 1 IBD field, and 0 in the 2 IBD field.
iii. Similarly for full siblings, the standard inheritance rules would
predict 0.25 in the 0 IBD field, 0.5 in the 1 IBD field, and 0.25 in the 2 IBD field.
iv. A list of common relationships is provided in Appendix III.
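The IBD fields take three probabilities (0, 1, or 2 alleles shared IBD) that must sum to 1. The sketch below lists standard textbook values for a few common relationships and checks that each triple is valid. These values are given for illustration and are not transcribed from Appendix III; confirm them against your laboratory's reference before use.

```python
# (P(0 alleles IBD), P(1 allele IBD), P(2 alleles IBD)) - standard textbook values.
IBD_PROBABILITIES = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half siblings": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
}


def check_ibd(probs, tol=1e-9):
    """True if the three values are probabilities that sum to 1."""
    return (
        len(probs) == 3
        and all(0.0 <= p <= 1.0 for p in probs)
        and abs(sum(probs) - 1.0) <= tol
    )


for relationship, probs in IBD_PROBABILITIES.items():
    assert check_ibd(probs), relationship
```

The same check can be applied to any custom triple before it is typed into the 0, 1 and 2 IBD fields.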
8. Choose H1 and H2.
a.
The total number of UNKs in H1 and H2 is allowed to differ. It is up
to the analyst to choose supportable hypotheses.
b.
The number of Assumed contributors is not limited.
c.
The total number of contributors in the sample is not limited.
d.
The total number of contributors in H1 and H2 is allowed to differ. It
is up to the analyst to choose supportable hypotheses.
e.
The number of Suspected contributors in the numerator (H1) may be
either one or two. It is up to the analyst to choose supportable
hypotheses.
f.
The maximum number of unknown contributors (UNK) in the
denominator (H2) is 4.
9. Click on the Load a File button:
a.
Navigate to the location of the exported .csv file and select it.
i.
A confirmation of file upload will appear
    i.
    Currently Lab Retriever uses the collective set of 29
    autosomal loci defined by the NIST 1036 population dataset
    (http://www.cstl.nist.gov/strbase/NISTpop.htm). This includes
    the GlobalFiler and PowerPlex Fusion loci or any subset of
    them.
    ii.
    The loci reflected in Lab Retriever are determined by the loci
    typed in the evidence (detected) profile.
        a.) Please retain loci that were typed, but for which no
        result was obtained. Such loci contribute a small, but
        non-zero, weight to the LR.
ii.
You can upload multiple .csv files and choose your samples
from amongst them. The program holds all uploaded files in
cache while it is open.
b.
Click on the plus sign in the Detected column and select the
Evidence profile. The sample file name appears in the drop-down box
and allows you to select between duplicate injections (if any) with the
same sample name.
i.
It will populate the column with the Detected alleles.
ii.
It will also populate the Unattributed column.
c.
Select the Assumed donor (if any)
i.
If the Assumed donor profile is in the same .csv file as the
evidence, click on the plus sign in the Assumed column and
select the profile of an Assumed contributor.
ii.
If the Assumed donor is in a different .csv file, click Load File
again, navigate to the file containing the Assumed donor
profile, and select it. Then, click on the plus sign in the
Assumed column and select the profile of an Assumed
contributor.
iii. Note that those alleles will be subtracted from the
Unattributed column.
iv. If there is more than one Assumed contributor, just click the
plus sign again, and choose the appropriate sample. This will
generate an additional column. Those alleles will also be
subtracted from the Unattributed column.
d.
Select the Suspected donor
i.
If the Suspected donor profile is in the same .csv file as the
evidence, click on the plus sign in the Suspected column
and select the profile of a Suspected contributor.
ii.
If the Suspected donor is in a different .csv file, click Load File
again, navigate to the file containing the Suspected donor
profile, and select it. Then, click on the plus sign in the
Suspected column and select the profile of a Suspected
contributor.
iii. If you wish to run two Suspected donors simultaneously, click
on the plus sign in the Suspected column to bring up a second
column in which a different Suspected profile may be chosen.
Note: If you choose to run two Suspected donors
simultaneously we recommend that you also run each
Suspected donor separately to determine their relative
contributions to the final LR. Also consider running each
separately and using one and/or the other as an Assumed
contributor if warranted.
e.
You can clear a column by hovering over the sample name to bring up
a red “remove” button. You can then re-select a different profile for
that column.
f.
Column header cells containing sample names can be edited by
clicking in them and selecting the text to be deleted or changed.
i.
This can be useful to simplify long names.
VI.
Run Lab Retriever
A. Click the Run! button.
1. Grayed-out data as well as a gray progress bar and a blue cursor ball indicate
that the program is running.
2. When the computations are complete, the output will appear on the right side of
the screen.
a.
A sliding door obscures the Assumed and Suspected input columns.
These can be revealed by clicking on the double arrow at the top of the
Detected column.
3. Click the “save” button to save the results as a .csv file.
a.
The file name will auto-populate with the date, sample file and chosen
hypotheses. You can edit the file name to your liking if you wish.
b.
Navigate to an appropriate location and save the file.
c.
Open the saved .csv file using Excel or a spreadsheet program of your choice to
view the results.
4. You can easily run the same configuration with multiple P(DO) values by changing
just this parameter and rerunning the calculation.
5. To remove all profiles and calculations and start over, click the Clear button.
For technical support, or to offer comments or suggestions, please go to www.scieg.org.
Appendix I
Selected References
Schneider, P.M., Gill, P., Carracedo, A., Editorial on the recommendations of the DNA commission of the
ISFG on the interpretation of mixtures, Forensic Science International 160 (2006) 89.
Gill, P. et al. DNA commission of the International Society of Forensic Genetics: Recommendations on the
interpretation of mixtures, Forensic Science International 160 (2006) 90-101.
Buckleton, J.S., Curran, J.M., Gill, P., Towards understanding the effect of uncertainty in the number of
contributors to DNA stains, Forensic Science International: Genetics 1 (2007) 20–28.
Gill, P., Kirkham, A., Curran, J., LoComatioN: A software tool for the analysis of low copy number DNA
profiles, Forensic Science International 166 (2007) 128–13.
Balding, D.J., Buckleton J., Interpreting low template DNA profile, Forensic Science International:
Genetics 4 (2009) 1–10.
Tvedebrink, T., Eriksen, P.S., Mogensen, H.S., Morling, N., Estimating the probability of allelic drop-out
of STR alleles in forensic genetics. Forensic Science International: Genetics 3 (2009) 222–226.
Perlin, M.W., Sinelnikov, A., An Information Gap in DNA Evidence Interpretation, PLOS one, (2009),
4(12)
Gill, P., Buckleton, J., A universal strategy to interpret DNA profiles that does not require a definition of
low-copy-number, Forensic Science International: Genetics 4 (2010) 221–227.
Haned, H., Forensim: An open-source initiative for the evaluation of statistical methods in forensic
genetics, Forensic Science International: Genetics 5 (2011) 265–268.
Perlin, M.W., et al., Validating TrueAllele DNA Mixture Interpretation, J Forensic Sci, (2011), Vol. 56,
No. 6
Haned, H., Egeland, T., Pontier, D., Pene, L., Gill, P., Estimating drop-out probabilities in forensic DNA
samples: A simulation approach to evaluate different model, Forensic Science International: Genetics 5
(2011) 525–531.
Tvedebrink, T., Eriksen, P.S., Mogensen, H.S., Morling, N., Statistical model for degraded DNA samples
and adjusted probabilities for allelic drop-out, Forensic Science International: Genetics 6 (2012) 97–101.
Carracedo, A., Schneider, P.M., Butler, J., Prinz, M., Focus issue—Analysis and biostatistical
interpretation of complex and low template DNA samples, Forensic Science International: Genetics 6
(2012) 677–678.
Gill et al., DNA commission of the International Society of Forensic Genetics: Recommendations on the
evaluation of STR typing results that may include drop-out and/or drop-in using probabilistic methods,
Forensic Science International: Genetics 6 (2012) 679–688.
Benschop, C.C.G., Assessment of mock cases involving complex low template DNA mixtures: A
descriptive study, Forensic Science International: Genetics 6 (2012) 697–707.
Biedermann, A., Bozza, S., Konis, K., Taroni, F., Inference about the number of contributors to a DNA
mixture: Comparative analyses of a Bayesian network approach and the maximum allele count method,
Forensic Science International: Genetics 6 (2012) 689–696.
Haned, H., Slooten, K., Gill, P., Exploratory data analysis for the interpretation of low template DNA
mixtures, Forensic Science International: Genetics 6 (2012) 762–774.
Bright, J-A., McManus, K., Harbison, S., Gill, P., Buckleton, J., A comparison of stochastic variation in
mixed and unmixed casework and synthetic samples, Forensic Science International: Genetics 6 (2012)
180–184.
Kelly, H., Bright, J-A., Curran, J., Buckleton, J., The interpretation of low level DNA mixtures, Forensic
Science International: Genetics 6 (2012) 191–197.
Mitchell, A. et al., Validation of a DNA mixture statistics tool incorporating allelic drop-out and drop-in,
Forensic Science International: Genetics 6 (2012) 749–76.
Pfeifer, C.M., et al., Comparison of different interpretation strategies for low template DNA mixture,
Forensic Science International: Genetics 6 (2012) 716–722
Rakay, C.A., Bregu, J., Gricak, C.M., Maximizing allele detection: Effects of analytical threshold and DNA
levels on rates of allele and locus drop-out, Forensic Science International: Genetics 6 (2012) 723–728
Tvedebrink, T., et al., Allelic drop-out probabilities estimated by logistic regression—Further
considerations and practical implementation, Forensic Science International: Genetics 6 (2012) 263–267
Lohmueller, K.E., Rudin, N., Calculating the Weight of Evidence in Low-Template Forensic DNA
Casework, J Forensic Sci, 2012
Balding, D., likeLTD (likelihoods for low-template DNA profiles)
https://sites.google.com/site/baldingstatisticalgenetics/software/likeltd-r-forensic-dna-r-code
Westen, A.A., et al., Assessment of the stochastic threshold, back- and forward stutter filters and low
template techniques for NGM, Forensic Science International: Genetics 6 (2012) 708-715
Bright, J-A, Taylor, D., Curran, J.M., Buckleton, J.S., Developing allelic and stutter peak height models for
a continuous method of DNA interpretation, Forensic Science International: Genetics 7 (2013) 296-304
Taylor, D., Bright, J-A., Buckleton, J., The interpretation of single source and mixed DNA profiles,
Forensic Science International: Genetics 7 (2013) 516-528
Gill, P., Haned, H., A new methodological framework to interpret complex DNA profiles using likelihood
ratios, Forensic Science International: Genetics 7 (2013) 251-263
Balding, D.J., Evaluation of mixed-source, low-template DNA profiles in forensic science, PNAS early
edition (2013)
Lohmueller, K.E., Rudin, N., Inman, K., Analysis of allelic drop-out using the Identifiler and PowerPlex 16
forensic STR typing systems, Forensic Science International: Genetics 12 (2014) 1-11
Steele, C.D., and Balding, D.J., Statistical Evaluation of Forensic DNA Profile Evidence in Annu. Rev Stat.
Appl. 2014. 1:361-84
Timken, M.D., Klein, S.B., Buoncristiani, M.R., Stochastic sampling effects in STR typing; Implications
for analysis and interpretation, Forensic Science International: Genetics 11 (2014) 195-204
Weir B.S. and Cockerham, C. Clark, Estimating F-Statistics for the Analysis of Population Structure,
Evolution, Vol. 38, No. 6 (Nov., 1984), pp. 1358-1370
Forensic DNA Evidence Interpretation, eds. Buckleton, J., Triggs, C.B., Walsh, S.J. (2005) pg. 127
Appendix II
Technical data
1.
Lab Retriever is based on code derived from the equations published in:
Balding, D.J., Buckleton J., Interpreting low template DNA profile, Forensic Science
International: Genetics 4 (2009) 1–10.
2.
Theta is a variable designed to compensate for population sub-structure. It is hard-coded at 0.01.
3.
Alpha is a variable designed to compensate for drop-out rates applied to homozygotes as compared
with heterozygotes. It is hard-coded at 0.5.
4.
For rare alleles, the frequency is implemented at a minimum 5/2N allele count, where N is the size
of the database.
5.
Allele frequency data is taken from the NIST 1036 U.S. Population Dataset, found at
http://www.cstl.nist.gov/biotech/strbase/NISTpop.htm
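The minimum allele frequency in item 4 can be illustrated as follows. This is a sketch of the stated 5/2N rule, not Lab Retriever's own code; the function name is hypothetical, and N = 1036 is used only as an example database size (the N that applies depends on the population database in use).

```python
def floored_frequency(observed_frequency, database_size, min_count=5):
    """Apply the minimum allele frequency of min_count / (2N).

    database_size is N, the number of individuals, so the database
    holds 2N alleles at each autosomal locus.
    """
    floor = min_count / (2 * database_size)
    return max(observed_frequency, floor)


# A rare allele is raised to the floor of 5 / 2072 (about 0.0024) when N = 1036.
print(floored_frequency(0.0005, 1036))
# A common allele is left unchanged.
print(floored_frequency(0.12, 1036))  # 0.12
```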
Appendix III
Buckleton et al. 2005