Haplotype evidence - Familias home page

Haplotype evidence
Train the trainers workshop
Apr 21, 2015
Mikkel Meyer Andersen
[email protected]
Department of Mathematical Sciences
Aalborg University
Denmark
Themes
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Estimators
I
Evidential weight
Discrete Laplace
I
Importance of explicitly stating hypotheses
Conclusion
I
Methods for calculating match probability
Mixture separation
35
Train the trainers workshop
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Estimators
Introduction
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Evidential weight
Haplotype evidence
Hp (prosecutor’s hypothesis): ’The suspect left the
Y-chromosome DNA in the crime stain.’
MM Andersen
[email protected]
2
Hd (defence attorney’s hypothesis): ’A random man left the
Y-chromosome DNA in the crime stain.’
Introduction
Hypotheses
Match probability
Estimators
E: Evidence (e.g. DNA profile from crime scene)
Discrete Laplace
Mixture separation
Conclusion
Likelihood ratio = LR =
P (E | Hp )
P (E | Hd )
Non-match:
LR =
Match:
0
=0
P (E | Hd )
LR =
1
P (E | Hd )
(Ideal situation, no errors, etc.)
Lineage markers (Y-STR/mtDNA): Loci are not independent ⇒ No product rule
35
Train the trainers workshop
Evidential weight
Haplotype evidence
Hp (prosecutor’s hypothesis): ’The suspect left the
Y-chromosome DNA in the crime stain.’
MM Andersen
[email protected]
2
Hd (defence attorney’s hypothesis): ’A random man left the
Y-chromosome DNA in the crime stain.’
Introduction
Hypotheses
Match probability
Estimators
E: Evidence (e.g. DNA profile from crime scene)
Discrete Laplace
Mixture separation
Conclusion
Likelihood ratio = LR =
P (E | Hp )
P (E | Hd )
Non-match:
LR =
Match:
0
=0
P (E | Hd )
LR =
1
P (E | Hd )
(Ideal situation, no errors, etc.)
Lineage markers (Y-STR/mtDNA): Loci are not independent ⇒ No product rule
35
Train the trainers workshop
Match probability
Haplotype evidence
MM Andersen
[email protected]
3
Introduction
Hypotheses
Match probability
Estimators
P (E | Hd )?
Discrete Laplace
Mixture separation
Conclusion
1. Formulation of Hd
2. Estimating P (E | Hd )
Focus on Y-STR, but many of the same challenges with mtDNA
35
Train the trainers workshop
ISFG recommendations
Haplotype evidence
ISFG recommendations of Y-STR usage from 2006
(http://www.isfg.org/Publication;Gusmao2006):
MM Andersen
[email protected]
4
Introduction
Hypotheses
Mostly nomenclature, allele designation, locus selection.
Match probability
Estimators
Recommendations on the estimation of Y-STR
haplotype frequencies and estimation of the weight of
the evidence of Y-STR typing will be presented
separately as guidelines for the interpretation of
forensic genetic evidence.
I
I
I
Discrete Laplace
Mixture separation
Conclusion
Highly wanted guidelines
Problem 1: Singletons (haplotypes only observed once)
are common (a lot of rare variants)
Problem 2: Population substructure (some haplotypes
common in local areas, but not in country as a whole)
35
Train the trainers workshop
Sparsity of Y-STRs
Haplotype evidence
19,630 samples
n = 1 (singletons)
n = 2 (doubletons)
n=3
n=4
n=5
n=6
n=7
n=8
n=9
n = 10
n = 11
...
n ∈ (30, 40]
...
n ∈ (100, 515]
MM Andersen
[email protected]
Forensic marker set
MHT
9 loci
SWGDAM
11 loci
PPY12
12 loci
Yfiler
17 loci
PPY23
23 loci
6,083
(31.0%)
8,495
(43.3%)
9,092
(46.3%)
15,263
(77.8%)
18,237
(92.9%)
1,131
435
226
114
86
63
43
29
31
22
1,227
436
199
101
85
51
50
29
21
24
1,260
416
196
106
85
50
41
34
24
28
1,064
256
94
63
21
12
12
9
4
5
531
64
16
6
2
2
1
13
11
7
1
8
4
4
5
Introduction
Hypotheses
Match probability
Estimators
Discrete Laplace
Mixture separation
Conclusion
1
Purps J, Siegert S, et al. (2014). A global analysis of Y-chromosomal haplotype diversity
for 23 STR loci. Forensic Science International: Genetics, Volume 12, 2014, p. 12-23.
35
Train the trainers workshop
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Hypotheses
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Match probability
Haplotype evidence
MM Andersen
[email protected]
P (E | Hd )
P (E | Hd , I)
Introduction
6
Hypotheses
Match probability
I: Additional information (same for Hp and Hd ), e.g. reference
database
Estimators
Discrete Laplace
Mixture separation
Elaborating Hd :
I
Hd : ’A random man left the Y-chromosome DNA in the
crime stain.’
I
Hd : ’A random man (from the population of which the
reference database is a random sample) left the
Y-chromosome DNA in the crime stain.’
Conclusion
Population? What is that? Does it matter? Yes.
35
Train the trainers workshop
Population substructure
Haplotype evidence
Population substructure: Population is a collection of
subpopulations. Haplotypes are more common in some
subpopulations than in others.
MM Andersen
[email protected]
Introduction
7
Hypotheses
Match probability
Estimators
Discrete Laplace
Subpop1
Subpop2
···
Mixture separation
Conclusion
Subpopr
Population
Coloured squares represent haplotypes.
We have a sample from the population without substructure
information.
35
Train the trainers workshop
Population substructure
Haplotype evidence
MM Andersen
[email protected]
What if the random man (the true perpetrator, under Hd ) and
the suspect is from the same subpopulation?
I
I
Introduction
8
Hypotheses
Match probability
Hd : ’A random man (from the population, with
substructure, of which the reference database is a random
sample) left the Y-chromosome DNA in the crime stain.’
Estimators
Discrete Laplace
Mixture separation
Conclusion
Hd : ’A random man (from the population, with
substructure, of which the reference database is a random
sample) – that originate from the same subpopulation
as the suspect – left the Y-chromosome DNA in the crime
stain.’
We assume that the random man and the suspect originate
from same subpopulation, but we do not know which
35
Train the trainers workshop
Match probability
Hd : ’A random man (from the population, with substructure, of
which the reference database is a random sample) – that
originate from the same subpopulation as the suspect –
left the Y-chromosome DNA in the crime stain.’
Haplotype evidence
MM Andersen
[email protected]
Introduction
9
Hypotheses
Match probability
I
I
I
I
I
I
I
Estimators
In this subpopulation, the haplotype may be more frequent
than in the population as a whole
One approach (the Balding-Nichols model):
P (E | Hd ) = θ + (1 − θ)ph
θ (theta) (0 < θ < 1)
Discrete Laplace
Mixture separation
Conclusion
Population parameter (related to the variability of haplotype
frequencies in different subpopulations)
Not haplotype specific (an average) because that is a
simple model and we have a chance to estimate it
Need to be estimated separately using databases from at
least two subpopulations
ph : Population frequency of h (0 < ph < 1)
35
Train the trainers workshop
Match probability
Haplotype evidence
MM Andersen
[email protected]
Match probability for population with substructure
(Balding-Nichols model):
Introduction
10
Hypotheses
Match probability
Estimators
P (E | Hd ) = θ + (1 − θ)ph
Discrete Laplace
Mixture separation
Note, that
Conclusion
θ + (1 − θ)ph > θ
and
θ + (1 − θ)ph > ph
as θ + (1 − θ)ph = θ + ph − θph = ph + (1 − ph )θ (and both θ
and ph is between 0 and 1).
35
Train the trainers workshop
Population substructure
Haplotype evidence
MM Andersen
[email protected]
Introduction
11
···
Hypotheses
Match probability
Estimators
Discrete Laplace
Mixture separation
Subpop1
Subpop2
Subpopr
Conclusion
Population
Coloured squares represent haplotypes.
If a random man and the suspect belong to the same
subpopulation, they are expected to share a haplotype more
often than a random database sample from the population
would represent.
35
Train the trainers workshop
Population substructure: Examples
Haplotype evidence
MM Andersen
[email protected]
Introduction
12
Match probability
Example 1: Danish reference database. We assume no
population substructure (haplotype distribution same in cities
and small islands).
I
Hd : ’A random Dane left the Y-chromosome DNA in the
crime stain.’
I
Use population frequency, ph , based on Danish reference
database (and no θ correction)
Hypotheses
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Population substructure: Examples
Haplotype evidence
MM Andersen
[email protected]
Introduction
Example 2: Danish reference database. We assume
population substructure (such that haplotype distribution may
differ e.g. in cities and small islands).
I
Hd : ’A random Dane originating from the same small
island, Bornholm, as the suspect left the Y-chromosome
DNA in the crime stain.’
I
Use θ correction: θ + (1 − θ)ph with known θ and
population frequency, ph , based on Danish reference
database
13
Hypotheses
Match probability
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Population substructure: Examples
Haplotype evidence
MM Andersen
[email protected]
Introduction
14
Match probability
Example 3: Reference database from Bornholm (small Danish
island). We assume no population substructure (haplotype
distribution same in cities and small islands).
I
Hd : ’A random man from Bornholm left the Y-chromosome
DNA in the crime stain.’
I
Use population frequency, ph , based on the reference
database from Bornholm (and no θ correction)
Hypotheses
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
θ (theta) correction
Haplotype evidence
MM Andersen
[email protected]
Introduction
15
Hypotheses
Match probability
Estimators
Discrete Laplace
θ (theta) correction is a remedy for not knowing (or having
information about) the population substructure
Mixture separation
Conclusion
35
Train the trainers workshop
Estimating θ (theta)
Haplotype evidence
MM Andersen
[email protected]
Introduction
16
Hypotheses
Match probability
I
E.g. use geographical information
I
Sample what we believe to be subpopulations (populations
without substructure), e.g. islands, cities (or even
countries) separately (at the right level)
I
Estimators
Discrete Laplace
Mixture separation
Conclusion
θ between countries may be different from θ between
cities/islands in one country
35
Train the trainers workshop
Estimating θ (theta)
Bruce Weir, personal communication. Simple estimation (a lot
of assumptions, e.g. equal weighted subpopulations).
I r : Number of subpopulations
I ni : Size of reference database from i’th subpopulation
(i = 1, 2, . . . , r )
I nih : Number of times haplotype h is observed in reference
database from i’th subpopulation
Haplotype evidence
MM Andersen
[email protected]
Introduction
17
Hypotheses
Match probability
Estimators
Discrete Laplace
Mixture separation
Conclusion
mi =
X
1
nih (nih − 1) and
ni (ni − 1)
mij =
h
r
mW =
1X
mi
r
and mB =
i=1
θˆ =
r −1 mW −mB
r
1−mB
W −mB
1 − 1r m1−m
B
1 X
nih njh
ni nj
h
r −1 X
r
X
2
mij
r (r − 1)
i=1 j=i+1
large r
≈
mW − mB
1 − mB
35
Train the trainers workshop
ISFG recommendations
Haplotype evidence
MM Andersen
[email protected]
ISFG recommendations of Y-STR usage from 2006
(http://www.isfg.org/Publication;Gusmao2006):
Introduction
18
Hypotheses
Match probability
Individual laboratories must establish relevant,
regional Y-STR haplotype databases.
Estimators
Discrete Laplace
Mixture separation
Most of the databases provide haplotype frequency
estimates for larger regions [...]. However, pooling of
different regions is only valid if there is no population
substructure [...].
Conclusion
Population substructure has been shown in a number
of regional groups within the same (but not between
different) major U.S. populations and also in some
European groups.
35
Train the trainers workshop
Population
frequency
database sample
based
on
Haplotype evidence
MM Andersen
[email protected]
Introduction
19
Hypotheses
Match probability
I
I
Estimators
Database sampling must be truely random (not
convenience sampling!)
Discrete Laplace
Mixture separation
Conclusion
Do not search for and exclude close relatives after
randomly sampling individuals (underrepresentation of
common haplotypes)
35
Train the trainers workshop
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Match probability
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Match probability
Haplotype evidence
I
I
I
I
Match probability is ph
ph : population frequency (normally estimated based on
sample, the reference database)
Introduction
Hypotheses
20
I
I
Match probability
Estimators
Hd refers to a population with substructure (random man
and suspect assumed to originate from the same,
unknown/unidentifiable, subpopulation):
I
I
MM Andersen
[email protected]
Hd refers to a population with no substructure:
Discrete Laplace
Mixture separation
Conclusion
Match probability is θ + (1 − θ)ph
θ: Population parameter (related to the variability of
haplotype frequencies in different subpopulations)
ph : population frequency (normally estimated based on
sample, the reference database, from population with
substructure (collection of subpopulations))
θ estimated with a collection of reference databases
(counting pairs of match within and between)
35
Train the trainers workshop
Match probability
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability for populations with substructure = θ+(1−θ)ph
21
Match probability
Estimators
Discrete Laplace
Mixture separation
I
If ph is really small (compared to θ), θ + (1 − θ)ph ≈ θ
I
If ph is really large (compared to θ), θ + (1 − θ)ph ≈ ph
θ = 0.001
θ = 0.003
ph = 1/100,000 = 0.00001
0.0010099
0.0030099
Conclusion
ph = 1/100 = 0.01
0.01099
0.01297
35
Train the trainers workshop
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Population frequency
estimators
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Estimators
Haplotype evidence
MM Andersen
[email protected]
I
I
I
Precise (low prediction error) – difficult (many measures,
need to know true frequency – from simulated
populations?)
Does it work for all datasets, also for those only consisting
of singletons?
Statistical model: Guaranteed behaviour (e.g. probabilities
sum to 1)
I
I
I
Introduction
Hypotheses
Match probability
22
Estimators
Discrete Laplace
Mixture separation
Conclusion
Assign probability to all possible haplotypes (e.g. for mixture
LR)
Probability mass 1 to be distributed among all possible
haplotypes
Difficult to avoid wasting probability mass on improbable
haplotypes
35
Train the trainers workshop
Add suspect’s haplotype
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
I
Include in dataset (new observation)
I
I
Additional information: Under Hd , suspect considered as a
random (wrongly accused) individual from the population;
the haplotype is just another random sample
Not all LRs are for criminal cases (paternity, immigration,
etc.)
I
Old dataset: D − of size n
I
New dataset: D of size n + 1
Match probability
23
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Count method(s)
Haplotype evidence
MM Andersen
[email protected]
I
Count method: P(X = x) = (nx + 1)/(n + 1)
I
I
nx
x∈D n+1
1
n+1
I
P
I
for x 6∈ D
Corrected count estimators:
I
I
I
Introduction
nx : Number of times x is observed in the dataset
1
nx = 0: P(X = x) = n+1
=
P
x∈D
nx =
n+1
n+1
= 1, hence P(X = x) = 0
Hypotheses
Match probability
24
Estimators
Discrete Laplace
Mixture separation
Conclusion
Brenner’s κ (CH Brenner (2010) / HE Robbins (1968)):
1
by 1 − κ, where κ is the singleton propotion
Deflate n+1
Generalised Good (IJ Good (1953), G Cereda/R Gill):
http://arxiv.org/abs/1502.02406 and
http://arxiv.org/abs/1502.04083
Count methods (both original and corrected): Useful for
observed haplotypes, not for unobserved
35
Train the trainers workshop
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
The Discrete Laplace
method
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Motivation
Haplotype evidence
MM Andersen
[email protected]
I
I
Haplotype probability distribution (statistical model)
Enables a wide range of inferences using one model:
I
I
I
I
I
I
Haplotype frequency estimation (observed and unobserved)
Mixtures (e.g. separation and LR)
Cluster analysis (not shown today)
...
Introduction
Hypotheses
Match probability
Estimators
25
Discrete Laplace
Mixture separation
Conclusion
Not a new ad-hoc tool for each task
A statistical model gives desirable properties:
I
P(x): Probability mass function
Consistent:
X
P(x) = 1
I
P(x) > 0 for all x ∈ H
I
x∈H
35
Train the trainers workshop
Model
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Estimators
I
Y-STR: Loci not statistically independent
I
Our approach: Condition on central haplotypes to obtain
(assumed) independency between loci (caused by ’private
mutations’)
26
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Discrete Laplace distribution
Haplotype evidence
Discrete Laplace distributed X ∼ DL(p, µ):
I Dispersion parameter 0 < p < 1 and
I Location parameter µ ∈ Z = {. . . , −2, −1, 0, 1, 2, . . .}
Probability mass function:
f (X = x; p, µ) =
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
1 − p |x−µ|
·p
1+p
for x ∈ Z
Estimators
27
Discrete Laplace
Mixture separation
Perfectly homogeneous population with 1-locus haplotypes:
Conclusion
0.4
0.2
0.0
f(X = x; p = 0.3, µ = 13)
P(X = x) = f (X = x; p, µ)
8
9
10
11
12
13
14
15
x, e.g. Y−STR allele
16
17
18
35
Train the trainers workshop
Statistical model for Y-STR haplotypes
Haplotype evidence
MM Andersen
[email protected]
Introduction
Perfectly homogeneous population with r -locus haplotypes:
Hypotheses
Match probability
P(X = (x1 , x2 , . . . , xr )) =
r
Y
Estimators
f (xk ; pk , µk )
28
Discrete Laplace
Mixture separation
k =1
Conclusion
I
µ
~ = (µ1 , µ2 , . . . , µr ): Central haplotype
~p = (p1 , p2 , . . . , pr ): Discrete Laplace parameters (one for
each locus)
I
Mutations happen independently across loci (relative to µ
~)
I
35
Train the trainers workshop
Statistical model for Y-STR haplotypes
Haplotype evidence
MM Andersen
[email protected]
Non-homogeneous population with c subpopulations and
r -locus haplotypes:
P(X = (x1 , x2 , . . . , xr )) =
c
X
j=1
τj
r
Y
Introduction
Hypotheses
Match probability
f (xk ; pjk , µjk )
k =1
Estimators
29
Discrete Laplace
Mixture separation
Conclusion
I
τj : A priori probability
Pc for originating from the j’th
subpopulation ( j=1 τj = 1)
I
µ~j = (µj1 , µj2 , . . . , µjr ): Central haplotype for the j’th
subpopulation
p~j = (pj1 , pj2 , . . . , pjr ): Parameters for all loci at the j’th
subpopulation
Parameter estimation from observations using R library
disclapmix
I
I
35
Train the trainers workshop
Data and fit
Haplotype evidence
0.5
●
Introduction
0.3
●
Hypotheses
0.2
Match probability
Estimators
30
0.1
Probability
0.4
MM Andersen
[email protected]
0.0
●
●
●
●
●
6
7
8
9
10
11
12
13
14
Discrete Laplace
Mixture separation
●
●
Conclusion
●
●
15
16
DYS392
c: Number ofP
subpopulations
c
P(X = x) = j=1 τj f (x; pj , µj )
35
Train the trainers workshop
Data and fit
Haplotype evidence
●
Observations
Estimated (c = 1)
MM Andersen
[email protected]
Introduction
0.3
●
Hypotheses
0.2
Match probability
Estimators
30
0.1
Probability
0.4
0.5
●
0.0
●
●
●
●
●
6
7
8
9
10
11
12
13
14
Discrete Laplace
Mixture separation
●
●
Conclusion
●
●
15
16
DYS392
c: Number ofP
subpopulations
c
P(X = x) = j=1 τj f (x; pj , µj )
P(DYS392 = x) = 1 · f (x; p = 0.41, µ = 11)
35
Train the trainers workshop
Data and fit
Haplotype evidence
●
Observations
Estimated (c = 2)
MM Andersen
[email protected]
Introduction
0.3
●
Hypotheses
0.2
Match probability
Estimators
30
0.1
Probability
0.4
0.5
●
0.0
●
●
●
●
●
6
7
8
9
10
11
12
13
14
Discrete Laplace
Mixture separation
●
●
Conclusion
●
●
15
16
DYS392
c: Number ofP
subpopulations
c
P(X = x) = j=1 τj f (x; pj , µj )
P(DYS392 = x) =
0.519 · f (x; p = 0.004, µ = 11) + 0.481 · f (x; p = 0.179, µ = 13)
35
Train the trainers workshop
Data and fit
Haplotype evidence
●
Observations
Estimated (c = 3)
MM Andersen
[email protected]
Introduction
0.3
●
Hypotheses
0.2
Match probability
Estimators
30
0.1
Probability
0.4
0.5
●
0.0
●
●
●
●
●
6
7
8
9
10
11
12
13
14
Discrete Laplace
Mixture separation
●
●
Conclusion
●
●
15
16
DYS392
c: Number ofP
subpopulations
c
P(X = x) = j=1 τj f (x; pj , µj )
35
Train the trainers workshop
Data and fit
Haplotype evidence
Observations
Estimated (c = 3)
●
MM Andersen
[email protected]
Introduction
0.3
●
Hypotheses
0.2
Match probability
Estimators
30
0.1
Probability
0.4
0.5
●
0.0
●
●
●
●
●
6
7
8
9
10
11
12
13
14
Discrete Laplace
Mixture separation
●
●
Conclusion
●
●
15
16
DYS392
c: Number ofP
subpopulations
c
P(X = x) = j=1 τj f (x; pj , µj )
µ
ˆj
τˆj
I
3 subpopulations:
I
Observed vs expected:
Allele
Observed
Expected
11
0.5248
0.5248
11
52%
12
0.0567
0.0567
13
46%
13
0.3322
0.3315
14
2%
14
0.0714
0.0715
15
0.0083
0.0089
35
Train the trainers workshop
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Mixture separation
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
Mixture separation
Haplotype evidence
MM Andersen
[email protected]
Yfiler trace, 15 loci (DYS385a/b removed):
Introduction
Locus
Alleles
DYS19
DYS389I
DYS389II’
DYS390
DYS391
DYS392
DYS393
DYS438
DYS439
DYS437
DYS448
DYS456
DYS458
DYS635
Y GATA H4
14, 15
13, 14
16, 17
24, 26
10, 11
11, 13
13
11, 12
10, 11
14, 15
19, 20
15, 16
14, 18
23
12, 13
Hypotheses
Match probability
Estimators
Discrete Laplace
31
Mixture separation
Conclusion
213−1 = 4,096 possible contributor pairs
35
Train the trainers workshop
Mixture separation
Haplotype evidence
MM Andersen
[email protected]
Introduction
Danish
Loci
n
Singletons
I
I
I
Somali
German
Hypotheses
Match probability
DEN (21)
DEN (15)
DEN (10)
SOM (10)
GER (7)
21
181
181
(100%)
15
181
164
(90.6%)
10
181
112
(61.9%)
10
201
56
(27.9%)
7
3,443
662
(19.2%)
Estimators
Discrete Laplace
32
Mixture separation
Conclusion
For each dataset, 550 mixtures were simulated
ˆ i,1 )P(h
ˆ i,2 )
ˆi = P(h
i’th contributor pair ci = {hi,1 , hi,2 }, find p
ˆi values (highest to
Order all pairs according to the p
lowest)
35
Train the trainers workshop
Mixture separation
Haplotype evidence
Probabiliy
Rank ≤ 1
Rank ≤ 5
Rank ≤ 10
Random ≤ 10
DEN (21)
DEN (15)
DEN (10)
SOM (10)
GER (7)
13%
33%
42%
26%
55%
69%
45%
84%
93%
72%
94%
98%
53%
89%
97%
0.03%
0.78%
12.15%
26.79 %
53.93%
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Estimators
Discrete Laplace
P(True rank ≤ x)
33
DEN (21)
DEN (15)
SOM (10)
GER (7)
Mixture separation
Conclusion
DEN (10)
1.0
0.8
0.6
0.4
0.2
0.0
1.0
0.8
0.6
0.4
0.2
0.0
1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
x
Ranking
Discrete Laplace
Random
35
Train the trainers workshop
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
Match probability
Concluding remarks
Estimators
Discrete Laplace
Mixture separation
Conclusion
35
Train the trainers workshop
The discrete Laplace method
Haplotype evidence
I
I
Sound statistical properties
Applications
I
I
I
Computationally feasible
I
Open source software: R libraries disclap and
disclapmix (and fwsim for simulating populations)
Criticism
I
I
I
I
I
Introduction
Estimation of Y-STR haplotype population frequencies
Mixture analysis
Cluster analysis (not shown today)
I
I
MM Andersen
[email protected]
Hypotheses
Match probability
Estimators
Discrete Laplace
Mixture separation
34
Conclusion
35
Train the trainers workshop
Intermediate alleles (e.g. 10.2)
Duplications (e.g. DYS385a/b) – general problem for
matches: 14,15 = 14,15?
Copy number variation (e.g. Yfiler Plus)
Central haplotypes difficult to estimate (curse of
dimensionality)
Maybe too much probability mass on unobserved
haplotypes
Conclusion
Haplotype evidence
MM Andersen
[email protected]
Introduction
Hypotheses
I
Match probability is of great interest and is difficult
I
Population substructure (important but difficult to get
correct value for a particular case)
For a matching profile (e.g. Y23 or Yfiler Plus), use only
subset (e.g. 10 loci) for LR calculations?
I
I
Match probability
Estimators
Discrete Laplace
Mixture separation
35
Conclusion
35
Train the trainers workshop
Easier to validate statistical models for 10 locus haplotypes
than for 27 locus haplotypes