What is probability?
In many experiments, even if the conditions are kept the same, repeated trials give different results. The result of an individual measurement is unpredictable, yet the possible results of a series of measurements have a well defined distribution, provided that
1. the events are completely uncorrelated (statistically independent), and
2. the number of trials is large (law of large numbers).
Ni = number of events observed in class i out of a total of N
➔ pi = probability of getting an event in class i = lim N→∞ (Ni/N)
For a continuously varying variable x, N f(xi)dxi = number of events observed in the interval xi to xi+dxi out of a total of N events
Element of probability: dp = f(x)dx
f(x) = probability density function within the permissible range, normalised so that ∫ f(x)dx = 1 over the range xlow to xhigh
Throwing a die: the probability of any one score is 1/6; for a large number of throws each score appears in 1/6 of the trials.
Emission of scintillation light: emitted isotropically. If the scintillator is blackened on all sides except a solid angle dΩ, a fraction dΩ/4π of the emitted photons will escape.
More common in a physics experiment is the determination of some parameter from measurements.
Angular distribution in µ-decay: f(cosθ) = ½(1 + α cosθ)
The experiment measures α → α* ± δα
Crudely: the probability that the true value of α lies in the range α*−δα to α*+δα is 68.3% (inverse probability).
Strictly: the probability that the true value of α lies between α*−δα and α*+δα is either 0 or 1 (direct probability).
What is the length of this table?
Ans: 10. In which unit? What is its error? What is the probability that the result of a repeated measurement lies within this error?
Probability distributions
•Binomial
•Poisson
•Normal
•Prove:
Binomial: f(r; N, p) = [N!/((N−r)! r!)] p^r (1−p)^(N−r)
Poisson: f(n; λ) = e^(−λ) λ^n / n!
Gaussian: f(x; μ, σ²) = [1/(√(2π) σ)] exp[−(x−μ)²/(2σ²)]
Binomial → Poisson in the limit N→∞, p→0 with Np = λ fixed
Poisson → Gaussian in the limit λ→∞
•Calculate the mean and variance of all these distributions
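As a quick numerical check of the bullet above (a minimal sketch, assuming NumPy is available; the parameter values are arbitrary illustrations), the sample mean and variance of the three distributions can be compared with the analytic values Np and Np(1−p), λ and λ, μ and σ²:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200_000

# Binomial(N, p): mean = Np, variance = Np(1-p)
N, p = 50, 0.1
b = rng.binomial(N, p, n_samples)
print("binomial ", b.mean(), b.var(), "expected", N * p, N * p * (1 - p))

# Poisson(lambda): mean = variance = lambda
lam = N * p
po = rng.poisson(lam, n_samples)
print("poisson  ", po.mean(), po.var(), "expected", lam, lam)

# Gaussian(mu, sigma^2)
mu, sigma = lam, np.sqrt(lam)
g = rng.normal(mu, sigma, n_samples)
print("gaussian ", g.mean(), g.var(), "expected", mu, sigma**2)
```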
•P(A+B) ≤ P(A) + P(B)
•P(AB) = P(A/B)P(B) = P(B/A)P(A)
•P(A/B) = P(A) if the occurrence of B does not affect whether or not A occurs: A and B are then said to be independent
•In that case P(AB) = P(A)P(B)
Splitting a Poisson sample of mean λ into "forward" (F) and "backward" (B) events with forward fraction f (n = F + B):
P = P_Poisson × P_binomial = [e^(−λ) λ^n / n!] × [n!/(F! B!)] f^F (1−f)^B
  = [e^(−λf − λ(1−f)) λ^F λ^B / (F! B!)] f^F (1−f)^B
  = [e^(−λf) (λf)^F / F!] × [e^(−λ(1−f)) (λ(1−f))^B / B!]
so F and B are themselves independent Poisson variables with means λf and λ(1−f).
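A small Monte Carlo check of this factorisation (a sketch, assuming NumPy; λ and f are arbitrary illustrative choices): the forward and backward counts should each be Poisson with means λf and λ(1−f) and essentially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, f, n_exp = 6.0, 0.3, 100_000

n = rng.poisson(lam, n_exp)          # total events in each pseudo-experiment
F = rng.binomial(n, f)               # forward split of each sample
B = n - F

print("mean F:", F.mean(), "expected", lam * f)
print("mean B:", B.mean(), "expected", lam * (1 - f))
print("var  F:", F.var(), " var B:", B.var())
print("correlation(F, B):", np.corrcoef(F, B)[0, 1])  # ~0
```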
Probability density function P(x):
∫ P(x)dx = 1, or for a discrete variable ∑ P(x) = 1
First moment: μ = E[x] = ∫ x P(x)dx
Second central moment: σ² = E[(x−μ)²] = ∫ (x−μ)² P(x)dx
  = ∫ (x² − 2xμ + μ²) P(x)dx
  = ∫ x² P(x)dx − 2μ ∫ x P(x)dx + μ² ∫ P(x)dx
  = ∫ x² P(x)dx − μ² = E(x²) − [E(x)]² = <x²> − <x>²
Variance of measured <x>
Let x1, x2, …, xn be a sample of size n from a distribution whose mean is μ and variance σ².
x̄ = (1/n) ∑ xi ;  in the limit n→∞, μ = lim n→∞ (1/n) ∑ xi
Simple variance: s² = (1/n) ∑ (xi − x̄)²
The error on an individual measurement, σ²(xi) = E[(xi − μ)²] = ∫ (xi − μ)² P(x)dx,
is different from the error on the mean value x̄, σ²(x̄) = E[(x̄ − μ)²]:
σ²(x̄) = E[((1/n) ∑ xi − μ)²] = (1/n²) E[(∑ xi − nμ)²] = (1/n²) E[(∑ (xi − μ))²]
       = (1/n²) E[ ∑ (xi − μ)² + ∑∑_{j≠i} (xi − μ)(xj − μ) ]
The cross terms vanish on average because the measurements are uncorrelated, so
σ²(x̄) = (1/n²) ∑ E[(xi − μ)²] = nσ²/n² = σ²/n
Simply: x̄ = (1/n) ∑ xi ⇒ δx̄ = (1/n) ∑ δxi ⇒ σ²(x̄) = (1/n²) ∑ σi², with ∑ σi² = nσ², so σ²(x̄) = σ²/n
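A quick simulation of the σ²(x̄) = σ²/n result (a sketch, assuming NumPy; the sample size and number of pseudo-experiments are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, n, n_exp = 2.0, 25, 50_000

# n_exp repeated experiments, each taking n measurements of the same quantity
x = rng.normal(0.0, sigma, size=(n_exp, n))
means = x.mean(axis=1)

print("spread of the sample means:", means.std())
print("expected sigma/sqrt(n)    :", sigma / np.sqrt(n))
```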
Let x1, x2, …, xn be a sample of size n from a distribution whose mean is μ and variance σ².
x̄ = (1/n) ∑ xi ;  in the limit n→∞, μ = lim n→∞ (1/n) ∑ xi
Simple variance: s² = (1/n) ∑ (xi − x̄)²
This is not the true variance σ² of the distribution: because x̄ ≠ μ, the 1/n definition turns out to be a biased estimator. Use instead
s² = [1/(n−1)] ∑i (xi − x̄)² = [1/(n−1)] ∑i {(xi − μ) − (x̄ − μ)}² = [1/(n−1)] ∑i {(xi − μ) − (1/n) ∑j (xj − μ)}²
E(s²) = E[ (1/(n−1)) ∑i (xi − μ)² ] − [2/(n(n−1))] E[ ∑i (xi − μ) ∑j (xj − μ) ] + [1/(n²(n−1))] E[ ∑i ∑j ∑k (xj − μ)(xk − μ) ]
Using the fact that the different xi are independent (cross terms vanish on average),
E(s²) = E[ (1/(n−1)) ∑i (xi − μ)² ] − E[ (1/(n(n−1))) ∑j (xj − μ)² ]
      = (1/(n−1)) ∑i E(xi − μ)² − (1/(n(n−1))) ∑j E(xj − μ)² = nσ²/(n−1) − nσ²/(n(n−1)) = σ²
so the 1/(n−1) definition is unbiased. The ingredients used are
∑i E[(xi − μ)²] = nσ² ;  E[∑i (xi − x̄)²] = (n−1)σ² ;  E[n(x̄ − μ)²] = σ²
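A minimal simulation of the bias (assuming NumPy; a deliberately small n makes the difference visible):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n, n_exp = 1.0, 5, 200_000

x = rng.normal(0.0, sigma, size=(n_exp, n))
xbar = x.mean(axis=1, keepdims=True)

s2_biased   = ((x - xbar) ** 2).sum(axis=1) / n
s2_unbiased = ((x - xbar) ** 2).sum(axis=1) / (n - 1)

print("E[s^2] with 1/n     :", s2_biased.mean())    # ~ (n-1)/n * sigma^2 = 0.8
print("E[s^2] with 1/(n-1) :", s2_unbiased.mean())  # ~ sigma^2 = 1.0
```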
Central limit theorem
If x1, x2, …, xn is a set of n independent random variables, each xi distributed with mean value μi and a finite variance σi², then the standardised variable
z = (∑i xi − ∑i μi) / (∑i σi²)^(1/2)   ( = ∑i (xi − μ)/(σ√n) for identically distributed xi )
has a limiting distribution which is N(0,1).
To establish this, use the characteristic function φ(t); by definition φ(t) is the Fourier transform of f(x):
φ(t) = ∫ e^(itx) f(x)dx = E[e^(itx)] = 1 + (it)μ + (1/2!)(it)²(σ² + μ²) + …
Since the xi, and hence the (xi − μ)/(σ√n), are by assumption independent, the characteristic function of z is the product of the n characteristic functions of these terms:
φz(t) = [ E(e^(it(x−μ)/(σ√n))) ]^n = [ e^(−itμ/(σ√n)) E(e^(itx/(σ√n))) ]^n = [ e^(−itμ/(σ√n)) φ(t/(σ√n)) ]^n
ln φz(t) = n[ −itμ/(σ√n) + ln(1 + itμ/(σ√n) + (1/2!)(it/(σ√n))²(σ² + μ²) + …) ]
         = n[ (1/2!)(it/(σ√n))²(σ² + μ²) − (1/2)(itμ/(σ√n))² + … ] ≈ −t²/2 + O(n^(−1/2))
(Equivalently, for a zero-mean variable φ(t) = φ(0) + φ'(0)it + φ''(0)(it)²/2! + … = 1 − σ²t²/2 + …)
Thus, when n → ∞, φz(t) → exp(−t²/2), which is the characteristic function of N(0,1).
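A minimal sketch of the theorem in action (assuming NumPy; n = 12 uniform variables per sum is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_exp = 12, 100_000

# xi uniform on (0,1): mu = 0.5, sigma^2 = 1/12
x = rng.random(size=(n_exp, n))
z = (x.sum(axis=1) - n * 0.5) / np.sqrt(n / 12.0)

print("mean(z) :", z.mean())                   # ~0
print("std(z)  :", z.std())                    # ~1
print("P(|z|<1):", np.mean(np.abs(z) < 1))     # ~0.683 for N(0,1)
```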
Likelihood function
N independent observations x1, x2, …, xN from a theoretical distribution f(x|θ), where θ is the parameter to be estimated:
L(θ|x) = f(x1|θ) f(x2|θ) …… f(xN|θ)
dpj = f(xj, λ)dx has the character of an a-posteriori probability: how probable was it to find just this result? For N independent measurements dP = ∏ f(xj, λ)dx. ("Likelihood" is reserved for such a-posteriori probabilities.)
The estimate of θ can be found by solving dL/dθ = 0, or in general d(ln L)/dθ = 0.
Variance: σ²(θ̂) = ∫ (θ̂ − θ)² L(θ|x) dx1 dx2 … dxN
An approximate method in the limit of large numbers:
σ²(θ̂) = −(d² ln L / dθ²)^(−1)
and for several parameters, (U^(−1))ij = −∂² ln L / ∂θi ∂θj ,  σ²(θi) = (U)ii
Mean and error in Gaussian function through likelihood function
L = ∏i [1/(√(2π) σi)] exp[−(xi − μ)²/(2σi²)] ;  ℓ = ln L = ∑i (−ln σi) − ∑i (xi − μ)²/(2σi²) + const
∂ℓ/∂μ = ∑i (xi − μ)/σi²
∂²ℓ/∂μ² = −∑i 1/σi²
∂ℓ/∂σ² = −∑i 1/(2σi²) + (1/2) ∑i (xi − μ)²/σi⁴
Weighted mean: μ̂ = ∑i (xi/σi²) / ∑i (1/σi²)
Error on the weighted mean: σ²(μ̂) = 1 / ∑i (1/σi²)
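A direct evaluation of the weighted mean and its error (a sketch, assuming NumPy; the measurement values are arbitrary illustrations):

```python
import numpy as np

# measurements of the same quantity with different errors (illustrative values)
x     = np.array([10.2, 9.8, 10.5, 10.1])
sigma = np.array([0.3, 0.2, 0.5, 0.4])

w = 1.0 / sigma**2
mu_hat   = np.sum(w * x) / np.sum(w)     # weighted mean
sigma_mu = np.sqrt(1.0 / np.sum(w))      # error on the weighted mean

print("weighted mean =", mu_hat, "+/-", sigma_mu)
```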
Addition of two variables and its error
If a = b + c, what is the error on a in terms of σb and σc?
Ans: σa² = σb² + σc², in the case where b and c are uncorrelated.
In a series of measurements we do not know the individual deviations δb or δc; the only known parameters are their mean square values σ:
σa² = <(a − ā)²> = <[(b + c) − (b̄ + c̄)]²>
    = <(b − b̄)²> + <(c − c̄)²> + 2<(b − b̄)(c − c̄)> = σb² + σc² + 2 cov(b, c)
All distributions in experiments are (approximately) Gaussian with large statistics. Gaussian properties are well known, so one wants to use those properties in all measurements.
Full Width at Half Maximum (FWHM) = 2.35σ for a Gaussian distribution:
exp(−x²/2) = 1/2 ⇒ x = √(2 ln 2) = 1.177, so FWHM = 2 × 1.177σ = 2.355σ
Gaussian distributions in two variables
Probability distribution of two variables x and y:
P(x) = [1/(√(2π) σx)] exp[−x²/(2σx²)]  and  P(y) = [1/(√(2π) σy)] exp[−y²/(2σy²)]
If x and y are independent, the combined probability is
P(x, y) = P(x)P(y) = [1/(2π σx σy)] exp[−(1/2)(x²/σx² + y²/σy²)]
The probability is reduced by a factor √e (corresponding to the 1D case x = ±σx) on the ellipse
x²/σx² + y²/σy² = 1 ,  in matrix notation (x y) [[1/σx², 0],[0, 1/σy²]] (x y)ᵀ = 1
The inverse of this 2×2 matrix, [[σx², 0],[0, σy²]], is known as the error matrix for x and y.
In general the elements of the error matrix Mij for a set of variables x1, x2, …, xn are <(xi − <xi>)(xj − <xj>)>; the matrix is symmetric and the off-diagonal terms are cov(xi, xj).
A specific example: σx² = 1/8 = (0.354)², σy² = 1/2 = (0.707)², so that P(x, y) = P(0,0)/√e when 8x² + 2y² = 1.
Correlated variables
Rotate the axes clockwise by an angle θ (e.g. 30°); the variables x' and y' w.r.t. the new coordinates are
x' = x cosθ − y sinθ
y' = x sinθ + y cosθ
Equation of the ellipse where P(x', y') = P(0,0)/√e:
(1/2)[13x'² + 6√3 x'y' + 7y'²] = 1
In matrix notation: (x' y') [[13/2, 3√3/2],[3√3/2, 7/2]] (x' y')ᵀ = 1
The error matrix is the inverse of this: (1/32)[[7, −3√3],[−3√3, 13]], i.e.
σx'² = 7/32 = (0.468)² ,  σy'² = 13/32 = (0.637)² ,  cov(x', y') = −3√3/32 = −(0.403)²
compared with σx² = (0.354)², σy² = (0.707)², cov(x, y) = 0 in the unrotated frame.
Correlated variables
•Semi-axes of the ellipse (0.354, 0.707) = square roots of the eigenvalues of the error matrix
•Maximum extent in x' (0.468) = square root of the diagonal element of the error matrix
•Intercept on the x'-axis (√(2/13) = 0.392) = 1/√(diagonal element of the inverse error matrix)
•General form of the Gaussian distribution of two variables:
P(x, y) = [1/(2π σx σy √(1−ρ²))] exp{ −[1/(2(1−ρ²))] [x²/σx² + y²/σy² − 2ρxy/(σx σy)] } ,  |ρ| ≤ 1
where ρ is the correlation coefficient for x and y.
The probability distribution of y for a given value a of x (conditional probability P(y|a)) is also Gaussian, with mean ȳ + ρ(σy/σx)(a − x̄) and variance σy²(1 − ρ²).
Generalised form for K variables:
P = [1/((2π)^(K/2) √|M|)] exp(−(1/2) xᵀ M⁻¹ x)
M (for two variables) = [[σx², cov(x, y)],[cov(x, y), σy²]]  and  M⁻¹ = [1/(1−ρ²)] [[1/σx², −ρ/(σx σy)],[−ρ/(σx σy), 1/σy²]]
Propagation of error and error matrix
pi = pi(xa, xb)
What is the error on pi in terms of the errors on xa and xb?
δpi = (∂pi/∂xa) δxa + (∂pi/∂xb) δxb
<(pi − p̄i)²> = δpi² = (∂pi/∂xa)² δxa² + (∂pi/∂xb)² δxb² + 2 (∂pi/∂xa)(∂pi/∂xb) δxa δxb
In matrix notation: δpi² = Dᵀ M D, with D = (∂pi/∂xa, ∂pi/∂xb)ᵀ and M = [[δxa², δxaδxb],[δxaδxb, δxb²]].
Propagation of the error matrix from (xa, xb) to (pi, pj), with pj = pj(xa, xb) and δpj = (∂pj/∂xa) δxa + (∂pj/∂xb) δxb:
δpi δpj = (∂pi/∂xa)(∂pj/∂xa) δxa² + (∂pi/∂xb)(∂pj/∂xb) δxb² + [(∂pi/∂xa)(∂pj/∂xb) + (∂pi/∂xb)(∂pj/∂xa)] δxa δxb
[[δpi², δpiδpj],[δpiδpj, δpj²]] = [[∂pi/∂xa, ∂pi/∂xb],[∂pj/∂xa, ∂pj/∂xb]] [[δxa², δxaδxb],[δxaδxb, δxb²]] [[∂pi/∂xa, ∂pj/∂xa],[∂pi/∂xb, ∂pj/∂xb]]
In matrix notation M' = Tᵀ M T, where
M' = new error matrix, M = old error matrix, T = transformation matrix, Tᵀ = transpose of T.
For a function of the changed variables, y = y(pi, pj) with pi = pi(xa, xb), pj = pj(xa, xb):
σy² = Dᵀ Tᵀ M T D
Assignment 3.3
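A minimal numerical sketch of the M' = TᵀMT rule (assuming NumPy; the functions p1 = xa·xb, p2 = xa/xb and the error values are arbitrary illustrations), cross-checked against a direct Monte Carlo estimate:

```python
import numpy as np

xa, xb = 2.0, 4.0
sa, sb, cov_ab = 0.1, 0.2, 0.0           # errors on xa, xb (illustrative)
M = np.array([[sa**2, cov_ab],
              [cov_ab, sb**2]])           # error matrix of (xa, xb)

# Jacobian J[i, k] = d p_i / d x_k  (the slide's T is its transpose)
J = np.array([[xb,      xa        ],      # d(xa*xb)/dxa, d(xa*xb)/dxb
              [1.0/xb, -xa/xb**2 ]])      # d(xa/xb)/dxa, d(xa/xb)/dxb

Mp = J @ M @ J.T                          # propagated error matrix = T^T M T
print("propagated error matrix:\n", Mp)

# Monte Carlo cross-check
rng = np.random.default_rng(6)
x = rng.multivariate_normal([xa, xb], M, size=200_000)
p = np.column_stack([x[:, 0] * x[:, 1], x[:, 0] / x[:, 1]])
print("MC error matrix:\n", np.cov(p, rowvar=False))
```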
Simple examples of propagation of error
Example 1 : Forward-backward asymmetry of an angular distribution, A = (F − B)/(F + B) = (F − B)/N
Case A : F and B follow independent Poisson distributions:
δA² = (∂A/∂F  ∂A/∂B) [[σF², 0],[0, σB²]] (∂A/∂F  ∂A/∂B)ᵀ
    = (2B/N²  −2F/N²) [[F, 0],[0, B]] (2B/N²  −2F/N²)ᵀ = 4FB/N³
Case B : N is given; F and B follow a binomial distribution with probability p of an event being produced in the forward hemisphere. F and B are completely anti-correlated:
−cov(F, B) = σF² = σB² = Np(1 − p) ≈ FB/N ,  and  ∂A/∂F = −∂A/∂B = 1/N
δA² = (1/N  −1/N) [[σF², −σF²],[−σF², σF²]] (1/N  −1/N)ᵀ = 4σF²/N² = 4FB/N³
Example 2 : a and b are uncorrelated with errors σa² and σb², and
x = (a + b)/2 ,  y = (a − b)/2
[[σx², cov(x, y)],[cov(x, y), σy²]] = [[1/2, 1/2],[1/2, −1/2]] [[σa², 0],[0, σb²]] [[1/2, 1/2],[1/2, −1/2]]ᵀ
                                    = (1/4) [[σa² + σb², σa² − σb²],[σa² − σb², σa² + σb²]]
Without the covariance term, if we want to go back to the errors on a and b (a = x + y, b = x − y):
[[σa², cov(a, b)],[cov(a, b), σb²]] = [[1, 1],[1, −1]] [[(σa² + σb²)/4, 0],[0, (σa² + σb²)/4]] [[1, 1],[1, −1]]ᵀ
                                    = [[(σa² + σb²)/2, 0],[0, (σa² + σb²)/2]]
The errors come out the same for a and b, and both differ from the true errors σa² and σb².
Combination of different errors
•Statistical error (known exactly for a specific distribution)
•Systematic error (very difficult to formulate statistically); depends on the situation of the experiment
Example: threshold peak of B± → D0 D̄0 π0 K±, with D0 → K−π+
Br1(B± → D0 D̄0 π0 K±) = Nobs / [N_B± ε Brd(D0 → K−π+)²]
Sources of uncertainty:
•Number of observed events, Nobs
•Signal efficiency
•Background subtraction
•D0 branching fraction
•Track finding efficiency
•K±/π± identification efficiency
•π0 detection efficiency
•D0 resolution in Data/MC
•Poorly reconstructed π0
•Efficiency due to decay kinematics
ln Br1 = ln Nobs − ln N_B± − ln ε − 2 ln Brd
(σBr1/Br1)² = (σNobs/Nobs)² + (σ_NB±/N_B±)² + (σε/ε)² + 4(σBrd/Brd)²
The first error is simple statistical error (uncorrelated).
Muon counts in two different telescopes:
N1 ± √N1 ± αN1 ,  N2 ± √N2 ± αN2  ⇒ error matrix = [[N1 + α²N1², α²N1N2],[α²N1N2, N2 + α²N2²]]
The second (αN) errors are totally correlated, coming from common sources such as Vth, τ etc.
Assignment
δr1² = σ1²(4cos²θ1 + sin²θ1)
δθ1² = σ1²(4sin²θ1 + cos²θ1)/r1²
δr1δθ1 = −3σ1² sinθ1 cosθ1 / r1 = δθ1δr1
δr2² = σ2²(4cos²θ2 + sin²θ2)
δθ2² = σ2²(4sin²θ2 + cos²θ2)/r2²
δr2δθ2 = 3σ2² sinθ2 cosθ2 / r2 = δθ2δr2
• Obtain the error matrix for the variables x1, y1, x2, y2
• Calculate the error on the function f = x1x2 + y1y2
• Use the original error matrix for the variables r1, θ1, r2, θ2 to obtain the error on the function r1r2cos(θ1−θ2). Compare this error with the previous one.
Fitting technique : moments
e.g., dn/dcosθ = a + b′cos²θ = N(1 + b cos²θ)
<cos²θ> = [1/3 + b(1/5)] / [1 + b(1/3)]  ⇒  b = 5(3<cos²θ> − 1) / (3 − 5<cos²θ>)
Calculate b by averaging cos²θ over the events: <cos²θ> = (1/n) ∑ cos²θi
The error follows from the error on <cos²θ>:
σ(<cos²θ>) = (1/√n) √[ (1/(n−1)) ∑ (cos²θi − <cos²θ>)² ]
Calculate it for the data set cosθ = {−0.9, −0.85, −0.81, −0.7, −0.55, −0.34, −0.2, −0.05, 0.1, 0.25, 0.30, 0.50, 0.55, 0.70, 0.9, 0.99}
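A direct evaluation of these formulas on the data set above (a minimal sketch, assuming NumPy):

```python
import numpy as np

cos_theta = np.array([-0.9, -0.85, -0.81, -0.7, -0.55, -0.34, -0.2, -0.05,
                       0.1, 0.25, 0.30, 0.50, 0.55, 0.70, 0.9, 0.99])

c2 = cos_theta ** 2
n = len(c2)
mean_c2 = c2.mean()
err_c2 = np.sqrt(((c2 - mean_c2) ** 2).sum() / (n - 1)) / np.sqrt(n)

# b = 5(3<cos^2 theta> - 1) / (3 - 5<cos^2 theta>), error propagated from <cos^2 theta>
b = 5 * (3 * mean_c2 - 1) / (3 - 5 * mean_c2)
db_dc2 = 20.0 / (3 - 5 * mean_c2) ** 2     # derivative of b w.r.t. <cos^2 theta>
err_b = abs(db_dc2) * err_c2

print("<cos^2 theta> =", mean_c2, "+/-", err_c2)
print("b =", b, "+/-", err_b)
```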
Moments
•Simplest; no need for optimisation
•No binning
•Extendable to several parameters or variables
•Several possible moments, e.g., dn/dcosθ = N(1 + a cosθ + b cos²θ)
•Constraints among parameters are not easily incorporated
•Parameters can be unphysical:
<cos²θ> = 0 ⇒ b = −5/3 ;  <cos²θ> = 1 ⇒ b = −5 ;  <cos²θ> = 3/5 + ε ⇒ b → −∞
•No check of the goodness of fit, e.g. if the true distribution is sin²θcos²θ
•Useful to get a starting value for more sophisticated methods
Maximum likelihood method
Angular distribution: y = dn/dcosθ = N(1 + b cos²θ) ,  N = normalisation factor
∫−1..+1 y dcosθ = 1 ⇒ N = 1/[2(1 + b/3)]
The normalised y behaves as a probability distribution. The normalisation is essential because of its dependence on the parameter b; the factor ½ is not crucial (just a scale), but it is needed for error estimation.
For the ith event, yi = N(1 + b cos²θi) is the probability density for observing that event (i.e. θi) for a given value of b.
Define the likelihood ℒ as the product of the yi over all events, also called the joint probability function: ℒ(b|θ) = ∏yi, which is the probability of observing the given set of θi for that b.
For the true (theory) value of b, ℒ(b|θ) is maximal; inversely, maximise ℒ as a function of b to find the best value of b.
The normalisation ∫P(θ|b)dcosθ = 1 must hold for every value of b. Without the normalisation factor N, one could make the yi larger simply by increasing b, and ℒ would not have an absolute maximum.
Similarly for a lifetime fit to t1, t2, …, tn: the result τ̂ = ∑ti/N will be incorrect if one misses the 1/τ in the probability P(t|τ) = (1/τ)exp(−t/τ), since the un-normalised probability can be made larger simply by increasing τ.
Likelihood function for multiparameter fit
A case of the Breit-Wigner function, with two free parameters in the function (M0 and Γ); the probability density function is
yi(M0, Γ) = (1/2π) Γ / [(mi − M0)² + (Γ/2)²]
ℒ(M0, Γ|mi) = ∏yi(M0, Γ)
The maximum of ℒ gives the best values of M0 and Γ.
ℒ(M0(Γ)|mi) = ∏yi(M0(Γ)) for fixed Γ (or M0) is a one-parameter function, used when we know the second parameter.
Properties of the likelihood function
∏yi is a very small number for a large number of measurements (N) and is difficult to calculate; it is conventional to consider ℓ = ln ℒ = ∑ ln yi.
For large N the likelihood function ℒ tends to a Gaussian, at least in the vicinity of the maximum:
ℓ = ℓmax + (1/2!)(∂²ℓ/∂p²)(δp)² + … = ℓmax − (δp)²/(2c) + … ,
where −1/c = ∂²ℓ/∂p² ⇒ Likelihood ∝ exp[−(p − p0)²/(2c)]
•Calculation of p (the most probable value) is simple, but what is its error?
•The RMS deviation of the ℒ distribution about its mean
•(−∂²ℓ/∂p²)^(−1/2)
•The change in p required to reduce ℓ from its maximum value by 0.5, i.e. ℓ(p0 ± δp) = ℓ(p0) − 1/2
•For non-Gaussian ℒ, use the second or third option
•Try to avoid non-Gaussian situations, e.g. use the decay rate (1/τ) rather than the lifetime τ, or use 1/p rather than the momentum p of a charged track
Lifetime determination
dn/dt = (1/τ) exp(−t/τ)
ℒ = ∏i (dn/dt)i = ∏i (1/τ) exp(−ti/τ) ,  thus ℓ = ∑i (−ti/τ − ln τ)
∂ℓ/∂τ = ∑i (ti/τ² − 1/τ) = 0 ⇒ ∑ti/τ² − N/τ = 0 ⇒ τ̂ (or τmax) = ∑ti/N = <t>
∂²ℓ/∂τ² = ∑i (−2ti/τ³ + 1/τ²) = −2N/τ² + N/τ² = −N/τ² ⇒ στ = 1/√(−∂²ℓ/∂τ²) = τ/√N
ℓ(τ) − ℓ(τmax) = −Nτmax/τ − N ln τ − (−N − N ln τmax) = N[1 + ln(τmax/τ) − τmax/τ]
is a universal function of τ/τmax. For a given N, σ+ and σ− are defined (~τmax/√N as N→∞); for small N, σ+ > σ−.
ℓ(τmax) = −N(1 + ln <t>)
N.B. ℓ(τmax) depends only on <t>, not on the distribution of the ti. This is relevant for whether ℓmax is useful for testing the goodness of fit: it does not at all justify whether this parametrisation is correct.
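A minimal sketch of this maximum-likelihood lifetime fit on simulated data (assuming NumPy; the true lifetime and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
tau_true, N = 2.0, 1000

t = rng.exponential(tau_true, N)       # simulated decay times
tau_hat = t.mean()                     # ML estimate: tau = sum(t_i)/N
sigma_tau = tau_hat / np.sqrt(N)       # from -d^2 l / d tau^2 = N / tau^2

print("tau_hat =", tau_hat, "+/-", sigma_tau)

# check: the log-likelihood drops by ~0.5 at tau_hat + sigma_tau
def log_l(tau):
    return np.sum(-t / tau - np.log(tau))

print("l(tau_hat) - l(tau_hat + sigma) =",
      log_l(tau_hat) - log_l(tau_hat + sigma_tau))   # ~0.5
```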
Likelihood (ℒ) vs probability distribution function (pdf)
•Poisson: the pdf for observing n, given λ, is P(n; λ) = e^(−λ) λ^n / n!
From this construct ℒ as ℒ(λ; n) = e^(−λ) λ^n / n!, i.e. the same function for λ and n, but used differently:
For the pdf, λ is fixed: P(n; λ) exists only at integer n ≥ 0.
For the likelihood, n is fixed: ℒ(λ; n) exists as a continuous function of λ ≥ 0.
•Lifetime distribution: the pdf P(t; τ) = (1/τ)e^(−t/τ) (fixed τ) maximises at t = 0;
the likelihood ℒ(τ; t) = (1/τ)e^(−t/τ) (fixed t) maximises at τ = t. Both t and τ are continuous.
•Gaussian: P(x; μ) = [1/(√(2π)σ)] exp[−(1/2)((x − μ)/σ)²] and ℒ(μ; x) = [1/(√(2π)σ)] exp[−(1/2)((x − μ)/σ)²]
have the same functional form; if one only ever considers Gaussians, one can get confused between the pdf and ℒ.
Integration of a pdf gives 1, whereas integration of ℒ is meaningless.
Error on maximum likelihood fit
For a single-parameter fit, the parameter p is estimated from the equation dℓ/dp = 0, with error σ = (−d²ℓ/dp²)^(−1/2).
For several variables pi, the best values are obtained from ∂ℓ/∂pi = 0.
For the errors, define Hij = −∂²ℓ/∂pi∂pj and obtain the error matrix as Eij = (H⁻¹)ij.
An example with two variables x and y: contours of ℓ = −(4x² + y²), with ℓmax at (0,0) and ℓ = ℓmax − 1/2 when 8x² + 2y² = 1.
Errors on the variables: x = ±√(1/8) (when y = 0), y = ±√(1/2) (when x = 0).
In terms of Hij: −∂²ℓ/∂xi∂xj = [[8, 0],[0, 2]], which inverts to (1/16)[[2, 0],[0, 8]].
Rotate the axes clockwise by 30° ⇒ ℓ = −(1/4)(13x'² + 6√3 x'y' + 7y'²)
−∂²ℓ/∂xi∂xj = (1/2)[[13, 3√3],[3√3, 7]], which inverts to (2/64)[[7, −3√3],[−3√3, 13]] = (1/32)[[7, −3√3],[−3√3, 13]].
Extended maximum likelihood
1. Maximum likelihood uses the shape ⇒ parameters:
P_F = [N!/(F!B!)] f^F (1−f)^B
Maximise ln P_F w.r.t. f ⇒ f̂ = F/N ,  σ²(f) = (−∂²ln P_F/∂f²)^(−1) ≈ f(1−f)/N
Estimate of F = Nf = F ± √(FB/N)
Estimate of B = N(1−f) = B ± √(FB/N)
F and B are completely anti-correlated.
2. Extended maximum likelihood uses the shape and the normalisation, i.e. the probability of
  1. observing a sample of size N events, and
  2. the sample having the given distribution in x:
Pa = exp(−λ) (λ^N/N!) × P_F
Maximise ln Pa(λ, f) ⇒ λ̂ = N ± √N and f̂ = F/N ± √(f(1−f)/N) : completely uncorrelated variables.
For <F> and <B>, either propagate the errors for <Φ> = <λ><f> and <β> = <λ>(1 − <f>),
or rewrite the equation as the product of two independent Poissons:
Pa = [e^(−Φ) Φ^F / F!] × [e^(−β) β^B / B!]
Both give the same answer: <Φ> = F ± √F and β = B ± √B.
Pros & cons of the likelihood method
•No need for histogramming; most useful for low-statistics cases
•Unique answer and error (e.g. λ0 ± δλ or τ0 ± δτ, with λ0 + δλ = 1/(τ0 − δτ))
•Able to constrain parameters and their ranges
•Difficult to tackle background, e.g. ℓ = ∑i ln[ fR wR(mi) PR(θi) + fB wB(mi) PB(θi) ]
•Weight factors (e.g. efficiency, ε = 1/w) for different events can be used, ℓ = ∑ wi ln(yi), but it is then difficult to estimate the error; a better option is ℓ = ∑ ln(N yi / wi)
•Large computing time: the normalisation must be recomputed for each parameter set separately
•How good is the fit? There is no absolute scale for ℒ ⇒ hypothesis testing is not easy, but comparing two hypotheses (ℓa − ℓb) can still indicate the better choice of model.
Chi-square fit
Data set {xi, yi ± δyi} and theory y = a + bx.
Does it fit a straight line? (hypothesis testing)
What are the gradient and intercept? (parameter determination)
χ² = ∑i [yi_obs − yi_th(αj)]² / σi²
σi is supposed to be the error on the theory, but in reality it is the error on the experimental observation.
If theory and data are consistent with each other, y_th ≈ y_obs ⇒ χ² is small.
Minimise χ² to obtain the best line (the best parameters of the theory).
The error on a parameter = [ (1/2) d²χ²/dp² ]^(−1/2), or increase χ² by 1 from its minimum χ²min.
For several parameters, the best values are obtained from ∂χ²/∂p = 0 and the error matrix is the inverse of (1/2) ∂²χ²/∂pi∂pj.
For a single measurement yobs ± σ:
χ² = (yobs − yth)²/σ², minimised with χ² = 0 for yobs = yth, and χ² = 1 for yobs = yth ± σ.
For two measurements y1 and y2 of a single quantity with equal errors:
χ² = (y1obs − yth)²/σ1² + (y2obs − yth)²/σ2²
ŷth = (y1 + y2)/2 from dχ²/dyth = 0, and its error = σ/√2 from χ² = χ²min + 1, exactly what we expect: error = σ/√n.
Simple example of minimising χ²
Measurements p1 ± σ1, p2 ± σ2, …, pn ± σn; the best value is <p> ± σ.
Construct χ² = ∑i (p − pi)²/σi² and minimise χ² w.r.t. p:
(1/2) dχ²/dp = ∑i (p − pi)/σi² = 0 ⇒ p̂ = [∑i pi/σi²] / [∑i 1/σi²]
and the error on p is given by
σ = [ (1/2) d²χ²/dp² ]^(−1/2) ;  (1/2) d²χ²/dp² = ∑i 1/σi² ;  thus 1/σ² = ∑i 1/σi²
Example of straight line fit
A simple example y = a + bx; simple because y is linear in a and b (not necessarily in x).
The data consist of n points (xi, yi ± σi); there are lots of candidate lines with different χ².
Minimise χ² w.r.t. a and b:
χ² = ∑i [(yi − a − bxi)/σi]²
(1/2) ∂χ²/∂a = ∑i (a + bxi − yi)/σi² = 0
(1/2) ∂χ²/∂b = ∑i (a + bxi − yi)xi/σi² = 0
Two simultaneous equations for the two unknowns a and b. Their solution is
b = ([1][xy] − [x][y]) / ([1][x²] − [x][x]) ,  where [f] = (1/n) ∑i fi/σi²
With the weighted mean <f> = [f]/[1], a can be determined from <y> = a + b<x>.
Error on straight line fit
To evaluate the errors we need the error matrix; a and b are correlated. First evaluate
(1/2) ∂²χ²/∂a² = n[1] ;  (1/2) ∂²χ²/∂b² = n[x²] ;  (1/2) ∂²χ²/∂a∂b = n[x]
Inverse error matrix: (1/2) ∂²χ²/∂pi∂pj = n [[1], [x] ; [x], [x²]]
Error matrix: (1/(nD)) [[x²], −[x] ; −[x], [1]] ,  where the determinant D = [x²][1] − [x][x]
For no correlation, σ²(a) = 1/(n[1]) and σ²(b) = 1/(n[x²]).
The errors depend only on the measured xi and σi, not on how well the data agree with the theory.
cov(a, b) ∝ −<x>, so it is better to use x' = x − <x>, because then the errors on a' and b are uncorrelated.
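A minimal implementation of this weighted straight-line fit (a sketch, assuming NumPy; the data points are arbitrary illustrations):

```python
import numpy as np

# illustrative data points (x_i, y_i +/- sigma_i)
x     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y     = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
sigma = np.array([0.2, 0.2, 0.3, 0.2, 0.3])

w = 1.0 / sigma**2
n = len(x)
# weighted averages [f] = (1/n) sum f_i / sigma_i^2
one, X, Y, XY, X2 = (arr.sum() / n for arr in (w, w*x, w*y, w*x*y, w*x*x))

b = (one * XY - X * Y) / (one * X2 - X * X)
a = (Y - b * X) / one                          # from <y> = a + b<x>

D = X2 * one - X * X
cov = np.array([[X2, -X], [-X, one]]) / (n * D)   # error matrix for (a, b)
print("a =", a, "+/-", np.sqrt(cov[0, 0]))
print("b =", b, "+/-", np.sqrt(cov[1, 1]))
print("cov(a,b) =", cov[0, 1])
```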
•How well is the y-coordinate of the fitted line known at a particular x-value?
Variance of y: σy² = σa² + 2x cov(a, b) + x²σb²
For a proper shift of x such that <x> = 0, σy² = σa² + x²σb².
•Hypothesis testing: how well do the data points match the theoretical expectation? Probability of getting a χ² as large as this in a χ² distribution (χ²/dof), with
χ²min = ∑i yi²/σi² − a ∑i yi/σi² − b ∑i xiyi/σi²
•Error of the first kind: reject H when it is true; should happen x% of the time.
•Error of the second kind: accept H when something else is true.
•Optimum χ² criteria for the best result.
Generalised χ² fit
Fit a set of data points (xi, yi ± σi) with a form y_th(x, αa), a function of the variable x and parameters αa:
χ² = ∑i [(yi − y_th(xi, αa))/σi]² ,  solution for the αa from the simultaneous equations ∂χ²/∂αa = 0
Error on the parameters: <(αa − ᾱa)(αb − ᾱb)> = (H⁻¹)ab ,  where Hab = (1/2) ∂²χ²/∂αa∂αb
Consider the general linear case y_th = ∑a αa fa(x):
χ² = ∑i [ (yi − ∑a αa fa(xi)) / σi ]² ,  (1/2) ∂χ²/∂αb = −∑i (yi − ∑a αa fa(xi)) fb(xi) / σi² = 0
In matrix notation Yb − ∑a αa Hab = 0 ,
where Yb = ∑i yi fb(xi)/σi²  and  Hab = (1/2) ∂²χ²/∂αa∂αb = ∑i fa(xi) fb(xi)/σi²
Thus Y = αH, and the solution is α = Y H⁻¹.
Correlated errors for y
How is the definition of χ² modified when the various yi are correlated?
Start with two uncorrelated variables x' and y', where x' ± σx' and y' ± σy' are the measured quantities and x0', y0' the predicted ones:
χ² = [(x' − x0')/σx']² + [(y' − y0')/σy']²
x' = x cosθ − y sinθ ,  y' = x sinθ + y cosθ
Error transformation:
σx'² = σx² cos²θ + σy² sin²θ − 2 cov(x, y) sinθ cosθ
σy'² = σx² sin²θ + σy² cos²θ + 2 cov(x, y) sinθ cosθ
cov(x', y') = (σx² − σy²) sinθ cosθ + cov(x, y)(cos²θ − sin²θ) = 0
(Numerator: collect separately the coefficients of the δx², δy² and δxδy terms and use cov(x', y') = 0. Denominator: collect separately the cov(x,y), cov²(x,y) and remaining terms and add them.)
χ² = [σy²(x − x0)² + σx²(y − y0)² − 2 cov(x, y)(x − x0)(y − y0)] / [σx²σy² − cov²(x, y)]
   = H11(x − x0)² + H22(y − y0)² + 2H12(x − x0)(y − y0)
H11, H12 and H22 are the elements of the inverse error matrix; in matrix notation
[[H11, H12],[H21, H22]] = [[σx², cov(x, y)],[cov(x, y), σy²]]⁻¹
χ² = ∑ij Δi Hij Δj = Δᵀ H Δ ,  where Δj = yj_obs − yj_th
e.g., a fit with two BW functions plus background: the area under the BW functions is a function of the fitted parameters M and Γ, which are correlated as an output of the fit. Use this to calculate the error on the yields.
Fit to points with errors on x and y
Instead of the simple χ², here minimise the ri, where
ri = (distance of a position on the line from the given data point) / (radius vector of the error ellipse in that direction)
ri_min corresponds to the point on the line which minimises the error-ellipse function
(x − xi)²/δxi² + (y − yi)²/δyi²
χ² = ∑i (ri_min)² = ∑i (yi − a − bxi)² / (b²δxi² + δyi²)
This is simply the difference di = yi − a − bxi with an error εd² = δyi² + b²δxi² (propagation of the error on x to an error on y).
(1/2) ∂χ²/∂a = −∑i (yi − a − bxi)/(b²δxi² + δyi²) = 0 ⇒ a = [∑i (yi − bxi)/(b²δxi² + δyi²)] / [∑i 1/(b²δxi² + δyi²)]
b can be obtained from the (numerical) minimisation of χ².
Comparison of various techniques of parameter fit
Table LL110
Upper and lower limit
P(x1 < x < x2; θ) = 1 − α = ∫x1..x2 f(x; θ)dx
Under fairly general conditions with the methods of
maximum-likelihood or chi-squares in the large sample limit,
the estimator will be distributed according to a multivariate
Gaussian centered about the true (unknown) value
1 − α = [1/(√(2π)σ)] ∫μ−δ..μ+δ exp[−(x − μ)²/(2σ²)] dx
δ     α(%)           α(%)    δ
1σ    31.73          20      1.28σ
2σ    4.55           10      1.64σ
3σ    0.27           5       1.96σ
4σ    6.3×10⁻³       1       2.58σ
5σ    5.7×10⁻⁵       0.1     3.29σ
6σ    2.0×10⁻⁷       0.01    3.89σ
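Both halves of this table can be reproduced with the error function (a minimal sketch, standard library only):

```python
import math

# alpha = 1 - P(|x - mu| < delta) for a Gaussian, using the error function
for k in (1, 2, 3, 4, 5, 6):
    alpha = 1.0 - math.erf(k / math.sqrt(2.0))
    print(f"delta = {k} sigma  ->  alpha = {100 * alpha:.2e} %")

# inverse direction: delta (in units of sigma) for a given alpha, by bisection
def delta_for_alpha(alpha, lo=0.0, hi=10.0):
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 - math.erf(mid / math.sqrt(2.0)) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for a in (0.20, 0.10, 0.05, 0.01, 0.001, 0.0001):
    print(f"alpha = {100 * a:g} %  ->  delta = {delta_for_alpha(a):.2f} sigma")
```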
Bayesian approach: P(θ|x) ∝ L(x|θ) π(θ), i.e. posterior ∝ likelihood × prior pdf
(there is debate on this issue)
Limits for low statistics (counting expt with/without background)
For a Poisson distribution, counting a certain number of events n: in a random repetition of the experiment with μ = Nup (Nlo), the probability is (1 − α) to observe more (fewer) than n events. One can also have lower and upper limits simultaneously.
In the presence of background b,
α = [∑n=0..nobs exp(−(Nup + b)) (Nup + b)^n / n!] / [∑n=0..nobs exp(−b) b^n / n!]
e.g. b = 0.4 gives Nup = 5.91.
Small numbers are always a tricky problem.
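A sketch of solving this defining relation for Nup numerically (standard library only; nobs, b and the confidence level below are illustrative choices, not values quoted on the slide):

```python
import math

def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson of mean mu."""
    return sum(math.exp(-mu) * mu**n / math.factorial(n) for n in range(n_obs + 1))

def upper_limit(n_obs, b, cl=0.90, hi=50.0):
    """Solve alpha = 1 - cl for N_up in the background formula above, by bisection."""
    target = 1.0 - cl
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        alpha = poisson_cdf(n_obs, mid + b) / poisson_cdf(n_obs, b)
        if alpha > target:      # too much probability below n_obs -> raise N_up
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# hypothetical example: 3 observed events, expected background 0.4, 90% CL
print("N_up =", upper_limit(3, 0.4, cl=0.90))
```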
Minimisation procedures
Because of the complex expression of the likelihood/χ² function, there is in general no analytical solution for the many experimental data points. There are mainly two different kinds of approach to treat this problem numerically: 1. grid/random search and 2. gradient search.
Grid search
• Grid: evaluate F(x) at x0, x0+∆x0, x0+2∆x0, … and look for the minimum of F(x). Crude mapping over a large distance, then finer binning. Only suitable for a finite search range and small dimension.
• Random search: instead of equally spaced points, generate points according to some function; better for a large range of parameter space and larger dimensions, but one may not reach the true minimum.
• Coordinate-variation method (single-parameter, or one-by-one variation): vary one parameter to get the minimum, then the next one, … For weakly correlated variables this converges quickly; for strongly correlated variables it needs a large number of steps and is unacceptably slow.
•Rosenbrock method: a single-parameter method, but it chooses the best direction after the first steps in each dimension. Efficiency decreases with the number of variables.
•Simplex method: take n+1 points, with F(Ph) = max(F(p1), F(p2), …) and F(Plo) = min(F(p1), F(p2), …), and
Ā = (1/n) [∑i=1..n+1 pi − Ph]
New line through Plo and Ā. Three operations can be used: reflection, contraction and expansion.
Reflect Ph about Ā: P* = (1+α)Ā − αPh
1. If F(P*) < F(Plo), a new minimum has been produced; try the expansion P** = γP* + (1−γ)Ā:
   if F(P**) < F(Plo), replace Ph by P** and restart;
   if F(P**) ≥ F(Plo), replace Ph by P* and restart.
2. If F(Plo) ≤ F(P*) ≤ F(Ph), set Ph = P*.
3. If F(P*) ≥ F(Ph), the reflection failed and P* is unacceptable. Take a new point P** between Ph and Ā by contraction, P** = βPh + (1−β)Ā with 0 < β < 1:
   if F(P**) < F(Ph), set Ph = P**;
   if F(P**) > F(Ph) and F(P*), the contraction failed: all Pi are replaced by ½(Pi + Plo) and the whole procedure restarts.
Gradient method : predict new points relatively far from the last point
•Steepest descent method: from P0, seek a minimum of the parameter space in the direction where the function decreases most rapidly:
ξi = (∂F/∂xi) / [∑j (∂F/∂xj)²]^(1/2) ,  i = 1, 2, …, n
In 2D this is the same as the one-by-one variation method, but with a rotation of the coordinate axes. It is very slow because of the complete interdependency in F and a choice of starting point far from the true one.
•Newton's method: second-degree Taylor expansion,
F(x) = F(x0) + (∂F/∂x)|x0 (x − x0) + (1/2)(∂²F/∂x²)|x0 (x − x0)² + …
F(x) ≈ F(x0) + gᵀ(x − x0) + (1/2)(x − x0)ᵀ G (x − x0) ,  gi = ∂F/∂xi ,  Gij = ∂²F/∂xi∂xj
Approximate the function around x0 by a quadratic surface and calculate the minimum of the n-dimensional parabola analytically:
xmin = x0 − G⁻¹ g
This is not the true minimum; form a new parabolic surface about xmin and calculate its minimum, and so on. It requires G to be positive definite everywhere; when it is negative it is artificially altered.
Disadvantage: evaluation of G and its inversion take large CPU time.
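A minimal sketch of a single Newton step (assuming NumPy), applied to the correlated quadratic from the earlier error-ellipse example; since the function is exactly quadratic, one step lands on the minimum:

```python
import numpy as np

# F(x, y) = (1/4)(13x^2 + 6*sqrt(3)*x*y + 7y^2)
def F(p):
    x, y = p
    return 0.25 * (13 * x**2 + 6 * np.sqrt(3) * x * y + 7 * y**2)

def grad(p):
    x, y = p
    return np.array([0.25 * (26 * x + 6 * np.sqrt(3) * y),
                     0.25 * (6 * np.sqrt(3) * x + 14 * y)])

G = 0.25 * np.array([[26, 6 * np.sqrt(3)],
                     [6 * np.sqrt(3), 14]])       # Hessian (constant here)

x0 = np.array([2.0, -1.0])                        # arbitrary starting point
x_min = x0 - np.linalg.solve(G, grad(x0))         # Newton step: x0 - G^-1 g
print("x_min =", x_min, " F(x_min) =", F(x_min))
```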
Local vs global minima
•The minimum obtained may be a local minimum and not the global one. Change the starting value to one far away from the initial one and look for a second minimum, …
•Errors: from the second derivative of the function at the minimum,
F(x) = F(x0) + (1/2)(∂²F/∂x²)(x − x0)²  and  σ² = [(1/2)(∂²F/∂x²)]⁻¹ ⇒ F(x0 + σ) = F(x0) + 1
For correlated variables calculate all the derivatives and invert that second-derivative matrix to obtain the errors.
Generation of random numbers
•They can take any value in a continuous range
•One cannot predict in advance which value will be taken
•The distribution of the variable may be well known
•The distribution of a random variable gives the probability of a given value:
g(u)du = P(u < u' < u + du)
•Integral (cumulative) distribution function: G(u) = ∫−∞..u g(u')du' → g(u) = dG(u)/du
•G(u) increases monotonically with u. The normalisation of g is determined by ∫−∞..+∞ g(u)du = 1
• Truly random numbers: a sequence of truly random numbers is completely unpredictable and hence irreproducible. They can be generated only through random physical processes, e.g. radioactive decay or the arrival time of cosmic muons.
• Pseudo-random numbers: a sequence generated according to a strict mathematical formula but indistinguishable from a truly random sequence. Mid-square method:
  • Start with a number of r digits
  • Take the middle r/2 digits: this is the first random number
  • Square that r/2-digit number and again
  • take its middle r/2 digits → the second random number, and so on.
• Linear congruential generator: build up a sequence ri = mod(a·ri−1 + b, m)
  • Choose a, b, m carefully
  • The lower-order bits are much less random than the higher-order bits; e.g., for random numbers in the range 1-10,
  • J = 1 + int(10*r1) is better compared with
  • J = 1 + mod(100000*r1, 10)
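A minimal linear congruential generator sketch (the constants a = 16807, m = 2³¹ − 1, b = 0 are the classic "minimal standard" choice, used here purely as an illustration), together with the two mappings to 1-10 mentioned above:

```python
def lcg(seed, a=16807, b=0, m=2**31 - 1):
    r = seed
    while True:
        r = (a * r + b) % m
        yield r / m                  # uniform in (0, 1)

gen = lcg(12345)
u = [next(gen) for _ in range(5)]

# mapping to integers 1..10: use the leading part of u, not its low-order digits
good = [1 + int(10 * x) for x in u]            # J = 1 + int(10*r)
bad  = [1 + int(100000 * x) % 10 for x in u]   # J = 1 + mod(100000*r, 10)
print(u)
print("from high-order part :", good)
print("from low-order digits:", bad)
```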
Transformation method to generate other distributions
Suppose one has a random number generator producing x uniformly in the range 0 ≤ x ≤ 1, which implies the probability of generating a number between x and x+dx is
g(x)dx = dx for 0 ≤ x ≤ 1, and 0 for x < 0, x > 1,
and the probability distribution function is normalised: ∫−∞..∞ g(x)dx = 1.
Now generate a random number y with a given probability density function f(y), with ∫−∞..∞ f(y)dy = 1.
Transformation law of probability: |f(y)dy| = |g(x)dx|, or f(y) = g(x)|dx/dy|.
So for a uniform random distribution [0,1], g(x) = 1, which implies dx = f(y)dy ⇒ x = ∫f(y)dy + c.
Inverting this gives y = y(x).
Simply: a function f(x) ranging over 0 to 1 maps to another function f(y) ranging over 0 to 2 such that the cumulative probabilities match, ∫0..0.5 f(x)dx = ∫0..1 f(y)dy (another example: changing the x-coordinate from cm to metre scales x by 100 but f(x) by 1/100).
Generation of some simple distributions
•Isotropic angular distribution: f(φ) = K, x = ∫ K dφ = Kφ + c. At φ = 0, x = 0 and at φ = 2π, x = 1, thus K = 1/2π and φ = 2πx.
•Exponential decay or absorption: f(τ) = (1/τ0)exp(−τ/τ0)
x = ∫0..τ (1/τ0)exp(−τ'/τ0)dτ' = 1 − exp(−τ/τ0) ⇒ τ = −τ0 ln(1−x) = −τ0 ln(x)
•Probability distribution that is continuous but not conventionally integrable analytically:
F(cosθ) = ∫−1..cosθ f(cosθ')d(cosθ') = x
Integrate numerically and normalise so that F(1) = 1.
Construct a table of cosθn with index n (n = 0, 1, 2, …, N) where F(cosθn) = n/N; the table has N+1 entries.
Generate x and multiply x by N:
α = integer part of Nx ,  β = fractional part of Nx
cosθ(x) = cosθα + β(cosθα+1 − cosθα)
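A sketch of both techniques (assuming NumPy): analytic inversion for the exponential, and a numerically tabulated and interpolated cumulative distribution for a pdf chosen here only as an illustration, f(cosθ) ∝ 1 + cos²θ on [−1, 1].

```python
import numpy as np

rng = np.random.default_rng(8)

# exponential decay by inversion: tau = -tau0 * ln(x)
tau0 = 3.0
tau = -tau0 * np.log(rng.random(100_000))
print("mean of generated tau:", tau.mean(), "(expect", tau0, ")")

# table method for the illustrative pdf f(c) = (3/8)(1 + c^2) on [-1, 1]
N = 1000
grid = np.linspace(-1.0, 1.0, N + 1)
cdf = np.cumsum(0.375 * (1.0 + grid**2))
cdf -= cdf[0]
cdf /= cdf[-1]                                   # normalise so F(1) = 1

u = rng.random(100_000)
cos_theta = np.interp(u, cdf, grid)              # invert the tabulated F
print("<cos^2 theta> of sample:", (cos_theta**2).mean(), "(expect 0.4)")
```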
Normal/Gaussian distribution
First consider the probability distribution function
fR(r) = r exp(−r²/2) ,  0 < r < ∞   (1)
and its cumulative distribution function
FR(r) = ∫0..r u exp(−u²/2)du = 1 − exp(−r²/2) = ξ
If a random variable is distributed according to (1), then
r = [−2 ln(1 − ξ)]^(1/2) = [−2 ln(ξ)]^(1/2)
Now consider the Gaussian distribution with mean zero and σ = 1:
φ'(y|0,1) = (1/√(2π)) exp(−y²/2) ,  −∞ < y < ∞
In fact it is easier to sample two independent Gaussian random variables together:
f(x, y) = φ'(x|0,1) φ'(y|0,1) = (1/2π) exp[−(x² + y²)/2]
Transforming to independent polar coordinates r, φ:
φ'(x)φ'(y) dx dy = (1/2π) exp(−r²/2) r dr dφ
Here φ is uniformly distributed between 0 and 2π and may be sampled by φ = 2πξ2.
The pdf of r is r exp(−r²/2), hence r = [−2 ln ξ1]^(1/2), and thus
x = [−2 ln ξ1]^(1/2) cos(2πξ2) ;  y = [−2 ln ξ1]^(1/2) sin(2πξ2)
Use of the central limit theorem:
the sum of a large number of random variables will itself be a normally distributed random variable.
The deviation of the mean of N (uniformly distributed) random numbers has σm² = 1/(12N), σm = 1/√(12N).
Hence the variable
y = √(12N) [ (1/N) ∑i=1..N xi − 1/2 ] = √(12/N) [ ∑i=1..N xi − N/2 ]
will have mean 0 and σ = 1 and will be normally distributed.
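A minimal sketch of both recipes (assuming NumPy), checking that the generated numbers have mean 0 and unit standard deviation:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000

# Box-Muller: two uniforms -> two independent standard Gaussians
xi1 = rng.random(n)
xi2 = rng.random(n)
r = np.sqrt(-2.0 * np.log(xi1))
x = r * np.cos(2 * np.pi * xi2)
y = r * np.sin(2 * np.pi * xi2)
print("Box-Muller:  mean =", x.mean(), " std =", x.std())

# central-limit alternative: sum of N = 12 uniforms, shifted and scaled
N = 12
z = np.sqrt(12.0 / N) * (rng.random((n, N)).sum(axis=1) - N / 2.0)
print("CLT method:  mean =", z.mean(), " std =", z.std())
```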
Rejection and acceptance
• A very general technique for sampling any probability distribution
• Neither the normalisation of the density nor any numerical/analytic integration needs to be known explicitly to carry out the sampling
• But it is sometimes too time consuming
•Take two random numbers, x1 in [0,1] and x2 in [0,1]
•Calculate f(x1)
•If x2·Pmax < f(x1), accept x1; else reject
•Go to the next event ……
• Generate the distribution f(x) = (4/π)·1/(1 + x²) ,  0 < x < 1
• Transformation method:
y = ∫0..x (4/π)/(1 + x'²) dx' = (4/π) tan⁻¹(x) ⇒ x = tan(πy/4)
• Rejection and acceptance method: Pmax = 4/π
Verify these two methods (f(x) falls from 4/π at x = 0 to 2/π at x = 1).
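A sketch verifying the two methods for this f(x) (assuming NumPy; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(10)
f = lambda x: (4.0 / np.pi) / (1.0 + x**2)     # target pdf on 0 < x < 1
p_max = 4.0 / np.pi

# rejection and acceptance
n_try = 200_000
x1 = rng.random(n_try)
x2 = rng.random(n_try)
accepted = x1[x2 * p_max < f(x1)]
print("acceptance efficiency:", accepted.size / n_try)   # ~ pi/4

# transformation method: x = tan(pi*y/4) with y uniform in (0,1)
y = rng.random(accepted.size)
x_transform = np.tan(np.pi * y / 4.0)

print("mean (rejection)     :", accepted.mean())
print("mean (transformation):", x_transform.mean())
```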
Special functions – discrete numbers
Binomial distribution: p(r; n, p) = [n!/((n−r)! r!)] p^r (1−p)^(n−r)
•k = 0, generate u uniform in (0,1)
•Pk = (1−p)^n, B = Pk
•If u < B, accept r = k
•Else Pk = Pk·[p/(1−p)]·(n−k)/(k+1), B = B + Pk, k = k+1, and go to the previous step
Poisson distribution: P(n; λ) = e^(−λ) λ^n / n!
• k = 1, A = 1; generate u uniform in (0,1)
• A = uA; if A < exp(−λ), accept n = k−1
• Else k = k+1, generate a new u and go to the previous step
Or, like the binomial case:
• k = 0, generate u uniform in (0,1)
• A = exp(−λ)
• If u < A, accept n = k
• Else k = k+1, A = A + exp(−λ)λ^k/k!, and go to the previous step
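A minimal implementation of the first (multiplication-based) Poisson recipe above, using only the standard library, with a quick check that mean ≈ variance ≈ λ:

```python
import math
import random

def poisson_multiplication(lam, rng=random):
    """Poisson generator by the multiplication method described above."""
    k, a = 1, 1.0
    limit = math.exp(-lam)
    while True:
        a *= rng.random()
        if a < limit:
            return k - 1
        k += 1

random.seed(11)
sample = [poisson_multiplication(3.5) for _ in range(100_000)]
mean = sum(sample) / len(sample)
var = sum((s - mean) ** 2 for s in sample) / len(sample)
print("mean:", mean, " variance:", var, " (expect both ~3.5)")
```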
Note on the least/chi-square method
• Choose a suitable bin size; the bin size may vary depending on the statistics. Merge bin contents so as to have at least 5 entries in a bin.
Recall (error on the mean): σ²(x̄) = (1/n²) E[(∑i (xi − μ))²] = (1/n²) ∑i E[(xi − μ)²] = σ²/n, since the cross terms vanish for uncorrelated measurements; equivalently x̄ = (1/n)∑xi ⇒ σ²(x̄) = (1/n²)∑σi² = σ²/n.
Error on the standard deviation:
∂ℓ/∂σ = −∑i 1/σ + ∑i (xi − μ)²/σ³
∂²ℓ/∂σ² = ∑i 1/σ² − 3 ∑i (xi − μ)²/σ⁴ = n/σ² − 3nσ²/σ⁴ = −2n/σ²
⇒ σ(σ) = σ/√(2n), usually quoted as σ/√(2(n−1)) when μ is also estimated from the same data.