What is probability?
In many experiments, even if the conditions are kept the same, repeated experiments give different results. The result of an individual measurement is unpredictable, but the possible results of a series of measurements have a well-defined distribution, provided
1. the events are completely uncorrelated (statistically independent), and
2. the number of trials is large (law of large numbers).
If Ni = number of events observed in class i out of a total of N, then pi = probability of getting an event in class i = lim(N→∞) Ni/N.
For a continuously varying variable x, N f(xi)dxi = number of events observed in the interval [xi, xi+dxi] out of a total of N events. The element of probability is dp = f(x)dx, where f(x) is the probability density function, normalised within the permissible range: ∫[xlow, xhigh] f(x)dx = 1.
Throwing a die: the probability of any one score is 1/6; in a large number of throws each score appears in about 1/6 of them.
Emission of scintillation light: emitted isotropically. If the scintillator is blackened on all sides except a solid angle dΩ, a fraction dΩ/4π of the emitted photons escapes.
More common in a physics experiment is the determination of some parameter from measurements, e.g. the angular distribution in µ-decay, f(cosθ) = ½(1 + α cosθ). The experiment measures α → α* ± δα.
• Crudely: the probability that the true value of α lies in the range [α*−δα, α*+δα] is 68.3% (inverse probability).
• Strictly: the probability that the true value of α lies between α*−δα and α*+δα is 0 or 1 (direct probability).
What is the length of this table? Ans: 10. In which unit? What is its error? What is the probability that the result of a repeated measurement lies within this error?

Probability distributions
• Binomial: f(r; N, p) = N!/[(N−r)! r!] p^r (1−p)^(N−r)
• Poisson: f(n; λ) = e^(−λ) λ^n / n!
• Normal (Gaussian): f(x; µ, σ²) = 1/(√(2π) σ) exp[−(x−µ)²/(2σ²)]
• Prove the limits: Binomial → Poisson for N→∞, p→0 with Np = λ fixed; Poisson → Gaussian for λ→∞.
• Calculate the mean and variance of all these distributions.

Basic rules
• P(A+B) ≤ P(A) + P(B)
• P(AB) = P(A|B)P(B) = P(B|A)P(A)
• P(A|B) = P(A) if the occurrence of B does not affect whether or not A occurs: A and B are said to be independent; in that case P(AB) = P(A)P(B).
Splitting a Poisson process: if n events follow a Poisson distribution with mean λ, and each event is independently "forward" (F) with probability f or "backward" (B) with probability 1−f (n = F + B), then
  P = P_Poisson × P_binomial = [e^(−λ) λ^n / n!] × [n!/(F! B!)] f^F (1−f)^B
    = [e^(−λf) (λf)^F / F!] × [e^(−λ(1−f)) (λ(1−f))^B / B!],
i.e. the product of two independent Poisson distributions with means λf and λ(1−f).
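As a minimal numerical check of the two limits and of the Poisson-splitting identity above (not part of the original slides), the following sketch uses only the Python standard library; the values of N, p, λ and f are arbitrary illustration choices.

```python
import math

def binom(r, N, p):
    """Binomial probability f(r; N, p)."""
    return math.comb(N, r) * p**r * (1 - p)**(N - r)

def poisson(n, lam):
    """Poisson probability f(n; lambda)."""
    return math.exp(-lam) * lam**n / math.factorial(n)

def gauss(x, mu, sigma):
    """Gaussian density f(x; mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma)**2) / (math.sqrt(2 * math.pi) * sigma)

# Binomial -> Poisson: N large, p small, N*p = lambda fixed (here lambda = 3)
N, p = 2000, 0.0015
for r in range(7):
    print(r, binom(r, N, p), poisson(r, N * p))

# Poisson -> Gaussian: lambda large (mean = lambda, variance = lambda)
lam = 100.0
for n in (80, 90, 100, 110, 120):
    print(n, poisson(n, lam), gauss(n, lam, math.sqrt(lam)))

# Splitting a Poisson: Pois(n; lam) * Binom(F; n, f) = Pois(F; lam*f) * Pois(B; lam*(1-f))
lam, f, F, B = 5.0, 0.3, 2, 4
n = F + B
lhs = poisson(n, lam) * binom(F, n, f)
rhs = poisson(F, lam * f) * poisson(B, lam * (1 - f))
print(lhs, rhs)   # identical up to rounding
```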
Probability density function
Probability density function P(x): ∫P(x)dx = 1, or for a discrete variable ∑P(x) = 1.
First moment: µ = E[x] = ∫ x P(x)dx.
Second central moment (variance):
  σ² = E[(x−µ)²] = ∫(x−µ)² P(x)dx = ∫(x² − 2xµ + µ²) P(x)dx
     = ∫x² P(x)dx − 2µ∫x P(x)dx + µ²∫P(x)dx = ∫x² P(x)dx − µ²
     = E[x²] − (E[x])² = <x²> − <x>².

Variance of the measured mean <x>
Let x1, x2, ..., xn be a sample of size n from a distribution whose mean is µ and variance σ².
  x̄ = (1/n)∑xi;  in the limit n→∞, µ = lim(n→∞) (1/n)∑xi
  Simple variance: s² = (1/n)∑(xi − x̄)²
The error on an individual measurement, σ²(xi) = E[(xi−µ)²] = ∫(xi−µ)² P(x)dx, is different from the error on the mean value x̄, σ²(x̄) = E[(x̄−µ)²]:
  σ²(x̄) = E[((1/n)∑xi − µ)²] = (1/n²) E[(∑xi − nµ)²] = (1/n²) E[(∑(xi−µ))²]
  (∑(xi−µ))² = ∑(xi−µ)² + ∑∑(j≠i) (xi−µ)(xj−µ)
The cross terms vanish because there are no correlations, so
  σ²(x̄) = (1/n²) ∑ E[(xi−µ)²] = nσ²/n² = σ²/n.
Simply: x̄ = (1/n)∑xi ⇒ δx̄ = (1/n)∑δxi ⇒ σ²(x̄) = (1/n²)∑σi² = σ²/n.

Unbiased estimate of the variance
The simple variance s² = (1/n)∑(xi − x̄)² is not the true variance σ² of the distribution, because x̄ ≠ µ: it turns out to be a biased estimator. Use instead
  s² = [1/(n−1)] ∑(xi − x̄)² = [1/(n−1)] ∑{(xi−µ) − (x̄−µ)}² = [1/(n−1)] ∑{(xi−µ) − (1/n)∑j(xj−µ)}²
Expanding and using the fact that the different xi are independent (cross terms average to zero),
  E(s²) = [1/(n−1)] ∑ E[(xi−µ)²] − [1/(n(n−1))] ∑ E[(xj−µ)²] = [n/(n−1)]σ² − [1/(n−1)]σ² = σ².
Useful relations: E[∑(xi−µ)²] = nσ²;  E[∑(xi−x̄)²] = (n−1)σ²;  E[∑(x̄−µ)²] = σ².

Central limit theorem
If x1, x2, ..., xn is a set of n independent random variables, each xi distributed with mean µi and finite variance σi², then the variable
  z = (∑xi − ∑µi) / √(∑σi²)    (for identical distributions, z = ∑(xi − µ)/(σ√n))
has a limiting distribution which is N(0,1).
Construct this standardised variable and establish its characteristic function φ(t); by definition φ(t) is the Fourier transform of f(x):
  φ(t) = ∫ e^(itx) f(x)dx = E[e^(itx)] = 1 + (it)µ + (1/2!)(it)²(σ² + µ²) + ...
Since the xi, and hence the (xi−µ)/(σ√n), are by assumption independent, the characteristic function of z is the product of the n characteristic functions of these variables:
  φz(t) = { e^(−itµ/(σ√n)) E[e^(itx/(σ√n))] }^n
  ln φz(t) = n { ln[1 + (it/(σ√n))µ + (1/2!)(it/(σ√n))²(σ²+µ²) + ...] − itµ/(σ√n) }
           = −t²/2 + O(n^(−1/2))
Thus, when n→∞, φz(t) → exp(−t²/2), the characteristic function of a standard Gaussian.
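A quick Monte Carlo check of the three statements above (σ²(x̄) = σ²/n, the 1/(n−1) estimator being unbiased, and the central limit theorem), assuming numpy is available; the uniform distribution and sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 200_000

# sample from a uniform distribution: mu = 0.5, sigma^2 = 1/12
x = rng.random((trials, n))
xbar = x.mean(axis=1)

print("Var(xbar):", xbar.var(), " expected sigma^2/n =", (1/12) / n)
print("E[s^2] with 1/n     :", ((x - xbar[:, None])**2).sum(axis=1).mean() / n,
      " (biased, expect (n-1)/n * 1/12 =", (n - 1) / n / 12, ")")
print("E[s^2] with 1/(n-1) :", ((x - xbar[:, None])**2).sum(axis=1).mean() / (n - 1),
      " (unbiased, expect 1/12 =", 1/12, ")")

# central limit theorem: z = (sum(x) - n*mu) / (sigma*sqrt(n)) ~ N(0,1)
z = (x.sum(axis=1) - n * 0.5) / np.sqrt(n / 12)
print("mean(z), var(z):", z.mean(), z.var())
print("P(|z| < 1):", np.mean(np.abs(z) < 1), " (expect ~0.683)")
```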
Likelihood function
N independent observations x1, x2, ..., xN from a theoretical distribution f(x|θ), where θ is the parameter to be estimated:
  L(θ|x) = f(x1|θ) f(x2|θ) ... f(xN|θ)
Each dpj = f(xj|θ)dx has the character of an "a posteriori" probability: how probable was it to find just this result. For N independent measurements dP = ∏ f(xj|θ)dx. ("Likelihood" is reserved for a posteriori probabilities.)
The estimate θ̂ can be found by solving dL/dθ = 0, or, as is done in general, d(ln L)/dθ = 0.
Variance: σ²(θ) = ∫(θ − θ̄)² L(θ|x) dx1 dx2 ... dxN.
An approximate method in the limit of large numbers:
  σ²(θ) = −[d² ln L/dθ²]^(−1);  for several parameters, Uij = −∂² ln L/∂θi∂θj and σ²(θi) = (U⁻¹)ii.

Mean and error of a Gaussian through the likelihood function
  L = ∏i 1/(√(2π)σi) exp[−(xi−µ)²/(2σi²)];  ℓ = ln L = −∑ ln σi − ∑ (xi−µ)²/(2σi²) + const
  ∂ℓ/∂µ = ∑ (xi−µ)/σi²;   ∂²ℓ/∂µ² = −∑ 1/σi²;   ∂ℓ/∂σi² = −1/(2σi²) + (xi−µ)²/(2σi⁴)
Weighted mean: µ̂ = [∑ xi/σi²] / [∑ 1/σi²].
Error on the weighted mean: σ²(µ̂) = 1 / ∑(1/σi²).

Addition of two variables and its error
If a = b + c, what is the error on a in terms of σb and σc? Answer: σa² = σb² + σc² in the case where b and c are uncorrelated. In a series of measurements we do not know δb or δc; the only known parameters are their mean square values:
  σa² = <(a−ā)²> = <[(b+c) − (b̄+c̄)]²> = <(b−b̄)²> + <(c−c̄)²> + 2<(b−b̄)(c−c̄)> = σb² + σc² + 2cov(b,c).
With large statistics, the distributions met in experiments are (to a good approximation) Gaussian; the Gaussian properties are well known, and one wants to use them in all measurements. Full Width at Half Maximum (FWHM) = 2.35σ for a Gaussian: exp(−x²/2) = 1/2 ⇒ x = √(2 ln 2) = 1.177, so FWHM = 2 × 1.177σ.

Gaussian distribution in two variables
Probability distributions of two variables x and y:
  P(x) = 1/(√(2π)σx) exp(−x²/(2σx²))  and  P(y) = 1/(√(2π)σy) exp(−y²/(2σy²))
If x and y are independent, the combined probability is
  P(x,y) = P(x)P(y) = 1/(2πσxσy) exp[−(x²/(2σx²) + y²/(2σy²))]
The probability is reduced by a factor √e (corresponding to x = ±σx in the 1D case) on the ellipse
  x²/σx² + y²/σy² = 1, in matrix notation (x y) [[1/σx², 0],[0, 1/σy²]] (x y)ᵀ = 1.
The inverse of this 2×2 matrix, [[σx², 0],[0, σy²]], is known as the error matrix for x and y. In general, for a set of variables x1, x2, ..., xn the elements of the error matrix are Mij = <(xi − <xi>)(xj − <xj>)>; M is symmetric and the off-diagonal terms are cov(xi, xj).
A specific example: σx = √2/4 = 0.354, σy = √2/2 = 0.707, so that P(x,y) = P(0,0)/√e when 8x² + 2y² = 1.
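A minimal sketch of the weighted mean and its error as derived above, in plain Python; the measurement values and errors are invented for illustration.

```python
import math

# hypothetical measurements of the same quantity with different errors
x     = [10.2, 9.8, 10.5, 10.1]
sigma = [0.3, 0.2, 0.5, 0.4]

w = [1.0 / s**2 for s in sigma]                      # weights 1/sigma_i^2
mu_hat   = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
sigma_mu = math.sqrt(1.0 / sum(w))                   # error on the weighted mean

print(f"weighted mean = {mu_hat:.3f} +/- {sigma_mu:.3f}")
```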
Correlated variables
Rotate the axes clockwise by an angle θ (e.g. 30°); the variables x' and y' with respect to the new coordinates are
  x' = x cosθ − y sinθ,  y' = x sinθ + y cosθ.
The equation of the ellipse P(x',y') = P(0,0)/√e becomes
  ½ [13x'² + 6√3 x'y' + 7y'²] = 1,  in matrix notation (x' y') [[13/2, 3√3/2],[3√3/2, 7/2]] (x' y')ᵀ = 1.
The error matrix is the inverse of this matrix:
  (1/32) [[7, −3√3],[−3√3, 13]]
  σx'² = 7/32 = (0.468)²,  σy'² = 13/32 = (0.637)²,  cov(x',y') = −3√3/32 = −(0.403)²
compared with σx² = (0.354)², σy² = (0.707)², cov(x,y) = 0 in the original frame.
• Semi-axes (0.354, 0.707) = square roots of the eigenvalues of the error matrix.
• Maximum extent in x' (0.468) = square root of the diagonal element of the error matrix.
• Intercept on the x'-axis (√(2/13) = 0.392) = 1/√(diagonal element of the inverse error matrix).
General form of the Gaussian distribution of two variables:
  P(x,y) = 1/(2πσxσy√(1−ρ²)) exp{ −1/[2(1−ρ²)] [x²/σx² + y²/σy² − 2ρxy/(σxσy)] }
  ρ = cov(x,y)/(σxσy), |ρ| ≤ 1; ρ is the correlation coefficient for x and y.
The probability distribution of y for a given value a of x (the conditional probability P(y|a)) is also Gaussian, with mean ȳ + ρ(σy/σx)(a − x̄) and variance σy²(1−ρ²).
Generalised form for K variables:
  P = 1/[(2π)^(K/2) √|M|] exp(−½ xᵀ M⁻¹ x)
For two variables, M = [[σx², cov(x,y)],[cov(x,y), σy²]] and M⁻¹ = 1/(1−ρ²) [[1/σx², −ρ/(σxσy)],[−ρ/(σxσy), 1/σy²]].

Propagation of error and error matrix
pi = pi(xa, xb): what is the error on pi in terms of the errors on xa and xb?
  δpi = (∂pi/∂xa)δxa + (∂pi/∂xb)δxb
  <(pi − p̄i)²> = δpi² = (∂pi/∂xa)² δxa² + (∂pi/∂xb)² δxb² + 2(∂pi/∂xa)(∂pi/∂xb) δxaδxb
In matrix notation σp² = Dᵀ M D, where D is the column vector of derivatives (∂pi/∂xa, ∂pi/∂xb)ᵀ and M = [[δxa², δxaδxb],[δxaδxb, δxb²]].
Propagation of the error matrix from (xa, xb) to (pi, pj), with pj = pj(xa, xb) and δpj = (∂pj/∂xa)δxa + (∂pj/∂xb)δxb:
  δpiδpj = (∂pi/∂xa)(∂pj/∂xa) δxa² + (∂pi/∂xb)(∂pj/∂xb) δxb² + [(∂pi/∂xa)(∂pj/∂xb) + (∂pi/∂xb)(∂pj/∂xa)] δxaδxb
In matrix notation M' = Tᵀ M T, where M' = new error matrix, M = old error matrix, and T = transformation matrix of derivatives, Tai = ∂pi/∂xa.
Function of the changed variables: y = y(pi, pj), pi = pi(xa, xb), pj = pj(xa, xb) ⇒ σy² = Dᵀ Tᵀ M T D.  (Assignment 3.3)
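A small numpy check of M' = Tᵀ M T for the 30° rotation example above; it reproduces the 7/32, 13/32 and −3√3/32 elements quoted on the slide.

```python
import numpy as np

# error matrix of the uncorrelated variables (x, y): sigma_x^2 = 1/8, sigma_y^2 = 1/2
M = np.diag([1/8, 1/2])

# clockwise rotation by 30 degrees: x' = x cos(t) - y sin(t), y' = x sin(t) + y cos(t)
t = np.radians(30)
c, s = np.cos(t), np.sin(t)
# T_ai = d p_i / d x_a with p = (x', y') and x = (x, y)
T = np.array([[ c, s],
              [-s, c]])

Mprime = T.T @ M @ T              # propagated error matrix M' = T^T M T
print(Mprime)                     # [[7/32, -3*sqrt(3)/32], [-3*sqrt(3)/32, 13/32]]
print(np.sqrt(np.diag(Mprime)))   # 0.468, 0.637
print(np.linalg.inv(Mprime))      # ellipse matrix [[13/2, 3*sqrt(3)/2], [3*sqrt(3)/2, 7/2]]
```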
Simple examples of propagation of error
Example 1: forward-backward asymmetry of an angular distribution, A = (F − B)/(F + B) = (F − B)/N.
Case A: F and B follow independent Poisson distributions.
  ∂A/∂F = 2B/N²,  ∂A/∂B = −2F/N²
  δA² = (∂A/∂F  ∂A/∂B) [[σF², 0],[0, σB²]] (∂A/∂F  ∂A/∂B)ᵀ = (2B/N²)² F + (2F/N²)² B = 4FB/N³
Case B: N is given, and F and B follow a binomial distribution with probability p of being produced in the forward hemisphere. F and B are then completely anti-correlated:
  σF² = σB² = −cov(F,B) = Np(1−p) ≈ FB/N,  and ∂A/∂F = −∂A/∂B = 1/N, so
  σA² = (1/N  −1/N) [[σF², −σF²],[−σF², σF²]] (1/N  −1/N)ᵀ = 4σF²/N² = 4FB/N³.
Example 2: a and b are uncorrelated with errors σa² and σb², and x = ½(a+b), y = ½(a−b). Then
  [[σx², cov(x,y)],[cov(x,y), σy²]] = [[½, ½],[½, −½]] [[σa², 0],[0, σb²]] [[½, ½],[½, −½]]ᵀ
                                    = ¼ [[σa²+σb², σa²−σb²],[σa²−σb², σa²+σb²]].
If we now go back to the errors on a and b but leave out the covariance term,
  [[σa², cov(a,b)],[cov(a,b), σb²]] → [[(σa²+σb²)/2, 0],[0, (σa²+σb²)/2]]
the errors come out the same for a and b, and different from the true errors σa² and σb²: the covariance term must be kept.

Combination of different errors
• Statistical error (known exactly for a specific distribution).
• Systematic error (very difficult to formulate as a statistical one); it depends on the situation of the experiment.
Example: threshold peak of D⁰D̄⁰π⁰ in B± → D⁰D̄⁰π⁰K±, with D⁰ → K−π+:
  Br(B± → D⁰D̄⁰π⁰K±) = Nobs / [N(B±) ε Brd(D⁰ → K−π+)²]
  ln Br = ln Nobs − ln N(B±) − ln ε − 2 ln Brd
  (σBr/Br)² = (σNobs/Nobs)² + (σNB±/NB±)² + (σε/ε)² + 4(σBrd/Brd)²
The first term is the simple statistical error (uncorrelated). Typical sources of systematic error: signal efficiency, background subtraction, D⁰ branching fraction, track-finding efficiency, K±/π± identification efficiency, π⁰ detection efficiency, D⁰ resolution in data vs MC, poorly reconstructed π⁰, efficiency dependence on the decay kinematics.
Muon counts in two different telescopes: N1 ± √N1 ± αN1 and N2 ± √N2 ± αN2. The second error is totally correlated (common threshold Vth, τ, etc.), so the error matrix is
  [[N1 + α²N1², α²N1N2],[α²N1N2, N2 + α²N2²]].

Assignment
Given
  δr1² = σ1²(4cos²θ1 + sin²θ1);  δθ1² = σ1²(4sin²θ1 + cos²θ1)/r1²;  δr1δθ1 = −3σ1² sinθ1 cosθ1 / r1 = δθ1δr1
  δr2² = σ2²(4cos²θ2 + sin²θ2);  δθ2² = σ2²(4sin²θ2 + cos²θ2)/r2²;  δr2δθ2 = 3σ2² sinθ2 cosθ2 / r2 = δθ2δr2
• Obtain the error matrix for the variables x1, y1, x2, y2.
• Calculate the error on the function f = x1x2 + y1y2.
• Use the original error matrix for the variables r1, θ1, r2, θ2 to obtain the error on the function r1r2cos(θ1−θ2). Compare this error with the previous one.

Fitting technique: moments
e.g. dn/dcosθ = a + b' cos²θ = N(1 + b cos²θ)
  ⇒ <cos²θ> = [1/3 + b/5] / [1 + b/3]  ⇒  b = 5(3<cos²θ> − 1) / (3 − 5<cos²θ>)
Calculate b by averaging cos²θ over the events: <cos²θ> = (1/n)∑cos²θi. The error follows from the error on <cos²θ>:
  σ²(<cos²θ>) = (1/n) · [1/(n−1)] ∑(cos²θi − <cos²θ>)²
Calculate it from the data set (values of cosθ):
  {−0.9, −0.85, −0.81, −0.7, −0.55, −0.34, −0.2, −0.05, 0.1, 0.25, 0.30, 0.50, 0.55, 0.70, 0.9, 0.99}
(A numerical evaluation is sketched below.)
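A minimal sketch of this moments evaluation on the slide's data set, in plain Python; the derivative db/d<cos²θ> = 20/(3 − 5<cos²θ>)² used to propagate the error is worked out here rather than taken from the slides.

```python
import math

cos_theta = [-0.9, -0.85, -0.81, -0.7, -0.55, -0.34, -0.2, -0.05,
             0.1, 0.25, 0.30, 0.50, 0.55, 0.70, 0.9, 0.99]

n  = len(cos_theta)
c2 = [c * c for c in cos_theta]
m  = sum(c2) / n                                      # <cos^2 theta>
var_m = sum((v - m)**2 for v in c2) / (n - 1) / n     # sigma^2(<cos^2 theta>)

b  = 5 * (3*m - 1) / (3 - 5*m)
db = 20 / (3 - 5*m)**2 * math.sqrt(var_m)             # error propagated to b

print(f"<cos^2 theta> = {m:.3f} +/- {math.sqrt(var_m):.3f}")
print(f"b = {b:.2f} +/- {db:.2f}")
```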
Moments: pros and cons
• Simplest; no optimisation needed.
• No binning.
• Extendable to several parameters or variables, using several possible moments, e.g. dn/dcosθ = N(1 + a cosθ + b cos²θ).
• Constraints among parameters are not easily incorporated.
• Parameters can come out unphysical:
  <cos²θ> = 0 ⇒ b = −5/3;  <cos²θ> = 1 ⇒ b = −5;  <cos²θ> = 3/5 + ε ⇒ b → ±∞.
• No check of the goodness of fit, e.g. if the true distribution is actually sin²θcos²θ.
• Useful to get a starting value for more sophisticated methods.

Maximum likelihood method
Angular distribution: y = dn/dcosθ = N(1 + b cos²θ), N = normalisation factor,
  ∫[−1,1] y dcosθ = 1 ⇒ N = 1/[2(1 + b/3)].
Normalised, y behaves as a probability distribution. The normalisation is essential because it depends on the parameter b; the factor ½ is not crucial (just a scale), but it is needed for the error estimation.
For the i-th event, yi = N(1 + b cos²θi) is the probability density for observing that event (i.e. that θi) for a given value of b. Define the likelihood ℒ as the product of the yi over all events (also called the joint probability function):
  ℒ(b|θ) = ∏ yi, the probability of observing the given set of θi for that b.
For the true value of b, ℒ(b|θ) is (on average) largest; inversely, maximise ℒ as a function of b to find the best value of b. The normalisation ∫ y(θ|b) dcosθ = 1 must hold for every value of b: without the normalisation factor N one could make the yi larger simply by increasing b, and ℒ would not have any absolute maximum. Similarly for a lifetime fit to t1, t2, ..., tn with P(t|τ) = (1/τ)exp(−t/τ): if one misses the 1/τ factor, the likelihood has no maximum and the fit is incorrect.

Likelihood function for a multiparameter fit
A case of a Breit-Wigner function, with two free parameters (M0 and Γ) in the probability density:
  yi(M0, Γ) = (1/2π) Γ / [(mi − M0)² + (Γ/2)²]
  ℒ(M0, Γ|mi) = ∏ yi(M0, Γ)
The maximum of ℒ gives the best values of M0 and Γ. If one of the two parameters is known, ℒ(M0|mi) = ∏ yi(M0; Γ fixed) is a one-parameter function (and similarly for Γ with M0 fixed).

Properties of the likelihood function
∏yi is a very small number for a large number of measurements N, difficult to handle numerically; it is conventional to consider ℓ = ln ℒ = ∑ ln yi.
For large N the likelihood function tends to a Gaussian, at least in the vicinity of the maximum:
  ℓ = ℓmax + (1/2!)(∂²ℓ/∂p²)max (δp)² + ... = ℓmax − (δp)²/(2c), with 1/c = −∂²ℓ/∂p²
  ⇒ ℒ ∝ exp[−(p − p0)²/(2c)].
Calculation of p (the most probable value) is simple, but what is its error?
• The RMS deviation of the ℒ distribution about its mean.
• (−∂²ℓ/∂p²)^(−1/2).
• The change in p required to reduce ℓ from its maximum value by 0.5, i.e. ℓ(p0 ± δp) = ℓ(p0) − 1/2.
For non-Gaussian ℒ, use the second or third option. Try to avoid non-Gaussian situations, e.g. use the decay rate 1/τ rather than the lifetime τ, or 1/p rather than the momentum p of a charged track.

Lifetime determination
  dn/dt = (1/τ) exp(−t/τ)
  ℒ = ∏ (1/τ)exp(−ti/τ),  thus ℓ = ∑ (−ti/τ − ln τ)
  ∂ℓ/∂τ = ∑ (ti/τ² − 1/τ) = 0 ⇒ ∑ti/τ² − N/τ = 0 ⇒ τ̂ (= τmax) = ∑ti/N = <t>
  ∂²ℓ/∂τ² = ∑ (−2ti/τ³ + 1/τ²) = −2N/τ̂² + N/τ̂² = −N/τ̂² ⇒ στ = (N/τ̂²)^(−1/2) = τ̂/√N
ℓ is a universal function of τ/τmax:
  ℓ(τ) − ℓ(τmax) = N[1 + ln(τmax/τ) − τmax/τ]
For a given N, σ+ and σ− are defined (~τmax/√N as N→∞); for small N, σ+ > σ−.
  ℓ(τmax) = −N(1 + ln <t>)
N.B. ℓ(τmax) depends only on <t>, not on the distribution of the ti; this is relevant for whether ℓmax is useful for testing the goodness of fit — by itself it does not at all justify the chosen parametrisation. (A numerical illustration follows.)
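A small numerical illustration of the lifetime fit and its Δℓ = −1/2 (asymmetric) error interval, assuming numpy and scipy are available; the true lifetime, sample size and random seed are arbitrary choices, and the data are generated rather than measured.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(7)
tau_true, N = 2.0, 20                     # small sample: expect sigma_plus > sigma_minus
t = rng.exponential(tau_true, N)

tau_hat = t.mean()                        # ML estimate: tau_max = <t>
sigma   = tau_hat / np.sqrt(N)            # parabolic error from -d^2 l / d tau^2

# Delta(ln L) = -1/2 interval, using l(tau) - l(tau_max) = N[1 + ln(tau_max/tau) - tau_max/tau]
def dlnL(tau):
    return N * (1 + np.log(tau_hat / tau) - tau_hat / tau) + 0.5

lo = brentq(dlnL, tau_hat * 0.3, tau_hat)   # lower crossing
hi = brentq(dlnL, tau_hat, tau_hat * 3.0)   # upper crossing

print(f"tau_hat = {tau_hat:.3f}, parabolic error = {sigma:.3f}")
print(f"Delta lnL = 0.5 interval: -{tau_hat - lo:.3f} / +{hi - tau_hat:.3f}")
```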
Likelihood (ℒ) vs probability distribution function (pdf)
• Poisson: the pdf for observing n, given λ, is P(n;λ) = e^(−λ)λⁿ/n!. From this construct the likelihood as ℒ(λ;n) = e^(−λ)λⁿ/n!. It is the same function of λ and n, but: for the pdf, λ is fixed and P(n;λ) exists only at integer n ≥ 0; for the likelihood, n is fixed and ℒ(λ;n) exists as a continuous function of λ ≥ 0.
• Lifetime distribution: the pdf P(t;τ) = (1/τ)e^(−t/τ) (fixed τ) is maximal at t = 0; the likelihood ℒ(τ;t) = (1/τ)e^(−t/τ) (fixed t) is maximal at τ = t. Both t and τ are continuous.
• Gaussian: P(x;µ) = 1/(√(2π)σ) exp[−½((x−µ)/σ)²] and ℒ(µ;x) = 1/(√(2π)σ) exp[−½((x−µ)/σ)²] have the same functional form; if one considers only Gaussians, one can get confused between pdf and ℒ.
The integral of a pdf is 1, whereas the integral of ℒ is meaningless.

Error on a maximum likelihood fit
For a single-parameter fit, the parameter p is estimated from dℓ/dp = 0 and its error is σ = (−d²ℓ/dp²)^(−1/2). For several parameters pi, the best values are obtained from ∂ℓ/∂pi = 0; for the errors, define Hij = −∂²ℓ/∂pi∂pj and obtain the error matrix as Eij = (H⁻¹)ij.
An example with two variables x and y: the contours of ℓ are given by ℓ = ℓmax − (4x² + y²), with ℓmax at (0,0) and ℓ = ℓmax − 1/2 when 8x² + 2y² = 1. The errors on the variables are x = ±√(1/8) (when y = 0) and y = ±√(1/2) (when x = 0). In terms of Hij:
  −∂²ℓ/∂xi∂xj = [[8, 0],[0, 2]]  → invert →  (1/16)[[2, 0],[0, 8]] = [[1/8, 0],[0, 1/2]]
Rotating the axes clockwise by 30° ⇒ ℓ = ℓmax − ¼(13x'² + 6√3 x'y' + 7y'²):
  −∂²ℓ/∂xi∂xj = ½[[13, 3√3],[3√3, 7]]  → invert →  (1/32)[[7, −3√3],[−3√3, 13]]
(the same error matrix as obtained before for the correlated variables).

Extended maximum likelihood
1. Maximum likelihood uses the shape ⇒ parameters. For F forward and B backward events out of N = F + B:
  P_F = N!/(F!B!) f^F(1−f)^B;  maximise ln P_F wrt f ⇒ f̂ = F/N,  σ²(f) = [−∂²ln P_F/∂f²]^(−1) ≈ f(1−f)/N
  Estimate of F: N f̂ = F ± √(FB/N);  estimate of B: N(1−f̂) = B ± √(FB/N); completely anti-correlated.
2. Extended maximum likelihood uses shape and normalisation, i.e. the probability of (1) observing a sample of size N events and (2) the sample having the given distribution in x:
  Pa = exp(−λ)λᴺ/N! × P_F
  Maximise ln Pa(λ,f) ⇒ λ̂ = N ± √N and f̂ = F/N ± √(f(1−f)/N); completely uncorrelated variables.
For <F> and <B>, either propagate the errors for <Φ> = <λ><f> and <β> = <λ>(1 − <f>), or rewrite the equation as a product of two independent Poissons,
  Pa = [e^(−Φ)Φ^F/F!] × [e^(−β)β^B/B!],
which gives the same answer: Φ = F ± √F and β = B ± √B.

Pros & cons of the likelihood method
• No need of histogramming; most useful in the low-statistics case.
• Unique answer and consistent error (e.g. for λ0 ± δλ or τ0 ± δτ one has λ0 + δλ = 1/(τ0 − δτ)).
• Able to constrain parameters and ranges.
• Difficult to tackle background, e.g. ℓ = ∑ ln[ fR wR(mi)PR(θi) + fB wB(mi)PB(θi) ].
• Weight factors (e.g. efficiency, ε = 1/w) for different events: ℓ = ∑ wi ln(yi), but then it is difficult to estimate the error; a better option is ℓ = ∑ ln(N yi/wi).
• Large computing time: the normalisation must be redone for each parameter set separately.
• How good is the fit? There is no absolute scale for the value of ℒ ⇒ hypothesis testing is not easy; one can only compare two hypotheses, (ℓa − ℓb), which just gives the better choice of model.
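A small numerical sketch of the extended maximum likelihood for the forward/backward counting example above, assuming numpy and scipy are available; the counts F and B are invented, the maximisation uses a generic minimiser, and the error matrix is taken from a finite-difference Hessian rather than the analytic one.

```python
import numpy as np
from math import lgamma, log
from scipy.optimize import minimize

F, B = 60, 40
N = F + B

def nll(params):
    """Negative log of Pa(lambda, f) = Pois(N; lambda) * Binom(F; N, f)."""
    lam, f = params
    if lam <= 0 or not 0 < f < 1:
        return 1e12
    ln_pois  = -lam + N * log(lam) - lgamma(N + 1)
    ln_binom = lgamma(N + 1) - lgamma(F + 1) - lgamma(B + 1) + F * log(f) + B * log(1 - f)
    return -(ln_pois + ln_binom)

res = minimize(nll, x0=[N, 0.5], method="Nelder-Mead")
lam_hat, f_hat = res.x

# error matrix E = H^-1 from a finite-difference Hessian of -ln Pa
h = np.array([1e-3 * max(abs(v), 1.0) for v in res.x])
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei = np.eye(2)[i] * h[i]
        ej = np.eye(2)[j] * h[j]
        H[i, j] = (nll(res.x + ei + ej) - nll(res.x + ei - ej)
                   - nll(res.x - ei + ej) + nll(res.x - ei - ej)) / (4 * h[i] * h[j])
E = np.linalg.inv(H)

print("lambda =", lam_hat, "+/-", np.sqrt(E[0, 0]), " (expect N +/- sqrt(N))")
print("f      =", f_hat,  "+/-", np.sqrt(E[1, 1]), " (expect F/N +/- sqrt(f(1-f)/N))")
print("correlation:", E[0, 1] / np.sqrt(E[0, 0] * E[1, 1]), " (expect ~0)")
```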
Chi-square fit
Data set {xi, yi ± δyi} and theory y = a + bx.
• Does it fit a straight line? (hypothesis testing)
• What are the gradient and intercept? (parameter determination)
  χ² = ∑ (yi^obs − yi^th(αj))² / σi²
σi is supposed to be the error on the theory, but in reality it is the error on the experimental observation.
If theory and data are consistent with each other, y^th ≈ y^obs ⇒ χ² is small. Minimise χ² to obtain the best line (the best parameters of the theory). The error on a parameter is [½ d²χ²/dp²]^(−1/2); equivalently, it is the change in p that increases χ² by 1 from its minimum χ²min. For several parameters, the best values are obtained from ∂χ²/∂pi = 0 and the error matrix is the inverse of ½ ∂²χ²/∂pi∂pj.
For a single measurement y^obs ± σ: χ² = (y^obs − y^th)²/σ², minimised with χ² = 0 for y^obs = y^th, and χ² = 1 for y^obs = y^th ± σ.
For two measurements y1 and y2 of a single quantity with equal errors σ:
  χ² = (y1^obs − y^th)²/σ² + (y2^obs − y^th)²/σ²
y^th = (y1+y2)/2 from dχ²/dy^th = 0, and its error is σ/√2 from χ² = χ²min + 1 — exactly what we expect, error = σ/√n.

Simple example of minimising χ²
Measurements p1 ± σ1, p2 ± σ2, ..., pn ± σn; the best value is <p> ± σ. Construct
  χ² = ∑ (p − pi)²/σi²  and minimise it with respect to p:
  ½ dχ²/dp = ∑ (p − pi)/σi² = 0 ⇒ p = [∑ pi/σi²] / [∑ 1/σi²]
and the error on p is given by σ = [½ d²χ²/dp²]^(−1/2), with ½ d²χ²/dp² = ∑1/σi², thus 1/σ² = ∑1/σi².
This is the weighted mean again — the fit of a linear equation.

Example of a straight-line fit
A simple example y = a + bx; "simple" because y is linear in a and b (not because it is linear in x). The data consist of n points (xi, yi ± σi): many possible lines with different χ². Minimise χ² with respect to a and b:
  χ² = ∑ [(yi − a − bxi)/σi]²
  ½ ∂χ²/∂a = ∑ (a + bxi − yi)/σi² = 0
  ½ ∂χ²/∂b = ∑ (a + bxi − yi) xi/σi² = 0
Two simultaneous equations for the two unknowns a and b. The solution of these equations is
  b = ([1][xy] − [x][y]) / ([1][x²] − [x][x]),  where [f] ≡ (1/n) ∑ fi/σi²
With the weighted mean <f> = [f]/[1], a can then be determined from <y> = a + b<x>.

Error on the straight-line fit
To evaluate the errors we need the error matrix: a and b are correlated. First evaluate
  ½ ∂²χ²/∂a² = n[1];  ½ ∂²χ²/∂b² = n[x²];  ½ ∂²χ²/∂a∂b = n[x]
  Inverse error matrix: H = n [[ [1], [x] ],[ [x], [x²] ]]   →   Error matrix: H⁻¹ = 1/(nD) [[ [x²], −[x] ],[ −[x], [1] ]]
where the determinant D = [x²][1] − [x][x]. With no correlation one would have σ²(a) = 1/(n[1]) and σ²(b) = 1/(n[x²]).
The errors depend only on the measured xi and σi, not on how well the data agree with the theory. cov(a,b) ∝ −<x>, so it is better to use x' = x − <x>, because then the errors on a' and b are uncorrelated. (A worked numerical example is sketched below.)
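A small numpy sketch of the weighted straight-line χ² fit via its normal equations; the data points are invented for illustration.

```python
import numpy as np

# invented data points (x_i, y_i +/- sigma_i)
x     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y     = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
sigma = np.array([0.2, 0.2, 0.3, 0.2, 0.3])

w = 1.0 / sigma**2
# half of the second-derivative (inverse error) matrix for (a, b) in y = a + b*x
H = np.array([[w.sum(),       (w * x).sum()],
              [(w * x).sum(), (w * x**2).sum()]])
Y = np.array([(w * y).sum(), (w * x * y).sum()])

a, b = np.linalg.solve(H, Y)        # best-fit parameters
E = np.linalg.inv(H)                # error matrix: sigma_a^2, sigma_b^2, cov(a,b)

chi2 = (w * (y - a - b * x)**2).sum()
print(f"a = {a:.3f} +/- {np.sqrt(E[0, 0]):.3f}")
print(f"b = {b:.3f} +/- {np.sqrt(E[1, 1]):.3f}, cov(a,b) = {E[0, 1]:.4f}")
print(f"chi2 / ndf = {chi2:.2f} / {len(x) - 2}")
```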
• How well is the y-coordinate of the fitted line known for a particular x-value? The variance of y is σy² = σa² + 2x cov(a,b) + x²σb²; after shifting x so that <x> = 0, σy² = σa² + x²σb².
• Hypothesis testing: how well do the data points match the theoretical expectation?
  χ²min = ∑ yi²/σi² − a ∑ yi/σi² − b ∑ xiyi/σi²
Quote the probability of getting a value as large as this from a χ² distribution (χ²/dof).
• Error of the first kind: reject H when it is true; this should happen x% of the time.
• Error of the second kind: accept H when something else is true.
• Choose the optimum χ² criterion for the best result.

Generalised χ² fit
Fit a set of data points (xi, yi ± σi) by a form y^th(x, αa), a function of the variable x and parameters αa:
  χ² = ∑(i=1..n) [(yi − y^th(xi, αa))/σi]²,  the αa solve the simultaneous equations ∂χ²/∂αa = 0
Error on the parameters: <(αa − ᾱa)(αb − ᾱb)> = (H⁻¹)ab, where Hab = ½ ∂²χ²/∂αa∂αb.
Take a general linear case, y^th = ∑a αa fa(x):
  ½ ∂χ²/∂αb = −∑i [yi − ∑a αa fa(xi)] fb(xi)/σi² = 0
In matrix notation: Yb − ∑a αa Hab = 0, where
  Yb = ∑i yi fb(xi)/σi²  and  Hab = ∑i fa(xi) fb(xi)/σi² = ½ ∂²χ²/∂αa∂αb
Thus Y = αH, and the solution is α = YH⁻¹.

Correlated errors for y
How is the definition of χ² modified when the various yi are correlated? Start with two uncorrelated variables x' and y', with measured values x' ± σx' and y' ± σy' and predicted values x'0 and y'0:
  χ² = (x' − x'0)²/σx'² + (y' − y'0)²/σy'²
with x' = x cosθ − y sinθ, y' = x sinθ + y cosθ. The error transformation gives
  σx'² = σx² cos²θ + σy² sin²θ − 2cov(x,y) sinθcosθ
  σy'² = σx² sin²θ + σy² cos²θ + 2cov(x,y) sinθcosθ
  cov(x',y') = (σx² − σy²) sinθcosθ + cov(x,y)(cos²θ − sin²θ) = 0
For the numerator, collect separately the coefficients of the δx², δy² and δxδy terms and use cov(x',y') = 0; for the denominator, collect separately the cov(x,y) and cov²(x,y) terms and the rest, and add them:
  χ² = [σy²(x−x0)² + σx²(y−y0)² − 2cov(x,y)(x−x0)(y−y0)] / [σx²σy² − cov²(x,y)]
     = H11(x−x0)² + H22(y−y0)² + 2H12(x−x0)(y−y0)
where H11, H12 and H22 are the elements of the inverse error matrix,
  [[H11, H12],[H21, H22]] = [[σx², cov(x,y)],[cov(x,y), σy²]]⁻¹
In general,
  χ² = ∑ij Δi Hij Δj, where Δj = yj^obs − yj^th.
e.g. a fit with two BW functions plus background: the area under each BW is a function of the fitted parameters M and Γ, which are correlated in the output of the fit; use this χ² (with the full error matrix) to calculate the error on the yields.
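A minimal numpy illustration of the correlated χ², Δᵀ M⁻¹ Δ, compared with what one would get by ignoring the correlation; the measurements, errors and correlation coefficient are invented numbers.

```python
import numpy as np

# two measurements with a correlated error matrix (numbers are illustrative)
y_obs = np.array([10.4, 9.1])
y_th  = np.array([10.0, 10.0])

sig = np.array([0.5, 0.6])
rho = 0.8
M = np.array([[sig[0]**2,             rho * sig[0] * sig[1]],
              [rho * sig[0] * sig[1], sig[1]**2            ]])   # error matrix

H = np.linalg.inv(M)                    # inverse error matrix
d = y_obs - y_th
chi2_corr   = d @ H @ d                 # chi^2 = Delta^T M^-1 Delta
chi2_uncorr = np.sum(d**2 / sig**2)     # what one would (wrongly) get ignoring rho

print("correlated chi2  :", chi2_corr)
print("uncorrelated chi2:", chi2_uncorr)
```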
Fit to points with errors on both x and y
Instead of the simple χ², minimise ri, the distance of the point on the line from the given data point measured in units of the radius vector of the error ellipse in that direction; ri^min corresponds to the point on the line which minimises the error-ellipse function
  (x − xi)²/δxi² + (y − yi)²/δyi²
  χ² = ∑ (ri²)min = ∑ (yi − a − bxi)² / (b²δxi² + δyi²)
This is simply the difference di = yi − a − bxi with an error εd² = δyi² + b²δxi² (the error on x propagated into an error on y).
  ½ ∂χ²/∂a = −∑ (yi − a − bxi)/(b²δxi² + δyi²) = 0 ⇒ a = [∑(yi − bxi)/(b²δxi² + δyi²)] / [∑ 1/(b²δxi² + δyi²)]
b is obtained from the (numerical) minimisation of χ².

Comparison of the various techniques of parameter fitting
[comparison table not reproduced]

Upper and lower limits
  P(x1 < x < x2; θ) = 1 − α = ∫[x1,x2] f(x;θ)dx
Under fairly general conditions, with the methods of maximum likelihood or chi-square in the large-sample limit, the estimator is distributed according to a multivariate Gaussian centered on the true (unknown) value:
  1 − α = ∫[µ−δ, µ+δ] 1/(√(2π)σ) exp[−(x−µ)²/(2σ²)] dx

  δ      α(%)          δ       α(%)
  1σ     31.73         1.28σ   20
  2σ     4.55          1.64σ   10
  3σ     0.27          1.96σ   5
  4σ     6.3×10⁻³      2.58σ   1
  5σ     5.7×10⁻⁵      3.29σ   0.1
  6σ     2.0×10⁻⁷      3.89σ   0.01

Bayesian approach: P(θ|x) ∝ L(x|θ) π(θ) — posterior ∝ likelihood × prior pdf (there is debate on this issue).

Limits for low statistics (counting experiment with/without background)
For a Poisson distribution, counting a certain number of events n: Nup (Nlo) is the mean for which a random repetition of the experiment would give more (fewer) than n events with probability 1 − α; one can quote lower and upper limits simultaneously. In the presence of an expected background b,
  α = [∑(n=0..nobs) e^(−(Nup+b)) (Nup+b)ⁿ/n!] / [∑(n=0..nobs) e^(−b) bⁿ/n!]
e.g. with b = 0.4 this gives Nup = 5.91 in the example on the slide. Small numbers are always a tricky problem. (A numerical sketch follows.)
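A short sketch of solving the background-subtracted Poisson upper-limit equation numerically, assuming scipy is available. The inputs nobs = 2 and a 95% CL (α = 0.05) are assumptions, not stated on the slide; with b = 0.4 they happen to reproduce a limit close to the Nup = 5.91 quoted above.

```python
from math import exp, factorial
from scipy.optimize import brentq

def pois_cdf(n_obs, mu):
    """Cumulative Poisson probability: sum_{n=0}^{n_obs} e^{-mu} mu^n / n!"""
    return sum(exp(-mu) * mu**n / factorial(n) for n in range(n_obs + 1))

def upper_limit(n_obs, b, alpha):
    """Poisson upper limit N_up in the presence of an expected background b."""
    f = lambda N: pois_cdf(n_obs, N + b) / pois_cdf(n_obs, b) - alpha
    return brentq(f, 0.0, 50.0)

# assumed inputs: n_obs = 2, b = 0.4, 95% CL (alpha = 0.05)
print(upper_limit(n_obs=2, b=0.4, alpha=0.05))   # ~5.9
```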
Minimisation procedures
Because of the complexity of the likelihood or χ² function, there is in general no analytical solution for many experimental data points. There are mainly two kinds of numerical approach: 1. grid/random search and 2. gradient search.
Grid and related searches
• Grid: evaluate F(x) at x0, x0+Δx0, x0+2Δx0, ... and look for the minimum of F(x); crude mapping over a large range first, then finer binning. Only suitable for a finite search range and a small number of dimensions.
• Random search: instead of equally spaced points, generate points according to some function; better for a large parameter space and more dimensions, but it may never find the true minimum.
• Coordinate-variation (single-parameter, one-by-one variation) method: vary one parameter to reach a minimum, then the next, and so on. Works for weakly correlated parameters; for strongly correlated variables it needs a very large number of steps and is unacceptably slow.
• Rosenbrock method: a single-parameter method, but after the first step in each dimension it chooses the best direction. Its efficiency decreases with the number of variables.
• Simplex method: take n+1 points P1, ..., Pn+1. Let F(Ph) = max(F(P1), F(P2), ...), F(Plo) = min(F(P1), F(P2), ...), and Ā = (1/n)[∑Pi − Ph], the centroid of all points except the worst one. Three operations can be used: reflection, contraction and expansion. Reflect Ph about Ā: P* = (1+α)Ā − αPh.
1. If F(P*) < F(Plo), a new minimum has been found; try an expansion P** = γP* + (1−γ)Ā: if F(P**) < F(Plo), replace Ph by P** and restart; if F(P**) ≥ F(Plo), replace Ph by P* and restart.
2. If F(Plo) ≤ F(P*) ≤ F(Ph), replace Ph by P*.
3. If F(P*) ≥ F(Ph), the reflection has failed and P* is unacceptable. Take a new point P** between Ph and Ā by contraction, P** = βPh + (1−β)Ā with 0 < β < 1. If F(P**) < F(Ph), replace Ph by P**. If F(P**) is worse than both F(Ph) and F(P*), the contraction has also failed: replace all Pi by ½(Pi + Plo) and restart the whole procedure.

Gradient methods: predict new points relatively far from the last point
• Steepest descent: from P0 move in the direction of parameter space in which the function decreases most rapidly,
  ξi = (∂F/∂xi) / [∑(j=1..n) (∂F/∂xj)²]^(1/2),  i = 1, 2, ..., n.
In 2D it is the same as the one-by-one variation method, but with a rotation of the coordinate axes. It can be very slow because of the complete interdependence of the steps through F and a starting point far from the true minimum.
• Newton's method: second-degree Taylor expansion,
  F(x) = F(x0) + (∂F/∂x)|x0 (x − x0) + ½ (∂²F/∂x²)|x0 (x − x0)² + ...
  F(x) ≈ F(x0) + gᵀ(x − x0) + ½ (x − x0)ᵀ G (x − x0),  with gi = ∂F/∂xi and Gij = ∂²F/∂xi∂xj.
Approximate the function around x0 by a quadratic surface and calculate the minimum of the n-dimensional parabola analytically: xmin = x0 − G⁻¹g. This is not the true minimum, but one forms a new parabolic surface about xmin, calculates its minimum, and iterates. It requires G to be positive definite everywhere; where it is not, it must be artificially altered. Disadvantage: the evaluation of G and its inversion take a large amount of CPU time.

Local vs global minima
• The minimum obtained may be a local minimum, not the global one: change the starting value to something far from the initial one and look for a second minimum, and so on.
• Errors: from the second derivative of the function at the minimum,
  F(x) = F(x0) + ½ (∂²F/∂x²)(x − x0)²  and  σ² = [½ ∂²F/∂x²]⁻¹  ⇒  F(x0 ± σ) = F(x0) + 1.
For correlated variables, calculate all the second derivatives and invert the second-derivative matrix to obtain the errors. (A small Newton-iteration sketch follows.)
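A minimal sketch of the Newton iteration xmin = x0 − G⁻¹g with finite-difference derivatives, and of the error estimate from the inverse second-derivative matrix; the toy χ²-like function and starting point are invented for illustration, assuming numpy is available.

```python
import numpy as np

def F(p):
    """Toy chi^2-like function of two correlated parameters (invented for illustration)."""
    x, y = p
    return 4*x**2 + 2*x*y + 3*y**2 - 4*x - 6*y + 10

def grad_hess(f, p, h=1e-4):
    """Numerical gradient g and Hessian G by central finite differences."""
    n = len(p)
    g = np.zeros(n)
    G = np.zeros((n, n))
    I = np.eye(n) * h
    for i in range(n):
        g[i] = (f(p + I[i]) - f(p - I[i])) / (2 * h)
        for j in range(n):
            G[i, j] = (f(p + I[i] + I[j]) - f(p + I[i] - I[j])
                       - f(p - I[i] + I[j]) + f(p - I[i] - I[j])) / (4 * h * h)
    return g, G

p = np.array([5.0, -3.0])                 # starting point
for _ in range(10):                       # Newton iterations: x_min = x0 - G^-1 g
    g, G = grad_hess(F, p)
    step = np.linalg.solve(G, g)
    p = p - step
    if np.max(np.abs(step)) < 1e-8:
        break

g, G = grad_hess(F, p)
errors = np.sqrt(np.diag(np.linalg.inv(G / 2)))   # sigma^2 = [ (1/2) d2F/dp2 ]^-1
print("minimum at", p, "with errors", errors)
```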
Generation of random numbers
• They can take any value in a continuous range.
• One cannot predict in advance which value will be taken.
• The distribution of the variable may nevertheless be well known.
• The distribution of a random variable gives the probability of a given value: g(u)du = P(u < u' < u+du).
• Integral (cumulative) distribution function: G(u) = ∫(−∞,u) g(u')du' → g(u) = dG(u)/du.
• G(u) increases monotonically with u; the normalisation of g is fixed by ∫(−∞,+∞) g(u)du = 1.
• Truly random numbers: a sequence of truly random numbers is completely unpredictable and hence irreproducible. It can be generated only through random physical processes, e.g. radioactive decay or the arrival times of cosmic muons.
• Pseudo-random numbers: a sequence generated according to a strict mathematical formula, but indistinguishable from a truly random sequence.
  Mid-square method: start with a number of r digits; take the middle r/2 digits, which is the first random number; square this r/2-digit number and again take the middle r/2 digits → second random number; and so on.
  Multiplicative linear congruential generator: build up a sequence ri = mod(a·ri−1 + b, m). Choose a, b and m carefully. The lower-order bits are much less random than the higher-order bits; e.g. for random numbers in the range 1-10, J = 1 + int(10*r1) is better than J = 1 + mod(100000*r1, 10).

Transformation method to generate other distributions
Suppose one has a random number generator producing x uniformly in the range 0 ≤ x ≤ 1, which implies that the probability of generating a number between x and x+dx is
  g(x)dx = dx for 0 ≤ x ≤ 1, and 0 for x < 0 or x > 1,
with the probability density normalised, ∫ g(x)dx = 1. Now generate a random number with a given probability density function f(y), with ∫ f(y)dy = 1. The transformation law of probabilities is
  |f(y)dy| = |g(x)dx|, or f(y) = g(x)|dx/dy|.
For a uniform random distribution on [0,1], g(x) = 1, which implies dx = f(y)dy ⇒ x = ∫f(y)dy + c. Inverting this gives y = y(x).
Simply: a density f(x) on the range 0 to 1 maps onto another density f(y) on the range 0 to 2 by matching the cumulative functions, e.g. ∫(0,0.5) f(x)dx = ∫(0,1) f(y)dy. (Another example: changing the x-coordinate from cm to metres stretches x by 100 but shrinks f(x) by 1/100.)

Generation of some simple distributions
• Isotropic angular distribution: f(φ) = K; x = ∫(0,φ) K dφ' = Kφ + c. With x = 0 at φ = 0 and x = 1 at φ = 2π, K = 1/2π, thus φ = 2πx.
• Exponential decay or absorption: f(τ) = (1/τ0)exp(−τ/τ0);
  x = ∫(0,τ) (1/τ0)exp(−τ'/τ0)dτ' = 1 − exp(−τ/τ0) ⇒ τ = −τ0 ln(1−x) = −τ0 ln(x)
  (since 1−x is also uniform on [0,1]).
• A probability distribution that is continuous but cannot conveniently be integrated analytically:
  F(cosθ) = ∫(−1,cosθ) f(cosθ')d(cosθ') = x
Integrate numerically and normalise so that F(1) = 1. Construct a table of cosθn indexed by n (n = 0, 1, 2, ..., N), where F(cosθn) = n/N; the table has N+1 entries. Generate x and multiply it by N:
  α = integer part of Nx,  β = fractional part of Nx,  cosθ(x) = cosθα + β(cosθα+1 − cosθα).
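A minimal sketch of the transformation method for the two analytic cases above, using only the Python standard library; τ0 and the sample size are arbitrary choices.

```python
import math
import random

random.seed(1)

def iso_phi():
    """Isotropic azimuthal angle: phi = 2*pi*x."""
    return 2.0 * math.pi * random.random()

def exp_decay(tau0):
    """Exponential decay time via the transformation method: tau = -tau0 * ln(x)."""
    # use 1 - random() so the argument of log is in (0, 1] and never exactly 0
    return -tau0 * math.log(1.0 - random.random())

# sanity check: the sample mean of the decay times should approach tau0
tau0 = 2.5
sample = [exp_decay(tau0) for _ in range(100_000)]
print("mean decay time:", sum(sample) / len(sample), " (expect ~", tau0, ")")
print("example phi:", iso_phi())
```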
Normal/Gaussian distribution
First consider the probability density function
  fR(r) = r exp(−r²/2),  0 < r < ∞   (1)
and its cumulative distribution function
  F(r) = ∫(0,r) u exp(−u²/2)du = 1 − exp(−r²/2) = ξ.
If a random variable is distributed according to (1), then r = [−2 ln(1−ξ)]^(1/2) = [−2 ln ξ]^(1/2).
Now consider the Gaussian distribution with mean zero and σ = 1,
  φ'(y|0,1) = (1/√(2π)) exp(−y²/2),  −∞ < y < ∞.
In fact it is easier to sample two independent Gaussian random variables together:
  f(x,y) = φ'(x|0,1) φ'(y|0,1) = (1/2π) exp[−½(x² + y²)]
Transforming to independent polar coordinates r, φ:
  φ'(x)φ'(y) dxdy = (1/2π) exp(−r²/2) r dr dφ
Here φ is uniformly distributed between 0 and 2π and may be sampled by φ = 2πξ2; the pdf of r is r exp(−r²/2), hence r = [−2 ln ξ1]^(1/2). Thus
  x = [−2 ln ξ1]^(1/2) cos(2πξ2);   y = [−2 ln ξ1]^(1/2) sin(2πξ2)
Use of the central limit theorem: the sum of a large number of random variables is itself a normally distributed random variable. The deviation of the mean of N random numbers (uniformly distributed on [0,1]) has σm² = 1/(12N), σm = 1/√(12N); hence the variable
  y = [ (1/N)∑xi − ½ ] / σm = √(12N) [ (1/N)∑xi − ½ ] = √(12/N) [ ∑xi − N/2 ]
has mean 0 and σ = 1 and is (approximately) normally distributed.

Rejection and acceptance
• A very general technique for sampling any probability distribution.
• Neither the normalisation of the density nor any numerical/analytic integration needs to be known explicitly to carry out the sampling.
• But it is sometimes too time consuming.
Algorithm: take two random numbers x1 ∈ [0,1] and x2 ∈ [0,1]; calculate f(x1); if x2·Pmax < f(x1) accept x1, otherwise reject; go on to the next event, and so on.
Example: generate the distribution f(x) = (4/π)·1/(1+x²), 0 < x < 1 (f falls from 4/π at x = 0 to 2/π at x = 1).
• Transformation method: y = ∫(0,x) (4/π)/(1+x'²)dx' = (4/π)tan⁻¹(x) ⇒ x = tan(πy/4).
• Rejection and acceptance method: Pmax = 4/π.
Verify these two methods. (Both are sketched below.)

Special functions — discrete numbers
Binomial distribution p(r; n, p) = n!/[(n−r)!r!] p^r(1−p)^(n−r):
• k = 0; generate u uniform in (0,1)
• Pk = (1−p)ⁿ, B = Pk
• If u < B accept r = k
• Else Pk = Pk·[p/(1−p)]·(n−k)/(k+1), k = k+1, B = B + Pk, and go to the previous step.
Poisson distribution P(n; λ) = e^(−λ)λⁿ/n!:
• k = 1, A = 1; generate u uniform in (0,1)
• A = uA; if A < exp(−λ) accept nk = k−1
• Else k = k+1, generate a new u and go to the previous step.
Or, like the binomial (inverse-CDF) method:
• k = 0; generate u uniform in (0,1)
• A = exp(−λ)
• If u < A accept nk = k
• Else k = k+1, A = A + exp(−λ)λ^k/k!, and go to the previous step.

Note on the least-squares/chi-square method
• Choose a suitable bin size; the bin size may vary depending on the statistics — merge bin contents so that there are at least ~5 entries in a bin.
• Error on the standard deviation, from the likelihood of a Gaussian sample:
  ∂ℓ/∂σ = −n/σ + ∑(xi−µ)²/σ³;  ∂²ℓ/∂σ² = n/σ² − 3∑(xi−µ)²/σ⁴ = n/σ² − 3nσ²/σ⁴ = −2n/σ²
  ⇒ σ(σ) = σ/√(2n).
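The two sketches promised above (standard-library Python, invented sample sizes): the polar-transformation Gaussian pair, and the f(x) = (4/π)/(1+x²) example generated both by rejection and by the transformation x = tan(πy/4).

```python
import math
import random

random.seed(3)

def gauss_pair():
    """Standard-normal pair from the polar transformation above."""
    xi1, xi2 = 1.0 - random.random(), random.random()   # xi1 in (0, 1]
    r = math.sqrt(-2.0 * math.log(xi1))
    return r * math.cos(2 * math.pi * xi2), r * math.sin(2 * math.pi * xi2)

def f(x):
    """Target density f(x) = (4/pi) / (1 + x^2) on 0 < x < 1."""
    return (4.0 / math.pi) / (1.0 + x * x)

P_MAX = 4.0 / math.pi

def sample_rejection():
    """Accept x1 if x2 * Pmax < f(x1)."""
    while True:
        x1, x2 = random.random(), random.random()
        if x2 * P_MAX < f(x1):
            return x1

def sample_transform():
    """Same density via the transformation method: x = tan(pi*y/4)."""
    return math.tan(math.pi * random.random() / 4.0)

n = 100_000
g = [gauss_pair()[0] for _ in range(n)]
print("Gaussian check : mean, rms =", sum(g) / n, math.sqrt(sum(v * v for v in g) / n))
print("rejection  <x> =", sum(sample_rejection() for _ in range(n)) / n)
print("transform  <x> =", sum(sample_transform() for _ in range(n)) / n)
```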