Discriminant function analysis (DFA)

● Rationale and use of DFA
● The underlying model (what is a discriminant function anyway?)
● Finding discriminant functions: principles and procedures
● Linear versus quadratic discriminant functions
● Significance testing
● Rotating discriminant functions
● Component retention, significance, and reliability

Bio 8100s Multivariate biostatistics L9.1 Université d’Ottawa / University of Ottawa 1999

What is discriminant function analysis?

● Given a set of p variables X1, X2, …, Xp and a set of N objects belonging to m known groups (classes) G1, G2, …, Gm, we try to construct a set of functions Z1, Z2, …, Zk, where k = min{m - 1, p}, that allow us to classify each object correctly.
● The hope (sometimes faint) is that “good” classification results (i.e., low misclassification rate, high reliability) will be obtained through a relatively small set of simple functions.

What is a discriminant function anyway?

● A discriminant function is a function

$$Z_i = f_i(X_1, \ldots, X_p)$$

● which maximizes the “separation” between the groups under consideration, or (more technically) maximizes the ratio of between-group to within-group variation.

[Figure: frequency distributions of Group 1 and Group 2 along two candidate functions, Z1 (not so good) and Z2 (better).]

The linear discriminant model

● For a set of p variables X1, X2, …, Xp, the general model is

$$Z_i = \sum_{j=1}^{p} a_{ij} X_j$$

● where the Xj are the original variables and the aij are the discriminant function coefficients.
● Note: unlike in PCA and FA, the discriminant functions are based on the raw (unstandardized) variables, since the resulting classifications are unaffected by scale.
● For p variables and m groups, the maximum number of DFs is min{p, m - 1}.
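As a minimal sketch of the linear model above, the following Python snippet evaluates Z_i = Σ a_ij X_j for two observations. The coefficient matrix and observations are illustrative values only (they match the small canonical scores example that appears later in these notes); the function name `discriminant_scores` is hypothetical.

```python
# Linear discriminant scores: Z_i = sum over j of a_ij * X_j.
# Coefficients and observations are illustrative values, not fitted results.

def discriminant_scores(coefficients, observation):
    """Return one score per discriminant function for a single observation."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, observation))
        for row in coefficients
    ]

# Row i holds the coefficients (a_i1, ..., a_ip) of discriminant function Z_i.
a = [
    [0.27, 0.97],
    [0.92, 0.39],
]

# Each observation holds the raw (unstandardized) values (X1, X2).
observations = [
    [3.7, 11.5],
    [2.3, 10.2],
]

scores = [discriminant_scores(a, x) for x in observations]

for k, z in enumerate(scores, start=1):
    labelled = ", ".join(f"Z{i} = {v:.3f}" for i, v in enumerate(z, start=1))
    print(f"observation {k}: {labelled}")
```

With p = 2 variables the coefficient matrix can hold at most two rows that are useful for m = 3 or more groups, in line with the min{p, m - 1} limit.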
The geometry of a single linear discriminant function

● 2 groups with measurements of two variables (X1 and X2) on each object.
● In this case, the linear DF Z* results in no misclassifications, whereas another possible DF (Z) gives two misclassifications.

[Figure: Group 1 and Group 2 plotted on X1 against X2 with the candidate axes Z and Z*; two objects are misclassified under Z but not under Z*.]

Finding discriminant functions: principles

● The first discriminant function is that which maximizes the differences between groups compared to the differences within groups…
● …which is equivalent to maximizing F in a one-way ANOVA.

$$F(Z) = \frac{MS_B(Z)}{MS_W(Z)}, \qquad Z_1 = \max\{F(Z)\}$$

[Figure: F(Z) plotted against the coefficient vector a = (a1, …, ap); Z1 lies at the maximum.]

Finding discriminant functions: principles (cont’d)

● The second discriminant function is that which maximizes the differences between groups compared to the differences within groups unaccounted for by Z1…
● …which is equivalent to maximizing F in a one-way ANOVA given the constraint that Z1 and Z2 are uncorrelated.

$$F(Z) = \frac{MS_B(Z)}{MS_W(Z)}, \qquad Z_2 = \max\{F(Z) \mid r_{Z_1, Z_2} = 0\}$$

[Figure: F(Z) plotted against the coefficient vector a = (a1, …, ap), with Z2 marked.]

The geometry of several linear discriminant functions

● 2 groups with measurements of two variables (X1 and X2) on each individual.
● Using only Z1, 4 objects are misclassified, whereas using both Z1 and Z2, only one object is misclassified.
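To make the F criterion from the principles slides concrete, here is a small Python sketch that projects two groups onto a candidate direction a and computes the one-way ANOVA F ratio MS_B(Z)/MS_W(Z) of the projected scores. It only compares two hand-picked directions and does not search for the maximum; the data, the directions, and the function names are all hypothetical.

```python
# One-way ANOVA F ratio of scores Z = a1*X1 + a2*X2 for candidate directions.
# Data and candidate coefficient vectors are made up for illustration.

def project(coeffs, data):
    """Project each observation onto the direction given by coeffs."""
    return [sum(a * x for a, x in zip(coeffs, obs)) for obs in data]

def anova_f(groups):
    """Return F = MS_B / MS_W for a list of groups of univariate scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((z - m) ** 2 for z in g) for g, m in zip(groups, means))
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (n_total - len(groups))
    return ms_between / ms_within

group1 = [(1.0, 2.0), (2.0, 3.1), (3.0, 3.9), (4.0, 5.2)]
group2 = [(2.0, 1.0), (3.0, 2.1), (4.0, 2.9), (5.0, 4.2)]

def f_for_direction(coeffs):
    return anova_f([project(coeffs, group1), project(coeffs, group2)])

f_poor = f_for_direction((1.0, 0.0))     # uses X1 alone
f_better = f_for_direction((1.0, -1.0))  # uses the contrast X1 - X2

print(f"F along (1, 0):  {f_poor:.2f}")
print(f"F along (1, -1): {f_better:.2f}")
```

In this made-up data set the groups overlap heavily on X1 alone but separate cleanly on the contrast X1 - X2, so the second direction gives a far larger F; the first discriminant function is the direction for which this ratio is largest.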
[Figure: Group 1 and Group 2 plotted on X1 against X2 with the axes Z1 and Z2; markers distinguish objects misclassified using only Z1 from the object misclassified using both Z1 and Z2.]

SSCP matrices: within, between, and total

● The total (T) SSCP matrix (based on p variables X1, X2, …, Xp) in a sample of objects belonging to m groups G1, G2, …, Gm with sizes n1, n2, …, nm can be partitioned into within-groups (W) and between-groups (B) SSCP matrices:

$$T = B + W$$

● Notation:
■ $x_{ijk}$: value of variable Xk for the ith observation in group j
■ $\bar{x}_{jk}$: mean of variable Xk for group j
■ $\bar{x}_{k}$: overall mean of variable Xk
■ $t_{rc}$, $w_{rc}$: element in row r and column c of the total (T) and within (W) SSCP matrices

$$t_{rc} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (x_{ijr} - \bar{x}_{r})(x_{ijc} - \bar{x}_{c})$$

$$w_{rc} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (x_{ijr} - \bar{x}_{jr})(x_{ijc} - \bar{x}_{jc})$$

Finding discriminant functions: analytic procedures

● Calculate the total (T), within (W) and between (B) SSCPs, where

$$T = B + W$$

● Determine the eigenvalues and eigenvectors of the product $W^{-1}B$:

$$\lambda(W^{-1}B) = (\lambda_1, \lambda_2, \ldots, \lambda_p)$$

● $\lambda_i$ is the ratio of between to within SSs for the ith discriminant function Zi…

$$\lambda_i = \frac{SS_B(Z_i)}{SS_W(Z_i)}$$

● …and the elements of the corresponding eigenvectors are the discriminant function coefficients:

$$\xi_i(W^{-1}B) = (a_{i1}, a_{i2}, \ldots, a_{ip})$$

Assumptions

● Equality of within-group covariance matrices (C1 = C2 = ...) implies that each element of C1 is equal to the corresponding element in C2, etc.
Within-group covariance matrices for G1 and G2 (lower triangles; diagonal entries are variances, off-diagonal entries are covariances):

G1
Variable   X1     X2     X3
X1         s1²
X2         c21    s2²
X3         c31    c32    s3²

G2
Variable   X1     X2     X3
X1         s1²
X2         c21    s2²
X3         c31    c32    s3²

The quadratic discriminant model

● For a set of p variables X1, X2, …, Xp, the general quadratic model is

$$Z_i = \sum_{j=1}^{p} a_{ij} X_j + b_{ij} X_i X_j$$

● where the Xj are the original variables, the aij are the linear coefficients and the bij are the 2nd order coefficients.
● Because the quadratic model involves many more parameters, sample sizes must be considerably larger to get reasonably stable estimates of coefficients.

[Figure: Group 1 and Group 2 plotted on X1 against X2, separated by a linear Z1 and by a quadratic Z1.]

Fitting discriminant function models: the problems

● Goal: find the “best” model, given the available data.
● Problem 1: what is “best”?
● Problem 2: even if “best” is defined, by what method do we find it?
● Possibilities:
■ If there are m variables, we might compute DFs using all possible subsets of variables (2^m - 1 models) and choose the best one.
■ Use some procedure for winnowing down the set of possible models.

Criteria for choosing the “best” discriminant model

● Discriminating ability: better models are better able to distinguish among groups.
● Implication: better models will have lower misclassification rates.
● N.B. Raw misclassification rates can be very misleading.
● Parsimony: a discriminant model which includes fewer variables is better than one with more variables.
● Implication: if the elimination/addition of a variable does not significantly increase/decrease the misclassification rate, it may not be very useful.
Criteria for choosing the “best” discriminant model (cont’d)

● Model stability: better models have coefficients that are stable as judged through cross-validation.
● Procedure: judge stability through cross-validation (jackknifing, bootstrapping).
● N.B.1. In general, linear discriminant functions will be more stable than quadratic functions, especially if the sample is small.
● N.B.2. If the sample is small, then “outliers” may dramatically decrease model stability.

Analytic procedures: general approach

● Evaluate the significance of a variable (Xi) in a DF by computing the difference in group resolution between two models, one with the variable included, the other with it excluded.
● Evaluate the change in discriminating ability (∆DA) associated with inclusion of the variable in question.
● Unfortunately, the change in discriminating ability may depend on what other variables are in the model!

[Figure: Model A (Xi in) and Model B (Xi out) are compared through ∆DA; retain Xi if ∆ is large, delete Xi if ∆ is small.]

Strategy I: computing all possible models

● Compute all possible models and choose the “best” one.
● Impractical unless the number of variables is relatively small.
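A short Python sketch shows why Strategy I quickly becomes impractical: it enumerates every non-empty subset of a small set of candidate variables and then prints the 2^m - 1 count for larger m. The variable names and the helper `all_variable_subsets` are illustrative assumptions; fitting and scoring each candidate model is deliberately left out.

```python
from itertools import combinations

def all_variable_subsets(variables):
    """Return every non-empty subset of the candidate variables."""
    subsets = []
    for size in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets

# Three candidate variables give 2**3 - 1 = 7 candidate models.
subsets = all_variable_subsets(["X1", "X2", "X3"])
for subset in subsets:
    print("{" + ", ".join(subset) + "}")

# The number of candidate models grows as 2**m - 1.
model_counts = {m: 2 ** m - 1 for m in (3, 10, 20)}
for m, count in model_counts.items():
    print(f"{m} variables -> {count} candidate models")
```

Each subset would still need its own discriminant analysis and an agreed criterion for “best”, which is what motivates the stepwise strategies that follow.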
[Figure: from the full set {X1, X2, X3}, the candidate models are {X1}, {X2}, {X3}, {X1, X2}, {X1, X3}, {X2, X3} and {X1, X2, X3}.]

Strategy II: forward selection

● Start with the variable for which differences among group means are the largest (largest F-value).
● Add others one at a time based on F to enter (p to enter) until no further significant increase in discriminating ability is achieved.
● Problem: if Xj is included, it stays in even if it contributes little to discriminating ability once other variables are included.

Worked example from the slide’s flow diagram, starting from the candidate variables (X1, X2, X3, X4):
■ F2 > F1, F3, F4 and F2 > F to enter (p < p to enter), so the model becomes (X2).
■ F1 > F3, F4 and F1 > F to enter (p < p to enter), so the model becomes (X1, X2).
■ F4 > F3, but F4 < F to enter (p > p to enter), so nothing more is added: the final model is (X1, X2).

What is F to enter/remove (p to enter/remove) anyway?

● When no variables are in the model, F to enter is the F-value from a univariate one-way ANOVA comparing group means with respect to the variable in question, and p to enter is the Type I probability associated with the null that all group means are equal.
● When other variables are in the model, F to enter corresponds to the F-value for an ANCOVA comparing group means with respect to the variable in question, where the covariates are the variables already entered.

Strategy III: backward selection

● Start with all variables and drop that for which differences among group means are the smallest (smallest F-value).
● Delete others one at a time based on F to remove (p to remove) until further removal results in a significant reduction in the ability to discriminate groups.

Worked example from the slide’s flow diagram, starting with all variables (X1, X2, X3, X4) in the model:
■ F2 < F1, F3, F4 and F2 < F to remove (p > p to remove), so the model becomes (X1, X3, X4).
■ F1 < F3, F4 and F1 < F to remove (p > p to remove), so the model becomes (X3, X4).
■ F4 < F3, but F4 > F to remove (p < p to remove), so nothing more is removed: the final model is (X3, X4).
● Problem: if Xj is excluded, it stays out even if it contributes substantially to discriminating ability once other variables are excluded.

Canonical scores

● Because discriminant functions are functions, we can “plug in” the values for each variable for each observation, and calculate a canonical score for each observation and each discriminant function.

Observation   X1     X2
1             3.7    11.5
2             2.3    10.2

$$a = \begin{pmatrix} 0.27 & 0.97 \\ 0.92 & 0.39 \end{pmatrix}$$

$$Z_{11} = 0.27(3.7) + 0.97(11.5)$$
$$Z_{12} = 0.92(3.7) + 0.39(11.5)$$
$$Z_{21} = 0.27(2.3) + 0.97(10.2)$$
$$Z_{22} = 0.92(2.3) + 0.39(10.2)$$

Canonical scores plots

● Plots of canonical scores for each object.
● The better the model, the greater the separation between clouds of points representing individual groups, e.g. Fisher’s famous irises.

Canonical scores of group means

Group    1         2
1        7.608     0.215
2        -1.825    -0.728
3        -5.783    0.513

[Figure: canonical scores plot of FACTOR(2) against FACTOR(1), both axes running from -10 to 10, with a 95% confidence ellipse for each group.]

Priors

● In standard DFA, it is assumed that in the absence of any information, the a priori (prior) probability φi of a given object belonging to one of i = 1, …, m groups is the same for all groups:

$$\varphi_i = \frac{1}{m}$$

● But if each group is not equally likely, then priors should be adjusted so as to reflect this bias.
● E.g. in species with biased sex ratios, males and females should have unequal priors.

Caveats: unequal priors

● For a given set of discriminant functions, misclassification rates will usually depend on the priors…
● …so that artificially low misclassification rates can be obtained simply by strategically adjusting the priors.
● So, only adjust priors if you are confident that the true frequency of each group in the population is (reasonably) accurately estimated by the group frequencies in the sample.

Significance testing

● Question: which discriminant functions are statistically “significant”?
● For testing the significance of all r DFs for m groups based on p variables, calculate Bartlett’s V and compare it to a χ² distribution with p(m - 1) degrees of freedom:

$$V = \left[ N - 1 - \tfrac{1}{2}(p + m) \right] \times \sum_{i=1}^{r} \ln(1 + \lambda_i)$$

● where $\lambda_i$ is the eigenvalue associated with the ith discriminant function.

Significance testing (cont’d)

● Each DF is tested in a hierarchical fashion by first testing the significance of all DFs combined.
● If all DFs combined are not significant, then no DF is significant.
● If all DFs combined are significant, then remove the first DF, recalculate V (= V1) and test.
● Continue until the residual Vj is no longer significant at df = (p - j)(m - j - 1).

$$V = \left[ N - 1 - \tfrac{1}{2}(p + m) \right] \times \sum_{i=1}^{r} \ln(1 + \lambda_i)$$

$$V_1 = \left[ N - 1 - \tfrac{1}{2}(p + m - 1) \right] \times \sum_{i=2}^{r} \ln(1 + \lambda_i)$$

$$V_2 = \left[ N - 1 - \tfrac{1}{2}(p + m - 2) \right] \times \sum_{i=3}^{r} \ln(1 + \lambda_i)$$

$$V_j = \left[ N - 1 - \tfrac{1}{2}(p + m - j) \right] \times \sum_{i=j+1}^{r} \ln(1 + \lambda_i)$$

Caveats/assumptions: tests of significance

● Tests of significance assume that within-group covariance matrices are the same for all groups, and that within groups, observations have a multivariate normal distribution.
● Tests of significance can be very misleading because the jth discriminant function in the population may not appear as the jth discriminant function in the sample due to sampling errors…
● So be careful, especially if the sample is small!
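Returning to the overall test, the Python sketch below computes Bartlett’s V for all DFs combined and its degrees of freedom p(m - 1). As an assumption for illustration it plugs in the two eigenvalues reported later in these notes for the iris example (N = 150, p = 4, m = 3). The function name is hypothetical, and the critical value 15.507 is the standard tabled upper 5% point of χ² with 8 df, hard-coded rather than computed.

```python
import math

def bartlett_v(eigenvalues, n_objects, n_variables, n_groups):
    """Return (V, df) for the test of all discriminant functions combined."""
    multiplier = n_objects - 1 - (n_variables + n_groups) / 2
    v = multiplier * sum(math.log(1 + lam) for lam in eigenvalues)
    df = n_variables * (n_groups - 1)
    return v, df

# Eigenvalues taken from the iris example in these notes (an illustrative choice).
v, df = bartlett_v([32.192, 0.285], n_objects=150, n_variables=4, n_groups=3)

# Tabled upper 5% critical value of chi-square with 8 degrees of freedom.
CHI2_CRIT_8DF_05 = 15.507

print(f"V = {v:.2f} on {df} df")
print("all DFs combined significant at 0.05:", v > CHI2_CRIT_8DF_05)
```

Only the combined test is sketched here; the residual statistics V1, V2, … would be computed the same way after dropping the leading eigenvalues, subject to the caveats on this slide.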
Caveats/assumptions: tests of significance (cont’d)

● If stepwise (forward or backward) procedures are used, significance tests are biased because, given enough variables, significant discriminant functions can be produced by chance alone.
● In such cases, it is advisable to (1) test results with more standard analyses or (2) use randomization procedures whereby objects are randomly assigned to groups.

Assessing classification accuracy I. Raw classification results

● The derived discriminant functions are used to classify all objects in the sample, and a classification table is produced.
● Classification accuracy is likely to be overestimated, since the data used to generate the DFs in the first place are themselves being classified.

Group    1     2     Total
1        43    5     48
2        8     14    22
Total    51    19    70

Misclassification (G1) = 5/48
Misclassification (G2) = 8/22
Overall misclassification = 13/70

Assessing classification accuracy II. Jackknifed classification

● Discriminant functions are derived using N - 1 objects, and the Nth object is then classified.
● This procedure is repeated for all N objects, each time leaving a different one out, and a classification table is produced.
● In general, jackknifed classification results are worse than raw classification results, but more reliable.

Group    1     2     Total
1        41    7     48
2        9     13    22
Total    50    20    70

Misclassification (G1) = 7/48
Misclassification (G2) = 9/22
Overall misclassification = 16/70

Assessing classification accuracy III. Data splitting

● Use 2/3 of the sample data (randomly selected) to generate discriminant functions (learning set).
● Use the derived discriminant functions to classify the other 1/3 (test set) and produce a classification table.
● In general, data-splitting classification results are worse than both raw and jackknifed classification results, but more reliable.

Group    1     2     Total
1        40    8     48
2        9     13    22
Total    49    21    70

Misclassification (G1) = 8/48
Misclassification (G2) = 9/22
Overall misclassification = 17/70

Assessing classification accuracy IV. Bootstrapped data splitting

● Use 2/3 of the sample data (randomly sampled) to generate discriminant functions (learning set).
● Use the derived discriminant functions to classify the other 1/3 (test set) and produce classification results.
● Repeat a large number (e.g. 1000) of times, each time sampling with replacement.
● Generate classification statistics over the bootstrapped samples, e.g. mean classification results, standard errors, etc.

Group    1             2             Total
1        41.2 (1.7)    6.8 (0.6)     48
2        9.3 (0.5)     12.7 (1.1)    22
Total    51            19            70

Cell entries are means over the bootstrapped samples; the values in parentheses are presumably the corresponding standard errors.

Misclassification (G1) = 14.2%
Misclassification (G2) = 42.3%
Overall misclassification = 23.0%

Interpreting discriminant functions

● Examine standardized coefficients (coefficients of discriminant functions based on standardized values).
● For interpretation, use variables with large absolute standardized coefficients.
● Examine the discriminant-variable correlations.
● For interpretation, use variables with high correlations with important discriminant functions.

Example: Fisher’s famous irises

● Data: four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).
● Problem: find the “best” set of DFs.
[Figure for “Example: Fisher’s famous irises”: scatterplot matrix of SEPALLEN, SEPALWID, PETALLEN and PETALWID.]

Example: Fisher’s famous irises: between-groups F-matrix

● Matrix entries are F-values from a one-way MANOVA comparing group means, and can be considered measures of the distance between group centroids.
● N.B. do not use probabilities associated with F-tests to determine “significance” unless you correct for multiple tests.

Species    1         2        3
1          0.0
2          550.2     0.0
3          1098.3    105.3    0.0

Example: Fisher’s famous irises: canonical discriminant functions

● Four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).

Canonical discriminant functions

            1         2
Constant    2.105     -6.661
SEPALLEN    0.829     0.024
SEPALWID    1.534     2.165
PETALLEN    -2.201    -0.932
PETALWID    -2.810    2.839

Note: discriminant functions are derived using equal priors.

Example: Fisher’s famous irises: standardized canonical discriminant functions

● Four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).

Standardized canonical discriminant functions

            1         2
SEPALLEN    0.427     0.012
SEPALWID    0.521     0.735
PETALLEN    -0.942    -0.401
PETALWID    -2.810    0.581

Note: canonical discriminant functions are based on standardized values.

Example: Fisher’s famous irises: eigenvalues, canonical correlations and cumulative dispersion

● Eigenvalues give the amount of difference among groups captured by a particular discriminant function, and the cumulative proportion of dispersion is the corresponding proportion.
                                       Discriminant function
Parameter                              1         2
Eigenvalues                            32.192    0.285
Canonical correlation                  0.985     0.471
Cumulative proportion of dispersion    0.991     1.000

● Canonical correlation is the correlation between a given canonical variate and a set of two dummy variables representing each group.

Fisher’s irises: raw and jackknifed classification results

● In this case, results are identical (a relatively rare occurrence!), so the same table applies to both the raw and the jackknifed classification:

Species    1     2     3     % correct
1          50    0     0     100
2          0     48    2     96
3          0     1     49    98
Total      50    49    51    98

Discriminant function analysis: caveats and notes

● Unless the ratio of number of objects to number of variables is large (> 20), standardized coefficients and correlations are unstable.
● DFA is unaffected by differences among variables in scale, so standardization is not required (unlike PCA, FA, etc.).
● Linear DFA is quite sensitive to the assumption of equality of covariance matrices among groups. If this assumption is violated, use quadratic classification.
● However, quadratic DFA is more unstable when N is small and normality does not hold.
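As a small worked check, the “% correct” column of the iris classification table above can be recomputed from the counts. The Python sketch below assumes rows are actual species and columns are assigned species, as in the earlier classification tables; the function name is hypothetical.

```python
def classification_summary(table):
    """Return (% correct per group, overall % correct) for a classification table.

    table[i][j] is the number of objects of actual group i assigned to group j.
    """
    per_group = [100.0 * row[i] / sum(row) for i, row in enumerate(table)]
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return per_group, 100.0 * correct / total

# Counts from the iris classification table (rows: actual, columns: assigned).
iris_table = [
    [50, 0, 0],
    [0, 48, 2],
    [0, 1, 49],
]

per_group, overall = classification_summary(iris_table)
for species, pct in enumerate(per_group, start=1):
    print(f"species {species}: {pct:.0f}% correct")
print(f"overall: {overall:.0f}% correct")
```

The misclassification rate for a group is simply 100 minus its percent correct, which matches the fractions quoted for the two-group tables earlier in these notes.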
© Copyright 2024