On the Interpretability of O-PLS Filtered Models Abstract Orthogonal PLS, introduced originally by Trygg and Wold in 2002 [1], is a patented algorithm that has received much attention for its perceived ability to simplify model interpretation. Since its introduction, it has been shown by Ergon [2] and Kemsley and Tapp [3] that results identical to the original O-PLS formulation can be obtained by post-processing conventional PLS models in a nonpatented way. This demonstrated, unequivocally, that O-PLS models have predictive properties identical to their non-rotated versions. The authors did not, however, consider the interpretability of the models at length. In this poster we explore the issue of interpretability of O-PLS models by applying the method to carefully constructed simple systems and well characterized data. Jeremy M. Shaver and Barry M. Wise Eigenvector Research, Inc. Binary Expression Simulation • 10 Expressed Proteins (variables), 500 Subjects • 1 "primary" effect with loading: 1 2 3 4 5 6 7 8 9 10 + + x x 0 0 0 0 0 0 (must have +, cannot have x, 0 have no effect) • 7 "background" effects (rank 1 patterns with positive and negative correlations as for primary) O-PLS Simple System Example Pseudo Gasoline Example • Synthetic example with three constituents 50 100 • Solve for pure components via CLS • Vary correlation between analyte and interferents 150 • Use pure spectra to create synthetic scenarios - Start with orthogonal concentrations, then 1) All analytes positively correlated 2) One interferent positive, three negative 0.045 Interferent 1 Analyte Interferent 2 0.04 Positive • Five component mixture measured by NIR • Evenly spaced Gaussian peaks, analyte in middle Orthogonalize Model • Only samples with primary loading expression for 1-4 (after mixing all effects) will exhibit property of interest (e.g. disease) 0.035 200 250 Negative 300 350 400 0.03 Example Set 1 450 0.025 500 1 2 3 4 5 6 7 8 9 10 0.02 20 40 60 80 100 120 140 160 180 200 1) Gaussian Peaks Scenarios - 1) Go from orthogonal (r=0) to positively correlated concentrations (r=1) - 2) One interferent positively correlated, one negatively correlated • What kinds of sensitivities does O-PLS have to noise, rotational ambiguity, and correlated signals? First O-PLS Component Loading • What do we need to be aware of when interpreting O-PLS recovered components? 2) First O-PLS Component Loading 1) Co rre la tio nr 2) Co rre la ble Varia tio nr b Varia le Interferent Concentration 2.3 Samples/Scores Plot of spec1 30 0.015 28 26 24 Y CV Predicted 1 0.02 5 Chemical Components: • Heptane • Toluene • Xylene • Decane • Iso-Octane Mixtures at known concentrations Pure Component Spectra from CLS shown 0.01 Regression Coefficient First O-PLS Component reflects pure component response when r = 0 but changes rapidly when analyte and interferent are correlated, even for r = 0.5, R2 = 0.25! NIR of Pseudo-Gasoline Samples R2 = 0.988 5 Latent Variables RMSEC = 0.46945 RMSECV = 0.68306 Calibration Bias = ï1.0658eï14 CV Bias = 0.0375 2.2 2.1 2 1.9 1.6 1.6 1.7 1.8 1.9 2 2.1 2.2 2.3 12 100 Cor re 50 latio nr ngth Wavele (nm) 0 ï50 ï150 800 Regression vector is the same for both scenarios, and all values of r except 1. 900 1000 1100 1200 1300 1400 1500 1600 Wavelength (nm) tio nr ble Varia Regression vector is the same for both scenarios, and all values of r except 1. Percent Variance Captured by Regression Model -----X-Block----- -----Y-Block----Comp This Total This Total ---- ------------- ------------1 91.17 91.17 8.36 8.36 2 7.40 98.57 7.19 15.55 3 0.93 99.50 32.81 48.36 4 0.46 99.96 26.18 74.54 5 0.02 99.98 24.90 99.44 14 150 ï100 Co rre la 2.4 Analyte Concentration 20 16 nr ) h (nm gt elen v a W 1.8 Note: this is what a correlation of r = 0.5 looks like! 0.005 tio 1.7 22 18 First O-PLS Loading varies strongly with analyte correlations! Regression Coefficient Questions Co rre la Example Set 3 0 First O-PLS Component Loading 0 Example Set 1 0.005 Example Set 2 0.01 First O-PLS Component Loading 0.015 Conclusions 0 900 1000 1100 1200 1300 Wavelength (nm) 1400 1500 10 10 1600 12 14 16 18 20 22 Y Measured 1 24 26 28 30 Compare to Selectivity Ratio Regular and O-PLS Filtered Regression Vectors Variables/Loadings Plot for spec1 Variables/Loadings Plot for spec1 25 1.5 Kvalheim’s Selectivity Ratio not a strong function of correlation in analyte concentrations, correctly picks variables that are most selective for analyte of interest 20 15 1 Regular PLS Reg Vector for Y 1 Reg Vector for Y 1 5 0 ï5 ï10 0 ï0.5 ï1 ï15 ï1.5 ï20 ï25 800 O-PLS Filtered 0.5 10 900 1000 1100 1200 Variable 1300 1400 1500 1600 ï2 800 Pure Component Spectrum 900 1000 1100 1200 Variable 1300 • O-PLS does simplify regression vectors. It is CLOSER to underlying bilinear response … • … HOWEVER, result generally NOT the same as a first principles model. Selectivity Ratio (normalized) 800 • O-PLS results are a strong function of the correlation in the concentrations Co rre la 1400 1500 1600 tio nr ble Varia • O-PLS recovered component is more sensitive to chance correlation than regression vector (Problem seen even with 2000 samples!) References [1] J. Trygg and S. Wold, “Orthogonal Projections to Latent Structures (O-PLS),” J. Chemo, 16, 119-128, 2002. [2] R. Ergon, “PLS post-processing by similarity transformation (PLS+ST): a simple alternative to OPLS,” J. Chemo, 19, 1-4, 2005. [3] E.K. Kemsley and H.S. Tapp, “OPLS filtered data can be obtained directly from non-orthogonalized PLS1,” J. Chemo, 23, 263-264, 2009
© Copyright 2024