Analytica Chimica Acta 363 (1998) 261–278

Metal complexation model identification and the detection and elimination of erroneous points using evolving least-squares fitting of voltammetric data

Božidar S. Grabarić 1,*, Zorana Grabarić 1, José Manuel Díaz-Cruz, Miquel Esteban, Enric Casassas

Department of Analytical Chemistry, University of Barcelona, Av. Diagonal 647, E-08028 Barcelona, Spain

Received 12 November 1997; received in revised form 27 January 1998; accepted 2 February 1998

Abstract

Experimental errors and their propagation are very often neglected when voltammetric data are used for chemical model identification, consecutive stability-constant determination and speciation studies. The influence of experimental error has been analyzed for two very frequently used mathematical models, viz. the Leden–DeFord–Hume and the van Leeuwen models. Using simulated noise-free and noise-corrupted data, it was demonstrated that relatively small and commonly occurring experimental errors in half-wave or peak potential can blur the initially assumed chemical model or lead to a set of incorrect stability constants. In order to minimize the influence of error on model identification and parameter estimation, an evolving least-squares fitting (ELSQF) procedure is proposed which makes use of a progressively increasing data window size (in forward and backward directions). At the same time, a procedure for the detection and elimination of erroneous points is introduced, enabling more reliable estimation of the parameters that describe metal-ion–ligand complexation systems. To investigate the influence of errors in the Leden–DeFord–Hume model, the proposed procedure was tested on experimental data for consecutively formed Pb(II) 2-hydroxypropanoates, obtained by differential pulse polarography (DPP) in aqueous solution at constant ionic strength, I = 2 M (NaClO4), pH 5.7 and constant temperature, t = (23 ± 1) °C.
Data obtained by differential pulse anodic stripping voltammetry (DPASV) for the interaction of Zn(II) with a macromolecular ligand, the anion of polymethacrylic acid (PMA), at constant degree of neutralization, αd = 0.8, and at two different concentrations of supporting electrolyte, c(KNO3) = 0.04 and 0.10 M, are used to demonstrate the influence of error for the model proposed by van Leeuwen's group. In both cases, the ELSQF approach gave clear and unambiguous complexation model identification and reliable parameter evaluation, with or without detection and elimination of erroneous points. © 1998 Elsevier Science B.V.

Keywords: Anodic stripping differential pulse voltammetry; Metal-ion–ligand complexation model identification; Consecutive complexes; Differential-pulse polarography; Evolving least-squares fitting; Lead(II) 2-hydroxypropanoates; Metal-ion speciation; Stability-constant determination; Zinc(II) polymethacrylates

*Corresponding author. Tel.: +34 3 402 1286; fax: +34 3 402 1233; e-mail: [email protected]
1 On leave from the Faculty of Chemical Engineering and Technology, University of Zagreb, Zagreb, Croatia.

0003-2670/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0003-2670(98)00143-3

B.S. Grabarić et al. / Analytica Chimica Acta 363 (1998) 261–278

1. Introduction

Voltammetric techniques and methods are very often used for studying metal-ion complexation and speciation with different ligands in the natural aquatic environment [1–3], life sciences [4], and the pharmaceutical and all other branches of the chemical industry [5]. With these techniques, chemical-model identification and stability-constant determination are usually performed while neglecting the fact that errors in the experimental measurement, as well as those introduced during data evaluation, can in most circumstances strongly influence the model identification and its validation.
Recently, it was demonstrated [6], using the Leden–DeFord–Hume model [7,8] and both simulated and experimental data, that relatively small overall errors in the voltammetric measurement and evaluation of half-wave or peak potentials (<1 mV) and of limiting or peak currents (<2%) can lead to wrong model identification and to inaccurate evaluation of consecutive stability constants if several constraints based on physicochemical reasoning and error propagation are not taken into account. Although computerised instrumentation minimises possible human error, there are still many sources of experimental error that cannot be compensated by the computer. In addition, during the evaluation of the characteristic parameters of voltammograms (limiting or peak current, half-wave or peak potential and reversibility parameters), an error may be introduced merely through the use of approximate methods for background current correction, not to mention that many approximate graphical parameter estimation methods are still in use. Such overall errors can contribute to inappropriate model identification, generate unreliable parameters for the chemical system investigated and, in some circumstances, lead to physicochemically meaningless parameters or a complete blurring of the chemical model.

The most frequent signs of a strong influence of errors on model identification and stability-constant determination using conventional LSQ methods are:

1. a high standard error of the evaluated parameter, which by statistical tests usually suggests that the parameter does not differ statistically from zero;
2. physicochemically meaningless values of the evaluated parameters (e.g. negative values of a stability constant); and
3. divergence of the iterative optimisation procedure.
In a previous paper [6], the concept of forward evolving least-squares fitting (ELSQF) of a polynomial to experimental data was introduced using the Leden–DeFord–Hume model. It clearly showed that the failure of conventional polynomial fitting to experimental data is, in many circumstances, simply due to errors in the measurement and evaluation of the characteristic parameters of voltammograms (half-wave or peak potential and limiting or peak current). Therefore, without a rigorous error-propagation analysis of the experimental data and of the implications of the error level for the numerical and statistical procedures used, ambiguous results can be obtained. This so-called hard modelling ambiguity can be resolved by: (i) using additional criteria or constraints, because numerical and statistical methods cannot characterise the investigated chemical system beyond the level of the overall experimental error; and/or (ii) performing many replicate measurements in order to decrease the statistical error.

In the present paper, both forward and backward ELSQF are applied as a general tool using two different mathematical models, and a procedure for the detection and elimination of erroneous points is proposed. With the proposed overall procedure, more reliable chemical system identification and parameter evaluation are obtained. Simulated data and errors were treated together with experimental voltammetric data obtained for the interaction between Pb(II) ions and 2-hydroxypropanoates, and for that between Zn(II) ions and polymethacrylates.

2. Theoretical part

The concept of parameter optimisation using least-squares fitting (LSQF) of a mathematical function to experimental data is well known [9]. Its goal is to find those parameters that minimise a selected function with respect to the experimental data set. This is usually done by taking i (= 1, 2, . .
., n) observations, y_obs,i, and defining a model which relates each observation y_obs,i to a calculated value y_calc,i:

$$y_{\mathrm{obs},i} = y_{\mathrm{calc},i} + e_i, \qquad i = 1, 2, \ldots, n \tag{1}$$

where e_i is the experimental error. The calculated value y_calc,i depends on the g parameters, m_g, and on the independent variable x_i:

$$y_{\mathrm{calc},i} = f(m_g, x_i) \tag{2}$$

The assumption in this model is that the y_obs,i values contain experimental error, while the x_i values are error-free. Another important assumption is that the errors are all equal and independent of each other. Neither of these assumptions holds perfectly in most experiments; moreover, electrochemical measurements are especially sensitive to the influence of errors because of their complexity. By the method of least squares, a sum of squared residuals,

$$r_i^2 = \left(y_{\mathrm{obs},i} - y_{\mathrm{calc},i}\right)^2 \tag{3}$$

is minimised:

$$S = \sum_{i=1}^{n} r_i^2 = \min \tag{4}$$

A more general function to be minimised is one which accounts for different errors, assuming that the errors for each pair of observations are connected through a covariance term:

$$S = \sum_{i=1}^{n} \sum_{k=1}^{n} r_i\, W_{i,k}\, r_k = \min \tag{5}$$

where W_i,k are statistical weights, the inverses of the variances and covariances M_i,k.

In the conventional LSQ approach, all the data pairs within one (or more) experiment(s) are taken and the selected function is minimised. The correct choice of this function is very important for obtaining reliable and physicochemically meaningful results. When there are several possible chemical models (e.g. consecutive complex formation), the correct model is identified using statistical criteria and physicochemical reasoning. The statistical criteria used in this paper are [10]:

(i) Hamilton R-factor (HRF):

$$\mathrm{HRF} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_{\mathrm{obs},i} - y_{\mathrm{calc},i}\right)^2}{\sum_{i=1}^{n} y_{\mathrm{obs},i}^2}} \tag{6}$$

(ii) Akaike information criterion (AIC):

$$\mathrm{AIC} = n \ln\left\{\frac{\sum_{i=1}^{n} \left(y_{\mathrm{obs},i} - y_{\mathrm{calc},i}\right)^2}{n - m}\right\} + 2m \tag{7}$$

(iii) Mean quadratic error of prediction (MQEP), which is a measure of the predictive ability of the proposed model and is defined as:

$$\mathrm{MQEP} = \frac{1}{n} \sum_{i=1}^{n} \left(y_{\mathrm{obs},i} - y_{\mathrm{calc},i}\right)^2 \tag{8}$$

The ELSQF approach proposed in Ref. [6] and generalised in this paper, i.e. forward and backward evolving least squares, consists of analysing experimental data pairs by least squares, starting with a data window containing the statistically required minimal number of data points, w_min. The same procedure is then repeated using a progressively increasing window size, up to the maximal or total number of experimental data points, w_max. In this way, each of the 'best' (in the least-squares sense) parameters, m_h, is obtained as a function of window size or of the independent variable, i.e. a set of 'best' parameters, m_g,h, is obtained (where g denotes the gth parameter and h denotes the progressively increasing window size in the forward or backward direction, h = w_min to w_max).

The main advantages of the ELSQF approach are:

(i) any trend deviating from the expected constancy of the parameters, or an inadequate selection of the function to be minimised, can be easily visualised, which enables simple and reliable complexation model identification;
(ii) in most cases, erroneous points can be detected and eliminated using statistical criteria; and
(iii) any gth parameter is obtained as the mean of the m_h values (with or without elimination of erroneous points), and is therefore more reliable than a value obtained from only one experimental data window.

In the ideal case, the parameters m_g,h obtained by the conventional and by the evolving LSQ approaches would be practically the same. However, by simulating the voltammetric data and possible errors, it was demonstrated that the conventional approach (i) can fail in model identification, (ii) may not converge, or (iii) produces unreliable parameters when the errors are relatively high although experimentally acceptable.
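The three statistical criteria of Eqs. (6)–(8) are simple to evaluate once a model has been fitted. The following is a minimal Python sketch rather than the authors' code (the paper's calculations were done in MATLAB); the function names and the toy polynomial data are illustrative assumptions, and m is taken here to be the number of fitted parameters.

```python
import numpy as np


def hamilton_r_factor(y_obs, y_calc):
    """Eq. (6): sqrt(sum of squared residuals / sum of squared observations)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    return float(np.sqrt(np.sum((y_obs - y_calc) ** 2) / np.sum(y_obs ** 2)))


def akaike_information_criterion(y_obs, y_calc, m):
    """Eq. (7): n ln(SSR / (n - m)) + 2m, with m fitted parameters (assumed meaning)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    n = y_obs.size
    ssr = np.sum((y_obs - y_calc) ** 2)
    return float(n * np.log(ssr / (n - m)) + 2 * m)


def mean_quadratic_error(y_obs, y_calc):
    """Eq. (8): mean of the squared residuals."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    return float(np.mean((y_obs - y_calc) ** 2))


# Toy data (hypothetical): a quadratic with a small deterministic perturbation.
x = np.linspace(0.0, 1.0, 12)
y = 1.0 + 2.0 * x + 3.0 * x**2 + 0.01 * np.cos(7.0 * x)

results = {}
for degree in (1, 2):
    y_fit = np.polyval(np.polyfit(x, y, degree), x)
    results[degree] = (
        hamilton_r_factor(y, y_fit),
        akaike_information_criterion(y, y_fit, m=degree + 1),
        mean_quadratic_error(y, y_fit),
    )
    print(degree, results[degree])
```

All three criteria are lower for the quadratic, which is the model that generated the toy data. Note that Eq. (7) diverges towards minus infinity as the residual sum of squares approaches zero, so an essentially exact fit of error-free data yields a very large negative AIC.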
Of course, this depends on the model function used and on the values of the parameters to be evaluated. In the case of the Leden–DeFord–Hume model, an error of 1 mV in half-wave or peak potential and of 2% in current blurred the chemical model of consecutively formed weak labile complexes of lead(II) propanoates, while reliable results could be obtained [6] using forward evolving least-squares fitting.

The Leden–DeFord–Hume mathematical model, proposed for consecutive complex formation between a metal ion and small simple ligands (D_metal ion ≈ D_ligand), can be expressed as [7,8]:

$$F_{0,i} = \exp\left(\frac{zF}{RT}\,\Delta E_{\mathrm{p},i} - \ln \phi_i\right) = 1 + \sum_{j=1}^{n} \beta_j\,[\mathrm{L}]_i^{\,j} \tag{9}$$

$$F_{j,i} = \frac{F_{j-1,i} - \beta_{j-1}}{[\mathrm{L}]_i} \tag{10}$$

where j represents the number of species in the system and β0 = 1, so that F1,i = (F0,i − 1)/[L]i. So far this model has been investigated without analysis in the backward ELSQ direction and without a procedure for the detection and elimination of erroneous points. In Eqs. (9) and (10), βj = [MLj]/([M][L]^j) are the cumulative stability constants, ΔEp,i = Ep,free − Ep,i,complexed and φi = Ip,i,complexed/Ip,free. Square brackets around the symbols of species denote their equilibrium concentrations.

The second model investigated is the one proposed by van Leeuwen [11–14]. It is more general because it holds for metal-ion complexation with a macromolecular ligand (D_metal ion ≫ D_ligand). This model can be expressed as:

$$\phi_i = \left(\frac{1 + \sum_{j=1}^{m} \left(D_{\mathrm{ML}_j}/D_{\mathrm{M}}\right) \beta_j\,[\mathrm{L}]_i^{\,j}}{1 + \sum_{j=1}^{m} \beta_j\,[\mathrm{L}]_i^{\,j}}\right)^{p} \tag{11}$$

and, for the case that D_metal ion ≈ D_ligand, it can be shown that this model is equivalent to the DeFord–Hume model. For 1:1 metal-ion-to-ligand stoichiometry, Eq. (11) is usually written as:

$$\phi_i = \left(\frac{1 + \varepsilon \beta_1 [\mathrm{L}]_i}{1 + \beta_1 [\mathrm{L}]_i}\right)^{p} \tag{12}$$

where ε is the ratio of the diffusion coefficients of the complexed and free metal ions, ε = D_ML/D_M, and β1 = [ML]/([M][L]). The empirical parameter p depends on the hydrodynamic conditions in the pre-electrolysis step of stripping techniques; it is 1/2 for semi-infinite linear diffusion and 2/3 for laminar convective diffusion.
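Eqs. (9), (10) and (12) translate directly into a few array operations. The sketch below is illustrative only: the function names, the ligand concentrations and the default values z = 2 and T = 296.15 K are assumptions made for the example, and the potential shift is taken in volts.

```python
import numpy as np

F_CONST = 96485.332  # Faraday constant, C/mol
R_GAS = 8.314462     # gas constant, J/(mol K)


def leden_f0_from_betas(ligand, betas):
    """Right-hand side of Eq. (9): 1 + sum_j beta_j [L]^j."""
    ligand = np.asarray(ligand, dtype=float)
    powers = np.arange(1, len(betas) + 1)
    return 1.0 + (np.asarray(betas, dtype=float) * ligand[:, None] ** powers).sum(axis=1)


def leden_f0_from_signals(delta_ep, phi, z=2, temp_k=296.15):
    """Left-hand side of Eq. (9) from the potential shift (V) and the current ratio."""
    delta_ep = np.asarray(delta_ep, dtype=float)
    return np.exp(z * F_CONST * delta_ep / (R_GAS * temp_k) - np.log(phi))


def next_leden_function(f_prev, beta_prev, ligand):
    """Eq. (10): F_j = (F_{j-1} - beta_{j-1}) / [L], with beta_0 = 1."""
    return (np.asarray(f_prev, dtype=float) - beta_prev) / np.asarray(ligand, dtype=float)


def van_leeuwen_phi(ligand, beta1, eps, p=0.5):
    """Eq. (12): current ratio for a 1:1 complex with diffusion-coefficient ratio eps."""
    ligand = np.asarray(ligand, dtype=float)
    return ((1.0 + eps * beta1 * ligand) / (1.0 + beta1 * ligand)) ** p


# Hypothetical ligand concentrations (mol/l) and the simulation constants of Section 4.1.
ligand = np.array([0.05, 0.10, 0.20, 0.40, 0.80])
betas = (130.0, 1300.0, 3600.0)

f0 = leden_f0_from_betas(ligand, betas)
f1 = next_leden_function(f0, 1.0, ligand)  # one degree lower than F0
print(f0)
print(f1)
```

Here `f1` equals β1 + β2[L] + β3[L]², i.e. the intercept of F1 is β1 itself, which is why fitting the transformed function can be attractive.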
Both models assume the following:

(i) the electron transfer reaction between the oxidised and reduced metal ion is sufficiently fast to make the system reversible on the time scale of the technique;
(ii) the complex ML formed is not electroactive;
(iii) the ligand reacts only with the metal ion;
(iv) the ligand concentration is in large excess over the metal-ion concentration, and the ligand equilibrium concentration is approximately equal to the bulk ligand concentration, i.e. [L] ≈ cL;
(v) if the ligand is a macromolecule with many coordinating groups, ligand L represents only one site in the macromolecule; and
(vi) adsorption of electroactive species on the electrode surface is absent.

3. Experimental part

3.1. Chemicals

All chemicals were of analytical reagent grade and were used without additional purification. Ultrapure water used for solution preparation was obtained from a Culligan (Spain) water purification system. Lead and zinc salts were Titrisol (Merck). Sodium 2-hydroxypropanoate, sodium perchlorate and potassium nitrate were of p.a. grade (Merck). Polymethacrylic acid (PMA) solution was reagent grade (BDH) (average molar mass 26 000 g/mol) and was used to prepare a stock solution of 0.1 mol/l in water (in monomers). The total number of carboxylic groups was determined by conductometric acid–base titration.

3.2. Instrumentation

Differential pulse (DP) polarograms were recorded using a drop time of 0.8 s, a pulse duration of 50 ms, a pulse amplitude of −50 mV and a scan rate of 4 mV/s, with an Autolab System (Eco Chemie, The Netherlands) attached to a Metrohm 663 VA stand. A mercury electrode in the static mercury-drop mode was used as the working electrode; Ag|AgCl|(3 M KCl), together with an electrolytic bridge containing 2 M NaClO4, was used as the reference electrode, and glassy carbon was used as the auxiliary electrode.
All solutions were initially deaerated with nitrogen for 15 min, and for 5 min after each addition of ligand solution. Measurements were performed at constant ionic strength, I = 2 M (NaClO4), constant pH = 5.7 and constant temperature, t = (23 ± 1) °C. DP polarograms were corrected for background current [15], and peak potentials and peak currents were evaluated using parabolic fitting of the points around the peak [6]. The peak potentials were determined within the same DP polarogram to within 0.3 mV, but the overall reproducibility was not better than 0.8 mV. The peak current was determined within the same polarogram to within 0.7%, but the overall reproducibility was 2% at the 60 mM concentration level of Pb(II) ions. All DP polarograms showed reversible behaviour, with a peak half-width of (48 ± 1) mV.

In the voltammetric titrations of Zn(II) with PMA, differential pulse anodic stripping voltammograms (DPASV) were obtained with a 646 VA Processor (Metrohm) and a 663 VA Stand (Metrohm) attached to a Dosimat 665 (Metrohm) for the automatic addition of the titrant solutions. A pulse duration of 40 ms, a pulse height of 50 mV and a deposition potential of −1200 mV were used. The pre-electrolysis time and the rest period were 1 and 0.5 min, respectively. The scan rate in the stripping step was 10 mV/s. Measurements were taken at (25 ± 1) °C after deaeration with purified nitrogen.

Acid–base conductometric titrations of PMA stock solutions were performed in an Orion cell (k = 1.03 cm⁻¹) with an Orion 120 microprocessor conductivity meter. More details on solution preparation and experimental procedure can be found in Refs. [6,16].

All calculations were performed using the ELCHEM toolbox for MATLAB, written in this laboratory for electrochemical signal simulation, visualisation and manipulation [17,18].

4. Results and discussion

4.1.
Leden–DeFord–Hume model

Error propagation and its influence on chemical model identification using the Leden–DeFord–Hume model and the conventional polynomial fitting procedure were reported in a previous paper [6]. The same paper demonstrated the advantage of forward evolving polynomial least-squares fitting (EPLSQF) on simulated and experimental data for consecutively formed Pb(II) propanoate complexes. In this paper, backward as well as forward EPLSQF is investigated, and a procedure for the detection and elimination of erroneous points is introduced. Lead(II) 2-hydroxypropanoate was chosen as the chemical model system, and the values of the consecutive stability constants selected for simulation were β1 = 130 M⁻¹, β2 = 1300 M⁻² and β3 = 3600 M⁻³, as they are close to those obtained experimentally for the investigated system. The theoretical error-free F0,i function was calculated according to Eq. (9). Forward and backward EPLSQF to this error-free F0,i function were performed assuming different chemical models (i.e. number of complex species, j = 1, 2, ..., m).

Fig. 1. Values of β1 obtained by EPLSQF of the error-free F0 Leden function simulated using β1 = 130 l/mol, β2 = 1300 (l/mol)² and β3 = 3600 (l/mol)³, assuming only one complex species. Results marked with 'f' represent the forward and those marked with 'b' the backward evolving direction.

Fig. 2. Values of β1 and β2 obtained by EPLSQF of the error-free F0 Leden function simulated using β1 = 130 l/mol, β2 = 1300 (l/mol)² and β3 = 3600 (l/mol)³, assuming two complex species. Results marked with 'f' represent the forward and those marked with 'b' the backward evolving direction.

Fig. 3. Values of β1, β2 and β3 obtained by EPLSQF of the error-free F0 Leden function simulated using β1 = 130 l/mol, β2 = 1300 (l/mol)² and β3 = 3600 (l/mol)³, assuming three complex species. Results marked with 'f' represent the forward and those marked with 'b' the backward evolving direction.

The results obtained can be classified in the following three patterns:

1. If the number of species defining the chemical model is underestimated, the plot of βj vs. [L] does not give constant values for progressively increasing data window sizes. Fig. 1 shows the βj values with their error bars obtained by forward ('f') and backward ('b') EPLSQF assuming a chemical model with only one complex species (j = 1), while Fig. 2 shows the results obtained when the assumed chemical model has two consecutive complexes (j = 1 and 2). From the results shown in Figs. 1 and 2, both chemical models can be easily discarded because the values of the stability constants are not constant at all. The standard errors obtained using error-free simulated data are a consequence of selecting and fitting the wrong chemical model. The conventional PLSQF uses only one data window with all the points in it. This approach gives only one point for βj (the last point, at the highest ligand concentration, in Figs. 1 and 2), and the only way to identify the model is to use the statistical criteria and physicochemical reasoning. In contrast, with EPLSQF the increase in the values of the stability constants obtained with different ligand concentration windows can easily disqualify the wrong chemical model. It is interesting to note that extrapolation of the βj values of the forward curves to zero ligand concentration tends towards the 'true' value selected to simulate the Leden polynomials.
2. If the model is correctly identified, constant values for all stability constants are obtained independently of the window size in both the forward and backward EPLSQF directions. Error bars are not visible on the scale of the graph shown in Fig. 3, where the error-free simulated data were evaluated. Of course, with experimental error present this constancy will not be so perfect, but the EPLSQF approach will easily identify the correct chemical model, even in the presence of errors.
Moreover, this approach offers another advantage in the visualisation, detection and elimination of erroneous points. The procedure for the latter will be explained in Section 4.2.
3. If the number of species is overestimated, the values of βj vs. [L] are still constant and correct, as in Fig. 3; only the βj values of the 'nonexistent' higher complex(es) are either very small (close to zero in the case of error-free data) or negative. With simulated error-corrupted and experimental data, higher βj values are discriminated by the lower number of points that fall within the selected interval.

To demonstrate the effect of errors in the measurement and evaluation of peak potentials and peak currents when the Leden–DeFord–Hume model is used, random noise with the same initial pattern, a mean of 0 (<10⁻³ mV) and standard deviations of 0.3, 0.6 and 1.8 mV, respectively, is added to the error-free ΔEp,i. Random noise with a mean of 0 and a standard deviation of 1% of the peak current is added to the φi function as well. The stability constants obtained with noise-free and noise-corrupted F0 functions are shown in Table 1. The results in Table 1 show that the correct chemical model cannot be identified unambiguously by any of the three statistical criteria used (minimum of HRF, AIC and MQEP), and that no acceptable set of stability constants can be obtained at any error level.

One way of overcoming the influence of the error on fitting the polynomial to the experimental data is to calculate the higher-order Leden functions according to Eq. (10), each of which is a polynomial one degree lower than the preceding one, and to perform the conventional PLSQF. The results given in Table 2 show that acceptable sets (bold) of stability constants can be obtained just by fitting the F1 polynomial, which is a simple transformation of the F0 polynomial according to Eq. (10).
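The patterns described for error-free data can be reproduced with a short sketch of forward and backward evolving polynomial fitting. This is an illustration under stated assumptions, not the authors' ELCHEM implementation: the ligand grid is hypothetical and assumed sorted in ascending order, the minimal window is taken as m + 1 points, and the intercept of F0 is fixed at 1 (Eq. (9)) by fitting F0 − 1 without a constant term.

```python
import numpy as np


def evolving_polynomial_fit(ligand, f0, m, direction="forward", w_min=None):
    """Fit F0 - 1 = sum_{j=1..m} beta_j [L]^j on windows of growing size.

    Forward windows start at the lowest ligand concentration, backward
    windows at the highest. Returns one row of beta estimates per window.
    """
    ligand = np.asarray(ligand, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    if direction == "backward":
        ligand, f0 = ligand[::-1], f0[::-1]
    elif direction != "forward":
        raise ValueError("direction must be 'forward' or 'backward'")
    if w_min is None:
        w_min = m + 1  # assumption: one degree of freedom in the first window
    powers = np.arange(1, m + 1)
    estimates = []
    for w in range(w_min, ligand.size + 1):
        design = ligand[:w, None] ** powers  # columns [L], [L]^2, ..., [L]^m
        beta, *_ = np.linalg.lstsq(design, f0[:w] - 1.0, rcond=None)
        estimates.append(beta)
    return np.array(estimates)


# Error-free F0 simulated with the constants used in the text (hypothetical grid).
ligand = np.linspace(0.02, 0.5, 24)
f0 = 1.0 + 130.0 * ligand + 1300.0 * ligand**2 + 3600.0 * ligand**3

one_f = evolving_polynomial_fit(ligand, f0, m=1, direction="forward")
one_b = evolving_polynomial_fit(ligand, f0, m=1, direction="backward")
three = evolving_polynomial_fit(ligand, f0, m=3, direction="forward")

print(one_f[[0, -1], 0])  # apparent beta_1 drifts when the model is too small
print(three[[0, -1]])     # all three constants stay constant across windows
```

With only one assumed species, the apparent β1 rises steadily with window size in the forward direction and falls in the backward direction, whereas the three-species fit returns the simulated constants for every window.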
When 3.11 3.11 0.00 0.00 3.56 3.56 0.00 i<0 i<0 log 4SE f b Maximal number of complex species. Absolute difference between initially assumed and obtained value of stability constants in log units; c Hamilton R-factor. d Akaike information criterion. e Mean quadratic error of prediction. f SEstandard error. a F0 function (errors in Ep1.8 mV and in Ip1%) 1130 Mÿ1; 21300 Mÿ2; 33600 Mÿ3 1 3.030.03 0.92 0.47 2 i<0 3.970.11 0.41 3 2.710.18 0.60 i<0 3.990.18 0.88 i<0 4.770.15 4 i<0 5 1.771.12 0.34 2.811.35 0.03 4.420.65 0.86 i<0 F0 function (errors in Ep0.6 mV and in Ip1%) 1130 Mÿ1; 21300 Mÿ2; 33600 Mÿ3 1 3.020.03 0.91 3.550.02 0.44 2 i<0 3 2.440.13 0.33 2.290.62 0.82 3.760.07 0.20 3.660.15 0.55 i<0 4.450.15 4 i<0 5 1.930.70 0.18 3.140.70 0.03 3.990.66 0.43 i<0 F0 function (errors in Ep0.3 mV and in Ip1%) 1130 Mÿ1; 21300 Mÿ2; 33600 Mÿ3 1 3.010.03 0.90 3.540.01 0.43 2 i<0 3 2.340.10 0.23 2.800.20 0.31 3.690.05 0.13 3.510.13 0.40 i<0 4.130.15 4 i<0 5 2.000.49 0.11 3.140.53 0.03 3.850.61 0.29 i<0 0.00 0.00 2.11 || b 5 log 3SE f 0.00 || b 2.11 log 2SE f 4 || b 0.00 log 1SE f F0 function (error-free) 1130 Mÿ1; 21300 Mÿ2; 33600 Mÿ3 1 3.010.03 0.90 3.530.01 0.52 2 i<0 3 2.11 0.00 3.11 0.00 3.56 ma 4.77 4.45 4.13 || b 5.260.38 4.800.42 4.560.42 ÿ6.021.30 log 5SE f 5.26 4.80 4.56 6.02 || b AIC d 26.3 5.9 4.1 3.74 3.72 26.3 5.9 4.1 3.74 3.72 25.7 4.4 2.5 2.28 2.26 250 154 134 131 133 250 154 134 131 133 248 134 100 97 100 710ÿ10 ÿ1347 310ÿ10 ÿ1412 25.0 245 2.7 101 910ÿ12 ÿ1637 HRF c 1622 81 40 33 32 1622 81 40 33 32 1526 45 14 12.0 11.8 1410 16 2 10ÿ22 2 10ÿ19 1 10ÿ18 MQEP e Table 1 Log j values and corresponding standard errors obtained by conventional PLSQF of simulated error-free and error-corrupted F0 function for different chemical models (number of consecutively formed complexes j1,2,. . .,m; stability constants without standard deviation are obtained from simulated error-free data; standard error<10ÿ10 log units) 268 B.S. Grabaric et al. 
/ Analytica Chimica Acta 363 (1998) 261±278 3.11 3.11 0.00 0.00 3.56 3.56 i<0 3.550.24 i<0 0.03 0.88 0.60 3.770.24 i<0 33600 Mÿ3 0.01 0.33 0.45 33600 Mÿ3 0.00 i<0 log 4SE f b Maximal number of complex species. Absolute difference between initially assumed and obtained value of stability constants in log units. c Hamilton R-factor. d Akaike information criterion. e Mean quadratic error of prediction. f Standard error. a F1 function (errors in Ep1.8 mV and in Ip1%) 1130 Mÿ1; 21300 Mÿ2; 33600 Mÿ3 2 1.700.17 0.41 3.440.02 0.33 3 2.110.09 0.00 3.090.13 0.02 3.630.10 0.07 4.170.24 4 1.950.16 0.16 3.400.15 0.39 i<0 5 2.131.18 0.02 2.810.83 0.30 3.990.85 0.43 i<0 F1 function (errors in Ep0.6 mV and in Ip1%) 1130 Mÿ1; 21300 Mÿ2; 2 1.760.10 0.35 3.420.01 0.31 3 2.110.04 0.00 3.100.05 0.01 3.590.05 4 2.060.06 0.05 3.250.09 0.14 2.680.80 5 2.140.06 0.03 2.750.37 0.36 4.160.19 F1 function (errors in Ep0.3 mV and in Ip1%) 1130 Mÿ1; 21300 Mÿ2; 2 1.780.09 0.33 3.420.01 0.31 3 2.110.02 0.00 3.110.03 0.00 3.570.03 4 2.080.03 0.03 3.200.06 0.09 3.230.27 5 2.130.04 0.02 2.930.19 0.18 4.010.17 0.00 0.00 2.11 || b 5 log 3SE f 0.00 || b 2.11 log 2SE f 4 || b 0.00 log 1SE f F1 function (error-free) 1130 Mÿ1; 21300 Mÿ2; 33600 Mÿ3 2 1.800.07 0.31 3.410.01 0.30 3 2.11 0.00 3.11 0.00 3.56 ma 4.17 3.77 3.55 || b 4.331.28 4.860.19 4.640.19 01.510ÿ6 log 5SE 4.33 4.86 4.64 0 || b 15.3 12.6 12.2 11.5 9.6 5.0 4.9 4.6 8.5 3.1 3.0 2.8 297 288 289 291 264 225 226 226 257 192 193 192 ÿ1359 ÿ1575 710ÿ12 210ÿ10 249 ÿ1774 AIC d 7.7 410ÿ13 HRF c 6836 4612 4340 3840 2515 694 651 582 1979 253 237 212 1586 3 10ÿ24 1 10ÿ21 8 10ÿ19 MQEP e Table 2 Log j values and corresponding standard errors obtained by conventional PLSQF of simulated error-free and error-corrupted F1 function for different chemical models (number of consecutively formed complexes j2,3,. . .,m; stability constants without standard deviation are obtained from simulated error-free data (standard error <10ÿ10 log units)) B.S. Grabaric et al. 
/ Analytica Chimica Acta 363 (1998) 261±278 269 270 B.S. Grabaric et al. / Analytica Chimica Acta 363 (1998) 261±278 Fig. 4. Values of 1 with corresponding error bars obtained by forward EPLSQF of error-corrupted (error equivalent to 0.3 mV in peak potential and 1% in peak current) F0 Leden function simulated using 1130 l/mol, 21300 (l/mol)2 and 33600 (l/mol)3 assuming three complex species. Result marked with `o' represents the value obtained by conventional PLSQF. using statistical criteria (HRF, AIC and MQEP), it seems that the most reliable and robust criterion for chemical-model identi®cation is the Akaike information criterion, although neither of the used criteria gave an unambiguous model identi®cation, when error-corrupted data were used. The other approach to minimise the error in¯uence is the use of EPLSQF. Fig. 4 shows clearly why EPLSQF is preferred over the conventional LSQF. In this ®gure, the 1 values obtained by forward EPLSQF and error level in peak potential of 0.3 mV are shown together with corresponding standard errors. The conventional PLSQF of F0 polynomial on experimental data will give the result shown as the point at the highest ligand concentration (point marked `o'), which even for the smallest error level in Ep (0.3 mV) does not give an acceptable set of stability constants (see Table 1). Moreover, any data window size can be chosen in the experimental design, which means that, using conventional PLSQF, different values of j shown in Fig. 4 might be obtained, and some of these ¯uctuations are unacceptably large. Therefore, it is obvious that the mean value of stability constant obtained using EPLSQF will be more reliable than the single value obtained by conventional PLSQF. 
The use of EPLSQF in the backward direction is not recommended when using the Leden–DeFord–Hume model for stability-constant determination, unless a weighted LSQ procedure is applied, because the polynomial values in the initial window (the highest concentration points) prevail and bias the results, as can be clearly seen in Fig. 5. The results obtained by EPLSQF in the backward direction ('b') are incorrect (too far from the initial simulated value) and their standard errors are unacceptably high. For some other mathematical models, however, backward EPLSQF analysis can be more useful than the forward one, as will be shown later for the van Leeuwen model.

Fig. 5. Values of β1 with corresponding error bars obtained by forward ('f') and backward ('b') EPLSQF of the error-corrupted (error equivalent to 0.3 mV in peak potential and 1% in peak current) F0 function simulated using β1 = 130 l/mol, β2 = 1300 (l/mol)² and β3 = 3600 (l/mol)³, assuming three complex species.

Another advantage of the EPLSQF procedure is that, together with simple model identification, it enables the visualisation, detection and elimination of erroneous points. The procedure is as follows:

(i) Eliminate negative βj values, because they have no physicochemical meaning.
(ii) Convert all positive βj values to the log βj domain and set the criterion for the elimination of erroneous points to 0.3 log units. This value is selected because it can be easily demonstrated, using the t-test, that stability constants having this standard error (or standard deviation) do not differ significantly from zero at the 95% confidence level (it corresponds to an error of 100% in non-logarithmic units). The elimination limit of 0.3 log units is very conservative; in practice the limit can be even lower, depending on the degrees of freedom.
(iii) Eliminate the points outside this limit and calculate the new mean and standard deviation.
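Steps (i)–(iii) can be sketched in a few lines. This is a hedged illustration, not the authors' routine: the function name, the example values and the choice of taking the reference mean from the upper half of the ligand concentration range (`reference_fraction=0.5`) are assumptions made for the example.

```python
import numpy as np


def eliminate_erroneous_points(betas, window_ligand_max, limit=0.3,
                               reference_fraction=0.5):
    """Apply steps (i)-(iii) to the beta values from evolving windows.

    betas: one estimate of a stability constant per data window.
    window_ligand_max: highest ligand concentration in each window.
    Returns (mean log beta, standard deviation, number of eliminated points).
    """
    betas = np.asarray(betas, dtype=float)
    conc = np.asarray(window_ligand_max, dtype=float)

    positive = betas > 0.0                # step (i): drop negative values
    log_beta = np.log10(betas[positive])  # step (ii): work in log units
    conc = conc[positive]

    # Reference mean from the windows reaching the highest ligand
    # concentrations (assumed fraction), where the scatter is smallest.
    n_ref = max(2, int(np.ceil(reference_fraction * conc.size)))
    reference = log_beta[np.argsort(conc)[-n_ref:]].mean()

    keep = np.abs(log_beta - reference) <= limit  # step (iii)
    kept = log_beta[keep]
    return kept.mean(), kept.std(ddof=1), betas.size - kept.size


# Hypothetical beta_1 estimates: one negative value and one low-concentration outlier.
beta1_values = [-50.0, 900.0, 120.0, 135.0, 128.0, 131.0, 126.0, 133.0]
ligand_max = np.linspace(0.1, 0.8, 8)

mean_log, sd_log, n_out = eliminate_erroneous_points(beta1_values, ligand_max)
print(round(mean_log, 2), round(sd_log, 3), n_out)
```

In this made-up example two of the eight points are removed (the negative value and the outlier at low ligand concentration), and the remaining values average close to log β1 = 2.11.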
The dispersion of the points of the Leden polynomial and of their standard errors is higher in the lower ligand concentration range, partially because of the division by small concentration values. The reference mean βj values should therefore be calculated from a data window in the higher ligand concentration range, so that the strongly biased erroneous points in the lower ligand concentration range do not influence the calculation of the mean and the corresponding standard deviation.

Fig. 6 shows an example of the detection and elimination of erroneous points, together with the discrimination between models with higher complex species. Log β1 values obtained assuming m = 3 (*) and m = 4 () consecutively formed complexes in the system are presented. The simulated initial value is log β1 = 2.11 (– – –). The mean value of log β1 obtained with the model having m = 3 is 2.24 (solid line), and the elimination limits of 0.3 log units are marked with (– –). Assuming m = 3, the points are less scattered and only six points fall outside the elimination limit for an error of 0.3 mV in peak potential and of 1% in the measured peak current. Assuming m = 4, the number of points eliminated using the same criteria is almost twice as large (11 out of 24). The situation is even less favourable for the higher constants. Assuming m = 4, the values of β4 are unacceptable, because 13 points out of 24 have values smaller than 0 and the mean value obtained, β4 = 1.7 × 10⁵ ± 1.1 × 10⁶, does not differ statistically from 0. Therefore, with EPLSQF the chemical model identification is simple, while the elimination of several erroneous points gives a more reliable set of stability constants than that obtained using the conventional PLSQF (see Table 3).

To verify the advantages of the EPLSQF principle on experimental results, the complexation of Pb(II) with 2-hydroxypropanoate was investigated by DPP at I = 2 M (NaClO4).
Using conventional PLSQF of both the $F_0$ and $F_1$ polynomials on the experimental data, three complex species are identified in the system and acceptable sets of stability constants are obtained (Table 4, bold), with the most probable values of the stability constants given by EPLSQF of the $F_1$ function: log($\beta_1$/M$^{-1}$) = 2.26 ± 0.01, log($\beta_2$/M$^{-2}$) = 3.13 ± 0.08 and log($\beta_3$/M$^{-3}$) = 3.79 ± 0.10, in very good agreement with results reported earlier for the same system using different electroanalytical techniques [19,20].

Fig. 6. Values of log $\beta_1$ obtained by forward EPLSQF of the error-corrupted $F_0$ function (error equivalent to 0.3 mV in peak potential and 1% in peak current), simulated using $\beta_1 = 130$ l/mol, $\beta_2 = 1300$ (l/mol)$^2$ and $\beta_3 = 3600$ (l/mol)$^3$, assuming (*) three and (second marker) four complex species. (solid line) the obtained mean value of log $\beta_1 = 2.24$; (– – –) the initial 'true' value (log $\beta_1 = 2.11$); (– –) the elimination intervals of ±0.3 log units.

Table 3. Evolving PLSQF of simulated error-free and error-corrupted $F_0$ and $F_1$ functions. Log $\beta_j$ values and corresponding standard deviations are obtained as a mean over progressively increasing (in forward direction) window sizes, assuming a chemical system with three consecutively formed complexes ($m = 3$). Error in $I_p$ = 1%; simulated with $\beta_1 = 130$ M$^{-1}$, $\beta_2 = 1300$ M$^{-2}$ and $\beta_3 = 3600$ M$^{-3}$.

| Function | $m^a$ | error/mV | log $\beta_1$ ± SD$^d$ | $\lvert\Delta\rvert^b$ | e/t$^c$ | log $\beta_2$ ± SD$^d$ | $\lvert\Delta\rvert^b$ | e/t$^c$ | log $\beta_3$ ± SD$^d$ | $\lvert\Delta\rvert^b$ | e/t$^c$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $F_0$ | 3 | 0.3 | 2.24 ± 0.11 | 0.13 | 6/24 | 3.14 ± 0.14 | 0.03 | 10/24 | 3.64 ± 0.14 | 0.08 | 15/24 |
| $F_0$ | 3 | 0.6 | 2.29 ± 0.13 | 0.18 | 8/24 | 3.14 ± 0.09 | 0.03 | 14/24 | 3.68 ± 0.15 | 0.12 | 17/24 |
| $F_0$ | 3 | 1.8 | 2.43 ± 0.12 | 0.32 | 13/24 | 3.54 ± 0.15 | 0.02 | 17/24 | 4.00 ± 0.15 | 0.44 | 20/24 |
| $F_1$ | 3 | 0.3 | 2.10 ± 0.03 | 0.01 | 0/24 | 3.16 ± 0.10 | 0.05 | 1/24 | 3.61 ± 0.10 | 0.05 | 7/24 |
| $F_1$ | 3 | 0.6 | 2.08 ± 0.04 | 0.03 | 0/24 | 3.16 ± 0.11 | 0.05 | 2/24 | 3.58 ± 0.11 | 0.02 | 9/24 |
| $F_1$ | 3 | 1.8 | 2.07 ± 0.07 | 0.04 | 2/24 | 3.24 ± 0.14 | 0.13 | 8/24 | 3.51 ± 0.14 | 0.05 | 15/24 |

a Maximal number of complex species. b Absolute difference between initially assumed and obtained value of the stability constant in log units. c Eliminated/total number of stability constant values. d Standard deviation.

Table 4. Log $\beta_j$ values and corresponding standard errors obtained by conventional PLSQF of $F_0$ and $F_1$ functions, obtained using DPP for different chemical models (maximal number of consecutively formed complexes, $m$) in solutions of Pb(II) 2-hydroxypropanoates. $c(\mathrm{Pb^{2+}}) = 60$ mM; $I = 2$ M (NaClO4); pH = 5.7 and $t$ = (23 ± 1) °C.

| Function | $m^a$ | log $\beta_1$ ± SE$^e$ | log $\beta_2$ ± SE$^e$ | log $\beta_3$ ± SE$^e$ | log $\beta_4$ ± SE$^e$ | HRF$^b$ | AIC$^c$ | MQEP$^d$ |
|---|---|---|---|---|---|---|---|---|
| $F_0$ | 1 | 3.11 ± 0.03 | – | – | – | 25.4 | 262 | 2359 |
| $F_0$ | 2 | $\beta_1 < 0$ | 3.64 ± 0.11 | – | – | 3.6 | 137 | 49 |
| $F_0$ | 3 | 2.37 ± 0.08 | 3.04 ± 0.10 | 3.75 ± 0.04 | – | 1.6 | 86 | 9 |
| $F_0$ | 4 | 1.76 ± 0.40 | 3.50 ± 0.11 | $\beta_3 < 0$ | 4.03 ± 0.15 | 1.5 | 83 | 8 |
| $F_1$ | 2 | 1.93 ± 0.08 | 3.51 ± 0.01 | – | – | 8.5 | 272 | 3113 |
| $F_1$ | 3 | 2.25 ± 0.01 | 3.18 ± 0.02 | 3.69 ± 0.02 | – | 2.0 | 178 | 164 |
| $F_1$ | 4 | 2.24 ± 0.02 | 3.20 ± 0.05 | 3.64 ± 0.11 | 2.98 ± 0.51 | 1.9 | 180 | 163 |

a Maximal number of complex species. b Hamilton R-factor. c Akaike information criterion. d Mean quadratic error of prediction. e SE = standard error.

Table 5. Evolving PLSQF of $F_0$ and $F_1$ functions, obtained using DPP for different chemical models (maximal number of consecutively formed complexes, $m$) in solutions of Pb(II) 2-hydroxypropanoates. $c(\mathrm{Pb^{2+}}) = 60$ mM; $I = 2$ M (NaClO4); pH = 5.7 and $t$ = (23 ± 1) °C. Log $\beta_j$ values and corresponding standard deviations are obtained as a mean over progressively increasing (in forward direction) window sizes, assuming a chemical system with three and four consecutively formed complexes ($m = 3$ and 4).

| Function | $m^a$ | log $\beta_1$ ± SD$^c$ | e/t$^b$ | log $\beta_2$ ± SD$^c$ | e/t$^b$ | log $\beta_3$ ± SD$^c$ | e/t$^b$ | log $\beta_4$ ± SD$^c$ | e/t$^b$ |
|---|---|---|---|---|---|---|---|---|---|
| $F_0$ | 3 | 2.26 ± 0.10 | 2/26 | 3.23 ± 0.12 | 8/26 | 3.64 ± 0.13 | 11/24 | – | – |
| $F_0$ | 4 | 1.95 ± 0.13 | 16/26 | 3.50 ± 0.10 | 12/26 | 4.20 ± 0.10 | 20/24 | 4.17 ± 0.11 | 20/24 |
| $F_1$ | 3 | 2.26 ± 0.01 | 0/26 | 3.13 ± 0.08 | 0/26 | 3.79 ± 0.10 | 3/26 | – | – |
| $F_1$ | 4 | 2.26 ± 0.02 | 0/26 | 3.10 ± 0.09 | 6/26 | 3.85 ± 0.12 | 11/26 | 4.60 ± 0.18 | 21/24 |

a Maximal number of complex species. b Eliminated/total number of stability constant values. c Standard deviation.

4.2. Van Leeuwen model

In order to demonstrate that EPLSQF is a useful general approach for reliable system identification and for the detection and elimination of erroneous influential points, a generalised model that holds for metal ions and ligands with very different diffusion coefficients, developed by van Leeuwen's group [11–14], has also been analysed using simulated data. The experimental verification of EPLSQF was done using DPASV data obtained for the complexation of Zn(II) with the macromolecular ligand PMA [14,16,21] (see Table 5).

According to Eq. (12), a $\phi$ vs. [L] curve was simulated using log($\beta$/M$^{-1}$) = 4.50, $\varepsilon = 0.02$ and $p = 2/3$. To this curve, random errors with the same initial pattern, a mean value of 0 and standard deviations of 0.3, 0.6, 1.1 and 1.7% of the maximum peak current were added. These simulated data were evaluated by the conventional LSQF procedure using analytical derivatives of the three parameters ($\beta$, $\varepsilon$ and $p$) in Eq. (12). The results are presented in Table 6. As can be seen, only error-free data gave the correct initial values. Noise-corrupted data with 0.3 and 0.6% noise levels gave acceptable values of $\beta$, but the evaluated values of $\varepsilon$ and $p$ were unacceptable. With increased noise levels (1.1 and 1.7%) the LSQF did not converge. This simulation demonstrates that a three-parameter non-linear LSQF procedure cannot be used with the model described by Eq. (12), because even the lowest error level (0.3%) gives incorrect values for the evaluated parameters. Therefore, a two-parameter LSQF procedure is recommended, fixing the value of $p$ to either 1/2 or 2/3, according to the electrochemical technique used [14].

Table 6. Log $\beta$, $\varepsilon$ and $p$ values from Eq. (5) and corresponding standard errors obtained by three-parameter conventional LSQF of simulated error-free and error-corrupted (different error levels) $\phi$ vs. c(ligand) functions ($m$ = maximal number of complex species).

| error/% | log $\beta_1$ ± SE$^e$ | $\lvert\Delta\rvert^a$ | $\varepsilon$ ± SE$^e$ | $\lvert\Delta\rvert^a$ | $p$ ± SE$^e$ | $\lvert\Delta\rvert^a$ | HRF$^b$ | AIC$^c$ | MQEP$^d$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 4.50 ± 9×10$^{-16}$ | 0.00 | 0.020 ± 3×10$^{-16}$ | 0.000 | 0.67 ± 1×10$^{-15}$ | 0.000 | 8×10$^{-15}$ | −1401 | 6×10$^{-33}$ |
| 0.3 | 4.60 ± 0.08 | 0.10 | −0.004 ± 0.016 | 0.024 | 0.56 ± 0.07 | 0.107 | 0.25 | −219 | 6×10$^{-6}$ |
| 0.6 | 4.69 ± 0.12 | 0.19 | −0.015 ± 0.017 | 0.035 | 0.49 ± 0.09 | 0.177 | 0.50 | −193 | 2×10$^{-5}$ |
| 1.1 | Does not converge | | | | | | | | |
| 1.7 | Does not converge | | | | | | | | |

a Absolute difference between evaluated and initial simulated value. b Hamilton R-factor. c Akaike information criterion. d Mean quadratic error of prediction. e Standard error.

Table 7. Log $\beta$ and $\varepsilon$ values from Eq. (5) with the $p$ value fixed ($p = 2/3$) and corresponding standard errors obtained by two-parameter conventional LSQF of simulated error-free and error-corrupted (different error levels) $\phi$ vs. c(ligand) functions ($m$ = maximal number of complex species).

| error/% | log $\beta_1$ ± SE$^e$ | $\lvert\Delta\rvert^a$ | $\varepsilon$ ± SE$^e$ | $\lvert\Delta\rvert^a$ | HRF$^b$ | AIC$^c$ | MQEP$^d$ |
|---|---|---|---|---|---|---|---|
| 0.0 | 4.50 ± 0.00 | 0.00 | 0.020 ± 4×10$^{-17}$ | 0.000 | 0.00 | −∞ | 0.00 |
| 0.3 | 4.49 ± 0.01 | 0.01 | 0.019 ± 0.004 | 0.001 | 0.45 | −200 | 2×10$^{-5}$ |
| 0.6 | 4.49 ± 0.02 | 0.02 | 0.017 ± 0.008 | 0.003 | 0.90 | −174 | 8×10$^{-5}$ |
| 1.1 | 4.48 ± 0.03 | 0.03 | 0.014 ± 0.015 | 0.006 | 1.80 | −148 | 3×10$^{-4}$ |
| 1.7 | 4.46 ± 0.05 | 0.05 | 0.011 ± 0.023 | 0.009 | 2.65 | −133 | 7×10$^{-4}$ |

a Absolute difference between evaluated and initial simulated value. b Hamilton R-factor. c Akaike information criterion. d Mean quadratic error of prediction. e Standard error.

In Table 7, the results obtained using conventional LSQF to evaluate $\beta$ and $\varepsilon$ values from simulated noise-free and noise-corrupted $\phi$ vs. [L] curves are shown. An acceptable set of these two parameters was obtained only by fitting Eq. (12) to error-free data and to data having a noise level of 0.3 and 0.6% (bold).
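A two-parameter evaluation with $p$ fixed, repeated over forward and backward evolving data windows, can be sketched as follows. Several points are assumptions of this illustration rather than the authors' implementation. Eq. (12) is not reproduced in this section, so the form commonly used for the van Leeuwen model, $\phi = [(1 + \varepsilon\beta c)/(1 + \beta c)]^{p}$, is assumed. The non-linear LSQF with analytical derivatives is replaced by a simpler unweighted straight-line fit of a linearised form. The function names and the ligand concentrations are hypothetical.

```python
import math


def phi_model(conc, log_beta, eps, p):
    """Assumed model: phi = ((1 + eps*beta*c) / (1 + beta*c))**p."""
    beta = 10.0 ** log_beta
    return ((1.0 + eps * beta * conc) / (1.0 + beta * conc)) ** p


def fit_fixed_p(conc, phi, p):
    """Estimate (log_beta, eps) with p fixed, via a straight-line fit.

    With y = phi**(1/p), the assumed model rearranges to
    (1 - y)/c = beta*y - eps*beta, so slope = beta, intercept = -eps*beta.
    """
    ys = [f ** (1.0 / p) for f in phi]
    us = [(1.0 - y) / c for y, c in zip(ys, conc)]
    n = len(ys)
    y_mean = sum(ys) / n
    u_mean = sum(us) / n
    sxx = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((y - y_mean) * (u - u_mean) for y, u in zip(ys, us))
    beta = sxy / sxx
    intercept = u_mean - beta * y_mean
    return math.log10(beta), -intercept / beta


def evolving_fit(conc, phi, p, first=3, direction="forward"):
    """Refit on progressively larger windows; one (log_beta, eps) per window.

    'forward' starts from the lowest concentrations, 'backward' from the
    highest; the window grows by one point at a time.
    """
    pairs = list(zip(conc, phi))
    if direction == "backward":
        pairs.reverse()
    results = []
    for size in range(first, len(pairs) + 1):
        c_win, f_win = zip(*pairs[:size])
        results.append(fit_fixed_p(c_win, f_win, p))
    return results


# Noise-free simulation with the parameters quoted in the text
conc = [2e-5 * k for k in range(1, 16)]  # hypothetical ligand concentrations, mol/l
phi = [phi_model(c, 4.50, 0.02, 2.0 / 3.0) for c in conc]
forward = evolving_fit(conc, phi, 2.0 / 3.0, direction="forward")
backward = evolving_fit(conc, phi, 2.0 / 3.0, direction="backward")
print(forward[0], forward[-1], backward[0])
```

For noise-free data every window returns the simulated parameters. With noisy data the linearisation changes the weighting of the errors, so a non-linear fit of $\phi$ itself, as used in the paper, would be the more faithful choice.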
With higher simulated noise levels (1.1 and 1.7%) the evaluated values of $\beta$ are acceptable, but the parameter $\varepsilon$ does not differ statistically from zero and is consequently not reliable. Graphical representations of log $\beta$ and $\varepsilon$ vs. the forward and backward ligand concentration data windows, with a 1.1% error level added, are shown in Fig. 7. It can be seen that the parameters obtained from the initial data windows (forward and backward) are incorrect (far from the simulated values) and have quite large standard errors. As the data window increases, the parameters become constant and coincide with the simulated values. The ELSQF approach reveals visually all the conclusions reached by van den Hoop et al. [14], who analysed Eq. (12) analytically. From the evaluation of simulated error-free and noise-corrupted data it can be concluded that the van Leeuwen model for 1:1 M–ligand complexation stoichiometry is quite error-robust for the determination of stability constants but, at the same time, very error-sensitive for the determination of $\varepsilon$.

Fig. 7. Log $\beta$ and $\varepsilon$ values obtained by ELSQF of Eq. (12) to simulated data with added error (mean 0 and standard deviation in $\phi$ of 1.1%) for forward ('f') and backward ('b') ligand concentration directions.

Fig. 8. Experimental $\phi$ vs. c(-COOH, PMA)/mM (points) and LSQ fitted (full line) curves obtained by DPASV for the Zn(II)–PMA system with (*) c(KNO3) = 0.04 M and (second marker) 0.10 M. $c(\mathrm{Zn^{2+}}) = 1$ mM; $\alpha_n = 0.8$; and $t$ = (25 ± 1) °C.

To verify the ELSQF approach, experimental points obtained by DPASV for the complexation of Zn(II) with PMA, at $\alpha_d = 0.8$ and two different concentrations of KNO3 (0.04 and 0.10 M), were used (Fig. 8). Conventional LSQ fitted curves obtained using $p = 2/3$ are shown as full lines in the same figure. In Table 8, the results obtained by conventional LSQF using $p = 2/3$ and 1/2 are presented. Again, the log $\beta$ values are acceptable regardless of the fixed $p$ value, but the $\varepsilon$ values are quite dispersed and not fully reliable.
Table 8. Log $\beta$ and $\varepsilon$ values and corresponding standard errors obtained by two-parameter conventional LSQF. Experimental $\phi$ vs. c(-COOH, PMA) curves were obtained by DPASV in solutions of Zn(II)–PMA with c(KNO3) = 0.04 and 0.10 M. $c(\mathrm{Zn^{2+}}) = 1$ mM; $\alpha_n = 0.8$; $t$ = (25 ± 1) °C. The LSQF procedure was performed using $p = 1/2$ and 2/3.

| c(KNO3)/M | $p$ | log $\beta_1$ ± SE$^d$ | $\varepsilon$ ± SE$^d$ | HRF$^a$ | AIC$^b$ | MQEP$^c$ |
|---|---|---|---|---|---|---|
| 0.04 | 2/3 | 4.54 ± 0.03 | 0.04 ± 0.01 | 2.0 | −142 | 4×10$^{-4}$ |
| 0.04 | 1/2 | 4.72 ± 0.03 | −0.0004 ± 0.0063 | 3.1 | −126 | 1×10$^{-3}$ |
| 0.10 | 2/3 | 3.97 ± 0.03 | 0.13 ± 0.05 | 1.1 | −135 | 6×10$^{-4}$ |
| 0.10 | 1/2 | 4.07 ± 0.03 | 0.04 ± 0.03 | 1.2 | −133 | 7×10$^{-4}$ |

a Hamilton R-factor. b Akaike information criterion. c Mean quadratic error of prediction. d Standard error.

Fig. 9. Log $\beta$ and $\varepsilon$ values obtained by forward ('f') and backward ('b') ELSQF of Eq. (5) to experimental data obtained by DPASV for the Zn(II)–PMA system with (*) c(KNO3) = 0.04 M and (second marker) 0.10 M. $c(\mathrm{Zn^{2+}}) = 1$ mM; $\alpha_n = 0.8$; $t$ = (25 ± 1) °C.

The three statistical parameters shown in Table 8 indicate that a slightly better fit is obtained when $p = 2/3$ is applied, which is theoretically expected for DPASV [11–14]. The results obtained using the ELSQF procedure are shown in Table 9 for the forward, backward and cumulative (forward and backward) evolving directions, after applying the procedure for the detection and elimination of erroneous points. In Fig. 9, log $\beta$ and $\varepsilon$ values obtained in solutions with 0.1 M KNO3 are shown with the corresponding error bars. The ELSQF approach discloses that the scatter of the obtained parameters and of their standard errors is greater in the initial data windows of the forward ELSQF direction than in the backward direction, where very good constancy of the parameters was obtained. This can be due to greater experimental error or to deviation from the model at lower ligand concentrations, as can be seen from the graphs presented in Fig. 8. In Fig. 9, some of the initial points of the forward ELSQF have been deliberately omitted because they lie far outside the scale.
Therefore, it seems that for this system and model the data obtained by ELSQF in the backward direction are more reliable than those obtained in the forward direction, because the latter require the elimination of more erroneous points. Again, as concluded from the simulated data, the log $\beta$ values are more reliable than the $\varepsilon$ values, and a simple visual and statistical detection and elimination of influential erroneous points is possible, because the values of the $\phi_i$ vs. $[\mathrm{L}]_i$ function are always within the scaled range (0 and 1), and an error estimation is possible using statistical criteria.

Table 9. Log $\beta$ and $\varepsilon$ values and corresponding standard deviations obtained by two-parameter evolving LSQF. Experimental $\phi$ vs. c(-COOH, PMA) curves were obtained by DPASV in solutions of Zn(II)–PMA with c(KNO3) = 0.04 and 0.10 M. $c(\mathrm{Zn^{2+}}) = 1$ mM; $\alpha_n = 0.8$; $t$ = (25 ± 1) °C. The evolving LSQF procedure was performed in forward and backward directions using $p = 2/3$.

| c(KNO3)/M | Direction | log $\beta$ ± SD$^b$ | e/t$^a$ | $\varepsilon$ ± SD$^b$ | e/t$^a$ |
|---|---|---|---|---|---|
| 0.04 | Forward | 4.49 ± 0.05 | 0/15 | 0.031 ± 0.007 | 6/15 |
| 0.04 | Backward | 4.56 ± 0.02 | 0/15 | 0.043 ± 0.004 | 0/15 |
| 0.04 | Forward and backward | 4.52 ± 0.05 | 0/30 | 0.038 ± 0.07 | 6/30 |
| 0.10 | Forward | 3.91 ± 0.08 | 6/15 | 0.11 ± 0.03 | 9/15 |
| 0.10 | Backward | 3.99 ± 0.04 | 0/15 | 0.14 ± 0.02 | 0/15 |
| 0.10 | Forward and backward | 3.96 ± 0.08 | 6/30 | 0.13 ± 0.03 | 9/30 |

a Eliminated/total number of points. b Standard deviation.

Acknowledgements

The authors gratefully acknowledge financial support from the Spanish Ministry of Education and Science, DGICYT Projects PB93-1055 (1994–1997) and PB96-0379-C03-01 (1997–2000). B.S. Grabarić also acknowledges financial support from the following: Spanish Ministry of Education and Science (October 1996–March 1997), General Direction of Universities, Generalitat de Catalunya (April 1997–August 1997) and IBERDROLA (October 1997–March 1998) for a visiting professorship of science and technology.

References

[1] A.E. Martell, R.M.
Smith, Critical Stability Constants, vols. 1–5, Plenum Press, New York, 1974–1982.
[2] J. Buffle, Complexation Reactions in Aquatic Systems. An Analytical Approach, Ellis Horwood, Chichester, 1988.
[3] H.P. van Leeuwen, R. Cleven, J. Buffle, Pure Appl. Chem. 61 (1989) 255.
[4] J. Wang, Electroanalytical Techniques in Clinical Chemistry and Laboratory Medicine, VCH, New York, 1988.
[5] M.R. Smyth, J.G. Vos (Eds.), Analytical Voltammetry, in G. Svehla (Ed.), Wilson and Wilson's Comprehensive Analytical Chemistry, vol. XXVII, Elsevier, Amsterdam, 1992.
[6] B.S. Grabarić, Z. Grabarić, M. Esteban, E. Casassas, Anal. Chim. Acta 325 (1996) 135.
[7] I. Leden, Z. Physik. Chem. (Leipzig) 188A (1941) 160.
[8] D.D. DeFord, D.N. Hume, J. Am. Chem. Soc. 73 (1951) 5321.
[9] P. Gans, Data Fitting in the Chemical Sciences – By the Method of Least Squares, John Wiley and Sons, Chichester, 1992.
[10] M. Meloun, J. Militky, M. Forina, Chemometrics for Analytical Chemistry, PC-aided Regression and Related Methods, vol. 2, Ellis Horwood, New York, 1994.
[11] H.G. de Jong, H.P. van Leeuwen, K. Holub, J. Electroanal. Chem. 234 (1987) 1.
[12] H.G. de Jong, H.P. van Leeuwen, J. Electroanal. Chem. 234 (1987) 17.
[13] H.G. de Jong, H.P. van Leeuwen, J. Electroanal. Chem. 235 (1987) 1.
[14] M.A.G.T. van den Hoop, F.M.R. Leus, H.P. van Leeuwen, Coll. Czech. Chem. Commun. 56 (1991) 96.
[15] A.M. Bond, B.S. Grabarić, Anal. Chem. 51 (1979) 337.
[16] J.M. Díaz-Cruz, C. Ariño, M. Esteban, E. Casassas, Biophys. Chem. 45 (1992) 109.
[17] MATLAB High-Performance Numeric Computation and Visualisation Software, Reference Guide, The MathWorks, Inc., Natick, 1992.
[18] B.S. Grabarić, MATLAB toolbox ELCHEM for electrochemical signal simulation, visualisation and evaluation, Department of Analytical Chemistry, University of Barcelona, Barcelona, 1997.
[19] I. Filipović, M. Tkalčec, B. Grabarić, Inorg. Chem. 29 (1990) 1092 (and references cited therein).
[20] M. Tkalčec, B. Grabarić, I.
Filipović, Anal. Chim. Acta 143 (1982) 255.
[21] M. Esteban, H.G. de Jong, H.P. van Leeuwen, Int. J. Environ. Anal. Chem. 38 (1990) 75.