Integrated Design of Hydrological Networks (Proceedings of the Budapest Symposium, July 1986). IAHS Publ. no. 158, 1986.

Information content and sample length in sediment records

Contenu de l'information et durée de l'échantillonage dans les archives de données sédimentologiques

KAZ ADAMOWSKI
Department of Civil Engineering, University of Ottawa, Ottawa, Canada, K1N 6N5
TERRY DAY
Sediment Survey Section, Environment Canada, Ottawa, Canada, K1N 0E7
NICOLAS R. DALEZIOS
INTERA Technologies Ltd., 785 Carling Avenue, Ottawa, Canada, K1S 5H4
DENIS GINGRAS
Atlantic Region, Environment Canada, Dartmouth, Nova Scotia, Canada, B2Y 2N6

ABSTRACT Sediment-concentration data were analyzed for trend, seasonality, probability distribution, and information content. The analyzed data were found to be trend-free, log-normally distributed, and strongly periodic. Information content varied throughout the year, suggesting that observations can be discontinued at certain times of the year. Sample lengths must be very long if the mean and median are to be estimated with 10% accuracy and 95% confidence.

INTRODUCTION

Sediment data exhibit substantial variability in time and space; hence, statistical procedures must be used to estimate future event magnitudes from observed data. The estimated probabilistic magnitudes then are used for planning, management, and decision making. Because the data are unpredictable, there is a risk associated with sediment predictions. This risk is composed of two parts: (a) risk due to estimation uncertainty, and (b) sedimentation-process risk. The former can be reduced by acquiring additional data; the latter, however, is purely a function of the project-operation period. Thus, there is a limiting value to the acquisition of additional data, after which process risk dominates estimation uncertainty (Lall & Beard, 1982).
Much statistical literature exists concerning the design of hydrometeorological data collection, the value of data, and the design of a structure with inadequate hydrologic information. Often, there is disagreement over discontinuing data collection at a given location. It can be argued that no amount of data is enough, especially to estimate rare events. However, the marginal value of information decreases with increasing data length, and it could be advantageous to shift limited resources to locations where additional information is desirable. Therefore, a procedure is needed to determine the value of additional information, which in turn can be assessed in the context of either the purpose of the data or the reliability of estimation.

The concept of information content is useful in such an assessment. Information content usually is defined as the decrease in uncertainty in the estimation of a parameter. It is, thus, inversely proportional to the uncertainty about a given parameter, which usually is measured by the variance or the mean square error of the estimate.

This study examines the seasonality patterns, probability distributions, and information content of time series of sediment concentrations from three stations in Canada, located in New Brunswick (01AP004) and Alberta (05DA009 and 05AA024). Wherever possible, the analysis was performed on the daily measurements, the monthly means, and the annual means.

TIME SERIES AND FREQUENCY ANALYSIS

Linear trends were detected by the variate-difference method (Adamowski & Kite, 1973) and estimated by polynomial regression:

$$T_t = \sum_{j} a_j t^{j} \qquad (1)$$

where t is the time and the a_j are constants. The periodic component was estimated by Fourier analysis (Box & Jenkins, 1976):

$$P_t = A_0 + \sum_{k=1}^{n/2} \left( A_k \cos\frac{2\pi k t}{n} + B_k \sin\frac{2\pi k t}{n} \right) \qquad (2)$$

where the A's and B's are Fourier coefficients (harmonics) and t = 1, 2, ..., n.
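The harmonic fit of equation (2) can be illustrated with a minimal sketch. It assumes an equally spaced series of length n, a harmonic argument of 2πkt/n, and harmonics restricted to k < n/2, for which the coefficients have a closed form. The function names are hypothetical and the sketch is not the authors' program.

```python
import math


def harmonic_coefficients(x, k):
    """Return (A_k, B_k) for harmonic k of an equally spaced series x.

    Assumes t = 1, 2, ..., n and 1 <= k < n/2, so that the cosine and
    sine terms are orthogonal over the record.
    """
    n = len(x)
    if not 1 <= k < n / 2:
        raise ValueError("k must satisfy 1 <= k < n/2")
    a_k = 2.0 / n * sum(
        v * math.cos(2.0 * math.pi * k * t / n) for t, v in enumerate(x, start=1)
    )
    b_k = 2.0 / n * sum(
        v * math.sin(2.0 * math.pi * k * t / n) for t, v in enumerate(x, start=1)
    )
    return a_k, b_k


def explained_variance_fraction(x, harmonics):
    """Fraction of the series variance carried by the chosen harmonics.

    Each harmonic k < n/2 contributes (A_k^2 + B_k^2) / 2 to the
    variance, which is computed here with divisor n.
    """
    n = len(x)
    mean = sum(x) / n  # this is A_0
    total = sum((v - mean) ** 2 for v in x) / n
    explained = 0.0
    for k in harmonics:
        a_k, b_k = harmonic_coefficients(x, k)
        explained += (a_k ** 2 + b_k ** 2) / 2.0
    return explained / total
```

Applied to a series of monthly means with `harmonics=[1]` for a single 12-value year, the second function gives the share of variance attributable to the annual cycle. For a record spanning several years, the annual cycle corresponds to k equal to the number of years in the record.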
To determine which probability distribution fits the observed data, several statistical distributions (normal, log-normal, three-parameter log-normal, and Gumbel) were fitted (Condie et al., 1981).

INFORMATION CONTENT AND SAMPLE LENGTH

The information content I is defined as the inverse of the variance of a given statistic (Salas et al., 1980):

$$I = \frac{1}{\operatorname{var}(\alpha)} \qquad (3)$$

where α is a statistic (e.g. a mean).

By using the concept of a confidence interval for the mean, it is possible to "design" a sampling procedure to yield confidence limits of a desired width with a prescribed confidence. For example, the sample size n that is needed to give an estimate of the mean with specified accuracy and confidence can be determined. If a one-sided 95% confidence interval on the mean with an interval width no greater than 10% of the mean is desired, n can be determined such that:

$$\frac{s}{\sqrt{n}} \, t_{95\%,\,n-1} = 0.10\,\bar{x} \qquad (4)$$

which can be rewritten as:

$$n = \left( \frac{t_{95\%,\,n-1} \, C_v}{10} \right)^{2} \qquad (5)$$

where C_v is the coefficient of variation in percent, n is the sample length, and t is the value of Student's t distribution. The above expression is valid for a normally distributed variate. Because the sediment data were found to be log-normally distributed, the expression must be modified: the standard deviation of the logarithms of the data and the corresponding value of C_v are determined and used in equation (5).

The standard error (SE) of estimate often is used as a measure of the accuracy of estimating a parameter. For the mean it is given by:

$$SE(\%) = \frac{100\,C_v}{\sqrt{n}} \qquad (6)$$

where C_v is here expressed as a ratio rather than in percent. The above relationship is valid when the data are normally distributed, but a modification is required for log-normally distributed data. Moreover, for log-normally distributed data, the central tendency is better described by the median than by the mean.
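Equations (5) and (6) can be evaluated with a short sketch for the normally distributed case. As a simplifying assumption, the Student's t value is replaced by the standard normal quantile. This is reasonable only for large n and slightly underestimates n otherwise. The function names and default arguments are hypothetical.

```python
import math
from statistics import NormalDist


def sample_length_for_mean(cv_percent, accuracy_percent=10.0, confidence=0.95):
    """Sample length n from equation (5) for a normally distributed variate.

    ASSUMPTION: the one-sided normal quantile is used in place of
    t_{95%, n-1}, so no iteration on n is needed.
    cv_percent is the coefficient of variation in percent.
    """
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile, about 1.645 at 95%
    return math.ceil((z * cv_percent / accuracy_percent) ** 2)


def standard_error_percent(cv_ratio, n):
    """Standard error of the mean in percent, equation (6).

    cv_ratio is the coefficient of variation as a ratio (s / mean).
    """
    return 100.0 * cv_ratio / math.sqrt(n)
```

Under these assumptions, a coefficient of variation of 100% calls for `sample_length_for_mean(100.0)`, which is 271 observations. This shows how quickly the required record length grows with variability. The log-normal and median cases discussed in the text need the modified relationships and are not covered by this sketch.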
A relationship between C_v, SE, and n can be derived for the mean and the median under both normal and log-normal distributions (Adamowski, 1983 and 1984). Figure 1 is an example of this type of relationship.

FIG. 1 Relationship between the standard error, coefficient of variation, and the record length (N) for the median of log-normally distributed data. [Horizontal axis: coefficient of variation.]

DATA ANALYSIS

A systematic program for sediment data collection has existed in Canada since approximately 1965. The longest record in the United States dates back to 1930 (San Juan River near Bluff, Utah). Therefore, the existing data base is rather small by comparison with some other hydrologic variables such as streamflow or rainfall. For numerical analysis, three stations were selected (see Table 1), essentially on the basis of the longest available record length.

No trends were detected in any of the daily or monthly series of the three stations analyzed. However, the periodicities were significant and accounted for 38.1% (station 01AP004), 76.9% (station 05AA024), and 90.9% (station 05DA009) of the variance in the monthly means (Adamowski, 1984).

The sediment concentrations were highly skewed (Table 1) and followed a log-normal distribution, except for the annual data at station 01AP004. Similar findings were reported by Van Sickle (1981) for monthly data.

Tables 2, 3, and 4 give monthly means and variances, together with the index of the relative information content. The index is the number of years of additional sampling necessary for the month with the least information (the greatest variance) to reach the information level in the mean of another month. For example, for station 01AP004, the maximum variance is 742.4 and, if there are 15 years of data, then:

$$\frac{742.4}{15 + K} = \sigma^{2} \qquad (7)$$

where σ² is the variance of the given month and K is the number of additional years of sampling required to reach the information level of that month.
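Equation (7) can be solved directly for K, as in the following minimal sketch. The function name is hypothetical. The inputs are the maximum monthly variance, the variance of the month of interest, and the number of years of record.

```python
def additional_years(max_variance, month_variance, years_of_record):
    """Solve equation (7) for K.

    max_variance / (years_of_record + K) = month_variance
    gives K = years_of_record * (max_variance / month_variance - 1).
    """
    if month_variance <= 0:
        raise ValueError("month_variance must be positive")
    return years_of_record * (max_variance / month_variance - 1.0)
```

With the values quoted for station 01AP004 (maximum variance 742.4, 15 years of data), a monthly variance of 13.5 gives K of about 810 years and a monthly variance of 37.3 gives about 284 years, in agreement with the August and July entries of Table 2.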
It can be observed from Table 2 that sampling from April to September is insignificant in terms of assessing the total monthly mean. The present level of information of these months would be reached by the station only after sampling for as many as 810 additional years. Similar results were obtained for the other two stations and are shown in Tables 3 and 4. The large differences in monthly information content also suggest that the necessary sampling intervals may be of different orders at varying times of the year.

TABLE 1 Stations selected for analysis

Station (period of record)            Data      Mean (mg/L)   STD      Cs
Kennecabis River, New Brunswick (01AP004), 1966-81
                                      Annual    22.0          5.21     -0.20
                                      Monthly   22.0          16.8     2.07
                                      Daily     22.0          34.40    6.14
North Saskatchewan River, Alberta (05DA009), 1972-79
                                      Annual    Record length too short
                                      Monthly   47.74         66.1     1.62
                                      Daily     Data not available
Oldman River, Alberta (05AA024), 1966-79
                                      Annual    45.7          21.0     1.35
                                      Monthly   45.7          93.8     4.02
                                      Daily     45.7          169.5    13.16

STD = Standard Deviation; Cs = Coefficient of Skewness

TABLE 2 Monthly means for Kennecabis River (01AP004)

Month       Mean (mg/L)   Variance (σ²)   Years to equal information content, K (equation 7)
July        16.4          37.3            284
August      10.6          13.5            810
September   11.7          48.5            215
October     20.3          363.3           15.7
November    20.2          217.3           36.2
December    28.7          242.6           30.9
January     33.8          742.4           0.0
February    25.6          532.9           5.90
March       30.5          403.6           12.6
April       31.7          82.2            121
May         20.2          122.5           75.9
June        13.9          50.2            207

TABLE 3 Monthly means for Oldman River (05AA024)

Month       Mean (mg/L)   Variance (σ²)   Years to equal information content, K (equation 7)
May         204.6         32,281.5        0.0
June        144.5         22,497.9        5.65
July        17.9          433.6           955
August      24.2          2,554.1         151
September   6.4           39.0            10,748
October     5.8           73.1            5,728
November    8.6           67.6            6,195
December    10.7          379.5           1,093
January     7.0           18.1            23,173
February    6.2           15.3            26,416
March       46.7          2,810.4         136
April       65.7          3,238.8         117

TABLE 4 Monthly means for North Saskatchewan River (05DA009)

Month       Mean (mg/L)   Variance (σ²)   Years to equal information content, K (equation 7)
May         58.8          5,240.5         0.0
June        135.4         2,477.7         7.8
July        169.2         4,846.1         0.57
August      122.3         546.6           60.1
September   50.5          489.7           67.9
October     9.5           7.8             4,696
November    8.9           44.9            810
December    4.7           19.2            1,904
January     1.6           1.9             19,401
February    2.2           1.9             19,401
March       3.4           4.8             6,318
April       6.3           8.8             4,162

Table 5 contains the sample length required to estimate the mean and the median with 10% accuracy and 95% confidence. These sample lengths are quite high and obviously can be modified by assuming a different accuracy and confidence.

TABLE 5 Sample length n required to determine the mean and median values with an accuracy of 10% and 95% confidence

Station                 Data              Distribution   Sample length, n
                                                         mean      median
Kennecabis River        Monthly, Annual   LN, N          5,610     1,100
N. Saskatchewan River   Annual            LN             17,340    1,530
Oldman River            Annual            LN             4,100     978

CONCLUSIONS

Time-series analysis of monthly means revealed strong periodicity in the sediment concentrations, especially for the two stations in Alberta, where it accounted for 77 and 91% of the variance of the series. Similarly, there were high periodicities in the daily data. There was no indication of trends in any of the three analyzed series. The data exhibited high variability and skewness and could be approximated by a log-normal distribution function. The variation of the information content throughout the year, as well as the sample lengths required for a given degree of accuracy, were determined.
The monthly relative information content indicated that at the two stations in Alberta, owing to large seasonal variations in sediment concentrations, sampling could be reduced or even discontinued at certain times of the year, usually corresponding to the winter months. Because the annual means of these two highly skewed stations are log-normally distributed, with a correspondingly high coefficient of variation, the sample lengths required for 10% accuracy in the mean with 95% confidence are very large. The sample lengths required for the median are also very large.

REFERENCES

Adamowski, K. (1983) Investigation and utilization of long-term suspended sediment records. Report to Environment Canada.
Adamowski, K. (1984) Statistical analysis of sediment data record. Report to Environment Canada.
Adamowski, K. & Kite, G.W. (1973) Stochastic analysis of Lake Superior elevations for computation of relative crustal movement. J. Hydrol. 18, 163-175.
Box, G.E.P. & Jenkins, G.M. (1976) Time series analysis - forecasting and control. Holden Day.
Condie, R., Nix, G.A. & Boone, L.G. (1981) Flood damage reduction program - flood frequency analysis. Engineering Hydrology Section, Environment Canada.
Lall, V. & Beard, L.R. (1982) Assessment of hydrologic information for rare event design. American Geophysical Union, Philadelphia, Pennsylvania, U.S.A.
Salas, J.D., Delleur, J.W., Yevjevich, V. & Lane, W.L. (1980) Applied modelling of hydrologic time series. Water Resources Publications.
Van Sickle, J. (1981) Long-term distributions of annual sediment yields from small watersheds. Wat. Resour. Res. 17(3), 659-663.