
Integrated Design of Hydrological Networks (Proceedings of the Budapest
Symposium, July 1986). IAHS Publ. no. 158, 1986.
Information content and sample length in
sediment records
Contenu de l'information et durée de l'échantillonnage
dans les archives de données sédimentologiques
KAZ ADAMOWSKI
Department of Civil Engineering, University of
Ottawa, Ottawa, Canada, K1N 6N5
TERRY DAY
Sediment Survey Section, Environment Canada,
Ottawa, Canada, K1N 0E7
NICOLAS R. DALEZIOS
INTERA Technologies Ltd., 785 Carling Avenue,
Ottawa, Canada, K1S 5H4
DENIS GINGRAS
Atlantic Region, Environment Canada,
Dartmouth, Nova Scotia, Canada, B2Y 2N6
ABSTRACT Sediment-concentration data were analyzed for
trend, seasonality, probability distribution, and
information content. It was found that the analyzed data
were trend-free, log-normally distributed, and showed
strong periodicity. Information content varied throughout
the year, suggesting that observations can be discontinued
at certain times of the year. Sample lengths must be very
long if it is necessary to have 10% accuracy in the
estimation of the mean and median with 95% confidence.
INTRODUCTION
Sediment data vary substantially in time and space; hence,
statistical procedures must be used to estimate future event
magnitudes from observed data. The estimated probabilistic
magnitudes are then used for planning, management, and
decision-making. Because the data are unpredictable, there is a risk
associated with sediment predictions. This risk is composed of two
parts: (a) risk due to estimation uncertainty, and (b)
sedimentation-process risk. The former can be reduced by acquiring
additional data; the latter, however, is purely a function of the
project-operation period. Thus, there is a limit to the value of
acquiring additional data, beyond which process risk dominates
estimation uncertainty (Lall & Beard, 1982).
Much statistical literature exists concerning the design of
hydrometeorological data collection, the value of data, and the
design of a structure with inadequate hydrologic information. Often,
there is disagreement over discontinuing data collection at a given
location. It can be argued that no amount of data is enough,
especially to estimate rare events. However, the marginal value of
information decreases with increasing data length, and it could be
advantageous to shift the limited resources to locations where
additional information is desirable. Therefore, a procedure is
needed to determine the value of additional information, which in
turn can be assessed in the context of either the purpose of the data
or the reliability of estimation. The concept of information content
is useful in such an assessment.
Information content is usually defined as the decrease in
uncertainty in the estimate of a parameter. It is thus inversely
proportional to the uncertainty about that parameter, usually
measured by the variance or the mean square error of the estimate.
This study examines the seasonality patterns, probability
distributions, and information content of time series of sediment
concentrations from three Canadian stations, one in New Brunswick
(01AP004) and two in Alberta (05DA009 and 05AA024). Wherever
possible, the analysis was performed on daily measurements, monthly
means, and annual means.
TIME SERIES AND FREQUENCY ANALYSIS
Linear trends were detected by the variate-difference method
(Adamowski & Kite, 1973) and estimated by polynomial regression:

$$T_t = \sum_{j} a_j\, t^{j} \tag{1}$$

where t is the time and the $a_j$ are constants.
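As a minimal sketch of the first-degree case of equation (1), the following fits $T_t = a_0 + a_1 t$ by ordinary least squares. The function name `fit_linear_trend` and the sample series are illustrative assumptions, not the authors' code or data, and the variate-difference test itself is not reproduced here.

```python
def fit_linear_trend(series):
    """Least-squares fit of T_t = a0 + a1 * t for t = 1..n (equation 1, degree 1)."""
    n = len(series)
    times = range(1, n + 1)
    t_mean = sum(times) / n
    y_mean = sum(series) / n
    # Slope = covariance(t, y) / variance(t); intercept follows from the means.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, series))
    sxx = sum((t - t_mean) ** 2 for t in times)
    a1 = sxy / sxx
    a0 = y_mean - a1 * t_mean
    return a0, a1


# Hypothetical series that follows 1 + 2t exactly (not sediment data).
a0, a1 = fit_linear_trend([3.0, 5.0, 7.0, 9.0, 11.0])
print(a0, a1)
```

A fitted slope close to zero relative to its scatter would be in line with the trend-free result reported for the three stations; a formal significance test would still be needed.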
The periodic component was estimated by Fourier analysis (Box &
Jenkins, 1976):

$$P_t = A_0 + \sum_{k=1}^{n/2} \left( A_k \cos\frac{2\pi k t}{n} + B_k \sin\frac{2\pi k t}{n} \right) \tag{2}$$

where $A_k$ and $B_k$ are the Fourier coefficients (harmonics) and
t = 1, 2, ..., n. To determine which probability distribution fits
the observed data, various statistical distributions (normal,
log-normal, three-parameter log-normal, and Gumbel) were fitted
(Condie et al., 1981).
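The harmonic fit of equation (2) can be sketched as follows. The code estimates $A_k$ and $B_k$ for a single harmonic and the share of the series variance that harmonic carries. The function names and the synthetic series are assumptions made for illustration, and the variance share is computed only for harmonics with 0 < k < n/2.

```python
import math


def harmonic_coefficients(x, k):
    """Fourier coefficients A_k, B_k of harmonic k for x_1..x_n (0 < k < n/2)."""
    n = len(x)
    a_k = 2.0 / n * sum(v * math.cos(2.0 * math.pi * k * t / n)
                        for t, v in enumerate(x, start=1))
    b_k = 2.0 / n * sum(v * math.sin(2.0 * math.pi * k * t / n)
                        for t, v in enumerate(x, start=1))
    return a_k, b_k


def explained_variance(x, k):
    """Fraction of the (population) variance of x carried by harmonic k."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    a_k, b_k = harmonic_coefficients(x, k)
    return (a_k ** 2 + b_k ** 2) / 2.0 / variance


# Synthetic two-year "monthly" series (n = 24, illustrative values only):
# the annual cycle is harmonic k = 2 and the semi-annual cycle is k = 4.
n = 24
series = [10.0 + 4.0 * math.cos(2.0 * math.pi * 2 * t / n)
          + 3.0 * math.sin(2.0 * math.pi * 4 * t / n) for t in range(1, n + 1)]
annual_share = explained_variance(series, 2)
semiannual_share = explained_variance(series, 4)
print(round(annual_share, 2), round(semiannual_share, 2))
```

For this synthetic series the two harmonics carry 8/12.5 and 4.5/12.5 of the variance. The same kind of ratio underlies statements such as "periodicity accounted for a given percentage of the variance".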
INFORMATION CONTENT AND SAMPLE LENGTH
The information content I is defined as the inverse of the variance
of a given statistic (Salas et al., 1980):

$$I = \frac{1}{\operatorname{var}(\alpha)} \tag{3}$$

where $\alpha$ is a statistic (e.g. a mean).
By using the concept of a confidence interval for the mean, it is
possible to "design" a sampling procedure to yield confidence limits
of a desired width with a prescribed confidence. For example, the
sample size n needed to estimate the mean with specified accuracy
and confidence can be determined. If a one-sided 95% confidence
interval on the mean with a width no greater than 10% of the mean is
desired, n can be determined such that:

$$\frac{s}{\sqrt{n}}\; t_{95\%,\,n-1} = 0.10\,\bar{x} \tag{4}$$

which can be rewritten as:

$$n = \left( \frac{t_{95\%,\,n-1}\; C_v}{10} \right)^{2} \tag{5}$$

where s is the sample standard deviation, $\bar{x}$ is the sample
mean, $C_v$ is the coefficient of variation in percent, n is the
sample length, and $t_{95\%,\,n-1}$ is the 95% quantile of Student's t
distribution with n - 1 degrees of freedom.
The above expression is valid for a normally distributed variate.
Because the sediment data were found to be log-normally distributed,
the expression must be modified: the standard deviation of the
logarithms of the data and the corresponding value of $C_v$ are
determined and used in equation (5).
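Because the t quantile in equation (5) depends on n itself, the sample length has to be found iteratively. The sketch below searches for the smallest n that satisfies equation (5). It relies on two assumptions that are not part of the paper: the t quantile is approximated by a two-term Cornish-Fisher expansion around the normal quantile (reasonable for more than a few degrees of freedom), and the coefficient of variation used in the example is hypothetical.

```python
import math
from statistics import NormalDist


def t_quantile(p, dof):
    """Approximate Student t quantile (Cornish-Fisher expansion, two terms).

    This is an approximation, not an exact quantile; it degrades for dof < 5.
    """
    z = NormalDist().inv_cdf(p)
    return (z
            + (z ** 3 + z) / (4.0 * dof)
            + (5.0 * z ** 5 + 16.0 * z ** 3 + 3.0 * z) / (96.0 * dof ** 2))


def required_sample_length(cv_percent, accuracy_percent=10.0, confidence=0.95):
    """Smallest n with n >= (t_{confidence, n-1} * Cv / accuracy)^2 (equation 5)."""
    z = NormalDist().inv_cdf(confidence)
    # The normal-theory value is a lower bound because t exceeds z.
    n = max(2, math.ceil((z * cv_percent / accuracy_percent) ** 2))
    while n < (t_quantile(confidence, n - 1) * cv_percent / accuracy_percent) ** 2:
        n += 1
    return n


# Hypothetical coefficient of variation of 50%, 10% accuracy, 95% confidence.
n_required = required_sample_length(50.0)
print(n_required)
```

Since n grows with the square of $C_v$, doubling the coefficient of variation roughly quadruples the required record length, which is why highly variable sediment series lead to very long sample lengths.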
As a measure of the accuracy of a parameter estimate, the standard
error (SE) of estimate is often used; for the mean it is given by:

$$SE(\%) = \frac{100\, C_v}{\sqrt{n}} \tag{6}$$

with $C_v$ here expressed as a fraction. The above relationship is
valid when the data are normally distributed, but a modification is
required for log-normally distributed data. Moreover, for
log-normally distributed data, the central tendency is better
described by the median than by the mean. It is, therefore, possible
to derive a relationship between $C_v$, SE, and n with normal and
log-normal distributions for the mean and median values (Adamowski,
1983, 1984). Figure 1 is an example of this type of relationship.
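Equation (6) and one possible log-normal counterpart can be sketched as below. The relationships derived in Adamowski (1983, 1984) are not reproduced in this paper, so the median formula here is an assumption: it takes the median estimate as the exponential of the mean of the logarithms and uses the first-order (delta-method) result that its relative standard error is about $\sigma_{\ln}/\sqrt{n}$, with $\sigma_{\ln}^2 = \ln(1 + C_v^2)$ for a log-normal variate. Function names and the example values are hypothetical.

```python
import math


def se_percent_mean(cv, n):
    """Equation (6): percent standard error of the mean, Cv given as a fraction."""
    return 100.0 * cv / math.sqrt(n)


def log_sigma_from_cv(cv):
    """Standard deviation of ln(x) for a log-normal variate with the given Cv."""
    return math.sqrt(math.log(1.0 + cv * cv))


def se_percent_median_lognormal(cv, n):
    """Approximate percent standard error of the log-normal median estimate.

    Assumption (not from the paper): delta-method result sigma_ln / sqrt(n).
    """
    return 100.0 * log_sigma_from_cv(cv) / math.sqrt(n)


# Hypothetical case: Cv = 2.0 (as a fraction) and n = 100 observations.
se_mean = se_percent_mean(2.0, 100)
se_median = se_percent_median_lognormal(2.0, 100)
print(round(se_mean, 1), round(se_median, 1))
```

Under this assumption the median is always estimated with a smaller relative standard error than the mean for the same n, because $\ln(1 + C_v^2) < C_v^2$.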
DATA ANALYSIS
Since approximately 1965, a systematic program for sediment data
collection has existed in Canada. The longest record in the United
States dates back to 1930 (San Juan River near Bluff, Utah).
Therefore, the existing database is rather small by comparison with
some other hydrologic variables such as streamflow or rainfall.
For numerical analysis, three stations were selected (see Table 1)
essentially on the basis of the longest available record length. No
trends were detected in any of the daily or monthly series of the
three stations analyzed. However, the periodicities were significant
and accounted for 38.1% (station 01AP004), 76.9% (05AA024), and
90.9% (05DA009) of the variance in the monthly means (Adamowski,
1984).
The sediment concentrations were highly skewed (Table 1) and
followed a log-normal distribution except for annual data at station
01AP004. Similar findings were reported by Van Sickle (1981) for
monthly data.
FIG. 1 Relationship between the standard error, coefficient of
variation, and the record length (N) for the median of log-normally
distributed data (horizontal axis: coefficient of variation).

TABLE 1 Stations selected for analysis

Station                     Period of   Data      Mean      STD      Cs
                            record                (mg/L)
Kennebecasis River,         1966-81     Annual    22.0      5.21     -0.20
New Brunswick (01AP004)                 Monthly   22.0      16.8      2.07
                                        Daily     22.0      34.40     6.14
North Saskatchewan River,   1972-79     Annual    Record length too short
Alberta (05DA009)                       Monthly   47.74     66.1      1.62
                                        Daily     Data not available
Oldman River,               1966-79     Annual    45.7      21.0      1.35
Alberta (05AA024)                       Monthly   45.7      93.8      4.02
                                        Daily     45.7      169.5    13.16

STD = Standard Deviation; Cs = Coefficient of Skewness

Tables 2, 3, and 4 give the monthly means and variances together
with the index of relative information content. The index is the
number of years of additional sampling necessary in the month with
the least information (the greatest variance) for its mean to reach
the information level of the mean of another month. For example, for
station 01AP004, the maximum variance is 742.4 (January) and, with
15 years of data:

$$\frac{742.4}{15 + K} = \frac{\sigma^2}{15} \tag{7}$$

where σ² is the variance of the given month and K is the number of
additional years of sampling required for the highest-variance month
to reach the information level of the given month. Both sides of
equation (7) are variances of a monthly mean; for August, for
instance, K = 15 × 742.4/13.5 - 15 ≈ 810, as listed in Table 2. It
can be observed from Table 2 that sampling from April to September
is insignificant in terms of assessing the total monthly mean. The
present level of information of these months would be reached by the
highest-variance month only after sampling for up to an additional
810 years. Similar results were obtained for the other two stations
and are shown in Tables 3 and 4. The large differences in monthly
information content also suggest that the necessary sampling
intervals may be of different orders at varying times of the year.
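A minimal sketch of the index computation follows. It assumes that both sides of equation (7) are variances of a monthly mean, that is $\sigma^2_{max}/(N + K) = \sigma^2/N$, so that $K = N(\sigma^2_{max}/\sigma^2 - 1)$. This form reproduces the K values listed in Table 2 for N = 15. The function name is an illustrative choice; the variances are those of Table 2.

```python
def years_to_equal_information(variances, years_of_record):
    """Extra years K the highest-variance month needs so that the variance of
    its mean matches the variance of the mean of each other month."""
    max_variance = max(variances.values())
    return {month: years_of_record * (max_variance / variance - 1.0)
            for month, variance in variances.items()}


# Monthly variances for station 01AP004 as listed in Table 2 (15 years of data).
table2_variances = {
    "July": 37.3, "August": 13.5, "September": 48.5, "October": 363.3,
    "November": 217.3, "December": 242.6, "January": 742.4, "February": 532.9,
    "March": 403.6, "April": 82.2, "May": 122.5, "June": 50.2,
}
k_index = years_to_equal_information(table2_variances, 15)
for month, k in k_index.items():
    print(f"{month:<10s} {k:8.1f}")
```

The month with the largest variance gets K = 0 by construction, and months with small variances get large K, which is the basis for suggesting reduced sampling in those months.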
TABLE 2 Monthly means for Kennebecasis River (01AP004)

Month        Mean      Variance    Years to equal information
             (mg/L)    (σ²)        content, K (equation 7)
July         16.4      37.3        284
August       10.6      13.5        810
September    11.7      48.5        215
October      20.3      363.3       15.7
November     20.2      217.3       36.2
December     28.7      242.6       30.9
January      33.8      742.4       0.0
February     25.6      532.9       5.90
March        30.5      403.6       12.6
April        31.7      82.2        121
May          20.2      122.5       75.9
June         13.9      50.2        207

TABLE 3 Monthly means for Oldman River (05AA024)

Month        Mean      Variance    Years to equal information
             (mg/L)    (σ²)        content, K (equation 7)
May          204.6     32,281.5    0.0
June         144.5     22,497.9    5.65
July         17.9      433.6       955
August       24.2      2,554.1     151
September    6.4       39.0        10,748
October      5.8       73.1        5,728
November     8.6       67.6        6,195
December     10.7      379.5       1,093
January      7.0       18.1        23,173
February     6.2       15.3        26,416
March        46.7      2,810.4     136
April        65.7      3,238.8     117

TABLE 4 Monthly means for North Saskatchewan River (05DA009)

Month        Mean      Variance    Years to equal information
             (mg/L)    (σ²)        content, K (equation 7)
May          58.8      5,240.5     0.0
June         135.4     2,477.7     7.8
July         169.2     4,846.1     0.57
August       122.3     546.6       60.1
September    50.5      489.7       67.9
October      9.5       7.8         4,696
November     8.9       44.9        810
December     4.7       19.2        1,904
January      1.6       1.9         19,401
February     2.2       1.9         19,401
March        3.4       4.8         6,318
April        6.3       8.8         4,162

Table 5 contains the sample lengths required to estimate the mean
and the median with 10% accuracy and 95% confidence. These sample
lengths are quite high and obviously can be modified by assuming a
different accuracy and confidence.
TABLE 5 Sample length n required to determine the mean and median
values with an accuracy of 10% and 95% confidence

Station                  Data              Distribution   Sample length, n
                                                          mean       median
Kennebecasis River       Monthly, Annual   LN, N          5,610      1,100
N. Saskatchewan River    Annual            LN             17,340     1,530
Oldman River             Annual            LN             4,100      978

LN = log-normal; N = normal
CONCLUSIONS
Time-series analysis of monthly means revealed strong periodicity in
the sediment concentrations, especially for the two stations in
Alberta, where it accounted for 77 and 91% of the variance of the
series. Similarly, there were strong periodicities in the daily data.
There was no indication of trends in any of the three analyzed
series. The data exhibited a high variability and skewness and could
be approximated by a log-normal distribution function. The variation
of the information content throughout the year as well as the sample
lengths required for a given degree of accuracy were determined. The
monthly relative information content indicated that at the two
stations in Alberta, owing to large seasonal variations in sediment
concentrations, sampling could be reduced or even discontinued at
certain times of the year, usually corresponding to winter months.
Since the annual means at these two highly skewed stations are
log-normally distributed, with a correspondingly high coefficient of
variation, the sample lengths required for 10% accuracy in the mean
with 95% confidence are very large. The sample lengths required for
the median are also very large.
REFERENCES
Adamowski, K. (1983) Investigation and utilization of long-term
suspended sediment records. Report to Environment Canada.
Adamowski, K. (1984) Statistical analysis of sediment data record.
Report to Environment Canada.
Adamowski, K. & Kite, G.W. (1973) Stochastic analysis of Lake
Superior elevations for computation of relative crustal movement.
J. Hydrol. 18, 163-175.
Box, G.E.P. & Jenkins, G.M. (1976) Time series analysis - forecasting
and control. Holden Day.
Condie, R., Nix, G.A. & Boone, L.G. (1981) Flood damage reduction
program - flood frequency analysis. Engineering Hydrology
Section, Environment Canada.
Lall, V. & Beard, L.R. (1982) Assessment of hydrologic information
for rare event design. American Geophysical Union, Philadelphia,
Pennsylvania, U.S.A.
Salas, J.D., Delleur, J.W., Yevjevich, V. & Lane, W.L. (1980) Applied
modelling of hydrologic time series. Water Resources Publications.
Van Sickle, J. (1981) Long-term distributions of annual sediment
yields from small watersheds. Wat. Resour. Res. 17(3), 659-663.