Lecture 3: Sample autocorrelation and testing for independence
STAT352: Applied Time Series
Tilman M. Davies
Dept. of Mathematics & Statistics
University of Otago
This Lecture
Estimation of autocovariance/autocorrelation
Testing for an independent series
Estimating dependence from data
Last lecture, we examined the first- and second-order properties of a time series, and defined the mean, covariance and correlation functions for some simple time series models. In practice, however, we don’t start with the model. We start with observed data.
To assess the strength and nature of the dependence in our data, we make use of the ‘sample’ analogues of the functions defined earlier for stationary time series models.
Sample estimators
Let x_1, …, x_N be observations of a time series. The stationary sample mean is
\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x_t .
The sample autocovariance function is
\hat{\gamma}(h) = \frac{1}{N} \sum_{t=1}^{N-|h|} (x_{t+|h|} - \bar{x})(x_t - \bar{x}), \qquad -N < h < N.
The sample autocorrelation function is
\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \qquad -N < h < N.
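As a concrete illustration, the following minimal Python sketch computes these quantities directly from a vector of observations. This is my own illustration, not part of the course notes; the helper names sample_acvf and sample_acf are made up, and simulated white noise is used purely as demonstration data.

import numpy as np

def sample_acvf(x, h):
    """Sample autocovariance at lag h (the 1/N version given above)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    h = abs(h)
    xc = x - x.mean()                       # centre by the sample mean x-bar
    return np.sum(xc[h:] * xc[:N - h]) / N

def sample_acf(x, max_lag):
    """Sample autocorrelation rho-hat(h) for h = 0, ..., max_lag."""
    gamma0 = sample_acvf(x, 0)
    return np.array([sample_acvf(x, h) / gamma0 for h in range(max_lag + 1)])

# Demonstration on simulated standard normal white noise
rng = np.random.default_rng(1)
w = rng.normal(size=50)
print(sample_acf(w, max_lag=15))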
Sample estimators (cont.)
Note that we can compute the sample estimates for any data set.
A common task is to plot the sample autocorrelation function (abbreviated to an ‘ACF plot’) and inspect it (a) for any evidence of independence in the data; and (b) to assess whether there exists a nice stationary model we can fit to the observed time series (subsequently used for prediction/forecasting).
Typically we only plot the ACF for h ≥ 0, since it is symmetric for negative values of h.
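For instance, in Python the statsmodels package provides plot_acf, which produces this kind of plot for h = 0, …, lags. A sketch on simulated white noise (the seed and number of lags are arbitrary choices of mine):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(1)
w = rng.normal(size=50)      # simulated standard normal white noise

plot_acf(w, lags=15)         # sample ACF for h = 0, ..., 15, with confidence bounds
plt.show()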
Example I: White noise sample ACF plot
As we would expect, the purely independent nature of white noise means that for lags h ≠ 0, the sample autocorrelation is very small.
[Figure: a realisation of the white noise series W_t against t (t = 0, …, 50), and its sample ACF plot against lag (lags 0–15).]
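A quick numerical check of this point (my own sketch; the seed and lag range are arbitrary): for simulated white noise, every ρ̂(h) with h > 0 is typically close to zero.

import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(size=50)                     # standard normal white noise W_t
N = len(w)

wc = w - w.mean()
acvf = np.array([np.sum(wc[h:] * wc[:N - h]) for h in range(16)]) / N
rho = acvf / acvf[0]                        # sample ACF for h = 0, ..., 15

print(np.round(rho[1:], 3))                 # small values, roughly of size 1/sqrt(N)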
ACF plot confidence bands
It can be shown, for an iid series with finite variance, that ρ̂(h) for h > 0 is approximately normally distributed with zero mean and variance 1/N as N becomes large.
The dashed horizontal lines on the ACF plot are typically included by default. They indicate a 95% confidence band corresponding to a null hypothesis of independent terms in the time series. Hence, they are computed as ±1.96/√N.
Any breach of these bands by ρ̂(h) for h > 0 constitutes evidence against the null hypothesis of independent terms.
This isn’t necessarily a bad thing... if we can fit a model which represents the observed dependence structure, we can provide sensible ‘future’ predictions of the series using that model.
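A small Python sketch of these bands and the breach check (my own illustration; the data are simulated white noise and the lag range is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)                    # illustrative data
N = len(x)

# Sample ACF for lags 0, ..., 15
xc = x - x.mean()
acvf = np.array([np.sum(xc[h:] * xc[:N - h]) for h in range(16)]) / N
rho = acvf / acvf[0]

band = 1.96 / np.sqrt(N)                   # 95% band under the iid null
breaches = np.where(np.abs(rho[1:]) > band)[0] + 1
print(f"band = ±{band:.3f}, lags breaching the band: {breaches}")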
Example II: Random walk sample ACF plot
A realisation of the nonstationary standard normal random walk model, and its ACF plot, is given below:
[Figure: the random walk realisation S_t against t (t = 0, …, 50), and its sample ACF plot against lag (lags 0–40).]
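For reference, such a realisation can be simulated in a couple of lines of Python (my own sketch, assuming the random walk from last lecture is the cumulative sum of standard normal white noise; the seed is arbitrary):

import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(size=50)      # standard normal white noise W_t
s = np.cumsum(w)             # random walk S_t = W_1 + ... + W_t

# The sample ACF of s decays very slowly from 1, reflecting the strong
# dependence (and nonstationarity) visible in the plot above.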
Example III: Standard normal MA(1) model
A realisation of the stationary standard normal first order moving average model with θ = −3, and its ACF plot, is given below:
[Figure: the MA(1) realisation X_t against t (t = 0, …, 50), and its sample ACF plot against lag (lags 0–50).]
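Such a realisation can be simulated as follows (my own sketch, assuming the MA(1) form X_t = W_t + θW_{t−1} with standard normal white noise, as defined last lecture; the seed is arbitrary):

import numpy as np

theta = -3.0
rng = np.random.default_rng(4)
w = rng.normal(size=51)             # W_0, W_1, ..., W_50
x = w[1:] + theta * w[:-1]          # X_t = W_t + theta * W_{t-1}, t = 1, ..., 50

# Under this model rho(1) = theta / (1 + theta^2) = -0.3 and rho(h) = 0 for h > 1,
# which is the pattern the sample ACF plot above is estimating.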
Interpreting an ACF plot
We are typically interested in two distinct features of a sample ACF plot:
1. The individual values close to h = 0, and
2. The behaviour of the correlations as a whole.
That is, we care less about individual correlation values for larger h than for smaller h.
The overall behaviour of the plot can aid in (a) selecting an appropriate model to represent our observed data, (b) assessing the presence of trend and/or seasonality in the observed data and thus, more generally, (c) indicating the possibility of nonstationarity.
Interpreting an ACF plot: important notes
VERY IMPORTANT:
Just because an ACF plot indicates statistically significant correlations does not necessarily mean the underlying process is nonstationary. A breach of the confidence bands simply provides you with evidence against the null hypothesis of purely independent terms.
Remember, stationary processes are of course allowed dependence; it’s just that the nature of that dependence has certain simple conditions attached to it.
It can be difficult to use the ACF plot alone to assess stationarity vs. nonstationarity, but there are certain features of such a plot that can be recognised as unique to either situation.
Testing the hypothesis of independent terms
Method 1: 95% ACF plot confidence bands.
Computation: ±1.96/√N, where N is the sample size.
Interpretation: Inspect the ACF plot and conclude statistically significant evidence against the null hypothesis of independent terms if more than 5% of the ρ̂(h) values exceed the limits.
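Expressed as a rough Python sketch (my own illustration, repeating the ad hoc sample ACF computation from earlier; simulated white noise and 15 lags are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)                    # illustrative data
N, max_lag = len(x), 15

xc = x - x.mean()
acvf = np.array([np.sum(xc[h:] * xc[:N - h]) for h in range(max_lag + 1)]) / N
rho = acvf / acvf[0]

band = 1.96 / np.sqrt(N)
prop_breaching = np.mean(np.abs(rho[1:]) > band)
print("evidence against independence" if prop_breaching > 0.05 else "retain the null")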
Testing the hypothesis of independent terms
Method 2: Ljung-Box test (single-value version of Method 1).
Computation: Under the null hypothesis, the test statistic
Q = N(N+2) \sum_{k=1}^{h} \frac{\hat{\rho}(k)^2}{N-k}
approximately follows a Chi-squared distribution with h degrees of freedom.
Interpretation: For a fixed value of h, compute Q; the p-value is then given by P(q > Q), where q ∼ χ²(h). Reject the null hypothesis of independent terms at the 5% level if P(q > Q) < 0.05.
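A minimal Python sketch of this calculation (my own illustration on simulated white noise; note that statsmodels also offers a ready-made Ljung-Box test, acorr_ljungbox, but the direct computation is shown here):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
x = rng.normal(size=50)                    # illustrative data
N, h = len(x), 15

# Sample ACF at lags 1, ..., h
xc = x - x.mean()
acvf = np.array([np.sum(xc[k:] * xc[:N - k]) for k in range(h + 1)]) / N
rho = acvf / acvf[0]

Q = N * (N + 2) * np.sum(rho[1:h + 1] ** 2 / (N - np.arange(1, h + 1)))
p_value = chi2.sf(Q, df=h)                 # P(q > Q), q ~ chi-squared(h)
print(Q, p_value)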
Testing the hypothesis of independent terms
Method 3: Rank test (can be useful for detecting a linear trend in the observed data).
Computation: Suppose the observations of our time series are denoted X = {x_1, …, x_N}. Let
P = \sum_{t=1}^{N-1} \sum_{u=t+1}^{N} 1[x_u > x_t]
be the total number of pairs with the ‘later’ value in the time series larger than the ‘earlier’ value in the time series, where 1[·] denotes the indicator function.
Testing the hypothesis of independent terms
Method 3: Rank test (cont.)
Furthermore, let
M_P = \frac{N(N-1)}{4} \quad \text{and} \quad V_P = \frac{N(N-1)(2N+5)}{72}.
Then, under the null hypothesis of iid terms, P is approximately normally distributed with mean M_P and standard deviation \sqrt{V_P}.
Interpretation: Reject the null hypothesis of independent terms at a 5% level of significance if |P − M_P|/√V_P > 1.96 (and you can get a p-value in the usual way from the standard normal distribution).
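A minimal Python sketch of the rank test (my own illustration on simulated data; the two-sided p-value shown matches the |z| > 1.96 rejection rule):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(size=50)                    # illustrative data
N = len(x)

# P = number of pairs (t, u) with t < u and x_u > x_t
P = sum(np.sum(x[t + 1:] > x[t]) for t in range(N - 1))

MP = N * (N - 1) / 4
VP = N * (N - 1) * (2 * N + 5) / 72
z = (P - MP) / np.sqrt(VP)
p_value = 2 * norm.sf(abs(z))              # two-sided p-value from N(0, 1)
print(z, p_value)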
Testing for iid terms in Example I: White noise
[Figure: the white noise realisation and its sample ACF plot, as shown in Example I.]
ACF plot test: NO EVIDENCE, RETAIN NULL
Ljung-Box test (h = 15): Q = 4.8944, p = 0.993. RETAIN NULL
Rank test: (P − M_P)/√V_P = 0.811, p = 0.209. RETAIN NULL
Testing for iid terms in Example II: Random walk
[Figure: the random walk realisation and its sample ACF plot, as shown in Example II.]
ACF plot test: STRONG EVIDENCE, REJECT NULL
Ljung-Box test (h = 15): Q = 191.0684, p < 0.00001. REJECT NULL
Rank test: (P − M_P)/√V_P = 3.873, p = 0.00005. REJECT NULL
Testing for iid terms in Example III: MA(1)
[Figure: the MA(1) realisation and its sample ACF plot, as shown in Example III.]
ACF plot test: WEAK EVIDENCE, REJECT NULL
Ljung-Box test (h = 15): Q = 26.3322, p = 0.1468. RETAIN NULL
Rank test: (P − M_P)/√V_P = 1.21, p = 0.225. RETAIN NULL
Next lecture...
What have we really been talking about when we’ve mentioned trend and seasonality?
Classical decomposition