First steps in time series
Time series look different
[Figure: two example time series at very different scales (y up to ≈300 000 and ≈1.4 × 10⁹), spanning 1940–2010]
But they are not

$$x_t = \text{systematic pattern} + \text{noise}$$

The noise obscures the pattern. The systematic pattern is often made of two basic components:

$$\text{trend} + \text{seasonality}$$

The trend follows some underlying linear/non-linear, time-changing (possibly unknown) model; the seasonality is periodic.
[Figure: monthly car production in Spain, Oct 1995 – Sep 2002]
Trend analysis
Smooth data using an underlying model
• Moving average
$$\mathrm{ma}(t, m) = \frac{1}{m} \sum_{i=t-m+1}^{t} x_i$$
• Exponential moving average
$$s_t = \alpha\, x_t + (1 - \alpha)\, s_{t-1}$$
Good for one-period ahead forecasting (weather)
• Fit any function
linear, splines, log, exp, your guess
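As an illustration, a minimal numpy sketch of the two smoothers above (the helper names are mine, not from the slides):

```python
import numpy as np

def ma(x, m):
    """Trailing moving average: ma(t, m) = (1/m) * sum_{i=t-m+1}^{t} x_i.
    Aligned with x; the first m-1 entries are undefined (NaN)."""
    out = np.full(len(x), np.nan)
    c = np.cumsum(np.insert(x, 0, 0.0))
    out[m - 1:] = (c[m:] - c[:-m]) / m
    return out

def ema(x, alpha):
    """Exponential moving average: s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    s = np.empty(len(x))
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

# Example: smooth a noisy monthly series and subtract the trend.
x = np.sin(np.linspace(0, 12 * np.pi, 72)) + np.random.default_rng(0).normal(0, 0.3, 72)
trend = ma(x, 12)
detrended = x - trend   # residual = seasonality + noise
```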
Detect and model the trend (e.g. a linear fit or ma(12)), then subtract it.

[Figure: series with linear and ma(12) trend estimates; detrended residuals after subtracting the trend]
After detrending, check for stationarity (of mean and variance).
Seasonality
Look for autocorrelations
• absolute values: $x_t, x_{t-1}, x_{t-2}, x_{t-3}, \ldots$
• increments: $x_t - x_{t-1},\; x_{t-1} - x_{t-2}, \ldots$
• other combinations of variables! (intuition + expertise)
Correlation of $x_t$ vs. $x_{t-\mathrm{lag}}$ (npat = number of available pairs at each lag):

lag   npat   corr
 1     58    -0.0397427758
 2     57    -0.274627552
 3     56    -0.265544135
 4     55     0.20316958
 5     54    -0.0689740424
 6     53    -0.0261346967
 7     52    -0.106164043
 8     51     0.242284075
 9     50    -0.245981648
10     49    -0.315532046
11     48    -0.0585207867
12     47     0.94479132
13     46    -0.0744984938
14     45    -0.255095906
15     44    -0.272366416
16     43     0.197473796
17     42     2.4789972E-05
18     41    -0.0635903421
19     40    -0.138785333
20     39     0.311166354
21     38    -0.224766545
22     37    -0.337246239
23     36    -0.056239266
24     35     0.916841357

Structure: seasonality (strong correlation at lags 12 and 24) with anticorrelations at intermediate lags.
Correlation of relative one-step increments, $\frac{x_t - x_{t-1}}{x_{t-1}}$ vs. $\frac{x_{t-\mathrm{lag}} - x_{t-\mathrm{lag}-1}}{x_{t-\mathrm{lag}-1}}$:

lag   npat   corr
 1     57    -0.390023157
 2     56    -0.101618385
 3     55    -0.184521702
 4     54     0.275796168
 5     53    -0.11691902
 6     52     0.0605651902
 7     51    -0.146978369
 8     50     0.309853501
 9     49    -0.18309118
10     48    -0.127671412
11     47    -0.35079422
12     46     0.944415972
13     45    -0.390599355

12-month correlation!
Correlation of relative two-step increments, $\frac{x_t - x_{t-2}}{x_{t-2}}$ vs. $\frac{x_{t-\mathrm{lag}} - x_{t-\mathrm{lag}-2}}{x_{t-\mathrm{lag}-2}}$:

lag   npat   corr
 1     56     0.0635841158
 2     55    -0.661845284
 3     54    -0.14900299
 4     53     0.236847899
 5     52     0.0775265432
 6     51    -0.1446445
 7     50     0.0427922726
 8     49     0.268605346
 9     48    -0.13164323
10     47    -0.660207622
11     46     0.0676404933
12     45     0.956921988
13     44     0.0444132255

2-month anticorrelation!
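A sketch of how such lag tables can be computed (my own helpers, assuming the series is a 1-D numpy array x):

```python
import numpy as np

def lag_corr(x, max_lag):
    """Correlation of x_t vs x_{t-lag} for lag = 1..max_lag.
    npat is the number of overlapping pairs at each lag."""
    rows = []
    for lag in range(1, max_lag + 1):
        a, b = x[lag:], x[:-lag]        # x_t and x_{t-lag}
        rows.append((lag, len(a), np.corrcoef(a, b)[0, 1]))
    return rows

def rel_inc(x, k):
    """Relative k-step increments (x_t - x_{t-k}) / x_{t-k}."""
    return (x[k:] - x[:-k]) / x[:-k]

# Levels, then 1-step and 2-step relative increments:
# for lag, npat, corr in lag_corr(x, 24): ...
# for lag, npat, corr in lag_corr(rel_inc(x, 1), 13): ...
# for lag, npat, corr in lag_corr(rel_inc(x, 2), 13): ...
```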
Danger: high correlation is not the same as low error (e.g. a series offset by a constant correlates perfectly yet predicts poorly).
Fourier analysis uncovers periodicity

[Figure: Fourier spectrum of the series; fundamental frequency 2π/48 ≈ 0.131, one tick = 2 samples]

More sophisticated analysis is possible but brings little further information.
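A quick periodogram sketch with numpy's FFT (detrend the series first, as above):

```python
import numpy as np

def periodogram(x):
    """Power spectrum of a (detrended) series; a seasonal period of 12
    samples shows up as a peak near frequency 1/12 cycles per sample."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                  # remove the constant component
    power = np.abs(np.fft.rfft(x)) ** 2
    freq = np.fft.rfftfreq(len(x))    # cycles per sample
    return freq, power

# freq, power = periodogram(detrended)
# freq[np.argmax(power[1:]) + 1] should be near 1/12 for monthly seasonality
```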
Reasonable bets:

$$x_t = f(x_{t-12})$$

$$\frac{x_t - x_{t-1}}{x_{t-1}} = f\!\left(\frac{x_{t-1} - x_{t-2}}{x_{t-2}},\; \frac{x_{t-1} - x_{t-13}}{x_{t-13}}\right)$$

$$\frac{x_t - x_{t-2}}{x_{t-2}} = f\!\left(\frac{x_{t-2} - x_{t-4}}{x_{t-4}},\; \frac{x_{t-2} - x_{t-14}}{x_{t-14}}\right)$$
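Building those candidate inputs is straightforward; a hypothetical sketch for the second bet (names are mine):

```python
import numpy as np

def bet_features(x):
    """Target and inputs for the bet
    (x_t - x_{t-1})/x_{t-1} = f(1-month and 12-month increments at t-1)."""
    t = np.arange(14, len(x))                  # need lags up to t-13
    target = (x[t] - x[t - 1]) / x[t - 1]
    f1 = (x[t - 1] - x[t - 2]) / x[t - 2]      # recent relative increment
    f2 = (x[t - 1] - x[t - 13]) / x[t - 13]    # year-on-year relative increment
    return np.column_stack([f1, f2]), target

# X, y = bet_features(x)   # feed X, y to a linear fit or a neural net
```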
Best linear fit + neural fit based on the last 12 months
[Figure: goal vs. linear and neural fits over the whole series]
Magnification of the last 12 months (8 train patterns + 4 predictions)

[Figure: goal vs. linear and neural predictions over months 1–12]
The linear fit depends heavily on one variable.
The neural net finds non-linear relations that enhance the correlations.
Simple is good

If in doubt, start with a simple dependency, e.g. $x_t = x_{t-\mathrm{lag}}$ (a minimal baseline sketch follows the list):

• lag = 1 day: weather forecast, donuts
• lag = 7 days: donuts, electricity load curve
• lag = 1 year: electricity load curve, sales
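The corresponding baseline forecast is essentially one line; a minimal sketch:

```python
import numpy as np

def naive_forecast(x, lag):
    """Predict x_t = x_{t-lag}; predictions are aligned with x[lag:]."""
    return x[:-lag]

# Evaluate the bet before trying anything fancier:
# pred = naive_forecast(x, 12)                      # lag = 1 year, monthly data
# mape = np.mean(np.abs((x[12:] - pred) / x[12:]))  # relative error of the baseline
```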
ARIMA

Define the backward shift operator
$$B x_t = x_{t-1},$$
the backward difference
$$\Delta x_t = x_t - x_{t-1} = (1 - B)\, x_t,$$
and polynomials of degree p and q, $\Phi_p(B)$ and $\Theta_q(B)$.

Auto-Regressive Integrated Moving Average time series models:
$$\Phi_p(B)\, \Delta^d x_t = \Theta_q(B)\, \varepsilon_t$$
AR(p): $\Phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_p B^p$
I(d): $\Delta^d$
MA(q): $\Theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \ldots - \theta_q B^q$
The zeros of these polynomials must lie outside the unit circle.
Example: ARIMA(1,0,0)
$$x_t = \phi_1 x_{t-1} + \varepsilon_t$$
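A sketch of this ARIMA(1,0,0) process, simulated and re-estimated by least squares (numpy only; a dedicated library such as statsmodels would also do):

```python
import numpy as np

rng = np.random.default_rng(1)
phi1, n = 0.7, 500          # |phi1| < 1: the zero of 1 - phi1*B lies outside the unit circle

# Simulate x_t = phi1 * x_{t-1} + eps_t
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi1 * x[t - 1] + rng.normal()

# Least-squares estimate of phi1 from the lag-1 regression
phi1_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(phi1_hat)             # should be close to 0.7
```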
NN enhancement

Rather than using a recursive NN:
• carry out the linear analysis
• preprocess the data ARIMA-style
• perform a linear forecast
• feed a NN with all of the linear analysis:
  • the preprocessed data
  • the linear prediction

The NN will learn (if any) the underlying law controlling the departures of the real data from the linear analysis. A minimal sketch follows.
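A minimal sketch of this hybrid scheme, assuming scikit-learn is available (model choices and sizes are illustrative, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

def hybrid_fit(X, y):
    """Linear forecast first; then a NN sees the preprocessed inputs plus
    the linear prediction and models the departures from it."""
    lin = LinearRegression().fit(X, y)
    y_lin = lin.predict(X)
    X_nn = np.column_stack([X, y_lin])   # inputs = preprocessed data + linear forecast
    nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                      random_state=0).fit(X_nn, y - y_lin)
    return lin, nn

def hybrid_predict(lin, nn, X):
    y_lin = lin.predict(X)
    return y_lin + nn.predict(np.column_stack([X, y_lin]))
```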
Leave-one-out NN

When the number of data points is very small: at step i, leave pattern i out, train on all the other patterns (Vars + Goal = Pattern 1 … Pattern n), and predict the pattern left out. Repeat for steps 1 … n and collect statistics on the n out-of-sample predictions. A sketch of the loop follows.
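A sketch of the leave-one-out loop, again with scikit-learn (an assumption, not prescribed by the slides):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPRegressor

def loo_predictions(X, y):
    """Step i: train on all patterns except i, predict pattern i.
    Returns one out-of-sample prediction per pattern for the statistics."""
    preds = np.empty(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        nn = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000,
                          random_state=0).fit(X[train_idx], y[train_idx])
        preds[test_idx] = nn.predict(X[test_idx])
    return preds

# errors = y - loo_predictions(X, y)   # collect statistics on these
```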
About noise

We have discarded the noise so far. Be careful: in finance,

asset1 = trend1 + noise1
asset2 = trend2 + noise2

and the correlations between the noises matter. A standard model for asset returns is

$$\frac{ds_t^i}{s_t^i} = \mu_i\, dt + \sigma\, dW_t^i,$$

with Brownian motion increments distributed as $N(0, \Delta t)$. Computing the full covariance matrix Cov(i, j) for ~3000 assets (3000 × 3000), e.g. for VaR, takes huge CPU time.
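For illustration, a sketch simulating a few correlated assets under this model (dimensions far below the 3000 of the slide; all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_assets, n_steps, dt = 3, 252, 1.0 / 252
mu = np.array([0.05, 0.03, 0.08])        # drifts (illustrative)
sigma = np.array([0.2, 0.15, 0.3])       # volatilities (illustrative)
corr = np.array([[1.0, 0.6, 0.2],
                 [0.6, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
L = np.linalg.cholesky(corr)             # correlate the Brownian increments

s = np.ones((n_steps + 1, n_assets))     # normalized prices, s_0 = 1
for t in range(n_steps):
    dW = L @ rng.normal(0.0, np.sqrt(dt), n_assets)   # dW ~ N(0, dt), correlated
    s[t + 1] = s[t] * (1.0 + mu * dt + sigma * dW)    # Euler step of ds/s = mu dt + sigma dW
```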
Summary
• Time series are often structured
• Analyze trend + seasonality + noise
• Build a linear model with preprocessed data
• Build NN on top of previous analysis