
Modeling High-Frequency Returns: Empirical
Evidence Using Google and PepsiCo
Erik James Lehr
Econ 589
Abstract
In recent years, the volume and frequency of trading in equity markets have increased dramatically. As a result of this, along with the computerization of trading, researchers now have access to tick-by-tick data, with trade times accurate to the millisecond. This wealth of information has clear benefits for research, but it also brings new problems, including microstructure issues that can bias estimates or render the data inaccurate. This paper discusses ways to avoid these pitfalls when modeling high-frequency data by cleaning the data and using various sampling frequencies, and then demonstrates the approach by modeling return data from Google and PepsiCo stocks with simple ARCH and GARCH models to estimate conditional return volatility.
I: Introduction
Over the past several decades, trading in equity markets has increased exponentially. In January of
1990, about 181 million shares of companies in the Dow Jones Industrial Average (DJIA) were traded. By
March 2009, traded volume on the DJIA was over 9 billion.[1] With this massive increase in volume, the frequency of trades has risen sharply as well, giving rise to a group of investors who trade at high frequency specifically in order to capture profits. As a result, it is not unusual for frequently
traded stocks to be traded multiple times per second. This is understandably beneficial for financial
research, as much more data than was previously available can now be used in the modeling of prices
and returns. However, this additional data does not come without some negatives, the most prevalent
being market microstructure issues. The purpose of this essay is to discuss these issues, and address
possible ways to ameliorate negative effects they may have, and then demonstrate possible modeling
techniques for this type of data.
In this paper I use the Trades and Quotes (TAQ) database to download one month of tick data for Google and PepsiCo. These data are imported into the R programming language and cleaned using the RTAQ package. The cleaned price data are then converted into returns and sampled at various frequencies. The return data are then moved to EViews and modeled using basic ARCH and GARCH
models in order to get a more accurate estimate of conditional volatility. The estimated conditional
volatility is then compared to the absolute returns.
The remainder of the paper is organized as follows: the next section presents some theoretical
background of high-frequency modeling and market microstructure, as well as discussing relevant
literature. Section III gives a description of the data used, and describes the process of cleaning the
data. Section IV first tests some assumptions of classical finance theory, and then shows the results of
the data modeling, and Section V concludes.
II: Background and market microstructure discussion
Market microstructure is defined as “the area of finance that is concerned with the process by which
investors’ latent demands are ultimately translated into transactions.”[2] In other words, microstructure
involves analysis of how trades actually take place, and the issues related to this process. When dealing
[1] Source: http://finance.yahoo.com
[2] Madhavan, A. (2000), “Market microstructure: A survey,” Journal of Financial Markets, 3, 205-258.
with high-frequency data, the sheer number of trades can amplify some of the problems relating to
microstructure. Thus, researchers must take extra steps to ensure that the data used are valid.
Some of the problems are from incorrect or nonsensical data, such as an incorrectly entered price, a
price equal to zero, a price or spread well outside of the normal range for that asset at that particular
time, or a corrected or abnormal trade. Other microstructure issues can be more systemic, such as
different prices due to an asset being traded on different exchanges, or multiple trades occurring with
identical timestamps (though this may become less of a problem with the new millisecond TAQ data).
Similarly, measurement is complicated by non-synchronous trading: a trade does not occur every single second, so the raw series is unevenly spaced in time, which makes modeling difficult. Even if all of the above factors are dealt with, tick data are still susceptible to bid-ask
bounce, or the price appearing to bounce around depending on whether the trade recorded was a
purchase or a sale (meaning by the investor, as opposed to the market maker). A market maker will buy
at the bid price, and sell at the ask price, which will invariably be higher. This difference is known as the
bid-ask spread, and trading the spread is how market makers earn money.
Unfortunately, this difference in prices, which depends only on whether the investor bought or sold, inflates measured volatility, and the bias grows as the sampling frequency increases. Thus, in order to accurately model high-frequency data, these complications must be addressed.
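To see concretely why bid-ask bounce inflates volatility measured from tick data, consider a small simulation (a hypothetical illustration in R, not part of the analysis in this paper): the efficient price is held constant while the observed price simply alternates between bid and ask.

# Hypothetical illustration of bid-ask bounce, not taken from the paper's data.
set.seed(1)
n      <- 6.5 * 60 * 60                         # one trade per second over a 6.5-hour session
mid    <- 65                                    # constant efficient (mid) price
spread <- 0.02                                  # assumed 2-cent bid-ask spread
side   <- sample(c(-1, 1), n, replace = TRUE)   # buyer- vs seller-initiated trades
price  <- mid + side * spread / 2               # observed prices bounce between bid and ask

rv <- function(p, k) sum(diff(log(p[seq(1, length(p), by = k)]))^2)
c(tick = rv(price, 1), five_minute = rv(price, 300))
# The efficient price never moves, yet tick-by-tick sampling yields a large spurious
# realized variance; sampling every 5 minutes (every 300th observation) shrinks it sharply.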
In cleaning my data, I follow the approach of Barndorff-Nielsen et al. (2009),[3] which will be discussed further in the next section. In order to eliminate the effects of bid-ask bounce, the data must be sampled less frequently. Andersen and Bollerslev (1998)[4] suggest sampling every 5 minutes, and show that doing so is nearly always more than sufficient to stabilize volatility measurements. This less frequent sampling also eliminates the problem of non-synchronous return data in any frequently traded asset.[5] Following this approach, I sample at both 5 and 10 minute intervals and compare the results.
[3] Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009), “Realised kernels in practice: Trades and quotes,” Econometrics Journal, 12, 1-33.
[4] Andersen, T. G., and Bollerslev, T. (1998), “Answering the Skeptics: Yes, Standard Volatility Models Do Provide Accurate Forecasts,” International Economic Review, 39, 885-905.
[5] The obvious downside of this method is that it throws away large amounts of data. There are methods to incorporate all available data (e.g. Ait-Sahalia, Mykland, and Zhang (2005)), but those are outside the scope of this paper.
Once the data have been cleaned and sampled, I can proceed to modeling and estimation. When
analyzing return data, perhaps the most important quantity measured is volatility.[6] Volatility is used in nearly all types of asset pricing and analysis, from simple mean-variance analysis and portfolio theory to value-at-risk and the mathematically intensive Black-Scholes option pricing model. However, standard
methods for calculating volatility can give misleading results, depending on the process that is followed
by return data. For example, simple least squares regression of return data assumes both serially uncorrelated errors and constant error variance (i.e. no heteroskedasticity). Neither of these assumptions holds for typical return data, and thus standard OLS inference will be misleading. With respect to volatility in
particular, most asset returns display volatility clustering, meaning that a large change in returns is likely
to be followed by another large change, and similarly with small changes. Perhaps the most widely used
technique to account for this non-constant volatility is Robert Engle’s ARCH/GARCH framework. These
conditional heteroskedasticity models allow the variance of the error term of the regression to follow an
autoregressive process (as well as a moving average process in the case of GARCH), accounting for empirical properties like fat-tailed error terms and volatility clustering. While many extensions have been made to these workhorse models (e.g. IGARCH, PGARCH, TGARCH), Hansen and Lunde (2001)[7] show that even the most complicated models do not significantly outperform a GARCH(1,1) model. With this in mind, I will
use simple ARCH and GARCH models to estimate conditional volatility. Next, I will discuss the data used
and describe in depth the cleaning process.
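For reference, the conditional variance recursion behind these models is the standard Bollerslev (1986) GARCH(1,1) specification,

σ²_t = ω + α·ε²_{t−1} + β·σ²_{t−1},

where ε_t is the return innovation; the ARCH(1) model of Engle (1982) is the special case β = 0, in which today's variance responds only to yesterday's squared shock.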
III: Data
In this study I used one month's worth of tick data for Google and PepsiCo. The sample covered 20 trading days, from May 3 to May 28 of 2010.[8] I downloaded the data from the TAQ database using the Wharton
Research Data Services (WRDS) website. Next, I imported the data into R using the RTAQ package
written by Jonathan Cornelissen and Kris Boudt. As mentioned above, I followed the cleaning procedure
used by Barndorff-Nielsen, et al., addressing first the trades, followed by the quotes, and then merging
the data. First, all trades and quotes outside of normal trading hours (9:30 AM - 4:00 PM EST) were
eliminated. Next, any data point with a price equal to zero was removed. To avoid apparent price
differences due to trading on various exchanges, only trades or quotes from the most common exchange
[6] Volatility in finance can be measured as either variance or standard deviation, with the latter being somewhat more common among practitioners.
[7] Hansen, P. R. and Lunde, A. (2001), “A comparison of volatility models: Does anything beat a GARCH(1,1)?”, Working Paper Series No. 84, Aarhus School of Business, Centre for Analytical Finance, 1-41.
[8] May 31 was technically a weekday, but markets were not open due to the Memorial Day holiday.
were retained (NASDAQ in the case of Google, and NYSE for Pepsi). When multiple trades or multiple
quotes had the same time stamp, those observations were merged. For trades specifically, any trade with an abnormal sale condition, and any corrected trade, was deleted. For quotes, any quote with a negative bid-ask spread, or with a spread far too large relative to the median spread for that day, was eliminated as well. Finally, the trades and quotes were matched using the standard adjustment of pairing each trade with the quote in force two seconds earlier (as quotes are registered faster than trades).
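These steps are carried out with the RTAQ package; purely to illustrate the filters involved, the same rules could be written out by hand in base R roughly as follows. This is a sketch on hypothetical data frames (trades with columns TIME, PRICE, EX, and COND; quotes with BID and OFR), and the exchange and sale-condition codes shown are assumptions to be checked against the TAQ documentation, not values taken from the paper.

# Hypothetical sketch of the cleaning rules described above (not the RTAQ calls used in the paper).
clean_trades <- function(trades, exchange = "T") {
  t <- subset(trades, format(TIME, "%H:%M:%S") >= "09:30:00" &
                      format(TIME, "%H:%M:%S") <= "16:00:00")   # regular trading hours only
  t <- subset(t, PRICE > 0)                                     # drop zero prices
  t <- subset(t, EX == exchange)                                # keep a single exchange
  t <- subset(t, COND %in% c("", "E", "F"))                     # drop abnormal/corrected sale conditions
  aggregate(PRICE ~ TIME, data = t, FUN = median)               # merge trades sharing a timestamp
}

clean_quotes <- function(quotes, max_mult = 50) {
  q <- subset(quotes, BID > 0 & OFR > 0 & OFR > BID)            # positive prices, no negative spread
  spread <- q$OFR - q$BID
  q[spread <= max_mult * median(spread), ]                      # drop spreads far above the daily median
}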
With the TAQ data cleaned and matched, I was able to sample at different frequencies. The less frequent the sampling, the smoother the data series appears, as can be seen using the price data for May 3rd for each asset. First, I graph the price of Pepsi using tick data:
[Figure: Pepsi tick-by-tick price, May 3, 2010]
As can be seen from the above plot, there are numerous jumps at very small intervals over the course of the day, representing the movement caused by bid-ask bounce. This effect gives the price data a much choppier appearance than is actually representative of the change in prices. Compare this to the same data sampled at 5 and 10 minute intervals:
[Figure: Pepsi price sampled at 5-minute intervals (Pepsi_5Min), May 3, 2010]

[Figure: Pepsi price sampled at 10-minute intervals (Pepsi_10Min), May 3, 2010]
The localized spots of volatility have been smoothed away, giving a more accurate representation of the
path followed by the stock price. The same comparison holds for Google, looking at the tick-by-tick prices next to the 5 and 10 minute price data:
[Figure: Google tick-by-tick price, May 3, 2010]
As with Pepsi, we notice a large amount of volatility over very short periods of time, giving the
appearance of a price that experiences large jumps over small intervals. This effect is even more evident
in the price of Google, as the bid-ask spread is larger due to a significantly higher stock price (at the time
covered in this analysis, Google was trading at a price over 8 times the price of Pepsi). Once again, this
effect can be smoothed away through less frequent sampling, as is shown in the following two graphs.
[Figure: Google price sampled at 5-minute intervals (Google_5Min), May 3, 2010]

[Figure: Google price sampled at 10-minute intervals (Google_10Min), May 3, 2010]
The next step is to move the cleaned data into EViews, where it can be modeled using the methods
described above.
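The lower-frequency series themselves take only a few lines of R to construct before exporting; a minimal sketch using the xts package, assuming the cleaned Pepsi prices sit in an xts object called pep_clean (a hypothetical name), is:

library(xts)

# Previous-tick sampling: keep the last observed price in each 5- or 10-minute
# interval, then take log differences to obtain returns.
pep_5min  <- pep_clean[endpoints(pep_clean, on = "minutes", k = 5)]
ret_5min  <- diff(log(pep_5min))[-1]

pep_10min <- pep_clean[endpoints(pep_clean, on = "minutes", k = 10)]
ret_10min <- diff(log(pep_10min))[-1]

Keeping the last observed price in each interval is one common (previous-tick) convention.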
IV. Modeling and estimation of volatility
Now that the data have been cleaned, they can be modeled without having to account for possible biases caused by the market microstructure issues described above. However, before I begin model estimation, it is informative to make a quick detour and test a few assumptions of classical finance theory, particularly normally distributed returns and the Efficient Market Hypothesis[9] (EMH). First, I plot the Pepsi returns in a histogram to see whether the distribution appears to be approximately normal:[10]
[Figure: histogram of 5-minute Pepsi returns (PEP_5MIN)]
The return distribution is very clearly not normal, with an extremely skinny center, and accordingly fat
tails. To further confirm this non-normality, I use a quantile-quantile plot:
[9] Fama, Eugene F. (1970), “Efficient Capital Markets: A Review of Theory and Empirical Work,” Journal of Finance, XXV, no. 2.
[10] I use the returns sampled at 5 minutes for this analysis, as the sheer number of returns in the tick-by-tick data makes the graphs somewhat unreadable.
[Figure: normal quantile-quantile plot of 5-minute Pepsi returns (PEP_5MIN)]
If the returns followed a normal distribution, the points on the plot would closely follow the line in the middle of the graph, which they obviously do not. There are far too many outliers, both positive and negative, to conclude that this process is approximately normal. To put a final nail in the normal distribution coffin, I calculate the Jarque-Bera statistic. The J-B statistic, which should be at or around 0 for normally distributed data, is 483171.5. Having resoundingly rejected normality of returns,[11] I can now test market efficiency.
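These normality checks are straightforward to reproduce in R; a small sketch, with the 5-minute returns in a hypothetical numeric vector pep_ret, might look like:

library(tseries)                 # provides jarque.bera.test()

hist(pep_ret, breaks = 100, main = "PEP 5-minute returns")   # thin center, fat tails
qqnorm(pep_ret); qqline(pep_ret)                             # large departures from the reference line
jarque.bera.test(pep_ret)                                    # enormous statistic, so normality is rejected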
Market efficiency implies that asset prices (and therefore asset returns) are inherently unforecastable. Any new market information is instantly factored into the price of a security, so no trader should be able to consistently make money by trading on public information. This conclusion essentially marginalizes fields like fundamental and technical analysis, so naturally there have been numerous tests of the hypothesis. The test I use here is a variance ratio test. If markets are efficient, then prices should follow a random walk,[12] meaning the expectation of the price next period is equal to the price this period, and any change in price comes from an unpredictable noise process. In its least restrictive form, the Random Walk hypothesis states:
r_t = µ + ε_t    (1)
[11] As an aside, it is important to note that returns tend to display aggregated normality, meaning the longer the sample period, the more likely returns are to be normal. My sample period is extremely short, and the return data are (appropriately) extremely non-normal.
[12] Or, more accurately, a martingale process.
Where r_t represents the difference of logged prices, µ is the mean of returns, and ε_t is a random noise process whose observations are uncorrelated across time. Using a variance ratio test,[13] I can estimate the ratio of the variance of the compounded q-period return to q times the variance of a one-period return. This is far easier to understand in equation form:
VR(q) = Var(r_t(q)) / (q · Var(r_t)),  where r_t(q) = r_t + r_{t−1} + … + r_{t−q+1}    (2)
The VR test then calculates a standardized statistic based on VR(q) − 1, which is asymptotically distributed N(0, 1) under the null hypothesis that returns follow a martingale process. I perform this test on the Pepsi data:
Null Hypothesis: PEP_5MIN is a martingale
Date: 10/03/11  Time: 16:03
Sample: 1 1600; included observations: 1559 (after adjustments)
Heteroskedasticity robust standard error estimates
User-specified lags: 2 4 8 16

Joint Tests                 Value       df      Probability
Max |z| (at period 2)*      2.869702    1559    0.0163
As can be seen from the above table, the VR test returns a value of 2.8697, meaning I am able to reject the Random Walk hypothesis for Pepsi returns at the 5% level. Thus I conclude that intraday Pepsi returns are non-normal and do not follow a martingale process, which is somewhat at odds with classical finance theory.
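The mechanics of equation (2) are simple enough to verify outside EViews. The sketch below computes the basic homoskedastic Lo-MacKinlay variance ratio and z-statistic for a single lag q on a hypothetical return vector r; it omits the finite-sample corrections and the heteroskedasticity-robust standard errors used in the table above, so the numbers will not match exactly.

# Simple (homoskedastic) Lo-MacKinlay variance ratio; a sketch only.
vr_stat <- function(r, q) {
  n  <- length(r)
  mu <- mean(r)
  v1 <- sum((r - mu)^2) / n                        # variance of one-period returns
  rq <- rowSums(embed(r, q))                       # overlapping q-period returns
  vq <- sum((rq - q * mu)^2) / (length(rq) * q)    # per-period variance of q-period returns
  vr <- vq / v1
  z  <- (vr - 1) / sqrt(2 * (2 * q - 1) * (q - 1) / (3 * q * n))   # N(0,1) under the null
  c(VR = vr, z = z)
}
vr_stat(r, 2)   # compare with the "period 2" entry reported by EViews

The same function applies unchanged to the Google returns examined next.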
Next, I perform these same tests on the Google return data. First, I again test normality with a histogram and QQ-plot:
[13] Lo, A. W. and MacKinlay, A. C. (1988), “Stock market prices do not follow random walks: Evidence from a simple specification test,” Review of Financial Studies, 1(1), 41-66.
[Figure: histogram of 5-minute Google returns (GOOG_5MIN)]

[Figure: normal quantile-quantile plot of 5-minute Google returns (GOOG_5MIN)]
As we saw with Pepsi earlier, Google returns are very clearly not normally distributed, though unlike
before they seem to be somewhat skewed to the right. The Jarque-Bera statistic is 320218.6, further
confirming the non-normality of the data. Next, I perform a variance ratio test:
Null Hypothesis: GOOG_5MIN is a martingale
Date: 10/04/11  Time: 12:26
Sample: 1 1600; included observations: 1568 (after adjustments)
Heteroskedasticity robust standard error estimates
User-specified lags: 2 4 8 16

Joint Tests                 Value       df      Probability
Max |z| (at period 4)*      3.290357    1568    0.0040
This time, the value of the test is even higher, and we are able to reject the Random Walk hypothesis at
the 1% level. Thus, like Pepsi, Google data is non-normally distributed, and the returns do not follow a
random walk.
Next, I proceed to model estimation.[14] The two models I test are the ARCH(1) and the GARCH(1,1). The choice of orders may seem simplistic, but higher-order terms in each model result in a poorer fit according to the Akaike and Schwarz information criteria. First, I run each model on the Google data sampled at 5 minute intervals, and graph the estimated conditional volatility (standard deviation in this case):[15]
[14] To conserve space, I will only analyze Google in this section, but all the same graphs for Pepsi are included in the appendix. The results of that analysis are nearly identical to those with Google in terms of the conclusions drawn in this essay.
[15] As the focus of this paper is more on the cleaning and sampling of high-frequency data than model comparison, I will not spend much time describing the results of the model estimation beyond a comparison of conditional volatility graphs.
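The estimation itself is done in EViews; a roughly equivalent fit could be obtained in R with, for example, the rugarch package. The sketch below assumes the 5-minute Google returns are in a numeric vector goog_ret (a hypothetical name):

library(rugarch)

# GARCH(1,1) with a constant mean; set garchOrder = c(1, 0) for the ARCH(1) case.
spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                   mean.model     = list(armaOrder = c(0, 0), include.mean = TRUE))
fit  <- ugarchfit(spec, data = goog_ret)

plot(as.numeric(sigma(fit)), type = "l",
     main = "Conditional standard deviation")   # analogous to the EViews graphs below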
ARCH(1):

[Figure: conditional standard deviation from ARCH(1), Google 5-minute returns]

GARCH(1,1):

[Figure: conditional standard deviation from GARCH(1,1), Google 5-minute returns]
To see the effect of these models on volatility estimation, compare these graphs to that of absolute
returns:
[Figure: absolute 5-minute Google returns (AGOOG_5MIN)]
As is instantly evident from a comparison of the graphs, the conditional volatility estimates from the
ARCH and GARCH models are far smoother than the absolute returns. Next, I run the same tests with
the Google data sampled at a 10 minute frequency.
ARCH(1):

[Figure: conditional standard deviation from ARCH(1), Google 10-minute returns]

GARCH(1,1):

[Figure: conditional standard deviation from GARCH(1,1), Google 10-minute returns]
Again, compare these conditional volatility estimates to absolute returns:
[Figure: absolute 10-minute Google returns (AGOOG_10MIN)]
As before, the model estimates are far smoother than the absolute returns. However, it is interesting to note that there does not appear to be much difference between the conditional volatility graphs of returns sampled at 5 and 10 minutes. The implication is that the conditional volatility settles down at or before 5 minute intervals, meaning the market microstructure effects discussed above are no longer problematic at that sampling frequency. To get a better idea of these microstructure effects, I also estimated the ARCH and GARCH models on the cleaned tick data.
ARCH(1):

[Figure: conditional standard deviation from ARCH(1), Google tick-by-tick returns]

GARCH(1,1):

[Figure: conditional standard deviation from GARCH(1,1), Google tick-by-tick returns]
Again, the sheer number of data points (over 75,000) makes these graphs somewhat hard to interpret, but even a perfunctory glance shows that these estimates are far less smooth than those from the less frequently sampled data, reflecting the issues discussed in Section II (namely bid-ask bounce and non-synchronous data).
V: Concluding remarks
In this paper I described the process of downloading high-frequency data, cleaning it with the R package
RTAQ, and estimating conditional volatility with ARCH and GARCH models. Using return data from
Google and PepsiCo, I tried to highlight certain issues inherent in the modeling of high-frequency data,
including incorrectly recorded or nonsensical data points, problems with time stamps and non-synchronous data, and bid-ask bounce. I showed how some of these problems can be solved by simple data cleaning, while others need to be addressed through the choice of sampling frequency. Finally, I estimated conditional volatility using basic conditional heteroskedasticity models, and showed that they give much smoother estimates of volatility than simple sample measures such as absolute returns.
It is important to be aware of alternative data modeling techniques such as the ARCH/GARCH
framework when using return data, because the processes followed by typical return data are often not
as simple as basic finance theory would imply. My findings that the return data were non-normal and did not follow a martingale are certainly not unique, and it is important to account for this in model
estimation. Classical finance often uses simplified models of returns to avoid extremely messy and
inelegant math, but it is important when doing financial research that the models estimated are
appropriate to the data, as opposed to simply fitting the theory.
References
1) Ait-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005), “How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise,” Review of Financial Studies, 18, 351-416.
2) Andersen, T.G., and Bollerslev, T. (1998), “Answering the Skeptics: Yes, Standard Volatility
Models Do Provide Accurate Forecasts,” International Economic Review, 39, 885-905.
3) Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009), “Realised kernels in practice: Trades and quotes,” Econometrics Journal, 12, 1-33.
4) Bollerslev, T. (1986), "Generalized Autoregressive Conditional Heteroskedasticity," Journal of
Econometrics, 31, 307-327.
5) Engle, R. F. (1982), "Autoregressive Conditional Heteroskedasticity With Estimates of the
Variance of United Kingdom Inflation," Econometrica, 50, 987-1008.
6) Fama, Eugene F. (1970), “Efficient Capital Markets: A Review of Theory and Empirical Work,” Journal of Finance, XXV, no. 2.
7) Hansen, P. R. and Lunde, A. (2001), “A comparison of volatility models: Does anything beat a
GARCH(1,1)?”, Working Paper Series No. 84, Aarhus School of Business, Centre for Analytical
Finance, 1-41.
8) Lo, A. W. and MacKinlay, A. C. (1988), “Stock market prices do not follow random walks: Evidence from a simple specification test,” Review of Financial Studies, 1(1), 41-66.
9) Madhavan, A., (2000), “Market microstructure: A survey.” Journal of Financial Markets 3, 205–
258.
10) Zivot, E. (2005), “Analysis of High Frequency Financial Data: Models, Methods and Software. Part
II: Modeling and Forecasting Realized Variance Measures.” Unpublished.
Appendix
Graphs of conditional standard deviation from model estimation using Pepsi:
5 minute data
ARCH(1):

[Figure: conditional standard deviation from ARCH(1), Pepsi 5-minute returns]

GARCH(1,1):

[Figure: conditional standard deviation from GARCH(1,1), Pepsi 5-minute returns]
Absolute returns:

[Figure: absolute 5-minute Pepsi returns (APEP_5MIN)]
10 minute data
ARCH(1):

[Figure: conditional standard deviation from ARCH(1), Pepsi 10-minute returns]

GARCH(1,1):

[Figure: conditional standard deviation from GARCH(1,1), Pepsi 10-minute returns]
Absolute returns:

[Figure: absolute 10-minute Pepsi returns (APEP_10MIN)]