Limit Order Books - Chair of Quantitative Finance, MAS, ECP

Limit Order Books
Frédéric Abergel, Marouane Anane, Anirban Chakraborti,
Aymen Jedidi, Ioane Muni Toke
March 19, 2015
2
Contents
1
A short introduction to limit order books
1
I
Empirical properties of order driven markets
5
2
Statistical properties of limit order books: a survey
7
2.1
Time of arrivals of orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
2.2
Volume of orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
2.3
Placement of orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4
Cancellation of orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5
Intraday seasonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3
The order book shape as a function of the average size of limit orders
21
4
Empirical evidence of market making and market taking
29
4.1
Arrival times of orders: event time is not enough . . . . . . . . . . . . . . . . . . . . . 29
4.2
Empirical evidence of “market making” . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3
A reciprocal “market following limit” effect ? . . . . . . . . . . . . . . . . . . . . . . . 32
II
Mathematical modelling of limit order books
35
5
Agent-based modeling of limit order books: a survey
37
5.1
Agent-based models as a mechanism for stylized facts . . . . . . . . . . . . . . . . . . . 37
5.2
Early order-driven market modelling: market microstructure and policy issues . . . . . . 39
5.3
5.2.1
A pioneer order book model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.2
Microstructure of the double auction . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2.3
Zero-intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Order-driven market modelling in Econophysics . . . . . . . . . . . . . . . . . . . . . . 41
5.3.1
The order book as a reaction-diffusion model . . . . . . . . . . . . . . . . . . . 41
5.3.2
Introducing market orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3.3
The order book as a deposition-evaporation process . . . . . . . . . . . . . . . . 44
5.4
Empirical zero-intelligence models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.5
Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3
6
The mathematical structure of zero-intelligence limit order book models
6.1
An elementary approximation: perfect market making . . . . . . . . . . . . . . . . . . . 51
6.2
Order book dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3
6.4
7
9
Model setup: Poissonian arrivals, reference frame and boundary conditions . . . 53
6.2.2
Analytical treatment of some limit order book models . . . . . . . . . . . . . . 54
6.2.3
Evolution of the order book . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.2.4
Infinitesimal generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2.5
Price dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Ergodicity and diffusive limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.3.1
Ergodicity of the order book and rate of convergence to the stationary state . . . 60
6.3.2
Large-scale limit of the price process . . . . . . . . . . . . . . . . . . . . . . . 61
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
65
7.1
The shape of a limit order book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.2
A link between the flows of order and the shape of an order book . . . . . . . . . . . . . 67
7.2.1
The basic queueing system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.2.2
A continuous extension of the basic model . . . . . . . . . . . . . . . . . . . . 69
Comparison to existing results on the shape of the order book . . . . . . . . . . . . . . . 71
7.3.1
Numerically simulated shape in [303] . . . . . . . . . . . . . . . . . . . . . . . 71
7.3.2
Empirical and analytical shape in [49] . . . . . . . . . . . . . . . . . . . . . . . 72
7.4
A model with varying sizes of limit orders . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.5
Influence of the size of limit orders on the shape of the order book . . . . . . . . . . . . 78
7.6
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Advanced modelling of limit order books
8.1
III
6.2.1
The order book as a queueing system
7.3
8
51
83
Towards non-trivial behaviours: modelling market interactions . . . . . . . . . . . . . . 83
8.1.1
Herding behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.1.2
Fundamentalists and trend followers . . . . . . . . . . . . . . . . . . . . . . . . 84
8.1.3
Model setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.1.4
The infinitesimal generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.2
Stability of the order book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.3
Large scale limit of the price process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Simulation of limit order books
93
Numerical simulation of limit order books
9.1
Zero-intelligence limit order book simulator . . . . . . . . . . . . . . . . . . . . . . . . 96
9.1.1
9.2
95
Results for CAC 40 stocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Simulation of a limit order book modelled by Hawkes processes . . . . . . . . . . . . . 113
9.2.1
Adding dependence between order flows . . . . . . . . . . . . . . . . . . . . . 113
4
9.2.2
Numerical results on the order book . . . . . . . . . . . . . . . . . . . . . . . . 114
10 A generic limit order book simulator
119
10.1 Discrete event simulation components . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.1.1 Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.1.2 Timer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.2 Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
10.2.1 Basic order functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
10.2.2 Market orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.2.3 Limit orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.2.4 Iceberg orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.2.5 Cancel orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.2.6 Limit market orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.2.7 Always best orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.2.8 Virtual market orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.3 Order book and order queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.3.1 Order book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
10.3.2 Local order book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
10.3.3 Order queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
10.3.4 Remote order book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
10.4 Traders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
10.5 Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
10.5.1 Strategy base classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
10.5.2 Concrete strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
10.5.3 Arbitrage trading strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
10.5.4 Adaptive strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
IV
Imperfection and predictability in order-driven markets
11 Market imperfection and predictibility
141
143
11.1 Forecasting and market efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
11.2 Data, methodology and performances measures . . . . . . . . . . . . . . . . . . . . . . 144
11.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
11.2.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
11.2.3 Performance measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
11.3 Conditional probability matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
11.3.1 Binary method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
11.3.2 Four-class method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
11.4 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.4.1 Ordinary least squares (OLS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
11.4.2 Ridge regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5
11.4.3 Least Absolute Shrinkage and Selection Operator (LASSO) . . . . . . . . . . . 158
11.4.4 ELASTIC NET (EN) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
V
Appendices
165
A Limit order book data
167
A.1 Limit order book data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
A.2 Stylized facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
A.3 Market taking and making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
B Some useful mathematical notions
171
B.1 Point processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
B.1.1
Hawkes processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
B.2 Ergodic theory for Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.2.1
The ergodic theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.2.2
Stochastic stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
C Comparison of various prediction methods
177
C.1 Results for the binary classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
C.2 Results for the four-class classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
C.3 Performances of the OLS method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
C.4 Performances of the ridge method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
C.5 Performances of the LASSO method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6
Foreword
The chair of quantitative finance has been created at École Centrale Paris in 2007. Since its inception, it
has devoted most of its research activities to the study of high frequency financial data. In particular, a
sizeable portion of these researches have focused on the characterization of limit order books, the central
object in every modern, electronic financial market. Due to its relevance to the understanding of financial markets, the limit order book has triggered quite a huge amount of research in the past twenty years,
starting with the seminal work of [40] on the empirical analysis of the Paris exchange, and revitalized
a few years later in a fascinating manner - in our opinion - in [303]. However, much as this subject is
interesting, important and challenging, we realized that there was still no reference book on the subject,
and we therefore decided to assemble in a single document a survey of the existing litterature on the
subject, together with our own contributions on limit order books, whether they are pertaining to their
statistical properties, mathematical modeling or numerical simulation.
A large part of the material covered in this online book has been already presented in research or survey
articles, in particular [67][68][7][8] [22][252][253] but it could not be found in a single place where
the emphasis was set on a single object of interest. We hope that this work will help cover this gap in
the abundant literature on market microstructure and limit order books, and that the reader - researcher,
graduate student or praticioner - will appreciate finding most of the contemporary knowledge on this
essential component of modern financial markets.
Finally, we also warn the reader that most sections are constantly being refurbished/reorganized/rewritten,
and that this online book may evolve for a long time before it reaches a satisfactory, stationary state!
i
ii
Chapter 1
A short introduction to limit order books
The vast majority of organized, electronic markets, in particular, all equity, futures and other listed
derivatives markets, use a device called the limit order book in order to store in their central computer,
the list of all the interests of market participants. A limit order book is essentially a file in a computer
that contains all the orders sent to the market, with their characteristics such as the sign of the order (buy
or sell), the price, the quantity, a timestamp giving the time the order was recorded by the market,
and many other, often market-dependent information. In other words, the limit order book contains, at
any given point in time, on a given market, the list of all the transactions that one could possibly enter
into on this market. As such, the limit order books contains all the information available on a specific
market, and moreover, its evolution in time describes the way the market moves under the influence of
its participants. In particular, one can obviously extract the price dynamics from that of the limit order
book, but of course, the latter is a much more complete, and complex object than the former. A market in
which buyers and sellers meet via a limit order book is called an order-driven market. In order-driven
markets, buy and sell orders are matched as they arrive over time subject to some priority rules. Priority
is always based on price, and then, in most markets, on time1 , according to a FIFO 2 rule. Essentially,
three types of orders can be submitted:
• Limit order: an order specifying a price at which one is willing to buy or sell a certain number of
shares - the limit order book is the list of all buy and sell limit orders, with their corresponding
price and quantity, at any point in time;
• Market order: an order to immediately buy or sell a certain quantity at the best available opposite
quote;
• Cancellation order: an order cancelling an existing limit order.
In the microstructure literature, agents who submit limit orders are referred to as liquidity providers.
Those who submit market orders are referred to as liquidity takers. In real markets, especially since the
various deregulation waves that hit the US markets in 2005 and the european markets in 2007, see for
instance [9], there is no such thing as a pure liquidity provider or taker, and this classification should
1
2
There are some noticeable exceptions of markets following a pro rata rule, but we will not focus on them
first in, first out
1
(1) initial state
(2) liquidity is taken
(3) wide spread
40
40
40
30
30
30
20
20
20
10
10
10
0
0
0
-10
-10
-10
-20
-20
-20
-30
-30
-30
-40
-40
95
100
105
110
-40
95
100
105
110
95
(4) liquidity returns
(5) liquidity returns
40
40
40
30
30
30
20
20
20
10
10
10
0
0
0
-10
-10
-10
-20
-20
-20
-30
-30
-30
-40
-40
95
100
105
110
100
105
110
(6) final state
-40
95
100
105
110
95
100
105
110
Figure 1.1: Order book schematic illustration: a buy market order arrives and removes liquidity from
the ask side, then sell limit orders are submitted and liquidity is restored.
be understood as a convenient shorthand rather than a realistic description of the behaviour of market
participants. Note that, depending on the market under consideration, there may exist variations of these
three basic types described above, and we will at times consider more sophisticated types of orders.
Limit orders are stored in the order book until they are either executed against an incoming market order
or canceled. The ask price PA (or simply the ask) is the price of the best (i.e. lowest) limit sell order.
The bid price PB is the price of the best (i.e. highest) limit buy order. The gap between the bid and the
ask
S := PA − PB ,
(1.0.1)
is always positive and is called the spread. Prices are not continuous, but rather have a discrete resolution
∆P, the tick, which represents the smallest quantity by which they can change. We define the mid-price
as the average between the bid and the ask
P :=
PA + P B
.
2
(1.0.2)
The price dynamics is the result of the interplay between the incoming order flow and the order
book [50]. Figure 1.1 is a schematic illustration of this process [138]. Note that we chose to represent
quantities on the bid side of the book by non-positive numbers.
2
It is clear that the study of the empirical properties, as well as the mathematical modelling and
numerical simulation, of limit order books is of paramount importance for the researcher keen on understanding financial markets. Our approach is to start with the data, trying to assess their empirical
properties; then move on to mathematical models able to reproduce the observed properties; and finally,
to present a framework for numerical simulations.
Following empirical studies, the modelling of limit order book has been an area of intense research activity in the last 15 years. The remarkable interest in this area may have been partly motivated by sound
scientific curiosity, but it has very definitely been enhanced by the rapidly growing use of algorithmic
trading and high frequency trading. Schematically, two modeling approaches have been successful in
capturing key properties of the order book—at least partially. The first approach, led by economists,
models the interactions between rational agents who act strategically: they choose their trading decisions as solutions to individual utility maximization problems (see e.g. [266] and references therein).
In the second approach, proposed by econophysicists, agents are described statistically. In the most
basic expression in this line of research, they are supposed to act randomly. This approach is sometime
referred to as zero-intelligence order book modeling, in the sense that the arrival times and placements
of orders of various types are random and independent, the focus being more on the “mechanistic” aspects of the continuous double auction rather than the strategic interactions between agents. Despite this
apparent limitation, statistical order book models do capture many salient features of real markets, and
exhibit interesting, non-trivial mathematical properties that form the basis of a thorough understanding
of limit order books.
Of course, the practical interest of the mathematical models would be very limited, especially in the
numerous cases where closed-form solutions are not available, if one could not simulate, and analyze
the simulations of, limit order books. Besides presenting the numerical analysis of various models of
limit order books, we also introduce a very general, flexible open-source library which can be used by
researchers interested in the study of trading strategies in an order-driven market.
The organization of this book naturally follows the scientific approach described above : the first part
is devoted to the empirical properties of order driven markets, the second part, to the mathematical
modelling of limit order books and the third, to their numerical analysis. A fourth part considers the
subject with a slightly different, more applied angle, that of market imperfections. Each part starts with
a survey of the existing scientific literature on the corresponding topic, and then continues with our own
contributions.
3
4
Part I
Empirical properties of order driven
markets
5
Chapter 2
Statistical properties of limit order books:
a survey
The content of this chapter is based on two survey articles [67][68], updated to incorporate the contributions in the recent scientific literature on limit order books. The basic statistical properties of limit order
books, as can be observed from real data, are described and studied. Variables crucial to a fine modeling
of order flows and dynamics of order books are studied: time of arrival of orders, placement of orders,
size of orders, shape of order book, etc.
The computerization of financial markets in the second half of the 1980’s provided the empirical scientists with easier access to extensive data on order books. [40] is an early study of the new data flows on
the newly (at that time) computerized Paris Bourse. Many subsequent papers offer complementary empirical findings and modeling, e.g. [159], [73], [241], [49], [275]. In this chapter, we try to summarize
some of these empirical facts. The data set that we use is made precise in Appendix A.
Let us recall that the markets we are dealing with are electronic order books with no official market maker, in which orders are submitted in a double auction and executions follow price/time priority. Different market mechanisms have been widely studied in the microstructure literature, see e.g.
[149, 213, 154, 261, 39, 169]. We will not review this literature here (except [149] in Chapter 5), as this
would be too large a digression.
2.1
Time of arrivals of orders
The choice of the time count might be of prime importance when dealing with “stylized facts” of
empirical financial time series. Several results pertaining to the subordination of stochastic processes
([85, 299]) have already demonstrated that the Poisson hypothesis for the arrival times of orders is not
empirically verified.
We compute the empirical distribution for interarrival times – or durations – of market orders on the
stock BNP Paribas using our data set described in the previous section. The results are plotted in figures
2.1 and 2.2, both in linear and log scale. It is clearly observed that the exponential fit is not a good one.
We check however that the Weibull distribution fit is potentially a very good one. Weibull distributions
have been suggested for example in [195]. [269] also obtain good fits with q-exponential distributions.
7
100
BNPP.PA
Lognormal
Exponential
Weibull
Empirical density
10-1
10-2
10-3
10-4
10-5
10-6
1
10
100
Interarrival time
Figure 2.1: Distribution of interarrival times for stock BNPP.PA in log-scale.
In the Econometrics literature, these observations of non-Poissonian arrival times have given rise to
a large trend of modelling of irregular financial data. [117] and [116] have introduced autoregressive
condition duration or intensity models that may help modelling these processes of orders’ submission.
See [174] for a textbook treatment.
Using the same data, we compute the empirical distribution of the number of transactions in a given
time period τ. Results are plotted in figure 2.3. It seems that the log-normal and the gamma distributions
are both good candidates, however none of them really describes the empirical result, suggesting a
complex structure of arrival of orders. A similar result on Russian stocks was presented in [109].
2.2
Volume of orders
Empirical studies show that the unconditional distribution of order size is very complex to characterize.
[159] and [241] observe a power law decay with an exponent 1 + µ ≈ 2.3 − 2.7 for market orders and
1 + µ ≈ 2.0 for limit orders. [73] emphasize on a clustering property: orders tend to have a “round” size
in packages of shares, and clusters are observed around 100’s and 1000’s. As of today, no consensus
emerges in proposed models, and it is plausible that such a distribution varies very wildly with products
and markets.
In figure 2.4, we plot the distribution of volume of market orders for the four stocks composing
8
0.5
BNPP.PA
Lognormal
Exponential
Weibull
Empirical density
0.45
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
0
2
4
6
8
10
12
14
Interarrival time
Figure 2.2: Distribution of interarrival times for stock BNPP.PA (Main body, linear scale).
9
0.06
τ = 1 minute
τ = 5 minutes
τ = 15 minutes
τ = 30 minutes
Empirical density
0.05
0.04
0.03
0.02
0.01
0
0
1
2
3
4
5
6
Normalized number of trades
Figure 2.3: Distribution of the number of trades in a given time period τ for stock BNPP.PA. This
empirical distribution is computed using data from 2007, October 1st until 2008, May 31st.
10
our benchmark. Quantities are normalized by their mean. Power-law coefficient is estimated by a Hill
estimator (see e.g. [183, 108]). We find a power law with exponent 1 + µ ≈ 2.7 which confirms studies
previously cited. Figure 2.5 displays the same distribution for limit orders (of all available limits). We
find an average value of 1 + µ ≈ 2.1, consistent with previous studies. However, we note that the power
law is a poorer fit in the case of limit orders: data normalized by their mean collapse badly on a single
curve, and computed coefficients vary with stocks.
1
Power law ∝ x-2.7
Exponential ∝ e-x
BNPP.PA
FTE.PA
RENA.PA
SOGN.PA
0.1
Probability functions
0.01
0.001
0.0001
1e-05
1e-06
0.01
0.1
1
10
Normalized volume for market orders
100
1000
Figure 2.4: Distribution of volumes of market orders. Quantities are normalized by their mean.
2.3
Placement of orders
Placement of arriving limit orders [49] observe a broad power-law placement around the best quotes
on French stocks, confirmed in [275] on US stocks. Observed exponents are quite stable across stocks,
but exchange dependent: 1 + µ ≈ 1.6 on the Paris Bourse, 1 + µ ≈ 2.0 on the New York Stock Exchange,
1 + µ ≈ 2.5 on the London Stock Exchange. [248] propose to fit the empirical distribution with a Student
distribution with 1.3 degree of freedom.
We plot the distribution of the following quantity computed on our data set, i.e. using only the first
five limits of the order book: ∆p = b0 (t−) − b(t) (resp. a(t) − a0 (t−)) if an bid (resp. ask) order arrives
at price b(t) (resp. a(t)), where b0 (t−) (resp.a0 (t−)) is the best bid (resp. ask) before the arrival of this
order. Results are plotted on figures 2.6 (in semilog scale) and 2.7 (in linear scale).
11
These graphs
1
Power law ∝ x-2.1
Exponential ∝ e-x
BNPP.PA
FTE.PA
RENA.PA
SOGN.PA
0.1
Probability functions
0.01
0.001
0.0001
1e-05
1e-06
1e-07
0.01
0.1
1
10
Normalized volume of limit orders
100
1000
Figure 2.5: Distribution of normalized volumes of limit orders. Quantities are normalized by their mean.
being computed with incomplete data (five best limits), we do not observe a placement as broad as in
[49]. However, our data makes it clear that fat tails are observed. We also observe an asymmetry in the
empirical distribution: the left side is less broad than the right side. Since the left side represent limit
orders submitted inside the spread, this is expected. Thus, the empirical distribution of the placement
of arriving limit orders is maximum at zero (same best quote). We then ask the question: How is it
translated in terms of shape of the order book ?
Average shape of the order book Contrary to what one might expect, it seems that the maximum of
the average offered volume in an order book is located away from the best quotes (see e.g. [49]). Our
data confirms this observation: the average quantity offered on the five best quotes grows with the level.
This result is presented in figure 2.8. We also compute the average price of these levels in order to plot a
cross-sectional graph similar to the ones presented in [40]. Our result is presented for stock BNP.PA in
figure 2.9 and displays the expected shape. Results for other stocks are similar. We find that the average
gap between two levels is constant among the five best bids and asks (less than one tick for FTE.PA, 1.5
tick for BNPP.PA, 2.0 ticks for SOGN.PA, 2.5 ticks for RENA.PA). We also find that the average spread
is roughly twice as large the average gap (factor 1.5 for FTE.PA, 2 for BNPP.PA, 2.2 for SOGN.PA, 2.4
for RENA.PA).
12
Probability density function
100
BNPP.PA
Gaussian
Student
10-1
10-2
10-3
10-4
10-5
10-6
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
∆p
Figure 2.6: Placement of limit orders using the same best quote reference in semilog scale. Data used
for this computation is BNP Paribas order book from September 1st, 2007, until May 31st, 2008.
2.4
Cancellation of orders
[73] show that the distribution of the average lifetime of limit orders fits a power law with exponent
1 + µ ≈ 2.1 for cancelled limit orders, and 1 + µ ≈ 1.5 for executed limit orders. [248] find that in either
case the exponential hypothesis (Poisson process) is not satisfied on the market.
We compute the average lifetime of cancelled and executed orders on our dataset. Since our data
does not include a unique identifier of a given order, we reconstruct life time orders as follows: each
time a cancellation is detected, we go back through the history of limit order submission and look
for a matching order with same price and same quantity. If an order is not matched, we discard the
cancellation from our lifetime data. Results are presented in figure 2.10 and 2.11. We observe a power
law decay with coefficients 1 + µ ≈ 1.3 − 1.6 for both cancelled and executed limit orders, with little
variations among stocks. These results are a bit different than the ones presented in previous studies:
similar for executed limit orders, but our data exhibits a lower decay as for cancelled orders. Note that
the observed cut-off in the distribution for lifetimes above 20000 seconds is due to the fact that we do
not take into account execution or cancellation of orders submitted on a previous day.
13
Probability density function
0.4
BNPP.PA
Gaussian
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
-0.1
-0.05
0
0.05
0.1
∆p
Figure 2.7: Placement of limit orders using the same best quote reference in linear scale. Data used for
this computation is BNP Paribas order book from September 1st, 2007, until May 31st, 2008.
2.5
Intraday seasonality
Activity on financial markets is of course not constant throughout the day. Figure 2.12 (resp. 2.13) plots
the (normalized) number of market (resp. limit) orders arriving in a 5-minute interval. It is clear that
a U-shape is observed (an ordinary least-square quadratic fit is plotted): the observed market activity is
larger at the beginning and the end of the day, and more quiet around mid-day. Such a U-shaped curve
is well-known, see [40], for example. On our data, we observe that the number of orders on a 5-minute
interval can vary with a factor 10 throughout the day.
[73] note that the average number of orders submitted to the market in a period ∆T vary wildly
during the day. The authors also observe that these quantities for market orders and limit orders are
highly correlated. Such a type of intraday variation of the global market activity is a well-known fact,
already observed in [40], for example.
14
1.15
Average numbers of shares (Normalized by mean)
1.1
1.05
1
0.95
0.9
0.85
0.8
BNPP.PA
FTE.PA
RENA.PA
SOGN.PA
0.75
0.7
-6
-4
-2
0
2
4
Level of limit orders (negative axis : bids ; positive axis : asks)
Figure 2.8: Average quantity offered in the limit order book.
15
6
Average numbers of shares (Normalized by mean)
720000
700000
680000
660000
640000
620000
600000
BNPP.PA
580000
66.98
67
67.02
67.04
67.06
67.08
67.1
67.12
Level of limit orders (negative axis : bids ; positive axis : asks)
Figure 2.9: Average limit order book: price and depth.
16
67.14
67.16
1
Power law ∝ x-1.4
BNPP.PA
FTE.PA
RENA.PA
SOGN.PA
0.1
Probability functions
0.01
0.001
0.0001
1e-05
1e-06
1e-07
1
10
100
1000
Lifetime for cancelled limit orders
10000
Figure 2.10: Distribution of estimated lifetime of cancelled limit orders.
17
1
Power law ∝ x-1.5
BNPP.PA
FTE.PA
RENA.PA
SOGN.PA
0.1
Probability functions
0.01
0.001
0.0001
1e-05
1e-06
1e-07
1
10
100
1000
Lifetime for executed limit orders
10000
Figure 2.11: Distribution of estimated lifetime of executed limit orders.
18
Number of market orders submitted in ∆t=5 minutes
3
2.5
2
1.5
1
0.5
BNPP.PA
BNPP.PA quadratic fit
FTE.PA
FTE.PA quadratic fit
0
30000
35000
40000
45000
50000
Time of day (seconds)
55000
60000
Figure 2.12: Normalized average number of market orders in a 5-minute interval.
19
65000
3.5
Number of limit orders submitted in ∆t=5 minutes
3
2.5
2
1.5
1
0.5
BNPP.PA
BNPP.PA quadratic fit
FTE.PA
FTE.PA quadratic fit
0
30000
35000
40000
45000
50000
Time of day (seconds)
55000
60000
Figure 2.13: Normalized average number of limit orders in a 5-minute interval.
20
65000
Chapter 3
The order book shape as a function of the
average size of limit orders
This short chapter presents an empirical study of the influence of the size of limit orders on the shape
of the limit order book. It adresses the same point as, and confirms the theoretical findings of, the
model introduced and studied in 7. We use Thomson-Reuters tick-by-tick data for fourteen stocks traded
on the Paris stock exchange, from January 4th, 2010 to February 22nd, 2010. The fourteen stocks
under investigation are: Air Liquide (AIRP.PA, chemicals), Alstom (ALSO.PA, transport and energy),
Axa (AXAF.PA, insurance), BNP Paribas (BNPP.PA, banking), Bouygues (BOUY.PA, construction,
telecom and media), Carrefour (CARR.PA, retail distribution), Danone (DANO.PA, milk and cereal
products), Lagardère (LAGA.PA, media), Michelin (MICP.PA, tires manufacturing), Peugeot (PEUP.PA,
vehicles manufacturing), Renault (RENA.PA, vehicles manufacturing), Sanofi (SASY.PA, healthcare),
Vinci (SGEF.PA, construction and engineering), Ubisoft (UBIP.PA, video games). All these stocks
except Ubisoft were included in the CAC 40 French index in January and February 2010, i.e. they
are among the largest market capitalizations and most liquid stocks on the Paris stock exchange (see
descriptive statistics below).
All movements on the first 10 limits of the ask side and the bid side of the order book are available,
which allows us to reconstruct the evolution of the first limits of the order book during the day. Each
trading day is divided into 12 thirty-minute intervals from 10am to 4pm. We obtain T = 391 intervals
for each stock1 . For each interval t = 1, . . . , T , and for each stock k = 1, . . . , 14, we compute the
µ
λ and market orders N
λ
total number of limit orders Nk,t
k,t and the average sizes of limit orders Vk,t and
µ
market orders Vk,t . Table 3.1 gives the average number of orders and their volumes (the overline denotes
P
µ
µ
the average over the time intervals: N k = T1 Tt=1 Nk,t , and similarly for other quantities). The lowest
average activity is observed on UBIP.PA and LAGA.PA (which are the only two stocks in the sample
with less than 200 market orders and 4000 limit orders in average). The highest activity is observed on
BNPP.PA (which is the only stock with more than 600 market orders and 6000 limit orders in average).
The smallest sizes of orders are observed on AIRP.PA (108.25 and 205.4 for market and limit orders),
and the largest sizes are observed on AXA.PA (535.6 and 876.6 for market and limit orders).
1
Three and a half days of trading are missing or with incomplete information in our dataset: January 15th, the morning of
January 21st, February 18th and 19th.
21
Stock k
AIRP.PA
ALSO.PA
AXAF.PA
BNPP.PA
BOUY.PA
CARR.PA
DANO.PA
LAGA.PA
MICP.PA
PEUP.PA
RENA.PA
SASY.PA
SGEF.PA
UBIP.PA
Nkµ
Vkµ
Nkλ
Vkλ
(min, max)
290.9
59
1015
349.7
68
1343
521.8
119
2169
773.7
160
4326
218.3
38
1021
309.5
49
1200
385.7
86
2393
140.8
22
544
301.2
45
1114
294.9
57
1170
463.2
103
1500
494.0
106
1722
366.1
70
1373
153.3
18
998
(min,
max)
108.2
60.2
236.9
181.3
75.7
310.8
535.6
311.4 1037.4
203.1
113.1
379.5
227.4
113.1
421.9
275.3
156.1
517.3
214.4
112.6
820.3
201.5
87.2
376.8
137.9
74.4
235.4
333.2
159.1
657.7
266.9
133.7
477.2
241.4
128.7
511.7
188.4
107.7
452.3
384.9
198.4
675.1
(min,
max)
4545.4
972
24999
5304.2
817
31861
4560.6
1162 20963
6586.0
946
42939
4544.5
652
25546
4391.4
813
14752
4922.1
1372 18186
3429.0
632
11319
5033.5
1012 23799
3790.6
967
14053
4957.6
1383 27851
4749.1
986
22373
5309.9
983
21372
1288.9
240
7078
(min,
max)
205.4
153.6
295.7
289.2
234.5
389.6
876.6
603.4 1648.3
277.6
189.2
518.7
369.7
274.5
475.2
513.3
380.8
879.6
393.1
286.1
537.4
338.2
219.0
504.0
240.7
178.0
315.1
536.7
316.8
828.3
384.3
283.1
539.8
421.2
311.1
670.2
353.4
274.2
515.1
754.0
451.4 1121.0
Table 3.1: Basic statistics on the number of orders and the average volumes of orders per 30-minute time
interval for each stock.
22
Figure 3.1: Mean-scaled shapes of the cumulative order book as a function of the distance (in ticks)
from the opposite best price for the 14 stocks studied.
We also compute the average cumulative order book shape at 1 to 10 ticks from the best opposite
side. The average cumulative depth Bik,t is thus the quantity available in the order book for stock k in the
price range {b(t) + 1, . . . , b(t) + i} (in ticks) for ask limit orders, or in the price range {a(t) − i, . . . , a(t) − 1}
for bid limit orders, averaged over time during interval t. For i lower or equal to 10, our data is always
complete since the first ten limits are available. For larger i however, we may not have the full data:
10+ j
Bk,t
is not exact if the spread reaches a level lower or equal to j ticks during interval t. Hence i is
lower or equal to 10 in the following empirical analysis. Figure 3.1 plots the empirical average shapes
P
i
10
Bk = T1 Tt=1 Bik,t (arbitrarily scaled to Bk = 1 for easier comparison).
Motivated by the theoretical results from Chapter 7, we investigate the influence of the number of
λ and their average size V λ on the depth on the order book at the first limits. We expect
limit orders Nk,t
k,t
a negative relationship with the number of limit orders and a positive relationship with the size of these
orders, all others things being equal, i.e. the global market activity being held constant. A fairly natural
proxy for the market activity is the traded volume (transactions) per period. The total volume of market
µ
µ
orders submitted during interval t for the stock k is Nk,t Vk,t . We may also consider the total volume of
λ Vλ .
incoming orders Nk,t
k,t
Therefore, we have the following regression model for some book depth Bk,t :
µ
µ
λ
λ
λ λ
Bk,t = β1 Nk,t
+ β2 Vk,t
+ β3 Nk,t
Vk,t + β4 Nk,t Vk,t + k,t .
(3.0.1)
We test this regression using Bk,t = B5k,t (cumulative depth up to 5 ticks away from the best opposite
price), then using Bk,t = B10
k,t (cumulative depth up to 10 ticks away from the best opposite price) and
23
Figure 3.2: Mean-scaled number market orders as a function of the time of day (in seconds) for the 14
stocks studied.
5
finally using Bk,t = B10
k,t − Bk,t (cumulative volume between 6 and 10 ticks away from the best opposite
λ Vλ .
price). For each of these three models, we provide results with or without the interaction term Nk,t
k,t
All models are estimated as panel regression models with fixed effects, i.e. the error term k,t is the sum
of a non-random stock specific component δk (fixed effect) and a random component ηk,t . While the
regression coefficients are stock-independent, the variables δk translate the idiosyncratic characteristics
of each stock.
Note that we use data from 10am to 4pm each day, in order to avoid very active periods, right
after the opening of the market or before its closing, in which our stationary Poisson model might not
be relevant. By concentrating on the heart of the trading day, we focus on smoother variations of the
studied variables. However, even with this restriction, the data may nonetheless exhibit some intraday
seasonality, such as the well-known U-shaped curve of the number of submissions of orders, or the fact
that the order book seems to grow slightly fuller during the day and decline in the end. This intraday
seasonality may be observed by computing, for each stock, the (intraday) seasonal means of the variables
of the model, e.g. the average of a given variable for a given stock at a given time of the day across the
whole sample. This is illustrated for example in figures 3.2 and 3.3 where we plot, for the 14 stocks
studied, the (mean-scaled) seasonal averages of the number of submitted market orders and the (meanscaled) seasonal averages of the cumulative size of the order book up to the tenth limit. For the sake
of completeness, we run the statistical regressions defined at equation (3.0.1) both on raw data and on
deseasonalized data (by subtracting the seasonal mean). Results are given in table 3.2 in the first case,
and in table 3.3 in the latter one.
λ V λ and irrespective of the deIn all cases, irrespective of the presence of the interaction term Nk,t
k,t
24
25
λ
λ
Without Nk,t
Vk,t
Est.
-0.24455434
8.70817318
—
0.00882725
0.079275
156.617
λ
λ
Without Nk,t
Vk,t
Est.
-0.4576647
39.4968315
—
0.0097127
0.14202
301.1
λ
λ
Without Nk,t
Vk,t
Est.
-0.21311036
30.78865834
—
0.00088543
0.1491
318.743
t-value
-7.1143
27.4930
—
0.7487
t-value
-10.4212
24.0566
—
5.6016
Std Err.
0.0439167
1.6418268
—
0.0017339
Std Err.
0.02995514
1.11987302
—
0.00118269
t-value
-12.177
11.598
—
11.132
Std Err.
0.02008400
0.75084050
—
0.00079296
Sig.
***
***
—
***
< 2.2e-16
***
p-value
1.27e-12
< 2.2e-16
—
0.4541
< 2.2e-16
Sig.
***
***
—
***
***
< 2.2e-16
p-value
< 2.2e-16
< 2.2e-16
—
2.228e-08
Sig.
***
***
—
***
p-value
< 2.2e-16
< 2.2e-16
—
< 2.2e-16
λ
λ
With Nk,t
Vk,t
Est.
-8.6511e-02
1.0374e+01
-4.8483e-04
1.0794e-02
0.083945
124.994
λ
λ
With Nk,t
Vk,t
Est.
-0.25599592
41.62260408
-0.00061866
0.01222198
0.14303
228.534
λ
λ
With Nk,t
Vk,t
Est.
-0.16948494
31.24851002
-0.00013383
0.00142825
0.14925
239.292
Std Err.
0.05389411
1.21538277
0.00013744
0.00130749
Std Err.
0.07895182
1.78046702
0.00020134
0.00191540
Std Err.
3.6046e-02
8.1288e-01
9.1924e-05
8.7448e-04
t-value
-3.1448
25.7108
-0.9737
1.0924
t-value
-3.2424
23.3774
-3.0727
6.3809
t-value
-2.4000
12.7622
-5.2743
12.3430
< 2.2e-16
p-value
0.001671
< 2.2e-16
0.330234
0.274723
< 2.2e-16
p-value
0.001192
< 2.2e-16
0.002132
1.906e-10
< 2.2e-16
p-value
0.01643
< 2.2e-16
1.384e-07
< 2.2e-16
***
Sig.
**
***
***
Sig.
**
***
**
***
***
Sig.
*
***
***
***
Table 3.2: Panel regression results for the models defined in equation (3.0.1), using raw data, for Bk,t = B5k,t (top panel), Bk,t = B10
k,t (middle panel) and
10
5
Bk,t = Bk,t − Bk,t (lower panel). In all case, data has 14 stocks and T = 391 time intervals, i.e. 5474 points.
B5k,t
Variables
λ
Nk,t
λ
Vk,t
λ
λ
Nk,t
Vk,t
µ µ
Nk,t
Vk,t
R2
F-statistic
B10
k,t
Variables
λ
Nk,t
λ
Vk,t
λ
λ
Nk,t
Vk,t
µ µ
Nk,t
Vk,t
R2
F-statistic
5
B10
k,t − Bk,t
Variables
λ
Nk,t
λ
Vk,t
λ
λ
Nk,t
Vk,t
µ µ
Nk,t Vk,t
R2
F-statistic
5
Bk,t
Variables
λ
Nk,t
λ
Vk,t
λ
λ
Nk,t
Vk,t
µ µ
Nk,t
Vk,t
R2
F-statistic
10
Bk,t
Variables
λ
Nk,t
Vλ
k,t
λ
λ
Nk,t
Vk,t
µ µ
Nk,t
Vk,t
R2
F-statistic
10
5
Bk,t
− Bk,t
Variables
λ
Nk,t
λ
Vk,t
λ
N λ Vk,t
k,t
µ µ
Nk,t
Vk,t
R2
F-statistic
λ
λ
Without Nk,t
V
k,t
Est.
Std Err.
-0.2693183 0.0199388
14.8579856 0.6885427
—
—
-0.0167731 0.0036252
0.10337
209.718
λ
λ
Without Nk,t
Vk,t
Est.
Std Err.
-0.6663429 0.0423425
50.9522688 1.4622037
—
—
-0.0429541 0.0076985
0.20566
470.96
λ
λ
Without Nk,t
V
k,t
Est.
Std Err.
-0.3970246 0.0293712
36.0942832 1.0142678
—
—
-0.0261809 0.0053401
0.20463
467.995
t-value
-13.5072
21.5789
—
-4.6268
t-value
-15.7370
34.8462
—
-5.5795
t-value
-13.5175
35.5865
—
-4.9027
< 2.2e-16
p-value
< 2.2e-16
< 2.2e-16
—
9.728e-07
< 2.2e-16
p-value
< 2.2e-16
< 2.2e-16
—
2.528e-08
< 2.2e-16
p-value
< 2.2e-16
< 2.2e-16
—
3.799e-06
***
Sig.
***
***
—
***
***
Sig.
***
***
—
***
***
Sig.
***
***
—
***
λ
λ
With Nk,t
V
k,t
Est.
-0.2712167
15.2819415
-0.0017696
-0.0064651
0.10875
166.433
λ
λ
With Nk,t
Vk,t
Est.
-0.67127949
52.05475178
-0.00460179
-0.01614842
0.2128
368.73
λ
λ
With Nk,t
V
k,t
Est.
-0.4000628
36.7728102
-0.0028322
-0.0096833
0.21026
363.153
Std Err.
0.0198836
0.6905058
0.0003085
0.0040367
Std Err.
0.04216152
1.46416246
0.00065416
0.00855951
Std Err.
0.0292738
1.0166060
0.0004542
0.0059431
t-value
-13.6402
22.1315
-5.7361
-1.6016
t-value
-15.9216
35.5526
-7.0347
-1.8866
t-value
-13.6662
36.1721
-6.2356
-1.6293
< 2.2e-16
p-value
< 2.2e-16
< 2.2e-16
4.84e-10
0.1033
< 2.2e-16
p-value
< 2.2e-16
< 2.2e-16
2.243e-12
0.05927
< 2.2e-16
p-value
< 2.2e-16
< 2.2e-16
1.021e-08
0.1093
***
Sig.
***
***
***
***
Sig.
***
***
***
.
***
Sig.
***
***
***
5 (top panel), B
10
Table 3.3: Panel regression results for the models defined in equation (3.0.1), using deseasonalized data, for Bk,t = Bk,t
k,t = Bk,t (middle
10 − B5 (lower panel). In all case, data has 14 stocks and T = 391 time intervals, i.e. 5474 points.
panel) and Bk,t = Bk,t
k,t
26
Figure 3.3: Mean-scaled shapes of the cumulative order book up to 10 ticks away from the opposite best
price as a function of the time of day (in seconds), for the 14 stocks studied.
seasonalization, we observe a positive relationship between the order book depth and the average size
λ , and a negative relationship between the order book depth and the number of limit
of limit orders Vk,t
λ . Note also that the estimated β’s for these two quantities are all significant to the highest level
orders Nk,t
in all the cases using deseasonalized data, and to the 0.2% for all but one case using the raw data (2%
level in this worst case).
Therefore, all others things being equal, it thus appears we have identified the effect described in the
theoretical model of Chapter 7, in particular in section 7.4: for a given total volume of arriving limit
µ
µ
λ V λ and N V , the relative size of the limit orders has a strong influence on the
and market orders Nk,t
k,t
k,t k,t
average shape of the order book: the larger the arriving orders, the deeper the order book ; the more
arriving limit orders, the shallower the order book. The average shape of the order book is deeper when
a few large limit orders are submitted, than when many small limit orders are submitted.
Remark 3.0.1 As a side comment, we may also look at the influence of the global volume of trades
µ
µ
Nk,t Vk,t . Using raw data, we observe a positive influence of this term on the depth of the order book.
At a first glance, this is in line with the phenomenon observed for example in [255], where a positive
relationship between the number of trades and the order book slope of the first 5 limits of the order
book is exhibited. This apparent similarity is to be taken with great caution, since the slope and the
depth are two different variables: their relationship and the possible link between their evolutions is not
known. Note in particular that the positive influence of the global volume of trades does not hold using
deseasonalized data in our sample. A second observation made by [255] is that the influence of market
activity on the order book depth is much stronger closer to the spread, and then decresases when taking
into account further limits. We also observe this phenomenon using raw data: the lower panel of table
27
3.2 shows that this influence is not significant anymore if we take only the "furthest" limits, i.e. the limits
between 6 and 10 ticks away from the best price. This is however unclear with deseasonalized data.
28
Chapter 4
Empirical evidence of market making and
market taking
In this chapter, based on [253], we present an empirical study of the time intervals between orders of
various types on several order-driven markets (equity, bond futures, index futures), with an emphasis of
the mutual excitement of orders. This empirical study will lay the ground for the modeling approach
presented in Chapter 8 and the numerical simulations presented in Chapter 9, where in both cases a
basic order book model is enhanced with a dependence structure of arrival times stemming from these
empirical findings.
4.1
Arrival times of orders: event time is not enough
As seen in Chapter 2, the Poisson hypothesis for the arrival times of orders of different kinds does not
stand under careful scrutiny. However, whereas there already exists a vast literature documenting the
dependence structure of the order flow - the arrival of market orders - , see e.g. [47], little is known on
the interplay between liquidity taking and providing, one of the most important features of order-driven
markets.
As of today, the study of arrival times of orders in an order book has not been a primary focus
in order book modelling. Many toy models leave this dimension aside when trying to understand the
complex dynamics of an order book. In most order driven market models such as [92, 232, 18], and
in some order book models as well (e.g.[279]), a time step in the model is an arbitrary unit of time
during which many events may happen. We may call that clock aggregated time. In most order book
models such as [73, 248, 97], one order is simulated per time step with given probabilities, i.e. these
models use the clock known as event time. In the simple case where these probabilities are constant and
independent of the state of the model, such a time treatment is equivalent to the assumption that order
flows are homogeneous Poisson processes. A likely reason for the use of non-physical time in order
book modelling – leaving aside the fact that models can be sufficiently complicated without adding
another dimension – is that many puzzling empirical observations (see e.g. [67] for a review of some
of the well-known “stylized facts”) can be made in event time (e.g. autocorrelation of the signs of limit
and market orders) or in aggregated time (e.g. volatility clustering).
29
However, it is clear that physical (calendar) time has to be taken into account for the modelling of
a realistic order book model. For example, market activity varies widely, and intraday seasonality is
often observed as a well known U-shaped pattern. Even for a short time scale model – a few minutes, a
few hours – durations of orders (i.e. time intervals between orders) are very broadly distributed. Hence,
the Poisson assumption and its exponential distribution of arrival times has to be discarded, and models
must take into account the way these irregular flows of orders affect the empirical properties studied on
order books.
Let us give one illustration. On figure 4.1, we plot examples of the empirical density function of the
observed spread in event time (i.e. spread is measured each time an event happens in the order book),
and in physical (calendar) time (i.e. measures are weighted by the time interval during which the order
book is idle). It appears that density of the most probable values of the time-weighted distribution is
0.25
Event time spread
Calendar time
0.2
100
10-1
10-2
10-3
10-4
10-5
10-6
0.15
0.1
0
0.05
0.1
0.15
0.2
0.05
0
0
0.02 0.04 0.06 0.08
0.1
0.12 0.14 0.16 0.18
0.2
0.22
Figure 4.1: Empirical density function of the distribution of the bid-ask spread in event time and in
physical (calendar) time. In inset, same data using a semi-log scale. This graph has been computed with
15 four-hour samples of tick data on the BNPP.PA stock (see section A.3 for details).
higher than in the event time case. Symmetrically, the density of the least probable event is even smaller
when physical time is taken into account. This tells us a few things about the dynamics of the order
book, which could be summarized as follows: the wider the spread, the faster its tightening. We can
get another insight of this empirical property by measuring on our data the average waiting time before
the next event, conditionally on the spread size. When computed on the lower one-third-quantile (small
spread), the average waiting time is 320 milliseconds. When computed on the upper one-third-quantile
(large spread), this average waiting time is 200 milliseconds. These observations complement some of
the ones that can be found in the early paper [40].
30
4.2
Empirical evidence of “market making”
Our idea for an enhanced model of order streams is based on the following observation: once a market
order has been placed, the next limit order is likely to take place faster than usual. To illustrate this, we
compute for all our studied assets:
• the empirical probability density function (pdf) of the time intervals of the counting process of all
orders (limit orders and market orders mixed), i.e. the time step between any order book event
(other than cancellation)
• and the empirical density function of the time intervals between a market order and the immediately following limit order.
If an independent Poisson assumption held, then these empirical distributions should be identical. However, we observe a very high peak for short time intervals in the second case. The first moment of these
empirical distributions is significant: one the studied assets, we find that the average time between a
market order and the following limit order is 1.3 (BNPP.PA) to 2.6 (LAGA.PA) times shorter than the
average time between two random consecutive events.
On the graphs shown in figure 4.2, we plot the full empirical distributions for four of the five studied
assets1 . We observe their broad distribution and the sharp peak for the shorter times: on the Footsie
future market for example, 40% of the measured time steps between consecutive events are less that 50
milliseconds ; this figure jumps to nearly 70% when considering only market orders and their following limit orders. This observation is an evidence for some sort of market-making behaviour of some
participants on those markets. It appears that the submission of market orders is monitored and triggers
automatic limit orders that add volumes in the order book (and not far from the best quotes, since we
only monitor the five best limits).
In order to confirm this finding, we perform non-parametric statistical test on the measured data.
For all four studied markets, omnibus Kolmogorov-Smirnov and Cramer-von Mises tests performed
on random samples establish that the considered distributions are statistically different. If assuming a
common shape, a Wilcoxon-Mann-Withney U test clearly states that the distribution of time intervals
between a market orders and the following limit order is clearly shifted to the left compared to the
distributions of time intervals between any orders, i.e. the average “limit following market” reaction
time is shorter than the average time interval between random consecutive orders.
Note that there is no link with the sign of the market order and the sign of the following limit order.
For example for the BNP Paribas (resp. Peugeot and Lagardere) stock, they have the same sign in 48.8%
(resp. 51.9% and 50.7%) of the observations. And more interestingly, the “limit following market”
property holds regardless of the side on which the following limit order is submitted. On figure 4.3,
we plot the empirical distributions of time intervals between a market order and the following limit
order, conditionally on the side of the limit order: the same side as the market order or the opposite one.
It appears for all studied assets that both distributions are roughly identical. In other words, we cannot
distinguish on the data if liquidity is added where the market order has been submitted or on the opposite
1
Observations are identical on all the studied assets.
31
0.6
0.7
AllOrders
MarketNextLimit
AllOrders
MarketNextLimit
0.6
0.5
100
100
0.5
-1
10
0.4
10-1
-2
10
10-2
0.4
-3
10-3
10
0.3
10-4
0.01
0.1
1
10-4
0.01
0.3
10
0.1
1
10
0.2
0.2
0.1
0.1
0
0
0
0.1
0.2
0.3
0.4
0.5
0.4
0
0.1
0.2
0.3
0.4
0.7
AllOrders
MarketNextLimit
AllOrders
MarketNextLimit
0.35
0.6
100
0.3
100
0.5
-1
10
0.25
10-2
-1
10
10-2
0.4
10-3
0.2
10-3
-4
10
0.15
0.01
0.1
1
10-4
0.01
0.3
10
0.1
1
10
0.2
0.1
0.1
0.05
0
0
0
0.1
0.2
0.3
0.4
0.5
0
0.1
0.2
0.3
0.4
Figure 4.2: Empirical density function of the distribution of the time intervals between two consecutive
orders (any type, market or limit) and empirical density function of the distribution of the time intervals
between a market order and the immediately following limit order. X-axis is scaled in seconds. In
insets, same data using a log-log scale. Studied assets: BNPP.PA (top left), LAGA.PA (top right), FEIZ9
(bottom left), FFIZ9 (bottom right).
side. Therefore, we do not infer any empirical property of placement: when a market order is submitted,
the intensity of the limit order process increases on both sides of the order book.
This effect we have thus identified is a phenomenon of liquidity replenishment of an order book after
the execution of a trade. The fact that it is a bilateral effect makes its consequences similar to “market
making”, event though there is obviously no market maker involved on the studied markets.
4.3
0.5
A reciprocal “market following limit” effect ?
We now check if a similar or opposite distortion is to be found on market orders when they follow limit
orders. To investigate this, we compute for all our studied assets the “reciprocal” measures:
• the empirical probability density function (pdf) of the time intervals of the counting process of all
orders (limit orders and market orders mixed), i.e. the time step between any order book event
32
0.5
0.7
0.7
Any limit order
Same side limit order
Opposite side limit order
0.6
Any limit order
Same side limit order
Opposite side limit order
0.6
100
0.5
100
0.5
-1
10
10-1
-2
10
0.4
10-2
0.4
-3
10-3
10
10-4
0.01
0.3
0.1
1
10-4
0.01
0.3
10
0.2
0.2
0.1
0.1
0
0.1
1
10
0
0
0.1
0.2
0.3
0.4
0.5
0
0.1
0.2
0.3
0.4
Figure 4.3: Empirical density function of the distribution of the time intervals between a market order
and the immediately following limit order, whether orders have been submitted on the same side and on
opposite sides. X-axis is scaled in seconds. In insets, same data using a log-log scale. Studied assets:
BNPP.PA (left), LAGA.PA (right).
(other than cancellation)
• and the empirical density function of the time step between a market order and the previous limit
order.
As previously, if an independent Poisson assumption held, then these empirical distribution should be
identical. Results for four assets are shown on figure 4.4. Contrary to previous case, no effect is very
easily interpreted. For the three stocks (BNPP.PA, LAGA.PA and PEUP.PA (not shown)), it seems that
the empirical distribution is less peaked for small time intervals, but the difference is much less important
than in the previous case. As for the FEI and FFI markets, the two distributions are even much closer.
Non-parametric tests confirms these observations.
Performed on data from the three equity markets, Kolmogorov tests indicate different distributions
and Wilcoxon tests enforce the observation that time intervals between a limit order and a following
market order are stochastically larger than time intervals between unidentified orders. As for the future
markets on Footsie (FFI) and 3-month Euribor (FEI), Kolmogorov tests does not indicate differences
in the two observed distributions, and the result is confirmed by a Wilcoxon test that concludes at the
equality of the means. Note that [23], using a different methodology akin to that of [113], does show a
rather clear dependance between limit orders that move the price, and subsequent market orders.
33
0.5
0.5
0.5
AllOrders
PreviousLimitMarket
0.4
AllOrders
PreviousLimitMarket
0.4
100
100
-1
10-1
10
-2
10
0.3
10-2
0.3
10-3
10-3
10-4
0.01
0.2
0.1
1
10
10-4
0.01
0.2
0.1
0.1
1
10
0.1
0
0
0
0.1
0.2
0.3
0.4
0.5
0.5
0
0.1
0.2
0.3
0.4
0.5
0.5
AllOrders
PreviousLimitMarket
0.4
AllOrders
PreviousLimitMarket
0.4
100
100
-1
-1
10
10
-2
10
0.3
10-2
0.3
-3
10-3
10
-4
10
0.2
0.01
0.1
1
10
10-4
0.01
0.2
0.1
0.1
1
10
0.1
0
0
0
0.1
0.2
0.3
0.4
0.5
0
0.1
0.2
0.3
0.4
Figure 4.4: Empirical density function of the distribution of the time intervals between two consecutive
orders (any type, market or limit) and empirical density function of the distribution of the time intervals
between a limit order and an immediately following market order. In insets, same data using a log-log
scale. Studied assets: BNPP.PA (top left), LAGA.PA (top right), FEIZ9 (bottom left), FFIZ9 (bottom
right).
34
0.5
Part II
Mathematical modelling of limit order
books
35
Chapter 5
Agent-based modeling of limit order
books: a survey
In this chapter, we review agent-based models of limit order books, that is, models depicting, at the
individual agent level, possibly from a statistical point of view, the interactions that lead to a transaction
on a financial market. This survey is far from exhaustive, we have selected models that we feel are
representative of some important, specific trends in agent-based modelling.
5.1
Agent-based models as a mechanism for stylized facts
Although known, at least partly, for a long time – [237] gives a reference for a paper dealing with nonnormality of price time series in 1915, followed by several others in the 1920’s – the stylized facts of asset
returns: fat tails, volatility clustering, leverage... have often been left aside when modelling financial
markets. They were even often referred to as "anomalous" characteristics, as if observations failed to
comply with theory. Much has been done these past fifteen years in order to address this challenge
and provide new models that can reproduce these facts. These recent developments have been built on
top of early attempts at modelling mechanisms of financial markets with agents. For example, [305],
investigating some rules of the SEC1 , or [149], investigating double-auction microstructure, belong to
those historical works. It seems that the first modern attempts at that type of models were made in the
field of behavioural finance. This field aims at improving financial modelling based on the psychology
and sociology of the investors. Models are built with agents who can exchange shares of stocks according
to exogenously defined utility functions reflecting their preferences and risk aversions. [219] shows
that this type of modelling offers good flexibility for reproducing some of the stylized facts and [218]
provides a review of that type of model. However, although achieving some of their goals, these models
suffer from many drawbacks: first, they are very complex, and it may be a very difficult task to identify
the role of their numerous parameters and the types of dependence to these parameters; second, the
chosen utility functions do not necessarily reflect what is observed on the mechanisms of a financial
market.
1
Securities and Exchange Commission, the agency supervising the organization of regulator of the US stock exchanges
37
A sensible change in modelling appears with much simpler models implementing only well-identified
and presumably realistic “behaviour”: [92] uses noise traders that are subject to “herding”, i.e. form
random clusters of traders sharing the same view on the market. The idea is used in [283] as well. A
complementary approach is to characterize traders as fundamentalists, chartists or noise traders. [232]
propose an agent-based model in which these types of traders interact. In all these models, the price
variation directly results from the excess demand: at each time step, all agents submit orders and the
resulting price is computed. Therefore, everything is cleared at each time step and there is no structure
of order book to keep track of orders.
One big step is made with models really taking into account limit orders and keeping them in an
order book once submitted and not executed. [81] build an agent-based model where all traders submit
orders depending on the three elements identified in [232]: chartists, fundamentalists, noise. Orders
submitted are then stored in a persistent order book. In fact, one of the first simple models with this
feature was proposed in [29]. In this model, orders are particles moving along a price line, and each
collision is a transaction. Due to numerous caveats in this model, the authors propose in the same paper
an extension with fundamentalist and noise traders in the spirit of the models previously evoked. [240]
goes further in the modelling of trading mechanisms by taking into account fixed limit orders and market
orders that trigger transactions, and really simulating the order book. This model was analytically solved
using a mean-field approximation by [301].
Following this trend of modelling, the more or less “rational” agents composing models in economics tends to vanish and be replaced by the notion of flows: orders are not submitted any more by
an agent following a strategic behaviour, but are viewed as an arriving flow whose properties are to
be determined by empirical observations of market mechanisms. Thus, the modelling of order books
calls for more “stylized facts”, i.e. empirical properties that could be observed on a large number of
order-driven markets. [40] is a thorough empirical study of the order flows in the Paris Bourse a few
years after its complete computerization. Market orders, limit orders, time of arrivals and placement are
studied. [49] and [275] provide statistical features on the order book itself. These empirical studies, that
have been reviewed in the first part of this review, are the foundation for “zero-intelligence” models,
in which “stylized facts” are expected to be reproduced by the properties of the order flows and the
structure of order book itself, without considering exogenous “rationality”. [73] propose a simple model
of order flows: limit orders are deposited in the order book and can be removed if not executed, in a
simple deposition-evaporation process. [49] use this type of model with empirical distribution as inputs.
As of today, the most complete empirical model is to our knowledge [248], where order placement and
cancellation models are proposed and fitted on empirical data. Finally, new challenges arise as scientists
try to identify simple mechanisms that allow an agent-based model to reproduce non-trivial behaviours:
herding behaviour in[92], dynamic price placement in [280], threshold behaviour in [91], etc.
38
5.2
Early order-driven market modelling: market microstructure and
policy issues
The pioneering works in simulation of financial markets were aimed to study market regulations. The
very first one, [305], tries to investigate the effect of regulations of the SEC on American stock markets,
using empirical data from the 20’s and the 50’s. Twenty years later, at the start of the computerization
of financial markets, [165] implements a simulator in order to test the feasibility of automated market
making. Instead of reviewing the huge microstructure literature, we refer the reader to the well-known
books by [261] or [169], for example, for a panorama of this branch of finance. However, by presenting
a small selection of early models, we here underline the grounding of recent order book modelling.
5.2.1
A pioneer order book model
To our knowledge, the first attempt to simulate a financial market was by [305]. This paper was a
biting and controversial reaction to the Report of the Special Study of the Securities Markets of the SEC
([88]), whose aim was to “study the adequacy of rules of the exchange and that the New York stock
exchange undertakes to regulate its members in all of their activities” ([89]). According to Stigler, this
SEC report lacks rigorous tests when investigating the effects of regulation on financial markets. Stating
that “demand and supply are [...] erratic flows with sequences of bids and asks dependent upon the
random circumstances of individual traders”, he proposes a simple simulation model to investigate the
evolution of the market. In this model, constrained by simulation capability in 1964, price is constrained
within L = 10 ticks. (Limit) orders are randomly drawn, in trade time, as follows: they can be bid or
ask orders with equal probability, and their price level is uniformly distributed on the price grid. Each
time an order crosses the opposite best quote, it is a market order. All orders are of size one. Orders not
executed N = 25 time steps after their submission are cancelled. Thus, N is the maximum number of
orders available in the order book.
In the original paper, a run of a hundred trades was manually computed using tables of random
numbers. Of course, no particular results concerning the “stylized facts” of financial time series was
expected at that time. However, in his review of some order book models, [302] makes simulations
of a similar model, with parameters L = 5000 and N = 5000, and shows that price returns are not
Gaussian: their distribution exhibits power law with exponent 0.3, far from empirical data. As expected,
the limitation L is responsible for a sharp cut-off of the tails of this distribution.
5.2.2
Microstructure of the double auction
[149] provides an early study of the double auction market with a point of view that does not ignore
temporal structure, and really defines order flows. Price is discrete and constrained to be within {p1 , pL }.
Buy and sell orders are assumed to be submitted according to two Poisson processes of intensities λ and
µ. Each time an order crosses the best opposite quote, it is a market order. All quantities are assumed to
be equal to one. The aim of the author was to provide an empirical study of the market microstructure.
The main result of its Poisson model was to support the idea that negative correlation of consecutive
price changes is linked the microstructure of the double auction exchange. This paper is very interesting
39
because it can be seen as precursor that clearly sets the challenges of order book modelling. First, the
mathematical formulation is promising. With its fixed constrained prices, [149] can define the state
of the order book at a given time as the vector (ni )i=1,...,L of awaiting orders (negative quantity for bid
orders, positive for ask orders). Future analytical models will use similar vector formulations that can be
cast it into known mathematical processes in order to extract analytical results – see e.g. [97] reviewed
below. Second, the author points out that, although the Poisson model is simple, analytical solution
is hard to work out, and he provides Monte Carlo simulation. The need for numerical and empirical
developments is a constant in all following models. Third, the structural question is clearly asked in the
conclusion of the paper: “Does the auction-market model imply the characteristic leptokurtosis seen in
empirical security price changes?”. The computerization of markets that was about to take place when
this research was published – Toronto’s CATS2 opened a year later in 1977 – motivated many following
papers on the subject. As an example, let us cite here [165], who built a model to choose the right
mechanism for setting clearing prices in a multi-securities market.
5.2.3
Zero-intelligence
In the models by [305] and [149], orders are submitted in a purely random way on the grid of possible
prices. Traders do not observe the market here and do not act according to a given strategy. Thus, these
two contributions clearly belong to a class of “zero-intelligence” models. To our knowledge, [156] is the
first paper to introduce the expression “zero-intelligence” in order to describe non-strategic behaviour of
traders. It is applied to traders that submit random orders in a double auction market. The expression has
since been widely used in agent-based modelling, sometimes in a slightly different meaning (see more
recent models described in this review). In [156], two types of zero-intelligence traders are studied.
The first are unconstrained zero-intelligence traders. These agents can submit random order at random
prices, within the allowed price range {1, . . . , L}. The second are constrained zero-intelligence traders.
These agents submit random orders as well, but with the constraint that they cannot cross their given
reference price pRi : constrained zero-intelligence traders are not allowed to buy or sell at loss. The aim
of the authors was to show that double auction markets exhibit an intrinsic “allocative efficiency” (ratio
between the total profit earned by the traders divided by the maximum possible profit) even with zerointelligence traders. An interesting fact is that in this experiment, price series resulting from actions by
zero-intelligence traders are much more volatile than the ones obtained with constrained traders. This
fact will be confirmed in future models where “fundamentalists” traders, having a reference price, are
expected to stabilize the market (see [322] or [232] below). Note that the results have been criticized by
[86], who show that the observed convergence of the simulated price towards the theoretical equilibrium
price may be an artefact of the model. More precisely, the choice of traders’ demand carry a lot of
constraints that alone explain the observed results.
Modern works in Econophysics owe a lot to these early models or contributions. Starting in the
mid-90’s, physicists have proposed simple order book models directly inspired from Physics, where the
analogy “order ≡ particle” is emphasized. Three main contributions are presented in the next section.
2
Computer Assisted Trading System
40
5.3
5.3.1
Order-driven market modelling in Econophysics
The order book as a reaction-diffusion model
A very simple model directly taken from Physics was presented in [29]. The authors consider a market
with N noise traders able to exchange one share of stock at a time. Price p(t) at time t is constrained to
be an integer (i.e. price is quoted in number of ticks) with an upper bound p:
¯ ∀t,
p(t) ∈ {0, . . . , p}.
¯
Simulation is initiated at time 0 with half of the agents asking for one share of stock (buy orders, bid)
with price:
j
j = 1, . . . , N/2,
pb (0) ∈ {0, p/2},
¯
(5.3.1)
and the other half offering one share of stock (sell orders, ask) with price:
j
j = 1, . . . , N/2.
¯ p},
¯
p s (0) ∈ { p/2,
(5.3.2)
At each time step t, agents revise their offer by exactly one tick, with equal probability to go up or
down. Therefore, at time t, each seller (resp. buyer) agent chooses his new price as:
j
j
p s (t + 1) = p s (t) ± 1
j
j
(resp. pb (t + 1) = pb (t) ± 1 ).
(5.3.3)
j
A transaction occurs when there exists (i, j) ∈ {1, . . . , N/2}2 such that pib (t +1) = p s (t +1). In such a case
the orders are removed and the transaction price is recorded as the new price p(t). Once a transaction has
j
been recorded, two orders are placed at the extreme positions on the grid: pib (t + 1) = 0 and p s (t + 1) = p.
¯
As a consequence, the number of orders in the order book remains constant and equal to the number of
agents. In figure 5.1, an illustration of these moving particles is given.
As pointed out by the authors, this process of simulation is similar the reaction-diffusion model
A + B → ∅ in Physics. In such a model, two types of particles are inserted at each side of a pipe of length
p¯ and move randomly with steps of size 1. Each time two particles collide, they’re annihilated and two
new particles are inserted. The analogy is summarized in table 5.1. Following this analogy, it thus can
Figure 5.1: Illustration of the Bak, Paczuski and Shubik model: white particles (buy orders, bid) moving
from the left, black particles (sell orders, ask) moving from the right. Reproduced from [29].
41
Table 5.1: Analogy between the A + B → ∅ reaction model and the order book in [29].
Physics
[29]
Particles
Finite Pipe
Collision
Orders
Order book
Transaction
be showed that the variation ∆p(t) of the price p(t) verifies :
t
∆p(t) ∼ t1/4 (ln( ))1/2 .
t0
(5.3.4)
Thus, at long time scales, the series of price increments simulated in this model exhibit a Hurst exponent
H = 1/4. As for the stylized fact H ≈ 0.7, this sub-diffusive behavior appears to be a step in the
wrong direction compared to the random walk H = 1/2. Moreover, [302] points out that no fat tails are
observed in the distribution of the returns of the model, but rather fits the empirical distribution with an
exponential decay. Other drawbacks of the model could be mentioned. For example, the reintroduction
of orders at each end of the pipe leads to unrealistic shape of the order book, as shown on figure 5.2.
Actually here is the main drawback of the model: “moving” orders is highly unrealistic as for modelling
an order book, and since it does not reproduce any known financial exchange mechanism, it cannot be the
base for any larger model. Therefore, attempts by the authors to build several extensions of this simple
framework, in order to reproduce “stylized facts” by adding fundamental traders, strategies, trends, etc.
are not of interest for us in this review. However, we feel that the basic model as such is very interesting
because of its simplicity and its “particle” representation of an order-driven market that has opened the
way for more realistic models.
5.3.2
Introducing market orders
[240] keeps the zero-intelligence structure of the [29] model but adds more realistic features in the order
placement and evolution of the market. First, limit orders are submitted and stored in the model, without
moving. Second, limit orders are submitted around the best quotes. Third, market orders are submitted
to trigger transactions. More precisely, at each time step, a trader is chosen to perform an action. This
trader can either submit a limit order with probability ql or submit a market order with probability 1 − ql .
Once this choice is made, the order is a buy or sell order with equal probability. All orders have a one
unit volume.
As usual, we denote p(t) the current price. In case the submitted order at time step t + 1 is a limit
ask (resp. bid) order, it is placed in the book at price p(t) + ∆ (resp. p(t) − ∆), ∆ being a random variable
uniformly distributed in ]0; ∆ M = 4]. In case the submitted order at time step t + 1 is a market order, one
order at the opposite best quote is removed and the price p(t + 1) is recorded. In order to prevent the
number of orders in the order book from large increase, two mechanisms are proposed by the author:
either keeping a fixed maximum number of orders (by discarding new limit orders when this maximum
is reached), or removing them after a fixed lifetime if they have not been executed.
Numerical simulations show that this model exhibits non-Gaussian heavy-tailed distributions of re42
Figure 5.2: Snapshot of the limit order book in the Bak, Paczuski and Shubik model. Reproduced from
[29].
43
Figure 5.3: Empirical probability density functions of the price increments in the Maslov model. In
inset, log-log plot of the positive increments. Reproduced from [240].
turns. On figure 5.3, the empirical probability density of the price increments for several time scales
are plotted. For a time scale δt = 1, the author fit the tails distribution with a power law with exponent
3.0, i.e. reasonable compared to empirical value. However, the Hurst exponent of the price series is still
H = 1/4 with this model. It should also be noted that [301] proposed an analytical study of the model
using a mean-field approximation (See below section 6.2.2).
This model brings very interesting innovations in order book simulation: order book with (fixed)
limit orders, market orders, necessity to cancel orders waiting too long in the order book. These features
are of prime importance in any following order book model.
5.3.3
The order book as a deposition-evaporation process
[73] continue the work of [29] and [240], and develop the analogy between dynamics of an order book
and an infinite one dimensional grid, where particles of two types (ask and bid) are subject to three
types of events: deposition (limit orders), annihilation (market orders) and evaporation (cancellation).
Note that annihilation occurs when a particle is deposited on a site occupied by a particle of another
type. The analogy is summarized in table 5.2. Hence, the model goes as follows: At each time step,
a bid (resp. ask) order is deposited with probability λ at a price n(t) drawn according to a Gaussian
44
Table 5.2: Analogy between the deposition-evaporation process and the order book in [73].
Physics
[73]
Particles
Infinite lattice
Deposition
Evaporation
Annihilation
Orders
Order book
Limit orders submission
Limit orders cancellation
Transaction
distribution centred on the best ask a(t) (resp. best bid b(t)) and with variance depending linearly on the
spread s(t) = a(t) − b(t): σ(t) = K s(t) + C. If n(t) > a(t) (resp. n(t) < b(t)), then it is a market order:
annihilation takes place and the price is recorded. Otherwise, it is a limit order and it is stored in the
book. Finally, each limit order stored in the book has a probability δ to be cancelled (evaporation).
Figure 5.4 shows the average return as a function of the time scale. It appears that the series of
price returns simulated with this model exhibit a Hurst exponent H = 1/4 for short time scales, and
that tends to H = 1/2 for larger time scales. This behaviour might be the consequence of the random
evaporation process (which was not modelled in [240], where H = 1/4 for large time scales). Although
some modifications of the process (more than one order per time step) seem to shorten the sub-diffusive
region, it is clear that no over-diffusive behaviour is observed.
5.4
Empirical zero-intelligence models
The three models presented in the previous section 5.3 have successively isolated essential mechanisms
that are to be used when simulating a “realistic” market: one order is the smallest entity of the model;
the submission of one order is the time dimension (i.e. event time is used, not an exogenous time
defined by market clearing and “tatonnement” on exogenous supply and demand functions); submission
of market orders (as such in [240], as “crossing limit orders” in [73]) and cancellation of orders are
taken into account. On the one hand, one may try to describe these mechanisms using a small number of
parameters, using Poisson process with constant rates for order flows, constant volumes, etc. This might
lead to some analytically tractable models, as will be described in section 6.2.2. On the other hand, one
may try to fit more complex empirical distributions to market data without analytical concern.
This type of modelling is best represented by [248]. It is the first model that proposes an advanced
calibration on the market data as for order placement and cancellation methods. As for volume and
time of arrivals, assumptions of previous models still hold: all orders have the same volume, discrete
event time is used for simulation, i.e. one order (limit or market) is submitted per time step. Following
[73], there is no distinction between market and limit orders, i.e. market orders are limit orders that are
submitted across the spread s(t). More precisely, at each time step, one trading order is simulated: an
ask (resp. bid) trading order is randomly placed at n(t) = a(t) + δa (resp. n(t) = b(t) + δb) according to
a Student distribution with scale and degrees of freedom calibrated on market data. If an ask (resp. bid)
order satisfies δa < −s(t) = b(t) − a(t) (resp. δb > s(t) = a(t) − b(t)), then it is a buy (resp. sell) market
order and a transaction occurs at price a(t) (resp. b(t).
During a time step, several cancellations of orders may occur. The authors propose an empirical
45
Figure 5.4: Average return hr∆t i as a function of ∆t for different sets of parameters and simultaneous
depositions allowed in the Challet and Stinchcombe model. Reproduced from [73].
46
Figure 5.5: Lifetime of orders for simulated data in the Mike and Farmer model, compared to the
empirical data used for fitting. Reproduced from [248].
distribution for cancellation based on three components for a given order:
• the position in the order book, measured as the ratio y(t) =
∆(t)
∆(0)
where ∆(t) is the distance of the
order from the opposite best quote at time t,
• the order book imbalance, measured by the indicator Nimb (t) =
Nb (t)
Na (t)+Nb (t) )
Na (t)
Na (t)+Nb (t)
(resp. Nimb (t) =
for ask (resp. bid) orders, where Na (t) and Nb (t) are the number of orders at ask and
bid in the book at time t,
• the total number N(t) = Na (t) + Nb (t) of orders in the book.
Their empirical study leads them to assume that the cancellation probability has an exponential dependance on y(t), a linear one in Nimb and finally decreases approximately as 1/Nt (t) as for the total
number of orders. Thus, the probability P(C|y(t), Nimb (t), Nt (t)) to cancel an ask order at time t is formally written :
P(C|y(t), Nimb (t), Nt (t)) = A(1 − e−y(t) )(Nimb (t) + B)
1
,
Nt (t)
(5.4.1)
where the constants A and B are to be fitted on market data. Figure 5.5 shows that this empirical formula
provides a quite good fit on market data.
Finally, the authors mimic the observed long memory of order signs by simulating a fractional Brownian motion. The auto-covariance function Γ(t) of the increments of such a process exhibits a slow decay
47
Figure 5.6: Cumulative distribution of returns in the Mike and Farmer model, compared to the empirical
data used for fitting. Reproduced from [248].
:
Γ(k) ∼ H(2H − 1)t2H−2
(5.4.2)
and it is therefore easy to reproduce exponent β of the decay of the empirical autocorrelation function of
order signs observed on the market with H = 1 − β/2.
The results of this empirical model are quite satisfying as for return and spread distribution. The
distribution of returns exhibit fat tails which are in agreement with empirical data, as shown on figure 5.6.
The spread distribution is also very well reproduced. As their empirical model has been built on the data
of only one stock, the authors test their model on 24 other data sets of stocks on the same market and
find for half of them a good agreement between empirical and simulated properties. However, the bad
results of the other half suggest that such a model is still far from being “universal”.
Despite these very nice results, some drawbacks have to be pointed out. The first one is the fact that
the stability of the simulated order book is far from ensured. Simulations using empirical parameters in
the simulations may bring situations where the order book is emptied by large consecutive market orders.
Thus, the authors require that there is at least two orders in each side of the book. This exogenous trick
might be important, since it is activated precisely in the case of rare events that influence the tails of
the distributions. Also, the original model does not focus on volatility clustering. [162] propose a
variant that tackles this feature. Another important drawback of the model is the way order signs are
simulated. As noted by the authors, using an exogenous fractional Brownian motion leads to correlated
48
price returns, which is in contradiction with empirical stylized facts. We also find that at long time scales
it leads to a dramatic increase of volatility. As we have seen in the first part of the review, the correlation
of trade signs can be at least partly seen as an artefact of execution strategies. Therefore this element is
one of the numerous that should be taken into account when “programming” the agents of the model. In
order to do so, we have to leave the (quasi) “zero-intelligence” world and see how modelling based on
heterogeneous agents might help to reproduce non-trivial behaviours. Prior to this development below
in 8.1, we briefly review some analytical works on the “zero-intelligence” models.
5.5
Remarks
In table 5.3, we summarize the key features of some of the order book models reviewed in this chapter.
49
Model
Stigler (1961)
Garman
(1976)
Bak, Paczuski
and Shubik
(1997)
Maslov
(2000)
Challet and
Stinchcombe
(2001)
Price
range
Clock
Finite grid
Finite grid
Finite grid
Unconstrained Unconstrained Unconstrained
Trade time
Physical Time
Aggregated
time
N agents owning each one
limit order
Event time
and
Aggregated
time
One
zerointelligence
agent / One
flow
Aggregated
time
One
zerointelligence
agent / One
flow
Normally
distributed
around best
quote
Studentdistributed
around best
quote
Pending
orders
are
cancelled
after a fixed
number
of
time steps
Defined
as
crossing limit
orders
Pending orders can be
cancelled
with
fixed
probability at
each time step
Defined
as
crossing limit
orders
Pending
orders
can
be
cancelled
with
3-parameter
conditional
probability at
each time step
Unit
Correlated
with a fractional Brownian motion
Flows One
zero/
intelligence
Agents agent / One
flow
One
zerointelligence
agent
/
Two
flows
(buy/sell)
Limit
orders
Uniform distribution on
the price grid
Two Poisson
processes for
buy and sell
orders
Moving
at
each time step
by one tick
Market Defined
as
orders crossing limit
orders
Cancel- Pending
lation orders
are
orders cancelled
after a fixed
number
of
time steps
Defined
as
crossing limit
orders
None
Defined
as
crossing limit
orders
None (constant number
of
pending
orders)
Volume Unit
Order Independent
signs
Unit
Independent
Unit
Independent
Unit
Independent
Unit
Independent
Claimed Return disretribution
is
sults
power-law
0.3 / Cut-off
because finite
grid
Microstructure
is responsible
for negative
correlation of
consecutive
price changes
No fat tails for
returns / Hurst
exponent 1/4
for price increments
Fat tails for
distributions
of returns /
Hurst exponent 1/4
Hurst exponent 1/4 for
short
time
scales, tending to 1/2 for
larger
time
scales
One
zerointelligence
flow (limit order with fixed
probability,
else market
order)
Uniformly
distributed in
a finite interval
around
last price
Submitted as
such
Table 5.3: Summary of the characteristics of the reviewed limit order book models.
50
Mike
Farmer
(2008)
Fat tails distributions of
returns / Realistic spread
distribution
/
Unstable
order book
Chapter 6
The mathematical structure of
zero-intelligence limit order book models
In this chapter, based on [7], we present some mathematical results focusing on the convergence of the
price process - a pure jump process at the microscopic level - to a diffusive process at macroscopic time
scales. This result is an important step towards establishing the compatibility of microstructural models
based on a realistic limit order book dynamics, and more classical models used in continuous time finance.
We elaborate on the models of [97] and [303] to present a stylized description of the order book, and
derive several mathematical results in the case of independent Poissonian arrival times. In particular, we
show that the cancellation structure is an important factor ensuring the existence of a stationary distribution for the order book and the exponential convergence towards it. We also prove a functional central
limit theorem (FCLT) forthe rescaled, centered price process.
6.1
An elementary approximation: perfect market making
We start with the simplest agent-based market model:
• The order book starts in a full state: all limits above PA (0) and below PB (0) are filled with one
limit order of unit size q. The spread starts equal to 1 tick;
• The flow of market orders is modeled by two independent Poisson processes M + (t) (buy orders)
and M − (t) (sell orders) with constant arrival rates (or intensities) λ+ and λ− ;
• There is one liquidity provider, who reacts immediately after a market order arrives so as to maintain the spread constantly equal to 1 tick. He places a limit order on the same side as the market
order (i.e. a buy limit order after a buy market order and vice versa) with probability u and on the
opposite side with probability 1 − u.
51
The mid-price dynamics can be written in the following form
dP(t) = ∆P (dM + (t) − dM − (t))Z,
(6.1.1)
where Z is a Bernoulli random variable
Z = 0 with probability (1 − u),
(6.1.2)
Z = 1 with probability u.
(6.1.3)
and
The infinitesimal generator L associated with this dynamics is
L f (P) = u λ+ ( f (P + ∆P) − f ) + λ− ( f (P − ∆P) − f ) ,
(6.1.4)
where f denotes a test function. It is well known that a continuous limit is obtained under suitable
assumptions on the intensity and tick size. Noting that (6.1.4) can be rewritten as
1
f (P + ∆P) − 2 f + f (P − ∆P)
u (λ+ + λ− )(∆P)2
2
(∆P)2
f (P + ∆P) − f (P − ∆P)
+ u (λ+ − λ− )∆P
,
2∆P
L f (P) =
(6.1.5)
and under the following assumptions
u (λ+ + λ− )(∆P)2 −→σ2 as ∆P → 0,
(6.1.6)
u (λ+ − λ− )∆P−→µ as ∆P → 0,
(6.1.7)
and
the generator converges to the classical diffusion operator
σ2 ∂2 f
∂f
+µ ,
2
2 ∂P
∂P
(6.1.8)
corresponding to a Brownian motion with drift. This simple case is worked out as an example of the
type of limit theorems that we will be interested in in the sequel. Of course, a more classical approach
using the Functional Central limit Theorem (FCLT) as in [42] or [320] yields similar results: for given,
fixed values of λ+ , λ− and ∆P, the rescaled-centred price process
P(nt) − nµt
√
nσ
(6.1.9)
converges as n → ∞, to a standard Brownian motion (B(t)) where
p
σ = ∆P (λ+ + λ− )u,
52
(6.1.10)
and
µ = ∆P(λ+ − λ− )u.
(6.1.11)
One can easily achieve more complex diffusive limits, such as a local volatility model, by imposing that
the limit is a function of P and t
u (λ+ + λ− )(∆P)2 → σ2 (P, t),
(6.1.12)
u (λ+ − λ− )∆P → µ(P, t).
(6.1.13)
and
This may indeed be the case if the original intensities are functions of P and t themselves.
6.2
6.2.1
Order book dynamics
Model setup: Poissonian arrivals, reference frame and boundary conditions
Consider the dynamics of a general order book under the assumption of Poissonian arrival times for
market orders, limit orders and cancellations. We shall assume that each side of the order book is fully
described by a finite number of limits K, ranging from 1 to K ticks away from the best available opposite
quote. We will use the notation
X(t) := (a(t); b(t)) := (a1 (t), . . . , aK (t); b1 (t), . . . , bK (t)) ,
(6.2.1)
where a := (a1 , . . . , aK ) designates the ask side of the order book and ai the number of shares available
i ticks away from the best opposite quote, and b := (b1 , . . . , bK ) designates the bid side of the book.
Note that, as opposed to the representation described in [97] or [303], see also [152] for an interesting
discussion, we adopt a finite moving frame, as we think it is more realistic, and also and more convenient
when scaling in tick size will be addressed.
Let us now describe the events that may happen:
• arrival of a new market order;
• arrival of a new limit order;
• cancellation of an already existing limit order.
These events are described by independent Poisson processes:
+
−
• M ± (t): arrival of new market order, with intensity λ M I(a , 0) and λ M I(b , 0);
±
• Li± (t): arrival of a limit order at level i, with intensity λiL ;
+
−
• Ci± (t): cancellation of a limit order at level i, with intensity λCi ai and λCi |bi |.
q is the size of any new incoming order, and the superscript “+” (respectively “−”) refers to the ask
(respectively bid) side of the book. Although in reality orders can have any size, we shall assume that all
orders have a fixed unit size q. This assumption is convenient to carry out our analysis and is, for now,
53
a7
PB
PA
S
a
a8
a6
a9
a5
a1 a2 a3 a4
b6
b4 b3 b2
b1
b5
b9
b7
b
b8
Figure 6.1: Order book dynamics: in this example, K = 9, q = 1, a∞ = 4, b∞ = −4. The shape of the order
book is such that a(t) = (0, 0, 0, 0, 1, 3, 5, 4, 2) and b(t) = (0, 0, 0, 0, −1, 0, −4, −5, −3). The spread S (t) = 5
ticks. Assume that at time t0 > t a sell market order dM − (t0 ) arrives, then a(t0 ) = (0, 0, 0, 0, 0, 0, 1, 3, 5), b(t0 ) =
(0, 0, 0, 0, 0, 0, −4, −5, −3) and S (t0 ) = 7. Assume instead that at t0 > t a buy limit order dL1− (t0 ) arrives one tick
away from the best opposite quote, then a(t0 ) = (1, 3, 5, 4, 2, 4, 4, 4, 4), b(t0 ) = (−1, 0, 0, 0, −1, 0, −4, −5, −3) and
S (t0 ) = 1.
of secondary importance in the problem we are interested in.
Note that the intensity of the cancellation process at level i is proportional to the available quantity at
that level. That is to say, each order at level i has a lifetime drawn from an exponential distribution with
±
intensity λCi . Note also that buy limit orders Li− (t) arrive below the ask price PA (t), and sell limit orders
Li+ (t) arrive above the bid price PB (t).
We impose constant boundary conditions outside the moving frame of size 2K: Every time the
moving frame leaves a price level, the number of shares at that level is set to a∞ (or b∞ depending on
the side of the book). Our choice of a finite moving frame and constant boundary conditions has three
motivations. Firstly, it assures that the order book does not empty and that PA , PB are always well
defined. Secondly, it keeps the spread S and the increments of PA , PB and P = (PA + PB )/2 bounded—
This will be important when addressing the scaling limit of the price. Thirdly, it makes the model
Markovian as we do not keep track of the price levels that have been visited (then left) by the moving
frame at some prior time. Figure 6.1 is a representation of the order book using the above notations.
6.2.2
Analytical treatment of some limit order book models
In this section we review some analytical results obtained on zero-intelligence models where processes
are kept sufficiently simple that a mean-field approximation may be derived [301][303] or probabilities
54
conditionally to the state of the order book may be computed [97]. The key assumptions here are such
that the process describing the order book is stationary. This allows either to write a stable density
equation, or to fit the model into a nice mathematical framework such as ergodic Markov chains.
[301] proposes an analytical treatment of the model introduced by [240] and reviewed in Chapter 5.
Let us briefly described the formalism used. The main hypothesis is the following: on each side of the
current price level, the density of limit orders is uniform and constant (and ρ+ on the ask side, ρ− on the
bid side). In that sense, this is a “mean-field” approximation since the individual position of a limit order
is not taken into account. Assuming we are in a stable state, the arrival of a market order of size s on
the ask (resp. bid) side will make the price change by x+ = s/ρ+ (resp. x− = s/ρ− ). It is then observed
that the transformations of the vector X = (x+ , x− ) occurring at each event (new limit order, new buy
market order, new sell market order) are linear transformation that can easily and explicitly be written.
Therefore, an equation satisfied by the probability distribution P of the vector X of price changes can be
obtained. Finally, assuming further simplifications (such as ρ+ = ρ− ), one can solve this equation for a
tail exponent and find that the distribution behaves as P(x) ≈ x−2 for large x. This analytical result is
slightly different from the one obtained by simulation in [240]. However, the numerous approximations
make the comparison difficult. The main point here is that some sort of mean-field approximation is
natural if we assume the existence of a stationary state of the order book, and thus may help handling
order book models.
[303] investigates the scaling properties of some liquidity and price characteristics in a limit order book
model. These results are summarized in table 6.1. In [303], orders arrive on an infinite price grid (This is
consistent as limit orders arrival rate per price level is finite). Moreover, the arrival rates are independent
of the price level, which has the advantage of enabling the analytical predictions summarized in table
6.1.
Quantity
Average asymptotic depth
Average spread
Slope of average depth profile
Price “diffusion” parameter at short time scales
Price “diffusion” parameter at long time scales
Scaling relation
λL /λC
M
L
λ /λ f (, ∆P/pc )
L
(λ )2 /λ M λC g(, ∆P/pc )
(λ M )2 λC /λL −0.5
(λ M )2 λC /λL 0.5
Table 6.1: Results of Smith et al. := q/(λ M /2λC ) is a “granularity” parameter that characterizes the
effect of discreteness in order sizes, pc := λ M /2λL is a characteristic price interval, and f and g are
slowly varying functions.
These results are obtained by mean-field approximations, which assume that the fluctuations at adjacent price levels are independent. This allows fruitful simplifications of the complex dynamics of the
order book. In addition, the authors do not characterize the convergence of the coarse-grained price
process in the sense of stochastic-process limits, nor do they show that the limiting process is precisely
a Brownian motion (theorem ??).
[97] is an original attempt at analytical treatments of limit order books. In their model, the price is
contrained to be on a grid {1, . . . , N}. The state of the order book can then be described by a vector
55
X(t) = (X1 (t), . . . , XN (t)) where |Xi (t)| is the quantity offered in the order book at price i. Conventionaly,
Xi (t), i = 1, . . . , N is positive on the ask side and negative on the bid side. As usual, limit orders arrive
at level i at a constant rate λi , and market orders arrive at a constant rate µ. Finally, at level i, each order
can be cancelled at a rate θi . Using this setting, [97] shows that each event (limit order, market order,
cancellation) transforms the vector X in a simple linear way. Therefore, it is shown that under reasonable
conditions, X is an ergodic Markov chain, and thus admits a stationary state. The original idea is then
to use this formalism to compute conditional probabilities on the processes. More precisely, it is shown
that using Laplace transform, one may explicitly compute the probability of an increase of the mid price
conditionally on the current state of the order book.
In this chapter, we use a combination of the two models in that the arrival rates are not uniformly
distributed across prices, and the reference frame is finite but moving.
6.2.3
Evolution of the order book
We can write the following coupled SDEs for the quantities of outstanding limit orders in each side of
the order book:1


i−1
X



dai (t) = − q −
ak  dM + (t) + qdLi+ (t) − qdCi+ (t)
+
k=1
+ (J
M−
(a) − a)i dM (t) +
−
K
X
−
(J Li (a) − a)i dLi− (t)
i=1
+
K
X
−
(JCi (a) − a)i dCi− (t),
(6.2.2)
i=1
and


i−1
X


dbi (t) = q −
|bk | dM − (t) − qdLi− (t) + qdCi− (t)
k=1
+
+
+ (J M (b) − b)i dM + (t) +
K
X
+
(J Li (b) − b)i dLi+ (t)
i=1
K
X
+
+
(JCi (b) − b)i dCi+ (t),
(6.2.3)
i=1
where the J’s are shift operators corresponding to the renumbering of the ask side following an event
affecting the bid side of the book and vice versa. For instance the shift operator corresponding to the
arrival of a sell market order dM − (t) of size q is2
J
M−






(a) = 0, 0, . . . , 0, a1 , a2 , . . . , aK−k  ,
|
{z
}


k times
1
2
Remember that, by convention, the bi ’s are non-positive.
−
−
For notational simplicity, we write J M (a) instead of J M (a; b) etc. for the shift operators.
56
(6.2.4)
with
k := inf{p :
p
X
|b j | > q} − inf{p : |b p | > 0}.
(6.2.5)
j=1
Similar expressions can be derived for the other events affecting the order book.
In the next sections, we will study some general properties of the order book, starting with the
generator associated with this 2K-dimensional continuous-time Markov chain.
6.2.4
Infinitesimal generator
Let us work out the infinitesimal generator associated with the jump process described above. We have
+
+
L f (a; b) = λ M ( f [ai − (q − A(i − 1))+ ]+ ; J M (b) − f )
+
+
K
X
i=1
K
X
+
+
λiL ( f ai + q; J Li (b) − f )
+
+
λCi ai ( f ai − q; JCi (b) − f )
i=1
− −
+ λ M f J M (a); [bi + (q − B(i − 1))+ ]− − f
+
+
K
X
i=1
K
X
−
−
λiL ( f J Li (a); bi − q − f )
−
−
λCi |bi |( f JCi (a); bi + q − f ),
(6.2.6)
i=1
where, to ease the notations, we note f (ai ; b) instead of f (a1 , . . . , ai , . . . , aK ; b) etc. and
x− := min(x, 0), x ∈ R.
x+ := max(x, 0),
(6.2.7)
The operator above, although cumbersome to put in writing, is simple to decipher: a series of standard
difference operators corresponding to the “deposition-evaporation” of orders at each limit, combined
with the shift operators expressing the moves in the best limits and therefore, in the origins of the frames
for the two sides of the order book. Note the coupling of the two sides: the shifts on the a’s depend on
the b’s, and vice versa. More precisely the shifts depend on the profile of the order book on the other
side, namely the cumulative depth up to level i defined by
A(i) :=
i
X
ak ,
(6.2.8)
|bk |,
(6.2.9)
k=1
and
B(i) :=
i
X
k=1
57
and the generalized inverse functions thereof
p
X
A−1 (q0 ) := inf{p :
a j > q0 },
(6.2.10)
|b j | > q0 },
(6.2.11)
j=1
and
B−1 (q0 ) := inf{p :
p
X
j=1
where q0 designates a certain quantity of shares3 .
Remark 6.2.1 The index corresponding to the best opposite quote equals the spread S in ticks, that is
iA := A (0) = inf{p :
−1
p
X
j=1
and
iB := B−1 (0) = inf{p :
p
X
|b j | > 0} =
j=1
6.2.5
S
:= iS ,
∆P
(6.2.12)
S
:= iS = iA .
∆P
(6.2.13)
a j > 0} =
Price dynamics
We now focus on the dynamics of the best ask and bid prices, denoted by PA (t) and PB (t). One can
easily see that they satisfy the following SDEs:
dPA (t) = ∆P[(A−1 (q) − A−1 (0))dM + (t)
K
X
−
(A−1 (0) − i)+ dLi+ (t) + (A−1 (q) − A−1 (0))dCi+A (t)],
(6.2.14)
i=1
and
dPB (t) = −∆P[(B−1 (q) − B−1 (0))dM − (t)
K
X
−
(B−1 (0) − i)+ dLi− (t) + (B−1 (q) − B−1 (0))dCi−B (t)],
(6.2.15)
i=1
which describe the various events that affect them: change due to a market order, change due to limit orders inside the spread, and change due to the cancellation of a limit order at the best price. Equivalently,
3
Note that a more rigorous notation would be
A(i, a(t)) and A−1 (q0 , a(t))
for the depth and inverse depth functions respectively. We drop the dependence on the last variable as it is clear from the
context.
58
the respective dynamics of the mid-price and the spread are:
∆P h −1
(A (q) − A−1 (0))dM + (t) − (B−1 (q) − B−1 (0))dM − (t)
2
K
K
X
X
−
(A−1 (0) − i)+ dLi+ (t) +
(B−1 (0) − i)+ dLi− (t)
dP(t) =
i=1
i=1
+ (A (q) − A
−1
−1
(0))dCi+A (t)
i
− (B−1 (q) − B−1 (0))dCi−B (t) ,
(6.2.16)
h
dS (t) = ∆P (A−1 (q) − A−1 (0))dM + (t) + (B−1 (q) − B−1 (0))dM − (t)
−
K
X
K
X
(A−1 (0) − i)+ dLi+ (t) −
i=1
(B−1 (0) − i)+ dLi− (t)
i=1
+ (A (q) − A
−1
−1
(0))dCi+A (t)
i
+ (B−1 (q) − B−1 (0))dCi−B (t) .
(6.2.17)
The equations above are interesting in that they relate in an explicit way the profile of the order book to
the size of an increment of the mid-price or the spread, therefore linking the price dynamics to the order
flow. For instance the infinitesimal drifts of the mid-price and the spread, conditional on the shape of the
order book at time t, are given by:
+
−
∆P h −1
(A (q) − A−1 (0))λ M − (B−1 (q) − B−1 (0))λ M
2
K
K
X
X
+
−
−
(A−1 (0) − i)+ λiL +
(B−1 (0) − i)+ λiL
E [dP(t)|(a; b)] =
i=1
i=1
+ (A (q) − A
−1
−1
+
(0))λCiA aiA
i
−
− (B−1 (q) − B−1 (0))λCiB |biB | dt,
(6.2.18)
and
h
+
−
E [dS (t)|(a; b)] = ∆P (A−1 (q) − A−1 (0))λ M + (B−1 (q) − B−1 (0))λ M
−
K
X
−1
(A (0) −
+
i)+ λiL
i=1
(A−1 (0) − i)+ λiL
−
i=1
+ (A (q) − A
−1
−
K
X
−1
+
(0))λCiA aiA
i
−
+ (B−1 (q) − B−1 (0))λCiB |biB | dt.
(6.2.19)
For simplicity, we recast this, or rather, these equations under a general form
Pt =
t 2K+2
X
Z
0
Fi (Xu )dNui ,
(6.2.20)
i=1
where the N i ’s are the Poisson processes driving the events affecting the limit order book, and the Fi ’s
are the corresponding jumps in the price.
59
6.3
Ergodicity and diffusive limit
In this section, our interest lies in the following questions:
1. Is the order book model defined above stable?
2. What is the stochastic-process limit of the price at large time scales?
6.3.1
Ergodicity of the order book and rate of convergence to the stationary state
Of the utmost interest is the behavior of the order book in its stationary state. We have the following
result:
±
Theorem 6.3.1 If λC = min1≤i≤K {λCi } > 0, then (X(t))t≥0 = (a(t); b(t))t≥0 is an ergodic Markov process.
In particular (X(t)) has a stationary distribution Π. Moreover, the rate of convergence of the order book
to its stationary state is exponential. That is, there exist r < 1 and R < ∞ such that
||Qt (x, .) − Π(.)|| ≤ Rrt V(x), t ∈ R+ , x ∈ S.
Proof. Let
V(x) := V(a; b) :=
K
X
ai +
i=1
K
X
(6.3.1)
|bi | + q
(6.3.2)
i=1
be the total number of shares in the book (+q shares). Using the expression of the infinitesimal generator
(6.2.6) we have
+
−
LV(x) ≤ −(λ M + λ M )q +
K
X
+
−
(λiL + λiL )q −
i=1
+
K
X
−
λiL (iS − i)+ a∞ +
i=1
K
X
+
−
(λCi ai + λCi |bi |)q
i=1
K
X
+
λiL (iS − i)+ |b∞ |
(6.3.3)
i=1
+
−
+
−
≤ −(λ M + λ M )q + (ΛL + ΛL )q − λC qV(x)
+
−
+ K(ΛL a∞ + ΛL |b∞ |),
where
±
ΛL :=
K
X
(6.3.4)
±
±
λiL and λC := min {λCi } > 0.
1≤i≤K
i=1
(6.3.5)
The first three terms in the right hand side of inequality (6.3.3) correspond respectively to the arrival of a
market, limit or cancellation order—ignoring the effect of the shift operators. The last two terms are due
to shifts occurring after the arrival of a limit order inside the spread. The terms due to shifts occurring
after market or cancellation orders (which we do not put in the r.h.s. of (6.3.3)) are negative, hence
the inequality. To obtain inequality (6.3.4), we used the fact that the spread iS is bounded by K + 1—a
consequence of the boundary conditions we impose— and hence (iS − i)+ is bounded by K.
60
The drift condition (6.3.4) can be rewritten as
LV(x) ≤ −βV(x) + γ,
(6.3.6)
for some positive constants β, γ. Inequality (6.3.6) together with theorem 7.1 in [246] let us assert that
(X(t)) is V-uniformly ergodic, hence (6.3.1).
Corollary 6.3.1 The spread S (t) = A−1 (0, a(t))∆P = S (X(t)) has a well-defined stationary distribution—
This is expected as by construction the spread is bounded by K + 1.
6.3.2
Large-scale limit of the price process
This section is devoted to the asymptotics of the price process. The ergodic theory of Markov processes
is combined with the martingale convergence theorem to obtain the results. This approach is extremely
general and flexible, and prone to many generalizations for markovian models of limit order books. We
state and prove our main result concerning the long-time price dynamics in the case of Poissonian arrival
times.
Denote then by Π the stationary distribution of X as provided by Proposition 6.3.1. Using the Ergodic
Theorem B.2.1 together with the deep Martingale Convergence Theorem 7.1.4 of [125], one can show
the
Proposition 6.3.1 Consider the price process described by Equation (6.2.20) above, and introduce the
sequence of martingales Pˆ n formed by the centered, rescaled price
Pnt − Qnt
Pˆ nt ≡
,
√
n
where Q is the compensator of P
Qt =
X
i
λ
t
Z
i
Fi (X s )ds.
0
Then, Pˆ n converges in distribution to a Wiener process σW,
ˆ
where the volatility σ
ˆ is given by
1X i
λ
σ
ˆ = lim
t→+∞ t
i
t
Z
X
(Fi (X s )) ds =
2
2
0
λi
Z
(Fi (X))2 Π(dX).
i
Proof. Proposition 6.3.1 will follow from the convergence of the predictable quadratic variation of Pˆ n .
By construction, there holds
or else
nt
1X i
< P , P >t =
λ
n i
Z
1X i
< P , P >t = t(
λ
n i
Z
ˆn
ˆn
(Fi (X s ))2 ds,
ˆn
0
nt
(Fi (X s ))2 ds),
ˆn
61
0
and Theorem B.2.1 ensures that
a.s.
1 X i
lim
λ
t→+∞ nt
i
nt
Z
(Fi (X s )) ds =
2
X
0
λi
Z
(Fi (X))2 Π(dX)
i
whenever the integrability conditions (see Theorem B.2.1) are satisfied. Now, those are easily seen to
hold true, since the integrand in the predictable quadratic variation is a bounded function. As a matter
of fact, the only possibly unbounded term would come from the cancellation rate, proportional to the
ai , |b|i ’s. However, whenever a cancellation order causes a price change, then necessarily, the book is in
a state where the quantity at the best limit that moves is precisely equal to q. Hence, the boundedeness
follows.
The other condition for the martingale convergence theorem to apply is trivially satisfied, since the size
of the jumps of Pˆ n is bounded by √C , C being some constant.
n
Appealing as it first seems, Proposition 6.3.1 is not satisfactory: in order to give a more precise characterization of the dynamics of the rescaled price process, it is necessary to understand thoroughly the
behaviour of its compensator Qnt . As a matter of fact, Qnt itself satisfies an ergodic theorem, and if its
asymptotic variance is not negligible w.r. to nt, one cannot conclude directly from Proposition 6.3.1 that
the rescaled price process
P
√nt
n
behaves like a Wiener process with a deterministic drift.
The next result provides a more accurate answer, valid under general ergodicity conditions.
Theorem 6.3.2 Write as above the price
t
XZ
Pt =
Fi (X s )dN si
0
i
and its compensator
Qt =
X
λi
t
Z
Fi (X s )ds.
0
i
Define
X
h=
λi Fi (X)
i
and let
a.s.
1X i
λ
α = lim
t→+∞ t
i
t
Z
(Fi (X s ))ds =
Z
h(X)Π(dX).
0
Finally, introduce the solution g to the Poisson equation
Lg = h − α
and the associated martingale
Zt = g(Xt ) − g(X0 ) −
Z
t
Lg(X s )ds ≡ g(Xt ) − g(X0 ) − Qt + αt.
0
62
Then, the deterministically centered, rescaled price
Pnt − αnt
P¯ nt ≡
√
n
converges in distribution to a Wiener process σW.
¯
The asymptotic volatility σ
¯ satisfies the identity
1X i
λ
t→+∞ t
i
σ
¯ 2 = lim
t
Z
((Fi − ∆i (g))(X s ))2 ds ≡
X
0
λi
Z
((Fi − ∆i (g))(X))2 Π(dx).
i
where ∆i (g)(X) denotes the jump of the process g(X) when the process N i jumps and the limit order book
is in the state X.
Proof. The martingale method, see e.g. [155][111][196], consists in rewriting the price process under
the form
Pt = (Pt − Qt ) − Zt + g(Xt ) − g(X0 ) + αt ≡ (Mt − Zt ) + g(Xt ) − g(X0 ) + αt,
so that
Vnt + g(Xt ) − g(X0 )
P¯ nt =
,
√
n
where V = M − Z is a martingale. Therefore, the theorem is proven iff one can show that
g(Xt )−g(X0 )
√
n
converges to 0 in L2 (Π(dx)), or simply, that g ∈ L2 (Π(dx)).
Theorem 4.4 of [155] states that the condition
h2 6 V
(6.3.7)
(where V is a Lyapunov function for the process) is sufficient for g to be in L2 (Π(dx)) - but Condition
(6.3.7) is trivially satisfied since h is bounded.
6.4
Conclusions
In this chapter, we have analyzed a simple Markovian order book model, in which elementary changes
in the price and spread processes are explicitly linked to the instantaneous shape of the order book and
the order flow parameters.
Two basic properties were investigated: the ergodicity of the order book and the large-scale limit of
the price process. The first property, which we answered positively, is desirable in that it assures the stability of the order book in the long run, and gives a theoretical underpinning to statistical measurements
on order book data. The scaling limit of the price process is, as anticipated, a Brownian motion. A key
ingredient in this result is the convergence of the order book to its stationary state at an exponential rate,
a property equivalent to a geometric mixing condition satisfied by the stationary version of the order
book. This short memory effect, plus a constraint on the variance of price increments guarantee a diffusive limit at large time scales. Our assumptions are independent Poissonian order flows, proportional
cancellation rates, and the presence of two reservoirs of liquidity K ticks away from the best quotes to
guarantee that the spread does not diverge.
63
In a sense, this chapter serves as a mathematical justification to the Bachelier model of asset prices,
from a market microstructure perspective. In reality, the picture is however more subtle: even if the price
process is asymptotically diffusive, at short time scales, the model produces stronger anti-correlation in
traded prices than what is actually observed in the data. At those time scales, price diffusivity is arguably
the result of a balance between persistent liquidity taking and anti-persistent liquidity providing.
We believe however that the approach presented here is interesting for clearly identifying conditions
under which the asymptotic normality of price increments holds; and more importantly, for introducing
a set of mathematical tools for further investigating the price dynamics in more sophisticated stochastic
order book models. For instance, as documented in Chapter 4, market orders excite limit orders and vice
versa, so that the actual order flows exhibit non-negligible cross dependences. A possible solution for
endogenously incorporating these dependences is the use of mutually exciting processes:
Z
t
λ (t) = λ (0) +
ϕ MM (t − s)dN M (s)
0
Z t
+
ϕLM (t − s)dN L (s),
M
M
(6.4.1)
0
and,
Z
t
λ (t) = λ (0) +
ϕLL (t − s)dN L (s)
0
Z t
+
ϕ ML (t − s)dN M (s),
L
L
(6.4.2)
0
This model has the additional advantage of capturing clustering in order arrivals (due to the selfexcitation terms ϕ MM and ϕLL ), and for exponentially decaying kernels can be cast into a Markovian
setting. Chapter 8 is devoted to the general study of such a model.
Many other generalizations are possible, and in particular, our approach can be extended to very general
markovian order books: as long as there exists a Lyapunov function, one can apply exactly the same
tools to study ergodicity and price asymptotics.
64
Chapter 7
The order book as a queueing system
In this chapter, we move forward on the study of zero-intelligence limit order books, by studying the
analytical properties of a (one-sided) order book model in which the flows of limit and market orders
are Poisson processes, and the distribution of lifetimes of cancelled orders is exponential. The model is
then extended to the case of random order sizes, thereby allowing to study the relationship between the
size of the incoming limit orders and the shape of the order book.
7.1
The shape of a limit order book
What is called the "shape" of the order book is simply the function which for any price gives the number
of shares standing in the order book at that price. We call "cumulative shape" up to price p the total
quantity offered in the order book between the best limit and price p. These quantities measure what is
sometimes known as the depth of the order book. In a probabilistic model of the order book in which
the random variable describing the (cumulative) shape admits a stationary distribution, its expectation
with respect to this stationary distribution will be simply called the average (cumulative) shape.
Understanding the average shape of an order book and its link to the order flows is not straightforward. [40] observed that, at least for the first limits, the shape of the order book is increasing away from
the spread. A decade later, and still on the Paris Bourse, [49] and [276] propose improved results. It is
observed (using 2001 data) that "the average order book has a maximum away from the current bid/ask,
and a tail reflecting the statistics of the incoming orders" (hump-shaped limit order book). A power law
distribution is suggested for modelling the decrease of the flow of limit orders away from the spread.
Furthermore, [49] provides an analytical approximation to fit the average shape of the order book, that
depends on the incoming flow of limit orders. [163] confirms the hump-shaped form of the order book
by an empirical study of liquid Chinese stocks (using 2003 data), but finds an exponential decrease of
the average shape as the price increases. [292] also observes the hump-shaped for the US Google stock
(using 2005 data).
More recently, research about optimal execution in an order book and order book simulations has
risen interest in understanding the shape of orders books. Indeed, optimal execution problems need an
order book shape as input: block-shape has been used in toy models, general shapes have recently been
considered [256, 278, 20]. The inverse function of the cumulative shape of the order book is known as
65
the virtual (or: instantaneous) market impact, and may be of interest in these problems, even though it
is not a direct proxy for the price impact of a trade [see e.g. 131, 47]. As for order book simulations,
once the flows of orders have been defined, checking the simulated shape is important in assessing that
the model is realistic. A log-normal shape of the order book has been suggested [279].
An order book is primarily a complex queueing system. Recent studies [97, 94] build on this observation and use classical tools from queueing theory such as the computation of Laplace-Stieljes transforms
of distributions of interest. [97] observes that "the average profile of the order book displays a hump
[. . . ] that does not result from any fine-tuning of model parameters or additional ingredients such as
correlation between order flow and past price moves", but no explicit link between the average shape
and the order flows is made. Our paper hopefully closes this gap. We start with the simplest queueing
system as an order book: we consider a one-side limit order book model in which limit orders and market orders occur according to Poisson processes. It is also assumed that all limit orders are submitted
with an exponentially distributed lifetime, i.e. they are cancelled if they have not been executed some
exponential time after their submission. This starting model is discrete in the sense that the price is an
integer, expressed in number of ticks.
In section 7.2, the model is described and we show recalling simple results from queueing theory
that this system admits an analytically computable expected size associated to its stationary distribution.
We then develop this into a continuous model by replacing the finite set of Poisson processes describing
the arrival of limit orders with a spatial Poisson process. The main interest of the continuous version of
the model is to allow easier analytical manipulation and to describe the frequent case in which the tick
size is small compared to standard variations of prices. We obtain an analytical formula for the average
shape of the continuous order book, that depends explicitly on the arrival rates of limit, market and
cancellation orders. This shape has an interpretation in terms of the probability that a submitted limit
order be cancelled (as opposed to; executed), introducing a law of conservation of the flows of orders in
a Markovian order book.
In section 7.3, we compare our analytical formula for the shape of the order book to the numerical
investigations of [303], and then to the only (to our knowledge) existing analytical formula for the shape
of an order book available in the literature, proposed by [49].
Section 7.4 extends the previous model by allowing for random sizes of limit orders. We update our
analytical formulas in the special case in which the sizes of limit orders are assumed to be geometrically
distributed. Thanks to this model, we investigate in section 7.5 the role of the size of limit orders in
an order book, providing insights on its influence on the order book shape. Few empirical literature
provides insights on the factors that make the average shape of order books vary. [37] analyses the
variations of order books on the German Stock Exchange (2004 data) through a principal component
analysis and identifies two main factors that respectively shift and rotate the shape of the order book.
[255] investigates the slope of the order book around the spread as a function of the traded volume on
108 Norwegian stocks (2001 data) and, among other results, exhibit a positive relation between the order
book slope around the spread and the trading activity: high trading activity tends to increase the slope
of the first limits of an order book. Our model suggests an influence of the relative size of limit orders,
that we empirically test using 2010 trading data on the Paris Stock Exchange. Section 7.6 concludes this
work.
66
7.2
7.2.1
A link between the flows of order and the shape of an order book
The basic queueing system
The aim of this section is to present the basic one-side order book model, discuss the relevance of its
assumptions and recall some results from queueing theory in the context of this order book model. Let
us consider a one-side order book model, i.e. a model in which all limit orders are ask orders, and all
market orders are buy orders. Bid price is assumed to be constantly equal to zero, and consequently
spread and ask price are identically equal. From now on, this quantity will be simply refered to as the
price. Let p(t) denote the price at time t. {p(t), t ∈ [0 ∈ ∞)} is a continuous-time stochastic process
with value in the discrete set {1, . . . , K}. In other words, the price is given in number of ticks. Let ∆
be the tick size, such that the price range of the model in currency is actually {∆, . . . , K∆}. For realistic
modelling and empirical fitting performance, one may assume that the maximum price K is chosen very
large, but in fact it will soon be obvious that this upper bound does not affect in any way the order book
for lower prices. For all i ∈ {1, . . . , K}, (ask) limit orders at price i are submitted according to a Poisson
process with parameter λi . These processes as assumed to be mutually independent, so that the number
P
of orders submitted at prices q . . . , r is a Poisson process with parameter λq→r defined as λq→r = ri=q λi .
All limit orders standing in the book may be cancelled. It is assumed that the time intervals between
submission and cancellation form a set of mutually independent random variables identically distributed
according to an exponential distribution with parameter θ > 0. Finally, (buy) market orders are submitted
at random times according to a Poisson process with parameter µ. Note that all orders are assumed to be
of unit size. This restriction will be dropped in sections 7.4 et sq.
Let us comment here on the model specification. Firstly, parameter θ is difficult to grasp empirically,
and the use, for our analytical purposes, of the unconditional exponential distribution is rough. [77] finds
an asymptotic power-law distribution for this quantity, confirmed in [67]. [114] propose a two-regime
power-law distribution (with a different exponent for small and large lifetimes of orders). It seems also
plausible that the lifetime of limit orders that are cancelled in an order book varies with the distance
between the order price and the current ask price, among many other parameters. [248] outperforms the
basic exponential model with an empirically-defined three-parameter cancellation model, too complex
to be analytically used here. Secondly, as for the limit and market orders, several empirical studies show
that the Poisson processes do not reflect the complexity of the order flows. Complex point processes
have been suggested to model the interactions between the order flows, and the consequent resilience of
the order book [e.g. 215, 253]. We however focus on a order flows as Poisson processes in order to keep
as much analytical tractability as possible. Thirdly, focusing on a one-side limit order book answers
to the same need for simplicity. Such models are often considered in optimal execution papers [e.g.
256, 278, 20] or in empirical papers [e.g. 49] where a bid/ask symmetry is assumed. Our specification
is equivalent to a two-side model with an infinite-volume limit bid order standing in the book at price 0.
This situation could be found on a market where several sellers would make continuous-time offers that
may be cancelled to a single-buyer that alone decides when to buy. Note also that our toy model is a halfbook of the double-side book studied by [97]. Finally, despite these very simple specifications, we will
obtain general flexible order shapes that are both empirically valid and founded by market mechanisms.
67
Let {L1→k (t), t ∈ [0, ∞)} be the stochastic process representing the number of limit orders at prices
1, . . . , k standing in the order book at time t. L1→k is thus the cumulative shape of the order book in our
model. L1→k can be viewed as a birth-death process with birth rate λ1→k et death rate µ + nθ in state
n ; it may equivalently be viewed as the size of a M/M/1 + M queueing system with arrival rate λ1→k ,
service rate µ and reneging rate θ (see e.g. [137, Chapter XVII] or [59, Chapter 8] among many textbook
references). This queueing system will now be refered to as the 1 → k queueing system. L1→k admits a
stationary distribution π1→k (·) as soon as θ > 0. Using the formalism of the infinitesimal generator:

 −λ1→k
λ1→k
0
0

 µ + θ −(λ1→k + µ + θ)
λ1→k
0
A = 
0
µ + 2θ
−(λ1→k + µ + 2θ) λ1→k


..
..
..
..
.
.
.
.

. . . 

0 . . . 
 ,
0 . . . 

. . . . 
.
.
0
(7.2.1)
this stationary probability π1→k is classically obtained and written for all n ∈ N∗ :
n
Y
λ1→k
,
µ
+
iθ
i=1
(7.2.2)
∞ n

X Y λ1→k −1

 .
π1→k (0) = 
µ + iθ 
(7.2.3)
π1→k (n) = π1→k (0)
and setting
n=0 π(n)
P∞
= 1 gives:
n=0 i=1
Introducing the normalized parameters ν1→k =
for all n ∈ N:
λ1→k
θ
and δ = µθ , and after some simplifications, we write
n
δ
Y
e−ν1→k ν1→k
ν1→k
π1→k (n) =
,
δΓν1→k (δ) i=1 i + δ
(7.2.4)
where Γy is the lower incomplete version of the Euler gamma function:
Γy : R+ → R, x 7→
y
Z
t x−1 e−t dt.
(7.2.5)
0
Now, the price in the one-side order book model is equal to k if and only if the "1 → k − 1" queueing
system is empty and the "1 → k" system is not. Therefore, if L1→k is distributed according to the
invariant distribution π1→k , then the distribution π p of the price p is written:
δ
e−ν1→1 ν1→1
,
π p (1) = 1 −
δΓν1→1 (δ)
δ
e−ν1→K−1 ν1→K−1
π p (K) =
,
δΓν1→K−1 (δ)
(7.2.6)
and for all k ∈ {2, . . . , K − 1},
π p (k) =
δ
e−ν1→k−1 ν1→k−1
δΓν1→k−1 (δ)
−
δ
e−ν1→k ν1→k
δΓν1→k (δ)
.
(7.2.7)
Using previous results, the average size E[L1→k ] of the "1 → k" queueing system is easily computed.
68
From equation (7.2.4), we can write after some simplifications:
E[L1→k ] = ν1→k −
Γν1→k (1 + δ)
.
Γν1→k (δ)
(7.2.8)
Let Lk = L1→k − L1→k−1 be the number of orders in the book at price k ∈ {1, . . . , K}. Then the average
shape of the order book at price k is obviously :
!
Γν1→k (1 + δ) Γν1→k−1 (1 + δ)
E[L ] = νk −
−
.
Γν1→k (δ)
Γν1→k−1 (δ)
k
7.2.2
(7.2.9)
A continuous extension of the basic model
In order to facilitate the comparison with existing results, we derive a continuous version of the previous
toy model. Price is now assumed to be a positive real number. Mechanisms for market orders and
cancellations are identical: unit-size market orders are submitted according to a Poisson process with
rate µ, and standing limit orders are cancelled after some exponential random time with parameter θ. As
for the submission of limit orders, the mechanism is now slightly modified: since the price is continuous,
instead of a finite set of homogeneous Poisson processes indexed by the number of ticks k ∈ {1, . . . , K},
we now consider a spatial Poisson process on the positive quadrant R2+ . Let λ(p, t) be a non-negative
function denoting the intensity of the spatial Poisson process modelling the arrival of limit orders, the
first coordinate representing the price, the second one the time [see e.g. 282, Chapter 12 for a textbook
introduction on the construction of spatial Poisson processes]. As in the discrete case, this process
is assumed to be time-homogeneous, and it is hence assumed that price and time are separable. Let
hλ : R+ → R+ denote the spatial intensity function of the random events, i.e. limit orders. Then,
λ(p, t) = αhλ (p) is the intensity of the spatial Poisson process representing the arrival of limit orders.
We recall that in this framework, for any p1 < p2 ∈ [0, ∞), the number of limit orders submitted at
Rp
Rp
a price p ∈ [p1 , p2 ] is a homogeneous Poisson process with intensity p 2 λ(p, t) d p = α p 2 hλ (p) d p.
1
1
Furthermore, if p1 < p2 < p3 < p4 on the real positive half-line, then the number of limit orders
submitted in [p1 , p2 ] and [p3 , p4 ] form two independent Poisson processes.
Now, let L([0, p]) be the random variable describing the cumulative size of our new order book up
to price p ∈ R+ . Given the preceding remarks, L([0, p]) is, as in the previous section, the size of a
Rp
M/M/1 + M queueing system with arrival rate α 0 hλ (u) du, service rate µ and reneging rate θ. Using
the results of section 7.2.1, we obtain from equation (7.2.8):
E[L([0, p])] =
p
Z
h(u) du − f δ,
0
where we have defined h(u) =
αhλ (u)
θ
R
p
h(u) du
0
!
h(u) du ,
(7.2.10)
0
and :
f (x, y) =
From now on, H(p) =
p
Z
Γy (1 + x)
.
Γy (x)
(7.2.11)
will be the (normalized) arrival rate of limit orders up to price p,
and B(p) = E[L([0, p])] will be the average cumulative shape of the order book up to price p. Then
69
b(p) =
dB(p)
dp
will be the average shape of the order book (per price unit, not cumulative). Straightforward
differentiation of equation (7.2.10) and some terms rearrangements lead to the following proposition.
Proposition 7.2.1 In a continuous order book with homogeneous Poisson arrival of market orders with
intensity µ, spatial Poisson arrival of limit orders with intensity αhλ (p), and exponentially distributed
lifetimes of non-executed limit orders with parameter θ, the average shape of the order book b is computed for all p ∈ [0, ∞) by:
h
h
ii
b(p) = h(p) 1 − δ (gδ ◦ H) (p) 1 − δ[H(p)]−1 1 − (gδ ◦ H) (p) ,
where
gδ (y) =
e−y yδ
.
δΓy (δ)
(7.2.12)
(7.2.13)
Let us give a few comments on the average shape we obtain. Firstly, note that by identification to
equation (7.2.4), observing that π1→k (0) = gδ (ν1→k ) in the discrete model, gδ (H(p)) is to be interpreted
as the probability that the order book is empty up to a price p. Secondly, note that letting δ → 0 in
equation (7.2.12) gives b(p) → h(p) (cf. limδ→0 gδ (y) = e−y ). Indeed, if there were no market orders,
then the average shape of the order book would be equal to the normalized arrival rates. Thirdly, as
p → ∞, we have b(p) ∼ k h(p) for some constant k. This leads to our main comment, which we state as
the following proposition.
Proposition 7.2.2 The shape of the order book b(p) can be written as:
b(p) = h(p)C(p),
(7.2.14)
where C(p) is the probability that a limit order submitted at price p will be cancelled before being
executed.
This proposition translates a law of conservation of the flows of orders: the shape of the order book
is exactly the fraction of arriving limit orders that will be cancelled. The difference between the flows
of arriving limit orders and the order book is exactly the fraction of arriving limit orders that will be
executed.
The proof is straightforward. Indeed, in the 1 → k queueing system, the average number of limit
orders at price k that are cancelled per unit time is θE[Lk ] (θE[L1→k ] is the reneging rate of 1 → k queue
using queueing system vocabulary). Therefore, the fraction of cancelled orders at price k over arriving
limit orders at price k, per unit time, is Ck =
θE[Lk ]
λk .
Using equation (7.2.9) and some straightforward
computations, the fraction Ck of limit orders submitted at price k which are cancelled is:
Ck = 1 −
δ
(gδ (ν1→k−1 ) − gδ (ν1→k )) .
νk
(7.2.15)
Therefore, in the continuous model up to price p ∈ R+ with average cumulative shape B(p), the fraction
of limit orders submitted at a price in [p, p + ] which are cancelled is written:
1−
δ
(gδ (H(p)) − gδ (H(p + ))) .
H(p + ) − H(p)
70
(7.2.16)
By letting, → 0, we obtain that the fraction C(p) of limit orders submitted at price p ∈ R+ which are
cancelled is :
C(p) = 1 −
δg0δ (H(p))
!
δ
= 1 − δ(gδ ◦ H)(p) 1 −
(1 − (gδ ◦ H)(p)) ,
H(p)
(7.2.17)
which gives the final result.
This law of conservations of the flows of orders explains the relationship between the shape of the
order book and the flows of arrival of limit orders. For high prices, two cases are to be distinguished. On
the one hand, if the total arrival rate of limit orders is a finite positive constant α (for example when hλ is
R∞
a probability density function on [0, +∞), in which case lim p→+∞ H(p) = 0 λ(u, t) du = α ∈ R∗+ ), then
the proportionality constant between the shape of the order book b(p) and the normalized limit order
flow h(p) is, as p → +∞, C∞ defined as:
δ
C∞ = lim C(p) = 1 − δgδ (α) 1 − (1 − gδ (α)) < 1.
p→∞
α
(7.2.18)
In such as case, the shape of the order book as p → +∞ is proportional to the normalized rate of arrival of
limit orders h(p), but not equivalent. The fraction of cancelled orders does not tend to 1 as p → +∞, i.e.
market orders play a role even at high prices. On the other hand, in the case where lim p→+∞ H(p) = ∞,
then very high prices are not reached by market orders, and the tail of the order book behaves exactly
as if there were no market orders: b(p) ∼ h(p) as p → +∞. We may remark here that there exists
an empirical model for the probability of execution F(p) = 1 − C(p) in [248]. In this latter model,
it is assumed to be the complementary cumulative distribution function of a Student distribution with
parameter s = 1.3. As such, it is decreasing towards 0 as p−s . In our model however, it is exponentially
decreasing, and, in view of the previous discussion, does not necessarily tends towards 0.
7.3
Comparison to existing results on the shape of the order book
The model presented in section 7.2 belongs to the class of "zero-intelligence" Markovian order book
models: all order flows are independent Poisson processes. Although very simple, it turns out it replicates the shapes of the order book usually obtained in previous empirical and numerical studies, as we
will now see.
7.3.1
Numerically simulated shape in [303]
A first result on the shape of the order book is provided in [303], on figures 3a) and 3b). These figures are
obtained by numerical simulations of an order book model very similar to the one presented in section
7.2, where all order flows are Poisson processes: market orders are submitted are rate µS with size σS ,
limit orders are submitted with the same size at rate αS per unit price on a grid with tick size d pS , and all
orders are removed randomly with constant probability δS per unit time. (We have indexed all variables
with an S to differentiate them from our own notations). Figures 3a) and 3b) in [303] are obtained for
different values of a "granularity" parameter S ∝
δS σS
µS
. It is observed that, when S gets larger, the
average book becomes deeper close to the spread, and thinner for higher prices.
71
Using our own notations, S actually reduces to 1δ , i.e. the inverse of the normalized rate of arrival
of market orders. Using [303]’s assumption that limit orders arrive at constant rate αS per unit price and
unit time, we obtain in our model λ(p, t) = αS , i.e. H(p) = αS p. On figure 7.1, we plot the average
shapes and cumulative shapes of the order book given at equations (7.2.12) and (7.2.10) with this H. It
turns out that when δ varies, our basic model indeed reproduces precisely figure 3a) and 3b) of [303].
Therefore we are able to analytically describe the shapes that were only numerically obtained. These
shapes can be straightforwardly obtained with different regimes of market orders in our basic model:
when the arrival rates of market orders increases (i.e. when S increases), all other things being equal,
the average shape of the order book is thinner for lower prices.
7.3.2
Empirical and analytical shape in [49]
We now give two more examples of order book shapes obtained with equation (7.2.12) of proposition
7.2.1. We successively consider two types of normalized intensities of arrival rates of limit orders:
• exponentially decreasing with the price: h(u) = αθ βe−βu ;
• power-law decreasing with the price: h(u) = αθ (γ − 1)(1 + u)−γ , γ > 1.
The first case is the one observed on Chinese stocks by [163]. The second case is the one suggested in an
empirical study by [49], in which γ ≈ 1.5 − 1.7. Moreover, the latter paper provides the only analytical
formula previoulsy proposed (to our knowledge) linking the order flows and the average shape of a limit
order book: [49] derives an analytical formula from a zero-intelligence model by assuming that the price
process is diffusive with diffusion constant D. Using our notations, their average order book, denoted
here bBP , is for any p ∈ (0, ∞):
−σ−1 p
p
Z
hλ (u) sinh(σ u) du + sinh(σ p)
−1
bBP (p) ∝ e
∞
Z
−1
0
−1 u
hλ (u)e−σ
du,
(7.3.1)
p
where the parameter σ2 is homogeneous to the variance of a price (it is proportional to the diffusion
coefficient divided by the cancellation rate). σ is thus interpreted in [49] as "the typical variation of price
during the lifetime of an order, and [it] fixes the scale over which the order book varies". Therefore,
although σ is not available in our model, since our one-side model does not have a diffusive price,
we may however obtain a satisfying order of magnitude for the parameter by computing the standard
deviation of the price in our model, using numerical simulations (see also remark 7.3.1 below).
We plot the shape of [49] for the two types of normalized arrival rates of limit orders previously
mentionned. Note that the formula (7.3.1) is defined up to a multiplicative constant that we arbitrarily
fix such that the maximum offered with respect to the price in our model is equal to the maximum of
equation (7.3.1). Results are plotted in figure 7.2, and numerical values given in caption. It turns out
that our model (7.2.12) and equation (7.3.1) provide similar order book shapes. Since equation (7.3.1)
has been successfully tested with empirical data in [49], figure 7.2 provides a good hint that the shape
(7.2.12) could provide good empirical fittings as well. As p → ∞, both formulas lead to a shape b(p)
decreasing as the arrival rate of limit orders h(p), which was already observed in [49], and discussed
here in section 7.2. The main difference between the shapes occurs as p → 0. Equation (7.3.1) imposes
72
1.0
Average shape b p
0.8
0.6
0.4
0.2
0.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
2.0
2.5
3.0
Price p
Average cumulative shape B p
2.0
1.5
1.0
0.5
0.0
0.0
0.5
1.0
1.5
Price p
Figure 7.1: Shape (top panel) and cumulative shape (bottom panel) of the order book computed using
equations (7.2.12) and (7.2.10) with h(p) = α, α = 8 and δ = 1 (full line), δ = 2 (dashed), δ = 6
(dotdashed), δ = 16 (dotted). Note that results are scaled on the same dimensionless axes used in [303].
73
12
Average shape b(p)
10
8
6
4
2
0
0
2
4
6
8
10
Price p
Figure 7.2: Comparison of the shapes of the order book in our model (black curves) and using the formula proposed by [49] (red curves). Three examples are plotted: arrival of limit orders with exponential
prices, α = 20, β = 0.75, µ = 4, θ = 1 (full lines) ; idem with µ = 8 (dashed lines) ; arrival of limit
orders with power-law prices, α = 40, γ = 1.6, µ = 4, θ = 1 (dash-dotted lines).
that bBP = 0, whereas the result (7.2.12) allows a more flexible behaviour with b(0) =
h(0)
1+δ ,
a quantity
that depends on the three types of order flows. The difference of behaviour close to the spread is not
surprising considering the different natures of both models.
Remark 7.3.1 We have used numerical simulations of our model to compute the standard deviation
of the price in our model. Note however that this could be found by a numerical evaluation of the
analytical form of the standard deviation of the price. Indeed, the stationary distribution of the price in
our continuous model can be explicitly derived. Assuming this distribution admits a density function π p ,
then the same observation that leads to equations (7.2.6) and (7.2.7) in the discrete case gives here for
any p ∈ [0, ∞):
Z
p
π p (p) d p = 1 −
0
e−H(p) [H(p)]δ
,
δΓH(p) (δ)
(7.3.2)
which is written by straighforward differentiation and some terms rearrangements:
1 − C(p)
.
π p (p) = h(p)(gδ ◦ H)(p) 1 − δ[H(p)]−1 (1 − (gδ ◦ H)(p)) = h(p)
δ
(7.3.3)
Some examples of this distribution are plotted on figure 7.3. Now, using this explicit distribution of the
price in our order book model, we may compute its standard deviation in the example cases described
above, by numerically evaluating the integrals defining the first two moments of the distribution.
Finally, let us recall that the price process in our model is a jump process with right-continuous
paths, so it is not diffusive. Note however that the price process in similar two-sided order book models
has been shown to admit a diffusive limit with an appropriate time scaling [see e.g. 7].
74
3.0
Probability density function
2.5
2.0
1.5
1.0
0.5
0.0
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Price
Figure 7.3: Price density function π p as a function of the price, computed with equation (7.3.3), with hλ
constant (full line), exponentially decreasing (dotted) and power-law decreasing (dashed).
7.4
A model with varying sizes of limit orders
We now allow for random sizes of limit orders in our model. As in section 7.2, we start by describing
the basic model as a queueing system, and then extend it to the case of a continuous price.
Let us recall that we deal with a one-side order book model, i.e. a model in which all limit orders are
ask orders, and all market orders are bid orders. Let p(t) denote the price at time t. {p(t), t ∈ [0 ∈ ∞)}
is a continuous-time stochastic process with value in the discrete set {1, . . . , K}, i.e. the price is given
in number of ticks. For all i ∈ {1, . . . , K}, (ask) limit orders at price i are submitted according to a
Poisson process with parameter λi . These processes as assumed to be mutually independent, so that the
number of orders submitted at prices between q and r (included) is a Poisson process with parameter
P
λq→r defined as λq→r = ri=q λi .
The contribution of this section is to allow for random sizes of limit orders, instead of having unitsize limit orders as in the basic model of section 7.2. We assume that all the sizes of limit orders are
independent random variables. We also assume that the sizes of limit orders submitted at a given price
are identically distributed, but we allow this distribution to vary depending on the price. For a given
price k ∈ N∗ , let gkn , n ∈ N∗ , denote the probability that a limit order at price k is of size n. Let gk denote
the mean size of a limit order at price k, which is assumed to be finite. It is a well known property of the
Poisson process to state that the rate of arrival of limit orders of size n at price i is λi gin , hence the rate of
P
arrival of limit orders of size n with a price lower or equal to k is ki=1 λi gin . Similarly, the probability that
P
P
λi
λi
gin . Let g1→k = ki=1 λ1→k
gi
a limit order with a price lower or equal to k is of size n is g1→k
= ki=1 λ1→k
n
denote the mean size of a limit order with price up to k.
Mechanism for cancellation is unchanged: all limit orders standing in the book may be cancelled.
Note however that a limit order is not cancelled all at once, but unit by unit, i.e. share by share (see
also Remark 7.5.1 below). It is assumed that the time intervals between the submission of a limit order
and the cancellation of one share of this order form a set of mutually independent random variables
identically distributed according to an exponential distribution with parameter θ > 0. Finally, (buy)
market orders are submitted at random times according to a Poisson process with parameter µ. All
75
market orders are assumed to be of unit size.
As in section 7.2, let {L1→k (t), t ∈ [0, ∞)} be the stochastic process representing the number of limit
orders at prices 1, . . . , k standing in the order book at time t. L1→k is thus the cumulative shape of the
order book in our model. It can be viewed as the size of a M X /M/1 + M queueing system with bulk
arrival rate λ1→k , bulk volume distribution (g1→k
n )n∈N∗ , service rate µ and reneging rate θ (see e.g. [79]
for queueing systems with bulk arrivals). The infinitisemal generator of the process L1→k is thus written:

 −λ1→k
λ1→k g1→k
λ1→k g1→k
λ1→k g1→k
1
2
3

1→k
1→k
 µ + θ −(λ1→k + µ + θ)
λ
g
λ
g
1→k 1
1→k 2


0
µ + 2θ
−(λ1→k + µ + 2θ)
λ1→k g1→k

1

0
0
µ
+
3θ
−(λ
+
µ + 3θ)

1→k

..
..
..
..
.
.
.
.

λ1→k g1→k
. . . 
4

λ1→k g1→k
. . . 
3

λ1→k g1→k
. . .  .
2

λ1→k g1→k
. . . 
1

..
. . 
.
.
(7.4.1)
The stationary distribution π1→k = (πn1→k )n∈N of L1→k hence satisfies the following system of equations:



+ (µ + θ)π1→k
 0 = −λ1→k π1→k
0
1 ,

Pn

1→k π1→k , (n ≥ 1),
 0 = −(λ1→k + µ + nθ)π1→k
+ (µ + (n + 1)θ)π1→k
n
n−i
n+1 + i=1 λ1→k gi
which can be solved by introducing the probability generating functions. Let Π1→k (z) =
P
n
and G1→k (z) = n∈N∗ g1→k
n z . Let us also introduce the normalized parameters
δ=
(7.4.2)
P
λ1→k
µ
and ν1→k =
.
θ
θ
1→k n
n∈N πn z
(7.4.3)
By multiplicating the n-th line by zn and summing, the previous system leads to the following differential
equation:
δ
δ
d 1→k
Π (z) + − ν1→k H 1→k (z) Π1→k (z) = π1→k
,
dz
z
z 0
where we have set H 1→k (z) =
Π
1→k
1−G1→k (z)
.
1−z
(z) = z
−δ
(7.4.4)
This equation is straightforwardly solved to obtain:
ν1→k
δπ1→k
0 e
Rz
0
H 1→k (u) du
z
Z
vδ−1 e−ν1→k
Rv
0
H 1→k (u) du
dv,
(7.4.5)
0
and the condition Π1→k (1) = 1 leads to
π1→k
0
= δ
Z
1
δ−1 ν1→k
R1
H 1→k (u) du
!−1
,
(7.4.6)
H 1→k (u) du
v
dv
−δ 0
z R
.
R1
1 δ−1 ν1→k
H 1→k (u) du
v
v e
dv
0
(7.4.7)
v
e
v
dv
0
which by substituting in the general solution gives:
Rz
Π
1→k
(z) =
vδ−1 eν1→k
Rz
Now, turning back to the differential equation (7.4.4), then taking the limit when z tends increasingly to 1
76
and using basic properties of probability generating functions (limz→1 Π1→k (z) = 1, limz→1
t<1
t<1
d 1→k
(z)
dz Π
=
E[L1→k ] and limz→1 H 1→k (z) = g1→k ), we obtain the result stated in the following proposition.
t<1
Proposition 7.4.1 In the discrete one-side order book model with Poisson arrival at rate µ of unit size
market orders, Poisson arrival of limit orders with rate λk at price k and random size with distribution
(gkn )n∈N∗ on N∗ , and exponential lifetime of non-executed limit orders with parameters θ, the average
cumulative shape of the order book up to price k is given by:
E[L
1→k
] = ν1→k
g1→k
−δ+
1
Z
δ−1 ν1→k
v
e
R1
v
H 1→k (u) du
!−1
dv
.
(7.4.8)
0
Note that by taking the sizes of all limit orders to be equal to 1, i.e. by setting gk1 = 1 and gkn = 0, n ≥
2, equation (7.4.8) reduces to equation (7.2.8) of section 7.2, as expected.
We now introduce a specification of the model where the sizes of limit orders are geometrically
distributed with parameter q ∈ (0, 1) and independent of the price, i.e. for any price k ∈ N∗ , gkn =
(1 − q)n−1 q. This specification is empirically founded, as it has been observed that the exponential
distribution may be a rough continuous approximation of the distribution of the sizes of limit orders [see
e.g. 67]. This specification of the volume distribution straightfowardly gives H 1→k (z) =
1
1−(1−q)z
and
with some computations we obtain:
ν1→k
E[L
1→k
ν1→k
δq 1−q
]=
−δ+
,
−ν1→k
q
2 F 1 (δ, 1−q , 1 + δ, 1 − q)
(7.4.9)
where 2 F1 is the ordinary hypergeometric function [see e.g. 296, chapter 2].
Now, following the idea presented in section 7.2, we consider an order book with a continuous price,
in which limit orders are submitted according to a spatial Poisson process with intensity λ(p, t) = αhλ (p).
Recall that hλ is assumed to be a real non-negative function with positive support, denoting the spatial
intensity of arrival rates, i.e. the function such that the number of limit orders submitted in the order
Rp
book in the price interval [p1 , p2 ] is a homogeneous Poisson process with rate p 2 αhλ (u) du. Using
1
notations defined in section 7.2, the cumulative shape at price p ∈ [0, +∞) of this continuous order book
is thus:
H(p)
δq 1−q
1
,
B(p) = H(p) − δ +
−H(p)
q
2 F 1 (δ, 1−q , 1 + δ, 1 − q)
(7.4.10)
which can be derived to obtain the average shape b(p) of the order book, which we state in the following
proposition.
Proposition 7.4.2 In a continuous one-side order book with homogeneous Poisson arrival of unit-size
market orders with intensity µ, spatial Poisson arrival of limit orders intensity αhλ (p), geometric distribution of the sizes of limit orders with parameter q, and exponentially distributed lifetimes of nonexecuted limit orders with parameter θ, the average shape of the order book b is computed for all
p ∈ [0, ∞) by:


H(p)

d 
h(p)
δq 1−q
 .
b(p) =
+


−H(p)
q
d p 2 F1 (δ,
, 1 + δ, 1 − q) 
1−q
77
(7.4.11)
7.5
Influence of the size of limit orders on the shape of the order book
We now use the results of section 7.4 to investigate the influence of the size of the limit orders on the
shape of the order book. Recall that market orders are submitted at rate µ with size 1, that non-executed
limit orders are cancelled share by share after a random time with exponential distribution with parameter
θ, and that the distribution of the sizes of limit orders is a geometric distribution with parameter q (i.e.
with mean 1q ). We study in detail the influence of the parameter q on the theoretical shape of the order
book.
In a first example, we assume that the normalized intensity of arrival of limit orders h is constant [i.e. as
in 303] and equal to αq. Note that when q varies, the mean total volume V(p) of arriving limit orders up
to price p per unit time remains constant:
1
V(p) =
q
p
Z
h(u) du = pα.
(7.5.1)
0
In other words, when q decreases, limit orders are submitted with larger sizes in average, but less often,
keeping the total submitted volume constant. The first remarkable observation is that, although the mean
total volumes of limit and market orders are constant, the shape of the order book varies widely with
q. On figure 7.4, we plot the shape b(p) defined at equation (7.4.11), and cumulative shape defined at
equation (7.4.10), of an order book with arriving volumes of limit orders as in equation (7.5.1). With the
chosen numerical values, the average volume of one limit order ranges from approximately 1 (q = 0.99)
to 20 (q = 0.05). It appears that when q decreases, the shape of the order book increases for lower prices.
In other words, the larger the size of arriving limit orders is, the deeper is the order book around the
spread, all other things being equal.
We observe that figure 7.4 here is similar to figure 7.1 here and figure 3 in [303]. However, volumes
of limit and market orders are equal in the two latter cases, and we have shown in section 7.3 that these
different shapes can actually be obtained with different regimes of market orders, but equal sizes of
market and limit orders: when the arrival rates of market orders increases, all other things being equal,
the average shape of the order book is thinner for lower prices. Therefore, the observation made now is
different. In similar trading regimes where the mean total volume of limit and market orders are equal,
we highlight the influence of the relative volume of limit orders (compared to unit market orders) on the
order book shape: the smaller the average size of limit orders, the shallower the order book close to the
spread.
We provide a second example of the phenomenon by assuming that the intensity of incoming limit
orders exhibits a power-law decrease with the price, as tested in section 7.3 to compare to the analytical
shape provided in [49]. We thus have now h(p) = qα(γ − 1)(1 + p)−γ . There again, when q varies, the
Rp
average total volume of incoming limit orders up to price p remains constant and equal to α(γ−1) 0 (1+
u)−γ du = α(1 − (1 + p)1−γ ). Figure 7.5 plots the shape of the order book with these characteristics, when
q varies. The observed phenomenon is equally clear with this more realistic distribution of incoming
limit orders: the order book deepens at the first limits when the average volume of limit orders increases,
the total volume of limit orders and unit-size market orders submitted being constant.
Finally, one might remark that, by assuming unit-size market orders, our model with geometric
78
5
Shape b p
4
3
2
1
0
0
2
4
6
8
6
8
Price p
30
Cumulative shape B p
25
20
15
10
5
0
0
2
4
Price p
Figure 7.4: Shape of the order book as computed in equation (7.4.11) (top) and cumulative shape of the
order book as computed in equation (7.4.10) (bottom) with δ = 10, h(p) = αq
K 1(0,K) , α = 40, K = 8,
and q = 0.99 (full line), q = 0.5 (dotted), q = 0.25 (dotdashed), q = 0.10 (short-dashed), q = 0.05
(long-dashed)
79
8
Shape b p
6
4
2
0
0
2
4
6
8
10
Price p
Figure 7.5: Shape of the order book as computed in equation (7.4.11) with δ = 10, h(p) = qα(γ −
1)(1 + p)−γ , α = 40 γ = 1.6, and q = 0.99 (full line), q = 0.5 (dotted), q = 0.25 (dotdashed), q = 0.10
(short-dashed), q = 0.05 (long-dashed).
distribution of limit orders’ sizes does not predict the order book shape in the case where the average
size of limit orders is smaller than the average size of market orders. In fact, it will now appear that this
case has never been empirically encountered in our data: the average size of limit orders is always in our
sample greater than the average size of market orders.
Remark 7.5.1 We recall that, for analytical tractability purposes, the cancellation mechanism used in
this work is that each share of submitted limit orders has a lifetime following an exponential distribution
independent of the other variables. A more realistic cancellation mechanism would be that each limit
order has an independent exponentially-distributed lifetime, i.e. that an order standing in the book is
cancelled all at once and not share by share. It turns out this does not qualitatively change our results.
We simulate an order book with this latter cancellation system, and compare the average shape of the
simulated order book with our analytical formulas. Results are shown on figure 7.6 in the case of a
uniform price distribution for the limit orders. Obviously, if q is close to 1, then both cancellation
mechanisms are equivalent and the simulated shape converges to our analytical results. When q , 1, we
cannot straightforwardly compare the two models (the cancellation rate is proportional to the number
of shares in the book in one case, and to the number of orders in the book in the other one), but we
nevertheless observe that both mechanisms exhibit similar shapes of the order book. Furthermore, it
is interesting to note that, irrespective of the cancellation mechanism, our observation stating that the
submission of larger limit orders leads to thicker order books around the spread is always valid. We may
finally add that since market orders are unit-sized, q may be understood as a relative measure of the
volume of limit orders compared to the volume of market orders, and is therefore expected to be closer
to 1 than to 0.
80
6
Average order book b(p)
5
4
3
2
1
0
0
1
2
3
4
Price p
5
6
7
8
Figure 7.6: Shape of the order book as computed in equation (7.4.11), i.e. with share-by-share cancellation (full lines), and numerically simulated with order-by-order cancellation (dashed lines), with µ = 10,
qα
θ = 1, h(p) = θK
, α = 40, K = 8, and q = 0.99 (blue), q = 0.5 (green), q = 0.25 (red), q = 0.10 (cyan),
q = 0.05 (magenta).
7.6
Conclusion
We have presented in this chapter a simple order book model based on classic results from queueing
theory. We have derived a continuous version of the model and shown that it provides an analytical
formula for the shape of the order book that reproduces results from acknowledged numerical and empirical studies. We then have extended the model to allow for the limit orders to be submitted with
random sizes. The extended model provides some insight on the influence of the size of limit orders
in an order book. This insight is confirmed by the empirical study on liquid stocks traded on the Paris
Stock Exchange presented in Chapter 3.
81
82
Chapter 8
Advanced modelling of limit order books
In this chapter, we first review the existing literature on agent-based modelling of market interactions,
before introducing and analyzing in the spirit of Chapter 6, a Hawkes process-based limit order book
model.
8.1
Towards non-trivial behaviours: modelling market interactions
In the simple statistical models of limit order books, flows of orders are treated as independent processes.
Under some (strong) modelling constraints, we can see the order book as a Markov chain and look for
analytical results ([97]). In any case, even if the process is empirically detailed and not trivial ([248]),
we work with the assumption that orders are independent and identically distributed. This very strong
(and false) hypothesis is similar to the “representative agent” hypothesis in Economics: orders being
successively and independently submitted, we may not expect anything but regular behaviours. Following the work of economists such as [206, 207, 208], one has to translate the heterogeneous property of
the markets into the agent-based models. Agents are not identical, and not independent.
In this section we present some toy models implementing mechanisms that aim at bringing heterogeneity: herding behaviour on markets in [92], trend following behaviour in [232] or in [280], threshold
behaviour [91]. Most of the models reviewed in this section are not order book models, since a persistent
order book is not kept during the simulations. They are rather price models, where the price changes are
determined by the aggregation of excess supply and demand. However, they identify essential mechanisms that may clearly explain some empirical data, and lay the grounds for better limit order book
models such as that presented in the following sections.
8.1.1
Herding behaviour
The model presented in [92] considers a market with N agents trading a given stock with price p(t). At
each time step, agents choose to buy or sell one unit of stock, i.e. their demand is φi (t) = ±1, i = 1, . . . , N
with probability a or are idle with probability 1 − 2a. The price change is assumed to be linearly linked
83
with the excess demand D(t) =
i=1 φi (t)
PN
with a factor λ measuring the liquidity of the market :
N
1X
φi (t).
p(t + 1) = p(t) +
λ i=1
(8.1.1)
λ can also be interpreted as a market depth, i.e. the excess demand needed to move the price by one
unit. In order to evaluate the distribution of stock returns from Eq.(8.1.1), we need to know the joint
distribution of the individual demands (φi (t))1≤i≤N . As pointed out by the authors, if the distribution
of the demand φi is independent and identically distributed with finite variance, then the Central Limit
Theorem stands and the distribution of the price variation ∆p(t) = p(t + 1) − p(t) will converge to a
Gaussian distribution as N goes to infinity.
The idea here is to model the diffusion of the information among traders by randomly linking their
demand through clusters. At each time step, agents i and j can be linked with probability pi j = p =
c
N,
c being a parameter measuring the degree of clustering among agents. Therefore, an agent is linked to
an average number of (N − 1)p other traders. Once clusters are determined, the demand are forced to
be identical among all members of a given cluster. Denoting nc (t) the number of cluster at a given time
step t, Wk the size of the k-th cluster, k = 1, . . . , nc (t) and φk = ±1 its investement decision, the price
variation is then straightforwardly written :
n (t)
∆p(t) =
c
1X
Wk φk .
λ k=1
(8.1.2)
This modelling is a direct application to the field of finance of the random graph framework as
studied in [123]. [205] previously suggested it in economics. Using these previous theoretical works,
and assuming that the size of a cluster Wk and the decision taken by its members φk (t) are independent,
the author are able to show that the distribution of the price variation at time t is the sum of nc (t)
independent identically distributed random variables with heavy-tailed distributions :
n (t)
∆p(t) =
c
1X
Xk ,
λ k=1
(8.1.3)
where the density f (x) of Xk = Wk φk is decaying as :
f (x) ∼|x|→∞
A −(c−1)|x|
e W0 .
|x|5/2
(8.1.4)
Thus, this simple toy model exhibits fat tails in the distribution of prices variations, with a decay reasonably close to empirical data. Therefore, [92] show that taking into account a naive mechanism of
communication between agents (herding behaviour) is able to drive the model out of the Gaussian convergence and produce non-trivial shapes of distributions of price returns.
8.1.2
Fundamentalists and trend followers
[232] proposed a model very much in line with agent-based models in behavioural finance, but where
84
trading rules are kept simple enough so that they can be identified with a presumably realistic behaviour
of agents. This model considers a market with N agents that can be part of two distinct groups of traders:
n f traders are “fundamentalists”, who share an exogenous idea p f of the value of the current price p;
and nc traders are “chartists” (or trend followers), who make assumptions on the price evolution based
on the observed trend (mobile average). The total number of agents is constant, so that n f + nc = N at
any time. At each time step, the price can be moved up or down with a fixed jump size of ±0.01 (a tick).
The probability to go up or down is directly linked to the excess demand ED through a coefficient β.
The demand of each group of agents is determined as follows :
• Each fundamentalist trades a volume V f proportional, with a coefficient γ, to the deviation of the
current price p from the perceived fundamental value p f : V f = γ(p f − p).
• Each chartist trades a constant volume Vc . Denoting n+ the number of optimistic (buyer) chartists
and n− the number of pessimistic (seller) chartists, the excess demand by the whole group of
chartists is written (n+ − n− )Vc .
Therefore, assuming that there exists some noise traders on the market with random demand µ, the
global excess demand is written :
ED = (n+ − n− )Vc + n f γ(p f − p) + µ.
(8.1.5)
The probability that the price goes up (resp. down) is then defined to be the positive (resp. negative) part
of βED.
As observed in [322], fundamentalists are expected to stabilize the market, while chartists should
destabilize it. In addition, following [92], the authors expect non-trivial features of the price series to
results from herding behaviour and transitions between groups of traders. Referring to Kirman’s work as
well, a mimicking behaviour among chartists is thus proposed. The nc chartists can change their view on
the market (optimistic, pessimistic), their decision being based on a clustering process modelled by an
opinion index x =
n+ −n−
nc
representing the weight of the majority. The probabilities π+ and π− to switch
from one group to another are formally written :
π± = v
nc ±U
e ,
N
U = α1 x + α2 p/v,
(8.1.6)
where v is a constant, and α1 and α2 reflect respectively the weight of the majority’s opinion and the
weight of the observed price in the chartists’ decision. Transitions between fundamentalists and chartists
are also allowed, decided by comparison of expected returns (see [232] for details).
The authors show that the distribution of returns generated by their model have excess kurtosis. Using a Hill estimator, they fit a power law to the fat tails of the distribution and observe exponents grossly
ranging from 1.9 to 4.6. They also check hints for volatility clustering: absolute returns and squared
returns exhibit a slow decay of autocorrelation, while raw returns do not. It thus appears that such a
model can grossly fit some “stylized facts”. However, the number of parameters involved, as well as
the complicated rules of transition between agents, make clear identification of sources of phenomenons
and calibration to market data difficult and intractable.
85
[18, 19] provide a somewhat simplifying view on the Lux-Marchesi model. They clearly identify
the fundamentalist behaviour, the chartist behaviour, the herding effect and the observation of the price
by the agents as four essential effects of an agent-based financial model. They show that the number
of agents plays a crucial role in a Lux-Marchesi-type model: more precisely, the stylized facts are
reproduced only with a finite number of agents, not when the number of agents grows asymptotically,
in which case the model stays in a fundamentalist regime. There is a finite-size effect that may prove
important for further studies.
The role of the trend following mechanism in producing non-trivial features in price time series
is also studied in [280]. The starting point is an order book model similar to [73] and [303]: at each
time step, liquidity providers submit limit orders at rate λ and liquidity takers submit market orders at
rate µ. As expected, this zero-intelligence framework does not produce fat tails in the distribution of
(log-)returns nor an over-diffusive Hurst exponent. Then, a stochastic link between order placement
and market trend is added: it is assumed that liquidity providers observing a trend in the market will act
consequently and submit limit orders at a wider depth in the order book. Although the assumption behind
such a mechanism may not be empirically confirmed (a questionable symmetry in order placement is
assumed) and should be further discussed, it is interesting enough that it directly provides fat tails in the
log-return distributions and an over-diffusive Hurst exponent H ≈ 0.6 − 0.7 for medium time-scales, as
shown in figure 8.1.
Since their introduction, Hawkes processes have been applied to a wide range of research areas,
from seismology in the pioneering work [175] to credit risk, financial contagion and more recently, to
the modelling of market microstructure [27, 26, 28, 253, 273, 328, 102, 101].
In market microstructure, and particularly limit order book modelling, the relevance of these processes is amply demonstrated by two empirical properties of the order flow of market and limit orders at
the microscopic level:
• Time clustering: order arrivals are highly clustered in time, see e.g. [67][102].
• Mutual dependence: order flow exhibit non-negligible cross-dependences. For instance, as documented in [253][23][113], market orders excite limit orders, and limit orders that change the price
excite market orders.
By using, as in Chapter 6, techniques from the ergodic theory of Markov processes, we show that the
extended process describing the joint dynamics of the limit order book and the intensities of the Hawkes
process driving it, is ergodic, and that it leads to a diffusive behaviour of the price at large time scales.
8.1.3
Model setup
A stylized order book model whose dynamics is governed by Hawkes processes is now introduced. We
shall use the same notations and conventions as in Chapter 6 to represent the limit order book. The
same three types of events can modify the limit order book: arrival of a new limit order, arrival of a new
market order, cancellation of an already existing limit order. This time, the arrival of market and limit
orders are described by mutually exciting Hawkes processes, see Chapter B, Section B.1.1 for a brief
introduction to Hawkes processes:
86
Figure 8.1: Hurst exponent found in the Preis model for different number of agents when including
random demand perturbation and dynamic limit order placement depth. Reproduced from [280].
87
+
−
• M ± (t): arrival of new buy or sell market order, with intensity λ M and λ M ;
±
• Li± (t): arrival of a limit order at level i, with intensity λiL ,
whereas the arrival of a cancellation order is modelled as in Chapter 6 by a doubly stochastic Poisson
process:
+
−
• Ci± (t): cancellation of a limit order at level i, with intensity λCi ai and λCi |bi |.
8.1.4
The infinitesimal generator
A markovian (2K + 2)-dimensional Hawkes process now models the intensities of the arrivals of market
and limit orders. It is characterized by the D-dimensional process (a; b; µ) of the available quantities
and the intensities of the Hawkes processes decomposed as in Section ??, where D = (2K + 2)2 + 2K is
the dimension of the state space.
The infinitesimal generator associated with the process describing the joint evolution of the limit order
book has the following expression
+
+
+
LF(a; b; µ) = λ M (F [ai − (q − A(i − 1))+ ]+ ; J M (b); µ + ∆ M (µ) − F)
+
K
X
+
+
+
λiL (F ai + q; J Li (b); µ + ∆Li (µ) − F)
i=1
+
K
X
+
+
λCi ai (F ai − q; JCi (b) − F)
i=1
−
−
−
+ λ M F J M (a); [bi + (q − B(i − 1))+ ]− ; µ + ∆ M (µ) − F
+
K
X
−
−
−
λiL (F J Li (a); bi − q; µ + ∆Li (µ) − F)
i=1
+
−
K
X
−
−
λCi |bi |( f JCi (a); bi + q − f )
i=1
D
X
βi j µi j
i, j=1
∂F
.
∂µi j
(8.1.7)
In order to ease the already cumbersome notations, we have written
F(ai ; b; µ) instead of F(a1 , . . . , ai , . . . , aK ; b; µ). Moreover, the notation ∆(...) (µ) stands for the jump of
the intensity vector µ corresponding to a jump of the process N (...) (see Section B.1.1).
The operator L is a combination of
• standard difference operators corresponding to the arrival or cancellation of orders at each limit
and shift operators expressing the moves in the best limits, as already seen;
• drift terms coming from the mean-reverting behaviour of the intensities of the Hawkes processes
between jumps.
88
Note that the infinitesimal generator is fully worked out in (8.1.7) in the case of the discrete state-space;
some trivial but notationally cumbersome modifications would be necessary in order to account for the
case of real-valued quantities ai , bi ’s and order size q.
8.2
Stability of the order book
In this section, we study the long-time behaviour of the limit order book. A Lyapunov function is built,
ensuring the ergodicity of the limit order book under the natural assumption (B.1.8). More precisely,
there holds the
Proposition 8.2.1 Under the standing assumptions, in particular (B.1.8), the limit order book process
X is ergodic. It converges exponentially fast towards its unique stationary distribution Π.
Proof. Given the existence of the Lyapunov function provided in Lemma 8.2.1 below, and the geometric
drift condition (8.2.2), the result is proven using Theorem 7.1 in [246], see Chapter B, Section B.2.2.
Lemma 8.2.1 For η > 0 small enough, the function V defined by
V(a; b; µ) =
K
X
K
X
ai +
i=1
i=1
2
(2K+2)
1 X
1
|bi | +
δi j µi j ≡ V1 + V2
η i, j=1
η
(8.2.1)
where V1 (resp. V2 ) corresponds to the part that depends only on (a; b) (resp. µ), is a Lyapunov function
satisfying a geometric drift condition
LV 6 −ζV + C,
(8.2.2)
for some ζ > 0 and C ∈ R. The coefficients δi j ’s are defined in (B.1.13) in Section B.1.1.
Proof. First specialize V2 to be identical - up to a change in the indices - to the function defined by
(8.2.1) in Section B.1.1 in Appendix. Regarding the "small" parameter η > 0, it will become handy as a
penalization parameter, as we shall see below.
Thanks to the linearity of L, there holds
1
LV = LV1 + LV2 .
η
The first term LV1 is dealt with exactly as in Chapter 6:
+
−
LV1 (x) ≤ −(λ M + λ M )q +
K
X
+
−
(λiL + λiL )q −
i=1
+
K
X
−
λiL (iS − i)+ a∞ +
i=1
K
X
+
−
(λCi ai + λCi |bi |)q
i=1
K
X
+
λiL (iS − i)+ |b∞ |
(8.2.3)
i=1
+
−
−
+
≤ −(λ M + λ M )q + (ΛL + ΛL )q − λC qV1 (x)
−
+
+ K(ΛL a∞ + ΛL |b∞ |),
(8.2.4)
89
where
±
ΛL :=
K
X
±
±
λiL and λC := min {λCi } > 0.
1≤i≤K
i=1
Computing LV2 yields an expression identical to that obtained in Section B.1.1:
L(V2 ) =
X
j
λ0 δi j αi j + (κ − 1)
X
i, j
so that there holds
k µ jk ,
j,k
1
γ
LV = LV1 + LV2 6 −λC qV1 − V2 − G.µ + C,
η
η
where γ is as in Equation (B.1.15), G.µ is a compact notation for the linear form in the µi j ’s obtained in
(8.2.4), and C is some constant. Now, thanks to the positivity of the coefficients in V2 and of the µi j ’s,
one can choose η small enough that there holds
γ
V2 (µ),
2η
∀µ, |G.µ| 6
which yields
γ
1
LV ≡ LV1 + LV2 6 −λC qV1 − V2 + C,
η
2η
(8.2.5)
and finally
LV 6 −ζV + C,
γ
) and C is some constant.
with ζ = Min(λC q, 2η
8.3
Large scale limit of the price process
Using the same approach as in the previous chapter, but in a more general context, we study the longtime bahaviour of the price process, taking into account the stochastic behaviour of the intensities of the
point processes triggering the order book events. We first recall the expression of the price dynamics in
our limit order book model. Consider for instance the mid-price, solution to the SDE (6.2.16) which we
recall here
∆P h −1
(A (q) − A−1 (0))dM + (t) − (B−1 (q) − B−1 (0))dM − (t)
2
K
K
X
X
−
(A−1 (0) − i)+ dLi+ (t) +
(B−1 (0) − i)+ dLi− (t)
dP(t) =
i=1
+ (A (q) − A
−1
i=1
−1
(0))dCi+A (t)
i
− (B−1 (q) − B−1 (0))dCi−B (t) .
Let us, as before, recast this equation under the following form:
Pt =
t 2K+2
X
Z
0
i=1
90
Fi (Xu )dNui ,
(8.3.1)
where the N i are the point processes driving the limit order book, X is the markovian process describing
its state, and P is one of the price processes we are interested in. For instance, in the Poisson case dealt
with in Chapter 6, X = (a, b) and the N i are the Poisson processes driving the arrival of market, limit and
cancellation orders. In the context of Hawkes processes considered in this work, X = (a, b, µ) and the
N i are the Poisson and Hawkes processes driving the limit order book. Also, recall that, as mentioned
earlier, the Fi are bounded functions, as the price changes are bounded by the total number of limits in
the book, thanks to the non-zero boundary conditions a∞ , b∞ .
Denote again by Π the stationary distribution of X, as provided by Proposition 8.2.1. One can prove the
Theorem 8.3.1 Write as above the price
Z tX
Pt =
0
Fi (X s )dN si
i
and its compensator
Qt =
Z tX
0
Fi (X s )λis ds
i
(note that this time, the sum is inside the integral, since the λi ’s are components of the Markov process
X). Define
h=
X
Fi (X)λi
i
and let
a.s.
1
α = lim
t→+∞ t
Z tX
0
(Fi (X s ))λis ds
=
Z
h(X)Π(dX).
i
Finally, introduce the solution g to the Poisson equation
Lg = h − α
and the associated martingale
Zt = g(Xt ) − g(X0 ) −
t
Z
Lg(X s )ds ≡ g(Xt ) − g(X0 ) − Qt + αt.
0
Then, the deterministically centered, rescaled price
Pnt − αnt
P¯ nt ≡
√
n
converges in distribution to a Wiener process σW.
¯
The asymptotic volatility σ
¯ satisfies the identity
1
t→+∞ t
σ
¯ 2 = lim
Z tX
Z X
((Fi − ∆i (g))(X s ))2 λis ds ≡
((Fi − ∆i (g))(X))2 λi Π(dx).
0
i
i
Proof. Using again the martingale method with
Pt = (Pt − Qt ) − Zt + g(Xt ) − g(X0 ) + αt ≡ (Mt − Zt ) + g(Xt ) − g(X0 ) + αt,
91
the theorem will be proven iff one can show that g ∈ L2 (Π(dx)). The condition
h2 6 V
(8.3.2)
(where V is a Lyapunov function for the process) of Theorem 4.4 in [155] is sufficient for g to be
in L2 (Π(dx)). The linear Lyapunov function V introduced in (8.2.1) does not yield the desired result,
because h is no longer bounded but rather, linearly increasing in the λ’s. However, Lemma B.1.1 in
Chapter B provides a Lyapunov function having a polynomial growth of arbitrarily high order in the λ
’s at infinity, thereby ensuring that Condition (8.3.2) holds.
92
Part III
Simulation of limit order books
93
Chapter 9
Numerical simulation of limit order books
The base model for the simulation of a limit order book is a standard zero-intelligence agent-based
market simulator that can be built as follows: one agent is a liquidity provider. This agent submits limit
orders in the order books, orders which he can cancel at any time. This agent is simply characterized by
a few random distributions:
1. submission times of new limit orders are distributed according to a homogeneous Poisson process
N L with intensity λL ;
2. submission times of cancellation of orders are distributed according to homogeneous Poisson
process N C with intensity λC ;
3. placement of new limit orders is centred around the same side best quote and follows a Student’s
distribution with degrees of freedom ν1P , shift parameter m1P and scale parameter s1P ;
4. new limit orders’ volume is randomly distributed according to an exponential law with mean mV1 ;
5. in case of a cancellation, the agent deletes his own orders with probability δ.
The second agent in the basic model is a noise trader. This agent only submits market order (it is
sometimes referred to as the liquidity taker). Characterization of this agent is even simpler:
resume submission times of new market orders are distributed according to a homogeneous Poisson
process N M with intensity µ;
resume market orders’ volume is randomly distributed according to an exponential law with mean mV2 .
For all the experiments, agents submit orders on the bid or the ask side with probability 0.5. This
basic model will be henceforth referred to as “HP” (Homogeneous Poisson).
Assumption 3 is in line with empirical observations in [248]. Assumptions 4 and 7 are in line with
empirical observations in [67] as far as the main body of the distribution is concerned, although they fail
to represent the broad distribution observed in empirical studies.
Assumptions 1, 2 and 6 (Poisson) will be enforced throughout Section ??. A more complicated, and
more interesting dependence structure will be implemented and studied in Section 9.2, where Hawkes
processes are used to reproduce the interaction between liquidity taking and providing.
95
9.1
Zero-intelligence limit order book simulator
In order to gain a better intuitive understanding of the “mechanics” of the model, we sketch in Algorithm 1 below the simulation procedure in pseudo-code (see also [152] for a similar description.) For
simplicity, we take a symmetric order book model. We also let (usual notations):
λL :=
Λ
L
:=
λC (a) :=
ΛC (a) :=
λC (b) :=
ΛC (b) :=
λ1L , . . . , λLK ,
K
X
λiL ,
i=1
λC1 a1 , . . . , λCK aK ,
K
X
λCi ai ,
i=1
λC1 |b1 |, . . . , λCK |bK | ,
K
X
λCi |bi |,
(9.1.1)
(9.1.2)
(9.1.3)
(9.1.4)
(9.1.5)
(9.1.6)
i=1
Λ(a, b) := 2(λ M + ΛL ) + ΛC (a) + ΛC (b).
(9.1.7)
In order to put the simulation results and the data on the same footing, we relax the assumption of constant order sizes; we draw the order volumes from lognormal distributions. The parameters of the model
are estimated from tick by tick data as detailed in Appendix A.1. For concreteness1 , we use the parameters of the stock SCHN.PA (Schneider Electric) in March 2011 for the plots. They are summarized in
tables 9.1 and 9.2.
Figure 9.1 represents the average depth profile, that is, the average number of outstanding shares at
a distance of i ticks from the best opposite price. The agreement between the simulation and the data
is fairly good (See panel (a) of figure 9.10 for a cross-sectional view on CAC 40 stocks.) We also plot
the distribution of the spread in figure 9.2. Note that the simulated distribution is tighter than the actual
one (this is systematic and is documented in panel (b) of figure 9.10.) Figure 9.3 shows the fast decay of
the autocorrelation function of the price increments. Note the high negative autocorrelation of simulated
trade prices relatively to the data. In accordance with the theoretical analysis, figures 9.4–9.6 illustrate
the asymptotic normality of price increments.
The signature plot of the price time series is defined as the variance of price increments at lag h
normalized by the lag, that is
σ2h :=
V [P(t + h) − P(t)]
.
h
(9.1.8)
This function measures the variance of price increments per time unit. It is interesting in that it shows the
transition from the variance at small time scales where micro-structure effects dominate, to the long-term
variance. By theorem ??2
lim σ2h = σ2 , for some fixed value σ.
h→∞
1
2
The results are qualitatively the same for all CAC 40 stocks.
Strictly spreaking, we proved the result in event-time.
96
(9.1.9)
Algorithm 1 Order book simulation.
Require: Model parameters— Arrival rates: λ M , {λiL }i∈{1,...K} , {λCi }i∈{1,...K} , order book size: K, reservoirs: a∞ , b∞ , volume distribution parameters: (v M , s M ), (vL , sL ), (vC , sC ).
Simulation Parameters— Number of time steps: N.
Initialization— t ← 0, X(0) ← Xinit .
1: for time step n = 1, . . . , N, do
2:
Compute the best bid pB and best ask pA .
K
X
3:
Compute ΛC (b) =
λCi |bi |, i.e. the weighted sum of shares at price levels from pA − K to pA − 1.
i=1
K
X
4:
Compute ΛC (a) =
λCi ai .
5:
Draw the waiting time τ for the next event from an exponential distribution with parameter
i=1
Λ(a, b) = 2(λ M + ΛL ) + ΛC (a) + ΛC (b).
6:
7:
8:
9:
10:
11:
Draw a new event according to the probability vector
λ M , λ M , ΛL , ΛL , ΛC (a), ΛC (b) /Λ(a, b).
These probabilities correspond respectively to a buy market order, a sell market order, a buy limit
order, a sell limit order, a cancellation of an existing sell order and a cancellation of an existing
buy order.
Depending on the event type, draw the order volume from a lognormal distribution with parameters (v M , s M ), (vL , sL ) or (vC , sC ).
If the selected event is a limit order, select the relative price level from {1, 2, . . . , K} according to
the probability vector
λ1L , . . . , λLK /ΛL .
If the selected event is a cancellation, select the relative price level at which to cancel an order
from {1, 2, . . . , K} according to the probability vector
λC1 a1 , . . . , λCK aK /ΛC (a).
(or λC (b)/ΛC (b) for the bid side.)
Update the order book state according to the selected event.
Enforce the boundary conditions:
ai = a∞ , i ≥ K + 1,
bi = b∞ , i ≥ K + 1.
Increment the event time n by 1 and the physical time t by τ.
13: end for
Remark 9.1.1 For the practical implementation, it is easier to work with an “absolute” price frame
∆p × {1 . . . L} where L K.
12:
97
We verify this numerically in figure 9.7. Two remarks are in order regarding the signature plot:
Long-term variance— The simulated long-term variance is systematically lower than the variance
computed from the data (This is documented in panel (c) of figure 9.10.) Intuitively, the depth of the
order book is expected to increase from the best price towards the center of the book. In the absence
of autocorrelation in trade signs, this would cause prices to wander less often far away from the current
best as they hit a higher “resistance”. We also suspect that actual prices exhibit locally more “drifting
phases” than in a (symmetric) Markovian model where the expected price drift is null at all times. An
interesting analysis of a simple order book model that allows time-varying arrival rates can be found in
[75].
Short-term variance— The signature plot predicted by the model is too high at short time scales
relative to the asymptotic variance, especially for traded prices. This is classically known to be due to
bid-ask bounce. It is however remarkable that the signature plot of actual trade prices looks much flatter
compared to the simulation (See figure 9.7.) This was discovered and discussed in detail by Bouchaud
et al. in [48], and Lillo and Farmer in [225] (See also [135] and [47].) They note that actual order signs
exhibit positive long-ranged correlations. They also note that actual prices are diffusive—the signature
plot is flat—even at small time scales. They solve this apparent paradox by showing that diffusivity
results from two opposite effects: autocorrelation in trade signs induces persistence in the prices, just
at the exact amount to counterbalance the mean reversion induced by the liquidity stored in the order
book. This subtle equilibrium between liquidity takers and liquidity providers which guarantees price
diffusivity at short lags, is not accounted for by the bare Markovian order book model we study, and
one can speak about anomalous diffusion at short time scales for Markovian order book models [303].
Because of the absence of positive autocorrelation in trade signs in the model, this effect is magnified
when one looks at trades. The next paragraph elaborates on this point.
Anomalous diffusion at short time scales. A qualitative understanding of the discrepancy between
the model and the data signature plots at short time scales can be gleaned with the following heuristic
argument. In what follows, we reason in trade time t. Denote by PT r (t) the price of the trade at time t,
and α(t) its sign:
α(t) = 1, for a buyer initiated trade, i.e. a buy market order,
(9.1.10)
α(t) = −1, for a seller initiated trade, i.e. a sell market order.
(9.1.11)
and,
We assume that the two signs are equally probable (symmetric model). But to make the argument valid
for both the model (for which successive trade signs are independent) and the data (for which trade signs
exhibit long memory) we do not assume independence of successive trade signs. Let also for a quantity
Z
∆Z(t) := Z(t + 1) − Z(t).
(9.1.12)
1
PT r (t) = P(t− ) + α(t)S (t− ),
2
(9.1.13)
We have by definition
where P(t− ) and S (t− ) are respectively the prevailing mid-price and spread just before the trade. From
98
2000
Model
Data
1500
Quantity (shares)
1000
500
0
−500
−1000
−1500
−2000
−30
−20
−10
0
10
Distance from best opposite quote (ticks)
20
30
Figure 9.1: Average depth profile. Simulation parameters are summarized in tables 9.1 and 9.2.
99
1
Model
Data
0.9
0.8
Frequency
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
1
2
3
4
5
6
Spread (ticks)
7
8
9
10
Figure 9.2: Probability distribution of the spread. Note that the model (dark gray) predicts a tighter
spread than the data.
100
1
Autocorrelation of price increments
Trade price (Model)
Mid−price (Model)
Trade price (Data)
0.5
0
−0.5
0
2
4
6
8
10
12
Lag (trade time)
14
16
18
20
Figure 9.3: Autocorrelation of price increments. This figure shows the fast decay of the autocorrelation
function, and the large negative autocorrelation of trades at the first lag.
101
70
60
50
P (t) - P (0) (ticks)
40
30
20
10
0
−10
−20
−30
−40
0
5
10
15
20
25
Time (hours)
30
35
40
Figure 9.4: Price sample path. At large time scales, the price process is a Brownian motion.
102
Quantiles of Midprice Increments
h=1s
h = 30 s
5
5
0
0
−5
−5
−5
0
5
Standard Normal Quantiles
−5
h = 120 s
5
0
0
−5
−5
0
5
h = 300 s
5
−5
0
5
−5
0
5
Figure 9.5: Q-Q plot of mid-price increments. h is the time lag in seconds. This figure illustrates the
aggregational normality of price increments.
103
0.25
0.2
Frequency
0.15
0.1
0.05
0
−8
−6
−4
−2
0
2
P(t + h) - P(t) (ticks)
4
6
Figure 9.6: Probability distribution of price increments. Time lag h = 1000 events.
104
8
Trades (Sim.)
Mid−price (Sim.)
Trades (Data)
Mid−price (Data)
1.2
1
0.25
0.8
σh2
0.2
0.15
0.6
0.1
0.05
0.4
0
20
40
60
140
160
80
100
0.2
0
20
40
60
80
100
120
Time lag h (trade time)
180
200
(a) Trade time.
0.25
Trades (Sim.)
Mid−price (Sim.)
Trades (Data)
Mid−price (Data)
0.2
0.06
0.05
0.15
σh2
0.04
0.03
0.02
0.1
0.01
0
0
100
200
300
0.05
0
0
100
200
300
400
500
600
Time lag h (seconds)
700
800
900
1000
(b) Calendar time.
Figure 9.7: Signature plot: σ2h := V [P(t + h) − P(t)]/h. y axis unit is tick2 per trade for panel (a) and
tick2 .second−1 for panel (b). We used a 1,000,000 event simulation run for the model signature plots.
Data signature plots are computed separately for each trading day [9 : 30–14 : 00] then averaged across
23 days. For calendar time signature plots, prices are sampled every second using the last tick rule. The
105
inset is a zoom-in.
equation (9.1.13)
σT1 r
2
:= V[∆PT r (t)]
2 = E ∆PT r (t)
h
2 i
= E ∆P(t− )
+ E ∆P(t− )∆(α(t)S (t− ))
2 i
1 h
+
E ∆(α(t)S (t− )) .
4
(9.1.14)
The first term in the r.h.s. is the variance of mid-price increments σ21 . The second term represents the
covariance of mid-price increments and the trade sign (weighted by the spread) and we assume it is
negligible3 . Let us focus on the third term:
∆(α(t)S (t− )) = α(t + 1)∆S (t− ) + S (t− )∆α(t).
(9.1.15)
h
h
i h
i
2 i
E ∆(α(t)S (t− ))
= E (∆α(t))2 E S (t− )2
+ 2E α(t + 1)∆S (t− )S (t− )∆α(t)
h
i h
i
+ E α(t + 1)2 E (∆S (t− ))2 .
(9.1.16)
Then
Again, we neglect the cross term4 in the r.h.s. and we are left with
h
h
i h
i
2 i
E ∆(α(t)S (t− ))
≈ E (∆α(t))2 E S (t− )2
h
i
+ E (∆S (t− ))2 .
(9.1.17)
But
h
i
h
i
h
i
E (∆α(t))2 = E α(t + 1)2 + E α(t)2 − 2E [α(t)α(t + 1)]
= 2 (1 − ρ1 (α)) ,
(9.1.18)
where ρ1 (α) is the autocorrelation of trade signs at the first lag.
Finally5 :
2
σT1 r ≈ σ1 2 +
h
i 1 h
i
1
(1 − ρ1 (α)) E S (t− )2 + E (∆S (t− ))2 .
2
4
(9.1.20)
Two effects are clear from equation (9.1.20):
1. The trade price variance at short time scales is larger than the mid-price variance,
3
This amounts to neglecting the correlation between trade signs and mid-quote movements, which is justified by the dominance of cancellations and limit orders in comparison to market orders in order book data.
4
This time, we are neglecting the correlation between trade signs and spread movements.
5
More generally, after n trades:
h
i
1
2
(1 − ρn (α)) E S (t− )2 .
σTn r ≈ σn 2 +
(9.1.19)
2n
106
2. Autocorrelation in trade signs dampens this discrepancy. This partially explains6 why the trades
signature plot obtained from the data is flatter than the model predictions: ρ1 (α)model = 0, while
ρ1 (α)data ≈ 0.6.
From a modeling perspective, a possible solution to recover the diffusivity even at very short time
scales, is to incorporate long-ranged correlation in the order flow. Toth et al. [312] have investigated
numerically this route using a “-intelligence” order book model. In this model, market orders signs are
long-ranged correlated, that is, in trade time
ρn (α) = E [α(t + n)α(t)] ∝ n−γ ,
γ ∈]0, 1[.
(9.1.21)
And the size of incoming market orders is a fraction f of the volume displayed at the best opposite
quote, with f drawn from the distribution
Pξ ( f ) = ξ(1 − f )ξ−1 ,
(9.1.22)
They show that, by fine tuning the additional parameters γ and ξ, one can ensure the diffusive behavior
of the price both at mesoscopic (≈ a few trades) and macroscopic (≈ hundred trades) time scales7 .
Parameters estimation
For reproducibility, we summarize in tables 9.1 and 9.2 the parameters used to obtain figures 9.1–9.7.
These correspond to estimating the model for the stock SCHN.PA (Schneider Electric). Our dataset
consists of TRTH8 data for the CAC 40 index constituents in March 2011 (23 trading days). We have
tick by tick order book data up to 10 price levels, and trades. In order to avoid the diurnal seasonality
in trading activity (and the impact of the US market open on European stocks), we somehow arbitrarily
restrict our attention to the time window [9 : 30–14 : 00] CET.
If T be the trading duration of interest each day (T = 4.5 hours—[9 : 30–14 : 00]—in our case.) Then
#trades
M :=
λc
,
2T
(9.1.23)
and
1
λbiL :=
.
2T
(#buy limit orders arriving i tick away from the best opposite quote
+ #sell lim. orders etc.) . (9.1.24)
6
although the arguments that led to (9.1.20) are rather qualitative, a back of the envelope calculation with
h Interestingly,
i
2
E S 2 ∈ [1, 9], gives a difference σT r − σ2 in the range [0.5, 4.5]; which has the same order of magnitude of the values
obtained by simulation.
7
Note that Toth. el al. [312] model the “latent order book”, not the actual observable order book. The former represents the
intended volume at each price level p, that is, the volume that would be revealed should the price come close to p. So that the
interpretation of their parameters, in particular the expected lifetime τlife of an order, does not strictly match ours.
8
Thomson Reuters Tick History.
107
K
a∞
b∞
M
(v , s M )
(vL , sL )
(vC , sC )
λM
30
250
250
(4.00, 1.19)
(4.47, 0.83)
(4.48, 0.82)
±
0.1237
Table 9.1: Model parameters for the stock SCHN.PA (Schneider Electric) in March 2011 (23 trading
days). Figures 9.8 and 9.9 are graphical representation of these parameters.
For cancellations, we need to normalize the count by the average number of shares hXi i at distance i
from the best opposite quote:
1
c
C := 1
λ
.
i
hXi i 2T
(#cancellation orders in the bid side arriving i tick away from the best opposite quote
+ #cancellation orders in the ask side etc.) , (9.1.25)
bL and λbL across 23 trading days to get the final estimates. As for the volumes,
M, λ
We then average λc
i
i
we estimate by maximum likelihood the parameters (b
v, b
s) of a lognormal distribution separately for each
order type. We depict the parameters in figures 9.8 and 9.9.
9.1.1
Results for CAC 40 stocks
In order to get a cross-sectional view of the performance of the model on all CAC 40 stocks, we estimate
the parameters separately for each stock and run a 100, 000 event simulation for each parameter set. We
then compare in figure 9.10 the average depth, average spread and the long-term “volatility” measured
directly from the data, to those obtained from the simulations. Dashed line is the identity function—It
would correspond to a perfect match between model predictions and the data. Solid line is a linear
regression zdata = b1 + b2 zmodel for each quantity of interest z.
Note that despite the good agreement between the average depth profiles (panel (a)), and although
the model successfully predicts the relative magnitudes of the long-term variance σ2∞ and the average
spread hS i for different stocks, it tends to systematically underestimate σ2∞ and hS i. This may be related
to the absence of autocorrelation in order signs in the model and the presence of more drifting phases in
actual prices than in those obtained by simulation.
108
i (ticks)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
hXi i (shares)
276
1129
1896
1924
1951
1966
1873
1786
1752
1691
1558
1435
1338
1238
15
16
17
18
19
20
21
22
23
24
25
26
27
28
1122
1036
943
850
796
716
667
621
560
490
443
400
357
317
29
30
285
249
±
±
λiL
0.2842
0.5255
0.2971
0.2307
0.0826
0.0682
0.0631
0.0481
0.0462
0.0321
0.0178
0.0015
0.0001
0.0
..
.
103 .λCi
0.8636
0.4635
0.1487
0.1096
0.0402
0.0341
0.0311
0.0237
0.0233
0.0178
0.0127
0.0012
0.0001
0.0
..
.
..
.
0.0
..
.
0.0
Table 9.2: Model parameters for the stock SCHN.PA (Schneider Electric) in March 2011 (23 trading
days). Figures 9.8 and 9.9 are graphical representation of these parameters.
Log hAi (5)
Log hS i
σ∞
b1
−0.42 (±0.11)
0.20 (±0.06)
−0.012 (±0.05)
b2
1.13 (±0.04)
1.16 (±0.07)
1.35 (±0.11)
R2
0.99
0.97
0.94
Table 9.3: CAC 40 stocks regression results.
109
(a)
2500
hX i i (shares)
2000
1500
1000
500
0
5
10
15
20
25
30
20
25
30
20
25
30
(b)
0.8
λL
i
0.6
0.4
0.2
0
5
10
15
(c)
−3
x 10
1.5
λC
i
1
0.5
0
5
10
15
i (ticks)
110
Figure 9.8: Model parameters: arrival rates and average depth profile (parameters as in table 9.2). Error
bars indicate variability across different trading days.
(a)
−2
10
Pdf
−4
10
−6
10
−8
10
1
10
10
2
10
3
10
4
5
10
(b)
−2
10
Pdf
−4
10
−6
10
−8
10
1
10
10
2
10
3
10
4
5
10
(c)
−2
10
Pdf
−4
10
−6
10
−8
10
1
10
10
2
3
10
Volume (shares)
10
4
5
10
111
Figure 9.9: Model parameters: volume distribution. Panels (a), (b) and (c) correspond respectively
to market, limit and cancellation orders volumes. Dashed lines are lognormal fits (parameters as in
table 9.1).
(a)
FTE
LVMH
SCHN
AXAF
CAGR
UNBP
Log hAi (5) (data)
4
PRTP VIV
ISPA
GSZ
BNPP
SASY TOTF
ALUA
SEVI
EDF
VIE
DANO
CARR
EAD
ESSI
PERP
AIRP
OREP
BOUY
SGEF
SGOB
MICP
TECF
VLLP
SOGN
ALSO
PUBP
ACCP
SAF
RENA
PEUP
CAPP
LAFP
3
2
1
STM
0.5
1
1.5
2
2.5
3
Log hAi (5) (model)
3.5
4
4.5
(b)
Log hSi (data)
2
STM
RENA
LAFP
CAPP
PEUP
SOGN
ACCP
PUBP
ALSO
SAF
TECF
VLLP
MICP
ESSI
SGOB
OREP
AIRP
BOUY
PERP
EAD
SGEF
CARR
DANO
SEVI
PRTP
EDFALUA
VIE
BNPP
SASY
TOTF
ISPA
GSZ
VIV
UNBP
AXAF
CAGR
SCHN
LVMH
FTE
1.5
1
0.5
0.2
0.4
0.6
0.8
1
1.2 1.4
Log hSi (model)
1.6
1.8
2
(c)
SOGN
σ∞ (data)
1.2
RENA
STM
1
LAFP
PEUPSGOB
ALSO
CAPP
SGEF
AIRP
MICP
VLLP
ACCP
ALUA
CARR
OREP
PUBP TECF
BNPP
DANO
SASY
TOTF
BOUY
SAF
ESSI EDF
ISPA
GSZ
PERP
PRTP
EAD
VIE
AXAF
CAGR
VIV
SEVI
UNBP
LVMH
SCHN
FTE
0.8
0.6
0.4
0.2
0.2
0.4
0.6
0.8
σ∞ (model)
1
1.2
112
Figure 9.10: A cross-sectional comparison of liquidity and price diffusion characteristics between the
model and data for CAC 40 stocks (March 2011).
9.2
Simulation of a limit order book modelled by Hawkes processes
The basic order book simulator is now enhanced with arrival times of limit and market orders following mutually (asymmetrically) exciting Hawkes processes, as per the the model described in Chapter 8.
Modelling is based on empirical observations, verified on several markets (equities, futures on index,
futures on bonds), and detailed in section 4.2 of Chapter 4. More specifically, we observe evidence of
some sort of “market making” in the studied order books: after a market order, a limit order is likely to
be submitted more quickly than it would have been without the market order. In other words, there is a
clear reaction that seems to happen: once liquidity has been removed from the order book, a limit order
is triggered to replace it. We also show that the reciprocal effect is not observed on the studied markets.
These features lead to the use of asymmetric Hawkes processes for the design of a limit order book
simulator . We show in Section 9.2.2 that this simple feature enables a much more realistic treatment of
the bid-ask spread of the simulated order book.
9.2.1
Adding dependence between order flows
Section 4.2 of Chapter 4 shows that the flow of limit orders strongly depends on the flow of market order.
We thus propose that in our experiment, the flow of limit and market orders are modelled by Hawkes
processes N L and N M , with stochastic intensities respectively λ and µ defined as:

Z t


M
M


µ (t) = µ0 +
α MM e−βMM (t−s) dN sM



Z0 t
Z t




L
L
−βLM (t−s)
M

αLM e
dN s
+
αLL e−βLL (t−s) dN sL

 λ (t) = λ0 +
0
(9.2.1)
0
Three mechanisms can be used here. The first two are self-exciting ones, “MM” and “LL”. They are a
way to translate into the model the observed clustering of arrival of market and limit orders and the broad
distributions of their durations. In the empirical study [215], it is found that the measured excitation MM
is important. In our simulated model, we will show (see 9.2.2) that this allows a simulator to provide
realistic distributions of the durations of trades.
The third mechanism, “LM”, is the direct translation of the empirical property we have presented
in section ??. When a market order is submitted, the intensity of the limit order process N L increases,
enforcing the probability that a “market making” behaviour will be the next event. We do no implement
the reciprocal mutual excitation “ML”, since we do not observe that kind of influence on our data, as
explained in section ??.
Rest of the model is unchanged. Turning these features successively on and off gives us several models to test – namely HP (Homogeneous Poisson processes), LM, MM, MM+LM, MM+LL,
MM+LL+LM – to try to understand the influence of each effect.
113
9.2.2
Numerical results on the order book
Fitting and simulation
We fit these processes by computing the maximum likelihood estimators of the parameters of the different models on our data. As expected, estimated values varies with the market activity on the day of the
sample. However, it appears that estimation of the parameters of stochastic intensity for the MM and LM
effect are quite robust. We find an average relaxation parameter βˆ MM = 6, i.e. roughly 170 milliseconds
as a characteristic time for the MM effect, and βˆ LM = 1.8, i.e. roughly 550 milliseconds characteristic time for the LM effect. Estimation of models including the LL effect are much more troublesome
on our data. In the simulations that follows, we assume that the self-exciting parameters are similar
(α MM = αLL , β MM = βLL ) and ensure that the number of market orders and limit orders in the different
simulations is roughly equivalent (i.e. approximately 145000 limit orders and 19000 market orders for
24 hours of continuous trading). Table 9.4 summarizes the numerical values used for simulation. Fitted
parameters are in agreement with an assumption of asymptotic stationarity.
Model
µ0
α MM β MM
λ0
αLM βLM αLL
HP
0.22
1.69
LM
0.22
0.79 5.8
1.8
MM
0.09 1.7
6.0 1.69
MM LL
0.09 1.7
6.0 0.60
1.7
MM LM
0.12 1.7
6.0 0.82 5.8
1.8
MM LL LM 0.12 1.7
5.8 0.02 5.8
1.8 1.7
Common parameters: m1P = 2.7, ν1P = 2.0, s1P = 0.9
V = 275, mV = 380
2
1
λC = 1.35, δ = 0.015
βLL
6.0
6.0
Table 9.4: Estimated values of parameters used for simulations.
We compute long runs of simulations with our enhanced model, simulating each time 24 hours of
continuous trading. Note that using the chosen parameters, we never face the case of an empty order
book. We observe several statistics on the results, which we discuss in the following sections.
Impact on arrival times
We can easily check that introducing self- and mutually exciting processes into the order book simulator
helps producing more realistic arrival times. Figure 9.11 shows the distributions of the durations of
market orders (left) and limit orders (right). As expected, we check that the Poisson assumption has to
be discarded, while the Hawkes processes help getting more weight for very short time intervals.
We also verify that models with only self-exciting processes MM and LL are not able to reproduce
the “liquidity replenishment” feature described in section ??. Distribution of time intervals between a
market order and the next limit order are plotted on figure 9.12. As expected, no peak for short times
is observed if the LM effect is not in the model. But when the LM effect is included, the simulated
distribution of time intervals between a market order and the following limit order is very close to the
114
0.25
0.5
Empirical BNPP
Homogeneous Poisson
Hawkes MM
0.2
Empirical BNPP
Homogeneous Poisson
Hawkes LL
0.4
100
100
10-1
-1
10
10-2
10-2
0.15
0.3
10-3
10-3
10-4
10-4
10-5
2 4 6 8 10 12 14 16 18 20
0.1
0.2
0.05
2 4 6 8 10 12 14 16 18 20
0.1
0
0
0
0.5
1
1.5
2
0
0.5
1
1.5
Figure 9.11: Empirical density function of the distribution of the durations of market orders (left) and
limit orders (right) for three simulations, namely HP, MM, LL, compared to empirical measures. In
inset, same data using a semi-log scale.
empirical one.
0.7
Empirical BNPP
Homogeneous Poisson
Hawkes MM
Hawkes MM LL
Hawkes MM LL LM
0.6
0.5
0
10
10-1
0.4
10-2
10
0.3
-3
10-4
1 2 3 4 5 6 7 8 9 10
0.2
0.1
0
0
0.2
0.4
0.6
0.8
1
1.2
Figure 9.12: Empirical density function of the distribution of the time intervals between a market order
and the following limit order for three simulations, namely HP, MM+LL, MM+LL+LM, compared to
empirical measures. In inset, same data using a semi-log scale.
Impact on the bid-ask spread
Besides a better simulation of the arrival times of orders, we argue that the LM effect also helps simulating a more realistic behaviour of the bid-ask spread of the order book. On figure 9.13, we compare the
distributions of the spread for three models – HP, MM, MM+LM – in regard to the empirical measures.
We first observe that the model with homogeneous Poisson processes produces a fairly good shape for
115
2
0.3
Empirical BNPP.PA
Homogeneous Poisson
Hawkes MM
Hawkes MM+LM
0.25
100
10-1
0.2
10-2
10-3
0.15
10-4
10-5
0.1
10-6
0
0.05
0.1
0.15
0.2
0.05
0
0
0.05
0.1
0.15
0.2
0.25
0.3
Figure 9.13: Empirical density function of the distribution of the bid-ask spread for three simulations,
namely HP, MM, MM+LM, compared to empirical measures. In inset, same data using a semi-log scale.
X-axis is scaled in euro (1 tick is 0.01 euro).
the spread distribution, but slightly shifted to the right. Small spread values are largely underestimated.
When adding the MM effect in order to get a better grasp at market orders’ arrival times, it appears
that we flatten the spread distribution. One interpretation could be that when the process N M is excited,
markets orders tend to arrive in cluster and to hit the first limits of the order book, widening the spread
and thus giving more weight to large spread values. But since the number of orders is roughly constant
in our simulations, there has to be periods of lesser market activity where limit orders reduce the spread.
Hence a flatter distribution.
Here, the MM+LM model produces a spread distribution much closer to the empirical shape. It
appears from figure 9.13 that the LM effect reduces the spread: the “market making” behaviour, i.e.
limit orders triggered by market orders, helps giving less weight to larger spread values (see the tail of
the distribution) and to sharpen the peak of the distribution for small spread values. Thus, it seems that
simulations confirm the empirical properties of a “market making” behaviour on electronic order books.
We show on figure 9.14 that the same effect is observed in an even clearer way with the MM+LL
and MM+LL+LM models. Actually, the spread distribution produced by the MM+LL model is the
flattest one. This is in line with our previous argument. When using two independent self exciting
Hawkes processes for arrival of orders, periods of high market orders’ intensity gives more weight to
large spread values, while periods of high limit orders’ intensity gives more weight to small spread
values. Adding the cross-term LM to the processes implements a coupling effect that helps reproducing
the empirical shape of the spread distribution. The MM+LL+LM simulated spread is the closest to the
empirical one.
116
0.3
Empirical BNPP.PA
Homogeneous Poisson
Hawkes MM+LL
Hawkes MM+LL+LM
0.25
100
10-1
0.2
10-2
10-3
0.15
10-4
10-5
0.1
10-6
0
0.05
0.1
0.15
0.2
0.05
0
0
0.05
0.1
0.15
0.2
Figure 9.14: Empirical density function of the distribution of the bid-ask spread three simulations,
namely HP, MM, MM+LM, compared to empirical measures. In inset, same data using a semi-log
scale. X-axis is scaled in euro (1 tick is 0.01 euro).
A remark on price returns in the model
It is somewhat remarkable to observe that these variations of the spread distributions are obtained with
little or no change in the distributions of the variations of the mid-price. As shown on figure 9.15, the
distributions of the variations of the mid-price sampled every 30 seconds are nearly identical for all the
simulated models (and much tighter than the empirical one). This is due to the fact that the simulated
order books are much more furnished than the empirical one, hence the smaller standard deviation
of the mid price variations. One solution to get thinner order books and hence more realistic values
of the variations of the mid-price would be to increase our exogenous parameter δ. But in that case,
mechanisms for the replenishment of an empty order book should be carefully studied, which is still to
be done.
We have shown the the use of Hawkes processes may help producing a realistic shape of the spread
distribution in an agent-based order book simulator. We emphasize on the role of the excitation of the
limit order process by the market order process. This coupling of the processes, similar to a “market
making” behaviour, is empirically observed on several markets, and simulations confirms it is a key
component for realistic order book models.
Future work should investigate if other processes or other kernels (νLM in our notation) might better
fit the observed orders flows. In particular, we observe very short characteristic times, which should lead
us to question the use of the exponential decay. Furthermore, as pointed out in the paper, many other
mechanisms are to be investigated: excitation of markets orders, link with volumes, replenishment of an
empty order book, etc.
117
1
Homogeneous Poisson
Hawkes MM
Hawkes MM LM
Hawkes MM LL
Hawkes MM LL MM
0.1
0.01
0.001
0.0001
-0.04
-0.02
0
0.02
0.04
0.06
Figure 9.15: Empirical density function of the distribution of the 30-second variations of the mid-price
for five simulations, namely HP, MM, MM+LM, MM+LL, MM+LL+LM, using a semi-log scale. Xaxis is scaled in euro (1 tick is 0.01 euro).
118
Chapter 10
A generic limit order book simulator
Marketsim is a library for market microstructure simulation. It allows to instantiate a number of traders
behaving according to various strategies and simulate their trade over number of assets. For the moment
information gathered during simulation is reported in form of graphs.
This library is a result of rewriting Marketsimulator library, originally developed in the Chair of
Quantitative finance at École Centrale Paris, into Python. Python language was chosen as basic language
for development due to its expressiveness, interactivity and availability of free high quality libraries for
scientific computing.
The library source can be found at http://marketsimulator.svn.sourceforge.net/viewvc/
marketsimulator/DevAnton/v3/. It requires Python >= 2.6 installed. For graph plotting free Veusz
plotting software is needed (http://home.gna.org/veusz/downloads/). After installation please
specify path to the Veusz executable (including executable filename) in VEUSZ_EXE environment variable. ArbitrageTrader needs free blist package to be installed: http://pypi.python.org/
pypi/blist/.
10.1
Discrete event simulation components
Main class for every discrete event simulation system is a scheduler that maintains a set of actions
to fulfill in future and launches them according their action times: from older ones to newer. Normally
schedulers are implemented as some kind of a heap data structure in order to perform frequent operations
very fast. Classical heap allows inserts into at O(logN), extracting the best element at O(logN) and
accessing to the best element at O(1).
10.1.1
Scheduler
Scheduler class provides following interface:
class Scheduler(object):
def reset(self)
def currentTime(self)
def schedule(self, actionTime, handler)
119
def scheduleAfter(self, dt, handler)
def workTill(self, limitTime)
def advance(self, dt)
In order to schedule an event a user should use schedule or scheduleAfter methods passing there
an event handler which should be given as a callable object (a function, a method, lambda expression or
a object exposing __call__ method). These functions return a callable object which can be called in
order to cancel the scheduled event.
Methods workTill and advance advance model time calling event handlers in order of their action
times. If two events have same action time it is garanteed that the event scheduled first will be executed
first. These methods must not be called from an event handler. In such a case an exception will be
issued.
Since a scheduler is supposed to be single for non-parallel simulations, for convenient use from
other parts of the library marketsim.scheduler.world static class is introduced which grants access
to currentTime, schedule and scheduleAfter methods of the unique scheduler. This scheduler
should be initialized by creating an instance of Scheduler class in client code (scheduler.create
method) and it gives right to manage simulation by calling workTill and advance methods.
10.1.2
Timer
It is a convenience class designed to represent some repeating action. It has following interface:
class Timer(object):
def __init__(self, intervalFunc):
self.on_timer = Event()
...
def advise(self, listener):
self.on_timer += listener
It is initialized by a function defining interval of time to the next event invocation. Event handler to
be invoked should be passed to advise method (there can be multiple listeners).
For example, sample path a Poisson process with λ=1 can be obtained in the following way:
import random
from marketsim.scheduler import Timer, world
def F(timer):
print world.currentTime
Timer(lambda: random.expovariate(1.)).advise(F)
world.advance(20)
will print
120
0.313908407622
0.795173273046
1.50151801647
3.52280681834
6.30719707516
8.48277712333
Note that Timer is designed to be an event source and for previous example there is a more convenient shortcut:
world.process(lambda: random.expovariate(1.), F)
10.2
Orders
Traders send orders to a market. There are two basic kinds of orders:
• Market orders that ask to buy or sell some asset at any price.
• Limit orders that ask to buy or sell some asset at price better than some limit price. If a limit order
is not competely fulfilled it remains in an order book waiting to be matched with another order.
An order book processes market and limit orders but keeps persistently only limit orders. Limit orders
can be cancelled by sending cancel orders. From trader point of view there can be other order types like
Iceberg order but from order book perspective it is just a sequence of basic orders: market or limit ones.
10.2.1
Basic order functionality
Orders to buy or sell some assets should derive from order.Base which provides basic functionality. It
has following user interface:
class OrderBase(object):
def __init__(self, volume):
self.on_matched = Event()
...
def volume(self)
def PnL(self)
def empty(self)
def cancelled(self)
def cancel(self)
side = Side.Buy | Side.Sell
It stores number of assets to trade (volume), calculates P&L for this order (positive if it is a sell
order and negative if it is a buy order) and keeps a cancellation flag. An order is considered as empty if
121
its volume is equal to 0. When order is matched (partially or completely) on_matched event listeners
are invoked with information what orders are matched, at what price and what volume with.
Usually the library provides a generic class for an order that accepts in its constructor side of the
order. Also, that class provides Buy and Sell static methods that create orders with fixed side. For
convenience, a method to call the constructor in curried form side->*args->Order is also provided
(T)
# from marketsim.order.Market class definition
@staticmethod
def Buy(volume): return Market(Side.Buy, volume)
@staticmethod
def Sell(volume): return Market(Side.Sell, volume)
@staticmethod
def T(side): return lambda volume: Market(side, volume)
10.2.2
Market orders
Market order (order.Market class) derives from order.Base and simply add some logic how it should
be processed in an order book.
10.2.3
Limit orders
Limit order (order.Limit class) stores also its limit price and has a bit more complicated logic of
processing in an order book.
10.2.4
Iceberg orders
Iceberg orders (order.Iceberg class) are virtual orders (so they are composed as a sequence of basic
orders) and are never stored in order books. Once an iceberg order is sent to an order book, it creates an
order of underlying type with volume constrained by some limit and sends it to the order book instead
of itself. Once the underlying order is matched completely, the iceberg order resends another order to
the order book till all iceberg order volume will be traded.
Iceberg orders are created by the following function in curried form:
def iceberg(volumeLimit, orderFactory):
def inner(*args):
return Iceberg(volumeLimit, orderFactory, *args)
return inner
where volume is a maximal volume of an order that can be issued by the iceberg order and orderFactory
is a factory to create underlying orders like order.Market.Sell or order.Limit.Buy and *args are
parameters to be passed to order’s initializer.
122
10.2.5
Cancel orders
Cancel orders (order.Cancel class) are aimed to cancel already issued limit (and possibly iceberg)
orders. If a user wants to cancel anOrder she should send order.Cancel(anOrder) to the same order
book. It will notify the order book event listeners if the best order in the book has changed. Also,
on_order_cancelled event is fired.
10.2.6
Limit market orders
Limit market order (order.LimitMarket class) is a virtual order which is composed of a limit order
and a cancel order sent immediately after the limit one thus combining behavior of market and limit
orders: the trade is done at price better than given one (limit order behavior) but in case of failure the
limit order is not stored in the order book (market order behavior). This class also has optional delay
parameter which, when given, instructs to store the limit order for this delay (Should it be extracted into
a separate Cancellable virtual order which should be parametrized by an order to cancel – it might be a
limit order, an iceberg or an always best order???).
10.2.7
Always best orders
Always best order (order.AlwaysBest class) is a virtual order that ensures that it has the best price
in the order book. It is implemented as a limit order which is cancelled once the best price in the order
queue has changed and is sent again to the order book with a price one tick better than the best price in
the book. If several always best orders are sent to one side of an order book, a price race will start thus
leading to their elimination by orders of the other side.
10.2.8
Virtual market orders
Virtual market orders (order.VirtualMarket class) are utilitary orders that are used to evaluate a
P&L of a market order without submitting it. These orders don’t make any influence onto the market
and end with call to orderBook.evaluateOrderPriceAsync method. When the P&L is evaluated,
on_matched event is fired with parameters (self, None, PnL, volume_matched)
10.3
Order book and order queue
Order book represents a single asset traded in some market. Same asset traded in different markets
would have been represented by different order books. An order book stores unfulfilled limit orders sent
for this asset in two order queues, one for each trade sides (Asks for sell orders and Bids for buy orders).
Order queues are organized in a way to extract quickly the best order and to place a new order inside.
In order to achieve this a heap based implementation is used.
Order books support a notion of a tick size: all limit orders stored in the book should have prices
that are multipliers of the chosen tick size. If an order has a limit price not divisible by the tick size it is
rounded to the closest ’weaker’ tick (’floored’ for buy orders and ’ceiled’ for sell orders).
123
Market orders are processed by an order book in the following way: if there are unfulfilled limit
orders at the opposite trade side, the market order is matched against them till either it is fulfilled or
there are no more unfilled limit orders. Price for the trade is taken as the limit order limit price. Limit
orders are matched in order of their price (ascending for sell orders and descending for buy orders). If
two orders have the same price, it is garanteed that the elder order will be matched first.
Limit orders firstly processed exactly as market orders. If a limit order is not filled completely it is
stored in a corresponding order queue.
There is a notion of transaction costs: if a user wants to define functions computing transaction
fees for market, limit and cancel orders she should pass functions of form (anOrder, orderBook) >
Price to the order book constructor. If Price is negative, the trader gains some money on this transaction. If the functions are given, once an order is processed by an order book, method order.charge(price)
is called. The default implementation for the method delegates it to trader.charge(price) where
trader is a trader associated with the order.
10.3.1
Order book
For the moment, there are two kinds of order books: local and remote ones. Local order books execute
methods immediately but remote onces try to simulate some delay between a trader and a market by
means of message passing (so they are asynchronous by their nature). These books try to have the same
interface in order that traders cannot tell the difference between them.
The base class for the order books is:
class orderbook._base.BookBase(object):
def __init__(self, tickSize=1, label="")
def queue(self, side)
def tickSize(self)
def process(self, order)
def bids(self)
def asks(self)
def price(self)
def spread(self)
def evaluateOrderPriceAsync(self, side, volume, callback)
Methods bids, asks and queue give access to queues composing the order book. Methods price
and spread return middle arithmetic price and spread if they are defined (i.e. bids and asks are not
empty). Orders are handled by an order book by process method. If process method is called recursively (e.g. from a listener of on_best_changed event) its order is put into an internal queue which
is to be processed when the current order processing is finished. This ensures that at every moment of
time only one order is processed. Method evaluateOrderPriceAsync is used to compute P&L of a
market order of given side and volume without executing it. Since this operation might be expensive to
be computed locally in case of a remote order book we return the result price asynchronously by calling
function given by callback parameter.
124
10.3.2
Local order book
Local order books extend the base order book by concrete order processing implementation (methods
processLimitOrder, cancelOrder etc.) and allow user to define functions computing transaction
fees:
class orderbook.Local(BookBase):
""" Order book for a single asset in a market
Maintains two order queues for orders of different sides
"""
def __init__(self, tickSize=1, label="",
marketOrderFee = None,
limitOrderFee = None,
cancelOrderFee = None)
10.3.3
Order queue
Order queues can be accessed via queue, asks and bids methods of an order book and provide following user interface:
class orderbook._queue.Queue(object):
def __init__(self, ...):
# event to be called when the best order changes
self.on_best_changed = Event()
# event (orderQueue, cancelledOrder) to be called when an order is cancelled
self.on_order_cancelled = Event()
def book(self)
def empty(self)
def best(self)
def withPricesBetterThan(self, limit)
def volumeWithPriceBetterThan(self, limit)
def sorted(self)
def sortedPVs(self)
The best order in a queue can be obtained by calling best method provided that the queue is
not empty. Method withPricesBetterThan enumerates orders having limit price better or equal
to limit. This function is used to obtain total volume of orders that can be traded on price better or
equal to the given one: volumeWithPriceBetterThan. Property sorted enumerates orders in their
limit price order. Property sortedPVs enumerates aggregate volumes for each price in the queue in
their order.
When the best order changes in a queue (a new order arrival or the best order has matched), on_best_changed
event listeners get notified.
125
10.3.4
Remote order book
Remote order book (orderbook.Remote class) represents an order book for a remote trader. Remoteness means that there is some delay between moment when an order is sent to a market and the moment
when the order is received by the market so it models latency in telecommunication networks. A remote
book constructor accepts a reference to an actual order book (or to another remote order book) and a
reference to a two-way communication channel. Class remote.TwoWayLink implements a two-way
telecommunication channel having different latency functions in each direction (to market and from
market). It also ensures that messages are delivired to the recipient in the order they were sent. Queues
in a remote book are instances of orderbook._remote.Queue class. This class is connected to the real
order queue and listens on_best_changed events thus keeping information about the best order in the
queue up-to-date. When a remote order book receives an order, it is cloned and sent to the actual order
book via communication link. The remote order book gets subscribed to the clone order’s events via
downside link. It leads to that in some moments of time the state of the original order and its clone are
not synchronised (and this is normal).
10.4
Traders
Trader functionality can be divided into the following parts:
• Ability to send orders to the market
• Tracking number of assets traded and current P&L
• Managing number of assets and money that can be involved into trade (currently, a trader may
send as many orders as it wishes; in future, when modelling money management there should be
some limitations on how many resources it may use)
• Logic for determining moments of time when orders should to be sent and their parameters.
In our library trading strategy classes encapsulate trader behaviour (the last point) while trader
classes are responsible for resource management (first three points).
Class trader.Base provides basic functionality for all traders: P&L bookkeeping (PnL method)
and notifying listeners about two events: on_order_sent when a new order is sent to market and
on_traded when an order sent to market partially or completely fulfilled.
Class trader.SingleAsset derives from trader.Base and adds tracking of a number of assets
traded (amount property).
Class trader.SingleAssetSingleMarket derives from trader.SingleAsset and associates
with a specific order book where orders are to be sent. Class trader.SingleAssetMultipleMarkets
stores references to a collection of order books where the trader can trade.
Class observable.Efficiency can be used to measure trader "efficiency" i.e. trader’s balance if
it were "cleared" (if the trader bought missing assets or sold the excess ones by sending a market order
for -trader.amount assets).
126
10.5
Strategies
10.5.1
Strategy base classes
All strategies should derive from strategy._basic.Strategy class. This class keeps suspended
flag and stores a reference to a trader for the strategy.
class Strategy(object):
def __init__(self, trader):
self._suspended = False
self._trader = trader
@property
def suspended(self):
return self._suspended
def suspend(self, s):
self._suspended = s
@property
def trader(self):
return self._trader
Generic one side strategy
Class strategy.OneSide derives from strategy._basic.Strategy and provides generic structure
for creating strategies that trade only at one side. It is parametrized by following data:
• Trader.
• Side of the trader.
• Event source. It is an object having advise method that allows to start listening some events.
Typical examples of event sources are timer events or an event that the best price in an order queue
has changed. Of course, possible choices of event sources are not limited to these examples.
• Function to calculate parameters of an order to create. For limit orders it should return pair
(price, volume) and for market orders it should return a 1-tuple (volume,). This function
has one parameter - trader which can be requested for example for its current P&L and about
amount of the asset being traded.
• Factory to create orders that will be used as a function in curried form side -> *orderArgs ->
Order where *orderArgs is type of return value of the function calculating order parameters.
127
Typical values for this factory are marketOrderT, limitOrderT or, for exalmple, icebergT(someVolume,
limitOrderT).
The strategy wakes up in moments of time given by the event source, calculates parameters of an
order to create, creates an order and sends it to the order book.
Generic two side strategy
Class strategy.TwoSide derives from strategy._basic.Strategy and provides generic structure
for creating traders that trade on both sides. It is parametrized by following data (see also strategy.OneSide
parameters):
• Trader.
• Event source. It is an object having advise method that allows to start listening some events.
• Function to calculate parameters of an order to create. It should return either pair (side, *orderArgs)
if it wants to create an order, or None if no order should be created.
• Factory to create orders that will be used as a function in curried form side -> *orderArgs ->
Order where *orderArgs is type of return value of the function calculating order parameters.
The strategy wakes up in moments of time given by the event source, calculates parameters and side
of an order to create, if the side is defined creates an order and sends it to the order book.
10.5.2
Concrete strategies
Liquidity provider
Liquidity provider (strategy.LiquidityProviderSide class) strategy is implemented as instance of
strategy.OneSide. It has followng parameters:
• trader - strategy’s trader
• side - side of orders to create (default: Side.Sell)
• orderFactoryT - order factory function (default: order.Limit.T)
• initialValue - initial price which is taken if orderBook is empty (default: 100)
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• priceDistr - defines multipliers for current asset price when price of order to create is calculated
(default: log normal distribution with µ=0 and σ=0.1)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)
128
It wakes up in moments of time given by creationIntervalDistr, checks the last best price of
orders in the corresponding queue, takes initialValue if it is empty, multiplies it by a value taken
from priceDistr to obtain price of the order to create, calculates order volume using volumeDistr,
creates an order via orderFactoryT(side) and tells the trader to send it.
For convenience, there is a strategy.LiquidityProvider function that creates two liquidity
providers with same parameters except but of two different trading sides.
Figure 10.1: Liquidity provider sample run
Sample usage:
from marketsim.veusz_graph import Graph, showGraphs
import random
from marketsim import strategy, trader, orderbook, scheduler, observable
world = scheduler.create()
avg = observable.avg
book_A = orderbook.Local(tickSize=0.01, label="A")
price_graph = Graph("Price")
assetPrice = observable.Price(book_A)
price_graph.addTimeSeries([\
129
assetPrice,
avg(assetPrice, alpha=0.15),
avg(assetPrice, alpha=0.015),
avg(assetPrice, alpha=0.65)])
def volume(v):
return lambda: v*random.expovariate(.1)
lp_A = strategy.LiquidityProvider(\
trader.SASM(book_A, "A"), volumeDistr=volume(10)).trader
lp_a = strategy.LiquidityProvider(\
trader.SASM(book_A, "a"), volumeDistr=volume(1)).trader
spread_graph = Graph("Bid-Ask Spread")
spread_graph.addTimeSerie(observable.BidPrice(book_A))
spread_graph.addTimeSerie(observable.AskPrice(book_A))
eff_graph = Graph("efficiency")
eff_graph.addTimeSerie(observable.Efficiency(lp_a))
eff_graph.addTimeSerie(observable.PnL(lp_a))
world.workTill(500)
showGraphs("liquidity", [price_graph, spread_graph, eff_graph])
Canceller
strategy.Canceller is an agent aimed to cancel persistent (limit, iceberg etc.) orders issued by a
trader. In order to do that it subscribes to on_order_sent event of the trader, stores incoming orders in an array and in some moments of time chooses randomly an order and cancels it by sending
order.Cancel.
It has following parameters:
• trader - trader to subscribe to
• cancellationIntervalDistr - intervals of times between order cancellations (default: exponential distribution with λ=1)
• choiceFunc - function N -> idx that chooses which order should be cancelled
130
Fundamental value strategy
Fundamental value strategy (strategy.FundamentalValue class) believes that an asset should cost
some specific price (’fundamental value’) and if current asset price is lower than fundamental value it
starts to buy the asset and if the price is higher than it starts to sell the asset. It has following parameters:
• trader -strategy’s trader
• orderFactoryT - order factory function (default: order.Market.T)
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• fundamentalValue - defines fundamental value (default: constant 100)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)
Figure 10.2: Fundamental value trader sample run (fv=200)
Sample usage:
from marketsim.veusz_graph import Graph, showGraphs
from marketsim import strategy, orderbook, trader, scheduler, observable
world = scheduler.create()
book_A = orderbook.Local(tickSize=0.01, label="A")
131
price_graph = Graph("Price")
assetPrice = observable.Price(book_A)
price_graph.addTimeSerie(assetPrice)
avg = observable.avg
trend = observable.trend
price_graph.addTimeSerie(avg(assetPrice))
lp_A = strategy.LiquidityProvider(trader.SASM(book_A), volumeDistr=lambda: 70).trader
trader_200 = strategy.FundamentalValue(trader.SASM(book_A, "t200"),\
fundamentalValue=lambda: 200., volumeDistr=lambda: 12).trader
trader_150 = strategy.FundamentalValue(trader.SASM(book_A, "t150"), \
fundamentalValue=lambda: 150., volumeDistr=lambda: 1).trader
eff_graph = Graph("efficiency")
trend_graph = Graph("efficiency trend")
pnl_graph = Graph("P&L")
volume_graph = Graph("volume")
def addToGraph(traders):
for t in traders:
e = observable.Efficiency(t)
eff_graph.addTimeSerie(avg(e))
trend_graph.addTimeSerie(trend(e))
pnl_graph.addTimeSerie(observable.PnL(t))
volume_graph.addTimeSerie(observable.VolumeTraded(t))
addToGraph([trader_150, trader_200])
world.workTill(1500)
showGraphs("fv_trader", [price_graph, eff_graph, trend_graph, pnl_graph, volume_graph])
Dependent price strategy
Dependent price strategy (strategy.Dependency class) believes that the fair price of an asset A is
completely correlated with price of another asset B and the following relation should be held: PriceA =
132
kPriceB , where k is some factor. It may be considered as a variety of a fundamental value strategy with
the exception that it is invoked every the time price of another asset B has changed. It has following
parameters:
• trader -strategy’s trader
• bookToDependOn - asset that is considered as a reference one
• orderFactoryT - order factory function (default: order.Market.T)
• factor - multiplier to obtain fair asset price from the reference asset price
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)
Figure 10.3: Dependency trader sample run (k=0.5)
Sample usage:
from marketsim.veusz_graph import Graph, showGraphs
import random
from marketsim import strategy, orderbook, trader, scheduler, observable
world = scheduler.create()
book_A = orderbook.Local(tickSize=0.01, label="A")
133
book_B = orderbook.Local(tickSize=0.01, label="B")
price_graph = Graph("Price")
assetPrice_A = observable.Price(book_A)
price_graph.addTimeSerie(assetPrice_A)
assetPrice_B = observable.Price(book_B)
price_graph.addTimeSerie(assetPrice_B)
avg = observable.avg
price_graph.addTimeSerie(avg(assetPrice_A, alpha=0.15))
price_graph.addTimeSerie(avg(assetPrice_B, alpha=0.15))
liqVol = lambda: random.expovariate(.1)*5
lp_A = strategy.LiquidityProvider(trader.SASM(book_A), \
defaultValue=50., volumeDistr=liqVol).trader
lp_B = strategy.LiquidityProvider(trader.SASM(book_B), \
defaultValue=150., volumeDistr=liqVol).trader
dep_AB = strategy.Dependency(trader.SASM(book_A, "AB"), book_B, factor=2).trader
dep_BA = strategy.Dependency(trader.SASM(book_B, "BA"), book_A, factor=.5).trader
eff_graph = Graph("efficiency")
eff_graph.addTimeSerie(observable.Efficiency(dep_AB))
eff_graph.addTimeSerie(observable.Efficiency(dep_BA))
eff_graph.addTimeSerie(observable.PnL(dep_AB))
eff_graph.addTimeSerie(observable.PnL(dep_BA))
world.workTill(500)
showGraphs("dependency", [price_graph, eff_graph])
Noise strategy
Noise strategy (strategy.Noise class) is quite dummy strategy that randomly creates an order and
sends it to the order book. It has following parameters:
• trader -strategy’s trader
• orderFactoryT - order factory function (default: order.Market.T)
134
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• sideDistr - side of orders to create (default: discrete uniform distribution P(Sell)=P(Buy)=.5)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)
Signal strategy
Signal strategy (strategy.Signal class) listens to some discrete signal and when the signal becomes
more than some threshold the strategy starts to buy. When the signal gets lower than -threshold the
strategy starts to sell. It has following parameters:
• trader -strategy’s trader
• signal - signal to be listened to
• orderFactoryT - order factory function (default: order.Market.T)
• threshold - threshold when the trader starts to act (default: 0.7)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)
Figure 10.4: Signal trader sample run (signal=20-t/10)
Sample usage:
135
from marketsim.veusz_graph import Graph, showGraphs
from marketsim import signal, strategy, trader, orderbook, scheduler, observable
world = scheduler.create()
book_A = orderbook.Local(tickSize=0.01, label="A")
price_graph = Graph("Price")
assetPrice = observable.Price(book_A)
price_graph.addTimeSerie(assetPrice)
avg = observable.avg
price_graph.addTimeSerie(avg(assetPrice))
lp_A = strategy.LiquidityProvider(trader.SASM(book_A), volumeDistr=lambda:1).trader
signal = signal.RandomWalk(initialValue=20, deltaDistr=lambda: -.1, label="signal")
trader = strategy.Signal(trader.SASM(book_A, "signal"), signal).trader
price_graph.addTimeSerie(signal)
price_graph.addTimeSerie(observable.VolumeTraded(trader))
eff_graph = Graph("efficiency")
eff_graph.addTimeSerie(observable.Efficiency(trader))
eff_graph.addTimeSerie(observable.PnL(trader))
world.workTill(500)
showGraphs("signal_trader", [price_graph, eff_graph])
Signal
signal.RandomWalk is a sample discrete signal that changes at some moments of time by incrementing
on some random value. It has following parameters:
• initialValue - initial value of the signal (default: 0)
• deltaDistr - increment distribution function (default: normal distribution with µ=0, σ=1)
• intervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
136
Trend follower
Trend follower (strategy.TrendFollower class) can be considered as a sort of a signal strategy
(strategy.Signal) where the signal is a trend of the asset. Under trend we understand the first derivative of some moving average of asset prices. If the derivative is positive, the trader buys; if negative
– sells. Since moving average is a continuously changing signal, we check its derivative at random
moments of time. It has following parameters:
• trader -strategy’s trader
• average - moving average object. It should have methods update(time,value) and derivativeAt(time).
By default, we use exponentially weighted moving average with α = 0.15.
• orderFactoryT - order factory function (default: order.Market.T)
• threshold - threshold when the trader starts to act (default: 0.)
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)
10.5.3
Arbitrage trading strategy
Arbitrage trading strategy (strategy.Arbitrage) tries to make arbitrage trading the same asset on
different markets. Once a bid price at one market becomes more than ask price at other market, the
strategy sends two complimentary market orders with opposite sides and the same volume (which is calculated using order queue’s volumeWithPriceBetterThan method) having thus non-negative profit.
This strategy is initialized by a sequence of order books it will follow for.
Sample usage:
from marketsim.veusz_graph import Graph, showGraphs
from marketsim import trader, strategy, orderbook, remote, scheduler, observable
world = scheduler.create()
book_A = orderbook.Local(tickSize=0.01, label="A")
book_B = orderbook.Local(tickSize=0.01, label="B")
link = remote.TwoWayLink()
remote_A = orderbook.Remote(book_A, link)
remote_B = orderbook.Remote(book_B, link)
price_graph = Graph("Price")
spread_graph = Graph("Bid-Ask Spread")
137
Figure 10.5: Arbitrage trader sample run (Bid/Ask price)
Figure 10.6: Arbitrage trader sample run (Cross spread)
138
cross_graph = Graph("Cross Bid-Ask Spread")
arbitrager = strategy.Arbitrage(\
trader.SingleAssetMultipleMarket([remote_A, remote_B])).trader
assetPrice = observable.Price(book_A)
price_graph.addTimeSerie(assetPrice)
avg = observable.avg
cross_AB = observable.CrossSpread(book_A, book_B)
cross_BA = observable.CrossSpread(book_B, book_A)
cross_graph.addTimeSerie(cross_AB)
cross_graph.addTimeSerie(cross_BA)
cross_graph.addTimeSerie(avg(cross_AB))
cross_graph.addTimeSerie(avg(cross_BA))
spread_graph.addTimeSerie(avg(observable.BidPrice(book_A)))
spread_graph.addTimeSerie(avg(observable.AskPrice(book_A)))
spread_graph.addTimeSerie(avg(observable.BidPrice(book_B)))
spread_graph.addTimeSerie(avg(observable.AskPrice(book_B)))
ewma_0_15 = observable.EWMA(assetPrice, alpha=0.15)
price_graph.addTimeSerie(observable.OnEveryDt(1, ewma_0_15))
lp_A = strategy.LiquidityProvider(trader.SASM(remote_A))
lp_B = strategy.LiquidityProvider(trader.SASM(remote_B))
world.workTill(500)
showGraphs("arbitrage", [price_graph, spread_graph, cross_graph])
10.5.4
Adaptive strategies
An adaptive strategy is a strategy that is suspended or resumed in function of its "efficiency". Adaptive
strategies in the library are parametrized by an object measuring the efficiency since there can be different
ways to do it. The default estimator is the first derivative of trader "efficiency" exponentially weighted
moving average. To measure strategy efficiency without influencing the market we use a "phantom"
strategy having same parameters as the original one (except volumeDistr is taken as constant 1) but
139
sending order.VirtualMarket to the order book. Function adaptive.withEstimator creates a
real strategy coupled with its "phantom" and estimator object.
def withEstimator(constructor, *args, **kwargs):
assert len(args) == 0, "positional arguments are not supported"
efficiencyFunc = kwargs[’efficiencyFunc’] \
if ’efficiencyFunc’ in kwargs \
else lambda trader: trend(efficiency(trader))
real = constructor(*args, **kwargs)
real.estimator = createVirtual(constructor, copy(kwargs))
real.efficiency = efficiencyFunc(real.estimator.trader)
return real
Function adaptive.suspendIfNotEffective suspends a strategy if it is considered as ineffective:
def suspendIfNotEffective(strategy):
strategy.efficiency.on_changed += \
lambda _: strategy.suspend(strategy.efficiency.value < 0)
return strategy
Class adaptive.chooseTheBest is a composite strategy. It is initialized with a sequence of strategies coupled with their efficiency estimators (i.e. having efficiency field). In some moments of time
(given by eventGen parameter) the most effective strategy is chosen and made running; other strategies
are suspended.
class chooseTheBest(Strategy):
def __init__(self, strategies, event_gen=None):
assert all(map(lambda s: s.trader == strategies[0].trader, strategies))
if event_gen is None:
event_gen = scheduler.Timer(lambda: 1)
Strategy.__init__(self, strategies[0].trader)
self._strategies = strategies
event_gen.advise(self._chooseTheBest)
def _chooseTheBest(self, _):
best = -10e38
for s in self._strategies:
if s.efficiency.value > best:
best = s.efficiency.value
for s in self._strategies:
s.suspend(best != s.efficiency.value)
140
Part IV
Imperfection and predictability in
order-driven markets
141
Chapter 11
Market imperfection and predictibility
This chapter addresses the notion of efficiency, in the sense of short-term predictibility, of financial
markets, in the light of the dramatic, recent changes that has led to a fierce competition between a
growing number of actors, and a dramatic increase in the influence of high frequency trading.
11.1
Forecasting and market efficiency
Forecasting the market has been one of the most exciting financial subjects for over a century. In 1900,
L. Bachelier [25] admitted, “Undoubtedly, the Theory of Probability will never be applicable to the
movements of quoted prices and the dynamics of the Stock Exchange will never be an exact science.
However, it is possible to study mathematically the static state of the market at a given instant to establish the probability law for the price fluctuations that the market admits at this instant." 70 years later,
Fama [126] proposed a formal definition of the market efficiency: “a market in which prices always fully
reflect available information is called efficient.” Opinions have been always divergent about the market
efficiency. B Malkiel [235] concluded that most investors trying to predict stocks’ returns always ended
up with profits inferior to passive strategies. In his book Fooled by Randomness, N. Taleb [306] argued
that even the best performances can be explained by luck and randomness.
Interestingly enough, the temptation to discover some predictable, and hopefully profitable, signals in
the middle of those millions of numbers has never been as high as today. In the academic world, [67] is
a detailed study of the statistical properties of the intraday returns, leading to the conclusion that there
is no evidence of correlation between successive returns. Similarly, [127] concludes that stock returns
exhibit negligible temporal autocorrelation. However, the use of limit order book data as in [327] has
yielded more promising results - in particular, the liquidity imbalance at the best bid and ask limits seems
to be informative to predict the next trade sign.
In the professional world, many books present strategies predicting the market and always earning
money, see e.g. [254], [313]. When testing those strategies on realistic samples, results are quite different and the strategies are no longer profitable. It is quite possible that therisk of over-fitting, inherent to
such methods, has played a key role in the seemingly good performances published in those books.
The study presented here is performed both from an academic and a professional perspective. For each
143
prediction method, not only are statistical results presented, but also, the performances of the correspondant strategies are presented. The aim is to give another point of view of a good prediction and of an
efficient market. In the first section, the data and the test methodology are presented. In the second section a nonlinear method based on conditional probability matrices, is used to test the predictive power of
each indicator. In the last section, the linear regression is introduced to combine the different indicators,
and several regularization tools are tested in order to enhance the performances of the strategies.
11.2
Data, methodology and performances measures
11.2.1
Data
We focus on the EUROSTOXX 50 European liquid stocks. One year (2013) of full daily order book
data are used to achieve the study. For a stock with a mid price S t at time t, the return to be predicted
over a period δt is Ln( SSt+δt
). At the time t, one can use all the available data for any time s ≤ t to perform
t
the prediction.
The focus is on predicting the stocks’ returns over a fixed period δt using some limit order book indicators. Once the returns and the indicators are computed, the data are sampled on a fixed time grid
from 10 : 00 am to 5 : 00 pm with a resolution δt. Three different resolutions are tested: 1, 5 and 30
minutes.
Below are the definitions of the studied indicators and the rationale behind using them to predict the
returns.
t
). Two effects justify the use of the past return indicaPast return: the past return is defined as Ln( SSt−δt
tor to predict the next return; the mean-reversion effect and the momentum effect. If a stock, suddenly,
shows an abnormal return that, significantly, deviates the stock’s price from its historical mean value,
the mean reversion effect is observed when another large return with opposite sign occurs rapidly after, driving the stock price back to its usual average range. On the other hand, if the stock exhibits, in
a progressive fashion, a significant deviation, then the momentum effect occurs when more and more
market participants become convinced of the relevance of the move and trade in the same sense, thereby
increasing the deviation.
P
Order book imbalance: the liquidity on the bid (respectively ask) side is defined as Liqbid = 5i=1 wi bi qbi
P
(respectively Liqask = 5i=1 wi ai qai ), where bi (respectively ai ) is the price at the limit i on the bid (respectively ask) side, qbi (respectively qai ) is the corresponding available quantity, and wi is a decreasing
function on i used to give more weight to the best limits. Those indicators give an idea about the instantaneous volume available for trading on each side of the order book. Finally, the order book imbalance
Liqbid
). This indicator summarizes the order book static state and gives an idea about the
is defined as Ln( Liq
ask
buy-sell instantaneous equilibrium. When this indicator is significantly higher (respectively lower) than
0, the available quantity at the bid side is significantly higher (respectively lower) than the one at the ask
side; only few participants are willing to sell (respectively buy) the stock, which might reflect a market
144
consensus that the stock will move up (respectively down).
Flow quantity: this indicator summarizes the order book dynamic over the last period δt. Qb (respectively Q s ) is denoted as the sum of the bought (respectively sold) quantities, over the last period δt
b
and the flow quantity is defined as Ln( Q
Q s ). This indicator is similar to the order flow and shows a high
positive autocorrelation. The rationale behind using the flow quantity is to verify if the persistence of
the flow is informative about the next return.
EMA: for a process (X)ti observed at discrete times (ti ), the Exponential Moving Average EMA(d, X)
with delay d is defined as EMA(d, X)t0 = Xt0 and for 1 ≤ i EMA(d, X)ti = ωXti + (1 − ω)EMA(d, X)ti−1 ,
where ω = min(1, ti −tdi−1 ). The EMA is a weighted average of the process with an exponential decay. The
smaller d is, the shorter the EMA memory is.
11.2.2
Methodology
The aim of this study is to empirically test the market efficiency by predicting the stocks’ returns over
three different time intervals: 1, 5 and 30 minutes. In Section 11.3, the indicators are the past returns,
the order book imbalance and the flow quantity. A simple method based on historical conditional probabilities is used to assess, separately, the informative effect of each indicator. In Section 11.4, the three
indicators and their EMA(X, d) for d ∈ (1, 2, 4, 8, 16, 32, 64, 128, 256) are combined in order to perform
a better prediction than the case of a single indicator. Different methods, based on linear regression,
are tested. In particular, some statistical and numerical stability problems of the linear regression are
addressed. In the different sections, the predictions are tested statistically, then used to design a simple
trading strategy. The goal is to verify whether one can find a profitable strategy covering 0.5 basis point
trading costs. This trading cost is realistic and corresponds to many funds, brokers, and banks trading
costs. The possibility of determining, if it exists, a strategy that stays profitable after paying the costs,
would be an empirical argument of the market inefficiency. Notice that, in all the sections, the learning
samples are sliding windows containing sufficient number of days, and the testing samples are the next
days. The models parameters are fitted in-sample on a learning sample, and the strategies are tested
out-of-sample on a testing sample. The sliding training windows prevents the methodology from any
overfitting, since performances are only computed out of sample.
11.2.3
Performance measures
In most studies addressing market efficiency, results are summarized in a linear correlation coefficient.
However, such a measure is not sufficient to conclude about returns predictability or the market efficiency: any interpretation of the results should depend on the predicted signal and a corresponding
trading strategy. A 1% correlation is high if the signal is supposed to be totally random, and 99% correlation is unsufficient if the signal is supposed to be perfectly predictable. Moreover, a trader making
1 euro each time trading a stock with 50.01% probability and losing 1 euro with 49.99% probability,
might be considered as a noise trader. However, if this strategy can be run, over 500 stocks, one time
per second, for 8 hours a day, at the end of the day the gain will be the sum S n of n = 14.4 million
145
realisations. Assuming that the hypotheses of the central limit theorem are √satisfied,
law N(E,
σ
√
).
n
Thus the probability of having a negative trading day is
−E n
Φ( σ )
Sn
n
has a normal
= Φ(−0.62) = 26.5%,
much lower than that of a noise trader. From now on, we shall adopt the very empirical, but quite realistic, view that returns are considered predictable, and thus, the market is considered inefficient, if one
can run a profitable strategy covering the trading costs.
11.3
Conditional probability matrices
The conditional probability matrices provide empirical estimates of conditional probability laws. To
apply this method, the data need to be discretized in a small number of classes. Denote the explanatory
variable as X, the logarithmic return as Y, so that Ytn = Ln(
Denote the classes of X (respectively Y) as
CX
=
{CiX
S tn+1
S tn ),
and the frequencies matrix as M.
: i ∈ N+ ∩ {i ≤ S X }} (respectively C Y = {CiY :
i ∈ N+ ∩ {i ≤ S Y }}). S X (respectively S Y ) denotes the total number of classes for X (respectively Y).
For a given learning period [0, T ] containing N observations, the matrix of frequencies at the time T is
constructed as:
i, j
MT = card({(Xtn ∈ CiX , Ytn ∈ C Yj )})
where n ∈ N+ ∩ {n ≤ N}, and Xtn (respectively Ytn ) is the nth observed value of X (respectively Y),
observed at the time tn . Then, a prediction of the "next" return YT conditional to the observations XT
at time T can be computed using the matrix MT . The idea of this method is a simple application of
the statistical independence test. If some events A = "Xtn ∈ CiX " and B="Ytn ∈ C Yj " are statistically
independent, then P(A|B) = P(A). For example, to check if the past returns (denoted X in this example)
can help predicting the future returns (denoted Y in this example), the returns are classified into two
classes and the empirical frequencies matrix is computed. Table 11.1 shows the results for the 1-minute
returns of Deutsche Telekom over the year 2013.
A = “X < 000
B = “X > 000
A = “Y < 000
19,950
21,597
B = “Y > 000
21,597
20,448
Table 11.1: Historical frequencies matrix for Deutsche Telekom over 2013
In probabilistic terms, the historical probability to observe a negative return is P(A) = 49.70% and
to observe a positive return is P(B) = 50.30%. Therefore, a trader always buying the stock would have
a success rate of 50.30%. Also note that P(A/A) = 48.02%, P(B/A) = 51.98%, P(A/B) = 51.37%,
P(B/B) = 48.63%. Thus, a trader playing the mean-reversion (buy when the past return is negative
and sell when the past return is positive), would have a success rate of 51.67%. The same approach,
when trading the strategy over 500 stocks, gives a success rate of 54.38% for the buy strategy and of
72.91% for the mean reversion strategy. This simple test shows that the smallest statistical bias can be
profitable and useful for designing a trading strategy. However the previous strategy is not realistic: the
conditional probabilities are computed in-sample and the full data set of Deutsche Telekom was used
for the computation. In reality, predictions have to be computed using only the past data. It is, thus,
important to have stationary probabilities. Table 11.2 shows that the monthly observed frequencies are
146
quite stable, and thus can be used to estimate out-of-sample probabilities. Each month, one can use the
observed frequencies of the previous month as an estimator of current month probabilities.
Month
P(A/A)
P(B/A)
P(A/B)
P(B/B)
Jan
0.49
0.51
0.50
0.50
Feb
0.48
0.52
0.52
0.48
Mar
0.47
0.53
0.51
0.49
Apr
0.48
0.52
0.53
0.47
May
0.47
0.53
0.52
0.48
Jun
0.50
0.50
0.49
0.51
Jul
0.48
0.52
0.51
0.49
Aug
0.51
0.49
0.50
0.50
Sep
0.51
0.49
0.51
0.49
Oct
0.47
0.53
0.50
0.50
Nov
0.46
0.54
0.51
0.49
Dec
0.44
0.56
0.55
0.45
Table 11.2: Monthly historical conditional probabilities: In the most cases, P(A/A) and P(B/B) are
lower than 50% where P(B/A) and P(A/B) are higher than 50%.
In the following paragraphs, frequencies matrices are computed on sliding windows for the different
indicators. Several classification and prediction methods are presented.
11.3.1
Binary method
In the binary case, returns are classified into positive and negative as the previous example and explanatory variables are classified relatively to their historical mean. A typical constructed matrix is shown
in Table 11.1. Denote, in the example shown on Table 11.1, C1X = {X < X = 0}, C2X = {X > X = 0},
C1Y = {Y < 0}, C2Y = {Y > 0}. Y can be predicted using different formula based on the frequency matrix.
Below some estimators examples:
Yb1 : The sign of the most likely next return conditionally to the current state.
Yb2 : The expectation of the most likely next return conditionally to the current state.
Yb3 : The expectation of next return conditionally to the current state.


X


+1 if XT ∈ C1
Yb1 = 


−1 if XT ∈ C X
2


Y
X
X


E(Y|Y ∈ C2 ∩ X ∈ C1 ) if XT ∈ C1
Yb2 = 

E(Y|Y ∈ C Y ∩ X ∈ C X ) if X ∈ C X

T
1
2
2


X
X


E(Y|X ∈ C1 ) if XT ∈ C1
Yb3 = 


E(Y|X ∈ C X ) if X ∈ C X
2
T
2
In this study, only results based on the estimator Yb3 (denoted b
Y in the rest of the paper) are presented.
Results computed using different other estimators are equivalent and the differences do not impact the
conclusions. To measure the quality of the prediction, four tests are applied:
AUC (Area under the curve) combines the true positive rate and the false positive rate to give an idea
about the classification quality.
147
Accuracy is defined as the ratio of the correct predictions (Y and b
Y have the same sign).
Gain is computed on a simple strategy to measure the prediction performance. Predictions are used
to run a strategy that buys when the predicted return is positive and sells when it is negative. At each
time, for each stock the strategy’s position is in {−100, 000, 0 , +100, 000}.
Profitability is defined as the gain divided by the traded notional of the strategy presented above. This
measure is useful to estimate the gain with different transaction costs.
Figure 1 summarizes the statistical results of predicting the 1-minute returns using the three indicators.
For each predictor, the AUC and the accuracy are computed over all the stocks. Results are computed
over more than 100,000 observations and the amplitude of the 95% confidence interval is around 0.6%.
For the three indicators, the accuracy and the AUC are significantly higher than the 50% random guessing threshold. The graph also shows that the order book imbalance gives the best results, and that the
past resturn is the least successful predictor. Detailed results per stock are given in Appendix C.
Figure 11.1: The quality of the binary prediction: The AUC and the Accuracy are higher than 50%. The
three predictors are better than random guessing and are significatly informative.
In Figure 2, the performances of the trading strategies based on the prediction of the 1-minute returns
are presented. The strategies are profitable and the results confirm the predictability of the returns (see
the details in Appendix C).
In Figure 3, the cumulative gains of the strategies based on the 3 indicators over 2013 are represented.
When trading without costs, predicting the 1-minute return using the past return and betting 100,000
euros at each time, would make a 5-million Euro profit. Even better, predicting using the order book
imbalance would make more than 20 million Euros profit. The results confirm the predictability of the
returns, but not the inefficiency of the market. In fact, Figure 4 shows that, when adding the 0.5 bp
trading costs, only the strategy based on the order book imbalance remains (marginally) positive. Thus,
Figure 5 represents
cumulative
gain
profitability
for the in5-minute
andC).the 30-minute
no conclusion,
about the the
market
efficiency,
canand
be the
made
(see more details
Appendix
strategies (with the trading costs). The strategies are not profitable. Moreover, the predictive power
decreases as the time horizon increases. The results of the binary method show that the returns are
148
Figure 11.2: The quality of the binary prediction: For the 3 predictors, the densities of the gain and the
profitability are positively biased, confirming the predictability of the returns.
Figure 11.3: The quality of the binary prediction: The graphs confirm that the 3 indicators are informative and that the order book imbalance indicator is the most profitable.
significantly predictable. Nevertheless, the strategies based on those predictions are not sufficiently
profitable to cover the trading costs. In order to enhance the predictions, the same idea is applied to the
four-class case. Moreover, a new strategy based on a minimum threshold of the expected return is tested.
11.3.2
Four-class method
We now investigate the case where the indicator X is classified into four classes; “very low values”
C1X , “low values” C2X , “high values” C3X and “very high values” C4X . At each time tn , Y is predicted
as b
Y = E(Y|X ∈ CiX ), where CiX is the class of the current observation Xtn . As the previous case, the
expectation is estimated from the historical frequencies matrix. Finally, a new trading strategy is tested.
149
Figure 11.4: The quality of the binary prediction: When adding the 0.5 bp trading costs, the strategies
are only slightly profitable.
The strategy is to buy (respectively sell) 100,000 euros when b
Y is positive (respectively negative) and
b
|Y| > θ, where θ is a minimum threshold (1 bp in this paper). Notice that the case θ = 0 corresponds to
the strategy tested in the binary case.
The idea of choosing θ > 0 aims to avoid trading the stock when the signal is noisy. In particular,
when analyzing the expectations of Y relative to the different classes of X, it is always observed that the
absolute value of the expectation is high when X is in one of its extreme classes (C1X or C4X ). On the
other hand, when X is in one of the intermediary classes (C2X or C3X ) the expectation of Y is close to 0
reflecting a noisy signal.
For each indicator X, the classes are defined as C1X =] − ∞, Xa [, C2X =]Xa , Xb [, C3X =]Xb , Xc [ and
C4X =]Xc , +∞[. To compute Xa , Xb and Xc , the 3 following classifications were tested:
Quartile classification: the quartile Q1 , Q2 and Q3 are computed in-sample for each day, then averaged over the days. Xa , Xb and Xc corresponds, respectively, to Q1 , Q2 and Q3 .
K-means classification: the K-means algorithm [171], applied to the in-sample data with k = 4, gives
the centers G1 , G2, G3 and G4 of the optimal (in the sense of the minimum within-cluster sum of squares)
clusters. Xa , Xb and Xc are given respectively by
G1 +G2 G2 +G3
2 ,
2
and
G3 +G4
2 .
Mean-variance classification: the average X and the standard deviation σ(X) are computed in the
learning period. Then, Xa , Xb and Xc correspond, respectively, to X − σ(X), X and X + σ(X).
Only the results based on the mean-variance classification are presented here, since the results computed
150
Figure 11.5: The quality of the binary prediction: The strategies are not profitable. Moreover, the
performances decreases significantly compared to the 1-minute horizon.
using the two other classifications are equivalent and the differences do not impact our conclusions.
Figure 6 compares the profitabilities of the binary and the 4-class methods. For the 1-minute prediction, the results of the 4-class method are significantly better. For the longer horizons, the results of
both methods are equivalent. Notice also that, using the best indicator, in the 4-class case, one could
obtain a significantly positive performance after paying the trading costs. Some more detailed results
are given in Appendix C.
The interesting result of this first section is that even when using the simplest statistical learning method,
the used indicators are informative and provide a better prediction than random guessing. However, in
most cases, the obtained performances are too low to conclude about the market inefficiency. In order
151
Figure 11.6: The quality of the 4-class prediction: For the 1-minute prediction, the results of the 4-class
method are significantly better than the results of the binary one. For longer horizons, both strategies are
not profitable when adding the trading costs.
to enhance the performances, the three indicators and their exponential moving average are combined in
the next section.
152
11.4
Linear regression
In this section, X denotes a 30-column matrix containing the 3 indicators and their EMA(d) for d ∈
(1, 2, 4, 8, 16, 32, 64, 128, 256), and Y denotes the target vector to be predicted. The general approach
is to calibrate, on the learning sample, a function f such that f (X) is “the closest possible” to Y, and
hope that, after the learning period, the relation between X and Y is still well described by f . Hence
f (X) would be a good estimator of Y. In the linear case, f is supposed to be linear and the model errors
are supposed to be independent and identically distributed [297]. Actually, the standard textbook model
posits a relationship of the form Y = Xβ+,
are done with z-scored data (use
11.4.1
Xi −Xi
σ(Xi )
∼ N(0, σ2 ). For technical reasons, the computations
instead of Xi ).
Ordinary least squares (OLS)
The OLS method consists in estimating the unknown parameter β by minimizing the sum of squares of
the residuals between the observed variable Y and the linear approximation Xβ. The estimator is denoted
b
β and is defined as
b
β = argminβ (Jβ = ||Y − Xβ||22 )
. This criterion is reasonable if at each time i the row Xi of the matrix X and the observation Yi of the
vector Y represent independent random sample from their populations.
The cost function Jβ dependes quadratically on β, and differentiating with respect to β gives:
δJβ
= 2tX Xβ − 2tX Y
δβ
δ2 J β
= 2tX X
δβδβ
.
When tX X is invertible, setting the first derivative to 0 gives the unique solution b
β = (tX X)−1 tX Y. The
statistical properties of this estimator can be calculated straightforward as follows:
E(b
β|X) = (tX X)−1 tX E(Y|X) = (tX X)−1 tX Xβ = β
,
Var(b
β|X) = (tX X)−1 tX Var(Y|X)X(tX X)−1 = (tX X)−1 tX σ2 IX(tX X)−1 = σ2 (tX X)−1
,
E(||b
β||22 |X) = E(tY X(tXX)−2 tXY|X)) = T race(X(tXX)−2 tXσ2 I) + ||β||22 = σ2 T race((tXX)−1 ) + ||β||22
X 1
MS E(b
β) = E(||b
β − β||22 |X) = E(||b
β||22 |X) − ||β||22 = σ2 T race((tXX)−1 ) = σ2
λi
,
,
where MSE denotes the mean squared error and the λi ’s are the eigenvalues of tX X. Notice that the OLS
estimator is unbiased, but can exhibit an arbitrary high MSE when the matrix tX X has eigenvalues close
to 0 .
In the out-of-sample period, b
Y = Xb
β is used to predict the target. As seen in section 11.2, the corresponding trading strategy is to buy (respectively sell) 100,000 euros when b
Y > 0 (respectively b
Y < 0).
To measure the quality of the predictions, the binary method based on the order book imbalance indicator is taken as a benchmark. The linear regression is computed using 30 indicators, including the order
153
book imbalance, thus it should perform at least as well as the binary method. Figure 7 compares the
profitabilities of the two strategies. The detailed statistics per stock are given in Appendix C. Similarly
Figure 11.7: The quality of the OLS prediction: The results of the OLS method are not better than those
of the binary one.
to the binary method, the performance of the OLS method decreases as the horizon increases. Moreover,
the surprising result is that, when combining all the 30 indicators, the results are not better than just applying the binary method to the order book imbalance indicator. This leads to questioning the quality of
the regression.
Figure 8 gives some example of the OLS regression coefficients. It is observed that the coefficients
are not stable over the time. For example, for some period, the regression coefficient of the order book
imbalance indicator is negative. This does not make any financial sense. In fact, when the imbalance is
high, the order book shows more liquidity on the bid side (participants willing to buy) than the ask side
(participants willing to sell). This state of the order book is observed on average before an up move,
i.e., a positive return. The regression coefficient should, thus, be always positive. It is also observed
that, for highly correlated indicators, the regression coefficients might be quite different. This result also
does not make sense, since one would expect to have close coefficients for similar indicators. From a
statistical view, this is explained by the high MSE caused by the high colinearity between the variables.
In the following paragraphs, the numerical aspect is addressed, and some popular solutions to the OLS
estimation problems are tested.
11.4.2
Ridge regression
When solving a linear system AX = B, A being invertible, if a small change in the coefficient matrix (A)
or a small change in the right hand side (B) causes a large change in the solution vector (X), the system
is said to be ill-conditioned. An example of an ill-conditioned system is given below:
154
Figure 11.8: The quality of the OLS prediction: The graph on the left shows the instability of the
regression coefficient of the order book imbalance indicator over the year 2013 for the stock Deutsh
Telecom. The graph on the right shows, for a random day, a very different coefficients for similar
indicators; the order book imbalance and its exponential moving averages.

   

  

1.000 2.000  x   4.000 
 x   2.000 

 ×   = 
 =>   = 

3.000 5.999
y
11.999
y
1.000
. When making a small change in the matrix A:

 
1.001 2.000  x
 × 

y
3.000 5.999
  

 
 x   −0.400
  4.000 
 =>   = 
 = 
2.200
y
11.999



. When making a small change in the vector B:

 
1.000 2.000  x

 × 
3.000 5.999
y
 

  
  4.001 
 x   −3.999
 = 
 =>   = 
11.999
y
4.000



. Clearly, it is mandatory to take into consideration such effects before achieving any computation when
dealing with experimental data. In the literature, various measures of the ill-conditioning of a matrix
have been proposed [289], the most popular one [203] probably being is the condition numberK(A) =
√
||A||2 ||A−1 ||2 , where ||.||2 denotes the l2 -norm defined for a vector X as ||X||2 = tX X and for a matrix A as
||AX||2
.
||X||2 ,0 ||X||2
||A||2 = max
The larger K(A), the more ill-conditioned A is. The rationale for K(A) is to measure
the sensitivity of the solution X relative to a perturbation of the matrix A or the vector B. More precisely,
it is proved that:
• If AX = B and A(X + δX) = B + δB
then
155
||δX||2
||X||2
2
≤ K(A) ||δB||
||B||2 ;
• If AX = B and (A + δA)(X + δX) = B
then
||δX||2
||X+δX||2
2
≤ K(A) ||δA||
||A||2 .
Notice that K(A) can be easily computed as the maximum singular value of A. For example, in the system above, K(A) = 49988. The small perturbations can, thus, be amplified by afactor of almost 50000,
causing the previous observations.
Figure 9 represents the singular values of tX X used to compute the regression of the right graph of
Figure 8. The graph shows a hard decreasing singular values. In particular, the condition number is
higher than 80000.
Figure 11.9: The quality of the OLS prediction: The graph shows that the matrix inverted when computing the OLS coefficient is ill-conditioned.
This finding explains the instability observed on the previous section. Not only is the OLS estimator
statistically not satisfactory, but the numerical problems caused by the ill-conditioning of the matrix
makes the result numerically unreliable. One popular solution to enhance the stability of the estimation
of the regression coefficients is the Ridge method. This method was introduced independently by A.
Tikhonov, in the context of solving ill-posed problems, around the middle of the 20th century, and by
A.E. Hoerl in the context of linear regression. The Ridge regression consists of adding a regularization
term to the original OLS problem:
βbΓ = argminβ (||Y − Xβ||22 + ||Γβ||22 )
. The new term gives preference to a particular solution with desirable properties. Γ is called the
Tikhonov matrix and is usually chosen as a multiple of the identity matrix: λR I, where λR ≥ 0. The new
estimator of the linear regression coefficients is called the Ridge estimator, denoted by b
βR and defined as
follows:
b
βR = argminβ (||Y − Xβ||22 + λR ||β||22 )
.
156
Similarly to the OLS case, straightforward computations show that
b
βR = (tX X + λR I)−1 tX Y = Zb
β
where
Z = (I + λR (tX X)−1 )−1 = W −1
and
E(b
βR |X) = E(ZRb
β|X) = Zβ
,
Var(b
βR |X) = Var(Zb
β|X) = σ2 Z(tX X)−1 tZ
,
MS E(b
βR ) = E(t(Zbβ−β) (Zb
β − β)|X) = E(tβ tZ Zβ|X) − 2tβ Zβ + tβ β
,
= T race(tZ Zσ2 (tX X)−1 ) + tβ tZ Zβ − 2tβ Zβ + tβ β.
Note that
(tX X)−1 =
Z −1 − I
λR
,
I − Z = (I − Z)WW −1 = (W − I)W −1 = λR (tX X)−1 W −1 = λR (WtX X)−1 = λR (tX X + λR I)−1 ,
and thus
σ2 Z
σ2 Z
MS E(b
βR ) = T race(
) − T race(
) + tβ (I − Z)2 β,
λR Z
λR Z 2
X
λi
= σ2
+ λ2R tβ (tX X + λR I)−2 β.
(λi + λR )2
The first element of the MSE corresponds exactly to the trace of the covariance matrix of b
βR , i.e., the
total variance of the parameters estimations. The second element is the squared distance from b
βR to β
and corresponds to the square of the bias introduced when adding the ridge penalty. Notice that, when
increasing the λR , the bias increases and the variance decreases. On the other hand, when decreasing the
λR , the bias decreases and the variance increases, both converging to their OLS values. To enhance the
stability of the linear regression, one should compute a λR , such that MS E(b
βR ) ≤ MS E(b
β). As proved
in [184], this is always possible:
Theorem 11.4.1 (Hoerl) There always exist λR ≥ 0 such that MS E(b
βR ) ≤ MS E(b
β).
From a statistical view, adding the Ridge penalty aims at reducing the MSE of the estimator, and
is particularly necessary when the covariance matrix is ill-conditioned. From a numerical view, the
new matrix to be inverted is tX X + λR I with as eigen values (λi + λR )i . The conditional number is
K(tX X + λR I) =
λmax +λR
λmin +λR
≤
λmax
λmin
= K(tX X). Hence, the ridge regularization enhances the conditioning of
the problem and improves the numerical reliability of the result.
From the previous, it can be seen that increasing the λR leads to numerical stability and reduces the
variance of the estimator, however it increases the bias of the estimator. One has to chose the λR as a
157
tradoff between those 2 effects. Next, 2 estimators of λR are tested; the Hoerl-Kennard-Baldwin (HKB)
estimator [185] and the Lawless-Wang (LW) estimator [217] .
In order to compare the stability of the Ridge and the OLS coefficients, Figure 10 and Figure 11 represent the same test of Figure 8, applied, respectively, to the Ridge HKB and the Ridge LW methods.
In the 1-minute prediction case, the graphs show that the Ridge LW method gives the most coherent
coefficients. In particular, the coefficient of the order book imbalance is always positive (as expected
from a financial view) and the coefficients of similar indicators have the same signs.
Finally, Figure 12 summarizes the profitabilities of the corresponding strategies of the 2 methods. Appendix C contains more detailed results per stock.
Figure 11.10: The quality of the Ridge HKB prediction: The graphs show that the results of the Ridge
HKB method are not significantly different from those of the OLS method (Figure 8). In this case, the
λR is close to 0 and the effect of the regularization is limited.
From the previous results, it can be concluded that adding a regularization term to the regression
enhances the predictions. The next section deals with an other method of regularization: the reduction
of the indicators’ space.
11.4.3
Least Absolute Shrinkage and Selection Operator (LASSO)
Due to the colinearity of the indicators, the eigen values spectrum of the covariance matrix might be
concentrated on the largest values, leading to an ill-conditioned regression problem. The Ridge method,
reduces this effect by shifting all the eigen values. This transformation leads to a more reliable results,
but might introduce a bias in the estimation. In this paragraph, a simpler transformation of the original
158
Figure 11.11: The quality of the Ridge LW prediction: The graph on the left shows the stability of
the regression coefficient of the order book imbalance over the year 2013 for Deutsh Telecom. The
coefficient is positive during all the period, in line with the financial view. The graph on the right shows,
for a random day, a positive coefficients for the order book imbalance and its short term EMAs. The
coefficients decreases with the time; ie the state of the order book “long time ago” has a smaller effect
than its current state. More over, for longer than a 10-second horizon, the coefficients become negative
confirming the mean-reversion effect.
Figure 11.12: The quality of the Ridge prediction: For the 1-minute and the 5-minute horizons the
LW method performs significantly better than the OLS method. However, for the 30-minute horizon,
the HKB method gives the best results. Notice that for the 1-minute case, the LW method improves
the performances by 58% compared to the OLS, confirming that stabilizing the regression coefficients
(Figure 11 compared to Figure 8), leads to a better trading strategies.
indicators’ space, the LASSO regression, is presented.
The LASSO method [310] enhances the conditioning of the covariance matrix by reducing the num159
ber of the used indicators. Mathematically, the LASSO regression aims to produce a sparse regression
coefficients -ie with some coefficients exactly equal to 0. This is possible thanks to the l1 −penalization.
More precisely, the LASSO regression is to estimate the linear regression coefficient as:
b
βL = argminβ (||Y − Xβ||22 + λL ||β||1 )
where ||.||1 denotes the l1 -norm, defined as the sum of the coordinates’ absolute values. Writing |βi | =
βi+ − βi− and βi = βi+ + βi− , with βi+ ≥ 0 and βi− ≤ 0, a classic quadratic problem, with a linear constraints, is obtained and can be solved by a classic solver. As far as known, there is no estimator for λL .
In this study, the cross-validation [171] method is applied to select λL out of a set of parameters; T 10−k ,
where k ∈ (2, 3, 4, 5, 6) and T denotes the number of the observations.
Figure 13 compares, graphically, the Ridge and the LASSO regularization, Figure 14 addresses the
instability problems observed in Figure 8 and Figure 15 summarizes the results of the strategies corresponding to the LASSO method. The detailed results per stock are given in Appendix C. The next
Figure 11.13: The quality of the LASSO prediction: The estimation graphs for the Ridge (on the left)
and the LASSO regression (on the right). Notice that the l1 −norm leads to 0 coefficients on the less
important axis.
paragraph introduces the natural combination of the Ridge and the LASSO regression and presents this
paper’s conclusions concerning the market inefficiency.
11.4.4
ELASTIC NET (EN)
The EN regression aims to combine the regularization effect of the Ridge method and the selection effect
of the LASSO one. The idea is to estimate the regression coefficients as:
b
βEN = argminβ (||Y − Xβ||22 + λEN1 ||β||1 + λEN2 ||β||22 )
160
Figure 11.14: The quality of the LASSO prediction: The graphs show that the LASSO regression gives a
regression coefficients in line with the financial view (similarly to figure 11). Moreover, the coefficients
are sparse and simple for the interpretation.
Figure 11.15: The quality of the LASSO prediction: Similar as the Ridge regression, the LASSO regression gives a better profitability than the OLS one. Notice that for the 1-minute case, the LASSO method
improves the performances by 165% compared to the OLS. Eventhough the LASSO metho is using less
regressors than the OLS method, (and thus less signal), the out of sample results are significantly better
in the LASSO case. This result confirms the importance of the signal by noise ratio and highlights the
importance of the regularization when adressing an ill-conditioned problem.
The detail about the computation can be found in [330].
In this study, the estimation is computed in two steps. In the first step λEN1 and λEN2 are selected
via the crossvalidation and the problem is solved same as the LASSO case. In the second step, the final
coefficients are obtained by a Ridge regression (λEN1 = 0) over the selected indicators (indicators with
161
a non-zero coefficient in the first step). The two step method avoids useless l1 -penalty effects on the
selected coefficients.
Figure 16 shows that the coefficients obtained by the EN method are in line with the financial view and
combine both regularization effects observed when using the Ridge and the LASSO methods. Finally,
Figure 11.16: The quality of the EN prediction: The graphs show that the EN regression gives a regression coefficients in line with the financial view (similarly to figure 11 and 14).
the strategy presented in 2.2 (trading only if b
Y ≥ |θ| ) is applied to the different regression methods.
Figure 17 summarizes the obtained results.
The results for the three horizons confirm that the predictions of all the regularized method (Ridge,
LASSO, EN) are better than the OLS ones. As detailed in the previous paragraphs, this is always the
case when the indicators are highly correlated. Moreover, the graphs show that the EN method gives
the best results compared to the other regressions. The 1-minute horizon results underline that, when an
indicator has an obvious correlation with the target, using a simple method based exhaustively on this
indicator, performs as least as well as more sophisticated methods including more indicators. Finally,
the performance of the EN method for the 1-minute horizon suggest that the market is inefficient for
such horizon. The conclusion is less obvious for the 5-minute horizon. On the other hand, the 30-minute
horizon results show that, none of the tested methods could find any proof of the market inefficiency for
such horizon. From the previous, it can be concluded that the market is inefficient in the short term, this
inefficiency disappears progressively when the new information are widely diffused.
162
Figure 11.17: The quality of the EN prediction: The EN method gives the best results compared to the
other regressions.
Conclusions
A large empirical study has been performed over the stocks of the EUROSTOXX 50 index, in order to
test the predictability of returns. The first part of the study shows that the future returns are not independent of the past dynamic and state of the order book. In particular, the order book imbalance indicator
is informative and provides a reliable prediction of the returns. The second part of the study shows that
combining different order book indicators using adequate regressions leads to a trading strategies with a
good performances even when paying the trading costs. In particular, our results show that the market is
inefficient in the short term and that a period of a few minutes is necessary for prices to incorporate new
163
information.
164
Part V
Appendices
165
Appendix A
Limit order book data
The very possibility of an experimental approach to the study of limit order books lies in the availability of data. Although exchanges themselves actually keep an extensive record of the orders that they
received together with all their characteristics, the available historical data sets are generally not as complete as one would hope. Therefore, many statistical studies on the order books rely on a set of data
that may be lacking some information, and some assumptions may have to be made. Most of the results
presented here, at least, those we have produced ourselves, use the Thomson-Reuters TRTH database.
After explaining the algorithm used for the processing of limit order book data, we describe the data sets
that have been used to obtain the various empirical results presented in this book.
A.1
Limit order book data processing
Because one cannot distinguish market orders from cancellations in tick by tick data, and since the
timestamps of the trades and tick by tick data files are asynchronous, we use a matching procedure to
reconstruct the order book events. In a nutshell, we proceed as follows for each stock and each trading
day:
1. Parse the tick by tick data file to compute order book state variations:
• If the variation is positive (volume at one or more price levels has increased), then label the
event as a limit order.
• If the variation is negative (volume at one or more price levels has decreased), then label the
event as a “likely market order”.
• If no variation—this happens when there is just a renumbering in the field “Level” that does
not affect the state of the book—do not count an event.
2. Parse the trades file and for each trade:
(a) Compare the trade price and volume to likely market orders whose timestamps are in [tT r −
∆t, tT r + ∆t], where tT r is the trade timestamp and ∆t is a predefined time window1 .
1
We set ∆t = 3 s for CAC 40 stocks. We found that the median reporting delay for trades is −900 ms: on average, trades
are reported 900 milliseconds before the change is recorded in tick by tick data.
167
Timestamp
33480.158
33480.476
33481.517
33483.218
33484.254
33486.832
33489.014
33490.473
33490.473
33490.473
33490.473
Side
B
B
B
B
B
A
B
B
B
B
A
Level
1
2
5
1
1
1
2
1
1
1
1
Price
121.1
121.05
120.9
121.1
121.1
121.15
121.05
121.1
121.1
121.1
121.15
Quantity
480
1636
1318
420
556
187
1397
342
304
256
237
Table A.1: Tick by tick data file sample. Note that the field “Level” does not necessarily correspond to
the distance in ticks from the best opposite quote as there might be gaps in the book. Lines corresponding
to the trades in table A.2 are highlighted in italics.
Timestamp
33483.097
33490.380
33490.380
33490.380
Last
121.1
121.1
121.1
121.1
Last quantity
60
214
38
48
Table A.2: Trades data file sample.
(b) Match the trade to the first likely market order with the same price and volume and label the
corresponding event as a market order—making sure the change in order book state happens
at the best price limits.
(c) Remaining negative variations are labeled as cancellations.
Doing so, we have an average matching rate of around 85% for CAC 40 stocks. As a byproduct, one
gets the sign of each matched trade, that is, whether it is buyer or seller initiated.
Tables A.1 and A.2 below provide an example of the data files and of the matching algorithm
A.2
Stylized facts
In Chapter 2, we have produced our own empirical plots. We use Thomson-Reuters tick-by-tick data on
the Paris Bourse. We select four stocks: France Telecom (FTE.PA) , BNP Paribas (BNPP.PA), Societe
Générale (SOGN.PA) and Renault (RENA.PA). For any given stocks, the data displays time-stamps,
traded quantities, traded prices, the first five best-bid limits and the first five best-ask limits. Except
when mentioned otherwise, all statistics are computed using all trading days from Oct, 1st 2007 to May,
30th 2008, i.e. 168 trading days. On a given day, orders submitted between 9:05am and 5:20pm are
taken into account, i.e. first and last minutes of each trading days are removed.
168
A.3
Market taking and making
In Chapter 4, we use the following order book data for several types of financial assets:
• BNP Paribas (RIC2 : BNPP.PA): 7th component of the CAC40 during the studied period
• Peugeot (RIC: PEUP.PA): 38th component of the CAC40 during the studied period
• Lagardère SCA (RIC: LAGA.PA): 33th component of the CAC40 during the studied period
• Dec.2009 futures on the 3-month Euribor (RIC: FEIZ9)
• Dec.2009 futures on the Footsie index (RIC: FFIZ9)
We use Reuters RDTH tick-by-tick data from September 10th, 2009 to September 30th, 2009 (i.e.
15 days of trading). For each trading day, we use only 4 hours of data, precisely from 9:30 am to 1:30
pm. As we are studying European markets, this time frame is convenient because it avoids the opening
of American markets and the consequent increase of activity.
Our data is composed of snapshots of the first five limits of the order books (ten for the BNPP.PA
stock). These snapshots are timestamped to the millisecond and taken at each change of any of the limits
or at each transaction. The data analysis is performed as follows for a given snapshot:
1. if the transaction fields are not empty, then we record a market order, with given price and volume;
2. if the quantity offered at a given price has increased, then we record a limit order at that price,
with a volume equal to the difference of the quantities observed;
3. if the quantity offered at a given price has decreased without any transaction being recorded, then
we record a cancellation order at that price, with a volume equal to the difference of the quantities
observed;
4. finally, if two orders of the same type are recorded at the same time stamp, we record only one
order with a volume equal to the sum of the two measured volumes.
Therefore, market orders are well observed since transactions are explicitly recorded, but it is important to note that our measure of the limit orders and cancellation orders is not direct. In table A.3,
we give for each studied order book the number of market and limit orders detected on our 15 4-hour
samples. On the studied period, market activity ranges from 2.7 trades per minute on the least liquid
stock (LAGA.PA) to 14.2 trades per minute on the most traded asset (Footsie futures).
2
Reuters Identification Code
169
Code
BNPP.PA
PEUP.PA
LAGA.PA
FEIZ9
FFIZ9
Number of limit orders
321,412
228,422
196,539
110,300
799,858
Number of market orders
48,171
23,888
9,834
10,401
51,020
Table A.3: Number of limit and markets orders recorded on 15 samples of four hours (Sep 10th to Sep
30th, 2009 ; 9:30am to 1:30pm) for 5 different assets (stocks, index futures, bond futures)
170
Appendix B
Some useful mathematical notions
B.1
Point processes
General reference are [104][? ].
Definition B.1.1 (Point process) A point process is an increasing sequence (T n )n∈N of positive random
variables defined on a measurable space (Ω, F , P).
We will restrict our attention to processes that are nonexplosive, that is, for which limn→∞ T n = ∞. To
each realization (T n ) corresponds a counting function (N(t))t∈R+ defined by
N(t) = n if t ∈ [T n , T n+1 [, n ≥ 0.
(B.1.1)
(N(t)) is a right continuous step function with jumps of size 1 and carries the same information as the
sequence (T n ), so that (N(t)) is also called a point process.
Definition B.1.2 (Multivariate point process) A multivariate point process (or marked point process)
is a point process (T n ) for which a random variable Xn is associated to each T n . The variables Xn take
their values in a measurable space (E, E).
We will restrict our attention to the case where E = {1, . . . , M}, m ∈ N∗ . For each m ∈ {1, . . . , M}, we
can define the counting processes
N m (t) =
X
I(T n ≤ t)I(Xn = i).
(B.1.2)
n≥1
We also call the process
N(t) = (N 1 (t), . . . , N M (t))
a multivariate point process.
Definition B.1.3 (Intensity of a point process) A point process (N(t))t∈R+ can be completely characterized by its (conditional) intensity function, λ(t), defined as
λ(t) = lim
u→0
P [N(t + u) − N(u) = 1|Ft ]
,
u
171
(B.1.3)
where Ft is the history of the process up to time t, that is, the specification of all points in [0, t]. Intuitively
P [N(t + u) − N(u) = 1|Ft ] = λ(t) u + o(u),
(B.1.4)
P [N(t + u) − N(u) = 0|Ft ] = 1 − λ(t)u + o(u),
(B.1.5)
P [N(t + u) − N(u) > 1|Ft ] = o(u).
(B.1.6)
This is naturally extended to the multivariate case by setting for each m ∈ {1, . . . , M}
P [N m (t + u) − N m (u) = 1|Ft ]
.
u→0
u
λm (t) = lim
(B.1.7)
Definition B.1.4 A point process is stationary when for every r ∈ N∗ and all bounded Borel subsets
A1 , . . . , Ar of the real line, the joint distribution of
{N(A1 + t), . . . , N(Ar + t)}
does not depend on t.
B.1.1
Hawkes processes
We recall several classical results on multivariate Hawkes processes. Let then N = (N 1 , ..., N D ) be a
D-dimensional point process with intensity vector λ = (λ1 , ..., λD ).
Definition B.1.5 We say that N = (N 1 , . . . , N D ) is a multivariate Hawkes process when
λm
t
=
λm
0
+
D
X
αm j
t
Z
j
e−βm j (t−s) dN s
0
j=1
for 1 6 m 6 D.
ij
Proposition B.1.1 Define the processes µi j as: µt = αi j
{µi j }1≤i, j≤D . Then, the process (N, µ) is markovian.
Rt
0
j
e−βi j (t−s) dN s , 1 ≤ i, j ≤ D, and let µ =
Proof. The proof of this classical result can be found e.g. in [57].
Stationarity
Extending the early stability and stationarity result in [178], [57] proves a general result for the multivariate Hawkes processes just introduced:
Proposition B.1.2 Let the matrix A be defined by
Ai j =
α ji
,
β ji
1 ≤ i, j ≤ D.
If its spectral radius ρ(A) satisfies the condition
ρ(A) < 1,
172
(B.1.8)
then there exists a (unique) multivariate point process N = (N 1 , . . . , N m ) whose intensity is specified
as in Definition B.1.5. Morevover, this process is stable, and reaches its unique stationary distribution
exponentially fast.
The interested reader is referred to the original paper [57] for the proof of this result. Note that, in Section
B.1.1, a straightforward, explicit construction of a Lyapunov function for such a process is given. This
function actually provides an alternate proof of stability, together with the exponential convergence
towards the stationary distribution, based on the concept of V-geometric ergodicity of [247].
Lyapunov functions for Hawkes processes
For the sake of completeness, an explicit construction of a Lyapunov function for a multi-dimensional
Hawkes processes N = (N i ) with intensities
λit
=
λi0
+
t
XZ
j
αi j e−βi j (t−s) dN s
0
j
is provided here.
Denote as in Proposition B.1.1
ij
µt
=
t
Z
j
αi j e−βi j (t−s) dN s ,
0
so that there holds
λit = λi0 +
X
ij
µt .
(B.1.9)
j
We assume for simplicity 1 the following
∀i, j, αi j > 0, βi j > 0,
(B.1.10)
as well as the spectral condition (B.1.8). The infinitesimal generator associated to the markovian process
(µi j ), 1 6 i, j 6 D, is the operator
LH F(µ) =
X
λ j (F(µ + ∆ j (µ)) − F(µ)) −
j
X
i, j
βi j µi j
∂F
,
∂µi j
where µ is the vector with components µi j and λ j is as in (B.1.9). The notation ∆ j (µ) characterizes the
jumps in those of the entries in µ that are affected by a jump of the process N j . For a fixed index j, it is
given by the vector with entries αi j at the relevant spots, and zero entries elsewhere.
A Lyapunov function for the associated semi-group is sought under the form
V(µ) =
X
δi j µi j
i, j
1
The classical assumption of irreducibility would be sufficient, as long as all the β’s are strictly positive
173
(B.1.11)
(since the intensities are always positive, a linear function will be coercive). Assuming (B.1.11), there
holds
LH V =
X
λ j(
X
j
δi j αi j ) −
X
i
βi j µi j δi j
i, j
or
LH V =
X
j
(λ0 +
X
i, j
µ jk )δi j αi j − βi j µi j δi j .
(B.1.12)
k
At this stage, it is convenient to introduce the maximal eigenvector of the matrix A (introduced in
Proposition B.1.2) with entries
Ai j =
α ji
.
β ji
Denote by κ the associated maximal eigenvalue. By Assumption (B.1.8), one has that 0 < κ < 1 and
furthermore, by Perron-Frobenius theorem, there holds: ∀i, i > 0.
Assuming that
δi j ≡
the expression for V becomes
V(µ) =
i
,
βi j
(B.1.13)
X µi j
i .
βi j
i, j
(B.1.14)
Plugging (B.1.14) in (B.1.12) yields
LH V =
X
j
λ0 δi j αi j +
i, j
=
X
µ jk i
i, j,k
X
j
λ0 δi j αi j + (κ − 1)
i, j
using the identity
P
j A ji i
αi j X
−
β jk µ jk δ jk
βi j
j,k
X
k µ jk ,
j,k
= κ j . A comparison with (B.1.14) easily yields the upper bound
LH V 6 −γV + C,
(B.1.15)
with γ = (1 − K)βmin , βmin ≡ In fi, j (βi j ) > 0 by assumption, and C =
P
j
i, j λ0 δi j αi j
≡ κ.λ0 .
The following result generalizes the form of Lyapunov functions beyond Equation (B.1.11):
Lemma B.1.1 Under the standing assumptions (B.1.8) and (B.1.10), one can construct a Lyapunov
function of arbitrary high polynomial growth at infinity.
Proof. Let n ∈ N∗ , and V be the function defined in (B.1.14). The function V n satisfies
LH V n (µ) =
X
λ j (V n (µ + ∆ j (µ)) − V n (µ)) − nV n−1 (
j
X
i, j
βi j µi j
∂V
).
∂µi j
Upon factoring V n (µ + ∆ j (µ)) − V n (µ):
n−1
X
V n (µ + ∆ j (µ)) − V n (µ) = (V(µ + ∆ j (µ)) − V(µ))( V n−1−k (µ + ∆ j (µ))V k (µ)),
k=0
174
(B.1.16)
the linearity of V yields the following expression
V n (µ + ∆ j (µ)) − V n (µ) = nV n−1 (µ)(V(µ + ∆ j (µ)) − V(µ)) + M j V(µ),
where M j V(µ) can be bounded by a polynomial function of order n − 1 at infinity in µ. Therefore, one
can rewrite (B.1.16) as follows
LH V n (µ) = (nV n−1 LH V)(µ) + MV(µ),
(B.1.17)
where M(V)(µ) is a polynomial of order n − 1 in µ. Combining (B.1.14) with (B.1.17) shows that V n is
also a Lyapunov function for the Hawkes process.
B.2
Ergodic theory for Markov processes
B.2.1
The ergodic theorem
The Ergodic Theorem for Markov processes states the following:
Theorem B.2.1 ([247][239][63]) Let H be in L1 (Π(dX)). Then,
a.s.
1
lim
t→+∞ t
B.2.2
Z
t
G(Xt )dt =
Z
G(X)Π(dX).
0
Stochastic stability
The main reference here is [? ].
Recall some useful definitions from the theory of Markov chains and stochastic stability. Let (Qt )t≥0 be
the Markov transition probability function of a Markov process at time t, that is
Qt (x, E) := P [X(t) ∈ E|X(0) = x] , t ∈ R+ , x ∈ S, E ⊂ S,
(B.2.1)
where S is the (locally compact, metric) state space. We recall that a (aperiodic, irreducible) Markov
process is ergodic if an invariant probability measure π exists and
lim ||Qt (x, .) − π(.)|| = 0, ∀x ∈ S,
t→∞
(B.2.2)
where ||.|| designates for a signed measure ν the total variation norm defined as
||ν|| := sup |ν( f )| = sup ν(E) − inf ν(E).
f :| f |<1
E∈B(S)
E∈B(S)
(B.2.3)
In (B.2.3), B(S) is the Borel σ-field generated by S, and for a measurable function f on S, ν( f ) :=
R
f dν.
S
V-uniform ergodicity. A Markov process is said V-uniformly ergodic if there exists a coercive func175
tion V > 1, an invariant distribution π, and constants 0 < r < 1, and R < ∞ such that
||Qt (x, .) − π(.)|| ≤ Rrt V(x), x ∈ S, t ∈ R+ .
(B.2.4)
V−uniform ergodicity can be characterized in terms of the infinitesimal generator of the Markov process.
Indeed, it is shown in [247, 246] that it is equivalent to the existence of a coercive function V (the
“Lyapunov test function”) such that
LV(x) ≤ −βV(x) + γ, (Geometric drift condition.)
(B.2.5)
for some positive constants β and γ. (Theorems 6.1 and 7.1 in [246].) Intuitively, condition (B.2.5) says
that the larger V(X(t)) the stronger X is pulled back towards the center of the state space S.
176
Appendix C
Comparison of various prediction methods
C.1
Results for the binary classification
177
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
AUC
Accuracy
0.54
0.54
0.56
0.56
0.61
0.61
0.56
0.56
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.56
0.56
0.53
0.53
0.55
0.55
0.62
0.62
0.55
0.55
0.54
0.54
0.53
0.53
0.54
0.54
0.56
0.56
0.56
0.56
0.63
0.63
0.64
0.64
0.58
0.58
0.54
0.54
0.62
0.62
0.52
0.52
0.56
0.56
0.56
0.56
0.53
0.53
0.60
0.60
0.59
0.59
0.59
0.59
0.58
0.58
0.60
0.60
0.56
0.56
0.57
0.57
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.59
0.59
0.57
0.57
0.56
0.56
0.57
0.57
0.57
0.57
Flow quantity
AUC Accuracy
0.53
0.53
0.53
0.53
0.51
0.51
0.53
0.53
0.52
0.52
0.53
0.53
0.53
0.53
0.53
0.53
0.53
0.53
0.52
0.52
0.53
0.53
0.58
0.58
0.51
0.51
0.53
0.53
0.52
0.52
0.53
0.53
0.52
0.52
0.54
0.54
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.50
0.50
0.51
0.51
0.52
0.52
0.54
0.54
0.53
0.53
0.51
0.51
0.55
0.55
0.52
0.52
0.52
0.52
0.53
0.53
0.55
0.55
0.54
0.54
0.53
0.53
0.53
0.53
0.53
0.53
0.52
0.52
0.53
0.53
0.53
0.53
0.52
0.52
0.53
0.53
0.50
0.50
0.52
0.52
0.53
0.53
0.52
0.52
Past return
AUC Accuracy
0.51
0.51
0.51
0.51
0.53
0.53
0.51
0.51
0.50
0.50
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.53
0.53
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.50
0.50
0.55
0.55
0.56
0.56
0.51
0.51
0.51
0.51
0.54
0.54
0.51
0.51
0.50
0.50
0.51
0.51
0.51
0.51
0.53
0.53
0.50
0.50
0.52
0.52
0.51
0.51
0.52
0.52
0.50
0.50
0.51
0.51
0.51
0.51
0.51
0.51
0.50
0.50
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
Table C.1: The quality of the binary prediction: 1-minute prediction AUC and accuracy per stock
C.2
Results for the four-class classification
178
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
Gain
σ(Gain)
1388
1201
1603
1112
2775
1219
1969
1278
1156
1102
1269
1055
1954
1537
1330
1219
1591
993
1120
1608
1878
1572
4144
1881
2003
1373
1380
1275
1251
1372
1410
1113
1586
1416
1762
1315
3723
1655
2996
1185
2245
1193
1256
956
3977
1764
1195
1763
2031
1227
2220
1433
1511
1564
4019
1911
2481
1452
2445
1220
1895
1107
2367
1109
1978
1173
2694
1451
1323
1348
1717
1535
1368
1040
1225
1022
1612
1359
1108
983
1419
1294
2694
1267
3039
2025
1402
766
2142
1223
2044
1440
Flow quantity
Gain σ(Gain)
1107
1308
996
1005
221
1107
1244
1316
921
1311
1142
1251
1866
1700
1240
1325
958
1143
831
1620
1461
1601
2853
1691
674
1428
1130
1228
905
1405
1252
1211
848
1196
1523
1295
295
1384
321
1161
481
1722
831
977
177
1324
853
1896
934
1389
1626
1514
1493
1491
153
1787
1742
1525
533
1148
791
1485
894
1242
1670
1565
1700
1607
1475
1880
1393
1577
1118
1123
939
1071
1209
1449
967
1196
1014
1275
1156
1341
382
1850
551
860
1114
1391
1165
1397
Past return
Gain σ(Gain)
174
1264
169
936
638
1175
190
1419
2
1185
289
1296
595
1934
347
1394
231
1196
526
1911
600
1665
1496
1542
582
1603
208
1390
310
1672
376
1113
308
1298
12
1281
1219
1307
1109
1201
323
1445
326
950
1210
1577
643
2060
156
1355
566
1403
217
1720
1048
1954
145
1344
613
1267
194
1006
438
1220
182
1251
292
1558
307
1747
383
1684
107
1190
117
1084
455
1607
164
1124
379
1436
290
1194
683
2002
222
949
244
1326
225
1359
Table C.2: The quality of the binary prediction: The daily gain average and standard deviation for the
1-minute prediction (without trading costs)
Notice that the nans on the tables of the Appendix 2 correspond to the cases where |b
Y| is always lower
179
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
Gain
σ(Gain)
-191
1189
81
1112
1141
1063
370
1179
-422
1064
-363
1002
303
1477
-260
1176
-40
963
-402
1596
251
1486
2971
1714
313
1299
-231
1243
-394
1368
-170
1072
50
1407
185
1265
2151
1456
1513
971
583
1096
-362
934
2369
1565
-405
1718
402
1140
762
1332
-186
1519
2333
1715
1110
1375
831
1119
366
1011
816
985
377
1113
1233
1308
-182
1251
205
1431
-279
998
-340
1000
-48
1326
-472
966
-162
1263
1124
1130
1434
1940
-253
730
547
1113
446
1373
Flow quantity
Gain σ(Gain)
-788
1325
-980
1057
-1199
1309
-697
1335
-955
1338
-734
1249
-58
1681
-530
1263
-906
1164
-1022
1618
-492
1606
934
1612
-1064
1488
-748
1235
-959
1423
-656
1224
-949
1225
-389
1296
-1069
1610
-1136
1375
-1108
1887
-1058
1024
-1403
1539
-846
1901
-951
1438
-312
1503
-450
1470
-1081
1822
-195
1535
-1183
1296
-1019
1490
-797
1274
-272
1575
-184
1585
-399
1864
-492
1566
-720
1127
-944
1093
-694
1463
-898
1209
-872
1296
-686
1342
-896
1953
-1246
938
-804
1386
-785
1408
Past return
Gain σ(Gain)
-1222
1531
-1211
1164
-952
1162
-1301
1574
-1298
1558
-1122
1503
-910
2027
-1256
1510
-1246
1369
-1115
1998
-975
1690
-27
1549
-1152
1560
-1206
1529
-1277
1819
-1093
1324
-1128
1516
-1104
1575
-329
1198
-281
1046
-1047
1592
-1278
1206
-484
1490
-968
2002
-1249
1513
-1094
1475
-1186
1890
-517
1820
-1155
1457
-928
1235
-1260
1177
-982
1236
-1255
1490
-1188
1713
-1122
1960
-1064
1822
-1382
1454
-1428
1277
-1060
1655
-1353
1363
-1339
1493
-1044
1257
-738
2067
-1344
1142
-1186
1452
-979
1584
Table C.3: The quality of the binary prediction: The daily gain average and standard deviation for the
1-minute prediction (with trading costs)
than θ thus no positions are taken.
180
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
AUC
Accuracy
0.58
0.59
0.71
0.72
0.69
0.69
0.60
0.60
0.60
0.60
0.53
0.55
0.57
0.57
0.57
0.58
0.60
0.60
0.58
0.59
0.59
0.60
0.70
0.70
0.58
0.60
0.57
0.57
0.55
0.55
0.60
0.60
0.71
0.72
0.60
0.60
0.73
0.73
0.76
0.76
0.64
0.64
0.54
0.59
0.68
0.68
0.55
0.56
0.62
0.62
0.63
0.63
0.55
0.55
0.67
0.67
0.68
0.68
0.65
0.66
0.66
0.66
0.67
0.67
0.61
0.62
0.63
0.63
0.58
0.58
0.57
0.56
0.60
0.60
0.52
0.61
0.56
0.58
0.56
0.61
0.57
0.58
0.68
0.68
0.64
0.65
0.50
0.63
0.63
0.63
0.62
0.62
Flow quantity
AUC Accuracy
0.50
0.42
nan
nan
0.50
0.54
0.50
0.54
nan
nan
0.50
0.59
0.55
0.55
0.55
0.55
nan
nan
0.50
0.50
0.50
0.56
0.64
0.64
nan
nan
0.50
0.51
0.54
0.56
0.55
0.56
nan
nan
0.50
0.55
nan
nan
nan
nan
nan
nan
nan
nan
nan
nan
0.50
0.54
nan
nan
0.56
0.56
0.54
0.55
nan
nan
0.58
0.58
nan
nan
0.55
0.55
nan
nan
0.50
0.51
0.53
0.58
0.53
0.55
0.52
0.51
nan
nan
0.50
0.56
0.54
0.58
0.55
0.56
nan
nan
0.53
0.57
0.50
0.54
nan
nan
nan
nan
0.49
0.49
Past return
AUC Accuracy
0.50
0.50
0.50
0.58
0.61
0.61
0.48
0.48
0.49
0.50
0.50
0.56
0.55
0.56
0.54
0.55
0.58
0.58
0.52
0.53
0.56
0.56
0.55
0.56
0.56
0.56
0.54
0.54
0.52
0.52
0.56
0.56
0.51
0.51
0.52
0.56
0.57
0.60
0.61
0.61
0.53
0.53
0.50
0.46
0.60
0.60
0.52
0.54
0.53
0.53
0.57
0.57
0.52
0.55
0.58
0.58
0.55
0.55
0.58
0.58
0.54
0.54
0.58
0.58
0.52
0.54
0.57
0.57
0.52
0.52
0.58
0.58
0.50
0.60
0.52
0.54
0.54
0.55
0.59
0.59
0.56
0.57
0.56
0.56
0.57
0.57
nan
nan
0.51
0.52
0.52
0.53
Table C.4: The quality of the 4-class prediction: 1-minute prediction AUC and accuracy per stock
181
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
Gain
σ(Gain)
137
388
306
577
1363
779
440
651
87
287
21
128
390
665
107
281
168
366
171
428
486
715
2534
1240
594
786
93
289
34
224
154
451
488
827
351
596
2219
1056
2000
773
651
680
10
93
2520
1420
184
503
504
692
738
951
109
373
2512
1248
1039
914
930
847
370
533
800
674
440
613
1234
1013
192
556
228
501
26
127
50
196
210
519
26
139
123
434
1402
825
1316
1393
16
104
583
826
530
745
Flow quantity
Gain σ(Gain)
-6
98
0
0
4
47
5
63
0
0
14
137
273
669
47
276
0
0
3
66
11
139
1364
1077
0
0
2
24
38
212
13
111
0
0
17
164
0
0
0
0
0
0
0
0
0
0
2
25
0
0
155
512
59
296
0
0
151
587
0
0
26
145
0
0
6
94
142
445
85
380
4
158
0
0
24
187
30
186
31
198
0
0
36
232
17
197
0
0
0
0
-0
78
Past return
Gain σ(Gain)
4
131
3
42
68
276
-2
132
-2
48
14
99
208
582
52
238
4
47
44
453
136
469
202
560
55
320
16
191
12
291
27
147
3
66
10
106
193
503
110
300
10
168
1
38
249
756
56
410
21
171
115
409
7
138
185
731
44
223
64
277
3
50
22
112
11
116
110
555
29
364
168
635
6
90
6
200
88
362
28
162
37
214
34
205
247
835
0
0
5
141
1
215
Table C.5: The quality of the 4-class prediction: The daily gain average and standard deviation for the
1-minute prediction (without trading costs)
C.3
Performances of the OLS method
C.4
Performances of the ridge method
182
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
Gain
σ(Gain)
22
263
128
329
586
559
125
408
15
189
-14
105
107
507
-1
193
21
210
34
271
116
506
1848
1102
174
550
-7
245
-23
204
38
281
241
526
88
388
1338
881
1082
613
185
475
-5
72
1518
1179
2
412
142
464
340
722
-13
329
1514
1096
547
702
372
581
111
322
285
443
113
417
611
809
38
450
20
392
1
69
2
120
25
403
2
74
16
317
656
663
693
1159
1
56
214
617
171
545
Flow quantity
Gain σ(Gain)
-9
150
0
0
-1
16
-0
32
0
0
1
86
31
465
1
199
0
0
-12
126
-8
131
518
844
0
0
-1
14
8
122
-3
73
0
0
-14
157
0
0
0
0
0
0
0
0
0
0
-2
24
0
0
28
331
6
209
0
0
3
400
0
0
-3
62
0
0
-6
96
40
254
-2
299
-31
203
0
0
-4
137
-1
114
-2
89
0
0
6
139
-5
173
0
0
0
0
-7
115
Past return
Gain σ(Gain)
-38
183
1
31
8
194
-25
168
-7
51
-2
77
16
474
-12
184
-4
42
-65
481
18
362
18
442
-23
274
-32
199
-33
311
-5
111
-10
79
-4
91
-18
443
-25
211
-22
173
-3
49
58
636
-41
394
-8
126
18
292
-12
147
-20
658
-10
198
-11
169
-6
46
-5
85
-8
105
27
437
-42
372
49
463
-0
79
-30
207
-7
289
-3
141
-14
195
-7
183
19
628
0
0
-27
175
-45
246
Table C.6: The quality of the 4-class prediction: The daily gain average and standard deviation for the
1-minute prediction (with trading costs)
C.5
Performances of the LASSO method
183
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
Gain
σ(Gain)
-11
887
-57
669
-14
762
-87
862
-20
807
38
759
-16
1263
-25
783
-61
744
-28
998
4
1108
75
962
12
1054
75
872
54
1054
110
761
27
722
29
830
27
991
7
628
-70
911
49
660
18
1011
53
1413
59
906
3
1017
-21
1138
-128
1359
-8
894
-36
831
29
641
-19
671
-24
844
-87
878
32
1132
2
1150
-29
810
4
683
-66
996
127
771
-31
896
-12
759
130
1529
5
543
21
874
71
929
Flow quantity
Gain σ(Gain)
-6
845
-17
633
69
689
43
1075
-3
781
-93
774
-63
1138
-23
923
19
726
-2
1179
-135
1082
-105
1161
6
1055
-51
825
-89
1152
80
742
81
700
43
827
-40
971
-18
651
-4
963
108
689
2
1094
67
1253
-24
847
-73
960
105
1205
-54
1329
-161
912
15
725
-25
688
31
755
24
789
-5
920
61
1217
-60
1072
25
856
-52
709
22
994
-35
802
-79
837
42
918
58
1498
31
546
-15
899
120
994
Past return
Gain σ(Gain)
-41
823
-21
624
-41
729
-29
897
-67
722
-46
765
16
1084
-13
901
-18
745
-9
1151
-52
972
-6
1117
49
1111
9
961
-35
996
100
743
-14
718
41
872
55
959
-16
645
65
826
73
669
11
1085
-5
1335
25
823
51
949
-80
1142
85
1288
17
860
-26
675
-7
727
15
727
-29
841
3
925
46
1140
48
1090
7
794
-15
682
-51
945
-59
725
8
838
111
912
81
1357
-26
508
6
859
75
1055
Table C.7: The quality of the 4-class prediction: The daily gain average and standard deviation for the
30-minute prediction (without trading costs)
184
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
Order book imbalance
Gain
σ(Gain)
-55
887
-96
672
-57
764
-132
863
-61
809
-7
758
-58
1265
-65
784
-101
743
-71
997
-39
1110
31
960
-31
1052
36
874
9
1053
72
763
-12
722
-9
830
-17
993
-36
627
-106
911
10
661
-26
1011
10
1415
14
905
-40
1016
-63
1137
-172
1359
-48
896
-82
830
-13
641
-57
674
-65
845
-128
877
-7
1130
-37
1149
-67
810
-37
683
-105
997
84
772
-73
896
-49
760
84
1529
-39
543
-24
874
33
929
Flow quantity
Gain σ(Gain)
-51
845
-61
635
23
687
-6
1072
-47
780
-136
777
-108
1137
-69
923
-25
726
-46
1180
-182
1085
-152
1163
-37
1054
-93
825
-138
1151
40
742
36
702
-1
828
-81
974
-58
652
-45
965
66
690
-44
1096
19
1252
-70
847
-117
962
58
1207
-97
1327
-204
913
-30
725
-66
691
-9
754
-23
788
-52
920
15
1218
-103
1073
-21
856
-100
709
-23
995
-77
805
-123
836
-4
919
15
1499
-14
545
-61
900
76
995
Past return
Gain σ(Gain)
-84
824
-62
625
-80
731
-73
896
-108
724
-84
767
-25
1082
-53
902
-60
742
-51
1149
-94
972
-48
1116
4
1109
-31
961
-77
997
65
743
-53
720
-2
869
17
959
-57
642
22
824
34
666
-32
1087
-51
1336
-16
818
5
947
-122
1144
47
1290
-22
859
-68
675
-49
728
-22
728
-71
839
-41
920
5
1140
6
1089
-34
797
-60
680
-93
946
-98
725
-34
838
68
913
40
1359
-67
509
-37
856
38
1058
Table C.8: The quality of the binary prediction: The daily gain average and standard deviation for the
30-minute prediction (with 0.5 bp trading costs)
185
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
1-min horizon
AUC Accuracy
0.54
0.54
0.57
0.57
0.61
0.61
0.55
0.55
0.54
0.54
0.54
0.54
0.54
0.54
0.55
0.55
0.56
0.56
0.53
0.53
0.55
0.55
0.62
0.62
0.55
0.55
0.54
0.54
0.53
0.53
0.55
0.55
0.56
0.56
0.56
0.56
0.62
0.62
0.64
0.64
0.57
0.57
0.54
0.54
0.61
0.61
0.53
0.53
0.56
0.56
0.57
0.57
0.53
0.53
0.59
0.59
0.59
0.59
0.59
0.59
0.58
0.58
0.60
0.60
0.56
0.56
0.57
0.57
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.54
0.59
0.59
0.56
0.56
0.56
0.56
0.57
0.57
0.56
0.56
5-min horizon
AUC Accuracy
0.50
0.50
0.52
0.52
0.53
0.53
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.56
0.56
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.52
0.52
0.51
0.51
0.53
0.53
0.54
0.54
0.52
0.52
0.51
0.51
0.54
0.54
0.50
0.50
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.53
0.53
0.52
0.52
0.53
0.53
0.52
0.52
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
30-min horizon
AUC Accuracy
0.50
0.50
0.49
0.49
0.50
0.50
0.51
0.51
0.50
0.50
0.51
0.51
0.49
0.49
0.49
0.49
0.49
0.49
0.50
0.50
0.52
0.52
0.52
0.52
0.50
0.50
0.50
0.50
0.51
0.51
0.51
0.51
0.50
0.51
0.51
0.51
0.48
0.48
0.50
0.50
0.48
0.48
0.50
0.50
0.50
0.50
0.52
0.52
0.50
0.50
0.51
0.51
0.49
0.49
0.50
0.50
0.52
0.52
0.52
0.52
0.50
0.50
0.51
0.51
0.50
0.50
0.51
0.51
0.49
0.49
0.49
0.49
0.49
0.49
0.51
0.51
0.52
0.52
0.50
0.50
0.51
0.51
0.50
0.50
0.49
0.49
0.50
0.50
0.51
0.51
0.51
0.51
Table C.9: The quality of the OLS prediction: The AUC and the accuracy per stock for the different
horizons
186
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
1-min horizon
AUC Accuracy
0.54
0.54
0.57
0.57
0.61
0.61
0.55
0.55
0.54
0.54
0.54
0.54
0.54
0.54
0.55
0.55
0.56
0.56
0.53
0.53
0.55
0.55
0.62
0.62
0.56
0.55
0.54
0.54
0.53
0.53
0.55
0.55
0.56
0.56
0.56
0.56
0.62
0.62
0.65
0.65
0.57
0.57
0.54
0.54
0.62
0.62
0.53
0.53
0.57
0.57
0.57
0.57
0.53
0.53
0.60
0.60
0.59
0.59
0.59
0.59
0.59
0.59
0.60
0.60
0.56
0.56
0.58
0.58
0.54
0.54
0.54
0.54
0.54
0.54
0.55
0.55
0.54
0.54
0.54
0.54
0.55
0.55
0.59
0.59
0.57
0.57
0.56
0.56
0.57
0.57
0.57
0.57
5-min horizon
AUC Accuracy
0.50
0.50
0.52
0.52
0.53
0.53
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.56
0.56
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.52
0.51
0.51
0.53
0.53
0.54
0.54
0.52
0.52
0.51
0.51
0.54
0.54
0.50
0.50
0.52
0.52
0.53
0.53
0.51
0.50
0.52
0.52
0.53
0.53
0.52
0.52
0.53
0.53
0.52
0.52
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
30-min horizon
AUC Accuracy
0.50
0.50
0.50
0.50
0.49
0.49
0.51
0.51
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.52
0.52
0.52
0.52
0.50
0.50
0.50
0.50
0.51
0.51
0.51
0.51
0.51
0.52
0.51
0.51
0.48
0.48
0.50
0.50
0.48
0.48
0.50
0.50
0.51
0.51
0.53
0.52
0.50
0.50
0.52
0.52
0.50
0.50
0.50
0.50
0.52
0.52
0.50
0.50
0.50
0.50
0.51
0.51
0.49
0.49
0.52
0.52
0.50
0.50
0.50
0.50
0.51
0.51
0.51
0.51
0.52
0.52
0.51
0.51
0.50
0.50
0.51
0.51
0.49
0.49
0.49
0.49
0.51
0.51
0.51
0.51
Table C.10: The quality of the Ridge HKB prediction: The AUC and the accuracy per stock for the
different horizons
187
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
1-min horizon
AUC Accuracy
0.55
0.55
0.57
0.57
0.61
0.61
0.56
0.56
0.54
0.54
0.55
0.55
0.54
0.54
0.55
0.55
0.56
0.56
0.54
0.54
0.55
0.55
0.62
0.62
0.56
0.56
0.54
0.54
0.53
0.53
0.56
0.56
0.57
0.57
0.56
0.56
0.63
0.63
0.65
0.65
0.58
0.58
0.54
0.54
0.62
0.62
0.53
0.53
0.57
0.57
0.57
0.57
0.53
0.53
0.60
0.60
0.60
0.60
0.59
0.59
0.59
0.59
0.60
0.60
0.57
0.57
0.58
0.58
0.55
0.55
0.54
0.54
0.55
0.55
0.55
0.55
0.55
0.55
0.55
0.55
0.55
0.55
0.60
0.60
0.57
0.57
0.57
0.57
0.58
0.58
0.57
0.57
5-min horizon
AUC Accuracy
0.52
0.52
0.53
0.53
0.54
0.54
0.52
0.52
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.51
0.51
0.57
0.57
0.51
0.51
0.52
0.52
0.51
0.51
0.52
0.53
0.52
0.52
0.51
0.51
0.54
0.54
0.55
0.55
0.52
0.52
0.52
0.52
0.55
0.55
0.50
0.50
0.52
0.52
0.53
0.53
0.51
0.51
0.53
0.53
0.54
0.54
0.52
0.52
0.54
0.54
0.53
0.53
0.52
0.52
0.53
0.53
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.52
0.52
0.53
0.53
0.52
0.52
0.51
0.51
0.52
0.52
0.52
0.52
30-min horizon
AUC Accuracy
0.50
0.50
0.49
0.49
0.50
0.50
0.52
0.52
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.49
0.49
0.50
0.50
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.52
0.52
0.51
0.52
0.52
0.52
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.51
0.51
0.51
0.49
0.49
0.52
0.52
0.50
0.50
0.52
0.52
0.50
0.50
0.48
0.48
0.51
0.51
0.50
0.50
0.50
0.50
0.51
0.51
0.50
0.50
0.51
0.51
0.49
0.49
0.51
0.51
0.50
0.50
0.51
0.51
0.52
0.52
0.51
0.52
0.50
0.50
0.51
0.51
0.49
0.49
0.51
0.51
0.51
0.51
0.50
0.50
Table C.11: The quality of the Ridge LW prediction: The AUC and the accuracy per stock for the
different horizons
188
Stock
INTERBREW
AIR LIQUIDE
ALLIANZ
ASML Holding NV
BASF AG
BAYER AG
BBVARGENTARIA
BAY MOT WERKE
DANONE
BNP PARIBAS
CARREFOUR
CRH PLC IRLANDE
AXA
DAIMLER CHRYSLER
DEUTSCHE BANK AG
VINCI
DEUTSCHE TELEKOM
ESSILOR INTERNATIONAL
ENEL
ENI
E.ON AG
TOTAL
GENERALI ASSIC
SOCIETE GENERALE
GDF SUEZ
IBERDROLA I
ING
INTESABCI
INDITEX
LVMH
MUNICH RE
LOREAL
PHILIPS ELECTR.
REPSOL
RWE ST
BANCO SAN CENTRAL HISPANO
SANOFI
SAP AG
SAINT GOBAIN
SIEMENS AG
SCHNEIDER ELECTRIC SA
TELEFONICA
UNICREDIT SPA
UNILEVER CERT
VIVENDI UNIVERSAL
VOLKSWAGEN
1-min horizon
AUC Accuracy
0.54
0.54
0.58
0.58
0.61
0.61
0.56
0.56
0.53
0.53
0.54
0.54
0.54
0.54
0.55
0.55
0.56
0.56
0.54
0.54
0.55
0.55
0.62
0.62
0.55
0.55
0.53
0.53
0.53
0.53
0.55
0.55
0.58
0.58
0.56
0.56
0.62
0.62
0.64
0.64
0.57
0.57
0.54
0.54
0.62
0.62
0.53
0.53
0.56
0.56
0.56
0.56
0.52
0.52
0.60
0.60
0.59
0.59
0.59
0.59
0.58
0.58
0.60
0.60
0.56
0.56
0.57
0.57
0.54
0.54
0.54
0.54
0.54
0.54
0.53
0.53
0.54
0.54
0.54
0.54
0.54
0.54
0.59
0.59
0.57
0.57
0.57
0.57
0.57
0.57
0.56
0.56
5-min horizon
AUC Accuracy
0.51
0.51
0.52
0.52
0.54
0.54
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.51
0.56
0.56
0.51
0.51
0.52
0.52
0.51
0.51
0.52
0.53
0.52
0.52
0.51
0.51
0.53
0.53
0.55
0.55
0.52
0.52
0.52
0.52
0.54
0.54
0.50
0.50
0.52
0.52
0.53
0.53
0.51
0.51
0.53
0.53
0.53
0.53
0.52
0.52
0.54
0.54
0.53
0.53
0.52
0.52
0.52
0.52
0.51
0.51
0.52
0.52
0.51
0.51
0.52
0.52
0.51
0.51
0.51
0.51
0.51
0.51
0.53
0.53
0.52
0.52
0.51
0.51
0.52
0.52
0.52
0.52
30-min horizon
AUC Accuracy
0.50
0.50
0.49
0.49
0.52
0.52
0.51
0.51
0.51
0.51
0.50
0.50
0.49
0.49
0.49
0.49
0.50
0.50
0.49
0.49
0.50
0.50
0.52
0.52
0.49
0.49
0.50
0.50
0.51
0.51
0.52
0.52
0.52
0.52
0.50
0.50
0.50
0.50
0.49
0.49
0.49
0.50
0.51
0.51
0.51
0.51
0.52
0.52
0.51
0.51
0.53
0.53
0.50
0.50
0.50
0.50
0.52
0.52
0.51
0.51
0.50
0.50
0.50
0.50
0.50
0.50
0.51
0.51
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.50
0.52
0.52
0.50
0.50
0.49
0.49
0.51
0.51
0.48
0.48
0.51
0.51
0.52
0.52
0.49
0.49
Table C.12: The quality of the LASSO prediction: The AUC and the accuracy per stock for the different
horizons
189
190
Bibliography
[1] Einstein A. Über die von der molekularkinetischen theorie der wärme geforderte bewegung von
in ruhenden flüssigkeiten suspendierten teilchen. Ann. d. Physik, 17:549–560, 1905.
[2] Einstein A. Zur theorie der brownschen bewegung. Ann. d. Physik, 19:371–381, 1906.
[3] F. Abergel, J.P. Bouchaud, T. Foucault, C.A. Lehalle, and M. Rosenbaum, editors. Market Microstructure: Confronting Many Viewpoints. Wiley, 2012.
[4] F. Abergel, B.K. Chakrabarti, Chakraborti, and A. Gosh, editors. Econophysics of systemic risk
and network dynamics. Springer, New economic windows, 2012.
[5] F. Abergel, B.K. Chakrabarti, A. Chakraborti, and M. Mitra, editors. Econophysics of orderdriven markets. Springer, 2011.
[6] F. Abergel, N. Huth, and I. Muni Toke. Financial bubbles analysis with a cross-sectional estimator.
Available at SSRN 1474612, 2009.
[7] F. Abergel and A. Jedidi. A mathematical approach to order book modelling. International
Journal of Theoretical and Applied Finance, 16(5), 2013.
[8] F. Abergel and A. Jedidi. Long time behaviour of a Hawkes process-based limit order book. 2015.
[9] F. Abergel, C.-A. Lehalle, and M. Rosenbaum. Understanding the stakes of high-frequency trading. Journal of Trading, 9(4):49–73, 2014.
[10] F. Abergel and M. Politi. Optimizing a basket against the efficient market hypothesis. Quantitative
Finance, 13(1):13–23, 2013.
[11] F. Abergel and R. Zaatour. What drives option prices? The Journal of Trading, 7(3):12–28, 2012.
[12] L. Adamopoulos. Cluster models for earthquakes: Regional comparisons. Journal of the International Association for Mathematical Geology, 8:463–475, 1976.
[13] Y. Aït-Sahalia, J. Cacho-Diaz, and R.J.A. Laeven. Modeling financial contagion using mutually
exciting jump processes. Working Paper 15850, National Bureau of Economic Research, mar
2010.
[14] Y. Aït-Sahalia and J. Jacod. Estimating the degree of activity of jumps in high frequency data.
The Annals of Statistics, 37(5A):2202–2244, 2009.
191
[15] Y. Aït-Sahalia and A. Mykland. The effects of random and discrete sampling when estimating
continuous-time diffusions. Technical Report 0276, National Bureau of Economic Research, Inc,
apr 2002.
[16] M. Aitken, A. Kua, P. Brown, T. Watter, and H.Y. Izan. An intraday analysis of the probability of
trading on the ASX at the asking price. Australian Journal of Management, 20(2):115–154, dec
1995.
[17] M. Al-Suhaibani and L. Kryzanowski. An exploratory analysis of the order book, and order
flow and execution on the saudi stock market. Journal of Banking & Finance, 24(8):1323–1357,
August 2000.
[18] V. Alfi, M. Cristelli, L. Pietronero, and A. Zaccaria. Minimal agent based model for financial
markets i - origin and self-organization of stylized facts. The European Physical Journal B,
67(3):13 pages, 2009.
[19] V. Alfi, M. Cristelli, L. Pietronero, and A. Zaccaria. Minimal agent based model for financial markets II. The European Physical Journal B - Condensed Matter and Complex Systems, 67(3):399–
417, February 2009.
[20] A. Alfonsi, A. Fruth, and A. Schied. Optimal execution strategies in limit order books with
general shape functions. Quantitative Finance, 10(2):143–157, 2010.
[21] R. Almgren, C. Thum, E. Hauptmann, and H. Li. Direct estimation of equity market impact.
working paper.
[22] M. Anane and F. Abergel. Empirical evidence of market inefficiency : Predicting single-stock
returns. Proceedings of the 8th Kolkata econophysics conference, Springer, 2015.
[23] M. Anane, F. Abergel, and X. F. Lu. Mathematical modeling of the order book: New approach of
predicting single-stock returns. 2015.
[24] S. Asmussen. Applied probability and queues, volume 51 of Applications of Mathematics (New
York). Springer-Verlag, New York, second edition, 2003.
[25] L. Bachelier. Théorie de la spéculation. Annales scientifiques de l’École Normale Supérieure,
Sér.3, 3:21–86, 1900.
[26] E. Bacry, K. Dayri, and J.F. Muzy. Non-parametric kernel estimation for symmetric hawkes
processes. application to high frequency financial data. European Physics Journal (B), 85, 2012.
[27] E. Bacry, S. Delattre, M. Hoffmann, and J. F. Muzy. Modelling microstructure noise with mutually
exciting point processes. Quantitative Finance, 13(1):65–77, 2013.
[28] E. Bacry, S. Delattre, M. Hoffmann, and J.F. Muzy. Scaling limits for Hawkes processes and application to financial statistics. Stochastic Processes and Applications, 123(7):2475–2499, 2013.
192
[29] P. Bak, M. Paczuski, and M. Shubik. Price variations in a stock market with many agents. Physica
A: Statistical and Theoretical Physics, 246(3-4):430 – 453, 1997.
[30] O. E. Barndorff-Nielsen and N. Shephard. Advances in Economics and Econometrics: Theory
and Applications, Ninth World Congress, chapter Variation, jumps, market frictions and high frequency data in financial econometrics. Econometric Society Monograph. Cambridge University
Press, 2007.
[31] O.E. Barndorff-Nielsen and N. Shephard. Econometric of testing for jumps in financial economics
using bipower variation. Oxford Financial Research Centre, 2004.
[32] M.S. Bartlett. The spectral analysis of point processes. Journal of the Royal Statistical Society,
(25):264–296, 1963.
[33] L. Bauwens and N. Hautsch. Dynamic latent factor models for intensity processes. CORE Discussion Papers 2003103, Université catholique de Louvain, Center for Operations Research and
Econometrics (CORE), 2003.
[34] L. Bauwens and N. Hautsch. Modelling financial high frequency data using point processes. In
Handbook of Financial Time Series, pages 953–979. 2009.
[35] C. Bayer, U. Horst, and J. Qiu. A functional limit theorem for limit order books. 2014.
[36] E. Bayraktar, U. Horst, and R. Sircar. Queueing theoretic approaches to financial price fluctuations. In Handbook of Financial Engineering. Elsevier, 2007.
[37] H. Beltran-Lopez, P. Giot, and J. Grammig. Commonalities in the order book. Financial Markets
and Portfolio Management, 23(3):209–242, 2009.
[38] N. Bershova and D. Rakhlin. The non-linear market impact of large trades: Evidence from buyside order flow. http://dx.doi.org/10.2139/ssrn.2197534, 2013.
[39] B. Biais, T. Foucault, and P. Hillion. Microstructure des marches financiers : Institutions, modeles
et tests empiriques. Presses Universitaires de France - PUF, May 1997.
[40] B. Biais, P. Hillion, and C. Spatt. An Empirical Analysis of the Limit Order Book and the Order
Flow in the Paris Bourse. The Journal of Finance, 50:1655–1689, 1995.
[41] B. Biais, P. Hillion, and C. Spatt. An empirical analysis of the limit order book and the order flow
in the paris bourse. Journal of Finance, 50(5):1655–1689, 1995.
[42] P. Billingsley. Convergence of Probability Measures. Wiley, 2nd ed., 1999.
[43] M Blatt, S. Wiseman, and E. Domany. Superparamagnetic clustering of data. Physical Review
Letters, 76(18):3251, April 1996.
[44] A. Blazejewski and R. Coggins. A local non-parametric model for trade sign inference. Physica
A: Statistical Mechanics and its Applications, 348(1):481–495, 2005.
193
[45] T. Bollerslev, R. F. Engle, and D. B. Nelson. Chapter 49 arch models. In R. F. Engle and D. L.
McFadden, editors, Handbook of Econometrics 4, pages 2959 – 3038. Elsevier, 1994.
[46] G. Bonanno, F. Lillo, and R. N. Mantegna. High-frequency cross-correlation in a set of stocks.
Quantitative Finance, 1(1):96, 2001.
[47] J.-P. Bouchaud, J. D. Farmer, and F. Lillo. How markets slowly digest changes in supply and
demand. In T. Hens and K. R. Schenk-Hoppé, editors, Handbook of Financial Markets: Dynamics
and Evolution. Elsevier, 2009.
[48] J.-P. Bouchaud, Y. Gefen, M. Potters, and M. Wyart. Fluctuations and response in financial
markets: The subtle nature of random price changes. Quantitative Finance, 4(2):176–190, 2004.
[49] J. P. Bouchaud, M. Mézard, and M. Potters. Statistical properties of stock order books: empirical
results and models. Quantitative Finance, 2:251–256, 2002.
[50] J.-P. Bouchaud, M. Mézard, and M. Potters. Statistical properties of stock order books: empirical
results and models. Quantitative Finance, 2:251–256, 2002.
[51] J.-P. Bouchaud and M. Potters. Theory of Financial Risks: From Statistical Physics to Risk
Management. Cambridge University Press, 1 edition, January 2000.
[52] J.P. Bouchaud. Economics needs a scientific revolution. Nature, 455(7217):1181, 2008.
[53] J.P. Bouchaud and M. Potters. Theory of Financial Risk and Derivative Pricing: From Statistical
Physics to Risk Management. Cambridge University Press, 2 edition, feb 2004.
[54] C. Bowsher. Modelling security market events in continuous time: Intensity based, multivariate
point process models. Nuffield College Economics Discussion Papers, nov 2002.
[55] C. Bowsher. Modelling security market events in continuous time: Intensity based, multivariate
point process models. Journal of Econometrics, 141:876–912, dec 2007.
[56] C. G. Bowsher. Modelling security market events in continuous time: Intensity based, multivariate
point process models. Journal of Econometrics, 141(2):876–912, December 2007.
[57] P. Brémaud and L. Massoulié. Stability of nonlinear Hawkes processes. Ann. Probab., 24:1563–
1588, 1996.
[58] P. Brémaud, L. Massoulié, and A. Ridolfi. Power spectra of random spike fields and related
processes. Adv. in Appl. Probab., 37:1116–1146, 2005.
[59] P. Brémaud. Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, 1999.
[60] William A. Brock and Cars H. Hommes. Heterogeneous beliefs and routes to chaos in a simple
asset pricing model. Journal of Economic Dynamics and Control, 22(8-9):1235–1274, August
1998.
194
[61] W. Buffett. The superinvestors of graham-and-doddsville. 1984.
[62] C. Cao, O. Hansch, and X. Wang Beardsley. The informational content of an open limit order
book. EFA 2004 Maastricht Meetings Paper, mar 2004.
[63] P. Cattiaux, D. Chafaï, and A. Guillin. Central limit theorems for additive functionals of ergodic
markov diffusions processes. Lat. Am. J. Probab. Math. Stat, 9(2):337–382, 2012.
[64] Anindya S. Chakrabarti and Bikas K. Chakrabarti. Microeconomics of the ideal gas like market
models. Physica A: Statistical Mechanics and its Applications, 388(19):4151 – 4158, 2009.
[65] Anindya S. Chakrabarti and Bikas K. Chakrabarti. Statistical theories of income and wealth
distribution. Economics E-Journal (open access), 4, 2010.
[66] Anindya Sundar Chakrabarti, Bikas K. Chakrabarti, Arnab Chatterjee, and Manipushpak Mitra.
The kolkata paise restaurant problem and resource utilization. Physica A: Statistical Mechanics
and its Applications, 388(12):2420–2426, June 2009.
[67] A. Chakraborti, I. Muni Toke, M. Patriarca, and F. Abergel. Econophysics review: I. empirical
facts. Quantitative Finance, 11(7):991–1012, 2011.
[68] A. Chakraborti, I. Muni Toke, M. Patriarca, and F. Abergel. Econophysics review: II. agent-based
models. Quantitative Finance, 11(7):1013–1041, 2011.
[69] A. Chakraborti, M. Patriarca, and M. S. Santhanam. Econophysics of Markets and Business
Networks, chapter Financial Time-series Analysis: a Brief Overview. Springer, Milan, 2007.
[70] Anirban Chakraborti, Ioane M. Toke, Marco Patriarca, and Frederic Abergel. Econophysics:
Empirical facts and agent-based models. Sep 2009. http://arxiv.org/abs/0909.1974.
[71] D. Challet and D. Morton de la Chapelle. Collective portfolio optimization in brokerage data: The
role of transaction cost structure. Market Microstructure: Confronting Many Viewpoints, 2012.
[72] D. Challet and D. Morton de la Chapelle. A robust measure of investor contrarian behaviour.
Econophysics of Systemic Risk and Network Dynamics, 2013.
[73] D. Challet and R. Stinchcombe. Analyzing and modeling 1+ 1d markets. Physica A: Statistical
Mechanics and its Applications, 300(1-2):285–299, 2001.
[74] D. Challet and R. Stinchcombe. Analyzing and modeling 1+1d markets. Physica A, 300:285–299,
2001.
[75] D. Challet and R. Stinchcombe. Non-constant rates and over-diffusive prices in a simple model
of limit order markets. Quantitative Finance, 3:155–162, 2003.
[76] Damien Challet, Matteo Marsili, and Yi-Cheng Zhang. Minority Games. Oxford University Press,
illustrated edition edition, November 2004.
195
[77] Damien Challet and Robin Stinchcombe. Analyzing and modeling 1+1d markets. Physica A:
Statistical Mechanics and its Applications, 300(1–2):285–299, November 2001.
[78] Ernest P. Chan. Quantitative Trading: How to Build Your Own Algorithmic Trading Business
(Wiley Trading). November 2008.
[79] M.L. Chaudhry and J.G.C. Templeton. A first course in bulk queues. John Wiley and Sons, 1983.
[80] S.S. Chen, D.L. Donoho, and M.A. Saunders. Atomic decomposition by basis pursuit. SIAM
Journal on Scientific Computing, 20(1):33–61, 1998.
[81] C. Chiarella and G. Iori. A simulation analysis of the microstructure of double auction markets.
Quantitative Finance, 2(5):346–353, 2002.
[82] Carl Chiarella, Xue-Zhong He, and Cars Hommes. A dynamic analysis of moving average rules.
Journal of Economic Dynamics and Control, 30(9-10):1729–1753, September 2006.
[83] E.S. Chornoboy, L.P. Schramm, and A.F. Karr. Maximum likelihood identification of neural point
process systems. Biol Cybern, 59:265–275, 1988.
[84] J. Claerbout and F. Muir. Robust modeling with erratic data. Geophysics, 38(5):826–844, 1973.
[85] P. K. Clark. A subordinated stochastic process model with finite variance for speculative prices.
Econometrica, 41(1):135–55, January 1973.
[86] D. Cliff and J. Bruten. Zero is not enough: On the lower limit of agent intelligence for continuous
double auction markets. Technical Report HPL-97-141, Hewlett-Packard Laboratories, Bristol,
UK, 1997.
[87] K.J. Cohen, S.F. Maier, R.A. Schwartz, and D.K. Whitcomb. A simulation model of stock exchange trading. Simulation, 41(5):181, 1983.
[88] M.H. Cohen. Report of the special study of the securities markets of the securities and exchanges
commission. Technical report, 1963. U.S. Congress, 88th Cong., 1st sess., H. Document 95.
[89] Milton H. Cohen. Reflections on the special study of the securities markets. Technical report,
May 1963. Speech at the Practising Law Intitute, New York.
[90] R. Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative
Finance, 1(2):223–236, 2001.
[91] R. Cont. Volatility clustering in financial markets: Empirical facts and Agent-Based models. In
Long Memory in Economics, pages 289–309. 2007.
[92] R. Cont and J.-P. Bouchaud. Herd behavior and aggregate fluctuations in financial markets.
Macroeconomic Dynamics, 4(02):170–196, 2000.
[93] R. Cont and A. de Larrard. Price dynamics in a markovian limit order book market. jan 2011.
196
[94] R. Cont and A. de Larrard. Price dynamics in a markovian limit order market. arXiv:1104.4596,
April 2011.
[95] R. Cont and A. de Larrard. Order book dynamics in liquid markets: limit theorems and diffusion
approximations. Preprint, 2012.
[96] R. Cont and A. de Larrard. Price dynamics in a markovian limit order book market. SIAM Journal
on Financial Mathematics, 2013.
[97] R. Cont, S. Stoikov, and R. Talreja. A stochastic model for order book dynamics. Operations
Research, 58:549–563, 2010.
[98] R. Cont and P. Tankov. Financial modelling with jump processes. Chapman & Hall/CRC, 2004.
[99] A. C. C. Coolen. The Mathematical Theory Of Minority Games: Statistical Mechanics Of Interacting Agents. Oxford University Press, illustrated edition edition, January 2005.
[100] D. R. Cox and V. Isham. An Introduction to the Theory of Point Processes. Chapman & Hall,
1980.
[101] J. Da Fonseca and R. Zaatour. Clustering and mean reversion in a Hawkes microstructure model.
Journal of Futures Markets, 2014.
[102] J. Da Fonseca and R. Zaatour. Hawkes process: Fast calibration, application to trade clustering,
and diffusive limit. Journal of Futures Markets, 34:548–579, 2014.
[103] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer, 2003.
[104] D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes. Vol. I. Probability
and its Applications (New York). Springer-Verlag, New York, second edition, 2003. Elementary
theory and methods.
[105] D. J. Daley and D. Vere-Jones. Scoring probability forecasts for point processes: the entropy
score and information gain. J. Appl. Probab., 41A:297–312, 2004. Stochastic methods and their
applications.
[106] D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes. Vol. II. Probability
and its Applications (New York). Springer, New York, second edition, 2008. General theory and
structure.
[107] M. G. Daniels, J. D. Farmer, L. Gillemot, G. Iori, and E. Smith. Quantitative model of price
diffusion and market friction based on trading as a mechanistic random process. Physical Review
Letters, 90:549–563, 2003.
[108] L. de Haan, S. Resnick, and H. Drees. How to make a hill plot. The Annals of Statistics,
28(1):254–274, 2000.
197
[109] I.M. Dremin and A.V. Leonidov. On distribution of number of trades in different time windows
in the stock market. Physica A: Statistical Mechanics and its Applications, 353:388 – 402, 2005.
[110] D. Duffie. Dynamic asset pricing theory. Princeton University Press Princeton, NJ, 1996.
[111] M. Duflo. Méthodes récursives aléatoires. Masson, 1990.
[112] A. Dufour and R.F. Engle. Time and the price impact of a trade. JOURNAL OF FINANCE,
55:2467–2498, 2000.
[113] Z. Eisler, J.-P. Bouchaud, and J. Kockelkoren. Models for the impact of all order book events.
Market Microstructure: Confronting Many Viewpoints, pages 115–136, 2012.
[114] Z. Eisler, J. Kertesz, F. Lillo, and R. N. Mantegna. Diffusive behavior and the modeling of
characteristic times in limit order executions. Quantitative Finance, 9(5):547–563, 2009.
[115] P. Embrechts, T. Liniger, and L. Lin. Multivariate hawkes processes : an application to financial
data. Journal of Applied Probability, 48A:367–378, 2011.
[116] R. F. Engle. The econometrics of Ultra-High-Frequency data. Econometrica, 68(1):1–22, 2000.
[117] R. F. Engle and J. R. Russell. Forecasting the frequency of changes in quoted foreign exchange
prices with the autoregressive conditional duration model. Journal of Empirical Finance, 4(23):187–212, June 1997.
[118] R.F. Engle and A. Lunde. Trades and quotes: A bivariate point process. Journal of Financial
Econometrics, 1:159–188, 1998.
[119] R.F. Engle and J.R. Russell. Forecasting the frequency of changes in quoted foreign exchange
prices with the autoregressive conditional duration model. Journal of Empirical Finance, 4(23):187–212, 1997.
[120] R.F. Engle and J.R. Russell. Autoregressive conditional duration: A new model for irregularly
spaced transaction data. Econometrica, 66(5):1127–1162, sep 1998.
[121] T. W. Epps. Comovements in stock prices in the very short-run. Journal of the American Statistical Association, 74, 1979.
[122] Thomas W. Epps. Comovements in stock prices in the very short run. Journal of the American
Statistical Association, 74(366):291–298, June 1979.
[123] P. Erdos and A. Renyi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci,
5:17–61, 1960.
[124] E. Errais, K. Giesecke, and L. Goldberg. Affine point processes and portfolio credit risk. SIAM
Journal on Financial Mathematics, 2010.
[125] S. N. Ethier and T. G. Kurtz. Markov processes characterization and convergence. Wiley, 2005.
198
[126] E. F. Fama. Efficient capital markets : A review of theory and empirical work. Journal of finance,
1969.
[127] F. Lillo J. D. Farmer. The long memory of the efficient market. 2008.
[128] J. D. Farmer and D. Foley. The economy needs agent-based modelling. Nature, 460(7256):685–
686, 2009.
[129] J. D. Farmer, A. Gerig, F. Lillo, and S. Mike. Market efficiency and the long-memory of supply
and demand: is price impact variable and permanent or fixed and temporary?
Quantitative
Finance, 6(2):107–112, 2006.
[130] J. D. Farmer, A. Gerig, F. Lillo, and H. Waelbroeck. How efficiency shapes market impact. 2013.
[131] J. D. Farmer, L. Gillemot, F. Lillo, S. Mike, and A. Sen. What really causes large price changes?
Quantitative finance, 4(4):383–397, 2004.
[132] J. D. Farmer, L. Gillemot, F. Lillo, S. Mike, and A. Sen. What really causes large price changes?
Technical report, apr 2004.
[133] J. D. Farmer, P. Patelli, and I. I. Zovko. The predictive power of zero intelligence in financial
markets. Quantitative Finance Papers. cond-mat/0309233, arXiv.org, September 2003.
[134] J. D. Farmer, P. Patelli, and I. I. Zovko. The predictive power of zero intelligence in financial
markets. Proceedings of the National Academy of Sciences, 102(6):2254–2259, 2005.
[135] J.D. Farmer, A. Gerig, F. Lillo, and S. Mike. Market efficiency and the long-memory of supply and
demand: Is price impact variable and permanent or fixed and temporary? Quantitative Finance,
6(2):107–112, 2006.
[136] J.D. Farmer, P. Patelli, and I. Zovko. The predictive power of zero-intelligence in financial markets. PNAS, 102(1):2254–2259, 2005.
[137] W. Feller. Introduction to the Theory of Probability and its Applications, Vol. 2. Wiley, New York,
1968.
[138] A. Ferraris. Equity market impact models. 2008.
[139] T. Fletcher, Z. Hussain, and J. Shawe-Taylor. Multiple kernel learning on the limit order book.
Quantitative Finance, preprint, 2010.
[140] L. Foata, M. Vidhamali, and F. Abergel. Multi-agent order book simulation: Mono-and multiasset high-frequency market making strategies. Econophysics of Order-driven Markets, pages
139–152, 2011.
[141] Hans Follmer and Alexander Schied. Stochastic Finance: An Introduction In Discrete Time 2.
Walter de Gruyter & Co, 2nd revised edition edition, November 2004.
199
[142] T. Foucault, O. Kadan, and E. Kandel. Limit order book as a market for liquidity. Review of
Financial Studies, 18(4):1171–1217, December 2005.
[143] T. Foucault and A.J. Menkveld. Competition for order flow and smart order routing systems.
Journal of Finance, 63:119–158, 2008.
[144] J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models
via coordinate descent. Journal of Statistical Software, 33(1):1–22, feb 2010.
[145] G.M. Furnival and R.W. Wilson. Regression by leaps and bounds. Technometrics, 16:499–511,
1974.
[146] X. Gabaix, P. Gopikrishnan, V. Plerou, and H. E. Stanley. Institutional investors and stock market
volatility. Quarterly Journal of Economics, 121(2):461–504, May 2006.
[147] Xavier Gabaix. Power laws in economics and finance. Annual Review of Economics, 1(1):255–
294, 2009.
[148] M. Gallegati and A. P. Kirman, editors. Beyond the Representative Agent. Edward Elgar Publishing, 1st edition, November 1999.
[149] M. B Garman. Market microstructure. Journal of Financial Economics, 3(3):257 – 275, 1976.
[150] J. Gatheral. The volatility surface: a practitioner’s guide. Wiley, 2006.
[151] J. Gatheral. No-dynamic-arbitrage and market impact. Quantitative Finance, 10(7):749–759,
2010.
[152] J. Gatheral and R. Oomen. Zero-intelligence realized variance estimation. Finance and Stochastics, 14:249–283, 2010.
[153] J.F. Germain and F. Roueff.
Weak convergence of the regularization path in penalized m-
estimation. Scand. J. Stat., 37(3):477–495, 2010.
[154] L. R. Glosten. Is the electronic open limit order book inevitable?
The Journal of Finance,
49(4):1127–1161, September 1994.
[155] P. W. Glynn and S. P. Meyn. A Liapounov bound for solutions of the Poisson equation. Annals
of Probability, 24(2):916–931, 1996.
[156] D. K. Gode and S. Sunder. Allocative efficiency of markets with zero-intelligence traders: Market
as a partial substitute for individual rationality. The Journal of Political Economy, 101(1):119–
137, 1993.
[157] P Gopikrishnan, M. Meyer, L. A. Amaral, and H. E. Stanley. Inverse cubic law for the probability
distribution of stock price variations. The European Physical Journal B, 3:139, 1998.
[158] P. Gopikrishnan, V. Plerou, L. A. Amaral, M. Meyer, and H. E. Stanley. Scaling of the distribution
of fluctuations of financial market indices. Phys. Rev. E, 60(5):5305–5316, Nov 1999.
200
[159] P. Gopikrishnan, V. Plerou, X. Gabaix, and H. E. Stanley. Statistical properties of share volume
traded in financial markets. Physical Review - Series E-, 62(4; PART A):4493–4496, 2000.
[160] C. Gourieroux, J. Jasiak, and G. Le Fol. Intra-day market activity. Journal of Financial Markets,
2(3):193–226, aug 1999.
[161] J. E. Griffin and R. C. A. Oomen. Sampling returns for realized variance calculations: tick time
or transaction time? Econometric Reviews, 27(1):230–253, 2008.
[162] G Gu and W. Zhou. Emergence of long memory in stock volatility from a modified mike–farmer
model. Europhysics Letters, 86, 2009.
[163] G.-F. Gu, W. Chen, and W.-X. Zhou. Empirical shape function of limit-order books in the chinese stock market. Physica A: Statistical Mechanics and its Applications, 387(21):5182–5188,
September 2008.
[164] D.M. Guillaume, M.M. Dacorogna, R.R. Davé, U.A. Müller, R.B. Olsen, and O.V. Pictet. From
the bird’s eye to the microscope: A survey of new stylized facts of the intra-daily foreign exchange
markets. Finance and Stochastics, 1(2):95–129, 1997.
[165] N.H. Hakansson, A. Beja, and J. Kale. On the feasibility of automated market making by a
programmed specialist. Journal of Finance, pages 1–20, 1985.
[166] A. D. Hall and N. Hautsch. Modelling the buy and sell intensity in a limit order book market.
Journal of Financial Markets, 10(3):249–286, August 2007.
[167] J. Hasbrouck. Measuring the information content of stock trades. Journal of Finance, 46(1):179–
207, mar 1991.
[168] J. Hasbrouck. Empirical market microstructure: The institutions, economics, and econometrics
of securities trading. 2006.
[169] J. Hasbrouck. Empirical Market Microstructure: The Institutions, Economics, and Econometrics
of Securities Trading. Oxford University Press, USA, 2007.
[170] J. Hasbrouck. Measuring the information content of stock trades. The Journal of Finance,
46(1):179–207, 2012.
[171] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2011.
[172] T. Hastie, R. Tibshirani, and J.H. Friedman. The Elements of Statistical Learning. Springer,
corrected edition, jul 2003.
[173] J.A. Hausman, A.W. Lo, and A.C. MacKinlay. An ordered probit analysis of transaction stock
prices. Journal of Financial Economics, 31(3):319–379, jun 1992.
[174] N. Hautsch. Modelling irregularly spaced financial data. Springer, 2004.
201
[175] A. G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika,
58(1):83–90, April 1971.
[176] A. G. Hawkes and D. Oakes. A cluster process representation of a Self-Exciting process. Journal
of Applied Probability, 11(3):493–503, September 1974.
[177] A.G. Hawkes. Point spectra of some mutually exciting point processes. Journal of the Royal
Statistical Society. Series B (Methodological), 33(3):438–443, 1971.
[178] A.G. Hawkes and D. Oakes. A cluster process representation of a self-exciting process. Journal
of Applied Probability, 11(3):493–503, 1974.
[179] Takaki Hayashi and Nakahiro Yoshida. On covariance estimation of non-synchronously observed
diffusion processes. Bernoulli, 11(2):359–379, 2005.
[180] SL Heston. A closed-form solution for options with stochastic volatility with applications to bond
and currency options. Rev. Financ. Stud., 6(2):327–343, April 1993.
[181] P. Hewlett. Clustering of order arrivals, price impact and trade path optimisation. Workshop on
Financial Modeling, 2006.
[182] P. Hewlett. Clustering of order arrivals, price impact and trade path optimisation. In Workshop
on Financial Modeling with Jump processes, Ecole Polytechnique, 2006.
[183] B. M. Hill. A simple general approach to inference about the tail of a distribution. The Annals of
Statistics, 3(5):1163–1174, September 1975.
[184] A. E. Hoerl and R. W. Kennard. Ridge Regression: Biased Estimation for Nonorthogonal Problems. 1970.
[185] A.E. Hoerl, R.W. Kennard, and K.F. Baldwin. Ridge Regression: Some Simulations. 1975.
[186] B. Hollifield, R. A. Miller, and P. Sandås. Empirical analysis of limit order markets. The Review
of Economic Studies, 71(4):1027–1063, oct 2004.
[187] U. Horst and M. Paulsen. A law of large numbers for limit order books. 2015.
[188] W. HUANG, Y. Nakamori, and S. WANG. Forecasting stock market movement direction with
support vector machine. Computer Operations Research, 32(10):2513–2522, 2005.
[189] N. Huth and F. Abergel. The times change: Multivariate subordination, empirical facts. SSRN
eLibrary, March 2009.
[190] N. Huth and F. Abergel. High frequency correlation modelling. Econophysics of order-driven
markets, pages 189–202, 2011.
[191] N. Huth and F. Abergel. High frequency lead/lag relationships-empirical facts. arXiv preprint
arXiv:1111.7103, 2011.
202
[192] N. Huth and F. Abergel. The times change: multivariate subordination. empirical facts. Quantitative Finance, 12(1):1–10, 2012.
[193] Giulia Iori, Saqib Jafarey, and Francisco G. Padilla. Systemic risk on the interbank market.
Journal of Economic Behavior & Organization, 61(4):525–542, December 2006.
[194] Giulia Iori and Ovidiu V. Precup. Weighted network analysis of high-frequency cross-correlation
measures. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 75(3):036110–7,
March 2007.
[195] P. C. Ivanov, A. Yuen, B. Podobnik, and Y. Lee. Common scaling patterns in intertrade times of
u. s. stocks. Phys. Rev. E, 69(5):056107, May 2004.
[196] J. Jacod and A. N. Shyriaev. Limit theorems for stochastic processes. Springer, 2003.
[197] Jean Jacod. Multivariate point processes: predictable projection, radon-nikodym derivatives,
representation of martingales. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete,
31(3):235–253, 1975.
[198] E. Jondeau, A. Perilla, and M. Rockinger. Optimal liquidation strategies in illiquid markets. Swiss
Finance Institute Research Paper No.09-24, nov 2008.
[199] N. Kaldor. Economic growth and capital accumulation. The Theory of Capital, Macmillan,
London, 1961.
[200] N. El Karoui, S. Peng, and M. C. Quenez. Backward stochastic differential equations in finance.
Mathematical Finance, 7(1):1–71, 1997.
[201] D.B. Keim and A. Madhaven. The upstairs market for large-block transactions: Analysis and
measurement of price effects. Review of Financial Studies, 9(1):1–36, 1996.
[202] MG Kendall and A.B. Hill. The analysis of economic time-series-Part I: Prices. Journal of the
Royal Statistical Society. Series A (General), pages 11–34, 1953.
[203] W. Cheney D Kincaid. Numerical Mathematics and Computing. 2008.
[204] C. P. Kindleberger and R. Z. Aliber. Manias, Panics, And Crashes: A History Of Financial Crises.
John Wiley & Sons, fifth edition edition, 2005.
[205] A. P. Kirman. In R. Frydman and E.S. Phelps, editors, Individual Forecasting and Aggregate
Outcomes: Rational Expectations Examined, chapter On mistaken beliefs and resultant equilibria.
Cambridge University Press, 1983.
[206] A. P. Kirman. Whom or what does the representative individual represent?
The Journal of
Economic Perspectives, pages 117–136, 1992.
[207] A. P. Kirman. Ants, rationality, and recruitment. The Quarterly Journal of Economics, pages
137–156, 1993.
203
[208] A. P. Kirman. Reflections on interaction and markets. Quantitative Finance, 2(5):322–326, 2002.
[209] L. Koskinen. Handbook of financial time series. International Statistical Review, 78:148–149,
2010.
[210] U. Krengel and A. Brunel. Ergodic theorems. Walter de Gruyter, 1985.
[211] C.M. Kuan and T. Liu. Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics, 10:347–364, 1995.
[212] L. Kullmann, J. Toyli, J. Kertesz, A. Kanto, and K. Kaski. Characteristic times in stock market
indices. Physica A: Statistical Mechanics and its Applications, 269(1):98 – 110, 1999.
[213] A. S. Kyle. Continuous auctions and insider trading. Econometrica, 53(6):1315–1335, November
1985.
[214] P. Lakner, J. Reed, and S. Stoikov.
High frequency asymptotics for the limit order book.
http://pages.stern.nyu.edu/ jreed/Papers/LimitFinal.pdf, 2013. Preprint.
[215] J. Large. Measuring the resiliency of an electronic limit order book. Journal of Financial Markets,
10(1):1–25, February 2007.
[216] J.H. Large. Measuring the resiliency of an electronic limit order book. Journal of Financial
Markets, 10(1):1–25, 2007.
[217] J. F. Lawless and P. Wang. A Simulation Study of Ridge and Other Regression Estimators. 1976.
[218] B. LeBaron.
Agent-based computational finance.
Handbook of computational economics,
2:1187–1233, 2006.
[219] B. LeBaron. Agent-based financial markets: Matching stylized facts with style. In David C.
Colander, editor, Post Walrasian macroeconomics, page 439. Cambridge University Press, 2006.
[220] B LeBaron and R. Yamamoto. Long-memory in an order-driven market. Physica A: Statistical
Mechanics and its Applications, 383(1):85–89, September 2007.
[221] F. Lillo and J. D. Farmer. The long memory of the efficient market. Studies in Nonlinear Dynamics
& Econometrics, 8(3), 2004.
[222] F. Lillo and J. D. Farmer. The long memory of the efficient market. Studies in Nonlinear Dynamics
& Econometrics, 8(3), September 2004.
[223] F. Lillo, J. D. Farmer, and R. N. Mantegna. Econophysics: Master curve for price-impact function.
Nature, 421(6919):130, 129, 2003.
[224] F. Lillo, J. D. Farmer, and Mantegna R.N. Master curve for price impact function. Nature,
421(6919):129–130, 2003.
204
[225] F. Lillo and J.D. Farmer. The long memory of the efficient market. Studies in Nonlinear Dynamics
& Econometrics, 8(3), 2004.
[226] T.F. Liniger. Multivariate Hawkes Processes. PhD thesis, ETH Zürich, 2009.
[227] J. Linnainmaa and I. Rosu. Weather and time series determinants of liquidity in a limit order
market. 2009.
[228] A. Lo, A. MacKinlay, and J. Zhang. Econometric models of limit-order executions. NBER
WORKING PAPER SERIES, 1997.
[229] A. Lo, A. MacKinlay, and J. Zhang. Weather and time series determinants of liquidity in a limit
order market. 2009.
[230] A.W. Lo, A. Craig MacKinlay, and J. Zhang. Econometric models of limit-order executions.
Journal of Financial Economics, 65(1):31–71, jul 2002.
[231] T. Lux. Handbook of Research on Complexity, chapter Applications of Statistical Physics in
Finance and Economics. Edward Elgar Publishing Ltd, May 2009.
[232] T. Lux, M. Marchesi, and G. Italy. Volatility clustering in financial markets. International Journal
of Theoretical and Applied Finance, 3:675–702, 2000.
[233] T. Lux and F. Westerhoff. Economics crisis. Nature Physics, 5(1):2–3, 2009.
[234] Thomas Lux and D Sornette. On rational bubbles and fat tails. Journal of Money, Credit, and
Banking, 34(3a):589–610, 2002.
[235] B. G. Malkiel. A random walk down wall street. 1973.
[236] Paul Malliavin and Maria Elvira Mancino. Fourier series method for measurement of multivariate
volatilities. Finance and Stochastics, 6(1):49–61, 2002.
[237] B. Mandelbrot. The variation of certain speculative prices. Journal of business, 36(4):394, 1963.
[238] R. N. Mantegna and H. E. Stanley. Introduction to Econophysics: Correlations and Complexity
in Finance. Cambridge University Press, 1 edition, July 2007.
[239] G. Maruyama and H. Tanaka. Ergodic property of n-dimensional recurrent markov processes.
Mem. Fac. Sci. Kyushu Univ. Ser. A, 13:157–172, 1959.
[240] S. Maslov. Simple model of a limit order-driven market. Physica A: Statistical Mechanics and its
Applications, 278(3-4):571–578, 2000.
[241] S. Maslov and M. Mills. Price fluctuations from the order book perspective – empirical facts and
a simple model. Physica A: Statistical Mechanics and its Applications, 299(1-2):234–246, 2001.
[242] M. McAleer and M. C. Medeiros. Realized volatility: A review. Econometric Reviews, 27(1):10–
45, 2008.
205
[243] L. Meier, S. Van de Geer, and P. Buehlmann. High-dimensional additive modeling. jun 2008.
[244] S. Meyn and R.L. Tweedie. Stability of markovian processes I: criteria for discrete-time chains.
Advances in Applied Probability, 24:542–574, 1992.
[245] S. Meyn and R.L. Tweedie. Stability of markovian processes II: continuous-time processes and
sampled chains. Advances in Applied Probability, 25:487–517, 1993.
[246] S. Meyn and R.L. Tweedie. Stability of markovian processes III: Foster-lyapunov criteria for
continuous-time processes. Advances in Applied Probability, 25:518–548, 1993.
[247] S. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press,
2nd ed, 2009.
[248] S. Mike and J. D. Farmer. An empirical behavioral model of liquidity and volatility. Journal of
Economic Dynamics and Control, 32(1):200–234, 2008.
[249] J. Møller and J.G. Rasmussen. Approximate simulation of hawkes processes, 2004.
[250] J. Møller and G.L. Torrisi. The pair correlation function of spatial Hawkes processes. Statist.
Probab. Lett., 77:995–1003, 2007.
[251] Jesper Møller and Jakob G. Rasmussen. Perfect simulation of hawkes processes. Advances in
Applied Probability, 37(3):629–646, 2005.
[252] I. Muni Toke. A note on analytical properties of a one-side order book model with exponential
order flows. Submitted.
[253] I. Muni Toke. Market Making in an order book model and its impact on the spread. In F. Abergel,
B. K. Chakrabarti, A. Chakraborti, and M. Mitra, editors, Econophysics of Order-driven Markets,
pages 49–64. Springer, 2011.
[254] J. Murphy. Technical analysis of the financial markets: A comprehensive guide to trading methods
and applications.
[255] R. Næs and J. A Skjeltorp. Order book characteristics and the volume–volatility relation: Empirical evidence from a limit order market. Journal of Financial Markets, 9(4):408–432, 2006.
[256] A. A. Obizhaeva and J. Wang. Optimal trading strategy and supply/demand dynamics. Journal
of Financial Markets, to appear, 2012.
[257] Y. Ogata. The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1):243–261, dec 1978.
[258] Y. Ogata. On lewis’ simulation method for point processes. Information Theory, IEEE Transactions on, 27(1):23–31, jan 1981.
[259] Y. Ogata. On Lewis’ simulation method for point processes. IEEE Transactions on Information
Theory, 27(1):23–31, 1981.
206
[260] Y. Ogata. Statistical models for earthquake occurrences and residual analysis for point processes.
Journal of the American Statistical Association, pages 9–27, 1988.
[261] M. O’Hara. Market Microstructure Theory. Blackwell Publishers, second edition edition, November 1997.
[262] M. F. M. Osborne. Brownian motion in the stock market. Operations Research, 7(2):145–173,
March 1959.
[263] F. O’Sullivan, B.S. Yandell, and W.J. Raynor. Automatic smoothing of regression functions in
generalized linear models. the Journal of the American Statistical Association, 81:96–103, 1986.
[264] A. Pagan. The econometrics of financial markets. Journal of empirical finance, 3(1):15–102,
1996.
[265] C. Parlour and D. Seppi. Limit order markets: A survey. pages 61–96, 2008.
[266] C. A. PArlour and D. Seppi. Limit order markets: a survey. In Handbook of Financial Intermediation & Banking, pages 63–96. Elsevier, 2008.
[267] C.A. Parlour. Price dynamics in limit order markets. Review of Financial Studies, 11(4):789–816,
1998.
[268] V. Plerou, P. Gopikrishnan, L. A. Nunes Amaral, X. Gabaix, and E. H. Stanley. Economic fluctuations and anomalous diffusion. Phys. Rev. E, 62(3):R3023–R3026, Sep 2000.
[269] M. Politi and E. Scalas. Fitting the empirical distribution of intertrade durations. Physica A:
Statistical Mechanics and its Applications, 387(8-9):2025 – 2034, 2008.
[270] F. Pomponio and F. Abergel. Trade-throughs : Empirical facts - application to lead-lag measures.
SSRN eLibrary, 2010.
[271] F. Pomponio and F. Abergel. Trade-throughs: Empirical facts and application to lead-lag measures. Econophysics of Order-driven Markets, pages 3–16, 2011.
[272] F. Pomponio and F. Abergel. Trade-throughs: Empirical facts and application to lead-lag measures. In F. Abergel, B.K. Chakrabarti, A. Chakraborti, and M. Mitra, editors, Econophysics of
Order-driven Markets, New Economic Windows, pages 3–16. Springer Milan, 2011.
[273] F. Pomponio and I. Muni Toke. Modelling trade-throughs in a limited order book using Hawkes
processes. Economics: The Open-Access, Open-Assessment E-Journal, 6, 2012.
[274] D.C. Porter. The probability of a trade at the ask: An examination of interday and intraday
behavior. Journal of Financial and Quantitative Analysis, 27(2):209–227, 1992.
[275] M. Potters and J. P. Bouchaud. More statistical properties of order books and price impact.
Physica A: Statistical Mechanics and its Applications, 324(1-2):133–140, 2003.
207
[276] M. Potters and J.-P. Bouchaud. More statistical properties of order books and price impact. Physica A: Statistical Mechanics and its Applications, 324(1–2):133–140, June 2003.
[277] M. Potters and J.P. Bouchaud. More statistical properties of order books and price impact. Physica
A, 324(1-2):133–140, 2003.
[278] S. Predoiu, G. Shaikhet, and S. Shreve. Optimal execution in a general one-sided limit-order
book. SIAM Journal on Financial Mathematics, 2(1):183–212, January 2011.
[279] T. Preis, S. Golke, W. Paul, and J. J. Schneider. Multi-agent-based order book model of financial
markets. Europhysics Letters (EPL), 75(3):510–516, August 2006.
[280] T. Preis, S. Golke, W. Paul, and J. J. Schneider. Statistical analysis of financial returns for a
multiagent order book model of asset trading. Physical Review E, 76(1), 2007.
[281] G. Pristas. Limit order book and asset liquidity. Cuvillier, 2008.
[282] N. Privault. Understanding Markov Chains. Springer, 2013.
[283] M. Raberto, S. Cincotti, S.M. Focardi, and M. Marchesi. Agent-based simulation of a financial
market. Physica A: Statistical Mechanics and its Applications, 299(1-2):319–327, 2001.
[284] A. Rakotomamonjy, F. Bach, F. Canu, and Y. Grandvalet. Simplemkl. Journal of Machine Learning Research, 9:2491–2521, nov 2008.
[285] E. Renault and B.J.M. Werker. Stochatic volatility models with transaction time risk. 2005.
[286] R. Reno. A closer look at the epps effect. International Journal of Theoretical and Applied
Finance, 6(1):87–102, 2003.
[287] P. Reynaud-Bouret and E. Roy. Some non asymptotic tail estimates for Hawkes processes. Bull.
Belg. Math. Soc. Simon Stevin, 13:883–896, 2006.
[288] P. Reynaud-Bouret and S. Schbath. Adaptive estimation for hawkes processes; application to
genome analysis. Ann. Statist., 38:2781–2822, 2010.
[289] J. D. Riley. Solving systems of linear equations with a positive definite, symmetric, but possibly
ill-conditioned matrix. 1955.
[290] I. Rocu. A dynamic model of the limit order book. Review of Financial Studies, 22(11):4601–
4641, 2009.
[291] R. Roll. A simple implicit measure of the effective bid-ask spread in an efficient market. The
Journal of Finance, 49:1127–1139, 1984.
[292] J. R. Russell and T. Kim. A new model for limit order book dynamics. In Tim Bollerslev, Jeffrey
Russell, and Mark Watson, editors, Volatility and Time Series Econometrics: Essays in Honor of
Robert Engle. Oxford University Press, USA, April 2010.
208
[293] M. N Saha, B. N. Srivastava, and MN & Srivastava Saha. A treatise on heat. Indian Press,
Allahabad, 1950.
[294] E Samanidou, E Zschischang, D Stauffer, and T Lux. Agent-based models of financial markets.
Reports on Progress in Physics, 70(3):409–450, 2007.
[295] F. Santosa and W.W. Symes. Linear inversion of band-limited reflection seismograms. SIAM
Journal on Scientific and Statistical Computing, 7(4):1307–1330, 1986.
[296] J. B. Seaborn. Hypergeometric Functions and their Applications. Number 8 in Text in Applied
Mathematics. Springer, 1991.
[297] G. A. F. Seber and A. J. Lee. Linear Regression Analysis. Wiley, 2003.
[298] J. Shadbolt and J.G. Taylor. Neural networks and the financial markets: predicting, combining
and portfolio optimisation. 2002.
[299] A. C Silva and V. M. Yakovenko. Stochastic volatility of financial markets as the fluctuating rate
of trading: An empirical study. Physica A: Statistical Mechanics and its Applications, 382(1):278
– 285, 2007.
[300] S. Sinha, A. Chatterjee, A. Chakraborti, and B. K. Chakrabarti. Econophysics: An Introduction.
Wiley-VCH, 1 edition, November 2010.
[301] F. Slanina. Mean-field approximation for a limit order driven market model. Phys. Rev. E,
64(5):056136, Oct 2001.
[302] F. Slanina. Critical comparison of several order-book models for stock-market fluctuations. The
European Physical Journal B-Condensed Matter and Complex Systems, 61(2):225–240, 2008.
[303] E. Smith, J. D. Farmer, L. Gillemot, and S. Krishnamurthy. Statistical theory of the continuous
double auction. Quantitative Finance, 3(6):481–514, 2003.
[304] John N. Smithin. What Is Money? Routledge, October 1999.
[305] G.J. Stigler. Public regulation of the securities markets. Journal of Business, pages 117–142,
1964.
[306] N. Taleb. Fooled by randomness: The hidden role of chance in the markets and in life. 2001.
[307] H.L. Taylor, S.C. Banks, and J.F. McCoy. Deconvolution with the l 1 norm. Geophysics, 44(1):39–
52, 1979.
[308] S. Thurner, J. D Farmer, and J. Geanakoplos. Leverage causes fat tails and clustered volatility.
Preprint, 2009.
[309] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical
Society, Series B, 58(1):267–288, 1994.
209
[310] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Roayl Statistical
Society. Series B, 58(1):267–288, 1996.
[311] Ioane Muni Toke. "market making" behaviour in an order book model and its impact on the
bid-ask spread. Quantitative finance papers, arXiv.org, 2010.
[312] B. Tóth, Y. Lempérière, C. Deremble, J. de Lataillade, J. Kockelkoren, and J.-P. Bouchaud.
Anomalous price impact and the critical nature of liquidity in financial markets. Physical Review X, 1, 2011.
[313] G. Vidyamurthy. Pairs trading: Quantitative methods and analysis. 2004.
[314] J. Voit. The statistical mechanics of financial markets. Texts and monographs in physics. Springer,
3 edition, 2005.
[315] M. von Smoluchowski. Zur kinetischen theorie der brownschen molekularbewegung und der
suspensionen. Annalen der Physik, 326(14):756–780, 1906.
[316] R. F. Voss. Fractals in nature: from characterization to simulation. pages 21–70, 1988.
[317] S. Walczak. An empirical analysis of data requirements for financial forecasting with neural
networks. J. Manage. Inf. Syst., 17:203–222, 2001.
[318] P. Weber and B. Rosenow. Order book approach to price impact. Quantitative Finance, 5(4):357–
364, 2005.
[319] P. Weber and B. Rosenow. Large stock price changes: volume or liquidity? Quantitative Finance,
6(1):7–14, 2006.
[320] W. Whitt. Stochastic-process limits. Springer Series in Operations Research. Springer-Verlag,
New York, 2002. An introduction to stochastic-process limits and their application to queues.
[321] WIKIPEDIA.
High-frequency
trading.
http://en.wikipedia.org/wiki/
High-frequency_trading, jan 2011.
[322] M. Wyart and J.-P. Bouchaud. Self-referential behaviour, overreaction and conventions in financial markets. Journal of Economic Behavior & Organization, 63(1):1–24, May 2007.
[323] M. Wyart, J.-P. Bouchaud, J. Kockelkoren, M. Potters, and M. Vettorazzo. Relation between
bid-ask spread, impact and volatility in order-driven markets. Quantitative Finance, 8(1):41–57,
2008.
[324] J. Cacho-Diaz Y. Aït-Sahalia and R.J.A. Laeven. Modelling financial contagion using mutually
exciting hawkes processes. Preprint, 2013.
[325] Victor M. Yakovenko and J. Barkley Rosser. Colloquium: Statistical mechanics of money, wealth,
and income. Reviews of Modern Physics, 81(4):1703, December 2009.
210
[326] Y. Yudovina. A simple model of a limit order book. http://arxiv.org/abs/1205.7017, 2012.
Preprint.
[327] B. Zheng, E. Moulines, and F. Abergel. Price jump prediction in limit order book. Journal of
Mathematical Finance, 3(2):242–255, 2012.
[328] B. Zheng, F. Roueff, and F. Abergel. Modelling bid and ask prices using constrained hawkes
processes: Ergodicity and scaling limit. SIAM Journal on Financial Mathematics, 5:99–136,
2014.
[329] J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. Journal of
Computational and Graphical Statistics, 14:185–205, 2005.
[330] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the
Royal Statistical Society, 67(2):301–320, 2005.
211