Limit Order Books

Frédéric Abergel, Marouane Anane, Anirban Chakraborti, Aymen Jedidi, Ioane Muni Toke

March 19, 2015

Contents

1 A short introduction to limit order books

I Empirical properties of order driven markets

2 Statistical properties of limit order books: a survey
    2.1 Time of arrivals of orders
    2.2 Volume of orders
    2.3 Placement of orders
    2.4 Cancellation of orders
    2.5 Intraday seasonality
3 The order book shape as a function of the average size of limit orders
4 Empirical evidence of market making and market taking
    4.1 Arrival times of orders: event time is not enough
    4.2 Empirical evidence of “market making”
    4.3 A reciprocal “market following limit” effect?

II Mathematical modelling of limit order books

5 Agent-based modeling of limit order books: a survey
    5.1 Agent-based models as a mechanism for stylized facts
    5.2 Early order-driven market modelling: market microstructure and policy issues
        5.2.1 A pioneer order book model
        5.2.2 Microstructure of the double auction
        5.2.3 Zero-intelligence
    5.3 Order-driven market modelling in Econophysics
        5.3.1 The order book as a reaction-diffusion model
        5.3.2 Introducing market orders
        5.3.3 The order book as a deposition-evaporation process
    5.4 Empirical zero-intelligence models
    5.5 Remarks
6 The mathematical structure of zero-intelligence limit order book models
    6.1 An elementary approximation: perfect market making
    6.2 Order book dynamics
        6.2.1 Model setup: Poissonian arrivals, reference frame and boundary conditions
        6.2.2 Analytical treatment of some limit order book models
        6.2.3 Evolution of the order book
        6.2.4 Infinitesimal generator
        6.2.5 Price dynamics
    6.3 Ergodicity and diffusive limit
        6.3.1 Ergodicity of the order book and rate of convergence to the stationary state
        6.3.2 Large-scale limit of the price process
    6.4 Conclusions
7 The order book as a queueing system
    7.1 The shape of a limit order book
    7.2 A link between the flows of order and the shape of an order book
        7.2.1 The basic queueing system
        7.2.2 A continuous extension of the basic model
    7.3 Comparison to existing results on the shape of the order book
        7.3.1 Numerically simulated shape in [303]
        7.3.2 Empirical and analytical shape in [49]
    7.4 A model with varying sizes of limit orders
    7.5 Influence of the size of limit orders on the shape of the order book
    7.6 Conclusion
8 Advanced modelling of limit order books
    8.1 Towards non-trivial behaviours: modelling market interactions
        8.1.1 Herding behaviour
        8.1.2 Fundamentalists and trend followers
        8.1.3 Model setup
        8.1.4 The infinitesimal generator
    8.2 Stability of the order book
    8.3 Large scale limit of the price process

III Simulation of limit order books

9 Numerical simulation of limit order books
    9.1 Zero-intelligence limit order book simulator
        9.1.1 Results for CAC 40 stocks
    9.2 Simulation of a limit order book modelled by Hawkes processes
        9.2.1 Adding dependence between order flows
        9.2.2 Numerical results on the order book
10 A generic limit order book simulator
    10.1 Discrete event simulation components
        10.1.1 Scheduler
        10.1.2 Timer
    10.2 Orders
        10.2.1 Basic order functionality
        10.2.2 Market orders
        10.2.3 Limit orders
        10.2.4 Iceberg orders
        10.2.5 Cancel orders
        10.2.6 Limit market orders
        10.2.7 Always best orders
        10.2.8 Virtual market orders
    10.3 Order book and order queue
        10.3.1 Order book
        10.3.2 Local order book
        10.3.3 Order queue
        10.3.4 Remote order book
    10.4 Traders
    10.5 Strategies
        10.5.1 Strategy base classes
        10.5.2 Concrete strategies
        10.5.3 Arbitrage trading strategy
        10.5.4 Adaptive strategies

IV Imperfection and predictability in order-driven markets

11 Market imperfection and predictability
    11.1 Forecasting and market efficiency
    11.2 Data, methodology and performances measures
        11.2.1 Data
        11.2.2 Methodology
        11.2.3 Performance measures
    11.3 Conditional probability matrices
        11.3.1 Binary method
        11.3.2 Four-class method
    11.4 Linear regression
        11.4.1 Ordinary least squares (OLS)
        11.4.2 Ridge regression
        11.4.3 Least Absolute Shrinkage and Selection Operator (LASSO)
        11.4.4 ELASTIC NET (EN)

V Appendices

A Limit order book data
    A.1 Limit order book data processing
    A.2 Stylized facts
    A.3 Market taking and making
B Some useful mathematical notions
    B.1 Point processes
        B.1.1 Hawkes processes
    B.2 Ergodic theory for Markov processes
        B.2.1 The ergodic theorem
        B.2.2 Stochastic stability
C Comparison of various prediction methods
    C.1 Results for the binary classification
    C.2 Results for the four-class classification
    C.3 Performances of the OLS method
    C.4 Performances of the ridge method
    C.5 Performances of the LASSO method

Foreword

The chair of quantitative finance was created at École Centrale Paris in 2007. Since its inception, it has devoted most of its research activities to the study of high frequency financial data. In particular, a sizeable portion of this research has focused on the characterization of limit order books, the central object in every modern, electronic financial market. Due to its relevance to the understanding of financial markets, the limit order book has triggered a large amount of research in the past twenty years, starting with the seminal work of [40] on the empirical analysis of the Paris exchange, and revitalized a few years later in a fascinating manner, in our opinion, in [303].
However, interesting, important and challenging as this subject is, we realized that there was still no reference book on it. We therefore decided to assemble in a single document a survey of the existing literature, together with our own contributions on limit order books, whether they pertain to their statistical properties, mathematical modeling or numerical simulation. A large part of the material covered in this online book has already been presented in research or survey articles, in particular [67][68][7][8][22][252][253], but it could not be found in a single place where the emphasis was set on a single object of interest. We hope that this work will help fill this gap in the abundant literature on market microstructure and limit order books, and that the reader (researcher, graduate student or practitioner) will appreciate finding most of the contemporary knowledge on this essential component of modern financial markets.

Finally, we also warn the reader that most sections are constantly being refurbished, reorganized and rewritten, and that this online book may evolve for a long time before it reaches a satisfactory, stationary state!

Chapter 1
A short introduction to limit order books

The vast majority of organized electronic markets, in particular all equity, futures and other listed derivatives markets, use a device called the limit order book to store, in their central computer, the list of all the interests of market participants. A limit order book is essentially a file in a computer that contains all the orders sent to the market, with their characteristics such as the sign of the order (buy or sell), the price, the quantity, a timestamp giving the time the order was recorded by the market, and other, often market-dependent, information.
In other words, the limit order book contains, at any given point in time, on a given market, the list of all the transactions that one could possibly enter into on this market. As such, the limit order book contains all the information available on a specific market, and its evolution in time describes the way the market moves under the influence of its participants. In particular, one can extract the price dynamics from that of the limit order book, but the latter is a much more complete, and complex, object than the former.

A market in which buyers and sellers meet via a limit order book is called an order-driven market. In order-driven markets, buy and sell orders are matched as they arrive over time, subject to some priority rules. Priority is always based on price and then, in most markets, on time¹, according to a FIFO² rule.

Essentially, three types of orders can be submitted:
• Limit order: an order specifying a price at which one is willing to buy or sell a certain number of shares. The limit order book is the list of all buy and sell limit orders, with their corresponding price and quantity, at any point in time;
• Market order: an order to immediately buy or sell a certain quantity at the best available opposite quote;
• Cancellation order: an order cancelling an existing limit order.

In the microstructure literature, agents who submit limit orders are referred to as liquidity providers. Those who submit market orders are referred to as liquidity takers.
In real markets, especially since the various deregulation waves that hit the US markets in 2005 and the European markets in 2007 (see for instance [9]), there is no such thing as a pure liquidity provider or taker, and this classification should be understood as a convenient shorthand rather than a realistic description of the behaviour of market participants. Note that, depending on the market under consideration, there may exist variations of the three basic types described above, and we will at times consider more sophisticated types of orders.

¹ There are some noticeable exceptions of markets following a pro rata rule, but we will not focus on them.
² First in, first out.

Figure 1.1: Order book schematic illustration: a buy market order arrives and removes liquidity from the ask side, then sell limit orders are submitted and liquidity is restored. [Six panels showing quantity against price: (1) initial state, (2) liquidity is taken, (3) wide spread, (4) liquidity returns, (5) liquidity returns, (6) final state.]

Limit orders are stored in the order book until they are either executed against an incoming market order or canceled. The ask price $P^A$ (or simply the ask) is the price of the best (i.e. lowest) limit sell order. The bid price $P^B$ is the price of the best (i.e. highest) limit buy order. The gap between the bid and the ask,

$$S := P^A - P^B, \tag{1.0.1}$$

is always positive and is called the spread. Prices are not continuous, but rather have a discrete resolution $\Delta P$, the tick, which represents the smallest quantity by which they can change. We define the mid-price as the average between the bid and the ask:

$$P := \frac{P^A + P^B}{2}. \tag{1.0.2}$$

The price dynamics is the result of the interplay between the incoming order flow and the order book [50].
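The definitions above (three order types, price/time priority, bid, ask, spread and mid-price) can be made concrete with a toy example. The following is only a minimal sketch, not the simulator presented later in this book: the class name `MiniBook`, its method names and the use of integer prices expressed in ticks are illustrative assumptions, and limit orders that would cross the spread are not handled.

```python
from collections import deque


class MiniBook:
    """Toy limit order book with price/time (FIFO) priority.

    Hypothetical illustration: prices are integers (numbers of ticks) and
    incoming limit orders are assumed never to cross the opposite side.
    """

    def __init__(self):
        self.bids = {}  # price -> deque of [order_id, remaining quantity]
        self.asks = {}

    def limit(self, order_id, side, price, qty):
        """Store a limit order at the back of the queue for its price."""
        book = self.bids if side == "buy" else self.asks
        book.setdefault(price, deque()).append([order_id, qty])

    def cancel(self, order_id):
        """Remove an existing limit order, wherever it sits."""
        for book in (self.bids, self.asks):
            for price in list(book):
                kept = deque(o for o in book[price] if o[0] != order_id)
                if kept:
                    book[price] = kept
                else:
                    del book[price]

    def market(self, side, qty):
        """Execute a market order; return the list of (price, quantity) fills."""
        book = self.asks if side == "buy" else self.bids
        fills = []
        while qty > 0 and book:
            best = min(book) if side == "buy" else max(book)  # price priority
            queue = book[best]
            order = queue[0]                                  # time priority
            traded = min(qty, order[1])
            fills.append((best, traded))
            order[1] -= traded
            qty -= traded
            if order[1] == 0:
                queue.popleft()
            if not queue:
                del book[best]
        return fills

    def best_ask(self):
        return min(self.asks) if self.asks else None

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def spread(self):
        return self.best_ask() - self.best_bid()        # equation (1.0.1)

    def mid(self):
        return (self.best_ask() + self.best_bid()) / 2  # equation (1.0.2)


book = MiniBook()
book.limit(1, "sell", 101, 10)
book.limit(2, "sell", 102, 20)
book.limit(3, "buy", 99, 15)
book.limit(4, "sell", 101, 5)   # same price as order 1, but arrived later
initial_spread, initial_mid = book.spread(), book.mid()
fills = book.market("buy", 12)  # consumes order 1 entirely, then part of order 4
book.cancel(4)                  # the remainder of order 4 is withdrawn
final_spread = book.spread()
```

In this example the market order is filled first against order 1 and only then against order 4, which sits at the same price but arrived later; once the remaining quantity at the best ask is cancelled, the spread widens.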
Figure 1.1 is a schematic illustration of this process [138]. Note that we chose to represent quantities on the bid side of the book by non-positive numbers.

It is clear that the study of the empirical properties, as well as the mathematical modelling and numerical simulation, of limit order books is of paramount importance for the researcher keen on understanding financial markets. Our approach is to start with the data, trying to assess their empirical properties; then to move on to mathematical models able to reproduce the observed properties; and finally, to present a framework for numerical simulations.

Following empirical studies, the modelling of limit order books has been an area of intense research activity in the last 15 years. The remarkable interest in this area may have been partly motivated by sound scientific curiosity, but it has very definitely been enhanced by the rapidly growing use of algorithmic trading and high frequency trading. Schematically, two modeling approaches have been successful, at least partially, in capturing key properties of the order book. The first approach, led by economists, models the interactions between rational agents who act strategically: they choose their trading decisions as solutions to individual utility maximization problems (see e.g. [266] and references therein). In the second approach, proposed by econophysicists, agents are described statistically. In the most basic expression of this line of research, they are supposed to act randomly. This approach is sometimes referred to as zero-intelligence order book modeling, in the sense that the arrival times and placements of orders of various types are random and independent, the focus being more on the “mechanistic” aspects of the continuous double auction than on the strategic interactions between agents.
Despite this apparent limitation, statistical order book models do capture many salient features of real markets, and exhibit interesting, non-trivial mathematical properties that form the basis of a thorough understanding of limit order books.

Of course, the practical interest of the mathematical models would be very limited, especially in the numerous cases where closed-form solutions are not available, if one could not simulate limit order books and analyze the simulations. Besides presenting the numerical analysis of various models of limit order books, we also introduce a very general, flexible open-source library which can be used by researchers interested in the study of trading strategies in an order-driven market.

The organization of this book naturally follows the scientific approach described above: the first part is devoted to the empirical properties of order driven markets, the second part to the mathematical modelling of limit order books, and the third to their numerical analysis. A fourth part considers the subject from a slightly different, more applied angle, that of market imperfections. Each part starts with a survey of the existing scientific literature on the corresponding topic, and then continues with our own contributions.

Part I
Empirical properties of order driven markets

Chapter 2
Statistical properties of limit order books: a survey

The content of this chapter is based on two survey articles [67][68], updated to incorporate the contributions in the recent scientific literature on limit order books. The basic statistical properties of limit order books, as can be observed from real data, are described and studied. Variables crucial to a fine modeling of order flows and dynamics of order books are studied: time of arrival of orders, placement of orders, size of orders, shape of order book, etc.
The computerization of financial markets in the second half of the 1980s provided empirical scientists with easier access to extensive data on order books. [40] is an early study of the new data flows on the (at that time) newly computerized Paris Bourse. Many subsequent papers offer complementary empirical findings and modeling, e.g. [159], [73], [241], [49], [275]. In this chapter, we try to summarize some of these empirical facts. The data set that we use is made precise in Appendix A.

Let us recall that the markets we are dealing with are electronic order books with no official market maker, in which orders are submitted in a double auction and executions follow price/time priority. Different market mechanisms have been widely studied in the microstructure literature, see e.g. [149, 213, 154, 261, 39, 169]. We will not review this literature here (except [149] in Chapter 5), as this would be too large a digression.

2.1 Time of arrivals of orders

The choice of the time count might be of prime importance when dealing with “stylized facts” of empirical financial time series. Several results pertaining to the subordination of stochastic processes ([85, 299]) have already demonstrated that the Poisson hypothesis for the arrival times of orders is not empirically verified. We compute the empirical distribution of interarrival times (or durations) of market orders on the stock BNP Paribas, using the data set described in Appendix A. The results are plotted in log scale (figure 2.1) and in linear scale (figure 2.2). The exponential fit is clearly not a good one, whereas the Weibull distribution provides a potentially very good fit. Weibull distributions have been suggested for example in [195]. [269] also obtain good fits with q-exponential distributions.
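To illustrate the comparison between the exponential and Weibull hypotheses, here is a minimal maximum-likelihood sketch using only the Python standard library. It is not the fitting procedure used for the figures, which is not specified here; the function names are hypothetical, durations are assumed to be strictly positive (zero durations coming from identical timestamps would have to be removed first), and the sample is synthetic, not market data.

```python
import math
import random


def fit_exponential(durations):
    """Maximum-likelihood exponential fit: returns (rate, log-likelihood)."""
    n, total = len(durations), sum(durations)
    rate = n / total
    return rate, n * math.log(rate) - rate * total


def fit_weibull(durations, tol=1e-10):
    """Maximum-likelihood Weibull fit: returns (shape, scale, log-likelihood).

    The shape k solves sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0.
    The left-hand side is increasing in k, so bisection is sufficient.
    """
    n = len(durations)
    top = max(durations)
    x = [d / top for d in durations]   # rescale to (0, 1] to avoid overflow
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n

    def score(k):
        powers = [v ** k for v in x]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0 and hi < 1e3:  # enlarge the bracket if necessary
        hi *= 2.0
    while hi - lo > tol:
        middle = 0.5 * (lo + hi)
        if score(middle) < 0:
            lo = middle
        else:
            hi = middle
    shape = 0.5 * (lo + hi)
    scale = top * (sum(v ** shape for v in x) / n) ** (1.0 / shape)
    loglik = sum(
        math.log(shape / scale)
        + (shape - 1.0) * math.log(d / scale)
        - (d / scale) ** shape
        for d in durations
    )
    return shape, scale, loglik


# Synthetic durations (an assumption, for illustration only): a Weibull
# sample with shape 0.6, i.e. more clustered than a Poisson process.
random.seed(0)
sample = [random.weibullvariate(2.0, 0.6) for _ in range(5000)]
sample = [d for d in sample if d > 0]

rate, loglik_exponential = fit_exponential(sample)
shape, scale, loglik_weibull = fit_weibull(sample)
```

Since the exponential distribution is the Weibull distribution with shape 1, the fitted Weibull log-likelihood can only be larger than or equal to the exponential one; a fitted shape clearly different from 1 is what signals a departure from the Poisson hypothesis.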
Figure 2.1: Distribution of interarrival times for stock BNPP.PA in log-scale. [Plot: empirical density against interarrival time for BNPP.PA, with lognormal, exponential and Weibull fits.]

In the Econometrics literature, these observations of non-Poissonian arrival times have given rise to a large trend of modelling of irregular financial data. [117] and [116] have introduced autoregressive conditional duration or intensity models that may help in modelling these order submission processes. See [174] for a textbook treatment.

Using the same data, we compute the empirical distribution of the number of transactions in a given time period $\tau$. Results are plotted in figure 2.3. The log-normal and the gamma distributions both seem to be good candidates; however, neither of them really describes the empirical result, suggesting a complex structure of arrival of orders. A similar result on Russian stocks was presented in [109].

Figure 2.2: Distribution of interarrival times for stock BNPP.PA (main body, linear scale). [Plot: empirical density against interarrival time for BNPP.PA, with lognormal, exponential and Weibull fits.]

Figure 2.3: Distribution of the number of trades in a given time period $\tau$ for stock BNPP.PA. This empirical distribution is computed using data from October 1st, 2007, until May 31st, 2008. [Plot: empirical density against normalized number of trades, for $\tau$ = 1, 5, 15 and 30 minutes.]

2.2 Volume of orders

Empirical studies show that the unconditional distribution of order size is very complex to characterize. [159] and [241] observe a power law decay with an exponent $1 + \mu$ between 2.3 and 2.7 for market orders and $1 + \mu \approx 2.0$ for limit orders. [73] emphasize a clustering property: orders tend to have a “round” size in packages of shares, and clusters are observed around 100's and 1000's. As of today, no consensus emerges in proposed models, and it is plausible that such a distribution varies very wildly with products and markets.

In figure 2.4, we plot the distribution of volume of market orders for the four stocks composing our benchmark. Quantities are normalized by their mean. The power-law coefficient is estimated by a Hill estimator (see e.g. [183, 108]). We find a power law with exponent $1 + \mu \approx 2.7$, which confirms the studies previously cited. Figure 2.5 displays the same distribution for limit orders (of all available limits). We find an average value of $1 + \mu \approx 2.1$, consistent with previous studies. However, we note that the power law is a poorer fit in the case of limit orders: data normalized by their mean collapse badly on a single curve, and computed coefficients vary with stocks.

Figure 2.4: Distribution of volumes of market orders. Quantities are normalized by their mean. [Plot: probability functions against normalized volume for market orders, for BNPP.PA, FTE.PA, RENA.PA and SOGN.PA, with a power law $\propto x^{-2.7}$ and an exponential $\propto e^{-x}$ for comparison.]

2.3 Placement of orders

Placement of arriving limit orders

[49] observe a broad power-law placement around the best quotes on French stocks, confirmed in [275] on US stocks. Observed exponents are quite stable across stocks, but exchange dependent: $1 + \mu \approx 1.6$ on the Paris Bourse, $1 + \mu \approx 2.0$ on the New York Stock Exchange, $1 + \mu \approx 2.5$ on the London Stock Exchange. [248] propose to fit the empirical distribution with a Student distribution with 1.3 degrees of freedom.

We plot the distribution of the following quantity computed on our data set, i.e. using only the first five limits of the order book: $\Delta p = b_0(t^-) - b(t)$ (resp. $a(t) - a_0(t^-)$) if a bid (resp. ask) order arrives at price $b(t)$ (resp. $a(t)$), where $b_0(t^-)$ (resp. $a_0(t^-)$) is the best bid (resp. ask) before the arrival of this order.
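The sign convention of the quantity $\Delta p$ defined above can be checked on a small example. The sketch below is only an illustration: the function name `placement` and the representation of prices as integer numbers of ticks are assumptions, not part of the data processing described in Appendix A.

```python
from collections import Counter


def placement(side, price, best_bid_before, best_ask_before):
    """Distance between a new limit order and the same-side best quote.

    Negative values: the order is placed inside the spread.
    Zero: the order joins the current best quote.
    Positive values: the order is placed deeper in the book.
    Prices are assumed to be integers (numbers of ticks).
    """
    if side == "buy":
        return best_bid_before - price
    if side == "sell":
        return price - best_ask_before
    raise ValueError("side must be 'buy' or 'sell'")


# Hypothetical arrivals: (side, price, best bid before, best ask before)
arrivals = [
    ("buy", 101, 100, 103),   # improves the best bid: inside the spread
    ("buy", 100, 100, 103),   # joins the best bid
    ("sell", 106, 100, 103),  # three ticks behind the best ask
    ("sell", 103, 100, 103),  # joins the best ask
]
placements = [placement(*arrival) for arrival in arrivals]
histogram = Counter(placements)  # empirical distribution of placements
```

With this convention, bid and ask orders are pooled in a single distribution in which the left side (negative values) collects the orders submitted inside the spread.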
Results are plotted on figures 2.6 (in semilog scale) and 2.7 (in linear scale). Since these graphs are computed with incomplete data (five best limits), we do not observe a placement as broad as in [49]. However, our data makes it clear that fat tails are observed. We also observe an asymmetry in the empirical distribution: the left side is less broad than the right side. Since the left side represents limit orders submitted inside the spread, this is expected. The empirical distribution of the placement of arriving limit orders is thus maximum at zero (same best quote). We then ask the question: how does this translate in terms of the shape of the order book?

Figure 2.5: Distribution of normalized volumes of limit orders. Quantities are normalized by their mean. [Plot: probability functions against normalized volume of limit orders, for BNPP.PA, FTE.PA, RENA.PA and SOGN.PA, with a power law $\propto x^{-2.1}$ and an exponential $\propto e^{-x}$ for comparison.]

Average shape of the order book

Contrary to what one might expect, it seems that the maximum of the average offered volume in an order book is located away from the best quotes (see e.g. [49]). Our data confirms this observation: the average quantity offered on the five best quotes grows with the level. This result is presented in figure 2.8. We also compute the average price of these levels in order to plot a cross-sectional graph similar to the ones presented in [40]. Our result is presented for stock BNPP.PA in figure 2.9 and displays the expected shape. Results for other stocks are similar. We find that the average gap between two levels is constant among the five best bids and asks (less than one tick for FTE.PA, 1.5 ticks for BNPP.PA, 2.0 ticks for SOGN.PA, 2.5 ticks for RENA.PA). We also find that the average spread is roughly twice as large as the average gap (factor 1.5 for FTE.PA, 2 for BNPP.PA, 2.2 for SOGN.PA, 2.4 for RENA.PA).
Figure 2.6: Placement of limit orders using the same best quote reference in semilog scale. Data used for this computation is BNP Paribas order book from September 1st, 2007, until May 31st, 2008. [Plot: probability density function against $\Delta p$ for BNPP.PA, with Gaussian and Student fits.]

2.4 Cancellation of orders

[73] show that the distribution of the average lifetime of limit orders fits a power law with exponent $1 + \mu \approx 2.1$ for cancelled limit orders, and $1 + \mu \approx 1.5$ for executed limit orders. [248] find that in either case the exponential hypothesis (Poisson process) is not satisfied on the market.

We compute the average lifetime of cancelled and executed orders on our dataset. Since our data does not include a unique identifier of a given order, we reconstruct the lifetimes of orders as follows: each time a cancellation is detected, we go back through the history of limit order submissions and look for a matching order with the same price and the same quantity. If an order is not matched, we discard the cancellation from our lifetime data. Results are presented in figures 2.10 and 2.11. We observe a power law decay with coefficients $1 + \mu$ between 1.3 and 1.6 for both cancelled and executed limit orders, with little variation among stocks. These results are somewhat different from the ones presented in previous studies: they are similar for executed limit orders, but our data exhibits a slower decay for cancelled orders. Note that the observed cut-off in the distribution for lifetimes above 20000 seconds is due to the fact that we do not take into account execution or cancellation of orders submitted on a previous day.

Figure 2.7: Placement of limit orders using the same best quote reference in linear scale. Data used for this computation is BNP Paribas order book from September 1st, 2007, until May 31st, 2008. [Plot: probability density function against $\Delta p$ for BNPP.PA, with a Gaussian fit.]
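The matching procedure used in Section 2.4 to reconstruct the lifetimes of cancelled orders can be sketched as follows. This is only an illustration under stated assumptions: the event format `(time, kind, price, quantity)` and the function name are hypothetical, and when several earlier submissions share the same price and quantity, this sketch picks the most recent unmatched one, a tie-breaking rule that the text does not specify.

```python
from collections import defaultdict


def cancelled_lifetimes(events):
    """Estimate lifetimes of cancelled limit orders without order identifiers.

    `events` is a time-ordered list of (time, kind, price, quantity) tuples,
    where kind is "submit" or "cancel". Each cancellation is matched with an
    earlier, still unmatched submission with the same price and quantity.
    """
    pending = defaultdict(list)   # (price, quantity) -> submission times
    lifetimes = []
    for time, kind, price, quantity in events:
        key = (price, quantity)
        if kind == "submit":
            pending[key].append(time)
        elif kind == "cancel":
            if pending[key]:
                # assumption: match with the most recent submission
                lifetimes.append(time - pending[key].pop())
            # otherwise the cancellation is unmatched and is discarded
    return lifetimes


# Hypothetical event stream (times in seconds, prices in ticks)
events = [
    (0.0, "submit", 100, 50),
    (1.0, "submit", 100, 50),
    (4.0, "cancel", 100, 50),   # matched with the submission at t = 1.0
    (5.0, "cancel", 101, 10),   # no matching submission: discarded
    (9.0, "cancel", 100, 50),   # matched with the submission at t = 0.0
]
lifetimes = cancelled_lifetimes(events)
```

Because several orders may share the same price and quantity, such a reconstruction only yields estimated lifetimes, which is why figures 2.10 and 2.11 refer to estimated rather than exact lifetimes.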
2.5 Intraday seasonality

Activity on financial markets is of course not constant throughout the day. Figure 2.12 (resp. 2.13) plots the (normalized) number of market (resp. limit) orders arriving in a 5-minute interval. A U-shape is clearly observed (an ordinary least-squares quadratic fit is plotted): the observed market activity is larger at the beginning and the end of the day, and quieter around mid-day. Such a U-shaped curve is well-known, see [40], for example. On our data, we observe that the number of orders in a 5-minute interval can vary by a factor of 10 throughout the day.

[73] note that the average number of orders submitted to the market in a period $\Delta T$ varies wildly during the day. The authors also observe that these quantities for market orders and limit orders are highly correlated. Such a type of intraday variation of the global market activity is a well-known fact, already observed in [40], for example.

Figure 2.8: Average quantity offered in the limit order book. [Plot: average numbers of shares (normalized by mean) against level of limit orders (negative axis: bids; positive axis: asks), for BNPP.PA, FTE.PA, RENA.PA and SOGN.PA.]

Figure 2.9: Average limit order book: price and depth. [Plot: average numbers of shares against average price of each level, for BNPP.PA.]

Figure 2.10: Distribution of estimated lifetime of cancelled limit orders. [Plot: probability functions against lifetime for cancelled limit orders, for BNPP.PA, FTE.PA, RENA.PA and SOGN.PA, with a power law $\propto x^{-1.4}$ for comparison.]

Figure 2.11: Distribution of estimated lifetime of executed limit orders. [Plot: probability functions against lifetime for executed limit orders, for BNPP.PA, FTE.PA, RENA.PA and SOGN.PA, with a power law $\propto x^{-1.5}$ for comparison.]
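The U-shape analysis of Section 2.5 (counting orders in 5-minute bins, then fitting a quadratic by ordinary least squares) can be sketched with the standard library alone. The function names, the rescaling of the time axis to [-1, 1] and the synthetic activity profile below are assumptions made for illustration; they are not the code or the data used for figures 2.12 and 2.13.

```python
def bin_counts(times, start, end, width=300):
    """Number of events in consecutive bins of `width` seconds on [start, end)."""
    n_bins = int((end - start) // width)
    counts = [0] * n_bins
    for t in times:
        index = int((t - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def quadratic_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x + c*x^2; returns [a, b, c].

    Solves the 3x3 normal equations by Gauss-Jordan elimination with
    partial pivoting.
    """
    moments = [sum(x ** p for x in xs) for p in range(5)]
    rhs = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    rows = [[moments[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(rows[i][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for row in range(3):
            if row != col:
                factor = rows[row][col] / rows[col][col]
                rows[row] = [a - factor * b for a, b in zip(rows[row], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]


# Binning check on a handful of hypothetical timestamps (seconds)
counts = bin_counts([0, 10, 299, 300, 899, 900], start=0, end=900)

# Synthetic, exactly quadratic activity profile on a rescaled time axis
xs = [i / 10 - 1 for i in range(21)]            # time of day mapped to [-1, 1]
ys = [1 + 0.5 * x + 2 * x * x for x in xs]      # normalized activity
a, b, c = quadratic_fit(xs, ys)                 # c > 0 indicates a U-shape
```

Rescaling the time of day to [-1, 1] before fitting keeps the normal equations well conditioned; with raw times in seconds (of order 10^4), the fourth-order moments would be very large compared with the other entries.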
Figure 2.12: Normalized average number of market orders in a 5-minute interval. (Number of market orders submitted in ∆t = 5 minutes against the time of day in seconds, for BNPP.PA and FTE.PA, each with its quadratic fit.)
Figure 2.13: Normalized average number of limit orders in a 5-minute interval. (Number of limit orders submitted in ∆t = 5 minutes against the time of day in seconds, for BNPP.PA and FTE.PA, each with its quadratic fit.)
Chapter 3 The order book shape as a function of the average size of limit orders
This short chapter presents an empirical study of the influence of the size of limit orders on the shape of the limit order book. It addresses the same point as, and confirms the theoretical findings of, the model introduced and studied in Chapter 7. We use Thomson-Reuters tick-by-tick data for fourteen stocks traded on the Paris stock exchange, from January 4th, 2010 to February 22nd, 2010. The fourteen stocks under investigation are: Air Liquide (AIRP.PA, chemicals), Alstom (ALSO.PA, transport and energy), Axa (AXAF.PA, insurance), BNP Paribas (BNPP.PA, banking), Bouygues (BOUY.PA, construction, telecom and media), Carrefour (CARR.PA, retail distribution), Danone (DANO.PA, milk and cereal products), Lagardère (LAGA.PA, media), Michelin (MICP.PA, tire manufacturing), Peugeot (PEUP.PA, vehicle manufacturing), Renault (RENA.PA, vehicle manufacturing), Sanofi (SASY.PA, healthcare), Vinci (SGEF.PA, construction and engineering), Ubisoft (UBIP.PA, video games). All these stocks except Ubisoft were included in the CAC 40 French index in January and February 2010, i.e. they are among the largest market capitalizations and most liquid stocks on the Paris stock exchange (see descriptive statistics below).
All movements on the first 10 limits of the ask side and the bid side of the order book are available, which allows us to reconstruct the evolution of the first limits of the order book during the day. Each trading day is divided into 12 thirty-minute intervals from 10am to 4pm. We obtain T = 391 intervals for each stock (see footnote 1). For each interval t = 1, . . . , T, and for each stock k = 1, . . . , 14, we compute the total number of limit orders $N^\lambda_{k,t}$ and market orders $N^\mu_{k,t}$, and the average sizes of limit orders $V^\lambda_{k,t}$ and market orders $V^\mu_{k,t}$. Table 3.1 gives the average number of orders and their volumes (the overline denotes the average over the time intervals: $\overline{N}^\mu_k = \frac{1}{T} \sum_{t=1}^{T} N^\mu_{k,t}$, and similarly for other quantities). The lowest average activity is observed on UBIP.PA and LAGA.PA (which are the only two stocks in the sample with fewer than 200 market orders and 4000 limit orders on average). The highest activity is observed on BNPP.PA (which is the only stock with more than 600 market orders and 6000 limit orders on average). The smallest sizes of orders are observed on AIRP.PA (108.25 and 205.4 for market and limit orders), and the largest sizes are observed on AXAF.PA (535.6 and 876.6 for market and limit orders).
Footnote 1: Three and a half days of trading are missing or have incomplete information in our dataset: January 15th, the morning of January 21st, February 18th and 19th.
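The per-interval quantities defined above (number of orders and average order size in each thirty-minute interval) can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be a list of (time in seconds after midnight, size) pairs for one stock, one day and one order type, and the function name `interval_statistics` is hypothetical. The convention of reporting 0.0 as the average size of an empty interval is a choice made here, not something specified in the text.

```python
def interval_statistics(orders, start=36000, end=57600, width=1800):
    """Count orders and average their sizes in consecutive intervals.

    orders: iterable of (time_in_seconds, size) pairs.
    Defaults correspond to 10am-4pm split into thirty-minute intervals.
    Returns one (count, average_size) pair per interval.
    """
    n_bins = (end - start) // width
    counts = [0] * n_bins
    volumes = [0] * n_bins
    for time, size in orders:
        if start <= time < end:  # orders outside 10am-4pm are ignored
            k = int((time - start) // width)
            counts[k] += 1
            volumes[k] += size
    return [(n, v / n if n else 0.0) for n, v in zip(counts, volumes)]


# Hypothetical orders: two in the first interval, one in the second,
# one just before 4pm and one at 4pm exactly (excluded).
orders = [(36000, 100), (36010, 300), (37800, 50), (57599, 10), (57600, 999)]
stats = interval_statistics(orders)
print(len(stats), stats[0], stats[1], stats[-1])
```

Applying the function separately to limit orders and to market orders gives, for each interval, the pairs (N, V) used in the rest of the chapter.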
Table 3.1: Basic statistics on the number of orders and the average volumes of orders per 30-minute time interval for each stock. Each cell gives the average over the time intervals, followed by (min, max).
Stock k    N^µ_k (min, max)     V^µ_k (min, max)         N^λ_k (min, max)         V^λ_k (min, max)
AIRP.PA    290.9 (59, 1015)     108.2 (60.2, 236.9)      4545.4 (972, 24999)      205.4 (153.6, 295.7)
ALSO.PA    349.7 (68, 1343)     181.3 (75.7, 310.8)      5304.2 (817, 31861)      289.2 (234.5, 389.6)
AXAF.PA    521.8 (119, 2169)    535.6 (311.4, 1037.4)    4560.6 (1162, 20963)     876.6 (603.4, 1648.3)
BNPP.PA    773.7 (160, 4326)    203.1 (113.1, 379.5)     6586.0 (946, 42939)      277.6 (189.2, 518.7)
BOUY.PA    218.3 (38, 1021)     227.4 (113.1, 421.9)     4544.5 (652, 25546)      369.7 (274.5, 475.2)
CARR.PA    309.5 (49, 1200)     275.3 (156.1, 517.3)     4391.4 (813, 14752)      513.3 (380.8, 879.6)
DANO.PA    385.7 (86, 2393)     214.4 (112.6, 820.3)     4922.1 (1372, 18186)     393.1 (286.1, 537.4)
LAGA.PA    140.8 (22, 544)      201.5 (87.2, 376.8)      3429.0 (632, 11319)      338.2 (219.0, 504.0)
MICP.PA    301.2 (45, 1114)     137.9 (74.4, 235.4)      5033.5 (1012, 23799)     240.7 (178.0, 315.1)
PEUP.PA    294.9 (57, 1170)     333.2 (159.1, 657.7)     3790.6 (967, 14053)      536.7 (316.8, 828.3)
RENA.PA    463.2 (103, 1500)    266.9 (133.7, 477.2)     4957.6 (1383, 27851)     384.3 (283.1, 539.8)
SASY.PA    494.0 (106, 1722)    241.4 (128.7, 511.7)     4749.1 (986, 22373)      421.2 (311.1, 670.2)
SGEF.PA    366.1 (70, 1373)     188.4 (107.7, 452.3)     5309.9 (983, 21372)      353.4 (274.2, 515.1)
UBIP.PA    153.3 (18, 998)      384.9 (198.4, 675.1)     1288.9 (240, 7078)       754.0 (451.4, 1121.0)
Figure 3.1: Mean-scaled shapes of the cumulative order book as a function of the distance (in ticks) from the opposite best price for the 14 stocks studied.
We also compute the average cumulative order book shape at 1 to 10 ticks from the best opposite side. The average cumulative depth $B^i_{k,t}$ is thus the quantity available in the order book for stock k in the price range {b(t) + 1, . . . , b(t) + i} (in ticks) for ask limit orders, or in the price range {a(t) − i, . . . , a(t) − 1} for bid limit orders, averaged over time during interval t. For i less than or equal to 10, our data is always complete since the first ten limits are available.
For larger i however, we may not have the full data: $B^{10+j}_{k,t}$ is not exact if the spread reaches a level less than or equal to j ticks during interval t. Hence i is less than or equal to 10 in the following empirical analysis. Figure 3.1 plots the empirical average shapes $\overline{B}^i_k = \frac{1}{T} \sum_{t=1}^{T} B^i_{k,t}$ (arbitrarily scaled to $\overline{B}^{10}_k = 1$ for easier comparison). Motivated by the theoretical results from Chapter 7, we investigate the influence of the number of limit orders $N^\lambda_{k,t}$ and their average size $V^\lambda_{k,t}$ on the depth of the order book at the first limits. We expect a negative relationship with the number of limit orders and a positive relationship with the size of these orders, all other things being equal, i.e. the global market activity being held constant. A fairly natural proxy for the market activity is the traded volume (transactions) per period. The total volume of market orders submitted during interval t for the stock k is $N^\mu_{k,t} V^\mu_{k,t}$. We may also consider the total volume of incoming orders $N^\lambda_{k,t} V^\lambda_{k,t}$. Therefore, we have the following regression model for some book depth $B_{k,t}$:
$$B_{k,t} = \beta_1 N^\lambda_{k,t} + \beta_2 V^\lambda_{k,t} + \beta_3 N^\lambda_{k,t} V^\lambda_{k,t} + \beta_4 N^\mu_{k,t} V^\mu_{k,t} + \epsilon_{k,t}. \qquad (3.0.1)$$
We test this regression using $B_{k,t} = B^5_{k,t}$ (cumulative depth up to 5 ticks away from the best opposite price), then using $B_{k,t} = B^{10}_{k,t}$ (cumulative depth up to 10 ticks away from the best opposite price) and finally using $B_{k,t} = B^{10}_{k,t} - B^5_{k,t}$ (cumulative volume between 6 and 10 ticks away from the best opposite price). For each of these three models, we provide results with or without the interaction term $N^\lambda_{k,t} V^\lambda_{k,t}$.
Figure 3.2: Mean-scaled number of market orders as a function of the time of day (in seconds) for the 14 stocks studied.
All models are estimated as panel regression models with fixed effects, i.e. the error term $\epsilon_{k,t}$ is the sum of a non-random stock-specific component $\delta_k$ (fixed effect) and a random component $\eta_{k,t}$.
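A fixed-effects panel regression of this kind can be sketched with the standard "within" transformation: subtracting the stock-wise mean from the dependent variable and from each regressor removes the stock-specific component, after which ordinary least squares is applied without intercept. The sketch below is an assumption about how such a model may be estimated, not the estimation code used for the results of this chapter; the names `within_estimator` and `solve_linear_system`, the three synthetic stocks and the coefficient values are all hypothetical.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def within_estimator(panel):
    """panel: dict stock -> list of (y, x) pairs, x being the list of regressors.

    Returns the OLS coefficients after removing stock-wise means,
    which absorbs the stock-specific fixed effects.
    """
    rows, targets = [], []
    for observations in panel.values():
        n_obs = len(observations)
        n_reg = len(observations[0][1])
        y_mean = sum(y for y, _ in observations) / n_obs
        x_mean = [sum(x[j] for _, x in observations) / n_obs for j in range(n_reg)]
        for y, x in observations:
            targets.append(y - y_mean)
            rows.append([x[j] - x_mean[j] for j in range(n_reg)])
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    moments = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(gram, moments)


# Synthetic, noise-free panel (hypothetical values) with regressors
# N_limit, V_limit, N_limit * V_limit and a traded-volume proxy.
rng = random.Random(0)
true_beta = [-0.3, 15.0, -0.002, 0.01]
fixed_effects = {"A": 5.0, "B": -2.0, "C": 10.0}
panel = {}
for stock, delta in fixed_effects.items():
    observations = []
    for _ in range(50):
        n_limit = rng.uniform(1.0, 10.0)
        v_limit = rng.uniform(1.0, 5.0)
        traded = rng.uniform(0.0, 20.0)
        x = [n_limit, v_limit, n_limit * v_limit, traded]
        y = delta + sum(b * xi for b, xi in zip(true_beta, x))
        observations.append((y, x))
    panel[stock] = observations
beta_hat = within_estimator(panel)
print(beta_hat)
```

Because the synthetic data contains no noise, the estimated coefficients coincide with the chosen ones up to rounding, whatever the stock-specific offsets; with real data one would also compute standard errors, which this sketch leaves out.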
While the regression coefficients are stock-independent, the variables $\delta_k$ capture the idiosyncratic characteristics of each stock. Note that we use data from 10am to 4pm each day, in order to avoid very active periods, right after the opening of the market or before its closing, in which our stationary Poisson model might not be relevant. By concentrating on the heart of the trading day, we focus on smoother variations of the studied variables. However, even with this restriction, the data may still exhibit some intraday seasonality, such as the well-known U-shaped curve of the number of submissions of orders, or the fact that the order book seems to grow slightly fuller during the day and decline at the end. This intraday seasonality may be observed by computing, for each stock, the (intraday) seasonal means of the variables of the model, i.e. the average of a given variable for a given stock at a given time of the day across the whole sample. This is illustrated for example in figures 3.2 and 3.3 where we plot, for the 14 stocks studied, the (mean-scaled) seasonal averages of the number of submitted market orders and the (mean-scaled) seasonal averages of the cumulative size of the order book up to the tenth limit. For the sake of completeness, we run the statistical regressions defined in equation (3.0.1) both on raw data and on deseasonalized data (by subtracting the seasonal mean). Results are given in table 3.2 in the first case, and in table 3.3 in the second.

Table 3.2: Panel regression results for the models defined in equation (3.0.1), using raw data, for $B_{k,t} = B^5_{k,t}$ (top panel), $B_{k,t} = B^{10}_{k,t}$ (middle panel) and $B_{k,t} = B^{10}_{k,t} - B^5_{k,t}$ (lower panel). In all cases, the data has 14 stocks and T = 391 time intervals, i.e. 5474 points.

Top panel, B^5_{k,t}, without N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.24455434    0.02008400    -12.177     < 2.2e-16    ***
V^λ_{k,t}               8.70817318     0.75084050    11.598      < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     n/a            n/a           n/a         n/a
N^µ_{k,t} V^µ_{k,t}     0.00882725     0.00079296    11.132      < 2.2e-16    ***
R2 = 0.079275; F-statistic = 156.617 (p-value < 2.2e-16, ***)

Top panel, B^5_{k,t}, with N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -8.6511e-02    3.6046e-02    -2.4000     0.01643      *
V^λ_{k,t}               1.0374e+01     8.1288e-01    12.7622     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     -4.8483e-04    9.1924e-05    -5.2743     1.384e-07    ***
N^µ_{k,t} V^µ_{k,t}     1.0794e-02     8.7448e-04    12.3430     < 2.2e-16    ***
R2 = 0.083945; F-statistic = 124.994 (p-value < 2.2e-16, ***)

Middle panel, B^10_{k,t}, without N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.4576647     0.0439167     -10.4212    < 2.2e-16    ***
V^λ_{k,t}               39.4968315     1.6418268     24.0566     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     n/a            n/a           n/a         n/a
N^µ_{k,t} V^µ_{k,t}     0.0097127      0.0017339     5.6016      2.228e-08    ***
R2 = 0.14202; F-statistic = 301.1 (p-value < 2.2e-16, ***)

Middle panel, B^10_{k,t}, with N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.25599592    0.07895182    -3.2424     0.001192     **
V^λ_{k,t}               41.62260408    1.78046702    23.3774     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     -0.00061866    0.00020134    -3.0727     0.002132     **
N^µ_{k,t} V^µ_{k,t}     0.01222198     0.00191540    6.3809      1.906e-10    ***
R2 = 0.14303; F-statistic = 228.534 (p-value < 2.2e-16, ***)

Lower panel, B^10_{k,t} − B^5_{k,t}, without N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.21311036    0.02995514    -7.1143     1.27e-12     ***
V^λ_{k,t}               30.78865834    1.11987302    27.4930     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     n/a            n/a           n/a         n/a
N^µ_{k,t} V^µ_{k,t}     0.00088543     0.00118269    0.7487      0.4541
R2 = 0.1491; F-statistic = 318.743 (p-value < 2.2e-16, ***)

Lower panel, B^10_{k,t} − B^5_{k,t}, with N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.16948494    0.05389411    -3.1448     0.001671     **
V^λ_{k,t}               31.24851002    1.21538277    25.7108     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     -0.00013383    0.00013744    -0.9737     0.330234
N^µ_{k,t} V^µ_{k,t}     0.00142825     0.00130749    1.0924      0.274723
R2 = 0.14925; F-statistic = 239.292 (p-value < 2.2e-16, ***)

Table 3.3: Panel regression results for the models defined in equation (3.0.1), using deseasonalized data, for $B_{k,t} = B^5_{k,t}$ (top panel), $B_{k,t} = B^{10}_{k,t}$ (middle panel) and $B_{k,t} = B^{10}_{k,t} - B^5_{k,t}$ (lower panel). In all cases, the data has 14 stocks and T = 391 time intervals, i.e. 5474 points.

Top panel, B^5_{k,t}, without N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.2693183     0.0199388     -13.5072    < 2.2e-16    ***
V^λ_{k,t}               14.8579856     0.6885427     21.5789     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     n/a            n/a           n/a         n/a
N^µ_{k,t} V^µ_{k,t}     -0.0167731     0.0036252     -4.6268     3.799e-06    ***
R2 = 0.10337; F-statistic = 209.718 (p-value < 2.2e-16, ***)

Top panel, B^5_{k,t}, with N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.2712167     0.0198836     -13.6402    < 2.2e-16    ***
V^λ_{k,t}               15.2819415     0.6905058     22.1315     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     -0.0017696     0.0003085     -5.7361     1.021e-08    ***
N^µ_{k,t} V^µ_{k,t}     -0.0064651     0.0040367     -1.6016     0.1093
R2 = 0.10875; F-statistic = 166.433 (p-value < 2.2e-16, ***)

Middle panel, B^10_{k,t}, without N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.6663429     0.0423425     -15.7370    < 2.2e-16    ***
V^λ_{k,t}               50.9522688     1.4622037     34.8462     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     n/a            n/a           n/a         n/a
N^µ_{k,t} V^µ_{k,t}     -0.0429541     0.0076985     -5.5795     2.528e-08    ***
R2 = 0.20566; F-statistic = 470.96 (p-value < 2.2e-16, ***)

Middle panel, B^10_{k,t}, with N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.67127949    0.04216152    -15.9216    < 2.2e-16    ***
V^λ_{k,t}               52.05475178    1.46416246    35.5526     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     -0.00460179    0.00065416    -7.0347     2.243e-12    ***
N^µ_{k,t} V^µ_{k,t}     -0.01614842    0.00855951    -1.8866     0.05927      .
R2 = 0.2128; F-statistic = 368.73 (p-value < 2.2e-16, ***)

Lower panel, B^10_{k,t} − B^5_{k,t}, without N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.3970246     0.0293712     -13.5175    < 2.2e-16    ***
V^λ_{k,t}               36.0942832     1.0142678     35.5865     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     n/a            n/a           n/a         n/a
N^µ_{k,t} V^µ_{k,t}     -0.0261809     0.0053401     -4.9027     9.728e-07    ***
R2 = 0.20463; F-statistic = 467.995 (p-value < 2.2e-16, ***)

Lower panel, B^10_{k,t} − B^5_{k,t}, with N^λ_{k,t} V^λ_{k,t}
Variable                Est.           Std Err.      t-value     p-value      Sig.
N^λ_{k,t}               -0.4000628     0.0292738     -13.6662    < 2.2e-16    ***
V^λ_{k,t}               36.7728102     1.0166060     36.1721     < 2.2e-16    ***
N^λ_{k,t} V^λ_{k,t}     -0.0028322     0.0004542     -6.2356     4.84e-10     ***
N^µ_{k,t} V^µ_{k,t}     -0.0096833     0.0059431     -1.6293     0.1033
R2 = 0.21026; F-statistic = 363.153 (p-value < 2.2e-16, ***)

Figure 3.3: Mean-scaled shapes of the cumulative order book up to 10 ticks away from the opposite best price as a function of the time of day (in seconds), for the 14 stocks studied.

In all cases, irrespective of the presence of the interaction term $N^\lambda_{k,t} V^\lambda_{k,t}$ and irrespective of the deseasonalization, we observe a positive relationship between the order book depth and the average size of limit orders $V^\lambda_{k,t}$, and a negative relationship between the order book depth and the number of limit orders $N^\lambda_{k,t}$. Note also that the estimated β's for these two quantities are all significant at the highest level in all cases using deseasonalized data, and at the 0.2% level for all but one case using the raw data (2% level in this worst case). Therefore, all other things being equal, it appears that we have identified the effect described in the theoretical model of Chapter 7, in particular in section 7.4: for a given total volume of arriving limit and market orders, $N^\lambda_{k,t} V^\lambda_{k,t}$ and $N^\mu_{k,t} V^\mu_{k,t}$, the relative size of the limit orders has a strong influence on the average shape of the order book: the larger the arriving orders, the deeper the order book; the more arriving limit orders, the shallower the order book. The average shape of the order book is deeper when a few large limit orders are submitted than when many small limit orders are submitted.
Remark 3.0.1 As a side comment, we may also look at the influence of the global volume of trades $N^\mu_{k,t} V^\mu_{k,t}$. Using raw data, we observe a positive influence of this term on the depth of the order book. At first glance, this is in line with the phenomenon observed for example in [255], where a positive relationship between the number of trades and the order book slope of the first 5 limits of the order book is exhibited. This apparent similarity is to be taken with great caution, since the slope and the depth are two different variables: their relationship and the possible link between their evolutions are not known. Note in particular that the positive influence of the global volume of trades does not hold using deseasonalized data in our sample. A second observation made by [255] is that the influence of market activity on the order book depth is much stronger closer to the spread, and then decreases when taking into account further limits.
We also observe this phenomenon using raw data: the lower panel of table 3.2 shows that this influence is no longer significant if we take only the "furthest" limits, i.e. the limits between 6 and 10 ticks away from the best price. This is however unclear with deseasonalized data.
Chapter 4 Empirical evidence of market making and market taking
In this chapter, based on [253], we present an empirical study of the time intervals between orders of various types on several order-driven markets (equity, bond futures, index futures), with an emphasis on the mutual excitation of orders. This empirical study lays the ground for the modeling approach presented in Chapter 8 and the numerical simulations presented in Chapter 9, where in both cases a basic order book model is enhanced with a dependence structure of arrival times stemming from these empirical findings.
4.1 Arrival times of orders: event time is not enough
As seen in Chapter 2, the Poisson hypothesis for the arrival times of orders of different kinds does not stand up to careful scrutiny. However, whereas there already exists a vast literature documenting the dependence structure of the order flow (the arrival of market orders), see e.g. [47], little is known about the interplay between liquidity taking and providing, one of the most important features of order-driven markets. As of today, the study of arrival times of orders in an order book has not been a primary focus in order book modelling. Many toy models leave this dimension aside when trying to understand the complex dynamics of an order book. In most order-driven market models such as [92, 232, 18], and in some order book models as well (e.g. [279]), a time step in the model is an arbitrary unit of time during which many events may happen. We may call that clock aggregated time. In most order book models such as [73, 248, 97], one order is simulated per time step with given probabilities, i.e. these models use the clock known as event time.
In the simple case where these probabilities are constant and independent of the state of the model, such a time treatment is equivalent to the assumption that order flows are homogeneous Poisson processes. A likely reason for the use of non-physical time in order book modelling (leaving aside the fact that models can be sufficiently complicated without adding another dimension) is that many puzzling empirical observations (see e.g. [67] for a review of some of the well-known “stylized facts”) can be made in event time (e.g. autocorrelation of the signs of limit and market orders) or in aggregated time (e.g. volatility clustering). However, it is clear that physical (calendar) time has to be taken into account for the modelling of a realistic order book. For example, market activity varies widely, and intraday seasonality is often observed as a well-known U-shaped pattern. Even for a short time scale model (a few minutes, a few hours), durations of orders (i.e. time intervals between orders) are very broadly distributed. Hence, the Poisson assumption and its exponential distribution of inter-arrival times have to be discarded, and models must take into account the way these irregular flows of orders affect the empirical properties studied on order books. Let us give one illustration. In figure 4.1, we plot examples of the empirical density function of the observed spread in event time (i.e. the spread is measured each time an event happens in the order book), and in physical (calendar) time (i.e. measures are weighted by the time interval during which the order book is idle). It appears that the density of the most probable values of the time-weighted distribution is higher than in the event time case. Symmetrically, the density of the least probable event is even smaller when physical time is taken into account. This tells us a few things about the dynamics of the order book, which could be summarized as follows: the wider the spread, the faster its tightening. We can get another insight into this empirical property by measuring on our data the average waiting time before the next event, conditionally on the spread size. When computed on the lower one-third quantile (small spread), the average waiting time is 320 milliseconds. When computed on the upper one-third quantile (large spread), this average waiting time is 200 milliseconds. These observations complement some of those found in the early paper [40].
Figure 4.1: Empirical density function of the distribution of the bid-ask spread in event time and in physical (calendar) time. In inset, same data using a semi-log scale. This graph has been computed with 15 four-hour samples of tick data on the BNPP.PA stock (see section A.3 for details).
4.2 Empirical evidence of “market making”
Our idea for an enhanced model of order streams is based on the following observation: once a market order has been placed, the next limit order is likely to take place faster than usual. To illustrate this, we compute for all our studied assets:
• the empirical probability density function (pdf) of the time intervals of the counting process of all orders (limit orders and market orders mixed), i.e. the time step between any order book event (other than cancellation)
• and the empirical density function of the time intervals between a market order and the immediately following limit order.
If an independent Poisson assumption held, then these empirical distributions should be identical. However, we observe a very high peak for short time intervals in the second case.
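The two families of time intervals listed above can be extracted from a time-ordered event stream along the following lines. This is a minimal sketch under stated assumptions: events are (timestamp, kind) pairs with kind equal to 'market' or 'limit', cancellations have already been filtered out, and when several market orders precede a limit order the interval is measured from the most recent one. The function name `inter_event_durations` and the sample events (timestamps in milliseconds) are hypothetical.

```python
def inter_event_durations(events):
    """Return (all_durations, market_to_limit_durations).

    events: list of (timestamp, kind) sorted by timestamp,
    with kind in {'market', 'limit'}.
    """
    # Durations between any two consecutive orders, whatever their type.
    all_durations = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]

    # Durations between a market order and the next limit order.
    market_to_limit = []
    last_market_time = None  # most recent market order still waiting for a limit order
    for timestamp, kind in events:
        if kind == "market":
            last_market_time = timestamp
        elif kind == "limit" and last_market_time is not None:
            market_to_limit.append(timestamp - last_market_time)
            last_market_time = None
    return all_durations, market_to_limit


# Hypothetical event stream, timestamps in milliseconds.
events = [
    (0, "limit"),
    (500, "market"),
    (520, "limit"),
    (1200, "limit"),
    (2000, "market"),
    (2030, "limit"),
]
all_durations, market_to_limit = inter_event_durations(events)
print(all_durations)
print(market_to_limit)
```

Histograms of the two returned lists are the empirical counterparts of the two densities compared in this section.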
The first moments of these empirical distributions differ significantly: on the studied assets, we find that the average time between a market order and the following limit order is 1.3 (BNPP.PA) to 2.6 (LAGA.PA) times shorter than the average time between two random consecutive events. On the graphs shown in figure 4.2, we plot the full empirical distributions for four of the five studied assets (see footnote 1). We observe their broad distribution and the sharp peak for the shorter times: on the Footsie future market for example, 40% of the measured time steps between consecutive events are less than 50 milliseconds; this figure jumps to nearly 70% when considering only market orders and their following limit orders. This observation is evidence of some sort of market-making behaviour of some participants on those markets. It appears that the submission of market orders is monitored and triggers automatic limit orders that add volumes in the order book (and not far from the best quotes, since we only monitor the five best limits). In order to confirm this finding, we perform non-parametric statistical tests on the measured data. For all four studied markets, omnibus Kolmogorov-Smirnov and Cramér-von Mises tests performed on random samples establish that the considered distributions are statistically different. Assuming a common shape, a Wilcoxon-Mann-Whitney U test indicates that the distribution of time intervals between a market order and the following limit order is clearly shifted to the left compared to the distribution of time intervals between any orders, i.e. the average “limit following market” reaction time is shorter than the average time interval between random consecutive orders. Note that there is no link between the sign of the market order and the sign of the following limit order. For example for the BNP Paribas (resp. Peugeot and Lagardère) stock, they have the same sign in 48.8% (resp. 51.9% and 50.7%) of the observations.
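The Kolmogorov-Smirnov comparison mentioned above relies on the largest gap between two empirical cumulative distribution functions. The following sketch computes only that two-sample statistic, on hypothetical samples; it does not compute a p-value, and it is not the test procedure used for the results reported here. In practice a statistical library would normally be used instead; the function name `ks_statistic` is an assumption made for this illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: sup |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


# Hypothetical duration samples (arbitrary units).
all_orders = [1, 2, 3, 4]
after_market = [3, 4, 5, 6]
print(ks_statistic(all_orders, after_market))
```

The loop can stop as soon as one sample is exhausted, because from that point on the gap between the two empirical distribution functions can only shrink.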
More interestingly, the “limit following market” property holds regardless of the side on which the following limit order is submitted. In figure 4.3, we plot the empirical distributions of time intervals between a market order and the following limit order, conditionally on the side of the limit order: the same side as the market order or the opposite one. It appears for all studied assets that both distributions are roughly identical. In other words, we cannot distinguish in the data whether liquidity is added where the market order has been submitted or on the opposite side. Therefore, we do not infer any empirical property of placement: when a market order is submitted, the intensity of the limit order process increases on both sides of the order book. The effect we have thus identified is a phenomenon of liquidity replenishment of an order book after the execution of a trade.
Footnote 1: Observations are identical on all the studied assets.
Figure 4.2: Empirical density function of the distribution of the time intervals between two consecutive orders (any type, market or limit) and empirical density function of the distribution of the time intervals between a market order and the immediately following limit order. X-axis is scaled in seconds. In insets, same data using a log-log scale. Studied assets: BNPP.PA (top left), LAGA.PA (top right), FEIZ9 (bottom left), FFIZ9 (bottom right).
The fact that it is a bilateral effect makes its consequences similar to “market making”, even though there is obviously no market maker involved on the studied markets.
4.3 A reciprocal “market following limit” effect?
We now check whether a similar or opposite distortion is to be found on market orders when they follow limit orders. To investigate this, we compute for all our studied assets the “reciprocal” measures:
• the empirical probability density function (pdf) of the time intervals of the counting process of all orders (limit orders and market orders mixed), i.e. the time step between any order book event (other than cancellation)
• and the empirical density function of the time step between a market order and the previous limit order.
As previously, if an independent Poisson assumption held, then these empirical distributions should be identical. Results for four assets are shown in figure 4.4. Contrary to the previous case, no effect is easily interpreted. For the three stocks (BNPP.PA, LAGA.PA and PEUP.PA (not shown)), it seems that the empirical distribution is less peaked for small time intervals, but the difference is much less pronounced than in the previous case. As for the FEI and FFI markets, the two distributions are even closer. Non-parametric tests confirm these observations.
Figure 4.3: Empirical density function of the distribution of the time intervals between a market order and the immediately following limit order, depending on whether the orders have been submitted on the same side or on opposite sides. X-axis is scaled in seconds. In insets, same data using a log-log scale. Studied assets: BNPP.PA (left), LAGA.PA (right).
Performed on data from the three equity markets, Kolmogorov tests indicate different distributions and Wilcoxon tests support the observation that time intervals between a limit order and a following market order are stochastically larger than time intervals between unidentified orders. As for the futures markets on Footsie (FFI) and 3-month Euribor (FEI), Kolmogorov tests do not indicate differences between the two observed distributions, and the result is confirmed by a Wilcoxon test that concludes in favour of the equality of the means. Note that [23], using a different methodology akin to that of [113], does show a rather clear dependence between limit orders that move the price and subsequent market orders.
Figure 4.4: Empirical density function of the distribution of the time intervals between two consecutive orders (any type, market or limit) and empirical density function of the distribution of the time intervals between a limit order and an immediately following market order. In insets, same data using a log-log scale. Studied assets: BNPP.PA (top left), LAGA.PA (top right), FEIZ9 (bottom left), FFIZ9 (bottom right).
Part II Mathematical modelling of limit order books
Chapter 5 Agent-based modeling of limit order books: a survey
In this chapter, we review agent-based models of limit order books, that is, models depicting, at the individual agent level, possibly from a statistical point of view, the interactions that lead to a transaction on a financial market.
This survey is far from exhaustive: we have selected models that we feel are representative of some important, specific trends in agent-based modelling.
5.1 Agent-based models as a mechanism for stylized facts
Although they have been known, at least partly, for a long time ([237] gives a reference for a paper dealing with non-normality of price time series in 1915, followed by several others in the 1920's), the stylized facts of asset returns (fat tails, volatility clustering, leverage, etc.) have often been left aside when modelling financial markets. They were even often referred to as "anomalous" characteristics, as if observations failed to comply with theory. Much has been done these past fifteen years in order to address this challenge and provide new models that can reproduce these facts. These recent developments have been built on top of early attempts at modelling mechanisms of financial markets with agents. For example, [305], investigating some rules of the SEC (see footnote 1), or [149], investigating double-auction microstructure, belong to those historical works. It seems that the first modern attempts at that type of model were made in the field of behavioural finance. This field aims at improving financial modelling based on the psychology and sociology of the investors. Models are built with agents who can exchange shares of stocks according to exogenously defined utility functions reflecting their preferences and risk aversions. [219] shows that this type of modelling offers good flexibility for reproducing some of the stylized facts and [218] provides a review of that type of model. However, although achieving some of their goals, these models suffer from many drawbacks: first, they are very complex, and it may be a very difficult task to identify the role of their numerous parameters and the types of dependence on these parameters; second, the chosen utility functions do not necessarily reflect what is observed on the mechanisms of a financial market.
Footnote 1: Securities and Exchange Commission, the agency supervising the organization and regulation of the US stock exchanges.
A significant change in modelling appears with much simpler models implementing only well-identified and presumably realistic “behaviour”: [92] uses noise traders that are subject to “herding”, i.e. form random clusters of traders sharing the same view on the market. The idea is used in [283] as well. A complementary approach is to characterize traders as fundamentalists, chartists or noise traders. [232] propose an agent-based model in which these types of traders interact. In all these models, the price variation directly results from the excess demand: at each time step, all agents submit orders and the resulting price is computed. Therefore, everything is cleared at each time step and there is no order book structure to keep track of orders. One big step is made with models that really take limit orders into account and keep them in an order book once submitted and not executed. [81] build an agent-based model where all traders submit orders depending on the three elements identified in [232]: chartists, fundamentalists, noise. Orders submitted are then stored in a persistent order book. In fact, one of the first simple models with this feature was proposed in [29]. In this model, orders are particles moving along a price line, and each collision is a transaction. Due to numerous caveats in this model, the authors propose in the same paper an extension with fundamentalist and noise traders in the spirit of the models mentioned previously. [240] goes further in the modelling of trading mechanisms by taking into account fixed limit orders and market orders that trigger transactions, and really simulating the order book. This model was analytically solved using a mean-field approximation by [301].
Following this trend of modelling, the more or less “rational” agents composing models in economics tend to vanish and be replaced by the notion of flows: orders are no longer submitted by an agent following a strategic behaviour, but are viewed as an arriving flow whose properties are to be determined by empirical observations of market mechanisms. Thus, the modelling of order books calls for more “stylized facts”, i.e. empirical properties that can be observed on a large number of order-driven markets. [40] is a thorough empirical study of the order flows in the Paris Bourse a few years after its complete computerization. Market orders, limit orders, times of arrival and placement are studied. [49] and [275] provide statistical features of the order book itself. These empirical studies, which have been reviewed in the first part of this review, are the foundation for “zero-intelligence” models, in which “stylized facts” are expected to be reproduced by the properties of the order flows and the structure of the order book itself, without considering exogenous “rationality”. [73] propose a simple model of order flows: limit orders are deposited in the order book and can be removed if not executed, in a simple deposition-evaporation process. [49] use this type of model with empirical distributions as inputs. As of today, the most complete empirical model is, to our knowledge, [248], where order placement and cancellation models are proposed and fitted on empirical data. Finally, new challenges arise as scientists try to identify simple mechanisms that allow an agent-based model to reproduce non-trivial behaviours: herding behaviour in [92], dynamic price placement in [280], threshold behaviour in [91], etc.

5.2 Early order-driven market modelling: market microstructure and policy issues

The pioneering works in the simulation of financial markets aimed to study market regulations.
The very first one, [305], tries to investigate the effect of regulations of the SEC on American stock markets, using empirical data from the 1920s and the 1950s. Twenty years later, at the start of the computerization of financial markets, [165] implements a simulator in order to test the feasibility of automated market making. Instead of reviewing the huge microstructure literature, we refer the reader to the well-known books by [261] or [169], for example, for a panorama of this branch of finance. However, by presenting a small selection of early models, we underline here the grounding of recent order book modelling.

5.2.1 A pioneer order book model

To our knowledge, the first attempt to simulate a financial market was by [305]. This paper was a biting and controversial reaction to the Report of the Special Study of the Securities Markets of the SEC ([88]), whose aim was to “study the adequacy of rules of the exchange and that the New York stock exchange undertakes to regulate its members in all of their activities” ([89]). According to Stigler, this SEC report lacks rigorous tests when investigating the effects of regulation on financial markets. Stating that “demand and supply are [...] erratic flows with sequences of bids and asks dependent upon the random circumstances of individual traders”, he proposes a simple simulation model to investigate the evolution of the market.

In this model, limited by the simulation capabilities of 1964, the price is constrained within L = 10 ticks. (Limit) orders are randomly drawn, in trade time, as follows: they can be bid or ask orders with equal probability, and their price level is uniformly distributed on the price grid. Each time an order crosses the opposite best quote, it is a market order. All orders are of size one. Orders not executed N = 25 time steps after their submission are cancelled. Thus, N is the maximum number of orders available in the order book.
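As an illustration, the rules just described can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a reconstruction of Stigler's original hand computation: the function name `simulate_stigler` is hypothetical, one loop iteration is taken to be one order submission, a crossing order is assumed to trade at the price of the resting quote, and time priority is assumed among resting orders at the same price.

```python
import random

def simulate_stigler(n_steps, L=10, N=25, seed=0):
    """Sketch of a Stigler-type market: unit orders, uniform prices on {1..L}."""
    rng = random.Random(seed)
    bids, asks = [], []          # resting orders stored as (price, submission_step)
    trades = []                  # recorded transaction prices
    for t in range(n_steps):
        # Orders still pending N steps after submission are cancelled.
        bids = [(p, s) for p, s in bids if t - s < N]
        asks = [(p, s) for p, s in asks if t - s < N]
        price = rng.randint(1, L)            # uniform on the price grid
        if rng.random() < 0.5:               # bid order
            best = min(asks, default=None)   # lowest ask, oldest first
            if best is not None and price >= best[0]:
                asks.remove(best)            # crossing order: acts as a market order
                trades.append(best[0])       # assumption: trade at the resting price
            else:
                bids.append((price, t))
        else:                                # ask order
            best = max(bids, key=lambda o: (o[0], -o[1]), default=None)
            if best is not None and price <= best[0]:
                bids.remove(best)
                trades.append(best[0])
            else:
                asks.append((price, t))
    return trades, bids, asks

trades, bids, asks = simulate_stigler(2000)
```

With this bookkeeping at most one order is added per step and every order lives fewer than N steps, so the book never holds more than N orders, in line with the remark above.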
In the original paper, a run of a hundred trades was manually computed using tables of random numbers. Of course, no particular results concerning the “stylized facts” of financial time series were expected at that time. However, in his review of some order book models, [302] simulates a similar model, with parameters L = 5000 and N = 5000, and shows that price returns are not Gaussian: their distribution exhibits a power law with exponent 0.3, far from empirical data. As expected, the limitation L is responsible for a sharp cut-off of the tails of this distribution.

5.2.2 Microstructure of the double auction

[149] provides an early study of the double auction market with a point of view that does not ignore temporal structure, and really defines order flows. Price is discrete and constrained to be within $\{p_1, p_L\}$. Buy and sell orders are assumed to be submitted according to two Poisson processes of intensities $\lambda$ and $\mu$. Each time an order crosses the best opposite quote, it is a market order. All quantities are assumed to be equal to one. The aim of the author was to provide an empirical study of the market microstructure. The main result of his Poisson model was to support the idea that the negative correlation of consecutive price changes is linked to the microstructure of the double auction exchange. This paper is very interesting because it can be seen as a precursor that clearly sets out the challenges of order book modelling. First, the mathematical formulation is promising. With its fixed constrained prices, [149] can define the state of the order book at a given time as the vector $(n_i)_{i=1,\ldots,L}$ of awaiting orders (negative quantity for bid orders, positive for ask orders). Later analytical models use similar vector formulations that can be cast into known mathematical processes in order to extract analytical results; see e.g. [97], reviewed below.
Second, the author points out that, although the Poisson model is simple, an analytical solution is hard to work out, and he provides Monte Carlo simulations. The need for numerical and empirical developments is a constant in all following models. Third, the structural question is clearly asked in the conclusion of the paper: “Does the auction-market model imply the characteristic leptokurtosis seen in empirical security price changes?”. The computerization of markets that was about to take place when this research was published (Toronto’s CATS² opened a year later, in 1977) motivated many subsequent papers on the subject. As an example, let us cite here [165], who built a model to choose the right mechanism for setting clearing prices in a multi-securities market.

5.2.3 Zero-intelligence

In the models by [305] and [149], orders are submitted in a purely random way on the grid of possible prices. Traders do not observe the market here and do not act according to a given strategy. Thus, these two contributions clearly belong to a class of “zero-intelligence” models. To our knowledge, [156] is the first paper to introduce the expression “zero-intelligence” in order to describe the non-strategic behaviour of traders. It is applied to traders that submit random orders in a double auction market. The expression has since been widely used in agent-based modelling, sometimes with a slightly different meaning (see the more recent models described in this review). In [156], two types of zero-intelligence traders are studied. The first are unconstrained zero-intelligence traders. These agents can submit random orders at random prices, within the allowed price range $\{1, \ldots, L\}$. The second are constrained zero-intelligence traders. These agents submit random orders as well, but with the constraint that they cannot cross their given reference price $p_{R_i}$: constrained zero-intelligence traders are not allowed to buy or sell at a loss.
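The difference between the two types of traders can be made concrete with a short sketch. The function name `zi_price` and the choice of a uniform distribution over the admissible prices are assumptions made for the illustration; the text above only specifies that prices are random within the allowed range and that constrained traders never cross their reference price.

```python
import random

def zi_price(side, L, rng, reference_price=None):
    """Draw a quote for a zero-intelligence trader on the grid {1, ..., L}.

    reference_price=None gives an unconstrained trader; otherwise a buyer
    never bids above, and a seller never asks below, the reference price.
    """
    if reference_price is None:
        return rng.randint(1, L)                 # unconstrained: anywhere on the grid
    if side == "buy":
        return rng.randint(1, reference_price)   # never buy at a loss
    return rng.randint(reference_price, L)       # never sell at a loss

rng = random.Random(1)
unconstrained = [zi_price("buy", 100, rng) for _ in range(5)]
constrained = [zi_price("buy", 100, rng, reference_price=40) for _ in range(5)]
```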
The aim of the authors was to show that double auction markets exhibit an intrinsic “allocative efficiency” (the ratio of the total profit earned by the traders to the maximum possible profit) even with zero-intelligence traders. An interesting fact is that in this experiment, price series resulting from actions by unconstrained zero-intelligence traders are much more volatile than the ones obtained with constrained traders. This fact will be confirmed in later models where “fundamentalist” traders, having a reference price, are expected to stabilize the market (see [322] or [232] below). Note that the results have been criticized by [86], who show that the observed convergence of the simulated price towards the theoretical equilibrium price may be an artefact of the model. More precisely, the choice of the traders’ demand carries a lot of constraints that alone explain the observed results.

Modern works in Econophysics owe a lot to these early models and contributions. Starting in the mid-1990s, physicists have proposed simple order book models directly inspired by Physics, where the analogy “order ≡ particle” is emphasized. Three main contributions are presented in the next section.

² Computer Assisted Trading System.

5.3 Order-driven market modelling in Econophysics

5.3.1 The order book as a reaction-diffusion model

A very simple model directly taken from Physics was presented in [29]. The authors consider a market with $N$ noise traders able to exchange one share of stock at a time. The price $p(t)$ at time $t$ is constrained to be an integer (i.e. the price is quoted in number of ticks) with an upper bound $\bar{p}$: $\forall t,\ p(t) \in \{0, \ldots, \bar{p}\}$. The simulation is initiated at time 0 with half of the agents asking for one share of stock (buy orders, bid) with price
\[ p_b^j(0) \in \{0, \bar{p}/2\}, \quad j = 1, \ldots, N/2, \tag{5.3.1} \]
and the other half offering one share of stock (sell orders, ask) with price
\[ p_s^j(0) \in \{\bar{p}/2, \bar{p}\}, \quad j = 1, \ldots, N/2. \tag{5.3.2} \]
At each time step $t$, agents revise their offer by exactly one tick, with equal probability to go up or down. Therefore, at time $t$, each seller (resp. buyer) agent chooses his new price as
\[ p_s^j(t+1) = p_s^j(t) \pm 1 \quad (\text{resp. } p_b^j(t+1) = p_b^j(t) \pm 1). \tag{5.3.3} \]
A transaction occurs when there exists $(i, j) \in \{1, \ldots, N/2\}^2$ such that $p_b^i(t+1) = p_s^j(t+1)$. In such a case the orders are removed and the transaction price is recorded as the new price $p(t)$. Once a transaction has been recorded, two orders are placed at the extreme positions on the grid: $p_b^i(t+1) = 0$ and $p_s^j(t+1) = \bar{p}$. As a consequence, the number of orders in the order book remains constant and equal to the number of agents. Figure 5.1 gives an illustration of these moving particles.

Figure 5.1: Illustration of the Bak, Paczuski and Shubik model: white particles (buy orders, bid) moving from the left, black particles (sell orders, ask) moving from the right. Reproduced from [29].

As pointed out by the authors, this simulation process is similar to the reaction-diffusion model $A + B \to \emptyset$ in Physics. In such a model, two types of particles are inserted at each side of a pipe of length $\bar{p}$ and move randomly with steps of size 1. Each time two particles collide, they are annihilated and two new particles are inserted. The analogy is summarized in table 5.1.

Table 5.1: Analogy between the $A + B \to \emptyset$ reaction model and the order book in [29].

| Physics | [29] |
|---|---|
| Particles | Orders |
| Finite pipe | Order book |
| Collision | Transaction |

Following this analogy, it can thus be shown that the variation $\Delta p(t)$ of the price $p(t)$ satisfies
\[ \Delta p(t) \sim t^{1/4} \left( \ln \frac{t}{t_0} \right)^{1/2}. \tag{5.3.4} \]
Thus, at long time scales, the series of price increments simulated in this model exhibits a Hurst exponent $H = 1/4$. Compared with the stylized fact $H \approx 0.7$, this sub-diffusive behaviour appears to be a step in the wrong direction relative to the random walk $H = 1/2$.
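The particle dynamics of this model can be sketched as follows. This is a hedged illustration rather than the authors' code: the name `simulate_bps` is hypothetical, initial prices are assumed to be drawn uniformly on the integers of each half of the grid, prices are assumed to be clamped at the boundaries 0 and $\bar{p}$, and a transaction is triggered only when a bid and an ask sit at exactly the same price, as in the rule stated above (orders that hop past each other without coinciding are left untouched).

```python
import random

def simulate_bps(n_agents=100, p_max=200, n_steps=500, seed=0):
    """Sketch of the reaction-diffusion order book: returns (trades, bids, asks)."""
    rng = random.Random(seed)
    half = n_agents // 2
    # Assumption: uniform initial prices on each half of the grid.
    bids = [rng.randint(0, p_max // 2) for _ in range(half)]
    asks = [rng.randint(p_max // 2, p_max) for _ in range(half)]
    trades = []

    def move(p):
        # One-tick move up or down with equal probability, clamped to the grid.
        return min(max(p + rng.choice((-1, 1)), 0), p_max)

    for _ in range(n_steps):
        bids = [move(p) for p in bids]
        asks = [move(p) for p in asks]
        waiting = {}                         # price -> indices of asks at that price
        for j, p in enumerate(asks):
            waiting.setdefault(p, []).append(j)
        for i, p in enumerate(bids):
            if waiting.get(p):               # a bid meets an ask: transaction
                j = waiting[p].pop()
                trades.append(p)
                bids[i], asks[j] = 0, p_max  # reinsertion at the extremes
    return trades, bids, asks

trades, bids, asks = simulate_bps()
```

Because every matched pair is immediately reinserted at the two ends of the grid, the number of orders stays equal to the number of agents throughout the run.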
Moreover, [302] points out that no fat tails are observed in the distribution of the returns of the model; rather, this distribution is fitted by an exponential decay. Other drawbacks of the model could be mentioned. For example, the reintroduction of orders at each end of the pipe leads to an unrealistic shape of the order book, as shown in figure 5.2. Here lies the main drawback of the model: “moving” orders is highly unrealistic when modelling an order book, and since it does not reproduce any known financial exchange mechanism, it cannot be the basis for any larger model. Therefore, attempts by the authors to build several extensions of this simple framework, in order to reproduce “stylized facts” by adding fundamental traders, strategies, trends, etc., are not of interest for us in this review. However, we feel that the basic model as such is very interesting because of its simplicity and its “particle” representation of an order-driven market, which has opened the way for more realistic models.

5.3.2 Introducing market orders

[240] keeps the zero-intelligence structure of the [29] model but adds more realistic features in the order placement and evolution of the market. First, limit orders are submitted and stored in the model, without moving. Second, limit orders are submitted around the best quotes. Third, market orders are submitted to trigger transactions. More precisely, at each time step, a trader is chosen to perform an action. This trader can either submit a limit order with probability $q_l$ or submit a market order with probability $1 - q_l$. Once this choice is made, the order is a buy or sell order with equal probability. All orders have a volume of one unit. As usual, we denote by $p(t)$ the current price. If the order submitted at time step $t + 1$ is a limit ask (resp. bid) order, it is placed in the book at price $p(t) + \Delta$ (resp. $p(t) - \Delta$), $\Delta$ being a random variable uniformly distributed in $]0; \Delta^M]$, with $\Delta^M = 4$.
If the order submitted at time step $t + 1$ is a market order, one order at the opposite best quote is removed and the price $p(t + 1)$ is recorded. In order to prevent the number of orders in the order book from growing too large, two mechanisms are proposed by the author: either keeping a fixed maximum number of orders (by discarding new limit orders when this maximum is reached), or removing orders after a fixed lifetime if they have not been executed.

Figure 5.2: Snapshot of the limit order book in the Bak, Paczuski and Shubik model. Reproduced from [29].

Numerical simulations show that this model exhibits non-Gaussian heavy-tailed distributions of returns. In figure 5.3, the empirical probability density of the price increments is plotted for several time scales. For a time scale $\delta t = 1$, the author fits the tails of the distribution with a power law with exponent 3.0, which is reasonable compared to the empirical value. However, the Hurst exponent of the price series is still $H = 1/4$ with this model. It should also be noted that [301] proposed an analytical study of the model using a mean-field approximation (see section 6.2.2 below).

Figure 5.3: Empirical probability density functions of the price increments in the Maslov model. In inset, log-log plot of the positive increments. Reproduced from [240].

This model brings very interesting innovations in order book simulation: an order book with (fixed) limit orders, market orders, and the necessity to cancel orders waiting too long in the order book. These features are of prime importance in any subsequent order book model.

5.3.3 The order book as a deposition-evaporation process

[73] continue the work of [29] and [240], and develop the analogy between the dynamics of an order book and an infinite one-dimensional grid, where particles of two types (ask and bid) are subject to three types of events: deposition (limit orders), annihilation (market orders) and evaporation (cancellation).
Note that annihilation occurs when a particle is deposited on a site occupied by a particle of another type. The analogy is summarized in table 5.2.

Table 5.2: Analogy between the deposition-evaporation process and the order book in [73].

| Physics | [73] |
|---|---|
| Particles | Orders |
| Infinite lattice | Order book |
| Deposition | Limit orders submission |
| Evaporation | Limit orders cancellation |
| Annihilation | Transaction |

Hence, the model goes as follows. At each time step, a bid (resp. ask) order is deposited with probability $\lambda$ at a price $n(t)$ drawn according to a Gaussian distribution centred on the best ask $a(t)$ (resp. best bid $b(t)$) and with variance depending linearly on the spread $s(t) = a(t) - b(t)$: $\sigma(t) = K s(t) + C$. If $n(t) > a(t)$ (resp. $n(t) < b(t)$), then it is a market order: annihilation takes place and the price is recorded. Otherwise, it is a limit order and it is stored in the book. Finally, each limit order stored in the book has a probability $\delta$ of being cancelled (evaporation).

Figure 5.4 shows the average return as a function of the time scale. It appears that the series of price returns simulated with this model exhibits a Hurst exponent $H = 1/4$ for short time scales, which tends to $H = 1/2$ for larger time scales. This behaviour might be the consequence of the random evaporation process (which was not modelled in [240], where $H = 1/4$ for large time scales). Although some modifications of the process (more than one order per time step) seem to shorten the sub-diffusive region, it is clear that no over-diffusive behaviour is observed.

5.4 Empirical zero-intelligence models

The three models presented in the previous section 5.3 have successively isolated essential mechanisms that are to be used when simulating a “realistic” market: one order is the smallest entity of the model; the submission of one order is the time dimension (i.e.
event time is used, not an exogenous time defined by market clearing and “tatonnement” on exogenous supply and demand functions); submission of market orders (as such in [240], as “crossing limit orders” in [73]) and cancellation of orders are taken into account. On the one hand, one may try to describe these mechanisms using a small number of parameters, using Poisson processes with constant rates for order flows, constant volumes, etc. This might lead to some analytically tractable models, as will be described in section 6.2.2. On the other hand, one may try to fit more complex empirical distributions to market data without analytical concern.

This type of modelling is best represented by [248]. It is the first model that proposes an advanced calibration on market data for order placement and cancellation. As for volume and time of arrivals, the assumptions of previous models still hold: all orders have the same volume, and discrete event time is used for simulation, i.e. one order (limit or market) is submitted per time step. Following [73], there is no distinction between market and limit orders, i.e. market orders are limit orders that are submitted across the spread $s(t)$. More precisely, at each time step, one trading order is simulated: an ask (resp. bid) trading order is randomly placed at $n(t) = a(t) + \delta a$ (resp. $n(t) = b(t) + \delta b$) according to a Student distribution with scale and degrees of freedom calibrated on market data. If an ask (resp. bid) order satisfies $\delta a < -s(t) = b(t) - a(t)$ (resp. $\delta b > s(t) = a(t) - b(t)$), then it crosses the opposite best quote: it is a sell (resp. buy) market order and a transaction occurs at price $b(t)$ (resp. $a(t)$).

Figure 5.4: Average return $\langle r_{\Delta t} \rangle$ as a function of $\Delta t$ for different sets of parameters and simultaneous depositions allowed in the Challet and Stinchcombe model. Reproduced from [73].

During a time step, several cancellations of orders may occur. The authors propose an empirical distribution for cancellation based on three components for a given order:
• the position in the order book, measured as the ratio $y(t) = \Delta(t)/\Delta(0)$, where $\Delta(t)$ is the distance of the order from the opposite best quote at time $t$;
• the order book imbalance, measured by the indicator $N_{imb}(t) = \frac{N_a(t)}{N_a(t) + N_b(t)}$ (resp. $N_{imb}(t) = \frac{N_b(t)}{N_a(t) + N_b(t)}$) for ask (resp. bid) orders, where $N_a(t)$ and $N_b(t)$ are the numbers of orders at ask and bid in the book at time $t$;
• the total number $N_t(t) = N_a(t) + N_b(t)$ of orders in the book.

Their empirical study leads them to assume that the cancellation probability has an exponential dependence on $y(t)$, a linear one on $N_{imb}(t)$, and finally decreases approximately as $1/N_t(t)$ with the total number of orders. Thus, the probability $P(C \mid y(t), N_{imb}(t), N_t(t))$ of cancelling an ask order at time $t$ is formally written
\[ P(C \mid y(t), N_{imb}(t), N_t(t)) = A \left(1 - e^{-y(t)}\right) \left(N_{imb}(t) + B\right) \frac{1}{N_t(t)}, \tag{5.4.1} \]
where the constants $A$ and $B$ are to be fitted on market data. Figure 5.5 shows that this empirical formula provides a fairly good fit to market data.

Figure 5.5: Lifetime of orders for simulated data in the Mike and Farmer model, compared to the empirical data used for fitting. Reproduced from [248].

Finally, the authors mimic the observed long memory of order signs by simulating a fractional Brownian motion. The auto-covariance function $\Gamma(t)$ of the increments of such a process exhibits a slow decay,
\[ \Gamma(t) \sim H(2H - 1)\, t^{2H - 2}, \tag{5.4.2} \]
and it is therefore easy to reproduce the exponent $\beta$ of the decay of the empirical autocorrelation function of order signs observed on the market by setting $H = 1 - \beta/2$.

The results of this empirical model are quite satisfying for the return and spread distributions. The distribution of returns exhibits fat tails which are in agreement with empirical data, as shown in figure 5.6.

Figure 5.6: Cumulative distribution of returns in the Mike and Farmer model, compared to the empirical data used for fitting. Reproduced from [248].
The spread distribution is also very well reproduced. As their empirical model was built on the data of only one stock, the authors test their model on 24 other data sets of stocks on the same market and find, for half of them, a good agreement between empirical and simulated properties. However, the bad results for the other half suggest that such a model is still far from being “universal”.

Despite these very nice results, some drawbacks have to be pointed out. The first one is that the stability of the simulated order book is far from ensured. Simulations using empirical parameters may lead to situations where the order book is emptied by large consecutive market orders. Thus, the authors require that there are at least two orders on each side of the book. This exogenous trick might be important, since it is activated precisely in the case of rare events that influence the tails of the distributions. Also, the original model does not focus on volatility clustering; [162] propose a variant that tackles this feature. Another important drawback of the model is the way order signs are simulated. As noted by the authors, using an exogenous fractional Brownian motion leads to correlated price returns, which is in contradiction with empirical stylized facts. We also find that at long time scales it leads to a dramatic increase of volatility. As we have seen in the first part of the review, the correlation of trade signs can be at least partly seen as an artefact of execution strategies. Therefore this element is one of the many that should be taken into account when “programming” the agents of the model. In order to do so, we have to leave the (quasi) “zero-intelligence” world and see how modelling based on heterogeneous agents might help to reproduce non-trivial behaviours. Prior to this development below in 8.1, we briefly review some analytical works on the “zero-intelligence” models.
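As a complement to this section, the conditional cancellation rule (5.4.1) is simple enough to be written out directly. The sketch below is an illustration only: the function name `cancellation_probability` and its arguments are hypothetical, the values used for $A$ and $B$ are arbitrary placeholders rather than fitted constants, and the final clipping to $[0, 1]$ is an assumption added so that the result is always a valid probability.

```python
import math

def cancellation_probability(delta_t, delta_0, n_ask, n_bid, side, A, B):
    """Per-step cancellation probability of one order, following eq. (5.4.1).

    delta_t, delta_0: distance to the opposite best quote now and at submission.
    n_ask, n_bid: numbers of orders resting on each side of the book.
    side: "ask" or "bid", the side of the order under consideration.
    """
    y = delta_t / delta_0                        # relative position y(t)
    n_total = n_ask + n_bid                      # N_t(t)
    n_same = n_ask if side == "ask" else n_bid
    n_imb = n_same / n_total                     # imbalance seen by this order
    p = A * (1.0 - math.exp(-y)) * (n_imb + B) / n_total
    return min(1.0, max(0.0, p))                 # assumption: clip to [0, 1]

# Placeholder constants, not calibrated values.
p_ask = cancellation_probability(2.0, 2.0, 30, 10, "ask", A=1.0, B=0.2)
p_bid = cancellation_probability(2.0, 2.0, 30, 10, "bid", A=1.0, B=0.2)
```

With these inputs the ask order faces a larger imbalance term than the bid order, so its cancellation probability is higher, which is the qualitative effect that the linear dependence on $N_{imb}$ encodes.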
5.5 Remarks

In table 5.3, we summarize the key features of some of the order book models reviewed in this chapter.

| Model | Stigler (1961) | Garman (1976) | Bak, Paczuski and Shubik (1997) | Maslov (2000) | Challet and Stinchcombe (2001) | Mike and Farmer (2008) |
|---|---|---|---|---|---|---|
| Price range | Finite grid | Finite grid | Finite grid | Unconstrained | Unconstrained | Unconstrained |
| Clock | Trade time | Physical time | Aggregated time | Event time | Aggregated time | Aggregated time |
| Flows / Agents | One zero-intelligence agent / One flow | One zero-intelligence agent / Two flows (buy/sell) | N agents owning each one limit order | One zero-intelligence flow (limit order with fixed probability, else market order) | One zero-intelligence agent / One flow | One zero-intelligence agent / One flow |
| Limit orders | Uniform distribution on the price grid | Two Poisson processes for buy and sell orders | Moving at each time step by one tick | Uniformly distributed in a finite interval around last price | Normally distributed around best quote | Student-distributed around best quote |
| Market orders | Defined as crossing limit orders | Defined as crossing limit orders | Defined as crossing limit orders | Submitted as such | Defined as crossing limit orders | Defined as crossing limit orders |
| Cancellation orders | Pending orders are cancelled after a fixed number of time steps | None | None (constant number of pending orders) | Pending orders are cancelled after a fixed number of time steps | Pending orders can be cancelled with fixed probability at each time step | Pending orders can be cancelled with 3-parameter conditional probability at each time step |
| Volume | Unit | Unit | Unit | Unit | Unit | Unit |
| Order signs | Independent | Independent | Independent | Independent | Independent | Correlated with a fractional Brownian motion |
| Claimed results | Return distribution is power-law 0.3 / Cut-off because finite grid | Microstructure is responsible for negative correlation of consecutive price changes | No fat tails for returns / Hurst exponent 1/4 for price increments | Fat tails for distributions of returns / Hurst exponent 1/4 | Hurst exponent 1/4 for short time scales, tending to 1/2 for larger time scales | Fat tails distributions of returns / Realistic spread distribution / Unstable order book |

Table 5.3: Summary of the characteristics of the reviewed limit order book models.

Chapter 6 The mathematical structure of zero-intelligence limit order book models

In this chapter, based on [7], we present some mathematical results focusing on the convergence of the price process, a pure jump process at the microscopic level, to a diffusive process at macroscopic time scales. This result is an important step towards establishing the compatibility of microstructural models based on a realistic limit order book dynamics, and more classical models used in continuous time finance. We elaborate on the models of [97] and [303] to present a stylized description of the order book, and derive several mathematical results in the case of independent Poissonian arrival times. In particular, we show that the cancellation structure is an important factor ensuring the existence of a stationary distribution for the order book and the exponential convergence towards it. We also prove a functional central limit theorem (FCLT) for the rescaled, centered price process.

6.1 An elementary approximation: perfect market making

We start with the simplest agent-based market model:
• The order book starts in a full state: all limits above $P^A(0)$ and below $P^B(0)$ are filled with one limit order of unit size $q$. The spread starts equal to 1 tick;
• The flow of market orders is modeled by two independent Poisson processes $M^+(t)$ (buy orders) and $M^-(t)$ (sell orders) with constant arrival rates (or intensities) $\lambda^+$ and $\lambda^-$;
• There is one liquidity provider, who reacts immediately after a market order arrives so as to maintain the spread constantly equal to 1 tick. He places a limit order on the same side as the market order (i.e.
a buy limit order after a buy market order and vice versa) with probability $u$ and on the opposite side with probability $1 - u$.

The mid-price dynamics can be written in the following form:
\[ dP(t) = \Delta P \, (dM^+(t) - dM^-(t)) \, Z, \tag{6.1.1} \]
where $Z$ is a Bernoulli random variable,
\[ Z = 0 \text{ with probability } (1 - u), \tag{6.1.2} \]
and
\[ Z = 1 \text{ with probability } u. \tag{6.1.3} \]
The infinitesimal generator $\mathcal{L}$ associated with this dynamics is
\[ \mathcal{L} f(P) = u \left( \lambda^+ \left( f(P + \Delta P) - f \right) + \lambda^- \left( f(P - \Delta P) - f \right) \right), \tag{6.1.4} \]
where $f$ denotes a test function. It is well known that a continuous limit is obtained under suitable assumptions on the intensity and tick size. Noting that (6.1.4) can be rewritten as
\[ \mathcal{L} f(P) = \frac{1}{2} u (\lambda^+ + \lambda^-)(\Delta P)^2 \, \frac{f(P + \Delta P) - 2f + f(P - \Delta P)}{(\Delta P)^2} + u (\lambda^+ - \lambda^-) \Delta P \, \frac{f(P + \Delta P) - f(P - \Delta P)}{2 \Delta P}, \tag{6.1.5} \]
and under the following assumptions,
\[ u (\lambda^+ + \lambda^-)(\Delta P)^2 \longrightarrow \sigma^2 \text{ as } \Delta P \to 0, \tag{6.1.6} \]
and
\[ u (\lambda^+ - \lambda^-) \Delta P \longrightarrow \mu \text{ as } \Delta P \to 0, \tag{6.1.7} \]
the generator converges to the classical diffusion operator
\[ \frac{\sigma^2}{2} \frac{\partial^2 f}{\partial P^2} + \mu \frac{\partial f}{\partial P}, \tag{6.1.8} \]
corresponding to a Brownian motion with drift. This simple case is worked out as an example of the type of limit theorems that we will be interested in below. Of course, a more classical approach using the Functional Central Limit Theorem (FCLT) as in [42] or [320] yields similar results: for given, fixed values of $\lambda^+$, $\lambda^-$ and $\Delta P$, the rescaled, centred price process
\[ \frac{P(nt) - n \mu t}{\sqrt{n} \, \sigma} \tag{6.1.9} \]
converges as $n \to \infty$ to a standard Brownian motion $(B(t))$, where
\[ \sigma = \Delta P \sqrt{(\lambda^+ + \lambda^-) u}, \tag{6.1.10} \]
and
\[ \mu = \Delta P (\lambda^+ - \lambda^-) u. \tag{6.1.11} \]
One can easily achieve more complex diffusive limits, such as a local volatility model, by imposing that the limit is a function of $P$ and $t$:
\[ u (\lambda^+ + \lambda^-)(\Delta P)^2 \to \sigma^2(P, t), \tag{6.1.12} \]
and
\[ u (\lambda^+ - \lambda^-) \Delta P \to \mu(P, t). \tag{6.1.13} \]
This may indeed be the case if the original intensities are functions of $P$ and $t$ themselves.
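The drift and volatility in (6.1.10) and (6.1.11) can be checked numerically on the constant-intensity case. The following sketch is only an illustration under stated assumptions: the function names are hypothetical, the parameter values ($\lambda^+ = 6$, $\lambda^- = 4$, $u = 0.5$, $\Delta P = 1$) are arbitrary, and the two market order flows are simulated as a single merged Poisson process of intensity $\lambda^+ + \lambda^-$ whose events are buys with probability $\lambda^+ / (\lambda^+ + \lambda^-)$, which is equivalent to two independent Poisson flows.

```python
import math
import random

def diffusion_coefficients(lam_plus, lam_minus, u, tick):
    """Drift mu and volatility sigma of the limit, eqs. (6.1.10)-(6.1.11)."""
    sigma = tick * math.sqrt((lam_plus + lam_minus) * u)
    mu = tick * (lam_plus - lam_minus) * u
    return mu, sigma

def simulate_price_change(T, lam_plus, lam_minus, u, tick, rng):
    """Price change P(T) - P(0) along one path of the dynamics (6.1.1)."""
    total = lam_plus + lam_minus
    t, dp = 0.0, 0.0
    while True:
        t += rng.expovariate(total)      # waiting time to the next market order
        if t > T:
            return dp
        sign = 1 if rng.random() < lam_plus / total else -1   # buy or sell
        if rng.random() < u:             # Z = 1: the price moves by one tick
            dp += sign * tick

rng = random.Random(0)
mu, sigma = diffusion_coefficients(6.0, 4.0, 0.5, 1.0)
T = 10.0
changes = [simulate_price_change(T, 6.0, 4.0, 0.5, 1.0, rng) for _ in range(2000)]
mean = sum(changes) / len(changes)
var = sum((x - mean) ** 2 for x in changes) / (len(changes) - 1)
```

In this model the sample mean of the price change should be close to $\mu T$ and its sample variance close to $\sigma^2 T$, up to Monte Carlo error, since the numbers of price-moving buy and sell orders over $[0, T]$ are independent Poisson variables with means $u \lambda^+ T$ and $u \lambda^- T$.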
6.2 Order book dynamics

6.2.1 Model setup: Poissonian arrivals, reference frame and boundary conditions

Consider the dynamics of a general order book under the assumption of Poissonian arrival times for market orders, limit orders and cancellations. We shall assume that each side of the order book is fully described by a finite number of limits $K$, ranging from 1 to $K$ ticks away from the best available opposite quote. We will use the notation
\[ X(t) := (a(t); b(t)) := (a_1(t), \ldots, a_K(t); b_1(t), \ldots, b_K(t)), \tag{6.2.1} \]
where $a := (a_1, \ldots, a_K)$ designates the ask side of the order book and $a_i$ the number of shares available $i$ ticks away from the best opposite quote, and $b := (b_1, \ldots, b_K)$ designates the bid side of the book. Note that, as opposed to the representation described in [97] or [303] (see also [152] for an interesting discussion), we adopt a finite moving frame, as we think it is more realistic, and also more convenient when scaling in tick size is addressed.

Let us now describe the events that may happen:
• arrival of a new market order;
• arrival of a new limit order;
• cancellation of an already existing limit order.

These events are described by independent Poisson processes:
• $M^{\pm}(t)$: arrival of a new market order, with intensity $\lambda^{M^+} I(a \neq 0)$ and $\lambda^{M^-} I(b \neq 0)$;
• $L_i^{\pm}(t)$: arrival of a limit order at level $i$, with intensity $\lambda_i^{L^{\pm}}$;
• $C_i^{\pm}(t)$: cancellation of a limit order at level $i$, with intensity $\lambda_i^{C^+} a_i$ and $\lambda_i^{C^-} |b_i|$.

$q$ is the size of any new incoming order, and the superscript “+” (respectively “−”) refers to the ask (respectively bid) side of the book. Although in reality orders can have any size, we shall assume that all orders have a fixed unit size $q$. This assumption is convenient to carry out our analysis and is, for now, of secondary importance in the problem we are interested in.

Note that the intensity of the cancellation process at level $i$ is proportional to the available quantity at that level. That is to say, each order at level $i$ has a lifetime drawn from an exponential distribution with intensity $\lambda_i^{C^{\pm}}$. Note also that buy limit orders $L_i^-(t)$ arrive below the ask price $P^A(t)$, and sell limit orders $L_i^+(t)$ arrive above the bid price $P^B(t)$.

We impose constant boundary conditions outside the moving frame of size $2K$: every time the moving frame leaves a price level, the number of shares at that level is set to $a_\infty$ (or $b_\infty$, depending on the side of the book). Our choice of a finite moving frame and constant boundary conditions has three motivations. Firstly, it ensures that the order book does not empty and that $P^A$, $P^B$ are always well defined. Secondly, it keeps the spread $S$ and the increments of $P^A$, $P^B$ and $P = (P^A + P^B)/2$ bounded; this will be important when addressing the scaling limit of the price. Thirdly, it makes the model Markovian, as we do not keep track of the price levels that have been visited (then left) by the moving frame at some prior time. Figure 6.1 is a representation of the order book using the above notations.

Figure 6.1: Order book dynamics: in this example, $K = 9$, $q = 1$, $a_\infty = 4$, $b_\infty = -4$. The shape of the order book is such that $a(t) = (0, 0, 0, 0, 1, 3, 5, 4, 2)$ and $b(t) = (0, 0, 0, 0, -1, 0, -4, -5, -3)$. The spread $S(t) = 5$ ticks. Assume that at time $t' > t$ a sell market order $dM^-(t')$ arrives, then $a(t') = (0, 0, 0, 0, 0, 0, 1, 3, 5)$, $b(t') = (0, 0, 0, 0, 0, 0, -4, -5, -3)$ and $S(t') = 7$. Assume instead that at $t' > t$ a buy limit order $dL_1^-(t')$ arrives one tick away from the best opposite quote, then $a(t') = (1, 3, 5, 4, 2, 4, 4, 4, 4)$, $b(t') = (-1, 0, 0, 0, -1, 0, -4, -5, -3)$ and $S(t') = 1$.
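The two transitions described in the caption of figure 6.1 can be replayed with a small sketch of the moving-frame bookkeeping. This is a hedged illustration and not the full dynamics: the function names are hypothetical, only a sell market order and a buy limit order are covered (the symmetric events and cancellations are omitted), and the bid side is assumed never to become empty inside the frame.

```python
def first_nonzero(side):
    """Distance (in ticks, 1-based) of the first non-empty level: the spread."""
    for i, x in enumerate(side, start=1):
        if x != 0:
            return i
    raise ValueError("no order inside the frame on this side")

def sell_market_order(a, b, q=1):
    """Remove q shares at the best bid; the ask frame follows the best bid."""
    a, b = list(a), list(b)
    old = first_nonzero(b)
    b[old - 1] += q                      # bid quantities are stored as negatives
    d = first_nonzero(b) - old           # ticks by which the best bid moved down
    a = [0] * d + a[:len(a) - d]         # asks are now d ticks further away
    return a, b

def buy_limit_order(a, b, i, a_inf, q=1):
    """Add q shares i ticks below the best ask; refill the ask frame with a_inf."""
    a, b = list(a), list(b)
    old = first_nonzero(b)
    b[i - 1] -= q
    d = old - first_nonzero(b)           # ticks by which the best bid moved up
    a = a[d:] + [a_inf] * d              # levels entering the frame hold a_inf
    return a, b

a0 = [0, 0, 0, 0, 1, 3, 5, 4, 2]
b0 = [0, 0, 0, 0, -1, 0, -4, -5, -3]
a1, b1 = sell_market_order(a0, b0)
a2, b2 = buy_limit_order(a0, b0, i=1, a_inf=4)
```

Starting from the state shown in the figure, `a1`, `b1` and `a2`, `b2` are intended to match the two post-event states given in the caption, with the spread read off as `first_nonzero` of either side.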
6.2.2 Analytical treatment of some limit order book models

In this section we review some analytical results obtained on zero-intelligence models where processes are kept sufficiently simple that a mean-field approximation may be derived [301][303] or probabilities conditional on the state of the order book may be computed [97]. The key assumptions here are such that the process describing the order book is stationary. This allows one either to write a stable density equation, or to fit the model into a convenient mathematical framework such as ergodic Markov chains.

[301] proposes an analytical treatment of the model introduced by [240] and reviewed in Chapter 5. Let us briefly describe the formalism used. The main hypothesis is the following: on each side of the current price level, the density of limit orders is uniform and constant ($\rho_+$ on the ask side, $\rho_-$ on the bid side). In that sense, this is a "mean-field" approximation, since the individual position of a limit order is not taken into account. Assuming we are in a stable state, the arrival of a market order of size $s$ on the ask (resp. bid) side will make the price change by $x_+ = s/\rho_+$ (resp. $x_- = s/\rho_-$). It is then observed that the transformations of the vector $X = (x_+, x_-)$ occurring at each event (new limit order, new buy market order, new sell market order) are linear transformations that can easily and explicitly be written. Therefore, an equation satisfied by the probability distribution $P$ of the vector $X$ of price changes can be obtained. Finally, assuming further simplifications (such as $\rho_+ = \rho_-$), one can solve this equation for a tail exponent and find that the distribution behaves as $P(x) \approx x^{-2}$ for large $x$. This analytical result is slightly different from the one obtained by simulation in [240]. However, the numerous approximations make the comparison difficult.
The main point here is that some sort of mean-field approximation is natural if we assume the existence of a stationary state of the order book, and thus may help handling order book models.

[303] investigates the scaling properties of some liquidity and price characteristics in a limit order book model. These results are summarized in table 6.1. In [303], orders arrive on an infinite price grid (this is consistent, as the arrival rate of limit orders per price level is finite). Moreover, the arrival rates are independent of the price level, which has the advantage of enabling the analytical predictions summarized in table 6.1.

Quantity | Scaling relation
Average asymptotic depth | $\lambda^L/\lambda^C$
Average spread | $(\lambda^M/\lambda^L)\, f(\epsilon, \Delta P/p_c)$
Slope of average depth profile | $\big((\lambda^L)^2/(\lambda^M \lambda^C)\big)\, g(\epsilon, \Delta P/p_c)$
Price "diffusion" parameter at short time scales | $\big((\lambda^M)^2 \lambda^C/(\lambda^L)^2\big)\, \epsilon^{-0.5}$
Price "diffusion" parameter at long time scales | $\big((\lambda^M)^2 \lambda^C/(\lambda^L)^2\big)\, \epsilon^{0.5}$

Table 6.1: Results of Smith et al. $\epsilon := q/(\lambda^M/2\lambda^C)$ is a "granularity" parameter that characterizes the effect of discreteness in order sizes, $p_c := \lambda^M/2\lambda^L$ is a characteristic price interval, and $f$ and $g$ are slowly varying functions.

These results are obtained by mean-field approximations, which assume that the fluctuations at adjacent price levels are independent. This allows fruitful simplifications of the complex dynamics of the order book. In addition, the authors do not characterize the convergence of the coarse-grained price process in the sense of stochastic-process limits, nor do they show that the limiting process is precisely a Brownian motion (theorem ??).

[97] is an original attempt at analytical treatments of limit order books. In their model, the price is constrained to be on a grid $\{1, \dots, N\}$. The state of the order book can then be described by a vector $X(t) = (X_1(t), \dots, X_N(t))$ where $|X_i(t)|$ is the quantity offered in the order book at price $i$. Conventionally, $X_i(t)$, $i = 1, \dots, N$ is positive on the ask side and negative on the bid side. As usual, limit orders arrive at level $i$ at a constant rate $\lambda_i$, and market orders arrive at a constant rate $\mu$. Finally, at level $i$, each order can be cancelled at a rate $\theta_i$. Using this setting, [97] shows that each event (limit order, market order, cancellation) transforms the vector $X$ in a simple linear way. Therefore, it is shown that under reasonable conditions, $X$ is an ergodic Markov chain, and thus admits a stationary state. The original idea is then to use this formalism to compute conditional probabilities on the processes. More precisely, it is shown that, using Laplace transforms, one may explicitly compute the probability of an increase of the mid price conditionally on the current state of the order book.

In this chapter, we use a combination of the two models in that the arrival rates are not uniformly distributed across prices, and the reference frame is finite but moving.

6.2.3 Evolution of the order book

We can write the following coupled SDEs for the quantities of outstanding limit orders on each side of the order book:¹

\[
\begin{aligned}
da_i(t) = {} & -\Big(q - \sum_{k=1}^{i-1} a_k\Big)_+ dM^+(t) + q\, dL_i^+(t) - q\, dC_i^+(t) \\
& + \big(J^{M^-}(a) - a\big)_i\, dM^-(t) + \sum_{j=1}^{K} \big(J^{L_j^-}(a) - a\big)_i\, dL_j^-(t) + \sum_{j=1}^{K} \big(J^{C_j^-}(a) - a\big)_i\, dC_j^-(t),
\end{aligned} \tag{6.2.2}
\]

and

\[
\begin{aligned}
db_i(t) = {} & \Big(q - \sum_{k=1}^{i-1} |b_k|\Big)_+ dM^-(t) - q\, dL_i^-(t) + q\, dC_i^-(t) \\
& + \big(J^{M^+}(b) - b\big)_i\, dM^+(t) + \sum_{j=1}^{K} \big(J^{L_j^+}(b) - b\big)_i\, dL_j^+(t) + \sum_{j=1}^{K} \big(J^{C_j^+}(b) - b\big)_i\, dC_j^+(t),
\end{aligned} \tag{6.2.3}
\]

where the $J$'s are shift operators corresponding to the renumbering of the ask side following an event affecting the bid side of the book and vice versa. For instance, the shift operator corresponding to the arrival of a sell market order $dM^-(t)$ of size $q$ is²

\[
J^{M^-}(a) = \big(\underbrace{0, 0, \dots, 0}_{k \text{ times}}, a_1, a_2, \dots, a_{K-k}\big), \tag{6.2.4}
\]

with

\[
k := \inf\Big\{p : \sum_{j=1}^{p} |b_j| > q\Big\} - \inf\{p : |b_p| > 0\}. \tag{6.2.5}
\]

¹ Remember that, by convention, the $b_i$'s are non-positive.
² For notational simplicity, we write $J^{M^-}(a)$ instead of $J^{M^-}(a; b)$ etc. for the shift operators.

Similar expressions can be derived for the other events affecting the order book. In the next sections, we will study some general properties of the order book, starting with the generator associated with this $2K$-dimensional continuous-time Markov chain.

6.2.4 Infinitesimal generator

Let us work out the infinitesimal generator associated with the jump process described above. We have

\[
\begin{aligned}
\mathcal{L} f(a; b) = {} & \lambda^{M^+} \Big( f\big([a_i - (q - A(i-1))_+]_+ ; J^{M^+}(b)\big) - f \Big) \\
& + \sum_{i=1}^{K} \lambda_i^{L^+} \Big( f\big(a_i + q; J^{L_i^+}(b)\big) - f \Big) \\
& + \sum_{i=1}^{K} \lambda_i^{C^+} a_i \Big( f\big(a_i - q; J^{C_i^+}(b)\big) - f \Big) \\
& + \lambda^{M^-} \Big( f\big(J^{M^-}(a); [b_i + (q - B(i-1))_+]_-\big) - f \Big) \\
& + \sum_{i=1}^{K} \lambda_i^{L^-} \Big( f\big(J^{L_i^-}(a); b_i - q\big) - f \Big) \\
& + \sum_{i=1}^{K} \lambda_i^{C^-} |b_i| \Big( f\big(J^{C_i^-}(a); b_i + q\big) - f \Big),
\end{aligned} \tag{6.2.6}
\]

where, to ease the notations, we write $f(a_i; b)$ instead of $f(a_1, \dots, a_i, \dots, a_K; b)$ etc., and

\[
x_+ := \max(x, 0), \qquad x_- := \min(x, 0), \qquad x \in \mathbb{R}. \tag{6.2.7}
\]

The operator above, although cumbersome to put in writing, is simple to decipher: a series of standard difference operators corresponding to the "deposition-evaporation" of orders at each limit, combined with the shift operators expressing the moves in the best limits and therefore in the origins of the frames for the two sides of the order book. Note the coupling of the two sides: the shifts on the $a$'s depend on the $b$'s, and vice versa. More precisely, the shifts depend on the profile of the order book on the other side, namely the cumulative depth up to level $i$ defined by

\[
A(i) := \sum_{k=1}^{i} a_k, \tag{6.2.8}
\]

and

\[
B(i) := \sum_{k=1}^{i} |b_k|, \tag{6.2.9}
\]

and the generalized inverse functions thereof

\[
A^{-1}(q') := \inf\Big\{p : \sum_{j=1}^{p} a_j > q'\Big\}, \tag{6.2.10}
\]

and

\[
B^{-1}(q') := \inf\Big\{p : \sum_{j=1}^{p} |b_j| > q'\Big\}, \tag{6.2.11}
\]

where $q'$ designates a certain quantity of shares.³
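The cumulative depth and its generalized inverse are simple to compute. The following is a minimal sketch, assuming each side of the book is stored as a plain Python list; the names `depth` and `inverse_depth` are hypothetical helpers, and the same functions serve both sides because absolute values are taken.

```python
from itertools import accumulate


def depth(side, i):
    """Cumulative depth A(i) or B(i) up to level i (1-based), as in
    (6.2.8)-(6.2.9). Absolute values handle the non-positive b_i's."""
    return sum(abs(x) for x in side[:i])


def inverse_depth(side, q_prime):
    """Generalized inverse A^{-1}(q') or B^{-1}(q') of (6.2.10)-(6.2.11):
    the first level p at which the cumulative depth exceeds q'."""
    for p, cum in enumerate(accumulate(abs(x) for x in side), start=1):
        if cum > q_prime:
            return p
    raise ValueError("cumulative depth never exceeds q' inside the frame")


# State of the book shown in Figure 6.1 (q = 1).
a = [0, 0, 0, 0, 1, 3, 5, 4, 2]
b = [0, 0, 0, 0, -1, 0, -4, -5, -3]

# Number of ticks by which a unit sell market order moves the best bid,
# i.e. the shift k of equation (6.2.5).
k = inverse_depth(b, 1) - inverse_depth(b, 0)
print(k)
```

With the book of Figure 6.1, `inverse_depth(b, 0)` is 5 and `inverse_depth(b, 1)` is 7, so the shift is 2 ticks, in agreement with the caption of the figure.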
³ Note that a more rigorous notation would be $A(i, a(t))$ and $A^{-1}(q', a(t))$ for the depth and inverse depth functions respectively. We drop the dependence on the last variable as it is clear from the context.

Remark 6.2.1 The index corresponding to the best opposite quote equals the spread $S$ in ticks, that is

\[
i_A := A^{-1}(0) = \inf\Big\{p : \sum_{j=1}^{p} a_j > 0\Big\} = \frac{S}{\Delta P} := i_S, \tag{6.2.12}
\]

and

\[
i_B := B^{-1}(0) = \inf\Big\{p : \sum_{j=1}^{p} |b_j| > 0\Big\} = \frac{S}{\Delta P} := i_S = i_A. \tag{6.2.13}
\]

6.2.5 Price dynamics

We now focus on the dynamics of the best ask and bid prices, denoted by $P^A(t)$ and $P^B(t)$. One can easily see that they satisfy the following SDEs:

\[
dP^A(t) = \Delta P \Big[ \big(A^{-1}(q) - A^{-1}(0)\big)\, dM^+(t) - \sum_{i=1}^{K} \big(A^{-1}(0) - i\big)_+\, dL_i^+(t) + \big(A^{-1}(q) - A^{-1}(0)\big)\, dC_{i_A}^+(t) \Big], \tag{6.2.14}
\]

and

\[
dP^B(t) = -\Delta P \Big[ \big(B^{-1}(q) - B^{-1}(0)\big)\, dM^-(t) - \sum_{i=1}^{K} \big(B^{-1}(0) - i\big)_+\, dL_i^-(t) + \big(B^{-1}(q) - B^{-1}(0)\big)\, dC_{i_B}^-(t) \Big], \tag{6.2.15}
\]

which describe the various events that affect them: change due to a market order, change due to limit orders inside the spread, and change due to the cancellation of a limit order at the best price. Equivalently, the respective dynamics of the mid-price and the spread are:

\[
\begin{aligned}
dP(t) = \frac{\Delta P}{2} \Big[ & \big(A^{-1}(q) - A^{-1}(0)\big)\, dM^+(t) - \big(B^{-1}(q) - B^{-1}(0)\big)\, dM^-(t) \\
& - \sum_{i=1}^{K} \big(A^{-1}(0) - i\big)_+\, dL_i^+(t) + \sum_{i=1}^{K} \big(B^{-1}(0) - i\big)_+\, dL_i^-(t) \\
& + \big(A^{-1}(q) - A^{-1}(0)\big)\, dC_{i_A}^+(t) - \big(B^{-1}(q) - B^{-1}(0)\big)\, dC_{i_B}^-(t) \Big],
\end{aligned} \tag{6.2.16}
\]

\[
\begin{aligned}
dS(t) = \Delta P \Big[ & \big(A^{-1}(q) - A^{-1}(0)\big)\, dM^+(t) + \big(B^{-1}(q) - B^{-1}(0)\big)\, dM^-(t) \\
& - \sum_{i=1}^{K} \big(A^{-1}(0) - i\big)_+\, dL_i^+(t) - \sum_{i=1}^{K} \big(B^{-1}(0) - i\big)_+\, dL_i^-(t) \\
& + \big(A^{-1}(q) - A^{-1}(0)\big)\, dC_{i_A}^+(t) + \big(B^{-1}(q) - B^{-1}(0)\big)\, dC_{i_B}^-(t) \Big].
\end{aligned} \tag{6.2.17}
\]

The equations above are interesting in that they relate in an explicit way the profile of the order book to the size of an increment of the mid-price or the spread, therefore linking the price dynamics to the order flow.
For instance, the infinitesimal drifts of the mid-price and the spread, conditional on the shape of the order book at time $t$, are given by:

\[
\begin{aligned}
\mathbb{E}[dP(t) \mid (a; b)] = \frac{\Delta P}{2} \Big[ & \big(A^{-1}(q) - A^{-1}(0)\big) \lambda^{M^+} - \big(B^{-1}(q) - B^{-1}(0)\big) \lambda^{M^-} \\
& - \sum_{i=1}^{K} \big(A^{-1}(0) - i\big)_+ \lambda_i^{L^+} + \sum_{i=1}^{K} \big(B^{-1}(0) - i\big)_+ \lambda_i^{L^-} \\
& + \big(A^{-1}(q) - A^{-1}(0)\big) \lambda_{i_A}^{C^+} a_{i_A} - \big(B^{-1}(q) - B^{-1}(0)\big) \lambda_{i_B}^{C^-} |b_{i_B}| \Big]\, dt,
\end{aligned} \tag{6.2.18}
\]

and

\[
\begin{aligned}
\mathbb{E}[dS(t) \mid (a; b)] = \Delta P \Big[ & \big(A^{-1}(q) - A^{-1}(0)\big) \lambda^{M^+} + \big(B^{-1}(q) - B^{-1}(0)\big) \lambda^{M^-} \\
& - \sum_{i=1}^{K} \big(A^{-1}(0) - i\big)_+ \lambda_i^{L^+} - \sum_{i=1}^{K} \big(B^{-1}(0) - i\big)_+ \lambda_i^{L^-} \\
& + \big(A^{-1}(q) - A^{-1}(0)\big) \lambda_{i_A}^{C^+} a_{i_A} + \big(B^{-1}(q) - B^{-1}(0)\big) \lambda_{i_B}^{C^-} |b_{i_B}| \Big]\, dt.
\end{aligned} \tag{6.2.19}
\]

For simplicity, we recast this, or rather, these equations under a general form

\[
P_t = \int_0^t \sum_{i=1}^{2K+2} F_i(X_u)\, dN_u^i, \tag{6.2.20}
\]

where the $N^i$'s are the Poisson processes driving the events affecting the limit order book, and the $F_i$'s are the corresponding jumps in the price.

6.3 Ergodicity and diffusive limit

In this section, our interest lies in the following questions:
1. Is the order book model defined above stable?
2. What is the stochastic-process limit of the price at large time scales?

6.3.1 Ergodicity of the order book and rate of convergence to the stationary state

Of the utmost interest is the behavior of the order book in its stationary state. We have the following result:

Theorem 6.3.1 If $\lambda^C = \min_{1 \le i \le K} \{\lambda_i^{C^\pm}\} > 0$, then $(X(t))_{t \ge 0} = (a(t); b(t))_{t \ge 0}$ is an ergodic Markov process. In particular $(X(t))$ has a stationary distribution $\Pi$. Moreover, the rate of convergence of the order book to its stationary state is exponential. That is, there exist $r < 1$ and $R < \infty$ such that

\[
\| Q^t(x, \cdot) - \Pi(\cdot) \| \le R\, r^t\, V(x), \qquad t \in \mathbb{R}_+,\ x \in \mathcal{S}. \tag{6.3.1}
\]

Proof. Let

\[
V(x) := V(a; b) := \sum_{i=1}^{K} a_i + \sum_{i=1}^{K} |b_i| + q \tag{6.3.2}
\]

be the total number of shares in the book ($+q$ shares).
Using the expression of the infinitesimal generator (6.2.6) we have

\[
\begin{aligned}
\mathcal{L} V(x) \le {} & -(\lambda^{M^+} + \lambda^{M^-})\, q + \sum_{i=1}^{K} (\lambda_i^{L^+} + \lambda_i^{L^-})\, q - \sum_{i=1}^{K} (\lambda_i^{C^+} a_i + \lambda_i^{C^-} |b_i|)\, q \\
& + \sum_{i=1}^{K} \lambda_i^{L^-} (i_S - i)_+\, a_\infty + \sum_{i=1}^{K} \lambda_i^{L^+} (i_S - i)_+\, |b_\infty|
\end{aligned} \tag{6.3.3}
\]

\[
\le -(\lambda^{M^+} + \lambda^{M^-})\, q + (\Lambda^{L^+} + \Lambda^{L^-})\, q - \lambda^C q\, V(x) + K (\Lambda^{L^-} a_\infty + \Lambda^{L^+} |b_\infty|), \tag{6.3.4}
\]

where

\[
\Lambda^{L^\pm} := \sum_{i=1}^{K} \lambda_i^{L^\pm} \quad \text{and} \quad \lambda^C := \min_{1 \le i \le K} \{\lambda_i^{C^\pm}\} > 0. \tag{6.3.5}
\]

The first three terms in the right hand side of inequality (6.3.3) correspond respectively to the arrival of a market, limit or cancellation order, ignoring the effect of the shift operators. The last two terms are due to shifts occurring after the arrival of a limit order inside the spread. The terms due to shifts occurring after market or cancellation orders (which we do not put in the r.h.s. of (6.3.3)) are negative, hence the inequality. To obtain inequality (6.3.4), we used the fact that the spread $i_S$ is bounded by $K + 1$ (a consequence of the boundary conditions we impose), and hence $(i_S - i)_+$ is bounded by $K$.

The drift condition (6.3.4) can be rewritten as

\[
\mathcal{L} V(x) \le -\beta V(x) + \gamma, \tag{6.3.6}
\]

for some positive constants $\beta$, $\gamma$. Inequality (6.3.6) together with theorem 7.1 in [246] lets us assert that $(X(t))$ is $V$-uniformly ergodic, hence (6.3.1).

Corollary 6.3.1 The spread $S(t) = A^{-1}(0, a(t))\, \Delta P = S(X(t))$ has a well-defined stationary distribution. This is expected, as by construction the spread is bounded by $K + 1$.

6.3.2 Large-scale limit of the price process

This section is devoted to the asymptotics of the price process. The ergodic theory of Markov processes is combined with the martingale convergence theorem to obtain the results. This approach is extremely general and flexible, and lends itself to many generalizations for Markovian models of limit order books.

We state and prove our main result concerning the long-time price dynamics in the case of Poissonian arrival times. Denote then by $\Pi$ the stationary distribution of $X$ as provided by Theorem 6.3.1.
Using the Ergodic Theorem B.2.1 together with the deep Martingale Convergence Theorem 7.1.4 of [125], one can show the following result.

Proposition 6.3.1 Consider the price process described by Equation (6.2.20) above, and introduce the sequence of martingales $\hat{P}^n$ formed by the centered, rescaled price

\[
\hat{P}^n_t \equiv \frac{P_{nt} - Q_{nt}}{\sqrt{n}},
\]

where $Q$ is the compensator of $P$

\[
Q_t = \sum_i \lambda^i \int_0^t F_i(X_s)\, ds.
\]

Then, $\hat{P}^n$ converges in distribution to a Wiener process $\hat{\sigma} W$, where the volatility $\hat{\sigma}$ is given by

\[
\hat{\sigma}^2 = \lim_{t \to +\infty} \frac{1}{t} \sum_i \lambda^i \int_0^t (F_i(X_s))^2\, ds = \sum_i \lambda^i \int (F_i(X))^2\, \Pi(dX).
\]

Proof. Proposition 6.3.1 will follow from the convergence of the predictable quadratic variation of $\hat{P}^n$. By construction, there holds

\[
\langle \hat{P}^n, \hat{P}^n \rangle_t = \frac{1}{n} \sum_i \lambda^i \int_0^{nt} (F_i(X_s))^2\, ds,
\]

or else

\[
\langle \hat{P}^n, \hat{P}^n \rangle_t = t \Big( \frac{1}{nt} \sum_i \lambda^i \int_0^{nt} (F_i(X_s))^2\, ds \Big),
\]

and Theorem B.2.1 ensures that a.s.

\[
\lim_{t \to +\infty} \frac{1}{nt} \sum_i \lambda^i \int_0^{nt} (F_i(X_s))^2\, ds = \sum_i \lambda^i \int (F_i(X))^2\, \Pi(dX)
\]

whenever the integrability conditions (see Theorem B.2.1) are satisfied. Now, those are easily seen to hold true, since the integrand in the predictable quadratic variation is a bounded function. As a matter of fact, the only possibly unbounded term would come from the cancellation rate, proportional to the $a_i$'s and $|b_i|$'s. However, whenever a cancellation order causes a price change, then necessarily the book is in a state where the quantity at the best limit that moves is precisely equal to $q$. Hence, the boundedness follows. The other condition for the martingale convergence theorem to apply is trivially satisfied, since the size of the jumps of $\hat{P}^n$ is bounded by $C/\sqrt{n}$, $C$ being some constant.

Appealing as it first seems, Proposition 6.3.1 is not satisfactory: in order to give a more precise characterization of the dynamics of the rescaled price process, it is necessary to understand thoroughly the behaviour of its compensator $Q_{nt}$. As a matter of fact, $Q_{nt}$ itself satisfies an ergodic theorem, and if its asymptotic variance is not negligible with respect to $nt$, one cannot conclude directly from Proposition 6.3.1 that the rescaled price process $P_{nt}/\sqrt{n}$ behaves like a Wiener process with a deterministic drift. The next result provides a more accurate answer, valid under general ergodicity conditions.

Theorem 6.3.2 Write as above the price

\[
P_t = \sum_i \int_0^t F_i(X_s)\, dN_s^i
\]

and its compensator

\[
Q_t = \sum_i \lambda^i \int_0^t F_i(X_s)\, ds.
\]

Define

\[
h = \sum_i \lambda^i F_i(X)
\]

and let

\[
\alpha = \lim_{t \to +\infty} \frac{1}{t} \sum_i \lambda^i \int_0^t F_i(X_s)\, ds = \int h(X)\, \Pi(dX) \quad \text{a.s.}
\]

Finally, introduce the solution $g$ to the Poisson equation

\[
\mathcal{L} g = h - \alpha
\]

and the associated martingale

\[
Z_t = g(X_t) - g(X_0) - \int_0^t \mathcal{L} g(X_s)\, ds \equiv g(X_t) - g(X_0) - Q_t + \alpha t.
\]

Then, the deterministically centered, rescaled price

\[
\bar{P}^n_t \equiv \frac{P_{nt} - \alpha n t}{\sqrt{n}}
\]

converges in distribution to a Wiener process $\bar{\sigma} W$. The asymptotic volatility $\bar{\sigma}$ satisfies the identity

\[
\bar{\sigma}^2 = \lim_{t \to +\infty} \frac{1}{t} \sum_i \lambda^i \int_0^t \big((F_i - \Delta^i(g))(X_s)\big)^2\, ds \equiv \sum_i \lambda^i \int \big((F_i - \Delta^i(g))(X)\big)^2\, \Pi(dX),
\]

where $\Delta^i(g)(X)$ denotes the jump of the process $g(X)$ when the process $N^i$ jumps and the limit order book is in the state $X$.

Proof. The martingale method, see e.g. [155][111][196], consists in rewriting the price process under the form

\[
P_t = (P_t - Q_t) - Z_t + g(X_t) - g(X_0) + \alpha t \equiv (M_t - Z_t) + g(X_t) - g(X_0) + \alpha t,
\]

so that

\[
\bar{P}^n_t = \frac{V_{nt} + g(X_{nt}) - g(X_0)}{\sqrt{n}},
\]

where $V = M - Z$ is a martingale. Therefore, the theorem is proven if one can show that $\big(g(X_{nt}) - g(X_0)\big)/\sqrt{n}$ converges to 0 in $L^2(\Pi(dx))$, or simply, that $g \in L^2(\Pi(dx))$. Theorem 4.4 of [155] states that the condition

\[
h^2 \le V \tag{6.3.7}
\]

(where $V$ is a Lyapunov function for the process) is sufficient for $g$ to be in $L^2(\Pi(dx))$, and Condition (6.3.7) is trivially satisfied since $h$ is bounded.

6.4 Conclusions

In this chapter, we have analyzed a simple Markovian order book model, in which elementary changes in the price and spread processes are explicitly linked to the instantaneous shape of the order book and the order flow parameters.
Two basic properties were investigated: the ergodicity of the order book and the large-scale limit of the price process. The first property, which we established, is desirable in that it ensures the stability of the order book in the long run, and gives a theoretical underpinning to statistical measurements on order book data. The scaling limit of the price process is, as anticipated, a Brownian motion. A key ingredient in this result is the convergence of the order book to its stationary state at an exponential rate, a property equivalent to a geometric mixing condition satisfied by the stationary version of the order book. This short memory effect, together with a constraint on the variance of price increments, guarantees a diffusive limit at large time scales. Our assumptions are independent Poissonian order flows, proportional cancellation rates, and the presence of two reservoirs of liquidity $K$ ticks away from the best quotes to guarantee that the spread does not diverge.

In a sense, this chapter serves as a mathematical justification of the Bachelier model of asset prices, from a market microstructure perspective. In reality, the picture is however more subtle: even if the price process is asymptotically diffusive, at short time scales the model produces stronger anti-correlation in traded prices than what is actually observed in the data. At those time scales, price diffusivity is arguably the result of a balance between persistent liquidity taking and anti-persistent liquidity providing. We believe however that the approach presented here is interesting for clearly identifying conditions under which the asymptotic normality of price increments holds and, more importantly, for introducing a set of mathematical tools for further investigating the price dynamics in more sophisticated stochastic order book models.
For instance, as documented in Chapter 4, market orders excite limit orders and vice versa, so that the actual order flows exhibit non-negligible cross dependences. A possible solution for endogenously incorporating these dependences is the use of mutually exciting processes:

\[
\lambda^M(t) = \lambda^M(0) + \int_0^t \varphi_{MM}(t - s)\, dN^M(s) + \int_0^t \varphi_{LM}(t - s)\, dN^L(s), \tag{6.4.1}
\]

and

\[
\lambda^L(t) = \lambda^L(0) + \int_0^t \varphi_{LL}(t - s)\, dN^L(s) + \int_0^t \varphi_{ML}(t - s)\, dN^M(s). \tag{6.4.2}
\]

This model has the additional advantage of capturing clustering in order arrivals (due to the self-excitation terms $\varphi_{MM}$ and $\varphi_{LL}$), and for exponentially decaying kernels can be cast into a Markovian setting. Chapter 8 is devoted to the general study of such a model. Many other generalizations are possible, and in particular, our approach can be extended to very general Markovian order books: as long as there exists a Lyapunov function, one can apply exactly the same tools to study ergodicity and price asymptotics.

Chapter 7

The order book as a queueing system

In this chapter, we move forward on the study of zero-intelligence limit order books, by studying the analytical properties of a (one-sided) order book model in which the flows of limit and market orders are Poisson processes, and the distribution of lifetimes of cancelled orders is exponential. The model is then extended to the case of random order sizes, thereby allowing us to study the relationship between the size of the incoming limit orders and the shape of the order book.

7.1 The shape of a limit order book

What is called the "shape" of the order book is simply the function which for any price gives the number of shares standing in the order book at that price. We call "cumulative shape" up to price $p$ the total quantity offered in the order book between the best limit and price $p$. These quantities measure what is sometimes known as the depth of the order book.
In a probabilistic model of the order book in which the random variable describing the (cumulative) shape admits a stationary distribution, its expectation with respect to this stationary distribution will be simply called the average (cumulative) shape.

Understanding the average shape of an order book and its link to the order flows is not straightforward. [40] observed that, at least for the first limits, the shape of the order book is increasing away from the spread. A decade later, and still on the Paris Bourse, [49] and [276] propose improved results. It is observed (using 2001 data) that "the average order book has a maximum away from the current bid/ask, and a tail reflecting the statistics of the incoming orders" (hump-shaped limit order book). A power law distribution is suggested for modelling the decrease of the flow of limit orders away from the spread. Furthermore, [49] provides an analytical approximation to fit the average shape of the order book, which depends on the incoming flow of limit orders. [163] confirms the hump-shaped form of the order book by an empirical study of liquid Chinese stocks (using 2003 data), but finds an exponential decrease of the average shape as the price increases. [292] also observes the hump shape for the US Google stock (using 2005 data).

More recently, research about optimal execution in an order book and order book simulations has raised interest in understanding the shape of order books. Indeed, optimal execution problems need an order book shape as input: a block shape has been used in toy models, and general shapes have recently been considered [256, 278, 20]. The inverse function of the cumulative shape of the order book is known as the virtual (or: instantaneous) market impact, and may be of interest in these problems, even though it is not a direct proxy for the price impact of a trade [see e.g. 131, 47].
As for order book simulations, once the flows of orders have been defined, checking the simulated shape is important in assessing that the model is realistic. A log-normal shape of the order book has been suggested [279].

An order book is primarily a complex queueing system. Recent studies [97, 94] build on this observation and use classical tools from queueing theory such as the computation of Laplace-Stieltjes transforms of distributions of interest. [97] observes that "the average profile of the order book displays a hump [. . . ] that does not result from any fine-tuning of model parameters or additional ingredients such as correlation between order flow and past price moves", but no explicit link between the average shape and the order flows is made. Our paper hopefully closes this gap.

We start with the simplest queueing system as an order book: we consider a one-side limit order book model in which limit orders and market orders occur according to Poisson processes. It is also assumed that all limit orders are submitted with an exponentially distributed lifetime, i.e. they are cancelled if they have not been executed some exponential time after their submission. This starting model is discrete in the sense that the price is an integer, expressed in number of ticks. In section 7.2, the model is described and we show, recalling simple results from queueing theory, that this system admits an analytically computable expected size associated with its stationary distribution. We then develop this into a continuous model by replacing the finite set of Poisson processes describing the arrival of limit orders with a spatial Poisson process. The main interest of the continuous version of the model is to allow easier analytical manipulation and to describe the frequent case in which the tick size is small compared to standard variations of prices.
We obtain an analytical formula for the average shape of the continuous order book, which depends explicitly on the arrival rates of limit, market and cancellation orders. This shape has an interpretation in terms of the probability that a submitted limit order is cancelled (as opposed to executed), introducing a law of conservation of the flows of orders in a Markovian order book. In section 7.3, we compare our analytical formula for the shape of the order book to the numerical investigations of [303], and then to the only (to our knowledge) existing analytical formula for the shape of an order book available in the literature, proposed by [49].

Section 7.4 extends the previous model by allowing for random sizes of limit orders. We update our analytical formulas in the special case in which the sizes of limit orders are assumed to be geometrically distributed. Thanks to this model, we investigate in section 7.5 the role of the size of limit orders in an order book, providing insights on its influence on the order book shape. Little empirical literature provides insights on the factors that make the average shape of order books vary. [37] analyses the variations of order books on the German Stock Exchange (2004 data) through a principal component analysis and identifies two main factors that respectively shift and rotate the shape of the order book. [255] investigates the slope of the order book around the spread as a function of the traded volume on 108 Norwegian stocks (2001 data) and, among other results, exhibits a positive relation between the order book slope around the spread and the trading activity: high trading activity tends to increase the slope of the first limits of an order book. Our model suggests an influence of the relative size of limit orders, which we empirically test using 2010 trading data on the Paris Stock Exchange. Section 7.6 concludes this work.
7.2 A link between the flows of order and the shape of an order book

7.2.1 The basic queueing system

The aim of this section is to present the basic one-side order book model, discuss the relevance of its assumptions and recall some results from queueing theory in the context of this order book model.

Let us consider a one-side order book model, i.e. a model in which all limit orders are ask orders, and all market orders are buy orders. The bid price is assumed to be constantly equal to zero, and consequently spread and ask price are identically equal. From now on, this quantity will be simply referred to as the price. Let $p(t)$ denote the price at time $t$. $\{p(t), t \in [0, \infty)\}$ is a continuous-time stochastic process with values in the discrete set $\{1, \dots, K\}$. In other words, the price is given in number of ticks. Let $\Delta$ be the tick size, such that the price range of the model in currency is actually $\{\Delta, \dots, K\Delta\}$. For realistic modelling and empirical fitting performance, one may assume that the maximum price $K$ is chosen very large, but in fact it will soon be obvious that this upper bound does not affect in any way the order book for lower prices.

For all $i \in \{1, \dots, K\}$, (ask) limit orders at price $i$ are submitted according to a Poisson process with parameter $\lambda_i$. These processes are assumed to be mutually independent, so that the number of orders submitted at prices $q, \dots, r$ is a Poisson process with parameter $\lambda_{q \to r}$ defined as $\lambda_{q \to r} = \sum_{i=q}^{r} \lambda_i$. All limit orders standing in the book may be cancelled. It is assumed that the time intervals between submission and cancellation form a set of mutually independent random variables identically distributed according to an exponential distribution with parameter $\theta > 0$. Finally, (buy) market orders are submitted at random times according to a Poisson process with parameter $\mu$. Note that all orders are assumed to be of unit size. This restriction will be dropped in section 7.4 et seq.
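To make the mechanics of this one-side model concrete, here is a minimal event-driven (Gillespie-type) simulation sketch. It is an illustration only, not a procedure taken from the text: the function name `simulate_one_side_book`, its arguments and the choice of returning time-averaged queue sizes are assumptions made for the example, and market orders arriving on an empty book are simply ignored.

```python
import random


def simulate_one_side_book(lam, mu, theta, horizon, seed=0):
    """Simulate the one-side book up to time `horizon`, starting empty.

    lam[k-1] is the arrival rate of unit limit orders at price k, mu the
    arrival rate of unit market orders, theta the cancellation rate of
    each standing order. Returns the time-averaged number of orders
    standing at each price.
    """
    rng = random.Random(seed)
    K = len(lam)
    total_lam = sum(lam)
    book = [0] * K
    area = [0.0] * K  # integral over time of the queue size at each price
    t = 0.0
    while True:
        n = sum(book)
        rate = total_lam + mu + theta * n  # total event rate in this state
        wait = rng.expovariate(rate)
        dt = min(wait, horizon - t)
        for k in range(K):
            area[k] += book[k] * dt
        if wait >= horizon - t:
            break
        t += wait
        u = rng.random() * rate
        if u < total_lam:
            # Limit order: price chosen proportionally to the rates lam.
            k = rng.choices(range(K), weights=lam)[0]
            book[k] += 1
        elif u < total_lam + mu:
            # Market order: executed against the lowest non-empty price.
            for k in range(K):
                if book[k] > 0:
                    book[k] -= 1
                    break
        else:
            # Cancellation: each standing order is equally likely to leave.
            k = rng.choices(range(K), weights=book)[0]
            book[k] -= 1
    return [x / horizon for x in area]


print(simulate_one_side_book([2.0, 1.0], mu=1.0, theta=1.0, horizon=2000.0))
```

With `mu=0.0` the queues at different prices decouple, and each behaves as an infinite-server queue whose stationary mean is `lam[k-1] / theta`; this gives a simple sanity check of the simulation before market orders are switched on.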
Let us comment here on the model specification. Firstly, parameter $\theta$ is difficult to grasp empirically, and the use, for our analytical purposes, of the unconditional exponential distribution is rough. [77] finds an asymptotic power-law distribution for this quantity, confirmed in [67]. [114] propose a two-regime power-law distribution (with a different exponent for small and large lifetimes of orders). It also seems plausible that the lifetime of limit orders that are cancelled in an order book varies with the distance between the order price and the current ask price, among many other parameters. [248] outperforms the basic exponential model with an empirically-defined three-parameter cancellation model, too complex to be analytically used here. Secondly, as for the limit and market orders, several empirical studies show that Poisson processes do not reflect the complexity of the order flows. Complex point processes have been suggested to model the interactions between the order flows, and the consequent resilience of the order book [e.g. 215, 253]. We however model order flows as Poisson processes in order to keep as much analytical tractability as possible. Thirdly, focusing on a one-side limit order book answers the same need for simplicity. Such models are often considered in optimal execution papers [e.g. 256, 278, 20] or in empirical papers [e.g. 49] where a bid/ask symmetry is assumed. Our specification is equivalent to a two-side model with an infinite-volume limit bid order standing in the book at price 0. This situation could be found on a market where several sellers would make continuous-time offers that may be cancelled to a single buyer that alone decides when to buy. Note also that our toy model is a half-book of the double-side book studied by [97]. Finally, despite these very simple specifications, we will obtain general flexible order book shapes that are both empirically valid and founded on market mechanisms.
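Let $\{L^{1 \to k}(t), t \in [0, \infty)\}$ be the stochastic process representing the number of limit orders at prices $1, \dots, k$ standing in the order book at time $t$. $L^{1 \to k}$ is thus the cumulative shape of the order book in our model. $L^{1 \to k}$ can be viewed as a birth-death process with birth rate $\lambda_{1 \to k}$ and death rate $\mu + n\theta$ in state $n$; it may equivalently be viewed as the size of an $M/M/1 + M$ queueing system with arrival rate $\lambda_{1 \to k}$, service rate $\mu$ and reneging rate $\theta$ (see e.g. [137, Chapter XVII] or [59, Chapter 8] among many textbook references). This queueing system will now be referred to as the $1 \to k$ queueing system.

$L^{1 \to k}$ admits a stationary distribution $\pi_{1 \to k}(\cdot)$ as soon as $\theta > 0$. Using the formalism of the infinitesimal generator:

\[
A = \begin{pmatrix}
-\lambda_{1 \to k} & \lambda_{1 \to k} & 0 & 0 & \cdots \\
\mu + \theta & -(\lambda_{1 \to k} + \mu + \theta) & \lambda_{1 \to k} & 0 & \cdots \\
0 & \mu + 2\theta & -(\lambda_{1 \to k} + \mu + 2\theta) & \lambda_{1 \to k} & \cdots \\
\vdots & \ddots & \ddots & \ddots & \ddots
\end{pmatrix}, \tag{7.2.1}
\]

this stationary probability $\pi_{1 \to k}$ is classically obtained and written for all $n \in \mathbb{N}^*$:

\[
\pi_{1 \to k}(n) = \pi_{1 \to k}(0) \prod_{i=1}^{n} \frac{\lambda_{1 \to k}}{\mu + i\theta}, \tag{7.2.2}
\]

and setting $\sum_{n=0}^{\infty} \pi_{1 \to k}(n) = 1$ gives:

\[
\pi_{1 \to k}(0) = \Big( \sum_{n=0}^{\infty} \prod_{i=1}^{n} \frac{\lambda_{1 \to k}}{\mu + i\theta} \Big)^{-1}. \tag{7.2.3}
\]

Introducing the normalized parameters $\nu_{1 \to k} = \lambda_{1 \to k}/\theta$ and $\delta = \mu/\theta$, and after some simplifications, we write for all $n \in \mathbb{N}$:

\[
\pi_{1 \to k}(n) = \frac{e^{-\nu_{1 \to k}}\, \nu_{1 \to k}^{\delta}}{\delta\, \Gamma_{\nu_{1 \to k}}(\delta)} \prod_{i=1}^{n} \frac{\nu_{1 \to k}}{i + \delta}, \tag{7.2.4}
\]

where $\Gamma_y$ is the lower incomplete version of the Euler gamma function:

\[
\Gamma_y : \mathbb{R}_+ \to \mathbb{R}, \quad x \mapsto \int_0^y t^{x-1} e^{-t}\, dt. \tag{7.2.5}
\]

Now, the price in the one-side order book model is equal to $k$ if and only if the "$1 \to k-1$" queueing system is empty and the "$1 \to k$" system is not. Therefore, if $L^{1 \to k}$ is distributed according to the invariant distribution $\pi_{1 \to k}$, then the distribution $\pi_p$ of the price $p$ is written:

\[
\pi_p(1) = 1 - \frac{e^{-\nu_{1 \to 1}}\, \nu_{1 \to 1}^{\delta}}{\delta\, \Gamma_{\nu_{1 \to 1}}(\delta)}, \qquad
\pi_p(K) = \frac{e^{-\nu_{1 \to K-1}}\, \nu_{1 \to K-1}^{\delta}}{\delta\, \Gamma_{\nu_{1 \to K-1}}(\delta)}, \tag{7.2.6}
\]

and for all $k \in \{2, \dots, K-1\}$,

\[
\pi_p(k) = \frac{e^{-\nu_{1 \to k-1}}\, \nu_{1 \to k-1}^{\delta}}{\delta\, \Gamma_{\nu_{1 \to k-1}}(\delta)} - \frac{e^{-\nu_{1 \to k}}\, \nu_{1 \to k}^{\delta}}{\delta\, \Gamma_{\nu_{1 \to k}}(\delta)}. \tag{7.2.7}
\]

Using previous results, the average size $\mathbb{E}[L^{1 \to k}]$ of the "$1 \to k$" queueing system is easily computed.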
From equation (7.2.4), we can write after some simplifications:
\[
E[L^{1\to k}] = \nu_{1\to k} - \frac{\Gamma_{\nu_{1\to k}}(1+\delta)}{\Gamma_{\nu_{1\to k}}(\delta)}. \tag{7.2.8}
\]
Let $L^k = L^{1\to k} - L^{1\to k-1}$ be the number of orders in the book at price $k \in \{1,\dots,K\}$. Then, with $\nu_k = \frac{\lambda_k}{\theta}$, the average shape of the order book at price $k$ is:
\[
E[L^k] = \nu_k - \left( \frac{\Gamma_{\nu_{1\to k}}(1+\delta)}{\Gamma_{\nu_{1\to k}}(\delta)} - \frac{\Gamma_{\nu_{1\to k-1}}(1+\delta)}{\Gamma_{\nu_{1\to k-1}}(\delta)} \right). \tag{7.2.9}
\]

7.2.2 A continuous extension of the basic model

In order to facilitate the comparison with existing results, we derive a continuous version of the previous toy model. The price is now assumed to be a positive real number. The mechanisms for market orders and cancellations are unchanged: unit-size market orders are submitted according to a Poisson process with rate $\mu$, and standing limit orders are cancelled after an exponential random time with parameter $\theta$. The submission of limit orders is slightly modified: since the price is continuous, instead of a finite set of homogeneous Poisson processes indexed by the number of ticks $k \in \{1,\dots,K\}$, we now consider a spatial Poisson process on the positive quadrant $\mathbb{R}_+^2$. Let $\lambda(p,t)$ be a non-negative function denoting the intensity of the spatial Poisson process modelling the arrival of limit orders, the first coordinate representing the price and the second one the time [see e.g. 282, Chapter 12 for a textbook introduction on the construction of spatial Poisson processes]. As in the discrete case, this process is assumed to be time-homogeneous, and it is hence assumed that price and time are separable. Let $h_\lambda : \mathbb{R}_+ \to \mathbb{R}_+$ denote the spatial intensity function of the random events, i.e. limit orders. Then $\lambda(p,t) = \alpha h_\lambda(p)$ is the intensity of the spatial Poisson process representing the arrival of limit orders.

We recall that in this framework, for any $p_1 < p_2 \in [0,\infty)$, the number of limit orders submitted at a price $p \in [p_1,p_2]$ is a homogeneous Poisson process with intensity $\int_{p_1}^{p_2} \lambda(p,t)\, dp = \alpha \int_{p_1}^{p_2} h_\lambda(p)\, dp$. Furthermore, if $p_1 < p_2 < p_3 < p_4$ on the positive real half-line, then the numbers of limit orders submitted in $[p_1,p_2]$ and $[p_3,p_4]$ form two independent Poisson processes.

Now, let $L([0,p])$ be the random variable describing the cumulative size of our new order book up to price $p \in \mathbb{R}_+$. Given the preceding remarks, $L([0,p])$ is, as in the previous section, the size of an $M/M/1+M$ queueing system with arrival rate $\alpha \int_0^p h_\lambda(u)\, du$, service rate $\mu$ and reneging rate $\theta$. Using the results of section 7.2.1, we obtain from equation (7.2.8):
\[
E[L([0,p])] = \int_0^p h(u)\, du - f\!\left(\delta, \int_0^p h(u)\, du\right), \tag{7.2.10}
\]
where we have defined $h(u) = \frac{\alpha h_\lambda(u)}{\theta}$ and:
\[
f(x,y) = \frac{\Gamma_y(1+x)}{\Gamma_y(x)}. \tag{7.2.11}
\]
From now on, $H(p) = \int_0^p h(u)\, du$ will be the (normalized) arrival rate of limit orders up to price $p$, and $B(p) = E[L([0,p])]$ will be the average cumulative shape of the order book up to price $p$. Then $b(p) = \frac{dB(p)}{dp}$ will be the average shape of the order book (per price unit, not cumulative). Straightforward differentiation of equation (7.2.10) and some rearrangement of terms lead to the following proposition.

Proposition 7.2.1 In a continuous order book with homogeneous Poisson arrival of market orders with intensity $\mu$, spatial Poisson arrival of limit orders with intensity $\alpha h_\lambda(p)$, and exponentially distributed lifetimes of non-executed limit orders with parameter $\theta$, the average shape of the order book $b$ is computed for all $p \in [0,\infty)$ by:
\[
b(p) = h(p) \Big[ 1 - \delta\, (g_\delta \circ H)(p) \Big[ 1 - \delta [H(p)]^{-1} \big( 1 - (g_\delta \circ H)(p) \big) \Big] \Big], \tag{7.2.12}
\]
where
\[
g_\delta(y) = \frac{e^{-y} y^{\delta}}{\delta\, \Gamma_y(\delta)}. \tag{7.2.13}
\]

Let us give a few comments on the average shape we obtain. Firstly, by identification with equation (7.2.4), observing that $\pi_{1\to k}(0) = g_\delta(\nu_{1\to k})$ in the discrete model, $g_\delta(H(p))$ is to be interpreted as the probability that the order book is empty up to price $p$. Secondly, letting $\delta \to 0$ in equation (7.2.12) gives $b(p) \to h(p)$ (cf. $\lim_{\delta\to 0} g_\delta(y) = e^{-y}$).
Indeed, if there were no market orders, then the average shape of the order book would be equal to the normalized arrival rate of limit orders. Thirdly, as $p \to \infty$, we have $b(p) \sim k\, h(p)$ for some constant $k$. This leads to our main comment, which we state as the following proposition.

Proposition 7.2.2 The shape of the order book $b(p)$ can be written as:
\[
b(p) = h(p)\, C(p), \tag{7.2.14}
\]
where $C(p)$ is the probability that a limit order submitted at price $p$ will be cancelled before being executed.

This proposition expresses a law of conservation of the flows of orders: the shape of the order book is exactly the fraction of arriving limit orders that will be cancelled. The difference between the flow of arriving limit orders and the order book shape is exactly the fraction of arriving limit orders that will be executed.

The proof is straightforward. Indeed, in the $1\to k$ queueing system, the average number of limit orders at price $k$ that are cancelled per unit time is $\theta E[L^k]$ ($\theta E[L^{1\to k}]$ is the reneging rate of the $1\to k$ queue, using queueing system vocabulary). Therefore, the fraction of cancelled orders at price $k$ over arriving limit orders at price $k$, per unit time, is $C_k = \frac{\theta E[L^k]}{\lambda_k}$. Using equation (7.2.9) and some straightforward computations, the fraction $C_k$ of limit orders submitted at price $k$ which are cancelled is:
\[
C_k = 1 - \frac{\delta}{\nu_k} \big( g_\delta(\nu_{1\to k-1}) - g_\delta(\nu_{1\to k}) \big). \tag{7.2.15}
\]
Therefore, in the continuous model up to price $p \in \mathbb{R}_+$ with average cumulative shape $B(p)$, the fraction of limit orders submitted at a price in $[p, p+\epsilon]$ which are cancelled is written:
\[
1 - \frac{\delta}{H(p+\epsilon) - H(p)} \big( g_\delta(H(p)) - g_\delta(H(p+\epsilon)) \big). \tag{7.2.16}
\]
By letting $\epsilon \to 0$, we obtain that the fraction $C(p)$ of limit orders submitted at price $p \in \mathbb{R}_+$ which are cancelled is:
\[
C(p) = 1 + \delta\, g_\delta'(H(p)) = 1 - \delta\, (g_\delta \circ H)(p) \left[ 1 - \frac{\delta}{H(p)} \big( 1 - (g_\delta \circ H)(p) \big) \right], \tag{7.2.17}
\]
which gives the final result. This law of conservation of the flows of orders explains the relationship between the shape of the order book and the flow of arriving limit orders.
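Propositions 7.2.1 and 7.2.2 can be evaluated numerically without a special-function library. The sketch below is an illustration under stated assumptions: it computes $g_\delta(y)$ through the series $1/g_\delta(y) = \sum_{n \geq 0} \prod_{i=1}^{n} \frac{y}{i+\delta}$, which follows from equation (7.2.3) and the identification $\pi_{1\to k}(0) = g_\delta(\nu_{1\to k})$, and it uses the integration-by-parts identity $\Gamma_y(1+\delta) = \delta\Gamma_y(\delta) - y^{\delta}e^{-y}$ to rewrite equation (7.2.10) as $B = H - \delta(1 - g_\delta(H))$. Function names and tolerances are hypothetical.

```python
def g(delta, y, tol=1e-16):
    """g_delta(y) of eq. (7.2.13), via 1/g = sum_n prod_{i<=n} y/(i+delta)."""
    term, total, n = 1.0, 1.0, 0
    while term > tol * total:
        n += 1
        term *= y / (n + delta)
        total += term
    return 1.0 / total


def cancelled_fraction(delta, H):
    """C(p) of eq. (7.2.17), written as a function of H = H(p) > 0."""
    gH = g(delta, H)
    return 1.0 - delta * gH * (1.0 - (delta / H) * (1.0 - gH))


def cumulative_shape(delta, H):
    """B(p) = H - delta * (1 - g_delta(H)), an equivalent form of eq. (7.2.10)."""
    return H - delta * (1.0 - g(delta, H))


def shape(h_p, H_p, delta):
    """b(p) = h(p) * C(p), eq. (7.2.14); h_p = h(p) and H_p = H(p)."""
    return h_p * cancelled_fraction(delta, H_p)


if __name__ == "__main__":
    # Hypothetical example: constant normalized intensity h = 8, delta = 2.
    h, delta = 8.0, 2.0
    for p in (0.1, 0.5, 1.0, 3.0):
        print(p, shape(h, h * p, delta), cumulative_shape(delta, h * p))
```

Since $dB/dH = C$, a finite difference of `cumulative_shape` with respect to `H` should agree with `cancelled_fraction`, which gives a simple consistency check between equations (7.2.10) and (7.2.12).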
For high prices, two cases are to be distinguished. On the one hand, if the total arrival rate of limit orders is a finite positive constant $\alpha$ (for example when $h_\lambda$ is a probability density function on $[0,+\infty)$, in which case $\lim_{p\to+\infty} H(p) = \int_0^\infty \lambda(u,t)\, du = \alpha \in \mathbb{R}_+^*$), then the proportionality constant between the shape of the order book $b(p)$ and the normalized limit order flow $h(p)$ is, as $p \to +\infty$, $C_\infty$ defined as:
\[
C_\infty = \lim_{p\to\infty} C(p) = 1 - \delta\, g_\delta(\alpha) \left[ 1 - \frac{\delta}{\alpha} \big( 1 - g_\delta(\alpha) \big) \right] < 1. \tag{7.2.18}
\]
In such a case, the shape of the order book as $p \to +\infty$ is proportional to the normalized rate of arrival of limit orders $h(p)$, but not equivalent to it. The fraction of cancelled orders does not tend to 1 as $p \to +\infty$, i.e. market orders play a role even at high prices. On the other hand, in the case where $\lim_{p\to+\infty} H(p) = \infty$, very high prices are not reached by market orders, and the tail of the order book behaves exactly as if there were no market orders: $b(p) \sim h(p)$ as $p \to +\infty$.

We may remark here that there exists an empirical model for the probability of execution $F(p) = 1 - C(p)$ in [248]. In this latter model, it is assumed to be the complementary cumulative distribution function of a Student distribution with parameter $s = 1.3$. As such, it decreases towards 0 as $p^{-s}$. In our model however, it decreases exponentially and, in view of the previous discussion, does not necessarily tend towards 0.

7.3 Comparison to existing results on the shape of the order book

The model presented in section 7.2 belongs to the class of "zero-intelligence" Markovian order book models: all order flows are independent Poisson processes. Although very simple, it replicates the shapes of the order book usually obtained in previous empirical and numerical studies, as we will now see.

7.3.1 Numerically simulated shape in [303]

A first result on the shape of the order book is provided in [303], on figures 3a) and 3b).
These figures are obtained by numerical simulations of an order book model very similar to the one presented in section 7.2, where all order flows are Poisson processes: market orders are submitted at rate $\mu_S$ with size $\sigma_S$, limit orders are submitted with the same size at rate $\alpha_S$ per unit price on a grid with tick size $dp_S$, and all orders are removed randomly with constant probability $\delta_S$ per unit time. (We have indexed all variables with an $S$ to differentiate them from our own notations.) Figures 3a) and 3b) in [303] are obtained for different values of a "granularity" parameter $\epsilon_S \propto \frac{\delta_S \sigma_S}{\mu_S}$. It is observed that, when $\epsilon_S$ gets larger, the average book becomes deeper close to the spread, and thinner for higher prices.

Using our own notations, $\epsilon_S$ actually reduces to $\frac{1}{\delta}$, i.e. the inverse of the normalized rate of arrival of market orders. Using [303]'s assumption that limit orders arrive at constant rate $\alpha_S$ per unit price and unit time, we obtain in our model $\lambda(p,t) = \alpha_S$, i.e. $H(p) = \alpha_S p$. On figure 7.1, we plot the average shapes and cumulative shapes of the order book given by equations (7.2.12) and (7.2.10) with this $H$. It turns out that when $\delta$ varies, our basic model reproduces precisely figures 3a) and 3b) of [303]. Therefore we are able to describe analytically the shapes that were previously obtained only numerically. These shapes can be straightforwardly obtained with different regimes of market orders in our basic model: when the arrival rate of market orders increases (i.e. when $\delta$ increases, or equivalently when $\epsilon_S = \frac{1}{\delta}$ decreases), all other things being equal, the average shape of the order book is thinner for lower prices.

7.3.2 Empirical and analytical shape in [49]

We now give two more examples of order book shapes obtained with equation (7.2.12) of proposition 7.2.1.
We successively consider two types of normalized intensities of arrival rates of limit orders:
• exponentially decreasing with the price: $h(u) = \frac{\alpha}{\theta} \beta e^{-\beta u}$;
• power-law decreasing with the price: $h(u) = \frac{\alpha}{\theta} (\gamma - 1)(1+u)^{-\gamma}$, $\gamma > 1$.
The first case is the one observed on Chinese stocks by [163]. The second case is the one suggested in an empirical study by [49], in which $\gamma \approx 1.5 - 1.7$. Moreover, the latter paper provides the only analytical formula previously proposed (to our knowledge) linking the order flows and the average shape of a limit order book: [49] derives an analytical formula from a zero-intelligence model by assuming that the price process is diffusive with diffusion constant $D$. Using our notations, their average order book, denoted here $b_{BP}$, is for any $p \in (0,\infty)$:
\[
b_{BP}(p) \propto e^{-\sigma^{-1} p} \int_0^p h_\lambda(u) \sinh(\sigma^{-1} u)\, du + \sinh(\sigma^{-1} p) \int_p^\infty h_\lambda(u)\, e^{-\sigma^{-1} u}\, du, \tag{7.3.1}
\]
where the parameter $\sigma^2$ is homogeneous to the variance of a price (it is proportional to the diffusion coefficient divided by the cancellation rate). $\sigma$ is thus interpreted in [49] as "the typical variation of price during the lifetime of an order, and [it] fixes the scale over which the order book varies". Therefore, although $\sigma$ is not available in our model, since our one-side model does not have a diffusive price, we may obtain a satisfying order of magnitude for this parameter by computing the standard deviation of the price in our model, using numerical simulations (see also remark 7.3.1 below).

We plot the shape of [49] for the two types of normalized arrival rates of limit orders previously mentioned. Note that formula (7.3.1) is defined up to a multiplicative constant, which we arbitrarily fix so that the maximum over prices of the shape in our model is equal to the maximum of equation (7.3.1). Results are plotted in figure 7.2, and numerical values are given in the caption. It turns out that our model (7.2.12) and equation (7.3.1) provide similar order book shapes.
Since equation (7.3.1) has been successfully tested with empirical data in [49], figure 7.2 provides a good hint that the shape (7.2.12) could provide good empirical fits as well. As $p \to \infty$, both formulas lead to a shape $b(p)$ decreasing as the arrival rate of limit orders $h(p)$, which was already observed in [49] and is discussed here in section 7.2. The main difference between the shapes occurs as $p \to 0$. Equation (7.3.1) imposes that $b_{BP}(0) = 0$, whereas the result (7.2.12) allows a more flexible behaviour with $b(0) = \frac{h(0)}{1+\delta}$, a quantity that depends on the three types of order flows. The difference of behaviour close to the spread is not surprising considering the different natures of the two models.

[Figure 7.1: Shape (top panel, average shape $b(p)$ against price $p$) and cumulative shape (bottom panel, average cumulative shape $B(p)$ against price $p$) of the order book computed using equations (7.2.12) and (7.2.10) with $h(p) = \alpha$, $\alpha = 8$ and $\delta = 1$ (full line), $\delta = 2$ (dashed), $\delta = 6$ (dotdashed), $\delta = 16$ (dotted). Note that results are scaled on the same dimensionless axes used in [303].]

[Figure 7.2: Comparison of the shapes of the order book (average shape $b(p)$ against price $p$) in our model (black curves) and using the formula proposed by [49] (red curves). Three examples are plotted: arrival of limit orders with exponential prices, $\alpha = 20$, $\beta = 0.75$, $\mu = 4$, $\theta = 1$ (full lines); idem with $\mu = 8$ (dashed lines); arrival of limit orders with power-law prices, $\alpha = 40$, $\gamma = 1.6$, $\mu = 4$, $\theta = 1$ (dash-dotted lines).]

Remark 7.3.1 We have used numerical simulations of our model to compute the standard deviation of the price. Note however that this quantity could also be found by a numerical evaluation of its analytical form. Indeed, the stationary distribution of the price in our continuous model can be explicitly derived.
Assuming this distribution admits a density function $\pi_p$, the same observation that leads to equations (7.2.6) and (7.2.7) in the discrete case gives here for any $p \in [0,\infty)$:
\[
\int_0^p \pi_p(u)\, du = 1 - \frac{e^{-H(p)} [H(p)]^{\delta}}{\delta\, \Gamma_{H(p)}(\delta)}, \tag{7.3.2}
\]
which is written, by straightforward differentiation and some rearrangement of terms:
\[
\pi_p(p) = h(p)\, (g_\delta \circ H)(p) \Big[ 1 - \delta [H(p)]^{-1} \big( 1 - (g_\delta \circ H)(p) \big) \Big] = h(p)\, \frac{1 - C(p)}{\delta}. \tag{7.3.3}
\]
Some examples of this distribution are plotted on figure 7.3. Now, using this explicit distribution of the price in our order book model, we may compute its standard deviation in the example cases described above, by numerically evaluating the integrals defining the first two moments of the distribution. Finally, let us recall that the price process in our model is a jump process with right-continuous paths, so it is not diffusive. Note however that the price process in similar two-sided order book models has been shown to admit a diffusive limit with an appropriate time scaling [see e.g. 7].

[Figure 7.3: Price density function $\pi_p$ as a function of the price, computed with equation (7.3.3), with $h_\lambda$ constant (full line), exponentially decreasing (dotted) and power-law decreasing (dashed).]

7.4 A model with varying sizes of limit orders

We now allow for random sizes of limit orders in our model. As in section 7.2, we start by describing the basic model as a queueing system, and then extend it to the case of a continuous price. Let us recall that we deal with a one-side order book model, i.e. a model in which all limit orders are ask orders, and all market orders are bid orders.

Let $p(t)$ denote the price at time $t$. $\{p(t),\ t \in [0,\infty)\}$ is a continuous-time stochastic process with values in the discrete set $\{1,\dots,K\}$, i.e. the price is given in number of ticks. For all $i \in \{1,\dots,K\}$, (ask) limit orders at price $i$ are submitted according to a Poisson process with parameter $\lambda_i$. These processes are assumed to be mutually independent, so that the number of orders submitted at prices between $q$ and $r$ (included) is a Poisson process with parameter $\lambda_{q\to r}$ defined as $\lambda_{q\to r} = \sum_{i=q}^{r} \lambda_i$.

The contribution of this section is to allow for random sizes of limit orders, instead of the unit-size limit orders of the basic model of section 7.2. We assume that all the sizes of limit orders are independent random variables. We also assume that the sizes of limit orders submitted at a given price are identically distributed, but we allow this distribution to vary with the price. For a given price $k \in \mathbb{N}^*$, let $g_n^k$, $n \in \mathbb{N}^*$, denote the probability that a limit order at price $k$ is of size $n$. Let $\bar{g}^k$ denote the mean size of a limit order at price $k$, which is assumed to be finite. By a well-known property of the Poisson process, the rate of arrival of limit orders of size $n$ at price $i$ is $\lambda_i g_n^i$, hence the rate of arrival of limit orders of size $n$ with a price lower than or equal to $k$ is $\sum_{i=1}^{k} \lambda_i g_n^i$. Similarly, the probability that a limit order with a price lower than or equal to $k$ is of size $n$ is $g_n^{1\to k} = \sum_{i=1}^{k} \frac{\lambda_i}{\lambda_{1\to k}} g_n^i$. Let $\bar{g}^{1\to k} = \sum_{i=1}^{k} \frac{\lambda_i}{\lambda_{1\to k}} \bar{g}^i$ denote the mean size of a limit order with price up to $k$.

The mechanism for cancellation is unchanged: all limit orders standing in the book may be cancelled. Note however that a limit order is not cancelled all at once, but unit by unit, i.e. share by share (see also Remark 7.5.1 below). It is assumed that the time intervals between the submission of a limit order and the cancellation of each share of this order form a set of mutually independent random variables, identically distributed according to an exponential distribution with parameter $\theta > 0$. Finally, (buy) market orders are submitted at random times according to a Poisson process with parameter $\mu$.
All market orders are assumed to be of unit size.

As in section 7.2, let $\{L^{1\to k}(t),\ t \in [0,\infty)\}$ be the stochastic process representing the number of limit orders at prices $1,\dots,k$ standing in the order book at time $t$. $L^{1\to k}$ is thus the cumulative shape of the order book in our model. It can be viewed as the size of an $M^X/M/1+M$ queueing system with bulk arrival rate $\lambda_{1\to k}$, bulk volume distribution $(g_n^{1\to k})_{n\in\mathbb{N}^*}$, service rate $\mu$ and reneging rate $\theta$ (see e.g. [79] for queueing systems with bulk arrivals). The infinitesimal generator of the process $L^{1\to k}$ is thus written:
\[
\begin{pmatrix}
-\lambda_{1\to k} & \lambda_{1\to k} g_1^{1\to k} & \lambda_{1\to k} g_2^{1\to k} & \lambda_{1\to k} g_3^{1\to k} & \lambda_{1\to k} g_4^{1\to k} & \cdots \\
\mu+\theta & -(\lambda_{1\to k}+\mu+\theta) & \lambda_{1\to k} g_1^{1\to k} & \lambda_{1\to k} g_2^{1\to k} & \lambda_{1\to k} g_3^{1\to k} & \cdots \\
0 & \mu+2\theta & -(\lambda_{1\to k}+\mu+2\theta) & \lambda_{1\to k} g_1^{1\to k} & \lambda_{1\to k} g_2^{1\to k} & \cdots \\
0 & 0 & \mu+3\theta & -(\lambda_{1\to k}+\mu+3\theta) & \lambda_{1\to k} g_1^{1\to k} & \cdots \\
\vdots & \vdots & \vdots & \ddots & \ddots & \ddots
\end{pmatrix}. \tag{7.4.1}
\]
The stationary distribution $\pi^{1\to k} = (\pi_n^{1\to k})_{n\in\mathbb{N}}$ of $L^{1\to k}$ hence satisfies the following system of equations:
\[
\begin{cases}
0 = -\lambda_{1\to k}\, \pi_0^{1\to k} + (\mu+\theta)\, \pi_1^{1\to k}, \\
0 = -(\lambda_{1\to k}+\mu+n\theta)\, \pi_n^{1\to k} + (\mu+(n+1)\theta)\, \pi_{n+1}^{1\to k} + \sum_{i=1}^{n} \lambda_{1\to k}\, g_i^{1\to k}\, \pi_{n-i}^{1\to k}, \quad (n \geq 1),
\end{cases} \tag{7.4.2}
\]
which can be solved by introducing the probability generating functions. Let $\Pi^{1\to k}(z) = \sum_{n\in\mathbb{N}} \pi_n^{1\to k} z^n$ and $G^{1\to k}(z) = \sum_{n\in\mathbb{N}^*} g_n^{1\to k} z^n$. Let us also introduce the normalized parameters
\[
\delta = \frac{\mu}{\theta} \quad \text{and} \quad \nu_{1\to k} = \frac{\lambda_{1\to k}}{\theta}. \tag{7.4.3}
\]
Multiplying the $n$-th equation by $z^n$ and summing, the previous system leads to the following differential equation:
\[
\frac{d}{dz} \Pi^{1\to k}(z) + \left( \frac{\delta}{z} - \nu_{1\to k} H^{1\to k}(z) \right) \Pi^{1\to k}(z) = \frac{\delta}{z}\, \pi_0^{1\to k}, \tag{7.4.4}
\]
where we have set $H^{1\to k}(z) = \frac{1 - G^{1\to k}(z)}{1 - z}$. This equation is straightforwardly solved to obtain:
\[
\Pi^{1\to k}(z) = z^{-\delta}\, \delta\, \pi_0^{1\to k}\, e^{\nu_{1\to k} \int_0^z H^{1\to k}(u)\, du} \int_0^z v^{\delta-1} e^{-\nu_{1\to k} \int_0^v H^{1\to k}(u)\, du}\, dv, \tag{7.4.5}
\]
and the condition $\Pi^{1\to k}(1) = 1$ leads to
\[
\pi_0^{1\to k} = \left( \delta \int_0^1 v^{\delta-1} e^{\nu_{1\to k} \int_v^1 H^{1\to k}(u)\, du}\, dv \right)^{-1}, \tag{7.4.6}
\]
which by substituting in the general solution gives:
\[
\Pi^{1\to k}(z) = z^{-\delta}\, \frac{\int_0^z v^{\delta-1} e^{\nu_{1\to k} \int_v^z H^{1\to k}(u)\, du}\, dv}{\int_0^1 v^{\delta-1} e^{\nu_{1\to k} \int_v^1 H^{1\to k}(u)\, du}\, dv}. \tag{7.4.7}
\]
Now, turning back to the differential equation (7.4.4), taking the limit as $z$ increases to 1, and using basic properties of probability generating functions ($\lim_{z\to 1,\, z<1} \Pi^{1\to k}(z) = 1$, $\lim_{z\to 1,\, z<1} \frac{d}{dz}\Pi^{1\to k}(z) = E[L^{1\to k}]$ and $\lim_{z\to 1,\, z<1} H^{1\to k}(z) = \bar{g}^{1\to k}$), we obtain the result stated in the following proposition.

Proposition 7.4.1 In the discrete one-side order book model with Poisson arrival at rate $\mu$ of unit-size market orders, Poisson arrival of limit orders with rate $\lambda_k$ at price $k$ and random size with distribution $(g_n^k)_{n\in\mathbb{N}^*}$ on $\mathbb{N}^*$, and exponential lifetime of non-executed limit orders with parameter $\theta$, the average cumulative shape of the order book up to price $k$ is given by:
\[
E[L^{1\to k}] = \nu_{1\to k}\, \bar{g}^{1\to k} - \delta + \left( \int_0^1 v^{\delta-1} e^{\nu_{1\to k} \int_v^1 H^{1\to k}(u)\, du}\, dv \right)^{-1}. \tag{7.4.8}
\]

Note that by taking the sizes of all limit orders to be equal to 1, i.e. by setting $g_1^k = 1$ and $g_n^k = 0$, $n \geq 2$, equation (7.4.8) reduces to equation (7.2.8) of section 7.2, as expected.

We now introduce a specification of the model where the sizes of limit orders are geometrically distributed with parameter $q \in (0,1)$ and independent of the price, i.e. for any price $k \in \mathbb{N}^*$, $g_n^k = (1-q)^{n-1} q$. This specification is empirically founded, as it has been observed that the exponential distribution may be a rough continuous approximation of the distribution of the sizes of limit orders [see e.g. 67]. This specification of the volume distribution straightforwardly gives $H^{1\to k}(z) = \frac{1}{1-(1-q)z}$, and with some computations we obtain:
\[
E[L^{1\to k}] = \frac{\nu_{1\to k}}{q} - \delta + \frac{\delta\, q^{\frac{\nu_{1\to k}}{1-q}}}{{}_2F_1\!\left(\delta, -\frac{\nu_{1\to k}}{1-q}, 1+\delta, 1-q\right)}, \tag{7.4.9}
\]
where ${}_2F_1$ is the ordinary hypergeometric function [see e.g. 296, chapter 2].
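The geometric specification lends itself to a quick numerical cross-check of equations (7.4.6) and (7.4.8). The sketch below is only an illustration under stated assumptions: time is measured in units of $1/\theta$, and the stationary distribution is computed through a level-crossing balance, $(\delta + n + 1)\,\pi_{n+1} = \nu \sum_{j \leq n} \pi_j (1-q)^{n-j}$, which is an equivalent rewriting of system (7.4.2) chosen here for numerical stability rather than taken from the text. The probability of an empty book is then compared with the integral (7.4.6), evaluated with a composite Simpson rule for $\delta \geq 1$. All function names and numerical values are hypothetical.

```python
def bulk_queue_pmf(nu, delta, q, tol=1e-16, max_states=100_000):
    """Stationary law of the bulk-arrival queue with geometric(q) batch sizes.

    Level-crossing balance (assumed reformulation of eq. (7.4.2)):
    (delta + n + 1) * pi[n+1] = nu * sum_{j<=n} pi[j] * (1-q)**(n-j).
    """
    pi = [1.0]          # unnormalised, pi[0] = 1
    weighted = 1.0      # running value of sum_{j<=n} pi[j] * (1-q)**(n-j)
    total = 1.0
    for n in range(max_states):
        nxt = nu * weighted / (delta + n + 1)
        pi.append(nxt)
        total += nxt
        weighted = (1.0 - q) * weighted + nxt
        if nxt < tol * total:
            break
    return [x / total for x in pi]


def empty_probability(nu, delta, q, steps=20_000):
    """pi_0 from eq. (7.4.6) with H(u) = 1/(1-(1-q)u); needs delta >= 1, 0 < q < 1."""
    expo = nu / (1.0 - q)

    def integrand(v):
        return v ** (delta - 1.0) * ((1.0 - (1.0 - q) * v) / q) ** expo

    width = 1.0 / steps                 # steps must be even for Simpson's rule
    acc = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        acc += (4.0 if i % 2 else 2.0) * integrand(i * width)
    return 1.0 / (delta * acc * width / 3.0)


if __name__ == "__main__":
    nu, delta, q = 3.0, 2.0, 0.5        # hypothetical parameters
    pmf = bulk_queue_pmf(nu, delta, q)
    mean = sum(n * x for n, x in enumerate(pmf))
    print(pmf[0], empty_probability(nu, delta, q))
    print(mean, nu / q - delta * (1.0 - pmf[0]))   # eq. (7.4.8) with mean size 1/q
```

The second printed pair compares the mean of the computed distribution with equation (7.4.8), in which the last term equals $\delta \pi_0^{1\to k}$ by equation (7.4.6); this avoids evaluating the hypergeometric function of equation (7.4.9).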
Now, following the idea presented in section 7.2, we consider an order book with a continuous price, in which limit orders are submitted according to a spatial Poisson process with intensity $\lambda(p,t) = \alpha h_\lambda(p)$. Recall that $h_\lambda$ is assumed to be a real non-negative function with positive support, denoting the spatial intensity of arrival rates, i.e. the function such that the number of limit orders submitted in the order book in the price interval $[p_1,p_2]$ is a homogeneous Poisson process with rate $\int_{p_1}^{p_2} \alpha h_\lambda(u)\, du$. Using the notations defined in section 7.2, the cumulative shape at price $p \in [0,+\infty)$ of this continuous order book is thus:
\[
B(p) = \frac{1}{q} H(p) - \delta + \frac{\delta\, q^{\frac{H(p)}{1-q}}}{{}_2F_1\!\left(\delta, -\frac{H(p)}{1-q}, 1+\delta, 1-q\right)}, \tag{7.4.10}
\]
which can be differentiated to obtain the average shape $b(p)$ of the order book, which we state in the following proposition.

Proposition 7.4.2 In a continuous one-side order book with homogeneous Poisson arrival of unit-size market orders with intensity $\mu$, spatial Poisson arrival of limit orders with intensity $\alpha h_\lambda(p)$, geometric distribution of the sizes of limit orders with parameter $q$, and exponentially distributed lifetimes of non-executed limit orders with parameter $\theta$, the average shape of the order book $b$ is computed for all $p \in [0,\infty)$ by:
\[
b(p) = \frac{h(p)}{q} + \frac{d}{dp} \left[ \frac{\delta\, q^{\frac{H(p)}{1-q}}}{{}_2F_1\!\left(\delta, -\frac{H(p)}{1-q}, 1+\delta, 1-q\right)} \right]. \tag{7.4.11}
\]

7.5 Influence of the size of limit orders on the shape of the order book

We now use the results of section 7.4 to investigate the influence of the size of limit orders on the shape of the order book. Recall that market orders are submitted at rate $\mu$ with size 1, that non-executed limit orders are cancelled share by share after a random time with exponential distribution with parameter $\theta$, and that the distribution of the sizes of limit orders is a geometric distribution with parameter $q$ (i.e. with mean $\frac{1}{q}$). We study in detail the influence of the parameter $q$ on the theoretical shape of the order book.
In a first example, we assume that the normalized intensity of arrival of limit orders $h$ is constant [i.e. as in 303] and equal to $\alpha q$. Note that when $q$ varies, the mean total volume $V(p)$ of arriving limit orders up to price $p$ per unit time remains constant:
\[
V(p) = \frac{1}{q} \int_0^p h(u)\, du = p\alpha. \tag{7.5.1}
\]
In other words, when $q$ decreases, limit orders are submitted with larger sizes on average, but less often, keeping the total submitted volume constant. The first remarkable observation is that, although the mean total volumes of limit and market orders are constant, the shape of the order book varies widely with $q$. On figure 7.4, we plot the shape $b(p)$ defined in equation (7.4.11), and the cumulative shape defined in equation (7.4.10), of an order book with arriving volumes of limit orders as in equation (7.5.1). With the chosen numerical values, the average volume of one limit order ranges from approximately 1 ($q = 0.99$) to 20 ($q = 0.05$). It appears that when $q$ decreases, the shape of the order book increases for lower prices. In other words, the larger the size of arriving limit orders, the deeper the order book around the spread, all other things being equal.

We observe that figure 7.4 is similar to figure 7.1 and to figure 3 in [303]. However, the sizes of limit and market orders are equal in the two latter cases, and we have shown in section 7.3 that these different shapes can actually be obtained with different regimes of market orders, but equal sizes of market and limit orders: when the arrival rate of market orders increases, all other things being equal, the average shape of the order book is thinner for lower prices. Therefore, the observation made now is different.
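The effect of $q$ on the depth of the book near the spread can be reproduced with a few lines of code. The following sketch is an illustration only: it evaluates the cumulative shape through the integral form of equation (7.4.8) with $H^{1\to k}(u) = \frac{1}{1-(1-q)u}$ and $\nu_{1\to k}$ replaced by $H(p)$, which avoids the hypergeometric function of equation (7.4.10). The default values $\alpha = 40$, $K = 8$, $\delta = 10$ and the intensity $h(p) = \frac{\alpha q}{K}$ on $(0,K)$ are taken from the caption of figure 7.4; the function name and the Simpson step count are assumptions.

```python
import math


def cumulative_shape(p, q, alpha=40.0, K=8.0, delta=10.0, steps=4_000):
    """B(p) for geometric(q) order sizes, via the integral form of eq. (7.4.8).

    Assumes h(p) = alpha * q / K on (0, K), 0 < q < 1 and delta >= 1.
    """
    H = alpha * q * min(max(p, 0.0), K) / K      # normalized arrival rate up to p
    expo = H / (1.0 - q)

    def integrand(v):
        base = (1.0 - (1.0 - q) * v) / q          # exp of the integral of H^{1->k} on [v, 1]
        return v ** (delta - 1.0) * math.exp(expo * math.log(base))

    width = 1.0 / steps                           # composite Simpson rule, even step count
    acc = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        acc += (4.0 if i % 2 else 2.0) * integrand(i * width)
    integral = acc * width / 3.0
    return H / q - delta + 1.0 / integral


if __name__ == "__main__":
    for q in (0.99, 0.5, 0.25, 0.10, 0.05):
        print(q, [round(cumulative_shape(p, q), 3) for p in (0.5, 1.0, 2.0, 8.0)])
```

With these values, the cumulative depth at low prices grows as $q$ decreases, while the depth at $p = K$ stays close to $\alpha - \delta$ for every $q$, in line with the discussion above.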
In trading regimes with identical mean total volumes of limit and market orders, we highlight the influence of the relative volume of limit orders (compared to unit market orders) on the order book shape: the smaller the average size of limit orders, the shallower the order book close to the spread.

We provide a second example of the phenomenon by assuming that the intensity of incoming limit orders exhibits a power-law decrease with the price, as tested in section 7.3 to compare with the analytical shape provided in [49]. We thus have now $h(p) = q\alpha(\gamma-1)(1+p)^{-\gamma}$. There again, when $q$ varies, the average total volume of incoming limit orders up to price $p$ remains constant and equal to $\alpha(\gamma-1) \int_0^p (1+u)^{-\gamma}\, du = \alpha(1 - (1+p)^{1-\gamma})$. Figure 7.5 plots the shape of the order book with these characteristics, when $q$ varies. The observed phenomenon is equally clear with this more realistic distribution of incoming limit orders: the order book deepens at the first limits when the average volume of limit orders increases, the total volume of limit orders and unit-size market orders submitted being constant.

Finally, one might remark that, by assuming unit-size market orders, our model with geometric distribution of limit orders' sizes does not predict the order book shape in the case where the average size of limit orders is smaller than the average size of market orders. In fact, this case is never encountered empirically in our data: in our sample, the average size of limit orders is always greater than the average size of market orders.

[Figure 7.4: Shape of the order book as computed in equation (7.4.11) (top, shape $b(p)$ against price $p$) and cumulative shape of the order book as computed in equation (7.4.10) (bottom, cumulative shape $B(p)$ against price $p$) with $\delta = 10$, $h(p) = \frac{\alpha q}{K} \mathbf{1}_{(0,K)}$, $\alpha = 40$, $K = 8$, and $q = 0.99$ (full line), $q = 0.5$ (dotted), $q = 0.25$ (dotdashed), $q = 0.10$ (short-dashed), $q = 0.05$ (long-dashed).]

[Figure 7.5: Shape of the order book (shape $b(p)$ against price $p$) as computed in equation (7.4.11) with $\delta = 10$, $h(p) = q\alpha(\gamma-1)(1+p)^{-\gamma}$, $\alpha = 40$, $\gamma = 1.6$, and $q = 0.99$ (full line), $q = 0.5$ (dotted), $q = 0.25$ (dotdashed), $q = 0.10$ (short-dashed), $q = 0.05$ (long-dashed).]

Remark 7.5.1 We recall that, for analytical tractability purposes, the cancellation mechanism used in this work is that each share of a submitted limit order has a lifetime following an exponential distribution independent of the other variables. A more realistic cancellation mechanism would be that each limit order has an independent exponentially distributed lifetime, i.e. that an order standing in the book is cancelled all at once and not share by share. It turns out that this does not qualitatively change our results. We simulate an order book with this latter cancellation mechanism, and compare the average shape of the simulated order book with our analytical formulas. Results are shown on figure 7.6 in the case of a uniform price distribution for the limit orders. Obviously, if $q$ is close to 1, then both cancellation mechanisms are equivalent and the simulated shape converges to our analytical results. When $q \neq 1$, we cannot straightforwardly compare the two models (the cancellation rate is proportional to the number of shares in the book in one case, and to the number of orders in the book in the other), but we nevertheless observe that both mechanisms exhibit similar shapes of the order book. Furthermore, it is interesting to note that, irrespective of the cancellation mechanism, our observation that the submission of larger limit orders leads to thicker order books around the spread remains valid.
We may finally add that since market orders are unit-sized, $q$ may be understood as a relative measure of the volume of limit orders compared to the volume of market orders, and is therefore expected to be closer to 1 than to 0.

[Figure 7.6: Shape of the order book (average order book $b(p)$ against price $p$) as computed in equation (7.4.11), i.e. with share-by-share cancellation (full lines), and numerically simulated with order-by-order cancellation (dashed lines), with $\mu = 10$, $\theta = 1$, $h(p) = \frac{q\alpha}{\theta K}$, $\alpha = 40$, $K = 8$, and $q = 0.99$ (blue), $q = 0.5$ (green), $q = 0.25$ (red), $q = 0.10$ (cyan), $q = 0.05$ (magenta).]

7.6 Conclusion

We have presented in this chapter a simple order book model based on classic results from queueing theory. We have derived a continuous version of the model and shown that it provides an analytical formula for the shape of the order book that reproduces results from acknowledged numerical and empirical studies. We have then extended the model to allow limit orders to be submitted with random sizes. The extended model provides some insight into the influence of the size of limit orders in an order book. This insight is confirmed by the empirical study on liquid stocks traded on the Paris Stock Exchange presented in Chapter 3.

Chapter 8 Advanced modelling of limit order books

In this chapter, we first review the existing literature on agent-based modelling of market interactions, before introducing and analyzing, in the spirit of Chapter 6, a Hawkes process-based limit order book model.

8.1 Towards non-trivial behaviours: modelling market interactions

In the simple statistical models of limit order books, flows of orders are treated as independent processes. Under some (strong) modelling constraints, we can see the order book as a Markov chain and look for analytical results ([97]).
In any case, even if the process is empirically detailed and not trivial ([248]), we work with the assumption that orders are independent and identically distributed. This very strong (and false) hypothesis is similar to the "representative agent" hypothesis in Economics: orders being successively and independently submitted, we may not expect anything but regular behaviours. Following the work of economists such as [206, 207, 208], one has to translate the heterogeneity of markets into the agent-based models. Agents are not identical, and not independent.

In this section we present some toy models implementing mechanisms that aim at bringing heterogeneity: herding behaviour on markets in [92], trend following behaviour in [232] or in [280], threshold behaviour in [91]. Most of the models reviewed in this section are not order book models, since a persistent order book is not kept during the simulations. They are rather price models, where the price changes are determined by the aggregation of excess supply and demand. However, they identify essential mechanisms that may clearly explain some empirical data, and lay the grounds for better limit order book models such as the one presented in the following sections.

8.1.1 Herding behaviour

The model presented in [92] considers a market with $N$ agents trading a given stock with price $p(t)$. At each time step, each agent $i = 1,\dots,N$ chooses to buy or to sell one unit of stock, each with probability $a$ (demand $\phi_i(t) = \pm 1$), or remains idle with probability $1 - 2a$. The price change is assumed to be linearly linked with the excess demand $D(t) = \sum_{i=1}^{N} \phi_i(t)$, with a factor $\lambda$ measuring the liquidity of the market:
\[
p(t+1) = p(t) + \frac{1}{\lambda} \sum_{i=1}^{N} \phi_i(t). \tag{8.1.1}
\]
$\lambda$ can also be interpreted as a market depth, i.e. the excess demand needed to move the price by one unit. In order to evaluate the distribution of stock returns from Eq. (8.1.1), we need to know the joint distribution of the individual demands $(\phi_i(t))_{1\leq i\leq N}$.
As pointed out by the authors, if the demands $\phi_i$ are independent and identically distributed with finite variance, then the Central Limit Theorem holds and the distribution of the price variation $\Delta p(t) = p(t+1) - p(t)$ converges to a Gaussian distribution as $N$ goes to infinity.

The idea here is to model the diffusion of information among traders by randomly linking their demands through clusters. At each time step, agents $i$ and $j$ can be linked with probability $p_{ij} = p = \frac{c}{N}$, $c$ being a parameter measuring the degree of clustering among agents. Therefore, an agent is linked to an average number of $(N-1)p$ other traders. Once clusters are determined, the demands are forced to be identical among all members of a given cluster. Denoting by $n_c(t)$ the number of clusters at a given time step $t$, by $W_k$ the size of the $k$-th cluster, $k = 1, \ldots, n_c(t)$, and by $\phi_k = \pm 1$ its investment decision, the price variation is then straightforwardly written:
\[
\Delta p(t) = \frac{1}{\lambda} \sum_{k=1}^{n_c(t)} W_k \phi_k. \tag{8.1.2}
\]
This modelling is a direct application to the field of finance of the random graph framework as studied in [123]. [205] previously suggested it in economics. Using these previous theoretical works, and assuming that the size of a cluster $W_k$ and the decision taken by its members $\phi_k(t)$ are independent, the authors are able to show that the price variation at time $t$ is the sum of $n_c(t)$ independent identically distributed random variables with heavy-tailed distributions:
\[
\Delta p(t) = \frac{1}{\lambda} \sum_{k=1}^{n_c(t)} X_k, \tag{8.1.3}
\]
where the density $f(x)$ of $X_k = W_k \phi_k$ decays as:
\[
f(x) \underset{|x| \to \infty}{\sim} \frac{A}{|x|^{5/2}} \, e^{-\frac{(c-1)|x|}{W_0}}. \tag{8.1.4}
\]
Thus, this simple toy model exhibits fat tails in the distribution of price variations, with a decay reasonably close to empirical data.
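The clustering mechanism behind (8.1.2) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the equiprobable choice of $\phi_k = \pm 1$ for each cluster, and the parameter values are assumptions made for illustration only. Clusters are obtained as connected components of a random graph in which each pair of agents is linked with probability $c/N$.

```python
import random


def cluster_sizes(n_agents, c, rng):
    """Sizes W_k of the clusters when each pair (i, j) is linked with probability c / N."""
    parent = list(range(n_agents))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    p = c / n_agents
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    sizes = {}
    for i in range(n_agents):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return list(sizes.values())


def price_variation(n_agents, c, lam, rng):
    """Delta p = (1 / lambda) * sum_k W_k * phi_k, as in (8.1.2).

    Assumption: each cluster picks phi_k = +1 or -1 with equal probability.
    """
    sizes = cluster_sizes(n_agents, c, rng)
    return sum(w * rng.choice((-1, 1)) for w in sizes) / lam


rng = random.Random(0)
sizes = cluster_sizes(200, 0.9, rng)        # hypothetical N = 200, c = 0.9
dp = price_variation(200, 0.9, 10.0, rng)   # hypothetical lambda = 10
print(len(sizes), max(sizes), dp)
```

Repeating `price_variation` many times and plotting a histogram of the outcomes is one way to look at the tails of the resulting distribution; the double loop over pairs is quadratic in $N$ and is only meant for small illustrative runs.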
Therefore, [92] show that taking into account a naive mechanism of communication between agents (herding behaviour) is able to drive the model out of the Gaussian convergence and produce non-trivial shapes of distributions of price returns.

8.1.2 Fundamentalists and trend followers

[232] proposed a model very much in line with agent-based models in behavioural finance, but where trading rules are kept simple enough so that they can be identified with a presumably realistic behaviour of agents. This model considers a market with $N$ agents that can be part of two distinct groups of traders: $n_f$ traders are “fundamentalists”, who share an exogenous idea $p_f$ of the value of the current price $p$; and $n_c$ traders are “chartists” (or trend followers), who make assumptions on the price evolution based on the observed trend (moving average). The total number of agents is constant, so that $n_f + n_c = N$ at any time. At each time step, the price can be moved up or down with a fixed jump size of $\pm 0.01$ (a tick). The probability to go up or down is directly linked to the excess demand $ED$ through a coefficient $\beta$. The demand of each group of agents is determined as follows:

• Each fundamentalist trades a volume $V_f$ proportional, with a coefficient $\gamma$, to the deviation of the current price $p$ from the perceived fundamental value $p_f$: $V_f = \gamma (p_f - p)$.

• Each chartist trades a constant volume $V_c$. Denoting by $n_+$ the number of optimistic (buyer) chartists and by $n_-$ the number of pessimistic (seller) chartists, the excess demand of the whole group of chartists is written $(n_+ - n_-) V_c$.

Therefore, assuming that there exist some noise traders on the market with random demand $\mu$, the global excess demand is written:
\[
ED = (n_+ - n_-) V_c + n_f \gamma (p_f - p) + \mu. \tag{8.1.5}
\]
The probability that the price goes up (resp. down) is then defined to be the positive (resp. negative) part of $\beta \, ED$. As observed in [322], fundamentalists are expected to stabilize the market, while chartists should destabilize it.
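As a small worked illustration of (8.1.5) and of the rule for the price jump probabilities, here is a hedged Python sketch. The function names and the numerical values are hypothetical, and capping the probabilities at 1 is an assumption added here so that the outputs are always valid probabilities; the text above only states that they are the positive and negative parts of $\beta \, ED$.

```python
def excess_demand(n_plus, n_minus, v_c, n_f, gamma, p_f, p, noise=0.0):
    """Global excess demand ED of (8.1.5)."""
    chartists = (n_plus - n_minus) * v_c
    fundamentalists = n_f * gamma * (p_f - p)
    return chartists + fundamentalists + noise


def jump_probabilities(ed, beta):
    """Probabilities of an up-tick and a down-tick: positive and negative parts of beta * ED.

    Assumption: values are capped at 1.
    """
    up = min(max(beta * ed, 0.0), 1.0)
    down = min(max(-beta * ed, 0.0), 1.0)
    return up, down


# Hypothetical state: 60 optimistic and 40 pessimistic chartists, 100 fundamentalists,
# price above its perceived fundamental value.
ed = excess_demand(n_plus=60, n_minus=40, v_c=1.0, n_f=100, gamma=0.5, p_f=10.0, p=10.5)
p_up, p_down = jump_probabilities(ed, beta=0.1)
print(ed, p_up, p_down)
```

In this example the chartists push the price up (+20) while the fundamentalists push it down more strongly (−25), so the net excess demand is negative and only a down-tick has positive probability, which illustrates the stabilizing role of fundamentalists.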
In addition, following [92], the authors expect non-trivial features of the price series to result from herding behaviour and transitions between groups of traders. Referring to Kirman’s work as well, a mimicking behaviour among chartists is thus proposed. The $n_c$ chartists can change their view on the market (optimistic, pessimistic), their decision being based on a clustering process modelled by an opinion index $x = \frac{n_+ - n_-}{n_c}$ representing the weight of the majority. The probabilities $\pi_+$ and $\pi_-$ to switch from one group to another are formally written:
\[
\pi_\pm = v \, \frac{n_c}{N} \, e^{\pm U}, \qquad U = \alpha_1 x + \alpha_2 \, p / v, \tag{8.1.6}
\]
where $v$ is a constant, and $\alpha_1$ and $\alpha_2$ reflect respectively the weight of the majority’s opinion and the weight of the observed price in the chartists’ decision. Transitions between fundamentalists and chartists are also allowed, decided by comparison of expected returns (see [232] for details).

The authors show that the distribution of returns generated by their model has excess kurtosis. Using a Hill estimator, they fit a power law to the fat tails of the distribution and observe exponents roughly ranging from 1.9 to 4.6. They also check hints for volatility clustering: absolute returns and squared returns exhibit a slow decay of autocorrelation, while raw returns do not. It thus appears that such a model can roughly fit some “stylized facts”. However, the number of parameters involved, as well as the complicated rules of transition between agents, make a clear identification of the sources of these phenomena and a calibration to market data difficult, if not intractable.

[18, 19] provide a somewhat simplifying view on the Lux-Marchesi model. They clearly identify the fundamentalist behaviour, the chartist behaviour, the herding effect and the observation of the price by the agents as four essential effects of an agent-based financial model.
They show that the number of agents plays a crucial role in a Lux-Marchesi-type model: more precisely, the stylized facts are reproduced only with a finite number of agents, not when the number of agents grows asymptotically, in which case the model stays in a fundamentalist regime. There is a finite-size effect that may prove important for further studies.

The role of the trend following mechanism in producing non-trivial features in price time series is also studied in [280]. The starting point is an order book model similar to [73] and [303]: at each time step, liquidity providers submit limit orders at rate $\lambda$ and liquidity takers submit market orders at rate $\mu$. As expected, this zero-intelligence framework produces neither fat tails in the distribution of (log-)returns nor an over-diffusive Hurst exponent. Then, a stochastic link between order placement and market trend is added: it is assumed that liquidity providers observing a trend in the market will act consequently and submit limit orders at a wider depth in the order book. Although the assumption behind such a mechanism may not be empirically confirmed (a questionable symmetry in order placement is assumed) and should be further discussed, it is interesting enough that it directly provides fat tails in the log-return distributions and an over-diffusive Hurst exponent $H \approx 0.6$–$0.7$ for medium time-scales, as shown in figure 8.1.

[Figure 8.1]
Figure 8.1: Hurst exponent found in the Preis model for different numbers of agents when including random demand perturbation and dynamic limit order placement depth. Reproduced from [280].

Since their introduction, Hawkes processes have been applied to a wide range of research areas, from seismology in the pioneering work [175] to credit risk, financial contagion and, more recently, to the modelling of market microstructure [27, 26, 28, 253, 273, 328, 102, 101]. In market microstructure, and particularly limit order book modelling, the relevance of these processes is amply demonstrated by two empirical properties of the flow of market and limit orders at the microscopic level:

• Time clustering: order arrivals are highly clustered in time, see e.g. [67][102].

• Mutual dependence: order flows exhibit non-negligible cross-dependences. For instance, as documented in [253][23][113], market orders excite limit orders, and limit orders that change the price excite market orders.

By using, as in Chapter 6, techniques from the ergodic theory of Markov processes, we show that the extended process describing the joint dynamics of the limit order book and the intensities of the Hawkes process driving it is ergodic, and that it leads to a diffusive behaviour of the price at large time scales.

8.1.3 Model setup

A stylized order book model whose dynamics is governed by Hawkes processes is now introduced. We shall use the same notations and conventions as in Chapter 6 to represent the limit order book. The same three types of events can modify the limit order book: arrival of a new limit order, arrival of a new market order, cancellation of an already existing limit order. This time, the arrivals of market and limit orders are described by mutually exciting Hawkes processes (see Chapter B, Section B.1.1 for a brief introduction to Hawkes processes):

• $M^\pm(t)$: arrival of a new buy or sell market order, with intensity $\lambda^{M^+}$ and $\lambda^{M^-}$;

• $L_i^\pm(t)$: arrival of a limit order at level $i$, with intensity $\lambda_i^{L^\pm}$,

whereas the arrival of a cancellation order is modelled as in Chapter 6 by a doubly stochastic Poisson process:

• $C_i^\pm(t)$: cancellation of a limit order at level $i$, with intensity $\lambda_i^{C^+} a_i$ and $\lambda_i^{C^-} |b_i|$.

8.1.4 The infinitesimal generator

A Markovian $(2K+2)$-dimensional Hawkes process now models the intensities of the arrivals of market and limit orders.
It is characterized by the $D$-dimensional process $(a; b; \mu)$ of the available quantities and the intensities of the Hawkes processes decomposed as in Section ??, where $D = (2K+2)^2 + 2K$ is the dimension of the state space. The infinitesimal generator associated with the process describing the joint evolution of the limit order book has the following expression:
\begin{align}
\mathcal{L}F(a; b; \mu) ={}& \lambda^{M^+} \left( F\left( [a_i - (q - A(i-1))_+]_+ ; J^{M^+}(b); \mu + \Delta^{M^+}(\mu) \right) - F \right) \notag \\
&+ \sum_{i=1}^{K} \lambda_i^{L^+} \left( F\left( a_i + q; J^{L_i^+}(b); \mu + \Delta^{L_i^+}(\mu) \right) - F \right) \notag \\
&+ \sum_{i=1}^{K} \lambda_i^{C^+} a_i \left( F\left( a_i - q; J^{C_i^+}(b); \mu \right) - F \right) \notag \\
&+ \lambda^{M^-} \left( F\left( J^{M^-}(a); [b_i + (q - B(i-1))_+]_- ; \mu + \Delta^{M^-}(\mu) \right) - F \right) \notag \\
&+ \sum_{i=1}^{K} \lambda_i^{L^-} \left( F\left( J^{L_i^-}(a); b_i - q; \mu + \Delta^{L_i^-}(\mu) \right) - F \right) \notag \\
&+ \sum_{i=1}^{K} \lambda_i^{C^-} |b_i| \left( F\left( J^{C_i^-}(a); b_i + q; \mu \right) - F \right) \notag \\
&- \sum_{i,j=1}^{D} \beta_{ij} \mu_{ij} \frac{\partial F}{\partial \mu_{ij}}. \tag{8.1.7}
\end{align}
In order to ease the already cumbersome notations, we have written $F(a_i; b; \mu)$ instead of $F(a_1, \ldots, a_i, \ldots, a_K; b; \mu)$. Moreover, the notation $\Delta^{(\ldots)}(\mu)$ stands for the jump of the intensity vector $\mu$ corresponding to a jump of the process $N^{(\ldots)}$ (see Section B.1.1). The operator $\mathcal{L}$ is a combination of

• standard difference operators corresponding to the arrival or cancellation of orders at each limit, and shift operators expressing the moves in the best limits, as already seen;

• drift terms coming from the mean-reverting behaviour of the intensities of the Hawkes processes between jumps.

Note that the infinitesimal generator is fully worked out in (8.1.7) in the case of a discrete state space; some trivial but notationally cumbersome modifications would be necessary in order to account for the case of real-valued quantities $a_i$, $b_i$ and order size $q$.

8.2 Stability of the order book

In this section, we study the long-time behaviour of the limit order book. A Lyapunov function is built, ensuring the ergodicity of the limit order book under the natural assumption (B.1.8).
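Before stating the result, a one-dimensional sketch may help fix ideas about the intensity dynamics entering (8.1.7): between events the excited part of the intensity decays exponentially (the drift term), and at each event it jumps upwards (the $\Delta(\mu)$ terms). The Python code below simulates a single self-exciting Hawkes process with kernel $\alpha e^{-\beta t}$ by thinning. It is an illustration only, not the model of this chapter: the function name, the parameters `lambda0`, `alpha`, `beta` and their values are assumptions, and the subcritical regime $\alpha/\beta < 1$ used here is the standard one-dimensional stationarity condition, taken as a stand-in for the kind of stability assumption invoked in this section.

```python
import math
import random


def simulate_hawkes(lambda0, alpha, beta, horizon, rng):
    """Event times on [0, horizon] of a 1-D Hawkes process with intensity
    lambda(t) = lambda0 + sum_{t_i < t} alpha * exp(-beta * (t - t_i)),
    simulated by thinning (accept/reject against a local upper bound)."""
    t = 0.0
    mu = 0.0          # excited part of the intensity, Markovian state variable
    events = []
    while True:
        upper = lambda0 + mu              # the intensity can only decay until the next event
        wait = rng.expovariate(upper)     # candidate waiting time
        t += wait
        if t > horizon:
            break
        mu *= math.exp(-beta * wait)      # deterministic decay between jumps
        if rng.random() * upper <= lambda0 + mu:
            events.append(t)              # candidate accepted: an event occurs
            mu += alpha                   # self-excitation jump of the intensity
    return events


horizon = 2000.0
events = simulate_hawkes(lambda0=1.0, alpha=0.5, beta=1.0, horizon=horizon,
                         rng=random.Random(42))
empirical_rate = len(events) / horizon
stationary_rate = 1.0 / (1.0 - 0.5 / 1.0)   # lambda0 / (1 - alpha / beta)
print(empirical_rate, stationary_rate)
```

In the subcritical regime the empirical event rate over a long horizon should be close to $\lambda_0 / (1 - \alpha/\beta)$, which is the kind of long-run stability that the Lyapunov argument below establishes for the full order book process.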
More precisely, there holds the

Proposition 8.2.1 Under the standing assumptions, in particular (B.1.8), the limit order book process $X$ is ergodic. It converges exponentially fast towards its unique stationary distribution $\Pi$.

Proof. Given the existence of the Lyapunov function provided in Lemma 8.2.1 below, and the geometric drift condition (8.2.2), the result is proven using Theorem 7.1 in [246], see Chapter B, Section B.2.2.

Lemma 8.2.1 For $\eta > 0$ small enough, the function $V$ defined by
\[
V(a; b; \mu) = \sum_{i=1}^{K} a_i + \sum_{i=1}^{K} |b_i| + \frac{1}{\eta} \sum_{i,j=1}^{(2K+2)^2} \delta_{ij} \mu_{ij} \equiv V_1 + \frac{1}{\eta} V_2, \tag{8.2.1}
\]
where $V_1$ (resp. $V_2$) corresponds to the part that depends only on $(a; b)$ (resp. $\mu$), is a Lyapunov function satisfying a geometric drift condition
\[
\mathcal{L}V \le -\zeta V + C, \tag{8.2.2}
\]
for some $\zeta > 0$ and $C \in \mathbb{R}$. The coefficients $\delta_{ij}$ are defined in (B.1.13) in Section B.1.1.

Proof. First specialize $V_2$ to be identical, up to a change in the indices, to the function defined by (8.2.1) in Section B.1.1 in Appendix. Regarding the "small" parameter $\eta > 0$, it will become handy as a penalization parameter, as we shall see below. Thanks to the linearity of $\mathcal{L}$, there holds
\[
\mathcal{L}V = \mathcal{L}V_1 + \frac{1}{\eta} \mathcal{L}V_2.
\]
The first term $\mathcal{L}V_1$ is dealt with exactly as in Chapter 6:
\begin{align}
\mathcal{L}V_1(x) \le{}& -(\lambda^{M^+} + \lambda^{M^-}) q + \sum_{i=1}^{K} (\lambda_i^{L^+} + \lambda_i^{L^-}) q - \sum_{i=1}^{K} (\lambda_i^{C^+} a_i + \lambda_i^{C^-} |b_i|) q \notag \\
&+ \sum_{i=1}^{K} \lambda_i^{L^-} (i_S - i)_+ a_\infty + \sum_{i=1}^{K} \lambda_i^{L^+} (i_S - i)_+ |b_\infty| \tag{8.2.3} \\
\le{}& -(\lambda^{M^+} + \lambda^{M^-}) q + (\Lambda^{L^-} + \Lambda^{L^+}) q - \lambda^{C} q V_1(x) + K (\Lambda^{L^-} a_\infty + \Lambda^{L^+} |b_\infty|), \tag{8.2.4}
\end{align}
where
\[
\Lambda^{L^\pm} := \sum_{i=1}^{K} \lambda_i^{L^\pm} \quad \text{and} \quad \lambda^{C} := \min_{1 \le i \le K} \{ \lambda_i^{C^\pm} \} > 0.
\]
Computing $\mathcal{L}V_2$ yields an expression identical to that obtained in Section B.1.1:
\[
\mathcal{L}(V_2) = \sum_{i,j} \lambda_0^{j} \delta_{ij} \alpha_{ij} + (\kappa - 1) \sum_{j,k} [\ldots]_k \, \mu_{jk},
\]
so that there holds
\[
\mathcal{L}V = \mathcal{L}V_1 + \frac{1}{\eta} \mathcal{L}V_2 \le -\lambda^{C} q V_1 - \frac{\gamma}{\eta} V_2 - G.\mu + C,
\]
where $\gamma$ is as in Equation (B.1.15), $G.\mu$ is a compact notation for the linear form in the $\mu_{ij}$’s obtained in (8.2.4), and $C$ is some constant.
Now, thanks to the positivity of the coefficients in $V_2$ and of the $\mu_{ij}$’s, one can choose $\eta$ small enough that there holds
\[
\forall \mu, \quad |G.\mu| \le \frac{\gamma}{2\eta} V_2(\mu),
\]
which yields
\[
\mathcal{L}V \equiv \mathcal{L}V_1 + \frac{1}{\eta} \mathcal{L}V_2 \le -\lambda^{C} q V_1 - \frac{\gamma}{2\eta} V_2 + C, \tag{8.2.5}
\]
and finally
\[
\mathcal{L}V \le -\zeta V + C,
\]
with $\zeta = \min\left(\lambda^{C} q, \frac{\gamma}{2\eta}\right)$, and $C$ is some constant.

8.3 Large scale limit of the price process

Using the same approach as in the previous chapter, but in a more general context, we study the long-time behaviour of the price process, taking into account the stochastic behaviour of the intensities of the point processes triggering the order book events.

We first recall the expression of the price dynamics in our limit order book model. Consider for instance the mid-price, solution to the SDE (6.2.16) which we recall here:
\begin{align*}
dP(t) = \frac{\Delta P}{2} \Big[ & (A^{-1}(q) - A^{-1}(0)) \, dM^+(t) - (B^{-1}(q) - B^{-1}(0)) \, dM^-(t) \\
& - \sum_{i=1}^{K} (A^{-1}(0) - i)_+ \, dL_i^+(t) + \sum_{i=1}^{K} (B^{-1}(0) - i)_+ \, dL_i^-(t) \\
& + (A^{-1}(q) - A^{-1}(0)) \, dC^+_{i_A}(t) - (B^{-1}(q) - B^{-1}(0)) \, dC^-_{i_B}(t) \Big].
\end{align*}
Let us, as before, recast this equation under the following form:
\[
P_t = \sum_{i=1}^{2K+2} \int_0^t F_i(X_u) \, dN_u^i, \tag{8.3.1}
\]
where the $N^i$ are the point processes driving the limit order book, $X$ is the Markovian process describing its state, and $P$ is one of the price processes we are interested in. For instance, in the Poisson case dealt with in Chapter 6, $X = (a, b)$ and the $N^i$ are the Poisson processes driving the arrival of market, limit and cancellation orders. In the context of Hawkes processes considered in this work, $X = (a, b, \mu)$ and the $N^i$ are the Poisson and Hawkes processes driving the limit order book. Also, recall that, as mentioned earlier, the $F_i$ are bounded functions, as the price changes are bounded by the total number of limits in the book, thanks to the non-zero boundary conditions $a_\infty$, $b_\infty$. Denote again by $\Pi$ the stationary distribution of $X$, as provided by Proposition 8.2.1.
One can prove the

Theorem 8.3.1 Write as above the price
\[
P_t = \int_0^t \sum_i F_i(X_s) \, dN_s^i
\]
and its compensator
\[
Q_t = \int_0^t \sum_i F_i(X_s) \lambda_s^i \, ds
\]
(note that this time, the sum is inside the integral, since the $\lambda^i$’s are components of the Markov process $X$). Define
\[
h = \sum_i F_i(X) \lambda^i
\]
and let
\[
\alpha = \lim_{t \to +\infty} \frac{1}{t} \int_0^t \sum_i F_i(X_s) \lambda_s^i \, ds \overset{a.s.}{=} \int h(X) \, \Pi(dX).
\]
Finally, introduce the solution $g$ to the Poisson equation
\[
\mathcal{L}g = h - \alpha
\]
and the associated martingale
\[
Z_t = g(X_t) - g(X_0) - \int_0^t \mathcal{L}g(X_s) \, ds \equiv g(X_t) - g(X_0) - Q_t + \alpha t.
\]
Then, the deterministically centered, rescaled price
\[
\bar{P}^n_t \equiv \frac{P_{nt} - \alpha n t}{\sqrt{n}}
\]
converges in distribution to a Wiener process $\bar{\sigma} W$. The asymptotic volatility $\bar{\sigma}$ satisfies the identity
\[
\bar{\sigma}^2 = \lim_{t \to +\infty} \frac{1}{t} \int_0^t \sum_i \left( (F_i - \Delta^i(g))(X_s) \right)^2 \lambda_s^i \, ds \equiv \int \sum_i \left( (F_i - \Delta^i(g))(X) \right)^2 \lambda^i \, \Pi(dx).
\]

Proof. Using again the martingale method with
\[
P_t = (P_t - Q_t) - Z_t + g(X_t) - g(X_0) + \alpha t \equiv (M_t - Z_t) + g(X_t) - g(X_0) + \alpha t,
\]
the theorem will be proven iff one can show that $g \in L^2(\Pi(dx))$. The condition
\[
h^2 \le V \tag{8.3.2}
\]
(where $V$ is a Lyapunov function for the process) of Theorem 4.4 in [155] is sufficient for $g$ to be in $L^2(\Pi(dx))$. The linear Lyapunov function $V$ introduced in (8.2.1) does not yield the desired result, because $h$ is no longer bounded but rather linearly increasing in the $\lambda$’s. However, Lemma B.1.1 in Chapter B provides a Lyapunov function having a polynomial growth of arbitrarily high order in the $\lambda$’s at infinity, thereby ensuring that Condition (8.3.2) holds.

Part III
Simulation of limit order books

Chapter 9
Numerical simulation of limit order books

The base model for the simulation of a limit order book is a standard zero-intelligence agent-based market simulator that can be built as follows. The first agent is a liquidity provider. This agent submits limit orders in the order book, orders which he can cancel at any time. This agent is simply characterized by a few random distributions:

1. submission times of new limit orders are distributed according to a homogeneous Poisson process $N^L$ with intensity $\lambda^L$;

2. submission times of cancellations of orders are distributed according to a homogeneous Poisson process $N^C$ with intensity $\lambda^C$;

3. placement of new limit orders is centred around the same-side best quote and follows a Student’s distribution with degrees of freedom $\nu_1^P$, shift parameter $m_1^P$ and scale parameter $s_1^P$;

4. new limit orders’ volume is randomly distributed according to an exponential law with mean $m_1^V$;

5. in case of a cancellation, the agent deletes his own orders with probability $\delta$.

The second agent in the basic model is a noise trader. This agent only submits market orders (it is sometimes referred to as the liquidity taker). Characterization of this agent is even simpler:

6. submission times of new market orders are distributed according to a homogeneous Poisson process $N^M$ with intensity $\mu$;

7. market orders’ volume is randomly distributed according to an exponential law with mean $m_2^V$.

For all the experiments, agents submit orders on the bid or the ask side with probability 0.5. This basic model will be henceforth referred to as “HP” (Homogeneous Poisson).

Assumption 3 is in line with empirical observations in [248]. Assumptions 4 and 7 are in line with empirical observations in [67] as far as the main body of the distribution is concerned, although they fail to represent the broad distribution observed in empirical studies. Assumptions 1, 2 and 6 (Poisson) will be enforced throughout Section ??. A more complicated, and more interesting, dependence structure will be implemented and studied in Section 9.2, where Hawkes processes are used to reproduce the interaction between liquidity taking and providing.
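The order flow generated by the two agents of the “HP” model can be sketched in a few lines of Python. This is a hedged illustration rather than the simulator used for the results of this chapter: the function names, the tuple layout of the events and all numerical values are assumptions, and the sketch only generates the flow of orders (assumptions 1 to 4, 6 and 7); it does not maintain an order book, so the choice of which order to cancel (assumption 5) is left out. The three independent Poisson processes are simulated through their superposition: the next event arrives after an exponential waiting time with rate $\lambda^L + \lambda^C + \mu$, and its type is drawn with probabilities proportional to the individual rates.

```python
import math
import random


def student_t(nu, rng):
    """Student-t draw with nu degrees of freedom: Z / sqrt(chi2_nu / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)   # chi-square(nu) = Gamma(shape nu/2, scale 2)
    return z / math.sqrt(chi2 / nu)


def simulate_hp_flow(horizon, lam_l, lam_c, mu, nu_p, m_p, s_p, m_v1, m_v2, rng):
    """List of (time, kind, side, placement, volume) events on [0, horizon]."""
    total_rate = lam_l + lam_c + mu
    t, events = 0.0, []
    while True:
        t += rng.expovariate(total_rate)         # superposition of the three Poisson flows
        if t > horizon:
            break
        side = rng.choice(("bid", "ask"))        # each side with probability 0.5
        u = rng.random() * total_rate
        if u < lam_l:                            # liquidity provider: new limit order
            placement = m_p + s_p * student_t(nu_p, rng)
            volume = rng.expovariate(1.0 / m_v1)  # exponential law with mean m_v1
            events.append((t, "limit", side, placement, volume))
        elif u < lam_l + lam_c:                  # liquidity provider: cancellation
            events.append((t, "cancel", side, None, None))
        else:                                    # noise trader: market order
            volume = rng.expovariate(1.0 / m_v2)  # exponential law with mean m_v2
            events.append((t, "market", side, None, volume))
    return events


# Hypothetical parameter values, for illustration only.
flow = simulate_hp_flow(horizon=1000.0, lam_l=1.5, lam_c=1.0, mu=0.5,
                        nu_p=3.0, m_p=2.0, s_p=1.0, m_v1=300.0, m_v2=200.0,
                        rng=random.Random(7))
print(len(flow), flow[0])
```

Feeding such a stream of events into a persistent book (matching market orders against the best quotes, inserting and removing limit orders) is what turns this order flow generator into an order book simulator.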
9.1 Zero-intelligence limit order book simulator

In order to gain a better intuitive understanding of the “mechanics” of the model, we sketch in Algorithm 1 below the simulation procedure in pseudo-code (see also [152] for a similar description). For simplicity, we take a symmetric order book model. We also let (usual notations):
\begin{align}
\lambda^L &:= \left( \lambda_1^L, \ldots, \lambda_K^L \right), \tag{9.1.1} \\
\Lambda^L &:= \sum_{i=1}^{K} \lambda_i^L, \tag{9.1.2} \\
\lambda^C(a) &:= \left( \lambda_1^C a_1, \ldots, \lambda_K^C a_K \right), \tag{9.1.3} \\
\Lambda^C(a) &:= \sum_{i=1}^{K} \lambda_i^C a_i, \tag{9.1.4} \\
\lambda^C(b) &:= \left( \lambda_1^C |b_1|, \ldots, \lambda_K^C |b_K| \right), \tag{9.1.5} \\
\Lambda^C(b) &:= \sum_{i=1}^{K} \lambda_i^C |b_i|, \tag{9.1.6} \\
\Lambda(a, b) &:= 2(\lambda^M + \Lambda^L) + \Lambda^C(a) + \Lambda^C(b). \tag{9.1.7}
\end{align}

Algorithm 1 Order book simulation.
Require:
  Model parameters: arrival rates $\lambda^M$, $\{\lambda_i^L\}_{i \in \{1, \ldots, K\}}$, $\{\lambda_i^C\}_{i \in \{1, \ldots, K\}}$; order book size $K$; reservoirs $a_\infty$, $b_\infty$; volume distribution parameters $(v^M, s^M)$, $(v^L, s^L)$, $(v^C, s^C)$.
  Simulation parameters: number of time steps $N$.
  Initialization: $t \leftarrow 0$, $X(0) \leftarrow X_{init}$.
 1: for time step $n = 1, \ldots, N$ do
 2:   Compute the best bid $p^B$ and best ask $p^A$.
 3:   Compute $\Lambda^C(b) = \sum_{i=1}^{K} \lambda_i^C |b_i|$, i.e. the weighted sum of shares at price levels from $p^A - K$ to $p^A - 1$.
 4:   Compute $\Lambda^C(a) = \sum_{i=1}^{K} \lambda_i^C a_i$.
 5:   Draw the waiting time $\tau$ for the next event from an exponential distribution with parameter $\Lambda(a, b) = 2(\lambda^M + \Lambda^L) + \Lambda^C(a) + \Lambda^C(b)$.
 6:   Draw a new event according to the probability vector $\left( \lambda^M, \lambda^M, \Lambda^L, \Lambda^L, \Lambda^C(a), \Lambda^C(b) \right) / \Lambda(a, b)$. These probabilities correspond respectively to a buy market order, a sell market order, a buy limit order, a sell limit order, a cancellation of an existing sell order and a cancellation of an existing buy order.
 7:   Depending on the event type, draw the order volume from a lognormal distribution with parameters $(v^M, s^M)$, $(v^L, s^L)$ or $(v^C, s^C)$.
 8:   If the selected event is a limit order, select the relative price level from $\{1, 2, \ldots, K\}$ according to the probability vector $\left( \lambda_1^L, \ldots, \lambda_K^L \right) / \Lambda^L$.
 9:   If the selected event is a cancellation, select the relative price level at which to cancel an order from $\{1, 2, \ldots, K\}$ according to the probability vector $\left( \lambda_1^C a_1, \ldots, \lambda_K^C a_K \right) / \Lambda^C(a)$ (or $\lambda^C(b) / \Lambda^C(b)$ for the bid side).
10:   Update the order book state according to the selected event.
11:   Enforce the boundary conditions: $a_i = a_\infty$, $i \ge K + 1$; $b_i = b_\infty$, $i \ge K + 1$.
12:   Increment the event time $n$ by 1 and the physical time $t$ by $\tau$.
13: end for

Remark 9.1.1 For the practical implementation, it is easier to work with an “absolute” price frame $\Delta p \times \{1, \ldots, L\}$ where $L \gg K$.

In order to put the simulation results and the data on the same footing, we relax the assumption of constant order sizes; we draw the order volumes from lognormal distributions. The parameters of the model are estimated from tick by tick data as detailed in Appendix A.1. For concreteness^1, we use the parameters of the stock SCHN.PA (Schneider Electric) in March 2011 for the plots. They are summarized in tables 9.1 and 9.2.

^1 The results are qualitatively the same for all CAC 40 stocks.

Figure 9.1 represents the average depth profile, that is, the average number of outstanding shares at a distance of $i$ ticks from the best opposite price. The agreement between the simulation and the data is fairly good (see panel (a) of figure 9.10 for a cross-sectional view on CAC 40 stocks). We also plot the distribution of the spread in figure 9.2. Note that the simulated distribution is tighter than the actual one (this is systematic and is documented in panel (b) of figure 9.10). Figure 9.3 shows the fast decay of the autocorrelation function of the price increments. Note the high negative autocorrelation of simulated trade prices relative to the data. In accordance with the theoretical analysis, figures 9.4–9.6 illustrate the asymptotic normality of price increments.

The signature plot of the price time series is defined as the variance of price increments at lag $h$ normalized by the lag, that is
\[
\sigma_h^2 := \frac{\mathbb{V}\left[ P(t+h) - P(t) \right]}{h}. \tag{9.1.8}
\]
This function measures the variance of price increments per time unit. It is interesting in that it shows the transition from the variance at small time scales, where micro-structure effects dominate, to the long-term variance. By Theorem ??^2,
\[
\lim_{h \to \infty} \sigma_h^2 = \sigma^2, \quad \text{for some fixed value } \sigma. \tag{9.1.9}
\]
We verify this numerically in figure 9.7.

^2 Strictly speaking, we proved the result in event-time.

Two remarks are in order regarding the signature plot:

Long-term variance: The simulated long-term variance is systematically lower than the variance computed from the data (this is documented in panel (c) of figure 9.10). Intuitively, the depth of the order book is expected to increase from the best price towards the center of the book. In the absence of autocorrelation in trade signs, this would cause prices to wander less often far away from the current best, as they hit a higher “resistance”. We also suspect that actual prices exhibit locally more “drifting phases” than in a (symmetric) Markovian model where the expected price drift is null at all times. An interesting analysis of a simple order book model that allows time-varying arrival rates can be found in [75].

Short-term variance: The signature plot predicted by the model is too high at short time scales relative to the asymptotic variance, especially for traded prices. This is classically known to be due to the bid-ask bounce. It is however remarkable that the signature plot of actual trade prices looks much flatter compared to the simulation (see figure 9.7). This was discovered and discussed in detail by Bouchaud et al. in [48], and Lillo and Farmer in [225] (see also [135] and [47]). They note that actual order signs exhibit positive long-ranged correlations. They also note that actual prices are diffusive (the signature plot is flat) even at small time scales.
They solve this apparent paradox by showing that diffusivity results from two opposite effects: autocorrelation in trade signs induces persistence in the prices, in exactly the amount needed to counterbalance the mean reversion induced by the liquidity stored in the order book. This subtle equilibrium between liquidity takers and liquidity providers, which guarantees price diffusivity at short lags, is not accounted for by the bare Markovian order book model we study, and one can speak of anomalous diffusion at short time scales for Markovian order book models [303]. Because of the absence of positive autocorrelation in trade signs in the model, this effect is magnified when one looks at trades. The next paragraph elaborates on this point.

[Figure 9.1. Axes: distance from best opposite quote (ticks); quantity (shares). Curves: Model, Data.]
Figure 9.1: Average depth profile. Simulation parameters are summarized in tables 9.1 and 9.2.

[Figure 9.2. Axes: spread (ticks); frequency. Bars: Model, Data.]
Figure 9.2: Probability distribution of the spread. Note that the model (dark gray) predicts a tighter spread than the data.

[Figure 9.3. Axes: lag (trade time); autocorrelation of price increments. Curves: Trade price (Model), Mid-price (Model), Trade price (Data).]
Figure 9.3: Autocorrelation of price increments. This figure shows the fast decay of the autocorrelation function, and the large negative autocorrelation of trades at the first lag.

[Figure 9.4. Axes: time (hours); $P(t) - P(0)$ (ticks).]
Figure 9.4: Price sample path. At large time scales, the price process is a Brownian motion.

[Figure 9.5. Axes: standard normal quantiles; quantiles of mid-price increments. Panels: $h = 1$ s, $h = 30$ s, $h = 120$ s, $h = 300$ s.]
Figure 9.5: Q-Q plot of mid-price increments. $h$ is the time lag in seconds. This figure illustrates the aggregational normality of price increments.

[Figure 9.6. Axes: $P(t+h) - P(t)$ (ticks); frequency.]
Figure 9.6: Probability distribution of price increments. Time lag $h = 1000$ events.

[Figure 9.7. Panel (a): trade time, time lag $h$ (trade time) against $\sigma_h^2$. Panel (b): calendar time, time lag $h$ (seconds) against $\sigma_h^2$. Curves: Trades (Sim.), Mid-price (Sim.), Trades (Data), Mid-price (Data).]
Figure 9.7: Signature plot: $\sigma_h^2 := \mathbb{V}[P(t+h) - P(t)]/h$. The y axis unit is tick$^2$ per trade for panel (a) and tick$^2$.second$^{-1}$ for panel (b). We used a 1,000,000 event simulation run for the model signature plots. Data signature plots are computed separately for each trading day [9:30–14:00], then averaged across 23 days. For calendar time signature plots, prices are sampled every second using the last tick rule. The inset is a zoom-in.

Anomalous diffusion at short time scales. A qualitative understanding of the discrepancy between the model and the data signature plots at short time scales can be gleaned from the following heuristic argument. In what follows, we reason in trade time $t$. Denote by $P^{Tr}(t)$ the price of the trade at time $t$, and by $\alpha(t)$ its sign:
\[
\alpha(t) = 1, \quad \text{for a buyer initiated trade, i.e. a buy market order}, \tag{9.1.10}
\]
and
\[
\alpha(t) = -1, \quad \text{for a seller initiated trade, i.e. a sell market order}. \tag{9.1.11}
\]
We assume that the two signs are equally probable (symmetric model). But to make the argument valid for both the model (for which successive trade signs are independent) and the data (for which trade signs exhibit long memory), we do not assume independence of successive trade signs. Let also, for a quantity $Z$,
\[
\Delta Z(t) := Z(t+1) - Z(t). \tag{9.1.12}
\]
We have by definition
\[
P^{Tr}(t) = P(t^-) + \frac{1}{2} \alpha(t) S(t^-), \tag{9.1.13}
\]
where $P(t^-)$ and $S(t^-)$ are respectively the prevailing mid-price and spread just before the trade. From equation (9.1.13),
\begin{align}
\left( \sigma_1^{Tr} \right)^2 := \mathbb{V}[\Delta P^{Tr}(t)] &= \mathbb{E}\left[ \left( \Delta P^{Tr}(t) \right)^2 \right] \notag \\
&= \mathbb{E}\left[ \left( \Delta P(t^-) \right)^2 \right] + \mathbb{E}\left[ \Delta P(t^-) \, \Delta(\alpha(t) S(t^-)) \right] + \frac{1}{4} \mathbb{E}\left[ \left( \Delta(\alpha(t) S(t^-)) \right)^2 \right]. \tag{9.1.14}
\end{align}
The first term in the r.h.s. is the variance of mid-price increments $\sigma_1^2$. The second term represents the covariance of mid-price increments and the trade sign (weighted by the spread) and we assume it is negligible^3. Let us focus on the third term:
\[
\Delta(\alpha(t) S(t^-)) = \alpha(t+1) \Delta S(t^-) + S(t^-) \Delta \alpha(t). \tag{9.1.15}
\]
Then
\begin{align}
\mathbb{E}\left[ \left( \Delta(\alpha(t) S(t^-)) \right)^2 \right] ={}& \mathbb{E}\left[ (\Delta \alpha(t))^2 \right] \mathbb{E}\left[ S(t^-)^2 \right] + 2 \, \mathbb{E}\left[ \alpha(t+1) \Delta S(t^-) S(t^-) \Delta \alpha(t) \right] \notag \\
&+ \mathbb{E}\left[ \alpha(t+1)^2 \right] \mathbb{E}\left[ (\Delta S(t^-))^2 \right]. \tag{9.1.16}
\end{align}
Again, we neglect the cross term^4 in the r.h.s. and, since $\alpha(t+1)^2 = 1$, we are left with
\[
\mathbb{E}\left[ \left( \Delta(\alpha(t) S(t^-)) \right)^2 \right] \approx \mathbb{E}\left[ (\Delta \alpha(t))^2 \right] \mathbb{E}\left[ S(t^-)^2 \right] + \mathbb{E}\left[ (\Delta S(t^-))^2 \right]. \tag{9.1.17}
\]
But
\[
\mathbb{E}\left[ (\Delta \alpha(t))^2 \right] = \mathbb{E}\left[ \alpha(t+1)^2 \right] + \mathbb{E}\left[ \alpha(t)^2 \right] - 2 \, \mathbb{E}\left[ \alpha(t) \alpha(t+1) \right] = 2 \left( 1 - \rho_1(\alpha) \right), \tag{9.1.18}
\]
where $\rho_1(\alpha)$ is the autocorrelation of trade signs at the first lag. Finally^5:
\[
\left( \sigma_1^{Tr} \right)^2 \approx \sigma_1^2 + \frac{1}{2} \left( 1 - \rho_1(\alpha) \right) \mathbb{E}\left[ S(t^-)^2 \right] + \frac{1}{4} \mathbb{E}\left[ (\Delta S(t^-))^2 \right]. \tag{9.1.20}
\]

^3 This amounts to neglecting the correlation between trade signs and mid-quote movements, which is justified by the dominance of cancellations and limit orders in comparison to market orders in order book data.
^4 This time, we are neglecting the correlation between trade signs and spread movements.
^5 More generally, after $n$ trades:
\[
\left( \sigma_n^{Tr} \right)^2 \approx \sigma_n^2 + \frac{1}{2n} \left( 1 - \rho_n(\alpha) \right) \mathbb{E}\left[ S(t^-)^2 \right]. \tag{9.1.19}
\]

Two effects are clear from equation (9.1.20):

1. The trade price variance at short time scales is larger than the mid-price variance.

2. Autocorrelation in trade signs dampens this discrepancy.

This partially explains^6 why the trades signature plot obtained from the data is flatter than the model predictions: $\rho_1(\alpha)_{model} = 0$, while $\rho_1(\alpha)_{data} \approx 0.6$.
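The two ingredients of this heuristic, identity (9.1.18) and the excess variance in (9.1.20), can be checked numerically with a short Python sketch. The function names are hypothetical, and the two-state Markov chain used to generate autocorrelated signs is an assumption chosen for simplicity: it has a geometrically decaying autocorrelation, not the long memory observed in real data, but its first-lag autocorrelation is exactly the chosen $\rho_1$, which is all that (9.1.18) involves.

```python
import random


def trade_variance_excess(rho1, mean_s2, mean_ds2=0.0):
    """Excess of trade-price over mid-price variance at lag 1, from (9.1.20):
    0.5 * (1 - rho1) * E[S^2] + 0.25 * E[(Delta S)^2]."""
    return 0.5 * (1.0 - rho1) * mean_s2 + 0.25 * mean_ds2


def markov_signs(n, rho1, rng):
    """Trade signs in {-1, +1}: each sign repeats the previous one with probability
    (1 + rho1) / 2, which gives a lag-1 autocorrelation equal to rho1."""
    signs = [rng.choice((-1, 1))]
    for _ in range(n - 1):
        keep = rng.random() < 0.5 * (1.0 + rho1)
        signs.append(signs[-1] if keep else -signs[-1])
    return signs


def mean_squared_increment(signs):
    """Empirical estimate of E[(Delta alpha)^2]."""
    return sum((b - a) ** 2 for a, b in zip(signs, signs[1:])) / (len(signs) - 1)


rng = random.Random(1)
signs = markov_signs(200_000, 0.6, rng)
empirical = mean_squared_increment(signs)   # should be close to 2 * (1 - 0.6) = 0.8
print(empirical)

# Excess variance for independent signs (model) and for rho1 = 0.6 (data-like),
# neglecting the spread-increment term, with E[S^2] = 4 ticks^2 as an example.
print(trade_variance_excess(0.0, 4.0), trade_variance_excess(0.6, 4.0))
```

With independent signs the excess is $\frac{1}{2}\mathbb{E}[S^2]$, whereas a first-lag autocorrelation of 0.6 reduces it to $0.2\,\mathbb{E}[S^2]$, which is the dampening effect described above.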
From a modeling perspective, a possible solution to recover the diffusivity even at very short time scales, is to incorporate long-ranged correlation in the order flow. Toth et al. [312] have investigated numerically this route using a “-intelligence” order book model. In this model, market orders signs are long-ranged correlated, that is, in trade time ρn (α) = E [α(t + n)α(t)] ∝ n−γ , γ ∈]0, 1[. (9.1.21) And the size of incoming market orders is a fraction f of the volume displayed at the best opposite quote, with f drawn from the distribution Pξ ( f ) = ξ(1 − f )ξ−1 , (9.1.22) They show that, by fine tuning the additional parameters γ and ξ, one can ensure the diffusive behavior of the price both at mesoscopic (≈ a few trades) and macroscopic (≈ hundred trades) time scales7 . Parameters estimation For reproducibility, we summarize in tables 9.1 and 9.2 the parameters used to obtain figures 9.1–9.7. These correspond to estimating the model for the stock SCHN.PA (Schneider Electric). Our dataset consists of TRTH8 data for the CAC 40 index constituents in March 2011 (23 trading days). We have tick by tick order book data up to 10 price levels, and trades. In order to avoid the diurnal seasonality in trading activity (and the impact of the US market open on European stocks), we somehow arbitrarily restrict our attention to the time window [9 : 30–14 : 00] CET. If T be the trading duration of interest each day (T = 4.5 hours—[9 : 30–14 : 00]—in our case.) Then #trades M := λc , 2T (9.1.23) and 1 λbiL := . 2T (#buy limit orders arriving i tick away from the best opposite quote + #sell lim. orders etc.) . (9.1.24) 6 although the arguments that led to (9.1.20) are rather qualitative, a back of the envelope calculation with h Interestingly, i 2 E S 2 ∈ [1, 9], gives a difference σT r − σ2 in the range [0.5, 4.5]; which has the same order of magnitude of the values obtained by simulation. 7 Note that Toth. el al. 
[312] model the “latent order book”, not the actual observable order book. The former represents the intended volume at each price level p, that is, the volume that would be revealed should the price come close to p. Hence the interpretation of their parameters, in particular the expected lifetime $\tau_{life}$ of an order, does not strictly match ours.
⁸ Thomson Reuters Tick History.

K     a_∞    b_∞    (v^M, s^M)      (v^L, s^L)      (v^C, s^C)      λ^{M±}
30    250    250    (4.00, 1.19)    (4.47, 0.83)    (4.48, 0.82)    0.1237

Table 9.1: Model parameters for the stock SCHN.PA (Schneider Electric) in March 2011 (23 trading days). Figures 9.8 and 9.9 are graphical representations of these parameters.

For cancellations, we need to normalize the count by the average number of shares $\langle X_i \rangle$ at distance i from the best opposite quote:

\[
\hat{\lambda}^C_i := \frac{1}{\langle X_i \rangle}\,\frac{1}{2T}\,\big(\#\text{cancellation orders in the bid side arriving } i \text{ ticks away from the best opposite quote} + \#\text{cancellation orders in the ask side etc.}\big). \tag{9.1.25}
\]

We then average $\hat{\lambda}^M$, $\hat{\lambda}^L_i$ and $\hat{\lambda}^C_i$ across the 23 trading days to get the final estimates. As for the volumes, we estimate by maximum likelihood the parameters $(\hat{v}, \hat{s})$ of a lognormal distribution separately for each order type. We depict the parameters in figures 9.8 and 9.9.

9.1.1 Results for CAC 40 stocks

In order to get a cross-sectional view of the performance of the model on all CAC 40 stocks, we estimate the parameters separately for each stock and run a 100,000-event simulation for each parameter set. We then compare in figure 9.10 the average depth, average spread and the long-term “volatility” measured directly from the data to those obtained from the simulations. The dashed line is the identity function; it would correspond to a perfect match between model predictions and the data. The solid line is a linear regression $z_{data} = b_1 + b_2\, z_{model}$ for each quantity of interest z.
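The estimators (9.1.23)–(9.1.25) and the lognormal fit can be sketched in a few lines. This is a minimal illustration, not the authors' estimation code: the function names are hypothetical, T is assumed to be expressed in seconds, and the input numbers in the usage lines are made up purely to show the arithmetic.

```python
import math


def arrival_rate(count, duration):
    """Per-side arrival rate: events counted on both sides, divided by 2T."""
    return count / (2.0 * duration)


def cancellation_rate(count, duration, mean_depth):
    """Per-share cancellation rate at one distance: count / (2T <X_i>)."""
    return count / (2.0 * duration * mean_depth)


def lognormal_mle(volumes):
    """Maximum-likelihood (v, s): mean and standard deviation of log-volumes.

    The ML estimate of the variance divides by n (not n - 1).
    """
    logs = [math.log(x) for x in volumes]
    n = len(logs)
    v = sum(logs) / n
    s = math.sqrt(sum((y - v) ** 2 for y in logs) / n)
    return v, s


# Hypothetical inputs, for illustration only.
T = 4.5 * 3600.0                      # 4.5 hours in seconds (assumed unit)
print(arrival_rate(3240, T))          # market orders per second and per side
print(cancellation_rate(3240, T, 1000.0))
print(lognormal_mle([math.exp(1.0), math.exp(3.0)]))
```

In practice the daily estimates would then be averaged across trading days, as described above.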
Note that despite the good agreement between the average depth profiles (panel (a)), and although the model successfully predicts the relative magnitudes of the long-term variance $\sigma_\infty^2$ and the average spread $\langle S \rangle$ for different stocks, it tends to systematically underestimate $\sigma_\infty^2$ and $\langle S \rangle$. This may be related to the absence of autocorrelation in order signs in the model and the presence of more drifting phases in actual prices than in those obtained by simulation.

i (ticks)   ⟨X_i⟩ (shares)   λ_i^{L±}   10³·λ_i^{C±}
1           276              0.2842     0.8636
2           1129             0.5255     0.4635
3           1896             0.2971     0.1487
4           1924             0.2307     0.1096
5           1951             0.0826     0.0402
6           1966             0.0682     0.0341
7           1873             0.0631     0.0311
8           1786             0.0481     0.0237
9           1752             0.0462     0.0233
10          1691             0.0321     0.0178
11          1558             0.0178     0.0127
12          1435             0.0015     0.0012
13          1338             0.0001     0.0001
14          1238             0.0        0.0
15          1122             ...        ...
16          1036             ...        ...
17          943              ...        ...
18          850              ...        ...
19          796              ...        ...
20          716              ...        ...
21          667              ...        ...
22          621              ...        ...
23          560              ...        ...
24          490              ...        ...
25          443              ...        ...
26          400              ...        ...
27          357              ...        ...
28          317              ...        ...
29          285              ...        ...
30          249              0.0        0.0

Table 9.2: Model parameters for the stock SCHN.PA (Schneider Electric) in March 2011 (23 trading days). Figures 9.8 and 9.9 are graphical representations of these parameters.

        Log ⟨A⟩(5)        Log ⟨S⟩          σ_∞
b_1     −0.42 (±0.11)     0.20 (±0.06)     −0.012 (±0.05)
b_2     1.13 (±0.04)      1.16 (±0.07)     1.35 (±0.11)
R²      0.99              0.97             0.94

Table 9.3: CAC 40 stocks regression results.

[Figure 9.8. Panels: (a) ⟨X_i⟩ (shares); (b) λ_i^L; (c) λ_i^C (×10⁻³). Horizontal axis: i (ticks).]

Figure 9.8: Model parameters: arrival rates and average depth profile (parameters as in table 9.2). Error bars indicate variability across different trading days.

[Figure 9.9. Panels (a), (b), (c): Pdf against Volume (shares), log-log axes.]

Figure 9.9: Model parameters: volume distribution. Panels (a), (b) and (c) correspond respectively to market, limit and cancellation orders volumes.
Dashed lines are lognormal fits (parameters as in table 9.1).

[Figure 9.10. Scatter plots with one point per CAC 40 stock, labelled by ticker. Panels: (a) Log ⟨A⟩(5) (data) against Log ⟨A⟩(5) (model); (b) Log ⟨S⟩ (data) against Log ⟨S⟩ (model); (c) σ_∞ (data) against σ_∞ (model).]

Figure 9.10: A cross-sectional comparison of liquidity and price diffusion characteristics between the model and data for CAC 40 stocks (March 2011).

9.2 Simulation of a limit order book modelled by Hawkes processes

The basic order book simulator is now enhanced with arrival times of limit and market orders following mutually (asymmetrically) exciting Hawkes processes, as per the model described in Chapter 8. Modelling is based on empirical observations, verified on several markets (equities, futures on index, futures on bonds), and detailed in section 4.2 of Chapter 4. More specifically, we observe evidence of some sort of “market making” in the studied order books: after a market order, a limit order is likely to be submitted more quickly than it would have been without the market order. In other words, there is a clear reaction: once liquidity has been removed from the order book, a limit order is triggered to replace it. We also show that the reciprocal effect is not observed on the studied markets.
These features lead to the use of asymmetric Hawkes processes for the design of a limit order book simulator. We show in Section 9.2.2 that this simple feature enables a much more realistic treatment of the bid-ask spread of the simulated order book.

9.2.1 Adding dependence between order flows

Section 4.2 of Chapter 4 shows that the flow of limit orders strongly depends on the flow of market orders. We thus propose that in our experiment, the flows of limit and market orders are modelled by Hawkes processes $N^L$ and $N^M$, with stochastic intensities $\lambda^L$ and $\mu^M$ respectively, defined as:

\[
\begin{aligned}
\mu^M(t) &= \mu^M_0 + \int_0^t \alpha_{MM}\, e^{-\beta_{MM}(t-s)}\, dN^M_s, \\
\lambda^L(t) &= \lambda^L_0 + \int_0^t \alpha_{LM}\, e^{-\beta_{LM}(t-s)}\, dN^M_s + \int_0^t \alpha_{LL}\, e^{-\beta_{LL}(t-s)}\, dN^L_s.
\end{aligned}
\tag{9.2.1}
\]

Three mechanisms can be used here. The first two are self-exciting ones, “MM” and “LL”. They are a way to translate into the model the observed clustering of arrivals of market and limit orders and the broad distributions of their durations. In the empirical study [215], it is found that the measured excitation MM is important. In our simulated model, we will show (see 9.2.2) that this allows a simulator to provide realistic distributions of the durations of trades.

The third mechanism, “LM”, is the direct translation of the empirical property we have presented in section ??. When a market order is submitted, the intensity of the limit order process $N^L$ increases, raising the probability that a “market making” behaviour will be the next event. We do not implement the reciprocal mutual excitation “ML”, since we do not observe that kind of influence on our data, as explained in section ??.

The rest of the model is unchanged. Turning these features successively on and off gives us several models to test – namely HP (Homogeneous Poisson processes), LM, MM, MM+LM, MM+LL, MM+LL+LM – to try to understand the influence of each effect.
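To make the intensities (9.2.1) concrete, here is a minimal simulation sketch. It is not the simulator used for the results in this chapter: it only generates arrival times of market and limit orders (no order book), uses Ogata's thinning method, which is a standard technique chosen here for brevity, and all names are hypothetical. Setting a coefficient α to zero switches the corresponding effect off, so the models HP, LM, MM, MM+LM, MM+LL and MM+LL+LM are all covered by one function.

```python
import math
import random


def simulate_hawkes(mu0, lam0, a_mm, b_mm, a_lm, b_lm, a_ll, b_ll,
                    horizon, rng):
    """Simulate market ('M') and limit ('L') order times on [0, horizon].

    Intensities follow (9.2.1) with exponential kernels; candidate points
    are thinned (Ogata's method). Returns a time-ordered list of (t, kind).
    """
    t = 0.0
    e_mm = e_lm = e_ll = 0.0   # current values of the three kernel sums
    events = []
    while True:
        # Between events both intensities only decay, so their current
        # sum is a valid upper bound until the next accepted event.
        bound = mu0 + e_mm + lam0 + e_lm + e_ll
        w = rng.expovariate(bound)
        t += w
        if t > horizon:
            return events
        # Exponential kernels allow a recursive update of the sums.
        e_mm *= math.exp(-b_mm * w)
        e_lm *= math.exp(-b_lm * w)
        e_ll *= math.exp(-b_ll * w)
        mu = mu0 + e_mm
        lam = lam0 + e_lm + e_ll
        u = rng.random() * bound
        if u < mu:                      # accepted as a market order
            events.append((t, "M"))
            e_mm += a_mm                # MM: self-excitation
            e_lm += a_lm                # LM: market orders excite limit orders
        elif u < mu + lam:              # accepted as a limit order
            events.append((t, "L"))
            e_ll += a_ll                # LL: self-excitation
        # otherwise the candidate is rejected and only time advances


# Illustrative run of an MM+LM-type model over ten minutes (time in seconds).
sample = simulate_hawkes(0.12, 0.82, 1.7, 6.0, 5.8, 1.8, 0.0, 1.0,
                         600.0, random.Random(0))
print(len([e for e in sample if e[1] == "M"]),
      len([e for e in sample if e[1] == "L"]))
```

Because there is no “ML” term, limit orders never feed back into the market order intensity, which is the asymmetry discussed above.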
9.2.2 Numerical results on the order book

Fitting and simulation

We fit these processes by computing the maximum likelihood estimators of the parameters of the different models on our data. As expected, estimated values vary with the market activity on the day of the sample. However, it appears that the estimates of the parameters of the stochastic intensity for the MM and LM effects are quite robust. We find an average relaxation parameter $\hat{\beta}_{MM} = 6$, i.e. roughly 170 milliseconds as a characteristic time for the MM effect, and $\hat{\beta}_{LM} = 1.8$, i.e. roughly 550 milliseconds as a characteristic time for the LM effect. Estimation of models including the LL effect is much more troublesome on our data. In the simulations that follow, we assume that the self-exciting parameters are similar ($\alpha_{MM} = \alpha_{LL}$, $\beta_{MM} = \beta_{LL}$) and ensure that the number of market orders and limit orders in the different simulations is roughly equivalent (i.e. approximately 145000 limit orders and 19000 market orders for 24 hours of continuous trading). Table 9.4 summarizes the numerical values used for simulation. Fitted parameters are in agreement with an assumption of asymptotic stationarity.

Model       μ_0     α_MM    β_MM    λ_0     α_LM    β_LM    α_LL    β_LL
HP          0.22    -       -       1.69    -       -       -       -
LM          0.22    -       -       0.79    5.8     1.8     -       -
MM          0.09    1.7     6.0     1.69    -       -       -       -
MM LL       0.09    1.7     6.0     0.60    -       -       1.7     6.0
MM LM       0.12    1.7     6.0     0.82    5.8     1.8     -       -
MM LL LM    0.12    1.7     5.8     0.02    5.8     1.8     1.7     6.0

Common parameters: $m^P_1 = 2.7$, $\nu^P_1 = 2.0$, $s^P_1 = 0.9$; $m^V_1 = 275$, $m^V_2 = 380$; $\lambda^C = 1.35$, $\delta = 0.015$.

Table 9.4: Estimated values of parameters used for simulations.

We compute long runs of simulations with our enhanced model, simulating each time 24 hours of continuous trading. Note that using the chosen parameters, we never face the case of an empty order book. We observe several statistics on the results, which we discuss in the following sections.

Impact on arrival times

We can easily check that introducing self- and mutually exciting processes into the order book simulator helps produce more realistic arrival times.
Figure 9.11 shows the distributions of the durations of market orders (left) and limit orders (right). As expected, we check that the Poisson assumption has to be discarded, while the Hawkes processes help put more weight on very short time intervals. We also verify that models with only the self-exciting processes MM and LL are not able to reproduce the “liquidity replenishment” feature described in section ??. The distributions of time intervals between a market order and the next limit order are plotted in figure 9.12. As expected, no peak for short times is observed if the LM effect is not in the model. But when the LM effect is included, the simulated distribution of time intervals between a market order and the following limit order is very close to the empirical one.

[Figure 9.11. Two panels with curves: Empirical BNPP, Homogeneous Poisson, Hawkes MM (left panel) and Empirical BNPP, Homogeneous Poisson, Hawkes LL (right panel).]

Figure 9.11: Empirical density function of the distribution of the durations of market orders (left) and limit orders (right) for three simulations, namely HP, MM, LL, compared to empirical measures. In inset, same data using a semi-log scale.

[Figure 9.12. Curves: Empirical BNPP, Homogeneous Poisson, Hawkes MM, Hawkes MM LL, Hawkes MM LL LM.]

Figure 9.12: Empirical density function of the distribution of the time intervals between a market order and the following limit order for three simulations, namely HP, MM+LL, MM+LL+LM, compared to empirical measures. In inset, same data using a semi-log scale.

Impact on the bid-ask spread

Besides a better simulation of the arrival times of orders, we argue that the LM effect also helps simulate a more realistic behaviour of the bid-ask spread of the order book.
In figure 9.13, we compare the distributions of the spread for three models – HP, MM, MM+LM – with the empirical measures. We first observe that the model with homogeneous Poisson processes produces a fairly good shape for the spread distribution, but slightly shifted to the right. Small spread values are largely underestimated. When adding the MM effect in order to get a better grasp of market orders’ arrival times, it appears that we flatten the spread distribution. One interpretation could be that when the process $N^M$ is excited, market orders tend to arrive in clusters and to hit the first limits of the order book, widening the spread and thus giving more weight to large spread values. But since the number of orders is roughly constant in our simulations, there have to be periods of lesser market activity where limit orders reduce the spread. Hence a flatter distribution.

[Figure 9.13. Curves: Empirical BNPP.PA, Homogeneous Poisson, Hawkes MM, Hawkes MM+LM.]

Figure 9.13: Empirical density function of the distribution of the bid-ask spread for three simulations, namely HP, MM, MM+LM, compared to empirical measures. In inset, same data using a semi-log scale. X-axis is scaled in euro (1 tick is 0.01 euro).

Here, the MM+LM model produces a spread distribution much closer to the empirical shape. It appears from figure 9.13 that the LM effect reduces the spread: the “market making” behaviour, i.e. limit orders triggered by market orders, helps give less weight to larger spread values (see the tail of the distribution) and sharpen the peak of the distribution for small spread values. Thus, it seems that simulations confirm the empirical properties of a “market making” behaviour on electronic order books. We show in figure 9.14 that the same effect is observed in an even clearer way with the MM+LL and MM+LL+LM models.
Actually, the spread distribution produced by the MM+LL model is the flattest one. This is in line with our previous argument. When using two independent self-exciting Hawkes processes for the arrival of orders, periods of high market order intensity give more weight to large spread values, while periods of high limit order intensity give more weight to small spread values. Adding the cross-term LM to the processes implements a coupling effect that helps reproduce the empirical shape of the spread distribution. The MM+LL+LM simulated spread is the closest to the empirical one.

[Figure 9.14. Curves: Empirical BNPP.PA, Homogeneous Poisson, Hawkes MM+LL, Hawkes MM+LL+LM.]

Figure 9.14: Empirical density function of the distribution of the bid-ask spread for three simulations, namely HP, MM+LL, MM+LL+LM, compared to empirical measures. In inset, same data using a semi-log scale. X-axis is scaled in euro (1 tick is 0.01 euro).

A remark on price returns in the model

It is somewhat remarkable to observe that these variations of the spread distributions are obtained with little or no change in the distributions of the variations of the mid-price. As shown in figure 9.15, the distributions of the variations of the mid-price sampled every 30 seconds are nearly identical for all the simulated models (and much tighter than the empirical one). This is due to the fact that the simulated order books are much more furnished than the empirical one, hence the smaller standard deviation of the mid-price variations. One solution to get thinner order books, and hence more realistic values of the variations of the mid-price, would be to increase our exogenous parameter δ. But in that case, mechanisms for the replenishment of an empty order book should be carefully studied, which is still to be done.
We have shown that the use of Hawkes processes may help produce a realistic shape of the spread distribution in an agent-based order book simulator. We emphasize the role of the excitation of the limit order process by the market order process. This coupling of the processes, similar to a “market making” behaviour, is empirically observed on several markets, and simulations confirm it is a key component for realistic order book models.

Future work should investigate whether other processes or other kernels ($\nu_{LM}$ in our notation) might better fit the observed order flows. In particular, we observe very short characteristic times, which should lead us to question the use of the exponential decay. Furthermore, as pointed out in the paper, many other mechanisms are to be investigated: excitation of market orders, link with volumes, replenishment of an empty order book, etc.

[Figure 9.15. Curves: Homogeneous Poisson, Hawkes MM, Hawkes MM LM, Hawkes MM LL, Hawkes MM LL LM.]

Figure 9.15: Empirical density function of the distribution of the 30-second variations of the mid-price for five simulations, namely HP, MM, MM+LM, MM+LL, MM+LL+LM, using a semi-log scale. X-axis is scaled in euro (1 tick is 0.01 euro).

Chapter 10

A generic limit order book simulator

Marketsim is a library for market microstructure simulation. It allows one to instantiate a number of traders behaving according to various strategies and to simulate their trades over a number of assets. For the moment, information gathered during simulation is reported in the form of graphs. This library is the result of rewriting the Marketsimulator library, originally developed in the Chair of Quantitative Finance at École Centrale Paris, in Python. Python was chosen as the development language for its expressiveness, interactivity and the availability of free, high-quality libraries for scientific computing.
The library source can be found at http://marketsimulator.svn.sourceforge.net/viewvc/marketsimulator/DevAnton/v3/. It requires Python >= 2.6. For graph plotting, the free Veusz plotting software is needed (http://home.gna.org/veusz/downloads/). After installation, please specify the path to the Veusz executable (including the executable filename) in the VEUSZ_EXE environment variable. ArbitrageTrader needs the free blist package to be installed: http://pypi.python.org/pypi/blist/.

10.1 Discrete event simulation components

The main class of every discrete event simulation system is a scheduler that maintains a set of actions to perform in the future and launches them according to their action times, from the earliest to the latest. Schedulers are normally implemented as some kind of heap data structure so that frequent operations are very fast. A classical heap allows insertion in O(log N), extraction of the best element in O(log N) and access to the best element in O(1).

10.1.1 Scheduler

The Scheduler class provides the following interface:

    class Scheduler(object):
        def reset(self): ...
        def currentTime(self): ...
        def schedule(self, actionTime, handler): ...
        def scheduleAfter(self, dt, handler): ...
        def workTill(self, limitTime): ...
        def advance(self, dt): ...

In order to schedule an event, a user should call the schedule or scheduleAfter method, passing an event handler given as a callable object (a function, a method, a lambda expression or an object exposing a __call__ method). These functions return a callable object which can be called in order to cancel the scheduled event. The methods workTill and advance move model time forward, calling event handlers in order of their action times. If two events have the same action time, it is guaranteed that the event scheduled first will be executed first. These methods must not be called from an event handler; doing so raises an exception.
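The behaviour just described can be illustrated with a small heap-based sketch. This is not the marketsim implementation: the class name MiniScheduler is hypothetical, cancellation is done by lazy deletion, and the choice to leave events scheduled exactly at limitTime unexecuted is an assumption of this sketch.

```python
import heapq
import itertools


class MiniScheduler(object):
    """Toy discrete-event scheduler (illustration only, not marketsim's code)."""

    def __init__(self):
        self._now = 0.0
        self._heap = []                    # entries: [time, seq, handler]
        self._seq = itertools.count()      # tie-breaker: scheduling order
        self._working = False

    def currentTime(self):
        return self._now

    def schedule(self, actionTime, handler):
        entry = [actionTime, next(self._seq), handler]
        heapq.heappush(self._heap, entry)  # O(log N) insertion

        def cancel():
            entry[2] = None                # lazy deletion: skipped when popped
        return cancel

    def scheduleAfter(self, dt, handler):
        return self.schedule(self._now + dt, handler)

    def workTill(self, limitTime):
        if self._working:
            raise RuntimeError("workTill/advance called from an event handler")
        self._working = True
        try:
            # Assumption: events at exactly limitTime are left for later.
            while self._heap and self._heap[0][0] < limitTime:
                actionTime, _, handler = heapq.heappop(self._heap)
                self._now = actionTime
                if handler is not None:
                    handler()
        finally:
            self._working = False
        self._now = limitTime

    def advance(self, dt):
        self.workTill(self._now + dt)


log = []
s = MiniScheduler()
s.schedule(2.0, lambda: log.append("b"))
s.schedule(1.0, lambda: log.append("a"))
cancel = s.schedule(1.5, lambda: log.append("x"))
s.schedule(2.0, lambda: log.append("c"))   # same time as "b", scheduled later
cancel()
s.workTill(10.0)
print(log)   # ['a', 'b', 'c']
```

The sequence counter in each heap entry is what enforces the first-scheduled, first-executed rule for equal action times; it also guarantees that handlers themselves are never compared.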
Since a non-parallel simulation is supposed to have a single scheduler, the static class marketsim.scheduler.world is introduced for convenient use from other parts of the library; it grants access to the currentTime, schedule and scheduleAfter methods of the unique scheduler. This scheduler should be initialized by creating an instance of the Scheduler class in client code (scheduler.create method), which gives the right to manage the simulation by calling the workTill and advance methods.

10.1.2 Timer

Timer is a convenience class designed to represent a repeating action. It has the following interface:

    class Timer(object):
        def __init__(self, intervalFunc):
            self.on_timer = Event()
            ...
        def advise(self, listener):
            self.on_timer += listener

It is initialized by a function defining the interval of time to the next event invocation. The event handler to be invoked should be passed to the advise method (there can be multiple listeners). For example, a sample path of a Poisson process with λ=1 can be obtained in the following way:

    import random
    from marketsim.scheduler import Timer, world

    def F(timer):
        print world.currentTime    # Python 2 print statement

    Timer(lambda: random.expovariate(1.)).advise(F)
    world.advance(20)

will print

    0.313908407622
    0.795173273046
    1.50151801647
    3.52280681834
    6.30719707516
    8.48277712333

Note that Timer is designed to be an event source, and for the previous example there is a more convenient shortcut:

    world.process(lambda: random.expovariate(1.), F)

10.2 Orders

Traders send orders to a market. There are two basic kinds of orders:

• Market orders, which ask to buy or sell some asset at any price.
• Limit orders, which ask to buy or sell some asset at a price better than some limit price. If a limit order is not completely fulfilled, it remains in an order book waiting to be matched with another order.

An order book processes market and limit orders but persistently keeps only limit orders. Limit orders can be cancelled by sending cancel orders.
From the trader's point of view there can be other order types, such as iceberg orders, but from the order book's perspective each of them is just a sequence of basic orders: market or limit ones.

10.2.1 Basic order functionality

Orders to buy or sell some assets should derive from order.Base, which provides basic functionality. It has the following user interface:

    class OrderBase(object):
        def __init__(self, volume):
            self.on_matched = Event()
            ...
        def volume(self): ...
        def PnL(self): ...
        def empty(self): ...
        def cancelled(self): ...
        def cancel(self): ...
        side = Side.Buy | Side.Sell    # schematic: either Side.Buy or Side.Sell

It stores the number of assets to trade (volume), calculates the P&L for this order (positive if it is a sell order and negative if it is a buy order) and keeps a cancellation flag. An order is considered empty if its volume is equal to 0. When an order is matched (partially or completely), the on_matched event listeners are invoked with information about which orders were matched, at what price and for what volume.

Usually the library provides a generic class for an order that accepts the side of the order in its constructor. That class also provides Buy and Sell static methods that create orders with a fixed side. For convenience, a method to call the constructor in curried form side->*args->Order is also provided (T):

    # from marketsim.order.Market class definition
    @staticmethod
    def Buy(volume): return Market(Side.Buy, volume)

    @staticmethod
    def Sell(volume): return Market(Side.Sell, volume)

    @staticmethod
    def T(side): return lambda volume: Market(side, volume)

10.2.2 Market orders

A market order (order.Market class) derives from order.Base and simply adds some logic for how it should be processed in an order book.

10.2.3 Limit orders

A limit order (order.Limit class) also stores its limit price and has slightly more complicated processing logic in an order book.

10.2.4 Iceberg orders

Iceberg orders (order.Iceberg class) are virtual orders (so they are composed as a sequence of basic orders) and are never stored in order books.
Once an iceberg order is sent to an order book, it creates an order of the underlying type with volume constrained by some limit and sends it to the order book instead of itself. Once the underlying order is matched completely, the iceberg order sends another order to the order book, until all of the iceberg order volume has been traded. Iceberg orders are created by the following function in curried form:

    def iceberg(volumeLimit, orderFactory):
        def inner(*args):
            return Iceberg(volumeLimit, orderFactory, *args)
        return inner

where volumeLimit is the maximal volume of an order that can be issued by the iceberg order, orderFactory is a factory creating the underlying orders, such as order.Market.Sell or order.Limit.Buy, and *args are the parameters to be passed to the order’s initializer.

10.2.5 Cancel orders

Cancel orders (order.Cancel class) are used to cancel already issued limit (and possibly iceberg) orders. If a user wants to cancel anOrder, she should send order.Cancel(anOrder) to the same order book. It will notify the order book event listeners if the best order in the book has changed. Also, the on_order_cancelled event is fired.

10.2.6 Limit market orders

A limit market order (order.LimitMarket class) is a virtual order composed of a limit order and a cancel order sent immediately after the limit one, thus combining the behaviour of market and limit orders: the trade is done at a price better than the given one (limit order behaviour), but in case of failure the limit order is not stored in the order book (market order behaviour). This class also has an optional delay parameter which, when given, instructs the order book to store the limit order for this delay (Should it be extracted into a separate Cancellable virtual order which should be parametrized by an order to cancel – it might be a limit order, an iceberg or an always best order???).

10.2.7 Always best orders

An always best order (order.AlwaysBest class) is a virtual order that ensures that it has the best price in the order book.
It is implemented as a limit order which is cancelled once the best price in the order queue has changed, and is sent again to the order book with a price one tick better than the best price in the book. If several always best orders are sent to one side of an order book, a price race will start, leading to their elimination by orders of the other side.

10.2.8 Virtual market orders

Virtual market orders (order.VirtualMarket class) are utility orders that are used to evaluate the P&L of a market order without submitting it. These orders have no influence on the market and end with a call to the orderBook.evaluateOrderPriceAsync method. When the P&L is evaluated, the on_matched event is fired with parameters (self, None, PnL, volume_matched).

10.3 Order book and order queue

An order book represents a single asset traded in some market. The same asset traded in different markets would be represented by different order books. An order book stores unfulfilled limit orders sent for this asset in two order queues, one for each trade side (Asks for sell orders and Bids for buy orders). Order queues are organized so that the best order can be extracted quickly and a new order can be placed inside quickly. In order to achieve this, a heap-based implementation is used.

Order books support the notion of a tick size: all limit orders stored in the book should have prices that are multiples of the chosen tick size. If an order has a limit price not divisible by the tick size, it is rounded to the closest ’weaker’ tick (’floored’ for buy orders and ’ceiled’ for sell orders).

Market orders are processed by an order book in the following way: if there are unfulfilled limit orders on the opposite trade side, the market order is matched against them until either it is fulfilled or there are no more unfilled limit orders. The price of the trade is taken to be the limit price of the limit order. Limit orders are matched in order of their price (ascending for sell orders and descending for buy orders).
If two orders have the same price, it is guaranteed that the older order will be matched first. Limit orders are first processed exactly as market orders. If a limit order is not filled completely, it is stored in the corresponding order queue.

There is a notion of transaction costs: if a user wants to define functions computing transaction fees for market, limit and cancel orders, she should pass functions of the form (anOrder, orderBook) -> Price to the order book constructor. If Price is negative, the trader gains some money on this transaction. If the functions are given, once an order is processed by an order book, the method order.charge(price) is called. The default implementation of this method delegates to trader.charge(price), where trader is the trader associated with the order.

10.3.1 Order book

For the moment, there are two kinds of order books: local and remote ones. Local order books execute methods immediately, whereas remote ones try to simulate some delay between a trader and a market by means of message passing (so they are asynchronous by nature). These books try to expose the same interface so that traders cannot tell the difference between them. The base class for the order books is:

    class BookBase(object):    # orderbook._base.BookBase
        def __init__(self, tickSize=1, label=""): ...
        def queue(self, side): ...
        def tickSize(self): ...
        def process(self, order): ...
        def bids(self): ...
        def asks(self): ...
        def price(self): ...
        def spread(self): ...
        def evaluateOrderPriceAsync(self, side, volume, callback): ...

The methods bids, asks and queue give access to the queues composing the order book. The methods price and spread return the mid-price (arithmetic mean of the best bid and best ask) and the spread if they are defined (i.e. bids and asks are not empty). Orders are handled by an order book through the process method. If the process method is called recursively (e.g. from a listener of the on_best_changed event), its order is put into an internal queue which is processed when the current order processing is finished.
This ensures that only one order is processed at any moment in time. The method evaluateOrderPriceAsync is used to compute the P&L of a market order of a given side and volume without executing it. Since this operation might be expensive to compute locally in the case of a remote order book, the resulting price is returned asynchronously by calling the function given by the callback parameter.

10.3.2 Local order book

Local order books extend the base order book with a concrete order processing implementation (methods processLimitOrder, cancelOrder etc.) and allow the user to define functions computing transaction fees:

    class Local(BookBase):    # orderbook.Local
        """
        Order book for a single asset in a market
        Maintains two order queues for orders of different sides
        """
        def __init__(self, tickSize=1, label="",
                     marketOrderFee=None,
                     limitOrderFee=None,
                     cancelOrderFee=None): ...

10.3.3 Order queue

Order queues can be accessed via the queue, asks and bids methods of an order book and provide the following user interface:

    class Queue(object):    # orderbook._queue.Queue
        def __init__(self, ...):
            # event to be called when the best order changes
            self.on_best_changed = Event()
            # event (orderQueue, cancelledOrder) to be called when an order is cancelled
            self.on_order_cancelled = Event()
        def book(self): ...
        def empty(self): ...
        def best(self): ...
        def withPricesBetterThan(self, limit): ...
        def volumeWithPriceBetterThan(self, limit): ...
        def sorted(self): ...
        def sortedPVs(self): ...

The best order in a queue can be obtained by calling the best method, provided that the queue is not empty. The method withPricesBetterThan enumerates orders having a limit price better than or equal to limit. It is used by volumeWithPriceBetterThan to obtain the total volume of orders that can be traded at a price better than or equal to the given one. The property sorted enumerates orders in order of their limit price. The property sortedPVs enumerates the aggregate volume for each price in the queue, in price order.
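A one-sided queue with these operations can be sketched on top of a binary heap. The code below is only an illustration under simplifying assumptions, not marketsim's orderbook._queue.Queue: the class MiniQueue and the methods push and sortedOrders are hypothetical names, orders are reduced to (price, volume) pairs with integer prices in ticks, and events and cancellation are left out.

```python
import heapq
import itertools


class MiniQueue(object):
    """Toy one-sided order queue with price-time priority (illustration only)."""

    def __init__(self, is_buy):
        self._sign = -1 if is_buy else 1   # buy side: highest price is best
        self._heap = []                    # entries: (key, seq, price, volume)
        self._seq = itertools.count()      # older orders win price ties

    def push(self, price, volume):
        heapq.heappush(self._heap,
                       (self._sign * price, next(self._seq), price, volume))

    def empty(self):
        return not self._heap

    def best(self):
        _, _, price, volume = self._heap[0]   # O(1) access to the best order
        return price, volume

    def sortedOrders(self):
        """Orders as (price, volume) pairs, best first, oldest first on ties."""
        return [(p, v) for _, _, p, v in sorted(self._heap)]

    def sortedPVs(self):
        """Aggregate volume per price level, best level first."""
        levels = []
        for price, volume in self.sortedOrders():
            if levels and levels[-1][0] == price:
                levels[-1] = (price, levels[-1][1] + volume)
            else:
                levels.append((price, volume))
        return levels

    def volumeWithPriceBetterThan(self, limit):
        """Total volume at prices better than or equal to limit."""
        return sum(v for p, v in self.sortedOrders()
                   if self._sign * p <= self._sign * limit)


asks = MiniQueue(is_buy=False)
for price, volume in [(101, 10), (100, 5), (100, 7), (102, 1)]:
    asks.push(price, volume)
print(asks.best())                           # (100, 5)
print(asks.sortedPVs())                      # [(100, 12), (101, 10), (102, 1)]
print(asks.volumeWithPriceBetterThan(101))   # 22
```

Storing the signed price as the heap key lets the same code serve both sides: for bids the key is the negated price, so the highest bid sits at the top of the heap.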
When the best order changes in a queue (a new order has arrived or the best order has been matched), the on_best_changed event listeners get notified.

10.3.4 Remote order book

A remote order book (orderbook.Remote class) represents an order book for a remote trader. Remoteness means that there is some delay between the moment when an order is sent to a market and the moment when the order is received by the market, so it models latency in telecommunication networks. A remote book constructor accepts a reference to an actual order book (or to another remote order book) and a reference to a two-way communication channel. The class remote.TwoWayLink implements a two-way telecommunication channel having different latency functions in each direction (to market and from market). It also ensures that messages are delivered to the recipient in the order they were sent.

Queues in a remote book are instances of the orderbook._remote.Queue class. This class is connected to the real order queue and listens to on_best_changed events, thus keeping information about the best order in the queue up to date.

When a remote order book receives an order, the order is cloned and the clone is sent to the actual order book via the communication link. The remote order book subscribes to the clone order’s events via the downside link. As a consequence, at some moments in time the states of the original order and of its clone are not synchronised (and this is normal).

10.4 Traders

Trader functionality can be divided into the following parts:

• Ability to send orders to the market
• Tracking the number of assets traded and the current P&L
• Managing the number of assets and the money that can be involved in trading (currently, a trader may send as many orders as it wishes; in future, when modelling money management, there should be some limitations on how many resources it may use)
• Logic for determining the moments in time when orders should be sent, and their parameters.
In our library, trading strategy classes encapsulate trader behaviour (the last point), while trader classes are responsible for resource management (the first three points). Class trader.Base provides basic functionality for all traders: P&L bookkeeping (PnL method) and notification of listeners about two events: on_order_sent, when a new order is sent to the market, and on_traded, when an order sent to the market is partially or completely filled. Class trader.SingleAsset derives from trader.Base and adds tracking of the number of assets traded (amount property). Class trader.SingleAssetSingleMarket derives from trader.SingleAsset and is associated with a specific order book to which orders are sent. Class trader.SingleAssetMultipleMarkets stores references to a collection of order books where the trader can trade. Class observable.Efficiency can be used to measure trader "efficiency", i.e. the trader's balance if it were "cleared" (if the trader bought the missing assets or sold the excess ones by sending a market order for -trader.amount assets).

10.5 Strategies

10.5.1 Strategy base classes

All strategies should derive from the strategy._basic.Strategy class. This class keeps a suspended flag and stores a reference to the trader for the strategy.

    class Strategy(object):

        def __init__(self, trader):
            self._suspended = False
            self._trader = trader

        @property
        def suspended(self):
            return self._suspended

        def suspend(self, s):
            self._suspended = s

        @property
        def trader(self):
            return self._trader

Generic one side strategy

Class strategy.OneSide derives from strategy._basic.Strategy and provides a generic structure for creating strategies that trade on one side only. It is parametrized by the following data:
• Trader.
• Side of the trader.
• Event source. It is an object having an advise method that allows one to start listening to some events. Typical examples of event sources are timer events or an event that the best price in an order queue has changed.
Of course, the possible choices of event sources are not limited to these examples.
• Function to calculate the parameters of an order to create. For limit orders it should return a pair (price, volume) and for market orders it should return a 1-tuple (volume,). This function takes one parameter, trader, which can be queried, for example, for its current P&L and for the amount of the asset being traded.
• Factory to create orders, used as a function in curried form side -> *orderArgs -> Order, where *orderArgs is the type of the return value of the function calculating order parameters. Typical values for this factory are marketOrderT, limitOrderT or, for example, icebergT(someVolume, limitOrderT).

The strategy wakes up at the moments of time given by the event source, calculates the parameters of an order to create, creates the order and sends it to the order book.

Generic two side strategy

Class strategy.TwoSide derives from strategy._basic.Strategy and provides a generic structure for creating traders that trade on both sides. It is parametrized by the following data (see also the strategy.OneSide parameters):
• Trader.
• Event source. It is an object having an advise method that allows one to start listening to some events.
• Function to calculate the parameters of an order to create. It should return either a pair (side, *orderArgs) if it wants to create an order, or None if no order should be created.
• Factory to create orders, used as a function in curried form side -> *orderArgs -> Order, where *orderArgs is the type of the return value of the function calculating order parameters.

The strategy wakes up at the moments of time given by the event source, calculates the parameters and the side of an order to create and, if the side is defined, creates the order and sends it to the order book.

10.5.2 Concrete strategies

Liquidity provider

The liquidity provider strategy (strategy.LiquidityProviderSide class) is implemented as an instance of strategy.OneSide.
It has the following parameters:
• trader - strategy's trader
• side - side of orders to create (default: Side.Sell)
• orderFactoryT - order factory function (default: order.Limit.T)
• initialValue - initial price, which is used if the order book is empty (default: 100)
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• priceDistr - defines multipliers applied to the current asset price when the price of the order to create is calculated (default: log normal distribution with µ=0 and σ=0.1)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)

It wakes up at the moments of time given by creationIntervalDistr, checks the last best price of orders in the corresponding queue (taking initialValue if the queue is empty), multiplies it by a value drawn from priceDistr to obtain the price of the order to create, calculates the order volume using volumeDistr, creates an order via orderFactoryT(side) and tells the trader to send it. For convenience, there is a strategy.LiquidityProvider function that creates two liquidity providers with the same parameters except for the trading side.
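The wake-up step described above can be sketched in a few lines of standalone Python. This is only an illustration under stated assumptions: the function name next_limit_order and its arguments are hypothetical and are not part of the library, and the default distributions are simply the ones listed in the parameter table, drawn with the standard random module.

```python
import random

def next_limit_order(last_best_price, initial_value=100.0, rng=random):
    """Return (price, volume) for the next limit order of a liquidity provider.

    Hypothetical helper, not part of marketsim.
    last_best_price: last best price seen in the queue, or None if the queue is empty.
    rng: the random module or a random.Random instance.
    """
    # Fall back to the initial value when the queue is empty.
    base = initial_value if last_best_price is None else last_best_price
    # priceDistr default: log normal multiplier with mu=0, sigma=0.1.
    multiplier = rng.lognormvariate(0.0, 0.1)
    # volumeDistr default: exponential distribution with lambda=1.
    volume = rng.expovariate(1.0)
    return base * multiplier, volume

def next_wakeup_delay(rng=random):
    """creationIntervalDistr default: exponential distribution with lambda=1."""
    return rng.expovariate(1.0)

if __name__ == "__main__":
    rng = random.Random(42)
    price, volume = next_limit_order(None, rng=rng)
    print("price=%.2f volume=%.2f next wake-up in %.2f"
          % (price, volume, next_wakeup_delay(rng)))
```

Because the multiplier is log normal with µ=0, quotes are scattered multiplicatively around the last best price, so the generated prices stay positive.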
Figure 10.1: Liquidity provider sample run

Sample usage:

    from marketsim.veusz_graph import Graph, showGraphs
    import random
    from marketsim import strategy, trader, orderbook, scheduler, observable

    world = scheduler.create()
    avg = observable.avg

    book_A = orderbook.Local(tickSize=0.01, label="A")

    price_graph = Graph("Price")

    assetPrice = observable.Price(book_A)
    price_graph.addTimeSeries([
        assetPrice,
        avg(assetPrice, alpha=0.15),
        avg(assetPrice, alpha=0.015),
        avg(assetPrice, alpha=0.65)])

    def volume(v):
        return lambda: v*random.expovariate(.1)

    lp_A = strategy.LiquidityProvider(
        trader.SASM(book_A, "A"), volumeDistr=volume(10)).trader
    lp_a = strategy.LiquidityProvider(
        trader.SASM(book_A, "a"), volumeDistr=volume(1)).trader

    spread_graph = Graph("Bid-Ask Spread")
    spread_graph.addTimeSerie(observable.BidPrice(book_A))
    spread_graph.addTimeSerie(observable.AskPrice(book_A))

    eff_graph = Graph("efficiency")
    eff_graph.addTimeSerie(observable.Efficiency(lp_a))
    eff_graph.addTimeSerie(observable.PnL(lp_a))

    world.workTill(500)

    showGraphs("liquidity", [price_graph, spread_graph, eff_graph])

Canceller

strategy.Canceller is an agent whose purpose is to cancel persistent (limit, iceberg, etc.) orders issued by a trader. To do so, it subscribes to the on_order_sent event of the trader, stores incoming orders in an array and, at some moments of time, randomly chooses an order and cancels it by sending order.Cancel. It has the following parameters:
• trader - trader to subscribe to
• cancellationIntervalDistr - intervals of time between order cancellations (default: exponential distribution with λ=1)
• choiceFunc - function N -> idx that chooses which order should be cancelled

Fundamental value strategy

The fundamental value strategy (strategy.FundamentalValue class) believes that an asset should cost some specific price (the 'fundamental value'): if the current asset price is lower than the fundamental value it starts to buy the asset, and if the price is higher it starts to sell the asset.
It has the following parameters:
• trader - strategy's trader
• orderFactoryT - order factory function (default: order.Market.T)
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• fundamentalValue - defines the fundamental value (default: constant 100)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)

Figure 10.2: Fundamental value trader sample run (fv=200)

Sample usage:

    from marketsim.veusz_graph import Graph, showGraphs
    from marketsim import strategy, orderbook, trader, scheduler, observable

    world = scheduler.create()

    book_A = orderbook.Local(tickSize=0.01, label="A")

    price_graph = Graph("Price")

    assetPrice = observable.Price(book_A)
    price_graph.addTimeSerie(assetPrice)

    avg = observable.avg
    trend = observable.trend

    price_graph.addTimeSerie(avg(assetPrice))

    lp_A = strategy.LiquidityProvider(trader.SASM(book_A),
                                      volumeDistr=lambda: 70).trader
    trader_200 = strategy.FundamentalValue(trader.SASM(book_A, "t200"),
                                           fundamentalValue=lambda: 200.,
                                           volumeDistr=lambda: 12).trader
    trader_150 = strategy.FundamentalValue(trader.SASM(book_A, "t150"),
                                           fundamentalValue=lambda: 150.,
                                           volumeDistr=lambda: 1).trader

    eff_graph = Graph("efficiency")
    trend_graph = Graph("efficiency trend")
    pnl_graph = Graph("P&L")
    volume_graph = Graph("volume")

    def addToGraph(traders):
        for t in traders:
            e = observable.Efficiency(t)
            eff_graph.addTimeSerie(avg(e))
            trend_graph.addTimeSerie(trend(e))
            pnl_graph.addTimeSerie(observable.PnL(t))
            volume_graph.addTimeSerie(observable.VolumeTraded(t))

    addToGraph([trader_150, trader_200])

    world.workTill(1500)

    showGraphs("fv_trader", [price_graph, eff_graph, trend_graph,
                             pnl_graph, volume_graph])

Dependent price strategy

The dependent price strategy (strategy.Dependency class) believes that the fair price of an asset A is completely correlated with the price of another asset B and that the following relation should hold: $Price_A = k \cdot Price_B$, where k is some
factor. It may be considered as a variety of the fundamental value strategy, with the exception that it is invoked every time the price of the other asset B has changed. It has the following parameters:
• trader - strategy's trader
• bookToDependOn - asset that is considered as the reference one
• orderFactoryT - order factory function (default: order.Market.T)
• factor - multiplier to obtain the fair asset price from the reference asset price
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)

Figure 10.3: Dependency trader sample run (k=0.5)

Sample usage:

    from marketsim.veusz_graph import Graph, showGraphs
    import random
    from marketsim import strategy, orderbook, trader, scheduler, observable

    world = scheduler.create()

    book_A = orderbook.Local(tickSize=0.01, label="A")
    book_B = orderbook.Local(tickSize=0.01, label="B")

    price_graph = Graph("Price")

    assetPrice_A = observable.Price(book_A)
    price_graph.addTimeSerie(assetPrice_A)

    assetPrice_B = observable.Price(book_B)
    price_graph.addTimeSerie(assetPrice_B)

    avg = observable.avg

    price_graph.addTimeSerie(avg(assetPrice_A, alpha=0.15))
    price_graph.addTimeSerie(avg(assetPrice_B, alpha=0.15))

    liqVol = lambda: random.expovariate(.1)*5
    lp_A = strategy.LiquidityProvider(trader.SASM(book_A),
                                      defaultValue=50., volumeDistr=liqVol).trader
    lp_B = strategy.LiquidityProvider(trader.SASM(book_B),
                                      defaultValue=150., volumeDistr=liqVol).trader

    dep_AB = strategy.Dependency(trader.SASM(book_A, "AB"), book_B, factor=2).trader
    dep_BA = strategy.Dependency(trader.SASM(book_B, "BA"), book_A, factor=.5).trader

    eff_graph = Graph("efficiency")
    eff_graph.addTimeSerie(observable.Efficiency(dep_AB))
    eff_graph.addTimeSerie(observable.Efficiency(dep_BA))
    eff_graph.addTimeSerie(observable.PnL(dep_AB))
    eff_graph.addTimeSerie(observable.PnL(dep_BA))

    world.workTill(500)

    showGraphs("dependency", [price_graph, eff_graph])

Noise strategy

The noise strategy (strategy.Noise class) is a rather naive strategy that randomly creates
an order and sends it to the order book. It has the following parameters:
• trader - strategy's trader
• orderFactoryT - order factory function (default: order.Market.T)
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• sideDistr - side of orders to create (default: discrete uniform distribution P(Sell)=P(Buy)=.5)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)

Signal strategy

The signal strategy (strategy.Signal class) listens to some discrete signal. When the signal rises above some threshold, the strategy starts to buy; when the signal falls below -threshold, the strategy starts to sell. It has the following parameters:
• trader - strategy's trader
• signal - signal to be listened to
• orderFactoryT - order factory function (default: order.Market.T)
• threshold - threshold at which the trader starts to act (default: 0.7)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)

Figure 10.4: Signal trader sample run (signal=20-t/10)

Sample usage:

    from marketsim.veusz_graph import Graph, showGraphs
    from marketsim import signal, strategy, trader, orderbook, scheduler, observable

    world = scheduler.create()

    book_A = orderbook.Local(tickSize=0.01, label="A")

    price_graph = Graph("Price")

    assetPrice = observable.Price(book_A)
    price_graph.addTimeSerie(assetPrice)

    avg = observable.avg

    price_graph.addTimeSerie(avg(assetPrice))

    lp_A = strategy.LiquidityProvider(trader.SASM(book_A),
                                      volumeDistr=lambda: 1).trader

    signal = signal.RandomWalk(initialValue=20, deltaDistr=lambda: -.1,
                               label="signal")
    trader = strategy.Signal(trader.SASM(book_A, "signal"), signal).trader

    price_graph.addTimeSerie(signal)
    price_graph.addTimeSerie(observable.VolumeTraded(trader))

    eff_graph = Graph("efficiency")
    eff_graph.addTimeSerie(observable.Efficiency(trader))
    eff_graph.addTimeSerie(observable.PnL(trader))

    world.workTill(500)
    showGraphs("signal_trader", [price_graph, eff_graph])

Signal

signal.RandomWalk is a sample discrete signal that changes at some moments of time by a random increment. It has the following parameters:
• initialValue - initial value of the signal (default: 0)
• deltaDistr - increment distribution function (default: normal distribution with µ=0, σ=1)
• intervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)

Trend follower

The trend follower (strategy.TrendFollower class) can be considered as a kind of signal strategy (strategy.Signal) where the signal is the trend of the asset. By trend we mean the first derivative of some moving average of asset prices. If the derivative is positive, the trader buys; if it is negative, the trader sells. Since the moving average is a continuously changing signal, we check its derivative at random moments of time. It has the following parameters:
• trader - strategy's trader
• average - moving average object. It should have methods update(time, value) and derivativeAt(time). By default, we use an exponentially weighted moving average with α = 0.15.
• orderFactoryT - order factory function (default: order.Market.T)
• threshold - threshold at which the trader starts to act (default: 0.)
• creationIntervalDistr - defines intervals of time between order creation (default: exponential distribution with λ=1)
• volumeDistr - defines volumes of orders to create (default: exponential distribution with λ=1)

10.5.3 Arbitrage trading strategy

The arbitrage trading strategy (strategy.Arbitrage) tries to arbitrage by trading the same asset on different markets. Once the bid price on one market exceeds the ask price on another market, the strategy sends two complementary market orders with opposite sides and the same volume (calculated using the order queue's volumeWithPriceBetterThan method), thus obtaining a non-negative profit. This strategy is initialized with the sequence of order books it will follow.
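The crossing condition described above can be illustrated with a small standalone sketch. It is not the library's implementation: the text only says that the volume is obtained with volumeWithPriceBetterThan, without giving details, so the sketch swaps in a plain level-by-level walk of the two queues, which keeps every matched unit at a non-negative profit. The function name arbitrage_volume and the (price, volume) list representation of the queues are assumptions made for the example.

```python
def arbitrage_volume(bids, asks):
    """Return (volume, profit) obtainable by selling into `bids` on one
    market and buying from `asks` on another one.

    Hypothetical helper, not part of marketsim.
    bids: list of (price, volume) sorted by decreasing price.
    asks: list of (price, volume) sorted by increasing price.
    """
    i = j = 0
    volume = profit = 0.0
    bid_left = bids[0][1] if bids else 0.0
    ask_left = asks[0][1] if asks else 0.0
    # Walk both queues while the current bid is strictly above the current ask.
    while i < len(bids) and j < len(asks) and bids[i][0] > asks[j][0]:
        traded = min(bid_left, ask_left)
        volume += traded
        profit += traded * (bids[i][0] - asks[j][0])
        bid_left -= traded
        ask_left -= traded
        # Move to the next price level on whichever side is exhausted.
        if bid_left == 0:
            i += 1
            bid_left = bids[i][1] if i < len(bids) else 0.0
        if ask_left == 0:
            j += 1
            ask_left = asks[j][1] if j < len(asks) else 0.0
    return volume, profit

if __name__ == "__main__":
    bids_market_1 = [(105, 1), (101, 10)]
    asks_market_2 = [(100, 1), (104, 10)]
    # Only the best levels cross: one unit sold at 105 and bought at 100.
    print(arbitrage_volume(bids_market_1, asks_market_2))
```

In the example, sizing the trade from the best prices alone (all bids above 100 against all asks below 105) would give 11 units and a loss, which is why the sketch stops as soon as the remaining levels no longer cross.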
Sample usage:

    from marketsim.veusz_graph import Graph, showGraphs
    from marketsim import trader, strategy, orderbook, remote, scheduler, observable

    world = scheduler.create()

    book_A = orderbook.Local(tickSize=0.01, label="A")
    book_B = orderbook.Local(tickSize=0.01, label="B")
    link = remote.TwoWayLink()
    remote_A = orderbook.Remote(book_A, link)
    remote_B = orderbook.Remote(book_B, link)

    price_graph = Graph("Price")
    spread_graph = Graph("Bid-Ask Spread")
    cross_graph = Graph("Cross Bid-Ask Spread")

    arbitrager = strategy.Arbitrage(
        trader.SingleAssetMultipleMarket([remote_A, remote_B])).trader

    assetPrice = observable.Price(book_A)
    price_graph.addTimeSerie(assetPrice)

    avg = observable.avg

    cross_AB = observable.CrossSpread(book_A, book_B)
    cross_BA = observable.CrossSpread(book_B, book_A)
    cross_graph.addTimeSerie(cross_AB)
    cross_graph.addTimeSerie(cross_BA)
    cross_graph.addTimeSerie(avg(cross_AB))
    cross_graph.addTimeSerie(avg(cross_BA))

    spread_graph.addTimeSerie(avg(observable.BidPrice(book_A)))
    spread_graph.addTimeSerie(avg(observable.AskPrice(book_A)))
    spread_graph.addTimeSerie(avg(observable.BidPrice(book_B)))
    spread_graph.addTimeSerie(avg(observable.AskPrice(book_B)))

    ewma_0_15 = observable.EWMA(assetPrice, alpha=0.15)
    price_graph.addTimeSerie(observable.OnEveryDt(1, ewma_0_15))

    lp_A = strategy.LiquidityProvider(trader.SASM(remote_A))
    lp_B = strategy.LiquidityProvider(trader.SASM(remote_B))

    world.workTill(500)

    showGraphs("arbitrage", [price_graph, spread_graph, cross_graph])

Figure 10.5: Arbitrage trader sample run (Bid/Ask price)

Figure 10.6: Arbitrage trader sample run (Cross spread)

10.5.4 Adaptive strategies

An adaptive strategy is a strategy that is suspended or resumed depending on its "efficiency". Adaptive strategies in the library are parametrized by an object measuring the efficiency, since there can be different ways to do it. The default estimator is the first derivative of the exponentially weighted moving average of the trader's "efficiency".
To measure strategy efficiency without influencing the market, we use a "phantom" strategy having the same parameters as the original one (except that volumeDistr is taken as constant 1) but sending order.VirtualMarket to the order book. The function adaptive.withEstimator creates a real strategy coupled with its "phantom" and an estimator object.

    def withEstimator(constructor, *args, **kwargs):
        assert len(args) == 0, "positional arguments are not supported"
        efficiencyFunc = kwargs['efficiencyFunc'] \
            if 'efficiencyFunc' in kwargs \
            else lambda trader: trend(efficiency(trader))
        real = constructor(*args, **kwargs)
        real.estimator = createVirtual(constructor, copy(kwargs))
        real.efficiency = efficiencyFunc(real.estimator.trader)
        return real

The function adaptive.suspendIfNotEffective suspends a strategy if it is considered ineffective:

    def suspendIfNotEffective(strategy):
        strategy.efficiency.on_changed += \
            lambda _: strategy.suspend(strategy.efficiency.value < 0)
        return strategy

Class adaptive.chooseTheBest is a composite strategy. It is initialized with a sequence of strategies coupled with their efficiency estimators (i.e. having an efficiency field). At some moments of time (given by the event_gen parameter), the most effective strategy is chosen and made running; the other strategies are suspended.
    class chooseTheBest(Strategy):

        def __init__(self, strategies, event_gen=None):
            assert all(map(lambda s: s.trader == strategies[0].trader, strategies))
            if event_gen is None:
                event_gen = scheduler.Timer(lambda: 1)
            Strategy.__init__(self, strategies[0].trader)
            self._strategies = strategies
            event_gen.advise(self._chooseTheBest)

        def _chooseTheBest(self, _):
            # find the highest efficiency among the strategies
            best = -10e38
            for s in self._strategies:
                if s.efficiency.value > best:
                    best = s.efficiency.value
            # keep running only the strategies reaching it
            for s in self._strategies:
                s.suspend(best != s.efficiency.value)

Part IV

Imperfection and predictability in order-driven markets

Chapter 11

Market imperfection and predictability

This chapter addresses the notion of efficiency of financial markets, in the sense of short-term predictability, in the light of the dramatic recent changes that have led to fierce competition between a growing number of actors and to a sharp increase in the influence of high frequency trading.

11.1 Forecasting and market efficiency

Forecasting the market has been one of the most exciting financial subjects for over a century. In 1900, L. Bachelier [25] admitted, “Undoubtedly, the Theory of Probability will never be applicable to the movements of quoted prices and the dynamics of the Stock Exchange will never be an exact science. However, it is possible to study mathematically the static state of the market at a given instant to establish the probability law for the price fluctuations that the market admits at this instant.” Seventy years later, Fama [126] proposed a formal definition of market efficiency: “a market in which prices always fully reflect available information is called efficient.”

Opinions about market efficiency have always diverged. B. Malkiel [235] concluded that most investors trying to predict stock returns ended up with profits inferior to those of passive strategies. In his book Fooled by Randomness, N. Taleb [306] argued that even the best performances can be explained by luck and randomness.
Interestingly enough, the temptation to discover some predictable, and hopefully profitable, signals in the middle of those millions of numbers has never been as high as today. In the academic world, [67] is a detailed study of the statistical properties of intraday returns, leading to the conclusion that there is no evidence of correlation between successive returns. Similarly, [127] concludes that stock returns exhibit negligible temporal autocorrelation. However, the use of limit order book data as in [327] has yielded more promising results; in particular, the liquidity imbalance at the best bid and ask limits seems to be informative for predicting the next trade sign. In the professional world, many books present strategies that supposedly predict the market and always earn money, see e.g. [254], [313]. When those strategies are tested on realistic samples, the results are quite different and the strategies are no longer profitable. It is quite possible that the risk of over-fitting, inherent to such methods, has played a key role in the seemingly good performances published in those books.

The study presented here is performed from both an academic and a professional perspective. For each prediction method, not only are statistical results presented, but so are the performances of the corresponding strategies. The aim is to give another point of view on what a good prediction and an efficient market are. In the first section, the data and the test methodology are presented. In the second section, a nonlinear method based on conditional probability matrices is used to test the predictive power of each indicator. In the last section, linear regression is introduced to combine the different indicators, and several regularization tools are tested in order to enhance the performances of the strategies.

11.2 Data, methodology and performances measures

11.2.1 Data

We focus on the EUROSTOXX 50 European liquid stocks.
One year (2013) of full daily order book data is used in this study. For a stock with a mid price $S_t$ at time $t$, the return to be predicted over a period $\delta t$ is $\ln(S_{t+\delta t}/S_t)$. At time $t$, one can use all the available data for any time $s \le t$ to perform the prediction. The focus is on predicting the stocks' returns over a fixed period $\delta t$ using some limit order book indicators. Once the returns and the indicators are computed, the data are sampled on a fixed time grid from 10:00 am to 5:00 pm with a resolution $\delta t$. Three different resolutions are tested: 1, 5 and 30 minutes. Below are the definitions of the studied indicators and the rationale behind using them to predict the returns.

Past return: the past return is defined as $\ln(S_t/S_{t-\delta t})$. Two effects justify the use of the past return indicator to predict the next return: the mean-reversion effect and the momentum effect. If a stock suddenly shows an abnormal return that moves its price significantly away from its historical mean value, the mean-reversion effect is observed when another large return with opposite sign occurs rapidly afterwards, driving the stock price back to its usual average range. On the other hand, if the stock exhibits, in a progressive fashion, a significant deviation, then the momentum effect occurs when more and more market participants become convinced of the relevance of the move and trade in the same direction, thereby increasing the deviation.

Order book imbalance: the liquidity on the bid (respectively ask) side is defined as $Liq_{bid} = \sum_{i=1}^{5} w_i b_i q_i^b$ (respectively $Liq_{ask} = \sum_{i=1}^{5} w_i a_i q_i^a$), where $b_i$ (respectively $a_i$) is the price at the limit $i$ on the bid (respectively ask) side, $q_i^b$ (respectively $q_i^a$) is the corresponding available quantity, and $w_i$ is a decreasing function of $i$ used to give more weight to the best limits. Those indicators give an idea about the instantaneous volume available for trading on each side of the order book.
Finally, the order book imbalance is defined as $\ln(Liq_{bid}/Liq_{ask})$. This indicator summarizes the static state of the order book and gives an idea about the instantaneous buy-sell equilibrium. When this indicator is significantly higher (respectively lower) than 0, the available quantity on the bid side is significantly higher (respectively lower) than the one on the ask side; only few participants are willing to sell (respectively buy) the stock, which might reflect a market consensus that the stock will move up (respectively down).

Flow quantity: this indicator summarizes the order book dynamics over the last period $\delta t$. Denote by $Q_b$ (respectively $Q_s$) the sum of the bought (respectively sold) quantities over the last period $\delta t$; the flow quantity is defined as $\ln(Q_b/Q_s)$. This indicator is similar to the order flow and shows a high positive autocorrelation. The rationale behind using the flow quantity is to verify whether the persistence of the flow is informative about the next return.

EMA: for a process $(X_{t_i})$ observed at discrete times $(t_i)$, the Exponential Moving Average $EMA(d, X)$ with delay $d$ is defined by $EMA(d, X)_{t_0} = X_{t_0}$ and, for $i \ge 1$,
\[ EMA(d, X)_{t_i} = \omega X_{t_i} + (1 - \omega)\, EMA(d, X)_{t_{i-1}}, \qquad \omega = \min\left(1, \frac{t_i - t_{i-1}}{d}\right). \]
The EMA is a weighted average of the process with an exponential decay. The smaller $d$ is, the shorter the EMA memory is.

11.2.2 Methodology

The aim of this study is to empirically test the market efficiency by predicting the stocks' returns over three different time intervals: 1, 5 and 30 minutes. In Section 11.3, the indicators are the past returns, the order book imbalance and the flow quantity. A simple method based on historical conditional probabilities is used to assess, separately, the informative effect of each indicator. In Section 11.4, the three indicators and their $EMA(d, X)$ for $d \in \{1, 2, 4, 8, 16, 32, 64, 128, 256\}$ are combined in order to obtain a better prediction than in the case of a single indicator.
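As an illustration of the indicator definitions of Section 11.2.1, the EMA recursion and the order book imbalance can be sketched as follows. The function names are hypothetical, and since the text only requires the weights $w_i$ to be decreasing in $i$, the choice $w_i = 1/i$ below is an assumption made for the example, not the weighting used in the study.

```python
import math

def ema(times, values, d):
    """EMA(d, X) on an irregular time grid, following the recursion
    EMA_0 = X_0, EMA_i = w * X_i + (1 - w) * EMA_{i-1},
    with w = min(1, (t_i - t_{i-1}) / d)."""
    out = [values[0]]
    for i in range(1, len(values)):
        w = min(1.0, (times[i] - times[i - 1]) / d)
        out.append(w * values[i] + (1.0 - w) * out[-1])
    return out

def liquidity(prices, quantities, weights):
    """Weighted liquidity sum_i w_i * price_i * quantity_i over the given limits."""
    return sum(w * p * q for w, p, q in zip(weights, prices, quantities))

def order_book_imbalance(bid_prices, bid_qty, ask_prices, ask_qty, weights=None):
    """ln(Liq_bid / Liq_ask); positive when the bid side is heavier."""
    if weights is None:
        # Assumption: w_i = 1/i, one possible decreasing weighting.
        weights = [1.0 / (i + 1) for i in range(len(bid_prices))]
    return math.log(liquidity(bid_prices, bid_qty, weights)
                    / liquidity(ask_prices, ask_qty, weights))

if __name__ == "__main__":
    print(ema([0, 1, 2], [1.0, 3.0, 3.0], d=2))
    bids = [10.00, 9.99, 9.98, 9.97, 9.96]
    asks = [10.01, 10.02, 10.03, 10.04, 10.05]
    print(order_book_imbalance(bids, [500, 300, 300, 200, 100],
                               asks, [100, 200, 300, 300, 500]))
```

With a delay d shorter than the sampling step, the weight $\omega$ saturates at 1 and the EMA simply reproduces the observed process, which matches the remark that a smaller d means a shorter memory.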
Different methods, based on linear regression, are tested. In particular, some statistical and numerical stability problems of linear regression are addressed. In the different sections, the predictions are tested statistically, then used to design a simple trading strategy. The goal is to verify whether one can find a profitable strategy covering trading costs of 0.5 basis point. This trading cost is realistic and corresponds to the trading costs of many funds, brokers and banks. Finding a strategy that stays profitable after paying the costs, if one exists, would be an empirical argument for market inefficiency. Notice that, in all the sections, the learning samples are sliding windows containing a sufficient number of days, and the testing samples are the following days. The model parameters are fitted in-sample on a learning sample, and the strategies are tested out-of-sample on a testing sample. The sliding training windows guard the methodology against overfitting, since performances are only computed out of sample.

11.2.3 Performance measures

In most studies addressing market efficiency, results are summarized in a linear correlation coefficient. However, such a measure is not sufficient to draw conclusions about return predictability or market efficiency: any interpretation of the results should depend on the predicted signal and on a corresponding trading strategy. A 1% correlation is high if the signal is supposed to be totally random, and a 99% correlation is insufficient if the signal is supposed to be perfectly predictable. Moreover, a trader making 1 euro with 50.01% probability and losing 1 euro with 49.99% probability each time he trades a stock might be considered as a noise trader. However, if this strategy can be run over 500 stocks, once per second, for 8 hours a day, then at the end of the day the gain will be the sum $S_n$ of $n = 14.4$ million
Assuming that the hypotheses of the central limit theorem are √satisfied, law N(E, σ √ ). n Thus the probability of having a negative trading day is −E n Φ( σ ) Sn n has a normal = Φ(−0.62) = 26.5%, much lower than that of a noise trader. From now on, we shall adopt the very empirical, but quite realistic, view that returns are considered predictable, and thus, the market is considered inefficient, if one can run a profitable strategy covering the trading costs. 11.3 Conditional probability matrices The conditional probability matrices provide empirical estimates of conditional probability laws. To apply this method, the data need to be discretized in a small number of classes. Denote the explanatory variable as X, the logarithmic return as Y, so that Ytn = Ln( Denote the classes of X (respectively Y) as CX = {CiX S tn+1 S tn ), and the frequencies matrix as M. : i ∈ N+ ∩ {i ≤ S X }} (respectively C Y = {CiY : i ∈ N+ ∩ {i ≤ S Y }}). S X (respectively S Y ) denotes the total number of classes for X (respectively Y). For a given learning period [0, T ] containing N observations, the matrix of frequencies at the time T is constructed as: i, j MT = card({(Xtn ∈ CiX , Ytn ∈ C Yj )}) where n ∈ N+ ∩ {n ≤ N}, and Xtn (respectively Ytn ) is the nth observed value of X (respectively Y), observed at the time tn . Then, a prediction of the "next" return YT conditional to the observations XT at time T can be computed using the matrix MT . The idea of this method is a simple application of the statistical independence test. If some events A = "Xtn ∈ CiX " and B="Ytn ∈ C Yj " are statistically independent, then P(A|B) = P(A). For example, to check if the past returns (denoted X in this example) can help predicting the future returns (denoted Y in this example), the returns are classified into two classes and the empirical frequencies matrix is computed. Table 11.1 shows the results for the 1-minute returns of Deutsche Telekom over the year 2013. 
                     A = "X < 0"    B = "X > 0"
    A = "Y < 0"        19,950         21,597
    B = "Y > 0"        21,597         20,448

Table 11.1: Historical frequencies matrix for Deutsche Telekom over 2013

In probabilistic terms, the historical probability of observing a negative return is P(A) = 49.70% and that of observing a positive return is P(B) = 50.30%. Therefore, a trader always buying the stock would have a success rate of 50.30%. Also note that P(A/A) = 48.02%, P(B/A) = 51.98%, P(A/B) = 51.37%, P(B/B) = 48.63%. Thus, a trader playing the mean-reversion (buying when the past return is negative and selling when the past return is positive) would have a success rate of 51.67%. The same approach, when trading the strategy over 500 stocks, gives a success rate of 54.38% for the buy strategy and of 72.91% for the mean-reversion strategy. This simple test shows that the smallest statistical bias can be profitable and useful for designing a trading strategy. However, the previous strategy is not realistic: the conditional probabilities are computed in-sample and the full data set of Deutsche Telekom was used for the computation. In reality, predictions have to be computed using only the past data. It is thus important to have stationary probabilities. Table 11.2 shows that the monthly observed frequencies are quite stable, and thus can be used to estimate out-of-sample probabilities. Each month, one can use the observed frequencies of the previous month as an estimator of the current month's probabilities.

    Month    P(A/A)    P(B/A)    P(A/B)    P(B/B)
    Jan       0.49      0.51      0.50      0.50
    Feb       0.48      0.52      0.52      0.48
    Mar       0.47      0.53      0.51      0.49
    Apr       0.48      0.52      0.53      0.47
    May       0.47      0.53      0.52      0.48
    Jun       0.50      0.50      0.49      0.51
    Jul       0.48      0.52      0.51      0.49
    Aug       0.51      0.49      0.50      0.50
    Sep       0.51      0.49      0.51      0.49
    Oct       0.47      0.53      0.50      0.50
    Nov       0.46      0.54      0.51      0.49
    Dec       0.44      0.56      0.55      0.45

Table 11.2: Monthly historical conditional probabilities. In most cases, P(A/A) and P(B/B) are lower than 50%, whereas P(B/A) and P(A/B) are higher than 50%.
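The percentages quoted from Table 11.1 can be recomputed directly from the four counts. The short sketch below does so; the layout M[i][j] (row i for the class of X, column j for the class of Y) and the function name are assumptions made for the example.

```python
def conditional_probabilities(m):
    """Return (p_y, p_y_given_x) from a frequencies matrix.

    m[i][j] is the number of observations with X in class i and Y in class j.
    p_y[j] estimates P(Y in class j); p_y_given_x[i][j] estimates
    P(Y in class j | X in class i).
    """
    total = sum(sum(row) for row in m)
    n_x, n_y = len(m), len(m[0])
    p_y = [sum(m[i][j] for i in range(n_x)) / total for j in range(n_y)]
    p_y_given_x = [[m[i][j] / sum(m[i]) for j in range(n_y)] for i in range(n_x)]
    return p_y, p_y_given_x

# Counts of Table 11.1; class 0 is "negative", class 1 is "positive".
M = [[19950, 21597],
     [21597, 20448]]

p_y, p_cond = conditional_probabilities(M)
total = sum(sum(row) for row in M)
# Mean reversion is right when the next return has the opposite sign.
mean_reversion_rate = (M[0][1] + M[1][0]) / total

if __name__ == "__main__":
    print("P(A) = %.2f%%, P(B) = %.2f%%" % (100 * p_y[0], 100 * p_y[1]))
    print("P(A/A) = %.2f%%, P(B/A) = %.2f%%" % (100 * p_cond[0][0], 100 * p_cond[0][1]))
    print("P(A/B) = %.2f%%, P(B/B) = %.2f%%" % (100 * p_cond[1][0], 100 * p_cond[1][1]))
    print("mean-reversion success rate = %.2f%%" % (100 * mean_reversion_rate))
```

The same function applied to counts restricted to one month gives the rows of Table 11.2, which is how the previous month's frequencies can serve as out-of-sample estimates.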
In the following paragraphs, frequencies matrices are computed on sliding windows for the different indicators. Several classification and prediction methods are presented.

11.3.1 Binary method

In the binary case, returns are classified into positive and negative as in the previous example, and explanatory variables are classified relative to their historical mean. A typical constructed matrix is shown in Table 11.1. Denote, in the example shown in Table 11.1, $C_1^X = \{X < \bar{X} = 0\}$, $C_2^X = \{X > \bar{X} = 0\}$, $C_1^Y = \{Y < 0\}$, $C_2^Y = \{Y > 0\}$. $Y$ can be predicted using different formulas based on the frequency matrix. Some examples of estimators are given below:

$\hat{Y}_1$: The sign of the most likely next return conditionally on the current state.
$\hat{Y}_2$: The expectation of the most likely next return conditionally on the current state.
$\hat{Y}_3$: The expectation of the next return conditionally on the current state.

$$\hat{Y}_1 = \begin{cases} +1 & \text{if } X_T \in C_1^X \\ -1 & \text{if } X_T \in C_2^X \end{cases}$$

$$\hat{Y}_2 = \begin{cases} E(Y \mid Y \in C_2^Y \cap X \in C_1^X) & \text{if } X_T \in C_1^X \\ E(Y \mid Y \in C_1^Y \cap X \in C_2^X) & \text{if } X_T \in C_2^X \end{cases}$$

$$\hat{Y}_3 = \begin{cases} E(Y \mid X \in C_1^X) & \text{if } X_T \in C_1^X \\ E(Y \mid X \in C_2^X) & \text{if } X_T \in C_2^X \end{cases}$$

In this study, only results based on the estimator $\hat{Y}_3$ (denoted $\hat{Y}$ in the rest of the paper) are presented. Results computed using the other estimators are equivalent and the differences do not impact the conclusions. To measure the quality of the prediction, four tests are applied:

AUC (area under the curve) combines the true positive rate and the false positive rate to give an idea about the classification quality.
Accuracy is defined as the proportion of correct predictions ($Y$ and $\hat{Y}$ have the same sign).
Gain is computed on a simple strategy to measure the prediction performance. Predictions are used to run a strategy that buys when the predicted return is positive and sells when it is negative. At each time, for each stock, the strategy's position is in {−100,000, 0, +100,000}.
Profitability is defined as the gain divided by the traded notional of the strategy presented above.
This measure is useful to estimate the gain with different transaction costs.

Figure 1 summarizes the statistical results of predicting the 1-minute returns using the three indicators. For each predictor, the AUC and the accuracy are computed over all the stocks. Results are computed over more than 100,000 observations and the amplitude of the 95% confidence interval is around 0.6%. For the three indicators, the accuracy and the AUC are significantly higher than the 50% random guessing threshold. The graph also shows that the order book imbalance gives the best results, and that the past return is the least successful predictor. Detailed results per stock are given in Appendix C.

Figure 11.1: The quality of the binary prediction: The AUC and the Accuracy are higher than 50%. The three predictors are better than random guessing and are significantly informative.

In Figure 2, the performances of the trading strategies based on the prediction of the 1-minute returns are presented. The strategies are profitable and the results confirm the predictability of the returns (see the details in Appendix C). In Figure 3, the cumulative gains of the strategies based on the 3 indicators over 2013 are represented. When trading without costs, predicting the 1-minute return using the past return and betting 100,000 euros at each time would make a 5-million euro profit. Even better, predicting using the order book imbalance would make more than 20 million euros profit. The results confirm the predictability of the returns, but not the inefficiency of the market. In fact, Figure 4 shows that, when adding the 0.5 bp trading costs, only the strategy based on the order book imbalance remains (marginally) positive. Thus, no conclusion about the market efficiency can be made (see more details in Appendix C). Figure 5 represents the cumulative gain and the profitability for the 5-minute and the 30-minute strategies (with the trading costs). The strategies are not profitable.
Moreover, the predictive power decreases as the time horizon increases. The results of the binary method show that the returns are significantly predictable. Nevertheless, the strategies based on those predictions are not sufficiently profitable to cover the trading costs. In order to enhance the predictions, the same idea is applied to the four-class case. Moreover, a new strategy based on a minimum threshold of the expected return is tested.

Figure 11.2: The quality of the binary prediction: For the 3 predictors, the densities of the gain and the profitability are positively biased, confirming the predictability of the returns.

Figure 11.3: The quality of the binary prediction: The graphs confirm that the 3 indicators are informative and that the order book imbalance indicator is the most profitable.

Figure 11.4: The quality of the binary prediction: When adding the 0.5 bp trading costs, the strategies are only slightly profitable.

11.3.2 Four-class method

We now investigate the case where the indicator $X$ is classified into four classes: "very low values" $C_1^X$, "low values" $C_2^X$, "high values" $C_3^X$ and "very high values" $C_4^X$. At each time $t_n$, $Y$ is predicted as $\hat{Y} = E(Y \mid X \in C_i^X)$, where $C_i^X$ is the class of the current observation $X_{t_n}$. As in the previous case, the expectation is estimated from the historical frequencies matrix. Finally, a new trading strategy is tested. The strategy is to buy (respectively sell) 100,000 euros when $\hat{Y}$ is positive (respectively negative) and $|\hat{Y}| > \theta$, where $\theta$ is a minimum threshold (1 bp in this paper). Notice that the case $\theta = 0$ corresponds to the strategy tested in the binary case. Choosing $\theta > 0$ aims to avoid trading the stock when the signal is noisy. In particular, when analyzing the expectations of $Y$ relative to the different classes of $X$, it is always observed that the absolute value of the expectation is high when $X$ is in one of its extreme classes ($C_1^X$ or $C_4^X$).
On the other hand, when $X$ is in one of the intermediary classes ($C_2^X$ or $C_3^X$), the expectation of $Y$ is close to 0, reflecting a noisy signal. For each indicator $X$, the classes are defined as $C_1^X = \,]-\infty, X_a[$, $C_2^X = \,]X_a, X_b[$, $C_3^X = \,]X_b, X_c[$ and $C_4^X = \,]X_c, +\infty[$. To compute $X_a$, $X_b$ and $X_c$, the 3 following classifications were tested:

Quartile classification: the quartiles $Q_1$, $Q_2$ and $Q_3$ are computed in-sample for each day, then averaged over the days. $X_a$, $X_b$ and $X_c$ correspond, respectively, to $Q_1$, $Q_2$ and $Q_3$.
K-means classification: the K-means algorithm [171], applied to the in-sample data with $k = 4$, gives the centers $G_1$, $G_2$, $G_3$ and $G_4$ of the optimal (in the sense of the minimum within-cluster sum of squares) clusters. $X_a$, $X_b$ and $X_c$ are given respectively by $\frac{G_1 + G_2}{2}$, $\frac{G_2 + G_3}{2}$ and $\frac{G_3 + G_4}{2}$.
Mean-variance classification: the average $\bar{X}$ and the standard deviation $\sigma(X)$ are computed in the learning period. Then, $X_a$, $X_b$ and $X_c$ correspond, respectively, to $\bar{X} - \sigma(X)$, $\bar{X}$ and $\bar{X} + \sigma(X)$.

Figure 11.5: The quality of the binary prediction: The strategies are not profitable. Moreover, the performances decrease significantly compared to the 1-minute horizon.

Only the results based on the mean-variance classification are presented here, since the results computed using the two other classifications are equivalent and the differences do not impact our conclusions. Figure 6 compares the profitabilities of the binary and the 4-class methods. For the 1-minute prediction, the results of the 4-class method are significantly better. For the longer horizons, the results of both methods are equivalent. Notice also that, using the best indicator, in the 4-class case, one could obtain a significantly positive performance after paying the trading costs. Some more detailed results are given in Appendix C.
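The four-class predictor with the mean-variance classification and the threshold rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the population standard deviation is used for $\sigma(X)$, observations falling exactly on a threshold are assigned to the lower class, and the class expectations are estimated by plain in-sample averages.

```python
from statistics import mean, pstdev


def mean_variance_thresholds(xs):
    """Return (X_a, X_b, X_c) = (mean - std, mean, mean + std) over the learning period."""
    m, s = mean(xs), pstdev(xs)
    return (m - s, m, m + s)


def classify(x, thresholds):
    """Return the class index 0..3 (C1..C4) of x given the three thresholds."""
    return sum(x > t for t in thresholds)


def class_means(xs, ys, thresholds):
    """Estimate E(Y | X in C_i) for each class from the learning sample."""
    sums, counts = [0.0] * 4, [0] * 4
    for x, y in zip(xs, ys):
        k = classify(x, thresholds)
        sums[k] += y
        counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def position(x_now, thresholds, means, theta=1e-4, notional=100_000):
    """Trade only when the predicted return exceeds the threshold theta (1 bp) in absolute value."""
    y_hat = means[classify(x_now, thresholds)]
    if abs(y_hat) <= theta:
        return 0
    return notional if y_hat > 0 else -notional


if __name__ == "__main__":
    # Hypothetical learning sample: indicator values and subsequent returns.
    xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
    ys = [-3e-4, -1e-4, 1e-4, -1e-4, 1e-4, 3e-4]
    th = mean_variance_thresholds(xs)
    means = class_means(xs, ys, th)
    print(position(2.5, th, means), position(0.3, th, means))
```

Setting `theta=0` in `position` recovers a strategy that trades on every non-zero prediction, which is the behaviour described for the binary case.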
The interesting result of this first section is that, even when using the simplest statistical learning method, the indicators used are informative and provide a better prediction than random guessing. However, in most cases, the obtained performances are too low to conclude about the market inefficiency. In order to enhance the performances, the three indicators and their exponential moving averages are combined in the next section.

Figure 11.6: The quality of the 4-class prediction: For the 1-minute prediction, the results of the 4-class method are significantly better than the results of the binary one. For longer horizons, neither strategy is profitable when adding the trading costs.

11.4 Linear regression

In this section, $X$ denotes a 30-column matrix containing the 3 indicators and their EMA($d$) for $d \in (1, 2, 4, 8, 16, 32, 64, 128, 256)$, and $Y$ denotes the target vector to be predicted. The general approach is to calibrate, on the learning sample, a function $f$ such that $f(X)$ is "the closest possible" to $Y$, and hope that, after the learning period, the relation between $X$ and $Y$ is still well described by $f$. Hence $f(X)$ would be a good estimator of $Y$. In the linear case, $f$ is supposed to be linear and the model errors are supposed to be independent and identically distributed [297]. Actually, the standard textbook model posits a relationship of the form $Y = X\beta + \epsilon$, with $\epsilon \sim N(0, \sigma^2)$. For technical reasons, the computations are done with z-scored data (use $\frac{X_i - \bar{X}_i}{\sigma(X_i)}$ instead of $X_i$).

11.4.1 Ordinary least squares (OLS)

The OLS method consists in estimating the unknown parameter $\beta$ by minimizing the sum of squares of the residuals between the observed variable $Y$ and the linear approximation $X\beta$. The estimator is denoted $\hat{\beta}$ and is defined as

$$\hat{\beta} = \mathrm{argmin}_\beta \left(J_\beta = \|Y - X\beta\|_2^2\right).$$

This criterion is reasonable if at each time $i$ the row $X_i$ of the matrix $X$ and the observation $Y_i$ of the vector $Y$ represent independent random samples from their populations.
The cost function $J_\beta$ depends quadratically on $\beta$, and differentiating with respect to $\beta$ gives:

$$\frac{\partial J_\beta}{\partial \beta} = 2\,{}^tXX\beta - 2\,{}^tXY, \qquad \frac{\partial^2 J_\beta}{\partial \beta\, \partial \beta} = 2\,{}^tXX.$$

When ${}^tXX$ is invertible, setting the first derivative to 0 gives the unique solution $\hat{\beta} = ({}^tXX)^{-1}\,{}^tXY$. The statistical properties of this estimator can be calculated straightforwardly as follows:

$$E(\hat{\beta}|X) = ({}^tXX)^{-1}\,{}^tX E(Y|X) = ({}^tXX)^{-1}\,{}^tXX\beta = \beta,$$

$$\mathrm{Var}(\hat{\beta}|X) = ({}^tXX)^{-1}\,{}^tX\, \mathrm{Var}(Y|X)\, X({}^tXX)^{-1} = ({}^tXX)^{-1}\,{}^tX \sigma^2 I X({}^tXX)^{-1} = \sigma^2 ({}^tXX)^{-1},$$

$$E(\|\hat{\beta}\|_2^2 | X) = E({}^tY X({}^tXX)^{-2}\,{}^tXY | X) = \mathrm{Trace}(X({}^tXX)^{-2}\,{}^tX \sigma^2 I) + \|\beta\|_2^2 = \sigma^2\, \mathrm{Trace}(({}^tXX)^{-1}) + \|\beta\|_2^2,$$

$$MSE(\hat{\beta}) = E(\|\hat{\beta} - \beta\|_2^2 | X) = E(\|\hat{\beta}\|_2^2 | X) - \|\beta\|_2^2 = \sigma^2\, \mathrm{Trace}(({}^tXX)^{-1}) = \sigma^2 \sum_i \frac{1}{\lambda_i},$$

where MSE denotes the mean squared error and the $\lambda_i$'s are the eigenvalues of ${}^tXX$. Notice that the OLS estimator is unbiased, but can exhibit an arbitrarily high MSE when the matrix ${}^tXX$ has eigenvalues close to 0.

In the out-of-sample period, $\hat{Y} = X\hat{\beta}$ is used to predict the target. As seen in section 11.2, the corresponding trading strategy is to buy (respectively sell) 100,000 euros when $\hat{Y} > 0$ (respectively $\hat{Y} < 0$). To measure the quality of the predictions, the binary method based on the order book imbalance indicator is taken as a benchmark. The linear regression is computed using 30 indicators, including the order book imbalance, thus it should perform at least as well as the binary method. Figure 7 compares the profitabilities of the two strategies. The detailed statistics per stock are given in Appendix C.

Figure 11.7: The quality of the OLS prediction: The results of the OLS method are not better than those of the binary one.

Similarly to the binary method, the performance of the OLS method decreases as the horizon increases. Moreover, the surprising result is that, when combining all the 30 indicators, the results are not better than just applying the binary method to the order book imbalance indicator. This leads to questioning the quality of the regression.
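To make the link between collinearity and the MSE concrete, here is a minimal dependency-free sketch of the normal equations. It is an illustration only: the helper names are hypothetical, the data are a toy example, and the study itself does not specify how the system is solved. The quantity `mse_over_sigma2` is $\mathrm{Trace}(({}^tXX)^{-1})$, i.e. the MSE divided by $\sigma^2$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gram(X):
    """Return tX X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]


def xty(X, y):
    """Return tX Y."""
    p = len(X[0])
    return [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]


def ols(X, y):
    """OLS coefficients from the normal equations (tX X) beta = tX Y."""
    return solve(gram(X), xty(X, y))


def mse_over_sigma2(X):
    """Trace of (tX X)^{-1}, obtained by solving against the identity columns."""
    G = gram(X)
    p = len(G)
    return sum(solve(G, [1.0 if i == k else 0.0 for i in range(p)])[k] for k in range(p))


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0]
    x2 = [1.01, 1.99, 3.01, 3.99]          # nearly collinear with x1
    X = [[a, b] for a, b in zip(x1, x2)]
    y = [a + b for a, b in zip(x1, x2)]    # noiseless target with beta = (1, 1)
    print(ols(X, y), mse_over_sigma2(X))
```

On this toy design the trace is several thousand, against well below one for a design whose second column is nearly orthogonal to the first, which is the mechanism behind the unstable coefficients discussed next.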
Figure 8 gives some examples of the OLS regression coefficients. It is observed that the coefficients are not stable over time. For example, for some periods, the regression coefficient of the order book imbalance indicator is negative. This does not make any financial sense. In fact, when the imbalance is high, the order book shows more liquidity on the bid side (participants willing to buy) than on the ask side (participants willing to sell). This state of the order book is observed on average before an up move, i.e., a positive return. The regression coefficient should, thus, always be positive. It is also observed that, for highly correlated indicators, the regression coefficients might be quite different. This result also does not make sense, since one would expect to have close coefficients for similar indicators. From a statistical view, this is explained by the high MSE caused by the high collinearity between the variables. In the following paragraphs, the numerical aspect is addressed, and some popular solutions to the OLS estimation problems are tested.

Figure 11.8: The quality of the OLS prediction: The graph on the left shows the instability of the regression coefficient of the order book imbalance indicator over the year 2013 for the stock Deutsche Telekom. The graph on the right shows, for a random day, very different coefficients for similar indicators: the order book imbalance and its exponential moving averages.

11.4.2 Ridge regression

When solving a linear system $AX = B$, $A$ being invertible, if a small change in the coefficient matrix ($A$) or a small change in the right-hand side ($B$) causes a large change in the solution vector ($X$), the system is said to be ill-conditioned. An example of an ill-conditioned system is given below:

$$\begin{pmatrix} 1.000 & 2.000 \\ 3.000 & 5.999 \end{pmatrix} \times \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4.000 \\ 11.999 \end{pmatrix} \Rightarrow \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2.000 \\ 1.000 \end{pmatrix}.$$

When making a small change in the matrix $A$:

$$\begin{pmatrix} 1.001 & 2.000 \\ 3.000 & 5.999 \end{pmatrix} \times \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4.000 \\ 11.999 \end{pmatrix} \Rightarrow \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -0.400 \\ 2.200 \end{pmatrix}.$$
When making a small change in the vector $B$:

$$\begin{pmatrix} 1.000 & 2.000 \\ 3.000 & 5.999 \end{pmatrix} \times \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4.001 \\ 11.999 \end{pmatrix} \Rightarrow \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -3.999 \\ 4.000 \end{pmatrix}.$$

Clearly, it is mandatory to take such effects into consideration before carrying out any computation when dealing with experimental data. In the literature, various measures of the ill-conditioning of a matrix have been proposed [289], the most popular one [203] probably being the condition number $K(A) = \|A\|_2 \|A^{-1}\|_2$, where $\|.\|_2$ denotes the $l_2$-norm, defined for a vector $X$ as $\|X\|_2 = \sqrt{{}^tXX}$ and for a matrix $A$ as

$$\|A\|_2 = \max_{\|X\|_2 \neq 0} \frac{\|AX\|_2}{\|X\|_2}.$$

The larger $K(A)$, the more ill-conditioned $A$ is. The rationale for $K(A)$ is to measure the sensitivity of the solution $X$ relative to a perturbation of the matrix $A$ or the vector $B$. More precisely, it is proved that:

• If $AX = B$ and $A(X + \delta X) = B + \delta B$, then $\frac{\|\delta X\|_2}{\|X\|_2} \le K(A) \frac{\|\delta B\|_2}{\|B\|_2}$;
• If $AX = B$ and $(A + \delta A)(X + \delta X) = B$, then $\frac{\|\delta X\|_2}{\|X + \delta X\|_2} \le K(A) \frac{\|\delta A\|_2}{\|A\|_2}$.

Notice that $K(A)$ can be easily computed as the ratio of the largest to the smallest singular value of $A$. For example, in the system above, $K(A) = 49988$. Small perturbations can thus be amplified by a factor of almost 50000, causing the previous observations. Figure 9 represents the singular values of ${}^tXX$ used to compute the regression of the right graph of Figure 8. The graph shows rapidly decaying singular values. In particular, the condition number is higher than 80000.

Figure 11.9: The quality of the OLS prediction: The graph shows that the matrix inverted when computing the OLS coefficient is ill-conditioned.

This finding explains the instability observed in the previous section. Not only is the OLS estimator statistically unsatisfactory, but the numerical problems caused by the ill-conditioning of the matrix make the result numerically unreliable. One popular solution to enhance the stability of the estimation of the regression coefficients is the Ridge method. This method was introduced independently by A.
Tikhonov, in the context of solving ill-posed problems, around the middle of the 20th century, and by A.E. Hoerl in the context of linear regression. The Ridge regression consists of adding a regularization term to the original OLS problem:

$$\hat{\beta}_\Gamma = \mathrm{argmin}_\beta \left(\|Y - X\beta\|_2^2 + \|\Gamma\beta\|_2^2\right).$$

The new term gives preference to a particular solution with desirable properties. $\Gamma$ is called the Tikhonov matrix and is usually chosen as a multiple of the identity matrix, $\Gamma = \sqrt{\lambda_R}\, I$ with $\lambda_R \ge 0$, so that $\|\Gamma\beta\|_2^2 = \lambda_R \|\beta\|_2^2$. The new estimator of the linear regression coefficients is called the Ridge estimator, denoted by $\hat{\beta}_R$ and defined as follows:

$$\hat{\beta}_R = \mathrm{argmin}_\beta \left(\|Y - X\beta\|_2^2 + \lambda_R \|\beta\|_2^2\right).$$

Similarly to the OLS case, straightforward computations show that

$$\hat{\beta}_R = ({}^tXX + \lambda_R I)^{-1}\,{}^tXY = Z\hat{\beta},$$

where $Z = (I + \lambda_R ({}^tXX)^{-1})^{-1} = W^{-1}$, and

$$E(\hat{\beta}_R|X) = E(Z\hat{\beta}|X) = Z\beta,$$

$$\mathrm{Var}(\hat{\beta}_R|X) = \mathrm{Var}(Z\hat{\beta}|X) = \sigma^2 Z ({}^tXX)^{-1}\,{}^tZ,$$

$$MSE(\hat{\beta}_R) = E({}^t(Z\hat{\beta} - \beta)(Z\hat{\beta} - \beta)|X) = E({}^t\hat{\beta}\,{}^tZZ\hat{\beta}|X) - 2\,{}^t\beta Z\beta + {}^t\beta\beta = \mathrm{Trace}({}^tZZ\sigma^2({}^tXX)^{-1}) + {}^t\beta\,{}^tZZ\beta - 2\,{}^t\beta Z\beta + {}^t\beta\beta.$$

Note that $({}^tXX)^{-1} = \frac{Z^{-1} - I}{\lambda_R}$ and

$$I - Z = (I - Z)WW^{-1} = (W - I)W^{-1} = \lambda_R ({}^tXX)^{-1} W^{-1} = \lambda_R (W\,{}^tXX)^{-1} = \lambda_R ({}^tXX + \lambda_R I)^{-1},$$

and thus

$$MSE(\hat{\beta}_R) = \mathrm{Trace}\left(\frac{\sigma^2 Z}{\lambda_R}\right) - \mathrm{Trace}\left(\frac{\sigma^2 Z^2}{\lambda_R}\right) + {}^t\beta (I - Z)^2 \beta = \sigma^2 \sum_i \frac{\lambda_i}{(\lambda_i + \lambda_R)^2} + \lambda_R^2\, {}^t\beta ({}^tXX + \lambda_R I)^{-2} \beta.$$

The first element of the MSE corresponds exactly to the trace of the covariance matrix of $\hat{\beta}_R$, i.e., the total variance of the parameter estimates. The second element is the squared distance from $E(\hat{\beta}_R|X) = Z\beta$ to $\beta$ and corresponds to the square of the bias introduced when adding the ridge penalty. Notice that, when increasing $\lambda_R$, the bias increases and the variance decreases. On the other hand, when decreasing $\lambda_R$, the bias decreases and the variance increases, both converging to their OLS values. To enhance the stability of the linear regression, one should compute a $\lambda_R$ such that $MSE(\hat{\beta}_R) \le MSE(\hat{\beta})$.
As proved in [184], this is always possible:

Theorem 11.4.1 (Hoerl) There always exists $\lambda_R \ge 0$ such that $MSE(\hat{\beta}_R) \le MSE(\hat{\beta})$.

From a statistical view, adding the Ridge penalty aims at reducing the MSE of the estimator, and is particularly necessary when the covariance matrix is ill-conditioned. From a numerical view, the new matrix to be inverted is ${}^tXX + \lambda_R I$, whose eigenvalues are $(\lambda_i + \lambda_R)_i$. The condition number is

$$K({}^tXX + \lambda_R I) = \frac{\lambda_{max} + \lambda_R}{\lambda_{min} + \lambda_R} \le \frac{\lambda_{max}}{\lambda_{min}} = K({}^tXX).$$

Hence, the ridge regularization enhances the conditioning of the problem and improves the numerical reliability of the result. From the above, it can be seen that increasing $\lambda_R$ leads to numerical stability and reduces the variance of the estimator; however, it increases the bias of the estimator. One has to choose $\lambda_R$ as a tradeoff between those 2 effects. Next, 2 estimators of $\lambda_R$ are tested: the Hoerl-Kennard-Baldwin (HKB) estimator [185] and the Lawless-Wang (LW) estimator [217]. In order to compare the stability of the Ridge and the OLS coefficients, Figure 10 and Figure 11 represent the same test as Figure 8, applied, respectively, to the Ridge HKB and the Ridge LW methods. In the 1-minute prediction case, the graphs show that the Ridge LW method gives the most coherent coefficients. In particular, the coefficient of the order book imbalance is always positive (as expected from a financial view) and the coefficients of similar indicators have the same signs. Finally, Figure 12 summarizes the profitabilities of the corresponding strategies of the 2 methods. Appendix C contains more detailed results per stock.

Figure 11.10: The quality of the Ridge HKB prediction: The graphs show that the results of the Ridge HKB method are not significantly different from those of the OLS method (Figure 8). In this case, $\lambda_R$ is close to 0 and the effect of the regularization is limited.
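The conditioning argument can be checked numerically on the $2 \times 2$ system used as an example in this section. The sketch below is illustrative: it relies on closed-form expressions that only hold for $2 \times 2$ matrices, the function names are hypothetical, and the Gram matrix `G` comes from a toy pair of nearly collinear regressors rather than from the study's data.

```python
import math


def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det)


def singular_values2(A):
    """Largest and smallest singular values of a 2x2 matrix (closed form)."""
    (a, b), (c, d) = A
    s = a * a + b * b + c * c + d * d      # sigma_max^2 + sigma_min^2
    det = abs(a * d - b * c)               # sigma_max * sigma_min
    smax = math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)
    return smax, det / smax


def cond2(A):
    """Condition number K(A): ratio of the largest to the smallest singular value."""
    smax, smin = singular_values2(A)
    return smax / smin


def add_ridge(G, lam):
    """Return G + lam * I for a 2x2 matrix G."""
    return ((G[0][0] + lam, G[0][1]), (G[1][0], G[1][1] + lam))


if __name__ == "__main__":
    A = ((1.0, 2.0), (3.0, 5.999))
    print(solve2(A, (4.0, 11.999)))                           # unperturbed system
    print(solve2(((1.001, 2.0), (3.0, 5.999)), (4.0, 11.999)))  # small change in A
    print(solve2(A, (4.001, 11.999)))                         # small change in B
    print(cond2(A))
    # Gram matrix of two nearly collinear toy regressors (hypothetical data).
    G = ((30.0, 29.98), (29.98, 29.9604))
    print(cond2(G), cond2(add_ridge(G, 0.1)))
```

Running it reproduces the three solutions of the example and a condition number close to 49988, and shows the condition number of `G` dropping by several orders of magnitude once $\lambda_R = 0.1$ is added to the diagonal.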
From the previous results, it can be concluded that adding a regularization term to the regression enhances the predictions. The next section deals with another method of regularization: the reduction of the indicators' space.

Figure 11.11: The quality of the Ridge LW prediction: The graph on the left shows the stability of the regression coefficient of the order book imbalance over the year 2013 for Deutsche Telekom. The coefficient is positive during the whole period, in line with the financial view. The graph on the right shows, for a random day, positive coefficients for the order book imbalance and its short-term EMAs. The coefficients decrease with time, i.e., the state of the order book "a long time ago" has a smaller effect than its current state. Moreover, for longer than a 10-second horizon, the coefficients become negative, confirming the mean-reversion effect.

Figure 11.12: The quality of the Ridge prediction: For the 1-minute and the 5-minute horizons, the LW method performs significantly better than the OLS method. However, for the 30-minute horizon, the HKB method gives the best results. Notice that for the 1-minute case, the LW method improves the performances by 58% compared to the OLS, confirming that stabilizing the regression coefficients (Figure 11 compared to Figure 8) leads to better trading strategies.

11.4.3 Least Absolute Shrinkage and Selection Operator (LASSO)

Due to the collinearity of the indicators, the eigenvalue spectrum of the covariance matrix might be concentrated on the largest values, leading to an ill-conditioned regression problem. The Ridge method reduces this effect by shifting all the eigenvalues. This transformation leads to more reliable results, but might introduce a bias in the estimation. In this paragraph, a simpler transformation of the original indicators' space, the LASSO regression, is presented.
The LASSO method [310] enhances the conditioning of the covariance matrix by reducing the number of indicators used. Mathematically, the LASSO regression aims to produce sparse regression coefficients, i.e., with some coefficients exactly equal to 0. This is possible thanks to the $l_1$-penalization. More precisely, the LASSO regression estimates the linear regression coefficients as:

$$\hat{\beta}_L = \mathrm{argmin}_\beta \left(\|Y - X\beta\|_2^2 + \lambda_L \|\beta\|_1\right)$$

where $\|.\|_1$ denotes the $l_1$-norm, defined as the sum of the coordinates' absolute values. Writing $|\beta_i| = \beta_i^+ - \beta_i^-$ and $\beta_i = \beta_i^+ + \beta_i^-$, with $\beta_i^+ \ge 0$ and $\beta_i^- \le 0$, a classic quadratic problem with linear constraints is obtained and can be solved by a classic solver. As far as we know, there is no estimator for $\lambda_L$. In this study, the cross-validation [171] method is applied to select $\lambda_L$ out of a set of parameters $T\, 10^{-k}$, where $k \in (2, 3, 4, 5, 6)$ and $T$ denotes the number of observations. Figure 13 compares, graphically, the Ridge and the LASSO regularization, Figure 14 addresses the instability problems observed in Figure 8 and Figure 15 summarizes the results of the strategies corresponding to the LASSO method. The detailed results per stock are given in Appendix C. The next paragraph introduces the natural combination of the Ridge and the LASSO regression and presents this paper's conclusions concerning the market inefficiency.

Figure 11.13: The quality of the LASSO prediction: The estimation graphs for the Ridge (on the left) and the LASSO regression (on the right). Notice that the $l_1$-norm leads to 0 coefficients on the less important axis.

11.4.4 Elastic net (EN)

The EN regression aims to combine the regularization effect of the Ridge method and the selection effect of the LASSO one.
The idea is to estimate the regression coefficients as:

$$\hat{\beta}_{EN} = \mathrm{argmin}_\beta \left(\|Y - X\beta\|_2^2 + \lambda_{EN1} \|\beta\|_1 + \lambda_{EN2} \|\beta\|_2^2\right)$$

Figure 11.14: The quality of the LASSO prediction: The graphs show that the LASSO regression gives regression coefficients in line with the financial view (similarly to Figure 11). Moreover, the coefficients are sparse and simple to interpret.

Figure 11.15: The quality of the LASSO prediction: Similarly to the Ridge regression, the LASSO regression gives a better profitability than the OLS one. Notice that for the 1-minute case, the LASSO method improves the performances by 165% compared to the OLS. Even though the LASSO method uses fewer regressors than the OLS method (and thus less signal), the out-of-sample results are significantly better in the LASSO case. This result confirms the importance of the signal-to-noise ratio and highlights the importance of the regularization when addressing an ill-conditioned problem.

The details of the computation can be found in [330]. In this study, the estimation is computed in two steps. In the first step, $\lambda_{EN1}$ and $\lambda_{EN2}$ are selected via cross-validation and the problem is solved in the same way as in the LASSO case. In the second step, the final coefficients are obtained by a Ridge regression ($\lambda_{EN1} = 0$) over the selected indicators (indicators with a non-zero coefficient in the first step). The two-step method avoids useless $l_1$-penalty effects on the selected coefficients. Figure 16 shows that the coefficients obtained by the EN method are in line with the financial view and combine both regularization effects observed when using the Ridge and the LASSO methods.

Figure 11.16: The quality of the EN prediction: The graphs show that the EN regression gives regression coefficients in line with the financial view (similarly to Figures 11 and 14).

Finally, the strategy presented in 2.2 (trading only if $|\hat{Y}| > \theta$) is applied to the different regression methods. Figure 17 summarizes the obtained results.
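The LASSO and two-step EN estimators described above can be sketched with a few lines of dependency-free Python. Note the swap of technique: the text solves the LASSO as a quadratic program with a classic solver, whereas this sketch uses cyclic coordinate descent with soft-thresholding, a standard alternative for the same objective. The function names and the toy data are hypothetical, and the penalties are fixed by hand instead of being cross-validated.

```python
def soft_threshold(r, t):
    """Shrink r towards 0 by t, returning exactly 0.0 inside [-t, t]."""
    if r > t:
        return r - t
    if r < -t:
        return r + t
    return 0.0


def elastic_net_cd(X, y, lam1, lam2=0.0, n_iter=500):
    """Minimize ||y - X b||^2 + lam1 * ||b||_1 + lam2 * ||b||^2 by coordinate descent.

    lam2 = 0 gives the LASSO and lam1 = 0 gives the Ridge regression.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    z = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            beta[j] = soft_threshold(rho, lam1 / 2) / (z[j] + lam2)
    return beta


def two_step_en(X, y, lam1, lam2):
    """Step 1: elastic net selects the indicators. Step 2: Ridge refit on the selected ones."""
    first = elastic_net_cd(X, y, lam1, lam2)
    selected = [j for j, b in enumerate(first) if b != 0.0]
    beta = [0.0] * len(X[0])
    if selected:
        Xs = [[row[j] for j in selected] for row in X]
        for j, b in zip(selected, elastic_net_cd(Xs, y, 0.0, lam2)):
            beta[j] = b
    return beta


if __name__ == "__main__":
    # Toy orthogonal design: a strong first regressor and a weak second one.
    X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    y = [2.0 * a + 0.1 * b for a, b in X]
    print(elastic_net_cd(X, y, lam1=2.0))       # LASSO: the weak coefficient is set to 0
    print(two_step_en(X, y, lam1=2.0, lam2=1.0))
```

On this toy design the LASSO keeps only the first regressor, and the second step of the EN procedure refits it without the $l_1$ shrinkage, which is the effect the two-step method is meant to achieve.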
The results for the three horizons confirm that the predictions of all the regularized methods (Ridge, LASSO, EN) are better than the OLS ones. As detailed in the previous paragraphs, this is always the case when the indicators are highly correlated. Moreover, the graphs show that the EN method gives the best results compared to the other regressions. The 1-minute horizon results underline that, when an indicator has an obvious correlation with the target, using a simple method based exclusively on this indicator performs at least as well as more sophisticated methods including more indicators. Finally, the performance of the EN method for the 1-minute horizon suggests that the market is inefficient for such a horizon. The conclusion is less obvious for the 5-minute horizon. On the other hand, the 30-minute horizon results show that none of the tested methods could find any proof of the market inefficiency for such a horizon. From the above, it can be concluded that the market is inefficient in the short term, and that this inefficiency disappears progressively as the new information is widely diffused.

Figure 11.17: The quality of the EN prediction: The EN method gives the best results compared to the other regressions.

Conclusions

A large empirical study has been performed over the stocks of the EUROSTOXX 50 index, in order to test the predictability of returns. The first part of the study shows that the future returns are not independent of the past dynamics and state of the order book. In particular, the order book imbalance indicator is informative and provides a reliable prediction of the returns. The second part of the study shows that combining different order book indicators using adequate regressions leads to trading strategies with good performances even when paying the trading costs.
In particular, our results show that the market is inefficient in the short term and that a period of a few minutes is necessary for prices to incorporate new information.

Part V Appendices

Appendix A Limit order book data

The very possibility of an experimental approach to the study of limit order books lies in the availability of data. Although exchanges themselves actually keep an extensive record of the orders that they received together with all their characteristics, the available historical data sets are generally not as complete as one would hope. Therefore, many statistical studies on the order books rely on a set of data that may be lacking some information, and some assumptions may have to be made. Most of the results presented here, or at least those we have produced ourselves, use the Thomson-Reuters TRTH database. After explaining the algorithm used for the processing of limit order book data, we describe the data sets that have been used to obtain the various empirical results presented in this book.

A.1 Limit order book data processing

Because one cannot distinguish market orders from cancellations in tick by tick data, and since the timestamps of the trades and tick by tick data files are asynchronous, we use a matching procedure to reconstruct the order book events. In a nutshell, we proceed as follows for each stock and each trading day:

1. Parse the tick by tick data file to compute order book state variations:
• If the variation is positive (volume at one or more price levels has increased), then label the event as a limit order.
• If the variation is negative (volume at one or more price levels has decreased), then label the event as a "likely market order".
• If there is no variation (this happens when there is just a renumbering in the field "Level" that does not affect the state of the book), do not count an event.
2.
Parse the trades file and, for each trade:
(a) Compare the trade price and volume to likely market orders whose timestamps are in $[t_{Tr} - \Delta t, t_{Tr} + \Delta t]$, where $t_{Tr}$ is the trade timestamp and $\Delta t$ is a predefined time window$^1$.
(b) Match the trade to the first likely market order with the same price and volume and label the corresponding event as a market order, making sure the change in order book state happens at the best price limits.
(c) Remaining negative variations are labeled as cancellations.

$^1$ We set $\Delta t = 3$ s for CAC 40 stocks. We found that the median reporting delay for trades is −900 ms: on average, trades are reported 900 milliseconds before the change is recorded in tick by tick data.

Doing so, we have an average matching rate of around 85% for CAC 40 stocks. As a byproduct, one gets the sign of each matched trade, that is, whether it is buyer or seller initiated. Tables A.1 and A.2 below provide an example of the data files and of the matching algorithm.

Timestamp    Side   Level   Price     Quantity
33480.158    B      1       121.1     480
33480.476    B      2       121.05    1636
33481.517    B      5       120.9     1318
33483.218    B      1       121.1     420   *
33484.254    B      1       121.1     556
33486.832    A      1       121.15    187
33489.014    B      2       121.05    1397
33490.473    B      1       121.1     342   *
33490.473    B      1       121.1     304   *
33490.473    B      1       121.1     256   *
33490.473    A      1       121.15    237

Table A.1: Tick by tick data file sample. Note that the field "Level" does not necessarily correspond to the distance in ticks from the best opposite quote as there might be gaps in the book. Lines corresponding to the trades in Table A.2 are marked with an asterisk.

Timestamp    Last     Last quantity
33483.097    121.1    60
33490.380    121.1    214
33490.380    121.1    38
33490.380    121.1    48

Table A.2: Trades data file sample.

A.2 Stylized facts

In Chapter 2, we have produced our own empirical plots. We use Thomson-Reuters tick-by-tick data on the Paris Bourse. We select four stocks: France Telecom (FTE.PA), BNP Paribas (BNPP.PA), Societe Générale (SOGN.PA) and Renault (RENA.PA).
For any given stock, the data display time-stamps, traded quantities, traded prices, the first five best-bid limits and the first five best-ask limits. Except when mentioned otherwise, all statistics are computed using all trading days from October 1st, 2007 to May 30th, 2008, i.e. 168 trading days. On a given day, orders submitted between 9:05am and 5:20pm are taken into account, i.e. the first and last minutes of each trading day are removed.

A.3 Market taking and making

In Chapter 4, we use the following order book data for several types of financial assets:

• BNP Paribas (RIC$^2$: BNPP.PA): 7th component of the CAC40 during the studied period
• Peugeot (RIC: PEUP.PA): 38th component of the CAC40 during the studied period
• Lagardère SCA (RIC: LAGA.PA): 33rd component of the CAC40 during the studied period
• Dec. 2009 futures on the 3-month Euribor (RIC: FEIZ9)
• Dec. 2009 futures on the Footsie index (RIC: FFIZ9)

We use Reuters RDTH tick-by-tick data from September 10th, 2009 to September 30th, 2009 (i.e. 15 days of trading). For each trading day, we use only 4 hours of data, precisely from 9:30 am to 1:30 pm. As we are studying European markets, this time frame is convenient because it avoids the opening of American markets and the consequent increase of activity. Our data is composed of snapshots of the first five limits of the order books (ten for the BNPP.PA stock). These snapshots are timestamped to the millisecond and taken at each change of any of the limits or at each transaction. The data analysis is performed as follows for a given snapshot:

1. if the transaction fields are not empty, then we record a market order, with given price and volume;
2. if the quantity offered at a given price has increased, then we record a limit order at that price, with a volume equal to the difference of the quantities observed;
3.
if the quantity offered at a given price has decreased without any transaction being recorded, then we record a cancellation order at that price, with a volume equal to the difference of the quantities observed; 4. finally, if two orders of the same type are recorded at the same time stamp, we record only one order with a volume equal to the sum of the two measured volumes. Therefore, market orders are well observed since transactions are explicitly recorded, but it is important to note that our measure of the limit orders and cancellation orders is not direct. In table A.3, we give for each studied order book the number of market and limit orders detected on our 15 4-hour samples. On the studied period, market activity ranges from 2.7 trades per minute on the least liquid stock (LAGA.PA) to 14.2 trades per minute on the most traded asset (Footsie futures). 2 Reuters Identification Code 169 Code BNPP.PA PEUP.PA LAGA.PA FEIZ9 FFIZ9 Number of limit orders 321,412 228,422 196,539 110,300 799,858 Number of market orders 48,171 23,888 9,834 10,401 51,020 Table A.3: Number of limit and markets orders recorded on 15 samples of four hours (Sep 10th to Sep 30th, 2009 ; 9:30am to 1:30pm) for 5 different assets (stocks, index futures, bond futures) 170 Appendix B Some useful mathematical notions B.1 Point processes General reference are [104][? ]. Definition B.1.1 (Point process) A point process is an increasing sequence (T n )n∈N of positive random variables defined on a measurable space (Ω, F , P). We will restrict our attention to processes that are nonexplosive, that is, for which limn→∞ T n = ∞. To each realization (T n ) corresponds a counting function (N(t))t∈R+ defined by N(t) = n if t ∈ [T n , T n+1 [, n ≥ 0. (B.1.1) (N(t)) is a right continuous step function with jumps of size 1 and carries the same information as the sequence (T n ), so that (N(t)) is also called a point process. 
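To make the correspondence between the sequence $(T_n)$ and the counting function in (B.1.1) concrete, here is a minimal Python sketch. It assumes the convention $T_0 = 0$, with the event times $T_1 < T_2 < \dots$ stored in a sorted list; the function name `counting_function` and the sample times are illustrative choices, not taken from the text.

```python
from bisect import bisect_right

def counting_function(event_times, t):
    """Return N(t), the number of event times T_n (n >= 1) with T_n <= t.

    Assumes event_times is sorted in increasing order (hypothetical input).
    bisect_right counts the entries <= t, which makes N right continuous:
    N(T_n) already equals n at the jump time itself.
    """
    return bisect_right(event_times, t)

# Illustrative event times T_1, ..., T_4 (made-up values).
event_times = [0.5, 1.2, 1.9, 3.4]
values = [counting_function(event_times, t) for t in (0.0, 0.5, 1.0, 2.0, 5.0)]
```

Because `bisect_right` is used rather than `bisect_left`, the value at a jump time includes the jump, in line with the half-open intervals $[T_n, T_{n+1}[$ of (B.1.1).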
Definition B.1.2 (Multivariate point process) A multivariate point process (or marked point process) is a point process $(T_n)$ for which a random variable $X_n$ is associated to each $T_n$. The variables $X_n$ take their values in a measurable space $(E, \mathcal{E})$.

We will restrict our attention to the case where $E = \{1, \dots, M\}$, $M \in \mathbb{N}^*$. For each $m \in \{1, \dots, M\}$, we can define the counting processes

$$ N^m(t) = \sum_{n \ge 1} \mathbb{I}(T_n \le t)\, \mathbb{I}(X_n = m). \tag{B.1.2} $$

We also call the process $N(t) = (N^1(t), \dots, N^M(t))$ a multivariate point process.

Definition B.1.3 (Intensity of a point process) A point process $(N(t))_{t \in \mathbb{R}_+}$ can be completely characterized by its (conditional) intensity function, $\lambda(t)$, defined as

$$ \lambda(t) = \lim_{u \to 0} \frac{\mathbb{P}\left[ N(t+u) - N(t) = 1 \mid \mathcal{F}_t \right]}{u}, \tag{B.1.3} $$

where $\mathcal{F}_t$ is the history of the process up to time $t$, that is, the specification of all points in $[0, t]$.

Intuitively

$$ \mathbb{P}\left[ N(t+u) - N(t) = 1 \mid \mathcal{F}_t \right] = \lambda(t)\, u + o(u), \tag{B.1.4} $$
$$ \mathbb{P}\left[ N(t+u) - N(t) = 0 \mid \mathcal{F}_t \right] = 1 - \lambda(t)\, u + o(u), \tag{B.1.5} $$
$$ \mathbb{P}\left[ N(t+u) - N(t) > 1 \mid \mathcal{F}_t \right] = o(u). \tag{B.1.6} $$

This is naturally extended to the multivariate case by setting for each $m \in \{1, \dots, M\}$

$$ \lambda^m(t) = \lim_{u \to 0} \frac{\mathbb{P}\left[ N^m(t+u) - N^m(t) = 1 \mid \mathcal{F}_t \right]}{u}. \tag{B.1.7} $$

Definition B.1.4 A point process is stationary when for every $r \in \mathbb{N}^*$ and all bounded Borel subsets $A_1, \dots, A_r$ of the real line, the joint distribution of $\{N(A_1 + t), \dots, N(A_r + t)\}$ does not depend on $t$.

B.1.1 Hawkes processes

We recall several classical results on multivariate Hawkes processes. Let then $N = (N^1, \dots, N^D)$ be a $D$-dimensional point process with intensity vector $\lambda = (\lambda^1, \dots, \lambda^D)$.

Definition B.1.5 We say that $N = (N^1, \dots, N^D)$ is a multivariate Hawkes process when

$$ \lambda^m_t = \lambda^m_0 + \sum_{j=1}^{D} \alpha_{mj} \int_0^t e^{-\beta_{mj}(t-s)}\, dN^j_s \quad \text{for } 1 \le m \le D. $$

Proposition B.1.1 Define the processes $\mu^{ij}$ as

$$ \mu^{ij}_t = \alpha_{ij} \int_0^t e^{-\beta_{ij}(t-s)}\, dN^j_s, \quad 1 \le i, j \le D, $$

and let $\mu = \{\mu^{ij}\}_{1 \le i, j \le D}$. Then, the process $(N, \mu)$ is Markovian.

Proof. The proof of this classical result can be found e.g. in [57].
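The Markov property of Proposition B.1.1 rests on the exponential form of the kernel: between two events, each $\mu^{ij}$ decays at rate $\beta_{ij}$, and it jumps by $\alpha_{ij}$ whenever $N^j$ jumps, so the current value of $\mu$ is enough to propagate it forward. The following Python sketch checks numerically that this recursive update reproduces the integral definition of $\mu^{ij}_t$. The function names, parameter values and event times are hypothetical and chosen only for illustration.

```python
import math

def mu_direct(alpha, beta, events_j, t):
    """mu^{ij}_t from its definition: alpha * sum over events s <= t of exp(-beta * (t - s))."""
    return alpha * sum(math.exp(-beta * (t - s)) for s in events_j if s <= t)

def mu_recursive(alpha, beta, events_j, t):
    """Same quantity via the Markovian update: exponential decay between
    events of N^j, and a jump of size alpha at each event.

    Assumes events_j is sorted in increasing order.
    """
    mu, last = 0.0, 0.0
    for s in events_j:
        if s > t:
            break
        mu = mu * math.exp(-beta * (s - last)) + alpha  # decay, then jump
        last = s
    return mu * math.exp(-beta * (t - last))            # decay up to time t

# Hypothetical kernel parameters and event times of N^j.
alpha_ij, beta_ij = 0.5, 1.2
events_j = [0.3, 1.1, 2.0]
direct = mu_direct(alpha_ij, beta_ij, events_j, 2.5)
recursive = mu_recursive(alpha_ij, beta_ij, events_j, 2.5)
```

The recursive version never revisits past events, which is the practical content of the Markov property: the pair (current value, elapsed time) is sufficient.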
Stationarity

Extending the early stability and stationarity result in [178], [57] proves a general result for the multivariate Hawkes processes just introduced:

Proposition B.1.2 Let the matrix $A$ be defined by

$$ A_{ij} = \frac{\alpha_{ji}}{\beta_{ji}}, \quad 1 \le i, j \le D. $$

If its spectral radius $\rho(A)$ satisfies the condition

$$ \rho(A) < 1, \tag{B.1.8} $$

then there exists a (unique) multivariate point process $N = (N^1, \dots, N^D)$ whose intensity is specified as in Definition B.1.5. Moreover, this process is stable, and reaches its unique stationary distribution exponentially fast.

The interested reader is referred to the original paper [57] for the proof of this result. Note that, later in Section B.1.1, a straightforward, explicit construction of a Lyapunov function for such a process is given. This function actually provides an alternate proof of stability, together with the exponential convergence towards the stationary distribution, based on the concept of V-geometric ergodicity of [247].

Lyapunov functions for Hawkes processes

For the sake of completeness, an explicit construction of a Lyapunov function for a multi-dimensional Hawkes process $N = (N^i)$ with intensities

$$ \lambda^i_t = \lambda^i_0 + \sum_j \int_0^t \alpha_{ij}\, e^{-\beta_{ij}(t-s)}\, dN^j_s $$

is provided here. Denote as in Proposition B.1.1

$$ \mu^{ij}_t = \int_0^t \alpha_{ij}\, e^{-\beta_{ij}(t-s)}\, dN^j_s, $$

so that there holds

$$ \lambda^i_t = \lambda^i_0 + \sum_j \mu^{ij}_t. \tag{B.1.9} $$

We assume for simplicity¹ the following

$$ \forall i, j, \quad \alpha_{ij} > 0, \; \beta_{ij} > 0, \tag{B.1.10} $$

as well as the spectral condition (B.1.8). The infinitesimal generator associated to the Markovian process $(\mu^{ij})$, $1 \le i, j \le D$, is the operator

$$ \mathcal{L}_H F(\mu) = \sum_j \lambda^j \left( F(\mu + \Delta^j(\mu)) - F(\mu) \right) - \sum_{i,j} \beta_{ij}\, \mu^{ij}\, \frac{\partial F}{\partial \mu^{ij}}, $$

where $\mu$ is the vector with components $\mu^{ij}$ and $\lambda^j$ is as in (B.1.9). The notation $\Delta^j(\mu)$ characterizes the jumps in those of the entries in $\mu$ that are affected by a jump of the process $N^j$. For a fixed index $j$, it is given by the vector with entries $\alpha_{ij}$ at the relevant spots, and zero entries elsewhere.
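The spectral condition (B.1.8) of Proposition B.1.2 is easy to check numerically for a given set of kernel parameters. The sketch below is one possible way to do so in plain Python: it builds $A_{ij} = \alpha_{ji}/\beta_{ji}$ and estimates $\rho(A)$ by power iteration, which is a technique chosen here for illustration (it is not described in the text) and which relies on the positivity assumption (B.1.10) so that the dominant eigenvalue is real and simple. Function names and parameter values are hypothetical.

```python
def kernel_matrix(alpha, beta):
    """Build A with A[i][j] = alpha[j][i] / beta[j][i], as in Proposition B.1.2."""
    d = len(alpha)
    return [[alpha[j][i] / beta[j][i] for j in range(d)] for i in range(d)]

def spectral_radius_positive(A, iters=200):
    """Estimate the spectral radius of a matrix with strictly positive
    entries by power iteration (illustrative method, valid under (B.1.10))."""
    n = len(A)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
        rho = max(w)              # sup-norm of A v, since all entries are positive
        v = [x / rho for x in w]  # renormalize so that max(v) == 1
    return rho

# Hypothetical two-dimensional example.
alpha = [[0.5, 0.2], [0.1, 0.4]]
beta = [[1.0, 2.0], [0.5, 1.0]]
A = kernel_matrix(alpha, beta)        # [[0.5, 0.2], [0.1, 0.4]]
rho = spectral_radius_positive(A)     # eigenvalues of A are 0.6 and 0.3
is_stable = rho < 1.0                 # condition (B.1.8)
```

For matrices that are not entrywise positive, a general eigenvalue routine would be the safer choice; the power iteration is used here only because it ties in with the Perron-Frobenius argument of this section.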
A Lyapunov function for the associated semi-group is sought under the form

$$ V(\mu) = \sum_{i,j} \delta_{ij}\, \mu^{ij} \tag{B.1.11} $$

(since the intensities are always positive, a linear function will be coercive).

¹ The classical assumption of irreducibility would be sufficient, as long as all the $\beta$'s are strictly positive.

Assuming (B.1.11), there holds

$$ \mathcal{L}_H V = \sum_j \lambda^j \Big( \sum_i \delta_{ij}\, \alpha_{ij} \Big) - \sum_{i,j} \beta_{ij}\, \mu^{ij}\, \delta_{ij} $$

or

$$ \mathcal{L}_H V = \sum_{i,j} \Big( \lambda^j_0 + \sum_k \mu^{jk} \Big) \delta_{ij}\, \alpha_{ij} - \sum_{i,j} \beta_{ij}\, \mu^{ij}\, \delta_{ij}. \tag{B.1.12} $$

At this stage, it is convenient to introduce the maximal eigenvector $\epsilon$ of the matrix $A$ (introduced in Proposition B.1.2) with entries $A_{ij} = \alpha_{ji} / \beta_{ji}$. Denote by $\kappa$ the associated maximal eigenvalue. By Assumption (B.1.8), one has that $0 < \kappa < 1$ and furthermore, by the Perron-Frobenius theorem, there holds: $\forall i, \; \epsilon_i > 0$. Assuming that

$$ \delta_{ij} \equiv \frac{\epsilon_i}{\beta_{ij}}, \tag{B.1.13} $$

the expression for $V$ becomes

$$ V(\mu) = \sum_{i,j} \frac{\epsilon_i\, \mu^{ij}}{\beta_{ij}}. \tag{B.1.14} $$

Plugging (B.1.14) in (B.1.12) yields

$$ \mathcal{L}_H V = \sum_{i,j} \lambda^j_0\, \delta_{ij}\, \alpha_{ij} + \sum_{i,j,k} \mu^{jk}\, \epsilon_i\, \frac{\alpha_{ij}}{\beta_{ij}} - \sum_{j,k} \beta_{jk}\, \mu^{jk}\, \delta_{jk} = \sum_{i,j} \lambda^j_0\, \delta_{ij}\, \alpha_{ij} + (\kappa - 1) \sum_{j,k} \epsilon_j\, \mu^{jk}, $$

using the identity $\sum_i A_{ji}\, \epsilon_i = \kappa\, \epsilon_j$. A comparison with (B.1.14) easily yields the upper bound

$$ \mathcal{L}_H V \le -\gamma V + C, \tag{B.1.15} $$

with $\gamma = (1 - \kappa)\, \beta_{\min}$, $\beta_{\min} \equiv \inf_{i,j} (\beta_{ij}) > 0$ by assumption, and $C = \sum_{i,j} \lambda^j_0\, \delta_{ij}\, \alpha_{ij} \equiv \kappa\, \epsilon \cdot \lambda_0$.

The following result generalizes the form of Lyapunov functions beyond Equation (B.1.11):

Lemma B.1.1 Under the standing assumptions (B.1.8) and (B.1.10), one can construct a Lyapunov function of arbitrary high polynomial growth at infinity.

Proof. Let $n \in \mathbb{N}^*$, and $V$ be the function defined in (B.1.14). The function $V^n$ satisfies

$$ \mathcal{L}_H V^n(\mu) = \sum_j \lambda^j \left( V^n(\mu + \Delta^j(\mu)) - V^n(\mu) \right) - n V^{n-1} \Big( \sum_{i,j} \beta_{ij}\, \mu^{ij}\, \frac{\partial V}{\partial \mu^{ij}} \Big). \tag{B.1.16} $$

Upon factoring $V^n(\mu + \Delta^j(\mu)) - V^n(\mu)$:

$$ V^n(\mu + \Delta^j(\mu)) - V^n(\mu) = \left( V(\mu + \Delta^j(\mu)) - V(\mu) \right) \Big( \sum_{k=0}^{n-1} V^{n-1-k}(\mu + \Delta^j(\mu))\, V^k(\mu) \Big), $$

the linearity of $V$ yields the following expression

$$ V^n(\mu + \Delta^j(\mu)) - V^n(\mu) = n V^{n-1}(\mu) \left( V(\mu + \Delta^j(\mu)) - V(\mu) \right) + M^j V(\mu), $$

where $M^j V(\mu)$ can be bounded by a polynomial function of order $n - 1$ at infinity in $\mu$. Therefore, one can rewrite (B.1.16) as follows

$$ \mathcal{L}_H V^n(\mu) = (n V^{n-1} \mathcal{L}_H V)(\mu) + M V(\mu), \tag{B.1.17} $$

where $M V(\mu)$ is a polynomial of order $n - 1$ in $\mu$. Combining (B.1.15) with (B.1.17) shows that $V^n$ is also a Lyapunov function for the Hawkes process.

B.2 Ergodic theory for Markov processes

B.2.1 The ergodic theorem

The Ergodic Theorem for Markov processes states the following:

Theorem B.2.1 ([247][239][63]) Let $G$ be in $L^1(\Pi(dX))$. Then, a.s.

$$ \lim_{t \to +\infty} \frac{1}{t} \int_0^t G(X_s)\, ds = \int G(X)\, \Pi(dX). $$

B.2.2 Stochastic stability

The main reference here is [?]. Recall some useful definitions from the theory of Markov chains and stochastic stability. Let $(Q^t)_{t \ge 0}$ be the Markov transition probability function of a Markov process at time $t$, that is

$$ Q^t(x, E) := \mathbb{P}\left[ X(t) \in E \mid X(0) = x \right], \quad t \in \mathbb{R}_+, \; x \in S, \; E \subset S, \tag{B.2.1} $$

where $S$ is the (locally compact, metric) state space. We recall that an (aperiodic, irreducible) Markov process is ergodic if an invariant probability measure $\pi$ exists and

$$ \lim_{t \to \infty} \| Q^t(x, \cdot) - \pi(\cdot) \| = 0, \quad \forall x \in S, \tag{B.2.2} $$

where $\| \cdot \|$ designates for a signed measure $\nu$ the total variation norm defined as

$$ \| \nu \| := \sup_{f : |f| < 1} |\nu(f)| = \sup_{E \in \mathcal{B}(S)} \nu(E) - \inf_{E \in \mathcal{B}(S)} \nu(E). \tag{B.2.3} $$

In (B.2.3), $\mathcal{B}(S)$ is the Borel σ-field generated by $S$, and for a measurable function $f$ on $S$, $\nu(f) := \int_S f\, d\nu$.

V-uniform ergodicity. A Markov process is said to be V-uniformly ergodic if there exists a coercive function $V > 1$, an invariant distribution $\pi$, and constants $0 < r < 1$ and $R < \infty$ such that

$$ \| Q^t(x, \cdot) - \pi(\cdot) \| \le R\, r^t\, V(x), \quad x \in S, \; t \in \mathbb{R}_+. \tag{B.2.4} $$

V-uniform ergodicity can be characterized in terms of the infinitesimal generator of the Markov process. Indeed, it is shown in [247, 246] that it is equivalent to the existence of a coercive function $V$ (the “Lyapunov test function”) such that

$$ \mathcal{L} V(x) \le -\beta V(x) + \gamma \quad \text{(geometric drift condition)} \tag{B.2.5} $$

for some positive constants $\beta$ and $\gamma$ (Theorems 6.1 and 7.1 in [246]). Intuitively, condition (B.2.5) says that the larger $V(X(t))$, the stronger $X$ is pulled back towards the center of the state space $S$.

Appendix C
Comparison of various prediction methods

C.1 Results for the binary classification

Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance AUC Accuracy 0.54 0.54 0.56 0.56 0.61 0.61 0.56 0.56 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.56 0.56 0.53 0.53 0.55 0.55 0.62 0.62 0.55 0.55 0.54 0.54 0.53 0.53 0.54 0.54 0.56 0.56 0.56 0.56 0.63 0.63 0.64 0.64 0.58 0.58 0.54 0.54 0.62 0.62 0.52 0.52 0.56 0.56 0.56 0.56 0.53 0.53 0.60 0.60 0.59 0.59 0.59 0.59 0.58 0.58 0.60 0.60 0.56 0.56 0.57 0.57 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.59 0.59 0.57 0.57 0.56 0.56 0.57 0.57 0.57 0.57 Flow quantity AUC Accuracy 0.53 0.53 0.53 0.53 0.51 0.51 0.53 0.53 0.52 0.52 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.52 0.52 0.53 0.53 0.58 0.58 0.51 0.51 0.53 0.53 0.52 0.52 0.53 0.53 0.52 0.52 0.54 0.54 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 0.50 0.50 0.51 0.51 0.52 0.52 0.54 0.54 0.53 0.53 0.51 0.51 0.55 0.55 0.52 0.52 0.52 0.52 0.53 0.53 0.55 0.55 0.54
0.54 0.53 0.53 0.53 0.53 0.53 0.53 0.52 0.52 0.53 0.53 0.53 0.53 0.52 0.52 0.53 0.53 0.50 0.50 0.52 0.52 0.53 0.53 0.52 0.52 Past return AUC Accuracy 0.51 0.51 0.51 0.51 0.53 0.53 0.51 0.51 0.50 0.50 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.53 0.53 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.50 0.50 0.55 0.55 0.56 0.56 0.51 0.51 0.51 0.51 0.54 0.54 0.51 0.51 0.50 0.50 0.51 0.51 0.51 0.51 0.53 0.53 0.50 0.50 0.52 0.52 0.51 0.51 0.52 0.52 0.50 0.50 0.51 0.51 0.51 0.51 0.51 0.51 0.50 0.50 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 Table C.1: The quality of the binary prediction: 1-minute prediction AUC and accuracy per stock C.2 Results for the four-class classification 178 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. 
REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance Gain σ(Gain) 1388 1201 1603 1112 2775 1219 1969 1278 1156 1102 1269 1055 1954 1537 1330 1219 1591 993 1120 1608 1878 1572 4144 1881 2003 1373 1380 1275 1251 1372 1410 1113 1586 1416 1762 1315 3723 1655 2996 1185 2245 1193 1256 956 3977 1764 1195 1763 2031 1227 2220 1433 1511 1564 4019 1911 2481 1452 2445 1220 1895 1107 2367 1109 1978 1173 2694 1451 1323 1348 1717 1535 1368 1040 1225 1022 1612 1359 1108 983 1419 1294 2694 1267 3039 2025 1402 766 2142 1223 2044 1440 Flow quantity Gain σ(Gain) 1107 1308 996 1005 221 1107 1244 1316 921 1311 1142 1251 1866 1700 1240 1325 958 1143 831 1620 1461 1601 2853 1691 674 1428 1130 1228 905 1405 1252 1211 848 1196 1523 1295 295 1384 321 1161 481 1722 831 977 177 1324 853 1896 934 1389 1626 1514 1493 1491 153 1787 1742 1525 533 1148 791 1485 894 1242 1670 1565 1700 1607 1475 1880 1393 1577 1118 1123 939 1071 1209 1449 967 1196 1014 1275 1156 1341 382 1850 551 860 1114 1391 1165 1397 Past return Gain σ(Gain) 174 1264 169 936 638 1175 190 1419 2 1185 289 1296 595 1934 347 1394 231 1196 526 1911 600 1665 1496 1542 582 1603 208 1390 310 1672 376 1113 308 1298 12 1281 1219 1307 1109 1201 323 1445 326 950 1210 1577 643 2060 156 1355 566 1403 217 1720 1048 1954 145 1344 613 1267 194 1006 438 1220 182 1251 292 1558 307 1747 383 1684 107 1190 117 1084 455 1607 164 1124 379 1436 290 1194 683 2002 222 949 244 1326 225 1359 Table C.2: The quality of the binary prediction: The daily gain average and standard deviation for the 1-minute prediction (without trading costs) Notice that the nans on the tables of the Appendix 2 correspond to the cases where |b Y| is always lower 179 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI 
DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance Gain σ(Gain) -191 1189 81 1112 1141 1063 370 1179 -422 1064 -363 1002 303 1477 -260 1176 -40 963 -402 1596 251 1486 2971 1714 313 1299 -231 1243 -394 1368 -170 1072 50 1407 185 1265 2151 1456 1513 971 583 1096 -362 934 2369 1565 -405 1718 402 1140 762 1332 -186 1519 2333 1715 1110 1375 831 1119 366 1011 816 985 377 1113 1233 1308 -182 1251 205 1431 -279 998 -340 1000 -48 1326 -472 966 -162 1263 1124 1130 1434 1940 -253 730 547 1113 446 1373 Flow quantity Gain σ(Gain) -788 1325 -980 1057 -1199 1309 -697 1335 -955 1338 -734 1249 -58 1681 -530 1263 -906 1164 -1022 1618 -492 1606 934 1612 -1064 1488 -748 1235 -959 1423 -656 1224 -949 1225 -389 1296 -1069 1610 -1136 1375 -1108 1887 -1058 1024 -1403 1539 -846 1901 -951 1438 -312 1503 -450 1470 -1081 1822 -195 1535 -1183 1296 -1019 1490 -797 1274 -272 1575 -184 1585 -399 1864 -492 1566 -720 1127 -944 1093 -694 1463 -898 1209 -872 1296 -686 1342 -896 1953 -1246 938 -804 1386 -785 1408 Past return Gain σ(Gain) -1222 1531 -1211 1164 -952 1162 -1301 1574 -1298 1558 -1122 1503 -910 2027 -1256 1510 -1246 1369 -1115 1998 -975 1690 -27 1549 -1152 1560 -1206 1529 -1277 1819 -1093 1324 -1128 1516 -1104 1575 -329 1198 -281 1046 -1047 1592 -1278 1206 -484 1490 -968 2002 -1249 1513 -1094 1475 -1186 1890 -517 1820 -1155 1457 -928 1235 -1260 1177 -982 1236 -1255 1490 -1188 1713 -1122 1960 -1064 1822 -1382 1454 -1428 1277 -1060 1655 -1353 1363 -1339 1493 -1044 1257 -738 2067 -1344 1142 -1186 1452 -979 1584 Table C.3: The quality of the binary prediction: The daily gain average and standard deviation for the 1-minute prediction (with trading costs) than θ thus no positions are 
taken. 180 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance AUC Accuracy 0.58 0.59 0.71 0.72 0.69 0.69 0.60 0.60 0.60 0.60 0.53 0.55 0.57 0.57 0.57 0.58 0.60 0.60 0.58 0.59 0.59 0.60 0.70 0.70 0.58 0.60 0.57 0.57 0.55 0.55 0.60 0.60 0.71 0.72 0.60 0.60 0.73 0.73 0.76 0.76 0.64 0.64 0.54 0.59 0.68 0.68 0.55 0.56 0.62 0.62 0.63 0.63 0.55 0.55 0.67 0.67 0.68 0.68 0.65 0.66 0.66 0.66 0.67 0.67 0.61 0.62 0.63 0.63 0.58 0.58 0.57 0.56 0.60 0.60 0.52 0.61 0.56 0.58 0.56 0.61 0.57 0.58 0.68 0.68 0.64 0.65 0.50 0.63 0.63 0.63 0.62 0.62 Flow quantity AUC Accuracy 0.50 0.42 nan nan 0.50 0.54 0.50 0.54 nan nan 0.50 0.59 0.55 0.55 0.55 0.55 nan nan 0.50 0.50 0.50 0.56 0.64 0.64 nan nan 0.50 0.51 0.54 0.56 0.55 0.56 nan nan 0.50 0.55 nan nan nan nan nan nan nan nan nan nan 0.50 0.54 nan nan 0.56 0.56 0.54 0.55 nan nan 0.58 0.58 nan nan 0.55 0.55 nan nan 0.50 0.51 0.53 0.58 0.53 0.55 0.52 0.51 nan nan 0.50 0.56 0.54 0.58 0.55 0.56 nan nan 0.53 0.57 0.50 0.54 nan nan nan nan 0.49 0.49 Past return AUC Accuracy 0.50 0.50 0.50 0.58 0.61 0.61 0.48 0.48 0.49 0.50 0.50 0.56 0.55 0.56 0.54 0.55 0.58 0.58 0.52 0.53 0.56 0.56 0.55 0.56 0.56 0.56 0.54 0.54 0.52 0.52 0.56 0.56 0.51 0.51 0.52 0.56 0.57 0.60 0.61 0.61 0.53 0.53 0.50 0.46 0.60 0.60 0.52 0.54 0.53 0.53 0.57 0.57 0.52 0.55 0.58 0.58 0.55 0.55 0.58 0.58 0.54 0.54 0.58 0.58 0.52 0.54 0.57 0.57 0.52 0.52 0.58 0.58 0.50 0.60 0.52 0.54 0.54 0.55 0.59 0.59 0.56 0.57 0.56 0.56 0.57 0.57 nan nan 0.51 0.52 0.52 0.53 Table C.4: The quality of the 
4-class prediction: 1-minute prediction AUC and accuracy per stock 181 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance Gain σ(Gain) 137 388 306 577 1363 779 440 651 87 287 21 128 390 665 107 281 168 366 171 428 486 715 2534 1240 594 786 93 289 34 224 154 451 488 827 351 596 2219 1056 2000 773 651 680 10 93 2520 1420 184 503 504 692 738 951 109 373 2512 1248 1039 914 930 847 370 533 800 674 440 613 1234 1013 192 556 228 501 26 127 50 196 210 519 26 139 123 434 1402 825 1316 1393 16 104 583 826 530 745 Flow quantity Gain σ(Gain) -6 98 0 0 4 47 5 63 0 0 14 137 273 669 47 276 0 0 3 66 11 139 1364 1077 0 0 2 24 38 212 13 111 0 0 17 164 0 0 0 0 0 0 0 0 0 0 2 25 0 0 155 512 59 296 0 0 151 587 0 0 26 145 0 0 6 94 142 445 85 380 4 158 0 0 24 187 30 186 31 198 0 0 36 232 17 197 0 0 0 0 -0 78 Past return Gain σ(Gain) 4 131 3 42 68 276 -2 132 -2 48 14 99 208 582 52 238 4 47 44 453 136 469 202 560 55 320 16 191 12 291 27 147 3 66 10 106 193 503 110 300 10 168 1 38 249 756 56 410 21 171 115 409 7 138 185 731 44 223 64 277 3 50 22 112 11 116 110 555 29 364 168 635 6 90 6 200 88 362 28 162 37 214 34 205 247 835 0 0 5 141 1 215 Table C.5: The quality of the 4-class prediction: The daily gain average and standard deviation for the 1-minute prediction (without trading costs) C.3 Performances of the OLS method C.4 Performances of the ridge method 182 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE 
AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance Gain σ(Gain) 22 263 128 329 586 559 125 408 15 189 -14 105 107 507 -1 193 21 210 34 271 116 506 1848 1102 174 550 -7 245 -23 204 38 281 241 526 88 388 1338 881 1082 613 185 475 -5 72 1518 1179 2 412 142 464 340 722 -13 329 1514 1096 547 702 372 581 111 322 285 443 113 417 611 809 38 450 20 392 1 69 2 120 25 403 2 74 16 317 656 663 693 1159 1 56 214 617 171 545 Flow quantity Gain σ(Gain) -9 150 0 0 -1 16 -0 32 0 0 1 86 31 465 1 199 0 0 -12 126 -8 131 518 844 0 0 -1 14 8 122 -3 73 0 0 -14 157 0 0 0 0 0 0 0 0 0 0 -2 24 0 0 28 331 6 209 0 0 3 400 0 0 -3 62 0 0 -6 96 40 254 -2 299 -31 203 0 0 -4 137 -1 114 -2 89 0 0 6 139 -5 173 0 0 0 0 -7 115 Past return Gain σ(Gain) -38 183 1 31 8 194 -25 168 -7 51 -2 77 16 474 -12 184 -4 42 -65 481 18 362 18 442 -23 274 -32 199 -33 311 -5 111 -10 79 -4 91 -18 443 -25 211 -22 173 -3 49 58 636 -41 394 -8 126 18 292 -12 147 -20 658 -10 198 -11 169 -6 46 -5 85 -8 105 27 437 -42 372 49 463 -0 79 -30 207 -7 289 -3 141 -14 195 -7 183 19 628 0 0 -27 175 -45 246 Table C.6: The quality of the 4-class prediction: The daily gain average and standard deviation for the 1-minute prediction (with trading costs) C.5 Performances of the LASSO method 183 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. 
REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance Gain σ(Gain) -11 887 -57 669 -14 762 -87 862 -20 807 38 759 -16 1263 -25 783 -61 744 -28 998 4 1108 75 962 12 1054 75 872 54 1054 110 761 27 722 29 830 27 991 7 628 -70 911 49 660 18 1011 53 1413 59 906 3 1017 -21 1138 -128 1359 -8 894 -36 831 29 641 -19 671 -24 844 -87 878 32 1132 2 1150 -29 810 4 683 -66 996 127 771 -31 896 -12 759 130 1529 5 543 21 874 71 929 Flow quantity Gain σ(Gain) -6 845 -17 633 69 689 43 1075 -3 781 -93 774 -63 1138 -23 923 19 726 -2 1179 -135 1082 -105 1161 6 1055 -51 825 -89 1152 80 742 81 700 43 827 -40 971 -18 651 -4 963 108 689 2 1094 67 1253 -24 847 -73 960 105 1205 -54 1329 -161 912 15 725 -25 688 31 755 24 789 -5 920 61 1217 -60 1072 25 856 -52 709 22 994 -35 802 -79 837 42 918 58 1498 31 546 -15 899 120 994 Past return Gain σ(Gain) -41 823 -21 624 -41 729 -29 897 -67 722 -46 765 16 1084 -13 901 -18 745 -9 1151 -52 972 -6 1117 49 1111 9 961 -35 996 100 743 -14 718 41 872 55 959 -16 645 65 826 73 669 11 1085 -5 1335 25 823 51 949 -80 1142 85 1288 17 860 -26 675 -7 727 15 727 -29 841 3 925 46 1140 48 1090 7 794 -15 682 -51 945 -59 725 8 838 111 912 81 1357 -26 508 6 859 75 1055 Table C.7: The quality of the 4-class prediction: The daily gain average and standard deviation for the 30-minute prediction (without trading costs) 184 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. 
REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN Order book imbalance Gain σ(Gain) -55 887 -96 672 -57 764 -132 863 -61 809 -7 758 -58 1265 -65 784 -101 743 -71 997 -39 1110 31 960 -31 1052 36 874 9 1053 72 763 -12 722 -9 830 -17 993 -36 627 -106 911 10 661 -26 1011 10 1415 14 905 -40 1016 -63 1137 -172 1359 -48 896 -82 830 -13 641 -57 674 -65 845 -128 877 -7 1130 -37 1149 -67 810 -37 683 -105 997 84 772 -73 896 -49 760 84 1529 -39 543 -24 874 33 929 Flow quantity Gain σ(Gain) -51 845 -61 635 23 687 -6 1072 -47 780 -136 777 -108 1137 -69 923 -25 726 -46 1180 -182 1085 -152 1163 -37 1054 -93 825 -138 1151 40 742 36 702 -1 828 -81 974 -58 652 -45 965 66 690 -44 1096 19 1252 -70 847 -117 962 58 1207 -97 1327 -204 913 -30 725 -66 691 -9 754 -23 788 -52 920 15 1218 -103 1073 -21 856 -100 709 -23 995 -77 805 -123 836 -4 919 15 1499 -14 545 -61 900 76 995 Past return Gain σ(Gain) -84 824 -62 625 -80 731 -73 896 -108 724 -84 767 -25 1082 -53 902 -60 742 -51 1149 -94 972 -48 1116 4 1109 -31 961 -77 997 65 743 -53 720 -2 869 17 959 -57 642 22 824 34 666 -32 1087 -51 1336 -16 818 5 947 -122 1144 47 1290 -22 859 -68 675 -49 728 -22 728 -71 839 -41 920 5 1140 6 1089 -34 797 -60 680 -93 946 -98 725 -34 838 68 913 40 1359 -67 509 -37 856 38 1058 Table C.8: The quality of the binary prediction: The daily gain average and standard deviation for the 30-minute prediction (with 0.5 bp trading costs) 185 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. 
REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN 1-min horizon AUC Accuracy 0.54 0.54 0.57 0.57 0.61 0.61 0.55 0.55 0.54 0.54 0.54 0.54 0.54 0.54 0.55 0.55 0.56 0.56 0.53 0.53 0.55 0.55 0.62 0.62 0.55 0.55 0.54 0.54 0.53 0.53 0.55 0.55 0.56 0.56 0.56 0.56 0.62 0.62 0.64 0.64 0.57 0.57 0.54 0.54 0.61 0.61 0.53 0.53 0.56 0.56 0.57 0.57 0.53 0.53 0.59 0.59 0.59 0.59 0.59 0.59 0.58 0.58 0.60 0.60 0.56 0.56 0.57 0.57 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.54 0.59 0.59 0.56 0.56 0.56 0.56 0.57 0.57 0.56 0.56 5-min horizon AUC Accuracy 0.50 0.50 0.52 0.52 0.53 0.53 0.51 0.51 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.56 0.56 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 0.52 0.52 0.51 0.51 0.53 0.53 0.54 0.54 0.52 0.52 0.51 0.51 0.54 0.54 0.50 0.50 0.51 0.51 0.52 0.52 0.51 0.51 0.51 0.51 0.53 0.53 0.52 0.52 0.53 0.53 0.52 0.52 0.51 0.51 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 30-min horizon AUC Accuracy 0.50 0.50 0.49 0.49 0.50 0.50 0.51 0.51 0.50 0.50 0.51 0.51 0.49 0.49 0.49 0.49 0.49 0.49 0.50 0.50 0.52 0.52 0.52 0.52 0.50 0.50 0.50 0.50 0.51 0.51 0.51 0.51 0.50 0.51 0.51 0.51 0.48 0.48 0.50 0.50 0.48 0.48 0.50 0.50 0.50 0.50 0.52 0.52 0.50 0.50 0.51 0.51 0.49 0.49 0.50 0.50 0.52 0.52 0.52 0.52 0.50 0.50 0.51 0.51 0.50 0.50 0.51 0.51 0.49 0.49 0.49 0.49 0.49 0.49 0.51 0.51 0.52 0.52 0.50 0.50 0.51 0.51 0.50 0.50 0.49 0.49 0.50 0.50 0.51 0.51 0.51 0.51 Table C.9: The quality of the OLS prediction: The AUC and the accuracy per stock for the different horizons 186 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA DAIMLER CHRYSLER DEUTSCHE BANK AG VINCI DEUTSCHE TELEKOM ESSILOR INTERNATIONAL ENEL ENI E.ON AG TOTAL GENERALI 
ASSIC SOCIETE GENERALE GDF SUEZ IBERDROLA I ING INTESABCI INDITEX LVMH MUNICH RE LOREAL PHILIPS ELECTR. REPSOL RWE ST BANCO SAN CENTRAL HISPANO SANOFI SAP AG SAINT GOBAIN SIEMENS AG SCHNEIDER ELECTRIC SA TELEFONICA UNICREDIT SPA UNILEVER CERT VIVENDI UNIVERSAL VOLKSWAGEN 1-min horizon AUC Accuracy 0.54 0.54 0.57 0.57 0.61 0.61 0.55 0.55 0.54 0.54 0.54 0.54 0.54 0.54 0.55 0.55 0.56 0.56 0.53 0.53 0.55 0.55 0.62 0.62 0.56 0.55 0.54 0.54 0.53 0.53 0.55 0.55 0.56 0.56 0.56 0.56 0.62 0.62 0.65 0.65 0.57 0.57 0.54 0.54 0.62 0.62 0.53 0.53 0.57 0.57 0.57 0.57 0.53 0.53 0.60 0.60 0.59 0.59 0.59 0.59 0.59 0.59 0.60 0.60 0.56 0.56 0.58 0.58 0.54 0.54 0.54 0.54 0.54 0.54 0.55 0.55 0.54 0.54 0.54 0.54 0.55 0.55 0.59 0.59 0.57 0.57 0.56 0.56 0.57 0.57 0.57 0.57 5-min horizon AUC Accuracy 0.50 0.50 0.52 0.52 0.53 0.53 0.51 0.51 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.56 0.56 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 0.52 0.51 0.51 0.53 0.53 0.54 0.54 0.52 0.52 0.51 0.51 0.54 0.54 0.50 0.50 0.52 0.52 0.53 0.53 0.51 0.50 0.52 0.52 0.53 0.53 0.52 0.52 0.53 0.53 0.52 0.52 0.51 0.51 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 0.52 0.52 0.51 0.51 0.51 0.51 0.51 0.51 0.52 0.52 30-min horizon AUC Accuracy 0.50 0.50 0.50 0.50 0.49 0.49 0.51 0.51 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.52 0.52 0.52 0.52 0.50 0.50 0.50 0.50 0.51 0.51 0.51 0.51 0.51 0.52 0.51 0.51 0.48 0.48 0.50 0.50 0.48 0.48 0.50 0.50 0.51 0.51 0.53 0.52 0.50 0.50 0.52 0.52 0.50 0.50 0.50 0.50 0.52 0.52 0.50 0.50 0.50 0.50 0.51 0.51 0.49 0.49 0.52 0.52 0.50 0.50 0.50 0.50 0.51 0.51 0.51 0.51 0.52 0.52 0.51 0.51 0.50 0.50 0.51 0.51 0.49 0.49 0.49 0.49 0.51 0.51 0.51 0.51 Table C.10: The quality of the Ridge HKB prediction: The AUC and the accuracy per stock for the different horizons 187 Stock INTERBREW AIR LIQUIDE ALLIANZ ASML Holding NV BASF AG BAYER AG BBVARGENTARIA BAY MOT WERKE DANONE BNP PARIBAS CARREFOUR CRH PLC IRLANDE AXA 
Stock                      1-min horizon     5-min horizon     30-min horizon
                           AUC   Accuracy    AUC   Accuracy    AUC   Accuracy
INTERBREW                  0.55  0.55        0.52  0.52        0.50  0.50
AIR LIQUIDE                0.57  0.57        0.53  0.53        0.49  0.49
ALLIANZ                    0.61  0.61        0.54  0.54        0.50  0.50
ASML Holding NV            0.56  0.56        0.52  0.52        0.52  0.52
BASF AG                    0.54  0.54        0.52  0.52        0.50  0.50
BAYER AG                   0.55  0.55        0.51  0.51        0.50  0.50
BBVARGENTARIA              0.54  0.54        0.51  0.51        0.50  0.50
BAY MOT WERKE              0.55  0.55        0.51  0.51        0.50  0.50
DANONE                     0.56  0.56        0.51  0.51        0.49  0.49
BNP PARIBAS                0.54  0.54        0.52  0.52        0.50  0.50
CARREFOUR                  0.55  0.55        0.51  0.51        0.51  0.51
CRH PLC IRLANDE            0.62  0.62        0.57  0.57        0.51  0.51
AXA                        0.56  0.56        0.51  0.51        0.51  0.51
DAIMLER CHRYSLER           0.54  0.54        0.52  0.52        0.51  0.51
DEUTSCHE BANK AG           0.53  0.53        0.51  0.51        0.52  0.52
VINCI                      0.56  0.56        0.52  0.53        0.51  0.52
DEUTSCHE TELEKOM           0.57  0.57        0.52  0.52        0.52  0.52
ESSILOR INTERNATIONAL      0.56  0.56        0.51  0.51        0.50  0.50
ENEL                       0.63  0.63        0.54  0.54        0.50  0.50
ENI                        0.65  0.65        0.55  0.55        0.50  0.50
E.ON AG                    0.58  0.58        0.52  0.52        0.50  0.51
TOTAL                      0.54  0.54        0.52  0.52        0.51  0.51
GENERALI ASSIC             0.62  0.62        0.55  0.55        0.49  0.49
SOCIETE GENERALE           0.53  0.53        0.50  0.50        0.52  0.52
GDF SUEZ                   0.57  0.57        0.52  0.52        0.50  0.50
IBERDROLA I                0.57  0.57        0.53  0.53        0.52  0.52
ING                        0.53  0.53        0.51  0.51        0.50  0.50
INTESABCI                  0.60  0.60        0.53  0.53        0.48  0.48
INDITEX                    0.60  0.60        0.54  0.54        0.51  0.51
LVMH                       0.59  0.59        0.52  0.52        0.50  0.50
MUNICH RE                  0.59  0.59        0.54  0.54        0.50  0.50
LOREAL                     0.60  0.60        0.53  0.53        0.51  0.51
PHILIPS ELECTR.            0.57  0.57        0.52  0.52        0.50  0.50
REPSOL                     0.58  0.58        0.53  0.53        0.51  0.51
RWE ST                     0.55  0.55        0.51  0.51        0.49  0.49
BANCO SAN CENTRAL HISPANO  0.54  0.54        0.52  0.52        0.51  0.51
SANOFI                     0.55  0.55        0.51  0.51        0.50  0.50
SAP AG                     0.55  0.55        0.51  0.51        0.51  0.51
SAINT GOBAIN               0.55  0.55        0.51  0.51        0.52  0.52
SIEMENS AG                 0.55  0.55        0.52  0.52        0.51  0.52
SCHNEIDER ELECTRIC SA      0.55  0.55        0.52  0.52        0.50  0.50
TELEFONICA                 0.60  0.60        0.53  0.53        0.51  0.51
UNICREDIT SPA              0.57  0.57        0.52  0.52        0.49  0.49
UNILEVER CERT              0.57  0.57        0.51  0.51        0.51  0.51
VIVENDI UNIVERSAL          0.58  0.58        0.52  0.52        0.51  0.51
VOLKSWAGEN                 0.57  0.57        0.52  0.52        0.50  0.50

Table C.11: The quality of the Ridge LW prediction: The AUC and the accuracy per stock for the different horizons

Stock                      1-min horizon     5-min horizon     30-min horizon
                           AUC   Accuracy    AUC   Accuracy    AUC   Accuracy
INTERBREW                  0.54  0.54        0.51  0.51        0.50  0.50
AIR LIQUIDE                0.58  0.58        0.52  0.52        0.49  0.49
ALLIANZ                    0.61  0.61        0.54  0.54        0.52  0.52
ASML Holding NV            0.56  0.56        0.52  0.52        0.51  0.51
BASF AG                    0.53  0.53        0.51  0.51        0.51  0.51
BAYER AG                   0.54  0.54        0.51  0.51        0.50  0.50
BBVARGENTARIA              0.54  0.54        0.51  0.51        0.49  0.49
BAY MOT WERKE              0.55  0.55        0.51  0.51        0.49  0.49
DANONE                     0.56  0.56        0.51  0.51        0.50  0.50
BNP PARIBAS                0.54  0.54        0.51  0.51        0.49  0.49
CARREFOUR                  0.55  0.55        0.51  0.51        0.50  0.50
CRH PLC IRLANDE            0.62  0.62        0.56  0.56        0.52  0.52
AXA                        0.55  0.55        0.51  0.51        0.49  0.49
DAIMLER CHRYSLER           0.53  0.53        0.52  0.52        0.50  0.50
DEUTSCHE BANK AG           0.53  0.53        0.51  0.51        0.51  0.51
VINCI                      0.55  0.55        0.52  0.53        0.52  0.52
DEUTSCHE TELEKOM           0.58  0.58        0.52  0.52        0.52  0.52
ESSILOR INTERNATIONAL      0.56  0.56        0.51  0.51        0.50  0.50
ENEL                       0.62  0.62        0.53  0.53        0.50  0.50
ENI                        0.64  0.64        0.55  0.55        0.49  0.49
E.ON AG                    0.57  0.57        0.52  0.52        0.49  0.50
TOTAL                      0.54  0.54        0.52  0.52        0.51  0.51
GENERALI ASSIC             0.62  0.62        0.54  0.54        0.51  0.51
SOCIETE GENERALE           0.53  0.53        0.50  0.50        0.52  0.52
GDF SUEZ                   0.56  0.56        0.52  0.52        0.51  0.51
IBERDROLA I                0.56  0.56        0.53  0.53        0.53  0.53
ING                        0.52  0.52        0.51  0.51        0.50  0.50
INTESABCI                  0.60  0.60        0.53  0.53        0.50  0.50
INDITEX                    0.59  0.59        0.53  0.53        0.52  0.52
LVMH                       0.59  0.59        0.52  0.52        0.51  0.51
MUNICH RE                  0.58  0.58        0.54  0.54        0.50  0.50
LOREAL                     0.60  0.60        0.53  0.53        0.50  0.50
PHILIPS ELECTR.            0.56  0.56        0.52  0.52        0.50  0.50
REPSOL                     0.57  0.57        0.52  0.52        0.51  0.51
RWE ST                     0.54  0.54        0.51  0.51        0.50  0.50
BANCO SAN CENTRAL HISPANO  0.54  0.54        0.52  0.52        0.50  0.50
SANOFI                     0.54  0.54        0.51  0.51        0.50  0.50
SAP AG                     0.53  0.53        0.52  0.52        0.50  0.50
SAINT GOBAIN               0.54  0.54        0.51  0.51        0.52  0.52
SIEMENS AG                 0.54  0.54        0.51  0.51        0.50  0.50
SCHNEIDER ELECTRIC SA      0.54  0.54        0.51  0.51        0.49  0.49
TELEFONICA                 0.59  0.59        0.53  0.53        0.51  0.51
UNICREDIT SPA              0.57  0.57        0.52  0.52        0.48  0.48
UNILEVER CERT              0.57  0.57        0.51  0.51        0.51  0.51
VIVENDI UNIVERSAL          0.57  0.57        0.52  0.52        0.52  0.52
VOLKSWAGEN                 0.56  0.56        0.52  0.52        0.49  0.49

Table C.12: The quality of the LASSO prediction: The AUC and the accuracy per stock for the different horizons

Bibliography

[1] A. Einstein. Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Ann. d. Physik, 17:549–560, 1905.
[2] A. Einstein. Zur Theorie der Brownschen Bewegung. Ann. d. Physik, 19:371–381, 1906.
[3] F. Abergel, J.P. Bouchaud, T. Foucault, C.A. Lehalle, and M. Rosenbaum, editors. Market Microstructure: Confronting Many Viewpoints. Wiley, 2012.
[4] F. Abergel, B.K. Chakrabarti, A. Chakraborti, and A. Ghosh, editors. Econophysics of systemic risk and network dynamics. Springer, New Economic Windows, 2012.
[5] F. Abergel, B.K. Chakrabarti, A. Chakraborti, and M. Mitra, editors. Econophysics of order-driven markets. Springer, 2011.
[6] F. Abergel, N. Huth, and I. Muni Toke. Financial bubbles analysis with a cross-sectional estimator. Available at SSRN 1474612, 2009.
[7] F. Abergel and A. Jedidi. A mathematical approach to order book modelling. International Journal of Theoretical and Applied Finance, 16(5), 2013.
[8] F. Abergel and A. Jedidi. Long time behaviour of a Hawkes process-based limit order book. 2015.
[9] F. Abergel, C.-A. Lehalle, and M. Rosenbaum. Understanding the stakes of high-frequency trading. Journal of Trading, 9(4):49–73, 2014.
[10] F. Abergel and M. Politi. Optimizing a basket against the efficient market hypothesis. Quantitative Finance, 13(1):13–23, 2013.
[11] F. Abergel and R. Zaatour. What drives option prices? The Journal of Trading, 7(3):12–28, 2012.
[12] L. Adamopoulos. Cluster models for earthquakes: Regional comparisons. Journal of the International Association for Mathematical Geology, 8:463–475, 1976.
[13] Y. Aït-Sahalia, J. Cacho-Diaz, and R.J.A. Laeven. Modeling financial contagion using mutually exciting jump processes. Working Paper 15850, National Bureau of Economic Research, March 2010.
[14] Y. Aït-Sahalia and J. Jacod.
Estimating the degree of activity of jumps in high frequency data. The Annals of Statistics, 37(5A):2202–2244, 2009.
[15] Y. Aït-Sahalia and A. Mykland. The effects of random and discrete sampling when estimating continuous-time diffusions. Technical Report 0276, National Bureau of Economic Research, Inc, April 2002.
[16] M. Aitken, A. Kua, P. Brown, T. Watter, and H.Y. Izan. An intraday analysis of the probability of trading on the ASX at the asking price. Australian Journal of Management, 20(2):115–154, December 1995.
[17] M. Al-Suhaibani and L. Kryzanowski. An exploratory analysis of the order book, and order flow and execution on the Saudi stock market. Journal of Banking & Finance, 24(8):1323–1357, August 2000.
[18] V. Alfi, M. Cristelli, L. Pietronero, and A. Zaccaria. Minimal agent based model for financial markets I - origin and self-organization of stylized facts. The European Physical Journal B, 67(3):13 pages, 2009.
[19] V. Alfi, M. Cristelli, L. Pietronero, and A. Zaccaria. Minimal agent based model for financial markets II. The European Physical Journal B - Condensed Matter and Complex Systems, 67(3):399–417, February 2009.
[20] A. Alfonsi, A. Fruth, and A. Schied. Optimal execution strategies in limit order books with general shape functions. Quantitative Finance, 10(2):143–157, 2010.
[21] R. Almgren, C. Thum, E. Hauptmann, and H. Li. Direct estimation of equity market impact. Working paper.
[22] M. Anane and F. Abergel. Empirical evidence of market inefficiency: Predicting single-stock returns. Proceedings of the 8th Kolkata Econophysics Conference, Springer, 2015.
[23] M. Anane, F. Abergel, and X. F. Lu. Mathematical modeling of the order book: New approach of predicting single-stock returns. 2015.
[24] S. Asmussen. Applied probability and queues, volume 51 of Applications of Mathematics (New York). Springer-Verlag, New York, second edition, 2003.
[25] L. Bachelier. Théorie de la spéculation.
Annales scientifiques de l’École Normale Supérieure, Sér.3, 3:21–86, 1900.
[26] E. Bacry, K. Dayri, and J.F. Muzy. Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data. European Physical Journal B, 85, 2012.
[27] E. Bacry, S. Delattre, M. Hoffmann, and J. F. Muzy. Modelling microstructure noise with mutually exciting point processes. Quantitative Finance, 13(1):65–77, 2013.
[28] E. Bacry, S. Delattre, M. Hoffmann, and J.F. Muzy. Scaling limits for Hawkes processes and application to financial statistics. Stochastic Processes and their Applications, 123(7):2475–2499, 2013.
[29] P. Bak, M. Paczuski, and M. Shubik. Price variations in a stock market with many agents. Physica A: Statistical and Theoretical Physics, 246(3-4):430–453, 1997.
[30] O. E. Barndorff-Nielsen and N. Shephard. Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, chapter Variation, jumps, market frictions and high frequency data in financial econometrics. Econometric Society Monograph. Cambridge University Press, 2007.
[31] O.E. Barndorff-Nielsen and N. Shephard. Econometrics of testing for jumps in financial economics using bipower variation. Oxford Financial Research Centre, 2004.
[32] M.S. Bartlett. The spectral analysis of point processes. Journal of the Royal Statistical Society, (25):264–296, 1963.
[33] L. Bauwens and N. Hautsch. Dynamic latent factor models for intensity processes. CORE Discussion Papers 2003103, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2003.
[34] L. Bauwens and N. Hautsch. Modelling financial high frequency data using point processes. In Handbook of Financial Time Series, pages 953–979. 2009.
[35] C. Bayer, U. Horst, and J. Qiu. A functional limit theorem for limit order books. 2014.
[36] E. Bayraktar, U. Horst, and R. Sircar. Queueing theoretic approaches to financial price fluctuations. In Handbook of Financial Engineering.
Elsevier, 2007.
[37] H. Beltran-Lopez, P. Giot, and J. Grammig. Commonalities in the order book. Financial Markets and Portfolio Management, 23(3):209–242, 2009.
[38] N. Bershova and D. Rakhlin. The non-linear market impact of large trades: Evidence from buy-side order flow. http://dx.doi.org/10.2139/ssrn.2197534, 2013.
[39] B. Biais, T. Foucault, and P. Hillion. Microstructure des marchés financiers : Institutions, modèles et tests empiriques. Presses Universitaires de France - PUF, May 1997.
[40] B. Biais, P. Hillion, and C. Spatt. An Empirical Analysis of the Limit Order Book and the Order Flow in the Paris Bourse. The Journal of Finance, 50:1655–1689, 1995.
[41] B. Biais, P. Hillion, and C. Spatt. An empirical analysis of the limit order book and the order flow in the Paris Bourse. Journal of Finance, 50(5):1655–1689, 1995.
[42] P. Billingsley. Convergence of Probability Measures. Wiley, 2nd ed., 1999.
[43] M. Blatt, S. Wiseman, and E. Domany. Superparamagnetic clustering of data. Physical Review Letters, 76(18):3251, April 1996.
[44] A. Blazejewski and R. Coggins. A local non-parametric model for trade sign inference. Physica A: Statistical Mechanics and its Applications, 348(1):481–495, 2005.
[45] T. Bollerslev, R. F. Engle, and D. B. Nelson. Chapter 49: ARCH models. In R. F. Engle and D. L. McFadden, editors, Handbook of Econometrics 4, pages 2959–3038. Elsevier, 1994.
[46] G. Bonanno, F. Lillo, and R. N. Mantegna. High-frequency cross-correlation in a set of stocks. Quantitative Finance, 1(1):96, 2001.
[47] J.-P. Bouchaud, J. D. Farmer, and F. Lillo. How markets slowly digest changes in supply and demand. In T. Hens and K. R. Schenk-Hoppé, editors, Handbook of Financial Markets: Dynamics and Evolution. Elsevier, 2009.
[48] J.-P. Bouchaud, Y. Gefen, M. Potters, and M. Wyart. Fluctuations and response in financial markets: The subtle nature of random price changes. Quantitative Finance, 4(2):176–190, 2004.
[49] J. P. Bouchaud, M. Mézard, and M. Potters.
Statistical properties of stock order books: empirical results and models. Quantitative Finance, 2:251–256, 2002.
[50] J.-P. Bouchaud, M. Mézard, and M. Potters. Statistical properties of stock order books: empirical results and models. Quantitative Finance, 2:251–256, 2002.
[51] J.-P. Bouchaud and M. Potters. Theory of Financial Risks: From Statistical Physics to Risk Management. Cambridge University Press, 1st edition, January 2000.
[52] J.P. Bouchaud. Economics needs a scientific revolution. Nature, 455(7217):1181, 2008.
[53] J.P. Bouchaud and M. Potters. Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management. Cambridge University Press, 2nd edition, February 2004.
[54] C. Bowsher. Modelling security market events in continuous time: Intensity based, multivariate point process models. Nuffield College Economics Discussion Papers, November 2002.
[55] C. Bowsher. Modelling security market events in continuous time: Intensity based, multivariate point process models. Journal of Econometrics, 141:876–912, December 2007.
[56] C. G. Bowsher. Modelling security market events in continuous time: Intensity based, multivariate point process models. Journal of Econometrics, 141(2):876–912, December 2007.
[57] P. Brémaud and L. Massoulié. Stability of nonlinear Hawkes processes. Ann. Probab., 24:1563–1588, 1996.
[58] P. Brémaud, L. Massoulié, and A. Ridolfi. Power spectra of random spike fields and related processes. Adv. in Appl. Probab., 37:1116–1146, 2005.
[59] P. Brémaud. Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, 1999.
[60] William A. Brock and Cars H. Hommes. Heterogeneous beliefs and routes to chaos in a simple asset pricing model. Journal of Economic Dynamics and Control, 22(8-9):1235–1274, August 1998.
[61] W. Buffett. The superinvestors of Graham-and-Doddsville. 1984.
[62] C. Cao, O. Hansch, and X. Wang Beardsley. The informational content of an open limit order book.
EFA 2004 Maastricht Meetings Paper, March 2004.
[63] P. Cattiaux, D. Chafaï, and A. Guillin. Central limit theorems for additive functionals of ergodic Markov diffusions processes. Lat. Am. J. Probab. Math. Stat, 9(2):337–382, 2012.
[64] Anindya S. Chakrabarti and Bikas K. Chakrabarti. Microeconomics of the ideal gas like market models. Physica A: Statistical Mechanics and its Applications, 388(19):4151–4158, 2009.
[65] Anindya S. Chakrabarti and Bikas K. Chakrabarti. Statistical theories of income and wealth distribution. Economics E-Journal (open access), 4, 2010.
[66] Anindya Sundar Chakrabarti, Bikas K. Chakrabarti, Arnab Chatterjee, and Manipushpak Mitra. The Kolkata Paise Restaurant problem and resource utilization. Physica A: Statistical Mechanics and its Applications, 388(12):2420–2426, June 2009.
[67] A. Chakraborti, I. Muni Toke, M. Patriarca, and F. Abergel. Econophysics review: I. Empirical facts. Quantitative Finance, 11(7):991–1012, 2011.
[68] A. Chakraborti, I. Muni Toke, M. Patriarca, and F. Abergel. Econophysics review: II. Agent-based models. Quantitative Finance, 11(7):1013–1041, 2011.
[69] A. Chakraborti, M. Patriarca, and M. S. Santhanam. Econophysics of Markets and Business Networks, chapter Financial Time-series Analysis: a Brief Overview. Springer, Milan, 2007.
[70] Anirban Chakraborti, Ioane M. Toke, Marco Patriarca, and Frederic Abergel. Econophysics: Empirical facts and agent-based models. Sep 2009. http://arxiv.org/abs/0909.1974.
[71] D. Challet and D. Morton de la Chapelle. Collective portfolio optimization in brokerage data: The role of transaction cost structure. Market Microstructure: Confronting Many Viewpoints, 2012.
[72] D. Challet and D. Morton de la Chapelle. A robust measure of investor contrarian behaviour. Econophysics of Systemic Risk and Network Dynamics, 2013.
[73] D. Challet and R. Stinchcombe. Analyzing and modeling 1+1d markets. Physica A: Statistical Mechanics and its Applications, 300(1-2):285–299, 2001.
[74] D.
Challet and R. Stinchcombe. Analyzing and modeling 1+1d markets. Physica A, 300:285–299, 2001.
[75] D. Challet and R. Stinchcombe. Non-constant rates and over-diffusive prices in a simple model of limit order markets. Quantitative Finance, 3:155–162, 2003.
[76] Damien Challet, Matteo Marsili, and Yi-Cheng Zhang. Minority Games. Oxford University Press, illustrated edition, November 2004.
[77] Damien Challet and Robin Stinchcombe. Analyzing and modeling 1+1d markets. Physica A: Statistical Mechanics and its Applications, 300(1–2):285–299, November 2001.
[78] Ernest P. Chan. Quantitative Trading: How to Build Your Own Algorithmic Trading Business (Wiley Trading). November 2008.
[79] M.L. Chaudhry and J.G.C. Templeton. A first course in bulk queues. John Wiley and Sons, 1983.
[80] S.S. Chen, D.L. Donoho, and M.A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
[81] C. Chiarella and G. Iori. A simulation analysis of the microstructure of double auction markets. Quantitative Finance, 2(5):346–353, 2002.
[82] Carl Chiarella, Xue-Zhong He, and Cars Hommes. A dynamic analysis of moving average rules. Journal of Economic Dynamics and Control, 30(9-10):1729–1753, September 2006.
[83] E.S. Chornoboy, L.P. Schramm, and A.F. Karr. Maximum likelihood identification of neural point process systems. Biol Cybern, 59:265–275, 1988.
[84] J. Claerbout and F. Muir. Robust modeling with erratic data. Geophysics, 38(5):826–844, 1973.
[85] P. K. Clark. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41(1):135–55, January 1973.
[86] D. Cliff and J. Bruten. Zero is not enough: On the lower limit of agent intelligence for continuous double auction markets. Technical Report HPL-97-141, Hewlett-Packard Laboratories, Bristol, UK, 1997.
[87] K.J. Cohen, S.F. Maier, R.A. Schwartz, and D.K. Whitcomb. A simulation model of stock exchange trading. Simulation, 41(5):181, 1983.
[88] M.H. Cohen. Report of the special study of the securities markets of the Securities and Exchange Commission. Technical report, 1963. U.S. Congress, 88th Cong., 1st sess., H. Document 95.
[89] Milton H. Cohen. Reflections on the special study of the securities markets. Technical report, May 1963. Speech at the Practising Law Institute, New York.
[90] R. Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2):223–236, 2001.
[91] R. Cont. Volatility clustering in financial markets: Empirical facts and Agent-Based models. In Long Memory in Economics, pages 289–309. 2007.
[92] R. Cont and J.-P. Bouchaud. Herd behavior and aggregate fluctuations in financial markets. Macroeconomic Dynamics, 4(02):170–196, 2000.
[93] R. Cont and A. de Larrard. Price dynamics in a Markovian limit order book market. January 2011.
[94] R. Cont and A. de Larrard. Price dynamics in a Markovian limit order market. arXiv:1104.4596, April 2011.
[95] R. Cont and A. de Larrard. Order book dynamics in liquid markets: limit theorems and diffusion approximations. Preprint, 2012.
[96] R. Cont and A. de Larrard. Price dynamics in a Markovian limit order book market. SIAM Journal on Financial Mathematics, 2013.
[97] R. Cont, S. Stoikov, and R. Talreja. A stochastic model for order book dynamics. Operations Research, 58:549–563, 2010.
[98] R. Cont and P. Tankov. Financial modelling with jump processes. Chapman & Hall/CRC, 2004.
[99] A. C. C. Coolen. The Mathematical Theory Of Minority Games: Statistical Mechanics Of Interacting Agents. Oxford University Press, illustrated edition, January 2005.
[100] D. R. Cox and V. Isham. An Introduction to the Theory of Point Processes. Chapman & Hall, 1980.
[101] J. Da Fonseca and R. Zaatour. Clustering and mean reversion in a Hawkes microstructure model. Journal of Futures Markets, 2014.
[102] J. Da Fonseca and R. Zaatour.
Hawkes process: Fast calibration, application to trade clustering, and diffusive limit. Journal of Futures Markets, 34:548–579, 2014.
[103] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer, 2003.
[104] D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes. Vol. I. Probability and its Applications (New York). Springer-Verlag, New York, second edition, 2003. Elementary theory and methods.
[105] D. J. Daley and D. Vere-Jones. Scoring probability forecasts for point processes: the entropy score and information gain. J. Appl. Probab., 41A:297–312, 2004. Stochastic methods and their applications.
[106] D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes. Vol. II. Probability and its Applications (New York). Springer, New York, second edition, 2008. General theory and structure.
[107] M. G. Daniels, J. D. Farmer, L. Gillemot, G. Iori, and E. Smith. Quantitative model of price diffusion and market friction based on trading as a mechanistic random process. Physical Review Letters, 90:549–563, 2003.
[108] L. de Haan, S. Resnick, and H. Drees. How to make a Hill plot. The Annals of Statistics, 28(1):254–274, 2000.
[109] I.M. Dremin and A.V. Leonidov. On distribution of number of trades in different time windows in the stock market. Physica A: Statistical Mechanics and its Applications, 353:388–402, 2005.
[110] D. Duffie. Dynamic asset pricing theory. Princeton University Press, Princeton, NJ, 1996.
[111] M. Duflo. Méthodes récursives aléatoires. Masson, 1990.
[112] A. Dufour and R.F. Engle. Time and the price impact of a trade. Journal of Finance, 55:2467–2498, 2000.
[113] Z. Eisler, J.-P. Bouchaud, and J. Kockelkoren. Models for the impact of all order book events. Market Microstructure: Confronting Many Viewpoints, pages 115–136, 2012.
[114] Z. Eisler, J. Kertesz, F. Lillo, and R. N. Mantegna.
Diffusive behavior and the modeling of characteristic times in limit order executions. Quantitative Finance, 9(5):547–563, 2009.
[115] P. Embrechts, T. Liniger, and L. Lin. Multivariate Hawkes processes: an application to financial data. Journal of Applied Probability, 48A:367–378, 2011.
[116] R. F. Engle. The econometrics of Ultra-High-Frequency data. Econometrica, 68(1):1–22, 2000.
[117] R. F. Engle and J. R. Russell. Forecasting the frequency of changes in quoted foreign exchange prices with the autoregressive conditional duration model. Journal of Empirical Finance, 4(2-3):187–212, June 1997.
[118] R.F. Engle and A. Lunde. Trades and quotes: A bivariate point process. Journal of Financial Econometrics, 1:159–188, 1998.
[119] R.F. Engle and J.R. Russell. Forecasting the frequency of changes in quoted foreign exchange prices with the autoregressive conditional duration model. Journal of Empirical Finance, 4(2-3):187–212, 1997.
[120] R.F. Engle and J.R. Russell. Autoregressive conditional duration: A new model for irregularly spaced transaction data. Econometrica, 66(5):1127–1162, September 1998.
[121] T. W. Epps. Comovements in stock prices in the very short-run. Journal of the American Statistical Association, 74, 1979.
[122] Thomas W. Epps. Comovements in stock prices in the very short run. Journal of the American Statistical Association, 74(366):291–298, June 1979.
[123] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5:17–61, 1960.
[124] E. Errais, K. Giesecke, and L. Goldberg. Affine point processes and portfolio credit risk. SIAM Journal on Financial Mathematics, 2010.
[125] S. N. Ethier and T. G. Kurtz. Markov processes characterization and convergence. Wiley, 2005.
[126] E. F. Fama. Efficient capital markets: A review of theory and empirical work. Journal of Finance, 1969.
[127] F. Lillo and J. D. Farmer. The long memory of the efficient market. 2008.
[128] J. D. Farmer and D. Foley.
The economy needs agent-based modelling. Nature, 460(7256):685–686, 2009.
[129] J. D. Farmer, A. Gerig, F. Lillo, and S. Mike. Market efficiency and the long-memory of supply and demand: is price impact variable and permanent or fixed and temporary? Quantitative Finance, 6(2):107–112, 2006.
[130] J. D. Farmer, A. Gerig, F. Lillo, and H. Waelbroeck. How efficiency shapes market impact. 2013.
[131] J. D. Farmer, L. Gillemot, F. Lillo, S. Mike, and A. Sen. What really causes large price changes? Quantitative Finance, 4(4):383–397, 2004.
[132] J. D. Farmer, L. Gillemot, F. Lillo, S. Mike, and A. Sen. What really causes large price changes? Technical report, April 2004.
[133] J. D. Farmer, P. Patelli, and I. I. Zovko. The predictive power of zero intelligence in financial markets. Quantitative Finance Papers. cond-mat/0309233, arXiv.org, September 2003.
[134] J. D. Farmer, P. Patelli, and I. I. Zovko. The predictive power of zero intelligence in financial markets. Proceedings of the National Academy of Sciences, 102(6):2254–2259, 2005.
[135] J.D. Farmer, A. Gerig, F. Lillo, and S. Mike. Market efficiency and the long-memory of supply and demand: Is price impact variable and permanent or fixed and temporary? Quantitative Finance, 6(2):107–112, 2006.
[136] J.D. Farmer, P. Patelli, and I. Zovko. The predictive power of zero-intelligence in financial markets. PNAS, 102(1):2254–2259, 2005.
[137] W. Feller. Introduction to the Theory of Probability and its Applications, Vol. 2. Wiley, New York, 1968.
[138] A. Ferraris. Equity market impact models. 2008.
[139] T. Fletcher, Z. Hussain, and J. Shawe-Taylor. Multiple kernel learning on the limit order book. Quantitative Finance, preprint, 2010.
[140] L. Foata, M. Vidhamali, and F. Abergel. Multi-agent order book simulation: Mono- and multi-asset high-frequency market making strategies. Econophysics of Order-driven Markets, pages 139–152, 2011.
[141] Hans Föllmer and Alexander Schied.
Stochastic Finance: An Introduction In Discrete Time 2. Walter de Gruyter & Co, 2nd revised edition, November 2004.
[142] T. Foucault, O. Kadan, and E. Kandel. Limit order book as a market for liquidity. Review of Financial Studies, 18(4):1171–1217, December 2005.
[143] T. Foucault and A.J. Menkveld. Competition for order flow and smart order routing systems. Journal of Finance, 63:119–158, 2008.
[144] J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, February 2010.
[145] G.M. Furnival and R.W. Wilson. Regression by leaps and bounds. Technometrics, 16:499–511, 1974.
[146] X. Gabaix, P. Gopikrishnan, V. Plerou, and H. E. Stanley. Institutional investors and stock market volatility. Quarterly Journal of Economics, 121(2):461–504, May 2006.
[147] Xavier Gabaix. Power laws in economics and finance. Annual Review of Economics, 1(1):255–294, 2009.
[148] M. Gallegati and A. P. Kirman, editors. Beyond the Representative Agent. Edward Elgar Publishing, 1st edition, November 1999.
[149] M. B. Garman. Market microstructure. Journal of Financial Economics, 3(3):257–275, 1976.
[150] J. Gatheral. The volatility surface: a practitioner’s guide. Wiley, 2006.
[151] J. Gatheral. No-dynamic-arbitrage and market impact. Quantitative Finance, 10(7):749–759, 2010.
[152] J. Gatheral and R. Oomen. Zero-intelligence realized variance estimation. Finance and Stochastics, 14:249–283, 2010.
[153] J.F. Germain and F. Roueff. Weak convergence of the regularization path in penalized M-estimation. Scand. J. Stat., 37(3):477–495, 2010.
[154] L. R. Glosten. Is the electronic open limit order book inevitable? The Journal of Finance, 49(4):1127–1161, September 1994.
[155] P. W. Glynn and S. P. Meyn. A Liapounov bound for solutions of the Poisson equation. Annals of Probability, 24(2):916–931, 1996.
[156] D. K. Gode and S. Sunder.
Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. The Journal of Political Economy, 101(1):119–137, 1993.
[157] P. Gopikrishnan, M. Meyer, L. A. Amaral, and H. E. Stanley. Inverse cubic law for the probability distribution of stock price variations. The European Physical Journal B, 3:139, 1998.
[158] P. Gopikrishnan, V. Plerou, L. A. Amaral, M. Meyer, and H. E. Stanley. Scaling of the distribution of fluctuations of financial market indices. Phys. Rev. E, 60(5):5305–5316, Nov 1999.
[159] P. Gopikrishnan, V. Plerou, X. Gabaix, and H. E. Stanley. Statistical properties of share volume traded in financial markets. Physical Review E, 62(4):4493–4496, 2000.
[160] C. Gourieroux, J. Jasiak, and G. Le Fol. Intra-day market activity. Journal of Financial Markets, 2(3):193–226, August 1999.
[161] J. E. Griffin and R. C. A. Oomen. Sampling returns for realized variance calculations: tick time or transaction time? Econometric Reviews, 27(1):230–253, 2008.
[162] G. Gu and W. Zhou. Emergence of long memory in stock volatility from a modified Mike–Farmer model. Europhysics Letters, 86, 2009.
[163] G.-F. Gu, W. Chen, and W.-X. Zhou. Empirical shape function of limit-order books in the Chinese stock market. Physica A: Statistical Mechanics and its Applications, 387(21):5182–5188, September 2008.
[164] D.M. Guillaume, M.M. Dacorogna, R.R. Davé, U.A. Müller, R.B. Olsen, and O.V. Pictet. From the bird’s eye to the microscope: A survey of new stylized facts of the intra-daily foreign exchange markets. Finance and Stochastics, 1(2):95–129, 1997.
[165] N.H. Hakansson, A. Beja, and J. Kale. On the feasibility of automated market making by a programmed specialist. Journal of Finance, pages 1–20, 1985.
[166] A. D. Hall and N. Hautsch. Modelling the buy and sell intensity in a limit order book market. Journal of Financial Markets, 10(3):249–286, August 2007.
[167] J. Hasbrouck.
Measuring the information content of stock trades. Journal of Finance, 46(1):179–207, March 1991.
[168] J. Hasbrouck. Empirical market microstructure: The institutions, economics, and econometrics of securities trading. 2006.
[169] J. Hasbrouck. Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading. Oxford University Press, USA, 2007.
[170] J. Hasbrouck. Measuring the information content of stock trades. The Journal of Finance, 46(1):179–207, 2012.
[171] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2011.
[172] T. Hastie, R. Tibshirani, and J.H. Friedman. The Elements of Statistical Learning. Springer, corrected edition, July 2003.
[173] J.A. Hausman, A.W. Lo, and A.C. MacKinlay. An ordered probit analysis of transaction stock prices. Journal of Financial Economics, 31(3):319–379, June 1992.
[174] N. Hautsch. Modelling irregularly spaced financial data. Springer, 2004.
[175] A. G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83–90, April 1971.
[176] A. G. Hawkes and D. Oakes. A cluster process representation of a Self-Exciting process. Journal of Applied Probability, 11(3):493–503, September 1974.
[177] A.G. Hawkes. Point spectra of some mutually exciting point processes. Journal of the Royal Statistical Society. Series B (Methodological), 33(3):438–443, 1971.
[178] A.G. Hawkes and D. Oakes. A cluster process representation of a self-exciting process. Journal of Applied Probability, 11(3):493–503, 1974.
[179] Takaki Hayashi and Nakahiro Yoshida. On covariance estimation of non-synchronously observed diffusion processes. Bernoulli, 11(2):359–379, 2005.
[180] S. L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Rev. Financ. Stud., 6(2):327–343, April 1993.
[181] P. Hewlett. Clustering of order arrivals, price impact and trade path optimisation.
Workshop on Financial Modeling, 2006.
[182] P. Hewlett. Clustering of order arrivals, price impact and trade path optimisation. In Workshop on Financial Modeling with Jump processes, Ecole Polytechnique, 2006.
[183] B. M. Hill. A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3(5):1163–1174, September 1975.
[184] A. E. Hoerl and R. W. Kennard. Ridge Regression: Biased Estimation for Nonorthogonal Problems. 1970.
[185] A.E. Hoerl, R.W. Kennard, and K.F. Baldwin. Ridge Regression: Some Simulations. 1975.
[186] B. Hollifield, R. A. Miller, and P. Sandås. Empirical analysis of limit order markets. The Review of Economic Studies, 71(4):1027–1063, October 2004.
[187] U. Horst and M. Paulsen. A law of large numbers for limit order books. 2015.
[188] W. Huang, Y. Nakamori, and S. Wang. Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32(10):2513–2522, 2005.
[189] N. Huth and F. Abergel. The times change: Multivariate subordination, empirical facts. SSRN eLibrary, March 2009.
[190] N. Huth and F. Abergel. High frequency correlation modelling. Econophysics of order-driven markets, pages 189–202, 2011.
[191] N. Huth and F. Abergel. High frequency lead/lag relationships-empirical facts. arXiv preprint arXiv:1111.7103, 2011.
[192] N. Huth and F. Abergel. The times change: multivariate subordination. Empirical facts. Quantitative Finance, 12(1):1–10, 2012.
[193] Giulia Iori, Saqib Jafarey, and Francisco G. Padilla. Systemic risk on the interbank market. Journal of Economic Behavior & Organization, 61(4):525–542, December 2006.
[194] Giulia Iori and Ovidiu V. Precup. Weighted network analysis of high-frequency cross-correlation measures. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 75(3):036110–7, March 2007.
[195] P. C. Ivanov, A. Yuen, B. Podobnik, and Y. Lee. Common scaling patterns in intertrade times of U.S. stocks. Phys. Rev.
E, 69(5):056107, May 2004.
[196] J. Jacod and A. N. Shiryaev. Limit theorems for stochastic processes. Springer, 2003.
[197] Jean Jacod. Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 31(3):235–253, 1975.
[198] E. Jondeau, A. Perilla, and M. Rockinger. Optimal liquidation strategies in illiquid markets. Swiss Finance Institute Research Paper No.09-24, November 2008.
[199] N. Kaldor. Economic growth and capital accumulation. The Theory of Capital, Macmillan, London, 1961.
[200] N. El Karoui, S. Peng, and M. C. Quenez. Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71, 1997.
[201] D.B. Keim and A. Madhavan. The upstairs market for large-block transactions: Analysis and measurement of price effects. Review of Financial Studies, 9(1):1–36, 1996.
[202] M. G. Kendall and A.B. Hill. The analysis of economic time-series-Part I: Prices. Journal of the Royal Statistical Society. Series A (General), pages 11–34, 1953.
[203] W. Cheney and D. Kincaid. Numerical Mathematics and Computing. 2008.
[204] C. P. Kindleberger and R. Z. Aliber. Manias, Panics, And Crashes: A History Of Financial Crises. John Wiley & Sons, fifth edition, 2005.
[205] A. P. Kirman. In R. Frydman and E.S. Phelps, editors, Individual Forecasting and Aggregate Outcomes: Rational Expectations Examined, chapter On mistaken beliefs and resultant equilibria. Cambridge University Press, 1983.
[206] A. P. Kirman. Whom or what does the representative individual represent? The Journal of Economic Perspectives, pages 117–136, 1992.
[207] A. P. Kirman. Ants, rationality, and recruitment. The Quarterly Journal of Economics, pages 137–156, 1993.
[208] A. P. Kirman. Reflections on interaction and markets. Quantitative Finance, 2(5):322–326, 2002.
[209] L. Koskinen. Handbook of financial time series. International Statistical Review, 78:148–149, 2010.
[210] U. Krengel and A. Brunel. Ergodic theorems. Walter de Gruyter, 1985. [211] C.M. Kuan and T. Liu. Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics, 10:347–364, 1995. [212] L. Kullmann, J. Toyli, J. Kertesz, A. Kanto, and K. Kaski. Characteristic times in stock market indices. Physica A: Statistical Mechanics and its Applications, 269(1):98 – 110, 1999. [213] A. S. Kyle. Continuous auctions and insider trading. Econometrica, 53(6):1315–1335, November 1985. [214] P. Lakner, J. Reed, and S. Stoikov. High frequency asymptotics for the limit order book. http://pages.stern.nyu.edu/ jreed/Papers/LimitFinal.pdf, 2013. Preprint. [215] J. Large. Measuring the resiliency of an electronic limit order book. Journal of Financial Markets, 10(1):1–25, February 2007. [216] J.H. Large. Measuring the resiliency of an electronic limit order book. Journal of Financial Markets, 10(1):1–25, 2007. [217] J. F. Lawless and P. Wang. A Simulation Study of Ridge and Other Regression Estimators. 1976. [218] B. LeBaron. Agent-based computational finance. Handbook of computational economics, 2:1187–1233, 2006. [219] B. LeBaron. Agent-based financial markets: Matching stylized facts with style. In David C. Colander, editor, Post Walrasian macroeconomics, page 439. Cambridge University Press, 2006. [220] B LeBaron and R. Yamamoto. Long-memory in an order-driven market. Physica A: Statistical Mechanics and its Applications, 383(1):85–89, September 2007. [221] F. Lillo and J. D. Farmer. The long memory of the efficient market. Studies in Nonlinear Dynamics & Econometrics, 8(3), 2004. [222] F. Lillo and J. D. Farmer. The long memory of the efficient market. Studies in Nonlinear Dynamics & Econometrics, 8(3), September 2004. [223] F. Lillo, J. D. Farmer, and R. N. Mantegna. Econophysics: Master curve for price-impact function. Nature, 421(6919):130, 129, 2003. [224] F. Lillo, J. D. Farmer, and Mantegna R.N. 
Master curve for price impact function. Nature, 421(6919):129–130, 2003. 204 [225] F. Lillo and J.D. Farmer. The long memory of the efficient market. Studies in Nonlinear Dynamics & Econometrics, 8(3), 2004. [226] T.F. Liniger. Multivariate Hawkes Processes. PhD thesis, ETH Zürich, 2009. [227] J. Linnainmaa and I. Rosu. Weather and time series determinants of liquidity in a limit order market. 2009. [228] A. Lo, A. MacKinlay, and J. Zhang. Econometric models of limit-order executions. NBER WORKING PAPER SERIES, 1997. [229] A. Lo, A. MacKinlay, and J. Zhang. Weather and time series determinants of liquidity in a limit order market. 2009. [230] A.W. Lo, A. Craig MacKinlay, and J. Zhang. Econometric models of limit-order executions. Journal of Financial Economics, 65(1):31–71, jul 2002. [231] T. Lux. Handbook of Research on Complexity, chapter Applications of Statistical Physics in Finance and Economics. Edward Elgar Publishing Ltd, May 2009. [232] T. Lux, M. Marchesi, and G. Italy. Volatility clustering in financial markets. International Journal of Theoretical and Applied Finance, 3:675–702, 2000. [233] T. Lux and F. Westerhoff. Economics crisis. Nature Physics, 5(1):2–3, 2009. [234] Thomas Lux and D Sornette. On rational bubbles and fat tails. Journal of Money, Credit, and Banking, 34(3a):589–610, 2002. [235] B. G. Malkiel. A random walk down wall street. 1973. [236] Paul Malliavin and Maria Elvira Mancino. Fourier series method for measurement of multivariate volatilities. Finance and Stochastics, 6(1):49–61, 2002. [237] B. Mandelbrot. The variation of certain speculative prices. Journal of business, 36(4):394, 1963. [238] R. N. Mantegna and H. E. Stanley. Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, 1 edition, July 2007. [239] G. Maruyama and H. Tanaka. Ergodic property of n-dimensional recurrent markov processes. Mem. Fac. Sci. Kyushu Univ. Ser. A, 13:157–172, 1959. [240] S. Maslov. 
Simple model of a limit order-driven market. Physica A: Statistical Mechanics and its Applications, 278(3-4):571–578, 2000. [241] S. Maslov and M. Mills. Price fluctuations from the order book perspective – empirical facts and a simple model. Physica A: Statistical Mechanics and its Applications, 299(1-2):234–246, 2001. [242] M. McAleer and M. C. Medeiros. Realized volatility: A review. Econometric Reviews, 27(1):10– 45, 2008. 205 [243] L. Meier, S. Van de Geer, and P. Buehlmann. High-dimensional additive modeling. jun 2008. [244] S. Meyn and R.L. Tweedie. Stability of markovian processes I: criteria for discrete-time chains. Advances in Applied Probability, 24:542–574, 1992. [245] S. Meyn and R.L. Tweedie. Stability of markovian processes II: continuous-time processes and sampled chains. Advances in Applied Probability, 25:487–517, 1993. [246] S. Meyn and R.L. Tweedie. Stability of markovian processes III: Foster-lyapunov criteria for continuous-time processes. Advances in Applied Probability, 25:518–548, 1993. [247] S. Meyn and R.L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2nd ed, 2009. [248] S. Mike and J. D. Farmer. An empirical behavioral model of liquidity and volatility. Journal of Economic Dynamics and Control, 32(1):200–234, 2008. [249] J. Møller and J.G. Rasmussen. Approximate simulation of hawkes processes, 2004. [250] J. Møller and G.L. Torrisi. The pair correlation function of spatial Hawkes processes. Statist. Probab. Lett., 77:995–1003, 2007. [251] Jesper Møller and Jakob G. Rasmussen. Perfect simulation of hawkes processes. Advances in Applied Probability, 37(3):629–646, 2005. [252] I. Muni Toke. A note on analytical properties of a one-side order book model with exponential order flows. Submitted. [253] I. Muni Toke. Market Making in an order book model and its impact on the spread. In F. Abergel, B. K. Chakrabarti, A. Chakraborti, and M. Mitra, editors, Econophysics of Order-driven Markets, pages 49–64. 
Springer, 2011. [254] J. Murphy. Technical analysis of the financial markets: A comprehensive guide to trading methods and applications. [255] R. Næs and J. A Skjeltorp. Order book characteristics and the volume–volatility relation: Empirical evidence from a limit order market. Journal of Financial Markets, 9(4):408–432, 2006. [256] A. A. Obizhaeva and J. Wang. Optimal trading strategy and supply/demand dynamics. Journal of Financial Markets, to appear, 2012. [257] Y. Ogata. The asymptotic behaviour of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, 30(1):243–261, dec 1978. [258] Y. Ogata. On lewis’ simulation method for point processes. Information Theory, IEEE Transactions on, 27(1):23–31, jan 1981. [259] Y. Ogata. On Lewis’ simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31, 1981. 206 [260] Y. Ogata. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, pages 9–27, 1988. [261] M. O’Hara. Market Microstructure Theory. Blackwell Publishers, second edition edition, November 1997. [262] M. F. M. Osborne. Brownian motion in the stock market. Operations Research, 7(2):145–173, March 1959. [263] F. O’Sullivan, B.S. Yandell, and W.J. Raynor. Automatic smoothing of regression functions in generalized linear models. the Journal of the American Statistical Association, 81:96–103, 1986. [264] A. Pagan. The econometrics of financial markets. Journal of empirical finance, 3(1):15–102, 1996. [265] C. Parlour and D. Seppi. Limit order markets: A survey. pages 61–96, 2008. [266] C. A. PArlour and D. Seppi. Limit order markets: a survey. In Handbook of Financial Intermediation & Banking, pages 63–96. Elsevier, 2008. [267] C.A. Parlour. Price dynamics in limit order markets. Review of Financial Studies, 11(4):789–816, 1998. [268] V. Plerou, P. Gopikrishnan, L. A. Nunes Amaral, X. 
Gabaix, and E. H. Stanley. Economic fluctuations and anomalous diffusion. Phys. Rev. E, 62(3):R3023–R3026, Sep 2000. [269] M. Politi and E. Scalas. Fitting the empirical distribution of intertrade durations. Physica A: Statistical Mechanics and its Applications, 387(8-9):2025 – 2034, 2008. [270] F. Pomponio and F. Abergel. Trade-throughs : Empirical facts - application to lead-lag measures. SSRN eLibrary, 2010. [271] F. Pomponio and F. Abergel. Trade-throughs: Empirical facts and application to lead-lag measures. Econophysics of Order-driven Markets, pages 3–16, 2011. [272] F. Pomponio and F. Abergel. Trade-throughs: Empirical facts and application to lead-lag measures. In F. Abergel, B.K. Chakrabarti, A. Chakraborti, and M. Mitra, editors, Econophysics of Order-driven Markets, New Economic Windows, pages 3–16. Springer Milan, 2011. [273] F. Pomponio and I. Muni Toke. Modelling trade-throughs in a limited order book using Hawkes processes. Economics: The Open-Access, Open-Assessment E-Journal, 6, 2012. [274] D.C. Porter. The probability of a trade at the ask: An examination of interday and intraday behavior. Journal of Financial and Quantitative Analysis, 27(2):209–227, 1992. [275] M. Potters and J. P. Bouchaud. More statistical properties of order books and price impact. Physica A: Statistical Mechanics and its Applications, 324(1-2):133–140, 2003. 207 [276] M. Potters and J.-P. Bouchaud. More statistical properties of order books and price impact. Physica A: Statistical Mechanics and its Applications, 324(1–2):133–140, June 2003. [277] M. Potters and J.P. Bouchaud. More statistical properties of order books and price impact. Physica A, 324(1-2):133–140, 2003. [278] S. Predoiu, G. Shaikhet, and S. Shreve. Optimal execution in a general one-sided limit-order book. SIAM Journal on Financial Mathematics, 2(1):183–212, January 2011. [279] T. Preis, S. Golke, W. Paul, and J. J. Schneider. Multi-agent-based order book model of financial markets. 
Europhysics Letters (EPL), 75(3):510–516, August 2006. [280] T. Preis, S. Golke, W. Paul, and J. J. Schneider. Statistical analysis of financial returns for a multiagent order book model of asset trading. Physical Review E, 76(1), 2007. [281] G. Pristas. Limit order book and asset liquidity. Cuvillier, 2008. [282] N. Privault. Understanding Markov Chains. Springer, 2013. [283] M. Raberto, S. Cincotti, S.M. Focardi, and M. Marchesi. Agent-based simulation of a financial market. Physica A: Statistical Mechanics and its Applications, 299(1-2):319–327, 2001. [284] A. Rakotomamonjy, F. Bach, F. Canu, and Y. Grandvalet. Simplemkl. Journal of Machine Learning Research, 9:2491–2521, nov 2008. [285] E. Renault and B.J.M. Werker. Stochatic volatility models with transaction time risk. 2005. [286] R. Reno. A closer look at the epps effect. International Journal of Theoretical and Applied Finance, 6(1):87–102, 2003. [287] P. Reynaud-Bouret and E. Roy. Some non asymptotic tail estimates for Hawkes processes. Bull. Belg. Math. Soc. Simon Stevin, 13:883–896, 2006. [288] P. Reynaud-Bouret and S. Schbath. Adaptive estimation for hawkes processes; application to genome analysis. Ann. Statist., 38:2781–2822, 2010. [289] J. D. Riley. Solving systems of linear equations with a positive definite, symmetric, but possibly ill-conditioned matrix. 1955. [290] I. Rocu. A dynamic model of the limit order book. Review of Financial Studies, 22(11):4601– 4641, 2009. [291] R. Roll. A simple implicit measure of the effective bid-ask spread in an efficient market. The Journal of Finance, 49:1127–1139, 1984. [292] J. R. Russell and T. Kim. A new model for limit order book dynamics. In Tim Bollerslev, Jeffrey Russell, and Mark Watson, editors, Volatility and Time Series Econometrics: Essays in Honor of Robert Engle. Oxford University Press, USA, April 2010. 208 [293] M. N Saha, B. N. Srivastava, and MN & Srivastava Saha. A treatise on heat. Indian Press, Allahabad, 1950. 
[294] E Samanidou, E Zschischang, D Stauffer, and T Lux. Agent-based models of financial markets. Reports on Progress in Physics, 70(3):409–450, 2007. [295] F. Santosa and W.W. Symes. Linear inversion of band-limited reflection seismograms. SIAM Journal on Scientific and Statistical Computing, 7(4):1307–1330, 1986. [296] J. B. Seaborn. Hypergeometric Functions and their Applications. Number 8 in Text in Applied Mathematics. Springer, 1991. [297] G. A. F. Seber and A. J. Lee. Linear Regression Analysis. Wiley, 2003. [298] J. Shadbolt and J.G. Taylor. Neural networks and the financial markets: predicting, combining and portfolio optimisation. 2002. [299] A. C Silva and V. M. Yakovenko. Stochastic volatility of financial markets as the fluctuating rate of trading: An empirical study. Physica A: Statistical Mechanics and its Applications, 382(1):278 – 285, 2007. [300] S. Sinha, A. Chatterjee, A. Chakraborti, and B. K. Chakrabarti. Econophysics: An Introduction. Wiley-VCH, 1 edition, November 2010. [301] F. Slanina. Mean-field approximation for a limit order driven market model. Phys. Rev. E, 64(5):056136, Oct 2001. [302] F. Slanina. Critical comparison of several order-book models for stock-market fluctuations. The European Physical Journal B-Condensed Matter and Complex Systems, 61(2):225–240, 2008. [303] E. Smith, J. D. Farmer, L. Gillemot, and S. Krishnamurthy. Statistical theory of the continuous double auction. Quantitative Finance, 3(6):481–514, 2003. [304] John N. Smithin. What Is Money? Routledge, October 1999. [305] G.J. Stigler. Public regulation of the securities markets. Journal of Business, pages 117–142, 1964. [306] N. Taleb. Fooled by randomness: The hidden role of chance in the markets and in life. 2001. [307] H.L. Taylor, S.C. Banks, and J.F. McCoy. Deconvolution with the l 1 norm. Geophysics, 44(1):39– 52, 1979. [308] S. Thurner, J. D Farmer, and J. Geanakoplos. Leverage causes fat tails and clustered volatility. Preprint, 2009. [309] R. Tibshirani. 
Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1994. 209 [310] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Roayl Statistical Society. Series B, 58(1):267–288, 1996. [311] Ioane Muni Toke. "market making" behaviour in an order book model and its impact on the bid-ask spread. Quantitative finance papers, arXiv.org, 2010. [312] B. Tóth, Y. Lempérière, C. Deremble, J. de Lataillade, J. Kockelkoren, and J.-P. Bouchaud. Anomalous price impact and the critical nature of liquidity in financial markets. Physical Review X, 1, 2011. [313] G. Vidyamurthy. Pairs trading: Quantitative methods and analysis. 2004. [314] J. Voit. The statistical mechanics of financial markets. Texts and monographs in physics. Springer, 3 edition, 2005. [315] M. von Smoluchowski. Zur kinetischen theorie der brownschen molekularbewegung und der suspensionen. Annalen der Physik, 326(14):756–780, 1906. [316] R. F. Voss. Fractals in nature: from characterization to simulation. pages 21–70, 1988. [317] S. Walczak. An empirical analysis of data requirements for financial forecasting with neural networks. J. Manage. Inf. Syst., 17:203–222, 2001. [318] P. Weber and B. Rosenow. Order book approach to price impact. Quantitative Finance, 5(4):357– 364, 2005. [319] P. Weber and B. Rosenow. Large stock price changes: volume or liquidity? Quantitative Finance, 6(1):7–14, 2006. [320] W. Whitt. Stochastic-process limits. Springer Series in Operations Research. Springer-Verlag, New York, 2002. An introduction to stochastic-process limits and their application to queues. [321] WIKIPEDIA. High-frequency trading. http://en.wikipedia.org/wiki/ High-frequency_trading, jan 2011. [322] M. Wyart and J.-P. Bouchaud. Self-referential behaviour, overreaction and conventions in financial markets. Journal of Economic Behavior & Organization, 63(1):1–24, May 2007. [323] M. Wyart, J.-P. Bouchaud, J. Kockelkoren, M. 
Potters, and M. Vettorazzo. Relation between bid-ask spread, impact and volatility in order-driven markets. Quantitative Finance, 8(1):41–57, 2008. [324] J. Cacho-Diaz Y. Aït-Sahalia and R.J.A. Laeven. Modelling financial contagion using mutually exciting hawkes processes. Preprint, 2013. [325] Victor M. Yakovenko and J. Barkley Rosser. Colloquium: Statistical mechanics of money, wealth, and income. Reviews of Modern Physics, 81(4):1703, December 2009. 210 [326] Y. Yudovina. A simple model of a limit order book. http://arxiv.org/abs/1205.7017, 2012. Preprint. [327] B. Zheng, E. Moulines, and F. Abergel. Price jump prediction in limit order book. Journal of Mathematical Finance, 3(2):242–255, 2012. [328] B. Zheng, F. Roueff, and F. Abergel. Modelling bid and ask prices using constrained hawkes processes: Ergodicity and scaling limit. SIAM Journal on Financial Mathematics, 5:99–136, 2014. [329] J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 14:185–205, 2005. [330] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, 67(2):301–320, 2005. 211