Pearson Correlation Coefficient

The Pearson correlation coefficient (or Pearson coefficient) between x and y is defined by:

r(x,y) = Cov(x,y) / sqrt(Var(x) Var(y))

where Cov(x,y) is the covariance of x and y and Var(x) is the variance of x.

For the forex time series we analyze, the mean is typically at least two orders of magnitude smaller than the RMS. For this reason we often neglect the mean. Then Cov(x,y) is simply the amplitude of the zero-lag bin of the cross-correlation function, and Var(x) is the amplitude of the zero-lag bin of the autocorrelation function. When dealing with covariance alone, one does not know whether a change in it reflects a change in the strength of the correlation between x and y or a change in the strength of their independent variation. The Pearson correlation coefficient lets one analyze the tightness of the correlation between two quantities as such, leaving aside the overall strength of their variation (correlated or not).
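The point about covariance versus the Pearson coefficient can be illustrated with a minimal sketch. It assumes the means are negligible, as described above, and uses synthetic Gaussian series rather than forex data; the function names `zero_lag_product` and `pearson_no_mean` are hypothetical.

```python
import math
import random

def zero_lag_product(x, y):
    """Average of x[i] * y[i]: the zero-lag bin when the mean is neglected."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    return sum(a * b for a, b in zip(x, y)) / len(x)

def pearson_no_mean(x, y):
    """Pearson coefficient under the assumption that both means are ~0."""
    cov = zero_lag_product(x, y)
    return cov / math.sqrt(zero_lag_product(x, x) * zero_lag_product(y, y))

# Synthetic stand-in data: a shared component plus independent noise.
random.seed(1)
common = [random.gauss(0.0, 1.0) for _ in range(5000)]
x = [c + random.gauss(0.0, 1.0) for c in common]
y = [c + random.gauss(0.0, 1.0) for c in common]
x10 = [10.0 * v for v in x]  # same series, ten times the amplitude

cov = zero_lag_product(x, y)
cov_scaled = zero_lag_product(x10, y)   # grows tenfold with the amplitude
r = pearson_no_mean(x, y)
r_scaled = pearson_no_mean(x10, y)      # unchanged by the rescaling
```

Scaling one series by ten multiplies the covariance by ten but leaves the Pearson coefficient where it was, which is why the coefficient isolates the tightness of the correlation.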

Trading System

In our usage, a trading system is an algorithm that decides what to buy or sell, when, and with what allocation of capital, so as to maximize profit and minimize risk. Such decisions are made regularly and are based on a variety of input data reflecting the changing market environment and prior history. The adequate level of complexity is high enough to require that a trading system be implemented as a computer program. Order execution may, but need not, fall within the scope of a trading system in our usage of the word.

Under conditions of complete market efficiency (when the price quote time series is a martingale) there is no need for a trading system in our sense of the word: Modern Portfolio Theory will suffice. In some contexts the meaning of the term is narrowed to denote an electronic or computer system that merely executes external orders rather than generating them.

One may argue (as does Taleb) that everyday human experience which emphasizes cooperation in a more or less deterministic environment prepares us poorly for survival in the markets which are random to a very high degree. Our brain may be poorly equipped to deal with the randomness, let alone detect those traces of predictability and order which do exist in it. A response to this challenge may be to use higher faculties of our brain to build trading systems around abstract concepts (which are beyond the reach of computers) and then leave to computers the execution of routine decision making (counting odds) according to those systems.

Developing, back-testing and marketing buy-side trading systems for forex traders is the main goal of the Forex Automaton™ project.


Quote Currency

Quote currency is the second currency of the currency pair. When the pair is represented as a ratio, quote currency is in the denominator of the ratio. For example, in USD/CHF, CHF is the quote currency. The price quote shows how much of the quote currency a unit of the base currency will buy.

Base Currency

Base currency is the first currency of the currency pair. When the currency pair is represented as a ratio, like USD/CHF, or EUR/USD, the base currency is in the numerator. The price quote shows how much of the quote currency a unit of the base currency will buy.


Trend

To avoid excessive complexity, by “trend” associated with a given time interval we mean simply the sign of the difference between the price quotes at the end and at the beginning of this time interval, no matter how large or small that difference is.
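As a minimal sketch of this definition, the function below returns only the sign of the end-minus-start difference. The name `trend` and the sample quotes are hypothetical.

```python
def trend(prices):
    """Return +1, -1 or 0: the sign of (last quote - first quote)."""
    if len(prices) < 2:
        raise ValueError("need at least two quotes to define a trend")
    diff = prices[-1] - prices[0]
    return (diff > 0) - (diff < 0)

# Hypothetical quotes: large swings in between do not matter,
# only the first and the last quote do.
quotes = [1.1050, 1.1300, 1.0900, 1.1051]
direction = trend(quotes)
```

Here the interval counts as an uptrend even though the net change is a single pip, which is exactly the simplification the definition makes.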


LIBOR: London Interbank Offered Rate. This is a money market rate offered on the interbank market for unsecured investments of varying maturity, denominated in various currencies. It is a popular indicator of the cost of capital. LIBORs are “fixed” daily at 11am (London time) by the British Bankers’ Association (BBA), which gathers input from the LIBOR Panel, a body representing a few banks believed to be important in the interbank market. The input is supposed to reflect each panel participant's evaluation of the state of the money market on that day (based on actual interbank loan offers). BBA publishes the data for EUR, USD, GBP, JPY, CHF, CAD, AUD, DKK, NZD, and SEK. The maturities tracked are s/n-o/n (a BBA abbreviation for spot/next – overnight), one week, two weeks, and every whole number of months from one to twelve. If you need historical LIBOR interest rate data, BBA is the place to go.


RMS (or RMSD): Root-Mean-Square Deviation. In accordance with ROOT terminology (a de facto standard in certain circles, for better or worse), RMS denotes the standard deviation, and it would be better to call it RMSD. Often, RMS does not involve mean subtraction, but in our usage it does.

RMS is a measure of the width of the distribution, corresponding to sigma in the case of a Gaussian distribution.
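The difference between the two conventions can be made concrete with a short sketch. Dividing by N rather than N-1 is an assumption made here for simplicity, and the function names and sample values are hypothetical.

```python
import math

def rms_deviation(values):
    """RMS with mean subtraction (our usage): population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def rms_raw(values):
    """RMS without mean subtraction (the other common convention)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean is 5
width = rms_deviation(sample)
raw = rms_raw(sample)
```

The two are related by raw² = width² + mean², so they coincide only when the mean is negligible, as it typically is for the forex returns discussed here.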


Cross-Correlation

In our usage cross-correlation is a generalization of autocorrelation to a pair of time series:

C(T) = E[x(t)·y(t+T)], averaged over all available t

where T is the time lag. We will also use the more context-specific term intermarket correlation for certain cross-correlations. Just as significant autocorrelation at non-zero time lags lets one predict (in the statistical sense) the process on the basis of its own past history (“auto-” means “self-”), significant cross-correlation at non-zero time lags lets one predict X on the basis of Y (in the same sense, with the same caveats) or Y on the basis of X, depending on the sign of T where the signal occurs. For example, if you notice that XXX/YYY lags behind UUU/VVV, and you know that UUU/VVV just went up, you can go long on XXX/YYY and have a better than average chance of ending up with a winning trade.
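The lead-lag idea can be sketched as follows. This is an illustration under stated assumptions, not the code behind our published data: the series are synthetic, the means are neglected, and the names `cross_correlation`, `leader` and `follower` are hypothetical.

```python
import random

def cross_correlation(x, y, lag):
    """Average of x[t] * y[t + lag] over all t where both samples exist."""
    pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
    if not pairs:
        raise ValueError("lag too large for the series length")
    return sum(a * b for a, b in pairs) / len(pairs)

random.seed(7)
n = 5000
leader = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical follower: echoes the leader one step later, plus its own noise.
follower = [0.0] + [0.8 * leader[t - 1] + random.gauss(0.0, 0.6) for t in range(1, n)]

c_plus = cross_correlation(leader, follower, 1)    # leader now, follower one step later
c_zero = cross_correlation(leader, follower, 0)    # simultaneous moves
c_minus = cross_correlation(leader, follower, -1)  # follower first, leader later
```

Only the lag of +1 carries a signal in this construction, so the sign of the lag tells you which series can be used to anticipate the other.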

Feel free to browse our collection of cross-correlation data for the forex markets — some of them are highly non-trivial.


Martingale

Fig.1: A martingale market synthesized from hourly returns in AUD/JPY, over time. Time axis is labeled in MM-YY format. By construction, there is no predictable trend other than the long-term trend created by the tiny positive deviation from zero of the average hourly return. Are you a chartist? Do you believe in moving averages or Elliott waves? Do you feel you could day-trade this market? This is a particular example of what we refer to as “fair game” and for all practical purposes take to represent the embodiment of the efficient market hypothesis.

To synthesize such a chart, you first obtain the returns from the real time series and histogram them to get their distribution. Then you start with an arbitrary price (1 was used in this case) and generate a random return according to that distribution (a software package such as ROOT lets you do that). Having a starting price and a return, you obtain the next price in the series. You can continue this random walk for as long as you want. It may be counterintuitive to some that a random walk looks like this (Fig.1). Indeed, you see a chart where you might be tempted to identify trend lines, points where the trend changes, and possibly even lines of support and resistance. From this standpoint you can understand the source of our Olympian attitude towards all kinds of current news: the pseudo-random market in Fig.1 could generate a very rich stream of news reports and “current analysis”, all totally content-free by construction. Chartists and reporters must admit that tools and concepts that let one distinguish between predictability and randomness are peripheral to their method of operation. These tools and concepts are, however, central to the Forex Automaton™ approach. The martingale is one such concept.
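The synthesis procedure described above can be sketched in plain Python. This is not the ROOT-based code used for Fig.1. The sketch assumes logarithmic returns, equal-width histogram bins and a uniform draw inside the chosen bin, and it uses Gaussian stand-in data in place of real AUD/JPY returns; all function names are hypothetical.

```python
import math
import random

def build_histogram(values, n_bins):
    """Return (low_edge, bin_width, counts) for equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("returns are all identical; nothing to histogram")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return lo, width, counts

def sample_return(lo, width, counts, rng):
    """Pick a bin with probability proportional to its count, then a point inside it."""
    index = rng.choices(range(len(counts)), weights=counts)[0]
    return lo + (index + rng.random()) * width

def synthesize_prices(observed_returns, n_steps, start=1.0, n_bins=100, seed=0):
    """Random walk whose steps are drawn independently from the observed distribution."""
    rng = random.Random(seed)
    lo, width, counts = build_histogram(observed_returns, n_bins)
    prices = [start]
    for _ in range(n_steps):
        r = sample_return(lo, width, counts, rng)  # no memory of earlier steps
        prices.append(prices[-1] * math.exp(r))    # assumes logarithmic returns
    return prices

# Stand-in for real hourly returns (assumption: Gaussian, zero mean).
data_rng = random.Random(42)
observed = [data_rng.gauss(0.0, 0.001) for _ in range(2000)]
prices = synthesize_prices(observed, n_steps=1000)
```

Because each step is drawn without reference to earlier steps, the synthetic series keeps the one-point distribution of the real returns while discarding any memory the real market may have had.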

A martingale is a stochastic process (a time series such as a forex exchange rate) in which the expectation of any element does not depend on the prehistory (although properties of its distribution other than the expectation may depend on the prehistory).

By implication, the expectation for a change in an exchange rate to be recorded just now does not depend on a similar change recorded time T before now. The same would apply to returns or logarithmic returns.

The efficient market hypothesis can be restated to read that time series elements x(t) and x(t+T) are statistically independent variables for any value of T (other than, of course, 0) and therefore the autocorrelation of the time series is zero (except for, of course, at the zero time lag). Therefore a martingale market is the “efficient market”. Thus the efficient market hypothesis becomes falsifiable via observation of correlations.

In “Forecast of future prices, unbiased markets and martingale models” (Journal of Business, V.39, January 1966, 242-255) Benoit Mandelbrot defines martingale in a weaker and a stronger sense: “…to define a martingale, one may begin by postulating that it is possible to speak of a single value for E[Z(t+T)|Z(t)], without having to specify by which past values this expectation is conditioned. In a later stage, one will add the postulate that E[Z(t+T)|Z(t)]=Z(t).” I am bringing this up because the martingales constructed by random sampling from observed distributions of returns will never fall into the “later stage” category since the mean of the observed returns is never exactly zero. But they are certainly constructed to have Z(t+T) unconditioned by any specific past and therefore are martingales in the sense of the initial definition.

AUD/JPY autocorrelation compared with martingale autocorrelation

Fig.2: Autocorrelations of the martingale market from Fig.1 (red) and of the actual AUD/JPY for the same time period (green).

You might be misled into believing that Fig.1 shows a predictable market. But it is the autocorrelation in Fig.2 that tells the difference between actual (predictable to some extent) and pseudo-random behavior.


Fig.3: A spike in hourly AUD/JPY is the kind of pattern that causes the autocorrelation feature seen in Fig.2.

The remarkable feature of Fig.2 that distinguishes the real data and is absent from the simulation is the downward spikes at the one-hour lag. Fig.3 illustrates the kind of real-market feature this corresponds to: a market jumping up and down (or down and up) on the hour-by-hour time scale. This happens when markets jump the gun or have a knee-jerk reaction to something, which they later come to regret, figuratively speaking. The autocorrelation in Fig.2 tells you that this happens in real-life AUD/JPY a lot more often, or with a lot higher magnitude, than it does in the memory-free simulation (when you do not have memory, you do not regret!). Why does this make the market “predictable”? Because statistically, you can bet on the market regretting the knee-jerk and be right more often than not. A forex trading system, such as the one under development here, can be built to continuously analyze the situation, learn patterns like the one shown, and beat the market with confidence.
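The "bet on regret" argument can be illustrated with a toy model. This is not our trading system and does not use real AUD/JPY data: it assumes a hypothetical market in which each hourly move is partly undone in the next hour, and it ignores spreads and transaction costs. The function name `contrarian_win_rate` is hypothetical.

```python
import random

random.seed(3)
n = 20000
shocks = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
# Hypothetical "knee-jerk" market: half of each shock is reversed one hour later.
returns = [shocks[t] - 0.5 * shocks[t - 1] for t in range(1, n + 1)]
# Memory-free reference: the same returns in random order.
shuffled = returns[:]
random.shuffle(shuffled)

def contrarian_win_rate(r):
    """Fraction of steps whose move has the opposite sign to the previous move."""
    wins = sum(1 for prev, cur in zip(r, r[1:]) if prev * cur < 0)
    return wins / (len(r) - 1)

rate_with_memory = contrarian_win_rate(returns)
rate_memory_free = contrarian_win_rate(shuffled)
```

In the series with memory, betting against the previous move wins noticeably more than half the time, while in the shuffled series the same bet is a coin toss.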


Autocorrelation

We use autocorrelation to quantify market inefficiency. The autocorrelation function is an old, commonly known method which anybody with access to the data can in principle apply. It is therefore well suited for demonstration: we do not reveal any proprietary know-how by using it, yet it is quite convincing because it relates directly to the concept of a martingale.

The autocorrelation is defined as the expectation value of the product of the elements of the time series separated by time lag, and is a function of this time lag:

A(T) = E[x(t)·x(t+T)], averaged over all available t

where T denotes the time lag and E is the expectation (averaging operator). Most often we will be dealing with autocorrelation of logarithmic returns. Unless stated otherwise, the time lag we show is the lag in “business time” or in other words, week-end and holiday periods (periods with no data) are excluded.
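A minimal sketch of this definition applied to logarithmic returns follows. It assumes the mean is neglected and that the input contains only business-time samples, so that the list index plays the role of business time; the price series is a synthetic random walk and the function names are hypothetical.

```python
import math
import random

def log_returns(prices):
    """Logarithmic returns between consecutive quotes."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def autocorrelation(x, lag):
    """Average of x[t] * x[t + lag] over all available t (no mean subtraction)."""
    lag = abs(lag)  # the function is symmetric around zero lag
    n = len(x) - lag
    if n <= 0:
        raise ValueError("lag too large for the series length")
    return sum(x[t] * x[t + lag] for t in range(n)) / n

# Synthetic memory-free price series (assumption: Gaussian log returns).
random.seed(11)
prices = [1.0]
for _ in range(5000):
    prices.append(prices[-1] * math.exp(random.gauss(0.0, 0.001)))

r = log_returns(prices)
a0 = autocorrelation(r, 0)  # zero-lag bin: the variance when the mean is neglected
normalized = [autocorrelation(r, k) / a0 for k in range(6)]
```

For this memory-free series the normalized values at non-zero lags scatter around zero; a real market with memory would show bins that stand out from that scatter.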

By construction, an autocorrelation is symmetric around 0. Therefore, plotting only one side (either positive or negative lags) is sufficient. Because in most contexts we talk about prediction, it is more intuitive to plot the negative lag side — that way one can interpret the axis of lags as a time axis, keeping in mind that what is about to happen (and is being predicted) is located at the 0 bin. Although in reality — and this needs to be said for the more rigorous reader — this axis is a diagonal direction of fixed time sums in the two-dimensional space of pairs of time points.

The time processes we know from experience to be predictable, such as the beating of our heart, variation of atmospheric temperature with season, ocean surf and tides, and the like, have informative autocorrelation values at non-zero time lags. Predictability does not imply causality, nor is causality always needed — even though winter does not cause spring, once you know you are in the middle of winter, you can predict that the temperature will be much higher in just a couple of months with good degree of confidence. It is often possible to transform the time series representing the process so that “informative” means non-zero autocorrelation, and non-zero means “informative”. Sometimes this can be done by replacing the original time series by that of increments or ratios (such as logarithmic returns), sometimes by subtracting an autocorrelation of a suitably constructed reference process — a synthetic, usually computer-simulated model of reality which incorporates the features we know about and consider trivial, but not the ones we want to learn about.

Forex Automaton™ has accumulated a collection of non-trivial correlation data for the forex markets. We regard their existence as a sufficient, but not a necessary condition of predictability. In other words, the market can be still predictable with completely trivial two-point autocorrelations (but with e.g. non-trivial genuine three-point correlations — although this does sound like one of those artificial math concoctions). If two-point autocorrelations happen to be non-trivial, that’s a sure sign of predictability — but you can’t count on that. Therefore, we do not suggest building a trading system bottom-up on the basis of autocorrelations (otherwise we would not make them public), or at least this is not how our own trading system was designed and built. But once the autocorrelations are found, ignoring them would be foolish.