The First Annual Summary of Forex Automaton Research Progress, April 2009

Written by Forex Automaton   
Friday, 03 April 2009 13:22

Forex Automaton was launched in April 2008 with the ambitious mission of leveraging specific algorithmic know-how to create a trading-signal service geared toward retail forex traders. From the very beginning a two-pronged strategy was adopted: first, development of the trading-system product, whose usefulness relies on the secrecy of the relevant know-how; second, white-paper research focusing on statistical properties of market time series, especially those aspects which are potentially interesting from the point of view of algorithmic trading, however counter-intuitive, technical and remote from the mainstream picture of forex trading they may be. As of now, it is mostly the second prong that is visible to the website visitor. This document summarizes the main findings to emerge from a year of studies, including some glimpses into the progress made on the black-box algorithmic trading system front.

Market inefficiency: a physicist's perspective

The efficient market hypothesis (EMH) postulates that the "easy" money is perpetually "already made" -- without specifying the time scale. This seems to be inspired by the advances of the natural sciences and has the look and feel of a conservation law familiar from physics. Indeed, isn't a wealth-generating automaton akin to a perpetuum mobile of the first or second kind, violating the law of energy conservation or the second law of thermodynamics? Isn't the statement that there is "no free lunch" akin to the energy conservation principle? Financial engineers use the no-free-lunch principle to solve problems much the same way electrical engineers use conservation laws.

Financial engineers usually begin the history of their discipline with Louis Bachelier's 1900 PhD thesis "The Theory of Speculation", where the concept of the random walk and Fourier's theory of heat were applied to the movement of prices. (I base my discussion of Bachelier's work on the account given by Mandelbrot in "The Misbehavior of Markets".) Ironically, the Victorian physics look and feel was being imported into finance just as physics was about to undergo a dramatic break with the Victorian age: an intellectual revolution marked by the advent of quantum mechanics and relativity in the early twentieth century. Experiments with subatomic particles and precise astronomical observations limited the applicability of many common-sense concepts rooted in everyday experience to the physical scales typical of, well, that experience -- and challenged Victorian confidence in their scale-independent universality.

Later, in what is sometimes called measurements of zero, experimentalists demonstrated that many naive symmetry statements (deemed to be conservation laws) are overstatements. In school, students of modern physics are confronted with the problem: an alien space ship approaches Earth. As the aliens do not want to annihilate into gamma-rays upon physical contact with our world, they want to know: is our world made of matter or of anti-matter? Of course we don't know whether our and their definition of matter and antimatter are the same. The fact that such a definition can be communicated in principle means that the physical world is fundamentally asymmetric.

In physics, the P, C and CP symmetries turn out to be overstatements which may make us feel comfortable mentally, but are wrong. The physics paradigm of the Enlightenment promoted a sense of false symmetry and perhaps false security. A modern physics graduate suspects that the EMH is a similar overstatement making us feel comfortable emotionally: are you excited by the wealth-generation opportunities offered by the speculative markets of today, but worried about the inherent risks? Relax, consistent speculative profits are impossible statistically, say EMH proponents. As a mechanism of psychological adaptation, the EMH is perhaps a useful part of social culture but a questionable part of science, even of an imprecise science such as economics -- since crowd dynamics can be studied even with liberal-arts methodology. As a quantitative tool in the sell-side finance business, the EMH is part of "financial engineering", where the ideal symmetry rather than its real breaking, the supposed beauty of the world (the "efficiency" of the market) rather than its actual ugliness, seems to form the subject matter. Yet it is the ugliness (broken symmetry, lacking perfection) that makes the practitioner money in the real-life markets. It is easy to confuse perfection with stability (be it of cash flow or of scientifically reproducible results), and in search of stability to seek perfection and to end up idealizing the world. Meanwhile, to the practitioner the main "business opportunity" is human imperfection -- and is it a stable one...

Symmetry (perfection) justifies reduction in information, symmetry breaking (an imperfection) on the contrary is always specific and brings in extra information. Let's look at specific patterns in real-life data.

The method of analysis

Data aggregation

Initial raw data comes as a series of bid and ask prices recorded at specific times. The series can be binned or aggregated on various time scales, the aggregation consisting of creating a series of consecutive, adjacent time bins (intervals) of the same length, and calculating the open, close, low and high levels of price for each of them. The time scale of analysis is the time duration of the bin.
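The aggregation step can be sketched in a few lines. This is a minimal illustration with made-up tick data, not the production code used on this site:

```python
def aggregate_ohlc(ticks, bin_seconds):
    """Aggregate (timestamp, price) ticks into fixed-width OHLC bins.

    ticks: list of (t, price) tuples sorted by time t (in seconds).
    Returns a list of dicts, one per non-empty bin, in time order.
    """
    bins = {}
    for t, price in ticks:
        key = int(t // bin_seconds)  # index of the bin this tick falls into
        if key not in bins:
            # first tick in the bin sets the open and, provisionally, everything else
            bins[key] = {"open": price, "high": price, "low": price, "close": price}
        else:
            b = bins[key]
            b["high"] = max(b["high"], price)
            b["low"] = min(b["low"], price)
            b["close"] = price  # last tick seen in the bin so far
    return [bins[k] for k in sorted(bins)]

# Hypothetical ticks at 0, 10, 50, 70 and 110 seconds; one-minute bins.
ticks = [(0, 1.00), (10, 1.02), (50, 0.99), (70, 1.01), (110, 1.03)]
bars = aggregate_ohlc(ticks, 60)
# First bar: open 1.00, high 1.02, low 0.99, close 0.99
```

The "time scale of analysis" in the text corresponds to the `bin_seconds` parameter here.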

Logarithmic returns

The long-term absolute level of the price is almost irrelevant to a forex trader; what matters is relative movement. I use logarithmic returns to eliminate one trivial source of non-stationarity of the correlation functions, namely a possible long-time-scale trend in the time dependence of the price. Ordinary returns eliminate it as well, but given the questionable convergence of the second moments of typical financial quantities (as discussed by Mandelbrot), keeping the analysis logarithmic is very important.
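For concreteness, here is the definition in code, with made-up numbers. The second call illustrates why the absolute price level drops out:

```python
import math

def log_returns(closes):
    """Series of logarithmic returns r_i = ln(p_i / p_{i-1})."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

closes = [1.00, 1.02, 1.01]
r = log_returns(closes)

# Rescaling all prices by a constant factor leaves the returns unchanged,
# which is why the long-term absolute level of the price is irrelevant here.
r_scaled = log_returns([100 * p for p in closes])
```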

Statistical reference and statistical significance

On the hour scale, distributions of logarithmic returns look roughly exponential. The good news is that the second moments do converge, and therefore the Central Limit Theorem, and with it the usual machinery of statistical analysis based on the normal distribution, applies to the correlation measurements (which are in this case sums of large numbers of products of quasi-exponentially distributed quantities).

The primary reference is the so-called martingale, or a time series with no prehistory dependence. Non-zero auto-correlation values at non-zero time lags, when statistically significant, falsify this reference and signify predictability in the "weak" statistical sense.

The statistical errors, the measure of uncertainty of measurement, are calculated directly for each correlation function measured by simulating a large number of uncorrelated time series, reproducing the volatility of the forex time series under study. On each of these, the same analysis is performed, and the precision of the resulting quantities can be estimated by looking at their variation among such reference time series. This general solution allows one to handle all situations, regardless of the exact shape of the distribution of logarithmic returns, the resulting degree of closeness to the normal distribution for the correlation quantity, and the effects of the possible time window cuts applied to the time series. This also eliminates doubts related to the software implementation of the mathematical techniques. However, effects of variation in the properties of the original time series with time (non-stationarity) are beyond the scope of this solution, and have to be addressed by e.g. ad hoc splitting of the time series into "volatile" and "pre-crisis" pieces.
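The procedure can be sketched as follows. This is a simplified stand-in for the actual machinery: the simulated series here are Gaussian with the volatility matched to the data, whereas the site's analysis reproduces the empirical (quasi-exponential) return distribution; the `autocorr` estimator and all numbers are illustrative:

```python
import math
import random
import statistics

def autocorr(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n - lag)
    return cov / var

def noise_band(real_returns, lag, n_sim=20, seed=0):
    """Mean +/- RMS of the lag autocorrelation over simulated uncorrelated
    series with the same volatility as the real returns.  A measured value
    well outside this band falsifies the martingale reference at that lag."""
    rng = random.Random(seed)
    sigma = statistics.pstdev(real_returns)  # match the volatility only
    vals = []
    for _ in range(n_sim):
        fake = [rng.gauss(0.0, sigma) for _ in real_returns]
        vals.append(autocorr(fake, lag))
    mean = sum(vals) / len(vals)
    rms = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return mean, rms
```

Because the error band is obtained by running the identical estimator on the simulated series, it automatically absorbs distribution-shape effects and any quirks of the implementation, as described above.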

In the figures that follow what matters, to the first order, is the magnitude of non-zero lag features with respect to the level of statistical accuracy indicated by the red shade. The mean of the red band at each bin is the embodiment of the EMH for the market in question.

A self-delusion is possible when a subset of results of a particular flavor is chosen to support a particular conclusion with no due regard to the rest of the cases, which may support a contradicting conclusion. Repetition of the same analysis on different data sets, the collection of sets being as complete as possible, is typical of the Forex Automaton style and helps avoid such a pitfall. Speaking of inter-market correlations, a full complement of exchange rate pairs involving AUD/JPY, AUD/USD, CHF/JPY, EUR/AUD, EUR/CHF, EUR/GBP, EUR/JPY, EUR/USD, GBP/CHF, GBP/JPY, GBP/USD, USD/CAD, USD/CHF, USD/JPY has been analyzed for a particular time span from 2002 to 2008.

Pattern 1: "bipolar disorder"

The initial study of autocorrelations and inter-market correlations in forex was done within the data sample of the same fixed time span, from 00:00 2002-08-20 to 00:00 2008-02-01 (New York time). There is no fiddling with the choice of time coverage for the different sub-samples -- these are apples-to-apples comparisons.


Fig. 1.1. The most stunning predictive autocorrelations on the hour scale among the forex exchange rates. Click on the panel to get to the article reporting the measurement. Top row, left to right: AUD/JPY, AUD/USD, GBP/JPY, USD/CAD. Bottom row, left to right: CHF/JPY, EUR/AUD, EUR/CHF, EUR/GBP. The noise (red in the figure) is obtained from martingale simulations based on the historical volatilities of the forex rate in question for the period under study. The noise is presented as mean plus-minus 1 RMS, where the RMS characterizes distribution of the correlation value obtained for this particular time lag bin by analyzing 20 independent simulated pairs of uncorrelated time series. The RMS is a measure of accuracy in the determination of the correlation values, and is dependent on the amount of data and the time scale.

The so-called "bipolar disorder" -- a tendency to form quickly alternating rises and falls, more pronounced than in a fully unpredictable time series of the same volatility -- shows up as negative dips surrounding the zero-time-lag peak. The expression "bipolar disorder" follows B. Graham, who used to attribute to the stock market the behavioral features of a metaphoric manic-depressive patient, Mr Market. The manic-depressive succession of rises and falls in the price time series, whereby rises seem to trigger falls and vice versa, creates the negative auto-correlation feature in the corresponding series of logarithmic returns -- recall that the product of two returns of opposite sign is negative. Being next to each other on the chosen time scale, they create the negative dip at the unit lag. The feature often persists on more than one time scale (see the AUD/JPY analysis as an example).

The statistical concept of bipolar disorder is the exact opposite of the concept of a "trend", the latter being a positively self-correlated sequence of price movements.


Fig. 1.2. History of the 1-hour lagged autocorrelation magnitude in some markets with "bipolar disorder" effect. Click on the panel to get to the article reporting the measurement. Time axis is labeled in MM-YY format. The number of bins (24) equals the number of quarters in the six-year period. The noise (red in the figure) is obtained from martingale simulations based on the historical volatilities of the forex rate in question for the period under study. The noise is presented as mean plus-minus 1 RMS, where the RMS characterizes distribution of the correlation value obtained for this particular time lag bin by analyzing 20 independent simulated pairs of uncorrelated time series. The RMS is a measure of accuracy in the determination of the correlation values, an uncertainty dependent on the amount of data and the time scale. From top to bottom: AUD/JPY, AUD/USD, CHF/JPY.

To be of practical value for forecasting and algorithmic trading, these features have to be expected to persist into the future, at least in principle. A rare but huge event and a frequent one of moderate magnitude may leave the same trace on the autocorrelation. At the very least, one must ensure that these time-averaged signals are not merely diluted residues of certain once-in-a-lifetime events. If they are merely that, a trading system with regular decision-making and execution, such as the one being built here on the Forex Automaton site, is not the optimal strategy to take advantage of them. The time histories of the effect shown in Fig. 1.2 demonstrate that we are not dealing merely with the consequences of a few rare events in the time-integrated autocorrelations of Fig. 1.1. The inefficiency in question ("bipolar disorder") is historically continuous.

Pattern 2: "leading indicators"

Correlated patterns can involve more than one market or forex exchange rate. A cross-correlation, or intermarket correlation, fully analogous to the autocorrelation, is the way of detecting such patterns. Unlike an autocorrelation, the inter-market correlation does not need to be symmetric, and its lack of symmetry, if significant, indicates that the forex exchange rates are not born equal: there are leaders and followers among them.

Like the autocorrelation, cross-market correlation is a function of its time lag variable

td = t1 - t2,

where indices 1 and 2 denote the different market time series. A large positive correlation at lag td=0 indicates that the markets move in tandem, a large negative one -- that they move in opposite directions. Non-zero correlations at non-zero time lags are particularly informative: it is these correlations that allow us to make statements like "A leads and B follows". Indeed, if A leads and B follows, then the movement that happened in A at t1 = a happens in B at time t2 = a+d, where d is greater than 0. Then, a positive correlation value is associated with

td = t1 - t2 = a - (a+d) = -d.

Naturally, an even larger positive correlation value is likely to be associated with lag 0, but the peak will look skewed to the left. Similarly, one can discuss a situation when B leads and A follows, the two being positively correlated, and conclude that the correlation peak will be skewed to the right.
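The lead-lag logic above can be checked on synthetic data. In this sketch (illustrative, with made-up series), B is constructed to follow A by one step, and the cross-correlation with the td = t1 - t2 convention from the text indeed peaks at td = -1:

```python
import random

def cross_corr(a, b, td):
    """Sample cross-correlation of series a (index 1) and b (index 2) at
    time difference td = t1 - t2.  td < 0 pairs a[i] with b[i + |td|],
    i.e. probes the hypothesis that a leads b by |td| steps."""
    n = len(a)
    if td >= 0:
        pairs = [(a[i], b[i - td]) for i in range(td, n)]
    else:
        pairs = [(a[i], b[i - td]) for i in range(0, n + td)]
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    sx = (sum((x - mx) ** 2 for x in xs) / len(xs)) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / len(ys)) ** 0.5
    return cov / (sx * sy)

# A leads, B follows one step later (plus noise): b[t+1] = a[t] + noise.
rng = random.Random(1)
a = [rng.gauss(0.0, 1.0) for _ in range(400)]
b = [0.0] + [a[i] + 0.3 * rng.gauss(0.0, 1.0) for i in range(399)]
# cross_corr(a, b, -1) is large and positive; cross_corr(a, b, +1) is not,
# so the peak is skewed to the left, as described in the text.
```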


Fig. 2.1. Pairs of exchange rates with the most stunning predictive cross-correlations on the hour scale. Click on the panel to get to the article reporting the measurement. Top row, left to right: AUD/JPY and EUR/CHF, AUD/JPY and EUR/USD, AUD/JPY and GBP/USD, CHF/JPY and EUR/AUD, CHF/JPY and USD/CHF. Middle row, left to right: EUR/AUD and EUR/JPY, EUR/CHF and EUR/USD, EUR/CHF and GBP/USD, EUR/JPY and EUR/USD, EUR/GBP and GBP/USD. Bottom row, left to right: EUR/JPY and USD/CHF, EUR/USD and GBP/JPY, EUR/USD and USD/JPY, GBP/USD and USD/CAD, USD/JPY and EUR/CHF. Vertical scales may differ among panels.

The good news is that leading indicators are detectable with this method of analysis, and some of them are seen with a very high degree of statistical significance. The bad news is that the question "Which is the leading indicator for A?" has no straightforward general answer. It is not clear what makes B a leading indicator for A and not for C, or why D has no leading indicator. As a very crude rule of thumb, it can be inferred that a forex rate with a high interest rate differential, such as AUD/JPY or EUR/JPY, is likely to serve as a leading indicator for other exchange rates.

Pattern 3: periodicity or oscillation

An autocorrelation of a periodic time series is a periodic function. From the point of view of a forex trading system development, this is a particularly attractive situation since periodic time series are easy to predict. At first sight this looks too good to be true, but take a look at the figures below.


Fig. 3.1. Periodicity in LIBOR. Click on the panel to get to the article reporting the measurement. Top to bottom: AUD overnight LIBOR (day), USD overnight LIBOR (day), NZD overnight LIBOR (day), SEK 1-week LIBOR (day), CHF 12-month LIBOR (day), GBP 3-month LIBOR (day). Vertical scales may differ among panels.


Fig. 3.2. Periodicity in forex. Click on the panel to get to the article reporting the measurement. Top to bottom: USD/CAD (hour), AUD/JPY (10 seconds). Vertical scales may differ among panels.

AUD/JPY on the 10-second time scale (Fig. 3.2, bottom panel) presents a case of blatant periodic predictability in forex, even though perhaps of little practical value due to the smallness (1 pip) of the amplitude of the predictable component. The most natural explanation of this pattern seems to be algorithmic trading by large-volume players. Large players do practice algorithmic trading, one of the major problems being the sheer scale of the trading. Large institutions move money on a scale where their trading itself is able to move the market. Concealing their intentions and minimizing the effect of the trading on the price is one of the top priorities. A conceivable solution is to distribute the volume thinly over time. Because computers emulate continuity with discrete steps of high frequency, and the programmers may have decided that once per 30 seconds is frequent enough, the trades seem to end up being placed in regular bursts. There may be another large institution placing the opposite trades in the same manner at the same time. It is not difficult to see that the net result of such activities may produce an autocorrelation like that of Fig. 3.2, bottom panel.
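The mechanism is easy to reproduce on toy data: a return series with a periodic component buried in noise yields an autocorrelation that repeats with the same period. This is an illustration only, with an arbitrary period of 3 bins and a component comparable to the noise (the real effect is far smaller):

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    cov = sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / (n - lag)
    return cov / var

rng = random.Random(0)
period = 3  # hypothetical burst spacing, in bins
# Returns = periodic "burst" component + uncorrelated noise.
r = [math.sin(2 * math.pi * t / period) + 0.5 * rng.gauss(0.0, 1.0)
     for t in range(3000)]
# autocorr(r, lag) is large and positive at lag = 3, 6, 9, ... and
# suppressed (here negative) in between -- a periodic autocorrelation.
```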

What did the crisis change?

With the present crisis a topical theme, various epithets are being attached to the markets. What exactly changed in forex during the crisis? Did the exchange rates become unpredictable (efficient)? Less predictable than usual? More predictable in a new way? Was it a qualitative change in the set of predictability patterns? Or was it only a change in the amplitude of the fluctuations? This was the set of questions I set out to explore in the "Time evolution..." section of the blog. So far, having looked at the autocorrelations only, a few things have become clearer.

There is a pre-history dependence pattern to accompany the volatility increase

Over and over in the time dependence of the auto-correlation peak structure for the various forex rates, the same pattern is seen: as the volatility goes up dramatically starting in Fall 2008, so does the "bipolar disorder" pattern. From a pure-math point of view, "volatility" as the second moment at zero time lag (essentially, variance) tells one nothing about predictability or unpredictability, and volatility of any magnitude can exist in "efficient" martingale markets or in markets which are to some extent pre-history dependent. In real life, the acute phase of the crisis did not flatten the correlation functions at non-zero lags, but distorted them to move even further away from the pattern of a stable trend. The markets oscillate: they move up today because they moved down yesterday, and vice versa. This is the "bipolar disorder" pattern.

Fig.4.1. Modification of the auto-correlation structure in the vicinity of zero lag peak during the present financial crisis. Click on the panel to get to the article reporting the measurement. Vertical scales may differ among panels. Top: low volatility phase of the crisis. Bottom: same market during the high volatility phase of the crisis (its end point may differ for analyses performed at different time). Left to right: AUD/JPY, AUD/USD, CHF/JPY, EUR/CHF, EUR/GBP, EUR/JPY, EUR/USD, GBP/CHF, GBP/JPY, GBP/USD.

The bipolar disorder pattern, which manifests itself in a negative correlation value at the next-to-zero lag, is seen in Fig. 4.1 to become more pronounced in the bottom row of plots. The correlations seem to be dragged to conform with the "bipolar" shape; at least such is the conclusion drawn by comparing the top row of panels, which covers the time slice from August 2007 to August 2008, with the bottom row, which covers what happened afterwards at much higher volatility.

Should we be surprised that the markets turned manic-depressive during the panic, instead of being "efficient"?

Is it real? Swarms of black-box automata paper-trade real and simulated markets

The Holy Grail of automated trading system development is to find stable imperfections -- deviations from market efficiency which remain ever-present, or present at predictable times, or present most of the time without turning into their opposites the rest of the time, or at least present long enough for the algorithm to detect and "learn" them, use them, and "unlearn" them if they are no longer there, without disastrous losses. In this quest, the "black box" system does not have to limit itself to the subset of imperfections which can be conveniently explained to humans or interpreted by them. A useful distinction is between the "artificial intelligence" (AI) core, capable of "learning", and the adjustable parameters which control the learning process and the execution of the strategy. One of the tasks of trading algorithm design is to define the adjustable parameters, quantitative knobs that tune the system. These are then optimized by realistic simulated trading runs on historical data.

At the end of the day, a single set of parameters has to be chosen on the basis of simulated performance before real-time operation can be launched. Given that past performance is no indication of future results, relying on past performance alone feels intellectually unsatisfying. Equally unsatisfying is to reduce the problem of success or failure of the AI core as such to the question of success or failure of an individual set of parameters. A solution to this problem is to look simultaneously at the entire range of possible parameter sets and to evaluate the AI as such by comparing the performance of the exact same AI, with the exact same sets of parameters, on real data and on artificial data with deliberately eliminated predictability.

Eliminating predictability from the market time series is easier than it sounds: the time series is analyzed to infer the distribution of returns. Synthesizing a series of random numbers according to a given recorded distribution is a problem solved easily with a tool such as ROOT, and it does not require assumptions about the distribution other than that it's integrable (which is a far safer assumption than requiring existence of a mean, not to mention existence of a second moment). Then you start with an arbitrary number and generate a random number according to the distribution of returns you got. Having a starting price and a return, you obtain the next price in the series. You can continue this random walk process as long as you want. The martingale market so obtained is devoid of predictability and all real-life features other than the possible long range trend of the original market it models.
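The site's implementation uses ROOT; an analogous sketch in plain Python resamples the recorded log returns with replacement (a bootstrap), which reproduces the empirical return distribution with no assumptions about its functional form and, by construction, no pre-history dependence. The price series here is illustrative:

```python
import math
import random

def synthesize_martingale(prices, length, seed=0):
    """Build a reference series with the empirical return distribution of
    `prices` but no pre-history dependence: draw the recorded log returns
    independently at random and accumulate them from an arbitrary start."""
    rng = random.Random(seed)
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    out = [prices[0]]  # arbitrary starting level
    for _ in range(length - 1):
        # each step's return is drawn independently from the recorded set
        out.append(out[-1] * math.exp(rng.choice(rets)))
    return out

fake = synthesize_martingale([1.00, 1.10, 1.05, 1.20], 100, seed=3)
```

Every step of the synthesized random walk has a return taken from the real market's distribution, yet any autocorrelation at non-zero lag is destroyed, which is exactly what a martingale reference market requires.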


Fig. 5.1. Return vs risk for paper trading in the real AUD/JPY market and in simulated reference random-walk markets. Vertical axis: annualized return, calculated as an average of independent monthly statistics, 1=100% per annum. Horizontal axis: standard deviation (RMS) of such annualized return from the same monthly statistics, same unit. Each point represents an instance of a trading system, a simulated trader; points differ by trading strategy, which is subject to optimization. Black points: trading 4 fantasy markets with the volatility of AUD/JPY, random by construction (no predictability or patterns). Red points: real AUD/JPY. Mean annualized returns above 5 (500%) and RMS above 20 are excluded to make the picture easier to analyze visually.

The results for the AUD/JPY time series, daily data, covering the time range from August 20, 2002 through September 01, 2008, are shown in Fig.5.1. Each point in Fig.5.1 represents a simulated trading history. Points differ by the input parameters which enter the decision making algorithm. The parameters affect the style of forecasting as well as money management (stop loss, sensitivity to entrance signal, and sensitivity to exit signal). There is currently only one parameter responsible for the forecasting as such, and its nature will not be disclosed.

The fact that larger return requires larger risk, as seen from Fig.5.1, is familiar to every investor and does not need much commenting. The key theme of Fig.5.1 is the comparison between real and "fake" markets. Exactly the same algorithm trades real and four synthesized Monte Carlo markets, going through exactly the same selection of over 13 thousand combinations of parameters. It is the ability to exploit the non-randomness of the real market that constitutes the result of the learning process and it's the difference between the red and black points collectively that demonstrates the quality of our work. Indeed, relatively few red points lie in the undesirable area of negative return, despite the latter being the predominant outcome of trading a random market. For a given degree of risk, the red points provide consistently better return.

Fig. 5.1 shows that you can be lucky and win against the random market, as seen from the many "successful" black points. Efficient-market theorists would use this property of the game to reconcile the trading success of a few outstanding individuals with the "you can't beat the market" assumption -- to them, an outstanding trader would represent just such a black-point outlier. While that argument is unassailable in any individual case, Fig. 5.1 disagrees with the efficient market hypothesis for the minimum-bias statistical sample of simulated trading histories: it demonstrates the ability of an algorithm (no matter how designed) to "beat" the unpredictable market of the EMH with consistently higher returns, and thus to see the difference between the real and the "efficient" market.

Conclusions and the future


  • The signals of forex predictability are generally weak enough to justify thoughtful statistical treatment. Leading indicators do exist even within the forex market itself, but in a "weak" statistical sense.
  • Forex prices often do not move in trends, if by trend a broad positive autocorrelation is understood.
  • Forex time series are not random walks, either. The way they move however is closer to the random walk than to a steady trend, just like a negative number is closer to zero than to a positive number -- and just like a negative number is not zero, forex time series are in general not random. They are non-random because they are not free of pre-history, even though their pre-history dependence is of "today will be the opposite of yesterday" type, rather than "today will be like yesterday". The actual dependencies can also be more complex than can be adequately described by the two-point language, like yesterday vs today.
  • Since steady trends, today-like-yesterday, tomorrow-like-today, are more intuitive for humans to deal with, this feature of forex spells trouble for human traders, especially the beginners.
  • A properly constructed "forex automaton" or a trading algorithm, on the contrary, should be able to capitalize on the market non-randomness no matter what flavor of non-randomness this is. To optimize and launch our first real-time trading signal service is the goal for the second year of operation.


Last Updated ( Thursday, 31 March 2011 10:06 )