

The First Annual Summary of Forex Automaton Research Progress, April 2009 
Written by Forex Automaton  
Friday, 03 April 2009 13:22  
Forex Automaton was launched in April 2008 with the ambitious mission of leveraging specific algorithmic know-how to create a trading-signal service geared toward retail forex traders. From the very beginning, a two-prong strategy was adopted: first, development of the trading-system product, whose usefulness relies on the secrecy of the relevant know-how; second, white-paper research focusing on statistical properties of market time series, especially those aspects which are potentially interesting from the point of view of algorithmic trading, however counterintuitive, technical and remote from the mainstream picture of forex trading they may be. As of now, it is mostly the second prong that is visible to the website visitor. This document summarizes the main findings to emerge so far from a year of studies, including some glimpses into the progress made on the black-box algorithmic trading system front.

Market inefficiency: a physicist's perspective

The efficient market hypothesis (EMH) postulates that the "easy" money is perpetually "already made", without specifying the time scale. This postulate seems to be inspired by the advances of the natural sciences and has the look and feel of a conservation law familiar from physics. Indeed, isn't a wealth-generating automaton akin to a perpetuum mobile of either the first or the second kind, violating the law of energy conservation or the second law of thermodynamics? Isn't the statement that there is "no free lunch" akin to the energy conservation principle? Financial engineers use the EMH to solve problems much the same way electrical engineers use conservation laws. They usually begin the history of their discipline with Louis Bachelier's 1900 PhD thesis "The Theory of Speculation", where the concept of the random walk and Fourier's theory of heat were applied to the movement of prices. (I base my discussion of Bachelier's work on the account given by Mandelbrot in "The Misbehavior of Markets".)
Ironically, this Victorian-physics look and feel was being imported into finance just as physics itself was about to part with the Victorian age in an intellectual revolution marked by the advent of quantum mechanics and relativity in the early twentieth century. Experiments with subatomic particles and precise astronomical observations limited the applicability of many common-sense concepts, rooted in everyday experience, to the physical scales typical of, well, that experience, and challenged the Victorian confidence in their scale-independent universality. Later, in what is sometimes called measurements of zero, experimentalists demonstrated that many naive symmetry statements (deemed to be conservation laws) are overstatements. In school, students of modern physics are confronted with the problem: an alien spaceship approaches Earth. As the aliens do not want to annihilate into gamma rays upon physical contact with our world, they want to know: is our world made of matter or of antimatter? Of course we do not know whether our definitions of matter and antimatter coincide with theirs. The fact that such a definition can be communicated in principle means that the physical world is fundamentally asymmetric. In physics, the P, C and CP symmetries turn out to be overstatements which may make us feel comfortable mentally, but are wrong. The physics paradigm of the Enlightenment promoted a sense of false symmetry and perhaps false security. A modern physics graduate suspects that the EMH is a similar overstatement that makes us feel comfortable emotionally: are you excited by the wealth-generation opportunities offered by the speculative markets of today, but worried about the inherent risks? Relax, say the EMH proponents: consistent speculative profits are statistically impossible.
As a mechanism of psychological adaptation, the EMH is perhaps a useful part of social culture but a questionable part of science, even of an imprecise science such as economics, since crowd dynamics can be studied even with liberal-arts methodology. As a quantitative tool of the sell-side finance business, the EMH is part of "financial engineering", where the ideal symmetry rather than its real breaking, the supposed beauty of the world (the "efficiency" of the market) rather than its actual ugliness, seems to form the subject matter. Yet it is the ugliness (broken symmetry, lacking perfection) that makes the practitioner money in real-life markets. It is easy to confuse perfection with stability (be it of cash flow or of scientifically reproducible results), and in the search for stability to seek perfection and end up idealizing the world. Meanwhile, to the practitioner the main "business opportunity" is human imperfection, and is that a stable one... Symmetry (perfection) justifies a reduction in information; symmetry breaking (an imperfection), on the contrary, is always specific and brings in extra information. Let's look at specific patterns in real-life data.

The method of analysis

Data aggregation

The initial raw data come as a series of bid and ask prices recorded at specific times. The series can be binned, or aggregated, on various time scales, the aggregation consisting of creating a series of consecutive, adjacent time bins (intervals) of the same length and calculating the open, close, low and high levels of the price for each of them. The time scale of the analysis is the time duration of the bin.

Logarithmic returns

The long-term absolute level of the price is almost irrelevant to a forex trader; what matters is relative movements. I use logarithmic returns to eliminate one trivial source of non-stationarity of the correlation functions, namely a possible long-time-scale trend in the time dependence of the price.
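The aggregation and logarithmic-return steps just described can be sketched in a few lines. This is a minimal illustration only; all function and variable names are mine, not from the Forex Automaton code base:

```python
# Sketch of the binning/aggregation and logarithmic-return steps.
# All names here are illustrative; none come from the actual system.
from math import log

def aggregate_ohlc(ticks, bin_seconds):
    """ticks: iterable of (timestamp_seconds, price) in time order.
    Returns one (open, high, low, close) tuple per non-empty time bin."""
    bars = {}
    for t, price in ticks:
        key = int(t // bin_seconds)          # index of the time bin
        if key not in bars:
            bars[key] = [price, price, price, price]
        else:
            o, h, l, _ = bars[key]
            bars[key] = [o, max(h, price), min(l, price), price]
    return [tuple(bars[k]) for k in sorted(bars)]

def log_returns(closes):
    """Logarithmic returns between consecutive closing prices."""
    return [log(b / a) for a, b in zip(closes, closes[1:])]
```

Because the logarithm of a ratio is insensitive to the overall price level, this removes the trivial dependence on the long-term absolute price.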
Ordinary returns eliminate such a trend as well, but given the questionable convergence of the second moments of typical financial quantities (as discussed by Mandelbrot), keeping the analysis logarithmic is very important.

Statistical reference and statistical significance

On the hour scale, the distributions of logarithmic returns look roughly exponential. The good news is that the second moments do converge; therefore the Central Limit Theorem, and thus the usual machinery of statistical analysis based on the normal distribution, applies to the correlation measurements (which are in this case sums of large numbers of products of quasi-exponentially distributed quantities). The primary reference is the so-called martingale, a time series with no prehistory dependence. Non-zero autocorrelation values at non-zero time lags, when statistically significant, falsify this reference and signify predictability in the "weak" statistical sense. The statistical errors, the measure of the uncertainty of the measurement, are calculated directly for each correlation function by simulating a large number of uncorrelated time series reproducing the volatility of the forex time series under study. On each of these, the same analysis is performed, and the precision of the resulting quantities can be estimated by looking at their variation among such reference time series. This general solution allows one to handle all situations, regardless of the exact shape of the distribution of logarithmic returns, the resulting degree of closeness of the correlation quantity to the normal distribution, and the effects of any time-window cuts applied to the time series. It also eliminates doubts related to the software implementation of the mathematical techniques. However, effects of the variation of the properties of the original time series with time (non-stationarity) are beyond the scope of this solution and have to be addressed by, e.g.,
ad hoc splitting of the time series into "volatile" and "pre-crisis" pieces. In the figures that follow, what matters, to first order, is the magnitude of the non-zero-lag features with respect to the level of statistical accuracy indicated by the red shade. The mean of the red band at each bin is the embodiment of the EMH for the market in question. Self-delusion is possible when a subset of results of a particular flavor is chosen to support a particular conclusion with no due regard to the rest of the cases, which may support a contradicting conclusion. Repetition of the same analysis on different data sets, the number of those sets being as complete as possible, is typical of the Forex Automaton style and helps avoid such a pitfall. Speaking of intermarket correlations, a full complement of exchange rate pairs involving AUD/JPY, AUD/USD, CHF/JPY, EUR/AUD, EUR/CHF, EUR/GBP, EUR/JPY, EUR/USD, GBP/CHF, GBP/JPY, GBP/USD, USD/CAD, USD/CHF, USD/JPY has been analyzed for a particular time span from 2002 to 2008.

Pattern 1: "bipolar disorder"

The initial study of autocorrelations and intermarket correlations in forex was done within a data sample of the same fixed time span, from 00:00 2002-08-20 to 00:00 2008-02-01 (New York time). There is no fiddling with the choice of time coverage for the different subsamples; these are apples-to-apples comparisons. The so-called "bipolar disorder", a tendency to form quickly alternating rises and falls, more pronounced than in a fully unpredictable time series of the same volatility, shows up as negative dips surrounding the zero-lag peak. The expression "bipolar disorder" follows B. Graham, who used to attribute to the stock market the behavioral features of a metaphoric manic-depressive patient, Mr Market.
The manic-depressive successions of rises and falls in the price time series, whereby rises seem to trigger falls and vice versa, create the negative autocorrelation feature in the corresponding series of logarithmic returns; recall that the product of two returns with opposite signs is negative. Being next to each other on the chosen time scale, they create the negative dip at unit lag. The feature often persists on more than one time scale (see the AUD/JPY analysis as an example). The statistical concept of bipolar disorder is the exact opposite of the concept of a "trend", the latter being a positively self-correlated sequence of price movements. To be of practical value for forecasting and algorithmic trading, these features have to be expectable in the future, at least in principle. A rare but huge event and a frequent one of moderate magnitude may leave the same trace on the autocorrelation. At the very least, one must ensure that these time-averaged signals are not merely diluted residues of certain once-in-a-lifetime events. If they are merely that, a trading system with regular decision-making and execution, such as the one being built here on the Forex Automaton site, is not the optimal strategy for taking advantage of them. The time histories of the effect, shown in Fig.1.2, demonstrate that we are not dealing merely with the consequences of just a few rare events in the time-integrated autocorrelations of Fig.1.1. The inefficiency in question ("bipolar disorder") is historically continuous.

Pattern 2: "leading indicators"

Correlated patterns can involve more than one market or forex exchange rate. A cross-correlation, or intermarket correlation, fully analogous to the autocorrelation, is the way to detect such patterns. Unlike an autocorrelation, the intermarket correlation does not need to be symmetric, and its lack of symmetry, if significant, indicates that the forex exchange rates are not born equal: there are leaders and followers among them.
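As a minimal sketch of the Pattern 1 diagnostic and of the statistical-reference method described earlier: a lagged autocorrelation of logarithmic returns, with an uncertainty band obtained from simulated uncorrelated series of the same return distribution (and hence the same volatility). Resampling the observed returns is just one way to build such a reference, and the names are illustrative:

```python
# Lagged autocorrelation plus a simulated "no prehistory dependence" reference.
# Illustrative sketch; resampling the observed returns is one way to build
# uncorrelated series with the same distribution (hence volatility) as the data.
import random
import statistics

def autocorr(returns, lag):
    """Autocorrelation of the return series at the given lag (0 < lag < len)."""
    mean = statistics.fmean(returns)
    num = sum((returns[i] - mean) * (returns[i + lag] - mean)
              for i in range(len(returns) - lag))
    den = sum((r - mean) ** 2 for r in returns)
    return num / den

def reference_band(returns, lag, n_sim=500, seed=1):
    """Mean and spread of the autocorrelation at `lag` over many
    uncorrelated series drawn from the observed return distribution."""
    rng = random.Random(seed)
    sims = [autocorr([rng.choice(returns) for _ in returns], lag)
            for _ in range(n_sim)]
    return statistics.fmean(sims), statistics.stdev(sims)
```

A measured lag-1 value several reference standard deviations below zero is the "bipolar disorder" signature; a value consistent with the band is what the EMH predicts.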
Like the autocorrelation, the cross-market correlation is a function of its time-lag variable t_d = t_1 - t_2, where the indices 1 and 2 denote the different market time series. A large positive correlation at lag t_d = 0 indicates that the markets move in tandem; a large negative one, that they move in opposite directions. Non-zero correlations at non-zero time lags are particularly informative: it is these correlations that allow us to make statements like "A leads and B follows". Indeed, if A leads and B follows, then a movement that happened in A at t_1 = a happens in B at time t_2 = a + d, where d is greater than 0. Then a positive correlation value is associated with the lag t_d = t_1 - t_2 = a - (a + d) = -d. Naturally, an even larger positive correlation value is likely to be associated with lag 0, but the peak will look skewed to the left. Similarly, one can discuss the situation where B leads and A follows, the two being positively correlated, and conclude that the correlation peak will be skewed to the right. The good news is that leading indicators are detectable with this method of analysis, and some of them are seen with a very high degree of statistical significance. The bad news is that the question "Which is the leading indicator for A?" has no straightforward general answer. It is not clear what makes B a leading indicator for A and not for C, or why D has no leading indicator. As a very crude rule of thumb, it can be inferred that a forex rate with a high interest-rate differential, like AUD/JPY or EUR/JPY, is likely to serve as a leading indicator for other exchange rates.

Pattern 3: periodicity or oscillation

An autocorrelation of a periodic time series is a periodic function. From the point of view of forex trading system development, this is a particularly attractive situation, since periodic time series are easy to predict. At first sight this looks too good to be true, but take a look at the figures below.
AUD/JPY on the 10-second time scale (Fig.3.2, bottom panel) presents a case of blatant periodic predictability in forex, even though perhaps of little practical value due to the smallness (1 pip) of the amplitude of the predictable component. The most natural explanation of this pattern seems to be algorithmic trading by large-volume players. Large players do practice algorithmic trading, one of their major problems being the sheer scale of the trading. Large institutions move money on a scale where the trading itself is able to move the market. Concealing their intentions and minimizing the effect of the trading on the price is one of their top priorities. A conceivable solution is to distribute the volume thinly over time. Because computers emulate continuity by discreteness of high frequency, and the programmers may have decided that once per 30 seconds is frequent enough, the trades seem to end up being placed in bursts. There may be another large institution placing the opposite trades in the same manner at the same time. It is not difficult to see that the net result of such activities may produce an autocorrelation like that of Fig.3.2, bottom panel.

What did the crisis change?

With the present crisis a topical theme, many epithets are being attributed to the markets. What exactly changed in forex during the crisis? Did the exchange rates become unpredictable ("efficient")? Less predictable than usual? More predictable in a new way? Was it a qualitative change in the set of predictability patterns? Or was it only a change in the amplitude of the fluctuations? This was the set of questions I set out to explore in the "Time evolution..." section of the blog. So far, having looked at the autocorrelations only, a few things have become clearer.
There is a prehistory dependence pattern to accompany the volatility increase

Over and over in the time dependence of the autocorrelation peak structure for the various forex rates, the same pattern is seen: as the volatility goes up dramatically starting in Fall 2008, so does the "bipolar disorder" pattern. From a pure-math point of view, "volatility" as the second moment at zero time lag (essentially, the variance) tells one nothing about predictability or unpredictability, and volatility of any magnitude can exist in "efficient" martingale markets or in markets which are to some extent prehistory dependent. In real life, the acute phase of the crisis did not flatten the correlation functions at non-zero lags, but distorted them to move even further away from the pattern of a stable trend. The markets oscillate: they move up today because they moved down yesterday, and vice versa. This is the "bipolar disorder" pattern. That pattern, which manifests itself as a negative correlation quantity at the next-to-zero lag, is seen in Fig.4.1 to become more prevalent in the bottom row of plots. The correlations seem to be dragged into conformity with the "bipolar" shape; at least such is the conclusion drawn by comparing the top row of panels, which cover the time slice from August 2007 to August 2008, to the bottom row, which covers what happened afterwards at much higher volatility. Should we be surprised that the markets turned manic-depressive during the panic, instead of being "efficient"? Is it real?
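The point that volatility says nothing about prehistory dependence can be illustrated on toy data (a constructed example, not market data): two return series with identical variance but opposite lag-1 structure.

```python
# Two toy return series with identical variance but very different
# lag-1 autocorrelation: volatility alone says nothing about memory.
import random
import statistics

def lag1_autocorr(xs):
    mean = statistics.fmean(xs)
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(len(xs) - 1))
    return num / sum((x - mean) ** 2 for x in xs)

rng = random.Random(7)
coin = [rng.choice([-0.01, 0.01]) for _ in range(10_000)]         # martingale-like
bipolar = [0.01 if i % 2 == 0 else -0.01 for i in range(10_000)]  # strict alternation

# Same volatility (variance) to high precision...
assert abs(statistics.pvariance(coin) - statistics.pvariance(bipolar)) < 1e-6
# ...completely different memory:
assert abs(lag1_autocorr(coin)) < 0.05    # no prehistory dependence
assert lag1_autocorr(bipolar) < -0.99     # "bipolar disorder" taken to the extreme
```

The coin-flip series is a martingale in miniature; the alternating series is the "bipolar" pattern pushed to its limit, yet both would plot at exactly the same height on a volatility chart.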
Swarms of black-box automata paper-trade real and simulated markets

The Holy Grail of automated trading system development is to find stable imperfections: deviations from market efficiency which remain ever-present, or present at predictable times, or present most of the time without turning into their opposites the rest of the time, or at least present long enough for the algorithm to detect and "learn" them, use them, and "unlearn" them without disastrous losses once they are no longer there. In this quest, the "black box" system does not have to limit itself to the subset of imperfections which can be conveniently explained to humans or interpreted by them. A useful distinction is between the "artificial intelligence" (AI) core, capable of "learning", and the adjustable parameters which control the learning process and the execution of the strategy. One of the tasks of trading-algorithm design is to define the adjustable parameters, the quantitative knobs that tune the system. These are then optimized by realistic simulated trading runs on historical data. At the end of the day, a single set of parameters has to be chosen on the basis of simulated performance before real-time operation can be launched. Given that past performance is no indication of future results, relying on past performance alone feels intellectually unsatisfying. Equally unsatisfying is to reduce the problem of the success or failure of the AI core as such to the question of the success or failure of an individual set of parameters. A solution to this problem is to look simultaneously at the entire range of possible sets of parameters and to evaluate the AI as such by comparing the performance of the exact same AI, with the exact same sets of parameters, on real data and on artificial data with deliberately eliminated predictability. Eliminating predictability from a market time series is easier than it sounds: the time series is analyzed to infer the distribution of returns.
Synthesizing a series of random numbers according to a given recorded distribution is a problem solved easily with a tool such as ROOT, and it does not require assumptions about the distribution other than that it is integrable (which is a far safer assumption than requiring the existence of a mean, not to mention the existence of a second moment). Then you start with an arbitrary number and generate a random number according to the distribution of returns you got. Having a starting price and a return, you obtain the next price in the series. You can continue this random-walk process for as long as you want. The martingale market so obtained is devoid of predictability and of all real-life features other than the possible long-range trend of the original market it models. The results for the AUD/JPY time series, daily data, covering the time range from August 20, 2002 through September 1, 2008, are shown in Fig.5.1. Each point in Fig.5.1 represents a simulated trading history. The points differ by the input parameters which enter the decision-making algorithm. The parameters affect the style of forecasting as well as the money management (stop loss, sensitivity to the entrance signal, and sensitivity to the exit signal). There is currently only one parameter responsible for the forecasting as such, and its nature will not be disclosed. The fact that a larger return requires a larger risk, as seen in Fig.5.1, is familiar to every investor and does not need much commenting. The key theme of Fig.5.1 is the comparison between the real and the "fake" markets. Exactly the same algorithm trades the real market and four synthesized Monte Carlo markets, going through exactly the same selection of over 13 thousand combinations of parameters. It is the ability to exploit the non-randomness of the real market that constitutes the result of the learning process, and it is the difference between the red and black points collectively that demonstrates the quality of our work.
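The martingale-synthesis recipe just described can be sketched in a few lines. Here empirical resampling of the recorded returns stands in for ROOT's histogram-based generation, and the names are illustrative:

```python
# Build a "fake" martingale market: sample log-returns from the recorded
# distribution (empirical resampling stands in for ROOT's histogram
# sampling) and cumulate them from an arbitrary starting price.
import math
import random

def synthesize_martingale(recorded_log_returns, start_price, n_steps, seed=42):
    """Price path whose one-step return distribution matches the data
    but which, by construction, has no prehistory dependence."""
    rng = random.Random(seed)
    path = [start_price]
    for _ in range(n_steps):
        path.append(path[-1] * math.exp(rng.choice(recorded_log_returns)))
    return path
```

Trading the same algorithm on many such synthetic paths gives reference points analogous to the black points of Fig.5.1; systematic out-performance on the real series would then measure genuine learned predictability rather than luck.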
Relatively few red points lie in the undesirable area of negative return, despite the latter being the predominant outcome of trading a random market. For a given degree of risk, the red points provide consistently better return. Fig.5.1 shows that you can be lucky and win against the random market, as is seen from the many "successful" black points. Efficient-market theorists would use this property of the game to reconcile the trading success of a few outstanding individuals with the "you can't beat the market" assumption: to them, an outstanding trader would represent just such a black-point outlier outcome. While that argument is unassailable in every individual case, Fig.5.1 disagrees with the efficient market hypothesis for the minimum-bias statistical sample of simulated trading histories, by demonstrating the ability of an algorithm (no matter how designed) to "beat" the unpredictable market of the EMH with consistently higher returns, and thus to see the difference between the real and the "efficient" market.

Conclusions and the future


Last Updated (Thursday, 31 March 2011 10:06)