

Forecasting optimization: an overlooked parameter fix improves quality, while an "efficient market" Monte Carlo supports the results. 
Written by Forex Automaton  
Friday, 27 November 2009 10:27  
Following up on the topic of our forex prediction quality measurements, I've decided to conduct the same analysis on simulated data, unpredictable by construction. As before, I trace the dependence of the Pearson correlation coefficient between predicted and actual logarithmic returns of the daily close on the magnitude of the forecasting parameter nicknamed Fred.
Recall that the Pearson correlation coefficient has a known range from -1 (the quantities being total opposites) to 1 (total correlation), which gives a scale for comparison: we know what is large and what is small. A realistic expectation for a measurement like this is to lie around 0. If forex is efficient, there is no way to design a system capable of making predictions, since all available information is instantly discounted by the market; yesterday's (and older) data are of no use in predicting today's close, because all of yesterday's information was discounted yesterday. In such a hypothetical situation, the predicted close (or, equivalently, its representative in this analysis, the predicted logarithmic return) and the actual one have only one choice: to yield zero covariance and a zero Pearson correlation coefficient, once these measures are properly constructed over a long enough stretch of data.

An observation of a significant positive correlation, on the contrary, is sufficient to kill the efficient market hypothesis and is a constructive proof of market inefficiency: a proof delivered by explicitly constructing an object which, according to the EMH "theory", is not supposed to exist. Of course, such an object, essentially a money-making algorithm, is more than academically interesting. Unfortunately for pure science, as long as the internal essence of the algorithm is not shared with the public (and there are good reasons not to share it), it does not belong to the realm of science and does not "prove" anything in the academic sense. Therefore, here I am merely demonstrating and documenting, not fighting academic battles. As usual in my Monte Carlo studies, I prepare time series to mimic the mean values and volatilities of the real forex ones, as described in the martingale article. 
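The efficient-market null hypothesis can be checked in miniature: on returns that are unpredictable by construction, any forecast built from past data alone should yield a near-zero Pearson coefficient. The sketch below uses a placeholder one-day-lag "forecast", since the actual Fred-based algorithm is not public; everything here is illustrative.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(1)
# "Efficient market" Monte Carlo: logarithmic returns are i.i.d. Gaussian
# noise, so nothing built from past data should correlate with them.
actual = [random.gauss(0.0, 0.01) for _ in range(2000)]
# Placeholder forecast using only past data (stands in for the real algorithm):
predicted = [0.0] + actual[:-1]
r = pearson(predicted, actual)
```

On real data, a coefficient that sits significantly above the spread of such null measurements is what this article treats as evidence of inefficiency.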
Twenty such artificial time series have been prepared this time for each of the major forex markets under study: AUD/USD, EUR/USD, GBP/USD, USD/CAD, USD/CHF and USD/JPY. The simulated and real time series in this measurement cover the time span from August 20, 2002 through August 21, 2009. As usual, the twenty simulated series are subjected to the same analysis as the real data, performing the predictions and constructing the correlation coefficients between those predictions and reality. To infer the accuracy of the measurements, the twenty random series are treated as independent results. The mean and half-width of the red band in Fig.1 correspond to the mean and standard deviation of the Pearson correlation coefficient as measured on the simulated data. As you see, there are no surprises as to the magnitude. (A significantly positive quantity here might indicate that either the algorithm or the backtesting framework gets access to data from the future, artificially improving the performance, or that something else is wrong.) The lack of variation in the simulated data in Fig.1 once Fred exceeds a certain value, an effect not found in the real data, indicates meaningful differences between reality and Monte Carlo beyond the factors taken into account when creating the simulation. (Obviously, the kind of differences we are after fit the same description.)

As before, Fig.1 as such is free of bias: it shows you all the possible Fred values. Absence of the "benefit of hindsight" is thus ensured at the stage of the Fig.1 analysis: the statement that an arbitrarily chosen Fred value is more likely to result in a positive correlation between reality and forecast carries no bias towards a particular value and thus no benefit of hindsight. The benefit of hindsight will enter the game once a single value of Fred is chosen on the basis of Fig.1. 
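The red band of Fig.1 can be reproduced schematically: collect one Pearson coefficient per simulated series at a fixed Fred value, then take their mean and sample standard deviation. The coefficients below are randomly generated stand-ins, not the measured values, and the two-sigma significance rule is my illustrative choice.

```python
import math
import random

random.seed(2)
# Stand-ins for the 20 Pearson coefficients measured on the 20 simulated
# series at one fixed Fred value (illustrative numbers only).
coeffs = [random.gauss(0.0, 0.02) for _ in range(20)]

mean = sum(coeffs) / len(coeffs)
# Sample standard deviation across the series: the band's half-width.
var = sum((c - mean) ** 2 for c in coeffs) / (len(coeffs) - 1)
half_width = math.sqrt(var)

def outside_band(r_real, n_sigma=2.0):
    """Flag a real-data coefficient that falls outside the MC band."""
    return abs(r_real - mean) > n_sigma * half_width
```

A real-data coefficient flagged by `outside_band` is incompatible with the unpredictable-by-construction simulation at the chosen significance level.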
So if I made that choice and showed you past performance for that chosen value only, those simulated track records would be subject to the benefit-of-hindsight caveat. The change compared to the previous measurement is brought about mainly by a change in an obscure parameter (known as BigNumber) used to convert floating-point quantities into integers for certain internal logic of the code. The BigNumber used before turned out to be too big: so much so that the tails of the logarithmic return distribution did not fit within the limits of the integer type used. Thus, the outlier events were not treated properly.
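In schematic form, the bug looks like the following. The scale factor, the 32-bit limit, and the function name are my assumptions for illustration; the actual code's values are not public.

```python
INT32_MAX = 2**31 - 1  # assumed width of the integer type (illustrative)

def to_fixed_point(log_return, big_number):
    """Convert a floating-point log-return to an integer by scaling."""
    scaled = int(round(log_return * big_number))
    if abs(scaled) > INT32_MAX:
        raise OverflowError("tail event does not fit the integer type")
    return scaled

# A typical daily log-return fits comfortably with this BigNumber...
ok = to_fixed_point(0.005, 10**11)   # 5e8, well inside 2**31 - 1

# ...but the same BigNumber overflows on a fat-tail event:
overflowed = False
try:
    to_fixed_point(0.05, 10**11)     # 5e9 exceeds 2**31 - 1
except OverflowError:
    overflowed = True  # before the fix, such outliers were silently mistreated
```

The fix amounts to choosing a BigNumber small enough that even the tails of the return distribution survive the conversion.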
After the fix, the new Fig.1 makes more sense, suggesting either a fairly low Fred around 14 or a high one in the area where the variation of the MC points with Fred stops. I prefer the lower Fred, and Fig.2 illustrates why: by discarding the outliers in the prediction-times-reality distribution, Fig.2 reveals that in the higher Fred range (60 and above), the positiveness of the Pearson coefficient was achieved entirely due to exceptional events, essentially good forecasts in a high-volatility environment, whereas Fred=14 is seen to give good results in the bulk of the distribution. This may be the solution to the problem with the trading system optimization so far, which has been carried out for higher Fred values but resulted in trades being placed almost exclusively in extreme volatility environments, raising doubts about the sustainability of such optima. This is not to say that a forex trading system optimized for rare trading in extreme regimes has no right to exist, just that I would not like to pursue such a system as the first product launched. 
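The Fig.2-style check can be sketched as follows: compute the Pearson coefficient once on the full prediction-versus-reality sample and once after discarding high-volatility outliers. The synthetic data below is deliberately built so that, as in the high-Fred case, the positive coefficient comes entirely from a handful of extreme events; the cutoff value and all numbers are illustrative.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(4)
# Bulk of the distribution: uncorrelated forecast/reality pairs, quiet regime.
pred = [random.gauss(0.0, 0.005) for _ in range(500)]
act = [random.gauss(0.0, 0.005) for _ in range(500)]
# A few extreme-volatility days where the forecast happened to be right:
pred += [0.05] * 5
act += [0.05] * 5

r_full = pearson(pred, act)   # positive, driven by the 5 extreme events

# Discard outliers: keep only days whose actual move is below a cutoff.
cut = 0.02  # hypothetical volatility cutoff
xs, ys = zip(*[(p, a) for p, a in zip(pred, act) if abs(a) < cut])
r_bulk = pearson(xs, ys)      # near zero: the bulk carries no signal
```

A Fred value whose positive coefficient survives this trimming, as Fred=14 does in Fig.2, is the more trustworthy optimum.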

Last Updated ( Friday, 29 January 2010 16:32 ) 