More on how I know my forex forecasting works.

Written by Forex Automaton   
Friday, 23 October 2009 15:42

This is a brief follow-up to the previous post on how I know my forex forecasting works. In that post I disclosed a measurement of a figure of merit I use to monitor forecasting quality and optimize the algorithm: the covariance of predicted and actual logarithmic returns on the day scale. The measurements were carried out for 16 values of the control parameter nicknamed Fred, which is currently the only "make it or break it" parameter responsible for the forecasting, and the only one currently being optimized. (As an aside, there are other quantities that control the process, such as how big a chunk of data the algorithm looks at. Those are believed to be more mundane and are currently fixed at some "reasonable" values -- which is not to say that I won't take a more quantitative look at how reasonable those values are sooner or later.)

The covariance of predicted and actual logarithmic returns is not the best quantity to look at when aggregating data across different forex exchange rates: because those markets have somewhat different volatilities (different even though the logarithmic returns take the absolute level of the exchange rate out of the picture), the resulting numbers for the forex majors were in effect volatility-weighted averages. Moreover, a quantity like 10^-6, even if it is more than 2 standard deviations above zero, does not communicate the result to the non-expert in the intuitive way the result deserves.

These are the reasons why I switched from covariances to Pearson correlation coefficients, and today I am presenting the updated measurements.

Pearson correlation of predicted and actual day-scale forex logarithmic returns as a function of the forecasting parameter

Fig.1. Pearson correlation of predicted and actual day-scale logarithmic returns as a function of the forecasting parameter nicknamed Fred. The vertical bars (so-called error bars) are a measure of uncertainty; they are calculated as discussed in the previous post and have the same meaning. The back-testing simulations give the forecasting engine no access to future data, direct or indirect. Significantly positive (and ideally large) values correspond to quality forecasting.

The Pearson correlation coefficient is more intuitive because it has a known range, from -1 (total anticorrelation) to 1 (total correlation), so there is a built-in scale for judging what is large and what is small.
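To illustrate why the switch matters, here is a minimal sketch, using synthetic stand-in series (not the actual forecast or market data): covariance rescales with the volatility of the series, while the Pearson coefficient is scale-invariant.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-ins for predicted and actual day-scale log returns;
# the real series would come from the forecasting engine and market data.
predicted = rng.normal(0.0, 1.0, 5000)
actual = 0.03 * predicted + rng.normal(0.0, 1.0, 5000)

# Covariance depends on the volatility (scale) of each series...
cov_raw = np.cov(predicted, actual)[0, 1]
cov_scaled = np.cov(predicted, 10.0 * actual)[0, 1]  # a 10x more volatile market

# ...while the Pearson coefficient is scale-invariant and lives in [-1, 1].
r_raw = np.corrcoef(predicted, actual)[0, 1]
r_scaled = np.corrcoef(predicted, 10.0 * actual)[0, 1]

print(cov_scaled / cov_raw)  # ~10: covariance rescales with volatility
print(r_raw, r_scaled)       # equal: Pearson does not
```

Averaging Pearson coefficients across currency pairs therefore gives each market equal weight, instead of letting the most volatile pairs dominate.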

As before, Fig.1 as such is free of bias -- it shows you all the possible Fred values. Absence of the "benefit of hindsight" is thus ensured at the stage of the Fig.1 analysis: the statement that an arbitrarily chosen Fred value is more likely than not to result in a positive correlation between reality and forecast carries no bias towards a particular value, and thus no benefit of hindsight. No doubt, the benefit of hindsight will enter the game once a single value of Fred is chosen on the basis of Fig.1. So if I made that choice and showed you past performance for that chosen value only, those simulated track records would be subject to the benefit-of-hindsight caveat.

The correlation in Fig.1 reaches values as high as 0.03 (3%) -- is that a large or a small number? A toy model may help. Imagine a coin-tossing game with a fair coin (one with equal probabilities of heads and tails). Your chance of winning ("predicting") a single outcome is 1/2, and so is your chance of losing. Imagine that before every coin toss you receive advice on the outcome and place your bet accordingly. The advice alters your chances so that your chance of winning is now 1/2+a, and your chance of losing is 1/2-a. (We can say that when a>0, the advice improves your chances of winning.) The Pearson correlation between reality and prediction (advice) in this game is not hard to compute:

                          Actual outcome
                          -1         1

Predicted      -1      1/4+a/2    1/4-a/2
outcome         1      1/4-a/2    1/4+a/2

Table 1: probabilities of outcomes in the coin-tossing game with an unfair advantage. The stake in the game is chosen to be 1, so the outcome can be -1 (a loss) or 1 (a win).

Since the coin itself is fair, both the predicted outcome and the actual outcome have zero mean and a variance of 1. The covariance of the predicted and actual outcomes is therefore just the expectation of their product:

Cov(prediction,reality) = -2(1/4-a/2) + 2(1/4+a/2) = 2a.

In the case of no advantage (a=0, meaningless advice), the covariance is zero. Good advice (Table 1 defines "good") leads to positive covariance. Dividing by the product of the two unit standard deviations,

Pearson(prediction,reality) = 2a/(1×1) = 2a.
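The result is easy to check with a quick Monte Carlo sketch (my own illustration, not part of the original study): drawing a fair coin and advice that matches the outcome with probability 1/2+a reproduces the joint probabilities of Table 1 and a Pearson correlation close to 2a.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.015          # the unfair-advantage parameter
n = 2_000_000      # number of coin tosses

# Fair coin: the actual outcome is -1 or +1 with probability 1/2 each.
actual = rng.choice([-1, 1], size=n)

# The advice matches the actual outcome with probability 1/2 + a,
# which reproduces the joint probabilities of Table 1.
correct = rng.random(n) < 0.5 + a
predicted = np.where(correct, actual, -actual)

r = np.corrcoef(predicted, actual)[0, 1]
print(r)  # close to 2a = 0.03
```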

Interpreted in terms of the coin-tossing game, a Pearson correlation of 0.03 corresponds to a scenario where your unfair-advantage parameter a equals 0.015. If you will, this is an intuitive lower-bound measure of the degree of inefficiency in forex on the day scale, since that is the scale of decision making in this study. I say lower bound because a better algorithm could come up with a better result -- a higher correlation. Whether you think this number is large or small depends on what you think of this market -- and of the ability of its large-scale players to exhaust its opportunities. A very interesting direction of research would be to conduct a similar study on other time scales, ideally down to the time scales of high-frequency trading. Another direction is to see whether a multi-market pattern analysis yields better results than independent analysis of the individual markets.

Another interesting (although not yet quite quantitative) conclusion is that with such a degree of predictability, I was "taking too much risk" in my Monte Carlo studies of simulated trading by risking 10% of the account. This could have "randomized" the results and complicated the optimization. See Ed Thorp on the Kelly criterion for more details.
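The over-betting point can be made concrete with the toy model. For the even-money coin game of Table 1, the Kelly criterion gives an optimal bet fraction of f* = p - q = 2a; the sketch below (which assumes the even-money payoff of the toy model, not actual trading mechanics) shows that risking 10% of the account with a 3% edge even makes the expected log growth negative.

```python
import math

a = 0.015                # unfair advantage implied by the Pearson correlation of 0.03
p, q = 0.5 + a, 0.5 - a  # win / loss probabilities, even-money payoff

f_kelly = p - q          # Kelly fraction for an even-money bet: 2a = 3%

def growth(f):
    """Expected log growth per bet when risking fraction f of the account."""
    return p * math.log(1 + f) + q * math.log(1 - f)

print(f_kelly)           # ~0.03
print(growth(f_kelly))   # small but positive
print(growth(0.10))      # negative: 10% per bet over-bets a 3% edge
```

In the toy-model terms, betting 10% is more than three times the Kelly fraction, which is exactly the regime where results get "randomized" by drawdowns.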

Last Updated ( Tuesday, 09 February 2010 12:48 )