Forex Automaton as a Shannon Communication Channel. Introducing the Kelly Criterion.

The intention of this post is to tie together several topics that appeared on my radar screen in the course of trading system optimization. First, it has been understandably hard to fully rid oneself of vestiges of mainstream financial theory, based on the postulate of market efficiency, while building a wealth-generating tool that relies explicitly on demonstrable market inefficiencies. The realization that the Sharpe ratio does not let one make an objective choice of a portfolio was there from the beginning, and I recall perceiving this fact as a “necessary evil”. Then came the understanding that an arithmetic average of returns gives a biased picture of long-term return and that, consequently, the Sharpe ratio is built around biased quantities.

In what followed, the key concept was the Kelly Criterion and the rich intellectual context it is part of, quite remote from mainstream financial engineering. Initially I came across a mention of the Kelly Criterion in some pieces by Edward Thorp, found on his website, but did not fully appreciate the depth of the context. Later, prompted by some readers of this site, I learned about Ed Thorp’s more extensive exposition, “The Kelly Criterion in blackjack, sports betting, and the stock market”.

A good place to begin is probably Claude Shannon’s work A Mathematical Theory of Communication (originally, The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948). The work introduces the concept of information in the strict theoretical sense. It deals with measures of information and redundancy. These are probably the two most important concepts in algorithmic trading: suffice it to say that if the markets were not redundant, algorithmic trading in the sense discussed and developed here would be impossible and trading would degenerate into gambling. Much of this site’s research content deals directly with measurements of informational redundancy in forex. Shannon’s information theory is, among other important things, the venue where cryptography, linguistics, statistics and mathematics meet in the context of financial speculation. Shannon introduced the concept of mutual information to characterize the transmission capacity of communication channels.

In 1956, Kelly introduced what is known as the Kelly Criterion in an application of Shannon’s theory to gambling. He titled the work “A New Interpretation of Information Rate”. The communication channel considered is a very specific one: it is a noisy channel allowing a gambler to know in advance the outcomes of chance events and bet accordingly. “Noisy” means that the information the gambler receives is in general not perfect. Shannon’s mutual information (channel capacity) is the measure of the quality of the betting tips the gambler receives. Because the chance of losing is non-negligible, the gambler cannot bet all of her capital on every game, but because the value of the insider data she receives is non-negligible either, her optimal betting ratio is non-zero. In the simple case of two symmetric outcomes with even-money payoffs, the optimal betting ratio equals the difference between the win and loss probabilities when the tip is followed. The maximum expected logarithmic rate of growth of capital (per game) turns out to be Shannon’s mutual information of the abstract communication channel, with true information as the input and prediction as the output (or vice versa, since the formula is symmetric).
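The two-outcome case described above can be checked numerically. The following is a minimal sketch, assuming an even-money bet and a tip that is correct with probability p; the function names and the value p = 0.6 are illustrative assumptions, not anything taken from Kelly's paper or from Forex Automaton. It compares the expected log2 growth at the betting fraction p - q with the capacity of a binary symmetric channel, 1 - H(p).

```python
import math

def kelly_fraction(p):
    """Fraction of capital to bet on an even-money game won with probability p.
    Equals p - q with q = 1 - p; no bet is placed when there is no edge."""
    return max(2.0 * p - 1.0, 0.0)

def log_growth(p, f):
    """Expected log2 growth of capital per game when betting fraction f."""
    outcomes = ((p, 1.0 + f), (1.0 - p, 1.0 - f))
    # Skip zero-probability outcomes to avoid log2(0) when p is 0 or 1.
    return sum(prob * math.log2(mult) for prob, mult in outcomes if prob > 0.0)

def bsc_capacity(p):
    """Capacity in bits of a binary symmetric channel that is correct with probability p."""
    entropy = -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)
    return 1.0 - entropy

p = 0.6  # hypothetical tip accuracy
f = kelly_fraction(p)
print("Kelly fraction:", f)
print("growth at Kelly fraction:", log_growth(p, f))
print("channel capacity:", bsc_capacity(p))
```

Betting more or less than the fraction p - q lowers the expected log growth, which can be seen by calling log_growth with neighbouring values of f.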

This gets me back to the measurements of Forex Automaton prediction quality. The state of the art at the moment is the measurement of the Pearson correlation coefficient between the predicted and real values of logarithmic returns. I am eager to see what the plot will look like if Shannon’s mutual information is used instead of the correlation coefficient; this is particularly interesting given Kelly’s interpretation of the quantity.
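One simple way such a measurement might be set up is sketched below. It is an assumption-laden illustration, not the method used on this site: the logarithmic returns are reduced to their signs (up or not up), which matches Kelly's two-outcome setting but discards magnitude, and the mutual information is estimated by plugging observed frequencies into the definition. The function name and the sample series are hypothetical.

```python
import math
from collections import Counter

def sign_mutual_information(predicted, actual):
    """Plug-in estimate, in bits, of the mutual information between the
    signs of predicted and actual log returns (assumed binarization)."""
    pairs = [(p > 0, a > 0) for p, a in zip(predicted, actual)]
    n = len(pairs)
    joint = Counter(pairs)                       # counts of (predicted sign, actual sign)
    pred_counts = Counter(x for x, _ in pairs)   # marginal counts for predictions
    real_counts = Counter(y for _, y in pairs)   # marginal counts for outcomes
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (pred_counts[x] * real_counts[y]))
    return mi

# Hypothetical series: a predictor that always gets the direction right
print(sign_mutual_information([0.01, -0.02, 0.03, -0.01], [0.02, -0.03, 0.005, -0.01]))
# Hypothetical series: predictions unrelated to outcomes
print(sign_mutual_information([0.01, 0.02, -0.03, -0.01], [0.02, -0.03, 0.005, -0.01]))
```

A plug-in estimate of this kind is biased upward on short samples, so a comparison with the correlation coefficient would need enough data, or a bias correction, to be meaningful.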

Where does this all leave the common theory with its Sharpe ratios and efficient portfolios? A recent (June 2009) article by Javier Estrada, “Geometric Mean Maximization: an Overlooked Portfolio Approach?”, proved informative to me. Estrada contrasts Sharpe ratio maximization, which leaves one with not a single portfolio but an entire efficient frontier of them, with geometric mean maximization (Kelly by another name), which leads to the largest expected terminal wealth. Estrada notes that SRM (Sharpe ratio maximization) is a one-period framework, while GMM (geometric mean maximization) is a multi-period framework. I take this to mean that the biased nature of Sharpe where cumulative returns are concerned (as already noted, one of my early disappointments with this statistic) is thus acknowledged in the academic community.

Why do practitioners overlook a useful criterion, wonders Estrada. My guess is that the idea of Kelly’s “private wire” and the like is simply too alien to the version of financial theory based on the postulates of market efficiency, which makes the Kelly Criterion, a corollary of Shannon’s information theory as applied to markets, genetically alien as well. One should never underestimate the role of the surrounding cultural and ideological context in the development of science. The symptomatic overstatement of symmetries by the theorists of “market efficiency” may be a case in point. Certainly, Kelly makes more sense to someone dealing with wealth generation on the basis of quantifiable and specific advantages than to someone who simply submits to the egalitarian random walk of the hypothetical “efficient market”. This is because Kelly addresses the problem of the value of information directly; it is at the heart of his approach.

Possible Figures Of Merit Related To Return On Investment: The Arithmetic And Geometric Mean

When evaluating the performance of a trading system, I calculate the first moment (an arithmetic mean of the series of returns) as well as the second one (a variance of the series). Originally my “Sharpe-like” ratio, used to adjust the return for the risk, was a ratio of the first moment to the square root of the second. The series of returns would be composed of annualized returns calculated every month.

However, there is a subtlety. If {r1, r2, r3, …, rn} is a series of monthly returns for some period (each return being the ratio of capital at the end of the month to that at the start of the month), then the total return for the same period will be the product r1 × r2 × r3 × … × rn.

When each monthly return is already annualized (exponentiated to the 12th power), the opposite operation (applying power 1/12 to the product) is required to get the actual return for the year. This operation is known as taking the geometric mean.
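The relation between the annualized monthly figures and the actual yearly return can be illustrated with a short sketch. The twelve monthly returns below are made-up numbers chosen only for illustration; the point is that the geometric mean of the annualized series recovers the true compounded return, while the arithmetic mean of the same series does not.

```python
import math

# Hypothetical monthly returns (capital at month end / capital at month start)
monthly = [1.02, 0.97, 1.01, 1.04, 0.95, 1.03,
           1.00, 0.98, 1.05, 0.96, 1.02, 1.01]

# Each monthly return annualized by raising it to the 12th power
annualized = [r ** 12 for r in monthly]

# Actual return for the year: the product of the monthly returns
actual_year = math.prod(monthly)

# Geometric mean of the annualized series: power 1/12 applied to the product
geometric_mean = math.prod(annualized) ** (1.0 / 12.0)

# First moment (arithmetic mean) of the annualized series
arithmetic_mean = sum(annualized) / len(annualized)

print("actual yearly return:", actual_year)
print("geometric mean of annualized returns:", geometric_mean)
print("arithmetic mean of annualized returns:", arithmetic_mean)
```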

For positive numbers, the arithmetic mean is always greater than or equal to the geometric mean. This fact is well known from university math courses.

For example, if the series of returns is 1.0, 1.081, 0.9, 1.02, then the geometric average is 0.998084 and the arithmetic average is 1.00025. In this particular example, the arithmetic average of returns will tell you that you are making money whereas in fact you are losing.
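The figures in this example are easy to reproduce. The snippet below is only a check of the arithmetic, using the four returns quoted above; the variable names are arbitrary.

```python
import math

returns = [1.0, 1.081, 0.9, 1.02]

# Arithmetic mean: sum divided by the number of elements
arithmetic = sum(returns) / len(returns)

# Geometric mean: n-th root of the product
geometric = math.prod(returns) ** (1.0 / len(returns))

print("arithmetic mean:", round(arithmetic, 5))
print("geometric mean:", round(geometric, 6))
print("compounded return over the series:", math.prod(returns))
```

The compounded return over the four periods is below 1, which is the loss that the arithmetic mean hides.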

The arithmetic mean equals the geometric mean in the particular case when all elements of the series are equal. The more volatility there is in the series, the larger the difference between the two means can be expected to be.

The bottom line is that the results of the research so far should be revisited using the simple return and the Sharpe ratio based on such return instead of the first moment of the series of annualized returns. The first moment gives a biased estimate of the actual return.

Nevertheless, since the main driving logic in the choice of parameters was minimization of the drawdown, the basic conclusions regarding the parameter choices are likely to stay the same.