Progress towards algorithmic implementation of Kelly Criterion

Written by Forex Automaton   
Friday, 26 February 2010 13:49

The "Kelly Criterion" in quant folklore is based on the exposition and development of Kelly's work by Edward Thorp. Acknowledging Thorp's contribution, I find the original article by Kelly conceptually more relevant to the realities of algorithmic trading as developed by ForexAutomaton. Our forecasting algorithm, like any forecasting tool, can very naturally be considered a case of the hypothetical communication channel discussed by Kelly, and the related mathematical objects, such as the joint and conditional probability density distributions of the communicated "symbols" (forecasts and real quotes), are already being monitored here, even though the Kelly Criterion for capital allocation to trades remains to be coded.

In preparation for a literal implementation of Kelly's prescription in code, I took Kelly's paper "A New Interpretation of Information Rate", published in the Bell System Technical Journal, 35: 917-926, July 1956, and worked through his reasoning. In this post, I elaborate on some elements of the derivation which are abbreviated in the journal publication, make an effort to put the work in the context of financial trading, as opposed to the parimutuel betting context used by Kelly, and lay the groundwork for the implementation of a capital allocation algorithm.

I use Kelly's notation with minor changes:

  • $u$, the underlying price increment from one trading decision to the next
  • $p(s)$, the probability of outcome s
  • $p(r\vert s)$, the (conditional) probability that the forecast is r'th under condition that the actual outcome is s'th
  • $p(s,r)$, the joint probability of s'th outcome and r'th forecast
  • $q(r)$, r'th forecast probability
  • $q(s\vert r)$, the (conditional) probability that the actual outcome is s'th under condition of r'th forecast
  • $\alpha(s)$, amount returned after placing and eventually closing a trade with stop-loss placed so that the maximum loss is 1. This amount is assumed to be non-negative; in other words, we assume that the stop-loss is perfect. When trading financial instruments, $\alpha(u)$ is typically piece-wise linear.
  • $a(s\vert r)$, the fraction of trading capital committed (subjected to the risk of loss) to the outcome s after receiving the forecast for outcome r.

$a(s\vert r)$ is somewhat counter-intuitive since most people do not picture trading, especially in a single market, as betting on a set of outcomes, despite the fact that a set of outcomes is what they actually face.

In spot forex, the full variety of exposures a trader can have to a single market can be expressed by combining a very simple piece-wise linear $\alpha(s)$ with an $a(s\vert r)$ independent of s (but, in general, dependent on r). This extends naturally to a portfolio of trades in various markets; $a(s\vert r)$ would take care of that, but s would then number the possible multi-market outcomes.

Next, Kelly states that

\begin{displaymath} \alpha(s) = 1/p(s) \end{displaymath} (1)

- the so-called "fair odds" in parimutuel terminology. Going over from parimutuel betting to forex (or other financial trading, for that matter), if s is linear with price, this condition does not hold in most cases. However, there is a certain freedom in determining how price is related to s: the relationship does not need to be linear. Whatever relationship is chosen, it must be independent of the trader's decision on the direction of the trade (long or short), because that decision in turn depends on how the signal is related to price. Fig.1 below indicates that simply using the expected price change (u in our notation) for s does not satisfy, or at least does not always satisfy, Equation 1.


Fig.1. Parimutuel-like "take-back" from hypothetical forex trades in AUD/USD, hour-scale trading system currently under development (very preliminary). The mean of 1.06 implies that with one dollar committed to the stop-loss, one would expect to take back one dollar and six cents in an hour. The risk is high, as this result is created mostly by the tail outcomes. The peak at zero corresponds to the total loss of the stake. Spreads and commissions are neglected. The distribution only remotely resembles a power law with exponent -1, and if at all, then only on the tail. The -1 power is required if Equation 1 holds and s is linear with price.

Denoting the initial capital as $V_0$, the capital after $N$ trades as $V_N$, and the number of times that the forecast is r while the actual outcome is s as $W(s,r)$,

\begin{displaymath} V_N = \prod_{s,r}[a(s\vert r) \alpha(s)]^{W(s,r)}V_0, \end{displaymath} (2)


The exponential capital growth rate, G, is

\begin{displaymath} G=\lim_{N \rightarrow \infty} \frac{1}{N} \log\frac{V_N}{V_0}=\sum_{s,r} p(s,r) \log[a(s\vert r)\alpha(s)] \end{displaymath} (3)

or using Eq.1,

\begin{displaymath} G=\sum_{s,r} p(s,r)[\log(a(s\vert r))-\log(p(s))]. \end{displaymath} (4)

Of these two terms, the second one has nothing to do with trading system optimization:

\begin{displaymath} \sum_{s,r} p(s,r) \log(p(s))=\sum_{s,r}p(s)p(r\vert s) \log(p(s)) = \sum_s p(s)\log(p(s)) \end{displaymath} (5)

being, up to a sign, Shannon's entropy of the market. It is interesting that the entropy of the market seems to help wealth generation: it enters the expression for the speed of wealth accumulation with a negative sign, while the sum itself is negative because p(s) is less than 1 for every s, so that every logarithm is negative. Now we can see that the path of least resistance, the one leading to staying with the market (as opposed to beating it), is to "buy and hold":

\begin{displaymath} a(s\vert r)=p(s), \end{displaymath} (6)

or to allocate capital with no regard to the forecast. Or rather, we already know that buy and hold is the way to be "with the market"; what is interesting is the formal representation of this fact within the Kelly framework, given by the equation above. Of course, here we completely ignore macroeconomic (or ideological?) considerations, which may be relevant to e.g. the stock market, namely that one's nominal wealth may increase as a result of stock market exposure, reflecting the supposed material progress of mankind (or the long-term inflationary dilution of the value of fiat money).
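To make Eqs. 1, 3, 4 and 6 concrete, here is a minimal numeric sketch in Python; the 2x2 joint distribution is hypothetical, chosen only so that the forecasts carry some information:

```python
import math

# Hypothetical joint distribution p(s, r) over 2 outcomes x 2 forecasts
# (illustrative numbers, not from the article).
p_joint = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.40}
p_s = {s: sum(v for (si, r), v in p_joint.items() if si == s) for s in (0, 1)}
q_r = {r: sum(v for (s, ri), v in p_joint.items() if ri == r) for r in (0, 1)}
alpha = {s: 1.0 / p_s[s] for s in p_s}          # fair odds, Eq. 1

def growth_rate(a):                              # Eq. 3
    return sum(p * math.log(a[sr] * alpha[sr[0]]) for sr, p in p_joint.items())

# Kelly-optimal allocation a(s|r) = p(s,r)/q(r) (this is Eq. 19, derived later)
a_kelly = {(s, r): p_joint[(s, r)] / q_r[r] for (s, r) in p_joint}
# "Buy and hold", Eq. 6: allocate by the marginal outcome probabilities
a_hold = {(s, r): p_s[s] for (s, r) in p_joint}

G_kelly, G_hold = growth_rate(a_kelly), growth_rate(a_hold)
# Eq. 4: the same G split into an allocation term minus the entropy-like term
alloc_term = sum(p * math.log(a_kelly[sr]) for sr, p in p_joint.items())
entropy_term = sum(p_s[s] * math.log(p_s[s]) for s in p_s)
assert abs(G_kelly - (alloc_term - entropy_term)) < 1e-12
print(G_kelly, G_hold)   # informative forecasts give G > 0; buy and hold gives G = 0
```

With fair odds, buy and hold multiplies the capital by $p(s)\alpha(s)=1$ at every step, which is why its growth rate comes out exactly zero.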

The goal is to maximize the rate of growth of capital G by choosing the best way of responding to forecasts, represented in the equations by $a(s\vert r)$.

Note that we do not assume anything about how good the forecasts are. Since there is not much we can do about the market entropy represented by the second term, we maximize

\begin{displaymath} \sum_{s,r}p(s,r)\log{a(s\vert r)} \end{displaymath} (7)

Both $p(s,r)$ and $a(s\vert r)$ are influenced by the decisions made by the trading system developer. This expression allows one to become mathematically precise about the relationship between the two interrelated components necessary for any operational trading system: forecasting (market insight, trading strategy) and money management. These are represented by $p(s,r)$ and $a(s\vert r)$ respectively, and we see that

  1. $p(s,r)$ and $a(s\vert r)$ enter as a product, which means that either of the two components, forecasting and money management, if it happens to be a failure, can ruin the game;

  2. forecasting is more important: its representative, $p(s,r)$, enters the expression directly (linearly), whereas $a(s\vert r)$ enters through the logarithm. Thus, money management is only logarithmically important, contrary to an opinion ascribed to the famous theoretical physicist L. D. Landau in Russian theorists' folklore. [According to the anecdote, one of Landau's students noticed him counting change after paying for lunch at a university cafeteria. The student asked: "L.D., you teach us to make order-of-magnitude estimates for everything, how come you are counting change?" to which Landau retorted: "Money's in the exponent!"]

As for $p(s,r)$, its non-trivial feature of central interest, the correlation between s and r, has been monitored for the Danica trading system via Pearson correlation coefficients. Improving $p(s,r)$ is outside the scope of Kelly's interest: he focuses on finding the best $a(s\vert r)$. In doing that, we are bound by the normalizations of probabilities:

\begin{displaymath} \sum_{s}p(s,r) =q(r), \end{displaymath} (8)

\begin{displaymath} \sum_{s}a(s\vert r) = 1. \end{displaymath} (9)

These are the constraints affecting the maximization of G.

Next I am going to elaborate on the maximization (Kelly limits his exposition of this subject to a remark on the "convexity of logarithm"). We are maximizing a function of the form

\begin{displaymath} \sum_i x_i \log{y_i}, \end{displaymath} (10)

subject to the constraints

\begin{displaymath} \sum_i x_i =x, \end{displaymath} (11)

\begin{displaymath} \sum_i y_i =y \end{displaymath} (12)

I am going to use the method of Lagrange multipliers. In this method, a function $\Lambda$ is composed

\begin{displaymath} \Lambda=\sum_i x_i \log{y_i}+\lambda_1\left(\sum_i x_i-x\right)+\lambda_2\left(\sum_i y_i-y\right) \end{displaymath} (13)

and one requires that the partial derivatives of this function be zero:

\begin{displaymath} \frac{\partial\Lambda}{\partial x_i}=\log{y_i}+\lambda_1 = 0, \end{displaymath} (14)

\begin{displaymath} \frac{\partial\Lambda}{\partial y_i}=\frac{x_i}{y_i}+\lambda_2 = 0, \end{displaymath} (15)

\begin{displaymath} \frac{\partial\Lambda}{\partial \lambda_1}=\sum_i x_i -x = 0, \end{displaymath} (16)

\begin{displaymath} \frac{\partial\Lambda}{\partial \lambda_2}=\sum_i y_i -y = 0. \end{displaymath} (17)

The last two equations, 16 and 17, restate the constraints.

From Eq.15, $y_i= -x_i/\lambda_2$; summing this over i and using Eqs.16 and 17 to eliminate $\sum_i x_i$ and $\sum_i y_i$, we get

\begin{displaymath} -\lambda_2 = \frac{x}{y} =\frac{x_i}{y_i} \end{displaymath} (18)

$y_i=x_i\frac{y}{x}$ maximizes expression 10, subject to the constraints. That this is a maximum and not a minimum can be seen from the concavity of the function: the second derivative $\partial^2\Lambda/\partial y_i^2 = -x_i/y_i^2$ is negative as long as the $x_i$ are non-negative.
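The maximizer is easy to verify numerically; in this small Python check, the values of $x_i$ and the constraint totals are arbitrary illustrative numbers:

```python
import math
import random

# Check that y_i = x_i * y / x maximizes sum_i x_i*log(y_i)
# under the constraints sum_i x_i = x and sum_i y_i = y.
x_vec = [0.2, 0.5, 0.3]          # arbitrary non-negative x_i
x_tot, y_tot = sum(x_vec), 1.0   # constraint totals

def objective(y_vec):            # expression 10
    return sum(xi * math.log(yi) for xi, yi in zip(x_vec, y_vec))

y_opt = [xi * y_tot / x_tot for xi in x_vec]   # the Lagrange solution

random.seed(0)
for _ in range(1000):
    w = [random.random() + 1e-9 for _ in x_vec]   # random positive weights
    y_rand = [wi * y_tot / sum(w) for wi in w]    # rescaled to satisfy Eq. 12
    assert objective(y_rand) <= objective(y_opt) + 1e-12
print(objective(y_opt))
```

No random candidate satisfying the constraint beats the Lagrange solution, as expected from the concavity argument.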

With this input, we return to Eqs. 7, 8 and 9 and recognize the pieces: the term written as Eq.7 is maximized by putting

\begin{displaymath} a(s\vert r)=\frac{p(s,r)}{\sum_k p(k,r)} = \frac{p(s,r)}{q(r)}=q(s\vert r). \end{displaymath} (19)

The last transition relies on

\begin{displaymath} p(s,r)=q(r)q(s\vert r), \end{displaymath} (20)

which has its roots in the foundations of the probability calculus and does not limit generality.

Note that when applying the Lagrange-multiplier result to Eq.7, only the summation over s was identified with the summation over i in Eq.10; the forecast index r plays the role of a fixed parameter.

To restate the main result, Eq.19, in words: the rate of capital growth is maximized by having the allocation of capital to outcomes, given the forecast, match the probability distribution of those outcomes, given the forecast.

In the particular case when the Efficient Market Hypothesis is true, the markets are unpredictable (they instantly price in all available information), and thus

\begin{displaymath} q(s\vert r) = q(s) \end{displaymath} (21)

since s and r, reality and forecast, are completely statistically independent.

All of the preceding was a theoretical preparation for answering the following practical question: given the history of forecasts and real outcomes (which can be used to construct, and keep updating throughout the trading operation, an estimator of $q(s\vert r)$ whose accuracy will improve with time), how do we allocate capital to trades on each step, after the forecast r is received?
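The running estimator of $q(s\vert r)$ mentioned above can be sketched as follows; the class name and the discretization of forecasts and outcomes into labels are assumptions of this sketch, not part of the article:

```python
from collections import Counter

# Minimal sketch of a running estimator of q(s|r): keep counts of
# (forecast r, outcome s) pairs and read off the row-normalized ratio.
class ConditionalEstimator:
    def __init__(self):
        self.joint = Counter()   # counts of (r, s) pairs
        self.marg = Counter()    # counts of forecast r

    def update(self, r, s):
        self.joint[(r, s)] += 1
        self.marg[r] += 1

    def q(self, s, r):           # empirical q(s|r)
        if self.marg[r] == 0:
            return 0.0
        return self.joint[(r, s)] / self.marg[r]

est = ConditionalEstimator()
# hypothetical history of (forecast, outcome) pairs, coded as +1/-1
history = [(1, 1), (1, 1), (1, -1), (-1, -1), (-1, 1), (-1, -1)]
for r, s in history:
    est.update(r, s)
print(est.q(1, 1))   # 2/3: two of the three "up" forecasts were followed by "up"
```

With continuous price increments, the binning of u into a discrete s would have to be chosen; the counting logic stays the same.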

The hypothetical Kelly trader would bet all of his capital, since the outcomes are mutually exclusive; the bets on opposite outcomes cancel one another, so that his net position is zero unless r biases q(s) in such a way that $q(s\vert r)$ is shifted in one direction. (Note: provided a sufficient amount of past data exists, that won't happen unless the system has a history of being successful under this particular condition!) That direction is then the direction of the resulting trade, and the capital allocated to the trade is the uncancelled remainder of the positions.

The fact that the space of signals is not the space of price movements is immaterial, since whatever Jacobian is needed to transform the differential probability density from one space into the other will be the same on both sides of Eq.19. Therefore,

\begin{displaymath} a(u\vert r)=q(u\vert r) \end{displaymath} (22)


on the basis of Eq.19. Having received the forecast r, the hypothetical Kelly-optimal trader would bet the fraction q(u|r) of his capital on the market moving u units, and so on for every possible outcome u. What would be the net position and capital allocation to communicate to the broker? It will be

\begin{displaymath} v=\sum_u \Theta(u)q(u\vert r) \end{displaymath} (23)


where $\Theta(u)$ is the Heaviside step function in its signed convention: +1 for positive u, -1 for negative u. This expression returns a fraction of capital to be allocated, with a sign: the plus sign corresponds to a long and the minus sign to a short position. Spreads and capital costs are ignored.
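Eq.23 reduces to a few lines of code; the discrete outcome grid and the numbers in $q(u\vert r)$ below are hypothetical:

```python
# Eq. 23 with a hypothetical discrete q(u|r): net the per-outcome Kelly bets
# into one signed fraction of capital.
q_u_given_r = {-2: 0.10, -1: 0.20, 0: 0.10, 1: 0.35, 2: 0.25}
assert abs(sum(q_u_given_r.values()) - 1.0) < 1e-12   # Eq. 9 normalization

def theta(u):
    # signed step: +1 for positive u, -1 for negative u, 0 taken for u = 0
    return (u > 0) - (u < 0)

v = sum(theta(u) * q for u, q in q_u_given_r.items())
print(round(v, 12))   # 0.3: go long, 30% of capital committed to the stop-loss
```

The mass at u = 0 contributes to neither side, consistent with the zero-outcome peak discussed for Fig.1.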

A hypothetical case of Gaussian returns with standard deviation $\sigma$ and mean $\mu$ is easy to consider analytically:

\begin{displaymath} v=-F(0;\mu,\sigma)+(1-F(0;\mu,\sigma))=1-2F(0;\mu,\sigma)=\mbox{erf}\left(\frac{\mu}{\sigma\sqrt{2}} \right) \end{displaymath} (24)

where $F(x;\mu,\sigma)$ is the cumulative form of the Gaussian distribution and erf is the error function. It is noteworthy that in this special case, the Kelly allocation depends only on the Sharpe ratio $\mu/\sigma$. In the good years of the S&P500, with a Sharpe ratio of 0.3 or so, a Kelly investor would risk erf(0.21), which is about 0.24 (24%), of his trading capital as a stop-loss set on the S&P500, and keep the rest of it in "riskless" assets. Note that in this framework, by risk we mean the amount committed to the potential stop-loss in the quasi-parimutuel way. Then what would be the actual position and leverage of such an investor? The only answer is: such that Eq.1 holds, or as close to that as possible. Fig.1 illustrates that Eq.1 is unlikely to hold exactly. I mark this as an open issue with the Kelly framework.
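The identity in Eq.24 is easy to confirm numerically with Python's math.erf; the Sharpe ratio 0.3 follows the S&P500 illustration above:

```python
import math

mu, sigma = 0.3, 1.0   # Sharpe ratio mu/sigma = 0.3

def F(x, mu, sigma):
    # cumulative Gaussian distribution expressed through the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

v = 1.0 - 2.0 * F(0.0, mu, sigma)                 # Eq. 24, middle form
v_erf = math.erf(mu / (sigma * math.sqrt(2.0)))   # Eq. 24, closed form
assert abs(v - v_erf) < 1e-12
print(v)   # about 0.24: the fraction of capital at risk
```

Note that rescaling both $\mu$ and $\sigma$ by the same factor leaves v unchanged, which is the Sharpe-ratio-only dependence noted above.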

Extension to a portfolio of multiple markets is somewhat complicated by the fact that in financial markets, you cannot bet on a complex outcome (such as EUR/USD down, USD/JPY up, ...) with the same dollar. The solution I see for N markets is to split the capital into N equal pieces. The point is that there is no reason to say that, e.g., USD/JPY is a priori a worse market than EUR/USD (although in the context of a particular forecast r such a statement can make perfect sense!). This consideration makes the pieces equal a priori. Now, within each piece, the same process as just described for the single market takes place. The money allocation system is effectively a "multi-channel" one, each "channel" dedicated to a particular market and operating on the same fraction of the trading capital.
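A sketch of this "multi-channel" allocation; the market names, the $q(u\vert r)$ tables and the capital figure are all hypothetical:

```python
def theta(u):
    return (u > 0) - (u < 0)   # signed step function, as in Eq. 23

def net_position(q_u_given_r):
    # single-market Kelly netting, Eq. 23
    return sum(theta(u) * q for u, q in q_u_given_r.items())

# one hypothetical q(u|r) table per market, given the current forecasts r
markets = {
    "EURUSD": {-1: 0.30, 0: 0.10, 1: 0.60},
    "USDJPY": {-1: 0.55, 0: 0.15, 1: 0.30},
}
capital = 100_000.0
channel = capital / len(markets)   # equal a-priori split of the capital
orders = {m: channel * net_position(q) for m, q in markets.items()}
print(orders)   # signed capital committed to the stop-loss, per market
```

Each channel runs the single-market rule independently on its equal slice; the sign of each entry gives the trade direction in that market.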

Similarly to the forecasting system, which may consider markets jointly or in isolation, the Kelly capital allocation component can in principle, given enough data, consider the implications of a forecast for market i on the capital allocation for market j and vice versa.

Last Updated ( Saturday, 25 February 2012 12:19 )