High order cumulants

Written by Forex Automaton   
Monday, 30 August 2010 16:30

Cumulants are statistical measures of correlation designed to go to zero whenever any one or more of the quantities under study become statistically independent of the rest. They generalize the familiar notion of a correlation measure: the most intuitive case, a correlation between two quantities, is represented and measured by the second-order cumulant, and cumulants of higher order can be built in the same spirit.

Financial time series come as sequences of bars, or "candles", one bar per time step of the series. Each bar has four price levels: open, close, low and high. In a liquid market like forex, close, low and high should be sufficient: the open is typically not much different from the previous close and is believed to be redundant.

To judge the quality of market predictions, we are interested in multivariate cumulants, since there is a prediction for each of the three essential components of a candle. Each real quantity we correlate comes paired with its prediction, so the number of variables is even, and therefore we are interested in even-order cumulants.

The simplest of these is the second-order cumulant, also known as covariance:

$$c_2 = E[x_1 x_2] - E[x_1]\,E[x_2] \qquad (1)$$

Here E[] stands for the averaging (a.k.a. expectation) operator.
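As a minimal sketch of Eq. (1), assume the averaging operator E[] is replaced by a plain sample mean over two equally long series (the divide-by-N estimate). The function name `c2` and the numbers are hypothetical, chosen only for illustration:

```python
from statistics import fmean

def c2(x1, x2):
    """Second-order cumulant (covariance), Eq. (1), with E[] taken as the sample mean."""
    if not x1 or len(x1) != len(x2):
        raise ValueError("series must be non-empty and of equal length")
    e12 = fmean(a * b for a, b in zip(x1, x2))   # E[x1 x2]
    return e12 - fmean(x1) * fmean(x2)          # minus E[x1] E[x2]

# Made-up daily changes of the high and their forecasts, for illustration only.
actual = [0.4, -0.1, 0.3, -0.5, 0.2]
forecast = [0.3, 0.0, 0.1, -0.4, 0.1]
print(c2(actual, forecast))
```

Dividing by N rather than N-1 keeps the estimate consistent with reading E[] as a plain average; for long series the difference is negligible.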

In general, the n-th order cumulant is constructed by taking the correlation term E[x1x2...xn] and subtracting all terms that are not "genuinely" n-th order correlations but are composed of lower-order ingredients.
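One standard way to make this construction concrete is the moment-cumulant formula. Sum over every partition of the n quantities into groups, multiply the averages of the groups, and weight the product by (-1)^(k-1) (k-1)! for a partition into k groups. The single-group partition is the correlation term; every other partition is a lower-order ingredient. A sketch in plain Python, assuming sample means in place of E[]; all helper names are hypothetical:

```python
from math import factorial, prod
from statistics import fmean

def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # put `first` into each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a block of its own
        yield [[first]] + smaller

def moment(samples, block):
    """Sample estimate of E[product of the series whose indices are in block]."""
    return fmean(prod(row) for row in zip(*(samples[i] for i in block)))

def joint_cumulant(samples):
    """n-th order joint cumulant of n equally long series via the
    moment-cumulant formula: weight (-1)**(k-1) * (k-1)! per k-block partition."""
    total = 0.0
    for partition in set_partitions(list(range(len(samples)))):
        k = len(partition)
        weight = (-1) ** (k - 1) * factorial(k - 1)
        total += weight * prod(moment(samples, block) for block in partition)
    return total

z = [1.0, -1.0, 1.0, -1.0]          # made-up +/-1 series
print(joint_cumulant([z, z, z, z]))
```

For four series the loop visits 15 partitions, and the weights 1, -1, 2 and -6 are exactly the coefficients that appear in the fourth-order formula given below. For the ±1 series above the result is -2, the fourth cumulant of a ±1 variable.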

Take the daily candle as an example. We can generate predictions for the daily changes in low and high such that the correlation between the real and predicted change in the high is positive, and likewise for the low. Suppose we open a position with yesterday's low as the stop-loss and yesterday's high as the profit target. We then want to make sure not only that the low tends not to be hit when the prediction says so (attested to by the positive correlation between the daily change in the low and its forecast), and not only that the high tends to be hit when the prediction says so (attested to by the positive correlation between the daily change in the high and its forecast), but also that these two things tend to happen "simultaneously", within the same trade. This is the essence of the difference between a "genuine" fourth-order correlation and a mere superposition of two second-order ones.

The fourth-order cumulant is defined as:

$$
\begin{aligned}
c_4 = {} & E[1,2,3,4] \\
& - E[1,2,3]E[4] - E[1]E[2,3,4] - E[1,3,4]E[2] - E[1,2,4]E[3] \\
& - E[1,2]E[3,4] - E[1,3]E[2,4] - E[1,4]E[2,3] \\
& + 2\big(E[1,2]E[3]E[4] + E[1,3]E[2]E[4] + E[1,4]E[2]E[3] \\
& \qquad + E[2,3]E[1]E[4] + E[2,4]E[1]E[3] + E[3,4]E[1]E[2]\big) \\
& - 6\,E[1]E[2]E[3]E[4] \qquad (2)
\end{aligned}
$$

Here, for brevity, E[x1x2...] is written as E[1,2,...].
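The formula can be transcribed term by term. This is a sketch, assuming sample means stand in for E[], so the result is the cumulant of the observed sample rather than an unbiased estimate. The name `c4_explicit` and the data are hypothetical:

```python
from math import prod
from statistics import fmean

def c4_explicit(x1, x2, x3, x4):
    """Fourth-order joint cumulant, written out exactly as the formula above."""
    x = {1: x1, 2: x2, 3: x3, 4: x4}

    def E(*idx):
        # E[i,j,...] = sample mean of x_i * x_j * ...
        return fmean(prod(vals) for vals in zip(*(x[i] for i in idx)))

    return (E(1, 2, 3, 4)
            - E(1, 2, 3) * E(4) - E(1) * E(2, 3, 4) - E(1, 3, 4) * E(2) - E(1, 2, 4) * E(3)
            - E(1, 2) * E(3, 4) - E(1, 3) * E(2, 4) - E(1, 4) * E(2, 3)
            + 2 * (E(1, 2) * E(3) * E(4) + E(1, 3) * E(2) * E(4) + E(1, 4) * E(2) * E(3)
                   + E(2, 3) * E(1) * E(4) + E(2, 4) * E(1) * E(3) + E(3, 4) * E(1) * E(2))
            - 6 * E(1) * E(2) * E(3) * E(4))

# Made-up series: over these four observations b is exactly independent of a.
a = [1.0, 2.0, 1.0, 2.0]
b = [0.0, 0.0, 3.0, 3.0]
print(c4_explicit(a, a, a, b))   # zero up to rounding
```

Because every lower-order ingredient is subtracted, the value vanishes when one series is independent of the others, even though each series is perfectly correlated with itself.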

What is being subtracted are in fact products of lower-order cumulants, which in turn have their own lower-order terms subtracted; this is why the terms carry plus and minus signs alternating in a certain order. A recurrence relation exists that expresses higher-order cumulants in terms of lower-order ones.
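One such recurrence for joint cumulants (a standard identity, not necessarily the one the author has in mind) groups the terms of E[1,2,...,n] by the block that contains the first variable: E[S] equals the sum, over subsets A of S that contain the first index, of kappa(A) * E[S \ A], with E of the empty set equal to 1. Solving for kappa(S) gives the sketch below, again with sample means for E[] and memoization so that each lower-order cumulant is computed only once. The function name is hypothetical:

```python
from functools import lru_cache
from itertools import combinations
from math import prod
from statistics import fmean

def cumulant_by_recurrence(samples):
    """Joint cumulant of all series in `samples`, from
    kappa(S) = E[S] - sum over proper subsets A of S containing min(S) of kappa(A) * E[S - A]."""

    @lru_cache(maxsize=None)
    def E(S):
        # S is a frozenset of series indices; E of the empty set is 1
        if not S:
            return 1.0
        return fmean(prod(row) for row in zip(*(samples[i] for i in sorted(S))))

    @lru_cache(maxsize=None)
    def kappa(S):
        first = min(S)
        others = sorted(S - {first})
        total = E(S)
        # proper subsets A containing `first`: `first` plus r of the others, r < len(others)
        for r in range(len(others)):
            for extra in combinations(others, r):
                A = frozenset((first, *extra))
                total -= kappa(A) * E(S - A)   # remove the lower-order ingredient
        return total

    return kappa(frozenset(range(len(samples))))

z = [1.0, -1.0, 1.0, -1.0]   # made-up +/-1 series
print(cumulant_by_recurrence([z, z, z, z]))
```

With one series the loop is empty and the result is the mean; with two it reduces to E[1,2] - E[1]E[2], which is Eq. (1).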

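A cumulant of order higher than 2 will go to zero even if two of the quantities are strictly proportional to each other, provided that pair is statistically independent of the remaining quantities (for c4, the pair x1, x2 independent of the pair x3, x4):

$$x_1 = a\,x_2. \qquad (3)$$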


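Since a cumulant also goes to zero whenever any one quantity is statistically independent of the rest, the additivity of cumulants (they are linear in each argument) implies that a higher-order cumulant will still go to zero when the pair obeys a less deterministic, randomized form of that equation:

$$x_1 = a\,x_2 + r, \qquad (4)$$

where r is a random number independent of x2 and of the remaining quantities. Indeed, by additivity

$$c_4(a x_2 + r,\, x_2,\, x_3,\, x_4) = a\,c_4(x_2,\, x_2,\, x_3,\, x_4) + c_4(r,\, x_2,\, x_3,\, x_4),$$

where the first term vanishes by the argument for Eq. (3) and the second because r is independent of everything else.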
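The additivity step can be checked numerically. The sketch below uses made-up simulated data and a plug-in c4 estimate (sample means for E[]). It relies on the fact that after subtracting the means, only E[1,2,3,4] and the three pair-pair products of the fourth-order formula survive. Because the plug-in estimate is the cumulant of the observed sample itself, it is exactly linear in its first argument, so Eq. (4) splits into a term for a*x2 and a term for r. All distributions and parameters are assumptions chosen for illustration:

```python
import random
from math import prod
from statistics import fmean

def c4(x1, x2, x3, x4):
    """Plug-in fourth-order cumulant. After subtracting the means, only
    E[1,2,3,4] and the three pair-pair products of the full formula survive."""
    d = []
    for x in (x1, x2, x3, x4):
        m = fmean(x)
        d.append([v - m for v in x])

    def E(*idx):
        return fmean(prod(row) for row in zip(*(d[i] for i in idx)))

    return (E(0, 1, 2, 3) - E(0, 1) * E(2, 3)
            - E(0, 2) * E(1, 3) - E(0, 3) * E(1, 2))

rng = random.Random(7)
n = 20_000
a = 0.8                                                  # hypothetical slope in Eq. (4)
x2 = [rng.choice((-1.0, 1.0)) for _ in range(n)]         # made-up non-Gaussian series
r = [rng.gauss(0.0, 1.0) for _ in range(n)]              # noise, independent of everything
x1 = [a * u + noise for u, noise in zip(x2, r)]          # Eq. (4): x1 = a*x2 + r
x3 = [rng.uniform(-1.0, 1.0) for _ in range(n)]          # independent of the pair (x1, x2)
x4 = [v + rng.gauss(0.0, 0.3) for v in x3]               # correlated with x3 only

lhs = c4(x1, x2, x3, x4)
rhs = a * c4(x2, x2, x3, x4) + c4(r, x2, x3, x4)
print(lhs, rhs)                                          # equal up to rounding
print(c4(x2, x2, x3, x4), c4(r, x2, x3, x4))            # both near zero
```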

A non-zero higher-order cumulant therefore indicates that the relationship within the data is not merely Eq. (4), with its familiar visualization as a diagonally elongated cloud in the (x1, x2) plane. This holds even though one may see such a cloud, and other signatures of two-point correlations, when subjecting higher-order correlated data to a lower-order analysis.
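A toy simulation can illustrate this point together with the trading example above. Assume, hypothetically, that x1, x2 are the actual and forecast change of the high and x3, x4 those of the low, and that each pair follows a hidden ±1 driver plus independent Gaussian noise. In one case the two pairs have independent drivers, a mere superposition of two second-order correlations; in the other they share one driver, so the high and the low behave as predicted within the same trade. The model, its parameters and all names are invented for illustration:

```python
import random
from math import prod
from statistics import fmean

def center(x):
    """Subtract the sample mean; cumulants of order 2 and higher are unchanged."""
    m = fmean(x)
    return [v - m for v in x]

def e(d, *idx):
    """Sample average of the product of the centered columns listed in idx."""
    return fmean(prod(row) for row in zip(*(d[i] for i in idx)))

def c4(d):
    """Plug-in fourth-order cumulant of four centered columns: with zero means
    only E[1,2,3,4] and the three pair-pair products survive."""
    return (e(d, 0, 1, 2, 3) - e(d, 0, 1) * e(d, 2, 3)
            - e(d, 0, 2) * e(d, 1, 3) - e(d, 0, 3) * e(d, 1, 2))

def toy_sample(n, shared, noise=0.5, seed=1):
    """Hypothetical toy model: each pair follows a +/-1 driver plus Gaussian
    noise; shared=True uses one driver for both pairs."""
    rng = random.Random(seed)
    cols = ([], [], [], [])
    for _ in range(n):
        u = rng.choice((-1.0, 1.0))
        v = u if shared else rng.choice((-1.0, 1.0))
        for col, driver in zip(cols, (u, u, v, v)):
            col.append(driver + rng.gauss(0.0, noise))
    return cols

results = {}
for shared in (False, True):
    d = [center(x) for x in toy_sample(50_000, shared)]
    results[shared] = (e(d, 0, 1), e(d, 2, 3), c4(d))
    print(f"shared driver={shared}: cov(x1,x2)={results[shared][0]:.3f} "
          f"cov(x3,x4)={results[shared][1]:.3f} c4={results[shared][2]:.3f}")
```

Both cases should show covariances near 1 within each actual/forecast pair, while c4 stays near 0 for independent drivers and comes out near -2 for the shared driver, that being the fourth cumulant of a ±1 variable. The sign and size depend on the assumed driver distribution. With a Gaussian driver, c4 would vanish in both cases, since all cumulants above second order are zero for jointly Gaussian variables.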

To keep the cumulant independent of the units in which the underlying quantities are expressed, we sometimes normalize it:

$$C_4 = \frac{c_4}{\left(\mathrm{Var}[1]\,\mathrm{Var}[2]\,\mathrm{Var}[3]\,\mathrm{Var}[4]\right)^{1/2}}, \qquad (5)$$

where Var[i] denotes the variance of the i-th quantity.
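A sketch of Eq. (5), assuming the same sample-mean (divide-by-N) convention for both c4 and the variances, which is why `statistics.pvariance` is used rather than the N-1 version. Function names and data are hypothetical:

```python
from math import prod, sqrt
from statistics import fmean, pvariance

def c4(x1, x2, x3, x4):
    """Plug-in fourth-order cumulant; after centering, only E[1,2,3,4] and the
    three pair-pair products of the fourth-order formula remain."""
    d = []
    for x in (x1, x2, x3, x4):
        m = fmean(x)
        d.append([v - m for v in x])

    def E(*idx):
        return fmean(prod(row) for row in zip(*(d[i] for i in idx)))

    return (E(0, 1, 2, 3) - E(0, 1) * E(2, 3)
            - E(0, 2) * E(1, 3) - E(0, 3) * E(1, 2))

def C4(x1, x2, x3, x4):
    """Eq. (5): c4 divided by the square root of the product of the four
    variances. Raises ZeroDivisionError if any series is constant."""
    denom = sqrt(prod(pvariance(x) for x in (x1, x2, x3, x4)))
    return c4(x1, x2, x3, x4) / denom

x = [0.12, -0.30, 0.25, 0.41, -0.22, 0.05, -0.18, 0.33]   # made-up series
y = [0.10, -0.25, 0.30, 0.35, -0.15, 0.00, -0.20, 0.28]
print(C4(x, y, x, y))
print(C4([100 * v for v in x], y, x, y))   # same value: the units drop out
```

Dividing by the product of standard deviations is equivalent to computing c4 on standardized series, so rescaling any series by a positive factor leaves C4 unchanged; a negative factor only flips its sign.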

Last Updated ( Tuesday, 12 April 2011 12:56 )