 ## Markov property of the extremes in the binned random walk time series

Written by Forex Automaton
Wednesday, 08 September 2010 16:00

The random walk is an important reference process in statistics, and its properties have to be studied in financial applications, where a hypothetical random walk of price remains an essential component of efficient market theories. Our day and hour time-scale predictive models demonstrated stable positive correlations between reality and prediction for returns taken in the time series of the respective (daily and hourly) lows and highs. However, the same property is shown to hold for the simple Brownian motion random walk model. What are the origins and the implications of this effect?

Random walk is the case of maximum possible freedom but you can't get rid of the continuity of price. The picture can be compared with Brownian motion in physics, the famous problem of a particle of pollen colliding with numerous surrounding molecules of water. Even though the velocity of the particle changes unpredictably and is discontinuous, its position is constrained by inertia and undergoes a continuous evolution. Brownian motion is the limiting case of minimum effect of past history on the future. The previous position of the particle, serving as an initial condition to its next move, does limit the possibilities for the particle's future. However this is the absolute minimum for any effect of past on the future, and is, in a sense, the most trivial form of predictability one can observe.

When looking for less trivial forms of predictability, it is important to know the manifestations of this absolute (non-negligible) minimum within the scope of the observation techniques one is using, as it presents what a physicist would call a background.

To simulate the random walk price evolution, we start with a price level of 1. The price then undergoes an arithmetic random walk without drift (Wiener process), each successive level $p_{i+1}$ being

$$p_{i+1} = p_i + r_i \qquad (1)$$

where $r_i$ is an instance of a pseudorandom number distributed normally with mean 0 and sigma 0.00002. After 400 steps of such evolution the interval is "closed", which means that its lowest, highest, and close levels are recorded to form a "candle", or bar, of data. What time interval is ascribed to this bar is irrelevant, but for the sake of definiteness we called it a day bar and simulated 7372 such bars. The resulting bar chart is presented in Fig.1. This is a true martingale which is stationary in returns.

Fig.1. Chart of the simulated random walk data used in the analysis.
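The simulation described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the code used for the article: the function name `simulate_bars`, its parameters, and the seed are hypothetical, and the bar's low and high are assumed to be taken over the 400 levels generated inside the bar (the text does not say whether the previous close is included).

```python
import random


def simulate_bars(n_bars, steps_per_bar=400, sigma=0.00002, start=1.0, seed=None):
    """Simulate a driftless Gaussian random walk and bin it into bars.

    Returns a list of (low, high, close) tuples, one per bar.
    """
    rng = random.Random(seed)
    price = start
    bars = []
    for _ in range(n_bars):
        # Assumption: extremes are taken over the levels generated inside the bar.
        low = float("inf")
        high = float("-inf")
        for _ in range(steps_per_bar):
            price += rng.gauss(0.0, sigma)  # Eq. 1: p_{i+1} = p_i + r_i
            low = min(low, price)
            high = max(high, price)
        bars.append((low, high, price))  # price is now the bar's close
    return bars


# A short run; the article used 7372 bars of 400 steps each.
bars = simulate_bars(n_bars=100, seed=2010)
lows, highs, closes = (list(column) for column in zip(*bars))
```

Passing a fixed `seed` makes the run reproducible; `n_bars=7372` reproduces the size of the sample described in the text, at the cost of a few seconds of run time.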

Let's look at the correlations inside the low-high-close triplet. As usual, the quantities we are going to look at are not the actual low, high and close but logarithms of their ratios to the values they had during the previous interval (candle). These are the so-called logarithmic returns.
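For concreteness, the logarithmic return defined above can be computed as follows. The helper name `log_returns` and the sample levels are made up for illustration.

```python
import math


def log_returns(levels):
    """Logarithm of each level's ratio to the level one bar earlier."""
    return [math.log(curr / prev) for prev, curr in zip(levels, levels[1:])]


# Made-up bar highs; the same function applies to lows and closes.
sample_highs = [1.0010, 1.0015, 1.0012, 1.0020]
sample_returns = log_returns(sample_highs)
```

A series of N levels yields N - 1 returns, since the first bar has no predecessor.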

The two-point temporal correlation analysis deals with pairs of time instances. The time instances are the "close" times of the candles. The time difference variable

$$t_d = t_1 - t_2 \qquad (2)$$

is plotted along the horizontal axis in the histograms. Here $t_1$ and $t_2$ are two time instances; the corresponding data can be taken from the same time series or from different time series.

Fig.2. Autocorrelations of logarithmic returns in the random walk market: daily low, high, close. Simulated random walk data.

An autocorrelation is the particular case in which both $t_1$ and $t_2$ refer to the same time series. It is therefore symmetric by construction; the value observed at zero lag is an estimate of the variance.
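A minimal sketch of such an autocorrelation histogram is given below. It assumes the plotted quantity is the plain (unnormalized) sample covariance at each lag, which is consistent with the zero-lag bin estimating the variance; the function name `autocovariance` and the choice of dividing by the full sample size are assumptions of this sketch.

```python
def autocovariance(x, max_lag):
    """Sample autocovariance of x for lags -max_lag..max_lag, keyed by lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    result = {}
    for lag in range(max_lag + 1):
        # Pair the sample at t1 with the sample at t2 = t1 - lag.
        c = sum(d[t1] * d[t1 - lag] for t1 in range(lag, n)) / n
        result[lag] = c
        result[-lag] = c  # swapping t1 and t2 pairs up the same products
    return result


# A toy alternating series: variance 1, strongly negative at lag 1.
acf = autocovariance([1.0, -1.0, 1.0, -1.0], max_lag=1)
```

The symmetry is visible in the code itself: the entry for a negative lag is filled with the same sum of products as the entry for the positive lag.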

The time series of close shows zero autocorrelation outside the zero-lag bin, whereas lows and highs show predictability on the basis of the immediate past. Indeed, if the price follows the trajectory of a Brownian particle, the fact that it expanded its range upwards between time instances 0 and 1 implies, by virtue of the continuity of price, that the price is likely to be in the upper part of the bar at close. (This is not to be confused with being able to predict the movement of close with respect to the previous close, which is impossible in a random walk.) The price is therefore likely to diffuse further up, thus moving even higher for the next bar. A similar argument can be made for the bar's low.

Thus it appears that in the Brownian motion of price, the evolutions of lows and highs are Markov processes.

Fig.3. Correlations among the logarithmic returns in the three components of the day bar. Simulated random walk data.

Logarithmic returns in high, low and close can be correlated among themselves: three pairs can be constructed among these three variables. The resulting correlations are shown in Fig.3. To interpret these plots, recall the definition of the difference variable, Eq.2. In the correlation plots for low and close, and for high and close, the +1 time-lag bin shows content as high as the zero-lag bin. Given that index "2" indicates close, this means that whatever happens in close tends to happen in low and high during the next time interval (next bar): $t_d = 1$ corresponds to $t_2$ being one unit earlier (smaller) than $t_1$.
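The sign convention of Eq.2 is easy to get backwards, so a toy example may help. In the sketch below, which is an illustration and not the analysis code, a hypothetical `follower` series repeats a `leader` series one step later. With the follower as series 1 and the leader as series 2, the peak lands at a lag of +1, just as it does for low or high (series 1) against close (series 2).

```python
import random


def cross_covariance(x1, x2, td):
    """Sample covariance between x1 at time t1 and x2 at time t2 = t1 - td."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    pairs = [(x1[t1], x2[t1 - td]) for t1 in range(n) if 0 <= t1 - td < n]
    return sum((a - m1) * (b - m2) for a, b in pairs) / n


rng = random.Random(0)
leader = [rng.gauss(0.0, 1.0) for _ in range(2000)]
follower = [0.0] + leader[:-1]  # follower[t] == leader[t - 1]

at_plus_one = cross_covariance(follower, leader, td=1)    # close to the variance
at_minus_one = cross_covariance(follower, leader, td=-1)  # close to zero
```

The one-sided peak is the asymmetry discussed next: the leader carries information about the follower's next value, but not the other way round.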

On the contrary, whatever happens in high and low has no effect on close. The asymmetry of these statements is a direct reflection of the asymmetry of the correlation peak.

Finally, low and high are seen to be positively (not a surprise) correlated, with a unit time lag.

The status of the correlation magnitudes at non-zero lags is intriguing. The dimensionless ratios of the significant non-zero-lag correlations to the zero-lag ones have to be either determined by the only dimensionless parameter in the problem, or combinations of fundamental constants such as pi or Euler's number, or independent constants. The only dimensionless parameter in the problem is the number of "collisions" the Brownian particle experiences between successive measurements (or the number of price ticks during the simulated time interval which constitutes the "candle"): 400 in the simulation under study.

Fig.4. Measurements of correlations between actual and predicted logarithmic returns in the random walk model.
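One way to probe the dependence on the number of steps per bar is to repeat the simulation with different bar lengths and compare the ratio of the lag-1 autocorrelation of highs to its zero-lag value. The sketch below does this for two arbitrarily chosen bar lengths and a reduced number of bars; the function names, seeds, and sample sizes are hypothetical choices, and the bar's high is again assumed to be taken over the levels generated inside the bar.

```python
import math
import random


def high_returns(n_bars, steps_per_bar, sigma=0.00002, seed=0):
    """Log returns of bar highs for a driftless Gaussian random walk."""
    rng = random.Random(seed)
    price = 1.0
    highs = []
    for _ in range(n_bars):
        high = float("-inf")
        for _ in range(steps_per_bar):
            price += rng.gauss(0.0, sigma)
            high = max(high, price)
        highs.append(high)
    return [math.log(b / a) for a, b in zip(highs, highs[1:])]


def lag1_ratio(x):
    """Lag-1 autocovariance divided by the zero-lag value (the variance)."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    lag0 = sum(v * v for v in d)
    lag1 = sum(a * b for a, b in zip(d, d[1:]))
    return lag1 / lag0


ratios = {n: lag1_ratio(high_returns(2000, n, seed=n)) for n in (25, 100)}
for n, ratio in ratios.items():
    print(f"{n:4d} steps per bar: lag-1 / lag-0 = {ratio:.3f}")
```

With 2000 bars the statistical uncertainty on each ratio is of the order of 0.02, so a larger sample is needed before small differences between bar lengths can be read off reliably.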

Fig.4 reproduces, in the random walk context, what has been shown many times for the real markets: an optimization curve used to maximize the predictability of low and high, measured by the Pearson correlation coefficient between real and predicted logarithmic returns, by varying a system parameter nicknamed Fred. Unlike the real-life data used in Danica's optimization, the set of points for close is totally flat and near zero in magnitude. Also unlike real-life data, the simple random walk sees no difference between high and low (it would be strange if it did; the question remains whether the result will hold for a geometric random walk).

Shown for the first time in Fig.4 are the data for a 4-point cumulant between logarithmic returns in high and low and their predicted values. The quantity has never been reported for real-life data, but it has been looked at and will be reported in the future. For now, suffice it to say that it is positive in real-life data, meaning that the model reaches a regime in which there is a "correlation" between its stop-loss not being hit and its profit target being hit, if these are set at the past lows and highs as has been discussed numerous times.

Why the same quantity, the cumulant, is negative in the random walk model is not clear, and the fact is puzzling.

What are the conclusions so far?

Positive correlation coefficients between reality and forecasts for (logarithmic) differences of daily (hourly) highs and lows don't come for free and do prove that the predictive model is working. They don't automatically prove that the market is different from a random walk. Just by observing the positive correlations between predicted and actual returns in daily (Danica) or hourly (Heidi) highest and lowest levels of price, we don't know whether the models take advantage of other forms of predictability in the real-life markets in addition to the minimum one provided by the continuity of price, in order to deliver these positive correlations. We do know that the models are able to take advantage of certain other forms of predictability, should those be present in the data.

The triviality of the "underlying" in the case of Black-Scholes/Merton theory does not make the theory's accomplishment -- an option pricing formula -- trivial. The adaptive predictive models like Danica and Heidi are not Black-Scholes solutions, in the sense that there is no proof that these models are "solutions" to the problem, even within some idealized ansatz. But to see them doing something similar to the job of the Black-Scholes option pricing formula -- namely, making a statement about the future by extrapolating price diffusion into the future -- is reassuring.

The non-trivial aspect of the situation is that the models have no a priori, built-in knowledge of the fact that the process they are dealing with is a continuous evolution of price. They effectively learn to treat it as such, and this learning is facilitated by the proper choice of the Fred parameter. We of course do not go so far as to claim that the models use AI to effectively reproduce a Nobel-Prize-winning theory. Yet from the practical point of view, it remains to be seen whether an option pricing algorithm can be built on the basis of a price evolution model such as Danica or Heidi, as opposed to solving a diffusion equation, thus bypassing the questionable, assumption-laden Black-Scholes theory. If so, it also remains to be seen whether the market actions of agents armed with the two respective approaches -- the Black-Scholes equation and its black-box equivalent -- will be significantly different.

Last Updated ( Tuesday, 22 March 2011 09:45 )