Several papers by Stephen J Taylor and Brian G Kingsman, including “Non-stationarity in Sugar Prices” (http://www.jstor.org/pss/3009471) and “An Analysis of the Variance and Distribution of Commodity Price Changes”, deal with a method of pre-processing price data which is suitable for many cycles analysis applications.

They note that the most common forms of pre-processing are doing nothing (lambda=1) or taking logs (lambda=0) and have determined a function which provides other values of lambda which are more suitable for particular commodities. This is desirable because many commodities have faster movements at the highs than at the lows, even after a log transformation.

Zlambda(t) = (Z(t)^lambda - 1)/lambda . . . except when lambda=0, then use log(Z(t))
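The formula above can be written as a small function. This is a minimal sketch in Python, not the CATS implementation; the name lambda_transform and the sample values are chosen here purely for illustration.

```python
import math

def lambda_transform(z, lam):
    """Apply the lambda function to one value z.

    lam = 1 leaves the shape of the data unchanged (it only subtracts 1),
    lam = 0 is the log transform, and values in between lie between the two.
    """
    if z < 0 or (z == 0 and lam <= 0):
        raise ValueError("z must be positive (zero is allowed only when lam > 0)")
    if lam == 0:
        return math.log(z)
    return (z ** lam - 1.0) / lam

# Illustrative values only, not real prices.
sample = [5.0, 10.0, 20.0, 40.0]
for lam in (1.0, 0.4, 0.0):
    print(lam, [round(lambda_transform(z, lam), 3) for z in sample])
```

As lambda shrinks towards 0 the result approaches log(Z(t)), which is why the log case slots in without a jump.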

Edward R Dewey was aware that sunspot numbers were skewed, with faster fluctuations at highs than at lows. He realized that taking logs went to the other extreme (even apart from the problem of log(0)) and so used a function:

Z(t) = log(SSN(t)+20)

which solves the worst of the skewness. However this lambda function should do a better job.
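One way to compare the transforms is to ask how much more a small wiggle is magnified near a low value than near a high value. The sketch below does this using the slope of each transform. The low and high values (10 and 150) are illustrative assumptions, not measurements, and lambda=0.4 is the value used for sunspots later in this post.

```python
def slope_raw(z):
    # Raw data: a change of 1 is a change of 1 everywhere.
    return 1.0

def slope_log(z):
    # Derivative of log(z).
    return 1.0 / z

def slope_dewey(z, offset=20.0):
    # Derivative of log(z + 20), Dewey's function.
    return 1.0 / (z + offset)

def slope_lambda(z, lam=0.4):
    # Derivative of (z**lam - 1) / lam.
    return z ** (lam - 1.0)

def stretch_ratio(slope, low, high):
    """How many times more a small change is magnified at `low` than at `high`."""
    return slope(low) / slope(high)

LOW, HIGH = 10.0, 150.0  # assumed illustrative levels

for name, slope in [("raw", slope_raw), ("log", slope_log),
                    ("log(z+20)", slope_dewey), ("lambda=0.4", slope_lambda)]:
    print(f"{name:12s} {stretch_ratio(slope, LOW, HIGH):6.2f}")
```

With these assumed levels, plain logs magnify the lows 15 times more than the highs, Dewey's function about 5.7 times, and lambda=0.4 about 5 times, so the last two behave similarly over this range.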

In this case I think a few pictures are worth a thousand words.

In the above graph the smaller fluctuations at sunspot lows are evident. However taking logs as below reverses the problem, making the fluctuations larger at the lows.

Dewey solved this problem by adding 20 before taking the logs. That is quite a good approximation to using the lambda function, shown below. In both cases the fluctuations are now about equally large at highs and lows.

When the monthly changes of raw sunspot numbers are used, the results are heavily modulated by the 11 year sunspot cycle, as seen above. Below we see that after applying a lambda=0.4 function to sunspot numbers the strong 11 year modulation is removed. This data is suitable for looking for shorter term cycles.
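The effect can be reproduced on synthetic data. The sketch below is not real sunspot data: it builds a series with a 132 month slow cycle and a short cycle of about 5.1 months whose amplitude is assumed to grow as level^0.6, which is the case a lambda=0.4 transform should flatten. It then compares the average monthly change at high levels with that at low levels.

```python
import math

def lambda_transform(z, lam):
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def synthetic_series(n_months=528):
    """Synthetic stand-in for a sunspot-like series (assumed form, not data)."""
    levels, values = [], []
    for t in range(n_months):
        level = 80.0 - 70.0 * math.cos(2.0 * math.pi * t / 132.0)   # slow cycle
        short = 0.5 * level ** 0.6 * math.sin(2.0 * math.pi * t / 5.1)
        levels.append(level)
        values.append(level + short)
    return levels, values

def high_low_ratio(levels, values, split=80.0):
    """Mean absolute monthly change at high levels divided by that at low levels."""
    high, low = [], []
    for i in range(1, len(values)):
        change = abs(values[i] - values[i - 1])
        (high if levels[i] > split else low).append(change)
    return (sum(high) / len(high)) / (sum(low) / len(low))

levels, values = synthetic_series()
transformed = [lambda_transform(v, 0.4) for v in values]
print("raw        ", round(high_low_ratio(levels, values), 2))
print("lambda=0.4 ", round(high_low_ratio(levels, transformed), 2))
```

A ratio well above 1 means the monthly changes are modulated by the slow cycle; a ratio near 1 means the changes are about equally large at highs and lows.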

The same principle applies to commodities. About 21 years ago I did an analysis of many commodity markets and worked out the best lambda values to use for each, but don’t have those results available.

A word of caution however regarding commodity prices. Commodity prices vary because of two conditions, long term inflation and short term supply and demand. Applying a lambda function without recognizing this will not get proper results. It is best to remove the inflationary component first (by dividing by a suitable deflator) and then apply the lambda function.
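The order of operations described above can be sketched as follows. The prices, the deflator index and the lambda value are all made-up placeholders for illustration; a real analysis would use an actual price series and a deflator such as a consumer price index.

```python
import math

def lambda_transform(z, lam):
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def deflate(prices, deflator, base=100.0):
    """Divide each price by the deflator index, rescaled so that index == base means no change."""
    if len(prices) != len(deflator):
        raise ValueError("prices and deflator must have the same length")
    return [p * base / d for p, d in zip(prices, deflator)]

# Hypothetical numbers, for illustration only.
prices = [100.0, 112.0, 118.0, 140.0]
deflator = [100.0, 105.0, 110.0, 116.0]
lam = 0.4  # placeholder; the best value differs between commodities

real_prices = deflate(prices, deflator)                      # step 1: remove inflation
prepared = [lambda_transform(p, lam) for p in real_prices]   # step 2: lambda function
print([round(p, 2) for p in real_prices])
print([round(p, 3) for p in prepared])
```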

The lambda function is available in CATS – Cycles Analysis & Timeseries Software which is a free package downloadable at http://www.cyclesresearchinstitute.org/cats.html.

OO, err. You may get away with that in economics but this is supposed to be science, you can’t do that sort of thing ! 😉

I complain about people doing running average “smoothers” because they distort the frequency spectrum. I’d have no idea what effect that sort of massaging has on phase or frequency, but I would bet it is not benign. Doesn’t it risk shifting the peaks?

I could just about buy a square root if you could convince me SSN was the square of something physical (which it may be) and not just rectified by being insensitive to polarity. But adding an arbitrary const and taking the log !

“Below we see that after applying a lambda=0.4 function to sunspot numbers the strong 11 year modulation is removed. This data is suitable for looking for shorter term cycles.”

But do we have _any_ idea what else is removed or, worse, still there but changed in unquantified ways?

It may ‘help’ with processing but how to interpret the result? It’s no longer the spectrum of SSN, what is it the spectrum of ?

Anyway, thanks for these very informative pages and the CATS software. I’ll give it a try.

Greg, it does not move any peaks in the time series data. The only effect is to change the spread of the graph at the high versus low parts. For example log scale is a special case of lambda. Square root is another special case. And so on.
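The point about peaks in the time series can be checked directly: the lambda function is strictly increasing for any lambda, so a value that is higher than its neighbours before the transform is still higher afterwards. A minimal sketch with made-up values (this only checks turning points in the time domain, not the spectrum):

```python
import math

def lambda_transform(z, lam):
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def local_maxima(xs):
    """Indices of points strictly higher than both neighbours."""
    return [i for i in range(1, len(xs) - 1) if xs[i - 1] < xs[i] > xs[i + 1]]

# Illustrative values only.
raw = [12.0, 30.0, 25.0, 60.0, 140.0, 90.0, 95.0, 40.0, 8.0, 15.0, 11.0]

print("raw       ", local_maxima(raw))
for lam in (0.5, 0.4, 0.0):
    transformed = [lambda_transform(z, lam) for z in raw]
    print(f"lambda={lam}", local_maxima(transformed))
```

Every line prints the same list of indices; only the heights of the peaks relative to each other change.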

In general, using lambda successfully means that some peaks will diminish or disappear from the spectrum. That is because data in an initially wrong format will have beat frequencies or fringes around major spectral peaks. When the true underlying basis is found, these will disappear.

Sunspot numbers are a pretty random concept to start with. This is just an attempt to get a basis that more accurately reflects some more fundamental underlying physics. If the 155 day cycle can be made to be continuous rather than heavily modulated, then it is a success. Especially if a number of other short term cycles have the same result also.

Regards, Ray

Hi Ray, I’ve been having a look again at spectral content of SSN and the effects of sqrt() are interesting. How do you decide that 0.4 is “best”, what is the test criterion?

I still can’t see a justification for this sort of thing in science but one could conceive a hand-waving justification for SSN as a proxy for some physical quantity and its being related to the square of that quantity. eg SSN is a proxy for total area which is ( possibly ) due to the square of the magnetic field.

If there was some objective reason for doing this processing maybe it could be used as an indication of where to look for the underlying mechanism.

I don’t get anything useful from the link to the sugar paper. Do you have any information about when and why this processing is justified? I suspect this is a mindset that is a bit foreign to the thinking of econometrics, but it is essential in science.

Is there any reason to suggest that rate of change of a variable should NOT be greater when the variable itself is greater? Even a mass on a spring would show that. Unless the HF component is assumed to be independent ‘noise’ that is unrelated to the signal.

regards, Greg Goodman.

Hi Greg, thanks for your thoughts. It is good to look at things deeply.

At one time I did analysis of best lambda function for a bunch of commodities and markets. I can’t find the results of this now. So just some recollections, possibly not 100% accurate.

1. In stock markets the taking of log of price will generally vastly improve the behaviour. That is because equal percentage changes are the most meaningful thing. This is a common practice that seems generally accepted. It is very important for long series spanning decades or centuries. If stock markets are detrended by something like CPI then there does remain some behaviour that is a bit like that described for commodities below. I know that this is so but have never tried to work it thru.

2. In commodities markets it is often the case that the fluctuations in price are even higher percentages at the peaks than the troughs. We can understand this as being due to shortages which cause price peaks. A tiny change in a low supply has a more dramatic effect. There is probably a most desirable expression which might be inverse price in some cases, but I think it was mostly otherwise.

3. Not all commodities are the same, but I think there were a couple of clusters.

4. In exchange rates taking log is generally a good idea. The exchange rate can be done either way (eg. pounds/dollar or dollars/pound). Taking logs makes these equivalent. Also the changes over long periods may accumulate but not as much as for stocks.
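The remark in point 4 that logs make the two ways of quoting an exchange rate equivalent follows from log(1/x) = -log(x). A short sketch with made-up quotes: the percentage changes differ in size between the two quotations, while the log changes are exact mirror images.

```python
import math

def percent_changes(series):
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]

def log_changes(series):
    return [math.log(b / a) for a, b in zip(series, series[1:])]

# Made-up quotes for illustration only.
dollars_per_pound = [1.25, 1.50, 1.20]
pounds_per_dollar = [1.0 / r for r in dollars_per_pound]

print("percent:", [round(c, 2) for c in percent_changes(dollars_per_pound)],
      [round(c, 2) for c in percent_changes(pounds_per_dollar)])
print("log:    ", [round(c, 4) for c in log_changes(dollars_per_pound)],
      [round(c, 4) for c in log_changes(pounds_per_dollar)])
```

Here the first move is +20% quoted one way but about -16.67% quoted the other way, whereas the log changes have the same magnitude with opposite sign.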

Using rate of change will tend to show the shorter period cycles better whereas the raw data will show the longest cycles. Some people have even used second difference for economic data and this can be very good for getting timing of turning points accurately. Timing of turning points is often desirable for economic and commodity data. Some small amount of smoothing may be needed if the results are a bit noisy.
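Why differencing favours short periods can be shown with a sinusoid: taking the first difference of a sine wave with a period of P samples multiplies its amplitude by 2*sin(pi/P), which is large for short periods and small for long ones. The sketch below checks this numerically; the periods of 5.1 and 132 months are assumed here as rough stand-ins for a 155 day cycle and an 11 year cycle in monthly data.

```python
import math

def diff(xs):
    """First difference: change from each sample to the next."""
    return [b - a for a, b in zip(xs, xs[1:])]

def difference_gain(period):
    """Theoretical amplitude factor of one first difference for a sinusoid
    with the given period in samples. A second difference applies it twice."""
    return 2.0 * abs(math.sin(math.pi / period))

def measured_gain(period, n=1000):
    """Largest absolute change in a unit-amplitude sine of the given period."""
    wave = [math.sin(2.0 * math.pi * t / period) for t in range(n)]
    return max(abs(d) for d in diff(wave))

for period in (5.1, 132.0):
    print(period, round(difference_gain(period), 4), round(measured_gain(period), 4))
```

The short cycle keeps more than its full amplitude after differencing while the long cycle is cut to a few percent, and a second difference squares that factor, pushing the balance further towards short periods.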

For me the justification of using sqrt(SSN) is very clear. If the shorter term cycles persist over long periods but are highly modulated by the longer time cycles (as is the case), and the transformation removes this factor and makes the shorter cycles rather consistent over time, then this is desirable. My justification for this is the ability to predict accurately. That is the test of whether the method is good or not. It is the aim of science. Sure, we later want to understand why that particular transformation was the right one. But IMO theory comes about as a result of analysis, not the other way around. In physics, too many people think that their models are reality.
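The consistency criterion described above could be turned into a number in many ways. The sketch below is one possible reading, not the procedure actually used for the 0.4 value: it scans candidate lambdas and keeps the one for which monthly changes are most nearly the same size at high and low levels. The data are synthetic, with a short cycle whose amplitude is assumed to grow as level^0.6, and all function names are hypothetical.

```python
import math

def lambda_transform(z, lam):
    return math.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def synthetic_series(n_months=528):
    """Synthetic series (assumed form, not data): slow 132 month cycle plus a
    5.1 month cycle whose amplitude grows as level**0.6."""
    levels, values = [], []
    for t in range(n_months):
        level = 80.0 - 70.0 * math.cos(2.0 * math.pi * t / 132.0)
        short = 0.5 * level ** 0.6 * math.sin(2.0 * math.pi * t / 5.1)
        levels.append(level)
        values.append(level + short)
    return levels, values

def high_low_ratio(levels, values, split=80.0):
    """Mean absolute monthly change at high levels divided by that at low levels."""
    high, low = [], []
    for i in range(1, len(values)):
        change = abs(values[i] - values[i - 1])
        (high if levels[i] > split else low).append(change)
    return (sum(high) / len(high)) / (sum(low) / len(low))

def best_lambda(levels, values, candidates):
    """Candidate whose high/low ratio of changes is closest to 1 (on a log scale)."""
    def imbalance(lam):
        transformed = [lambda_transform(v, lam) for v in values]
        return abs(math.log(high_low_ratio(levels, transformed)))
    return min(candidates, key=imbalance)

levels, values = synthetic_series()
candidates = [i / 10.0 for i in range(11)]  # 0.0, 0.1, ..., 1.0
print("selected lambda:", best_lambda(levels, values, candidates))
```

On this synthetic series the scan should land close to the 0.4 implied by the construction, though the slow cycle leaks into the monthly changes and can shift the result by a grid step. On real data the high and low periods would have to be identified from the series itself, for example from a smoothed version.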