When dealing with time series we do not always have the luxury of evenly spaced points in time. Particularly for data from geology and related fields, sampling may be based on depth along a core or some similar measure, which is only later converted to time as part of the analysis. Processing such data raises two problems. Firstly, techniques are needed that can handle irregularly spaced observations. Secondly, the usual statistics were developed for regularly sampled data, so it is difficult to tell whether hypotheses should be found significant or not.
In CATS (Cycles Analysis Timeseries Software), developed for and available free from CRI (Cycles Research Institute), the first issue can be dealt with by choosing a suitably short time step resolution, specifying the date of each measurement, and letting CATS do the interpolation. The data will then look like a regular series with many more values. All the normal manipulations may then be done, including calculations that mix in other data with similar irregularities, producing graphs, and so on.
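The internals of CATS are not described here, but the resampling step it performs can be sketched in a few lines of numpy: pick a regular time step no larger than the smallest observed gap, then linearly interpolate the irregular observations onto that grid. The sample times and values below are hypothetical.

```python
import numpy as np

# Hypothetical irregular series: observation times and values.
t_irregular = np.array([0.0, 0.7, 1.1, 2.9, 3.2, 5.0])
y_irregular = np.array([1.0, 1.4, 0.9, 0.2, 0.5, 1.1])

# Choose a step no larger than the smallest observed gap, so no
# observation falls between two grid points unseen.
dt = np.diff(t_irregular).min()

# Build the regular grid and resample by linear interpolation.
n = int(np.floor((t_irregular[-1] - t_irregular[0]) / dt)) + 1
t_regular = t_irregular[0] + dt * np.arange(n)
y_regular = np.interp(t_regular, t_irregular, y_irregular)
```

The interpolated series can then be treated like any regular series, which is exactly what makes the statistical caveat in the next paragraph important: the interpolated values are not independent observations.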
However, the statistics of multivariate analysis are generally based on regular, independent data, which is not the case here, so those statistics do not apply. This means that the researcher may be reduced to a “seat of the pants” approach when it comes to statistical tests.
The paper “Comparison of correlation analysis techniques for irregularly sampled time series” by K. Rehfeld, N. Marwan, J. Heitzig, and J. Kurths aims to address and quantify this second problem. The paper also addresses cycles analysis of such data.
Abstract. Geoscientific measurements often provide time series with irregular time sampling, requiring either data reconstruction (interpolation) or sophisticated methods to handle irregular sampling. We compare the linear interpolation technique and different approaches for analyzing the correlation functions and persistence of irregularly sampled time series, such as the Lomb-Scargle Fourier transformation and kernel-based methods. In a thorough benchmark test we investigate the performance of these techniques.
All methods have comparable root mean square errors (RMSEs) for low skewness of the inter-observation time distribution. For highly skewed, very irregular data, interpolation bias and RMSE increase strongly. We find a 40 % lower RMSE for the lag-1 autocorrelation function (ACF) for the Gaussian kernel method vs. the linear interpolation scheme, in the analysis of highly irregular time series. For the cross correlation function (CCF) the RMSE is then lower by 60 %. The application of the Lomb-Scargle technique gave results comparable to the kernel methods in the univariate case, but poorer results in the bivariate case. In particular the high-frequency components of the signal, where classical methods show a strong bias in ACF and CCF magnitude, are preserved when using the kernel methods.
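The kernel idea referred to in the abstract can be sketched as follows: instead of interpolating and then correlating, every pair of observations contributes to the correlation estimate at a given lag, weighted by a Gaussian kernel on how far the pair's actual time difference is from that lag. This is a simplified sketch of the approach, not the paper's exact estimator; the bandwidth `h` is a free parameter (a fraction of the mean sampling interval is a common choice).

```python
import numpy as np

def gaussian_kernel_ccf(tx, x, ty, y, lags, h):
    """Kernel-weighted cross-correlation for irregularly sampled series.

    For each requested lag, all pairs (x_i, y_j) are weighted by a
    Gaussian kernel on the mismatch between their time difference
    t_j - t_i and the lag. With tx == ty and x == y this yields an
    autocorrelation estimate. Sketch only; h is a tuning parameter.
    """
    x = (x - x.mean()) / x.std()        # standardize both series
    y = (y - y.mean()) / y.std()
    dt = ty[None, :] - tx[:, None]      # all pairwise time differences
    prod = x[:, None] * y[None, :]      # all pairwise products
    ccf = []
    for lag in lags:
        w = np.exp(-0.5 * ((dt - lag) / h) ** 2)
        ccf.append((w * prod).sum() / w.sum())
    return np.array(ccf)
```

On a regularly sampled sine wave of period 1, this estimator recovers a correlation near +1 at lag 0 and near -1 at lag 0.5, as expected; its advantage is that it needs no interpolation when the sampling is irregular.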
We illustrate the performance of interpolation vs. the Gaussian kernel method by applying both to paleo-data from four locations, reflecting late Holocene Asian monsoon variability as derived from speleothem δ18O measurements. Cross correlation results are similar for both methods, which we attribute to the long time scales of the common variability. The persistence time (memory) is strongly overestimated when using the standard, interpolation-based approach. Hence, the Gaussian kernel is a reliable and more robust estimator with significant advantages compared to other techniques, and is suitable for large-scale application to paleo-data.
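To see why an inflated ACF estimate inflates the persistence time, note the standard AR(1)-style relation often used to define memory: if the autocorrelation decays as c(lag) = exp(-lag / tau), then tau = -lag / ln(c(lag)). This is a common convention, not necessarily the paper's exact definition, but it shows the mechanism: interpolation smooths the series, biases the lag-1 ACF upward, and so pushes tau up.

```python
import numpy as np

def persistence_time(lag, acf_at_lag):
    """Persistence (memory) time under an exponential-decay assumption:
    c(lag) = exp(-lag / tau)  =>  tau = -lag / ln(c(lag)).
    An upward-biased ACF estimate therefore yields a longer tau."""
    return -lag / np.log(acf_at_lag)
```

For example, a lag-1 autocorrelation of 0.5 gives tau of about 1.44 time units, while a smoothing-inflated estimate of 0.7 at the same lag gives about 2.80, nearly double the inferred memory.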
The full paper is available in PDF form.