Unveiling Causal Relationships in Time Series Data
Stavros Stavroglou, Athanasios Pantelous, Hui Wang
This vignette demonstrates advanced techniques for examining causal relationships between time series using the patterncausality package. We will focus on three key aspects:
- Cross-validation methods: To rigorously assess the robustness of our findings, ensuring they are not mere artifacts of the data.
- Parameter optimization: To fine-tune our analysis for the most accurate and reliable results.
- Visualization of causality relationships: To provide clear and intuitive insights into the causal connections between time series.
Through cross-validation, we aim to understand:
- Reliability of results: How dependable are our conclusions?
- Robustness across different sample sizes: Do our findings hold true regardless of the amount of data used?
- Stability of causality patterns: Are the identified causal relationships consistent over time and across different data subsets?
Cross-Validation: Ensuring the Reliability of Causal Inference
To demonstrate the application of cross-validation, we will begin by importing a climate dataset from the patterncausality package.
library(patterncausality)
data(climate_indices)
Now, let’s apply cross-validation to evaluate the robustness of pattern causality. We will use the Pacific North American (PNA) and North Atlantic Oscillation (NAO) climate indices as our example time series.
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = c(100, 200, 300, 400, 500), # sample sizes to evaluate
  E = 3,                                  # embedding dimension
  tau = 2,                                # time delay
  metric = "euclidean",                   # distance metric
  h = 1,                                  # prediction horizon
  weighted = FALSE                        # unweighted causality measure
)
print(result$results)
#> , , positive
#>
#> value
#> 100 0.4444444
#> 200 0.3157895
#> 300 0.2739726
#> 400 0.2912621
#> 500 0.2812500
#>
#> , , negative
#>
#> value
#> 100 0.1666667
#> 200 0.0877193
#> 300 0.3424658
#> 400 0.2330097
#> 500 0.2500000
#>
#> , , dark
#>
#> value
#> 100 0.3888889
#> 200 0.5964912
#> 300 0.3835616
#> 400 0.4757282
#> 500 0.4687500
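Before plotting, it is worth noting how the results are organized. Judging from the printout above, result$results behaves like a three-dimensional array indexed by sample size, a single value column, and causality type (positive, negative, dark). As a brief sketch under that assumed layout, one slice can be extracted like this:
# Assuming result$results is a 3-D array with dimensions
# sample size x "value" x causality type, as the printout above suggests
result$results[, "value", "dark"] # dark-causality estimates by sample size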
To better visualize the results, we will use the plot function to generate a line chart.
plot(result)
As the plot shows, the causality values tend to stabilize as the sample size increases. This suggests that the method reliably captures the underlying patterns and causal connections within the time series once enough data are available.
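One rough way to check this stabilization numerically is to look at the largest change between estimates at consecutive sample sizes. This helper is not part of the package; it is a small sketch that relies on the same assumed array layout as above:
# Largest jump between estimates at consecutive sample sizes, per
# causality type; small values suggest the estimates have converged
sapply(dimnames(result$results)[[3]], function(type) {
  max(abs(diff(result$results[, "value", type])))
})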
In this section, you have learned how to use cross-validation to assess the reliability of time series causality and how to use visualization tools to better understand the results.
Cross-Validation: Convergence of Pattern Causality
Now, let’s examine the cross-validation process when the random parameter is set to FALSE. This approach uses systematic sampling rather than random sampling.
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result_non_random <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = c(100, 200, 300, 400, 500),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE,
  random = FALSE # systematic rather than random sampling
)
print(result_non_random$results)
#> , , positive
#>
#> value
#> 100 0.2941176
#> 200 0.2400000
#> 300 0.2972973
#> 400 0.2692308
#> 500 0.3000000
#>
#> , , negative
#>
#> value
#> 100 0.1764706
#> 200 0.3200000
#> 300 0.3108108
#> 400 0.2596154
#> 500 0.2307692
#>
#> , , dark
#>
#> value
#> 100 0.5294118
#> 200 0.4400000
#> 300 0.3918919
#> 400 0.4711538
#> 500 0.4692308
We can also visualize the results of the non-random cross-validation:
plot(result_non_random)
By comparing the results of the random and non-random cross-validation, you can gain a deeper understanding of how different sampling methods affect the stability and reliability of the causality analysis.
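To make that comparison concrete, the two sets of estimates can be lined up side by side. A minimal sketch, again assuming both result objects share the 3-D array layout shown above (here for the positive component):
# Positive-causality estimates under random vs. systematic sampling
comparison <- data.frame(
  size       = as.numeric(rownames(result$results)),
  random     = result$results[, "value", "positive"],
  systematic = result_non_random$results[, "value", "positive"]
)
comparison$difference <- comparison$random - comparison$systematic
comparison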
Cross-Validation with Bootstrap Analysis
To obtain more robust results and understand the uncertainty in our causality measures, we can use bootstrap sampling in our cross-validation analysis. This approach repeatedly samples the data with replacement and provides statistical summaries of the causality measures.
set.seed(123)
X <- climate_indices$PNA
Y <- climate_indices$NAO
result_boot <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = c(100, 200, 300, 400, 500),
  E = 3,
  tau = 2,
  metric = "euclidean",
  h = 1,
  weighted = FALSE,
  random = TRUE,
  bootstrap = 100 # perform 100 bootstrap iterations
)
The bootstrap analysis provides several statistical measures for each sample size:
- Mean: the average causality measure across bootstrap samples
- 5% and 95% quantiles: a confidence interval for the causality measure
- Median: a central-tendency measure that is robust to outliers
Let’s examine the results:
print(result_boot$results)
#> , , positive
#>
#> mean 5% 95% median
#> 100 0.3052773 0.05924242 0.5849359 0.2857143
#> 200 0.2935589 0.13307692 0.5120465 0.2752525
#> 300 0.3149741 0.16566667 0.4890152 0.3153707
#> 400 0.2775644 0.13495733 0.4466071 0.2680697
#> 500 0.2958245 0.16400752 0.4295309 0.2964772
#>
#> , , negative
#>
#> mean 5% 95% median
#> 100 0.3004937 0.07692308 0.5626359 0.2886905
#> 200 0.3068532 0.10907115 0.5131054 0.3086168
#> 300 0.2915992 0.14277950 0.4883413 0.2774431
#> 400 0.3238175 0.18165584 0.4521804 0.3226190
#> 500 0.3115941 0.19348760 0.4747401 0.3079832
#>
#> , , dark
#>
#> mean 5% 95% median
#> 100 0.3942290 0.2105263 0.5722689 0.4000000
#> 200 0.3995879 0.2821146 0.5356906 0.3953488
#> 300 0.3934267 0.3094797 0.4777146 0.3947140
#> 400 0.3986181 0.3194255 0.4805484 0.3917306
#> 500 0.3925814 0.3239467 0.4662068 0.3876275
We can visualize the bootstrap results using the plot function, which now shows confidence intervals:
plot(result_boot, separate = TRUE)
The shaded area in the plot represents the range between the 5th and 95th percentiles of the bootstrap samples, providing a measure of uncertainty in our causality estimates. The solid line shows the median value, which is more robust to outliers than the mean.
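To quantify how that uncertainty changes with sample size, one can compute the width of the 90% interval directly. A small sketch, assuming result_boot$results stores the mean, 5%, 95%, and median columns shown in the printout above:
# Width of the 90% bootstrap interval (95% minus 5% quantile) per sample
# size and causality type; narrower intervals mean more precise estimates
sapply(dimnames(result_boot$results)[[3]], function(type) {
  result_boot$results[, "95%", type] - result_boot$results[, "5%", type]
})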