Title: | Estimating Copula Entropy and Transfer Entropy |
---|---|
Description: | The nonparametric methods for estimating copula entropy, transfer entropy, and the statistics for multivariate normality test and two-sample test are implemented. The methods for estimating transfer entropy and the statistics for multivariate normality test and two-sample test are based on the method for estimating copula entropy. The method for change point detection with copula entropy based two-sample test is also implemented. Please refer to Ma and Sun (2011) <doi:10.1016/S1007-0214(11)70008-6>, Ma (2019) <doi:10.48550/arXiv.1910.04375>, Ma (2022) <doi:10.48550/arXiv.2206.05956>, Ma (2023) <doi:10.48550/arXiv.2307.07247>, and Ma (2024) <doi:10.48550/arXiv.2403.07892> for more information. |
Authors: | MA Jian [aut, cre] |
Maintainer: | MA Jian <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5 |
Built: | 2025-02-28 05:47:19 UTC |
Source: | https://github.com/majianthu/copent |
Testing conditional independence between (x,y) conditional on z with copula entropy.
ci(x,y,z,k=3,dt=2)
ci(x,y,z,k=3,dt=2)
x |
the data with 1 row |
y |
the data with 1 row |
z |
the data with 1 row |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, 1 for Eclidean distance; 2 for Maximum distance |
This program involves testing conditional independence between (x,y) conditional on z with copula entropy nonparametrically. It was proposed in Ma (2019).
The algorithm composes of two simple steps: estimating three copula entropy terms with copent
and then calculate the test statistic.
The argument x,y,z are for the data with 1 row and same length as samples from random variables. The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance).
The function returns the value of the test statistic of conditional independence.
Ma, Jian. Estimating Transfer Entropy via Copula Entropy. arXiv preprint arXiv:1910.04375, 2019.
library(copent) library(mnormt) rho1 <- 0.5 rho2 <- 0.6 rho3 <- 0.5 sigma <- matrix(c(1,rho1,rho2,rho1,1,rho3,rho2,rho3,1),3,3) x <- rmnorm(500,c(0,0,0),sigma) ci1 <- ci(x[,1],x[,2],x[,3])
library(copent) library(mnormt) rho1 <- 0.5 rho2 <- 0.6 rho3 <- 0.5 sigma <- matrix(c(1,rho1,rho2,rho1,1,rho3,rho2,rho3,1),3,3) x <- rmnorm(500,c(0,0,0),sigma) ci1 <- ci(x[,1],x[,2],x[,3])
Construct empirical copula by rank statistic.
construct_empirical_copula(x)
construct_empirical_copula(x)
x |
the data with each row as a sample |
This program involves estimating empirical copula from data by rank statistic nonparametrically. It was proposed in Ma and Sun (2008, 2011). The algorithm is the first step of estimating copula entropy copent
.
The argument x is for the data with each row as a sample from random variables.
The function returns the estimated empirical copula of data x.
Ma, J., & Sun, Z. (2011). Mutual information is copula entropy. Tsinghua Science & Technology, 16(1): 51-54. See also ArXiv preprint, arXiv: 0808.0845, 2008.
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(500,c(0,0),sigma) xc1 <- construct_empirical_copula(x)
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(500,c(0,0),sigma) xc1 <- construct_empirical_copula(x)
Estimating copula entropy nonparametrically.
copent(x,k=3,dt=2)
copent(x,k=3,dt=2)
x |
data with each row as a sample |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, 1 for Eclidean distance; 2 for Maximum distance |
This program involves estimating copula entropy from data nonparametrically. It was proposed in Ma and Sun (2008, 2011).
The algorithm composes of two simple steps: estimating empirical copula by rank statistic using construct_empirical_copula
and then estimating copula entropy with kNN method using entknn
proposed in Kraskov et al (2004).
The argument x is for the data with each row as a sample from random variables. The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance).
Copula Entropy is proved to be equivalent to negative mutual information so this program can also be used to estimate multivariate mutual information.
The function returns negative value of copula entropy of data x.
Ma, J., & Sun, Z. (2011). Mutual information is copula entropy. Tsinghua Science & Technology, 16(1): 51-54. See also arXiv preprint arXiv:0808.0845, 2008.
Kraskov, A., St\"ogbauer, H., & Grassberger, P. (2004). Estimating Mutual Information. Physical Review E, 69(6), 66138.
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(500,c(0,0),sigma) ce1 <- copent(x,3,2)
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(500,c(0,0),sigma) ce1 <- copent(x,3,2)
Single change point detection with copula entropy based two-sample test.
cpd(x,thd=0.13,n=15,k=3,dt=2,ncores=0)
cpd(x,thd=0.13,n=15,k=3,dt=2,ncores=0)
x |
data with each row as a sample of d-dimensional random variables |
thd |
threshold of the statistic of two-sample test for detecting a change point, default = 0.13 |
n |
the argument used by two-sample test, default = 15 |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, 1 for Eclidean distance; 2 for Maximum distance. default = 2 |
ncores |
number of cores to be used for parallel computing, default = 0 for all the cores |
This program involves detecting single change point in univariate or multivariate time series data with copula entropy based two-sample test. It was proposed in Ma (2024), in which a group of two-sample tests are performed on time series data and the change point is considered to be associated with the maximum of the statistics of all the tests.
The argument x is for the data with each row as a sample of d-dimensional random variables. The argument thd is for the threshold of the statistic of two-sample test for detecting a change point. If the maximum of the statistics of all the two-sample tests is below the threshold, no change point is detected. The argument n is the argument used by the two-sample test function tst
. The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance). The argument ncores is for the number of cores to be used for parallel computing. If the default 0 is used, then all the cores will be used.
The function returns a list containing
stats |
the estimated statistics of all the two-sample tests |
maxstat |
the maximum of the estimated statistics |
pos |
the change point detected |
Ma, Jian. Change Point Detection with Copula Entropy based Two-Sample Test. arXiv preprint arXiv:2403.07892, 2024.
x = c(rnorm(15,0,1),rnorm(15,0,10)) cpd(x,thd=0.15,ncores=2)
x = c(rnorm(15,0,1),rnorm(15,0,10)) cpd(x,thd=0.15,ncores=2)
Estimating entropy from data with kNN method.
entknn(x,k=3,dt=2)
entknn(x,k=3,dt=2)
x |
the data with each row as a sample |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, = 1 for Eclidean distance; other for Maximum distance |
This program involves estimating entropy from data by kNN method. It was proposed in Kraskov et al (2004). The algorithm is the second step of estimating copula entropy copent
.
The argument x is for the data with each row as a sample from random variables. The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance).
The function returns the estimated entropy value of data x.
Kraskov, A., St\"ogbauer, H., & Grassberger, P. (2004). Estimating Mutual Information. Physical Review E, 69(6), 66138.
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(500,c(0,0),sigma) xent1 <- entknn(x)
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(500,c(0,0),sigma) xent1 <- entknn(x)
Multiple change point detection with copula entropy based two-sample test.
mcpd(x,maxp=5,thd=0.13,minseglen=10,n=15,k=3,dt=2,ncores=0)
mcpd(x,maxp=5,thd=0.13,minseglen=10,n=15,k=3,dt=2,ncores=0)
x |
data with each row as a sample of d-dimensional random variables |
maxp |
maximal number of change points, default = 5 |
thd |
threshold of the statistic of two-sample test for detecting change points, default = 0.13 |
minseglen |
minimal length of binary segmentation, default = 10 |
n |
the parameter used by two-sample test, default = 15 |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, 1 for Eclidean distance; 2 for Maximum distance |
ncores |
number of cores to be used for parallel computing, default = 0 for all the cores |
This program involves detecting multiple change points in univariate or multivariate time series data with copula entropy based two-sample test. It was proposed in Ma (2024). The method is a combination of binary segmentation and single change point detection implemented in cpd
.
The argument x is for the data with each row as a sample of d-dimensional random variables. The argument maxp is for the maximal number of change points. The argument thd is for the threshold of the statistic of two-sample test for detecting a change point used in cpd
. The argument minseglen is for the minimal length of each segment in binary segmentation. If the length of a segment is shorter than minseglen, then no detection will be performed on the segment. The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance). The argument ncores is for the number of cores to be used for parallel computing. If the default 0 is used, then all the cores will be used.
The function returns a list containing
maxstat |
the maximal statistics of the detected change points |
pos |
the change points detected |
Ma, Jian. Change Point Detection with Copula Entropy based Two-Sample Test. arXiv preprint arXiv:2403.07892, 2024.
x = c(rnorm(15,0,1),rnorm(10,0,10),rnorm(10,0,1)) mcpd(x,thd=0.15,ncores=2)
x = c(rnorm(15,0,1),rnorm(10,0,10),rnorm(10,0,1)) mcpd(x,thd=0.15,ncores=2)
Estimating the statistic for testing multivariate normality based on copula entropy.
mvnt(x,k=3,dt=2)
mvnt(x,k=3,dt=2)
x |
data with each row as a sample of d-dimensional random variables |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, 1 for Eclidean distance; 2 for Maximum distance |
This program involves estimating the statistic for testing multivariate normality based on copula entropy. It was proposed in Ma (2022). The test statistic is defined as the difference between the copula entropies of unknown distribution and the Gaussian distribution with same covariance.
The argument x is for the data with each row as a sample of d-dimensional random variables. The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance).
The function returns the statistic for testing multivariate normality of x.
Ma, Jian. Multivariate Normality Test with Copula Entropy. arXiv preprint arXiv:2206.05956, 2022.
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(1000,c(0,0),sigma) mvnt(x)
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) x <- rmnorm(1000,c(0,0),sigma) mvnt(x)
Estimating transfer entropy via copula entropy nonparametrically.
transent(x,y,lag=1,k=3,dt=2)
transent(x,y,lag=1,k=3,dt=2)
x |
data with 1 row |
y |
data with 1 row |
lag |
time lag, >0 |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, 1 for Eclidean distance; 2 for Maximum distance |
This program involves estimating transfer entropy from y to x with time lag lag via copula entropy nonparametrically. It was proposed in Ma (2019).
The algorithm first prepare the data according to lag, and then call ci
for conditional independence testing.
The argument x,y are for the data with 1 row as samples from random variables. The argument lag is for time lag. The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance).
The function returns the value of transfer entropy from y to x with time lag lag.
Ma, Jian. Estimating Transfer Entropy via Copula Entropy. arXiv preprint arXiv:1910.04375, 2019.
library(copent) num = 300 x = rnorm(num) y = rnorm(num) transent(y,x,2)
library(copent) num = 300 x = rnorm(num) y = rnorm(num) transent(y,x,2)
Estimating the statistic for two-sample test based on copula entropy.
tst(s0,s1,n=12,k=3,dt=2)
tst(s0,s1,n=12,k=3,dt=2)
s0 , s1
|
two samples with each row as a sample of d-dimensional random variables |
n |
repeat time of estimation to reduce estimation bias, default = 12 |
k |
kth nearest neighbour, default = 3 |
dt |
the type of distance between samples, 1 for Eclidean distance; 2 for Maximum distance |
This program involves estimating the statistic for non-parametric multivariate two-sample test based on copula entropy. It was proposed in Ma (2023). The test statistic is defined as the difference between the copula entropies of the null hypothesis and the alternative of two-sample test.
The argument s0,s1 is for the two samples with each row as a sample of d-dimensional random variables. The argument n is the repeat time of estimation for reducing the estimation bias (dafault = 12). The argument k and dt is used in the kNN method for estimating entropy. k is for the kth nearest neighbour (default = 3) and dt is for the type of distance between samples which has currently two value options (1 for Eclidean distance, and 2(default) for Maximum distance).
The function returns the statistic for two-sample test on s0,s1.
Ma, Jian. Two-Sample Test with Copula Entropy. arXiv preprint arXiv:2307.07247, 2023.
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) s0 <- rmnorm(400,c(0,0),sigma) s1 <- rmnorm(500,c(5,5),sigma) tst(s0,s1)
library(mnormt) rho <- 0.5 sigma <- matrix(c(1,rho,rho,1),2,2) s0 <- rmnorm(400,c(0,0),sigma) s1 <- rmnorm(500,c(5,5),sigma) tst(s0,s1)