Covariance and Covariance Matrix


Covariance is a statistical metric to investigate a relation between two-dimensional data (X_i,Y_i)~(i=1,2,3,\cdots,n). This metric is used for computing correlation coefficient explained in the next page.

The Covariance C(X,Y) can be defined as follows:

\displaystyle Cov(X,Y)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})  

Here, \bar{X} and \bar{Y} are the average of the data sequences (X_i~(i=1,2,3,\cdots,n) and Y_i~(i=1,2,3,\cdots,n)), respectively.

As you can see from this definition the C(X,X) corresponds to the variance V[X], with dataset X_i~(i=1,2,3,\cdots,n).

\displaystyle Cov(X,X)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2 = V[X]  

If you have m-dimensional data (X_i^1,X_i^2,\cdots,X_i^m)~(i=1,2,3,\cdots,n), we can compute each pair of m variables, and calculate each covariance. As a result from all the computed covariances, we can construct a m \times m matrix, called Covariance Matrix.

The (p,q) element of the covariance matrix is given by the following expression \Sigma_{pq}:

\displaystyle \Sigma_{pq}=Cov(X^p,Y^q)=\frac{1}{n}\sum_{i=1}^n(X_i^p-\bar{X^p})(X_i^q-\bar{X^q})  

Diagonal elements of the covariance matrix V[X^p] is equal to the variance of each variable X_i^p~(i=1,2,3,\cdots,n).

In fact, the correlation coefficient explained in the next page is more frequently used than this covariance matrix.

Comments are closed.