Correlation Coefficient


The correlation coefficient is a metric that measures the strength of the relationship between the two variables of a two-dimensional dataset, such as height and weight.

There are several types of correlation coefficient; the most widely used is Pearson’s correlation coefficient.

Let us consider a two-dimensional dataset (X_i,Y_i)~(i=1,2,3,\cdots,n). Such a dataset may represent height (X) and weight (Y) in a population, or the scores in Mathematics (X) and English (Y) for a group of students.

Then, the correlation coefficient C(X,Y) is given by

\displaystyle C(X,Y)=\frac{1}{n}\frac{\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})}{\sigma_X\sigma_Y}

 

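Here, \bar{X} and \bar{Y} denote the averages of the datasets X_i~(i=1,2,3,\cdots,n) and Y_i~(i=1,2,3,\cdots,n), respectively. In addition, \sigma_X and \sigma_Y are the standard deviations of X_i~(i=1,2,3,\cdots,n) and Y_i~(i=1,2,3,\cdots,n), respectively. For the factor 1/n in the formula to be consistent, these are the population standard deviations, that is, \sigma_X=\sqrt{\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2}, and likewise for \sigma_Y.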
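The formula can be translated almost term by term into code. The following is a minimal sketch in plain Python, assuming that the standard deviations divide by n (population form); the function name pearson_correlation and the height and weight values are made up for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation C(X,Y), using population standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("datasets must not be empty")

    # Averages \bar{X} and \bar{Y}
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator: sum of products of the deviations from the averages
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Population standard deviations (divide by n, matching the 1/n factor)
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined when a dataset is constant")

    return cross / (n * sigma_x * sigma_y)


# Hypothetical data: heights in cm and weights in kg of five people
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 69.0, 74.0, 88.0]
r = pearson_correlation(heights, weights)
print(f"C(X,Y) = {r:.3f}")
```

The explicit check for a zero standard deviation reflects that the coefficient is undefined when one of the datasets is constant, since the formula would then divide by zero.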

From the definition of the Correlation Coefficient C(X,Y), we can show that -1\le C(X,Y)\le 1.
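One way to see this bound is the Cauchy-Schwarz inequality. The sketch below assumes that \sigma_X and \sigma_Y are the population standard deviations (dividing by n) and that both are non-zero; a_i=X_i-\bar{X} and b_i=Y_i-\bar{Y} are shorthand introduced only for this sketch.

```latex
\left|\sum_{i=1}^n a_i b_i\right|
  \le \sqrt{\sum_{i=1}^n a_i^2}\,\sqrt{\sum_{i=1}^n b_i^2}
  = \sqrt{n}\,\sigma_X\cdot\sqrt{n}\,\sigma_Y
  = n\,\sigma_X\sigma_Y
\quad\Longrightarrow\quad
|C(X,Y)| = \frac{1}{n}\frac{\left|\sum_{i=1}^n a_i b_i\right|}{\sigma_X\sigma_Y} \le 1
```

Equality holds exactly when the deviations are proportional to each other, that is, when all points (X_i,Y_i) lie on a straight line with non-zero slope.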

When the correlation coefficient is close to 1, the two datasets are strongly positively correlated. On the other hand, when it is close to -1, the two datasets are strongly negatively correlated. When the correlation coefficient is close to zero, there is no significant linear relationship between the two datasets.
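The three cases can be illustrated with small hand-made datasets. This is only a sketch: the helper corr and the datasets are hypothetical, and the standard deviations are again assumed to divide by n. The last dataset shows that a coefficient of zero rules out a linear relationship only, since Y there depends on X exactly.

```python
import math


def corr(xs, ys):
    """Pearson correlation with population standard deviations (divide by n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
rising = [2 * x + 1 for x in xs]    # straight line with positive slope
falling = [-3 * x + 4 for x in xs]  # straight line with negative slope
parabola = [x * x for x in xs]      # exact dependence, but not linear

c_rising = corr(xs, rising)
c_falling = corr(xs, falling)
c_parabola = corr(xs, parabola)
print(c_rising, c_falling, c_parabola)
```

The first two values come out at 1 and -1 up to floating-point rounding, and the third at 0, because the positive and negative deviations of the parabola cancel in the sum.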

To verify whether the two datasets are significantly correlated, we have to pay attention to the dataset size n.

For example, a correlation coefficient of 0.6 may reflect a real correlation in some cases, while in others it may not be statistically significant because the dataset is too small.

In general, the correlation coefficient becomes more reliable as the dataset size increases.
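The effect of the dataset size can be explored with a small simulation. The sketch below is an illustration and not a statistical test: it repeatedly draws two independent (hence truly uncorrelated) Gaussian datasets of size n and counts how often the coefficient still reaches 0.6 in absolute value by chance. The names corr and chance_rate, the number of trials, and the seed are arbitrary choices made for this example.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation with population standard deviations (divide by n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


def chance_rate(n, threshold=0.6, trials=2000, seed=0):
    """Fraction of trials in which independent data reach |C| >= threshold."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(corr(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


rate_small = chance_rate(5)   # very small datasets
rate_large = chance_rate(50)  # moderately sized datasets
print(f"n=5:  {rate_small:.3f}")
print(f"n=50: {rate_large:.3f}")
```

With only five points, unrelated data produce a coefficient of 0.6 or more in a sizeable fraction of the trials, whereas with fifty points this almost never happens, which is why the same value of 0.6 carries very different weight depending on n.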

This issue will be explained in more detail in the statistical test section.
