CrunchingAlphaα

Pearson's Correlation Coefficient — Formulae, Interpretation [Worked Examples]

by Jack Bodeley on September 14, 2021

Pearson's Coefficient of Correlation, or the Pearsonian Correlation Coefficient, is a mathematical method of measuring the strength of the linear relationship between two variables. It was proposed by Karl Pearson (1857-1936), a British biometrician and statistician.

It is by far the most widely used method of measuring correlation in practice today.

A correlation coefficient merely describes a mathematical relationship; it says nothing by itself about cause and effect.

Correlation is a measure of the degree of relatedness.

It measures the association between two or more variables: when movement in one variable tends to be accompanied by corresponding movements in the other variables, they are said to be correlated. Correlation can be positive or negative, linear or curvilinear, and simple, partial or multiple.

Formulae

$$r_{xy}=\frac{n\sum xy-\sum x\sum y}{\sqrt{n\sum x^2-(\sum x)^2}\times \sqrt{n\sum y^2-(\sum y)^2}}$$

The formula gives the Pearsonian Correlation Coefficient between two variables $x$ and $y$, usually denoted $r(x,y)$, $r_{xy}$ or simply $r$.

It is a numerical measure of the linear relationship between the two variables.

The coefficient can also be defined as the ratio of the covariance of $x$ and $y$ to the product of their standard deviations:

$$r_{xy}=\frac{Cov(x, y)}{\sigma_x\sigma_y}$$

where, for a bivariate distribution of $n$ pairs of observations,

$$Cov(x, y)=\frac{\sum(x-\bar{x})(y-\bar{y})}{n}$$

$$\sigma_x=\sqrt\frac{\sum(x-\bar{x})^2}{n}$$

$$\sigma_y=\sqrt\frac{\sum(y-\bar{y})^2}{n}$$
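The raw-sum formula and the covariance form are algebraically equivalent: the divisor $n$ in the covariance and both standard deviations cancels, and $n\sum xy-\sum x\sum y=n\sum(x-\bar{x})(y-\bar{y})$. As a quick check, here is a minimal Python sketch that computes $r$ both ways. The function names `r_raw_sums` and `r_cov_sd` are our own, and the five data pairs are made-up illustrative numbers.

```python
import math

def r_raw_sums(x, y):
    """Pearson's r from raw sums: (n*Sxy - Sx*Sy) / (sqrt(...) * sqrt(...))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return num / den

def r_cov_sd(x, y):
    """Pearson's r as Cov(x, y) / (sigma_x * sigma_y), all with divisor n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_raw_sums(x, y), 4), round(r_cov_sd(x, y), 4))  # 0.7746 0.7746
```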

Properties

The important properties of Pearsonian Correlation Coefficient are:

  1. Pearsonian Correlation Coefficient cannot exceed 1 numerically, i.e. it always lies between -1 and +1: $-1\le r\le1$. Any value of $r$ outside these limits indicates a computational error.
  2. Pearsonian Correlation Coefficient is independent of the change of origin and scale. For instance, if variables $x$ and $y$ are transformed to new variables $u$ and $v$ by a change of origin and scale, i.e. $$u=\frac{x-a}{h}\quad\text{and}\quad v=\frac{y-b}{k}$$ where $a$, $h$, $b$ and $k$ are constants with $h\gt0$ and $k\gt0$, then the correlation coefficient between $u$ and $v$ is the same: $r_{xy}=r_{uv}$. (If $h$ and $k$ have opposite signs, only the sign of $r$changes.) This is one of the most important properties of the correlation coefficient and is extremely helpful in the numerical computation of $r$.
  3. If two variables are independent they are uncorrelated, but the converse need not be true: uncorrelated variables need not be independent. Zero correlation between $x$ and $y$, i.e. $r_{xy}=0$, implies the absence of a linear relationship, but the variables may still be related quadratically, logarithmically or trigonometrically.
  4. Pearsonian Correlation Coefficient is the geometric mean of the two regression coefficients, i.e. $$r_{xy}=\pm\sqrt{b_{xy}\times b_{yx}}$$ Both regression coefficients always have the same sign (both positive or both negative), and $r$ takes that common sign.
  5. The square of the Pearsonian Correlation Coefficient, $r^2$, is known as the Coefficient of Determination. It measures the proportion of the variation in the dependent variable that is accounted for by the independent variable (multiply by 100 to express it as a percentage), which is useful in interpreting the value of $r$.
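Properties 2, 4 and 5 can be checked numerically. The Python sketch below uses the volatility and price data of Example 1 later in this article; the helper `pearson_r` and the origin and scale constants ($a=50$, $h=5$, $b=14$, $k=2$) are illustrative choices of ours. It also shows that a negative scale factor keeps the magnitude of $r$ but reverses its sign.

```python
import math

def pearson_r(x, y):
    """Pearson's r via deviations from the means (the divisor n cancels)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]  # volatility (Example 1)
y = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]  # price (Example 1)
r = pearson_r(x, y)

# Property 2: change of origin and scale with h > 0 and k > 0 leaves r unchanged
u = [(xi - 50) / 5 for xi in x]  # a = 50, h = 5
v = [(yi - 14) / 2 for yi in y]  # b = 14, k = 2
assert math.isclose(pearson_r(u, v), r)

# A negative scale factor (h = -5) keeps |r| but reverses its sign
u_neg = [(xi - 50) / -5 for xi in x]
assert math.isclose(pearson_r(u_neg, v), -r)

# Property 4: r is the geometric mean of the two regression coefficients
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b_yx = sxy / sum((xi - mx) ** 2 for xi in x)  # regression coefficient of y on x
b_xy = sxy / sum((yi - my) ** 2 for yi in y)  # regression coefficient of x on y
assert math.isclose(math.copysign(math.sqrt(b_xy * b_yx), b_yx), r)

# Property 5: coefficient of determination
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")  # r = 0.7866, r^2 = 0.6187
```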

Interpretation

Positive values of $r$ indicate positive correlation, negative values of $r$ indicate negative correlation, and $r=0$ indicates absence of correlation.

The degree of correlation corresponding to various values of $r$ can be summed up as follows:

| Value of $r$ | Degree of correlation |
|---|---|
| $\pm1$ | perfect correlation |
| $\pm0.9$ | very high correlation |
| $\pm0.75$ | sufficiently high correlation |
| $\pm0.6$ | moderate correlation |
| $\pm0.3$ | possible correlation |
| $0$ | absence of correlation |

Probable Error of Correlation Coefficient $(PE_r)$

After obtaining $r$, we want to find out the dependability or reliability of the coefficient.

The probable error of the correlation coefficient ($PE_r$) measures the reliability of an obtained correlation coefficient, on the assumption that the data satisfy the conditions of random sampling. It is calculated as follows:

$$PE_r=0.6745~SE_r$$

where $SE_r=\frac{1-r^2}{\sqrt{n}}$ is the standard error of $r$, so that

$$PE_r=0.6745\frac{1-r^2}{\sqrt{n}}$$

Importance of Probable Error

The probable error $(PE_r)$ is used to determine limits. The limits for the population correlation coefficient are $r\pm PE_r$: if we take another random sample of size $n$ from the same population, its correlation coefficient is expected to lie within these limits with probability 0.5. The smaller the sample, the less accurate this becomes, so the sample size $n$ should ideally be fairly large.
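A minimal Python sketch of the probable error and the limits $r\pm PE_r$, following the formula above. The function names `probable_error` and `pe_limits` are hypothetical, and the inputs ($r=0.7866$, $n=10$, the result of Example 1 below) are only for illustration.

```python
import math

def probable_error(r, n):
    """PE_r = 0.6745 * SE_r, with SE_r = (1 - r^2) / sqrt(n)."""
    se_r = (1 - r ** 2) / math.sqrt(n)  # standard error of r
    return 0.6745 * se_r

def pe_limits(r, n):
    """Return the limits (r - PE_r, r + PE_r)."""
    pe = probable_error(r, n)
    return r - pe, r + pe

# Illustrative inputs: r = 0.7866 and n = 10, as in Example 1
r, n = 0.7866, 10
low, high = pe_limits(r, n)
print(f"PE_r = {probable_error(r, n):.4f}")
print(f"limits: {low:.4f} to {high:.4f}")
```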

Interpretation of Probable Error

The interpretation of $r$ based on $(PE_r)$ is as follows:

| Value of $r$ | Correlation |
|---|---|
| $\lt PE_r$ | insignificant correlation |
| $\gt 6\times PE_r$ | significant correlation |
If $PE_r$ is very small, a correlation can be taken to exist where $r\gt0.5$.
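The rule in the table can be expressed as a small Python function. Two choices here are assumptions rather than part of the rule as stated: $|r|$ is compared so that negative coefficients are handled, and values of $|r|$ between $PE_r$ and $6\times PE_r$, for which the table gives no verdict, are labelled `"inconclusive"`. The function name `interpret_r` is hypothetical.

```python
import math

def interpret_r(r, n):
    """Classify r against its probable error using the table's rule.

    Assumptions: |r| is compared (so negative r is handled), and values
    between PE_r and 6 * PE_r are labelled "inconclusive".
    """
    pe = 0.6745 * (1 - r ** 2) / math.sqrt(n)
    if abs(r) < pe:
        return "insignificant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

print(interpret_r(0.7866, 10))  # significant
print(interpret_r(0.4, 10))     # inconclusive
print(interpret_r(0.10, 10))    # insignificant
```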

Worked Examples

Example 1

Find the Pearsonian Correlation Coefficient between the volatility and price of the following 10 stocks, and interpret it:

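| Stock | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Volatility | 50 | 50 | 55 | 60 | 65 | 65 | 65 | 60 | 60 | 50 |
| Price | 11 | 13 | 14 | 16 | 16 | 15 | 15 | 14 | 13 | 13 |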

Solution 1

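| Stock | Volatility $(x)$ | Price $(y)$ | $(x-\bar{x})^2$ | $(y-\bar{y})^2$ | $(x-\bar{x})(y-\bar{y})$ |
|---|---|---|---|---|---|
| 1 | 50 | 11 | 64 | 9 | 24 |
| 2 | 50 | 13 | 64 | 1 | 8 |
| 3 | 55 | 14 | 9 | 0 | 0 |
| 4 | 60 | 16 | 4 | 4 | 4 |
| 5 | 65 | 16 | 49 | 4 | 14 |
| 6 | 65 | 15 | 49 | 1 | 7 |
| 7 | 65 | 15 | 49 | 1 | 7 |
| 8 | 60 | 14 | 4 | 0 | 0 |
| 9 | 60 | 13 | 4 | 1 | -2 |
| 10 | 50 | 13 | 64 | 1 | 8 |
| Total | 580 | 140 | 360 | 22 | 70 |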

$$\bar{x}=\frac{\sum~x}{n}=\frac{580}{10}=58$$

$$\bar{y}=\frac{\sum~y}{n}=\frac{140}{10}=14$$

Pearsonian correlation coefficient:

$$\sigma_x=\sqrt\frac{\sum(x-\bar{x})^2}{n}=\sqrt\frac{360}{10}=6$$

$$\sigma_y=\sqrt\frac{\sum(y-\bar{y})^2}{n}=\sqrt\frac{22}{10}\approx1.4832$$

$$Cov(x, y)=\frac{\sum(x-\bar{x})(y-\bar{y})}{n}=\frac{70}{10}=7$$

$$r_{xy}=\frac{Cov(x, y)}{\sigma_x\sigma_y}=\frac{7}{6\times 1.4832}\approx0.7866$$

Interpretation: There is a high positive correlation between volatility and price.
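The working table can be reproduced in Python to check the column totals and the final value. This sketch follows the same deviation method, with divisor $n$, as the solution above.

```python
import math

volatility = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]  # x
price = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]       # y
n = len(volatility)

x_bar = sum(volatility) / n  # 58.0
y_bar = sum(price) / n       # 14.0

# Column totals of the working table
sum_dx2 = sum((x - x_bar) ** 2 for x in volatility)                            # 360.0
sum_dy2 = sum((y - y_bar) ** 2 for y in price)                                 # 22.0
sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(volatility, price))  # 70.0

sigma_x = math.sqrt(sum_dx2 / n)  # 6.0
sigma_y = math.sqrt(sum_dy2 / n)  # about 1.4832
cov_xy = sum_dxdy / n             # 7.0
r = cov_xy / (sigma_x * sigma_y)
print(f"r = {r:.4f}")  # r = 0.7866
```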

Example 2

The data on price and quantity of a commodity in a market for 5 months is given below:

| Month | Jan | Feb | Mar | Apr | May |
|---|---|---|---|---|---|
| Price | 10 | 10 | 11 | 12 | 12 |
| Quantity | 5 | 6 | 4 | 3 | 3 |

  1. Find the Pearsonian correlation coefficient between price and quantity and comment on its sign and magnitude.

Solution 2.1

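| Price $(x)$ | Quantity $(y)$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|
| 10 | 5 | 100 | 25 | 50 |
| 10 | 6 | 100 | 36 | 60 |
| 11 | 4 | 121 | 16 | 44 |
| 12 | 3 | 144 | 9 | 36 |
| 12 | 3 | 144 | 9 | 36 |
| Total = 55 | 21 | 609 | 95 | 226 |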

Pearsonian correlation coefficient:

$$r_{xy}=\frac{n\sum xy-\sum x\sum y}{\sqrt{n\sum x^2-(\sum x)^2}\times \sqrt{n\sum y^2-(\sum y)^2}}$$

$$r_{xy}=\frac{5(226)-55(21)}{\sqrt{5(609)-(55)^2}\times \sqrt{5(95)-(21)^2}}$$

$$r_{xy}\approx-0.9587$$

Comment: The negative sign of $r$ indicates a negative correlation between price and quantity. Its magnitude, 0.9587, indicates a very high negative correlation.
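The same computation in Python, following the raw-sum method used in Solution 2.1; the comments show the intermediate totals from the table.

```python
import math

price = [10, 10, 11, 12, 12]  # x
quantity = [5, 6, 4, 3, 3]    # y
n = len(price)

sum_x = sum(price)                                    # 55
sum_y = sum(quantity)                                 # 21
sum_x2 = sum(x * x for x in price)                    # 609
sum_y2 = sum(y * y for y in quantity)                 # 95
sum_xy = sum(x * y for x, y in zip(price, quantity))  # 226

num = n * sum_xy - sum_x * sum_y  # 1130 - 1155 = -25
den = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)  # sqrt(20) * sqrt(34)
r = num / den
print(f"r = {r:.4f}")  # r = -0.9587
```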

Example 3

Consider the following series of scores obtained by two teams:

| Team | Scores |
|---|---|
| A | 45, 70, 65, 30, 90, 40, 50, 75, 85, 60 |
| B | 35, 90, 70, 40, 95, 40, 60, 80, 80, 50 |

  1. Find the Pearsonian correlation coefficient
  2. Find the probable error and interpret it

Solution 3.1

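To simplify the arithmetic, change the origin and scale of both series:

$$u=\frac{x-a}{h}\quad\text{and}\quad v=\frac{y-b}{k}$$

Taking $a=40$, $h=5$, $b=50$ and $k=10$:

$$u=\frac{x-40}{5}$$

$$v=\frac{y-50}{10}$$

| A $(x)$ | B $(y)$ | $u$ | $v$ | $u^2$ | $v^2$ | $uv$ |
|---|---|---|---|---|---|---|
| 45 | 35 | 1 | -1.5 | 1 | 2.25 | -1.5 |
| 70 | 90 | 6 | 4 | 36 | 16 | 24 |
| 65 | 70 | 5 | 2 | 25 | 4 | 10 |
| 30 | 40 | -2 | -1 | 4 | 1 | 2 |
| 90 | 95 | 10 | 4.5 | 100 | 20.25 | 45 |
| 40 | 40 | 0 | -1 | 0 | 1 | 0 |
| 50 | 60 | 2 | 1 | 4 | 1 | 2 |
| 75 | 80 | 7 | 3 | 49 | 9 | 21 |
| 85 | 80 | 9 | 3 | 81 | 9 | 27 |
| 60 | 50 | 4 | 0 | 16 | 0 | 0 |
| Total | | 42 | 14 | 316 | 63.5 | 129.5 |

Pearsonian correlation coefficient (since $h\gt0$ and $k\gt0$, $r_{xy}=r_{uv}$):

$$r_{xy}=r_{uv}=\frac{n\sum uv-\sum u\sum v}{\sqrt{n\sum u^2-(\sum u)^2}\times \sqrt{n\sum v^2-(\sum v)^2}}$$

$$r_{uv}=\frac{10(129.5)-42(14)}{\sqrt{10(316)-(42)^2}\times \sqrt{10(63.5)-(14)^2}}=\frac{707}{\sqrt{1396}\times\sqrt{439}}$$

$$r_{uv}\approx0.9031$$

This is a very high positive correlation.

Solution 3.2

$$PE_r=0.6745\frac{1-r^2}{\sqrt{n}}$$

$$PE_r=0.6745\frac{1-0.9031^2}{\sqrt{10}}$$

$$PE_r\approx0.0393$$

Interpretation: Because $r_{uv}=0.9031$ is greater than six times $PE_r$ ($6\times0.0393\approx0.236$), the correlation is significant.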
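A Python sketch for checking Example 3: it computes $r$ directly from the raw scores and again from the transformed values $u$ and $v$ (with $a=40$, $h=5$, $b=50$, $k=10$, as in the solution), which should agree, and then the probable error from the $r$ obtained. The helper `pearson_r` is our own.

```python
import math

def pearson_r(x, y):
    """Pearson's r from raw sums (the formula in the Formulae section)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

team_a = [45, 70, 65, 30, 90, 40, 50, 75, 85, 60]  # x
team_b = [35, 90, 70, 40, 95, 40, 60, 80, 80, 50]  # y

# Change of origin and scale with a = 40, h = 5, b = 50, k = 10
u = [(x - 40) / 5 for x in team_a]
v = [(y - 50) / 10 for y in team_b]

r_xy = pearson_r(team_a, team_b)
r_uv = pearson_r(u, v)
n = len(team_a)
pe_r = 0.6745 * (1 - r_uv ** 2) / math.sqrt(n)

print(f"r_xy = {r_xy:.4f}, r_uv = {r_uv:.4f}")
print(f"PE_r = {pe_r:.4f}, 6 * PE_r = {6 * pe_r:.4f}")
```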