# How to calculate Pearson’s correlation coefficient between two datasets in Python?

Examples of how to calculate Pearson’s correlation coefficient between two datasets in Python:

### Create a dataset

Let's first create some data:

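```python
import numpy as np

def f(a, b, c, X):
    # Linear relation with Gaussian noise of standard deviation c.
    # randn expects the dimensions as separate integers, so unpack the shape.
    eps = c * np.random.randn(*X.shape)
    return a * X + b + eps

a = 1 # slope
b = 0 # intercept
c = 1.0 # noise

X = np.random.randint(100, size=250)

Y = f(a, b, c, X)
```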

and use matplotlib to visualize it:

```python
import matplotlib.pyplot as plt

plt.scatter(X, Y)

plt.xlim(-10, 110)

plt.title("How to calculate the Pearson’s Correlation coefficient \n between two datasets in python ?")

plt.xlabel('X')
plt.ylabel('Y')

plt.savefig("Pearson_Correlation_coefficient_01.png", bbox_inches='tight')

plt.show()
```

### Calculate Pearson’s correlation coefficient using scipy

To calculate Pearson’s correlation coefficient between the variables X and Y, one solution is to use `scipy.stats.pearsonr`:

```python
from scipy.stats import pearsonr

# pearsonr returns the coefficient and a p-value; only the coefficient is kept here
corr, _ = pearsonr(X, Y)

print(corr)
```

gives, for example (the exact value depends on the random draw and on the noise level `c`):

```
0.9434925682236153
```

that can be rounded:

```python
round(corr, 2)
```

which then gives

```
0.94
```
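As a cross-check, the coefficient can also be computed directly from its definition (the covariance of X and Y divided by the product of their standard deviations) and compared with the scipy result. The sketch below is only illustrative: the helper name `pearson_manual`, the fixed seed, and the noise level of 10 are assumptions that do not come from the example above.

```python
import numpy as np
from scipy.stats import pearsonr

# Fixed seed: an assumption, used only so the run is reproducible
rng = np.random.default_rng(0)

X = rng.integers(0, 100, size=250)
Y = 1.0 * X + 0.0 + 10.0 * rng.standard_normal(X.shape)

def pearson_manual(x, y):
    """Hypothetical helper: Pearson's r computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_manual = pearson_manual(X, Y)
r_scipy, p_value = pearsonr(X, Y)

# The two values should agree up to floating-point rounding
print(r_manual, r_scipy)
```

The second value returned by `pearsonr` is a p-value, which the earlier snippet discards with `_`.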

### Examples of Pearson’s correlation coefficient calculations

Let's now reproduce the example from Wikipedia:

```python
import matplotlib.pyplot as plt
import numpy as np

from scipy.stats import pearsonr

def f(a, b, c, X):
    # Linear relation with Gaussian noise of standard deviation c.
    # randn expects the dimensions as separate integers, so unpack the shape.
    eps = c * np.random.randn(*X.shape)
    return a * X + b + eps

# One (slope, intercept, noise) triple per plot
A = [1.0, 1.0, 1.0, 0.0, -1.0, -1.0, -1.0]
B = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
C = [1.0, 10, 20, 20, 20, 10, 1.0]

n = 1
for a, b, c in zip(A, B, C):
    print(a, b, c)

    X = np.random.randint(100, size=250)

    Y = f(a, b, c, X)

    corr, _ = pearsonr(X, Y)

    plt.scatter(X, Y)

    plt.xlim(-10, 110)

    plt.title(
        "How to calculate the Pearson’s Correlation coefficient \n"
        "between two datasets in python ? \n"
        "corrcoef = {} \n a = {} b = {} c = {}".format(round(corr, 2), a, b, c)
    )

    plt.xlabel('X')
    plt.ylabel('Y')

    plt.savefig("Pearson_Correlation_coefficient_{}.png".format(n), bbox_inches='tight')

    plt.show()

    n += 1
```

gives one scatter plot per `(a, b, c)` triple, each with its correlation coefficient in the title.
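To compare the seven cases numerically rather than visually, the same loop can collect the coefficients in a list instead of plotting. This is a minimal sketch that assumes a fixed seed and the newer `np.random.default_rng` generator, neither of which is used in the code above; the `results` list is likewise just an illustrative name.

```python
import numpy as np
from scipy.stats import pearsonr

# Fixed seed: an assumption, used only so the run is reproducible
rng = np.random.default_rng(42)

def f(a, b, c, X):
    # Linear relation with Gaussian noise of standard deviation c
    eps = c * rng.standard_normal(X.shape)
    return a * X + b + eps

A = [1.0, 1.0, 1.0, 0.0, -1.0, -1.0, -1.0]
B = [0.0] * 7
C = [1.0, 10, 20, 20, 20, 10, 1.0]

results = []  # one correlation coefficient per (a, b, c) triple
for a, b, c in zip(A, B, C):
    X = rng.integers(0, 100, size=250)
    Y = f(a, b, c, X)
    corr, _ = pearsonr(X, Y)
    results.append(corr)
    print(f"a = {a:+.1f}  c = {c:>4}  corr = {corr:+.2f}")
```

With a positive slope the coefficient should move away from 1 as the noise grows, with a zero slope it should stay close to 0, and with a negative slope it should approach -1 as the noise shrinks.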

### Calculate Pearson’s correlation coefficient using numpy

Another solution is to use numpy with numpy.corrcoef:

```python
import numpy as np

# corrcoef returns the full 2x2 correlation matrix of X and Y
print(np.corrcoef(X, Y))
```

gives

```
[[1.         0.94349257]
 [0.94349257 1.        ]]
```
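`np.corrcoef` returns a symmetric matrix with ones on the diagonal, so the coefficient between X and Y is the off-diagonal element. The following sketch extracts it and compares it with `pearsonr`; the seed and the noise level of 10 are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Fixed seed: an assumption, used only so the run is reproducible
rng = np.random.default_rng(1)

X = rng.integers(0, 100, size=250)
Y = X + 10.0 * rng.standard_normal(X.shape)

R = np.corrcoef(X, Y)   # 2x2 correlation matrix
corr_numpy = R[0, 1]    # correlation between X and Y (same value as R[1, 0])

corr_scipy, _ = pearsonr(X, Y)

# Both libraries should report the same coefficient up to rounding
print(corr_numpy, corr_scipy)
```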