analitics

Sunday, August 9, 2020

Python 3.8.5 : Pearson Product Moment Correlation with corrcoef from numpy.

The Python package NumPy comes with the corrcoef function, which returns Pearson product-moment correlation coefficients.
Each coefficient measures the linear relationship between exactly two variables; when given more than two variables, corrcoef returns a matrix of these pairwise coefficients.
The full name is the Pearson Product Moment Correlation (PPMC).
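To make the definition concrete, here is a minimal sketch that computes the coefficient by hand (covariance divided by the product of the standard deviations) and compares it with corrcoef. The helper name pearson_r and the five sample points are my own, chosen only for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: Pearson r computed directly from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # deviations of each sample from its mean
    dx = x - x.mean()
    dy = y - y.mean()
    # sum of cross products over the square root of the product of squared sums
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# small made-up sample, not taken from the tutorial data
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

manual = pearson_r(x, y)
# corrcoef returns a 2x2 matrix; the coefficient between x and y is at [0, 1]
from_numpy = float(np.corrcoef(x, y)[0, 1])
print(manual, from_numpy)
```

For this sample both values come out as 0.8, up to floating-point rounding.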
The PPMC is not able to tell the difference between dependent variables and independent variables.
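A short sketch of what this means in practice: swapping the two inputs of corrcoef gives the same coefficient, so the result says nothing about which variable depends on which. The seed, sample size, and noise level below are arbitrary values picked for the illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the illustration repeatable
a = np.random.randint(0, 100, 50)
b = a + np.random.normal(0, 10, 50)

m_ab = np.corrcoef(a, b)  # a treated as the "first" variable
m_ba = np.corrcoef(b, a)  # roles swapped

# the off-diagonal entry is the coefficient between the two variables
print(m_ab[0, 1], m_ba[0, 1])
# the matrix is symmetric, so the order of the variables does not matter
print(np.allclose(m_ab, m_ab.T))
```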
The documentation about this function can be found here.
More examples of Pearson Correlation can be found on this website.
The example in this tutorial uses NumPy's random module to generate random integers and then calculates the correlation coefficients.
This is repeated five times in a for loop, each time with a different random seed (0 through 4).
On each pass the correlation matrices are printed and then scatter plots of the random numbers are displayed.
Let's see the source code:
import numpy as np
import matplotlib.pyplot as plt

# number of samples and the exclusive upper bound of the random integers
nr_integers = 100
size_integers = 100

# use the seeds 0 to 4 in turn and show the result for each
for e in range(5):
    # change random seed
    np.random.seed(e)
    # nr_integers random integers between 0 and size_integers
    x = np.random.randint(0, size_integers, nr_integers)
    # Positive correlation: x plus Gaussian noise
    # (mean 0, standard deviation size_integers, nr_integers samples)
    positive_y = x + np.random.normal(0, size_integers, nr_integers)
    correlation_positive = np.corrcoef(x, positive_y)
    # show matrix for correlation_positive
    print(correlation_positive)
    # Negative correlation: size_integers - x plus freshly drawn Gaussian noise
    # (mean 0, standard deviation size_integers, nr_integers samples)
    negative_y = size_integers - x + np.random.normal(0, size_integers, nr_integers)
    correlation_negative = np.corrcoef(x, negative_y)
    # show matrix for correlation_negative
    print(correlation_negative)
    # set graphic for plt with two graphics for each output with subplot
    plt.subplot(1, 2, 1)
    plt.scatter(x, positive_y)
    plt.subplot(1, 2, 2)
    plt.scatter(x, negative_y)
    # show the graph 
    plt.show()
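
Each printed matrix is 2x2 with ones on the diagonal, so the value of interest is the off-diagonal entry. As a rough sketch, the same data can be rebuilt per seed and only that entry reported. The helper name coefficients_for_seed is my own and is not part of the code above. Because the noise has a standard deviation of 100, which is larger than the spread of x, the coefficients should generally stay well below 1 in magnitude.

```python
import numpy as np

nr_integers = 100
size_integers = 100

def coefficients_for_seed(seed):
    """Hypothetical helper: rebuild the data for one seed, return (r_positive, r_negative)."""
    np.random.seed(seed)
    x = np.random.randint(0, size_integers, nr_integers)
    # same construction as in the loop above: x plus, or size_integers minus x plus, Gaussian noise
    positive_y = x + np.random.normal(0, size_integers, nr_integers)
    negative_y = size_integers - x + np.random.normal(0, size_integers, nr_integers)
    # keep only the off-diagonal entry of each 2x2 correlation matrix
    r_pos = float(np.corrcoef(x, positive_y)[0, 1])
    r_neg = float(np.corrcoef(x, negative_y)[0, 1])
    return r_pos, r_neg

for seed in range(5):
    r_pos, r_neg = coefficients_for_seed(seed)
    print(f"seed={seed}  positive r={r_pos:+.3f}  negative r={r_neg:+.3f}")
```

Calling the helper twice with the same seed returns the same pair of values, which makes it easy to compare runs.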