The first thing that comes to mind when you want to measure the association between two variables is probably the Pearson correlation coefficient. It is the covariance of the two variables divided by the product of their standard deviations: essentially a kind of “normalised” covariance. It is easy to understand, quick to compute and available in almost every software tool. Unfortunately, the correlation coefficient cannot capture nonlinear relationships between variables, which is why other metrics are sometimes used instead.
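A minimal sketch of this limitation, assuming Python with NumPy (the data, number of points and the parabola example are mine, not the article's):

```python
import numpy as np

# Pearson correlation: covariance normalised by the product of the
# standard deviations -- it only captures *linear* association.
x = np.linspace(-1.0, 1.0, 1001)

linear = 2.0 * x + 1.0   # perfectly linear relationship
parabola = x ** 2        # perfectly deterministic, but nonlinear

r_linear = np.corrcoef(x, linear)[0, 1]
r_parabola = np.corrcoef(x, parabola)[0, 1]

print(r_linear)    # ~ 1.0
print(r_parabola)  # ~ 0.0: the dependence is invisible to Pearson
```

Even though `parabola` is a deterministic function of `x`, its symmetry makes the covariance vanish, so Pearson reports no association at all.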
Mutual Information is one of these alternatives. It comes from Information Theory and essentially measures the amount of information that one variable contains about another. The definition of Mutual Information is closely related to the definition of entropy (H), which quantifies the “unpredictability” of a variable. The Mutual Information (I) between two variables is equal to the sum of their entropies minus their joint entropy:

I(X, Y) = H(X) + H(Y) − H(X, Y)
Unlike the correlation coefficient, the value of Mutual Information is not bounded, which can make it hard to interpret. Thus, we can consider a normalised version of Mutual Information, called the Uncertainty Coefficient (also known as Theil’s U, introduced in 1970), which takes the following form:

U(X|Y) = I(X, Y) / H(X)
This coefficient can be seen as the part of X that can be predicted given Y.
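One simple way to estimate these quantities from data is a joint histogram. The sketch below assumes NumPy; the binning choice (20 equal-width bins) is my assumption and not part of the definition:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    p = p[p > 0]                   # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

def uncertainty_coefficient(x, y, bins=20):
    """U(X|Y) = I(X, Y) / H(X): the part of X predictable given Y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()     # joint distribution
    p_x = p_xy.sum(axis=1)         # marginal of X
    p_y = p_xy.sum(axis=0)         # marginal of Y
    mi = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
    return mi / entropy(p_x)

rng = np.random.default_rng(0)
a = rng.random(10_000)
b = rng.random(10_000)
u_same = uncertainty_coefficient(a, a)   # identical variables -> ~1
u_indep = uncertainty_coefficient(a, b)  # independent variables -> ~0
```

As expected from the normalisation, the coefficient reaches 1 when one variable fully determines the other and stays close to 0 for independent variables (up to a small positive bias from the histogram estimator).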
Let’s try an example: a sinusoid. The correlation between time and the function value is typically close to zero, unlike the uncertainty coefficient. We start with a sinusoid (Figure 1, upper left) and then randomly reshuffle an increasing fraction of the original samples. Figure 1 shows the original signal with 25%, 50% and 75% of the samples reshuffled. What happens to the correlation coefficient and the mutual information in these cases?
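A rough single-run sketch of this experiment (the article averages over 50 runs); the signal length of ten periods, the sample count and the histogram binning are my assumptions:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def uncertainty_coefficient(t, y, bins=(160, 16)):
    """U(Y|T): the part of the signal value predictable from time."""
    joint, _, _ = np.histogram2d(t, y, bins=list(bins))
    p_xy = joint / joint.sum()
    mi = (entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0))
          - entropy(p_xy.ravel()))
    return mi / entropy(p_xy.sum(axis=0))  # normalise by H(Y)

def reshuffle(y, fraction, rng):
    """Randomly permute `fraction` of the samples among themselves."""
    out = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    out[idx] = rng.permutation(out[idx])
    return out

rng = np.random.default_rng(42)
t = np.linspace(0.0, 10 * 2 * np.pi, 20_000)  # ten periods
y = np.sin(t)

fractions = [0.0, 0.25, 0.50, 0.75]
corrs, ucs = [], []
for f in fractions:
    ys = reshuffle(y, f, rng)
    corrs.append(np.corrcoef(t, ys)[0, 1])
    ucs.append(uncertainty_coefficient(t, ys))
```

In this sketch the uncertainty coefficient drops steadily as more samples are reshuffled, while the correlation stays near zero throughout.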
Figure 2 shows how both measures vary, averaged over 50 runs.
We can observe that the uncertainty coefficient (based on mutual information) tracks the amount of “order” in the signal more consistently than the correlation coefficient, which appears very noisy.