Sunday, February 28, 2021

Micro averaged precision, recall, and F1 are always the same

This is something that causes confusion when it happens because it isn't very well known. If you're evaluating a multi-class classifier using micro-averaging, the precision, recall, and F1 scores will be exactly the same, always. In fact this is explained in the sklearn documentation as "Note that if all labels are included, “micro”-averaging in a multiclass setting will produce precision, recall and that are all identical to accuracy.". Let's prove this, first using an example and then more generally.

Let's say that your classifier's classes (or labels) are A, B, and C and that you have the following classifier predictions together with the true classes:

predtrue
AA
AA
BA
CA
AB
BB
BC
CC
CC

To measure the micro-averaged precision, recall, and F1 you need to first measure the number true positives (TP), false positives (FP), and false negatives (FN) of each class. For example, if we look at just class A, the TP is the number of rows where A was predicted and also the true class, that is, 2 (first two rows). The FP is the number of rows where A was predicted but was not the true class, that is, 1 (the fifth row). The FN is the number of rows where A was not predicted but was the true class, that is, 1 (the third row).

ClassTPFPFN
A212
B121
C211

You calculate the micro-averaged precision ($P$), recall ($R$), and $F1$ for labels $L$ as follows:

  • $P = \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FP}_l)}} = \frac{2+1+2}{(2+1)+(1+2)+(2+1)} = \frac{5}{9} = 55.6\%$
  • $R = \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FN}_l)}} = \frac{2+1+2}{(2+2)+(1+1)+(2+1)} = \frac{5}{9} = 55.6\%$
  • $F1 = \frac{2 \times P \times R}{P + R} = \frac{2 \times \frac{5}{9} \times \frac{5}{9}}{\frac{5}{9} + \frac{5}{9}} = 55.6\%$
So we got all the scores the same. If $P$ and $R$ are the same than $F1$ will be the same because the harmonic mean of the same number is that number (just like the arithmetic mean). We can easily prove this: $$ \frac{2 \times X \times X}{X + X} = \frac{2X^2}{2X} = X $$ This is assuming that $X$ is not zero, which would mean that the precision and the recall are both equal to zero, which is impossible.

So the question we should ask is why are the precision and recall always the same. If the precision and recall are always the same, then their definition should be identical. Let's see if we can simplify their identity. $$ P = R \\ \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FP}_l)}} = \frac{\sum_{l\in L}{{TP}_l}}{\sum_{l\in L}{({TP}_l + {FN}_l)}} \\ \sum_{l\in L}{({TP}_l + {FP}_l)} = \sum_{l\in L}{({TP}_l + {FN}_l)} \\ \sum_{l\in L}{{TP}_l} + \sum_{l\in L}{{FP}_l} = \sum_{l\in L}{{TP}_l} + \sum_{l\in L}{{FN}_l} \\ \sum_{l\in L}{{FP}_l} = \sum_{l\in L}{{FN}_l} $$

So it all boils down to whether or not the sum of each label's false positives is always equal to the sum of each label's false negatives. Let's say that you have the following row in the first table:

predtrue
AB

This is an error, but is it a false positive or a false negative? From the perspective of label A it's a false positive, but from the perspective of label B it's a false negative. So for every error in the table, it will be both a false positive for the label in the prediction column and a false negative for the label in the true column. This means that the number of false positives and false negatives will always be balanced, as with every false positive for one label there will be a false negative for another label.

And with this we have proven that the number of false negatives will always equal the number of false positives, which further proves that the precision will always be equal to the recall, which further proves that the F1 score will always be equal to them as well.