Stochastic AUROC
The five previous posts were preparation for this one. We built an age model that uses only the age factor. The model produces a score with low cardinality: it takes only three values, -1, 0, and 1. Then we plotted the ROC and calculated the AUROC.
The question that has bugged me for eleven years, ever since I first heard the ROC and AUROC definitions at the School of Data Analysis, is: "How come there are only three points on the ROC when, by definition, there should be at least 891?" The magic number 891 is the number of passengers in the Titanic dataset, and three is the number of distinct model answers.
Let's start from first principles. What does the ROC approach do? It assigns a score to each object and ranks the objects by that score. But if the only values are -1, 0, and 1, the ranking looks more like three unorganized crowds. The nature of these crowds shows itself when we apply our algorithm: "move from the highest scores toward the lowest". When all scores are different, the order is strictly determined. When a group of passengers shares the same score, we can only pick randomly from that group. The target value of each passenger guides our path over the TPR-FPR plane: a survivor moves us up, a deceased passenger moves us right. So now we have a random walk.
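The walk described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `roc_walk` and `auroc` and the ten-passenger toy data are hypothetical, not the age model or the Titanic data from the previous posts. Ties are broken by attaching a random number to every score before sorting.

```python
import random

def roc_walk(scores, targets, rng):
    """Return the ROC path as (fp, tp) counts, breaking score ties randomly."""
    # Sort by score, highest first; the random second key shuffles
    # passengers that share the same score.
    order = sorted(range(len(scores)),
                   key=lambda i: (-scores[i], rng.random()))
    fp, tp = 0, 0
    path = [(fp, tp)]
    for i in order:
        if targets[i] == 1:
            tp += 1  # a survivor: one step up
        else:
            fp += 1  # a deceased passenger: one step right
        path.append((fp, tp))
    return path

def auroc(path):
    """Area under a step path, normalized by positives * negatives."""
    n_neg, n_pos = path[-1]
    area = 0
    for (x0, y0), (x1, _) in zip(path, path[1:]):
        area += (x1 - x0) * y0  # only horizontal steps add area
    return area / (n_pos * n_neg)

# Hypothetical toy data: three score values, ten passengers.
scores  = [1, 1, 1, 0, 0, 0, 0, -1, -1, -1]
targets = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]

for seed in range(3):
    path = roc_walk(scores, targets, random.Random(seed))
    print(seed, auroc(path))
```

Each seed gives a different trajectory and, in general, a different AUROC, even though the model's scores never change.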
Let's check the picture. Blue line: we were EXTREMELY lucky and picked passengers from each crowd in the right order, survived first. Orange line: the opposite extreme, deceased first. The lines close to the diagonal of each rectangle are random trajectories.
Besides the start and the end, there are two points where all lines intersect. Their coordinates are determined by the numbers of survived and deceased passengers in each group. Each group spans a rectangle between two consecutive intersection points: the number of survived passengers in the group is the height of the rectangle, and the number of deceased passengers is its width.
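To make the rectangles concrete, here is a small sketch that computes the shared corner points and the AUROC of the two extreme trajectories directly from the group counts. The function name `tie_summary` and the toy data are my own assumptions for illustration. The midpoint is included because, under uniformly random tie-breaking, each tied survivor-deceased pair lands in the right order with probability 1/2, so the expected AUROC is the average of the two extremes.

```python
from collections import Counter

def tie_summary(scores, targets):
    """Corner points shared by all walks, plus worst / mean / best AUROC."""
    pos = Counter(s for s, t in zip(scores, targets) if t == 1)
    neg = Counter(s for s, t in zip(scores, targets) if t == 0)
    fp, tp = 0, 0
    corners = []
    worst = best = 0
    for s in sorted(set(scores), reverse=True):
        p, n = pos[s], neg[s]   # rectangle height and width
        worst += n * tp         # deceased first: move right at the old height
        best += n * (tp + p)    # survived first: move right at the new height
        fp, tp = fp + n, tp + p
        corners.append((fp, tp))
    total = fp * tp             # all survivor-deceased pairs
    return corners, worst / total, (worst + best) / (2 * total), best / total

# Hypothetical toy data: three score values, ten passengers.
scores  = [1, 1, 1, 0, 0, 0, 0, -1, -1, -1]
targets = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]

corners, worst, mean, best = tie_summary(scores, targets)
print(corners)            # [(1, 2), (3, 4), (5, 5)]
print(worst, mean, best)  # 0.48 0.64 0.8
```

The last corner is simply the end of the curve; the first two are the interior points through which every trajectory passes.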
Phew. Here we are. I hope it is clear now that when the model produces only a small number of distinct values, we don't have enough information to rank the objects (passengers) deterministically. The sequence that guides the ROC becomes stochastic, and the ROC becomes a random walk trajectory.
I think it is important because:
👉 This approach closes a serious gap in understanding ROC properties for a wide class of applications. For example, suppose we have a binary feature and a binary target. What is the ROC in this case? Now we know.
👉 The cardinality of the model's output is a property which is frequently ignored. But it can be either accidentally blown up by noisy calculations, or reduced by aggressive quantization. We have to be aware of these effects.
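A quick way to stay aware of the second point is to count the distinct score values before plotting anything. The sketch below uses synthetic scores only; the helper name `cardinality`, the noise level, and the rounding step are assumptions chosen for illustration.

```python
import random

def cardinality(scores):
    """Number of distinct values in the model output."""
    return len(set(scores))

rng = random.Random(0)

# Three-valued scores, 891 of them, like the age model's output shape.
base = [-1, 0, 1] * 297

# Tiny numerical noise makes the tied scores look distinct.
noisy = [s + rng.gauss(0, 1e-9) for s in base]

# Aggressive quantization collapses a continuous score into a few levels.
raw = [rng.random() for _ in range(891)]
quantized = [round(s, 1) for s in raw]

print(cardinality(base), cardinality(noisy))
print(cardinality(raw), cardinality(quantized))
```

If the count is far below the number of objects, the ROC is a family of random walks rather than a single curve; if it is suspiciously close to the number of objects, it is worth checking whether noise is hiding real ties.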
