Digitization of objects

The simplest things are the hardest.

In this post, I want to start discussing the stochasticity of ROC. This is a very simple but very important post for me, because it gives me instruments to discuss complex ideas using very simple language.

We use machine learning to make decisions. And these decisions deal with real-world objects. A computer cannot operate with the objects directly, but it can operate with their descriptions. Features.

In our Titanic problem, we already know these features: sex, age, pclass, and so on.

A typical step is to organize these features into a table (X), where each row is an object (a person), and each column corresponds to one feature.

(Y) is the target we want to predict. In our case, it is survival. Let me be a little inconsistent here and draw happy and unhappy faces. I am not punk enough to draw deceased passengers with crosses instead of eyes.

To the chase. Instead of saying “let’s split the object table,” I want to use statements like this:

“We have a crowd with an average survival rate of 0.38. If we split it into male and female crowds, their survival rates are 0.19 and 0.74 respectively.”

Now, instead of splitting and sorting a table, we can tell people to form different crowds or line up into a rank. I think this is much easier to understand.