It's alive!

Finally, I ran the full cycle of training and applying my EGBDT model in JupyterLab.

I spent two days in a very unpleasant debug session because I broke a simple rule:

Always do EDA!

EDA—Exploratory Data Analysis—is simple: before you do anything with your data, get a taste of it. Check the mean of the target and features. Take a small sample and read its raw dump. Plot histograms of your factors. Do smoke tests.
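Here is a minimal sketch of that kind of first look, using only the Python standard library. The `quick_eda` helper and the toy `target` column are hypothetical names made up for illustration, not part of my actual pipeline; in a real notebook you would probably reach for your usual dataframe and plotting tools instead.

```python
import math
import random
import statistics


def quick_eda(name, values, bins=10, width=40):
    """Print basic statistics and a text histogram for one numeric column."""
    lo, hi = min(values), max(values)
    summary = {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "min": lo,
        "max": hi,
    }
    print(name, {k: round(v, 4) for k, v in summary.items()})

    # Text histogram: count how many values fall into each equal-width bin.
    span = (hi - lo) or 1.0  # guard against a constant column
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left = lo + span * i / bins
        print(f"{left:8.3f} | {'#' * round(width * c / peak)}")
    return summary


# Hypothetical toy data (an assumption for this sketch): a sinusoidal target.
t = [i / 1000 for i in range(1000)]
target = [math.sin(50 * x) for x in t]

# Read a small raw sample before trusting any aggregate.
print("raw sample:", [round(v, 3) for v in random.Random(0).sample(target, 5)])
stats = quick_eda("target", target)
```

Running the same helper over every feature takes seconds and is exactly the smoke test I skipped.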

Instead, I just downloaded the dataset and jumped straight into training. The best I saw was 0.2 MSE on train and 0.3 on test. I started suspecting deep, fundamental problems—some math interfering with my plans.

Then a very simple thought: plot the graphs. Nothing extraordinary—just a basis-function factor over time.

It turned out my iron friend used sin(𝑡) instead of sin(50𝑡). I was trying to approximate a high-frequency signal with a low-frequency one.

Fixing that made the MSE zero. On the first iteration.
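The effect is easy to reproduce on a toy problem. The sketch below is not my EGBDT model; it assumes the simplest possible setup, a one-coefficient least-squares fit of a sin(50t) target, and the `fit_mse` helper and the time grid are made up for illustration. It compares the error with the wrong basis, sin(t), against the right one, sin(50t).

```python
import math


def fit_mse(basis, y):
    """Least-squares fit y ~ a * basis for a single coefficient a.

    Returns the fitted coefficient and the mean squared error.
    """
    a = sum(b * v for b, v in zip(basis, y)) / sum(b * b for b in basis)
    mse = sum((v - a * b) ** 2 for b, v in zip(basis, y)) / len(y)
    return a, mse


# Assumed time grid for the demo: 10,000 points on [0, 10).
t = [i / 1000 for i in range(10_000)]
y = [math.sin(50 * x) for x in t]  # the high-frequency target

# Wrong basis: a low-frequency sine cannot follow the target.
a_wrong, mse_wrong = fit_mse([math.sin(x) for x in t], y)

# Right basis: the target is exactly one multiple of the basis function.
a_right, mse_right = fit_mse([math.sin(50 * x) for x in t], y)

print(f"sin(t)   basis: a = {a_wrong:.4f}, MSE = {mse_wrong:.4f}")
print(f"sin(50t) basis: a = {a_right:.4f}, MSE = {mse_right:.2e}")
```

With the wrong basis the fitted coefficient stays near zero and the MSE stays far from zero no matter how long you train; with the right one the error vanishes immediately. Plotting either basis function against the target over time would have shown the mismatch at a glance.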

Incredible, and incredibly unsatisfying: two days lost to something so simple, all because I skipped EDA at the start.