It's alive!
I finally ran the full cycle of training and applying my EGBDT model in JupyterLab.
I spent two days in a very unpleasant debugging session because I broke a simple rule:
Always do EDA!
EDA—Exploratory Data Analysis—is simple: before you do anything with your data, get a taste of it. Check the mean of the target and features. Take a small sample and read its raw dump. Plot histograms of your factors. Do smoke tests.
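As a rough illustration of those checks, here is a minimal sketch using NumPy. The `quick_eda` helper, the column names, and the synthetic data are all hypothetical and are not taken from my actual notebook; the time range of 0 to 1 is an assumption made only so the example runs.

```python
import numpy as np

def quick_eda(columns, n_rows=5, bins=10):
    """Print and return basic sanity checks for a dict of name -> 1-D array."""
    report = {}
    for name, values in columns.items():
        values = np.asarray(values, dtype=float)
        present = values[~np.isnan(values)]
        counts, _edges = np.histogram(present, bins=bins)
        report[name] = {
            "mean": float(present.mean()),
            "min": float(present.min()),
            "max": float(present.max()),
            "n_missing": int(np.isnan(values).sum()),
            "hist_counts": counts.tolist(),
        }
        # Raw dump of the first few rows, then the summary numbers.
        print(name, "head:", np.round(values[:n_rows], 3).tolist())
        print(name, "summary:", report[name])
    return report

# Hypothetical data: a time axis, one feature, and the target.
t = np.linspace(0.0, 1.0, 200)
report = quick_eda({"feature": np.sin(t), "target": np.sin(50 * t)})
```

Even without a plot, the summary hints at trouble in this toy setup: the feature never goes below zero, while the target swings across the whole range from -1 to 1.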
Instead, I just downloaded the dataset and jumped straight into training. The best I saw was 0.2 MSE on train and 0.3 on test. I started suspecting deep, fundamental problems—some math interfering with my plans.
Then a very simple thought occurred to me: plot the graphs. Nothing extraordinary, just a basis-function factor over time.
It turned out my iron friend used sin(𝑡) instead of sin(50𝑡). I was trying to approximate a high-frequency signal with a low-frequency one.
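The effect is easy to reproduce on toy data. The sketch below is not my EGBDT setup: it swaps in a plain least-squares fit as a stand-in model, and the time range, the sample count, and the `fit_mse` helper are assumptions made for illustration. It compares how well each basis function can explain a sin(50𝑡) target.

```python
import numpy as np

# Assumed time axis and target; the real dataset is not reproduced here.
t = np.linspace(0.0, 1.0, 500)
target = np.sin(50 * t)

def fit_mse(feature, target):
    """Fit target ~ a * feature + b by least squares and return the MSE."""
    X = np.column_stack([feature, np.ones_like(feature)])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    return float(np.mean(residual ** 2))

mse_wrong = fit_mse(np.sin(t), target)       # low-frequency basis
mse_right = fit_mse(np.sin(50 * t), target)  # matching basis

print(f"MSE with sin(t):   {mse_wrong:.4f}")
print(f"MSE with sin(50t): {mse_right:.2e}")
```

With the low-frequency basis the error stays close to the variance of the target, because a slowly varying curve cannot follow roughly eight full oscillations. With the matching basis the fit is exact up to floating-point error.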
Fixing that made the MSE zero. On the first iteration.
Incredible, and incredibly unsatisfying: two days lost to something as simple as skipping EDA at the start.
