Originally published at: The Kaggle Grandmasters Playbook: 7 Battle-Tested Modeling Techniques for Tabular Data | NVIDIA Technical Blog
Over hundreds of Kaggle competitions, we’ve refined a playbook that consistently lands us near the top of the leaderboard, whether we’re working with millions of rows, missing values, or test sets that behave nothing like the training data. This isn’t just a collection of modeling tricks; it’s a repeatable system for solving real-world tabular problems…
- Pseudo-labels can also be used for pretraining. Fine-tune on the original labeled data as a last step to reduce the noise introduced earlier.
Does this mean the following pipeline? Train on the labeled data => run pseudo-labeling to get labels for the unlabeled samples => retrain on the larger combined dataset => fine-tune only on the originally labeled data.
If that’s so, wouldn’t the model trained in step 3 already contain all the information, so that fine-tuning it won’t add much value?
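For concreteness, here is a minimal sketch of that four-step reading of the pipeline. It assumes an XGBoost classifier and synthetic data; the 0.9 confidence threshold, estimator counts, and learning rates are illustrative placeholders, not values from the article:

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in: a small labeled set plus a larger "unlabeled" pool.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, train_size=0.2, random_state=0)

# Step 1: train on the original labeled data only.
base = xgb.XGBClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
base.fit(X_lab, y_lab)

# Step 2: pseudo-label the unlabeled pool; keeping only confident
# predictions limits the noise that step 4 later tries to wash out.
proba = base.predict_proba(X_unlab)
mask = proba.max(axis=1) > 0.9
X_pseudo = X_unlab[mask]
y_pseudo = proba[mask].argmax(axis=1)

# Step 3: "pretrain" on the labeled + pseudo-labeled data combined.
pretrained = xgb.XGBClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
pretrained.fit(np.vstack([X_lab, X_pseudo]),
               np.concatenate([y_lab, y_pseudo]))

# Step 4: fine-tune on the clean original labels only, continuing boosting
# from the pretrained model (xgb_model=...) with a smaller learning rate.
finetuned = xgb.XGBClassifier(n_estimators=50, learning_rate=0.05, random_state=0)
finetuned.fit(X_lab, y_lab, xgb_model=pretrained.get_booster())
```

Under this reading, step 4 can still add value: the final boosting rounds fit residuals computed against the clean labels only, so they can correct errors the model absorbed from wrong pseudo-labels in step 3, which matches the article’s framing of fine-tuning as a way to “reduce noise introduced earlier.”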