Leveraging Machine Learning to Detect Fraud: Tips to Developing a Winning Kaggle Solution

Originally published at: Leveraging Machine Learning to Detect Fraud: Tips to Developing a Winning Kaggle Solution | NVIDIA Developer Blog

Kaggle is an online community that allows data scientists and machine learning engineers to find and publish data sets, learn, explore, build models, and collaborate with their peers. Members also enter competitions to solve data science challenges. Kaggle members progress through the following performance tiers based on their achievements: Novice, Contributor, Expert, Master, and Grandmaster. The quality and…

The secret to a high-scoring model in this competition was feature engineering. The features that made the difference for the winning team were new columns created from group aggregations of other columns. Group aggregations parallelize naturally, so they run much faster on a GPU than on a CPU. Chris created a notebook containing the XGBoost model from the 1st place solution, converted to use RAPIDS cuDF. Reading one million rows and creating 262 features took 5 minutes on the CPU with pandas. The same work on the GPU with RAPIDS cuDF took 20 seconds, 15x faster!
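
For readers who have not built group-aggregation features before, here is a minimal sketch of the idea in pandas. It is not the winning team's code: the column names `card_id` and `amount`, the helper `add_group_aggregates`, and the choice of mean and standard deviation are all assumptions made for illustration.

```python
import pandas as pd


def add_group_aggregates(df, group_col, value_col, stats=("mean", "std")):
    """Add one new column per statistic of value_col computed within each group.

    Hypothetical helper for illustration; not taken from the original notebook.
    """
    # One row per group, one column per requested statistic.
    agg = df.groupby(group_col)[value_col].agg(list(stats))
    agg.columns = [f"{value_col}_{stat}_by_{group_col}" for stat in stats]
    # Broadcast the per-group statistics back onto every original row.
    return df.merge(agg.reset_index(), on=group_col, how="left")


# Toy data with made-up column names.
transactions = pd.DataFrame(
    {
        "card_id": [1, 1, 2, 2, 2],
        "amount": [10.0, 30.0, 5.0, 5.0, 20.0],
    }
)

features = add_group_aggregates(transactions, "card_id", "amount")
print(features)
```

Because cuDF mirrors much of the pandas API, replacing the import with `import cudf as pd` would be the natural first thing to try for running this on a GPU. That swap is untested here, and row order after the merge may differ from pandas.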