Optimizing Fraud Detection in Financial Services with Graph Neural Networks and NVIDIA GPUs

Originally published at: https://developer.nvidia.com/blog/optimizing-fraud-detection-in-financial-services-with-graph-neural-networks-and-nvidia-gpus/

Learn an end-to-end workflow showcasing best practices for detecting financial services fraud using GNNs and GPUs.

What batch sizes were used while scaling from 1 to 8 GPUs on the MAG240M dataset?

We used a batch size of 8192, which gave us the best classification accuracy. We see similar speedups with lower batch sizes as well.

Hi!
Can you share the full end-to-end code for fraud detection (including R-GCN building, training and downstream XGBoost applying)?


Hi, I know I am a bit late to the topic, but I have some questions I was hoping you could answer.

So far, I have conducted preprocessing, and the dataset now contains 20 numerical features. As the article suggests, I have saved the bulk of the data on the edges between nodes, leaving the nodes featureless apart from their distinct IDs. Moreover, from my perspective, it would seem that these transactions have only one relationship, which is “Credit card purchases from Merchant”. Now, I have some questions regarding the article’s suggestions:

  1. The more I look into R-GCN (and GCN, for that matter), the more it seems that these models do not use edge features, but node features instead. As such, wouldn’t it be ineffective to conduct node embedding and node classification as the article suggests, since there is no information on the nodes themselves, and their IDs provide no information to detect fraudulence?
  2. Does R-GCN provide any significant advantage over GCN in this instance, given that there is only one type of relationship?
  3. I have also seen the article suggest using Link Prediction as part of the approach, but I do not understand how it helps with detecting fraudulent transactions.

I am having a pretty hard time understanding this article and its methods, and I would really appreciate some clarification.

Hi there nthqhai2002!

Thanks for reading this blog and for your awesome questions!

it would seem that these transactions only have one relationship, which is “Credit card purchases from Merchant”

In order to make the edge undirected and to allow message propagation between both node classes of the graph, we also add a second, reverse edge type, in your case “Merchant has purchase from Credit Card”.
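For illustration, here is a minimal sketch of that in DGL (the node/edge type names and tensors are hypothetical, not from the blog’s code):

```python
import torch
import dgl

# Hypothetical transaction edges: credit card i purchased from merchant j.
card_ids = torch.tensor([0, 1, 1, 2])
merchant_ids = torch.tensor([0, 0, 1, 1])

# Adding a reverse edge type lets messages flow merchant -> card as well,
# making the bipartite graph effectively undirected.
graph = dgl.heterograph({
    ("card", "purchases_from", "merchant"): (card_ids, merchant_ids),
    ("merchant", "has_purchase_from", "card"): (merchant_ids, card_ids),
})
```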

As such, wouldn’t it be ineffective to conduct node embeddings?

The IDs themselves are valuable as well in a transductive setting, as a learned user embedding encodes the generalized structural-behavioral profile of the user. You can also aggregate adjacent edge features per node to use as a node feature.
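As a rough sketch of that aggregation idea in DGL (feature names and sizes are made up for illustration):

```python
import torch
import dgl
import dgl.function as fn

# Hypothetical bipartite graph: card -> merchant transaction edges.
graph = dgl.heterograph({
    ("card", "purchases_from", "merchant"): (
        torch.tensor([0, 1, 1, 2]),
        torch.tensor([0, 0, 1, 1]),
    ),
})

# Made-up 20-dimensional feature vector on each transaction edge.
graph.edges["purchases_from"].data["feat"] = torch.randn(
    graph.num_edges("purchases_from"), 20
)

# Mean-pool each merchant's incident edge features into a node feature;
# fn.copy_e copies the edge data as the message.
graph.update_all(
    fn.copy_e("feat", "m"), fn.mean("m", "h"), etype="purchases_from"
)
merchant_feats = graph.nodes["merchant"].data["h"]  # (num_merchants, 20)
```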

You are correct that an architecture that propagates edge information would likely be useful here. The purpose of the blog post was mainly to show baseline usefulness, but if you’re interested in edge-inclusive papers, you can refer to “Exploiting Edge Features for Graph Neural Networks” (IEEE Xplore).

I have also seen the article suggest using Link Prediction as part of the approach, but I do not understand how it helps with detecting fraudulent transactions.

Link Prediction in this case is used to generate robust representations of nodes, which can be used downstream in the direct prediction of the fraud label. Often in the fraud detection domain, labels are noisy and generally weak. Training representations on non-noisy labels (transaction presence) often has more consistent convergence properties.
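To make that concrete, here is one common way to set up link-prediction pretraining (a generic sketch in plain PyTorch, assuming a single homogeneous embedding table; not the blog’s exact code): score node-embedding pairs for observed edges against randomly corrupted negatives.

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(embeddings, pos_src, pos_dst, num_nodes):
    # embeddings: (num_nodes, dim) node embeddings from the GNN encoder.
    # pos_src, pos_dst: endpoints of observed (positive) edges.

    # Score observed edges with a dot product of endpoint embeddings.
    pos_score = (embeddings[pos_src] * embeddings[pos_dst]).sum(dim=-1)

    # Negative sampling: corrupt the destination of each positive edge.
    neg_dst = torch.randint(0, num_nodes, pos_dst.shape)
    neg_score = (embeddings[pos_src] * embeddings[neg_dst]).sum(dim=-1)

    # "Edge exists" is the clean, non-noisy label the representations
    # are trained on, in contrast to weak fraud labels.
    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat(
        [torch.ones_like(pos_score), torch.zeros_like(neg_score)]
    )
    return F.binary_cross_entropy_with_logits(scores, labels)
```

The resulting embeddings can then be attached to the tabular data and passed to a downstream classifier such as XGBoost for the actual fraud label.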


Hi kkranen!
Thanks a lot for the response; your answers have helped me a lot!
I have another question regarding this workflow, if you do not mind. Suppose I have trained the model on data from 2015 until 2021, and a new influx of data from 2022 comes in. At this stage, I would follow the workflow to generate robust node embeddings, attach them to the 2022 tabular data according to the unique node IDs, and then make predictions. My question is: should the node embeddings for 2022 be generated by submitting the entire dataset from 2015 to 2022 to the workflow, or do I only need to use the 2022 data?


Hello,
Were you able to find the reproducible source code?