Novel Transformer Model Achieves State-of-the-Art Benchmarks in 3D Medical Image Analysis

Originally published at:

The NVIDIA Swin UNETR model is the first attempt at large-scale transformer-based self-supervised learning in 3D medical imaging.

I am trying to train on custom data using the Colab notebook provided in the GitHub repository, i.e. "swin_unetr_btcv_segmentation_3d.ipynb". I am getting a CUDA out-of-memory error. How much CUDA memory does the Swin UNETR model require? My patch size is (96, 96, 32), number of features = 48, input channels = 1, output channels = 2.

Transformers throwing VNet and UNet under the bus?

Thanks for your interest in this work. We tested the memory usage locally with a 3D volume of (96, 96, 32), batch size = 1, sample_num = , and it takes around 9 GB of memory.
However, the memory usage also depends on how large the validation or test data are. As a baseline, training and validation should fit on an 11 GB GPU with batch size = 1 and sample_num = 1.
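
To make the role of batch size and sample_num more concrete, here is a rough sketch in plain Python. It assumes that sample_num is the number of random patches cropped from each volume, so that each training step processes batch size × sample_num patches; that reading is an assumption, not something confirmed in this thread. The function name `input_batch_bytes` is hypothetical. It only counts the float32 input tensor, not the model weights or activations, which make up most of the ~9 GB reported above.

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def input_batch_bytes(patch_size, in_channels, batch_size, sample_num,
                      bytes_per_value=BYTES_PER_FLOAT32):
    """Return (patches_per_step, input_tensor_bytes) for one training step.

    Assumption: each volume contributes `sample_num` cropped patches, so one
    step sees batch_size * sample_num patches. Only the input tensor is
    counted; weights, activations and gradients are not included.
    """
    patches_per_step = batch_size * sample_num
    voxels_per_patch = prod(patch_size)
    total_bytes = (patches_per_step * in_channels
                   * voxels_per_patch * bytes_per_value)
    return patches_per_step, total_bytes


if __name__ == "__main__":
    # Patch size and channel count taken from the question above.
    for sample_num in (1, 2, 4):
        patches, size = input_batch_bytes((96, 96, 32), in_channels=1,
                                          batch_size=1, sample_num=sample_num)
        print(f"sample_num={sample_num}: {patches} patch(es), "
              f"{size / 2**20:.3f} MiB of input")
```

The input itself is tiny (about 1.1 MiB per patch), so an out-of-memory error at this patch size is more likely driven by how many patches are processed per step, or by the size of the validation volumes, than by the raw data. Lowering sample_num or batch size to 1 is the first thing to try.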