My machine has 4 x V100 16GB cards.
I first run a lot of cuDNN operations on very large tensors, each close to 2 GB in size.
At the end I call cudaDeviceSynchronize. It waits for a very long time and then returns a failure message like this:
Cuda failure: 39
I cannot find this failure code in the documentation or on the internet. Can anyone help?
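
For reference, here is a minimal sketch of how the raw number could be turned into a readable name with the CUDA runtime calls cudaGetErrorName and cudaGetErrorString. The helper describe_cuda_error is a hypothetical name introduced here for illustration and is not part of my real program. I am also assuming the printed number is a cudaError_t value from the runtime API; the numeric values may differ between toolkit versions, so the lookup should be built against the same toolkit as the failing program.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper (illustration only): map a raw numeric runtime
// error code to "name: description" using the CUDA runtime lookups.
static std::string describe_cuda_error(int code) {
    cudaError_t err = static_cast<cudaError_t>(code);
    return std::string(cudaGetErrorName(err)) + ": " + cudaGetErrorString(err);
}

int main() {
    // Synchronize and report the status as text, not only as a number.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "Cuda failure %d (%s)\n",
                     static_cast<int>(err), describe_cuda_error(err).c_str());
        return 1;
    }

    // Decode the code from the message above. The meaning depends on the
    // installed toolkit version, so no expected output is claimed here.
    std::printf("code 39 -> %s\n", describe_cuda_error(39).c_str());
    return 0;
}
```

Built with something like nvcc decode.cu -o decode (the file name is arbitrary), this prints whatever name the installed toolkit assigns to code 39.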