Getting "RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED"

When I disable cuDNN with torch.backends.cudnn.enabled = False, the model trains without a problem, but with cuDNN turned on, this error is thrown. So I assume this is not an out-of-memory problem but a problem with my environment: there was no problem when I ran it on an HPC cluster at my institute, but the error occurs on my local machine. Both runs use Docker containers, so perhaps the problem is the driver version of my host machine?
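One way to check the driver hypothesis is to run the same small diagnostic inside both containers and compare the output. The script below is only a sketch: the helper names `collect_env_report` and `host_driver_version` are made up for illustration, and it assumes `nvidia-smi` is visible inside the container (it usually is when the NVIDIA container runtime is used, but that is not guaranteed). If PyTorch or `nvidia-smi` is missing, the corresponding values are simply reported as None.

```python
import platform
import subprocess


def host_driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi is not installed, not executable, failed, or timed out.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def collect_env_report():
    """Collect version info that is worth comparing between two machines."""
    report = {
        "python": platform.python_version(),
        "driver": host_driver_version(),
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU builds).
    report["cuda_build"] = torch.version.cuda
    report["cudnn_available"] = torch.backends.cudnn.is_available()
    report["cudnn_enabled"] = torch.backends.cudnn.enabled
    if report["cudnn_available"]:
        report["cudnn_version"] = torch.backends.cudnn.version()
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If the image is identical on both machines, the PyTorch, CUDA and cuDNN lines should match, which would leave the host driver and the GPU model as the remaining differences to look at.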

Hi,

Please refer to the links below in case they are useful:


Thanks