Nsight cuDNN error with CNN but not normal NN

Hi all,

I’m troubleshooting a problem running nsys on a GPU with a simple CNN.

Confusingly, my code runs totally fine without nsys (i.e., just “python”), but when I run it under nsys, I keep running into cuDNN errors. Further, it seems to run fine when I switch from a CNN to a standard MLP.

Thanks in advance for any/all advice!

Things I’ve tried:

  • Switching from a CNN to a simple MLP: For some reason, this resolves the error
  • Upgrading nsys versions: No dice
  • Referencing the full python path in the nsys command
  • Running with sudo
  • Running with -E

Version information:
Torch version: 2.0.0+cu117
CUDA version: 11.7
Torchvision version: 0.15.1+cu117
cuDNN version: 8500
Nsight version: NVIDIA Nsight Systems version 2020.3.4.32-52657a0
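
For anyone trying to reproduce this, the PyTorch-side version numbers above can be gathered with a short script. This is only a minimal sketch: the helper name `collect_versions` is made up for illustration, and it assumes the values were read from `torch.__version__`, `torch.version.cuda`, `torch.backends.cudnn.version()`, and `torchvision.__version__`, which the post does not actually state.

```python
import importlib


def collect_versions():
    """Return a dict of version info; a value stays None when it is unavailable.

    Hypothetical helper for illustration, not part of PyTorch.
    """
    info = {"torch": None, "cuda": None, "cudnn": None, "torchvision": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this interpreter; nothing more to report.
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer such as 8500, if cuDNN is available.
    if torch.backends.cudnn.is_available():
        info["cudnn"] = torch.backends.cudnn.version()

    try:
        info["torchvision"] = importlib.import_module("torchvision").__version__
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Running the same script once with plain python and once under nsys may help confirm that both runs use the same interpreter and the same libraries.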

Command
nsys profile -w true -o my_profile python test_profile.py

Error
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

NVIDIA-smi
[image: nvidia-smi output]

Simple CNN

Are you in a position where you can update both your CUDA version (CUDA Toolkit) and your Nsight Systems version?

I have not seen this issue before, but it seems likely to be a cuDNN compatibility issue. You say it also happened with an updated nsys version. Are you sure the command was actually calling the newer version, and not still calling the older one even though a newer one was installed?
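
One way to check which nsys is actually being picked up is to resolve it on PATH and ask it for its version. The snippet below is a rough sketch rather than an official procedure: `resolve_nsys` is a hypothetical helper name, and it assumes nsys is launched by bare name from the same shell environment as the profiling command.

```python
import shutil
import subprocess


def resolve_nsys(executable="nsys"):
    """Return (path, version_text) for the first match on PATH, or (None, None).

    Hypothetical helper for illustration.
    """
    path = shutil.which(executable)
    if path is None:
        return None, None
    # `nsys --version` prints the installed Nsight Systems version string.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, (result.stdout.strip() or result.stderr.strip())


if __name__ == "__main__":
    path, version = resolve_nsys()
    if path is None:
        print("nsys was not found on PATH")
    else:
        print(f"nsys resolves to: {path}")
        print(version)
```

If the reported path or version is not the newly installed one, calling the new binary by its full path in the profile command would rule out a PATH mix-up. Note that sudo can use a different PATH than the normal user shell, so the result may differ between the two.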