Possible bug in Nsight Compute; Unable to find a valid cuDNN algorithm

Hi
I have noticed that Nsight Compute on an RTX 3080 is unable to measure shared_efficiency (smsp__sass_average_data_bytes_per_wavefront_mem_shared.pct) for some PyTorch applications. I have created a test case and put the files here.

I suspect there is a problem with the current version of Nsight Compute, because profiling ends with this error:

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Some system information is shown below:

$ ls ~/nvidia
cudnn-11.2-linux-x64-v8.1.0.77.tgz
cuda_11.2.0_460.27.04_linux.run
nsight-compute-linux-2020.3.0.18-29307467.run
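The installer file names above show what was downloaded, not necessarily what PyTorch loads at run time. A minimal sketch for reporting the versions PyTorch actually sees is below. The helper name `collect_versions` is made up for illustration, and it assumes PyTorch is importable in the same environment used for profiling.

```python
def collect_versions():
    """Return the library versions visible to PyTorch (hypothetical helper)."""
    info = {}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    # Name of the first visible GPU, if any.
    info["gpu"] = torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Posting this output alongside the file listing makes it easier to tell whether the CUDA and cuDNN versions in use match the ones that were installed.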

If there is a quick workaround for this error, I would appreciate it.

I am unable to reproduce any issues with Nsight Compute 2020.3.0 when profiling this metric on a PyTorch ResNet-50 example. You will need to provide more details on how to reproduce the issue you are seeing, in particular the target application that is being used.

The example is here. I prepared the .py file by reducing the number of epochs to 1. You can download the zip file containing the script and input from here. Please see “nsight-cmd.txt” for the command. You may also want to clone the latest PyTorch from source as I did.
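For readers without the zip file, the general shape of a command that collects this metric can be sketched as follows. This is not the content of "nsight-cmd.txt"; the script name `main.py` and the helper `build_ncu_command` are placeholders, and it assumes the `ncu` command-line tool that ships with Nsight Compute 2020.x is on the PATH.

```python
import shlex

# Metric named in the original post (shared_efficiency).
METRIC = "smsp__sass_average_data_bytes_per_wavefront_mem_shared.pct"


def build_ncu_command(script, script_args=(), metrics=(METRIC,), output=None):
    """Build an ncu command line as a list of arguments (hypothetical helper)."""
    cmd = ["ncu", "--metrics", ",".join(metrics)]
    if output is not None:
        # -o writes the collected results to a report file.
        cmd += ["-o", output]
    cmd += ["python", script, *script_args]
    return cmd


if __name__ == "__main__":
    # "main.py" is a placeholder for the training script in the zip file.
    print(shlex.join(build_ncu_command("main.py")))
```

Running the printed command from the directory that contains the script and its input should be enough to check whether the cuDNN error appears only under the profiler.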