I have noticed that Nsight Compute on an RTX 3080 is unable to measure shared_efficiency (smsp__sass_average_data_bytes_per_wavefront_mem_shared.pct) for some PyTorch applications. I have created a case and put the files here.
I guess there is a problem with the current version of Nsight Compute, because profiling ends with this error:
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Some system information is shown below:
$ ls ~/nvidia
cudnn-11.2-linux-x64-v22.214.171.124.tgz  cuda_11.2.0_460.27.04_linux.run  nsight-compute-linux-2020.3.0.18-29307467.run
If there is a quick workaround for that, I would appreciate it.
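For anyone trying to reproduce this, the kind of invocation involved can be sketched as below. This is only an illustration, not the exact command from the case files: it assumes the `ncu` command-line front end of Nsight Compute with its `--metrics` option, and `train.py` is a hypothetical script name standing in for the PyTorch application.

```python
import shlex

# Metric name taken from the post above.
METRIC = "smsp__sass_average_data_bytes_per_wavefront_mem_shared.pct"


def build_ncu_command(script, metric=METRIC):
    """Return the argument list for profiling `script` with the ncu CLI.

    Assumes `ncu` and `python` are both on PATH.
    """
    return ["ncu", "--metrics", metric, "python", script]


# "train.py" is a hypothetical placeholder, not a file from the case.
cmd = build_ncu_command("train.py")

# Print the command so it can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can also be passed directly to `subprocess.run(cmd)` on a machine where Nsight Compute and PyTorch are installed.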