A100 nsight compute profiling error "cuDNN error: CUDNN_STATUS_INTERNAL_ERROR"

sunflower71 · September 5, 2021, 6:19am

I want to profile vgg.py on an A100 GPU with the Nsight Compute CLI. The command I used is:
sudo /usr/local/cuda-11.0/nsight-compute-2020.1.2/target/linux-desktop-glibc_2_11_3-x64/ncu --export "temp_result" --force-overwrite --target-processes all --kernel-regex-base function --launch-skip-before-match 0 --sampling-interval auto --sampling-buffer-size 33554432 --cache-control all --clock-control base --apply-rules yes --metrics smsp__sass_data_bytes_mem_shared --page details --csv /opt/conda/bin/python /home/vgg.py

The error is:
Traceback (most recent call last):
  File "/home/vgg.py", line 161, in <module>
    main()
  File "/home/vgg.py", line 75, in main
    _ = model(dummy_input_batch)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/resnet.py", line 249, in forward
    return self._forward_impl(x)
  File "/opt/conda/lib/python3.7/site-packages/torchvision/models/resnet.py", line 232, in _forward_impl
    x = self.conv1(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 3, 224, 224], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(3, 64, kernel_size=[7, 7], padding=[3, 3], stride=[2, 2], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [3, 3, 0]
    stride = [2, 2, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x5560d2a70980
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 1, 3, 224, 224,
    strideA = 150528, 50176, 224, 1,
output: TensorDescriptor 0x5560ce2d4c00
    type = CUDNN_DATA_FLOAT
    nbDims = 4
    dimA = 1, 64, 112, 112,
    strideA = 802816, 12544, 112, 1,
weight: FilterDescriptor 0x5560d154efa0
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 64, 3, 7, 7,
Pointer addresses:
    input: 0x7f783d400000
    output: 0x7f7841660000
    weight: 0x7f7840600000
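
The descriptors in this dump can be sanity-checked without a GPU. The sketch below recomputes the output size from the listed padding, stride, dilation and 7x7 kernel using the standard convolution output-size formula, and recomputes the strides assuming densely packed NCHW tensors. The helper names are hypothetical. Agreement only shows that the dump is self-consistent; it says nothing about what caused the error.

```python
def conv_out_size(size, kernel, padding, stride, dilation):
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def contiguous_strides(dims):
    """Element strides of a densely packed row-major tensor (NCHW order here)."""
    strides = []
    step = 1
    for d in reversed(dims):
        strides.append(step)
        step *= d
    return list(reversed(strides))


# Values taken from the ConvolutionParams and descriptor dump.
in_dims = [1, 3, 224, 224]
h_out = conv_out_size(224, kernel=7, padding=3, stride=2, dilation=1)
out_dims = [1, 64, h_out, h_out]

print(out_dims)                      # compare with output dimA
print(contiguous_strides(in_dims))   # compare with input strideA
print(contiguous_strides(out_dims))  # compare with output strideA
```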

felix_dt · September 13, 2021, 10:40am

I recommend trying this again with the latest available Nsight Compute version, which has many bug fixes and new features. You don’t need to use the version shipped with your CUDA 11.0 toolkit, even when building your apps with this toolkit. You can download the latest standalone version from https://developer.nvidia.com/nsight-compute
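
Before re-running, it can help to confirm which `ncu` binary is actually picked up and what version it reports. This is a minimal sketch: `ncu_version` is a hypothetical helper, and it assumes `ncu` is either on `PATH` or passed as a full path.

```python
import shutil
import subprocess


def ncu_version(executable="ncu"):
    """Return the text printed by `<executable> --version`, or None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Return whichever stream carries the version banner.
    return (result.stdout or result.stderr).strip()


print(ncu_version())
```

Passing the full path of the toolkit-bundled binary and then that of a standalone install makes it easy to compare the two side by side.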

sunflower71 · September 20, 2021, 4:42am

Thanks for the help!
