cuDNN enabled increases runtime by 25%

Hi All,
Environments tested:
torch 1.11, cuda 11.3, cudnn 8200 (conda),
AND
torch 1.13, cu11.7, cudnn 8500 (conda),
AND
pytorch-nvidia-docker 22.07
All on Ubuntu 20.04.

Running a torch model in eval mode with cuDNN enabled consistently increases runtime by about 25%.

cuDNN, the CUDA Deep Neural Network library, is a specialized library that is supposed to speed up deep-learning workloads on CUDA GPUs, not slow them down.

I’ve attached the torch profiler reports with and without cuDNN.

nvidia-issue.txt (5.6 KB)


Hi @valtmaneddy ,
Can you please try the NHWC format, i.e. channels_last (see (beta) Channels Last Memory Format in PyTorch — PyTorch Tutorials 2.0.1+cu117 documentation)? cuDNN should be quite a bit faster with it than without.
Please let us know if the issue still persists.
Thanks

Hi @AakankshaS ,
I’ve tested with torch 2.0.1+cu117 as you asked.
In torch 2.0.1, cuDNN with channels_last is an improvement over no cuDNN.
In torch 1.13, no cuDNN is an improvement over cuDNN.
The best result is torch 2.x with cuDNN and channels_last.

torch 1.13, cu11.6, cudnn 8302:
cuDNN disabled, channels_last no: 28.96 ms
cuDNN disabled, channels_last yes: 30.55 ms
cuDNN enabled, channels_last no: 30.43 ms
cuDNN enabled, channels_last yes: 29.91 ms

torch 2.0, cu11.7, cudnn 8500:
cuDNN disabled, channels_last no: 29.41 ms
cuDNN disabled, channels_last yes: 31.59 ms
cuDNN enabled, channels_last no: 27.84 ms
cuDNN enabled, channels_last yes: 27.00 ms

Is there a way to get cuDNN to improve runtime with torch < 2?