cuDNN enabled increases runtime by 25%

Hi All,
Environments tested:
torch 1.11, cuda 11.3, cudnn 8200 (conda),
AND
torch 1.13, cu11.7, cudnn 8500 (conda),
AND
pytorch-nvidia-docker 22.07
All on Ubuntu 20.04.

Running a torch model in eval mode with cuDNN enabled consistently increases runtime by about 25%.

cuDNN, the CUDA Deep Neural Network library, is a specialized library that is supposed to speed up deep-learning workloads on CUDA GPUs, not slow them down.

I’ve attached the torch profiler reports with and without cuDNN.

nvidia-issue.txt (5.6 KB)


Hi @valtmaneddy ,
Can you please try the NHWC format, i.e. channels_last (see (beta) Channels Last Memory Format in PyTorch — PyTorch Tutorials 2.0.1+cu117 documentation)? cuDNN should be quite a bit faster with it than without.
Please let us know if the issue still persists.
Thanks

Hi @AakankshaS ,
I’ve tested with torch 2.0.1+cu117 as you asked.
In torch 2.0.1, cuDNN with channels_last is an improvement over no cuDNN.
In torch 1.13, no cuDNN is an improvement over cuDNN.
The best result is torch 2.x with cuDNN and channels_last.

torch 1.13, cu11.6, cudnn 8302:
cuDNN disabled, channels_last no: 28.96 ms
cuDNN disabled, channels_last yes: 30.55 ms
cuDNN enabled, channels_last no: 30.43 ms
cuDNN enabled, channels_last yes: 29.91 ms

torch 2.0, cu11.7, cudnn 8500:
cuDNN disabled, channels_last no: 29.41 ms
cuDNN disabled, channels_last yes: 31.59 ms
cuDNN enabled, channels_last no: 27.84 ms
cuDNN enabled, channels_last yes: 27.00 ms

Is there a way to get cuDNN to improve runtime with torch < 2?