CUDA 11.1 vs CUDA 10.0 significant slowdown

Hello,
I recently upgraded PyTorch from 1.2 (CUDA 10.0) to the latest 1.9 (CUDA 11.1).
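For reference, a quick way to confirm which CUDA and cuDNN builds a given PyTorch install actually uses (a minimal sketch; the version strings in the comments are illustrative):

```python
import torch

print(torch.__version__)               # e.g. 1.9.0+cu111
print(torch.version.cuda)              # CUDA toolkit the wheel was built against, e.g. 11.1
print(torch.backends.cudnn.version())  # cuDNN build, which largely determines conv speed
print(torch.cuda.get_device_name(0))   # should report the RTX 2080 Ti
```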

However, on the same RTX 2080 Ti I observe a massive slowdown in inference (the code is unchanged). With the PyTorch autograd profiler I see the following (a sketch of the profiling setup follows the tables):

PyTorch 1.2, CUDA 10.0

Name          Self CPU %  Self CPU   CPU total %  CPU total  CPU time avg  CUDA total %  CUDA total  CUDA time avg  # of Calls
convolution   0.74%       386.450us  8.93%        4.690ms    72.161us      9.91%         9.902ms     152.342us     65
_convolution  1.57%       824.964us  8.20%        4.304ms    66.216us      9.60%         9.592ms     147.563us     65
conv2d        0.72%       377.362us  8.82%        4.632ms    75.928us      9.10%         9.086ms     148.953us     61

PyTorch 1.9, CUDA 11.1

Name                Self CPU %  Self CPU   CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
aten::convolution   1.33%       850.084us  17.73%       11.327ms   174.265us     270.592us  0.34%        26.337ms    405.189us      65
aten::_convolution  2.03%       1.295ms    16.40%       10.477ms   161.187us     321.695us  0.40%        26.067ms    401.026us      65
aten::conv2d        1.22%       781.662us  17.38%       11.105ms   182.043us     247.232us  0.31%        25.043ms    410.548us
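For context, the profiles above were produced along these lines (a minimal sketch, not the actual script; the model and input shape are placeholders, since the original code is not shown):

```python
import torch
import torchvision.models as models  # placeholder model; the real network is not shown

model = models.resnet18().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    model(x)  # warm-up pass so one-time setup (e.g. cuDNN algorithm selection) is not profiled

with torch.no_grad(), torch.autograd.profiler.profile(use_cuda=True) as prof:
    model(x)

# Aggregate per-op statistics, sorted by total CUDA time, as in the tables above.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```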

As the tables show, total CUDA time for convolution grows from about 9.9ms to about 26.3ms for the same 65 calls, i.e. a roughly 2.7x slowdown with the new CUDA build.

Looking for suggestions and answers,

Thank you,
Alex

Hi @alex.spivakovsky
Please note that this forum branch is dedicated to CUDA-GDB support. Your question might be more suitable for a different forum: