Convolution performance question on Volta in a Windows environment

Greg118 · June 13, 2018, 3:58am

I was testing cudnnConvolutionForward on various GPUs (two of which are shown below), with the goal of seeing the performance difference between the GV100, with and without tensor cores enabled, and Pascal GPUs. To keep the test simple I used a single convolution, and to make the differences clearer I ran it on an 8k image. When comparing times, the Titan X and P5000 were faster than the GV100.

I tried to look closer with Visual Profiler’s kernel analysis and ran into “Internal Profiling error” on the GV100. Then I also tried profiling in Visual Studio with the Nsight profiler and it failed to capture any kernels. Are some features still in development for Volta?
I did see that cuDNN uses a 128x128 relu kernel on Volta and a 128x32 relu kernel on Pascal.

Tested the project in Visual Studio 15 & 17, built for compute_61,sm_61;compute_70,sm_70, with CUDA 9.2 (with patch) and Nsight 5.6.

Trying to profile with GV100 (Convolution time ~31ms)
tve — ImgBB

Profiling with a Titan X (Pascal), but it used a Maxwell cuDNN function (Convolution time ~16ms)
txo — ImgBB
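To put the two timings on a common scale, here is a rough sketch that converts a measured convolution time into effective throughput. The layer shape is hypothetical (a 7680x4320 "8k" image, 3 input channels, 64 filters of 3x3, padding 1), not the configuration from my test, so only the method carries over. The FLOP count assumes a direct convolution with each multiply-add counted as 2 operations; Winograd or FFT algorithms do less actual arithmetic, so treat the result as an effective figure.

```python
def conv_forward_flops(n, c, h, w, k, r, s, stride=1, pad=0):
    """FLOPs of a direct 2D convolution (one multiply-add counted as 2)."""
    h_out = (h + 2 * pad - r) // stride + 1
    w_out = (w + 2 * pad - s) // stride + 1
    return 2 * n * k * c * r * s * h_out * w_out


def achieved_tflops(flops, seconds):
    """Effective throughput in TFLOP/s for a measured run time."""
    return flops / seconds / 1e12


# Hypothetical layer shape: NOT the configuration used in the test above.
flops = conv_forward_flops(n=1, c=3, h=4320, w=7680, k=64, r=3, s=3, pad=1)

# Measured convolution times from the two profiling runs above.
timings_ms = {"GV100": 31.0, "Titan X (Pascal)": 16.0}

for name, ms in timings_ms.items():
    print(f"{name}: {achieved_tflops(flops, ms / 1e3):.2f} effective TFLOP/s")
```

Plugging in the real layer dimensions and comparing the result against each card's peak rate should show how far each run is from what the hardware can do.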

I also ran NVIDIA’s “conv_sample” from the Linux sample code (on Windows) and saw similar results.

I’m unsure why the GV100 is slower at convolution than the older cards I have. I was thinking about moving it to a Linux machine to test there, but thought I would post my question before running more tests.

I was able to see and measure the expected performance differences with matrix multiplication on Volta, but haven’t seen results that make sense for convolution with cuDNN.

Does anyone have insight on any of this?

Thank you

Update (6/27):
Nsight Visual Studio Edition 5.6, which supports Volta, was released May 31, 2018, about half a year after the first Volta desktop card was released on December 7, 2017. I was using Nsight version 5.6.0.18099.

feisuzhu · December 19, 2019, 10:14am

Experiencing this too…

We have a loss function that runs in 5s on a Tesla V100 and 2s on a Titan Xp. This is frustrating.
