Does TensorRT support conv3d with Tensor Cores?

I have a model with 3D operations; the main layer is conv3d. With nvprof, 90% of the inference time is spent in cudnn::detail::implicit_convolveND_sgemm<float …>. After switching to FP16 there is little performance improvement; the main inference time is now in cudnn::detail::implicit_convolveND_sgemm<__half …>.

TensorRT version is 7, on an RTX 2080 Ti. Any suggestions? Thanks.

Any reply or advice? Thanks.

I have the same question.

@2701018719, any advice, or an explanation why?


Yes, TRT supports 3D conv layers. Speed depends on many parameters, such as the GPU type.
Kernel selection depends on the layer parameters; we have fast kernels for some commonly used configurations, such as a 3×3×3 filter size.
Other configurations fall back to a general default kernel implementation, which might be slow.

Please refer to the link below:

Also, could you please try using the latest CUDA/cuDNN/TRT versions?


Hi, thank you for your reply.

I am already using the latest CUDA/cuDNN/TRT versions: CUDA 10.2, cuDNN 7.6.5, TRT

I analyzed the network time with nvprof again. The time is mainly concentrated in conv3d layers with a 3×3×3 filter size, 32 groups, stride 1 or 2, and padding 1. The input and output shapes are both 128×8×28×28 (for example). Each such layer runs implicit_convolveND_sgemm 32 times with FP16, and my model contains many such layers (33).
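For reference, here is a minimal sketch (plain Python; the helper name is mine) of the standard conv output-size arithmetic for that layer. It confirms that a 3×3×3 filter with padding 1 preserves the 28×28 spatial size at stride 1 and halves it at stride 2:

```python
# Hypothetical helper: standard conv output-size formula,
# applied independently per spatial dimension (depth, height, width).
def conv_out_dim(size, kernel=3, stride=1, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

# 3x3x3 filter, padding 1, stride 1: spatial size is preserved.
print(conv_out_dim(28, stride=1))  # 28
# Same filter at stride 2: spatial size is halved.
print(conv_out_dim(28, stride=2))  # 14
```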

I tested the time consumption of part of the network: 2.5 ms with FP32, 2.8 ms with FP16.

Any advice? Thanks.

A kernel specific to 3D group conv is currently not supported in TRT 7.
In TRT 7 we split the group conv and call a kernel for each group. Since you have 32 groups, the conv runs 32 times, which might be what is causing the performance drop.
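That splitting can be illustrated with a small sketch (plain Python; a 1D conv stands in for conv3d for brevity, and all names are mine, not TRT internals): without a group-specific kernel, a grouped conv becomes one dense conv call per channel slice, so kernel launches scale with the group count.

```python
def dense_conv1d(x, w):
    """Dense 1D convolution, stride 1, no padding.
    x: [c_in][length], w: [c_out][c_in][k] -> [c_out][length - k + 1]."""
    c_in, k, length = len(w[0]), len(w[0][0]), len(x[0])
    return [[sum(w[co][ci][t] * x[ci][i + t]
                 for ci in range(c_in) for t in range(k))
             for i in range(length - k + 1)]
            for co in range(len(w))]

def grouped_conv1d(x, w, groups):
    """Grouped conv emulated the way the reply describes TRT 7 handling it:
    slice the channels and launch one dense conv per group."""
    cin_g, cout_g = len(x) // groups, len(w) // groups
    out, launches = [], 0
    for g in range(groups):
        out += dense_conv1d(x[g * cin_g:(g + 1) * cin_g],
                            w[g * cout_g:(g + 1) * cout_g])
        launches += 1  # one kernel launch per group
    return out, launches

# 2 input channels, 2 groups -> each group convolves one channel.
x = [[1, 2, 3], [4, 5, 6]]
w = [[[1, 1]],   # group 0: sums adjacent elements
     [[1, 0]]]   # group 1: copies the left element
out, launches = grouped_conv1d(x, w, groups=2)
print(out)       # [[3, 5], [4, 5]]
print(launches)  # 2: one launch per group, i.e. 32 launches for ngroups=32
```

This matches the nvprof observation above: with 32 groups, implicit_convolveND_sgemm appears 32 times per layer.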


Hi, thank you for your explanation.

I tested 3D group conv with cuDNN 7.6.5. With nvprof, the kernel implicit_convolveND_sgemm still runs 32 times. Does cuDNN support a kernel specific to 3D group conv?

A kernel specific to 3D group conv is currently not supported in cuDNN either.


OK, I got it. Thanks.