TensorRT 7 conv3d is not running on Tensor Cores

maxim.godovicyn · September 6, 2021, 5:13pm

Description

Hello.
I’m trying to optimize a PyTorch network on a Jetson AGX Xavier 32GB with TensorRT, but I can’t get conv3d to run on Tensor Cores. I’ve created a very small network with only a convolution and an activation, and I expect it to run on Tensor Cores, but it does not. I am running the ONNX model with trtexec and profiling it with nvprof using -m tensor_precision_fu_utilization, and I see no Tensor Core utilization. Am I missing something?

Environment

TensorRT Version: 7.1.3-1+cuda10.2
GPU Type: Volta
Nvidia Driver Version: Jetpack 4.5.1
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: 4.9.201-tegra (Jetpack 4.5.1)
Python Version (if applicable): -
TensorFlow Version (if applicable):-
PyTorch Version (if applicable): 1.6 (for onnx model creation)
Baremetal or Container (if container which image + tag):

Relevant Files

Onnx model:
test.onnx (432.6 KB)

Nvprof log file:
nvprof_test.log (9.5 KB)

Steps To Reproduce

Code to create onnx model:

import torch
from torch import nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv3d(64, 64, 3, padding=1)
        # Note: pool is defined here but never called in forward().
        self.pool = nn.MaxPool3d(2, 2)

    def forward(self, x):
        x = self.conv(x)
        x = F.relu(x)
        return x

# Input shape: (batch, channels, depth, height, width)
image_dims = (1, 64, 64, 64, 64)

dummy_input = torch.rand(image_dims, device="cuda")

model = Net().to("cuda")

# Export the model to ONNX so it can be run with trtexec.
torch.onnx.export(model, dummy_input, "test.onnx", verbose=True, opset_version=11)

Command to run model and create nvprof log:
sudo -E /usr/local/cuda-10.2/bin/nvprof -m tensor_precision_fu_utilization --log-file nvprof_test.log /usr/src/tensorrt/bin/trtexec --onnx=test.onnx --explicitBatch --dumpProfile --fp16 --verbose --workspace=4096

NVES · September 7, 2021, 4:08am

Hi,
The link below might help with your query. Please check it for all layers with 3D support:

Thanks!

maxim.godovicyn · September 7, 2021, 6:52am

Hi,
Thanks for your answer. I’ve checked these links and the cuDNN guidelines for 3D convolutions on Tensor Cores, but I don’t see any limitation that I’ve broken. All of my input and output channel counts are multiples of 8, my kernel is (3, 3, 3), and padding is (1, 1, 1). So what’s wrong? Is my board configured incorrectly, or is this network not supposed to run on Tensor Cores?
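
The channel condition mentioned above can be checked with a small helper. This is only a sketch: the function name channels_multiple_of is hypothetical, and the multiple of 8 is taken from the FP16 condition described in this post, not from a complete list of Tensor Core requirements.

```python
def channels_multiple_of(in_channels, out_channels, multiple=8):
    """Return True if both channel counts are divisible by `multiple`.

    The default of 8 is an assumption based on the FP16 condition
    described in the post; other precisions may need other values.
    """
    return in_channels % multiple == 0 and out_channels % multiple == 0


# The Conv3d layer in this thread uses 64 input and 64 output channels.
print(channels_multiple_of(64, 64))  # True
# A typical RGB first layer would not satisfy the condition.
print(channels_multiple_of(3, 64))   # False
```

Passing this check does not guarantee that a Tensor Core kernel is selected. It only rules out one possible cause.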

spolisetty · September 9, 2021, 10:21am

Hi,

It may be difficult to say whether a single layer will choose the MMA format, because it requires an additional kernel to transform the input, which may make the overall performance worse.

Thank you.

maxim.godovicyn · September 9, 2021, 1:22pm

Hi,

Thanks for your answer. I understand that, but when I do the same thing with conv2d, I see TensorRT at least trying to use Tensor Core instructions (h884***). With 3D convolution, I don’t see that. How can I check whether conv3d can be offloaded to Tensor Cores? I believe I have checked all the limitations, and I don’t see that I am breaking any of them.

Thank you
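
One rough way to look for the h884 kernels mentioned above is to scan profiler text for that substring. The sketch below is a heuristic under stated assumptions: the function name find_tensor_core_kernels is hypothetical, the only marker used is the "h884" string from this post, and the sample lines are made-up names for illustration, not real profiler output.

```python
def find_tensor_core_kernels(lines, markers=("h884",)):
    """Return the stripped lines that contain any of the given markers.

    `markers` defaults to the "h884" substring mentioned in the post.
    Matching on names is a heuristic and not proof of Tensor Core use.
    """
    return [line.strip() for line in lines if any(m in line for m in markers)]


# Made-up kernel names for illustration only.
sample_lines = [
    "example_h884_conv_kernel\n",
    "example_fp32_conv_kernel\n",
]
print(find_tensor_core_kernels(sample_lines))  # ['example_h884_conv_kernel']
```

The same function could be given an open file handle, for example open("nvprof_test.log"), since it only iterates over lines. The tensor_precision_fu_utilization metric remains the more direct evidence; a name match only suggests which kernels to look at.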

maxim.godovicyn · September 10, 2021, 4:21pm

Hi,

I’ve double-checked: I replaced Conv3d/MaxPool3d with Conv2d/MaxPool2d in my model creation code, and I see Conv2d executing on Tensor Cores.

import torch
from torch import nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(64, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.conv(x)
        x = F.relu(x)
        return x

#image_dims = (1, 64, 64, 64, 64)
image_dims = (1, 64, 64, 64)

dummy_input = torch.rand(image_dims, device="cuda")

model = Net().to("cuda")

torch.onnx.export(model, dummy_input, "test.onnx", verbose=True, opset_version=11)

Can you please share a link or some documentation on cuDNN 8.0 for Jetson platforms? As I understand it, it differs from the desktop release, and it’s not clear what is supported and what is not.

spolisetty · September 20, 2021, 5:57pm

Hi,

Sorry for the delayed response. We hope the following similar post will give you more details.

For cuDNN, please refer to https://developer.nvidia.com/cudnn

If you still need further assistance with Jetson, we recommend posting your concern on the Jetson forum to get better help.

Thank you.

maxim.godovicyn · September 22, 2021, 9:13am

Hi,

Thank you for your answer. Unfortunately, the post you suggested did not clarify things. I understand that grouped conv3d is not supported and TRT has to split it and run a kernel for each convolution, but why are those kernels not running on Tensor Cores?

I’ve created a new topic in the Jetson Xavier section. Thank you!

