Hello.
I’m trying to optimize a PyTorch NN on a Jetson AGX Xavier 32GB with TensorRT, but I can’t make conv3d run on Tensor Cores. I’ve created a really small, simple NN with only a convolution and an activation, and I expect it to run on Tensor Cores, but it doesn’t. I’m running the ONNX model with trtexec and profiling it with nvprof -m tensor_precision_fu_utilization, and I see no Tensor Core utilisation. Am I missing something?
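For reference, the profiling run described above can be scripted. This is a minimal sketch, assuming `trtexec` and `nvprof` are on the PATH of the Jetson; the helper name `build_profile_cmd` is hypothetical. As far as I know, Volta Tensor Cores only operate on reduced-precision data (FP16, or INT8), so TensorRT will only consider Tensor Core tactics when reduced precision is enabled, e.g. with trtexec's `--fp16` flag.

```python
import shlex
from typing import List


def build_profile_cmd(onnx_path: str, fp16: bool = True) -> List[str]:
    """Build an nvprof command line that profiles a trtexec run of an ONNX model.

    Hypothetical helper: it only assembles the argument list; actually running it
    requires nvprof and trtexec (JetPack 4.5.1 / TensorRT 7.1) on the device.
    """
    # ONNX models are parsed into an explicit-batch network in TensorRT 7
    trtexec = ["trtexec", f"--onnx={onnx_path}", "--explicitBatch"]
    if fp16:
        # FP32 engines will not use Volta Tensor Cores; allow FP16 tactics
        trtexec.append("--fp16")
    # -m collects the Tensor Core utilization metric for every profiled kernel
    return ["nvprof", "-m", "tensor_precision_fu_utilization"] + trtexec


if __name__ == "__main__":
    print(shlex.join(build_profile_cmd("test.onnx")))
```

On the device, the list can be passed straight to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell-quoting issues.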
Environment
TensorRT Version: 7.1.3-1+cuda10.2
GPU Type: Volta
Nvidia Driver Version: JetPack 4.5.1
CUDA Version: 10.2
cuDNN Version: 8.0
Operating System + Version: 4.9.201-tegra (JetPack 4.5.1)
Python Version (if applicable): -
TensorFlow Version (if applicable): -
PyTorch Version (if applicable): 1.6 (for ONNX model creation)
Baremetal or Container (if container which image + tag):
Hi,
Thanks for your answer. I’ve checked these links and the cuDNN guidelines for 3D convolutions on Tensor Cores, but I don’t see any limitation I’ve broken. All of my input and output channels are multiples of 8, my kernel is (3, 3, 3), and padding is (1, 1, 1). So what’s wrong? Do I have a wrong board configuration, or should this net not run on Tensor Cores at all?
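The channel rule mentioned above can be checked mechanically. This is a minimal sketch under the assumption that the relevant rule is "input and output channels per group are multiples of 8"; `meets_channel_rule` is a hypothetical helper that encodes only that guideline, not the full cuDNN/TensorRT tactic selection, so passing it does not guarantee a Tensor Core kernel is chosen.

```python
def meets_channel_rule(in_channels: int, out_channels: int,
                       groups: int = 1, multiple: int = 8) -> bool:
    """Return True if per-group input and output channels are multiples of `multiple`.

    Heuristic only (hypothetical helper): it checks the channels-multiple-of-8
    guideline and nothing else (no dtype, layout, or kernel-size checks).
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    per_group_in = in_channels // groups
    per_group_out = out_channels // groups
    return per_group_in % multiple == 0 and per_group_out % multiple == 0
```

For the 64 -> 64 convolution in this thread, `meets_channel_rule(64, 64)` returns True, while the same layer with `groups=16` (4 channels per group) returns False.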
It may be difficult to say whether a single layer will choose the MMA (Tensor Core) format, since it requires an additional kernel to transform the input, which may make the overall performance worse than a non-Tensor Core kernel.
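To make that trade-off concrete, here is a toy cost model, assuming (as the reply suggests) that the builder compares the end-to-end cost of each candidate, including any extra input-reformatting kernel. The `Tactic` class and `pick_fastest` function are hypothetical and are not TensorRT's actual tactic selector; the timings in the usage note are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tactic:
    name: str
    kernel_ms: float    # time of the convolution kernel itself
    reformat_ms: float  # time of any extra layout-conversion kernels it needs

    @property
    def total_ms(self) -> float:
        return self.kernel_ms + self.reformat_ms


def pick_fastest(tactics: List[Tactic]) -> Tactic:
    """Pick the candidate with the lowest end-to-end time (toy model)."""
    return min(tactics, key=lambda t: t.total_ms)
```

With made-up numbers, a Tensor Core tactic taking 1.0 ms plus 0.8 ms of reformatting loses to a 1.5 ms CUDA-core tactic that needs no reformatting, even though its convolution kernel alone is faster.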
Thanks for your answer. I understand that, but when I do the same thing with conv2d, I see TensorRT at least trying to use Tensor Core kernels (h884***). With 3D convolution, on the other hand, I don’t see them at all. How can I check whether conv3d can be offloaded to Tensor Cores? I believe I’ve checked all the limitations, and I don’t see any that I’m breaking.
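One rough way to check, assuming you have the list of kernel names from an nvprof run or from trtexec's verbose output, is to filter them for the naming pattern already noticed above. The `tensor_core_kernels` helper and the `884|hmma` pattern are assumptions based on the h884 names mentioned in this thread; a non-zero tensor_precision_fu_utilization metric remains the more reliable signal.

```python
import re
from typing import Iterable, List

# Heuristic pattern (assumption): Volta Tensor Core kernels often carry the
# MMA tile shape "884" (as in the h884 kernels) or "hmma" in their names.
TENSOR_CORE_PATTERN = re.compile(r"884|hmma", re.IGNORECASE)


def tensor_core_kernels(kernel_names: Iterable[str]) -> List[str]:
    """Return the kernel names that look like Tensor Core kernels."""
    return [name for name in kernel_names if TENSOR_CORE_PATTERN.search(name)]
```

Feeding it the kernel names copied from the profiler output of the conv2d and conv3d engines gives a quick side-by-side comparison; an empty result for the conv3d engine matches what the profiler metric reports.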
I’ve double-checked: when I replace Conv3d/MaxPool3d with Conv2d/MaxPool2d in my model creation code, I see Conv2d executing on Tensor Cores.
import torch
from torch import nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 64 input and 64 output channels (multiples of 8), 3x3 kernel, padding 1
        self.conv = nn.Conv2d(64, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)  # defined, but not used in forward()

    def forward(self, x):
        x = self.conv(x)
        x = F.relu(x)
        return x

# Input for the 3D variant (N, C, D, H, W), used with Conv3d/MaxPool3d:
#image_dims = (1, 64, 64, 64, 64)
# Input for the 2D variant (N, C, H, W):
image_dims = (1, 64, 64, 64)
dummy_input = torch.rand(image_dims, device="cuda")
model = Net().to("cuda").eval()
torch.onnx.export(model, dummy_input, "test.onnx", verbose=True, opset_version=11)
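For comparison, the 3D model that the 2D snippet replaced might look like the sketch below. This is an assumption reconstructed from the description in this thread (Conv3d with 64 -> 64 channels, kernel (3, 3, 3), padding (1, 1, 1), followed by ReLU); the names `Net3d`, `export_conv3d`, and the output path `test3d.onnx` are hypothetical.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Net3d(nn.Module):
    """Hypothetical 3D counterpart of Net: Conv3d + ReLU, 64 -> 64 channels."""

    def __init__(self):
        super().__init__()
        # kernel (3, 3, 3), padding (1, 1, 1); channel counts are multiples of 8
        self.conv = nn.Conv3d(64, 64, kernel_size=3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))


def export_conv3d(path="test3d.onnx", image_dims=(1, 64, 64, 64, 64), device="cuda"):
    """Export the 3D model to ONNX with opset 11, mirroring the 2D snippet."""
    model = Net3d().to(device).eval()
    dummy_input = torch.rand(image_dims, device=device)
    torch.onnx.export(model, dummy_input, path, opset_version=11)
```

Calling `export_conv3d()` on the Jetson produces an ONNX file that can be profiled with trtexec/nvprof exactly like test.onnx, which keeps the 2D and 3D runs directly comparable.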
Could you please share a link to documentation on cuDNN 8.0 for Jetson platforms? As I understand it, it differs from the desktop release, and it’s not clear what is supported and what is not.
Thank you for your answer. Unfortunately, the post you suggested did not clarify this. I understand that grouped conv3d is not supported and that TensorRT has to split it and run a kernel for each group, but why don’t those kernels run on Tensor Cores?
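The split described here can be illustrated in PyTorch: a grouped convolution is equivalent to running one ordinary convolution per group and concatenating the results along the channel axis. This is only a sketch of the math (the function name `grouped_conv3d_by_split` is hypothetical), not how TensorRT implements it internally. Each per-group convolution sees only C_in / groups input channels and C_out / groups output channels, so with many groups those per-group counts can fall below the multiple-of-8 guideline even when the full layer's channel counts meet it.

```python
import torch
import torch.nn.functional as F


def grouped_conv3d_by_split(x, weight, bias, groups, padding=1):
    """Compute a grouped Conv3d as `groups` separate ordinary Conv3d calls.

    x:      (N, C_in, D, H, W)
    weight: (C_out, C_in // groups, kD, kH, kW)
    bias:   (C_out,)
    """
    x_parts = torch.chunk(x, groups, dim=1)       # input channels of each group
    w_parts = torch.chunk(weight, groups, dim=0)  # output filters of each group
    b_parts = torch.chunk(bias, groups, dim=0)
    outs = [
        F.conv3d(xp, wp, bp, padding=padding)     # one plain convolution per group
        for xp, wp, bp in zip(x_parts, w_parts, b_parts)
    ]
    return torch.cat(outs, dim=1)                 # reassemble along channels
```

Comparing its output with `F.conv3d(x, weight, bias, padding=1, groups=groups)` on random tensors is a quick way to confirm the two are numerically equivalent.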
I’ve created a new topic in the Jetson Xavier section, thank you!