Description
Hello.
I’m trying to optimize a PyTorch NN on a Jetson AGX Xavier 32GB with TensorRT, but I can’t get conv3d to run on Tensor Cores. I created a very small network with just a convolution and an activation, expecting it to run on Tensor Cores, but it does not. However, if I replace the conv3d with a conv2d, the conv2d does run on Tensor Cores. I run the ONNX model with trtexec and profile it with nvprof using the -m tensor_precision_fu_utilization metric, and I see no Tensor Core utilization. Am I missing something?
I have another post about this in the TensorRT section, but I was advised to post it in the Jetson-related forum.
Original post: TensorRT 7 conv3d is not running on Tensor Cores - #8 by maxim.godovicyn
Environment
TensorRT Version : 7.1.3-1+cuda10.2
GPU Type : Volta
Nvidia Driver Version : Jetpack 4.5.1
CUDA Version : 10.2
CUDNN Version : 8.0
Operating System + Version : 4.9.201-tegra (Jetpack 4.5.1)
Python Version (if applicable) : -
TensorFlow Version (if applicable) :-
PyTorch Version (if applicable) : 1.6 (for onnx model creation)
Baremetal or Container (if container which image + tag) :
Relevant Files
Onnx model:
test.onnx (432.6 KB)
Nvprof log file:
nvprof_test.log (9.5 KB)
Steps To Reproduce
Code to create onnx model:
import torch
from torch import nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv3d(64, 64, 3, padding=1)
        self.pool = nn.MaxPool3d(2, 2)  # defined but not used in forward

    def forward(self, x):
        x = self.conv(x)
        x = F.relu(x)
        return x

# NCDHW input: batch 1, 64 channels, 64x64x64 volume
image_dims = (1, 64, 64, 64, 64)
dummy_input = torch.rand(image_dims, device="cuda")
model = Net().to("cuda")
torch.onnx.export(model, dummy_input, "test.onnx", verbose=True, opset_version=11)
Command to run model and create nvprof log:
sudo -E /usr/local/cuda-10.2/bin/nvprof -m tensor_precision_fu_utilization --log-file nvprof_test.log /usr/src/tensorrt/bin/trtexec --onnx=test.onnx --explicitBatch --dumpProfile --fp16 --verbose --workspace=4096
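To make the log easier to check, this is a small helper I use to pull the utilization levels out of the nvprof log. It assumes nvprof’s usual per-kernel metric table, where the level is printed as e.g. "Idle (0)" up to "Max (10)"; the function name and the sample line are illustrative, so adjust the pattern if your nvprof version formats the log differently:

```python
import re

def tensor_core_levels(log_text):
    """Return the numeric utilization levels printed next to the metric name."""
    levels = []
    for line in log_text.splitlines():
        if "tensor_precision_fu_utilization" in line:
            # nvprof prints levels like "Idle (0)", "Low (2)", "Max (10)"
            levels.extend(int(m) for m in re.findall(r"\((\d+)\)", line))
    return levels

# Illustrative line in the shape of my nvprof_test.log output
sample = (
    "    1  tensor_precision_fu_utilization  Tensor-Precision Function Unit"
    " Utilization    Idle (0)    Idle (0)    Idle (0)\n"
)
print(tensor_core_levels(sample))  # all zeros -> Tensor Cores never engaged
```

In my nvprof_test.log every conv3d kernel reports Idle (0), which is what prompted this question.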