Slow SlowFast inference with the TensorRT C++ API

Description

I deployed the action detection model "SlowFast" using the TensorRT C++ network definition API, but inference takes almost 1 second per run (60+ ms in PyTorch). The slowdown seems to come from the 3D convolution layers. I wonder whether this is because the Jetson Xavier NX does not handle 3D convolutions well, or whether something else is wrong.
I have also asked for help in 3dconv takes too long · Issue #2153 · NVIDIA/TensorRT · GitHub.
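For reference, below is a minimal sketch of how I would confirm that the 3D convolutions dominate the runtime: attach an IProfiler to the execution context and run executeV2 so TensorRT reports per-layer times. This is only an illustration, not my exact deployment code; "engine" and "bindings" are placeholders for an already-deserialized engine and already-allocated device buffers.

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdio>

// Prints the time TensorRT spent in each layer of the profiled run.
class LayerTimer : public nvinfer1::IProfiler
{
public:
    void reportLayerTime(const char* layerName, float ms) noexcept override
    {
        // If 3D convolutions are the bottleneck, they should stand out here.
        std::printf("%-60s %8.3f ms\n", layerName, ms);
    }
};

// engine:   deserialized ICudaEngine (placeholder)
// bindings: device pointers for all input/output bindings (placeholder)
void profileOnce(nvinfer1::ICudaEngine& engine, void** bindings)
{
    nvinfer1::IExecutionContext* context = engine.createExecutionContext();
    LayerTimer timer;
    context->setProfiler(&timer);

    // Warm-up run so lazy initialization does not distort the numbers.
    context->executeV2(bindings);

    // Profiled run: reportLayerTime() is called once per layer.
    context->executeV2(bindings);
    cudaDeviceSynchronize();

    delete context;  // TensorRT 8.x allows deleting interface objects directly
}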

Environment

TensorRT Version: 8.2.1
GPU Type: Jetson Xavier NX
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 8.0.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered