Working with 1D convolutions in TensorRT

I imported my ONNX model into TensorRT using the ONNX parser.
My ONNX model includes two Conv1D layers.

  1. I did not see any 1D convolution layer in the TensorRT layer list (see 2. Layers and Features):
    Support Matrix :: NVIDIA Deep Learning TensorRT Documentation
    There is only IConvolutionLayer, which covers 2D and 3D convolution.
    I would like to know whether TensorRT uses a dedicated Conv1D layer or adapts its Conv2D layer to the 1D case.

  2. I would also like to know whether there are any 1D-specific considerations (caveats, things to avoid, optimization strategies that differ from the 2D case).

Hi @julie.fraysse

We don’t natively support Conv1D. If we see one in the ONNX graph, we convert the 1D convolution into a 2D convolution, perform the operation, and convert the result back.
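To illustrate the conversion described above, here is a small numpy sketch (not TensorRT code, and the helper function is illustrative) showing that a 1D convolution is equivalent to an Unsqueeze → Conv2D → Squeeze sequence using a kernel of height 1:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' 2D cross-correlation (no padding, stride 1)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

signal = np.arange(10.0)                 # 1D input of length 10
kernel = np.array([1.0, -1.0, 2.0])

# Conv1D expressed as Unsqueeze -> Conv2D -> Squeeze:
x2d = signal[np.newaxis, :]              # unsqueeze: shape (1, 10)
k2d = kernel[np.newaxis, :]              # kernel height 1: shape (1, 3)
out = conv2d_valid(x2d, k2d).squeeze(0)  # squeeze back to 1D, shape (8,)

# Reference 1D result via np.correlate (cross-correlation, 'valid' mode)
ref = np.correlate(signal, kernel, mode="valid")
assert np.allclose(out, ref)
```

The extra reshapes are free in numpy, but inside an optimized engine they become real layout-transformation kernels, which is where the shuffle overhead discussed below comes from.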



Thanks for your reply,

  1. Is this why I see a lot of “copyPackedKernel” calls during batch inference execution?
  2. These kernels seem to consume a lot of time (nearly half of the total). Is this normal?

Hi @juliefraysse,

Sorry for the delayed response. We convert Conv1D to Unsqueeze → Conv2D → Squeeze, adding a shuffle node in the process, and the copyPackedKernel calls come from that shuffle. The slowdown is expected behavior: these kernels do not just copy data, they perform other operations as well.

Thank you.

Thank you for your valuable help on this topic.

I work on a 1D (signal) convolutional neural network.

  1. As I said, these copyPackedKernels take nearly half of the inference time. Is there a way to avoid these kernels, or to reduce the time they take, within TensorRT?
  2. I cannot manage to use the full power of the Tensor Cores despite all my attempts: TensorRT very rarely chooses a Tensor Core kernel. This is frustrating because the Tensor Cores represent 80% of the computing power on my GPU (RTX 4000 mobile), and I would like to exploit it. That is why I moved from TensorRT to cuDNN and cuBLAS. I implemented a matrix multiplication that enables the Tensor Cores, but their use still appears largely random. For example, for the convolutions, my idea was to decompose the 1D signal, perform the 1D convolution as a 2D matrix multiplication, and rebuild the result afterwards. Is that a good idea? What matrix sizes would you recommend? And at which programming level should I work (TensorRT, cuDNN, cuBLAS, WMMA)?
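The decomposition described in point 2 is essentially the standard im2col lowering: unrolling the signal into a matrix of overlapping patches turns the convolution into a single matrix multiplication, which is exactly the shape of work cuBLAS and the Tensor Cores are built for. A minimal numpy sketch of the idea for the 1D case (the helper name is illustrative, not a library API):

```python
import numpy as np

def im2col_1d(signal, k):
    """Build the (num_windows, k) patch matrix for a stride-1 'valid' conv."""
    n = len(signal) - k + 1
    return np.stack([signal[i:i + k] for i in range(n)])

signal = np.arange(12.0)
kernel = np.array([0.5, -1.0, 2.0])

# 1D convolution as a single GEMM: (n, k) @ (k,) -> (n,)
patches = im2col_1d(signal, len(kernel))
out = patches @ kernel

# Matches the direct 1D cross-correlation
assert np.allclose(out, np.correlate(signal, kernel, mode="valid"))
```

With multiple input/output channels the patch matrix and the stacked filters become genuinely 2D, so the whole layer is one GEMM. Note that for Tensor Core eligibility in FP16, NVIDIA's general guidance is that the GEMM dimensions should be multiples of 8; padding the channel and patch dimensions accordingly is a common trick (a guideline, not something the source thread confirms).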

Thanks in advance.

Hi @julie.fraysse,

We recommend enlarging the batch size, making your network deeper, and using FP16/INT8 precision; all of these help take full advantage of the Tensor Cores. We still recommend using TensorRT for this, since cuDNN and cuBLAS are integrated into TensorRT. WMMA is a user-facing Tensor Core interface, and hand-written WMMA kernels may not be fully optimized.
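As a quick way to try the precision advice above, the reduced-precision modes can be enabled from the command line with trtexec (the model filename is a placeholder; exact flag behavior depends on your TensorRT version):

```shell
# Build and time an engine with FP16 enabled
# (TensorRT falls back to FP32 for layers where FP16 is unsupported or slower).
trtexec --onnx=model.onnx --fp16

# INT8 additionally needs calibration data for accurate results;
# without a calibrator, trtexec uses placeholder dynamic ranges (timing only).
trtexec --onnx=model.onnx --int8
```

Comparing the profiler output of these runs against the FP32 baseline shows whether Tensor Core kernels are actually being selected for your layers.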

Thank you.