How to handle TensorRT DIM 5 input (NCHW / NHWC)?


I have a model created in TensorFlow 2.x, then converted to ONNX, then converted to an engine using trtexec (v8.0.1).
The model operates on several input images in a sequence:
The model input dimensions are 1x-1x-1x-1x3 (batch size, number of images, height, width, channel).
I’m using TensorRT C API to run inference.

I have a few questions:

  1. I’ve read that the TensorRT implementation is NCHW. Does NCHW mean I have to send TensorRT planar RGB images? For example, OpenCV uses interleaved RGB, so do I have to convert those images to planar RGB?

  2. How does NCHW generalize to DIM5?
    Should I register my input blob as 1x10x224x224x3, for example, and feed 10 interleaved RGB images (as was done in TensorFlow training)? Will TensorRT mistaken

  3. Can I shorten inference time if I retrain my model for 1x10x3x224x224 and infer on planar RGB images?

Please share the ONNX model and the script, if you have not shared them already, so that we can assist you better.
Meanwhile, you can try a few things:

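  1. Validate your model with the snippet below:

import onnx

filename = "your_model.onnx"  # placeholder: path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid

  2. Try running your model with the trtexec command.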

If you are still facing issues, please share the trtexec "--verbose" log for further debugging.

Thanks @NVES ,

I have no problem converting my model to TensorRT engine using ‘trtexec’.

Could you kindly review my questions? I think they are general and not specific to any model.

I’m mainly interested in understanding what TensorRT considers as ‘C’ when you provide a DIM5 tensor, and whether
I can improve inference time by using a model with input tensor 1x10x3x224x224 instead of 1x10x224x224x3.


It depends on how the model is written.
If you set up the input dimension as (batch size, number of images, height, width, channel) and there is a shuffle that converts it back to (batch size, number of images, channel, height, width) to serve the conv afterward, then the data should be prepared in the interleaved format.
If you set up the input dimension to (batch size, number of images, channel, height, width) then you should prepare the data in a planar format.
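As a rough illustration of the two cases above, the NumPy sketch below converts interleaved frames with shape (batch size, number of images, height, width, channel) into planar data with shape (batch size, number of images, channel, height, width). The array names and the 1x10x224x224x3 size are placeholders borrowed from the question, not part of any TensorRT API.

```python
import numpy as np

# Hypothetical input: 1 batch of 10 interleaved 3-channel frames, 224x224 each,
# laid out as (batch, images, height, width, channel).
interleaved = np.arange(1 * 10 * 224 * 224 * 3, dtype=np.float32).reshape(1, 10, 224, 224, 3)

# Move the channel axis in front of height and width to get planar data,
# then force a copy so the memory order matches the new shape.
planar = np.ascontiguousarray(np.transpose(interleaved, (0, 1, 4, 2, 3)))

print(interleaved.shape)  # (1, 10, 224, 224, 3)
print(planar.shape)       # (1, 10, 3, 224, 224)
```

If the engine input is declared as (batch size, number of images, height, width, channel), the `interleaved` buffer would be fed as-is; the `planar` buffer corresponds to the (batch size, number of images, channel, height, width) case.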

Besides, we have an API for setting the TensorFormat, which lets you choose the input memory layout. For example, you can have (batch size, number of images, channel, height, width) as the input dimension and use the HWC format. Then you can also use interleaved data.

For layers like conv, we always assume that the input dimensions are (N, N1, …, C, H, W). Dimension and TensorFormat are different things. Dimension defines the shape of a tensor and TensorFormat defines the memory layout.
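The distinction between dimension and format can be sketched without TensorRT at all. In the toy functions below (hypothetical helpers, not TensorRT calls), the logical index (c, h, w) and the shape (C, H, W) are identical in both cases; only the position of the element in memory differs between a linear (planar) layout and an HWC (interleaved) layout.

```python
def linear_offset(c, h, w, C, H, W):
    """Offset of element (c, h, w) when memory is laid out plane by plane (linear/CHW)."""
    return (c * H + h) * W + w


def hwc_offset(c, h, w, C, H, W):
    """Offset of the same element when channels are interleaved per pixel (HWC)."""
    return (h * W + w) * C + c


# Same logical tensor shape (C, H, W) = (3, 224, 224) in both cases.
C, H, W = 3, 224, 224
print(linear_offset(1, 0, 0, C, H, W))  # 50176: start of the second plane
print(hwc_offset(1, 0, 0, C, H, W))     # 1: second value of the first pixel
```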

As for the number of dimensions, we avoid calling our format “NCHW” because it is misleading; we usually call it “Linear”, so it does not limit how many dimensions there are.

Thank you.

Thanks @spolisetty ,

Your explanation clears a few things up for me.

My model (sent to you via messages) has no shuffle layer before the first CONV2D. I define the input dimensions as (batch size, number of images, height, width, channel) and provide data in interleaved format. The behavior of the model is as I expect (visually).

  1. Can you confirm that the "StatefulPartitionedCall/ToyModel/time_distributed/conv_layer/conv2d/Conv2D__39" TRANSPOSE layer behaves similarly to the Shuffle layer you proposed?

  2. Performance-wise, can you estimate whether it costs more time to prepare planar data from interleaved data (in real-time inference from capture devices) or to use the shuffle layer?


What is the "StatefulPartitionedCall/ToyModel/time_distributed/conv_layer/conv2d/Conv2D__39" TRANSPOSE layer? Regarding perf, it depends on the real use case, as TRT also has some optimized NHWC kernels. We need two models to verify perf: one with NCHW as the input dimension but with the CHW4 input format specified, and another with NHWC as the input dimension and a shuffle inside.
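For the host-side half of that comparison, a minimal timing sketch might look like the following. It only measures the CPU cost of turning interleaved frames into planar ones with NumPy; it says nothing about the in-engine shuffle, which still has to be profiled with the two engines described above. The function name, frame size, and repeat count are assumptions for illustration.

```python
import time

import numpy as np


def time_interleaved_to_planar(frames, repeats=20):
    """Return (planar frames, average seconds per conversion) for interleaved input."""
    start = time.perf_counter()
    for _ in range(repeats):
        # (batch, images, H, W, C) -> (batch, images, C, H, W), copied into planar memory.
        planar = np.ascontiguousarray(np.transpose(frames, (0, 1, 4, 2, 3)))
    elapsed = (time.perf_counter() - start) / repeats
    return planar, elapsed


# Hypothetical capture: 10 interleaved 224x224 RGB frames stored as 8-bit values.
frames = np.zeros((1, 10, 224, 224, 3), dtype=np.uint8)
planar, seconds = time_interleaved_to_planar(frames)
print(planar.shape, f"{seconds * 1e3:.3f} ms per conversion")
```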

Thank you.