TensorRT Batch Inference: empty outputs


Hi all,

I have been working in Python with YoloV3 and YoloV4 on Jetson Nano and Jetson Xavier NX for a while, always with a batch size of 1, and I never had an issue there.

Now, for a project, I am trying to use YoloV4 (yolov4-288) with multiple inputs, i.e. a batch size of 2.

I have been able to properly convert YoloV4 to an ONNX file with a static input dimension of [64, 3, 288, 288], and then to convert it to a TensorRT engine with an input size of [2, 3, 288, 288].

When I run inference with the inputs being frames from two videos, the engine output is correct for the first frame only, while it is completely zero for the second frame.

I would like to know what I can do to make inference work for the second frame as well.
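For context, one thing worth checking with a static [2, 3, 288, 288] engine is that the whole batch is copied into the input buffer as a single contiguous NCHW array; if only the first frame's bytes end up in the buffer, the second half stays zeroed and the second output comes back empty. A dependency-free sketch (the normalization in `preprocess` is my assumption, not taken from the script, and the frames are dummies standing in for the video frames):

```python
import numpy as np

def preprocess(frame):
    # Hypothetical preprocessing: scale uint8 HWC to float32 CHW in [0, 1].
    # The real script presumably also resizes with cv2; the dummy frames
    # here are already 288x288 to keep the sketch dependency-free.
    x = frame.astype(np.float32) / 255.0   # scale to [0, 1]
    return x.transpose(2, 0, 1)            # HWC -> CHW

# Two dummy 288x288 RGB frames standing in for the two video frames.
frame_a = np.zeros((288, 288, 3), dtype=np.uint8)
frame_b = np.full((288, 288, 3), 255, dtype=np.uint8)

# The batch must be one contiguous NCHW array before being copied into the
# page-locked host buffer; np.stack + np.ascontiguousarray guarantees that.
batch = np.ascontiguousarray(
    np.stack([preprocess(frame_a), preprocess(frame_b)]))
print(batch.shape)  # (2, 3, 288, 288)
```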


TensorRT Version: 7.1.3
GPU Type: Jetson Xavier NX
CUDA Version: 10.2
Jetpack: 4.4.1
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

All files, including yolov4.onnx, yolov4.trt, the conversion scripts, and a script test_batches.py with the video files to test on, are on my Google Drive: https://drive.google.com/drive/folders/1QYAuu4x8aprGykTDggoaOBayOgJymssd?usp=sharing

Steps To Reproduce

The script test_batches.py shows the number and indices of detections from the outputs of the TRT engine. Here the engine gets the frames of two videos. It will raise an error after the first batch is processed (if you comment out line 285, it will process the following frames).

The detection outputs here only concern the first frame: all the detection arrays for the second frame are equal to zero. And if you replace the second frame with nothing or with a copy of the first frame, the results are the same.

Hi @thomallain,
You are currently using static shapes.
In order to make this work for a different batch size, you should use dynamic shapes.


Hi @AakankshaS,

I have followed this topic to use dynamic shapes: ONNX to TensorRT with dynamic batch size in Python

I have changed my Darknet-to-ONNX conversion to get an ONNX model with a dynamic batch size N (like in the other topic): https://drive.google.com/file/d/19p6DzZGm4xTkvxRU74ELzmH8Vin-I4XP/view?usp=sharing

To convert my ONNX file to TensorRT, I have used the following command (which works):

trtexec --onnx=yolov4-416.onnx --fp16 --workspace=4096 --explicitBatch --verbose --shapes=000_net:2x3x416x416 --saveEngine=yolov4-416.trt --optShapes=000_net:2x3x416x416 --maxShapes=000_net:4x3x416x416 --minShapes=000_net:1x3x416x416

Now, during runtime, I get the following:

  • engine.get_binding_shape(0) -> (-1, 3, 416,416)
  • engine.max_batch_size -> 1
  • engine.num_optimization_profiles  -> 1

So according to the other topic, the input shape and max_batch_size are correct. But my engine reports only one optimization profile (the one with the -1 dimension), and not the one given under --shapes in the trtexec command. Because of that, it tries to allocate a negative amount of memory.

Why are the optimization profiles from the trtexec command not present? And is this the right way to use dynamic shapes for batch inference?
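As far as I understand, num_optimization_profiles == 1 is expected here: --shapes only tells trtexec which shape to use for its own timing run, while the single stored profile comes from --minShapes/--optShapes/--maxShapes. With an explicit-batch engine, engine.max_batch_size is always 1 and not meaningful; the -1 reported by engine.get_binding_shape must be resolved at runtime before allocating buffers. A minimal sketch (the `resolve` helper and the buffer math are mine, not from the script):

```python
import numpy as np

def resolve(binding_shape, batch_size):
    # Replace the dynamic (-1) dimension reported by
    # engine.get_binding_shape(...) with the actual batch size, so the
    # host/device buffer sizes come out positive instead of negative.
    return tuple(batch_size if d == -1 else d for d in binding_shape)

# The engine in this thread reports (-1, 3, 416, 416) for binding 0.
input_shape = resolve((-1, 3, 416, 416), 2)
n_bytes = int(np.prod(input_shape)) * np.dtype(np.float32).itemsize
print(input_shape, n_bytes)  # (2, 3, 416, 416) 4153344

# With a live engine (hypothetical objects, TensorRT 7 API), the resolved
# shape must also be set on the execution context before inference:
#   context = engine.create_execution_context()
#   context.set_binding_shape(0, input_shape)  # selects the actual batch
#   assert context.all_binding_shapes_specified
```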


I have also followed the dynamic shapes process, with a batch size of 3, and am also seeing missing inferences, in my case for the first two images. My output:

[array([0., 0., 0., …, 0., 0., 0.], dtype=float32), array([0., 0., 0., …, 0., 0., 0.], dtype=float32), array([-0.6542635, 0.774671, -2.804221, …, 0., 0., 0.], dtype=float32)]


Hi @thomallain ,
Apologies for the delayed response. Are you still facing the issue?

@thomallain , Hi, I am also facing the same issue. Have you found a solution to this problem?

This looks like a Jetson issue. We recommend you raise it on the respective platform via the below link


@NVES , Hi. In my case, I generated the TensorRT model using trtexec on a GTX 1070, so I think this particular issue also exists on devices other than Jetson systems.