TensorRT Engine batch inference only has one result

Description

I used onnx2trt to generate an engine file with batch size 1, and the inference result is correct. I then set the batch size to 64, generated a new engine, and ran inference, but the output is the same as before. My model is a classification network with 10 classes. With a batch size of 64, I'd expect the output to have shape (64, 10), with all 64 rows the same. However, only the first (1, 10) row is correct; the remaining 63 rows are all 0.

This is the inference code I used: https://drive.google.com/file/d/1msXAYG9IbIxY1sLZyXdR52ZAtBSBN161/view?usp=sharing

Can anyone take a look and tell me what is wrong with it? Thanks.

Environment

Freshly installed JetPack 4.4 DP

Can you share the sample script and model file to reproduce the issue so we can help better?

Thanks

Thank you for your reply.

The python code I used is here: https://drive.google.com/file/d/1-t_-selZ-bDTIuI5vO3TSUIUhe-GCtR6/view?usp=sharing

And the trt engine is here: https://drive.google.com/file/d/1F0O4312WwHVhd81jz3xV-eySpZfEgrqo/view?usp=sharing

The generated plan files are not portable across platforms or TensorRT versions.
Could you please share the ONNX file as well, so that I can generate the TRT engine from that model file?

Thanks

Sorry I forgot about it.

Here is the onnx model: https://drive.google.com/file/d/1KIOLQ-pC5dUckZTxgTv7skWfCTIYST6c/view?usp=sharing

Thanks.

@SunilJB Hi, is there any update on this issue? Thanks.

We are looking into it, will update you accordingly.

Thanks

TensorRT 7 and later requires the EXPLICIT_BATCH flag when parsing ONNX models, so for a fixed-shape model the batch size is fixed by the model itself.
To run with a different batch size, you have to use a dynamic-shape model.
Here is a minimal example of exporting AlexNet from PyTorch with a dynamic batch size: https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930
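
For illustration, a dynamic batch export could look roughly like the sketch below. This is not the code from the gist: the helper names `batch_dynamic_axes` and `export_with_dynamic_batch`, the tensor names `input` and `output`, the axis label `batch_size`, and the opset version are all assumptions chosen for the example. It relies only on the `dynamic_axes` argument of `torch.onnx.export`.

```python
def batch_dynamic_axes(input_names, output_names, axis_name="batch_size"):
    """Mark dimension 0 of every named tensor as a dynamic (symbolic) axis."""
    names = list(input_names) + list(output_names)
    return {name: {0: axis_name} for name in names}


def export_with_dynamic_batch(model, dummy_input, onnx_path,
                              input_names=("input",),
                              output_names=("output",)):
    """Export `model` to ONNX without baking the batch size into the graph.

    `input_names` / `output_names` are hypothetical labels; use whatever
    names suit your model.
    """
    import torch  # imported lazily so the helper above works without PyTorch

    model.eval()
    torch.onnx.export(
        model,
        dummy_input,                # e.g. a tensor of shape (1, 3, 192, 48)
        onnx_path,
        input_names=list(input_names),
        output_names=list(output_names),
        dynamic_axes=batch_dynamic_axes(input_names, output_names),
        opset_version=11,           # assumption: pick one your parser supports
    )


# The mapping passed to torch.onnx.export for the default names:
print(batch_dynamic_axes(["input"], ["output"]))
```

With a model loaded, a call such as `export_with_dynamic_batch(model, torch.randn(1, 3, 192, 48), "model_dynamic.onnx")` (file name hypothetical) should produce an ONNX file whose first dimension is symbolic. Note that when building an engine from a dynamic-shape model, TensorRT also needs an optimization profile giving the min/opt/max shapes for that dimension.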

Thanks

Thank you for your update.

After generating the ONNX model with a dynamic batch size, the onnx2trt tool could not parse it into a TRT engine.

What I tried afterwards was to export the ONNX model using dummy data with an explicit batch size (in my case 64x3x192x48 instead of 1x3x192x48). The exported ONNX model is then fixed to a batch size of 64. When running onnx2trt, I just set the batch size to 1.
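
For anyone sizing buffers for such a fixed-batch engine, here is a small sketch of the arithmetic, assuming the shapes mentioned in this thread. Since the batch dimension is part of the tensor shape in an explicit-batch engine, the element counts come from the full shape rather than from a per-sample shape multiplied by a separate batch size. The helper names `element_count` and `split_rows` are hypothetical; in the TensorRT Python API the shapes would typically be read from the engine (for example via `engine.get_binding_shape(...)`) instead of being hard-coded.

```python
from functools import reduce
from operator import mul

# Shapes taken from this thread; the leading 64 is the batch dimension,
# which is baked into the model when exporting with 64x3x192x48 dummy data.
INPUT_SHAPE = (64, 3, 192, 48)
OUTPUT_SHAPE = (64, 10)


def element_count(shape):
    """Total number of elements in a tensor with the given full shape."""
    return reduce(mul, shape, 1)


def split_rows(flat, shape):
    """Split a flat output buffer into one row of class scores per sample."""
    rows, cols = shape
    if len(flat) != rows * cols:
        raise ValueError("buffer has %d elements, expected %d"
                         % (len(flat), rows * cols))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


print(element_count(INPUT_SHAPE))   # elements the input buffer must hold
print(element_count(OUTPUT_SHAPE))  # elements the output buffer must hold
```

If the output buffer only holds 10 elements, or only the first 10 are copied back from the device, only the first sample's scores will be populated.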