TensorRT execute with variable batch size gives incorrect results

Description

I am using code very similar to the MNIST UFF sample.

  1. Engine created with
    builder->setMaxBatchSize(32);

  2. Running inference on 32 input tiles then gives the expected result:
    context->execute(32, &internalConfig->buffers[0]);

  3. Running inference on 31 input tiles also gives the expected result:
    context->execute(31, &internalConfig->buffers[0]);

  4. Going back to 32 input tiles then gives the wrong result for input 32:
    context->execute(32, &internalConfig->buffers[0]);
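
One way to confirm which tile goes wrong in step 4 is to copy the output buffer back to the host after steps 2 and 4 and compare the two copies tile by tile. The following is only a minimal sketch, assuming float outputs stored contiguously per tile; `firstMismatchedTile`, `valuesPerTile` and the tolerance are hypothetical names and values, not part of TensorRT or the sample.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a TensorRT API): returns the zero-based index of
// the first tile whose outputs differ between two host-side copies of the
// output buffer, or -1 if every tile matches within the tolerance.
// Assumes each tile produces valuesPerTile contiguous floats.
int firstMismatchedTile(const std::vector<float>& reference,
                        const std::vector<float>& candidate,
                        std::size_t valuesPerTile,
                        float tolerance = 1e-5f)
{
    assert(valuesPerTile > 0);
    assert(reference.size() == candidate.size());
    assert(reference.size() % valuesPerTile == 0);

    for (std::size_t i = 0; i < reference.size(); ++i) {
        // Written as !(diff <= tolerance) so that NaN values are also flagged.
        if (!(std::fabs(reference[i] - candidate[i]) <= tolerance)) {
            return static_cast<int>(i / valuesPerTile);
        }
    }
    return -1;
}
```

If the same 32 tiles are fed in steps 2 and 4, a return value of 31 would point at the last slot of the batch, the one that was unused during the 31-tile run.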

I can provide code if required, but initially I am just asking whether this is a known issue.
If I always run inference on the full batch size, everything is fine.
It just seems a waste when I don’t have enough tiles for a whole batch.
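
The full-batch workaround can be sketched as padding a partial batch with dummy tiles on the host and ignoring the extra outputs. This is only an illustration, assuming float input tiles stored contiguously; `padBatch`, `valuesPerTile` and the zero fill are hypothetical choices, not taken from the sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a TensorRT API): returns a copy of the host-side
// input buffer padded with zero-filled tiles so that it always holds exactly
// maxBatch tiles. Assumes each tile occupies valuesPerTile contiguous floats.
std::vector<float> padBatch(const std::vector<float>& tiles,
                            std::size_t valuesPerTile,
                            std::size_t maxBatch)
{
    assert(valuesPerTile > 0);
    assert(tiles.size() % valuesPerTile == 0);
    assert(tiles.size() / valuesPerTile <= maxBatch);

    std::vector<float> padded(tiles);
    // resize() keeps the existing tiles and appends zeros up to the full batch.
    padded.resize(maxBatch * valuesPerTile, 0.0f);
    return padded;
}
```

The padded buffer would then be copied to the input binding and run with the full batch size of 32, reading back only the outputs for the real tiles. This trades the wasted compute for consistent results.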

Environment

TensorRT Version: 7.1.3.0
GPU Type: Jetson NX
Nvidia Driver Version:
CUDA Version: 10.2.89
CUDNN Version: 8.0.0.180
Operating System + Version: Jetpack 4.4.1
Python Version (if applicable):
TensorFlow Version (if applicable): 1.15
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi,
Can you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the list of all supported operators. If any operator is not supported, you need to create a custom plugin to support that operation.

Also, please share your model and script, if you have not done so already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the links below:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#error-messaging
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#faq

Thanks!