TensorRT dynamic shape error: [slot.h::decode::151] Error Code 2: Internal Error (Assertion index < nbSlots failed.invalid encoded reference to a slot)

Description

I get this error when using TensorRT to run ResNet50 inference with dynamic batch sizes.
I set multiple dynamic shape profiles: [[1, 1, 1], [2, 2, 2], [4, 4, 4], [6, 6, 6], [8, 8, 8], [10, 10, 10], [16, 16, 16], [32, 32, 32], [1, 64, 128]], where each entry is a [min, opt, max] batch triple. Except for the last profile, each one is optimized for a single specific batch size.
When running inference on dynamic batch input, I switch between profiles as needed. However, the second time I switch to the last profile from another profile, I get this error and the profile switch fails. Switching between all profiles except the last one works without failure.
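For reference, the profiles are registered at build time roughly like this (a minimal sketch, not my exact code; builder and config are the usual IBuilder/IBuilderConfig, and "input" stands in for my actual input tensor name):

  // Register one optimization profile per [min, opt, max] batch triple.
  std::vector<std::array<int, 3>> batches = {
      {1, 1, 1}, {2, 2, 2}, {4, 4, 4}, {6, 6, 6}, {8, 8, 8},
      {10, 10, 10}, {16, 16, 16}, {32, 32, 32}, {1, 64, 128}};
  for (const auto& b : batches) {
    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(b[0], 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(b[1], 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(b[2], 3, 224, 224));
    config->addOptimizationProfile(profile);
  }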


Environment

TensorRT Version: both 7.0 and 8.0
GPU Type: V100
Nvidia Driver Version: CUDA 11.0 driver
CUDA Version: 10.2
CUDNN Version: 8.2
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7.5
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files


Steps To Reproduce


Hi,

Could you please share a repro script/model with us so we can try it from our end for better debugging.

Thank you.

The model is resnet50.pb with input shape [-1, 3, 224, 224]; the .pb model is converted to ONNX with opset=11 and then converted to a TRT engine.
The C++ inference code looks like this:

  int curProfileId = trtEngine.engineContext->getOptimizationProfile();
  if (curProfileId != profileId) {
    bool status = trtEngine.engineContext->setOptimizationProfile(profileId);
  }
  // call setBindingDimensions to set the binding shape
  // call executeV2 to execute
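One point I want to confirm: my understanding is that with multiple profiles the engine exposes a separate copy of each binding per profile, so the shape setting and execution steps look roughly like this (a sketch, not my exact code; it assumes the input is binding 0 within each profile):

  // With K profiles the engine has K copies of each binding; binding i of
  // profile j lives at index j * bindingsPerProfile + i.
  int bindingsPerProfile = trtEngine.engine->getNbBindings() / trtEngine.engine->getNbOptimizationProfiles();
  int inputIndex = profileId * bindingsPerProfile;  // input assumed to be binding 0
  trtEngine.engineContext->setBindingDimensions(inputIndex, nvinfer1::Dims4(batchSize, 3, 224, 224));
  bool ok = trtEngine.engineContext->executeV2(buffers.data());  // buffers sized to getNbBindings()

Please correct me if this indexing is wrong.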

The dynamic batch profiles are: [[1, 1, 1], [2, 2, 2], [4, 4, 4], [6, 6, 6], [8, 8, 8], [10, 10, 10], [16, 16, 16], [32, 32, 32], [1, 64, 128]]. Only the last profile covers a range of batch sizes.
For each input shape I choose the best profile id. However, the second time I switch to the last profile, setOptimizationProfile fails. When switching between the other profiles (each optimized for a single batch size), no error occurs on TensorRT 8.0, but errors sometimes occur on TensorRT 7.0.

Could you please give me a C++ code example of how to use dynamic batch sizes? I didn't find one in the TRT samples. Thank you very much.

Hi @luchangli,

Please refer to the following sample and the developer guide. Hope this helps.

Thank you.

In sampleDynamicReshape, there seems to be just one profile. Do you have a multi-profile version? With only one profile, performance cannot be optimal for the other batch sizes. And what is the best practice for using multiple profiles?
Thank you very much.

Hi,

Currently we do not have an example for multiple profiles, apart from the following input: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

Thank you.

Hi,
I have exactly the same problem with a MobileNet V2 with dynamic batch, TensorRT 8, and CUDA 11.3.

In my case, the model runs fine for a while and then, after this error appears, the network starts to output garbage.
The error always happens after a profile switch, usually after switching from the very first profile [1,1,1].

The same network ran fine on a different laptop with TensorRT 7.

Hi @silva.jappoz,

Please share a repro ONNX model/scripts so we can try from our end for better assistance.

Thank you.

model_nchw.onnx (10.1 MB)

The code I use for inference:

std::vector<float> TRT_Engine::infer(cv::InputArrayOfArrays _imgs)
{
  std::vector<cv::Mat> imgs;
  _imgs.getMatVector(imgs);
  const size_t batch_size = imgs.size();

  // select the optimization profile matching the current batch size
  size_t op = selectOP(batch_size);

  context_->setOptimizationProfileAsync(op, stream_);
  CudaSafeCall(cudaStreamSynchronize(stream_));

  // binding names are per-profile, hence the per-profile name lookup
  int inputIndex  = engine_->getBindingIndex(input_name_per_profile_[op].c_str());
  int outputIndex = engine_->getBindingIndex(output_name_per_profile_[op].c_str());

  size_t num_inputs  = batch_size * c_ * w_ * h_;
  size_t num_outputs = batch_size * 1;
  std::vector<float> input_buffer_host(num_inputs);
  std::vector<float> output_host(num_outputs);
  size_t img_vol = w_ * h_ * c_;

  // one slot per engine binding; only the active profile's slots are filled
  std::vector<void *> buffers(engine_->getNbBindings(), nullptr);
  CudaSafeCall(cudaMalloc(&buffers[inputIndex],  num_inputs  * sizeof(float)));
  CudaSafeCall(cudaMalloc(&buffers[outputIndex], num_outputs * sizeof(float)));

  // resize each image and convert HWC -> CHW into the host buffer
  float* data = input_buffer_host.data();
  for (const auto& img : imgs)
  {
    cv::Mat resized;
    cv::resize(img, resized, cv::Size(w_, h_));  // cv::Size is (width, height)
    mat2chw(resized, data);
    data += img_vol;
  }

  CudaSafeCall(cudaMemcpyAsync(buffers[inputIndex],
                               input_buffer_host.data(),
                               num_inputs * sizeof(float),
                               cudaMemcpyHostToDevice, stream_));
  CudaSafeCall(cudaStreamSynchronize(stream_));

  // set the actual input shape for this batch, then run inference
  context_->setBindingDimensions(inputIndex, Dims4(batch_size, c_, h_, w_));
  context_->enqueueV2(buffers.data(), stream_, nullptr);
  CudaSafeCall(cudaStreamSynchronize(stream_));

  CudaSafeCall(cudaMemcpyAsync(output_host.data(),
                               buffers[outputIndex],
                               num_outputs * sizeof(float),
                               cudaMemcpyDeviceToHost, stream_));
  CudaSafeCall(cudaStreamSynchronize(stream_));

  CudaSafeCall(cudaFree(buffers[inputIndex]));
  CudaSafeCall(cudaFree(buffers[outputIndex]));

  return output_host;
}

where selectOP selects the most appropriate optimization profile for the current batch size (a sketch follows after the profile list below), and mat2chw converts a cv::Mat from HWC to CHW layout. Both functions have already been tested and shown to work correctly.

The image size is 96x96x3.
My GPU is a Quadro RTX 4000.
The batches I am optimizing for are:
[1,1,1], [1,2,3], [2,3,4], [3,4,5], [4,5,8], [5,8,16], [8,16,32], [16,32,64], [64,64,64].
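selectOP is essentially the following (a minimal sketch of the idea, not the exact code; profiles_ mirrors the [min, opt, max] list above):

  // Pick the profile whose [min, max] range contains batch_size,
  // preferring an exact match on the opt value.
  size_t TRT_Engine::selectOP(size_t batch_size) const
  {
    size_t best = 0;
    for (size_t i = 0; i < profiles_.size(); ++i)
    {
      const auto& p = profiles_[i];  // p = {min, opt, max}
      if (batch_size < p[0] || batch_size > p[2])
        continue;
      if (p[1] == batch_size)
        return i;  // exact opt match wins
      best = i;    // otherwise keep the last feasible profile
    }
    return best;
  }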

Moreover, I have found that the error always shows up after having switched to profile 0 ([1,1,1]) or profile 8 ([64,64,64]), regardless of which new profile is selected next.

E.g.
profile 1 used → OK
profile 2 used → OK
profile 5 used → OK
profile 0 used → OK
profile # → ERROR

up

Hi @silva.jappoz,

Could you please confirm whether you are using multiple execution contexts? Different contexts can't use the same profile at the same time.
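For reference, if concurrent execution is ever needed, the usual pattern is one execution context per profile, roughly like this (an illustrative sketch, not a drop-in implementation):

  // One execution context per profile; a given profile can be active in at
  // most one context at a time.
  std::vector<nvinfer1::IExecutionContext*> contexts;
  for (int p = 0; p < engine->getNbOptimizationProfiles(); ++p)
  {
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
    ctx->setOptimizationProfile(p);
    contexts.push_back(ctx);
  }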

Also, could you please share the complete inference script so we can try it from our end for better debugging.

Thank you.

No, I am not.

Attached is the complete code I use for inference…
forum_code_test.zip (4.2 KB)

Does TensorRT support switching between profiles within one context, or must we create a new context for each profile? Thanks.

Hi,

Switching between profiles should work, and we do not see any obvious errors in the code you've shared. Could you please share the verbose error logs for better debugging.

Thank you.

Hi, thanks for the help.
Here is a log file attached. I am also logging the output classification results per batch. You can see that the results are OK (i.e. in the range [0,1]) until the switch to the last profile occurs; after that they somehow explode to nonsense values.

BTW, I didn’t menage to output more verbose logs from tensorrt even by using the sample::gLogger from Nvidia examples with severity == kVERBOSE. Any help for this as well will be very much appreciated.

Regards

log.36159 (179.6 KB)

Hi,

We now have the 8.2 EA version available. Could you please try the latest TensorRT 8.2 EA and let us know if you still face this issue.

Thank you.