TensorRT dynamic shape error: [slot.h::decode::151] Error Code 2: Internal Error (Assertion index < nbSlots failed.invalid encoded reference to a slot)

Description

I get this error when using TensorRT to run ResNet50 inference with dynamic batch sizes.
I set multiple dynamic shape profiles: [[1, 1, 1], [2, 2, 2], [4, 4, 4], [6, 6, 6], [8, 8, 8], [10, 10, 10], [16, 16, 16], [32, 32, 32], [1, 64, 128]], where each entry is a [min, opt, max] batch triple. Except for the last profile, each one is optimized for a single specific batch size.
When running inference on dynamic batch input, I switch between profiles as needed. However, the second time I switch to the last profile from another profile, I get this error and the profile switch fails. Switching between all profiles except the last one works without failure.
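For reference, the profiles are registered at build time roughly like this (a minimal sketch, not my exact code; builder and config are the usual IBuilder/IBuilderConfig, and "input" stands in for my actual input tensor name):

  // Register one optimization profile per [min, opt, max] batch triple.
  std::vector<std::array<int, 3>> batches = {
      {1, 1, 1}, {2, 2, 2}, {4, 4, 4}, {6, 6, 6}, {8, 8, 8},
      {10, 10, 10}, {16, 16, 16}, {32, 32, 32}, {1, 64, 128}};
  for (const auto& b : batches) {
    nvinfer1::IOptimizationProfile* profile = builder->createOptimizationProfile();
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(b[0], 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(b[1], 3, 224, 224));
    profile->setDimensions("input", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(b[2], 3, 224, 224));
    config->addOptimizationProfile(profile);
  }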


Environment

TensorRT Version: both 7.0 and 8.0
GPU Type: V100
Nvidia Driver Version: CUDA 11.0 driver
CUDA Version: 10.2
CUDNN Version: 8.2
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7.5
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files


Steps To Reproduce


Hi,

Could you please share a repro script/model with us so we can try it from our end for better debugging.

Thank you.

The model is resnet50.pb with input shape [-1, 3, 224, 224]; the .pb model is converted to ONNX with opset=11 and then converted to a TRT engine.
The C++ inference code looks like this:

  int curProfileId = trtEngine.engineContext->getOptimizationProfile();
  if (curProfileId != profileId) {
    bool status = trtEngine.engineContext->setOptimizationProfile(profileId);
  }
  // call setBindingDimensions to set the binding shape
  // call executeV2 to execute
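One point I want to confirm: my understanding is that with multiple profiles the engine exposes a separate copy of each binding per profile, so the shape setting and execution steps look roughly like this (a sketch, not my exact code; it assumes the input is binding 0 within each profile):

  // With K profiles the engine has K copies of each binding; binding i of
  // profile j lives at index j * bindingsPerProfile + i.
  int bindingsPerProfile = trtEngine.engine->getNbBindings() / trtEngine.engine->getNbOptimizationProfiles();
  int inputIndex = profileId * bindingsPerProfile;  // input assumed to be binding 0
  trtEngine.engineContext->setBindingDimensions(inputIndex, nvinfer1::Dims4(batchSize, 3, 224, 224));
  bool ok = trtEngine.engineContext->executeV2(buffers.data());  // buffers sized to getNbBindings()

Please correct me if this indexing is wrong.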

The dynamic batch profiles are: [[1, 1, 1], [2, 2, 2], [4, 4, 4], [6, 6, 6], [8, 8, 8], [10, 10, 10], [16, 16, 16], [32, 32, 32], [1, 64, 128]]. Only the last profile covers a range of batch sizes.
For each input shape I choose the best profile id. However, the second time I switch to the last profile, setOptimizationProfile fails. When switching between the other profiles (each optimized for a single batch size), no error occurs on TensorRT 8.0, but errors sometimes occur on TensorRT 7.0.

Could you please give me a C++ code example of how to use dynamic batch sizes? I didn't find one in the TRT samples. Thank you very much.

Hi @luchangli,

Please refer to the following sample and the developer guide. Hope this helps.

Thank you.

In sampleDynamicReshape, there seems to be just one profile. Do you have a multi-profile version? With only one profile, performance cannot be optimal for the other batch sizes. And what is the best practice for using multiple profiles?
Thank you very much.

Hi,

Currently we do not have an example for multiple profiles, apart from the following input: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

Thank you.

Hi,
I have exactly the same problem with a MobileNet V2 with dynamic batch, TensorRT 8, and CUDA 11.3.

In my case, the model runs fine for a while and then, after this error appears, the network starts to output garbage.
The error always happens after a profile switch, usually after switching from the very first profile [1,1,1].

The same network ran fine on a different laptop with TensorRT 7.

Hi @silva.jappoz,

Please share a repro ONNX model/scripts so we can try from our end for better assistance.

Thank you.

model_nchw.onnx (10.1 MB)

The code I use for inference:

std::vector<float> TRT_Engine::infer(cv::InputArrayOfArrays _imgs)
{
  std::vector<cv::Mat> imgs;
  _imgs.getMatVector(imgs);
  const size_t batch_size = imgs.size();

  // select the optimization profile matching the current batch size
  size_t op = selectOP(batch_size);

  context_->setOptimizationProfileAsync(op, stream_);
  CudaSafeCall(cudaStreamSynchronize(stream_));

  // binding names are per-profile, hence the per-profile name lookup
  int inputIndex  = engine_->getBindingIndex(input_name_per_profile_[op].c_str());
  int outputIndex = engine_->getBindingIndex(output_name_per_profile_[op].c_str());

  size_t num_inputs  = batch_size * c_ * w_ * h_;
  size_t num_outputs = batch_size * 1;
  std::vector<float> input_buffer_host(num_inputs);
  std::vector<float> output_host(num_outputs);
  size_t img_vol = w_ * h_ * c_;

  // one slot per engine binding; only the active profile's slots are filled
  std::vector<void *> buffers(engine_->getNbBindings(), nullptr);
  CudaSafeCall(cudaMalloc(&buffers[inputIndex],  num_inputs  * sizeof(float)));
  CudaSafeCall(cudaMalloc(&buffers[outputIndex], num_outputs * sizeof(float)));

  // resize each image and convert HWC -> CHW into the host buffer
  float* data = input_buffer_host.data();
  for (const auto& img : imgs)
  {
    cv::Mat resized;
    cv::resize(img, resized, cv::Size(w_, h_));  // cv::Size is (width, height)
    mat2chw(resized, data);
    data += img_vol;
  }

  CudaSafeCall(cudaMemcpyAsync(buffers[inputIndex],
                               input_buffer_host.data(),
                               num_inputs * sizeof(float),
                               cudaMemcpyHostToDevice, stream_));
  CudaSafeCall(cudaStreamSynchronize(stream_));

  // set the actual input shape for this batch, then run inference
  context_->setBindingDimensions(inputIndex, Dims4(batch_size, c_, h_, w_));
  context_->enqueueV2(buffers.data(), stream_, nullptr);
  CudaSafeCall(cudaStreamSynchronize(stream_));

  CudaSafeCall(cudaMemcpyAsync(output_host.data(),
                               buffers[outputIndex],
                               num_outputs * sizeof(float),
                               cudaMemcpyDeviceToHost, stream_));
  CudaSafeCall(cudaStreamSynchronize(stream_));

  CudaSafeCall(cudaFree(buffers[inputIndex]));
  CudaSafeCall(cudaFree(buffers[outputIndex]));

  return output_host;
}

where selectOP selects the most appropriate optimization profile for the current batch size (a sketch follows after the profile list below), and mat2chw converts a cv::Mat from HWC to CHW layout. Both functions have already been tested and shown to work correctly.

The image size is 96x96x3.
My GPU is a Quadro RTX 4000.
The batches I am optimizing for are:
[1,1,1], [1,2,3], [2,3,4], [3,4,5], [4,5,8], [5,8,16], [8,16,32], [16,32,64], [64,64,64].
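selectOP is essentially the following (a minimal sketch of the idea, not the exact code; profiles_ mirrors the [min, opt, max] list above):

  // Pick the profile whose [min, max] range contains batch_size,
  // preferring an exact match on the opt value.
  size_t TRT_Engine::selectOP(size_t batch_size) const
  {
    size_t best = 0;
    for (size_t i = 0; i < profiles_.size(); ++i)
    {
      const auto& p = profiles_[i];  // p = {min, opt, max}
      if (batch_size < p[0] || batch_size > p[2])
        continue;
      if (p[1] == batch_size)
        return i;  // exact opt match wins
      best = i;    // otherwise keep the last feasible profile
    }
    return best;
  }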

Moreover, I have found that the error always shows up after having switched to profile 0 ([1,1,1]) or profile 8 ([64,64,64]), regardless of which new profile is selected next.

E.g.
profile 1 used → OK
profile 2 used → OK
profile 5 used → OK
profile 0 used → OK
profile # → ERROR

up

Hi @silva.jappoz,

Could you please confirm whether you are using multiple execution contexts? Different contexts can't use the same profile at the same time.
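For reference, if concurrent execution is ever needed, the usual pattern is one execution context per profile, roughly like this (an illustrative sketch, not a drop-in implementation):

  // One execution context per profile; a given profile can be active in at
  // most one context at a time.
  std::vector<nvinfer1::IExecutionContext*> contexts;
  for (int p = 0; p < engine->getNbOptimizationProfiles(); ++p)
  {
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
    ctx->setOptimizationProfile(p);
    contexts.push_back(ctx);
  }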

Also, could you please share the complete inference script so we can try it from our end for better debugging.

Thank you.

No, I am not.

Attached is the complete code I use for inference…
forum_code_test.zip (4.2 KB)

Does TensorRT support switching between profiles within one context, or must we create a new context for each profile? Thanks.

Hi,

Switching between profiles should work, and we do not see any obvious errors in the code you've shared. Could you please share the verbose error logs for better debugging.

Thank you.

Hi, thanks for the help.
Here is a log file attached. I am also logging the output classification results per batch. You can see that the results are OK (i.e. in the range [0,1]) until the switch to the last profile occurs; after that they somehow explode to nonsense values.

BTW, I didn’t menage to output more verbose logs from tensorrt even by using the sample::gLogger from Nvidia examples with severity == kVERBOSE. Any help for this as well will be very much appreciated.

Regards

log.36159 (179.6 KB)

Hi,

We now have the 8.2 EA version available. Could you please try the latest TensorRT 8.2 EA and let us know if you still face this issue.

Thank you.