TensorRT 8.6.1.6 more resource hungry than 7.X?

Description

I have some multi-threaded code based on TensorRT for evaluating fully convolutional networks. I recently upgraded to TensorRT 8.6.1.6 from 7.2.3.4 (I’d previously used several 7.X versions successfully). The code creates N copies of the model per GPU (one execution thread per copy) by de-serialising, calling createExecutionContext, etc. separately for each copy in a for loop (C++). Previously I could set N=3 on my laptop (single GPU, RTX5000 mobile) and all was fine. With TensorRT 8.6.1.6 it fails with N=3, either at the createExecutionContext stage (on the 3rd model; the first 2 are fine) or when the code comes to use the models. N=2 works fine on this machine even with TensorRT 8.6.1.6. It seems the new version is creating instances that are more resource hungry than previous versions, so I can’t use 3 per GPU any more (or is there another explanation?). The only difference in the code is that I had to remove the following line, as the method no longer exists:

builder->setMaxWorkspaceSize(1 << 20);

I’ve no idea if this is related. If so, is there another way to control GPU resource usage?

Note: The code is largely stolen from the sampleOnnxMNIST example (or rather the version of that in TRT 7.X)

Environment

TensorRT Version: 8.6.1.6
GPU Type: RTX5000 (mobile)
Nvidia Driver Version: 531.14
CUDA Version: 12.1
CUDNN Version: 8.9.0.131_cuda12 (although I didn’t explicitly compile against this, the DLLs are present)
Operating System + Version: Windows 10
Python Version (if applicable): n/a
TensorFlow Version (if applicable): n/a
PyTorch Version (if applicable): n/a
Baremetal or Container (if container which image + tag): no container, native code

Steps To Reproduce

It’s not a bug as such, just a limitation/advice seeking.

TensorRT 8.x may select different kernels (tactics) and use different default settings than 7.x, which can increase the memory each engine and execution context needs.
Changes in the CUDA driver or in dependent libraries may also contribute to higher resource demands.

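The IBuilder method you removed no longer exists in 8.x, and its IBuilderConfig counterpart, IBuilderConfig::setMaxWorkspaceSize(), is deprecated. In TensorRT 8.x, workspace size can be controlled using IBuilderConfig::setMemoryPoolLimit() with MemoryPoolType::kWORKSPACE. If no limit is set, the workspace pool defaults to the full device memory, so removing the old line without a replacement also removes the 1 MiB cap you had before.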
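As a rough sketch of that replacement, assuming your builder code still follows the sampleOnnxMNIST layout and already has an nvinfer1::IBuilderConfig: the names mib, kWorkspaceBytes and capWorkspace below are hypothetical helpers made up for illustration, not TensorRT APIs, and the 1 MiB value simply mirrors the line you removed. The TensorRT part is guarded so the file still compiles on a machine without the headers.

```cpp
#include <cstddef>

// Hypothetical helper (not part of TensorRT): convert MiB to bytes.
constexpr std::size_t mib(std::size_t n) { return n << 20; }

// 1 MiB mirrors the removed `1 << 20`. This is an assumption; tune it for your network.
constexpr std::size_t kWorkspaceBytes = mib(1);

#if __has_include(<NvInfer.h>) // build the TensorRT part only where the headers exist
#include <NvInfer.h>

// Hypothetical wrapper: cap the workspace pool on an existing builder config.
// setMemoryPoolLimit() and MemoryPoolType::kWORKSPACE are the TensorRT 8.x calls.
inline void capWorkspace(nvinfer1::IBuilderConfig& config,
                         std::size_t bytes = kWorkspaceBytes)
{
    config.setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, bytes);
}
#endif
```

You would call capWorkspace(*config) before the engine is built. This is a build-time setting, so an engine that was already serialized without a limit would need to be rebuilt for it to have any effect. A cap as small as 1 MiB may also make the builder skip tactics that need more scratch memory, so it may be worth trying a few values.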