TensorRT more resource hungry than 7.X?


I have some multi-threaded code based on TensorRT for evaluating fully convolutional networks. I recently upgraded to a newer TensorRT version (I’d previously used several 7.XXX versions successfully). The code creates N copies of the model per GPU (one execution thread per copy) by de-serialising, calling createExecutionContext, etc. separately for each copy in a for loop (C++). Previously I could set N=3 on my laptop (single GPU, RTX5000 mobile) and all was fine. With the new TensorRT version it fails with N=3, either at the createExecutionContext stage (for the 3rd model; the first 2 are fine) or when the code comes to use the models. N=2 works fine on this machine even with the new version. It seems the new version creates instances that are more resource hungry than previous versions, so I can’t use 3 per GPU any more (or is there another explanation?). The only difference in the code is that I had to remove the following line, as the method no longer exists:

builder->setMaxWorkspaceSize(1 << 20);

I’ve no idea if this is related. If so, is there another way to control GPU resource usage?

Note: The code is largely stolen from the sampleOnnxMNIST example (or rather the version of that in TRT 7.X)
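
To make the setup concrete, the creation loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ModelInstance`, `InstanceFactory` and `createInstances` are made-up names, not TensorRT API. In the real code the factory would de-serialise the engine and call createExecutionContext, returning a null pointer when either step fails.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for one model copy (hypothetical type). In the real code this
// would own the de-serialised engine and its execution context.
struct ModelInstance {
    int index;
};

// Hypothetical factory: returns nullptr when a copy cannot be created,
// for example when creating the execution context fails.
using InstanceFactory = std::function<std::unique_ptr<ModelInstance>(int)>;

// Create up to n copies one after another, stopping at the first failure.
// The returned vector holds only the copies that were created successfully,
// so its size is also the index of the first copy that failed (if any).
std::vector<std::unique_ptr<ModelInstance>> createInstances(
    int n, const InstanceFactory& make) {
    assert(n >= 0);
    std::vector<std::unique_ptr<ModelInstance>> instances;
    for (int i = 0; i < n; ++i) {
        std::unique_ptr<ModelInstance> inst = make(i);
        if (!inst) {
            break;  // copy i could not be created; keep the earlier ones
        }
        instances.push_back(std::move(inst));
    }
    return instances;
}
```

With a factory that fails on its third call, `createInstances(3, make)` returns two instances, which mirrors the N=3 failure described above. Each returned instance would then be handed to its own execution thread.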


TensorRT Version:
GPU Type: RTX5000 (mobile)
Nvidia Driver Version: 531.14
CUDA Version: 12.1
CUDNN Version: - (although I didn’t explicitly compile against this, the DLLs are present)
Operating System + Version: Windows 10
Python Version (if applicable): n/a
TensorFlow Version (if applicable): n/a
PyTorch Version (if applicable): n/a
Baremetal or Container (if container which image + tag): no container, native code

Steps To Reproduce

It’s not a bug as such, just a limitation/advice seeking.
