TensorRT support for multiple GPUs - URGENT

We are finding that the only way we can use TensorRT (7.2.3.4) on a GPU model we haven't used before is to rebuild the TensorRT engine on that GPU type first.

For example, our software works on an RTX 2070 Max-Q but didn't work on a GTX 1050 Ti. So we got hold of a 1050 Ti to build TRT on that machine, but the result didn't work on a 1050, so we had to buy a 1050 to build yet another version. We thought that our TensorRT engine built on a GTX 1660 would work on an RTX 2080 Ti, but it turned out we were wrong: it returns null when trying to load the engine into memory.

Is this lack of inter-GPU compatibility expected with TensorRT? If so, what is the bare minimum set of GPU models we would need to buy to support all of your GPUs above the GTX 1050?

@NVES @spolisetty Please help ASAP, as we have an unhappy customer because of our lack of RTX 2080 Ti support.

Hi,
The below links might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we would suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forums.

Thanks!

@NVES thank you for the prompt reply and for the links.

So does TensorRT not transfer well between GPUs without building on that specific GPU type?

My question wasn't about multi-threading/streaming, I don't think?

Hi,

Serialized engines are not portable across platforms or TensorRT versions. Engines are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version). It is recommended to build serialized engines on the target platforms directly.

Please refer to the below link for the same.
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#work

Thank you.

@spolisetty OK, understood. Can you recommend which GPUs to use that will support the largest number of other GPU models?

For example, if we build a TensorRT engine on an RTX 2070, should we then be able to support GPUs x, y, and z? Or do we always need to have the EXACT same GPU as every customer?

Also, if we were to build on the non-Ti version of a GPU, could we use TensorRT on the Ti version (or vice versa)? I.e., if we build on a GTX 1050, would TRT work on a 1050 Ti?
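For context on the Ti/non-Ti question: the relevant grouping is the GPU's CUDA compute capability (SM architecture), which Ti and non-Ti variants of the same family usually share. A small lookup sketch for the cards mentioned in this thread (values from NVIDIA's public specs; the table is illustrative, not exhaustive, and note that on TensorRT 7 matching compute capability is necessary but NOT sufficient for engine reuse across different models):

```python
# CUDA compute capability (SM version) of the GPUs discussed above.
# Caution: with TensorRT 7, two cards sharing a compute capability
# does NOT guarantee an engine built on one will load on the other.
COMPUTE_CAPABILITY = {
    "GTX 1050":    "6.1",  # Pascal (GP107)
    "GTX 1050 Ti": "6.1",  # Pascal (GP107)
    "GTX 1660":    "7.5",  # Turing (TU116)
    "RTX 2070":    "7.5",  # Turing (TU106)
    "RTX 2080 Ti": "7.5",  # Turing (TU102)
}

def same_architecture(gpu_a, gpu_b):
    """True if both GPUs share a compute capability; a necessary (but,
    on TensorRT 7, not sufficient) condition for reusing an engine."""
    return COMPUTE_CAPABILITY[gpu_a] == COMPUTE_CAPABILITY[gpu_b]
```

So a GTX 1050 and 1050 Ti are the same architecture (6.1), while the GTX 1660 and RTX 2080 Ti both report 7.5 yet, as the thread shows, still could not share an engine under TensorRT 7.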

@spolisetty @NVES is there anything you can recommend that would let us benefit from the faster inference TensorRT provides but that is easier to port between machines? For example, should we be using ONNX or something?

Hi,

I believe this may not work reliably. It is always recommended to build the engine on the same host where inference will run, even with the same type of GPU.

Yes, you can use ONNX to port the model across platforms and then use it to build the TensorRT engine on each target.

Please refer to the support matrix for more info on TensorRT hardware/software requirements:
https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix

Thank you.