Question regarding TensorRT engine build vs. inference environment (TensorRT version, platform, etc.)

Description

We would like to convert models trained with TAO and save them as TensorRT engine files.
These saved .engine files will be used with Triton Inference Server Docker containers for inferencing (on the same host machine on which the models were built, i.e. the same GPU). We plan to upgrade the Triton Docker base image as new images are released, but we would like to keep using the already converted models (.engine files) with the new image versions whenever possible. Therefore, we would like to know which specific differences between the model build environment and the inference environment are acceptable, and which would result in mismatch issues.

The TensorRT documentation (Python API > Building an Engine and saving the .engine file) states the following:

Serialized engines are not portable across platforms or TensorRT versions. 
Engines are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version).

We would like to know:

  • What is referred to as "platform" here? What constitutes it, and does it include changes in the CUDA version, cuDNN version, etc.?
  • TensorRT versions: do the build and inference environments have to match the exact version, or only the major (or major.minor) version?

Hi,
Can you try running your model with the trtexec command and share the "--verbose" log in case the issue persists?

You can refer to the link below for the full list of supported operators; if any operator is not supported, you need to create a custom plugin to support that operation.

Also, we request you to share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the links below:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#error-messaging
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#faq

Thanks!

Thanks for the quick reply.

I think you have misunderstood my question. I am not facing an issue at the moment. What I described above is a question about the definitions used in the TensorRT documentation. It would be great to clarify those definitions so that we can add validation checks to our code to verify that the converted engines are compatible with the inference environment.
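For context, the kind of validation check we have in mind is something like the sketch below: record the build environment next to the serialized engine, and compare it against the inference environment at load time. All field names here are our own invention, and the exact-match rule for every field is the assumption we are asking you to confirm. At runtime the values could come from, e.g., `tensorrt.__version__`, `pycuda.driver.Device(0).name()`, and `platform.platform()`.

```python
import json

def record_build_env(path, trt_version, gpu_name, os_name):
    """Write build-environment metadata next to the serialized .engine file."""
    with open(path, "w") as f:
        json.dump({"trt_version": trt_version,
                   "gpu_name": gpu_name,
                   "os": os_name}, f)

def check_inference_env(path, trt_version, gpu_name, os_name):
    """Return the list of fields that differ between the recorded build
    environment and the current inference environment (empty == compatible)."""
    with open(path) as f:
        built = json.load(f)
    current = {"trt_version": trt_version, "gpu_name": gpu_name, "os": os_name}
    # Assumption pending clarification: every field must match exactly.
    return [k for k in sorted(built) if built.get(k) != current.get(k)]
```

Whether the `trt_version` comparison should be exact or only major/major.minor is precisely the point we would like clarified.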

The TensorRT documentation (Python API > Building an Engine and saving the .engine file) states the following:

Serialized engines are not portable across platforms or TensorRT versions. 
Engines are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version).

We would like to know:

  • What is referred to as "platform" here? What constitutes it, and does it include changes in the CUDA version, cuDNN version, etc.?
  • TensorRT versions: do the build and inference environments have to match the exact version, or only the major (or major.minor) version?

Thanks a lot.

Hi,

Sorry for the delayed response.

Platform refers to the OS here. However, we have to make sure all of the above (GPU model, platform, and TensorRT version) match. Also, the TensorRT versions should be exactly the same.

Thank you.