We would like to convert models trained with TAO and save them as TensorRT engine (.engine) files.
These .engine files will be used with Triton Inference Server Docker containers for inference, on the same host machine (and the same GPU) on which the models were built. We plan to upgrade the Triton Docker base image as new images are released, but we would like to keep using the already-converted models (.engine files) with the new image versions whenever possible. We would therefore like to know which specific differences between the model build environment and the inference environment are acceptable, and which would result in mismatch issues.
The TensorRT documentation (Python API > Building an Engine, saving a .engine file) states the following:
Serialized engines are not portable across platforms or TensorRT versions. Engines are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version).
We would like to know:
- What is referred to as "platform" here? (What constitutes it? Does it include changes in the CUDA version, cuDNN version, etc.?)
- TensorRT versions: do the build and inference environments have to match the exact version, or only the major (or major.minor) version?
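In the meantime, here is the kind of guard we are considering on the inference side: record the build environment next to each .engine file and refuse to load the engine when it differs. This is only a sketch under our own assumptions; the helper names (`save_engine_metadata`, `check_engine`) and the major.minor comparison policy are hypothetical, not something the TensorRT documentation prescribes, and the safest default is still an exact-version match.

```python
import json
from pathlib import Path


def save_engine_metadata(engine_path, trt_version, gpu_name, cuda_version):
    """Write a JSON sidecar next to the .engine file recording the build
    environment (hypothetical helper, not part of TensorRT or Triton)."""
    meta = {"tensorrt": trt_version, "gpu": gpu_name, "cuda": cuda_version}
    sidecar = Path(engine_path).with_suffix(".json")
    sidecar.write_text(json.dumps(meta))


def versions_compatible(build_ver, runtime_ver, level="major_minor"):
    """Compare dotted version strings. 'exact' requires a full match;
    'major_minor' compares only the first two components (an assumption
    on our part, pending a definitive answer)."""
    b, r = build_ver.split("."), runtime_ver.split(".")
    if level == "exact":
        return b == r
    return b[:2] == r[:2]


def check_engine(engine_path, runtime_trt_version, runtime_gpu):
    """Validate the recorded build environment against the runtime
    environment before attempting to deserialize the engine."""
    meta = json.loads(Path(engine_path).with_suffix(".json").read_text())
    if meta["gpu"] != runtime_gpu:
        return False, "GPU model differs from build environment; rebuild"
    if not versions_compatible(meta["tensorrt"], runtime_trt_version):
        return False, "TensorRT version mismatch; rebuild the engine"
    return True, "ok"
```

With this in place, upgrading the Triton image would fail fast with a clear message instead of a deserialization error, and we could relax the comparison from exact to major.minor once we know which differences are actually acceptable.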