Dear Forum or Mods,
When I convert a model to a .plan file with TensorRT, the result is GPU-specific because of the auto-tuning. Have you ever evaluated how a generated .plan file performs on a different GPU of the same architecture? E.g. two GPUs with GPU_ARCH “75” (like the RTX 2080 and RTX 2070). Will there be any difference at all? Will it be slower or less accurate?
The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version) and must be re-targeted to the specific GPU you want to run them on.
Please refer to the below link for the same,
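To make the rule in this reply concrete, here is a minimal sketch of a check you could run before loading a plan. It assumes you save the build GPU name, compute capability and TensorRT version next to the .plan file yourself; `BuildInfo` and `plan_compatibility` are hypothetical names, not part of TensorRT, and the values you pass in would come from whatever you use to query the GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildInfo:
    """Hypothetical record stored alongside a .plan file (not a TensorRT type)."""
    gpu_name: str            # e.g. "RTX 2080"
    compute_capability: str  # e.g. "7.5"
    tensorrt_version: str    # whatever version string you recorded at build time


def plan_compatibility(built: BuildInfo, target: BuildInfo) -> str:
    """Classify a plan according to the portability rule stated in the reply."""
    if built.tensorrt_version != target.tensorrt_version:
        return "rebuild: TensorRT version differs"
    if built.compute_capability != target.compute_capability:
        return "rebuild: compute capability differs"
    if built.gpu_name != target.gpu_name:
        # Same architecture but a different GPU model: the plan may load,
        # but per the reply it was tuned for another GPU and should be rebuilt.
        return "rebuild recommended: same architecture, different GPU model"
    return "ok: same GPU model and TensorRT version"
```

The third branch is the case asked about in this thread: the plan was tuned on a different GPU model of the same architecture, so the safe answer is still to rebuild.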
I just tried the same .plan file on two GPUs of the same GPU_ARCH (RTX 2080 Ti and RTX 6000): I built it on one and ran it on the other, and also built and ran it on the same GPU. I was not able to spot any difference in throughput or accuracy. Is this just chance, or can I assume nearly the same behaviour on two GPUs of the same generation and GPU_ARCH?
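For anyone repeating this comparison, a rough way to tell a real slowdown from run-to-run noise is to compare median latencies over many runs. The sketch below assumes you have already collected per-inference latencies in milliseconds for the "built here" and "built elsewhere" plans; the function names and the 2% tolerance are my own arbitrary choices, not anything TensorRT prescribes.

```python
import statistics


def relative_median_difference(baseline_ms, candidate_ms):
    """Return (median(candidate) - median(baseline)) / median(baseline)."""
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return (cand - base) / base


def within_noise(baseline_ms, candidate_ms, tolerance=0.02):
    """True if the medians differ by no more than `tolerance` (assumed 2%)."""
    return abs(relative_median_difference(baseline_ms, candidate_ms)) <= tolerance
```

Pass the latencies of the natively built plan as `baseline_ms` and those of the plan built on the other GPU as `candidate_ms`. The median is used instead of the mean so that a few warm-up outliers do not dominate the result.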