Are TensorRT plan files portable across different GPUs of the same type?

Hi all!

The SDK documentation says: “The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to platforms and the TensorRT version) and must be re-targeted to the specific GPU in case you want to run them on a different GPU.”

I am confused about what counts as a “different GPU”. Does a different physical GPU of the same model count as a different GPU?

I ran the following experiment, and the results seem odd.

With a GTX 1080 Ti in FP32 mode:
Without a plan file: 123.55 FPS.
With a plan file generated on GPU-0, running inference on GPU-0: 123.94 FPS.
With a plan file generated on GPU-0, running inference on GPU-1: 126.67 FPS, plus the message “WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.”

With a GTX 1080 Ti in INT8 mode:
Without a plan file: 211 FPS.
With a plan file generated on GPU-0, running inference on GPU-0: 195 FPS.
With a plan file generated on GPU-0, running inference on GPU-1: 194 FPS, plus the same message “WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.”

The odd part is that in INT8 mode, inference with the plan file is slower than without it, while in FP32 mode the plan file works fine.

Besides that, what does “WARNING: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.” mean? I am porting the plan file between GPUs of the same model, so why do I get this warning?

Hi,

Using the same plan file across multiple GPUs of the same type (1080 Tis in your case) is fine. That WARNING is just a catch-all to warn users not to use the same plan file on a 1080, a 1070, a T4, a V100, etc. and expect it to work the same. Note, however, that TensorRT also takes into account the memory available on the device at the time the plan file is created. So even if you're using the same GPU model (1080 Ti), if the one you built the plan on had background processes taking up half of the GPU's memory, you could see different performance than if all of the memory had been free at build time.
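Because the plan encodes tactics chosen for the device it was built on, the usual pattern is to build and serialize once per target GPU model and deserialize on the matching device. As a rough sketch (not from this thread; it assumes the standard TensorRT Python bindings, and "model.plan" is a placeholder path), loading a plan on the current device looks like:

```python
# Sketch only: deserialize a serialized TensorRT engine ("plan") on the
# currently active GPU. The device-mismatch WARNING discussed above is
# emitted through the logger during deserialization.
try:
    import tensorrt as trt
except ImportError:
    trt = None  # TensorRT not installed; keep the sketch importable anyway


def load_plan(path):
    """Deserialize a plan file for the GPU that is currently active."""
    if trt is None:
        raise RuntimeError("TensorRT is required to deserialize a plan file")
    logger = trt.Logger(trt.Logger.WARNING)
    with open(path, "rb") as f:
        return trt.Runtime(logger).deserialize_cuda_engine(f.read())
```

The safest deployment is one plan per GPU model (and per TensorRT version), rather than copying a single plan everywhere.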

Regarding performance (speed), INT8 optimization is not guaranteed to improve performance; it is very dependent on the model. When INT8 optimization isn't possible for a layer, TensorRT falls back to FP16, and then to FP32 if need be. You can get a rough idea of how much fallback occurred from the file size of your plans: sometimes a model you try to optimize for INT8 ends up roughly the same size as the FP32 version, meaning it mostly fell back to higher precisions in order to maintain accuracy.
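To illustrate that file-size heuristic (a sketch of my own; the plan paths and thresholds are made-up assumptions, not anything TensorRT reports): a fully INT8-quantized engine stores weights in 1 byte instead of 4, so an INT8 plan that is nearly the same size as the FP32 plan suggests heavy fallback to higher precision.

```python
import os


def int8_fallback_hint(int8_plan_path, fp32_plan_path):
    """Rough heuristic: compare serialized engine file sizes.

    A fully INT8-quantized engine stores weights in 1 byte instead of 4,
    so a size ratio near 1.0 suggests most layers fell back to FP16/FP32.
    The 0.9 / 0.4 cutoffs are arbitrary illustrative thresholds.
    """
    ratio = os.path.getsize(int8_plan_path) / os.path.getsize(fp32_plan_path)
    if ratio > 0.9:
        return ratio, "mostly fell back to higher precision"
    if ratio < 0.4:
        return ratio, "largely quantized to INT8"
    return ratio, "mixed precision"
```

This is only a hint, not a measurement; profiling the engine layer by layer is the definitive way to see which precisions were actually chosen.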

Thanks,
NVIDIA Enterprise Support

Funnily enough, I do get that kind of warning when I disconnect the physical monitor from my devices (Jetson Nanos). When the monitor is attached, I do not get it. I built everything while the monitor was connected; could that be the reason?
Should I rebuild everything without a monitor (using only VNC) to avoid that warning?

Best regards, Walter