Q: If I build the engine on one GPU and run the engine on another GPU, does this work?
A: We recommend that you don't; however, if you do, you'll need to follow these guidelines:
The major, minor, and patch versions of TensorRT must match between systems. This ensures you are picking kernels that are still present and have not undergone certain optimizations or bug fixes that would change their behavior.
The CUDA compute capability major and minor versions must match between systems. This ensures that the same hardware features are present so the kernel does not fail to execute. For example, mixing cards with different precision capabilities can cause a kernel built for one card to fail on the other.
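The two hard requirements above can be sketched as a pre-deployment check. This is a minimal illustration, not part of TensorRT: `SystemInfo`, `parse_version`, and `check_portability` are hypothetical names, and the sketch assumes you have already collected each system's TensorRT version string (for example from `tensorrt.__version__`) and its compute capability as a `(major, minor)` pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemInfo:
    """Hypothetical record describing one system; not a TensorRT type."""
    trt_version: str           # dotted version string, e.g. "8.6.1"
    compute_capability: tuple  # (major, minor), e.g. (8, 6)


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor, patch) integers of a dotted version.

    Assumes the first three components are purely numeric.
    """
    return tuple(int(part) for part in version.split(".")[:3])


def check_portability(build: SystemInfo, run: SystemInfo) -> list:
    """Return a list of human-readable problems; empty if both rules hold."""
    problems = []
    if parse_version(build.trt_version) != parse_version(run.trt_version):
        problems.append(
            f"TensorRT version mismatch: {build.trt_version} vs {run.trt_version}"
        )
    if tuple(build.compute_capability) != tuple(run.compute_capability):
        problems.append(
            "Compute capability mismatch: "
            f"{build.compute_capability} vs {run.compute_capability}"
        )
    return problems
```

An empty result only means these two rules are satisfied; it says nothing about the remaining device properties.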
The following properties should match between systems:
- Maximum GPU graphics clock speed
- Maximum GPU memory clock speed
- GPU memory bus width
- Total GPU memory
- GPU L2 cache size
- SM processor count
- Asynchronous engine count
If any of the previous properties do not match, you receive the following warning: "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
If you still want to proceed, then you should build the engine on the smallest SKU in the family because autotuner choices made on smaller GPUs generalize better.
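As a rough sketch of how you might compare the listed device properties yourself before deploying, the snippet below works on plain dictionaries. The field names, `mismatched_properties`, and `smallest_sku` are all hypothetical and not a TensorRT or CUDA API; you would fill the dictionaries from whatever device query you already use. Treating "smallest SKU" as the GPU with the fewest SMs (ties broken by total memory) is also an assumption made for illustration.

```python
# Hypothetical field names, one per property in the list above.
MATCH_FIELDS = (
    "max_graphics_clock_mhz",
    "max_memory_clock_mhz",
    "memory_bus_width_bits",
    "total_memory_bytes",
    "l2_cache_bytes",
    "sm_count",
    "async_engine_count",
)


def mismatched_properties(build_gpu: dict, run_gpu: dict) -> list:
    """Return the names of listed properties that differ between two GPUs."""
    return [
        field for field in MATCH_FIELDS
        if build_gpu.get(field) != run_gpu.get(field)
    ]


def smallest_sku(gpus: list) -> dict:
    """Pick a build GPU from a family.

    Assumption: "smallest" means fewest SMs, then least total memory.
    """
    return min(gpus, key=lambda g: (g["sm_count"], g["total_memory_bytes"]))
```

If `mismatched_properties` returns a non-empty list, expect the warning quoted above when the engine is loaded on the second GPU.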