I am trying to optimize a model with the TensorRT executable trtexec on a server-grade NVIDIA Tesla GPU, and I want to run the optimized model on an NVIDIA Jetson NX.
However, from this post it seems that TensorRT-optimized models are specific not only to the TensorRT version but also to the type of GPU used to build them.
Still, I would like to know whether there is a way to port a TensorRT-optimized model from a server-grade GPU to the Jetson by generating a device-independent optimized model. Alternatively, is there a way to simulate a Jetson NX on an Ubuntu server so that the server can produce such a model?
Q: If I build the engine on one GPU and run the engine on another GPU, does this work?
A: We recommend that you don’t; however if you do, you’ll need to follow these guidelines:
The major, minor, and patch versions of TensorRT must match between systems. This ensures you are picking kernels that are still present and have not undergone certain optimizations or bug fixes that would change their behavior.
The CUDA compute capability major and minor versions must match between systems. This ensures that the same hardware features are present so the kernel does not fail to execute. An example would be mixing cards with different precision capabilities.
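The two hard requirements above can be written down as a simple check. This is a minimal sketch, not part of TensorRT: the DeviceInfo type and the hard_requirement_problems function are hypothetical, the TensorRT version strings are made up, and the example assumes a Tesla T4 (compute capability 7.5) as the build machine and a Jetson Xavier NX (compute capability 7.2) as the target.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical record describing a machine an engine is built or run on."""
    name: str
    tensorrt_version: str                # e.g. "8.5.2" or "8.5.2.2"
    compute_capability: tuple[int, int]  # (major, minor)


def version_triple(version: str) -> tuple[int, int, int]:
    """Keep only the major, minor, and patch components of a version string."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def hard_requirement_problems(build: DeviceInfo, target: DeviceInfo) -> list[str]:
    """Return the hard-requirement mismatches; an empty list means both rules hold."""
    problems = []
    if version_triple(build.tensorrt_version) != version_triple(target.tensorrt_version):
        problems.append(
            f"TensorRT version mismatch: {build.tensorrt_version} vs {target.tensorrt_version}"
        )
    if build.compute_capability != target.compute_capability:
        problems.append(
            f"compute capability mismatch: {build.compute_capability} vs {target.compute_capability}"
        )
    return problems


# Illustrative values only; the TensorRT versions are hypothetical.
server = DeviceInfo("Tesla T4", "8.5.2", (7, 5))
jetson = DeviceInfo("Jetson Xavier NX", "8.5.2", (7, 2))

for problem in hard_requirement_problems(server, jetson):
    print(problem)  # → compute capability mismatch: (7, 5) vs (7, 2)
```

Under these assumptions the check fails on compute capability no matter which TensorRT version is installed on either side, which is why an engine built on such a server cannot simply be copied to the Jetson.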
The following properties should match between systems:
– Maximum GPU graphics clock speed
– Maximum GPU memory clock speed
– GPU memory bus width
– Total GPU memory
– GPU L2 cache size
– SM processor count
– Asynchronous engine count
If any of the previous properties do not match, you receive the following warning: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
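The "should match" rule and the resulting warning can be mimicked in a few lines. This is a hypothetical sketch: TensorRT performs its own check internally, and the property keys and numbers here are invented for illustration. On a real system the values would come from the CUDA device properties (for example via cudaGetDeviceProperties).

```python
from __future__ import annotations

from typing import Optional

# Keys mirror the property list above; the key names themselves are hypothetical.
SOFT_PROPERTIES = (
    "max_graphics_clock_mhz",
    "max_memory_clock_mhz",
    "memory_bus_width_bits",
    "total_memory_mib",
    "l2_cache_kib",
    "sm_count",
    "async_engine_count",
)

WARNING = (
    "Using an engine plan file across different models of devices is not "
    "recommended and is likely to affect performance or even cause errors."
)


def soft_mismatches(build: dict[str, int], target: dict[str, int]) -> list[str]:
    """List the soft properties whose values differ, in SOFT_PROPERTIES order."""
    return [name for name in SOFT_PROPERTIES if build.get(name) != target.get(name)]


def deployment_warning(build: dict[str, int], target: dict[str, int]) -> Optional[str]:
    """Any soft mismatch yields the warning; identical properties yield None."""
    return WARNING if soft_mismatches(build, target) else None


# Hypothetical numbers for two devices that differ only in L2 size and SM count.
build_gpu = {
    "max_graphics_clock_mhz": 1500,
    "max_memory_clock_mhz": 5000,
    "memory_bus_width_bits": 256,
    "total_memory_mib": 16384,
    "l2_cache_kib": 4096,
    "sm_count": 40,
    "async_engine_count": 2,
}
target_gpu = dict(build_gpu, l2_cache_kib=2048, sm_count=20)

print(soft_mismatches(build_gpu, target_gpu))  # → ['l2_cache_kib', 'sm_count']
print(deployment_warning(build_gpu, build_gpu))  # → None
```

Unlike the hard requirements, these mismatches only produce a warning, so the engine may still load; the risk is degraded performance or errors at run time.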
If you still want to proceed, then you should build the engine on the smallest SKU in the family because autotuner choices made on smaller GPUs generalize better.
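Following the last guideline means choosing the build device before running the builder. A minimal sketch, assuming a hypothetical ranking that treats the GPU with the fewest SMs (then the least memory, then the smallest L2 cache) as the smallest SKU. The FAQ does not define "smallest", and the device names and numbers below are invented.

```python
from __future__ import annotations


def smallest_sku(family: dict[str, dict[str, int]]) -> str:
    """Return the name of the 'smallest' GPU in a family.

    Assumption: rank by SM count, then total memory, then L2 cache size.
    """
    return min(
        family,
        key=lambda name: (
            family[name]["sm_count"],
            family[name]["total_memory_mib"],
            family[name]["l2_cache_kib"],
        ),
    )


# Hypothetical family of GPUs that already share a compute capability
# and a TensorRT version, so only the soft properties differ.
family = {
    "large": {"sm_count": 80, "total_memory_mib": 32768, "l2_cache_kib": 6144},
    "medium": {"sm_count": 40, "total_memory_mib": 16384, "l2_cache_kib": 4096},
    "small": {"sm_count": 20, "total_memory_mib": 8192, "l2_cache_kib": 2048},
}

print(smallest_sku(family))  # → small
```

Candidates should pass both hard requirements before this ranking is applied, since the guideline only concerns devices within one family.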