I want to run the same YOLOv3 model with different weights for specialized use cases on a Jetson Xavier NX using TensorRT 126.96.36.199. How should I approach this in the most optimized way?
For example, I want to load the model once and have real-time images inferenced with weights A when the robot is moving faster than 5 m/s and with weights B when it is moving slower than 5 m/s. Loading and unloading the model is not an option because the switch has to be quick.
TensorRT Version: 188.8.131.52
GPU Type: Jetson Xavier NX
Nvidia Driver Version: Jetpack Release 35.1.0
I hope the following approach helps.
Create two separate TensorRT engines for the YOLOv3 model: one for weights A and another for weights B. You can use the TensorRT C++ API, the Python API, or trtexec to build these engines. A variety of settings can be configured during engine creation, such as the input and output shapes, precision, and optimization parameters.
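As a sketch of the engine-creation step with the Python API: the snippet below builds a serialized engine from an ONNX file (the file paths and the FP16 flag are assumptions, not part of the original question). The same result can be obtained with trtexec, e.g. `trtexec --onnx=model.onnx --saveEngine=model.engine --fp16`. You would run this once per weight set, producing two engine files.

```python
def build_engine(onnx_path, engine_path):
    """Build and serialize a TensorRT engine from an ONNX model.
    Run once per weight set (e.g. yolov3_a.onnx -> yolov3_a.engine)."""
    import tensorrt as trt  # imported lazily; available on the Jetson via JetPack

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # YOLOv3 exports typically use an explicit-batch network definition.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # FP16 is usually a good fit for Xavier NX

    # Serialize the optimized engine to disk so it can be loaded at startup.
    engine_bytes = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine_bytes)
```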
Load both engines into memory at startup and keep an execution context alive for each. At inference time, run with engine A when the robot is moving faster than 5 m/s and with engine B otherwise. Because both engines stay resident, the switch is just a lookup, with no loading or unloading involved.
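The runtime side of this approach could look like the following sketch: both engines are deserialized once at startup, and the speed check simply selects which pre-built execution context to use. The engine file names, the `load_contexts` / `select_weights` helper names, and the 5 m/s threshold handling are illustrative assumptions.

```python
SPEED_THRESHOLD_MPS = 5.0  # switch point from the question

def select_weights(speed_mps, threshold=SPEED_THRESHOLD_MPS):
    """Pick weight set 'A' above the threshold, 'B' at or below it."""
    return "A" if speed_mps > threshold else "B"

def load_contexts(engine_paths):
    """Deserialize each engine once at startup and keep one live
    execution context per weight set, so switching is a dict lookup."""
    import tensorrt as trt  # imported lazily; available on the Jetson via JetPack

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    contexts = {}
    for name, path in engine_paths.items():
        with open(path, "rb") as f:
            engine = runtime.deserialize_cuda_engine(f.read())
        contexts[name] = engine.create_execution_context()
    return contexts

# Usage on the robot (paths are hypothetical):
# contexts = load_contexts({"A": "yolov3_a.engine", "B": "yolov3_b.engine"})
# ctx = contexts[select_weights(current_speed)]
# ctx.execute_async_v2(bindings, stream_handle)  # usual TensorRT inference call
```

Note that both engines will occupy GPU memory simultaneously, which is the price paid for instant switching; on a Xavier NX this is usually acceptable for two YOLOv3 engines, especially in FP16.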