I used TensorRT to optimize the inference speed of the two FoundationPose estimators, the score estimator and the pose refine estimator, but the improvement was not significant: the total inference time only dropped from over 2 seconds to about 1.4 seconds. I don't know what the reason is.
In addition, when the program first starts, pose estimation for the whole pipeline takes approximately 20 seconds, which is quite slow.
Hi,
The first run is expected to be slow since some initialization is required.
Have you tried the command below to maximize the device performance first?
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
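If you want to confirm that the change took effect, you can query the current power mode (the available modes depend on your Jetson module):
$ sudo nvpmodel -q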
Thanks.
These two commands helped: the time has been reduced from 20 seconds to around 6 seconds. Are there any other methods we can try?
Hi,
Yes, for Jetson Nano, you can quantize the model to FP16 to get better performance.
Usually, this can be done via trtexec with an extra configuration:
$ /usr/src/tensorrt/bin/trtexec --onnx=[file] --fp16
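For example, to build and save an FP16 engine for the refine network (the .onnx and .engine file names below are just placeholders for your exported models):
$ /usr/src/tensorrt/bin/trtexec --onnx=refine_net.onnx --saveEngine=refine_net_fp16.engine --fp16
The same command applies to the score network.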
Thanks.
Yes, I already did that.
Hi,
Please check if the GPU utilization is full in your use case first:
$ sudo tegrastats
Ideally, the GR3D load should be at 99% when the device is running in performance mode.
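You can also set the sampling interval (in milliseconds) and watch the GR3D field while your pipeline is running, for example:
$ sudo tegrastats --interval 1000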
Thanks.
Yes, it is OK. The command should be sudo tegrastats, and GR3D is at 99%.
However, the result I obtained was that the PyTorch model took approximately 0.6 seconds (for an input of size 252x6x160x160), while the TensorRT engine took around 0.2 seconds, so the speedup is only about 3x. For YOLO and DETR, the difference was roughly 10-fold.
What could explain this?
Hi,
Sorry for the typo.
Do you use the same model for PyTorch and TensorRT?
You should be able to optimize both models (the score and refine estimators) with TensorRT.
Thanks.
Yes, of course.
However, sometimes I found the GPU load was not at 99%. How can I improve it while the code is running?
Hi,
You can try profiling the pipeline with Nsight Systems to locate the bottleneck.
Usually, gaps in GPU utilization are caused by CPU-based pre-processing and post-processing.
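A minimal sketch, assuming your pipeline is launched from a Python script (the script name below is just an example):
$ nsys profile --trace=cuda,nvtx,osrt -o foundationpose_report python3 run_demo.py
Open the generated report in the Nsight Systems GUI and look for gaps between GPU kernels; those gaps usually correspond to CPU-side work.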
Thanks.

