I have an object detection model (SSD MobileNet, 0.3 Mpx image input) optimized from TensorFlow through TRT at FP16 precision. The optimized model runs a bit faster on my workstation GPU (GTX 1080): from 10 fps to 11.5 fps.
Just to mention: the model runs correctly if I export it with TF v1. I don’t know why it fails on DeepStream when exported with TF2, but that’s another topic.
I’ve been following this: tf-trt-user-guide
Now, I want to take the TRT model and run it on a Jetson Nano.
Previously, the original TF model (no TRT) was able to run on the Nano. It was extremely slow, 2 seconds per frame, but it ran.
After some tweaks in the config files (DeepStream, Triton), I can start the pipeline, and Triton server then tries to serve the model. Triton is able to find the model, and there are no errors related to shapes, dimensions, etc.; those were fixed.
At some point the debug output stops, CPU and memory usage hit 100%, and the system is unresponsive for 5 minutes. In the end, the process is killed for running out of memory, as confirmed by the dmesg output.
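For anyone who wants to confirm the same thing, here is a minimal Python sketch that picks OOM kills out of dmesg text. It assumes the usual kernel message wording, "Killed process <pid> (<name>)". The function name `find_oom_kills` and the sample log lines are made up for illustration and are not taken from the actual log in this thread.

```python
import re

# Assumed kernel wording: "Killed process <pid> (<name>) ...".
_OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(dmesg_text):
    """Return a list of (pid, process_name) for each OOM kill in dmesg text."""
    kills = []
    for line in dmesg_text.splitlines():
        match = _OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Illustrative sample only; PIDs and sizes are invented.
SAMPLE_DMESG = """\
[ 1234.5] Out of memory: Kill process 8423 (deepstream-app) score 812 or sacrifice child
[ 1234.6] Killed process 8423 (deepstream-app) total-vm:9012345kB, anon-rss:3012345kB, file-rss:0kB
"""

if __name__ == "__main__":
    print(find_oom_kills(SAMPLE_DMESG))
```

On the device itself, the text could come from something like `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may need root depending on the system settings.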
Questions
What exactly is Triton Server doing that takes all the system resources? Can I move any of that work to the workstation?
I already increased swap to 4GB, same result.
What can I try to make it work? Different TRT export parameters?
Does it make sense to keep trying? Will the model run much faster on the Nano than before?
Is there a way to further optimize the model to enable real-time object detection?
Thanks as always to the NVIDIA forum team for providing information and solutions.
Hi @AastaLLL !
Thanks for the thorough response. I will try all the steps and mark the topic as solved soon.
However, I have one question left.
Why could Triton server load and run the TensorFlow model before optimization, but after optimization it cannot load the model because it runs out of memory?
Is Triton Server performing a hardware-specific optimization step?
In addition, the TensorFlow graph definition file is larger after the optimization. Is that because it stores additional TRT weights? Does that cause extra memory use?
PS:
Correct me if I’m wrong: the Jetson Nano has unified memory, so if swap frees CPU memory, is there more room for the GPU?
If you have applied the TF-TRT optimization with Triton, then yes, it does some hardware-specific optimization.
Please note that if TF-TRT is used, Triton needs to load both TensorFlow and TensorRT libraries.
Usually, the file size increases because the file needs to store the TensorFlow data as well as the TensorRT data.
There is no obvious relation between memory usage and file size.
It’s more related to the library you used and the model depth.
As for the swap memory: yes, that is correct.
But the system should prefer to use physical memory first.
It won’t be easy to control whether an allocation comes from swap or from physical memory.
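Since the CPU and GPU share the same physical memory on the Nano, it can help to look at how much RAM and swap are actually free before starting the pipeline. The following is a rough Python sketch that reads `/proc/meminfo`-style text. The helper names `parse_meminfo` and `memory_summary` and the sample values are hypothetical, and the numbers are only a coarse indication of the headroom left for the model.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Return available RAM and free swap, both in MiB."""
    return {
        "ram_available_mib": info.get("MemAvailable", 0) // 1024,
        "swap_free_mib": info.get("SwapFree", 0) // 1024,
    }


# Illustrative values only; not measured on a real Jetson Nano.
SAMPLE_MEMINFO = """\
MemTotal:        4059712 kB
MemAvailable:    1048576 kB
SwapTotal:       4194304 kB
SwapFree:        2097152 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_MEMINFO  # fall back to the sample off-device
    print(memory_summary(parse_meminfo(text)))
```

Running this just before and during model loading shows whether available RAM is already exhausted and swap is being consumed, which matches the behaviour described above where physical memory is used first.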