Jetson Nano Out of Memory running TRT Model

Hello!

Context

I’m developing an object detection application on my workstation GPU, in order to later deploy it on a Jetson Nano.

Platform: GTX1080 | Jetson Nano DevKit 4GB
DeepStream: 6.0
Triton: 21.08
TRT: 8.0
CUDA: 11
JetPack: 4.6
Docker: nvcr.io/nvidia/deepstream:6.0-triton

  1. I have an object detection model (SSD MobileNet, 0.3 Mpx image input) optimized from TensorFlow through TF-TRT with FP16 precision. After the optimization, the TRT model runs a bit faster on my workstation GPU (GTX1080): 10 fps > 11.5 fps. (A conversion sketch follows this list.)
    Just to mention, the model runs correctly only if I export it with TF v1. I don’t know why it fails on DeepStream when exported with TF2, but that’s another topic.
    I’ve been following this: tf-trt-user-guide

  2. Now, I want to take the TRT model and run it on a Jetson Nano.
    Previously, the original TF model (no TRT) was able to run on the Nano. It was extremely slow, about 2 seconds per frame, but it ran.

  3. After some tweaks in the config files (DeepStream, Triton), I can start the pipeline, and Triton Server tries to serve the model. Triton is able to find the model, and there are no errors related to shape, dimensions, etc.; those were already fixed.

  4. At some point the debug output stops, CPU and memory usage hit 100%, and the system is unresponsive for about 5 minutes. In the end, the process is killed because of Out of Memory, confirmed from the dmesg output.
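
For reference, the TF2 variant of the conversion from the tf-trt-user-guide looks roughly like this. This is only a minimal sketch; the SavedModel directory names are placeholders, not my actual paths:

```python
# TF-TRT FP16 conversion sketch (TF2 SavedModel workflow, placeholder paths).
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP16,
    max_workspace_size_bytes=1 << 28,  # keep the TRT workspace small
)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="ssd_mobilenet_saved_model",  # placeholder input dir
    conversion_params=params,
)
converter.convert()
converter.save("ssd_mobilenet_trt_fp16")  # placeholder output dir, served by Triton
```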

Questions

  1. What exactly is Triton Server doing that takes all the system resources? Can I offload any of that work to the workstation?
  2. I already increased the swap to 4 GB, with the same result.
  3. What can I try to make it work? Different TRT export parameters?
  4. Does it make sense to keep trying? Will the model run much faster on the Nano than before?
  5. Is there a way to further optimize the model to enable real-time object detection?

Thanks as always to the NVIDIA forum team for providing information and solutions.

Hi,

1. If you want to deploy the model on Jetson, these resources need to be allocated on the device directly.

2. Swap is CPU memory. It won’t increase the amount of GPU memory.

3. Could you measure the total memory usage on the desktop first?
Since the Nano has only 4 GiB of memory, it has some limitations with complicated models.

4. Please check below for the Jetson inference benchmarks.
Usually, it’s recommended to use pure TensorRT for lower memory usage and better performance (see the sketch after this list).

5. You can reproduce the above performance with the source code in the GitHub repository below:
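
As a rough illustration of the pure TensorRT path mentioned in point 4, below is a minimal sketch that builds a standalone FP16 engine with the TensorRT 8.0 Python API. It assumes the SSD model has already been exported to ONNX; the file names are placeholders:

```python
# Build a standalone TensorRT FP16 engine from an ONNX export (placeholder paths).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("ssd_mobilenet.onnx", "rb") as f:  # placeholder ONNX file
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28   # 256 MiB, keep it small on the Nano
config.set_flag(trt.BuilderFlag.FP16)

# Serialize the engine so it can be loaded without TensorFlow in memory.
serialized_engine = builder.build_serialized_network(network, config)
with open("ssd_mobilenet_fp16.engine", "wb") as f:
    f.write(serialized_engine)
```

The serialized engine can then be used by the Triton TensorRT backend or the DeepStream nvinfer plugin, so TensorFlow never needs to be loaded on the Nano.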

Thanks.

Hi @AastaLLL !
Thanks for the thorough response. I will try all the steps and mark the topic as solved soon.

However, I have one question left.
Why was Triton Server able to load and run the TensorFlow model before the optimization, but after the optimization it cannot be loaded because of Out of Memory?
Is Triton Server performing a hardware-specific optimization step?

In addition, the TensorFlow graph definition file is larger after the optimization. Is that because it stores additional TRT weights? Does that cause extra memory use?

PS:
Correct me if I’m wrong: the Jetson Nano has unified memory, so if swap frees up CPU memory, is there more room for the GPU?

Thanks!!!

Hi,

If you have applied the TF-TRT optimization with Triton, then yes, it does some hardware-specific optimization.
Please note that if TF-TRT is used, Triton needs to load both the TensorFlow and TensorRT libraries.

Usually, the file size increases since the file needs to store the TensorFlow content as well as TensorRT’s.
There is no obvious relation between memory usage and file size.
It’s more related to the libraries you use and the model depth.
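
If you want to double-check what the converted file contains, a minimal sketch is below; it counts the TRTEngineOp nodes that TF-TRT embeds in the SavedModel (the directory name and signature key are placeholders/assumptions):

```python
# Count TRTEngineOp nodes in a TF-TRT converted SavedModel (placeholder path).
import tensorflow as tf

model = tf.saved_model.load("ssd_mobilenet_trt_fp16")
graph_def = model.signatures["serving_default"].graph.as_graph_def()

nodes = list(graph_def.node)
for func in graph_def.library.function:  # TF2 keeps most ops inside functions
    nodes.extend(func.node_def)

trt_nodes = [n for n in nodes if n.op == "TRTEngineOp"]
print(f"TRTEngineOp nodes found: {len(trt_nodes)}")
```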

For the swap memory, yes, that’s correct.
But the system should prefer physical memory first.
It won’t be easy to control whether an allocation comes from swap or physical memory.

Thanks.


@AastaLLL Thank you so much, that is the answer I’ve been looking for.
Thanks!
