Jetson Orin Nano's RAM keeps getting full, the board crashes

Hello,

  • I have a Jetson Orin Nano Developer Kit 8GB and my goal is to implement real-time YOLO object detection with an RTSP source from an IP camera.
  • I have exported my custom-trained YOLOv11 model to TensorRT format and I am running it with the Ultralytics library and the yolo predict cfg CLI command; the inference itself works without problems. However, using the ‘free’ command, I have discovered that the used RAM is constantly increasing (while the free RAM is decreasing), eventually causing the board to crash.
  • I have tried using the stream=True parameter and the free RAM still decreases, only slower (see the sketch below the environment list). The program should run non-stop and save the inference results, so I need a long-term solution.
  • As for my current environment, I am using:
  1. JetPack 6.1 with CUDA 12.6, CUDA driver version 540.4.0, libcudnn 9.5
  2. TensorRT 10.3.0
  3. Torch 2.5.0
  4. Torchvision 0.20.0
  5. Ultralytics 8.3.64
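
For reference, the Python-API equivalent of what I am running looks roughly like this (a minimal sketch: the engine path and RTSP URL are placeholders; with stream=True, predict() returns a generator, so results are handled frame by frame instead of being collected in one list):

from ultralytics import YOLO

MODEL_PATH = "yolo11n.engine"                        # placeholder: TensorRT engine exported from my custom model
SOURCE = "rtsp://user:pass@192.168.1.10:554/stream"  # placeholder RTSP URL of the IP camera

model = YOLO(MODEL_PATH)

# With stream=True, predict() yields one Results object per frame,
# so each result can be processed/saved and then released.
for result in model.predict(source=SOURCE, stream=True):
    boxes = result.boxes  # process/save detections here, then let `result` go out of scope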

Not sure if you have a leak or not.
The problem is that you are asking a lot from 8 GB of RAM.
Set up your swap file on the NVMe to at least 60 GB and see what happens. If you are still running off the SD card, don’t use such a large swap file.
Also, install jtop and use GNOME System Monitor to watch memory; note that those are monitoring tools, not tools for finding leaks.

If it is a leak, you don’t have much chance of even coming close to finding it, given the complexity of and interactions between the packages.

Hi,

Do you see the RAM usage increase after the first frame of inference?

If so, it’s recommended to check your app, as there might be some memory leakage.
Have you checked the YOLOv11 detection sample from Ultralytics to see if it shows the same issue?
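
One quick way to check this is to log the process RSS around each frame, for example (a rough sketch: it assumes the psutil package is installed and uses placeholder model/source paths):

import psutil
from ultralytics import YOLO

proc = psutil.Process()                  # current process
model = YOLO("yolo11n.engine")           # placeholder engine path

# Print resident memory after every frame: a steady climb over many frames points
# at the app, while a one-time jump after the first frame is just model/context setup.
for i, result in enumerate(model.predict(source="test.mp4", stream=True)):
    rss_mb = proc.memory_info().rss / (1024 * 1024)
    print(f"frame {i}: RSS = {rss_mb:.1f} MiB")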

Thanks.

Hello,
Thanks for the response!
So, you’re saying it’s too big of a task for this specific board?

About the swap file, I don’t have an SSD attached at the moment, and I currently have 3.9 GB of swap. But I’ve noticed that the board only starts using swap after the RAM fills up, and it eventually crashes once the used swap grows too large. Therefore, I am not sure whether just adding more swap would be the solution.

Also, I have already installed jtop. As for GNOME System Monitor, I am connecting to the board remotely via SSH.

Hi,
Thanks for responding!

  • Yes, the RAM usage goes up after the first frame of inference.
  • I have also tried running the Ultralytics pre-trained model (exported to .engine format), testing on an mp4 video containing cars and pedestrians. (I did not find any video samples from Ultralytics, they only provide some images, and in their documentation they just test on YouTube videos, so I figured any video would do.) The inference was correct, but the behaviour was the same: the used RAM kept increasing.
  • I have tried checking the app using memory_profiler (roughly wired up as in the sketch after this list), and I got the results below, but I am not sure how to interpret them:




  • Can you tell me how I am supposed to efficiently check for memory leakage? And how am I supposed to fix it? I am only using the scripts from the Ultralytics library, nothing more.
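
For context, the memory_profiler run was wired up roughly like this (a sketch: the script and function names are just examples of decorating the predict loop with @profile):

from memory_profiler import profile
from ultralytics import YOLO

@profile  # memory_profiler prints a line-by-line memory report for this function
def run(engine="yolo11n.engine", source="test.mp4"):
    model = YOLO(engine)
    for result in model.predict(source=source, stream=True):
        pass  # save/process detections here

if __name__ == "__main__":
    run()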

You don’t.

You only have two choices. One is to move to a bare-metal box with an RTX or better GPU. The other is to patch up your board by adding a fast NVMe with a heatsink and bumping the swap up to 60-100 GB on that NVMe. If you need pure performance, buy a GPU and mainboard and build it yourself.

Hi,

You can check it with Valgrind.
Alternatively, our compute-sanitizer can also check for memory leaks:

/usr/local/cuda/bin/compute-sanitizer -h
...
Memcheck-specific options:
  --check-cache-control                 Check cache control memory accesses.
  --detect-missing-module-unload        Detect leaks caused by missing module unload calls. This option should not be used if the application uses the CUDA runtime.
  --leak-check arg (=no)                <full|no> Print leak information for CUDA allocations.
  --padding arg (=0)                    Size in bytes for padding buffer to add after each allocation.
  --report-api-errors arg (=explicit)   Print errors if any API call fails.
                                        all      : Report all CUDA API errors, including APIs invoked implicitly
                                        explicit : Report errors in explicit CUDA API calls only
                                        no       : Disable reporting of CUDA API errors
  --track-stream-ordered-races arg (=no)
                                        Track CUDA stream-ordered allocations races.
                                        all              : Track and report all CUDA stream-ordered allocations races
                                        use-before-alloc : Track and report use-before-alloc CUDA stream-ordered allocations races
                                        use-after-free   : Track and report use-after-free CUDA stream-ordered allocations races
                                        no               : Disable tracking and reporting for CUDA stream-ordered allocations races

Thanks.

I see. I wanted the cheapest setup that still gives good real-time inference for a program like this, and this board seemed like the right fit.
Thanks for the suggestions!

Hello,

Thanks for responding!
After testing some more, I have discovered that:

  • the used RAM actually stops increasing at around 3.2 GB,
  • then the buffer/cache memory grows to about 4 GB,
  • while the free memory is depleted (under 1 GB),
  • then the board starts using swap, which ultimately leads to the crash,
  • it took 3 hours and 37 minutes for the board to start using swap, at which point I stopped it.

Therefore I have used this command: sync; echo 3 > /proc/sys/vm/drop_caches to empty the page cache every other minute. The program ran for more than 20 hours straight. I am not sure it is the best solution to the problem, but I am considering it.
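
For reference, the periodic cache drop is just a small loop along these lines (a rough sketch: it has to run as root, and the interval is arbitrary):

import os
import time

# Rough sketch of the periodic page-cache drop described above. Must run as root.
while True:
    os.sync()                                    # flush dirty pages first, like `sync`
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")                           # same effect as `echo 3 > /proc/sys/vm/drop_caches`
    time.sleep(120)                              # wait a couple of minutes between drops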

Hi,

It is a possible WAR (workaround).
Does that already work well enough for your use case?

About the issue, is stream_inference the same as inference?
It looks like the memory increases after calling stream_inference() (maybe when loading the model),
but there is no change when calling inference().

Thanks.

Hello,

I have decided to run it again without emptying the cache memory at all, to see how long it takes to crash after it starts using swap. It entered swap after 1h45m, but it has now been running for 48h straight. I’ve noticed that both the used and the free RAM have gone up a bit.
The stats right now look like this:

[image: current memory stats]

So, I am waiting to see if it is going to crash at all. I think the first crash may have happened early on, before I knew about the stream=True parameter (my bad), but the used swap memory is still rising nonetheless.

About stream_inference() vs. inference(), I can’t test right now (the program is still running), but I am pretty sure I have already tested on videos without the stream=True parameter, and the behaviour was the same.

Thank you!

Hi,

Thanks for the update.
Please let us know if the crash happens again.