Memory issue while trying to run a script

I ran a Python script, and it got killed after about 2 minutes.
The code converts a .weight file to .tf and then to .tf-trt.

But the Jetson NX ran out of memory.
I even tried changing the power mode to 15W 6CPU.

I use a Jetson Xavier NX with CUDA 10.2.

Error log:

2020-08-22 15:31:22.362558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3624 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
Killed

The GPU requires physical RAM, so even if you added swap, this would probably still fail. Swap can help somewhat, since it lets ordinary user-space applications be swapped out and so leaves more physical RAM for the GPU, but the benefit is limited. Basically, you must use less RAM.

The power model won’t help with this, but if you are using more than one thread, then limiting the number of threads will help. If you use a certain number of CUDA cores, then using fewer would help. I’m not particularly good with the AI end, so I couldn’t tell you specifically how to go about this, but basically this is where you would start.
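
As a starting point for the "fewer threads, less RAM" advice, here is a minimal sketch and not a verified fix. It assumes the script uses TensorFlow, which reads the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, and it assumes your TensorFlow build also honors `TF_NUM_INTRAOP_THREADS` and `TF_NUM_INTEROP_THREADS` (check this against your version). The helper names `low_memory_env` and `run_script` are hypothetical.

```python
import os
import subprocess
import sys


def low_memory_env(num_threads=1, base_env=None):
    """Return a copy of the environment with memory-saving TensorFlow settings."""
    env = dict(os.environ if base_env is None else base_env)
    # Ask TensorFlow to grow GPU memory on demand instead of reserving
    # a large block up front.
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    # Cap TensorFlow's CPU thread pools (assumption: your TF build reads these).
    env["TF_NUM_INTRAOP_THREADS"] = str(num_threads)
    env["TF_NUM_INTEROP_THREADS"] = str(num_threads)
    return env


def run_script(script_path, num_threads=1):
    """Run a Python script in a child process using the low-memory environment."""
    result = subprocess.run(
        [sys.executable, script_path],
        env=low_memory_env(num_threads),
        check=False,
    )
    return result.returncode
```

For example, `run_script("convert.py", num_threads=2)` would launch a (hypothetical) conversion script with these settings. Whether this is enough to avoid the kill depends on how much memory the model itself needs.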

Hi,
Maybe you could use the NVIDIA team’s answer in my thread, "Faster rcnn memory consumption", as a reference. I hope that will be helpful.

@linuxdev @klyuan1986 Thanks for the responses, but this didn't occur on my previous Jetson NX. I even tried your solution, @klyuan1986, but I use a Python script to run the TensorFlow TRT inference.
The command waits for around 5 minutes, and then the out-of-memory warning appears in the top-right corner.

Google Colab, which had less RAM than the Jetson, worked fine.

Is there any solution?

Hi,

May I know more about the failure scenario (the kill after 2 minutes)?
Does it occur a while after inference starts?
If yes, it sounds like there is a memory leak in the implementation.

Thanks.

After a few minutes, my Jetson freezes and I have to cut the power supply.

Yes, it does occur during inference.
Conversion to TensorRT was successful, but it sometimes ran out of memory.

Hi,

In general, memory usage should stay stable from frame to frame during inference,
since both the buffers and the engine are reused at inference time.
Would you mind checking your code for a leak first?
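
One rough way to check for a leak, sketched here under assumptions: sample `MemAvailable` from `/proc/meminfo` every few frames once the first frames have run, and see whether it keeps shrinking. On Jetson the GPU uses the same physical RAM as the CPU, so GPU allocations should show up in this number as well. The function names and the 50 MB threshold are hypothetical choices, and this is a heuristic rather than a proper profiler.

```python
def parse_mem_available_kb(meminfo_text):
    """Extract the MemAvailable value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def read_mem_available_kb(path="/proc/meminfo"):
    """Read the current MemAvailable value (in kB) from the running system."""
    with open(path) as f:
        return parse_mem_available_kb(f.read())


def looks_like_leak(samples_kb, min_drop_kb=50_000):
    """Heuristic: available memory never recovers and drops by min_drop_kb or more.

    samples_kb should be taken at regular intervals after warm-up,
    for example once every 100 frames.
    """
    if len(samples_kb) < 3:
        return False  # too few samples to say anything
    never_recovers = all(b <= a for a, b in zip(samples_kb, samples_kb[1:]))
    total_drop = samples_kb[0] - samples_kb[-1]
    return never_recovers and total_drop >= min_drop_kb
```

In the inference loop, you could append `read_mem_available_kb()` to a list every N frames and print `looks_like_leak(samples)`. A steady decline would suggest that buffers or engines are being re-created per frame instead of reused.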

By the way, please also check the command shared below:

Thanks.