Out of memory during training

I am following NVIDIA's "Hello AI World" tutorial on my new Jetson Nano dev kit (4GB). In the 3rd video (here), a cat/dog model is trained on top of an existing network. The command is:
python3 train.py --model-dir=models/cat_dog data/cat_dog

and it aborts with a "Killed" message.
When I add the flags that are supposed to reduce memory usage, "--batch-size=4 --workers=1 --epochs=1", it starts running but then aborts with "OSError: [Errno 12] Cannot allocate memory".
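For reference, the full command with those flags (using the same model and data paths as above) looks like this:

python3 train.py --model-dir=models/cat_dog --batch-size=4 --workers=1 --epochs=1 data/cat_dog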

Yet in the video it runs fine, even though it is done on the 2GB model, while I am using the 4GB model.
In my case, I also terminated all other applications.
Any idea why it can't finish the task?
Can I execute the training outside the Jetson Nano?

Hi,

Could you try setting --batch-size 2 --workers 1 to decrease the memory usage?
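For example, with the model and data paths from the tutorial, the command would look something like:

python3 train.py --model-dir=models/cat_dog --batch-size 2 --workers 1 data/cat_dog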
Thanks.


Also, please refer to the following posts:


Thanks. I restarted the system and this time the run passed, so I assume my memory usage is right on the edge. I will try that later.

Thanks. Somehow the run passed after a restart, so I am making progress now. I will try that next time.

Hi byigal,

Was the issue fixed after restarting?
Thanks.

I restarted a few times. At first it wasn't solved. Then it went well after I made sure nothing else was running. This made me think that the required memory is right on the edge.

Hi,

You can monitor the memory status with tegrastats.
If the usage is close to the maximum, maybe you can even lower the batch size to 1.
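For example, you could run it in a second terminal while the training is going (the exact output fields vary by JetPack version, but the RAM entry reports used/total memory in MB):

tegrastats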

Thanks.

