Hi, I’m trying to retrain SSD-Mobilenet v1 following the “Hello AI World” tutorial.
When I run the train_ssd.py script, after some time and a few saved epochs I get an error, which I think happens because the Jetson Nano (4GB) runs out of memory.
I have allocated the swap memory correctly, but I can’t turn off the GUI.
I have already tried some commands like:
sudo systemctl isolate multi-user.target
sudo systemctl set-default multi-user.target
These commands run without error but have no effect, so I tried other commands such as:
sudo systemctl disable lightdm.service
sudo systemctl stop lightdm.service
This is the output I get:
Failed to disable unit: Unit file lightdm.service does not exist.
Failed to stop lightdm.service: Unit lightdm.service not loaded.
The only commands that seem to partially work are:
sudo systemctl isolate graphical.target
sudo init 3
After running these commands I enter my username and password, and then the screen freezes. I have to wait more than 20 minutes for the commands to finish, and after that I have to press Ctrl+Alt+F2 repeatedly to keep the terminal open.
None of this would even be a problem if it weren’t for the fact that booting the Jetson in text mode saves only 300MB of memory, which apparently isn’t enough to complete training (the program usually crashes around epoch 12-15).
Hi, I have tried running train_ssd.py with --batch-size=1 and --workers=0, but it did not work.
I have run the commands from the guide, but they did not work:
The command systemctl set-default multi-user.target does nothing and the Jetson still boots into desktop mode. sudo init 3 works, but it saves only 300MB of memory, as I already described in my first post.
Do you mean that it failed to run at all, or that it eventually runs out of memory?
Using those command-line options, mounting additional swap, and disabling the GUI are usually enough to be able to run train_ssd.py. What’s the error that you get when the program fails?
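Since the swap is part of that checklist, it may be worth confirming that it is actually active before training. The following is only a sketch: it reads the SwapTotal and SwapFree fields from /proc/meminfo, and the function name swap_mb is made up for illustration.

```shell
# Print total and free swap in MB from a meminfo-style file.
# Defaults to /proc/meminfo; swap_mb is a hypothetical helper name.
swap_mb() {
    awk '/^SwapTotal:/ {t = $2}
         /^SwapFree:/  {f = $2}
         END {printf "total=%d free=%d\n", t / 1024, f / 1024}' "${1:-/proc/meminfo}"
}
```

Running swap_mb with no arguments reports the live values; a total of 0 would suggest the swap file is not mounted.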
OK, if it says process killed, that very likely means that it ran out of memory.
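One way to check this is to look in the kernel log for messages from the out-of-memory killer. A minimal sketch, assuming the usual kernel wording for those messages (the exact text varies between kernel versions, and find_oom_lines is a hypothetical helper name):

```shell
# Filter kernel log text on stdin down to lines that typically come from
# the out-of-memory killer. find_oom_lines is a hypothetical helper name.
find_oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process'
}
```

It can be used right after the crash as: sudo dmesg | find_oom_lines. If a line names python3 as the killed process, the training run was most likely stopped for lack of memory.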
If you are having memory issues, it’s still recommended to disable the desktop to free up the additional memory.
This seems abnormal, along with the other symptoms that you describe such as the screen freezing when you try to log in. Do you have another SD card that you could flash with a fresh install of JetPack?
Alternatively, if you have a Linux PC or laptop, you should be able to run train_ssd.py on it and not have to worry as much about memory usage.
Hi @utente1480, you may want to open a new topic specifically about that if you are still having problems. That said, re-flashing JetPack also seems like a good idea if your system is still misbehaving at this point.
Hi, I’m writing this for anyone who has the same problem as me: switching to the LXDE desktop saved enough memory to train the model.
I know it may sound strange, but these are the used-memory values for each configuration:
Unity / GNOME: 1104 MB used
Text mode: 824 MB used
LXDE: 516 MB used
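For anyone who wants to repeat this comparison on their own board, here is a minimal sketch. It assumes that "used" means MemTotal minus MemAvailable from /proc/meminfo, which may not be how the figures above were measured, so the absolute numbers can differ. The function name used_mb is made up for illustration.

```shell
# Print used memory in MB, computed as MemTotal - MemAvailable.
# Reads a meminfo-style file, defaulting to /proc/meminfo.
# used_mb is a hypothetical helper name; the definition of "used" is an assumption.
used_mb() {
    awk '/^MemTotal:/     {t = $2}
         /^MemAvailable:/ {a = $2}
         END {printf "%d\n", (t - a) / 1024}' "${1:-/proc/meminfo}"
}
```

Run used_mb right after logging in under each desktop, before starting anything else, so the readings are comparable. Going by the figures above, LXDE uses 588 MB less than Unity / GNOME and 308 MB less than text mode.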