I can't disable the GUI to complete the train_ssd.py script

Hi, I'm trying to retrain SSD-Mobilenet v1 following the “Hello AI World” tutorial.
When I run the train_ssd.py script, I get an error after some time and some saved epochs, which I think is because the Jetson Nano (4GB) runs out of memory.

I have allocated the swap memory correctly, however I can't turn off the GUI.
I have already tried commands like:

  • sudo systemctl isolate multi-user.target
  • sudo systemctl set-default multi-user.target

These commands run without error but have no effect,
so I tried other commands such as:

  • sudo systemctl disable lightdm.service
  • sudo systemctl stop lightdm.service

These are the responses I get:

  • Failed to disable unit: Unit file lightdm.service does not exist.
  • Failed to stop lightdm.service: Unit lightdm.service not loaded.
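The "does not exist" and "not loaded" messages above suggest that lightdm is not the display manager on this system. A minimal sketch for checking which one is in use, assuming a Debian/Ubuntu-style layout where the name is recorded in /etc/X11/default-display-manager (the function name is hypothetical; on a GNOME setup the result is often gdm3, but that is not confirmed in this thread):

```shell
#!/bin/sh
# Print the display manager recorded by Debian/Ubuntu, if any.
# The optional path argument exists only so the function can be tried on a test file.
current_display_manager() {
    dm_file="${1:-/etc/X11/default-display-manager}"
    if [ -r "$dm_file" ]; then
        basename "$(cat "$dm_file")"
    else
        echo "unknown"
    fi
}

dm="$(current_display_manager)"
echo "Display manager: $dm"
# Show which target the system boots into by default (graphical or multi-user).
echo "Default boot target: $(systemctl get-default 2>/dev/null || echo unknown)"

# If a name is printed above, the matching stop command would be:
#   sudo systemctl stop "$dm"
```

Note that a change made with set-default only applies from the next boot, so the default target printed here is worth checking again after rebooting.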

the only commands that seem to work partially are:

  • sudo systemctl isolate graphical.target
  • sudo init 3

After executing these commands I enter the username and password, then the screen freezes. I have to wait more than 20 minutes for the commands to complete, and after that I have to repeatedly press Ctrl+Alt+F2 to keep the terminal open.

All of this wouldn't even be a problem if it weren't for the fact that only 300MB of memory is saved when I start the Jetson in text mode, which apparently isn't enough to complete training (the program usually crashes at epoch 12-15).

Any advice is welcome, thanks in advance

Hi @utente1480, see here for instructions to disable the desktop GUI: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md#disabling-the-desktop-gui

Also, have you tried running train_ssd.py with --batch-size=1 and --workers=0? These options will also save memory.
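For reference, a sketch of what such an invocation might look like. The dataset and model directories are hypothetical placeholders, and the --data and --model-dir flags are assumed from the tutorial's version of train_ssd.py; only --batch-size=1 and --workers=0 come from this thread.

```shell
#!/bin/sh
# Hypothetical dataset and model directories; adjust to your own setup.
DATA_DIR="data/my_dataset"
MODEL_DIR="models/my_model"

# Build the command as a string so it can be inspected before running.
# --batch-size=1 and --workers=0 are the memory-saving options suggested above.
CMD="python3 train_ssd.py --data=$DATA_DIR --model-dir=$MODEL_DIR --batch-size=1 --workers=0"
echo "$CMD"

# To actually start training, run the printed command from the directory
# that contains train_ssd.py.
```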

Hi, I have tried running train_ssd.py with --batch-size=1 and --workers=0 but it did not work.

I have run the commands in the guide but they did not work:
the command systemctl set-default multi-user.target does nothing and the Jetson still boots into desktop mode.
sudo init 3 works, but the memory saved is only 300MB, as I described in my first post.

Do you mean that it failed to run at all, or that it eventually runs out of memory?

Using those command-line options, mounting additional swap, and disabling the GUI are usually enough to be able to run train_ssd.py. What’s the error that you get when the program fails?

Hi @dusty_nv, I mean that it runs out of memory. This is the error:

Segmentation fault (core dumped)

Sometimes it shows me that the process is killed.

I forgot to mention that I am using the GNOME desktop.

OK, if it says process killed, that very likely means that it ran out of memory.
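One way to confirm this is to look for out-of-memory killer messages in the kernel log. A minimal sketch, assuming dmesg is readable (it may need sudo on some systems); the function name is hypothetical and the patterns are the usual wording of kernel OOM messages:

```shell
#!/bin/sh
# Keep only kernel log lines that mention the out-of-memory killer.
filter_oom_lines() {
    grep -i -E "out of memory|oom-killer|killed process"
}

# Scan the kernel log; print a fallback message if nothing matches
# or if the log cannot be read.
dmesg 2>/dev/null | filter_oom_lines || echo "No OOM messages found (or dmesg not readable)"
```

If a line naming the python3 process appears right after a failed run, the training was most likely stopped by the kernel for lack of memory.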

If you are having memory issues, it's still recommended to disable the desktop to free up the additional memory.

This seems abnormal, along with the other symptoms that you describe such as the screen freezing when you try to log in. Do you have another SD card that you could flash with a fresh install of JetPack?

Alternatively, if you have a Linux PC or laptop, you should be able to run train_ssd.py on it without such concerns about memory usage.

Hello @dusty_nv, unfortunately I don't have another Linux PC, and I'd rather not start over by reinstalling the operating system.

However, I can try the LXDE desktop and then try to disable the GUI. Do you think this strategy can work?

Unfortunately I’m not sure if that will fix whatever is keeping you from disabling the desktop, however LXDE does use less memory than GNOME desktop.

Also, when you mounted the swap memory, did you disable ZRAM as shown here? https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md#mounting-swap

ZRAM actually consumes physical RAM, so if you haven't already, disabling it should free up memory.
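A quick way to verify whether ZRAM is still active is to look at the swap devices listed in /proc/swaps. This is a minimal sketch with a hypothetical function name; the service name in the final comment is an assumption and should be checked against the linked guide.

```shell
#!/bin/sh
# Count active swap devices backed by ZRAM.
# The optional path argument exists only so the function can be tried on a test file.
zram_swap_count() {
    swaps_file="${1:-/proc/swaps}"
    grep -c "zram" "$swaps_file" 2>/dev/null || true
}

n="$(zram_swap_count)"
if [ "${n:-0}" -gt 0 ]; then
    echo "ZRAM swap is still active on $n device(s)"
else
    echo "No ZRAM swap devices are active"
fi

# On JetPack the ZRAM service is usually named nvzramconfig (assumption):
#   sudo systemctl disable nvzramconfig   # takes effect after a reboot
```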

Thanks for the time you are dedicating to me and for the quick answers. Yes, I have disabled ZRAM.

I don't know why, but I can't even install lightdm. I think I will re-download JetPack.

Sorry to bother you again, but I want to try to configure lightdm as described in this tutorial (Save 1GB of Memory! Use LXDE on your Jetson - JetsonHacks).
Could you give me some advice?

Hi @utente1480, you may want to open a new topic specifically about that if you are having problems. That said, re-flashing JetPack does seem like a good idea if your system is still misbehaving at this point.

Hi, I'm writing this for anyone who has the same problem as me: using the LXDE desktop allowed me to save enough memory to train the model.

I know it may sound strange, but these are the used-memory values for each configuration:
Unity/GNOME: 1104 MB used
text mode: 824 MB used
LXDE: 516 MB used
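To compare configurations in the same way, used memory can be read from /proc/meminfo right after boot. The sketch below computes it as MemTotal minus MemAvailable, which is an assumption about the method: the figures above may have come from a different tool (for example the "used" column of free -m), so the numbers may not match exactly. The function name is hypothetical.

```shell
#!/bin/sh
# Print used memory in MB, computed as MemTotal - MemAvailable (both reported in kB).
# The optional path argument exists only so the function can be tried on a test file.
used_mem_mb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%d\n", (t-a)/1024}' "$meminfo"
}

echo "Used memory: $(used_mem_mb) MB"
```

Running it once in each configuration (desktop, text mode, LXDE) before starting the training gives a like-for-like comparison.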

OK great, glad to hear that you got LXDE running and were able to run the training!
