Running tao faster_rcnn train with nohup

Hello,

When I run the command nohup tao faster_rcnn train --gpu_index $GPU_INDEX -e $SPECS_DIR/default_spec_resnet18.txt & on my local machine (not from inside the Docker container), I get the following error:

2021-09-10 16:03:25,860 [INFO] root: Registry: ['nvcr.io']
2021-09-10 16:03:26,424 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/user/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
the input device is not a TTY
2021-09-10 16:03:26,991 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

If I run the above command without nohup, it works just fine. How should I redirect the output from the Docker container to a file instead of stdout?

The system specifications are as follows:
• Hardware (Quadro RTX 6000)
• Network Type (Faster_rcnn)
• TAO Version (docker_tag = v3.21.08-py3 )

Why do you need to run nohup?

Also, if you connect with PuTTY, for example, you can save the session's stdout to a file.

If you still want to use nohup, please try the step below.

(venv_3.0) morganh@dl:~$ python
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.system("nohup /home/morganh/venv_3.0/bin/tao --help")
nohup: ignoring input and appending output to ‘nohup.out’
0
>>>
(venv_3.0) morganh@dl:~$ cat nohup.out
~/.tao_mounts.json wasn't found. Falling back to obtain mount points and docker configs from ~/.tlt_mounts.json.
Please note that this will be deprecated going forward.
usage: tao [-h]
{list,stop,info,augment,bpnet,classification,converter,detectnet_v2,dssd,emotionnet,faster_rcnn,fpenet,gazenet,gesturenet,heartratenet,intent_slot_classification,lprnet,mask_rcnn,multitask_classification,n_gram,punctuation_and_capitalization,question_answering,retinanet,speech_to_text,speech_to_text_citrinet,ssd,text_classification,token_classification,unet,yolo_v3,yolo_v4}

Launcher for TAO Toolkit.

optional arguments:
-h, --help show this help message and exit

tasks:
{list,stop,info,augment,bpnet,classification,converter,detectnet_v2,dssd,emotionnet,faster_rcnn,fpenet,gazenet,gesturenet,heartratenet,intent_slot_classification,lprnet,mask_rcnn,multitask_classification,n_gram,punctuation_and_capitalization,question_answering,retinanet,speech_to_text,speech_to_text_citrinet,ssd,text_classification,token_classification,unet,yolo_v3,yolo_v4}
(venv_3.0) morganh@dl:~$
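
The same idea can be sketched without nohup, using Python's subprocess module to start a command in its own session and send its output to a log file of your choice. This is only a minimal sketch: run_detached and the log path are hypothetical names, it assumes a POSIX system, and this thread does not confirm whether the TAO launcher's Docker call accepts a non-TTY stdin when started this way.

```python
import subprocess


def run_detached(cmd, log_path):
    """Start cmd in its own session, appending stdout and stderr to log_path.

    Hypothetical helper for illustration; not part of the TAO launcher.
    """
    log_file = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal input, similar to nohup
            stdout=log_file,            # write output to the log file
            stderr=subprocess.STDOUT,   # merge stderr into the same file
            start_new_session=True,     # detach from the controlling terminal (POSIX)
        )
    finally:
        # The child process holds its own copy of the file descriptor.
        log_file.close()
    return proc
```

For instance, run_detached(["/home/morganh/venv_3.0/bin/tao", "--help"], "tao.log") would write the same help text shown above into tao.log instead of nohup.out.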

I have time-limited access to the training server, which is why I wanted to run with nohup. Anyway, tmux seems to do the job for now.
