For multi-GPU training, change --gpus according to the number of GPUs on your machine.
2022-06-15 02:15:27,288 [INFO] root: Registry: ['nvcr.io']
2022-06-15 02:15:27,528 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-06-15 02:15:27,712 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/sysadmin/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
2022-06-15 02:15:33,657 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
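To avoid the root-permission issue mentioned in the warning above, here is a minimal sketch of what "/home/sysadmin/.tao_mounts.json" could look like with the "user" option added. The mount paths and the "1000:1000" value are placeholders, not values from this thread; fill in your own paths and the output of "id -u" and "id -g":

```json
{
    "Mounts": [
        {
            "source": "/home/sysadmin/tao-experiments",
            "destination": "/workspace/tao-experiments"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```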
Since yesterday there has been a new update to ngccli, which results in this issue. The internal team is working on updating the launcher with a fix.
The above is just a workaround. I will look into another workaround for Jupyter.
Does that mean that rather than running as normal in Jupyter, I run Docker in bash and work inside the container?
What should I set for yourlocalfolder:dockerfolder?
For this workaround, please ignore tao launcher.
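As a sketch, you can launch the container directly with docker run. The host path below is a placeholder (it corresponds to yourlocalfolder:dockerfolder in the question above); replace it with your own folders:

```shell
# Launch the TAO TF container directly, bypassing the tao launcher.
# -v maps a host folder into the container (yourlocalfolder:dockerfolder);
# both sides of the mapping here are placeholders.
docker run --runtime=nvidia -it --rm \
    -v /home/sysadmin/tao-experiments:/workspace/tao-experiments \
    nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash
```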
For example, if you run training,
root@03d61af590ba:/workspace# mask_rcnn train --help
Using TensorFlow backend.
usage: mask_rcnn train [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS]
                       [--gpu_index GPU_INDEX [GPU_INDEX ...]] [--use_amp]
                       [--log_file LOG_FILE] -e EXPERIMENT_SPEC_FILE -k KEY -d
                       MODEL_DIR
                       {dataset_convert,evaluate,export,inference,inference_trt,prune,train}
                       ...

optional arguments:
  -h, --help            show this help message and exit
  --num_processes NUM_PROCESSES, -np NUM_PROCESSES
                        The number of horovod child processes to be spawned.
                        Default is -1 (equal to --gpus).
  --gpus GPUS           The number of GPUs to be used for the job.
  --gpu_index GPU_INDEX [GPU_INDEX ...]
                        The indices of the GPUs to be used.
  --use_amp             Flag to enable Auto Mixed Precision.
  --log_file LOG_FILE   Path to the output log file.
  -e EXPERIMENT_SPEC_FILE, --experiment_spec_file EXPERIMENT_SPEC_FILE
                        Path to spec file. Absolute path or relative to
                        working directory. If not specified, default spec from
                        spec_loader.py is used.
  -k KEY, --key KEY     Key to save or load a .tlt model.
  -d MODEL_DIR, --model_dir MODEL_DIR
                        Dir to save or load a .tlt model.

tasks:
  {dataset_convert,evaluate,export,inference,inference_trt,prune,train}
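Putting the flags from the help output together, a training invocation inside the container might look like the following. The spec file path, key, and results directory are placeholders, not values from this thread:

```shell
# Hypothetical paths and key -- substitute your own values.
mask_rcnn train --gpus 2 \
    -e /workspace/tao-experiments/maskrcnn/specs/maskrcnn_train.txt \
    -k $KEY \
    -d /workspace/tao-experiments/maskrcnn/results
```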
Hi! I have the same issue with !tao n_gram train. I followed the steps above, and I get an "n_gram: command not found" error when launching n_gram from inside the container. Could you help, please?