TLT using Singularity Containers over Docker

pandamit · April 22, 2021, 6:09pm

The tlt CLI uses Docker containers under the hood to train and prune a model. Our DGX boxes are compute nodes in a larger HPC infrastructure, with the caveat that using Docker there is forbidden. I have previously used the TF-TRT and Triton containers from NGC as Singularity images, and they have always worked fine. However, there seems to be no documentation on running TLT with Singularity containers. The post-installation steps for non-root usage still point to enabling Docker's non-root mode, which itself has some root (sudo *) dependencies. Is there a way to make tlt work with Singularity containers?

Morganh · April 23, 2021, 1:09am

Reference: Tlt-streamanalytics training in Singularity - #4 by Morganh
Please check whether it helps.

Morganh · April 23, 2021, 9:44am

One more tip: please try pulling the TLT 3.0 Docker image directly instead of using tlt-launcher.

docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3

Example:

morganh@dl:~$ docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3

v3.0-dp-py3: Pulling from nvidia/tlt-streamanalytics
Digest: sha256:3e20634106145588534caf2887fdc1093e0e167a0933b0a993e5a077684bd89e
Status: Image is up to date for nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3

morganh@dl:~$ docker run --runtime=nvidia -it -v /home/morganh/demo:/workspace/demo nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash
--2021-04-23 10:05:27-- https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)... 13.225.93.33, 13.225.93.84, 13.225.93.94, ...
Connecting to ngc.nvidia.com (ngc.nvidia.com)|13.225.93.33|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24976582 (24M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’

ngccli_reg_linux.zip 100%[====================================================================================================>] 23.82M 32.0MB/s in 0.7s

2021-04-23 10:05:28 (32.0 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [24976582/24976582]

Archive: /opt/ngccli/ngccli_reg_linux.zip
inflating: /opt/ngccli/ngc
extracting: /opt/ngccli/ngc.md5

root@9f4979ebd897:/workspace# ls
EULA.pdf README.md demo examples

root@9f4979ebd897:/workspace# cd demo/
root@9f4979ebd897:/workspace/demo# mask_rcnn train -e spec.txt -d /workspace/demo/result -k nvidia_tlt
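For the Singularity case asked about in the original question, the same image can in principle be pulled and run without Docker. The following is only a sketch and was not verified in this thread. It assumes Singularity 3.x is installed on the compute node, that an NGC API key is available in a variable named NGC_API_KEY (a name chosen here for illustration), and that the SIF file name and host directory are placeholders. The `run` helper and the DRY_RUN switch are hypothetical conveniences so the commands can be inspected before they are executed. Note also that `singularity exec` does not run the image's Docker entrypoint, so the NGC CLI download seen in the Docker log above would not happen.

```shell
#!/usr/bin/env bash
# Sketch: run the TLT 3.0 image under Singularity instead of Docker.
# Assumptions (not from the thread): Singularity 3.x on the node, NGC_API_KEY
# holds a valid NGC API key, and SIF / HOST_DIR are placeholder paths.
set -eu

IMAGE_URI="docker://nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3"
SIF="${SIF:-tlt-streamanalytics_v3.0-dp-py3.sif}"
HOST_DIR="${HOST_DIR:-${HOME:-/tmp}/demo}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# nvcr.io expects the literal user name $oauthtoken and the API key as password.
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD="${NGC_API_KEY:-}"

# Convert the Docker image from NGC into a local SIF file.
run singularity pull "$SIF" "$IMAGE_URI"

# --nv exposes the host GPUs, -B bind-mounts the data directory, and
# --pwd sets the working directory so that spec.txt resolves as in the Docker run.
run singularity exec --nv -B "$HOST_DIR:/workspace/demo" --pwd /workspace/demo "$SIF" \
    mask_rcnn train -e spec.txt -d /workspace/demo/result -k nvidia_tlt
```

By default the script only prints the two singularity commands. Setting DRY_RUN=0 executes them, which requires GPU access and valid NGC credentials on the node.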
