The tlt CLI uses Docker containers under the hood to train and prune a model. We have DGX boxes as compute nodes, part of a larger HPC infrastructure, with the caveat that using Docker is forbidden. I have previously used TF-TRT and Triton containers from NGC as Singularity files, and they have always worked fine. For some reason, there is no documentation for running TLT with Singularity containers. The post-installation steps for non-root usage still point to enabling Docker's non-root feature, which itself has some root (sudo) dependencies. Is there a way to make tlt work with Singularity containers?
Reference: Tlt-streamanalytics training in Singularity - #4 by Morganh
Please check if it helps.
One more tip: please try to pull the TLT 3.0 Docker image directly instead of using the tlt launcher.
docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
morganh@dl:~$ docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
v3.0-dp-py3: Pulling from nvidia/tlt-streamanalytics
Status: Image is up to date for nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3
morganh@dl:~$ docker run --runtime=nvidia -it -v /home/morganh/demo:/workspace/demo nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash
--2021-04-23 10:05:27--  https://ngc.nvidia.com/downloads/ngccli_reg_linux.zip
Resolving ngc.nvidia.com (ngc.nvidia.com)… 220.127.116.11, 18.104.22.168, 22.214.171.124, …
Connecting to ngc.nvidia.com (ngc.nvidia.com)|126.96.36.199|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 24976582 (24M) [application/zip]
Saving to: ‘/opt/ngccli/ngccli_reg_linux.zip’
ngccli_reg_linux.zip 100%[====================================================================================================>] 23.82M 32.0MB/s in 0.7s
2021-04-23 10:05:28 (32.0 MB/s) - ‘/opt/ngccli/ngccli_reg_linux.zip’ saved [24976582/24976582]
EULA.pdf README.md demo examples
root@9f4979ebd897:/workspace# cd demo/
root@9f4979ebd897:/workspace/demo# mask_rcnn train -e spec.txt -d /workspace/demo/result -k nvidia_tlt
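For a cluster where Docker itself is forbidden, the same NGC image can usually be converted to a Singularity image and run directly. This is only a hedged sketch, not an officially documented TLT workflow: the .sif filename and host paths are placeholders, the spec file and key mirror the docker run example above, and an NGC API key is assumed for the authenticated pull.

```
# Authenticate the registry pull against nvcr.io (assumes you have an NGC API key).
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD=<your-NGC-API-key>

# Convert the TLT Docker image into a Singularity image file (.sif).
singularity build tlt-streamanalytics_v3.0-dp-py3.sif \
    docker://nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3

# Run training inside the container:
#   --nv  exposes the host NVIDIA GPUs/driver to the container
#   -B    bind-mounts the host demo directory (placeholder path)
singularity exec --nv \
    -B /home/morganh/demo:/workspace/demo \
    tlt-streamanalytics_v3.0-dp-py3.sif \
    mask_rcnn train -e /workspace/demo/spec.txt -d /workspace/demo/result -k nvidia_tlt
```

Note that Singularity containers run as the invoking user by default, which sidesteps the Docker non-root configuration mentioned in the question; the bind-mounted directories just need to be writable by that user.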