Problem installing TLT

I want to use TLT, but I hit a problem when installing the TLT launcher.

(launcher) (base) vlab@vlab-C180300750:~$ tlt detectnet_v2 --help
2021-03-08 15:43:36,966 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
user's UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/local/bin/detectnet_v2", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 227, in launch_job
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 47, in get_modules
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/export.py", line 8, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/export/exporter.py", line 12, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/keras_exporter.py", line 22, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 27, in <module>
  File "/usr/local/lib/python3.6/dist-packages/pycuda/autoinit.py", line 9, in <module>
    context = make_default_context()
  File "/usr/local/lib/python3.6/dist-packages/pycuda/tools.py", line 204, in make_default_context
    "on any of the %d detected devices" % ndevices)
RuntimeError: make_default_context() wasn't able to create a context on any of the 1 detected devices
2021-03-08 15:43:43,809 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

My environment:

tlt_mounts.json (screenshot attached)
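For reference, the ~/.tlt_mounts.json layout the launcher warning refers to looks roughly like this. The mount paths below are hypothetical placeholders, and the "user" entry under DockerOptions is the one the warning suggests adding so the container runs with the host UID:GID instead of root:

```shell
# Sketch of ~/.tlt_mounts.json with the "user" option from the warning.
# The source/destination paths are placeholders; adjust to your setup.
cat > ~/.tlt_mounts.json <<EOF
{
    "Mounts": [
        {
            "source": "/home/vlab/tlt-experiments",
            "destination": "/workspace/tlt-experiments"
        }
    ],
    "DockerOptions": {
        "user": "$(id -u):$(id -g)"
    }
}
EOF
```

The `$(id -u):$(id -g)` expansion bakes your current UID and GID into the file at write time.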

tlt --help works well


Docker works fine.

Can you run tlt info?
Also, did you follow the TLT Launcher — Transfer Learning Toolkit 3.0 documentation?

Please also refer to tlt-export error - #3 by 010akv

Thank you for your reply!
My tlt info:

Configuration of the TLT Instance
dockers: ['nvcr.io/nvidia/tlt-streamanalytics', 'nvcr.io/nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 02/02/2021

Yes, I followed that doc step by step.

I had read that issue before writing this question.
On my PC, I find:

-rwxr-xr-x 1 root root 241 Sep 13 21:15 /usr/local/bin/tlt-dataset-convert*
-rwxr-xr-x 1 root root 227 Sep 13 21:15 /usr/local/bin/tlt-evaluate*
-rwxr-xr-x 1 root root 225 Sep 13 21:15 /usr/local/bin/tlt-export*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-infer*
-rwxr-xr-x 1 root root 229 Sep 13 21:15 /usr/local/bin/tlt-int8-tensorfile*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-prune*
-rwxr-xr-x 1 root root 215 Sep 13 21:15 /usr/local/bin/tlt-pull*
-rwxr-xr-x 1 root root 736 Aug 27 21:09 /usr/local/bin/tlt-train*
-rwxr-xr-x 1 root root 224 Sep 13 21:15 /usr/local/bin/tlt-train-g1*

but neither in /usr/local/bin nor in ~/.local/bin.
Why? Does this mean my TLT installation failed?
Or should I look for them inside Docker? How do I enter the Docker container?
Thank you very much.

To narrow down, one way to debug is to check whether you can log in to the TLT 3.0 docker container:

$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash
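Once inside, you can also test the exact step that failed: importing pycuda.autoinit calls make_default_context(), which is where the RuntimeError came from. Below is a minimal sketch of that check; check_cuda_context is a hypothetical helper, not part of TLT, and it assumes pycuda is available (as it is in the TLT image):

```python
# Sketch: reproduce the failing step from the traceback directly.
# pycuda.autoinit calls make_default_context() at import time; here we
# perform the same context creation manually and report the outcome
# instead of crashing. check_cuda_context is a hypothetical helper.
def check_cuda_context():
    try:
        import pycuda.driver as cuda  # needs pycuda + NVIDIA driver
    except ImportError:
        return "pycuda not installed"
    try:
        cuda.init()
        ndevices = cuda.Device.count()
        ctx = cuda.Device(0).make_context()  # same call autoinit makes
        ctx.pop()  # release the context again
        return "context OK (%d device(s) detected)" % ndevices
    except cuda.Error as exc:
        return "CUDA context failed: %s" % exc

if __name__ == "__main__":
    print(check_cuda_context())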

Thank you for your reply.
OK, I can get into the container; it seems to be fine.

Firstly, please check whether you get the error for all of the modules. Try
$ tlt ssd --help

$ tlt dssd --help

etc
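The checks above can be looped in one go; a sketch, with an illustrative (not exhaustive) task list:

```shell
# Sketch: probe several TLT task entrypoints and report which ones
# fail to load. The task list here is illustrative only.
for task in ssd dssd detectnet_v2 classification; do
    if tlt "$task" --help >/dev/null 2>&1; then
        echo "$task: OK"
    else
        echo "$task: FAILED"
    fi
done
```

If every task fails the same way, the launcher-to-docker setup is suspect; if only detectnet_v2 fails, the problem is module-specific.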

Next, please check that you meet the requirements in the TLT Launcher — Transfer Learning Toolkit 3.0 documentation.
Pay attention to GitHub - NVIDIA-AI-IOT/gesture_recognition_tlt_deepstream too.

Last, I am afraid it is a setup issue. Try searching Google for "return _bootstrap._gcd_import(name[level:], package, level)" or other lines from the traceback.

OK, thank you.
I will keep trying until I resolve this issue, and will record the results below.