Docker instantiation failed when running tao ssd

Hello,
When I run tao ssd dataset_convert … on an HP GPU machine, I get the error below. This is after I already reinstalled the driver with: sudo apt install nvidia-driver-460

(launcher) root@HP-Z-DSWS:~# tao ssd dataset_convert \
     -d $SPECS_DIR/ssd_tfrecords_kitti_train.txt \
     -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_train

########
2021-12-02 17:08:05,450 [INFO] root: Registry: ['nvcr.io']
2021-12-02 17:08:05,643 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2021-12-02 17:08:05,755 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/local/bin/ssd", line 8, in <module>
    sys.exit(main())
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/entrypoint/ssd.py", line 12, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 256, in launch_job
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 47, in get_modules
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/export.py", line 10, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/export/ssd_exporter.py", line 30, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/keras_exporter.py", line 22, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 27, in <module>
  File "/usr/local/lib/python3.6/dist-packages/pycuda/autoinit.py", line 5, in <module>
    cuda.init()
pycuda._driver.LogicError: cuInit failed: forward compatibility was attempted on non supported HW
2021-12-02 17:08:11,241 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
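(As an aside, the "Docker will run the commands as root" warning in the log above can be addressed by adding a user entry to the DockerOptions section of /root/.tao_mounts.json, as the warning suggests. A minimal sketch, assuming a UID/GID of 1000:1000 as reported by id -u and id -g; the usual Mounts entries are elided here for brevity:)

```json
{
    "Mounts": [],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```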

• Hardware: GPU name: Quadro RTX 8000
Driver Version: 460.106.00, CUDA Version: 11.2
• Network Type: ssd_resnet18
• TLT Version:
dockers:
    nvidia/tao/tao-toolkit-tf:
        v3.21.11-tf1.15.5-py3:
        v3.21.11-tf1.15.4-py3:
    nvidia/tao/tao-toolkit-pyt:
        v3.21.11-py3:
    nvidia/tao/tao-toolkit-lm:
        v3.21.08-py3:
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021

Thanks a lot!

Can you run the command below and share the result?
$ nvidia-smi

How about:
$ dpkg -l | grep cuda

Yes, I can; it is as shown above. Thanks!

No, it should not be the same result. Could you please run:
$ dpkg -l | grep cuda

ii cuda-command-line-tools-10-1 10.1.243-1 amd64 CUDA command-line tools
ii cuda-compiler-10-1 10.1.243-1 amd64 CUDA compiler
ii cuda-cudart-10-1 10.1.243-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-dev-10-1 10.1.243-1 amd64 CUDA Runtime native dev links, headers
ii cuda-cufft-10-1 10.1.243-1 amd64 CUFFT native runtime libraries
ii cuda-cufft-dev-10-1 10.1.243-1 amd64 CUFFT native dev links, headers
ii cuda-cuobjdump-10-1 10.1.243-1 amd64 CUDA cuobjdump
ii cuda-cupti-10-1 10.1.243-1 amd64 CUDA profiling tools interface.
ii cuda-curand-10-1 10.1.243-1 amd64 CURAND native runtime libraries
ii cuda-curand-dev-10-1 10.1.243-1 amd64 CURAND native dev links, headers
ii cuda-cusolver-10-1 10.1.243-1 amd64 CUDA solver native runtime libraries
ii cuda-cusolver-dev-10-1 10.1.243-1 amd64 CUDA solver native dev links, headers
ii cuda-cusparse-10-1 10.1.243-1 amd64 CUSPARSE native runtime libraries
ii cuda-cusparse-dev-10-1 10.1.243-1 amd64 CUSPARSE native dev links, headers
ii cuda-documentation-10-1 10.1.243-1 amd64 CUDA documentation
ii cuda-driver-dev-10-1 10.1.243-1 amd64 CUDA Driver native dev stub library
ii cuda-gdb-10-1 10.1.243-1 amd64 CUDA-GDB
ii cuda-gpu-library-advisor-10-1 10.1.243-1 amd64 CUDA GPU Library Advisor.
ii cuda-libraries-dev-10-1 10.1.243-1 amd64 CUDA Libraries 10.1 development meta-package
ii cuda-license-10-1 10.1.243-1 amd64 CUDA licenses
ii cuda-license-10-2 10.2.89-1 amd64 CUDA licenses
ii cuda-memcheck-10-1 10.1.243-1 amd64 CUDA-MEMCHECK
ii cuda-misc-headers-10-1 10.1.243-1 amd64 CUDA miscellaneous headers
ii cuda-npp-10-1 10.1.243-1 amd64 NPP native runtime libraries
ii cuda-npp-dev-10-1 10.1.243-1 amd64 NPP native dev links, headers
ii cuda-nsight-10-1 10.1.243-1 amd64 CUDA nsight
ii cuda-nsight-compute-10-1 10.1.243-1 amd64 NVIDIA Nsight Compute
ii cuda-nsight-systems-10-1 10.1.243-1 amd64 NVIDIA Nsight Systems
ii cuda-nvcc-10-1 10.1.243-1 amd64 CUDA nvcc
ii cuda-nvdisasm-10-1 10.1.243-1 amd64 CUDA disassembler
ii cuda-nvgraph-10-1 10.1.243-1 amd64 NVGRAPH native runtime libraries
ii cuda-nvgraph-dev-10-1 10.1.243-1 amd64 NVGRAPH native dev links, headers
ii cuda-nvjpeg-10-1 10.1.243-1 amd64 NVJPEG native runtime libraries
ii cuda-nvjpeg-dev-10-1 10.1.243-1 amd64 NVJPEG native dev links, headers
ii cuda-nvml-dev-10-1 10.1.243-1 amd64 NVML native dev links, headers
ii cuda-nvprof-10-1 10.1.243-1 amd64 CUDA Profiler tools
ii cuda-nvprune-10-1 10.1.243-1 amd64 CUDA nvprune
ii cuda-nvrtc-10-1 10.1.243-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-10-1 10.1.243-1 amd64 NVRTC native dev links, headers
ii cuda-nvtx-10-1 10.1.243-1 amd64 NVIDIA Tools Extension
ii cuda-nvvp-10-1 10.1.243-1 amd64 CUDA nvvp
ii cuda-repo-ubuntu1804 10.2.89-1 amd64 cuda repository configuration files
ii cuda-samples-10-1 10.1.243-1 amd64 CUDA example applications
ii cuda-sanitizer-api-10-1 10.1.243-1 amd64 CUDA Sanitizer API
ii cuda-toolkit-10-1 10.1.243-1 amd64 CUDA Toolkit 10.1 meta-package
ii cuda-tools-10-1 10.1.243-1 amd64 CUDA Tools meta-package
ii cuda-visual-tools-10-1 10.1.243-1 amd64 CUDA visual tools

I got the output above.
Actually, I used the tlt-streamanalytics container in the beginning (nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3). It can train detectnet_v2_resnet18 with "tlt-train detectnet_v2 …", but it did not work with "tlt-train ssd", so I moved to TAO.

This does not make sense. For TLT 2.0, "tlt-train ssd" should be working as well.

For your current issue, I am afraid you need to update CUDA.
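The "cuInit failed: forward compatibility was attempted on non supported HW" error typically means the CUDA user-mode libraries inside the container are newer than the host driver, and the forward-compatibility fallback they attempt is only supported on data-center GPUs, not on a Quadro RTX 8000. A minimal sketch of the version check involved; the minimum-driver table below is an assumption for illustration, so verify the values against NVIDIA's CUDA Toolkit release notes:

```python
# Sketch: decide whether a host driver natively supports a given CUDA runtime.
# The minimum-driver table is an assumption drawn from NVIDIA's release notes;
# consult the current CUDA Toolkit release notes for authoritative values.
MIN_LINUX_DRIVER = {
    "11.2": (460, 27),   # CUDA 11.2 assumed to need >= 460.27.x
    "11.5": (495, 29),   # CUDA 11.5 assumed to need >= 495.29.x
}

def driver_supports(driver_version: str, cuda_version: str) -> bool:
    """Return True if the driver meets the runtime's minimum version natively."""
    major, minor = (int(part) for part in driver_version.split(".")[:2])
    return (major, minor) >= MIN_LINUX_DRIVER[cuda_version]

# Driver 460.106.00 handles CUDA 11.2, but a container built on a newer CUDA
# would trigger the forward-compatibility path, which Quadro GPUs do not support.
print(driver_supports("460.106.00", "11.2"))  # True
print(driver_supports("460.106.00", "11.5"))  # False
```

In practice, upgrading the host driver to one that meets the container's CUDA requirement avoids the forward-compatibility path entirely.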

Thanks for the reply!
Actually, I used the tlt-streamanalytics container in the beginning (nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3). It can train detectnet_v2_resnet18 with "tlt-train detectnet_v2 …", but it did not work with "tlt-train ssd", so I moved to TAO.
My first question relates to the DeepStream Python apps: why does a TLT detectnet_v2_resnet18 model with 90% accuracy detect nothing when applied in the DeepStream Python apps? That is another topic I am looking into.
My second question is why "tlt-train ssd" in the container nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 gives this error, while "tlt-train detectnet_v2" works.
My third question is about tao ssd: why does cuInit fail?
Sorry for so many questions; these are the steps I tried and the errors I hit one by one. Should I raise three separate topics, or can I ask them all here? Thanks so much!!

For my first question, about the DeepStream Python apps not detecting anything with a detectnet_v2_resnet18 model of 90% accuracy, I have only found this in the forum.

Update CUDA, OK, I will try, thanks. Do you have any idea about the DeepStream Python apps not detecting bounding boxes (my other question above)?

Please create your issues in new topics separately.


Thanks, I will do that. Just a quick question: should tlt-train in the container nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 give the same results as tao, as far as the models themselves are concerned?

Actually, TLT 2.0 was released in Aug 2020.
See

The TAO versions start from 3.21.08, in 2021.
So we cannot compare these two versions directly.


ok I see, thanks:)

Can I ask something still related to TLT/TAO here?
"tlt-train detectnet_v2", prune, and retrain went well, but when I go for "tlt-infer", errors come out like this:

Using TensorFlow backend.
2021-12-03 22:58:24.552363: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Traceback (most recent call last):
  File "/usr/local/bin/tlt-infer", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_infer.py", line 54, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/inference.py", line 187, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/spec_handler/spec_loader.py", line 88, in load_inference_spec
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/spec_handler/spec_loader.py", line 50, in load_proto
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/spec_handler/spec_loader.py", line 36, in _load_from_file
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 734, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 802, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 827, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 849, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 974, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1048, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 974, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1048, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 974, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1048, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 974, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1048, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 941, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 33:9 : Message type "ClusteringConfig" has no field named "clustering_algorithm".

When you use an old version of TLT, I suggest you download the corresponding Jupyter notebook or refer to its user guide.
Different versions of TLT/TAO may have different parameters.

The above is an example; you need to modify the inference spec file accordingly.
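For instance, a spec written for TAO 3.x can contain a clustering_config block like the sketch below; TLT 2.0's ClusteringConfig has no clustering_algorithm field, so that line must be deleted. The surrounding field names and values here are illustrative assumptions based on typical DetectNet_v2 specs; check the user guide for your exact version:

```
clustering_config {
  clustering_algorithm: DBSCAN   # TAO 3.x only; delete this line for TLT 2.0
  coverage_threshold: 0.005
  dbscan_eps: 0.3
  dbscan_min_samples: 0.05
  minimum_bounding_box_height: 4
}
```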

TLT/TAO user guide: NVIDIA TAO Documentation


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.