Facing error after training command

Docker tag → v3.21.08-py3
Network Type → detectnet_v2
Training spec →
training_spec.txt (3.3 KB)

Hi, I am facing an error after running the training command:

training command →

tao detectnet_v2 train -k tlt_encode -r /workspace/tao-experiments/results -e /workspace/tao-experiments/specs/training_spec.txt --gpu_index 1

error →

2022-02-25 16:14:53,358 [ERROR] tensorflow: ==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'IsVariableInitialized_308:0' shape=() dtype=bool>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py", line 143, in get_singular_monitored_session
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1104, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 727, in __init__
    self._sess = self._coordinated_creator.create_session()
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/hooks/hooks.py", line 285, in begin
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))

2022-02-25 21:44:54,373 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Did you ever run the detectnet_v2 Jupyter notebook? Was it successful?

No, I have not run the Jupyter notebook so far. I am just trying to train using the training command below:

training command →
tao detectnet_v2 train -k tlt_encode -r /workspace/tao-experiments/results -e /workspace/tao-experiments/specs/spec.txt --gpu_index 1

Then I got the error shown in the previous comment.

Could you upload the full log as a file?
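
If it is easier, you can capture everything to a file and attach that; a minimal sketch using standard shell redirection (the log file name is just illustrative):

tao detectnet_v2 train -k tlt_encode -r /workspace/tao-experiments/results -e /workspace/tao-experiments/specs/training_spec.txt --gpu_index 1 2>&1 | tee train_full.log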

Hi, I think I have the same error here.

train_error.log (402.8 KB)

Maybe it is similar to Troubleshooting Guide — TAO Toolkit 3.22.05 documentation

Please try to train with a new result folder.
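
For example, keep everything else the same and only point -r at an empty directory (the results_new path below is just illustrative; it must sit under a directory mapped in your ~/.tao_mounts.json):

tao detectnet_v2 train -k tlt_encode -r /workspace/tao-experiments/results_new -e /workspace/tao-experiments/specs/training_spec.txt --gpu_index 1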

The link you sent does not seem to help, but I made some changes:
I created a new venv for the Jupyter notebook (this time with Python 3.6 instead of 3.8) and deleted what I understand to be the results folder ($USER_EXPERIMENT_DIR/experiment_dir_unpruned).
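
Roughly, that setup looks like this (a sketch only, assuming the launcher is installed from the nvidia-tao pip package; exact versions may differ):

python3.6 -m venv tao-venv
source tao-venv/bin/activate
pip install nvidia-tao jupyter
rm -rf $USER_EXPERIMENT_DIR/experiment_dir_unpruned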

Now I don't see errors, but the training stops without actually training at all.

train_error-p36-new_folder.log (46.6 KB)

"Illegal instruction (core dumped) "

The above error comes from an old CPU. You can search the forum and find similar topics.

OK, I see the topics about old CPUs. I'll try another host. Thanks.

Just one question: does this mean that TAO is running the dockers 'outside' the NVIDIA container?
I thought all this stuff was running on the GPU. (By the way, this host has a GTX 970.)

No, TAO is running with the TAO dockers: TAO Toolkit for Computer Vision | NVIDIA NGC
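
If you want to verify that, watch the containers while a command is running; the image name in the comment below is only what the v3.21.08-py3 launcher typically pulls, so treat it as an example:

docker ps --format '{{.Image}}'
# expect something like nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.08-py3 while the train command runs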

For "Illegal instruction (core dumped) ", the reason is as below.
Old CPUs were missing AVX2 instruction set.
See Core dumped on examples - #3 by Morganh
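
A quick way to check whether the host CPU supports AVX2 (empty output means the CPU lacks AVX2):

# prints "avx2" if the flag is present in /proc/cpuinfo, nothing otherwise
grep -o avx2 /proc/cpuinfo | uniq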
