TLT Classification crash

While running tlt-train classification I am getting the error below:
Using TensorFlow backend.
2021-05-17 10:27:27.402390: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 39, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/makenet/scripts/train.py", line 444, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/makenet/scripts/train.py", line 98, in parse_command_line
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/makenet/scripts/train.py", line 61, in build_command_line_parser
AttributeError: 'list' object has no attribute 'add_argument'

I am running this command: tlt-train classification -e /workspace/examples/classification/specs/classification_spec.cfg -r /workspace/tlt_training/output -k tlt --gpus 1
Even if I run just tlt-train classification, it gives the same error.

Are you running TLT 3.0 or TLT 2.0?

I think it's 3.0. I started the container with:
docker run -it --gpus all --entrypoint /bin/bash -v /home/ubuntu/tlt_training:/workspace/tlt_training nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3

For TLT 3.0, all commands are meant to be run on the host PC via the tlt launcher, not inside the docker container.
For example, $ tlt classification train xxx
See Migrating to TLT 3.0 — Transfer Learning Toolkit 3.0 documentation and TLT Launcher — Transfer Learning Toolkit 3.0 documentation
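If you take the host-PC route, note that the tlt launcher maps your local directories into the container through a mounts file (by default ~/.tlt_mounts.json, as described in the TLT Launcher documentation). A minimal sketch, assuming your data lives under /home/ubuntu/tlt_training as in your docker run command:

```json
{
    "Mounts": [
        {
            "source": "/home/ubuntu/tlt_training",
            "destination": "/workspace/tlt_training"
        }
    ]
}
```

With that in place, the host-side equivalent of your command would be roughly:

$ tlt classification train -e /workspace/examples/classification/specs/classification_spec.cfg -r /workspace/tlt_training/output -k tlt --gpus 1

One caveat: /workspace/examples/... exists inside the docker image, not on your host, so if the launcher cannot see it you may need to copy the spec file into a mounted directory (e.g. /home/ubuntu/tlt_training) and point -e at the corresponding container path.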

If you still want to run your current command inside the docker container, you can try:
# classification train -e /workspace/examples/classification/specs/classification_spec.cfg -r /workspace/tlt_training/output -k tlt --gpus 1
