Tlt-augment execution error occurs

yeongjae8066 · November 27, 2020, 4:31am

Hello, I get an error shortly after starting the tlt-augment command. The full log is at the end of this post. Here are the parts I think are important.

0%| | 0/7481 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/iva/augment/build_augmentor.py:486: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.

Is this an encoding problem?

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(3, 8), b.shape=(3, 3), m=8, n=3, k=3
[[{{node ArithmeticOptimizer/FoldTransposeIntoMatMul_MatMul_8}}]]
[[Minimum_1/_25]]
(1) Internal: Blas GEMM launch failed : a.shape=(3, 8), b.shape=(3, 3), m=8, n=3, k=3
[[{{node ArithmeticOptimizer/FoldTransposeIntoMatMul_MatMul_8}}]]

I am using an RTX 30 series GPU (link), and I know it is not supported by TLT 2.0. Does tlt-augment fail for the same reason? I only used the default spec file from the Jupyter notebook, and only one image was written to the output folder.

Thank you

2020-11-27 03:50:43.471231: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Using TensorFlow backend.
2020-11-27 03:50:45,059 [INFO] iva.augment.spec_handler.spec_loader: Merging specification from /workspace/examples/augment/specs/default_spec.txt
2020-11-27 03:50:45,060 [INFO] iva.augment.build_augmentor: Input dataset: /workspace/tlt-experiments/data/training
2020-11-27 03:50:45,060 [INFO] iva.augment.build_augmentor: Output dataset: /workspace/tlt-experiments/augment/augmented_dataset
2020-11-27 03:50:45.088832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-27 03:50:45.115590: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-27 03:50:45.115962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.74
pciBusID: 0000:01:00.0
2020-11-27 03:50:45.115980: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-27 03:50:45.116880: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-27 03:50:45.117778: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-11-27 03:50:45.117946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-11-27 03:50:45.118769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-27 03:50:45.119377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-11-27 03:50:45.121309: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-27 03:50:45.121385: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-27 03:50:45.121780: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-27 03:50:45.122124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-11-27 03:50:45.122142: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-27 03:53:20.922483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-27 03:53:20.922511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-11-27 03:53:20.922516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-11-27 03:53:20.922680: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-27 03:53:20.923120: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-27 03:53:20.923456: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-27 03:53:20.923777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8707 MB memory) → physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
2020-11-27 03:53:20,929 [INFO] iva.augment.build_augmentor: Constructed the augmentor pipeline.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

2020-11-27 03:53:20,982 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

2020-11-27 03:53:20,982 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

2020-11-27 03:53:20,983 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

0%| | 0/7481 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/iva/augment/build_augmentor.py:486: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
2020-11-27 03:53:21.260419: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-27 03:54:23.384407: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x62870d0
2020-11-27 03:54:23.384481: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-27 03:55:01.934536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
0%| | 1/7481 [01:41<209:56:22, 101.04s/it]2020-11-27 03:55:02.042145: E tensorflow/stream_executor/cuda/cuda_blas.cc:429] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(3, 8), b.shape=(3, 3), m=8, n=3, k=3
[[{{node ArithmeticOptimizer/FoldTransposeIntoMatMul_MatMul_8}}]]
[[Minimum_1/_25]]
(1) Internal: Blas GEMM launch failed : a.shape=(3, 8), b.shape=(3, 3), m=8, n=3, k=3
[[{{node ArithmeticOptimizer/FoldTransposeIntoMatMul_MatMul_8}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-augment", line 8, in
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/augment/scripts/dataset_augment.py", line 75, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/augment/build_augmentor.py", line 241, in augment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(3, 8), b.shape=(3, 3), m=8, n=3, k=3
[[{{node ArithmeticOptimizer/FoldTransposeIntoMatMul_MatMul_8}}]]
[[Minimum_1/_25]]
(1) Internal: Blas GEMM launch failed : a.shape=(3, 8), b.shape=(3, 3), m=8, n=3, k=3
[[{{node ArithmeticOptimizer/FoldTransposeIntoMatMul_MatMul_8}}]]
0 successful operations.
0 derived errors ignored.

Morganh · November 27, 2020, 5:58am

I am afraid this is still caused by the RTX 30 series GPU.
The culprit is this log line:

failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
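
For anyone hitting the same error, the mismatch can be read from the log above: the container loads libcudart.so.10.0 (CUDA 10.0), while the GPU reports compute capability 8.6. CUDA 10.0 targets GPUs up to compute capability 7.5 (Turing), and the RTX 30 series (sm_86) needs CUDA 11.1 or newer, which is consistent with the cuBLAS failure. Below is a rough Python sketch of that check. The helper names and the small lookup table are my own illustration and are not part of TLT; the table lists only the releases relevant to this thread.

```python
import re

# Highest compute capability each CUDA toolkit release can target.
# Illustrative subset only; not an exhaustive table.
MAX_CC_BY_CUDA = {
    (10, 0): (7, 5),  # CUDA 10.0: up to Turing (sm_75)
    (11, 0): (8, 0),  # CUDA 11.0: adds A100 (sm_80)
    (11, 1): (8, 6),  # CUDA 11.1: adds RTX 30 series (sm_86)
}


def parse_log(log_text):
    """Extract (cuda_version, compute_capability) from a TensorFlow log.

    Each value is a (major, minor) tuple, or None if not found.
    """
    cuda = re.search(r"libcudart\.so\.(\d+)\.(\d+)", log_text)
    cc = re.search(r"compute capability: (\d+)\.(\d+)", log_text)
    cuda_version = tuple(int(g) for g in cuda.groups()) if cuda else None
    capability = tuple(int(g) for g in cc.groups()) if cc else None
    return cuda_version, capability


def is_supported(cuda_version, capability):
    """Return True/False, or None if the CUDA version is not in the table."""
    max_cc = MAX_CC_BY_CUDA.get(cuda_version)
    if max_cc is None:
        return None
    return capability <= max_cc


# Two lines taken from the log in this thread (shortened).
SAMPLE_LOG = (
    "Successfully opened dynamic library libcudart.so.10.0\n"
    "physical GPU (device: 0, name: GeForce RTX 3080, "
    "pci bus id: 0000:01:00.0, compute capability: 8.6)\n"
)

if __name__ == "__main__":
    cuda_version, capability = parse_log(SAMPLE_LOG)
    print("CUDA runtime:", cuda_version)
    print("Compute capability:", capability)
    print("Supported:", is_supported(cuda_version, capability))
```

If the check reports that the GPU is newer than the CUDA runtime can target, changing the augmentation spec will not help; the container itself needs a newer CUDA build.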

yeongjae8066 · November 27, 2020, 6:00am

I also referred to this link, and I think so too.
