Hello,
Thanks so much for replying to the questions.
I want to run the jupyter notebook for YOLO network using TLT. I installed and ran every thing on a linux 18.04 before and I did not have a problem.
I wanted to run the training on another computer but I am getting the following error. How can I fix this problem? Thanks.
2020-12-07 17:32:08,717 [INFO] iva.yolo.scripts.train: Number of images in the training dataset: 6778
Epoch 1/80
2020-12-07 17:32:16.681687: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-07 17:33:21.366397: E tensorflow/stream_executor/cuda/cuda_blas.cc:429] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2020-12-07 17:33:21.366438: E tensorflow/stream_executor/cuda/cuda_blas.cc:2437] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 51, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo/scripts/train.py", line 239, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo/scripts/train.py", line 183, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
outs = f(ins)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[4,3,3], b.shape=[4,3,3], m=3, n=3, k=3, batch_size=4
[[{{node CompositeTransform_6/CompositeTransform_5/CompositeTransform_4/CompositeTransform_3/CompositeTransform_2/CompositeTransform_1/CompositeTransform/RandomFlip/MatMul}}]]
[[cond_7/GatherNd_1/_4621]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[4,3,3], b.shape=[4,3,3], m=3, n=3, k=3, batch_size=4
[[{{node CompositeTransform_6/CompositeTransform_5/CompositeTransform_4/CompositeTransform_3/CompositeTransform_2/CompositeTransform_1/CompositeTransform/RandomFlip/MatMul}}]]
0 successful operations.
0 derived errors ignored.