• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
Hi, same dataset, same Docker image, same config file, same Ubuntu, but a different machine. Training succeeded on an RTX 2080 Ti; however, on an RTX 3080 Ti the error shown below occurs. Please help!
File "/root/.cache/bazel/_bazel_root/b770f990bb7b9e2db5771981fb3a38b4/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/models/yolov4_model.py", line 692, in train
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
outs = f(ins)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_5710}} Invalid PNG data, size 789337
[[{{node AssetLoader/DecodePng}}]]
[[data_loader_out]]
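Since the failure comes from the `DecodePng` node rejecting one file, a way to find the offending image before launching training is to validate every PNG's chunk CRCs up front. Below is a minimal stdlib-only sketch for illustration; the function name `png_is_valid` and the chunk-walking approach are my own, not part of TAO:

```python
# Sketch: walk a PNG file's chunks and verify each CRC. Corrupted or
# truncated files (the kind tf's DecodePng rejects) fail this check.
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_is_valid(path):
    """Return True if the file has a PNG signature, every chunk's CRC
    checks out, and a terminating IEND chunk is present."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    saw_iend = False
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        chunk_end = pos + 8 + length + 4
        if chunk_end > len(data):
            return False  # truncated chunk
        (crc,) = struct.unpack(">I", data[chunk_end - 4:chunk_end])
        # CRC covers the chunk type and payload.
        if zlib.crc32(data[pos + 4:pos + 8 + length]) & 0xFFFFFFFF != crc:
            return False  # corrupted chunk
        if ctype == b"IEND":
            saw_iend = True
            break
        pos = chunk_end
    return saw_iend
```

Running this over the training image directory before `yolo_v4 train` should flag the file that trips the data loader, without needing TensorFlow at all.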
print("To run with multigpu, please change --gpus based on the number of available GPUs in your machine.")
!yolo_v4 train -e $SPECS_DIR/yolo_v4_train_resnet18_kitti.txt \
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
-k $KEY \
--gpus 1
Yes. I have narrowed the dataset down to a small set of images that were checked very carefully, and I still have this problem.
I will check with nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3 and let you know later.
Hi @Morganh, the sequence format is used and I got the same problem. All the images/labels in the log look normal. log.txt (4.7 KB)
Could this be a problem caused by RAM? I checked the RAM, and I still get the problem described above.
There is no update from you for a period, assuming this is not an issue anymore.
Hence we are closing this topic. If need further support, please open a new one.
Thanks
From the log, the culprit is /workspace/tao-experiments/data/training/image_2/on the floor-6-237.png . Please check it.