Floating point exception during inference

I’m trying to train and run inference with YOLOv4 on a custom dataset. While training works well, I get a floating point exception during inference. The custom dataset consists of PNG images of size 720 x 1280.

I used the following command for training:

tlt yolo_v4 train -e /workspace/tlt-experiments/yolo_v4/specs/train_config.txt \
    -r /workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned \
    -k MY_KEY \
    --gpus 1

which resulted in the following error (at inference time):
output_training.txt (57.7 KB)

Similarly, I used this command for inference:

tlt yolo_v4 inference \
    -i /workspace/tlt-experiments/data/images/ -o /workspace/tlt-experiments/infer_images \
    -e /workspace/tlt-experiments/yolo_v4/specs/train_config.txt \
    -l /workspace/tlt-experiments/infer_labels \
    -m /workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_001.tlt \
    -k MY_KEY

which led to this error:
output_inference.txt (8.5 KB)

System information:

Linux philipp-gpu 4.9.0-15-amd64 #1 SMP Debian 4.9.258-1 (2021-03-08) x86_64 GNU/Linux


Wed May  5 07:48:25 2021       
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

Please double check your test images.
Are there any empty image files? If not, please test with fewer images, step by step, to narrow down the cause.
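One way to check for empty or damaged image files before running inference is a small script along these lines. This is only a sketch: the function name find_suspect_images is made up for illustration, and it assumes every file in the folder is supposed to be a PNG (as in the dataset described above). It only checks the file size and the 8-byte PNG signature, so it will not catch every kind of corruption.

```python
from pathlib import Path

# Every valid PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_suspect_images(image_dir):
    """Return (path, reason) pairs for files that are empty or lack a PNG header.

    Hypothetical helper, not part of TLT.
    """
    suspects = []
    for path in sorted(Path(image_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        if path.stat().st_size == 0:
            suspects.append((path, "empty file"))
            continue
        with path.open("rb") as handle:
            header = handle.read(len(PNG_SIGNATURE))
        if header != PNG_SIGNATURE:
            suspects.append((path, "missing PNG signature"))
    return suspects
```

For example, `for path, reason in find_suspect_images("/workspace/tlt-experiments/data/images/"): print(path, reason)` would list the files worth looking at first.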

Thanks for your reply. The origin of the problem was a faulty target class mapping on my side.
I’d suggest implementing a clearer error message if possible.

Thanks for the info. May I know what the faulty mapping was? Was it mapping to a dummy class or to a wrong class?

Neither, actually. At first, I had accidentally removed all mappings from the spec file, which led to the error described above.
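For anyone who runs into the same thing: a spec file with no class mappings can be caught with a quick check before starting a run. The sketch below is an assumption-heavy illustration, not a TLT feature. It assumes the spec is a protobuf-style text file in which each mapping appears as a `target_class_mapping { key: "..." value: "..." }` block, so please compare that against your own spec file. The helper names read_class_mappings and check_spec are made up.

```python
import re

# Assumed layout of one mapping block in the spec file:
#   target_class_mapping {
#     key: "car"
#     value: "car"
#   }
MAPPING_BLOCK = re.compile(r'target_class_mapping\s*:?\s*\{([^{}]*)\}')
KEY = re.compile(r'key\s*:\s*"([^"]*)"')
VALUE = re.compile(r'value\s*:\s*"([^"]*)"')


def read_class_mappings(spec_text):
    """Collect key -> value pairs from all mapping blocks in the spec text."""
    mappings = {}
    for block in MAPPING_BLOCK.findall(spec_text):
        key = KEY.search(block)
        value = VALUE.search(block)
        if key and value:
            mappings[key.group(1)] = value.group(1)
    return mappings


def check_spec(spec_text):
    """Raise a readable error if the spec defines no class mappings at all."""
    mappings = read_class_mappings(spec_text)
    if not mappings:
        raise ValueError("spec contains no target_class_mapping entries")
    return mappings
```

Calling `check_spec(open("train_config.txt").read())` would then fail early with a clear message instead of a floating point exception later on.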