Floating point exception during inference

philipp15 · May 5, 2021, 3:11pm

I’m trying to train and do inference with yolov4 on a custom dataset. While training works well, I get a floating point exception during the inference part. The custom dataset consists of PNG images of size 720 x 1280.

I used the following command for training:

tlt yolo_v4 train -e /workspace/tlt-experiments/yolo_v4/specs/train_config.txt \
    -r /workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned \
    -k MY_KEY \
    --gpus 1

which produced the following log (the error only appears at inference time):
output_training.txt (57.7 KB)

Similarly, I used this command for inference:

tlt yolo_v4 inference \
    -i /workspace/tlt-experiments/data/images/ -o /workspace/tlt-experiments/infer_images \
    -e /workspace/tlt-experiments/yolo_v4/specs/train_config.txt \
    -l /workspace/tlt-experiments/infer_labels \
    -m /workspace/tlt-experiments/yolo_v4/experiment_dir_unpruned/weights/yolov4_resnet18_epoch_001.tlt \
    -k MY_KEY

which led to this error:
output_inference.txt (8.5 KB)

System information:
uname:

Linux philipp-gpu 4.9.0-15-amd64 #1 SMP Debian 4.9.258-1 (2021-03-08) x86_64 GNU/Linux

nvidia-smi:

Wed May  5 07:48:25 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Morganh · May 5, 2021, 3:50pm

Please double check your test images.
Are there any empty image files? If not, please test with fewer images, step by step, to find the cause.
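
A quick way to look for empty or broken files before running inference might be a small script like the one below. This is only a sketch, not part of TLT: the function name `find_suspect_images` is made up for illustration, and it assumes every file in the directory is supposed to be a PNG (as in the dataset described above). It only checks the file size and the 8-byte PNG signature, so a file that passes can still be corrupt further in.

```python
import os

# The first 8 bytes of every valid PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_suspect_images(image_dir):
    """Return (path, reason) pairs for files that are empty or lack a PNG signature.

    Hypothetical helper for illustration; it does not decode the image data.
    """
    suspects = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        if os.path.getsize(path) == 0:
            suspects.append((path, "empty file"))
            continue
        with open(path, "rb") as f:
            header = f.read(len(PNG_SIGNATURE))
        if header != PNG_SIGNATURE:
            suspects.append((path, "missing PNG signature"))
    return suspects
```

Calling `find_suspect_images("/workspace/tlt-experiments/data/images/")` inside the container and printing the result should list any files worth removing before the next inference run.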

philipp15 · May 6, 2021, 7:29am

Thanks for your reply. The origin of the problem was a faulty target class mapping on my side.
I’d suggest implementing a clearer error message if possible.

Morganh · May 6, 2021, 7:35am

Thanks for the info. May I know what the faulty mapping was? Was it mapping to a dummy class or to a wrong class?

philipp15 · May 7, 2021, 6:29am

Yes, at first I had accidentally removed all mappings, which led to the error described above.
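
For anyone hitting the same thing: since the tool does not report a missing mapping clearly, a pre-flight check on the spec file could catch it. The sketch below is not a TLT feature. It assumes the spec lists classes as `target_class_mapping { key: "..." value: "..." }` blocks, and the function names are made up for illustration. It uses a simple regex rather than a real protobuf text parser, so unusual formatting may be missed.

```python
import re

# Matches blocks such as:
#   target_class_mapping {
#     key: "car"
#     value: "vehicle"
#   }
MAPPING_RE = re.compile(
    r'target_class_mapping\s*:?\s*\{\s*'
    r'key\s*:\s*"([^"]*)"\s*'
    r'value\s*:\s*"([^"]*)"\s*\}'
)


def read_class_mappings(spec_text):
    """Return {source_class: target_class} for every mapping block found."""
    # Drop '#' comments so commented-out mappings are not counted.
    # Assumption: class names themselves never contain '#'.
    uncommented = re.sub(r"#.*", "", spec_text)
    return dict(MAPPING_RE.findall(uncommented))


def check_class_mappings(spec_text):
    """Raise ValueError if the spec text defines no class mappings at all."""
    mappings = read_class_mappings(spec_text)
    if not mappings:
        raise ValueError("No target_class_mapping entries found in the spec file")
    return mappings
```

Reading `train_config.txt` into a string and passing it to `check_class_mappings` before training or inference would have flagged my mistake right away.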