Inference YOLO_v4 int8 mode doesn't show any bounding box

Morganh · November 11, 2021, 8:43am

No problem. Actually, I have not been able to reproduce this kind of issue on my side,
but some end users do run into it.
So, could you share more information about the reproduction steps and your training environment?
For example,

The CUDA/TensorRT versions on your local PC
The Jupyter notebook. You can upload the .ipynb file here.

renlifeng · November 11, 2021, 10:03am

jupyter notebook
Attached. It is yolo_v4.ipynb in cv_samples_vv1.2.0, which was downloaded from NGC.
I did not follow the notebook literally. My network is not stable, so instead of retrying the download of the 12G image file for days, I downloaded and extracted the images from a mirror inside my country, outside of the notebook.
I also ssh into the machine that runs the notebook, and X11 problems bothered me. So I changed !tao yolo_v4 train to !echo tao … and executed the command in a terminal.

yolo_v4.ipynb (188.0 KB)

command line history:
annotated-cmd-history.txt (6.2 KB)

cuda
Both cuda-10-2 and cuda-11-1 are installed; /usr/local/cuda ultimately links to 11-1.
trt 7.2.2-1+cuda11.1
Because I tried to use darknet_xxx.weights with DeepStream 4.5.1, I have replaced libnvinfer_plugin.so.7.2.2 with the version built from TensorRT OSS 7.2.2.

Additional info:

os: Ubuntu 18.04, NVIDIA-related packages installed from the NVIDIA cuda/machine-learning repos.
gpu: rtx2080 ti
driver: 495.44
cudnn 8.0.5.39-1+cuda11.1

As I am using TAO Toolkit, the Docker image details may also be useful. The image is nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.08-py3. It has:

cuda-11.1
trt 7.2.3
cudnn 8.1.1

Morganh · November 11, 2021, 11:06am

Hi @renlifeng
To narrow this down, could you refer to my comment in TLT YOLOv4 (CSPDakrnet53) - TensorRT INT8 model gives wrong predictions (0 mAP) - #23 by Morganh, use “yolo_v4 export” to generate the TensorRT engine directly, and retry?

yolo_v4 export -k nvidia_tlt -m epoch_010.tlt -e spec.txt --engine_file 384_1248.engine --data_type int8 --batch_size 8 --batches 10 --cal_cache_file export/cal.bin --cal_data_file export/cal.tensorfile --cal_image_dir /kitti_path/training/image_2 -o 384_1248.etlt
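
For repeated runs, a long command line like this can be assembled from a few variables so that only the paths change. The following is a minimal sketch that only rebuilds the flags quoted above; the helper name `build_export_command` and its parameter names are hypothetical. It also prints `batch_size * batches`, on the assumption that this product is the number of images drawn from `--cal_image_dir` for calibration.

```python
import shlex


def build_export_command(model, spec, engine_file, etlt_file, cal_image_dir,
                         cal_dir="export", key="nvidia_tlt",
                         batch_size=8, batches=10):
    """Assemble the `yolo_v4 export` arguments quoted in this thread.

    Hypothetical helper: it adds no flags beyond those already shown above.
    """
    return [
        "yolo_v4", "export",
        "-k", key, "-m", model, "-e", spec,
        "--engine_file", engine_file,
        "--data_type", "int8",
        "--batch_size", str(batch_size),
        "--batches", str(batches),
        "--cal_cache_file", f"{cal_dir}/cal.bin",
        "--cal_data_file", f"{cal_dir}/cal.tensorfile",
        "--cal_image_dir", cal_image_dir,
        "-o", etlt_file,
    ]


cmd = build_export_command("epoch_010.tlt", "spec.txt", "384_1248.engine",
                           "384_1248.etlt", "/kitti_path/training/image_2")
print(shlex.join(cmd))  # paste into a shell, or pass `cmd` to subprocess.run
print("calibration images requested (assumed):", 8 * 10)
```

Keeping the arguments in a list avoids quoting mistakes when paths contain spaces, and makes it easy to rerun the export with a different model or calibration directory.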

renlifeng · November 12, 2021, 2:28am

I missed your point about generating the engine directly in my last reply, so I deleted it.

But there is still no box. By the way, roughly 1 run in 5 (?) the command fails with an illegal memory access. I used the following command:

tao yolo_v4 export -k nvidia_tlt -m /workspace/tao-experiments/yolo_v4/experiment_dir_retrain/weights/yolov4_resnet18_epoch_060.tlt -e /workspace/tao-experiments/yolo_v4/specs/yolo_v4_retrain_resnet18_kitti.txt --engine_file /workspace/tao-experiments/yolo_v4/export/trt.engine --data_type int8 --batch_size 8 --batches 10 --cal_image_dir /workspace/tao-experiments/yolo_v4/data/training/image_2 --cal_cache_file /workspace/tao-experiments/yolo_v4/export/cal.bin --cal_data_file /workspace/tao-experiments/yolo_v4/export/cal.tensorfile -o /workspace/tao-experiments/yolo_v4/export/yolov4_resnet18_epoch_060.etlt

And the error is:

[TensorRT] ERROR: engine.cpp (984) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[TensorRT] INTERNAL ERROR: Assertion failed: context->executeV2(&bindings[0])
…/builder/cudnnCalibrator.cpp:1148
Aborting…

Morganh · November 12, 2021, 2:39am

Can you log in to the Docker container directly and run it again?
Steps:
$ tao yolo_v4 run /bin/bash
Then, inside the container,
# yolo_v4 export xxx

renlifeng · November 12, 2021, 3:01am

Inside the docker, still no box.

 root@f88186999295:/workspace/tao-experiments/yolo_v4# yolo_v4 export --engine_file ...
 root@f88186999295:/workspace/tao-experiments/yolo_v4# yolo_v4  inference ...

Morganh · November 12, 2021, 3:32am

Please try to run inside the docker again with my cal.bin. Thanks.
cal.bin.txt (8.4 KB)

renlifeng · November 12, 2021, 3:48am

Your cal.bin does the trick. Now I get mAP 0.90074. Thank you!
My goal is to train the model on a custom dataset. Can I use your cal.bin when the .tlt model is ready?

Compared with the locally generated cal.bin, this run produces many warnings like:
[WARNING] Missing dynamic range for tensor (Unnamed Layer* 306) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing dynamic range for tensor activation_2/Relu:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
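
One way to see why two cal.bin files behave differently is to compare which tensors each one covers. The sketch below assumes the cache is a text file whose first line is a header and whose remaining lines have the form `tensor_name: hex`, with the hex digits taken to be the big-endian bit pattern of a float32 scale. That layout is an assumption about the calibration cache, not something confirmed in this thread, and the function names and sample entries are hypothetical.

```python
import struct


def parse_calibration_cache(text):
    """Return (header, {tensor_name: scale}) from calibration-cache text.

    Assumed layout: a header line, then `name: hex` lines where hex is the
    big-endian bit pattern of a float32 scale.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "", {}
    header, scales = lines[0], {}
    for ln in lines[1:]:
        # Tensor names may contain ':' (e.g. "activation_2/Relu:0"),
        # so split on the last ": " only.
        name, sep, hex_value = ln.rpartition(": ")
        if not sep:
            continue
        try:
            scales[name] = struct.unpack("!f", bytes.fromhex(hex_value))[0]
        except (ValueError, struct.error):
            continue  # skip lines that do not match the assumed layout
    return header, scales


def tensors_only_in(first, second):
    """Tensor names present in `first` but absent from `second`."""
    return sorted(set(first) - set(second))


# Made-up caches, purely for illustration.
local = "TRT-7203-EntropyCalibration2\ninput_1: 3f800000\nactivation_2/Relu:0: 40000000\n"
shared = "TRT-7203-EntropyCalibration2\ninput_1: 3f800000\n"
_, local_scales = parse_calibration_cache(local)
_, shared_scales = parse_calibration_cache(shared)
print(tensors_only_in(local_scales, shared_scales))
```

To apply this to real files, read each one with `open(path).read()` and pass the text to `parse_calibration_cache`. Tensors that appear in only one of the two caches are natural candidates to check against the "Missing dynamic range" warnings.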

Morganh · November 12, 2021, 3:51am

So the issue comes from the cal.bin file.

And no, you should not reuse my cal.bin for your own data: a separate cal.bin should be generated for each dataset.

renlifeng · November 12, 2021, 4:31am

OK, thanks. I hope the bug gets fixed in the next release.

Morganh · November 12, 2021, 4:48am

No, this is still not a bug. I generate cal.bin in the same way,
so it should be related to the environment.

Can you attach your cal.bin?

renlifeng · November 12, 2021, 4:51am

cal.bin.generated (7.0 KB)
