YOLOv4 inference in INT8 mode doesn't show any bounding boxes

I followed GitHub - NVIDIA-AI-IOT/deepstream_tao_apps (Sample apps to demonstrate how to deploy models trained with TAO on DeepStream) to set up the environment and run YOLOv4 inside a docker container.

The image above is the $ dpkg -l | grep cuda output inside that container.

OK. To narrow down, please do not use the DeepStream container.
Please run the TAO docker directly, then export and generate the cal.bin file. Then generate the TensorRT engine and run inference with it.

I used the TAO docker, but I still get no boxes. I used the Jupyter notebook in cv_samples_vv1.2.0, uncommented the INT8 export/convert cells, and executed them. FP32 works.

Did you solve your problem? How?

Simply using the TAO docker directly did not help. Changing the batch size from 8 to 1, both on the tao converter command line and in the eval_config section of specs/yolo_v4_retrain_resnet18_kitti.txt, did the trick.

But I don’t know why these changes matter. Could you please shed some light on this?

tao converter … -p Input,1x3x384x1248,1x3x384x1248,1x3x384x1248

eval_config {
  batch_size: 1
  …
}


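For reference, a full tao-converter invocation with all three dynamic-shape profile entries pinned to batch 1 might look like the sketch below. This is only a sketch: the file paths and the -k key are placeholder assumptions, not values taken from this thread.

```shell
# Sketch only: file paths and the -k key are placeholders.
# -p supplies the dynamic-shape profile as Input,<min>,<opt>,<max>;
# with all three set to 1x3x384x1248 the engine only ever sees batch size 1.
tao-converter yolov4_resnet18_epoch_060.etlt \
  -k nvidia_tlt \
  -t int8 \
  -c export/cal.bin \
  -p Input,1x3x384x1248,1x3x384x1248,1x3x384x1248 \
  -e export/trt.engine
```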
Can you share your latest spec file? Thanks.

Sure. But I don't know how to upload the file as an attachment, even after reading Attaching Files to Forum Topics/Posts.

Sorry for pasting the file here.

— begin
random_seed: 42
yolov4_config {
  big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
  mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
  small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
  box_matching_iou: 0.25
  matching_neutral_box_iou: 0.5
  arch: "resnet"
  nlayers: 18
  arch_conv_blocks: 2
  loss_loc_weight: 0.8
  loss_neg_obj_weights: 100.0
  loss_class_weights: 0.5
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 8
  num_epochs: 80
  enable_qat: false
  checkpoint_interval: 10
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: NO_REG
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  pruned_model_path: "/workspace/tao-experiments/yolo_v4/experiment_dir_pruned/yolov4_resnet18_pruned.tlt"
}
eval_config {
  average_precision_mode: SAMPLE
  batch_size: 1
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  top_k: 200
  force_on_cpu: true
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 1248
  output_height: 384
  output_channel: 3
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/training/tfrecords/train*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  include_difficult_in_training: true
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  validation_data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/val/tfrecords/val*"
    image_directory_path: "/workspace/tao-experiments/data/val"
  }
}
— end

To upload a file, please click the "upload" button when you reply.

In the spec, there are a training bs and an eval bs.

training_config {
  batch_size_per_gpu: 8
  …
}

eval_config {
  average_precision_mode: SAMPLE
  batch_size: 1
  …
}
Do you mean that when you changed the eval bs from 8 to 1, there was no issue? Am I correct?

Thanks for the tip.

Yes. I only changed the eval bs, and only did that after the training was done.

I also changed the min/opt/max shapes to 1x3x384x1248 when converting the engine.

Thanks for the info.

I am really sorry. In fact, I never got INT8 to work. Changing the batch size and input size had no effect on this.

I tried tweaking the command line many times and got myself confused. By mistake, I specified the fp16 data type but put the engine file in the int8 directory, and thought it was an INT8 engine. The boxes were actually inferred by the FP16 engine.


No problem. Actually, I cannot reproduce this kind of issue all the time,
but some end users do meet this problem.
So, could you share more info about the reproduction steps and the training environment?
For example,

  • The CUDA/TRT versions on your local PC
  • The Jupyter notebook. You can upload the .ipynb file here.
  • Jupyter notebook
    Attached. It is yolo_v4.ipynb in cv_samples_vv1.2.0, which was downloaded from NGC.
    I did not follow the notebook literally. My network is not stable, so instead of retrying the 12 GB image download for days, I downloaded and extracted the images from a mirror inside my country, outside of the notebook.
    I also ssh into the machine running the notebook. X11 problems bothered me, so I changed !tao yolo_v4 train to !echo tao … and executed the command in a terminal.

yolo_v4.ipynb (188.0 KB)

command line history:
annotated-cmd-history.txt (6.2 KB)

  • cuda
    Both cuda-10-2 and cuda-11-1 are installed; /usr/local/cuda ultimately links to 11-1.
  • trt 7.2.2-1+cuda11.1
    As I tried to use darknet_xxx.weights with DeepStream 4.5.1, I replaced it with the version built from TensorRT OSS 7.2.2.

Additional info:

  • OS: Ubuntu 18.04, with NVIDIA-related packages installed from the NVIDIA cuda/machine-learning repos.
  • gpu: rtx2080 ti
  • driver: 495.44
  • cudnn

As I am using the TAO Toolkit, the docker image may be useful. It has:

  • cuda-11.1
  • trt 7.2.3
  • cudnn 8.1.1

Hi @renlifeng
To narrow down, could you refer to my comment in TLT YOLOv4 (CSPDakrnet53) - TensorRT INT8 model gives wrong predictions (0 mAP) - #23 by Morganh, use "yolo_v4 export" to generate the TensorRT engine directly, and retry?

yolo_v4 export -k nvidia_tlt -m epoch_010.tlt -e spec.txt --engine_file 384_1248.engine --data_type int8 --batch_size 8 --batches 10 --cal_cache_file export/cal.bin --cal_data_file export/cal.tensorfile --cal_image_dir /kitti_path/training/image_2 -o 384_1248.etlt

I missed your point about generating the engine directly in my last reply, so I deleted it.

But still no boxes. By the way, there are odds (1 in 5?) that the command will fail with an illegal memory access. I used the following command:

tao yolo_v4 export -k nvidia_tlt -m /workspace/tao-experiments/yolo_v4/experiment_dir_retrain/weights/yolov4_resnet18_epoch_060.tlt -e /workspace/tao-experiments/yolo_v4/specs/yolo_v4_retrain_resnet18_kitti.txt --engine_file /workspace/tao-experiments/yolo_v4/export/trt.engine --data_type int8 --batch_size 8 --batches 10 --cal_image_dir /workspace/tao-experiments/yolo_v4/data/training/image_2 --cal_cache_file /workspace/tao-experiments/yolo_v4/export/cal.bin --cal_data_file /workspace/tao-experiments/yolo_v4/export/cal.tensorfile -o /workspace/tao-experiments/yolo_v4/export/yolov4_resnet18_epoch_060.etlt

And the error is:

[TensorRT] ERROR: engine.cpp (984) - Cuda Error in executeInternal: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
[TensorRT] INTERNAL ERROR: Assertion failed: context->executeV2(&bindings[0])

Can you log in to the docker directly and run it again?
$ tao yolo_v4 run /bin/bash
Then inside the docker:
# yolo_v4 export xxx

Inside the docker, still no boxes.

 root@f88186999295:/workspace/tao-experiments/yolo_v4# yolo_v4 export --engine_file ...
 root@f88186999295:/workspace/tao-experiments/yolo_v4# yolo_v4  inference ...

Please try to run inside the docker again with my cal.bin. Thanks.
cal.bin.txt (8.4 KB)

Your cal.bin does the trick. Now I get mAP 0.90074. Thank you!
My purpose is to train the model on a custom dataset. Can I use your cal.bin when my .tlt model is ready?

Compared to the locally generated cal.bin, this time there are lots of warnings like:
[WARNING] Missing dynamic range for tensor (Unnamed Layer* 306) [Constant]_output, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing dynamic range for tensor activation_2/Relu:0, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
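As an aside, the cal.bin calibration cache is plain text, so it can be inspected directly: a header line (e.g. TRT-7213-EntropyCalibration2) followed by "tensor_name: hex" lines, where each hex string is that tensor's float32 scale in big-endian IEEE-754. A minimal sketch for decoding it (parse_cal_cache and the sample cache text are illustrative, not part of TAO or TensorRT):

```python
import struct

def parse_cal_cache(text):
    """Parse a TensorRT INT8 calibration cache into {tensor_name: scale}.

    Assumed layout: a header line, then "tensor_name: <hex>" lines where the
    hex encodes the tensor's scale as a big-endian IEEE-754 float32.
    """
    scales = {}
    for line in text.splitlines()[1:]:          # skip the header line
        if ":" not in line:
            continue
        # rsplit, because tensor names like "activation_2/Relu:0" contain ':'
        name, hexval = line.rsplit(":", 1)
        raw = bytes.fromhex(hexval.strip().zfill(8))
        scales[name.strip()] = struct.unpack(">f", raw)[0]
    return scales

# Illustrative cache contents, not from a real run.
cache = """TRT-7213-EntropyCalibration2
Input: 3c010a14
activation_2/Relu:0: 3d0f5c29
"""
scales = parse_cal_cache(cache)
print(scales)
```

Diffing the tensor names in a working cache against a locally generated one is a quick way to spot which tensors are missing dynamic range, like those named in the warnings above.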

So, the issue results from the cal.bin file.

No, we should generate a different cal.bin for each dataset.

OK, thanks. I hope the bug will be fixed in the next release.