• Hardware: RTX 2080Ti
• Network Type: Yolo_v4
• TLT Version: v3.21.08-py3
• Training spec file: yolo_v4_train_resnet18_kitti.txt (2.3 KB)
yolo_v4_retrain_resnet18_kitti.txt (2.3 KB)
Hi everyone, I followed the YOLOv4 page of the TAO Toolkit 3.22.05 documentation to retrain my custom model. When I export the model as INT8 and run inference, it doesn’t show any bounding boxes. However, when I run with FP16, it works.
Here are my command lines in the Jupyter script:
Export:
!tao yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_resnet18_epoch_$EPOCH.tlt \
-o $USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt \
-e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
-k $KEY \
--cal_image_dir $USER_EXPERIMENT_DIR/data/training/image_2 \
--data_type int8 \
--batch_size 8 \
--batches 10 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
--cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
--verbose \
--gen_ds_config
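The export command above requests `--batches 10` with `--batch_size 8`, which means 80 calibration images. A quick check like the one below may help confirm that the calibration directory holds at least that many images before exporting. This is only a sketch: the helper names and the list of image extensions are assumptions, not part of TAO.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def count_calibration_images(image_dir):
    """Count image files directly inside image_dir (hypothetical helper)."""
    return sum(
        1
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def images_needed(batch_size, batches):
    """Images consumed by calibration: one batch_size per calibration batch."""
    return batch_size * batches


def has_enough_images(image_dir, batch_size, batches):
    """True if image_dir holds at least batch_size * batches images."""
    return count_calibration_images(image_dir) >= images_needed(batch_size, batches)
```

For the command above, `has_enough_images(cal_dir, 8, 10)` would be called with `cal_dir` set to the `--cal_image_dir` path.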
Convert:
tao converter -k $KEY \
-p Input,1x3x384x1248,8x3x384x1248,16x3x384x1248 \
-c $USER_EXPERIMENT_DIR/export/cal.bin \
-e $USER_EXPERIMENT_DIR/export/trt.engine \
-b 2 \
-o BatchedNMS \
-m 8 \
-t int8 \
$USER_EXPERIMENT_DIR/export/yolov4_resnet18_epoch_$EPOCH.etlt
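The `-p` argument in the convert command packs the input name and the min, opt, and max shapes into one comma-separated string. A small parser such as the following sketch can check that the three shapes are ordered consistently before an engine is built. The function names are hypothetical, and the check only covers the shape string itself, not how the converter uses it.

```python
def parse_profile(spec):
    """Split '<name>,<min>,<opt>,<max>' into a name and three shape tuples."""
    name, *shapes = spec.split(",")
    if len(shapes) != 3:
        raise ValueError("expected min, opt and max shapes")
    min_s, opt_s, max_s = (tuple(int(d) for d in s.split("x")) for s in shapes)
    return name, min_s, opt_s, max_s


def profile_is_consistent(spec):
    """True if all shapes share a rank and min <= opt <= max in every dimension."""
    _, min_s, opt_s, max_s = parse_profile(spec)
    same_rank = len(min_s) == len(opt_s) == len(max_s)
    return same_rank and all(
        lo <= mid <= hi for lo, mid, hi in zip(min_s, opt_s, max_s)
    )
```

With the string from the command above, `parse_profile` yields batch sizes 1, 8, and 16 for the min, opt, and max shapes.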
Inference:
!tao yolo_v4 inference -m $USER_EXPERIMENT_DIR/export/trt.engine \
-e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
-i $DATA_DOWNLOAD_DIR/test_samples \
-o $USER_EXPERIMENT_DIR/yolo_infer_images \
-t 0.6
I also followed @Morganh’s advice in Error when running custom YOLOv4 on deepstream_python_apps - #9 by thuan169993 to adjust the annotation format, but I still get the same issue.
Could you guys help me with this problem? If you guys need anything else, please just let me know. Thanks in advance!
Hi,
To narrow down, could you please try with the KITTI dataset?
Hi @Morganh, here is my update. I trained my YOLOv4 model on part of the KITTI dataset and I still have the same problem. Could you guys help me please?
See TLT YOLOv4 (CSPDakrnet53) - TensorRT INT8 model gives wrong predictions (0 mAP) - #23 by Morganh. I actually cannot reproduce the INT8 issue with the CSPDarknet53 backbone.
Also, the user in that topic has no issue with yolo_v4 INT8 on the resnet18 backbone.
Please check whether my test steps are useful.
virsg
October 20, 2021, 2:15am
7
Hi @thuan169993, I had a similar problem with INT8 mode, and it was related to the calibration file (cal.bin) generated when you export the model with tao yolo_v4 export.
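Since the calibration file is the suspect here, it may help to look inside cal.bin before building the engine. The sketch below assumes the cache is a text file whose first line is a header (something like `TRT-<version>-EntropyCalibration2`) followed by `tensor_name: <hex>` lines, where the hex digits encode a big-endian float32 scale. That layout is an assumption about the TensorRT cache format, and `parse_calibration_cache` is a hypothetical helper, not a TAO or TensorRT API.

```python
import struct


def parse_calibration_cache(text):
    """Return (header, {tensor_name: scale}) from calibration-cache text.

    Assumed layout: a header line, then 'name: hex' lines in which the
    hex digits hold a big-endian IEEE-754 float32 scale.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("calibration cache is empty")
    header, scales = lines[0], {}
    for ln in lines[1:]:
        # Tensor names may themselves contain ':', so split on the last one.
        name, _, hex_value = ln.rpartition(":")
        raw = int(hex_value.strip(), 16).to_bytes(4, "big")
        scales[name.strip()] = struct.unpack("!f", raw)[0]
    return header, scales
```

Reading the file with `open(path).read()` and passing the text in gives a dictionary of per-tensor scales. An empty file, or scales that are all zero, would be one thing to look for, although this alone does not prove the cache is good or bad.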
Hi @virsg , did you fix it ?
@thuan169993
Which version of TensorRT is it?
Can you share the output of $ dpkg -l |grep cuda
Sorry for the late response. I use 7.2.3.
Here is the log inside my container
Morganh
November 1, 2021, 2:17pm
11
May I know which container you are using? Is it TLT/TAO?
Morganh
November 2, 2021, 7:32am
13
OK. To narrow down, please do not use the DeepStream container.
Please launch the TAO docker directly, then export the model and generate the cal.bin file. Then generate the TensorRT engine and run inference with it.
I use the TAO docker, but still get no boxes. I used the Jupyter notebook in cv_samples_vv1.2.0, uncommented the INT8 export/convert clauses, and executed them. FP32 works.
Did you solve your problem? How?
Simply using the TAO docker directly does not help. Changing the batch size from 8 to 1, both in the tao converter command line and in the eval_config section of specs/yolo_v4_retrain_resnet18_kitti.txt, did the trick.
But I don’t know why these changes matter. Could you please shed some light on this?
tao converter … -p Input,1x3x384x1248,1x3x384x1248,1x3x384x1248
eval_config {
batch_size: 1
…
}
Morganh
November 11, 2021, 1:47am
17
@renlifeng
Can you share your latest spec file? Thanks.
Sure. But I don’t know how to upload the file as an attachment, even after reading Attaching Files to Forum Topics/Posts.
Sorry for pasting the file here.
— begin
random_seed: 42
yolov4_config {
  big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
  mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
  small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
  box_matching_iou: 0.25
  matching_neutral_box_iou: 0.5
  arch: "resnet"
  nlayers: 18
  arch_conv_blocks: 2
  loss_loc_weight: 0.8
  loss_neg_obj_weights: 100.0
  loss_class_weights: 0.5
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 8
  num_epochs: 80
  enable_qat: false
  checkpoint_interval: 10
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: NO_REG
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
  pruned_model_path: "/workspace/tao-experiments/yolo_v4/experiment_dir_pruned/yolov4_resnet18_pruned.tlt"
}
eval_config {
  average_precision_mode: SAMPLE
  batch_size: 1
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  top_k: 200
  force_on_cpu: true
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 1248
  output_height: 384
  output_channel: 3
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
  mosaic_min_ratio: 0.2
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/training/tfrecords/train*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  include_difficult_in_training: true
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  validation_data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/val/tfrecords/val*"
    image_directory_path: "/workspace/tao-experiments/data/val"
  }
}
— end
Morganh
November 11, 2021, 2:49am
19
To upload a file, please click the “upload” button when you write your reply.
In the spec, there are separate training and eval batch sizes (bs).
training_config {
batch_size_per_gpu: 8
eval_config {
average_precision_mode: SAMPLE
batch_size: 1
Do you mean that after you changed the eval bs from 8 to 1, the issue went away? Am I correct?
Thanks for the tip.
Yes. I only changed the eval bs, and only did that after the training was done.
I also changed the min/opt/max shapes to 1x3x384x1248 when converting the engine.
I am really sorry. In fact, I never got INT8 to work. Changing the batch size or the input size has no effect on this.
I tweaked the command line many times and got myself confused. By mistake, I specified the fp16 type but put the engine file in the int8 directory and thought it was an INT8 engine. The boxes were actually inferred by the FP16 engine.
Sorry.
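One way to avoid the mix-up described above is to derive the engine file name from the data type, so an FP16 engine can never end up under an INT8 name. The sketch below only assembles the argument list using the converter flags that appear earlier in this thread (`-k`, `-p`, `-c`, `-e`, `-o`, `-m`, `-t`). The function name, the `trt_<type>.engine` naming scheme, and the defaults are assumptions for illustration.

```python
def build_converter_args(etlt_path, key, data_type, engine_dir, profile,
                         cal_cache=None, max_batch=8, output_node="BatchedNMS"):
    """Return (engine_path, argv) for a tao converter call (hypothetical helper).

    The engine file name embeds the data type so engines of different
    precision cannot be confused with each other.
    """
    if data_type not in ("fp32", "fp16", "int8"):
        raise ValueError(f"unsupported data type: {data_type}")
    if data_type == "int8" and cal_cache is None:
        raise ValueError("int8 needs a calibration cache (-c)")

    engine_path = f"{engine_dir}/trt_{data_type}.engine"  # assumed naming scheme
    args = ["tao", "converter", "-k", key, "-p", profile,
            "-e", engine_path, "-o", output_node,
            "-m", str(max_batch), "-t", data_type]
    if data_type == "int8":
        # The calibration cache is only passed for INT8 builds.
        args += ["-c", cal_cache]
    args.append(etlt_path)
    return engine_path, args
```

The returned list can be joined into a notebook command or passed to `subprocess.run`, and the same `engine_path` can then be handed to the inference step so the precision being tested is visible in the file name.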