Error while converting model using TAO

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
V100
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
Yolo_v3
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
random_seed: 42
yolov3_config {
big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
matching_neutral_box_iou: 0.7
arch: "resnet"
nlayers: 18
arch_conv_blocks: 2
loss_loc_weight: 0.8
loss_neg_obj_weights: 100.0
loss_class_weights: 1.0
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 16
num_epochs: 500
enable_qat: false
checkpoint_interval: 10
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 1e-6
max_learning_rate: 1e-4
soft_start: 0.1
annealing: 0.5
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
pretrain_model_path: "EXPERIMENT_DIR/pretrained_resnet18/pretrained_object_detection_vresnet18/resnet_18.hdf5"
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 8
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0
horizontal_flip: 0.5
jitter: 0.3
output_width: 1248
output_height: 384
output_channel: 3
randomize_input_shape_period: 0
}
dataset_config {
data_sources: {
label_directory_path: "/workspace/tao-experiments/data/training/label_2"
image_directory_path: "/workspace/tao-experiments/data/training/image_2"
}
include_difficult_in_training: true
target_class_mapping {
key: "ball"
value: "ball"
}
target_class_mapping {
key: "bottle"
value: "bottle"
}
target_class_mapping {
key: "grass"
value: "grass"
}
target_class_mapping {
key: "leaf"
value: "leaf"
}
target_class_mapping {
key: "milk-box"
value: "milk-box"
}
target_class_mapping {
key: "plastic-bag"
value: "plastic-bag"
}
validation_data_sources: {
label_directory_path: "/workspace/tao-experiments/data/val/label"
image_directory_path: "/workspace/tao-experiments/data/val/image"
}
}
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
Hi guys,
The following error occurred when I used tao-converter to convert the yolov3 weights:


As I understand it, the -d parameter is no longer necessary. Any ideas?

Can you check if the etlt file is available?
! tao yolo_v3 run ls $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt

It works, thank you. But when I use tao-converter on Xavier, the following error appears:


The -d parameter is the same as the one used during training.
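
For readers following along, here is a minimal sketch of how the -d value could be read from the training spec instead of typed by hand. It assumes -d takes channels,height,width, which matches the `-d 3,544,960` / `1x3x544x960` pairing used elsewhere in this thread. The helper name `dims_from_spec` is hypothetical and not part of TAO.

```shell
# Hypothetical helper (not part of TAO): print "C,H,W" taken from the
# output_channel / output_height / output_width fields of a spec file.
dims_from_spec() {
    spec="$1"
    awk -F':' '
        $1 ~ /output_channel/ { gsub(/[ \t]/, "", $2); c = $2 }
        $1 ~ /output_height/  { gsub(/[ \t]/, "", $2); h = $2 }
        $1 ~ /output_width/   { gsub(/[ \t]/, "", $2); w = $2 }
        END {
            # Fail if any of the three fields is missing from the spec.
            if (c == "" || h == "" || w == "") exit 1
            printf "%s,%s,%s\n", c, h, w
        }
    ' "$spec"
}
```

With the augmentation_config posted above (output_width 1248, output_height 384, output_channel 3), `dims_from_spec spec.txt` would print `3,384,1248`, where `spec.txt` stands for wherever the spec file is saved.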

This is an SSD model, whereas previously it was a yolo_v3 model.
For the error, please build the TRT OSS plugin. You can refer to YOLOv4 — TAO Toolkit 3.0 documentation

I built the TRT OSS plugin and replaced it according to the documentation, then tried to convert the yolov3 model, and the following error occurred:

Sorry, I am afraid you are using a V100 instead of a Jetson device. Did you follow YOLOv4 — TAO Toolkit 3.0 documentation?

There is no doubt that I am using a Jetson. I replaced the TRT OSS plugin according to that document:

According to the description above, it is a V100 machine, right?

Yes, the V100 was used to train the model. Now I want to deploy to DeepStream on a Jetson device.

OK, can you share the result of the command below?
$ ll /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so*

Sure,

I am afraid you did not replace the plugin correctly.

The expected output is as below.
$ ll /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so*
lrwxrwxrwx 1 root root 26 Jun 6 2020 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so -> libnvinfer_plugin.so.7.1.3*
lrwxrwxrwx 1 root root 26 Oct 12 15:12 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7 -> libnvinfer_plugin.so.7.1.3*
lrwxrwxrwx 1 root root 26 Oct 12 15:12 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.0.0 -> libnvinfer_plugin.so.7.1.3*
-rwxr-xr-x 1 root root 10009144 Oct 12 15:06 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3*

Please follow step 4 of YOLOv4 — TAO Toolkit 3.0 documentation
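
As a quick self-check of the listing above, here is a rough sketch that walks every libnvinfer_plugin.so* entry and reports whether it resolves to a real file. The function name `check_plugin_links` is made up for illustration, and the default directory is the Jetson path quoted in this thread. It only checks that the links are not dangling; it cannot tell whether the target is the rebuilt OSS plugin.

```shell
# Hypothetical helper: verify that each libnvinfer_plugin.so* entry in a
# directory resolves to an existing regular file.
# Returns 0 if all resolve, 1 if any link is broken, 2 if none are found.
check_plugin_links() {
    dir="${1:-/usr/lib/aarch64-linux-gnu}"
    status=0
    found=0
    for f in "$dir"/libnvinfer_plugin.so*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$f" ] || [ -L "$f" ] || continue
        found=1
        target=$(readlink -f "$f") || target=""
        if [ -n "$target" ] && [ -f "$target" ]; then
            echo "OK      $f -> $target"
        else
            echo "BROKEN  $f"
            status=1
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no libnvinfer_plugin.so* found in $dir"
        return 2
    fi
    return "$status"
}
```

Running `check_plugin_links` with no argument inspects the default directory. If every line says OK but the error persists, comparing the size and date of the final target against the freshly built library is a reasonable next step.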

I followed step 4 and replaced the plugin correctly, but the error still occurs:

Can you try the official demo etlt model file?

wget https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip -O models.zip

Then,
./tao-converter -k nvidia_tlt -d 3,544,960 -p image_input,1x3x544x960,1x3x544x960,1x3x544x960 -o BatchedNMS -e /export/trt.fp16.engine -t fp16 -i nchw -m 8 yolov4_resnet18.etlt

I tried the official demo etlt model file and still get the following error:


But I successfully converted the SSD model file under the same conditions and deployed it to DeepStream.

Please modify
-p image_input,
to
-p Input,
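
To make the -p argument less error-prone, here is a small sketch that builds it from its parts. It assumes the profile format is input name followed by min, opt and max shapes in NxCxHxW order, as in the commands in this thread, and that the name has to match the model's input tensor (which seems to be why `image_input` and `Input` behave differently). `build_profile` is a hypothetical name, not a tao-converter feature.

```shell
# Hypothetical helper: assemble a tao-converter -p profile string.
# Usage: build_profile NAME C H W [MIN_BATCH [OPT_BATCH [MAX_BATCH]]]
build_profile() {
    name="$1"; c="$2"; h="$3"; w="$4"
    # Batch sizes default to 1, and each defaults to the previous one.
    min_n="${5:-1}"
    opt_n="${6:-$min_n}"
    max_n="${7:-$opt_n}"
    echo "${name},${min_n}x${c}x${h}x${w},${opt_n}x${c}x${h}x${w},${max_n}x${c}x${h}x${w}"
}
```

For the demo model, `build_profile Input 3 544 960` yields `Input,1x3x544x960,1x3x544x960,1x3x544x960`, which can be passed as `-p "$(build_profile Input 3 544 960)"`.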

It doesn't work.

Can you run
$ md5sum yolov4_resnet18.etlt
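
If someone with a working setup posts their checksum, a comparison along these lines may save a round trip. This is only a sketch: `verify_md5` is a hypothetical helper, and the expected hash has to come from a known-good copy of the file, since none is given in this thread.

```shell
# Hypothetical helper: compare a file's MD5 against an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
verify_md5() {
    file="$1"
    expected="$2"
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 2
    fi
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "match: $actual"
    else
        echo "mismatch: got $actual, expected $expected"
        return 1
    fi
}
```

Usage would look like `verify_md5 yolov4_resnet18.etlt <known-good-hash>`. A mismatch would point to a corrupted or different download rather than a converter problem.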


Can you double check?
On my side, the engine is generated successfully on NX.
$ ./tao-converter -k nvidia_tlt -d 3,544,960 -p Input,1x3x544x960,1x3x544x960,1x3x544x960 -o BatchedNMS -e /export/trt.fp16.engine -t fp16 -i nchw -m 8 yolov4_resnet18.etlt

Yesterday, another forum user also ran it successfully with this yolo_v4_resnet18.etlt.
See Error in Yolov4 engine conversion, - #41 by Morganh