Converting YOLOv7 to DeepStream

Platform: AGX Xavier
Problem: No detections from YOLOv7 in DeepStream

Steps:
(1) Converted YOLOv7 to ONNX and received the following output:

python3 export.py --weights ./yolov7.pt --batch-size 1 --grid --end2end --fp16 --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640

Namespace(batch_size=1, conf_thres=0.35, device='cpu', dynamic=False, dynamic_batch=False, end2end=True, fp16=True, grid=True, img_size=[640, 640], include_nms=False, int8=False, iou_thres=0.65, max_wh=None, simplify=False, topk_all=100, weights='./yolov7.pt')

YOLOR 🚀 v0.1-126-g84932d7 torch 1.10.2+cu102 CPU

Fusing layers…

RepConv.fuse_repvgg_block

RepConv.fuse_repvgg_block

RepConv.fuse_repvgg_block

Model Summary: 306 layers, 36905341 parameters, 36905341 gradients

/home/user/.local/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at …/aten/src/ATen/native/TensorShape.cpp:2157.)

return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]

Starting TorchScript export with torch 1.10.2+cu102…

/home/user/yolo/yolov7/models/yolo.py:52: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

if self.grid[i].shape[2:4] != x[i].shape[2:4]:

TorchScript export success, saved as ./yolov7.torchscript.pt

CoreML export failure: No module named 'coremltools'

Starting TorchScript-Lite export with torch 1.10.2+cu102…

TorchScript-Lite export success, saved as ./yolov7.torchscript.ptl

Starting ONNX export with onnx 1.9.0…

Starting export end2end onnx model for TensorRT…

/home/user/.local/lib/python3.6/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:417.)

return self._grad

/home/user/yolo/yolov7/models/experimental.py:130: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

det_classes = torch.randint(0, num_classes, (batch_size, max_output_boxes), dtype=torch.int32)

WARNING: The shape inference of TRT::EfficientNMS_TRT type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.

(the warning above is printed 12 times)

ONNX export success, saved as ./yolov7.onnx

Export complete (12.68s). Visualize with https://github.com/lutzroeder/netron
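
As a quick sanity check (a sketch, not part of the original steps), the exported graph's inputs and outputs can be listed with the onnx Python package before moving on to DeepStream; for this --end2end export they should be the EfficientNMS bindings that also show up later in step (3) (num_dets, det_boxes, det_scores, det_classes):

python3 -c "import onnx; m = onnx.load('yolov7.onnx'); print([i.name for i in m.graph.input], [o.name for o in m.graph.output])"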

(2) Copied yolov7.onnx to the models folder in DeepStream.
Added a labels.txt file with 80 classes.
Used the following config:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
onnx-file=../../models/Yolov7/yolov7.onnx
#proto-file=../../models/Primary_Detector/resnet10.prototxt
#model-engine-file=../../models/Primary_Detector/resnet10.caffemodel_b30_gpu0_int8.engine
model-engine-file=../../models/Yolov7/yolov7.onnx_b1_gpu0_fp16.engine
labelfile-path=../../models/Yolov7/labels.txt
#int8-calib-file=../../models/Primary_Detector/cal_trt.bin
force-explicit-batch-dim=1
force-implicit-batch-dim=0
batch-size=1
process-mode=1
model-color-format=0
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=100
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
#force-implicit-batch-dim=1
parse-bbox-func-name=NvDsInferParseCustomEfficientNMS
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infercustomparser.so
# 1=DBSCAN, 2=NMS, 3=DBSCAN+NMS Hybrid, 4=None (no clustering)
cluster-mode=2
#scaling-filter=0
#scaling-compute-hw=0

# Use the config params below for dbscan clustering mode
#[class-attrs-all]
#detected-min-w=4
#detected-min-h=4
#minBoxes=3

# Use the config params below for NMS clustering mode
[class-attrs-all]
topk=200
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
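
One way to separate TensorRT problems from DeepStream config problems (a hedged suggestion, not something done in the thread; the trtexec path assumes the standard JetPack install location) is to build and test the FP16 engine with trtexec first and then point model-engine-file at the result:

/usr/src/tensorrt/bin/trtexec --onnx=yolov7.onnx --saveEngine=yolov7.onnx_b1_gpu0_fp16.engine --fp16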

(3) Ran DeepStream. It converts the ONNX to an engine file:
0:00:52.957520528 24832 0x55b6bcfaa0 INFO nvinfer gstnvinfer.cpp:654:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.0/samples/models/Yolov7/yolov7.onnx_b1_gpu0_fp16.engine

INFO: [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT images       3x640x640
1   OUTPUT kINT32 num_dets     1
2   OUTPUT kFLOAT det_boxes    100x4
3   OUTPUT kFLOAT det_scores   100
4   OUTPUT kINT32 det_classes  100

There are only 80 classes but it shows 100!! So I changed the number of classes to 100 in the config file. I wonder if that is correct.
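
For reference, the 100 in det_boxes/det_scores/det_classes corresponds to the --topk-all 100 value passed to export.py (the maximum number of boxes EfficientNMS returns per image), not the number of classes, so num-detected-classes would normally stay at 80 to match labels.txt. A hedged sketch of the affected entries (the custom-lib-path is a placeholder: the library must actually export the named parser, and the output-blob-names line above appears to be left over from the resnet10 sample and does not match this model's bindings):

num-detected-classes=80
# EfficientNMS bindings are num_dets, det_boxes, det_scores, det_classes;
# the resnet10 output-blob-names entry can be dropped for this model
parse-bbox-func-name=NvDsInferParseCustomEfficientNMS
custom-lib-path=<library that actually exports NvDsInferParseCustomEfficientNMS>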

Net result: NO detections on a video where I was getting good detections from the same model outside DeepStream.

Following the tutorial in:

and the instructions in https://github.com/WongKinYiu/yolov7

I haven't heard back any response.
I assume YOLO networks (beyond YOLOv3) are a little tough to manage on Jetsons!

Is there an alternative to YOLOv3 that is recommended?

What do you mean by "manage in Jetson"? You can try the yolov4, yolov4-tiny, and yolov5s DeepStream samples in deepstream_tao_apps, and there are other YOLO DeepStream samples as well.

It's good to know that the newer models are not an issue. YOLOv3 gives me 3 to 4 fps on the AGX Xavier (at 608x608). I was thinking it gets tougher beyond that, but maybe that's not true. Also, I see that YOLOv7 has almost half the number of weights and is hence a lot faster, so I am interested in using it.

The link you shared appears to be just what I was looking for. I wonder how I missed it on my own. Thanks for that.

Also, other than reducing the input size from 608 to 416 (below which the accuracy falls in my case), I have been unable to optimize by increasing the multiplexing with streammux (I increased the batch size and the buffer timeout period). I wonder if I am doing something wrong. jtop shows the GPU at 100% with about 3 GB of memory used (1 GB overflowing to shared memory). I don't think memory is the issue, but I wonder how I can increase fps using more batching! Any pointers?
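
A sketch of the knobs involved, with the caveat that batching only raises throughput if the engine itself is built for that batch size (the YOLOv7 ONNX above was exported with --batch-size 1 and no --dynamic-batch, so it would need to be re-exported before a larger batch helps), and that a GPU already at 100% utilization may not gain much from it:

# deepstream-app config, streammux group
[streammux]
batch-size=4
batched-push-timeout=40000

# nvinfer config, [property] group (must match the engine's batch size)
batch-size=4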

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

please refer to this link

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.