Load only .engine file

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 6.0
• JetPack Version (valid for Jetson only) 4.6
• TensorRT Version 8.0.1.6

I have trained a PyTorch model, best.pt. Because I used GhostConv, DWConv, and similar layers, it is difficult to convert it to the right .cfg and .wts via gen_wts_yoloV5.py in DeepStream-Yolo or in tensorrtx. So I exported the model to ONNX in yolov5, then used trtexec to produce the .engine file.
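
A minimal trtexec invocation for this is sketched below, with assumed file names; judging by the kHALF bindings in the log further down, precision or IO-format flags were involved as well:

/usr/src/tensorrt/bin/trtexec --onnx=best.onnx --saveEngine=best_exp.engine
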
Now I want to load the .engine file directly in DeepStream, without the .cfg and .wts, but I have not succeeded.
deepstream-app -c deepstream_app_config.txt

Using winsys: x11
0:00:05.465532190 21034 0xf7b1f50 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.0/sources/DeepStream-Yolo/./best_exp.engine
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kHALF images 3x640x640
1 OUTPUT kHALF 490 3x80x80x6
2 OUTPUT kHALF 558 3x40x40x6
3 OUTPUT kHALF 626 3x20x20x6
4 OUTPUT kFLOAT output 25200x6

0:00:05.465893821 21034 0xf7b1f50 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.0/sources/DeepStream-Yolo/./best_exp.engine
0:00:05.475766009 21034 0xf7b1f50 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.0/sources/DeepStream-Yolo/config_infer_primary_yoloV5.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 260
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 260
** INFO: <bus_callback:180>: Pipeline running

WARNING: Num classes mismatch. Configured: 1, detected by network: 0
deepstream-app: nvdsparsebbox_Yolo.cpp:137: bool NvDsInferParseCustomYolo(const std::vector<NvDsInferLayerInfo>&, const NvDsInferNetworkInfo&, const NvDsInferParseDetectionParams&, std::vector<NvDsInferParseObjectInfo>&, const uint&, const uint&): Assertion `layer.inferDims.numDims == 3' failed.
Aborted (core dumped)

deepstream_app_config.txt:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=3
uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
type=2
sync=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=0
batch-size=1
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[tests]
file-loop=0

config_infer_primary_yoloV5.txt:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov5n.cfg
model-file=yolov5n.wts
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25

How can I fix this error? Or is it possible to run with only the .engine file, without the .cfg and .wts?
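
In config terms, that would mean a [property] section like the minimal sketch below, with the .cfg/.wts and engine-create-func-name keys dropped (file names taken from above; whether the stock parser then accepts the engine's outputs is exactly the question):

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
model-engine-file=best_exp.engine
labelfile-path=labels.txt
batch-size=1
num-detected-classes=1
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
# no custom-network-config, model-file, or engine-create-func-name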

The post-processor function NvDsInferParseYolo(), specified by "parse-bbox-func-name=NvDsInferParseYolo", does not match your model's outputs; you need to modify it to adapt it to your model.
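
Concretely, the fused "output" tensor here is two-dimensional (25200x6), while the stock parser asserts three-dimensional per-scale heads, which is why `layer.inferDims.numDims == 3' fails. A minimal parser sketch for that 2-D layout, assuming the usual yolov5 export of [cx, cy, w, h, objectness, class score] rows in network-input pixels and a kFLOAT binding (the function name is hypothetical):

#include <algorithm>
#include <vector>

#include "nvdsinfer_custom_impl.h"

/* Sketch only: parses a single fused YOLOv5 output of shape [25200, 6],
 * assuming each row is [cx, cy, w, h, objectness, cls0] in network-input
 * pixels (the usual yolov5 ONNX export) and a kFLOAT binding. */
extern "C" bool NvDsInferParseCustomYoloV5Fused(
    std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
    NvDsInferNetworkInfo const &networkInfo,
    NvDsInferParseDetectionParams const &detectionParams,
    std::vector<NvDsInferParseObjectInfo> &objectList)
{
    /* Pick the fused 2-D layer; ignore the per-scale heads. */
    const NvDsInferLayerInfo *out = nullptr;
    for (const auto &l : outputLayersInfo)
        if (l.inferDims.numDims == 2) { out = &l; break; }
    if (!out || out->dataType != FLOAT)
        return false; /* this sketch handles kFLOAT output only */

    const unsigned int rows = out->inferDims.d[0]; /* 25200 proposals */
    const unsigned int cols = out->inferDims.d[1]; /* 4 bbox + 1 obj + classes */
    const float *data = static_cast<const float *>(out->buffer);

    for (unsigned int r = 0; r < rows; ++r) {
        const float *p = data + r * cols;
        /* Single-class model: score = objectness * class-0 confidence. */
        const float score = p[4] * p[5];
        if (score < detectionParams.perClassPreclusterThreshold[0])
            continue;

        NvDsInferParseObjectInfo obj{};
        obj.classId = 0;
        obj.detectionConfidence = score;
        obj.left = std::max(0.0f, p[0] - p[2] / 2.0f);  /* cx - w/2 */
        obj.top = std::max(0.0f, p[1] - p[3] / 2.0f);   /* cy - h/2 */
        obj.width = std::min(p[2], networkInfo.width - obj.left);
        obj.height = std::min(p[3], networkInfo.height - obj.top);
        objectList.push_back(obj);
    }
    return true;
}

It would then be referenced via parse-bbox-func-name=NvDsInferParseCustomYoloV5Fused in the infer config, and cluster-mode=2 would still apply NMS afterwards.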

You could refer to GitHub - beyondli/Yolo_on_Jetson

Thank you very much. I will try it.

Following the Yolo_on_Jetson instructions, I failed to convert .pt to .onnx, because my .pt uses GhostConv, which is not defined in common.py. I customized common.py, yet it did not work.
So I exported the ONNX via yolov5 v6.1, copied it into the Docker container, and TensorRT succeeded in producing a .engine.
But when I try to integrate it with DeepStream, the following errors appear:

(p3) shisun@nx:~/Yolo_on_Jetson/deepstream$ deepstream-app -c deepstream_app_config.txt
Opening in BLOCKING MODE

Using winsys: x11
0:00:04.279979065 25951 0x12cca2a0 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/home/shisun/Yolo_on_Jetson/deepstream/best.engine
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kHALF images 3x640x640
1 OUTPUT kHALF 485 3x80x80x6
2 OUTPUT kHALF 545 3x40x40x6
3 OUTPUT kHALF 605 3x20x20x6
4 OUTPUT kFLOAT output 25200x6

0:00:04.280289368 25951 0x12cca2a0 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /home/shisun/Yolo_on_Jetson/deepstream/best.engine
0:00:04.292904919 25951 0x12cca2a0 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/shisun/Yolo_on_Jetson/deepstream/config_infer_primary.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.

** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
deepstream-app: nvdsparsebbox_Yolo.cpp:203: bool NvDsInferParseCustomYolo(const std::vector<NvDsInferLayerInfo>&, const NvDsInferNetworkInfo&, const NvDsInferParseDetectionParams&, std::vector<NvDsInferParseObjectInfo>&, const uint&, const uint&): Assertion `layer.inferDims.numDims == 3 || layer.inferDims.numDims == 4' failed.
Aborted (core dumped)

deepstream_app_config.txt:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=3
uri=file:/home/shisun/81fd5.mp4
#uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
num-sources=1
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
gpu-id=0
nvbuf-memory-type=0

[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
#iframeinterval=10
bitrate=2000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
output-file=out.mp4
source-id=0

[osd]
enable=1
gpu-id=0
border-width=5
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
live-source=0
batch-size=1
batched-push-timeout=40000
width=640
height=640
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[tests]
file-loop=0

config_infer_primary.txt:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0

#model-engine-file=./models/yolov5s_fp32.engine
#network-mode=0

model-engine-file=/home/shisun/Yolo_on_Jetson/deepstream/best.engine
network-mode=2

#model-engine-file=./models/yolov5s_int8.engine
#int8-calib-file=./models/yolov5s_calibration.cache
#network-mode=1

network-input-order=0
#infer-dims=640;640;3
symmetric-padding=1

labelfile-path=labels.txt
batch-size=1
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1

parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

[class-attrs-all]
nms-iou-threshold=0.3
pre-cluster-threshold=0.5

Or kindly inform me which file I should modify?
Thank you very much.

These errors are from the post-processor code under /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo. You need to understand the meaning of each output of your model and modify the post-processor code under nvdsinfer_custom_impl_Yolo accordingly. This post-processor code is written for YoloV3; you need to modify it to make it work for your model.

Thank you for the quick reply. I am trying.

Hi,
I tried Yolo_on_Jetson and it works with an FP32 TRT model. However, the FPS is around 30, and I need to run my model in FP16 to get a higher FPS. How should I modify nvdsparsebbox_Yolo.cpp?
Thank you.

Did you build this FP32 model outside of DeepStream?

Yes. I built it with trtexec because I cannot convert it to .wts and .cfg.

Do I need to redefine all float types to kHALF, or how do I get the C++ code to load the right FP16 data from the TRT file?

You can build the FP16 engine with trtexec and keep "network-mode=2".

Do I need to redefine all float types to kHALF, or how do I get the C++ code to load the right FP16 data from the TRT file?
No, you just need to specify the fp16 option in the trtexec command.
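
For example (file names assumed):

/usr/src/tensorrt/bin/trtexec --onnx=best.onnx --fp16 --saveEngine=best_fp16.engine

With only --fp16 and no IO-format overrides, the engine's input and output bindings stay kFLOAT (the FP16 precision is internal), which is what the parsing code expects; the kHALF bindings in the earlier logs suggest extra IO-format flags had been passed.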

Thank you for the reply.
I tried the fp16 option when generating the engine file and kept network-mode=2 in the configuration file. DeepStream starts like below:
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kHALF images 3x640x640
1 OUTPUT kHALF 485 3x80x80x6
2 OUTPUT kHALF 545 3x40x40x6
3 OUTPUT kHALF 605 3x20x20x6
4 OUTPUT kFLOAT output 25200x6

const std::vector<const NvDsInferLayerInfo*> sortedLayers =
    SortLayers(outputLayersInfo);   // orders the output layers

std::vector<NvDsInferParseObjectInfo> objects;

for (uint idx = 0; idx < outputLayersInfo.size(); ++idx)
{
    const NvDsInferLayerInfo &layer = *sortedLayers[idx];

    // Expects each output to be a 3-D or 4-D per-scale head; the fused
    // 2-D 25200x6 "output" tensor trips this assert.
    assert(layer.inferDims.numDims == 3 || layer.inferDims.numDims == 4);

When execution reaches this assert, the error occurs. I found that the idx=0 layer is the 25200x6 one, so I made the change below:
original code: for (uint idx = 0; idx < outputLayersInfo.size(); ++idx)
modified: for (uint idx = 1; idx < outputLayersInfo.size(); ++idx)
The above error then disappeared, but on reaching idx=3 (3x80x80x80) another error appeared:
Segmentation fault (core dumped).
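
The segmentation fault is likely the decode code reading the kHALF output buffers as 32-bit floats, which yields garbage values and out-of-range indices. Separately, hardcoding the loop start is fragile, since the fused tensor's position among the outputs is not guaranteed; a sketch of a safer selection (the helper name selectScaleHeads is hypothetical):

#include <vector>

#include "nvdsinfer_custom_impl.h"

/* Sketch: collect only the 3-D/4-D per-scale heads and skip the fused
 * 2-D "output" tensor, instead of assuming it always sits at index 0. */
static std::vector<const NvDsInferLayerInfo *>
selectScaleHeads(const std::vector<NvDsInferLayerInfo> &outputLayersInfo)
{
    std::vector<const NvDsInferLayerInfo *> heads;
    for (const auto &l : outputLayersInfo)
        if (l.inferDims.numDims == 3 || l.inferDims.numDims == 4)
            heads.push_back(&l);
    return heads;
}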

Continued: when I run the FP32 model via Yolo_on_Jetson with the same modification, it works:
modified: for (uint idx = 1; idx < outputLayersInfo.size(); ++idx)

Using winsys: x11
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:05.933134080 9207 0x7f2c002290 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/home/shisun/Yolo_on_Jetson/deepstream_61/models/best61.trt
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT images 3x640x640
1 OUTPUT kFLOAT 485 3x80x80x6
2 OUTPUT kFLOAT 543 3x40x40x6
3 OUTPUT kFLOAT 601 3x20x20x6
4 OUTPUT kFLOAT output 25200x6

0:00:05.933577632 9207 0x7f2c002290 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /home/shisun/Yolo_on_Jetson/deepstream_61/models/best61.trt
0:00:05.950792288 9207 0x7f2c002290 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/shisun/Yolo_on_Jetson/deepstream_61/config_infer_primary.txt sucessfully

Hi,
I regenerated the TRT file with only the fp16 option and no other optimization options. Now the FP16 model works, with no API modification needed.
Thank you very much.

Coool! So we can close this?

ok

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.