Detections change in Deepstream 6.2

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the contents of the configuration files, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and the function description.)

Good afternoon. I was hoping to start a discussion regarding an issue I'm seeing with DeepStream 6.2. In a nutshell, we are seeing detections change when running model A through DeepStream 6.1.1 compared to DeepStream 6.2 and Triton 23.05.

The model in question is yolov5. In terms of the model, the only difference between DeepStream 6.1.1 and DeepStream 6.2 is the TensorRT version. For some models detections increase, while for others they decrease.

On the other hand, when running a model through Triton, the detections differ from DeepStream even more.

I am more concerned with the differences within DeepStream than with Triton, as I know the pipelines are quite different. From our findings, it seems like there were major changes in TensorRT that are affecting model performance. Could you provide some detail on why this would happen?

I want to stress again that the GStreamer pipeline is exactly the same. The library orchestrating compilation to fp16/int8/fp32 is exactly the same; the only difference is that a newer version of TensorRT is used (8.6 vs 8.4).

I have been struggling to find out why for a few weeks now, so any info you could provide would be greatly appreciated. We really need to update to DeepStream 6.2 for changes unrelated to inference, but we are blocked from doing so because our test results are not consistent with those from the previous version.

Did you test with FP32 or FP16?

We have found some issues with Yolo models. NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream (github.com)

deepstream_tao_apps/configs/yolov4_tao/pgie_yolov4_tao_config.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub

Thanks for the response. We are not using TAO to train these models. Would this issue still occur with a model trained with this repository?

I understand that some layers are not supposed to be quantized to int8, but should this be consistent across DeepStream 6.1 and DeepStream 6.2? Our issue is that after updating to DeepStream 6.2, the same model does not produce the same detections. This is very important to us, since we have a set of internal tests that must pass before a model is ready for production. The change to DeepStream 6.2 caused our model to miss some detections, while for others it increased detections.

On the other hand, results with Triton are wildly different from DeepStream. I ran the following tests with the same models, and each time the detection results were different… I tried with fp16 and int8.

I will attempt the same test with fp32 and post my results here

  1. deepstream 6.1 | TensorRT 8.4
  2. deepstream 6.2 | TensorRT 8.6
  3. triton 23.05 | TensorRT 8.6
  4. triton 22.08 | TensorRT 8.4

The models I used for testing were yolov5 and faster rcnn (Caffe + TAO), so I don't think it's the model type that's affecting it…

I just tried with fp32 and I am still getting different results when running models through DeepStream 6.1 vs DeepStream 6.2.

I am very blocked, so any information you can provide would be great. DeepStream and TensorRT are great because they offer huge amounts of abstraction, but for the same reason I am unsure how to debug this. We need to upgrade to a newer DeepStream version to support other elements of our pipeline, but we cannot do that until we figure out why detections are getting worse…

In almost all cases, detection performance gets worse (i.e., the number of true positives decreases).

To debug such an issue, you can try to dump the input to the gst-nvinfer plugin to check whether it is different. DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

I will try the gst-nvinfer dump approach. I already tried saving the buffer post-inference to a jpg file in both DeepStream 6.1 and 6.2, compared them with a frame diff, and found they are exactly the same. This is the RGBA image extracted after nvinfer.
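For reference, the frame diff was roughly along these lines (a minimal sketch, not the exact script; the directory names are placeholders):

# Minimal sketch: compare the frames saved from the DS 6.1 and DS 6.2 containers pixel by pixel.
import cv2
import numpy as np
from pathlib import Path

dir_61 = Path('frames_ds61')   # frames saved from the DeepStream 6.1 run
dir_62 = Path('frames_ds62')   # frames saved from the DeepStream 6.2 run

for f in sorted(dir_61.glob('*.jpg')):
    a = cv2.imread(str(f))
    b = cv2.imread(str(dir_62 / f.name))
    diff = cv2.absdiff(a, b)
    if np.any(diff):
        print(f'{f.name}: max pixel diff {diff.max()}')
    else:
        print(f'{f.name}: identical')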

My belief is that TensorRT is affecting detection. Is that a possibility from your end? Again, the difference for the same video + model is quite extreme across DeepStream versions. For example:

DS 6.1.1 = 18188 detections
DS 6.1 = 18181 detections
DS 6.2 = 16237 detections

Also, the .engine file sizes are different even though they were built from the same pgie config:
DS 6.1 = 99.4 MB
DS 6.1.1 = 94.6 MB
DS 6.2 = 94.4 MB

This is the model architecture:

        Layer                         Input Shape         Output Shape        WeightPtr
(0)     conv_silu                     [3, 544, 960]       [64, 272, 480]      7168
(1)     conv_silu                     [64, 272, 480]      [128, 136, 240]     81408
(2)     conv_silu                     [128, 136, 240]     [64, 136, 240]      89856
(3)     route: 1                      -                   [128, 136, 240]     -
(4)     conv_silu                     [128, 136, 240]     [64, 136, 240]      98304
(5)     conv_silu                     [64, 136, 240]      [64, 136, 240]      102656
(6)     conv_silu                     [64, 136, 240]      [64, 136, 240]      139776
(7)     shortcut_add_linear: 4        [64, 136, 240]      [64, 136, 240]      -
(8)     conv_silu                     [64, 136, 240]      [64, 136, 240]      144128
(9)     conv_silu                     [64, 136, 240]      [64, 136, 240]      181248
(10)    shortcut_add_linear: 7        [64, 136, 240]      [64, 136, 240]      -
(11)    conv_silu                     [64, 136, 240]      [64, 136, 240]      185600
(12)    conv_silu                     [64, 136, 240]      [64, 136, 240]      222720
(13)    shortcut_add_linear: 10       [64, 136, 240]      [64, 136, 240]      -
(14)    route: 13, 2                  -                   [128, 136, 240]     -
(15)    conv_silu                     [128, 136, 240]     [128, 136, 240]     239616
(16)    conv_silu                     [128, 136, 240]     [256, 68, 120]      535552
(17)    conv_silu                     [256, 68, 120]      [128, 68, 120]      568832
(18)    route: 16                     -                   [256, 68, 120]      -
(19)    conv_silu                     [256, 68, 120]      [128, 68, 120]      602112
(20)    conv_silu                     [128, 68, 120]      [128, 68, 120]      619008
(21)    conv_silu                     [128, 68, 120]      [128, 68, 120]      766976
(22)    shortcut_add_linear: 19       [128, 68, 120]      [128, 68, 120]      -
(23)    conv_silu                     [128, 68, 120]      [128, 68, 120]      783872
(24)    conv_silu                     [128, 68, 120]      [128, 68, 120]      931840
(25)    shortcut_add_linear: 22       [128, 68, 120]      [128, 68, 120]      -
(26)    conv_silu                     [128, 68, 120]      [128, 68, 120]      948736
(27)    conv_silu                     [128, 68, 120]      [128, 68, 120]      1096704
(28)    shortcut_add_linear: 25       [128, 68, 120]      [128, 68, 120]      -
(29)    conv_silu                     [128, 68, 120]      [128, 68, 120]      1113600
(30)    conv_silu                     [128, 68, 120]      [128, 68, 120]      1261568
(31)    shortcut_add_linear: 28       [128, 68, 120]      [128, 68, 120]      -
(32)    conv_silu                     [128, 68, 120]      [128, 68, 120]      1278464
(33)    conv_silu                     [128, 68, 120]      [128, 68, 120]      1426432
(34)    shortcut_add_linear: 31       [128, 68, 120]      [128, 68, 120]      -
(35)    conv_silu                     [128, 68, 120]      [128, 68, 120]      1443328
(36)    conv_silu                     [128, 68, 120]      [128, 68, 120]      1591296
(37)    shortcut_add_linear: 34       [128, 68, 120]      [128, 68, 120]      -
(38)    route: 37, 17                 -                   [256, 68, 120]      -
(39)    conv_silu                     [256, 68, 120]      [256, 68, 120]      1657856
(40)    conv_silu                     [256, 68, 120]      [512, 34, 60]       2839552
(41)    conv_silu                     [512, 34, 60]       [256, 34, 60]       2971648
(42)    route: 40                     -                   [512, 34, 60]       -
(43)    conv_silu                     [512, 34, 60]       [256, 34, 60]       3103744
(44)    conv_silu                     [256, 34, 60]       [256, 34, 60]       3170304
(45)    conv_silu                     [256, 34, 60]       [256, 34, 60]       3761152
(46)    shortcut_add_linear: 43       [256, 34, 60]       [256, 34, 60]       -
(47)    conv_silu                     [256, 34, 60]       [256, 34, 60]       3827712
(48)    conv_silu                     [256, 34, 60]       [256, 34, 60]       4418560
(49)    shortcut_add_linear: 46       [256, 34, 60]       [256, 34, 60]       -
(50)    conv_silu                     [256, 34, 60]       [256, 34, 60]       4485120
(51)    conv_silu                     [256, 34, 60]       [256, 34, 60]       5075968
(52)    shortcut_add_linear: 49       [256, 34, 60]       [256, 34, 60]       -
(53)    conv_silu                     [256, 34, 60]       [256, 34, 60]       5142528
(54)    conv_silu                     [256, 34, 60]       [256, 34, 60]       5733376
(55)    shortcut_add_linear: 52       [256, 34, 60]       [256, 34, 60]       -
(56)    conv_silu                     [256, 34, 60]       [256, 34, 60]       5799936
(57)    conv_silu                     [256, 34, 60]       [256, 34, 60]       6390784
(58)    shortcut_add_linear: 55       [256, 34, 60]       [256, 34, 60]       -
(59)    conv_silu                     [256, 34, 60]       [256, 34, 60]       6457344
(60)    conv_silu                     [256, 34, 60]       [256, 34, 60]       7048192
(61)    shortcut_add_linear: 58       [256, 34, 60]       [256, 34, 60]       -
(62)    conv_silu                     [256, 34, 60]       [256, 34, 60]       7114752
(63)    conv_silu                     [256, 34, 60]       [256, 34, 60]       7705600
(64)    shortcut_add_linear: 61       [256, 34, 60]       [256, 34, 60]       -
(65)    conv_silu                     [256, 34, 60]       [256, 34, 60]       7772160
(66)    conv_silu                     [256, 34, 60]       [256, 34, 60]       8363008
(67)    shortcut_add_linear: 64       [256, 34, 60]       [256, 34, 60]       -
(68)    conv_silu                     [256, 34, 60]       [256, 34, 60]       8429568
(69)    conv_silu                     [256, 34, 60]       [256, 34, 60]       9020416
(70)    shortcut_add_linear: 67       [256, 34, 60]       [256, 34, 60]       -
(71)    route: 70, 41                 -                   [512, 34, 60]       -
(72)    conv_silu                     [512, 34, 60]       [512, 34, 60]       9284608
(73)    conv_silu                     [512, 34, 60]       [1024, 17, 30]      14007296
(74)    conv_silu                     [1024, 17, 30]      [512, 17, 30]       14533632
(75)    route: 73                     -                   [1024, 17, 30]      -
(76)    conv_silu                     [1024, 17, 30]      [512, 17, 30]       15059968
(77)    conv_silu                     [512, 17, 30]       [512, 17, 30]       15324160
(78)    conv_silu                     [512, 17, 30]       [512, 17, 30]       17685504
(79)    shortcut_add_linear: 76       [512, 17, 30]       [512, 17, 30]       -
(80)    conv_silu                     [512, 17, 30]       [512, 17, 30]       17949696
(81)    conv_silu                     [512, 17, 30]       [512, 17, 30]       20311040
(82)    shortcut_add_linear: 79       [512, 17, 30]       [512, 17, 30]       -
(83)    conv_silu                     [512, 17, 30]       [512, 17, 30]       20575232
(84)    conv_silu                     [512, 17, 30]       [512, 17, 30]       22936576
(85)    shortcut_add_linear: 82       [512, 17, 30]       [512, 17, 30]       -
(86)    route: 85, 74                 -                   [1024, 17, 30]      -
(87)    conv_silu                     [1024, 17, 30]      [1024, 17, 30]      23989248
(88)    conv_silu                     [1024, 17, 30]      [512, 17, 30]       24515584
(89)    maxpool                       [512, 17, 30]       [512, 17, 30]       -
(90)    maxpool                       [512, 17, 30]       [512, 17, 30]       -
(91)    maxpool                       [512, 17, 30]       [512, 17, 30]       -
(92)    route: 88, 89, 90, 91         -                   [2048, 17, 30]      -
(93)    conv_silu                     [2048, 17, 30]      [1024, 17, 30]      26616832
(94)    conv_silu                     [1024, 17, 30]      [512, 17, 30]       27143168
(95)    upsample                      [512, 17, 30]       [512, 34, 60]       -
(96)    route: 95, 72                 -                   [1024, 34, 60]      -
(97)    conv_silu                     [1024, 34, 60]      [256, 34, 60]       27406336
(98)    route: 96                     -                   [1024, 34, 60]      -
(99)    conv_silu                     [1024, 34, 60]      [256, 34, 60]       27669504
(100)   conv_silu                     [256, 34, 60]       [256, 34, 60]       27736064
(101)   conv_silu                     [256, 34, 60]       [256, 34, 60]       28326912
(102)   conv_silu                     [256, 34, 60]       [256, 34, 60]       28393472
(103)   conv_silu                     [256, 34, 60]       [256, 34, 60]       28984320
(104)   conv_silu                     [256, 34, 60]       [256, 34, 60]       29050880
(105)   conv_silu                     [256, 34, 60]       [256, 34, 60]       29641728
(106)   route: 105, 97                -                   [512, 34, 60]       -
(107)   conv_silu                     [512, 34, 60]       [512, 34, 60]       29905920
(108)   conv_silu                     [512, 34, 60]       [256, 34, 60]       30038016
(109)   upsample                      [256, 34, 60]       [256, 68, 120]      -
(110)   route: 109, 39                -                   [512, 68, 120]      -
(111)   conv_silu                     [512, 68, 120]      [128, 68, 120]      30104064
(112)   route: 110                    -                   [512, 68, 120]      -
(113)   conv_silu                     [512, 68, 120]      [128, 68, 120]      30170112
(114)   conv_silu                     [128, 68, 120]      [128, 68, 120]      30187008
(115)   conv_silu                     [128, 68, 120]      [128, 68, 120]      30334976
(116)   conv_silu                     [128, 68, 120]      [128, 68, 120]      30351872
(117)   conv_silu                     [128, 68, 120]      [128, 68, 120]      30499840
(118)   conv_silu                     [128, 68, 120]      [128, 68, 120]      30516736
(119)   conv_silu                     [128, 68, 120]      [128, 68, 120]      30664704
(120)   route: 119, 111               -                   [256, 68, 120]      -
(121)   conv_silu                     [256, 68, 120]      [256, 68, 120]      30731264
(122)   conv_silu                     [256, 68, 120]      [256, 34, 60]       31322112
(123)   route: 122, 108               -                   [512, 34, 60]       -
(124)   conv_silu                     [512, 34, 60]       [256, 34, 60]       31454208
(125)   route: 123                    -                   [512, 34, 60]       -
(126)   conv_silu                     [512, 34, 60]       [256, 34, 60]       31586304
(127)   conv_silu                     [256, 34, 60]       [256, 34, 60]       31652864
(128)   conv_silu                     [256, 34, 60]       [256, 34, 60]       32243712
(129)   conv_silu                     [256, 34, 60]       [256, 34, 60]       32310272
(130)   conv_silu                     [256, 34, 60]       [256, 34, 60]       32901120
(131)   conv_silu                     [256, 34, 60]       [256, 34, 60]       32967680
(132)   conv_silu                     [256, 34, 60]       [256, 34, 60]       33558528
(133)   route: 132, 124               -                   [512, 34, 60]       -
(134)   conv_silu                     [512, 34, 60]       [512, 34, 60]       33822720
(135)   conv_silu                     [512, 34, 60]       [512, 17, 30]       36184064
(136)   route: 135, 94                -                   [1024, 17, 30]      -
(137)   conv_silu                     [1024, 17, 30]      [512, 17, 30]       36710400
(138)   route: 136                    -                   [1024, 17, 30]      -
(139)   conv_silu                     [1024, 17, 30]      [512, 17, 30]       37236736
(140)   conv_silu                     [512, 17, 30]       [512, 17, 30]       37500928
(141)   conv_silu                     [512, 17, 30]       [512, 17, 30]       39862272
(142)   conv_silu                     [512, 17, 30]       [512, 17, 30]       40126464
(143)   conv_silu                     [512, 17, 30]       [512, 17, 30]       42487808
(144)   conv_silu                     [512, 17, 30]       [512, 17, 30]       42752000
(145)   conv_silu                     [512, 17, 30]       [512, 17, 30]       45113344
(146)   route: 145, 137               -                   [1024, 17, 30]      -
(147)   conv_silu                     [1024, 17, 30]      [1024, 17, 30]      46166016
(148)   route: 121                    -                   [256, 68, 120]      -
(149)   conv_logistic                 [256, 68, 120]      [21, 68, 120]       46171413
(150)   yolo                          [21, 68, 120]       -                   -
(151)   route: 134                    -                   [512, 34, 60]       -
(152)   conv_logistic                 [512, 34, 60]       [21, 34, 60]        46182186
(153)   yolo                          [21, 34, 60]        -                   -
(154)   route: 147                    -                   [1024, 17, 30]      -
(155)   conv_logistic                 [1024, 17, 30]      [21, 17, 30]        46203711
(156)   yolo                          [21, 17, 30]        -                   -

DS 6.1 Config

[property]
gpu-id = 0
model-color-format = 0
labelfile-path = /labels.txt
uff-input-blob-name = input_image
process-mode = 1
num-detected-classes = 2
interval = 0
batch-size = 1
gie-unique-id = 1
is-classifier = 0
maintain-aspect-ratio = 1
network-mode = 2
workspace-size = 9000
net-scale-factor = .0039215697906911373
cluster-mode = 2
offsets = 0;0;0
force-implicit-batch-dim = 1
infer-dims = 3;544;960
custom-network-config = /best_ap.cfg
model-file=/best_ap.wts
model-engine-file = /fp16.engine
parse-bbox-func-name = NvDsInferParseYolo
custom-lib-path = /opt/nvidia/deepstream/deepstream-6.1/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name = NvDsInferYoloCudaEngineGet

[class-attrs-0]
#class = P
post-cluster-threshold = 0.83

[class-attrs-1]
#class = R
post-cluster-threshold = 0.85


DS 6.1.1 Config

[property]
gpu-id = 0
model-color-format = 0
labelfile-path = /labels.txt
uff-input-blob-name = input_image
process-mode = 1
num-detected-classes = 2
interval = 0
batch-size = 1
gie-unique-id = 1
is-classifier = 0
maintain-aspect-ratio = 1
network-mode = 2
workspace-size = 9000
net-scale-factor = .0039215697906911373
cluster-mode = 2
offsets = 0;0;0
force-implicit-batch-dim = 1
infer-dims = 3;544;960
custom-network-config = /best_ap.cfg
model-file=/best_ap.wts
model-engine-file = /fp16.engine
parse-bbox-func-name = NvDsInferParseYolo
custom-lib-path = /opt/nvidia/deepstream/deepstream-6.1/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name = NvDsInferYoloCudaEngineGet

[class-attrs-0]
post-cluster-threshold = 0.83

[class-attrs-1]
post-cluster-threshold = 0.85

DS 6.2 Config

[property]
gpu-id = 0
model-color-format = 0
labelfile-path = /labels.txt
uff-input-blob-name = input_image
process-mode = 1
num-detected-classes = 2
interval = 0
batch-size = 1
gie-unique-id = 1
is-classifier = 0
maintain-aspect-ratio = 1
network-mode = 2
workspace-size = 9000
net-scale-factor = .0039215697906911373
cluster-mode = 2
offsets = 0;0;0
force-implicit-batch-dim = 1
infer-dims = 3;544;960
custom-network-config = /best_ap.cfg
model-engine-file = /fp16.engine
parse-bbox-func-name = NvDsInferParseYolo
model-file=/best_ap.wts
custom-lib-path = /opt/nvidia/deepstream/deepstream-6.2/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name = NvDsInferYoloCudaEngineGet

[class-attrs-0]
post-cluster-threshold = 0.87

[class-attrs-1]
post-cluster-threshold = 0.9


Each DeepStream version is running in its own Docker container, separate from the others.

I edited the above to include my testing with 6.1 as well. I have now tested the same model across DeepStream 6.1/6.1.1/6.2 and got different results each time…

I believe this is a TensorRT issue, but I have no way of knowing for sure. I've run each test at fp32 and fp16.
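For what it's worth, one way to see what actually changed inside the serialized engines is TensorRT's engine inspector; a minimal sketch (TensorRT 8.2+, file name is a placeholder; full per-layer detail is only available if the engine was built with ProfilingVerbosity.DETAILED, otherwise the output is limited):

# Dump per-layer information (chosen precisions/tactics) from a serialized engine so the
# DS 6.1.1 and DS 6.2 engines can be diffed.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open('fp16.engine', 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))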

Here is a link to a GitHub issue where I raise the same concern

Is there any update on why yolov5 detection performance would change across different DeepStream versions, @Fiona.Chen? Even though the quantization process remains the same.

Hi @samuel17, could you attach the video you used for comparison?

@yuweiw Apologies, but the video we use for comparison contains confidential data that I cannot share. After investigating further, I gathered the following results.

After consulting with the creator of the yolo repo we use for compiling the model to TensorRT, we found that the biggest difference occurs when comparing .wts to .onnx files (regardless of the DeepStream version). Even though there are still differences for .wts between each DS version, they are negligible compared to moving to .onnx. I don't believe this is a DeepStream issue, as it has to do with the method we use to create the .engine file.

On the other hand, I am still not able to figure out why there is such a major change between Triton and DeepStream. In the image I posted above, the Triton tests were run over images that were obtained from DeepStream. I ran our test video through DeepStream and saved each GPU buffer to .png. After running DeepStream, I ran inference over every frame using Triton, and the differences are pretty significant.

Now that we have an answer for DeepStream, we are still very blocked on the Triton differences. I made sure the normalization is the same.

I understand DeepStream is used for a different use case, but surely my experiment should produce similar results. Again, the key frame images I used during the Triton test were generated by DeepStream, to make sure decoding is the same and Triton saw the same information that DeepStream did.

OK. Now let's analyze the differences between DeepStream and Triton. Could you help answer the following questions first?
1. How do you save each GPU buffer to .png?
2. How do you run inference over every frame using Triton?
3. Could you share the picture with the greatest difference? You can message it to me directly.
4. Is this comparison process on DeepStream 6.2?

Thanks for the support @yuweiw

  1. Buffer probe attached to the end of the pipeline (NVDSOSD)
import cv2
import numpy as np
import pyds
from gi.repository import Gst

def buffer_probe(self, pad, info, u_data, inference_interval):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if not batch_meta:
        return Gst.PadProbeReturn.OK
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
            self.pbar.update(1)
            self.pbar.set_description(f'{self.detection_dict}')
        except StopIteration:
            break
        source_idx = frame_meta.pad_index
        if source_idx != 0:
            try:
                l_frame = l_frame.next
                continue
            except StopIteration:
                break
        l_obj = frame_meta.obj_meta_list
        frame_num = frame_meta.frame_num
        # Copy the RGBA frame out of the GPU buffer and write it losslessly as PNG
        frame_img = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        frame_img = np.array(frame_img, copy=True, order='C')
        frame_img = cv2.cvtColor(frame_img, cv2.COLOR_RGBA2BGR)
        cv2.imwrite(f'{self.output_frame_dir}/{frame_num}.png', frame_img,
                    [int(cv2.IMWRITE_PNG_COMPRESSION), 0])
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

  2. After running the DeepStream pipeline, I have an output directory where each frame from the original video is saved. I then load each image from the directory into the Triton client:

import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype
# `letterbox` is the yolov5-style resize/pad helper and `image_processing` is our in-house
# post-processing module (their imports are omitted here).
# model_name, resolution_height/width and rgb_img (the loaded frame) come from the surrounding script.

client = httpclient.InferenceServerClient(url='localhost:8000')  # Triton HTTP endpoint (placeholder)

model_config = client.get_model_config(model_name)
input_layer = model_config['input'][0]['name']
output_layer = model_config['output'][0]['name']
input_type = model_config['input'][0]['data_type']
dtype = np.float32 if input_type == 'TYPE_FP32' else np.float16

# Letterbox to the network resolution, then normalize and add the batch dimension
if rgb_img.shape[0] != resolution_height or rgb_img.shape[1] != resolution_width:
    rgb_img = letterbox(rgb_img, new_shape=(resolution_height, resolution_width), auto=False, stride=32)[0]
img_pp = rgb_img.transpose((2, 0, 1))
img_pp = np.ascontiguousarray(img_pp)  # contiguous
img_pp = img_pp.astype(np.float32)
img_pp /= 255.0
img_pp = img_pp[None].astype(dtype)

inputs = [
    httpclient.InferInput(input_layer, img_pp.shape, np_to_triton_dtype(dtype)),
]

inputs[0].set_data_from_numpy(img_pp)

outputs = [
    httpclient.InferRequestedOutput(output_layer),
]

response = client.infer(
    model_name,
    model_version="1",
    inputs=inputs,
    request_id=str(1),
    outputs=outputs,
)

output0_data = response.as_numpy(output_layer)

post_process = image_processing.PostProcess()

output_boxes = post_process.non_max_suppression(prediction=output0_data)[0]
  3. I will get the picture with the greatest difference and direct-message it to you ASAP.

  4. This is a direct comparison between DeepStream 6.2 and Triton 23.01; they both use TensorRT 8.5.2.2.

When inferencing with Triton, I was originally converting the image to fp16 before pre-processing. I changed it so that it converts to fp32 first, performs normalization, and then, just before sending it to Triton, converts the array to fp16. Doing this I am able to get very, very close to DeepStream 6.2 (closer than I have ever been). I am still getting differences, and these differences occur in very crucial key frame images, so they cannot be written off. I will send one of these key frame images to you.
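To make the change concrete, the difference is only in where the fp16 cast happens (a rough sketch, not the exact code; rgb_img stands in for the letterboxed HWC uint8 frame):

import numpy as np

rgb_img = np.zeros((544, 960, 3), dtype=np.uint8)  # placeholder for the letterboxed frame

# Before: cast to fp16 first, so normalization runs in half precision
before = rgb_img.transpose((2, 0, 1)).astype(np.float16)
before = before / np.float16(255.0)
before = before[None]

# After: normalize in fp32 and only cast to fp16 when building the Triton input tensor
after = rgb_img.transpose((2, 0, 1)).astype(np.float32)
after = after / 255.0
after = after[None].astype(np.float16)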

OK, you can click on my icon and message me directly.

Did you add the probe function on the src pad or the sink pad of nvdsosd? If it was added on the src pad, the images you saved would have bounding boxes drawn on them. Then you preprocessed those images yourself and sent them to TensorRT, is that right? Could you attach the preprocessing you used?
And what's your hardware platform, Jetson (Orin, Xavier…) or GPU (T4…)?

Hi @samuel17, about the differences between DeepStream and Triton, is this still an issue for your case?

@yuweiw I was probing the sink pad, so there were no bounding boxes on the images. I am doing this testing on an RTX 3090 to start; I will eventually test on an A100 and an Orin once I get results that are consistent.
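For context, the probe is attached to the sink pad of nvdsosd roughly like this (a sketch; the element and variable names are illustrative, not our exact code):

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
from functools import partial

osd = pipeline.get_by_name('nvdsosd')                        # the nvdsosd element in the pipeline
sink_pad = osd.get_static_pad('sink')
probe_cb = partial(self.buffer_probe, inference_interval=1)  # bind the extra argument
sink_pad.add_probe(Gst.PadProbeType.BUFFER, probe_cb, 0)     # 0 is passed through as u_data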

The pre-processing I used for Triton is as follows:

 img_pp = rgb_img.transpose((2, 0, 1))
 img_pp = np.ascontiguousarray(img_pp)  # contiguous
 img_pp = img_pp.astype(np.float32)
 img_pp /= 255.0
 img_pp = img_pp[None].astype(dtype)

I will try my best to get a good key frame image for you by Friday. I have to go through some levels of approval since it's sensitive.

Thanks for the help.

@yuweiw Yes, the results are still different between the two. I am running more experiments and will send you my findings by Friday. As mentioned, the results are close but not the same.

I am also consulting with our machine learning team to see if we can train a model with and without letterbox resizing for comparison, since the post-processor we use for our models within DeepStream is open source, while the post-processor we are using for Triton is in-house.
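For context, the letterbox step referenced above is the yolov5-style resize-and-pad; roughly (a sketch of what I understand it to do, not the exact implementation we import):

import cv2

def letterbox_sketch(img, new_shape=(544, 960), color=(114, 114, 114)):
    # Resize keeping aspect ratio, then pad to new_shape (rows, cols) with gray borders
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)             # scale factor
    new_unpad = (int(round(w * r)), int(round(h * r)))      # (width, height) after resize
    dw = new_shape[1] - new_unpad[0]                        # total horizontal padding
    dh = new_shape[0] - new_unpad[1]                        # total vertical padding
    resized = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = dh // 2, dh - dh // 2
    left, right = dw // 2, dw - dw // 2
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)

One thing worth double-checking is whether the padding placement and fill value here match what gst-nvinfer does with maintain-aspect-ratio=1 in our config.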

I will definitely have more updates for you by Friday

In theory, these two results cannot be exactly the same; the image decoders and pre-processing are different in the two scenarios. If the deviation is not significant, this is normal.

OK. It is best to provide the original image and the concrete steps for both Triton and DeepStream, as well as the differences in their results. Thanks.

Hi @samuel17,
I just checked the test results you posted on GitHub and found that the TP counts between DS 6.1/6.1.1/6.2 are very close when the .wts file is used.

As you can see, the above result is more consistent than the result you shared before.

So is this still a problem with the latest results when .wts is used?

As the Triton and TensorRT tests have different pipelines compared to DeepStream, and the two threads in this forum and on GitHub have different updates, would you mind summarizing the current issue that is blocking you from upgrading to DS 6.2?
I'd like to focus on DeepStream behavior and on the issue impacting your upgrade to DS 6.2, thanks.

@yingliu Apologies, I should have updated both threads. When using .wts files, there is still a difference, but it is negligible (as you referenced above). The issue we found when initially upgrading to DS 6.2 was that we also upgraded the yolo module to use .onnx instead of weights, which was impacting recall significantly. We still have not found out why moving to .onnx significantly changes model performance at the same thresholds we used with .wts. I still believe it's a TensorRT issue, but I directed those questions toward the maintainer of the GitHub repository we use for yolo. Do you have any idea why moving to .onnx would impact model performance?
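In case it helps narrow this down, a simple check is to compare the raw network outputs of the wts-built and onnx-built engines on an identical preprocessed input, before any thresholding/NMS (a sketch; the file names are placeholders and it assumes both runs dump the output tensor in the same layout):

import numpy as np

# Raw outputs saved from the two engines for the same frame (placeholder file names)
out_wts = np.load('out_wts.npy').astype(np.float32).ravel()
out_onnx = np.load('out_onnx.npy').astype(np.float32).ravel()

abs_diff = np.abs(out_wts - out_onnx)
cos = np.dot(out_wts, out_onnx) / (np.linalg.norm(out_wts) * np.linalg.norm(out_onnx))
print(f'max abs diff: {abs_diff.max():.6f}  mean abs diff: {abs_diff.mean():.6f}  cosine: {cos:.6f}')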

Can you share how the .wts file is created and how it is converted to an ONNX file?