Running Jetson's detectnet.cpp with the pruned PeopleNet model

I am trying to run detectnet.cpp with the pruned PeopleNet model from TLT. I converted the model to an FP16 engine file and then loaded it into detectnet.cpp.

The console command is as follows.

./detectnet --network=resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine --class_labels=labels.txt --input_blob=input_1 --output_cvg=output_cov/Sigmoid --output_bbox=output_bbox/BiasAdd /home/xaiver/nvidia/jetson-inference/Sample_videos/v1.mp4

But I get a very large number of detections, along with CUDA errors, so something is clearly wrong.

[TRT]    ------------------------------------------------
[TRT]    Timing Report /opt/nvidia/deepstream/deepstream-5.0/samples/models/tlt_peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
[TRT]    ------------------------------------------------
[TRT]    Pre-Process   CPU   0.04055ms  CUDA   0.18432ms
[TRT]    Network       CPU  14.99438ms  CUDA  14.19568ms
[TRT]    Post-Process  CPU   3.47062ms  CUDA   3.39354ms
[cuda]      invalid resource handle (error 400) (hex 0x190)
[cuda]      /home/xaiver/nvidia/jetson-inference/Debug/aarch64/include/jetson-inference/tensorNet.h:685
[TRT]    Visualize     CPU   0.00000ms  CUDA   0.00000ms
[TRT]    Total         CPU  18.50555ms  CUDA  17.77354ms
[TRT]    ------------------------------------------------

[cuda]      invalid configuration argument (error 9) (hex 0x09)
[cuda]      /home/xaiver/nvidia/jetson-inference/c/detectNet.cpp:1075
[TRT]    detectNet::Detect() -- failed to render overlay
850 objects detected

Most of the detections are tiny, and some have a negative width or height:
bounding box 831 (481.056641, 1052.025513) (481.555664, 1055.881714) w=0.499023 h=3.856201
detected obj 832 class #2 (face) confidence=0.973145
bounding box 832 (578.097656, 1052.574219) (577.915039, 1055.559937) w=-0.182617 h=2.985718
detected obj 833 class #2 (face) confidence=0.933105
bounding box 833 (1120.608398, 1048.807739) (1121.612305, 1048.978760) w=1.003906 h=0.171021
detected obj 834 class #2 (face) confidence=0.642090
bounding box 834 (1152.405518, 1049.162476) (1153.195312, 1049.602173) w=0.789795 h=0.439697
detected obj 835 class #2 (face) confidence=1.000000
bounding box 835 (1184.818359, 1048.687500) (1185.407227, 1049.225464) w=0.588867 h=0.537964
detected obj 836 class #2 (face) confidence=0.999023
bounding box 836 (1216.858398, 1049.339355) (1218.359375, 1049.618652) w=1.500977 h=0.279297
detected obj 837 class #2 (face) confidence=0.967773
bounding box 837 (1474.087891, 1059.216431) (1474.457031, 1055.874023) w=0.369141 h=-3.342407
detected obj 838 class #2 (face) confidence=0.765137
bounding box 838 (1504.670898, 1057.549072) (1506.859375, 1054.043823) w=2.188477 h=-3.505249
detected obj 839 class #2 (face) confidence=0.996094
bounding box 839 (1600.468262, 1049.096069) (1600.940430, 1049.894897) w=0.472168 h=0.798828
detected obj 840 class #2 (face) confidence=0.830078
bounding box 840 (1632.708496, 1049.069946) (1632.822266, 1053.307129) w=0.113770 h=4.237183
detected obj 841 class #2 (face) confidence=0.831543
bounding box 841 (1664.512695, 1049.350098) (1664.958984, 1050.860352) w=0.446289 h=1.510254
detected obj 842 class #2 (face) confidence=0.751465
bounding box 842 (1696.633301, 1049.480957) (1696.767578, 1048.604614) w=0.134277 h=-0.876343
detected obj 843 class #2 (face) confidence=0.999512
bounding box 843 (1727.856201, 1048.794189) (1729.166992, 1048.737915) w=1.310791 h=-0.056274
detected obj 844 class #2 (face) confidence=1.000000
bounding box 844 (1760.435303, 1048.584961) (1760.528320, 1049.162964) w=0.093018 h=0.578003
detected obj 845 class #2 (face) confidence=0.960938
bounding box 845 (1792.379639, 1049.445068) (1792.906250, 1051.211304) w=0.526611 h=1.766235
detected obj 846 class #2 (face) confidence=0.954590
bounding box 846 (1824.377197, 1049.437378) (1824.727051, 1050.389282) w=0.349854 h=0.951904
detected obj 847 class #2 (face) confidence=0.596191
bounding box 847 (1856.512695, 1048.688721) (1857.023438, 1049.171753) w=0.510742 h=0.483032

Why am I not getting correct detections?

Hi @edit_or,
I think the Jetson forum will be able to help you better here, so I am moving this to the Jetson team.
Thanks!

Hi @edit_or, I haven’t added pre/post-processing support for the PeopleNet model, so I’m not exactly sure what format it expects for its input/output tensors. I suggest using the PeopleNet model through DeepStream or TLT, as shown here:

https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet
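
For anyone who wants to experiment with decoding the outputs by hand, here is a rough sketch rather than a confirmed fix. In the log above, the box x-coordinates step by about 32 pixels (1120, 1152, 1184, 1216, ...), which suggests that each grid cell of the output is being reported as its own detection. The sketch assumes that output_bbox/BiasAdd holds four per-cell offsets measured from the cell center, with a stride of 16, a bbox scale of 35.0 and a cell-center offset of 0.5. Those three constants, and the names `GridParams`, `decodeCell` and `isPlausible`, are assumptions for illustration and are not confirmed anywhere in this thread, so please check them against the model documentation before relying on them.

```cpp
// Axis-aligned box in network-input pixel coordinates.
struct Box {
    float x1, y1, x2, y2;
};

// Decoding constants. ASSUMPTIONS (not confirmed in this thread):
// stride 16, bbox scale 35.0, cell-center offset 0.5.
struct GridParams {
    float stride     = 16.0f;  // input pixels per grid cell
    float bboxScale  = 35.0f;  // multiplier applied to the raw offsets
    float cellOffset = 0.5f;   // shift of the cell center, in pixels
};

// Decode the four raw values {left, top, right, bottom} of one grid cell
// into a box. The raw values are treated as distances from the cell center
// to each edge, in units of bboxScale pixels.
inline Box decodeCell(int gridX, int gridY, const float raw[4],
                      const GridParams& p = GridParams())
{
    const float cx = gridX * p.stride + p.cellOffset;
    const float cy = gridY * p.stride + p.cellOffset;

    return Box{ cx - raw[0] * p.bboxScale,
                cy - raw[1] * p.bboxScale,
                cx + raw[2] * p.bboxScale,
                cy + raw[3] * p.bboxScale };
}

// Sanity check: reject boxes that are inverted or smaller than minSize
// pixels on either side, like the ones printed in the log above.
inline bool isPlausible(const Box& b, float minSize = 1.0f)
{
    return (b.x2 - b.x1) >= minSize && (b.y2 - b.y1) >= minSize;
}
```

The idea would be to call `decodeCell` for every cell whose coverage value in output_cov/Sigmoid passes the threshold, scale the result from network-input size to frame size, and then cluster the overlapping boxes. `isPlausible` is only a debugging aid: if nearly every box fails it, as with the 850 detections above, the decoding is probably wrong rather than the model.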