resnet10.caffemodel_b8_fp16.engine is optimized for DeepStream only?

I understand that the DeepStream sample apps allow high-speed inference,
so I’m investigating the settings in “source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt”.

[primary-gie]
model-engine-file=../../models/Primary_Detector_Nano/resnet10.caffemodel_b8_fp16.engine

Here, is the model called resnet10.caffemodel_b8_fp16.engine optimized for DeepStream only?

Can this model be used as in the Jetson Inference example (detectnet-camera.py)?

(Regarding the Python bindings for DeepStream, I was told that full image frames cannot currently be obtained, and that a version supporting this has not been released yet:

https://devtalk.nvidia.com/default/topic/1068306/deepstream-sdk/fail-to-get-frame-in-tensor-metadata/post/5412437/#5412437

That is why I want to use this model together with the conventional image-frame acquisition method.)

Can this model be used as in the Jetson Inference example (detectnet-camera.py)?
You can use this engine in other applications that run on the same platform and JetPack version, but you need the same pre-processing and post-processing that DeepStream performs for this model.
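For example, if the TensorRT Python bindings are available on your JetPack, you can deserialize the engine directly. Below is a minimal sketch (the engine path is an example; you still have to implement the pre-processing and output parsing yourself):

# Minimal sketch: deserialize a DeepStream-built engine with the TensorRT
# Python API. The engine only works with the same GPU / TensorRT / JetPack
# version that built it. The path below is an example.
import tensorrt as trt

ENGINE_PATH = "Primary_Detector_Nano/resnet10.caffemodel_b8_fp16.engine"

logger = trt.Logger(trt.Logger.INFO)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Inspect the I/O bindings; inputs must be pre-processed exactly as
# DeepStream does, and the raw outputs parsed by your own code.
for i in range(engine.num_bindings):
    print(i, engine.get_binding_name(i), engine.get_binding_shape(i))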

Thank you.

I can’t understand what you mean by “the same pre-processing and post-processing that DeepStream performs for this model”.

But I found a similar question (link below) and tried the following command; it ends with “Segmentation fault (core dumped)” on the console.
Is something wrong?

https://devtalk.nvidia.com/default/topic/1027623/jetson-tx2/pretrained-models-for-detectnet-vehicles/

(* I’ve copied “Primary_Detector_Nano” from DeepStream to ~/jetson-inference/data/networks/…)

Below is the log:
--------------------------------------------------------------------------------

user-desktop:~/jetson-inference/python/examples$ python3 detectnet-console.py city_0.jpg test.jpg --prototxt=../../data/networks/Primary_Detector_Nano/resnet10.prototxt --model=../../data/networks/Primary_Detector_Nano/resnet10.caffemodel

jetson.inference.__init__.py
jetson.inference -- initializing Python 3.6 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 3.6 binding initialization
jetson.utils.__init__.py
jetson.utils -- initializing Python 3.6 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 3.6 binding initialization
[image] loaded 'city_0.jpg' (1024 x 512, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyDetectNet_Init()
jetson.inference -- detectNet loading network using argv command line params
jetson.inference -- detectNet.init() argv[0] = 'detectnet-console.py'
jetson.inference -- detectNet.init() argv[1] = 'city_0.jpg'
jetson.inference -- detectNet.init() argv[2] = 'test.jpg'
jetson.inference -- detectNet.init() argv[3] = '--prototxt=../../data/networks/Primary_Detector_Nano/resnet10.prototxt'
jetson.inference -- detectNet.init() argv[4] = '--model=../../data/networks/Primary_Detector_Nano/resnet10.caffemodel'

detectNet -- loading detection network model from:
-- prototxt ../../data/networks/Primary_Detector_Nano/resnet10.prototxt
-- model ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel
-- input_blob 'data'
-- output_cvg 'coverage'
-- output_bbox 'bboxes'
-- mean_pixel 0.000000
-- mean_binary NULL
-- class_labels NULL
-- threshold 0.500000
-- batch_size 1

[TRT] TensorRT version 5.1.6
[TRT] loading NVIDIA plugins...
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading ../../data/networks/Primary_Detector_Nano/resnet10.prototxt ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel
[TRT] failed to retrieve tensor for Output "coverage"
[TRT] failed to retrieve tensor for Output "bboxes"
Segmentation fault (core dumped)

I can’t understand what you mean by “the same pre-processing and post-processing that DeepStream performs for this model”.
Pre-processing means operations like resizing, subtracting the mean, etc.
Post-processing means the output parser.
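For example, here is a rough Python sketch of what that pre-processing looks like for this model. The input size and scale factor below are assumptions based on a typical Primary_Detector nvinfer config (net-scale-factor roughly 1/255, RGB, no mean offsets); please verify against your own config file:

# Rough sketch of nvinfer-style pre-processing (values are assumptions
# from a typical Primary_Detector config -- verify against your config).
import numpy as np
import cv2  # assumed available for resize / color conversion

NET_W, NET_H = 480, 272               # network input width / height
NET_SCALE = 0.0039215697906911373     # net-scale-factor (roughly 1/255)
MEAN = np.zeros(3, dtype=np.float32)  # per-channel offsets, if any

def preprocess(bgr_frame):
    # y = net-scale-factor * (x - mean), planar CHW, batch dim first
    rgb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (NET_W, NET_H))
    x = (rgb.astype(np.float32) - MEAN) * NET_SCALE
    x = np.transpose(x, (2, 0, 1))    # HWC -> CHW
    return np.expand_dims(x, 0)       # NCHW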

[TRT] failed to retrieve tensor for Output "coverage"
[TRT] failed to retrieve tensor for Output "bboxes"
There are no "coverage" or "bboxes" layers in resnet10.prototxt.
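You can check which output blob names the prototxt actually declares, for example with a quick script like this (a sketch; it just scans for top: entries):

# Quick sketch: list the "top" (output blob) names declared in a Caffe prototxt.
import re

with open("resnet10.prototxt") as f:
    tops = re.findall(r'top:\s*"([^"]+)"', f.read())

print(tops[-5:])  # the last few entries include the real output blob names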

Thank you mchi.

I added the options below (taken from resnet10.prototxt):

--output_cvg='conv2d_cov'
--output_bbox='conv2d_bbox'

user-desktop:~/jetson-inference/python/examples$ python3 detectnet-console.py city_0.jpg test.jpg --prototxt=../../data/networks/Primary_Detector_Nano/resnet10.prototxt --model=../../data/networks/Primary_Detector_Nano/resnet10.caffemodel --output_cvg='conv2d_cov' --output_bbox='conv2d_bbox'

Then the segmentation fault was resolved.

However, another error was printed:

jetson.inference -- detectNet failed to load built-in network 'ssd-mobilenet-v2'

Why ‘ssd-mobilenet-v2’?


[TRT] device GPU, ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel loaded
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding -- index 0
-- name 'input_1'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 272 (SPATIAL)
-- dim #2 480 (SPATIAL)
[TRT] binding -- index 1
-- name 'conv2d_bbox'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 16 (CHANNEL)
-- dim #1 17 (SPATIAL)
-- dim #2 30 (SPATIAL)
[TRT] binding -- index 2
-- name 'conv2d_cov'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4 (CHANNEL)
-- dim #1 17 (SPATIAL)
-- dim #2 30 (SPATIAL)
[TRT] binding to input 0 data binding index: -1
[TRT] binding to input 0 data dims (b=1 c=0 h=0 w=0) size=0
[TRT] failed to alloc CUDA mapped memory for tensor input, 0 bytes
detectNet -- failed to initialize.
jetson.inference -- detectNet failed to load built-in network 'ssd-mobilenet-v2'
PyTensorNet_Dealloc()
Traceback (most recent call last):
File "detectnet-console.py", line 51, in <module>
net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)
Exception: jetson.inference -- detectNet failed to load network
jetson.utils -- freeing CUDA mapped memory

Why ‘ssd-mobilenet-v2’?
I think it comes from detectnet-console.py; please look into the code.

Why ‘ssd-mobilenet-v2’?
I think it comes from detectnet-console.py; please look into the code.

I understand the cause now. I did not specify ‘ssd-mobilenet-v2’ anywhere;
it could not load the specified “resnet10.caffemodel”, so it was trying to load the default network instead…
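Looking at the traceback, the relevant call in detectnet-console.py is net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold). A minimal reconstruction of that part of the script (the argparse default below is my assumption, not copied verbatim):

# Minimal reconstruction (not verbatim): "--network" defaults to
# 'ssd-mobilenet-v2', so the failure message names that network even
# though it was the custom --model/--prototxt pair that failed to load.
import sys
import argparse
import jetson.inference

parser = argparse.ArgumentParser()
parser.add_argument("--network", type=str, default="ssd-mobilenet-v2")
parser.add_argument("--threshold", type=float, default=0.5)
opt, _ = parser.parse_known_args()

net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)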

I was able to get “detectnet-console.py” running by adding the parameters below.

      --input_blob   'input_1'
      --output_cvg   'conv2d_cov/Sigmoid'
      --output_bbox  'conv2d_bbox'
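For reference, the full command that now runs (assembled from the flags above; it matches the argv dump in the log below):

user-desktop:~/jetson-inference/python/examples$ python3 detectnet-console.py city_0.jpg test.jpg --prototxt=../../data/networks/Primary_Detector_Nano/resnet10.prototxt --model=../../data/networks/Primary_Detector_Nano/resnet10.caffemodel --output_cvg='conv2d_cov/Sigmoid' --output_bbox='conv2d_bbox' --input_blob='input_1'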

…but the results I get are useless.

I ran detection on the attached ‘city_0.jpg’, but it detects 2016 meaningless bounding boxes.
Why is this?


jetson.inference -- initializing Python 3.6 bindings...
jetson.inference -- registering module types...
jetson.inference -- done registering module types
jetson.inference -- done Python 3.6 binding initialization
jetson.utils -- initializing Python 3.6 bindings...
jetson.utils -- registering module functions...
jetson.utils -- done registering module functions
jetson.utils -- registering module types...
jetson.utils -- done registering module types
jetson.utils -- done Python 3.6 binding initialization
[image] loaded 'city_0.jpg' (1024 x 512, 3 channels)
jetson.inference -- PyTensorNet_New()
jetson.inference -- PyDetectNet_Init()
jetson.inference -- detectNet loading network using argv command line params
jetson.inference -- detectNet.init() argv[0] = 'detectnet-console.py'
jetson.inference -- detectNet.init() argv[1] = 'city_0.jpg'
jetson.inference -- detectNet.init() argv[2] = 'test.jpg'
jetson.inference -- detectNet.init() argv[3] = '--prototxt=../../data/networks/Primary_Detector_Nano/resnet10.prototxt'
jetson.inference -- detectNet.init() argv[4] = '--model=../../data/networks/Primary_Detector_Nano/resnet10.caffemodel'
jetson.inference -- detectNet.init() argv[5] = '--output_cvg=conv2d_cov/Sigmoid'
jetson.inference -- detectNet.init() argv[6] = '--output_bbox=conv2d_bbox'
jetson.inference -- detectNet.init() argv[7] = '--input_blob=input_1'

detectNet -- loading detection network model from:
-- prototxt ../../data/networks/Primary_Detector_Nano/resnet10.prototxt
-- model ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel
-- input_blob 'input_1'
-- output_cvg 'conv2d_cov/Sigmoid'
-- output_bbox 'conv2d_bbox'
-- mean_pixel 0.000000
-- mean_binary NULL
-- class_labels NULL
-- threshold 0.500000
-- batch_size 1

[TRT] TensorRT version 5.1.6
[TRT] loading NVIDIA plugins...
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - caffe (extension '.caffemodel')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading ../../data/networks/Primary_Detector_Nano/resnet10.prototxt ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel
[TRT] retrieved Output tensor "conv2d_cov/Sigmoid": 4x17x30
[TRT] retrieved Output tensor "conv2d_bbox": 16x17x30
[TRT] retrieved Input tensor "input_1": 3x272x480
jetson.inference.__init__.py
jetson.utils.__init__.py
detected 2016 objects in image
<detectNet.Detection object>
-- ClassID: 0
-- Confidence: 24.5156
-- Left: 2.13333
-- Top: 4.46564e-09
-- Right: 0.747566
-- Bottom: 0.00201247
-- Width: -1.38577
-- Height: 0.00201247
-- Area: -0.00278881
-- Center: (1.44045, 0.00100624)

************ Omitted: 2014 more useless objects with no size ************

<detectNet.Detection object>
-- ClassID: 3
-- Confidence: 10.3906
-- Left: 991.679
-- Top: 481.882
-- Right: 989.867
-- Bottom: 481.882
-- Width: -1.81195
-- Height: 0
-- Area: -0
-- Center: (990.773, 481.882)
[TRT] device GPU, configuring CUDA engine
[TRT] device GPU, building FP16: ON
[TRT] device GPU, building INT8: OFF
[TRT] device GPU, building CUDA engine (this may take a few minutes the first time a network is loaded)
[TRT] device GPU, completed building CUDA engine
[TRT] network profiling complete, writing engine cache to ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] device GPU, completed writing engine cache to ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] device GPU, ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel loaded
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding -- index 0
-- name 'input_1'
-- type FP32
-- in/out INPUT
-- # dims 3
-- dim #0 3 (CHANNEL)
-- dim #1 272 (SPATIAL)
-- dim #2 480 (SPATIAL)
[TRT] binding -- index 1
-- name 'conv2d_bbox'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 16 (CHANNEL)
-- dim #1 17 (SPATIAL)
-- dim #2 30 (SPATIAL)
[TRT] binding -- index 2
-- name 'conv2d_cov/Sigmoid'
-- type FP32
-- in/out OUTPUT
-- # dims 3
-- dim #0 4 (CHANNEL)
-- dim #1 17 (SPATIAL)
-- dim #2 30 (SPATIAL)
[TRT] binding to input 0 input_1 binding index: 0
[TRT] binding to input 0 input_1 dims (b=1 c=3 h=272 w=480) size=1566720
[TRT] binding to output 0 conv2d_cov/Sigmoid binding index: 2
[TRT] binding to output 0 conv2d_cov/Sigmoid dims (b=1 c=4 h=17 w=30) size=8160
[TRT] binding to output 1 conv2d_bbox binding index: 1
[TRT] binding to output 1 conv2d_bbox dims (b=1 c=16 h=17 w=30) size=32640
device GPU, ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel initialized.
detectNet -- number object classes: 4
detectNet -- maximum bounding boxes: 2040
[cuda] cudaDetectionOverlay((float4*)input, (float4*)output, width, height, detections, numDetections, (float4*)mClassColors[1])
[cuda] invalid configuration argument (error 9) (hex 0x09)
[cuda] /home/gk40002251/jetson-inference/c/detectNet.cpp:945
[TRT] detectNet::Detect() -- failed to render overlay

[TRT] ------------------------------------------------
[TRT] Timing Report ../../data/networks/Primary_Detector_Nano/resnet10.caffemodel
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.12172ms CUDA 0.80625ms
[TRT] Network CPU 20.92214ms CUDA 20.25526ms
[TRT] Post-Process CPU 166.44992ms CUDA 166.67921ms
[cuda] cudaEventElapsedTime(&cuda_time, mEventsGPU[evt], mEventsGPU[evt+1])
[cuda] invalid resource handle (error 33) (hex 0x21)
[cuda] /home/gk40002251/jetson-inference/build/aarch64/include/jetson-inference/tensorNet.h:499
[TRT] Visualize CPU 0.00000ms CUDA 0.00000ms
[TRT] Total CPU 187.49377ms CUDA 187.74072ms
[TRT] ------------------------------------------------

[TRT] note -- when processing a single image, run 'sudo jetson_clocks' before
to disable DVFS for more accurate profiling/timing measurements

jetson.utils -- freeing CUDA mapped memory
PyTensorNet_Dealloc()
jetson.inference -- PyDetection_Dealloc()

************ Omitted: a large number of Dealloc messages ************

As I mentioned previously, have you added the same pre-processing and post-processing that DeepStream does for ResNet10?
If you use the post-processing/parser of ssd-mobilenet-v2 to parse the output of ResNet10, I don’t think it will work.
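For reference, the default DeepStream parser for this DetectNet_v2-style model decodes the coverage/bbox grids roughly like the Python sketch below. This is a simplification of parseBoundingBox() in nvdsinfer_context_impl_output_parsing.cpp; the stride and bbox-norm constants are the usual DetectNet_v2 values, please verify them against the source:

# Hedged sketch of DetectNet_v2 grid decoding (simplified from DeepStream's
# default parser). STRIDE / BBOX_NORM are typical values -- verify in source.
import numpy as np

STRIDE = 16.0     # pixels per grid cell (480/30 = 272/17 = 16)
BBOX_NORM = 35.0  # bbox normalization commonly used by DetectNet_v2 models
THRESHOLD = 0.2   # coverage (confidence) threshold

def parse_detectnet_v2(cov, bbox):
    # cov:  (num_classes, grid_h, grid_w)   -- conv2d_cov/Sigmoid, e.g. 4x17x30
    # bbox: (num_classes*4, grid_h, grid_w) -- conv2d_bbox, e.g. 16x17x30
    num_classes, grid_h, grid_w = cov.shape
    cx = (np.arange(grid_w) * STRIDE + 0.5) / BBOX_NORM  # grid-cell centers
    cy = (np.arange(grid_h) * STRIDE + 0.5) / BBOX_NORM
    dets = []
    for c in range(num_classes):
        ys, xs = np.where(cov[c] > THRESHOLD)  # keep only confident cells
        for y, x in zip(ys, xs):
            x1 = (bbox[c * 4 + 0, y, x] - cx[x]) * -BBOX_NORM
            y1 = (bbox[c * 4 + 1, y, x] - cy[y]) * -BBOX_NORM
            x2 = (bbox[c * 4 + 2, y, x] + cx[x]) * BBOX_NORM
            y2 = (bbox[c * 4 + 3, y, x] + cy[y]) * BBOX_NORM
            dets.append((c, float(cov[c, y, x]), x1, y1, x2, y2))
    return dets  # DeepStream then clusters these rectangles

jetson-inference's built-in DetectNet parser makes different assumptions about the grid layout, which is presumably why you get thousands of meaningless boxes.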

1. About pre-process

Examining the DeepStream pre-processing (https://www.nvidia.com/content/apac/gtc/ja/pdf/2018/1025.pdf, page 34; sorry, it is in Japanese):

“YUV” data is passed from the stage preceding nvinfer, which runs the inference.
So I think it is designed to run inference on “YUV” data.

If so, calling net.Detect(img, width, height, opt.overlay) with the RGBA image read by jetson.utils.loadImageRGBA() in detectnet-console.py would not work.

You say:

Have you added the same pre-processing and post-processing as DeepStream does for ResNet10?

With that in mind, I read the source of net.Detect(), but it seems to expect RGBA input.
I think it would need to convert to YUV internally, using cudaRGBAToYUV() or similar.

Do I have to modify PyDetectNet.cpp myself to convert to YUV?
jetson-inference/python/bindings/PyDetectNet.cpp

If so, that is not so easy…

2. About post-process

I think the post-processing/parsing is handled by the settings below.
Correct?

(These values come from output-blob-names in “dstest1_pgie_config.txt”:
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid)

--output_cvg='conv2d_cov/Sigmoid'
--output_bbox='conv2d_bbox'

Hi,
Sorry for the late response!

File: /opt/nvidia/deepstream/deepstream-4.0/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp

Pre-process function:
NvDsInferContextImpl::queueInputBatch(NvDsInferContextBatchInput &batchInput)
In this function, it calls convertFcn() to do the data format conversion, mean subtraction, etc.

Post-process function:
NvDsInferContextImpl::dequeueOutputBatch(NvDsInferContextBatchOutput &batchOutput)
In this function, it calls the BBOX parser function to parse the bounding boxes.

You could add prints in these two functions and run deepstream-test1 to check how the pre-processing and post-processing behave.

Thanks!