resnet10.caffemodel_b8_fp16.engine is optimized for

yasuda-toshihiro · December 20, 2019, 3:28am

I understand that the DeepStream sample app enables high-speed inference.
So I’m investigating the settings in “source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt”.

[primary-gie]
model-engine-file=…/…/models/Primary_Detector_Nano/resnet10.caffemodel_b8_fp16.engine

Is the model resnet10.caffemodel_b8_fp16.engine here optimized for DeepStream only?

Can’t I use this model the way the Jetson Inference example (detectnet-camera.py) does?

(With the current Python bindings for DeepStream, the image frames cannot be obtained;
I received an answer that a version supporting this has not been released yet.

https://devtalk.nvidia.com/default/topic/1068306/deepstream-sdk/fail-to-get-frame-in-tensor-metadata/post/5412437/#5412437

Therefore, I want to use the model together with a conventional frame-capture method.)

mchi · December 20, 2019, 4:04am

Can’t I use this model the way the Jetson Inference example (detectnet-camera.py) does?
You can use this engine in another application that runs on the same platform and JetPack version, but you need to apply the same pre-process and post-process as DeepStream is doing for this model.

yasuda-toshihiro · December 20, 2019, 6:49am

Thank you.

I can’t understand what you mean by “the same pre-process and post-process as DeepStream is doing for this model”.

I found a similar question (linked below) and tried the following command, but it prints “Segmentation fault (core dumped)” on the console.
Is something wrong?

https://devtalk.nvidia.com/default/topic/1027623/jetson-tx2/pretrained-models-for-detectnet-vehicles/

(Note: I copied “Primary_Detector_Nano” from DeepStream to ~/jetson-inference/data/networks/…)

Below is the log:

user-desktop:~/jetson-inference/python/examples$ python3 detectnet-console.py city_0.jpg test.jpg --prototxt=…/…/data/networks/Primary_Detector_Nano/resnet10.prototxt --model=…/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel

jetson.inference.__init__.py
jetson.inference – initializing Python 3.6 bindings…
jetson.inference – registering module types…
jetson.inference – done registering module types
jetson.inference – done Python 3.6 binding initialization
jetson.utils.__init__.py
jetson.utils – initializing Python 3.6 bindings…
jetson.utils – registering module functions…
jetson.utils – done registering module functions
jetson.utils – registering module types…
jetson.utils – done registering module types
jetson.utils – done Python 3.6 binding initialization
[image] loaded ‘city_0.jpg’ (1024 x 512, 3 channels)
jetson.inference – PyTensorNet_New()
jetson.inference – PyDetectNet_Init()
jetson.inference – detectNet loading network using argv command line params
jetson.inference – detectNet.init() argv[0] = ‘detectnet-console.py’
jetson.inference – detectNet.init() argv[1] = ‘city_0.jpg’
jetson.inference – detectNet.init() argv[2] = ‘test.jpg’
jetson.inference – detectNet.init() argv[3] = ‘--prototxt=…/…/data/networks/Primary_Detector_Nano/resnet10.prototxt’
jetson.inference – detectNet.init() argv[4] = ‘--model=…/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel’

detectNet – loading detection network model from:
– prototxt …/…/data/networks/Primary_Detector_Nano/resnet10.prototxt
– model …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel
– input_blob ‘data’
– output_cvg ‘coverage’
– output_bbox ‘bboxes’
– mean_pixel 0.000000
– mean_binary NULL
– class_labels NULL
– threshold 0.500000
– batch_size 1

[TRT] TensorRT version 5.1.6
[TRT] loading NVIDIA plugins…
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - caffe (extension ‘.caffemodel’)
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading …/…/data/networks/Primary_Detector_Nano/resnet10.prototxt …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel
[TRT] failed to retrieve tensor for Output “coverage”
[TRT] failed to retrieve tensor for Output “bboxes”
Segmentation fault (core dumped)

mchi · December 20, 2019, 7:21am

I can’t understand what you mean by “the same pre-process and post-process as DeepStream is doing for this model”.
Pre-process means operations such as resizing, subtracting the mean, etc.
Post-process means the output parser.
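As a rough illustration of the resize step only, here is a pure-Python sketch that scales a packed image to the 480×272 input size the engine reports for input_1 (3×272×480) later in this thread. It uses simple nearest-neighbour (floor) sampling for clarity; DeepStream does its scaling on the GPU and may interpolate differently, and the function name is hypothetical.

```python
def resize_nearest(pixels, src_w, src_h, dst_w=480, dst_h=272):
    """Nearest-neighbour resize of a packed, row-major image.

    pixels: list of per-pixel tuples such as (r, g, b, a), length src_w * src_h.
    Returns a new list of length dst_w * dst_h.
    Hypothetical helper; 480x272 is the network input size from the binding log.
    """
    if len(pixels) != src_w * src_h:
        raise ValueError("pixel count does not match src_w * src_h")
    out = []
    for y in range(dst_h):
        # Map each destination row back to a source row (floor mapping).
        sy = (y * src_h) // dst_h
        for x in range(dst_w):
            sx = (x * src_w) // dst_w
            out.append(pixels[sy * src_w + sx])
    return out


if __name__ == "__main__":
    # A synthetic frame with the size of city_0.jpg (1024 x 512).
    frame = [(x % 256, y % 256, 0, 255) for y in range(512) for x in range(1024)]
    small = resize_nearest(frame, 1024, 512)
    print(len(small))  # 130560, i.e. 480 * 272
```

The resized pixels still have to go through the format conversion and normalization step before they match what the network was trained on.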

[TRT] failed to retrieve tensor for Output “coverage”
[TRT] failed to retrieve tensor for Output “bboxes”
There are no “coverage” or “bboxes” layers in resnet10.prototxt.
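One way to find the right names is to look for blobs that some layer produces (`top:`) but no other layer consumes (`bottom:`); those are the network outputs. Below is a minimal sketch of that idea using regular expressions instead of a real protobuf parser, so it assumes a conventionally formatted prototxt with new-style `layer { }` blocks. The sample text is a made-up fragment whose layer names mirror the ones discussed in this thread, not the actual resnet10.prototxt.

```python
import re

# Match `top: "name"` / `bottom: "name"` fields at the start of a line.
TOP_RE = re.compile(r'^\s*top:\s*"([^"]+)"', re.MULTILINE)
BOTTOM_RE = re.compile(r'^\s*bottom:\s*"([^"]+)"', re.MULTILINE)


def find_output_blobs(prototxt_text):
    """Return blobs that some layer produces but no other layer consumes."""
    produced = []    # kept in declaration order
    consumed = set()
    # Text before the first `layer {` (e.g. `input:` lines) is skipped.
    for chunk in re.split(r'\blayer\s*\{', prototxt_text)[1:]:
        tops = TOP_RE.findall(chunk)
        bottoms = BOTTOM_RE.findall(chunk)
        for t in tops:
            if t not in produced:
                produced.append(t)
        # In-place layers (bottom == top, e.g. ReLU) do not count as consumers.
        consumed.update(b for b in bottoms if b not in tops)
    return [t for t in produced if t not in consumed]


# Made-up fragment for illustration only.
SAMPLE = """
input: "input_1"
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "input_1"
  top: "conv1"
  convolution_param { num_output: 8 kernel_size: 3 }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv2d_cov"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2d_cov"
}
layer {
  name: "conv2d_cov/Sigmoid"
  type: "Sigmoid"
  bottom: "conv2d_cov"
  top: "conv2d_cov/Sigmoid"
}
layer {
  name: "conv2d_bbox"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2d_bbox"
}
"""

if __name__ == "__main__":
    print(find_output_blobs(SAMPLE))  # ['conv2d_cov/Sigmoid', 'conv2d_bbox']
```

Running it on the real file, e.g. `find_output_blobs(open(path).read())`, should list the candidate names for `--output_cvg` and `--output_bbox`.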

yasuda-toshihiro · December 20, 2019, 9:19am

Thank you mchi.

I added the options below (names taken from “resnet10.prototxt”):

--output_cvg='conv2d_cov'
--output_bbox='conv2d_bbox'

user-desktop:~/jetson-inference/python/examples$ python3 detectnet-console.py city_0.jpg test.jpg --prototxt=…/…/data/networks/Primary_Detector_Nano/resnet10.prototxt --model=…/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel --output_cvg='conv2d_cov' --output_bbox='conv2d_bbox'

With these options, the segmentation fault was resolved.

However, another error was printed:

jetson.inference – detectNet failed to load built-in network ‘ssd-mobilenet-v2’

Why ‘ssd-mobilenet-v2’?

[TRT] device GPU, …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel loaded
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding – index 0
– name ‘input_1’
– type FP32
– in/out INPUT
– # dims 3
– dim #0 3 (CHANNEL)
– dim #1 272 (SPATIAL)
– dim #2 480 (SPATIAL)
[TRT] binding – index 1
– name ‘conv2d_bbox’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 16 (CHANNEL)
– dim #1 17 (SPATIAL)
– dim #2 30 (SPATIAL)
[TRT] binding – index 2
– name ‘conv2d_cov’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 4 (CHANNEL)
– dim #1 17 (SPATIAL)
– dim #2 30 (SPATIAL)
[TRT] binding to input 0 data binding index: -1
[TRT] binding to input 0 data dims (b=1 c=0 h=0 w=0) size=0
[TRT] failed to alloc CUDA mapped memory for tensor input, 0 bytes
detectNet – failed to initialize.
jetson.inference – detectNet failed to load built-in network ‘ssd-mobilenet-v2’
PyTensorNet_Dealloc()
Traceback (most recent call last):
File “detectnet-console.py”, line 51, in <module>
net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)
Exception: jetson.inference – detectNet failed to load network
jetson.utils – freeing CUDA mapped memory

mchi · December 20, 2019, 9:28am

Why ‘ssd-mobilenet-v2’?
I think it comes from detectnet-console.py; please look into the code.

yasuda-toshihiro · December 23, 2019, 1:51am

Why ‘ssd-mobilenet-v2’?
I think it comes from detectnet-console.py; please look into the code.

I understand the cause now. I did not specify ‘ssd-mobilenet-v2’ anywhere in the source,
but because the specified “resnet10.caffemodel” could not be loaded, the error message reported the default network instead.

I was able to run “detectnet-console.py” after changing the parameters as follows:

      --input_blob   'input_1'
      --output_cvg   'conv2d_cov/Sigmoid'
      --output_bbox  'conv2d_bbox'
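Since detectnet-console.py simply hands sys.argv to `jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)`, the same settings could also be passed from your own Python script. A minimal sketch; the helper name and the relative model paths are hypothetical, and the jetson.inference import is guarded so the argument-building part runs on any machine.

```python
def build_detectnet_argv(prototxt, model, input_blob, output_cvg, output_bbox,
                         script="detectnet-console.py"):
    """Build an argv list carrying the custom-model flags used in this thread.

    Hypothetical helper: detectNet parses these '--name=value' flags itself
    (see the argv[...] lines in the log), so the list can stand in for sys.argv.
    """
    return [
        script,
        "--prototxt=" + prototxt,
        "--model=" + model,
        "--input_blob=" + input_blob,
        "--output_cvg=" + output_cvg,
        "--output_bbox=" + output_bbox,
    ]


if __name__ == "__main__":
    argv = build_detectnet_argv(
        prototxt="networks/Primary_Detector_Nano/resnet10.prototxt",  # assumed path
        model="networks/Primary_Detector_Nano/resnet10.caffemodel",   # assumed path
        input_blob="input_1",
        output_cvg="conv2d_cov/Sigmoid",
        output_bbox="conv2d_bbox",
    )
    print(argv)
    try:
        import jetson.inference  # only present where jetson-inference is installed
    except ImportError:
        print("jetson.inference not available; skipping network load")
    else:
        # Same call shape as detectnet-console.py line 51. The first argument is
        # still echoed in error messages even when argv supplies --model.
        net = jetson.inference.detectNet("ssd-mobilenet-v2", argv, 0.5)
```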

…but the results are useless.

I ran detection on the attached ‘city_0.jpg’, but it returns 2016 meaningless bounding boxes.
Why is this?

jetson.inference – initializing Python 3.6 bindings…
jetson.inference – registering module types…
jetson.inference – done registering module types
jetson.inference – done Python 3.6 binding initialization
jetson.utils – initializing Python 3.6 bindings…
jetson.utils – registering module functions…
jetson.utils – done registering module functions
jetson.utils – registering module types…
jetson.utils – done registering module types
jetson.utils – done Python 3.6 binding initialization
[image] loaded ‘city_0.jpg’ (1024 x 512, 3 channels)
jetson.inference – PyTensorNet_New()
jetson.inference – PyDetectNet_Init()
jetson.inference – detectNet loading network using argv command line params
jetson.inference – detectNet.init() argv[0] = ‘detectnet-console.py’
jetson.inference – detectNet.init() argv[1] = ‘city_0.jpg’
jetson.inference – detectNet.init() argv[2] = ‘test.jpg’
jetson.inference – detectNet.init() argv[3] = ‘--prototxt=…/…/data/networks/Primary_Detector_Nano/resnet10.prototxt’
jetson.inference – detectNet.init() argv[4] = ‘--model=…/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel’
jetson.inference – detectNet.init() argv[5] = ‘--output_cvg=conv2d_cov/Sigmoid’
jetson.inference – detectNet.init() argv[6] = ‘--output_bbox=conv2d_bbox’
jetson.inference – detectNet.init() argv[7] = ‘--input_blob=input_1’

detectNet – loading detection network model from:
– prototxt …/…/data/networks/Primary_Detector_Nano/resnet10.prototxt
– model …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel
– input_blob ‘input_1’
– output_cvg ‘conv2d_cov/Sigmoid’
– output_bbox ‘conv2d_bbox’
– mean_pixel 0.000000
– mean_binary NULL
– class_labels NULL
– threshold 0.500000
– batch_size 1

[TRT] TensorRT version 5.1.6
[TRT] loading NVIDIA plugins…
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - caffe (extension ‘.caffemodel’)
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading …/…/data/networks/Primary_Detector_Nano/resnet10.prototxt …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel
[TRT] retrieved Output tensor “conv2d_cov/Sigmoid”: 4x17x30
[TRT] retrieved Output tensor “conv2d_bbox”: 16x17x30
[TRT] retrieved Input tensor “input_1”: 3x272x480
jetson.inference.__init__.py
jetson.utils.__init__.py
detected 2016 objects in image
<detectNet.Detection object>
– ClassID: 0
– Confidence: 24.5156
– Left: 2.13333
– Top: 4.46564e-09
– Right: 0.747566
– Bottom: 0.00201247
– Width: -1.38577
– Height: 0.00201247
– Area: -0.00278881
– Center: (1.44045, 0.00100624)

[Omitted: 2014 more useless objects with no size]

<detectNet.Detection object>
– ClassID: 3
– Confidence: 10.3906
– Left: 991.679
– Top: 481.882
– Right: 989.867
– Bottom: 481.882
– Width: -1.81195
– Height: 0
– Area: -0
– Center: (990.773, 481.882)
[TRT] device GPU, configuring CUDA engine
[TRT] device GPU, building FP16: ON
[TRT] device GPU, building INT8: OFF
[TRT] device GPU, building CUDA engine (this may take a few minutes the first time a network is loaded)
[TRT] device GPU, completed building CUDA engine
[TRT] network profiling complete, writing engine cache to …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] device GPU, completed writing engine cache to …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel.1.1.GPU.FP16.engine
[TRT] device GPU, …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel loaded
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding – index 0
– name ‘input_1’
– type FP32
– in/out INPUT
– # dims 3
– dim #0 3 (CHANNEL)
– dim #1 272 (SPATIAL)
– dim #2 480 (SPATIAL)
[TRT] binding – index 1
– name ‘conv2d_bbox’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 16 (CHANNEL)
– dim #1 17 (SPATIAL)
– dim #2 30 (SPATIAL)
[TRT] binding – index 2
– name ‘conv2d_cov/Sigmoid’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 4 (CHANNEL)
– dim #1 17 (SPATIAL)
– dim #2 30 (SPATIAL)
[TRT] binding to input 0 input_1 binding index: 0
[TRT] binding to input 0 input_1 dims (b=1 c=3 h=272 w=480) size=1566720
[TRT] binding to output 0 conv2d_cov/Sigmoid binding index: 2
[TRT] binding to output 0 conv2d_cov/Sigmoid dims (b=1 c=4 h=17 w=30) size=8160
[TRT] binding to output 1 conv2d_bbox binding index: 1
[TRT] binding to output 1 conv2d_bbox dims (b=1 c=16 h=17 w=30) size=32640
device GPU, …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel initialized.
detectNet – number object classes: 4
detectNet – maximum bounding boxes: 2040
[cuda] cudaDetectionOverlay((float4*)input, (float4*)output, width, height, detections, numDetections, (float4*)mClassColors[1])
[cuda] invalid configuration argument (error 9) (hex 0x09)
[cuda] /home/gk40002251/jetson-inference/c/detectNet.cpp:945
[TRT] detectNet::Detect() – failed to render overlay

[TRT] ------------------------------------------------
[TRT] Timing Report …/…/data/networks/Primary_Detector_Nano/resnet10.caffemodel
[TRT] ------------------------------------------------
[TRT] Pre-Process CPU 0.12172ms CUDA 0.80625ms
[TRT] Network CPU 20.92214ms CUDA 20.25526ms
[TRT] Post-Process CPU 166.44992ms CUDA 166.67921ms
[cuda] cudaEventElapsedTime(&cuda_time, mEventsGPU[evt], mEventsGPU[evt+1])
[cuda] invalid resource handle (error 33) (hex 0x21)
[cuda] /home/gk40002251/jetson-inference/build/aarch64/include/jetson-inference/tensorNet.h:499
[TRT] Visualize CPU 0.00000ms CUDA 0.00000ms
[TRT] Total CPU 187.49377ms CUDA 187.74072ms
[TRT] ------------------------------------------------

[TRT] note – when processing a single image, run ‘sudo jetson_clocks’ before
to disable DVFS for more accurate profiling/timing measurements

jetson.utils – freeing CUDA mapped memory
PyTensorNet_Dealloc()
jetson.inference – PyDetection_Dealloc()

[Omitted: a large number of Dealloc messages]

mchi · December 23, 2019, 3:36pm

As I mentioned previously, have you added the same pre-process and post-process as DeepStream does for ResNet10?
If you use the post-process/parser of ssd-mobilenet-v2 to parse the output of ResNet10, I don’t think it will work.

yasuda-toshihiro · December 24, 2019, 3:27am

1. About pre-process

Examining the DeepStream pre-process (https://www.nvidia.com/content/apac/gtc/ja/pdf/2018/1025.pdf, page 34; sorry, it is in Japanese):

“YUV” data is passed to nvinfer from the stage preceding it.
So I think nvinfer is specified to run inference on “YUV” data.

So passing the img (RGBA) data read by jetson.utils.loadImageRGBA in detectnet-console.py to
net.Detect(img, width, height, opt.overlay) doesn’t work.

You said:

Have you added the same pre-process and post-process as DeepStream does for ResNet10?

I read the source of net.Detect, and it seems to expect RGBA input.
I think it is necessary to convert the image to YUV internally (using cudaRGBAToYUV, etc.) before detection.

Do I have to modify PyDetectNet.cpp myself to convert to YUV?
jetson-inference/python/bindings/PyDetectNet.cpp

If so, that is not easy.

2. About post-process

I think the post-process/parser processing is configured by the settings below.
Is that correct?

(These values come from output-blob-names in “dstest1_pgie_config.txt”:
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid)

--output_cvg 'conv2d_cov/Sigmoid'
--output_bbox 'conv2d_bbox'
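For reference, these two flags only name the tensors that detectNet reads; as far as I can tell they do not change how detectNet decodes those tensors, which is the part mchi calls the post-process. Deriving the flag values from the config line can be sketched as below, assuming (hypothetically) that the coverage blob is the one whose name contains "cov" and the box blob the one containing "bbox".

```python
def blob_flags_from_config(line):
    """Map a DeepStream `output-blob-names=` line to detectNet flag values.

    Hypothetical helper. Assumption: the coverage blob is the name containing
    "cov" and the bounding-box blob is the name containing "bbox".
    """
    key, sep, value = line.partition("=")
    if not sep or key.strip().lower() != "output-blob-names":
        raise ValueError("not an output-blob-names line: %r" % line)
    # DeepStream separates multiple blob names with ';'; strip stray spaces.
    names = [n.strip() for n in value.split(";") if n.strip()]
    flags = {}
    for name in names:
        if "cov" in name:
            flags["output_cvg"] = name
        elif "bbox" in name:
            flags["output_bbox"] = name
    if set(flags) != {"output_cvg", "output_bbox"}:
        raise ValueError("could not identify both blobs in %r" % names)
    return flags


if __name__ == "__main__":
    print(blob_flags_from_config("output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid"))
    # {'output_bbox': 'conv2d_bbox', 'output_cvg': 'conv2d_cov/Sigmoid'}
```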

mchi · January 21, 2020, 11:19am

Hi,
Sorry for the late response!

File: /opt/nvidia/deepstream/deepstream-4.0/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp

Pre-process function:
NvDsInferContextImpl::queueInputBatch(NvDsInferContextBatchInput &batchInput)
This function calls convertFcn() to do the data format conversion, mean subtraction, etc.
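The binding log earlier in the thread shows the network input as 3×272×480 FP32, i.e. three planar color channels, which suggests that whatever format nvinfer receives upstream, what reaches the network is a planar 3-channel float tensor rather than packed YUV. Below is a minimal sketch of such a conversion step, assuming packed RGBA input (as jetson.utils.loadImageRGBA provides), RGB channel order, and the normalization y = net-scale-factor × (x − mean). The default scale of 1/255 and zero mean are assumptions to check against dstest1_pgie_config.txt, and the function name is hypothetical.

```python
def rgba_to_planar_rgb(pixels, width, height, scale=1.0 / 255.0, mean=(0.0, 0.0, 0.0)):
    """Convert packed RGBA pixels to a planar (CHW) RGB float list.

    pixels: row-major list of (r, g, b, a) tuples, length width * height.
    Each output value is scale * (x - mean[c]); alpha is dropped.
    Returns a flat list laid out as [all R][all G][all B].
    Defaults (scale = 1/255, zero mean) are assumed, not taken from DeepStream.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    plane = width * height
    out = [0.0] * (3 * plane)
    for i, (r, g, b, _a) in enumerate(pixels):
        out[i] = scale * (r - mean[0])
        out[plane + i] = scale * (g - mean[1])
        out[2 * plane + i] = scale * (b - mean[2])
    return out


if __name__ == "__main__":
    # One orange pixel: R plane, then G plane, then B plane.
    print(rgba_to_planar_rgb([(255, 128, 0, 255)], 1, 1))
```

Packed-to-planar is the layout change implied by the 3×272×480 binding; the exact scale and mean values are what must match DeepStream for the outputs to be comparable.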

Post-process function:
NvDsInferContextImpl::dequeueOutputBatch(NvDsInferContextBatchOutput &batchOutput)
This function calls the BBOX parser function to parse the bounding boxes.
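To make the post-process more concrete, here is a rough Python sketch of a DetectNet-style grid parser for these two outputs. The shapes come from the log: conv2d_cov/Sigmoid is 4×17×30 (4 classes on a 17×30 grid, stride 480/30 = 272/17 = 16 pixels) and conv2d_bbox is 16×17×30 (four planes per class). The grid gives 17 × 30 × 4 = 2040 candidate cells, matching “maximum bounding boxes: 2040” in the log. The plane order (x1, y1, x2, y2), the +0.5 cell-center offset and the 35.0 normalization constant are my assumptions about what the DeepStream parser does, so verify them against nvdsinfer_context_impl_output_parsing.cpp. DeepStream also clusters the surviving rectangles afterwards; this sketch omits that.

```python
GRID_H, GRID_W = 17, 30   # from conv2d_cov/Sigmoid: 4x17x30
NUM_CLASSES = 4
STRIDE = 16               # 272 / 17 == 480 / 30 == 16
BBOX_NORM = 35.0          # assumed DetectNet normalization constant


def parse_grid(cov, bbox, threshold=0.5, grid_h=GRID_H, grid_w=GRID_W,
               num_classes=NUM_CLASSES, stride=STRIDE, norm=BBOX_NORM):
    """Decode DetectNet-style coverage/bbox tensors into raw rectangles.

    cov:  flat list, shape (num_classes, grid_h, grid_w)
    bbox: flat list, shape (num_classes * 4, grid_h, grid_w);
          assumed plane order per class: x1, y1, x2, y2
    Returns (class_id, confidence, x1, y1, x2, y2) tuples in network-input
    pixels (480x272), before any clustering.
    """
    grid = grid_h * grid_w
    if len(cov) != num_classes * grid or len(bbox) != 4 * num_classes * grid:
        raise ValueError("tensor sizes do not match the grid")
    # Grid-cell centers, pre-divided by the normalization constant.
    cx = [(w * stride + 0.5) / norm for w in range(grid_w)]
    cy = [(h * stride + 0.5) / norm for h in range(grid_h)]
    boxes = []
    for c in range(num_classes):
        base = c * 4 * grid
        for h in range(grid_h):
            for w in range(grid_w):
                i = h * grid_w + w
                conf = cov[c * grid + i]
                if conf < threshold:
                    continue
                # Offsets are relative to the cell center, scaled by norm.
                x1 = (bbox[base + i] - cx[w]) * -norm
                y1 = (bbox[base + grid + i] - cy[h]) * -norm
                x2 = (bbox[base + 2 * grid + i] + cx[w]) * norm
                y2 = (bbox[base + 3 * grid + i] + cy[h]) * norm
                boxes.append((c, conf, x1, y1, x2, y2))
    return boxes
```

The returned coordinates are in the 480×272 network input space, so they would still need scaling back to the original frame size (1024×512 for city_0.jpg).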

You could add prints in these two functions and run deepstream-test1 to see how the pre-process and post-process work.

Thanks!
