Face detection using jetson-inference and a custom model

Hi,
I am trying to use res10_300x300_ssd_iter_140000_fp16.caffemodel with detectNet in jetson-inference as a custom model:

net = jetson.inference.detectNet(argv=[
    '--prototxt=Resources/models/caffe_model_for_dace_detection/deploy.prototxt',
    '--model=Resources/models/caffe_model_for_dace_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel',
    '--input-blob=input_0',
    '--output-cvg=scores',
    '--output-bbox=boxes'], threshold=self.threshold)

And I followed the answer from this issue: Conversion from caffemodel to TensorRT - #6 by AastaLLL

But I got this error message:

[TRT] TensorRT version 8.0.1
[TRT] loading NVIDIA plugins…
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] detected model format - caffe (extension ‘.caffemodel’)
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 232, GPU 3806 (MiB)
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file Resources/models/caffe_model_for_dace_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 232, GPU 3794 (MiB)
[TRT] device GPU, loading Resources/models/caffe_model_for_dace_detection/deploy.prototxt Resources/models/caffe_model_for_dace_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel
[TRT] failed to retrieve tensor for Output “scores”
Segmentation fault (core dumped)

What should I do to avoid this error?

Thanks

Hi

[TRT] failed to retrieve tensor for Output “scores”

Based on the error, it seems there is an issue when TensorRT tries to find the output layer.
Did you name the output layers scores and boxes?

If yes, please share the deploy.prototxt and res10_300x300_ssd_iter_140000_fp16.caffemodel with us so we can check.

Thanks.

Hey, thank you for your answer.

No, the name of the output layer is detection_out, so I corrected the call like this:

net = jetson.inference.detectNet(argv=[
    '--prototxt=Resources/models/caffe_model_for_face_detection/deploy.prototxt',
    '--model=Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel',
    '--output-cvg=detection_out',
    '--output-bbox=detection_out'], threshold=0.5)

The network is correctly loaded:

detectNet – loading detection network model from:
– prototxt Resources/models/caffe_model_for_face_detection/deploy.prototxt
– model Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel
– input_blob ‘data’
– output_cvg ‘detection_out’
– output_bbox ‘detection_out’
– mean_pixel 0.000000
– mean_binary NULL
– class_labels NULL
– threshold 0.500000
– batch_size 1

[TRT] TensorRT version 8.0.1
[TRT] loading NVIDIA plugins…
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] detected model format - caffe (extension ‘.caffemodel’)
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 232, GPU 3648 (MiB)
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] loading network plan from engine cache… Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] device GPU, loaded Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 238, GPU 3653 (MiB)
[TRT] Loaded engine size: 6 MB
[TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 238 MiB, GPU 3653 MiB
[TRT] Using cublas a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU -14, now: CPU 414, GPU 3658 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU +104, now: CPU 654, GPU 3762 (MiB)
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 654, GPU 3762 (MiB)
[TRT] Deserialization required 3067226 microseconds.
[TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 654 MiB, GPU 3762 MiB
[TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 654 MiB, GPU 3762 MiB
[TRT] Using cublas a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 654, GPU 3762 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 654, GPU 3762 (MiB)
[TRT] Total per-runner device memory is 5528064
[TRT] Total per-runner host memory is 51328
[TRT] Allocated activation device memory of size 3086336
[TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 655 MiB, GPU 3769 MiB
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] – layers 74
[TRT] – maxBatchSize 1
[TRT] – deviceMemory 3086336
[TRT] – bindings 2
[TRT] binding 0
– index 0
– name ‘data’
– type FP32
– in/out INPUT
– # dims 3
– dim #0 3
– dim #1 300
– dim #2 300
[TRT] binding 1
– index 1
– name ‘detection_out’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 1
– dim #1 200
– dim #2 7
[TRT]
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=300 w=300) size=1080000
[TRT] binding to output 0 detection_out binding index: 1
[TRT] binding to output 0 detection_out dims (b=1 c=1 h=200 w=7) size=5600
[TRT] binding to output 1 detection_out binding index: 1
[TRT] binding to output 1 detection_out dims (b=1 c=1 h=200 w=7) size=5600
[TRT]
[TRT] device GPU, Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel initialized.
[TRT] detectNet – number object classes: 1
[TRT] detectNet – maximum bounding boxes: 1400

But it doesn't detect any faces in the picture, and the number of detections is 0.

Can you check it, please?

Here is the deploy.prototxt and res10_300x300_ssd_iter_140000_fp16.caffemodel which I use.

res10_300x300_ssd_iter_140000_fp16.caffemodel (5.1 MB)
deploy.prototxt (28.2 KB)

Hi @Machine0815, you may need to update the pre/post-processing code in jetson-inference/c/detectNet.cpp to reflect what your model expects.

I can't specify exactly what that entails, as it is specific to the model that you are trying to use. As it stands, when handling Caffe detection models, the code expects a model with two separate output layers (a coverage/confidence layer and a bounding-box offset layer). It appears you would need to modify it to support the single-layer output that your model has.
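In case it helps, the post-processing for that single layer amounts to decoding the rows of the 1x200x7 blob shown in your bindings, which in the standard Caffe-SSD layout are (image_id, class_id, confidence, x1, y1, x2, y2) with corners normalized to [0, 1]. A rough NumPy sketch, assuming that layout (the function name, threshold, and dummy blob are mine, not part of jetson-inference):

```python
import numpy as np

def parse_detection_out(detection_out, conf_threshold=0.5, img_w=300, img_h=300):
    """Decode a Caffe-SSD 'detection_out' blob of shape (1, 1, N, 7).

    Each row is (image_id, class_id, confidence, x1, y1, x2, y2),
    with the box corners normalized to [0, 1].
    """
    results = []
    for row in detection_out.reshape(-1, 7):
        image_id, class_id, confidence, x1, y1, x2, y2 = (float(v) for v in row)
        if confidence < conf_threshold:
            continue  # skip padding rows and low-confidence candidates
        results.append((int(class_id), confidence,
                        int(round(x1 * img_w)), int(round(y1 * img_h)),
                        int(round(x2 * img_w)), int(round(y2 * img_h))))
    return results

# Hypothetical blob with a single confident detection in row 0
blob = np.zeros((1, 1, 200, 7), dtype=np.float32)
blob[0, 0, 0] = [0, 1, 0.98, 0.2, 0.3, 0.6, 0.7]
print(parse_detection_out(blob))  # one (class_id, confidence, x1, y1, x2, y2) tuple
```

The all-zero padding rows fall below the confidence threshold, so only real detections survive.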

Thank you for your answer!

Unfortunately, I didn't manage it; I have to say I am not that good at C++.
I found the model (Caffe-SSD with ResNet) to be the most accurate, but when I use it in real time I only get 3 fps on the Jetson Nano (using CUDA, 5-6 fps). This is not enough for my application.
jetson-inference gives a wonderful fps, but unfortunately its face detection model (Facenet) is not accurate at all.

Is there another way to do real-time face detection on the Jetson Nano, using jetson-inference or another framework, that gives me more than 10 fps?
Do you have any suggestions?
Thanks

One of the reasons that jetson.inference gets good FPS is that it uses TensorRT for inference. You can also use TensorRT directly from Python to support your custom model. You can find the TensorRT Python samples under /usr/src/tensorrt/samples/python/ and the documentation here:

If you were using PyTorch or TensorFlow, there are extensions to those frameworks that allow you to run the model with TensorRT without actually needing to use the TensorRT API.

You may also want to check the Jetson Community Projects page for other projects doing similar things: Jetson Community Projects | NVIDIA Developer

The Facenet model included with jetson.inference is old and based on an outdated DNN architecture. One alternative would be to train your own SSD-Mobilenet model with PyTorch (as shown in the tutorial); that would run fine with jetson.inference and get good FPS.
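A model trained that way exports to ONNX with the standard jetson-inference output names, so it loads with flags like the ones earlier in this thread. A sketch of the invocation (the model and label paths here are placeholders, not real files):

```shell
# Placeholder paths; flag names follow the jetson-inference ONNX/SSD convention
# (--input-blob/--output-cvg/--output-bbox) used earlier in this thread.
detectnet.py \
    --model=Resources/models/my_ssd/ssd-mobilenet.onnx \
    --labels=Resources/models/my_ssd/labels.txt \
    --input-blob=input_0 \
    --output-cvg=scores \
    --output-bbox=boxes \
    /dev/video0
```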

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.