Face detection using jetson-inference and a custom model

Hi,
I am trying to use res10_300x300_ssd_iter_140000_fp16.caffemodel with detectNet in jetson-inference as a custom model:

net = jetson.inference.detectNet(argv=[
    '--prototxt=Resources/models/caffe_model_for_dace_detection/deploy.prototxt',
    '--model=Resources/models/caffe_model_for_dace_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel',
    '--input-blob=input_0',
    '--output-cvg=scores',
    '--output-bbox=boxes'], threshold=self.threshold)

And I followed the answer from this issue: Conversion from caffemodel to TensorRT - #6 by AastaLLL

But I got this error message:

[TRT] TensorRT version 8.0.1
[TRT] loading NVIDIA plugins…
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] detected model format - caffe (extension ‘.caffemodel’)
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 232, GPU 3806 (MiB)
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file Resources/models/caffe_model_for_dace_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 232, GPU 3794 (MiB)
[TRT] device GPU, loading Resources/models/caffe_model_for_dace_detection/deploy.prototxt Resources/models/caffe_model_for_dace_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel
[TRT] failed to retrieve tensor for Output “scores”
Segmentation fault (core dumped)

What should I do to avoid this error?

Thanks

Hi

[TRT] failed to retrieve tensor for Output “scores”

Based on the error, it seems there is an issue when TensorRT tries to find the output layer.
Did you name the output layers scores and boxes?

If yes, please share the deploy.prototxt and res10_300x300_ssd_iter_140000_fp16.caffemodel with us so we can check.

Thanks.

Hey, thank you for your answer.

No, the name of the output layer is detection_out, so I corrected the call like this:

net = jetson.inference.detectNet(argv=[
    '--prototxt=Resources/models/caffe_model_for_face_detection/deploy.prototxt',
    '--model=Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel',
    '--output-cvg=detection_out',
    '--output-bbox=detection_out'], threshold=0.5)

The network is correctly loaded:

detectNet – loading detection network model from:
– prototxt Resources/models/caffe_model_for_face_detection/deploy.prototxt
– model Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel
– input_blob ‘data’
– output_cvg ‘detection_out’
– output_bbox ‘detection_out’
– mean_pixel 0.000000
– mean_binary NULL
– class_labels NULL
– threshold 0.500000
– batch_size 1

[TRT] TensorRT version 8.0.1
[TRT] loading NVIDIA plugins…
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] detected model format - caffe (extension ‘.caffemodel’)
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 232, GPU 3648 (MiB)
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] loading network plan from engine cache… Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel.1.1.8001.GPU.FP16.engine
[TRT] device GPU, loaded Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 238, GPU 3653 (MiB)
[TRT] Loaded engine size: 6 MB
[TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 238 MiB, GPU 3653 MiB
[TRT] Using cublas a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU -14, now: CPU 414, GPU 3658 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU +104, now: CPU 654, GPU 3762 (MiB)
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 654, GPU 3762 (MiB)
[TRT] Deserialization required 3067226 microseconds.
[TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 654 MiB, GPU 3762 MiB
[TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 654 MiB, GPU 3762 MiB
[TRT] Using cublas a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 654, GPU 3762 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 654, GPU 3762 (MiB)
[TRT] Total per-runner device memory is 5528064
[TRT] Total per-runner host memory is 51328
[TRT] Allocated activation device memory of size 3086336
[TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 655 MiB, GPU 3769 MiB
[TRT]
[TRT] CUDA engine context initialized on device GPU:
[TRT] – layers 74
[TRT] – maxBatchSize 1
[TRT] – deviceMemory 3086336
[TRT] – bindings 2
[TRT] binding 0
– index 0
– name ‘data’
– type FP32
– in/out INPUT
– # dims 3
– dim #0 3
– dim #1 300
– dim #2 300
[TRT] binding 1
– index 1
– name ‘detection_out’
– type FP32
– in/out OUTPUT
– # dims 3
– dim #0 1
– dim #1 200
– dim #2 7
[TRT]
[TRT] binding to input 0 data binding index: 0
[TRT] binding to input 0 data dims (b=1 c=3 h=300 w=300) size=1080000
[TRT] binding to output 0 detection_out binding index: 1
[TRT] binding to output 0 detection_out dims (b=1 c=1 h=200 w=7) size=5600
[TRT] binding to output 1 detection_out binding index: 1
[TRT] binding to output 1 detection_out dims (b=1 c=1 h=200 w=7) size=5600
[TRT]
[TRT] device GPU, Resources/models/caffe_model_for_face_detection/res10_300x300_ssd_iter_140000_fp16.caffemodel initialized.
[TRT] detectNet – number object classes: 1
[TRT] detectNet – maximum bounding boxes: 1400

But it doesn't detect any faces in the picture, and the number of detections is 0.

Can you check it, please?

Here is the deploy.prototxt and res10_300x300_ssd_iter_140000_fp16.caffemodel which I use.

res10_300x300_ssd_iter_140000_fp16.caffemodel (5.1 MB)
deploy.prototxt (28.2 KB)

Hi @Machine0815, you may need to update the pre/post-processing code in jetson-inference/c/detectNet.cpp to reflect what your model expects.

I can't specify exactly what that entails, as it is specific to the model that you are trying to use. As it stands, when handling Caffe detection models, the code expects a model with two separate output layers (a coverage/confidence layer and a bounding-box offset layer). It appears you would need to modify it to support the single-layer output that your model has.
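In case it helps, the post-processing for that single layer amounts to decoding the rows of the 1x200x7 blob shown in your bindings, which in the standard Caffe-SSD layout are (image_id, class_id, confidence, x1, y1, x2, y2) with corners normalized to [0, 1]. A rough NumPy sketch, assuming that layout (the function name, threshold, and dummy blob are mine, not part of jetson-inference):

```python
import numpy as np

def parse_detection_out(detection_out, conf_threshold=0.5, img_w=300, img_h=300):
    """Decode a Caffe-SSD 'detection_out' blob of shape (1, 1, N, 7).

    Each row is (image_id, class_id, confidence, x1, y1, x2, y2),
    with the box corners normalized to [0, 1].
    """
    results = []
    for row in detection_out.reshape(-1, 7):
        image_id, class_id, confidence, x1, y1, x2, y2 = (float(v) for v in row)
        if confidence < conf_threshold:
            continue  # skip padding rows and low-confidence candidates
        results.append((int(class_id), confidence,
                        int(round(x1 * img_w)), int(round(y1 * img_h)),
                        int(round(x2 * img_w)), int(round(y2 * img_h))))
    return results

# Hypothetical blob with a single confident detection in row 0
blob = np.zeros((1, 1, 200, 7), dtype=np.float32)
blob[0, 0, 0] = [0, 1, 0.98, 0.2, 0.3, 0.6, 0.7]
print(parse_detection_out(blob))  # one (class_id, confidence, x1, y1, x2, y2) tuple
```

The all-zero padding rows fall below the confidence threshold, so only real detections survive.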

Thank you for your answer!

Unfortunately, I didn't manage it; I have to say I am not that good at C++.
I found the model (Caffe-SSD with ResNet) to be the most accurate, but when I use it in real time I only get 3 fps on the Jetson Nano (using CUDA, 5-6 fps). This is not enough for my application.
jetson-inference gives a wonderful fps, but unfortunately its face detection model (Facenet) is not accurate at all.

Is there another way to do real-time face detection on the Jetson Nano, using jetson-inference or another framework, that gives me more than 10 fps?
Do you have any suggestions?
Thanks

One of the reasons that jetson.inference gets good FPS is that it uses TensorRT for inference. You can also use TensorRT directly from Python to support your custom model. You can find the TensorRT Python samples under /usr/src/tensorrt/samples/python/ and the documentation here:

If you were using PyTorch or TensorFlow, there are extensions to those frameworks that allow you to run the model with TensorRT without actually needing to use the TensorRT API.

You may also want to check the Jetson Community Projects page for other projects doing similar things: Jetson Community Projects | NVIDIA Developer

The Facenet model included with jetson.inference is old and based on an outdated DNN architecture. One alternative would be to train your own SSD-Mobilenet model with PyTorch (as shown in the tutorial); that would run fine with jetson.inference and get good FPS.
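A model trained that way exports to ONNX with the standard jetson-inference output names, so it loads with flags like the ones earlier in this thread. A sketch of the invocation (the model and label paths here are placeholders, not real files):

```shell
# Placeholder paths; flag names follow the jetson-inference ONNX/SSD convention
# (--input-blob/--output-cvg/--output-bbox) used earlier in this thread.
detectnet.py \
    --model=Resources/models/my_ssd/ssd-mobilenet.onnx \
    --labels=Resources/models/my_ssd/labels.txt \
    --input-blob=input_0 \
    --output-cvg=scores \
    --output-bbox=boxes \
    /dev/video0
```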

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.