FaceDetectIR in DeepStream 6.1, poor performance on IR but not on color?

Hi everybody,

I’m attempting to run a demo of the FaceDetectIR model on DeepStream 6.1. I have it up and running in a Docker container, but I’m surprised to find that detection performance is very poor on a stream from an IR camera (see the last part of the attached video). By poor performance, I mean that my face is not detected unless I get really close to the camera (< 30 cm).

At first I thought I had downloaded the wrong model, since performance is fine when the camera disables IR and switches to color (see the first part of the video). I’ve tried playing around with some of the thresholds, but with no improvement in the results. Is this kind of performance to be expected? By writing here I hope to get tips or suggestions on how the performance can be improved.
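For reference, the kind of threshold tweak I tried looked like the following (the 0.1 value is just an illustration; none of the values I tried improved IR detection). It is demonstrated here on an inline copy of the relevant config lines so it can be run standalone:

```shell
# Lower the per-class detection confidence threshold. Shown on a temp copy of
# the relevant lines from config_infer_primary_facedetectir.txt:
CFG="$(mktemp)"
printf '%s\n' '[class-attrs-0]' 'topk=20' 'pre-cluster-threshold=0.4' > "$CFG"
sed -i 's/pre-cluster-threshold=0.4/pre-cluster-threshold=0.1/' "$CFG"
grep pre-cluster-threshold "$CFG"   # prints pre-cluster-threshold=0.1
rm "$CFG"
```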

My setup:

System:    Kernel: 5.15.0-50-generic x86_64 bits: 64 compiler: N/A Desktop: Gnome 3.36.9 
           Distro: Ubuntu 20.04.4 LTS (Focal Fossa) 
CPU:       Topology: 12-Core model: 12th Gen Intel Core i9-12900K bits: 64 type: MT MCP arch: N/A L2 cache: 30.0 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 152985 
           Speed: 700 MHz min/max: 800/5200 MHz Core speeds (MHz): 1: 700 2: 699 3: 801 4: 801 5: 700 6: 701 7: 700 8: 700 
           9: 801 10: 800 11: 800 12: 800 13: 700 14: 800 15: 801 16: 800 17: 800 18: 700 19: 700 20: 701 21: 701 22: 700 
           23: 800 24: 800 
Graphics:  Device-1: NVIDIA vendor: ASUSTeK driver: nvidia v: 510.85.02 bus ID: 01:00.0 
           Display: x11 server: X.Org 1.20.13 driver: fbdev,nouveau unloaded: modesetting,vesa resolution: 3440x1440~50Hz 
           OpenGL: renderer: NVIDIA GeForce RTX 3090/PCIe/SSE2 v: 4.6.0 NVIDIA 510.85.02 direct render: Yes

Camera used: HIKVision DS-2CD2145FWD-I

Script to reproduce:

#!/bin/bash

b_flag=''
s_flag=''

print_usage() {
  printf "Usage: -b to build, -s to run shell in the container. No arguments to run the docker container (might require sudo)\n"
}

while getopts 'sb' flag; do
  case "${flag}" in
    b) b_flag='true' ;;
    s) s_flag='true' ;;
    *) print_usage
       exit 1 ;;
  esac
done

APPDIR="$PWD"

if [[ $b_flag == 'true' ]];
then


    echo "get facedetectir models"
    FACEDETECTIR_MODELS_PATH="$APPDIR/models/tao_pretrained_models/facedetectir/"
    mkdir -p "$FACEDETECTIR_MODELS_PATH"
    wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/tao/facedetectir/versions/pruned_v1.0.1/zip \
    -O "$APPDIR/facedetectir_pruned_v1.0.zip"
    unzip "$APPDIR/facedetectir_pruned_v1.0.zip" -d "$FACEDETECTIR_MODELS_PATH"
    rm "$APPDIR/facedetectir_pruned_v1.0.zip"

    echo "get facedetectir configs"

    FACEDETECTIR_CONFIGS_PATH="$APPDIR/configs/tao_pretrained_models/facedetectir/"
    mkdir -p "$FACEDETECTIR_CONFIGS_PATH"
    wget --content-disposition https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/archive/refs/tags/DS_6.1.zip \
    -O "$APPDIR/facedetectir_deepstream_reference_apps.zip"
    unzip "$APPDIR/facedetectir_deepstream_reference_apps.zip" -d "$APPDIR/facedetectir_deepstream_reference_apps"
    cp \
        "$APPDIR/facedetectir_deepstream_reference_apps/deepstream_reference_apps-DS_6.1/deepstream_app_tao_configs/config_infer_primary_facedetectir.txt" \
        "$FACEDETECTIR_CONFIGS_PATH"
    cp \
        "$APPDIR/facedetectir_deepstream_reference_apps/deepstream_reference_apps-DS_6.1/deepstream_app_tao_configs/deepstream_app_source1_facedetectir.txt" \
        "$FACEDETECTIR_CONFIGS_PATH"
    rm -Rf "$APPDIR/facedetectir_deepstream_reference_apps"
    rm "$APPDIR/facedetectir_deepstream_reference_apps.zip"

    echo "editing configs"
    sed -i 's/input-dims=3;240;384;0/infer-dims=3;240;384/' "$FACEDETECTIR_CONFIGS_PATH/config_infer_primary_facedetectir.txt"
    sed -i 's+tlt-encoded-model=../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt+tlt-encoded-model=../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt+' "$FACEDETECTIR_CONFIGS_PATH/config_infer_primary_facedetectir.txt"
    sed -i 's+labelfile-path=labels_facedetectir.txt+labelfile-path=../../../models/tao_pretrained_models/facedetectir/labels.txt+' "$FACEDETECTIR_CONFIGS_PATH/config_infer_primary_facedetectir.txt"
    sed -i 's+int8-calib-file=../../models/tao_pretrained_models/facedetectir/facedetectir_int8.txt+int8-calib-file=../../../models/tao_pretrained_models/facedetectir/facedetectir_int8.txt+' "$FACEDETECTIR_CONFIGS_PATH/config_infer_primary_facedetectir.txt"
    sed -i 's+model-engine-file=../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine+model-engine-file=../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine+' "$FACEDETECTIR_CONFIGS_PATH/config_infer_primary_facedetectir.txt"
    sed -i 's/deepstream-6.0/deepstream-6.1/g' "$FACEDETECTIR_CONFIGS_PATH/deepstream_app_source1_facedetectir.txt"
    sed -i 's+model-engine-file=../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine+model-engine-file=../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine+' "$FACEDETECTIR_CONFIGS_PATH/deepstream_app_source1_facedetectir.txt"
    sed -i 's+uri=file://../../streams/sample_1080p_h265.mp4+uri=rtsp://youruri+' "$FACEDETECTIR_CONFIGS_PATH/deepstream_app_source1_facedetectir.txt"

else
  DEEPSTREAM_BASE=/opt/nvidia/deepstream/deepstream-6.1/samples

  echo "run docker container"

  xhost +

  if [[ $s_flag == 'true' ]];
  then
    docker run --gpus all -it --rm --network host -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -v "$APPDIR/configs/tao_pretrained_models":"$DEEPSTREAM_BASE/configs/tao_pretrained_models" -v "$APPDIR/models/tao_pretrained_models":"$DEEPSTREAM_BASE/models/tao_pretrained_models" -w /opt/nvidia/deepstream/deepstream-6.1/samples/models/tao_pretrained_models --entrypoint="" nvcr.io/nvidia/deepstream:6.1.1-triton bash
  else
    docker run --gpus all -it --rm --network host -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -v "$APPDIR/configs/tao_pretrained_models":"$DEEPSTREAM_BASE/configs/tao_pretrained_models" -v "$APPDIR/models/tao_pretrained_models":"$DEEPSTREAM_BASE/models/tao_pretrained_models" -w /opt/nvidia/deepstream/deepstream-6.1/samples/models/tao_pretrained_models --entrypoint="" nvcr.io/nvidia/deepstream:6.1.1-triton bash -c 'deepstream-app -c /opt/nvidia/deepstream/deepstream-6.1/samples/configs/tao_pretrained_models/facedetectir/deepstream_app_source1_facedetectir.txt'
  fi
fi

I’ve put the above code in a script.sh file in a new directory, then run it first to download/build:

./script.sh -b

Followed by actually running the docker container with everything ready:

./script.sh
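Before running the container, a quick sanity check that the model files landed where the configs expect them (the filenames are the ones unpacked by the build step above) can save a debugging round:

```shell
# Verify the unpacked FaceDetectIR files exist where the edited configs
# expect them; run from the same directory as script.sh
for f in resnet18_facedetectir_pruned.etlt facedetectir_int8.txt labels.txt; do
  if [ -f "models/tao_pretrained_models/facedetectir/$f" ]; then
    echo "OK: $f"
  else
    echo "MISSING: $f"
  fi
done
```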

My console output looks something like this:

dannie@da9e-linux:~/testing2$ sudo ./script.sh
run docker container
access control disabled, clients can connect from any host

(gst-plugin-scanner:18): GStreamer-WARNING **: 10:47:00.265: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_udp.so': librivermax.so.0: cannot open shared object file: No such file or directory

(gst-plugin-scanner:18): GStreamer-WARNING **: 10:47:00.297: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstchromaprint.so': libavcodec.so.58: cannot open shared object file: No such file or directory
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-6.1/lib/libnvds_nvmultiobjecttracker.so
~~ CLOG[include/modules/NvMultiObjectTracker/NvTrackerParams.hpp, getConfigRoot() @line 52]: [NvTrackerParams::getConfigRoot()] !!![WARNING] Invalid low-level config file caused an exception, but will go ahead with the default config values
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
~~ CLOG[include/modules/NvMultiObjectTracker/NvTrackerParams.hpp, getConfigRoot() @line 52]: [NvTrackerParams::getConfigRoot()] !!![WARNING] Invalid low-level config file caused an exception, but will go ahead with the default config values
[NvMultiObjectTracker] Initialized
0:00:01.044895181     1 0x7f9fbc002380 INFO                 nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1909> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.1/samples/configs/tao_pretrained_models/facedetectir/../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT input_1         3x240x384       
1   OUTPUT kFLOAT output_bbox/BiasAdd 4x15x24         
2   OUTPUT kFLOAT output_cov/Sigmoid 1x15x24         

0:00:01.086644639     1 0x7f9fbc002380 INFO                 nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2012> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.1/samples/configs/tao_pretrained_models/facedetectir/../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine
0:00:01.087806096     1 0x7f9fbc002380 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.1/samples/configs/tao_pretrained_models/facedetectir/config_infer_primary_facedetectir.txt sucessfully

Runtime commands:
	h: Print this help
	q: Quit

	p: Pause
	r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.

** INFO: <bus_callback:194>: Pipeline ready

** INFO: <bus_callback:180>: Pipeline running


**PERF:  FPS 0 (Avg)	
**PERF:  0.00 (0.00)	
** INFO: <bus_callback:180>: Pipeline running

**PERF:  24.01 (23.15)	
**PERF:  25.01 (24.16)	
**PERF:  25.03 (24.45)	
**PERF:  25.03 (24.59)	
**PERF:  25.02 (24.68)	
**PERF:  25.02 (24.73)	
**PERF:  25.01 (24.77)	
**PERF:  25.01 (24.80)	
**PERF:  25.04 (24.93)	
**PERF:  25.00 (24.94)	

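One thing worth noting about the “deserialized trt engine from …” line in the output above: DeepStream caches the TensorRT engine it builds next to the model files and reuses it on later runs. If the model files or batch size change, a stale engine can silently keep being loaded; removing it forces a rebuild on the next run:

```shell
# Remove the cached TensorRT engine so DeepStream rebuilds it on the next run.
# Run from the directory containing script.sh; the .engine file is
# regenerated automatically at startup, so this is safe.
rm -f models/tao_pretrained_models/facedetectir/*.engine
```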
The edited configs end up looking like this:
config_infer_primary_facedetectir.txt:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt
labelfile-path=../../../models/tao_pretrained_models/facedetectir/labels.txt
int8-calib-file=../../../models/tao_pretrained_models/facedetectir/facedetectir_int8.txt
model-engine-file=../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine
infer-dims=3;240;384
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
cluster-mode=2

#Use the config params below for dbscan clustering mode
#[class-attrs-all]
#detected-min-w=4
#detected-min-h=4
#minBoxes=3
#eps=0.7

#Use the config params below for NMS clustering mode
[class-attrs-all]
topk=20
nms-iou-threshold=0.5
pre-cluster-threshold=0.2

## Per class configurations
[class-attrs-0]
topk=20
nms-iou-threshold=0.5
pre-cluster-threshold=0.4

#[class-attrs-1]
#pre-cluster-threshold=0.05
#eps=0.7
#dbscan-min-score=0.5

and deepstream_app_source1_facedetectir.txt:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
num-sources=1
uri=rtsp://youruri
gpu-id=0

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=1
source-id=0
gpu-id=0

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial

[primary-gie]
enable=1
gpu-id=0
# Modify as necessary
model-engine-file=../../../models/tao_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=config_infer_primary_facedetectir.txt

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265 3=mpeg4
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=2000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
output-file=out.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[tracker]
enable=1
# For NvDCF and DeepSORT tracker, tracker-width and tracker-height must be a multiple of 32, respectively
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream-6.1/lib/libnvds_nvmultiobjecttracker.so
# ll-config-file required to set different tracker types
# ll-config-file=../deepstream-app/config_tracker_IOU.yml
ll-config-file=../deepstream-app/config_tracker_NvDCF_perf.yml
# ll-config-file=../deepstream-app/config_tracker_NvDCF_accuracy.yml
# ll-config-file=../deepstream-app/config_tracker_DeepSORT.yml
gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

[tests]
file-loop=0

Could you try the FaceDetect model as well?

From the FaceDetect — TAO Toolkit 3.22.05 documentation: “Compared to the FaceirNet model, this model gives better results with RGB images and smaller faces.”

@Morganh Thank you, FaceDetect does a great job. It seems it does better not only with RGB images and smaller faces, but also with IR images. I guess FaceDetectIR is being deprecated?

Enjoy my victory dance ;)

Glad to know it works now. Compared to the PeopleNet model, the FaceirNet model gives better results detecting large faces, such as faces in webcam images. Compared to the FaceirNet model, the FaceDetect model gives better results with RGB images and smaller faces.
