FaceDetect IR Training using TLT 3.0 and Custom Dataset

Hi,

Hardware - DGPU

GPU - Tesla T4

I am trying to train a new model for face detection using my own custom dataset. While browsing the NVIDIA Developer website I found the Getting Started with TLT 3.0 documentation for custom model training with a Jupyter notebook and the related instructions: Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation

Before starting with the instructions I installed all the prerequisites such as Docker, the NVIDIA Container Toolkit and the Jupyter notebook requirements:

  • Installation Guide — NVIDIA Cloud Native Technologies documentation
  • Run Docker as a non-root user – The Geek Diary
  • Container Toolkit - https://psnow.ext.hpe.com/doc/a00094832enw
  • Virtual Environment - Virtualenv with Virtualenvwrapper on Ubuntu | by Aditya Chhabra | Medium

These are the TLT Jupyter notebook instructions I followed to train the model on the WIDER FACE dataset: https://ngc.nvidia.com/catalog/resources/nvidia:tlt_cv_samples

All the steps were completed as listed below: ✔️

  • Set up env variables, map drives and install dependencies
  • Prepare dataset and pre-trained model
  • Provide training specification
  • Run TLT training
  • Evaluate the trained model
  • Prune the trained model
  • Retrain the pruned model
  • Evaluate the retrained model
  • Visualize inferences
  • Deploy
  • Verify Deployed Model

These are the folders created as outputs in the tlt-experiments directory.

According to the documentation, the final trained model is saved in the experiment_dir_final directory, which contains two files:

  • resnet18_detector.etlt
  • resnet18_detector.trt

Queries

  1. How can I use this trained model on a live camera feed or an input video file so that it detects faces?

  2. This is the config file with the pretrained model that I currently run on the live camera feed: /opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models/deepstream_app_source1_facedetectir.txt
    I run this file using the command deepstream-app -c deepstream_app_source1_facedetectir.txt

  3. Is there any way I can generate an INT8 calibration file (int8.txt) for the trained model? The pretrained model I downloaded from NGC came with one.

  4. How do I convert the model and generate the engine file?

  5. How do I use a custom dataset? Is there any guide for labelling and preparing datasets?

I would appreciate it if anyone could guide me through this.

Thanks!

  1. Please refer to DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation. End users can run inference via DeepStream.
  2. Yes, you can refer to them. The files include config_infer_primary_facedetectir.txt, deepstream_app_source1_facedetectir.txt and labels_facedetectir.txt.
  3. Refer to DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation and DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation. The example can be found at tlt_cv_samples_v1.0.2/detectnet_v2/detectnet_v2.ipynb too.
  4. See DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation. The example can be found at section 10.B of tlt_cv_samples_v1.0.2/detectnet_v2/detectnet_v2.ipynb too.
  5. Refer to section 2 of tlt_cv_samples_v1.0.2/detectnet_v2/detectnet_v2.ipynb or tlt_cv_samples_v1.0.2/facenet/facenet.ipynb; a minimal sketch of the expected label format is shown below.
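For illustration only (the file names, paths and box coordinates below are placeholders, not taken from the notebook verbatim): the detectnet_v2/facenet notebooks expect labels in KITTI format, one text file per image, which is then converted to TFRecords with the dataset_convert sub-command.

    # example KITTI label file for one image, e.g. 000001.txt (placeholder values);
    # one line per face, with the 3D fields zeroed since only 2D boxes are used
    face 0.00 0 0.00 134.00 208.00 255.00 341.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    face 0.00 0 0.00 412.00 190.00 480.00 270.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

    # convert the KITTI-format dataset to TFRecords before training (placeholder paths)
    !tlt detectnet_v2 dataset_convert \
        -d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
        -o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval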

Hi @Morganh,

I am not able to figure out the proper way to generate the INT8 calibration file, as there are no instructions on how to export it in facenet.ipynb.

These are the files in the pretrained model I downloaded from NGC; it was tested and worked perfectly with video files, IP cameras and webcam streams.


I want to generate similar files for the custom trained model, but I have not been able to create them.

Can you please guide me on how to carry out these steps for facenet? The detectnet_v2.ipynb was too advanced for my understanding as I am new to DeepStream.

Facenet is based on the DetectNet_v2 network, so please refer to section 10.A of tlt_cv_samples_v1.0.2/detectnet_v2/detectnet_v2.ipynb.

The DetectNet_v2 model supports INT8 inference mode in TensorRT. In order to use INT8 mode, we must calibrate the model to run 8-bit inferences:

  • Generate a calibration tensorfile from the training data using detectnet_v2 calibration_tensorfile.
  • Use the export command to generate the INT8 calibration table.

!tlt detectnet_v2 calibration_tensorfile -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
                                         -m 10 \
                                         -o $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor

!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt
!rm -rf $LOCAL_EXPERIMENT_DIR/experiment_dir_final/calibration.bin
!tlt detectnet_v2 export \
    -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
    -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
    -k $KEY \
    --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
    --data_type int8 \
    --batches 10 \
    --batch_size 4 \
    --max_batch_size 4 \
    --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
    --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
    --verbose
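As a side note (this is only a sketch based on the "Visualize inferences" section of the same notebook; the spec file name and paths are placeholders), the exported model can be sanity-checked on a folder of test images before moving to DeepStream:

    # run inference with the exported model on a directory of test images (placeholder paths)
    !tlt detectnet_v2 inference -e $SPECS_DIR/detectnet_v2_inference_kitti_etlt.txt \
                                -o $USER_EXPERIMENT_DIR/etlt_infer_testing \
                                -i $DATA_DOWNLOAD_DIR/testing/image_2 \
                                -k $KEY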

Hi @Morganh,

These are the files that were generated after running the commands mentioned above.

And the last file, resnet18_detector_trt.int8, shows 0 bytes.

How do I run this model with deepstream-app -c deepstream_app_source1_facedetectir.txt?

Please share your full command and full log.

Hi @Morganh,

The files mentioned below are present in tlt_pretrained_models; I have used them earlier for running the FaceDetectIR pretrained model.

glueck@gluecktx2DS5:/opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models$ cat config_infer_primary_facedetectir.txt

    [property]
    gpu-id=0
    net-scale-factor=0.0039215697906911373
    tlt-model-key=tlt_encode
    tlt-encoded-model=../../models/tlt_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt
    labelfile-path=labels_facedetectir.txt
    int8-calib-file=../../models/tlt_pretrained_models/facedetectir/facedetectir_int8.txt
    model-engine-file=../../models/tlt_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine
    input-dims=3;240;384;0
    uff-input-blob-name=input_1
    batch-size=1
    process-mode=1
    model-color-format=0
    ## 0=FP32, 1=INT8, 2=FP16 mode
    network-mode=1
    num-detected-classes=1
    interval=0
    gie-unique-id=1
    output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

    [class-attrs-all]
    pre-cluster-threshold=0.2
    group-threshold=1
    ## Set eps=0.7 and minBoxes for cluster-mode=1(DBSCAN)
    eps=0.2
    #minBoxes=3

glueck@gluecktx2DS5:/opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models$ cat deepstream_app_source1_facedetectir.txt

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1

[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=2
num-sources=1
#uri=file://../../streams/sample_1080p_h265.mp4
uri=rtsp://root:Glueck321@10.0.1.36/axis-media/media.amp?streamprofile=H264
gpu-id=0

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial

[primary-gie]
enable=1
gpu-id=0
# Modify as necessary
model-engine-file=../../models/tlt_pretrained_models/facedetectir/resnet18_facedetectir_pruned.etlt_b1_gpu0_int8.engine
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=config_infer_primary_facedetectir.txt

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265 3=mpeg4
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=2000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
output-file=out.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[tracker]
enable=1
tracker-width=640
tracker-height=384
#ll-lib-file=/opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_mot_iou.so
#ll-lib-file=/opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_nvdcf.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_mot_klt.so
#ll-config-file required for DCF/IOU only
ll-config-file=../deepstream-app/tracker_config.yml
#ll-config-file=iou_config.txt
gpu-id=0
#enable-batch-process applicable to DCF only
enable-batch-process=1

[tests]
file-loop=1

I usually run the file using the command:
deepstream-app -c deepstream_app_source1_facedetectir.txt
This is the output where the face is detected:

Also, while running this I see a warning in the console that INT8 is not supported, trying FP16.

I want to run the same file with the custom model that I trained, in either INT8 or FP16. The cell below is the only command mentioned in the notebook for converting; there is no exporting step.



I followed the detectnet_v2 notebook and generated the calibration.bin, and now I want to know how I can test on a real-time stream with either INT8 or FP16.

These files were generated by just replacing the paths in the cells to point to the trained resnet18_detector.etlt in the facenet directory in tlt-experiments.

As mentioned above, facenet is actually based on detectnet_v2, so all the commands in detectnet_v2 can be used for facenet.
After training, you already have a .tlt file. Then see FaceDetect IR Training using TLT 3.0 and Custom Dataset - #4 by Morganh; this is the exporting step. It will generate the calibration.bin file and the resnet18_detector.etlt file.
You can copy these two files onto the device where you want to run inference.
Then, in the nvinfer config, set:
int8-calib-file=calibration.bin
tlt-encoded-model=your_etlt_file
(a sketch of the surrounding [property] section is shown below)
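For example (a minimal sketch only; the key, directory name, input dimensions and engine file name are placeholders for your own setup), the [property] section of config_infer_primary_facedetectir.txt could be adapted along these lines:

    [property]
    gpu-id=0
    net-scale-factor=0.0039215697906911373
    # key passed with -k during tlt detectnet_v2 export (placeholder)
    tlt-model-key=yourkey
    # exported custom model and its INT8 calibration cache (placeholder paths)
    tlt-encoded-model=../../models/custom_facedetect/resnet18_detector.etlt
    int8-calib-file=../../models/custom_facedetect/calibration.bin
    # if this engine does not exist yet, DeepStream builds and saves it from the .etlt on first run
    model-engine-file=../../models/custom_facedetect/resnet18_detector.etlt_b1_gpu0_int8.engine
    # label file listing the classes the custom model was trained on
    labelfile-path=labels_facedetectir.txt
    # must match the training resolution in your spec file (placeholder)
    input-dims=3;416;736;0
    uff-input-blob-name=input_1
    batch-size=1
    process-mode=1
    model-color-format=0
    ## 0=FP32, 1=INT8, 2=FP16 mode
    network-mode=1
    num-detected-classes=1
    gie-unique-id=1
    output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid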

If you are running on a Nano, which does not support INT8, the log may prompt “INT8 not supported by platform”.

Furthermore, if you run inference on a Nano, then as mentioned in “A. Generate TensorRT engine” (“for the Jetson devices, please download the converter for Jetson from the dev zone link”), please download tlt-converter on the Nano and run it against the etlt file; it will generate a .trt engine file (set -t fp16). If run against the etlt file together with the calibration.bin file, it will generate an INT8 .trt engine file (set -t int8).
This is the option for deployment. See DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation. A sketch of both tlt-converter invocations is shown below.
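For illustration only (the input dimensions, key and output file names below are placeholders; -d must match the training resolution in your spec file), the two tlt-converter invocations could look like this:

    # FP16 engine built from the .etlt alone
    ./tlt-converter -k $KEY \
        -o output_cov/Sigmoid,output_bbox/BiasAdd \
        -d 3,416,736 \
        -m 1 \
        -t fp16 \
        -e resnet18_detector_fp16.trt \
        resnet18_detector.etlt

    # INT8 engine built from the .etlt plus the calibration.bin generated during export
    ./tlt-converter -k $KEY \
        -o output_cov/Sigmoid,output_bbox/BiasAdd \
        -d 3,416,736 \
        -m 1 \
        -t int8 \
        -c calibration.bin \
        -e resnet18_detector_int8.trt \
        resnet18_detector.etlt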

Hey @Morganh ,

Got the custom trained model working by following the steps mentioned above.

Thanks for your support on this post.

Hi @Morganh ,

The facenet custom model is now working using the steps you mentioned.

Queries :

  1. I already have trained gender and age Caffe models and their respective prototxts. Is there a way to use those Caffe models as secondary models in the same face detection file (deepstream_app_source1_facedetectir.txt)?

  2. Should I also create calib.bin files for the Caffe models? If yes, how can that be done? Is there any script for creating calibration.bin files for these models?

For 1), your facedetectir model can work as the primary engine, and your gender/age Caffe model can work as a secondary engine. A similar scenario can be seen in deepstream-5.0/samples/configs/deepstream-app/config_infer_secondary_vehicletypes.txt; a sketch is shown below.
For 2), TLT does not create a calib.bin file for any Caffe model.
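For illustration only (the config file name, model paths, label file, output blob name and threshold are placeholders based on your own Caffe model), a secondary classifier is usually wired up with an extra nvinfer config plus a [secondary-gie0] section in the deepstream-app config:

    # config_infer_secondary_gender.txt (placeholder name)
    [property]
    gpu-id=0
    net-scale-factor=1
    model-file=../../models/gender/gender.caffemodel
    proto-file=../../models/gender/deploy.prototxt
    mean-file=../../models/gender/mean.ppm
    labelfile-path=labels_gender.txt
    batch-size=16
    # 0=FP32, 1=INT8, 2=FP16 (FP32 here since no calibration file is available)
    network-mode=0
    # classify objects produced by the primary face detector
    process-mode=2
    is-classifier=1
    operate-on-gie-id=1
    operate-on-class-ids=0
    gie-unique-id=2
    output-blob-names=prob
    classifier-threshold=0.51

    # added to deepstream_app_source1_facedetectir.txt
    [secondary-gie0]
    enable=1
    gpu-id=0
    batch-size=16
    gie-unique-id=2
    operate-on-gie-id=1
    operate-on-class-ids=0
    config-file=config_infer_secondary_gender.txt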

Hi @Morganh,

I found the scenario where vehicletypes runs as secondary. In my scenario, I want to run our existing gender model, which is in Caffe. It has the model file and the prototxt file as mentioned in the vehicle scenario, but the only file missing is mean.ppm, as there is only a mean.binaryproto in the Caffe model that I have. What can be done to generate the mean file suitable for the secondary classifier?

If you are using the Caffe model as the secondary classifier, this is actually not a TLT topic; I see that you have already created a topic in the DeepStream forum for help. Please also refer to Mean file "mean.ppm" of deepstream - #5 by Amycao, which talks about how to generate the .ppm file.
More search results can be found at Search results for 'mean.ppm #intelligent-video-analytics:deepstream-sdk order:latest_topic' - NVIDIA Developer Forums