TLT model issues

• Hardware Platform (Jetson / GPU) - GeForce RTX 3090
• DeepStream Version - 5.1
• TensorRT Version - 7.2.2 (libnvinfer_plugin.so built from github 21.02 tag)
• NVIDIA GPU Driver Version (valid for GPU only) - 460.32.03
• Issue Type( questions, new requirements, bugs) - Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I am experiencing a number of issues running the TLT pre-trained models. I have attached a .tar.gz file that can be used to reproduce them. The archive contains:

  • docker-compose.yaml: defines the Docker container built from the Dockerfile and mounts the ./models and ./configs directories.
  • Dockerfile: builds the environment from nvcr.io/nvidia/deepstream:5.1-21.02-devel. It builds cmake v3.13.5, then libnvinfer_plugin.so from the 21.02 tag of the nvidia/TensorRT GitHub repo, then libnvds_infercustomparser_tlt.so from the release/tlt3.0 branch of the NVIDIA-AI-IOT/deepstream_tlt_apps GitHub repo.
  • configs directory: contains config files modified from the “tlt_pretrained_models” configs included with DeepStream. Incorrect paths have been fixed, the config_infer_primary_*.txt files have been modified to use libnvds_infercustomparser_tlt.so, and a deepstream_app_source1_detection_*.txt file has been added for each model, based on deepstream_app_source1_detection_models.txt.

Ensure nvidia-container-runtime is installed on the host machine and configured as the default Docker runtime (my host machine is running Ubuntu 18.04). Extract the attached archive. The TLT models must then be downloaded and extracted into the models directory. They are available at https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip, which is linked from the "Transfer Learning Toolkit (TLT) Integration with DeepStream" page of the DeepStream 5.1 Release documentation.

Run the “./start.sh” script to build and run the container. Run each of the deepstream_app_source1_*.txt config files with “deepstream-app -c filename.txt”.
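
To avoid typing each command by hand, the per-config runs could be scripted. This is only a minimal sketch: the helper name build_commands is made up for illustration, and it assumes the config files sit in the current directory and that deepstream-app is on the PATH inside the container.

```python
import fnmatch
import os


def build_commands(filenames, pattern="deepstream_app_source1_*.txt"):
    """Return one `deepstream-app -c <file>` command per matching config name."""
    configs = sorted(f for f in filenames if fnmatch.fnmatch(f, pattern))
    return [["deepstream-app", "-c", name] for name in configs]


if __name__ == "__main__":
    # Assumption: run from the directory that holds the config files.
    for cmd in build_commands(os.listdir(".")):
        # Print rather than execute, so the list can be reviewed first.
        print(" ".join(cmd))
```

Each printed line can then be run one at a time, which keeps the output of a crashing config separate from the next one.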

Every model/config has issues. Below are the issues I observe with each config:

deepstream_app_source1_detection_dssd.txt: works but misses a lot of detections and labels are wrong

deepstream_app_source1_detection_frcnn.txt:

NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:33 [TRT]: UffParser: Could not read buffer.
parseModel: Failed to parse UFF model
ERROR: tlt/tlt_decode.cpp:274 failed to build network since parsing model errors.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:797 Failed to create network using custom network creation function
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:862 Failed to get cuda engine from custom library API
0:00:01.095817111    91 0x55d362a84e00 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1736> [UID = 1]: build engine file failed
Segmentation fault (core dumped)

deepstream_app_source1_peoplenet.txt: same errors as deepstream_app_source1_detection_frcnn.txt

deepstream_app_source1_detection_retinanet.txt: runs for a few seconds and then aborts with:

deepstream-app: nvdsinfer_custombboxparser_tlt.cpp:81: bool NvDsInferParseCustomNMSTLT(const std::vector<NvDsInferLayerInfo>&, const NvDsInferNetworkInfo&, const NvDsInferParseDetectionParams&, std::vector<NvDsInferObjectDetectionInfo>&): Assertion `(int) det[1] < out_class_size' failed.
Aborted (core dumped)

deepstream_app_source1_detection_ssd.txt: same error as deepstream_app_source1_detection_retinanet.txt

deepstream_app_source1_detection_yolov3.txt: makes detections but behaves like a classification model instead of outputting detection boxes

deepstream_app_source1_detection_yolov4.txt: makes detections but behaves like a classification model instead of outputting detection boxes
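
For the retinanet and ssd assertion above, one possible reading is that the parser received a class id that is not smaller than the number of classes it was configured with. The sketch below only mirrors the `(int) det[1] < out_class_size` condition from the message. The row layout and the idea that out_class_size comes from num-detected-classes are assumptions, not something confirmed in this thread.

```python
def out_of_range_class_ids(detections, num_detected_classes):
    """Return class ids that would fail a `det[1] < out_class_size` style check.

    Assumption: index 1 of each row is the class id, mirroring `det[1]`
    in the assertion message.
    """
    return [int(det[1]) for det in detections
            if int(det[1]) >= num_detected_classes]


# Hypothetical rows: [image_id, class_id, confidence, xmin, ymin, xmax, ymax]
sample = [
    [0, 1, 0.90, 10, 10, 50, 50],
    [0, 5, 0.80, 20, 20, 60, 60],
]

# With 3 configured classes, class id 5 is out of range.
print(out_of_range_class_ids(sample, num_detected_classes=3))
```

If that reading is right, a num-detected-classes value or label file that does not match the model would be the first thing to check.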

I would appreciate any help in solving these issues. Let me know if you need any further information.

Thanks

nvidia-tlt.tar.gz (8.1 KB)

Hey customer,
You should get all the nvinfer config files and labels (ssd/dssd/retinanet/frcnn/yolov3/yolov4) from deepstream_tlt_apps/configs at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tlt_apps · GitHub

For yolov3/yolov4, we cannot support them on TRT 7.2; see GitHub - NVIDIA-AI-IOT/deepstream_tlt_apps at release/tlt3.0

For peoplenet, I will check locally.

For peopleNet, you are using peopleSegNet_resnet50.etlt in your config_infer_primary_peoplenet.txt, which is not correct. You should use resnet34_peoplenet_pruned.etlt; refer to /opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models/README:

mkdir -p ../../models/tlt_pretrained_models/peoplenet && \
    wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplenet/versions/pruned_v2.0/files/resnet34_peoplenet_pruned.etlt \
    -O ../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt

Your config_infer_primary_peoplenet.txt currently has the peoplenet model commented out and the peopleSegNet model active:

#tlt-encoded-model=../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt
tlt-encoded-model=../../models/tlt_pretrained_models/peopleSegNet/peopleSegNet_resnet50.etlt
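
A quick way to see which model a config actually loads is to parse it and print the active tlt-encoded-model entry, since lines starting with # are ignored. This is a rough sketch using Python's standard configparser. It assumes the nvinfer config is a simple key=value file with a [property] section, as in this thread, and the function name active_model is made up for illustration.

```python
import configparser


def active_model(config_text):
    """Return the uncommented tlt-encoded-model value from the [property] section."""
    # Only "=" separates keys from values; disable "%" interpolation.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)
    return parser["property"]["tlt-encoded-model"]


snippet = """
[property]
#tlt-encoded-model=../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt
tlt-encoded-model=../../models/tlt_pretrained_models/peopleSegNet/peopleSegNet_resnet50.etlt
"""

# The commented-out peoplenet line is skipped, so the peopleSegNet path is printed.
print(active_model(snippet))
```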

Hello,

Thank you for your help. I now have most of them working using those config files; however, I am still having issues with peoplenet and unet:

deepstream_app_source1_peoplenet.txt: I have downloaded the model you specified but get the following errors:

0:00:09.186980123    19 0x5559cb4cae00 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
ERROR: ../nvdsinfer/nvdsinfer_func_utils.cpp:33 [TRT]: UffParser: Unsupported number of graph 0
parseModel: Failed to parse UFF model
ERROR: tlt/tlt_decode.cpp:274 failed to build network since parsing model errors.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:797 Failed to create network using custom network creation function
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:862 Failed to get cuda engine from custom library API
0:00:09.242426481    19 0x5559cb4cae00 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1736> [UID = 1]: build engine file failed
Segmentation fault (core dumped)

deepstream_app_source1_detection_unet.txt: runs but makes no detections. I ran the following to generate the engine file:

tlt-converter -e models/unet/unet_resnet18.etlt_b1_gpu0_fp16.engine -p input_1,1x3x608x960,1x3x608x960,1x3x608x960 -t fp16 -k tlt_encode -m 1 tlt_encode models/unet/unet_resnet18.etlt
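
The -p argument in the command above appears to be the input name followed by three shapes, taken here to be the minimum, optimal and maximum shapes of the optimization profile. As a small sketch under that assumption, the string could be built like this (profile_arg is a hypothetical helper, not part of tlt-converter):

```python
def profile_arg(input_name, min_shape, opt_shape=None, max_shape=None):
    """Build an `<name>,<min>,<opt>,<max>` string for a fixed or dynamic shape.

    When opt_shape or max_shape is omitted, min_shape is reused, which
    gives a fixed-shape profile like the one in the command above.
    """
    def fmt(dims):
        return "x".join(str(d) for d in dims)

    shapes = [min_shape, opt_shape or min_shape, max_shape or min_shape]
    return ",".join([input_name] + [fmt(s) for s in shapes])


print(profile_arg("input_1", (1, 3, 608, 960)))
```

For the input above this reproduces the value passed to -p in my command.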

I have attached an updated .tar.gz file with the new configuration files. I would appreciate your help on these remaining issues.

On the subject of yolov3/yolov4, is there a fix in the works for this? When will it be ready?

Thanks again,
Lee

nvidia-tlt.tar.gz (9.0 KB)

Hey customer, good to know most models can work!

For the remaining issues:
1. peoplenet: let me check.
2. Unet: the current deepstream-app cannot support unet. You need to run it using the ds-tlt app; refer to GitHub - NVIDIA-AI-IOT/deepstream_tlt_apps: Sample apps to demonstrate how to deploy models trained with TLT on DeepStream

For segmentation model:
Usage: ds-tlt  config_file <file1> [file2] ... [fileN]

3. For yolov3/4, we will release new etlt models which can run well with TRT 7.2.

For peoplenet, your nvinfer config file is still not correct. Keep in mind that peoplenet is a different model from peopleSegNet; you should refer to /opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models/config_infer_primary_peoplenet.txt:


[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt
labelfile-path=labels_peoplenet.txt
model-engine-file=../../models/tlt_pretrained_models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
input-dims=3;544;960;0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=3
cluster-mode=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

[class-attrs-all]
pre-cluster-threshold=0.4
## Set eps=0.7 and minBoxes for cluster-mode=1(DBSCAN)
eps=0.7
minBoxes=1
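
In the config above, the model-engine-file name lines up with the other settings: batch-size=1 gives _b1, gpu-id=0 gives _gpu0, and network-mode=2 gives _fp16. A rough consistency check could look like the sketch below. The naming pattern is inferred only from the file names in this thread, and expected_engine_path is a made-up helper name.

```python
import configparser

# Mapping taken from the "0=FP32, 1=INT8, 2=FP16" comment in the config.
PRECISION = {"0": "fp32", "1": "int8", "2": "fp16"}


def expected_engine_path(config_text):
    """Derive `<model>_b<batch>_gpu<id>_<precision>.engine` from [property]."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)
    prop = parser["property"]
    return "{}_b{}_gpu{}_{}.engine".format(
        prop["tlt-encoded-model"],
        prop["batch-size"],
        prop["gpu-id"],
        PRECISION[prop["network-mode"]],
    )


def engine_name_matches(config_text):
    """True when model-engine-file equals the name derived from the other keys."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str
    parser.read_string(config_text)
    return parser["property"]["model-engine-file"] == expected_engine_path(config_text)
```

A mismatch would not necessarily be an error, but it is a cheap hint that the engine file entry was copied from a config for a different model or precision.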

Thank you for your help, that solved my issues.