TAO and Jetson (JetPack compatibility)

Please provide the following information when requesting support.

Summary:
I followed the TAO guide and produced an exported model that I was supposed to be able to deploy and run on my Jetson, but it doesn't work!

• Hardware (T4/V100/Xavier/Nano/etc)
TAO (the versions I've tried are listed below) on DGX Station A100 | Jetson AGX Xavier (JP 4.6.1)


dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.05   # [not working]
published_date: 05/25/2022

dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02   # [not working]
published_date: 02/28/2022

dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11   # [not working]
published_date: 11/08/2021


When I try to run it in DeepStream I get:

ERROR: [TRT]: UffParser: Validator error: FirstDimTile_4: Unsupported operation _BatchTilePlugin_TRT
parseModel: Failed to parse UFF model

I think it has to do with the BatchTilePlugin [TensorRT/plugin/batchTilePlugin at master · NVIDIA/TensorRT · GitHub],
which I believe has been around for quite a while (clearly not something introduced in TensorRT 8.2.5), so I'm confused about why this is not working. Is there some other trick I'm missing here?

• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
Resnet-18 SSD (pretrained_object_detection_vresnet18-2)

• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)

• Training spec file(If have, please share here)

ssd_train_resnet18_kitti.txt (1.6 KB)

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

generated_model.zip (51.2 MB)

Can you show the full command and full log? Thanks.

Hi @Morganh ,
Thanks a lot for getting back!
This is the output of the last run. I get the same _BatchTilePlugin_TRT error regardless of whether the export is FP16 or INT8.
My input batch size is 1 for this application, so I set the batch size to 1.

Export command for FP16 (run on the DGX Station):

!rm -rf $LOCAL_EXPERIMENT_DIR/export
!mkdir -p $LOCAL_EXPERIMENT_DIR/export
!tao ssd export --gpu_index=$GPU_INDEX \
                -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_020.tlt \
                -k $KEY \
                -o $USER_EXPERIMENT_DIR/export/ganindu_ssd_resnet18_epoch_020.etlt \
                -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
                --batch_size 1 \
                --data_type fp16 \
                --gen_ds_config

Export command for INT8 (run on the DGX Station):

!tao ssd export --gpu_index=$GPU_INDEX \
                -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_020.tlt \
                -o $USER_EXPERIMENT_DIR/export/ganindu_ssd_resnet18_epoch_020.etlt \
                -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
                -k $KEY \
                --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
                --data_type int8 \
                --batch_size 1 \
                --batches 100 \
                --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                --gen_ds_config

Full output from DeepStream (run on the Jetson):

[input]

python <my_pipeline.py> (the pipeline itself is fine; I've run other models with it successfully. All I've done is point the working pipeline at a new model.)

[output]

pygst initialized..
creating pipeline..
Creating Source 
 
Creating H264Parser 

Creating Decoder 

creating EGLSink

Starting to run pipeline

Using winsys: x11 
Opening in BLOCKING MODE 
0:00:00.279364031 13004   0x558de40870 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:00.280231237 13004   0x558de40870 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 1]: Trying to create engine from model files
ERROR: [TRT]: UffParser: Validator error: FirstDimTile_4: Unsupported operation _BatchTilePlugin_TRT
parseModel: Failed to parse UFF model
ERROR: Failed to build network, error in model parsing.
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:02.090183217 13004   0x558de40870 ERROR                nvinfer gstnvinfer.cpp:632:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1934> [UID = 1]: build engine file failed
Segmentation fault (core dumped)

Please refer to SSD - NVIDIA Docs

A TensorRT OSS build is required for SSD models, because several TensorRT plugins needed by these models are only available in the TensorRT open source repo and not in the general TensorRT release. Specifically, for SSD, we need the batchTilePlugin and NMSPlugin.

Please follow it or https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/TRT-OSS

Hi @Morganh,
Thanks a lot for getting back to me. That actually helped; I think the instructions on GitHub are more up to date.
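
For reference, the build itself boiled down to roughly the following on the Xavier (this is from memory of the TRT-OSS README for JP 4.6.1 / TensorRT 8.2; the release/8.2 branch, GPU_ARCHS=72 for Xavier and the library path are what applied to my setup, so please check the README for yours):

# Build the TensorRT OSS plugin library on the Jetson (a sketch of the README steps, not verbatim)
git clone -b release/8.2 https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git submodule update --init --recursive
mkdir -p build && cd build
cmake .. -DGPU_ARCHS=72 -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/ \
         -DCMAKE_C_COMPILER=/usr/bin/gcc -DTRT_BIN_DIR=`pwd`/out
make nvinfer_plugin -j$(nproc)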

The only change I made was to the CMakeLists.txt file, adding the search path for the cub headers:

include_directories(
    ${CUDA_INCLUDE_DIRS}
    ${CUDNN_ROOT_DIR}/include
    ${CMAKE_CURRENT_SOURCE_DIR}/third_party/cub  # add this line to make it build 
)

Then, after doing the install step (copying the library and running ldconfig), the engine is built successfully.
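
For completeness, the install bit was nothing more than the following (the 8.2.1/8.2.3 version numbers match my JP 4.6.1 install and the library I built; adjust to whatever versions you have):

cd /usr/lib/aarch64-linux-gnu/
# keep the stock plugin library around as a backup, then drop in the freshly built one
sudo mv libnvinfer_plugin.so.8.2.1 libnvinfer_plugin.so.8.2.1.backup
sudo cp <path-to-TensorRT-OSS-build>/libnvinfer_plugin.so.8.2.3 .
sudo ldconfig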

After that, I honestly think the scope of this question ends, so thank you a lot!!

[PART 2] I understand this is moving the goalposts, so apologies in advance (in case this needs to go elsewhere).

Having said all of the above, it seems like the custom parser library sees something different from what it expects (so I'm not sure the model conversion is happening correctly).

One of my output tensors is a 1x1x1 tensor. I'm using the NVIDIA boilerplate ResNet-18 example network (as explained and attached above). If I could run Netron on this, I believe I'd see tensors shaped [num_bboxes x num_classes x 1] and [num_bboxes x num_offsets x 1].

This is really confusing; I'm not sure whether the conversion didn't go right or I'm doing something wrong!
I've now attached the .etlt file, the newly generated engine file,
and the newly built plugins for your reference.

Sample output:

Using winsys: x11 
Opening in BLOCKING MODE 
INFO: [Implicit Engine Info]: layers num: 3
0   INPUT  kFLOAT Input           3x300x300       
1   OUTPUT kFLOAT NMS             1x200x7         
2   OUTPUT kFLOAT NMS_1           1x1x1           

Layer name  NMS index = 0 channels 1 height 200 width 7
Layer name  NMS_1 index = 1 channels 1 height 1 width 1

0:00:00.276793336 10583   0x5588f9bc70 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
0:00:03.629417803 10583   0x5588f9bc70 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/home/nvidia/Workspace/models/ganindu_ssd_resnet18_epoch_020.etlt_b1_gpu0_fp16.engine
0:00:03.646114807 10583   0x5588f9bc70 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /home/nvidia/Workspace/models/ganindu_ssd_resnet18_epoch_020.etlt_b1_gpu0_fp16.engine
0:00:03.691759465 10583   0x5588f9bc70 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-inference> [UID 1]: Load new model:ssd300_pgie_config.txt sucessfully
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
Could not find bbox layer buffer while parsing
0:00:03.973225788 10583   0x5588f94de0 ERROR                nvinfer gstnvinfer.cpp:632:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::fillDetectionOutput() <nvdsinfer_context_impl_output_parsing.cpp:726> [UID = 1]: Failed to parse bboxes using custom parse function
Segmentation fault (core dumped)


[I am using JP 4.6.1]
libnvinfer_plugin.so.8.2.1.backup (14.7 MB)
libnvinfer_plugin.so.8.2.3 (18.5 MB)
ganindu_ssd_resnet18_epoch_020.etlt_b1_gpu0_fp16.engine (27.4 MB)
labels.txt (34 Bytes)
ganindu_ssd_resnet18_epoch_020.etlt (51.2 MB)
nvinfer_config.txt (297 Bytes)

Cheers,
Ganindu.

Hi @Morganh, the deepstream_tao_apps/nvdsinfer_custombboxparser_tlt.cpp file (at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub) had all the answers I was looking for!
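
In short, the two output layers are not per-class tensors at all: NMS is a single [1 x 200 x 7] table of detections and NMS_1 is just the [1 x 1 x 1] keep count. A rough sketch of what the parser does with those buffers (simplified from my reading of that file; the struct and function names below are illustrative, not the actual DeepStream API):

#include <vector>

// Illustrative detection record, not the real NvDsInfer struct.
struct Detection {
    int   classId;
    float confidence;
    float x1, y1, x2, y2;   // normalised [0, 1] box corners
};

// nms:       the "NMS" output buffer, keep-count rows of 7 floats each
// keepCount: the "NMS_1" output buffer, a single int
std::vector<Detection> parseNmsOutputs(const float *nms, const int *keepCount,
                                       float confThreshold)
{
    std::vector<Detection> dets;
    for (int i = 0; i < *keepCount; ++i) {
        const float *d = nms + i * 7;
        // Each row is [image_id, class_id, confidence, xmin, ymin, xmax, ymax].
        if (d[2] < confThreshold)
            continue;
        dets.push_back({static_cast<int>(d[1]), d[2], d[3], d[4], d[5], d[6]});
    }
    return dets;
}

So on my side the fix was simply pointing the nvinfer config at that parser (parse-bbox-func-name=NvDsInferParseCustomNMSTLT and custom-lib-path set to the library built from that repo, if I've read the file right); the earlier "Could not find bbox layer buffer while parsing" error was just a parser written for a different output layout not finding the layers it expected.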

Cheers,
Ganindu!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.