Tlt-converter for custom trained YoloV4 model failed on Jetson Nano 4G

1733208392 · June 24, 2021, 2:31am

I trained a YoloV4 model and am trying to convert it to a TensorRT engine file on my Jetson Nano. The full command is:

tlt-converter -k tlt-encode \
              -d 3,608,608 \
              -o BatchedNMS \
              -e trt1.engine \
              -m 2 \
              -t fp16 \
              -i nchw \
              -p Input,1x3x608x608,1x3x608x608,2x3x608x608 \
              -w 1610612736 \
              yolov4_cspdarknet19_epoch_055.etlt

The error output is:

[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] Unable to create engine

Morganh · June 24, 2021, 3:17am

This seems to be an out-of-memory issue.
-m sets the maximum TensorRT engine batch size (default 16). If you hit an out-of-memory issue, decrease the batch size accordingly.
Please decrease -m to 1 and retry.

1733208392 · June 24, 2021, 3:46am

Unfortunately, it still fails, so I suspect there may be other issues. How can I track them down?

tlt-converter -k tlt-encode \
              -d 3,608,608 \
              -o BatchedNMS \
              -e trt1.engine \
              -m 1 \
              -t fp16 \
              -i nchw \
              -p Input,1x3x608x608,1x3x608x608,1x3x608x608 \
              -w 1000000000 \
              yolov4_cspdarknet19_epoch_055.etlt

Morganh · June 24, 2021, 6:04am

Please remove -p Input,1x3x608x608,1x3x608x608,1x3x608x608.
Refer to the yolo_v4 Jupyter notebook.

1733208392 · June 24, 2021, 6:09am

When I removed the -p line, I got the following error:

[INFO] Detected input dimensions from the model: (-1, 3, 608, 608)
[ERROR] Model has dynamic shape but no optimization profile specified.

Morganh · June 24, 2021, 6:57am

For TLT 3.0-py3 version, “-p” is needed for yolo_v4. See YOLOv4 — Transfer Learning Toolkit 3.0 documentation

To narrow this down, can you run tlt-converter successfully on the machine where you ran the training?
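
For reference, the -p value used earlier in this thread has the form input_name,min_shape,opt_shape,max_shape, with each shape written as NxCxHxW. Below is a minimal sketch of a helper that assembles such a string. The function name make_profile is hypothetical, and the input name "Input" is taken from the first post's command rather than verified against the model.

```shell
# Build a -p optimization profile string for tlt-converter.
# Usage: make_profile <input_name> <c> <h> <w> <min_batch> <opt_batch> <max_batch>
# make_profile is a hypothetical helper, not part of tlt-converter.
make_profile() {
    local name=$1 c=$2 h=$3 w=$4 min=$5 opt=$6 max=$7
    echo "${name},${min}x${c}x${h}x${w},${opt}x${c}x${h}x${w},${max}x${c}x${h}x${w}"
}

# Reproduces the profile from the first post:
# Input,1x3x608x608,1x3x608x608,2x3x608x608
make_profile Input 3 608 608 1 1 2
```

The result can be passed as -p "$(make_profile Input 3 608 608 1 1 1)" when -m is 1, so that the maximum batch in the profile matches the engine batch size.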

1733208392 · June 24, 2021, 7:01am

Yes, I successfully ran tlt-converter from the Jupyter notebook on a server with -p added. However, the resulting engine file does not work on the Jetson, which is why I am trying to run tlt-converter on the Jetson Nano itself, so far without success.

Here is the error message from running my app:

NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1798> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-5.1/samples/models/tlt_pretrained_models/firenet/trt.engine failed, try rebuild
0:00:07.345111513 27444     0x39be5670 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
ERROR: failed to build network since there is no model file matched.
ERROR: failed to build network.

Morganh · June 24, 2021, 7:04am

So it is still an OOM issue. Can you check whether the following works?

restart Nano
increase “-w”
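
The -w values in this thread (for example 1610612736) look like byte counts, so it may help to derive them from a size in MiB and to check how much memory is free after the restart. This is a rough sketch under that assumption. The helper name mib_to_bytes is hypothetical.

```shell
# Convert a workspace size in MiB to a byte count for -w.
# mib_to_bytes is a hypothetical helper; it assumes -w is given in bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# 1536 MiB (1.5 GiB) gives 1610612736, the value used in the first post.
mib_to_bytes 1536

# Show currently available memory in MiB, if the free utility is present.
if command -v free >/dev/null 2>&1; then
    free -m
fi
```

On a 4 GB Nano the CPU and GPU share the same memory, so a workspace larger than what free reports as available is unlikely to help.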

1733208392 · June 24, 2021, 7:14am

I used -w 2130000000. The memory usage is shown below.

BTW, I have booted the Jetson into text mode, so the initial memory usage is only 0.4 GB.

I am running it now and will let you know the result in a few minutes.

1733208392 · June 24, 2021, 7:25am

Seems we made a little progress …

[ERROR] /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (460) - Cuda Error in loadKernel: 702 (the launch timed out and was terminated)
[ERROR] ../rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 702 (the launch timed out and was terminated)
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception

Morganh · June 24, 2021, 7:44am

Here is one experiment to try. I suggest training a yolo_v4 model with a smaller input_size, for example 128x128.
You can train for just 1 epoch, then export the tlt model to an etlt model. Next, copy the etlt model to the Nano and run tlt-converter again.

1733208392 · June 24, 2021, 8:00am

Thanks for your advice. I will give it a try!

1733208392 · June 24, 2021, 10:33am

Doesn’t seem to work :(

1733208392 · June 24, 2021, 1:07pm

Is there a way to run tlt-converter on the server to build the engine for the Jetson Nano? The Jetson Nano is too limited in resources.

Morganh · June 24, 2021, 1:16pm

If you run inference on the Nano, it is recommended to generate the TRT engine on the Nano to avoid a TRT version mismatch error.
Can you try more experiments for yolo_v4?
Please download the models; see https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/blob/master/download_models.sh

# For Faster-RCNN / YoloV3 / YoloV4 /SSD / DSSD / RetinaNet/ UNET/:
# wget https://nvidia.box.com/shared/static/i1cer4s3ox4v8svbfkuj5js8yqm3yazo.zip -O models.zip

Try running tlt-converter against these models, which were trained by NVIDIA.
Their key is nvidia_tlt and their input size is 960x544.

1733208392 · June 24, 2021, 1:23pm

Ok let me play with them

1733208392 · June 25, 2021, 12:37am

I downloaded the models and tried tlt-converter on yolov4_resnet18.etlt. The command I used is:

tlt-converter -k tlt-encode  \
                    -d 3,384,1248 \
                    -o BatchedNMS \
                    -e trt.fp16.engine \
                    -t fp16 \
                    -i nchw \
                    -m 8 \
                    yolov4_resnet18.etlt

The error message is as follows:

[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:3: Interpreting non ascii codepoint 200.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:3: Message type "onnx2trt_onnx.ModelProto" has no field named "u".
Failed to parse ONNX model from file/tmp/fileQlEezP
[INFO] Model has no dynamic shape.
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Morganh · June 25, 2021, 12:39am

Please set to -d 3,544,960

1733208392 · June 25, 2021, 12:44am

Nope, the result is the same

Morganh · June 25, 2021, 1:02am

Can you add “-p” option?
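
Pulling together the suggestions made in this thread for the downloaded model (key nvidia_tlt, -d 3,544,960, and adding -p), the command might be assembled as sketched below. The input name "Input" and the batch size of 1 are assumptions carried over from the earlier posts, and the thread does not confirm whether this resolves the parse error.

```shell
# Values suggested in the thread for the NVIDIA-trained models.
KEY=nvidia_tlt            # key stated for the downloaded models
DIMS=3,544,960            # C,H,W for a 960x544 input
# Assumption: the input tensor is named "Input", as in the custom model's command.
PROFILE=Input,1x3x544x960,1x3x544x960,1x3x544x960

# build_cmd is a hypothetical helper that only prints the command line.
build_cmd() {
    echo tlt-converter -k "$KEY" -d "$DIMS" -o BatchedNMS \
         -e trt.fp16.engine -t fp16 -i nchw -m 1 \
         -p "$PROFILE" yolov4_resnet18.etlt
}

# Print the command for review before running it on the Nano.
build_cmd
```

Printing the command first makes it easy to compare the key and dimensions against the ones used in the failing run.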
