tlt-converter for custom-trained YOLOv4 model failed on Jetson Nano 4G

Is it because of a protobuf version mismatch? How can I check the protobuf version in my environment?
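One quick environment check looks like this (a shell sketch; note that tlt-converter most likely bundles its own libprotobuf, so the versions visible here may differ from what the tool actually uses):

```shell
# Check which protobuf compiler/runtime versions are visible in this env.
# Caveat: tlt-converter likely links its own libprotobuf, so these may
# differ from the version producing the parse error.
pb_compiler=$(protoc --version 2>/dev/null || echo "protoc not installed")
pb_python=$(python3 -c 'import google.protobuf as p; print(p.__version__)' \
            2>/dev/null || echo "python protobuf not installed")
echo "compiler: $pb_compiler"
echo "python:   $pb_python"
```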

I am afraid not. Can you add “-p Input,1x3x544x960,8x3x544x960,16x3x544x960” and retry?
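For reference, the -p argument supplies a dynamic-shape optimization profile for the named input; as I understand it, the comma-separated format is:

```
-p <input_name>,<min_shape>,<opt_shape>,<max_shape>
# e.g. -p Input,1x3x544x960,8x3x544x960,16x3x544x960
```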

Yes, I added it, but the error message is the same.

[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:3: Interpreting non ascii codepoint 200.
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format onnx2trt_onnx.ModelProto: 1:3: Message type "onnx2trt_onnx.ModelProto" has no field named "u".
Failed to parse ONNX model from file/tmp/fileZulMFL
[ERROR] Number of optimization profiles does not match model input node number.

On one Nano board, I can generate the TRT engine successfully.
Where did you download tlt-converter from?

$ ./tlt-converter -k nvidia_tlt -d 3,544,960 -e trt.fp16.engine -t fp16 -p Input,1x3x544x960,8x3x544x960,16x3x544x960 yolov4_resnet18.etlt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[INFO] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[INFO] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 544, 960)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile opt shape: (8, 3, 544, 960) for input: Input
[INFO] Using optimization profile max shape: (16, 3, 544, 960) for input: Input
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.

[INFO] Detected 1 inputs and 4 output network tensors.

https://docs.nvidia.com/tlt/tlt-user-guide/text/tensorrt.html#tlt-converter-matrix

I looked it up from this link; I will double-check whether this is the correct version!

It seems I used the wrong key: it should be nvidia_tlt, but I used tlt-encode! Silly mistake.

I ran your command on my Nano; it seems the memory issue appears again…

[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
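One thing worth trying, per the "Try increasing the workspace size" error: tlt-converter has a -w flag for the maximum workspace size in bytes (if your build supports it). A sketch of a retry, with an illustrative 1 GiB workspace and the max batch lowered to 2 to reduce memory pressure on a 4 GB Nano — adjust both numbers to fit:

```shell
# Hypothetical retry: larger workspace (-w, bytes) and a smaller max batch.
./tlt-converter -k nvidia_tlt -d 3,544,960 \
  -e trt.fp16.engine -t fp16 \
  -w 1073741824 \
  -p Input,1x3x544x960,1x3x544x960,2x3x544x960 \
  yolov4_resnet18.etlt
```

Note that on a 4 GB board a very large workspace can itself exhaust memory, so this is a balance rather than a guaranteed fix.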

BTW, did you activate jetson_clocks?

Yes, I ran it:
$ sudo nvpmodel -m 0
$ jetson_clocks
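To double-check that the mode actually took effect, nvpmodel can also query the current power mode:

```shell
$ sudo nvpmodel -q
```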

Which JetPack version did you install? What is the output of “$ dpkg -l | grep cuda”?

I have activated jetson_clocks and turned full power on.
Here is the dpkg -l | grep cuda output:

kai@kai-jetson:~/workspace/deepstream_tlt_apps/models/yolov4$ dpkg -l |grep cuda
ii  cuda-command-line-tools-10-2               10.2.89-1                                        arm64        CUDA command-line tools
ii  cuda-compiler-10-2                         10.2.89-1                                        arm64        CUDA compiler
ii  cuda-cudart-10-2                           10.2.89-1                                        arm64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-10-2                       10.2.89-1                                        arm64        CUDA Runtime native dev links, headers
ii  cuda-cufft-10-2                            10.2.89-1                                        arm64        CUFFT native runtime libraries
ii  cuda-cufft-dev-10-2                        10.2.89-1                                        arm64        CUFFT native dev links, headers
ii  cuda-cuobjdump-10-2                        10.2.89-1                                        arm64        CUDA cuobjdump
ii  cuda-cupti-10-2                            10.2.89-1                                        arm64        CUDA profiling tools runtime libs.
ii  cuda-cupti-dev-10-2                        10.2.89-1                                        arm64        CUDA profiling tools interface.
ii  cuda-curand-10-2                           10.2.89-1                                        arm64        CURAND native runtime libraries
ii  cuda-curand-dev-10-2                       10.2.89-1                                        arm64        CURAND native dev links, headers
ii  cuda-cusolver-10-2                         10.2.89-1                                        arm64        CUDA solver native runtime libraries
ii  cuda-cusolver-dev-10-2                     10.2.89-1                                        arm64        CUDA solver native dev links, headers
ii  cuda-cusparse-10-2                         10.2.89-1                                        arm64        CUSPARSE native runtime libraries
ii  cuda-cusparse-dev-10-2                     10.2.89-1                                        arm64        CUSPARSE native dev links, headers
ii  cuda-documentation-10-2                    10.2.89-1                                        arm64        CUDA documentation
ii  cuda-driver-dev-10-2                       10.2.89-1                                        arm64        CUDA Driver native dev stub library
ii  cuda-gdb-10-2                              10.2.89-1                                        arm64        CUDA-GDB
ii  cuda-libraries-10-2                        10.2.89-1                                        arm64        CUDA Libraries 10.2 meta-package
ii  cuda-libraries-dev-10-2                    10.2.89-1                                        arm64        CUDA Libraries 10.2 development meta-package
ii  cuda-license-10-2                          10.2.89-1                                        arm64        CUDA licenses
ii  cuda-memcheck-10-2                         10.2.89-1                                        arm64        CUDA-MEMCHECK
ii  cuda-misc-headers-10-2                     10.2.89-1                                        arm64        CUDA miscellaneous headers
ii  cuda-npp-10-2                              10.2.89-1                                        arm64        NPP native runtime libraries
ii  cuda-npp-dev-10-2                          10.2.89-1                                        arm64        NPP native dev links, headers
ii  cuda-nvcc-10-2                             10.2.89-1                                        arm64        CUDA nvcc
ii  cuda-nvdisasm-10-2                         10.2.89-1                                        arm64        CUDA disassembler
ii  cuda-nvgraph-10-2                          10.2.89-1                                        arm64        NVGRAPH native runtime libraries
ii  cuda-nvgraph-dev-10-2                      10.2.89-1                                        arm64        NVGRAPH native dev links, headers
ii  cuda-nvml-dev-10-2                         10.2.89-1                                        arm64        NVML native dev links, headers
ii  cuda-nvprof-10-2                           10.2.89-1                                        arm64        CUDA Profiler tools
ii  cuda-nvprune-10-2                          10.2.89-1                                        arm64        CUDA nvprune
ii  cuda-nvrtc-10-2                            10.2.89-1                                        arm64        NVRTC native runtime libraries
ii  cuda-nvrtc-dev-10-2                        10.2.89-1                                        arm64        NVRTC native dev links, headers
ii  cuda-nvtx-10-2                             10.2.89-1                                        arm64        NVIDIA Tools Extension
ii  cuda-repo-l4t-10-2-local-10.2.89           1.0-1                                            arm64        cuda repository configuration files
ii  cuda-samples-10-2                          10.2.89-1                                        arm64        CUDA example applications
ii  cuda-toolkit-10-2                          10.2.89-1                                        arm64        CUDA Toolkit 10.2 meta-package
ii  cuda-tools-10-2                            10.2.89-1                                        arm64        CUDA Tools meta-package
ii  graphsurgeon-tf                            7.1.3-1+cuda10.2                                 arm64        GraphSurgeon for TensorRT package
ii  libcudnn8                                  8.0.0.180-1+cuda10.2                             arm64        cuDNN runtime libraries
ii  libcudnn8-dev                              8.0.0.180-1+cuda10.2                             arm64        cuDNN development libraries and headers
ii  libcudnn8-doc                              8.0.0.180-1+cuda10.2                             arm64        cuDNN documents and samples
ii  libnvinfer-bin                             7.1.3-1+cuda10.2                                 arm64        TensorRT binaries
ii  libnvinfer-dev                             7.1.3-1+cuda10.2                                 arm64        TensorRT development libraries and headers
ii  libnvinfer-doc                             7.1.3-1+cuda10.2                                 all          TensorRT documentation
ii  libnvinfer-plugin-dev                      7.1.3-1+cuda10.2                                 arm64        TensorRT plugin libraries
ii  libnvinfer-plugin7                         7.1.3-1+cuda10.2                                 arm64        TensorRT plugin libraries
ii  libnvinfer-samples                         7.1.3-1+cuda10.2                                 all          TensorRT samples
ii  libnvinfer7                                7.1.3-1+cuda10.2                                 arm64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                       7.1.3-1+cuda10.2                                 arm64        TensorRT ONNX libraries
ii  libnvonnxparsers7                          7.1.3-1+cuda10.2                                 arm64        TensorRT ONNX libraries
ii  libnvparsers-dev                           7.1.3-1+cuda10.2                                 arm64        TensorRT parsers libraries
ii  libnvparsers7                              7.1.3-1+cuda10.2                                 arm64        TensorRT parsers libraries
ii  nvidia-container-csv-cuda                  10.2.89-1                                        arm64        Jetpack CUDA CSV file
ii  nvidia-container-csv-cudnn                 8.0.0.180-1+cuda10.2                             arm64        Jetpack CUDNN CSV file
ii  nvidia-container-csv-tensorrt              7.1.3.0-1+cuda10.2                               arm64        Jetpack TensorRT CSV file
ii  nvidia-l4t-cuda                            32.5.1-20210219084526                            arm64        NVIDIA CUDA Package
ii  python-libnvinfer                          7.1.3-1+cuda10.2                                 arm64        Python bindings for TensorRT
ii  python-libnvinfer-dev                      7.1.3-1+cuda10.2                                 arm64        Python development package for TensorRT
ii  python3-libnvinfer                         7.1.3-1+cuda10.2                                 arm64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                     7.1.3-1+cuda10.2                                 arm64        Python 3 development package for TensorRT
ii  tensorrt                                   7.1.3.0-1+cuda10.2                               arm64        Meta package of TensorRT
ii  uff-converter-tf                           7.1.3-1+cuda10.2                                 arm64        UFF converter for TensorRT package

I ran with the same CUDA/cuDNN/TensorRT versions as you.
Can you generate the TRT engine again and check the system status at the same time?
$ sudo tegrastats

When the problem happens, the RAM peak is 3289 MB:

RAM 3289/3963MB (lfb 82x4MB) SWAP 599/10173MB (cached 20MB) IRAM 0/252kB(lfb 252kB) CPU [5%@1479,2%@1479,0%@1479,4%@1479] EMC_FREQ 64%@1600 GR3D_FREQ 99%@921 VIC_FREQ 0%@192 APE 25 PLL@42C CPU@45C PMIC@100C GPU@43.5C AO@51C thermal@44C POM_5V_IN 6451/4307 POM_5V_GPU 3225/1299 POM_5V_CPU 489/910
RAM 1607/3963MB (lfb 165x4MB) SWAP 598/10173MB (cached 20MB) IRAM 0/252kB(lfb 252kB) CPU [10%@1479,3%@1479,1%@1479,39%@1479] EMC_FREQ 57%@1600 GR3D_FREQ 0%@921 VIC_FREQ 0%@192 APE 25 PLL@42C CPU@45C PMIC@100C GPU@42.5C AO@51C thermal@43.75C POM_5V_IN 3102/4299 POM_5V_GPU 167/1292 POM_5V_CPU 962/910
RAM 758/3963MB (lfb 192x4MB) SWAP 107/10173MB (cached 10MB) IRAM 0/252kB(lfb 252kB) CPU [33%@1479,0%@1479,1
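As a side note, the used/total RAM figures can be pulled out of a captured tegrastats line with a small shell sketch (my own helper, not part of tegrastats):

```shell
# Extract used/total RAM (in MB) from a captured tegrastats line.
line='RAM 3289/3963MB (lfb 82x4MB) SWAP 599/10173MB (cached 20MB)'
ram_used=$(echo "$line" | sed -n 's/^RAM \([0-9]*\)\/\([0-9]*\)MB.*/\1/p')
ram_total=$(echo "$line" | sed -n 's/^RAM \([0-9]*\)\/\([0-9]*\)MB.*/\2/p')
echo "used=${ram_used}MB total=${ram_total}MB"
```

With the peak line above this reports 3289 of 3963 MB used, i.e. the build is running very close to the Nano's physical memory limit.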

Can you try to generate the yolo_v3 model as well? I can run it successfully on my Nano.

$ ./tlt-converter -k nvidia_tlt -d 3,544,960 -e trt.fp16.engine -t fp16 -p Input,1x3x544x960,1x3x544x960,2x3x544x960 yolov3_resnet18.etlt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[INFO] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[INFO] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 544, 960)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile opt shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile max shape: (2, 3, 544, 960) for input: Input
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 4 output network tensors.
$ ls trt.fp16.engine
trt.fp16.engine

Please share your full log.

kai@kai-jetson:~/workspace/deepstream_tlt_apps/models/yolov3$ ./tlt-converter -k nvidia_tlt -d 3,544,960 -e trt.fp16.engine -t fp16 -p Input,1x3x544x960,1x3x544x960,2x3x544x960 yolov3_resnet18.etlt

[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[INFO] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[INFO] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 544, 960)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile opt shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile max shape: (2, 3, 544, 960) for input: Input
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

If possible, you can try re-flashing with JetPack 4.4 or 4.5, then run the above again to check whether it still happens.

OK, I will try on another board when I get time.

I tried rebuilding with -t fp32 and it succeeded! Indeed this is good progress, but I don’t know why. Do you have any thoughts on this?
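For reference, this was the same conversion with fp32 precision (reconstructed from the earlier invocation, so treat the exact line as illustrative):

```shell
# Same conversion, but building the engine in fp32 instead of fp16.
./tlt-converter -k nvidia_tlt -d 3,544,960 \
  -e trt.fp32.engine -t fp32 \
  -p Input,1x3x544x960,8x3x544x960,16x3x544x960 \
  yolov4_resnet18.etlt
```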

Which model did you build successfully in fp32 mode? Is it one of the models from deepstream_tlt_apps/download_models.sh (NVIDIA-AI-IOT/deepstream_tlt_apps on GitHub), your own model, or both?

Both were built successfully.