On my Nano board, I can generate the TRT engine successfully.
Where did you download tlt-converter?
$ ./tlt-converter -k nvidia_tlt -d 3,544,960 -e trt.fp16.engine -t fp16 -p Input,1x3x544x960,8x3x544x960,16x3x544x960 yolov4_resnet18.etlt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
(the warning above is repeated 20 more times)
[INFO] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[INFO] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 544, 960)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile opt shape: (8, 3, 544, 960) for input: Input
[INFO] Using optimization profile max shape: (16, 3, 544, 960) for input: Input
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 4 output network tensors.
I ran your command on my Nano, and the memory issue appears again…
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
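One thing worth trying after this failure: rebuild with a larger builder workspace and a smaller optimization profile, which reduces the memory the tactic search needs. A sketch only, assuming your tlt-converter build supports the `-w` (max workspace size, in bytes) flag; the 1 GiB value and the reduced 1/1/2 profile are example settings, not tested ones:

```shell
# Retry with ~1 GiB workspace (-w) and a 1/1/2 batch profile instead of 1/8/16.
./tlt-converter -k nvidia_tlt \
    -d 3,544,960 \
    -w 1073741824 \
    -p Input,1x3x544x960,1x3x544x960,2x3x544x960 \
    -t fp16 \
    -e trt.fp16.engine \
    yolov4_resnet18.etlt
```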
I have activated jetson_clocks and turned on full power mode.
`dpkg -l | grep cuda` output:
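Beyond jetson_clocks, a 4 GB Nano often also needs extra swap for host-side allocations during engine building. A minimal sketch (the 4 GB size and the /swapfile path are arbitrary examples, and these commands modify the system, so adjust before running):

```shell
# Create and enable a 4 GB swap file, then confirm it is active.
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h
```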
kai@kai-jetson:~/workspace/deepstream_tlt_apps/models/yolov4$ dpkg -l |grep cuda
ii cuda-command-line-tools-10-2 10.2.89-1 arm64 CUDA command-line tools
ii cuda-compiler-10-2 10.2.89-1 arm64 CUDA compiler
ii cuda-cudart-10-2 10.2.89-1 arm64 CUDA Runtime native Libraries
ii cuda-cudart-dev-10-2 10.2.89-1 arm64 CUDA Runtime native dev links, headers
ii cuda-cufft-10-2 10.2.89-1 arm64 CUFFT native runtime libraries
ii cuda-cufft-dev-10-2 10.2.89-1 arm64 CUFFT native dev links, headers
ii cuda-cuobjdump-10-2 10.2.89-1 arm64 CUDA cuobjdump
ii cuda-cupti-10-2 10.2.89-1 arm64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-10-2 10.2.89-1 arm64 CUDA profiling tools interface.
ii cuda-curand-10-2 10.2.89-1 arm64 CURAND native runtime libraries
ii cuda-curand-dev-10-2 10.2.89-1 arm64 CURAND native dev links, headers
ii cuda-cusolver-10-2 10.2.89-1 arm64 CUDA solver native runtime libraries
ii cuda-cusolver-dev-10-2 10.2.89-1 arm64 CUDA solver native dev links, headers
ii cuda-cusparse-10-2 10.2.89-1 arm64 CUSPARSE native runtime libraries
ii cuda-cusparse-dev-10-2 10.2.89-1 arm64 CUSPARSE native dev links, headers
ii cuda-documentation-10-2 10.2.89-1 arm64 CUDA documentation
ii cuda-driver-dev-10-2 10.2.89-1 arm64 CUDA Driver native dev stub library
ii cuda-gdb-10-2 10.2.89-1 arm64 CUDA-GDB
ii cuda-libraries-10-2 10.2.89-1 arm64 CUDA Libraries 10.2 meta-package
ii cuda-libraries-dev-10-2 10.2.89-1 arm64 CUDA Libraries 10.2 development meta-package
ii cuda-license-10-2 10.2.89-1 arm64 CUDA licenses
ii cuda-memcheck-10-2 10.2.89-1 arm64 CUDA-MEMCHECK
ii cuda-misc-headers-10-2 10.2.89-1 arm64 CUDA miscellaneous headers
ii cuda-npp-10-2 10.2.89-1 arm64 NPP native runtime libraries
ii cuda-npp-dev-10-2 10.2.89-1 arm64 NPP native dev links, headers
ii cuda-nvcc-10-2 10.2.89-1 arm64 CUDA nvcc
ii cuda-nvdisasm-10-2 10.2.89-1 arm64 CUDA disassembler
ii cuda-nvgraph-10-2 10.2.89-1 arm64 NVGRAPH native runtime libraries
ii cuda-nvgraph-dev-10-2 10.2.89-1 arm64 NVGRAPH native dev links, headers
ii cuda-nvml-dev-10-2 10.2.89-1 arm64 NVML native dev links, headers
ii cuda-nvprof-10-2 10.2.89-1 arm64 CUDA Profiler tools
ii cuda-nvprune-10-2 10.2.89-1 arm64 CUDA nvprune
ii cuda-nvrtc-10-2 10.2.89-1 arm64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-10-2 10.2.89-1 arm64 NVRTC native dev links, headers
ii cuda-nvtx-10-2 10.2.89-1 arm64 NVIDIA Tools Extension
ii cuda-repo-l4t-10-2-local-10.2.89 1.0-1 arm64 cuda repository configuration files
ii cuda-samples-10-2 10.2.89-1 arm64 CUDA example applications
ii cuda-toolkit-10-2 10.2.89-1 arm64 CUDA Toolkit 10.2 meta-package
ii cuda-tools-10-2 10.2.89-1 arm64 CUDA Tools meta-package
ii graphsurgeon-tf 7.1.3-1+cuda10.2 arm64 GraphSurgeon for TensorRT package
ii libcudnn8 8.0.0.180-1+cuda10.2 arm64 cuDNN runtime libraries
ii libcudnn8-dev 8.0.0.180-1+cuda10.2 arm64 cuDNN development libraries and headers
ii libcudnn8-doc 8.0.0.180-1+cuda10.2 arm64 cuDNN documents and samples
ii libnvinfer-bin 7.1.3-1+cuda10.2 arm64 TensorRT binaries
ii libnvinfer-dev 7.1.3-1+cuda10.2 arm64 TensorRT development libraries and headers
ii libnvinfer-doc 7.1.3-1+cuda10.2 all TensorRT documentation
ii libnvinfer-plugin-dev 7.1.3-1+cuda10.2 arm64 TensorRT plugin libraries
ii libnvinfer-plugin7 7.1.3-1+cuda10.2 arm64 TensorRT plugin libraries
ii libnvinfer-samples 7.1.3-1+cuda10.2 all TensorRT samples
ii libnvinfer7 7.1.3-1+cuda10.2 arm64 TensorRT runtime libraries
ii libnvonnxparsers-dev 7.1.3-1+cuda10.2 arm64 TensorRT ONNX libraries
ii libnvonnxparsers7 7.1.3-1+cuda10.2 arm64 TensorRT ONNX libraries
ii libnvparsers-dev 7.1.3-1+cuda10.2 arm64 TensorRT parsers libraries
ii libnvparsers7 7.1.3-1+cuda10.2 arm64 TensorRT parsers libraries
ii nvidia-container-csv-cuda 10.2.89-1 arm64 Jetpack CUDA CSV file
ii nvidia-container-csv-cudnn 8.0.0.180-1+cuda10.2 arm64 Jetpack CUDNN CSV file
ii nvidia-container-csv-tensorrt 7.1.3.0-1+cuda10.2 arm64 Jetpack TensorRT CSV file
ii nvidia-l4t-cuda 32.5.1-20210219084526 arm64 NVIDIA CUDA Package
ii python-libnvinfer 7.1.3-1+cuda10.2 arm64 Python bindings for TensorRT
ii python-libnvinfer-dev 7.1.3-1+cuda10.2 arm64 Python development package for TensorRT
ii python3-libnvinfer 7.1.3-1+cuda10.2 arm64 Python 3 bindings for TensorRT
ii python3-libnvinfer-dev 7.1.3-1+cuda10.2 arm64 Python 3 development package for TensorRT
ii tensorrt 7.1.3.0-1+cuda10.2 arm64 Meta package of TensorRT
ii uff-converter-tf 7.1.3-1+cuda10.2 arm64 UFF converter for TensorRT package
Can you try to generate the yolo_v3 model as well? I can run it successfully on my Nano.
$ ./tlt-converter -k nvidia_tlt -d 3,544,960 -e trt.fp16.engine -t fp16 -p Input,1x3x544x960,1x3x544x960,2x3x544x960 yolov3_resnet18.etlt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
(the warning above is repeated 11 more times)
[INFO] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[INFO] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 544, 960)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile opt shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile max shape: (2, 3, 544, 960) for input: Input
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 4 output network tensors.
$ ls trt.fp16.engine
trt.fp16.engine
kai@kai-jetson:~/workspace/deepstream_tlt_apps/models/yolov3$ ./tlt-converter -k nvidia_tlt -d 3,544,960 -e trt.fp16.engine -t fp16 -p Input,1x3x544x960,1x3x544x960,2x3x544x960 yolov3_resnet18.etlt
[WARNING] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
(the warning above is repeated 11 more times)
[INFO] ModelImporter.cpp:135: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[INFO] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[INFO] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMSDynamic_TRT
[INFO] Detected input dimensions from the model: (-1, 3, 544, 960)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile opt shape: (1, 3, 544, 960) for input: Input
[INFO] Using optimization profile max shape: (2, 3, 544, 960) for input: Input
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] ../builder/cudnnBuilderUtils.cpp (414) - Cuda Error in findFastestTactic: 98 (invalid device function)
[WARNING] GPU memory allocation error during getBestTactic: BatchedNMS_N
[ERROR] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node BatchedNMS_N.)
[ERROR] Unable to create engine
Segmentation fault (core dumped)
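For reference, a quick back-of-envelope comparison of the two max profiles helps explain why the 16-batch YOLOv4 build is so much hungrier than the 2-batch YOLOv3 one: the input tensor, and roughly the activation memory behind it, scales linearly with batch size. A minimal sketch (the function name is mine; the shapes come from the `-p` arguments in the commands above):

```python
# Back-of-envelope: FP16 input-tensor size for the max-profile shapes used above.
# Intermediate activations inside the network scale similarly with batch size,
# so the 16-batch profile demands far more builder memory than the 2-batch one.

def fp16_tensor_mib(shape):
    """Size in MiB of an FP16 tensor with the given NCHW shape."""
    n = 1
    for d in shape:
        n *= d
    return n * 2 / 2**20  # 2 bytes per FP16 element

yolov4_max = fp16_tensor_mib((16, 3, 544, 960))  # max profile from the yolov4 command
yolov3_max = fp16_tensor_mib((2, 3, 544, 960))   # max profile from the yolov3 command
print(f"yolov4 max-batch input: {yolov4_max:.2f} MiB")
print(f"yolov3 max-batch input: {yolov3_max:.2f} MiB")
```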