TensorRT version: 8.5 and 8.6
CUDA: 11
GPU: A30
OS: CentOS 7
First, I converted a TensorFlow .pb model to ONNX, then used trtexec to convert the ONNX model to a TensorRT engine. trtexec got stuck for hours: GPU memory was sufficient, but GPU utilization stayed at 0%, so I finally killed the trtexec process.
The trtexec output:
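For context, the pb-to-ONNX step was presumably done with tf2onnx (the log below reports producer tf2onnx 1.8.4, opset 9). A minimal sketch of that step, assuming hypothetical file and output node names (only input_node:0 appears in my actual command):

```shell
# Hypothetical reconstruction of the pb -> ONNX conversion (tf2onnx 1.8.4, opset 9 per the log).
# The .pb path and the output node name are assumptions, not my exact invocation.
python -m tf2onnx.convert \
    --input ./verti_value.pb \
    --inputs input_node:0 \
    --outputs output_node:0 \
    --opset 9 \
    --output ./verti_value.onnx
```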
[05/31/2024-10:38:40] [I] SMs: 56
[05/31/2024-10:38:40] [I] Device Global Memory: 24258 MiB
[05/31/2024-10:38:40] [I] Shared Memory per SM: 164 KiB
[05/31/2024-10:38:40] [I] Memory Bus Width: 3072 bits (ECC enabled)
[05/31/2024-10:38:40] [I] Application Compute Clock Rate: 1.44 GHz
[05/31/2024-10:38:40] [I] Application Memory Clock Rate: 1.215 GHz
[05/31/2024-10:38:40] [I]
[05/31/2024-10:38:40] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[05/31/2024-10:38:40] [I]
[05/31/2024-10:38:40] [I] TensorRT version: 8.6.1
[05/31/2024-10:38:40] [I] Loading standard plugins
[05/31/2024-10:38:40] [I] [TRT] [MemUsageChange] Init CUDA: CPU +180, GPU +0, now: CPU 185, GPU 294 (MiB)
[05/31/2024-10:38:49] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU -4294965169, GPU +308, now: CPU 1691, GPU 602 (MiB)
[05/31/2024-10:38:49] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[05/31/2024-10:38:49] [I] Start parsing network model.
[05/31/2024-10:38:49] [I] [TRT] ----------------------------------------------------------------
[05/31/2024-10:38:49] [I] [TRT] Input filename: ./verti_value.onnx
[05/31/2024-10:38:49] [I] [TRT] ONNX IR version: 0.0.4
[05/31/2024-10:38:49] [I] [TRT] Opset version: 9
[05/31/2024-10:38:49] [I] [TRT] Producer name: tf2onnx
[05/31/2024-10:38:49] [I] [TRT] Producer version: 1.8.4
[05/31/2024-10:38:49] [I] [TRT] Domain:
[05/31/2024-10:38:49] [I] [TRT] Model version: 0
[05/31/2024-10:38:49] [I] [TRT] Doc string:
[05/31/2024-10:38:49] [I] [TRT] ----------------------------------------------------------------
[05/31/2024-10:38:50] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/31/2024-10:38:50] [W] [TRT] onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[05/31/2024-10:38:50] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[05/31/2024-10:38:50] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[05/31/2024-10:38:50] [I] Finished parsing network model. Parse time: 0.289144
[05/31/2024-10:38:50] [I] [TRT] Graph optimization time: 0.0334055 seconds.
[05/31/2024-10:38:50] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/31/2024-10:40:36] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[05/31/2024-10:40:36] [I] [TRT] Detected 1 inputs and 1 output network tensors.
^C
The trtexec command I used:
tensorrt86/bin/trtexec --onnx=./verti_value.onnx --explicitBatch --minShapes=input_node:0:1x48x1x1 --optShapes=input_node:0:16x48x16x1 --maxShapes=input_node:0:32x48x32x1 --shapes=input_node:0:16x48x16x1 --saveEngine=./verti.rt
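One thing I noticed in the log is the warning that CUDA lazy loading is not enabled. A variant of the command that enables it (requires CUDA 11.7+) and adds verbose logging to show where the builder spends its time; this is only a diagnostic sketch, not a confirmed fix for the hang:

```shell
# Enable CUDA lazy loading, as suggested by the trtexec warning (needs CUDA 11.7 or newer).
export CUDA_MODULE_LOADING=LAZY

# Same conversion as before, with --verbose so the builder prints per-layer/tactic progress,
# which may reveal which step it is stuck on.
tensorrt86/bin/trtexec --onnx=./verti_value.onnx \
    --minShapes=input_node:0:1x48x1x1 \
    --optShapes=input_node:0:16x48x16x1 \
    --maxShapes=input_node:0:32x48x32x1 \
    --shapes=input_node:0:16x48x16x1 \
    --saveEngine=./verti.rt \
    --verbose
```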
I don't know how to share my verti_value.onnx file with you so that you can reproduce the problem.