tao-converter does not run successfully.
Judging from the error, the GPU's specs appear to be insufficient:
→ Failed to allocate the requested amount of GPU memory (8589934592 bytes).
I also confirmed that the recommended specs call for 34GB or more of GPU memory.
I am therefore planning to buy a GeForce RTX 3090 VENTUS 3X 24G OC.
Will tao-converter run on a card with 24GB of GPU memory?
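For reference, total and free GPU memory on the current machine can be checked with a standard nvidia-smi query like the one below (field selection is just an example):

nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv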
■ Graphics card
lspci | grep -i nvidia
0f:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1)
0f:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)
0f:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)
0f:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] (rev a1)
■ Error
make tao-convert-local
tao-converter -k nvidia_tlt \
-p input_1:0,1x288x384x3,32x288x384x3,32x288x384x3 \
-o heatmap_out/BiasAdd:0,conv2d_transpose_1/BiasAdd:0 -e model.engine -u 1 -m 8 -t fp16 model.etlt \
-w 4294967296
[INFO] [MemUsageChange] Init CUDA: CPU +309, GPU +0, now: CPU 321, GPU 605 (MiB)
[INFO] [MemUsageChange] Init builder kernel library: CPU +263, GPU +76, now: CPU 638, GPU 681 (MiB)
[WARNING] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/filecgm83e
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.9.2
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, -1, -1, 3)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 288, 384, 3) for input: input_1:0
[INFO] Using optimization profile opt shape: (32, 288, 384, 3) for input: input_1:0
[INFO] Using optimization profile max shape: (32, 288, 384, 3) for input: input_1:0
Trying to use DLA core 1 on a platform that doesn't have any DLA cores
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +463, GPU +198, now: CPU 1166, GPU 879 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +115, GPU +52, now: CPU 1281, GPU 931 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.
[ERROR] 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)
[ERROR] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[WARNING] Requested amount of GPU memory (8589934592 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[WARNING] Skipping tactic 2 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000002.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
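Before replacing the card, there are a few things I could try on the current GPU based on the hints in the log above. A sketch of an adjusted command (the workspace value here is just an example, half of the original):

tao-converter -k nvidia_tlt \
-p input_1:0,1x288x384x3,32x288x384x3,32x288x384x3 \
-o heatmap_out/BiasAdd:0,conv2d_transpose_1/BiasAdd:0 \
-e model.engine -m 8 -t fp16 \
-w 2147483648 \
model.etlt

This drops -u 1, since the log warns that this GPU has no DLA cores and that message appears to come from the -u option, and it lowers the workspace from -w 4294967296 to 2147483648 bytes, as the final message suggests. Exporting CUDA_MODULE_LOADING=LAZY before running should also address the lazy-loading warning, and lowering the max batch in the -p optimization profile (e.g. 16 instead of 32) may further reduce memory pressure.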