Failed to allocate the requested amount of GPU memory

tao-converter fails with a GPU memory allocation error.

Judging from the error, the GPU appears to have insufficient memory:

→ Failed to allocate the requested amount of GPU memory (8589934592 bytes).

The requested 8589934592 bytes is 8 GiB, more than the 6 GB of VRAM on a GTX 1660 SUPER.

I also confirmed that the recommended specs call for 34 GB or more of GPU memory.

So I plan to buy a GeForce RTX 3090 VENTUS 3X 24G OC.

Will tao-converter work with 24 GB of GPU memory?

■ graphics board

lspci | grep -i nvidia

0f:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1)

0f:00.1 Audio device: NVIDIA Corporation TU116 High Definition Audio Controller (rev a1)

0f:00.2 USB controller: NVIDIA Corporation TU116 USB 3.1 Host Controller (rev a1)

0f:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU116 [GeForce GTX 1650 SUPER] (rev a1)

■ error

make tao-convert-local

tao-converter -k nvidia_tlt \
-p input_1:0,1x288x384x3,32x288x384x3,32x288x384x3 \
-o heatmap_out/BiasAdd:0,conv2d_transpose_1/BiasAdd:0 -e model.engine -u 1 -m 8 -t fp16 model.etlt \
-w 4294967296

[INFO] [MemUsageChange] Init CUDA: CPU +309, GPU +0, now: CPU 321, GPU 605 (MiB)

[INFO] [MemUsageChange] Init builder kernel library: CPU +263, GPU +76, now: CPU 638, GPU 681 (MiB)

[WARNING] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars

[INFO] ----------------------------------------------------------------

[INFO] Input filename: /tmp/filecgm83e

[INFO] ONNX IR version: 0.0.5

[INFO] Opset version: 10

[INFO] Producer name: tf2onnx

[INFO] Producer version: 1.9.2

[INFO] Domain:

[INFO] Model version: 0

[INFO] Doc string:

[INFO] ----------------------------------------------------------------

[WARNING] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

[INFO] Detected input dimensions from the model: (-1, -1, -1, 3)

[INFO] Model has dynamic shape. Setting up optimization profiles.

[INFO] Using optimization profile min shape: (1, 288, 384, 3) for input: input_1:0

[INFO] Using optimization profile opt shape: (32, 288, 384, 3) for input: input_1:0

[INFO] Using optimization profile max shape: (32, 288, 384, 3) for input: input_1:0

Trying to use DLA core 1 on a platform that doesn't have any DLA cores

[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +463, GPU +198, now: CPU 1166, GPU 879 (MiB)

[INFO] [MemUsageChange] Init cuDNN: CPU +115, GPU +52, now: CPU 1281, GPU 931 (MiB)

[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.

[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.

[ERROR] 2: [virtualMemoryBuffer.cpp::resizePhysical::160] Error Code 2: OutOfMemory (no further information)

[ERROR] 2: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)

[WARNING] Requested amount of GPU memory (8589934592 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.

[WARNING] Skipping tactic 2 due to insufficient memory on requested size of 8589934592 detected for tactic 0x0000000000000002.

Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit(). 
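
Incidentally, the CUDA lazy loading warning above refers to the CUDA_MODULE_LOADING environment variable, available since CUDA 11.7. Enabling it lowers baseline device memory usage, though it is unlikely to make an 8 GiB allocation succeed on a 6 GB card. A minimal sketch:

# Enable lazy loading of CUDA modules (CUDA 11.7+): kernels are loaded
# on first use rather than eagerly, reducing device memory usage.
export CUDA_MODULE_LOADING=LAZY
# Re-run tao-converter in the same shell afterwards.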

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Yes, we suggest using a dGPU with more GPU memory.
For your current dGPU, you can run “./tao-converter -h” for help.

If you run into an out-of-memory issue, please decrease the batch size (-m) accordingly,
or increase the workspace size (-w). See the sketch below.
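
For example, a retry with the maximum batch size halved might look like this (a sketch only, not a verified command; the batch value of 4 is illustrative, and -u is dropped because the log shows this platform has no DLA cores):

# Hypothetical retry: max batch size lowered from 8 to 4, with the
# optimization profile's opt/max batch dimensions reduced to match.
# -u is omitted since the log reports no DLA cores on this platform.
tao-converter -k nvidia_tlt \
  -p input_1:0,1x288x384x3,4x288x384x3,4x288x384x3 \
  -o heatmap_out/BiasAdd:0,conv2d_transpose_1/BiasAdd:0 \
  -e model.engine -m 4 -t fp16 -w 4294967296 model.etlt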
