TRT5.0: Memory error when building engine

blancpaques · October 16, 2018, 2:13pm

Hi all,

I wanted to give TensorRT a quick try and ran into the following errors when building the engine from a UFF graph:

[TensorRT] ERROR: Tensor: Conv_0/Conv2D at max batch size of 80 exceeds the maximum element count of 2147483647.
To solve this problem I had to reduce the builder max_batch_size parameter to 50 or so. Note that this is much less than the maximum batch size I am able to run using TensorFlow (around 200 before encountering an OutOfMemory error). Why is that so?
(The convolution the error refers to is a 3x3x1x64 convolution on patches of size 100x100.)
[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2
I have had this error several times and have absolutely no clue what was causing it. One way of getting around it was to reduce the max_workspace_size parameter of the builder to, say, a third of the total GPU memory (5 GB on a P100 with 16 GB).

All in all I am not sure that I fully grasped what is behind these max_batch_size and max_workspace_size parameters. Any hints would be greatly appreciated.

Thanks

Edit: using TRT 5.0.0.10 with Cuda 9.0 and CUDNN 7.3

NVES · October 16, 2018, 6:04pm

Hello,

"Cuda Error in allocate: 2" usually indicates that the API call failed because it was unable to allocate enough memory to perform the requested operation.

When you referenced running tensorflow, was it on the same GPU?

blancpaques · October 16, 2018, 6:47pm

Yes, the exact same GPU.
This error was when building the engine, exactly when calling

engine = builder.build_cuda_engine(network)

The returned value is None and the error mentioned above is logged.

blancpaques · October 18, 2018, 9:43am

Please find enclosed a small .zip file with minimal set of dependencies to debug. Everything is pretty much explained in the main python script so I will be brief here. The zip contains:

A .pb file (tensorflow model export)
Converted uff graph
Tensor RT serialized engine
Python script which can be used to: run tensorflow on the reference inputs, build the TRT engine, run the TRT engine and compare with TF results
(The zip also contains reference inputs and outputs as NumPy arrays, but these are of no use here.)

First of all, I have noticed that if I run TensorFlow and then build the TRT engine IN THE SAME PYTHON PROCESS by launching the script with the "all" option, then I systematically get “[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2”.
Playing around with the max_batch_size, patch_size and max_workspace_size_gb parameters in the main Python file also results in the errors described above (exceeding the maximum element count, and Cuda Error in allocate).

Example

max_batch_size = 200
[TensorRT] ERROR: Tensor: Conv_0/Conv2D at max batch size of 200 exceeds the maximum element count of 2147483647

Example (running on a P100 with 16 GB of memory)

max_workspace_size_gb = 8
[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2
[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2
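
For context, the two settings varied in the examples above correspond to the max_batch_size and max_workspace_size attributes of the TensorRT 5 Python builder. The sketch below is only an illustration, not the code from the attached script: workspace_bytes and configure_builder are hypothetical helper names, and it assumes the script's max_workspace_size_gb value is converted to bytes before being assigned. The builder is passed in as an argument, so the helper itself does not import TensorRT.

```python
def workspace_bytes(size_gb):
    """Convert a workspace size in GiB to bytes (the unit assumed for max_workspace_size)."""
    return int(size_gb * (1 << 30))


def configure_builder(builder, max_batch_size, max_workspace_size_gb):
    """Apply the two build-time limits discussed in the thread to a builder-like object."""
    # Largest batch size the built engine must be able to run.
    builder.max_batch_size = max_batch_size
    # Upper bound on the scratch GPU memory that layer implementations may request.
    builder.max_workspace_size = workspace_bytes(max_workspace_size_gb)
    return builder
```

With a real builder this would presumably be called as configure_builder(trt.Builder(logger), 50, 5) before builder.build_cuda_engine(network); 50 and 5 are the values reported to work earlier in the thread, and logger is assumed to be an existing trt.Logger.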

Thanks for your help
test_nvidia.zip (2.49 MB)

NVES · October 19, 2018, 9:25pm

Thanks for the repro. I don’t see an "all" option. (Do I just call run_reference_tensorflow(), build_trt_engine(), run_trt_engine() inline?)

Also, I’m getting the following error:

root@4639a43cf129:/home/scratch.zhenyih_sw/reproduce.2421196/test_nvidia# python test_trt_for_nvidia.py -o build_trt
[TensorRT] INFO: Creating uff graph
Traceback (most recent call last):
  File "test_trt_for_nvidia.py", line 263, in <module>
    runner[args.o]()
  File "test_trt_for_nvidia.py", line 138, in build_trt_engine
    subprocess.check_call(['convert-to-uff', '-o', uff_file, tf_pb_file])
  File "/usr/lib/python2.7/subprocess.py", line 536, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

root@4639a43cf129:/home/scratch.zhenyih_sw/reproduce.2421196/test_nvidia# convert-to-uff

I have tf_pb_file = 'best_deploy.pb' and uff_file = 'best_deploy.uff', but don’t have convert-to-uff. I’m running from a TRT5 container. Are you running directly on metal/host?

NVES · October 20, 2018, 6:16am

Hello, I’ve repro’d it now on DGX P100 16GB GPUs. Triaging now, and will keep you updated.

blancpaques · October 20, 2018, 10:17am

OK, I don’t know if this is still relevant, but to answer your previous questions:

Yes indeed, I messed up the .zip, sorry about that. As you figured out, the ‘all’ option, which you don’t have, is simply:

run_reference_tensorflow()
build_trt_engine()
run_trt_engine()

The convert-to-uff binary comes with the Python UFF package provided with TensorRT. I installed TensorRT from the .tar file and followed the procedure here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html#installing-tar

I use this to convert my .pb graph to .uff as indicated here https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#samplecode3

Also you are right I am using a virtual machine with GPU on Google Cloud Platform. Never had any problems with allocation errors / resources sharing in the past though.

Thanks for having a look!

NVES · October 29, 2018, 5:27pm

Hello,

The issue is that TensorFlow will reserve almost all the available GPU memory by default. One possible solution is to configure the session with memory limits:

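import tensorflow as tf
# Cap TensorFlow at half of the GPU memory; `graph` is the tf.Graph loaded from the .pb file.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options), graph=graph)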

The ideal solution would be to release all the GPU memory after TensorFlow executes, but AFAIK this is not yet possible (see the Stack Overflow question "Clearing Tensorflow GPU memory after model execution" and TensorFlow GitHub issue #17048, "Tensorflow or cuda not giving back gpu memory after session closes").
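
Since the memory cannot be released inside the process, one common workaround (not the one proposed above, and not confirmed in this thread) is to run each step in its own Python process, so the GPU memory held by TensorFlow is returned when that process exits. A minimal sketch, assuming Python 3.5+ for subprocess.run; run_step_in_fresh_process is a hypothetical helper name:

```python
import subprocess
import sys


def run_step_in_fresh_process(args):
    """Run one step in a new Python interpreter and return its exit code.

    Any GPU memory the step allocated is freed when the child process exits.
    """
    completed = subprocess.run([sys.executable] + list(args))
    return completed.returncode
```

For the script in this thread, the engine build could then be launched as run_step_in_fresh_process(["test_trt_for_nvidia.py", "-o", "build_trt"]) after the TensorFlow step has finished in a separate call; build_trt is the option name shown in the traceback above, while the name of the TensorFlow option is not given in the thread.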

blancpaques · October 31, 2018, 9:51am

Thanks very much for the feedback!!

I understand this addresses the “Cuda Error in allocate” part. Do you by any chance have any more insight into the other error?

max_batch_size = 200
[TensorRT] ERROR: Tensor: Conv_0/Conv2D at max batch size of 200 exceeds the maximum element count of 2147483647
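
The limit quoted in this message, 2147483647, is 2^31 - 1. Assuming (the thread does not confirm this) that TensorRT compares it against max_batch_size multiplied by the number of elements in one sample of the tensor, the arithmetic can be sketched as follows. The function names are hypothetical and the (64, 100, 100) shape is simply the convolution output described in the first post.

```python
INT32_MAX = 2 ** 31 - 1  # 2147483647, the limit quoted in the error message


def elements_per_sample(shape):
    """Number of elements in one sample of a tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def largest_batch(shape, limit=INT32_MAX):
    """Largest batch size for which batch * elements_per_sample stays within the limit."""
    return limit // elements_per_sample(shape)


def min_elements_implied(failing_batch, limit=INT32_MAX):
    """Smallest per-sample element count that would exceed the limit at this batch size."""
    return limit // failing_batch + 1


print(largest_batch((64, 100, 100)))   # 3355
print(min_elements_implied(80))        # 26843546
```

Under that assumption, a 64 x 100 x 100 output would only hit the limit at batch sizes in the thousands, whereas a failure at batch size 80 implies more than about 26.8 million elements per sample. It may therefore be worth checking which shape TensorRT actually sees for Conv_0/Conv2D; the thread does not resolve this.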
