I wanted to give TensorRT a quick try and ran into the following error when building an engine from a UFF graph:
[TensorRT] ERROR: Tensor: Conv_0/Conv2D at max batch size of 80 exceeds the maximum element count of 2147483647.
To work around this I had to reduce the builder's max_batch_size parameter to about 50. Note that this is much less than the maximum batch size I can run with TensorFlow (around 200 before hitting an OutOfMemory error). Why is that?
(the convolution the error refers to is a 3x3x1x64 convolution on patches of size 100x100)
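For reference, the 2147483647 in the message is INT32_MAX: TensorRT (at least through version 5) limits any single tensor to that many elements, counted across the whole max_batch_size. Exactly which internal buffers TensorRT counts toward the limit is not clear from the message alone, but a small illustrative helper (not part of the attached script) shows how a plain NCHW activation shape interacts with the cap:

```python
# INT32_MAX is the per-tensor element cap reported in the TensorRT error.
INT32_MAX = 2**31 - 1  # 2147483647


def elements(batch, c, h, w):
    """Total element count of an NCHW activation at a given batch size."""
    return batch * c * h * w


def max_batch(c, h, w):
    """Largest batch size for which a C x H x W activation stays under
    INT32_MAX elements. Note: TensorRT may also count internal buffers
    (e.g. convolution workspace), which can be much larger than the
    visible activation, so the real limit can be lower."""
    return INT32_MAX // (c * h * w)
```

This only bounds the visible activation tensors; the batch-80 failure in the thread suggests TensorRT was sizing something considerably larger internally.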
[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2
I have hit this error several times and have no clue what causes it. One workaround was to reduce the builder's max_workspace_size parameter to, say, a third of the total GPU memory (5 GB on a P100 with 16 GB).
All in all, I am not sure I fully grasp what is behind the max_batch_size and max_workspace_size parameters. Any hints would be greatly appreciated.
Thanks
Edit: using TRT 5.0.0.10 with Cuda 9.0 and CUDNN 7.3
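For context, here is a minimal sketch of where those two parameters live in the TensorRT 5 Python API. The function, node, and file names are illustrative assumptions, not taken from the attached script:

```python
def gb(n):
    """Convert gigabytes to bytes; builder.max_workspace_size is in bytes."""
    return int(n * (1 << 30))


def build_engine(uff_file, input_name, input_shape, output_name,
                 max_batch_size=50, workspace_gb=5):
    """Illustrative UFF -> engine build (TensorRT 5 Python API).
    All argument names here are assumptions for the sketch."""
    import tensorrt as trt  # available only where TensorRT is installed

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network()

    parser = trt.UffParser()
    parser.register_input(input_name, input_shape)   # CHW shape, no batch dim
    parser.register_output(output_name)
    parser.parse(uff_file, network)

    # max_batch_size: the largest batch the engine will ever run; TensorRT
    # sizes every intermediate tensor for it, hence the INT32 element cap.
    builder.max_batch_size = max_batch_size
    # max_workspace_size: scratch memory kernels may use during tactic
    # selection; too large can fail allocation, too small rules out tactics.
    builder.max_workspace_size = gb(workspace_gb)

    return builder.build_cuda_engine(network)
```

Roughly: max_batch_size fixes how big the engine's tensors must be, while max_workspace_size only bounds temporary scratch memory the builder may hand to kernels.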
Please find attached a small .zip file with a minimal set of dependencies for debugging. Everything is pretty much explained in the main Python script, so I will be brief here. The zip contains:
A .pb file (TensorFlow model export)
The converted UFF graph
The serialized TensorRT engine
A Python script which can be used to: run TensorFlow on the reference inputs, build the TRT engine, run the TRT engine, and compare with the TF results
(The zip also contains reference inputs and outputs as numpy arrays, but they are of no use here)
First of all, I noticed that if I run TensorFlow and then build the TRT engine IN THE SAME PYTHON PROCESS by launching the script with the all option, I systematically get “[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2”.
Playing with the max_batch_size, patch_size and max_workspace_size_gb parameters in the main Python file also reproduces the errors described above (exceeds maximum element count, and Cuda Error in allocate).
Example:
max_batch_size = 200
[TensorRT] ERROR: Tensor: Conv_0/Conv2D at max batch size of 200 exceeds the maximum element count of 2147483647
Example (running on a P100 with 16 GB memory):
max_workspace_size_gb = 8
[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2
[TensorRT] ERROR: runtime.cpp (24) - Cuda Error in allocate: 2
Thanks for the repro. I don't see an all option. (Do I just call run_reference_tensorflow(), build_trt_engine(), and run_trt_engine() inline?)
Also, I'm getting the following error:
root@4639a43cf129:/home/scratch.zhenyih_sw/reproduce.2421196/test_nvidia# python test_trt_for_nvidia.py -o build_trt
[TensorRT] INFO: Creating uff graph
Traceback (most recent call last):
  File "test_trt_for_nvidia.py", line 263, in <module>
    runner[args.o]()
  File "test_trt_for_nvidia.py", line 138, in build_trt_engine
    subprocess.check_call(['convert-to-uff', '-o', uff_file, tf_pb_file])
  File "/usr/lib/python2.7/subprocess.py", line 536, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
I have tf_pb_file = ‘best_deploy.pb’ and uff_file = ‘best_deploy.uff’, but I don't have convert-to-uff. I'm running from a TRT 5 container. Are you running directly on bare metal / the host?
Also, you are right: I am using a virtual machine with a GPU on Google Cloud Platform. I never had any problems with allocation errors or resource sharing in the past, though.
The issue is that TensorFlow reserves almost all of the available GPU memory by default. One possible solution is to configure the session with memory limits:
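A sketch of that configuration for TF 1.x (the .pb/UFF era this thread uses); the 0.3 fraction is just an example value, not taken from the thread:

```python
import tensorflow as tf  # TF 1.x API

config = tf.ConfigProto()
# Option 1: grow GPU allocations on demand instead of grabbing
# (almost) all memory up front.
config.gpu_options.allow_growth = True
# Option 2: hard-cap TF to a fraction of the GPU. Example value: ~30%
# of a 16 GB P100 leaves room for a TensorRT engine built in the same
# process. Use either option 1 or option 2, not both.
# config.gpu_options.per_process_gpu_memory_fraction = 0.3

sess = tf.Session(config=config)
```

With either option in place, building the TRT engine after running TensorFlow in the same process should no longer fail with "Cuda Error in allocate: 2" (CUDA error 2 is cudaErrorMemoryAllocation, i.e. out of memory).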