OOM of conv layer

derekwong6666 · April 1, 2020, 2:25am

Description


Environment

TensorRT Version: 7
GPU Type: 1070
Nvidia Driver Version: 410
CUDA Version: 10.0
CUDNN Version: 7.6
Operating System + Version: ubuntu16
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 1.13.2
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi NV,

I have tried workspace sizes of 1G, 2G, 5G, 7G, and 10G, but the build still fails. Any idea?

[TensorRT] VERBOSE: After concat removal: 437 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.335001 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 out of 1
*************** Autotuning format combination: Float(1,1216,428032,1284096) → Float(1,608,107008,6848512) ***************
[TensorRT] ERROR: …/rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] WARNING: GPU memory allocation error during getBestTactic: model/encoder/densenet121/conv1/Conv2D + model/encoder/densenet121/conv1/Relu
[TensorRT] ERROR: Internal error: could not find any implementation for node model/encoder/densenet121/conv1/Conv2D + model/encoder/densenet121/conv1/Relu, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[TensorRT] ERROR: …/builder/tacticOptimizer.cpp (1523) - OutOfMemory Error in computeCosts: 0

Code snippet:
import tensorrt as trt

# convert to TRT
# uff_model is the serialized UFF model buffer, loaded elsewhere
G_LOGGER = trt.Logger(trt.Logger.VERBOSE)
trt.init_libnvinfer_plugins(G_LOGGER, "")
builder = trt.Builder(G_LOGGER)
builder.max_batch_size = 1
builder.max_workspace_size = 1 << 30  # 1 GiB
network = builder.create_network()
parser = trt.UffParser()
parser.register_input("model/focal", trt.Dims([1]))
parser.register_input("model/Placeholder", trt.Dims3([3, 352, 1216]), trt.UffInputOrder.NCHW)
parser.register_output("model/decoder/truediv_12")
parser.parse_buffer(uff_model, network, trt.float32)
engine = builder.build_cuda_engine(network)
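
For reference, the workspace sizes tried above can be written as byte counts as follows. This is only a minimal sketch in plain Python: the helper name `gib` is hypothetical (not a TensorRT API), and it assumes "1G" means 1 GiB, as the `1 << 30` in the snippet suggests, and that the card has 8 GiB of device memory in total.

```python
def gib(n):
    """Return n GiB as a byte count (hypothetical helper, not a TensorRT API)."""
    return n << 30

# Workspace sizes tried in the post, in bytes.
tried = {n: gib(n) for n in (1, 2, 5, 7, 10)}

# The snippet's `1 << 30` is exactly 1 GiB.
assert tried[1] == 1 << 30

# Assumption: the GPU has 8 GiB of device memory in total.
gpu_total = gib(8)

# Sizes that exceed the card's total memory can never be fully allocated.
over_budget = [n for n, size in tried.items() if size > gpu_total]
print(over_budget)
```

A larger workspace is only an upper bound on the scratch memory tactics may use; it does not free up memory that another process or framework is already holding.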

SunilJB · April 1, 2020, 7:00am

Hi,

You can try a few things:

1. Check the memory consumption of the original model on your system.
2. Try converting the model with the ONNX parser in FP32 precision instead of the UFF parser.
3. You can also run the trtexec command in verbose mode to debug the issue:
   https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
4. If the issue persists, could you please share your model file?

Thanks

derekwong6666 · April 1, 2020, 7:31am

Thanks for your reply.

I checked my TensorFlow test pipeline; it takes ~7G.
My GPU is a 1070 with 8G, so how can I limit memory consumption while building the engine?

Thanks.

derekwong6666 · April 1, 2020, 7:42am

Hi SunilJB,

I set
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

then set max_workspace_size to 2G, and it works.
Thanks for your help.

Derek
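
A rough budget shows why this combination can fit. This is a hedged sketch in plain Python, not a measurement: the function name `remaining_gib` is hypothetical, and it assumes TensorFlow holds exactly its configured fraction of an 8G card while TensorRT builds the engine in the same process. TensorRT also needs memory beyond the workspace (weights, activations), so the real headroom is smaller.

```python
def remaining_gib(total_gib, tf_fraction, workspace_gib):
    """GiB left once TensorFlow's share and the TensorRT workspace are reserved.

    Hypothetical helper for illustration; it ignores memory TensorRT needs
    outside the workspace.
    """
    tf_gib = total_gib * tf_fraction
    return total_gib - tf_gib - workspace_gib

# Values from this thread: 8G card, fraction 0.333, 2G workspace.
left_after = remaining_gib(8, 0.333, 2)

# Before the change, the TensorFlow pipeline took about 7G of the 8G,
# so even a 2G workspace could not fit.
left_before = 8 - 7 - 2

print(round(left_after, 3), left_before)
```

Note that `gpu_options` only takes effect once it is passed to the session configuration, for example via `tf.ConfigProto(gpu_options=gpu_options)` in TensorFlow 1.x.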