I’m currently attempting to convert an ONNX model that was originally exported from this PyTorch I3D model. I exported the model using PyTorch 1.2.0, which appeared to succeed. However, when I use TensorRT to build a CUDA engine for accelerated inference, I receive the following error:
```
[TensorRT] ERROR: Internal error: could not find any implementation for node (Unnamed Layer* 11) [Convolution] + (Unnamed Layer* 13) [Activation] || (Unnamed Layer* 17) [Convolution] + (Unnamed Layer* 19) [Activation], try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1523) - OutOfMemory Error in computeCosts: 0
```
The following is the Python 3.7 code I’m using to build the engine. Note that `common` is the `common.py` file from the TensorRT `samples/python` directory.
```python
import numpy as np
import common
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def main():
    print('TensorRT Version:', trt.__version__)
    onnx_filename = 'model.onnx'

    def build_engine_onnx(model_file):
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network(common.EXPLICIT_BATCH) as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:
            builder.max_workspace_size = int((1 << 32) * 1.61)
            builder.max_batch_size = 1
            with open(model_file, 'rb') as model:
                parser.parse(model.read())
            return builder.build_cuda_engine(network)

    with build_engine_onnx(onnx_filename) as engine:
        # failure occurs before reaching this point
        pass

if __name__ == "__main__":
    main()
```
I have set `builder.max_workspace_size` to the largest value I can for my GPU (2060 SUPER, 8 GB). Monitoring `nvidia-smi`, I can see my GPU memory max out for a few seconds before finally dropping back to zero when the engine build fails.
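For reference, the workspace value in my script works out to roughly 6.4 GiB of the 8 GiB on the card, which leaves relatively little headroom for the builder’s own allocations. A small sanity-check helper I’ve been using (the helper and its parameters are my own, not part of the TensorRT API) to clamp the request against total GPU memory:

```python
def clamp_workspace(requested_bytes, total_gpu_bytes, reserve_frac=0.2):
    """Hypothetical helper: cap a requested TensorRT workspace size so a
    fraction of GPU memory stays free for the engine's weights/activations."""
    budget = int(total_gpu_bytes * (1.0 - reserve_frac))
    return min(requested_bytes, budget)

requested = int((1 << 32) * 1.61)   # the value from the script above, ~6.4 GiB
total = 8 * (1 << 30)               # 2060 SUPER: 8 GiB
print(clamp_workspace(requested, total))
```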
This model fits comfortably on my GPU when running PyTorch inference with a batch size of 5 or more, so it seems very strange that I can’t use it with TensorRT at even a batch size of 1.
Is there something improper in my code that is causing TensorRT to use excess memory? Are there any workarounds or settings I can use to reduce GPU memory usage? I’ve also tried setting `builder.fp16_mode = True`, which seems to let the engine build proceed further, but it still only gets through about 20% of the layers in the model before running out of memory.
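My understanding of why FP16 helps at all is simply that half-precision tensors take half the bytes, so the builder has more room to work with; a quick NumPy check with an I3D-sized activation (the shape below is just an illustrative guess, not taken from the actual model):

```python
import numpy as np

# Illustrative I3D-style activation: (batch, channels, frames, height, width)
x32 = np.zeros((1, 64, 16, 112, 112), dtype=np.float32)
x16 = x32.astype(np.float16)  # same values, half the storage per element
print(x32.nbytes, x16.nbytes)
```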