I’m currently attempting to convert an ONNX model that was originally exported from this PyTorch I3D model. I exported the model using PyTorch 1.2.0, which appeared to succeed. However, when I use TensorRT to build a CUDA engine for accelerated inference, I receive the following error:
```
[TensorRT] ERROR: Internal error: could not find any implementation for node (Unnamed Layer* 11) [Convolution] + (Unnamed Layer* 13) [Activation] || (Unnamed Layer* 17) [Convolution] + (Unnamed Layer* 19) [Activation], try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1523) - OutOfMemory Error in computeCosts: 0
```
The following is the Python 3.7 code I’m using to build the engine. Note that `common` is the `common.py` file from the TensorRT `samples/python` directory.
```python
import numpy as np
import common
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def main():
    print('TensorRT Version:', trt.__version__)
    onnx_filename = 'model.onnx'

    def build_engine_onnx(model_file):
        with trt.Builder(TRT_LOGGER) as builder, \
                builder.create_network(common.EXPLICIT_BATCH) as network, \
                trt.OnnxParser(network, TRT_LOGGER) as parser:
            builder.max_workspace_size = int((1 << 32) * 1.61)
            builder.max_batch_size = 1
            with open(model_file, 'rb') as model:
                parser.parse(model.read())
            return builder.build_cuda_engine(network)

    with build_engine_onnx(onnx_filename) as engine:
        # failure occurs before reaching this point
        pass

if __name__ == "__main__":
    main()
```
I have set `builder.max_workspace_size` to the largest value I can for my GPU (2060 SUPER, 8 GB). Monitoring `nvidia-smi`, I can see my GPU memory max out for a few seconds before finally dropping back to zero when the engine build fails.
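For reference, the workspace value in my script works out to roughly 6.4 GiB of the 8 GiB on the card, which leaves relatively little headroom for the builder’s own allocations. A small sanity-check helper I’ve been using (the helper and its parameters are my own, not part of the TensorRT API) to clamp the request against total GPU memory:

```python
def clamp_workspace(requested_bytes, total_gpu_bytes, reserve_frac=0.2):
    """Hypothetical helper: cap a requested TensorRT workspace size so a
    fraction of GPU memory stays free for the engine's weights/activations."""
    budget = int(total_gpu_bytes * (1.0 - reserve_frac))
    return min(requested_bytes, budget)

requested = int((1 << 32) * 1.61)   # the value from the script above, ~6.4 GiB
total = 8 * (1 << 30)               # 2060 SUPER: 8 GiB
print(clamp_workspace(requested, total))
```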
This model fits comfortably on my GPU when running PyTorch inference with a batch size of 5 or more, so it seems very strange that I can’t use it with TensorRT at even a batch size of 1.
Is there something improper in my code that is causing TensorRT to use excess memory? Are there any workarounds or settings I can use to reduce GPU memory usage? I’ve also tried setting `builder.fp16_mode = True`, which seems to let the engine build proceed further, but it still only gets through about 20% of the layers in the model before running out of memory.
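My understanding of why FP16 helps at all is simply that half-precision tensors take half the bytes, so the builder has more room to work with; a quick NumPy check with an I3D-sized activation (the shape below is just an illustrative guess, not taken from the actual model):

```python
import numpy as np

# Illustrative I3D-style activation: (batch, channels, frames, height, width)
x32 = np.zeros((1, 64, 16, 112, 112), dtype=np.float32)
x16 = x32.astype(np.float16)  # same values, half the storage per element
print(x32.nbytes, x16.nbytes)
```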