Segmentation fault when creating the trt.Builder in python, works fine with trtexec

I have a network in ONNX format (opset 11). Importing it in Python,

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder:
  with builder.create_network(EXPLICIT_BATCH) as network:
    with trt.OnnxParser(network, TRT_LOGGER) as parser:
      with open('net.onnx', 'rb') as model:
        parser.parse(model.read())

completes successfully. However,

builder.set_max_batch_size = 1

crashes the interpreter with a Segmentation fault (core dumped).

Using

trtexec --onnx=net.onnx --verbose

works fine though. Any idea what could be going wrong in Python?

Hi,

This Python snippet:

import sys
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     builder.create_builder_config() as config, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
        
    with open(args.onnx, "rb") as f:
        if not parser.parse(f.read()):
            print('ERROR: Failed to parse the ONNX file: {}'.format(args.onnx))
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            sys.exit(1)

Should be more or less equivalent to:

trtexec --explicitBatch --onnx=net.onnx --verbose

Hopefully, with the additional error handling in my snippet above, you will get some more insight into what caused your segfault.

However, in general, setting the builder.max_batch_size attribute only makes sense for implicit batch models. Additionally (in case it wasn’t just a typo), I don’t believe there is a “set_max_batch_size” attribute, only a “max_batch_size” attribute.
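For an implicit-batch network, that would be a plain attribute assignment on an existing builder, something like:

# Sketch for implicit-batch networks only: max_batch_size is an
# attribute on the builder object, assigned rather than called.
builder.max_batch_size = 1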

For explicit batch models (like your code above), you can create optimization profiles to specify various ranges of batch sizes instead: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#opt_profiles
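As a rough sketch (untested here; the input tensor name "input" and the 3x224x224 shapes are placeholders for whatever your network actually uses), building an explicit-batch engine that accepts batch sizes 1 through 32 via an optimization profile might look like:

import sys
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     builder.create_builder_config() as config, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:

    with open('net.onnx', 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            sys.exit(1)

    # One optimization profile covering batch sizes 1 (min), 8 (opt), 32 (max).
    # "input" and the 3x224x224 dimensions are assumptions -- substitute
    # your network's real input name and shape.
    profile = builder.create_optimization_profile()
    profile.set_shape('input', (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)

At runtime you then set the actual input shape on the execution context (within the profile's min/max range) before running inference.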

Thank you for your answer. Your snippet did not work out of the box because of the missing parser. I tried this:

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder:
  with builder.create_network(EXPLICIT_BATCH) as network:
    with trt.OnnxParser(network, TRT_LOGGER) as parser:
      with open('net.onnx', 'rb') as f:
        if not parser.parse(f.read()):
          print('ERROR: Failed to parse the ONNX file: net.onnx')
          for error in range(parser.num_errors):
            print(parser.get_error(error))
        else:
          print('no error')

which prints no error.

I tried trtexec with the extra --explicitBatch command-line argument, and it still prints “PASSED”.

Any other idea?

EDIT: yes, it was indeed builder.max_batch_size, not set_max_batch_size. Actually, even just tab-completing builder. under IPython or Jupyter makes the interpreter crash, so the crash is not due to max_batch_size itself; it seems that the builder object is corrupt.

Hi,

Can you try referring to this code as a reference for parsing network and building the engine with the Python API and see if you still are getting issues?

https://github.com/rmccorm4/tensorrt-utils/blob/20.01/classification/imagenet/onnx_to_tensorrt.py

Example:

python3 onnx_to_tensorrt.py --explicit-batch --onnx=net.onnx

It works with this script, thanks!