AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Hello,

I am trying to convert a model from ONNX to TensorRT.

While doing the conversion I get this error:

context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Hi,
It seems the TRT engine was not initialized properly.
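
As the later logs in this thread show, the engine object is None when the build fails, so the AttributeError is only a downstream symptom. A minimal sketch of a guard that fails earlier with a clearer message is below. The helper name `require_engine` is hypothetical and not part of TensorRT; it only assumes that the build call hands back None on failure.

```python
def require_engine(engine, source="build_cuda_engine"):
    """Return the engine unchanged, or raise a descriptive error if it is None.

    `require_engine` is a hypothetical helper, not a TensorRT API.
    str.format is used instead of f-strings so it also runs on Python 3.5.
    """
    if engine is None:
        raise RuntimeError(
            "{} returned None: the TensorRT engine was not built. "
            "Check the [TensorRT] ERROR lines printed by the logger.".format(source)
        )
    return engine
```

It could be used as `engine = require_engine(build_engine(model_path))` before calling `create_execution_context()`.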

Could you please share the script and model file so we can help better?
Also, can you provide details on the platforms you are using:
o Linux distro and version
o GPU type
o NVIDIA driver version
o CUDA version
o cuDNN version
o Python version [if using Python]
o TensorFlow and PyTorch version
o TensorRT version

Thanks

import tensorrt as trt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda 
import time
model_path = "result.onnx"
input_size = 256
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
def build_engine(model_path):
    with trt.Builder(TRT_LOGGER) as builder,builder.create_network(EXPLICIT_BATCH) as network,trt.OnnxParser(network, TRT_LOGGER) as parser: 
        builder.max_workspace_size = 1<<20
        builder.max_batch_size = 1
        with open(model_path, "rb") as f:
            parser.parse(f.read())
        engine = builder.build_cuda_engine(network)
    return engine


#def inference(engine, context, inputs, out_cpu, in_gpu, out_gpu, stream):
    # async version
    # with engine.create_execution_context() as context:  # cost time to initialize
    # cuda.memcpy_htod_async(in_gpu, inputs, stream)
    # context.execute_async(1, [int(in_gpu), int(out_gpu)], stream.handle, None)
    # cuda.memcpy_dtoh_async(out_cpu, out_gpu, stream)
    # stream.synchronize()
def inference(engine, context, inputs, h_input, h_output, d_input, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    return h_output

    '''
    # sync version
    cuda.memcpy_htod(in_gpu, inputs,stream)
    context.execute(1, [int(in_gpu), int(out_gpu)])
    cuda.memcpy_dtoh(out_cpu, out_gpu,stream)
    return out_cpu'''

def alloc_buf(engine):
    
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
    # Allocate device memory for inputs and outputs.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream=cuda.Stream()

    return h_input, h_output, d_input, d_output, stream
if __name__ == "__main__":
    for i in range(3):
      if(i==0):
        inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float32)
        engine = build_engine(model_path)
        print("Engine Created :",type(engine))
        context = engine.create_execution_context()
        print("Context executed ",type(context))
        serialized_engine = engine.serialize()
        t1 = time.time()
        #in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
        h_input, h_output, d_input, d_output,stream=alloc_buf(engine)
        res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
        #print(type(res))
        print("using fp32 mode:")
        print("cost time: ", time.time()-t1)
      if(i==1):
        inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float16)
        engine = build_engine(model_path)
        print("Engine Created :",type(engine))
        context = engine.create_execution_context()
        print("Context executed ",type(context))
        serialized_engine = engine.serialize()
        t1 = time.time()
        #in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
        h_input, h_output, d_input, d_output,stream=alloc_buf(engine)
        res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
        print(type(res))
        print("using fp16 mode:")
        print("cost time: ", time.time()-t1)
      if(i==2):
        inputs = np.random.random((1, 3, input_size, input_size)).astype(np.int8)
        engine = build_engine(model_path)
        print("Engine Created :",type(engine))
        context = engine.create_execution_context()
        print("Context executed ",type(context))
        serialized_engine = engine.serialize()
        t1 = time.time()
        #in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
        h_input, h_output, d_input, d_output,stream=alloc_buf(engine)
        res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
        #print(type(res))
        print("using int8 mode:")
        print("cost time: ", time.time()-t1)
    engine_path="FLtask.trt"
    with open(engine_path,"wb") as f:
      f.write(serialized_engine)
      print("Serialized engine")

Details:
1. Linux distro and version: Ubuntu 16.04
2. GPU type: Nvidia Geforcex
3. Nvidia driver version: 418
4. Nvidia version: 10.0
5. Python version: 3.5
6. PyTorch version: 1.3.0
7. TensorRT version: 6.0.1.5

Hi,
The issue seems to be due to the “EXPLICIT_BATCH” setting in the code.
In TRT 7, the ONNX parser supports full-dimensions mode only, so when using the ONNX parser the network definition must be created with the explicitBatch flag set.

Since you are using TRT 6, please replace that line with the code below:

with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:

I tested on both TRT 6 (after the code change) and TRT 7 (without changes), and it works fine with my test ONNX model.

Engine Created 1: <class 'tensorrt.tensorrt.ICudaEngine'>
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Context executed  <class 'tensorrt.tensorrt.IExecutionContext'>
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use enqueue without batch size instead.
using fp32 mode:
cost time:  0.00426483154296875


Engine Created 2: <class 'tensorrt.tensorrt.ICudaEngine'>
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Context executed  <class 'tensorrt.tensorrt.IExecutionContext'>
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use enqueue without batch size instead.
<class 'numpy.ndarray'>
using fp16 mode:
cost time:  0.010645389556884766


Engine Created 3: <class 'tensorrt.tensorrt.ICudaEngine'>
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Context executed  <class 'tensorrt.tensorrt.IExecutionContext'>
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use enqueue without batch size instead.
using int8 mode:
cost time:  0.01060032844543457

Serialized engine
FLtask.trt  ---> File generated at the end
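
The version-dependent choice described in this reply can be sketched as a small helper. This is only an illustration of the reply's rule (explicit batch for TRT 7 and later, no flag for TRT 6), not a verified compatibility matrix. The function name `network_creation_flags` is hypothetical.

```python
def network_creation_flags(trt_version, explicit_batch_bit):
    """Pick the flags argument for builder.create_network().

    trt_version: version string such as "6.0.1.5" (e.g. trt.__version__).
    explicit_batch_bit: int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH).
    Follows the rule stated in the reply: set the flag on TRT 7+, leave it unset on TRT 6.
    """
    major = int(trt_version.split(".")[0])
    if major >= 7:
        return 1 << explicit_batch_bit
    return 0


# Intended use inside the script from this thread (requires TensorRT):
#   flags = network_creation_flags(
#       trt.__version__, int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
#   network = builder.create_network(flags)
print(network_creation_flags("6.0.1.5", 0))
print(network_creation_flags("7.0.0.11", 0))
```

Keeping the decision in one place avoids commenting the EXPLICIT_BATCH line in and out when switching TensorRT versions.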

Thanks

Hi,

I am still not able to resolve the issue after changing the code.

Can you please help me resolve this?

I have attached the ONNX file.

Hello,

I changed the code accordingly but I am still seeing the issue. I have attached a screenshot showing which version of TensorRT is installed.

Please find the changed code below:

import tensorrt as trt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import time

model_path = "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/result1.onnx"
input_size = 256
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
#EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(model_path):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 20
        builder.max_batch_size = 1
        with open(model_path, "rb") as f:
            parser.parse(f.read())
        engine = builder.build_cuda_engine(network)
    return engine

# def inference(engine, context, inputs, out_cpu, in_gpu, out_gpu, stream):
# async version
# with engine.create_execution_context() as context:  # cost time to initialize
# cuda.memcpy_htod_async(in_gpu, inputs, stream)
# context.execute_async(1, [int(in_gpu), int(out_gpu)], stream.handle, None)
# cuda.memcpy_dtoh_async(out_cpu, out_gpu, stream)
# stream.synchronize()
def inference(engine, context, inputs, h_input, h_output, d_input, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    return h_output

    '''
    # sync version
    cuda.memcpy_htod(in_gpu, inputs,stream)
    context.execute(1, [int(in_gpu), int(out_gpu)])
    cuda.memcpy_dtoh(out_cpu, out_gpu,stream)
    return out_cpu'''

def alloc_buf(engine):
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
    # Allocate device memory for inputs and outputs.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    return h_input, h_output, d_input, d_output, stream

if __name__ == "__main__":
    for i in range(3):
        if (i == 0):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float32)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using fp32 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 1):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float16)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            print(type(res))
            print("using fp16 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 2):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.int8)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using int8 mode:")
            print("cost time: ", time.time() - t1)
    engine_path = "FLtask.trt"
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
        print("Serialized engine")

Hi,

Try changing the workspace size, something like

builder.max_workspace_size = 1<<30
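
For reference, the shift expressions are byte counts: `1 << 20` is 1 MiB and `1 << 30` is 1 GiB, so the suggested change raises the workspace limit by a factor of 1024. A small sketch of the arithmetic (the helper name `describe_workspace` is made up for illustration):

```python
MIB = 1 << 20  # bytes in one mebibyte


def describe_workspace(nbytes):
    """Format a workspace size given in bytes as a human-readable string."""
    return "{} bytes = {:g} MiB".format(nbytes, nbytes / MIB)


print(describe_workspace(1 << 20))  # value used in the original script
print(describe_workspace(1 << 30))  # value suggested in this reply
```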

If the issue persists, could you please share the complete error log and model file so we can help better?

Thanks

Hello,

The issue still persists. This is the complete error log:

/usr/bin/python3.5 /home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/sample_code.py
Engine Created : <class 'NoneType'>
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/sample_code.py", line 62, in <module>
    context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Process finished with exit code 1

Please find the model file attached below.
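
The script in this thread ignores the return value of `parser.parse(...)`, so a parse failure only shows up later as "Network must have at least one output". A hedged sketch of how the parser errors could be surfaced is below. It relies on `OnnxParser.parse`, `OnnxParser.num_errors`, `OnnxParser.get_error` and `INetworkDefinition.num_outputs` from the TensorRT Python API; the helper names `collect_parser_errors` and `parse_or_raise` are hypothetical.

```python
def collect_parser_errors(parser):
    """Return every error recorded by the ONNX parser as a string."""
    return [str(parser.get_error(i)) for i in range(parser.num_errors)]


def parse_or_raise(parser, network, model_bytes):
    """Parse the serialized ONNX model and fail loudly if anything went wrong.

    parser: a trt.OnnxParser, network: the network it populates.
    """
    ok = parser.parse(model_bytes)
    errors = collect_parser_errors(parser)
    if not ok or errors:
        raise RuntimeError("ONNX parsing failed:\n" + "\n".join(errors))
    if network.num_outputs == 0:
        # Matches the "[TensorRT] ERROR: Network must have at least one output" log line.
        raise RuntimeError("Parsed network has no outputs; the engine build would fail.")
```

Inside `build_engine`, the call `parser.parse(f.read())` could be replaced with `parse_or_raise(parser, network, f.read())`.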

Hi,

This specific issue arises because the ONNX parser isn’t currently compatible with ONNX models exported from PyTorch 1.3. If you downgrade to PyTorch 1.2, the issue should go away.

Alternatively, upgrade to TRT 7. The latest TRT 7 supports PyTorch 1.3.

Thanks

Hello,

I upgraded TensorRT to version 7 and set the PyTorch version to 1.3, but I am still seeing the issue. Please find attached screenshots of the TensorRT 7 and PyTorch versions.

import tensorrt as trt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import time

model_path = "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/result7.onnx"
input_size = 256
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def build_engine(model_path):
    with trt.Builder(TRT_LOGGER) as builder,builder.create_network(EXPLICIT_BATCH) as network,trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1<<30
        builder.max_batch_size = 1
        with open(model_path, "rb") as f:
            parser.parse(f.read())
        engine = builder.build_cuda_engine(network)
    return engine
# def inference(engine, context, inputs, out_cpu, in_gpu, out_gpu, stream):
# async version
# with engine.create_execution_context() as context:  # cost time to initialize
# cuda.memcpy_htod_async(in_gpu, inputs, stream)
# context.execute_async(1, [int(in_gpu), int(out_gpu)], stream.handle, None)
# cuda.memcpy_dtoh_async(out_cpu, out_gpu, stream)
# stream.synchronize()
def inference(engine, context, inputs, h_input, h_output, d_input, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    return h_output

    '''
    # sync version
    cuda.memcpy_htod(in_gpu, inputs,stream)
    context.execute(1, [int(in_gpu), int(out_gpu)])
    cuda.memcpy_dtoh(out_cpu, out_gpu,stream)
    return out_cpu'''


def alloc_buf(engine):
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
    # Allocate device memory for inputs and outputs.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    return h_input, h_output, d_input, d_output, stream


if __name__ == "__main__":
    for i in range(3):
        if (i == 0):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float32)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using fp32 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 1):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float16)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            print(type(res))
            print("using fp16 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 2):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.int8)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using int8 mode:")
            print("cost time: ", time.time() - t1)
    engine_path = "FLtask.trt"
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
        print("Serialized engine")

Please help me resolve this.

Hi,

If possible, could you please share your model file along with the error log so we can debug the issue further?
Alternatively, to validate and debug your ONNX model, you can run the “trtexec” command in --verbose mode and share the output log.

The “trtexec” command-line tool can be used for benchmarking and for generating serialized engines from models.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-developer-guide/index.html#command-line-programs
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec#example-4-running-an-onnx-model-with-full-dimensions-and-dynamic-shapes
In TRT 7 you need to use explicit mode when running the command.
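
A sketch of how the trtexec invocation suggested here could be assembled from Python follows. The `--onnx`, `--explicitBatch` and `--verbose` options are trtexec flags in TRT 7; the helper name `trtexec_command` is hypothetical, and the file name is taken from the script posted above and should be replaced with the real path.

```python
import shlex


def trtexec_command(onnx_path, explicit_batch=False, verbose=True):
    """Build the argument list for a trtexec run on an ONNX model."""
    cmd = ["trtexec", "--onnx={}".format(onnx_path)]
    if explicit_batch:
        cmd.append("--explicitBatch")  # explicit mode, as requested for TRT 7
    if verbose:
        cmd.append("--verbose")
    return cmd


# Print the command so it can be pasted into a shell. To run it from Python,
# pass the same list to subprocess.run (trtexec must be on PATH).
print(" ".join(shlex.quote(part) for part in trtexec_command("result7.onnx", explicit_batch=True)))
```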

Thanks

Hello ,

I have tried it using explicit mode. Please find the error log below; the model and ONNX files are attached.

[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
Engine Created : <class 'NoneType'>
Traceback (most recent call last):
  File "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/sample_code.py", line 61, in <module>
    context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Hi,

Could you please share the generated ONNX file?

In my earlier post, explicit mode was meant to be used with the trtexec command.

Meanwhile, please try to validate and debug your ONNX model with the “trtexec” command in --verbose mode and share the output log.

The “trtexec” command-line tool can be used for benchmarking and for generating serialized engines from models.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-developer-guide/index.html#command-line-programs
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec#example-4-running-an-onnx-model-with-full-dimensions-and-dynamic-shapes

Thanks

Hello,

I am not able to resolve this, so can you please help me resolve it?

Please find the ONNX model attached.

Hi,

For some reason, the attached ONNX model file is not visible to me.
Could you please re-upload the model file?

Also, could you please share the error log that you got while running trtexec command?

Thanks

Hi,

I don’t know why, but I am not able to upload the ONNX file.

Is there any other way I can upload it?

Hi,

You can try uploading a zip file, as you did earlier for the .pt model, or you can upload the model to a third-party drive and share the link in a comment.

Thanks

Hi,

Please find the converted ONNX model below.

Please let me know if you require anything else.

Hi,

The issue seems to be caused by an ONNX model optimization failure.
The ONNX model uses the “Pad” operation in “reflect” mode, and TensorRT supports the “Pad” operation only in “constant” mode.

Either update the “Pad” operation to use “constant” mode, or create a custom plugin to replace the “Pad” operation in “reflect” mode.
Please refer to the link below for more information on the “Pad” operation:
https://github.com/onnx/onnx/blob/master/docs/Operators.md#Pad
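
To locate the offending nodes before running TensorRT, the ONNX graph can be scanned for “Pad” nodes whose `mode` attribute is not “constant”. The sketch below is an illustration under assumptions: the helper names are hypothetical, it only looks at top-level graph nodes (not nested subgraphs), and it treats a missing `mode` attribute as “constant”, which is the default in the ONNX operator documentation linked above. Loading the model requires the third-party `onnx` package.

```python
def pad_mode(node):
    """Return the padding mode of an ONNX Pad node ("constant" when unset)."""
    for attr in node.attribute:
        if attr.name == "mode":
            return attr.s.decode("utf-8")  # string attributes are stored as bytes
    return "constant"


def find_non_constant_pads(nodes):
    """List (node name, mode) for every Pad node that is not in constant mode."""
    return [
        (node.name, pad_mode(node))
        for node in nodes
        if node.op_type == "Pad" and pad_mode(node) != "constant"
    ]


def load_graph_nodes(onnx_path):
    """Load an ONNX file and return its top-level graph nodes."""
    import onnx  # imported lazily so the helpers above work without it

    return onnx.load(onnx_path).graph.node
```

For example, `find_non_constant_pads(load_graph_nodes("result7.onnx"))` would list the nodes to change in the exporting model or to cover with a plugin; the file name here is the one used in the script earlier in the thread.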

“trtexec” command output using current model:

ERROR: builtin_op_importers.cpp:2081 In function importPad:
[8] Assertion failed: mode == "constant" && value == 0 && "This version of TensorRT only supports constant 0 padding!"
[01/14/2020-05:33:27] [E] Failed to parse onnx file
[01/14/2020-05:33:27] [E] Parsing model failed

Thanks