AttributeError: 'NoneType' object has no attribute 'create_execution_context'

kshamaramesh.25 · January 9, 2020, 1:16pm

Hello,

I am trying to convert a model from ONNX to TensorRT.

During the conversion I get this error:

context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

SunilJB · January 9, 2020, 6:13pm

Hi,
It seems the TRT engine was not initialized properly.
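
One way to find out why the engine is `None` is to check the return value of `parser.parse(...)` and read the parser's recorded errors before building. The following is a minimal sketch, not the poster's script: `parse_or_report` and `require_engine` are hypothetical helper names, and the sketch assumes the `OnnxParser` Python members `parse`, `num_errors`, and `get_error`. The parser is passed in as an argument, so the helpers themselves do not import TensorRT.

```python
def parse_or_report(parser, model_bytes):
    """Parse ONNX bytes and return a list of error strings (empty on success).

    `parser` is expected to behave like trt.OnnxParser: parse() returns
    False on failure, and num_errors / get_error(i) expose the details.
    """
    if parser.parse(model_bytes):
        return []
    errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
    # Fall back to a generic message if the parser recorded nothing.
    return errors or ["ONNX parsing failed with no recorded error"]


def require_engine(engine):
    """Fail early with a clear message instead of a later AttributeError."""
    if engine is None:
        raise RuntimeError("Engine build returned None; check the TensorRT log")
    return engine
```

Inside a function like `build_engine`, the first helper could be called as `errors = parse_or_report(parser, f.read())`, with the build skipped and the errors printed when the list is non-empty.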

Could you please share the script and model file so we can help better?
Also, can you provide details on the platforms you are using:
o Linux distro and version
o GPU type
o Nvidia driver version
o CUDA version
o CUDNN version
o Python version [if using python]
o Tensorflow and PyTorch version
o TensorRT version

Thanks

kshamaramesh.25 · January 9, 2020, 6:50pm

import tensorrt as trt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda 
import time
model_path = "result.onnx"
input_size = 256
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
def build_engine(model_path):
    with trt.Builder(TRT_LOGGER) as builder,builder.create_network(EXPLICIT_BATCH) as network,trt.OnnxParser(network, TRT_LOGGER) as parser: 
        builder.max_workspace_size = 1<<20
        builder.max_batch_size = 1
        with open(model_path, "rb") as f:
            parser.parse(f.read())
        engine = builder.build_cuda_engine(network)
    return engine


#def inference(engine, context, inputs, out_cpu, in_gpu, out_gpu, stream):
    # async version
    # with engine.create_execution_context() as context:  # cost time to initialize
    # cuda.memcpy_htod_async(in_gpu, inputs, stream)
    # context.execute_async(1, [int(in_gpu), int(out_gpu)], stream.handle, None)
    # cuda.memcpy_dtoh_async(out_cpu, out_gpu, stream)
    # stream.synchronize()
def inference(engine, context, inputs,h_input, h_output, d_input, d_output,stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    return h_output

    '''
    # sync version
    cuda.memcpy_htod(in_gpu, inputs,stream)
    context.execute(1, [int(in_gpu), int(out_gpu)])
    cuda.memcpy_dtoh(out_cpu, out_gpu,stream)
    return out_cpu'''

def alloc_buf(engine):
    
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
    # Allocate device memory for inputs and outputs.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream=cuda.Stream()

    return h_input, h_output, d_input, d_output, stream
if __name__ == "__main__":
    for i in range(3):
      if(i==0):
        inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float32)
        engine = build_engine(model_path)
        print("Engine Created :",type(engine))
        context = engine.create_execution_context()
        print("Context executed ",type(context))
        serialized_engine = engine.serialize()
        t1 = time.time()
        #in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
        h_input, h_output, d_input, d_output,stream=alloc_buf(engine)
        res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
        #print(type(res))
        print("using fp32 mode:")
        print("cost time: ", time.time()-t1)
      if(i==1):
        inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float16)
        engine = build_engine(model_path)
        print("Engine Created :",type(engine))
        context = engine.create_execution_context()
        print("Context executed ",type(context))
        serialized_engine = engine.serialize()
        t1 = time.time()
        #in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
        h_input, h_output, d_input, d_output,stream=alloc_buf(engine)
        res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
        print(type(res))
        print("using fp16 mode:")
        print("cost time: ", time.time()-t1)
      if(i==2):
        inputs = np.random.random((1, 3, input_size, input_size)).astype(np.int8)
        engine = build_engine(model_path)
        print("Engine Created :",type(engine))
        context = engine.create_execution_context()
        print("Context executed ",type(context))
        serialized_engine = engine.serialize()
        t1 = time.time()
        #in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
        h_input, h_output, d_input, d_output,stream=alloc_buf(engine)
        res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
        #print(type(res))
        print("using int8 mode:")
        print("cost time: ", time.time()-t1)
    engine_path="FLtask.trt"
    with open(engine_path,"wb") as f:
      f.write(serialized_engine)
      print("Serialized engine")

Details:
1. Linux distro and version: Ubuntu 16.04
2. GPU type: Nvidia Geforcex
3. Nvidia driver version: 418
4. Nvidia version: 10.0
5. Python version: 3.5
6. PyTorch version: 1.3.0
7. TensorRT version: 6.0.1.5

SunilJB · January 10, 2020, 8:23am

Hi,
The issue seems to be due to the “EXPLICIT_BATCH” setting in the code.
In TRT 7, the ONNX parser supports full-dimensions mode only, so your network definition must be created with the explicitBatch flag set (when using the ONNX parser).

Since you are using TRT 6, please replace it with the code below:

with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:

I tested on both TRT 6 (after the code change) and TRT 7 (without changes), and it works fine with my test ONNX model.
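
The version-dependent choice described above could be captured in a small helper, so the same script runs on both releases. This is only a sketch for the TRT 6 and TRT 7 releases discussed in this thread: `network_creation_flags` is a hypothetical name, and the default bit position of 0 is an assumption about the value of `int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)`; pass the real value when TensorRT is available.

```python
def network_creation_flags(trt_version, explicit_batch_bit=0):
    """Return the flags value to pass to builder.create_network().

    trt_version: version string such as "6.0.1.5" (for example trt.__version__).
    explicit_batch_bit: int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH);
        the default of 0 is an assumption, not read from TensorRT here.
    """
    major = int(trt_version.split(".")[0])
    if major >= 7:
        # TRT 7 with the ONNX parser: explicit batch is required.
        return 1 << explicit_batch_bit
    # TRT 6, following the advice in this post: no flags (implicit batch).
    return 0
```

With TensorRT imported as `trt`, the call would look like `builder.create_network(network_creation_flags(trt.__version__, int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)))`.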

Engine Created 1: <class 'tensorrt.tensorrt.ICudaEngine'>
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Context executed  <class 'tensorrt.tensorrt.IExecutionContext'>
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use enqueue without batch size instead.
using fp32 mode:
cost time:  0.00426483154296875


Engine Created 2: <class 'tensorrt.tensorrt.ICudaEngine'>
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Context executed  <class 'tensorrt.tensorrt.IExecutionContext'>
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use enqueue without batch size instead.
<class 'numpy.ndarray'>
using fp16 mode:
cost time:  0.010645389556884766


Engine Created 3: <class 'tensorrt.tensorrt.ICudaEngine'>
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
Context executed  <class 'tensorrt.tensorrt.IExecutionContext'>
[TensorRT] WARNING: Explicit batch network detected and batch size specified, use enqueue without batch size instead.
using int8 mode:
cost time:  0.01060032844543457

Serialized engine
FLtask.trt  ---> File generated at the end

Thanks

kshamaramesh.25 · January 10, 2020, 11:24am

Hi,

I am still not able to resolve the issue after changing the code.

Can you please help me resolve this?

I have attached the ONNX file to this post.

kshamaramesh.25 · January 10, 2020, 12:46pm

Hello,

I changed the code accordingly but still see the issue. I have attached a screenshot showing which version of TensorRT I have installed.

Please find the changed code below.

import tensorrt as trt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import time

model_path = "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/result1.onnx"
input_size = 256
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
#EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(model_path):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 20
        builder.max_batch_size = 1
        with open(model_path, "rb") as f:
            parser.parse(f.read())
        engine = builder.build_cuda_engine(network)
    return engine

# def inference(engine, context, inputs, out_cpu, in_gpu, out_gpu, stream):
# async version
# with engine.create_execution_context() as context:  # cost time to initialize
# cuda.memcpy_htod_async(in_gpu, inputs, stream)
# context.execute_async(1, [int(in_gpu), int(out_gpu)], stream.handle, None)
# cuda.memcpy_dtoh_async(out_cpu, out_gpu, stream)
# stream.synchronize()
def inference(engine, context, inputs, h_input, h_output, d_input, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    return h_output

    '''
    # sync version
    cuda.memcpy_htod(in_gpu, inputs,stream)
    context.execute(1, [int(in_gpu), int(out_gpu)])
    cuda.memcpy_dtoh(out_cpu, out_gpu,stream)
    return out_cpu'''

def alloc_buf(engine):
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
    # Allocate device memory for inputs and outputs.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    return h_input, h_output, d_input, d_output, stream

if __name__ == "__main__":
    for i in range(3):
        if (i == 0):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float32)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using fp32 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 1):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float16)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            print(type(res))
            print("using fp16 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 2):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.int8)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using int8 mode:")
            print("cost time: ", time.time() - t1)
    engine_path = "FLtask.trt"
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
        print("Serialized engine")

SunilJB · January 10, 2020, 1:17pm

Hi,

Try changing the workspace size, something like

builder.max_workspace_size = 1<<30
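
For reference, the two shift expressions used in this thread correspond to very different limits. A quick arithmetic check in plain Python (the constant names are only illustrative):

```python
OLD_WORKSPACE = 1 << 20  # value in the original script: 1 MiB
NEW_WORKSPACE = 1 << 30  # suggested value: 1 GiB

print(OLD_WORKSPACE)                   # 1048576
print(NEW_WORKSPACE)                   # 1073741824
print(NEW_WORKSPACE // OLD_WORKSPACE)  # 1024
```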

If the issue persists, could you please share the complete error log and model file so we can help better?

Thanks

kshamaramesh.25 · January 10, 2020, 1:28pm

Hello,

The issue still persists. This is the complete error log:

/usr/bin/python3.5 /home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/sample_code.py
Engine Created : <class 'NoneType'>
[TensorRT] ERROR: Network must have at least one output
Traceback (most recent call last):
  File "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/sample_code.py", line 62, in <module>
    context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

Process finished with exit code 1

Please find the model file attached below.

SunilJB · January 10, 2020, 3:08pm

Hi,

This specific issue arises because the ONNX parser in your TensorRT version isn’t currently compatible with ONNX models exported from PyTorch 1.3. If you downgrade to PyTorch 1.2, this issue should go away.

Alternatively, upgrade to TRT 7. The latest TRT 7 supports PyTorch 1.3.

Thanks

kshamaramesh.25 · January 10, 2020, 6:00pm

Hello,

I upgraded TensorRT to version 7 and set the PyTorch version to 1.3, but I still see the issue. Please find attached screenshots of the TensorRT 7 and PyTorch versions.

import tensorrt as trt
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import time

model_path = "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/result7.onnx"
input_size = 256
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def build_engine(model_path):
    with trt.Builder(TRT_LOGGER) as builder,builder.create_network(EXPLICIT_BATCH) as network,trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1<<30
        builder.max_batch_size = 1
        with open(model_path, "rb") as f:
            parser.parse(f.read())
        engine = builder.build_cuda_engine(network)
    return engine
# def inference(engine, context, inputs, out_cpu, in_gpu, out_gpu, stream):
# async version
# with engine.create_execution_context() as context:  # cost time to initialize
# cuda.memcpy_htod_async(in_gpu, inputs, stream)
# context.execute_async(1, [int(in_gpu), int(out_gpu)], stream.handle, None)
# cuda.memcpy_dtoh_async(out_cpu, out_gpu, stream)
# stream.synchronize()
def inference(engine, context, inputs, h_input, h_output, d_input, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    return h_output

    '''
    # sync version
    cuda.memcpy_htod(in_gpu, inputs,stream)
    context.execute(1, [int(in_gpu), int(out_gpu)])
    cuda.memcpy_dtoh(out_cpu, out_gpu,stream)
    return out_cpu'''


def alloc_buf(engine):
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
    # Allocate device memory for inputs and outputs.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    return h_input, h_output, d_input, d_output, stream


if __name__ == "__main__":
    for i in range(3):
        if (i == 0):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float32)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using fp32 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 1):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.float16)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            print(type(res))
            print("using fp16 mode:")
            print("cost time: ", time.time() - t1)
        if (i == 2):
            inputs = np.random.random((1, 3, input_size, input_size)).astype(np.int8)
            engine = build_engine(model_path)
            print("Engine Created :", type(engine))
            context = engine.create_execution_context()
            print("Context executed ", type(context))
            serialized_engine = engine.serialize()
            t1 = time.time()
            # in_cpu, out_cpu, in_gpu, out_gpu, stream = alloc_buf(engine)
            h_input, h_output, d_input, d_output, stream = alloc_buf(engine)
            res = inference(engine, context, inputs.reshape(-1), h_input, h_output, d_input, d_output, stream)
            # print(type(res))
            print("using int8 mode:")
            print("cost time: ", time.time() - t1)
    engine_path = "FLtask.trt"
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
        print("Serialized engine")

Please help me resolve this.

SunilJB · January 10, 2020, 6:44pm

Hi,

If possible, could you please share your model file along with the error log so we can debug the issue further?
Alternatively, to validate/debug your ONNX model, you can run the “trtexec” command in --verbose mode and share the output log.

The “trtexec” command line tool can be used for benchmarking and for generating serialized engines from models.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-developer-guide/index.html#command-line-programs
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec#example-4-running-an-onnx-model-with-full-dimensions-and-dynamic-shapes
In TRT 7 you need to use explicit mode while running the command.
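
As a rough illustration of the command described above, the argument list can be assembled in Python and then run from a script. This is a sketch under assumptions: `trtexec_command` is a hypothetical helper, `result.onnx` is a placeholder path, and the flags used (`--onnx`, `--explicitBatch`, `--verbose`) are the ones understood to be accepted by the trtexec builds of that period; check `trtexec --help` on your installation.

```python
import shlex


def trtexec_command(onnx_path, explicit_batch=False, verbose=True):
    """Build the argument list for a trtexec validation run."""
    cmd = ["trtexec", "--onnx=" + onnx_path]
    if explicit_batch:
        # Needed with the ONNX parser on TRT 7, per the post above.
        cmd.append("--explicitBatch")
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command line; nothing is executed here.
    args = trtexec_command("result.onnx", explicit_batch=True)
    print(" ".join(shlex.quote(a) for a in args))
```

The returned list can be passed directly to `subprocess.run(...)` if trtexec is on the `PATH`, and its output saved as the log to share.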

Thanks

kshamaramesh.25 · January 10, 2020, 7:00pm

Hello ,

I have tried it using explicit mode. Please find the error log below, and the model and ONNX files attached.

[TensorRT] ERROR: Network must have at least one output
[TensorRT] ERROR: Network validation failed.
Engine Created : <class 'NoneType'>
Traceback (most recent call last):
  File "/home/bplus/Desktop/UNIT_MASTER_THESIS/UNIT/unit1/UNIT_Working/sample_code.py", line 61, in <module>
    context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'

SunilJB · January 10, 2020, 7:59pm

Hi,

Could you please share the generated ONNX file?

In my earlier post, explicit mode was mentioned for use with the trtexec command.

Meanwhile, please try to validate/debug your ONNX model by running the “trtexec” command in --verbose mode, and share the output log.

The “trtexec” command line tool can be used for benchmarking and for generating serialized engines from models.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-601/tensorrt-developer-guide/index.html#command-line-programs
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec#example-4-running-an-onnx-model-with-full-dimensions-and-dynamic-shapes

Thanks

kshamaramesh.25 · January 13, 2020, 8:03am

Hello,

I am not able to resolve this, so can you please help me resolve it?

Please find the ONNX model attached.

SunilJB · January 13, 2020, 10:20am

Hi,

For some reason, the attached ONNX model file is not visible to me.
Could you please re-upload the model file?

Also, could you please share the error log that you got while running the trtexec command?

Thanks

kshamaramesh.25 · January 13, 2020, 12:16pm

Hi,

I don’t know why, but I am not able to upload the ONNX file.

Can you please tell me another way I can upload it?

SunilJB · January 13, 2020, 12:38pm

Hi,

You can try uploading a zip file, as you did earlier for the .pt model, or you can upload the model to a third-party drive and share the link in a comment.

Thanks

kshamaramesh.25 · January 13, 2020, 12:53pm

Hi,

Please find the converted ONNX file below.

Please let me know if you require anything else.

SunilJB · January 14, 2020, 8:59am

Hi,

The issue seems to be due to a failure while parsing the ONNX model.
The ONNX model uses the “Pad” operation in “reflect” mode, and this version of TensorRT supports the “Pad” operation only in “constant” mode.

Either update the “Pad” operation to use “constant” mode, or create a custom plugin to replace the “Pad” operation in “reflect” mode.
Please refer to the link below for more information on the “Pad” operation:
https://github.com/onnx/onnx/blob/master/docs/Operators.md#Pad
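
To find which “Pad” nodes in an exported model are affected, the graph can be scanned for nodes whose mode is not “constant”. The sketch below is an illustration rather than part of the original reply: `pad_modes` and `unsupported_pads` are hypothetical helper names, and the node objects are assumed to have the shape of ONNX `NodeProto` entries (`op_type`, `name`, and `attribute` items with `name` and a bytes value `s`). It checks the mode only, not the padding value that the TensorRT assertion also tests.

```python
def pad_modes(nodes):
    """Yield (node_name, mode) for every Pad node in an ONNX graph.

    `nodes` is expected to look like onnx.load(path).graph.node.
    """
    for node in nodes:
        if node.op_type != "Pad":
            continue
        mode = "constant"  # ONNX default when the attribute is absent
        for attr in node.attribute:
            if attr.name == "mode":
                mode = attr.s.decode("utf-8")
        yield node.name, mode


def unsupported_pads(nodes):
    """Return the Pad nodes whose mode is anything other than 'constant'."""
    return [(name, mode) for name, mode in pad_modes(nodes) if mode != "constant"]
```

With the `onnx` package installed, this could be used as `unsupported_pads(onnx.load("result.onnx").graph.node)`, where the file name is a placeholder.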

“trtexec” command output using the current model:

ERROR: builtin_op_importers.cpp:2081 In function importPad:
[8] Assertion failed: mode == "constant" && value == 0 && "This version of TensorRT only supports constant 0 padding!"
[01/14/2020-05:33:27] [E] Failed to parse onnx file
[01/14/2020-05:33:27] [E] Parsing model failed

Thanks
