About the error 'NoneType' object has no attribute 'create_execution_context'

I am trying to run yolov3-mobilenet-v2 on my Jetson Nano, and creating the execution context fails with this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-7-5e5d94a8baad> in __init__(self, model, input_shape, category_num, cuda_ctx)
     26             # Create an execution context for inference with create_execution_context
---> 27             self.context = self.engine.create_execution_context()       
     28             self.inputs, self.outputs, self.bindings, self.stream = allocate_buffers(self.engine)

AttributeError: 'NoneType' object has no attribute 'create_execution_context'

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-15-7e90f933ff27> in <module>
      1 #12
      2 from IPython.display import Image
----> 3 main_one()
      4 Image("result.jpg")

<ipython-input-12-0b6b13d5f07a> in main_one()
      6     cls_dict = get_cls_dict("yolo_mobilenet_v2_traffic".split('_')[-1])
      7     model_name ="yolo_mobilenet_v2_traffic"
----> 8     trtYOLO = TrtYOLO(model_name, INPUT_HW)
      9     vis = BBoxVisualization(cls_dict)
     10     print("start detection!")

<ipython-input-7-5e5d94a8baad> in __init__(self, model, input_shape, category_num, cuda_ctx)
     29         # Report the error
     30         except Exception as e:
---> 31             raise RuntimeError('fail to allocate CUDA resources') from e
     32         finally:
     33             if self.cuda_ctx:

RuntimeError: fail to allocate CUDA resources

My environment:

1. JetPack 4.4
2. Python 3.6.9
3. TensorRT 7.1
4. CUDA 10.2
5. NumPy 1.16.1
6. OpenCV 4.1.1

Since I am using Jupyter, I cannot copy all of the code here directly. Here is the main class used for inference:

    import numpy as np
    import tensorrt as trt

    # allocate_buffers, do_inference, do_inference_v2, _preprocess_yolo and
    # _postprocess_yolo are defined in other notebook cells (not shown here).


    class TrtYOLO(object):
        """TrtYOLO class encapsulates things needed to run TRT YOLO."""

        def _load_engine(self):
            TRTbin = 'yolo/%s.bin' % self.model
            with open(TRTbin, 'rb') as f, trt.Runtime(self.trt_logger) as runtime:
                # Note: this returns None when deserialization fails.
                return runtime.deserialize_cuda_engine(f.read())

        def __init__(self, model, input_shape, category_num=80, cuda_ctx=None):
            """Initialize TensorRT plugins, engine and context."""
            self.model = model
            self.input_shape = input_shape
            self.category_num = category_num
            self.cuda_ctx = cuda_ctx
            if self.cuda_ctx:
                self.cuda_ctx.push()  # assumed: this line was lost in the paste

            self.inference_fn = do_inference if trt.__version__[0] < '7' else do_inference_v2
            self.trt_logger = trt.Logger(trt.Logger.INFO)
            self.engine = self._load_engine()

            try:
                self.context = self.engine.create_execution_context()
                self.inputs, self.outputs, self.bindings, self.stream = allocate_buffers(self.engine)
            except Exception as e:
                raise RuntimeError('fail to allocate CUDA resources') from e
            finally:
                if self.cuda_ctx:
                    self.cuda_ctx.pop()  # assumed: this line was lost in the paste

        def __del__(self):
            """Free CUDA memories."""
            del self.outputs
            del self.inputs
            del self.stream

        def detect(self, img, conf_th=0.3):
            """Detect objects in the input image."""
            img_resized = _preprocess_yolo(img, self.input_shape)

            # Set host input to the image. The do_inference() function
            # will copy the input to the GPU before executing.
            self.inputs[0].host = np.ascontiguousarray(img_resized)
            if self.cuda_ctx:
                self.cuda_ctx.push()  # assumed: this line was lost in the paste
            # The argument list below was lost in the paste; these keyword
            # names are an assumption about the do_inference helpers.
            trt_outputs = self.inference_fn(
                context=self.context,
                bindings=self.bindings,
                inputs=self.inputs,
                outputs=self.outputs,
                stream=self.stream)
            if self.cuda_ctx:
                self.cuda_ctx.pop()  # assumed: this line was lost in the paste

            boxes, scores, classes = _postprocess_yolo(trt_outputs, img.shape[1], img.shape[0], conf_th)

            # clip x1, y1, x2, y2 within original image
            boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img.shape[1]-1)
            boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img.shape[0]-1)
            return boxes, scores, classes

Here are some documents about the model I use and other related documents:

Hi @kevintgbd,
I believe the Jetson team will be better able to assist you here, so please raise this query in the Jetson forum.
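
One thing that may help narrow this down in the meantime: the traceback shows that `self.engine` is `None`, which means `runtime.deserialize_cuda_engine(...)` in `_load_engine` did not produce an engine. The `AttributeError` only appears one step later, so the real reason is hidden. Below is a minimal sketch of a check that fails at the point of deserialization instead. The helper name `load_engine_checked` and its `deserialize` parameter are hypothetical and not part of TensorRT or of the original notebook; the helper takes the deserializer as a plain callable so it can be tried without a GPU.

```python
import os


def load_engine_checked(engine_path, deserialize):
    """Read a serialized engine and fail loudly if deserialization yields None.

    `deserialize` is any callable that takes the raw bytes and returns an
    engine object (hypothetical parameter; in the notebook it would be
    `runtime.deserialize_cuda_engine` from an open `trt.Runtime`).
    """
    if not os.path.isfile(engine_path):
        raise FileNotFoundError("engine file not found: %s" % engine_path)

    with open(engine_path, "rb") as f:
        data = f.read()
    if not data:
        raise ValueError("engine file is empty: %s" % engine_path)

    engine = deserialize(data)
    if engine is None:
        # Stop here rather than at the later create_execution_context() call.
        raise RuntimeError(
            "deserialization returned None for %s (%d bytes); check the "
            "TensorRT log output for the underlying reason"
            % (engine_path, len(data)))
    return engine
```

Inside `_load_engine`, this could be called as `load_engine_checked('yolo/%s.bin' % self.model, runtime.deserialize_cuda_engine)` within the existing `with trt.Runtime(...)` block. When it raises, the messages printed by the `trt.Logger` just before the exception are usually the most useful clue. Possible causes worth ruling out (assumptions, not confirmed for this model) are an engine file built with a different TensorRT version or on a different device, and a custom plugin library that was not loaded before deserialization.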