PyCUDA error when running inference with TensorRT on Jetson Nano

I was able to convert my Darknet YOLOv3 model to TensorRT, and I also ran the prediction successfully once. But when I ran it again, it gave the error below.
I used the example from
/usr/src/tensorrt/samples/python/yolov3_onnx/
Since my Darknet model is custom with only 1 class, each YOLO layer has 18 filters and the input shape is 416.
So I changed output_shapes = [(1, 18, 13, 13), (1, 18, 26, 26), (1, 18, 52, 52)]
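Those numbers follow mechanically from the class count and input size. A small sketch (assuming the standard YOLOv3 strides of 32/16/8 and 3 anchors per head) makes the relationship explicit:

```python
def yolo_output_shapes(num_classes, input_size=416, batch=1):
    # Each YOLO head predicts 3 anchors x (num_classes + 5) values per grid cell:
    # 4 box coordinates + 1 objectness score + the class scores.
    filters = 3 * (num_classes + 5)
    # The three heads downsample the input by 32, 16, and 8 respectively.
    return [(batch, filters, input_size // stride, input_size // stride)
            for stride in (32, 16, 8)]

print(yolo_output_shapes(1, 416))   # [(1, 18, 13, 13), (1, 18, 26, 26), (1, 18, 52, 52)]
print(yolo_output_shapes(80, 608))  # [(1, 255, 19, 19), (1, 255, 38, 38), (1, 255, 76, 76)]
```

The second call reproduces the stock sample's shapes for the 80-class COCO model at 608×608, which is a quick sanity check that the formula matches the sample.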

Traceback (most recent call last):
  File "onnx_to_tensorrt.py", line 190, in <module>
    main()
  File "onnx_to_tensorrt.py", line 166, in main
    trt_outputs = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
  File "/home/experio/Documents/yolov3_onnx/common.py", line 191, in do_inference_v2
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
  File "/home/experio/Documents/yolov3_onnx/common.py", line 191, in <listcomp>
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument

The error is in common.py:

    def do_inference_v2(context, bindings, inputs, outputs, stream):
        # Transfer input data to the GPU.
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
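A `cuMemcpyHtoDAsync failed: invalid argument` at this line usually means the host buffer and the device allocation disagree in size. A pure-Python sketch of a pre-flight check that would catch the mismatch before the copy (the binding name `000_net` and the shapes below are illustrative, not read from the actual engine):

```python
from math import prod

def find_size_mismatches(binding_shapes, host_element_counts):
    # Compare each host buffer's element count against the volume the engine
    # allocated for that binding; a mismatch makes cuMemcpyHtoDAsync fail
    # with "invalid argument".
    return [(name, prod(shape), host_element_counts[name])
            for name, shape in binding_shapes.items()
            if prod(shape) != host_element_counts[name]]

# Illustrative scenario: engine built for 608x608, image preprocessed to 416x416.
binding_shapes = {"000_net": (1, 3, 608, 608)}
host_element_counts = {"000_net": 1 * 3 * 416 * 416}
print(find_size_mismatches(binding_shapes, host_element_counts))
# [('000_net', 1108992, 519168)]
```

In the real script the expected counts would come from the engine's binding shapes and the actual counts from the pagelocked host buffers allocated in `common.allocate_buffers()`.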

Hi,

The error indicates that the buffer sizes don't match.
For a customized model, you may also need to update the postprocessing configuration:

postprocessor_args = {"yolo_masks": [(6, 7, 8), (3, 4, 5), (0, 1, 2)],                    # A list of 3 three-dimensional tuples for the YOLO masks
                      "yolo_anchors": [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),  # A list of 9 two-dimensional tuples for the YOLO anchors
                                       (59, 119), (116, 90), (156, 198), (373, 326)],
                      "obj_threshold": 0.6,                                               # Threshold for object coverage, float value between 0 and 1
                      "nms_threshold": 0.5,                                               # Threshold for non-max suppression algorithm, float value between 0 and 1
                      "yolo_input_resolution": input_resolution_yolov3_HW}

This information can be found in the .cfg file.
Could you check it and make the corresponding updates?

Thanks.

Hi,

I have checked the anchors and masks in the custom .cfg file; they are the same. After running the code, I still get the same error.

trt_outputs = common.do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

In common.py:

    def do_inference_v2(context, bindings, inputs, outputs, stream):
        # Transfer input data to the GPU.
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]

This line raises the error. I also don't understand why it ran fine the first time but now throws this error.
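One plausible explanation for "worked once, then failed" (an assumption on my part, based on how the sample caches engines) is that onnx_to_tensorrt.py serializes the built engine to a .trt file on the first run and deserializes it on every later run; if the input/output settings were changed after that first build, the cached engine no longer matches the host buffers. Forcing a rebuild rules this out:

```python
import os

# Path the sample uses for the cached engine; adjust if you renamed it.
engine_file_path = "yolov3.trt"

# Remove the stale serialized engine so get_engine() rebuilds it from the ONNX
# file with the current input/output dimensions.
if os.path.exists(engine_file_path):
    os.remove(engine_file_path)
```

If the error disappears after a forced rebuild, the stale cached engine was the culprit.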

Thanks.

Hi,

Would you mind sharing the customized model (.cfg & weights) and any modified source with us?
We would like to reproduce this in our environment first.

Thanks.

Hi,
Here is the weights file,
and here is the cfg.

Thanks

Hi,

Since your input is (416,416), you will also need to update the input dimension:

diff --git a/onnx_to_tensorrt.py b/onnx_to_tensorrt.py
index c4fd70b..86b8fb4 100644
--- a/onnx_to_tensorrt.py
+++ b/onnx_to_tensorrt.py
@@ -113,7 +113,7 @@ def get_engine(onnx_file_path, engine_file_path=""):
                         print (parser.get_error(error))
                     return None
             # The actual yolov3.onnx is generated with batch size 64. Reshape input to batch size 1
-            network.get_input(0).shape = [1, 3, 608, 608]
+            network.get_input(0).shape = [1, 3, 416, 416]
             print('Completed parsing of ONNX file')
             print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
             engine = builder.build_cuda_engine(network)
@@ -141,7 +141,7 @@ def main():
         'https://github.com/pjreddie/darknet/raw/f86901f6177dfc6116360a13cc06ab680e0c86b0/data/dog.jpg', checksum_reference=None)

     # Two-dimensional tuple with the target network's (spatial) input resolution in HW ordered
-    input_resolution_yolov3_HW = (608, 608)
+    input_resolution_yolov3_HW = (416, 416)
     # Create a pre-processor object by specifying the required input resolution for YOLOv3
     preprocessor = PreprocessYOLO(input_resolution_yolov3_HW)
     # Load an image from the specified input path, and return it together with  a pre-processed version
@@ -150,7 +150,8 @@ def main():
     shape_orig_WH = image_raw.size

     # Output shapes expected by the post-processor
-    output_shapes = [(1, 255, 19, 19), (1, 255, 38, 38), (1, 255, 76, 76)]
+    output_shapes = [(1, 18, 13, 13), (1, 18, 26, 26), (1, 18, 52, 52)]
+
     # Do inference with TensorRT
     trt_outputs = []
     with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:

However, we hit an error when converting your model.
Could you first check whether the output dimensions are correct?

$ python3 onnx_to_tensorrt.py
Reading engine from file yolov3.trt
Running inference on image dog.jpg...
Traceback (most recent call last):
  File "onnx_to_tensorrt.py", line 186, in <module>
    main()
  File "onnx_to_tensorrt.py", line 166, in main
    trt_outputs = [output.reshape(shape) for output, shape in zip(trt_outputs, output_shapes)]
  File "onnx_to_tensorrt.py", line 166, in <listcomp>
    trt_outputs = [output.reshape(shape) for output, shape in zip(trt_outputs, output_shapes)]
ValueError: cannot reshape array of size 6498 into shape (1,18,13,13)
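The failing size is itself diagnostic: 6498 = 18 × 19 × 19, so the engine is still producing a 19×19 grid, which is what a 608×608 input yields (608 / 32 = 19), while a 416 input would give 13×13. This suggests the deserialized engine (or the ONNX export) was still built for 608×608 when the 416 output shapes were applied:

```python
# Per-head element count reported by the reshape error.
reported = 6498

# 18 filters on a 19x19 grid accounts for it exactly: the engine still runs at 608.
assert reported == 18 * 19 * 19
assert 608 // 32 == 19   # grid the engine actually produced
assert 416 // 32 == 13   # grid the post-processor expects
print("engine output is 19x19 -> built for 608x608 input")
```

Rebuilding the engine (and re-exporting the ONNX if it was also generated at 608) with the 416×416 input should make the output reshape to (1, 18, 13, 13) as expected.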

Thanks.