pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument

I’ve been trying to run https://github.com/penolove/yolov3-tensorrt/blob/eyeWitnessWrapper/naive_detector.py. I created the engine file with the yolov3_to_onnx.py and onnx_to_tensorrt.py scripts from the same repo without getting an error. But now when I try to run the inference script I get:

Traceback (most recent call last):
  File "naive_detector.py", line 171, in <module>
    detection_result = object_detector.detect(image_obj)
  File "naive_detector.py", line 82, in detect
    stream=self.stream)
  File "/home/orcun/yolov3-tensorrt/common.py", line 96, in do_inference
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
  File "/home/orcun/yolov3-tensorrt/common.py", line 96, in <listcomp>
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument

Any ideas?

I met the same problem when using TensorRT to do inference on an ONNX model. According to this issue, the problem is often associated with a memory error. In my case, it turned out that my input array was larger than the memory allocated for it. So check whether the input size and dtype match what the engine expects.
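For example, a minimal sanity check along those lines (a sketch only; the helper name is illustrative and not part of the linked repo):

import tensorrt as trt

def check_input_matches_binding(engine, binding, host_array):
    """Illustrative helper: verify a host array matches the size and dtype
    the engine expects for a binding, before copying it to the GPU."""
    expected_dtype = trt.nptype(engine.get_binding_dtype(binding))
    expected_size = trt.volume(engine.get_binding_shape(binding))
    if host_array.dtype != expected_dtype:
        raise TypeError("dtype mismatch: %s vs %s" % (host_array.dtype, expected_dtype))
    if host_array.size != expected_size:
        raise ValueError("size mismatch: %d vs %d" % (host_array.size, expected_size))

Note that for engines built with an implicit batch dimension, the binding shape excludes the batch axis (see the last reply in this thread).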

3 Likes

I just solved it somehow; I guess it was because of something I changed in the original code (something about the output tensor).

Thanks Orcdnz, case closed.

The TensorRT engine input type is numpy.float32, but we were giving it an int array. In our code, we changed np.ones((1, 1024, 1024, 3)) to np.ones((1, 1024, 1024, 3)).astype(np.float32).

3 Likes

As I see it, this error is pretty common, so I wanted to give an overview of what might be wrong in people’s code. The error is usually raised when the allocated GPU memory and the actual size of the data passed to that allocation are inconsistent. So you might want to check the related parts in your code.
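As a rough sketch of what that consistency means in a common.py-style allocation (the function name here is illustrative):

import pycuda.driver as cuda
import tensorrt as trt

def allocate_binding(engine, binding):
    """Illustrative allocation: the host and device buffers are sized from the
    binding, so whatever is copied in later must have exactly this many
    elements and this dtype."""
    size = trt.volume(engine.get_binding_shape(binding))
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host_mem = cuda.pagelocked_empty(size, dtype)   # page-locked host buffer
    device_mem = cuda.mem_alloc(host_mem.nbytes)    # matching device buffer
    return host_mem, device_mem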

3 Likes

Love it when someone reaches a solution and shares the cause with everyone else. Thank you, this saved me a lot of time!

3 Likes

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:

1. Validate your model with the snippet below.

check_model.py

import onnx

filename = yourONNXmodel  # placeholder: path to your .onnx file
model = onnx.load(filename)
onnx.checker.check_model(model)
2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

I have faced a similar issue while trying to pass input with a larger batch size than expected. With bs=1 it works like a charm. To use a larger bs, you need to create the ONNX model without dynamic axes and use it (see the export sketch below).
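For example, if the model comes from PyTorch (an assumption; the model and input shape below are stand-ins), the batch dimension can be fixed at export time by omitting dynamic_axes:

import torch
import torchvision

# Stand-in model for illustration; substitute your own network.
model = torchvision.models.resnet18().eval()

batch_size = 8                                       # the batch size you want to run
dummy_input = torch.randn(batch_size, 3, 224, 224)   # assumed input shape

# Omitting dynamic_axes fixes the batch dimension, so the ONNX graph (and the
# TensorRT engine built from it) expects exactly this batch size.
torch.onnx.export(
    model,
    dummy_input,
    "model_bs8.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)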

Posting here in case it will be useful for someone else.

I got the same error when I tried to run inference with batch_size>1 (not yolov3, but some other classifier / reid model), while inference worked fine with batch_size=1.

It turns out that the engine was built with an implicit batch size, and for such engines we need to use execute_async and not execute_async_v2.

context.execute_async(batch_size=batch_size,
                      bindings=bindings,
                      stream_handle=stream.handle)

And we need to do the following when allocating buffers:

binding_dims = engine.get_binding_shape(binding)
if len(binding_dims) == 4:
    # explicit batch case (TensorRT 7+)
    size = trt.volume(binding_dims)
elif len(binding_dims) == 3:
    # implicit batch case (TensorRT 6 or older)
    size = trt.volume(binding_dims) * engine.max_batch_size

This allowed me to run inference with batch_size>1 on engines built with an implicit batch size.
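Putting the two together, a small sketch of dispatching on the engine type (assuming TensorRT 7+, where has_implicit_batch_dimension is available; the function name is just illustrative):

def run_inference(context, engine, bindings, stream, batch_size=1):
    # Implicit-batch engines take an explicit batch_size via execute_async;
    # explicit-batch engines use execute_async_v2 instead.
    if engine.has_implicit_batch_dimension:
        context.execute_async(batch_size=batch_size,
                              bindings=bindings,
                              stream_handle=stream.handle)
    else:
        context.execute_async_v2(bindings=bindings,
                                 stream_handle=stream.handle)
    stream.synchronize()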

1 Like