pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument

I’ve been trying to run https://github.com/penolove/yolov3-tensorrt/blob/eyeWitnessWrapper/naive_detector.py. I created the engine file with the yolov3_to_onnx.py and onnx_to_tensorrt.py scripts from the same repo without getting an error. But now when I try to run the inference script I get:

Traceback (most recent call last):
  File "naive_detector.py", line 171, in <module>
    detection_result = object_detector.detect(image_obj)
  File "naive_detector.py", line 82, in detect
    stream=self.stream)
  File "/home/orcun/yolov3-tensorrt/common.py", line 96, in do_inference
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
  File "/home/orcun/yolov3-tensorrt/common.py", line 96, in <listcomp>
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument

Any ideas?

I met the same problem when using TensorRT to run inference on an ONNX model. According to this issue, the error is often associated with a memory mismatch. In my case, it turned out that my input array was larger than the memory that had been allocated for it. So check whether your input size and dtype match what the engine expects.
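
For anyone who wants to verify this, here is a minimal sketch of such a check, assuming the usual TensorRT Python binding API. engine is a deserialized trt.ICudaEngine and host_input is the NumPy array about to be copied to the device (both placeholder names), and it assumes the input is binding index 0 as in the YOLOv3 sample:

import tensorrt as trt

# Hypothetical sanity check before the host-to-device copy.
# Binding 0 is assumed to be the engine's input.
binding_shape = engine.get_binding_shape(0)               # e.g. (1, 3, 608, 608)
binding_dtype = trt.nptype(engine.get_binding_dtype(0))   # e.g. np.float32

expected = trt.volume(binding_shape)
print("expected elements:", expected, "got:", host_input.size)
print("expected dtype:", binding_dtype, "got:", host_input.dtype)
assert host_input.size == expected and host_input.dtype == binding_dtype

If either assertion fails, cuMemcpyHtoDAsync will see a buffer whose byte size disagrees with the allocation and raise exactly this "invalid argument" error.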

I just solved it somehow. I guess it was because of something I had changed in the original code (something related to the output tensor).

Thanks Orcdnz, case closed.

The TensorRT engine’s input type is numpy.float32, but we were giving it an array with the wrong dtype (np.ones defaults to float64). In our code, we changed np.ones((1, 1024, 1024, 3)) to np.ones((1, 1024, 1024, 3)).astype(np.float32).
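
For reference, a minimal before/after illustration of that change:

import numpy as np

# Before: np.ones defaults to np.float64, which does not match the
# engine's expected np.float32 input; the doubled byte size triggers
# the invalid-argument error on the copy.
dummy = np.ones((1, 1024, 1024, 3))

# After: cast explicitly to the dtype the engine was built with.
dummy = np.ones((1, 1024, 1024, 3)).astype(np.float32)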


As far as I can see, this error is pretty common, so I wanted to give an overview of what might be wrong in people’s code. The error is usually raised when the size of the allocated GPU memory and the size of the data actually passed to that allocation are inconsistent. So you might want to check the related parts of your code.
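
The usual way to keep the two consistent is to size both the host and device buffers from the engine bindings themselves, which is what the allocate_buffers helper in the TensorRT Python samples (and the common.py used in this thread) does. A sketch of that pattern, assuming the TensorRT 7-era binding API and using plain tuples in place of the samples' HostDeviceMem wrapper:

import pycuda.autoinit  # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine):
    """Size host and device buffers from the engine bindings so the
    later memcpy_htod_async calls never see a size mismatch."""
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:  # iterates over binding names
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked host buffer
        device_mem = cuda.mem_alloc(host_mem.nbytes)   # device buffer of matching size
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append((host_mem, device_mem))
        else:
            outputs.append((host_mem, device_mem))
    return inputs, outputs, bindings, stream

With buffers allocated this way, the remaining source of mismatch is the user-supplied array: filling the host buffer with np.copyto(host_mem, img.ravel()) fails with a clear NumPy broadcast error on a size mismatch, instead of the opaque cuMemcpyHtoDAsync failure later on.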