Traceback (most recent call last):
File "naive_detector.py", line 171, in <module>
detection_result = object_detector.detect(image_obj)
File "naive_detector.py", line 82, in detect
stream=self.stream)
File "/home/orcun/yolov3-tensorrt/common.py", line 96, in do_inference
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
File "/home/orcun/yolov3-tensorrt/common.py", line 96, in <listcomp>
[cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
pycuda._driver.LogicError: cuMemcpyHtoDAsync failed: invalid argument
I met the same problem when using TensorRT to run inference on an ONNX model. According to this issue, the problem is often associated with a memory error. In my case, it turned out that my input array was larger than the memory that had been allocated for it. So check whether the input size and dtype match what the engine expects.
The TensorRT engine input type is numpy.float32, but we were giving it an array of a different dtype. In our code, we changed np.ones((1, 1024, 1024, 3)) to np.ones((1, 1024, 1024, 3)).astype(np.float32).
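For reference, np.ones defaults to float64, so the original host buffer was twice the size of the float32 buffer the engine had allocated, which is exactly the kind of mismatch that triggers this error. A minimal sketch of the difference (plain NumPy, no TensorRT needed):

```python
import numpy as np

# np.ones defaults to float64, so this host buffer is twice as large
# as the float32 buffer the engine allocated for the binding.
bad = np.ones((1, 1024, 1024, 3))
good = np.ones((1, 1024, 1024, 3)).astype(np.float32)

print(bad.dtype, bad.nbytes)    # float64, 25165824 bytes
print(good.dtype, good.nbytes)  # float32, 12582912 bytes
```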
As far as I can see, this error is pretty common, so I wanted to give an overview of what might be wrong in people's code. It is usually raised when the GPU memory allocation and the actual size of the data passed into that allocation are inconsistent, so you might want to check the related parts of your code.
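One way to catch the inconsistency before the memcpy is to compare the host array against the shape and dtype that were used when allocating the device buffer. This is just an illustrative sketch; check_host_buffer and its arguments are names I made up, not part of the TensorRT or pycuda API:

```python
import numpy as np

def check_host_buffer(host_array, binding_shape, binding_dtype):
    """Raise if the host array would not fit the device allocation exactly."""
    expected_bytes = int(np.prod(binding_shape)) * np.dtype(binding_dtype).itemsize
    if host_array.dtype != np.dtype(binding_dtype):
        raise TypeError(
            f"dtype mismatch: host is {host_array.dtype}, "
            f"binding expects {np.dtype(binding_dtype)}")
    if host_array.nbytes != expected_bytes:
        raise ValueError(
            f"size mismatch: host buffer is {host_array.nbytes} bytes, "
            f"device allocation is {expected_bytes} bytes")

# Passes: shape and dtype both match the (hypothetical) binding.
check_host_buffer(np.ones((1, 3, 416, 416), np.float32), (1, 3, 416, 416), np.float32)
```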
Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:
1) Validate your model with the below snippet:
check_model.py
import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with trtexec command. https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
Thanks!
I have faced a similar issue while trying to pass input with a larger batch size than expected. With bs=1 it works like a charm. To use a larger bs, you need to create the ONNX model without dynamic axes and use that.
Posting here in case it will be useful for someone else.
I got the same error when I tried to run inference with batch_size > 1 (not yolov3, but some other classifier / re-id model), while inference worked fine with batch_size = 1.
It turns out that the engine was built with an implicit batch size, and for such engines we need to use execute_async and not execute_async_v2.
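In other words, the execute call has to match how the engine was built. The dispatch below is a sketch: engine.has_implicit_batch_dimension, execute_async, and execute_async_v2 are real members of the TensorRT Python API, but the run_async wrapper itself is just an illustrative helper, not library code:

```python
def run_async(engine, context, bindings, stream_handle, batch_size=1):
    # Engines built with an implicit batch dimension (the old builder flow)
    # take the batch size explicitly and must use execute_async; explicit-batch
    # engines (typical for ONNX imports) must use execute_async_v2 instead.
    if engine.has_implicit_batch_dimension:
        return context.execute_async(batch_size=batch_size,
                                     bindings=bindings,
                                     stream_handle=stream_handle)
    return context.execute_async_v2(bindings=bindings,
                                    stream_handle=stream_handle)
```

Calling the wrong variant for the engine type is another way to end up with mismatched memory expectations and errors like the one in this thread.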