Error while moving data from cuda-capable device to host memory - Error Code 1: Cuda Runtime (unspecified launch failure)

ranjeet_k · October 3, 2021, 9:39pm

I am following this tutorial to speed up object detection inference on my NVIDIA Jetson Nano.
I have done the following steps:

1. Converted the .onnx model to a .plan (engine) file and saved it to disk.
2. Loaded the engine using the following code:

with open(plan_path, 'rb') as f:
    engine_data = f.read()
engine = trt_runtime.deserialize_cuda_engine(engine_data)

3. Allocated buffers on the host (memory) and the device (GPU) for the input and output data:

h_input_1 = cuda.pagelocked_empty(batch_size *
    trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype($

print('size hinput', h_input_1.size)
h_output = cuda.pagelocked_empty(batch_size *
    trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype($
print(trt.volume(engine.get_binding_shape(1)), "trt.volume(engine.get_bindin$
print('size h_output', h_output.size)

d_input_1 = cuda.mem_alloc(h_input_1.nbytes)
print( h_input_1.nbytes)
print(d_input_1)

d_output = cuda.mem_alloc(h_output.nbytes)
print(d_output, h_output.nbytes)

stream = cuda.Stream()
return h_input_1, d_input_1, h_output, d_output, stream
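
As a rough sanity check on the sizes printed in this step, the byte count of each buffer should equal the batch size times the product of the binding dimensions times the element size. The sketch below only illustrates that arithmetic; the function name `binding_nbytes` and the 3 x 300 x 300 float32 shape are hypothetical and are not taken from the engine in this post.

```python
from functools import reduce
from operator import mul


def binding_nbytes(shape, batch_size, itemsize):
    """Bytes for one binding: batch_size * product of dims * bytes per element."""
    # Product of the dimensions, the same quantity trt.volume() returns.
    volume = reduce(mul, shape, 1)
    return batch_size * volume * itemsize


# Hypothetical binding: 3 x 300 x 300 float32 input, batch size 1.
print(binding_nbytes((3, 300, 300), 1, 4))  # 1080000
```

Comparing this value with h_input_1.nbytes and h_output.nbytes is a quick way to confirm that the host and device buffers have the expected size.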

4. Copied the input data (image) from host memory to device memory using cuda.memcpy_htod_async(d_input_1, h_input_1, stream), which completes successfully.
5. Ran the object detection network on the loaded input data.
6. Moved the output data from device memory to host memory using cuda.memcpy_dtoh_async(h_output, d_output, stream).

I am facing an issue in step 6, as shown in the following image.

Moreover, I can successfully run inference with the detectnet executable using the following command line:

detectnet --model=models/person/ssd-mobilenet.onnx --labels=models/person/labels.txt \
    --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
    "$IMAGES/person/000001.jpg" $IMAGES/test/person/000001.jpg

I believe (but am not sure) that it creates a runtime_engine to do the inference. Also, I could see in jtop that it uses the GPU when it runs.

I have the following questions:

1. What is the CUDA error, why does it occur, and how can I solve it?
2. As shown above, I can use the detectnet command to run inference. Which of the two approaches should be preferred for a real-time application, and what is the correct way of using it?
3. What is the difference between the two approaches?

AastaLLL · October 5, 2021, 3:12am

Hi,

1.
The error indicates that the GPU task hasn't finished yet.
Please add a synchronization call to make sure the job has finished.

2.
Both approaches use TensorRT as the inference backend.
You can choose either one based on your preference.

3.
If you need to combine inference with multimedia functionality such as camera input, encoding, or decoding,
you can also give our DeepStream SDK a try.

Thanks.
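
For point 1, here is a minimal sketch of where the synchronization call would go, using the buffer names from the question. The function name `infer_and_fetch` is hypothetical, and the inference call is an assumption, since the question does not show it: `execute_async_v2` is used here as one way to queue inference on a stream in TensorRT 7/8. The `cuda` argument is expected to be the `pycuda.driver` module; it is passed in only so the call order is easy to follow.

```python
def infer_and_fetch(context, cuda, h_input, d_input, h_output, d_output, stream):
    """Run one inference and return the host output once the GPU is done.

    `context` is assumed to be a TensorRT execution context and `cuda`
    the pycuda.driver module; binding 0 is the input, binding 1 the output.
    """
    # Step 4: queue the host-to-device copy of the input on the stream.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Step 5: queue the inference on the same stream (assumed call).
    context.execute_async_v2(bindings=[int(d_input), int(d_output)],
                             stream_handle=stream.handle)
    # Step 6: queue the device-to-host copy of the output.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Block until everything queued on the stream has finished, so that
    # h_output is only read after the copy has actually completed.
    stream.synchronize()
    return h_output
```

All three asynchronous calls only enqueue work, so the single stream.synchronize() at the end is what guarantees that h_output is complete before it is read.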
