Hi
I converted my NN to TensorRT with the Python API.
I want to verify the results before moving to the C++ API.
But I can't convert the output of the NN to numpy,
because the results are on the GPU.
When I try to move the results with:
output.cpu()
I get the error:
RuntimeError: Cuda error: unspecified launch failure.
How do I solve this issue?
Thanks,
Avi
Hi,
Would you mind sharing your code with us?
In general, you will need to copy the buffer back to the CPU with pyCUDA like this:
cuda.memcpy_dtoh_async(host_outputs[1], cuda_outputs[1], stream)
If you are using unified memory, which is a kind of memory shared between the CPU and GPU, please remember to call stream.synchronize()
before accessing the buffer from a different processor (e.g. GPU → CPU).
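Here is a minimal sketch of that flow, assuming a single float32 output binding; the buffer names and output_size below are hypothetical, so adapt them to your engine's bindings:

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context

# Hypothetical output size for illustration; use your engine's binding shape.
output_size = 1000

# Page-locked host buffer and a matching device buffer
host_output = cuda.pagelocked_empty(output_size, dtype=np.float32)
device_output = cuda.mem_alloc(host_output.nbytes)
stream = cuda.Stream()

# ... run inference here, e.g. context.execute_async_v2(bindings, stream.handle) ...

# Copy the result back to the CPU and wait for the copy to finish
cuda.memcpy_dtoh_async(host_output, device_output, stream)
stream.synchronize()

# host_output is now a plain numpy array you can inspect
print(host_output[:10])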
Thanks.
Thanks,
I found my error - I didn’t upload the input to gpu before sending it to the trt module.
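For reference, the fix looks roughly like this (trt_model below is a hypothetical stand-in for the converted TensorRT module, e.g. one produced by torch2trt):

import torch

# Hypothetical stand-in for the converted TensorRT module;
# substitute your own trt_model here.
trt_model = torch.nn.Conv2d(3, 8, 3).eval().cuda()

x = torch.ones((1, 3, 224, 224))

with torch.no_grad():
    # The input must be moved to the GPU before it is passed to the module;
    # forgetting this step caused the CUDA error above when calling .cpu() later.
    x = x.cuda()
    y = trt_model(x)

# Move the result back to the CPU and convert to numpy for verification
y_np = y.cpu().numpy()
print(y_np.shape)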
Hi,
Just want to confirm your issue is fixed.
Is everything okay after uploading the buffer to the GPU?
Thanks.