I converted my NN to TensorRT with the Python API.
I want to verify the results before moving to the C++ API.
But I can't convert the output of the NN to numpy, because the results are on the GPU.
And when I try to copy the results back, I get the error:
RuntimeError: Cuda error: unspecified launch failure.
How do I solve this issue?
Would you mind sharing your code with us?
In general, you will need to copy the buffer back to the CPU with PyCUDA, like this:
cuda.memcpy_dtoh_async(host_outputs, cuda_outputs, stream)
If you are using unified memory (a kind of memory shared between the CPU and GPU), please remember to call
stream.synchronize() before accessing the buffer from a different processor (e.g. GPU -> CPU).
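Putting the pieces together, a minimal sketch of the usual TensorRT + PyCUDA inference loop looks like the following. This is illustrative only: the buffer names (h_input, d_input, etc.) and the execution context are assumptions, not code from this thread, and older TensorRT versions use context.execute_async(batch_size, ...) instead of execute_async_v2.

```python
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

def infer(context, h_input, d_input, h_output, d_output, stream):
    # 1. Upload the input from host (CPU) to device (GPU).
    #    Forgetting this step leaves d_input uninitialized and can surface
    #    later as "unspecified launch failure".
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # 2. Enqueue inference on the GPU (TensorRT 7+ signature shown).
    context.execute_async_v2(bindings=[int(d_input), int(d_output)],
                             stream_handle=stream.handle)
    # 3. Copy the output back from device to host.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # 4. Block until all queued work finishes before reading h_output.
    stream.synchronize()
    return h_output
```

The host buffers are typically allocated with cuda.pagelocked_empty(shape, np.float32) and the device buffers with cuda.mem_alloc(h_buf.nbytes), so the async copies can overlap with computation on the stream.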
I found my error - I didn't upload the input to the GPU before sending it to the TRT module.
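For anyone hitting the same problem: the missing step is a host-to-device copy before running the engine. With PyCUDA that is roughly the following (buffer and context names are hypothetical):

```python
import pycuda.driver as cuda

# Upload the input to the GPU *before* invoking the TensorRT execution
# context; without this, the engine reads an uninitialized device buffer.
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async_v2(bindings=[int(d_input), int(d_output)],
                         stream_handle=stream.handle)
```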
Just want to confirm your issue is fixed.
Is everything okay after uploading the buffer to the GPU?