pycuda._driver.LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered

Hello,

I am getting this error under strange circumstances: it occurs whenever I use batch_size = 10 in the code below, while every other batch size works fine.

        output = np.zeros((batch_size,512),dtype=np.float32,order="C")
        
        print("number of bytes in output array : ",output.nbytes)
        # print("output type : ",type(output))
        # print(output.shape)
        d_input = cuda.mem_alloc(im_batch.nbytes)
        d_output = cuda.mem_alloc(output.nbytes)
        
        bindings = [int(d_input), int(d_output)]

        # copy input to device, run inference, copy output to host
        cuda.memcpy_htod_async(d_input,im_batch)
        self.context.execute_v2(bindings=bindings)
        # cuda.Context.synchronize()
        cuda.memcpy_dtoh_async(output,d_output)

Error message:

cuda.memcpy_dtoh_async(output,d_output)
pycuda._driver.LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
[05/17/2023-15:58:38] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[05/17/2023-15:58:38] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[05/17/2023-15:58:38] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[05/17/2023-15:58:38] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[05/17/2023-15:58:38] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[... the ScopedCudaEvent line above is repeated 26 more times ...]
[05/17/2023-15:58:38] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[05/17/2023-15:58:38] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[05/17/2023-15:58:38] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)
[05/17/2023-15:58:38] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (CUDA-capable device(s) is/are busy or unavailable)

The error appears only when I set batch_size = 10. Could you please help me identify the cause of this behavior?
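One thing worth ruling out with a batch-size-dependent illegal memory access is a mismatch between the host/device buffer sizes and the shapes the engine actually expects. Nothing in this thread confirms that as the cause; the following is only a minimal sketch of such a check, written in plain Python so it runs without a GPU. `expected_nbytes` and `check_buffer` are hypothetical helper names, not part of PyCUDA or TensorRT.

```python
from math import prod


def expected_nbytes(shape, itemsize):
    """Bytes needed for a dense tensor of `shape` with `itemsize` bytes per element."""
    if any(dim < 0 for dim in shape):
        # A negative dimension means a dynamic axis was never given a concrete size.
        raise ValueError(f"unresolved dynamic dimension in shape {tuple(shape)}")
    return prod(shape) * itemsize


def check_buffer(name, host_nbytes, binding_shape, itemsize):
    """Raise if the buffer size differs from what the binding shape implies."""
    needed = expected_nbytes(binding_shape, itemsize)
    if host_nbytes != needed:
        raise ValueError(
            f"{name}: buffer holds {host_nbytes} bytes, binding needs {needed}"
        )
    return needed


# Shapes from the post: batch_size = 10, 512 float32 values (4 bytes each) per item.
print(check_buffer("output", 10 * 512 * 4, (10, 512), 4))  # 20480
```

In the snippet above, `output.nbytes` and `im_batch.nbytes` would be passed as `host_nbytes`. The binding shape would have to come from the engine itself, for example via `self.context.get_binding_shape(1)`, assuming a TensorRT 8.x release where that method is available.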

I am facing the same problem at the line "self.context.execute_v2(bindings=bindings)" on a Jetson Orin Nano, with a single-batch output buffer of "np.empty([1,7],dtype=np.float16)".

For me, a batch size of 1 does not work (2, 3, and so on work).

I reconverted my TF model to ONNX with a fixed batch size of 1, then converted that fixed-batch-size ONNX model to TensorRT with explicitBatch, and the problem was solved.
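The fix described here comes down to replacing the dynamic batch axis with a constant before the engine is built. As a rough sketch of that shape bookkeeping only (it performs no model conversion), the hypothetical helper `pin_batch` below fixes the leading axis and rejects any other axis that is still dynamic. It assumes the exported model marks a dynamic axis as `None`, `-1`, or a symbolic name, which may not match every exporter.

```python
def pin_batch(shape, batch_size=1):
    """Return `shape` as a tuple with its leading (batch) axis set to a constant."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    fixed = (batch_size, *shape[1:])
    for dim in fixed[1:]:
        # Only the batch axis is pinned here; any other dynamic axis is an error.
        if not isinstance(dim, int) or dim < 1:
            raise ValueError(f"non-batch dimension {dim!r} is still dynamic")
    return fixed


# Output shape from the reply above, with the batch axis left dynamic by the exporter.
print(pin_batch((None, 7)))  # (1, 7)
```

The resulting fixed shape is what the host and device buffers would then be sized against.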