CUDA Illegal Memory Access using PyCUDA and TensorRT Inference

Hi guys, can someone help me? I get an illegal memory access error when running inference with the TensorRT Python API.

I also tried using DeviceMemoryPool() instead of cuda.mem_alloc(), but it still does not work.

import pycuda.driver as cuda
import tensorrt as trt
from pycuda.tools import DeviceMemoryPool

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    mem_pool = DeviceMemoryPool()

    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = mem_pool.allocate(host_mem.nbytes)

        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream, mem_pool

The inference function is similar to this:

def do_inference(engine, bindings, inputs, outputs, stream, batch_size=1):
    with engine.create_execution_context() as context:
        # Transfer input data to the GPU.
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]

        # Run inference.
        context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)

        # Transfer predictions back from the GPU.
        [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
        # Synchronize the stream
        stream.synchronize()

    # Return only the host outputs.
    return [out.host for out in outputs]
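
One thing that may be worth ruling out before the copies run is a buffer that is smaller than what a binding needs for the batch size actually passed to do_inference, since an undersized device buffer is one possible cause of error 700. The following is only a minimal sketch of such a size check in plain Python. The helper names expected_nbytes and check_buffer are hypothetical and are not part of TensorRT or PyCUDA, and the shape and byte counts in the example are made-up numbers.

```python
from math import prod

# Hypothetical helpers for illustration; not part of TensorRT or PyCUDA.


def expected_nbytes(shape, itemsize, batch_size=1):
    """Bytes one binding needs: batch_size * volume(shape) * itemsize."""
    dims = tuple(int(d) for d in shape)
    if any(d < 0 for d in dims):
        # A negative dimension means a dynamic shape that has not been set yet.
        raise ValueError(f"unresolved dynamic dimension in shape {dims}")
    return batch_size * prod(dims) * itemsize


def check_buffer(name, shape, itemsize, allocated_nbytes, batch_size=1):
    """Raise if the allocated buffer is smaller than the binding requires."""
    needed = expected_nbytes(shape, itemsize, batch_size)
    if allocated_nbytes < needed:
        raise ValueError(
            f"{name}: allocated {allocated_nbytes} bytes, need {needed}"
        )
    return needed


if __name__ == "__main__":
    # Example numbers only: a float32 binding of shape (3, 224, 224).
    print(check_buffer("input", (3, 224, 224), 4, allocated_nbytes=602112))
```

In the script above, this could be called for each binding with engine.get_binding_shape(binding) as the shape, host_mem.dtype.itemsize as the item size, host_mem.nbytes as the allocated size, and the batch_size given to do_inference. If it raises, the allocation is too small for that batch size; if it passes, the cause is probably elsewhere.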

I'm not sure what's wrong, but I suspect it has something to do with the memcpy_htod_async and memcpy_dtoh_async calls. This is the error output:

[TensorRT] ERROR: engine.cpp (169) - Cuda Error in ~ExecutionContext: 700 (an illegal memory access was encountered)
[TensorRT] ERROR: INTERNAL_ERROR: std::exception
[TensorRT] ERROR: Parameter check failed at: safeContext.cpp::terminateCommonContext::216, condition: cudnnDestroy(context.cudnn) failure.
[TensorRT] ERROR: Parameter check failed at: safeContext.cpp::terminateCommonContext::221, condition: cudaEventDestroy(context.start) failure.
[TensorRT] ERROR: Parameter check failed at: safeContext.cpp::terminateCommonContext::226, condition: cudaEventDestroy(context.stop) failure.
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception

I have also attached the nvidia-bug-report: nvidia-bug-report.log.gz (1.4 MB)