Hi,
I am trying out BERT inference from this tutorial: Real-Time Natural Language Processing with BERT Using NVIDIA TensorRT (Updated) | NVIDIA Technical Blog
However, I ran into the following error:
pycuda._driver.LogicError: cuMemHostRegister failed: operation not supported
The code that triggers the error is:
input_ids = cuda.register_host_memory(np.ascontiguousarray(input_ids_batch.ravel()))
segment_ids = cuda.register_host_memory(np.ascontiguousarray(segment_ids_batch.ravel()))
input_mask = cuda.register_host_memory(np.ascontiguousarray(input_mask_batch.ravel()))
This post suggests “use cudaHostAlloc() with the flag cudaHostAllocMapped to allocate device-mapped host-accessible memory”. However, the PyCUDA package does not seem to contain a method with that name.
What would be a workaround for cuda.register_host_memory in PyCUDA?
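The closest thing I could find so far is pycuda.driver.pagelocked_empty, which allocates a new page-locked host buffer instead of registering an existing one. Below is a minimal sketch of what I have in mind, not verified on this device. The helper name to_pinned and its alloc parameter are my own placeholders, and it assumes a CUDA context is already active (as it is in the tutorial script). If device-mapped memory is really required, pagelocked_empty also takes a mem_flags argument that I believe accepts cuda.host_alloc_flags.DEVICEMAP, but I have not tried that.

```python
import numpy as np


def to_pinned(array, alloc=None):
    """Copy `array` into a freshly allocated flat host buffer.

    `alloc` is any callable taking (shape, dtype) and returning a
    NumPy-compatible array. When it is None, PyCUDA's pagelocked_empty
    is used, which requires an active CUDA context.
    """
    flat = np.ascontiguousarray(array).ravel()
    if alloc is None:
        # Imported lazily so the helper can be exercised without a GPU.
        import pycuda.driver as cuda
        alloc = cuda.pagelocked_empty
    buf = alloc(flat.shape, flat.dtype)
    buf[:] = flat  # copy the data into the page-locked buffer
    return buf
```

With this, the three failing lines would become input_ids = to_pinned(input_ids_batch), and likewise for segment_ids and input_mask. The cost is one extra host-side copy per batch compared to registering the existing array in place.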
Thank you!