Alternative to PyCUDA cuda_host_register on Jetson TX2


I am trying out BERT inference from this tutorial on the NVIDIA Technical Blog: Real-Time Natural Language Processing with BERT Using NVIDIA TensorRT (Updated)

However, I ran into the following error:

pycuda._driver.LogicError: cuMemHostRegister failed: operation not supported

The code that triggers this error is:

input_ids = cuda.register_host_memory(np.ascontiguousarray(input_ids_batch.ravel()))
segment_ids = cuda.register_host_memory(np.ascontiguousarray(segment_ids_batch.ravel()))
input_mask = cuda.register_host_memory(np.ascontiguousarray(input_mask_batch.ravel()))

This post suggests “use cudaHostAlloc() with the flag cudaHostAllocMapped to allocate device-mapped host-accessible memory”. However, PyCUDA does not seem to provide such a method.
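For comparison, PyCUDA does appear to expose this allocation path: pycuda.driver.pagelocked_empty accepts a mem_flags argument, and host_alloc_flags.DEVICEMAP is the driver-API counterpart of cudaHostAllocMapped. The sketch below is a minimal illustration under stated assumptions, not the tutorial's code; the function name alloc_mapped is hypothetical. It allocates a mapped, page-locked buffer, copies the batch into it, and returns the device-side address reported by the allocation's get_device_pointer(). It assumes a current CUDA context created with ctx_flags.MAP_HOST, which is required on CUDA releases before 11.0.

```python
import numpy as np


def alloc_mapped(arr):
    """Copy `arr` into page-locked host memory that is also mapped into the
    device address space (cuMemHostAlloc with CU_MEMHOSTALLOC_DEVICEMAP).

    Returns (host_array, device_pointer). Assumes a current CUDA context that
    was created with ctx_flags.MAP_HOST (needed on CUDA releases before 11.0).
    """
    import pycuda.driver as cuda  # imported lazily; requires a working CUDA driver

    src = np.ascontiguousarray(arr).ravel()
    host = cuda.pagelocked_empty(
        src.shape, src.dtype, mem_flags=cuda.host_alloc_flags.DEVICEMAP
    )
    host[:] = src
    # The ndarray's base is the PyCUDA host allocation; ask it for the
    # address at which this memory is mapped on the device.
    dev_ptr = host.base.get_device_pointer()
    return host, dev_ptr
```

To try it, create the context yourself instead of importing pycuda.autoinit, for example cuda.init() followed by ctx = cuda.Device(0).make_context(flags=cuda.ctx_flags.MAP_HOST), and call ctx.pop() when finished. Keep the returned host array alive for as long as the device pointer is in use, because freeing the array releases the mapping.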

What would be a workaround for cuda_host_register in PyCUDA?

Thank you!


This is a hardware limitation.
cudaHostRegister is only supported on platforms with compute capability 7.2 or higher, and Jetson TX2 has compute capability 6.2.
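If zero-copy access is not needed and the goal is only page-locked host buffers for the asynchronous copies, one possible workaround is to allocate pinned memory with cuMemHostAlloc (through pycuda.driver.pagelocked_empty) and copy each batch into it, rather than pinning the existing NumPy buffer with cuMemHostRegister. This is a sketch under assumptions: the helper names supports_host_register and to_pinned are hypothetical, and the 7.2 threshold is taken from the answer above.

```python
import numpy as np

# Threshold stated in the answer: cudaHostRegister needs compute capability
# >= 7.2 on this platform; Jetson TX2 reports 6.2.
MIN_REGISTER_CC = (7, 2)


def supports_host_register(compute_capability):
    """Return True if a (major, minor) compute capability meets the 7.2 threshold."""
    return tuple(compute_capability) >= MIN_REGISTER_CC


def to_pinned(arr, allocator=None):
    """Copy `arr` into newly allocated page-locked host memory instead of
    registering the existing buffer.

    `allocator` defaults to pycuda.driver.pagelocked_empty; it is injectable
    only so the helper can be exercised without a GPU.
    """
    src = np.ascontiguousarray(arr).ravel()
    if allocator is None:
        # Needs an active CUDA context, e.g. one created by pycuda.autoinit.
        import pycuda.driver as cuda
        allocator = cuda.pagelocked_empty
    dst = allocator(src.shape, src.dtype)
    np.copyto(dst, src)  # one extra host-side copy replaces the registration
    return dst
```

Under these assumptions, the three register_host_memory lines in the question would become input_ids = to_pinned(input_ids_batch) and likewise for segment_ids and input_mask, presumably leaving the tutorial's later memcpy_htod_async calls unchanged. The cost is one additional host-side copy per batch; on a device where supports_host_register(cuda.Device(0).compute_capability()) returns True, registering the existing buffer remains an option.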

You can find this information in the document below: