Can MPI_Scatter scatter from a pinned host pointer to GPU memory?

Hi. I have a scientific application that I'm building with nvhpc-hpcx-cuda13/26.3. I'm testing the simple case of a single host with a single GPU, and I'm trying to scatter from pinned host memory (allocated with cudaMallocHost) to GPU memory (allocated with cudaMalloc). Here is the relevant code:

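```c
// Pinned (page-locked) host buffer holding NR doubles
CUDA_RT_CALL(cudaMallocHost((void **)&u, sizeof(double) * (NR)));
// Device buffer; size_u_device is a byte count, as cudaMalloc expects
CUDA_RT_CALL(cudaMalloc((void **)&u_device, size_u_device));
// Rank 0 sends nelem_init_device doubles to each rank: host memory -> device memory
MPI_CALL(MPI_Scatter(u + offset, nelem_init_device, MPI_DOUBLE,
                     u_device + offset, nelem_init_device, MPI_DOUBLE,
                     0, MPI_COMM_WORLD));
```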

I've checked all the size parameters (offset, nelem_init_device, etc.), but I get the following errors when I step through the scatter in cuda-gdb. Is this type of scatter supported, or do I have to copy the data to the device first and then scatter from device 0 to all the other ranks? Thanks.

```
Cuda Driver error detected: Address specified(addr: 0x32c882020) must belong to a range reserved previously by cuMemAddressReserve()
Cuda Driver error detected: Returning 1 (CUDA_ERROR_INVALID_VALUE) from cuMemRetainAllocationHandle
Cuda Driver error detected: Parameter memPool cannot be NULL
Cuda Driver error detected: Returning 1 (CUDA_ERROR_INVALID_VALUE) from cuMemPoolGetAccess
```
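For reference, here is a minimal sketch of the size checks mentioned above, written as plain C. On the root, MPI_Scatter reads nprocs blocks of nelem_init_device doubles back to back starting at u + offset, and every rank writes nelem_init_device doubles starting at u_device + offset. The sketch assumes NR is the number of doubles in u and size_u_device is the byte count passed to cudaMalloc; the helper names scatter_send_fits and scatter_recv_fits are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper for the root's send buffer.
 *   nr     : number of doubles in the host buffer u
 *   offset : element offset applied to u
 *   nelem  : per-rank count (nelem_init_device)
 *   nprocs : communicator size, e.g. from MPI_Comm_size
 * Returns true if u[offset, offset + nprocs*nelem) lies inside u.
 * The product nprocs*nelem is never formed, so it cannot overflow. */
bool scatter_send_fits(size_t nr, size_t offset, size_t nelem, size_t nprocs)
{
    if (offset > nr)
        return false;
    size_t remaining = nr - offset;   /* doubles available after the offset */
    /* For nelem > 0: nprocs*nelem <= remaining  <=>  nprocs <= remaining/nelem */
    return nelem == 0 || nprocs <= remaining / nelem;
}

/* Hypothetical helper for each rank's receive buffer.
 *   size_u_device_bytes : byte count that was passed to cudaMalloc
 *   offset, nelem       : as above
 * Returns true if u_device[offset, offset + nelem) lies inside the allocation. */
bool scatter_recv_fits(size_t size_u_device_bytes, size_t offset, size_t nelem)
{
    size_t cap = size_u_device_bytes / sizeof(double); /* whole doubles in u_device */
    return offset <= cap && nelem <= cap - offset;
}
```

Calling scatter_send_fits(NR, offset, nelem_init_device, nprocs) on rank 0 and scatter_recv_fits(size_u_device, offset, nelem_init_device) on every rank just before the MPI_Scatter call should rule out an out-of-bounds buffer as the cause.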