How to understand "CU_FILE_RDMA_REGISTER"?

dongkesi · March 11, 2024, 8:32am

Dear all,

What does this macro in cufile.h mean? When will it be used? Thanks.

dong

striker159 · March 11, 2024, 8:45am

According to the Ubuntu man page (Ubuntu Manpage: cufile.h - cuFile C APIs):

#define CU_FILE_RDMA_REGISTER 1

CUfileError_t cuFileBufRegister (const void * devPtr_base, size_t length, int flags)
register an existing cudaMalloced memory with cuFile to pin for GPUDirect Storage access.
  Parameters:
      devPtr_base device pointer to allocated
      length size of memory region from the above specified devPtr
      flags CU_FILE_RDMA_REGISTER

Note that this description differs from the official documentation (cuFile API Reference Guide - NVIDIA Docs):

CUfileError_t cuFileBufRegister(const void *bufPtr_base,
size_t size, int flags);

Based on the memory type, this API registers existing cuMemAlloc’d (pinned) memory for GDS IO operations or host memory for IO operations.

Parameters

bufPtr_base Address of device pointer. cuFileRead and cuFileWrite must use this bufPtr_base as the base address.
size Size in bytes from the start of memory to map.
flags Reserved for future use, must be 0.

dongkesi · March 11, 2024, 9:27am

Thanks for your reply. I find that the header file (cufile.h 11.6) does not say “Reserved for future use, must be 0”. Which one is right? If it’s an experimental flag, in which cases should the flag be set to “CU_FILE_RDMA_REGISTER”? Thanks.

user157267 · March 11, 2024, 12:37pm

Same here. My cufile.h header matches the Ubuntu man page, so I will treat the header as the ground truth:

 /**
  * @brief register an existing cudaMalloced memory with cuFile to pin for GPUDirect Storage access.
  *
  * @param devPtr_base  device pointer to allocated
  * @param length  size of memory region from the above specified devPtr
  * @param flags   CU_FILE_RDMA_REGISTER
  *
  * @return  CU_FILE_SUCCESS on success
  * @return  CU_FILE_NVFS_DRIVER_ERROR
  * @return  CU_FILE_INVALID_VALUE
  * @return  CU_FILE_CUDA_ERROR for unsuported memory type
  * @return  CU_FILE_MEMORY_ALREADY_REGISTERED on error
  * @return  CU_FILE_GPU_MEMORY_PINNING_FAILED if not enough pinned memory is available
  * @note This memory will be use to perform GPU direct DMA from the supported storage.
  * @warning This API is intended for usecases where the memory is used as streaming buffer that is reused across multiple cuFile IO operations before calling @ref cuFileBufDeregister
  *
  * @see cuFileBufDeregister
  * @see cuFileRead
  * @see cuFileWrite
  */
 CUfileError_t cuFileBufRegister(const void *devPtr_base, size_t length, int flags);

I want to add that the sample code repo also has no reference to the CU_FILE_RDMA_REGISTER flag: GitHub - NVIDIA/MagnumIO: Magnum IO community repo

kmodukuri · March 11, 2024, 4:49pm

When the flags argument is set to CU_FILE_RDMA_REGISTER (1), the buffers are pre-registered for use with RDMA.
This is useful for Weka and GPFS mounts, where it can help reduce the latency of the first IO that uses the buffer for RDMA. If the flag is not set, the buffers will be automatically registered during the IO path.

This flag should not be used if the application is used with Lustre, NFS, BeeGFS or NVMe mounts.
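
A minimal sketch of that guidance in C, assuming the application already knows which kind of mount it is targeting. The helper name choose_buf_register_flags and the "weka" / "gpfs" labels are hypothetical and chosen only for illustration; they are not part of the cuFile API. The macro value is the one quoted from the header earlier in this thread, guarded so the sketch compiles without cufile.h.

```c
#include <stddef.h>
#include <string.h>

/* Value as quoted from cufile.h in this thread; the guard keeps the
 * sketch buildable on a machine without the cuFile headers. */
#ifndef CU_FILE_RDMA_REGISTER
#define CU_FILE_RDMA_REGISTER 1
#endif

/* Hypothetical helper (not a cuFile function): returns the flags value
 * to pass to cuFileBufRegister for a given mount type. The fs_type
 * strings are assumed labels supplied by the application. */
static int choose_buf_register_flags(const char *fs_type)
{
    /* Mounts where pre-registering for RDMA was described as useful. */
    static const char *const rdma_prereg_fs[] = { "weka", "gpfs" };
    size_t i;

    if (fs_type == NULL)
        return 0;

    for (i = 0; i < sizeof rdma_prereg_fs / sizeof rdma_prereg_fs[0]; ++i) {
        if (strcmp(fs_type, rdma_prereg_fs[i]) == 0)
            return CU_FILE_RDMA_REGISTER;
    }

    /* Lustre, NFS, BeeGFS, NVMe and anything unknown: leave flags at 0
     * so the buffers are registered automatically in the IO path. */
    return 0;
}
```

The returned value would then go in as the third argument, e.g. cuFileBufRegister(devPtr_base, length, choose_buf_register_flags("gpfs")), with any other mount type falling back to 0.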

dongkesi · March 12, 2024, 2:50am

Thanks for the detailed explanation, but why can’t it be used for these DFS?
