Support GPUDirect RDMA on Jetson AGX Orin development kit

Hi

Is it possible to enable GPUDirect RDMA on the Jetson AGX Orin Development Kit?
I’m using software release R35.3.1. I compiled the NVIDIA/gds-nvidia-fs kernel driver (NVIDIA GPUDirect Storage Driver, from GitHub) for the Jetson AGX Orin and loaded it with insmod. However, it does not appear to support GPUDirect RDMA…

If the Jetson AGX Orin does not support GPUDirect, is there any plan to support it?
Or is it impossible to support because of a hardware limitation?

Here is the log…

@ubuntu:/usr/local/cuda-11.8$ sudo python /usr/local/cuda-11.8/gds/tools/gdscheck.py -p
GDS release version: 1.4.0.31
nvidia_fs version: 2.15 libcufile version: 2.12
Platform: aarch64

ENVIRONMENT:

=====================
DRIVER CONFIGURATION:

NVMe : Supported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0

CUFILE CONFIGURATION:

properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false

GPU INFO:

GPU index 0 Orin: Model Not Supported

PLATFORM INFO:

IOMMU: disabled
Platform verification succeeded

Hi,

There is a symbol conflict issue that we are actively working on.
For now, you can apply the workaround (WAR) below manually.

Thanks.

Sure. Thank you so much. Let me try it.

Hi AastaLLL,

I compiled the following kernel modules and loaded them with insmod. However, gdscheck still reports the GPU as not supported… Do you have any advice?

sungwook@ubuntu:/usr/local/cuda-11.8/gds/tools$ lsmod | grep rdma
picoevb_rdma 24576 0
nvidia_p2p 20480 1 picoevb_rdma
sungwook@ubuntu:/usr/local/cuda-11.8/gds/tools$ lsmod | grep nvidia
nvidia_fs 262144 0
nvidia_p2p 20480 1 picoevb_rdma
nvidia_modeset 1093632 6
nvidia 1462272 13 nvidia_modeset

sungwook@ubuntu:/usr/local/cuda-11.8/gds/tools$ python gdscheck.py -p
warn: error opening log file: Permission denied, logging will be disabled
GDS release version: 1.4.0.31
nvidia_fs version: 2.15 libcufile version: 2.12
Platform: aarch64

ENVIRONMENT:

=====================
DRIVER CONFIGURATION:

NVMe : Supported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0

CUFILE CONFIGURATION:

properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false

GPU INFO:

GPU index 0 Orin: Model Not Supported

PLATFORM INFO:

IOMMU: disabled
Platform verification succeeded
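
For comparing two runs like the ones above, a short script can pull the `key : value` lines out of the `gdscheck.py -p` report and list the entries that look relevant to GPUDirect. This is only a sketch: the function names `parse_gdscheck` and `flagged` are hypothetical, the sample text is abridged from the log in this thread, and the choice of which entries to flag is an assumption, not something gdscheck itself documents.

```python
# Sample abridged from the gdscheck.py -p output in this thread.
SAMPLE = """\
NVMe : Supported
NVMeOF : Unsupported
Userspace RDMA : Unsupported
--rdma library : Not Loaded (libcufile_rdma.so)
properties.use_compat_mode : true
GPU index 0 Orin: Model Not Supported
IOMMU: disabled
"""


def parse_gdscheck(text):
    """Return {key: value} for every 'key : value' line in the report."""
    report = {}
    for line in text.splitlines():
        # Drop the leading dashes (ASCII or en dash) on the RDMA sub-entries.
        line = line.strip().lstrip("-\u2013")
        key, sep, value = line.partition(":")
        # Section headers such as 'GPU INFO:' have no value and are skipped.
        if sep and value.strip():
            report[key.strip()] = value.strip()
    return report


def flagged(report):
    """List entries that (by assumption) matter for GPUDirect on this board."""
    hits = []
    for key, value in report.items():
        is_gpu_line = key.startswith("GPU index") and "Not Supported" in value
        is_rdma_line = key == "Userspace RDMA" and value == "Unsupported"
        if is_gpu_line or is_rdma_line:
            hits.append(f"{key}: {value}")
    return hits


if __name__ == "__main__":
    for entry in flagged(parse_gdscheck(SAMPLE)):
        print(entry)
```

To use it on a real run, save the tool's output to a file and pass the file contents to `parse_gdscheck` instead of `SAMPLE`.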

Hi,

It seems that you are using CUDA 11.8.

How did you install CUDA 11.8 on Orin?
Did you upgrade the CUDA package from the website?

Thanks.

Yes. I installed CUDA 11.8 from the website so that I could use GDS…

On Sun, Apr 23, 2023 at 11:18 PM, AastaLLL via NVIDIA Developer Forums <notifications@nvidia.discoursemail.com> wrote:

Hi,

Could you use the default CUDA package that comes with JetPack to see if it works?
Thanks.

How can I check GDS with the default CUDA package? I cannot find a gds folder in the CUDA 11.4 folder.

Thanks, Sungwook

On Mon, Apr 24, 2023 at 8:57 PM, AastaLLL via NVIDIA Developer Forums <notifications@nvidia.discoursemail.com> wrote:

Hi,

Is it possible to check the RDMA functionality without using gds?

It’s expected that RDMA can work on Jetson.
But we don’t have experience using it together with GDS.

This experiment can give us some hints about where the issue comes from.
Thanks.
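
A first step for that experiment could be to confirm that the kernel modules involved are loaded, independently of gdscheck. The sketch below parses `lsmod`-style text (the same first-column layout as `/proc/modules`) and reports which expected modules are absent. The module names are taken from the `lsmod` output earlier in this thread; treating them as the required set is an assumption, and the helper names are hypothetical. It only checks a precondition and does not prove that an RDMA transfer works.

```python
# Sample copied from the lsmod output earlier in this thread.
SAMPLE = """\
nvidia_fs 262144 0
nvidia_p2p 20480 1 picoevb_rdma
nvidia_modeset 1093632 6
nvidia 1462272 13 nvidia_modeset
"""


def loaded_modules(text):
    """Return the set of module names found in lsmod or /proc/modules text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the 'Module Size Used by' header from lsmod.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def missing_modules(text, wanted=("nvidia_p2p", "picoevb_rdma")):
    """Return the wanted modules (an assumed list) that are not loaded."""
    present = loaded_modules(text)
    return [name for name in wanted if name not in present]
```

On the board itself, the same check can be run against the live module list with `missing_modules(open("/proc/modules").read())`.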
