Hi,
I am having trouble installing GDS on Ubuntu 22.04 with kernel version 6.5.0-44-generic.
According to the thread 'Nvidia-fs could not be loaded: several "Unknown symbol" errors', starting with version 2.17.5 of nvidia-fs.ko, GDS is not supported with the proprietary kernel driver, so I tried to install the NVIDIA open kernel driver instead.
I followed the instructions in the CUDA Installation Guide for Linux.
However, I can't find the cuda-drivers package. Only cuda-drivers-fabricmanager-535 exists.
I also tried cloning the open-source repository (GitHub - NVIDIA/open-gpu-kernel-modules at 535.183.01)
and ran these two commands:
make modules -j$(nproc)
make modules_install -j$(nproc)
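As a sanity check after the build, a minimal sketch like the one below could show which driver flavor is installed for the running kernel. It relies on two assumptions that are not confirmed in this post: that the open kernel modules report the license string "Dual MIT/GPL" and that the proprietary module reports "NVIDIA". The helper name classify_nvidia_license is made up for illustration.

```shell
#!/bin/sh
# Classify an NVIDIA kernel module by its license string.
# Assumption: open modules report "Dual MIT/GPL"; the proprietary one reports "NVIDIA".
classify_nvidia_license() {
  case "$1" in
    *MIT/GPL*) echo "open" ;;
    NVIDIA)    echo "proprietary" ;;
    "")        echo "not-found" ;;
    *)         echo "unknown" ;;
  esac
}

# modinfo -F prints a single field of the module file installed for the
# running kernel. Errors are silenced so this is harmless without a driver.
license=$(modinfo -F license nvidia 2>/dev/null || true)
echo "nvidia.ko flavor: $(classify_nvidia_license "$license")"
```

Note that modinfo inspects the module file on disk. A module that was already loaded before make modules_install stays in memory until it is unloaded or the machine is rebooted, so the result may not match what is currently running.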
Unfortunately, when I run /usr/local/cuda-12.2/gds/tools/gdscheck.py -p, the GDS version still does not appear:
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Unsupported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : true
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 1024
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 18014398509481980
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : false
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 0
execution.max_io_queue_depth : 128
execution.parallel_io : false
execution.min_io_threshold_size_kb : 1024
execution.max_request_parallelism : 0
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA RTX A4000 bar:1 bar size (MiB):256 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Platform verification succeeded
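Since every driver above is reported as Unsupported, one thing that might be worth checking is whether the nvidia_fs module is loaded at all. The sketch below is only an illustration: the helper names module_listed and count_supported are hypothetical, and it assumes gdscheck marks working drivers with a trailing ": Supported", as the layout of the report above suggests.

```shell
#!/bin/sh
# Succeed if the named module appears in a /proc/modules-style listing on stdin.
module_listed() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit(found ? 0 : 1) }'
}

# Count the lines of a gdscheck-style report on stdin that end in ": Supported".
# "|| true" keeps the exit status at zero when grep finds no match.
count_supported() {
  grep -c ': Supported[[:space:]]*$' || true
}

# Live check against the running kernel (prints one status line).
if [ -r /proc/modules ] && module_listed nvidia_fs < /proc/modules; then
  echo "nvidia_fs is loaded"
else
  echo "nvidia_fs is not loaded"
fi
```

A saved report could be checked with, for example, count_supported < gdscheck.txt, where gdscheck.txt is a hypothetical file holding the output shown above.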
Here is the rest of my environment:
OS: Ubuntu 22.04
kernel version: 6.5.0-44-generic
CUDA: 12.2
GPU driver version: 535.183.01
MLNX_OFED: MLNX_OFED_LINUX-23.10-3.2.2.0-ubuntu22.04-x86_64
Can anyone offer some advice?
Thanks.