Hi Team,
We have enabled NVIDIA A100 GPUs with GPUDirect Storage (GDS) on a Dell PowerEdge R760 server with a U.2 NVMe device; the gdscheck output is below.
[root@A100 ~]# /usr/local/cuda-12.3/gds/tools/gdscheck.py -p
GDS release version: 1.8.1.2
nvidia_fs version: 2.18 libcufile version: 2.12
Platform: x86_64
============
ENVIRONMENT:
============
=====================
DRIVER CONFIGURATION:
=====================
NVMe : Supported
NVMeOF : Unsupported
SCSI : Unsupported
ScaleFlux CSD : Unsupported
NVMesh : Unsupported
DDN EXAScaler : Unsupported
IBM Spectrum Scale : Unsupported
NFS : Unsupported
BeeGFS : Unsupported
WekaFS : Unsupported
Userspace RDMA : Unsupported
--Mellanox PeerDirect : Disabled
--rdma library : Not Loaded (libcufile_rdma.so)
--rdma devices : Not configured
--rdma_device_status : Up: 0 Down: 0
=====================
CUFILE CONFIGURATION:
=====================
properties.use_compat_mode : false
properties.force_compat_mode : false
properties.gds_rdma_write_support : true
properties.use_poll_mode : false
properties.poll_mode_max_size_kb : 4
properties.max_batch_io_size : 128
properties.max_batch_io_timeout_msecs : 5
properties.max_direct_io_size_kb : 16384
properties.max_device_cache_size_kb : 131072
properties.max_device_pinned_mem_size_kb : 33554432
properties.posix_pool_slab_size_kb : 4 1024 16384
properties.posix_pool_slab_count : 128 64 32
properties.rdma_peer_affinity_policy : RoundRobin
properties.rdma_dynamic_routing : 0
fs.generic.posix_unaligned_writes : false
fs.lustre.posix_gds_min_kb: 0
fs.beegfs.posix_gds_min_kb: 0
fs.weka.rdma_write_support: false
fs.gpfs.gds_write_support: false
profile.nvtx : true
profile.cufile_stats : 0
miscellaneous.api_check_aggressive : false
execution.max_io_threads : 4
execution.max_io_queue_depth : 128
execution.parallel_io : true
execution.min_io_threshold_size_kb : 8192
execution.max_request_parallelism : 4
properties.force_odirect_mode : false
properties.prefer_iouring : false
=========
GPU INFO:
=========
GPU index 0 NVIDIA A100-PCIE-40GB bar:1 bar size (MiB):65536 supports GDS, IOMMU State: Disabled
GPU index 1 NVIDIA A100-PCIE-40GB bar:1 bar size (MiB):65536 supports GDS, IOMMU State: Disabled
==============
PLATFORM INFO:
==============
IOMMU: disabled
Nvidia Driver Info Status: Supported(Nvidia Open Driver Installed)
Cuda Driver Version Installed: 12030
Platform: PowerEdge R760, Arch: x86_64(Linux 5.14.0-362.13.1.el9_3.x86_64)
Platform verification succeeded
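Since gdscheck reports NVMe as supported and platform verification succeeded, the GDS path can also be exercised end to end from Python. This is a minimal smoke-test sketch, assuming NVIDIA's kvikio bindings and CuPy are installed; the file path is illustrative, not taken from the setup above:

import cupy
import kvikio

path = "/mnt/gds/gds_smoke_test"   # illustrative file on the GDS-capable mount

a = cupy.arange(1_000_000, dtype="u1")
f = kvikio.CuFile(path, "w")
f.write(a)                          # GPU buffer -> file through cuFile
f.close()

b = cupy.empty_like(a)
f = kvikio.CuFile(path, "r")
f.read(b)                           # file -> GPU buffer through cuFile
f.close()

assert (a == b).all()

Note that if kvikio falls back to its POSIX compatibility path (for example when nvidia_fs is not loaded), the copy still succeeds but does not exercise GPUDirect.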
Below is the output from the gdsio tool. With 32 threads, the GPUDirect (GPUD) path reaches around 172 GiB/s, which is more than the PCIe Gen5 speed of the NVMe device. Is gdsio reporting HBM memory throughput, or where is it coming from? For reference, in the commands below -x selects the transfer type (0 = GPUD, 2 = CPU_GPU, 4 = CPU_CACHED_GPU), -w is the worker-thread count, and -I 2 is random read; a sketch of the wrapper that drove these runs follows.
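The exact driver script isn't included here, so this is a minimal reconstruction of the kind of wrapper that produces the runs below (paths and flag values are copied from the log; the function name and output parsing are assumptions):

import re
import subprocess

GDSIO = "/usr/local/cuda-12.3/gds/tools/gdsio"

def run_gdsio(xfer_type, threads, mount="/mnt/gds"):
    # One 30-second gdsio random-read run; returns (GiB/s, avg usecs).
    cmd = ["sudo", GDSIO,
           "-D", mount,            # test directory
           "-d", "1", "-n", "1",   # device/NUMA selection, copied from the log
           "-T", "30",             # run time in seconds
           "-s", "30G",            # 30 GiB working set per thread
           "-i", "1M",             # 1 MiB I/O size
           "-w", str(threads),     # worker threads
           "-x", str(xfer_type),   # 0 = GPUD, 2 = CPU_GPU, 4 = CPU_CACHED_GPU
           "-I", "2"]              # random read
    print("Running:", " ".join(cmd))
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    print(out.strip())
    throughput = float(re.search(r"Throughput:\s*([\d.]+)", out).group(1))
    latency = float(re.search(r"Avg_Latency:\s*([\d.]+)", out).group(1))
    return throughput, latency

if __name__ == "__main__":
    for threads in (1, 32):
        for xfer in (0, 2, 4):
            run_gdsio(xfer, threads)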
Running: sudo /usr/local/cuda-12.3/gds/tools/gdsio -D /mnt/gds -d 1 -n 1 -T 30 -s 30G -i 1M -w 1 -x 0 -I 2
IoType: RANDREAD XferType: GPUD Threads: 1 DataSetSize: 615096320/31457280(KiB) IOSize: 1024(KiB) Throughput: 19.676493 GiB/sec, Avg_Latency: 49.629803 usecs ops: 600680 total_time 29.812302 secs

Running: sudo /usr/local/cuda-12.3/gds/tools/gdsio -D /mnt/gds -d 1 -n 1 -T 30 -s 30G -i 1M -w 1 -x 2 -I 2
IoType: RANDREAD XferType: CPU_GPU Threads: 1 DataSetSize: 322764800/31457280(KiB) IOSize: 1024(KiB) Throughput: 10.600037 GiB/sec, Avg_Latency: 92.127097 usecs ops: 315200 total_time 29.038815 secs

Running: sudo /usr/local/cuda-12.3/gds/tools/gdsio -D /mnt/gds -d 1 -n 1 -T 30 -s 30G -i 1M -w 1 -x 4 -I 2
IoType: RANDREAD XferType: CPU_CACHED_GPU Threads: 1 DataSetSize: 129925120/31457280(KiB) IOSize: 1024(KiB) Throughput: 4.236128 GiB/sec, Avg_Latency: 230.524078 usecs ops: 126880 total_time 29.249885 secs

Running: sudo /usr/local/cuda-12.3/gds/tools/gdsio -D /mnt/gds -d 1 -n 1 -T 30 -s 30G -i 1M -w 32 -x 0 -I 2
IoType: RANDREAD XferType: GPUD Threads: 32 DataSetSize: 5262499840/1006632960(KiB) IOSize: 1024(KiB) Throughput: 172.277523 GiB/sec, Avg_Latency: 181.402613 usecs ops: 5139160 total_time 29.131548 secs

Running: sudo /usr/local/cuda-12.3/gds/tools/gdsio -D /mnt/gds -d 1 -n 1 -T 30 -s 30G -i 1M -w 32 -x 2 -I 2
IoType: RANDREAD XferType: CPU_GPU Threads: 32 DataSetSize: 584962048/1006632960(KiB) IOSize: 1024(KiB) Throughput: 18.204442 GiB/sec, Avg_Latency: 1716.491343 usecs ops: 571252 total_time 30.644350 secs
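As a quick cross-check on the 172 GiB/s number, recomputing it from the logged fields alone shows the counter is internally consistent (this says how much data was moved, not where it was physically served from):

# Recompute throughput for the 32-thread GPUD run from its own log fields.
dataset_kib = 5_262_499_840        # DataSetSize actually transferred (KiB)
total_time_s = 29.131548           # total_time from the same line (secs)

gib = dataset_kib / (1024 * 1024)  # KiB -> GiB
print(f"{gib:.1f} GiB in {total_time_s:.2f} s = {gib / total_time_s:.2f} GiB/s")
# ~172.28 GiB/s, matching the reported Throughput field.

So roughly 5 TiB of data really was delivered to the GPU in about 30 seconds; the open question above is which tier actually served it.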