I need some help in understanding the performance of GDS.
I am using the gdsio utility to explore the performance of GDS. My experiments show that Storage -> GPU consumes more CPU resources than Storage -> CPU -> GPU. This outcome is contrary to my expectations, and I am not sure about the reasons. Below are the details of my setup and results.
I manually created a file at /mnt/nvme2n1/20GFile and used the gdsio utility to measure performance. I monitored GPU and CPU usage using nvtop.
The command I ran was: sudo gdsio -f "/mnt/nvme2n1/20GFile" -d 0 -x 0 -w 16 -s "20G" -i "4K" -I 2 -T 20. I varied the -x parameter to select the data transfer mode:
-x 0: Storage → GPU
-x 2: Storage → CPU → GPU
-x 1: Storage → CPU (used to check if the SSD is the bottleneck)
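For anyone who wants to reproduce the sweep, here is a minimal sketch that only builds and prints the gdsio command lines for each transfer mode and thread count. It uses exactly the flags from my command above; the helper names (build_gdsio_cmd, sweep_commands) are hypothetical, and nothing is executed.

```python
import shlex

# Transfer modes as described in this post (values passed to gdsio's -x flag).
XFER_MODES = {
    0: "Storage -> GPU",
    2: "Storage -> CPU -> GPU",
    1: "Storage -> CPU",
}


def build_gdsio_cmd(xfer_type, workers=16, path="/mnt/nvme2n1/20GFile"):
    """Return the argument list of the gdsio command for one transfer mode."""
    return [
        "sudo", "gdsio",
        "-f", path,
        "-d", "0",
        "-x", str(xfer_type),
        "-w", str(workers),
        "-s", "20G",
        "-i", "4K",
        "-I", "2",
        "-T", "20",
    ]


def sweep_commands(worker_counts=(16, 32)):
    """Yield (label, shell command string) for every mode and thread count."""
    for workers in worker_counts:
        for xfer_type, label in XFER_MODES.items():
            cmd = build_gdsio_cmd(xfer_type, workers)
            yield f"{label}, -w {workers}", shlex.join(cmd)


# Print the commands only; run them by hand (or via subprocess) as needed.
for label, cmd in sweep_commands():
    print(f"# {label}\n{cmd}")
```

Printing instead of running keeps the sketch safe to try on a machine without GDS installed.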
The following tables summarize the GPU and CPU behavior, as reported by nvtop, for each transfer mode.
16 Threads (-w 16)

| Metric | Storage → GPU | Storage → CPU → GPU | Storage → CPU |
|---|---|---|---|
| GPU Usage | 40 % | 39 % | - |
| GPU Memory | 424 MiB | 422 MiB | - |
| CPU Usage | 299 % | 258 % | - |
| Host Memory | 115 MiB | 115 MiB | - |
| Latency | 86.53 usec | 85.86 usec | 72.62 usec |
| Throughput | 0.70 GiB/s | 0.71 GiB/s | 0.84 GiB/s |
Increased Threads (-w 32)

| Metric | Storage → GPU | Storage → CPU → GPU | Storage → CPU |
|---|---|---|---|
| GPU Usage | 58 % | 57 % | - |
| GPU Memory | 430 MiB | 424 MiB | - |
| CPU Usage | 654 % | 550 % | - |
| Host Memory | 114 MiB | 112 MiB | - |
| Latency | 102.76 usec | 98.92 usec | 80.77 usec |
| Throughput | 1.18 GiB/s | 1.24 GiB/s | 1.51 GiB/s |
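As a sanity check on the numbers above, the reported throughput can be compared with what the latency implies. This is a rough sketch under two assumptions of mine: each gdsio thread keeps exactly one 4 KiB I/O in flight, and the reported latency is the average per I/O. Under those assumptions throughput ≈ threads × I/O size / latency. The second helper relates CPU usage to throughput for the two GPU modes.

```python
GIB = 1024 ** 3
IO_SIZE = 4 * 1024  # bytes, from -i 4K

# (threads, mode) -> (latency in usec, reported throughput in GiB/s)
RESULTS = {
    (16, "Storage -> GPU"): (86.53, 0.70),
    (16, "Storage -> CPU -> GPU"): (85.86, 0.71),
    (16, "Storage -> CPU"): (72.62, 0.84),
    (32, "Storage -> GPU"): (102.76, 1.18),
    (32, "Storage -> CPU -> GPU"): (98.92, 1.24),
    (32, "Storage -> CPU"): (80.77, 1.51),
}

# (threads, mode) -> CPU usage in percent (only reported for the GPU modes)
CPU_USAGE = {
    (16, "Storage -> GPU"): 299,
    (16, "Storage -> CPU -> GPU"): 258,
    (32, "Storage -> GPU"): 654,
    (32, "Storage -> CPU -> GPU"): 550,
}


def estimated_throughput_gib(threads, latency_usec, io_size=IO_SIZE):
    """Throughput implied by one outstanding I/O per thread (assumption)."""
    return threads * io_size / (latency_usec * 1e-6) / GIB


def cpu_percent_per_gib(cpu_percent, throughput_gib):
    """CPU usage normalized by delivered throughput."""
    return cpu_percent / throughput_gib


for (threads, mode), (latency, reported) in RESULTS.items():
    estimate = estimated_throughput_gib(threads, latency)
    line = f"-w {threads:2d} {mode:22s} reported {reported:.2f}, estimated {estimate:.2f} GiB/s"
    if (threads, mode) in CPU_USAGE:
        cost = cpu_percent_per_gib(CPU_USAGE[(threads, mode)], reported)
        line += f", {cost:.0f} % CPU per GiB/s"
    print(line)
```

If the estimates track the reported values closely, the runs are latency-bound at this I/O size, so the per-I/O software path matters more than raw bandwidth.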
As you can see, the Storage → GPU path consumes more CPU than Storage → CPU → GPU (299 % vs. 258 % with 16 threads, 654 % vs. 550 % with 32 threads), while memory usage remains similar. If the I/O bypasses the copy through host memory, I would expect CPU usage to decrease, right? Maybe there’s an issue with my configuration?
I haven’t modified any configuration files related to GDS or cuFile; I’m using the default settings.
How can I verify that GDS is functioning correctly and bypassing the data copying process?
Below are the results of the gdscheck -p and nvidia-smi topo -m commands. The nvidia-smi topo -name option does not work, however (it reports: Option “-name” is not recognized).
10-01-2025 08:58:47:618 [pid=10987 tid=10987] ERROR cufio-fs:79 mount option not found in mount table data device: /dev/nvme2n1
10-01-2025 08:58:47:618 [pid=10987 tid=10987] ERROR cufio-fs:152 EXT4 journal options not found in mount table for device,can't verify data=ordered mode journalling
10-01-2025 08:58:47:618 [pid=10987 tid=10987] NOTICE cufio:293 cuFileHandleRegister GDS not supported or disabled by config, using cuFile posix read/write with compat mode enabled
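The NOTICE line above mentions "compat mode", which may be relevant to my question about whether the copy is actually bypassed. Here is a minimal sketch for scanning log lines like these for that notice. The line format and the "compat mode" wording are inferred only from the three lines pasted above, so other cuFile versions may log this differently; the function names are hypothetical.

```python
import re

# Pattern inferred from the three log lines in this post (an assumption,
# not a documented format).
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4})\s+(?P<time>[\d:]+)\s+"
    r"\[pid=(?P<pid>\d+) tid=(?P<tid>\d+)\]\s+"
    r"(?P<level>[A-Z]+)\s+(?P<source>\S+)\s+(?P<message>.*)$"
)

SAMPLE_LOG = """\
10-01-2025 08:58:47:618 [pid=10987 tid=10987] ERROR cufio-fs:79 mount option not found in mount table data device: /dev/nvme2n1
10-01-2025 08:58:47:618 [pid=10987 tid=10987] ERROR cufio-fs:152 EXT4 journal options not found in mount table for device,can't verify data=ordered mode journalling
10-01-2025 08:58:47:618 [pid=10987 tid=10987] NOTICE cufio:293 cuFileHandleRegister GDS not supported or disabled by config, using cuFile posix read/write with compat mode enabled
"""


def parse_log(text):
    """Return a list of dicts, one per line that matches the assumed format."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def compat_mode_notices(entries):
    """Return entries whose message mentions compat mode."""
    return [e for e in entries if "compat mode" in e["message"].lower()]


entries = parse_log(SAMPLE_LOG)
for entry in compat_mode_notices(entries):
    print(f"{entry['level']} from {entry['source']}: {entry['message']}")
```

To scan a real log, the file contents can be passed to parse_log in place of SAMPLE_LOG.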