I want to test the transfer speed and CPU usage of GDS using gdsio. While transferring data, I used top to monitor CPU usage. However, I noticed that CPU usage was higher with -x 0 (GDS enabled) than with -x 2:
sudo /usr/local/cuda-12.2/gds/tools/gdsio -f /mnt/dd.txt -d 0 -w 4 -s 100G -i 1M -I 0 -x 0
IoType: READ XferType: GPUD Threads: 4 DataSetSize: 104837120/104857600(KiB) IOSize: 1024(KiB) Throughput: 6.622534 GiB/sec, Avg_Latency: 589.436984 usecs ops: 102380 total_time 15.097011 secs
top
PID USER PR NI VIRT RES SHR %CPU %MEM TIME+ COMMAND
2685 root 20 0 5572460 174724 91880 S 82.1 1.1 0:08.72 gdsio
sudo /usr/local/cuda-12.2/gds/tools/gdsio -f /mnt/dd.txt -d 0 -w 4 -s 100G -i 1M -I 0 -x 2
IoType: READ XferType: CPU_GPU Threads: 4 DataSetSize: 104739840/104857600(KiB) IOSize: 1024(KiB) Throughput: 6.573733 GiB/sec, Avg_Latency: 593.864340 usecs ops: 102285 total_time 15.194972 secs
top
PID USER PR NI VIRT RES SHR %CPU %MEM TIME+ COMMAND
2758 root 20 0 5199792 108296 93488 S 72.8 0.7 0:04.88 gdsio
sudo /usr/local/cuda-12.2/gds/tools/gdsio -f /mnt/dd.txt -d 0 -w 4 -s 100G -i 1M -I 0 -x 1
IoType: READ XferType: CPUONLY Threads: 4 DataSetSize: 104857600/104857600(KiB) IOSize: 1024(KiB) Throughput: 6.478798 GiB/sec, Avg_Latency: 597.596406 usecs ops: 102400 total_time 15.434962 secs
top
PID USER PR NI VIRT RES SHR %CPU %MEM TIME+ COMMAND
2783 root 20 0 4835100 13736 8408 S 43.9 0.1 0:03.79 gdsio
It seems that with -x 1 (CPUONLY), where the CPU handles the data transfer, CPU usage is actually the lowest: 43.9%, versus 72.8% for -x 2 and 82.1% for -x 0.
Additionally, although throughput with GDS is slightly higher (6.62 GiB/sec, versus 6.57 GiB/sec for -x 2 and 6.48 GiB/sec for -x 1, i.e. under 1% and about 2% faster respectively), the gap is very small. Is this expected?
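For reference, here is a rough Python sketch that puts the three runs above on a common footing. It parses the gdsio summary lines, recomputes throughput from the bytes actually transferred and total_time, and estimates CPU-seconds spent per GiB. The function name summarize, the regular expression, and the cpu_sec_per_gib figure are my own illustrative choices, not part of gdsio. The CPU estimate also assumes that the single %CPU sample from top is representative of the whole run, which may well not hold.

```python
import re

# Summary lines copied from the three runs above, each paired with the %CPU
# value that top showed for that run (one sample only, so approximate).
RUNS = [
    ("IoType: READ XferType: GPUD Threads: 4 DataSetSize: 104837120/104857600(KiB) "
     "IOSize: 1024(KiB) Throughput: 6.622534 GiB/sec, Avg_Latency: 589.436984 usecs "
     "ops: 102380 total_time 15.097011 secs", 82.1),
    ("IoType: READ XferType: CPU_GPU Threads: 4 DataSetSize: 104739840/104857600(KiB) "
     "IOSize: 1024(KiB) Throughput: 6.573733 GiB/sec, Avg_Latency: 593.864340 usecs "
     "ops: 102285 total_time 15.194972 secs", 72.8),
    ("IoType: READ XferType: CPUONLY Threads: 4 DataSetSize: 104857600/104857600(KiB) "
     "IOSize: 1024(KiB) Throughput: 6.478798 GiB/sec, Avg_Latency: 597.596406 usecs "
     "ops: 102400 total_time 15.434962 secs", 43.9),
]

# Assumed layout of the summary line, based only on the output pasted above.
PATTERN = re.compile(
    r"XferType: (?P<xfer>\S+).*?"
    r"DataSetSize: (?P<done_kib>\d+)/\d+\(KiB\).*?"
    r"Throughput: (?P<gib_s>[\d.]+) GiB/sec.*?"
    r"total_time (?P<secs>[\d.]+) secs"
)


def summarize(line, cpu_percent):
    """Return throughput and an estimated CPU cost per GiB for one run."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognized gdsio summary line")
    gib = int(m["done_kib"]) / (1024 * 1024)  # KiB actually transferred -> GiB
    secs = float(m["secs"])
    return {
        "xfer": m["xfer"],
        "gib_per_sec": gib / secs,
        "reported_gib_per_sec": float(m["gib_s"]),
        # %CPU / 100 is the number of busy cores; times wall time gives
        # CPU-seconds, assuming the top sample holds for the whole run.
        "cpu_sec_per_gib": (cpu_percent / 100.0) * secs / gib,
    }


if __name__ == "__main__":
    for line, cpu in RUNS:
        r = summarize(line, cpu)
        print(f"{r['xfer']:8s} {r['gib_per_sec']:.3f} GiB/s "
              f"{r['cpu_sec_per_gib']:.4f} CPU-s/GiB")
```

A single top sample is noisy, so measuring the CPU time of the whole process over the full run would give a more reliable basis for the same comparison.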
Also, why are samples no longer provided in versions after GDS 12.2? I couldn’t find the /gds/samples folder in either CUDA 12.4 or CUDA 12.9.