Why does GDS write performance degrade when increasing the number of threads?

Hi,

I’m running experiments using gdsio to measure GPUDirect Storage (GDS) performance. I observed that write throughput decreases when I increase the number of threads.

Specifically:

  • Throughput increases as threads go from 1 to 4

  • With 8 or more threads, however, write throughput drops

Here is the command I’m using for the test (-I 1 selects writes, -w sets the number of worker threads, 16 in this example):

./gdsio -d 0 -f /mnt/data/gds/test_16G -I 1 -s 4GB -T 20 -w 16 -x 0
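For reference, here is a minimal sketch for sweeping the same test over several thread counts. It keeps every other flag from the command above and changes only -w. It assumes gdsio is in the current directory. GDSIO, GDS_FILE, DRY_RUN and run_one are hypothetical names I introduced for the script and are not part of gdsio. By default it only prints the commands. Set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: repeat the gdsio write test for several worker-thread counts (-w).
# Assumption: all flags except -w are copied from the original command.
# GDSIO, GDS_FILE, DRY_RUN and run_one are hypothetical names.
set -eu

GDSIO="${GDSIO:-./gdsio}"
GDS_FILE="${GDS_FILE:-/mnt/data/gds/test_16G}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print or run one write test with "$1" worker threads.
run_one() {
  # Replace this function's positional parameters with the full command line.
  set -- "$GDSIO" -d 0 -f "$GDS_FILE" -I 1 -s 4GB -T 20 -w "$1" -x 0
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # joined with spaces, for inspection
  else
    "$@"           # run gdsio with the exact argument list
  fi
}

for t in 1 2 4 8 16; do
  [ "$DRY_RUN" = "1" ] || echo "== gdsio write test, -w $t =="
  run_one "$t"
done
```

Running it once with DRY_RUN=1 shows exactly which commands will be executed before any data is written to the file.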

Could someone explain why write throughput decreases with more threads? Is this a known limitation, or does GDS need specific tuning when used with many threads?

Thank you!