Hi,
I’m running experiments with gdsio to measure GPUDirect Storage (GDS) performance. I observed that write throughput decreases when I increase the number of threads.
Specifically:
- Throughput increases as threads go from 1 to 4
- But when threads increase to 8 or more, the write performance drops
Here is the command I’m using for the test:
./gdsio -d 0 -f /mnt/data/gds/test_16G -I 1 -s 4GB -T 20 -w 16 -x 0
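To make the comparison reproducible, the thread-count sweep can be sketched as a small shell loop. This is only a sketch under assumptions: every flag is copied from the command above and only -w changes between runs. The thread counts 1, 2, 4, 8, 16 are illustrative, and GDSIO and DRY_RUN are hypothetical helper variables, not gdsio options. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: sweep the gdsio worker-thread count (-w); all other flags are
# taken unchanged from the command in this post.
# GDSIO and DRY_RUN are hypothetical helper variables, not gdsio options.
GDSIO="${GDSIO:-./gdsio}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also run them

build_cmd() {
    # $1 = number of worker threads passed to -w
    echo "$GDSIO -d 0 -f /mnt/data/gds/test_16G -I 1 -s 4GB -T 20 -w $1 -x 0"
}

# Illustrative thread counts (assumption): adjust to the values under test.
for threads in 1 2 4 8 16; do
    cmd=$(build_cmd "$threads")
    echo "$cmd"
    if [ "$DRY_RUN" = "0" ]; then
        $cmd
    fi
done
```

Running it with DRY_RUN=0 executes each configuration in turn, so the throughput reported by gdsio for each thread count can be compared side by side.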
Could someone explain why the write throughput decreases when using more threads? Is there a known limitation or tuning required for GDS with multiple threads?
Thank you!