Hello, I am currently exploring GPUDirect Storage and measuring its performance, specifically sequential read bandwidth on an SSD. I have been referring to the documentation provided here: NVIDIA GPUDirect Storage Benchmarking and Configuration Guide - NVIDIA Docs
One section of the documentation reports a sustained throughput of 6.5 GB/s with an average latency of 1.2 ms:
The average sustained throughput was 6.5GB/sec, with a 1.2ms average latency.
However, when I monitored the transfer with nvidia-smi and iostat, I observed a higher throughput of over 8 GB/s:
During our GPUDirect Storage reads initiated with the gdsio command line, we observed the selected GPU (GPU 2) receiving over 8GB/sec over PCIe. This demonstrates GPUDirect Storage in action, with the GPU reading directly from the NVMe drives over PCIe.
The documentation acknowledges this difference between 6.5 GB/s and 8 GB/s but doesn’t explain it, and I’m having trouble understanding such a significant gap. Note that I ran my own benchmark and reproduced the same gap between the throughput reported by gdsio and the values measured with nvidia-smi and iostat.
My question is: where does this discrepancy in reported throughput come from? I would assume nvidia-smi and iostat are closer to ground truth, since they measure PCIe traffic and block-device activity directly.
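One hypothesis I considered is that the PCIe counters see whole packets while gdsio reports application payload only. Below is a back-of-envelope sketch of that idea; the 256-byte max payload size and ~24 bytes of per-TLP framing are my own assumptions, not values from the guide, and I don't know exactly what NVIDIA's counters include:

```python
# Sketch (assumptions, not measured values): if nvidia-smi's PCIe counters
# include TLP header/CRC/framing, the payload fraction of the observed
# wire traffic would be lower than the raw number suggests.

MAX_PAYLOAD = 256   # bytes of payload per TLP (assumed root-port setting)
OVERHEAD = 24       # approx. header + CRC + framing per TLP (assumed)

payload_fraction = MAX_PAYLOAD / (MAX_PAYLOAD + OVERHEAD)
wire_gbs = 8.0      # what nvidia-smi/iostat showed, GB/s

effective = wire_gbs * payload_fraction
print(f"payload fraction: {payload_fraction:.3f}")
print(f"effective payload throughput: {effective:.2f} GB/s")
```

With these assumed numbers the effective payload rate comes out around 7.3 GB/s, still above the documented 6.5 GB/s, so packet framing overhead alone would not fully account for the gap.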
Thank you for your assistance.
On a separate note, there seems to be an error in the documentation.
In the linked guide, the section titled “A Simple Example: Writing to large files with a large IO size using the GDS path” appears to be mislabeled: it seems to describe a read experiment on large files rather than a write. Please correct me if I’m mistaken.