Storage Performance Basics for Deep Learning

Originally published at: Storage Performance Basics for Deep Learning | NVIDIA Technical Blog

Introduction: When production systems are not delivering expected levels of performance, root-causing the issue(s) can be a challenging and time-consuming task, especially in today’s complex environments, where the workload comprises many software components and libraries and relies on virtually all of the underlying hardware subsystems (CPU, memory, disk I/O, network I/O)…

Great write-up! I enjoyed reading it...

Thanks for the article, it was a nice read.

For CUDA developers who need very low-latency disk access and do not require a file system, I have made a library for creating CUDA storage applications: https://github.com/enfiskut...

I've also made a synthetic benchmark for it, comparing it to, among other things, memory-mapping a file. It's still very much a work in progress, so don't expect too much from it, but it shows some interesting concepts, like directly accessing a disk using GPUDirect RDMA/Async.
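For anyone curious what the memory-mapped comparison point looks like, here is a minimal sketch of that conventional baseline (not the library's API; the file path is made up): mmap() the file on the host, then cudaMemcpy() it into device memory, so the data is staged in host memory on its way to the GPU, which is the extra hop a direct GPU/disk path tries to remove.

```cpp
// Conventional baseline: mmap() a file on the host, then copy it to the GPU.
// The file path is hypothetical; this is a sketch, not the benchmark code.
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *path = "/mnt/nvme/data.bin";      // hypothetical input file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    // Map the file into host address space; pages are faulted in from disk
    // (through the page cache) as they are touched.
    void *host = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (host == MAP_FAILED) { perror("mmap"); return 1; }

    void *dev = NULL;
    if (cudaMalloc(&dev, st.st_size) != cudaSuccess) { return 1; }

    // Extra hop compared to a direct GPU<->disk transfer: the data is staged
    // in host memory before it ever reaches device memory.
    if (cudaMemcpy(dev, host, st.st_size, cudaMemcpyHostToDevice) != cudaSuccess) { return 1; }

    printf("copied %lld bytes to the GPU via mmap + cudaMemcpy\n", (long long)st.st_size);

    cudaFree(dev);
    munmap(host, st.st_size);
    close(fd);
    return 0;
}
```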

Thanks very much, Tim. Much more to come!

Thanks very much. Having a look at your code this afternoon - very interesting!

On a related note, applying some basic sanity checks to the white-box storage nodes we have in our lab has been time well spent. Each of these nodes has 6 NVMe SSDs, and on one node, one of the NVMe devices delivers less than half the random 4K read IOPS of the other five. I have not yet root-caused this, but it's the kind of thing that would cause a lot of hair-pulling once in production. The healthy NVMe SSDs sustain close to 500K random 4K read IOPS, while the 'bad' one sustains less than 200K. That's a huge difference, and it would certainly have dragged down a RAID group.
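For anyone who wants to run the same kind of per-device sanity check, the sketch below shows the basic idea: issue random 4 KiB reads against the raw block device with O_DIRECT so the page cache doesn't hide the device's behavior. The device path, test span, and duration are made up, and a single thread at queue depth 1 will sit far below the ~500K IOPS figures above (a benchmark tool driving many jobs and deep queues is needed for that); it's only meant to illustrate what "random 4K reads against the device" means.

```c
// Rough per-device sanity check: random 4 KiB reads with O_DIRECT.
// Device path, span, and duration are made up; single thread, queue depth 1,
// so the absolute IOPS will be far below what a multi-job benchmark drives.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    const char *dev = "/dev/nvme0n1";          // hypothetical device (opened read-only)
    const size_t blk = 4096;                   // 4 KiB request size
    const long long span = 1LL << 30;          // read offsets within the first 1 GiB
    const int seconds = 5;                     // test duration

    int fd = open(dev, O_RDONLY | O_DIRECT);   // O_DIRECT bypasses the page cache
    if (fd < 0) { perror("open"); return 1; }

    void *buf = NULL;
    if (posix_memalign(&buf, blk, blk) != 0) { // O_DIRECT needs an aligned buffer
        perror("posix_memalign"); return 1;
    }

    srand48(42);
    long long ios = 0;
    time_t end = time(NULL) + seconds;
    while (time(NULL) < end) {
        off_t off = (off_t)(lrand48() % (span / blk)) * blk;  // 4 KiB-aligned offset
        if (pread(fd, buf, blk, off) != (ssize_t)blk) { perror("pread"); break; }
        ios++;
    }
    printf("%lld reads in ~%d s (~%lld IOPS at QD1)\n", ios, seconds, ios / seconds);

    free(buf);
    close(fd);
    return 0;
}
```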

Thanks, James. This really helps us in our testing of NVMe drives.