(gpu && ssd)

hello,

would it be possible to slot an ssd pcie card next to a gpu, and have the gpu access it (directly), you think?

If an SSD vendor adds support for GPUDirect RDMA to their driver, it seems that should be possible. I am not aware of any shipping products of that kind, but there was this presentation at last year’s GTC:

[url]http://on-demand.gputechconf.com/gtc/2014/presentations/S4265-rdma-gpu-direct-for-fusion-io-iodrive.pdf[/url]

I am not sure there is a lot to gain when using a single standard SSD, rather than the high-performance storage solution discussed in the above presentation. As far as I know, run-of-the-mill SSDs have a throughput of about 0.5 GB/sec, which is easily accommodated when shuffling the data through the host.
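
For reference, here is a minimal sketch of that conventional path (SSD -> host -> GPU), assuming a plain file on the SSD read via POSIX I/O and staged through a pinned host buffer; the file path and chunk size below are just placeholders.

[code]
// Minimal sketch: read chunks from a file on the SSD into pinned host
// memory, then copy them to the GPU. Path and chunk size are placeholders.
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t chunkBytes = 64 * 1024 * 1024;   // 64 MB per read, arbitrary
    const char *path = "/mnt/ssd/data.bin";       // hypothetical file on the SSD

    void *hostBuf = 0;
    cudaHostAlloc(&hostBuf, chunkBytes, cudaHostAllocDefault); // pinned buffer
    void *devBuf = 0;
    cudaMalloc(&devBuf, chunkBytes);

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    ssize_t n;
    while ((n = read(fd, hostBuf, chunkBytes)) > 0) {
        // host memory acts as the staging area between SSD and GPU
        cudaMemcpy(devBuf, hostBuf, (size_t)n, cudaMemcpyHostToDevice);
        // ... launch kernel(s) consuming devBuf here ...
    }

    close(fd);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return EXIT_SUCCESS;
}
[/code]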

you made the distinction between “run-of-the-mill” and “high-performance” ssd drives/ solutions
on this, i see some of the high-end ssds offer up to 2-3 GB/s read, which i would consider rather ‘accommodating’

i also see what you mean by shipping products

but nvme/ nvme over fabrics seems to be the new valiant knight appearing on the horizon, which leaves me wondering whether there would really be rdma gpu-direct ssd offerings per se

rather, it seems one would only need an nvme ssd:

[url]http://blog.pmcs.com/project-donard-peer-to-peer-communication-with-nvm-express-devices-part-1/[/url]

I do not have in-depth knowledge of storage solutions, and I do not know your system context. I am reasonably certain that common consumer-level SSD products do not offer much beyond 0.5 GB/sec throughput at this time.

By comparison, the practical unidirectional bandwidth of a PCIe gen 3 x16 link is in the 10-12 GB/sec range. DDR3-based system memory provides bandwidth of about 25 GB/sec for a two-channel solution (typical consumer-level PC), and about 50 GB/sec for a four-channel solution (high-end workstations, servers).

Based on that, switching to a direct transfer of data between SSD and GPU in a consumer-level system probably provides insufficient performance advantages to make it worthwhile for the SSD vendor to offer drivers using GPUDirect RDMA. As you point out, the situation may well be different for high-performance storage solutions used in server contexts.

your bandwidth figures are optimistic
but then i would likely have to deem banking on 2-3 GB/s from 2-3 GB/s ssds as optimistic as well
and your bandwidth numbers do make a good point

for this one particular application, i am contemplating offloading some (a lot of) data for reuse, and i was wondering whether ssd could be a viable alternative to (significantly) increasing host memory/ host memory strain

but you bring up a point that i slightly overlooked - i would have to check or guarantee the rate at which the data is consumed, to deem ssd (or even increased host memory) for offloading the data as an option in the first place

I have personally seen unidirectional throughput of > 10 GB/sec for PCIe gen 3, and measured throughput for dual-channel DDR3 around 22+ GB/sec when using fast speed grades of DDR3.
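
In case it is useful, here is a rough sketch of how one might reproduce such a host-to-device throughput measurement with CUDA events; the buffer size and repetition count are arbitrary choices, and pinned memory is used so the copy is not throttled by pageable-memory staging.

[code]
// Rough sketch: time repeated async H2D copies from pinned memory with
// CUDA events and report the resulting throughput in GB/sec.
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 256 * 1024 * 1024;   // 256 MB per transfer, arbitrary
    const int reps = 20;

    void *h = 0, *d = 0;
    cudaHostAlloc(&h, bytes, cudaHostAllocDefault);  // pinned host buffer
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; i++) {
        cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, 0);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // bytes / (ms * 1e6) == bytes / (seconds * 1e9) == GB/sec
    printf("H2D throughput: %.2f GB/sec\n", (double)bytes * reps / (ms * 1.0e6));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
[/code]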

For similar results, see for example [url]http://www.xbitlabs.com/articles/memory/display/haswell-ddr3_5.html#sect1[/url], which shows DDR3-1866 at 24.5 GB/sec for copy. I seem to recall published measurements for four-channel DDR3 in a high-end workstation approaching 50 GB/sec, but cannot find the source right now.

Obviously these are peak rates for large transfers. But then the peak throughput of 500 - 600 MB/sec published for various common SSDs is likewise measured for sequential reads of large chunks of data. So I would claim that all the throughput numbers I quoted are equally optimistic.

I do not know your use case obviously, but as long as the data is streaming (which I assume because you mention large amounts of data not able to fit into system memory), performance would likely be constrained by the throughput of the I/O device, not by whatever is in between it and the GPU. Latency might be an interesting aspect to look at, though.
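
If the data does stream, a double-buffered pipeline along the following lines could be a starting point: while the asynchronous copy (and any kernel work) for one chunk is in flight, the host reads the next chunk from the SSD into the other buffer. File name, chunk size, and the kernel are placeholders, not anything specific to your application.

[code]
// Sketch of a double-buffered pipeline: overlap reading the next chunk from
// the SSD with the asynchronous H2D copy (and kernel work) on the previous one.
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t chunkBytes = 32 * 1024 * 1024;   // arbitrary chunk size
    const char *path = "/mnt/ssd/data.bin";       // hypothetical data file

    void *hostBuf[2];
    void *devBuf[2];
    cudaStream_t stream[2];
    for (int i = 0; i < 2; i++) {
        cudaHostAlloc(&hostBuf[i], chunkBytes, cudaHostAllocDefault); // pinned
        cudaMalloc(&devBuf[i], chunkBytes);
        cudaStreamCreate(&stream[i]);
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    int buf = 0;
    for (;;) {
        // make sure any in-flight copy out of this host buffer has finished
        cudaStreamSynchronize(stream[buf]);
        ssize_t n = read(fd, hostBuf[buf], chunkBytes);
        if (n <= 0) break;
        // queue the copy (and kernels) asynchronously, then immediately go
        // back to reading the next chunk into the other buffer
        cudaMemcpyAsync(devBuf[buf], hostBuf[buf], (size_t)n,
                        cudaMemcpyHostToDevice, stream[buf]);
        // ... launch kernel consuming devBuf[buf] in stream[buf] here ...
        buf ^= 1;
    }

    cudaDeviceSynchronize();
    close(fd);
    for (int i = 0; i < 2; i++) {
        cudaStreamDestroy(stream[i]);
        cudaFree(devBuf[i]);
        cudaFreeHost(hostBuf[i]);
    }
    return EXIT_SUCCESS;
}
[/code]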