Direct load to memory in CUDA? Is there a fast (parallel) load from file?

I have to load data from files into host memory before sending it to the GPU via cudaMemcpy.
I'm trying to cut down on the data load time. Is there a way to load data from a file directly into GPU global memory?
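For reference, here is a minimal sketch of the two-step path I'd like to avoid (the file name "data.bin" and the element count are placeholders):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;               // element count (placeholder)
    const size_t bytes = n * sizeof(float);

    // Step 1: read the file into host memory.
    float *h_buf = (float *)malloc(bytes);
    FILE *f = fopen("data.bin", "rb");      // placeholder file name
    fread(h_buf, 1, bytes, f);
    fclose(f);

    // Step 2: copy host memory to device global memory.
    float *d_buf;
    cudaMalloc(&d_buf, bytes);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    // ... launch kernels on d_buf ...

    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
```

It's the disk-read step that dominates, so skipping the host staging entirely is what I'm after.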

Nope. You can google this group for similar questions and more elaborate answers :)

I'm processing data with the GPU almost as fast as I can load it from files on the hard disk.
I guess getting an SSD will help push down that disk read latency.
Thanks for the reply.

If you are not already doing so, I would suggest the asynchronous copy API; it is probably essential in that sort of situation. Allocate two pinned buffers, and have the host read into one while a kernel is processing the other. That should help hide some of the disk access latency.
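A rough sketch of that double-buffering scheme, assuming a raw binary file of floats (the kernel, file name, and chunk size are placeholders for whatever your pipeline actually does):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; stands in for your real processing.
__global__ void process(float *d, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const size_t chunk = 1 << 20;              // elements per chunk (placeholder)
    const size_t bytes = chunk * sizeof(float);

    // Two pinned (page-locked) host buffers; cudaMemcpyAsync only
    // overlaps with kernel execution when the host memory is pinned.
    float *h_buf[2];
    float *d_buf[2];
    cudaStream_t stream[2];
    for (int i = 0; i < 2; ++i) {
        cudaHostAlloc(&h_buf[i], bytes, cudaHostAllocDefault);
        cudaMalloc(&d_buf[i], bytes);
        cudaStreamCreate(&stream[i]);
    }

    FILE *f = fopen("data.bin", "rb");         // placeholder file name
    int cur = 0;
    size_t got;
    while ((got = fread(h_buf[cur], sizeof(float), chunk, f)) > 0) {
        // Make sure this buffer's previous chunk has been consumed
        // before the host overwrites it on the next iteration.
        cudaStreamSynchronize(stream[cur]);
        cudaMemcpyAsync(d_buf[cur], h_buf[cur], got * sizeof(float),
                        cudaMemcpyHostToDevice, stream[cur]);
        process<<<(unsigned)((got + 255) / 256), 256, 0, stream[cur]>>>(d_buf[cur], got);
        cur ^= 1;  // next fread fills the other buffer while this one runs
    }
    fclose(f);
    cudaDeviceSynchronize();

    for (int i = 0; i < 2; ++i) {
        cudaStreamDestroy(stream[i]);
        cudaFree(d_buf[i]);
        cudaFreeHost(h_buf[i]);
    }
    return 0;
}
```

While the copy and kernel for one chunk run in their stream, the host's next fread fills the other buffer, so the disk read overlaps with GPU work instead of serializing with it.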