I am relatively new to CUDA; please excuse me if I repeat questions that have already been asked. My work in CUDA requires processing huge files (~46 GB) that cannot fit into either host or device memory.
To implement this, I was hoping to process the file in chunks, doing the device-to-host memcpy on a separate stream, and/or to use texture memory. Am I on the right track?
I understand that CUDA is relatively low level, but is it possible to write generic code that allocates the largest available memory buffer?
Any article that discusses the details of processing huge data sets on a GPU would be helpful.