I am interested in processing very large images that fit into
neither main CPU memory nor GPU memory.
So, I want to do the following:
- Read a tile asynchronously from disk
- Copy it from CPU memory to GPU memory
- Process the tile with a CUDA kernel
Can anybody provide an example of how to do this
in the most efficient way, so that disk IO,
copying from CPU to GPU memory, and processing
can be overlapped/interleaved?
Any ideas/examples are appreciated.
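A minimal sketch of one common way to get this overlap, assuming the image is stored as raw 8-bit pixels with each tile contiguous in the file. The file name `tiles.raw`, the tile size, the slot count and the summing kernel `processTile` are hypothetical placeholders, not a specific library's API. It uses a small ring of pinned host buffers, each paired with its own device buffer and CUDA stream. While the CPU is blocked in `fread` on tile t, the copy and kernel queued earlier for tiles t-1 and t-2 can still be running on the GPU. A stream is synchronized only right before its slot is reused.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cuda_runtime.h>

// Abort on any CUDA error so failures are not silently ignored.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                         cudaGetErrorString(err_), __FILE__, __LINE__);    \
            std::exit(1);                                                  \
        }                                                                  \
    } while (0)

// Hypothetical placeholder for the real per-tile work: sums all bytes of the tile.
__global__ void processTile(const uint8_t* tile, size_t n, unsigned long long* sum) {
    unsigned long long local = 0;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        local += tile[i];
    atomicAdd(sum, local);
}

// Expected sum of the synthetic test file, whose byte i is (i % 251).
unsigned long long syntheticSum(size_t n) {
    unsigned long long s = 0;
    for (size_t i = 0; i < n; ++i) s += i % 251;
    return s;
}

int main(int argc, char** argv) {
    const size_t TILE_BYTES = 1 << 20;  // hypothetical tile size: 1 MiB
    const int NUM_SLOTS = 3;            // tiles in flight: being read / copied / processed
    const size_t SYNTH_BYTES = 10 * TILE_BYTES + 123;
    const char* path = argc > 1 ? argv[1] : "tiles.raw";

    if (argc <= 1) {  // no input file given: create a small synthetic one
        FILE* w = std::fopen(path, "wb");
        if (!w) { std::perror(path); return 1; }
        for (size_t i = 0; i < SYNTH_BYTES; ++i) std::fputc((int)(i % 251), w);
        std::fclose(w);
    }
    FILE* f = std::fopen(path, "rb");
    if (!f) { std::perror(path); return 1; }

    uint8_t* hBuf[NUM_SLOTS];
    uint8_t* dBuf[NUM_SLOTS];
    cudaStream_t stream[NUM_SLOTS];
    for (int s = 0; s < NUM_SLOTS; ++s) {
        CHECK(cudaMallocHost(&hBuf[s], TILE_BYTES));  // pinned: lets cudaMemcpyAsync run asynchronously
        CHECK(cudaMalloc(&dBuf[s], TILE_BYTES));
        CHECK(cudaStreamCreate(&stream[s]));
    }
    unsigned long long* dSum;
    CHECK(cudaMalloc(&dSum, sizeof(unsigned long long)));
    CHECK(cudaMemset(dSum, 0, sizeof(unsigned long long)));
    CHECK(cudaDeviceSynchronize());

    for (int tile = 0;; ++tile) {
        int s = tile % NUM_SLOTS;
        // Wait only for the tile that last used this slot; other slots keep running on the GPU.
        CHECK(cudaStreamSynchronize(stream[s]));
        // Blocking disk read on the CPU, overlapping with work already queued in other streams.
        size_t n = std::fread(hBuf[s], 1, TILE_BYTES, f);
        if (n == 0) break;
        // Both calls return immediately; the copy and kernel run in order within stream[s].
        CHECK(cudaMemcpyAsync(dBuf[s], hBuf[s], n, cudaMemcpyHostToDevice, stream[s]));
        processTile<<<256, 256, 0, stream[s]>>>(dBuf[s], n, dSum);
        CHECK(cudaGetLastError());
    }
    CHECK(cudaDeviceSynchronize());
    std::fclose(f);

    unsigned long long sum = 0;
    CHECK(cudaMemcpy(&sum, dSum, sizeof sum, cudaMemcpyDeviceToHost));
    std::printf("GPU sum of all tiles: %llu\n", sum);

    for (int s = 0; s < NUM_SLOTS; ++s) {
        CHECK(cudaFreeHost(hBuf[s]));
        CHECK(cudaFree(dBuf[s]));
        CHECK(cudaStreamDestroy(stream[s]));
    }
    CHECK(cudaFree(dSum));
    if (argc <= 1 && sum != syntheticSum(SYNTH_BYTES)) { std::puts("MISMATCH"); return 1; }
    return 0;
}
```

Compile with `nvcc -O2 tiles.cu -o tiles`. Run it with no arguments to use the synthetic test file, or pass the path to a raw tile file. Whether copies and kernels really overlap depends on the GPU and driver; a profiler timeline such as Nsight Systems shows it. If tiles are not contiguous on disk (for example, a tile cut out of a row-major raster), each tile needs one read per row instead of a single `fread`. If the single reading thread becomes the bottleneck, the reads could move to a separate thread, which this sketch does not do.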