Where to store a picture of about 5 MB


I have a kernel where each thread has to read some data from a picture. The threads do not necessarily read the same pixel, and the pixels are not accessed in any specific order, so I think the only possible option here is global memory. The problem is that global memory is really slow, and there are not enough threads in the kernel to hide the latency of each read. On the other hand, there is not enough shared memory to hold a copy of the image, as in the matrix multiplication example (one pixel per thread or something similar).

What do you think would be the best option?



You could use texture memory: reads are cached, and accesses to nearby coordinates in 2D are optimized as well. I have never used it, so I cannot tell you how much faster it is than reading from global memory. However, it seems that you cannot write to texture memory from the device; your kernel can only read from it.
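To make the suggestion concrete, here is a minimal sketch of scattered pixel reads through a texture. It assumes an 8-bit grayscale image and uses the texture object API (`cudaCreateTextureObject`), which needs a device of compute capability 3.0 or higher; older toolkits used texture references instead. The names `gatherPixels` and `gatherWithTexture`, the 2560 x 2048 image size, and the pseudo-random coordinates are made up for illustration and do not come from the original question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                         cudaGetErrorString(err_), __LINE__);              \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Each thread fetches one pixel at its own arbitrary (x, y) through the texture cache.
__global__ void gatherPixels(cudaTextureObject_t tex, const int2* coords,
                             unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // +0.5f addresses the texel center when coordinates are unnormalized.
        out[i] = tex2D<unsigned char>(tex, coords[i].x + 0.5f, coords[i].y + 0.5f);
    }
}

// Hypothetical helper: uploads an 8-bit image, reads it at `coords`, returns the values.
std::vector<unsigned char> gatherWithTexture(const std::vector<unsigned char>& image,
                                             int width, int height,
                                             const std::vector<int2>& coords)
{
    // Textures read from a CUDA array, whose layout favors 2D locality.
    cudaChannelFormatDesc channel = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t array = nullptr;
    CUDA_CHECK(cudaMallocArray(&array, &channel, width, height));
    CUDA_CHECK(cudaMemcpy2DToArray(array, 0, 0, image.data(), width, width, height,
                                   cudaMemcpyHostToDevice));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = array;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;   // clamp out-of-range x
    td.addressMode[1] = cudaAddressModeClamp;   // clamp out-of-range y
    td.filterMode = cudaFilterModePoint;        // no interpolation
    td.readMode = cudaReadModeElementType;      // return the raw unsigned char
    td.normalizedCoords = 0;                    // coordinates are in pixels

    cudaTextureObject_t tex = 0;
    CUDA_CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    int n = static_cast<int>(coords.size());
    int2* dCoords = nullptr;
    unsigned char* dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dCoords, n * sizeof(int2)));
    CUDA_CHECK(cudaMalloc(&dOut, n));
    CUDA_CHECK(cudaMemcpy(dCoords, coords.data(), n * sizeof(int2), cudaMemcpyHostToDevice));

    int block = 256;
    gatherPixels<<<(n + block - 1) / block, block>>>(tex, dCoords, dOut, n);
    CUDA_CHECK(cudaGetLastError());

    // The blocking copy also waits for the kernel to finish.
    std::vector<unsigned char> out(n);
    CUDA_CHECK(cudaMemcpy(out.data(), dOut, n, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaDestroyTextureObject(tex));
    CUDA_CHECK(cudaFreeArray(array));
    CUDA_CHECK(cudaFree(dCoords));
    CUDA_CHECK(cudaFree(dOut));
    return out;
}

int main()
{
    const int width = 2560, height = 2048;      // 2560 * 2048 bytes = 5 MiB (assumed size)
    std::vector<unsigned char> image(static_cast<size_t>(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            image[static_cast<size_t>(y) * width + x] = static_cast<unsigned char>(x * 31 + y * 17);

    // Scattered, unordered read positions from a simple LCG.
    std::vector<int2> coords(1 << 16);
    unsigned int s = 12345u;
    for (auto& c : coords) {
        s = s * 1664525u + 1013904223u; c.x = (s >> 8) % width;
        s = s * 1664525u + 1013904223u; c.y = (s >> 8) % height;
    }

    std::vector<unsigned char> out = gatherWithTexture(image, width, height, coords);

    // Compare every fetched value against the host copy of the image.
    int mismatches = 0;
    for (size_t i = 0; i < coords.size(); ++i)
        if (out[i] != image[static_cast<size_t>(coords[i].y) * width + coords[i].x]) ++mismatches;
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Whether this beats plain global memory reads depends on how much 2D locality the access pattern really has, so it is worth timing both versions on the actual data.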

I have the same problem.

Maybe you can give texture memory a try.

The manual says texture memory is cached, so it may be faster.

I don’t know how big texture memory can be. My data set is about 256 MB, and it worked well.
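On the size question: texture fetches are served from ordinary device memory through a cache, so the practical limits are the maximum texture extent per dimension and the amount of free device memory. A small sketch for checking those limits on a given card follows. Device index 0, the helper name `fitsIn2DTexture`, and the 2560 x 2048 test size are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a width x height image fits the device's 2D texture limits.
bool fitsIn2DTexture(const cudaDeviceProp& prop, int width, int height)
{
    return width <= prop.maxTexture2D[0] && height <= prop.maxTexture2D[1];
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);   // device 0 is an assumption
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("Device: %s\n", prop.name);
    std::printf("Global memory: %zu MiB\n", prop.totalGlobalMem >> 20);
    std::printf("Max 2D texture: %d x %d texels\n",
                prop.maxTexture2D[0], prop.maxTexture2D[1]);
    std::printf("Max 3D texture: %d x %d x %d texels\n",
                prop.maxTexture3D[0], prop.maxTexture3D[1], prop.maxTexture3D[2]);

    // Example check for a 5 MiB, 8-bit image laid out as 2560 x 2048 pixels.
    std::printf("2560 x 2048 image fits in a 2D texture: %s\n",
                fitsIn2DTexture(prop, 2560, 2048) ? "yes" : "no");
    return 0;
}
```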