Hi, I'm looking for a good way to have every thread read the same data.
I have to write a CUDA program in which all of my threads must read the same data.
The data has dependencies, so each thread has to read it in order, from first to last.
If I use shared memory, I think I will get bank conflicts.
If I use global memory, I think it will be hard to make the accesses coalesced.
But I think global or texture memory might work well because of caching.
Which memory would be best?
I also read that texture memory is good for 2D arrays.
I don't know exactly what that means. Could you explain?
Shared memory has no bank conflict if all threads access the same word, which just results in a broadcast.
Bank conflicts result only from accesses to different words in the same bank.
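To make the broadcast case concrete, here is a minimal sketch, assuming the data fits in shared memory. The names walkShared, runWalkShared, check and kN are made up for illustration, and the running sum is only a stand-in for whatever order-dependent computation you actually have. In every loop iteration all threads read the same word s[i], so the access is a broadcast rather than a conflict.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int kN = 256;  // hypothetical data size, assumed to fit in shared memory

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Each thread walks the same array from first to last. In every iteration all
// threads read the same word s[i], which is served as a broadcast.
__global__ void walkShared(const float* in, float* out)
{
    __shared__ float s[kN];

    // Cooperative load: consecutive threads read consecutive words (coalesced).
    for (int i = threadIdx.x; i < kN; i += blockDim.x)
        s[i] = in[i];
    __syncthreads();

    float acc = static_cast<float>(threadIdx.x);  // per-thread starting value
    for (int i = 0; i < kN; ++i)
        acc += s[i];  // step i depends on the result of step i - 1

    out[threadIdx.x] = acc;
}

// Hypothetical host helper: runs one block of `threads` threads on input that is all 1.0f.
std::vector<float> runWalkShared(int threads)
{
    std::vector<float> hIn(kN, 1.0f), hOut(threads);
    float* dIn = nullptr;
    float* dOut = nullptr;
    check(cudaMalloc(&dIn, kN * sizeof(float)));
    check(cudaMalloc(&dOut, threads * sizeof(float)));
    check(cudaMemcpy(dIn, hIn.data(), kN * sizeof(float), cudaMemcpyHostToDevice));

    walkShared<<<1, threads>>>(dIn, dOut);
    check(cudaGetLastError());

    check(cudaMemcpy(hOut.data(), dOut, threads * sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(dIn));
    check(cudaFree(dOut));
    return hOut;
}
```

Calling runWalkShared(64) from a main function returns one value per thread. The only strided access is the initial load from global memory, and that one is coalesced.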
Texture memory will also be fine, as will global memory on compute capability 2.x devices (where it is cached).
Texture memory can use CUDA arrays for storage, which provide a way to map memory for 2D locality (so that array elements which are neighbors in 2D tend to map to the same cache line). Textures can, however, also operate on linear memory, which is good for memory accesses with 1D locality.
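For the linear-memory case, a rough sketch could look like the following. It assumes a device and toolkit that support texture objects (compute capability 3.0 or later); on 2.x devices the older texture reference API would be needed instead. The names walkTexture, runWalkTexture and checkCuda are made up for illustration, and the running sum again stands in for the real order-dependent work.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call failed.
static void checkCuda(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Every thread reads the whole buffer in order through the texture path.
// All threads fetch the same element in each iteration.
__global__ void walkTexture(cudaTextureObject_t tex, int n, float* out)
{
    float acc = static_cast<float>(threadIdx.x);  // per-thread starting value
    for (int i = 0; i < n; ++i)
        acc += tex1Dfetch<float>(tex, i);  // cached read from linear memory
    out[threadIdx.x] = acc;
}

// Hypothetical host helper: binds a linear buffer of n floats (all 1.0f) to a
// texture object and runs one block of `threads` threads over it.
std::vector<float> runWalkTexture(int threads, int n)
{
    std::vector<float> hIn(n, 1.0f), hOut(threads);
    float* dIn = nullptr;
    float* dOut = nullptr;
    checkCuda(cudaMalloc(&dIn, n * sizeof(float)));
    checkCuda(cudaMalloc(&dOut, threads * sizeof(float)));
    checkCuda(cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    // Describe the linear device buffer as the texture's backing resource.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = dIn;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = n * sizeof(float);

    // Return raw element values, with no normalization or filtering.
    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    checkCuda(cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr));

    walkTexture<<<1, threads>>>(tex, n, dOut);
    checkCuda(cudaGetLastError());

    checkCuda(cudaMemcpy(hOut.data(), dOut, threads * sizeof(float), cudaMemcpyDeviceToHost));
    checkCuda(cudaDestroyTextureObject(tex));
    checkCuda(cudaFree(dIn));
    checkCuda(cudaFree(dOut));
    return hOut;
}
```

A 2D version would back the texture with a CUDA array (cudaMallocArray) and fetch with tex2D instead. Whether the texture path beats shared memory for your access pattern is something to measure on your own hardware.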