Hi there, I need some information about that strange beast called “coalesced memory access”.
I have read several threads on this topic, both in this forum and elsewhere on the net, but I still can't answer my own question:
I have many threads that each need to access and process exactly one byte of data. If each thread loads its byte from global memory into shared memory, the memory access is not coalesced (I think…). So how can I speed up this copy?
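To make the question concrete, the access pattern I mean looks roughly like the sketch below. This is only an illustration, not my real code: the names `process_bytes`, `count_mismatches` and `THREADS_PER_BLOCK`, the block size of 256, and the "invert the byte" processing step are all placeholders I made up for the example.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // assumed block size, chosen only for this example

// Each thread copies exactly one byte from global memory into shared memory,
// "processes" it, and writes one byte back.
__global__ void process_bytes(const unsigned char *in, unsigned char *out, int n)
{
    __shared__ unsigned char tile[THREADS_PER_BLOCK];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Consecutive threads read consecutive bytes: one 8-bit load per thread.
    // This is the load I suspect is not coalesced.
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0;
    __syncthreads();

    if (gid < n) {
        // Placeholder processing step: invert the byte.
        out[gid] = (unsigned char)~tile[threadIdx.x];
    }
}

// Host helper (hypothetical): runs the kernel on n > 0 bytes and returns the
// number of wrong output bytes, or -1 if a CUDA call fails.
int count_mismatches(int n)
{
    std::vector<unsigned char> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = (unsigned char)(i & 0xFF);

    unsigned char *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), n, cudaMemcpyHostToDevice);

    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    process_bytes<<<blocks, THREADS_PER_BLOCK>>>(d_in, d_out, n);

    // The blocking copy back also surfaces any launch or execution error.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != (unsigned char)~h_in[i]) ++bad;
    return bad;
}
```

Calling `count_mismatches(1 << 20)` from `main` only checks that the copy is correct. It says nothing about speed, which is the part I am asking about.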
The only solution I've come up with is to fetch the data through a linear texture, so that only the first read goes to global memory and subsequent reads are served from the texture cache.
Is this the only way or is there a better way to do this?
Thanks in advance for any answers.