I have implemented FCC in CUDA and wanted to use shared memory to improve the speedup, but because the threads' memory accesses overlap I get bank conflicts. Is there a way to use texture memory to improve the speedup instead? In essence, I want to know whether texture memory supports simultaneous memory accesses by threads, or whether it also serializes them as shared memory does.
I don’t think you can get better performance from textures if you are able to keep your data in shared memory and work on it there without writing to or reading from global memory. Also, if your bank conflicts are not of a high degree, you should keep it this way; otherwise, try another approach that reduces the bank conflicts.
Can you help me work out what degree of bank conflict we face if one thread accesses a 1-D array at indices 1-16 and another thread accesses indices 2-17? In my particular case a potential solution lies in further reducing the box size, but is the above degree of bank conflict too high?
I know that if all the threads run in perfect lockstep, each one accesses a position adjacent to its neighbour's, and those positions lie in different banks. In practice such synchronization is not possible, so what degree of bank conflict should I expect?
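One way to estimate the degree without touching the GPU is to model the banks on the host. Conflicts are only counted among the threads that issue the same shared-memory load together (a warp, or a half-warp on older 16-bank hardware), and those threads execute that load in lockstep. The sketch below is a rough model, not a measurement. It assumes 32-bit words, that word `w` lives in bank `w % num_banks`, and that several threads reading the same word are served by a broadcast. The names `conflict_degree`, `sliding_window_degree` and `thread_stride` are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Degree of a bank conflict for one simultaneous access by a group of threads:
// the largest number of distinct 32-bit words that map to the same bank.
// 1 means conflict-free; n means the access is serialized into n passes.
int conflict_degree(const std::vector<int>& word_index, int num_banks) {
    assert(num_banks > 0);
    std::vector<std::set<int>> words_in_bank(num_banks);
    for (int w : word_index) {
        // Assumption: threads reading the same word are served by one broadcast,
        // so each distinct word is counted once per bank.
        words_in_bank[w % num_banks].insert(w);
    }
    std::size_t degree = 0;
    for (const auto& words : words_in_bank) {
        degree = std::max(degree, words.size());
    }
    return static_cast<int>(degree);
}

// Worst degree over all steps of a sliding-window read: at step k, thread t
// reads word t * thread_stride + k, for k in [0, window).
int sliding_window_degree(int num_threads, int window, int thread_stride,
                          int num_banks) {
    int worst = 0;
    for (int k = 0; k < window; ++k) {
        std::vector<int> word_index;
        for (int t = 0; t < num_threads; ++t) {
            word_index.push_back(t * thread_stride + k);
        }
        worst = std::max(worst, conflict_degree(word_index, num_banks));
    }
    return worst;
}
```

Under these assumptions, `sliding_window_degree(16, 16, 1, 16)` models the case in the question, where neighbouring threads read windows shifted by one element, and returns 1, i.e. no conflict at any step. Starting each thread's window 16 words apart instead, `sliding_window_degree(16, 16, 16, 16)`, returns 16, the worst case for 16 banks.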