You may notice there’s no use of __syncthreads() in the example. That’s because each thread is the only one that reads or writes the locations it “owns”, so there’s no need to worry about whether other threads can “see” the results.
I fully understand shared memory bank conflicts, as they are well documented. My original question was about bringing data from global memory into shared memory so that the global memory accesses (reads and writes) are coalesced. If that is what you are addressing, then I’m not sure I follow.
The same spacing method I posted works with global memory reads and writes, if your data is per-thread (a big if!). Just interleave the data in global memory, and your reads and writes will always be coalesced even though each thread is doing unpredictable random access into its “own” data array.
What doesn’t work is if threads have to share the same data addresses and read and write them unpredictably and independently. That’s pretty much the definition of uncoalescable.
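To make the interleaving idea concrete, here is a rough host-side sketch (plain C++, not a kernel) that compares two ways of laying out per-thread data. The function names and the 16-word segment rule are my own assumptions for illustration: it models a half-warp of 16 threads and counts how many aligned 16-element segments that half-warp touches when every thread reads element i of its own array. This is the simplest case to reason about; the model does not cover threads in the same half-warp using different values of i.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumption: a half-warp is 16 threads, and an access is treated as
// coalesced when all 16 addresses fall in one aligned 16-element segment.
constexpr std::size_t kHalfWarp = 16;

// Blocked layout: thread t owns the contiguous range
// [t * perThread, (t + 1) * perThread).
std::size_t blockedIndex(std::size_t t, std::size_t i, std::size_t perThread) {
    return t * perThread + i;
}

// Interleaved layout: element i of every thread is stored side by side,
// so neighbouring threads touch neighbouring addresses.
std::size_t interleavedIndex(std::size_t t, std::size_t i, std::size_t numThreads) {
    return i * numThreads + t;
}

// Count the aligned 16-element segments touched by threads 0..15 when each
// one accesses element i of its own private array.
std::size_t segmentsTouched(bool interleaved, std::size_t i,
                            std::size_t perThread, std::size_t numThreads) {
    assert(numThreads >= kHalfWarp && numThreads % kHalfWarp == 0);
    assert(i < perThread);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kHalfWarp; ++t) {
        std::size_t idx = interleaved ? interleavedIndex(t, i, numThreads)
                                      : blockedIndex(t, i, perThread);
        segments.insert(idx / kHalfWarp);
    }
    return segments.size();
}
```

With 64 threads and 32 elements per thread, the interleaved layout keeps the half-warp inside a single segment, while the blocked layout spreads it over 16 different segments. In a real kernel the same index expression, `i * numThreads + tid`, would be used directly on the global pointer.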
Bank conflicts have to do with speed; __syncthreads() has more to do with correctness. What spworley was underlining is that each thread should be guaranteed a nice, private spot in shared memory, with a green lawn and drinks on the terrace.
Right, I understand this. Unfortunately, it doesn’t really address my original question at all, but I appreciate the help!
NP! Can I ask you a question back?
Why are the reads/writes not coalesced in your example? Find the answer and I’m confident that you will be able to coalesce them.
Yeah, I think I have my answer a few posts up.
They are not coalesced because the first thread participating in the access is not accessing the first element in the array, and the element it does access is not at a multiple of 16; at least that’s my guess.
It’s my understanding that, in order to be coalesced, the access should satisfy:
(baseAddress + tid) % 16 == 0 for the first thread of each half-warp
Which is not the case.
So skipping the first elements for thread 0 and then reading those elements afterwards might help a LOT.
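As a rough sketch of that “skip the leading elements” idea, the arithmetic could look like the following (plain C++ on the host, with hypothetical helper names). It assumes indices are counted in elements from an array base that is itself suitably aligned, and that the boundary that matters is a multiple of 16 elements, as guessed above.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: accesses should start on a multiple of 16 elements (one half-warp).
constexpr std::size_t kHalfWarp = 16;

// True if a half-warp whose first thread reads startIndex begins on a boundary.
bool startsOnBoundary(std::size_t startIndex) {
    return startIndex % kHalfWarp == 0;
}

// First index at or after startIndex that lies on a 16-element boundary.
std::size_t alignedStart(std::size_t startIndex) {
    std::size_t aligned =
        startIndex + (kHalfWarp - startIndex % kHalfWarp) % kHalfWarp;
    assert(aligned % kHalfWarp == 0);
    return aligned;
}

// Number of leading elements to handle separately (for example afterwards,
// by a few threads) so that the bulk of the range starts on a boundary.
// Never more than the total element count.
std::size_t leadingElements(std::size_t startIndex, std::size_t count) {
    std::size_t toBoundary = alignedStart(startIndex) - startIndex;
    return toBoundary < count ? toBoundary : count;
}
```

For a range that starts at element 5, this peels off 11 leading elements so the main pass begins at element 16; a range that already starts at element 32 needs no peeling.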