I think there are bank conflicts in my code, but I can't tell where…
I think of shared memory as an array of 16 rows, with each bank holding one row:
shared_temp[( offset)*16 + lineNumber ]
(where lineNumber = thid % 16 and offset is the position within a line).
Each thread accesses only one line. There are 16 lines, so each thread within a warp touches only one bank.
When I change my code so that there can be no conflicts (for example, by making all threads read the same memory address), I get a large speed-up.
Where is the problem?