Does that mean that blocks 0…7 have finished by the time blocks 8…15 are processed?
The reason for the question: shared memory is quite small compared to the 768 MB of device memory, so my kernel has to reuse it over and over again. I have to make sure that data in shared memory doesn’t get overwritten while I still need it.
Is my assumption correct that 8 blocks at a time can always use the complete shared memory without problems?
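
To make the reuse pattern concrete, here is a minimal sketch of the kind of kernel I mean. It is not my actual code: the names `tiledSum` and `TILE`, the tile size of 256, the input size, and the 16-block launch are all assumptions made up for illustration. Each block loads a tile into its shared buffer, works on it, and then overwrites the same buffer with the next tile.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define TILE 256  // hypothetical tile size, equal to the threads per block

// Each block sums the tiles assigned to it, reusing one shared buffer per tile.
__global__ void tiledSum(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];  // the buffer that gets reused
    float acc = 0.0f;             // only thread 0's copy is used

    // The loop bounds depend only on blockIdx, so all threads of a block
    // run the same number of iterations and reach every barrier together.
    for (int base = blockIdx.x * TILE; base < n; base += gridDim.x * TILE) {
        int i = base + threadIdx.x;
        tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();  // tile is fully loaded before anyone reads it

        // Tree reduction inside the shared buffer.
        for (int s = TILE / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s)
                tile[threadIdx.x] += tile[threadIdx.x + s];
            __syncthreads();
        }

        if (threadIdx.x == 0)
            acc += tile[0];
        __syncthreads();  // conservative barrier before the next tile overwrites the buffer
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = acc;
}

int main()
{
    const int n = 1 << 20;   // assumed input size
    const int blocks = 16;   // assumed grid size, i.e. blocks 0...15
    const size_t bytes = n * sizeof(float);

    float *h_in = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i)
        h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    tiledSum<<<blocks, TILE>>>(d_in, d_out, n);

    float h_out[blocks];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += h_out[b];
    printf("sum = %f\n", total);

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    return 0;
}
```

Error checking is left out to keep the sketch short. Within one block, the `__syncthreads()` calls keep the buffer from being overwritten while it is still in use; what I am unsure about is how this interacts with other blocks running at the same time.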