I have a simple question. Let's say every thread in a block increments the same int. Is there any way to ensure that every thread's increment takes effect and is not disturbed by any other thread? In other words, are memory accesses synchronized and locked?
Not using that code, no. If you want that counter increment to work correctly, you will need to use an atomic function. Shared memory atomic operations are only supported on compute capability 1.2 or greater devices.
Well, there are two ways of doing it. The easiest would be to replace ‘pixels_that_change++’ with ‘atomicAdd(&pixels_that_change, 1)’ in your first code sample and make sure you compile for sm_12 or higher. Every thread contends for the same address, so the increments are serialized and it will be slow, but it will work.
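To make the atomic version concrete, here is a minimal sketch. Only the name pixels_that_change comes from this thread; the flags array (nonzero where a pixel changed), the kernel layout, and the host helper run_count_changes_atomic are my own assumptions for illustration, not your original code.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every thread whose flag is nonzero increments a
// per-block shared counter atomically; thread 0 then adds the block's
// count to a global total.
__global__ void count_changes_atomic(const int *flags, int n, int *total)
{
    __shared__ int pixels_that_change;   // name taken from the thread

    if (threadIdx.x == 0) pixels_that_change = 0;
    __syncthreads();                     // counter is zeroed before anyone adds

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && flags[i]) atomicAdd(&pixels_that_change, 1);
    __syncthreads();                     // all increments are done

    if (threadIdx.x == 0) atomicAdd(total, pixels_that_change);
}

// Hypothetical host helper; assumes n > 0 and omits error checking.
int run_count_changes_atomic(const int *h_flags, int n)
{
    int *d_flags = nullptr, *d_total = nullptr, h_total = 0;
    cudaMalloc(&d_flags, n * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemcpy(d_flags, h_flags, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    count_changes_atomic<<<blocks, threads>>>(d_flags, n, d_total);

    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_flags);
    cudaFree(d_total);
    return h_total;
}
```

Calling run_count_changes_atomic(flags, n) from the host should return the number of nonzero entries in flags. The two __syncthreads() calls matter: without them a thread could add to the counter before it is zeroed, or thread 0 could read it before the others have finished.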
The other, MUCH faster option is similar to your second code block, just using a proper reduction algorithm on ints instead. Take a look at the reduction sample in the SDK. Also, I’m not entirely sure how bool arrays are stored in shared memory on the GPU, but if an element is smaller than 32 bits you may want to use an int array anyway to reduce the number of bank conflicts.
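As a rough illustration of the reduction approach, here is a simple shared-memory tree reduction over ints. It is a sketch in the spirit of the SDK sample, not a copy of it, and it skips the sample's further optimizations. The flags array, the BLOCK constant, and both function names are assumptions of mine, and the kernel assumes it is launched with exactly BLOCK threads per block (a power of two).

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 256   // assumed block size; must be a power of two

// Hypothetical kernel: each block sums its 0/1 flags in shared memory and
// writes one partial sum per block.
__global__ void count_changes_reduce(const int *flags, int n, int *block_sums)
{
    __shared__ int partial[BLOCK];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n && flags[i]) ? 1 : 0;   // out-of-range threads add 0
    __syncthreads();

    // Halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    if (tid == 0) block_sums[blockIdx.x] = partial[0];
}

// Hypothetical host helper; assumes n > 0 and omits error checking.
int run_count_changes_reduce(const int *h_flags, int n)
{
    const int blocks = (n + BLOCK - 1) / BLOCK;
    int *d_flags = nullptr, *d_sums = nullptr;
    cudaMalloc(&d_flags, n * sizeof(int));
    cudaMalloc(&d_sums, blocks * sizeof(int));
    cudaMemcpy(d_flags, h_flags, n * sizeof(int), cudaMemcpyHostToDevice);

    count_changes_reduce<<<blocks, BLOCK>>>(d_flags, n, d_sums);

    // Finish on the host: add up the per-block partial sums.
    std::vector<int> h_sums(blocks);
    cudaMemcpy(h_sums.data(), d_sums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    int total = 0;
    for (int s : h_sums) total += s;

    cudaFree(d_flags);
    cudaFree(d_sums);
    return total;
}
```

No atomics are needed here because each shared-memory slot is written by exactly one thread per step, with a barrier between steps. Summing the per-block results on the host keeps the sketch short; a second kernel pass would do the same job on the device.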
Of course, if you have a Fermi-class card and you use bools, I can imagine you could pull off a clever trick using __ballot() and __popc() …
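One way such a ballot trick might look is sketched below. Note that I have swapped in __ballot_sync(), the replacement for __ballot() in CUDA 9 and later, so this is not literally the intrinsic mentioned above. The flags array, kernel name, and host helper are assumptions for illustration, and the kernel assumes a 1D block whose size is a multiple of 32 so that every warp is full.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each warp votes on its predicate, and lane 0 adds
// the number of set bits in the vote to a global total.
__global__ void count_changes_ballot(const int *flags, int n, int *total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // No early return: every lane named in the mask must reach the ballot.
    int changed = (i < n) && flags[i];

    // Bit k of 'votes' is set if lane k of this warp had changed != 0.
    unsigned votes = __ballot_sync(0xffffffffu, changed);

    // One atomic per warp instead of one per thread.
    if ((threadIdx.x & 31) == 0) atomicAdd(total, __popc(votes));
}

// Hypothetical host helper; assumes n > 0 and omits error checking.
int run_count_changes_ballot(const int *h_flags, int n)
{
    int *d_flags = nullptr, *d_total = nullptr, h_total = 0;
    cudaMalloc(&d_flags, n * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemcpy(d_flags, h_flags, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(int));

    const int threads = 256;   // multiple of 32, so all warps are full
    const int blocks = (n + threads - 1) / threads;
    count_changes_ballot<<<blocks, threads>>>(d_flags, n, d_total);

    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_flags);
    cudaFree(d_total);
    return h_total;
}
```

Compared with a per-thread atomicAdd, this cuts the number of atomic operations by a factor of 32, since each warp contributes its whole count in one go.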