Unfortunately, I can’t use atomic adds on shared memory with CC 1.1; they require compute capability 1.2 or higher.
I’d say we can close this thread. Thank you for your tips and your time!
One more thing:
You could use unsigned char for the bins. This would allow for 192 threads per block, which is near optimal. A byte-sized bin can hold at most 255 counts, so synchronize every 255th iteration and do a temporary sum in integer registers…
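Here is a rough sketch of how I read that suggestion; it is not tested on CC 1.1 hardware. The bin count of 64, the kernel and helper names, the input layout (each byte is already a bin index below 64), and the per-block output array `partial` are all my own assumptions, none of them come from this thread. Each thread keeps a private row of byte counters in shared memory, and every 255 iterations the first 64 threads each add up one bin across all rows into an integer register.

```cuda
#include <cuda_runtime.h>

#define NUM_BINS    64   // assumption: the bin count is not stated in the thread
#define NUM_THREADS 192  // threads per block, as suggested above
#define FLUSH_EVERY 255  // a byte counter cannot overflow within 255 increments

// Sum every thread's byte counters into integer registers, then clear them.
// Must be called by all threads of the block (it contains __syncthreads()).
__device__ void flushBins(unsigned char *bins, int tid, unsigned int &total)
{
    __syncthreads();                        // all increments are visible
    if (tid < NUM_BINS) {
        for (int t = 0; t < NUM_THREADS; ++t)
            total += bins[t * NUM_BINS + tid];   // thread tid owns bin tid
    }
    __syncthreads();                        // sums finished before clearing
    for (int b = 0; b < NUM_BINS; ++b)
        bins[tid * NUM_BINS + b] = 0;       // each thread clears its own row
}

// data:    n input bytes, each assumed to be a bin index in [0, NUM_BINS)
// partial: gridDim.x * NUM_BINS counters, one partial histogram per block
// Launch with exactly NUM_THREADS threads per block.
__global__ void histogramByteBins(const unsigned char *data, int n,
                                  unsigned int *partial)
{
    // 192 rows * 64 byte bins = 12 KB, which fits in the 16 KB of CC 1.x.
    __shared__ unsigned char bins[NUM_THREADS * NUM_BINS];

    const int tid = threadIdx.x;
    unsigned char *myBins = bins + tid * NUM_BINS;
    for (int b = 0; b < NUM_BINS; ++b)
        myBins[b] = 0;

    unsigned int total = 0;                 // integer register accumulator
    const int stride = gridDim.x * blockDim.x;
    int iter = 0;

    // The loop bounds depend only on blockIdx, so every thread of a block
    // runs the same number of iterations and reaches the same flushes.
    for (int base = blockIdx.x * blockDim.x; base < n; base += stride) {
        const int i = base + tid;
        if (i < n)
            myBins[data[i]]++;              // private row, no atomics needed
        if (++iter == FLUSH_EVERY) {
            flushBins(bins, tid, total);
            iter = 0;
        }
    }
    flushBins(bins, tid, total);            // pick up the remaining counts

    if (tid < NUM_BINS)
        partial[blockIdx.x * NUM_BINS + tid] = total;
}
```

You would launch it as `histogramByteBins<<<numBlocks, NUM_THREADS>>>(d_data, n, d_partial)` and then add up the `numBlocks` partial histograms on the host or in a second small kernel. I have not tried to avoid shared-memory bank conflicts here, so the byte layout would probably need tuning before it performs well.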