If there is one thing I’m missing in CUDA, it’s atomic float operations!
Or, more specifically, atomic float additions.
A lot of the time, my algorithms calculate separate pieces of a result
in an irregular fashion, and those pieces then need to be added up. Consider photon
mapping, just to throw in an example. All the ray tracing can easily be done in parallel,
but the hit locations are not predictable, so accumulating them needs fine-grained
synchronisation. Doing this with integers is possible, but it is a major
bummer and limiting.
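
To make the integer workaround concrete, here is a rough sketch of what I mean: each float contribution is converted to fixed point and accumulated with the integer atomicAdd. The names (scatterAddFixed, runFixedDemo) and the scale factor are made up for illustration, and it assumes a device that supports 32-bit integer atomics on global memory.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed fixed-point scale (hypothetical): 1 unit = 1/65536.
// A 32-bit int then only covers sums of roughly +/-32768.
#define FIXED_SCALE 65536.0f

// Each thread converts its float contribution to fixed point and adds it
// to an unpredictable bin using the integer atomicAdd.
__global__ void scatterAddFixed(const float* values, const int* bins,
                                int n, int* accum)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int fixedVal = __float2int_rn(values[i] * FIXED_SCALE);
        atomicAdd(&accum[bins[i]], fixedVal);
    }
}

// Hypothetical host helper: adds n contributions of 0.25 into a single bin
// and returns the accumulated sum converted back to float.
float runFixedDemo(int n)
{
    float* hValues = (float*)malloc(n * sizeof(float));
    int* hBins = (int*)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) {
        hValues[i] = 0.25f;
        hBins[i] = 0;  // everything lands in bin 0 for this demo
    }

    float* dValues;
    int* dBins;
    int* dAccum;
    cudaMalloc(&dValues, n * sizeof(float));
    cudaMalloc(&dBins, n * sizeof(int));
    cudaMalloc(&dAccum, sizeof(int));
    cudaMemcpy(dValues, hValues, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dBins, hBins, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dAccum, 0, sizeof(int));

    scatterAddFixed<<<(n + 255) / 256, 256>>>(dValues, dBins, n, dAccum);

    int hAccum = 0;
    cudaMemcpy(&hAccum, dAccum, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dValues);
    cudaFree(dBins);
    cudaFree(dAccum);
    free(hValues);
    free(hBins);
    return hAccum / FIXED_SCALE;
}
```

The limiting part is the scale factor: it has to be chosen up front, it trades precision against range, and nothing warns you when a bin overflows.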
There is probably a way to do this with hierarchical locks, but I don’t find that very
satisfying considering it could be so much simpler (and more efficient?).
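
For comparison, a lock-free emulation is also possible with a compare-and-swap loop. This is only a sketch under the assumption that the device supports 32-bit atomicCAS on global memory. The function names (atomicAddFloatCAS, sumKernel, runCasDemo) are made up for illustration, not part of CUDA.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Emulated atomic float add (hypothetical name) built on 32-bit atomicCAS.
// Retries until no other thread has changed the value in between.
__device__ float atomicAddFloatCAS(float* address, float val)
{
    int* addressAsInt = (int*)address;
    int old = *addressAsInt;
    int assumed;
    do {
        assumed = old;
        old = atomicCAS(addressAsInt, assumed,
                        __float_as_int(val + __int_as_float(assumed)));
    } while (assumed != old);  // compare as integers so NaN cannot spin forever
    return __int_as_float(old);  // value stored before this thread's add
}

// Every thread adds its value into the same float location.
__global__ void sumKernel(const float* values, int n, float* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAddFloatCAS(result, values[i]);
    }
}

// Hypothetical host helper: sums n copies of 1.0f and returns the result.
float runCasDemo(int n)
{
    float* hValues = (float*)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        hValues[i] = 1.0f;
    }

    float* dValues;
    float* dResult;
    cudaMalloc(&dValues, n * sizeof(float));
    cudaMalloc(&dResult, sizeof(float));
    cudaMemcpy(dValues, hValues, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dResult, 0, sizeof(float));  // all-zero bits are 0.0f

    sumKernel<<<(n + 255) / 256, 256>>>(dValues, n, dResult);

    float hResult = 0.0f;
    cudaMemcpy(&hResult, dResult, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dValues);
    cudaFree(dResult);
    free(hValues);
    return hResult;
}
```

This works, but under heavy contention on one address the retry loop serialises the threads, which is part of why a native instruction would be nicer.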
If I understand the PTX definition correctly, it already provides for atomic float
operations; only the hardware does not support them yet. So my question is whether atomic
float operations are going to be implemented and, if so, when.