Multiple atomic operations

Hi,

Is there a way to update three consecutive floats x, y, z at once, faster than using three separate atomicAdd calls?
Can they be combined somehow?

typedef struct __align__(16)
{
    float x, y, z;
    float radius;
} Force;

Force & force_j = in->force_d[j];

atomicAdd (&force_j.x, -fx);
atomicAdd (&force_j.y, -fy);
atomicAdd (&force_j.z, -fz);

No, there is no single atomic that updates all three floats. But you may be able to restructure your code to use a reduction scheme instead of atomic operations, which might be faster.
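To make the reduction idea concrete, here is a minimal sketch, not a drop-in replacement. It assumes the contributions to each particle can first be written to a separate buffer (the names contrib, per_particle, reduce_forces and run_reduction_demo are hypothetical and not from the original code), and that one thread block is assigned to each particle j. Because only one block ever touches force_d[j], the final update needs no atomics.

```cuda
#include <vector>
#include <cuda_runtime.h>

typedef struct __align__(16)
{
    float x, y, z;
    float radius;
} Force;

// Assumed layout: contrib[j * per_particle + k] is the k-th force
// contribution acting on particle j. One block owns one particle.
// BLOCK must be a power of two and equal to the launch block size.
template <int BLOCK>
__global__ void reduce_forces(const float3 *contrib, int per_particle, Force *force_d)
{
    __shared__ float3 partial[BLOCK];
    const int j = blockIdx.x;

    // Each thread sums a strided subset of the contributions privately.
    float3 acc = make_float3(0.0f, 0.0f, 0.0f);
    for (int k = threadIdx.x; k < per_particle; k += BLOCK) {
        const float3 c = contrib[j * per_particle + k];
        acc.x += c.x; acc.y += c.y; acc.z += c.z;
    }
    partial[threadIdx.x] = acc;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x].x += partial[threadIdx.x + stride].x;
            partial[threadIdx.x].y += partial[threadIdx.x + stride].y;
            partial[threadIdx.x].z += partial[threadIdx.x + stride].z;
        }
        __syncthreads();
    }

    // Same sign convention as the question: the force is subtracted.
    if (threadIdx.x == 0) {
        force_d[j].x -= partial[0].x;
        force_d[j].y -= partial[0].y;
        force_d[j].z -= partial[0].z;
    }
}

// Host-side check with made-up data: 4 particles, 1000 contributions of
// (1, 2, 3) each, so every particle should end at (-1000, -2000, -3000).
bool run_reduction_demo()
{
    const int n = 4, per_particle = 1000;
    std::vector<float3> h_contrib(n * per_particle, make_float3(1.0f, 2.0f, 3.0f));
    std::vector<Force> h_force(n, Force{0.0f, 0.0f, 0.0f, 1.0f});

    float3 *d_contrib = nullptr;
    Force *d_force = nullptr;
    cudaMalloc(&d_contrib, h_contrib.size() * sizeof(float3));
    cudaMalloc(&d_force, h_force.size() * sizeof(Force));
    cudaMemcpy(d_contrib, h_contrib.data(), h_contrib.size() * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(d_force, h_force.data(), h_force.size() * sizeof(Force), cudaMemcpyHostToDevice);

    reduce_forces<128><<<n, 128>>>(d_contrib, per_particle, d_force);

    // Blocking copy; it waits for the kernel on the default stream.
    cudaMemcpy(h_force.data(), d_force, h_force.size() * sizeof(Force), cudaMemcpyDeviceToHost);
    cudaFree(d_contrib);
    cudaFree(d_force);

    bool ok = true;
    for (const Force &f : h_force)
        ok = ok && f.x == -1000.0f && f.y == -2000.0f && f.z == -3000.0f;
    return ok;
}
```

Whether this beats atomics depends on how much contention there is and on whether the extra buffer and second pass are affordable, so it is worth timing both versions on the target GPU. To try it, call run_reduction_demo() from main and compile with nvcc.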

Thank you.

I found some software implementations of mutexes for CUDA. Could they help?

A software mutex in CUDA is almost certainly going to be far slower than calling three atomic operations. Atomics got much faster on Kepler (and were already pretty fast on Fermi), so I would just use them directly.
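For completeness, using the atomics directly could look like the sketch below. The helper atomic_sub_force, the kernel scatter_forces and the host function run_atomic_demo are hypothetical names for illustration. Note that this is still three separate hardware atomics: each component is updated atomically, but the three are not updated as one unit, which is fine for accumulating a sum. Float atomicAdd needs compute capability 2.0 or newer.

```cuda
#include <vector>
#include <cuda_runtime.h>

typedef struct __align__(16)
{
    float x, y, z;
    float radius;
} Force;

// Hypothetical helper: groups the three atomicAdd calls from the question.
__device__ void atomic_sub_force(Force *f, float fx, float fy, float fz)
{
    atomicAdd(&f->x, -fx);
    atomicAdd(&f->y, -fy);
    atomicAdd(&f->z, -fz);
}

// Worst-case contention for illustration: every thread updates particle j.
__global__ void scatter_forces(const float3 *pair_force, int n, int j, Force *force_d)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        const float3 f = pair_force[i];
        atomic_sub_force(&force_d[j], f.x, f.y, f.z);
    }
}

// Host-side check with made-up data: 4096 contributions of (1, 2, 3)
// should leave particle 0 at (-4096, -8192, -12288).
bool run_atomic_demo()
{
    const int n = 4096;
    std::vector<float3> h_pair(n, make_float3(1.0f, 2.0f, 3.0f));
    Force h_force = {0.0f, 0.0f, 0.0f, 1.0f};

    float3 *d_pair = nullptr;
    Force *d_force = nullptr;
    cudaMalloc(&d_pair, n * sizeof(float3));
    cudaMalloc(&d_force, sizeof(Force));
    cudaMemcpy(d_pair, h_pair.data(), n * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(d_force, &h_force, sizeof(Force), cudaMemcpyHostToDevice);

    scatter_forces<<<(n + 255) / 256, 256>>>(d_pair, n, 0, d_force);

    // Blocking copy; it waits for the kernel on the default stream.
    cudaMemcpy(&h_force, d_force, sizeof(Force), cudaMemcpyDeviceToHost);
    cudaFree(d_pair);
    cudaFree(d_force);

    return h_force.x == -4096.0f && h_force.y == -8192.0f && h_force.z == -12288.0f;
}
```

The values here are small integers, so the float sums are exact regardless of the order in which the atomics land. With real force values the result can differ in the last bits from run to run, because floating-point addition is not associative.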