Your idea of using a large fixed-point accumulator (if I get it correctly) makes sense. If you know the order of magnitude of your numbers, even a 64-bit integer may be enough. If scaled properly, a 64-bit fixed-point number actually offers a higher precision than a 64-bit floating point (which has a 53-bit mantissa).
For instance, assuming an accumulator of type long long:
atomicAdd(&accumulator, llrint(scalebn(x, -(MAX_EXP-63)));
(you may have to cast &accumulator to unsigned long long* as there is no atomicAdd on signed ints)
where MAX_EXP is adjusted to be large enough to guarantee that no input, output or intermediate value ever exceed 2^MAX_EXP in magnitude. It is best if you can make a tight estimate, as overestimating the bound will increase the rounding error.
Then you may convert back the final result to floating-point with scalebn((double)accumulator, MAX_EXP-63).
An interesting side effect is that the sum becomes deterministic. Unlike a floating-point sum, the result of the fixed-point sum does not depend on the summation order.