help wanted global memory update

I am building something in which multiple threads update random float values/vectors in global device memory, a bit like a histogram problem. Multiple threads may need to update the same value/vector. Since each update only adds or subtracts something, the order in which the threads update global memory does not matter. It only matters that every update is applied.
The array of float values/vectors to be updated is too large to store in shared memory.

How can I tackle this problem with compute capability 1.1?

atomicAdd cannot be used with float; that overload is only available from CC 2.0.
Building semaphores in shared memory that block a thread while another is updating can only be done with atomicInc/atomicDec, which are only available for shared memory from CC 1.2.

Does anyone have an idea how to proceed here?

You can use integer atomics to do float addition; check the thread…7&hl=atomic

It implements atomicAdd on “double”; you can modify it for “float”.
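
For reference, here is a minimal sketch of what a float version might look like. It is not the code from the linked thread, just the usual compare-and-swap loop built on the 32-bit integer atomicCAS, which works on global memory from CC 1.1. The names atomicAddFloatCAS, scatterAdd and run_demo are made up for this example, and the bin count and launch sizes are arbitrary. The function is deliberately not called atomicAdd so that it cannot collide with any overload the toolkit headers already declare.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: float add emulated with a 32-bit integer atomicCAS.
// Returns the value stored at the address before the add.
__device__ float atomicAddFloatCAS(float *address, float value)
{
    int *address_as_int = (int *)address;
    int old = *address_as_int;   // initial guess of the current contents
    int assumed;
    do {
        assumed = old;
        // Write the new sum only if nobody changed the word since we read it.
        old = atomicCAS(address_as_int, assumed,
                        __float_as_int(value + __int_as_float(assumed)));
        // Compare the raw bits (ints), not floats, so a NaN cannot loop forever.
    } while (assumed != old);
    return __int_as_float(old);
}

// Hypothetical histogram-like kernel: many threads add into a few bins.
__global__ void scatterAdd(float *bins, int numBins, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAddFloatCAS(&bins[i % numBins], 1.0f);
}

// Host-side check: 4096 threads over 8 bins, so each bin should end at 512.
// Returns bins[0], or -1 if a CUDA call fails.
float run_demo()
{
    const int numBins = 8;
    const int n = 4096;
    float *d_bins = 0;
    float h_bins[numBins];

    if (cudaMalloc((void **)&d_bins, numBins * sizeof(float)) != cudaSuccess)
        return -1.0f;
    cudaMemset(d_bins, 0, numBins * sizeof(float));   // all-zero bits == 0.0f

    scatterAdd<<<n / 256, 256>>>(d_bins, numBins, n);

    cudaError_t err = cudaMemcpy(h_bins, d_bins, numBins * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_bins);
    if (err != cudaSuccess)
        return -1.0f;
    return h_bins[0];
}
```

Under heavy contention on a single address the loop retries a lot, so this will be slower than a native atomic, but every update is applied, and here the order does not matter. For vectors you would call it once per component, which means the components are updated atomically one by one, not the vector as a whole.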

Thanks for the suggestion, I’ll look into it.

The strange thing is that the compiler does not accept a function atomicAdd(float *address, float value) with sm_11, on the grounds that it already exists. If I rename it, it is accepted. Still, the documentation says that atomicAdd for floats does not exist in CC 1.1.