Any locking mechanism?


I’m trying to write a program in which threads may write to the same location in device memory. Will this cause inconsistencies? Do I need some kind of locking mechanism? Thank you!

Yes, it will cause inconsistencies: the order in which blocks are executed is undefined. By the same token, there is no way to implement a locking mechanism. Depending on what you are doing, the atomic integer operations may make your algorithm possible, although they are only available on the newer (and slower) 8600 and 8500 cards.

If you could describe what you are trying to do, someone may be able to suggest how to implement it without requiring multiple writes to the same memory location.

I’m trying to do something like this:

There is a global array, say temp[k].

Every thread does its own work and updates certain elements of this array. Different threads may update the same element. Is there any way to do this? Thanks!
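For the temp[k] update described here, the atomic integer operations mentioned in the first reply map onto atomicAdd. The sketch below is one way to write it, assuming each thread i contributes one value val[i] to element idx[i]; the names scatterAdd, runScatterAdd, idx and val are hypothetical. Global-memory atomicAdd on 32-bit integers needs compute capability 1.1 or later, the generation of the 8500/8600 cards.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread adds val[i] into temp[idx[i]]. Different threads may target the
// same element; atomicAdd makes each read-modify-write indivisible, so no
// update is lost, although the order in which they land is still unspecified.
__global__ void scatterAdd(const int* idx, const int* val, int n, int* temp)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&temp[idx[i]], val[i]);
    }
}

// Hypothetical host helper: copies the inputs over, runs the kernel and
// returns temp[0..k). Error checking is omitted to keep the sketch short.
std::vector<int> runScatterAdd(const std::vector<int>& idx,
                               const std::vector<int>& val, int k)
{
    const int n = static_cast<int>(idx.size());
    int *dIdx, *dVal, *dTemp;
    cudaMalloc(&dIdx, n * sizeof(int));
    cudaMalloc(&dVal, n * sizeof(int));
    cudaMalloc(&dTemp, k * sizeof(int));
    cudaMemcpy(dIdx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, val.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTemp, 0, k * sizeof(int));

    const int threads = 256;
    scatterAdd<<<(n + threads - 1) / threads, threads>>>(dIdx, dVal, n, dTemp);

    std::vector<int> temp(k);
    cudaMemcpy(temp.data(), dTemp, k * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIdx);
    cudaFree(dVal);
    cudaFree(dTemp);
    return temp;
}
```

For floating-point sums, atomicAdd on float is available from compute capability 2.0; because the order of the additions is unspecified, float results may differ slightly from run to run.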

For setting a byte, you just keep writing until a read-back returns the value you want. I don’t think this will work for operations like addition, though, so as long as all threads write the same value you will be fine. Also, I think performance will be bad in global memory.

Thank you very much. Yeah, I AM doing something like addition. So there are no locking functions that I can use?

It was discussed here and is very slow for global memory.


Hi, thank you very much. Your discussion is very helpful. You mentioned that it is very slow in global memory. Did you end up using this mechanism, or is there no feasible solution to this problem?

I have not used any colliding-write techniques; I just discussed them because they could get you out of a difficult situation and should work according to the spec. It is always best to design your algorithm to avoid collisions if possible.

If you can limit collisions to within a warp, then you can drop the syncs and branch out of the write loop as soon as your write has succeeded. This could significantly improve performance.

If you can afford the global memory, accumulate the results for each thread separately, then use a parallel reduction to get your answer.
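A minimal sketch of the "separate copies, then reduce" idea, assuming the same hypothetical idx/val input as a scatter-add: phase 1 gives every thread a private row of k ints so no two threads share an address, and phase 2 uses one block per element to sum that element's column with a shared-memory tree reduction. The kernel and helper names, and the grid sizes, are arbitrary choices for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Phase 1: every thread accumulates into its own private row of k ints, so no
// two threads ever write the same address. priv holds nThreads * k ints.
__global__ void accumulatePrivate(const int* idx, const int* val, int n,
                                  int k, int* priv)
{
    const int t = blockIdx.x * blockDim.x + threadIdx.x;
    const int nThreads = gridDim.x * blockDim.x;
    int* row = priv + static_cast<size_t>(t) * k;
    for (int i = t; i < n; i += nThreads) {  // grid-stride loop over the items
        row[idx[i]] += val[i];
    }
}

// Phase 2: one block per element e. The block sums priv[t * k + e] over all
// threads t, then finishes with a shared-memory tree reduction.
// Assumes blockDim.x is a power of two.
__global__ void reduceColumns(const int* priv, int nThreads, int k, int* temp)
{
    extern __shared__ int partial[];
    const int e = blockIdx.x;
    int sum = 0;
    for (int t = threadIdx.x; t < nThreads; t += blockDim.x) {
        sum += priv[static_cast<size_t>(t) * k + e];
    }
    partial[threadIdx.x] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            partial[threadIdx.x] += partial[threadIdx.x + s];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        temp[e] = partial[0];
    }
}

// Hypothetical host helper; error checking is omitted.
std::vector<int> runPrivateThenReduce(const std::vector<int>& idx,
                                      const std::vector<int>& val, int k)
{
    const int blocks = 4, threads = 128;
    const int nThreads = blocks * threads;
    const int n = static_cast<int>(idx.size());
    const size_t privBytes = static_cast<size_t>(nThreads) * k * sizeof(int);

    int *dIdx, *dVal, *dPriv, *dTemp;
    cudaMalloc(&dIdx, n * sizeof(int));
    cudaMalloc(&dVal, n * sizeof(int));
    cudaMalloc(&dPriv, privBytes);  // the memory cost of this approach
    cudaMalloc(&dTemp, k * sizeof(int));
    cudaMemcpy(dIdx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, val.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dPriv, 0, privBytes);

    accumulatePrivate<<<blocks, threads>>>(dIdx, dVal, n, k, dPriv);
    const int rThreads = 256;  // must be a power of two for reduceColumns
    reduceColumns<<<k, rThreads, rThreads * sizeof(int)>>>(dPriv, nThreads, k, dTemp);

    std::vector<int> temp(k);
    cudaMemcpy(temp.data(), dTemp, k * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIdx);
    cudaFree(dVal);
    cudaFree(dPriv);
    cudaFree(dTemp);
    return temp;
}
```

The scratch array grows as nThreads * k, which is what makes this impractical when k is very large. The column reads in phase 2 are also strided, so a tuned version would likely choose a different layout.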



Thank you so much for your replies. My application will always have collisions, and the memory for the common components is very large, so I guess I can’t afford to keep a copy for each thread. What if I put the common elements in global memory and keep a lock for each element in shared memory?
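One caveat with the shared-memory variant: shared memory is private to a block, so a lock kept there cannot exclude threads in other blocks that touch the same global element. The sketch below therefore keeps the locks in global memory and takes them with atomicCAS, assuming a hashed mapping where element e uses locks[e % numLocks] so the lock array can be smaller than k. All names are hypothetical, and for a plain addition atomicAdd would be the simpler choice; the lock only pays off when the critical section updates something a single atomic cannot.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Adds val[i] into temp[idx[i]] under a spin lock. Element e is guarded by
// locks[e % numLocks] ("hashed" locks: unrelated elements may share a lock
// and then wait on each other).
__global__ void lockedAdd(const int* idx, const int* val, int n,
                          int* locks, int numLocks, volatile int* temp)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const int e = idx[i];
    int* lock = &locks[e % numLocks];

    bool done = false;
    // Acquire, update and release all happen inside one loop iteration, so the
    // thread holding the lock finishes before its warp spins again. Spinning
    // first and updating after the loop can deadlock within a warp on GPUs
    // without independent thread scheduling.
    while (!done) {
        if (atomicCAS(lock, 0, 1) == 0) {  // 0 = free, 1 = held
            __threadfence();               // acquire side: order after the CAS
            temp[e] = temp[e] + val[i];    // critical section; volatile forces a real re-read
            __threadfence();               // release side: publish before unlocking
            atomicExch(lock, 0);
            done = true;
        }
    }
}

// Hypothetical host helper; error checking is omitted.
std::vector<int> runLockedAdd(const std::vector<int>& idx,
                              const std::vector<int>& val, int k, int numLocks)
{
    const int n = static_cast<int>(idx.size());
    int *dIdx, *dVal, *dLocks, *dTemp;
    cudaMalloc(&dIdx, n * sizeof(int));
    cudaMalloc(&dVal, n * sizeof(int));
    cudaMalloc(&dLocks, numLocks * sizeof(int));
    cudaMalloc(&dTemp, k * sizeof(int));
    cudaMemcpy(dIdx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, val.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dLocks, 0, numLocks * sizeof(int));  // all locks start free
    cudaMemset(dTemp, 0, k * sizeof(int));

    const int threads = 256;
    lockedAdd<<<(n + threads - 1) / threads, threads>>>(dIdx, dVal, n, dLocks,
                                                        numLocks, dTemp);

    std::vector<int> temp(k);
    cudaMemcpy(temp.data(), dTemp, k * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIdx);
    cudaFree(dVal);
    cudaFree(dLocks);
    cudaFree(dTemp);
    return temp;
}
```

Every thread that collides on a lock still waits its turn, so heavily contended elements serialize just as they would with colliding writes.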

It sounds like your problem is quite sparse; perhaps you should partition your nodes into non-colliding sets and process each set sequentially, a bit like the map colouring problem. Without coding up separate or hashed locks, I could not comment on the performance, but I suspect you would still end up with similar issues.
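A rough sketch of the colouring idea, assuming each work item touches exactly two nodes (for example, an edge that adds a weight to both endpoints): a greedy host-side pass assigns each item the smallest colour not already used at either of its nodes, then one kernel launch per colour updates temp with plain, non-atomic writes. The item layout (a, b, w) and all function names are hypothetical.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Each work item i (hypothetically an edge between nodes a[i] and b[i]) adds
// w[i] to both of its nodes. `items` lists the items of one colour.
__global__ void addItemsOfColour(const int* a, const int* b, const int* w,
                                 const int* items, int count, int* temp)
{
    const int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < count) {
        const int i = items[j];
        // Within one colour no two items share a node, so these plain,
        // non-atomic updates never collide with another thread's.
        temp[a[i]] += w[i];
        temp[b[i]] += w[i];
    }
}

// Greedy colouring on the host: each item gets the smallest colour not yet
// used by an earlier item that touches one of its nodes.
std::vector<std::vector<int>> colourItems(const std::vector<int>& a,
                                          const std::vector<int>& b, int k)
{
    std::vector<std::vector<int>> coloursAt(k);  // colours already touching each node
    std::vector<std::vector<int>> sets;          // sets[c] = items with colour c
    auto uses = [&](int node, int c) {
        const auto& v = coloursAt[node];
        return std::find(v.begin(), v.end(), c) != v.end();
    };
    for (int i = 0; i < static_cast<int>(a.size()); ++i) {
        int c = 0;
        while (uses(a[i], c) || uses(b[i], c)) ++c;
        if (c == static_cast<int>(sets.size())) sets.emplace_back();
        sets[c].push_back(i);
        coloursAt[a[i]].push_back(c);
        if (b[i] != a[i]) coloursAt[b[i]].push_back(c);
    }
    return sets;
}

// Hypothetical host helper: one launch per colour on the default stream.
// Error checking is omitted.
std::vector<int> runColoured(const std::vector<int>& a, const std::vector<int>& b,
                             const std::vector<int>& w, int k)
{
    const int n = static_cast<int>(a.size());
    int *dA, *dB, *dW, *dItems, *dTemp;
    cudaMalloc(&dA, n * sizeof(int));
    cudaMalloc(&dB, n * sizeof(int));
    cudaMalloc(&dW, n * sizeof(int));
    cudaMalloc(&dItems, n * sizeof(int));
    cudaMalloc(&dTemp, k * sizeof(int));
    cudaMemcpy(dA, a.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dW, w.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTemp, 0, k * sizeof(int));

    for (const auto& set : colourItems(a, b, k)) {
        const int count = static_cast<int>(set.size());
        // On the default stream this copy is ordered after the previous colour's kernel.
        cudaMemcpy(dItems, set.data(), count * sizeof(int), cudaMemcpyHostToDevice);
        addItemsOfColour<<<(count + 255) / 256, 256>>>(dA, dB, dW, dItems, count, dTemp);
    }

    std::vector<int> temp(k);
    cudaMemcpy(temp.data(), dTemp, k * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dW);
    cudaFree(dItems);
    cudaFree(dTemp);
    return temp;
}
```

This works best when the problem really is sparse: if one node is touched by every item, every item needs its own colour and the launches run fully serialized.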