Multiple threads writing to same address

Hello all,
I am pretty new to CUDA programming, and I have the following situation. I have a boolean array in global memory, initialized to 0, and many threads running in parallel that each either write a 1 to an element or do nothing. The problem is that several threads may end up writing to the same element of the array. I don't know much about how this works at the low level, but I assume that multiple writes to the same memory location will be performed serially, so I will lose some parallelism.

I was wondering if the hardware is smart enough to take care of this, or if it will have an impact on efficiency. Is it possible to tell the hardware to allow only one write per array element and discard the rest? I don't need 5 threads to write a 1 to the same element.

Thank you!

No, there is no way to tell the memory controller to discard multiple writes to the same address. Is your boolean array small enough to fit a copy in shared memory? If so, you could have each block locally build the array with less contention, then write the 1 elements back out to the global array when finished.
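
Here is a rough sketch of what I mean by building the array per block. It assumes the flag array has a fixed, small size (NUM_FLAGS below) and that each input item carries the index of the flag it wants to set, or -1 for "do nothing". The names markFlagsShared, runMarkFlags, targets and NUM_FLAGS are all made up for illustration, so adapt them to your actual code.

```cuda
#include <cuda_runtime.h>

// Assumption: the flag array is small enough for a per-block copy in shared memory.
#define NUM_FLAGS 1024

// Hypothetical kernel. targets[i] is the flag index item i wants to set,
// or -1 if item i should do nothing.
__global__ void markFlagsShared(const int *targets, int numItems,
                                unsigned char *flags)
{
    __shared__ unsigned char localFlags[NUM_FLAGS];

    // Clear this block's private copy.
    for (int i = threadIdx.x; i < NUM_FLAGS; i += blockDim.x)
        localFlags[i] = 0;
    __syncthreads();

    // Duplicate writes now land in shared memory instead of global memory.
    // Every writer stores the same value, so the race is harmless.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < numItems;
         i += gridDim.x * blockDim.x) {
        int t = targets[i];
        if (t >= 0 && t < NUM_FLAGS)
            localFlags[t] = 1;
    }
    __syncthreads();

    // Write only the set elements back: at most one global write
    // per flag per block.
    for (int i = threadIdx.x; i < NUM_FLAGS; i += blockDim.x)
        if (localFlags[i])
            flags[i] = 1;
}

// Hypothetical host wrapper. Assumes numItems > 0 and that hostFlags has
// room for NUM_FLAGS bytes. Returns 0 on success, -1 on any CUDA error.
int runMarkFlags(const int *hostTargets, int numItems,
                 unsigned char *hostFlags)
{
    int *dTargets = NULL;
    unsigned char *dFlags = NULL;
    size_t targetBytes = (size_t)numItems * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&dTargets, targetBytes);
    if (err == cudaSuccess)
        err = cudaMalloc((void **)&dFlags, NUM_FLAGS);
    if (err == cudaSuccess)
        err = cudaMemcpy(dTargets, hostTargets, targetBytes,
                         cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemset(dFlags, 0, NUM_FLAGS);
    if (err == cudaSuccess) {
        // 4 blocks of 256 threads is an arbitrary choice for the sketch.
        markFlagsShared<<<4, 256>>>(dTargets, numItems, dFlags);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostFlags, dFlags, NUM_FLAGS,
                         cudaMemcpyDeviceToHost);

    cudaFree(dTargets);
    cudaFree(dFlags);
    return err == cudaSuccess ? 0 : -1;
}
```

Different blocks can still write a 1 to the same global element, but that is at most one write per block rather than one per thread. Whether this actually beats the straightforward version depends on how often your threads collide, so it is worth timing both.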

Regardless, you will want to run this code on a Fermi-class GPU (GTX 400 and 500 series), where the on-chip L2 cache will reduce the penalty for multiple writes to the same address significantly.
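
If you are not sure what your card is, you can query it at runtime. This is a small sketch using cudaGetDeviceProperties; the helper names queryCacheInfo and isFermiOrNewer are made up here. Fermi corresponds to compute capability 2.x, and the l2CacheSize field reports the L2 size in bytes.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: fetches compute capability and L2 cache size (bytes)
// for the given device. Returns 0 on success, -1 if the query fails.
int queryCacheInfo(int device, int *major, int *minor, int *l2Bytes)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    *major = prop.major;
    *minor = prop.minor;
    *l2Bytes = prop.l2CacheSize;
    return 0;
}

// Fermi-class GPUs report compute capability 2.x.
bool isFermiOrNewer(int major)
{
    return major >= 2;
}
```

Call queryCacheInfo(0, &major, &minor, &l2Bytes) for the first device and pass major to isFermiOrNewer to see which case you are in.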