Race conditions

Hi all,

Okay, in part of my CUDA Fortran code, several threads access and modify the same array element, and the resulting race condition is causing my code to produce the wrong output.

The usual way around this would be to put a lock around the code that’s causing the issue. Is there anything like this that can be used in CUDA Fortran, or even just a way of getting the threads to execute one by one for that operation?

Cheers,
Crip_crop

Hi Crip_Crop,

No, at least not on a global level. You can investigate atomic operators, but they only guarantee that an update to a single memory location completes without interference from other threads; they do not provide a global lock. There are methods that attempt global synchronization, but they only work if all blocks are active. Since blocks can be retired before others even start, the only true global synchronization point occurs between kernel calls.

  • Mat

I think the atomic add operator might solve my problem, because the line of code that is producing the incorrect answer is something like this:

array(ij)=array(ij)+x

Because some of the threads are updating the same array element, the answers aren’t accumulating properly in memory. The only problem is that it says in the programming guide…

Both arguments must be of type integer(kind=4)

…but my array is double precision! Is there any way round this that you know of?

Cheers,
Crip_crop

Hi Crip Crop,

but my array is double precision! Is there any way round this that you know of?

NVIDIA just added support for single-precision atomics, but there is currently no support for double-precision atomics.
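
If dropping to single precision is acceptable, the update array(ij)=array(ij)+x could be sketched roughly as below. This is only an illustration, not code from your program: the module name accumulate_mod, the kernel name accumulate, and the arrays idx and vals are made up for the example, and it assumes a compiler and device that support the real(kind=4) form of atomicadd.

```fortran
module accumulate_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: each thread adds one value into array(idx(i)).
  ! Several threads may share the same idx(i), which is the racy case.
  attributes(global) subroutine accumulate(array, idx, vals, n)
    integer, value :: n
    real(kind=4)   :: array(*)   ! single precision, resides in device memory
    integer        :: idx(n)     ! target element for each thread
    real(kind=4)   :: vals(n)    ! contribution from each thread
    integer        :: i
    real(kind=4)   :: old

    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      ! Replaces array(idx(i)) = array(idx(i)) + vals(i).
      ! atomicadd returns the value held before the add; it is unused here.
      old = atomicadd(array(idx(i)), vals(i))
    end if
  end subroutine accumulate
end module accumulate_mod
```

Each atomicadd serializes only the conflicting updates to one element, so threads writing to different elements still run in parallel. The order in which contributions arrive is not fixed, so floating-point sums can differ slightly from run to run.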

  • Mat