atomiccas usage

lamtnguyen · December 23, 2014, 5:31am

Hi all,

I am having trouble using the atomiccas function as suggested in the book by Ruetsch and Fatica. Here is my code:

module test

implicit none
integer(kind=4),device::lock

contains

attributes(global) subroutine adds(a_d,asum_d,lock)
implicit none
real(kind=8),device::a_d(:),asum_d

integer(kind=4),device::lock

integer n


n=threadidx%x+(blockidx%x-1)*blockDim%x

do while(atomiccas(lock,0,1)==1) ! set lock
end do

asum_d=asum_d+a_d(n)

call threadfence()

lock=0 ! release lock


end subroutine

end module

!---------------------------
program test2
use cudafor
use test

implicit none

real(kind=8),allocatable::a(:)
real(kind=8),allocatable,device::a_d(:)
real(kind=8),device::asum_d
real(kind=8)::asum

integer n,j,istat

n=1024

lock=0


allocate(a(n),a_d(n))

do j=1,n
   a(j)=1.0d0*j
end do

a_d=a

asum_d=0.0d0

call adds<<<(n-1)/32+1,32>>>(a_d,asum_d,lock)


istat=cudaDeviceSynchronize()

asum=asum_d

print*,asum

end

But when I compile and run this simple code, it never finishes; it just hangs.

Can you tell me what’s wrong?

Thanks,

Lam

MatColgrove · December 24, 2014, 3:25pm

Hi Lam,

I found a good explanation of the issue:

Basically, you can have only one thread in the block enter the critical section. All threads in a warp execute the same code at the same time, so if one thread grabs the lock, it can’t proceed to release it, because the other threads in its warp are still spinning in the do while loop.

You’ll see this in Ruetsch and Fatica’s example where the atomiccas is only executed by one thread and used to perform the final sum reduction.
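For illustration, here is a minimal sketch of that pattern applied to the code above. It is not the book's listing. Each block first reduces its elements in shared memory, and then only thread 1 of the block spins on the lock and adds the block's partial sum. The names locksum, nthreads and psum are made up for this example, and the tree reduction assumes the block size is a power of two.

```fortran
module locksum
  use cudafor
  implicit none
  integer, parameter :: nthreads = 32   ! hypothetical block size, must be a power of two
  integer, device :: lock               ! 0 = free, 1 = held; set to 0 by the host
contains
  attributes(global) subroutine adds(a_d, asum_d, n)
    real(kind=8), device :: a_d(:), asum_d
    integer, value :: n
    real(kind=8), shared :: psum(nthreads)   ! per-block partial sums
    integer :: t, i, stride

    t = threadidx%x
    i = t + (blockidx%x-1)*blockdim%x

    ! Every thread loads one element (zero if past the end of the array)
    psum(t) = 0.0d0
    if (i <= n) psum(t) = a_d(i)
    call syncthreads()

    ! Tree reduction in shared memory; all threads reach every syncthreads
    stride = blockdim%x/2
    do while (stride >= 1)
      if (t <= stride) psum(t) = psum(t) + psum(t+stride)
      call syncthreads()
      stride = stride/2
    end do

    ! Only one thread per block competes for the lock, so no thread
    ! ever waits on a lock held by another thread of its own warp
    if (t == 1) then
      do while (atomiccas(lock, 0, 1) == 1)   ! set lock
      end do
      asum_d = asum_d + psum(1)
      call threadfence()
      lock = 0                                ! release lock
    end if
  end subroutine adds
end module locksum

program test_locksum
  use cudafor
  use locksum
  implicit none
  integer, parameter :: n = 1024
  real(kind=8) :: a(n), asum
  real(kind=8), device :: a_d(n), asum_d
  integer :: j, istat

  do j = 1, n
    a(j) = 1.0d0*j
  end do

  a_d = a
  asum_d = 0.0d0
  lock = 0

  call adds<<<(n-1)/nthreads+1, nthreads>>>(a_d, asum_d, n)
  istat = cudaDeviceSynchronize()

  asum = asum_d
  ! Print the device result next to the closed form n(n+1)/2 for comparison
  print *, asum, 0.5d0*n*(n+1)
end program test_locksum
```

With this layout the lock is contended once per block rather than once per thread, so the serialized section is also much shorter than in the original kernel.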

Hope this helps,
Mat

lamtnguyen · December 25, 2014, 2:02am

Thanks Mat
