Shiroi
January 23, 2010, 11:46am
1
Hello:
The CUDA 2.2 compiler won’t let me synchronize after atomic operations. It emits this warning:
“Advisory: Removed dead synchronization intrinsic”.
The code I’m using to test this is:
__global__ static void test(int* var)
{
    __shared__ int acc;
    if(threadIdx.x==0)
        acc = 0;
    __syncthreads();
    atomicAdd(&acc,1);
    __syncthreads();
    if(threadIdx.x==0)
        var[0] = acc;
}
I launch this kernel with a single block of 256 threads. Since every thread adds 1 to a counter that starts at 0, it should return n, the number of threads: 256 in this case.
The assembler removes the “__syncthreads();” after the atomicAdd, so variable “acc” can hold an incomplete, unpredictable value when thread 0 stores it. In emulation mode the code compiles and runs correctly.
I’m using Tesla C1060, with Ubuntu 8.04 and CUDA 2.2.
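For anyone who wants to reproduce this, a minimal host-side harness might look like the sketch below. It is not the exact program from this post: the names count_threads and run_count_test are made up for illustration, and the kernel is a copy of the test kernel above. It launches one block of n threads and compares the stored counter with n. Note that shared-memory atomics need compute capability 1.2 or higher, so on toolkits of this era the file has to be compiled for a suitable target (for example -arch=sm_13 for the C1060).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same logic as the test kernel in the post: every thread of the block
// increments a shared counter, then thread 0 writes the total to out[0].
__global__ void count_threads(int *out)
{
    __shared__ int acc;
    if (threadIdx.x == 0)
        acc = 0;
    __syncthreads();      // make sure acc is initialized before anyone adds
    atomicAdd(&acc, 1);
    __syncthreads();      // make sure all adds are done before thread 0 reads
    if (threadIdx.x == 0)
        out[0] = acc;
}

// Hypothetical helper: launches one block of n threads and returns the
// counter value, or -1 if any CUDA call reports an error.
int run_count_test(int n)
{
    int *d_out = NULL;
    int h_out = -1;

    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess)
        return -1;

    count_threads<<<1, n>>>(d_out);
    if (cudaGetLastError() != cudaSuccess) {   // launch configuration errors
        cudaFree(d_out);
        return -1;
    }

    // cudaMemcpy waits for the kernel to finish before copying the result.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return (err == cudaSuccess) ? h_out : -1;
}

int main()
{
    const int n = 256;
    int result = run_count_test(n);
    printf("expected %d, got %d\n", n, result);
    return (result == n) ? 0 : 1;
}
```

If the second barrier is dropped by the toolchain, the value printed can be smaller than n, because thread 0 may read the counter before the other warps have performed their adds.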
avidday
January 23, 2010, 12:16pm
2
That compiles fine without warning using the nvcc from the 2.3 toolkit:
.entry _Z4testPi (
.param .u64 __cudaparm__Z4testPi_var)
{
.reg .u32 %rv1;
.reg .u32 %r<7>;
.reg .u64 %rd<4>;
.reg .pred %p<3>;
.shared .s32 __cuda_acc0;
.loc 16 1 0
$LBB1__Z4testPi:
cvt.u32.u16 %r1, %tid.x;
mov.u32 %r2, 0;
setp.eq.u32 %p1, %r1, %r2;
@!%p1 bra $Lt_0_1794;
.loc 16 6 0
mov.s32 %r3, 0;
st.shared.s32 [__cuda_acc0], %r3;
$Lt_0_1794:
.loc 16 8 0
bar.sync 0;
.loc 15 103 0
mov.u64 %rd1, __cuda_acc0;
mov.s32 %r4, 1;
atom.shared.add.s32 %rv1, [%rd1], %r4;
.loc 16 12 0
bar.sync 0;
@!%p1 bra $Lt_0_2306;
.loc 16 15 0
ld.shared.s32 %r5, [__cuda_acc0];
ld.param.u64 %rd2, [__cudaparm__Z4testPi_var];
st.global.s32 [%rd2+0], %r5;
$Lt_0_2306:
.loc 16 16 0
exit;
$LDWend__Z4testPi:
} // _Z4testPi
You can see that both sync instructions (the two bar.sync 0 lines, before and after the atom.shared.add) make it into the PTX.
Shiroi
January 23, 2010, 4:10pm
3
I’ll perform an upgrade as soon as possible then.
Thank you.