Shared memory & sync, changes not visible

Smokey · January 25, 2010, 5:02am

Two things…

First, why was this thread locked: http://forums.nvidia.com/index.php?showtopic=156615 ?

(and worse, without any reason/notification - very unprofessional and suspicious)

Second, I’m experiencing similar issues to the thread I linked above.

I have a kernel with a single block, 256 threads (8 warps). In this kernel I have a small segment of code similar to the following:

[codebox]…
__shared__ float3 var;
…
if (threadIdx.x == blah)
{
    var = some_value;
}
__syncthreads();
…
// At this point in the code, no threads other than the writing thread (threadIdx.x == blah) can see that var == some_value, not even threads in the same warp as the writing thread.[/codebox]

This appears to be a compiler bug of sorts: I’ve tried multiple times to reproduce it in a smaller kernel, on both CUDA 2.3 and the Nexus alpha SDK, and failed every time.

My main question, really, is how best to diagnose/nail down the cause of the bug (I’m not overly familiar with PTX, and disassembling this kernel results in hundreds of thousands of lines of PTX, a few MB of text)…

Any advice would be greatly appreciated.

tmurray · January 25, 2010, 5:05am

good question as to why that thread was locked, I have no idea. it was probably an accident. anyway, it’s unlocked now.

anyway yeah this looks like a compiler bug–can you post a repro case that fails on the 3.0 beta toolkit?

jma · January 25, 2010, 3:36pm

What type is “some_value”?

Smokey · January 26, 2010, 10:11pm

some_value’s type is a data structure (EuclideanTransform3f - see below)

[codebox]struct float3x3
{
    float3 rows[3];

    // Row-by-row copy
    __device__ void operator=(const float3x3 &b)
    {
        rows[0] = b.rows[0];
        rows[1] = b.rows[1];
        rows[2] = b.rows[2];
    }
};

struct __align__(16) EuclideanTransform3f
{
    float3x3 R;  // rotation
    float3 T;    // translation

    // Member-wise copy
    __device__ void operator=(const EuclideanTransform3f &b)
    {
        R = b.R;
        T = b.T;
    }
};[/codebox]

Update: I just tested with various other types (float2/3/4) - they all exhibit the same problem.

Looking into a repro case (I’ve tried this before, though, with little/no success - I suspect the size of my kernel is somehow making the compiler a bit crazy; that said, it’s a fairly straightforward kernel)
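For reference, a standalone repro attempt along the lines of the snippet in the first post could look something like the sketch below. It is only a sketch: the kernel name, the writer index (37), the test value, and the mismatch counter are all hypothetical placeholders, since the real condition ("blah") and the real kernel are not shown in this thread. It also assumes a device that supports atomicAdd on global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical writer index; the original post elides the real condition ("blah").
#define WRITER_THREAD 37

// One thread writes a shared float3, every thread reads it back after the barrier.
__global__ void broadcastTest(float3 some_value, int *mismatches)
{
    __shared__ float3 var;

    if (threadIdx.x == WRITER_THREAD)
    {
        var = some_value;
    }

    __syncthreads();

    // After the barrier every thread in the block should see the written value.
    float3 seen = var;
    if (seen.x != some_value.x || seen.y != some_value.y || seen.z != some_value.z)
    {
        atomicAdd(mismatches, 1);  // count threads that read something else
    }
}

int main()
{
    int *d_mismatches = NULL;
    int h_mismatches = -1;

    cudaMalloc(&d_mismatches, sizeof(int));
    cudaMemset(d_mismatches, 0, sizeof(int));

    // Same launch shape as described above: a single block of 256 threads (8 warps).
    broadcastTest<<<1, 256>>>(make_float3(1.0f, 2.0f, 3.0f), d_mismatches);

    // The blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_mismatches, d_mismatches, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("threads that read a different value: %d\n", h_mismatches);
    cudaFree(d_mismatches);
    return h_mismatches == 0 ? 0 : 1;
}
```

With a correct compiler and runtime the count should come out as zero. A small kernel like this may well not trigger the problem, which would be consistent with the earlier failed repro attempts, but swapping float3 for float2/float4 or EuclideanTransform3f and growing the kernel step by step is one way to bisect towards whatever does trigger it.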
