Shared memory & sync, changes not visible

Two things…

First, why was this thread locked: http://forums.nvidia.com/index.php?showtopic=156615 ?

(and worse, without any reason/notification - very unprofessional and suspicious)

Second, I’m experiencing similar issues to the thread I linked above.

I have a kernel with a single block, 256 threads (8 warps). In this kernel I have a small segment of code similar to the following:

[codebox]…
__shared__ float3 var;

if (threadIdx.x == blah)
{
	var = some_value;
}
__syncthreads();

// At this point in the code, no other threads except the writing thread (threadIdx.x == blah) can see that var = some_value, not even threads in the same warp as the writing thread.[/codebox]

This appears to be a compiler bug of sorts: I’ve tried several times to reproduce it in a smaller kernel, on both CUDA 2.3 and the Nexus alpha SDKs, and failed every time.

My main question is how best to diagnose and pin down the cause of the bug. I’m not overly familiar with PTX, and disassembling this kernel produces hundreds of thousands of lines of PTX (a few MB of text)…

Any advice would be greatly appreciated.

Good question as to why that thread was locked; I have no idea. It was probably an accident. Anyway, it’s unlocked now.

And yeah, this looks like a compiler bug. Can you post a repro case that fails on the 3.0 beta toolkit?
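For reference, a standalone repro harness for the pattern described above could be as small as the sketch below. The kernel name `broadcast_kernel`, the helper `count_mismatches`, the writer index, and the test value are all assumptions made for illustration; none of them come from the original kernel. Each thread copies what it reads from shared memory into a global buffer so the host can check it, and on a correct toolchain the count is expected to be zero.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical repro kernel: one thread writes a shared float3, then every
// thread copies the value it observes into global memory for the host to check.
__global__ void broadcast_kernel(float3 *seen, int writer, float3 value)
{
    __shared__ float3 var;

    if (threadIdx.x == writer)
    {
        var = value;
    }
    __syncthreads();  // after the barrier, all threads in the block should see the write

    seen[threadIdx.x] = var;
}

// Launches a single block of 256 threads (8 warps) and counts the threads
// that did not observe the written value. Returns -1 if a CUDA call fails.
int count_mismatches(int writer)
{
    const int n = 256;
    const float3 value = make_float3(1.0f, 2.0f, 3.0f);
    float3 host[n];
    float3 *dev = NULL;

    if (cudaMalloc((void **)&dev, sizeof(host)) != cudaSuccess) return -1;
    cudaMemset(dev, 0, sizeof(host));

    broadcast_kernel<<<1, n>>>(dev, writer, value);

    // cudaMemcpy waits for the kernel to finish and reports its errors as well.
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
    {
        if (host[i].x != value.x || host[i].y != value.y || host[i].z != value.z)
            ++bad;
    }
    return bad;
}
```

Calling `count_mismatches(3)` from `main` and printing the result gives a quick pass/fail check. Growing this harness toward the real kernel, or cutting the real kernel down toward it, is one way to find the point where the behaviour changes. Compiling a reduced case with `nvcc -ptx` should also give PTX short enough to read by hand.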

What type is “some_value”?

The type of some_value is a struct, EuclideanTransform3f (see below):

[codebox]// 3x3 matrix stored as three float3 rows.
struct float3x3
{
	float3 rows[3];

	// Row-by-row copy.
	__device__ void operator=(const float3x3 &b)
	{
		rows[0] = b.rows[0];
		rows[1] = b.rows[1];
		rows[2] = b.rows[2];
	}
};

// Rotation R plus translation T, aligned to 16 bytes.
struct __align__(16) EuclideanTransform3f
{
	float3x3 R;
	float3 T;

	// Member-wise copy.
	__device__ void operator=(const EuclideanTransform3f &b)
	{
		R = b.R;
		T = b.T;
	}
};[/codebox]

Update: I just tested with various other types (float2/3/4) - they all exhibit the same problem.

I’m looking into a repro case. I’ve tried this before with little to no success, though. I suspect the size of my kernel is somehow confusing the compiler, even though it’s a fairly straightforward kernel.