Two things…
First, why was this thread locked? http://forums.nvidia.com/index.php?showtopic=156615
(And worse, it was locked without any reason or notification, which is very unprofessional and suspicious.)
Second, I’m experiencing issues similar to those described in the thread linked above.
I have a kernel launched with a single block of 256 threads (8 warps). It contains a small segment of code similar to the following:
[codebox]…
__shared__ float3 var;
…
if(threadIdx.x == blah)
{
var = some_value;
}
__syncthreads();
…
// At this point in the code, no thread other than the writing thread (threadIdx.x == blah)
// sees that var = some_value, not even threads in the same warp as the writing thread.[/codebox]
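A minimal, self-contained sketch of the pattern above, assuming a single block of 256 threads: each thread copies whatever it reads from `var` after the barrier into global memory, and the host counts how many threads saw the written value. `WRITER_TID`, the value (1, 2, 3), `broadcastCheck` and `runBroadcastCheck` are hypothetical stand-ins for the elided `blah` and `some_value`, not names from the original kernel. With a correct toolchain every thread should observe the value, so the returned count should equal `NUM_THREADS`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the elided `blah` and block size in the post.
#define WRITER_TID 37
#define NUM_THREADS 256

__global__ void broadcastCheck(float3 *seen)
{
    __shared__ float3 var;

    // Only one thread writes the shared variable (hypothetical value).
    if (threadIdx.x == WRITER_TID)
    {
        var = make_float3(1.0f, 2.0f, 3.0f);
    }
    __syncthreads();

    // Every thread records the value it observes after the barrier.
    seen[threadIdx.x] = var;
}

// Launches one block of NUM_THREADS threads and returns how many threads
// observed the value written by WRITER_TID, or -1 on a CUDA error.
int runBroadcastCheck()
{
    float3 *d_seen = NULL;
    float3 h_seen[NUM_THREADS];

    if (cudaMalloc((void **)&d_seen, NUM_THREADS * sizeof(float3)) != cudaSuccess)
        return -1;

    broadcastCheck<<<1, NUM_THREADS>>>(d_seen);

    // cudaMemcpy waits for the kernel and also reports launch/execution errors.
    cudaError_t err = cudaMemcpy(h_seen, d_seen, NUM_THREADS * sizeof(float3),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_seen);
    if (err != cudaSuccess)
        return -1;

    int ok = 0;
    for (int i = 0; i < NUM_THREADS; ++i)
    {
        if (h_seen[i].x == 1.0f && h_seen[i].y == 2.0f && h_seen[i].z == 3.0f)
            ++ok;
        else
            printf("thread %d saw (%f, %f, %f)\n", i, h_seen[i].x, h_seen[i].y, h_seen[i].z);
    }
    return ok;
}
```

The same per-thread recording (writing what each thread sees into a global buffer right after `__syncthreads()`) can be placed in the full kernel at the point marked by the comment, which shows exactly which threads read a stale value before having to dig through the PTX.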
This appears to be a compiler bug of some sort: I have repeatedly failed to reproduce the behavior in a smaller kernel, on both CUDA 2.3 and the Nexus alpha SDKs.
My main question, really, is how best to diagnose and pin down the cause of this bug. I’m not very familiar with PTX, and disassembling this kernel produces hundreds of thousands of lines of PTX (a few MB of text).
Any advice would be greatly appreciated.