__syncthreads() + shared memory issue

I’ve got an issue I’ve been banging my head against for a couple of days, and I’m reaching the point of suspecting CUDA itself is at fault.

Reason being, I’ve printed each thread’s ID plus the value of a shared variable directly after a __syncthreads() call, and one thread has a different value from the others.

How is this possible if __syncthreads() is supposed to make the latest value in shared memory visible to all threads??

I’ve also tried making the shared boolean ‘volatile’, but without success.
Also note the ‘incorrect use of __syncthreads()’ error is because thread 9 enters a branch containing a __syncthreads() call, since its dataToLoad value is 1.

Here’s the code and its output:

CODE:
__shared__ bool dataToLoad;
__syncthreads();
printf("Thread:%2d dataLoad:%d\n", threadIdx.x, dataToLoad);

OUTPUT:
Thread: 0 dataLoad:0
Thread: 1 dataLoad:0
Thread: 2 dataLoad:0
Thread: 3 dataLoad:0
Thread: 4 dataLoad:0
Thread: 5 dataLoad:0
Thread: 6 dataLoad:0
Thread: 7 dataLoad:0
Thread: 8 dataLoad:0
Thread: 9 dataLoad:1
Error:incorrect use of __syncthreads()

From the NVIDIA Programming Guide, p. 21:
“Only after the execution of a __syncthreads() (Section 4.4.2) are writes to
shared variables guaranteed to be visible by other threads.”

As far as I can see, this should be impossible; I don’t see how any code or errors on my part could cause the above output. A shared variable should have the same value amongst all threads after a __syncthreads() call…

ps. I can’t paste the whole code since this is work for a company, top secret hush hush stuff ;)

Device emulation does not model the underlying hardware correctly!

Shared memory variables need to be initialized! Operations on uninitialized variables are just undefined.
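For reference, the usual initialization pattern is something like this (a minimal sketch; the flag name is taken from the snippet above, everything else is made up):

    __global__ void kernel()
    {
        __shared__ bool dataToLoad;    // undefined until some thread writes it

        if (threadIdx.x == 0)
            dataToLoad = false;        // exactly one thread gives the flag a defined value

        __syncthreads();               // the write is now guaranteed visible to every thread

        // ... from here on, all threads read the same, defined value of dataToLoad ...
    }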

Cheers for the input Sarnath, but I’ve initialised the variable. It’s not until well into the kernel’s execution that the issue arises.

I don’t suppose anyone knows of documentation on __syncthreads() beyond the few sentences in the programming guide?
I’m wondering whether threads sync on specific calls to __syncthreads(), or whether they simply block until all threads have hit a call, rather than the same one?

It seems that whenever I set this shared boolean to true (which makes the ‘if’ following the code I pasted evaluate to true), I get the CUDA error on the first subsequent occurrence of __syncthreads(). I wonder if the CUDA compiler is erroneously assuming that, since the variable is linked to a potentially divergent ‘if’, threads will diverge and miss __syncthreads() calls? Though as far as I can see, if __syncthreads() does what it’s supposed to, no threads should diverge.

Sorry if this is too basic, but:

Is one thread executing __syncthreads() more times than the others? That would be bad.

Also, can you reproduce this behavior OUTSIDE the emulator?

“Also note the ‘incorrect use of __syncthreads()’ error is because thread 9 enters a branch containing a __syncthreads() call, since its dataToLoad value is 1.”

Yeah, you can’t have __syncthreads() in divergent branches.
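Roughly the difference, as a minimal sketch (not your actual code):

    __global__ void kernel()
    {
        __shared__ bool dataToLoad;

        if (threadIdx.x == 0)
            dataToLoad = true;       // stand-in for however the flag really gets set
        __syncthreads();

        // Broken: if dataToLoad ever differed between threads, only some of them
        // would reach this barrier, and the behaviour is undefined.
        // if (dataToLoad) {
        //     ... load data into shared memory ...
        //     __syncthreads();
        // }

        // Safe: keep the barrier outside the branch so every thread reaches it,
        // whether or not it did any loading.
        if (dataToLoad) {
            // ... load data into shared memory ...
        }
        __syncthreads();
    }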

I believe I ran into a similar problem of incorrect thread synchronization in emulation mode. I am using a shared array as well. My code goes like this:

    extern __shared__ char array[];

    int tx = threadIdx.x, ty = threadIdx.y;
    int offset = SIZE * blockIdx.x + tx + DIM * ty;

    array[offset] = 3;
    __syncthreads();

When I run this program in debug+emulation mode, all threads except one seem to block on __syncthreads(). The rebel thread just comes through somehow, so the program deadlocks. The program runs fine on the GPU.

I haven’t really tested whether the shared memory is to blame, but what worries me is that debugging isn’t possible (or can I debug kernels without emulation?).

I am using Ubuntu, CUDA 2.0 beta, and NetBeans 6.1.

I tried commenting out the shared memory write in my code above and the program runs fine. As soon as I uncomment it, it deadlocks again. So this must be some emulation bug, right?
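One thing worth ruling out before blaming the emulator: in emulation mode the kernel runs as ordinary host code, so an out-of-bounds write to array[] is just a stray write into host memory and can corrupt the emulator’s own bookkeeping, which can look exactly like one thread falling through a barrier. A quick sanity check might look like this (SIZE and DIM stand in for your real values, and smemBytes is a made-up parameter carrying the dynamic shared-memory size from the launch):

    #include <assert.h>

    #define SIZE 256                 // placeholders: substitute your real values
    #define DIM  16

    __global__ void kernel(int smemBytes)
    {
        extern __shared__ char array[];

        int tx = threadIdx.x, ty = threadIdx.y;
        int offset = SIZE * blockIdx.x + tx + DIM * ty;

        // Shared memory is per-block, so an offset that grows with blockIdx.x
        // is worth double-checking against the size you actually allocate.
        // In emulation the kernel runs as host code, so assert() works here
        // (it won't compile for the real GPU under CUDA 2.0).
        assert(offset >= 0 && offset < smemBytes);

        array[offset] = 3;
        __syncthreads();
    }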

“Yeah, you can’t have __syncthreads() in divergent branches.” - tmurray

Yeah, this is true; however, my code shouldn’t diverge, as all threads’ conditional branches operate on the same shared memory, so they should all evaluate to true/false identically, provided __syncthreads() is making the values in shared memory visible as it should.

smokyboy, I think you’re describing the same problem I’m having. I’ve managed to get around it by moving the shared boolean to outside the branch. I still haven’t figured out why it deadlocks.

I’m starting to wonder if there’s some data structure running in the background that describes the dependencies between variables and conditional logic across threads, and it’s erroneously flagging a deadlock situation.

I’ve tried putting a __syncthreads() before/after every line that operates on shared memory, so I don’t see how user error could be causing a deadlock if every operation should be synchronised. The only way there could be a deadlock then is if I were using the threadIdx variable in an ‘if’, which I’m not.

I’m hoping NVIDIA had the foresight to sync up individual __syncthreads() calls, and not simply keep a counter; otherwise, if one thread called __syncthreads() one more time than the others, that would explain why it falls through a call while the others hold. But if they were prudent and ID’d each __syncthreads() call, it should flag the error upon the next call to __syncthreads(). Would be nice if NVIDIA gave a bit more documentation about the guts of CUDA…