Can't get all threads to hit a syncthreads barrier correctly...

Brad_Peterson · March 21, 2015, 1:43am

This should be a simple issue, but it’s got me stumped. I have an array of structs and a set of inputs I’m trying to match, so each thread in the block checks its assigned struct elements for a match. The thread that finds it, and only that thread, sets a shared index variable. All threads then meet at a __syncthreads(), after which every thread knows the index through the shared variable. But it doesn’t work every time: some threads in the block are executing code after the __syncthreads() while other threads in the same block haven’t reached it yet.

The code is here:

__shared__ int index;
  int i = threadID;
  index = -1;
  __syncthreads(); 

  while(i<d_numItems){
    int strmatch=0;
    char const *s1 = label; //reset s1 and s2 back to the start
    char const *s2 = &(d_varDB[i].label[0]);

    //a one-line strcmp.  This should keep branching down to a minimum.
    while (!(strmatch = *(unsigned char *) s1 - *(unsigned char *) s2) && *s1++ && *s2++);

    //One and only one entry in d_varDB has these matching values.
    if (strmatch == 0 && d_varDB[i].domainID == patchID && d_varDB[i].matlIndex == matlIndex) {
      index = i; //This thread found it.
    }
    i = i + numThreads; //Try the next section.
  }

  //sync before return;
  __syncthreads();
  if (index == -1) {
    printf("Error, didn't find anything for %s patch %d matl %d with threadID %d and numthreads %d\n", label, patchID, matlIndex, threadID, numThreads);
    return NULL;
  }
  else return &d_varDB[index];

Some threads in the block hit the if (index == -1) check and go into the error-handling code, while other threads in the block that should have found the index haven’t got that far yet.

I’ve always read that you shouldn’t put __syncthreads() inside conditionals, but that conditionals between __syncthreads() calls are fine. Am I mistaken about this? Or is it something stupid and obvious that I just can’t see? :)
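On the question of conditionals: as the CUDA C++ Programming Guide describes it, __syncthreads() may appear in conditional code only if the condition evaluates identically across the entire block, while data-dependent branches between barriers are fine. A minimal single-block sketch of both cases follows; the kernel countAboveThreshold, the host helper runCountAboveThreshold, the 256-thread launch and the input data[i] = i are illustrative assumptions, not part of the original code.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical single-block kernel: counts elements of data[0..n) greater than
// threshold, walking the input in numChunks chunks.
__global__ void countAboveThreshold(const int *data, int n, int threshold,
                                    int numChunks, int *outCount)
{
    __shared__ int count;
    int tid = threadIdx.x;
    int stride = blockDim.x;

    if (tid == 0) count = 0;   // divergent code *between* barriers: fine
    __syncthreads();           // reached unconditionally by every thread

    int chunk = (n + numChunks - 1) / numChunks;
    // numChunks is a kernel argument, identical for all threads, so this loop
    // (and the barrier inside it) runs the same number of times in every thread.
    for (int c = 0; c < numChunks; ++c) {
        int end = min((c + 1) * chunk, n);
        for (int i = c * chunk + tid; i < end; i += stride)
            if (data[i] > threshold) atomicAdd(&count, 1);   // per-thread branch
        __syncthreads();   // also makes every atomicAdd visible before the read below
    }
    if (tid == 0) *outCount = count;
}

// Hypothetical host helper: runs the kernel on data[i] = i with one 256-thread block.
int runCountAboveThreshold(int n, int threshold, int numChunks)
{
    assert(n > 0 && numChunks > 0);
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    int *dData = nullptr, *dCount = nullptr;
    check(cudaMalloc(&dData, n * sizeof(int)));
    check(cudaMalloc(&dCount, sizeof(int)));
    check(cudaMemcpy(dData, h.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    countAboveThreshold<<<1, 256>>>(dData, n, threshold, numChunks, dCount);
    check(cudaGetLastError());

    int result = 0;
    check(cudaMemcpy(&result, dCount, sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dData));
    check(cudaFree(dCount));
    return result;
}
```

For example, runCountAboveThreshold(1000, 99, 3) should return 900 (the values 100 through 999). The barrier inside the loop is legal only because the loop bound is block-uniform; the inner loop, whose trip count differs per thread, contains no barrier.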

little_jimmy · March 21, 2015, 9:38am

"Some threads in the block hit the if (index == -1) check and go into the error-handling code, while other threads in the block that should have found the index haven’t got that far yet."

so, are you saying that some of the threads effectively violate the __syncthreads() call by simply skipping over it?

i do not see an issue with some threads arriving at the if (index == -1) before others.
but threads skipping the __syncthreads() altogether would be a real problem. it can indeed happen under certain circumstances, generally as an error condition, e.g. when the barrier sits in code that only some threads of the block execute.
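A common way for only some threads to skip a barrier is an early return (or break/continue) taken by part of the block before a __syncthreads() that the rest still execute; that makes the barrier effectively conditional on non-uniform data. A sketch of the usual alternative, assuming a hypothetical kernel reverseTiles that reverses each 128-element tile in place (the last tile may be partial): out-of-range threads stay alive and merely skip the loads and stores. The tile size kBlock and the helper reverseTilesOfIota are illustrative assumptions.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlock = 128;   // threads per block and tile size (assumed)

static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Reverses each kBlock-sized tile of data in place; the last tile may be partial.
// An unsafe version would begin with `if (gid >= n) return;`, so the idle threads
// of the last block would never reach the __syncthreads() the others wait at.
__global__ void reverseTiles(int *data, int n)
{
    __shared__ int tile[kBlock];
    int tid = threadIdx.x;
    int tileStart = blockIdx.x * kBlock;
    int gid = tileStart + tid;
    int tileLen = min(kBlock, n - tileStart);   // valid elements in this tile
    bool active = tid < tileLen;               // guard the work instead of returning

    if (active) tile[tid] = data[gid];
    __syncthreads();                           // every thread of the block gets here
    if (active) data[gid] = tile[tileLen - 1 - tid];
}

// Hypothetical host helper: reverses the tiles of 0, 1, ..., n-1 and returns the result.
std::vector<int> reverseTilesOfIota(int n)
{
    assert(n > 0);
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d = nullptr;
    check(cudaMalloc(&d, n * sizeof(int)));
    check(cudaMemcpy(d, h.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    reverseTiles<<<(n + kBlock - 1) / kBlock, kBlock>>>(d, n);
    check(cudaGetLastError());
    check(cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(d));
    return h;
}
```

With n = 130, the first tile becomes 127 down to 0 and the two-element second tile becomes 129, 128.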

Brad_Peterson · March 21, 2015, 4:25pm

Yes. Some threads get past the second __syncthreads and reach the if (index == -1) check before other threads have even arrived at that __syncthreads.

Brad_Peterson · March 21, 2015, 7:46pm

Ok, I figured it out. __syncthreads was working as intended; the issue is just less obvious.

The code I posted is the body of a device function, and I believe CUDA device function calls are inlined.

This function was being called twice, one call immediately after the other. So what was happening was:

1. One of the threads found the correct index.
2. All threads met up at the second __syncthreads.
3. Some threads in the block then went on into the second “function” call and reset index to -1.
4. Meanwhile, the remaining threads were still in the first “function” call and hadn’t yet evaluated if (index == -1). When they did, they saw that index was -1.
5. When the debugger stopped at my breakpoint, it looked as if some threads were before the second __syncthreads and some were after it. In reality, some threads were well past the second __syncthreads (already in the next call), while the others had only just left it.

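My fix was to put another __syncthreads before setting index to -1, so that no thread can reset index until every thread has finished reading it in the previous call.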
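A self-contained sketch of that fix, assuming a simplified record layout: the struct VarRecord, its fixed-size label field, the device function findVar, the kernel lookupTwice and the host helper countCorrectThreads are hypothetical stand-ins for the poster’s d_varDB types, which the thread does not show. The first __syncthreads() in findVar is the added barrier, and lookupTwice calls the search twice back to back as described above. Note that the race does not depend on inlining: a __shared__ variable declared inside a __device__ function is a single per-block variable used by every call.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical record layout standing in for the d_varDB entries.
struct VarRecord {
    char label[16];
    int domainID;
    int matlIndex;
};

static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Block-wide search; every thread of the block must call it.
__device__ const VarRecord *findVar(const VarRecord *db, int numItems,
                                    const char *label, int patchID, int matlIndex)
{
    __shared__ int index;
    __syncthreads();   // the fix: all threads have read index from any previous call
    index = -1;
    __syncthreads();   // nobody searches until the reset is done

    for (int i = threadIdx.x; i < numItems; i += blockDim.x) {
        const char *s1 = label;
        const char *s2 = db[i].label;
        int strmatch;
        while (!(strmatch = *(const unsigned char *)s1 - *(const unsigned char *)s2)
               && *s1++ && *s2++) {
            // advance while the characters match and are non-zero
        }
        if (strmatch == 0 && db[i].domainID == patchID && db[i].matlIndex == matlIndex)
            index = i;   // exactly one thread finds the entry
    }
    __syncthreads();     // the write to index is now visible to all threads
    return index == -1 ? nullptr : &db[index];
}

// Two lookups in a row, as in the thread; each thread records both results.
__global__ void lookupTwice(const VarRecord *db, int numItems, int *out)
{
    const VarRecord *a = findVar(db, numItems, "pressure", 1, 0);
    const VarRecord *b = findVar(db, numItems, "velocity", 2, 0);
    out[2 * threadIdx.x]     = a ? (int)(a - db) : -1;
    out[2 * threadIdx.x + 1] = b ? (int)(b - db) : -1;
}

// Hypothetical host check: returns how many threads got both lookups right.
int countCorrectThreads(int numThreads)
{
    const int numItems = 1000;
    std::vector<VarRecord> h(numItems);
    for (VarRecord &r : h) { strcpy(r.label, "other"); r.domainID = -1; r.matlIndex = 0; }
    strcpy(h[600].label, "pressure"); h[600].domainID = 1;
    strcpy(h[37].label, "velocity");  h[37].domainID = 2;

    VarRecord *dDb = nullptr;
    int *dOut = nullptr;
    check(cudaMalloc(&dDb, numItems * sizeof(VarRecord)));
    check(cudaMalloc(&dOut, 2 * numThreads * sizeof(int)));
    check(cudaMemcpy(dDb, h.data(), numItems * sizeof(VarRecord), cudaMemcpyHostToDevice));
    lookupTwice<<<1, numThreads>>>(dDb, numItems, dOut);
    check(cudaGetLastError());

    std::vector<int> out(2 * numThreads);
    check(cudaMemcpy(out.data(), dOut, out.size() * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(dDb));
    check(cudaFree(dOut));

    int correct = 0;
    for (int t = 0; t < numThreads; ++t)
        if (out[2 * t] == 600 && out[2 * t + 1] == 37) ++correct;
    return correct;
}
```

With the first __syncthreads() in findVar removed, the second call’s reset of index can race with the first call’s final read, which matches the intermittent failure described above; whether it shows up on a particular run is timing-dependent.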
