Can't get all threads to hit a syncthreads barrier correctly...

This should be a simple issue, but it’s got me stumped. I’ve got an array of structs and a set of inputs I’m trying to match. Each thread in the block looks at its assigned struct elements to find a match. The one thread that finds it sets the shared index variable. Then all threads meet up at a __syncthreads, after which every thread should know the index via the shared variable. But…it’s not working all the time. Some threads in the block appear to be running code after the __syncthreads while other threads in the block haven’t yet reached it.

The code is here:

__shared__ int index;
  int i = threadID;
  index = -1;
  __syncthreads(); 

  while(i<d_numItems){
    int strmatch=0;
    char const *s1 = label; //reset s1 and s2 back to the start
    char const *s2 = &(d_varDB[i].label[0]);

    //a one-line strcmp.  This should keep branching down to a minimum.
    while (!(strmatch = *(unsigned char *) s1 - *(unsigned char *) s2) && *s1++ && *s2++);

    //One and only one entry in d_varDB has these matching values.
    if (strmatch == 0 && d_varDB[i].domainID == patchID && d_varDB[i].matlIndex == matlIndex) {
      index = i; //This thread found it.
    }
    i = i + numThreads; //Try the next section.
  }

  //sync before return;
  __syncthreads();
  if (index == -1) {
    printf("Error, didn't find anything for %s patch %d matl %d with threadID %d and numthreads %d\n", label, patchID, matlIndex, threadID, numThreads);
    return NULL;
  }
  else return &d_varDB[index];

Some threads in the block hit the if (index == -1) and get into the error handling code, while other threads in the block that should have found the index haven’t got that far yet.

I’ve always read that you shouldn’t put __syncthreads inside conditions, but that you can put conditions between __syncthreads calls. Am I mistaken on this? Or is it something stupid and obvious that I just can’t see? :)
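For reference, here is a minimal sketch of the rule being asked about: divergent work between barriers is fine, as long as every thread in the block reaches each __syncthreads call. The kernel name barrierPlacement and the flag variable are made up for illustration and are not part of the code in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
__global__ void barrierPlacement(int *out) {
  __shared__ int flag;

  if (threadIdx.x == 0) flag = 0;   // a condition before a barrier is fine
  __syncthreads();                  // reached by every thread in the block

  if (threadIdx.x % 2 == 0) {
    atomicAdd(&flag, 1);            // only the even threads do this work
  }
  __syncthreads();                  // again reached by every thread

  // What to avoid (left commented out): a barrier that only some threads reach.
  // if (threadIdx.x % 2 == 0) { __syncthreads(); }

  out[threadIdx.x] = flag;          // every thread reads the same final value
}

int main() {
  const int n = 64;
  int h_out[n];
  int *d_out;
  cudaMalloc(&d_out, n * sizeof(int));
  barrierPlacement<<<1, n>>>(d_out);
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  printf("thread 0 sees %d, thread %d sees %d\n", h_out[0], n - 1, h_out[n - 1]);
  cudaFree(d_out);
  return 0;
}
```

By this rule the posted code is placed correctly, since both of its __syncthreads calls sit outside the while loop and the if.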

"Some threads in the block are hitting the if (index == -1) and gets into the error handling code, while other threads in the block who should have found the index haven’t got that far yet. "

so, are you thus saying that some of the threads essentially violate the __syncthreads() call, by simply jumping it?

i do not see an issue with some threads arriving at the if (index == -1) before others
but such threads ignoring the __syncthreads() would be another matter; that may indeed occur under certain circumstances, generally as an error condition

Yes, some threads go past the second __syncthreads and on to the if (index == -1) before other threads reach that __syncthreads.

Ok, I figured it out. __syncthreads was working as intended. The actual issue was less obvious.

The code above is the body of a function, and I believe CUDA inlines function calls.

This function was being called twice, one call immediately after the other. So what was happening was:

  • The correct index was found by 1 of the threads.
  • The second __syncthreads was called, all threads met up there.
  • Some threads in the block then made a second “function” call and reset index to -1
  • Meanwhile, the other threads were still in the first “function” call and hadn’t yet processed if (index == -1). They do, and see that index is -1.
  • The debugger stopped at the breakpoint I set, and it looked like some threads were before the second __syncthreads and some were after it. But in reality all of them were past it: some were well beyond the second __syncthreads and already in the second call, while others were just getting started after it.

My fix was to put another __syncthreads before I set index to -1.
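A rough sketch of that fix, under simplifying assumptions: the lookup searches a plain int array instead of d_varDB, and the names findIndex and lookupTwice are hypothetical stand-ins for the real function and its caller. The barrier at the top keeps any thread from resetting index until every thread has read the result of the previous call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simplified stand-in for the lookup in the question (hypothetical names).
__device__ int findIndex(const int *items, int numItems, int key) {
  __shared__ int index;

  // The added barrier: wait until all threads have finished reading the
  // result of any previous call before index is reset.
  __syncthreads();
  index = -1;
  __syncthreads();

  for (int i = threadIdx.x; i < numItems; i += blockDim.x) {
    if (items[i] == key) index = i;   // assumes at most one matching entry
  }

  __syncthreads();                    // the match, if any, is now visible
  return index;
}

// Calls the lookup twice in a row, which is the pattern that exposed the race.
__global__ void lookupTwice(const int *items, int numItems,
                            int keyA, int keyB, int *out) {
  int a = findIndex(items, numItems, keyA);
  int b = findIndex(items, numItems, keyB);
  if (threadIdx.x == 0) { out[0] = a; out[1] = b; }
}

int main() {
  const int numItems = 1000;
  int h_items[numItems];
  for (int i = 0; i < numItems; ++i) h_items[i] = 10 * i;  // unique values

  int *d_items, *d_out;
  cudaMalloc(&d_items, sizeof(h_items));
  cudaMalloc(&d_out, 2 * sizeof(int));
  cudaMemcpy(d_items, h_items, sizeof(h_items), cudaMemcpyHostToDevice);

  lookupTwice<<<1, 128>>>(d_items, numItems, 30, 1230, d_out);

  int h_out[2];
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  printf("first index %d, second index %d\n", h_out[0], h_out[1]);

  cudaFree(d_items);
  cudaFree(d_out);
  return 0;
}
```

Without the first __syncthreads in findIndex, a thread that enters the second call early could overwrite index while slower threads are still reading it at the end of the first call.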