IntelliSense: identifier "__syncthreads" is undefined

bbfredrik · February 22, 2012, 4:30pm

Hi,

I’m trying to do a parallel reduction but can’t get __syncthreads() to work.

I’m running:
CUDA 4.0
Nsight 2.1
Visual Studio 2010

Here is the code:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <device_functions.h>

__global__ void
reduce0(int *g_idata, int *g_odata)
{
extern __shared__ int sdata[];

// load shared mem
unsigned int tid = threadIdx.x;
unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
sdata[tid] = g_idata[i];
__syncthreads();

// do reduction in shared mem
for(unsigned int s=1; s < blockDim.x; s *= 2) {
    // modulo arithmetic is slow!
    if ((tid % (2*s)) == 0) {
        sdata[tid] += sdata[tid + s];
    }
    __syncthreads();
}

// write result for this block to global mem
if (tid == 0) g_odata[blockIdx.x] = sdata[0];

}

int main()
{
//set parameters in host
const int threads =10;
const int blocks =1;
dim3 dimBlock(threads, 1, 1);
dim3 dimGrid(blocks, 1, 1);
int smemSize = threads * sizeof(int);
int in_list[threads];
int out_list[threads];
for(int i =0;i<threads; i++)
in_list[i] = i;

//set parameters on device
int *d_idata, *d_odata;
cudaMalloc((void**)&d_idata,sizeof(int)*threads);	
cudaMalloc((void**)&d_odata,sizeof(int)*threads);	
cudaMemcpy(d_idata,in_list,sizeof(int)*threads,cudaMemcpyHostToDevice);

//run kernel
reduce0<<< dimGrid, dimBlock, smemSize >>>(d_idata, d_odata);

//copy back to host
cudaMemcpy(out_list,d_odata,sizeof(int)*threads,cudaMemcpyDeviceToHost);	

cudaFree(d_idata);
cudaFree(d_odata);

return 0;

}

I get the following error when I compile the program: IntelliSense: identifier "__syncthreads" is undefined. Because I can’t sync my threads, it constantly fails due to a race condition :( Please help! I really need to get this to work.

Gilles_C · March 1, 2012, 9:53am

Hi,

Leaving aside the efficiency of your reduction algorithm, since I guess that is not the question, you have a bug here: your threads can read past the end of the shared memory area you allocated. With 10 threads, for example, thread 8 reads sdata[10] when s is 2.
By changing your
if ((tid % (2*s)) == 0) {
    sdata[tid] += sdata[tid + s];
}
into
if (((tid % (2*s)) == 0) && ((tid + s) < blockDim.x)) {
    sdata[tid] += sdata[tid + s];
}
you avoid this pitfall.
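For anyone who wants to check the bounds test without a GPU, here is a minimal sketch in plain C++ that emulates one block of the loop on the host. It is not the CUDA kernel itself: the function name emulate_reduce0_block is hypothetical, the block size is taken from the length of the input vector, and each pass of the outer loop stands in for the work done between two __syncthreads() calls. Running the threads one after another is safe here because, within a pass, no thread reads an element that another thread writes in that same pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of one block of reduce0 (hypothetical helper, not part
// of the original post). The vector plays the role of the shared array sdata,
// with one element per thread.
int emulate_reduce0_block(std::vector<int> sdata)
{
    assert(!sdata.empty());
    const std::size_t blockDim = sdata.size();

    // One iteration of this loop corresponds to one reduction step in the kernel.
    for (std::size_t s = 1; s < blockDim; s *= 2) {
        for (std::size_t tid = 0; tid < blockDim; ++tid) {
            // Same condition as the corrected kernel: the second test keeps
            // sdata[tid + s] inside the array when blockDim is not a power of two.
            if ((tid % (2 * s)) == 0 && (tid + s) < blockDim) {
                sdata[tid] += sdata[tid + s];
            }
        }
    }
    return sdata[0];  // block result, as written by thread 0 in the kernel
}
```

With the ten inputs 0..9 from the original post, the function returns 45. Dropping the second test makes the access at tid = 8, s = 2 go out of range.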

Now, regarding your issue with __syncthreads(), honestly I don’t know. Are you sure you are using the right compiler?

HTH

Gilles
