problem with __syncthreads();

Naiilo · December 15, 2011, 2:07pm

As the topic title says, I've got a big problem.

This function does not seem to work on my machine. I use MS Visual Studio 2010 Ultimate with Parallel Nsight.

First of all, programs using __syncthreads(); do compile and run, but in the editor the function is always underlined, and when I hover the mouse pointer over it, it says it is undefined. Strangely, it compiles anyway.

And furthermore, this instruction seems not to work at all.

__global__ void oceniajDev(float *tabDev, int row, int col, float *wynikDev)
{
	int bid = blockIdx.x;
	int tid = threadIdx.x;
	if(tid < col)
	{
		//atomicAdd(&wynikDev[bid], tabDev[bid*row+tid]);
		wynikDev[bid] = tabDev[bid*row+tid];
		__syncthreads();
	}
}

I need to avoid atomicAdd because I need to use floats, and there is no atomic operation that sums floats.

So I figured I would synchronize the threads in each block. Obviously there is a thread conflict: reading and writing the same memory location at the same time. __syncthreads(); should make each thread wait until it has finished all of its operations before moving on to the next thread. Well, after printing my results, it seems not to work.

Is there some additional header/library I should include? What am I doing wrong? Any help would be welcome.

pasoleatis · December 15, 2011, 2:17pm

Oh no. What happens if tid >= col? Those threads will never reach the sync instruction. With a barrier that only part of the block reaches, your kernel is likely to hang or produce wrong results.

Put __syncthreads(); outside of the if. Anyway, for this kernel you do not need it.

__syncthreads(); does not do what you think it does. It only says: stop here until all threads of the block have got here. It does not make the threads run one after another. Use shared memory to save intermediate results, and then collect the data in each block.
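To illustrate the barrier semantics described above, here is a minimal sketch that reverses a short array inside one block. The names reverseInBlock, reverseOnDevice and BLOCK are made up for this example and are not from the original post. The point is only the placement: the guarded work sits inside the if, while __syncthreads() sits outside it so that every thread of the block reaches it.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64  // assumed block size for this sketch

// Reverses the first n elements of data in place, using one block (n <= BLOCK).
__global__ void reverseInBlock(float *data, int n)
{
    __shared__ float buf[BLOCK];
    int tid = threadIdx.x;

    if (tid < n)                 // only the memory access is conditional
        buf[tid] = data[tid];

    __syncthreads();             // reached by every thread, including tid >= n

    if (tid < n)                 // safe: all writes to buf are finished
        data[tid] = buf[n - 1 - tid];
}

// Hypothetical host helper: copy in, run one block, copy back.
// Returns true if every CUDA call succeeded.
bool reverseOnDevice(float *host, int n)
{
    if (n <= 0 || n > BLOCK) return false;

    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverseInBlock<<<1, BLOCK>>>(dev, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Without the barrier, a thread could read buf[n - 1 - tid] before the thread responsible for that slot has written it, since threads of different warps do not run in lockstep.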

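A suggestion (works only when blockDim.x is a power of 2):

__global__ void oceniajDev(float *tabDev, int row, int col, float *wynikDev)
{
        // Shared array of blockDim.x floats; its size is given at launch as the
        // third <<<...>>> argument: blockDim.x*sizeof(float)
        extern __shared__ float temp[];
        int bid = blockIdx.x;
        int tid = threadIdx.x;

        // Threads past the end of the row contribute 0 to the sum
        temp[tid]=0.0f;
        if(tid < col)
        {
                temp[tid]=tabDev[bid*row+tid];
        }
        __syncthreads();        // all loads are done before anyone reads temp

        // Halve the number of active threads in each step
        for(int ofs=blockDim.x/2;ofs>0; ofs=ofs/2)
        {
                if(tid < ofs)
                {
                        temp[tid]=temp[tid]+temp[tid+ofs];
                }
                __syncthreads();        // outside the if, so every thread reaches it
        }

        if(tid==0)
        {
                wynikDev[bid]=temp[0];
        }
}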
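For trying a block-level sum like this from the host, a self-contained sketch could look as follows. It is not the original poster's code: rowSum and sumRowsOnDevice are hypothetical names, the matrix is assumed to be stored row-major with col floats per row (so the index is bid*col+tid, which may differ from the layout in the question), and the block size is rounded up to a power of two and capped at 1024 threads, the per-block limit on compute capability 2.0 and later.

```cuda
#include <cuda_runtime.h>

// One block per row; blockDim.x must be a power of two and >= col.
__global__ void rowSum(const float *in, int col, float *out)
{
    extern __shared__ float temp[];   // blockDim.x floats, sized at launch
    int bid = blockIdx.x;
    int tid = threadIdx.x;

    // Pad with zeros so the tree reduction can ignore the row length.
    temp[tid] = (tid < col) ? in[bid * col + tid] : 0.0f;
    __syncthreads();

    for (int ofs = blockDim.x / 2; ofs > 0; ofs /= 2) {
        if (tid < ofs)
            temp[tid] += temp[tid + ofs];
        __syncthreads();              // every thread reaches the barrier
    }

    if (tid == 0)
        out[bid] = temp[0];
}

// Hypothetical host helper: sums each row of a rows x col matrix.
// Returns true if every CUDA call succeeded.
bool sumRowsOnDevice(const float *hostIn, int rows, int col, float *hostOut)
{
    if (rows <= 0 || col <= 0) return false;

    int threads = 1;
    while (threads < col) threads *= 2;   // next power of two >= col
    if (threads > 1024) return false;     // assumed per-block thread limit

    size_t inBytes = (size_t)rows * col * sizeof(float);
    size_t outBytes = (size_t)rows * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc((void **)&dIn, inBytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&dOut, outBytes) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }

    bool ok = cudaMemcpy(dIn, hostIn, inBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Third launch argument: bytes of dynamic shared memory per block.
        rowSum<<<rows, threads, threads * sizeof(float)>>>(dIn, col, dOut);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostOut, dOut, outBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}
```

Rows longer than the block size would need a loop in which each thread accumulates several elements before the reduction; that case is left out here to keep the sketch short.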
