__syncthreads() not syncing the threads, although not in if statement

Sergnavov · April 26, 2016, 7:32am

__global__ void matVecMultCUDAKernel(int* aOnGPU, int* bOnGPU, int* cOnGPU, int matSize) {
	__shared__ int aShared[BLOCK_SIZE][BLOCK_SIZE];
	__shared__ int bShared[BLOCK_SIZE];

	int myRow = blockIdx.x * blockDim.x + threadIdx.x;
	int myRowInBlock = threadIdx.x, myColInBlock = threadIdx.y;
	int rowSum = 0;

	for (int m = 0; m < (matSize + BLOCK_SIZE - 1) / BLOCK_SIZE; m++) {
		aShared[myRowInBlock][myColInBlock] = getValFromMatrix(aOnGPU,myRow,m*BLOCK_SIZE+myColInBlock,matSize);
		if (myColInBlock==0) {bShared[myRowInBlock] = getValFromVector(bOnGPU,m*BLOCK_SIZE+myRowInBlock,matSize);}

		__syncthreads(); // Sync threads to make sure all fields have been written by all threads in the block to aShared and bShared

		if (myColInBlock==0) {
			for (int k=0;k<BLOCK_SIZE;k++) {
//				rowSum += getValFromMatrix(aOnGPU,myRow,m*BLOCK_SIZE+k,matSize) * getValFromVector(bOnGPU,m*BLOCK_SIZE+k,matSize);
				rowSum += aShared[myRowInBlock][k] * bShared[k];
			}
		}
	}

	if (myColInBlock==0 && myRow<matSize) {cOnGPU[myRow] = rowSum;}
}

The above kernel gives incorrect results for matrix-vector multiplication. In debugging, I can see that some warps in the block are executing with m=0 while others are already at m=1. If I uncomment line 17 and comment out line 18, the results are correct again. Is there something wrong with how I'm using __syncthreads()?

Sergnavov · April 26, 2016, 8:08am

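I think I figured it out: I needed to add an additional __syncthreads(); between lines 20 and 21 (at the end of the loop body). Otherwise, threads that have moved on to the next iteration may overwrite the shared arrays before the other threads have finished reading the current tile.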
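
For anyone landing here later, this is roughly what the fixed kernel looks like with the second barrier in place. It is only a sketch: BLOCK_SIZE = 16, the bodies of getValFromMatrix and getValFromVector (assumed to return 0 for out-of-range indices), and the host wrapper matVecOnGPU are my own assumptions, since none of them appear in the post above. Error checking on the CUDA calls is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 16  // assumed tile width; the original value is not shown in the post

// Hypothetical helpers (not shown in the post). Assumed to zero-pad
// out-of-range accesses so that partial tiles contribute nothing to the sum.
__device__ int getValFromMatrix(const int* mat, int row, int col, int matSize) {
	return (row < matSize && col < matSize) ? mat[row * matSize + col] : 0;
}

__device__ int getValFromVector(const int* vec, int idx, int matSize) {
	return (idx < matSize) ? vec[idx] : 0;
}

__global__ void matVecMultCUDAKernel(const int* aOnGPU, const int* bOnGPU, int* cOnGPU, int matSize) {
	__shared__ int aShared[BLOCK_SIZE][BLOCK_SIZE];
	__shared__ int bShared[BLOCK_SIZE];

	int myRow = blockIdx.x * blockDim.x + threadIdx.x;
	int myRowInBlock = threadIdx.x, myColInBlock = threadIdx.y;
	int rowSum = 0;

	for (int m = 0; m < (matSize + BLOCK_SIZE - 1) / BLOCK_SIZE; m++) {
		aShared[myRowInBlock][myColInBlock] = getValFromMatrix(aOnGPU, myRow, m * BLOCK_SIZE + myColInBlock, matSize);
		if (myColInBlock == 0) {
			bShared[myRowInBlock] = getValFromVector(bOnGPU, m * BLOCK_SIZE + myRowInBlock, matSize);
		}

		__syncthreads(); // Barrier 1: the tile is fully written before any thread reads it

		if (myColInBlock == 0) {
			for (int k = 0; k < BLOCK_SIZE; k++) {
				rowSum += aShared[myRowInBlock][k] * bShared[k];
			}
		}

		__syncthreads(); // Barrier 2: the tile is fully read before the next iteration overwrites it
	}

	if (myColInBlock == 0 && myRow < matSize) { cOnGPU[myRow] = rowSum; }
}

// Hypothetical host wrapper: copies inputs to the GPU, launches the kernel
// with BLOCK_SIZE x BLOCK_SIZE threads per block, and returns c = A * b.
std::vector<int> matVecOnGPU(const std::vector<int>& a, const std::vector<int>& b, int n) {
	int *dA, *dB, *dC;
	cudaMalloc(&dA, n * n * sizeof(int));
	cudaMalloc(&dB, n * sizeof(int));
	cudaMalloc(&dC, n * sizeof(int));
	cudaMemcpy(dA, a.data(), n * n * sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(dB, b.data(), n * sizeof(int), cudaMemcpyHostToDevice);

	dim3 block(BLOCK_SIZE, BLOCK_SIZE);
	dim3 grid((n + BLOCK_SIZE - 1) / BLOCK_SIZE);
	matVecMultCUDAKernel<<<grid, block>>>(dA, dB, dC, n);

	std::vector<int> c(n);
	cudaMemcpy(c.data(), dC, n * sizeof(int), cudaMemcpyDeviceToHost); // waits for the kernel to finish
	cudaFree(dA);
	cudaFree(dB);
	cudaFree(dC);
	return c;
}
```

Both barriers sit outside the if blocks and the loop trip count is the same for every thread, so every thread in the block reaches each __syncthreads() the same number of times. The first barrier alone only orders write-then-read within one iteration; it does nothing to stop a fast warp from starting the writes of iteration m+1 while a slow warp is still reading the tile from iteration m.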
