Is __syncthreads compatible with CUDA 1.1?

Jedi_Master · August 8, 2013, 11:46pm

VS 2012 shows that “__syncthreads is undefined” and I don’t know whether that is the GPU’s fault or a problem with VS.

sBc-Random · August 9, 2013, 12:21am

__syncthreads() ?

CudaaduC · August 9, 2013, 12:59am

Is the term __syncthreads() showing up in red in VS 2012? Sometimes VS thinks it is undefined, but you can still compile (via nvcc) and the code works fine.

There is a way around that, but I have seen this before and the project usually still builds.
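
For illustration only, here is a minimal sketch of one commonly used way to quiet the editor. It is not necessarily the workaround meant above. It assumes that Visual Studio defines the `__INTELLISENSE__` macro only while the editor parses the file, so the dummy declaration never reaches nvcc. The kernel `reverseInBlock` and the helper `reverseOnDevice` are hypothetical names made up for this example.

```cuda
#include <cuda_runtime.h>
#include <device_launch_parameters.h>  // lets the editor see threadIdx, blockIdx, blockDim
#include <vector>

// Assumption: __INTELLISENSE__ is defined only during editor parsing, not
// during an nvcc build, so this declaration only silences the red underline.
#ifdef __INTELLISENSE__
void __syncthreads();
#endif

// Hypothetical kernel: reverses one block-sized array through shared memory.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[256];
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();                     // all writes to tile finish before any read
    data[t] = tile[blockDim.x - 1 - t];
}

// Hypothetical host helper: runs the kernel on at most 256 ints in a single block.
std::vector<int> reverseOnDevice(std::vector<int> v)
{
    int *d = nullptr;
    size_t bytes = v.size() * sizeof(int);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);
    reverseInBlock<<<1, (unsigned)v.size()>>>(d);
    cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return v;
}
```

The editor warning and the nvcc build are independent, which is why the project can build even while the call is underlined.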

Jedi_Master · August 9, 2013, 4:17pm

The project builds, but the program gives me wrong results. Here is the kernel code:

__global__ void countDistanceFromFistVertex(bool *dataArray, bool *firsL,
	int *d_countData, int n, long long int twoPowerN, long long int arraySize)
{
	int id = threadIdx.x + blockDim.x * blockIdx.x; //blockIdx.x;
	int idx;

	__shared__ bool firstLane[512];

	if (id == 0)
	{
		for (int i=0; i<n; i++)
			firstLane[i]=dataArray[i];

		d_countData[id]=-3;
	}

	__syncthreads();

	int bufor=0;

	if (id!=0)
	{
		for (int i=0; i < n; i++)
		{
			idx = i + id * n;
			if (dataArray[idx] != firstLane[i])
				bufor++;
		}

//		if (bufor == 1 || bufor == 2)
		d_countData[id*2]=-bufor;
	}

}

It compares every row of the matrix (stored as a 1D array) to the first row of the matrix. To speed up the program I copy the first row to shared memory, but then it returns wrong results.

If I compare it in global memory like this:

for (int i=0; i < n; i++)
{
	idx = i + id * n;
	if (dataArray[idx] != dataArray[i])
		bufor++;
}

it works fine. So I think the problem is with the sync.

Jedi_Master · August 9, 2013, 4:36pm

I know what was wrong! I made a mistake when launching the kernel. Instead of launching one block of 512 threads, I launched 512 blocks of one thread each. So the sync doesn’t work between blocks ;)
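
To make the difference between the two launch configurations concrete, here is a minimal sketch. The kernel `recordBlock` and the helper `distinctBlocks` are hypothetical names made up for this example. Each thread records which block it belongs to, and the host counts how many different blocks were used. Since __syncthreads() and __shared__ memory are both scoped to a single block, threads in different blocks share neither.

```cuda
#include <cuda_runtime.h>
#include <set>
#include <vector>

// Hypothetical kernel: every thread writes the index of its own block.
__global__ void recordBlock(int *blockOf)
{
    int id = threadIdx.x + blockDim.x * blockIdx.x;
    blockOf[id] = blockIdx.x;
}

// Hypothetical host helper: launches blocks x threadsPerBlock threads and
// returns how many distinct blocks those threads were spread over.
int distinctBlocks(int blocks, int threadsPerBlock)
{
    int total = blocks * threadsPerBlock;
    int *d = nullptr;
    cudaMalloc(&d, total * sizeof(int));
    recordBlock<<<blocks, threadsPerBlock>>>(d);
    std::vector<int> h(total);
    cudaMemcpy(h.data(), d, total * sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return (int)std::set<int>(h.begin(), h.end()).size();
}
```

Calling `distinctBlocks(1, 512)` puts all 512 threads in one block, so they see the same shared array. With `distinctBlocks(512, 1)` every thread sits alone in its own block with its own copy of the shared array, and only block 0 ever fills it in the kernel above.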

So I have another question. If I want the shared memory filled in every block, should I do something like this:

if (id % 512 == 0)
{
	for (int i=0; i<n; i++)
		firstLane[i]=dataArray[i];

	d_countData[id]=-3;
}

?
Assuming I run 512 threads per block.
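
For what it is worth, with exactly 512 threads per block `id % 512 == 0` selects the same thread as `threadIdx.x == 0`, and testing `threadIdx.x` directly does not depend on the block size. Below is a minimal sketch of a per-block load, under some assumptions: `n` is at most 512 so the row fits in `firstLane`, the output is simplified to one count per row instead of the `id*2` indexing and the `-3` marker above, and the names `countDiffFromFirstRow` and `diffCounts` are hypothetical. Instead of one thread copying the whole row, all threads of the block copy it together.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Sketch: each block loads the first row into its own shared copy, then every
// thread compares one row of the matrix against that copy.
__global__ void countDiffFromFirstRow(const bool *dataArray, int *d_countData,
                                      int n, int rows)
{
    __shared__ bool firstLane[512];      // assumption: n <= 512

    // Cooperative load: thread t copies elements t, t + blockDim.x, ...
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        firstLane[i] = dataArray[i];

    __syncthreads();                     // reached by every thread in the block

    int id = threadIdx.x + blockDim.x * blockIdx.x;
    if (id >= rows)                      // bounds check only after the barrier
        return;

    int bufor = 0;
    for (int i = 0; i < n; i++)
        if (dataArray[i + id * n] != firstLane[i])
            bufor++;

    d_countData[id] = bufor;             // simplified output: differences per row
}

// Hypothetical host helper: runs the kernel with 512 threads per block.
std::vector<int> diffCounts(const bool *matrix, int n, int rows)
{
    const int threads = 512;
    const int blocks = (rows + threads - 1) / threads;
    bool *d_data = nullptr;
    int *d_count = nullptr;
    cudaMalloc(&d_data, (size_t)n * rows * sizeof(bool));
    cudaMalloc(&d_count, rows * sizeof(int));
    cudaMemcpy(d_data, matrix, (size_t)n * rows * sizeof(bool), cudaMemcpyHostToDevice);
    countDiffFromFirstRow<<<blocks, threads>>>(d_data, d_count, n, rows);
    std::vector<int> out(rows);
    cudaMemcpy(out.data(), d_count, rows * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_count);
    return out;
}
```

The bounds check comes after __syncthreads() on purpose: the barrier has to be reached by all threads of a block, so no thread should return before it.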