VS 2012 reports “__syncthreads is undefined”, and I don’t know whether that is a GPU fault or a problem with VS.
__syncthreads() ?
Is the term __syncthreads() underlined in red in VS 2012? Sometimes the VS editor thinks it is undefined, but you can still compile (via nvcc) and the code works fine.
There is a way around that, but I have seen this before and the project usually still builds.
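One commonly used workaround is sketched below, assuming only the editor's parser (IntelliSense) is complaining and nvcc itself compiles the file. The kernel reverseInBlock and the host helper reverseOnGpu are hypothetical names made up for this illustration. The only part that matters for the red underline is the __INTELLISENSE__ guard.

```cuda
#include <cuda_runtime.h>
#include <device_launch_parameters.h>  // lets the editor see threadIdx, blockIdx, blockDim

#ifdef __INTELLISENSE__
// Seen only by the Visual Studio editor parser, which defines __INTELLISENSE__.
// nvcc does not define it, so real builds use the actual intrinsic.
void __syncthreads();
#endif

// Hypothetical kernel: reverses the first n elements (n <= 512) within one block.
__global__ void reverseInBlock(int *data, int n)
{
    __shared__ int tile[512];
    int t = threadIdx.x;
    if (t < n)
        tile[t] = data[t];
    __syncthreads();  // every thread of the block reaches this barrier
    if (t < n)
        data[t] = tile[n - 1 - t];
}

// Hypothetical host helper: returns true if the round trip succeeded.
bool reverseOnGpu(int *host, int n)
{
    if (n < 1 || n > 512)
        return false;
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess)
        return false;
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    reverseInBlock<<<1, 512>>>(dev, n);  // one block, so one barrier covers all threads
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

The guard changes nothing in the compiled program. It only gives the editor a declaration to resolve, so the underline goes away.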
The project builds, but the program gives wrong results. Here is the kernel code:
__global__ void countDistanceFromFistVertex(bool *dataArray, bool *firsL, int *d_countData, int n, long long int twoPowerN, long long int arraySize)
{
    int id = threadIdx.x + blockDim.x * blockIdx.x; //blockIdx.x;
    int idx;
    __shared__ bool firstLane[512];
    if (id == 0)
    {
        for (int i=0; i<n; i++)
            firstLane[i]=dataArray[i];
        d_countData[id]=-3;
    }
    __syncthreads();
    int bufor=0;
    if (id!=0)
    {
        for (int i=0; i < n; i++)
        {
            idx = i + id * n;
            if (dataArray[idx] != firstLane[i])
                bufor++;
        }
        // if (bufor == 1 || bufor == 2)
        d_countData[id*2]=-bufor;
    }
}
It compares every row of the matrix (stored as a 1D array) to the first row of the matrix. To speed up the program I copy the first row to shared memory, but then it returns wrong results.
If I compare against global memory like this:
    for (int i=0; i < n; i++)
    {
        idx = i + id * n;
        if (dataArray[idx] != dataArray[i])
            bufor++;
    }
it works fine. So I think the problem is with the sync.
I know what was wrong! I made a mistake when launching the kernel. Instead of launching one block of 512 threads, I launched 512 blocks of one thread each. Shared memory and __syncthreads() only work within a block, so the sync doesn’t work between blocks ;)
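To make the difference between the two launches concrete, here is a minimal sketch. The kernel recordBlockIndex and the helper lastBlockIndex are hypothetical names, not part of the original program. Each thread records the block it belongs to, and the helper reads back the entry written by the thread with the highest global id.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores the index of the block it runs in.
__global__ void recordBlockIndex(int *blockOf)
{
    int id = threadIdx.x + blockDim.x * blockIdx.x;
    blockOf[id] = blockIdx.x;
}

// Hypothetical host helper: launches blocks x threadsPerBlock threads and
// returns the block index seen by the last thread, or -1 on failure.
int lastBlockIndex(int blocks, int threadsPerBlock)
{
    int total = blocks * threadsPerBlock;
    int *dBlockOf = nullptr;
    if (total < 1 || cudaMalloc(&dBlockOf, total * sizeof(int)) != cudaSuccess)
        return -1;
    recordBlockIndex<<<blocks, threadsPerBlock>>>(dBlockOf);
    int last = -1;
    cudaError_t err = cudaMemcpy(&last, dBlockOf + total - 1, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dBlockOf);
    return err == cudaSuccess ? last : -1;
}
```

On a working device, lastBlockIndex(1, 512) should give 0 and lastBlockIndex(512, 1) should give 511. In the second launch, thread id 511 sits alone in block 511, with its own copy of any __shared__ array and its own barrier, so it never sees what thread 0 of block 0 wrote.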
So I have another question. If I want the shared memory filled in every block, should I do something like this:
    if (id % 512 == 0)
    {
        for (int i=0; i<n; i++)
            firstLane[i]=dataArray[i];
        d_countData[id]=-3;
    }
?
Assuming I run 512 threads per block.
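With 512 threads per block, id % 512 == 0 selects the same thread as threadIdx.x == 0, so that check should fill the array once per block as long as __syncthreads() follows it. A possible alternative, sketched below, lets all threads of the block share the copy. This is a simplified variant, not the original kernel: the names countMismatches, countMismatchesOnGpu, MAX_N and the rows parameter are assumptions for this sketch, and it writes a plain mismatch count to counts[id] instead of the -bufor at d_countData[id*2] used above.

```cuda
#include <cuda_runtime.h>

#define MAX_N 512  // capacity of the shared buffer, as in the original kernel

// Hypothetical variant: counts[id] = number of positions where row id differs from row 0.
__global__ void countMismatches(const bool *dataArray, int *counts, int n, int rows)
{
    __shared__ bool firstLane[MAX_N];

    // Every block fills its own copy; threads stride over the row to split the work.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        firstLane[i] = dataArray[i];
    __syncthreads();  // placed outside any branch so all threads of the block reach it

    int id = threadIdx.x + blockDim.x * blockIdx.x;
    if (id >= rows)
        return;  // safe to exit here, the barrier is already behind us

    int bufor = 0;
    for (int i = 0; i < n; i++)
        if (dataArray[i + id * n] != firstLane[i])
            bufor++;
    counts[id] = bufor;
}

// Hypothetical host helper: host holds rows * n bools; returns true on success.
bool countMismatchesOnGpu(const bool *host, int rows, int n, int *countsOut)
{
    if (n < 1 || n > MAX_N || rows < 1)
        return false;
    size_t dataBytes = (size_t)rows * n * sizeof(bool);
    bool *dData = nullptr;
    int *dCounts = nullptr;
    if (cudaMalloc(&dData, dataBytes) != cudaSuccess)
        return false;
    if (cudaMalloc(&dCounts, rows * sizeof(int)) != cudaSuccess) {
        cudaFree(dData);
        return false;
    }
    cudaMemcpy(dData, host, dataBytes, cudaMemcpyHostToDevice);
    const int threads = 512;
    int blocks = (rows + threads - 1) / threads;  // enough blocks to cover every row
    countMismatches<<<blocks, threads>>>(dData, dCounts, n, rows);
    cudaError_t err = cudaMemcpy(countsOut, dCounts, rows * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCounts);
    return err == cudaSuccess;
}
```

Using threadIdx.x rather than id % 512 keeps the kernel correct if the block size changes later, and keeping __syncthreads() outside the if avoids a barrier that only part of the block reaches.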