As the topic title says, I've got a big problem.
This function does not seem to work on my machine. I use MS Visual Studio 2010 Ultimate with Parallel Nsight.
First of all, programs using __syncthreads(); do compile and run, but in the editor the function is always underlined,
and when I hover the mouse pointer over it, the tooltip says it is undefined. Strangely, it still compiles…
Furthermore, the instruction does not seem to have any effect at all.
__global__ void oceniajDev(float *tabDev, int row, int col, float *wynikDev)
{
    int bid = blockIdx.x;
    int tid = threadIdx.x;
    if (tid < col)
    {
        //atomicAdd(&wynikDev[bid], tabDev[bid*row+tid]);
        wynikDev[bid] = tabDev[bid*row+tid];
        __syncthreads();
    }
}
I need to avoid atomicAdd because I need to use floats, and there is no atomic operation that sums floats.
So I figured I would synchronize the threads in each block. Obviously there is a thread conflict: threads read and write the same memory location at the same time. As I understand it, __syncthreads(); should make each thread wait until it has finished all of its operations before execution moves on to another thread. Well, after printing my results, it seems not to work.
Is there some additional header/library I should include? What am I doing wrong? Any help would be welcome.
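
For comparison, here is a minimal sketch of one common way to get a per-block sum without atomics: a shared-memory tree reduction. It is not taken from the code above. The names sumRowsDev, sumRows and THREADS are hypothetical, the matrix is assumed to be stored row-major with col floats per row (so the index is bid*col+i rather than bid*row+tid), and the block size is assumed to be a power of two. Note that __syncthreads(); is a barrier for the whole block, it does not serialize threads, and it is placed where every thread of the block reaches it rather than inside a branch only some threads take.

```cuda
#include <cuda_runtime.h>

#define THREADS 256  // assumed block size; must be a power of two for this sketch

// Hypothetical kernel: one block per row, each block writes the sum of its row.
__global__ void sumRowsDev(const float *tabDev, int col, float *wynikDev)
{
    __shared__ float partial[THREADS];
    int bid = blockIdx.x;
    int tid = threadIdx.x;

    // Each thread accumulates a private partial sum over its share of the row
    // (assumes row-major storage with `col` elements per row).
    float acc = 0.0f;
    for (int i = tid; i < col; i += blockDim.x)
        acc += tabDev[bid * col + i];
    partial[tid] = acc;
    __syncthreads();  // every thread of the block reaches this barrier

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2)
    {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();  // kept outside the if so no thread skips it
    }

    // A single thread writes the result, so there is no write conflict.
    if (tid == 0)
        wynikDev[bid] = partial[0];
}

// Hypothetical host wrapper: sums each row of a rows x cols matrix into wynik.
// Returns true if the result was copied back without a CUDA error.
bool sumRows(const float *tab, int rows, int cols, float *wynik)
{
    float *tabDev = 0;
    float *wynikDev = 0;
    size_t inBytes = (size_t)rows * cols * sizeof(float);
    size_t outBytes = (size_t)rows * sizeof(float);

    if (cudaMalloc(&tabDev, inBytes) != cudaSuccess)
        return false;
    if (cudaMalloc(&wynikDev, outBytes) != cudaSuccess)
    {
        cudaFree(tabDev);
        return false;
    }

    cudaMemcpy(tabDev, tab, inBytes, cudaMemcpyHostToDevice);
    sumRowsDev<<<rows, THREADS>>>(tabDev, cols, wynikDev);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(wynik, wynikDev, outBytes, cudaMemcpyDeviceToHost);

    cudaFree(tabDev);
    cudaFree(wynikDev);
    return err == cudaSuccess;
}
```

Calling sumRows(tab, 2, 3, wynik) on the host with tab = {1, 2, 3, 4, 5, 6} should leave the two row sums in wynik. Separately, atomicAdd does have a float overload on devices of compute capability 2.0 and higher (compiled with a matching -arch flag), so whether it is usable depends on the target GPU.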