Hi folks,
This is weird. I have the following kernel, which sums all entries of a vector using a reduction. I tested it a few days ago and it looked successful. However, when I tested it again today by summing 10000 ones, I got 10016! Here is my kernel code:
    __global__ void myreduce
    ( double *g_odata,
      double *g_idata,
      unsigned int n
    )
    {
        __shared__ volatile double sdata[blockSize];
        unsigned int tid = threadIdx.x;
        unsigned int i = blockIdx.x*(blockSize*2) + tid;
        unsigned int gridSize = blockSize*2*gridDim.x;
        sdata[tid] = 0;
        while (i < n)
        {
            sdata[tid] += g_idata[i] + g_idata[i+blockSize];
            i += gridSize;
        }
        __syncthreads();
        if (blockSize >= 512) { if (tid < 256) { sdata[tid] += sdata[tid + 256]; } __syncthreads(); }
        if (blockSize >= 256) { if (tid < 128) { sdata[tid] += sdata[tid + 128]; } __syncthreads(); }
        if (blockSize >= 128) { if (tid < 64) { sdata[tid] += sdata[tid + 64]; } __syncthreads(); }
        if (tid < 32)
        {
            if (blockSize >= 64) sdata[tid] += sdata[tid + 32];
            if (blockSize >= 32) sdata[tid] += sdata[tid + 16];
            if (blockSize >= 16) sdata[tid] += sdata[tid + 8];
            if (blockSize >= 8) sdata[tid] += sdata[tid + 4];
            if (blockSize >= 4) sdata[tid] += sdata[tid + 2];
            if (blockSize >= 2) sdata[tid] += sdata[tid + 1];
        }
        //if (tid == 0) g_odata[blockIdx.x] = sdata[0];
        if (tid == 0) atomicAdd(&g_odata[0], sdata[0]);
    }
What is more, when I sum 100000 ones, I get 1.8535 × 10^193! In general, when summing small vectors (up to about 10000 ones), I always get inexact, often fractional results. For example:
sum of 10 ones: 9.9999
sum of 100 ones: 100.0006
sum of 1000 ones: 999.9998
sum of 10000 ones: 10016
When the vector gets larger, I get:
sum of 100000 ones: 1.8535 × 10^193
But what is even weirder: as the size keeps going up, I get the right answer!
sum of 1000000 ones: 1000000
sum of 10000000 ones: 10000000
sum of 100000000 ones: 100000000
Could anyone please tell me what is happening here? The same code worked fine a few days ago, and today it has gone crazy.
Does this have anything to do with the GPU card? We share a single GPU card on a remote Ubuntu machine.
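
For anyone comparing notes, below is a minimal sketch of a guarded variant of the kernel, offered as a guess at the cause rather than a confirmed diagnosis. It rests on three assumptions that the post does not confirm: that blockSize is a compile-time constant (made a template parameter here), that n is not always a multiple of 2*blockSize so g_idata[i+blockSize] can read past the end of the input, and that g_odata[0] may not be reset to zero before each launch even though atomicAdd accumulates into it. The name myreduce_guarded, the launch configuration, and the main function are hypothetical. Note also that atomicAdd on double requires a device of compute capability 6.0 or higher.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// blockSize is a template parameter here (an assumption; the post does not
// show how blockSize is defined). It must be a power of two.
template <unsigned int blockSize>
__global__ void myreduce_guarded(double *g_odata, const double *g_idata,
                                 unsigned int n)
{
    __shared__ double sdata[blockSize];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockSize * 2) + tid;
    unsigned int gridSize = blockSize * 2 * gridDim.x;

    // Accumulate in a register; guard the second load so it never reads
    // past the end of g_idata when n is not a multiple of 2*blockSize.
    double sum = 0.0;
    while (i < n) {
        sum += g_idata[i];
        if (i + blockSize < n) sum += g_idata[i + blockSize];
        i += gridSize;
    }
    sdata[tid] = sum;
    __syncthreads();

    // Plain tree reduction with a barrier at every step, so it does not
    // rely on implicit warp-synchronous execution.
    for (unsigned int s = blockSize / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Each block adds its partial sum to the single output value.
    if (tid == 0) atomicAdd(&g_odata[0], sdata[0]);
}

int main()
{
    const unsigned int n = 10000;       // vector length from the post
    const unsigned int threads = 256;   // hypothetical block size
    const unsigned int blocks = 8;      // hypothetical grid size

    std::vector<double> h_in(n, 1.0);
    double *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_out, sizeof(double));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    // atomicAdd accumulates, so the output must start at 0.0 on every run
    // (all-zero bytes are the IEEE 754 representation of 0.0).
    cudaMemset(d_out, 0, sizeof(double));

    myreduce_guarded<threads><<<blocks, threads>>>(d_out, d_in, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    double h_out = 0.0;
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", h_out);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

If the assumptions hold, this would be consistent with results that change from day to day: a read past the end of the buffer returns whatever happens to be in that memory, and on a shared card that content depends on what ran before. The sketch trades the unrolled last-warp section for a simple loop with __syncthreads() at every step, which is slower but easier to reason about. It can be built with something like nvcc -arch=sm_60.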