Hi folks,

This is weird. I have the following kernel, which sums all entries of a vector using a reduction. I tested it a few days ago and it looked successful. However, when I tested it again today, summing 10000 ones, I got 10016! Here is my kernel code:

```
// blockSize is a compile-time constant defined elsewhere.
__global__ void myreduce(double *g_odata, double *g_idata, unsigned int n)
{
    __shared__ volatile double sdata[blockSize];

    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockSize * 2) + tid;
    unsigned int gridSize = blockSize * 2 * gridDim.x;

    // Each thread accumulates two elements per grid-sized stride.
    sdata[tid] = 0;
    while (i < n)
    {
        sdata[tid] += g_idata[i] + g_idata[i + blockSize];
        i += gridSize;
    }
    __syncthreads();

    // Tree reduction in shared memory.
    if (blockSize >= 512) { if (tid < 256) { sdata[tid] += sdata[tid + 256]; } __syncthreads(); }
    if (blockSize >= 256) { if (tid < 128) { sdata[tid] += sdata[tid + 128]; } __syncthreads(); }
    if (blockSize >= 128) { if (tid < 64) { sdata[tid] += sdata[tid + 64]; } __syncthreads(); }

    // Last warp, unrolled, with no __syncthreads().
    if (tid < 32)
    {
        if (blockSize >= 64) sdata[tid] += sdata[tid + 32];
        if (blockSize >= 32) sdata[tid] += sdata[tid + 16];
        if (blockSize >= 16) sdata[tid] += sdata[tid + 8];
        if (blockSize >= 8) sdata[tid] += sdata[tid + 4];
        if (blockSize >= 4) sdata[tid] += sdata[tid + 2];
        if (blockSize >= 2) sdata[tid] += sdata[tid + 1];
    }

    // Thread 0 of each block adds the block's partial sum to the global total.
    //if (tid == 0) g_odata[blockIdx.x] = sdata[0];
    if (tid == 0) atomicAdd(&g_odata[0], sdata[0]);
}
```

What is more, when I sum 100000 ones, I get 1.8535 × 10^193! In general, for small vectors (up to about 10000 ones), the results are slightly off, often with a fractional part. For example:

Sum of 10 ones: I got 9.9999

Sum of 100 ones: I got 100.0006

Sum of 1000 ones: I got 999.9998

Sum of 10000 ones: I got 10016

When the vector gets larger, I got:

Sum of 100000 ones: I got 1.8535 × 10^193

But what is even weirder: as the size kept going up, I got the right answer!

Sum of 1000000 ones: I got 1000000

Sum of 10000000 ones: I got 10000000

Sum of 100000000 ones: I got 100000000

Could anyone please tell me what is happening here? The same code worked fine a few days ago, then went crazy today…

Does this have anything to do with the GPU card? We are sharing the same GPU card on a remote Ubuntu machine…
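One thing that may be worth checking is whether the load loop in the kernel ever reads past the end of the vector, since `g_idata[i + blockSize]` is read whenever `i < n` holds, without a separate check on `i + blockSize`. Below is a minimal host-side sketch that replays the same index arithmetic on the CPU and counts such reads. The function name `countOutOfRangeReads` and the `numBlocks` parameter are hypothetical, and the launch configuration passed in is an assumption, since the actual block size and grid size are not shown in this post.

```cpp
#include <cstddef>

// Replays the index arithmetic of the kernel's load loop on the host and
// counts how many reads of g_idata[i + blockSize] would land at or beyond n.
// blockSize and numBlocks are assumed launch values, not taken from the post.
std::size_t countOutOfRangeReads(unsigned int n,
                                 unsigned int blockSize,
                                 unsigned int numBlocks)
{
    std::size_t outOfRange = 0;
    const unsigned int gridSize = blockSize * 2 * numBlocks;

    for (unsigned int block = 0; block < numBlocks; ++block) {
        for (unsigned int tid = 0; tid < blockSize; ++tid) {
            // Same starting index and stride as in the kernel.
            unsigned int i = block * (blockSize * 2) + tid;
            while (i < n) {
                // The kernel reads g_idata[i] (in range, since i < n)
                // and g_idata[i + blockSize] (not checked against n).
                if (i + blockSize >= n) {
                    ++outOfRange;
                }
                i += gridSize;
            }
        }
    }
    return outOfRange;
}
```

With the assumed values `blockSize = 128` and `numBlocks = 1`, `countOutOfRangeReads(10000, 128, 1)` returns 16, while `countOutOfRangeReads(256, 128, 1)` returns 0. Any nonzero count would mean the kernel reads memory outside the vector, whose contents are not defined, so the result could change from run to run.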