Kernel Algorithm problem

A kernel to compute pi on the GPU does not completely work.

__global__ void DevicePIComputation(float *a, int threadCount){
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < threadCount){
        int modVal = idx % 2;
        if (modVal == 0)
            a[idx] = 4.0 / ((2.0 * idx) + 1.0);
        if (modVal == 1)
            a[idx] = -4.0 / ((2.0 * idx) + 1.0);
    }
}

When I run this kernel to compute pi on the GPU, it does not work for certain threadCount values, yet works perfectly on others and on the CPU. Does anyone know why this would be?